dynamic programming, window search, disparity estimation, natural images, window size, disparity values, stereo pair, disparity space, adaptive algorithm, Pentagon image, Chang Seob Park, disparity maps, horizontal scan line, optimal path, cost function, dynamic programming algorithm, PSNR, stereo image pair, algorithm, adaptive technique, Department of Electrical Engineering, Introductory Techniques for 3-D Computer Vision, Y. Ohta, A. Verri, J.K. Aggarwal, E. Trucco, T. Kanade, MSE, M. Okutomi, KK, IEEE Trans, Korea Advanced Institute of Science and Technology, Technical Research Institute, search window, Pattern Recognition Society, Pattern Recognition, window method
34 (2001) 2573}2576 Rapid and Brief Communication A robust stereo disparity estimation using adaptive window search and dynamiC PROGRAMMING
search Chang Seob Park, Hyun Wook Park* Department of electrical engineering
, Korea Advanced Institute of Science and Technology
, 373-1 Kusung-dong, Yusong-gu, Taejon 305-701, South Korea
Received 2 November 2000; accepted 21 November 2000
1. Introduction Most research e!orts in stereo vision have been focused on an accurate disparity estimation. Several factors, such as perspective projection, intensity inconsistency between stereo image pair, and occlusion of objects, make correspondence problem
of the disparity estimation di$cult. To overcome these problems and to estimate an accurate disparity map, several schemes have been developed, which can be grouped into two categories such as area-based and feature-based approaches [1,2]. In area-based approaches, pixels or regions are used to measure the similarity between stereo pair. They yield dense disparity "elds but tend to fail because of local ambiguities in the correspondence. Various estimation techniques have been proposed to overcome these problems [1,2]. stereo matching
by using correlation or sum of squared di!erence (SSD) is a basic technique that obtains a dense disparity map from stereo images. In the stereo matching, window size
must be large enough to reduce noise-sensitive distortion. However, the position of maximum correlation or minimum SSD becomes ambiguous and blurred if the window size is too large. On the other hand, if the window size is too small, it gives poor disparity estimate when the signal-to-noise ratio of the stereo images is low. Therefore, the adaptive window method  had been proposed to overcome problems with "xed-size window. This paper provides a robust disparity estimation solution, which combines the adap- Also with Technical Research Institute
, Korean Broadcasting System, Seoul, South Korea. * Corresponding author. Tel.: #82-42-869-3466; fax: #8242-869-3410. E-mail address: [email protected]
tive window search and the Dynamic programming
search. 2. Proposed algorithm 2.1. Area-based disparity estimation by adaptive-size window Kanade and Okutomi  developed an adaptive technique with di!erent window size for each pixel, which selects an appropriate window by evaluating the local variation of the intensity and disparity. This method requires high computational costs for searching the disparity with the adaptive window iteratively. Our proposed adaptive algorithm is more e$cient in computational costs than their algorithm since it does not use iterative process
. The proposed algorithm "rstly determines the search window size for each pixel, and then performs disparity estimate with the window. Fig. 1 shows an example of window size decision. Basically, we select window size such that the area within the window has similar gray values. Variance of an object boundary is apt to change abruptly. If search window includes the abruptly change area, disparity errors can be produced because there is much depth di!erence. The thick lines in Fig. 1 are object boundaries, and point T is a target point for disparity estimation. A left image of stereo pair is diagonally divided into four parts centered at the target point. For the four divided parts, the window size is expanded from center to each direction (x\, x>, y\, y>). While the window size increases to each direction, if the local variance of a pixel within the window exceeds a certain threshold (VTH"2 in this paper), then we stop expanding the window to such direction. This means that a certain disparity discontinuous points such as an object boundary is encountered. Local variance, var*(i, j), of a left image, f*(i, j), is obtained as
0031-3203/01/$20.00 2001 Pattern Recognition Society. Published by Elsevier Science Ltd.
All rights reserved. PII: S 0 0 3 1 - 3 2 0 3 ( 0 1 ) 0 0 0 1 6 - 4
C.S. Park, H.W. Park / Pattern Recognition 34 (2001) 2573}2576
Fig. 1. Procedure to determine adaptive-size window: (a) window size growing, and (b) "nal window of the rectangle with broken lines.
Fig. 2. Mask and disparity map: (a) original left image, (b) mask from the adaptive window search (white region corresponds to invalid disparity).
1 j)" 4
In Fig. 1, increase of window size in the x\ direction stops at the point A, x> at the point B, y\ at the point C, and y> at the point D. Thus, the points of A, B, C, and D de"ne the x}start, x}end, y}start, and y}end, respectively, of a rectangular window (dotted rectangle in Fig. 1(b)). We con"ne the maximum window size, MAXW, for saving calculation time and the minimum window size, MINW, for e$cient matching. We select MINW"5;5 and MAXW"41;41 empirically in this paper. Disparity estimation is performed on both the intensity map and the variance map by using the adaptive-size window for more reliable estimation. The Correlation Function
is used as a matching cost function as follows:
costD(i, j, d )"
+G\ ,H\ f*(i, j) f0(i!d, j)
" +G\ ,H\ f*(i, j)"" +G\ ,H\ f*(i!d, j)"
where f*(i, j) and f0 pair, respectively,
(i, j) are left and window size is
right images M;N, and
stereo is the
disparity. The cost function for variance, costT?P(i, j, d), are similarly calculated from local variance maps, var*(i, j) and var0(i, j), of left and right images, respectively, as de"ned in Eq. (1).
If the matched result is in con"dence with following
three criteria, then the value of d is the valid disparity.
However, the matched result does not satisfy with the
criteria, the pixel point is reserved to dynamic program-
ming search for more reliable estimation. Three criteria
are given as follows:
mrealaxt}ioconsotDf(ti,hje)'mo0s.9t,mwahtecrheemd apxo}incotsbtDet(iw, je)eins
(2) rm2iLgaBh}xmt}ciamoxs}atDcgo(eiss,t,jD)i-.(ei2,.,jL)Bm}imsaxat}hxce}ocssoteDsct(oDi,n(jid)," j)l' aMrgTaeHxstBIcc,oosrtrDe(wlia,hjt,ieodr)ne. between left and right images, and THI is set to
0.0001 in this paper.
(3) Disparity values from stereo images and variance
of the images dD satis"es
maraex}icdoesnttDi(cia, jl),"i.ec.,osdtDD(" i, j,ddTD?P),,
dT?P satis"es max}costT?P(i, j)"costT?P(i, j, dT?P), and
max}costT?P(i, j)"MaxBcostT?P(i, j, d).
Fig. 2(b) shows an example of mask that is produced by adaptive window search. In this mask, the black regions correspond to the valid disparity from the adaptive window search, whereas the white regions correspond to invalid disparity. Therefore, the white regions have to be processed by dynamic programming search described in the following section.
2.2. Disparity estimation by dynamic programming Dynamic programming "nds a global matching optimum as the best path between two sequences of primitives, where one of them should be ordered. Dynamic programming has been proven to be a very e!ective method for solving edge-based stereo matching within a scanline . It is based on the assumption that there are no edge reversals in the two corresponding scanlines. This is a realistic assumption because edge reversals rarely occur in natural images. When they do appear, even the Human Visual System
has problems with identifying correct matches. Cox et al.  pointed out that several good paths could be found through matching space by using only the occlusion and ordering constraints. They found set of correspondences that
C.S. Park, H.W. Park / Pattern Recognition 34 (2001) 2573}2576
maximized the cost function subject to ordering and unique constraints. Our proposed algorithm uses the same cost function as Cox et al. and use the maximum likelihood
, minimum horizontal plus vertical discontinuities (MLMHV) algorithm . Also we adopt the dynamic programming algorithm with sub-scanline that is de"ned as unreliable disparity region from previous adaptive window search. In the mask of Fig. 2(b), groups of consecutive invalid pixels (&XXXXXX' in Fig. 3) are selected along each scan line. Each group consists of the consecutive invalid pixels and their adjacent pixels having valid disparity values at the left- and right-hand side (&dXXXXXXd' in Fig. 3). Considering the start and end disparity values in each group, disparity space image (DSI) of the left and right images is generated as shown in Fig. 3 by using cost function of Cox et al. . The cost function, C(i, j), is a mean square error (MSE) function as follows:
where is a variance of white noise
in image and it is set to 4 in our experiments. The dynamic programming algorithm is applied to the DSI. The optimal path
is obtained by tracing the path that has the minimum cost from upper-right corner to lower-left corner. In Fig. 3, the optimal path is denoted by the sequence of the gray pixels. In the optimal path, we can calculate actual disparity values by using location di!erence between corresponding pixels in left and right images. In conventional disparity estimation method
, the dynamic programming is applied to whole horizontal scan line of left and right images so that its DSI is always square. However, our proposed method uses a part of scan line to generate DSI. Therefore, the number of pixels in the line segment
of left image may be di!erent from the number of corresponding pixels in right image as an
Fig. 4. The `pentagona stereo image pair of (a) left image, (b) disparity map from 5;5 window, (c) disparity map from MLMHV algorithm, and (d) disparity map from our proposed algorithm.
example of Fig. 3. Fig. 3 shows the case of d"d#1. The real disparity value of the matched pixel p in DSI is calculated as follows:
where (l, r) is the location of a pixel p in DSI and d is the disparity value of the left most pixel. If one matched pixel (&A' in Fig. 4) has l"4, r"2 and d"5, then its real disparity value becomes 7. The "nal disparity map is obtained by "lling the disparity values obtained by dynamic programming search at the invalid pixels of the adaptive window search.
Fig. 3. An example of disparity space image (DSI) when d "d #1.
3. experiment results
Experiments are performed in Pentium PC using Visual C ;; for selected natural images. In most cases, our algorithm outperforms the window or dynamic programming only methods. One result of Pentagon image is shown in Fig. 4, in which the proposed algorithm shows clear contour of the building and it shows less errors in the area of roads and parking lots
around the building. Other methods show incorrect disparity shape in roof of building, or more noise around the building. Since the quantitative evaluation of the disparity estimation is very di$cult, it is usually performed by subjective visual evaluation. Even though it is not a direct
C.S. Park, H.W. Park / Pattern Recognition 34 (2001) 2573}2576
Table 1 PSNR (dB) of the reconstructed left images from corresponding right images by using disparity maps
5;5 window MLMHV
the image size, and the images have a maximum gray value of 255. In the occluded region of the stereo image pair, we cannot reconstruct left image from right image correctly because there is no information of occluded region in one of the two images. The occluded region can be detected by dynamic programming, but the region is not specially treated for PSNR calculation in this paper. Therefore, there may be a discord between the PSNR and the subjective visual evaluation. PSNRs of the reconstructed left images are given in Table 1. In general, our method shows good results in most of the images.
evaluation, we performed a quantitative evaluation that compared the original left image and the reconstructed left image from the right image using the disparity map by measuring peak signal-to-noise ratio (PSNR) as follows:
PSNR"10 log MSE dB,
1 )\ )\
MSE" KK G
( f*(i, j)!fI *(i, j)).
left image from map, K;K is
References  U.R. Dohnd, J.K. Aggarwal, Structure from stereo: a review, IEEE Trans
. SMC 19 (6) (1989) 1489}1510.  E. Trucco, A. Verri, Introductory Techniques for 3-D Computer Vision
, Prentice-Hall, Englewood Cli!s, NJ, 1998.  T. Kanade, M. Okutomi, A stereo matching algorithm with an adaptive window: theory and experiment, IEEE Trans. Pattern Anal. Mach. Intell. 16 (9) (1994) 920}932.  Y. Ohta, T. Kanade, Stereo by intra- and inter-scanline search using dynamic programming, IEEE Trans. Pattern Anal. Mach. Intell. 7 (2) (1985) 139}154.  I.J. Cox, S.L. Hingorani, S.B. Rao, B.M. Maggs, A maximum likelihood stereo algorithm, Comput. Vision Image Understanding 63 (3) (1996) 542}567.
CS Park, HW Park