A robust stereo disparity estimation using adaptive window search and dynamic programming search, CS Park, HW Park

Tags: dynamic programming, window search, disparity estimation, natural images, window size, disparity values, stereo pair, disparity space, adaptive algorithm, Pentagon image, Chang Seob Park, disparity maps, horizontal scan line, optimal path, cost function, dynamic programming algorithm, PSNR, stereo image pair, algorithm, adaptive technique, Department of Electrical Engineering, Introductory Techniques for 3-D Computer Vision, Y. Ohta, A. Verri, J.K. Aggarwal, E. Trucco, T. Kanade, MSE, M. Okutomi, KK, IEEE Trans, Korea Advanced Institute of Science and Technology, Technical Research Institute, search window, Pattern Recognition Society, Pattern Recognition, window method
Content: pattern recognition 34 (2001) 2573}2576 Rapid and Brief Communication A robust stereo disparity estimation using adaptive window search and dynamiC PROGRAMMING search Chang Seob Park, Hyun Wook Park* Department of electrical engineering, Korea Advanced Institute of Science and Technology, 373-1 Kusung-dong, Yusong-gu, Taejon 305-701, South Korea Received 2 November 2000; accepted 21 November 2000
1. Introduction Most research e!orts in stereo vision have been focused on an accurate disparity estimation. Several factors, such as perspective projection, intensity inconsistency between stereo image pair, and occlusion of objects, make correspondence problem of the disparity estimation di$cult. To overcome these problems and to estimate an accurate disparity map, several schemes have been developed, which can be grouped into two categories such as area-based and feature-based approaches [1,2]. In area-based approaches, pixels or regions are used to measure the similarity between stereo pair. They yield dense disparity "elds but tend to fail because of local ambiguities in the correspondence. Various estimation techniques have been proposed to overcome these problems [1,2]. stereo matching by using correlation or sum of squared di!erence (SSD) is a basic technique that obtains a dense disparity map from stereo images. In the stereo matching, window size must be large enough to reduce noise-sensitive distortion. However, the position of maximum correlation or minimum SSD becomes ambiguous and blurred if the window size is too large. On the other hand, if the window size is too small, it gives poor disparity estimate when the signal-to-noise ratio of the stereo images is low. Therefore, the adaptive window method [3] had been proposed to overcome problems with "xed-size window. This paper provides a robust disparity estimation solution, which combines the adap- Also with Technical Research Institute, Korean Broadcasting System, Seoul, South Korea. * Corresponding author. Tel.: #82-42-869-3466; fax: #8242-869-3410. E-mail address: [email protected] (H.W. Park).
tive window search and the Dynamic programming search. 2. Proposed algorithm 2.1. Area-based disparity estimation by adaptive-size window Kanade and Okutomi [3] developed an adaptive technique with di!erent window size for each pixel, which selects an appropriate window by evaluating the local variation of the intensity and disparity. This method requires high computational costs for searching the disparity with the adaptive window iteratively. Our proposed adaptive algorithm is more e$cient in computational costs than their algorithm since it does not use iterative process. The proposed algorithm "rstly determines the search window size for each pixel, and then performs disparity estimate with the window. Fig. 1 shows an example of window size decision. Basically, we select window size such that the area within the window has similar gray values. Variance of an object boundary is apt to change abruptly. If search window includes the abruptly change area, disparity errors can be produced because there is much depth di!erence. The thick lines in Fig. 1 are object boundaries, and point T is a target point for disparity estimation. A left image of stereo pair is diagonally divided into four parts centered at the target point. For the four divided parts, the window size is expanded from center to each direction (x\, x>, y\, y>). While the window size increases to each direction, if the local variance of a pixel within the window exceeds a certain threshold (VTH"2 in this paper), then we stop expanding the window to such direction. This means that a certain disparity discontinuous points such as an object boundary is encountered. Local variance, var*(i, j), of a left image, f*(i, j), is obtained as
0031-3203/01/$20.00 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S 0 0 3 1 - 3 2 0 3 ( 0 1 ) 0 0 0 1 6 - 4
C.S. Park, H.W. Park / Pattern Recognition 34 (2001) 2573}2576
Fig. 1. Procedure to determine adaptive-size window: (a) window size growing, and (b) "nal window of the rectangle with broken lines.
Fig. 2. Mask and disparity map: (a) original left image, (b) mask from the adaptive window search (white region corresponds to invalid disparity).
1 j)" 4
j#v)!avg* (i,
avg* (i,
j)" 4
In Fig. 1, increase of window size in the x\ direction stops at the point A, x> at the point B, y\ at the point C, and y> at the point D. Thus, the points of A, B, C, and D de"ne the x}start, x}end, y}start, and y}end, respectively, of a rectangular window (dotted rectangle in Fig. 1(b)). We con"ne the maximum window size, MAXW, for saving calculation time and the minimum window size, MINW, for e$cient matching. We select MINW"5;5 and MAXW"41;41 empirically in this paper. Disparity estimation is performed on both the intensity map and the variance map by using the adaptive-size window for more reliable estimation. The Correlation Function is used as a matching cost function as follows:
costD(i, j, d )"
+G\ ,H\ f*(i, j) f0(i!d, j)
, (3)
" +G\ ,H\ f*(i, j)"" +G\ ,H\ f*(i!d, j)"
where f*(i, j) and f0 pair, respectively,
(i, j) are left and window size is
right images M;N, and
of d
stereo is the
disparity. The cost function for variance, costT?P(i, j, d), are similarly calculated from local variance maps, var*(i, j) and var0(i, j), of left and right images, respectively, as de"ned in Eq. (1).
If the matched result is in con"dence with following
three criteria, then the value of d is the valid disparity.
However, the matched result does not satisfy with the
criteria, the pixel point is reserved to dynamic program-
ming search for more reliable estimation. Three criteria
are given as follows:
mrealaxt}ioconsotDf(ti,hje)'mo0s.9t,mwahtecrheemd apxo}incotsbtDet(iw, je)eins
the left
(2) rm2iLgaBh}xmt}ciamoxs}atDcgo(eiss,t,jD)i-.(ei2,.,jL)Bm}imsaxat}hxce}ocssoteDsct(oDi,n(jid)," j)l' aMrgTaeHxstBIcc,oosrtrDe(wlia,hjt,ieodr)ne. between left and right images, and THI is set to
0.0001 in this paper.
(3) Disparity values from stereo images and variance
maps where
of the images dD satis"es
maraex}icdoesnttDi(cia, jl),"i.ec.,osdtDD(" i, j,ddTD?P),,
dT?P satis"es max}costT?P(i, j)"costT?P(i, j, dT?P), and
max}costT?P(i, j)"MaxBcostT?P(i, j, d).
Fig. 2(b) shows an example of mask that is produced by adaptive window search. In this mask, the black regions correspond to the valid disparity from the adaptive window search, whereas the white regions correspond to invalid disparity. Therefore, the white regions have to be processed by dynamic programming search described in the following section.
2.2. Disparity estimation by dynamic programming Dynamic programming "nds a global matching optimum as the best path between two sequences of primitives, where one of them should be ordered. Dynamic programming has been proven to be a very e!ective method for solving edge-based stereo matching within a scanline [4]. It is based on the assumption that there are no edge reversals in the two corresponding scanlines. This is a realistic assumption because edge reversals rarely occur in natural images. When they do appear, even the Human Visual System has problems with identifying correct matches. Cox et al. [5] pointed out that several good paths could be found through matching space by using only the occlusion and ordering constraints. They found set of correspondences that
C.S. Park, H.W. Park / Pattern Recognition 34 (2001) 2573}2576
maximized the cost function subject to ordering and unique constraints. Our proposed algorithm uses the same cost function as Cox et al. and use the maximum likelihood, minimum horizontal plus vertical discontinuities (MLMHV) algorithm [5]. Also we adopt the dynamic programming algorithm with sub-scanline that is de"ned as unreliable disparity region from previous adaptive window search. In the mask of Fig. 2(b), groups of consecutive invalid pixels (&XXXXXX' in Fig. 3) are selected along each scan line. Each group consists of the consecutive invalid pixels and their adjacent pixels having valid disparity values at the left- and right-hand side (&dXXXXXXd' in Fig. 3). Considering the start and end disparity values in each group, disparity space image (DSI) of the left and right images is generated as shown in Fig. 3 by using cost function of Cox et al. [5]. The cost function, C(i, j), is a mean square error (MSE) function as follows:
j)"(f* (i,
j)!f0 4
where is a variance of white noise in image and it is set to 4 in our experiments. The dynamic programming algorithm is applied to the DSI. The optimal path is obtained by tracing the path that has the minimum cost from upper-right corner to lower-left corner. In Fig. 3, the optimal path is denoted by the sequence of the gray pixels. In the optimal path, we can calculate actual disparity values by using location di!erence between corresponding pixels in left and right images. In conventional disparity estimation method, the dynamic programming is applied to whole horizontal scan line of left and right images so that its DSI is always square. However, our proposed method uses a part of scan line to generate DSI. Therefore, the number of pixels in the line segment of left image may be di!erent from the number of corresponding pixels in right image as an
Fig. 4. The `pentagona stereo image pair of (a) left image, (b) disparity map from 5;5 window, (c) disparity map from MLMHV algorithm, and (d) disparity map from our proposed algorithm.
example of Fig. 3. Fig. 3 shows the case of d"d#1. The real disparity value of the matched pixel p in DSI is calculated as follows:
disparity(p)"l!r#d ,
where (l, r) is the location of a pixel p in DSI and d is the disparity value of the left most pixel. If one matched pixel (&A' in Fig. 4) has l"4, r"2 and d"5, then its real disparity value becomes 7. The "nal disparity map is obtained by "lling the disparity values obtained by dynamic programming search at the invalid pixels of the adaptive window search.
Fig. 3. An example of disparity space image (DSI) when d "d #1.
3. experiment results Experiments are performed in Pentium PC using Visual C ;; for selected natural images. In most cases, our algorithm outperforms the window or dynamic programming only methods. One result of Pentagon image is shown in Fig. 4, in which the proposed algorithm shows clear contour of the building and it shows less errors in the area of roads and parking lots around the building. Other methods show incorrect disparity shape in roof of building, or more noise around the building. Since the quantitative evaluation of the disparity estimation is very di$cult, it is usually performed by subjective visual evaluation. Even though it is not a direct
C.S. Park, H.W. Park / Pattern Recognition 34 (2001) 2573}2576
Table 1 PSNR (dB) of the reconstructed left images from corresponding right images by using disparity maps
5;5 window MLMHV
Parkmeter 31.7
Pentagon 21.5
the image size, and the images have a maximum gray value of 255. In the occluded region of the stereo image pair, we cannot reconstruct left image from right image correctly because there is no information of occluded region in one of the two images. The occluded region can be detected by dynamic programming, but the region is not specially treated for PSNR calculation in this paper. Therefore, there may be a discord between the PSNR and the subjective visual evaluation. PSNRs of the reconstructed left images are given in Table 1. In general, our method shows good results in most of the images.
evaluation, we performed a quantitative evaluation that compared the original left image and the reconstructed left image from the right image using the disparity map by measuring peak signal-to-noise ratio (PSNR) as follows:
PSNR"10 log MSE dB,
1 )\ )\
( f*(i, j)!fI *(i, j)).
rIinghEtqi.m(7a)g,efI *u(si,inj)gisobthtaeinreedcodnisstpruarcitteyd
left image from map, K;K is
References [1] U.R. Dohnd, J.K. Aggarwal, Structure from stereo: a review, IEEE Trans. SMC 19 (6) (1989) 1489}1510. [2] E. Trucco, A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice-Hall, Englewood Cli!s, NJ, 1998. [3] T. Kanade, M. Okutomi, A stereo matching algorithm with an adaptive window: theory and experiment, IEEE Trans. Pattern Anal. Mach. Intell. 16 (9) (1994) 920}932. [4] Y. Ohta, T. Kanade, Stereo by intra- and inter-scanline search using dynamic programming, IEEE Trans. Pattern Anal. Mach. Intell. 7 (2) (1985) 139}154. [5] I.J. Cox, S.L. Hingorani, S.B. Rao, B.M. Maggs, A maximum likelihood stereo algorithm, Comput. Vision Image Understanding 63 (3) (1996) 542}567.

CS Park, HW Park

File: a-robust-stereo-disparity-estimation-using-adaptive-window-search.pdf
Title: PII: S0031-3203(01)00016-4
Author: CS Park, HW Park
Published: Thu Aug 23 11:18:30 2001
Pages: 4
File size: 0.21 Mb

, pages, 0 Mb

Experience-based learning, 10 pages, 0.06 Mb
Copyright © 2018 doc.uments.com