This project implements two fundamental approaches for computing disparity maps from stereo image pairs: Block Matching and Dynamic Programming. These methods are used to estimate depth information from stereo images, a core problem in computer vision.
Stereo vision is the process of extracting 3D depth information from two or more 2D images taken from different viewpoints. By finding corresponding pixels between stereo image pairs, we can calculate disparity (horizontal displacement), which is inversely proportional to depth.
- Autonomous vehicles (depth perception)
- 3D reconstruction
- Robot navigation
- Augmented reality
- Medical imaging
A sliding window approach that matches rectangular blocks between left and right images.
Features:
- Methods: Sum of Absolute Differences (SAD) and Sum of Squared Differences (SSD)
- Window sizes: 1×1, 5×5, 9×9 pixels
- Max disparity: 64 pixels
- Advantages: Simple, parallelizable, efficient
- Limitations: Sensitive to texture-less regions, produces noisy results
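To make the sliding-window search concrete, here is a minimal, unoptimized sketch (the function name `block_matching` and the single-channel float inputs are illustrative assumptions, not the notebook's actual code):

```python
import numpy as np

def block_matching(left, right, max_disparity=64, window=5, method="SAD"):
    """Naive block matching: for each left-image pixel, slide a window
    over candidate disparities and keep the lowest-cost match."""
    h, w = left.shape
    half = window // 2
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_cost, best_d = np.inf, 0
            # Only search disparities that keep the window inside the image.
            for d in range(min(max_disparity, x - half) + 1):
                patch_l = left[y-half:y+half+1, x-half:x+half+1]
                patch_r = right[y-half:y+half+1, x-d-half:x-d+half+1]
                diff = patch_l - patch_r
                cost = np.abs(diff).sum() if method == "SAD" else (diff**2).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

In practice the notebook vectorizes this search; the triple loop above is only meant to show the cost-minimization structure.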
An optimization-based approach that finds the optimal pixel correspondence along scanlines.
Features:
- Cost function: Squared intensity difference normalized by σ²
- Occlusion handling: Penalty-based model (c₀ = 1)
- Ordering constraint: Preserves pixel order within scanlines
- Visualization: Alignment path plots showing pixel correspondences
- Advantages: Handles occlusions, globally optimal per scanline
- Limitations: O(W²) time per scanline, no inter-scanline consistency
- Python 3.8+
- Jupyter Notebook or JupyterLab
```
pip install opencv-python numpy matplotlib tqdm
```

Or using conda:

```
conda install -c conda-forge opencv numpy matplotlib tqdm
```

Open and run block_matching.ipynb:

```
jupyter notebook block_matching.ipynb
```

The notebook will:
- Load 3 stereo image pairs from images/
- Compute disparity maps using SAD and SSD with different window sizes
- Save results to outputs/pair*_*.png
Open and run dynamic_programming.ipynb:
```
jupyter notebook dynamic_programming.ipynb
```

The notebook will:
- Load stereo image pairs
- Compute disparity maps using the DP algorithm
- Generate visualizations with multiple colormaps
- Create alignment path plots (bonus section)
- Save all outputs to outputs/
Block Matching:
- max_disparity: Maximum horizontal search range (default: 64)
- window_size: Matching window size (1, 5, or 9)
- method: "SAD" or "SSD"

Dynamic Programming:
- sigma: Pixel noise standard deviation (default: 2)
- c0: Occlusion penalty cost (default: 1)
- Disparity Maps:
  - Grayscale images where brightness indicates disparity (depth)
  - Brighter pixels = larger disparity = closer to camera
- Colormap Visualizations:
  - Gray: Traditional representation
  - Hot: Warm colors highlight large disparities
  - Plasma: Perceptually uniform colormap
- Alignment Paths (DP only):
  - Shows the optimal pixel matching for the middle scanline
  - Diagonal segments = matched pixels
  - Horizontal/vertical segments = occlusions
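As an illustration, the three colormap renderings can be produced from any disparity array with matplotlib; the helper name `save_colormaps` and the output filename pattern here are assumptions for this sketch, not the notebook's exact code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs without a display
import matplotlib.pyplot as plt

def save_colormaps(disparity, prefix="outputs/pair1"):
    """Save one PNG per colormap listed above (gray, hot, plasma)."""
    for cmap in ("gray", "hot", "plasma"):
        plt.figure(figsize=(6, 4))
        plt.imshow(disparity, cmap=cmap, vmin=0, vmax=disparity.max())
        plt.colorbar(label="disparity (px)")
        plt.axis("off")
        plt.savefig(f"{prefix}_{cmap}.png", bbox_inches="tight")
        plt.close()
```

A shared vmin/vmax across the three plots keeps the colormaps directly comparable for the same pair.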
| Method | Smoothness | Edge Preservation | Occlusion Handling | Speed |
|---|---|---|---|---|
| Block Matching (SAD) | Low | Medium | Poor | Fast |
| Block Matching (SSD) | Low | Medium | Poor | Fast |
| Dynamic Programming | High | Good | Excellent | Slow |
depth = (baseline × focal_length) / disparity
Where:
- baseline: Distance between camera centers
- focal_length: Camera focal length
- disparity: Horizontal pixel displacement
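A quick worked example of the triangulation formula, using made-up calibration values (the 0.10 m baseline and 700 px focal length are hypothetical, chosen only for round numbers):

```python
def disparity_to_depth(disparity_px, baseline_m, focal_px):
    """Invert disparity via the formula above: depth = baseline * f / d.
    Assumes disparity_px > 0 (zero disparity would mean infinite depth)."""
    return baseline_m * focal_px / disparity_px

# Hypothetical calibration: 0.10 m baseline, 700 px focal length.
# A 35 px disparity then triangulates to 0.10 * 700 / 35 = 2.0 m;
# doubling the disparity halves the depth.
depth_m = disparity_to_depth(35, 0.10, 700)
```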
Sum of Absolute Differences (SAD):
cost(x,y,d) = Σ |Il(x+i, y+j) - Ir(x+i-d, y+j)|
Sum of Squared Differences (SSD):
cost(x,y,d) = Σ (Il(x+i, y+j) - Ir(x+i-d, y+j))²
where each sum runs over all window offsets (i, j).
Cost Matrix:
d(i,j) = (Il(i) - Ir(j))² / σ²
DP Recurrence:
```
D(i,j) = min {
    D(i-1,j-1) + d(i,j)   // match pixels
    D(i-1,j)   + c₀       // skip left pixel (occluded)
    D(i,j-1)   + c₀       // skip right pixel (occluded)
}
```
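The recurrence can be sketched for a single scanline as follows (a simplified illustration: the name `dp_scanline` and the NaN-for-occluded convention are assumptions of this sketch, not the notebook's code):

```python
import numpy as np

def dp_scanline(left_row, right_row, sigma=2.0, c0=1.0):
    """Scanline DP matching: fill the cost table D with the recurrence
    above, then backtrack to recover per-pixel disparities.
    Occluded left pixels get NaN."""
    n, m = len(left_row), len(right_row)
    D = np.zeros((n + 1, m + 1))
    D[0, :] = np.arange(m + 1) * c0   # leading right-pixel occlusions
    D[:, 0] = np.arange(n + 1) * c0   # leading left-pixel occlusions
    back = np.zeros((n + 1, m + 1), dtype=np.uint8)  # 0=match, 1=skip left, 2=skip right
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (float(left_row[i-1]) - float(right_row[j-1]))**2 / sigma**2
            choices = (D[i-1, j-1] + cost,  # match pixels
                       D[i-1, j] + c0,      # skip left pixel (occluded)
                       D[i, j-1] + c0)      # skip right pixel (occluded)
            k = int(np.argmin(choices))
            back[i, j] = k
            D[i, j] = choices[k]
    # Backtrack from (n, m) to recover the alignment path.
    disp = np.full(n, np.nan)
    i, j = n, m
    while i > 0 and j > 0:
        if back[i, j] == 0:
            disp[i-1] = (i-1) - (j-1)  # disparity = left index - right index
            i, j = i - 1, j - 1
        elif back[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return disp
```

The backtracked (i, j) path is exactly what the alignment-path plots visualize: diagonal steps are matches, horizontal/vertical steps are occlusions.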
Complexity:
- Time: O(H × W²) for H×W image
- Space: O(W²) per scanline
- Computational efficiency: 1 channel vs 3 RGB channels
- Structural focus: Edges and textures visible without color
- Robustness: Less sensitive to lighting/color calibration differences
- Standard practice: Most stereo algorithms use intensity
Pixels maintain their relative order along a scanline: if i₁ < i₂ in the left image, then their matches j₁, j₂ in the right image satisfy j₁ < j₂.
Pixels visible in one image but hidden in the other due to:
- Depth discontinuities
- Object boundaries
- Different viewpoints