Stereo Vision
Depth from a pair of images
Camera calibration
Use a checkerboard pattern to calibrate camera (set up a known reference plane)
Each checkerboard view gives a homography; at least 3 views are needed to solve for the intrinsic parameters (Zhang's method)
Can also model lens distortion (e.g. radial distortion coefficients)
Depth
From a single image we can get monocular depth cues:
Distant objects are smaller
Close objects occlude distant
Motion cues (parallax)
Perspective - converging lines
Depth from focus
Atmospheric effects
But stereo depth perception is independent of these cues: random dot stereograms contain none of them, yet still produce depth from disparity alone
Difference between views is disparity
z = f b / (u_1 - u_2), where f is the focal length, b is the baseline (distance between the pinholes), (u_1 - u_2) is the disparity, and z is the depth
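As a quick illustrative sketch of the formula (the focal length, baseline and pixel coordinates below are made up):

```python
def depth_from_disparity(f_px, baseline_m, u1, u2):
    """z = f * b / (u1 - u2) for a rectified stereo pair.
    f_px: focal length in pixels, baseline_m: distance between the
    pinholes in metres, u1/u2: matching x-coordinates in each image."""
    disparity = u1 - u2
    if disparity == 0:
        return float("inf")  # zero disparity -> point at infinity
    return f_px * baseline_m / disparity

# A closer point has a larger disparity, hence a smaller depth.
z_near = depth_from_disparity(700.0, 0.1, 400.0, 330.0)  # disparity 70 px -> 1.0 m
z_far = depth_from_disparity(700.0, 0.1, 400.0, 390.0)   # disparity 10 px -> 7.0 m
```

Note the inverse relationship: halving the disparity doubles the estimated depth, which is why depth resolution degrades for distant objects.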
Estimating depth
(for two cameras with same settings)
Rectify images (warp both so that corresponding points lie on the same horizontal scanline, making the match search 1D)
Don't need SIFT if using the same/similar cameras to take both pictures!
Block matching with SAD (sum of absolute differences) or SSD (sum of squared differences): essentially a sliding window along the scanline, taking the disparity with the minimum cost
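A minimal sketch of SAD block matching on a rectified pair; the image, window size and disparity range are made up for illustration:

```python
import numpy as np

def match_block_sad(left, right, row, col, block=5, max_disp=16):
    """Slide a block from (row, col) of the left image along the same
    row of the right image and return the disparity with minimum SAD."""
    h = block // 2
    ref = left[row - h:row + h + 1, col - h:col + h + 1].astype(np.int32)
    best_d, best_sad = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d                 # candidate column in the right image
        if c - h < 0:
            break
        cand = right[row - h:row + h + 1, c - h:c + h + 1].astype(np.int32)
        sad = np.abs(ref - cand).sum()
        if sad < best_sad:
            best_sad, best_d = sad, d
    return best_d

# Toy rectified pair: the right image is the left shifted 3 px left,
# so every block should match at disparity 3.
rng = np.random.default_rng(0)
left = rng.integers(0, 255, size=(20, 40))
right = np.roll(left, -3, axis=1)
```

Running this per pixel over the whole image yields a disparity map, which the formula above converts to a depth map.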
More generally (i.e. cameras have different calibration parameters, not a stereo camera)
Rotation between cameras
Translation along all three axes
Align the world coordinate frame with one camera
Epipolar geometry
Epipolar line is the intersection of the epipolar plane with the image plane
Epipolar plane is defined by a scene point P and the two camera centres
If uncalibrated, we can calculate the epipolar line using fundamental matrix
7 degrees of freedom
maps image points to their corresponding epipolar lines
encapsulates both intrinsic and extrinsic properties of cameras
If calibrated, we can calculate the epipolar line using the essential matrix (the fundamental matrix with the intrinsics factored out, i.e. K = K' = I)
relates a point in one image to its corresponding point in the other
without a known scale (e.g. the real distance between two points) we cannot determine the absolute translation, only its direction
To estimate it using corresponding image points, the intrinsic parameters of both cameras must be known.
5 degrees of freedom (3 for rotation, 3 for translation, -1 for unknown scale)
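To make the epipolar line concrete, here is a small sketch with a made-up calibration K and a pure sideways translation between the views. It builds F from the known pose as F = K'^-T [t]_x R K^-1 (with K = K' = I this reduces to E = [t]_x R) and checks that l' = F p is a horizontal epipolar line through the matching point:

```python
import numpy as np

def skew(t):
    """[t]_x, the matrix such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Made-up intrinsics; camera 2 is translated sideways, no rotation.
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
R = np.eye(3)
t = np.array([0.2, 0., 0.])

Kinv = np.linalg.inv(K)
F = Kinv.T @ skew(t) @ R @ Kinv   # fundamental matrix from calibration + pose

# Project a 3D point into both views (camera 1 sits at the origin).
X = np.array([0.5, -0.3, 4.0])
p1 = K @ X
p1 = p1 / p1[2]
p2 = K @ (R @ X + t)
p2 = p2 / p2[2]

# Epipolar line of p1 in image 2: line[0]*x + line[1]*y + line[2] = 0.
line = F @ p1
```

Because the motion here is purely horizontal, the epipolar lines come out horizontal, which is exactly the geometry that rectification aims for.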
Stereo estimation
Fundamental matrix
8-point algorithm: each correspondence gives one equation in the linear system p'^T F p = 0; solve for the 9 entries of F (up to scale)
7-point algorithm also works since F has only 7 DoF, but it is harder and gives multiple solutions (up to 3)
Usually have lots of points.
7-point is useful if you want fewer RANSAC trials, or need the rank-2 constraint on F enforced by construction
Solve via normalised DLT
Estimates relative pose
Estimate 3D structure
Minimise reprojection error via non-linear least squares
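The normalised-DLT 8-point algorithm above might be sketched like this; the synthetic cameras and points at the end are made up purely for the self-check:

```python
import numpy as np

def normalise(pts):
    """Translate to zero mean and scale to mean distance sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0, -scale * mean[0]],
                  [0, scale, -scale * mean[1]],
                  [0, 0, 1]])
    ph = np.column_stack([pts, np.ones(len(pts))])
    return (T @ ph.T).T, T

def eight_point(pts1, pts2):
    """Normalised 8-point algorithm: each match gives one linear
    equation p'^T F p = 0; solve by SVD and enforce rank 2."""
    p1, T1 = normalise(pts1)
    p2, T2 = normalise(pts2)
    A = np.array([[x2 * x1, x2 * y1, x2, y2 * x1, y2 * y1, y2, x1, y1, 1]
                  for (x1, y1, _), (x2, y2, _) in zip(p1, p2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, s, Vt = np.linalg.svd(F)
    F = U @ np.diag([s[0], s[1], 0]) @ Vt   # rank-2 constraint: det F = 0
    F = T2.T @ F @ T1                       # undo the normalisation
    return F / np.linalg.norm(F)            # F is only defined up to scale

# Self-check: project made-up 3D points into two synthetic views and
# verify the epipolar constraint p'^T F p = 0 for the estimated F.
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(-1, 1, (12, 2)), rng.uniform(4, 6, 12)])
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
a = 0.1
R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
t = np.array([0.5, 0.1, 0.0])
x1 = (K @ X.T).T
x1 = x1[:, :2] / x1[:, 2:]
x2 = (K @ (X @ R.T + t).T).T
x2 = x2[:, :2] / x2[:, 2:]
F = eight_point(x1, x2)
```

The normalisation step matters: without it the linear system is badly conditioned because pixel coordinates (hundreds) and the homogeneous 1s differ by orders of magnitude.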
Essential matrix
Estimate F, compute E = K'^T F K
8 or 7 point algorithm
Alternatively: 5 point algorithm
Complex, non-linear system
Up to 10 solutions
But - fewer RANSAC trials, enforces constraints
Structural estimation (3D points)
Need the relative pose between cameras (i.e. the rotation, R, and translation, t, between them)
E tells us about R and t
E = [t]_x R, where [t]_x is the skew-symmetric cross-product matrix of t
Use the singular value decomposition of E to get four candidate solutions
The reconstructed point w must lie in front of both cameras (use this to select the correct solution)
Can minimise the algebraic error, or the non-linear reprojection error over the true unknowns K, K', R, t, w_i
Get an initial estimate by minimising the algebraic error
Minimise further via Levenberg-Marquardt
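A sketch of recovering the four (R, t) candidates from E via SVD, following the standard decomposition; the example pose at the end is made up so the result can be checked:

```python
import numpy as np

def skew(t):
    """[t]_x, the matrix such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def decompose_essential(E):
    """SVD of E = [t]_x R yields four (R, t) candidates; the correct
    one is the pair that puts triangulated points in front of both
    cameras (the cheirality check)."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U                    # force proper rotations
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]                   # translation only known up to scale
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Made-up pose: build E = [t]_x R, then check the true pose appears
# among the four candidates returned by the decomposition.
a = 0.2
R_true = np.array([[np.cos(a), -np.sin(a), 0.],
                   [np.sin(a), np.cos(a), 0.],
                   [0., 0., 1.]])
t_true = np.array([1., 0., 0.])
E = skew(t_true) @ R_true
candidates = decompose_essential(E)
```

In a full pipeline you would triangulate a point with each candidate and keep the pair for which the point has positive depth in both views.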
Multi-view stereo
Naive - incremental two-view approach
Better - Identify best matching pair (and reconstruct relative pose + 3D points)
Matching images
Brute force (matching all pairs) is expensive: O(n^2) pairs of images
Bag of Words
Group features into 'words'
Count how many times each 'word' appears in each image (histogram)
Compare histograms (Similar histograms -> similar images)
As a vector representation, Bag of Words ignores word order and usually stores counts (but can be binary present/absent)
For images
Cluster feature descriptors (k clusters, e.g. via k-means, where the centres become the k 'words')
Detect and describe features in each image, then count the features closest to each word
For example: the number of circle-like features, the number of triangle-like features, etc.
Therefore you can do two-view stereo on the best-matching pair for relative pose, and then construct 3D points from the best matches
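A toy sketch of the Bag-of-Words pipeline, with made-up 2-D 'descriptors' standing in for real feature descriptors (a plain k-means builds the vocabulary, then each image becomes a normalised word histogram):

```python
import numpy as np

def kmeans_words(descriptors, k, iters=20):
    """Plain k-means; the final cluster centres act as the k 'words'.
    Farthest-point initialisation keeps the starting centres spread out."""
    centres = descriptors[:1].astype(float)
    for _ in range(k - 1):
        d = np.linalg.norm(descriptors[:, None] - centres[None], axis=2).min(axis=1)
        centres = np.vstack([centres, descriptors[d.argmax()]])
    for _ in range(iters):
        labels = np.linalg.norm(descriptors[:, None] - centres[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = descriptors[labels == j].mean(axis=0)
    return centres

def bow_histogram(descriptors, words):
    """Count how many of an image's features fall closest to each word."""
    labels = np.linalg.norm(descriptors[:, None] - words[None], axis=2).argmin(axis=1)
    hist = np.bincount(labels, minlength=len(words)).astype(float)
    return hist / hist.sum()      # normalise so image size doesn't matter

# Hypothetical descriptors: points drawn near three made-up 'feature
# types' (stand-ins for e.g. circle-like vs triangle-like features).
rng = np.random.default_rng(1)
types = np.array([[0., 0.], [5., 5.], [10., 0.]])
def fake_image(counts):
    return np.vstack([types[i] + 0.1 * rng.standard_normal((n, 2))
                      for i, n in enumerate(counts)])

img_a = fake_image([8, 2, 0])   # mostly type-0 features
img_b = fake_image([7, 3, 0])   # similar mix to img_a
img_c = fake_image([0, 2, 8])   # mostly type-2 features
words = kmeans_words(np.vstack([img_a, img_b, img_c]), k=3)
h_a, h_b, h_c = (bow_histogram(im, words) for im in (img_a, img_b, img_c))
```

Comparing histograms (e.g. by L1 distance) then ranks img_b as more similar to img_a than img_c is, without ever matching features pairwise.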
Absolute pose
Need a minimum of 3 points to determine pose from 2D-3D matches
Determine P (the camera position) from the world-space points A, B, C using distances and angles
P is the intersection of spheres: one centred at A with radius |PA|, one at B with radius |PB|, and one at C with radius |PC|
Two solutions (use a fourth point or RANSAC to disambiguate)
Two spheres intersect in a ring (circle); the third sphere cuts that ring in two points
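The sphere-intersection step can be sketched as trilateration. The points and distances below are made up; in the real P3P problem the radii |PA|, |PB|, |PC| would first have to be recovered from the image angles:

```python
import numpy as np

def trilaterate(A, B, C, rA, rB, rC):
    """Intersect three spheres (centres A, B, C, radii rA, rB, rC).
    Two spheres meet in a circle; the third cuts that circle in
    (generically) two points, hence two candidate positions."""
    ex = (B - A) / np.linalg.norm(B - A)
    i = ex @ (C - A)
    ey = C - A - i * ex
    ey = ey / np.linalg.norm(ey)
    ez = np.cross(ex, ey)                 # normal of the ABC plane
    d = np.linalg.norm(B - A)
    j = ey @ (C - A)
    x = (rA**2 - rB**2 + d**2) / (2 * d)
    y = (rA**2 - rC**2 + i**2 + j**2 - 2 * i * x) / (2 * j)
    z2 = rA**2 - x**2 - y**2
    z = np.sqrt(max(z2, 0.0))             # z2 < 0: spheres don't meet
    base = A + x * ex + y * ey
    return base + z * ez, base - z * ez

# Made-up world points and a camera position P to recover.
A = np.array([0., 0., 0.])
B = np.array([4., 0., 0.])
C = np.array([0., 4., 0.])
P_true = np.array([1., 2., 3.])
rA, rB, rC = (np.linalg.norm(P_true - Q) for Q in (A, B, C))
P1, P2 = trilaterate(A, B, C, rA, rB, rC)
```

The two returned points are mirror images across the plane through A, B, C, which is exactly the two-solution ambiguity the notes mention.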
Always minimising reprojection error
but minimising reprojection error with Levenberg-Marquardt needs a Jacobian matrix that scales with the number of measurements; multiple images means a big Jacobian!
But many entries are zero (the matrix is sparse), and the zeros are in predictable positions, so we can skip them (sparse bundle adjustment)
SLAM
Need to build a map of the environment, while knowing where the camera/robot is within that environment
Therefore, Simultaneous localisation and mapping
kinda like multi-view stereo, but often with a speed requirement.
Small errors add up over time; need to detect when you return to previously seen regions and perform 'loop closure' to correct the accumulated drift