Garth Wales
  • Introduction
  • META
  • To be added elsewhere
  • University
    • 400 Project Research
    • Aims
    • Presentation
    • Report
    • Work Diary
  • Deep Declarative Networks
    • Potential avenues
    • Work Diary
    • Deep Declarative Networks
    • Image Segmentation
  • shortcuts / syntax / setup
    • Jupyter
    • SSH
    • Python
    • CUDA
    • Screen
    • Markdown Syntax
    • Git
    • Dendron
    • Visual Studio Code
    • LaTeX
    • Windows
    • Bash
    • Vim
  • Machine Learning
    • Markov decision process
    • Reinforcement learning
    • Deep Q Network
    • Modern hopfield network
    • Object Detection
    • Face detection
    • Activation Functions
    • Gym
    • RL frameworks
    • Hyperparameters
    • Interpretability
  • Reinforcement learning
    • Memory
    • Exploration techniques
    • Agents
  • garth.wales
    • Website redesign
  • Computer Graphics and Vision
    • Image Mosaicing
    • Stereo Vision
    • Rendering
    • Exam papers
  • Neural Networks
    • Introduction
    • Learning
    • Learning theory
    • Stochastic Neural Networks
    • Interpretability
    • Word Respresentation
    • Language modelling with recurrent networks
    • Sequence-to-sequence networks
    • Exam
Powered by GitBook
On this page
  • Camera calibration
  • Depth
  • Estimating depth
  • Epipolar geometry
  • Stereo estimation
  • Fundamental matrix
  • Essential matrix
  • Structural estimation (3D points)
  • Multi-view stereo
  • Matching images
  • Absolute pose
  • SLAM

Was this helpful?

  1. Computer Graphics and Vision

Stereo Vision

Depth from a pair of images

Camera calibration

Use a checkerboard pattern to calibrate camera (set up a known reference plane)

Can calculate a homography based on 3 images for camera calibration

Can also model lens distortion or something

Depth

From an image we can get

  • Distant objects are smaller

  • Close objects occlude distant

  • Motion cues (parallax)

  • Perspective - converging lines

  • Depth from focus

  • Atmospheric effects

But it is independent of these - random dot stereograms

Difference between views is disparity

z = f b / (u_1 - u_2) f is focal length, b is distance between pinholes, z = depth

Estimating depth

(for two cameras with same settings)

  • Rectify images (warp to a simplified form where the match is only along some horizontal distance)

  • Don't need SIFT if using the same/similar cameras to take both pictures!

  • Block matching, SAD or SSD. Sliding window essentially to find the minimum points.

More generally (i.e. cameras have different calibration parameters, not a stereo camera)

  • Rotation between cameras

  • Translation along all three axes

Align XYZ plane with one camera

Epipolar geometry

Epipolar line is the intersection of the epipolar plane

  • plane defined by some point P, and the two centres of the cameras

If uncalibrated, we can calculate the epipolar line using fundamental matrix

  • 7 degrees of freedom

  • maps image points to their corresponding epipolar lines

  • encapsulates both intrinsic and extrinsic properties of cameras

If calibrated, we can calculate the epipolar line using essential matrix (factor out the two K=K'=I)

  • relates one point on a camera to the point where it is on the other camera

  • without say distance between points we cannot determine absolute pose

  • To estimate it using corresponding image points, the intrinsic parameters of both cameras must be known.

  • 5 degrees of freedom (3 for rotation, 3 for translation, -1 for unknown scale)

Stereo estimation

Fundamental matrix

  • 8-point algorithm: linear system p'^T F p = 0

    • 7-point algorithm works as only 7 DoF but harder and gives multiple solutions

    • Usually have lots of points.

    • 7-point if you want fewer RANSAC trials! or need constraints on F

  • Solve via normalised DLT

  • Estimates relative pose

  • Estimate 3D structure

  • Minimise reprojection error via non-linear least squares

Essential matrix

  • Estimate F, compute E = K'^T F K

    • 8 or 7 point algorithm

  • Alternatively: 5 point algorithm

    • Complex, non-linear system

    • Up to 10 solutions

    • But - fewer RANSAC trials, enforces constraints

Structural estimation (3D points)

Need the relative pose between camera (e.g. rotation, R, and translation, t, between cameras)

E tells us about R and t

  • E = [t]xR

Use singular value decomposition of E to get the solutions

  • We know w must lie in front of both cameras (to select the correct solution)

Can minimise for algebraic or for non-linear in true K, K', R, t, w_i

  • Initial estimate from algebraic error minimised

  • Minimise further via Levenberg-Marquardt

Multi-view stereo

  • Naive - Incremental two view approach

  • Better - Identify best matching pair (and reconstruct relative pose + 3D points)

Matching images

Bruteforce (matching all pairs) is expensive

Bag of Words

  • Group features into 'words'

  • Count how many times each 'word' appears in each image (histogram)

  • Compare histograms (Similar histograms -> similar images)

Bag of Words as vectors ignores word order and is mostly applied to counts (but can be a binary present/not)

For images

  • Cluster features (k clusters e.g. k-means where centers become k-'words')

  • Detect and describe features, count features closest to each word

    • An example could be number of circle features present, number of triangle features etc etc

Therefore you can do two-view stereo for relative pose and then construct 3D points from best matches

Absolute pose

Minimum need 3 points to determine pose from 2D-3D matches

Determine P (pose of camera) from the world space points A,B,C using distances and angles

  • P is intersection of spheres, i.e. centered at A with radius |P->A| and ....

  • Two solutions (use a fourth point or RANSAC to disambiguate)

    • 2 spheres gives a ring, another sphere gives two points on that ring

Always minimising reprojection error

  • but reprojection error with Levenberg-Marquardt needs jacobian matrix that scales with number of measurements.. multiple images big jacobian!

    • But a lot of zeroes are present (sparse matrix), and they are predictable so we can skip them

SLAM

Need to built a map of the environment, and know where it is within the environment

  • Therefore, Simultaneous localisation and mapping

kinda like multi-view stereo, but often with a speed requirement.

Small errors add up over time, need to detect when you return to previous regions to do 'loop closure' to correct for errors.

PreviousImage MosaicingNextRendering

Last updated 3 years ago

Was this helpful?