Jacky Yeh - HackMD

Essentials of 3D Reconstruction for Visual SLAM: A Beginner's Handbook
Introduction Introduction Single-view Geometry Camera Introduction Camera Calibration Camera Models (TODO) Image Transformation Single-view Metrology Perspective-n-Points / PnP Problem
Jacky Yeh changed a year ago
Image Rectification
Image rectification makes the epipolar lines perfectly horizontal and aligned between the two stereo images. Recall that in the epipolar geometry post, we mentioned two goals for epipolar geometry: solving the correspondence problem and triangulation. In fact, both problems become easier to solve once the image planes are rectified (aligned). Easier to reconstruct depth Top-view stereo vision
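Why rectification makes depth easier to reconstruct can be sketched in a few lines. Once matched points lie on the same image row, depth follows from the horizontal disparity alone via $Z = fB/d$. This is a minimal sketch; the function name, the shared focal length in pixels, and the sample numbers are assumptions for illustration, not taken from the post.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth of a point seen in a rectified stereo pair.

    Assumes both cameras share the same focal length (in pixels) and that
    the matched points lie on the same image row after rectification.
    """
    disparity = x_left - x_right  # horizontal shift in pixels
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical values: 700 px focal length, 0.1 m baseline, 35 px disparity.
depth = depth_from_disparity(x_left=400.0, x_right=365.0, focal_px=700.0, baseline_m=0.1)
print(depth)
```

Larger disparities correspond to closer points, which is why distant points are harder to range accurately with a short baseline.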
Jacky Yeh changed a year ago
Introduction
Hey there! 👋 I’m currently a master's student in Robotics at UIUC. The motivation for this book came from my desire for a gentle introduction to 3D vision. While learning 3D vision, I found it hard to understand and connect some of the essential components. Even though abundant resources are available on the Internet, I felt overwhelmed and frustrated when I was new to this field. This led me to wonder whether there is a better way to organize the material and give beginners a smoother learning path. This book is still a work in progress. Please feel free to reach out to me with any questions, and you are welcome to point out mistakes in the articles. My email: jackyyeh511@gmail.com Types of Reconstruction Problem
Jacky Yeh changed a year ago
Homography Transformation
:::success A homography matrix describes the mapping between points on the same ==plane== in different images. ::: Application Before delving in, let's first look at some cool applications of homography transformation! application 1: align pictures
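To make the mapping concrete, here is a minimal sketch of how a point is pushed through a $3 \times 3$ homography: lift it to homogeneous coordinates, multiply, then divide by the last component. The helper name `apply_homography` and both example matrices are hypothetical and chosen only for illustration.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    # Lift to homogeneous coordinates (x, y, 1) and multiply by H.
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by w to return to ordinary 2D coordinates.
    return (xh / w, yh / w)


# A pure translation by (10, 5) is a special case of a homography.
H_translate = [[1.0, 0.0, 10.0], [0.0, 1.0, 5.0], [0.0, 0.0, 1.0]]
shifted = apply_homography(H_translate, (2.0, 3.0))

# A non-zero entry in the last row introduces a perspective effect.
H_perspective = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.001, 0.0, 1.0]]
warped = apply_homography(H_perspective, (100.0, 50.0))
print(shifted, warped)
```

The division by `w` is what distinguishes a general homography from an affine map, where the last row is fixed to `(0, 0, 1)` and `w` is always 1.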
Jacky Yeh changed a year ago
Image Transformation
Transformation in 2D Let us start from 2D space and extend the concept to 3D space later. Note that we use homogeneous coordinates in this post, so a matrix $M_{3 \times 3}$ is actually a 2D -> 2D transformation. Name Matrix #DoF Preserves Shape translation
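As a small illustration of why homogeneous coordinates are convenient, the sketch below writes a translation as a $3 \times 3$ matrix and composes it with a rotation by plain matrix multiplication. The rotation is added here as an assumed second example, and all helper names are hypothetical.

```python
import math


def translation(tx, ty):
    """3x3 homogeneous matrix for a 2D translation."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def rotation(theta):
    """3x3 homogeneous matrix for a counter-clockwise rotation about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transform(M, point):
    """Apply a 3x3 homogeneous transformation to a 2D point."""
    v = (point[0], point[1], 1.0)
    xh, yh, w = (sum(M[i][k] * v[k] for k in range(3)) for i in range(3))
    return (xh / w, yh / w)


# Rotate by 90 degrees first, then translate by (1, 0).
# The rightmost matrix is applied to the point first.
M = matmul(translation(1.0, 0.0), rotation(math.pi / 2))
result = transform(M, (1.0, 0.0))
print(result)
```

Without the extra homogeneous coordinate, translation would need a separate vector addition and could not be chained with other transformations as a single matrix product.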
Jacky Yeh changed a year ago
Perspective-n-Point / PnP Problem
Problem Definition Perspective-n-Point is the problem of estimating the pose of a calibrated camera given a set of n 3D points in the world and their corresponding 2D points in the image. :::success Given: Intrinsic matrix $K$
Jacky Yeh changed a year ago
Camera Calibration
:::info Find a camera's intrinsic and extrinsic parameters, which describe the projection from the 3D world coordinate system to the 2D image coordinate system. ::: image Perspective Projection Converting from camera coordinates $(x_{c}, y_{c}, z_{c})$ to image coordinates $(u, v)$ involves three tasks: Length projection using pinhole camera formula.
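The conversion from $(x_{c}, y_{c}, z_{c})$ to $(u, v)$ can be sketched as below. This is a minimal sketch that assumes the common intrinsic parameterization with focal lengths `fx`, `fy` expressed in pixels and a principal point `(cx, cy)`, with no skew or lens distortion. These parameter names and the sample values are assumptions, not taken from the post.

```python
def project_to_pixel(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates.

    Assumes an ideal pinhole camera: no skew and no lens distortion.
    """
    x_c, y_c, z_c = point_cam
    if z_c <= 0:
        raise ValueError("point must be in front of the camera (z_c > 0)")
    # Pinhole projection scaled to pixels, then shifted by the principal point.
    u = fx * x_c / z_c + cx
    v = fy * y_c / z_c + cy
    return (u, v)


# Hypothetical intrinsics for a 640x480 image.
u, v = project_to_pixel((0.5, -0.25, 2.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(u, v)
```

A point on the optical axis, such as `(0, 0, z_c)`, lands exactly on the principal point regardless of its depth.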
Jacky Yeh changed a year ago
Epipolar Geometry
Previously, we explored the computation of a camera's intrinsic and extrinsic parameters through one or more views using standard camera calibration or single-view metrology procedures. These methods allowed us to extract information about the 3D world from a single image. However, ==it is generally impractical to reconstruct the complete structure of the 3D world solely from a single image==. This limitation arises from the intrinsic ambiguity of mapping 3D to 2D: certain details are inevitably lost. For example, the following picture of a man who appears to hold up the Leaning Tower of Pisa shows how a single view can be ambiguous. image The ambiguity can be resolved by observing the same 3D scene from multiple views. When there are two views, the resulting geometry is called Epipolar Geometry. What is epipolar geometry? Definition & Terms :::success
Jacky Yeh changed a year ago
Structure from Motion
We now go beyond the geometry of two cameras to multiple cameras. With points observed from multiple views, we can determine both the 3D structure of the scene and the camera motion simultaneously. This problem is known as structure from motion. Problem Definition :::success Given: $m$ images of $n$ fixed 3D points $mn$ 3D -> 2D correspondences $$ x_{ij} = M_{i}X_{j} ~~~~~~~~~~~~~~~ i=1 \ldots m; ~~ j = 1 \ldots n
Jacky Yeh changed a year ago
Triangulation
:::success Reconstructing a 3D point from its projections in two or more images. ::: Two-view Triangulation Figure 1
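One simple way to see two-view triangulation in action is the midpoint method, used here as an illustrative stand-in rather than as the method of the post. Each image point defines a ray from its camera center, and the 3D point is estimated as the midpoint of the shortest segment between the two rays. The sketch assumes the rays are already expressed in a common world frame; all names and numbers are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two rays c1 + s*d1 and c2 + t*d2.

    Returns the midpoint of the shortest segment joining the two rays.
    """
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth is not observable")
    # Ray parameters of the closest points, from the normal equations.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Two cameras 1 unit apart, both observing the same hypothetical point.
point = triangulate_midpoint(
    c1=[0.0, 0.0, 0.0], d1=[0.5, 0.2, 4.0],
    c2=[1.0, 0.0, 0.0], d2=[-0.5, 0.2, 4.0],
)
print(point)
```

With noisy measurements the two rays no longer intersect, and the midpoint gives a reasonable compromise. Linear and reprojection-error-based methods are common alternatives.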
Jacky Yeh changed a year ago
Least Square Solutions
Homogeneous Systems of Linear Equations In a homogeneous system of linear equations, the constant term in every equation is equal to 0. The solution can be divided into the following steps: A homogeneous system has the form: $$ Ax = 0 $$ We need to impose a constraint to avoid the trivial solution ($x=0$). Possible constraints include:
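One common choice, assumed here for illustration, is the unit-norm constraint $\|x\| = 1$. Under it, the least-squares solution of $Ax = 0$ is the right singular vector of $A$ associated with the smallest singular value. A minimal NumPy sketch follows; the function name `solve_homogeneous` and the line-fitting data are hypothetical.

```python
import numpy as np


def solve_homogeneous(A):
    """Return the unit-norm x that minimizes ||A x||."""
    A = np.asarray(A, dtype=float)
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1]


# Fit a line a*x + b*y + c = 0 through points that lie on y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
A = np.array([[x, y, 1.0] for x, y in points])
line = solve_homogeneous(A)
residual = np.linalg.norm(A @ line)
print(line, residual)
```

The result is defined only up to sign and scale, so `line` is proportional to `(2, -1, 1)` rather than equal to it.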
Jacky Yeh changed a year ago
Single-view Metrology
:::success Recover 3D structure from a single image. ::: 2D Points and Lines at Infinity 2D Line $$
Jacky Yeh changed 2 years ago
Camera Introduction
Pinhole Camera Let's design a camera. The simplest way is to cut a small hole in a curtain. Light rays leaving the object pass through this pinhole and project onto the image plane. The relationship between the projected height and the real object's height is written as follows: $$ y = f \frac{Y}{Z} $$
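A quick numeric check of the formula, as a minimal sketch: the function name and the sample values (a 50 mm focal length and a 1.8 m tall object) are assumptions chosen for illustration.

```python
def projected_height(focal_length, object_height, distance):
    """Height of the image formed by a pinhole camera: y = f * Y / Z.

    All three inputs must use the same length unit; so does the result.
    """
    if distance <= 0:
        raise ValueError("the object must be in front of the pinhole")
    return focal_length * object_height / distance


# A 1.8 m tall object seen with f = 0.05 m, at 10 m and then at 20 m.
near = projected_height(focal_length=0.05, object_height=1.8, distance=10.0)
far = projected_height(focal_length=0.05, object_height=1.8, distance=20.0)
print(near, far)
```

Doubling the distance $Z$ halves the projected height, which is the familiar effect of far objects appearing smaller.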
Jacky Yeh changed 2 years ago
Visual Odometry
The term "Visual Odometry" was coined by analogy with wheel odometry, a method that calculates a vehicle's motion by integrating the rotations of its wheels over time. Similarly, visual odometry incrementally estimates the pose of a vehicle by analyzing the changes that its motion induces in the images captured by its onboard cameras. When I first encountered the term "visual odometry", I was confused, especially about how it differs from VSLAM and SFM. So let us first clarify the relationship between these confusing terms: I finally realized, from Prof. Davide's great slides, that visual odometry is another way of saying ==sequential SFM==. The structure of this post is also mainly based on those slides. The overall VO pipeline is shown below. This post mainly focuses on untangling the different types of motion estimation algorithms (2D-2D, 3D-3D, 3D-2D), leaving the remaining components to other posts.
Jacky Yeh changed 2 years ago