Computer Vision Notes
These are some processed notes of mine from a Computer Vision course I took at Aalto University in 2023.
1. Camera Calibration & Projective Geometry
Intrinsic Camera Matrix
\(K = \begin{bmatrix} f & s & u_0 \\ 0 & a f & v_0 \\ 0 & 0 & 1 \end{bmatrix}\)
- Parameters:
- $f$: Focal length
- $s$: Skew (non-rectangular pixels)
- $(u_0, v_0)$: Principal point
- $a$: Aspect ratio
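As a small illustration of how $K$ is used, the sketch below builds the matrix from the parameters above and projects a 3D point given in camera coordinates to pixel coordinates. The function names and the numeric values are made up for the example.

```python
import numpy as np

def intrinsic_matrix(f, a=1.0, s=0.0, u0=0.0, v0=0.0):
    """Build K from focal length f, aspect ratio a, skew s, principal point (u0, v0)."""
    return np.array([[f,   s,     u0],
                     [0.0, a * f, v0],
                     [0.0, 0.0,   1.0]])

def project(K, X_cam):
    """Project a 3D point in camera coordinates to pixel coordinates (u, v)."""
    x = K @ np.asarray(X_cam, dtype=float)  # homogeneous image point
    return x[:2] / x[2]                     # divide by the third coordinate

# Hypothetical values: f = 800 px, principal point at (320, 240), no skew.
K = intrinsic_matrix(f=800.0, u0=320.0, v0=240.0)
uv = project(K, [0.1, -0.05, 2.0])
print(uv)  # [360. 220.]
```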
Homogeneous Coordinates
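- A projective transformation of the plane (homography) is a 3×3 matrix with 9 entries, but it has only 8 degrees of freedom (DoF) because homogeneous coordinates are defined up to a non-zero scale $c$:
\(c \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \equiv \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}, \quad c \neq 0\)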
- Parallel lines intersect at infinity in projective space.
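Both points can be checked numerically. The sketch below assumes the usual convention that a 2D line $ax + by + c = 0$ is the homogeneous vector $(a, b, c)$ and that two lines intersect at their cross product. The example lines are arbitrary.

```python
import numpy as np

def dehomogenize(x):
    """Convert a homogeneous vector to Cartesian coordinates."""
    x = np.asarray(x, dtype=float)
    return x[:-1] / x[-1]

# Scaling a homogeneous vector does not change the point it represents.
same_point = np.allclose(dehomogenize([2.0, 4.0, 2.0]), dehomogenize([1.0, 2.0, 1.0]))

# Two parallel lines: x + y - 1 = 0 and x + y - 3 = 0.
l1 = np.array([1.0, 1.0, -1.0])
l2 = np.array([1.0, 1.0, -3.0])
p = np.cross(l1, l2)  # intersection in homogeneous coordinates
print(p)  # [-2.  2.  0.]
```

The last coordinate is 0, so the intersection is a point at infinity, and its first two coordinates give the shared direction of the two lines.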
2. Feature Detection & Matching
Harris Corner Detection
- Compute gradients $I_x$, $I_y$ (using Sobel filters).
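- Construct second-moment matrix $M$ by summing the gradient products over a window $W$ around each pixel (optionally with Gaussian weights):
\(M = \sum_{(x, y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}\)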
- Corner response (with $k$ an empirical constant, typically 0.04 to 0.06):
\(R = \det(M) - k \cdot \text{trace}(M)^2\)
- Eigenvalues $\lambda_1, \lambda_2$ of $M$:
- $\lambda_1 \gg \lambda_2$: Edge
- $\lambda_1 \approx \lambda_2$ (large): Corner
- Both small: Flat region
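The steps above can be sketched with NumPy alone. This is a minimal version under stated assumptions: central differences (`np.gradient`) stand in for the Sobel filters, the window is an unweighted box summed over a 3×3 neighborhood, and $k = 0.05$ is an arbitrary choice within the usual range. The helper names are hypothetical.

```python
import numpy as np

def box_sum(img, radius=1):
    """Sum each pixel's (2r+1)x(2r+1) neighborhood, zero padded at the borders."""
    padded = np.pad(img, radius, mode="constant")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def harris_response(img, k=0.05, radius=1):
    """R = det(M) - k * trace(M)^2 at every pixel."""
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)        # assumption: central differences, not Sobel
    Sxx = box_sum(Ix * Ix, radius)   # entries of M, summed over the window
    Syy = box_sum(Iy * Iy, radius)
    Sxy = box_sum(Ix * Iy, radius)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

# Synthetic image: a bright square whose corner sits at row 10, column 10.
img = np.zeros((20, 20))
img[10:, 10:] = 1.0
R = harris_response(img)
```

On this image `R[10, 10]` (the corner) is positive, `R[15, 10]` (on a straight edge) is negative, and `R[3, 3]` (flat region) is zero, matching the eigenvalue cases listed above.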
SIFT Descriptors
- 128-dimensional vector per keypoint:
- Divide the 16×16 neighborhood into a 4×4 grid of sub-patches (each 4×4 pixels).
- Compute an 8-bin orientation histogram per sub-patch, giving 4 × 4 × 8 = 128 values.
- Matching: Use NNDR (Nearest Neighbor Distance Ratio) and accept a match only when the ratio is below a threshold (0.8 here):
\(\text{NNDR} = \frac{\text{distance to 1st NN}}{\text{distance to 2nd NN}} \leq 0.8\)
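A minimal sketch of the ratio test, assuming Euclidean distance between descriptors and a brute-force search. The function name and the tiny 2D "descriptors" are made up for illustration (real SIFT descriptors are 128-dimensional).

```python
import numpy as np

def nndr_match(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where descriptor i in A matches j in B under the ratio test.

    Requires at least two descriptors in B.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        order = np.argsort(row)
        first, second = row[order[0]], row[order[1]]
        if first <= ratio * second:  # NNDR = first / second <= ratio
            matches.append((i, int(order[0])))
    return matches

A = [[0.1, 0.0], [1.0, 1.0], [9.5, 0.0]]
B = [[0.0, 0.0], [2.0, 2.0], [10.0, 0.0]]
matches = nndr_match(A, B)
print(matches)  # [(0, 0), (2, 2)]
```

The middle descriptor in `A` is equally far from two candidates in `B`, so its ratio is 1 and the match is rejected as ambiguous.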
3. Optical Flow & Motion Estimation
Lucas-Kanade Method
- Assumptions:
- Brightness constancy.
- Small motion between frames.
- Spatial coherence (local pixels move similarly).
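- Equation (one per pixel, with unknown flow $(u, v)$):
\(\begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -I_t\)
- One equation cannot determine two unknowns, so the equations from all pixels in a local window are stacked and solved via least squares (normal equations).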
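The least-squares step for a single window can be sketched as follows. The derivative arrays here are synthetic and noise-free, generated from a known flow so the result can be checked. The function name is hypothetical.

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Least-squares flow (u, v) for one window from its derivative samples."""
    A = np.stack([np.ravel(Ix), np.ravel(Iy)], axis=1)  # one row [Ix, Iy] per pixel
    b = -np.ravel(It)
    # Normal equations: (A^T A) [u, v]^T = A^T b
    return np.linalg.solve(A.T @ A, A.T @ b)

rng = np.random.default_rng(0)
Ix = rng.normal(size=(5, 5))  # synthetic spatial derivatives for a 5x5 window
Iy = rng.normal(size=(5, 5))
true_flow = np.array([0.5, -0.25])
It = -(Ix * true_flow[0] + Iy * true_flow[1])  # consistent with brightness constancy
flow = lucas_kanade_window(Ix, Iy, It)
```

Here `flow` recovers `true_flow` up to floating-point error. If every gradient in the window points in the same direction, $A^\top A$ is singular and `np.linalg.solve` raises an error.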
Aperture Problem
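- Ambiguity in motion direction when only edge information is available: within a small window on a straight edge, only the motion component along the gradient (normal to the edge) can be measured.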
4. RANSAC & Model Fitting
RANSAC Algorithm
- Randomly sample minimal points (e.g., 4 for homography).
- Fit model (e.g., line, homography).
- Count inliers (points within threshold $t$ of the model).
- Repeat for $N$ iterations, keeping the model with the most inliers.
- Refit the best model using all of its inliers.
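- Threshold $t$ (accepts 95% of inliers, assuming Gaussian noise with standard deviation $\sigma$ and a one-dimensional error such as point-to-line distance):
\(t^2 = 3.84 \sigma^2\)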
- Number of iterations $N$:
\(N = \frac{\log(1 - p)}{\log(1 - (1 - e)^s)}\)
- $p$: Desired success probability (e.g., 0.99).
- $e$: Outlier ratio.
- $s$: Size of the minimal sample (e.g., 4 for homography).
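The procedure and the two formulas can be sketched for the line-fitting case, where the minimal sample has $s = 2$ points. This is an illustrative version only: the function names, the total-least-squares line fit, the synthetic data, and the conservative guess $e = 0.5$ are all assumptions of the example.

```python
import math
import numpy as np

def ransac_iterations(p, e, s):
    """N = log(1 - p) / log(1 - (1 - e)^s), rounded up."""
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - e) ** s))

def fit_line(points):
    """Total least squares line (a, b, c) with a*x + b*y + c = 0, a^2 + b^2 = 1."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]  # direction of least variance
    return np.array([normal[0], normal[1], -normal @ centroid])

def ransac_line(points, sigma, p=0.99, e=0.5, seed=0):
    rng = np.random.default_rng(seed)
    t = math.sqrt(3.84) * sigma  # t^2 = 3.84 sigma^2
    best = np.zeros(len(points), dtype=bool)
    for _ in range(ransac_iterations(p, e, s=2)):
        sample = points[rng.choice(len(points), size=2, replace=False)]
        line = fit_line(sample)
        inliers = np.abs(points @ line[:2] + line[2]) <= t
        if inliers.sum() > best.sum():
            best = inliers  # keep the model with the most inliers
    return fit_line(points[best]), best  # refit on all inliers

# Synthetic data: 30 points on y = 2x + 1 plus 10 far-away outliers.
data_rng = np.random.default_rng(1)
xs = np.linspace(0.0, 10.0, 30)
inlier_pts = np.column_stack([xs, 2.0 * xs + 1.0])
outlier_pts = np.column_stack([data_rng.uniform(0, 10, 10), data_rng.uniform(30, 60, 10)])
points = np.vstack([inlier_pts, outlier_pts])
line, mask = ransac_line(points, sigma=0.1)

print(ransac_iterations(0.99, 0.5, 4))  # 72 (homography, 50% outliers)
```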
5. Hough Transform
Line Detection
- Edge detection (e.g., Canny).
- Vote in $(\theta, \rho)$ space, where a line is parameterized as $\rho = x \cos\theta + y \sin\theta$:
- Each edge point votes for all lines passing through it.
- Peaks in Hough space = detected lines.
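A minimal sketch of the voting step, assuming the normal-form parameterization $\rho = x \cos\theta + y \sin\theta$, 1° steps in $\theta$, and 1-pixel bins in $\rho$. The edge points are given directly rather than coming from an edge detector, and the function name and accumulator size are arbitrary.

```python
import numpy as np

def hough_lines(edge_points, n_theta=180, rho_res=1.0, rho_max=100.0):
    """Accumulate votes in (theta, rho) space for a list of (x, y) edge points."""
    thetas = np.deg2rad(np.arange(n_theta))  # 0, 1, ..., 179 degrees
    n_rho = int(2 * rho_max / rho_res) + 1   # bins covering [-rho_max, rho_max]
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in edge_points:
        # One rho per theta: every line through (x, y) receives a vote.
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rhos + rho_max) / rho_res).astype(int)
        acc[np.arange(n_theta), idx] += 1
    return acc, thetas

points = [(x, 20) for x in range(50)]  # edge points on the line y = 20
acc, thetas = hough_lines(points)
t_idx, r_idx = np.unravel_index(np.argmax(acc), acc.shape)
theta_deg = np.rad2deg(thetas[t_idx])
rho = r_idx * 1.0 - 100.0
```

The accumulator peak lands at $\theta = 90°$, $\rho = 20$, with all 50 points voting for it.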
6. Triangulation & 3D Reconstruction
Triangulation Equations
Given two camera matrices $P_1$, $P_2$ and corresponding points $x_1$, $x_2$:
\(x_1 \times (P_1 X) = 0, \quad x_2 \times (P_2 X) = 0\)
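- Stacking these equations gives a homogeneous linear system $AX = 0$, solved via SVD ($X$ is the right singular vector with the smallest singular value).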
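The SVD solution can be sketched as below. Each cross-product constraint contributes two independent rows to a matrix $A$ with $AX = 0$. The camera matrices and the 3D point are made up for the example, and the image points are taken to be pixel coordinates $(u, v)$.

```python
import numpy as np

def project(P, X):
    """Project a homogeneous 3D point X with camera matrix P to (u, v)."""
    x = P @ X
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Linear triangulation: solve A X = 0 via SVD and return the 3D point."""
    A = np.array([
        x1[0] * P1[2] - P1[0],  # u1 * p3^T - p1^T
        x1[1] * P1[2] - P1[1],  # v1 * p3^T - p2^T
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]  # right singular vector of the smallest singular value
    return X[:3] / X[3]

# Hypothetical setup: two identity-intrinsics cameras, the second shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0, 1.0])
x1, x2 = project(P1, X_true), project(P2, X_true)
X_est = triangulate(P1, P2, x1, x2)
```

With noise-free correspondences, `X_est` matches the first three coordinates of `X_true` up to floating-point error.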
Bundle Adjustment
- Non-linear optimization (typically minimizing the total reprojection error) to jointly refine:
- Structure (3D points $X$).
- Motion (camera matrices $P_i$).
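The notes do not spell out the objective. A common choice is the reprojection error, the difference between observed image points and the projections of the current 3D points through the current cameras. The sketch below only computes those residuals, under that assumption. The function name and data layout are hypothetical, and the minimization itself (for example with a non-linear least-squares solver) is left out.

```python
import numpy as np

def reprojection_residuals(cameras, points3d, observations):
    """Stacked residuals (projected - observed) for all observations.

    cameras:      list of 3x4 matrices P_i
    points3d:     array of shape (n, 3) holding the points X_j
    observations: list of (i, j, u, v): point j seen in camera i at pixel (u, v)
    """
    residuals = []
    for i, j, u, v in observations:
        x = cameras[i] @ np.append(points3d[j], 1.0)
        residuals.extend([x[0] / x[2] - u, x[1] / x[2] - v])
    return np.array(residuals)

# Made-up example: two cameras observing one point exactly.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = np.array([[0.5, 0.2, 4.0]])
obs = [(0, 0, 0.125, 0.05), (1, 0, -0.125, 0.05)]

exact = reprojection_residuals([P1, P2], X, obs)
perturbed = reprojection_residuals([P1, P2], X + 0.1, obs)
cost = float(np.sum(perturbed ** 2))  # the quantity an optimizer would reduce
```

The residuals are zero for the exact point and non-zero once the point is perturbed. Bundle adjustment searches over all $X$ and $P_i$ at once to drive this cost down.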
7. Deep Learning for Vision
CNN Basics
- Pooling: Reduces the spatial size of feature maps by summarizing local neighborhoods (max/average pooling).
- Softmax + Cross-Entropy Loss (with one-hot target $t_i$ and softmax probability $p_i$ for class $i$):
\(\mathcal{L} = -\sum t_i \log(p_i)\)
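A small numeric sketch of the loss, assuming a one-hot target and raw scores (logits) as input. Subtracting the maximum logit and adding a small `eps` inside the log are standard numerical safeguards, not part of the formula. The logits are arbitrary.

```python
import numpy as np

def softmax(z):
    """Convert logits to probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def cross_entropy(t, p, eps=1e-12):
    """L = -sum_i t_i * log(p_i)."""
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(-np.sum(t * np.log(p + eps)))

logits = [2.0, 1.0, 0.1]
p = softmax(logits)
loss = cross_entropy([1.0, 0.0, 0.0], p)  # true class is index 0
```

With a one-hot target the sum collapses to $-\log p_c$ for the true class $c$, so the loss shrinks toward 0 as the predicted probability of the true class approaches 1.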
Region Proposal Networks (RPN)
- Proposes candidate bounding boxes (region proposals) that a detection network then classifies and refines.