Geometric Transforms: Affine, Homography, Projective in CV

Geometric Transformations in Computer Vision: Affine, Homography, and Projective Transforms

Geometric transformations are a cornerstone of computer vision, enabling fundamental operations such as image warping, alignment, object tracking, and panorama stitching. This guide delves into three of the most widely used transformations: Affine, Homography, and Projective transforms, explaining their characteristics, use cases, mathematical foundations, and practical implementation in Python using OpenCV.

What is a Geometric Transform in Computer Vision?

A geometric transform is a mathematical operation that alters the spatial position of pixels within an image. These transformations can be categorized into various types, including:

  • Translation: Shifting an image horizontally or vertically.
  • Rotation: Turning an image around a central point.
  • Scaling: Resizing an image, making it larger or smaller.
  • Shearing: Skewing an image along an axis.
  • Perspective Distortion: Simulating the effect of looking at an object from different angles, often seen in photography.
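As a rough illustration of the list above, every elementary transform except perspective distortion can be written as a 3x3 matrix acting on a point $(x, y, 1)$. The helper names below (`translation`, `rotation`, `scaling`, `shear_x`, `apply`) are made up for this sketch and are not OpenCV functions; rotation is taken about the origin to keep things short.

```python
import numpy as np

def translation(tx, ty):
    # Shift by (tx, ty)
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta_deg):
    # Rotate about the origin by theta_deg (counterclockwise in standard math axes)
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])

def scaling(sx, sy):
    # Scale x by sx and y by sy
    return np.diag([sx, sy, 1.0])

def shear_x(k):
    # Shift x in proportion to y
    return np.array([[1.0, k, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

def apply(M, point):
    # Lift the point to (x, y, 1), multiply, then divide by the last component
    x, y, w = M @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])

# Matrices compose by multiplication; the rightmost one is applied first.
combined = translation(2, 3) @ rotation(90)

print(apply(translation(2, 3), (1.0, 0.0)))
print(apply(rotation(90), (1.0, 0.0)))
print(apply(combined, (1.0, 0.0)))
```

Because image coordinates have the y axis pointing down, a positive angle in this convention appears clockwise on screen.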

1. Affine Transformation

What is an Affine Transform?

An Affine Transformation is a linear mapping followed by a translation. It preserves certain geometric properties:

  • Ratios along a line: The midpoint of a segment remains the midpoint after the transformation.
  • Straight Lines: Straight lines in the original image remain straight lines after the transformation.
  • Parallelism: Lines that are parallel in the original image remain parallel after the transformation.

However, affine transformations do not, in general, preserve:

  • Angles: The angles between lines may change.
  • Lengths: The distances between points may change.
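These properties can be checked numerically. The sketch below uses an arbitrary example matrix `A` and offset `t` (values chosen only for illustration) and tests whether two parallel segments stay parallel, whether a length changes, and whether a right angle survives.

```python
import numpy as np

# Arbitrary affine map p -> A @ p + t (illustrative values)
A = np.array([[1.0, 0.5],
              [0.2, 1.5]])
t = np.array([10.0, -5.0])

def affine(p):
    return A @ np.asarray(p, dtype=float) + t

def cross2(u, v):
    # 2D cross product; zero when u and v are parallel
    return u[0] * v[1] - u[1] * v[0]

# Two parallel segments, both with direction (2, 1)
p0, p1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
q0, q1 = np.array([0.0, 3.0]), np.array([2.0, 4.0])

d1 = affine(p1) - affine(p0)
d2 = affine(q1) - affine(q0)

parallel_after = np.isclose(cross2(d1, d2), 0.0)
length_before = np.linalg.norm(p1 - p0)
length_after = np.linalg.norm(d1)

# The x and y axes are perpendicular before the transform (dot product 0)
ex = affine([1.0, 0.0]) - affine([0.0, 0.0])
ey = affine([0.0, 1.0]) - affine([0.0, 0.0])
dot_after = float(ex @ ey)

print("parallel after:", parallel_after)
print("length before/after:", length_before, length_after)
print("dot product of transformed axes:", dot_after)
```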

Affine Transformation Matrix

The general form of an affine transformation in 2D can be represented as:

$$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix} $$

Where $(x, y)$ are the original coordinates and $(x', y')$ are the transformed coordinates. The matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ handles rotation, scaling, and shearing, while $\begin{bmatrix} e \\ f \end{bmatrix}$ handles translation.

By appending a 1 to the input point (homogeneous coordinates), the linear part and the translation combine into a single 2x3 matrix:

$$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b & e \\ c & d & f \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} $$
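A small sketch of how a 2x3 affine matrix can be applied to a batch of points by appending a 1 to each coordinate pair. The matrix `M` and the points are illustrative values, not taken from any particular image.

```python
import numpy as np

# 2x3 affine matrix: scale by 2, then translate by (10, 20)
M = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 20.0]])

points = np.array([[0.0, 0.0],
                   [1.0, 1.0],
                   [5.0, -2.0]])

# Append a column of ones so each row becomes (x, y, 1)
points_h = np.hstack([points, np.ones((points.shape[0], 1))])

# Each output row is M @ (x, y, 1)
transformed = points_h @ M.T

# Equivalent form: linear part plus translation
linear, offset = M[:, :2], M[:, 2]
same = points @ linear.T + offset

print(transformed)
```

Both forms give the same coordinates; the homogeneous version is simply more convenient when several transforms are chained.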

In OpenCV, an affine transformation is represented by a 2x3 matrix, commonly obtained using cv2.getAffineTransform().
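Three point pairs are enough because each pair contributes two equations and the matrix has six unknowns. The sketch below solves that linear system directly with NumPy; `affine_from_three_points` is a hypothetical helper written for illustration, not OpenCV's implementation, and it assumes the three source points are not collinear.

```python
import numpy as np

def affine_from_three_points(src, dst):
    """Solve for the 2x3 matrix M such that M @ (x, y, 1) = (x', y') for three pairs."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    # Rows of S are (x, y, 1); solve S @ X = dst, where X is the transpose of M
    S = np.hstack([src, np.ones((3, 1))])
    X = np.linalg.solve(S, dst)  # raises LinAlgError if the points are collinear
    return X.T

pts1 = [[50, 50], [200, 50], [50, 200]]
pts2 = [[10, 100], [200, 50], [100, 250]]

M = affine_from_three_points(pts1, pts2)

# Verify that each source point maps to its destination point
src_h = np.hstack([np.asarray(pts1, dtype=np.float64), np.ones((3, 1))])
mapped = src_h @ M.T
print(M)
print(np.allclose(mapped, pts2))
```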

Python Code: Affine Transformation using OpenCV

import cv2
import numpy as np

# Load an image
image = cv2.imread('input.jpg')
if image is None:
    raise SystemExit("Error: could not load 'input.jpg'.")

rows, cols = image.shape[:2]

# Define three corresponding points in the source and destination triangles
# These points define the transformation
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[10, 100], [200, 50], [100, 250]])

# Get the affine transformation matrix
M = cv2.getAffineTransform(pts1, pts2)

# Apply the transformation to the image
# The output image will have the same dimensions as the input
affine_result = cv2.warpAffine(image, M, (cols, rows))

# Display the results
cv2.imshow('Original Image', image)
cv2.imshow('Affine Transform', affine_result)
cv2.waitKey(0)
cv2.destroyAllWindows()

2. Homography (Projective Transformation)

What is a Homography?

A Homography is a more general transformation that maps points from one planar surface to another. It can achieve all the effects of an affine transformation and more, including:

  • Rotation
  • Translation
  • Scaling
  • Shearing
  • Perspective Distortion

A Homography is represented by a 3x3 matrix that is defined only up to a scale factor, so it has eight degrees of freedom. This allows it to model transformations that cannot be represented by affine transformations, such as the mapping of a rectangle to a trapezoid.

Homography Matrix Formula

The homography transformation is typically expressed using homogeneous coordinates:

$$ \begin{bmatrix} \tilde{x} \\ \tilde{y} \\ w \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} $$

After the matrix multiplication, the new 2D coordinates $(x', y')$ are obtained by dividing the first two components by the third ($w$):

$$ x' = \frac{\tilde{x}}{w}, \qquad y' = \frac{\tilde{y}}{w} $$

Here $(x, y, 1)$ is the input point in homogeneous form and $(\tilde{x}, \tilde{y}, w)$ is the homogeneous result, which represents the same image point as $(x', y', 1)$. Because of this division, multiplying the whole matrix by any non-zero constant leaves the mapping unchanged.
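The division by the third component is what produces perspective effects. As a minimal sketch, the hand-picked matrix `H` below (an illustrative value, not estimated from data) has a non-zero entry in its last row and turns a square into a trapezoid; `apply_homography` is a hypothetical helper.

```python
import numpy as np

# Identity plus a small perspective term in the last row (illustrative)
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.001, 0.0, 1.0]])

def apply_homography(H, points):
    pts = np.asarray(points, dtype=np.float64)
    # Lift to homogeneous coordinates (x, y, 1)
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = pts_h @ H.T
    # Divide the first two components by the third
    return mapped[:, :2] / mapped[:, 2:3]

square = [[0, 0], [100, 0], [100, 100], [0, 100]]
warped = apply_homography(H, square)

# Scaling the matrix does not change the result
warped_scaled = apply_homography(5.0 * H, square)

print(warped)
print(np.allclose(warped, warped_scaled))
```

In this example the top and bottom edges of the square are no longer parallel after the mapping, which an affine transform could not produce.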

The homography matrix can be computed with cv2.findHomography(), which accepts four or more correspondences and can reject outliers when a robust method such as cv2.RANSAC is selected, or with cv2.getPerspectiveTransform(), which requires exactly four point pairs.
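To give a feel for what such an estimation involves, here is a sketch of the direct linear transform (DLT), a standard textbook way to recover a homography from point pairs. It is not claimed to be what OpenCV does internally: it skips coordinate normalization and outlier handling, and `homography_dlt` is a hypothetical name.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate a 3x3 homography from four or more point pairs (unnormalized DLT)."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each pair gives two linear equations in the nine matrix entries
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=np.float64)
    # The solution is the right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the free scale; assumes H[2, 2] is not (near) zero
    return H / H[2, 2]

pts_src = [[100, 100], [400, 100], [100, 400], [400, 400]]
pts_dst = [[120, 80], [380, 120], [150, 420], [420, 400]]

H = homography_dlt(pts_src, pts_dst)

# Check that the estimated matrix reproduces the destination points
src_h = np.hstack([np.asarray(pts_src, dtype=np.float64), np.ones((4, 1))])
proj = src_h @ H.T
reprojected = proj[:, :2] / proj[:, 2:3]
print(np.allclose(reprojected, pts_dst, atol=1e-6))
```

With exactly four pairs the system has a unique solution up to scale; with more pairs the same code returns a least-squares fit.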

Python Code: Homography with OpenCV

import cv2
import numpy as np

# Load an image
image = cv2.imread('input.jpg')
if image is None:
    raise SystemExit("Error: could not load 'input.jpg'.")

rows, cols = image.shape[:2]

# Define four corresponding points in the source and destination planes
# These points are crucial for defining the perspective transformation
# Example: mapping a rectangular area to a distorted quadrilateral
pts_src = np.float32([[100, 100], [400, 100], [100, 400], [400, 400]])
pts_dst = np.float32([[120, 80], [380, 120], [150, 420], [420, 400]])

# Estimate the homography from the corresponding points.
# With the default method (0) every pair is used in a least-squares fit; the
# returned mask only becomes informative with a robust method such as cv2.RANSAC.
H, status = cv2.findHomography(pts_src, pts_dst)

# Apply the perspective warp to the image using the homography matrix
# The output size (cols, rows) specifies the dimensions of the transformed image
homography_result = cv2.warpPerspective(image, H, (cols, rows))

# Display the results
cv2.imshow('Original Image', image)
cv2.imshow('Homography Transform', homography_result)
cv2.waitKey(0)
cv2.destroyAllWindows()

3. Projective Transformation (Perspective Transform)

What is a Projective Transform?

A Projective Transform (also called a perspective transform) describes the geometric relationship between two images of the same planar surface taken from different viewpoints. For 2D images it is the same 3x3 mapping as the homography described above; the name is most often used when the matrix is computed from exactly four point pairs to correct perspective distortion, such as when viewing a flat surface (like a document or a wall) from an angle. Projective transformations can map quadrilaterals to other quadrilaterals, allowing for full perspective correction.

Perspective Transform Formula

The mathematical formulation for a projective (perspective) transform is identical to that of a homography, using a 3x3 matrix in homogeneous coordinates:

$$ \begin{bmatrix} \tilde{x} \\ \tilde{y} \\ w \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} $$

The resulting 2D coordinates $(x', y')$ are obtained by dividing the first two components by the third:

$$ x' = \frac{\tilde{x}}{w}, \qquad y' = \frac{\tilde{y}}{w} $$

In OpenCV, this transformation is computed using cv2.getPerspectiveTransform() which requires exactly four corresponding points, and applied using cv2.warpPerspective().
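For perspective correction, the destination points are usually the corners of an upright rectangle. One common heuristic, sketched here as an assumption rather than a prescribed method, sizes that rectangle from the longest opposing edges of the source quadrilateral. The helper name `rectified_size_and_corners` and the corner ordering (top-left, top-right, bottom-right, bottom-left) are choices made for this example.

```python
import numpy as np

def rectified_size_and_corners(quad):
    """Return ((width, height), dst_corners) for an upright rectangle.

    quad: four corners ordered top-left, top-right, bottom-right, bottom-left.
    """
    tl, tr, br, bl = np.asarray(quad, dtype=np.float64)
    # Use the longer of each pair of opposing edges
    width = int(round(max(np.linalg.norm(tr - tl), np.linalg.norm(br - bl))))
    height = int(round(max(np.linalg.norm(bl - tl), np.linalg.norm(br - tr))))
    # Destination corners in the same order as the input corners
    dst = np.array([[0, 0],
                    [width - 1, 0],
                    [width - 1, height - 1],
                    [0, height - 1]], dtype=np.float32)
    return (width, height), dst

# Example quadrilateral (illustrative values)
quad = [[80, 120], [320, 80], [320, 300], [120, 320]]
size, dst_corners = rectified_size_and_corners(quad)
print(size)
print(dst_corners)
```

With OpenCV available, these results would typically be passed on as cv2.getPerspectiveTransform(np.float32(quad), dst_corners) followed by cv2.warpPerspective(image, matrix, size), keeping the same corner order in both point sets.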

Python Code: Perspective Transform with OpenCV

import cv2
import numpy as np

# Load an image
image = cv2.imread('input.jpg')
if image is None:
    raise SystemExit("Error: could not load 'input.jpg'.")

rows, cols = image.shape[:2]

# Define the source and destination points for the perspective transform
# These points define the corners of a rectangle in the source image
# and their corresponding positions in the transformed image.
src_pts = np.float32([[100, 100], [300, 100], [100, 300], [300, 300]])
dst_pts = np.float32([[80, 120], [320, 80], [120, 320], [320, 300]])

# Get the 3x3 perspective transformation matrix
matrix = cv2.getPerspectiveTransform(src_pts, dst_pts)

# Warp the image using the perspective transformation matrix
# The output size (cols, rows) defines the dimensions of the warped image
projective_result = cv2.warpPerspective(image, matrix, (cols, rows))

# Display the results
cv2.imshow('Original Image', image)
cv2.imshow("Perspective Transform", projective_result)
cv2.waitKey(0)
cv2.destroyAllWindows()

Affine vs. Homography vs. Projective Transform

Here's a summary of the key differences:

| Feature | Affine Transform | Homography / Projective Transform |
| --- | --- | --- |
| Matrix Size | 2x3 | 3x3 |
| Preserves Parallelism | Yes | No |
| Handles Perspective | No | Yes |
| Use Cases | Rotation, Scale, Shear, Translation | Image stitching, AR, Camera pose estimation, perspective correction |
| Minimum Points Required | 3 points | 4 points |

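Note: For 2D images, a Projective Transform and a Homography are the same kind of mapping: an invertible 3x3 matrix, defined up to scale, that maps one plane to another. In practice the two terms are used interchangeably; the OpenCV functions differ mainly in how the matrix is estimated, with cv2.getPerspectiveTransform() using exactly four point pairs and cv2.findHomography() using four or more.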
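The relationship between the two matrix sizes can be made concrete: a 2x3 affine matrix becomes a 3x3 matrix by appending the row (0, 0, 1), and a 3x3 matrix is affine exactly when its last row is proportional to (0, 0, 1). The helpers `to_3x3` and `is_affine` below are hypothetical names for this sketch.

```python
import numpy as np

def to_3x3(M_affine):
    # Append the row (0, 0, 1) to a 2x3 affine matrix
    return np.vstack([np.asarray(M_affine, dtype=np.float64), [0.0, 0.0, 1.0]])

def is_affine(H, tol=1e-9):
    # A 3x3 matrix is affine when, after normalizing by H[2, 2],
    # the first two entries of its last row are zero
    H = np.asarray(H, dtype=np.float64)
    if abs(H[2, 2]) < tol:
        return False
    H = H / H[2, 2]
    return bool(abs(H[2, 0]) < tol and abs(H[2, 1]) < tol)

affine_3x3 = to_3x3([[1.0, 0.0, 5.0],
                     [0.0, 1.0, 7.0]])
perspective = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.001, 0.0, 1.0]])

print(is_affine(affine_3x3))        # True
print(is_affine(3.0 * affine_3x3))  # True: overall scale does not matter
print(is_affine(perspective))       # False
```

A check like this can help decide whether cv2.warpAffine (with the top two rows) is sufficient or cv2.warpPerspective is needed.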

Conclusion

Understanding affine, homography, and projective transformations is crucial for a wide range of computer vision tasks, including image registration, object tracking, and panorama stitching. OpenCV provides powerful and efficient functions like cv2.getAffineTransform(), cv2.getPerspectiveTransform(), and cv2.findHomography() to easily implement these transformations, enabling sophisticated image manipulation and analysis.

Interview Questions

  • What is a geometric transformation in computer vision?
  • How does an affine transformation differ from a homography?
  • What geometric properties does an affine transformation preserve?
  • Explain the homography matrix and its role in image stitching.
  • How do you compute an affine transformation matrix in OpenCV?
  • What is the difference between homography and projective transform?
  • When would you use cv2.warpAffine versus cv2.warpPerspective?
  • How many points are required to compute an affine transform versus a homography?
  • Describe a practical application of projective transformation in computer vision.
  • Can affine transformations handle perspective distortion? Explain why or why not.