When we work with images and real-world objects, we often want to take their coordinates and project them onto a plane for simplification. Such projections are useful in areas like self-driving cars, computer vision, and many other applications. The projection works by transforming the coordinates of points from one space to another, and we call such a transformation a projective transformation.
In this blog, we will talk about this transformation. Projective transformation involves coordinates in Euclidean and homogeneous space, so we will start by understanding those terms. Later in this blog, we will implement the concept in a small Python application.
[Note that throughout this blog, I will use the terms ‘homogeneous’ and ‘projective’ interchangeably].
Understanding homogeneous coordinates in two dimensions
Imagine a point A in a two-dimensional plane whose coordinates are (x, y).
This point A exists in Euclidean space. When we project it into homogeneous space, a third coordinate is added to the vector. Hence, point A in homogeneous (or projective) space will have coordinates (x, y, w). Now you may wonder where this ‘w‘ came from and what it means. To understand this, consider an example where you are watching the infamous 2021 Formula 1 Abu Dhabi Grand Prix on a projector (to relive the memory!). When you watch something through a projector, every point of the motion picture is projected onto a projective plane, namely the screen.
Let us assume that the distance between the projector lens and the screen is 1 meter. Thus, some point on the projective plane will have coordinates (x, y, 1): the third coordinate here represents the distance between the lens and the image plane. Now what happens when you move the projector farther from the screen? The image gets bigger. In terms of a homogeneous system, you are scaling the coordinates: if you double the value of w (2 meters), the coordinates also get scaled up by a factor of two, hence the bigger picture!
The above theory can be generalized by saying that a point with n coordinates in the Euclidean system will have (n+1) coordinates in the homogeneous system.
To convert a point from homogeneous back to Euclidean space, you divide all the elements by the last element of the coordinates, i.e., w.
(x, y) → (u, v, w)   Euclidean to homogeneous
(u, v, w) → (u/w, v/w, w/w) → (x, y, 1) → (x, y)   Homogeneous to Euclidean
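As a minimal sketch of these two conversions (assuming NumPy is available; the helper names to_homogeneous and to_euclidean are my own, not a standard API), this looks as follows in Python:

import numpy as np

def to_homogeneous(point, w=1.0):
    # Append the extra coordinate w (commonly 1) to an n-dimensional Euclidean point
    return np.append(np.asarray(point, dtype=float), w)

def to_euclidean(point_h):
    # Divide all elements by the last coordinate w; w must not be zero
    point_h = np.asarray(point_h, dtype=float)
    if point_h[-1] == 0:
        raise ValueError("w must be non-zero to convert back to Euclidean coordinates")
    return point_h[:-1] / point_h[-1]

print(to_homogeneous([3, 4]))   # [3. 4. 1.]
print(to_euclidean([6, 8, 2]))  # [3. 4.]

The same two helpers work for three-dimensional points as well, which is exactly the case we discuss next.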
Understanding homogeneous coordinates in three dimensions
The same theory also holds for three-dimensional points, i.e., a point whose coordinates are (x, y, z) in Euclidean space will have coordinates (x, y, z, w) in projective space. If we increase the value of w, the three-dimensional coordinates are also scaled up, and vice versa.
Better a Unit value than a Random one
In practical applications, the value of w is generally chosen as 1. The reason is to avoid any scaling of the original coordinates, so the entity (image, matrix, coordinates, etc.) does not grow or shrink when converting from Euclidean to homogeneous space. In special cases, values higher or lower than 1 are preferred, but w must never be zero, because converting the coordinates back from homogeneous to Euclidean space would then require a division by zero.
Significance of Homogeneous Systems in Computer Vision
Now one might wonder, “Why bother working with a homogeneous system when we are all okay with our good old Euclidean space?” This question can be answered in two words: matrix transformations.
When we take a picture of a scene, we are transforming the image matrix from one system to another. This transformation consists of three operations, namely translation, rotation, and scaling. While rotation and scaling can easily be performed using matrix multiplication, translation is performed by adding a vector to the existing coordinates. Let us look at an example. Assume that we have a point \((x, y)^T\). Looking at its dimensions, it is clear that the point exists in Euclidean space. Let us have a look at the three transformations on this point:
Scaling
\(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} k_x & 0 \\ 0 & k_y \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\)
Rotation
\(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\)
Translation
\(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} a_x \\ a_y \end{bmatrix} + \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\)
As you can see, scaling and rotation are both matrix multiplications, so they can be chained on a given point by simply multiplying their matrices. Translation, however, is a vector addition; it needs an extra step and cannot be folded into the same 2×2 matrix as the other two operations, as the sketch below illustrates.
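Here is a minimal NumPy sketch of the Euclidean case (the variable names and example values are my own). Note how the translation has to be applied as a separate addition:

import numpy as np

theta = np.deg2rad(30)
S = np.array([[2.0, 0.0],
              [0.0, 2.0]])                       # scaling matrix
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation matrix
t = np.array([1.0, 0.5])                         # translation vector

p0 = np.array([3.0, 4.0])
p1 = R @ S @ p0 + t  # rotation and scaling combine into one matrix; translation does not
print(p1)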
Projective transformation helps us solve this problem. By adding an extra dimension, we can perform all the transformations by simple matrix multiplication. So the transformations now will look as follows:
Scaling
\(\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = \begin{bmatrix} k_x & 0 & 0 \\ 0 & k_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}\)
Rotation
\(\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}\)
Translation
\(\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & a_x \\ 0 & 1 & a_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}\)
Because all three transformations are now represented by matrix multiplications, they can be combined into a single matrix (by multiplying the translation, rotation, and scaling matrices together) and applied in one step in a program, making the work more efficient. Hence, working with homogeneous coordinates makes matrix transformations convenient for us, as it saves us all those extra steps. A short sketch of this composition is shown below.
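As a minimal sketch of that idea, reusing the scaling, rotation, and translation values from the earlier Euclidean example (the variable names are my own, not a library API):

import numpy as np

theta = np.deg2rad(30)
S = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])                        # scaling
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])   # rotation
T = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])                        # translation

M = T @ R @ S                      # one combined 3x3 transformation matrix
p0_h = np.array([3.0, 4.0, 1.0])   # the point (3, 4) in homogeneous coordinates
p1_h = M @ p0_h
print(p1_h[:2] / p1_h[2])          # back to Euclidean coordinates

This gives the same result as the separate multiply-then-add version above, but the whole chain now lives in a single matrix M.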
Now it’s time to put this theory into a simple application.
Python Implementation
Say we are developing a self-driving car and we want to project 3-dimensional LiDAR points onto 2-dimensional camera points. Performing this projective transformation involves converting 3D points from a LiDAR sensor’s coordinate system into 2D image points in a camera’s image plane. For this task, we assume that the camera’s intrinsic and extrinsic parameters are already known to us.
[Intrinsic parameters are properties of the camera related to its internal characteristics and how it captures images, such as the focal length, principal point, and lens distortion. Extrinsic parameters describe the camera’s position and orientation in the 3D world relative to a reference coordinate system, namely a rotation matrix and a translation vector.]
Below is a Python example that illustrates this projection using projective transformation:
import numpy as np

# Define camera intrinsic matrix (example values)
K = np.array([[1000, 0, 320],
              [0, 1000, 240],
              [0, 0, 1]])

# Define camera extrinsic parameters: rotation matrix and translation vector (example values)
R = np.array([[0.866, -0.5, 0],
              [0.5, 0.866, 0],
              [0, 0, 1]])
T = np.array([1, 0, 0])

# Define a LiDAR point in 3D coordinates (x, y, z)
lidar_point = np.array([10, 5, 2])

# Convert the LiDAR point to camera coordinates using the extrinsic parameters
lidar_point_homogeneous = np.append(lidar_point, 1)  # convert to homogeneous coordinates (x, y, z, 1)
camera_point = np.dot(np.hstack((R, T.reshape(3, 1))), lidar_point_homogeneous)

# Project camera coordinates to image coordinates using the intrinsic parameters
projected_point_homogeneous = np.dot(K, camera_point)
projected_point = projected_point_homogeneous[:2] / projected_point_homogeneous[2]

# Display the results
print("LiDAR Point (X, Y, Z):", lidar_point)
print("Point in Camera Coordinates (X, Y, Z):", camera_point)
print("Projected Point in Image Coordinates (u, v):", projected_point)
In this example, we define the camera’s intrinsic and extrinsic parameters (K, R, T), representing the camera’s calibration and position in the world. We also define a LiDAR point in 3D coordinates, convert it to camera coordinates using extrinsic parameters, and then project it onto the camera’s image plane using intrinsic parameters. The result is the 2D image coordinates (u, v) where the LiDAR point is visible in the camera’s image.
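In practice a LiDAR scan contains thousands of points, so the same projection is usually applied to a whole point cloud at once. Below is a small sketch of that batched version, reusing the K, R, and T defined above (the helper name project_points is my own, not part of any library):

def project_points(points_3d, K, R, T):
    # points_3d: (N, 3) array of LiDAR points; returns (N, 2) pixel coordinates
    points_h = np.hstack((points_3d, np.ones((points_3d.shape[0], 1))))  # (N, 4) homogeneous points
    extrinsics = np.hstack((R, T.reshape(3, 1)))                         # (3, 4) extrinsic matrix
    camera_points = points_h @ extrinsics.T                              # (N, 3) camera coordinates
    image_points_h = camera_points @ K.T                                 # (N, 3) homogeneous image coordinates
    return image_points_h[:, :2] / image_points_h[:, 2:3]

lidar_points = np.array([[10, 5, 2], [8, -3, 1], [15, 0, 4]])
print(project_points(lidar_points, K, R, T))

In a real pipeline you would also check that each point lies in front of the camera (positive depth) before performing the division.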
Apart from projective transformation, another popular term that comes up in computer vision is convolution, the principle behind many image-processing techniques and even object recognition. Have a look at the principle of convolution, which plays a vital role in image processing and machine learning.
Summary
In this blog, we understood what homogeneous coordinates are and how projective transformation helps us make things convenient and simple. Moreover, we developed a practical Python example in which we transformed LiDAR points into camera image points.
If you have read through all those mathematical equations and long text, it shows that you enjoyed this post and found it interesting. You can support me by following me on social media.
I am certain that you would like to read more such interesting posts. Subscribe to my weekly newsletter so you don’t miss out on any of them.