“Machine learning is not magic; it’s just a tool, like a hammer or a wrench. And like any tool, it has limitations, which you need to understand in order to use it effectively.”
Pedro Domingos
Measuring distances between points or vectors plays a crucial role in machine learning algorithms like k-Nearest Neighbors and k-Means. The distance tells us how similar two vectors are, which lets us compare the hypothesis produced by the model against the ground truth. Machine learning algorithms call for different types of distance metrics, depending on the problem statement.
In this blog, we will talk about three main types of distance metrics used in machine learning, along with their Python code:
- Minkowski Distance
- Hamming Distance
- Cosine Distance
1. Minkowski Distance
Minkowski Distance represents a generalized form of Manhattan and Euclidean distance metrics. The formula for Minkowski distance is given as:
\( \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}\)
By substituting the value of \(p\) in the formula above, we can measure the distance between vectors in different ways. Hence, Minkowski distance is also popularly known as the Lp Norm. This name might ring a bell if you are familiar with regularization in machine learning, where we penalize the L2 norm of the model’s weights to reduce overfitting.
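As a quick check, here is a minimal sketch in Python that implements this formula directly and compares it against scipy’s minkowski() function; the vectors and the choice p = 3 are purely illustrative:

```python
# Direct implementation of (sum |x_i - y_i|^p)^(1/p), cross-checked
# against scipy; the vectors and p = 3 are illustrative choices.
import numpy as np
from scipy.spatial.distance import minkowski

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
p = 3

manual = np.sum(np.abs(x - y) ** p) ** (1 / p)

print(manual)              # ~4.3267
print(minkowski(x, y, p))  # same value, computed by scipy
```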
Some commonly used values of p are:
- p = 1, which gives us the Manhattan Distance or the L1 Norm
- p = 2, which gives us the Euclidean Distance or the L2 Norm
- p = ∞, which gives us the Chebyshev Distance or the L∞ Norm
We will discuss the Manhattan and Euclidean distances in the next two subsections.
1.1 Manhattan Distance
If you look at the map of Manhattan, you’ll notice that the roads are perpendicular to each other and the area looks like a matrix.

Clearly, one must follow a grid-like path to travel between two points. This is illustrated in the example below:

The path from A to B is not straight and involves taking a turn as highlighted in red.
This is the idea behind calculating the Manhattan distance between two points. In a given two-dimensional space, the distance is determined not by taking the ‘direct path’ from one point to another but by moving along one axis, X or Y, at a time.

Manhattan distance is calculated by setting p = 1 in the Minkowski formula.
\( \text{Manhattan distance} = \sum_{i=1}^{n} |x_i - y_i|\)
The L1 norm finds its application in cases where the dimensionality of the data is high. In the paper “On the Surprising Behavior of Distance Metrics in High Dimensional Space” by Charu C. Aggarwal, Alexander Hinneburg, and Daniel A. Keim, it is stated that lower values of p are preferable for high-dimensional problems.
Let us go ahead and try this in Python. For this, we will consider a vector \((1, 1)^T\) and calculate its distance from the origin.
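A minimal sketch of this, using scipy’s cityblock() function (which implements the Manhattan distance), looks as follows:

```python
# Manhattan (L1) distance of the vector (1, 1) from the origin,
# i.e. |1 - 0| + |1 - 0| = 2.
import numpy as np
from scipy.spatial.distance import cityblock

vector = np.array([1, 1])
origin = np.array([0, 0])

print(cityblock(vector, origin))  # 2
```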
1.2 Euclidean Distance
Contrary to Manhattan distance, Euclidean distance provides us with the direct distance between two points. The following illustration represents the difference between Manhattan and Euclidean distance.

As mentioned earlier, setting the value of p to 2 in the Minkowski distance yields the Euclidean distance between two points:
\(\text{Euclidean distance} = \left(\sum_{i=1}^{n} |x_i - y_i|^2\right)^{1/2} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}\)
Euclidean distance also helps in determining the length of a vector. In such a case, the distance is taken between the vector and the origin. One particular application of this that I personally came across was calculating the length of an eigenvector; the length of the vector told me how much the basis vector would be scaled by the transformation.
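As with the Manhattan example, here is a minimal sketch using scipy’s euclidean() function for the same vector \((1, 1)^T\) measured from the origin:

```python
# Euclidean (L2) distance of the vector (1, 1) from the origin;
# this is also the length of the vector.
import numpy as np
from scipy.spatial.distance import euclidean

vector = np.array([1, 1])
origin = np.array([0, 0])

print(euclidean(vector, origin))  # 1.4142... (i.e., sqrt(2))
print(np.linalg.norm(vector))     # same value, the vector's length
```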
2. Hamming Distance
Hamming distance is used to compare two data strings. While it is mostly applied to binary strings, Hamming distance, in general, returns the number of positions at which the corresponding elements in two strings of equal length are different. One popular application is in coding theory, where the minimum Hamming distance between codewords is used to detect errors in transmitted data.
Let us go ahead and try one example where we will compare two strings and calculate the Hamming distance between them.
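Below is a minimal sketch; the two strings are illustrative choices of equal length:

```python
# A simple Hamming distance function for equal-length strings;
# string_1 and string_2 are illustrative examples.
def hamming_distance(s1, s2):
    if len(s1) != len(s2):
        raise ValueError("Strings must be of equal length")
    # Count the positions at which the corresponding characters differ
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

string_1 = "table"
string_2 = "horse"

print(hamming_distance(string_1, string_2))  # 4
```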
When you run the above example, you will notice that the elements in string_2 differ from those in string_1 at four different places, namely the letters h, s, r, and o. Hence the function outputs a Hamming distance of 4. Go ahead and try it yourself.
3. Cosine Distance
The procedure to find the cosine distance involves two steps: calculating the cosine similarity and then the distance itself. Cosine similarity is given by the following formula:
\(\text{cosine similarity} = \cos \theta = \frac{A \cdot B}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}\)
The cosine similarity is obtained by taking the dot product of the two vectors and dividing it by the product of their Euclidean norms (lengths). This similarity is the complement of the cosine distance in a positive space, that is:
\(\text{cosine distance} + \text{cosine similarity} = 1\)
\(\Rightarrow \text{cosine distance} = 1 - \text{cosine similarity}\)
Let us break this down with an example. Consider two vectors \(A\) and \(B\), where \(A = (1, 1)^T\) and \(B = (1, 0)^T\).
Calculating the cosine similarity between them, we get
\(\text{cosine similarity} = \cos \theta = \frac{(1, 1) \cdot (1, 0)}{\sqrt{2}\,\sqrt{1}} = \frac{1}{\sqrt{2}} \approx 0.7071\)
\(\text{cosine distance} = 1 - 0.7071 \approx 0.2929\)
The snippet below shows the Python code for this example.
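This is a minimal sketch using scipy’s cosine() function, which returns the cosine distance directly:

```python
# scipy's cosine() returns the cosine distance, i.e. 1 - cosine similarity.
import numpy as np
from scipy.spatial.distance import cosine

A = np.array([1, 1])
B = np.array([1, 0])

cosine_distance = cosine(A, B)
cosine_similarity = 1 - cosine_distance

print(cosine_similarity)  # ~0.7071
print(cosine_distance)    # ~0.2929
```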
Cosine distance is widely implemented in algorithms like k-Nearest Neighbors where the distance between the “neighbors” is calculated using the cosine distance. One popular example of this is in movie recommendations where similar movies are recommended based on the movies the user liked previously.
Summary
In this blog, we saw three main types of distances: Minkowski distance, Hamming distance, and Cosine distance. Further, we distinguished between the sub-categories of Minkowski distance, i.e., Manhattan distance and Euclidean distance. Manhattan distance is preferred for data with high dimensionality, whereas the Euclidean distance underlies the L2 norm used in regularization to avoid overfitting in machine learning models. Furthermore, Hamming distance is used in coding theory, and lastly, cosine distance is implemented in algorithms like kNN and movie recommendation systems.
If you’re interested in such exciting and interesting topics of machine learning, subscribing to a blog newsletter is a great way to stay informed and engaged. By subscribing, you’ll receive regular updates on the topics of Machine Learning, OpenCV, and Python. Learn a little every week!