When we process images, we alter or analyze the pixel values within the image. Whether the task is blurring, sharpening, edge detection, or even object recognition (heard of YOLO?), the pixels are analyzed and transformed to yield the desired result. Whatever the transformation is, one common principle plays an important role in these image-processing tasks: convolution! Take a quick look here to see the capabilities of convolution and how you can use it on images.
If you have worked with image data, you are probably familiar with the term “convolution”. By definition, convolution is a mathematical operation that combines two functions into a third: one function is reversed and shifted, multiplied by the other, and the product is integrated. Mathematically, it is described as follows:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$
Now this would be the reaction of many of you:
The formula does look complicated, and it is hard to wrap your head around the process. So let us set the formula aside for a moment and understand convolution through an example.
Understanding convolution through an example
[The example considered here is inspired by this article, so many thanks to its author].
Consider that you are working on a task in your company.
– You spend 6 hours on the task on the first day and 3 hours on the second day. So the task has the working-hour sequence [6 3].
– On the second day, you get another task. We assume that the sequence of working hours remains the same, i.e., [6 3]. So on the second day, you work 3 hours on the first task and 6 hours on the second task, for a total of 9 hours.
– On the third day, you get TWO new tasks. So now you have 6+6 hours for the 2 new tasks and 3 pending hours for the previous task, 15 hours in total. From here on, most of us will simply lose count! So there must be a simpler way to calculate this, and there is: convolution.
Let us form a row vector of the incoming tasks and call it f = [1 1 2 1 3]. This means that you get 1 task on day 1 and 1 more task on day 2. Two more tasks arrive on day 3. On day 4, there is a single task, and on the last day you are handed 3 tasks. The working pattern for a single task remains the same, i.e., [6 3]. We’ll denote it by g.
So we calculate the convolution between f and g. For that, we reverse the order of the elements of g, i.e., [6 3] becomes [3 6]. We then slide this reversed g step by step under f, multiply every pair of overlapping elements, and sum up the products. The reversal makes sure that a new task meets the 6 first and the 3 one day later. Each step gives the working hours for one day:
Day 1: 1×6 = 6 | result so far: 6
Day 2: 1×3 + 1×6 = 9 | result so far: 6 9
Day 3: 1×3 + 2×6 = 15 | result so far: 6 9 15
Day 4: 2×3 + 1×6 = 12 | result so far: 6 9 15 12
Day 5: 1×3 + 3×6 = 21 | result so far: 6 9 15 12 21
Day 6: 3×3 = 9 | result so far: 6 9 15 12 21 9
We ultimately get [6 9 15 12 21 9] as our end result. As the tasks pile up, you will be working 21 hours on day 5, with 9 more hours spilling over into a sixth day. Don’t worry, this is just a hypothetical scenario!
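The flip-and-slide procedure can be sketched in a few lines of plain Python. This is a minimal sketch rather than an optimized implementation, and the function name `convolve_1d` is my own hypothetical choice. It also shows what happens if g is slid without being reversed, which is cross-correlation rather than convolution.

```python
def convolve_1d(f, g):
    """Full 1D convolution: out[n] is the sum over k of f[k] * g[n - k]."""
    out = []
    for n in range(len(f) + len(g) - 1):
        total = 0
        for k in range(len(f)):
            j = n - k  # the minus sign is what "reverses" g
            if 0 <= j < len(g):
                total += f[k] * g[j]
        out.append(total)
    return out


f = [1, 1, 2, 1, 3]  # new tasks arriving on each day
g = [6, 3]           # hours per task: 6 on its first day, 3 on its second

hours_per_day = convolve_1d(f, g)
print(hours_per_day)  # [6, 9, 15, 12, 21, 9]

# Passing g already reversed cancels the built-in reversal, so this slides
# the original [6 3] unreversed: cross-correlation, not convolution.
unreversed = convolve_1d(f, g[::-1])
print(unreversed)  # [3, 9, 12, 15, 15, 18]
```

The first entry of the convolution is 6 because the single task from day 1 needs 6 hours on its first day; the unreversed version would put only 3 hours there. If you use NumPy, `np.convolve(f, g)` computes the same full convolution.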
What we just did is convolution in one dimension. In the process, we used g to work out how the working hours build up from the pattern of incoming tasks in f.
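For sequences like our f and g, the integral in the definition turns into a sum. As a sketch, assuming zero-based indices and treating every entry outside the vectors as zero, the discrete form and the third day of the example look like this:

```latex
(f * g)[n] = \sum_{k} f[k]\, g[n - k]

% Day 3 of the example (n = 2, counting from zero):
(f * g)[2] = f[1]\, g[1] + f[2]\, g[0] = 1 \cdot 3 + 2 \cdot 6 = 15
```

This matches the count we made by hand for the third day: 6+6 hours for the two new tasks plus 3 pending hours from the previous one.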
When it comes to images, the function g is called a filter. Note that the terms “filter” and “kernel” are interchangeable. We use these filters to recognize patterns in the input image (f). The process of convolution remains the same, only now in two dimensions: the filter slides over the image pixels, and at every step we calculate the sum of the products of the overlapping image and filter elements. The following GIF demonstrates the convolution process between an input image and a filter:
In this way, a filter convolved with an input image responds most strongly where the local pixel values match the pattern the filter encodes, so the output shows where that pattern occurs in the image.
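The same sliding sum of products can be sketched for a 2D image in plain Python. This is only an illustration under my own assumptions: the function name `convolve_2d`, the toy 4×6 "image", and the vertical-edge kernel are hypothetical, and the output covers only positions where the filter fits entirely inside the image.

```python
def convolve_2d(image, kernel):
    """'Valid' 2D convolution on nested lists: flip the kernel, then slide it."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    # Flip the kernel vertically and horizontally (the 2D version of
    # reversing g in the one-dimensional example).
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for r in range(ih - kh + 1):
        out_row = []
        for c in range(iw - kw + 1):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * flipped[i][j]
            out_row.append(total)
        out.append(out_row)
    return out


# Toy grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# A simple kernel that reacts to horizontal changes in brightness.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

result = convolve_2d(image, kernel)
print(result)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The output is zero over the flat regions and large only where the filter straddles the dark-to-bright boundary, which is how a filter "finds" a pattern in the image.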
Popular Image Filters in Convolution
We will understand different image filters using the example of the image of Pickle Rick:
Let us have a look at the results we get when different filters are convolved with the original image:
This provides an overview of the effect different filters have on the image. If you want a hands-on exercise with image processing and convolution using Python, I’ve got you covered. You can click on the following article to directly get started with image processing using Python:
If you happen to have any issues understanding the code, feel free to DM me on Instagram: @machinelearningsite.
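As a small taste of what such filters look like as numbers, here is a sketch in plain Python. The kernels below (identity, box blur, sharpen, and a Laplacian-style edge detector) are common textbook choices and an assumption on my part; they are not necessarily the exact filters behind the results shown above. The names `convolve_2d` and `FILTERS` and the toy 5×5 image are hypothetical as well.

```python
def convolve_2d(image, kernel, divisor=1):
    """'Valid' 2D convolution on nested lists, with an optional scale divisor."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]  # flip rows and columns
    out = []
    for r in range(len(image) - kh + 1):
        out_row = []
        for c in range(len(image[0]) - kw + 1):
            total = sum(
                image[r + i][c + j] * flipped[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            out_row.append(total / divisor)
        out.append(out_row)
    return out


# Hypothetical 3x3 kernels, each paired with the divisor that normalizes it.
FILTERS = {
    "identity": ([[0, 0, 0], [0, 1, 0], [0, 0, 0]], 1),
    "box_blur": ([[1, 1, 1], [1, 1, 1], [1, 1, 1]], 9),
    "sharpen": ([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], 1),
    "edge": ([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], 1),
}

# Toy 5x5 grayscale image: flat background of 10 with one brighter pixel.
image = [[10] * 5 for _ in range(5)]
image[2][2] = 19

results = {name: convolve_2d(image, k, d) for name, (k, d) in FILTERS.items()}
for name, output in results.items():
    print(name, output)
```

On this toy image, the identity kernel returns the central pixels unchanged, the box blur spreads the bright pixel evenly over its neighbourhood, the sharpen kernel exaggerates it, and the edge kernel gives its largest response at the bright pixel.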
Conclusion
In conclusion, convolution is a fundamental concept in the field of image recognition and machine learning. By sliding a filter over the input data and summing the products of the overlapping elements, convolution extracts meaningful features and patterns. This technique is widely used in object recognition models to detect and classify objects within images.
Convolution allows for the identification of specific features in an image by utilizing filters or kernels. These filters act as templates that highlight certain characteristics such as edges, textures, or shapes. By convolving these filters with the input image, the model can capture and analyze important visual information.