Mastering Image Processing using Python: 6 Hands-On Exercises to Enhance Your Skills

You may have heard of computer vision models like YOLO, Fast R-CNN, and LeNet. These models take an image, represented as a matrix of pixel values, as their input. When training such models, we often modify the training samples with image processing operations (a practice commonly known as data augmentation) to reduce overfitting. This helps the model generalize to images it has not seen during training.

In this blog on image processing using Python, we will look at different techniques and exercises for manipulating images with Python libraries such as OpenCV and Pillow. Before we proceed, let us quickly go through the fundamentals of image data. Understanding these fundamentals is crucial for tasks like image enhancement, filtering, and feature extraction in Python.

Image Processing Fundamentals

When you take a photo with your phone or a camera, the saved digital color image is typically made of three layers (channels): Red, Green, and Blue. Some formats, such as PNG, can also store a fourth Alpha layer that encodes opacity. Since we are working with fully opaque images, we will ignore the alpha layer in this blog.

In an 8-bit image, each RGB layer holds values from 0 to 255, and combinations of these values produce the different colors in an image. Each pixel is therefore represented by a combination of red, green, and blue (RGB) values. The pixels of a black image have a value of 0 in all three layers, whereas the pixels of a white image have a value of 255 in every layer. This makes sense because black represents the absence of light, whereas combining all three colors at full intensity yields white. The table below shows how different RGB values give us different colors:

Color     RGB value
Red       (255, 0, 0)
Blue      (0, 0, 255)
Green     (0, 255, 0)
Yellow    (255, 255, 0)
Purple    (128, 0, 128)
Black     (0, 0, 0)
White     (255, 255, 255)
Gray      (128, 128, 128)
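To make this concrete, here is a minimal sketch of how such pixels look as an array. It assumes NumPy, which is installed as a dependency of opencv-python; the tiny 2x2 image is hypothetical. Note that a Pillow image converted with np.asarray() uses RGB order, while cv2.imread() returns channels in BGR order, so the same blue pixel would read [255, 0, 0] in an OpenCV array.

```python
import numpy as np

# A tiny 2x2 RGB image: the array shape is (height, width, channels)
img = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red, green
    [[0, 0, 255], [255, 255, 255]],  # blue, white
], dtype=np.uint8)                   # uint8 holds exactly the 8-bit range 0-255

print(img.shape)           # (2, 2, 3)
print(img[1, 0].tolist())  # [0, 0, 255], the blue pixel

# Solid black and white images, 100 pixels high and 200 pixels wide
black = np.zeros((100, 200, 3), dtype=np.uint8)
white = np.full((100, 200, 3), 255, dtype=np.uint8)
```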

Grayscale images, on the other hand, store a single intensity value per pixel. As you might expect, in an 8-bit grayscale image 0 represents a black pixel and 255 a white pixel. Image arrays can use various data types, such as 8-bit or 16-bit integers or floating-point numbers, depending on the image's bit depth and the precision required for the task at hand.
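A short NumPy sketch (with hypothetical pixel values) showing that a grayscale image has no channel axis, and why the data type matters: values are often rescaled to floats for model training, and 8-bit integer arithmetic silently wraps around when a result exceeds 255.

```python
import numpy as np

# A 1x3 grayscale image: black, mid-gray, white (no channel axis)
gray = np.array([[0, 128, 255]], dtype=np.uint8)
print(gray.shape)  # (1, 3)

# Rescale 8-bit values to floats in [0.0, 1.0], a common step before training
gray_float = gray.astype(np.float32) / 255.0

# uint8 arithmetic wraps around on overflow: 255 + 10 becomes 9
print((gray + np.uint8(10)).tolist())  # [[10, 138, 9]]
```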

Now that we understand how image data is represented, let us move on to image processing techniques in Python.

Image Processing in Python

Before we start with our source code, we need to install the dependencies that we will be using for image processing. There are two popular Python libraries that we need to install in our virtual environment: Pillow and OpenCV.

pip3 install Pillow
pip3 install opencv-python

Now that our dependencies are installed, we can start with the code. We will use an image of a notorious criminal who was never caught:

[Image: the original input image, input.jpg]

1. Image Resizing

Image resizing is the process of changing the dimensions of an image. When you resize an image, you alter the image matrix to change the number of pixels along its width and height. Shrinking an image discards detail, while enlarging it does not add new detail: the extra pixels are interpolated from existing ones, which can make the result look blurry or blocky. Resizing is a simple but crucial operation for preparing images for various applications, such as displaying them on a web page, bringing them to the fixed input size a model expects, or reducing the computational load of later processing.

from PIL import Image

# Open an image file
img = Image.open('input.jpg')

# Resize to exactly 200x100 pixels; the size tuple is (width, height),
# so the original aspect ratio is not preserved
img_resized = img.resize((200, 100))
img_resized.save('output.jpg')
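To see what resizing does to the image matrix, here is a minimal nearest-neighbour resize in NumPy. It is a simplified stand-in, not what Pillow does by default (Pillow's resize() uses bicubic interpolation for most image modes), and the helper name resize_nearest is hypothetical. Note the argument order: Pillow sizes are (width, height), while NumPy array shapes are (height, width, channels).

```python
import numpy as np

def resize_nearest(img, new_w, new_h):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array (hypothetical helper)."""
    h, w = img.shape[:2]
    # Map each output row/column back to the source pixel it falls on
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    # Broadcasting rows (new_h, 1) against cols (new_w,) picks a new_h x new_w grid
    return img[rows[:, None], cols]

small = resize_nearest(np.arange(16).reshape(4, 4), 2, 2)
print(small.tolist())  # [[0, 2], [8, 10]]

up = resize_nearest(np.array([[1, 2], [3, 4]]), 4, 4)
print(up.tolist())  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Upscaling the 2x2 matrix only repeats existing pixels, which illustrates why enlarging an image does not add detail.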

Output:

[Image: the resized image]

2. Image Rotation

Rotating an image changes its orientation. During this process, the image matrix is transformed by moving each pixel to a new position determined by the rotation angle. Rotations by multiples of 90 degrees simply rearrange the pixels, while other angles require interpolating new pixel values and filling the corners exposed by the rotation. This technique is commonly used to correct image alignment issues or to create artistic effects.

from PIL import Image

# Open an image file
img = Image.open('input.jpg')

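# Rotate the image 90 degrees counter-clockwise; expand=True enlarges the
# output canvas so a non-square image is not clipped at the corners
img_rotated = img.rotate(90, expand=True)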
img_rotated.save('output.jpg')
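For rotations by multiples of 90 degrees, the effect on the image matrix is just a rearrangement of pixels. Here is a minimal NumPy sketch with a hypothetical 2x3 matrix: np.rot90 rotates counter-clockwise by default (the same direction as Pillow's rotate()), and k=-1 rotates clockwise. On an (H, W, 3) color array it rotates the first two axes, so the height and width swap.

```python
import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6]])  # a tiny 2x3 "image" (height 2, width 3)

ccw = np.rot90(img)       # 90 degrees counter-clockwise
cw = np.rot90(img, k=-1)  # 90 degrees clockwise

print(ccw.tolist())  # [[3, 6], [2, 5], [1, 4]]
print(cw.tolist())   # [[4, 1], [5, 2], [6, 3]]
```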

Output:

[Image: the rotated image]

3. Image Grayscale Conversion

Converting an image to grayscale reduces it to a single channel of intensity values. This simplifies the image data and reduces the amount of data to store (and usually the file size). A grayscale image keeps only the luminance information of the original, which makes further processing and analysis of the image content simpler.

from PIL import Image

# Open an image file
img = Image.open('input.jpg')

# Convert the image to grayscale
img_gray = img.convert('L')
img_gray.save('output.jpg')
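Pillow's documentation states that convert('L') uses the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below applies those weights to a NumPy array; the helper name to_gray is hypothetical, and because Pillow works in integer arithmetic its results may differ from this floating-point version by a rounding step. If the array came from cv2.imread(), reverse the weights, since the channels are in BGR order.

```python
import numpy as np

def to_gray(rgb):
    """Convert an (H, W, 3) uint8 RGB array to (H, W) luma with ITU-R 601-2 weights."""
    weights = np.array([0.299, 0.587, 0.114])
    luma = rgb.astype(np.float64) @ weights  # weighted sum over the channel axis
    return np.clip(np.round(luma), 0, 255).astype(np.uint8)

# One row of pure red, green, blue and white pixels
rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
print(to_gray(rgb).tolist())  # [[76, 150, 29, 255]]
```

Green gets the largest weight because the human eye is most sensitive to it, so a pure green pixel looks much brighter in grayscale than a pure blue one.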

Output:

[Image: the grayscale image]

4. Image Cropping

Cropping an image involves selecting a specific region of interest (ROI) and discarding the rest. During cropping, the image matrix is altered to exclude the pixel data that falls outside the specified cropping region. This technique is useful for focusing on a particular part of the image, and it directly affects the content and composition of the image. However, it is important to note that cropping leads to loss of information and may affect the performance of a machine learning model.

from PIL import Image

# Open an image file
img = Image.open('input.jpg')

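# Define the cropping region as (left, upper, right, lower) in pixels;
# right and lower are exclusive, so this box is 400x300 pixels.
# input.jpg is assumed to be at least 600x600 pixels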
box = (200, 300, 600, 600)
img_cropped = img.crop(box)
img_cropped.save('output.jpg')
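With a NumPy array, cropping is just slicing. This sketch (the crop helper and the 1000x800 array are hypothetical) maps Pillow's (left, upper, right, lower) box onto NumPy's row-first indexing. Slicing returns a view that shares memory with the original array, so call .copy() on the result if you plan to modify it independently.

```python
import numpy as np

def crop(img, box):
    """Crop an (H, W) or (H, W, C) array with a Pillow-style (left, upper, right, lower) box."""
    left, upper, right, lower = box
    # NumPy indexes rows (y) first, then columns (x); end indices are exclusive
    return img[upper:lower, left:right]

img = np.zeros((800, 1000, 3), dtype=np.uint8)  # hypothetical 1000x800 image
patch = crop(img, (200, 300, 600, 600))
print(patch.shape)  # (300, 400, 3)
```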

Output:

[Image: the cropped region]

5. Image Blurring

Image filtering techniques such as blurring alter the image matrix by replacing each pixel with a weighted average of the pixel values within a surrounding kernel region; a Gaussian blur gives nearby pixels more weight than distant ones. This produces a smoother appearance and reduces noise and fine detail. The size of the kernel (and, for a Gaussian blur, its standard deviation) determines how strong the blurring effect is. Blurring is often used to suppress noise before steps such as thresholding or edge detection.

import cv2

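# Read an image (OpenCV loads color images in BGR channel order)
img = cv2.imread('input.jpg')
if img is None:
    raise FileNotFoundError('Could not read input.jpg')

# Apply a Gaussian blur with a 51x51 kernel (both sizes must be positive and odd);
# sigmaX=0 tells OpenCV to derive the standard deviation from the kernel size
blurred_img = cv2.GaussianBlur(img, (51, 51), 0)
cv2.imwrite('output.jpg', blurred_img)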
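To illustrate the weighted averaging described above, here is a minimal NumPy sketch of a Gaussian blur on a 2-D grayscale array. It uses the default sigma formula documented for OpenCV's getGaussianKernel (sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8, which gives sigma = 8.0 for a 51x51 kernel) and mirrors pixels at the border, similar to OpenCV's default border mode. It is not OpenCV's implementation, which uses optimized code, so values may differ slightly; the function names are hypothetical, and a color image would be blurred one channel at a time.

```python
import numpy as np

def gaussian_kernel_1d(ksize, sigma=0.0):
    """Normalized 1-D Gaussian weights; ksize must be positive and odd."""
    if ksize <= 0 or ksize % 2 == 0:
        raise ValueError("ksize must be positive and odd")
    if sigma <= 0:
        # Default documented for OpenCV's getGaussianKernel when sigma is not positive
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    x = np.arange(ksize) - (ksize - 1) / 2  # offsets from the kernel centre
    weights = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return weights / weights.sum()          # weights sum to 1, so brightness is kept

def gaussian_blur(gray, ksize, sigma=0.0):
    """Blur a 2-D array with a separable Gaussian: rows first, then columns."""
    k = gaussian_kernel_1d(ksize, sigma)
    pad = ksize // 2
    h, w = gray.shape
    img = gray.astype(np.float64)
    # Horizontal pass; 'reflect' mirrors pixels across the border without repeating the edge
    padded = np.pad(img, ((0, 0), (pad, pad)), mode="reflect")
    tmp = sum(k[i] * padded[:, i:i + w] for i in range(ksize))
    # Vertical pass on the horizontally blurred result
    padded = np.pad(tmp, ((pad, pad), (0, 0)), mode="reflect")
    return sum(k[i] * padded[i:i + h, :] for i in range(ksize))

impulse = np.zeros((7, 7))
impulse[3, 3] = 1.0             # a single bright pixel
out = gaussian_blur(impulse, 3)  # spreads into a small, centre-weighted blob
```

Because the 2-D Gaussian is separable, blurring the rows and then the columns with a 1-D kernel gives the same result as a full 2-D kernel at a fraction of the cost.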

Output:

[Image: the blurred image]

6. Image Thresholding

Thresholding is a fundamental image processing technique used to separate objects or features from the background of an image. It creates a binary image in which each pixel is classified as either foreground or background based on its intensity value. When you apply thresholding, pixels at or below the threshold become one value (typically 0, black), and pixels above it become another (typically 255, white). This can highlight specific objects or regions of interest within the image, making thresholding a basic step in tasks like image segmentation.

import cv2

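# Read the image as a single-channel grayscale array
img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError('Could not read input.jpg')

# Pixels brighter than 127 become 255 (white); all others become 0 (black).
# threshold() returns the threshold it used plus the result, so we ignore the first value
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite('output.jpg', thresh)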
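OpenCV documents cv2.THRESH_BINARY as setting a pixel to maxval if its value is greater than thresh, and to 0 otherwise. A NumPy equivalent, sketched with a hypothetical helper name, makes the comparison explicit; note that a pixel exactly equal to the threshold (127 here) becomes 0.

```python
import numpy as np

def binary_threshold(gray, thresh, maxval):
    """Mimic cv2.THRESH_BINARY: maxval where gray > thresh, otherwise 0."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

gray = np.array([[0, 127, 128, 255]], dtype=np.uint8)
print(binary_threshold(gray, 127, 255).tolist())  # [[0, 0, 255, 255]]
```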

Output:

[Image: the thresholded binary image]

Conclusion

In this blog, we implemented different image-processing techniques using Pillow and OpenCV. These are just a few fundamental image manipulation techniques in Python. You can explore more advanced techniques such as edge detection, image segmentation, and color manipulation depending on your specific requirements and the libraries you choose to work with. If you want me to also cover these advanced image-processing techniques, leave a comment and I’ll consider making a post on it.

I encourage you to try at least two of the above techniques on an image of your choice, experimenting with different parameter values such as the cropping box, the blur kernel size, or the threshold. Upload the results on Instagram and tag me @machinelearningsite. Looking forward to seeing your image results 🙂
