Style Transfer in Python: 6 Brilliant Steps to Turn Code into Art with PyTorch

You were probably supposed to be working on that feature ticket you promised your PM three sprints ago. But here you are, elbow-deep in convolutional layers, trying to make your dog look like a Monet painting. Bravo.

In this project, you’ll take two images — this “content” image (of a donkey on a horse) and this “style” image (Van Gogh’s Starry Night) — and you’ll smash them together until the result looks something like this:

Content + Style = Output that might trick your mom into thinking you’re an artist.

Whether you’re doing this in the name of “machine learning experimentation” or as a last-ditch effort to convince your startup that you’re still innovating, neural style transfer is your next rabbit hole. It’s fun, flashy, and only semi-practical — like Web3, but with prettier outputs and fewer crypto bros.

In this post, we’ll dive into the psychedelic swamp of neural style transfer using PyTorch, the deep learning framework of choice for people who think TensorFlow is “just too enterprise.” You’ll learn what style transfer is, why it works (spoiler: math), and how to implement it without melting your laptop. Yes, there will be code. Yes, it will probably break. No, I’m not your debugger.

What Is Style Transfer, and Why Is It Haunting Your GitHub History?

Neural style transfer is the process of blending two images together: one for content (usually your selfie, your cat, or your cat taking a selfie), and another for style (think Van Gogh, Picasso, or that one AI-generated abomination that looks like an oil painting of a printer).

The result? A new image that keeps the content of the original but reimagines it with the style of the second. Like if you wrote a love letter, but Shakespeare rewrote it in his usual iambic mess.

Under the Hood: Not Just Copy-Paste with a Paint Filter

Let’s lift the hood (gently) and see what’s going on:

  • Content representation is captured by passing your base image through a pre-trained convolutional neural network (usually VGG19, because it’s old, reliable, and doesn’t ask questions).
  • Style representation is based on the Gram matrix of features from the style image. Gram matrices are like wine tasting notes for CNN layers — they capture the “flavor” of the texture and color, not the raw details.
  • You then optimize a new image (usually initialized as a copy of the content image) by minimizing a loss function that combines:
    • Content loss (how close your generated image is to the content image)
    • Style loss (how close your generated image is to the style image)

This is done using backpropagation. Not to train a model — but to iteratively change the pixels of the image itself. It’s like gradient descent, but make it fashion.
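
Before the full code, here is a minimal, self-contained sketch of that idea: the "training step" nudges pixels, not weights. Everything in it is a stand-in (random tensors instead of real VGG19 features), just to show the shape of the loss we'll be minimizing.

import torch

# Minimal sketch (not the full pipeline): optimize the pixels of `generated` so a
# weighted sum of two MSE terms shrinks. `content_target` and `style_gram_target`
# are stand-ins for the VGG19 features / Gram matrices we extract further below.
generated = torch.rand(1, 3, 64, 64, requires_grad=True)   # the "image" being trained
content_target = torch.rand(1, 3, 64, 64)                  # pretend content features
style_gram_target = torch.rand(3, 3)                       # pretend style Gram matrix

def gram(x):
    b, c, h, w = x.size()
    f = x.view(b * c, h * w)
    return f @ f.t() / (b * c * h * w)

content_loss = torch.nn.functional.mse_loss(generated, content_target)
style_loss = torch.nn.functional.mse_loss(gram(generated), style_gram_target)
total_loss = 1 * content_loss + 1e6 * style_loss   # content_weight=1, style_weight=1e6
total_loss.backward()                              # gradients land on the pixels, not on a model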

You Brought Code Into This, Now Let’s Suffer Together

Okay, let’s build a working neural style transfer pipeline using PyTorch. You’ll need:

  • Python 3.x (preferably something not from 2014)
  • PyTorch installed (pip install torch torchvision)
  • GPU (optional, unless you enjoy watching paint dry)
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, transforms
from PIL import Image
import matplotlib.pyplot as plt
import copy
import os

import numpy as np
import cv2  # used later in main() for upscaling and sharpening the output

# === Device Setup ===
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
imsize = 512 if torch.cuda.is_available() else 256

# === Image Loader and Preprocessing ===
loader = transforms.Compose([
    transforms.Resize((imsize, imsize)),
    transforms.ToTensor()
])

def image_loader(image_path):
    image = Image.open(image_path).convert("RGB")
    image = loader(image).unsqueeze(0)  # Add batch dimension
    return image.to(device, torch.float)

def imshow(tensor, title=None):
    image = tensor.cpu().clone().squeeze(0)
    image = transforms.ToPILImage()(image)
    plt.imshow(image)
    if title:
        plt.title(title)
    plt.pause(0.001)

# === Gram Matrix ===
def gram_matrix(input):
    b, c, h, w = input.size()
    features = input.view(b * c, h * w)
    G = torch.mm(features, features.t())
    return G.div(b * c * h * w)

# === Loss Modules ===
class ContentLoss(nn.Module):
    def __init__(self, target):
        super(ContentLoss, self).__init__()
        self.target = target.detach()

    def forward(self, input):
        self.loss = nn.functional.mse_loss(input, self.target)
        return input

class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = nn.functional.mse_loss(G, self.target)
        return input

# === Build Model with Style & Content Losses ===
def get_style_model_and_losses(cnn, style_img, content_img,
                                content_layers=['conv_4'],
                                style_layers=['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']):
    cnn = copy.deepcopy(cnn)
    model = nn.Sequential()

    content_losses = []
    style_losses = []

    i = 0  # Incremental conv layer count
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = f"conv_{i}"
        elif isinstance(layer, nn.ReLU):
            name = f"relu_{i}"
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            name = f"pool_{i}"
        elif isinstance(layer, nn.BatchNorm2d):
            name = f"bn_{i}"
        else:
            raise RuntimeError(f"Unrecognized layer: {layer.__class__.__name__}")

        model.add_module(name, layer)

        if name in content_layers:
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module(f"content_loss_{i}", content_loss)
            content_losses.append(content_loss)

        if name in style_layers:
            target_feature = model(style_img).detach()
            style_loss = StyleLoss(target_feature)
            model.add_module(f"style_loss_{i}", style_loss)
            style_losses.append(style_loss)

    # Trim model after last loss
    for i in range(len(model) - 1, -1, -1):
        if isinstance(model[i], (ContentLoss, StyleLoss)):
            break
    model = model[:i+1]

    return model, style_losses, content_losses

# === Run Style Transfer ===
def run_style_transfer(cnn, content_img, style_img, input_img, num_steps=300,
                       style_weight=1e6, content_weight=1):
    model, style_losses, content_losses = get_style_model_and_losses(cnn, style_img, content_img)
    optimizer = optim.LBFGS([input_img.requires_grad_()])

    print("Optimizing...")
    run = [0]
    while run[0] <= num_steps:
        def closure():
            input_img.data.clamp_(0, 1)
            optimizer.zero_grad()
            model(input_img)
            style_score = sum(sl.loss for sl in style_losses)
            content_score = sum(cl.loss for cl in content_losses)
            loss = style_score * style_weight + content_score * content_weight
            loss.backward()

            if run[0] % 50 == 0:
                print(f"Step {run[0]}: Style Loss: {style_score.item():.4f}, Content Loss: {content_score.item():.4f}")
            run[0] += 1
            return loss

        optimizer.step(closure)

    input_img.data.clamp_(0, 1)
    return input_img

# === Main Entry Point ===
def main(content_path, style_path, output_path="output.png"):
    content_img = image_loader(content_path)
    style_img = image_loader(style_path)

    assert content_img.size() == style_img.size(), \
        "Style and content images must be the same size"

    input_img = content_img.clone()

    cnn = models.vgg19(pretrained=True).features.to(device).eval()

    output = run_style_transfer(cnn, content_img, style_img, input_img)

    # Convert the output tensor back to a PIL image
    output_img = output.detach().cpu().clone().squeeze(0)
    output_img = transforms.ToPILImage()(output_img)

    # Post-process with OpenCV: upscale, sharpen, then soften slightly
    np_image = np.array(output_img)                       # PIL (RGB) -> NumPy array
    np_image = cv2.cvtColor(np_image, cv2.COLOR_RGB2BGR)  # OpenCV expects BGR
    np_image = cv2.resize(np_image, (1531, 1021))
    sharpen_kernel = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])
    np_image = cv2.filter2D(np_image, -1, sharpen_kernel)

    kernel2 = np.ones((2, 2), np.float32) / 4             # mild blur to tame over-sharpening
    np_image = cv2.filter2D(src=np_image, ddepth=-1, kernel=kernel2)

    # Save and display the result
    cv2.imwrite(output_path, np_image)
    cv2.imshow('The Art', np_image)
    cv2.waitKey(0)

# === Run Script ===
if __name__ == "__main__":
    # Replace these with your own paths
    style_image_path = "starry_night.jpg"
    content_image_path = "your_face.jpg"

    if not os.path.exists(content_image_path) or not os.path.exists(style_image_path):
        print("Please place your content and style images in the current directory.")
        print("Name them 'your_face.jpg' and 'starry_night.jpg', or change the paths above.")
    else:
        main(content_image_path, style_image_path)

Alright, I just dropped the entire working code. Top to bottom. Fully runnable. No “continued in part two” nonsense, no newsletter gatekeeping, no “link in bio” tomfoolery. Now, in return, I ask for something simple:
Head over to Instagram @machinelearningsite and at least peek at the posts. If the memes are somewhat relatable, the reels teach you something, or make you question your life choices in a fun way — maybe throw a follow. If not? Cool, carry on with your blurry paintings.

Now, let’s break this code down so you can understand what you just ran — and maybe even tweak it without accidentally summoning a cursed GPU demon.

Okay, But What Did We Just Do?

Let’s break this down like you’re explaining it to your tech lead who pretends to understand anything past REST APIs.

Step 1: We Load the Images. Badly, but It Works

You load two images — one content (e.g., your selfie), one style (e.g., Starry Night). We resize, tensor-ify, and send them to your device. If you’re on CPU, sorry for your upcoming fan noise.

image = Image.open(image_path).convert("RGB")
image = loader(image).unsqueeze(0)  # Shape: [1, 3, H, W]
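
Before burning GPU time, it's worth eyeballing both tensors to make sure you didn't load two copies of the same cat. A tiny usage sketch with hypothetical file names, relying on the image_loader and imshow helpers from the script:

# Hypothetical paths -- swap in your own files.
content_img = image_loader("your_face.jpg")
style_img = image_loader("starry_night.jpg")

plt.ion()  # interactive mode so imshow()'s plt.pause() actually renders something
imshow(content_img, title="Content")
imshow(style_img, title="Style")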

Step 2: Reanimate VGG19

Like a mad scientist. VGG19 is an old CNN trained on ImageNet that sees edges and textures and thinks about chihuahuas a lot.

cnn = models.vgg19(pretrained=True).features.to(device).eval()

We use its convolutional layers as a glorified feature extractor. We don’t train it, we just ask it to quietly judge our images.
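
If you're curious what the conv_1 through conv_5 names in the next step map onto, just walk the feature extractor and count the Conv2d modules yourself. A quick sketch (same torchvision calls the script already uses):

from torchvision import models

# Peek at VGG19's feature stack: conv_1 ... conv_5 below simply count the
# nn.Conv2d modules in the order they appear in this Sequential.
features = models.vgg19(pretrained=True).features
conv_count = 0
for idx, layer in enumerate(features):
    if layer.__class__.__name__ == "Conv2d":
        conv_count += 1
        print(f"index {idx}: Conv2d -> conv_{conv_count}")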

Step 3: Content Loss and Style Loss Are Just… Losses

We define two modules:

  • ContentLoss: This module’s job is simple but essential — it makes sure our generated image doesn’t completely forget what it’s supposed to be. It compares the activations (i.e., features) of the generated image with those of the original content image at a certain layer of the network. Usually we pick something in the mid-layers like conv_4, because early layers just detect edges and textures, and deeper layers get a bit too abstract, like interpreting your cat as a small, furry blender. The content loss is the Mean Squared Error (MSE) between these feature representations, penalizing the generated image every time it strays too far from the source.
  • StyleLoss: This one’s the artsy-fartsy sibling. It doesn’t care about what’s in the image — it cares about how it feels. To capture the style of an image, we look at correlations between feature maps in the network. That’s what the Gram matrix does: it’s a glorified dot product that tells us how feature channels interact with each other. If you think of a feature map as detecting “brush strokes” or “color blobs,” the Gram matrix says how often these things co-occur. We then compute the MSE between the Gram matrices of the style image and the generated image, effectively telling the network, “make your strokes more Van Gogh-y.”

The Gram matrix is just some nasty matrix multiplication that turns activation maps into “style vibes.” To get a better feel for the words you just read, have a look at this article that explains the theory of style transfer.
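
If “style vibes” still feels hand-wavy, here’s a toy sanity check you can run (it assumes the gram_matrix helper from the full script above is in scope): for a feature map with C channels, the Gram matrix is a C x C table of how strongly each pair of channels fires together.

import torch

# Toy feature map: batch of 1, 4 channels, 8x8 spatial grid.
fake_features = torch.rand(1, 4, 8, 8)

G = gram_matrix(fake_features)   # helper defined in the full script above
print(G.shape)                   # torch.Size([4, 4]) -- one entry per channel pair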

Step 4: Creating Our Own Neural Network for Style Transfer

We take a scalpel to a pre-trained VGG19 — a deep convolutional network trained to recognize everything from golden retrievers to pretzels — and slice off just the layers we need. These early and mid-level convolutional layers are great at extracting rich features: edges, textures, patterns — the good stuff.

As we walk through the network layer by layer, we drop in our custom ContentLoss and StyleLoss modules immediately after the layers we’re interested in. Think of it as spying on VGG19 while it’s doing its thing, then yelling “WRONG!” every time the stylized image drifts from the desired content or style.

Why not just use the whole VGG19? Because deeper layers tend to get increasingly abstract — and frankly, we’re not trying to recreate Picasso’s emotions, just his texture. Plus, we trim off layers after the last loss layer because, let’s be honest, your GPU has better things to do than calculate gradients for filters that don’t contribute to the final loss. This isn’t a Netflix data center; we need to be efficient.
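
If you want to see where the spies got planted, just print the assembled model. A quick sketch, assuming the functions and images from the full script are already in scope:

# The truncated network: conv/relu/pool layers with content_loss_* and style_loss_*
# modules spliced in right after the layers we care about, and nothing past the last one.
model, style_losses, content_losses = get_style_model_and_losses(cnn, style_img, content_img)
print(model)
print(f"{len(style_losses)} style loss modules, {len(content_losses)} content loss module(s)")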

Step 5: Optimize the Pixels, Not the Model

This is where your intuition breaks and neural networks start feeling like wizardry. Normally, we train the weights of a model to fit the data. Here, the model (VGG19) is frozen — it’s not changing at all. Instead, we create a clone of the content image and treat its pixels as the learnable parameters.

Yes, you are training an image. You’re optimizing RGB values so they reduce both content loss and style loss when passed through the network. It’s like trying to brute-force Photoshop by gradient descent.

To do this, we use the L-BFGS optimizer, which is fancy and memory-hungry but well-suited to this kind of constrained, weird optimization. You can swap in Adam if L-BFGS doesn’t behave (a rough sketch of that variant follows below), but L-BFGS typically converges faster for this specific use case.

optimizer = optim.LBFGS([input_img.requires_grad_()])

Notice that .requires_grad_() — without it, PyTorch won’t bother calculating gradients for your image tensor, and you’ll be stuck wondering why your image is just blurring like a Windows 95 screensaver.
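
For the record, here is roughly what the Adam variant of that loop looks like. This is a sketch under the same assumptions as the script above (model, style_losses, content_losses, and input_img already built), and the learning rate is a guess you'll want to tune, not a recommendation:

# Alternative to L-BFGS: a plain Adam loop. It usually needs more steps to converge
# for style transfer, but it's easier to reason about and debug.
optimizer = optim.Adam([input_img.requires_grad_()], lr=0.01)

for step in range(300):
    optimizer.zero_grad()
    model(input_img)  # forward pass fills in the .loss attribute of each loss module
    style_score = sum(sl.loss for sl in style_losses)
    content_score = sum(cl.loss for cl in content_losses)
    loss = style_score * 1e6 + content_score * 1
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        input_img.clamp_(0, 1)  # keep pixel values displayable
    if step % 50 == 0:
        print(f"Step {step}: style {style_score.item():.4f}, content {content_score.item():.4f}")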

Step 6: Pray, Clamp, Display

We clamp the image values to keep them between 0 and 1 (or else you’ll get an eldritch horror), then display the result.

input_img.data.clamp_(0, 1)
imshow(output, title="Stylized Image")

This is crucial, because neural networks are happy to hallucinate pixel values way outside the visible range — like negative reds or ultra-magenta explosions — which your screen can’t render, and your soul shouldn’t witness.

Once clamped, we turn the tensor back into an image and display it. If all went well, you’ll see something that looks like your content image, dressed in the textures, strokes, and palette of your style image. If it didn’t go well… well, it’ll still look like art. Just call it post-modern abstraction and move on.
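
If you'd rather skip matplotlib entirely, torchvision can write the clamped tensor straight to disk. A minimal sketch, where output is the tensor returned by run_style_transfer and the file name is my own invention:

from torchvision.utils import save_image

# save_image expects values in [0, 1]; the output is already clamped, so this just works.
save_image(output.squeeze(0), "stylized.png")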

Debugging Tips (Because You Won’t Get This Right on the First Try)

Your first stylized image will probably look like it was painted by a colorblind squid on a caffeine bender. It’s okay. Welcome to the sacred tradition of tweaking neural net hyperparameters until your GPU starts passive-aggressively heating your room.

Here are a few things to keep in mind when you’re wondering why your output looks like abstract garbage (and not the good kind):

1. Tune the Style and Content Weights Like You’re Mixing a Cocktail
The real magic happens in this line:

def run_style_transfer(cnn, content_img, style_img, input_img, num_steps=300,
                       style_weight=1e6, content_weight=1):

Those style_weight and content_weight values are the entire vibe of your output. Crank up style_weight if you want the texture and color of the style image to dominate (a.k.a. “Picasso on LSD”). Boost content_weight if your image is starting to look like a melted version of the original. Play around. There’s no single right combo, just what looks good and what doesn’t make your eyes bleed.
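
If you'd rather not eyeball this one run at a time, you can brute-force a few style weights and pick the least offensive result. A rough sketch, reusing the functions and images from the script above (the file names and the weight grid are my guesses, not gospel):

from torchvision.utils import save_image

# Crude grid search over style_weight: one output image per setting, judged by eye.
for style_weight in (1e4, 1e5, 1e6, 1e7):
    trial_input = content_img.clone()
    result = run_style_transfer(cnn, content_img, style_img, trial_input,
                                num_steps=150, style_weight=style_weight, content_weight=1)
    save_image(result.squeeze(0), f"stylized_sw_{style_weight:.0e}.png")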

2. Your Output is a Sad, Blurry Thumbnail? Fix It with Some Good Old OpenCV Sorcery
Sometimes the result is way too low-res and looks like someone tried to compress it into a .txt file. That’s because your original image tensor was resized to match the style transfer model’s input dimensions, and we never told PyTorch we wanted the result in full HD IMAX glory.

Here’s a quick fix using OpenCV voodoo to upscale, sharpen, and generally make things less sad:

# Convert to NumPy and make it OpenCV-friendly
np_image = np.array(output_img)  # assumes 'output_img' is PIL.Image
np_image = cv2.cvtColor(np_image, cv2.COLOR_RGB2BGR)

# Resize to your desired final dimensions
np_image = cv2.resize(np_image, (1531, 1021))

# Apply a sharpening kernel to enhance details
sharpen_kernel = np.array([[-1,-1,-1], [-1,9,-1], [-1,-1,-1]])
np_image = cv2.filter2D(np_image, -1, sharpen_kernel)

# Optionally smooth a bit to reduce over-sharpening artifacts
kernel2 = np.ones((2, 2), np.float32) / 4
np_image = cv2.filter2D(src=np_image, ddepth=-1, kernel=kernel2)

# Save and view the result
cv2.imwrite('output.png', np_image)
cv2.imshow('The Art', np_image)
cv2.waitKey(0)

Now What?

You made a machine hallucinate a selfie in the style of a dead painter using neural style transfer. You are officially a digital sorcerer.

Upload your cursed artwork to Instagram or Threads, tag me @machinelearningsite (yes, I’ll share it in my story), and confuse your followers into thinking you’re suddenly “into AI art.”

What’s Next?

Now that you’ve made your laptop sweat rendering a selfie in the style of a long-dead oil painter, you might be wondering: what other poor decisions can you make with PyTorch? Well, real-time style transfer is the next logical mess: hook up your webcam, apply a model to each frame, and pretend you’re doing avant-garde Zoom calls, all while your GPU audibly begs for mercy. Or maybe you want to dive into GANs and generate cursed deepfakes of yourself as a pixelated anime character. That’ll definitely help your online dating profile stand out.

You just built a virtual artist and you’re this close to convincing your friends (and maybe your boss) that this is useful. Naturally, you want others to use it too. So what do you do? Maybe upload the .py file to Google Drive, share a cryptic README, and let your users decipher the imports like it’s an escape room puzzle?

Absolutely not.

We want to be professional — or at least look like it on the internet. That means sharing your Python project the right way, with packaging, sanity, and minimal public embarrassment. No worries, I’ve got you covered.

Have a look at the efficient way of publishing your Python app — and finally ship something your non-coder friends can run without panic-DMing you about ModuleNotFoundError.
