How I trained an algorithm to have its own handwritten style

using a Generative Adversarial Network

Aditya M
10 min read · Dec 30, 2021

I like to believe my handwriting has a special creative touch. So, lately, I have been wondering: what if a computer could have its own handwritten style? How creative can a computer be with hundreds of pixels instead of a pen? After some investigation into Generative Adversarial Networks (GANs), I ended up building a model that could handwrite.

Believe it or not, this is what its output looked like.

Can you read it?

The computer decided to create its own kind of style! Although there were letters that you needed to squint at to understand (cough cough how people read my handwriting), overall I was impressed with what a couple of lines of code could do!

Well… what is a Generative Adversarial Network??

I am glad you asked.

GANs are, in my opinion, the most exciting class of neural network models, pushing the boundary of what neural networks can do. They use sets of existing data to learn how to create new data. Today, they are solving a wide variety of problems, like improving image quality, morphing audio, or creating synthetic data for another model to train on when there is a shortage. However, quite intuitively, they can be scary. They are being used in scams and deepfakes of politicians. They may also be a major part of a Terminator in the future.

Hopefully not… 🤖

GANs stand for Generative Adversarial Networks and are used for generative modeling, which is where the "Generative" portion of the name comes from. They are considered an unsupervised technique that takes advantage of two competing supervised learning models. 😅 Ironic, eh? This is where the "Adversarial" portion of the name comes from. One model, AKA the Generator, tries to trick the other, and the other, AKA the Discriminator, tries not to be tricked.

Let’s simplify what GANs are using an analogy.

Imagine a counterfeiter who is trying to make fake twenty-dollar bills. He examines some real bills and creates a fake. He then goes to a local grocery store to try and trick the cashier, hoping that he won't get caught. Most likely he will get rejected, but that doesn't stop him from asking politely, "What made you think it was fake?" The cashier is happy to help and tells him what went wrong: "There is no braille in the bottom left corner."

With this new, valuable information, he heads back to his lab and creates a better version. By repeating this process a couple of hundred times, he can create an unbelievably realistic but fake twenty-dollar bill, provided he is not recognized and taken to jail. The repetitions not only allow him to get better at creating fakes but also allow the cashier to get better at recognizing fakes (which, in turn, pushes him to get even better).

The generator in this example is the counterfeiter, whose job is to create fake bills to trick the cashier.
The discriminator is the cashier, whose job is to examine bills to determine if they are fake or real to not be tricked.

With more repetitions, the two together get better at doing their jobs. As a result, they create unbelievably realistic-looking fake bills (if training goes as expected)… 🦹‍♀️💶🦹‍♂️

So what would a GAN structure look like?

A GAN structure is built upon a feedback loop between the generator and the discriminator. (ง’̀-’́)ง

The discriminator is a classifier, meaning it tells us which group its input fits in. Its input comes from two places: the training data (real) and the data created by the generator (fake). The fake inputs are used as negative examples (usually assigned a label of 0), and the real inputs are used as positive examples (usually assigned a label of 1).

The inputs are then fed into a series of layers and activations. These can be convolutional or pooling layers (for a CNN) or a simple MLP (what I stuck with for my project).

The accuracy of the model is measured by how often it predicts the identity of a data point correctly. With training, the model gets better at classifying its inputs and hopefully (from the discriminator's point of view) is not fooled by the generator.

On the other hand, the generator wants to trick the discriminator by generating fake data. Its input is a latent vector, or a vector with completely random values. It is responsible for converting complete randomness into a data instance that hopefully looks convincing (in my project, handwritten letters).

The data is fed into multiple layers, like in the discriminator, and then into an activation function. Then, for some "feedback", or to see how much the discriminator was fooled, the generated data moves into the discriminator, which classifies it as "fake" or "real". The accuracy of the generator is measured by how easily the discriminator is fooled.
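
To make that feedback loop concrete, here is a tiny, self-contained sketch of the data flow (toy layer sizes, not the networks I actually trained; those come later):

import torch
import torch.nn as nn

# Toy stand-ins for the two networks, just to show the flow of data.
G = nn.Sequential(nn.Linear(16, 784), nn.Tanh())    # noise in, fake "image" out
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())  # image in, probability of being real out

z = torch.randn(1, 16)   # a latent vector: pure random noise
fake = G(z)              # the generator turns the noise into a data instance
verdict = D(fake)        # the discriminator scores it; close to 1 means "looks real"
print(verdict.item())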

Training these networks is usually done in two steps:

  1. Calculating the loss to measure the accuracy of the model. A loss function is applied after a run-through of the network with sample inputs.
  2. Backpropagation to adjust the parameters of the network for better accuracy (what allows the networks to learn and improve over time). The model propagates the loss backward, tweaking the parameters using gradient descent. Gradient descent tunes each of the model's weights and biases by subtracting the gradient of the current loss with respect to that parameter, scaled by a learning rate. (There are multiple flavors of gradient descent, but for my project, I used an Adam optimizer because it converges faster and gave lower loss values for the generator.) A minimal code sketch of these two steps follows just below the drawing.
My drawing is wonderful… isn’t it? 😏
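
Here is that minimal sketch of the two steps, using a throwaway linear model rather than the GAN itself (the real training functions come later):

import torch
import torch.nn as nn

toy_model = nn.Linear(4, 1)                 # a tiny model just for illustration
criterion = nn.BCEWithLogitsLoss()          # a loss function
optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.001)

x = torch.randn(8, 4)                       # a batch of sample inputs
y = torch.randint(0, 2, (8, 1)).float()     # target labels

# Step 1: run the model and calculate the loss
loss = criterion(toy_model(x), y)

# Step 2: backpropagate and let Adam nudge the parameters
optimizer.zero_grad()
loss.backward()
optimizer.step()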

Training a generator is much more of a challenge than training a discriminator, mainly because its gradients have a longer path to travel. Backpropagation for the generator requires calculating derivatives from the loss function, back through the discriminator, all the way into the generator's own parameters. For the discriminator, backpropagation only requires going from the loss function back through the discriminator's own layers (no extra math power needed).

In the first couple of training steps, the generator produces obviously fake data, so the discriminator quickly concludes that it is fake. Therefore, the amount of error for the generator will be high and for the discriminator will be low. 😒

With more training steps, the generator gets closer to producing outputs that can somewhat fool the discriminator. Gradually, the amount of error for the generator will decrease and the amount for the discriminator will increase. 🤔

If the training process goes well, the discriminator will have a harder and harder time differentiating between fake and real data (eventually it guesses right only about 50% of the time, no better than a coin flip). In this case, the amount of error for the generator should be low and for the discriminator should be high. 😄

The amount of error for each is like a seesaw, when one goes up, the other goes down.

How did you train your model?

I wanted to create a model that could learn and come up with its own readable handwritten style. You can check out my code over here. I have broken down what my code does into 3 steps: (1) getting the data ready, (2) defining the models, and (3) training.

Getting The Data Ready

Having enough good data is one of the most important parts of creating a working model. If you train your model with bad data (data without much variety, or simply not enough of it), your model will make bad predictions. Garbage in → Garbage out.

I used a dataset from Kaggle with over 300,000 images of handwritten letters in CSV format. Each image is made up of 28 by 28 pixels, each with a value ranging from 0 to 255 representing its intensity. The images also have a single color channel, meaning they come in shades of grey.

A sample image (“J”) from the dataset
import torch, torchvision
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
import torch.nn as nn
from IPython.display import Image
from torchvision.utils import save_image
import pandas as pd
import numpy as np

This first part of the code is for importing the necessary modules into the IPython notebook. They allow me to start coding with PyTorch, graph images, and also save generated images. What each module does will make more sense soon.

data = pd.read_csv("the_data_file.csv").astype('float32')

This code reads the CSV dataset that I want to use and creates a pandas DataFrame so I can do some essential preprocessing. The dataset is in alphabetical order, which makes everything a whole lot easier.

I realized that the CSV data I wanted to use was completely out of shape. There were around 50 times as many Os as there were Fs or Is, the pixel values weren't in the -1 to 1 range (I will talk about why that matters later), and the dataset wasn't the right shape.

# Standardizing the number of data points per letter
constant = 4182  # keep up to this many images of each letter
new_data = pd.concat(
    (data.iloc[indexes[letter][0]:indexes[letter][1]].sample(frac=1).iloc[:constant].iloc[:, 1:]
     for letter in indexes),
    ignore_index=True
)
# Reshaping the dataset from (N, 784) to (N, 28, 28)
X = new_data.values
X = X.reshape((X.shape[0], 28, 28))
# Normalizing the pixel values from [0, 255] to [-1, 1]
X = (X - 127.5) / 127.5
# Converting into a PyTorch tensor
X = torch.tensor(X)

The code above fixes all those issues.

First, I set the number of images to keep for each letter, split the dataset into 26 subsets, drop the extra rows for each letter, and then concatenate all 26 subsets into one new, more balanced dataset. Second, I convert the dataset into a NumPy array and reshape it from N x 784 to N x 28 x 28, which will help us display the images if we want to. Then, I normalize the data, converting values ranging from 0 to 255 to values ranging from -1 to 1. This is important because we will be using the hyperbolic tangent activation function for the generator later on (which returns values ranging from -1 to 1). Finally, I convert the array into a PyTorch tensor.
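
One piece the snippet above relies on is an indexes dictionary that maps each letter to its block of rows in the (alphabetically ordered) DataFrame. It is not shown in these excerpts, but assuming the first CSV column holds the 0-25 letter label, a minimal reconstruction could look like this:

# Hypothetical reconstruction of `indexes`: letter label -> (first row, last row + 1).
# Assumes the first CSV column is the 0-25 label and the rows are sorted alphabetically.
labels = data.iloc[:, 0].astype(int).values
indexes = {}
for letter in range(26):
    rows = np.where(labels == letter)[0]
    indexes[letter] = (rows[0], rows[-1] + 1)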

bs = 64  # batch size
data_loader = DataLoader(X, bs, shuffle=True, drop_last=True)  # drop_last keeps every batch exactly bs images long (the training code below assumes this)
show_data(next(iter(data_loader)))  # plot a sample batch

Above, I created a data loader that serves up a fresh sample of the data each time it is iterated over. I am using a batch size of 64, which means each batch will have 64 data points. Training on mini-batches rather than the entire training set at once reduces the memory and compute needed per update and keeps the gradient updates frequent.
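
The show_data call above is a small plotting helper from my notebook that is not reproduced in these excerpts. A minimal version, using the matplotlib import from earlier, might look something like this:

# Minimal sketch of a show_data-style helper: displays the first few images of a batch.
def show_data(batch, n=8):
    fig, axes = plt.subplots(1, n, figsize=(n, 1.5))
    for ax, img in zip(axes, batch[:n]):
        ax.imshow(img * 127.5 + 127.5, cmap='gray')  # undo the [-1, 1] normalization
        ax.axis('off')
    plt.show()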

Defining the Neural Networks

slope = 0.2  # negative slope for the leaky ReLU
Dnet = nn.Sequential(
    nn.Linear(784, 256),
    nn.LeakyReLU(slope),
    nn.Linear(256, 256),
    nn.LeakyReLU(slope),
    nn.Linear(256, 1),
    nn.Sigmoid()  # squashes the output into a 0-1 "probability of being real"
)

This code defines the discriminator neural network. It has 3 dense layers, the first two followed by a leaky ReLU activation, and the output is sent through a sigmoid function. The leaky ReLU lets a small, non-zero gradient pass through for negative inputs during backpropagation. Instead of passing a gradient of 0 (as a regular ReLU would), it passes a gradient scaled by the slope (0.2 here). This allows gradients from the discriminator to flow backward more strongly into the generator. Usually, the leaky ReLU returns better results than a regular ReLU.
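
A quick throwaway check (not part of the model) shows the difference in the gradient that gets passed back for a negative input:

x = torch.tensor([-2.0], requires_grad=True)
nn.LeakyReLU(0.2)(x).backward()
print(x.grad)  # tensor([0.2000]), whereas a regular ReLU would pass back 0 here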

latent_size = 64
Gnet = nn.Sequential(
    nn.Linear(latent_size, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 784),
    nn.Tanh()  # tanh because we want output values between -1 and 1
)

This code defines the generator. It takes a latent vector of size 64 as input and has three dense layers like the discriminator, the first two followed by a ReLU activation. Using a hyperbolic tangent output lets the model more quickly saturate and cover the color range of the training distribution. This is also why we normalized the training data to range from -1 to 1: samples from the generator and from the training set will both be passed to the discriminator, so their ranges cannot be different.

Training the Model

#Loss Function
criterion = nn.BCELoss()
#Optimizers
dnet_optimizer = torch.optim.Adam(Dnet.parameters(), lr=0.0002)
gnet_optimizer = torch.optim.Adam(Gnet.parameters(), lr=0.0002)

Since the discriminator is a binary classification model, I used the Binary Cross-entropy loss function to measure how accurate the model is. I am also using an Adam optimizer for training both neural networks.
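
The training functions below also call a small reset_grad() helper that is not reproduced in these excerpts. It is assumed to simply zero out both optimizers' gradients between updates, something like:

# Assumed definition of the reset_grad() helper used in the training functions below.
def reset_grad():
    dnet_optimizer.zero_grad()
    gnet_optimizer.zero_grad()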

def train_discriminator(images):
    # Labels: 1 for real images, 0 for fake images
    real_labels = torch.ones(bs, 1)
    fake_labels = torch.zeros(bs, 1)

    # Calculate loss on real images
    output = Dnet(images)
    d_loss_real = criterion(output, real_labels)
    real_score = output

    # Calculate loss on fake images produced by the generator
    fake_images = Gnet(torch.randn(bs, latent_size))
    output = Dnet(fake_images)
    d_loss_fake = criterion(output, fake_labels)
    fake_score = output

    # Calculate total loss
    d_loss = d_loss_real + d_loss_fake

    # Gradient descent step
    d_loss.backward()
    dnet_optimizer.step()
    reset_grad()
    return d_loss, real_score, fake_score

Training the discriminator is very straightforward. As described before, the model is run on real images and on fake images (labels: ones → real images, zeros → fake images), and its loss is measured for each. The losses are summed, the total is backpropagated, and the parameters are updated with the Adam optimizer. The losses and scores are returned so that we can graph them.

def train_generator():
    # Feedforward: generate fake images and calculate the loss against "real" labels
    fake_images = Gnet(torch.randn(bs, latent_size))
    labels = torch.ones(bs, 1)  # the generator wants these to be classified as real
    discriminator_output = Dnet(fake_images)
    g_loss = criterion(discriminator_output, labels)

    # Gradient descent step
    g_loss.backward()
    gnet_optimizer.step()
    reset_grad()
    return g_loss, fake_images

Training the generator is a bit more involved. First, we generate fake images and see what the discriminator thinks about them. Since the generator wants to trick the discriminator into believing that its images are not fake, the labels are set to 1 (real). The loss is calculated, backpropagated, and the generator's parameters are updated with the Adam optimizer. The fake images and the generator loss are returned so we can graph them.

epochs = 450
device = torch.device("cpu")  # device isn't defined in these excerpts; keep everything on the CPU here (move the models too if you use a GPU)
d_losses, g_losses, real_scores, fake_scores = [], [], [], []

for epoch in range(epochs):
    for i, images in enumerate(data_loader):
        batch = images.view(bs, -1).to(device)  # flatten each 28x28 image into a 784-long vector
        d_loss, real_score, fake_score = train_discriminator(batch)
        g_loss, fake_images = train_generator()
        if (i+1) % 200 == 0:  # record losses and scores every 200 batches
            d_losses.append(d_loss.item())
            g_losses.append(g_loss.item())
            real_scores.append(real_score.mean().item())
            fake_scores.append(fake_score.mean().item())
This code is the most exciting part. Here we train the model, woohoo! 🎉 Through experimentation, I found that 450 epochs were about right for this model. The training functions are called and the losses and scores are recorded along the way.

After training, by giving the generator a random latent vector, we can get a hopefully very realistic fake image. Here’s a little something for you guys!

The training process of my model
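
That last step, handing the trained generator a random latent vector and saving what comes out, looks roughly like this (a sketch; it reuses the save_image import from the top and undoes the [-1, 1] normalization, with a hypothetical output filename):

# Sample a batch of latent vectors and turn them into images.
z = torch.randn(bs, latent_size)
generated = Gnet(z).detach().reshape(-1, 1, 28, 28)  # back to image shape with 1 color channel
generated = (generated + 1) / 2                      # map [-1, 1] back to [0, 1] for saving
save_image(generated, 'generated_letters.png', nrow=8)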

Hopefully, you have learned a lot and are inspired to learn more! If this article provided you with some insight, give it a clap, share it, and connect with me on LinkedIn! If you have cool stuff related to AI, post it in the comments! And finally, subscribe to my monthly newsletter to follow along on my journey through emerging technology!
