Published by: Amit Nikhade

June 12 . 2021

In recent times the generative model has gained huge attention due to its

state-of-art performance and hence achieved massive importance in the

marketplace and is also used widely. Variational Autoencoders are

deep learning techniques used to learn the latent representations they

are one of the finest approaches to unsupervised learning. VAE shows

exceptional results in generating various kinds of data.

Autoencoder comprises an encoder, decoder, and a bottleneck. The encoder

simply transforms the input into a digital representation to the lowest

dimension into the bottleneck to absorb its salient features and the

decoder reconstructs back the output from the representations nearly similar

to the input.The Autoencoder aims to minimize the reconstruction loss. The

reconstruction loss is the difference between the original data and the

reconstructed data. L2 Loss function is used to calculate the loss in AE. i.e

the sum of all the squared differences between the true value and the predicted value.

The applications of an Autoencoder include Denoising, Dimensionality Reduction, etc.

VAE is also a kind of Autoencoder which not only reconstructs the output but also

generates new content. Stating explicitly, VAE is a generative model and Autoencoders

are not. The Autoencoder learns to transform an input into some vector representation

by minimizing the *reconstruction loss* calculated from the input and the reconstructed

image, VAE, on the other hand, generates output by minimizing the reconstruction as

well as the ** KL Divergence loss** which is the difference between the actual and observed

probability distribution, It is the symmetrical score as well as the distance measure

between two probabilistic distributions, in terms of VAE it tells whether the

distribution learned is not far from a normal distribution.

The above is the ** k-l** divergence between distributions

* *

*Variational autoencoder can be defined as an autoencoder whose training is *

*regularized to avoid overfitting problems and it makes sure that the latent space *

*assimilates fruitful results that generate some distinctive and unique results.*

*The variational autoencoder consists of an encoder, decoder, and a loss function. *

*The encoder and decoder are simple neural networks. When input data X is passed *

*through the encoder, the encoder outputs the latent state *

*distributions (**Mean ***μ***, Variance ***σ***) from which a vector is sampled **Z**. We always *

*make an assumption that the latent distribution is always a Gaussian distribution. *

*The input **x** is compressed by the encoder into a smaller dimension. Which is *

*typically referred to as bottleneck or the latent space. From which some data is *

*randomly sampled and the sample is decoded by **backpropagating** the *

*reconstruction loss and we get a new generated variety.*

After the distribution is thrown out of the encoder, the sample is chosen by a random

node which cannot make backpropagation possible. We need to backpropagate

the encoder-decoder model to make it learn. To overcome the backpropagation, we

use the epsilon (**ε**) with the mean and variance to maintain the stochasticity. So, at a

time we can also choose a random sample and also learn with the latent distribution

states. During the iterations, the epsilon remains the random sample and the

parameters of the encoder output are updated.

After the distribution is thrown out of the encoder, the sample is chosen by a random

node which cannot make backpropagation possible. We need to backpropagate the

encoder-decoder model to make it learn. To overcome the backpropagation, we use

the epsilon (**ε**) with the mean and variance to maintain the stochasticity. So, at a

time we can also choose a random sample and also learn with the latent distribution

states. During the iterations, the epsilon remains the random sample and the

parameters of the encoder output are updated.

Let try to implement the VAE into our code with MNIST data using PyTorch.

Install PyTorch with Torchvision

#command line>> pip3 install pip install torch==1.7.1+cpu torchvision==0.8.2+cpu torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html>> pip3 install numpy

**Import Libraries**

import torchvision.transforms as transformsimport torchvision

from torchvision.utils import save_imagefrom torch.utils.data import DataLoaderimport torchimport numpy as npimport torch.nn as nnimport torch.optim as optim

** **

**Prepare Data**

We are using the MNIST dataset, so we’ll transform it by resizing it to 32×32 and converting it to tensor. Make data ready with the best ever PyTorch data loader with batches of 64

transform = transforms.Compose([transforms.Resize((32,32)),transforms.ToTensor(),])trainset = torchvision.datasets.MNIST(root='./', train=True, download=True, transform=transform)trainloader = DataLoader(trainset, batch_size=64, shuffle=True)testset = torchvision.datasets.MNIST(root='./', train=False, download=True, transform=transform)

MNIST Images have a single channel with 28×28 pixels.

define device to be used as per our requirement

`dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")`

** **

**Define Loss function**

def final_loss(bce_loss, mu, logvar): BCE = bce_loss

KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

return BCE + KLD

The Loss is the sum of the Kullback-Leibler divergence and Binary Cross-Entropy

**Define parameters**

`z_dim =20`

`lr = 0.001`

`criterion = nn.BCELoss(reduction='sum')`

`epochs = 1`

`batch_size = 64`

** **

**Create Variational Autoencoder model**

The Encoder consists of convolution, batch normalization layers with leaky relu. The output of the Encoder is the mean vector and the standard deviation vector.

class VAE(nn.Module):

def __init__(self):

super(VAE, self).__init__()#encoderself.conv1 = nn.Conv2d(1 ,8 ,4 ,stride =2 ,padding =1 ) self.BN1 = nn.BatchNorm2d(8) self.af1 = nn.LeakyReLU() self.conv2 = nn.Conv2d(8 ,16 ,4 ,stride =2 ,padding = 1) self.BN2 = nn.BatchNorm2d(16) self.af2 = nn.LeakyReLU() self.conv3 = nn.Conv2d(16 ,32 ,4 ,stride =2 ,padding = 1) self.BN3 = nn.BatchNorm2d(32) self.af3 = nn.LeakyReLU() self.conv4 = nn.Conv2d(32 ,64 ,4 ,stride =2 ,padding = 0) self.BN4 = nn.BatchNorm2d(64)

self.af4 = nn.LeakyReLU()

Fully connected layers Providing the mean and log variance value (Bottleneck part)

self.fc1 = nn.Linear(64,128) self.fc_mu = nn.Linear(128, z_dim) self.fca1 = nn.LeakyReLU() self.fcd1 = nn.Dropout(0.2) self.fc_log_var= nn.Linear(128, z_dim) self.fca2 = nn.LeakyReLU() self.fcd2 = nn.Dropout(0.2)

The Decoder just reconstructs the sampled latent vector representations And brings out a new variant of the original one.

self.fc2 = nn.Linear(z_dim, 64) self.da1 = nn.LeakyReLU() self.dd1 = nn.Dropout(0.2) self.deu1 = nn.UpsamplingNearest2d(scale_factor=2) self.dec1 = nn.ConvTranspose2d(64 ,64 ,4 ,stride =2 ,

padding = 0) self.deb1 = nn.BatchNorm2d(64) self.dea1 = nn.LeakyReLU() self.deu2 = nn.UpsamplingNearest2d(scale_factor=2) self.dec2 = nn.ConvTranspose2d(64 ,32 ,4 ,stride =2 ,

padding = 1) self.deb2 = nn.BatchNorm2d(32) self.dea2 = nn.LeakyReLU() self.deu3 = nn.UpsamplingNearest2d(scale_factor=2) self.dec3 = nn.ConvTranspose2d(32 ,16 ,4 ,stride =2 ,

padding = 1) self.deb3 = nn.BatchNorm2d(16) self.dea3 = nn.LeakyReLU() self.deu4 = nn.UpsamplingNearest2d(scale_factor=2) self.dec4 = nn.ConvTranspose2d(16 ,1 ,4 ,stride =2 ,

padding = 1) self.dea4 = nn.Sigmoid()

The random vector is sampled out from the mean vector and the standard deviation. which is further reconstructed by applying upsampling and ConvTransposed layers, I used both the upsampling and ConvTransposed it gave me better results.

def sampling(self, mu, log_var): std = torch.exp(log_var / 2) epsilon = torch.randn_like(std) return mu + epsilon * stddef forward(self, x):#creating encoderX = self.conv1(x) x = self.BN1(X) x = self.af1(x) x = self.conv2(x) x = self.BN2(x) x = self.af2(x) x = self.conv3(x) x = self.BN3(x) x = self.af3(x) x = self.conv4(x) x = self.BN4(x) x = self.af4(x) x = x.view(x.size()[0], -1) x = self.fc1(x) mu = self.fc_mu(x) mu = self.fca1(mu) mu = self.fcd1(mu) log_var = self.fc_log_var(x) log_var = self.fca2(log_var) log_var = self.fcd2(log_var)#creating samplingz = self.fc2(self.sampling(mu, log_var)) z = self.da1(z) z = self.dd1(z) z = z.view(-1,64,1,1)#creating decoderd = self.dec1(z) d = self.deb1(d) d = self.dea1(d) d = self.dec2(d) d = self.deb2(d) d = self.dea2(d) d = self.dec3(d) d = self.deb3(d) d = self.dea3(d) d = self.dec4(d) recontruction = self.dea4(d) return recontruction, mu, log_var

**Configure device**

`device = dev`

model = VAE().to(device)

**Define Optimizer**

`optimizer = optim.Adam(model.parameters(), lr=lr)`

**Start the training**

grid_images = []

train_loss = []

valid_loss = []

def validate(model, dataloader, dataset, device, criterion):

model.eval()

running_loss = 0.0

counter = 0

with torch.no_grad():

for i, data in tqdm(enumerate(dataloader), total=int(len(dataset)/batch_size)):

counter += 1

data= data[0]

data = data.to(device)

reconstruction, mu, logvar = model(data)

bce_loss = criterion(reconstruction, data)

loss = final_loss(bce_loss, mu, logvar)

running_loss += loss.item()

# save the last batch input and output of every epoch

if i == int(len(dataset)/batch_size) - 1:

recon_images = reconstruction

val_loss = running_loss / counter

return val_loss, recon_imagesdef train(model, dataloader, dataset, device, optimizer, criterion):

model.train()

running_loss = 0.0

counter = 0

for i, data in tqdm(enumerate(dataloader), total=int(len(dataset)/batch_size)):

counter += 1

data = data[0]

data = data.to(device)

optimizer.zero_grad()

reconstruction, mu, logvar = model(data)

bce_loss = criterion(reconstruction, data)

loss = final_loss(bce_loss, mu, logvar)

loss.backward()

running_loss += loss.item()

optimizer.step()

train_loss = running_loss / counter

return train_losscount = 0

for epoch in range(epochs):

count = count+1

if count == epochs:

train_epoch_loss = train(

model, trainloader, trainset, device, optimizer, criterion

)

valid_epoch_loss, recon_images = validate(

model, testloader, testset, device, criterion

)

train_loss.append(train_epoch_loss)

valid_loss.append(valid_epoch_loss)

save_image(recon_images.cpu(), f"./output{epoch}.jpg") image_grid = make_grid(recon_images.detach().cpu())

grid_images.append(image_grid) print(f"Train Loss: {train_epoch_loss:.4f}")

print(f"Val Loss: {valid_epoch_loss:.4f}")

Here our model is trained and our reconstructed Images get saved into the defined path. The image below is the reconstructed Image I got after 5 epochs.

Try to reconstruct an image using your own custom-made image data. Hope you may get some surprising results, just try to Hypertune the model with different combinations. Try to increase the number of epochs. Try to play with the Z dim, learning rate, the convolution layers, strides, and much more.

**About Me**

My GitHub: https://github.com/AmitNikhade

link to code: https://github.com/AmitNikhade/VAE_RCimages

Follow me on LinkedIn

All contents are Copyright © 2021 Amit Nikhade. All Rights Reserved.

Post Views:
165