Music Transformer's Improved Relative Positional Encoding

This post contains my notes about the improved relative positional encoding method from the paper “Music Transformer” (Huang et al., 2018). References to “the paper” will refer to the aforementioned one, unless otherwise stated. This post assumes knowledge of the original Transformer and relative positional encoding. The improved relative positional encoding method proposed by the paper is a significant contribution, as it reduces the intermediate memory requirement of relative positional encodings in self-attention (as in “Self-Attention with Relative Position Representations” (Shaw et al....
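For intuition, here is a minimal NumPy sketch of the “skewing” step as I understand it from the paper: instead of materializing an (L, L, D) tensor of per-pair relative embeddings, one multiplies Q by the (L, D) relative-embedding matrix and then rearranges the resulting (L, L) matrix so each entry lines up with the correct relative distance. The function name and shapes are illustrative, not taken from the post.

```python
import numpy as np

def skew(qe_t):
    # qe_t: (L, L) matrix Q @ E^T, where row i scores query i against the
    # relative-position embeddings for distances -(L-1) ... 0.
    L = qe_t.shape[0]
    padded = np.pad(qe_t, ((0, 0), (1, 0)))   # prepend a dummy column -> (L, L+1)
    reshaped = padded.reshape(L + 1, L)       # reinterpret the buffer as (L+1, L)
    return reshaped[1:, :]                    # keep the last L rows -> aligned S_rel, shape (L, L)
```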

2022-02-05

Self-Attention with Relative Positional Encoding

This post contains my notes about the relative positional encoding method proposed as an alternative to sinusoidal positional encoding in Transformers, as in the paper “Self-Attention with Relative Position Representations” (Shaw et al., 2018). References to “the paper” will refer to the aforementioned one, unless otherwise stated. This post assumes knowledge of self-attention in the original Transformer. Absolute versus relative positional encoding Before discussing absolute and relative positional encodings, here is what I mean by them in this context....

2022-01-30

Sinusoidal Positional Encoding in the Transformer

This post contains my notes about the positional encoding in the original Transformer (a recurrence-free, attention-based, sequence-to-sequence processing model), as in the paper “Attention Is All You Need” (Vaswani et al., 2017). I want to give the positional encoding special attention, because I found it to be the most non-trivial, and most interesting, part of the Transformer architecture. Transformer architecture. Image source: "Attention Is All You Need" (Vaswani et al....
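For reference, the sinusoidal encoding defined in that paper assigns position $pos$ and dimension index $i$ the values

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)$$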

2022-01-23

Multi-Head Attention in the Transformer

This post contains some of my notes, mostly on the multi-head attention aspect of the original Transformer (a recurrence-free, attention-based, sequence-to-sequence processing model), as in the paper “Attention Is All You Need” (Vaswani et al., 2017). It assumes familiarity with the attention mechanism (see my post on that). Transformer architecture. Image source: "Attention Is All You Need" (Vaswani et al., 2017). Above is a diagram of the Transformer architecture for reference....
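For reference, multi-head attention as defined in that paper runs several scaled dot-product attention “heads” in parallel over learned projections and concatenates the results:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O, \quad \text{where } \mathrm{head}_i = \mathrm{Attention}(Q W_i^Q,\, K W_i^K,\, V W_i^V)$$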

2022-01-17

Introduction to the Attention Mechanism

This post is about the basics of the attention mechanism in sequence-to-sequence models. The attention mechanism is a foundation for highly performant NLP models like the Transformer. What is a sequence-to-sequence model? A sequence-to-sequence model consists of an encoder and a decoder. The encoder encodes a variable-length source sequence into something that the decoder decodes into a variable-length target sequence. They are often used for tasks like language translation and question-answering....
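As a preview of the mechanism itself, one common formulation has the decoder, at each step $t$, attend over all encoder hidden states $h_s$ by forming a weighted sum; here $s_t$ is the decoder state, $c_t$ is the resulting context vector, and the score function varies by attention variant. This is just the general shape, not necessarily the exact notation used in the post:

$$c_t = \sum_{s} \alpha_{t,s}\, h_s, \qquad \alpha_{t,s} = \frac{\exp\big(\mathrm{score}(s_t, h_s)\big)}{\sum_{s'} \exp\big(\mathrm{score}(s_t, h_{s'})\big)}$$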

2022-01-09

Image Super-Resolution With Generative Adversarial Networks

This is an overview of my implementation of image super-resolution using generative adversarial networks, based on the paper “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network” (Ledig et al., 2017). I did this project to increase my understanding of how GANs are used in practice, and to get better at reading and implementing research papers. Problem: we want to be able to upscale a given image to 4x its resolution, so that the upscaled details look as realistic as possible. Overview of solution: we will train two neural networks, a generator and a discriminator, against each other. The generator network will be trained to upscale images to 4x their resolution such that they fool the discriminator into classifying them as natural (“real”) high-resolution images. The discriminator network will be trained to discriminate between two types of images: natural high-resolution images, and images upscaled by the generator. Both an adversarial (cross-entropy-based) loss and a perceptual (VGG-based) loss will be used. Neural Network Architecture: diagram of the neural architecture, from the paper "...
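A hedged sketch of the kind of generator objective described above, assuming TensorFlow/Keras; the names (`generator`, `discriminator`, `vgg_features`) and the adversarial weight are placeholders rather than the exact values from my implementation or the paper:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
mse = tf.keras.losses.MeanSquaredError()

def generator_loss(lr_images, hr_images, generator, discriminator, vgg_features,
                   adv_weight=1e-3):
    sr_images = generator(lr_images)                  # 4x-upscaled ("super-resolved") images
    d_sr = discriminator(sr_images)
    adversarial = bce(tf.ones_like(d_sr), d_sr)       # reward fooling D into saying "real"
    perceptual = mse(vgg_features(hr_images),          # compare images in VGG feature space
                     vgg_features(sr_images))
    return perceptual + adv_weight * adversarial
```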

2021-12-15

Introduction to Generative Adversarial Networks

This post is a summary of some of my notes on vanilla generative adversarial networks (GANs). What is a generative adversarial network? It is a neural network architecture in which two separate neural networks are trained using each other’s outputs, in an adversarial game of sorts. One network is called the generator $G$, and tries to generate samples that the other network, the discriminator $D$, classifies as “real” (where “real” means from the same distribution as some training set)....
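A minimal sketch of what one step of this adversarial game might look like, assuming TensorFlow/Keras models $G$ and $D$ and a batch of real samples; this is illustrative, not the post's exact training code:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def train_step(real, G, D, g_opt, d_opt, noise_dim=100):
    z = tf.random.normal([tf.shape(real)[0], noise_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = G(z, training=True)
        d_real, d_fake = D(real, training=True), D(fake, training=True)
        # D: label real samples 1 and generated samples 0
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        # G: try to make D label generated samples 1 ("real")
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))
```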

2021-12-05

Addressing Border And Checkerboard Artifacts in CNNs

This post visually demonstrates why reflection padding and transpose convolutions used in some CNNs, for example the one used in “Perceptual Losses for Real-Time Style Transfer and Super-Resolution” (Johnson et al., 2016), are problematic. It also shows how some proposed solutions mitigate those problems. How to address border and checkerboard artifacts in CNNs? While reading “A Learned Representation For Artistic Style” (Dumoulin et al., 2016), I learned of their methodology that addresses the border and checkerboard artifacts sometimes noticed in CNNs, specifically in neural style transfer....
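One widely cited mitigation for checkerboard artifacts is to replace transposed convolutions with a plain resize (e.g. nearest-neighbor upsampling) followed by a regular convolution. Below is a hedged TensorFlow/Keras sketch of that substitution; it is not necessarily the exact configuration used in either paper or in my demonstration:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Strided transposed convolution: the kind of upsampling layer that is
# prone to checkerboard artifacts.
deconv = layers.Conv2DTranspose(64, kernel_size=3, strides=2, padding="same")

# Common alternative: resize first, then apply an ordinary convolution.
upsample_then_conv = tf.keras.Sequential([
    layers.UpSampling2D(size=2, interpolation="nearest"),
    layers.Conv2D(64, kernel_size=3, strides=1, padding="same"),
])
```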

2021-11-23

GPU-accelerated Computing With CUDA Python

What if you would like to leverage your GPU to massively parallelize computations, but you cannot trivially do that using a machine learning framework like Tensorflow? You can do that very easily using Numba and CUDA Python! This post is a basic practical guide on how to use them for GPU-accelerated computing. I originally followed this NVIDIA guide to set up CUDA Python for GPU-accelerated computing. I had just acquired a CUDA-ready graphics card and wanted to leverage it to render my complex-number fractals more quickly, because I like exploring them....
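As a taste of what the guide covers, here is a minimal Numba CUDA kernel that squares an array on the GPU (assuming a CUDA-capable GPU and a working `numba`/CUDA toolkit install; the kernel itself is illustrative, not taken from the post):

```python
import numpy as np
from numba import cuda

@cuda.jit
def square_kernel(x, out):
    i = cuda.grid(1)          # absolute index of this thread across the whole grid
    if i < x.size:            # guard: the grid may have more threads than elements
        out[i] = x[i] * x[i]

x = np.arange(1_000_000, dtype=np.float32)
out = np.empty_like(x)

threads_per_block = 256
blocks = (x.size + threads_per_block - 1) // threads_per_block
square_kernel[blocks, threads_per_block](x, out)  # Numba copies host arrays to/from the device
```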

2021-11-18

Neural Style Transfer: Introduction and Basic Techniques

This is an introduction to neural style transfer (NST), and an explanation of a few very basic techniques used. I was motivated to learn about neural style transfer because I thought it would be a fun way to get better at implementing research paper ideas in Tensorflow, and to learn more about image-processing neural networks. What does neural style transfer do? Given a content image and a style image, a neural style transfer algorithm outputs a new image whose contents (ex....
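For a flavor of one very basic building block that often appears in NST, here is a small TensorFlow sketch of a Gram matrix computed from a CNN feature map, which is commonly used to represent an image's “style”; the post's exact techniques may differ:

```python
import tensorflow as tf

def gram_matrix(features):
    # features: (height, width, channels) activation map from one CNN layer
    h, w, c = features.shape
    flat = tf.reshape(features, [h * w, c])
    # Channel-by-channel correlations, normalized by the number of spatial positions.
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)  # (c, c)
```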

2021-11-10