Machine Learning Newsletter

Discrete Cosine Transform and A Case Study on Image Compression

Discrete Fourier Transform(DFT) and Discrete Cosine Transform(DCT) are commonly used algorithms to represent an arbitrary signal in terms of orhonormal basises. In DFT case, these basis functions are cosine and sinusoids(in complex form) where DCT depends on cosine signals to represent the signal. Even though DCT is more used than DFT, the approach that they use to represent signal are similar but they only differ by basis functions.

Definitions

DFT

DFT tries to represent the signal in the following form: $$ X_k = \sum_{n=0}^{N-1} x_n e^{{-2j\pi kn}{N}} $$ where $$k \in Z $$ and $e^{\frac{-2j\pi kn}{N}}$ could be simplified by the Euler's formula in the following form: $$e^{\frac{-2j\pi kn}{N}} = cos(\frac{2\pi kn}{N}) + jsin(\frac{2\pi kn}{N})$$

DCT

DCT uses only cosines $$ X_k = \sum_{n=0}^{N-1} x_n cos[\frac{\pi}{N} (n + \frac{1}{2}) k] $$ where $$ k = 0, \ldots, N-1 $$

Since the basis functions are real cosine functions in DCT, the signals that we are trying to compress using DCT needs to be real as well, where DFT could represent complex signals as well. Experimentally and also theoretically(with a better boundary condisitions), DCT transform yields smaller coefficients for natural images and audios. Therefore, for compression purposes, DCT is de facto method for a number of applications including JPEG and lossy compression for audios.

Let's see how one might approach the compression of images using.

DCT

In [3]:
%matplotlib inline
import io
import os
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from scipy import fftpack
import urllib2
import IPython

I am reading the image from url using PIL and converting it into a numpy array after converted grayscale image.

Scipy provides dct out of the box in the fftpack submodule, so one can implement 2-d dct by taking dct on the rows of the image, and then get the dct of the transposed image in the same way. I am using nomr='ortho' as this corresponds to Type II dct which is when people say dct, what they refer to. If you are familiar with Matlab, this optional parameter makes what you get when you use dct(y) in Matlab, too. (Note that there are eight different variants of dct and this one is the most common, I will not mention differences, but feel free to check it in wikipedia.)

For idct, it is almost same with dct, with two i's are introduced into the function as fftpack provides very similar api for idct with dct, it becomes relatively straightforward.

In order to reconstruct the image, the image is clipped between 0 and 255 in order to have uint8 to have an image that we begin with.

In [4]:
image_url='http://i.imgur.com/8vuLtqi.png'

def get_image_from_url(image_url='http://i.imgur.com/8vuLtqi.png', size=(128, 128)):
    file_descriptor = urllib2.urlopen(image_url)
    image_file = io.BytesIO(file_descriptor.read())
    image = Image.open(image_file)
    img_color = image.resize(size, 1)
    img_grey = img_color.convert('L')
    img = np.array(img_grey, dtype=np.float)
    return img
 
def get_2D_dct(img):
    """ Get 2D Cosine Transform of Image
    """
    return fftpack.dct(fftpack.dct(img.T, norm='ortho').T, norm='ortho')

def get_2d_idct(coefficients):
    """ Get 2D Inverse Cosine Transform of Image
    """
    return fftpack.idct(fftpack.idct(coefficients.T, norm='ortho').T, norm='ortho')

def get_reconstructed_image(raw):
    img = raw.clip(0, 255)
    img = img.astype('uint8')
    img = Image.fromarray(img)
    return img
In [6]:
pixels = get_image_from_url(image_url=image_url, size=(256, 256))
dct_size = pixels.shape[0]
dct = get_2D_dct(pixels)
reconstructed_images = []

for ii in range(dct_size):
    dct_copy = dct.copy()
    dct_copy[ii:,:] = 0
    dct_copy[:,ii:] = 0
    
    # Reconstructed image
    r_img = get_2d_idct(dct_copy);
    reconstructed_image = get_reconstructed_image(r_img);

    # Create a list of images
    reconstructed_images.append(reconstructed_image);
In [7]:
plt.figure(figsize=(16, 12));
plt.scatter(range(dct.ravel().size), np.log10(np.abs(dct.ravel())), c='#348ABD', alpha=.3);
plt.title('DCT Coefficient Amplitude vs. Order of Coefficient');
plt.xlabel('Order of DCT Coefficients');
plt.ylabel('DCT Coefficients Amplitude in log scale');
In [8]:
plt.figure(figsize=(16, 12))
plt.hist(np.log10(np.abs(dct.ravel())), bins=100, color='#348ABD', alpha=.3, histtype='stepfilled');
plt.xlabel('Amplitude of DCT Coefficients (log-scaled)');
plt.ylabel('Number of DCT Coefficients');
plt.title('Amplitude distribution of DCT Coefficients');

These graphs are important in several ways. First it tells us that large coefficients and very small coefficients is quite small number.(Note that X axis is log-scaled.) So, the trade-off between compression and image quality generally depends on the coefficients that have middle range values. It is easy to get very large coefficients and reject very small coefficients in the reconstructed image but not very easy to either include or reject middle values based on their solely amplitudes. In these coefficients, one needs to look at the frequencies that they belong to, if they are in somehow high frequency range, then it would be rejected whereas if they belong to lower frequency range, it may introduce noticeable and large artifacts into the signal. Instead of comparing to only Root Mean Squared Error(RMMS) to learn where to stop in the coefficients, one could check better metrics which consider visual fidelity or even perceived quality to find the sweet spot between compression ratio and image quality. In implementations at least for JPEG, this is done by a predetermined number. Therefore, if an image has a different pixel distributions than most of the natural images, or has very high frequency compoenents, then one could see very noticeable artifacts around the edges in the images.

Wavelet based approaches(JPEG 2000) is far better than DCT based approaches in textured images or images that have a lot of edges as they could preserve local texture better than dct-based approaches. However, JPEG2000's licence is not permissive unlike JPEG and JPEG seems pretty work well for people who are not professional photographers. Therefore, the adoption.

In [9]:
plt.matshow(np.abs(dct[:50, :50]), cmap=plt.cm.Paired);
plt.title('First 2500 coefficients in Grid');

If we look at the first first 2500 coefficients in a 50x50 grid, then we could see that a lot of the coefficients are actually very small comparing to the few very large ones. This not only provides a good compaction for the image(less coeffcients means high compaction rate), but also provides a good compromise between compression and image quality. Generally, very low frequencies have a higher ratio of magnitude orders and similar to very high frequencies.

First 64 Images

In [10]:
fig = plt.figure(figsize=(16, 16))
for ii in range(64):
    plt.subplot(8, 8, ii + 1)
    plt.imshow(reconstructed_images[ii], cmap=plt.cm.gray)
    plt.grid(False);
    plt.xticks([]);
    plt.yticks([]);