Implementing a JPEG Encoder in Python (Part 1)

2025-12-25

JPEG is everywhere. It’s the standard for compressing continuous-tone images like photographs. But how does it actually work? In this two-part series, we’re going to build a JPEG codec from scratch in Python to understand the magic behind the compression.

In Part 1, we will focus on the Encoder—taking a raw image and crunching it down.

The Pipeline

Encoding a JPEG involves several distinct steps. We’ll implement them one by one:

Color Space Conversion: RGB to YCbCr.
Subsampling: Reducing color resolution.
Blocking: Splitting the image into 8x8 blocks.
DCT (Discrete Cosine Transform): Converting spatial data to frequency domain.
Quantization: Throwing away the high-frequency detail.
Entropy Coding: Compressing the result (we’ll implement a basic version of this).

Let’s dive in.

1. Color Space Conversion (RGB to YCbCr)

Computers usually represent images in RGB (Red, Green, Blue). However, the human eye is more sensitive to brightness (Luminance) than color (Chrominance). JPEG takes advantage of this by separating these components.

Y: Luminance (Brightness)
Cb: Blue-difference chroma
Cr: Red-difference chroma

Here is the formula and the Python implementation:

 1import numpy as np
 2from PIL import Image
 3
 4def rgb_to_ycbcr(r, g, b):
 5    y  =  0.299 * r + 0.587 * g + 0.114 * b
 6    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
 7    cr =  0.5 * r - 0.4187 * g - 0.0813 * b + 128
 8    return y, cb, cr
 9
10# Ideally, we process the image as numpy arrays for speed
11def convert_image(image_path):
12    img = Image.open(image_path)
13    img_arr = np.array(img, dtype=float)
14    r, g, b = img_arr[:,:,0], img_arr[:,:,1], img_arr[:,:,2]
15    return rgb_to_ycbcr(r, g, b)

2. 8x8 Blocking & Subsampling

JPEG processes images in 8x8 distinct blocks. All subsequent operations (DCT, Quantization) happen on these independent blocks.

We also perform Chroma Subsampling here. Since eyes are less sensitive to color, we can halve the resolution of the Cb and Cr channels (e.g., 4:2:0 subsampling) without much perceptible loss.

 1def make_blocks(channel, N=8):
 2    h, w = channel.shape
 3    # Pad to multiple of 8
 4    h_pad = (h + N - 1) // N * N
 5    w_pad = (w + N - 1) // N * N
 6    padded = np.zeros((h_pad, w_pad))
 7    padded[:h, :w] = channel
 8    
 9    blocks = []
10    for i in range(0, h_pad, N):
11        for j in range(0, w_pad, N):
12            blocks.append(padded[i:i+N, j:j+N])
13    return blocks

3. Discrete Cosine Transform (DCT)

This is the heart of JPEG. The DCT transforms our 8x8 pixel block from the spatial domain (pixel values) to the frequency domain (coefficients).

The top-left coefficient (DC) represents the average color of the block. The bottom-right coefficients represent high-frequency details (noise, sharp edges).

1import scipy.fftpack
2
3def dct_2d(block):
4    # Perform 2D DCT (Type II)
5    return scipy.fftpack.dct(
6        scipy.fftpack.dct(block.T, norm='ortho').T, norm='ortho'
7    )

4. Quantization

This is where the “lossy” part happens. We divide our DCT coefficients by a “Quantization Table” and round to the nearest integer. The tables are carefully tuned to discard high-frequency information that we can’t see well.

 1# Standard JPEG Luminance Quantization Table
 2Q_TABLE = np.array([
 3    [16, 11, 10, 16, 24, 40, 51, 61],
 4    [12, 12, 14, 19, 26, 58, 60, 55],
 5    [14, 13, 16, 24, 40, 57, 69, 56],
 6    [14, 17, 22, 29, 51, 87, 80, 62],
 7    [18, 22, 37, 56, 68, 109, 103, 77],
 8    [24, 35, 55, 64, 81, 104, 113, 92],
 9    [49, 64, 78, 87, 103, 121, 120, 101],
10    [72, 92, 95, 98, 112, 100, 103, 99]
11])
12
13def quantize(block, q_table):
14    return np.round(block / q_table)

After quantization, you’ll find that most of your 64 coefficients are now zeros, especially in the bottom-right corner. This is why JPEG compresses so well!

5. Zigzag Scan & Entropy Coding

To store these coefficients efficiently, we linearize the 8x8 matrix using a “zigzag” pattern, grouping the zeros at the end. We can then use Run-Length Encoding (RLE) to say “skip 15 zeros” instead of writing “0, 0, 0…”.

Finally, these symbols are Huffman encoded.

(Full Huffman implementation is a bit verbose for this post, but you can find the complete source code in the gist below!)

Conclusion

We’ve transformed raw pixels into quantized frequency coefficients. The file size is now a fraction of the original.

In Part 2, we will reverse this entire process to build the Decoder and view our compressed image.

Check out the full Encoder Implementation here (Gist)