Project 2: Fun with Filters and Frequencies!
Part 1: Fun with Filters
Part 1.1: Convolutions from Scratch!
I will be using my profile picture for the convolutions (it will be read as grayscale):
Four For-Loop Convolution Implementation
import numpy as np

def convolution_four(im, filter):
    H, W = im.shape
    h, w = filter.shape
    pad_h, pad_w = h // 2, w // 2
    im_pad = np.pad(im, ((pad_h, pad_h), (pad_w, pad_w)))  # zero-fill padding
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            val = 0
            for m in range(h):
                for n in range(w):
                    val += im_pad[i + m, j + n] * filter[(h-1)-m, (w-1)-n]  # get flipped filter index
            out[i, j] = val
    return out
Two For-Loop Convolution Implementation
def convolution_two(im, filter):
    H, W = im.shape
    h, w = filter.shape
    pad_h, pad_w = h // 2, w // 2
    im_pad = np.pad(im, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros((H, W))
    filter = filter[::-1, ::-1]  # flip filter horizontally/vertically
    for i in range(H):
        for j in range(W):
            patch = im_pad[i:i+h, j:j+w]
            out[i, j] = np.sum(patch * filter)
    return out
Both implementations handle boundaries with zero-fill padding: the image is padded with h // 2 rows of zeros on the top and bottom and w // 2 columns on the left and right (half the kernel height and width, respectively), so the output has the same size as the input.
Comparison to Built-In Convolution (scipy.signal.convolve2d)
Now, I will use my profile image to test my implementations against the built-in convolution function scipy.signal.convolve2d. I will try each convolution method with a box filter, and with the finite difference operators \(D_x = \begin{pmatrix} 1 & 0 & -1 \end{pmatrix}\) and \(D_y = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}\).
from time import perf_counter

import scipy as sp
import scipy.signal
import skimage as sk
import skimage.io as skio

me_gray = skio.imread('./data/me.jpg', as_gray=True)
me_gray = sk.img_as_float(me_gray)
box_filter = np.ones((9, 9)) / 81  # 9x9 box filter, normalized to sum to 1
Dx = np.array([[1, 0, -1]])
Dy = np.array([[1], [0], [-1]])
start = perf_counter()
conv_four_box = convolution_four(me_gray, box_filter)
end = perf_counter()
print(f'Convolution (four loops) time: {end - start:.4f} seconds')
conv_four_dx = convolution_four(me_gray, Dx)
conv_four_dy = convolution_four(me_gray, Dy)
start = perf_counter()
conv_two_box = convolution_two(me_gray, box_filter)
end = perf_counter()
print(f'Convolution (two loops) time: {end - start:.4f} seconds')
conv_two_dx = convolution_two(me_gray, Dx)
conv_two_dy = convolution_two(me_gray, Dy)
start = perf_counter()
conv_scipy_box = sp.signal.convolve2d(me_gray, box_filter, mode='same')
end = perf_counter()
print(f'Built-In Convolution (scipy) time: {end - start:.4f} seconds')
conv_scipy_dx = sp.signal.convolve2d(me_gray, Dx, mode='same')
conv_scipy_dy = sp.signal.convolve2d(me_gray, Dy, mode='same')
Convolution (four loops) time: 11.0136 seconds
Convolution (two loops) time: 1.6353 seconds
Built-In Convolution (scipy) time: 1.0861 seconds
As expected, the four for-loop convolution is significantly slower than the two for-loop convolution and the built-in convolution: it performs all \(O(HWhw)\) multiply-adds in pure Python, while the two-loop version vectorizes the inner kernel loops with NumPy. The built-in convolution is faster than the two for-loop convolution, but both are within the same order of magnitude.
Here are the results for each convolution method:
Four For-Loop Convolution
Box Filter
\(D_x\) (Vertical edge detector)
\(D_y\) (Horizontal edge detector)
Two For-Loop Convolution
Box Filter
\(D_x\) (Vertical edge detector)
\(D_y\) (Horizontal edge detector)
Built-In Convolution
Box Filter
\(D_x\) (Vertical edge detector)
\(D_y\) (Horizontal edge detector)
All three convolution methods produce the same results. The box filter smooths/blurs the image, while the finite difference operators highlight edges in the horizontal and vertical directions.
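As a quick sanity check (a sketch reusing the arrays computed above), the outputs can also be compared numerically; since my implementations and scipy.signal.convolve2d both use zero-fill boundaries here, they should agree up to floating-point error:
# All three methods use zero-fill boundaries, so the outputs should match
# up to floating-point rounding.
assert np.allclose(conv_four_box, conv_scipy_box)
assert np.allclose(conv_two_box, conv_scipy_box)
assert np.allclose(conv_two_dx, conv_scipy_dx)
assert np.allclose(conv_two_dy, conv_scipy_dy)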
Part 1.2: Finite Difference Operator
I will be using this image:
Code
Dx = np.array([[1, 0, -1]])
Dy = np.array([[1], [0], [-1]])
...
cameraman = skio.imread('./data/cameraman.jpg', as_gray=True)
cameraman = sk.img_as_float(cameraman)
Ix = sp.signal.convolve2d(cameraman, Dx, mode='same', boundary='symm')
Iy = sp.signal.convolve2d(cameraman, Dy, mode='same', boundary='symm')
gradient_magnitude = np.sqrt(Ix**2 + Iy**2)
edge_image = (gradient_magnitude > 0.35).astype(np.uint8) * 255
Here, I convolved the image with the finite difference operators \(D_x\) and \(D_y\) in order to get the partial derivatives in \(x\), \(I_x\), and \(y\), \(I_y\). Then using \(I_x\) and \(I_y\), I computed the gradient magnitude image by \(\sqrt{I_x^2 + I_y^2}\).
Finally, I created a binarized edge image by thresholding the gradient magnitude image at \(>0.35\). This took a few iterations of fine-tuning, as I was aiming for an edge image that keeps all prominent edges while removing as much noise as possible (e.g., the grass). Note that it was difficult to capture the edges of the buildings in the background; at this threshold, the filters mostly extract the edges of the cameraman in the foreground.
Partial derivative in \(x\), \(I_x\)
Partial derivative in \(y\), \(I_y\)
Gradient Magnitude Image
A binarized edge image (threshold \(>0.35\))
Part 1.3: Derivative of Gaussian (DoG) Filter
I will be using the same image from Part 1.2. I wasn't able to cleanly remove the noise in the previous results, so now I will use a Gaussian filter to first smooth/blur the image. Using the built-in 1D Gaussian kernel function cv2.getGaussianKernel(), I created a 2D Gaussian filter for images by taking the outer product of the 1D Gaussian and its transpose.
import cv2

def gaussian_filter(kernel_size=9, sigma=2):
    g1d = cv2.getGaussianKernel(kernel_size, sigma)  # (kernel_size, 1) column vector, sums to 1
    return g1d @ g1d.T  # outer product -> 2D Gaussian kernel
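A quick check (a sketch): since cv2.getGaussianKernel() returns a normalized 1D kernel, the outer product also sums to 1, so blurring with this filter preserves overall image brightness:
G = gaussian_filter(9, 2)
print(G.shape)  # (9, 9)
print(G.sum())  # ~1.0, up to floating-point error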
This is what the blurred version of the cameraman looks like after convolving it with this Gaussian filter:
Now, repeating the same procedure from Part 1.2,
G = gaussian_filter()
blurred = sp.signal.convolve2d(cameraman, G, mode='same', boundary='symm')
Ix = sp.signal.convolve2d(blurred, Dx, mode='same', boundary='symm')
Iy = sp.signal.convolve2d(blurred, Dy, mode='same', boundary='symm')
gradient_mag_blur = np.sqrt(Ix**2 + Iy**2)
edge_image_blur = (gradient_mag_blur > 0.09).astype(np.uint8) * 255
which gets these results:
Partial derivative in \(x\), \(I_x\) (blurred)
Partial derivative in \(y\), \(I_y\) (blurred)
Gradient Magnitude Image (blurred)
A binarized edge image (threshold \(>0.09\)) (blurred)
I had to lower the thresholding value to \(>0.09\). Now, I can do the same thing but avoid having to smooth the original image by convolving the Gaussian with \(D_x\) and \(D_y\) to get derivative of Gaussian (DoG) filters:
Derivative of Gaussian in \(x\)
Derivative of Gaussian in \(y\)
Note: the filters themselves are small kernels, but when applied they reveal strong horizontal/vertical edges.
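This works because convolution is associative: \((I * G) * D_x = I * (G * D_x) = I * \text{DoG}_x\). Convolving the image once with the precomputed DoG kernels therefore matches blurring first and then differentiating, up to small boundary effects from the mode='same' truncation.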
DoGx = sp.signal.convolve2d(G, Dx, mode='same', boundary='symm')
DoGy = sp.signal.convolve2d(G, Dy, mode='same', boundary='symm')
Ix_DoG = sp.signal.convolve2d(cameraman, DoGx, mode='same', boundary='symm')
Iy_DoG = sp.signal.convolve2d(cameraman, DoGy, mode='same', boundary='symm')
gradient_mag_dog = np.sqrt(Ix_DoG**2 + Iy_DoG**2)
edge_image_dog = (gradient_mag_dog > 0.09).astype(np.uint8) * 255
And we get practically the same results:
Partial derivative in \(x\), \(I_x\) (DoG)
Partial derivative in \(y\), \(I_y\) (DoG)
Gradient Magnitude Image (DoG)
A binarized edge image (threshold \(>0.09\)) (DoG)
I used the same thresholding value \(>0.09\). We can compare the edge images for all three methods:
Derivative of Gaussian (DoG) filter method
Gaussian smoothing method
Finite difference method
The results for DoG and Gaussian smoothing are much cleaner, as the Gaussian filter was able to smooth out the noise in the image.
Part 2: Fun with Frequencies!
Part 2.1: Image "Sharpening"
This is my implementation of the unsharp mask filtering technique:
def unsharp_masking(im, kernel_size=9, sigma=2, alpha=1.0):
    G = gaussian_filter(kernel_size, sigma)
    sharpened = np.zeros_like(im, dtype=np.float32)
    high_freq_all = np.zeros_like(im, dtype=np.float32)
    for channel in range(im.shape[2]):
        blurred = sp.signal.convolve2d(im[:, :, channel], G, mode='same', boundary='symm')
        high_freq = im[:, :, channel] - blurred
        high_freq_all[:, :, channel] = high_freq
        sharpened[:, :, channel] = im[:, :, channel] + (alpha * high_freq)
    return np.clip(sharpened, 0, 1), high_freq_all
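As a side note, because convolution is linear, this blur-subtract-add pipeline can be folded into a single convolution with one "unsharp mask" kernel, where \(e\) is the unit impulse (identity kernel) and \(g\) is the Gaussian:
\[ f + \alpha (f - f * g) = f * \big( (1 + \alpha)\, e - \alpha g \big) \]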
Here, I use the Gaussian filter as a low-pass (blur) filter to extract the low frequencies of each color channel of an image, then subtract these blurry channels from the original image to obtain only the high frequencies. I then add these high frequencies back into the original image to "sharpen" it. I noticed that my profile image was a bit blurry. So after running,
taj = skio.imread('./data/taj.jpg')
taj = sk.img_as_float(taj)
me = skio.imread('./data/me.jpg')
me = sk.img_as_float(me)
sharpened_taj, high_freq_taj = unsharp_masking(taj)
sharpened_me, high_freq_me = unsharp_masking(me)
these are the results for the Taj Mahal image and for me:
Original blurry image of the Taj Mahal
High-frequencies of the Taj Mahal
Sharpened version of the Taj Mahal
Original blurry image of me
High-frequencies of me
Sharpened version of me
For evaluation, I blurred the sharpened image of the Taj Mahal and then tried to sharpen it again:
G = gaussian_filter()  # same default kernel size/sigma as unsharp_masking
blurred_sharp_taj = np.zeros_like(sharpened_taj, dtype=np.float32)
for channel in range(sharpened_taj.shape[2]):
    blurred_sharp_taj[:, :, channel] = sp.signal.convolve2d(sharpened_taj[:, :, channel], G,
                                                            mode='same', boundary='symm')
sharpened_blurred_sharp_taj, high_freq_blurred_sharp_taj = unsharp_masking(blurred_sharp_taj)
These are the results:
Blurred version of the sharpened Taj Mahal
High-frequencies of the blurred sharpened Taj Mahal
Sharpened version of the blurred sharpened Taj Mahal
As you can see, the "sharpening" doesn't really work here: blurring an image removes most of its high frequencies, so "sharpening" can only amplify the little high-frequency content that remains, and the result looks similar to the blurred input.
I will now demonstrate how varying \(\alpha\) (the sharpening amount) affects the results. The previous results used \(\alpha = 1.0\).
\(\alpha = 0.5\)
\(\alpha = 2.0\)
\(\alpha = 10.0\)
The higher the \(\alpha\), the stronger the sharpness becomes in the resulting image. \(\alpha\) essentially amplifies the high frequencies that are added back into the original image.
Part 2.2: Hybrid Images
I will create hybrid images for the following 3 pairs:
I first use the provided starter code (align_image_code.py's align_images()) to align each pair of images where I want (e.g., for Derek + Nutmeg, aligning their eyes).
Then, I implemented code to low-pass filter one image (Derek) and high-pass filter the other image (Nutmeg). I used a standard 2D Gaussian filter for the low-pass filter; for the high-pass filter, I simply subtract the low-pass filtered image from the original.
def low_pass_filter(im, kernel_size=9, sigma=2):
    G = gaussian_filter(kernel_size, sigma)
    blurred_im = np.zeros_like(im, dtype=np.float32)
    for channel in range(im.shape[2]):
        blurred_im[:, :, channel] = sp.signal.convolve2d(im[:, :, channel], G, mode='same', boundary='symm')
    return blurred_im

def high_pass_filter(im, kernel_size=9, sigma=2):
    return im - low_pass_filter(im, kernel_size, sigma)
Using these filters, I simply add the low-pass filtered image and the high-pass filtered image together, with an adjustable cutoff frequency \(\sigma\) for each filter.
def hybrid_image(high_im, low_im, kernel_size=9, sigma_high=2, sigma_low=2):
    high_im = high_pass_filter(high_im, kernel_size, sigma_high)
    low_im = low_pass_filter(low_im, kernel_size, sigma_low)
    hybrid = high_im + low_im
    return np.clip(hybrid, 0, 1), high_im, low_im
I also wanted to keep track of the frequency content of all the images throughout the process. For visualization, I compute the log-magnitude of the Fourier transform, with the zero-frequency component shifted to the center of the image using np.fft.fftshift(). This makes it easier to see the distribution of high- and low-frequency components.
def save_fft(im, filename):
    if im.ndim == 3:
        im_gray = np.mean(im, axis=2)  # convert to grayscale
    else:
        im_gray = im
    f = np.fft.fft2(im_gray)
    fshift = np.fft.fftshift(f)  # shift zero frequency to the center
    magnitude = np.log(np.abs(fshift) + 1e-8)  # small epsilon avoids log(0)
    magnitude = magnitude / np.max(magnitude)  # normalize
    convert_and_save_uint8(magnitude, filename)
Now, choosing the cutoff frequencies \(\sigma_{\text{high}} = 10\) and \(\sigma_{\text{low}} = 25\), the result of the code
derek = skio.imread('./data/DerekPicture.jpg')
derek = sk.img_as_float(derek)
nutmeg = skio.imread('./data/nutmeg.jpg')
nutmeg = sk.img_as_float(nutmeg)
nutmeg_aligned, derek_aligned = align_images(nutmeg, derek)
sigma_high = 10
sigma_low = 25
hybrid, high_im, low_im = hybrid_image(high_im=nutmeg_aligned, low_im=derek_aligned, sigma_high=sigma_high,
sigma_low=sigma_low)
save_fft(nutmeg_aligned, 'fft_nutmeg')
save_fft(derek_aligned, 'fft_derek')
save_fft(low_im, 'fft_low')
save_fft(high_im, 'fft_high')
save_fft(hybrid, 'fft_hybrid')
is:
Aligned images
Aligned Nutmeg
Aligned Derek
Aligned Nutmeg (log-magnitude FFT)
Aligned Derek (log-magnitude FFT)
Filtered results
High-pass filtered Nutmeg
Low-pass filtered Derek
High-pass filtered Nutmeg (log-magnitude FFT)
Low-pass filtered Derek (log-magnitude FFT)
Final image
Hybrid image
Hybrid image (log-magnitude FFT)
Here are the results of the other pairs of images:
Hybrid image (Curious me)
Hybrid image (Basketball)
Part 2.3: Gaussian and Laplacian Stacks
Here is my implementation for the Gaussian and Laplacian stacks:
def gaussian_stack(im, levels=5, sigma=2):
    G = gaussian_filter(9, sigma)
    stack = [im]
    for _ in range(1, levels):
        if im.ndim == 3:  # color image
            level = np.zeros_like(stack[-1])
            for channel in range(im.shape[2]):
                level[:, :, channel] = sp.signal.convolve2d(stack[-1][:, :, channel], G,
                                                            mode='same', boundary='symm')
        else:
            level = sp.signal.convolve2d(stack[-1], G, mode='same', boundary='symm')
        stack.append(level)
    return stack

def laplacian_stack(im, levels=5, sigma=2):
    g_stack = gaussian_stack(im, levels, sigma)
    l_stack = []
    for i in range(levels - 1):
        l_stack.append(g_stack[i] - g_stack[i+1])
    l_stack.append(g_stack[-1])  # last level
    return l_stack, g_stack
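A useful property check (a sketch, reusing the cameraman image from Part 1.2): the Laplacian stack telescopes, so summing all of its levels reconstructs the original image:
# (g0 - g1) + (g1 - g2) + ... + g_last collapses back to g0, the original image.
l_stack, g_stack = laplacian_stack(cameraman, levels=5)
assert np.allclose(np.sum(l_stack, axis=0), cameraman)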
Applying these to the oraple for 9 levels:
Part 2.4: Multi-Resolution Blending (a.k.a. the Oraple!)
Using the Gaussian and Laplacian stacks, I implemented Burt and Adelson's multi-resolution spline algorithm from the 1983 paper in order to create the oraple, along with two other images.
def multires_spline(im1, im2, mask, levels=5, sigma=2):
    # Step 1a: Laplacian stacks of both input images
    l1, _ = laplacian_stack(im1, levels, sigma)
    l2, _ = laplacian_stack(im2, levels, sigma)
    # Step 1b: Gaussian stack of the mask (progressively smoother seam)
    gr = gaussian_stack(mask, levels, sigma)
    # Step 2: blend each level, weighted by the corresponding mask level
    LS = []
    for i in range(levels):
        LS.append(gr[i] * l1[i] + (1 - gr[i]) * l2[i])
    # Step 3: collapse the blended stack by summing all levels
    out = LS[-1].copy()  # copy so the += below doesn't mutate LS[-1]
    for i in range(len(LS) - 2, -1, -1):
        out += LS[i]
    return out
Since I am using stacks instead of pyramids, every level stays at the original image resolution, so the mask (and its Gaussian stack) is applied at full size at every level.
Oraple
I will be using a smoothed vertical seam as the mask to nicely blend the apple and orange together.
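For reference, a minimal sketch of how such a mask can be built (the variable names, the half-and-half split, and the pre-smoothing parameters here are my assumptions):
H, W = apple.shape[:2]  # assumes the apple and orange images have the same size
mask = np.zeros((H, W, 3))
mask[:, :W // 2] = 1.0  # left half selects im1 (the apple), right half im2 (the orange)
mask = low_pass_filter(mask, kernel_size=31, sigma=10)  # smooth the vertical seam (values assumed)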
Inputs
An apple
An orange
Masked Inputs
An apple (masked)
An orange (masked)
Multi-resolution blended image
An oraple
Rothko
I will be using a smoothed horizontal seam as the mask to blend both of the Rothkos.
Inputs
Mark Rothko, Orange and Tan. 1954
Mark Rothko, No. 8. 1949
Masked Inputs
Mark Rothko, Orange and Tan. 1954 (masked)
Mark Rothko, No. 8. 1949 (masked)
Multi-resolution blended image
A hybrid of Orange and Tan and No. 8 by Mark Rothko
Manga
I will be using a hand-drawn mask in order to stitch together two manga panels. This is what the mask looks like before smoothing:
Inputs
A panel from Vagabond, by Takehiko Inoue
A panel from Hunter x Hunter, by Yoshihiro Togashi
Masked Inputs
A panel from Vagabond, by Takehiko Inoue (masked)
A panel from Hunter x Hunter, by Yoshihiro Togashi (masked)
Multi-resolution blended image
A hybrid of manga panels from Vagabond and Hunter x Hunter
Conclusion
Through this project, I gained experience with both spatial- and frequency-domain techniques in image processing. Starting from basic convolution implementations, I moved on to unsharp masking for sharpening, hybrid images that exploit low- and high-frequency perception, and finally Gaussian/Laplacian stacks for multi-resolution blending.
One of the key lessons was that blending quality depends critically on the mask: a sharp binary mask produces visible seams, while smoothing the mask before applying the multi-resolution algorithm yields seamless transitions (shoutout to Anonymous Clam). Conceptually, I also deepened my understanding of how the Fourier transform reveals frequency content and why manipulating low vs. high frequencies leads to different perceptual effects.