Project 2: Fun with Filters and Frequencies!

[View Assignment]

Part 1: Fun with Filters

Part 1.1: Convolutions from Scratch!

I will be using my profile picture for the convolutions (it will be read as grayscale):

Four For-Loop Convolution Implementation


                        import numpy as np

                        def convolution_four(im, filter):
                            H, W = im.shape
                            h, w = filter.shape
                            pad_h, pad_w = h // 2, w // 2  # half the kernel size on each side

                            im_pad = np.pad(im, ((pad_h, pad_h), (pad_w, pad_w)))  # zero-fill padding
                            out = np.zeros((H, W))

                            for i in range(H):
                                for j in range(W):
                                    val = 0
                                    for m in range(h):
                                        for n in range(w):
                                            val += im_pad[i + m, j + n] * filter[(h-1)-m, (w-1)-n] # get flipped filter index
                                    out[i, j] = val

                            return out
                    

Two For-Loop Convolution Implementation


                        def convolution_two(im, filter):
                            H, W = im.shape
                            h, w = filter.shape
                            pad_h, pad_w = h // 2, w // 2

                            im_pad = np.pad(im, ((pad_h, pad_h), (pad_w, pad_w)))
                            out = np.zeros((H, W))

                            filter = filter[::-1, ::-1] # flip filter horizontally/vertically
                            for i in range(H):
                                for j in range(W):
                                    patch = im_pad[i:i+h, j:j+w]
                                    out[i, j] = np.sum(patch * filter)

                            return out
                    

Both implementations handle the boundaries using zero-fill padding, extending the image by h // 2 pixels on the top and bottom and by w // 2 pixels on the left and right, where h and w are the kernel height and width.

Comparison to Built-In Convolution (scipy.signal.convolve2d)

Now, I will use my profile image to test my implementations against the built-in convolution function scipy.signal.convolve2d. I will try each convolution method with a box filter and with the finite difference operators \(D_x = \begin{pmatrix} 1 & 0 & -1 \end{pmatrix}\) and \(D_y = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}\).


                        import scipy as sp
                        import scipy.signal  # makes sp.signal available
                        import skimage as sk
                        import skimage.io as skio
                        from time import perf_counter

                        me_gray = skio.imread('./data/me.jpg', as_gray=True)
                        me_gray = sk.img_as_float(me_gray)
                        box_filter = np.ones((9, 9)) / 81  # 9x9 box filter, normalized to sum to 1
                        Dx = np.array([[1, 0, -1]])
                        Dy = np.array([[1], [0], [-1]])

                        start = perf_counter()
                        conv_four_box = convolution_four(me_gray, box_filter)
                        end = perf_counter()
                        print(f'Convolution (four loops) time: {end - start:.4f} seconds')
                        conv_four_dx = convolution_four(me_gray, Dx)
                        conv_four_dy = convolution_four(me_gray, Dy)

                        start = perf_counter()
                        conv_two_box = convolution_two(me_gray, box_filter)
                        end = perf_counter()
                        print(f'Convolution (two loops) time: {end - start:.4f} seconds')
                        conv_two_dx = convolution_two(me_gray, Dx)
                        conv_two_dy = convolution_two(me_gray, Dy)

                        start = perf_counter()
                        conv_scipy_box = sp.signal.convolve2d(me_gray, box_filter, mode='same')
                        end = perf_counter()
                        print(f'Built-In Convolution (scipy) time: {end - start:.4f} seconds')
                        conv_scipy_dx = sp.signal.convolve2d(me_gray, Dx, mode='same')
                        conv_scipy_dy = sp.signal.convolve2d(me_gray, Dy, mode='same')
                    

                        Convolution (four loops) time: 11.0136 seconds
                        Convolution (two loops) time: 1.6353 seconds
                        Built-In Convolution (scipy) time: 1.0861 seconds
                    

As expected, the four for-loop convolution is significantly slower than the two for-loop convolution and the built-in convolution. The built-in convolution is faster than the two for-loop convolution, but the two are within the same order of magnitude.

Here are the results for each convolution method:

Four For-Loop Convolution
Two For-Loop Convolution
Built-In Convolution

All three convolution methods produce the same results. The box filter smooths/blurs the image, while the finite difference operators highlight edges in the horizontal and vertical directions.
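
As a quick numerical sanity check (a minimal sketch, assuming the arrays from the timing code above are still in scope), the outputs can be compared directly:


                        # All implementations should agree up to floating-point error.
                        # scipy's mode='same' with the default boundary (zero fill) matches
                        # the zero padding used in my implementations.
                        assert np.allclose(conv_four_box, conv_scipy_box)
                        assert np.allclose(conv_two_box, conv_scipy_box)
                        assert np.allclose(conv_two_dx, conv_scipy_dx)
                        assert np.allclose(conv_two_dy, conv_scipy_dy)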

Part 1.2: Finite Difference Operator

I will be using this image:

Code


                        Dx = np.array([[1, 0, -1]])
                        Dy = np.array([[1], [0], [-1]])
                        ...
                        cameraman = skio.imread('./data/cameraman.jpg', as_gray=True)
                        cameraman = sk.img_as_float(cameraman)

                        Ix = sp.signal.convolve2d(cameraman, Dx, mode='same', boundary='symm')
                        Iy = sp.signal.convolve2d(cameraman, Dy, mode='same', boundary='symm')
                        gradient_magnitude = np.sqrt(Ix**2 + Iy**2)
                        edge_image = (gradient_magnitude > 0.35).astype(np.uint8) * 255
                    

Here, I convolved the image with the finite difference operators \(D_x\) and \(D_y\) to obtain the partial derivatives \(I_x\) (in \(x\)) and \(I_y\) (in \(y\)). Using \(I_x\) and \(I_y\), I then computed the gradient magnitude image as \(\sqrt{I_x^2 + I_y^2}\).

Finally, I created a binarized edge image by thresholding the gradient magnitude image at \(>0.35\). This took a few iterations of fine-tuning, as I was aiming for an edge image that keeps all prominent edges while removing as much noise as possible (e.g., the grass). Note that it was difficult to capture the edges of the buildings in the background; at this threshold, the filters mainly extract the edges of the cameraman in the foreground.

Part 1.3: Derivative of Gaussian (DoG) Filter

I will be using the same image from Part 1.2. Since I wasn't able to cleanly remove the noise in the previous results, I will now use a Gaussian filter to first smooth/blur the image. Using the built-in 1D Gaussian kernel function cv2.getGaussianKernel(), I created a 2D Gaussian filter for images by taking the outer product of the 1D Gaussian and its transpose.


                    import cv2

                    def gaussian_filter(kernel_size=9, sigma=2):
                        # outer product of the 1D Gaussian with its transpose gives a 2D Gaussian
                        g1d = cv2.getGaussianKernel(kernel_size, sigma)
                        return g1d @ g1d.T
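
As a quick check (a small sketch assuming the function above), the resulting 2D kernel should sum to 1, since cv2.getGaussianKernel() returns a unit-sum 1D kernel and the outer product preserves that:


                    G = gaussian_filter(kernel_size=9, sigma=2)
                    print(G.shape)                   # (9, 9)
                    print(np.isclose(G.sum(), 1.0))  # True: a unit-sum kernel preserves overall brightness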
                

This is what the blurred version of the cameraman looks like after convolving it with this Gaussian filter:

Now repeating the same procedure in Part 1.2,


                    G = gaussian_filter()
                    blurred = sp.signal.convolve2d(cameraman, G, mode='same', boundary='symm')

                    Ix = sp.signal.convolve2d(blurred, Dx, mode='same', boundary='symm')
                    Iy = sp.signal.convolve2d(blurred, Dy, mode='same', boundary='symm')
                    gradient_mag_blur = np.sqrt(Ix**2 + Iy**2)
                    edge_image_blur = (gradient_mag_blur > 0.09).astype(np.uint8) * 255
                

which gets these results:

I had to lower the thresholding value to \(>0.09\). Now, I can do the same thing but avoid having to smooth the original image by convolving the Gaussian with \(D_x\) and \(D_y\) to get derivative of Gaussian (DoG) filters:

Note: the filters themselves are small kernels, but when applied they reveal strong horizontal/vertical edges.


                    DoGx = sp.signal.convolve2d(G, Dx, mode='same', boundary='symm')
                    DoGy = sp.signal.convolve2d(G, Dy, mode='same', boundary='symm')

                    Ix_DoG = sp.signal.convolve2d(cameraman, DoGx, mode='same', boundary='symm')
                    Iy_DoG = sp.signal.convolve2d(cameraman, DoGy, mode='same', boundary='symm')
                    gradient_mag_dog = np.sqrt(Ix_DoG**2 + Iy_DoG**2)
                    edge_image_dog = (gradient_mag_dog > 0.09).astype(np.uint8) * 255
                

And we get practically the same results:

I used the same thresholding value \(>0.09\). We can compare the edge images for all three methods:

The results for DoG and Gaussian smoothing are much cleaner, as the Gaussian filter was able to smooth out the noise in the image.
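
The two pipelines should agree because convolution is associative: \((I * G) * D_x = I * (G * D_x)\). A minimal check (assuming the arrays from the snippets above; small differences are expected since mode='same' crops the DoG kernel to the Gaussian's 9x9 support):


                    # Fraction of pixels where the two binarized edge maps agree
                    print(np.mean(edge_image_blur == edge_image_dog))

                    # Interior gradient magnitudes should also match closely
                    diff = np.abs(gradient_mag_blur - gradient_mag_dog)[10:-10, 10:-10]
                    print(f'max interior difference: {diff.max():.4f}')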

Part 2: Fun with Frequencies!

Part 2.1: Image "Sharpening"

This is my implementation of the unsharp mask filtering technique:


                    def unsharp_masking(im, kernel_size=9, sigma=2, alpha=1.0):
                        G = gaussian_filter(kernel_size, sigma)

                        sharpened = np.zeros_like(im, dtype=np.float32)
                        high_freq_all = np.zeros_like(im, dtype=np.float32)
                        for channel in range(im.shape[2]):
                            blurred = sp.signal.convolve2d(im[:, :, channel], G, mode='same', boundary='symm')
                            high_freq = im[:, :, channel] - blurred
                            high_freq_all[:, :, channel] = high_freq
                            sharpened[:, :, channel] = im[:, :, channel] + (alpha * high_freq)

                        return np.clip(sharpened, 0, 1), high_freq_all
                

Here, I use the Gaussian filter as a low-pass (blur) filter to extract the low frequencies of each color channel, then subtract the blurred channels from the original image to obtain only the high frequencies. I then add these high frequencies back into the original image to "sharpen" it. I noticed that my profile image was a bit blurry, so after running,


                    taj = skio.imread('./data/taj.jpg')
                    taj = sk.img_as_float(taj)

                    me = skio.imread('./data/me.jpg')
                    me = sk.img_as_float(me)

                    sharpened_taj, high_freq_taj = unsharp_masking(taj)
                    sharpened_me, high_freq_me = unsharp_masking(me)
                

these are the results for an image of the Taj Mahal and for my profile picture:

For evaluation, I blurred the sharpened image of the Taj Mahal and then tried to sharpen it again:


                    G = gaussian_filter()  # same 9x9, sigma=2 Gaussian as in Part 1.3
                    blurred_sharp_taj = np.zeros_like(sharpened_taj, dtype=np.float32)
                    for channel in range(sharpened_taj.shape[2]):
                        blurred_sharp_taj[:, :, channel] = sp.signal.convolve2d(sharpened_taj[:, :, channel], G,
                                                                                mode='same', boundary='symm')
                    sharpened_blurred_sharp_taj, high_freq_blurred_sharp_taj = unsharp_masking(blurred_sharp_taj)
                

These are the results:

As you can see, the "sharpening" doesn't really work here: blurring an image removes most of its high frequencies, so sharpening can only amplify what little high-frequency content is left, and the result looks similar to the blurred input.
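
One way to quantify this (a quick sketch using the arrays above): compare the mean absolute high-frequency content extracted from the original image and from the blurred one; the blurred image should have noticeably less.


                    print(f'high frequencies (original Taj):          {np.abs(high_freq_taj).mean():.5f}')
                    print(f'high frequencies (blurred sharpened Taj): {np.abs(high_freq_blurred_sharp_taj).mean():.5f}')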

I will now demonstrate how different \(\alpha\) values (varying the sharpening amount) affect the results. The previous results used \(\alpha = 1.0\).

The higher the \(\alpha\), the stronger the sharpness becomes in the resulting image. \(\alpha\) essentially amplifies the high frequencies that are added back into the original image.
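
As an aside, the whole operation can be folded into a single convolution, since \(f + \alpha(f - f * G) = f * ((1 + \alpha)\delta - \alpha G)\), where \(\delta\) is the unit impulse. A minimal sketch of this equivalent formulation (not the code used for the results above):


                    def unsharp_kernel(kernel_size=9, sigma=2, alpha=1.0):
                        # (1 + alpha) * unit impulse - alpha * Gaussian
                        G = gaussian_filter(kernel_size, sigma)
                        impulse = np.zeros_like(G)
                        impulse[kernel_size // 2, kernel_size // 2] = 1.0
                        return (1 + alpha) * impulse - alpha * G

                    # Convolving each channel with this kernel reproduces unsharp_masking()
                    # (up to the final clipping step).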

Part 2.2: Hybrid Images

I will create hybrid images for the following 3 pairs:

Nutmeg + Derek
Curious + Me
Basketball + Basketball 1

I first use the provided starter code (align_images() from align_image_code.py) to align each pair of images as desired (e.g., for Derek + Nutmeg, aligning their eyes).

Then, I implemented code to low-pass filter one image (Derek) and high-pass filter the other (Nutmeg). I used a standard 2D Gaussian filter for the low-pass filter; the high-pass filter simply subtracts the low-pass filtered image from the original.


                    def low_pass_filter(im, kernel_size=9, sigma=2):
                        G = gaussian_filter(kernel_size, sigma)
                        blurred_im = np.zeros_like(im, dtype=np.float32)
                        for channel in range(im.shape[2]):
                            blurred_im[:, :, channel] = sp.signal.convolve2d(im[:, :, channel], G,
                                                                             mode='same', boundary='symm')
                        return blurred_im

                    def high_pass_filter(im, kernel_size=9, sigma=2):
                        return im - low_pass_filter(im, kernel_size, sigma)
                

Using these filters, I simply add the low-pass filtered image and the high-pass filtered image together, with an adjustable cutoff frequency \(\sigma\) for each filter.


                    def hybrid_image(high_im, low_im, kernel_size=9, sigma_high=2, sigma_low=2):
                        high_im = high_pass_filter(high_im, kernel_size, sigma_high)
                        low_im = low_pass_filter(low_im, kernel_size, sigma_low)
                        hybrid = high_im + low_im
                        return np.clip(hybrid, 0, 1), high_im, low_im
                

I also wanted to keep track of the frequency content of all the images throughout the process. For visualization, I compute the log-magnitude of the Fourier transform, with the zero-frequency component shifted to the center of the image using np.fft.fftshift(). This makes it easier to see the distribution of high- and low-frequency components.


                    def save_fft(im, filename):
                        if im.ndim == 3:
                            im_gray = np.mean(im, axis=2)  # convert to grayscale
                        else:
                            im_gray = im

                        f = np.fft.fft2(im_gray)
                        fshift = np.fft.fftshift(f)
                        magnitude = np.log(np.abs(fshift) + 1e-8)  # epsilon avoids log(0) in empty frequency bins

                        magnitude = magnitude / np.max(magnitude)  # normalize
                        convert_and_save_uint8(magnitude, filename)
                

Now, choosing the cutoff frequencies \(\sigma_{\text{high}} = 10\) and \(\sigma_{\text{low}} = 25\), the result of the code


                    derek = skio.imread('./data/DerekPicture.jpg')
                    derek = sk.img_as_float(derek)

                    nutmeg = skio.imread('./data/nutmeg.jpg')
                    nutmeg = sk.img_as_float(nutmeg)

                    nutmeg_aligned, derek_aligned = align_images(nutmeg, derek)

                    sigma_high = 10
                    sigma_low = 25
                    hybrid, high_im, low_im = hybrid_image(high_im=nutmeg_aligned, low_im=derek_aligned,
                                                           sigma_high=sigma_high, sigma_low=sigma_low)

                    save_fft(nutmeg_aligned, 'fft_nutmeg')
                    save_fft(derek_aligned, 'fft_derek')
                    save_fft(low_im, 'fft_low')
                    save_fft(high_im, 'fft_high')
                    save_fft(hybrid, 'fft_hybrid')
                

is:

Aligned images

Nutmeg

Aligned Nutmeg

Derek

Aligned Derek

Aligned Nutmeg (log-magnitude FFT)

Aligned Derek (log-magnitude FFT)

Filtered results

High-pass filtered Nutmeg

Low-pass filtered Derek

High-pass filtered Nutmeg (log-magnitude FFT)

Low-pass filtered Derek (log-magnitude FFT)

Final image

Here are the results of the other pairs of images:

Part 2.3: Gaussian and Laplacian Stacks

Here is my implementation for the Gaussian and Laplacian stacks:


                    def gaussian_stack(im, levels=5, sigma=2):
                        G = gaussian_filter(9, sigma)
                        stack = [im]
                        for _ in range(1, levels):
                            if im.ndim == 3: # color image
                                level = np.zeros_like(stack[-1])
                                for channel in range(im.shape[2]):
                                    level[:, :, channel] = sp.signal.convolve2d(stack[-1][:, :, channel], G, 
                                                                                    mode='same', boundary='symm')
                            else:
                                level = sp.signal.convolve2d(stack[-1], G, mode='same', boundary='symm')
                            stack.append(level)
                        return stack

                    def laplacian_stack(im, levels=5, sigma=2):
                        g_stack = gaussian_stack(im, levels, sigma)
                        l_stack = []
                        for i in range(levels-1):
                            l_stack.append(g_stack[i] - g_stack[i+1])
                        l_stack.append(g_stack[-1])  # last level
                        return l_stack, g_stack
                

Applying these to the oraple for 9 levels:

Part 2.4: Multi-Resolution Blending (a.k.a. the Oraple!)

Using the Gaussian and Laplacian stacks, I implemented Burt and Adelson's multi-resolution spline algorithm from their 1983 paper to create the oraple, along with two other blends.


                    def multires_spline(im1, im2, mask, levels=5, sigma=2):
                        # Step 1a: Laplacian stacks of the two input images
                        l1, _ = laplacian_stack(im1, levels, sigma)
                        l2, _ = laplacian_stack(im2, levels, sigma)

                        # Step 1b: Gaussian stack of the mask (the seam gets smoother at each level)
                        gr = gaussian_stack(mask, levels, sigma)

                        # Step 2: blend each Laplacian level using the corresponding mask level
                        LS = []
                        for i in range(levels):
                            LS.append(gr[i] * l1[i] + (1 - gr[i]) * l2[i])

                        # Step 3: collapse the blended stack (copy so LS[-1] is not mutated in place)
                        out = LS[-1].copy()
                        for i in range(len(LS)-2, -1, -1):
                            out += LS[i]

                        return out
                

Since I am using stacks instead of pyramids, every level stays at full resolution, so the blend relies entirely on the mask (and its Gaussian stack), which has the same size as the input images.

Oraple

I will be using a smoothed vertical seam as the mask to nicely blend the apple and orange together.
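
A minimal sketch of the mask construction (assuming the apple and orange images are loaded as floats with matching shapes; the smoothing kernel size and \(\sigma\) here are illustrative):


                    H, W, C = apple.shape
                    mask = np.zeros((H, W, C), dtype=np.float32)
                    mask[:, :W // 2] = 1.0  # binary vertical seam: left half apple, right half orange

                    # Smooth the seam before blending; the mask's Gaussian stack smooths it further
                    mask = low_pass_filter(mask, kernel_size=31, sigma=10)

                    oraple = np.clip(multires_spline(apple, orange, mask, levels=9), 0, 1)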

Inputs
Masked Inputs
Multi-resolution blended image

Rothko

I will be using a smoothed horizontal seam as the mask to blend both of the Rothkos.

Inputs
Masked Inputs
Multi-resolution blended image

Manga

I will be using a hand-drawn mask in order to stitch together two manga panels. This is what the mask looks like before smoothing:

Inputs
Masked Inputs
Multi-resolution blended image

Conclusion

Through this project, I gained experience with both spatial and frequency domain techniques in image processing. Starting from basic convolution implementations, I moved on to unsharp masking for sharpening, hybrid images that exploit low- and high-frequency perception, and finally Gaussian/Laplacian stacks for multi-resolution blending.

One of the key lessons was that blending quality depends critically on the mask, as a sharp binary mask produces visible seams, while smoothing the mask before applying the multi-resolution algorithm yields seamless transitions (shoutout to Anonymous Clam). Conceptually, I also deepened my understanding of how the Fourier transform reveals frequency content and why manipulating low vs. high frequencies leads to different perceptual effects.