I'm taking Computational Photography this semester, and we recently had a homework assignment to implement Gradient Domain Cloning, a technique for blending two images together so that they fit seamlessly.
In the past, the naive approach to doing this would be to identify two images with similar backgrounds and simply paste one on top of the other. Using Wikipedia's creepy example of an eye on a hand, this is what would happen:
The result is okay, but generally not great. It turns out, however, that if we analyze the images in the gradient domain (the gradient part of Gradient Domain Cloning), we can automate this process!
What is the gradient domain, you may ask? It can mean several things depending on what field you're in, but for our purposes, it means quantifying the change in pixel values in the X and Y directions. Considering one color channel at a time, we can calculate the slope at each pixel in the X-direction by taking the difference in color values between the pixels to its left and right, then dividing by 2 (the distance between those pixels). The same process with the pixels above and below gives us the slope in the Y-direction.
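The central-difference calculation above can be sketched in a few lines of NumPy (this is an illustrative sketch, not the actual homework code):

```python
import numpy as np

def gradient(channel):
    """Central-difference gradient of one color channel.

    Returns (gx, gy): at each interior pixel, gx = (right - left) / 2
    and gy = (below - above) / 2. Border pixels are left at zero for
    simplicity.
    """
    img = channel.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    gy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    return gx, gy
```

On a full RGB image you would simply run this once per channel.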
(Editor's note: what I'm describing is a discrete approximation of the gradient, as used in image processing. In mathematics, the gradient is a well-defined concept which Wikipedia has a great article on.)
Performing this gradient operation on an image and then visualizing the output gives us something similar to edge-detection filters:
Now, what would happen if we performed our naive approach from above, but this time doing it with the gradient of each image? Wikipedia comes to our rescue with another creepy eye/hand picture to illustrate the result (Apple: you should patent this!):
This brings us one step closer to our goal of automated blending. Since we chose our two source images to have similar backgrounds (e.g. the eye and hand have the same skin tone), unsurprisingly, their gradient images should fit together nicely without any need for blending. The next step, then, would be to apply a magical mathematical operation to convert the combined image from the gradient domain to the original image domain.
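Combining the two gradient images amounts to a masked paste, which can be sketched like so (function and parameter names here are my own illustration, not the homework's API):

```python
import numpy as np

def composite_gradients(fg_gx, fg_gy, bg_gx, bg_gy, mask):
    """Paste the foreground's gradients over the background's.

    mask is a boolean array that is True wherever the foreground
    image should appear.
    """
    gx = np.where(mask, fg_gx, bg_gx)
    gy = np.where(mask, fg_gy, bg_gy)
    return gx, gy
```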
Doing this is not so simple, because we can't just apply the functional inverse of taking a gradient (i.e. integrate in 2D): the combined gradient field generally isn't the exact gradient of any image, and naively integrating would just reproduce the naive approach's result. We need to take into account the colors in the foreground and background images so that they match along the edge between the two.
This paper goes into significant depth for how a solution was derived, but suffice it to say, we can reduce the problem down to solving a linear system of equations via Poisson's equation (in fact, Prof. Barnes simplified this to equation #7 on the homework assignment page). Once in this form, we can use SciPy's built-in solver for linear systems to solve for the blended image. Below is the result for our eye/hand example:
I've posted my code for this project on GitHub; the project is written in Python.
To create the image at the top of this post, use ron2.jpg as the foreground image, Mona_Lisa.jpg as the background image, and ron_matte.png to define the boundaries of the foreground image. All of these are found in the "imgs" directory on the GitHub page.