Wednesday, December 10, 2014

Hyperlapse Video With Python

For my final project in Computational Photography, I wrote software that would emulate the Hyperlapse video effect found in Instagram's Hyperlapse App.

For those who are unaware, a Hyperlapse video is essentially a time-lapse shot with a moving camera. Typically, this is created with a series of image stabilization and subsampling algorithms. Below is an example of a Hyperlapse created with Instagram's app:

I did some digging to find out how these videos are created, and I found that Instagram uses the gyroscope built into the iPhone for motion stabilization. As each frame is captured, it is translated by an amount and in a direction opposite to the motion of the camera, creating a synthetically smooth motion. In addition to the stabilization, the application crops or, in some cases, stretches each frame so that no black borders are visible from the translation.

Microsoft Research created a similar piece of software, but instead of using a gyroscope, they processed the video based on information from the image frames. Using a computation-heavy process of reconstructing a 3D point cloud of the entire scene, calculating an optimally smooth camera path, and then re-rendering the synthetic frames, they were able to output Hyperlapse-esque video.


Having neither the resources of Microsoft nor an iPhone, I decided to try making a Hyperlapse effect with Python and OpenCV. In general, my algorithm was the following:
  1. Calculate motion between each video frame with sparse optical flow, then integrate the motion in the time-domain to calculate an estimated position of each frame at each time step
  2. Filter the position data with a low-pass filter to smooth the motion
  3. Subsample frames to speed up the footage
  4. Filter the subsampled position data with a low-pass filter again to remove high-frequency data that was added from subsampling
  5. Calculate how much to crop the video in order to remove the black borders from frame translations (if too much is cropped, overlay the frames)
  6. Translate and render the stabilized and subsampled frames
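To make the steps above more concrete, here is a minimal sketch of the trajectory math only. It is not the code from my project: it assumes the per-frame motion (dx, dy) has already been estimated (for example with OpenCV's cv2.calcOpticalFlowPyrLK), it uses a plain moving average as the low-pass filter, and the names integrate, low_pass, and stabilizing_offsets are made up for illustration.

```python
def integrate(motions):
    """Step 1 (second half): sum per-frame (dx, dy) motion into positions."""
    x = y = 0.0
    positions = []
    for dx, dy in motions:
        x += dx
        y += dy
        positions.append((x, y))
    return positions


def low_pass(positions, radius):
    """Moving-average low-pass filter (an assumed filter choice).

    The window shrinks near the ends of the sequence.
    """
    smoothed = []
    for i in range(len(positions)):
        window = positions[max(0, i - radius): i + radius + 1]
        smoothed.append((sum(p[0] for p in window) / len(window),
                         sum(p[1] for p in window) / len(window)))
    return smoothed


def stabilizing_offsets(motions, speedup=8, radius=2):
    """Return per-frame translations and the crop margin they require.

    Assumes `motions` is non-empty.
    """
    raw = integrate(motions)                           # step 1
    smooth = low_pass(raw, radius)                     # step 2
    raw_sub = raw[::speedup]                           # step 3
    smooth_sub = low_pass(smooth[::speedup], radius)   # step 4
    # Each kept frame is shifted from where the camera actually was
    # to where the smoothed path says it should have been.
    offsets = [(sx - rx, sy - ry)
               for (sx, sy), (rx, ry) in zip(smooth_sub, raw_sub)]
    # Step 5: the largest shift on each axis is how much must be cropped
    # from each side to hide the black borders. (The "overlay the frames"
    # fallback is not sketched here.)
    margin = (max(abs(ox) for ox, _ in offsets),
              max(abs(oy) for _, oy in offsets))
    return offsets, margin
```

Step 6 would then apply each offset to its frame (cv2.warpAffine with a pure translation matrix is one way to do that) and crop by the margin before writing the frame out.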
While not perfect, it ended up working pretty well! You can download the code for this from GitHub:

Below are some videos I rendered with the program:

Friday, October 24, 2014

Gradient Domain Cloning

I'm taking Computational Photography this semester and we recently had a homework assignment to implement Gradient Domain Cloning. Gradient Domain Cloning is a technique for blending together two images so that they fit together seamlessly.

In the past, the naive approach to doing this would be to identify two images with similar backgrounds and simply paste one on top of the other. Using Wikipedia's creepy example of an eye on a hand, this is what would happen:


The result is okay, but generally not great. It turns out, however, that if we analyze the images in the gradient domain (the gradient part of Gradient Domain Cloning), we can automate this process!

What is the gradient domain, you may ask? It can mean several things depending on what field you're in, but for our purposes, it means quantifying the change in pixel values in the X and Y directions. If we consider one color channel at a time, we can calculate the slope in the X-direction by finding the difference in color values between the pixels to the right and left of the current pixel, and then dividing it by 2 (the distance between those two pixels). We can do the same with the pixels above and below the current pixel to determine the slope in the Y-direction.
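As a small illustration of that slope calculation, here is a sketch for a single color channel stored as a list of rows. The function name gradient is made up for this example, and leaving the border pixels at 0 (they are missing a neighbor on one side) is an assumption, not something the assignment prescribes.

```python
def gradient(channel):
    """Central-difference slopes in X and Y for one color channel."""
    h, w = len(channel), len(channel[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if 0 < x < w - 1:
                # (right neighbor - left neighbor) / distance between them
                gx[y][x] = (channel[y][x + 1] - channel[y][x - 1]) / 2
            if 0 < y < h - 1:
                # (neighbor below - neighbor above) / distance between them
                gy[y][x] = (channel[y + 1][x] - channel[y - 1][x]) / 2
    return gx, gy


# A horizontal ramp: values rise by 10 per pixel from left to right.
ramp = [[0, 10, 20],
        [0, 10, 20],
        [0, 10, 20]]
gx, gy = gradient(ramp)
# At the center pixel, gx is (20 - 0) / 2 = 10.0 and gy is 0.0.
```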

(editor's note: what I'm describing is a discrete approximation of the gradient, in the context of image processing. In mathematics, the gradient is a well-defined concept which Wikipedia has a great article on)

Performing this gradient operation on an image and then visualizing the output gives us something similar to edge-detection filters:

(source: Wikipedia)

Now, what would happen if we performed our naive approach from above, but this time with the gradient of each image? Wikipedia comes to our rescue with another creepy eye/hand picture to illustrate the result (Apple: you should patent this!):

(source: Wikipedia)

This brings us one step closer to our goal of automated blending. Since we chose our two source images to have similar backgrounds (e.g. the eye and hand have the same skin tone), unsurprisingly, their gradient images should fit together nicely without any need for blending. The next step, then, would be to apply a magical mathematical operation to convert the combined image from the gradient domain to the original image domain.

Doing this is not so simple because we can't just do the functional inverse of taking a gradient (i.e. integrating in 2D); if we did, we would get back the naive approach's result. We need to take into account the colors in the foreground and background images so that they both match at the edge between the two images.

This paper goes into significant depth for how a solution was derived, but suffice it to say, we can reduce the problem down to solving a linear system of equations via Poisson's equation (in fact, Prof. Barnes simplified this to equation #7 on the homework assignment page). Once in this form, we can use SciPy's built-in solver for linear systems to solve for the blended image. Below is the result for our eye/hand example:

(source: Wikipedia)
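For the curious, here is a rough sketch of what that linear system looks like for a single color channel. It is not my homework solution: instead of handing the system to SciPy, it relaxes it with plain Gauss-Seidel iteration so the example needs no libraries. The name poisson_blend is made up, images are lists of rows, and the mask is assumed not to touch the image border.

```python
def poisson_blend(fg, bg, mask, iterations=200):
    """Blend fg into bg wherever mask is truthy (one color channel).

    For every masked pixel p with 4-neighbors q, the discrete Poisson
    equation is:  4*out[p] - sum(out[q]) = 4*fg[p] - sum(fg[q])
    Pixels outside the mask keep their background value, which is what
    forces the colors to match along the seam.
    """
    h, w = len(bg), len(bg[0])
    out = [[float(v) for v in row] for row in bg]   # start from background
    inside = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    for _ in range(iterations):
        for y, x in inside:
            nbrs = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            # Right-hand side: the foreground's local detail (its Laplacian).
            detail = 4 * fg[y][x] - sum(fg[j][i] for j, i in nbrs)
            # Solve the equation above for out[p], using the latest neighbors.
            out[y][x] = (sum(out[j][i] for j, i in nbrs) + detail) / 4
    return out


# Dark foreground with one bright pixel, pasted into a bright flat background.
fg = [[0] * 5 for _ in range(5)]
fg[2][2] = 8
bg = [[100] * 5 for _ in range(5)]
mask = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
        for y in range(5)]
blended = poisson_blend(fg, bg, mask)
```

In this toy case the bright pixel keeps its contrast with its surroundings, while the rest of the pasted patch is lifted to the background's brightness, so no seam is left at the edge of the mask.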

I've posted my code for this project on GitHub; the project is written in Python.

To create the image at the top of this post, use ron2.jpg as the foreground image, Mona_Lisa.jpg as the background image, and ron_matte.png to define the boundaries of the foreground image. All of these are found in the "imgs" directory on the GitHub page.

Happy cloning!

Tuesday, May 6, 2014

ErgoDox Keyboard

I spend most of my day staring at a computer screen, so over the past couple of years I've been transitioning my workspace to be more ergonomic. My most recent acquisition towards this goal has been building an ErgoDox keyboard.

The ErgoDox is an ergonomic, split keyboard that is designed to protect the user from repetitive strain injuries (e.g. carpal tunnel). Furthermore, the code and hardware design for the keyboard are completely open-source, and the keyboard uses a Teensy microcontroller for interfacing with the computer. The Teensy can be reprogrammed to reflect any key layout that you want and shows up on the computer as a generic USB keyboard. This is especially useful because there are no special drivers to install and no extra software required to remap keys.

Most people buy an ErgoDox keyboard as a kit from the group-buy service, Massdrop. I originally planned to do this, but I didn't want to wait several weeks or months for the kit to arrive, so I opted to source the parts on my own. The parts list is available on the ErgoDox website, and Matt Adereth has a great blog post about his experience with sourcing the parts. I bought my parts from the same places that Matt mentioned, with only three deviations: I didn't build a case, I combined a set of rainbow keycaps from WASD Keyboards with a blank modifier keycap set from Signature Plastics, and I chose brown Cherry switches.

When I was considering buying a kit from Massdrop, the PCB, electronic components, blank keycaps, and a case would have cost ~$240. Sourcing the parts on my own, I was able to get the PCB, electronic components, rainbow keycaps, blank modifier keys, and brown Cherry switches for ~$212.

I held off on buying sheets of acrylic for a case because I wasn't able to secure access to a laser cutter at the time. I don't feel bad about this, though, because I only plan to keep the keyboard at my desk. To protect the diodes on the bottom from shorting and to protect my desktop from getting scratched, I put rubber feet (salvaged from an old project) on the bottom of the PCB.


I'm now a couple of weeks into using the keyboard and I am very satisfied with the project. I find that I don't notice the keyboard's layout as I use it, which is good, because it means that I've memorized the layout. Currently, I'm using a modified QWERTY layout, but I'm also interested in trying more optimized layouts such as Dvorak or Colemak. The only annoyance I've found as a result of the ErgoDox is that I have trouble adjusting back to traditional QWERTY keyboards. Specifically, I have a habit of thinking that traditional keyboards have a split spacebar and that each half of the spacebar has a different function. However, this is a small price to pay for pain-free typing when I use the ErgoDox!

(I'm sorry I don't have more photos of the keyboard--I took several during the build process but forgot to import them into my computer before wiping my camera's SD card)