Image Analogies

Version 0.00 - 20040510

Table Of Contents

  1. Purpose
  2. Overview
  3. References
  4. Basic Algorithm
  5. Pyramids
  6. Neighborhoods
  7. Pixel Handling
  8. Disclaimer
  9. Usage
  10. Supported Options
  11. Screenshot
  12. Results
  13. TODO
  14. History
  15. Platform
  16. Files
  17. About

Purpose

The purpose of this project is to create a program that performs an image analogy. More specifically, the program takes three input images and produces a fourth. The first two images (A1 and A2) show some sort of relationship between each other. The third image (B1) is the image to apply that relationship to. The image created (B2) is image B1 with the relationship between A1 and A2 applied to it. (Elsewhere in this document, and in the program's options, these images are also called A, A', B and B'.)

Overview

This part of the document is meant to be the overview of the project.

References

Basic Algorithm

function CreateImageAnalogy(A1, A2, B1)
{
    // B2 starts as black pixels, random noise (-r) or the -B image
    CreatePyramids(A1, A2, B1, B2)
    For each level l, from the coarsest (smallest) to the finest (full size), do:
    {
        For each pixel q in B2(l), in scan-line order, do:
        {
            p = BestMatch(A1, A2, B1, B2, l, q)
            if (PSRC) B2(l)(q) = A2(l)(p)
            else if (DIFF) B2(l)(q) = B1(l)(q) + A2(l)(p) - A1(l)(p)
        }
    }
    return B2
}

function BestMatch(A1, A2, B1, B2, l, q)
{
    N_B1 = BuildNeighborhood(B1, l, q)
    N_B2 = BuildCausalNeighborhood(B2, l, q)
    best_pixel = 0
    best_distance = INFINITY
    For each pixel p in A1(l), do:
    {
        N_A1 = BuildNeighborhood(A1, l, p)
        N_A2 = BuildCausalNeighborhood(A2, l, p)
        // Compare = sum of squared differences (see Neighborhoods)
        distance = Compare(N_B1, N_A1) + Compare(N_B2, N_A2)
        if (distance < best_distance)
        {
            best_pixel = p
            best_distance = distance
        }
    }
    return best_pixel
}
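The pseudocode can be made concrete in C. The sketch below handles a single pyramid level of single-channel float images and uses brute-force search, the only -m method the program offers. The Image type, the names best_match and synthesize_level, and the clamp-to-edge border handling are assumptions for illustration, not the program's actual code.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical single-channel image: row-major floats. */
typedef struct { int w, h; float *px; } Image;

typedef enum { SRC_PSRC, SRC_DIFF } SourceMode;

/* Pixel lookup with coordinates clamped to the border (an assumption). */
static float at(const Image *im, int x, int y) {
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= im->w) x = im->w - 1;
    if (y >= im->h) y = im->h - 1;
    return im->px[y * im->w + x];
}

/* Compare(N_B1, N_A1) + Compare(N_B2, N_A2) for pixel (ax, ay) of A and
   (bx, by) of B: full neighborhoods in A1/B1, causal ones in A2/B2. */
static float match_distance(const Image *A1, const Image *A2,
                            const Image *B1, const Image *B2,
                            int ax, int ay, int bx, int by, int r) {
    float sum = 0.0f;
    for (int dy = -r; dy <= r; dy++) {
        for (int dx = -r; dx <= r; dx++) {
            float d = at(B1, bx + dx, by + dy) - at(A1, ax + dx, ay + dy);
            sum += d * d;
            if (dy < 0 || (dy == 0 && dx < 0)) {   /* already synthesized */
                float e = at(B2, bx + dx, by + dy) - at(A2, ax + dx, ay + dy);
                sum += e * e;
            }
        }
    }
    return sum;
}

/* Brute-force BestMatch: index (y * w + x) of the best pixel in A1. */
static int best_match(const Image *A1, const Image *A2,
                      const Image *B1, const Image *B2, int bx, int by, int r) {
    int best_pixel = 0;
    float best_distance = FLT_MAX;
    for (int ay = 0; ay < A1->h; ay++) {
        for (int ax = 0; ax < A1->w; ax++) {
            float dist = match_distance(A1, A2, B1, B2, ax, ay, bx, by, r);
            if (dist < best_distance) {
                best_distance = dist;
                best_pixel = ay * A1->w + ax;
            }
        }
    }
    return best_pixel;
}

/* One pass over one pyramid level of B2, in scan-line order. */
void synthesize_level(const Image *A1, const Image *A2, const Image *B1,
                      Image *B2, int width, SourceMode mode) {
    assert(width > 0 && width % 2 == 1);        /* odd and positive */
    assert(A1->w == A2->w && A1->h == A2->h);   /* A and A' same size */
    assert(B1->w == B2->w && B1->h == B2->h);
    int r = width / 2;
    for (int by = 0; by < B2->h; by++) {
        for (int bx = 0; bx < B2->w; bx++) {
            int p = best_match(A1, A2, B1, B2, bx, by, r);
            int q = by * B2->w + bx;
            if (mode == SRC_PSRC)
                B2->px[q] = A2->px[p];                          /* copy A2(p) */
            else
                B2->px[q] = B1->px[q] + A2->px[p] - A1->px[p];  /* DIFF */
        }
    }
}
```

Looping over every pixel of A1 for every pixel of B2 is what makes the brute-force method so slow: the cost grows with the product of the two image sizes times the neighborhood area.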

Pyramids

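The algorithm uses pyramids so that it does not lose the overall shape of the image, which would otherwise happen because a neighborhood cannot capture objects larger than itself. The multi-level pyramids are fairly straightforward to generate: each lower level of the pyramid has its width and height halved. To populate the new levels of the pyramid we can use several different algorithms (selected with the -t option):
  • BRUTE: simply take every other pixel.
  • AVG: blend the 9 pixel (3x3) area.
  • GAUSSIAN: blend the same area, giving extra weight to the center pixel.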

For the image we're trying to create (B2), we populate the pyramid in the reverse order: we start with the smallest level and build up to the largest level (the actual image). This way the overall shape of the image is retained.
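A sketch of building one coarser level with the three -t choices follows. It assumes a single-channel float image, samples the 3x3 area centered on every other pixel, clamps reads at the border, and uses 1-2-1 binomial weights as the Gaussian. None of these details are stated in this document, and the names (Image, DownType, downsample) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical single-channel image: row-major floats. */
typedef struct { int w, h; float *px; } Image;

typedef enum { DOWN_BRUTE, DOWN_AVG, DOWN_GAUSSIAN } DownType;

/* Pixel lookup with coordinates clamped to the border (an assumption). */
static float get_clamped(const Image *im, int x, int y) {
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= im->w) x = im->w - 1;
    if (y >= im->h) y = im->h - 1;
    return im->px[y * im->w + x];
}

/* Builds the next, coarser pyramid level: width and height are halved.
   The caller frees the returned pixel buffer. */
Image downsample(const Image *src, DownType type) {
    static const float avg[3][3]   = {{1, 1, 1}, {1, 1, 1}, {1, 1, 1}};
    static const float gauss[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
    Image dst;
    assert(src->w > 0 && src->h > 0);
    dst.w = src->w > 1 ? src->w / 2 : 1;
    dst.h = src->h > 1 ? src->h / 2 : 1;
    dst.px = malloc(sizeof(float) * (size_t)dst.w * (size_t)dst.h);
    assert(dst.px != NULL);
    for (int y = 0; y < dst.h; y++) {
        for (int x = 0; x < dst.w; x++) {
            int sx = 2 * x, sy = 2 * y;   /* matching pixel one level up */
            if (type == DOWN_BRUTE) {     /* keep every other pixel */
                dst.px[y * dst.w + x] = get_clamped(src, sx, sy);
                continue;
            }
            const float (*k)[3] = (type == DOWN_AVG) ? avg : gauss;
            float sum = 0.0f, wsum = 0.0f;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    float wgt = k[dy + 1][dx + 1];
                    sum += wgt * get_clamped(src, sx + dx, sy + dy);
                    wsum += wgt;
                }
            }
            dst.px[y * dst.w + x] = sum / wsum;   /* normalized 3x3 blend */
        }
    }
    return dst;
}
```

Calling downsample repeatedly on its own output gives the A1, A2 and B1 pyramids; the B2 pyramid has the same dimensions but is filled during synthesis, coarsest level first.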

Neighborhoods

The algorithm uses neighborhoods to compute the similarity of two pixels. A pixel's neighborhood consists of the pixels within a certain radius of it. So a size (width) 3 neighborhood consists of the pixel and the 8 pixels surrounding it, while a size (width) 5 neighborhood consists of the pixel, the 8 surrounding pixels and the 16 pixels surrounding those 8. To compare two neighborhoods we compute the sum of squared differences over all their components: the upper right pixel of one neighborhood is compared against the upper right pixel of the other, then the next pair is compared and the result added, and so on.
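The sum of squared differences between two full neighborhoods can be sketched as follows, assuming single-channel float images and clamp-to-edge reads at the border (the document does not say how edges are handled). The Image type and the neighborhood_ssd name are hypothetical.

```c
#include <assert.h>

/* Hypothetical single-channel image: row-major floats. */
typedef struct { int w, h; float *px; } Image;

/* Pixel lookup with coordinates clamped to the border (an assumption). */
static float pixel_at(const Image *im, int x, int y) {
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= im->w) x = im->w - 1;
    if (y >= im->h) y = im->h - 1;
    return im->px[y * im->w + x];
}

/* Sum of squared differences between the width x width neighborhood
   centered on (ax, ay) in a and the one centered on (bx, by) in b.
   Pixels at the same offset are compared pairwise. */
float neighborhood_ssd(const Image *a, int ax, int ay,
                       const Image *b, int bx, int by, int width) {
    assert(width > 0 && width % 2 == 1);   /* width must be odd and positive */
    int r = width / 2;
    float sum = 0.0f;
    for (int dy = -r; dy <= r; dy++) {
        for (int dx = -r; dx <= r; dx++) {
            float d = pixel_at(a, ax + dx, ay + dy) - pixel_at(b, bx + dx, by + dy);
            sum += d * d;
        }
    }
    return sum;
}
```

For color images the same sum would simply run over every channel of every pixel.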

There are two things to consider while computing these neighborhoods. First, care must be taken to use only pixels that are valid. Because B2 is generated in scan-line order, its neighborhoods can only include pixels that have already been generated (a causal neighborhood), and the matching A2 neighborhoods use the same shape. For example, in a size 3 causal neighborhood only the top row (3 pixels) and the first pixel of the middle row can actually be used. We can't use the center pixel because it's the one we're trying to synthesize, and we can't use the rest because they haven't been synthesized yet. The second issue is how the distance is weighted within each neighborhood: do we give each pixel the same weight, or do pixels become less important the further they are from the pixel we are trying to synthesize?
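Both points can be illustrated in a short sketch: a test for whether an offset is causal in scan-line order, and a causal sum of squared differences with an optional distance falloff. The falloff 1 / (1 + dx^2 + dy^2) is a hypothetical choice made only to show the idea; this document does not say which weighting, if any, the program uses. Clamp-to-edge border reads and all names are likewise assumptions.

```c
#include <assert.h>

/* Hypothetical single-channel image: row-major floats. */
typedef struct { int w, h; float *px; } Image;

/* Pixel lookup with coordinates clamped to the border (an assumption). */
static float pixel_at(const Image *im, int x, int y) {
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= im->w) x = im->w - 1;
    if (y >= im->h) y = im->h - 1;
    return im->px[y * im->w + x];
}

/* In scan-line order an offset (dx, dy) from the current pixel is already
   synthesized if it lies on an earlier row, or to the left on the same row. */
static int is_causal(int dx, int dy) {
    return dy < 0 || (dy == 0 && dx < 0);
}

/* 1 for uniform weighting; otherwise a hypothetical falloff that shrinks
   with the squared distance from the pixel being synthesized. */
static float offset_weight(int dx, int dy, int weighted) {
    return weighted ? 1.0f / (float)(1 + dx * dx + dy * dy) : 1.0f;
}

/* Number of usable pixels in a causal neighborhood of the given width. */
int causal_count(int width) {
    assert(width > 0 && width % 2 == 1);
    int r = width / 2, n = 0;
    for (int dy = -r; dy <= 0; dy++)
        for (int dx = -r; dx <= r; dx++)
            n += is_causal(dx, dy);
    return n;
}

/* (Weighted) sum of squared differences over the causal part of the
   width x width neighborhoods around (ax, ay) in a and (bx, by) in b. */
float causal_ssd(const Image *a, int ax, int ay, const Image *b, int bx, int by,
                 int width, int weighted) {
    assert(width > 0 && width % 2 == 1);
    int r = width / 2;
    float sum = 0.0f;
    for (int dy = -r; dy <= 0; dy++) {
        for (int dx = -r; dx <= r; dx++) {
            if (!is_causal(dx, dy))
                continue;   /* center and later pixels are not usable yet */
            float d = pixel_at(a, ax + dx, ay + dy) - pixel_at(b, bx + dx, by + dy);
            sum += offset_weight(dx, dy, weighted) * d * d;
        }
    }
    return sum;
}
```

causal_count reproduces the counts from the text: 4 usable pixels for width 3 (the top row plus one), and 12 for width 5 (two full rows of 5 plus two).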

Pixel Handling

This program has an extra (optional) feature over the standard image analogy. The standard image analogy copies the best-matching pixel of A' directly into B'. The original paper can get away with this because it works in the YIQ color space and compares only the Y channel (the black and white data), while the I and Q channels (the color data) are copied over from B. I use the DIFF method instead: it captures the difference from A to A' and then applies it to B to get B'. However, DIFF is only an option; the original method (PSRC) is still there to experiment with, though the YIQ implementation seems a bit flaky.
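To make the YIQ variant concrete, here is a sketch of an RGB/YIQ conversion using the commonly quoted NTSC coefficients (rounded to three decimals), plus the paper-style PSRC step in which Y comes from the matched A' pixel and I and Q are copied from B. The types and function names are hypothetical, and the program's own conversion may use different constants.

```c
#include <assert.h>

/* Hypothetical pixel types; channels are floats in [0, 1]. */
typedef struct { float r, g, b; } Rgb;
typedef struct { float y, i, q; } Yiq;

/* RGB -> YIQ with commonly quoted NTSC coefficients (rounded). */
Yiq rgb_to_yiq(Rgb c) {
    assert(c.r >= 0.0f && c.r <= 1.0f && c.g >= 0.0f && c.g <= 1.0f &&
           c.b >= 0.0f && c.b <= 1.0f);   /* channels expected in [0, 1] */
    Yiq o;
    o.y = 0.299f * c.r + 0.587f * c.g + 0.114f * c.b;   /* luminance */
    o.i = 0.596f * c.r - 0.274f * c.g - 0.322f * c.b;   /* chrominance */
    o.q = 0.211f * c.r - 0.523f * c.g + 0.312f * c.b;
    return o;
}

/* YIQ -> RGB, the approximate inverse of the matrix above. */
Rgb yiq_to_rgb(Yiq c) {
    Rgb o;
    o.r = c.y + 0.956f * c.i + 0.621f * c.q;
    o.g = c.y - 0.272f * c.i - 0.647f * c.q;
    o.b = c.y - 1.106f * c.i + 1.703f * c.q;
    return o;
}

/* Paper-style PSRC in YIQ: luminance from the matched A' pixel,
   I and Q copied from the B pixel at the same position. */
Rgb psrc_yiq(Rgb matched_a2, Rgb b1) {
    Yiq a = rgb_to_yiq(matched_a2);
    Yiq b = rgb_to_yiq(b1);
    b.y = a.y;
    return yiq_to_rgb(b);
}
```

Because only Y takes part in the neighborhood comparison, each distance uses one channel instead of three, which is where the "1/3 of the computations" for the -c YIQ option comes from.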

Disclaimer

This is not "commercial quality" software. There are no guarantees, etc., and the GPL is implied.

Other known issues:

Usage

The program is a command-line tool. Output is always written to output.bmp. Press q at any time to exit the program when using the GUI, or CTRL-C in non-graphical mode. A minimal invocation looks like this:

analogy -a original_a.bmp -A changed_a.bmp -b image_to_apply_analogy_to.bmp

Supported Options

-G
Run in non-GUI mode. The default mode is GUI. If your image (along with the pyramids it will create) is larger than the program's 800x600 window, the program will crash because it attempts to draw pixels off the screen; use the non-GUI mode instead. You probably shouldn't be running the program on such large images anyway: it uses no speed optimizations, so it will run until the sun blows up.
-h
Displays a simple help menu listing all the options.
-r
Populates the B' pyramids with random noise from rand() instead of black pixels.
-a filename
Required. Specifies the A image, i.e., the image that has some sort of relationship with A'.
-A filename
Required. Specifies the A' image, i.e., the image that is some changed version of A. A and A' must be the same dimensions.
-b filename
Required. Specifies the B image, i.e., the image that will have the relationship between A and A' applied to it.
-B filename
Optional. Specifies the B' image, i.e., the image that the construction pyramid will be initialized to, instead of all black pixels (the default) or random noise.
-c color
Specifies the color mode to run in. The choices are RGB (default) or YIQ. YIQ is an optimization that in theory needs only 1/3 of the computations, since only the Y channel is compared, but it seems a bit flaky (at least in the latest implementation).
-m method
Specifies the method of searching to perform. Currently only BRUTE (as in brute force) is available. Eventually things like ANN, TSVQ, COH _might_ be available.
-n neighborhood_width
Specifies the neighborhood width. For example, a width of 5 generates 5x5 squares for regular (non-causal) neighborhoods. The width has to be odd and positive.
-p number_of_passes
Specifies the number of passes to perform at each level. Default is 1. Each pass makes your program run longer, but will sometimes produce drastically different results.
-s seed
Specifies the seed that rand() will use throughout the entire program. (Currently the only thing that uses rand() is populating the B' pyramids with noise.)
-S source
Specifies how to handle the matching pixel in A'. PSRC works just like in the image analogies paper: it assigns the value directly into B'. DIFF first computes the difference between A' and A at that pixel, adds it to the corresponding pixel of B, and stores the result in B'. This option will drastically change the resulting image depending on the A and A' images used.
-t type
Specifies the algorithm used to populate the lower (smaller) pyramid levels. BRUTE simply takes every other pixel. AVG blends the 9 pixel (3x3) area. GAUSSIAN blends the same area, giving extra weight to the center pixel.
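The option list above can be read as a small specification. The sketch below parses and validates it; the Options struct and parse_args are hypothetical, not the program's code. The defaults for -n, -S and -t are not given in this document, so the values marked as placeholders are guesses. A hand-rolled loop is used instead of POSIX getopt() because getopt is not part of standard C, and MS Windows is among the listed target platforms.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the documented flags. */
typedef struct {
    int gui, help, random_init;                   /* -G clears gui; -h, -r */
    const char *a, *a2, *b, *b2;                  /* -a, -A, -b, -B */
    const char *color, *method, *source, *type;   /* -c, -m, -S, -t */
    int width, passes;                            /* -n, -p */
    unsigned seed;                                /* -s */
} Options;

static int is_one_of(const char *s, const char *const *choices) {
    for (; *choices != NULL; choices++)
        if (strcmp(s, *choices) == 0)
            return 1;
    return 0;
}

/* Returns 0 if the arguments are usable, -1 otherwise. */
int parse_args(int argc, char **argv, Options *o) {
    static const char *const colors[]  = {"RGB", "YIQ", NULL};
    static const char *const methods[] = {"BRUTE", NULL};
    static const char *const sources[] = {"PSRC", "DIFF", NULL};
    static const char *const types[]   = {"BRUTE", "AVG", "GAUSSIAN", NULL};
    assert(argv != NULL && o != NULL);
    memset(o, 0, sizeof *o);
    o->gui = 1;
    o->color = "RGB";      /* documented default */
    o->method = "BRUTE";   /* the only documented method */
    o->passes = 1;         /* documented default */
    o->width = 5;          /* placeholder: default not documented */
    o->source = "PSRC";    /* placeholder: default not documented */
    o->type = "BRUTE";     /* placeholder: default not documented */

    for (int i = 1; i < argc; i++) {
        const char *f = argv[i];
        if (f[0] != '-' || f[1] == '\0' || f[2] != '\0')
            return -1;                      /* expect single-letter flags */
        char c = f[1];
        if (c == 'G') { o->gui = 0; continue; }
        if (c == 'h') { o->help = 1; continue; }
        if (c == 'r') { o->random_init = 1; continue; }
        if (i + 1 >= argc)
            return -1;                      /* every other flag takes a value */
        const char *v = argv[++i];
        switch (c) {
        case 'a': o->a = v; break;
        case 'A': o->a2 = v; break;
        case 'b': o->b = v; break;
        case 'B': o->b2 = v; break;
        case 'c': o->color = v; break;
        case 'm': o->method = v; break;
        case 'n': o->width = atoi(v); break;
        case 'p': o->passes = atoi(v); break;
        case 's': o->seed = (unsigned)strtoul(v, NULL, 10); break;
        case 'S': o->source = v; break;
        case 't': o->type = v; break;
        default:  return -1;                /* unknown flag */
        }
    }
    if (o->help)
        return 0;                           /* caller prints the help text */
    if (o->a == NULL || o->a2 == NULL || o->b == NULL)
        return -1;                          /* -a, -A and -b are required */
    if (o->width <= 0 || o->width % 2 == 0 || o->passes < 1)
        return -1;                          /* width must be odd and positive */
    if (!is_one_of(o->color, colors) || !is_one_of(o->method, methods) ||
        !is_one_of(o->source, sources) || !is_one_of(o->type, types))
        return -1;
    return 0;
}
```

With this sketch, the minimal invocation from the Usage section is accepted, while "-n 4" or a missing -A is rejected before any image is loaded.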

Screenshot

Results

Feel free to browse the examples directory. Each example contains files called a1, a2 and b1. Most examples also have an output.txt file showing exactly how the output was created.

TODO

History

2005-05-10
  • Release of v0.00 of this document.
  • Release of v0.00 of the software.

Platform

The idea behind this project is to have highly portable source. Ideally no porting would be needed, only recompiling on a different OS.

Developer OS
  • Linux/Unix
Supportable OS
  • Linux/Unix
  • MS Windows (mostly NT)
  • Mac OS X (Unix enough?)
Programming Language
  • C
APIs, etc.
  • libsdl, libsdl/image
Tools
  • Gimp (Photoshop substitute)

Files

Entire source, example images and presentation (~2MB)

Windows binary and examples (~1MB)

About

This document was originally written by Bartosz Luczynski.

Last document update
2005-05-10