How does it work?

The most amazing thing about the multiresolution image query is how simple it is. Here's an overview of how it works.

The Wavelet Signature

  1. The image is rescaled to 128 × 128. This consolidates important details and makes the size of the image a power of two for processing (this is important for the wavelet decomposition step). The small size also makes processing faster, of course.
  2. This rescaled image is converted to the YIQ color space (Y is luminance; I and Q are the two chrominance channels; this is the color space used for NTSC television broadcasts in North America). YIQ is used because it concentrates the most perceptually important information in the Y channel, whereas RGB spreads the information more evenly across the channels.
  3. Each of the Y, I, and Q channels undergoes wavelet decomposition. Wavelet decomposition is a process of successively averaging image values together at coarser and coarser resolutions so that the entire image can be reconstituted at any of those resolutions (this is a rather gross explanation; see my references page for more information about wavelets). Hence the name "multi-resolution". At the end of this process, there are 3 * 128 * 128 = 49,152 wavelet coefficients.
  4. The average color of each channel (the final result of the averaging process that is wavelet decomposition) is saved.
  5. The n wavelet coefficients with the largest absolute values are saved from each channel. The rest are set to 0. In my implementation, 60 coefficients are saved from each channel, so only 180 coefficients are kept from the original 49,152.
  6. The truncated coefficients are quantized: positive values are set to +1 and negative values to -1. Hence, only the signs of the largest coefficients are saved from the original image.
The result of this process is 3 arrays (one for each color channel) holding the truncated, quantized wavelet coefficients, plus the average colors for each channel. This is the wavelet signature.
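The decomposition and quantization steps can be sketched in miniature. This is not Eikon's actual code: it shows an unnormalized 1-D Haar decomposition on a tiny 8-element "scanline" (the real system runs a 2-D transform over each 128 × 128 channel, with the normalization from the paper), followed by keeping the largest-magnitude coefficients and reducing them to their signs.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SignatureSketch {
    // One full 1-D (unnormalized) Haar decomposition: repeatedly replace the
    // low half with pairwise averages and the high half with pairwise
    // differences. Slot 0 ends up holding the overall average.
    static double[] haar1d(double[] data) {
        double[] a = data.clone();
        for (int len = a.length; len > 1; len /= 2) {
            double[] tmp = a.clone();
            for (int i = 0; i < len / 2; i++) {
                tmp[i]           = (a[2 * i] + a[2 * i + 1]) / 2.0; // average
                tmp[len / 2 + i] = (a[2 * i] - a[2 * i + 1]) / 2.0; // detail
            }
            System.arraycopy(tmp, 0, a, 0, len);
        }
        return a;
    }

    // Keep only the n largest-magnitude coefficients (ignoring the overall
    // average in slot 0, which is stored separately) and reduce each kept
    // coefficient to its sign: +1 or -1. Everything else stays 0.
    static int[] quantize(double[] coeffs, int n) {
        Integer[] idx = new Integer[coeffs.length - 1];
        for (int i = 0; i < idx.length; i++) idx[i] = i + 1;
        Arrays.sort(idx, Comparator.comparingDouble(i -> -Math.abs(coeffs[i])));
        int[] signs = new int[coeffs.length];
        for (int k = 0; k < n && k < idx.length; k++) {
            double c = coeffs[idx[k]];
            signs[idx[k]] = c > 0 ? 1 : (c < 0 ? -1 : 0);
        }
        return signs;
    }

    public static void main(String[] args) {
        double[] channel = {9, 7, 3, 5, 6, 10, 2, 6};
        double[] coeffs = haar1d(channel);
        System.out.println("average = " + coeffs[0]);          // average = 6.0
        System.out.println(Arrays.toString(quantize(coeffs, 3)));
        // [0, 0, 1, 1, 0, 0, -1, 0]
    }
}
```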

Using a metric devised by Finkelstein et al., wavelet signatures can be compared. Basically: the more quantized coefficients two signatures have in common, the lower (better) the score between the two images.
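A simplified per-channel version of that scoring idea looks like this. The paper weights each match by its frequency bin; a single uniform weight is assumed here for illustration, and the names are mine, not Eikon's.

```java
public class ScoreSketch {
    // Start with the weighted difference of the channel averages, then
    // subtract a weight for every quantized coefficient the query and the
    // target signature agree on. Lower scores mean better matches.
    static double score(double avgQ, double avgT, int[] signsQ, int[] signsT,
                        double w0, double wMatch) {
        double s = w0 * Math.abs(avgQ - avgT);
        for (int i = 0; i < signsQ.length; i++) {
            if (signsQ[i] != 0 && signsQ[i] == signsT[i]) {
                s -= wMatch;   // each agreeing coefficient improves the score
            }
        }
        return s;
    }

    public static void main(String[] args) {
        int[] q     = {0, 1, -1, 0, 1};
        int[] same  = {0, 1, -1, 0, 1};   // identical signature: 3 matches
        int[] other = {1, -1, 0, 0, 1};   // only one coefficient agrees
        System.out.println(score(0.5, 0.5, q, same, 1.0, 1.0));   // -3.0
        System.out.println(score(0.5, 0.5, q, other, 1.0, 1.0));  // -1.0
    }
}
```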

The Image Database

To construct the image database, a bunch of images are preprocessed using the wavelet signature algorithm described above and saved. Currently, this is implemented as a data structure containing all of the ImageMetaData elements (a container holding image location, MIME type, size, and SHA hash) and a collection of SearchArrays, which are 128x128 2D arrays of lists. There is one list for each (x, y) position at which a coefficient can appear. There are 6 SearchArrays, one for each combination of sign (+1 or -1) and color channel (Y, I, or Q).
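The SearchArray idea can be sketched like this, assuming a grid of lists of image ids; the class and method names are illustrative, not Eikon's actual ones, and the grid is shrunk from 128 to 8 for the demo.

```java
import java.util.ArrayList;
import java.util.List;

public class SearchArraySketch {
    static final int SIZE = 8; // 128 in the real database; small here

    // One 2-D grid of lists per (sign, channel) combination; each list holds
    // the ids of every image whose signature has that sign at that position,
    // so a query coefficient leads straight to the candidate images.
    @SuppressWarnings("unchecked")
    private final List<Integer>[][] cells = new List[SIZE][SIZE];

    SearchArraySketch() {
        for (int x = 0; x < SIZE; x++)
            for (int y = 0; y < SIZE; y++)
                cells[x][y] = new ArrayList<>();
    }

    void add(int x, int y, int imageId)   { cells[x][y].add(imageId); }
    List<Integer> imagesAt(int x, int y)  { return cells[x][y]; }

    public static void main(String[] args) {
        // Eikon keeps 6 of these: {+1, -1} x {Y, I, Q}. One is enough here.
        SearchArraySketch plusY = new SearchArraySketch();
        plusY.add(2, 3, 42);   // image 42 has a +1 Y coefficient at (2, 3)
        plusY.add(2, 3, 7);
        System.out.println(plusY.imagesAt(2, 3));  // [42, 7]
    }
}
```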

When the user wants to look up an image in the database, he specifies the query image (this is often done with a sketch; my system is non-interactive, so the user points to a URL of an image to use as a query). The query image is processed using the wavelet signature algorithm and then compared to all the other signatures in the database using the scoring metric. The images with the lowest scores are the best matches for the query image, and these are returned.

On their webpage, the authors of Fast Multiresolution Query have a figure that explains this process much better than I ever could.

Eikon Implementation Details

Eikon is implemented in Java 1.4 (the current beta version). It uses the new ImageIO classes that Java 1.4 provides for converting real image formats into a Java BufferedImage object. That's the only 1.4 feature used.

The image database is stored in an object called ImageMetaDataCollection. This object is serialized and written to a file between Eikon sessions to provide for persistence. Perhaps an RDBMS solution would have been more stable, but serializing the object was much faster to implement, and more realistic for an end-user application (but not, of course, for a server-based application!).
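The save/load cycle is plain Java object serialization, roughly as below. The class and file names here are stand-ins (any Serializable object works the same way), and the code is written against a modern JDK, whereas Eikon itself targeted Java 1.4.

```java
import java.io.*;
import java.util.ArrayList;

public class PersistenceSketch {
    // Write the whole database object to disk between sessions...
    static void save(Serializable db, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(db);
        }
    }

    // ...and read it back in at startup.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> db = new ArrayList<>();   // stand-in for the database
        db.add("image-one.jpg");
        File f = File.createTempFile("eikon", ".db");
        save(db, f);
        Object restored = load(f);
        System.out.println(restored);  // [image-one.jpg]
        f.delete();
    }
}
```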

Here is what a typical run of Eikon is like:

  1. Program started. Eikon looks for the saved database file. If found, it is read in and unserialized. If not, a new database is created.
  2. The query image (presented as a String representing a URL) is downloaded (assuming the MIME type is image/jpeg or image/gif).
  3. The SHA hash of the query image is calculated as a unique identifier and as a test for exact matches.
  4. The wavelet signature is calculated, as described above.
  5. The wavelet signature is used to calculate the closest matches from the database.
  6. The SHA hash of the query image is used to see if there is an exact match in the database. If there is, it is returned as part of the image matches output to the user. If not, then the query image is added to the database.
  7. The image database is serialized and written to a file. The program ends.
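The exact-match test in steps 3 and 6 is just a digest over the raw image bytes: identical files always hash to the same value. A minimal sketch using the standard MessageDigest API, assuming SHA-1 as the "SHA hash" (the original doesn't say which variant):

```java
import java.security.MessageDigest;

public class HashSketch {
    // Digest the image bytes and render the result as lowercase hex, usable
    // both as a unique database key and as an exact-duplicate check.
    static String sha(byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] image = "fake image bytes".getBytes("UTF-8");
        // The same bytes always produce the same hash, so a re-downloaded
        // copy of an image in the database is detected as an exact match.
        System.out.println(sha(image).equals(sha(image.clone())));  // true
    }
}
```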

Does it work?

Sort of! In my tests, Eikon consistently returns differently sized versions of the same image as close matches. So, for example, the thumbnail of an image can be used as a very accurate search template for the larger version. This works quite well.

I've also had some success in getting different versions of the same picture to match each other. For example, using one Mona Lisa image to find others.

It is harder to tell whether finding images that are similar but not directly related will work well. The image database is so small right now (approximately 400 images) that many of the matches aren't very meaningful.

In conclusion, remember that this is a demo. It has not been tuned, tweaked, or extensively tested. However, it is a strong foundation upon which to build a real application. With a good amount of work, I think it could become a truly useful application.


Luke Francl
Last modified: Tue Jun 26 15:54:30 CDT 2001