A powerful new technology makes it possible to catalog and retrieve images without using verbal descriptions.
Early clients range from textile companies to law enforcement authorities.
IN BRIEF:
A team at IBM's Almaden Research Center has developed a technology that permits users to catalog and retrieve images from databases without having to describe them verbally. Query by Image Content (QBIC) relies on a simple concept: the best way to query a database of images is to "show it" an image similar to that being sought, and to ask for all images that match it in some way or other. QBIC has already been incorporated into two IBM commercial products. It is finding use in applications that range from textiles to art history to police work.
If a picture is worth a thousand words, imagine the difficulty of cataloging a database of up to 100,000 images by providing a written description of each one. Not only would the cataloger have to describe the color, shape and texture of every element within each picture and its relationship to every other element; but anyone searching for an image would have to guess exactly what words had been used in the description. With computers capable of displaying millions of colors and an effectively infinite number of shapes and textures, the task would be unenviable, if not impossible.
"Try to describe a wallpaper pattern over the phone," observes Dragutin Petkovic, manager of advanced algorithms, architectures and applications at IBM's Almaden Research Center. "No matter how simple the pattern and how precise you are in your description, you would have great difficulty finding a match."
Today, thanks to a powerful new means of cataloging and retrieving images and other multimedia documents developed by Petkovic's team at Almaden, image databases can be sorted and queried by color, shape and texture. The new technology is called Query By Image Content (QBIC) and pronounced "cubic." Combined with simple keyword descriptors and text-search capability, it promises to revolutionize the storage and retrieval of multimedia images.
The ability to search by image content supplements, rather than replaces, traditional query techniques. In fact, a QBIC database can contain as much textual information as is needed to simplify the search. Neither QBIC nor any similar technology can automatically associate a word, such as "house," with an image. However, such associations can be added by hand, just as in an ordinary database. If QBIC were being used for an online clothing catalog, for example, the automatically generated image descriptors would be accompanied by such information as the name of the item, the material it's made of, its price and its catalog number.
QBIC emerged when Almaden researchers sought to apply the skills they had developed in creating robotic machine vision inspection systems for microelectronic and data storage manufacturing to the broader, newer content-management business. The project involved extensive cooperation. Complementing the efforts of Petkovic, Wayne Niblack and others at Almaden was work on color matching by Jung-kook Hong's group at the Tokyo Research Laboratory and on matching shapes by Gabriel Taubin at the Thomas J. Watson Research Center.
So far, IBM has incorporated the technology into two of its commercial products: Ultimedia® Manager and DB2 Extensions. The company is also licensing the core search engine to developers and resellers. QBIC is already being used by a variety of customers, ranging from textile manufacturers and art libraries to law enforcement agencies (see below). It has also been incorporated into custom software tailored for specific applications, such as selecting wallpaper and tracking down photographs in stock collections. A group, headed by Frank Tung at the Software Solutions Division's Santa Teresa Laboratory, is developing new products, while Drew Clark spearheads efforts to license the technology.
QBIC operates on a simple principle: the best way to query a database of images is to "show it" an image similar to the one you're seeking and to ask for all the images that match the sample in one or more features. For example, a photo researcher might select a shade of red and sketch a freehand outline of a spiky flower. In an instant, the spiky red flowers in the database that best match the drawing will pop up on the screen along with any other object that resembles a spiky red flower. The researcher can choose how many images appear. Most will select no more than the 10 or 20 small "thumbnail" images that fill a single screen.
Suppose that, while viewing those images, the researcher is struck by one in which the red flower appears against a blue sky (while in most of the other images the flowers are framed by green grass). The researcher can simply click on that image. In response, QBIC instantly brings up all the images that are visually similar to it: red flowers against a blue sky.
How QBIC quantifies images
When an image is scanned into QBIC, the computer calculates numerical values of several image descriptors:
- First, the technology computes the average color of the entire area as well as a "color histogram." QBIC puts each of the 16 million colors found in digital images into one of 64 psychologically meaningful color bins and determines that the photo consists, for example, of 20 percent yellow shades, 60 percent blues, and the remainder a mixture of other colors.
- Next, QBIC computes three measures of texture. These are: the extent of contrast (a zebra would be high contrast, a polar bear in a snowstorm, low); the amount of coarseness (a tic-tac-toe board is coarser than a checkerboard); and the degree of directionality (a picket fence has more directionality than the leaves on a tree).
- The system can also record the positions of different colors within the image. This permits QBIC to answer such "color layout" queries as "show me all images with blue at the top and white at the bottom," for which the query input is an image drawn by the user or a sample image.
- QBIC can also sort for more complex shapes outlined by the user, as well as their locations within the image. Usually, the searcher outlines the image by hand before inputting it to the QBIC system. But the process can be automated if the background is uniform and contrasts highly with the shape.
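To make the idea of numerical image descriptors concrete, here is a minimal Python sketch of three of them: a 64-bin color histogram, an average color and a crude contrast measure. It is an illustration only, not IBM's implementation. The article does not say how QBIC defines its "psychologically meaningful" bins, so the sketch assumes a plain uniform split of four levels per red, green and blue channel, and it assumes that the spread of gray levels is an acceptable stand-in for contrast. All function names are hypothetical.

```python
from collections import Counter
from statistics import pstdev


def color_bin(pixel, levels=4):
    """Map an (r, g, b) pixel (channels 0-255) to one of levels**3 bins.

    Assumption: uniform quantization, 4 levels per channel -> 64 bins.
    """
    r, g, b = (channel * levels // 256 for channel in pixel)
    return (r * levels + g) * levels + b


def color_histogram(pixels, levels=4):
    """Return the fraction of pixels that fall into each color bin."""
    counts = Counter(color_bin(p, levels) for p in pixels)
    return [counts[i] / len(pixels) for i in range(levels ** 3)]


def average_color(pixels):
    """Return the mean (r, g, b) over all pixels."""
    return tuple(sum(p[i] for p in pixels) / len(pixels) for i in range(3))


def contrast(gray_levels):
    """Spread of gray levels (population standard deviation).

    Assumption: a crude proxy for the contrast measure described in the text.
    """
    return pstdev(gray_levels)


# A tiny made-up "image": 6 blue, 2 yellow, 1 red and 1 green pixel.
pixels = [(0, 0, 255)] * 6 + [(255, 255, 0)] * 2 + [(255, 0, 0), (0, 255, 0)]
histogram = color_histogram(pixels)
blue_share = histogram[color_bin((0, 0, 255))]      # 60 percent blue
yellow_share = histogram[color_bin((255, 255, 0))]  # 20 percent yellow

# Zebra-like stripes versus a polar bear in a snowstorm.
zebra = [0, 255] * 8
polar_bear = [250, 255] * 8
print(blue_share, yellow_share, average_color(pixels))
print(contrast(zebra) > contrast(polar_bear))
```

The made-up image mirrors the example in the text: 60 percent of its pixels land in a blue bin, 20 percent in a yellow bin, and the rest elsewhere.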
When responding to a query, QBIC compares the numerical descriptor values of the query image with those of each image in the database, then ranks the database images by their similarity to the query.
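Ranking by similarity can be sketched in a few lines of Python. The article does not specify how QBIC measures similarity, so this example assumes the simplest option: the sum of absolute differences between two color histograms, with smaller totals meaning closer matches. The three-bin histograms, the image names and the function names are all hypothetical.

```python
def histogram_distance(a, b):
    """Sum of absolute bin differences (assumed metric; 0 means identical)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_similarity(query, database, top_k=10):
    """Return the names of the top_k images closest to the query histogram."""
    ranked = sorted(
        database.items(),
        key=lambda item: histogram_distance(query, item[1]),
    )
    return [name for name, _ in ranked[:top_k]]


# Toy three-bin histograms: shares of [red, green, blue] in each image.
database = {
    "rose": [0.7, 0.2, 0.1],
    "lawn": [0.1, 0.8, 0.1],
    "sky": [0.1, 0.1, 0.8],
    "poppy": [0.6, 0.3, 0.1],
}
query = [0.75, 0.15, 0.1]  # a mostly red sample image

best_matches = rank_by_similarity(query, database, top_k=2)
print(best_matches)
```

The `top_k` parameter plays the role of the searcher's choice of how many thumbnails to display.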
Supplementing traditional techniques
Combining text-based searches with those based on image content makes the power of QBIC especially apparent. An individual browsing through the clothing catalog, for example, might choose search terms so as to show only men's cotton shirts costing less than $40. That still leaves a wide variety of styles and colors. If he were interested only in green shirts with stripes, he'd click first on any striped shirt and then on a green color-selector box. Anyone who's ever noticed the wide variety of color names - a recent popular mail-order catalog identifies no fewer than 20 shades of green - will appreciate the convenience of using nonverbal descriptions of color.
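The shirt search can be sketched in Python as a two-stage query: ordinary text fields narrow the catalog, then an image descriptor orders what is left. This is a simplified illustration with a made-up catalog and hypothetical names. It assumes that each item stores only an average color and that plain Euclidean distance in RGB is good enough for ranking. The stripe (texture) part of the query is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class CatalogItem:
    name: str
    category: str
    material: str
    price: float
    color: tuple  # average (r, g, b) of the product photo (assumed descriptor)


def color_distance(a, b):
    """Euclidean distance between two RGB triples (assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def search(items, category, material, max_price, target_color, top_k=10):
    """Filter on text fields first, then rank survivors by color similarity."""
    candidates = [
        item for item in items
        if item.category == category
        and item.material == material
        and item.price < max_price
    ]
    candidates.sort(key=lambda item: color_distance(item.color, target_color))
    return candidates[:top_k]


catalog = [
    CatalogItem("forest stripe", "men's shirt", "cotton", 35.0, (30, 120, 50)),
    CatalogItem("navy oxford", "men's shirt", "cotton", 38.0, (20, 30, 110)),
    CatalogItem("lime polo", "men's shirt", "cotton", 29.0, (60, 180, 40)),
    CatalogItem("emerald dress shirt", "men's shirt", "silk", 39.0, (10, 130, 60)),
    CatalogItem("pine flannel", "men's shirt", "cotton", 55.0, (20, 110, 40)),
]

green = (0, 128, 0)  # the color picked in the selector box
results = search(catalog, "men's shirt", "cotton", 40.0, green)
print([item.name for item in results])
```

Filtering before ranking keeps the visual comparison limited to items that already satisfy the text criteria, which is why the silk shirt and the $55 flannel never reach the color-ranking step.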
Moreover, explains team member Myron Flickner, "with visual material, you often don't know what you like until you see it, so browsing and searching are integrally connected. By starting with a fuzzy, imprecise query, which is inherent in content-based query, you end up with the best matches ranked in order of similarity. Although you may start out looking for a yellow sky, you may prefer a pink one once you see it."
Sometimes this leads serendipitously to the discovery of images quite remote from the original search range. For instance, a designer looking for yellow flowers may be presented with a photo of the Statue of Liberty, her golden crown gleaming against an azure sky. Artists have even invented a new term for this kind of "mistake." They call it "visual rhyming," and they can make creative use of it - by connecting semantically disparate scenes that have similar color content, for example.
Visual rhyming points up the value of cooperation between human and machine in using QBIC technology. "Occasionally you get something that looks like 'junk' to you," admits Petkovic. "But your visual system is very good at rejecting these false alarms. There are tasks that machines do very well: counting, measuring, applying the same algorithm objectively over large amounts of data. But the knowledge, understanding and adaptability of humans is something we cannot replicate."
The next steps
While current versions of QBIC have already proved helpful in several settings, the Almaden team is working on improvements that will make QBIC even more useful. "We want to speed it up," says Petkovic, "so professional users can brainstorm and navigate with it, rather than just wait for images. We also want to have more intuitive methods of feedback. We want to develop a way for the user to provide both positive and negative feedback on image targets, so that you could say, 'I like this and this and this, but I don't like that and that and that.'"
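One way such positive and negative feedback could work is sketched below in Python. The article does not describe the team's planned method. This example borrows a Rocchio-style update, a standard relevance-feedback technique from text retrieval, and applies it to image feature vectors. The weights, the two-element feature vectors and the function names are assumptions made for illustration.

```python
def mean_vector(vectors, size):
    """Component-wise mean of a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * size
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]


def refine_query(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: move toward liked images, away from disliked ones."""
    size = len(query)
    positive = mean_vector(liked, size)
    negative = mean_vector(disliked, size)
    return [
        alpha * q + beta * p - gamma * n
        for q, p, n in zip(query, positive, negative)
    ]


# Hypothetical two-feature descriptors: [share of pink, share of yellow].
query = [0.5, 0.5]
liked = [[1.0, 0.0], [0.8, 0.2]]   # "I like this and this"
disliked = [[0.0, 1.0]]            # "but I don't like that"

refined = refine_query(query, liked, disliked)
print(refined)
```

The refined vector would then be used as the query for the next round of ranking, so each round of "like" and "don't like" choices steers the search.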
An even more significant improvement in QBIC: its imminent extension to video images. A version of QBIC that can quantify video images is under development at Almaden, although it is not yet available in a commercial product.
In processing a video image, video QBIC can determine where scenes begin and end, and will soon be able to find one or more representative frames to describe each scene. It can also measure camera movements, such as pans and zooms, and it can create a mosaic from those movements that can delete objects moving within the frame. For example, if the video sequence includes a pan from left to right in a room, and a person is moving within that scene, video QBIC can produce a still image of the room from which the person has been neatly extracted.
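A common heuristic for finding where scenes begin is to compare the color histograms of consecutive frames and to flag a boundary wherever they differ sharply. The Python sketch below illustrates that heuristic. The article does not say how video QBIC detects scene changes, so the method, the threshold value and the function name are all assumptions.

```python
def find_scene_starts(frame_histograms, threshold=0.5):
    """Return indices of frames that start a new scene.

    A new scene is assumed to start when the summed absolute difference
    between consecutive frame histograms exceeds the threshold.
    """
    starts = [0] if frame_histograms else []
    for index in range(1, len(frame_histograms)):
        previous = frame_histograms[index - 1]
        current = frame_histograms[index]
        difference = sum(abs(a - b) for a, b in zip(previous, current))
        if difference > threshold:
            starts.append(index)
    return starts


# Toy two-bin histograms: [share of dark pixels, share of bright pixels].
frames = [
    [0.90, 0.10],  # indoor scene
    [0.88, 0.12],
    [0.10, 0.90],  # cut to a bright outdoor scene
    [0.12, 0.88],
]

scene_starts = find_scene_starts(frames)
print(scene_starts)
```

A fixed threshold is the simplest choice. Gradual transitions such as fades would call for a more careful comparison than this sketch attempts.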
Further development of video QBIC will allow users to quantify and search for many aspects of a moving scene. It will, for example, be possible for the manager of a stock footage agency to pull up all the sequences in which a red race car is accelerating rapidly from the lower left to the upper right of the screen. With that capability on the way, QBIC promises to satisfy the most demanding of searchers for images.
Robert Finn is a freelance science writer based in Long Beach, California.
For More Information: QBIC in Action
While it's still in the early stages of application, QBIC is already making a difference in several areas. Here are three examples:
Software engineer Edgard Aun is using QBIC to help textile companies in Brazil to keep track of hundreds of patterns that date back 30 years or more. Why? Because public taste always has a yen for nostalgia. As Aun puts it, "fashions always come back."
Since textile companies often find it too much trouble to search through old fabric swatches piled in a back room, they frequently hire artists to redesign patterns similar to those they already own. Aun and some partners started a company that may help end that inefficiency.
Western Imaging of Brazil goes from mill to mill, scans in digital images of the back room swatches (up to 3,000 images for some mills), and puts them on a CD-ROM. The company adds IBM's Ultimedia Manager for rapid QBIC searching of the results.
These databases find use externally as well as internally. Aun's company has equipped about 100 sales representatives from the mills with notebook computers that contain CD-ROMs filled with the latest season's designs. Instead of dragging fabric samples from clothing manufacturer to upholsterer to curtain maker, the reps now use the searchable, and much more easily transported, QBIC database.
With 200,000 images, the slide library in the Art and Art History Department at the University of California, Davis, provides a valuable resource to artists and others all over campus. But fine arts librarian Bonnie Holt discovered that even simple queries were taking up a great deal of her time.
A biologist might come in and ask for all the collection's paintings of fish. An art historian might want to see all sculptures made from black stone. A studio artist who had hit a snag might want to see how others had solved similar problems. Because of her familiarity with the vast collection under her charge, Holt could often answer such queries. But doing so meant dropping whatever she was doing, pulling slides from drawers, consulting with the patron once again and refining the search.
So she sought a way to do this electronically. She wanted not only to save time, but also to prevent wear and tear on the slides and, perhaps most important, to put the searches more directly under the control of the patrons. QBIC fit the bill. In one of its first real-world applications, the technology was used to classify and store an initial selection of 2,000 of the library's slides. One application of the new technology particularly excites Holt: using the database to consider queries for similarity in race, gender or even class among the many individuals depicted in the artwork.
Janet Hethorn, professor of design at the University of California, Davis, uses QBIC in another unexpected way. "My research is all about how people see other people and what they're wearing," she explains, "and how that visual response affects their decision making about the meaning of clothes and about trends in fashion."
For a study on skiwear, Hethorn and her coworkers took about 2,000 photographs of people at ski resorts. They asked each photo subject about their skiing ability and sought their opinions on the clothes they were wearing. Hethorn plans to use the resulting QBIC database to better understand what people want in skiwear. Connecting the visual data with text responses provides intriguing information. For example, it is possible to see how differently beginning and advanced skiers interpret the word "comfortable."
In another study, Hethorn is studying the visual factors that lead to gang identity among California teenagers. "We're looking at the intersections between violence and style," she explains. "We gave cameras to kids all over the state of California. We just finished collecting more than 1,000 images, and we've been doing focus group interviews in which kids have been talking about what it means to them when they look at these photographs."
This study has more than academic interest. Hethorn is working with law enforcement agencies, among them the California Gang Investigators Association and the Los Angeles County Sheriff's Department, to see whether QBIC can be used to help understand gang identity. For example, if a suspect has a tattoo with a certain type of star motif, the police could compare the marking with those in a QBIC database to determine if it's characteristic of a certain gang.