§ IBM T. J. Watson Research Center (1997-present)
§ Columbia University (<1997)
IBM T. J. Watson Research Center (1997-present):
-
MARVEL: Multimedia Analysis and Retrieval System (demo): We are developing a prototype multimedia search engine called MARVEL, which recently won the Wall Street Journal 2004 Innovation Award in the multimedia category. MARVEL replaces the costly, time-consuming, and error-prone process of manually annotating 100% of multimedia content (e.g., video, images, audio) with a semantics-based machine learning approach that requires annotating only 1-5% of the data. MARVEL builds statistical models from multi-modal features (e.g., speech transcripts, visual appearance, sounds) using labeled training examples, and these models automatically propagate the annotations to the remaining unlabeled data. This process greatly reduces the total cost of annotation, reduces annotation errors, and allows effective search and retrieval.
The MARVEL system consists of two components: the MARVEL multimedia analysis engine and the MARVEL multimedia search engine.
· The MARVEL multimedia analysis engine – applies machine learning techniques to model semantic concepts in video from automatically extracted audio, speech, and visual content. It automatically assigns labels (with associated confidence scores) to new video data, which reduces the manual annotation load and improves searching. It also organizes semantic concepts using ontologies that exploit semantic relationships to improve detection performance.
· The MARVEL multimedia search engine – integrates multimedia semantics-based searching with other search techniques (speech, text, metadata, audio-visual features, etc.). It also combines content-based, model-based, and text-based methods for video retrieval.
Figure: The MARVEL multimedia search engine allows searching over a large video repository using automatically generated semantic labels.
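The propagation of annotations from a small labeled subset to the rest of a collection can be illustrated with a minimal sketch. It assumes a nearest-centroid model over made-up two-dimensional feature vectors; this page does not specify MARVEL's actual models or features, so the function names, the data, and the confidence formula below are all hypothetical.

```python
import math

def train_centroids(labeled):
    """Average the feature vectors of each label: a stand-in statistical model."""
    sums, counts = {}, {}
    for features, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def propagate(centroids, features):
    """Return (label, confidence) for one unlabeled item.

    Confidence is a softmax over negative distances, so it lies in (0, 1).
    This formula is an assumption made for illustration only.
    """
    weights = {label: math.exp(-math.dist(features, c))
               for label, c in centroids.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total

# Hypothetical features, e.g. (fraction of blue pixels, speech energy).
labeled = [((0.9, 0.1), "outdoors"), ((0.8, 0.2), "outdoors"),
           ((0.1, 0.9), "studio"), ((0.2, 0.8), "studio")]
unlabeled = [(0.85, 0.15), (0.15, 0.7), (0.5, 0.5)]

model = train_centroids(labeled)
for item in unlabeled:
    label, confidence = propagate(model, item)
    print(item, label, round(confidence, 2))
```

Items that fall between the two centroids receive a confidence near 0.5, which is the kind of signal a reviewer could use to decide which automatic labels to check by hand.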
-
SLAM: Semantic Learning and Analysis of Multimedia
This project aims to bridge the gap between features and semantics. Multimedia content is an essential part of information technology. However, the difficulty in filtering, searching, and summarizing video has so far hindered the effective utilization of video databases. Users want to filter and query video by high-level (semantic) concepts, while automatic algorithms can extract only low-level features (e.g., color, texture, shape, amount of motion). Bridging this gap is thus the most challenging problem in video (multimedia) indexing, summarization, retrieval, and filtering.
-
Content Adaptation Framework: Universal Multimedia Access
The information revolution of the last decade has resulted in a phenomenal increase in rich multimedia content on the Web. At the same time, an increasingly heterogeneous range of pervasive devices is gaining access to the Web. Although some services are being developed to target specific devices, such as mobile phones and wireless PDAs, the gap is growing significantly between the richness of the available multimedia content and the widely varying capabilities of the devices and methods for accessing the Web. Universal Multimedia Access (UMA) promises to allow seamless access to rich multimedia content across devices, networks, access methods, usage contexts, and users. We are addressing these challenges from several directions. These include the standardization of metadata (e.g., MPEG-7 XML) that allows selection among variations of multimedia resources and supports content adaptation, and the development of systems and network architectures that allow rich media content to be transformed at peers within a processing chain. From the end-user perspective, we are exploring rich standardized descriptions of the user environment (e.g., MPEG-21 Digital Item Adaptation), including device capabilities, networks and bandwidth, usage contexts, and user preferences.
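Selecting among variations of a resource for a given usage environment can be sketched as follows. This is a simplified illustration, not the MPEG-7 or MPEG-21 schema: the `Variation` and `UsageEnvironment` classes, their fields, and the sample values are hypothetical stand-ins for the standardized descriptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Variation:
    """One pre-authored or transcoded version of a multimedia resource."""
    name: str
    width: int          # pixels; 0 for non-visual variations
    bitrate_kbps: int   # bandwidth needed for delivery
    fidelity: float     # 0..1, how faithful it is to the original

@dataclass(frozen=True)
class UsageEnvironment:
    """Hypothetical summary of device and network capabilities."""
    screen_width: int
    bandwidth_kbps: int

def select_variation(variations: Sequence[Variation],
                     env: UsageEnvironment) -> Optional[Variation]:
    """Pick the highest-fidelity variation the environment can handle."""
    feasible = [v for v in variations
                if v.width <= env.screen_width
                and v.bitrate_kbps <= env.bandwidth_kbps]
    return max(feasible, key=lambda v: v.fidelity, default=None)

variations = [
    Variation("full video", 640, 1500, 1.0),
    Variation("low-rate video", 320, 300, 0.7),
    Variation("key-frame slideshow", 160, 40, 0.4),
    Variation("text summary", 0, 2, 0.1),
]

pda = UsageEnvironment(screen_width=320, bandwidth_kbps=128)
choice = select_variation(variations, pda)
print(choice.name if choice else "no suitable variation")
```

With the sample values, the PDA's 128 kbps link rules out both video variations, so the key-frame slideshow is selected. A fuller treatment would also weigh usage context and user preferences, which this sketch omits.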

-
MediaStar: Video Semantic Summarization Systems
The MediaStar Video Semantic Summarization System generates a summarized video based on the user's preferences and delivers the personalized content effectively to the user. It is a complete system that dynamically generates personalized video summaries using MPEG-7 descriptions of video content in a middleware architecture. The system is designed and implemented for (1) a stand-alone application, (2) a mobile platform, and (3) a web browser. Each version allows the user to specify topic preferences, query keywords, and total summary time. The summarization techniques optimize the relevance scores of the user parameters against the MPEG-7 semantic descriptions of the video content.
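One way to picture summarization under a time budget is a greedy selection of the most relevant segments. The sketch below is an assumption for illustration, not MediaStar's actual optimizer: the `Segment` fields stand in for MPEG-7 semantic descriptions, and the scoring rule (topic weights plus keyword hits) is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    start: float        # seconds from the start of the video
    duration: float     # seconds, must be positive
    topics: frozenset   # topic labels, standing in for semantic descriptions
    text: str           # free-text annotation

def relevance(segment, topic_weights, keywords):
    """Score a segment against topic preferences and query keywords."""
    topic_score = sum(topic_weights.get(t, 0.0) for t in segment.topics)
    words = set(segment.text.lower().split())
    keyword_score = sum(1.0 for k in keywords if k.lower() in words)
    return topic_score + keyword_score

def summarize(segments, topic_weights, keywords, max_seconds):
    """Greedily keep the most relevant segments per second within the budget."""
    scored = [(relevance(s, topic_weights, keywords), s) for s in segments]
    ranked = sorted((p for p in scored if p[0] > 0),
                    key=lambda p: p[0] / p[1].duration, reverse=True)
    chosen, used = [], 0.0
    for _, seg in ranked:
        if used + seg.duration <= max_seconds:
            chosen.append(seg)
            used += seg.duration
    return sorted(chosen, key=lambda s: s.start)   # play back in original order

segments = [
    Segment(0, 30, frozenset({"politics"}), "election results announced"),
    Segment(30, 20, frozenset({"sports"}), "basketball final highlights"),
    Segment(50, 40, frozenset({"weather"}), "storm warning for the coast"),
    Segment(90, 15, frozenset({"sports"}), "tennis open final"),
]
prefs = {"sports": 1.0, "weather": 0.5}

summary = summarize(segments, prefs, ["storm"], max_seconds=60)
print([s.start for s in summary])   # [30, 90]
```

The greedy rule is a heuristic and can miss the best combination; an exact solution would treat the problem as a knapsack over segment durations.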
-
VideoZoom: Wavelet Video Zooming
VideoZoom is a framework for progressively retrieving video sequences over the Internet at multiple spatial and temporal resolutions. The goal of VideoZoom is to greatly speed up delivery of video over the Internet and provide efficient multi-resolution interaction with video. VideoZoom uses wavelet technology to efficiently compact video sequences into small files suitable for distribution over the Internet. The VideoZoom progressive compression format allows initial fast retrieval and playback of coarse versions of the video. Additional residual information is subsequently retrieved in a number of progressive stages to fill in spatial and temporal details to reproduce the high-quality, full-resolution video.
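The coarse-first, refine-later idea can be sketched with a one-dimensional Haar transform. This is an illustration under stated assumptions: the page does not say which wavelet VideoZoom uses, the function names are hypothetical, and a real system would apply the decomposition across both spatial and temporal dimensions rather than to a single row of values.

```python
def haar_decompose(signal, levels):
    """Split a signal into a coarse approximation plus per-level residual details.

    The signal length must be divisible by 2**levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx, details[::-1]   # coarsest detail first, in retrieval order

def refine(approx, detail):
    """Double the resolution by applying one stage of residual detail."""
    out = []
    for avg, diff in zip(approx, detail):
        out.extend((avg + diff, avg - diff))
    return out

def progressive_versions(approx, details):
    """Yield the coarse version first, then each refinement as details arrive."""
    current = list(approx)
    yield current
    for detail in details:
        current = refine(current, detail)
        yield current

row = [8, 6, 2, 4, 9, 9, 1, 3]
coarse, residuals = haar_decompose(row, levels=2)
for version in progressive_versions(coarse, residuals):
    print(version)
# [5.0, 5.5]
# [7.0, 3.0, 9.0, 2.0]
# [8.0, 6.0, 2.0, 4.0, 9.0, 9.0, 1.0, 3.0]
```

A client can display the two-sample version as soon as it arrives and replace it with sharper versions as each residual stage is retrieved; the final stage reproduces the original values exactly.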
-
SFGraph: Progressive High-Resolution Image Zooming
SFGraph is a framework for compressing and viewing large, high-resolution images. It provides an encoder and viewer for compressing and browsing large images such as aerial photographs, satellite images, high-resolution documents and maps, and high-resolution color photographs. SFGraph's goal is to speed up the delivery of, and interaction with, high-resolution images over the Internet. SFGraph's encoder uses wavelet technology to compress large images in the PGM, PPM, PNM, JPEG, GIF, and raw RGB color and gray-scale formats. The SFGraph encoder allows control of the compression factors to provide a trade-off between the size of the compressed files and the fidelity of the reconstructed image. SFGraph also includes a Netscape plug-in-based image viewer that allows the SFGraph compressed images to be displayed interactively by zooming in and out and panning around the image.
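The trade-off between file size and fidelity can be illustrated with uniform quantization of wavelet coefficients. This is a sketch under assumptions: the page does not describe how SFGraph's compression factors work, so the quantizer, the function names, and the sample coefficients are hypothetical.

```python
import math

def quantize(coefficients, step):
    """Map each coefficient to an integer index; larger steps discard more detail."""
    return [math.floor(c / step + 0.5) for c in coefficients]

def dequantize(indices, step):
    """Recover approximate coefficients from their integer indices."""
    return [i * step for i in indices]

def report(coefficients, step):
    """Return (number of nonzero indices, maximum reconstruction error)."""
    indices = quantize(coefficients, step)
    restored = dequantize(indices, step)
    nonzero = sum(1 for i in indices if i != 0)
    max_error = max(abs(c - r) for c, r in zip(coefficients, restored))
    return nonzero, max_error

# Hypothetical wavelet detail coefficients for one image tile.
coefficients = [12.0, -0.5, 3.0, 0.25, -7.5, 1.0, 0.0, -2.0]
for step in (1.0, 4.0, 16.0):
    nonzero, max_error = report(coefficients, step)
    print(f"step={step}: {nonzero} nonzero coefficients, max error {max_error}")
# step=1.0: 5 nonzero coefficients, max error 0.5
# step=4.0: 3 nonzero coefficients, max error 2.0
# step=16.0: 1 nonzero coefficients, max error 7.5
```

Fewer nonzero indices generally compress to fewer bytes under entropy coding, while the maximum error stays within half the step size, so the step acts as a single knob between file size and fidelity.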
-
WebSEEk: image and video search engine for the World Wide Web
-
VisualSEEk and SaFe: Spatial and Feature query system

