Demos

  1. The IBM Video Retrieval System
  2. Semantic Filtering using Automatic Concept Detection and Context Reinforcement
  3. Detection of Recurrent Audiovisual Patterns using Hierarchical Hidden Markov Models
  4. Video Query by Example using Audio-visual features and Dynamic Programming to Match Temporal Patterns
  5. Audio-visual Relevance Feedback using Dynamic Programming

 


 

The IBM Video Retrieval System

The MARVEL Multimedia Video Retrieval System can be found at http://mp7.watson.ibm.com/

MPEG-7 video search engine

Abstract

This system supports query by example in low-level feature space as well as in high-level model-vector space, where the model vectors are generated for the TREC lexicon using semantic learning and multimedia analysis techniques.
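As a rough illustration of query by example in a vector space, the sketch below ranks stored shots by Euclidean distance to the vector of an example shot. It assumes each shot is already represented by a fixed-length vector (for instance, one confidence score per lexicon concept); the function name, shot identifiers and values are hypothetical and are not taken from the IBM system.

```python
import math

def query_by_example(query_vec, database, top_k=3):
    """Return the ids of the top_k stored vectors closest to query_vec.

    database maps a shot id to its vector (hypothetical data layout).
    """
    ranked = sorted(database, key=lambda item: math.dist(query_vec, database[item]))
    return ranked[:top_k]

# Hypothetical two-concept vectors, e.g. (P("outdoor"), P("face")).
shots = {"shot1": (0.9, 0.1), "shot2": (0.2, 0.8), "shot3": (0.8, 0.2)}
nearest = query_by_example((1.0, 0.0), shots, top_k=2)
```

The same ranking routine applies to low-level feature vectors; only the contents of the vectors change.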

Details: The IBM Research TREC-2002 Video Retrieval System, NIST TREC 2002 notebook paper

Contact: John R. Smith

 


Semantic Filtering using Automatic Concept Detection and Context Reinforcement

Semantic Filtering Demo Using Concept Detection and Context Reinforcement: Naphade

Abstract:

Video query by semantic keywords is one of the most challenging research issues in video data management. To go beyond low-level similarity and access video data content by semantics, we need to bridge the gap between the low-level representation and the high-level semantics. This is a difficult multimedia understanding problem. We formulate it as a probabilistic pattern recognition problem for modeling semantics in terms of concepts and context. To map low-level features to high-level semantics, we propose probabilistic multimedia objects (multijects). Examples of multijects in videos include "explosion", "mountain", "beach", "outdoor" and "music". Semantic concepts in videos interact and appear in context. To model this interaction explicitly, we propose a network of multijects (multinet). To model the multinet computationally, we propose a factor graph framework, which can enforce spatio-temporal constraints. Using probabilistic models for multijects and using a factor graph as the multinet, we demonstrate the application of this framework to semantic video indexing. We demonstrate how detection performance can be significantly improved by using the multinet to take inter-conceptual relationships into account. With this framework we show how keyword-based query and semantic filtering are possible for a predetermined set of concepts.
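A minimal sketch of how context can revise independent detector outputs, assuming two binary concepts, hypothetical detector probabilities and a hypothetical pairwise compatibility table. It computes exact marginals by brute-force enumeration over all joint states, which is only feasible for a handful of concepts; it is not the message-passing implementation described in the paper.

```python
from itertools import product

def reinforce(local, pairwise):
    """Posterior P(concept present) for every concept in a tiny factor graph.

    local:    {name: probability from an independent detector}
    pairwise: {(a, b): 2x2 table, table[xa][xb] = compatibility of the states}
    """
    names = list(local)
    totals = {n: 0.0 for n in names}
    z = 0.0  # normalizing constant (sum over all joint states)
    for states in product((0, 1), repeat=len(names)):
        assign = dict(zip(names, states))
        weight = 1.0
        for n in names:  # local (detector) factors
            weight *= local[n] if assign[n] else 1.0 - local[n]
        for (a, b), table in pairwise.items():  # context factors
            weight *= table[assign[a]][assign[b]]
        z += weight
        for n in names:
            if assign[n]:
                totals[n] += weight
    return {n: totals[n] / z for n in names}

# Hypothetical numbers: a confident "beach" detector, an unsure "outdoor" one,
# and a context factor that discourages "beach" without "outdoor".
local = {"outdoor": 0.6, "beach": 0.9}
pairwise = {("outdoor", "beach"): [[1.0, 0.05], [1.0, 1.0]]}
posterior = reinforce(local, pairwise)
```

With these numbers the posterior for "outdoor" rises above its detector score of 0.6, because the strong "beach" evidence is propagated through the context factor.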

 

Details: M. Naphade, I. Kozintsev and T. Huang, “A Factor Graph Framework for Semantic Video Indexing”, accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology


Detection of Recurrent Audiovisual Patterns using Hierarchical Hidden Markov Models

Detection of Recurrent Audiovisual Pattern: Explosion

Detection of Recurrent Audio Pattern: Laughter

 

Abstract: 

Detection of recurrent temporal patterns in digital media is a first step in next-generation data mining for media content. Production videos such as news, sports and movies have a definitive structure that involves short-term interaction as well as long-term correlation. This structure can be captured by models that take into consideration the short-term statistics as well as long-term recurrence. We investigate the application of probabilistic models that capture this structure. The novel approach is to characterize the short-term events in video by models that can account for temporal support in terms of piece-wise stationary signals with transitions. These short-term events can then be embedded within another temporal model that accounts for transitions between these events and thus characterizes long-term history. This also leads to the detection of recurring events in video using a monolithic model. The proposed approach is an unsupervised algorithm for event detection, and it can be used for summarization, similarity-based matching and enhanced browsing. With certain extensions, similar algorithms can be used for data mining for temporal patterns in other domains such as bio-surveillance, where the multimodal sensor streams may comprise traditional and non-traditional data sources.
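To make the idea of decoding a stream into hidden temporal states concrete, here is a minimal Viterbi sketch for a single-level (flat) hidden Markov model. It is not the hierarchical, unsupervised model of the paper: the state names, the observation symbols and all probabilities are hypothetical and given by hand rather than learned.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence of a flat HMM (all probabilities must be > 0)."""
    # score[s] = best log-probability of any state path ending in s
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            score[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            pointers[s] = best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical model: audio energy quantized to "low"/"high" per frame.
states = ("quiet", "event")
start_p = {"quiet": 0.8, "event": 0.2}
trans_p = {"quiet": {"quiet": 0.9, "event": 0.1},
           "event": {"quiet": 0.3, "event": 0.7}}
emit_p = {"quiet": {"low": 0.9, "high": 0.1},
          "event": {"low": 0.2, "high": 0.8}}
labels = viterbi(["low", "low", "high", "high", "low"],
                 states, start_p, trans_p, emit_p)
```

Runs of the "event" label in the decoded path mark candidate event segments; a hierarchical model would add a second layer that governs transitions between such short-term event models.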

Details: M. Naphade, T. Huang "Discovering Recurrent Events in Multichannel Data Streams Using Unsupervised Methods", IEEE ICIP Rochester, NY 2002


 

Video Query by Example using Audio-visual features and Dynamic Programming to Match Temporal Patterns

Supporting Query by Example for Multimodal Temporal Patterns

 

Abstract:

A necessary capability for content-based retrieval is to support the paradigm of query by example. In the past, there have been several attempts to use low-level features for video retrieval. Few of these approaches, however, use the multimedia information content of the video. We present an algorithm for matching multimodal (audio-visual) patterns for the purpose of content-based video retrieval. The novel ability of our approach to use the information content in multiple media, coupled with a strong emphasis on temporal similarity, differentiates it from the state of the art in content-based retrieval. At the core of the pattern matching scheme is a dynamic programming algorithm, which leads to a significant improvement in performance. By coupling audio with video, this algorithm can be applied to grouping shots based on audio-visual similarity. This is much more effective for constructing scenes from shots than using visual content alone.
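The abstract does not spell out the dynamic programming recurrence. As an assumption, the sketch below uses classic dynamic time warping (DTW), a standard dynamic programming method for comparing two sequences of feature vectors that may differ in length or pacing; the per-frame vectors are hypothetical.

```python
import math

def dtw_distance(query, target):
    """Dynamic-time-warping cost between two sequences of feature vectors."""
    n, m = len(query), len(target)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of query[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], target[j - 1])  # frame-to-frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # target frame repeated
                                 cost[i][j - 1],      # query frame repeated
                                 cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]

# Hypothetical one-dimensional features per frame.
clip_a = [(0.0,), (1.0,), (2.0,)]
clip_b = [(0.0,), (1.0,), (1.0,), (2.0,)]  # same pattern, played more slowly
same_pattern = dtw_distance(clip_a, clip_b)
```

Because the alignment may stretch or compress time, two clips with the same temporal pattern at different speeds receive a low cost, which is the kind of temporal similarity the abstract emphasizes.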

Details: M. Naphade, R. Wang and T. Huang, “Multimodal pattern matching for audio-visual query and retrieval”, Proc. SPIE, Storage and Retrieval for Media Databases, M. Naphade et al, Volume 4315, pages 188-195, Jan 2001, San Jose, CA.


 

Supporting Audio-visual Relevance Feedback using Dynamic Programming 

Dynamic Programming and Relevance Feedback for Multimodal Temporal Patterns

Abstract:

A necessary capability for content-based retrieval is to support the paradigm of query by example. Most systems for video retrieval support queries using image sequences only. We present an algorithm for matching multimodal (audio-visual) patterns for the purpose of content-based video retrieval. The novel ability of our approach to use the information content in multiple media, coupled with a strong emphasis on temporal similarity, differentiates it from the state of the art in content-based retrieval. At the core of the pattern matching scheme is a dynamic programming algorithm, which leads to a significant improvement in performance. By coupling audio with video, this algorithm can be applied to grouping shots based on audio-visual similarity. We also support relevance feedback. The user can provide feedback to the system by choosing clips that are closer to the desired target. The system then automatically adjusts the relative weights (relevance) of the media and fetches different sets of target clips accordingly. In our observation, a few iterations of such feedback are generally sufficient to retrieve the desired video clips.
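The abstract does not say how the media weights are adjusted. One simple heuristic, shown here purely as an assumption, weights each modality inversely to the mean distance between the query and the clips the user marked as relevant, so the modality in which the chosen clips already look similar gains influence. All names and distance values are hypothetical.

```python
def update_weights(feedback, eps=1e-6):
    """Return (w_audio, w_visual), summing to 1, from user-selected clips.

    feedback: list of (audio_distance, visual_distance) pairs, one per clip
    the user marked as relevant (hypothetical input format).
    """
    mean_a = sum(a for a, _ in feedback) / len(feedback)
    mean_v = sum(v for _, v in feedback) / len(feedback)
    inv_a, inv_v = 1.0 / (mean_a + eps), 1.0 / (mean_v + eps)
    total = inv_a + inv_v
    return inv_a / total, inv_v / total

def rank(clips, weights):
    """Order clip ids by weighted audio-visual distance, best match first."""
    w_a, w_v = weights
    return sorted(clips, key=lambda c: w_a * clips[c][0] + w_v * clips[c][1])

# Hypothetical distances to the query: (audio, visual).
clips = {"a": (0.2, 0.9), "b": (0.8, 0.2)}
before = rank(clips, (0.5, 0.5))
# The user picks clips that are close in audio but not in video.
after = rank(clips, update_weights([(0.1, 0.9), (0.2, 0.8)]))
```

Here the equal-weight ranking puts clip "b" first, while after feedback favoring audio similarity clip "a" moves to the top; repeating the loop corresponds to the feedback iterations mentioned in the abstract.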

Details: M. Naphade, R. Wang and T. Huang, “Supporting Audio-visual Query using dynamic programming”, ACM Multimedia, Oct 2001.