- The IBM Video Retrieval System
- Semantic Filtering using Automatic Concept Detection and Context Reinforcement
- Detection of Recurrent Audiovisual Patterns using Hierarchical Hidden Markov Models
- Video Query by Example using Audio-visual features and Dynamic Programming to Match Temporal Patterns
- Audio-visual Relevance Feedback using Dynamic Programming
The IBM Video Retrieval System
The MARVEL Multimedia Video Retrieval System can be found at http://mp7.watson.ibm.com/.
Abstract:
This system supports query by example in low-level feature space as well as in high-level model-vector space, where the model vectors are generated for the TREC lexicon using semantic learning and multimedia analysis techniques.
Details: The IBM Research TREC-2002 Video Retrieval System, NIST TREC 2002 notebook paper
Contact: John R. Smith
Semantic Filtering using Automatic Concept Detection and Context Reinforcement
Abstract:
Video query by semantic key-words is one of the most challenging research issues in video data management. To go beyond low-level
similarity and access video data content by semantics, we need to bridge the gap between the low-level representation and the
high-level semantics. This is a difficult multimedia understanding problem. We formulate this problem as a probabilistic pattern
recognition problem for modeling semantics in terms of concepts and context. To map low-level features to high-level semantics, we
propose probabilistic multimedia objects (multijects). Examples of multijects in
videos include "explosion", "mountain",
"beach", "outdoor", "music" etc. Semantic concepts in videos interact and
appear in context. To model this interaction explicitly, we propose a network of multijects
(multinet). To model the multinet computationally, we propose a factor graph framework,
which can enforce spatio-temporal constraints. Using probabilistic models for
multijects and using a factor graph as the multinet, we demonstrate the application of this framework to semantic video indexing. We
demonstrate how detection performance can be significantly improved using the multinet to take inter-conceptual relationships
into account. With this framework we show how keyword-based query and semantic filtering
are possible for a predetermined set of concepts.
Details: M. Naphade, I. Kozintsev and T. Huang, “A Factor Graph Framework for Semantic Video Indexing”, accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology
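The idea of reinforcing individual concept detectors with context can be illustrated with a very small example. The sketch below is not the authors' implementation: it assumes binary concepts, treats each detector output as a unary factor, uses hypothetical pairwise compatibility tables, and computes marginals by brute-force enumeration rather than message passing on a factor graph. The function name `multinet_marginals` and all numbers are illustrative assumptions.

```python
from itertools import product


def multinet_marginals(unary, pairwise):
    """Marginal P(concept present) after combining detectors with context.

    unary:    {concept: detector confidence that the concept is present}
    pairwise: {(a, b): table} where table[xa][xb] is a non-negative
              compatibility score for the joint state (xa, xb), 0 or 1 each.
    Enumerates every joint assignment, so it only suits a handful of concepts.
    """
    names = sorted(unary)
    present_mass = {n: 0.0 for n in names}
    total = 0.0
    for assignment in product((0, 1), repeat=len(names)):
        state = dict(zip(names, assignment))
        weight = 1.0
        # Unary factors: evidence from each concept's own detector.
        for n in names:
            weight *= unary[n] if state[n] else 1.0 - unary[n]
        # Pairwise factors: how well the two concept states fit together.
        for (a, b), table in pairwise.items():
            weight *= table[state[a]][state[b]]
        total += weight
        for n in names:
            if state[n]:
                present_mass[n] += weight
    return {n: present_mass[n] / total for n in names}


# Hypothetical numbers: a confident "beach" detector, an undecided
# "outdoor" detector, and a factor that penalizes beach without outdoor.
detectors = {"beach": 0.9, "outdoor": 0.5}
context = {("beach", "outdoor"): [[1.0, 1.0], [0.1, 1.0]]}
marginals = multinet_marginals(detectors, context)
```

In this toy setting the context factor raises the belief in "outdoor" from 0.5 to roughly 0.84, which is the kind of improvement through inter-conceptual relationships that the abstract describes.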
Detection of Recurrent Audiovisual Patterns using Hierarchical Hidden Markov Models
Abstract:
Detection of recurrent temporal patterns in digital media is a first step in the next generation data mining for media content.
Production videos such as news, sports and movies have a definitive structure that involves short term interaction as well
as long term correlation. This structure in video can be captured by models that take into consideration the short term statistics
as well as long term recurrence. We investigate the application of probabilistic models that capture this structure. The novel
approach is to characterize the short term events in video by models that can account for temporal support in terms of
piece-wise stationary signals with transitions. These short term events can then be embedded within another temporal model that
accounts for transitions between these events and thus
characterizes long term history. This also leads to the detection of recurring events in video using a monolithic model. The
proposed approach is an unsupervised algorithm for event detection and it can be used for summarization, similarity based matching
and enhanced browsing. With certain extensions similar algorithms can be used for data mining for temporal patterns in other domains
such as bio-surveillance, where the multimodal sensor streams may be comprised of traditional and non-traditional data sources.
Details: M. Naphade, T. Huang "Discovering Recurrent Events in Multichannel Data Streams Using Unsupervised Methods", IEEE ICIP Rochester, NY 2002
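One way to picture how short-term event models can be embedded in a long-term model and still yield a single monolithic model is to flatten the two levels into one transition matrix. The sketch below is an assumption-laden illustration, not the method of the paper: it assumes each event is a left-to-right chain of states with a shared self-transition probability, and that on leaving an event's last state the next event is chosen by a top-level transition matrix. The function name `flatten_hierarchy` and its parameters are hypothetical.

```python
def flatten_hierarchy(event_lengths, stay_prob, top_transitions):
    """Build one flat transition matrix from a two-level model.

    event_lengths:   number of left-to-right sub-states in each event
    stay_prob:       self-transition probability shared by all sub-states
    top_transitions: top_transitions[e][f] = P(next event is f | event e ends)
    Returns a list-of-lists matrix over all sub-states of all events.
    """
    # Index of the first (entry) sub-state of each event in the flat model.
    offsets = []
    total_states = 0
    for length in event_lengths:
        offsets.append(total_states)
        total_states += length

    flat = [[0.0] * total_states for _ in range(total_states)]
    leave_prob = 1.0 - stay_prob
    for event, length in enumerate(event_lengths):
        for k in range(length):
            state = offsets[event] + k
            flat[state][state] += stay_prob
            if k < length - 1:
                # Inside an event: advance to the next sub-state.
                flat[state][state + 1] += leave_prob
            else:
                # End of an event: the long-term model picks the next event.
                for nxt, p in enumerate(top_transitions[event]):
                    flat[state][offsets[nxt]] += leave_prob * p
    return flat


# Two hypothetical events with 2 and 3 sub-states.
matrix = flatten_hierarchy([2, 3], 0.8, [[0.1, 0.9], [0.7, 0.3]])
```

Each row of the resulting matrix sums to one, so it can be used as the transition matrix of an ordinary hidden Markov model in which recurring events appear as repeated visits to the same block of states.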
Video Query by Example using Audio-visual features and Dynamic Programming to Match Temporal Patterns
Abstract:
A necessary capability for content-based retrieval is to support the
paradigm of query by example.
In the past, there have been several attempts to use low-level features for video retrieval.
Few of these approaches, however, use the multimedia information content of the video.
We present an algorithm for matching multimodal (audio-visual) patterns for the purpose of content-based video retrieval.
The novel ability of our approach to use the information content in multiple media coupled with a strong emphasis on temporal similarity
differentiates it from the state-of-the-art in content-based
retrieval. At the core of the pattern matching scheme is a dynamic programming algorithm, which leads to a significant improvement in performance.
Coupling the use of audio with video, this algorithm can be applied to grouping of shots based
on audio-visual similarity. This is much more effective in constructing scenes from shots than
using only visual content to do the same.
Details: M. Naphade, R. Wang and T. Huang, “Multimodal pattern matching for audio-visual query and retrieval”, Proc. SPIE, Storage and Retrieval for Media databases, M. Naphade et al, Volume 4315, pages 188-195, Jan 2001, San Jose, CA.
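A minimal sketch of temporal pattern matching by dynamic programming is given below. It uses classic dynamic time warping as a stand-in, since the abstract does not spell out the exact recurrence, and it assumes each clip is represented as a dictionary holding an "audio" and a "visual" sequence of per-frame feature vectors. The function names, the Euclidean frame distance, and the equal default weights are all illustrative assumptions.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query, target, dist=frame_distance):
    """Cost of the best monotonic alignment between two feature sequences."""
    n, m = len(query), len(target)
    inf = float("inf")
    # cost[i][j] = best cost of aligning query[:i] with target[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a target frame
                                 cost[i][j - 1],      # repeat a query frame
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def audio_visual_distance(query, target, w_audio=0.5, w_visual=0.5):
    """Weighted sum of per-modality alignment costs (weights are assumed)."""
    return (w_audio * dtw_distance(query["audio"], target["audio"])
            + w_visual * dtw_distance(query["visual"], target["visual"]))


# A clip and a slower version of the same temporal pattern align at zero cost.
fast = [[0.0], [1.0], [2.0]]
slow = [[0.0], [1.0], [1.0], [2.0]]
same_pattern_cost = dtw_distance(fast, slow)
```

Because the alignment tolerates local stretching and compression in time, two clips with the same audio-visual pattern played at different speeds score as similar, which a frame-by-frame comparison would miss.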
Supporting Audio-visual Relevance Feedback using Dynamic Programming
Abstract:
A necessary capability for content-based retrieval is to support
the paradigm of query by example. Most systems for video retrieval support queries using image sequences only. We present an
algorithm for matching multimodal (audio-visual) patterns for the purpose of content-based video retrieval. The novel ability of our
approach to use the information content in multiple media coupled
with a strong emphasis on temporal similarity differentiates it from the state-of-the-art in content-based retrieval. At the core
of the pattern matching scheme is a dynamic programming algorithm,
which leads to a significant improvement in performance. Coupling the use of audio with video, this algorithm can be applied to
grouping of shots based on audio-visual similarity. We also support relevance feedback. The user can provide feedback to the
system by choosing clips that are closer to the user's desired target. The system then automatically adjusts the relative weights
or relevance of the media and fetches different sets of target clips accordingly. It is our observation that a few iterations of
such feedback are generally sufficient for retrieving the desired video clips.
Details: M. Naphade, R. Wang and T. Huang, “Supporting Audio-visual Query using dynamic programming”, ACM Multimedia, Oct 2001.
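The feedback loop can be sketched as a simple re-weighting step. The abstract does not give the update rule, so the heuristic below is an assumption: each modality receives a weight inversely proportional to the mean distance between the query and the clips the user marked as relevant, on the reasoning that a modality in which the chosen clips sit close to the query is the one the user cares about. The per-modality distances are taken as given inputs, and the names `update_weights` and `rank_clips` are hypothetical.

```python
def update_weights(selected_distances, eps=1e-9):
    """Derive (audio, visual) weights from the user's chosen clips.

    selected_distances: list of (audio_distance, visual_distance) pairs,
    one per clip the user marked as close to the desired target.
    """
    if not selected_distances:
        return 0.5, 0.5  # no feedback yet: treat both media equally
    count = len(selected_distances)
    mean_audio = sum(a for a, _ in selected_distances) / count
    mean_visual = sum(v for _, v in selected_distances) / count
    # A smaller mean distance means the modality agrees with the user's choice.
    inv_audio = 1.0 / (mean_audio + eps)
    inv_visual = 1.0 / (mean_visual + eps)
    norm = inv_audio + inv_visual
    return inv_audio / norm, inv_visual / norm


def rank_clips(candidates, weights):
    """Order clip ids by weighted distance to the query, closest first.

    candidates: {clip_id: (audio_distance, visual_distance)}
    """
    w_audio, w_visual = weights
    return sorted(
        candidates,
        key=lambda c: w_audio * candidates[c][0] + w_visual * candidates[c][1],
    )


# Hypothetical distances: clip "x" matches the query in audio, "y" in video.
clips = {"x": (0.2, 2.0), "y": (1.0, 0.5)}
first_pass = rank_clips(clips, (0.5, 0.5))
weights = update_weights([clips["x"]])  # the user picks the audio match
second_pass = rank_clips(clips, weights)
```

With equal weights clip "y" ranks first; after the user selects "x", the audio weight grows and "x" moves to the top, mirroring the few-iteration convergence described in the abstract.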






