Multimodal Video Search

The multimodal, Soft-Boolean video search system was initiated and jointly developed in the CueVideo project, and was later redesigned and expanded in the Multimedia Mining Adventurous Research (MMAR) project. It was evaluated in the Manual Search and Interactive Search tasks of the NIST TRECVID evaluations from 2001 to 2004.

CueVideo multimodal video search

The search system uses multiple indices: speech (combining word search over speech recognition output, phonetic search, and closed captions when available), text in video frames (OCR), fifty visual concepts including face, car, person, sky, crowd, sport, ..., and a content-based query-by-image index. All indices are computed automatically, without human intervention, using image processing and machine learning techniques, most of which were developed in the above two projects.
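To make the list of indices concrete, the Python sketch below shows one way labels from different modalities could share a single inverted index. The `Label` record, its fields (modality, term, time, confidence) and the `MultimodalIndex` class are hypothetical illustrations, not CueVideo's actual data structures.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Label:
    """One automatically extracted label (hypothetical record layout)."""
    video_id: str
    modality: str      # e.g. "speech", "phonetic", "ocr", "concept"
    term: str          # spoken word, OCR token, or concept name such as "car"
    time: float        # seconds from the start of the video
    confidence: float  # recognizer or detector confidence in [0, 1]


class MultimodalIndex:
    """Hypothetical inverted index: term -> modality -> list of labels."""

    def __init__(self) -> None:
        self._postings = defaultdict(lambda: defaultdict(list))

    def add(self, label: Label) -> None:
        # Terms are case-folded so "Missile" (OCR) and "missile" (speech) match.
        self._postings[label.term.lower()][label.modality].append(label)

    def lookup(self, term: str,
               modalities: Optional[Iterable[str]] = None) -> list:
        """Return all labels for `term`, optionally restricted to some modalities."""
        wanted = None if modalities is None else set(modalities)
        by_modality = self._postings.get(term.lower(), {})
        hits = [label
                for modality, labels in by_modality.items()
                if wanted is None or modality in wanted
                for label in labels]
        # Order by video, then by time, so nearby labels end up adjacent.
        return sorted(hits, key=lambda lab: (lab.video_id, lab.time))


index = MultimodalIndex()
index.add(Label("v1", "speech", "missile", 12.4, 0.82))
index.add(Label("v1", "ocr", "Missile", 13.0, 0.65))
index.add(Label("v1", "concept", "sky", 12.0, 0.91))
hits = index.lookup("missile")  # speech hit at 12.4 s, then OCR hit at 13.0 s
```

Keeping time and confidence on every label is what lets a later ranking step weigh time proximity and recognizer confidence across modalities.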

Soft-Boolean search uses Boolean-like queries entered in a single text field. Unlike standard, "strict" Boolean search, Soft-Boolean search does not require a segment to satisfy the query exactly; segments that satisfy it only partially still receive a graded score. The Soft-Boolean search engine performs the multimodal search and produces a ranked list of relevant video segments. Each video is segmented on line, according to the query, into relevant segments. A segment's score reflects how well it satisfies the Boolean formula, combined with the intra- and inter-document term frequencies in the different modalities, the time proximity of the corresponding video labels, and the labels' confidence (provided, e.g., by the speech recognition engine). The entire process takes only a few seconds.
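The following is a minimal sketch of query-based segmentation and soft Boolean scoring, under stated assumptions; it is not the CueVideo engine's formula. Hits for one video are grouped into a segment whenever they fall within a hypothetical `max_gap` seconds of each other (standing in for time proximity), repeated hits of a term are combined by a noisy-OR of their confidences (standing in for term frequency and label confidence), and the query is evaluated with the extended p-norm Boolean model of Salton, Fox and Wu, swapped in here as one well-known form of soft Boolean evaluation. All function names are hypothetical.

```python
import math
from dataclasses import dataclass

# A hit for one video: (term, time in seconds, confidence in [0, 1]).
Hit = tuple[str, float, float]


@dataclass
class Segment:
    start: float
    end: float
    evidence: dict[str, float]  # term -> soft truth value in [0, 1]
    score: float = 0.0


def segment_hits(hits: list[Hit], max_gap: float = 10.0) -> list[Segment]:
    """Group time-sorted hits; a new segment starts after a gap larger than max_gap."""
    segments: list[Segment] = []
    for term, t, conf in sorted(hits, key=lambda h: h[1]):
        if not segments or t - segments[-1].end > max_gap:
            segments.append(Segment(start=t, end=t, evidence={}))
        seg = segments[-1]
        seg.end = t
        # Noisy-OR: each additional hit of a term pushes its evidence toward 1.
        prev = seg.evidence.get(term, 0.0)
        seg.evidence[term] = 1.0 - (1.0 - prev) * (1.0 - conf)
    return segments


def soft_eval(query, evidence: dict[str, float], p: float = 2.0) -> float:
    """p-norm evaluation of a nested query, e.g. ("AND", "missile", ("OR", "launch", "sky"))."""
    if isinstance(query, str):
        return evidence.get(query, 0.0)  # a missing term counts as 0, not as failure
    op, *args = query
    vals = [soft_eval(a, evidence, p) for a in args]
    if op == "NOT":
        return 1.0 - vals[0]
    if op == "OR":
        return (sum(v ** p for v in vals) / len(vals)) ** (1.0 / p)
    if op == "AND":
        return 1.0 - (sum((1.0 - v) ** p for v in vals) / len(vals)) ** (1.0 / p)
    raise ValueError(f"unknown operator {op!r}")


def rank_segments(hits: list[Hit], query, max_gap: float = 10.0,
                  p: float = 2.0) -> list[Segment]:
    """Segment the hits, score each segment against the query, best first."""
    segments = segment_hits(hits, max_gap)
    for seg in segments:
        seg.score = soft_eval(query, seg.evidence, p)
    return sorted(segments, key=lambda s: s.score, reverse=True)


hits = [("missile", 12.0, 0.8), ("launch", 15.0, 0.6), ("missile", 300.0, 0.9)]
ranked = rank_segments(hits, ("AND", "missile", "launch"))
```

In this example the segment at 300 s contains only "missile", yet it still receives a nonzero score, ranked below the 12 to 15 s segment that contains both terms. With `p = 1`, AND and OR both reduce to the plain average of their arguments; as `p` grows they approach min and max, i.e., strict Boolean logic.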

Mutual relevance feedback is used to identify new relevant search terms, helping the user refine and expand the query. A well-refined query describes the user's information need to the search system. The query can be saved for later use, e.g., to search newly arrived video data.
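How mutual relevance feedback selects new terms is not detailed here. As one hypothetical illustration, the Python sketch below takes the labels of shots the user marked as correct and incorrect, scores each label from the correct shots by a smoothed log ratio of how often it occurs in correct versus incorrect shots, and suggests the top-scoring terms not already in the query. The function name `suggest_terms`, the smoothing constant `alpha`, and the scoring rule are assumptions.

```python
import math
from collections import Counter


def suggest_terms(relevant, irrelevant, query_terms, k=5, alpha=1.0):
    """Suggest up to k new query terms from user-marked shots.

    relevant / irrelevant: lists of shots, each shot given as a set of labels
    (words, OCR tokens, concept names). Hypothetical scoring: smoothed log ratio
    of a term's shot frequency among correct vs. incorrect shots.
    """
    # Count each term at most once per shot (document frequency, not raw counts).
    rel = Counter(t for shot in relevant for t in set(shot))
    irr = Counter(t for shot in irrelevant for t in set(shot))
    n_rel, n_irr = len(relevant), len(irrelevant)
    scores = {}
    for term in rel:
        if term in query_terms:
            continue  # only propose terms the user has not already used
        p_rel = (rel[term] + alpha) / (n_rel + 2 * alpha)
        p_irr = (irr[term] + alpha) / (n_irr + 2 * alpha)
        scores[term] = math.log(p_rel / p_irr)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked[:k] if score > 0]


relevant = [{"missile", "launch", "sky"}, {"missile", "launch"}, {"launch", "smoke"}]
irrelevant = [{"sky", "crowd"}, {"sky", "sport"}]
suggestions = suggest_terms(relevant, irrelevant, query_terms={"missile"})
# ["launch", "smoke"]; "sky" is dropped because it is more common in incorrect shots
```

Only terms with a positive score are returned, so labels that appear mostly in shots marked incorrect are never suggested.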

The system provides efficient video browsing tools, including video streaming with offset playback, storyboards, and QBIC-like keyframe search. During a search session, the user composes and refines the query, browses the results, and marks correct and incorrect shots. The media streaming player is augmented with synchronized switching between multiple playback views, such as video and slide shows with sped-up audio. The user can email or bookmark any point in any video for future reference.

Download and play screen captures of two search examples:
A search for missiles (25 MB, QuickTime MP4 file)
A search for basketball shots (30 MB, QuickTime MP4 file)

For more information please contact Arnon Amir.