Efficient Video Annotation (EVA) system

Innovation Matters


The explosion of digital media content is driving the need for more effective solutions for managing and searching large repositories of images, video and audio. Media enterprises engage in costly and time-consuming processes of manually indexing content, which typically produces inconsistent and inadequate results. At the same time, there is increasing user expectation that content will be searchable. Clearly, today’s manual annotation processes cannot satisfy user demands. Digital media indexing needs to be automated to unlock the value of the repositories. To address this problem, IBM Research is developing a novel solution called Marvel.

The Marvel multimedia analysis and retrieval system uses statistical machine learning techniques to build models of semantic concepts in image and video content, including events, objects, people, places, scenes, and topics. Marvel uses the models to automatically index new content. This allows users to search without requiring the repositories to be manually indexed. An essential first step in Marvel is the manual annotation of a representative subset of content for training purposes. This annotated content is then processed using statistical machine learning techniques to build models used by the indexing system.

EVA
IBM’s Efficient Video Annotation (EVA) System allows users to rapidly index the semantic concepts in large video datasets by clicking on positive and negative examples

IBM’s Efficient Video Annotation (EVA) System, which is being developed by the Intelligent Information Management Department, is a novel Web tool designed to support distributed collaborative indexing of semantic concepts in large image and video collections. The EVA Web-based user interface is easy to use, requiring annotators to simply click on positive and negative examples of semantic concepts in the content shown on the user’s screen (see figure above). The interface allows users to use either mouse or keyboard for rapidly labeling the image and video content. IBM’s EVA system additionally includes functions allowing users to set entire pages to positive or negative, which can greatly speed up indexing of both rare and very popular concepts. The annotators can also customize the EVA screen layout and optionally work on one or more semantic concepts at a time as they go through a large image or video data set.
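The page-level labeling described above can be illustrated with a minimal Python sketch. It is not EVA's actual implementation: the `AnnotationPage` class, its field names, and the key-frame identifiers are all hypothetical, chosen only to show why setting a whole page at once saves clicks for rare or very popular concepts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AnnotationPage:
    """Hypothetical model of one screen of key-frame thumbnails for one concept."""
    concept: str
    keyframe_ids: List[str]
    # None means "not yet labeled"; True is positive, False is negative.
    labels: Dict[str, Optional[bool]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every key frame starts out unlabeled.
        for keyframe_id in self.keyframe_ids:
            self.labels.setdefault(keyframe_id, None)

    def set_all(self, positive: bool) -> None:
        """Label every key frame on the page in a single action."""
        for keyframe_id in self.keyframe_ids:
            self.labels[keyframe_id] = positive

    def toggle(self, keyframe_id: str) -> None:
        """Flip one key frame, as a click or key press might do."""
        current = self.labels[keyframe_id]
        self.labels[keyframe_id] = True if current is None else not current


# A rare concept: mark the whole page negative, then click the one positive.
page = AnnotationPage("airplane", ["kf1", "kf2", "kf3", "kf4"])
page.set_all(False)
page.toggle("kf3")
```

With this shape, a page of mostly negative key frames needs one page-level action plus one click per positive, rather than one click per thumbnail.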

The EVA system was recently used in the TRECVID Annotation Forum, which developed a large corpus of training and evaluation data for the annual TREC Video Evaluation Benchmark. More than 100 forum participants used the system to label 39 semantic concepts in 80 hours of video. This annotated dataset was then made available to participating institutions for developing systems that automatically index new video content. It provided a common ground truth for systems in the TRECVID high-level feature task, which involved the detection of 10 of these semantic concepts in a new large video data set. The remainder of the concepts, along with the 10 from the high-level feature task, were made available for use in the TRECVID search task for answering the benchmark query topics.


EVA Zoom Window
IBM’s EVA System allows users to zoom in on each key-frame image as they rapidly navigate through the video data set

Besides producing high-quality image and video training data, an important goal of the EVA project is to investigate research issues related to image understanding, image indexing subjectivity and human-machine interaction. The use of the EVA System for the TRECVID annotation effort is providing tremendous opportunities for studying these issues.

The design goals for the EVA system were primarily to allow rapid indexing of semantic concepts by end-users, provide basic administrative functions allowing creation of user accounts and assignment of user workloads, and allow full metering of the collaborative annotation process. The administration functions of the EVA system allow creation of user groups and accounts and the dynamic allocation of workload. Furthermore, the EVA system collects user data during the annotation process, including time spent on each page, number and size of thumbnails, and statistics about the usage of keyboard and mouse. Metering the annotation process has provided valuable feedback not only for improving the EVA system but also for improving the quality of annotations produced in the first large annotation effort for TRECVID.
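As a rough illustration of the metering data listed above, the following Python sketch records one entry per annotated page and derives a simple throughput figure. The `PageMetrics` record, its field names, and the `seconds_per_thumbnail` helper are assumptions made for this example; the article does not describe how EVA actually stores or analyzes these measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageMetrics:
    """Hypothetical per-page usage record; field names are illustrative only."""
    user: str
    seconds_on_page: float   # time spent on the page
    thumbnails_shown: int    # number of thumbnails on the page
    thumbnail_size_px: int   # thumbnail edge length chosen by the user
    key_presses: int         # keyboard usage
    mouse_clicks: int        # mouse usage


def seconds_per_thumbnail(records: List[PageMetrics]) -> float:
    """Average time spent per thumbnail across a set of page records."""
    total_thumbnails = sum(r.thumbnails_shown for r in records)
    if total_thumbnails == 0:
        return 0.0
    total_seconds = sum(r.seconds_on_page for r in records)
    return total_seconds / total_thumbnails


# Made-up sample records for a single annotator.
records = [
    PageMetrics("annotator_a", 30.0, 20, 96, 5, 15),
    PageMetrics("annotator_a", 10.0, 20, 96, 20, 0),
]
average = seconds_per_thumbnail(records)
```

Comparing such averages across thumbnail sizes or input devices is one way metering data of this kind could feed back into interface design.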

The performance of semantic concept modeling and retrieval systems, such as Marvel, greatly depends on the quality of the training data. As a result, false positive and false negative errors in indexing during training can adversely impact performance. However, arriving at high-quality annotations for large image and video collections is a daunting task: it is inherently time consuming, subjective and error prone. Ideally, redundant annotations should be obtained when possible to resolve mistakes and problems in subjectivity. The added overhead for redundant annotations must be considered as a trade-off, though, since it requires greater overall human effort and can slow down the overall indexing process. The EVA system allows great flexibility in configuring the redundancy factor individually for each semantic concept, for example to tune it to the popularity of each concept. This can have great impact on the overall performance of modeling and detecting the semantic concepts in new video content.
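The trade-off around redundant annotations can be sketched in Python as follows. This is only an assumption about how a per-concept redundancy factor might be applied: the `REDUNDANCY` table, its values, and the majority-vote rule in `resolve` are hypothetical and are not taken from the EVA system itself.

```python
from typing import Dict, List, Optional

# Hypothetical redundancy factors: how many independent labels to collect
# per key frame for each concept. The values are illustrative only.
REDUNDANCY: Dict[str, int] = {"sports": 1, "explosion": 3}


def needs_more_labels(concept: str, votes: List[bool]) -> bool:
    """True while fewer labels have been collected than the concept requires."""
    return len(votes) < REDUNDANCY.get(concept, 1)


def resolve(votes: List[bool]) -> Optional[bool]:
    """Majority vote over redundant labels; None when tied or empty."""
    positives = sum(votes)
    negatives = len(votes) - positives
    if positives == negatives:
        return None
    return positives > negatives


# Three annotators disagree on one key frame; the majority label wins.
final_label = resolve([True, False, True])
```

A higher factor costs proportionally more annotator time per key frame, which is why setting it per concept rather than globally keeps the overall effort manageable.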


EVA screen shot 2
IBM’s EVA System provides administrator views that report the progress of users in completing an annotation workload in terms of number of concepts and amount of video data completed



Related Publications  

Timo Volkmer, John R. Smith, Apostol (Paul) Natsev, Murray Campbell and Milind Naphade. A Web-based System for Collaborative Annotation of Large Image and Video Collections. In Proceedings of the 13th annual ACM international conference on Multimedia. November 2005.



Innovator's corner  

Timo Volkmer, Researcher

What is the most exciting potential future use for the work you're doing?
The EVA system will have a critical role in capturing semantics of different domains of image and video content, e.g., news, sports, medical, and entertainment, by allowing users to rapidly annotate representative content. The annotated data will be used to build models of semantic concepts and improve capabilities for digital media search. This is important in bridging the widening gap resulting from the explosion of digital media content and the inability of today’s manual annotation processes to efficiently and effectively index the content.


What is the most interesting part of your research?
Image and video search is an extremely challenging area of research that requires knowledge of many topics including signal processing, statistical modeling, knowledge representation, information retrieval, and user interfaces. At the same time, more effective systems for managing digital media content are required in many domains, including media enterprises, home users, government, traditional enterprises and the Web. Breakthroughs in our research can have tremendous impact in the marketplace and on improving people’s lives.


What inspired you to go into this field?
There is a universal appeal in interacting with image and video content. A picture truly is worth a thousand words and every video tells a thousand stories. Helping computers to understand the semantic meaning of images and video is a challenging and fascinating problem.

What is your favorite invention of all time?
The transistor. It has had an inarguably positive impact on society. Consequences of other great inventions such as the automobile or nuclear power have not been so clearly positive. Modern electronics has improved our lives in so many ways. Consider how helpless we would be in the fight against diseases without the electron microscope.

Timo is a Ph.D. student at the School of Computer Science and Information Technology at RMIT University in Melbourne, Australia. He worked at IBM Hawthorne as a Technical Co-op in the Intelligent Information Management Department from January to September 2005.

Related Research  

Disciplines: Computer Science, Electrical Engineering
Research Areas: Multimedia