Semantics-based Multimodal Input Interpretation
Members
- Keith Houck, Shimei Pan, Peter Kissa and Michelle Zhou.
Overview
It is highly desirable to have a robust and accurate input interpretation engine that can understand diverse user expressions in context. Our current multimodal input interpretation framework is called TAICHI. It allows users to specify their diverse information needs using multiple input modalities such as natural language, visual queries and deictic gestures. It also dynamically recommends queries in context to help users recover from interpretation errors and accomplish their tasks more effectively.
Research areas
We employ three complementary strategies to enable robust and accurate interpretation of diverse user requests in context. First, we use a context-driven approach for natural language interpretation. Second, we employ a two-way adaptation-based framework to help users to adapt to a system’s interpretation capability through automated query recommendation [Pan-IJCAI05] and also let the system gradually expand its interpretation capability through self-adaptation. Third, we leverage the strength of multiple modalities to achieve robust and effective interpretation.
- Context-driven natural language interpretation
Currently, TAICHI focuses on user requests to databases. As we have observed (e.g., from our WOZ study), these requests exhibit substantial syntactic variation but share a common semantic structure. Based on this observation, we use a set of semantic constructs to model a user request. Specifically, a user request includes two top-level constructs: intention and attention. Intention encodes the user's information-seeking task (e.g., data access or comparison). Attention captures the data target of the intention and is made up of lower-level constructs, such as the data concepts or attributes to be retrieved and a set of constraints that the retrieved data must satisfy. A request also includes derived meta features that characterize its overall properties. These meta features are used to tailor TAICHI's responses to the query context. To interpret an input, TAICHI first identifies the semantic constructs using a lexicon that is largely derived automatically from the databases. TAICHI then resolves references and semantic ambiguities by uniformly modeling contextual cues, including conversation history and data semantics, as a set of constraints. As a result, TAICHI can handle a wide range of user expressions regardless of their syntactic form, from keywords (e.g., "colonials 3+ bedrooms") to full English sentences, all in context. Such flexibility is valuable in a practical application, where TAICHI must accommodate various linguistic styles and tolerate imperfect user inputs (e.g., abbreviated and ungrammatical expressions). Moreover, our approach helps to minimize the effort of supporting new domains, since it requires neither a large training corpus nor a large set of syntactic rules.
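As a rough illustration of the representation described above, the following Python sketch models a request as an intention plus an attention (concepts and constraints) and fills it in from a keyword-style input using a small lexicon. It is not TAICHI's implementation: the class names, the `LEXICON` and `NUMERIC_ATTRIBUTES` tables, and the `interpret` function are all hypothetical, and the sketch ignores reference resolution, conversation history, and meta features.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Constraint:
    attribute: str
    operator: str
    value: object


@dataclass
class Attention:
    concepts: list = field(default_factory=list)      # data concepts to retrieve
    constraints: list = field(default_factory=list)   # conditions the data must satisfy


@dataclass
class Request:
    intention: str          # e.g. "data_access" or "comparison"
    attention: Attention


# Hypothetical lexicon entries; a real lexicon would be derived from the database.
LEXICON = {
    "colonial": ("style", "colonial"),
    "colonials": ("style", "colonial"),
    "ranch": ("style", "ranch"),
}
NUMERIC_ATTRIBUTES = {"bedroom": "bedrooms", "bedrooms": "bedrooms"}


def interpret(text: str) -> Request:
    """Map keyword-style input to semantic constructs, ignoring syntax."""
    attention = Attention(concepts=["house"])
    tokens = text.lower().split()
    for i, token in enumerate(tokens):
        if token in LEXICON:
            attribute, value = LEXICON[token]
            attention.constraints.append(Constraint(attribute, "=", value))
        number = re.fullmatch(r"(\d+)(\+?)", token)
        if number and i + 1 < len(tokens) and tokens[i + 1] in NUMERIC_ATTRIBUTES:
            operator = ">=" if number.group(2) else "="
            attention.constraints.append(
                Constraint(NUMERIC_ATTRIBUTES[tokens[i + 1]], operator, int(number.group(1)))
            )
    # This sketch always assumes a data-access intention.
    return Request(intention="data_access", attention=attention)


request = interpret("colonials 3+ bedrooms")
```

Here `request.attention.constraints` holds one constraint on `style` and one on `bedrooms`, which a back end could translate into a database query.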
- Two-way adaptive query interpretation
Despite the efforts described above to achieve more accurate and robust interpretation, TAICHI’s interpretation capability may still be insufficient for real-world applications. Rather than improving that capability directly in the conventional way, we built a two-way adaptation engine that allows both users and TAICHI to dynamically adapt to each other’s expressions in the course of interaction [Pan-IUI05]. This adaptation enhances the usability of TAICHI by turning a novice user into a power user who can work effectively within TAICHI’s capability. Moreover, TAICHI improves its own interpretation capability through self-adaptation, which reduces the overall effort of developing an effective interaction system.
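The two directions of adaptation can be pictured with a small Python sketch. The source does not describe the mechanism, so everything here is an assumption made for illustration: the `AdaptiveInterpreter` class, the use of token-overlap (Jaccard) similarity to rank recommendations, and the idea of remembering an accepted recommendation as an alias.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two queries."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


class AdaptiveInterpreter:
    def __init__(self, known_queries):
        self.known = set(known_queries)   # queries the system can already interpret
        self.aliases = {}                 # learned mapping: user phrasing -> known query

    def interpret(self, text):
        """Return an interpretable query, or None if the input is not understood."""
        if text in self.known:
            return text
        return self.aliases.get(text)

    def recommend(self, text, k=2):
        """User-side adaptation: suggest the k most similar interpretable queries."""
        ranked = sorted(self.known, key=lambda q: (-jaccard(text, q), q))
        return ranked[:k]

    def accept(self, text, chosen):
        """System-side adaptation: remember which query the user meant."""
        self.aliases[text] = chosen


system = AdaptiveInterpreter([
    "show houses in chappaqua",
    "show colonials under 1m",
    "compare house prices by town",
])
query = "display houses in chappaqua"
if system.interpret(query) is None:
    suggestion = system.recommend(query, k=1)[0]
    system.accept(query, suggestion)
```

After the user accepts the suggestion, the same phrasing is interpreted directly on later turns, so the user learns what the system understands and the system learns how the user talks.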
- Integrating visual and natural language queries
Besides combining natural language inputs and deictic gestures as in other systems, we have explored the usage of visual queries to complement natural language inputs for two reasons. First, it is easier for users to use visual queries to express certain data requests. Second, visual queries are explicit and thus can be interpreted by TAICHI robustly. To take advantage of the strength of both query interfaces while overcoming their deficiencies, we developed a set of integration techniques that seamlessly blend the use of visual query and natural language interfaces. Some of these techniques improve the performance of TAICHI by allowing users to flexibly combine the use of the two interfaces. Other techniques focus on facilitating a context-preserving integration, where users can effectively employ the two interfaces to support their context-sensitive, information-seeking tasks.
Publications
- Shimei Pan and James Shaw. Natural Language Query Recommendation in Conversation Systems. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2007.
- Joyce Chai, Shimei Pan and Michelle X. Zhou. MIND: A Context-based Multimodal Interpretation Framework in Conversation Systems. Natural, Intelligent and Effective Interaction in Multimodal Dialogue Systems, J. Kuppervelt, L. Dybkjaer and N. Bernsen (eds). Kluwer. 2005.
- Shimei Pan, Siwei Shen, Michelle X. Zhou and Keith Houck. Two-Way Adaptation for Robust Input Interpretation for Practical Multimodal Interaction. Proceedings of ACM Conference on Intelligent User Interfaces (IUI), pages 25-32, 2005.
- Joyce Chai, Pengyu Hong and Michelle X. Zhou. A Probabilistic Approach to Reference Resolution in Multimodal Interfaces. Proceedings of ACM Conference on Intelligent User Interfaces (IUI), pages 70-77, 2004.
- Joyce Chai, Pengyu Hong, Michelle X. Zhou and Zahar Prasov. Optimization in Multimodal Interpretation. Proceedings of Association of Computational Linguistics (ACL), pages 1-8, 2004.
- Keith Houck. Contextual Revision in Information Seeking Conversation Systems. Proceedings of International Conference on Spoken Language Processing (ICSLP), 2004.
- Joyce Chai. Semantics-based Representation for Multimodal Interpretation in Conversational Systems. Proceedings of International Conference on Computational Linguistics (COLING), 2002.
- Joyce Chai. Operations for Context-based Multimodal Interpretation. Proceedings of International Conference on Spoken Language Processing (ICSLP), 2002.
- Joyce Chai, Shimei Pan and Michelle X. Zhou. MIND: A Semantics-based Multimodal Interpretation Framework for Conversation Systems. Proceedings of International CLASS Workshop on Natural, Intelligent and Effective Interaction in Multimodal Dialog Systems, 2002.
- Joyce Chai, Shimei Pan, Michelle X. Zhou and Keith Houck. Context-based Multimodal Input Understanding in Conversational Systems. Proceedings of IEEE International Conference on Multimodal Interfaces (ICMI), pages 87-92, 2002.
Smart Visual Analytics
Members
- Zhen Wen, Michelle X Zhou.
Overview
In a highly dynamic, interactive visual analytic system such as the one we support, it is difficult to predict how an interaction will unfold. It is thus impractical to plan in advance the content and form of every possible visual response. Moreover, designing a quality visualization that is tailored to a user’s context requires visualization skills and significant effort, yet users of visual analytic systems are usually not visualization experts.
Figure: Smart visualization pipeline – Improvise
The IMPROVISE pipeline comprises the following components:
- Content selection that dynamically decides the proper response content (e.g., which subset of data attributes to present).
- Example-based visualization sketch design that decides the proper visual metaphor for the given content.
- Data transformation that dynamically transforms raw input data to ensure visualization quality, because raw data are often not in a “pristine” form that is ready to be viewed. For example, the raw input data may be noisy, which can distort the corresponding visualization.
- Visualization realizer that instantiates the selected visualization sketch using the transformed data.
- Visual context management that dynamically incorporates new information into the existing visualization to help users integrate information across successive displays.
IMPROVISE offers two unique advantages for dynamic interactive visual analytic systems:
- IMPROVISE provides both an end-to-end framework and reusable components that help users generate effective visualizations in a wide variety of visualization scenarios.
- IMPROVISE helps to ensure visualization quality in diverse situations and is highly extensible, because it leverages machine learning techniques and feature-based modeling.
Examples
- Content selection example
Figure: Content selection example
For a user request “Find houses under $1M in Chappaqua”, the content selection component selects relevant house attributes based on contextual factors (e.g., user profiles). For example, for a user interested in house cost, exterior and interior properties, the system shows the more relevant house attributes such as house price as well as siding. In contrast, for a user who cares more about house size and amenities, the system would select house acreage, heating and fuel type.
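The profile-dependent behavior in this example can be sketched in a few lines of Python. This is a deliberately simple scoring scheme, not the optimization-based approach named in the publications below; the `ATTRIBUTE_TOPICS` table, the interest weights, the `flooring` attribute, and the `select_attributes` function are hypothetical.

```python
# Hypothetical mapping from house attributes to the interest topics they serve.
ATTRIBUTE_TOPICS = {
    "price": {"cost"},
    "siding": {"exterior"},
    "flooring": {"interior"},
    "acreage": {"size"},
    "heating": {"amenities"},
    "fuel_type": {"amenities"},
}


def select_attributes(interests, k=3):
    """Return up to k attributes ranked by how well they match the user's interests.

    `interests` maps a topic name to a weight; attributes with no matching
    topic score zero and are never selected.
    """
    scored = []
    for attribute, topics in ATTRIBUTE_TOPICS.items():
        score = sum(interests.get(topic, 0.0) for topic in topics)
        if score > 0:
            scored.append((score, attribute))
    # Highest score first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [attribute for _, attribute in scored[:k]]


cost_minded = select_attributes({"cost": 1.0, "exterior": 0.8, "interior": 0.5})
size_minded = select_attributes({"size": 1.0, "amenities": 0.9})
```

With these made-up weights, the first profile yields price, siding and flooring, while the second yields acreage, fuel type and heating, mirroring the contrast in the example.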
- Sketch design example
Figure: Sketch design example
For a user request “Show houses under $600K in Cortland”, the system retrieves a data table containing geographical information, such as house locations, as well as house attributes. Based on the data properties, the system selects suitable visualization forms from a visualization corpus. In response to this particular query, the system finds three possible visualizations: a 3D map, a 2D map and a bar chart. A user can then interact with the suggested visualization sketches to generate a visualization tailored to their context (e.g., a 2D map to show geographical information together with a bar chart to compare information).
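A minimal Python sketch of selecting candidate sketches from a corpus based on data properties follows. It replaces the example-based retrieval described in the publications with a plain requirement check, and the `Sketch` class, the `CORPUS` entries (including the timeline entry), and the property labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sketch:
    name: str
    requires: frozenset   # data properties the sketch needs in order to apply


# Hypothetical visualization corpus.
CORPUS = [
    Sketch("3D map", frozenset({"geographic"})),
    Sketch("2D map", frozenset({"geographic"})),
    Sketch("bar chart", frozenset({"categorical", "quantitative"})),
    Sketch("timeline", frozenset({"temporal"})),
]


def candidate_sketches(columns):
    """Return names of sketches whose requirements are met by the data table.

    `columns` maps a column name to its property label.
    """
    properties = set(columns.values())
    return [sketch.name for sketch in CORPUS if sketch.requires <= properties]


table = {"location": "geographic", "town": "categorical", "price": "quantitative"}
suggestions = candidate_sketches(table)
```

For a table with a geographic column plus categorical and quantitative attributes, the check returns the two maps and the bar chart and skips the timeline, since no temporal column is present.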
- Dynamic data transformation example
Figure: Dynamic data transformation example
For a user request “Map houses in Westchester”, the map produced from the raw input data may be distorted because some houses have wrong locations. Because users usually deal with large amounts of dynamic data in visual analytic tasks, they may not be able to afford manual data cleaning. In IMPROVISE, the dynamic data transformation component tries to derive a set of appropriate data transformations based on data properties. As a result, users can work with visualizations of better quality.
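One transformation that fits this scenario is dropping records whose coordinates are far from the rest. The Python sketch below uses a median-absolute-deviation test as a stand-in; it is not the optimization-based method cited in the publications, and the function names, the threshold of 5, and the sample coordinates are assumptions.

```python
from statistics import median


def inlier_mask(values, threshold):
    """Flag values within `threshold` median absolute deviations of the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread to measure against, so keep everything.
        return [True] * len(values)
    return [abs(v - center) / mad <= threshold for v in values]


def drop_location_outliers(houses, threshold=5.0):
    """Remove houses whose latitude or longitude is an outlier."""
    lat_ok = inlier_mask([h["lat"] for h in houses], threshold)
    lon_ok = inlier_mask([h["lon"] for h in houses], threshold)
    return [h for h, a, b in zip(houses, lat_ok, lon_ok) if a and b]


# Made-up sample: five plausible locations and one record geocoded to (0, 0).
houses = [
    {"id": 1, "lat": 41.03, "lon": -73.76},
    {"id": 2, "lat": 41.05, "lon": -73.78},
    {"id": 3, "lat": 41.10, "lon": -73.80},
    {"id": 4, "lat": 41.12, "lon": -73.74},
    {"id": 5, "lat": 41.08, "lon": -73.77},
    {"id": 6, "lat": 0.0, "lon": 0.0},
]
cleaned = drop_location_outliers(houses)
```

Removing the mislocated record keeps the map's extent tight around the real houses instead of stretching it to include a point thousands of miles away.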
- Visual context management example
Figure: Visual context management example
In response to a user query, the system shows a set of houses in multiple towns in the existing display (St). The user then requests more information about one of the houses. To help the user easily comprehend the requested information in the context of all the houses, the system not only shows the requested new information but also keeps other visual information as context. Moreover, to avoid cluttering the display and distracting users from the focus object, the system also simplifies the visual objects that are not in focus. Here is a video with more examples: link.
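The behavior in this example (keep context, enrich the focus object, simplify the rest) can be sketched as a pure function over a list of visual objects. This is an illustrative simplification rather than the optimization-based approach cited below; the dictionary layout, the `detail` levels, and the `update_display` function are hypothetical.

```python
def update_display(display, focus_id, new_info):
    """Return a new display that adds `new_info` to the focus object,
    keeps every other object as context, and marks non-focus objects
    as simplified to reduce clutter."""
    updated = []
    for obj in display:
        if obj["id"] == focus_id:
            merged = {**obj.get("info", {}), **new_info}
            updated.append({**obj, "detail": "full", "info": merged})
        else:
            updated.append({**obj, "detail": "simplified"})
    return updated


# Made-up existing display with two houses in different towns.
existing = [
    {"id": "h1", "town": "Chappaqua", "detail": "normal"},
    {"id": "h2", "town": "Cortland", "detail": "normal"},
]
next_display = update_display(existing, "h1", {"price": 850000, "bedrooms": 4})
```

The function builds new dictionaries instead of mutating the existing display, so the previous state remains available if the user navigates back.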
Publications
- Zhen Wen and Michelle Zhou. An Optimization-based Approach to Dynamic Data Transformation for Smart Visualization. To appear in Proceedings of the International Conference on Intelligent User Interfaces (IUI ’08), 2008.
- Zhen Wen, Michelle Zhou and Vikram Aggarwal. Context-Aware, Adaptive Information Retrieval for Investigative Tasks. In Proceedings of the International Conference on Intelligent User Interfaces (IUI ’07), pages 122-131, 2007.
- Zhen Wen, Michelle X. Zhou and Vikram Aggarwal. An Optimization-based Approach to Dynamic Visual Context Management. Proceedings of IEEE Symposium on Information Visualization (InfoVis), pages 187-194, 2005.
- Michelle X. Zhou, Zhen Wen and Vikram Aggarwal. A Graph-Matching Approach to Dynamic Media Allocation in Intelligent Multimedia Interfaces. Proceedings of ACM Conference on Intelligent User Interfaces (IUI), pages 114-121, 2005. Best paper award.
- Michelle X. Zhou and Vikram Aggarwal. An Optimization-based Approach to Dynamic Data Content Selection in Intelligent Multimedia Interfaces. Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), pages 227-236, 2004.
- Michelle X. Zhou and Min Chen. Automated Generation of Graphical Sketches by Example. Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pages 65-74, 2003.
- Michelle X. Zhou, Min Chen and Ying Feng. Building a Visual Database for Example-based Graphics Generation. Proceedings of IEEE Symposium on Information Visualization (InfoVis), pages 23-30, 2002.
- Michelle X. Zhou and Sheng Ma. Representing and Retrieving Visual Presentations for Example-based Graphics Generation. Proceedings of Smart Graphics, pages 87-94, 2001.
- Michelle X. Zhou. Visual Planning: A Practical Approach to Automated Visual Presentation. Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pages 634-641, 1999.





