Focused Areas


Semantics-based Multimodal Input Interpretation


Smart Visual Analytics






Semantics-based Multimodal Input Interpretation



Members



  • Keith Houck, Shimei Pan, Peter Kissa and Michelle Zhou.



Overview



It is highly desirable to have a robust and accurate input interpretation engine that can understand diverse user expressions in context. Our current multimodal input interpretation framework is called TAICHI. It allows users to specify their diverse information needs using multiple input modalities such as natural language, visual queries and deictic gestures. It also dynamically recommends queries in context to help users recover from interpretation errors and accomplish their tasks more effectively.

Research areas


We employ three complementary strategies to enable robust and accurate interpretation of diverse user requests in context. First, we use a context-driven approach for natural language interpretation. Second, we employ a two-way adaptation-based framework to help users to adapt to a system’s interpretation capability through automated query recommendation [Pan-IJCAI05] and also let the system gradually expand its interpretation capability through self-adaptation. Third, we leverage the strength of multiple modalities to achieve robust and effective interpretation.

  • Context-driven natural language interpretation


Currently, TAICHI focuses on user requests to databases. As we have observed (e.g., from our Wizard-of-Oz (WOZ) study), while these requests exhibit substantial syntactic variation, they share a common semantic structure. Based on this observation, we use a set of semantic constructs to model a user request. Specifically, a user request includes two top-level constructs: intention and attention. Intention encodes the user's information-seeking task (e.g., data access or comparison). Attention captures the data target of the intention. It is made up of lower-level constructs, such as the data concepts/attributes to be retrieved and a set of constraints that the retrieved data must satisfy. It also includes derived meta features that characterize the overall properties of a request. Such meta features are used to tailor TAICHI's responses to the query context. To interpret an input, TAICHI first identifies the various semantic constructs using a lexicon that is largely derived automatically from the databases. TAICHI then resolves references and semantic ambiguities by uniformly modeling contextual cues, including conversation history and data semantics, as a set of constraints. As a result, TAICHI can handle a wide range of user expressions regardless of their syntactic forms, ranging from keywords (e.g., "colonials 3+ bedrooms") to full English sentences, all in context. Such flexibility is much appreciated in a practical application, where TAICHI must accommodate various user linguistic styles and tolerate imperfect user inputs (e.g., abbreviated and ungrammatical expressions). Moreover, our approach helps to minimize the effort of supporting new domains, since it does not require a large training corpus or a large set of syntactic rules.
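The semantic constructs described above can be pictured as a small data model. The following is a minimal sketch, not TAICHI's actual implementation: the class names, the two hand-written lexicon tables (the real lexicon is said to be largely derived from the databases), the default "data_access" intention, the default "house" concept, and the "N+ attribute" pattern are all assumptions made for illustration. Reference resolution and conversation history are left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Constraint:
    attribute: str
    operator: str
    value: object


@dataclass
class Attention:
    # Data concepts to retrieve and constraints the results must satisfy.
    concepts: list = field(default_factory=list)
    constraints: list = field(default_factory=list)


@dataclass
class Request:
    intention: str  # e.g. "data_access" or "comparison"
    attention: Attention


# Hypothetical lexicon entries, written by hand for this sketch.
STYLE_TERMS = {"colonial": "colonial", "colonials": "colonial"}
NUMERIC_ATTRIBUTES = {"bedroom": "bedrooms", "bedrooms": "bedrooms"}


def interpret(text):
    """Map keyword-style input to a Request, ignoring word order and syntax."""
    tokens = text.lower().split()
    attention = Attention(concepts=["house"])  # assumed default concept
    i = 0
    while i < len(tokens):
        token = tokens[i]
        at_least = re.fullmatch(r"(\d+)\+", token)  # matches "3+"
        has_next = i + 1 < len(tokens)
        if token in STYLE_TERMS:
            attention.constraints.append(
                Constraint("style", "=", STYLE_TERMS[token]))
        elif at_least and has_next and tokens[i + 1] in NUMERIC_ATTRIBUTES:
            attribute = NUMERIC_ATTRIBUTES[tokens[i + 1]]
            attention.constraints.append(
                Constraint(attribute, ">=", int(at_least.group(1))))
            i += 1  # the attribute token has been consumed
        i += 1
    return Request("data_access", attention)  # assumed default intention
```

Under these assumptions, interpret("colonials 3+ bedrooms") yields a data-access request whose attention holds two constraints: style = colonial and bedrooms >= 3.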

  • Two-way adaptive query interpretation


Despite the efforts described above to achieve more accurate and robust interpretation, TAICHI’s interpretation capability may still be insufficient for real-world applications. Instead of directly improving TAICHI’s interpretation capability in a conventional way, we build a two-way adaptation engine that allows both users and TAICHI to dynamically adapt to each other’s expressions in the course of interaction [Pan-IUI05]. Consequently, the adaptation enhances the usability of TAICHI by turning a novice user into a power user, who can work effectively within TAICHI’s capability. Moreover, TAICHI improves its interpretation capability through self-adaptation, minimizing the overall effort of developing an effective interaction system.

  • Integrating visual and natural language queries


Besides combining natural language inputs and deictic gestures as in other systems, we have explored the usage of visual queries to complement natural language inputs for two reasons. First, it is easier for users to use visual queries to express certain data requests. Second, visual queries are explicit and thus can be interpreted by TAICHI robustly. To take advantage of the strength of both query interfaces while overcoming their deficiencies, we developed a set of integration techniques that seamlessly blend the use of visual query and natural language interfaces. Some of these techniques improve the performance of TAICHI by allowing users to flexibly combine the use of the two interfaces. Other techniques focus on facilitating a context-preserving integration, where users can effectively employ the two interfaces to support their context-sensitive, information-seeking tasks.




Publications








Smart Visual Analytics



Members


  • Zhen Wen, Michelle X Zhou.


Overview



In a highly dynamic, interactive visual analytic system such as the ones we support, it is difficult to predict how an interaction will unfold. It is thus impractical to plan in advance the content and form of all possible visual responses. Moreover, designing a quality visualization tailored to a user’s context requires visualization skills and significant effort, yet users of visual analytic systems are usually not visualization experts.


Figure: Smart visualization pipeline – IMPROVISE

To tailor system visual responses to a user interaction context, we have developed a smart visualization framework, called IMPROVISE. IMPROVISE consists of five key components in supporting the creation of a tailored visualization:
  1. Content selection that dynamically decides the proper response content (e.g., which subset of data attributes to present).
  2. Example-based visualization sketch design that decides the proper visual metaphor for the given content.
  3. Data transformation that dynamically transforms raw input data to ensure visualization quality, because raw data are often not in the “pristine” form that is ready to be viewed. For example, the raw input data may be noisy, which can distort the corresponding visualization.
  4. Visualization realizer that instantiates the selected visualization sketch using the transformed data.
  5. Visual context management that dynamically incorporates new information into existing visualization to assist users to integrate information across successive displays.

IMPROVISE offers two unique advantages for dynamic interactive visual analytic systems:
  1. IMPROVISE provides both an end-to-end framework and reusable components for assisting users in generating effective visualizations in a wide variety of visualization scenarios.
  2. IMPROVISE helps to ensure visualization quality in diverse situations and is highly extensible, because it leverages machine learning techniques and feature-based modeling.


Examples


  • Content selection example




Figure: Content selection example


For a user request “Find houses under $1M in Chappaqua”, the content selection component selects relevant house attributes based on contextual factors (e.g., user profiles). For example, for a user interested in house cost, exterior and interior properties, the system shows the more relevant house attributes such as house price as well as siding. In contrast, for a user who cares more about house size and amenities, the system would select house acreage, heating and fuel type.
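The profile-driven choice in this example can be illustrated with a simple lookup. This is only a sketch under stated assumptions: the attribute-to-interest table and the function name are hypothetical, and the actual content selection component may weigh many more contextual factors than a user's declared interests.

```python
# Hypothetical mapping from a house attribute to the interest it serves.
ATTRIBUTE_INTEREST = {
    "price": "cost",
    "siding": "exterior",
    "acreage": "size",
    "heating": "amenities",
    "fuel_type": "amenities",
}


def select_attributes(interests, attribute_interest=ATTRIBUTE_INTEREST):
    """Return the attributes whose interest category the user cares about."""
    return [attribute
            for attribute, interest in attribute_interest.items()
            if interest in interests]
```

With this table, a profile of {"cost", "exterior", "interior"} selects price and siding, while {"size", "amenities"} selects acreage, heating and fuel_type, mirroring the two users described above.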

  • Sketch Design example




Figure: Sketch design example


For a user request “Show houses under $600K in Cortland”, the system retrieves a data table containing geographical information, such as house locations, as well as house attributes. Based on the data properties, the system selects suitable visualization forms from a visualization corpus. In response to this particular query, the system finds three possible visualizations: a 3D map, a 2D map and a bar chart. A user can then interact with the suggested visualization sketches to generate a visualization tailored to their context (e.g., a 2D map to show geographical information as well as a bar chart to compare information).
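One way to picture matching data properties against a visualization corpus is a subset test on required properties. The sketch below is an assumption-laden stand-in for the example-based design step, which it does not reproduce: the corpus entries, the property labels, and the "network diagram" entry are all hypothetical.

```python
# Hypothetical corpus: each entry lists the data properties it needs.
VISUALIZATION_CORPUS = [
    {"name": "3D map", "requires": {"geographic"}},
    {"name": "2D map", "requires": {"geographic"}},
    {"name": "bar chart", "requires": {"categorical", "numeric"}},
    {"name": "network diagram", "requires": {"relational"}},
]


def candidate_sketches(data_properties, corpus=VISUALIZATION_CORPUS):
    """Return names of corpus entries whose requirements the data satisfies."""
    return [entry["name"]
            for entry in corpus
            if entry["requires"] <= data_properties]  # subset test
```

For a result table with geographic, categorical and numeric columns, candidate_sketches({"geographic", "categorical", "numeric"}) returns the 3D map, the 2D map and the bar chart, and leaves the final choice to the user.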

  • Dynamic data transformation example




Figure: Dynamic data transformation example


For a user request “Map houses in Westchester”, the map produced with the raw input data may be distorted because some houses have wrong locations. Because users usually deal with large amounts of dynamic data in visual analytic tasks, they may not be able to afford manual data cleaning. In IMPROVISE, the dynamic data transformation component tries to derive a set of appropriate data transformations based on data properties. As a result, users can work with visualizations of better quality.
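The text does not say which transformations IMPROVISE derives. As one plausible illustration only, the sketch below drops houses whose coordinates lie far from the rest, using the median absolute deviation (MAD) as a robust spread estimate. The record layout ("lat", "lon" keys), the function names, and the threshold of 5 are assumptions.

```python
from statistics import median


def robust_scores(values):
    """Distance of each value from the median, in units of the MAD."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All typical values coincide; anything different is an outlier.
        return [0.0 if v == center else float("inf") for v in values]
    return [abs(v - center) / mad for v in values]


def drop_location_outliers(houses, threshold=5.0):
    """Keep houses whose latitude and longitude are both near the bulk."""
    lat_scores = robust_scores([h["lat"] for h in houses])
    lon_scores = robust_scores([h["lon"] for h in houses])
    return [house
            for house, lat_s, lon_s in zip(houses, lat_scores, lon_scores)
            if lat_s <= threshold and lon_s <= threshold]
```

A record geocoded to (0.0, 0.0) among houses clustered near latitude 41.1 and longitude -73.8 would be removed, so the map no longer has to zoom out to include it.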

  • Visual context management example




Figure: Visual context management example


In response to a user query, the system shows a set of houses in multiple towns in the existing display (St). The user then requests more information about one of the houses. To help the user easily comprehend the requested information in the context of all the houses, the system not only shows the requested new information but also keeps other visual information as context. Moreover, to avoid cluttering the display and distracting users from the focus object, the system also simplifies the visual objects that are not in focus. Here is a video with more examples: link.
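The focus-plus-context behavior in this example can be sketched as a pure function over display objects. This is a simplified illustration, not the IMPROVISE algorithm: the dictionary layout, the "in_focus" flag, and the choice of which keys survive simplification are assumptions.

```python
def update_display(display, focus_id, new_details, context_keys=("id", "town")):
    """Add detail to the focus object and simplify the rest as context."""
    updated = []
    for obj in display:
        if obj["id"] == focus_id:
            # Keep everything already shown and add the requested details.
            updated.append({**obj, **new_details, "in_focus": True})
        else:
            # Retain only the keys needed to keep the object recognizable.
            simplified = {k: obj[k] for k in context_keys if k in obj}
            simplified["in_focus"] = False
            updated.append(simplified)
    return updated
```

The function returns a new list rather than modifying the existing display, so the previous state (St) stays available if the user moves the focus elsewhere.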




Publications



