Getting the answers
By Pamela Kramer
Like a lot of people, Eric Brown wishes there were a better way to get the most out of meetings. The IBM researcher finds it cumbersome to take notes, and he gets annoyed when the flow of information is interrupted by the need to find someone—anyone—with the answers to questions that arise along the way. Perhaps most irksome to Brown and legions of other meeting-goers is the time they often waste on problems that have already been solved—unbeknownst to the people in the meeting.
When Brown learned about an internal IBM contest for project proposals in knowledge management, he started thinking about ways to harness the extensive resources at any company's disposal to make meetings more efficient. What started out as an innovative project to create real-time transcripts has grown into a prototype code-named Meeting Miner. The system, which resides in Brown's ThinkPad®, draws on and advances research in text analysis, natural language processing and information retrieval and extraction, all with the intent of supporting meetings while they're going on.
Connected to a set of knowledge resources selected by the user, Meeting Miner answers questions as they arise. The system benefits from the development of new, fast algorithms that allow online text processing to happen as close to real time as today's hardware allows.
On a 50-inch plasma screen displaying the instant transcript, it provides hyperlinks to pertinent information—anything from medical advances to patent documentation, depending on the discussion. Meeting Miner spontaneously offers additional information on names that come up—people, places, organizations or technical terms in a particular domain. It also keeps track of topics covered for quick summaries.
Audio gathered by microphones in a meeting room is processed through IBM's ViaVoice™ speech recognition system. As Meeting Miner generates the transcript, it moves the data through a series of software analyzers that can simultaneously and continuously pick through the text for questions to answer, names or technical terms to research and topics to track.
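The flow described above, in which one transcript passes through several analyzers, can be pictured with a minimal Python sketch. The two analyzers here are hypothetical stand-ins, and the sketch runs them one after another rather than simultaneously, so it shows only the fan-out of each transcript segment, not Meeting Miner's actual design.

```python
from typing import Callable, Dict, List

# An analyzer takes one transcript segment and returns whatever it found.
Analyzer = Callable[[str], List[str]]


def find_questions(segment: str) -> List[str]:
    """Hypothetical analyzer: flag the segment if it ends with a question mark."""
    return [segment] if segment.strip().endswith("?") else []


def find_capitalized(segment: str) -> List[str]:
    """Hypothetical analyzer: collect capitalized words after the first word."""
    words = segment.split()
    return [w.strip(".,?!") for w in words[1:] if w[:1].isupper()]


def run_analyzers(segment: str, analyzers: Dict[str, Analyzer]) -> Dict[str, List[str]]:
    """Pass the same segment to every analyzer and collect the results by name."""
    return {name: analyze(segment) for name, analyze in analyzers.items()}


# Assumed registry; a real system would add a topic tracker and others.
ANALYZERS: Dict[str, Analyzer] = {
    "questions": find_questions,
    "names": find_capitalized,
}
```

Calling `run_analyzers` on each new segment as the recognizer emits it approximates the continuous processing the article describes.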
Purposeful prototype
"Meeting Miner is a great sandbox for the technology in this area, a great place to play around and see what happens," says James Allan, an assistant professor of computer science at the University of Massachusetts.
"They're trying to bring a lot of these technologies together and see how they interact with each other," says Allan, an expert in information retrieval. "These are all very active research areas. Very few of them are solved, but some of them are good enough that they can be used."
Researchers envision such future possibilities as a personal assistant who whispers pertinent information about people and situations you encounter throughout the day: the kinds of things you always wish you could remember, from an associate's latest professional endeavor to a spouse's name.
But for now, researchers are still in the sandbox, trying to figure out how to make the technologies play together in a prototype. They haven't yet determined the best way for Meeting Miner to interface with people in the meeting. For instance, the prototype now searches for answers to all questions asked, which can create some confusion if you have a colleague who punctuates his speech with rhetorical questions. The system can communicate with people through a combination of text windows displayed on the plasma screen, voice synthesis and notification chimes.
Many of the challenges encountered in the Meeting Miner project hinge on speech recognition technology. Speech recognition is the font of all information that flows through Meeting Miner, and its error rates can make it troublesome. But despite that difficulty, the technology is advanced enough now for exploration of other leading-edge components.
"We want to put ourselves in the position that when we do cross the speech recognition threshold, we understand what we want to do with it," says Eric Brown, the Research staff member leading the Meeting Miner project.
What's in a name?
One of Meeting Miner's analyzers relies on text clues like capitalization and punctuation, and on checks of dictionaries or other text sources to identify named entities—people, geographic locations, organizations or technical terms. Those names are sent to a search engine that retrieves information from a predetermined body of text documents.
Plumbing a corporate database, this feature could instantly yield contact and professional background information about meeting participants, or about employees whose names come up during the meeting. As currently configured, the identifier's search doesn't kick in unless someone clicks on the name, which is highlighted in the transcript.
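As a rough illustration of the clues mentioned above, the following Python sketch groups runs of capitalized words into candidate names and uses a small dictionary to discount a capitalized word that merely starts the sentence. The word list and the function name are assumptions made for the example. The article does not describe the analyzer's actual rules.

```python
import re
from typing import List

# Hypothetical dictionary of ordinary words. A capitalized first word found
# here is treated as a normal sentence opener rather than as part of a name.
COMMON_WORDS = {"the", "we", "when", "did", "who", "what", "a", "is", "in", "and"}


def find_named_entities(sentence: str) -> List[str]:
    """Return runs of capitalized tokens that are not ordinary dictionary words."""
    tokens = re.findall(r"[A-Za-z][A-Za-z.'-]*", sentence)
    entities: List[str] = []
    current: List[str] = []
    for position, token in enumerate(tokens):
        # Punctuation clue: keep the period on a single-letter initial such as "W."
        word = token if re.fullmatch(r"[A-Z]\.", token) else token.rstrip(".'-")
        is_capitalized = word[:1].isupper()
        # Capitalization is not a clue at the start of a sentence, so check the dictionary.
        is_common = position == 0 and word.lower() in COMMON_WORDS
        if is_capitalized and not is_common:
            current.append(word)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities
```

Each string returned would then become a highlighted, clickable term in the transcript, with the search deferred until someone clicks.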
Is that your final answer?
Even as that analyzer trawls for names, another one looks in the transcript for questions. When it finds words like "who," "what," "where," "when," "why" and "how," it sends the sections of text that follow to a sophisticated question-answering system.
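A minimal Python sketch of that trigger might look like the following. The function name and the choice to return only the first match are assumptions. Because it fires on any occurrence of a question word, it also shows why rhetorical questions and non-question uses of "when" or "how" can set the system off.

```python
import re
from typing import Optional, Tuple

QUESTION_WORDS = ("who", "what", "where", "when", "why", "how")

# Match a whole question word and capture everything that follows it.
QUESTION_PATTERN = re.compile(
    r"\b(" + "|".join(QUESTION_WORDS) + r")\b(.*)",
    re.IGNORECASE | re.DOTALL,
)


def extract_question(segment: str) -> Optional[Tuple[str, str]]:
    """Return (question_word, following_text) for the first question word, else None."""
    match = QUESTION_PATTERN.search(segment)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).strip()
```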
The relatively new field of question-answering research combines information retrieval and natural language processing. An early pioneer in the area of information retrieval, IBM is also a research leader in the younger discipline of question-answering.
"Regular search engines process masses of text, but don't attempt to understand the material," says John Prager, a Research staff member in the knowledge structures group. "Natural language processing, on the other hand, tends to try to understand the structure and the semantics of these (documents)—but it's computationally very expensive. You need to be able to do both to do question answering."
For example, consider the question, "When was George W. Bush elected president?" A regular search engine typically throws out common words like "when" or "was," and does a "bag-of-words" search for documents with the highest occurrence of the other words in the query. But in a newspaper archive, for instance, many documents about George W. Bush's presidential election may not even include the date.
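The behavior of such a search engine can be sketched in a few lines of Python. The stop-word list, the scoring by raw word counts and the function names are simplifying assumptions, not a description of any particular engine.

```python
from typing import Dict, List

# Illustrative stop-word list; real engines use much longer ones.
STOP_WORDS = {"when", "was", "the", "a", "an", "of", "in", "is"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and strip surrounding punctuation from each word."""
    stripped = (w.strip(".,?!\"'").lower() for w in text.split())
    return [w for w in stripped if w]


def bag_of_words_rank(query: str, documents: Dict[str, str]) -> List[str]:
    """Rank document ids by how often the non-stop query words occur in them."""
    terms = {w for w in tokenize(query) if w not in STOP_WORDS}
    scores = {
        doc_id: sum(1 for w in tokenize(text) if w in terms)
        for doc_id, text in documents.items()
    }
    matching = [doc_id for doc_id in scores if scores[doc_id] > 0]
    return sorted(matching, key=lambda doc_id: -scores[doc_id])
```

With a small archive, a document that repeats the name several times can outrank the one sentence that actually contains the date, which is the weakness the example in the text points to.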
The question-answering system, on the other hand, is programmed to recognize question words as clues to specific answers. In this case, "when" tells it to look for a date. A complex, automated indexing method records information not only as words, but also as semantic categories describing the words; in this example, 2000 is indexed as a word and as a date.
The system knows that it's looking for a date, and the other words in the query will lead it to a date related to George W. Bush, an election (as opposed to his birth date), and specifically a presidential election (as opposed to his gubernatorial wins in Texas).
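The idea can be sketched in Python under heavy simplification. The mapping from question words to answer types, the rule that treats any four-digit year as a date, and the overlap scoring are all assumptions made for illustration. The real system's indexing is described only as complex and automated.

```python
import re
from typing import List, Optional, Tuple

# Illustrative mapping from question word to the semantic category sought.
ANSWER_TYPE = {"when": "DATE", "who": "PERSON", "where": "PLACE"}


def annotate(token: str) -> Tuple[str, Optional[str]]:
    """Index a token as a word and, when recognized, as a semantic category."""
    word = token.strip(".,?!").lower()
    # Toy rule: a four-digit number starting 18, 19 or 20 is treated as a year.
    category = "DATE" if re.fullmatch(r"(1[89]|20)\d\d", word) else None
    return word, category


def answer(question: str, sentences: List[str]) -> Optional[str]:
    """Return the token of the wanted category from the best-matching sentence."""
    q_tokens = [annotate(t)[0] for t in question.split()]
    if not q_tokens:
        return None
    wanted = ANSWER_TYPE.get(q_tokens[0])
    if wanted is None:
        return None
    query_terms = set(q_tokens[1:])
    best, best_overlap = None, 0
    for sentence in sentences:
        indexed = [annotate(t) for t in sentence.split()]
        candidates = [word for word, category in indexed if category == wanted]
        overlap = len(query_terms & {word for word, _ in indexed})
        # Only sentences containing the wanted category can supply an answer.
        if candidates and overlap > best_overlap:
            best, best_overlap = candidates[0], overlap
    return best
```

Given sentences about a birth year, a gubernatorial election and a presidential election, the extra query words "elected" and "president" are what steer the sketch toward the right date.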
Some question-answering systems require that people manually create a knowledge base by anticipating questions that will be asked. But Meeting Miner's question-answering system works with algorithms that allow it to automatically index the body of documents available for searching, giving it great flexibility to search anything from corporate memos to corporate Web pages.
In the prototype, Meeting Miner's answers can appear after their related questions in a window on the plasma screen, with the option of viewing the documents from which the answers were drawn. The answers also can be delivered via voice synthesis, although that approach could be disruptive. The full analysis yielding the best answer takes anywhere from a few seconds to a half-minute. And at this point, Meeting Miner, like people at times, has trouble distinguishing between factual and rhetorical questions. If it can't find a clear-cut answer, the system says it doesn't know, but offers its two best guesses.
Staying on track
A third analyzer allows Meeting Miner to track the topics covered in a meeting. This enables the program to provide additional information about these topics, or create summaries of the entire meeting. The system analyzes the transcript, dividing its content into topic categories that have been defined in advance. A technology company's brainstorming meetings would have a different set of categories, for instance, than the strategy meetings of an insurance company. The system can then offer basic summaries by topic, or it can search for additional information from the knowledge sources to which it is linked.
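One simple way to picture sorting a transcript into predefined categories is keyword matching, sketched below in Python. The insurance-flavored categories, the keyword sets and the "other" fallback are hypothetical. The article does not say which categorization method the analyzer uses.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical categories, defined in advance for one kind of meeting.
TOPICS: Dict[str, Set[str]] = {
    "claims": {"claim", "claims", "adjuster", "payout"},
    "pricing": {"premium", "premiums", "rate", "rates", "discount"},
}


def classify(segment: str, topics: Dict[str, Set[str]]) -> str:
    """Assign a segment to the topic whose keywords it mentions most often."""
    words = Counter(w.strip(".,?!").lower() for w in segment.split())
    scores = {
        topic: sum(words[keyword] for keyword in keywords)
        for topic, keywords in topics.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def track_topics(segments: List[str], topics: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Group transcript segments by topic, the raw material for a per-topic summary."""
    by_topic: Dict[str, List[str]] = {}
    for segment in segments:
        by_topic.setdefault(classify(segment, topics), []).append(segment)
    return by_topic
```

Swapping in a different `TOPICS` dictionary is the sketch's equivalent of giving a technology company's brainstorming session its own set of categories.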
Brown says he hasn't yet found the best use for the topic tracker in meetings. But it's a natural for another IBM endeavor that shares core technologies with Meeting Miner—a data broadcasting project that automatically categorizes topics on television news shows so that viewers can access additional information even as they're watching.
"The automatic translation of speech is one thing," says Anni Coden, an IBM Research staff member who works on the data broadcasting project and co-developed some of the core technologies it shares with Meeting Miner. "(But) we are trying to figure out the concepts that people are talking about, versus just the words. That's what categorization achieves for you."
The ears have it (sometimes)
The engine of Meeting Miner's "train of thought" is speech recognition, an area in which IBM has pioneered many fundamental techniques used in most of today's state-of-the-art systems. But for meetings, it's the little engine that sometimes could, with error rates of 40 to 50 percent. That compares with rates of 20 to 30 percent for a conversation between just two people, and rates of five to ten percent in one individual's more controlled, carefully enunciated dictation.
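The article does not say how these error rates are measured. A common yardstick in speech recognition is word error rate: the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the true transcript, divided by the length of the true transcript. The Python sketch below computes it on the assumption that this is the metric meant.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

By this measure, a recognizer that hears "gimme the report" when someone said "give me the report" has made two errors in four words, an error rate of 50 percent.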
Recognizing speech in a meeting poses a challenge not only because of questionable audio quality and the confusion of people talking over each other, but also because speakers slur their words relentlessly.
"You'd be surprised at how many speech sounds you think you heard, but aren't there," says Michael Picheny, manager of speech and language algorithms in the human language technologies department at IBM Research. "Because we reconstruct the sentence in our heads, we think people are saying all these sounds. But they're not."
The system can be programmed to recognize some of the more common slurs: "gonna" instead of "going to," "gimme" versus "give me," or "wanna" rather than "want to," for instance. More broadly, researchers believe we throw away sounds predictably.
"But we just don't understand all the rules at this point about in what context these deletions tend to occur," Picheny says. "There are hundreds of thousands of combinations that you have to learn."
Another problem is that speech patterns in meetings are very different from those found in the printed documents used to instruct recognition systems about which words are most likely to follow other words. "There are sequences of words that people say to each other that you would never see in printed text," Picheny says. "There isn't that much (modeling) data for transcribed meetings."
Even as researchers work through these and other problems, they consider the possibilities not yet mined by Meeting Miner. For instance, speaker identification could not only add the speaker's name to the transcript, but could enable a new advance in which the system adapts itself to increase accuracy as a particular individual speaks.
As the technologies develop, the diversity of their potential applications is clear, from Meeting Miner's boardroom to the living room addressed by the data broadcasting project. Customer help desks could also use the topic-tracking technology to improve efficiency with customer queries. And down the road, there's always the potential for that personal assistant whispering pertinent information to help you through the day.
"This whole notion of analyzing a transcript as it's generated on the fly and then searching collections of related information—that's a novel idea," Brown says. "We regularly get inquiries from people with customers who want something like this. So even though we're trying to solve problems for the future, it appears that customers today need this kind of technology."