IBM Research

Think Research


 


Featured Concept
Solutions

By Mark Fischetti

Software That Automatically Sorts Incoming Messages Could Speed Responses, Improve Service And Yield Valuable Clues About Customer Needs

In Brief:

A machine-learning system being developed at IBM Research can quickly classify and route email messages in a sample provided by NationsBank. It could also be adapted to provide templates that speed employees' responses to customer queries, or reply on its own. Generating and refining its own classification rules on new data, the system is gradually improving in accuracy.

Sooner or later, anyone who uses email discovers that there really can be too much of a good thing. Large businesses are especially hard hit. Some corporations, flooded with tens of thousands of electronic messages a month, are forced to hire scores of people just to sort and respond to the steady stream of inquiries.

New software being developed at IBM Research could provide the tools to manage this important but sometimes overwhelming medium. One prototype analyzes, categorizes and routes messages according not merely to their subject lines but to their contents. Potentially, it could save a company thousands of person-hours -- and millions of dollars -- each year while improving customer service.

NationsBank learned just how daunting large quantities of email can be. In 1996, as its electronic banking clientele grew from 50,000 to 250,000, the number of messages soared from a few hundred to 20,000 a month. The following year, the bank had to hire about 100 people to cope with the extra load. To complicate matters, three or four times as much email arrived on Monday as on any other day, creating a major work-balancing problem. NationsBank wanted to reply quickly in order to provide better customer service than its competitors, but the bank's studies showed that an employee could reliably answer only 20 to 25 email requests per day. Just routing the mail was a formidable task.

The bank asked Jim Deupree, a principal in IBM Consulting's Banking, Finance and Securities Industry unit, if software could be developed to categorize incoming email automatically.
Deupree approached David Johnson, manager of natural language understanding at IBM's Thomas J. Watson Research Center, who had been working on email categorization. The inquiry led to the establishment of a "first-of-a-kind" project with NationsBank, which began in April 1997.

The resulting email classifier works with Lotus Notes® and runs on the Windows 95®, Windows NT® and AIX® operating systems. The system continues to be improved, and could ultimately be developed into a commercial product. Related work is also under way to create categorization software that individuals can use to simplify the task of filing mail in Lotus Notes.

Filling A Tall Order

When NationsBank approached IBM, it soon became apparent that the bank needed a highly sophisticated system. The software would have to be far more powerful than the "filter" functions provided by email packages that file incoming messages according to the sender's name, or a word in the subject line.

For a bank, the sender's name would suggest nothing about the message's content, and a word like "loan" in a subject line could refer to a mortgage, home improvement, business or car loan -- each of which would need to be routed differently. Even if the subject line specified "mortgage loan," the sender could be asking for payment status, a copy of a statement or tax information -- again, different tasks handled by different people. To effectively categorize email, software would have to analyze the full text of every message.

To speed responses, the bank also wanted a system that could provide templates for replies. If the request was for tax information, the system could provide a boilerplate letter, so a customer service representative would have only to type in a few particulars. Ideally, the system would even respond to standard requests automatically. Only the small number of inquiries remaining would require full human attention.

The bank hoped to reap downstream benefits, as well. By analyzing past email, the system should be able to unearth patterns in customer complaints or requests, yielding strong clues about new products or services worth developing and old ones that need fixing, as well as opportunities for cross-selling.

To address the NationsBank project, Johnson formed a team that included Watson colleagues Fred Damerau, Thilo Goetz, Frank Oles and Thomas Hampp. Their work was motivated by a text categorization system developed several years earlier by Damerau; Chid Apte, manager of data abstraction research at Watson; and Professor Sholom Weiss, a visiting scientist from Rutgers University. That system was based on a machine-learning algorithm known as Swap, written by Weiss. Swap discovered classification rules by systematically searching training data, in which each document had its appropriate categories preassigned, for combinations of words predictive of each category.

The trio had tested Swap on the Reuters financial newswire database in a joint project with the Reuters news agency, which wanted a program to create the category headers under which stories were sent over the wires, freeing editors from the chore. The researchers trained Swap on some 10,000 existing stories, then turned it loose on several thousand more.

The Swap-based system scored high on a key metric known as the break-even point -- the point at which the system's recall (the percentage of the correct category labels that the system actually assigns) matches its precision (the percentage of the labels it assigns that are correct). The system achieved a break-even point of 80.5 percent on all of the 93 Reuters categories, compared with a previous best of 67 percent. Although Reuters needed guarantees of accuracy that the Swap-based system could not provide, the system has since become recognized as a seminal contribution to the text-classification field.

Johnson's team faced some stiff challenges. They needed to build a system that would outperform the Swap-based system on more difficult data. NationsBank had identified more than 100 types of responses it sent to email correspondents; these would form the categories.

The team first developed a text categorizer toolkit that helped them determine the best techniques for selecting the words and phrases to be used in training. They also evaluated three machine-learning algorithms, which led to the selection of a decision-tree system developed at IBM's Almaden Research Center by the data mining and decision support group managed by Rakesh Agrawal. The system is part of Intelligent Miner™, an IBM product that uncovers patterns in data. Johnson's team also wrote a program that converts decision trees to classification rules and developed a general categorization engine that applies classification rules to new documents.

They first benchmarked the prototype email classifier on the Reuters data set, achieving a precision of 88 percent and a recall of 78 percent on all 93 categories. This was quite encouraging, since NationsBank was particularly interested in precision. "As far as I am aware, this result is among the best ever reported on Reuters," Johnson says. Beginning in February, Johnson's team started working on NationsBank email. In March, they trained the classifier on almost 5,000 messages and then presented it with about 1,000 new ones.

The preliminary experiments on 14 categories of email received by one bank center yielded an average precision of 91 percent and an average recall of 81 percent. "We were very excited about the result," Johnson says. "With more data, there is every reason to believe the current system will do even better."

To date, the system performs well enough to meet some, but not all, of the performance requirements of NationsBank. For example, says Johnson, it is not yet precise enough to respond automatically to incoming mail -- that would require a percentage in the very high 90s. But the system's current precision would be good enough for prompting customer service representatives with reply templates. Indeed, work is under way with Lotus Consulting to integrate the categorizer into a Domino™-based Lotus Notes system that customer service representatives use to respond to email. Whereas they must now scroll through a list of response templates, the categorizer will pop up a suggested response template, speeding up this tedious task.

The Watson researchers are continuing to improve the classifier. One development in the works is a machine-learning algorithm specifically designed for text categorization that, unlike the current algorithm, can be trained incrementally and can provide confidence measures.

Along with Weiss and Apte, the team is also experimenting with a data sampling technique called "boosting," which applies multiple decision trees to the same data. Used with other systems, boosting achieved results superior to all previously reported results on the Reuters domain. "We expect similar results when we use boosting with our current algorithm," Johnson predicts.

As more messages from NationsBank become available, the team plans to continue to experiment with the classifier, to see if the program can improve further. Marshall Schor, manager of knowledge systems, who oversees Johnson's group, is eager for the answer, because it will show how far the system's machine-learning capability can refine rules. "That," he says, "is the key to classifying a message that humans readily understand but that lacks certain key terms -- for example, a request for a mortgage that reads, 'I want to build a house, and need money.'"

Branching Out

The scope of tasks the email classifier could tackle is broad, spanning numerous industries, according to Deupree. "Any company that provides general product information or customer service could use the system to answer requests for parts, service or documentation," he says.

Eventually, the approach could be applied to voice messages as well -- a boon to call centers -- since the rules needed to analyze content are the same. In collaboration with the data abstraction research group, Johnson's group is also working with IBM Global Services on "intelligent call routing," which involves categorizing summaries of telephone conversations describing problem statements to determine the proper work queue. Companion work on categorizing voice is in progress both at Watson and at IBM's Haifa Research Laboratory.

"Personally, I'd love it if IBM had an email classification product for its own use," says
Bill Pulleyblank, director of Watson's mathematical sciences department. "It would save me at least an hour a day."


Mark Fischetti is a freelance science writer in Lenox, Massachusetts.


More Information:

Metrics for Categorization

Your Personal Email Assistant

Teaching Itself to Learn


Metrics for Categorization

The abilities of text categorizers can be measured. First, sample data with preassigned categories are randomly divided into a training set and an independent test set. The training set is provided to the machine-learning algorithm, which attempts to discover rules that will correctly predict the categories of unseen data in the test set.

There are two metrics: precision and recall. High precision means that few of the categories the system assigned to test documents were wrong, whereas high recall means that the system assigned most of the categories that should have been assigned. Both metrics are needed to evaluate an algorithm. A system could achieve perfect recall at the expense of precision by simply labeling each document with every category. Conversely, a system might achieve perfect precision at the expense of recall by identifying only one document out of a large collection but identifying it correctly. Researchers often report text categorization results in terms of a precision-recall "break-even point," a hypothetical point at which precision and recall are the same.
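
The two extremes described above can be checked with a short sketch. It assumes the standard definitions (precision is the fraction of assigned labels that are correct, recall is the fraction of correct labels that were assigned); the function name, the category names and the four-document sample are illustrative and do not come from the IBM system.

```python
def precision_recall(predicted, actual, category):
    """Return (precision, recall) for one category.

    predicted, actual: lists of sets of category labels, one set per document.
    """
    # Documents where the system assigned the category and it was correct.
    true_pos = sum(
        1 for p, a in zip(predicted, actual) if category in p and category in a
    )
    assigned = sum(1 for p in predicted if category in p)  # labels the system gave
    relevant = sum(1 for a in actual if category in a)     # labels it should have given
    precision = true_pos / assigned if assigned else 0.0
    recall = true_pos / relevant if relevant else 0.0
    return precision, recall


# Hypothetical test set: two "loan" documents and two "tax" documents.
actual = [{"loan"}, {"loan"}, {"tax"}, {"tax"}]

# Strategy 1: label every document with every category.
label_everything = [{"loan", "tax"} for _ in actual]

# Strategy 2: label only one document, but label it correctly.
label_one = [{"loan"}, set(), set(), set()]

everything_scores = precision_recall(label_everything, actual, "loan")
one_scores = precision_recall(label_one, actual, "loan")
print(everything_scores)  # perfect recall, poor precision
print(one_scores)         # perfect precision, poor recall
```

Neither strategy is useful on its own, which is why both numbers, or the break-even point between them, are reported together.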


Your Personal Email Assistant

You walk into your office, hang up your coat and check your email. There are 28 new messages. It's the same chore every morning. Don't you wish your computer could help you file them all? If Jeff Kephart succeeds, it soon may.

Regardless of which email program you use, filing mail is tedious. To save a message, you click on an icon that says something like "Move to Folder." You then have to scroll and click through levels of folders to find the right destination. "It takes about 10 seconds," says Kephart, manager of agents and emergent phenomena at Research. "That may not sound like much, but it's enough of a barrier to cause people like me to let email pile up, and finding anything in that mess becomes a nightmare."

Kephart and his colleagues Rich Segal and Hoi Chan have created software that learns how you file your mail. As you open a message, the program scans the contents and presents buttons showing the three most likely folders for saving it. Click one, and the message is filed there. "It only takes about one and a half seconds," Kephart says.

The program must be trained on past email, but it also learns from the manual filings you may make each day, so its predictive accuracy continues to improve. The researchers trained the program on 500 messages and 25 possible folders. When they tested it on 500 new messages, one of the three buttons showed the correct folder 98 percent of the time.
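
The 98 percent figure is a "top three" accuracy: a message counts as a hit if its correct folder appears on any of the three buttons. A minimal sketch of that calculation follows; the function name, folder names and sample data are hypothetical, and nothing here reflects how Kephart's program ranks folders.

```python
def top_k_accuracy(suggestions, correct_folders, k=3):
    """Fraction of messages whose correct folder is among the first k suggestions."""
    hits = sum(
        1
        for suggested, correct in zip(suggestions, correct_folders)
        if correct in suggested[:k]
    )
    return hits / len(correct_folders)


# Hypothetical ranked suggestions for four messages (most likely folder first).
suggestions = [
    ["Projects", "Travel", "Admin"],
    ["Admin", "HR", "Travel"],
    ["Travel", "Projects", "HR"],
    ["HR", "Admin", "Projects"],
]
# The folder each message was actually filed in.
correct_folders = ["Travel", "Admin", "Admin", "HR"]

accuracy = top_k_accuracy(suggestions, correct_folders)
print(accuracy)
```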

The mail categorizer was designed to work with Lotus Notes. "But," Kephart says, "nothing would prevent it from being used in just about any commercial email application."

The next stage of development is to test the program for several months on a handful of IBM Research employees, who will use it to file 100,000 messages. "A lot of people have volunteered," says Kephart.


Teaching Itself to Learn

The key to the email classifier's performance is a text categorizer, which creates the rules by which email is sent to one category or another during a preliminary training phase. Rules are formulated in several steps, using a collection of email already categorized by the bank. First, the system scans all the email and eliminates words such as "the," "Sincerely," and so on that are common to email. As it proceeds, it builds a "local dictionary" of characteristic terms for each category. These dictionaries are then submitted to a machine-learning algorithm.
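
The preprocessing step above might look roughly like the following sketch. The article does not say how characteristic terms are chosen, so this version simply assumes the most frequent non-common words in each category; the stop-word list, function names and sample messages are all illustrative.

```python
import re
from collections import Counter

# Hypothetical list of words considered too common in email to be useful.
STOP_WORDS = {
    "the", "a", "for", "my", "of", "is", "what", "to", "i", "do",
    "how", "please", "send", "want", "sincerely",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def build_local_dictionaries(labeled_messages, top_n=3):
    """Map each category to its top_n most frequent non-stop words.

    Frequency is an assumed selection criterion, not the documented one.
    """
    counts = {}
    for category, text in labeled_messages:
        words = [w for w in tokenize(text) if w not in STOP_WORDS]
        counts.setdefault(category, Counter()).update(words)
    return {
        category: [word for word, _ in counter.most_common(top_n)]
        for category, counter in counts.items()
    }


# Hypothetical training messages, already categorized.
training = [
    ("Mortgage Loan", "Please send the mortgage statement for my mortgage."),
    ("Mortgage Loan", "What is the payment status of my mortgage? Sincerely, Pat"),
    ("Credit Card Application", "I want to apply for a credit card."),
    ("Credit Card Application", "How do I apply for the card? Sincerely, Lee"),
]

dictionaries = build_local_dictionaries(training)
print(dictionaries)
```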

For each email category, the algorithm builds a decision tree that determines which words -- and how many occurrences of those words -- distinguish messages within that category from those in other categories. The algorithm does this by asking questions such as "Which word best splits the data into a 'Mortgage Loan' class and everything else? Which other word would best refine the previous split?" Eventually, the system decides it can no longer improve by asking new questions and so halts. A program developed by Johnson's team translates the trees into rules, which are expressed as conditionals -- for example, if the words "apply" and "card" occur at least once, the message belongs in the "Credit Card Application" category.
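
The question "Which word best splits the data?" can be made concrete with a small sketch. The article does not name the splitting criterion used by the Intelligent Miner decision-tree system, so this example assumes Gini impurity, a common choice, and tests only whether a word is present rather than how often it occurs. The function names and sample messages are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of booleans (True = message is in the target category)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split_word(messages, target):
    """Return (word, impurity) for the word whose presence best separates target.

    messages: list of (category, set_of_words) pairs.
    Lower weighted impurity means a cleaner split; 0.0 is a perfect split.
    """
    vocabulary = sorted(set().union(*(words for _, words in messages)))
    best_word, best_score = None, float("inf")
    for word in vocabulary:
        with_word = [cat == target for cat, words in messages if word in words]
        without = [cat == target for cat, words in messages if word not in words]
        score = (
            len(with_word) * gini(with_word) + len(without) * gini(without)
        ) / len(messages)
        if score < best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical training messages reduced to their characteristic words.
messages = [
    ("Mortgage Loan", {"mortgage", "payment", "status"}),
    ("Mortgage Loan", {"mortgage", "statement"}),
    ("Credit Card Application", {"apply", "card"}),
    ("Tax Information", {"tax", "statement"}),
]

word, score = best_split_word(messages, "Mortgage Loan")
print(word, score)
```

A full tree builder would repeat this choice on each resulting subset until no split improves the result, which corresponds to the halting condition described above.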

Once the rules are devised, determining the category of a new message is simple and fast. The email classifier simply counts the occurrences of the words in the text and works through the set of rules to see which, if any, apply. If it does not find a rule, it leaves the category blank, for a human operator to determine. The proper categories for those messages, and corrections of the system's mistaken categorizations, can be fed back to the system so that it can refine its own rules, improving its performance.
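
Applying the rules to a new message can be sketched in a few lines. The rule below encodes the "apply" and "card" example as a list of word-count conditions; the data layout, the function name and the choice to return the first matching rule are assumptions made for illustration, not details of the IBM categorization engine.

```python
import re
from collections import Counter

# Each rule is (category, conditions); each condition is
# (word, minimum count, maximum count), where None means "no upper limit".
RULES = [
    ("Credit Card Application", [("apply", 1, None), ("card", 1, None)]),
]


def classify(text, rules):
    """Return the category of the first rule whose conditions all hold, else None."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    for category, conditions in rules:
        if all(
            counts[word] >= low and (high is None or counts[word] <= high)
            for word, low, high in conditions
        ):
            return category
    return None  # left blank for a human operator to determine


matched = classify("I would like to apply for a credit card.", RULES)
unmatched = classify("I want to build a house, and need money.", RULES)
print(matched)
print(unmatched)
```

The second message is left uncategorized because no rule mentions any of its words, so it would go to a human operator.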

The rules devised by the algorithm are not always intuitive, Johnson explains. "Some are based on relationships that no human would ever conceive or have thought sensible." For example, when the email classifier was run on the Reuters sample, it created a rule stating that, "If 'U.S.' occurs no more than twice, and 'estimate' occurs no more than once, and 'vs.' occurs at least once, then the article belongs in the category 'Earnings.'" This rule proved wrong only twice when applied to 942 articles, for a precision of 99.8 percent.




