IBM Research is developing an enterprise-class anti-spam filter as part of our overall strategy of attacking the spam problem on multiple fronts. Our anti-spam filter, SpamGuru, mirrors this philosophy by incorporating several different filtering technologies and intelligently combining their output to produce a single spamminess rating or score for each incoming message. The use of multiple algorithms improves the system's effectiveness and makes it more difficult for spammers to attack. While a spammer may defeat any single algorithm, SpamGuru can rely on its remaining algorithms to maintain a high degree of effectiveness.
We are using SpamGuru as a testbed for exploring a number of existing and new technologies for identifying incoming spam. The main technologies currently under investigation include these:
- JClassifier is a Bayesian-style text classifier loosely based on Paul Graham's original design.
- Chung-Kwei applies advanced pattern matching algorithms developed in IBM's bioinformatics group to spam detection. This new classification algorithm can detect complex patterns in messages that go beyond the single words or word phrases used in most algorithms.
- Plagiarism Detection. SpamGuru employs research in plagiarism detection to accurately detect textual variations of known spam, ensuring that simple variations of previously identified spam are also blocked. SpamGuru's plagiarism detection algorithms have a very low false positive rate because they rely on a near match to known spam.
- Spoof Detection. SpamGuru's spoof detection algorithm analyzes DNS and domain records to determine whether the message is likely to have been spoofed or sent from a less reliable SMTP server. SpamGuru’s DNS analysis provides most of the advantages of the MARID MTA authentication record without the need for explicit publication of outgoing mail servers.
- Intelligent Rendering. SpamGuru's intelligent rendering algorithm analyzes a message's MIME encoding to extract what the user is likely to see when reading a message rather than what a spammer wants the filter to see. In the process, attempts to obfuscate a message's true content are noted and passed as features to SpamGuru's classification algorithms.
- Classifier Aggregation. We are investigating a variety of dynamic adaptive techniques for combining the evidence provided by multiple classifiers. This yields a single classifier that is both more accurate and more robust than any of its constituents.
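To make the idea of combining several classifiers into a single spamminess score concrete, here is a minimal Python sketch. It is an illustration only: it uses a fixed weighted average, which is a simple stand-in and not the dynamic adaptive techniques mentioned above. The function name `combine_scores`, the classifier labels, and all scores and weights are hypothetical and do not come from SpamGuru.

```python
from typing import Dict


def combine_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of per-classifier spam scores in [0, 1].

    Only classifiers present in `scores` contribute, so the result is still
    defined when one classifier produces no output for a message.
    Assumption: a static weighted average, chosen for illustration only.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights for the supplied classifiers must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical per-classifier scores for one message (1.0 means certainly spam).
scores = {"bayesian": 0.9, "pattern": 0.7, "near_duplicate": 0.2}
# Hypothetical weights expressing how much each classifier is trusted.
weights = {"bayesian": 2.0, "pattern": 1.0, "near_duplicate": 1.0}

combined = combine_scores(scores, weights)

# If one classifier is unavailable, the remaining ones still yield a score.
remaining = {name: s for name, s in scores.items() if name != "bayesian"}
fallback = combine_scores(remaining, weights)
```

In a sketch like this, a message would be flagged when the combined score exceeds a chosen threshold. An adaptive scheme would additionally adjust the weights over time based on how well each classifier has been performing.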
SpamGuru technology forms the basis of the Intelligent Mail Filter that is included in Lotus Workplace 2.5, IBM's next-generation messaging and collaboration framework. Please see the Lotus Workplace web site for availability and purchasing information.
