
IBM Israel Research Seminars

 

Whereas most traditional research in natural language processing and information retrieval has focused on analyzing the "topic" of a text ("what" it says), there is also much important and useful information carried in the "style" of a text ("how" it says it). Potential areas of application include author identification and profiling, determining a text's purpose or the feeling it evokes, and determining social relationships implicit in a text. Style differs from topic in that (a) the textual features that realize style are typically diffused throughout a text rather than tightly linked by syntactic relations, and (b) a given feature will typically be indicative of multiple stylistic "dimensions" at once. Hence great attention must be paid to the empirical question of effective feature design.

In this talk I will describe our methods for stylistic text classification, which apply modern machine learning methods to textual features derived from principles of functional linguistics. The features are conditional frequencies of a variety of functional lexical and phrasal items in a text. Support vector machines are used for text classification, and the resulting models are analyzed. Our goals in this research are twofold: (i) to attain accurate classification of documents for stylistic differences, and (ii) to gain insight into the linguistic nature of the stylistic classes being analyzed. I will present recent results on several stylistic classification problems, including determining the sentiment (positive or negative) of a text and analyzing variation in rhetorical style among scientific articles.
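The abstract does not specify the actual feature set or SVM software, so the following is only a rough sketch under stated assumptions. It uses a tiny hypothetical taxonomy (`TAXONOMY`, with made-up option labels such as additive/adversative/causal conjunctions and possibility/necessity modals) and computes each feature as the count of one option divided by the count of its whole system, i.e. a conditional frequency P(option | system). It then trains a linear SVM with a stdlib-only Pegasos-style subgradient solver, a stand-in for whatever SVM implementation the authors used, and lists the learned weights as one simple way of "analyzing" a linear model. All function names and the toy sentences are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy taxonomy: word -> (system, option). The option labels are
# illustrative only and are NOT the feature set used in the talk.
TAXONOMY = {
    "and": ("conjunction", "additive"), "also": ("conjunction", "additive"),
    "but": ("conjunction", "adversative"), "however": ("conjunction", "adversative"),
    "because": ("conjunction", "causal"), "so": ("conjunction", "causal"),
    "must": ("modality", "necessity"), "should": ("modality", "necessity"),
    "may": ("modality", "possibility"), "might": ("modality", "possibility"),
    "could": ("modality", "possibility"),
}
FEATURES = sorted(set(TAXONOMY.values()))  # fixed feature order


def conditional_frequencies(text):
    """Return P(option | system) for every (system, option) in FEATURES.

    Each option's count is divided by the total count of its system, so the
    features describe choices *within* a system rather than raw frequency.
    A system absent from the text yields 0.0 for all of its options.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    option_counts = Counter(TAXONOMY[t] for t in tokens if t in TAXONOMY)
    system_counts = Counter()
    for (system, _), n in option_counts.items():
        system_counts[system] += n
    return [
        option_counts[(s, o)] / system_counts[s] if system_counts[s] else 0.0
        for s, o in FEATURES
    ]


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss with Pegasos-style
    subgradient steps, visiting examples in a fixed order. A constant 1.0 is
    appended to each vector so the last weight acts as a (regularized) bias."""
    X = [x + [1.0] for x in X]
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss active: move toward yi * xi
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


# Toy labeled texts: +1 = tentative/contrastive style, -1 = assertive style.
texts = [
    ("It may work, but it might fail; however, we could try.", 1),
    ("Results may vary, but the trend might hold.", 1),
    ("The model could help and it may generalize, but tests are few.", 1),
    ("We must act and we should also plan, because delay is costly.", -1),
    ("The data must be cleaned, so the model must be retrained and tested.", -1),
]
X = [conditional_frequencies(t) for t, _ in texts]
y = [label for _, label in texts]
w = train_linear_svm(X, y)

# Inspect the model: zip() stops at len(FEATURES), so the bias is omitted.
for (system, option), weight in sorted(zip(FEATURES, w), key=lambda p: p[1]):
    print(f"{system}/{option}: {weight:+.2f}")
print(predict(w, conditional_frequencies("It might rain, but we may stay.")))
```

Normalizing within each system is what makes these features "conditional": a text that uses few modals overall can still score high on possibility if most of the modals it does use express possibility.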

This research is partly supported by the NSF and the Binational Science Foundation, and has been carried out in collaboration with several colleagues and students.

Speaker Bio
Dr. Shlomo Argamon is Associate Professor of Computer Science at the Illinois Institute of Technology, which he joined in 2002. He previously held academic positions in Israel at Bar-Ilan University, where he was a Fulbright Postdoctoral Fellow (1994-96), and at the Jerusalem College of Technology. Dr. Argamon received his B.S. (1988) in Applied Mathematics from Carnegie-Mellon University, and his M.Phil. (1991) and Ph.D. (1994) in Computer Science from Yale University, where he was a Hertz Foundation Fellow. His current research interests lie mainly in the use of machine learning methods to aid in functional analysis of natural language, with particular focus on questions of style. During his career, Dr. Argamon has worked on a variety of problems in experimental machine learning, ranging from robotic map learning to theory revision to natural language processing, and has over 50 journal and conference publications in these areas.

Dr. Argamon has served as Workshop Chair for CIKM and on the program committees of various international conferences (AAAI, IJCAI, ACM CIKM, NAACL/HLT, BISFAI, CSFL). He also organized the first-ever workshop on Computational Approaches to Style at IJCAI'03. His further efforts for this emerging research community include co-chairing the AAAI Fall Symposium on Style and Meaning (2004), co-chairing an ACM SIGIR workshop on Textual Stylistics in Information Access (2005), and currently co-editing a book on computational stylistics. He is a member of the Association for Computing Machinery, the Association for Computational Linguistics, the American Association for Artificial Intelligence, and the Association for Literary and Linguistic Computing.