
IBM Israel Research Seminars

 

Relational or semi-structured data is naturally represented in a graph schema, where nodes denote entities and directed, typed edges represent the relations between them. Such graphs are heterogeneous, in the sense that they include different types of nodes and different types of links. Relevant examples include citation networks and social networks (including persons, events and other entities). In this work, we represent personal information as an entity-relation graph in which email messages, meeting entries, social network information, text and a timeline are interconnected via relations derived from textual and structural information residing on a personal workstation or in a corporate database.
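To make the representation concrete, the following is a minimal sketch of a heterogeneous entity-relation graph with typed nodes and directed, typed edges. The class name `EntityRelationGraph`, the node types and the edge types (`sent_from`, `sent_to`, `has_term`) are illustrative assumptions, not the schema used in the work presented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A typed entity, e.g. Node("person", "alice")."""
    type: str
    name: str


class EntityRelationGraph:
    """Directed graph whose nodes and edges both carry a type (hypothetical sketch)."""

    def __init__(self):
        # source node -> list of (edge_type, target node)
        self.out_edges = defaultdict(list)

    def add_edge(self, source, edge_type, target):
        self.out_edges[source].append((edge_type, target))

    def neighbors(self, node, edge_type=None):
        """Targets reachable in one step, optionally restricted to one edge type."""
        return [
            target
            for etype, target in self.out_edges.get(node, [])
            if edge_type is None or etype == edge_type
        ]


# Illustrative data: one email message linked to two persons and one term.
graph = EntityRelationGraph()
alice = Node("person", "alice")
bob = Node("person", "bob")
msg = Node("message", "msg-1")
term = Node("term", "budget")

graph.add_edge(msg, "sent_from", alice)
graph.add_edge(msg, "sent_to", bob)
graph.add_edge(msg, "has_term", term)
```

Keeping the edge type on every edge is what later allows a weight to be attached per edge type rather than per individual edge.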

Given an entity-relation graph, a question of interest is how to determine the nature of the relationship between two entities that are not directly connected in the graph. We apply a framework of lazy random graph walks to derive an extended measure of entity similarity. The lazy graph walk can be viewed as propagating "similarity" from a start node through edges in the graph, accumulating evidence of relatedness over multiple connecting paths along the way. We use this similarity metric as a tool for performing search across the nodes in the graph. Further, we investigate learning methods in this framework that improve the similarity measure for predefined sets of tasks. We evaluate methods that tune the set of edge weights, defined per edge type in the graph; we also propose re-ranking as an alternative and complementary learning method, using features that capture "global" properties of the graph walk.
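The propagation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the work presented: the function name `lazy_walk`, the parameter `stay_prob` (the probability of remaining at the current node at each step), the fixed number of steps and the toy edge types are all hypothetical. Outgoing probability is split in proportion to a weight assigned per edge type, and a node with no outgoing edges simply keeps its mass.

```python
from collections import defaultdict


def lazy_walk(edges, edge_weights, start, steps=2, stay_prob=0.5):
    """Propagate probability mass from `start` for a fixed number of steps.

    edges:        dict mapping node -> list of (edge_type, target)
    edge_weights: dict mapping edge_type -> non-negative weight (default 1.0)
    Returns a dict mapping node -> probability after `steps` steps.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            out = edges.get(node, [])
            total = sum(edge_weights.get(etype, 1.0) for etype, _ in out)
            if not out or total == 0:
                nxt[node] += mass  # dangling node: all mass stays put
                continue
            nxt[node] += stay_prob * mass  # the "lazy" part of the walk
            for etype, target in out:
                share = edge_weights.get(etype, 1.0) / total
                nxt[target] += (1.0 - stay_prob) * mass * share
        dist = dict(nxt)
    return dist


# Toy graph: "alice" and "bob" are connected only through a message.
edges = {
    "alice": [("sent", "msg1")],
    "msg1": [("sent_to", "bob"), ("has_term", "budget")],
}

uniform = lazy_walk(edges, {}, "alice", steps=2)
tuned = lazy_walk(edges, {"sent_to": 3.0, "has_term": 1.0}, "alice", steps=2)

# Rank nodes by accumulated mass, highest first, as a search result list.
ranking = sorted(tuned.items(), key=lambda item: -item[1])
```

In this toy graph, raising the weight of the hypothetical `sent_to` edge type shifts mass from the term node toward `bob`, which is the kind of effect that tuning per-edge-type weights is meant to exploit.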

In the talk, I will present a set of results using the framework of lazy random walks and learning for various tasks in the personal information management domain. In particular, I will show how seemingly different tasks, such as person name disambiguation, threading and contextual search, are addressed uniformly as search queries in this framework. In addition, I will present preliminary results from the domain of text processing, where the underlying graph represents a corpus as networked sentence dependency structures. Finally, I will discuss the properties of the suggested framework and related scalability issues, where our goal is to apply the framework effectively to large data volumes.

About the speaker
Einat Minkov is a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University (CMU). Her research focuses on the interaction of natural language applications and machine learning. The research presented in this talk was previously described in papers at SIGIR 06, CEAS 06 and Web-KDD 07.