IBM Israel Research Seminars

Time is a central dimension in the co-citation literature. It is considered one of the most important factors in detecting which subjects are becoming obsolete and which are emerging. Success over time is also a measure that libraries use to rank journals when deciding whether to subscribe or unsubscribe. Authors of scientific papers decide where to publish based on the current popularity of a journal and on the recency and importance of the citations made to that journal. It has been shown that citations of journal articles behave consistently: in general, the more time passes, the fewer citations a paper receives. In fact, a journal is considered more prominent the higher its citation half-life is (i.e., how old, in years, most of the currently cited papers from that journal are). Libraries combine the half-life with another measure, the impact factor (the frequency with which the average article in a given journal has been cited in a particular year), to determine the value of a journal to their collection. Since the value of a journal can change over time, many libraries carry out this evaluation on an annual or biannual basis. Furthermore, authors gauge the importance of being accepted to, or cited in, a given journal based on such evaluations.
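To make the two journal measures concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not state: the half-life is taken as the smallest whole-year age that covers at least half of the citations (published figures are typically interpolated to fractions of a year), and the impact factor is reduced to a plain ratio of citations to articles with no particular publication window. The function names and sample numbers are hypothetical.

```python
from collections import Counter


def cited_half_life(cited_ages):
    """Return the smallest age (in whole years) such that at least half of
    the citations go to papers that are at most that old.

    Simplifying assumption: integer ages, no interpolation.
    """
    if not cited_ages:
        raise ValueError("need at least one citation")
    counts = Counter(cited_ages)
    total = len(cited_ages)
    running = 0
    for age in sorted(counts):
        running += counts[age]
        if 2 * running >= total:
            return age


def impact_factor(citations_in_year, articles_published):
    """Average number of citations per article in a given year.

    Simplifying assumption: a plain ratio; which articles count toward the
    denominator (the publication window) is left to the caller.
    """
    if articles_published <= 0:
        raise ValueError("articles_published must be positive")
    return citations_in_year / articles_published


# Hypothetical data: age in years of each paper cited this year.
ages = [1, 1, 2, 3, 5, 8, 10, 12]
half_life = cited_half_life(ages)
factor = impact_factor(150, 60)
```

Under these assumptions, a journal whose older papers keep being cited yields a larger half-life, which matches the abstract's reading of a high half-life as a sign of prominence.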

In contrast, when similar measures are plotted for citations on the Web, the reverse behaviour is exhibited: the more time passes, the more citations a page receives. Furthermore, unlike the publications studied in co-citation analysis, pages on the Web are modified and updated in response to real-world events. There have been numerous attempts to use time to predict trends on the Web. However, all of those studies emphasised the detection of the change itself and not the temporal nature of the data studied. None of them looked into how to incorporate time into the processes currently used for ranking web pages, computing link-based measures of site popularity, and link analysis in general. In fact, to the best of our knowledge, the Web Information Retrieval community has never proposed such a temporal approach.

In this talk I will discuss several aspects and uses of temporal data in the context of Web IR. The main contribution of this work is, first and foremost, in raising the issue of utilizing the time dimension in the context of link analysis. I will demonstrate the benefits of this approach by showing how we incorporated this additional dimension into two applications. The first application measures the activity within a topical community as a function of time. The second application is an adaptation of link-based ranking schemes that captures timely authorities: those that are on the rise today and should be ranked above the resources of days past.
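The abstract does not say how the ranking schemes were adapted, so the following Python sketch is only an illustration of the general idea and not the method presented in the talk. It assumes every link carries a creation time and weights each in-link by an exponential decay of its age, so recently cited pages outrank pages whose links are old. The function name, the tuple layout, and the 30-day half-life are all hypothetical.

```python
import math
from collections import defaultdict


def timely_indegree(links, now, half_life_days=30.0):
    """Score each target page by a time-decayed count of its in-links.

    links: iterable of (source, target, created_day) tuples (assumed layout).
    A link that is half_life_days old contributes half as much as a new one.
    """
    decay = math.log(2) / half_life_days
    scores = defaultdict(float)
    for _source, target, created_day in links:
        age = max(0.0, now - created_day)  # clamp links dated in the future
        scores[target] += math.exp(-decay * age)
    return dict(scores)


# Hypothetical link data: "old" has two links from day 0, "new" has one from day 60.
links = [("a", "old", 0), ("b", "old", 0), ("c", "new", 60)]
scores = timely_indegree(links, now=60, half_life_days=30.0)
```

With a plain in-degree count, "old" would win two links to one; under the decay assumed here, the single fresh link to "new" outweighs the two 60-day-old links.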

Joint work with Ronny Lempel, Uri Weiss & HRL's IR Group members.