How often have you entered keywords in a search engine, only to be rewarded with a huge list of items that are way off the mark? Our research attempts to predict the chances of a search engine providing the correct documents for a given query. Such prediction is useful as feedback to the user and the search engine administrator. It is also used for uniting data from different sources, as well as to predict when relevant documents cannot be retrieved simply because they do not exist in the repository.
Our work attempts to predict the quality of the answers retrieved by search engines using machine learning methods. This work opens a new range of opportunities for improving search engines and their interaction with users.
Today, most search engines return results that contain the query terms. It’s a well-known fact that just because certain keywords can be found in a document, it doesn’t mean the document answers your query. However, that document will be listed in your query results. Take, for example, the query “IBM”. Any search engine will list huge reams of results, but what are the chances of them being relevant to the search? And what exactly are you searching for? Information on IBM products? IBM stocks? Or maybe the location of a nearby IBM dealer? The ambiguity of that query makes it difficult for any search engine to answer.
The reasons for low quality search results are numerous, but two main explanations stand out. First, engines are faced with linguistic challenges, where words like ‘fire’ or ‘present’ can have multiple meanings. Second, the data sources being searched may not have any information on the topics being searched for. The missing content may simply not exist in the indexed collection.
We started our work by trying to find indicative features for easy and difficult queries. We examined many queries and the documents retrieved in response to these queries. This was done for both queries that were answered suitably (easy queries) and for queries that were not (difficult queries). We observed that when queries were easy, searching for each query term separately will yield similar results to a search for the entire phrase. This closely resembles a situation where a patient consults with four different physicians. If all physicians give the same diagnosis, the patient will naturally have greater confidence in their judgment.
We used this agreement as input to a prediction algorithm. This algorithm uses a set of training queries to learn how to map these agreements to the difficulty of the query. When a new query is entered, the prediction algorithm can forecast its difficulty. When presented with 49 previously unseen queries from the NIST TREC conference, the predicted difficulties estimated by the algorithm were in a correlation of 0.57 with the actual difficulties. This result is significantly higher than any other previously-suggested method, although we are currently working on improvements to the algorithms which we hope will increase this correlation.
Figure 1. Prediction of query difficulty presented together with search engine results. This prediction informs users if the retrieved documents are likely to be useful. If not, they can decide, for example, to reformulate their queries.
Once the difficulty of a query can be predicted, it can be used in a variety of ways. The most common use is to give feedback to the user (see image). We have also used this information to combine information from different sources by weighting the results from different repositories according to their predicted quality. This approach improved retrieval by up to 13% on the collections we tested. Finally, we have shown that it is possible to use the technology developed for query prediction to detect when the quality of results is poor simply because the document collection does not contain pertinent results. This information is of great importance to the search engine administrator because it signifies information that users are seeking but is not currently being answered. Thus, the administrator can use this information to decide which information should be added to the repository. In our experiments we demonstrated that over 70% of queries with no content could be identified at the cost of incorrectly labeling less than 10% of queries with content.
Finally, we have shown that it is possible to use the technology developed for query prediction to detect when the quality of results is poor simply because the document collection does not contain pertinent results. This is done using a two-staged process. First, a query predictor detects those queries that are likely to be difficult. Then, a second predictor is trained to distinguish between queries which are difficult because there is no relevant content in the repository and those which are difficult for other reasons (for example, due to linguistic ambiguity). The second predictor is trained in a manner identical to the first one except that the labels for its training queries are different (they can be either that the query has content or it does not).
This information is of great importance to the search engine administrator because it signifies information that users are seeking but is not currently being answered. Thus, the administrator can use this information to decide which information should be added to the repository.
Our work on query prediction won the Best Paper Award at the Association of Computing Machinery’s (ACM) 2005 conference on information retrieval (SIGIR 2005).
Related Publications
Elad Yom-Tov, Shai Fine, David Carmel and Adam Darlow. Learning to Estimate Query Difficulty with Applications to Missing Content Detection and Distributed Information Retrieval . 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), Recipient of the Best Paper Award . ACM, August 2005.
E. Yom-Tov, S. Fine, D. Carmel, A. Darlow and E. Amitay. Juru at TREC 2004: Experiments with prediction of query difficulty. Text Retrieval Conference (TREC). 2004.
E. Yom-Tov, D. Carmel, A. Darlow, D. Pelleg, S. Errera-Yaakov and S. Fine. Juru at TREC 2005: Query Prediction in the Terabyte and the Robust tracks. Text Retrieval Conference (TREC). 2005.





