SHER

SHER is an OWL reasoner that is designed to provide semantic querying of large relational datasets using OWL ontologies.




SHER Overview



SHER (Scalable Highly Expressive Reasoner) is a breakthrough technology that provides ontology analytics (OWL-DL without nominals) over highly expressive ontologies.
  • SHER does no inferencing on load and therefore copes better with fast-changing data; the trade-off is that all reasoning is performed at query time.
  • SHER can reason on ~7 million triples in seconds, and scales to datasets with 60 million triples, responding to queries in minutes. SHER has been used to semantically index 300 million triples from the medical literature.
  • SHER tolerates logical inconsistencies in the data, can quickly point users to them, and helps users clean them up before issuing semantic queries.
  • SHER provides explanations (or justifications) for why a particular result set is an answer to the query. This is useful for validation by domain experts.






    How it works


    SHER's reasoning technique relies on a novel way of indexing the instances in the database from the perspective of reasoning. This indexing technique summarizes the instance data into a very compact representation, and reasoning is performed on that summary rather than on the full data. For details, see Summarization, ISWC 2006 and Tech report for ISWC 2006.
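
    The idea of summarizing instance data can be illustrated with a small sketch. It assumes, purely for illustration, that individuals are merged whenever they have exactly the same set of asserted concepts; the function name summarize and the sample data are hypothetical, and the actual construction used by SHER is the one described in the cited papers.

```python
from collections import defaultdict


def summarize(types, role_assertions):
    """Collapse individuals with identical type sets into summary nodes.

    types: dict mapping individual -> set of concept names
    role_assertions: iterable of (subject, role, object) triples
    Returns (node_of, members, summary_edges).
    """
    # Individuals that appear only in role assertions get an empty type set.
    individuals = set(types)
    for s, _, o in role_assertions:
        individuals.update((s, o))

    node_of = {}
    members = defaultdict(set)
    for ind in individuals:
        key = frozenset(types.get(ind, ()))  # the summary node is the type set
        node_of[ind] = key
        members[key].add(ind)

    # Each role assertion is mapped onto the summary nodes of its endpoints,
    # so many instance-level edges collapse into a single summary edge.
    summary_edges = {(node_of[s], r, node_of[o]) for s, r, o in role_assertions}
    return node_of, dict(members), summary_edges


# Hypothetical sample data: three patients, two drugs, three assertions.
types = {
    "p1": {"Patient"}, "p2": {"Patient"}, "p3": {"Patient"},
    "d1": {"Drug"}, "d2": {"Drug"},
}
roles = [("p1", "takes", "d1"), ("p2", "takes", "d2"), ("p3", "takes", "d1")]

node_of, members, summary_edges = summarize(types, roles)
print(len(members), len(summary_edges))  # 2 summary nodes, 1 summary edge
```

    In this toy case five individuals and three role assertions shrink to two summary nodes and one summary edge, which is the kind of reduction that makes reasoning over the summary cheap.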

    SHER uses this representation to efficiently filter instance data that is irrelevant for answering a certain query, and selectively uncompresses portions of the summarized representation relevant for the query, in a process called refinement. The combination of summarization and refinement is key to SHER's scalability. For details, see Paper in AAAI 2007 and Tech report for AAAI 2007. Internally, SHER uses the popular open-source OWL-DL reasoner, Pellet, to reason over the summarized data and obtain justifications for the data inconsistency.
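
    A rough sketch of what a refinement step might look like follows. It assumes a simple splitting criterion: the members of one summary node are partitioned by the outgoing summary-level edges they actually have. In SHER the refinement is guided by the reasoner as described in the cited AAAI 2007 paper; the function name refine and the data below are hypothetical.

```python
from collections import defaultdict


def refine(node_members, node_of, role_assertions):
    """Split one summary node by the outgoing edges its members really have.

    node_members: set of individuals currently merged into one summary node
    node_of: dict mapping every individual -> its summary node id
    role_assertions: iterable of (subject, role, object) triples
    Returns a list of smaller member sets, in a deterministic order.
    """
    # Record, per member, which (role, target summary node) pairs it has.
    signature = defaultdict(set)
    for s, r, o in role_assertions:
        if s in node_members:
            signature[s].add((r, node_of[o]))

    # Members with the same signature stay together; others are separated.
    groups = defaultdict(set)
    for ind in node_members:
        groups[frozenset(signature[ind])].add(ind)
    return sorted(groups.values(), key=sorted)


# Hypothetical data: p1 and p2 take a drug, p3 has no such assertion.
node_of = {"p1": "Patient", "p2": "Patient", "p3": "Patient", "d1": "Drug"}
roles = [("p1", "takes", "d1"), ("p2", "takes", "d1")]

parts = refine({"p1", "p2", "p3"}, node_of, roles)
print(parts)
```

    Here the single Patient node is split into one group containing p1 and p2 and another containing only p3, so later reasoning no longer treats p3 as if it also had a takes edge.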

    SHER performs membership query answering as well as conjunctive query answering using a set of optimization techniques described in this paper on optimizing membership query answering and conjunctive querying. These optimization techniques leverage summarization in the context of conjunctive querying, and also incorporate faster incomplete reasoning techniques into query answering. SHER therefore has an internal knob which can be used to get fast, incomplete answers to queries. This faster algorithm can help retrieve large result sets for most queries within a minute or two.




    Use cases


    Automated Clinical trials matching using ontologies



    In collaboration with researchers at Columbia University Medical Center, we used SHER to find electronic patient records that match clinical trials criteria. The problem in automating clinical trials matching is that patient data is noisy, coded in local terminologies, and highly specific. Clinical trials queries, however, tend to be much more general. Bridging the gap between the two requires significant knowledge engineering, which has to be customized for each institution. For example, patient records contain entries such as "Patient X was medicated with vendor-specific drug Y". Clinical trials criteria, however, are specified in terms of a broad class of drugs, such as "patients on medications involving active ingredient Z".

    Together with researchers from Columbia, we investigated whether it was possible to re-use the knowledge in the SNOMED ontology to bridge the gap between electronic medical records and clinical trials queries. SHER successfully found matches for the clinical trials queries on a large one-year patient dataset from Columbia (60 million triples). For details, see ISWC 2007, Clinical trials matching paper. For a set of slides about this case study, see Clinical trials matching case study slides.
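
    The gap between a vendor-specific drug in a patient record and a broad drug class in a trial criterion can be shown with a toy example. The sketch below only walks a hand-written hierarchy; it is not OWL-DL reasoning and not how SHER is implemented, and every name in it (BrandY, IngredientZ, patientX, the helper functions) is a hypothetical placeholder rather than a SNOMED term.

```python
def ancestors(concept, parents):
    """Return the concept plus everything reachable through parent links."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def matching_patients(medications, has_ingredient, parents, wanted):
    """Patients on any drug with an ingredient that is 'wanted' or a subclass."""
    hits = set()
    for patient, drug in medications:
        for ingredient in has_ingredient.get(drug, ()):
            if wanted in ancestors(ingredient, parents):
                hits.add(patient)
    return hits


# Hypothetical background knowledge standing in for the ontology.
parents = {"IngredientZ1": ["IngredientZ"]}          # Z1 is a kind of Z
has_ingredient = {"BrandY": ["IngredientZ1"], "BrandQ": ["IngredientW"]}

# Hypothetical patient records: (patient, vendor-specific drug).
medications = [("patientX", "BrandY"), ("patientV", "BrandQ")]

print(matching_patients(medications, has_ingredient, parents, "IngredientZ"))
# prints {'patientX'}
```

    The record never mentions IngredientZ, yet patientX is found because the background knowledge links BrandY to a subclass of it. That is the kind of bridge the ontology is meant to provide.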

    Cleaning up text extraction output using ontologies


    SHER has been used in the context of SemanticClean, a project that examines whether it is possible to use OWL reasoning to clean up inconsistencies in data generated by text extraction. Depending on the number of inconsistent patterns present in the data, SHER can detect several thousand inconsistencies, taking between 10 and 67 minutes on datasets of roughly 800K to 2 million triples. For details, see ISWC 2007, paper on SemanticClean.
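
    As a minimal illustration of what an inconsistency in extracted data can look like, the sketch below checks a single pattern: an individual asserted to belong to two concepts that the ontology declares disjoint. This is an assumption made for the example; SHER detects inconsistencies through full reasoning, not this check, and the concept names and data are hypothetical.

```python
def disjointness_violations(types, disjoint_pairs):
    """List (individual, A, B) where the individual is typed with disjoint A and B.

    types: dict mapping individual -> set of asserted concept names
    disjoint_pairs: iterable of (A, B) concept pairs declared disjoint
    """
    violations = []
    for ind in sorted(types):
        for a, b in disjoint_pairs:
            if a in types[ind] and b in types[ind]:
                violations.append((ind, a, b))
    return violations


# Hypothetical extraction output: "IBM" was typed as both a person and an organization.
types = {
    "IBM": {"Organization", "Person"},
    "Alice": {"Person"},
}
disjoint_pairs = [("Person", "Organization")]

print(disjointness_violations(types, disjoint_pairs))
# prints [('IBM', 'Person', 'Organization')]
```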

    Searching PubMed with ontologies on AnatomyLens


    AnatomyLens provides a semantic, concept-based search over annotations of PubMed articles and GOA annotations using the GO (Gene Ontology) and FMA (Foundational Model of Anatomy) ontologies. Users enter anatomy terms, MeSH terms, and biological processes as search keywords. AnatomyLens is more precise and has better recall than text search. For example, for the query "Alzheimer's, brain, neuron development", AnatomyLens will match Alzheimer's articles that discuss dendrite development in the hippocampus, whereas a standard text search will only find articles that contain the queried keywords explicitly and might also return unrelated articles (such as articles about neuron development in the spine).
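
    The Alzheimer's example can be mimicked with a toy matcher. The sketch assumes each article carries a list of annotation terms and that a hand-written "broader" map stands in for the relevant GO and FMA relationships; both the map and the function names are hypothetical, and this is not the AnatomyLens implementation.

```python
def closure(term, broader):
    """Return the term plus every broader term reachable from it."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(broader.get(t, ()))
    return seen


def semantic_match(query_terms, annotations, broader):
    """True if every query term equals or generalizes some annotation."""
    covered = set()
    for a in annotations:
        covered |= closure(a, broader)
    return all(q in covered for q in query_terms)


# Hypothetical stand-ins for ontology relationships.
broader = {
    "hippocampus": ["brain"],
    "dendrite development": ["neuron development"],
}

query = ["Alzheimer's", "brain", "neuron development"]
article_a = ["Alzheimer's", "hippocampus", "dendrite development"]
article_b = ["Alzheimer's", "spine", "neuron development"]

print(semantic_match(query, article_a, broader))  # True
print(semantic_match(query, article_b, broader))  # False
```

    Article A matches even though it never mentions "brain" or "neuron development" literally, while article B is rejected because nothing in it relates to the brain.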

    Here's a URL to try it: Anatomy Lens

    Here's a URL demonstrating the basics of AnatomyLens: Anatomy Lens Video


    Availability



    We plan to make SHER available under a free-for-academic-use license in mid-June 2008.

    Team members



    Kavitha Srinivas, SHER Project lead
    Edith Schonberg, SHER Project Manager
    Achille Fokoue
    Aditya Kalyanpur
    Li Ma
    Aaron Kershenbaum



    Collaborators for the SHER project


    Julian Dolby (Watson - dolby@us.ibm.com)
    Christopher Welty (Watson - welty@us.ibm.com)
    Bill Murdock (Watson - murdockj@us.ibm.com)
    James Fan (Watson - fanj@us.ibm.com)
    Xing Zhi Sun (CRL - sunxingz@cn.ibm.com)
    Robert Schiaffino (Watson & Iona College - rjschiaf@us.ibm.com)
    Chintan Patel (Columbia Medical Center)
    James Cimino (Columbia Medical Center)




    Additional information




    The datasets used in our experimental evaluation in the Summarization, ISWC 2006 paper are available:
    Semantic Traceability ontology and data.
    UOBM benchmark ontology and data.



    Last updated 21 May 2008
