Semantic Web technologies enhance enterprise Master Data Management

Innovation Matters


IBM Research has investigated the use of Semantic Web technologies for semantic query and analytics over master data. These technologies will help enterprises manage core business objects, such as customer and product data, more efficiently.

IBM plays a key role in the development of Master Data Management (MDM) systems. Master data comprises the core business entities that a company uses repeatedly across many business processes and systems, such as lists or hierarchies of customers, suppliers, accounts, products or organizational units. IBM's MDM systems include WebSphere Product Center for product information management and WebSphere Customer Center for customer data integration. IBM expects to release a powerful multiform MDM platform that will simultaneously manage products, customers and accounts, and that will have a significant impact on the most important business processes in enterprises.

A new project called Semantic Master Data Management (SMDM) applies Semantic Web technologies to semantic query and analytics in order to enhance existing MDM solutions. Users will be able to query the integrated master data without understanding the underlying data model or writing complex SQL statements. Moreover, based on semantic modeling and reasoning, they will be able to analyze the master data and to discover and manage implicit relationships among master data entities.

Ontology, the fundamental concept in the Semantic Web, is defined as an explicit specification of a shared conceptualization. It provides formal semantics and is a key structure for knowledge representation. In the SMDM project, we generate an OWL ontology from the logical data model of an MDM system and then enrich it. The master data is thus represented as instances of the ontology in the form of <subject, predicate, object> triples. This form resembles a simple sentence in human language and is easy to interpret. Using SPARQL, the W3C query language for RDF, users can then formulate queries on the master data without needing to know how the data is modeled.
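To make the triple representation concrete, here is a minimal Python sketch, assuming a few hypothetical product records (identifiers such as `prod:P100` and `mdm:hasMaterial` are illustrative, not SMDM's actual vocabulary). It stores master data as <subject, predicate, object> triples and answers a SPARQL-style basic graph pattern by joining triple patterns on shared variables. A real system would use a SPARQL engine rather than this toy matcher.

```python
# Hypothetical master data rendered as <subject, predicate, object> triples.
TRIPLES = {
    ("prod:P100", "rdf:type", "mdm:Product"),
    ("prod:P100", "mdm:name", "Laptop stand"),
    ("prod:P100", "mdm:hasMaterial", "mat:Aluminum"),
    ("prod:P200", "rdf:type", "mdm:Product"),
    ("prod:P200", "mdm:name", "Desk lamp"),
    ("prod:P200", "mdm:hasMaterial", "mat:Steel"),
}


def is_var(term):
    """Query variables start with '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def query(patterns, triples=TRIPLES):
    """Evaluate a basic graph pattern: a conjunction of triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?name WHERE { ?p rdf:type mdm:Product .
#                               ?p mdm:hasMaterial mat:Aluminum .
#                               ?p mdm:name ?name }
rows = query([
    ("?p", "rdf:type", "mdm:Product"),
    ("?p", "mdm:hasMaterial", "mat:Aluminum"),
    ("?p", "mdm:name", "?name"),
])
print([r["?name"] for r in rows])  # ['Laptop stand']
```

The user only names properties such as `mdm:hasMaterial`; which tables or columns hold them is hidden behind the ontology.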

Ontology offers an additional benefit for MDM through its reasoning ability: ontology reasoning can discover implicit information. The resulting capabilities fall into four major features. (See Figure 1, which uses product information management, or PIM, to illustrate these new features.)

Semantic query and analytics for MDM
Domain-ontology-based classification. Many domain ontologies have been developed for classification, such as ontologies of industry sectors. Existing MDM systems can leverage ontologies that encode such common knowledge to simplify the organization of master data. For example, by simply associating products with such domain ontologies, users can classify their products by geography, sector and so on.
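As a minimal Python sketch of this idea, assume a small hypothetical industry-sector ontology in which each concept points to its broader concept, and products tagged with one concept each. Classifying products under a broad sector then only requires following the hierarchy transitively. The concept names are illustrative and do not come from any published ontology.

```python
# Hypothetical sector ontology: each concept maps to its broader concept.
BROADER = {
    "sector:LaptopComputers": "sector:Computers",
    "sector:Monitors": "sector:ComputerPeripherals",
    "sector:ComputerPeripherals": "sector:Computers",
    "sector:Computers": "sector:Electronics",
    "sector:HomeAppliances": "sector:Electronics",
}

# Each product is associated with one concept from the domain ontology.
PRODUCT_SECTOR = {
    "prod:T60": "sector:LaptopComputers",
    "prod:L200": "sector:Monitors",
    "prod:W10": "sector:HomeAppliances",
}


def ancestors(concept):
    """All broader concepts of `concept`, following the hierarchy transitively."""
    result = []
    while concept in BROADER:
        concept = BROADER[concept]
        result.append(concept)
    return result


def products_in(sector):
    """Products tagged with `sector` directly or with a narrower concept."""
    return sorted(
        p for p, s in PRODUCT_SECTOR.items()
        if s == sector or sector in ancestors(s)
    )


print(products_in("sector:Computers"))    # ['prod:L200', 'prod:T60']
print(products_in("sector:Electronics"))  # ['prod:L200', 'prod:T60', 'prod:W10']
```

No product was ever tagged `sector:Computers`; the classification comes entirely from the shared ontology.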

Dynamic categorization. The expressivity of OWL allows logical classes to be defined (using the intersection, union and complement operators), which enables automatic classification of the master data. In addition, OWL restrictions support the creation of new categories of master data. As shown in Figure 1, we can use an OWL hasValue restriction to define a new category, “aluminum products,” which includes all items containing aluminum. Using OWL reasoning, instances of this newly defined category can be retrieved automatically. We can also apply rules to formally describe new categories. One example is to define VIP customers as those whose minimum payment is greater than $10K. Through dynamic categorization, end users can freely define new categories at query time and classify customers through reasoning, rather than changing the deployed master data model and storage model to represent the new categories.
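A minimal Python sketch of dynamic categorization, assuming hypothetical item and customer records. `HasValue` mimics the semantics of an OWL hasValue restriction for the "aluminum products" category, and `Rule` stands in for a rule body such as the VIP-customer condition (written here as a Python predicate rather than in a rule language). Both class names are illustrative; both categories are defined at query time without touching the stored records.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical instance data; the stored "model" is never changed below.
ITEMS = {
    "item:Frame": {"containsMaterial": {"mat:Aluminum", "mat:Rubber"}},
    "item:Case": {"containsMaterial": {"mat:Plastic"}},
    "item:Heatsink": {"containsMaterial": {"mat:Aluminum", "mat:Copper"}},
}
CUSTOMERS = {
    "cust:Acme": {"minPayment": 25_000},
    "cust:Bolt": {"minPayment": 4_000},
}


@dataclass(frozen=True)
class HasValue:
    """Sketch of an OWL hasValue restriction: members have `prop` -> `value`."""
    prop: str
    value: str

    def members(self, data):
        return sorted(i for i, props in data.items()
                      if self.value in props.get(self.prop, set()))


@dataclass(frozen=True)
class Rule:
    """A category defined by a rule body (a Python predicate in this sketch)."""
    body: Callable[[dict], bool]

    def members(self, data):
        return sorted(i for i, props in data.items() if self.body(props))


# New categories defined at query time.
aluminum_products = HasValue("containsMaterial", "mat:Aluminum")
vip_customers = Rule(lambda c: c["minPayment"] > 10_000)

print(aluminum_products.members(ITEMS))  # ['item:Frame', 'item:Heatsink']
print(vip_customers.members(CUSTOMERS))  # ['cust:Acme']
```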

Relationship query and discovery. OWL allows richer relationships to be defined: Object Properties relate individuals to other individuals, and Datatype Properties relate individuals to literal values. An Object Property can be declared symmetric, functional, inverse functional or transitive, which makes OWL Object Properties well suited to describing the complex relationships among master data. Rules can also be used to define more general relationships flexibly. In particular, by combining OWL reasoning with datalog rule inference, we can effectively discover implicit relationships from explicit assertions.
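The kind of inference described above can be sketched with naive forward chaining in Python. This assumes hypothetical properties `partOf` (transitive), `partnerOf` (symmetric) and `hasPart` (inverse of `partOf`), plus one illustrative user-defined rule. It is a toy fixpoint loop, not SMDM's datalog engine; a production engine would typically use a more efficient strategy such as semi-naive evaluation.

```python
# Hypothetical asserted facts: (subject, property, object).
ASSERTED = {
    ("org:Sales", "partOf", "org:EMEA"),
    ("org:EMEA", "partOf", "org:Corp"),
    ("cust:Acme", "partnerOf", "cust:Bolt"),
    ("cust:Acme", "assignedTo", "org:Sales"),
}

# OWL-style property characteristics (illustrative names).
TRANSITIVE = {"partOf"}
SYMMETRIC = {"partnerOf"}
INVERSE_OF = {"hasPart": "partOf"}  # hasPart is declared the inverse of partOf


def derive(facts):
    """Apply every axiom and rule once to the current set of facts."""
    derived = set()
    for s, p, o in facts:
        if p in SYMMETRIC:
            derived.add((o, p, s))
        for inverse, base in INVERSE_OF.items():
            if p == base:
                derived.add((o, inverse, s))
        for s2, p2, o2 in facts:
            if s2 != o:
                continue  # both joins below need this fact's object as subject
            if p in TRANSITIVE and p2 == p:
                derived.add((s, p, o2))
            # User-defined datalog rule:
            #   managedBy(C, U) :- assignedTo(C, U0), partOf(U0, U).
            if p == "assignedTo" and p2 == "partOf":
                derived.add((s, "managedBy", o2))
    return derived


def closure(facts):
    """Naive forward chaining: repeat until nothing new is derived (fixpoint)."""
    facts = set(facts)
    while True:
        new = derive(facts) - facts
        if not new:
            return facts
        facts |= new


facts = closure(ASSERTED)
managed = sorted(f for f in facts if f[1] == "managedBy")
print(managed)
# [('cust:Acme', 'managedBy', 'org:Corp'), ('cust:Acme', 'managedBy', 'org:EMEA')]
```

Neither `managedBy` fact was asserted; the second one only appears after `org:Sales partOf org:Corp` has itself been inferred, which is why the loop runs to a fixpoint.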

Faceted search with relationship navigation. In addition to structured SPARQL queries, we wanted a convenient, user-friendly search paradigm for navigating the master data along its rich relationships. We implemented this by extending the index structure of existing full-text search engines to index both asserted and inferred triples.
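A minimal Python sketch of the idea, assuming hypothetical product triples in which the `AluminumProduct` type is inferred rather than asserted. It builds a keyword index over product names and a facet index over the remaining triples, so inferred facts show up as facets just like asserted ones. The function names are illustrative; a real deployment would extend a full-text engine's index rather than use in-memory dictionaries.

```python
from collections import Counter, defaultdict

# Hypothetical triples; the AluminumProduct types are inferred, not asserted.
ASSERTED = [
    ("prod:P1", "name", "aluminum laptop stand"),
    ("prod:P1", "type", "Accessory"),
    ("prod:P2", "name", "aluminum heatsink"),
    ("prod:P2", "type", "Component"),
    ("prod:P3", "name", "plastic laptop case"),
    ("prod:P3", "type", "Accessory"),
]
INFERRED = [
    ("prod:P1", "type", "AluminumProduct"),
    ("prod:P2", "type", "AluminumProduct"),
]


def build_index(triples):
    """Keyword index over product names plus a facet index over other triples."""
    keywords = defaultdict(set)  # word -> subjects whose name contains it
    facets = defaultdict(set)    # subject -> {(property, value), ...}
    for s, p, o in triples:
        if p == "name":
            for word in o.split():
                keywords[word].add(s)
        else:
            facets[s].add((p, o))
    return keywords, facets


def search(word, keywords, facets):
    """Return keyword hits and facet counts over those hits."""
    hits = sorted(keywords.get(word, set()))
    counts = Counter(f for s in hits for f in facets[s])
    return hits, counts


def refine(hits, facet, facets):
    """Navigate by keeping only the hits that carry the chosen facet."""
    return [s for s in hits if facet in facets[s]]


keywords, facets = build_index(ASSERTED + INFERRED)
hits, counts = search("aluminum", keywords, facets)
print(hits)                                         # ['prod:P1', 'prod:P2']
print(counts[("type", "AluminumProduct")])          # 2
print(refine(hits, ("type", "Accessory"), facets))  # ['prod:P1']
```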



Fig. 1. New features for Product Information Management

Figure 2 shows the architecture of the SMDM system. First, a mapping is created to link the master data with the OWL ontology generated and enriched from the MDM logical data model. Next, users can define business rules for analytics, and domain ontologies or OWL restrictions for classification. These rules and ontologies are stored in the MDM hub and managed by the ontology and rule manager.

At query time, users can issue SPARQL queries to the ontology-based semantic engine; these queries may include classes and properties defined in ontologies and terms defined in rule heads. The engine parses each query into a SPARQL query pattern tree and analyzes each node in the tree to determine whether it requires ontology or rule reasoning. If rule inference is needed (for user-defined rules), a datalog rule evaluation is performed. If ontology reasoning for instance classification is required (that is, retrieving the types of an instance or all instances of an OWL class), an ontology reasoner is called.

The inference results are temporarily materialized as tables in the MDM store. For nodes that need no inference, SQL queries are generated based on the defined mapping between the ontologies and the schema of the master data store.

Finally, the SQL Generator combines the resulting SQL statements with the results stored in the temporary tables into a single SQL statement, which is evaluated by the database engine on which the master data store is built.
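The following Python sketch, using the standard `sqlite3` module, walks through these steps for a hypothetical query asking for the names of VIP customers in the EMEA region (roughly `SELECT ?name WHERE { ?c a :VIPCustomer ; :region "EMEA" ; :name ?name }`). The table layout, the `tmp_vip` temporary table, the `MAPPING` dictionary and the Python stand-in for the rule engine are all assumptions for illustration; SMDM's actual SQL Generator is not shown in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT,
                           region TEXT, min_payment INTEGER);
    INSERT INTO customer VALUES
        ('c1', 'Acme',  'EMEA', 25000),
        ('c2', 'Bolt',  'EMEA',  4000),
        ('c3', 'Crane', 'APAC', 18000);
""")

# Hypothetical ontology-to-schema mapping for properties needing no inference.
MAPPING = {"name": "customer.name", "region": "customer.region"}


def evaluate_vip_rule(conn):
    """Stand-in for the rule engine: VIP(c) :- minPayment(c, m), m > 10000."""
    rows = conn.execute("SELECT id, min_payment FROM customer").fetchall()
    return [(cid,) for cid, m in rows if m > 10_000]


# 1. Node that needs reasoning: materialize its result as a temporary table.
conn.execute("CREATE TEMP TABLE tmp_vip (id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO tmp_vip VALUES (?)", evaluate_vip_rule(conn))

# 2. Nodes without inference: translate to SQL fragments via the mapping.
select_col = MAPPING["name"]
where = f"{MAPPING['region']} = ?"

# 3. Combine both parts into one SQL statement for the database engine.
sql = (f"SELECT {select_col} FROM customer "
       f"JOIN tmp_vip ON tmp_vip.id = customer.id WHERE {where}")
print(conn.execute(sql, ("EMEA",)).fetchall())  # [('Acme',)]
```

Only the rule result crosses the boundary between the reasoner and the database; the final filtering and projection are left to the database engine.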



Fig. 2. SMDM architecture.

The main challenges for the SMDM project are twofold:

A gap exists between the ontology and the physically stored data. The current D2RQ mapping technique allows relational databases to be accessed through SPARQL queries. The main technical problem is to transform a SPARQL query into SQL statements according to the mapping between the ontology and the relational schema. However, this transformation often produces SQL with many redundant subqueries, which dramatically degrades query performance. In SMDM, we proposed a number of approaches for generating optimized SQL statements, and our results show a significant improvement in query performance.
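To illustrate the redundancy problem, here is a Python sketch (again with `sqlite3`) that translates a basic graph pattern into SQL in two ways, under a hypothetical mapping in which every property is a column of a single `product` table. The naive translation emits one subquery per triple pattern and joins them; the merged version notices that all patterns share a subject and a table and scans the table once. This only illustrates the kind of redundancy involved; it is not the optimization approach proposed in SMDM.

```python
import sqlite3

# Hypothetical mapping: property -> (table, subject column, object column).
MAPPING = {
    "name": ("product", "id", "name"),
    "material": ("product", "id", "material"),
}


def naive_sql(patterns):
    """One subquery per triple pattern, joined on the shared subject."""
    froms, where, select, params = [], [], ["t0.s"], []
    for i, (_, prop, obj) in enumerate(patterns):
        table, scol, ocol = MAPPING[prop]
        froms.append(f"(SELECT {scol} AS s, {ocol} AS o FROM {table}) AS t{i}")
        if i:
            where.append(f"t{i}.s = t0.s")
        if obj.startswith("?"):
            select.append(f"t{i}.o")
        else:
            where.append(f"t{i}.o = ?")  # constants are passed as parameters
            params.append(obj)
    sql = f"SELECT {', '.join(select)} FROM {', '.join(froms)}"
    return sql + (" WHERE " + " AND ".join(where) if where else ""), params


def merged_sql(patterns):
    """Patterns on the same subject and table collapse into one table scan."""
    tables = {MAPPING[p][0] for _, p, _ in patterns}
    assert len(tables) == 1, "this sketch handles a single mapped table only"
    table, scol = tables.pop(), MAPPING[patterns[0][1]][1]
    select, where, params = [scol], [], []
    for _, prop, obj in patterns:
        ocol = MAPPING[prop][2]
        if obj.startswith("?"):
            select.append(ocol)
        else:
            where.append(f"{ocol} = ?")
            params.append(obj)
    sql = f"SELECT {', '.join(select)} FROM {table}"
    return sql + (" WHERE " + " AND ".join(where) if where else ""), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id TEXT, name TEXT, material TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("p1", "Laptop stand", "Aluminum"),
    ("p2", "LCD panel", "Glass"),
    ("p3", "Heatsink", "Aluminum"),
])
bgp = [("?p", "name", "?n"), ("?p", "material", "Aluminum")]
for build in (naive_sql, merged_sql):
    sql, params = build(bgp)
    print(sql)
    print(sorted(conn.execute(sql, params).fetchall()))
```

Both translations return the same rows, but the naive one makes the database evaluate two derived tables and a join where a single filtered scan suffices; with many patterns this overhead compounds.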

How can reasoning be integrated into SPARQL queries? The SPARQL query language is tailored to Resource Description Framework (RDF) data and does not take reasoning into account. To take advantage of ontology reasoning and rule reasoning in the SMDM project, we extended the SPARQL query language to support user-defined rules. The corresponding query processing works as follows. First, the SPARQL query is parsed into a pattern tree. Next, any leaf node that requires reasoning is either expanded into a sub-pattern tree or replaced by a temporary pattern (a temporary table). Finally, every node in the tree can be directly interpreted and processed by the SPARQL query engine.
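The two leaf rewrites can be sketched in Python as follows. The `Triple`, `TempTable` and `Node` classes, the class hierarchy and the rule-defined class `:VIPCustomer` are hypothetical, and SMDM's extended SPARQL syntax is not shown in the source. A type pattern on a class with subclasses is expanded into a UNION sub-tree (subsumption), while a pattern on a rule-defined class is replaced by a reference to a temporary table holding the rule's materialized results.

```python
from dataclasses import dataclass, field


@dataclass
class Triple:
    s: str
    p: str
    o: str


@dataclass
class TempTable:
    """Leaf standing for reasoning results materialized in a temporary table."""
    table: str
    var: str


@dataclass
class Node:
    op: str  # "AND" or "UNION"
    children: list = field(default_factory=list)


# Hypothetical ontology and rule knowledge (the hierarchy is assumed acyclic).
SUBCLASSES = {":Display": [":LCDDisplay", ":CRTDisplay"],
              ":LCDDisplay": [":TFTDisplay"]}
RULE_HEADS = {":VIPCustomer"}


def rewrite(tree):
    """Rewrite leaves that need reasoning so the plain engine can run the tree."""
    if isinstance(tree, Node):
        return Node(tree.op, [rewrite(c) for c in tree.children])
    if isinstance(tree, Triple) and tree.p == "rdf:type":
        if tree.o in RULE_HEADS:
            # Rule reasoning: refer to the table holding the rule's results.
            return TempTable("tmp_" + tree.o.lstrip(":"), tree.s)
        if tree.o in SUBCLASSES:
            # Subsumption: expand into a UNION sub-tree over the subclasses.
            subs = [rewrite(Triple(tree.s, "rdf:type", c))
                    for c in SUBCLASSES[tree.o]]
            return Node("UNION", [tree] + subs)
    return tree


query = Node("AND", [
    Triple("?c", "rdf:type", ":VIPCustomer"),
    Triple("?c", ":bought", "?d"),
    Triple("?d", "rdf:type", ":Display"),
])
print(rewrite(query))
```

After rewriting, every leaf is either an ordinary triple pattern or a table reference, which is what allows the rest of the pipeline to treat the tree without any knowledge of ontologies or rules.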

A real-world example: Searching for better product data
Consider the highly structured searches that a retailer needs to issue on an electronic product distributor’s website when, say, assembling a computer. The retailer might submit this query:

“Find all LCD displays that use an LVDS bus interface and conform to a display standard that is better than (successorOf) VGA.”

The query can be expressed by an OWL restriction as shown below. Using ontology subsumption, allValuesFrom and transitive reasoning, instances of this defined OWL restriction can be retrieved automatically.
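A simplified Python sketch of how the retailer's query could be answered, assuming a hypothetical class hierarchy (`:TFTDisplay` is a subclass of `:LCDDisplay`), an illustrative `successorOf` chain back to VGA, and made-up products. It checks subsumption, the bus-interface value, and an allValuesFrom-style condition over the transitive `successorOf` property. Unlike an OWL reasoner, it treats the listed standards as complete (a closed-world simplification); as with allValuesFrom, a product with no listed standard would satisfy the last condition vacuously.

```python
# Hypothetical product ontology and instance data.
SUBCLASS_OF = {":TFTDisplay": ":LCDDisplay", ":LCDDisplay": ":Display"}
SUCCESSOR_OF = {(":SVGA", ":VGA"), (":XGA", ":SVGA"), (":SXGA", ":XGA")}
PRODUCTS = {
    "prod:A": {"type": ":TFTDisplay", "bus": ":LVDS", "standards": {":XGA", ":SXGA"}},
    "prod:B": {"type": ":LCDDisplay", "bus": ":LVDS", "standards": {":VGA"}},
    "prod:C": {"type": ":LCDDisplay", "bus": ":DVI", "standards": {":SXGA"}},
    "prod:D": {"type": ":Display", "bus": ":LVDS", "standards": {":XGA"}},
}


def is_subclass(cls, sup):
    """Subsumption: follow subClassOf links up the hierarchy."""
    while cls is not None:
        if cls == sup:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def successor_of(a, b):
    """True if a reaches b through one or more successorOf links (transitivity)."""
    frontier, seen = [a], set()
    while frontier:
        x = frontier.pop()
        for s, o in SUCCESSOR_OF:
            if s == x and o not in seen:
                if o == b:
                    return True
                seen.add(o)
                frontier.append(o)
    return False


def matches(p):
    """LCD display, LVDS bus, and every listed standard is a successor of VGA."""
    return (is_subclass(p["type"], ":LCDDisplay")
            and p["bus"] == ":LVDS"
            and all(successor_of(s, ":VGA") for s in p["standards"]))


result = sorted(pid for pid, p in PRODUCTS.items() if matches(p))
print(result)  # ['prod:A']
```

`prod:A` is never asserted to be an LCD display or to relate directly to VGA; both facts come from subsumption and the transitive `successorOf` chain.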



Summary
SMDM technology makes it easy for clients to query and analyze the master data in existing MDM systems. The SMDM system can be generalized into a platform for accessing relational data through semantic SPARQL queries, and can be widely used by applications that need automatic reasoning over existing data. In the near future, we hope that end users will be able to make ad hoc queries without help from developers.

Related Publications  

Jean-Sébastien Brunner, Li Ma, Chen Wang, Lei Zhang, Daniel C. Wolfson, Yue Pan and Kavitha Srinivas. Explorations in the use of semantic Web technologies for product information management. 16th International World Wide Web Conference (WWW2007). 2007.

Julian Dolby, Achille Fokoue, Aditya Kalyanpur, Aaron Kershenbaum, Li Ma, Edith Schonberg and Kavitha Srinivas. Scalable semantic retrieval through summarization and refinement. 21st Conference on Artificial Intelligence (AAAI). 2007.

M. K. Smith, C. Welty and D. L. McGuinness. OWL web ontology language guide. W3C recommendation. 2004.

Robert Lu, Feng Cao, Li Ma, Yong Yu and Yue Pan. An effective SPARQL support over relational databases. VLDB2007 Joint ODBIS & SWDB workshop on Semantic Web, Ontologies, Databases. 2007.

Jing Lu, Li Ma, Lei Zhang, Jean-Sébastien Brunner, Chen Wang, Yue Pan and Yong Yu. SOR: A practical system for OWL ontology storage, reasoning and search. 33rd International Conference on Very Large Data Bases (VLDB). 2007.

Li Ma, Jing Mei, Yue Pan, Krishna Kulkarni, Achille Fokoue and Anand Ranganathan. Semantic Web technologies and data management. W3C workshop on RDF access to relational databases.

Li Ma, Zh. Su, Y. Pan, L. Zhang and T. Liu. RStar: An RDF storage and query system for enterprise resource management. Proc. of 13th ACM International Conference on Information and Knowledge Management (CIKM). 2004.

E. Prud’hommeaux and A. Seaborne. SPARQL query language for RDF. W3C candidate recommendation. 2006.

Lei Zhang, Qiaoling Liu, Jie Zhang, Haofen Wang, Yue Pan and Yong Yu. Semplore: An IR approach to scalable hybrid query of semantic Web data. Proc. of the 6th International Semantic Web Conference (ISWC 2007). November 2007.

Related links
IBM Master Data Management.

The D2RQ Platform v0.5.1 - Treating non-RDF relational databases as virtual RDF graphs.

Michael A. Covington, Donald Nute and Andre Vellino. Prolog Programming in Depth. 1996. ISBN 0-13-138645-X.

Last updated January 15, 2008

Innovator's corner  

Li Ma, Researcher

What's the potential for the work you are doing?
We are trying to apply Semantic Web technologies to master data management. Doing so is a critical step toward synchronizing core business objects among various enterprise applications and realizing the promise of Service-Oriented Architecture (SOA). Through semantic modeling and reasoning, our work provides classification services and relationship query and discovery services for master data on top of current MDM solutions. We are also helping to implement general semantic information integration based on RDF access to relational databases.

What is the most interesting part of your research?
The Semantic Web provides formal methods to capture and use the semantics of business entities. In addition, Master Data Management addresses an important problem within the enterprise, as key business objects increasingly must be shared and synchronized across various applications. We are very interested in investigating these relatively new and challenging areas so that we can solve real customer problems.

Who or what inspired you to go into this field?
Applying new technologies to real-world problems inspired us to begin the SMDM project. We want to thank Dan Wolfson from IBM Software Group for his support and encouragement of the team.

What is your favorite invention of all time?
The World Wide Web (WWW). It is one of the most exciting and important inventions in modern society. It has created a virtual, open world that enables people to learn and collaborate.