Andreas Blumauer

From Taxonomies over Ontologies to Knowledge Graphs

With the rise of linked data and the semantic web, concepts and terms like ‘ontology’, ‘vocabulary’, ‘thesaurus’ or ‘taxonomy’ are being picked up frequently by information managers, search engine specialists or data engineers to describe ‘knowledge models’ in general. In many cases the terms are used without any specific meaning which brings a lot of people to the basic question:

What are the differences between a taxonomy, a thesaurus, an ontology and a knowledge graph?

This article should bring light into this discussion by guiding you through an example which starts off from a taxonomy, introduces an ontology and finally exposes a knowledge graph (linked data graph) to be used as the basis for semantic applications.

1. Taxonomies and thesauri

Taxonomies and thesauri are closely related species of controlled vocabularies to describe relations between concepts and their labels including synonyms, most often in various languages. Such structures can be used as a basis for domain-specific entity extraction or text categorization services. Here is an example of a taxonomy created with PoolParty Thesaurus Server which is about the Apollo programme:

Apollo programme taxonomyThe nodes of a taxonomy represent various types of ‘things’ (so called ‘resources’): The topmost level (orange) is the root node of the taxonomy, purple nodes are so called ‘concept schemes’ followed by ‘top concepts’ (dark green) and ordinary ‘concepts’ (light green). In 2009 W3C introduced the Simple Knowledge Organization System (SKOS) as a standard for the creation and publication of taxonomies and thesauri. The SKOS ontology comprises only a few classes and properties. The most important types of resources are: Concept, ConceptScheme and Collection. Hierarchical relations between concepts are ‘broader’ and its inverse ‘narrower’. Thesauri most often cover also non-hierarchical relations between concepts like the symmetric property ‘related’. Every concept has at least on ‘preferred label’ and can have numerous synonyms (‘alternative labels’). Whereas a taxonomy could be envisaged as a tree, thesauri most often have polyhierarchies: a concept can be the child-node of more than one node. A thesaurus should be envisaged rather as a network (graph) of nodes than a simple tree by including polyhierarchical and also non-hierarchical relations between concepts.

2. Ontologies

Ontologies are perceived as being complex in contrast to the rather simple taxonomies and thesauri. Limitations of taxonomies and SKOS-based vocabularies in general become obvious as soon as one tries to describe a specific relation between two concepts: ‘Neil Armstrong’ is not only unspecifically ‘related’ to ‘Apollo 11′, he was ‘commander of’ this certain Apollo mission. Therefore we have to extend the SKOS ontology by two classes (‘Astronaut’ and ‘Mission’) and the property ‘commander of’ which is the inverse of ‘commanded by’.

Apollo ontology relationsThe SKOS concept with the preferred label ‘Buzz Aldrin’ has to be classified as an ‘Astronaut’ in order to be described by specific relations and attributes like ‘is lunar module pilot of’ or ‘birthDate’. The introduction of additional ontologies in order to expand expressivity of SKOS-based vocabularies is following the ‘pay-as-you-go’ strategy of the linked data community. The PoolParty knowledge modelling approach suggests to start first with SKOS to further extend this simple knowledge model by other knowledge graphs, ontologies and annotated documents and legacy data. This paradigm could be memorized by a rule named ‘Start SKOS, grow big’.

3. Knowledge Graphs

Knowledge graphs are all around (e.g. DBpedia, Freebase, etc.). Based on W3C’s Semantic Web Standards such graphs can be used to further enrich your SKOS knowledge models. In combination with an ontology, specific knowledge about a certain resource can be obtained with a simple SPARQL query. As an example, the fact that Neil Armstrong was born on August 5th, 1930 can be retrieved from DBpedia. Watch this YouTube video which demonstrates how ‘linked data harvesting’ works with PoolParty.

Knowledge graphs could be envisaged as a network of all kind things which are relevant to a specific domain or to an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets.

Why should I transform my content and data into a large knowledge graph?

The answer is simple: to being able to make complex queries over the entirety of all kind of information. By breaking up the data silos there is a high probability that query results become more valid.

With PoolParty Semantic Integrator, content and documents from SharePoint, Confluence, Drupal etc. can be tranformed automatically to integrate them into enterprise knowledge graphs.

Taxonomies, thesauri, ontologies, linked data graphs including enterprise content and legacy data – all kind of information could become part of an enterprise knowledge graph which can be stored in a linked data warehouse. Based on technologies like Virtuoso, such data warehouses have the ability to serve as a complex question answering system with excellent performance and scalability.

4. Conclusion

In the early days of the semantic web, we’ve constantly discussed whether taxonomies, ontologies or linked data graphs will be part of the solution. Again and again discussions like ‘Did the current data-driven world kill ontologies?‘ are being lead. My proposal is: try to combine all of those. Embrace every method which makes meaningful information out of data. Stop to denounce communities which don’t follow the one or the other aspect of the semantic web (e.g. reasoning or SKOS). Let’s put the pieces together – together!

 

Thomas Thurner

Energy Buildings Performance Scenarios as Linked Open Data

The reduction of green house gas emissions is one of the big global challenges for the next decades. (Linked) Open Data on this multi-domain challenge is key for addressing the issues in policy, construction, energy efficiency, production a like. Today – on the World Environment Day 2014 – a new (linked open) data initiative contributes to this effort: GBPN’s Data Endpoint for Building Energy Performance Scenarios.

gbpn-scenariosGBPN (The Global Buildings Performance Network) provides the full data set on a recently made global scenario analysis for saving energy in the building sector worldwide, projected from 2005 to 2050. The multidimensional dataset includes parameters like housing types, building vintages and energy uses  – for various climate zones and regions and is freely available for full use and re-use as open data under CC-BY 3.0 France license.

To explore this easily, the Semantic Web Company has developed an interactive query / filtering tool which allows to create graphs and tables in slicing this multidimensional data cube. Chosen results can be exported as open data in the open formats: RDF and CSV and also queried via a provided SPARQL endpoint (a semantic web based data API). A built-in query-builder makes the use as well as the learning and understanding of SPARQL easy – for advanced users as well as also for non-experts or beginners.

gbn-filter

The LOD based information- & data system is part of Semantic Web Companies’ recent Poolparty Semantic Drupal developments and is based on OpenLinks Virtuoso 7 QuadStore holding and calculating ~235 million triples as well as it makes use of the RDF ETL Tool: UnifiedViews as well as D2R Server for RDF conversion. The underlying GBPN ontology runs on PoolParty 4.2 and serves also a powerful domain-specific news aggregator realized with SWC’s sOnr webminer.

reegle.info-trusted-linksTogether with other Energy Efficiency related Linked Open Data Initiatives like REEEP, NREL, BPIE and others, GBPNs recent initative is a contribution towards a broader availability of data supporting action agains global warming – as also Dr. Peter Graham, Executive Director of GBPN emphasized “…data and modelling of building energy use has long been difficult or expensive to access – yet it is critical to policy development and investment in low-energy buildings. With the release of the BEPS open data model, GBPN are providing free access to the world’s best aggregated data analyses on building energy performance.”

The Linked Open Data (LOD) is modelled using the RDF Data Cube Vocabulary (that is a W3C recommendation) including 17 dimensions in the cube. In total there are 235 million triples available in RDF including links to DBpedia and Geonames – linking the indicators: years – climate zones – regions and building types as well as user scenarios….

Enhanced by Zemanta
Thomas Thurner

American Physical Society Taxonomy – Case Study

image_jb

Joseph A Busch

Taxonomy Strategies has been working with the American Physical Society (APS) to develop a new faceted classification scheme.

The proposed scheme includes several discrete sets of categories called facets whose values can be combined to express concepts such as existing Physics and Astronomy Classification Scheme (PACS) codes, as well as new concepts that have not yet emerged, or have been difficult to express with the existing PACS.

PACS codes formed a single-hierarchy classification scheme, designed to assign the “one best” category that an item will be classified under. Classification schemes come from the need to physically locate objects in one dimension, for example in a library where a book will be shelved in one and only one location, among an ordered set of other books. Traditional journal tables of contents similarly place each article in a given issue in a specific location among an ordered set of other articles, certainly a necessary constraint with paper journals and still useful online as a comfortable and familiar context for readers.

However, the real world of concepts is multi-dimensional. In collapsing to one dimension, a classification scheme makes essentially arbitrary choices that have the effect of placing some related items close together while leaving other related items in very distant bins. It also has the effect of repeating the terms associated with the last dimension in many different contexts, leading to an appearance of significant redundancy and complexity in locating terms.

A faceted taxonomy attempts to identify each stand-alone concept through the term or terms commonly associated with it, and have it mean the same thing whenever used. Hierarchy in a taxonomy is useful to group related terms together; however the intention is not to attempt to identify an item such as an article or book by a single concept, but rather to assign multiple concepts to represent the meaning. In that way, related items can be closely associated along multiple dimensions corresponding to each assigned concept. Where previously a single PACS code was used to indicate the research area, now two, three, or more of the new concepts may be needed (although often a single new concept will be sufficient). This requires a different mindset and approach in applying the new taxonomy to the way APS has been accustomed to working with PACS; however it also enables significant new capabilities for publishing and working with all types of content including articles, papers and websites.

To build and maintain the faceted taxonomy, APS has acquired the PoolParty taxonomy management tool. PoolParty will enable APS editorial staff to create, retrieve, update and delete taxonomy term records. The tool will support the various thesaurus, knowledge organization system and ontology standards for concepts, relationships, alternate terms etc. It will also provide methods for:

  • Associating taxonomy terms with content items, and storing that association in a content index record.
  • Automated indexing to suggest taxonomy terms that should be associated with content items, and text mining to suggest terms to potentially be added to the taxonomy.
  • Integrating taxonomy term look-up, browse and navigation in a selection user interface that, for example, authors and the general public could use.
  • Implementing a feedback user interface allowing authors and the general public to suggest terms, record the source of the suggestion, and inform the user on the disposition of their suggestion.

Arthur Smith, project manager for the new APS taxonomy notes “PoolParty allows our subject matter experts to immediately visualize the layout of the taxonomy, to add new concepts, suggest alternatives, and to map out the relationships and mappings to other concept schemes that we need. While our project is still in an early stage, the software tool is already proving very useful.”

About

Taxonomy Strategies (www.taxonomystrategies.com) is an information management consultancy that specializes in applying taxonomies, metadata, automatic classification, and other information retrieval technologies to the needs of business and other organizations.

The American Physical Society (www.aps.org) is a non-profit membership organization working to advance and diffuse the knowledge of physics through its outstanding research journals, scientific meetings, and education, outreach, advocacy and international activities. APS represents over 50,000 members, including physicists in academia, national laboratories and industry in the United States and throughout the world. Society offices are located in College Park, MD (Headquarters), Ridge, NY, and Washington, DC.

Enhanced by Zemanta
Christian Mader

Online checker for SKOS vocabularies now available

Create better SKOS vocabularies
quality_check_finishedPoolParty team likes to announce the availability of the new online vocabulary quality checker for SKOS vocabularies. It finds over 20 kinds of potential quality problems in controlled vocabularies that are expressed using SKOS. The service is based on the qSKOS open-source tool.
The main features of the service are:
  • No need for registration – you can log in with your existing accounts at Google, Xing, LinkedIn or Twitter
  • Upload and check as many vocabularies you like (100MB maximum size for each vocabulary)
  • Access reports of the quality checks for all uploaded vocabularies
  • Quality reports can also be sent to you by email, if you wish
PoolParty team asks for feedback and suggestions on the service!

Either contact support@poolparty.biz or fill in our feedback form!