Andreas Blumauer

From Taxonomies over Ontologies to Knowledge Graphs

With the rise of linked data and the semantic web, concepts and terms like ‘ontology’, ‘vocabulary’, ‘thesaurus’ or ‘taxonomy’ are frequently picked up by information managers, search engine specialists and data engineers to describe ‘knowledge models’ in general. In many cases the terms are used without any specific meaning, which leads many people to the basic question:

What are the differences between a taxonomy, a thesaurus, an ontology and a knowledge graph?

This article aims to shed light on this discussion by guiding you through an example that starts with a taxonomy, introduces an ontology and finally exposes a knowledge graph (linked data graph) to be used as the basis for semantic applications.

1. Taxonomies and thesauri

Taxonomies and thesauri are closely related species of controlled vocabularies used to describe relations between concepts and their labels, including synonyms, most often in various languages. Such structures can be used as a basis for domain-specific entity extraction or text categorization services. Here is an example of a taxonomy about the Apollo programme, created with PoolParty Thesaurus Server:

[Figure: Apollo programme taxonomy]

The nodes of a taxonomy represent various types of ‘things’ (so-called ‘resources’): the topmost level (orange) is the root node of the taxonomy, purple nodes are so-called ‘concept schemes’, followed by ‘top concepts’ (dark green) and ordinary ‘concepts’ (light green).

In 2009 the W3C introduced the Simple Knowledge Organization System (SKOS) as a standard for creating and publishing taxonomies and thesauri. The SKOS ontology comprises only a few classes and properties. The most important types of resources are Concept, ConceptScheme and Collection. Hierarchical relations between concepts are expressed with ‘broader’ and its inverse ‘narrower’. Thesauri most often also cover non-hierarchical relations between concepts, such as the symmetric property ‘related’. Every concept has at least one ‘preferred label’ and can have numerous synonyms (‘alternative labels’).

Whereas a taxonomy could be envisaged as a tree, thesauri most often have polyhierarchies: a concept can be the child node of more than one node. Because it includes polyhierarchical as well as non-hierarchical relations between concepts, a thesaurus is better envisaged as a network (graph) of nodes than as a simple tree.
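To make these SKOS building blocks more tangible, here is a minimal sketch in plain Python (deliberately without an RDF library) that stores a few concepts from the Apollo example as subject-predicate-object triples. Only the SKOS namespace and property names come from the standard; the ex: concept IRIs and labels are hypothetical, and this is not how PoolParty stores data. The helpers derive ‘narrower’ from ‘broader’, make ‘related’ symmetric and flag concepts that sit in a polyhierarchy.

```python
from collections import defaultdict

# SKOS core namespace (W3C SKOS Reference)
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical namespace for the Apollo example concepts
EX = "http://example.org/apollo/"

# A tiny SKOS vocabulary as (subject, predicate, object) triples
TRIPLES = {
    (EX + "ApolloMissions", SKOS + "prefLabel", "Apollo missions"),
    (EX + "LunarLandings", SKOS + "prefLabel", "Lunar landings"),
    (EX + "Apollo11", SKOS + "prefLabel", "Apollo 11"),
    (EX + "Apollo11", SKOS + "altLabel", "Apollo XI"),
    (EX + "Apollo13", SKOS + "prefLabel", "Apollo 13"),
    (EX + "NeilArmstrong", SKOS + "prefLabel", "Neil Armstrong"),
    # Apollo 11 has two broader concepts, i.e. a polyhierarchy
    (EX + "Apollo11", SKOS + "broader", EX + "ApolloMissions"),
    (EX + "Apollo11", SKOS + "broader", EX + "LunarLandings"),
    (EX + "Apollo13", SKOS + "broader", EX + "ApolloMissions"),
    # Non-hierarchical, unspecific relation
    (EX + "NeilArmstrong", SKOS + "related", EX + "Apollo11"),
}


def narrower_index(triples):
    """Derive skos:narrower by inverting every skos:broader triple."""
    narrower = defaultdict(set)
    for s, p, o in triples:
        if p == SKOS + "broader":
            narrower[o].add(s)
    return narrower


def related_closure(triples):
    """skos:related is symmetric, so emit both directions of each pair."""
    pairs = set()
    for s, p, o in triples:
        if p == SKOS + "related":
            pairs.add((s, o))
            pairs.add((o, s))
    return pairs


def polyhierarchical(triples):
    """Concepts with more than one broader concept (not a simple tree)."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SKOS + "broader":
            parents[s].add(o)
    return {c for c, ps in parents.items() if len(ps) > 1}


def labels(triples, concept):
    """Return (preferred labels, alternative labels) of a concept."""
    pref = {o for s, p, o in triples if s == concept and p == SKOS + "prefLabel"}
    alt = {o for s, p, o in triples if s == concept and p == SKOS + "altLabel"}
    return pref, alt
```

A real system would typically keep such data in an RDF store; plain sets are used here only to keep the graph structure visible.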

2. Ontologies

Ontologies are perceived as complex, in contrast to the rather simple taxonomies and thesauri. The limitations of taxonomies and SKOS-based vocabularies in general become obvious as soon as one tries to describe a specific relation between two concepts: ‘Neil Armstrong’ is not only unspecifically ‘related’ to ‘Apollo 11’, he was ‘commander of’ this particular Apollo mission. Therefore we have to extend the SKOS ontology by two classes (‘Astronaut’ and ‘Mission’) and the property ‘commander of’, which is the inverse of ‘commanded by’.

[Figure: Apollo ontology relations]

The SKOS concept with the preferred label ‘Buzz Aldrin’ has to be classified as an ‘Astronaut’ in order to be described by specific relations and attributes like ‘is lunar module pilot of’ or ‘birthDate’. Introducing additional ontologies to expand the expressivity of SKOS-based vocabularies follows the ‘pay-as-you-go’ strategy of the linked data community. The PoolParty knowledge modelling approach suggests starting with SKOS and then extending this simple knowledge model with other knowledge graphs, ontologies, annotated documents and legacy data. This paradigm can be memorized as the rule ‘Start SKOS, grow big’.
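The ontology extension described above can be sketched in the same triple style. This is an illustration under assumptions, not PoolParty's schema: the ex: classes and properties (Astronaut, Mission, commanderOf, commandedBy, lunarModulePilotOf) are hypothetical IRIs, while rdf:type and owl:inverseOf are the standard W3C terms. The first helper materializes inverse triples; the second is a simple closed-world check that every resource using an astronaut-specific property has actually been classified as an Astronaut. Note that OWL reasoning would instead infer the type from a domain declaration rather than flag it.

```python
# Standard W3C vocabulary IRIs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_INVERSE_OF = "http://www.w3.org/2002/07/owl#inverseOf"
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical namespace for the Apollo ontology
EX = "http://example.org/apollo/"

# Schema: 'commander of' is the inverse of 'commanded by'
SCHEMA = {
    (EX + "commanderOf", OWL_INVERSE_OF, EX + "commandedBy"),
}

# SKOS concepts additionally typed with the new ontology classes
DATA = {
    (EX + "NeilArmstrong", RDF_TYPE, SKOS + "Concept"),
    (EX + "NeilArmstrong", RDF_TYPE, EX + "Astronaut"),
    (EX + "BuzzAldrin", RDF_TYPE, SKOS + "Concept"),
    (EX + "BuzzAldrin", RDF_TYPE, EX + "Astronaut"),
    (EX + "Apollo11", RDF_TYPE, SKOS + "Concept"),
    (EX + "Apollo11", RDF_TYPE, EX + "Mission"),
    (EX + "NeilArmstrong", EX + "commanderOf", EX + "Apollo11"),
    (EX + "BuzzAldrin", EX + "lunarModulePilotOf", EX + "Apollo11"),
}


def infer_inverses(schema, data):
    """Add the inverse triple for each property declared owl:inverseOf.

    owl:inverseOf works in both directions, so each pair is applied twice.
    """
    pairs = set()
    for p, rel, q in schema:
        if rel == OWL_INVERSE_OF:
            pairs.add((p, q))
            pairs.add((q, p))
    inferred = set(data)
    for s, p, o in data:
        for prop, inverse in pairs:
            if p == prop:
                inferred.add((o, inverse, s))
    return inferred


def missing_type(data, prop, cls):
    """Subjects that use `prop` but are not typed as `cls` (closed-world check)."""
    users = {s for s, p, o in data if p == prop}
    typed = {s for s, p, o in data if p == RDF_TYPE and o == cls}
    return users - typed
```

With these definitions, infer_inverses(SCHEMA, DATA) contains the triple stating that Apollo 11 was commanded by Neil Armstrong, even though only the ‘commander of’ direction was asserted.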

3. Knowledge Graphs

Knowledge graphs are all around (e.g. DBpedia, Freebase). Based on W3C’s Semantic Web standards, such graphs can be used to further enrich your SKOS knowledge models. In combination with an ontology, specific knowledge about a certain resource can be obtained with a simple SPARQL query. For example, the fact that Neil Armstrong was born on August 5th, 1930 can be retrieved from DBpedia. Watch this YouTube video, which demonstrates how ‘linked data harvesting’ works with PoolParty.
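Such a lookup can be sketched with nothing but the Python standard library. The endpoint URL, the dbo:birthDate property and the resource IRI reflect DBpedia's public SPARQL service as far as we know, but they are assumptions that may change; the request follows the SPARQL 1.1 Protocol (query sent as a GET parameter, JSON results requested via the Accept header). This is not the PoolParty harvesting mechanism shown in the video.

```python
import json
import urllib.parse
import urllib.request

# Assumed public DBpedia SPARQL endpoint
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def birth_date_query(resource: str) -> str:
    """Build a SPARQL SELECT for the dbo:birthDate of a DBpedia resource.

    A full IRI is used so resource names with special characters stay valid.
    """
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?birthDate WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource}> dbo:birthDate ?birthDate .\n"
        "}"
    )


def build_request(query: str) -> urllib.request.Request:
    """Encode the query as a GET request and ask for SPARQL JSON results."""
    url = DBPEDIA_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def fetch_birth_dates(resource: str) -> list:
    """Run the query (requires network access) and return the bound values."""
    request = build_request(birth_date_query(resource))
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response)
    return [row["birthDate"]["value"] for row in results["results"]["bindings"]]
```

Calling fetch_birth_dates("Neil_Armstrong") sends the query over the network; the exact value format returned depends on DBpedia's current data.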

Knowledge graphs can be envisaged as a network of all kinds of things that are relevant to a specific domain or organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets.

Why should I transform my content and data into a large knowledge graph?

The answer is simple: to be able to run complex queries over the entirety of all kinds of information. Once the data silos are broken up, query results are much more likely to be valid, since they no longer depend on a single isolated source.
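A toy illustration of why breaking up silos helps: below, a knowledge model and a set of document annotations are merged into one graph, so a single two-hop query can answer a question that neither source can answer alone. All IRIs under example.org and the document names are hypothetical; dct:subject is the real Dublin Core term, used here as an assumed way of tagging documents with concepts.

```python
# Hypothetical namespaces for concepts and documents
EX = "http://example.org/apollo/"
DOC = "http://example.org/docs/"
# Dublin Core Terms 'subject' property
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Silo 1: the knowledge model (ontology relations between concepts)
KNOWLEDGE_MODEL = {
    (EX + "NeilArmstrong", EX + "commanderOf", EX + "Apollo11"),
    (EX + "JimLovell", EX + "commanderOf", EX + "Apollo13"),
}

# Silo 2: document metadata, e.g. exported from a CMS and tagged with concepts
DOCUMENTS = {
    (DOC + "report-1", DCT_SUBJECT, EX + "Apollo11"),
    (DOC + "memo-7", DCT_SUBJECT, EX + "Apollo13"),
    (DOC + "memo-9", DCT_SUBJECT, EX + "Apollo11"),
}

# The integrated graph is simply the union of both triple sets
GRAPH = KNOWLEDGE_MODEL | DOCUMENTS


def objects(graph, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """All s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def documents_about_missions_commanded_by(graph, astronaut):
    """Join across both silos: astronaut -> missions -> tagged documents.

    Corresponds to the SPARQL pattern:
      <astronaut> ex:commanderOf ?mission . ?doc dct:subject ?mission .
    """
    result = set()
    for mission in objects(graph, astronaut, EX + "commanderOf"):
        result |= subjects(graph, DCT_SUBJECT, mission)
    return result
```

Against KNOWLEDGE_MODEL or DOCUMENTS alone, the same function would return nothing useful; only the merged graph connects the astronaut to the documents.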

With PoolParty Semantic Integrator, content and documents from SharePoint, Confluence, Drupal etc. can be transformed automatically and integrated into enterprise knowledge graphs.

Taxonomies, thesauri, ontologies, linked data graphs including enterprise content, and legacy data: all kinds of information can become part of an enterprise knowledge graph, which can be stored in a linked data warehouse. Based on technologies like Virtuoso, such data warehouses can serve as complex question-answering systems with excellent performance and scalability.

4. Conclusion

In the early days of the semantic web, we constantly discussed whether taxonomies, ontologies or linked data graphs would be part of the solution. Again and again, discussions like ‘Did the current data-driven world kill ontologies?’ are being led. My proposal is: try to combine all of them. Embrace every method that makes meaningful information out of data. Stop denouncing communities that don’t follow one aspect or another of the semantic web (e.g. reasoning or SKOS). Let’s put the pieces together – together!


Thomas Schandl

Which kind of controlled vocabularies matter?

Looking at intermediate results of the Controlled Vocabularies Survey, one interesting finding concerns the question of which types of knowledge models are currently best suited for actual use in applications.

So far 143 people whose organizations already make use of controlled vocabularies have answered the question “Which kind of controlled vocabulary do you use or plan to use in your applications?”.
The results so far show that lightweight models like taxonomies and thesauri are somewhat preferred over ontologies:

Taxonomies are the favorite: 73.6% of participants use or plan to use them, followed by thesauri (62%) and ontologies (61.2%), while simple glossaries lag considerably behind at 31.4%.

This survey will close in about a week, so please take this chance to make your opinion on this topic count! You can find the questions here; answering them takes 5-10 minutes.

All participants will gain access to a report with the results within the following month. The most interesting results will be made public on this blog.

Jana Herwig

GoodRelations webcast & spreading the word about the Semantic Web

You have probably already heard about GoodRelations, “the web ontology for e-commerce”. Martin Hepp from Bundeswehr University in Munich recently created a webcast giving a short introduction to semantic web-based e-commerce and to the GoodRelations vocabulary. I want to see more introductions like this that aim at a wider audience in terms of style and intellectual accessibility!

Last week I had an encounter with a social scientist (in an academic setting) who argued that discussing the Semantic Web would not make sense for him as a social scientist, because of the present lack of social practices in that field… (*jaw-dropping*) I could not persuade him with the argument that the Linked Data cloud itself was the result of a social practice. The view he had of the Semantic Web (which I assume was not an uneducated one) even led him to deny that developments like DBpedia, Twine, Revyu, or the use of metadata in general had anything to do with the Semantic Web.

And this is a big challenge.

On the one hand, it is a good thing that there are social scientists who at least have a certain notion of the Semantic Web; on the other, it seems as if all the exciting ideas and developments of the last few years have failed to reach those who were sensitized to the SemWeb project when the idea was first conceived. I don’t mean to make a statement about social scientists here, but rather about the need to communicate what has since happened to the original idea, also outside of one’s own community.

Btw: in its current issue, the quarterly German-language magazine t3n features Web 3.0 and the Applied Semantic Web as its lead topic. And that is a good sign, too!
