The Semantic Puzzle

Andreas Blumauer

Do you like Google’s Knowledge Graph?

Semantic Enterprise Search enters the second phase.

Finally, the Knowledge Graph has arrived in Europe: what Google has offered in the US market since May 2012 is now also available in most European countries. Search results are no longer just a list of documents (and advertisements) but also a mashup of facts, points of interest, events, etc. relating to the search phrase.

For example, if the user searches for ‘Wiener Philharmoniker’ (‘Vienna Philharmonic Orchestra’), a factbox including related searches is provided:

Do you like this rather new way of knowledge discovery? We do, except for the fact that Google hasn’t properly explained to its audience which technology is behind the Knowledge Graph: the Web of Linked Data, aka the Semantic Web (Do you want to know more about the relationship between the Knowledge Graph and Linked Data? Click here).

But anyway, here are some benefits we see when search technologies make use of a ‘knowledge graph’, a ‘knowledge model’, a ‘thesaurus’ or, generally speaking, Linked Data:

  • Facts about an object (or an entity) can be found nicely packed into a dossier
  • Serendipity can be stimulated by ‘related searches’, which means users can discover the formerly ‘unknown’ in a more comfortable way
  • Data from various sources can be pulled together into a mashup (e.g. ‘upcoming events’ could come from a different database than the basic facts about the Vienna Philharmonic Orchestra)
  • Search phrases are well understood by the engine since they are based on concepts and no longer on literals, e.g. if the user searches for ‘Red Bull Stratos’, results for ‘Felix Baumgartner’ will also be delivered
  • Search can be refined, e.g. if one searches for ‘Vienna’, a list of POIs is displayed to narrow down the actual place the user is looking for
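To make the ‘concepts instead of literals’ idea a bit more tangible, here is a minimal Python sketch. It is not how Google or any particular product works; the concept ids, the `related` links and the three sample documents are all made up for illustration. Documents are indexed by the concepts whose labels they mention, and a search phrase is resolved to concepts and expanded to related concepts before the lookup.

```python
from collections import defaultdict

# Hypothetical toy concept model: concept id -> labels and related concepts.
CONCEPTS = {
    "c:stratos": {"labels": ["Red Bull Stratos"], "related": ["c:baumgartner"]},
    "c:baumgartner": {"labels": ["Felix Baumgartner"], "related": ["c:stratos"]},
    "c:vienna": {"labels": ["Vienna", "Wien"], "related": []},
}

# Hypothetical sample documents (invented for this example).
DOCUMENTS = {
    "doc1": "Red Bull Stratos mission reaches the stratosphere",
    "doc2": "Felix Baumgartner lands safely in New Mexico",
    "doc3": "Concert season opens in Wien",
}


def find_concepts(text):
    """Return the ids of all concepts with a label occurring in the text."""
    lowered = text.lower()
    return {
        concept_id
        for concept_id, concept in CONCEPTS.items()
        for label in concept["labels"]
        if label.lower() in lowered
    }


def build_index(documents):
    """Index documents by concept id instead of by literal words."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for concept_id in find_concepts(text):
            index[concept_id].add(doc_id)
    return index


def search(phrase, index):
    """Resolve the phrase to concepts, add related concepts, collect documents."""
    direct = find_concepts(phrase)
    expanded = set(direct)
    for concept_id in direct:
        expanded.update(CONCEPTS[concept_id]["related"])
    hits = set()
    for concept_id in expanded:
        hits |= index.get(concept_id, set())
    return sorted(hits)


INDEX = build_index(DOCUMENTS)
print(search("Red Bull Stratos", INDEX))  # ['doc1', 'doc2']
print(search("Vienna", INDEX))            # ['doc3']
```

Note that ‘doc2’ never mentions ‘Red Bull Stratos’ and ‘doc3’ never mentions ‘Vienna’; both are found only because the lookup runs over concepts. Substring matching of labels is a deliberate simplification here; a real engine would use proper entity extraction.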

Now imagine you had a search engine on your company’s intranet, based on a knowledge graph about the enterprise you work for.

Such an advanced search application would look like this:

  • Data streams and all kinds of content from internal sources are nicely mashed up with information from the web (e.g. from Twitter, YouTube etc.)
  • Search assistants are provided to help users refine their information needs and make them more specific
  • Entities and their sub-concepts (e.g. subsidiaries of large companies or regions of countries) are nicely packed together into one dossier

The key question now is: how do you set up a customised knowledge graph for a particular company?

Corporate Semantic Web based applications can be realised on top of software platforms like PoolParty. They all have a customised knowledge graph at their core. This is always the basis for concept-based indexing of specialised content from a corporate intranet. The basic standard for this is SKOS, which can be used together with advanced query languages like SPARQL. Such graphs can be used for semantic indexing but also to ask for relations like ‘is point-of-interest in’, ‘is event of’, ‘is related search for’ etc. This is next-generation semantic search, which helps decision-makers, information professionals and all kinds of knowledge workers to improve their work significantly.
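As a rough illustration of such relation queries, the following Python sketch stores a handful of statements as subject-predicate-object triples and matches them against a pattern, loosely in the spirit of a single SPARQL triple pattern. It is a plain in-memory toy, not PoolParty’s API and not a SPARQL engine. The `skos:` property names are taken from the SKOS vocabulary, whereas the `ex:` identifiers and the relations `ex:isPointOfInterestIn` and `ex:isEventOf` are hypothetical names invented for this example.

```python
# Toy triple set. skos:* properties come from the SKOS vocabulary;
# everything prefixed ex: is a hypothetical name made up for this sketch.
TRIPLES = {
    ("ex:Vienna", "skos:prefLabel", "Vienna"),
    ("ex:Vienna", "skos:altLabel", "Wien"),
    ("ex:ViennaPhilharmonic", "skos:prefLabel", "Vienna Philharmonic Orchestra"),
    ("ex:ViennaPhilharmonic", "skos:altLabel", "Wiener Philharmoniker"),
    ("ex:ViennaPhilharmonic", "skos:related", "ex:Vienna"),
    ("ex:Musikverein", "skos:prefLabel", "Musikverein"),
    ("ex:Musikverein", "ex:isPointOfInterestIn", "ex:Vienna"),
    ("ex:NewYearsConcert", "skos:prefLabel", "New Year's Concert"),
    ("ex:NewYearsConcert", "ex:isEventOf", "ex:ViennaPhilharmonic"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def label(concept):
    """Return the preferred label of a concept, or its id if it has none."""
    found = match(s=concept, p="skos:prefLabel")
    return found[0][2] if found else concept


def resolve(text):
    """Map a search phrase to a concept via preferred or alternative labels."""
    for s, p, o in match():
        if p in ("skos:prefLabel", "skos:altLabel") and o.lower() == text.lower():
            return s
    return None


def subjects_of(predicate, obj):
    """Labels of all concepts that stand in the given relation to obj."""
    return [label(s) for s, _, _ in match(p=predicate, o=obj)]


orchestra = resolve("Wiener Philharmoniker")
print(orchestra)                                   # ex:ViennaPhilharmonic
print(subjects_of("ex:isEventOf", orchestra))      # ["New Year's Concert"]
print(subjects_of("ex:isPointOfInterestIn", resolve("wien")))  # ['Musikverein']
```

The same pattern, with a different predicate, answers ‘is event of’ and ‘is point-of-interest in’ alike, which is what makes a graph convenient as the basis for both indexing and dossier-style result pages.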
One convenient way to create customised knowledge graphs is to make use of Linked Data sources like Freebase (as Google does) or DBpedia. More details wanted? Take a look at the PoolParty approach for efficient knowledge modeling.