Do you like Google’s Knowledge Graph?

Semantic Enterprise Search enters the second phase.

Finally the Knowledge Graph has arrived in Europe: what has been available in the US market since May 2012 is now also available in most European countries. Search results are no longer just a list of documents (and advertisements) but also a mashup of facts, points of interest, events etc. relating to the search phrase.

For example, if the user is searching for ‘Wiener Philharmoniker’ (‘Vienna Philharmonic Orchestra’) a factbox including related searches is provided:

Do you like this rather new way of knowledge discovery? We do, except for the fact that Google hasn’t properly explained to its audience which technology is behind the Knowledge Graph: the Web of Linked Data, aka the Semantic Web. (Do you want to know more about the relationship between the Knowledge Graph and Linked Data? Click here.)

But anyway, here are some benefits we can see when search technologies make use of a ‘knowledge graph’, a ‘knowledge model’, a ‘thesaurus’ or, generally speaking, Linked Data:

  • Facts about an object (or an entity) can be found neatly packed into a dossier
  • Serendipity can be stimulated by ‘related searches’, which means users can discover the formerly ‘unknown’ more comfortably
  • Data from various sources can be pulled together into a mashup (e.g. ‘upcoming events’ could come from a different database than the basic facts about the Vienna Philharmonic Orchestra)
  • Search phrases are well understood by the engine since they are based on concepts and no longer on literals, e.g. if the user searches for ‘Red Bull Stratos’, results for ‘Felix Baumgartner’ will also be delivered
  • Search can be refined, e.g. if one searches for ‘Vienna’, a list of POIs will be displayed to narrow down the actual place the user is looking for
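
The concept-based matching described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny concept table, the document texts and the function names are all hypothetical, and a real engine would use a proper index and a much richer graph.

```python
from typing import Dict, List, Set

# Hypothetical mini knowledge graph: concept id -> labels and related concepts.
CONCEPTS: Dict[str, dict] = {
    "red_bull_stratos": {
        "labels": ["Red Bull Stratos"],
        "related": ["felix_baumgartner"],
    },
    "felix_baumgartner": {
        "labels": ["Felix Baumgartner"],
        "related": ["red_bull_stratos"],
    },
}

# Hypothetical document collection (id -> text).
DOCUMENTS: Dict[str, str] = {
    "doc1": "Red Bull Stratos mission reaches the stratosphere",
    "doc2": "Felix Baumgartner prepares for his record jump",
    "doc3": "Vienna Philharmonic announces summer concert",
}


def find_concepts(query: str) -> List[str]:
    """Return ids of concepts with a label that occurs in the query."""
    q = query.lower()
    return [
        cid
        for cid, concept in CONCEPTS.items()
        if any(label.lower() in q for label in concept["labels"])
    ]


def expand_query(query: str) -> Set[str]:
    """Collect labels of the matched concepts and of their related concepts."""
    terms: Set[str] = set()
    for cid in find_concepts(query):
        for target in [cid] + CONCEPTS[cid]["related"]:
            terms.update(CONCEPTS[target]["labels"])
    return terms


def search(query: str) -> List[str]:
    """Match documents against the expanded terms; fall back to the literal query."""
    terms = {t.lower() for t in expand_query(query)} or {query.lower()}
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(term in text.lower() for term in terms)
    )


print(search("Red Bull Stratos"))
```

Because the query is resolved to a concept first, the document about Felix Baumgartner is found although it never mentions the search phrase literally.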

Now imagine you had a search engine in your company’s intranet, based on a knowledge graph about the enterprise you are working for.

Such an advanced search application would look like this:

  • Data streams and all kinds of content from internal sources are neatly mashed up with information from the web (e.g. from Twitter, YouTube etc.)
  • Search assistants help users refine their information needs and make them more specific
  • Entities and their sub-concepts (e.g. subsidiaries of large companies or regions of countries) are neatly packed together into one dossier

The key question now is: how do you set up a customised knowledge graph for a particular company?

Corporate Semantic Web applications can be built on top of software platforms like PoolParty. They all have a customised knowledge graph at their core, which is the basis for concept-based indexing of specialised content from a corporate intranet. The underlying standard is SKOS, which can be used together with advanced query languages like SPARQL. Such graphs can be used for semantic indexing but also to query relations like ‘is point-of-interest in’, ‘is event of’, ‘is related search for’ etc. This is next-generation semantic search, which helps decision-makers, information professionals and all kinds of knowledge workers to improve their work significantly.
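
To make the idea of asking a graph for relations more concrete, here is a minimal in-memory sketch in Python. It is not how PoolParty or a SPARQL endpoint works internally; the `ex:` identifiers and property names are hypothetical stand-ins for the relations named above, and the pattern matching only mimics a single SPARQL triple pattern such as `SELECT ?poi WHERE { ?poi ex:isPointOfInterestIn ex:Vienna }`.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative triples; the ex: identifiers and property names are hypothetical.
GRAPH: Set[Triple] = {
    ("ex:Musikverein", "ex:isPointOfInterestIn", "ex:Vienna"),
    ("ex:StateOpera", "ex:isPointOfInterestIn", "ex:Vienna"),
    ("ex:NewYearsConcert", "ex:isEventOf", "ex:ViennaPhilharmonic"),
    ("ex:ViennaPhilharmonic", "skos:prefLabel", "Wiener Philharmoniker"),
    ("ex:ViennaPhilharmonic", "skos:altLabel", "Vienna Philharmonic Orchestra"),
}


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield all triples that fit the pattern; None acts as a wildcard."""
    for triple in GRAPH:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def subjects(p: str, o: str) -> List[str]:
    """Return, sorted, every subject that has relation p to object o."""
    return sorted(triple[0] for triple in match(p=p, o=o))


# Which points of interest are in Vienna?
print(subjects("ex:isPointOfInterestIn", "ex:Vienna"))
# Which events belong to the Vienna Philharmonic?
print(subjects("ex:isEventOf", "ex:ViennaPhilharmonic"))
```

The same graph serves both purposes mentioned above: the SKOS labels support concept-based indexing, while the typed relations answer questions a plain keyword index cannot.
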
One comfortable way to create customised knowledge graphs is to make use of Linked Data sources like Freebase (like Google does) or DBpedia. More details wanted? Take a look at the PoolParty approach for efficient knowledge modeling.

Semantic Web Company’s Florian Kondert starts his mission in San Francisco

Semantic Web Company’s Florian Kondert is taking part in the technology initiative “GoSiliconValley”, which is part of the go-international export initiative run by the Austrian Federal Economic Chamber and the Federal Ministry of Economy, Family and Youth. It allows Austrian IT companies to complete a 3-month program at the business accelerator “Plug and Play Tech Center” in Sunnyvale to kick-start their businesses. Florian Kondert has just arrived there and told us about his first impressions.


Florian Kondert: The Valley is famous for innovations in high tech – for a good reason. Professionals from varying domains share their expertise and discuss opportunities to fix a problem faster and more intensely than anywhere else in the world. From my point of view, semantic technologies are considered a very normal approach to information-intense issues. But – here you don’t talk about how the technology works in detail – you focus on living examples, business use cases and the benefits you could gain by using semantic technologies. Long story short: “if you can help me to solve my problem, I don’t care what it is.”

Semantic Puzzle: Somehow it seems that leading US companies in the field brew their own kind of Semantic Web. Google’s Knowledge Graph and Facebook’s Social Graph are the Semantic Web re-brushed and renamed. Is this some kind of adoption of the technology by the US market? And as a European visitor, how do you see this uncooperative gesture towards the Semantic Web (which has a lot of roots on the old continent)?

Florian Kondert: People understand “knowledge” and people have an idea about “social”, but how many people know what’s behind “semantic”? Here it can happen that you don’t use the word “semantic” even once during a whole discussion. From a visitor’s perspective I have the impression that communicators here work really hard to make 120% sure that people understand what you are talking about. Thus, you use your opposite number’s vocabulary. Another reason is again the speed of interaction. You can’t get stuck in definitions or explanations. So you “simply” get people to understand the benefit, like “it is more social” or it “helps you to enlarge your knowledge”.

Semantic Puzzle: In Europe, Linked Data still attracts only larger companies and institutions. SMEs and smaller organizations do not invest in this technological change. Is this the same in the US, or is linked data a broader issue there?

Florian Kondert: Based on the discussions I’ve had until now, linked (open) data is a new thing for most people, even here in Silicon Valley. But SME guys are totally positive towards the idea of using freely available, structured data. Why? Because competition here is amazing. There are probably 10 companies in an area of 20 miles developing pretty much the same service. The more efficient (integrating data) and valuable (data mash-ups) the service is, the better its position on the market. Within one week, 6 SMEs asked me to show them more about linked data principles and the technologies that should be used. People here are very curious, not afraid of new things.

Semantic Puzzle: From now on you are on a mission to meet and talk to various experts in the field. What are the next planned stops and events? How can someone meet you?

Florian at SFNewtech

Florian Kondert: I’m grateful that I can meet some real semantic web and linked data ambassadors like Roger MacDonald (Leader of the Semantic Open Media Platform Project at Internet Archive) or Jeanne Holm (Evangelist at; W3C Co-Chair eGovernment Interest Group, Chief Knowledge Architect at NASA). Just on Tuesday I gave my first US presentation at SFNewTech, and I’m invited to present what SWC does at the Lotico meetup in San Francisco on Wednesday, 8th of August and at the LA Semantic Web meetup group on Tuesday, 14th of August. From August 21-23 I’ll join the NoSQL Now! conference in San José. More to come.

You’ll most probably meet me around Sunnyvale, San José, Palo Alto and San Francisco – whenever the discussion turns towards sophisticated information management. I’m drinking a lot of coffee these days – so let’s get one and discuss your issues.

Has Google hi-jacked the Semantic Web?

Google recently launched the ‘Knowledge Graph’ (GKG), which “understands real-world entities and their relationships to one another: things, not strings.” Has Google hi-jacked the idea of the ‘Semantic Web’ or at least its vocabulary?

In his blog post, Sean Golliher has compared the most central concepts of the SemWeb community with Google’s wording. For instance, Google doesn’t talk about ‘Linked data’ or ‘URIs’ but rather about ‘things and their relationships’. We don’t know if Google uses standards like RDF, but obviously a lot of concepts and ideas developed by the SemWeb community in recent years were implemented in GKG. Some people complain that Google should clearly state that this is an implementation of the ‘Semantic Web’ (which was not invented by Google); others say that most concepts like ‘taxonomies’ have been around for hundreds of years anyway.

I believe that both sides now have a great chance to work together: whether Google’s goal, to “build the next generation of search, which taps into the collective intelligence of the web and understands the world a bit more like people do”, can be reached or not is a matter of the intelligence of the employees. A lot of potential can be found within the semantic web community: if Google gives credit where it is due, semantic web people will be a bit more inspired to support an eco-system built around GKG – and it won’t be long until an ‘Open Knowledge Graph’ fits together with Google’s revenue model.

Transforming spreadsheets into SKOS with Google Refine

Looking for high-quality enterprise vocabularies, we recently turned our attention to the Global Industry Classification Standard (GICS), which is an industry taxonomy designed to categorize any private company. It was developed by Morgan Stanley Capital International and Standard & Poor’s and is mainly used by the global financial community to aid in the investment research process.

It is available for download as .xls spreadsheet files in several languages. Of course it would be much better to have this valuable taxonomy in a standard and machine-readable format. The Simple Knowledge Organization System SKOS is a perfect fit for a taxonomy like GICS. But how to turn a spreadsheet into SKOS with minimal manual effort?

I chose to try Google Refine for this task, as a promising RDF extension had recently been released by DERI’s Fadi Maali and Richard Cyganiak.

Google Refine is “a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases”. It was previously known as Freebase Gridworks; Google has developed it further since acquiring Metaweb.


Google Refine UI

Refine is a very useful tool to filter and subsequently transform rows, columns and cells according to customizable patterns.

After applying all necessary transformations to the spreadsheet one can edit the “RDF Skeleton”, where the columns can be mapped to literals, RDF properties and RDF classes (which can be imported from their namespaces).

RDF Skeleton

Editing the RDF Skeleton

Once you have your valid SKOS model ready, you can export it in RDF/XML or Turtle format. Then you may want to load it into an ontology editor like Protégé or a thesaurus management tool like PoolParty in order to build upon it or connect it to other knowledge models. With PoolParty the GICS taxonomy can also be used to tag and categorize documents and to provide semantic search and faceted navigation, and it can be published as Linked Data without further effort.
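
For readers who want to see what such a mapping produces, here is a rough Python sketch of the same idea outside of Refine. It is not the RDF extension’s own mechanism. It assumes the spreadsheet has already been cleaned into two columns (`code`, `label`), relies on GICS codes growing by two digits per hierarchy level, and uses a hypothetical `http://example.org/gics/` namespace.

```python
import csv
import io
from typing import Iterable, List, Optional, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/gics/"  # hypothetical namespace, not an official GICS URI

# Assumed layout after cleaning: one row per category with its code and label.
SAMPLE_CSV = """code,label
10,Energy
1010,Energy
101010,Energy Equipment & Services
10101010,Oil & Gas Drilling
"""


def parent_code(code: str) -> Optional[str]:
    """The parent is the code without its last two digits; 2-digit sectors have none."""
    return code[:-2] if len(code) > 2 else None


def escape(label: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return label.replace("\\", "\\\\").replace('"', '\\"')


def read_rows(text: str) -> List[Tuple[str, str]]:
    """Parse the CSV text into (code, label) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["code"].strip(), row["label"].strip()) for row in reader]


def rows_to_turtle(rows: Iterable[Tuple[str, str]], lang: str = "en") -> str:
    """Serialize the rows as SKOS concepts in Turtle syntax."""
    out = [f"@prefix skos: <{SKOS}> .", "", f"<{BASE}scheme> a skos:ConceptScheme ."]
    for code, label in rows:
        parts = [
            f"<{BASE}{code}> a skos:Concept",
            f'skos:prefLabel "{escape(label)}"@{lang}',
            f'skos:notation "{code}"',
            f"skos:inScheme <{BASE}scheme>",
        ]
        parent = parent_code(code)
        if parent is None:
            parts.append(f"skos:topConceptOf <{BASE}scheme>")
        else:
            parts.append(f"skos:broader <{BASE}{parent}>")
        out.append(" ;\n    ".join(parts) + " .")
    return "\n".join(out) + "\n"


turtle = rows_to_turtle(read_rows(SAMPLE_CSV))
print(turtle)
```

Each row becomes a `skos:Concept` with a `skos:prefLabel`, and the hierarchy falls out of the code structure via `skos:broader`, which is roughly what the RDF skeleton expresses declaratively.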

GICS in PoolParty screenshot

GICS loaded in PoolParty

Working with Refine and its RDF extension was easy and fun. It’s even possible to isolate and save the transformation steps done with Refine, so one can re-apply them to similarly structured spreadsheets. This came in very handy as GICS is published in nine languages, in as many separate, identically structured spreadsheets.
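
The benefit of re-applying saved steps can be illustrated with a small Python analogy. This is not Refine’s own format for saved operations; the step functions, the sample rows and the German label are hypothetical. The point is only that one fixed pipeline runs unchanged over every language’s sheet, and the results are merged into one set of labels per code.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, str]  # (code, label)
Step = Callable[[List[Row]], List[Row]]


def strip_whitespace(rows: List[Row]) -> List[Row]:
    """Trim stray whitespace around codes and labels."""
    return [(code.strip(), label.strip()) for code, label in rows]


def drop_empty(rows: List[Row]) -> List[Row]:
    """Remove rows that lack a code or a label."""
    return [(code, label) for code, label in rows if code and label]


# The "saved" transformation: an ordered list of steps, defined once.
PIPELINE: List[Step] = [strip_whitespace, drop_empty]


def apply_pipeline(rows: List[Row]) -> List[Row]:
    """Run every step of the pipeline in order."""
    for step in PIPELINE:
        rows = step(rows)
    return rows


def merge_labels(sheets: Dict[str, List[Row]]) -> Dict[str, Dict[str, str]]:
    """Clean each language sheet with the same pipeline and group labels by code."""
    labels: Dict[str, Dict[str, str]] = defaultdict(dict)
    for lang, rows in sheets.items():
        for code, label in apply_pipeline(rows):
            labels[code][lang] = label
    return dict(labels)


# Hypothetical excerpts from two identically structured language sheets.
SHEETS = {
    "en": [(" 10 ", "Energy "), ("", "")],
    "de": [("10", "Energie")],
}

print(merge_labels(SHEETS))
```

The merged result maps each code to its labels by language tag, which corresponds to one SKOS concept carrying several language-tagged `skos:prefLabel` values.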