Andreas Blumauer

Metaweb’s Jamie Taylor: “Freebase provides a large and user extensible vocabulary for RDF/RDFa”

Jamie Taylor, Metaweb

Andreas Blumauer from Semantic Web Company (SWC) talked with Jamie Taylor, Minister of Information at Metaweb Technologies Inc., about Freebase and Linked Data and about Google’s announcement that it will use RDFa.

SWC: At ISWC 2008 Freebase became “officially” part of the LOD Cloud. What exactly has changed since that time?

Jamie: Since Freebase is a community-writable semantic database, the addition of the RDF interface allows anyone to publish data into the LOD cloud. LOD applications can access any Freebase Topic through the RDF interface by constructing a URI from the Freebase identifier. But perhaps more importantly, because entities in Freebase can be annotated with multiple identifiers, Freebase Topics can also be retrieved through URIs constructed from the identifiers used by other systems and data sets.
For instance, the movie Blade Runner can be referred to as http://rdf.freebase.com/ns/en.blade_runner, but it can also be referenced as http://rdf.freebase.com/ns/authority.netflix.movie.70053131 using the Netflix identifier, http://rdf.freebase.com/ns/authority.imdb.title.tt0083658 using the IMDB identifier, or as http://rdf.freebase.com/ns/wikipedia.en.Dangerous_Days using a Wikipedia wikiword (which in this case is a Wikipedia redirect to the wikiword Blade_Runner).
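The URI construction described here can be illustrated with a minimal Python sketch. It assumes that Freebase identifiers are written in a slash-separated form such as /en/blade_runner and that the RDF URI is obtained by replacing the slashes with dots; the function name freebase_rdf_uri is hypothetical, and only the resulting URIs are taken from the examples above.

```python
# Base namespace, taken from the example URIs in the interview.
RDF_NS = "http://rdf.freebase.com/ns/"


def freebase_rdf_uri(freebase_id: str) -> str:
    """Build an RDF URI from a slash-separated Freebase identifier.

    Assumption: identifiers look like "/en/blade_runner" and map to
    dotted keys such as "en.blade_runner" in the RDF namespace.
    """
    key = freebase_id.strip("/")
    if not key:
        raise ValueError("empty Freebase identifier")
    return RDF_NS + key.replace("/", ".")


# The same Topic addressed through identifiers from different systems.
blade_runner_ids = [
    "/en/blade_runner",
    "/authority/netflix/movie/70053131",
    "/authority/imdb/title/tt0083658",
    "/wikipedia/en/Dangerous_Days",
]
blade_runner_uris = [freebase_rdf_uri(i) for i in blade_runner_ids]
```

Each identifier in the list yields one of the four URIs quoted above, so a data set that only knows the Netflix or IMDB identifier can still address the same Topic.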
Freebase also provides a user-maintained mapping of how these identifiers can be used to address resources in other LOD systems. The sameas.freebase.com schema can tell an LOD user that the Freebase Blade Runner Topic can also be found in DBpedia using Wikipedia identifiers, or how musical artists can be found at the BBC using MusicBrainz identifiers. In fact, the Freebase RDF interface uses the sameas.freebase.com schema to create the owl:sameAs links in the RDF output, allowing the user community to expand the interconnections between Freebase and the LOD Cloud.
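As a rough illustration of the idea, the following Python sketch derives owl:sameAs triples from a Topic's identifiers. The mapping table, the key prefixes and the function name are assumptions made for this example and do not reproduce the actual sameas.freebase.com schema; only the owl:sameAs property URI and the general pattern (identifier in, external URI out) follow from the text.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical mapping: key prefix -> URI template of an external LOD source.
# The real, user-maintained mapping is not reproduced here.
SAMEAS_TEMPLATES = {
    "/wikipedia/en/": "http://dbpedia.org/resource/{key}",
    "/authority/musicbrainz/": "http://www.bbc.co.uk/music/artists/{key}#artist",
}


def same_as_triples(topic_uri: str, keys: list[str]) -> list[str]:
    """Return N-Triples lines linking topic_uri to external resources.

    Keys without a matching template are skipped.
    """
    triples = []
    for full_key in keys:
        for prefix, template in SAMEAS_TEMPLATES.items():
            if full_key.startswith(prefix):
                target = template.format(key=full_key[len(prefix):])
                triples.append(f"<{topic_uri}> <{OWL_SAME_AS}> <{target}> .")
    return triples


links = same_as_triples(
    "http://rdf.freebase.com/ns/en.blade_runner",
    ["/wikipedia/en/Blade_Runner", "/authority/imdb/title/tt0083658"],
)
```

Because the table is plain data, adding one entry is enough to connect every Topic that carries the corresponding identifier, which is the effect of letting the community maintain the mapping.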
Linked Data providers are also using the strong identifiers in Freebase to identify entities such as companies and locations in their own data sets. When they find an entity that is not represented in Freebase, they simply add the entity to Freebase and use the newly minted Freebase identifier. This permits anyone using their data to understand how their entities relate to any of the more than 5 million things interconnected within Freebase.

The RDF interface can also be used to reference the Freebase type system, giving LOD data set providers vocabularies across a wide range of subject areas. And because anyone can expand Freebase’s data model, data providers can use our schema development tools to build and extend these vocabularies to suit their needs.
Freebase was not designed for ephemeral or fast-changing data, like weather conditions or stock ticks. But this type of information is well suited for publication as Linked Data. Freebase entities representing a location or company can be annotated with references to LOD services that provide these types of volatile data. Similarly, Linked Data provides a great way to disseminate very fine-grained information that might be associated with a scientific study or financial report. Linked Data provides a seamless transition from Freebase, where a user (or application) can run a query with constraints that run across a wide range of types to find entities of interest along with the LOD services that provide access to temporal or high-resolution data not available in Freebase.
We recently demonstrated MQL Extensions, which allow the Metaweb Query Language to use data from other systems as part of the query constraint and result set. While MQL Extensions are user-extensible and work with a wide array of systems, this capability makes the connection between Freebase and the LOD Cloud even more transparent.
For example, because US companies that are registered with the SEC are annotated with a CIK code in Freebase, and the sameas.freebase.com schema indicates that the CIK annotation can be used to create a URI that is dereferenceable at rdfabout.com, it is possible to write an MQL query that asks who is on the boards of financial services companies that trade on NASDAQ and are headquartered in California (and, using another MQL Extension, you can ask for their stock prices as well!).

SWC: Many organisations are very interested in Linking Open Data now, but they are still not sure whether they can benefit from publishing data on the web. What’s your experience so far?

Jamie: Linked Open Data provides a simple, standard way for organizations to distribute structured data. For most organizations, providing access to data is another important outlet to announce the availability of higher-value services. For organizations involved in building or selling physical goods, the bits representing what they provide are not the goods themselves, but a way of attracting potential customers. Making catalogs and specification sheets available in electronic form, so other applications can connect buyers to their physical goods, is simply an effective marketing system. Even for firms involved in electronic services, providing access to open structured data is generally a lead-in to value-added services. For instance, if I ran a service collecting hard-to-find information about manufacturing relationships between medium-sized businesses, I would publish open company profiles covering things like market size, industry and location for the medium-sized businesses I tracked, so potential users of the premium data would know I had the coverage they were looking for.

SWC: Just recently Google announced that it will use RDFa to enhance its search results. What do you think?

Jamie: We are excited about Google’s announcement. Yahoo’s use of RDFa for Search Monkey and Google’s announcement give RDFa users tangible benefits. The Search Monkey team was very quick to realize that because users can create data models in Freebase, and because the elements of those models all have strong RDF identifiers, Freebase provides a large and user extensible vocabulary for RDF/RDFa (see the list of vocabularies). When a user wants to create a Search Monkey application that works with their film review site, they need not invent a new vocabulary (that will probably be used only once); they can use the Freebase Film Domain vocabulary, which supports over 63,000 instances in Freebase alone.
Similarly, with over 5 million well-described Topics in Freebase and over 14,000,000 Named Objects (Topics, images, musical tracks and documents), when a user wants to unambiguously identify a subject or object in RDF/RDFa, Freebase has an extremely large collection of identifiers to draw from. These cover people, places, companies, movies, music, books and a wide variety of other subjects. If Freebase doesn’t have the entity the user is looking for, they can of course add it themselves and make use of the identifier immediately. I think this is why Google used some Freebase identifiers in their examples. We hope that with Yahoo and Google’s support for RDFa the web will become a strongly annotated source of data which can support a wide range of user applications.

SWC: Thank you, Jamie!

Andreas Blumauer

BBC Music relaunch: Linked Data goes Business?

Since SWC is involved in a couple of semantic web projects in the media industry, I had been watching for the BBC Music relaunch. Now the new platform is online, and from an end user’s perspective the new system offers comfortable ways to navigate through the world of music: bands, their members, biographies and outgoing links (for example to Wikipedia or MySpace) are retrieved from MusicBrainz and mashed up with BBC blogs, playlists or reviews.

Matthew Shorter, interactive editor for music at the BBC, told silicon.com:

We’re kind of on a journey of moving from what’s effectively a magazine/print publication-based metaphor around web publishing…to a world where we recognise that that’s not the way that people use the web.

No doubt, Linked Data is a great deal for end users, but what’s in it for the providers, in this case the BBC?

From a media company’s perspective, Shorter mentioned several interesting arguments for why linked data could be useful:

  1. reusing data from MusicBrainz and Wikipedia also provides better value for the licence payer as the BBC isn’t wasting resources reproducing data already in the public domain
  2. from an SEO point of view, once we start generating a lot of meaningful links among our pages, then we’re going to improve the find-ability of our content via web search
  3. by having as open a platform as we can, then our hope at least is that people will pick up that content and do things with it and we’ll benefit from incoming links as a result

This could be summarised as follows (by adding a fourth item):

  1. re-use existing data
  2. increase find-ability
  3. extend your eco-system
  4. understand users’ interests

The fourth item says that linked data can help providers understand their users more profoundly, because information in the linked data world is offered at a finer granularity (a paradigm shift from pages to linked data). With that in mind, I’d like to ask a short, value-free question: which side of the internet will drive business in the future, the visible web or the deep web? Was linked data designed only for the visible web?

Andreas Blumauer

Pimp your Google

Sure, that’s not the end of the flagpole, but “a little semantics goes a long way” (Jim Hendler): with two Firefox add-ons, you can pimp your Google and get (1) a better overview of the search results, (2) a kind of moderated search and (3) information from Wikipedia along with the results.

Install Cloudlet and Googlepedia (don’t forget to donate!) and you will see something like this:

Sure, neither “mashup” is based on RDF, and the “TagCloud” is not as accurate as we might wish, but let us be patient again. At least this picture leaves end users yearning for a bit more semantics (which goes a long way…) on top of the usual lists of search results.

Pascal Hitzler

Semantic MediaWiki In Popular Media

Semantic MediaWiki is featured in issue 12/2008 of the popular German computer magazine iX, in an article about wiki engines. It’s the only semantic wiki among those presented, and although it is an extension of MediaWiki (which underlies Wikipedia and is also covered in the article), it is discussed separately and thus receives quite some emphasis. iX has featured Semantic MediaWiki before, namely in an article dedicated to it in issue 11/2007. It’s well deserved, I think, considering the many sites which use Semantic MediaWiki.

It’s good to see that the visibility of the Semantic Web is also growing outside academia and the industry already involved.

Author: Pascal Hitzler