Jamie Taylor, Metaweb
Andreas Blumauer from Semantic Web Company (SWC) talked with Jamie Taylor, Minister of Information at Metaweb Technologies Inc., about Freebase and Linked Data and about Google's announcement that it will use RDFa.
SWC: At ISWC 2008 Freebase became “officially” part of the LOD Cloud. What exactly has changed since that time?
Jamie: Since Freebase is a community-writable semantic database, the addition of the RDF interface allows anyone to publish data into the LOD cloud. LOD applications can access any Freebase Topic through the RDF interface by constructing a URI from the Freebase identifier. But perhaps more importantly, because entities in Freebase can be annotated with multiple identifiers, Freebase Topics can also be retrieved through URIs constructed from the identifiers used by other systems and data sets.
For instance, the movie Blade Runner can be referred to as http://rdf.freebase.com/ns/en.blade_runner, but it can also be referenced as http://rdf.freebase.com/ns/authority.netflix.movie.70053131 using the Netflix identifier, http://rdf.freebase.com/ns/authority.imdb.title.tt0083658 using the IMDB identifier, or as http://rdf.freebase.com/ns/wikipedia.en.Dangerous_Days using a Wikipedia wikiword (which in this case is a Wikipedia redirect to the wikiword Blade_Runner).
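The URI pattern in these examples can be sketched in a few lines of Python. This is only an illustration: the helper name `freebase_rdf_uri` and the split into a dotted namespace plus a key are assumptions made for the example, not part of any Freebase API, and the code only builds strings without contacting the service.

```python
# Base of the Freebase RDF interface, taken from the URIs quoted above.
RDF_BASE = "http://rdf.freebase.com/ns/"


def freebase_rdf_uri(namespace: str, key: str) -> str:
    """Join a dotted namespace and a key into a Freebase RDF URI.

    Hypothetical helper: it mirrors the pattern visible in the
    interview's examples and performs no validation or lookup.
    """
    return f"{RDF_BASE}{namespace}.{key}"


# The four ways of addressing Blade Runner mentioned in the text.
blade_runner_uris = [
    freebase_rdf_uri("en", "blade_runner"),
    freebase_rdf_uri("authority.netflix.movie", "70053131"),
    freebase_rdf_uri("authority.imdb.title", "tt0083658"),
    freebase_rdf_uri("wikipedia.en", "Dangerous_Days"),
]

for uri in blade_runner_uris:
    print(uri)
```

Each printed URI matches one of the forms listed above; all four would identify the same Topic.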
Freebase also provides a user maintained mapping of how these identifiers can be used to address resources in other LOD systems. The sameas.freebase.com schema can tell an LOD user that the Freebase Blade Runner Topic can also be found in DBpedia using Wikipedia identifiers or how musical artists can be found at the BBC using Musicbrainz identifiers. In fact, the Freebase RDF interface uses the sameas.freebase.com schema to create the owl:sameAs links in the RDF output allowing the user community to expand the interconnections between Freebase and the LOD Cloud.
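As a rough sketch of the idea, a mapping from identifier namespaces to URI templates is enough to turn a Topic's identifiers into owl:sameAs statements. Everything below is an assumption made for illustration: the table `URI_TEMPLATES`, the namespace name `authority.musicbrainz`, the exact URI templates, and the function `same_as_triples` are hypothetical and do not reproduce the actual sameas.freebase.com schema.

```python
# Standard URI of the owl:sameAs property.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical mapping: identifier namespace -> URI template in another
# LOD data set. The real sameas.freebase.com schema may look different.
URI_TEMPLATES = {
    "wikipedia.en": "http://dbpedia.org/resource/{key}",
    "authority.musicbrainz": "http://www.bbc.co.uk/music/artists/{key}#artist",
}


def same_as_triples(topic_uri, identifiers):
    """Return N-Triples lines linking topic_uri to external resources.

    identifiers is a list of (namespace, key) pairs; pairs whose
    namespace has no template in URI_TEMPLATES are skipped.
    """
    triples = []
    for namespace, key in identifiers:
        template = URI_TEMPLATES.get(namespace)
        if template is None:
            continue  # no known mapping for this identifier
        target = template.format(key=key)
        triples.append(f"<{topic_uri}> <{OWL_SAME_AS}> <{target}> .")
    return triples


links = same_as_triples(
    "http://rdf.freebase.com/ns/en.blade_runner",
    [("wikipedia.en", "Blade_Runner"), ("authority.imdb.title", "tt0083658")],
)
for line in links:
    print(line)
```

In this sketch only the Wikipedia identifier yields a link, because the table has no entry for the IMDB namespace; adding a template to the table is the analogue of a community member extending the mapping.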
Linked Data providers are also using the strong identifiers in Freebase to identify entities such as companies and locations in their own data sets. When they find an entity that is not represented in Freebase, they simply add the entity to Freebase and use the newly minted Freebase identifier. This permits anyone using their data to understand how their entities relate to any of the more than 5 million things interconnected within Freebase.
The RDF interface can also be used to reference the Freebase type system, giving LOD data set providers vocabularies across a wide range of subject areas. And because anyone can expand Freebase’s data model, data providers can use our schema development tools to build and extend these vocabularies to suit their needs.
Freebase was not designed for ephemeral or fast-changing data, like weather conditions or stock ticks. But this type of information is well suited for publication as Linked Data. Freebase entities representing a location or company can be annotated with references to LOD services that provide these types of volatile data. Similarly, Linked Data provides a great way to disseminate very fine-grained information that might be associated with a scientific study or financial report. Linked Data provides a seamless transition from Freebase: a user (or application) can run a query with constraints that span a wide range of types to find entities of interest, along with the LOD services that provide access to temporal or high-resolution data not available in Freebase.
We recently demonstrated MQL Extensions, which allow the Metaweb Query Language to use data from other systems as part of the query constraint and result set. While MQL Extensions are user-extensible and work with a wide array of systems, this capability makes the connection between Freebase and the LOD Cloud even more transparent.
For example, US companies that are registered with the SEC are annotated with a CIK code in Freebase, and the sameas.freebase.com schema indicates that the CIK annotation can be used to create a URI that is dereferenceable at rdfabout.com. It is therefore possible to write an MQL query that asks who is on the board of financial services companies that trade on NASDAQ and are headquartered in California (and, using another MQL Extension, you can ask for their stock price as well!).
SWC: Many organisations are very interested in Linking Open Data now, but they are still not sure if they can benefit from publishing data on the web – what's your experience so far?
Jamie: Linked Open Data provides a simple, standard way for organizations to distribute structured data. For most organizations, providing access to data is another important outlet to announce the availability of higher value services. For organizations involved in building or selling physical goods, the bits representing what they provide are not the goods themselves, but a way of attracting potential customers. Making catalogs and specification sheets available in electronic form, so other applications can connect buyers to their physical goods, is simply an effective marketing system. Even for firms involved in electronic services, providing access to open structured data is generally a lead-in to value-added services. For instance, if I ran a service collecting hard-to-find information about manufacturing relationships between medium-sized businesses, I would publish open company profiles covering things like market size, industry and location for the medium-sized businesses I tracked, so potential users of the premium data would know I had the coverage they were looking for.
SWC: Just recently Google announced that it will use RDFa to enhance its search results. What do you think?
Jamie: We are excited about Google’s announcement. Yahoo’s use of RDFa for Search Monkey and Google’s announcement give RDFa users tangible benefits. The Search Monkey team was very quick to realize that because users can create data models in Freebase, and because the elements of those models all have strong RDF identifiers, Freebase provides a large and user-extensible vocabulary for RDF/RDFa (see the list of vocabularies). When a user wants to create a Search Monkey application that works with their film review site, they need not invent a new vocabulary (that will probably be used only once); they can use the Freebase Film Domain vocabulary, which supports over 63,000 instances in Freebase alone.
Similarly, with over 5 million well-described Topics in Freebase and over 14,000,000 Named Objects (Topics, images, musical tracks and documents), when a user wants to unambiguously identify a subject or object in RDF/RDFa, Freebase has an extremely large collection of identifiers to draw from. These cover people, places, companies, movies, music, books and a wide variety of other subjects. If Freebase doesn’t have the entity the user is looking for, they can of course add it themselves and make use of the identifier immediately. I think this is why Google used some Freebase identifiers in their examples. We hope that with Yahoo and Google’s support for RDFa the web will become a strongly annotated source of data which can support a wide range of user applications.
SWC: Thank you, Jamie!