The Semantic Puzzle

Andreas Blumauer

Metaweb's Jamie Taylor: “Freebase provides a large and user extensible vocabulary for RDF/RDFa”

Jamie Taylor, Metaweb

Andreas Blumauer from Semantic Web Company (SWC) talked with Jamie Taylor, Minister of Information at Metaweb Technologies Inc., about Freebase & Linked Data and Google's announcement that it will use RDFa.

SWC: At ISWC 2008 Freebase became “officially” part of the LOD Cloud. What exactly has changed since that time?

Jamie: Since Freebase is a community-writable semantic database, the addition of the RDF interface allows anyone to publish data into the LOD cloud. Applications can access any Freebase Topic through the RDF interface by constructing a URI from the Freebase identifier. But perhaps more importantly, because entities in Freebase can be annotated with multiple identifiers, Freebase Topics can also be retrieved through URIs constructed from the identifiers used by other systems and data sets.
For instance, the movie Blade Runner can be referred to as http://rdf.freebase.com/ns/en.blade_runner, but it can also be referenced as http://rdf.freebase.com/ns/authority.netflix.movie.70053131 using the Netflix identifier, http://rdf.freebase.com/ns/authority.imdb.title.tt0083658 using the IMDb identifier, or as http://rdf.freebase.com/ns/wikipedia.en.Dangerous_Days using a Wikipedia wikiword (which in this case is a Wikipedia redirect to the wikiword Blade_Runner).
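As a rough illustration of the URI pattern in these examples, here is a minimal Python sketch. It assumes that Freebase identifiers are written in a slash-separated form such as /en/blade_runner and that the RDF URI is obtained by swapping the slashes for dots; the function name freebase_rdf_uri is hypothetical, and any escaping of unusual characters in keys is ignored.

```python
# Base namespace taken from the example URIs in the interview.
FREEBASE_RDF_NS = "http://rdf.freebase.com/ns/"


def freebase_rdf_uri(freebase_id: str) -> str:
    """Build an RDF URI from a slash-style Freebase id (assumed format)."""
    if not freebase_id.startswith("/"):
        raise ValueError("expected an id such as '/en/blade_runner'")
    # '/en/blade_runner' -> 'en.blade_runner'
    dotted = freebase_id.strip("/").replace("/", ".")
    return FREEBASE_RDF_NS + dotted


if __name__ == "__main__":
    # The same Topic addressed through four different identifiers.
    for fid in (
        "/en/blade_runner",
        "/authority/netflix/movie/70053131",
        "/authority/imdb/title/tt0083658",
        "/wikipedia/en/Dangerous_Days",
    ):
        print(freebase_rdf_uri(fid))
```

Each printed URI matches one of the four Blade Runner URIs quoted above, which is the point of the design: an application that only knows a Netflix or IMDb identifier can still address the Freebase Topic.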
Freebase also provides a user-maintained mapping of how these identifiers can be used to address resources in other LOD systems. The sameas.freebase.com schema can tell an LOD user that the Freebase Blade Runner Topic can also be found in DBpedia using Wikipedia identifiers, or how musical artists can be found at the BBC using Musicbrainz identifiers. In fact, the Freebase RDF interface uses the sameas.freebase.com schema to create the owl:sameAs links in the RDF output, allowing the user community to expand the interconnections between Freebase and the LOD Cloud.
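To make the idea of identifier-driven owl:sameAs links concrete, here is a small Python sketch. It is not the sameas.freebase.com schema itself: the SAMEAS_TEMPLATES table, its single entry, and the function name same_as_triples are assumptions made for illustration, relying only on the fact that DBpedia resource URIs are built from English Wikipedia page titles.

```python
from urllib.parse import quote

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical mapping: identifier namespace -> URI template in another data set.
SAMEAS_TEMPLATES = {
    "wikipedia.en": "http://dbpedia.org/resource/{key}",
}


def same_as_triples(topic_uri, identifiers):
    """Return N-Triples lines linking a Topic to resources in other data sets.

    `identifiers` is a list of (namespace, key) pairs attached to the Topic.
    Namespaces without a known template are skipped.
    """
    triples = []
    for namespace, key in identifiers:
        template = SAMEAS_TEMPLATES.get(namespace)
        if template is None:
            continue
        target = template.format(key=quote(key))
        triples.append(f"<{topic_uri}> <{OWL_SAME_AS}> <{target}> .")
    return triples


if __name__ == "__main__":
    blade_runner = "http://rdf.freebase.com/ns/en.blade_runner"
    ids = [("wikipedia.en", "Blade_Runner"), ("authority.imdb.title", "tt0083658")]
    for line in same_as_triples(blade_runner, ids):
        print(line)
```

Because the mapping is plain data, adding a template for another data set extends the generated links without touching the code, which mirrors how a user-maintained mapping lets the community expand the interconnections.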
Linked Data providers are also using the strong identifiers in Freebase to identify entities such as companies and locations in their own data sets. When they find an entity that is not represented in Freebase, they simply add the entity to Freebase and use the newly minted Freebase identifier. This permits anyone using their data to understand how their entities relate to any of the more than 5 million things interconnected within Freebase.

The RDF interface can also be used to reference the Freebase type system, giving LOD data set providers vocabularies across a wide range of subject areas. And because anyone can expand Freebase’s data model, data providers can use our schema development tools to build and extend these vocabularies to suit their needs.
Freebase was not designed for ephemeral or fast-changing data, like weather conditions or stock ticks. But this type of information is well suited for publication as Linked Data. Freebase entities representing a location or company can be annotated with references to LOD services that provide these types of volatile data. Similarly, Linked Data provides a great way to disseminate very fine-grained information that might be associated with a scientific study or financial report. Linked Data provides a seamless transition from Freebase, where a user (or application) can run a query with constraints that run across a wide range of types to find entities of interest along with the LOD services that provide access to temporal or high-resolution data not available in Freebase.
We recently demonstrated MQL Extensions, which allow the Metaweb Query Language to use data from other systems as part of the query constraint and result set. While MQL Extensions are user extensible and work with a wide array of systems, this capability makes the connection between Freebase and the LOD Cloud even more transparent.
For example, because US companies that are registered with the SEC are annotated with a CIK code in Freebase, and the sameas.freebase.com schema indicates that the CIK annotation can be used to create a URI that is dereferenceable at rdfabout.com, it is possible to write a MQL query that asks who is on the board of financial services companies that trade on NASDAQ and are headquartered in California (and using another MQL Extension, you can ask for their stock price as well!).

SWC: Many organisations are very interested in Linking Open Data now, but they are still not sure if they can benefit from publishing data on the web – what's your experience so far?

Jamie: Linked Open Data provides a simple, standard way for organizations to distribute structured data. For most organizations, providing access to data is another important outlet to announce the availability of higher value services. For organizations involved in building or selling physical goods, the bits representing what they provide are not the goods themselves, but a way of attracting potential customers. Making catalogs and specification sheets available in electronic form, so other applications can connect buyers to their physical goods, is simply an effective marketing system. Even for firms involved in electronic services, providing access to open structured data is generally a lead-in to value added services. For instance, if I ran a service collecting hard-to-find information about manufacturing relationships between medium-sized businesses, I would publish open company profiles covering things like market size, industry, and location for the medium-sized businesses I tracked, so potential users of the premium data would know I had the coverage they were looking for.

SWC: Google recently announced that it will use RDFa to enhance its search results. What do you think?

Jamie: We are excited about Google’s announcement. Yahoo’s use of RDFa for Search Monkey and Google’s announcement give RDFa users tangible benefits. The Search Monkey team was very quick to realize that because users can create data models in Freebase, and because the elements of those models all have strong RDF identifiers, Freebase provides a large and user extensible vocabulary for RDF/RDFa (see the list of vocabularies). When a user wants to create a Search Monkey application that works with their film review site, they need not invent a new vocabulary (that will probably be used only once); they can use the Freebase Film Domain vocabulary, which supports over 63,000 instances in Freebase alone.
Similarly, with over 5 million well described Topics in Freebase and over 14,000,000 Named Objects (Topics, images, musical tracks and documents), when a user wants to unambiguously identify a subject or object in RDF/RDFa, Freebase has an extremely large collection of identifiers to draw from. These cover people, places, companies, movies, music, books and a wide variety of other subjects. If Freebase doesn’t have the entity the user is looking for, they can of course add it themselves and make use of the identifier immediately. I think this is why Google used some Freebase identifiers in their examples. We hope that with Yahoo and Google’s support for RDFa the web will become a strongly annotated source of data which can support a wide range of user applications.
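For readers who have not seen RDFa that draws on Freebase, the following Python sketch shows what such markup might look like. The subject URI comes from the Blade Runner example earlier in the interview; the type term film.film, the property term type.object.name, and the helper name rdfa_snippet are assumptions used for illustration and are not taken from Google's or Yahoo's documentation.

```python
from html import escape

# Namespace from the example URIs in the interview, bound to the prefix "fb".
FREEBASE_RDF_NS = "http://rdf.freebase.com/ns/"


def rdfa_snippet(subject_uri: str, type_term: str, name: str) -> str:
    """Render a small XHTML+RDFa fragment describing one entity.

    `type_term` is a vocabulary term inside the Freebase namespace
    (assumed here to look like 'film.film').
    """
    return (
        f'<div xmlns:fb="{FREEBASE_RDF_NS}" '
        f'about="{escape(subject_uri, quote=True)}" '
        f'typeof="fb:{escape(type_term, quote=True)}">'
        # The visible text doubles as the value of the name property.
        f'<span property="fb:type.object.name">{escape(name)}</span>'
        f"</div>"
    )


if __name__ == "__main__":
    print(
        rdfa_snippet(
            "http://rdf.freebase.com/ns/en.blade_runner",
            "film.film",
            "Blade Runner",
        )
    )
```

A film review site could emit a fragment like this around each review title, reusing an existing identifier and vocabulary term instead of inventing its own.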

SWC: Thank you, Jamie!
