Semantic Web Company

The Semantic Puzzle

Open World Assumptions

subscribe RSS

The Day after Freebase went RDF

October 30, 2008 By: Jana Herwig Category: Linked Data & Open Data, Mashups & Web services 6 Comments →

So what’s been happening on the blogosphere after John Giannandrea’s keynote at ISWC and the revelation that Freebase now produces Linked Data from an RDF service

Tetherless World sums up the Freebase facts (e.g. 156,000,000 assertions made; 1370 published types; 75 domains; graph model, identity, web based) and further points out that ontology creation “is a social process, and both freebase and semantic wiki are tools that enable users to create ontological vocabulary without worrying too much on building a comprehensive ontology.”

Inkdroid notes that the RDF service release “is important news because Freebase is an active community of content creators, creating rich data-centric descriptions with a wiki style interface, fancy data loaders, and useful machine APIs.” This is followed up by a quick and handy tutorial how you can get machine readable data back from freebase using a URI with Freebase. Conclusion:

So why is this important? Because following your nose in HTML is what enabled companies like Lycos, AltaVista, Yahoo and Google to be born. It allowed for agents to be able to crawl the web of documents and build indexes of the data to allow people to find what they want (hopefully). Being able to link data in this way allows us to harvest data assets across organizational boundaries and merge them together. It’s early days still, but seeing an organization like Freebase get it is pretty exciting.

Yves Raimond was the first to wonder on the public W3C LOD mailinglist: “now, to see whether it links to other datasets :-) ” – the idea of having linked data without the linkage would indeed seem like love’s labour lost. Semantic Focus / James Simmons seconds: “One downside is the data doesn’t appear to link to external resources, in a sense walling itself in. It should be trivial to link the topics that came from Wikipedia back to Wikipedia as well as DBpedia (which would be killer, by the way).” This is followed up a later post, where James expresses concerns regarding the relationship DBpedia / Freebase: “Freebase may see a drop in userbase growth and participation if it becomes a mirror of DBpedia (or vice-versa) and the popularity once garnered by one project may shift towards the other, or away entirely.”

More News / Andrew Newman puts the Freebase RDF service release in context with Cathrin Weiss’ “250 million triples on your iphone” submission, iMoCo, to the Billion triples challenges, also DBpedia and Semaplorer, developed at the University of Koblenz:

DBPedia stood out because it was the only one that allowed you to write data to the Semantic Web rather than just read the carefully prepared triples. For a similar reason I though SemaPlorer was good because they tried to do more than just the standard triples but went that extra bit further by making it more generic like integrating flickr. But they were all excellent, all of them showing what you get with a billion or more triples and inferencing.

That combined with the guys at Freebase making all of their data available as RDF and it was a big day for the Semantic Web.

ARQtick / AndyS plays a bit with the Blade Runner example cited by Freebase, e.g. takes a look at the graph, looks for interesting properties and extracts author names

N.B. If you want to follow ARQtick’s example: use the Linked Data browser plugin Tabulator or go to the Marbles site to view the RDF – without a data browser you’ll be redirected to the HTML page. You will also need it to make sense of rdf.freebase.com.

Sphere: Related Content

Bringing (Legacy) Data to the Web [WOD-PD]

October 22, 2008 By: Jana Herwig Category: Conferences & Events, Linked Data & Open Data 2 Comments →

The third session at WOD-PD was dedicated to “Bringing (Legacy) Data on the Web“, and led by Sören Auer (University of Leipzig, Germany) and Orri Erling (OpenLink Software) .

Sören Auer giving a talkSören Auer described the difference between the Web 1.0, 2.0 and 3.0 as follows: On the Web 1.0, you had many websites that provided unstructured, mainly textual content. On the Web 2.0, you have a few large websites that are specialised on specific content types. And, finally, on the Web 3.0, there are many websites which contain, and are able to semantically syndicate, arbitrarily structured content.

So why would we need another web? What you cannot do with the current web is finding answers to seemingly complex, yet in reality pretty mundane question such as: Where in Leipzig do I find an apartment that is close to bilingual, German-French child care facilities? Are there any ERP service providers which have offices in Vienna and Berlin? Who are the researchers in South-East Asia currently working on database related topics?

Sören further discussed three of the present means of bringing relation data to the web: Triplify (a web application plugin that exposes data from relational databases in RDF), D2RQ (a declarative language to describe mappings between relational database schemata and OWL/RDFS ontologies, developed at Free University Berlin), and Virtuoso Universal Server (a middleware and database engine hybrid delivering for instance data integration for SQL, RDF, XML, Web Services). With respect to Triplify, Sören – who is Triplify’s founder and main developer at AKSW Uni Leipzig – showed and discussed the configuration for Wordpress 2.1., which can be found here (click here for more configurations, e.g. for Joomla, OpenConf and Drupal). The next aim for Triplify is to become an integral part in enduser web app distibutions.

And important question raised by Sören was: How do next generation search engines know that something has changed on the web of data? He suggested three approaches:

  1. Always try to crawl everything (this may sound silly – but that’s actually what is happening on the current web)
  2. Ping a central update notification service – e.g. PingTheSemanticWeb.com – which works as a showcase, but will probably not scale if the data web gets really deployed.
  3. Each linked data endpoint publishes an update log – e.g. with Triplify, as a special folder inside the Triplify namespace, e.g. http://example.com/Triplify/update

Also discussed by Sören and worth checking out is Reuters’ Semantic proxy – the demo went live in late September.

Orri Erling, as the lead developer of the Virtuoso Team, addressed the issue of mapping relational databases to RDF with OpenLink Virtuoso. In his talk, he addressed the pros and cons of RDF data warehouse:

Pros

  • Even query performance across all data
  • Possibility of forward-chaining inference
  • Some SPARQL features may be better supported, e.g. Unspecified predicates

Cons

  • Keeping data up-to-date
  • Complex set up, needs dedicated servers: you don’t build them on a whim

Orri Erling giving a talkWhat Virtuoso delivers is mapping of SPARQL to SQL against any existing schema (whether stored in Virtuoso or elsewhere); a physical quad-store (quad as in quadruple; not as in quad-bike :) ; and Federated/local Relational Data Base Management Systems (RDBMS).

A more detailed discussion of the requirements for Relational-to-RDF Mapping is available on Orri’s blog, where he discusses it in the light of his own experience. A power point presentation of a previous talk he gave to the W3C RDB2RDF Incubator Group can be downloaded here: Mapping Relational Databases to RDF with OpenLink Virtuoso (PPT, 115KB). His summary of the group discussions around the same topic, Requirements for Relational to RDF Mapping, can be found here.

Orri also showed the Virtuoso billion triples demo which, according to the corresponding blogpost, “is being worked on at the time of submission and may be shown online by appointment.” The demo was a submission to the Billion Triples Challenge.

Reblog this post [with Zemanta]
Sphere: Related Content