Semantic Web Company

The Semantic Puzzle

Open World Assumptions

Archive for April, 2010

A Dynamic Web Of Data

April 26, 2010 By: Michael Hausenblas Category: Linked Data & Open Data, Semantic Web Applications

As a matter of fact, things change – the Web of Data is no exception in that respect. While some sources, such as Twitter, are intrinsically dynamic, others change every now and then, potentially at unforeseeable intervals. In the recent Talis Nodalities Magazine, we made a case for Keeping up with a LOD of changes; here I’m going to elaborate a bit more on the current state of Dataset Dynamics and its challenges.

Let us first step back a bit and look at what Dataset Dynamics is and why it is important. In the Web of Linked Data we typically deal with datasets, for example, from the biomedical domain or the media industry on the one hand, and entities, such as a certain protein or people, on the other. For the entity-level case, established HTTP caching mechanisms can be leveraged (see the Caching Tutorial and Things Caches Do). Further, with Memento, an HTTP-based versioning mechanism has been proposed as well as implemented, adding a “time dimension” to HTTP (see Fig. 1).

Fig. 1 Memento Framework (Source: "An HTTP-Based Versioning Mechanism for Linked Data" Herbert Van de Sompel, Robert Sanderson, Michael Nelson, Lyudmila Balakireva, Harihar Shankar, Scott Ainsworth, LDOW 2010)
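To make the entity-level case a bit more tangible, here is a minimal Python sketch of the request headers involved. It only builds the headers and sends nothing. The function names are made up for illustration, and I am assuming the `Accept-Datetime` request header as used by the Memento specification; the conditional headers (`If-None-Match`, `If-Modified-Since`) are plain HTTP caching.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date (always in GMT)."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def revalidation_headers(etag=None, last_modified=None) -> dict:
    """Headers for a conditional GET: the server may answer 304 Not Modified."""
    headers = {}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = http_date(last_modified)
    return headers


def memento_headers(moment: datetime) -> dict:
    """Header asking for the state of a resource as of `moment`.

    Assumption: the header is called Accept-Datetime, as in the Memento spec.
    """
    return {"Accept-Datetime": http_date(moment)}


# Hypothetical values, for illustration only.
snapshot = datetime(2010, 4, 26, 12, 0, tzinfo=timezone.utc)
print(revalidation_headers(etag='"abc123"', last_modified=snapshot))
print(memento_headers(snapshot))
```

Note that both mechanisms work per resource: a consumer interested in a million entities would have to issue a million such requests, which is exactly where the dataset-level discussion starts.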

Dataset-level changes

However, tackling dataset-level changes is a rather new field with no agreed-upon, let alone standardised, solution at hand. The main problem is that a dataset typically talks about many thousands to millions of distinct entities, which makes it impractical to apply entity-level solutions for a range of use cases, such as link maintenance or replication (see also Fig. 2).

Fig. 2 Change frequency vs. change volume

I often hear these days: “it seems there is no solution for handling of dataset-level changes”; nevertheless, I think quite the opposite is true. There are plenty of proposed solutions from both academia and practitioners, targeting different challenges in the areas of:

  • Change discovery – how do I find out about dataset changes?
  • Propagating changes – if there is a change, how is the change communicated to a consumer?
  • Change semantics – how do I learn what has changed (has been added, removed, etc.)?

Some proposals on the table are integrated approaches (such as DSNotify, SemanticPingback, Talis Changeset) while others focus on certain aspects (like the dady vocabulary for discovery or the Graph Update Ontology for change semantics) or deal with concrete environments, for example sparqlPuSH for SPARQL endpoints.
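The change semantics question (what has been added, what has been removed) can be illustrated with a small Python sketch that compares two snapshots of a dataset. This is a toy under stated assumptions: triples are plain tuples, the `ChangeSet` type and the example.org URIs are invented for illustration, and the structure is not the format of any of the proposals named above.

```python
from typing import NamedTuple


class ChangeSet(NamedTuple):
    """Hypothetical container for the difference between two snapshots."""
    added: frozenset
    removed: frozenset


def diff_snapshots(old, new) -> ChangeSet:
    """Compare two collections of (subject, predicate, object) tuples."""
    old_set, new_set = frozenset(old), frozenset(new)
    return ChangeSet(added=new_set - old_set, removed=old_set - new_set)


# Illustrative snapshots; all URIs are placeholders.
yesterday = [
    ("http://example.org/protein/1", "http://example.org/vocab/name", "P1"),
    ("http://example.org/protein/2", "http://example.org/vocab/name", "P2"),
]
today = [
    ("http://example.org/protein/1", "http://example.org/vocab/name", "P1"),
    ("http://example.org/protein/3", "http://example.org/vocab/name", "P3"),
]

changes = diff_snapshots(yesterday, today)
print(len(changes.added), "added,", len(changes.removed), "removed")
```

Comparing full snapshots like this gets expensive for large datasets, which is one reason why publishers announcing their own changes is attractive.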

A Dataset Dynamics Manifesto

No matter which (set of) solutions the community eventually agrees on to address the handling of dataset-level changes, it should adhere to the following principles:

  • light-weight
  • distributed and scalable
  • standards-based

Obviously, a light-weight (and ideally RESTful) approach lowers the barriers to adoption and enables a quick uptake. When I say light-weight, I mean it both in terms of protocol and code. It should be easy to integrate into RDF stores and libraries and be available in all common Web programming languages, including but not limited to Java, PHP and the .NET family.

Just as the Web of Data is a globally distributed dataspace, handling of changes should be done in a distributed fashion. There will be many different publishers and consumers (such as agents, indexers, consolidator platforms, etc.) of datasets with different requirements and capabilities. A distributed approach can cope with this challenge in a cost- and performance-efficient way. Tightly connected to this: it has to scale. Today, we’re dealing with some hundreds of LOD datasets. In the next couple of years, this will likely explode into the millions, and hence one needs to be able to deal with such growth. The same, just sooner, is true for the number of consumers of the changes.

Last but not least, the Dataset Dynamics solution should be based on standards. It doesn’t necessarily need to be RDF for all of the challenges outlined above. For example, Atom offers a standardised, extensible and widely accepted format to propagate changes; to take this further, Pubsubhubbub can be utilised to enable a standardised, distributed publisher-subscriber scheme (Fig. 3).

Fig. 3 Pubsubhubbub - a standard-based, distributed publisher-subscriber-hub system (Source: http://docs.google.com/present/view?id=ajd8t6gk4mh2_34dvbpchfs)
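As a rough illustration of the Atom plus Pubsubhubbub idea, the following Python sketch builds an Atom entry announcing a change and the form-encoded body a publisher would send to a hub to say “my feed has new content”. Nothing is sent over the network. The entry identifier, title and feed URL are placeholders, the function names are invented, and I am assuming the `hub.mode=publish` / `hub.url` parameters of the Pubsubhubbub publisher ping.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM_NS = "http://www.w3.org/2005/Atom"


def change_entry(entry_id: str, title: str, updated: str) -> str:
    """Serialise a minimal Atom entry describing one dataset change."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    for tag, text in (("id", entry_id), ("title", title), ("updated", updated)):
        ET.SubElement(entry, f"{{{ATOM_NS}}}{tag}").text = text
    return ET.tostring(entry, encoding="unicode")


def publish_ping_body(topic_url: str) -> str:
    """Form-encoded body of a publisher's ping to a hub (assumed parameters)."""
    return urlencode({"hub.mode": "publish", "hub.url": topic_url})


# Placeholder values, for illustration only.
print(change_entry("urn:example:change:1", "3 triples added", "2010-04-26T12:00:00Z"))
print(publish_ping_body("http://example.org/dataset/changes.atom"))
```

The division of labour is the point here: the feed carries the change description, while the hub takes care of fanning out notifications to subscribers, so the publisher does not need to know its consumers.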

As I’ve outlined above, it might still be too early for a conclusion on how to deal with dataset-level changes. However, people interested in this area have already gathered in the Dataset Dynamics group, where solutions are discussed and implemented, potentially leading to W3C standardisation work.

As an aside: in case you’re at the WWW2010 in Raleigh (NC, USA) these days, you may want to join the break-out meeting on Dataset Dynamics during the W3C Linked Open Data track on 29 April 2010.

(This blog post was written by Michael Hausenblas)

Social Semantic Web dawning?

April 22, 2010 By: Tassilo Pellegrini Category: Privacy & Information Ethics, Social Software

Facebook — Open Graph — Semantic Search

Alex Wilhelm from The Next Web writes:

There is data outside of Facebook that the company wants to be brought in and made relevant inside of the Facebook platform. Enter the Open Graph protocol, Facebook’s way to say, in the common tongue, “all your graph are belong to Zuck.”

The product combines graphs, be they music graphs from Pandora or what have you, into the Facebook wider social graph. You can think of it as a “knit-up” with Facebook for other websites that are not Facebook affiliated.

Nick O’Neill from AllFacebook:

If HTML is the way developers get information into Google’s search engine, meta data is the way developers will get data into Facebook’s semantic search engine which will be based on the company’s “Open Graph”. Through the use of easy to implement plugins, Facebook is rapidly collecting structured data on every user. Facebook has also upgraded their API to make building on top of the Open Graph a much easier process. What’s pretty clear is that it’s an attempt to tackle the residing search giant.

[...] As Mark Zuckerberg said on stage an hour ago, by the end of the day Facebook should have more than 1 billion likes and that data will grow exponentially.

[...] There are a number of standards that have been created in the past as some developers have pointed out, microformats being the most widely accepted version, however the reduction of friction for implementation means that Facebook has a better shot at more quickly collecting the data. The race is on for building the semantic web and now that developers and website owners have the tools to implement this immediately.
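To make the “meta data” point concrete: Open Graph properties are carried in `<meta property="og:…" content="…">` elements in a page’s head. The following Python sketch, using only the standard library, pulls them out of an HTML string. The class and function names and the sample page are made up for illustration; this is not Facebook’s implementation.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            self.properties[prop] = attributes["content"]


def extract_open_graph(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Hypothetical page, for illustration only.
sample_page = """
<html><head>
  <title>Some album</title>
  <meta property="og:title" content="Some album" />
  <meta property="og:type" content="album" />
  <meta property="og:url" content="http://example.org/albums/1" />
  <meta name="description" content="not an Open Graph property" />
</head><body></body></html>
"""

print(extract_open_graph(sample_page))
```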

Sören Auer: “Establishing a network effect around linked data is the most important R&D goal for the near future.”

April 15, 2010 By: Tassilo Pellegrini Category: Conferences & Events, Linked Data & Open Data, Politics, Privacy & Information Ethics

Leipzig is one of Germany’s Semantic Web hotspots. From May 5-6, 2010 the annual Semantic Web Day provides the opportunity to catch up with the latest developments, especially in the domain of Linked Data and the foundation of the German chapter of the Open Knowledge Foundation. Organizer Sören Auer gave us some background information.

From May 5 – 6, 2010 the 3rd Semantic Web Day in Leipzig will take place. What will be this year’s topics? Who should attend?

The Semantic Web Day is targeting IT people, software developers, decision makers and users interested in learning about the potential of semantic technologies. The language during the event is German, so primarily Austrians, Swiss and Germans will attend. Besides semantic technologies, a particular focus of this year’s event is open data in governments, public administrations and science. Although the programme is not yet finalized, we have already compiled an interesting number of talks and presentations, including talks about the open biodiversity database Fishbase, the European Digital Library Europeana, a Linked Data project of the German Umweltbundesamt, use case presentations in the pharma, publishing and telecommunication industries and many more (cf. http://aksw.org/LSWT). Also, in addition to AKSW, the Topic Maps Lab and the Web Data Integration Labs from Universität Leipzig will be present at LSWT.

One of the highlights of this year’s Semantic Web Day is the official institutionalization of the German Chapter of the Open Knowledge Foundation. How did this come about? What does this mean for the OKF as a whole?

OKFN started to work in 2006 and has since managed to successfully complete a number of projects facilitating open knowledge. In particular, the Comprehensive Knowledge Archive Network (CKAN), the OKCon conference series, the open knowledge definition and recently OKFN’s involvement in the launch of data.gov.uk are prominent examples of OKFN’s successful work. However, many of the OKFN activities were primarily driven by an active group of volunteers in the UK. With the official launch of the German OKFN branch we will strengthen the international dimension of OKFN’s work. Especially for Germany, where data privacy and security are perceived to be most important, raising awareness for enabling open, standards-compliant access to public information will be an important target of OKFN’s activities.

The InFAI has become one of the hotspots in Semantic Web development in Germany over the past few years. What are you working on at the moment? What are the most interesting research and development aspects for the near future?

From our point of view, establishing a network effect around the publishing and use of linked data is the most important research and development goal for the near future. We just completed a first draft and implementations of a semantically enabled pingback method (http://aksw.org/Projects/SemanticPingBack), which applies to linked data endpoints a peer notification mechanism similar to the one widely deployed in the blogosphere. Other important research issues we are tackling with our partners are closing the performance gap between RDF and relational data management, increasing the coherence and quality of linked data, and the provisioning of adaptive user interfaces for authoring and maintaining information on the data web.

About Sören Auer

Dr. Sören Auer leads the research group Agile Knowledge Engineering and Semantic Web (AKSW) at the University of Leipzig. His research interests include Semantic Web technologies, knowledge representation, engineering and management, agile methodologies as well as databases and information systems. Sören is founder (respectively co-founder) of several high-impact research and community projects such as the Wikipedia semantification project DBpedia, the open-source innovation platform Cofundos.org or the social Semantic Web toolkit OntoWiki. Sören is author of over 50 peer-reviewed scientific publications, co-organiser of several workshops, chair of the Social Semantic Web conference 2007 and I-Semantics 2008, serves as an expert for industry, the European Commission and the W3C, and is a member of the advisory board of the Open Knowledge Foundation.

Interview with Juan Sequeda: “I believe Linked Data will enable new killer apps that are only possible thanks to Linked Data.”

April 14, 2010 By: Tassilo Pellegrini Category: Calls & Competitions, Linked Data & Open Data, Semantic Web Applications

Juan Sequeda, co-chair of the Triplification Challenge 2010 and one of the core figures in the Linked Data movement, gives us his view of how the Semantic Web might evolve. His central message: “Once there is an incentive to create quality links, these links will start to show up. And then users will start linking to the data hubs of their interest.”

Linked Data itself has grabbed a lot of attention inside the Semantic Web community recently. But what about the outside perspective? Could linked data be called the killer app for the Semantic Web?

I foresee two things happening with Linked Data. One is from the web development perspective (the so-called Web 2.0 developers) and the other is from the enterprise perspective. The web development community will sooner than later realize that Linked Data will enable easy integration of data and therefore will ease the pain of consuming data from different data sources. Thanks to big organizations such as BBC, New York Times, Reuters, Best Buy, etc. web developers will start paying attention to this “new thing” called Linked Data.

What we need is for the inside Semantic Web community to start creating applications on top of current Linked Data, so that when the outside web development community starts to pay attention, they have something to chew on. We (the semantic web community) need to start speaking the web development language. There is still a big gap. I have had personal experiences with people in the web development community who think that RDF is XML and, because they hate XML, will never consider it. This is false and this is something that we need to change.

From the enterprise perspective, Linked Data is another data integration solution. Data integration has been a problem since day one of relational databases. I believe enterprises will be open to consider new solutions with new technologies. I’m hoping to see new startups tackling the enterprise domain. Imagine being able to query “get all my clients from cities whose population is greater than 1 million” even though I don’t have the data about population of cities in my database.
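The “clients in large cities” query can be sketched in a few lines of Python. The idea is that the local database and an external dataset refer to cities by the same identifier, so the population figures never need to be stored locally. Everything below is invented for illustration: the function name, the example.org URIs, the client names and the population numbers.

```python
def clients_in_large_cities(clients, city_population, threshold=1_000_000):
    """Return names of clients whose city has more than `threshold` inhabitants.

    `clients` is local data; `city_population` stands in for an external
    Linked Data source keyed by the same city URIs.
    """
    return [
        client["name"]
        for client in clients
        if city_population.get(client["city"], 0) > threshold
    ]


# Local data: the enterprise knows its clients and the URI of their city.
clients = [
    {"name": "Client A", "city": "http://example.org/city/alpha"},
    {"name": "Client B", "city": "http://example.org/city/beta"},
    {"name": "Client C", "city": "http://example.org/city/gamma"},
]

# External data: made-up population figures published by someone else.
city_population = {
    "http://example.org/city/alpha": 2_500_000,
    "http://example.org/city/beta": 300_000,
}

print(clients_in_large_cities(clients, city_population))
```

The shared URIs do the integration work here; cities unknown to the external source (such as the third one above) simply do not match.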

Is Linked Data the killer app for the Semantic Web? Before I answer that, I would like to ask, what was the killer app of the Web? Was it the browser? Was it e-commerce? Was it search? Was it Amazon or Ebay or Google? I believe Linked Data will enable new killer apps, apps that are only possible thanks to Linked Data. The browser was only possible because of HTML. So let’s ask ourselves what is possible because of Linked Data, and there we will find our killer app.

One of the core deficiencies of the young open data cloud is the small number of interlinks between datasets. Is it just a matter of time to overcome this or are there other measures needed to turn the existing datasets into a true giant global graph?

I like to remind myself that this new wave of semantic web technologies is an extension of the current web. Therefore we should analyze how the web evolved in the beginning. Initially, everything was a bunch of documents on the web in which people manually created links to other documents. When Google started, it created an incentive to offer quality links between documents. This also created data hubs. If you write a blog post about a book, most probably you will link to the web document of that book on Amazon and/or Wikipedia. I believe that this will happen with Linked Data. Once there is an incentive to create quality links, these links will start to show up. And then users will start linking to the data hubs of their interest.

Open Governmental Data is a big issue at the moment. The US and UK governments have started to apply Linked Data principles to turn this vision into reality. Lots of other countries are following. What do you expect from this trend?

I believe that Linked Data will take off thanks to the initiative of governments. We always talk about the chicken and egg problem of the semantic web. Once we have organizations that don’t even think about it and are just interested in putting their data on the web, the semantic web will start to grow. If Bookstore ABC puts its data on the web, it may not be so meaningful. But if the US and UK governments put their data on the web, following the Linked Data principles, then people can wake up and say “ok, so this is for real. Let me start paying attention to this”.

You are one of the chairs of the Triplification Challenge 2010. Can you give us a brief insight what to expect from this year’s challenge? What are the conditions to participate?

The Triplification Challenge this year has grown and is very exciting. For the first time, it is offering two different tracks.

The first track, the Open Track, will accept submissions in three areas: 1) new datasets that are published following the Linked Data principles and that show potential benefit, 2) generic methods, mechanisms and approaches for creating Linked Data from legacy datasets and 3) applications that make use of Linked Data.

The second track is the New York Times track, which will accept submissions of applications that make use of the New York Times Linked Data and one or more government datasets. The objective is to create an application powered by Linked Data that would be of interest to any constituent of that government.

I personally believe that the year 2010 is the year of creating Linked Data applications and the Triplification Challenge is the way to be part of it.

The Open Government Data Meetup in Vienna

April 10, 2010 By: Thomas Thurner Category: Open Government Data, Politics

Show what is possible! As Martin Kaltenböck – one of the organizers of the recently held Semantic Web Meetup on an Austrian Open Government Data Initiative – said, there is a lot of enthusiasm and energy to inform the public and engage politics about the impact an initiative similar to those in the US and UK may have for Austria. And the kick-off was promising. Inspiring talks by Rufus Pollock (UK) and Stefano Bertolo (EU) gave an insight into what is possible in the specific field of Open Government Data, as well as what the start of such an initiative can look like.

As ePSI-Platform wrote in their blog:
“The Austrian Open Data initiative is online and at work.”

The event was very well attended and brought together stakeholders from science, industry and government as well as citizen activists: a promising mélange of people which may carry the project forward to very concrete use cases and trials in the very near future. As the initiative is meant to be carried by a broad group of proponents, the follow-up to the meeting will be a round table talk of those who are willing to contribute to upcoming lighthouse projects and to open concrete sets of government data for them.

The next meeting of the Austrian Open Data Initiative
takes place on the 12th May at 9.30 a.m. in
Room D, quartier 21 of the Vienna Museum Quarter.

Find documentation of the Meetup on Zukunftsweb, browse the picture album or read the conclusions at ePSI-Platform.
