Andreas Blumauer

The LOD cloud is dead, long live the trusted LOD cloud

The ongoing debate around the question of whether ‘there is money in linked data or not’ was recently given a more pointed formulation by Prateek Jain (one of the authors of the original article): he asks ‘why linked open data hasn’t been used that much so far besides for research projects?’.

I believe there are two reasons (among others) for the low uptake of LOD in non-academic settings which have not yet been discussed in detail:

1. The LOD cloud mainly covers ‘general knowledge’, in contrast to ‘domain knowledge’

Most organizations live on their internal knowledge, which they combine intelligently with very specific (and most often publicly available) knowledge and data. They would therefore benefit from LOD only if their specific domains were covered. A frequently quoted ‘best practice’ for LOD is the portion of datasets available at Bio2RDF. This part of the LOD cloud has been used again and again by the life sciences industry, thanks to its specific information and its highly active maintainers.
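
(As a side note, this is how such a domain-specific part of the LOD cloud is typically consumed. Below is a minimal sketch in Python using SPARQLWrapper; the Bio2RDF endpoint URL is an assumption on my part, since public endpoints and their addresses change over time.)

```python
# Minimal sketch: consuming domain-specific LOD via a public SPARQL endpoint.
# The endpoint URL below is a placeholder assumption -- Bio2RDF endpoints
# have changed over time, so adjust it to a currently available one.
# Requires: pip install SPARQLWrapper
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://bio2rdf.org/sparql")  # assumed endpoint
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```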

We need more ‘micro LOD clouds’ like this.

Other examples are the German Library Linked Open Data Cloud (thanks to Adrian Pohl for this pointer!) and the Clean Energy Linked Open Data Cloud:

[Figure: the Clean Energy Linked Open Data Cloud (reegle)]

I believe that the first generation of the LOD cloud has done a great job. It visualised the general principles of linked data and communicated the idea behind them. It even helped – at least in its very first versions – to identify potentially interesting datasets. And most of all, it showed how fast the cloud was growing and attracted a lot of attention.

But now it’s time to clean up:

A first step should be to make a clear distinction between the parts of the LOD cloud that are open and those that are not. Datasets without licenses should be marked explicitly, because it is those – not the explicitly non-open ones – that are most problematic for commercial use.

A second improvement would be to make some quality criteria clearly visible. I believe the most important of these concern maintenance and authorship: who takes responsibility for the quality and trustworthiness of the data? Who exactly is the maintainer?
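
(How could such criteria be made machine-readable? One established option is a VoID description for each dataset, carrying an explicit license and an explicit publisher. The following is only a minimal sketch in Python with rdflib; the dataset URI, the license chosen and the maintaining organization are placeholders invented for illustration.)

```python
# Minimal sketch: a VoID dataset description that makes license and
# maintainer explicit and machine-readable. All URIs below are placeholders.
# Requires: pip install rdflib
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, FOAF, RDF

VOID = Namespace("http://rdfs.org/ns/void#")

g = Graph()
g.bind("void", VOID)
g.bind("dcterms", DCTERMS)
g.bind("foaf", FOAF)

dataset = URIRef("http://example.org/dataset/clean-energy")      # placeholder
maintainer = URIRef("http://example.org/org/some-organization")  # placeholder

g.add((dataset, RDF.type, VOID.Dataset))
# An explicit license statement -- without one, commercial reuse is blocked.
g.add((dataset, DCTERMS.license,
       URIRef("http://creativecommons.org/licenses/by/4.0/")))
# An explicit, identifiable maintainer: a legal entity, not an anonymous account.
g.add((dataset, DCTERMS.publisher, maintainer))
g.add((maintainer, RDF.type, FOAF.Organization))
g.add((maintainer, FOAF.name, Literal("Some Organization (placeholder)")))

print(g.serialize(format="turtle"))
```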

This brings me to the second and most important reason for the low uptake of LOD in commercial applications:

2. Most datasets in the LOD cloud are maintained by a single person or by nobody at all (at least according to datahub.io)

Would you integrate a web service provided by a single, possibly private person into a core application of your company? Wouldn’t you prefer to work with data and services provided by a legal entity with a high reputation, at least in its own knowledge domain? We all know that data has very little value if it is not maintained professionally. A ‘good practice’ example is the Integrated Authority File provided by the German National Library: a trustworthy source, isn’t it? And one we can expect to be maintained in the future.

It is not only the data that is linked in an LOD cloud; above all, it is the people and organizations ‘behind the datasets’ that are linked, and that will cooperate and communicate on the basis of their datasets. On top of their joint data infrastructure they will create efficient collaboration platforms, like the one in the area of clean energy – the ‘Trusted Clean Energy LOD Cloud’:

[Figure: reegle.info trusted links]

REEEP and its reegle-LD platform have become a central hub in the clean energy community – not only data-wise, but also as an important cooperation partner in a network of NGOs and other stakeholders that promote clean energy globally.

Linked Data has become the basis for more effective communication in that sector.

To sum up: for LOD to be interesting for use beyond research projects, datasets should be specific and trustworthy (another example is the German labor law thesaurus by Wolters Kluwer). I am not saying that datasets like DBpedia are dispensable – they serve as important hubs in the LOD cloud – but for non-academic projects based on LOD we need an additional layer of linked open datasets: the Trusted LOD cloud.


Andreas Blumauer

There’s Money in Linked Data

I believe the ongoing debate over whether there ‘is money in linked (open) data or not’ is a bit misleading. ‘Linked (open) data’ is not only the data itself; it is much more, even more than yet another technology stack. Linked data is above all a set of principles for organizing information in agile organizations embedded in fast-moving, dynamic environments. From this perspective there is a huge amount of money in it – but let me refine that a bit later.


Crying out in 2013 that ‘there is no money in linked data’ is a step in the right direction, because it points out that data publishers should be more precise about data licensing. Quite flexible licensing models already exist; it is the people (and other legal entities) who forget to publish their data together with statements about its openness. As a result, the data remains closed to commercial users. This wasn’t properly noticed in the early days of the linked open data cloud, since commercial users weren’t around at all (in contrast to academic institutions, which considered the LOD cloud a wonderful playground). The same holds for linked data as a technology and as a set of standards: the standards and the technology stack are mature now (just think of Virtuoso’s brilliant SPARQL performance, for example), but most people in IT still wouldn’t think of URIs, RDF and SPARQL off the top of their heads when seeking powerful data integration methodologies.

Why is that?

I believe that so far, people outside the linked data core community have perceived ‘linked data’ only as a new way to organize data on the web – and have therefore assumed that its technologies are not yet mature for enterprise use.

But the truth is that linked data has at least a threefold nature. Linked data is

  1. a method for organizing information in general, not only on the web but also in enterprises
  2. a set of standards flexible and expressive enough to link data across boundaries (organizational, political, philosophical), cultures and languages
  3. a way of using IT and information that is quite intuitive – close to the patterns by which human beings construct reality – and thus comprehensible for non-techies as well.

I think technologists have done a brilliant job so far in creating the linked data technology stack: its underlying standards, triple stores and quad stores, reasoners, etc. For specialists it is absolutely clear why these technologies will far outperform traditional databases, BI tools, search engines and the like.

But the crucial point now is that enterprises have to adopt linked data technologies inside their corporate boundaries (and not only for SEO purposes or the like). The key question is not whether there is enough LOD out there for app makers; high-quality LOD will be produced very quickly as soon as there are commercial consumers like large enterprises. And I am not talking about use cases for linked data in the fields of data publishing or SEO.

The main driver for the further Linked Data development will be enterprises which embrace LD technologies for their internal information management.

It is true that some large companies (like Daimler – meet them at this year’s I-SEMANTICS in Graz!) are already dealing with that question, but to be honest, there is not the same hype around ‘linked data’ as around ‘big data’. IBM, Microsoft & Co. are not that interested in linked data, of course, because it is a platform in itself and offers no lucrative lock-in effects. Internet companies like Google and Facebook make use of linked data quite hesitantly: although Facebook’s Graph Search and Google’s Knowledge Graph contain large portions of this kind of technology, Google would never say, ‘oh, we are a semantic web company now, we make heavy use of linked data, and of course we will also contribute to the LOD cloud.’

Why is that? Simply put: through the glasses of Google, Facebook & Co., the internet is a huge machine that produces data for them – not the other way around.

But shouldn’t enterprise customers themselves be interested in cost-effective information management? They are – but as stated before, they haven’t yet perceived linked data as such, although it clearly is.

To develop technologies we need critical questions, and of course the most critical ones always come from inside a community or movement. But the time has come to spread the good news to the ‘outside’:

  • Yes, databases that rely on linked data standards have matured and now perform well enough that, for many query types, they outperform even ‘traditional’ relational databases
  • Yes, issues critical for enterprise usage, such as privacy and security, have been addressed by most linked data technology vendors
  • Yes, there is a critical mass of available LOD sources (for example, UK Ordnance Survey) and of high-quality thesauri and ontologies (for example, Wolters Kluwer’s labor law thesaurus) to be reused in corporate settings
  • Yes, there are enough developers and consultants on the labor market (in the U.S. as well as in the E.U.) to execute large linked data projects
  • Yes, there are tons of business cases that can benefit from linked data; linked data and semantic web technologies should be considered core technologies for any information architecture, at least in larger corporations
  • Yes, SPARQL is not merely a second SQL; it comes with brilliant features like transitive queries (see the sketch after this list), which save a lot of time when developing applications such as business intelligence reporting and analysis
  • Yes, linked data has the potential to become the basis for a wide variety of tools that help decision-makers (not only in enterprises but also in politics) become true ‘digerati’ instead of being reduced to masters of ‘bullshit bingo’.
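
(To make the point about transitive queries concrete, here is a minimal sketch in Python with rdflib; the tiny SKOS thesaurus and all names in it are invented for illustration.)

```python
# Minimal sketch: a 'transitive query' via a SPARQL 1.1 property path.
# The tiny SKOS thesaurus and all names below are invented for illustration.
# Requires: pip install rdflib
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix ex:   <http://example.org/thesaurus/> .

    ex:SolarEnergy   skos:broader ex:RenewableEnergy .
    ex:Photovoltaics skos:broader ex:SolarEnergy .
""", format="turtle")

# 'skos:broader+' follows the broader relation transitively, to any depth --
# something plain SQL would typically need a recursive query for.
results = g.query("""
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX ex:   <http://example.org/thesaurus/>

    SELECT ?narrower WHERE {
        ?narrower skos:broader+ ex:RenewableEnergy .
    }
""")
for row in results:
    print(row.narrower)  # prints ex:SolarEnergy and ex:Photovoltaics
```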

Yes, this list can be extended further, and it is a core element of the further expansion of the LOD cloud. It is the enterprises that will drive the next level of maturity of the linked data landscape – because at the end of the day, it is only they who will pay, or have already paid, the bill for open (government) data.

Thomas Thurner

Free Webinar: Linked Data for the Environmental Sector – Use Cases and Opportunities

Organizations working in the environmental sector most often act as intermediaries between politics, the economy and citizens. They are outgrowing their role as mere content providers: to serve the demands of their stakeholders, they also have to act as data and tool providers for their respective communities.

On June 13, this webinar will introduce several good-practice examples of achieving data governance using linked open data paradigms. Together with a basic overview of the possibilities of linked open data, you will get an appealing picture of the new opportunities these principles and technologies provide – also for your organisation!

Register Now!

Learn more about three organizations and their linked data projects

Global Buildings Performance Network (GBPN)

GBPN established the “Policy comparative tool on building stock data”, together with a domain-specific thesaurus that is also used for a domain-specific news aggregator.

Renewable Energy and Efficiency Partnership (REEEP)

As one of the pioneers in the sector, REEEP has an extensive focus on the use of linked data for renewable energy and energy efficiency, putting it to work in various services: an automatic annotation service, aggregated country data presented as fact sheets, a domain-specific search engine, and more.

Austrian Geological Survey (GBA)

The main driving factor for institutions like the GBA to invest in thesaurus and taxonomy projects is the increasing need for a uniform description of their data, the idea being that this enhances the value and reusability of their products for their stakeholders. Especially in the geospatial sector, the INSPIRE directive of the European Parliament and Council gave a push in that direction; as a public authority, the GBA was legally obliged to implement the directive for its domain.

Presenters in this Webinar

  • Martin Kaltenböck (SWC)
    CFO and Project Lead at Semantic Web Company for Data Portal Solutions
  • Florian Bauer (REEEP)
    Operations and IT Director of REEEP as well as of the clean energy information portal www.reegle.info
  • Andreas Blumauer (SWC)
    CEO and Evangelist for Linked Data and SKOS based Thesaurus Management

Register for Free!


Martin Kaltenböck

Linked (Open) Data has reached the European publishing industry – but is it the ‘real’ Linked Data? A short review of the Publishers’ Forum 2013

Invited by Helmut von Berg, Director at Klopotek & Partner (Klopotek is THE European vendor of publishing production software), I had the chance to participate and speak at this year’s Publishers’ Forum 2013 at the Concorde Hotel in Berlin on 22–23 April 2013.

Coming from the semantic web / linked (open) data community to this publishing industry event, with about 320 participants (mainly decision makers) from small to huge publishers all across Europe, I was really curious in the run-up to the Forum: what would be the most important issues for innovative publishing processes, and what would be the hypes and hopes of a sector in the middle of a big change, coming from paper publishing straight into the world of today’s data economy?

And then, in Berlin on Monday morning, the big surprise: the opening keynotes by David Worlock, Outsell, UK (‘The Atomization of Everything’) and Dan Pollock, Nature Publishing Group, UK (‘Networked Publishing is Open for Business’) already mentioned topics such as the Semantic Web, Linked (Open) Data and even RDF and triple stores – pointing out, not least, that publishers’ content needs to be atomized down to the ‘data level’, where it can then be used successfully for new and innovative business models to serve existing and future customers…

David Worlock ‘singing my song’ at the Publishers’ Forum 2013

Having participated in the European Data Forum 2013 (EDF2013) just a few days before the Publishers’ Forum, my first thought was: WOW – publishers have arrived in the modern data economy (already following the data value chain)! I enjoyed talking to David Worlock during the coffee break, telling him my thoughts and that I would run a workshop on ‘Enterprise Terminology as a basis for powerful semantic services for publishers’ that afternoon (see the slides on SlideShare), and his answer was: ‘Yes, Martin, it seems that I was singing your song.’

The following one and a half days of the Publishers’ Forum 2013 were full of presentations, workshops and discussions about innovative publishing processes, new business models for publishers, and innovative approaches and services – full of terms well known to me, like metadata management, semantics, contextualisation and, very often, Big Data and Linked (Open) Data. I listened very carefully to all of this, and at some point it was clear that the discussion needed closer evaluation: many of the talks and presentations used the terms, principles and technologies mentioned above only as marketing buzzwords, and a deeper look showed that no semantic web technology was actually in place!

Hey, Linked Data does NOT mean establishing some relation or link between ‘an author and a publication’ inside a repository or database – Linked (Open) Data is a well-established, well-specified methodology based on W3C semantic web standards:

Tim Berners-Lee outlined the four principles of linked data in his ‘Design Issues: Linked Data’ note as follows (a small sketch of them in practice follows the list):

  • Use URIs to denote things.
  • Use HTTP URIs so that these things can be referred to and looked up (“dereferenced”) by people and user agents.
  • Provide useful information about the thing when its URI is dereferenced, leveraging standards such as RDF*, SPARQL.
  • Include links to other related things (using their URIs) when publishing data on the Web.
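
(As a small illustration of these principles in practice – my sketch, not part of Tim Berners-Lee’s text – here is how an HTTP URI is dereferenced and the RDF behind it read back, using a DBpedia resource as the example thing.)

```python
# Minimal sketch of principles 2-4: dereference an HTTP URI and get back
# useful RDF about the thing, including links to other related things.
# Requires: pip install rdflib (and network access at run time)
from rdflib import Graph, URIRef

thing = URIRef("http://dbpedia.org/resource/Berlin")

g = Graph()
# rdflib content-negotiates for an RDF serialization when fetching the URI.
g.parse(thing)

# Print a few statements about the thing; many objects are themselves URIs,
# i.e. links to other related things (principle 4).
for i, (p, o) in enumerate(g.predicate_objects(subject=thing)):
    print(p, o)
    if i >= 9:
        break
```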

Please read in more detail here:

Being a bit of an evangelist for Linked (Open) Data, I think such hype can be very dangerous for the publishing industry. I see a very strong need for these companies to adopt innovative content and data management approaches very quickly, to ensure competitiveness today as well as competitive advantage tomorrow – but not using the respective standards (that is, merely branding the packaging and marketing brochures with them) cannot fulfil those hopes in the mid and long term!

So I would like to point out that ‘Linked Data’ is not always ‘Linked Data’. I strongly recommend taking a look at the well-proven standards and, when selecting IT consultants and vendors (that is, your IT partners – another interesting message taken home from the Forum was that publishers and IT vendors should cooperate more closely in the future, in sustainable partnerships), making sure that these partners have already worked, and continue to work, with these standards and mechanisms!

Christian Dirschl (Wolters Kluwer) presenting the WKD Use Case on Enterprise Terminologies

By the way, I had a great workshop on Monday afternoon together with Christian Dirschl from Wolters Kluwer Germany (WKD), discussing applications on top of enterprise terminologies (controlled vocabularies built on real linked (open) data principles). And the Semantic Web Company (SWC) is already a partner of the publisher WKD – a partnership that seems to become more fruitful and sustainable every day, using real linked (open) data…