Tassilo Pellegrini

Linked Data in the Content Value Chain or Why Dynamic Semantic Publishing makes sense …

In 2012, Jem Rayfield published an insightful post about the BBC’s Linked Data strategy for the London 2012 Olympic Games. In this post he coined the term “Dynamic Semantic Publishing”, referring to

“the technology strategy the BBC Future Media department is using to evolve from a relational content model and static publishing framework towards a fully dynamic semantic publishing (DSP) architecture.”

According to Rayfield this approach is characterized by

“a technical architecture that combines a document/content store with a triple-store proves an excellent data and metadata persistence layer for the BBC Sport site and indeed future builds including BBC News mobile.”

The technological characteristics are further described as …

  • A triple-store that provides a concise, accurate and clean implementation methodology for describing domain knowledge models.
  • An RDF graph approach that provides ultimate modelling expressivity, with the added advantage of deductive reasoning.
  • SPARQL to simplify domain queries, with the associated underlying RDF schema being more flexible than a corresponding SQL/RDBMS approach (a minimal query sketch follows after this list).
  • A document/content store that provides schema flexibility; schema independent storage; versioning, and search and query facilities across atomic content objects.
  • Combining a model expressed as RDF to reference content objects in a scalable document/content-store provides a persistence layer that uses the best of both technical approaches.
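
To make the SPARQL point above concrete, here is a minimal sketch of such a domain query, issued with the Python SPARQLWrapper library. The endpoint URL and the sport namespace are made-up placeholders, not the BBC’s actual infrastructure.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Hypothetical endpoint and sport ontology namespace -- placeholders only.
    sparql = SPARQLWrapper("http://localhost:3030/sport/query")
    sparql.setQuery("""
        PREFIX sport: <http://example.org/ontology/sport/>
        PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>

        # All athletes competing in a given discipline, together with their teams
        SELECT ?athlete ?name ?team WHERE {
            ?athlete a sport:Athlete ;
                     rdfs:label ?name ;
                     sport:competesIn <http://example.org/discipline/100m> ;
                     sport:memberOf ?team .
        }
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    for row in results["results"]["bindings"]:
        print(row["name"]["value"], "-", row["team"]["value"])

The query follows the shape of the domain model directly, whereas the equivalent relational query would typically have to join several content tables.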

So what are actually the benefits of Linked Data from a non-technical perspective?

Benefits of Linked (Meta)Data

Semantic interoperability is crucial in building cost-efficient IT systems that integrate numerous data sources. Since 2009 the Linked Data paradigm has emerged as a lightweight approach to improve data portability in federated IT systems. By building on Semantic Web standards, the Linked Data approach offers significant benefits compared to conventional data integration approaches. According to Auer [1], these are:

  • De-referencability. IRIs are not just used for identifying entities; since they can be used in the same way as URLs, they also enable locating and retrieving resources that describe and represent these entities on the Web (see the sketch after this list).
  • Coherence. When an RDF triple contains IRIs from different namespaces in subject and object position, this triple basically establishes a link between the entity identified by the subject (and described in the source dataset using namespace A) and the entity identified by the object (described in the target dataset using namespace B). Through these typed RDF links, data items are effectively interlinked.
  • Integrability. Since all Linked Data sources share the RDF data model, which is based on a single mechanism for representing information, it is very easy to attain a syntactic and simple semantic integration of different Linked Data sets. A higher-level semantic integration can be achieved by employing schema and instance matching techniques and expressing found matches again as alignments of RDF vocabularies and ontologies in terms of additional triple facts.
  • Timeliness. Publishing and updating Linked Data is relatively simple, thus facilitating timely availability. In addition, once a Linked Data source is updated it is straightforward to access and use the updated data source, since time-consuming and error-prone extraction, transformation and loading is not required.
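
As a small illustration of the first two points, the sketch below de-references a DBpedia IRI with the Python rdflib library and follows its owl:sameAs links into other namespaces. The choice of Berlin is arbitrary, and the snippet assumes that DBpedia answers the HTTP request with an RDF description of the resource.

    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL

    # De-referencability: the IRI that identifies the entity can also be fetched
    # over HTTP and returns RDF describing that entity.
    berlin = URIRef("http://dbpedia.org/resource/Berlin")

    g = Graph()
    g.parse(berlin)  # content negotiation yields an RDF description of the resource

    # Coherence: typed links such as owl:sameAs connect the entity to descriptions
    # in other namespaces, effectively interlinking the datasets.
    for _, _, target in g.triples((berlin, OWL.sameAs, None)):
        print(target)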

On top of these technological principles Linked Data promises to improve the reusability and richness (in terms of depth and broadness) of content thus adding significant value to the content value chain.

Linked Data in the Content Value Chain

According to Cisco, communication within electronic networks has become increasingly content-centric: for the period from 2011 to 2016, Cisco reports an increase of 90% in video content, 76% in gaming content, and 36% each in VoIP and file sharing transmitted electronically. Hence it is legitimate to ask what role Linked Data plays in the content production process. Here we can distinguish five sequential steps: 1) content acquisition, 2) content editing, 3) content bundling, 4) content distribution and 5) content consumption. As illustrated in the figure below, Linked Data can contribute to each step by supporting the associated intrinsic production function [2].

Figure: Linked Data in the Content Value Chain

  • Content acquisition is mainly concerned with the collection, storage and integration of relevant information necessary to produce a content item. In the course of this process information is being pooled from internal or external sources for further processing.
  • The editing process entails all necessary steps that deal with the semantic adaptation, interlinking and enrichment of data. Adaptation can be understood as a process in which acquired data is provided in a way that it can be re-used within editorial processes. Interlinking and enrichment are often performed via processes like annotation and/or referencing to enrich documents, either by disambiguating existing concepts or by providing background knowledge for deeper insights.
  • The bundling process is mainly concerned with the contextualisation and personalisation of information products. It can be used to provide customized access to information and services, e.g. by using metadata for the device-sensitive delivery of content, or to compile thematically relevant material into Landing Pages or Dossiers, thus improving the navigability, findability and reuse of information (a small query sketch follows after this list).
  • In a Linked Data environment the process of content distribution mainly deals with the provision of machine-readable and semantically interoperable (meta-)data via Application Programming Interfaces (APIs) or SPARQL Endpoints. These can be designed either to serve internal purposes, so that data can be reused within controlled environments (e.g. within or between organizational units), or external purposes, so that data can be shared with anonymous users (e.g. as open SPARQL Endpoints on the Web).
  • The last step in the content value chain deals with content consumption. This entails any means that enable a human user to search for and interact with content items in a pleasant and purposeful way. According to this view, this step mainly deals with end-user applications that make use of Linked Data to provide access to content items (e.g. via search or recommendation engines) and to generate deeper insights (e.g. by providing meaningful visualizations).
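
As a rough sketch of the bundling step, the following query compiles all content items annotated with a given topic into a single dossier, using the Python rdflib library. The file name, namespaces and topic URI are illustrative assumptions, not a prescribed schema.

    from rdflib import Graph

    # Load the (hypothetical) content metadata produced during acquisition and editing.
    g = Graph()
    g.parse("content-metadata.ttl", format="turtle")

    # Bundle every content item tagged with one topic into a dossier.
    dossier = g.query("""
        PREFIX dct: <http://purl.org/dc/terms/>
        SELECT ?item ?title WHERE {
            ?item dct:subject <http://example.org/topics/olympic-games> ;
                  dct:title   ?title .
        }
        ORDER BY ?title
    """)

    for item, title in dossier:
        print(item, "-", title)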

Conclusion

There is definitely a place for Linked Data in the Content Value Chain, hence we can expect that Dynamic Semantic Publishing is here to stay. Linked Data can add significant value to the content production process and carries the potential to incrementally expand the business portfolio of publishers and other content-centric businesses. But the concrete added value is highly context-dependent and open to discussion. Technological feasibility is easily outweighed by strategic business considerations, a lack of cultural adaptability, legacy issues like dual licensing, technological path dependencies or simply a lack of resources. Nevertheless, Linked Data should be considered a fundamental principle of next-generation content management, as it provides a radically new environment for value creation.

More about the topic – live

Linked Data in the content value chain is also one of the topics on the agenda of this year’s SEMANTiCS 2014. Listen to keynote speaker Sofia Angeletou and others to learn more about next generation content management.

References

[1] Auer, Sören (2011). Creating Knowledge Out of Interlinked Data. In: Proceedings of WIMS’11, May 25-27, 2011, pp. 1-8.

[2] Pellegrini, Tassilo (2012). Integrating Linked Data into the Content Value Chain: A Review of News-related Standards, Methodologies and Licensing Requirements. In: Presutti, Valentina; Pinto, Sofia S.; Sack, Harald; Pellegrini, Tassilo (eds.): Proceedings of I-Semantics 2012, 8th International Conference on Semantic Systems. ACM International Conference Proceeding Series, pp. 94-102.

Thomas Schandl

PoolParty 3.0 and its all new Linked Data framework

The new major release of PoolParty boasts new Linked Data capabilities that further unlock the potential the Semantic Web can bring to improve your metadata management, to enhance your data with external knowledge and to ease data integration efforts within your organization and with your partners.

In PoolParty 3.0 we created a Linked Data interlinking editor that makes it easier than ever to add your own lookup and interlinking services (even for non-RDF sources), and we made the Linked Data publishing front-end fully customizable in design, layout and with regard to which parts of your content are displayed.

But let’s start at the beginning:

Step 1 – Hook into the Linked Data Cloud!

In the era of the rapidly growing Linked Data Cloud your knowledge models don’t need to stay isolated from the outside world anymore. Simply use PoolParty’s new and improved lookup service to find matching resources from the Linked Open Data Cloud (e.g. from DBpedia).

Imagine having different data models that all refer to the same product categories and world regions. Once you have them represented in PoolParty, you can use its lookup service to find matching resources from the Linked Data Cloud. In this way you get globally used identifiers for your product categories and regions, usually in the form of a URI like http://dbpedia.org/resource/Berlin. This eases your internal data integration efforts, aids the data exchange with partners or customers and enables hassle-free distributed management of knowledge models.

Image 1: Lookup of concept ‘Austria’ and selection of properties and values to be imported

 

With PoolParty 3.0 we increased the number of included lookup services: DBpedia, Geonames, Wordnet, Umbel, Yago, Freebase, Sindice, dmoz and LCSH. BBC Wildlife, Enis and Gemet are available on request.
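
Under the hood, a lookup of this kind essentially matches a concept’s label against resources in the target dataset. The sketch below does this directly against DBpedia’s public SPARQL endpoint using the Python SPARQLWrapper library; it only illustrates the idea and is not PoolParty’s internal API.

    from SPARQLWrapper import SPARQLWrapper, JSON

    def lookup_dbpedia(label, lang="en"):
        # Return DBpedia resources whose rdfs:label exactly matches the given label.
        sparql = SPARQLWrapper("http://dbpedia.org/sparql")
        sparql.setQuery("""
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            SELECT DISTINCT ?resource WHERE {
                ?resource rdfs:label "%s"@%s .
            } LIMIT 10
        """ % (label, lang))
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()
        return [b["resource"]["value"] for b in results["results"]["bindings"]]

    print(lookup_dbpedia("Austria"))
    # e.g. ['http://dbpedia.org/resource/Austria', ...]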

Step 2 – Pull in Semantic Data!

There is a vast amount of Linked Data out there just waiting to be leveraged for thesaurus creation and extension. To that end we had a close look at our interlinking module and decided to enhance it in a way that makes it more of a Linked Data editor.

Once you have a base thesaurus in PoolParty and have hooked a couple of your concepts into the cloud as described above, you can proceed to pull in the good stuff that comes with the Linked Data resources you have found.

Image 2: Imported Linked Data for concept ‘London’

 

As you can see in the image above, you can extend your local thesaurus with labels, definitions and all kinds of other information, such as, in the case of countries, their population, GDP, spoken languages, famous people born there, newspaper articles related to the political situation, and so on.

PoolParty 3.0 now takes this approach a couple of steps further. Not only can you specify which of your local concepts corresponds to which Linked Data resource and grab all the semantic information that comes with this resource, you are now also able to selectively pick out the data items you are interested in and even transform the predicates they originally came with. Just switch them to whatever custom properties you have created or want to re-use from any ontology (see an example in Image 1).
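
A minimal sketch of what this picking and remapping boils down to, using the Python rdflib library: a value is taken from the linked DBpedia resource and stored in the local thesaurus under a custom property. The local concept URI and the my: namespace are made up for the example.

    from rdflib import Graph, Namespace, URIRef

    MY  = Namespace("http://example.org/thesaurus/prop/")   # custom properties
    DBO = Namespace("http://dbpedia.org/ontology/")

    dbpedia_austria = URIRef("http://dbpedia.org/resource/Austria")
    local_austria   = URIRef("http://example.org/thesaurus/concept/austria")

    # De-reference the linked DBpedia resource ...
    remote = Graph()
    remote.parse(dbpedia_austria)

    # ... pick one value and remap dbo:populationTotal onto the custom property my:population.
    local = Graph()
    local.bind("my", MY)
    population = remote.value(dbpedia_austria, DBO.populationTotal)
    if population is not None:
        local.add((local_austria, MY.population, population))

    print(local.serialize(format="turtle"))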

In this way you can easily enrich your own knowledge models with external information – which in turn can be utilized for better content recommendation, easier data integration and improved search services.

Step 3 – Publish your Linked Data in Style

Previous PoolParty versions already offered the possibility to instantly publish your thesauri, taxonomies or vocabularies and display their concepts as HTML while additionally providing machine-readable RDF versions of them. This means that anyone using PoolParty’s intuitive GUI can become a W3C-standards-compliant Linked Data publisher without having to know anything about Semantic Web technicalities.
Of course you don’t need to publish all your valuable models; just choose the parts that can safely be shared with the public and keep everything else behind your firewall, available only to you and trusted partners!

In this new release of PoolParty the design of all pages on the Linked Data front-end is now under your full control. You can use your own style sheets and create views on your data with Velocity templates. It is even possible to develop project- and thesaurus-specific templates and layouts, so that each can have an individual look and display different predicates and their values.

Take a look at PoolParty’s standard Linked Data front-end!

The following images show a PoolParty default Linked Data page and a custom-made Linked Data page of a PoolParty concept that has some DBpedia info imported.

Image 3: PoolParty default Linked Data page

PoolParty Linked Data page of ScOT thesaurus courtesy of Educational Services Australia
Image 4: Custom Linked Data page of ScOT thesaurus (courtesy of Educational Services Australia)

 

Step 4 – Unlock new Linked Data Sources

With PoolParty 3.0 you are in no way limited to DBpedia, Freebase, Geonames and the other lookup services that PoolParty provides out of the box: you can add your own non-Semantic Web data sources to the mix, thereby enabling you to boldly go where no Linked Data tool has gone before.

Maybe you have a product thesaurus and want to specify which products are related to patents that can be found with Google Patents?
Or you want to interlink concepts from a company taxonomy with related articles from the Guardian’s search service or any other newspaper that provides a search API?

None of these sources is available as RDF, so how can you easily re-use them as data sources for Linked Data style interlinking? For such cases PoolParty introduces the Unified Lookup API, which makes it easy to turn almost any third-party Web API into a source for interlinking your concepts with third-party resources as described above.

This makes it possible to interlink your concepts with many kinds of data out there, be it New York Times articles, UN data, synonym services, abbreviations, press releases, juridical information – or any web API important for your knowledge domain.
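
The general pattern behind such a lookup over a non-RDF source is simple: call the third-party JSON API and return candidate labels and URLs that a concept can be interlinked with. The sketch below uses the Guardian’s public Content API as an assumed example; endpoint, parameter names and response fields may differ for the API you actually wrap, and the snippet illustrates the idea rather than PoolParty’s own implementation.

    import requests

    def guardian_lookup(term, api_key):
        # Return (title, url) candidates for a search term from a plain JSON search API.
        response = requests.get(
            "https://content.guardianapis.com/search",   # assumed endpoint
            params={"q": term, "api-key": api_key},
            timeout=10,
        )
        response.raise_for_status()
        items = response.json()["response"]["results"]
        return [(item["webTitle"], item["webUrl"]) for item in items]

    for title, url in guardian_lookup("semantic web", api_key="YOUR_KEY"):
        print(title, "-", url)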

That being said, if you have suggestions for additional lookup services that you think are interesting, let us know!

To gain a first hand impression of the new PoolParty just apply for a demo account!

Thomas Schandl

Drupal and the Semantic Web – Interview with Stéphane Corlosquet

Stéphane Corlosquet has been the main driving force in incorporating Semantic Web capabilities into Drupal. In the recent release of Drupal 7, Semantic Web technologies became part of the core of this popular CMS, which is used to power at least 1% of all the world’s web sites.

Drupal is the leading CMS when it comes to implementing Semantic Web standards. What are the reasons for this, what makes Drupal such a good fit for Semantic Web technologies?

Historically, Drupal is known to be web-standards compliant. It supported the RDF-based aggregation format known as RSS 1.0 as early as 2001, which was later upgraded to RSS 2.0. The Drupal community prides itself on valid HTML code, not only for the code generated by Drupal, but also by taking the extra step of automatically fixing faulty HTML entered by its users. Drupal has been using XHTML since its version 4.0 in 2002. The next logical step beyond XHTML was to add a layer of semantics with the RDFa standard, a W3C recommendation published in 2008.

There are definitely many reasons that contributed to the addition of RDFa to Drupal 7. The first comes from the Drupal project lead, Dries Buytaert, who is passionate about the web and open source. Secondly, the growing Drupal community is very web savvy and includes many experts from different backgrounds in accessibility, CSS, HTML, security etc. As a result, every release of Drupal includes many of the latest standards. The community meets twice a year at conferences (DrupalCons), and these events play a great role in hashing out which technologies or designs will be incorporated into the next version of Drupal. Because of the flexibility of its internal architecture, Drupal is able to keep up with the latest web standards. Content in Drupal is very structured and provides site administrators with a user interface to build the site structure they want, using entity types, content types, fields and taxonomies for categorization. When it comes to other CMSs, Joomla!’s community appears to be more fragmented, with core software that is not as extensible as Drupal’s, and WordPress is more of a blogging platform, so turning it into a full-blown CMS can be challenging. Both WordPress and Joomla! are in fact adapting the concept of Drupal’s Content Construction Kit (CCK) to their software, but they have not yet reached the same level of maturity as Drupal.

A common objection to the adoption of Semantic Web technologies is that the learning curve is steep and that it is too complicated for many web developers to get into. How can Drupal 7 change that? Which features accessible to the average web site operator will it offer?

Semantic Web technologies don’t have to be complicated when applied to simple use cases! We purposely chose only a subset of Semantic Web technologies to integrate into the core of Drupal, keeping the learning curve for Drupal developers and users as low as possible. The main technology is RDFa, which includes the notions of vocabularies (a schema, or collection of attributes) as well as Compact URIs (CURIEs), which make the authoring of RDFa easier. In fact, some web developers might have come across these notions before when working with Dublin Core in meta tags such as dc:title or dc:date.
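
To illustrate, RDFa attributes such as property="dc:title" simply express plain triples about the page. The sketch below builds the equivalent statements with the Python rdflib library; the node URL and values are hypothetical, and dc: is assumed to map to Dublin Core terms as in Drupal 7’s default prefixes.

    from rdflib import Graph, Literal, Namespace, URIRef

    # Assumption: dc: maps to Dublin Core terms, as in Drupal 7's default prefixes.
    DC = Namespace("http://purl.org/dc/terms/")

    page = URIRef("http://example.com/node/1")   # hypothetical Drupal node URL

    g = Graph()
    g.bind("dc", DC)
    # What <h1 property="dc:title">My article</h1> and
    # <span property="dc:date" content="2011-02-14">...</span> would express:
    g.add((page, DC.title, Literal("My article")))
    g.add((page, DC.date, Literal("2011-02-14")))

    print(g.serialize(format="turtle"))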

Which benefits will web site owners get when they switch to a semantics enabled Drupal 7?

Google and Bing increasingly rely on machine-readable structured data from the websites they crawl. Drupal 7 embeds semantic metadata by design, which makes machine-to-machine (M2M) search native for a Drupal 7 website. RDFa can add value by giving search engines more details, such as the latitude and longitude of a venue for display on a map, or by providing the ISO date format for localization and proper display in search results for different countries.

What are your hopes regarding the development of other applications that either provide or consume data from D7 sites? Which improvements of standards, best practices or (lightweight) ontologies in the Semantic Web community would you like to see?

Services like Sig.ma are already able to collect semantic data from different sources and display it in new ways in the form of mash-ups. Eventually, these services that consume semantic data will not be just Drupal-specific, as more platforms jump on the Semantic Web bandwagon. What I hope to see as improvements or best practices in the future are more well-maintained vocabularies. Many of the existing vocabularies are over-engineered, and some fail to de-reference properly. There is also some work to be done to improve the tooling made available to web developers, as well as to introduce the simple concepts of Linked Data to web developers via easy-to-read documentation.

Thank you for this interview, Stéphane!

Thomas Thurner

Report on developments at the European Semantic Technology Market

The present state of development, future trends and expected market scenarios for Semantic Technologies are presented in the just published “Demand driven Mapping Report”. The report is part of the EU-funded project Value It, which is about bringing together the various stakeholders within the sector: industry, research and government. VALUE-IT’s preliminary findings show that the potential market for Semantic Technologies in Europe will reach up to €1.44B by 2014. Scanning the executive summary of the report, some further findings attract attention:

The survey results also show considerable variation by sector, both of policy and technology implementation. With respect to technologies, ICT companies are also the most willing to consider semantic approaches. The ICT sector has an unusually high interest in all ST components, with 20% or more being willing to consider all of them, and over half of IT respondents looking at Web 2.0 (social computing). […]  The use of tagging technologies – which overall is the least mature approach in the survey – is most advanced in Life Sciences. The Life Sciences, Media & Entertainment, and ICT sectors all have a reasonably strong interest in Natural Language Processing (roughly 25% on average). Ontologies and RDF/OWL are the technologies least often considered, though the interest in these Semantic Technologies is not insignificant. Taxonomies are slightly more popular, perhaps indicating that companies are taking the first step to prepare for a more semantic approach to IT solutions. The ICT, Energy & Utilities, and Media & Entertainment sectors all have a reasonably strong interest in using taxonomies.

The 190-page report gives an up-to-date overview of the status quo of the European Semantic Technology market and is now available for download: Final demand driven mapping Report