Semantic Web Company

The Semantic Puzzle

Open World Assumptions


Why SKOS thesauri matter – the next generation of semantic technologies

August 31, 2010 By: Andreas Blumauer Category: Search Engines, Semantic Web Applications, Text Mining, Tools & Software

In fact, many “semantic technologies” on the market still do nothing more than pure statistical analysis of text. This is certainly better than simple full-text search, but there is still plenty of room to improve search, especially for more sophisticated applications such as “similarity search”: finding similar documents to enable cross-reading or recommendation systems.

Providers of first-generation semantic technologies calculate rather basic “semantic networks” by co-occurrence analysis, which sometimes produces disappointing results. Bearing in mind that Google just bought a company (“Google buys Metaweb”) which has been working on one of the largest knowledge bases in the world, we can assume that some of the last miles towards a semantic search engine can be covered by applying thesauri or other structured knowledge bases.

The PoolParty team recently developed a demo application that shows how thesauri improve search results on top of second-generation semantic technologies. With PoolParty, SKOS-based controlled vocabularies can be managed and enriched with linked data. The PoolParty Tag & Content Recommender analyzes virtually any text or website and recommends corresponding tags and concepts from (in this case) STW (Standard Thesaurus für Wirtschaft) and DBpedia, together with the respective articles from Wikipedia.

STW, which was developed by the German National Library of Economics (ZBW), provides vocabulary on any economic subject: about 6,000 standardized subject headings and about 18,000 entry terms to support individual keywords.

This background knowledge is used in this demo app to improve the search for similar documents dramatically:

Similarity between two documents can be calculated not only on a key-phrase basis but also on a rather conceptual basis. Even if two documents do not have one single word or phrase in common they can be identified as “similar documents”.

This can be achieved because thousands of important relations between economic subjects are represented in the domain specific thesaurus. Thus, in this special case best results are achieved with documents from economics (for instance from Econstor) but of course for other recommender systems thesauri from other domains can be used instead of STW.
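The idea of concept-based similarity can be illustrated with a minimal sketch. It is not the PoolParty implementation: the tiny thesaurus, the concept identifiers, and the naive substring matching are all assumptions made up for this example. Entry terms are mapped to concepts, each concept is expanded with its broader concepts, and the resulting concept sets are compared with the Jaccard coefficient.

```python
# Toy thesaurus: entry term -> concept id. All entries are invented for
# illustration and are NOT taken from STW.
ENTRY_TERMS = {
    "inflation": "c:inflation",
    "price increase": "c:inflation",
    "central bank": "c:central_bank",
    "monetary authority": "c:central_bank",
    "interest rate": "c:interest_rate",
    "base rate": "c:interest_rate",
}

# Concept -> broader concepts (in the spirit of skos:broader links).
BROADER = {
    "c:inflation": {"c:monetary_policy"},
    "c:central_bank": {"c:monetary_policy"},
    "c:interest_rate": {"c:monetary_policy"},
}


def extract_concepts(text):
    """Find concepts via naive substring matching, then add broader concepts."""
    lowered = text.lower()
    found = {concept for term, concept in ENTRY_TERMS.items() if term in lowered}
    expanded = set(found)
    for concept in found:
        expanded |= BROADER.get(concept, set())
    return expanded


def jaccard(a, b):
    """Jaccard coefficient of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def word_similarity(text_a, text_b):
    """Similarity based on shared surface words only."""
    return jaccard(set(text_a.lower().split()), set(text_b.lower().split()))


def concept_similarity(text_a, text_b):
    """Similarity based on shared thesaurus concepts."""
    return jaccard(extract_concepts(text_a), extract_concepts(text_b))


doc_a = "Price increase worries the monetary authority"
doc_b = "Central bank reacts to inflation"

print(word_similarity(doc_a, doc_b))     # no word in common
print(concept_similarity(doc_a, doc_b))  # same concepts behind different words
```

The two sample documents share no word at all, yet both resolve to the same concepts. A real system would use proper term extraction and weighting instead of substring checks.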

Nevertheless, this approach can be improved as well, and this development is underway: SKOS thesauri enriched with Linked Data do an even better job. Third-generation semantic technologies of this kind are currently being developed by the LASSO project and the LOD2 project, two innovative projects in the area of linked data and the semantic web.


The Semantic Web journal – half a year later

August 03, 2010 By: Pascal Hitzler Category: Literature & Publications

The journal “Semantic Web – Interoperability, Usability, Applicability” – in short: the Semantic Web journal – was launched 7 months ago, sporting a transparent open review process. Pascal Hitzler is one of the Editors-in-Chief (the other one is Krzysztof Janowicz). He answers some questions on the motivation, setup, and future plans of the journal. (Pascal also wrote the questions and this intro, so it’s really a fake interview. But it seemed an appropriate literary form …)

Question: Why did you launch yet another journal on Semantic Web?

Hitzler: Because the community is growing and the need for publication outlets grows with it. I heard the objection that there weren’t enough quality papers for all the journals, but I don’t think so. It’s just that most of the quality papers still end up in journals which are not dedicated to the Semantic Web as such.

Personally, my desire to start a new journal began when I wanted to do a special issue on Semantic Web reasoning in some other, established journal, and the Editors-in-Chief basically replied with a lapidary “Is there anything to report?” I didn’t push the case back then (though I probably should have). But this and similar experiences made me think about scientific publishing from a different angle, a normative one: What should scientific publishing in our field look like? The journal gives me a chance to realize some of my answers – or at least to take a few steps in the right direction. So when the opportunity arose to set up this journal with a well-known publishing house (IOS Press) and with a co-Editor-in-Chief (Krzysztof Janowicz, a strong proponent of open and transparent reviewing) who I knew would also put a maximum of energy into the venture, it was simply too good an opportunity to pass up. However, I also realize that the reality of scientific publishing can change only slowly, and that it needs time and gradual improvements. We can’t do it all at once.

Question: Your journal uses an open review process. What is that and why?

Hitzler: Open reviewing, in the sense we use it for the Semantic Web journal, is all about transparency. Submitted papers are made publicly available. Solicited reviews are made publicly available. Anybody else can additionally contribute a public review. Reviewers are publicly known by name. Discussions between reviewers and authors can (and should) happen in public. Reviewers and editors are acknowledged by name in the published versions of the papers.

The obvious reason for setting up an open review process is to improve the quality of the decision-making process. We have to realize that some persisting habits about reviewing have their origin in times when scientific publishing was made for a small expert audience, and had to be conducted by sending manuscripts and letters by conventional mail. Today, however, reviewing and publishing is inflationary, which substantially reduces the quality of the typical paper – and of the typical review. While we cannot simply reverse this trend, we can take advantage of the World Wide Web to counteract these developments and improve quality by bringing the review process out into the public space. Reviewers will put more effort into providing constructive reviews if they publicly sign their reviews. Open and public discussions on controversial submissions minimize errors in the decision making.

Personally, I also hope that the ensuing discussions will help to bring back a scientific tradition which has long been on the decline in our field: controversial but constructive discussion. Regretfully, these days we somehow tend to mainly present incremental results, bash opposing opinions, and sugarcoat our own …

Question: Past attempts to set up open reviewing for journals have failed …

Hitzler: Yes, I remember seeing some of these early attempts many years ago when I was a PhD student. Even back then I was doubtful if the sometimes rather radical setups had a chance. In the meantime, there is growing experience in other fields that open reviews can work out if set up carefully. In our case, we mix old-style with open, by still soliciting reviews, and by giving solicited reviewers the option to stay anonymous, if they see a need for this protection. We Editors-in-Chief also “steer” the journal in the sense that we have rather clear strategic targets, e.g. in terms of scope and quality, which we’re trying to meet. In short: rather than experimenting with radical changes, we mildly introduce a new but essential component – open reviewing – in a traditional scientific publishing process. That way, it will work.

Question: But isn’t anonymous reviewing necessary to protect the reviewers and in order to get objectively critical reviews?

Hitzler: Sometimes. That’s why it’s good that solicited reviewers can opt to stay anonymous. Open reviewing – like any form of assessment in science – isn’t perfect, and has its drawbacks. However, the current reality in Computer Science is that reviewing processes are often extremely poor and decision processes are not very transparent. For conferences, reviewer discussions and rebuttal phases were introduced some time ago to improve the decision making. Open reviews simply go a step further.

Question: Aren’t potential authors afraid of getting a public bashing in the review process?

Hitzler: Reviewers typically won’t bash if they sign with their name. And in fact, we monitor the reviews in order to make sure that they adhere to a certain minimal scientific standard. At the same time, it’s probably just as well if our public process makes people more reluctant to submit papers which are not yet mature enough for publication. We wouldn’t want to publish them anyway. And in order to protect authors of rejected submissions, we actually remove the corresponding papers and reviews from the website after some time.

While I understand that some people may be more reluctant to put their work out in the open before it’s been accepted through a review process, we have to be aware that many quality journal publications, like the ones we’re striving for, are extended versions of high-quality conference publications: so they have indeed already been through a review process. Furthermore, submitting to our journal gives added visibility for the work, since it’s up for public review on our website.

Question: Your journal also publishes papers which are not standard research papers. Aren’t you compromising scientific rigor by doing this?

Hitzler: Times are changing. The prime purpose of a scientific journal is to disseminate results to other researchers, and to do so through a quality filter. Traditionally, this dissemination was restricted to focused research contributions, targeted at other researchers working in the same narrow area as the author(s). Semantic Web as a field, however, is extremely diverse and comprises researchers and practitioners from many other communities. Consequently, high-quality tools, systems, ontologies, introductory surveys and application reports are very much needed for the dissemination of advances in our field to all interested parties. As for research papers, the role of the journal for these other types of papers is primarily quality assurance. And consequently, we have clearly formulated the evaluation criteria for different types of papers. A report on a high impact tool, for example, is thus not a direct research contribution in the traditional sense. But if the tool enables further developments in the field, then it is worth reporting, and it indirectly makes a contribution to scientific progress.

Question: Why are you still publishing through a commercial publishing house?

Hitzler: Because it helps. A lot. It’s easy to underestimate the amount of work which needs to be put into running a journal, and going with a commercial publisher relieves the Editors-in-Chief and the Editorial Board of a lot of tasks which are not directly related to quality assurance. Open review does not mean that this kind of professional support is no longer needed. And we are glad that we have found a publishing house which is very accommodating to our ideas.

Question: What are the plans for the immediate future?

Hitzler: We currently have more than 30 papers up for review, most of them responses to two recent calls, one on tools and systems papers and one on applications of OWL, and some of the submissions seem rather prominent. We also have several special issues lined up, most of which have not been announced yet. The first issue will appear towards the end of the year and contain vision statements by the EB members. We do not normally publish vision statements, but this seemed an appropriate way to introduce the journal. Considering that the journal was launched only 7 months ago, this means that we are already well under way in pursuing our goal of establishing a high-quality scientific outlet in the field.

[author: Pascal Hitzler]


What if the biggest web company bought one of the central semantic web players?

July 17, 2010 By: Andreas Blumauer Category: Companies & Institutions, Search Engines

Well, exactly this happened yesterday: Google bought Metaweb, the provider of Freebase. Freebase is an important hub in the linked data cloud, providing 12 million entities with uniform resource identifiers, most of them linked to other semantic web datasets like DBpedia or the New York Times. For example, Google’s page on Freebase offers a rich source of machine-readable facts about this company.

What does this mean for the Semantic Web community, which has been working on a smarter web for the last decade?
Well, a lot… First of all, it’s good to hear that Google will continue to develop Freebase as a free and open database for everyone, saying “… we would be delighted if other web companies use and contribute to the data.”

Until yesterday, a lot of companies were still not fully convinced that the Semantic Web would play a central role in the further development of the Internet. Now the game has changed. The entity-driven approach to developing web applications has only just begun.

We will keep reporting on and discussing how Google influences the development of the Semantic Web. And if I had one wish: please add RDF(a) to the Freebase widgets!


I-Semantics 2010: Relevance of semantic technologies for industry increases fast

July 01, 2010 By: Andreas Blumauer Category: Calls & Competitions, Conferences & Events, Corporate Semantic Web, Linked Data & Open Data


I-Semantics will take place for the 6th time this September, and it will again be co-located with I-Know in Graz, Austria. This year’s programme shows that the Semantic Web and semantic technologies in general are increasingly relevant for all kinds of industries:

  • Biomedicine
  • Public administration & Public transport
  • Information technology
  • Libraries
  • Media & Content Industry
  • E-commerce
  • Education etc.

450 people in 2009

The I-Semantics “Industry Track”, with its 3-day programme full of demos, is one of the highlights of the congress. With 28 submissions, this year’s Triplification Challenge says a lot about the significance of Linked Data in areas like librarianship, public administration, and GIS & environmental planning. Take a look at the 15 nominees, and if you are considering coming to I-Semantics 2010, follow the link for registration.


Report on developments at the European Semantic Technology Market

June 25, 2010 By: Thomas Thurner Category: Corporate Semantic Web, Enterprise 2.0, Literature & Publications

The current state of development, future trends and expected market scenarios for semantic technologies are presented in the just-published “Demand driven Mapping Report”. The report is part of the EU-funded project VALUE-IT, which brings together the various stakeholders within the sector: industry, research and government. Preliminary VALUE-IT findings show that the potential STE market in Europe will amount to as much as €1.44B in 2014. A look through the executive summary of the report turns up some findings that attract attention:

The survey results also show considerable variation by sector, both of policy and technology implementation. With respect to technologies, ICT companies are also the most willing to consider semantic approaches. The ICT sector has an unusually high interest in all ST components, with 20% or more being willing to consider all of them, and over half of IT respondents looking at Web 2.0 (social computing). [...]  The use of tagging technologies – which overall is the least mature approach in the survey – is most advanced in Life Sciences. The Life Sciences, Media & Entertainment, and ICT sectors all have a reasonably strong interest in Natural Language Processing (roughly 25% on average). Ontologies and RDF/OWL are the technologies least often considered, though the interest in these Semantic Technologies is not insignificant. Taxonomies are slightly more popular, perhaps indicating that companies are taking the first step to prepare for a more semantic approach to IT solutions. The ICT, Energy & Utilities, and Media & Entertainment sectors all have a reasonably strong interest in using taxonomies.

The 190-page report gives an up-to-date overview of the status quo of the European semantic technology market and is now available for download: Final demand driven mapping Report


Vienna 01.07.2010 – Panel discussion on the Future Internet

June 22, 2010 By: Thomas Thurner Category: Conferences & Events

Over the last year, the SWC team ran a project called “ZukunftsWeb” (Future Internet). After ten months of in-depth discussion, expert panels, webinars and the making of a book on the topic, it’s time to celebrate these efforts and also take a look into the future. We therefore cordially invite you to our evening event on July 1. If you are in Vienna that day, join us: we promise an inspiring evening with nice people and wise talks.

Venue: Filmmuseum Wien
Date/time: 01.07.2010 / 6pm

More about this event in German and English.



Stella Dextre Clarke & Alan Gilchrist about the “Future of Knowledge Organization on the Web”

June 21, 2010 By: Andreas Blumauer Category: Linked Data & Open Data, Tools & Software, Vocabularies & Languages

Semantic Web Company (SWC) had the pleasure and the opportunity to talk with two internationally recognised experts in the fields of information management and knowledge organization: Alan Gilchrist and Stella Dextre Clarke. SWC asked some questions about the “Future of Knowledge Organization on the Web & Linked Data” on the occasion of an event of the same name organised by ISKO UK which will take place on September 14, 2010 in London.

1. Alan, you are one of the leading experts in the field of thesaurus construction. Organising knowledge in a (worldwide) Semantic Web is a rather young discipline compared to your domain. What do you think the Semantic Web community can learn from “traditional” thesaurus management, and vice versa?

You put inverted commas round the word traditional, but it might be more appropriate to put them round the word thesaurus! So long as words are used in information retrieval and in information sharing, different forms of structured vocabularies will be required, and many of the fundamental principles of thesaurus construction are still valid for their construction. Of course, the “traditional” thesaurus has mutated since the days when it was used only for controlled indexing and retrieval; and now, with the many enrichments possible it can be viewed as an ontology (in one of the definitions of this word). What remains a difficulty is to create a generalisable typology of associative relationships, though this is, of course, possible in relatively closed systems. In short, structured vocabularies with broadly thesaurus formats will be a necessary component in the web stack.

2. Stella, as a consultant you specialize in the design and implementation of knowledge structures for information retrieval applications. In the last few months we have seen that SKOS can serve as a significant building block to link “traditional” thesaurus management to knowledge structures from the semantic web. Do you see this development as market-driven? Is there significant growth in demand for solutions built around SKOS?

This question sounds surprisingly sceptical about the growth of SKOS. I guess the dizzying speed of phenomena like Facebook and Twitter has fuelled expectations of tools springing up overnight like mushrooms, fully formed and ready to eat. But actually it takes time, not just for the tools to be fashioned, but for the potential market to develop an understanding of what they can do and what will happen next when they are used.

Applications for SKOS are springing up all the time, as fast as people can grow the skills and vision to deploy them. At the moment the market, or shall we say the power-base, seems to be with the academic sector and allied not-for-profit organisations. This will spread progressively through the public to the private sector, as enterprises find ways of adapting their business models. The main hurdles to overcome could be intellectual property rights and the need for compilers of databases to keep earning their living.

3. Alan, constructing thesauri for the semantic web also means that one has to make the “open world assumption”. In which sense does this change the way to manage thesauri, keep them growing and assure quality? Can you see new, upcoming methodologies to do that?

Everything changes with the “open world assumption”! Following on from my answer to the previous question, it seems clear that one manifestation of the thesaurus will be found in those systems that support interoperability, such as federated searching or metadata registries. Even with simple thesaurus management software, it is possible to construct a “master vocabulary” or “word bank” to support different applications within an enterprise; thereby promoting interoperability. More sophisticated software is already available (though not very widely); more will be needed and, doubtless, will be created.

A more formal answer to both questions will be found in a new standard – ISO 25964, currently being prepared on the basis of BS 8723. The two fundamental features of these two standards are (1) the thesaurus as a theoretical and practical basis for the construction of structured vocabularies for information retrieval and (2) the growing and vital need for interoperability between systems and the intelligent mapping of the vocabularies used by those systems.

4. Stella, just recently at ESWC 2010, Sean Bechhofer was asked during his keynote why there are so few SKOS tools on the market. What do you think are the reasons for this? Are there still shortcomings of the SKOS specification compared to other existing thesaurus standards? (see also: http://www.eswc2010.org/program-menu/keynote-speakers/155-sean-bechhofer & http://www.slideshare.net/seanb/skos-past-present-and-future )

Regarding the speed of development, see my reply above. As to shortcomings, did you note in one of Bechhofer’s slides: “Standardisation is necessarily a compromise: Everyone equally unhappy = success!” The SKOS development team took a conscious decision to keep the schema sufficiently simple that it could be applicable to as many different types of KOS as possible.  On the downside, this means SKOS is unsatisfactory for conveying sophisticated features of some thesauri and classification schemes. But by keeping the entry barrier low, more widespread use has been encouraged.

By way of illustration, compare SKOS with the data model and XML schema of BS 8723. This schema is comparatively specialized, with the aim of enabling exchange of any thesaurus carrying any or all of the features recommended in the standard. And incidentally, this data model and schema will have some further capabilities added when published in the forthcoming standard ISO 25964. SKOS does not provide for a number of features in these standards (such as compound equivalence). But the schemas in BS 8723 and ISO 25964 are designed for thesaurus developers to share their work, rather than for easy publication on the Web, and will never have so many users or associated tools as SKOS.

So I believe that SKOS has done well to accept compromises that encourage generalisation although they might not suit some specialists. That said, I do regret one of its weaknesses in the context of mapping. Compound equivalence mappings (that is to say, where Concept A in one vocabulary maps to a combination of Concepts  B and C in another) are very commonly needed when extending a search across multiple databases, and the SKOS mapping properties do not currently allow for them. Perhaps there will be some provision in future?
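To make the notion of compound equivalence concrete, here is a small sketch of how such a mapping could be represented and used to extend a search into a second database. It follows neither the SKOS vocabulary nor the BS 8723 / ISO 25964 schemas; the class name, the concept identifiers, and the sample records are all hypothetical. The point is only that one source concept maps to a combination of target concepts that must all be present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundMapping:
    """One source concept mapped to a combination of target concepts.

    A record in the target database matches only if it carries ALL targets.
    """
    source: str
    targets: frozenset


# Hypothetical mappings from vocabulary A to vocabulary B.
MAPPINGS = [
    # Compound equivalence: A's "coal mining" = B's "coal" AND "mining".
    CompoundMapping("a:coal_mining", frozenset({"b:coal", "b:mining"})),
    # A simple one-to-one mapping is just the special case of one target.
    CompoundMapping("a:wind_energy", frozenset({"b:wind_power"})),
]

# Hypothetical records in database B, indexed with vocabulary B concepts.
DATABASE_B = {
    "rec1": {"b:coal", "b:mining", "b:safety"},
    "rec2": {"b:coal", "b:trade"},
    "rec3": {"b:wind_power"},
}


def search_b(source_concept):
    """Translate a vocabulary A concept and return matching record ids in B."""
    hits = set()
    for mapping in MAPPINGS:
        if mapping.source == source_concept:
            hits |= {
                record_id
                for record_id, tags in DATABASE_B.items()
                if mapping.targets <= tags  # every target concept is present
            }
    return sorted(hits)


print(search_b("a:coal_mining"))
print(search_b("a:wind_energy"))
```

In this toy data the record tagged only with "b:coal" and "b:trade" is not returned for "a:coal_mining", which is exactly the distinction a plain one-to-one mapping to either target concept would lose.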

5. Stella, Alan, in September ISKO UK will organise an event on “The Future of Knowledge Organisation on the Web”. “Linked Data” seems to be a promising approach to organise knowledge in large scale environments.
Could you imagine that SKOS as a small subset of semantic web specifications will play a central role in this environment since it is quite intuitively comprehensible by virtually any knowledge worker or do you rather think SKOS is too simple (or too complex)? (see also: http://poolparty.punkt.at/using-skos-as-an-interface-to-the-linked-data-cloud )

Stella: Of course SKOS will have a central role (whether or not every knowledge worker finds it as intuitive as you suppose). “Linked Data” will find even wider applicability. ISKO-UK (the organiser of the meeting in London on 14 September) has a mission not just to spread the word about both these technologies, but to build bridges between the several communities who must share their expertise and data to build more exciting applications. We’re expecting an audience of over 100 at this low-cost event.

Alan: Yes, of course, just as all the tools in the web stack will be necessary if semantic web technologies are to be effective. But it is obvious that we are dealing with complexities of a higher order than ever before. Any structured vocabulary is an “artificial language” which, while acknowledging many aspects of theoretical linguistics is forced to be pragmatic in its construction. Consequently, it would not be surprising if SKOS is seen to be “catching up”, and this became apparent in the work of BS 8723 when thesaurus models using UML were being constructed. There remains much work to be done on all fronts.

Stella Dextre Clarke is an independent consultant specializing in the design and implementation of thesauri and other knowledge organization structures. She currently leads ISO NP 25964, the project to update and revise the international standards for thesauri. Previously she was the Convenor of the Working Group which developed BS 8723. In 2006 she won the Tony Kent Strix Award for outstanding achievement in information retrieval, in recognition of her development work on IPSV (Integrated Public Sector Vocabulary), as well as on the vocabulary standards. She is a Fellow of the Chartered Institute of Library and Information Professionals.

Alan Gilchrist has been a consultant for many years in the fields of information management and information architecture, specialising in the vocabulary aspects of information retrieval. He is co-author, with Jean Aitchison and David Bawden, of Thesaurus Construction and Use, now in its fourth edition. In 1979 he founded and edited the Journal of Information Science, and is now Editor Emeritus. He has an Honorary Degree (D. Litt.) from the University of Brighton and is an Honorary Fellow of the Chartered Institute of Library and Information Professionals.


Kingsley Idehen: “By declaring its context, Linked Data can be made more easily reusable by others”

June 16, 2010 By: Andreas Blumauer Category: Corporate Semantic Web, Enterprise 2.0, Linked Data & Open Data, Tools & Software

Semantic Web Company talked about “Linked Data” with Kingsley Idehen, CEO of OpenLink Software and probably one of the most knowledgeable experts on data integration issues.

The interview covers questions like:

  • How can Linked Data help to make companies more productive?
  • Do you think that the Linked Data Initiative can build upon a stable architecture or will it face more and more problems the bigger the “cloud” will grow?
  • What’s the ultimate argument for an Enterprise Architect to use languages like SPARQL at least in addition to SQL?
  • How will a “Real Time Semantic Web” change the whole game?
  • What will the “Semantic Web” be called in 10 years? Will there still be a “Semantic Web”?

Read the full version of the interview here.


Lyndon Nixon: “With the hundreds of TV channels available, content selection becomes a significant challenge for users.”

June 07, 2010 By: Tassilo Pellegrini Category: Conferences & Events, Internet & Media


From June 9 to 11, 2010, the EuroITV Conference discusses the latest advances and research in media technology, HCI, media studies, and the content creation community. Tassilo Pellegrini talked to Lyndon Nixon, STI International, about the future role of semantic technologies in the television industry and how a Social Semantic Web might influence the traditional television experience.


At this year’s EuroITV conference you will hold a workshop on the EU project NoTube. Can you give us a brief insight into what this project is about?

NoTube is all about the future of television! We are seeing a significant shift in viewing patterns driven by the Web, which breaks the linear programming model and makes TV or video on demand a reality, whether it is being provided directly by the broadcasters or via a third party like Hulu or YouTube. The Web-based model taken up by viewers using their PC is being transferred back to the TV set in the lounge by IPTV applications running on Set Top Boxes or Internet TVs which come with Web access built into them. The strong interaction between the desires of users and technology has had its impact on the Web and as the gap between the Web and TV experience grows, we aim to translate features of the Web to TV, such as the personalised and community aspects. The NoTube European project puts the TV user back in the driver’s seat by generating user profiles from data the user creates on the Social Web, and in this way facilitating a personalised TV experience without an intrusive user profiling process.

What promises does the Social Semantic Web hold with respect to innovating the television experience? What is the vision?

With the hundreds of channels available via modern TV providers, content selection and dealing with the vast amount of TV-related information become significant challenges for users. TV metadata is created and distributed by a small group of people, as a result of the closed-source information exchange protocols that are the standard for providing electronic programme guide (EPG) data to users. Yet people often have several clusters of personal data on the Web, such as their profiles on social networks, or ratings of videos on YouTube and IMDB.

Analogously, there are many isolated clusters of broadcast data on the Web, such as broadcast data on EPGs and background information on Wikipedia. Within the NoTube vision context, we speculate that the conjunction of all these bits and pieces of data provides accurate information on someone’s interests, which is suitable for generating relevant recommendations on TV broadcasts. We see progress on opening up this data with open standards and APIs such as Google’s OpenSocial, Facebook’s OpenGraph, DBpedia, the BBC ontologies and FOAF. Further, we assume that Semantic Web technologies provide important building blocks for realizing this vision, as they enable the global identification mechanism of URIs and the means to define relations between data anywhere on the Web. By integrating these different pockets of data, we can provide TV viewers with personalised recommendations for their viewing.
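As a rough illustration of why shared URIs matter here, the following sketch merges interest signals from two separate sources and scores broadcasts against the combined profile. It is not NoTube code: the profile data, the programme titles, and the scoring scheme are invented for this example, and the DBpedia-style URIs are used only as sample identifiers.

```python
from collections import Counter

# Interest signals from two isolated sources, keyed by shared URIs.
# All values are invented for illustration.
social_profile = Counter({
    "http://dbpedia.org/resource/Formula_One": 3,
    "http://dbpedia.org/resource/Jazz": 1,
})
video_ratings = Counter({
    "http://dbpedia.org/resource/Jazz": 3,
})

# Hypothetical EPG entries annotated with topic URIs.
broadcasts = {
    "Grand Prix Live": {"http://dbpedia.org/resource/Formula_One"},
    "Late Night Jazz": {"http://dbpedia.org/resource/Jazz"},
    "Cooking Hour": {"http://dbpedia.org/resource/Cooking"},
}


def merge_profiles(*profiles):
    """Add up interest weights across sources; identical URIs line up."""
    merged = Counter()
    for profile in profiles:
        merged.update(profile)
    return merged


def recommend(profile, programmes, top_n=2):
    """Rank programmes by the summed interest weight of their topics."""
    scored = [
        (sum(profile[uri] for uri in topics), title)
        for title, topics in programmes.items()
    ]
    relevant = [(score, title) for score, title in scored if score > 0]
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in relevant[:top_n]]


profile = merge_profiles(social_profile, video_ratings)
print(recommend(profile, broadcasts))
```

Because both sources identify jazz by the same URI, their weights add up without any string matching or manual alignment, which is the integration benefit the answer above attributes to global identifiers.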

What economic effects on the value chain do you expect from semantically empowered television? Will there be new revenue opportunities with respect to advertising or Pay TV models?

Our primary focus is on open source and open standards, so for example we are extending the open source MythTV media centre to develop first scenarios of personalised EPGs. However, down the road there are clearly commercialisation opportunities.

Another scenario in the project looks at personalised advertising, which is clearly somewhere where there are revenue opportunities. However, we take user privacy very seriously, and one aspect we need to tackle in NoTube is the fine line between analysing user activity (in order to personalise their TV experience) and using that analysis commercially.

The third NoTube scenario involves pushing personalised news streams to TV viewers. Here, one could imagine that such a service could be packaged within a Pay TV offer, and used to give competitive advantage or justify a higher fee.

Despite many attempts, experience has shown that television is a rather conservative and innovation-averse medium. What can be done to stimulate the uptake of semantic technologies in the television sector?

That’s true; in the traditional broadcasting sector the larger companies are extremely slow to adopt new technologies. However, I think Web video and TV have really shaken up the sector: traditional broadcasters are seeing that they lose viewer share to Web-based offers and have been quick to take their video material to the Web. There is a clear demand for this; look at the viewing numbers for the BBC’s iPlayer in the UK, for example.

IPTV also means that new applications and services can be built on top of traditional TV. I think once the broadcasters see the added value of offering applications and services tied into the content of their programming, such as through semantic analysis of the program metadata, which NoTube is doing, they will be encouraged to better support these efforts. The BBC is really taking a lead in this, already publishing a lot of its data in RDF.

Workshop Information

The NoTube workshop on Future Television: integrating the Social and Semantic Web will take place at the EuroITV 2010 conference in Tampere, Finland on June 9, 2010.
For more information please see

http://www.euroitv2010.org

and

http://www.notube.tv/news/73-futuretv-2010

For more information about NoTube, please see

http://notube.tv and follow our blog, at http://blog.notu.be

About Lyndon Nixon

Dr. Lyndon Nixon joined STI International as senior postdoctoral researcher in November 2008. Previously he was a researcher at the FU Berlin, where he acted as Industry Area Co-Manager of the EU Network of Excellence KnowledgeWeb and double Workpackage Leader in the EU project TripCom. In KnowledgeWeb, Dr. Nixon organized and led activities promoting the transfer of semantic technology to industry. He received his PhD in January 2007 with the topic ‘Semantic Web enabled Multimedia Presentation system’. His research focus is Web-based TV/video and the semantically guided integration of Web-based content, and he has several publications and has organized a number of workshops around related themes.


Adrian Pohl: “We believe the Semantic Web plays an important role for the future of libraries.”

May 20, 2010 By: Tassilo Pellegrini Category: Companies & Institutions, Linked Data & Open Data

A group of Cologne-based libraries has taken a big step towards open data. In a concerted action they have released their catalogue data for reuse on the web. Project manager Adrian Pohl comments on the initiative and on the role the Semantic Web will play for libraries in the future.

In March 2010 several Cologne-based libraries opened their catalogue data under a CC0 license, following Tim Berners-Lee’s call for “Raw Data Now!”. What was the motivation behind this step?

The hbz (“Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen”, English: “North Rhine-Westphalian Library Service Centre”) has come to the conclusion that libraries need to participate in the development of the Semantic Web. The opening of catalog data followed as a necessary first step. Our intention is to show with this first legal-political step how important the legal/licensing dimension is when you publish data on the web, be it Linked Data or not. So for us at the hbz the Open Data initiative is seen primarily as the first step towards eventually publishing Linked Open Data, just as Tim Berners-Lee had called for.

Other participants in the Cologne Open Data initiative, like the Cologne University and City Library, focus more on the direct advantages that releasing raw bibliographic data brings: with other libraries and consortia following this example, it will be easy to enrich existing catalogs or other bibliographic services with subject headings, classification numbers, tags etc. Also, published raw data is integrated into other web services like Wikipedia, which point back to libraries’ services. Indeed, Open Data is an end in itself which should be pursued by more organizations in the library world and beyond it.

The provided data is currently available in a proprietary but open format. Can you give us a technical description of the published data? Do you plan to provide more structured datasets in the future?

“Opaque but open” would be the better description of the underlying format because it isn’t proprietary at all. Actually, alongside the data from the hbz union catalog there is data stemming from libraries’ local databases (see http://opendata.ub.uni-koeln.de/ and http://opendata.zbsport.de/). We are using different internal formats. Generally, all the formats are based on the MAB format (an acronym for “Maschinelles Austauschformat für Bibliotheken”, which means “Automatic Interchange Format for Libraries”), which is used only in the German and Austrian library world for data interchange between libraries, similar to the better-known MARC format (Machine-Readable Cataloging) of the Library of Congress. It was developed in the 1970s for storing data on magnetic tape. The format documentation can be viewed on the German National Library’s webpages.

As the format is nearly 40 years old, processing MAB data is very cumbersome on modern computers. Therefore, the hbz provides an encapsulation method called “generic format”, in which the historic data records of the library catalogs are unwrapped into a more common, user-friendly scheme. Each record is placed into a UTF-8 encoded file containing all the MAB fields, separated by line feeds, and the whole record set of a library forms a “tar” archive, which is then compressed to save space.

These archives can be unpacked with any common archive tool; such software is available on all common Windows/Linux/Unix platforms. Alternatively, you can use a simple Perl helper script provided by the hbz. More tools and scripts, including some in other programming languages, are being prepared for publication.

The opaqueness and the age of the standards used in the library world (the English-language standard MARC, which is used worldwide, doesn’t differ from MAB in these respects) make it necessary to change to a more open and widely adopted standard. That’s where Linked Data comes into play, which is based on the accepted and widespread standards HTTP and URIs. Constructing RDF from the raw library catalog data is a very sophisticated design task. Our plan is to convert the existing data to RDF using suitable vocabularies that let us lose as little information as possible, and to give access to the data through a SPARQL endpoint.
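The unpacking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file names and field lines in the demo archive are invented placeholders, not real MAB data, and the internal syntax of a field line is deliberately left uninterpreted. The sketch relies on nothing more than what the answer states: one UTF-8 file per record, one field per line, all files bundled in a compressed tar archive.

```python
import io
import tarfile


def iter_records(archive):
    """Yield (member_name, field_lines) for each record file in a tar archive.

    `archive` is a seekable binary file object. Mode "r:*" lets tarfile
    detect the compression (gzip, bzip2, xz, or none) on its own.
    """
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            text = tar.extractfile(member).read().decode("utf-8")
            # One field per line; the field syntax itself is left untouched.
            yield member.name, [line for line in text.splitlines() if line]


def build_demo_archive():
    """Create a tiny gzip-compressed tar in memory with placeholder records."""
    # Hypothetical file names and field contents, for illustration only.
    records = {
        "record-0001.txt": "FIELD-A placeholder title\nFIELD-B placeholder author\n",
        "record-0002.txt": "FIELD-A Über Bibliotheken\n",
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, body in records.items():
            data = body.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, fields in iter_records(build_demo_archive()):
        print(name, fields)
```

For a downloaded archive, the same function could be called with a file opened in binary mode instead of the in-memory demo buffer.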

Currently the data you provide is open but not yet linked. What are your plans when it comes to contributing to the Linked Data Cloud?

I have to go into greater detail to answer this question properly. Viewed simply, the data of library institutions can be divided into two broad types: authority data and bibliographic data. Authority data divides into data about people, about corporate entities and about subject headings. In Germany, authority data is maintained centrally by the German National Library in cooperation with the six German library consortia. Bibliographic databases consist of records about books, or rather editions of books. Authority data and bibliographic data are already heavily linked; for instance, a bibliographic record contains the author’s or editor’s authority number, which links to the corresponding authority record.

The German National Library is also working on migrating library data, especially authority data, into the Semantic Web. They recently made their Linked Data prototype for authority data publicly available, and we have already taken first steps to cooperate and coordinate our efforts. As they take care of authority data, we focus on bibliographic data. At the moment we are exploring the technology and vocabularies for publishing bibliographic data as Linked Data. That’s a demanding task: besides the known vocabularies like Dublin Core or the Bibliographic Ontology (Bibo), which don’t fully map to the density and structure of the information in the catalogs, there has been several years’ work on the new comprehensive cataloging standard RDA (Resource Description and Access), for which an RDF representation has been developed. However, RDA in RDF needs to be modified a lot before it can be applied to our bibliographic data. We are currently working on a vocabulary for the union catalog’s data based on existing vocabularies like Bibo and RDA.

Of course, as soon as we have published bibliographic data as linked data, we will start linking to hubs in the Linked Data Cloud like DBpedia or GeoNames.
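To make the link between a bibliographic record and an authority record concrete, here is a minimal Python sketch that writes such a link as N-Triples. It is an assumption-laden illustration, not the hbz vocabulary or conversion pipeline: the `http://example.org/` namespace, the record identifiers and the dictionary keys are all hypothetical, and only the two Dublin Core terms (`title`, `creator`) are existing properties.

```python
DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/"  # hypothetical namespace, not a real hbz URI scheme


def record_to_ntriples(record):
    """Turn one bibliographic record (a plain dict) into N-Triples lines."""
    subject = f"<{BASE}resource/{record['id']}>"
    # Escape backslashes and double quotes for the N-Triples string literal.
    title = record["title"].replace("\\", "\\\\").replace('"', '\\"')
    lines = [f'{subject} <{DCTERMS}title> "{title}" .']
    # Each authority number becomes a link to the authority record's URI.
    for number in record["author_authority_numbers"]:
        lines.append(f"{subject} <{DCTERMS}creator> <{BASE}authority/{number}> .")
    return lines


if __name__ == "__main__":
    # Invented example record; identifiers are placeholders.
    demo = {
        "id": "bib-1",
        "title": 'An "example" title',
        "author_authority_numbers": ["auth-42"],
    }
    for line in record_to_ntriples(demo):
        print(line)
```

The point of the sketch is the second triple: what is a bare authority number inside a catalog record becomes a URI that anyone on the web can follow. A real conversion would need a far richer vocabulary, as the answer above explains.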

Publishing data to the LOD Cloud is one thing. Consuming data is another. Do you have plans to integrate data from the LOD Cloud into your systems? Do you have policies for quality assurance?

Of course the possibility to incorporate data from other sources easily is one major reason for us to publish Linked Data besides the goal of making libraries’ data an integral part of the web. Enriching our data with other data and providing new services through and with mashups would be a main reason to link to other data. We are, however, not working on such projects yet, because we first need to convert our legacy data to RDF.

What role will the Semantic Web play for libraries in the future?

We believe the Semantic Web plays an important role for the future of libraries. Discussions about “Next Generation Catalogs” have been a recurring theme in the library world since the 1990s. It is time to finally act and move our data, imprisoned in opaque formats, to a new level by improving its structure and underlying technology and by migrating to formats that can be easily consumed by others who are not part of the library world. Joining the Linked Open Data community seems to us the best way to go.

Also, the production, publication and dissemination of academic literature is subject to ongoing and fundamental changes which have far-reaching implications for the work of academic libraries and their role in research and education. We believe that semantic markup and interlinking will play an important role in the development of knowledge production and thus will indirectly have a great impact on libraries. Clearly, the Semantic Web cannot be left out of the future of libraries.

Moreover, turning your question around, libraries could play an important role for the future of the Semantic Web. Libraries are trusted institutions and deeply grounded in our culture. As indicated above, libraries have produced linked data (again: lower case) since the time of card catalogs. We undoubtedly have some practice in producing and curating linked data, which should be worth a lot to the Semantic Web community. We thus think libraries are predestined to help continuously order the messy place the Semantic Web will always be and to ensure its trustworthiness and stability.

About Adrian Pohl

Adrian Pohl works at the Cologne-based North Rhine-Westphalian Library Service Center on Open Data, Linked Data and their conceptual, theoretical and legal implications. He regularly writes at Übertext: Blog about the internet, libraries and metadata, Linked Open Data, communication, epistemology and the like. He has studied communication science and philosophy in Aachen and is currently studying Library and Information Science at the Cologne University of Applied Sciences. You can follow him on Twitter: http://twitter.com/acka47.
