Semantic Web Company

The Semantic Puzzle

Open World Assumptions

Archive for the ‘Companies & Institutions’

What if the biggest web company bought one of the central semantic web players?

July 17, 2010 By: Andreas Blumauer Category: Companies & Institutions, Search Engines

Well, exactly this happened yesterday: Google bought Metaweb, the provider of Freebase. Freebase is an important hub in the linked data cloud, providing 12 million entities with uniform resource identifiers, most of them linked to other semantic web datasets like DBpedia or the New York Times. For example, Google’s page on Freebase offers a rich source of machine-readable facts about this company.

What does this mean for the Semantic Web community, which has been working on a smarter web for the last decade?
Well, a lot… First of all, it’s good to hear that Google will continue to develop Freebase as a free and open database for everyone, saying “… we would be delighted if other web companies use and contribute to the data.”

Until yesterday, a lot of companies were still not fully convinced that the Semantic Web would play a central role in the further development of the Internet. Now the game has changed. The entity-driven approach to developing web applications has only just begun.

We will keep on reporting and discussing how Google will influence the development of the Semantic Web – and if I had one free wish: Please add RDF(a) to the Freebase widgets!

Adrian Pohl: “We believe the Semantic Web plays an important role for the future of libraries.”

May 20, 2010 By: Tassilo Pellegrini Category: Companies & Institutions, Linked Data & Open Data

A group of Cologne-based libraries has taken a big step towards open data. In a concerted action they have released their catalogue data for reuse on the web. Project manager Adrian Pohl comments on the initiative and on the role the Semantic Web will play for libraries in the future.

In March 2010 several Cologne-based libraries opened their catalogue data under a CC0 license, following Tim Berners-Lee’s call for “Raw Data Now!”. What was the motivation behind this step?

The hbz (“Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen”, English: “North Rhine-Westphalian Library Service Centre”) has come to the conclusion that libraries need to participate in the development of the Semantic Web. The opening of catalog data followed as a necessary first step. Our intention with this first legal-political step is to show how important the legal and licensing dimension is when you publish data on the web, be it Linked Data or not. So for us at the hbz the Open Data initiative is primarily seen as the first step towards eventually publishing Linked Open Data, just as Tim Berners-Lee has called for.

Other participants in the Cologne Open Data initiative, like the Cologne University and City Library, focus more on the direct advantages that the release of raw bibliographic data brings: With other libraries and consortia following this example, it will be easy to enrich existing catalogs or other bibliographic services with subject headings, classification numbers, tags etc. Also, published raw data is integrated into other web services like Wikipedia, which point back to libraries’ services. Indeed, Open Data is an end in itself which should be pursued by more organizations in the library world and beyond.

The provided data is currently available in a proprietary but open format. Can you give us a technical description of the published data? Do you have plans to provide more structured datasets in the future?

“Opaque but open” would be the better description of the underlying format, because it isn’t proprietary at all. Actually, alongside the data from the hbz union catalog there is data stemming from libraries’ local databases (see http://opendata.ub.uni-koeln.de/ and http://opendata.zbsport.de/). We are using different internal formats. Generally, all the formats are based on the MAB format (an acronym for “Maschinelles Austauschformat für Bibliotheken”, which means “Automatic Interchange Format for Libraries”), which is used only in the German and Austrian library world for data interchange between libraries, similar to the better-known MARC format (Machine-Readable Cataloging) of the Library of Congress. It was developed in the 1970s for storing data on magnetic tape. The format documentation can be viewed on the German National Library’s webpages.

As the format is nearly 40 years old, processing MAB data is very cumbersome on modern computers. Therefore, the hbz provides an encapsulation method called “generic format”, in which the historic data records of the library catalogs are unwrapped into a more common, user-friendly scheme. Each record is placed into a UTF-8 encoded file containing all the MAB fields, separated by line feeds, and the whole record set of a library forms a “tar” archive, which is then compressed to save space.

These archives can be unpacked with any common archive tool; such software is available on all known Windows/Linux/Unix platforms. Alternatively, you can use a simple Perl helper script provided by the hbz. More tools and scripts, also in other programming languages, are being prepared for publication.

The opaqueness and the age of the standards used in the library world (the English-language standard MARC, which is used worldwide, doesn’t differ from MAB in these respects) make it necessary to change to a more open and widely adopted standard. That’s where Linked Data comes into play, which is based on the accepted and widespread standards HTTP and URIs. Constructing RDF from the raw library catalog data is a very sophisticated design task. Our plan is to convert the existing data to RDF using proper vocabularies that let us lose as little information as possible, and to give access to the data through a SPARQL endpoint.
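
To illustrate the “generic format” described above, here is a minimal Python sketch of how such a dump might be read. It relies only on what the answer states: a compressed “tar” archive with one UTF-8 file per record and one field per line. The file names and the field contents (`FIELD-A`, `FIELD-B`) are invented placeholders, not real MAB tags, and the archive is built in memory purely for demonstration; this is not the hbz’s own tooling.

```python
import io
import tarfile


def read_records(archive_bytes):
    """Yield (file_name, field_lines) for every record file in a compressed tar archive."""
    # mode "r:*" lets tarfile detect the compression (gzip, bzip2, ...) by itself.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            text = archive.extractfile(member).read().decode("utf-8")
            # One field per line; blank lines are ignored.
            fields = [line for line in text.splitlines() if line.strip()]
            yield member.name, fields


def build_demo_archive(records):
    """Create a small gzip-compressed tar archive in memory (demo data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, lines in records.items():
            payload = "\n".join(lines).encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Hypothetical records: the field tags below are placeholders, not MAB fields.
demo = build_demo_archive({
    "record-1.txt": ["FIELD-A Beispieltitel", "FIELD-B Köln"],
    "record-2.txt": ["FIELD-A Zweiter Titel"],
})
parsed = dict(read_records(demo))
```

For a real dump one would pass a file path to `tarfile.open` instead of an in-memory buffer, and interpret each line according to the MAB documentation mentioned above.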

Currently the data you provide is open but not yet linked. What are your plans when it comes to contributing to the Linked Data Cloud?

I have to go into greater detail to answer this question properly. Viewed simply, the data of library institutions can be divided into two broad types: authority data and bibliographic data. Authority data splits up into data about people, about corporate entities and about subject headings. In Germany, authority data is maintained centrally by the German National Library in cooperation with the six German library consortia. Bibliographic databases consist of records about books, or rather editions of books. Authority data and bibliographic data are already heavily linked: for instance, a bibliographic record contains the author’s or editor’s authority number, which links to the corresponding authority record.

The German National Library is also working on migrating library data, especially authority data, into the Semantic Web. They recently made their Linked Data prototype for authority data publicly available. We have already taken first steps to cooperate and coordinate our efforts. As they take care of authority data, we focus on bibliographic data. At the moment we are exploring the technology and vocabularies for publishing bibliographic data as Linked Data. That’s a demanding task: the known vocabularies like Dublin Core or the Bibliographic Ontology (Bibo) don’t fully map to the density and structure of the information in the catalogs, and besides them there have been several years of work on the new comprehensive cataloging standard RDA (Resource Description and Access), for which an RDF representation has been developed. However, RDA in RDF needs to be modified a lot so that it can be applied to our bibliographic data. We are currently working on a vocabulary for the union catalog’s data based on existing vocabularies like Bibo and RDA.

Of course, as soon as we have published bibliographic data as linked data we will start linking to hubs in the Linked Data Cloud like DBpedia or GeoNames.
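
The existing link between a bibliographic record and its authority records can be pictured as RDF triples. The following Python sketch is only an illustration under stated assumptions: the `example.org` namespace, the record layout (`id`, `title`, `author_authority_numbers`) and the sample values are all made up, and the Dublin Core terms `title` and `creator` stand in for whatever vocabulary the hbz eventually chooses. It emits plain N-Triples lines without any RDF library.

```python
DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/"  # placeholder namespace, not a real hbz or DNB URI


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal.

    Line breaks and other control characters are not handled in this sketch.
    """
    return text.replace("\\", "\\\\").replace('"', '\\"')


def record_to_ntriples(record):
    """Turn one simplified bibliographic record into a list of N-Triples lines."""
    subject = f"<{BASE}resource/{record['id']}>"
    title = escape_literal(record["title"])
    lines = [f'{subject} <{DCTERMS}title> "{title}" .']
    # Each authority number becomes a link to the matching authority resource.
    for number in record["author_authority_numbers"]:
        lines.append(f"{subject} <{DCTERMS}creator> <{BASE}authority/{number}> .")
    return lines


# Hypothetical record: identifier, title and authority number are invented.
sample = {
    "id": "b1",
    "title": "Beispielband",
    "author_authority_numbers": ["a42"],
}
triples = record_to_ntriples(sample)
```

The point of the sketch is that the authority number, which today is just a value inside a catalog record, turns into a URI that other datasets can link to as well.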

Publishing data to the LOD Cloud is one thing. Consuming data is another. Do you have plans to integrate data from the LOD Cloud into your systems? Do you have policies for quality assurance?

Of course the possibility to incorporate data from other sources easily is one major reason for us to publish Linked Data besides the goal of making libraries’ data an integral part of the web. Enriching our data with other data and providing new services through and with mashups would be a main reason to link to other data. We are, however, not working on such projects yet, because we first need to convert our legacy data to RDF.

What role will the Semantic Web play for libraries in the future?

We believe the Semantic Web plays an important role for the future of libraries. Discussions about “Next Generation Catalogs” have been a recurring theme in the library world since the 1990s. It is time to finally act and move our data, imprisoned in opaque formats, to a new level by improving its structure and underlying technology and by migrating to formats that can be easily consumed by others who are not part of the library world. Joining the Linked Open Data community seems to us the best way to go.

Also, the production, publication and dissemination of academic literature is subject to ongoing and fundamental changes, which have far-reaching implications for the work of academic libraries and their role in research and education. We believe that semantic markup and interlinking will play an important role in the development of knowledge production and thus will indirectly have a great impact on libraries. Clearly, the Semantic Web cannot be left out of the future of libraries.

Moreover, turning your question around, libraries could play an important role for the future of the Semantic Web. Libraries are trusted institutions and deeply grounded in our culture. As indicated above, libraries have produced linked data (again: lower case) since the time of card catalogs. We undoubtedly have some practice in producing and curating linked data, which should be worth a lot to the Semantic Web community. We thus think libraries are predestined to help continuously order the messy place the Semantic Web will always be and to ensure its trustworthiness and stability.

About Adrian Pohl

Adrian Pohl works at the Cologne-based North Rhine-Westphalian Library Service Center on Open Data, Linked Data and their conceptual, theoretical and legal implications. He regularly writes at Übertext: Blog about the internet, libraries and metadata, Linked Open Data, communication, epistemology and the like. He studied communication science and philosophy in Aachen and is currently studying Library and Information Science at the Cologne University of Applied Sciences. You can follow him on Twitter: http://twitter.com/acka47.

Attending TopQuadrant’s SemWeb Technology Training

October 14, 2009 By: Thomas Schandl Category: Companies & Institutions, Tools & Software

There’s a lot to know about semantic standards, languages, technologies and their application, so last week I attended TopQuadrant’s first European training from Oct 5th to 9th in Amsterdam.

We kicked off with Eddy Vanderlinden elaborating on the lessons he learned from 30 years of work in the financial sector. He outlined how improvements could be achieved by using data models relying on semantic web standards. You can read about his ideas in this essay.

TQ’s chief scientist Dean Allemang then continued with his talk “Enabling Creativity at the Edge”. “The edge” refers to the boundary between an information system and the real world, where the end users of a system work. As business needs change faster and faster, the people working at the edge need to be able to adapt the company’s applications on their own and shape them to their everyday needs.

Dean Allemang

Nowadays end users often achieve this kind of creativity at the edge by using self-made spreadsheets. The problem with that is their lack of interoperability. The data from different spreadsheets, databases, reports, etc. are often connected through business processes that rely on repetitive and error-prone human processing, like copying things from a spreadsheet to a database, creating a report and pasting its result into another system, and so on.

The result is a complex system with many heterogeneous parts and an organisation that cannot possibly know what it knows.

As a solution Dean proposed to “think outside the table” and go beyond the relational database way of organising data. This of course can be achieved by integrating the data using semantic technologies. TopQuadrant’s software offers possibilities to do just that, and makes it possible to create highly customizable dashboards and applications that all rely on the same data.

During the following days we learned about the ins and outs of using semantic standards and languages and tried out TopBraid tools in several hands-on exercises. The TopBraid Suite is a very powerful commercial toolkit. It includes TopBraid Composer, Live and Ensemble. Composer is a semantic web modeling and application development tool that uses the Eclipse framework. TopBraid Live is a server for semantic applications built with TopBraid Ensemble. Ensemble is a graphical application assembly toolkit that enables end users to create custom apps that run in a browser and use RDF data and data models – thereby allowing for the above-mentioned “creativity at the edge”.

I am very impressed with the capabilities of these tools: they enable the user to realize the manifold possibilities that come with using semantic web standards – and that without programming. You can see some of these tools in action and learn about applying semantic standards in a series of webcasts from Semantic Universe. For the latter topic you might also attend one of our webinars.

On the last day Dean covered several case studies, like connecting ontologies to legacy data sources (using e.g. D2RQ inside Composer), applying semantic technologies to the customer service management of a large retailer, or using ontologies in Federal Enterprise Architecture.

All in all I am very happy to have attended TopQuadrant’s training and hope they will establish a successful series of trainings in Europe just as they did in the US.

Tom Tague on Open Calais 4

January 29, 2009 By: Thomas Schandl Category: Companies & Institutions, Linked Data & Open Data, Mashups & Web services, Text Mining, Tools & Software

The recent release of Open Calais v4 offers exciting new possibilities by making a great contribution to Linked Data efforts.

Previous releases of Thomson Reuters’ Open Calais web service already produced promising results by extracting named entities, facts and events from user-submitted content – especially news articles. Now these extracted concepts come with a URI and are linked into the LOD cloud – specifically to DBpedia, Freebase, MusicBrainz, the CIA World Factbook and others.

On this occasion Tom Tague, vice president of the Calais creators ClearForest, answered questions the Semantic Web Company had about the goals of Open Calais. 

The latest release of Open Calais produces metadata conforming to linked data principles. You provide this great service free to everyone via your web service.
What led to that decision, and what are the benefits for Thomson Reuters?

Thomson Reuters has the largest trusted content sources in the world – but we don’t have all the content in the world. We believe that the world is going to want to integrate highly managed and trustworthy content assets such as those provided by Thomson Reuters with the low latency, highly diverse content exploding on the web. Fundamentally what we’re trying to achieve is nearly effortless interoperability of content between any two partners – Calais enables this by extracting the semantic metadata buried in your content but then takes it a step further. By linking those semantic elements to the Linked Data cloud we are setting the stage for the dramatic enhancement of any content source – and we hope that many will choose Thomson Reuters as one of the methods for enhancing that content.

It seems that with Open Calais you use a hybrid business model, which integrates end users into value creation in a form of enterprise collaboration.
Do you think such a business model is viable in the long run, and what has your experience been so far?

As of right now Calais isn’t truly a “Business”. It’s a strategic initiative that’s setting at least a piece of the stage for the Linked Content Economy. Our goal is to understand how this new content economy is going to evolve and to make certain that we have a leadership position as it moves from a concept to reality.

Apart from the thousands of users submitting content to Open Calais, there is also a community of developers making their own applications around your core app. How important are the social dynamics of the Open Source community for the success of Open Calais?

Extraordinarily important. Calais is a web service – which means it’s relevant to about 0.0001% of the population. We are absolutely reliant on the creativity, energy and domain expertise of our developer community to translate Calais from a technology to an end-user relevant capability. And – as a user-driven project we also rely on our developers and users to give us feedback on what they like, what they don’t and where they think we should head.

What are your plans for offering your service in German?

We hope to get there in 2009. We’ve released basic French and are gearing up for additional languages in the coming year.

Thank you, Tom, for your answers! We look forward to more applications like Semantic Proxy and Linked Facts that demonstrate the great potential of the Calais engine.

OntoWiki Kick-off in Leipzig

December 03, 2008 By: Andreas Blumauer Category: Companies & Institutions, Conferences & Events, Linked Data & Open Data, Ontology Engineering, Search Engines, Semantics & Philosophy, Social Software

Virtuoso+DBpedia+OntoWiki together with several industry-relevant use cases – that’s roughly the formula of the OntoWiki project, which was launched yesterday in Leipzig.

Sören Auer and his team from AKSW at Uni Leipzig are the coordinators of this EU-funded project, which supports the development of innovative software products. All industry partners are SMEs which offer services for different fields like e-learning, e-tourism or business intelligence. Leipzig and OpenLink Software will work on an integration of OntoWiki and Virtuoso.

The first day of the meeting was, of course, dedicated to socialising and getting to know each other. The mixture of the project team turned out to be well chosen – and in the evening we aimed higher: we had a nice view over Leipzig from the top of the highest building in town.

On the second day of the meeting Orri Erling, Program Manager at OpenLink Software, came up with a quite forward-looking idea: Why shouldn’t we provide OntoWiki as a Linked Data browser, e.g. on top of DBpedia? That is one possible outcome of this project.

Some other use cases which already make use of the existing OntoWiki system were demonstrated: Take a look at Vakantieland (…and start to plan your holidays in the Netherlands) and also at LinkedGeoData, where a nice user interface can be tried out.

The Kick-Off Meeting will proceed with two workshops dedicated to semantic technologies and to Application Development with the OntoWiki Framework. Thanks to Sören and his team for the excellent hosting of this event!

EU Parliament backs the rights of internet users

October 10, 2008 By: Tassilo Pellegrini Category: Companies & Institutions, Miscellaneous, Politics, Privacy & Information Ethics

For the past several months the EU Commission and the EU Parliament have been struggling over the so-called “Telecom Package”, a legislative initiative promoted by the Commission under heavy advocacy by France. In a nutshell, the Telecom Package contains a very problematic passage, which is meant to strengthen the right of ISPs to cut off the internet access of individual users if any violations of existing or future copyright law are detected. In other words: ISPs would be able to control who gets access to the internet, violating the universal service doctrine, which is a basic cornerstone of democracy.

In its first reading on September 24, 2008 the European Parliament voted against the “Telecom Package”, backing the so-called “Bono Amendment” (named after the French Socialist MEP Guy Bono), which basically states that courts need to be involved in any disconnection procedure. The original passage, quoted in a recent EU Observer article, says:

“No restriction may be imposed on the rights and freedoms of end users … without a prior ruling by the judicial authorities.”

This decision has some relevant implications for any future developments of the internet. While the telcos and the media companies are struggling hard to adapt to the social logic of the internet, searching for new business models and lobbying for regulation in their favour, it is obvious that the existing abundance and innovativeness of the internet is hardly compatible with their notion of making money on the web – basically by restricting access and promoting artificial scarcity.

It is also relevant to developments like Linking Open Data, as in an increasingly interconnected and mashed-up world it is getting harder and harder to comply with strict and rigid copyright and usage rights policies – even if content is published under some sort of commons license. In this respect it is important to mention that research on legal problems arising from the automated processing of content released under differing commons licenses is still missing (as far as I know – does anybody have a hint for me?). But with the current decision of the European Parliament we can observe a very promising shift towards the notion that the internet is made up of much more than its commercial exploitability, and that any attempt to stifle this notion by imposing unbalanced regulatory restrictions on the rights of users is a major threat not just to the internet as it exists but to democracy itself.

In this respect, enjoy a great talk by Lawrence Lessig on this topic.

LEARNtec Forum Austria 2008

October 07, 2008 By: Andreas Blumauer Category: Companies & Institutions, Conferences & Events

At the end of this week LEARNtec Austria 2008 is going to take place in Vienna. I am going to give a short talk on “How to learn in the social semantic web” (in German) and I am already excited, because I can discuss some interesting questions: How has the process of learning changed over the last few years? Why is there such a big gap between all the new possibilities like “Learning Communities”, “Collaborative Platforms”, “Enterprise 2.0” and the reality of learning in all kinds of organisations?

I have done a lot of teaching in the last few years (especially at Universities of Applied Sciences), and in reality there is no systematic approach to how new media is used, neither for teaching at schools nor for continuing education at most companies.

But as time passes, more and more students and employees are fully aware of platforms like elgg and busuu or helpful search tools like cluuz.

Soon they will be asking: Why do we still moodle around?

Author: Andreas Blumauer

EU Commission started a Consultation on the Internet of Things aka Web 3.0

October 07, 2008 By: Tassilo Pellegrini Category: Companies & Institutions, Politics, Privacy & Information Ethics

A few days ago I wrote about the EU Commission’s definition of Web 3.0. Now they have started a consultation on that topic.

To be precise it is about “early challenges regarding the Internet of Things”.

And it will focus on

architectures, control of critical infrastructures, emerging applications, security, privacy and data protection, spectrum management, regulations and standards, broader socio-economic aspects.

Contributions can be sent to infso-iot-europe@ec.europa.eu by 28th November 2008.

Take your chance! Visit their consultation site.

Author: Tassilo Pellegrini

EU Commission’s (short sighted) Definition of Web 3.0

September 30, 2008 By: Tassilo Pellegrini Category: Companies & Institutions, Politics

An interesting article for all those who are interested in technology discourse. In a recent VNU.net post the EU Commission made a statement about their understanding of Web 3.0:

While Web 2.0 described the trend towards online collaborative working, including the evolution of social networking sites, wikis and blogs, Web 3.0 will rely on high-performance broadband infrastructure.

According to Viviane Reding, Commissioner for Information Society and Media, Web 3.0

means seamless, anytime, anywhere business, entertainment and social networking over fast reliable and secure networks. [...] It means the end of the divide between mobile and fixed lines. We must make sure that Web 3.0 is made and used in Europe.

This sounds to me like selling old wine in new bottles, revitalising the Commission’s infrastructure policies. And although broadband is a crucial factor in the evolution of the web, the Commission totally misses the point about the semantics-related innovation paths rolling out in a Web 3.0 scenario.

If anyone has the opportunity, please give Mrs. Reding a briefing!

“75 Bleeding Edge Search Engines” … according to CMS Wire

August 29, 2008 By: Tassilo Pellegrini Category: Companies & Institutions, Search Engines, Tools & Software

This article on CMS Wire from July 10, 2008 is a nice read for all search engine aficionados. It lists 75 web search engines and categorizes them according to their technology and application domain. In the category “Semantic and Natural Language Search Engines” you will find the usual suspects like Powerset, Hakia, Swoogle, Intellidimension, Falcon, Yahoo! Microsearch, SWSE and Watson. Sadly, Freebase Parallax is not on the list, but I doubt it was online by then.

Personally I like the category “Bizarre / Strange Search Engines” most. Ever used Ms. Dewey, a Microsoft search engine …

that sings to you, insults you and lives under a highway flyover in a futuristic cityscape?

At first sight a really good laugh. Discussing it from a gender perspective would definitely be worth another post.
