Semantic Web Company

The Semantic Puzzle

Open World Assumptions


Archive for the ‘Mashups & Web services’

Interview with Georgi Kobilarov: “I believe that data publishing must happen in a distributed style.”

March 26, 2010 By: Tassilo Pellegrini Category: Linked Data & Open Data, Mashups & Web services, Semantic Web Applications, Tools & Software 1 Comment →

Uberblic.org connects structured data from the web. Its Berlin-based inventor, Georgi Kobilarov, gives a brief insight into the mashup service and talks about the challenges of building applications on Linked Data.

You have recently published the service uberblic.org, a Linked Data mashup editor. What was your motivation to develop this tool?

Uberblic.org provides an integrated view of web data. Our goal is to integrate all the structured data on the web and give web developers a single point of access to that reconciled data. More than that, we will open up the tools we use to manage the data sources to the community, so that people can help us curate that repository of free data. We re-publish all the data we import as Linked Data, under the licenses of the original data publishers.

Some of the data sources we import are available in the Linked Open Data cloud as well, but many are not. Linked Data is an elegant way to publish data in a distributed way on the web, but consuming it from that distributed cloud is – at least – impractical. In every real-world application using linked data from the web I’ve seen, organizations built up internal copies of the cloud, and often even reconciled linked data sources. They build their own Linked Data proxies. Uberblic.org helps those users by providing one public proxy for data from the web. Many of our sources are monitored for data changes, and the corresponding data in uberblic is updated in real time.


Can you give us a brief insight into how the tool works? What technology is it built on?

My company, Uberblic Labs, has developed a data integration platform that we use to power uberblic.org. We call it the Uberblic Platform (the name uberblic is derived from the German “Überblick” – English “overview”). This platform enables us to do the full process of “data fusion”: Importing and converting external data sources, mapping the data schemas to a central ontology, filtering out data errors, automatically suggesting duplicates to the user, and merging data from different sources into a single, reconciled representation.
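The "data fusion" steps named above can be illustrated with a minimal Python sketch. It is not the Uberblic Platform's code (which, as the interview goes on to say, is written in Scala). The source names, field names, mapping table and the label-based duplicate heuristic are all assumptions made for illustration, and the sketch merges duplicate candidates automatically, whereas the platform only suggests them to the user.

```python
from collections import defaultdict

# Hypothetical schema mappings: source-specific field -> central ontology property.
SCHEMA_MAPPINGS = {
    "wikipedia": {"name": "label", "birth_place": "birthPlace"},
    "musicdb": {"title": "label", "born_in": "birthPlace"},
}


def map_schema(source, record):
    """Rename source fields to central-ontology properties, dropping unmapped ones."""
    mapping = SCHEMA_MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def is_clean(record):
    """Filter step: reject records with a missing or blank label."""
    return bool(record.get("label", "").strip())


def duplicate_key(record):
    """Naive duplicate heuristic (an assumption): case-insensitive label match."""
    return record["label"].strip().lower()


def fuse(sources):
    """Run import -> map -> filter -> group duplicate candidates -> merge."""
    groups = defaultdict(list)
    for source, records in sources.items():
        for raw in records:
            mapped = map_schema(source, raw)
            if is_clean(mapped):
                groups[duplicate_key(mapped)].append((source, mapped))

    fused = []
    for candidates in groups.values():
        merged = {"sources": sorted({s for s, _ in candidates})}
        for _, record in candidates:
            for prop, value in record.items():
                merged.setdefault(prop, value)  # first source wins on conflict
        fused.append(merged)
    return fused
```

Calling `fuse` with two sources that describe the same entity under different field names yields one reconciled record that lists both sources. A real workflow would put a person between the grouping and merging steps.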

Structured and semi-structured data from the web is an excellent use case for our software platform, since there we come across all the interesting cases of real-world data heterogeneity. But what I think is especially powerful and yet missing in other Linked Data projects I know, is the ability to subscribe to update-feeds. We do that extensively, fetching updates in real-time from Wikipedia and the like.

Our platform is built in Scala and runs on a cluster of machines, with workers communicating through a messaging system. We developed an RDF storage layer on top of a distributed key-value store for storing all provenance information used in the extraction process, currently around 100 million named graphs for uberblic.org. That storage layer does not directly provide SPARQL access, so we also push all the output data into a SPARQL endpoint hosted by Talis.
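To make the idea of named graphs on a key-value store more concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not the Uberblic storage layer: a plain dict stands in for the distributed store, the class and method names are hypothetical, every object is treated as a plain literal, and no escaping is done in the export.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate obj")


class NamedGraphStore:
    """Toy quad store: each named graph is one key in a key-value mapping."""

    def __init__(self):
        self._kv = {}  # stand-in for a distributed key-value store

    def put_graph(self, graph_uri, triples, source, fetched_at):
        """Store a graph together with the provenance of its extraction."""
        self._kv[graph_uri] = {
            "triples": list(triples),
            "provenance": {"source": source, "fetched_at": fetched_at},
        }

    def triples_about(self, subject):
        """Yield (triple, provenance) pairs for one subject across all graphs."""
        for entry in self._kv.values():
            for triple in entry["triples"]:
                if triple.subject == subject:
                    yield triple, entry["provenance"]

    def to_nquads(self):
        """Serialize to N-Quads-style lines, e.g. for loading into a SPARQL store."""
        lines = []
        for graph_uri, entry in self._kv.items():
            for t in entry["triples"]:
                lines.append(
                    f'<{t.subject}> <{t.predicate}> "{t.obj}" <{graph_uri}> .'
                )
        return lines
```

Keeping one key per named graph makes it cheap to replace a graph when its source changes, but the store cannot answer graph-pattern queries on its own. That is why exporting to a separate SPARQL endpoint, as described above, is a natural complement.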

What have been the biggest challenges in tackling the integration issues of dispersed data?

It was quite a steep learning curve to do Linked Data not only in an academic environment, but in a reliable, industry-strength set-up. In academia, there was always the excuse that things are just research prototypes. Now that excuse is gone. That’s also where it becomes necessary to manually clean up data. And there are two ways to do that: Either you enable the users to change facts directly in your repository after you have imported the external data (that is what Freebase does), or you facilitate clean-up cycles in the original data source and fetch these updates in real-time. That is what we do.

I believe that data publishing must happen in a distributed style, because then each data source gets taken care of by a specialized group of people using specialized tools. And it’s what you see not only on the web, but also inside organizations and enterprises. But consuming data through centralized APIs is more than just convenient. We all use Google or another search engine as a central access point to web pages which are published in a distributed way all over the web, don’t we? Can you imagine today researching a topic on the web without the centralization power of search engines, just by following links across web sites, like in the old days?

When we built the Uberblic Platform, some of the things I imagined to be large headaches, like schema mapping, turned out to work really well. Those pathological cases you often see in academic “challenges” are – well – pathological. It’s not necessary to solve them fully automatically through super-intelligent algorithms. Much more important than the sophistication of your algorithms are well-designed workflows, so that the user becomes a part of the solution. And that’s not about crowd-sourcing or swarm intelligence: the editorial curation of schema mappings and object reconciliation can be done by just a small team of people, if they have the right set of tools.

What are the next plans with uberblic.org? Where will the journey go?

Uberblic.org will continue to integrate more interesting and useful data sources from the web, and we will start making more APIs available to web developers to build their applications on top. We are also looking for partners who are interested in developing applications and have been struggling in the past to get the cross-source data from the web they need.

The work on improving uberblic.org will also benefit our Uberblic Platform, and hence our clients who use that same software for integrating organizational data sources with each other and with the web of data.

About Georgi Kobilarov

Georgi is founder and managing director of Uberblic Labs, a company based in Berlin specialized in Linked Data integration. He worked as a research associate in the Web-based Systems Group at Freie Universität Berlin and as a visiting researcher at Hewlett Packard Labs Bristol. As co-founder and lead developer of DBpedia, he was also a day-one contributor to the Linking Open Data project. Georgi is consulting with the BBC on several Linked Data related projects. He organizes the Web of Data Meetup London, a bi-yearly gathering of the UK Linked Data community. Georgi graduated with a Diplom in business administration from Freie Universität Berlin and has many years of work experience as a software developer. Visit his blog: http://blog.georgikobilarov.com


George Anadiotis: “Linked Data brings value by offering an alternative approach to lightweight data integration and mashups.”

December 10, 2009 By: Tassilo Pellegrini Category: Linked Data & Open Data, Mashups & Web services, Semantic Web Applications, Software Development, Tools & Software, Vocabularies & Languages No Comments →

George Anadiotis is an expert on artificial intelligence with academic roots at the Vrije Universiteit, Amsterdam. In February 2009 he took the position of R&D Director at the Greek technology company IMC. I met him in September at I-SEMANTICS 2009, where he and his team contributed to the Triplification Challenge. In their paper Linked Data for the Masses they considered the pragmatic value of Linked Data from an inbound and an outbound perspective. In his words:

We started experimenting with the technical infrastructure needed and created some proof-of-concept applications. Part of this work was enabling Linked Data access for the front-end infrastructure we used, Liferay portal. We decided on the appropriate vocabularies for the type of content we wanted to publish (FOAF, SIOC and MOAT mainly), delved into the internals of Liferay and used D2R to map its relational database to the vocabularies of choice, also using techniques to improve performance as much as possible. Since Liferay itself is also based on the notion of communities, we thought our work would be more widely applicable and useful, so we chose to submit it for review at the Triplification Challenge and make it available to the community as open source software. Our applications have gradually matured and are about to be deployed in our commercial projects, while at the same time we are now making the Liferay Linked Data Module available as a Sourceforge project and we are working with Liferay management in order to disseminate this effort to the community and also include it in a future release of the software.
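The core idea of mapping relational rows to a vocabulary such as FOAF can be sketched in a few lines of Python. This is only an illustration of the principle, not the Liferay Linked Data Module and not D2R's own mapping language. The `users` table, its columns, the URI pattern and the column-to-property table are all hypothetical and do not reflect Liferay's actual schema.

```python
import sqlite3

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/portal/user/"  # hypothetical URI pattern

# Hypothetical mapping: column name -> FOAF property.
COLUMN_TO_PROPERTY = {"screen_name": FOAF + "nick", "full_name": FOAF + "name"}


def rows_to_triples(conn):
    """Map each row of the (hypothetical) users table to FOAF triples."""
    conn.row_factory = sqlite3.Row
    triples = []
    query = "SELECT id, screen_name, full_name FROM users ORDER BY id"
    for row in conn.execute(query):
        subject = f"{BASE}{row['id']}"
        triples.append((subject, RDF_TYPE, FOAF + "Person"))
        for column, prop in COLUMN_TO_PROPERTY.items():
            if row[column] is not None:  # NULL columns produce no triple
                triples.append((subject, prop, row[column]))
    return triples


def demo_database():
    """Build a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, screen_name TEXT, full_name TEXT)"
    )
    conn.execute("INSERT INTO users VALUES (1, 'alice', 'Alice Example')")
    conn.execute("INSERT INTO users VALUES (2, 'guest', NULL)")
    return conn
```

Running `rows_to_triples(demo_database())` returns one `foaf:Person` typing triple per row plus one triple per non-null mapped column. A tool like D2R expresses the same kind of mapping declaratively and serves the result on demand instead of materializing it.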

Read the full interview here.


Invited Talk at IFRA 2009

September 25, 2009 By: Tassilo Pellegrini Category: Calls & Competitions, Conferences & Events, Internet & Media, Knowledge Management, Mashups & Web services, Social Software No Comments →

I will give a talk about the relevance of the Semantic Web and Linked Data for news publishers at this year’s IFRA summit in Vienna on October 15, 2009. IFRA is the World Association of Newspapers and News Publishers, and its Technical Group Publishing is starting to deal with the Semantic Web. Further invited speakers are Michael Steidl (IPTC) and Robert Schmidt-Nia (dpa mediatechnology).


Calais, Zemanta or textwise?

July 07, 2009 By: Andreas Blumauer Category: Mashups & Web services, Text Mining 2 Comments →

Besides the W3C’s Linked Data initiative, it is semantic services like Calais, Zemanta or textwise that have made the advantages of the Semantic Web visible to a broader community in the last few months.

Each of those services follows a slightly different approach, but in a nutshell: they all offer an API that provides “similarity search” around social media or enhances enterprise information management.

Like a magic bullet, those services offer relief from information overload and seem to be becoming a kind of “semantic web killer application“.

If you’re familiar with one or more of those services, drop a comment and let us know what you have experienced so far, or whether you can think of any applications or further developments you would like to see around these kinds of services.

If you are not familiar with this stuff, for a quick demo go to

The widget uses text from this blog to calculate similar stuff from the web.



Chris Bizer talks about the commercial opportunities of linked data

April 17, 2009 By: Tassilo Pellegrini Category: Corporate Semantic Web, Linked Data & Open Data, Mashups & Web services, Privacy & Information Ethics No Comments →

In a recent interview Prof. Chris Bizer from FU Berlin gave some insights into the commercial opportunities of linked data. In the short run he predicts three application areas:

I think we will see a growing number of applications that use data from the public Web as background knowledge to offer better search capabilities and to augment local content with additional content from the Web of Data.
[...]
Besides the classic search engines, there might also be market opportunities for new search engines that specialize in Linked Data. [...] This will allow them to sell access to cleaned views on the Data Web and to become central components within Linked Data applications.
[...]
Within the corporate market, there is interest in using Linked Data as a lightweight, pay-as-you-go data integration technology.

Additionally Chris comments on the latest developments in the area of triple stores and D2RQ, and the necessity for more privacy awareness and information accountability in an increasingly interlinked world.

Read the full interview on our homepage.


Keep the Semantic Web trusty

March 13, 2009 By: Thomas Thurner Category: Corporate Semantic Web, Mashups & Web services, Politics, Privacy & Information Ethics, Text Mining 1 Comment →


In recent days we at Semantic Web Company have had a lot of discussions about how the Semantic Web (call it Web 3.0 if you like) will develop. Several stakeholders already see that a potential danger comes along with the technical realisation of Web 3.0: it is now possible to create applications and mashups with semantic technologies that are a real drain on privacy and information ethics. Without an underpinning discussion of the ethical framework for technologies like linked data, text mining, biometric systems and geo-systems in combination with the web of data, the whole domain is in danger of being doomed like genetic engineering some years ago.

It’s crucial for public opinion on the Semantic Web to address the inherent risks regarding privacy and ethics. It is in this context that I read Tim Berners-Lee’s statement yesterday: “W3C wants to help make sure data use is appropriate,” he said. Berners-Lee, who is director of W3C, said in an interview on Wednesday that the teams working on the Semantic Web project are making sure that privacy principles are included in its architecture: “The Semantic Web project is developing systems which will answer where data came from and where it’s going to – the system will be architectured for a set of appropriate uses.”

It may be an important step in keeping the further development of the Semantic Web trustworthy in the eyes of the public that the W3C has privacy and information ethics on its agenda and that people like Berners-Lee back this with their reputation. But it is also crucial to build this awareness on the corporate side. Only if everyone within the domain follows a common ethical understanding will public opinion focus on the future potential of the Semantic Web rather than fear it.


Semantic-like tools to pimp your blog

March 09, 2009 By: Thomas Thurner Category: Mashups & Web services, Search Engines, Tools & Software 1 Comment →

More and more tools are appearing in the Web 2.0 domain that bring semantic technologies into bloggers’ everyday lives. Zemanta was certainly a breakthrough in the annotation of blog entries. I’m running this service on my private and my corporate blog. It is easy to integrate into every common blog software and it really saves time in my daily work. Unfortunately it is available only for English blogs.

Another service which came up recently is Quintura, which provides search capabilities for your own blog with a visual map of tags or hints based on an index built from your blog entries. It is easy to customize to your blog’s style through a simple interface. Quintura offers code snippets to copy into your blog post or sidebar. Even if it is not a semantic search engine in the narrow sense, Quintura provides a fine semantic-like interface for meaning-sensitive search. See how Quintura is implemented in The Semantic Puzzle in our sidebar.


Linked Data in Enterprises – some ideas for business models

February 10, 2009 By: Andreas Blumauer Category: Linked Data & Open Data, Mashups & Web services 7 Comments →

This morning I wrote a short blog post philosophizing about linked data and its value for enterprises. I asked a couple of questions, and at its core I was wondering: “Which services and key players will drive the web of data in the next few months?”

In the meantime I had the pleasure of listening to Talis’ Semantic Web Gang Podcast (January 2009, with Tom Tague from Calais) and some answers came to mind:

  1. Some service providers will provide the highest accuracy regarding the links or tags (and the “things” behind them) they provide for a given resource or document (like Open Calais does). Tom Tague mentioned quite often in the podcast how important disambiguation is for providing the highest quality.
  2. Some will provide end-points to a given “thing” like a company, a person etc. in addition to free ones like DBpedia, but they will always try to refer to established URIs like the ones in DBpedia or Open Calais (e.g. IBM’s URI at Calais). Those companies will provide more facts, for example about a person, than those which are available now for free. They will build on the LOD infrastructure and will live in symbiosis with group number 3. They will control to whom additional facts are given, but they will build on exactly the same interoperable framework as the “Linking Open Data” community does.
  3. Some companies will build applications on top of the linked data infrastructure. They have two kinds of knowledge: Who has the best end-points to a complex “thing” which consists of a couple of other atomic things (which necessarily exist in the web of data)? Who is interested in such a mashup?

My prediction: One possible business model will be pretty much the same as the one iTunes is built upon at the moment: You can listen to a song for free, but only for a couple of seconds; if you want more, you pay 99 cents.

If you want to know a little bit about Werner Faymann (who is Austria’s chancellor) you go to an application which makes use of DBpedia (or the like) starting at http://dbpedia.org/page/Werner_Faymann.

If you pay 99 cents (or a bit more…) you get even more facts about Mr. Faymann, nicely mashed up with other facts from the LOD cloud and together with special content from some other linked data sources, produced at relatively low cost due to the high interoperability the Semantic Web provides – thanks to W3C and the whole community.
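The free starting point of such an application could be a plain SPARQL lookup against DBpedia. The Python sketch below only builds the request and does not send it. The function names are hypothetical, and the `format` query parameter is an assumption about the public endpoint at http://dbpedia.org/sparql rather than part of the SPARQL standard.

```python
from urllib.parse import urlencode

PAGE_PREFIX = "http://dbpedia.org/page/"
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def page_to_resource(page_url):
    """Turn a DBpedia HTML page URL into the URI that identifies the thing."""
    if not page_url.startswith(PAGE_PREFIX):
        raise ValueError("not a DBpedia page URL: " + page_url)
    return RESOURCE_PREFIX + page_url[len(PAGE_PREFIX):]


def facts_query(resource_uri, limit=25):
    """Build a SPARQL query listing property/value pairs for one resource."""
    return (
        f"SELECT ?property ?value WHERE {{ <{resource_uri}> ?property ?value }} "
        f"LIMIT {limit}"
    )


def endpoint_url(query, endpoint="http://dbpedia.org/sparql"):
    """Encode the query as a GET request URL (the format parameter is an assumption)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)
```

For the example above, `page_to_resource("http://dbpedia.org/page/Werner_Faymann")` gives the resource URI, which `facts_query` then embeds in the query. A paid tier would run the same kind of query against additional, access-controlled end-points.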


Pimp your Google

February 04, 2009 By: Andreas Blumauer Category: Mashups & Web services, Search Engines No Comments →

Sure, that’s not the end of the line – but “a little semantics goes a long way” (Jim Hendler): With two Firefox add-ons, you can pimp your Google and you will get (1) a better overview of the search results, (2) a kind of moderated search and (3) information from Wikipedia along with the results.

Install Cloudlet and Googlepedia (Don’t forget to donate!) and you will see something like this:

[Screenshot: pimp_your_google]

Sure, both “mashups” are not based on RDF, and the “TagCloud” is not as accurate as we would wish, but let us be patient again. At least this picture makes end users yearn for a bit more semantics (which goes a long way…) on top of the usual lists of search results.


BibSonomy – the blue social bookmark and publication sharing system

February 02, 2009 By: Gerd Stumme Category: Mashups & Web services, Software Development 4 Comments →

BibSonomy is a Web 2.0 style collaborative bookmarking and publication management system. In the style of YouTube, Flickr and Del.icio.us, it allows you to store the metadata of your own publications and of all papers that you consider interesting. It also lets you store bookmarks – and share them with others. The Semantic Web Blog already reported on BibSonomy in The Wild vs The Orderly: Folksonomies and Semantics (TRIPLE-I 2008) in September 2008. The BibSonomy team is very active, and has implemented many new features.

It is thus high time to tell you about them. Let’s start with the new layout, introduced in December. It’s much closer to the Web 2.0 look & feel, with pastel colors and rounded corners. The navigation has become a bit more consistent, and you can now select whether you want to see both bookmarks and publications, or just one of the two. BibSonomy is also available in German now. Most other extensions of BibSonomy are about integration with other systems. The most useful are:

  • GoogleSonomy is a new Firefox add-on which integrates search results from BibSonomy directly into your Google search. The add-on is customizable, so you can decide whether to search in your personal publications and/or bookmarks, or to search over all BibSonomy posts. The extension is available from the Mozilla Addon Page.
  • BibSonomy now also allows you to export citation information to Zotero. Zotero is a Firefox extension which helps you to collect, manage and cite publications locally in your browser. The other direction is not fully automated yet. However, there is a copy-and-paste workaround.
  • Bloggers who are using WordPress can integrate data from BibSonomy into their posts – for instance your tag cloud, or your last three publications (or all of them). Conversely, your blog posts will (almost) automatically be published on BibSonomy. A more general way of including BibSonomy content in your system is BibSonomy’s JSON feed. JSON (JavaScript Object Notation) is a lightweight data-interchange format, which is now available for all BibSonomy pages.
  • As an alternative login procedure, BibSonomy now also supports OpenID, which is an open, decentralized standard allowing users to log on to many different services on the web using the same identity (single sign-on). This kind of authentication is provided by a growing number of websites, including large ones like AOL, Google, Microsoft, MySpace, Yahoo and many others. You may even have an OpenID without knowing it, e.g. when you have a Flickr account. Why not use it for logging in to BibSonomy as well?
  • The family of scrapers for automatically extracting references from digital libraries or publishers’ websites has been extended, allowing you to store publication metadata automatically from over 60 sites. The scraping service can be used independently from BibSonomy for other purposes by everyone needing access to bibliographic metadata.
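As a rough illustration of how a JSON feed like the one mentioned above could be consumed, the following Python sketch extracts the most recent publications and a tag cloud from a feed document. The payload and its field names (`items`, `type`, `label`, `year`, `tags`) are assumptions made up for this example and may not match BibSonomy’s actual feed, so check a real feed before relying on them.

```python
import json
from collections import Counter

# Hypothetical sample payload; the real feed's structure may differ.
SAMPLE_FEED = """
{"items": [
  {"type": "Publication", "label": "Paper A", "year": "2008",
   "tags": ["folksonomy", "semantics"]},
  {"type": "Bookmark", "label": "Site B", "tags": ["semantics"]},
  {"type": "Publication", "label": "Paper C", "year": "2009",
   "tags": ["tagging"]}
]}
"""


def latest_publications(feed_text, count=3):
    """Return the labels of the newest publications, newest first."""
    items = json.loads(feed_text).get("items", [])
    publications = [i for i in items if i.get("type") == "Publication"]
    publications.sort(key=lambda i: i.get("year", ""), reverse=True)
    return [p["label"] for p in publications[:count]]


def tag_cloud(feed_text):
    """Count how often each tag occurs across all posts in the feed."""
    items = json.loads(feed_text).get("items", [])
    return Counter(tag for i in items for tag in i.get("tags", []))
```

With the sample payload, `latest_publications(SAMPLE_FEED)` lists the 2009 paper before the 2008 one, and `tag_cloud(SAMPLE_FEED)` counts “semantics” twice. In practice the feed text would be fetched over HTTP from the JSON variant of a BibSonomy page.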

If you want to learn more about these features, visit the BibSonomy blog. Last but not least, there is a new BibSonomy developer site. It provides access to some of the BibSonomy modules. All source code is released under GPL and LGPL licenses. If you want to experiment with the code, have a look!
