Semantic Web Company

The Semantic Puzzle

Open World Assumptions


Linking Open Data to Thesaurus Management

February 16, 2010 By: Tassilo Pellegrini Category: Corporate Semantic Web, Knowledge Management, Linked Data & Open Data, Search Engines, Semantic Web Applications, Software Development

The Vienna-based company punkt. netServices is just about to release a demo version of their PoolParty service, a SKOS-based thesaurus management tool with linked data capabilities. I had the chance to pre-read a white paper and test their service. Here is a brief overview. You can also try a demo.

Purpose

PoolParty was conceived to support applications such as:

  • Semantic search engines
  • Recommender systems (similarity search)
  • Corporate bookmarking
  • Annotation- & tag recommender systems
  • Autocomplete services and faceted browsing

These use cases can be achieved either by running PoolParty stand-alone or by integrating it with existing enterprise search engines, document management systems or enterprise wikis.

Thesaurus Management

PoolParty aims to be easy to use for people without a strong Semantic Web background or special technical skills. The GUI is entirely web-based and uses AJAX, so that users can, for example, quickly merge two concepts via drag & drop. A tree view or a graph view of the concepts gives an overview of the thesaurus.

PoolParty also helps to add concepts to a thesaurus semi-automatically: it can analyse documents (e.g. web pages or PDF files) relevant to the thesaurus’ domain in order to glean candidate terms. This is done with the KEA key-phrase extractor. Users can select extracted terms, which then become “free concepts”; these can later be integrated into the thesaurus, turning them into “approved concepts”.
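The candidate-to-approved workflow can be pictured as a small state machine. The following is a minimal Python sketch under that assumption; the state names, the Term class and the functions select_term and approve are hypothetical illustrations, not PoolParty's API, and the KEA extraction step itself is not shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    CANDIDATE = "candidate"  # term proposed by the key-phrase extractor
    FREE = "free"            # selected by a user, not yet placed in the hierarchy
    APPROVED = "approved"    # integrated into the thesaurus


@dataclass
class Term:
    label: str
    status: Status = Status.CANDIDATE
    broader: Optional[str] = None  # label of the concept it was filed under


def select_term(term: Term) -> None:
    """A user picks an extracted candidate; it becomes a free concept."""
    if term.status is not Status.CANDIDATE:
        raise ValueError(f"{term.label!r} is not a candidate term")
    term.status = Status.FREE


def approve(term: Term, broader: str) -> None:
    """A free concept is filed under a broader concept and becomes approved."""
    if term.status is not Status.FREE:
        raise ValueError(f"{term.label!r} must be a free concept first")
    term.broader = broader
    term.status = Status.APPROVED


# Usage: hypothetical candidates as they might come out of key-phrase extraction
candidates = [Term("solar thermal"), Term("click here")]
select_term(candidates[0])  # the user keeps only the useful term
approve(candidates[0], "renewable energy")
print([(t.label, t.status.value) for t in candidates])
# [('solar thermal', 'approved'), ('click here', 'candidate')]
```

Making the two transitions explicit means a term cannot skip the free-concept stage, which mirrors the two-step review described above.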

Documents can be searched in various ways: by keyword search in the full text, by searching for their tags, or by semantic search and similarity search. Semantic search takes into account not only a concept’s preferred label but also its synonyms and the labels of its related concepts. Users can manually remove query terms used in semantic search, and they can adjust the boost values for the various relations it considers. The recommendation mechanism for calculating document similarity works in the same way.
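To make the query-expansion idea concrete, here is a minimal Python sketch. It assumes a tiny in-memory thesaurus and made-up boost values per relation (preferred label, synonym, related concept); none of the names or numbers come from PoolParty. The output uses Lucene's query syntax, where "phrase"^boost weights a term.

```python
# Hypothetical mini-thesaurus: preferred label, synonyms (altLabels), related concepts
THESAURUS = {
    "c1": {"pref": "photovoltaics", "alt": ["PV", "solar cells"], "related": ["c2"]},
    "c2": {"pref": "solar energy", "alt": ["solar power"], "related": []},
}

# Hypothetical default boost per relation; users could adjust these
DEFAULT_BOOSTS = {"pref": 1.0, "alt": 0.8, "related": 0.5}


def expand(concept_id, boosts=DEFAULT_BOOSTS, exclude=frozenset()):
    """Return (term, boost) pairs for a semantic search on one concept."""
    concept = THESAURUS[concept_id]
    terms = [(concept["pref"], boosts["pref"])]
    terms += [(alt, boosts["alt"]) for alt in concept["alt"]]
    for rel_id in concept["related"]:
        rel = THESAURUS[rel_id]
        for label in [rel["pref"], *rel["alt"]]:
            terms.append((label, boosts["related"]))
    # Users may drop individual query terms before the search runs
    return [(t, b) for t, b in terms if t not in exclude]


def to_lucene(terms):
    """Render weighted terms as a Lucene query string: "phrase"^boost OR ..."""
    return " OR ".join(f'"{t}"^{b}' for t, b in terms)


print(to_lucene(expand("c1", exclude={"PV"})))
# "photovoltaics"^1.0 OR "solar cells"^0.8 OR "solar energy"^0.5 OR "solar power"^0.5
```

Passing a different boosts dictionary corresponds to adjusting the relation weights; a similarity search could reuse the same expansion for the concepts attached to a document.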

By default, PoolParty also publishes a Semantic Wiki version of its thesauri, which provides an alternative way to browse and edit concepts. Through this feature, anyone can get read access to a thesaurus and, optionally, also edit, add or delete concept labels. Search and autocomplete functions are available here as well. The wiki’s XHTML source is also enriched with RDFa, which exposes all RDF metadata associated with a concept to RDF search engines and crawlers. (See two examples: the Cocktail thesaurus and the Standard Thesaurus for Economics.)

PoolParty also supports importing thesauri in SKOS format (including several consistency checks) or Zthes format. These functions can also be consumed as stand-alone web services via PoolParty SKOS Services. Additionally, lists of concepts and their labels can be imported via CSV files.
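The post does not specify PoolParty's CSV layout, so the following Python sketch assumes a hypothetical one: columns id, prefLabel and altLabels, with synonyms separated by "|". It turns each row into a skos:Concept in Turtle under an example namespace (http://example.org/thesaurus/, also an assumption). The SKOS namespace URI is the standard one.

```python
import csv
import io

BASE = "http://example.org/thesaurus/"  # hypothetical concept namespace


def literal(text: str, lang: str) -> str:
    """Quote a string as a language-tagged Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def csv_to_skos(csv_text: str, lang: str = "en") -> str:
    """Convert rows with columns id, prefLabel, altLabels ("|"-separated) to Turtle."""
    out = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        out.append(f"<{BASE}{row['id']}> a skos:Concept ;")
        out.append(f"    skos:prefLabel {literal(row['prefLabel'].strip(), lang)}")
        alts = [a.strip() for a in (row.get("altLabels") or "").split("|") if a.strip()]
        for alt in alts:
            out[-1] += " ;"  # continue the predicate list for this subject
            out.append(f"    skos:altLabel {literal(alt, lang)}")
        out[-1] += " ."      # close the statement for this concept
    return "\n".join(out)


sample = "id,prefLabel,altLabels\nc1,Mojito,\nc2,Gin Tonic,G&T|Gin and Tonic\n"
print(csv_to_skos(sample))
```

A real importer would also run consistency checks (e.g. one prefLabel per language), which this sketch leaves out.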

Linked (Open) Data

PoolParty not only publishes its thesauri as Linked Open Data (in addition to a SPARQL endpoint) but also consumes LOD in order to expand thesauri with information from LOD sources.

Concepts in the thesaurus can be linked to, for example, DBpedia via a service like Georgi Kobilarov’s DBpedia lookup service, which takes the label of a concept and returns possible matching candidates. The system suggests relevant resources from DBpedia, and the user can select the one that matches the concept in their thesaurus, thereby creating a skos:exactMatch relation between the concept URI in PoolParty and the DBpedia URI. The same approach can be used to link to other SKOS thesauri available as Linked Data.
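The linking step can be sketched in Python as follows. The lookup service's actual request and response format is not reproduced here: the candidates are a hardcoded list of (label, URI) pairs, the local concept URI is hypothetical, and ranking by difflib string similarity is an assumed stand-in for whatever ordering the real service uses. The emitted statement uses the standard skos:exactMatch property in N-Triples form.

```python
from difflib import SequenceMatcher

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def rank_candidates(label, candidates):
    """Sort (candidate_label, uri) pairs by string similarity to the concept label."""
    def similarity(candidate):
        return SequenceMatcher(None, label.lower(), candidate[0].lower()).ratio()
    return sorted(candidates, key=similarity, reverse=True)


def exact_match(concept_uri, target_uri):
    """N-Triples statement linking a local concept to the resource the user chose."""
    return f"<{concept_uri}> <{SKOS_EXACT_MATCH}> <{target_uri}> ."


# Candidates as a lookup service might return them (hardcoded for illustration)
lookup_results = [
    ("Vienna Philharmonic", "http://dbpedia.org/resource/Vienna_Philharmonic"),
    ("Vienna", "http://dbpedia.org/resource/Vienna"),
]
suggestions = rank_candidates("Vienna", lookup_results)
chosen = suggestions[0][1]  # in the real workflow, the user makes this choice
print(exact_match("http://example.org/thesaurus/c42", chosen))
```

The ranking only orders the suggestions; the decision that two resources denote the same thing stays with the user, as the text describes.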

Other triples can also be retrieved from the target data source: for example, the DBpedia abstract can become a skos:definition, and geographical coordinates can be imported and used to display the location of a concept on a map, where appropriate. DBpedia category information may also be used to retrieve additional concepts of that category as siblings of the concept in focus, in order to populate the thesaurus.
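A minimal Python sketch of this enrichment, assuming the remote resource has already been fetched into (predicate, value, language) tuples. The property URIs below are the ones DBpedia commonly uses for abstracts and WGS84 coordinates, but treat the mapping table as an assumption rather than PoolParty's configuration; the sample values are illustrative.

```python
DBO_ABSTRACT = "http://dbpedia.org/ontology/abstract"
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
SKOS_DEFINITION = "http://www.w3.org/2004/02/skos/core#definition"


def enrich(concept_uri, remote, lang="en"):
    """Map (predicate, value, language) tuples from a linked resource
    onto statements about the local concept."""
    statements, coords = [], {}
    for predicate, value, value_lang in remote:
        if predicate == DBO_ABSTRACT and value_lang == lang:
            # the abstract in the thesaurus language becomes the definition
            statements.append((concept_uri, SKOS_DEFINITION, value))
        elif predicate in (GEO_LAT, GEO_LONG):
            coords[predicate] = float(value)
    if len(coords) == 2:  # only import a location if both coordinates exist
        statements.append((concept_uri, GEO_LAT, coords[GEO_LAT]))
        statements.append((concept_uri, GEO_LONG, coords[GEO_LONG]))
    return statements


# Illustrative remote data; abstracts come in several languages
remote = [
    (DBO_ABSTRACT, "Vienna is the capital of Austria.", "en"),
    (DBO_ABSTRACT, "Wien ist die Hauptstadt Österreichs.", "de"),
    (GEO_LAT, "48.2", None),
    (GEO_LONG, "16.37", None),
]
for s in enrich("http://example.org/thesaurus/c42", remote):
    print(s)
```

Filtering abstracts by language keeps one definition per thesaurus language instead of importing every translation.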

PoolParty can import a SKOS thesaurus from a Linked Data server and may also receive updates to thesauri imported this way. This feature was implemented in the course of the KiWi project, funded by the European Commission. KiWi also contains SKOS thesauri and exposes them as LOD. Each system can read a thesaurus via the other’s LOD interfaces and write it to its own store. This is facilitated by special Linked Data URIs that return, for example, all the top concepts of a thesaurus with pointers to the URIs of their narrower concepts, which allows other systems to retrieve a complete thesaurus by iteratively dereferencing concept URIs.
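Iterative dereferencing amounts to a breadth-first walk along narrower links. In this Python sketch, fetch is a hypothetical stand-in for an HTTP GET on a concept URI (content negotiation and RDF parsing are not shown), the response shape with a "narrower" list is assumed, and the top-concept list would come from the special URI mentioned above.

```python
from collections import deque
from typing import Callable, Dict, List


def harvest(top_concepts: List[str], fetch: Callable[[str], dict]) -> Dict[str, dict]:
    """Retrieve a whole thesaurus by dereferencing concept URIs breadth-first.

    fetch(uri) stands in for an HTTP GET on a concept URI and must return a
    description whose "narrower" entry lists the URIs of narrower concepts."""
    harvested: Dict[str, dict] = {}
    queue = deque(top_concepts)
    while queue:
        uri = queue.popleft()
        if uri in harvested:  # poly-hierarchy: concept reachable via two parents
            continue
        description = fetch(uri)
        harvested[uri] = description
        queue.extend(n for n in description.get("narrower", []) if n not in harvested)
    return harvested


# A hypothetical remote thesaurus, keyed by concept URI
REMOTE = {
    "ex:drinks": {"prefLabel": "Drinks", "narrower": ["ex:cocktails", "ex:longdrinks"]},
    "ex:cocktails": {"prefLabel": "Cocktails", "narrower": ["ex:mojito"]},
    "ex:longdrinks": {"prefLabel": "Long drinks", "narrower": ["ex:mojito"]},
    "ex:mojito": {"prefLabel": "Mojito", "narrower": []},
}
thesaurus = harvest(["ex:drinks"], REMOTE.__getitem__)
print(list(thesaurus))
# ['ex:drinks', 'ex:cocktails', 'ex:longdrinks', 'ex:mojito']
```

The seen-check matters because SKOS allows a concept to have several broader concepts; without it, shared concepts would be fetched repeatedly.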

Additionally, KiWi and PoolParty publish lists of concepts created, modified, merged or deleted within user-specified time frames. From these lists, each system can learn about updates that another system has made to one of its thesauri. It can then compare the versions of the concepts in both stores and write the corresponding updates to its own store.

This means each system decides autonomously which data it accepts, and there is no risk of one system pushing data into an external store that might lead to inconsistencies there. Data transfer and communication use REST/HTTP; no other protocols or middleware are necessary. Nor is rights management needed for each external system, which would otherwise have to be configured separately for each source.
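The pull-based synchronisation can be sketched in Python as follows. The change-list entry format (uri, kind, at), the function names and the conflict rule (keep the local copy if it is at least as recent, i.e. last writer wins) are all assumptions for illustration; the post does not say how KiWi or PoolParty resolve conflicts. Treating "merged" like "modified" is a further simplification.

```python
from datetime import datetime, timezone


def changes_between(change_list, start, end):
    """Entries of a published change list whose timestamp lies in [start, end)."""
    return [e for e in change_list if start <= e["at"] < end]


def plan_updates(remote_changes, local_modified):
    """Decide per concept what the local store will pull; nothing is pushed to it.

    local_modified maps concept URI -> time of the last local modification.
    Simplification: 'merged' is handled like 'modified'."""
    plan = {}
    for entry in sorted(remote_changes, key=lambda e: e["at"]):
        uri = entry["uri"]
        local = local_modified.get(uri)
        if local is not None and local >= entry["at"]:
            plan[uri] = "keep-local"  # local copy is at least as recent
        elif entry["kind"] == "deleted":
            plan[uri] = "delete"
        else:  # created, modified or merged
            plan[uri] = "fetch"       # dereference the concept URI again
    return plan


def at(hour):
    return datetime(2010, 2, 16, hour, tzinfo=timezone.utc)


published = [
    {"uri": "ex:a", "kind": "modified", "at": at(9)},
    {"uri": "ex:b", "kind": "deleted", "at": at(10)},
    {"uri": "ex:c", "kind": "created", "at": at(11)},
    {"uri": "ex:d", "kind": "modified", "at": at(20)},
]
window = changes_between(published, at(8), at(12))
print(plan_updates(window, {"ex:a": at(10)}))
# {'ex:a': 'keep-local', 'ex:b': 'delete', 'ex:c': 'fetch'}
```

Because the consumer computes its own plan from a published list, the remote side never needs write access, which is the property the paragraph above emphasises.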

Technology

The software is written in Java and uses the SAIL API, so it can be used with various triple stores. Thesaurus management itself (viewing, creating and editing SKOS concepts and their relationships) is done in an AJAX frontend based on the Yahoo User Interface library (YUI). Labels can alternatively be edited in a wiki-style HTML frontend. For key-phrase extraction from documents, PoolParty uses a modified version of the KEA 5 API, extended to use controlled vocabularies stored in a SAIL repository (this module is available under the GNU GPL). The analysed documents can be stored and indexed in Lucene/Solr or any other (enterprise) search system, along with the extracted and semantically related concepts.

Topic Maps and the Semantic Web

October 16, 2009 By: Tassilo Pellegrini Category: Conferences & Events, Miscellaneous, Tools & Software

From November 11 to 13, 2009, the relationship between Topic Maps and the Semantic Web will be one of the big issues at the 5th International Conference on Topic Maps in Leipzig, Germany. Asked about the relationship between TM and SemWeb, conference organizer Lutz Maicher says:

With the vision of the web of data, Topic Maps and the Semantic Web move closer over time. Anywhere URIs represent subjects, structured statements are gathered around them. In this context I see subj3ct.com as an interesting venture. This recently launched service provides URIs for 15 million subjects to be used in structured data. Naturally, linked data hubs like dbpedia or geonames.org are part of it. The crowd is invited to contribute to this collection, and the Topic Maps Lab also provides several feeds to register new URIs. Subj3ct.com turns out to be an infrastructure technology for Web 3.0 applications, regardless of whether they are based on Topic Maps or other Semantic Web technologies.

Through this convergence the uniqueness of each technology sharpens. Reasoning is the strong point of the Semantic Web. But the strengths of Topic Maps are semantic portals and the global federation of facts around subjects. Bringing together all, even contradictory, information about each subject, rather than building reasoning-ready consistent models of the world, is built into the genes of Topic Maps.

Read the full interview here.

Multimedia Semantics @ SAMT 2009

October 07, 2009 By: Tassilo Pellegrini Category: Conferences & Events, Linked Data & Open Data, Vocabularies & Languages

On the occasion of the upcoming 4th International Conference on Semantic and Digital Media Technologies (SAMT ’09), held December 2 to 4, 2009 in Graz, Austria, Werner Bailer from Joanneum Research gave us a short interview about the state of the art in multimedia semantics. Asked about multimedia and the Semantic Web, he says:

There have been a number of proposals for multimedia ontologies and mappings of multimedia vocabularies (cf. the excellent report from the W3C MM Semantics XG), differing in complexity and expressivity. Thus the W3C has chartered a working group to develop an ontology and API for multimedia content on the Web. The group is developing a lightweight core set of metadata properties and an API specification for accessing these properties, which may come from metadata documents in different standards. Thus mappings to many relevant standards have also been specified. The set of metadata properties will be formalized for interoperability with the Semantic Web. A W3C recommendation is expected in 2010.

Read the full interview here.

Looking back I-Semantics 2009

September 09, 2009 By: Tassilo Pellegrini Category: Conferences & Events

Last Friday, September 4, 2009, I-Semantics, the 5th International Conference on Semantic Systems, came to an end. I am extremely happy about the positive response I have received from so many people in the last few days. It was a lot of work, and I am glad everything worked out fine.

I-Semantics, which started on Wednesday, September 2, and was co-located for the third time with I-Know, the International Conference on Knowledge Management, attracted 450 participants. In line with our original idea of bringing the Semantic Web out of the echo chamber, this co-location has proven absolutely fertile: the semantic systems community and the knowledge management community fit well together and complement each other. So we had a rich program consisting of 64 scientific talks (30 I-Semantics / 34 I-Know), a poster session, an industry track and numerous mini tracks and discussion panels. Harald Sack, with whom I enjoyed pondering Net Neutrality and IPv6, reviewed the first, second and third conference days on his blog.

For the first time, we had the Pragmatic Web Community on board, which held a special track bringing in lots of new ideas and views on computational semantics. I also noticed that this track drew quite a large number of people from the social sciences and humanities, which is a promising signal and will hopefully lead to new research and human-oriented technologies.

Another highlight was this year’s matchmaking event, which aims at initiating business contacts between industry and academia. According to the organizers, the Styrian Research Agency and the Enterprise Europe Network, 120 bilateral meetings took place. Remarkably, 56 of the 71 registered participants had a company background.

And finally, we hosted the second Triplification Challenge, where Chris Bizer gave a keynote and introduced quite a few people to the idea of Linked Data. Unfortunately, Michael Hausenblas, who chaired this year’s challenge, could not attend, so I moderated the award ceremony and Chris assisted me in handing over the awards to the winners. For the results of the challenge, see Soeren Auer’s blog.

Wrapping up, all this would not have been possible without the great support of Prof. Klaus Tochtermann and his team from Know Center. Year after year they do a great job and it is a great opportunity and pleasure to work together with them. Big thanks also go to Adrian Paschke from Corporate Semantic Web of Free University of Berlin, Hans Weigand from Tilburg University and the guys from Salzburg New Media Lab, who helped to set up the I-Semantics conference this year.

The next I-Semantics will take place from September 1 – 3, 2010. Hope to see you next year in Graz!

Great satire: “Web 3.Oh No!”

August 04, 2009 By: Tassilo Pellegrini Category: Miscellaneous, Semantics & Philosophy

Found this piece on FCW.com. I love it!

Posted by John Klossner on Aug 03, 2009

For those of you, like me, who need a way to keep these things straight, I offer the following handy, wallet-sized program.

WEB 1.0 (browsers) – Users find data
WEB 2.0 (social networks) – Users find each other
WEB 3.0 (semantic Web) – Data find each other

Of course, a lifetime of science-fiction reading and viewing leads me to fear we can look forward to the following developments:

WEB 4.0 – Data create their own Facebook page, restrict friends.
WEB 5.0 – Data decide they can work without humans, create their own language.
WEB 6.0 – Human users realize that they no longer can find data unless invited by data.
WEB 7.0 – Data get cheaper cell phone rates.
WEB 8.0 – Data hoard all the good YouTube videos, leaving human users with access to bad ’80s music videos only.
WEB 9.0 – Data create and maintain own blogs, are more popular than human blogs.
WEB 10.0 – All episodes of Battlestar Galactica will now be shown from the Cylons’ point of view.


Knowledge Management and the Semantic Web

July 28, 2009 By: Helmut Nagy Category: Knowledge Management, Literature & Publications

That’s the title of my diploma thesis, and first of all, thanks to SWC for the opportunity to say a few words about it. My interest in knowledge management goes back some time: I decided to make it the subject of my diploma thesis back in 2001, in my first attempt to write one. The Semantic Web “came to me” only in the last one or two years, and last year’s TRIPLE-I conference was in a way the trigger for connecting the two topics.

My basic idea was very simple. When you read about the Semantic Web, you are immediately confronted with connections to knowledge creation and knowledge management. But in my understanding, the Semantic Web is a technical thing, whereas knowledge management is primarily a cultural and organisational thing. So the research questions for my thesis were:

  • What relevance do knowledge management and semantic technologies have in the daily work of people working in knowledge intensive domains?
  • Which possibilities lie in the adoption of knowledge management and semantic technologies?
  • Are semantic technologies already fit for practical use?

The empirical part of my thesis is based on group discussions held in different organisations. As a result, I developed starting points for an understanding of the topics “Knowledge Management” and “Semantic Web” and their relevance in organisations. In short, the empirical results provide the following answers to the research questions:

  • The “theoretical relevance” of both topics is high, the “practical relevance” on the other hand is rather low. Neither do structured concepts for knowledge management exist in the studied organisations, nor are there attempts at using semantic technologies
  • Most of the participants had not heard of the “semantic web” prior to the discussions. Once introduced to the topic, they rated the relevance of the semantic web and of semantic technologies as high
  • Possibilities are seen in better management of information or knowledge in organisations and, especially for semantic technologies, in improved search functionality and search results
  • Semantic technologies are not yet seen as fit for practical use
  • The connection between knowledge management and semantic web is taken as a fact without giving any justification for it.

In my conclusion, I compared my results with those of the Semantic Web Barometer 2009, and it was very interesting to find several similarities. I also found that talking to the people who have to work with the technologies developed for them can be quite illuminating, and that group discussions are a great way to do that.

I wrote most parts of my diploma thesis in a wiki (and the rest is available as PDF) so you can find it on my wiki.

Your comments and annotations are very welcome!

Thanks for reading as far as this, Helmut

55 people enjoyed the first Semantic Web meetup in Vienna

July 17, 2009 By: Thomas Thurner Category: Conferences & Events

Yesterday’s first “semantic web meetup” attracted 55 attendees who came to present, talk and socialise. Approximately one year after the series of semantic web meetups started in NYC, there is now also a vital community gathering in Vienna. Besides an inside view of brand-new ideas and developments from Austria’s semweb labs in presentations and lightning talks, Steve Sandhouse of the New York Times joined via web meeting to give an insight into the New York Times’s Semantic Web efforts, which, as he explained, have a back history of about 100 years.

In conclusion: a good start for the First Vienna Semantic Web Meetup, which may pave the way for another meeting in the near future. In the meantime, here are some pictures of the venue to amuse those who were there and to inspire new people to join: www.meetup.com

Some Semantic Apps for the iPhone

June 25, 2009 By: Andreas Blumauer Category: Life Sciences, Semantic Web Applications

Some new releases around Apple’s iPhone family, like the new OS 3.0 and the new 3G S, have stimulated another big hype around this “little darling”. I took a look at another facet, namely: has the Semantic Web entered the iPhone realm yet (or vice versa)? Experts have been talking about the need for semantically enhanced mobile applications for years, so let’s see if they are in place already.

Searching for “semantic web” in the App Store delivers six results, one of which, called “SemanticWb”, is obviously an interesting match. The application “extracts current life sciences and health care knowledge and place them conveniently at your fingertips on your iPhone”. It offers search suggestions and moderated search, and retrieves PubMed articles or genetic disorders related to the search term. A good start: this is a neat iPhone application which should be interesting for medical doctors and related professions.

Another application on the iPhone which is related to the semantic web is the “English wordnet dictionary” based on WordNet from Princeton University.

So, not much Semantic Web on the iPhone so far, or so I thought until Evriverse was released some weeks ago. The iPhone version of evri.com offers a new way to find connections between all kinds of things. Similar to OpenCalais, Evri can extract people, places, organisations, products etc. from unstructured information like news or blogs. The innovation in Evriverse is that complex search queries around “anything” can be formulated just by touching the screen. For example, if you are looking for information about “Tim Berners-Lee”, the application not only offers autocomplete but also suggests related people, organisations etc. to refine the search query. These relations are updated constantly and are based on the semantic analysis of news and blogs.

Evriverse offers the most comfortable way to do news research on the iPhone today. It shows how semantic technologies can enhance the user experience on a mobile device, and it will pave the way for more semantic (web) apps on the iPhone.

Interview with David Huynh: “The user interface design must inform the back-end design”

May 14, 2009 By: Andreas Blumauer Category: Linked Data & Open Data, Semantic Web Applications

Linked Data is evolving fast. A huge amount of RDF data is available and ready for exciting new applications. Unfortunately, the bottleneck is still the availability of Semantic Web user front-ends that demonstrate the power of linked data. To a certain degree, BBC Music beta is the first commercial platform to make heavy use of linked data. With Parallax, David Huynh has shown that some of the most interesting semantic web applications can be built around browsing and search tools that let users formulate complex queries.

Andreas Blumauer from Semantic Web Company (SWC) talked with David Huynh, “Interaction Scientist” at Metaweb, the company which developed Freebase, an “open, shared database of the world’s knowledge”.

SWC: David, you have been working for MIT’s Simile Project and now for Metaweb Technologies, two “building blocks” of the Semantic Web. Could you tell us a bit about your ongoing work at Metaweb?

David: My official title at Metaweb is “Interaction Scientist,” and so my main focus is coming up with novel interaction designs for Metaweb’s platform and products, and prototyping them to some extent to evaluate their effectiveness. Parallax was one such prototype that has gathered much excitement within Metaweb and the Semantic Web community at large. And the Freebase query editor 2.0 shows my interaction designs at the other end of the spectrum – targeting developers rather than just end-users.
I’ve also learned that data-centric user interfaces and interaction designs can only be as good as the data allows them to. So I am also dedicating some of my time toward analyzing the data we have and improving its quality so that I can design even better interactions.

[Video: Freebase Query Editor 2.0 from David Huynh on Vimeo]

SWC: With Parallax you have introduced a new way to search and explore data: Could you explain the “set-based browsing paradigm”?

David: In the browsing paradigm of the original Web, while looking at a web page, you can only click on one hyperlink to get to one other web page. But in a lot of cases, the hyperlinks on that web page can be grouped into different groups based on what they mean to the human reader: these are the links that lead to reviews, these are the links that lead to authors, these are the links that lead to vendors, etc.
Now if the computer actually knows what these links mean, then you can tell it to follow several of those links that mean the same thing: follow all the links that lead to authors. Think of it as powered browsing: the computer does the work of following several similar browsing paths at the same time – going from a set of things (web pages or data entries) to a similarly related set of things – and making all of that information available for your perusal in one shot. It is a paradigm shift compared to how we browse the Web today. And it’s only possible when the computer is capable of telling which link is similar to which other link. And that capability, in turn, will be made possible by the Data Web.
(See this unpublished paper which goes into depth about this concept)
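The set-based browsing idea described above can be illustrated as a set-to-set operation over a graph. This is a toy Python sketch, not Parallax's implementation: the triples, property names and the functions build_index, link_groups and pivot are hypothetical. link_groups shows which kinds of links leave the current set (the grouping Huynh describes), and pivot follows one kind of link from every item at once.

```python
from collections import Counter, defaultdict

# Hypothetical toy graph of (subject, property, object) triples
TRIPLES = [
    ("book:1", "author", "person:ann"),
    ("book:2", "author", "person:bob"),
    ("book:2", "review", "review:9"),
    ("person:ann", "employer", "org:acme"),
    ("person:bob", "employer", "org:acme"),
]


def build_index(triples):
    """subject -> property -> set of objects"""
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[s][p].add(o)
    return index


def link_groups(index, focus):
    """Which kinds of links leave the current set, and how many items have each."""
    return Counter(p for item in focus for p in index.get(item, {}))


def pivot(index, focus, prop):
    """Follow every `prop` link from every item in the focus set in one step."""
    result = set()
    for item in focus:
        result |= index.get(item, {}).get(prop, set())
    return result


index = build_index(TRIPLES)
books = {"book:1", "book:2"}
print(link_groups(index, books))         # Counter({'author': 2, 'review': 1})
authors = pivot(index, books, "author")  # {'person:ann', 'person:bob'}
print(pivot(index, authors, "employer"))  # {'org:acme'}
```

Chaining pivot calls corresponds to shifting focus from one set of things to a related set, e.g. from books to their authors to the authors' employers.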

SWC: Linked Data is evolving fast. A huge amount of RDF data is available and ready for exciting new applications. Unfortunately, one bottleneck is still the availability of Semantic Web user front-ends that demonstrate the power of linked data. Do you think the Semantic Web is more of a server technology than an end-user experience?

David: I have never thought of the Semantic Web as either a server technology or an end-user experience. I only care about usefulness, and then a matching amount of usability to make that usefulness accessible to people, especially those without Computer Science expertise.
I find that it’s so much easier to explain to people and get them excited about “immediate, personal, local benefits” of a particular technology than about “long-term, communal, global benefits” of a vision. For most people, the former must be experienced and felt often before the latter can appear vaguely appealing enough to call for action. I’m lazy – I don’t like to spend effort convincing people of visions; I only want to entice people into using the tools that I have created.
So if Parallax is considered a success, it is so not just because of its technologies and research contributions, but also because the accompanying screencast explained it in a way that people who cared nothing about the Semantic Web could understand why Parallax would be useful to them. This was achieved by pointing out limitations of existing web technologies as already experienced and understood by a lot of web users, and then illustrating concretely a possible solution enabled by data web technologies.
Perhaps I could venture further and say that the dichotomy of server technologies and end-user experience is what’s holding back Semantic Web user interface efforts. For those who don’t have expertise in design, it is a comfort to think that once the back-end technologies are solid, then it’s just a matter of putting on some polishes, a.k.a. user interfaces from their point of view, to make the whole package appealing. This approach is wrong. The user interface design must inform the back-end design. Otherwise, the user interface will almost always reflect the internal system model, and that’s usually very dissonant with how users think and behave. Recall all the Semantic Web interfaces you have seen that force users to think in terms of triples or of raw URIs. Those were made by starting from the data model, not from user needs.

SWC: Quite often I hear people saying: Where is the Semantic Web? – I still can´t “see” it! How could the linking open data community make use of such user interfaces like Exhibit, Piggy Bank or Parallax? Is the set-based browsing paradigm a universal way to browse linked data or just one possible way?

David: My research prototypes embody a number of UI ideas that are quite transferable to other platforms. Most of my code is open source, too. This, by the way, is rarer than it should be: research prototypes often fall apart as soon as, or even sooner than, the relevant research papers get presented at conferences, and research code rots rather than gets offered free for reuse. This is sad, because reusable data needs reusable code to proliferate even more widely, but there is no reward system for making research code reusable, or for keeping research prototypes running. So perhaps people can’t “see” the Semantic Web because research prototypes are not presented in appealing and comprehensible ways, and they break down and disappear too quickly.
Regarding the set-based browsing paradigm, it is most certainly not the only way to browse linked data. It is just the first good one that came to my mind, around 2005. But it’s not until 2008 that I actually got around to implementing it for real. One of the factors so important in its feasibility is the quality of data in Freebase, compared to other data sources that I had access to. Even the simple fact that a lot of Freebase topics have images makes Parallax look a lot more interesting and useful. People like to see pictures rather than raw URIs. And the diversity of types of data helps illustrate the browsing paradigm of Parallax – that ability to shift focus from one set of things to another set of things, even across very seemingly unrelated domains of information, such as from politicians to their celebrity friends in the movie industry.
So, perhaps one of the main challenges in adopting Parallax ideas on any arbitrary RDF data set is curating the data sufficiently for the purpose of presenting it. In fact, if you don’t know how some data is to be presented and used, there’s no way for you to determine if that data is of sufficient quality. User needs and interface designs drive back-end implementation and data curation, not the other way around. It’s a simple idea, really, but it can be hard to adopt if one is fixated on data alone.

SWC: Do you plan new versions of Parallax? When will it become part of Freebase or of even more Linked Data Sources?

David: I’ve done a few further experiments with the ideas in Parallax, but they are not ready for public use, yet. Freebase data makes my job much easier by allowing me to focus mostly on interaction designs rather than mostly on data quality, or rather, fighting the lack of data quality, for the purpose of presenting it. So I’ll start with Freebase data and we’ll see where it takes me.

SWC: What else are you working on at the moment?

David: As mentioned briefly earlier, reusable data needs reusable code to proliferate widely. That gives you a hint at an effort that I’m involved with.

SWC: Many thanks, David!

About David François Huynh

1000-and-one pulldowns

May 12, 2009 By: Thomas Thurner Category: Internet & Media, Knowledge Management, Search Engines

[Image: Personalisation interface, by wocrig via Flickr]

Luckily, the time has come when semantic search techniques are being used to enhance knowledge-providing theme portals. Nearly once a week, a new knowledge portal with built-in semantic search pops up, dealing with environmental issues, health care, the economy etc. These sites are good examples of how semantic technologies foster the vision of a knowledge web. Alongside general knowledge portals like Wolfram Alpha, such focused approaches will be great showcases for “a” semantic web (even if they are not based on “the” RDF semantic web) in the next few months.

But the potential of these semantic theme portals is often substantially reduced by poor usability. You get lost in categories and flags, puzzled by pulldowns, mouseovers and embedded hierarchies; sometimes it’s a mess of 1001 functions. You need to understand the underlying semantic concept to find your way around these applications, and that is not the point of the exercise. Search has to be easy.

To show the potential of semantic technologies, we need good examples, which offer good usability. This is a call to everyone to provide such examples.

See my favorites:

  • NextBio, a platform that enables life science researchers to search, discover, and share knowledge locked within public and proprietary data
  • reegle, the Search Engine for Renewable Energy and Energy Efficiency
  • CultureSampo, a Finnish cultural heritage platform for institutional organizations as well as private citizens