Semantic Web Company

The Semantic Puzzle

Open World Assumptions

Archive for the ‘Software Development’ Category

Linking Open Data to Thesaurus Management

February 16, 2010 By: Tassilo Pellegrini Category: Corporate Semantic Web, Knowledge Management, Linked Data & Open Data, Search Engines, Semantic Web Applications, Software Development

The Vienna-based company punkt. netServices is just about to release a demo version of their PoolParty service, a SKOS-based thesaurus management tool with linked data capabilities. I had the chance to pre-read a white paper and test their service. Here is a brief overview. You can also try a demo.

Purpose

PoolParty was conceived to support applications such as

  • Semantic search engines
  • Recommender systems (similarity search)
  • Corporate bookmarking
  • Annotation and tag recommender systems
  • Autocomplete services and faceted browsing.

These use cases can be achieved either by using PoolParty stand-alone or by integrating it with existing enterprise search engines, document management systems or enterprise wikis.

Thesaurus Management

PoolParty aims to be easy to use for people without a strong Semantic Web background or special technical skills. The GUI is entirely web-based and uses AJAX, so the user can, for example, quickly merge two concepts via drag & drop. A tree view or a graph view of the concepts provides an overview of the thesaurus.

PoolParty also helps to semi-automatically add concepts to a thesaurus as it can be used to analyse documents (e.g. web pages or PDF files) relevant to a thesaurus’ domain in order to glean candidate terms. This is done by the key-phrase extractor of KEA. The extracted terms can be selected by the user, thereby becoming “free concepts” which later can be integrated into the thesaurus, turning them into “approved concepts”.

Documents can be searched in various ways: by keyword search in the full text, by searching for their tags, or by semantic search and similarity search. Semantic search takes into account not only a concept’s preferred label but also its synonyms and the labels of its related concepts. The user can manually remove query terms used in semantic search, and the boost values for the various relations considered can be adjusted. The recommendation mechanism that calculates document similarity works in the same way.
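
The query expansion described above can be sketched roughly as follows. This is a minimal illustration, not PoolParty's actual implementation: the Concept class, the expand_query function and the boost values are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical stand-in for a SKOS concept with its labels and relations."""
    pref_label: str
    alt_labels: list = field(default_factory=list)   # synonyms
    related: list = field(default_factory=list)      # related Concept objects


# Assumed boost values, one per relation type; the post only says these are adjustable.
DEFAULT_BOOSTS = {"prefLabel": 1.0, "altLabel": 0.8, "related": 0.5}


def expand_query(concept, boosts=DEFAULT_BOOSTS, removed=frozenset()):
    """Return a {term: boost} mapping for a semantic search on one concept.

    Terms listed in `removed` are dropped, mirroring the manual removal of
    query terms. If a term is reachable via several relations, the highest
    boost wins.
    """
    weighted = {}

    def add(term, weight):
        if term in removed:
            return
        weighted[term] = max(weight, weighted.get(term, 0.0))

    add(concept.pref_label, boosts["prefLabel"])
    for label in concept.alt_labels:
        add(label, boosts["altLabel"])
    for neighbour in concept.related:
        add(neighbour.pref_label, boosts["related"])
    return weighted
```

The resulting mapping could then be handed to whatever search backend is in use, for instance as weighted terms in a full-text query.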

PoolParty by default also publishes a Semantic Wiki version of its thesauri, which provides an alternative way to browse and edit concepts. Through this feature anyone can get read access to a thesaurus, and optionally also edit, add or delete labels of concepts. Search and autocomplete functions are available here as well. The Wiki’s XHTML source is also enriched with RDFa, thereby exposing all RDF metadata associated with a concept to be picked up by RDF search engines and crawlers. (See two examples: the Cocktail thesaurus and the Standard Thesaurus for Economics.)

PoolParty also supports the import of thesauri in SKOS (including several consistency checks) or Zthes format. These functions can also be consumed as stand-alone web services via PoolParty SKOS Services. Additionally, lists of concepts and their labels can be imported via CSV files.

Linked (Open) Data

PoolParty not only publishes its thesauri as Linked Open Data (in addition to a SPARQL endpoint), but it also consumes LOD in order to expand thesauri with information from LOD sources.

Concepts in the thesaurus can be linked to DBpedia, for example, via a service like Georgi Kobilarov’s DBpedia lookup service, which takes the label of a concept and returns possible matching candidates. The system suggests relevant resources from DBpedia and the user selects the one that matches the concept from their thesaurus, thereby creating a skos:exactMatch relation between the concept URI in PoolParty and the DBpedia URI. The same approach can be used to link to other SKOS thesauri available as Linked Data.
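
As a rough sketch of the linking step, the snippet below turns a user's choice among lookup candidates into a skos:exactMatch triple in N-Triples syntax. The candidate list is hard-coded here and stands in for whatever a lookup service would return; the function names and the example concept URI are assumptions made for illustration, and no real lookup API is called.

```python
# SKOS core namespace as defined by the W3C.
SKOS = "http://www.w3.org/2004/02/skos/core#"


def exact_match_triple(concept_uri, candidates, chosen_index):
    """Build a (subject, predicate, object) triple for the candidate the user picked.

    `candidates` is a list of dicts with at least a "uri" key (hypothetical shape).
    """
    target_uri = candidates[chosen_index]["uri"]
    return (concept_uri, SKOS + "exactMatch", target_uri)


def to_ntriples(triple):
    """Serialise a triple of three URIs as one N-Triples line."""
    subject, predicate, obj = triple
    return f"<{subject}> <{predicate}> <{obj}> ."


# Example: the thesaurus concept URI below is made up for this sketch.
candidates = [
    {"label": "Mojito", "uri": "http://dbpedia.org/resource/Mojito"},
    {"label": "Mojito (disambiguation)", "uri": "http://dbpedia.org/resource/Mojito_(disambiguation)"},
]
line = to_ntriples(exact_match_triple("http://example.org/thesaurus/mojito", candidates, 0))
```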

Other triples can also be retrieved from the target data source: for example, the DBpedia abstract can become a skos:definition, and geographical coordinates can be imported and used to display the location of a concept on a map, where appropriate. The DBpedia category information may also be used to retrieve additional concepts of that category as siblings of the concept in focus, in order to populate the thesaurus.

PoolParty is capable of importing a SKOS thesaurus from a Linked Data server, and may also receive updates to thesauri imported this way. This feature has been implemented in the course of the KiWi project funded by the European Commission. KiWi also contains SKOS thesauri and exposes them as LOD. Both systems can read a thesaurus via the other’s LOD interfaces and may write it to their own store. This is facilitated by special Linked Data URIs that return e.g. all the top-concepts of a thesaurus, with pointers to the URIs of their narrower concepts, which allow other systems to retrieve a complete thesaurus through iterative dereferencing of concept URIs.
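
The iterative dereferencing mentioned above amounts to a graph traversal that starts at the top concepts and follows the pointers to narrower concepts. The sketch below assumes a dereference callable that returns a dict with a "narrower" list for each URI; in a real system this would be an HTTP GET plus RDF parsing, which is left out here. All names are hypothetical.

```python
from collections import deque


def harvest_thesaurus(top_concept_uris, dereference):
    """Collect every concept reachable from the top concepts, breadth-first.

    `dereference(uri)` is an assumed callable returning a dict describing the
    concept, including a "narrower" list of concept URIs.
    """
    harvested = {}
    queue = deque(top_concept_uris)
    while queue:
        uri = queue.popleft()
        if uri in harvested:
            continue  # already fetched; also guards against cycles
        description = dereference(uri)
        harvested[uri] = description
        queue.extend(description.get("narrower", []))
    return harvested


# An in-memory dict stands in for the remote Linked Data server.
pages = {
    "t:drinks": {"label": "Drinks", "narrower": ["t:cocktails", "t:beer"]},
    "t:cocktails": {"label": "Cocktails", "narrower": ["t:mojito"]},
    "t:beer": {"label": "Beer", "narrower": []},
    "t:mojito": {"label": "Mojito", "narrower": []},
}
thesaurus = harvest_thesaurus(["t:drinks"], pages.__getitem__)
```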

Additionally, KiWi and PoolParty publish lists of concepts created, modified, merged or deleted within user-specified time frames. With this information the systems can learn about updates to one of their thesauri in an external system. They can then compare the versions of concepts in both stores and write the corresponding updates to their own store.

This means each system decides autonomously which data it accepts, and there is no risk of a system pushing data into an external store that might lead to inconsistencies there. Data transfer and communication are achieved using REST/HTTP; no other protocols or middleware are necessary. Nor is any rights management for external systems needed, which would otherwise have to be configured separately for each source.
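
A pull-based update of this kind could look roughly like the sketch below: the receiving system reads the remote change list, compares it with its own store, and decides for itself what to apply. The record layout (uri, action, modified), the plan_updates name, and the choice to flag merges for review are assumptions made for this example; the post does not specify the format of the change lists.

```python
def plan_updates(local_store, remote_changes):
    """Decide which remote changes the local system would apply.

    `local_store` maps concept URIs to dicts with a "modified" timestamp;
    `remote_changes` is a list of dicts with "uri", "action" and "modified".
    Timestamps are ISO 8601 date strings in the same format, so plain string
    comparison orders them correctly.
    """
    plan = []
    for change in remote_changes:
        uri, action = change["uri"], change["action"]
        if action == "deleted":
            if uri in local_store:
                plan.append(("delete", uri))
        elif action == "merged":
            # Assumption: merges are flagged for review rather than applied blindly.
            plan.append(("review", uri))
        elif uri not in local_store:
            plan.append(("create", uri))
        elif change["modified"] > local_store[uri]["modified"]:
            plan.append(("update", uri))
        # Otherwise the local version is at least as recent: nothing to do.
    return plan
```

Because the plan is computed locally, nothing is written until the receiving system chooses to act on it, which matches the autonomy described above.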

Technology

The software is written in Java and utilizes the SAIL API, so it can be used with various triple stores. The thesaurus management itself (viewing, creating and editing SKOS concepts and their relationships) can be done in an AJAX Frontend based on Yahoo User Interface (YUI). Editing of labels can alternatively be done in a Wiki style HTML frontend. For key-phrase extraction from documents PoolParty uses a modified version of the KEA 5 API, which is extended for the use of controlled vocabularies stored in a SAIL Repository (this module is available under GNU GPL). The analysed documents can be stored and indexed in Lucene/Solr or any other (enterprise) search system along with extracted and semantically related concepts.

George Anadiotis: “Linked Data brings value by offering an alternative approach to lightweight data integration and mashups.”

December 10, 2009 By: Tassilo Pellegrini Category: Linked Data & Open Data, Mashups & Web services, Semantic Web Applications, Software Development, Tools & Software, Vocabularies & Languages

George Anadiotis is an expert on artificial intelligence with academic roots at the Vrije Universiteit, Amsterdam. In February 2009 he took the position of R&D Director at the Greek technology company IMC. I met him in September at I-SEMANTICS 2009, where he and his team contributed to the Triplification Challenge. In their paper Linked Data for the Masses they considered the pragmatic value of Linked Data from an inbound and an outbound perspective. In his words:

We started experimenting with the technical infrastructure needed and created some proof-of-concept applications. Part of this work was enabling Linked Data access for the front-end infrastructure we used, Liferay portal. We decided on the appropriate vocabularies for the type of content we wanted to publish (FOAF, SIOC and MOAT mainly), delved on the internals of Liferay and used D2R to map its relational database to the vocabularies of choice, also using techniques to improve performance as much as possible. Since Liferay itself is also based on the notion of communities, we thought our work would be more widely applicable and useful, so we chose to submit it for review at the Triplification Challenge and make it available to the community as open source software. Our applications have gradually matured and are about to be deployed in our commercial projects, while at the same time we are now making the Liferay Linked Data Module available as a Sourceforge project and we are working with Liferay management in order to disseminate this effort to the community and also include it in a future release of the software.

Read the full interview here.

the next google

March 25, 2009 By: Thomas Thurner Category: Search Engines, Software Development, Tools & Software

Maybe you have noticed it already: this morning something new appeared in Google’s search engine interface, namely a set of related search suggestions based on your search query. Google described the enhancement as follows:

Starting today, we’re deploying a new technology that can better understand associations and concepts related to your search, and one of its first applications lets us offer you even more useful related searches (the terms found at the bottom, and sometimes at the top, of the search results page).

I tried it. If you type in “time travel” you also get search proposals like “theory of relativity time travel” or “wormhole time travel”. Google announced that the service is available in various languages. A direct test in German is a little disillusioning: searching for “zeit reise” (the same concept as above, in German) leads to alternative searches like “reisen 50er jahren” (travel in the 1950s) and “reisen im mittelalter” (travel in the Middle Ages).

Even if this semantic-like extension of the basic search function still needs some tuning, the point is becoming clearer: Google, too, is working to get more meaningful results out of its search algorithms. And parts of the semantic methodology are finding their way into mainstream services like search engines, as we saw with Wolfram Alpha some days ago. So keep your eyes open – maybe tomorrow morning you’ll find another piece of the semantic puzzle embedded in one of your favorite web apps.

BibSonomy – the blue social bookmark and publication sharing system

February 02, 2009 By: Gerd Stumme Category: Mashups & Web services, Software Development

BibSonomy is a Web 2.0 style collaborative bookmarking and publication management system. In the style of YouTube, Flickr and Del.icio.us, it allows you to store the metadata of your own publications and of all papers that you consider interesting. It also allows you to store bookmarks and to share them with others. The Semantic Web Blog already reported on BibSonomy in The Wild vs The Orderly: Folksonomies and Semantics (TRIPLE-I 2008) in September 2008. The BibSonomy team is very active and has implemented many new features.

It is thus high time to tell you about them. Let’s start with the new layout, introduced in December. It’s much closer to the Web 2.0 look & feel, with pastel colors and rounded corners. The navigation has become a bit more consistent, and you can now select if you want to see both bookmarks and publications, or just one of the two. BibSonomy is also available in German now. Most other extensions of BibSonomy are about integration with other systems. The most useful are:

  • GoogleSonomy is a new Firefox add-on which integrates search results from BibSonomy directly into your Google search. The add-on is customizable, so you can decide whether to search in your personal publications and/or bookmarks, or over all BibSonomy posts. The extension is available from the Mozilla add-on page.
  • BibSonomy now also allows you to export citation information to Zotero. Zotero is a Firefox extension which helps you to collect, manage and cite publications locally in your browser. The other direction is not fully automated yet. However, there is a copy-and-paste workaround.
  • Bloggers who are using WordPress can integrate data from BibSonomy into their posts – for instance your tag cloud, or your last three publications (or all of them). Conversely, your blog posts will (almost) automatically be published on BibSonomy. A more general way of including BibSonomy content in your system is BibSonomy’s JSON feed. JSON (JavaScript Object Notation) is a lightweight data-interchange format, which is now available for all BibSonomy pages.
  • As an alternative login procedure, BibSonomy now also supports OpenID, an open, decentralized standard that allows users to log on to many different services on the web using the same identity (single sign-on). This kind of authentication is provided by a growing number of websites, including large ones like AOL, Google, Microsoft, MySpace, Yahoo and many others. You may even have an OpenID without knowing it, e.g. when you have a Flickr account. Why not use it for logging in to BibSonomy as well?
  • The family of scrapers for automatically extracting references from digital libraries or publishers’ websites has been extended, allowing you to store publication metadata automatically from over 60 sites. The scraping service can be used independently from BibSonomy for other purposes by everyone needing access to bibliographic metadata.

If you want to learn more about these features, visit the BibSonomy blog. Last but not least, there is a new BibSonomy developer site. It provides access to some of the BibSonomy modules. All source code is released under GPL and LGPL licenses. If you want to experiment with the code, have a look!

OntoWiki Workshop

December 09, 2008 By: Thomas Schandl Category: Software Development, Tools & Software

Days 3 and 4 of the OntoWiki KickOff Meeting in Leipzig consisted of workshops on semantic technologies and OntoWiki development.

The overall organization of the project meeting was very good, and Sebastian Dietzold, Sebastian Hellmann, Michael Martin and Jörg Unbehauen did an equally good job of putting across the ideas behind key concepts of the semantic web in several introductory SemWeb presentations. Their talks about various technologies from the semantic web stack, such as URIs, RDF and its serialisations, RDFS, SPARQL and some related tools, were well suited to bringing people who are relatively new to the semantic web up to speed. Links to the presentation slides can be found at the project page in the coming days.

Later, Jens Lehmann outlined the new things OWL 2 brings, e.g. profiles, which are subsets of OWL 2 that provide different degrees of expressivity and reasoning efficiency.

The last day started with Sören Auer’s presentation of their semantic wiki OpenResearch, a site where information on conferences, journals and scientists is pooled. OpenResearch is built with Semantic MediaWiki (SMW), just like our Social Semantic Web wiki.

While SMW is a very useful tool as it lowers the entry barriers for using semantic wikis, Sören also pointed out that, in comparison, OntoWiki provides some important features that SMW doesn’t have:

  • SMW doesn’t use SPARQL for its queries, but a less powerful custom query language, whereas OntoWiki has full SPARQL support.
  • OntoWiki’s UI has many widgets that support the user when entering data or new properties on a page (e.g. there is an autocomplete feature for suggesting properties).
  • With SMW, changes to the wiki’s semantic structure often entail manual changes to a great many pages. With OntoWiki it is easy to change properties, for example, at any time.

For the new version of OntoWiki, Sören and his team use the Zend framework and are developing the Erfurt API to store and access RDF data. The Erfurt API supports SPARQL, versioning, caching and RDF-based authentication/access control. It abstracts different stores using the adapter pattern, so it can be used with Virtuoso and with any other store that has an interface provided by Zend_Db (MySQL, Oracle, PostgreSQL, etc.); the team is also working on an interface for Redland. Find the slides for Philipp Frischmuth’s Erfurt API presentation here, the API documentation here and Norman Heino’s Zend & OntoWiki Application Framework presentation here.
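
To illustrate the adapter pattern mentioned above, here is a small sketch in Python. It is not the Erfurt API (which is written in PHP); the class and method names are invented for this example. Two backends, a plain in-memory set and SQLite from the Python standard library, sit behind one common interface, so calling code does not need to know which store it talks to.

```python
import sqlite3
from abc import ABC, abstractmethod


class StoreAdapter(ABC):
    """Common interface that application code programs against (hypothetical)."""

    @abstractmethod
    def add(self, s, p, o):
        """Store one triple."""

    @abstractmethod
    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""


class InMemoryAdapter(StoreAdapter):
    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        return sorted(
            t for t in self._triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


class SqliteAdapter(StoreAdapter):
    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS triples (s TEXT, p TEXT, o TEXT, UNIQUE (s, p, o))"
        )

    def add(self, s, p, o):
        self._db.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))

    def match(self, s=None, p=None, o=None):
        clauses, params = [], []
        for column, value in (("s", s), ("p", p), ("o", o)):
            if value is not None:
                clauses.append(f"{column} = ?")
                params.append(value)
        where = " WHERE " + " AND ".join(clauses) if clauses else ""
        query = "SELECT s, p, o FROM triples" + where + " ORDER BY s, p, o"
        return [tuple(row) for row in self._db.execute(query, params)]
```

Swapping the store then only means constructing a different adapter; supporting another backend means adding one more subclass.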

Julian Jöris demonstrated how Selenium is used for acceptance testing. This is a very promising testing framework for web applications, where one can e.g. record interactions with different browsers and automatically run them as tests. Selenium has a Firefox extension to record macros and is integrated with PHPUnit.

Finally we had a very good discussion about our conX-OntoWiki integration use case and application ideas, so we left Leipzig with a pleasant anticipation of the coming co-operation in the project.

Content Versatility in the KiWi Core System

November 27, 2008 By: Jana Herwig Category: Social Software, Software Development, Tools & Software

It’s been five months since the last Joint Work Package (WP) meeting in the KiWi – Knowledge in a Wiki – project. This morning, we gathered in Vienna for the next round. The focus this time is on the core system (the architecture developed by the WP3 team, which is handing over to and paving the way for the WP4 team) and on the use cases (Logica, Sun Microsystems), where it is particularly important that everyone involved in the project understands the requirements.

In the first presentation today, Sebastian Schaffert from Salzburg Research gave us a tour of two different configurations of the KiWi system. The KiWi core system is oriented towards content versatility, meaning that content items can be displayed and used in various contexts and configurations. As a service to the user, KiWi uses the JavaScript-based WYSIWYG editor TinyMCE, enhanced with a few home-grown plug-ins which, for instance, make it easier to set links to other wiki pages. Memorizing wiki shorthand is sometimes a challenge, so this feature helps get things done.

Using a different skin and interface, KiWi can take various forms and shapes – even shapes where you might not spot the wiki in it at first glance. TagIT is such an example of an adaptation of the KiWi core system: a geotagging platform targeting youth in Salzburg who can locate, tag and comment on places that matter to them.

Conversely, KiWi in its wiki incarnation displays a little map, provided a content item is enhanced with geoinformation; technically, the map on the wiki page is an interpretation of a geo-related tag (learn more about the complex, structured tags proposed by the KiWi Enabling Technologies Work Package in this article: Usage Data Model Day in the KiWi Project).

Take a look at the screenshots below:

[Screenshot: the article in the classic KiWi interface]

It is the same article that is being displayed, in the first example using the classic KiWi interface, in the second example using the TagIT interface with the article appearing as an info page.

[Screenshot: the article in the TagIT interface]

This afternoon, we expect to see another configuration of the system, in a presentation about how the system is specifically tailored to the needs of Logica’s “Knowledge Management for Project Management” use case.

N.B. The system is not yet publicly available. If you have questions, please contact Sebastian Schaffert.

Which flavour does knowledge have on the web?

October 09, 2008 By: Jana Herwig Category: Knowledge Management, Software Development

In recent debates within the KiWi – Knowledge in a Wiki project, the need arose to further refine and find a common understanding of the type of knowledge that is (ideally) managed and processed using (semantic) wikis. One of the proposals revolved around a conceptualization of knowledge put forward by Gabi Reinmann-Rothmeier, also dubbed the “Munich model” (Münchner Modell).

In the Munich model, knowledge comes in three states of matter: solid (like ice), liquid (like water) and gas (like water vapor).

“Frozen” knowledge is knowledge in its most tangible, manageable form, for instance the type of verified, expert-endorsed information you would find in an encyclopedia like the Encyclopaedia Britannica.

“Gaseous” knowledge, on the other hand, is knowledge in its least consolidated form: think for instance of the type of heated debate you might have with folks in a pub, which is arguably the least structured, most uncontrollable, but also the most engaging type of knowledge!

And the “liquid” form of knowledge, finally, is the common knowledge of day-to-day life. It’s probably fair to say that it becomes obvious mostly when it is in the process of changing its state of matter: when it is calibrated against “frozen” or informational knowledge, or when it is debated and becomes “gaseous” knowledge that informs action. (If you’d like to know more about the Munich model and are able to read German, you might want to download the original article here – PDF, 365 KB).

When talking about knowledge that is managed, used or, respectively, that evolves online, I think it also makes sense to pay some attention to the type of community that is preferred by particular online tools or environments. The particular flavour of knowledge, in this sense, is simultaneously characterized and shaped by the state of matter of knowledge and the form of the community that applies.

N.B. The following is not an immediate translation of the “Munich model”, but rather a reconceptualization which tries to also consider that different community models (and their implementation through IT) also play a role for the whole spectrum of knowledge management on and with the web (e.g. for online communication and interaction, online publishing and documentation and maintenance of web infrastructures).

Web-Flavour 1: The Blogosphere – gas, gas, gas!

Hmm… sniff it! This is the flavour I like best because it is my flavour. On the blogosphere (and twittersphere), knowledge is exchanged, developed further and evolves almost like in a pub debate… (more…)

Java’s Inner Sanctum: A Visit to Sun Microsystems’ Usability Lab in Prague

July 02, 2008 By: Jana Herwig Category: Software Development

The walls in room 3328, the observation room at Sun Microsystems’ usability lab in Prague, are painted a subdued blue. It swallows all the light, ensuring the testing scenario is not interrupted by curious guests like us, the KiWi project team members who were granted the privilege of a tour of the inner sanctum of Sun’s developer den. Through the one-way mirror, we can see a rosy-cheeked developer, talking to himself in Czech, interrupted by little sips from a coke bottle. He does not see us. The fact that very few of us understand Czech gives the situation an even more experimental appeal.

The new usability lab at Sun Microsystems, Prague

Jakub Franc, the cognitive psychologist in charge of the design of the study, explains to us that Sun relies on the Think Aloud method and observation in most of its test cases, rather than analyzing data from biofeedback sensors or eye-tracking devices. “Eye-tracking is good for testing the usability of web sites,” he says, “but for our purposes, the think aloud method, where the test person describes what he does and thinks, has greater benefits to offer.” The authenticity of the tasks to be performed in the study is key: the developer behind the sound-proof glass wall is currently busy importing his own PHP application into NetBeans, Sun’s open source development environment, while the interaction designers and developers who created the tested module observe. A typical testing scenario lasts about 90 minutes, with the final 20 minutes consisting of an interview. “I always tell the testers that it’s not their fault if they fail to perform a task,” says Jakub. “If they fail, it’s the product’s fault. After all, that’s why we’re testing it.”

Before a software product is tested in its design or redesign phase, the ideal candidates are identified based on the results of questionnaires that are sent out to people in the tester database. The database includes users of open source software as well as users of competing products, with the ideal test sample consisting of people who represent the whole spectrum of the target group, ranging from expert to newbie – and they need not be open source enthusiasts: “We offer a relatively high reward of 1000 CZK* as we want testers from all levels and backgrounds and not just the volunteering enthusiasts.”

Until Sun Microsystems moved into their new building in 2006, they collaborated with the Department of Computer Science at Czech Technical University (CTU), where they set up the very first usability lab in the Czech Republic in 2004. The deal was that Sun would supply the equipment and know-how, and CTU would supply the space and construction. Both institutions shared the facility until, after three years, all usage rights and equipment were transferred to CTU. One of the features of the new lab is the one-way mirror – the previous one relied on video observation: “From our experience, despite the fact that some participants feel less comfortable in this set-up, it makes a difference to observers,” writes Jiri Mzourek on his blog, one of the many Sun blogs, “they feel more connected to the participants.”

Jakub Franc, cognitive psychologist and usability researcher

Even though there is now an in-house usability lab at Sun, the collaboration between Sun and CTU continues, in particular in research and design projects. Students participate in projects led by Sun that focus on Sun products, learning about research methodology as well as gaining experience in project management in a real business environment. Jakub Franc also gives seminars in cognitive psychology and research methodology to CTU students, and is himself pursuing a PhD in environmental psychology – a relatively new discipline which, according to Jakub, deals with questions such as: “How should buildings be designed so that people are not getting lost in them? What recreational areas help people to recover from daily stress? What kinds of front gardens discourage burglars from invading the place?” In other words: Jakub studies the cognitive parameters of the usability of real objects.

Once the KiWi/Sun use case enters the evaluation stage, the KiWi team will again be given access to the lab – this time not as visitors but as observers, witnessing how usable the KiWi-Wiki system really is to the inclined user. We are looking forward to the experience – and thank the designers of the lab for implementing a sound-proof wall, just in case the KiWis get emotional!

*) worth about 2 monthly passes for the metro in Prague, or 40 beers in a good pub

Not a pipeline, but a graph: software development in the KiWi/Sun usecase

June 25, 2008 By: Jana Herwig Category: Software Development

Josef Holy’s report this morning on the status quo of the KiWi use case conducted in collaboration with Sun Microsystems presented us with an interesting contrast. While the point of departure in the Logica use case was a conceptual model of knowledge that is shaped by CMMI, Josef made the point that software development in an open source project follows quite different rules: “The lack of formal processes IS the process”, said Josef, characterizing the collaboration in the NetBeans development environment which (as reported earlier) has about 200 Sun developers working on it, in addition to thousands of external contributors and hundreds of thousands of active users. Correcting his own presentation regarding the development process from May ‘08 he said: “We shouldn’t think of a production pipeline here, but of a graph.”

Within Sun (or any software development project), one can draw a distinction between three groups of people working on a software project, depending upon the intensity of interaction and the impact they have on the project: First of all, you have the planners, designers, developers and testers who interact intensely and whose work strongly affects the actual product. Secondly, you have people like the documentation writers who describe the software, but who do not shape it. And thirdly, there are people who translate the work of the second level to different markets or target groups, e.g. people working in localization.

A first important decision to be made was: Who to involve in the KiWi use case? As the KiWi approach is decisively informed by the wiki philosophy, it only made sense that the designers, rather than the developers, should participate. Designers need to have access to various repositories of information, for instance user interface specifications, requirements descriptions, usability reports, marketing intelligence data, etc. And, to a much greater extent than is the case with developers, their work relies primarily on the written word, in the form of definition and documentation. That makes them ideal candidates for the KiWi-Wiki use case.

P.S: Yes, we all know that documentation is also a crucial part of the work of developers – yet we also know that the world has seen a lot of software developments that went undocumented;-)
