Semantic Web Company

The Semantic Puzzle

Open World Assumptions

Report on developments at the European Semantic Technology Market

June 25, 2010 By: Thomas Thurner Category: Corporate Semantic Web, Enterprise 2.0, Literature & Publications

The present state of development, future trends and expected market scenarios for Semantic Technologies are described in the just published “Demand driven Mapping Report”. The report is part of the EU-funded project VALUE-IT, which aims to bring together the various stakeholders in the sector: industry, research and government. VALUE-IT's preliminary findings indicate that the potential European STE market could reach €1.44 billion by 2014. A closer look at the report's executive summary reveals some further noteworthy findings:

The survey results also show considerable variation by sector, both of policy and technology implementation. With respect to technologies, ICT companies are also the most willing to consider semantic approaches. The ICT sector has an unusually high interest in all ST components, with 20% or more being willing to consider all of them, and over half of IT respondents looking at Web 2.0 (social computing). [...]  The use of tagging technologies – which overall is the least mature approach in the survey – is most advanced in Life Sciences. The Life Sciences, Media & Entertainment, and ICT sectors all have a reasonably strong interest in Natural Language Processing (roughly 25% on average). Ontologies and RDF/OWL are the technologies least often considered, though the interest in these Semantic Technologies is not insignificant. Taxonomies are slightly more popular, perhaps indicating that companies are taking the first step to prepare for a more semantic approach to IT solutions. The ICT, Energy & Utilities, and Media & Entertainment sectors all have a reasonably strong interest in using taxonomies.

The 190-page report gives a current overview of the status quo of the European Semantic Technology market and is now available for download: Final demand driven mapping Report

Read this: Linking Social Networks on the Web with FOAF

November 13, 2008 By: Jana Herwig Category: Literature & Publications

Jennifer Golbeck, Matthew Rothstein. Linking Social Networks on the Web with FOAF: A Semantic Web Case Study. Proceedings of the Twenty-Third Conference on Artificial Intelligence (AAAI’08).
Download (PDF, 320 KB).

ABSTRACT
One of the core goals of the Semantic Web is to store data in distributed locations, and use ontologies and reasoning to aggregate it. Social networking is a large movement on the web, and social networking data using the Friend of a Friend (FOAF) vocabulary makes up a significant portion of all data on the Semantic Web. Many traditional web-based social networks share their members’ information in FOAF format. While this is by far the largest source of FOAF online, there is no information about whether the social network models from each network overlap to create a larger unified social network model, or whether they are simply isolated components. In this paper, we present a study of the intersection of FOAF data found in many online social networks. Using the semantics of the FOAF ontology and applying Semantic Web reasoning techniques, we show that a significant percentage of profiles can be merged from multiple networks. We present results on how this affects network structure and what it says about relationships and individual behavior. Finally, we discuss the implications this has for using web-based social networking data to create intelligent user interfaces and social software.
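To make the idea of "merging profiles using the semantics of the FOAF ontology" a bit more concrete, here is a minimal sketch, not the authors' actual procedure. It assumes the merge relies on FOAF's inverse functional properties: foaf:mbox and foaf:mbox_sha1sum are declared inverse functional, so two profiles sharing a value for either can be inferred to describe the same person. The profile URIs and hash values are hypothetical placeholders, and the sketch deliberately skips details such as hashing foaf:mbox values to compare them with foaf:mbox_sha1sum.

```python
from collections import defaultdict

# Assumption: merging is driven only by these two FOAF inverse functional
# properties (IFPs). Sharing a value for one of them implies owl:sameAs.
IFPS = ("foaf:mbox", "foaf:mbox_sha1sum")


class UnionFind:
    """Disjoint-set structure used to accumulate sameAs equivalences."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_profiles(profiles):
    """profiles maps a profile URI to {property: value}.
    Returns a list of sets; each set holds URIs inferred to denote one person."""
    uf = UnionFind()
    first_holder = {}  # (property, value) -> first profile URI seen with it
    for uri, props in profiles.items():
        uf.find(uri)  # register the profile even if it never merges
        for prop in IFPS:
            value = props.get(prop)
            if value is None:
                continue
            key = (prop, value)
            if key in first_holder:
                uf.union(first_holder[key], uri)
            else:
                first_holder[key] = uri
    groups = defaultdict(set)
    for uri in profiles:
        groups[uf.find(uri)].add(uri)
    return list(groups.values())


# Hypothetical profiles from three networks; hash values are placeholders.
profiles = {
    "http://net-a.example/alice": {"foaf:mbox_sha1sum": "placeholder-hash-1"},
    "http://net-b.example/u/42": {"foaf:mbox_sha1sum": "placeholder-hash-1",
                                  "foaf:mbox": "mailto:alice@example.org"},
    "http://net-c.example/alice": {"foaf:mbox": "mailto:alice@example.org"},
    "http://net-a.example/bob": {"foaf:mbox_sha1sum": "placeholder-hash-2"},
}
for group in merge_profiles(profiles):
    print(sorted(group))
```

Union-find keeps the merge transitive: net-a and net-c share no value directly, but both are linked through the net-b profile.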

Reasoning Problems?

November 01, 2008 By: Pascal Hitzler Category: Conferences & Events, Miscellaneous, Ontology Engineering

I’m not going to explicitly comment on the panel discussion at ISWC08, entitled An OWL 2 Far? Let’s simply say it was controversial. I don’t mind controversial panels. In fact, I think that few things are more boring than a panel where all panelists more or less agree. But at the same time, at the ISWC08 panel, I think, an important message got lost, namely that we really need reasoning for the Semantic Web, and that we need diversity in reasoning. (Admittedly, some people said so, but I think the message didn’t really get through.)

So, instead, let me give you some web search problems. They all came up in my real life, so they are not artificially created. It seems to me that the Semantic Web should make answering them easier, but with the existing web resources, they are really difficult.

  • Find all papers having received best paper awards at ISWC conferences. I did that today, and it took me more than 30 minutes. And I’m not sure if I got all of them – indeed I would have missed one of them if I hadn’t known beforehand about that specific paper having received the award. Isn’t this a typical Semantic Web problem? (The results of my search are further below.)
  • There’s an owl-like bird in southern German woods, which in colloquial German is called Käuzchen. Try to find out the English name for this bird. I actually failed, though I think I got close to the answer when I merged web search with an external knowledge base (in the form of a biologist I happen to know). And simply going to Wikipedia and clicking on the English link is not enough, since I’m not looking for the Strix genus of owls, but rather for a particular bird …
  • Who is this researcher with the Russian-sounding name who worked on resolution-based methods for the description logic EL? This also looks like a typical Semantic Search problem, which shouldn’t be too difficult if you have the corresponding knowledge (and background knowledge) available. I admit I failed on this one using traditional methods (unless you consider asking Franz Baader about it by email a traditional method).
  • Are lobsters spiders? I.e. are lobsters classified as spiders by biologists? This one is actually tougher than you would think using traditional methods. Should be easy using Semantic Web knowledge bases and some simple reasoning, shouldn’t it?

For all these tasks (and many others), it seems apparent that Semantic Web reasoning – and the availability of corresponding knowledge bases – would make finding answers much easier. The current reality of the Semantic Web is still quite far from this. But we’re working on it.
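As an illustration of the "simple reasoning" the lobster question calls for, here is a minimal sketch over a hypothetical, heavily simplified taxonomy encoded as rdfs:subClassOf-style links. Under the open world assumption, failing to derive "Lobster is a subclass of Spider" only means it is not entailed; to answer "no" with confidence, the knowledge base needs a disjointness axiom. The Crustacea/Arachnida disjointness below is an assumption added for the example.

```python
# Hypothetical mini-taxonomy: class -> direct superclasses (rdfs:subClassOf).
SUBCLASS_OF = {
    "Lobster": {"Decapoda"},
    "Decapoda": {"Malacostraca"},
    "Malacostraca": {"Crustacea"},
    "Crustacea": {"Arthropoda"},
    "Spider": {"Arachnida"},  # spiders form the order Araneae within Arachnida
    "Arachnida": {"Chelicerata"},
    "Chelicerata": {"Arthropoda"},
}
# Assumed axiom, in the spirit of owl:disjointWith.
DISJOINT = {frozenset({"Crustacea", "Arachnida"})}


def superclasses(cls):
    """All transitive superclasses of cls."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subclass(sub, sup):
    """True if sub is (entailed to be) a subclass of sup."""
    return sub == sup or sup in superclasses(sub)


def provably_disjoint(a, b):
    """True if some ancestor of a is declared disjoint with some ancestor of b."""
    up_a, up_b = superclasses(a) | {a}, superclasses(b) | {b}
    return any(frozenset({x, y}) in DISJOINT for x in up_a for y in up_b)


print(is_subclass("Lobster", "Spider"))        # False: not entailed
print(is_subclass("Lobster", "Arthropoda"))    # True
print(provably_disjoint("Lobster", "Spider"))  # True: the answer is really "no"
```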

Finally, as promised, the results of my inquiry about the ISWC best paper awards:

So why did I dig these awards out? Because I noticed that among these 6 papers there are 3 which are explicitly concerned with OWL. And the 2007 paper involves RDF inferencing. Talk about the importance of reasoning for the Semantic Web …

Author: Pascal Hitzler, AIFB, University of Karlsruhe (TH), Germany

A (very personal) bit of ISWC08 trendspotting

October 30, 2008 By: Pascal Hitzler Category: Conferences & Events

As ISWC08 is drawing to a close, it dawns on me that something Frank van Harmelen has been forecasting for years is now happening, seemingly without conscious effort. He calls it Approximate Reasoning – have a look at his ESWC06 keynote. The basic idea behind it is to reason over ontologies with a different focus, namely to give up some reasoning correctness in order to gain better scalability.

And indeed, at ISWC08 I have seen a number of things which fit exactly into this corner (while at the same time the authors/programmers might not even be aware of it).

  • As part of the Billion Triple Challenge, Axel Polleres presented the SAOR system, which does approximate OWL reasoning by means of forward chaining rules. Now you can’t do OWL reasoning (in a sound and complete way) with forward chaining rules (and Axel knows this), so in the end you’re losing some consequences. But at the same time you do get some consequences when having to deal with large amounts of data.
  • Eyal Oren, also at the Billion Triple Challenge, presented the MARVIN system which performs approximate RDF reasoning by means of massive parallelisation. MARVIN comes out of the EU project LarKC, which is actually pursuing approximate reasoning on a large scale (pun intended). Edit: This one actually won the 3rd prize at the challenge.
  • Among the results presented at ISWC08, I found those by Claudia D’Amato on Statistical Learning for Inductive Query Answering on OWL Ontologies really amazing. She and her collaborators managed to do OWL instance retrieval without any deduction algorithm. Instead, they used Support Vector Machines to learn which (named) OWL classes individuals belong to. The learning was done from a small sample set (generated by a reasoner), but the learned classifier was able to generalise from the data to achieve about 90% coverage. In my opinion, this is something conceptually new, and it is really remarkable that it works.
  • In a regular paper Eyal Oren also reported on using Evolutionary Algorithms for RDF query answering.
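Forward chaining of the kind mentioned for SAOR can be sketched in a few lines. This is not SAOR's rule set; it is a minimal illustration, assuming just three rules (subClassOf transitivity, type propagation along subClassOf, and owl:sameAs symmetry) applied until no new triples appear. Every derived triple is sound, but many OWL consequences are never produced, which is exactly the trade of completeness for scalability described above. The ex: data is hypothetical.

```python
SUB, TYPE, SAME = "rdfs:subClassOf", "rdf:type", "owl:sameAs"


def apply_rules(triples):
    """One pass of a small rule subset; returns only newly derived triples."""
    new = set()
    for (s, p, o) in triples:
        if p == SAME:
            new.add((o, SAME, s))  # owl:sameAs is symmetric
        for (s2, p2, o2) in triples:
            if p == SUB and p2 == SUB and o == s2:
                new.add((s, SUB, o2))  # subClassOf is transitive (rdfs11)
            if p == TYPE and p2 == SUB and o == s2:
                new.add((s, TYPE, o2))  # instances inherit superclasses (rdfs9)
    return new - triples


def forward_chain(triples):
    """Apply the rules repeatedly until a fixpoint is reached.
    Terminates because the rules never invent new terms."""
    closure = set(triples)
    while True:
        delta = apply_rules(closure)
        if not delta:
            return closure
        closure |= delta


data = {
    ("ex:Tweety", TYPE, "ex:Canary"),
    ("ex:Canary", SUB, "ex:Bird"),
    ("ex:Bird", SUB, "ex:Animal"),
}
for triple in sorted(forward_chain(data) - data):
    print(triple)
```

The nested loop is quadratic per pass. Systems aimed at billions of triples index triples by predicate and apply only the rules whose premises changed, but the fixpoint idea is the same.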

The above is only a selection of approximate-reasoning-related work at ISWC08. There was also the Workshop on Nature-inspired Reasoning for the Semantic Web, where related ideas were discussed. At the co-located Web Reasoning and Rule Systems conference, RR2008, there will be two papers on approximate reasoning (incidentally, with me as co-author).

I foresee the importance of such approaches rising substantially in the future (and I think it’s a safe guess, since Frank seems to think so too). The Billion Triple Challenge series could become one of the driving forums for this. There are exciting times ahead!

Author: Pascal Hitzler, AIFB, University of Karlsruhe (TH), Germany

The Day after Freebase went RDF

October 30, 2008 By: Jana Herwig Category: Linked Data & Open Data, Mashups & Web services

So what’s been happening in the blogosphere since John Giannandrea’s keynote at ISWC and the revelation that Freebase now publishes Linked Data via an RDF service?

Tetherless World sums up the Freebase facts (e.g. 156,000,000 assertions made; 1370 published types; 75 domains; graph model, identity, web based) and further points out that ontology creation “is a social process, and both freebase and semantic wiki are tools that enable users to create ontological vocabulary without worrying too much on building a comprehensive ontology.”

Inkdroid notes that the RDF service release “is important news because Freebase is an active community of content creators, creating rich data-centric descriptions with a wiki style interface, fancy data loaders, and useful machine APIs.” This is followed by a quick and handy tutorial on how to get machine-readable data back from Freebase using a Freebase URI. Conclusion:

So why is this important? Because following your nose in HTML is what enabled companies like Lycos, AltaVista, Yahoo and Google to be born. It allowed for agents to be able to crawl the web of documents and build indexes of the data to allow people to find what they want (hopefully). Being able to link data in this way allows us to harvest data assets across organizational boundaries and merge them together. It’s early days still, but seeing an organization like Freebase get it is pretty exciting.
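"Following your nose" is essentially a breadth-first crawl: dereference a URI, keep the triples you get back, and queue every URI mentioned in them. The sketch below assumes a hypothetical in-memory stand-in for HTTP (the WEB dictionary and its example.org-style URIs are invented for illustration); a real crawler would fetch RDF over the network instead.

```python
from collections import deque

# Hypothetical stand-in for the web: URI -> triples returned when dereferenced.
WEB = {
    "http://a.example/film/blade_runner": [
        ("http://a.example/film/blade_runner", "ex:basedOn", "http://b.example/book/dads"),
    ],
    "http://b.example/book/dads": [
        ("http://b.example/book/dads", "ex:author", "http://c.example/person/pkd"),
    ],
    "http://c.example/person/pkd": [
        ("http://c.example/person/pkd", "ex:name", '"Philip K. Dick"'),
    ],
}


def dereference(uri):
    """Simulated lookup; unknown URIs yield no data."""
    return WEB.get(uri, [])


def follow_your_nose(start, max_docs=100):
    """Breadth-first crawl: fetch a URI, keep its triples, then follow every
    URI in object position. Stops after max_docs fetches."""
    seen, queue, harvested, fetched = {start}, deque([start]), [], 0
    while queue and fetched < max_docs:
        uri = queue.popleft()
        triples = dereference(uri)
        fetched += 1
        harvested.extend(triples)
        for _, _, obj in triples:
            if obj.startswith("http") and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return harvested


for triple in follow_your_nose("http://a.example/film/blade_runner"):
    print(triple)
```

Because the data spans three hypothetical hosts, the crawl crosses "organizational boundaries" without any site-specific API.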

Yves Raimond was the first to wonder on the public W3C LOD mailing list: “now, to see whether it links to other datasets :-) ” – the idea of having linked data without the linkage would indeed seem like love’s labour lost. Semantic Focus / James Simmons agrees: “One downside is the data doesn’t appear to link to external resources, in a sense walling itself in. It should be trivial to link the topics that came from Wikipedia back to Wikipedia as well as DBpedia (which would be killer, by the way).” This was followed up in a later post, where James expresses concerns about the relationship between DBpedia and Freebase: “Freebase may see a drop in userbase growth and participation if it becomes a mirror of DBpedia (or vice-versa) and the popularity once garnered by one project may shift towards the other, or away entirely.”
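How "trivial" such linking could be is easy to sketch for topics that carry a Wikipedia title. The snippet below assumes that DBpedia resource URIs follow the article title with spaces turned into underscores and the first letter upper-cased; it ignores the special-character encoding rules a production mapping would need. The local topic URIs are hypothetical, not real Freebase identifiers.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def dbpedia_uri(wikipedia_title):
    """Derive a DBpedia resource URI from an English Wikipedia article title.
    Assumption: spaces -> underscores, first letter upper-cased; other special
    characters are not handled."""
    name = wikipedia_title.strip().replace(" ", "_")
    return DBPEDIA_RESOURCE + name[:1].upper() + name[1:]


def same_as_ntriples(topics):
    """topics: iterable of (local_uri, wikipedia_title) pairs -> N-Triples text."""
    return "\n".join(
        f"<{local}> <{OWL_SAME_AS}> <{dbpedia_uri(title)}> ."
        for local, title in topics
    )


print(same_as_ntriples([
    ("http://example.org/topic/blade_runner", "Blade Runner"),
    ("http://example.org/topic/canberra", "Canberra"),
]))
```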

More News / Andrew Newman puts the Freebase RDF service release into context with Cathrin Weiss’ Billion Triple Challenge submission iMoCo (“250 million triples on your iphone”), as well as with DBpedia and SemaPlorer (the latter developed at the University of Koblenz):

DBPedia stood out because it was the only one that allowed you to write data to the Semantic Web rather than just read the carefully prepared triples. For a similar reason I thought SemaPlorer was good because they tried to do more than just the standard triples but went that extra bit further by making it more generic like integrating flickr. But they were all excellent, all of them showing what you get with a billion or more triples and inferencing.

That combined with the guys at Freebase making all of their data available as RDF and it was a big day for the Semantic Web.

ARQtick / AndyS plays around with the Blade Runner example cited by Freebase: he takes a look at the graph, looks for interesting properties and extracts author names.

N.B. If you want to follow ARQtick’s example: use the Linked Data browser plugin Tabulator or go to the Marbles site to view the RDF – without a data browser you’ll be redirected to the HTML page. You will also need it to make sense of rdf.freebase.com.
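The redirect to the HTML page comes from HTTP content negotiation: a Linked Data server inspects the request's Accept header and serves RDF to clients that ask for it, HTML to everything else. Here is a minimal standard-library sketch. The Accept string is one reasonable choice, not a requirement. The Freebase RDF service has since been retired, so any example URI (such as a DBpedia resource) is only illustrative, and the fetch helper is not run here.

```python
import urllib.request

RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9, text/html;q=0.1"


def linked_data_request(uri, want_rdf=True):
    """Build (but do not send) a request. A content-negotiating server returns
    RDF when the Accept header prefers it, and the HTML page otherwise,
    which is what a plain browser gets."""
    accept = RDF_ACCEPT if want_rdf else "text/html"
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch(uri, want_rdf=True):
    """Send the request. urlopen follows 303 See Other redirects and keeps the
    Accept header, so we report the final URL and the returned media type."""
    with urllib.request.urlopen(linked_data_request(uri, want_rdf), timeout=10) as resp:
        return resp.geturl(), resp.headers.get("Content-Type")
```

Calling fetch twice on the same resource URI, once with want_rdf=True and once with want_rdf=False, should land on different documents if the server negotiates content. That is the effect Tabulator or Marbles relies on.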

The Gap between the Web 2.0 and Semantic Web Community (tentative post)

August 25, 2008 By: Jana Herwig Category: Linked Data & Open Data, Mashups & Web services

Two days ago, the BarCamp Traunsee, subtitled “Social Media Review Camp”, took place in Upper Austria. I had co-organized it, and it was co-sponsored by our own lil’ Semantic Web Company. Andreas Blumauer (also SWC) joined me on the first day, hosting a session that introduced Linked Data. Given the angle of the BarCamp, he gave it to an audience of Web 2.0 people (i.e. consultants, marketers, developers, communications people). So was he able to bridge the gap between 2.0 and 3.0?

BarCamp Traunsee

Half a year ago, I was a complete newbie to the Semantic Web and Linked Data myself, and while the concept of the Semantic Web is undoubtedly as persuasive as a technological concept can possibly be, I remember how hard it was to come to grips with it (btw, I am a Humanities/Liberal Arts person). I think that Andreas’ presentation on Friday was probably the most accessible introduction to the topic I have witnessed so far, and it allowed me to retrace once more where the biggest comprehension and communication issues probably lie.

If Semantic Web people start explaining their concepts to ‘other species’, they very soon start juggling acronyms and technical lingo, in particular names and abbreviations from the Semantic Web Stack – understandably so, as URIs, XML and RDF form the very foundation, on the technological side. But the only concept where the web 2.0 people (in particular those who approach it from the business, PR or marketing side) might still be with them is XML – even though it might sound surprising, not everyone is able to guess without context that the term URI refers to the same kind of thing as URL. And when you say RDF, people are surprisingly often inclined to think you are talking about “RFID” (Radio Frequency Identification) – it’s got, after all, also to do with unique identification, doesn’t it?

Just as Semantic Web interfaces are only now becoming more accessible to web 2.0 people (once more, hooray for Parallax), I think a VITAL next step in promoting the Semantic Web is to find human-readable explanations of its technologies.

The generic explanations all sound very good ( “At the moment, we have a web of documents, but the Semantic Web aims for the web of data” or “The Semantic Web wants computers not only to be able to process, but also to understand data”), but what they fail to achieve is to make non-tech people interested in the (workings of the) technology.

Without addressing technology, these generic explanations are just too bland to convey what is really exciting about the semantic web – yet as soon as SemWeb people start to talk technology, the acronym shower starts – see above. Dilemma.

Back to the BarCamp: I think that Andreas took a good approach in that he
a) kept the acronym level low
b) went on to explain how Linked Data can be a better source for mashups than APIs – because APIs really are the Holy Grail of the Web 2.0 community. I had seen it happen before, and I saw it happen again at the BarCamp Traunsee – as soon as a new tool or feature is introduced, people start asking: “Does it have an API?” – “Will it have an API?” – “Can I get access to the API?” – “Is the API documentation online?”

What seems to be fixed in people’s minds is that you need an API to build mashups, and that mashups are what constitutes the miracle of web 2.0. So my simple advice for all Semantic Web evangelists would be:

If you want to develop a showcase that people understand, develop a mash-up, and more specifically one that uses data that average users would use and understand.

Develop something like DBpedia Mobile, and go into the details of the Semantic Web stack only after people have seen and understood that you don’t need an API (well, theoretically) or a huge programming effort to obtain structured, processable data.

Btw, things got even more semantic on the second day of the BarCamp: Alexander Kirk presented Factolex, a dictionary consisting of “short and concise explanations” which can be enhanced by tags and which, because of their simplicity, would ideally lend themselves to conversion into triples. Alexander confirmed that he keeps semantic integration in mind while developing Factolex further.

Alexander’s presentation was followed by input from Michael Schuster (who hasn’t yet put his session online, and I can’t seem to remember the names of the sites he used and showed us). One of them was a tool that uses natural language processing to interpret user notes and is able to decide, for instance, whether an entry should be added to the calendar or to a to-do list.

Nifty tool (and I hope I’ll be able to provide a link later), but what I mostly remember his presentation for is that he presented it as an example of a “dirty semantic web approach”, making it sound like something diametrically opposed to the (potentially anal) endeavours of those who rely on the Semantic Web stack.

But why set up this binary opposition? You can, and must, have both: semantic technologies like NLP, and open standards such as those defined in the Semantic Web stack.

It’s not like one is for the ‘cool kids’ (or web 2.0 kids) and the other one for the ‘geeks’ – if anything, then I’d say that the ‘cool kids’ are probably more interested in improving the service of just their site (making the industry and software market more diverse, if there are enough of them), whereas the ‘geeks’ work towards global exchange through the definition and further development of open standards (and make sure the ‘cool kids’ don’t get trapped in their data silos).

In the end, once the Semantic Web reaches maturity, it will need both of them.

Linked Open Data Triplification Challenge: Nominees are up, voting tool is online

July 31, 2008 By: Jana Herwig Category: Calls & Competitions, Conferences & Events

The nominees for the LOD Triplification Challenge are up! The challenge was organized as part of the preparations for the I-SEMANTICS 2008 conference and called for submissions in the form of applications of Linked Open Data tools, RDF and Linked Data exporters, adaptations of Triplify configurations for standard web applications, ports of the Triplify script to other languages (e.g. Python, Ruby, Perl, ASP) and applications showcasing the benefits of Linked Data to end-users.

Triplify itself is a small web application plugin – its crucial parts consist of roughly 200 lines of code – currently implemented only in PHP. It is based on the definition of relational database queries for a specific web application in order to retrieve valuable information, and it converts the results of these queries into RDF, JSON and Linked Data. More information about Triplify can be found here.
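The core idea (run predefined SQL queries, then turn each result row into triples) can be sketched without PHP. This is a loosely modeled illustration, not Triplify's actual configuration format or output. It assumes that the first column of each query is the row identifier and that the remaining column names become property names. The base URI, table and vocabulary are hypothetical, and literals are emitted as plain strings without datatypes.

```python
import sqlite3

BASE = "http://shop.example/triplify/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping: class name -> SQL query. Assumed convention: the first
# column identifies the row, the other column names become properties.
QUERIES = {"product": "SELECT id, name, price FROM products"}


def escape(value):
    """Escape a value for use inside an N-Triples string literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def triplify(conn):
    """Run each mapped query and emit N-Triples lines for every result row."""
    lines = []
    for cls, sql in QUERIES.items():
        cur = conn.execute(sql)
        columns = [d[0] for d in cur.description]
        for row in cur:
            subject = f"<{BASE}{cls}/{row[0]}>"
            lines.append(f"{subject} <{RDF_TYPE}> <{BASE}vocab/{cls}> .")
            for col, value in zip(columns[1:], row[1:]):
                if value is not None:  # NULLs simply produce no triple
                    lines.append(f'{subject} <{BASE}vocab/{col}> "{escape(value)}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [(1, "Mug", 4.5), (2, "T-shirt", 12.0)])
for line in triplify(conn):
    print(line)
```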

On to the nominees: Eight of the submissions were nominated and can now be voted on using the poll widget on the nominations page. The nominees are:

  1. Automatic Generation of a Content Management System from an OWL ontology and RDF import and export by Alastair Burt, Brigitte Jörg.
    URL: www.lt-world.org/triplify
  2. Integrating Triplify into the Django web application framework and discover some math by Martin Czygan.
    URL: pcai042.informatik.uni-leipzig.de:9103
  3. Linked Movie Data Base by Oktie Hassanzadeh, Mariano Consens.
    URL: www.linkedmdb.org
  4. Interlinking Multimedia Data by Michael Hausenblas, Wolfgang Halb.
    URL: sw.joanneum.at/CaMiCatzee
  5. Showcases of light-weight RDF syndication in Joomla! by Danh Le Phuoc, Nur Aini Rakhmawati.
    URL: swm.deri.org/jsyndication
  6. Semantic Web Pipes Demo by Danh Le Phuoc.
    URL: pipes.deri.org
  7. DBTune by Yves Raimond.
    URL: dbtune.org
  8. Triplification of the Open-Source Online Shop System osCommerce by Elias Theodorou.
    URL: triplify.org/vocabulary/oscommerce

Detailed information can be found in PDF outlines on the nomination page – and don’t forget to vote! The final decision about the winners of the challenge will be made by the organizing committee.

The prizes will be awarded at I-SEMANTICS 2008, 3–5 September 2008, Graz, Austria, which is part of TRIPLE-I, a joint venture of three conferences (I-SEMANTICS, I-KNOW, I-MEDIA).

Related post:
Sören Auer: Triplification Challenge Nominations

SWC’s Matthias Samwald contributes to W3C notes

July 14, 2008 By: Jana Herwig Category: Ontology Engineering, Vocabularies & Languages

Early June saw the release of two notes drafted by the Semantic Web Health Care and Life Sciences (HCLS) Interest Group within the W3C. One of the contributors, and editor of one note, is Matthias Samwald, a project coordinator at SWC, who is a member of this SIG and who has worked on several Semantic Web projects for the Yale Center for Medical Informatics (USA), Science Commons (USA) and DERI Galway (Ireland).

A Prototype Knowledge Base for the Life Sciences
W3C Interest Group Note 4 June 2008
Editors: M. Scott Marshall, Eric Prud’hommeaux
Contributors: Alan Ruttenberg, Jonathan Rees, Susie Stephens, Matthias Samwald, Kei-Hoi Cheung
Abstract: The prototype we describe is a biomedical knowledge base, constructed for a demonstration at Banff WWW2007, that integrates 15 distinct data sources using currently available Semantic Web technologies such as the W3C standard Web Ontology Language [OWL] and Resource Description Framework [RDF]. This report outlines which resources were integrated, how the knowledge base was constructed using free and open source triple store technology, how it can be queried using the W3C Recommended RDF query language SPARQL [SPARQL], and what resources and inferences are involved in answering complex queries. While the utility of the knowledge base is illustrated by identifying a set of genes involved in Alzheimer’s Disease, the approach described here can be applied to any use case that integrates data from multiple domains.
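Answering a question across integrated sources boils down to matching a SPARQL basic graph pattern against the merged triples. Here is a minimal sketch of that matching step. It is not the triple store used in the note; the ex: genes, diseases and predicates are hypothetical placeholders, not real biomedical data. The query is equivalent to SELECT ?g WHERE { ?g ex:associatedWith ex:DiseaseX . ?g ex:expressedIn ex:Neuron . }.

```python
def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None.
    Terms starting with '?' are variables."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def query(patterns, triples):
    """Evaluate a basic graph pattern (a conjunction of triple patterns)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


source_a = [  # hypothetical gene-disease associations
    ("ex:GeneA", "ex:associatedWith", "ex:DiseaseX"),
    ("ex:GeneB", "ex:associatedWith", "ex:DiseaseX"),
    ("ex:GeneC", "ex:associatedWith", "ex:DiseaseZ"),
]
source_b = [  # hypothetical expression data from a second source
    ("ex:GeneA", "ex:expressedIn", "ex:Neuron"),
    ("ex:GeneC", "ex:expressedIn", "ex:Neuron"),
]
results = query([("?g", "ex:associatedWith", "ex:DiseaseX"),
                 ("?g", "ex:expressedIn", "ex:Neuron")], source_a + source_b)
print([r["?g"] for r in results])  # ['ex:GeneA']
```

The shared variable ?g is what joins the two sources. Neither source alone can answer the question.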

Experiences with the conversion of SenseLab databases to RDF/OWL
W3C Interest Group Note 4 June 2008
Editors: Matthias Samwald, Kei-Hoi Cheung
Contributors: Alan Ruttenberg, Huajun Chen
Abstract: One of the challenges facing Semantic Web for Health Care and Life Sciences is that of converting relational databases into Semantic Web format. The issues and the steps involved in such a conversion have not been well documented. To this end, we have created this document to describe the process of converting SenseLab databases into OWL. SenseLab is a collection of relational (Oracle) databases for neuroscientific research. The conversion of these databases into RDF/OWL format is an important step towards realizing the benefits of Semantic Web in integrative neuroscience research. This document describes how we represented some of the SenseLab databases in Resource Description Framework (RDF) and Web Ontology Language (OWL), and discusses the advantages and disadvantages of these representations. Our OWL representation is based on the reuse and extension of existing standard OWL ontologies developed in the biomedical ontology communities. The purpose of this document is to share our implementation experience with the community.

Video: Links to DBpedia in TopBraid

April 30, 2008 By: Jana Herwig Category: Tools & Software

Two weeks ago (but still worth mentioning), Holger Knublauch from TopQuadrant made a little video for his blog, highlighting how DBpedia can be used to link different domain models with each other, a feature that’s now incorporated in TopBraid Composer 2.5.3. He explains DBpedia as

an RDF repository based on Wikipedia. DBpedia provides machine-readable RDF data for each of the pages in Wikipedia. Each Wikipedia page is represented by a corresponding RDF resource, and these resources are associated with RDF property values to provide descriptions, images, cross-references and tons of useful background knowledge. For example, the DBpedia pages for cities (e.g., Canberra) contain geographical information, the number of inhabitants, population density, links to famous inhabitants and average temperatures, all in machine-processable form. While these property values may not be totally stable and reliable, they are at least a good start.

However, the main benefit of DBpedia is that it provides relatively stable URIs for all relevant real-world concepts. This makes it a natural place to connect specific domain models with each other. If I publish my RDF files with links to DBpedia and you do the same, then we can automatically find cross-references and might more easily find mappings between our domain models. All I need to do is to add links such as { my:Canberra owl:sameAs dbpedia:Canberra }.
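The "automatically find cross-references" step can be sketched as a join on the shared DBpedia URI. This is a minimal illustration with hypothetical datasets, reusing the my:/dbpedia: prefixes from the quote. Since owl:sameAs is symmetric and transitive, two resources linked to the same DBpedia URI can be treated as the same thing.

```python
OWL_SAME_AS = "owl:sameAs"

# Hypothetical datasets that both link into DBpedia.
mine = [("my:Canberra", OWL_SAME_AS, "dbpedia:Canberra"),
        ("my:Sydney", OWL_SAME_AS, "dbpedia:Sydney")]
yours = [("your:CanberraACT", OWL_SAME_AS, "dbpedia:Canberra"),
         ("your:Perth", OWL_SAME_AS, "dbpedia:Perth")]


def mappings_via_hub(a, b, predicate=OWL_SAME_AS):
    """Pair resources from two datasets that link to the same hub URI.
    Returns (resource_in_a, resource_in_b, shared_hub_uri) tuples."""
    index = {}
    for s, p, o in a:
        if p == predicate:
            index.setdefault(o, []).append(s)
    pairs = []
    for s, p, o in b:
        if p == predicate:
            for s_a in index.get(o, []):
                pairs.append((s_a, s, o))
    return pairs


print(mappings_via_hub(mine, yours))
```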

Here’s the link to the blog post, and here the direct link to the video (*.wmv, 10.8 MB).

LinkedData Planet – Conference & Expo 2008

April 17, 2008 By: Jana Herwig Category: Conferences & Events

Come share your expertise with linked data and semantic technologies and learn from others at LinkedData Planet in New York City (June 17-18, 2008).

In creating the modern generation of enterprise and web applications, we typically integrate information from multiple sources. Relating data from disparate sources presents a challenge of deriving information. However, semantic tools and technologies are evolving that enable us to understand information derived by linking data from different sources, including data from applications, databases, ontologies and content management systems. Semantic technologies and tools support techniques such as tagging online information to make it more readily accessible for data integration. This makes it easier to understand data in relation to other data, even if some of this data is inside your firewall, some is in a business partner’s system, and some is part of the growing collection of useful publicly available data on the web.

LinkedData Planet provides insights into those technologies that enable us to:

  • connect data contained in silos within organizations in a meaningful way
  • extract and correlate data from web sites and databases for purposes such as analyzing trends and decision support, customer and vendor relationship management, and social networking
