Andreas Blumauer

Automatic text analytics using DBpedia and PoolParty – A Live Demo

Let me show you the steps needed to build a high-quality text mining application that can annotate and categorize any kind of text or document in nearly any domain. With our approach of thesaurus-based text mining, your documents can also be linked to the world of linked (open) data: enrich your documents with data from the LOD cloud!

Step 1. Generate a thesaurus by using a linked data source like DBpedia

As recently reported, SWC has developed a tool called SKOSsy, which can be used to extract seed thesauri from DBpedia. In this example I generate a knowledge model describing the domain of “digital photography”. This step took around 15 minutes.
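
As a rough illustration of what pulling seed concepts from DBpedia can look like, the following Python sketch builds a SPARQL query for the direct sub-categories of a DBpedia category. It is not a description of how SKOSsy works internally, which this post does not cover. The helper name build_seed_query, the category name and the result limit are assumptions made for the example, and the query is only built, not sent to an endpoint.

```python
from string import Template

# Illustrative query template (an assumption, not SKOSsy's actual query):
# DBpedia links a category to its parent category via skos:broader.
QUERY = Template("""\
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?concept ?label WHERE {
  ?concept skos:broader <http://dbpedia.org/resource/Category:$category> ;
           rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT $limit
""")


def build_seed_query(category: str, limit: int = 100) -> str:
    """Return a SPARQL query listing the direct sub-categories of a category."""
    # DBpedia resource names use underscores instead of spaces.
    return QUERY.substitute(category=category.replace(" ", "_"), limit=limit)


query = build_seed_query("Digital photography")
print(query)
```

The resulting string could be pasted into any SPARQL client pointed at a DBpedia endpoint; a real seed thesaurus would need more than one level of the category tree.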

Step 2. Load the thesaurus into PoolParty and improve it to your needs

After the seed thesaurus has been loaded into PoolParty Thesaurus Manager you have many possibilities to enhance the knowledge model further: Add more categories, synonyms, relations etc. In this example I use the seed-thesaurus without any further improvements. This step took approximately 2 minutes.

Step 3. Generate an automatic text extractor on top of your thesaurus

This step took a couple of seconds. The result is a fast and reliable text mining application built on top of PoolParty Extractor, ready to be used to enrich your documents with data from the LOD cloud.
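
To make the idea of thesaurus-based extraction more concrete, here is a minimal Python sketch that looks up thesaurus labels in a text and returns the matching concept URIs. It is a toy under stated assumptions, not the PoolParty Extractor implementation: the three concepts, their labels and the function names are made up for the example, and matching is a plain case-insensitive label lookup.

```python
import re

# Toy thesaurus (illustrative only): concept URI -> preferred label and synonyms.
THESAURUS = {
    "http://dbpedia.org/resource/Digital_single-lens_reflex_camera": [
        "digital single-lens reflex camera",
        "DSLR",
    ],
    "http://dbpedia.org/resource/Image_sensor": ["image sensor"],
    "http://dbpedia.org/resource/Raw_image_format": ["raw image format", "RAW"],
}


def build_matcher(thesaurus):
    """Compile one regex over all labels and map each label to its concept."""
    label_to_uri = {
        label.lower(): uri
        for uri, labels in thesaurus.items()
        for label in labels
    }
    # Try longer labels first so a multi-word label wins over a shorter one.
    labels = sorted(label_to_uri, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    return pattern, label_to_uri


def annotate(text, thesaurus=THESAURUS):
    """Return (matched text, concept URI) pairs in order of appearance."""
    pattern, label_to_uri = build_matcher(thesaurus)
    return [
        (match.group(0), label_to_uri[match.group(0).lower()])
        for match in pattern.finditer(text)
    ]


annotations = annotate("My DSLR has a larger image sensor and saves RAW files.")
for surface_form, uri in annotations:
    print(surface_form, "->", uri)
```

Because every match resolves to a URI, each annotation can serve as a link into the LOD cloud. A production extractor would additionally handle inflected forms and ambiguous labels, which this sketch ignores.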

You can try it out here: PPX Live-Demo

To try the extractor on your own, please take a look at the image above, which shows a proper configuration. You have to insert the following UUID in the form: d35d4ddb-adc3-4ea5-b027-deacac03e391

Since our example is all about ‘digital photography’, we recommend using text samples (or some fragments) like these to test the quality of PPX-based text analytics:

Let us know what you think about this straightforward approach and about the quality of the results. We believe that thesaurus-based text mining is in many cases an alternative to other approaches, especially if you want to enrich your content with information from the upcoming web of data.

Of course we would be happy to generate other demos in the areas of your interest! Just get in contact with us by using our contact form.

Andreas Blumauer

Experiences from teaching Linked Data

Dr. Bernhard Haslhofer works as an instructor on Web Information Systems at Cornell Information Science. Just recently he gave a course which examined technologies for building data-centric information systems on the World Wide Web. Semantic Web Company (SWC) had the opportunity to talk with Dr. Haslhofer to examine the question “How to teach Linked Data?”.

SWC: Bernhard, you have been working on the Semantic Web and Linked Data for years now. What is the first lesson you usually give when you try to explain the “Semantic Web”?

Maybe I should first clarify that the course I am co-teaching is not a Semantic Web course. The course is about data-centric Web information systems in general, and we spent some classes talking about Linked Data and Semantic Web technologies. We start by explaining the origins and the fundamental architectural principles of the World Wide Web and then focus on the data-centric aspects of the Web.

“instead of building isolated repository-centric APIs we could also build a globally connected data graph”

After introducing various data exchange formats (XML, JSON & co.) we teach how Web APIs work, and discuss the design principles of RESTful Web Services. Then the conceptual transition to Linked Data is just a small step, because we can argue that instead of building isolated repository-centric APIs we could also build a globally connected data graph, which is based on a uniform data model and can be traversed and queried using SPARQL.

“DBpedia and all the other existing Linked Data projects and tools that came up in recent years really help in explaining and illustrating how things work”

So, I am somehow approaching the “Semantic Web” bottom-up and concentrate on the “visible” parts of the “Semantic Web” vision. DBpedia and all the other existing Linked Data projects and tools that came up in recent years really help in explaining and illustrating how things work. And last but not least, schema.org and the design of the Facebook Open Graph protocol also show the growing importance of having structured data on the Web.

SWC: At least for non-technicians “Linked Data” sounds very technical. Antoine de Saint-Exupery said: “If you want to build a ship, don’t drum up people to collect wood and don’t assign them tasks and work, but rather teach them to long for the endless immensity of the sea.” Is there an “endless immensity of the sea” you try to bring in as well?

If you can access and combine data from the Web you can answer interesting questions and discover previously unknown relationships between things. We thought the best way to learn about Linked Data is to implement simple demo applications. So we asked the students to think about use cases that bring some benefit for end users and require data from several Web sources to answer certain questions.

“I think it became clear what it means to work with easily accessible structured Web data as opposed to working with unstructured data”

One group developed a service which connects safety records with public transport information. Users can now easily choose the “safest” bus connection between New York City and other cities. Another group combined public school district information with geographic data, which now allows parents to view statistical information about school districts in New York State by using apps like Google Earth. There are many more examples, but most importantly, I think it became clear what it means to work with easily accessible structured Web data as opposed to working with unstructured data.

SWC: Teaching how to use the Semantic Web is not only a matter of slide decks. It is rather a question of concrete use cases in combination with tool skills. What kind of tool skills should students of information science acquire, in your opinion?

Collecting and making sense of data is a common scholarly practice in many research areas, and the Web is becoming, or already is, the primary medium for publishing and distributing results. I believe that making data accessible as part of a research activity will become increasingly important in the future, and the Web will probably be the infrastructure for doing this.

So I think that a student who is working with data should at least know (i) how to retrieve and (ii) how to publish data on the Web in a way that lets others easily discover, access, and use it. Linked Data is one possible technical approach for doing that.

SWC: As a European who is teaching and working in the U.S., how do you perceive the different approaches of those two systems when it comes to transferring complex fields of knowledge like the semantic web from universities to business environments?

From the experience I have gained in my previous and current working environments, I can only say that the relations between businesses and universities seem to be tighter in the US. I don’t necessarily mean “formal” bonds between institutions but rather informal relations between people who understand complex fields of knowledge, both in academia and in business.

“I assume transferring knowledge between two proxies who speak the same ‘language’ makes it a lot easier”

PhD students, for instance, often work in business over the summer and/or continue their career in the research department of some company. Some continue their cooperation with their former professors and academic colleagues and I assume transferring knowledge between two proxies who speak the same “language” makes it a lot easier.

SWC: What are the most important things which are still missing to make linked data technologies an integral part of enterprise information systems?

Quite often I hear the complaint that major database vendors still don’t provide satisfactory RDF support in their products. I don’t think this is a necessary precondition for implementing Linked Data but for some institutions this seems to be very important.

Many thanks!

Andreas Blumauer

WordPress plugin to make use of linked data

The PoolParty Team has recently published an improved version of its WordPress plugin, which enables linked data enrichment of blogs. For this, a SKOS-based vocabulary has to be uploaded or retrieved from a SPARQL endpoint. Users and developers benefit from

  • automatic annotation of all blog entries displayed as tooltips
  • a comfortable search facility with auto-complete over all concepts from the linked thesaurus including semantic search over the whole blog
  • an integrated thesaurus browser, plus
  • a corresponding linked data frontend including RDF/XML serialization of the underlying thesaurus + SPARQL endpoint

All details about the new version 2.2.3 can be read here.

Thomas Schandl

I-Semantics: The Review in a Car – 2011 Edition

Continuing the tradition of last year’s review in a car, the Semantic Web Company’s participants at I-KNOW / I-SEMANTICS talked about their impressions of the conference while on their way back to Vienna.

Image based on work by Paolo Mañalac


Thomas Schandl: An especially nice thing about this conference is that its co-location attracts people from two separate communities: Knowledge Management and Semantic Web. This serves as a natural facilitator for looking beyond the boundaries of one’s own domain and getting more than a glimpse of what’s currently happening in related fields.

That being said, one of the most interesting talks I attended was by KM expert Prof. Martin Eppler on “Sketching at Work”, which introduced loads of sketching methods that can help to solve problems, inspire creativity and support communication.

From the Semantic Web side I enjoyed the innovative approach taken by the Hasso Plattner Institute’s DBpedia-powered quiz game Risq!. It is a Jeopardy-like Facebook game that (besides being fun) offers insight into which facts are especially important for characterizing a Linked Data resource. For example, when the system wants you to guess a specific “female politician”, would it help you more to know that she is part of the category yago:LivingPeople, or would you rather get the hint that she is dbpedia:Chancellor_of_Germany? By analyzing the logs of the played games, the researchers can find out which triples have more discriminative power than others.
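
One simple way to read “discriminative power” from game logs is to compare how often players guess correctly after seeing each hint. The Python sketch below does exactly that. It is only an illustration: the log format, the sample records and the function name are invented for this example, and nothing here reflects the actual log structure or scoring method used by the Risq! researchers.

```python
from collections import defaultdict

# Made-up toy log: (hint shown to the player, whether the guess was correct).
GAME_LOG = [
    ("yago:LivingPeople", False),
    ("yago:LivingPeople", False),
    ("yago:LivingPeople", True),
    ("dbpedia:Chancellor_of_Germany", True),
    ("dbpedia:Chancellor_of_Germany", True),
]


def success_rate_per_hint(log):
    """Return, for each hint, the share of games in which it led to a correct guess."""
    shown = defaultdict(int)
    correct = defaultdict(int)
    for hint, was_correct in log:
        shown[hint] += 1
        correct[hint] += int(was_correct)
    return {hint: correct[hint] / shown[hint] for hint in shown}


rates = success_rate_per_hint(GAME_LOG)
# Hints with a higher success rate are treated as more discriminative.
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)
```

In this toy data the specific hint ranks above the broad category, which matches the intuition behind the example in the text.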

Through the many personal encounters I also got a lot of input on which new features would be especially interesting for future versions of PoolParty and on what we should concentrate on in the LOD interlinking project LASSO, which Bernhard Schandl (Gnowsis / Refinder), Stefan Wunder (Neurovation) and I presented at the I-Praxis track.

Andreas Blumauer: Again this year, coming to Graz was absolutely worthwhile, also from a business perspective. For me it was the 10th time going to Graz. When I went to the second edition of I-KNOW in 2001, I remember that nearly nobody had ever heard of “semantics”. When I-SEMANTICS first came to Graz in 2007, it was still unclear to most visitors how semantic technologies could contribute to more efficient enterprise knowledge management. Nowadays, 10 years later, another question is most prominent:

Which kind of semantic technology is solving my problem?

Being most of the time at our exhibition booth I enjoyed talking to visitors who had very concrete plans & ideas about how to use linked data, text mining or knowledge models for their business. The time when we had to explain what the “semantic web” is all about is over.

Christian Dirschl’s (Wolters Kluwer) keynote on Friday reflected exactly this fact: it’s good to see how big players have already started to integrate the idea of linked data into their processes. The days when we had to explain the difference between RDF and XML seem to be over. Or at least almost.

Florian Kondert: It was a vibrant atmosphere for me: I didn’t manage to attend even a single track, but instead talked to interesting and interested people at the booth without a single break.

From the participant’s perspective the conference as a networking platform was a huge success – and it definitely didn’t stop at dusk! It is worth pointing out the diverse needs and ideas on semantic use cases, which allow us to learn more with every discussion. The bottom line is that semantic solutions are badly needed by many organisations, and they are starting to realize that there are no working alternatives at the moment.

On the other hand, it is crucial to show up with real-life examples, not just with prototypes that might work tentatively! As providers of semantic solutions we face decision makers at the highest level, and they demand high-level remedies – so, no time to take a break!

Tassilo Pellegrini: As the conference chair I really had an intense, but all in all very positive time at the conference. Interesting people, inspiring talks and a really good time at the socializing events (greetings to Leo Sauermann & Co. – I enjoyed the drinks!). For a general conference overview read my post from a few days ago.

But there is more to such a diverse conference than just talking about semantics. As some of you might know, besides my interest in the Semantic Web, I have lately been involved in some policy consulting on the topic of net neutrality. At the conference I took the opportunity to talk to some telecommunications-savvy people and had some really great conversations (Harald … I really enjoyed our discussion!). But to my surprise I found that, especially among the engineers, there seems to be very little awareness of the pressing social, cultural and economic consequences that abandoning net neutrality will have on the Internet as we know it today. For those readers who are into the semantic web but not into the net neutrality discourse, I want to reduce it to a very simple formula: without net neutrality you can say goodbye to linked open data. And this should really make us think and act!
