Semantic Web Company

The Semantic Puzzle

Open World Assumptions


Archive for the ‘Vocabularies & Languages’ Category

Stella Dextre Clarke & Alan Gilchrist about the “Future of Knowledge Organization on the Web”

June 21, 2010 By: Andreas Blumauer Category: Linked Data & Open Data, Tools & Software, Vocabularies & Languages 1 Comment →

Semantic Web Company (SWC) had the pleasure and the opportunity to talk with two internationally recognised experts in the fields of information management and knowledge organization: Alan Gilchrist and Stella Dextre Clarke. SWC asked some questions about the “Future of Knowledge Organization on the Web & Linked Data” on the occasion of an event of the same name organised by ISKO UK which will take place on September 14, 2010 in London.

1. Alan, you are one of the leading experts in the field of thesaurus construction. Organising knowledge in a (worldwide) Semantic Web is a rather young discipline compared to your domain. What do you think the Semantic Web community can learn from “traditional” thesaurus management, and vice versa?

You put inverted commas round the word traditional, but it might be more appropriate to put them round the word thesaurus! So long as words are used in information retrieval and in information sharing, different forms of structured vocabularies will be required, and many of the fundamental principles of thesaurus construction are still valid for their construction. Of course, the “traditional” thesaurus has mutated since the days when it was used only for controlled indexing and retrieval; and now, with the many enrichments possible it can be viewed as an ontology (in one of the definitions of this word). What remains a difficulty is to create a generalisable typology of associative relationships, though this is, of course, possible in relatively closed systems. In short, structured vocabularies with broadly thesaurus formats will be a necessary component in the web stack.

2. Stella, as a consultant you specialize in the design and implementation of knowledge structures for information retrieval applications. In the last few months we have seen that SKOS can serve as a significant building block to link “traditional” thesaurus management to knowledge structures from the semantic web. Do you see this development as market-driven? Is there significant growth in demand for solutions built around SKOS?

This question sounds surprisingly sceptical about the growth of SKOS. I guess the dizzying speed of phenomena like Facebook and Twitter has fuelled expectations of tools springing up overnight like mushrooms, fully formed and ready to eat. But actually it takes time, not just for the tools to be fashioned, but for the potential market to develop an understanding of what they can do and what will happen next when they are used.

Applications for SKOS are springing up all the time, as fast as people can grow the skills and vision to deploy them. At the moment the market, or shall we say the power-base, seems to be with the academic sector and allied not-for-profit organisations. This will spread progressively through the public to the private sector, as enterprises find ways of adapting their business models. The main hurdles to overcome could be intellectual property rights and the need for compilers of databases to keep earning their living.

3. Alan, constructing thesauri for the semantic web also means that one has to make the “open world assumption”. In what sense does this change the way thesauri are managed, kept growing and quality-assured? Can you see new, upcoming methodologies for doing that?

Everything changes with the “open world assumption”! Following on from my answer to the previous question, it seems clear that one manifestation of the thesaurus will be found in those systems that support interoperability, such as federated searching or metadata registries. Even with simple thesaurus management software, it is possible to construct a “master vocabulary” or “word bank” to support different applications within an enterprise; thereby promoting interoperability. More sophisticated software is already available (though not very widely); more will be needed and, doubtless, will be created.

A more formal answer to both questions will be found in a new standard – ISO 25964, currently being prepared on the basis of BS 8723. The two fundamental features of these two standards are (1) the thesaurus as a theoretical and practical basis for the construction of structured vocabularies for information retrieval and (2) the growing and vital need for interoperability between systems and the intelligent mapping of the vocabularies used by those systems.

4. Stella, just recently at ESWC 2010, Sean Bechhofer was asked during his keynote why there are so few SKOS tools on the market. What do you think are the reasons for this? Are there still shortcomings of the SKOS specification compared to other existing thesaurus standards? (see also: http://www.eswc2010.org/program-menu/keynote-speakers/155-sean-bechhofer & http://www.slideshare.net/seanb/skos-past-present-and-future )

Regarding the speed of development, see my reply above. As to shortcomings, did you note in one of Bechhofer’s slides: “Standardisation is necessarily a compromise: Everyone equally unhappy = success!” The SKOS development team took a conscious decision to keep the schema sufficiently simple that it could be applicable to as many different types of KOS as possible.  On the downside, this means SKOS is unsatisfactory for conveying sophisticated features of some thesauri and classification schemes. But by keeping the entry barrier low, more widespread use has been encouraged.

By way of illustration, compare SKOS with the data model and XML schema of BS 8723. This schema is comparatively specialized, with the aim of enabling exchange of any thesaurus carrying any or all of the features recommended in the standard. And incidentally, this data model and schema will have some further capabilities added when published in the forthcoming standard ISO 25964. SKOS does not provide for a number of features in these standards (such as compound equivalence). But the schemas in BS 8723 and ISO 25964 are designed for thesaurus developers to share their work, rather than for easy publication on the Web, and will never have so many users or associated tools as SKOS.

So I believe that SKOS has done well to accept compromises that encourage generalisation although they might not suit some specialists. That said, I do regret one of its weaknesses in the context of mapping. Compound equivalence mappings (that is to say, where Concept A in one vocabulary maps to a combination of Concepts B and C in another) are very commonly needed when extending a search across multiple databases, and the SKOS mapping properties do not currently allow for them. Perhaps there will be some provision in future?
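To make the gap concrete, here is a minimal Python sketch of what a compound equivalence mapping would have to record. It is only an illustration: the `CompoundMapping` class, the `expand_query` helper and the `vocabA:`/`vocabB:` concept identifiers are all hypothetical and are not part of SKOS, BS 8723 or ISO 25964. SKOS mapping properties such as skos:exactMatch link one concept to exactly one other concept, so the "A maps to B and C together" case needs an extra structure like this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundMapping:
    """Hypothetical record: one source concept maps to a combination of targets."""
    source: str
    targets: frozenset  # all targets must apply together (an AND combination)


def expand_query(concept, compound_mappings):
    """Return the target concepts that a search on `concept` should combine."""
    for mapping in compound_mappings:
        if mapping.source == concept:
            return sorted(mapping.targets)
    return [concept]  # no mapping known: search for the concept unchanged


# Illustrative data only; these concept identifiers are made up.
mappings = [
    CompoundMapping("vocabA:WomenEngineers",
                    frozenset({"vocabB:Women", "vocabB:Engineers"})),
]

print(expand_query("vocabA:WomenEngineers", mappings))
```

A federated search could then combine the returned target concepts with AND when querying the second database.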

5. Stella, Alan, in September ISKO UK will organise an event on “The Future of Knowledge Organisation on the Web”. “Linked Data” seems to be a promising approach to organise knowledge in large scale environments.
Could you imagine that SKOS as a small subset of semantic web specifications will play a central role in this environment since it is quite intuitively comprehensible by virtually any knowledge worker or do you rather think SKOS is too simple (or too complex)? (see also: http://poolparty.punkt.at/using-skos-as-an-interface-to-the-linked-data-cloud )

Stella: Of course SKOS will have a central role (whether or not every knowledge worker finds it as intuitive as you suppose). “Linked Data” will find even wider applicability. ISKO-UK (the organiser of the meeting in London on 14 September) has a mission not just to spread the word about both these technologies, but to build bridges between the several communities who must share their expertise and data to build more exciting applications. We’re expecting an audience of over 100 at this low-cost event.

Alan: Yes, of course, just as all the tools in the web stack will be necessary if semantic web technologies are to be effective. But it is obvious that we are dealing with complexities of a higher order than ever before. Any structured vocabulary is an “artificial language” which, while acknowledging many aspects of theoretical linguistics, is forced to be pragmatic in its construction. Consequently, it would not be surprising if SKOS is seen to be “catching up”, and this became apparent in the work of BS 8723 when thesaurus models using UML were being constructed. There remains much work to be done on all fronts.

Stella Dextre Clarke is an independent consultant specializing in the design and implementation of thesauri and other knowledge organization structures. She currently leads ISO NP 25964, the project to update and revise the international standards for thesauri. Previously she was the Convenor of the Working Group which developed BS 8723. In 2006 she won the Tony Kent Strix Award for outstanding achievement in information retrieval, in recognition for her development work on IPSV (Integrated Public Sector Vocabulary), as well as on the vocabulary standards. She is a Fellow of the Chartered Institute of Library and Information Professionals.

Alan Gilchrist has been a consultant for many years in the fields of information management and information architecture, specialising in the vocabulary aspects of information retrieval. He is co-author, with Jean Aitchison and David Bawden, of Thesaurus Construction and Use, now in its fourth edition. In 1979 he founded and edited the Journal of Information Science, and is now Editor Emeritus. He has an Honorary Degree (D. Litt.) from the University of Brighton and is an Honorary Fellow of the Chartered Institute of Library and Information Professionals.


George Anadiotis: “Linked Data brings value by offering an alternative approach to lightweight data integration and mashups.”

December 10, 2009 By: Tassilo Pellegrini Category: Linked Data & Open Data, Mashups & Web services, Semantic Web Applications, Software Development, Tools & Software, Vocabularies & Languages No Comments →

George Anadiotis is an expert on artificial intelligence with academic roots at the Vrije Universiteit, Amsterdam. In February 2009 he took the position of R&D Director at the Greek technology company IMC. I met him in September at I-SEMANTICS 2009, where he and his team contributed to the Triplification Challenge. In their paper Linked Data for the Masses they pondered the pragmatic value of Linked Data from an inbound and outbound perspective. In his words:

We started experimenting with the technical infrastructure needed and created some proof-of-concept applications. Part of this work was enabling Linked Data access for the front-end infrastructure we used, Liferay portal. We decided on the appropriate vocabularies for the type of content we wanted to publish (FOAF, SIOC and MOAT mainly), delved into the internals of Liferay and used D2R to map its relational database to the vocabularies of choice, also using techniques to improve performance as much as possible. Since Liferay itself is also based on the notion of communities, we thought our work would be more widely applicable and useful, so we chose to submit it for review at the Triplification Challenge and make it available to the community as open source software. Our applications have gradually matured and are about to be deployed in our commercial projects, while at the same time we are now making the Liferay Linked Data Module available as a Sourceforge project and we are working with Liferay management in order to disseminate this effort to the community and also include it in a future release of the software.
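For readers who have not seen relational-to-RDF mapping before, the basic idea can be sketched in a few lines of Python. This is not D2R's mapping language and not Liferay's real schema: the `users` table, its columns, the `rows_to_triples` helper and the `http://example.org/` base URI are assumptions made purely for illustration. Only the FOAF terms (foaf:Person, foaf:name, foaf:mbox) are real vocabulary.

```python
import sqlite3

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical stand-in for a portal's user table (not Liferay's real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Alice Example', 'alice@example.org')")


def rows_to_triples(connection, base_uri):
    """Map each row to FOAF triples; the column-to-property choices are assumptions."""
    triples = []
    query = "SELECT id, name, email FROM users ORDER BY id"
    for user_id, name, email in connection.execute(query):
        subject = f"{base_uri}user/{user_id}"  # mint one URI per row
        triples.append((subject, RDF_TYPE, FOAF + "Person"))
        triples.append((subject, FOAF + "name", name))
        triples.append((subject, FOAF + "mbox", f"mailto:{email}"))
    return triples


triples = rows_to_triples(conn, "http://example.org/")
for triple in triples:
    print(triple)
```

A mapping tool does the same job declaratively and on demand, so the relational data never has to be copied into a separate store.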

Read the full interview here.


Multimedia Semantics @ SAMT 2009

October 07, 2009 By: Tassilo Pellegrini Category: Conferences & Events, Linked Data & Open Data, Vocabularies & Languages No Comments →

On the occasion of the upcoming 4th International Conference on Semantic and Digital Media Technologies (SAMT ‘09) from December 2 – 4, 2009 in Graz/Austria, Werner Bailer from Joanneum Research gave us a short interview about the state of the art in multimedia semantics. When asked about multimedia and the Semantic Web he says:

There have been a number of proposals for multimedia ontologies and mappings of multimedia vocabularies (cf. the excellent report from the W3C MM Semantics XG), differing in complexity and expressivity. Thus the W3C has chartered a working group to develop an ontology and API for multimedia content on the Web. The group is developing a lightweight core set of metadata properties and an API specification for accessing these properties, which may come from metadata documents in different standards. Thus mappings to many relevant standards have also been specified. The set of metadata properties will be formalized for interoperability with the Semantic Web. A W3C recommendation is expected in 2010.
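The idea of a small core property set with mappings from several metadata standards can be sketched roughly as follows. Everything named here is an assumption for illustration: the `MAPPINGS` table, the `get_core_properties` function and the core property names `title` and `creator` are not taken from the W3C working group's documents. The source keys are ordinary Dublin Core elements and ID3v2 frame identifiers.

```python
# Hypothetical mapping tables: source-format key -> core property name.
MAPPINGS = {
    "dublin_core": {"dc:title": "title", "dc:creator": "creator"},
    "id3": {"TIT2": "title", "TPE1": "creator"},
}


def get_core_properties(metadata, source_format):
    """Return the metadata re-keyed to the shared core property names.

    Keys without a mapping (here the ID3 genre frame TCON) are dropped.
    """
    mapping = MAPPINGS[source_format]
    return {mapping[key]: value for key, value in metadata.items() if key in mapping}


video = get_core_properties(
    {"dc:title": "Graz by night", "dc:creator": "J. Doe"}, "dublin_core")
song = get_core_properties(
    {"TIT2": "Graz by night", "TPE1": "J. Doe", "TCON": "Ambient"}, "id3")

# Both descriptions now use the same property names and can be compared directly.
print(video == song)
```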

Read the full interview here.


New W3C Rule Interchange Format (W3C RIF) standard published

July 28, 2009 By: Tassilo Pellegrini Category: Corporate Semantic Web, Vocabularies & Languages No Comments →

The W3C Working Group working on the W3C Rule Interchange Format (RIF) has recently launched a new standard for the interchange of rules. Some guys from the Corporate Semantic Web Working Group of Freie Universität Berlin have been heavily involved. An interview on the practical aspects of RIF will follow in August.


Last Call for OWL 2

April 21, 2009 By: Pascal Hitzler Category: Vocabularies & Languages No Comments →

OWL 2 is in the final stages of becoming a W3C recommendation – as announced today. This means that revision 2 of the Web Ontology Language should basically be stable now; only final fixes are expected. The OWL 2 Document Overview is the general entry point.

Pascal Hitzler


Google and the Semantic Web: About Quad Stores and URIs

March 20, 2009 By: Andreas Blumauer Category: Internet & Media, Search Engines, Vocabularies & Languages 6 Comments →

Just recently Google launched another interesting service called “In Quotes”. It delivers quotes from stories linked to from Google News and users can compare opinions of e.g. politicians in a very comfortable way.

If a closer look is taken at the system, one can see that any person whose quotes are listed has got a URI: Barack Obama has got the uniform “qsid” tPjE5CDNzMicmM.

It seems like “qsid” stands for “Quad Store ID” which would perfectly support such a URI based system.

Is Google slowly moving towards the Semantic Web?


Information Extraction in KiWi

November 28, 2008 By: Jana Herwig Category: Vocabularies & Languages No Comments →

The KiWi meeting is drawing to an end. Marek Schmidt and Pavel Smrz from Brno University of Technology have just given a really exciting presentation of their results in the area of information extraction – and it seems I have developed a case of tendonitis (a.k.a. “mouse hand”) and for the sake of my health will stop blogging for today. Instead of the usual comprehensive coverage, this photo must suffice as proof of the magic Marek and Pavel’s system is already able to do – please marvel at the complex tags that are the product of their information extraction (IE) module. The roles of IE as an enabling technology within KiWi will be in: automatic recognition of (new) terms, entity recognition, text classification and relation extraction.

Information extraction

KiWi team! In particular Klara Weiand, who is about to start her presentation on Tags and Queries, please accept my apology! Thank you, good bye, and have a safe trip home!


GoodRelations webcast & spreading the word about the Semantic Web

November 26, 2008 By: Jana Herwig Category: Literature & Publications, Ontology Engineering, Vocabularies & Languages 1 Comment →

You have probably already heard about GoodRelations, “the web ontology for e-commerce”. Martin Hepp from Bundeswehr University in Munich recently created a webcast, giving a short introduction to semantic web-based E-Commerce and to the GoodRelations vocabulary – I want to see more of such introductions which aim at a wider audience in terms of style and intellectual accessibility!

Last week I had an encounter with a social scientist (within an academic setting) who argued that discussing the Semantic Web would not make sense for him (as a social scientist), because of the present lack of social practices in that field… (*jaw-dropping*) I could not persuade him with the argument that the Linked Data cloud itself was the result of a social practice – the view he had of the Semantic Web (which I assume was not an uneducated one) even led him to deny that developments like DBpedia, Twine, Revyu, or the use of metadata in general had anything to do with the Semantic Web.

And this is a big challenge.

On the one hand, it is a good thing that there are social scientists who at least have a certain notion of the Semantic Web – on the other, it seems as if all the exciting ideas and developments that have taken place in the last few years have failed to reach those who were sensitized to the SemWeb project when the idea was first conceived. I do not mean to make a statement about social scientists here, but rather about the need to communicate what has since happened to the original idea also outside of one’s own community.

Btw: In its current issue, quarterly (German-language) magazine t3n is featuring a Web 3.0 and Applied Semantic Web topic as its opener. And that is a good sign, too!


Looking back at a successful VoCamp Oxford

September 27, 2008 By: Matthias Samwald Category: Conferences & Events, Life Sciences, Vocabularies & Languages 3 Comments →

Thinking... about new vocabularies for the Semantic Web

(by Matthias Samwald)

The first VoCamp ever was successfully completed this week at Oxford University. Tom Heath (Talis) and Jun Zhao (Oxford University) led us through two days devoted to creating new vocabularies, schemas and ontologies. The first day was mainly spent on finding common interests, getting to know each other, and identifying the vocabularies that needed to be created. The second day was spent on creating the vocabularies, first on paper, then on the computer.

Fabien Gandon introduced me to his interesting work around corporate ontologies, which I will explore in further detail for the KiWi project. We also made significant progress on developing a basic, common ontology for the representation of agreement, disagreement and discourse, based on SIOC, SCOT, FOAF and the bibliographic ontology. Such an ontology can be of great utility in many knowledge domains, such as biomedical research or the representation of bug/issue reports in software development knowledge management (something that needs to be addressed for adapting the KiWi system for a use-case at Sun Microsystems). I will elaborate on these developments in separate blog posts next week.

In a very short timespan, the participants of the VoCamp created several new vocabularies, such as:

IRC Vocabulary

Participation Ontologies

UDO (Unified Discourse Ontology)

VotePost

Evidence ontology

Whisky Ontology (yes, it’s an ontology about whiskey)

Journey Ontology

Data publishing, sharing, visualisation ontology

On the second day of the VoCamp I also held a short session called “Do OpenCyc and UMBEL know it?”, where I asked for classes and properties that others wanted to create in their developing vocabularies. It turned out that OpenCyc and UMBEL had a relatively good coverage of the terms that others were creating, including whiskey, evidence, or the relation that one person is the boss of another person (relevant for the corporate Semantic Web). I tried to emphasize that linking and re-using such existing entities was vital for the success of new vocabularies, and of the Semantic Web as a whole. Others objected that re-using such existing resources might not be possible given the often very specific requirements and short time-frames of some projects. Still, I think that linking to existing, large resources on the Semantic Web should have a very high priority when developing new vocabularies.

If you are interested in getting new vocabularies out, but did not get the chance to attend VoCamp Oxford: don’t worry, there are many VoCamps planned for the near future. The next VoCamp will already happen in November, and will be located at DERI Galway. The Semantic Web Company will host a VoCamp in Vienna next year.


Packing my bags for VoCamp Oxford

September 22, 2008 By: Matthias Samwald Category: Conferences & Events, Life Sciences, Vocabularies & Languages 3 Comments →

(by Matthias Samwald)

I am packing my bags once again: The first VoCamp (hosted at Oxford University, UK) is about to start this week. So, what is a VoCamp supposed to be? The official definition reads like this: “A VoCamp is a series (hopefully) of informal events where people can spend some dedicated time creating lightweight vocabularies/ontologies for the Semantic Web/Web of Data. The emphasis of the event(s) is not on creating the perfect ontology in a particular domain, but on creating vocabs that are good enough for people to start using for publishing data on the Web.”

I always thought that the lack of widely established vocabularies/ontologies has been very damaging to the development of the Semantic Web. The VoCamp initiative could help change this situation for the better, so I really hope that this is the start of a long series of events.

My topics of main interest are: 1) Associative Tags; 2) Agreement, Disagreement, discourse; 3) Corporate Semantic Web; 4) “Are upper level ontologies/vocabularies not so bad after all?”; 5) “Cleaner schemas and ontologies”. These interests are motivated partly by use-cases from the “KiWi – Knowledge in a Wiki” EU project, and partly by developments in the area of biomedical research at DERI Galway and the W3C Interest Group for Health Care and Life Science. Details below.

__Associative Tags__

Tagging is one of the key components of the ‘Web 2.0’, and Semantic Web technologies will help to make tagging even more powerful. Schemas such as SCOT or MOAT have already been established, and make it possible to ‘tag’ not only with simple strings, but with entities. These entities (such as concepts described in SKOS) can be associated with clear semantics and can be further described with RDF statements, to describe hierarchies of entities, or to link entities to rich data sources such as DBpedia. This enables sophisticated data-integration and cross-data source queries that would not have been possible with simple, string-based tags.
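A small Python sketch may show the difference between tagging with strings and tagging with entities. The documents, the concept URIs under `http://example.org/concept/` and the `docs_tagged_under` helper are all made up for illustration; this is not the SCOT or MOAT data model, just the underlying idea.

```python
# With plain strings, both documents carry the same tag.
# Animal or car? The string alone cannot say.
string_tags = {"doc1": {"jaguar"}, "doc2": {"jaguar"}}

# With entities, each tag points at a concept (hypothetical URIs).
entity_tags = {
    "doc1": {"http://example.org/concept/Jaguar_animal"},
    "doc2": {"http://example.org/concept/Jaguar_car"},
}

# Entities can carry further statements, e.g. a label and a broader concept.
concepts = {
    "http://example.org/concept/Jaguar_animal": {
        "label": "jaguar", "broader": "http://example.org/concept/Cat"},
    "http://example.org/concept/Jaguar_car": {
        "label": "jaguar", "broader": "http://example.org/concept/Car"},
}


def docs_tagged_under(broader_concept):
    """Find documents tagged with a concept whose broader concept matches."""
    return sorted(
        doc for doc, tags in entity_tags.items()
        if any(concepts[tag]["broader"] == broader_concept for tag in tags)
    )


print(docs_tagged_under("http://example.org/concept/Cat"))
```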

On the other hand, Semantic Web developers can learn from the simplicity that has made tagging so successful. Creating useful tags is very simple, and good user interfaces can further improve the simplicity of creating useful tags with features such as autocompletion and tag recommendation. This simplicity should serve as a role model for many Semantic Web applications.

Specifically, I am interested in what I call ‘associative tags’, bundles of tags/entities/concepts that can be used for the simple representation of facts. The primary intention of creating aTags is not the categorization of the document, but the representation of the key facts inside the document. Key facts in the biomedical domain might be, for example,

“Protein A interacts with protein B” (which can be represented with an aTag consisting of the three entities “Protein A”, “Molecular interaction” and “Protein B”) or

“Overexpression of protein A in tissue B is the cause of disease C” (an aTag consisting of the four entities “Overexpression”, “Protein A”, “Tissue B” and “Disease C”).

Once the aTags from these different sources are aggregated, it is possible to pose a query such as “show me molecules that are associated with molecules that are associated with disease C”, yielding “protein A” as an answer. Hierarchies (in the form of rdfs:subClassOf and skos:narrower) can be used to expand queries based on background knowledge (e.g., that “disease D” is a subclass of “disease C”).
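As a rough sketch of how such a query could work, the Python snippet below treats each aTag as a plain set of entities and expands the query with a small subclass table. It simplifies the query above to a single hop ("molecules associated with disease C"), and it tags the second fact with “Disease D” so that the hierarchy expansion has something to do. The data, the `molecules` set and both helper functions are illustrative assumptions, not an existing aTag implementation.

```python
# Each aTag is a bundle (set) of entities; all names are illustrative.
atags = [
    frozenset({"Protein A", "Molecular interaction", "Protein B"}),
    frozenset({"Overexpression", "Protein A", "Tissue B", "Disease D"}),
]

# Background knowledge: child -> parent, as in rdfs:subClassOf (assumed acyclic).
subclass_of = {"Disease D": "Disease C"}
molecules = {"Protein A", "Protein B"}


def with_ancestors(entity):
    """Return the entity plus everything it is a (transitive) subclass of."""
    result = {entity}
    while entity in subclass_of:
        entity = subclass_of[entity]
        result.add(entity)
    return result


def associated_with(target):
    """Entities that share an aTag with `target` or with one of its subclasses."""
    found = set()
    for tag in atags:
        if any(target in with_ancestors(entity) for entity in tag):
            found |= {e for e in tag if target not in with_ancestors(e)}
    return found


# Molecules associated with disease C, found via its subclass disease D.
print(sorted(associated_with("Disease C") & molecules))
```

The two-hop query from the text is the same operation applied twice: take the result of the first call and ask what is associated with each of its members.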

In many cases (especially with some ontologies in the biomedical domain), creating such associative tags can be much simpler than the creation of ‘real’ statements, i.e., relations between individuals and property restrictions of classes.

__Agreement, Disagreement, discourse__

Many people in the Semantic Web community are interested in the representation of argumentation structures on the web. For example: stating that one snippet of text contains statements that are in disagreement with another snippet of text, which is in agreement with yet another snippet of text. This can be of use for many knowledge domains, such as news articles, biomedical publications or reports submitted to a software bug tracker. Of special interest in this context are extensions of established schemas, especially SIOC. There is also another ontology called SWAN that is specifically tailored to the biomedical domain, and efforts to align SWAN with SIOC have started recently.
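The example in this paragraph can be written down as three snippets and two relations. The relation names `agreesWith` and `disagreesWith` and the `related` helper are hypothetical; they are not taken from SIOC or SWAN and only show what kind of statements such an extension would need to express.

```python
# Hypothetical relation names; they are not taken from SIOC or SWAN.
statements = [
    ("snippet1", "disagreesWith", "snippet2"),
    ("snippet2", "agreesWith", "snippet3"),
]


def related(snippet, relation):
    """Snippets linked to `snippet` by `relation`, read in either direction.

    Treating both relations as symmetric is an assumption of this sketch.
    """
    linked = set()
    for subject, predicate, obj in statements:
        if predicate != relation:
            continue
        if subject == snippet:
            linked.add(obj)
        elif obj == snippet:
            linked.add(subject)
    return sorted(linked)


print(related("snippet2", "disagreesWith"))
print(related("snippet2", "agreesWith"))
```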

__Corporate Semantic Web__

As Semantic Web technologies are finally getting mature enough to allow industrial uptake, it is becoming clear that ontologies for describing organization structures and business processes are still lacking maturity. FOAF allows us to represent basic information about persons, organizations and their relationships, but lacks vocabulary for stating that one person is the boss of another person, that a project consists of several subtasks, et cetera. While there are some small projects that try to create such schemas/ontologies, a solution of widespread acceptance does not seem to be in sight at the moment.

__Are upper level ontologies/vocabularies not so bad after all?__

FOAF seemingly tried it a long time ago – foaf:Person is a subclass of “http://xmlns.com/wordnet/1.6/Person”, foaf:Document of “http://xmlns.com/wordnet/1.6/Document”, and so on. Linking to external schemas/ontologies (or making use of their classes and properties directly) can definitely help in facilitating semantic interoperability. For a long time, many web developers were very skeptical about such ‘top-down’ approaches to data integration, but recently the recognition of the potential value of such resources seems to be increasing. In parallel, the last one to two years brought us some very large upper ontologies that are available as linked data, such as:

  • Wordnet 2.0, hosted by the W3C
  • Yago/DBpedia
  • OpenCyc (now with new URIs)
  • UMBEL (derived from OpenCyc and others).

I think the practice of re-using and linking to such upper ontologies should become popular (again). It helps in creating a highly interlinked Semantic Web, and helps to avoid re-inventing the wheel for each new schema/ontology. This linking should not be done post-hoc, but should be a central part of the early stages of vocabulary/ontology/data creation.
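Why this linking pays off can be shown with a toy example in Python. The foaf:Person link to the WordNet class is the one mentioned above; `ex:Employee`, `ex:alice` and the two helper functions are hypothetical. Once a local class is linked upwards, data described with it can be found by anyone who only knows the upper-level class.

```python
RDFS_SUBCLASS = "rdfs:subClassOf"
WORDNET_PERSON = "http://xmlns.com/wordnet/1.6/Person"

triples = {
    ("foaf:Person", RDFS_SUBCLASS, WORDNET_PERSON),  # link mentioned in the text
    ("ex:Employee", RDFS_SUBCLASS, "foaf:Person"),   # hypothetical local class
    ("ex:alice", "rdf:type", "ex:Employee"),         # hypothetical instance
}


def superclasses(cls):
    """Collect all direct and indirect superclasses of `cls`."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == RDFS_SUBCLASS and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


def instances_of(cls):
    """Instances typed with `cls` itself or with any of its subclasses."""
    return sorted(
        subject for subject, predicate, obj in triples
        if predicate == "rdf:type" and (obj == cls or cls in superclasses(obj))
    )


print(instances_of(WORDNET_PERSON))
```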

__Cleaner schemas and ontologies__

Working with established ontologies and schemas in ontology editors can be a chore. Most have dependencies on other ontologies, but don’t use owl:imports. Most use an awkward mix of OWL statements and RDF(S), resulting in ontologies that are OWL Full. Many require some OWL reasoning to make use of sameAs statements and inverse properties, but at the same time reasoning is complicated because the ontologies are OWL Full or even contain logical inconsistencies. Often enough, there seems to be no practical reason for the design choices that caused the trouble: some minor changes can turn a messy OWL Full ontology into an OWL Lite or OWL DL ontology. To work around that problem, many different working groups have created local versions of schemas such as FOAF or Dublin Core that are valid OWL DL.

It doesn’t have to be this way.

Trying to adhere to OWL Lite/DL and adding owl:imports statements can help build cleaner, more modular and more sustainable ontologies, and does not require significant additional effort during the creation of ontologies. Maybe we can find a consensus that this would be a worthwhile goal, and develop plans towards reaching that goal.
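A check for undeclared dependencies is simple enough to sketch in Python. The snippet below lists the namespaces an ontology uses without importing them. It works on plain triples of URI strings and assumes that an imported ontology's URI is identical to its namespace, which is often but not always the case; the example ontology under `http://example.org/onto` and the function names are made up.

```python
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
BUILTIN = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
)


def namespace(uri):
    """Split off the namespace part of a URI (up to the last '#' or '/')."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def missing_imports(base_ns, triples):
    """Namespaces used in the ontology that are neither built in nor imported."""
    imported = {obj for _, predicate, obj in triples if predicate == OWL_IMPORTS}
    used = {
        namespace(term)
        for triple in triples if triple[1] != OWL_IMPORTS
        for term in triple if term.startswith("http")
    }
    return sorted(ns for ns in used
                  if ns != base_ns and ns not in BUILTIN and ns not in imported)


# Hypothetical ontology: imports FOAF, but also uses Dublin Core terms.
onto = [
    ("http://example.org/onto#Employee", RDFS_SUBCLASS,
     "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/onto#Employee", "http://purl.org/dc/terms/creator",
     "http://example.org/onto#me"),
    ("http://example.org/onto", OWL_IMPORTS, "http://xmlns.com/foaf/0.1/"),
]

print(missing_imports("http://example.org/onto#", onto))
```

Running a check like this before publishing would flag the missing Dublin Core import in the example.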
