Thoughts on KOS (Part 2): Classifying Knowledge Organisation Systems

Traditional KOSs include a broad range of system types from term lists to classification systems and thesauri. These organization systems vary in functional purpose and semantic expressivity. Most of these traditional KOSs were developed in a print and library environment. They have been used to control the vocabulary used when indexing and searching a specific product, such as a bibliographic database, or when organizing a physical collection such as a library (Hodge et al. 2000).

Stella Dextre Clarke & Alan Gilchrist about the “Future of Knowledge Organization on the Web”

Semantic Web Company (SWC) had the pleasure and the opportunity to talk with two internationally recognised experts in the fields of information management and knowledge organization: Alan Gilchrist and Stella Dextre Clarke. SWC asked some questions about the “Future of Knowledge Organization on the Web & Linked Data” on the occasion of an event of the same name organised by ISKO UK, which will take place on September 14, 2010 in London.

1. Alan, you are one of the leading experts in the field of thesaurus construction. Organising knowledge in a (worldwide) Semantic Web is a rather young discipline compared to your domain. What do you think the Semantic Web community can learn from “traditional” thesaurus management, and vice versa?

You put inverted commas round the word traditional, but it might be more appropriate to put them round the word thesaurus! So long as words are used in information retrieval and in information sharing, different forms of structured vocabularies will be required, and many of the fundamental principles of thesaurus construction are still valid for their construction. Of course, the “traditional” thesaurus has mutated since the days when it was used only for controlled indexing and retrieval; and now, with the many enrichments possible it can be viewed as an ontology (in one of the definitions of this word). What remains a difficulty is to create a generalisable typology of associative relationships, though this is, of course, possible in relatively closed systems. In short, structured vocabularies with broadly thesaurus formats will be a necessary component in the web stack.

2. Stella, as a consultant you specialize in the design and implementation of knowledge structures for information retrieval applications. In the last few months we have seen that SKOS can serve as a significant building block to link “traditional” thesaurus management to knowledge structures from the semantic web. Do you see this development as market-driven? Is there significant growth in demand for solutions built around SKOS?

This question sounds surprisingly sceptical about the growth of SKOS. I guess the dizzying speed of phenomena like Facebook and Twitter has fuelled expectations of tools springing up overnight like mushrooms, fully formed and ready to eat. But actually it takes time, not just for the tools to be fashioned, but for the potential market to develop an understanding of what they can do and what will happen next when they are used.

Applications for SKOS are springing up all the time, as fast as people can grow the skills and vision to deploy them. At the moment the market, or shall we say the power-base, seems to be with the academic sector and allied not-for-profit organisations. This will spread progressively through the public to the private sector, as enterprises find ways of adapting their business models. The main hurdles to overcome could be intellectual property rights and the need for compilers of databases to keep earning their living.

3. Alan, constructing thesauri for the semantic web also means that one has to make the “open world assumption”. In what sense does this change the way thesauri are managed, kept growing and quality-assured? Can you see new, upcoming methodologies for doing that?

Everything changes with the “open world assumption”! Following on from my answer to the previous question, it seems clear that one manifestation of the thesaurus will be found in those systems that support interoperability, such as federated searching or metadata registries. Even with simple thesaurus management software, it is possible to construct a “master vocabulary” or “word bank” to support different applications within an enterprise; thereby promoting interoperability. More sophisticated software is already available (though not very widely); more will be needed and, doubtless, will be created.

A more formal answer to both questions will be found in a new standard – ISO 25964, currently being prepared on the basis of BS 8723. The two fundamental features of these two standards are (1) the thesaurus as a theoretical and practical basis for the construction of structured vocabularies for information retrieval and (2) the growing and vital need for interoperability between systems and the intelligent mapping of the vocabularies used by those systems.

4. Stella, just recently at ESWC 2010, Sean Bechhofer was asked during his keynote why there are so few SKOS tools on the market. What do you think are the reasons for this? Are there still shortcomings of the SKOS specification compared to other existing thesaurus standards? (see also: http://www.eswc2010.org/program-menu/keynote-speakers/155-sean-bechhofer & http://www.slideshare.net/seanb/skos-past-present-and-future )

Regarding the speed of development, see my reply above. As to shortcomings, did you note in one of Bechhofer’s slides: “Standardisation is necessarily a compromise: Everyone equally unhappy = success!” The SKOS development team took a conscious decision to keep the schema sufficiently simple that it could be applicable to as many different types of KOS as possible. On the downside, this means SKOS is unsatisfactory for conveying sophisticated features of some thesauri and classification schemes. But by keeping the entry barrier low, more widespread use has been encouraged.

By way of illustration, compare SKOS with the data model and XML schema of BS 8723. This schema is comparatively specialized, with the aim of enabling exchange of any thesaurus carrying any or all of the features recommended in the standard. And incidentally, this data model and schema will have some further capabilities added when published in the forthcoming standard ISO 25964. SKOS does not provide for a number of features in these standards (such as compound equivalence). But the schemas in BS 8723 and ISO 25964 are designed for thesaurus developers to share their work, rather than for easy publication on the Web, and will never have so many users or associated tools as SKOS.

So I believe that SKOS has done well to accept compromises that encourage generalisation although they might not suit some specialists. That said, I do regret one of its weaknesses in the context of mapping. Compound equivalence mappings (that is to say, where Concept A in one vocabulary maps to a combination of Concepts B and C in another) are very commonly needed when extending a search across multiple databases, and the SKOS mapping properties do not currently allow for them. Perhaps there will be some provision in future?
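As a rough illustration of the compound equivalence case described above, the following Python sketch contrasts a one-to-one mapping, which the SKOS mapping properties can express, with a one-to-many mapping, which they cannot. The `CompoundMapping` class, the `translate` function and the `a:`/`b:` concept identifiers are hypothetical names invented for this example; they are not part of SKOS, BS 8723 or ISO 25964.

```python
from dataclasses import dataclass

# Names of the SKOS mapping properties. Each links exactly one concept
# to exactly one other concept.
SKOS_MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


@dataclass(frozen=True)
class SimpleMapping:
    """One-to-one mapping: the shape a SKOS mapping property can express."""
    source: str
    prop: str
    target: str

    def __post_init__(self):
        if self.prop not in SKOS_MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {self.prop}")


@dataclass(frozen=True)
class CompoundMapping:
    """Hypothetical one-to-many mapping (an assumption, not part of SKOS):
    the source concept is equivalent to the combination of all targets."""
    source: str
    targets: frozenset


def translate(concept, simple, compound):
    """Translate a concept into the target vocabulary.

    Returns a list of alternatives. Each alternative is a set of target
    concepts that must all apply together (a Boolean AND).
    """
    alternatives = []
    for m in simple:
        if m.source == concept and m.prop == "exactMatch":
            alternatives.append(frozenset([m.target]))
    for m in compound:
        if m.source == concept:
            alternatives.append(m.targets)
    return alternatives


# Illustrative data only; the concept identifiers are made up.
simple = [SimpleMapping("a:Bridges", "exactMatch", "b:Bridges")]
compound = [
    CompoundMapping("a:WomenEngineers", frozenset({"b:Women", "b:Engineers"})),
]

bridges = translate("a:Bridges", simple, compound)
women_engineers = translate("a:WomenEngineers", simple, compound)
```

Here `a:Bridges` can be carried across with a single `exactMatch`, whereas `a:WomenEngineers` only translates correctly when a search in the second vocabulary combines `b:Women` and `b:Engineers`. That second case is the one that needs a structure beyond the one-to-one mapping properties.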

5. Stella, Alan, in September ISKO UK will organise an event on “The Future of Knowledge Organisation on the Web”. “Linked Data” seems to be a promising approach to organise knowledge in large scale environments.
Could you imagine that SKOS, as a small subset of semantic web specifications, will play a central role in this environment, since it is intuitively comprehensible to virtually any knowledge worker? Or do you think SKOS is too simple (or too complex)? (see also: http://poolparty.punkt.at/using-skos-as-an-interface-to-the-linked-data-cloud )

Stella: Of course SKOS will have a central role (whether or not every knowledge worker finds it as intuitive as you suppose). “Linked Data” will find even wider applicability. ISKO UK (the organiser of the meeting in London on 14 September) has a mission not just to spread the word about both these technologies, but to build bridges between the several communities who must share their expertise and data to build more exciting applications. We’re expecting an audience of over 100 at this low-cost event.

Alan: Yes, of course, just as all the tools in the web stack will be necessary if semantic web technologies are to be effective. But it is obvious that we are dealing with complexities of a higher order than ever before. Any structured vocabulary is an “artificial language” which, while acknowledging many aspects of theoretical linguistics, is forced to be pragmatic in its construction. Consequently, it would not be surprising if SKOS is seen to be “catching up”, and this became apparent in the work of BS 8723 when thesaurus models using UML were being constructed. There remains much work to be done on all fronts.

Stella Dextre Clarke is an independent consultant specializing in the design and implementation of thesauri and other knowledge organization structures. She currently leads ISO NP 25964, the project to update and revise the international standards for thesauri. Previously she was the Convenor of the Working Group which developed BS 8723. In 2006 she won the Tony Kent Strix Award for outstanding achievement in information retrieval, in recognition of her development work on IPSV (Integrated Public Sector Vocabulary), as well as on the vocabulary standards. She is a Fellow of the Chartered Institute of Library and Information Professionals.

Alan Gilchrist has been a consultant for many years in the fields of information management and information architecture, specialising in the vocabulary aspects of information retrieval. He is co-author, with Jean Aitchison and David Bawden, of Thesaurus Construction and Use, now in its fourth edition. In 1979 he founded and edited the Journal of Information Science, and is now Editor Emeritus. He has an Honorary Degree (D. Litt.) from the University of Brighton and is an Honorary Fellow of the Chartered Institute of Library and Information Professionals.