Combining Closed and Open Data Classification Mechanisms in an Extended Thesaurus
In the next session, Rolf Sint gave us insights into his approach to combining closed and open data classification mechanisms, which builds on the findings of his master’s thesis. Probably the most widely used retrieval method for digital content is full-text search; Google’s and Yahoo’s indexing methods, for instance, rely on it. For this method to work, the search words must actually occur in the content, which leads to obvious problems with synonyms, ambiguities or the differing vocabularies of different languages. Its advantages are that full-text search is easy to use and that no maintenance is required, as this responsibility rests with the content providers.
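To make the synonym problem concrete, here is a minimal sketch of a naive word-match search. The two documents and the function name `full_text_search` are invented for illustration; real search engines are far more sophisticated, but the basic limitation is the same.

```python
# Toy documents; the texts are made up for this example.
documents = {
    "doc1": "How to repair a car engine",
    "doc2": "Automobile maintenance for beginners",
}


def full_text_search(query, docs):
    """Return the ids of documents whose text contains the query word."""
    word = query.lower()
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if word in text.lower().split()
    )


hits_car = full_text_search("car", documents)
hits_automobile = full_text_search("automobile", documents)

print(hits_car)         # doc2 is missed: it says "automobile", not "car"
print(hits_automobile)  # doc1 is missed for the same reason
```

Both documents are about the same topic, yet each query finds only the one that happens to use the same word.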
At the other end of the spectrum, among open data classification mechanisms, we have social tagging. Tagging (in general) means that a user assigns labels to content items. The advantage here is that content is immediately classified; as such, tagging is an easy way to provide metadata for content, in particular as the user does not have to think about (arbitrary, system-dictated) structures. However, this leads to problems if singulars and plurals are used side by side, if synonyms are used, if spelling mistakes occur, and so on. With tags, exactly the same spelling has to be used if items are to be assigned to the same group. But if done collectively (and that is what social tagging is about), the wisdom of crowds can improve the signal-to-noise ratio significantly – see the miracle of the tag cloud.
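The exact-spelling problem is easy to reproduce. The following sketch groups items by their raw tag string; the tag assignments are hypothetical and only serve to illustrate the point.

```python
from collections import defaultdict

# Hypothetical (item, tag) pairs as different users might enter them.
assignments = [
    ("photo1", "cat"),
    ("photo2", "cats"),    # plural
    ("photo3", "kitten"),  # synonym (loosely speaking)
    ("photo4", "cta"),     # spelling mistake
]


def group_by_tag(pairs):
    """Group items under the exact tag string they were given."""
    groups = defaultdict(list)
    for item, tag in pairs:
        groups[tag].append(item)
    return dict(groups)


groups = group_by_tag(assignments)
for tag, items in groups.items():
    print(tag, items)
```

All four photos show the same kind of thing, but because grouping relies on identical strings, they end up in four separate groups of one item each.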
What Rolf proposed in his thesis was to combine the two approaches. In his design, he used an extended thesaurus as an instrument to achieve vocabulary control. The thesaurus is "extended" because it is not simply built around a taxonomy, but expanded by tags that were assigned by users and integrated using a vocabulary management tool.
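As a rough illustration of the idea, and not a reconstruction of Rolf's actual design, the sketch below models a controlled vocabulary of preferred terms to which user tags can be attached as variants. The class name `ExtendedThesaurus`, its methods, and the sample terms are all assumptions made for this example; the `add_variant` call stands in for whatever the vocabulary management tool does when a tag is integrated.

```python
class ExtendedThesaurus:
    """A controlled vocabulary that also knows user-supplied tag variants."""

    def __init__(self, preferred_terms):
        # The closed part: terms taken from the taxonomy.
        self.preferred = {term.lower() for term in preferred_terms}
        # Every preferred term maps to itself.
        self.variants = {term: term for term in self.preferred}

    def add_variant(self, tag, preferred):
        """Integrate a user tag by linking it to an existing preferred term."""
        preferred = preferred.lower()
        if preferred not in self.preferred:
            raise ValueError(f"unknown preferred term: {preferred!r}")
        self.variants[tag.lower()] = preferred

    def normalize(self, tag):
        """Return the preferred term for a tag, or None if it is unknown."""
        return self.variants.get(tag.lower())


thesaurus = ExtendedThesaurus(["cat", "dog"])
thesaurus.add_variant("cats", "cat")    # plural
thesaurus.add_variant("kitten", "cat")  # user-supplied synonym
thesaurus.add_variant("cta", "cat")     # common misspelling

normalized = [thesaurus.normalize(t) for t in ["cat", "cats", "kitten", "cta"]]
print(normalized)
print(thesaurus.normalize("hamster"))  # not integrated yet
```

With such a mapping in place, differently spelled tags resolve to one term and can be grouped together, while unknown tags remain visible as candidates for integration.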

This extended thesaurus can be applied in multiple ways.

