The Semantic Puzzle

Matthias Samwald

Packing my bags for VoCamp Oxford

(by Matthias Samwald)

I am packing my bags once again: The first VoCamp (hosted at Oxford University, UK) is about to start this week. So, what is a VoCamp supposed to be? The official definition reads like this: “A VoCamp is a series (hopefully) of informal events where people can spend some dedicated time creating lightweight vocabularies/ontologies for the Semantic Web/Web of Data. The emphasis of the event(s) is not on creating the perfect ontology in a particular domain, but on creating vocabs that are good enough for people to start using for publishing data on the Web.”

I have always thought that the lack of widely established vocabularies/ontologies has been very damaging to the development of the Semantic Web. The VoCamp initiative could help change this situation for the better, so I really hope that this is the start of a long series of events.

My main topics of interest are: 1) Associative Tags; 2) Agreement, Disagreement, discourse; 3) Corporate Semantic Web; 4) “Are upper level ontologies/vocabularies not so bad after all?”; 5) “Cleaner schemas and ontologies”. These interests are motivated partly by use-cases from the “KiWi – Knowledge in a Wiki” EU project, and partly by developments in the area of biomedical research at DERI Galway and the W3C Interest Group for Health Care and Life Science. Details below.

__Associative Tags__

Tagging is one of the key components of ‘Web 2.0’, and Semantic Web technologies will help to make tagging even more powerful. Schemas such as SCOT or MOAT have already been established, and make it possible to ‘tag’ not only with simple strings, but with entities. These entities (such as concepts described in SKOS) can be associated with clear semantics and can be further described with RDF statements, to describe hierarchies of entities, or to link entities to rich data sources such as DBpedia. This enables sophisticated data integration and cross-data-source queries that would not have been possible with simple, string-based tags.

On the other hand, Semantic Web developers can learn from the simplicity that has made tagging so successful. Creating useful tags is very simple, and good user interfaces can make it simpler still with features such as autocompletion and tag recommendation. This simplicity should serve as a role model for many Semantic Web applications.

Specifically, I am interested in what I call ‘associative tags’ (aTags): bundles of tags/entities/concepts that can be used for the simple representation of facts. The primary intention of creating aTags is not the categorization of the document, but the representation of the key facts inside the document. Key facts in the biomedical domain might be, for example,

“Protein A interacts with protein B” (which can be represented with an aTag comprising the three entities “Protein A”, “Molecular interaction” and “Protein B”) or

“Overexpression of protein A in tissue B is the cause of disease C” (an aTag comprising the four entities “Overexpression”, “Protein A”, “Tissue B” and “Disease C”).

Once the aTags from these different sources are aggregated, it is possible to pose a query such as “show me molecules that are associated with molecules that are associated with disease C”, yielding “protein A” as an answer. Hierarchies (in the form of rdfs:subClassOf and skos:narrower) can be used to expand queries based on background knowledge (e.g., that “disease D” is a subclass of “disease C”).

In many cases (especially with some ontologies in the biomedical domain), creating such associative tags can be much simpler than the creation of ‘real’ statements, i.e., relations between individuals and property restrictions of classes.
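The aggregation and hierarchy-based query expansion described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: sets of strings stand in for Semantic Web resources, and the names `ATAGS`, `SUBCLASS_OF`, `MOLECULES`, `with_subclasses` and `associated_molecules` are invented for the example rather than taken from any aTag specification. The second aTag deliberately mentions “Disease D” so that the subclass expansion has something to do.

```python
# Illustrative sketch only: plain Python sets stand in for RDF resources.

# Each aTag is a bundle of entities describing one key fact (made-up data).
ATAGS = [
    frozenset({"Protein A", "Molecular interaction", "Protein B"}),
    frozenset({"Overexpression", "Protein A", "Tissue B", "Disease D"}),
]

# Hypothetical background knowledge, child -> parent, of the kind that
# rdfs:subClassOf or skos:narrower statements would provide.
SUBCLASS_OF = {"Disease D": "Disease C"}

# Hypothetical typing information: which entities count as molecules.
MOLECULES = {"Protein A", "Protein B"}


def with_subclasses(entity):
    """Return the entity plus everything declared (transitively) beneath it."""
    found = {entity}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def associated_molecules(targets):
    """Molecules that share an aTag with any target entity or its subclasses."""
    expanded = set()
    for target in targets:
        expanded |= with_subclasses(target)
    hits = set()
    for atag in ATAGS:
        if atag & expanded:
            # Collect co-occurring molecules, excluding the targets themselves.
            hits |= (atag & MOLECULES) - expanded
    return hits


# One hop: molecules associated with disease C (found via its subclass, disease D).
direct = associated_molecules({"Disease C"})
# Chained hop: molecules associated with those molecules.
indirect = associated_molecules(direct)
print(direct, indirect)
```

Queries of the “associated with something that is associated with X” kind are just repeated calls to the same co-occurrence lookup, which is part of what keeps aTags simpler than full property-based statements.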

__Agreement, Disagreement, discourse__

Many people in the Semantic Web community are interested in the representation of argumentation structures on the web. For example: stating that one snippet of text contains statements that are in disagreement with another snippet of text, which is in agreement with yet another snippet of text. This can be of use for many knowledge domains, such as news articles, biomedical publications or reports submitted to a software bug tracker. Of special interest in this context are extensions of established schemas, especially SIOC. There is also another ontology called SWAN that is specifically tailored to the biomedical domain, and efforts to align SWAN with SIOC have started recently.

__Corporate Semantic Web__

As Semantic Web technologies are finally getting mature enough to allow industrial uptake, it is becoming clear that ontologies for describing organization structures and business processes are still lacking maturity. FOAF (http://www.foaf-project.org/) allows us to represent basic information about persons, organizations and their relationships, but lacks vocabulary for stating that one person is the boss of another person, that a project consists of several subtasks, et cetera. While there are some small projects that try to create such schemas/ontologies, a solution of widespread acceptance does not seem to be in sight at the moment.

__Are upper level ontologies/vocabularies not so bad after all?__

FOAF seemingly tried it a long time ago: foaf:Person is a subclass of “http://xmlns.com/wordnet/1.6/Person”, foaf:Document of “http://xmlns.com/wordnet/1.6/Document”, and so on. Linking to external schemas/ontologies (or making use of their classes and properties directly) can definitely help in facilitating semantic interoperability. For a long time, many web developers were very skeptical about such ‘top-down’ approaches to data integration, but recently the recognition of the potential value of such resources seems to be increasing. In parallel, the last one to two years have brought us some very large upper ontologies that are available as linked data, such as:

  • Wordnet 2.0, hosted by the W3C
  • Yago/DBpedia
  • OpenCyc (http://www.cyc.com/cyc/opencyc), now with new URIs
  • UMBEL (derived from OpenCyc and others).

I think the practice of re-using and linking to such upper ontologies should become popular (again). It helps in creating a highly interlinked Semantic Web, and helps to avoid re-inventing the wheel for each new schema/ontology. This linking should not be done post-hoc, but should be a central part of the early stages of vocabulary/ontology/data creation.

__Cleaner schemas and ontologies__

Working with established ontologies and schemas in ontology editors can be a chore. Most have dependencies on other ontologies, but don’t use owl:imports. Most use an awkward mix of OWL statements and RDF(S), resulting in ontologies that are OWL Full. Many require some OWL reasoning to make use of sameAs statements and inverse properties, but at the same time reasoning is complicated because the ontologies are OWL Full or even contain logical inconsistencies. Often enough, there seems to be no practical reason for the design choices that caused the trouble: some minor changes can turn a messy OWL Full ontology into an OWL Lite or OWL DL ontology. At the moment, many different working groups have created local versions of schemas such as FOAF or Dublin Core (http://dublincore.org/documents/dcmi-terms/) that are valid OWL DL to fix that problem.

It doesn’t have to be this way.

Trying to adhere to OWL Lite/DL and adding owl:imports statements can help build cleaner, more modular and more sustainable ontologies, and does not require significant additional effort during the creation of ontologies. Maybe we can find a consensus that this would be a worthwhile goal, and develop plans towards reaching that goal.
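As a rough illustration of how undeclared dependencies could be spotted, the sketch below scans a list of triples for external namespaces that are used but never named in an owl:imports statement. It rests on several simplifying assumptions: triples are plain tuples of URI strings (no literals or blank nodes), a namespace is whatever precedes the last “#” or “/”, and an import is taken to cover a namespace when the two URIs match after trailing separators are stripped. The `http://example.org/onto` ontology and the helper names `namespace` and `missing_imports` are hypothetical.

```python
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Namespaces of the RDF, RDFS and OWL languages themselves need no import.
BUILTIN = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns",
    "http://www.w3.org/2000/01/rdf-schema",
    "http://www.w3.org/2002/07/owl",
}


def namespace(uri):
    """Namespace part of a URI: everything before the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[:cut].rstrip("#/")


def missing_imports(ontology_uri, triples):
    """External namespaces used in the triples but not covered by owl:imports."""
    own = ontology_uri.rstrip("#/")
    imported = {o.rstrip("#/") for (_, p, o) in triples if p == OWL_IMPORTS}
    used = set()
    for s, p, o in triples:
        if p == OWL_IMPORTS:
            continue  # the import declarations themselves are not "usage"
        for term in (s, p, o):
            used.add(namespace(term))
    return used - imported - BUILTIN - {own}


# Hypothetical ontology: imports FOAF, but also refers to SIOC without importing it.
ONTOLOGY = "http://example.org/onto"
TRIPLES = [
    (ONTOLOGY, OWL_IMPORTS, "http://xmlns.com/foaf/0.1/"),
    (ONTOLOGY + "#Employee", RDFS_SUBCLASS_OF, "http://xmlns.com/foaf/0.1/Person"),
    (ONTOLOGY + "#Report", RDFS_SUBCLASS_OF, "http://rdfs.org/sioc/ns#Post"),
]

undeclared = missing_imports(ONTOLOGY, TRIPLES)
print(undeclared)
```

A check like this only flags candidates. Whether adding the import is wise still depends on the imported ontology itself being OWL Lite/DL, which is exactly the problem with many established schemas.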

3 thoughts on “Packing my bags for VoCamp Oxford”

  1. FYI (as it might be useful for your aTag ontology modeling) SIOC defines an aggregatedTag class.
    Anyway, I’ll be interested in seeing how your aTags fit with MOAT (a tag = a set of URIs combined together) to provide even more Linked Data (e.g., inference rules to say that an item linked to an aTag = an item linked to each URI separately)