The Semantic Puzzle

Jana Herwig

Semantic Desktop, Lifting and Human Language Technology [WOD-PD]

The next session at WOD-PD was given by Leo Sauermann (German Research Center for Artificial Intelligence, DFKI, Germany) and Brian Davis (DERI Galway, Ireland). Leo introduced the idea of the Semantic Desktop, and more specifically, the Nepomuk Social Semantic Desktop. There’s a good article about Nepomuk on Linux.com, written by Bruce Byfield on August 26, 2008, from which I quote the following enlightening passages:

Ansgar Bernardi, deputy head of the Knowledge Management Department at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, or the German Research Center for Artificial Intelligence) and Nepomuk’s coordinator, explains, “The basic problem that we all face nowadays is how to handle vast amounts of information at a sensible rate.” […] “The point is, you have a vast amount of information on your desktop, hidden in files, hidden in emails, hidden in the names and structures of your folders. Nepomuk gives a standard way to handle such information.”

At a high level of generalization, Nepomuk has three main aspects, according to Bernardi. First, there is a standard framework for annotating pieces of information so that connections can be made between them. Second, there are ontologies, the sets of “documented shared understanding” or common concepts that can be defined for particular types of information, such as bio-science or computer desktop use. Finally, there are the tools for making or using the annotations and ontologies, what Bernardi calls the “workspaces that connect to other workspaces and help you in your day to day activities of collecting information, structuring it, making sense of it, and creating new information and communicating it.”

Leo has provided the relevant download links for those who “want to get their hands dirty” with Nepomuk (as he put it) on his blog. Leo Sauermann and Ansgar Bernardi also contributed an article about the Semantic Desktop to the recently published Social Semantic Web volume – a preview of the article is available here (in German – I’m sorry!).

Brian Davis’s part of the talk focused on Lifting and Human Language Technology (HLT) for the Semantic Desktop. Semantic lifting means capturing semantics and translating them into ontologies. Human language technology (HLT), in its broadest sense, can be described as computational methods for processing and manipulating language (for instance text analysis).

One of the goals of the Semantic Desktop is speech act detection for email – speech act here as defined by John Searle. At its most basic definition, a speech act is simply an utterance, but is also often understood more specifically as an illocutionary act (which is a term introduced by John L. Austin in How to do things with words), or a ‘performative utterance’, meaning that by saying something, one actually does something. For instance, the sentence “Please have the document ready for Workshop 1.” contains an instruction: It informs the reader about the requirements for a particular event, and asks him or her to meet these requirements.
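
To make the idea of speech act detection a little more concrete, here is a minimal sketch in Python. It is not how Nepomuk does it (the talk did not go into implementation details). The labels and the keyword patterns are my own assumptions, picked only to show how a sentence such as the workshop example could be tagged as a request.

```python
import re

# Hypothetical labels and patterns, chosen only for illustration.
# A real detector would rely on proper language analysis, not a few regexes.
RULES = [
    ("request", re.compile(r"^\s*(please|could you|can you|would you)\b", re.IGNORECASE)),
    ("question", re.compile(r"\?\s*$")),
    ("commitment", re.compile(r"^\s*i (will|shall|promise to)\b", re.IGNORECASE)),
]


def classify_speech_act(sentence: str) -> str:
    """Return the label of the first matching rule, or 'statement' as fallback."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return "statement"


if __name__ == "__main__":
    for text in [
        "Please have the document ready for Workshop 1.",
        "Is the room booked?",
        "I will send the slides tomorrow.",
        "The workshop starts at nine.",
    ]:
        print(classify_speech_act(text), "<-", text)
```

The rules are checked in order, so a polite question like “Could you send it?” counts as a request rather than a plain question, which fits the view that the sentence does something (it asks for an action) beyond its surface form.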

Brian also introduced Roundtrip Ontology Authoring (ROA), which is a process that allows non-expert users to author or amend an ontology by using simple, easy-to-learn controlled natural language. The process is a combination of Controlled Language for Information Extraction (CLIE) and Text Generation which is developed on top of GATE. ROA is documented on the Nepomuk website; for further information about CLIE, read this article by Valentin Tablan, Tamara Polajnar, Hamish Cunningham and Kalina Bontcheva: User-friendly ontology authoring using a controlled language (PDF, 64 KB).
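
The round trip can be illustrated with a toy example in Python. This is only a sketch under my own assumptions: the two sentence patterns and the triple vocabulary (`rdf:type`, `owl:Class`) are hypothetical stand-ins, not the actual CLIE grammar or the GATE API, which are described in the linked article. One function reads a controlled sentence into an ontology statement, the other generates the sentence back from the statement.

```python
import re

# Hypothetical sentence patterns for illustration; not the real CLIE grammar.
CLASS_DECL = re.compile(r"^There are (\w+)\.$")
INSTANCE_DECL = re.compile(r"^(\w+) is an? (\w+)\.$")


def parse_sentence(sentence):
    """Translate one controlled sentence into a (subject, predicate, object) triple."""
    sentence = sentence.strip()
    match = CLASS_DECL.match(sentence)
    if match:
        return (match.group(1), "rdf:type", "owl:Class")
    match = INSTANCE_DECL.match(sentence)
    if match:
        return (match.group(1), "rdf:type", match.group(2))
    return None  # sentence is outside the controlled language


def generate_sentence(triple):
    """Reverse direction: turn a triple back into a controlled sentence."""
    subject, predicate, obj = triple
    if predicate != "rdf:type":
        raise ValueError("this sketch only handles rdf:type statements")
    if obj == "owl:Class":
        return f"There are {subject}."
    article = "an" if obj[0].lower() in "aeiou" else "a"
    return f"{subject} is {article} {obj}."


if __name__ == "__main__":
    for text in ["There are projects.", "Nepomuk is a project."]:
        triple = parse_sentence(text)
        print(text, "->", triple, "->", generate_sentence(triple))
```

Because the generated text uses the same restricted language that the parser accepts, a user can edit the generated sentences and feed them back in, which is the “roundtrip” in the name.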
