Thomas Thurner

American Physical Society Taxonomy – Case Study

Joseph A Busch

Taxonomy Strategies has been working with the American Physical Society (APS) to develop a new faceted classification scheme.

The proposed scheme includes several discrete sets of categories called facets whose values can be combined to express concepts such as existing Physics and Astronomy Classification Scheme (PACS) codes, as well as new concepts that have not yet emerged, or have been difficult to express with the existing PACS.

PACS codes formed a single-hierarchy classification scheme, designed to assign the “one best” category that an item will be classified under. Classification schemes come from the need to physically locate objects in one dimension, for example in a library where a book will be shelved in one and only one location, among an ordered set of other books. Traditional journal tables of contents similarly place each article in a given issue in a specific location among an ordered set of other articles, certainly a necessary constraint with paper journals and still useful online as a comfortable and familiar context for readers.

However, the real world of concepts is multi-dimensional. In collapsing to one dimension, a classification scheme makes essentially arbitrary choices that have the effect of placing some related items close together while leaving other related items in very distant bins. It also has the effect of repeating the terms associated with the last dimension in many different contexts, leading to an appearance of significant redundancy and complexity in locating terms.

A faceted taxonomy attempts to identify each stand-alone concept through the term or terms commonly associated with it, and to have each term mean the same thing wherever it is used. Hierarchy in a taxonomy is useful for grouping related terms together; however, the intention is not to identify an item such as an article or book by a single concept, but rather to assign multiple concepts that together represent its meaning. In that way, related items can be closely associated along multiple dimensions corresponding to each assigned concept. Where previously a single PACS code was used to indicate the research area, now two, three, or more of the new concepts may be needed (although often a single new concept will be sufficient). Applying the new taxonomy requires a different mindset and approach from the way APS has been accustomed to working with PACS; however, it also enables significant new capabilities for publishing and working with all types of content, including articles, papers and websites.
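
The idea of assigning several concepts per item can be illustrated with a small sketch. The facet names ("area", "technique"), the concepts and the paper titles below are invented for the illustration and are not taken from the APS scheme; the point is only that an item carrying values from several facets can be found along any of those dimensions.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A content item tagged with concepts drawn from several facets."""
    title: str
    # Maps a facet name to the set of concepts assigned from that facet.
    concepts: dict = field(default_factory=dict)


def find(items, **wanted):
    """Return the titles of items that carry every requested facet value."""
    return [
        item.title
        for item in items
        if all(value in item.concepts.get(facet, set())
               for facet, value in wanted.items())
    ]


# Hypothetical facets and concepts, invented for this illustration.
ITEMS = [
    Item("Paper A", {"area": {"superconductivity"},
                     "technique": {"neutron scattering"}}),
    Item("Paper B", {"area": {"magnetism"},
                     "technique": {"neutron scattering"}}),
    Item("Paper C", {"area": {"superconductivity"}}),
]

by_technique = find(ITEMS, technique="neutron scattering")
by_both = find(ITEMS, area="superconductivity", technique="neutron scattering")
```

In a single-hierarchy scheme, "Paper A" would have to be filed under either its area or its technique; here it is reachable from both.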

To build and maintain the faceted taxonomy, APS has acquired the PoolParty taxonomy management tool. PoolParty will enable APS editorial staff to create, retrieve, update and delete taxonomy term records. The tool will support the various thesaurus, knowledge organization system and ontology standards for concepts, relationships, alternate terms etc. It will also provide methods for:

  • Associating taxonomy terms with content items, and storing that association in a content index record.
  • Automated indexing to suggest taxonomy terms that should be associated with content items, and text mining to suggest terms to potentially be added to the taxonomy.
  • Integrating taxonomy term look-up, browse and navigation in a selection user interface that, for example, authors and the general public could use.
  • Implementing a feedback user interface allowing authors and the general public to suggest terms, record the source of the suggestion, and inform the user on the disposition of their suggestion.
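
As a rough illustration of the first two points above, the following sketch suggests taxonomy terms for a text by matching preferred and alternate labels, and stores the result in a simple content index record. It is not how PoolParty performs automated indexing; the taxonomy entries, the record layout and the function names are assumptions made for the example.

```python
import re

# Hypothetical taxonomy: preferred label -> alternate labels.
TAXONOMY = {
    "Superconductivity": ["superconductor", "superconducting"],
    "Neutron scattering": ["neutron diffraction"],
    "Graphene": [],
}


def suggest_terms(text, taxonomy):
    """Suggest preferred labels whose preferred or alternate label occurs in text."""
    lowered = text.lower()
    suggestions = []
    for pref, alts in taxonomy.items():
        for label in [pref, *alts]:
            # Match whole words only, ignoring case.
            if re.search(r"\b" + re.escape(label.lower()) + r"\b", lowered):
                suggestions.append(pref)
                break
    return suggestions


def index_record(item_id, text, taxonomy):
    """Build a content index record linking an item to its suggested terms."""
    return {"item": item_id, "terms": suggest_terms(text, taxonomy)}


record = index_record(
    "doc-1",
    "Neutron diffraction study of a superconducting cuprate",
    TAXONOMY,
)
```

A real indexer would also handle inflection, ambiguity and ranking; simple label matching is only the most basic starting point.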

Arthur Smith, project manager for the new APS taxonomy, notes: “PoolParty allows our subject matter experts to immediately visualize the layout of the taxonomy, to add new concepts, suggest alternatives, and to map out the relationships and mappings to other concept schemes that we need. While our project is still in an early stage, the software tool is already proving very useful.”

About

Taxonomy Strategies (www.taxonomystrategies.com) is an information management consultancy that specializes in applying taxonomies, metadata, automatic classification, and other information retrieval technologies to the needs of business and other organizations.

The American Physical Society (www.aps.org) is a non-profit membership organization working to advance and diffuse the knowledge of physics through its outstanding research journals, scientific meetings, and education, outreach, advocacy and international activities. APS represents over 50,000 members, including physicists in academia, national laboratories and industry in the United States and throughout the world. Society offices are located in College Park, MD (Headquarters), Ridge, NY, and Washington, DC.

Andreas Blumauer

Survey on “Perception and Relevance of Controlled Vocabulary Quality Issues”

The University of Vienna (Research Group Multimedia Information Systems) and the Semantic Web Company are conducting a survey on “Perception and Relevance of Controlled Vocabulary Quality Issues”.

The survey is aimed at practitioners who are using or who are planning to use controlled vocabularies in their organisation. We’d be happy if you take the time to fill in the questionnaire here.

The goal of this study is to find out how developers and users of controlled vocabularies deal with quality aspects of these vocabularies. More specifically, we want to answer these questions:

  • What does vocabulary quality mean for taxonomists?
  • Given a number of possible quality issues, what is their relevance in practical settings?
  • What vocabulary usage scenarios are affected by the quality issues?

The questionnaire can be answered anonymously. As with last year's survey (Do Controlled Vocabularies Matter?), we will publish the results as a scientific contribution so that the community can gain better knowledge of how to create and use controlled vocabularies.

Andreas Blumauer

PoolParty Thesaurus Manager 3.1 with auto-population feature was presented at SemTechBiz 2012 in San Francisco

A new PoolParty Thesaurus Manager (PPT) release was presented at this year's Semantic Technology & Business Conference in San Francisco: Version 3.1.0 is a major release offering lots of great new functionality and improvements, including auto-population of thesauri and linked data knowledge models.

The main new features are:

  • Autopopulation of Thesauri from DBpedia
    The Skossy functionality has been integrated into PPT. You can assign DBpedia categories to concepts and then autopopulate your thesaurus based on data from DBpedia.

  • Linked Data Based Synonym and Translation Service
    You can add labels (pref, alt, hidden) to the concepts of your thesaurus based on suggestions for synonyms and translations provided by data from DBpedia.
  • ADMS Description for Projects
    Metadata for PoolParty projects can now be published according to the Asset Description Metadata Schema (ADMS) developed by the joinup project of the European Union.

  • Windows Theme
    A new theme has been added based on the Windows GUI guidelines.

Andreas Koller from Semantic Web Company: “SemTechBiz 2012 was a great success for us, we had a lot of talks with people from various industries at our booth. Demonstrating how building knowledge models on top of linked data sources can improve text mining for example, attracted wide interest. We enjoyed the whole conference, the location and the support from the organization team.”

For an overview of all changes made in Release 3.1.0, take a look at the Release Notes.

Thomas Schandl

Transforming spreadsheets into SKOS with Google Refine

While looking for high-quality enterprise vocabularies, we recently turned our attention to the Global Industry Classification Standard (GICS), which is an industry taxonomy designed to categorize any private company. It was developed by Morgan Stanley Capital International and Standard & Poor’s and is mainly used by the global financial community to aid in the investment research process.

It is available for download as .xls spreadsheet files in several languages. Of course, it would be much better to have this valuable taxonomy in a standard, machine-readable format. The Simple Knowledge Organization System (SKOS) is a perfect fit for a taxonomy like GICS. But how can a spreadsheet be turned into SKOS with minimal manual effort?

I chose to try Google Refine for this task, since a promising RDF extension had recently been released by DERI's Fadi Maali and Richard Cyganiak.

Google Refine is “a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases”. It was previously known as Freebase Gridworks and has been developed further by Google since its acquisition of Metaweb.

Google Refine UI

Refine is a very useful tool for filtering and then transforming rows, columns and cells according to customizable patterns.

After applying all necessary transformations to the spreadsheet one can edit the “RDF Skeleton”, where the columns can be mapped to literals, RDF properties and RDF classes (which can be imported from their namespaces).

Editing the RDF Skeleton
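
For readers who prefer to see the column-to-SKOS mapping spelled out, here is a minimal Python sketch of the same kind of transformation done by hand. It does not use Refine or its RDF extension. The namespace `http://example.org/gics/`, the sample rows and the rule that a code's parent is the code without its last two digits are assumptions made for the illustration.

```python
BASE = "http://example.org/gics/"  # hypothetical namespace for the example

# Illustrative (code, label) rows standing in for spreadsheet columns.
ROWS = [
    ("10", "Energy"),
    ("1010", "Energy"),
    ("101010", "Energy Equipment & Services"),
    ("10101010", "Oil & Gas Drilling"),
]


def escape(label):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return label.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_turtle(rows, lang="en"):
    """Render (code, label) rows as SKOS concepts in Turtle."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{BASE}scheme> a skos:ConceptScheme .",
        "",
    ]
    for code, label in rows:
        props = [
            "a skos:Concept",
            f'skos:notation "{code}"',
            f'skos:prefLabel "{escape(label)}"@{lang}',
            f"skos:inScheme <{BASE}scheme>",
        ]
        if len(code) > 2:
            # Assumed rule: the parent code is the code minus its last two digits.
            props.append(f"skos:broader <{BASE}{code[:-2]}>")
        else:
            props.append(f"skos:topConceptOf <{BASE}scheme>")
        lines.append(f"<{BASE}{code}> " + " ;\n    ".join(props) + " .")
    return "\n".join(lines) + "\n"


turtle = rows_to_turtle(ROWS)
```

Each spreadsheet row becomes one `skos:Concept` with a notation, a language-tagged preferred label and a link to its broader concept, which is the kind of structure the RDF Skeleton describes.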

Once your valid SKOS model is ready, you can export it in RDF/XML or Turtle format. You may then want to load it into an ontology editor like Protégé or a thesaurus management tool like PoolParty in order to build upon it or connect it to other knowledge models. With PoolParty, the GICS taxonomy can also be used to tag and categorize documents and to provide semantic search and faceted navigation, and it can be published as Linked Data without further effort.

GICS loaded in PoolParty

Working with Refine and its RDF extension was easy and fun. It’s even possible to isolate and save the transformation steps done with Refine, so that they can be re-applied to similarly structured spreadsheets. This came in very handy, as GICS is published in nine languages, in as many separate, identically structured spreadsheets.
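
The "record once, re-apply everywhere" idea can be sketched in plain Python as well. This is not Refine's own mechanism for saving and replaying steps; the cleaning steps, function names and two-language sample data are assumptions chosen to show how identically structured sheets can be pushed through the same pipeline and merged into multilingual labels.

```python
def strip_cells(rows):
    """Trim surrounding whitespace from every cell."""
    return [[cell.strip() for cell in row] for row in rows]


def drop_empty_rows(rows):
    """Remove rows whose cells are all empty."""
    return [row for row in rows if any(row)]


# The recorded steps, in the order they were first applied.
STEPS = [strip_cells, drop_empty_rows]


def apply_steps(rows, steps=STEPS):
    """Run every recorded step over one spreadsheet's rows."""
    for step in steps:
        rows = step(rows)
    return rows


def merge_labels(sheets):
    """Combine per-language sheets into {code: {language: label}}."""
    merged = {}
    for lang, rows in sheets.items():
        for code, label in apply_steps(rows):
            merged.setdefault(code, {})[lang] = label
    return merged


# Hypothetical sample data standing in for two of the language files.
SHEETS = {
    "en": [[" 10 ", "Energy"], ["", ""]],
    "de": [["10", " Energie "]],
}

labels = merge_labels(SHEETS)
```

Because every sheet shares the same layout, the same list of steps works unchanged for each language, and the merged result maps directly onto one `skos:prefLabel` per language.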