Pascal Hitzler

What’s Wrong with Linked Data?

Earlier this year, we posted a special call for Linked Dataset descriptions, to be published in the Semantic Web journal. This kind of call and paper type is a novelty in the Semantic Web community. We did this to provide another outlet for research-enabling work (as opposed to research work as such), because Linked Data is currently one of the drivers of the Semantic Web effort. However, creators and curators of datasets can rarely get acknowledgement for their contributions by publishing in high-profile conference proceedings or journals.

 

We expected a very good response to this call, and indeed we received 27 submissions. Consequently, we have now made Linked Dataset descriptions a standing paper type for the Semantic Web journal, which means that papers of this type can now be submitted to the journal at any time. In addition to the submissions, we also received very encouraging communication in response to our new paper type. Some researchers even reported that “the call already prompts some people to improve their datasets” (we are in no position to verify this, though).

 

Following our own policy for the journal, the calls include a very crisp formulation of the review criteria which are to be applied to papers of this type. We strongly recommend that reviewers directly reference these criteria in their reviews. For Linked Dataset papers, the criteria are as follows.
  • Quality of the dataset
  • Usefulness (or potential usefulness) of the dataset
  • Clarity and completeness of the descriptions
When we set these criteria, we thought that they should be easy to meet. The papers are supposed to be short (recommendation is 6 pages), and the third of the criteria is really only about doing a good job when writing the paper. Assuming that the publishers of a Linked Dataset were doing a good job, we thought that there should be no “quality of the dataset” problem. Assuming that people would not go through the trouble of publishing a dataset without it being useful (or at least potentially useful), we also thought that the “usefulness” condition should be easy to meet.

 

However, we were in for a bit of a surprise. At this point in time, we have completed the first-round reviews for 26 of the 27 papers, and half of them had to be rejected due to issues with dataset quality or usefulness. For 5 of the datasets, the reviewers even indicated that “they are in fact not linked data.”

 

Clearly, our sample is not necessarily representative of all of Linked Data. For some of the most prominent datasets we have not received papers (most likely because they are already published elsewhere). However, it may not be unreasonable to take our findings as an indication that Linked Datasets may often have substantial issues with quality and usefulness.

 

We will know more after the second round of reviews. And we’re looking forward to receiving more submissions of Linked Dataset descriptions.

 

We would be more than happy if the state of Linked Data turned out to be better than our limited sample indicates, and we hope that our call and the paper type will contribute to the effort of improving the quality and especially the applicability of Linked Data. We are optimistic that, with more experience, best practices, and application focus, Linked Data will become more than just more data.

 

Krzysztof Janowicz, http://geog.ucsb.edu/~jano/
Editors-in-Chief, Semantic Web journal, http://www.semantic-web-journal.net/

 

Pascal Hitzler

Semantic Web and Emerging Trends in Scholarly Publishing

In my capacity as one of the Editors-in-Chief of the Semantic Web journal (the other one is Krzysztof Janowicz; the journal is published by IOS Press), I was recently invited to talk about the journal at Allen Press’ seminar Emerging Trends in Scholarly Publishing. This seminar is an annual event which draws decision makers from the scholarly publishing industry to hear about and discuss recent developments and hot topics related to their profession. This year’s event had a session on “Semantic Enrichment” and one on “Rethinking the Structure of Peer Review.” All presentations, including videos, are available from the Allen Press website.

The invited speaker of the “Semantic Enrichment” session was Pam Harley, Vice President, Product & Market Development of Semedica, a division of Silverchair. Pam gave a high-level account of the possibilities and added value which come with Semantic Enrichment, in a way suitable for a non-technical audience. I personally benefited particularly from the large variety of reasons for adopting Semantic Technologies in publishing which she presented and discussed in her talk (see also her slides).

My presentation (see also the slides) about the Semantic Web journal was part of the “Rethinking the Structure of Peer Review” session, and was focused on the open and transparent review process which we have adopted for the journal. After the presentation, throughout the event, I received ample feedback and remarks which in particular commended us for setting up a realistic improvement of the review process while avoiding radical changes which are likely to meet too much resistance from researchers. I certainly agree with this assessment. The presentation also contains a bit of information on how the journal is doing (in short: it’s doing great).

The seminar was a very enjoyable experience. In particular, it was enlightening to learn about publishers’ perspectives on scientific publishing, reviewing processes, and emerging revenue models. It was also nice to see that Semantic Web as a technology has a natural place in these discussions and is seeing more and more adoption in practice.

If you’re curious to learn more, have a look at the videos of the presentations.

[Author: Pascal Hitzler]

Pascal Hitzler

Reasonable Minutes from ISWC2010

I find it quite clearly noticeable that ontology reasoning is slowly making its way into the mainstream. I am beginning to see more and more applications – and industry investigations – picking up ontology reasoning in a matter-of-fact way. It seems that the bickering among scientists about whether ontology reasoning is needed and/or useful is simply ignored when it comes to applications. And I very much welcome this. The “why” question is no longer important. In fact, even the “how” question isn’t. It’s being used – although sometimes perhaps not in an entirely conscious way, or not in the way in which traditional reasoning applications would have been set up. And I very much welcome this as well.

I’m not talking about the fact that 2 out of 3 shortlisted papers for the best paper award at ISWC2010 are reasoning papers (which continues an established trend) – the winner has not been announced yet, there’s one day of the conference still ahead. Rather, I found it noticeable that reasoning prominently popped up in the first in-use-track session (and that wasn’t artificially arranged – in fact the first session was on life sciences applications). Another, less obvious case in point was the excellent keynote given by Evan Sandhaus on how nytimes.com utilizes semantic technologies. Among other things, they used GeoNames for inferring that news from Rome is also news from Italy, and they used Freebase for equating different identifiers for entities (in this case, politicians). Neither of these was explicitly executed or identified as a reasoning step, but this is only a matter of algorithmization. Conceptually, this is ontology reasoning at its simple best: the derivation of implicit knowledge by automated deductive means is reasoning, whether you are aware of it or not.
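To make the two inference patterns above a bit more concrete, here is a minimal Python sketch. It is not how nytimes.com implemented anything (the keynote details are not reproduced here); the function names, the toy containment table, and the `ex:` identifiers are all invented for illustration. The first function derives containment transitively, the second groups identifiers that have been declared equivalent.

```python
def containing_regions(place, located_in):
    """Return every region that transitively contains `place`.

    `located_in` maps a place to its direct parent region
    (a hypothetical, hand-made table, not real GeoNames data).
    """
    regions = []
    seen = {place}
    current = place
    while current in located_in:
        current = located_in[current]
        if current in seen:  # guard against cycles in the input
            break
        seen.add(current)
        regions.append(current)
    return regions


def canonical_ids(same_as_pairs):
    """Map each identifier to one representative of its equivalence class.

    Uses a small union-find; the representative is the
    lexicographically smallest identifier in the class.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {x: find(x) for x in parent}


# Toy data, invented for this example.
located_in = {"Rome": "Lazio", "Lazio": "Italy"}
same_as = [
    ("ex:nyt/politician-1", "ex:freebase/politician-1"),
    ("ex:freebase/politician-1", "ex:other/politician-1"),
]

rome_regions = containing_regions("Rome", located_in)
ids = canonical_ids(same_as)
```

With this toy table, an article tagged "Rome" can also be filed under every entry of `rome_regions`, and two articles refer to the same politician whenever their identifiers map to the same value in `ids`. Neither step looks like "reasoning" in the code, which is exactly the point made above.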

Talking about applications – Tania Tudorache from Stanford Biomedical presented the ongoing work on ICD-11, which centrally utilizes WebProtege and OWL. I think that this work is completely underappreciated by the Semantic Web community, perhaps because the community is not aware of its impact. The ICD classification of diseases is the world-wide manual for medical diagnostics, which means that Semantic Technologies – in a rather invisible manner, as it should be – result in something which will be used by millions of physicians world-wide in their everyday work life. That’s what I call dissemination into practice!

By the way – in the context of such trends, it strikes me as oddly outdated to hear panel comments like “OWL still needs to show its worth – what can it do what you cannot do with rules?” It’s about time we stop bickering and pushing our pet paradigms and simply make things work and improve. (And no, I didn’t bother to comment during the panel. A discussion like this is futile, and I think more and more people are realizing this now anyway.)

Another keynote, by mc schraefel, also very nicely put applications into perspective and highlighted some of the shortcomings of the currently hyped Linked Data. (Don’t get me wrong – Linked Data is extremely necessary for the Semantic Web on several accounts, but there are indeed a lot of issues with it which we need to face.) Interestingly, the reactions I heard were mixed – but in an unexpected way. On the one hand, there was wide positive reaction that this was an excellent keynote with a very important message (which is also my take). On the other hand, I heard voices saying that we already know these and other problems with Linked Data, so there wasn’t really any useful content in the talk. I’m rather happy, though, that I didn’t hear anybody disagree with the general message.

Another very notable presentation, as part of the Semantic Web Challenge, was by Deborah McGuinness, on the data.gov work at RPI. The scope of dissemination is simply impressive, and another milestone in the making of the Semantic Web.

What else? The Semantic Web journal’s first Editorial Board meeting took place at the conference (the first issue will be out shortly). My showcase volume of our book was not stolen this time. And there were a considerable number of very interesting-looking papers in the reasoning sessions – all of which I regretfully missed because I was tied up in parallel events. I’m looking forward to reading the papers, though.

On the culinary side, I have to say that I was a bit disappointed. Actually, the food was very good, but at previous conferences I’ve visited in China, it was much more exotic (from a European perspective, anyway) – perhaps the reason for this was that these other events I’ve been to were mainly Chinese, with only a few international guests. And, certainly, the cuisine was not at all as bad as the internet connection at the conference center. But we’ve already become very accustomed to having Semantic Web conferences with too little bandwidth, so it’s kind of expected anyway.

[Author: Pascal Hitzler]

Pascal Hitzler

The Semantic Web journal – half a year later

The journal “Semantic Web – Interoperability, Usability, Applicability” – in short: the Semantic Web journal – was launched 7 months ago, sporting a transparent open review process. Pascal Hitzler is one of the Editors-in-Chief (the other one is Krzysztof Janowicz). He answers some questions on the motivation, setup, and future plans of the journal. (Pascal also wrote the questions and this intro, so it’s really a fake interview. But it seemed an appropriate literary form …)

Question: Why did you launch yet another journal on Semantic Web?

Hitzler: Because the community is growing and the need for publication outlets grows with it. I heard the objection that there weren’t enough quality papers for all the journals, but I don’t think so. It’s just that most of the quality papers still end up in journals which are not dedicated to the Semantic Web as such.

Personally, my desire to start a new journal began when I wanted to do a special issue on Semantic Web reasoning in some other, established, journal, and the Editors-in-Chief basically replied with a lapidary “Is there anything to report?” I didn’t push the case back then (though I probably should have). But this and similar experiences made me think about scientific publishing from a different angle, a normative one: What should scientific publishing in our field look like? The journal gives me a possibility to realize some of my answers – or at least to go a few steps in the right direction. So when the opportunity arose to set up this journal with a well-known publishing house (IOS Press) and with a co-Editor-in-Chief (Krzysztof Janowicz, a strong proponent of open and transparent reviewing) who I knew would also put a maximum of energy into the venture, it was simply too good an opportunity to pass up. However, I also realize that the reality of scientific publishing can change only slowly, and that it needs time and gradual improvements. We can’t do it all at once.

Question: Your journal uses an open review process. What is that and why?

Hitzler: Open reviewing, in the sense we use it for the Semantic Web journal, is all about transparency. Submitted papers are made publicly available. Solicited reviews are made publicly available. Anybody else can additionally contribute a public review. Reviewers are publicly known by name. Discussions between reviewers and authors can (and should) happen in public. Reviewers and editors are acknowledged by name in the published versions of the papers.

The obvious reason for setting up an open review process is to improve the quality of the decision-making process. We have to realize that some persisting habits about reviewing have their origin in times when scientific publishing was made for a small expert audience, and had to be conducted by sending manuscripts and letters by conventional mail. Today, however, reviewing and publishing are inflationary, which substantially reduces the quality of the typical paper – and of the typical review. While we cannot simply reverse this trend, we can take advantage of the World Wide Web to counteract these developments and improve quality by bringing the review process out into the public space. Reviewers will put more effort into providing constructive reviews if they publicly sign their reviews. Open and public discussions on controversial submissions minimize errors in the decision making.

Personally, I also hope that the ensuing discussions will help to bring back a scientific tradition which has long been on the decline in our field: controversial but constructive discussion. Regretfully, these days we somehow tend to mainly present incremental results, bash opposing opinions, and sugarcoat our own …

Question: Past attempts to set up open reviewing for journals have failed …

Hitzler: Yes, I remember seeing some of these early attempts many years ago when I was a PhD student. Even back then I was doubtful if the sometimes rather radical setups had a chance. In the meantime, there is growing experience in other fields that open reviews can work out if set up carefully. In our case, we mix old-style with open, by still soliciting reviews, and by giving solicited reviewers the option to stay anonymous, if they see a need for this protection. We Editors-in-Chief also “steer” the journal in the sense that we have rather clear strategic targets, e.g. in terms of scope and quality, which we’re trying to meet. In short: rather than experimenting with radical changes, we mildly introduce a new but essential component – open reviewing – in a traditional scientific publishing process. That way, it will work.

Question: But isn’t anonymous reviewing necessary to protect the reviewers and in order to get objectively critical reviews?

Hitzler: Sometimes. That’s why it’s good that solicited reviewers can opt to stay anonymous. Open reviewing – like any form of assessment in science – isn’t perfect, and has its drawbacks. However, the current reality in Computer Science is that reviewing processes are often extremely poor and decision processes are not very transparent. For conferences, reviewer discussions and rebuttal phases were introduced some time ago to improve the decision making. Open reviews simply go a step further.

Question: Aren’t potential authors afraid of getting a public bashing in the review process?

Hitzler: Reviewers typically won’t bash if they sign with their name. And in fact, we monitor the reviews in order to make sure that they adhere to a certain minimal scientific standard. At the same time, it’s probably just as well if our public process makes people more reluctant to submit papers which are not yet mature enough for publication. We wouldn’t want to publish them anyway. And in order to protect authors of rejected submissions, we actually remove the corresponding papers and reviews from the website after some time.

While I understand that some people may be more reluctant to put their work out in the open before it’s been accepted through a review process, we have to be aware that many quality journal publications, like the ones we’re striving for, are extended versions of high-quality conference publications: so they have indeed already been through a review process. Furthermore, submitting to our journal gives added visibility for the work, since it’s up for public review on our website.

Question: Your journal also publishes papers which are not standard research papers. Aren’t you compromising scientific rigor by doing this?

Hitzler: Times are changing. The prime purpose of a scientific journal is to disseminate results to other researchers, and to do so through a quality filter. Traditionally, this dissemination was restricted to focused research contributions, targeted at other researchers working in the same narrow area as the author(s). Semantic Web as a field, however, is extremely diverse and comprises researchers and practitioners from many other communities. Consequently, high-quality tools, systems, ontologies, introductory surveys and application reports are very much needed for the dissemination of advances in our field to all interested parties. As for research papers, the role of the journal for these other types of papers is primarily quality assurance. And consequently, we have clearly formulated the evaluation criteria for different types of papers. A report on a high impact tool, for example, is thus not a direct research contribution in the traditional sense. But if the tool enables further developments in the field, then it is worth reporting, and it indirectly makes a contribution to scientific progress.

Question: Why are you still publishing through a commercial publishing house?

Hitzler: Because it helps. A lot. It’s easy to underestimate the amount of work which needs to be put into running a journal, and going with a commercial publisher relieves the Editors-in-Chief and the Editorial Board of a lot of tasks which are not directly related to quality assurance. Open review does not mean that this kind of professional support is no longer needed. And we are glad that we have found a publishing house which is very accommodating to our ideas.

Question: What are the plans for the immediate future?

Hitzler: We currently have more than 30 papers up for review, most of them responses to two recent calls, one on tools and systems papers, and one on applications of OWL – and some of the submissions seem rather prominent. We also have several special issues lined up, most of which have not been announced yet. The first issue will appear towards the end of the year and contain vision statements by the EB members – we do not normally publish vision statements, but this seemed an appropriate way to introduce the journal. Considering that the journal was launched only 7 months ago, this means that we are already very well under way in pursuing our goal of establishing a high-quality scientific outlet in the field.

[Author: Pascal Hitzler]