2012-05-23

DAIA & describing organizations, services, collections in RDF

In January, Jakob Voß published the request for comments for the final DAIA specification. DAIA, Jakob writes,
"is more than an implementation: it provides both, an abstract standard and bindings to several data languages (XML, JSON, and RDF). The conceptual DAIA data model defines some basic concepts and relationships (document, items, organisations, locations, services, availabilities, limitations…) independent from whether they are expressed in XML elements, attributes, RDF properties, classes, or any other data structuring method."
As I have put some thought into describing libraries, their services, collections, locations etc. in RDF for my master's thesis (PDF, German), and as I will be giving a talk on this topic, together with Jakob, at this year's Bibliothekartag, I am finally submitting this comment on DAIA.

DAIA - Why and how?

Why was DAIA developed in the first place? As Jakob describes in a post from 2009, DAIA tries to provide an API specification that is missing among the standard APIs for libraries, an
"open, usable standard way just to query whether a copy of given publication – for instance book – is available in a library, in which department, whether you can loan it or only use it in the library, whether you can directly get it online, or how long it will probably take until it is available again."
Thus, the "Document Availability Information API (DAIA) defines a data model with serializations in JSON, XML, and RDF to encode and query information about the current availability of documents" (from the English introduction). That is a very important standard, and I hope DAIA will be widely adopted by libraries and system providers. I also hope that my comments help to improve DAIA.

The Data Model

Here, I will focus on the underlying conceptual DAIA data model and on the corresponding OWL ontology. I am neither a heavy API user nor an API engineer, so I have no expertise on the API specification and won't comment on it at all.

As seen in the graphical representation of the conceptual model, DAIA's core classes are "Document", "Organization", "Service" and "Item".

I won't go into all the details here; just take a look at the specification and the ontology.

In my master's thesis I took quite a different approach from Jakob's, as my thoughts originated from the problem of how to add structured data, e.g. in the form of RDFa, to existing library websites. Naturally, I didn't focus on the availability of documents. In fact, I didn't go deeper into this topic at all but put my thoughts into describing an organisation, its site(s), collection(s) and services. I had a look at some library websites to find out what kind of information is normally displayed there and how to classify it. My general conceptual model, which served as the basis for a more detailed study (updated and slightly reduced for this purpose, and leaving aside the serials problem), looks like this:


The DAIA model and my approach look quite similar in that both data models contain the core classes Document, Service, Organisation and Item. But the models differ in at least one point: while DAIA establishes a Storage class, my model uses the two classes Site and Collection to express similar information.

Questions, Comments and Suggestions

After having highlighted my background, I want to proceed with actually responding to Jakob's call with some comments. I already gave some input on the W3C-LLD mailing list and might repeat myself here.

Storage

I have a problem understanding why the Storage class exists at all and whether it makes sense. If a collection is stored in closed stacks, it is of no interest to the user where an item actually resides. What is important to her is where to get it, i.e. where the circulation desk is. Thus, I think one could even decrease complexity by omitting the Storage class. All that users and applications care about is where to obtain an item, not where it actually comes from. It is services that really count, not storage. (If an item is stored in a reading room, I would go on and classify the reading room as a service where items can be viewed locally but cannot be lent.)

Collection

Every library curates at least one collection, which might comprise a number of subcollections. On their websites, libraries often give information about their collections by indicating the number of items in them, their mode of access, the acquisition policy, contact information for a subcollection, the storage location etc. That's why it might be useful to also have a Collection class and to describe collections. But collection description is definitely not necessary to provide good services for users who want to easily obtain material they have already identified as relevant for their needs. It is useful in other cases: when a researcher wants to find libraries holding special collections she might be interested in, or when libraries want to find similar institutions in order to share data for improving the acquisition of new material.

DAIA namespaces

I don't understand why the DAIA service classes get their own namespace, adding a "Service/" to the DAIA namespace, e.g. for loan. This isn't common practice and is really annoying. When you are writing Turtle, you either have to declare an extra prefix for the Service namespace or you have to write out the whole URI, as slashes aren't allowed after a prefix. (At least rapper doesn't like it.)
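
To make the prefix problem concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the two namespace URIs and the prefix name daiaservice are assumed for the example, the abbreviate helper is hypothetical, and the check for slashes is a simplified version of the rule described above, not a full Turtle parser.

```python
# Namespace URIs assumed for illustration; check the ontology for the real ones.
DAIA = "http://purl.org/ontology/daia/"
DAIA_SERVICE = DAIA + "Service/"


def abbreviate(uri, prefixes):
    """Return 'prefix:local' if a declared namespace fits, else '<uri>'.

    Simplified rule (assumption): the local part after the prefix
    must be non-empty and must not contain a slash.
    """
    best = None
    for prefix, namespace in prefixes.items():
        if not uri.startswith(namespace):
            continue
        local = uri[len(namespace):]
        if not local or "/" in local:
            continue  # e.g. daia:Service/Loan is rejected
        # Prefer the longest matching namespace.
        if best is None or len(namespace) > len(best[1]):
            best = (prefix, namespace)
    if best is None:
        return "<" + uri + ">"
    return best[0] + ":" + uri[len(best[1]):]


loan = DAIA_SERVICE + "Loan"

# With only the main prefix declared, the full URI has to be written out.
print(abbreviate(loan, {"daia": DAIA}))
# -> <http://purl.org/ontology/daia/Service/Loan>

# With an extra prefix for the Service namespace, the short form works.
print(abbreviate(loan, {"daia": DAIA, "daiaservice": DAIA_SERVICE}))
# -> daiaservice:Loan
```

If the service classes lived directly in the main namespace, a single prefix declaration would cover everything.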

Minor comments
  • The graph model needs an update: it shows the dct:spatial property linking an Item to a Storage. This is neither mentioned anywhere in the ontology nor a correct use of the property.
  • To allow a more concrete discussion, I provide an example RDF description in a separate post.