Now here’s what I’m talking about

Hamish Harvey has an interesting post on RDF and bibliographic metadata, complete with an example.

Let’s take a look:

    <biblio:Publication rdf:nodeID="oshinfDAbbott">
        <dc:title>Discussion of The relevance of Open Source to Hydroinformatics by Hamish Harvey and Dawei Han</dc:title>
        <biblio:author rdf:nodeID="abbott" />
        <biblio:availableFrom rdf:resource="http://www.iwaponline.com/jh/004/jh0040219.htm" />
        <biblio:partOf rdf:nodeID="jhinf5.3" />
        <biblio:startPage rdf:datatype="http://www.w3.org/2001/XMLSchema#int">203</biblio:startPage>
        <biblio:endPage rdf:datatype="http://www.w3.org/2001/XMLSchema#int">206</biblio:endPage>
        <!-- Something more specific that "cites" (such as "is discussion
        of") would be better here. Just imagine that publishers could be persuaded
        to generate this information. Then when you find an interesting article
        you could easily search for later discussion! Then again, publishers could
        easily add a forward hyperlink from an article to later discussion now, but
        don't necessarily do so. -->
        <biblio:cites rdf:nodeID="oshinf" />
    </biblio:Publication>

I like this, but I do think Hamish could draw more on Dublin Core (DC) and use vCard for names (as he suggests himself). With respect to page ranges, I would:

  1. wrap them in the “partOf” element where they belong
  2. make them more generic (for legal scholars or film scholars, who do not deal only with pages)

The “cites” element, it seems to me, can be captured in MODS with the relatedItem “isReferencedBy” structure. DocBook has something similar. Doesn’t DC as well?

Regarding MODS, he observes:

“This close relationship with MARC appears to carry over a lot of cruft, and I fear that MODS as a result is just too confusing to see widespread adoption in personal tools.”

There is indeed a lot in MODS that is unnecessary for personal bibliographic management. Still, it seems to me better to build around something that offers too much than around something that offers too little. The key structures in MODS could easily be implemented in a nice clean RDF model and/or XML Schema:

  1. name with role (broader than just “author”)
  2. titleInfo
  3. originInfo (broader than “publisher”)
  4. location (for online and archival holding information)
  5. relatedItem (absolutely key for representing hierarchy)
  6. part (subelement of relatedItem, can capture all manner of details associated with parts: volume and issue numbers, but also paragraphs, sections, time segments, etc.)
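To make the list a little more concrete, here is a rough Python sketch of those six structures. The class and field names are my own invention for illustration (they loosely echo MODS element names but are not an official binding), and the host journal title is a placeholder because the excerpt above only refers to it by node ID.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Name:
    """A contributor plus the role played (broader than just "author")."""
    family: str
    given: str = ""          # left empty when the source does not supply it
    role: str = "author"


@dataclass
class Extent:
    """A generic range: the unit may be pages, paragraphs, minutes, etc."""
    unit: str
    start: str
    end: str


@dataclass
class Part:
    """Details that locate an item inside its container."""
    volume: Optional[str] = None
    issue: Optional[str] = None
    extent: Optional[Extent] = None


@dataclass
class Item:
    """One bibliographic record (hypothetical model, not a MODS API)."""
    title: str                                    # titleInfo
    names: List[Name] = field(default_factory=list)
    origin: Optional[str] = None                  # originInfo (publisher and more)
    location: Optional[str] = None                # URL or archival holding
    related: List["RelatedItem"] = field(default_factory=list)


@dataclass
class RelatedItem:
    """A typed link to another item, with an optional part description."""
    type: str                                     # e.g. "host" for the container
    item: Item
    part: Optional[Part] = None


# Hamish's example, with the page range moved into the link to the host.
journal_issue = Item(title="(host journal issue, placeholder)")
discussion = Item(
    title=("Discussion of The relevance of Open Source to "
           "Hydroinformatics by Hamish Harvey and Dawei Han"),
    names=[Name(family="Abbott")],
    location="http://www.iwaponline.com/jh/004/jh0040219.htm",
    related=[
        RelatedItem(
            type="host",
            item=journal_issue,
            part=Part(extent=Extent(unit="pages", start="203", end="206")),
        )
    ],
)
```

Because the extent carries its own unit, the same structure would serve a film scholar citing a time segment or a legal scholar citing paragraphs.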

A good RDF representation of bibliographic metadata might thus start with DC, and draw in structures from MODS where needed. I think Hamish is on the right track.

I’ve started to put together a document that shows the structural relationships that define citation formatting. I’ll add to it over time.

My argument is that the structure of citations closely relates to the structure of MODS metadata (which itself is influenced by DC).

Note: you will no doubt need a browser with good CSS support to view it.

Hamish also mentioned the tricky name issue:

“Just here is an example of the above mentioned conflict. Instead of author names as strings, I record a reference to a node representing that author, attached to which are details of the authors name. When I need to generate a reference list, how can I ensure that the author’s name is rendered correctly in each instance? Somehow with the link from publication to author needs to be recorded the way the name was listed on the publication itself (Hamish Harvey, or H. Harvey?)”

I don’t think this is quite right. MODS has parsed names, with separate parts for family name, given name, and terms of address (president, etc.). It also has a displayForm element, which could be used to hold the full unparsed name, e.g.:

John Q. R. Smith, III

The trick is that this field ought to store the full name not as it is rendered in some journal style (which often bizarrely mangles names for absolutely no reason), but rather as the authors themselves intend it to be represented.

The job of handling hugely variable abbreviation for output formatting really belongs to the formatting engine and the style specification, not the metadata. If I record an article from one journal and cite it in another with a different style, the spec for the latter is what determines how to render the name.
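As a minimal sketch of that division of labour, assume the metadata stores the name once, in parsed form, and each citation style supplies its own rendering function. Everything here (the `PersonName` class, the style names, the three functions) is hypothetical and only meant to illustrate the idea, not to describe any existing formatter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PersonName:
    """The name as recorded in the metadata, independent of any style."""
    given: List[str]     # given names in order, as the author writes them
    family: str
    suffix: str = ""


def _with_suffix(text: str, name: PersonName) -> str:
    """Append the suffix (e.g. "III") after a comma when there is one."""
    return f"{text}, {name.suffix}" if name.suffix else text


def _initials(name: PersonName) -> str:
    """Reduce every given name to its first letter plus a period."""
    return " ".join(part[0] + "." for part in name.given)


def render_full(name: PersonName) -> str:
    return _with_suffix(" ".join(name.given + [name.family]), name)


def render_initials(name: PersonName) -> str:
    return _with_suffix(f"{_initials(name)} {name.family}", name)


def render_family_first(name: PersonName) -> str:
    return _with_suffix(f"{name.family}, {_initials(name)}", name)


# Hypothetical style table: the citing journal's spec picks the renderer.
STYLES = {
    "full": render_full,
    "initials": render_initials,
    "family-first": render_family_first,
}

smith = PersonName(given=["John", "Q.", "R."], family="Smith", suffix="III")

print(STYLES["full"](smith))          # John Q. R. Smith, III
print(STYLES["initials"](smith))      # J. Q. R. Smith, III
print(STYLES["family-first"](smith))  # Smith, J. Q. R., III
```

The record never changes; only the style chosen at output time does, which is why the link from publication to author does not need to remember how a particular journal printed the name.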

Creative Commons License