URIs as Names

One of the things that is confusing for those new to the semantic web—either in its full-blown RDF guise, or in other contexts like microformats—is URIs, which are often used simultaneously to identify things, and to locate them. I still find myself a bit confused by this, though the fog is lifting.

Norm Walsh has a nice, clear overview of the issue, and his conclusion is:

Time and again, we see individuals and organizations inventing new URI schemes in order to tackle the problem of “names” versus “addresses”. That is, they want to provide some sort of a globally unique identifier for “This Thing” independent of where representations of that thing might reside. Almost inevitably, these individuals and organizations fall into the trap of thinking that an “http” URI is somehow an address and not a name and is, therefore, inappropriate for their purpose. They are mistaken. I used to believe this too and I was wrong. A new URI scheme is not necessary, nor does it actually solve the problem.

This is an interesting issue for me, as I’ve found myself using URNs and INFO URIs to represent standard identifiers in my bibliographic data.

I consider the need for a common identifier infrastructure critical enough that I’ve argued it ought to be a requirement in the new OpenDocument metadata support that we use URIs for identification, always. I think it is a short-sighted mistake that Microsoft is not using URIs to identify bibliographic records in Word 2007.

But we are then left with what is in effect a social question: which URIs to use?

For bibliographic metadata, this becomes somewhat complicated in the face of a myriad of different identifiers, controlling authorities, and URIs.

Take a book, for example, which I asked Norm about in comments. I typically use an ISBN as a widely adopted, reasonably robust identifier. ISBNs are far from perfect (sometimes an ISBN is in fact not unique, and it refers to physical manifestations that are conceptually somewhat different from the more abstract things that scholars cite), but they are much better than a lot of alternatives.

But there are many ways to encode ISBNs as URIs. Let’s take my book as an example, with an ISBN of 0415948738. The following are all perfectly valid ways to represent this as a URI:

  1. urn:isbn:0415948738
  2. http://worldcat.org/wcpa/isbn/0415948738
  3. info:isbn/0415948738
  4. http://www.amazon.com/gp/product/0415948738
  5. http://isbn.nu/0415948738

As you can imagine, the list could be much longer, particularly if you include redirect URLs as options.
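To make the point concrete, here is a minimal Python sketch that spells out the five forms above for any ISBN. The function name `isbn_uris` is my own invention for illustration, and the URL patterns are simply copied from the list; nothing here checks whether a given address still resolves.

```python
def isbn_uris(isbn: str) -> list[str]:
    """Return the five URI spellings listed above for a single ISBN.

    The patterns are taken verbatim from the list in the post; this
    function only builds strings and makes no network requests.
    """
    return [
        f"urn:isbn:{isbn}",
        f"http://worldcat.org/wcpa/isbn/{isbn}",
        f"info:isbn/{isbn}",
        f"http://www.amazon.com/gp/product/{isbn}",
        f"http://isbn.nu/{isbn}",
    ]


if __name__ == "__main__":
    for uri in isbn_uris("0415948738"):
        print(uri)
```

One identifier, five names: any application that wants to match my records against someone else's has to know they all mean the same book.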

Aside: you would think the Library of Congress would have a smart URI system like WorldCat, but I’ve not managed to find it.

So how does one decide which to use? I need ids that are stable and unique, and which I have some confidence others out there in the world of bibliographic metadata—in particular applications developers—might also use or understand.

My solution has been to use URNs; the first option above. I really don’t care if it resolves to some URL or not, and it’s easy enough to parse and then use to grab records from different locations. I also use URNs for encoding links to periodicals for the same reason: they provide a convenient abstraction.
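As a rough illustration of what I mean by "easy enough to parse", the Python sketch below pulls the ISBN out of a urn:isbn name and plugs it into a couple of lookup addresses. The helper names (`isbn_from_urn`, `lookup_urls`) and the `LOOKUP_TEMPLATES` table are hypothetical, and the two URL patterns are just the WorldCat and isbn.nu forms from the list above. Treat it as a sketch under those assumptions, not a finished resolver.

```python
import re

# The "urn" prefix and the "isbn" namespace identifier are matched
# case-insensitively; hyphens inside the ISBN itself are tolerated.
_URN_ISBN = re.compile(r"^urn:isbn:([0-9Xx-]+)$", re.IGNORECASE)

# Hypothetical table of lookup locations, using patterns from the post.
LOOKUP_TEMPLATES = {
    "worldcat": "http://worldcat.org/wcpa/isbn/{isbn}",
    "isbn.nu": "http://isbn.nu/{isbn}",
}


def isbn_from_urn(urn: str) -> str:
    """Extract a bare ISBN (hyphens removed) from a urn:isbn name."""
    match = _URN_ISBN.match(urn.strip())
    if match is None:
        raise ValueError(f"not a urn:isbn name: {urn!r}")
    return match.group(1).replace("-", "").upper()


def lookup_urls(urn: str) -> dict[str, str]:
    """Map one urn:isbn name to the addresses in LOOKUP_TEMPLATES."""
    isbn = isbn_from_urn(urn)
    return {
        name: template.format(isbn=isbn)
        for name, template in LOOKUP_TEMPLATES.items()
    }


if __name__ == "__main__":
    for name, url in lookup_urls("urn:isbn:0415948738").items():
        print(name, url)
```

The URN stays the name in my data; the addresses are derived on demand and can change without touching the records.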

I still think this decision is right, though Norm’s post and the subsequent comments have me scratching my head again.

For example, DOIs are even better identifiers than ISBNs. But DOIs aren’t registered as URNs, so that’s no good. I’m then left with either using the info URI scheme (as above, but using the doi prefix), or the HTTP address for a resolver.
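Here is a small Python sketch of those two options for a DOI. Everything named here is an assumption on my part: the function `doi_uris` is hypothetical, the sample DOI `10.1000/182` is only a placeholder, and I have picked `https://doi.org/` as the resolver address for illustration.

```python
from urllib.parse import quote

# Assumed resolver base for illustration; other resolvers could be swapped in.
RESOLVER_BASE = "https://doi.org/"


def doi_uris(doi: str) -> dict[str, str]:
    """Return the info-URI and HTTP-resolver spellings of one DOI.

    Characters outside the unreserved set (other than "/") are
    percent-encoded so the result is a syntactically valid URI.
    """
    encoded = quote(doi, safe="/")
    return {
        "info": f"info:doi/{encoded}",
        "http": f"{RESOLVER_BASE}{encoded}",
    }


if __name__ == "__main__":
    # Placeholder DOI, not a record from my own data.
    for kind, uri in doi_uris("10.1000/182").items():
        print(kind, uri)
```

Either way the DOI itself is still in there, so converting between the two forms later is mechanical.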

Mind you, I have no doubt that any kind of 21st century metadata interoperability depends on the use of URIs for identification. But there’s no denying there are social issues involved in getting agreement on which names to use.

Suffice it to say I’ll need to clarify this all, particularly as we move forward on the ODF metadata work.

Update: I mentioned in comments the possibility of using an OCLC number for books. At first glance, that might actually be better. The URI for my book, then, is http://www.worldcat.org/oclc/60500684. This has one obvious advantage over an ISBN in that the same identifier applies to both the hardcover and the softcover.

8 Comments

  1. Rich says:

    Using an HTTP DOI gives you a way to fetch a representation from a resource. OTOH, you might be actually referring to that representation, which is not so good!

    The tradeoff is between precision & independence (URN) and dereferenceability (HTTP). In the case of the ISBN, there are so many known ways of dereferencing that I think the URN is the best choice.

  2. Hamish Harvey says:

    Hi Bruce,

    A thought, which I think belongs here rather than over at Norm’s weblog. You suggest there that, “ISBNs may be problematic, but they’re certainly much more useful than the old BiBTeX methods: doe99.” Surely not; they are simply differently useful.

    A free form, local identifier of the latter form allows me to insert citations very easily without any fancy user interface support. Names and years are things that humans are good at remembering; 6 digit numbers or letter/digit combinations aren’t.

    ISBNs on the other hand are useful (though far from perfect) in providing an externally agreed identifier for a book.

    In general, there will be room for multiple identifiers for the same cited thing, all of which have different properties. Some may have sets of properties which make them appropriate for use as a URI denoting the thing; others may simply be associated with it. I’d see both user-defined keys and ISBNs in that latter category:

    _:book bib:isbn "123456" .
    _:book bib:key "doe99" .

    You could of course make either of these into a URI and use it to denote the book directly, but keys are only locally meaningful, and ISBNs are reused, so neither choice would seem ideal.

    Having the ISBN there explicitly (rather than embedded in an “opaque” identifier) also of course means you can build any of the URIs you mention if you want to, and different URIs as resolution services (intended or accidental) come and go.

    Of course an RDF or OWL schema need not (probably cannot?) mandate the use of an explicit identifier at all, though extra-RDF processing machinery can make it necessary. Use of RDF does mandate that any identifier used must be a URI.

    Incidentally, what might Amazon do about the reuse of ISBNs when generating ASINs? On what timescale does it occur?

    Cheers, Hamish

  3. Bruce D'Arcus says:

    Hamish, you’re right, but how common is it to reissue ISBNs? I’ve never run into it.

    Another alternative might be something like an OCLC accession number, which can be represented as an INFO URI.

    My concern here is really citation linking. If I add a citation to my Word 2007 or OpenOffice 3.0 document, how is it identified internally? In my DocBook source these days, I tend to use urn:isbn, info:doi and http (for web stuff proper).

    We discussed some of this on the OOoBib dev list awhile back.

  4. Thom Hickey says:

    Actually ISBN reuse/misuse happens enough so that librarians run into it fairly regularly. One of the worst examples is 0123456789 that probably gets put in as a place holder and then ends up on the final item.

  5. Bruce D'Arcus says:

    OK, Thom, you convinced me.

    So then, I know you work for OCLC and so are not entirely neutral on these questions, but it seems your ids are better alternatives than ISBNs? If yes, should I recommend info:oclc/… or http://www.worldcat.org/oclc/…? And how do your ids compare to, say, LoC LCCN numbers?

    Would be exceedingly useful to be able to ping a web server with an ISBN and get a nice XML (better yet RDF/XML) representation that included an OCLC and maybe work identifier ;-)

  6. If you want to link to an ISBN in the Library of Congress catalogue you can use this syntax: http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?db=local&SearchArg=ISBNHERE&SearchCode=STNO&CNT=25&HIST=1 e.g. http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?db=local&SearchArg=0415948738&SearchCode=STNO&CNT=25&HIST=1 Not very intuitive, I know, and there is no guarantee they will continue supporting it if they upgrade their system etc.

    You can also link by bib id, which I’m guessing is similar to the OCLC number. The problem is of course that this number is different for every library…

    http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?bbid=13975008

  7. Bruce D'Arcus says:

    James - right, exactly my point. I would never rely on the existing LoC URIs to identify anything.

    What I want to see from the LoC is smart (REST-friendly), consistent, stable URIs, just like the new worldcat.org site. Something like http://catalog.loc.gov/isbn/23431254.

  8. [...] This deserves its own post. Stu Weibel (from OCLC and DC) has a whole series of responses to my post on URIs as Names. Oops, don’t forget this and this! [...]


Creative Commons License