Web Services and Distributed Citation Processing

One of the ideas I stumbled on when writing CiteProc, my XSLT-based citation processor, is that citation processing can be totally decoupled from metadata storage. A simple example of this is how I processed my recently completed book: by letting Saxon query an eXist XML DB over HTTP and using the returned MODS metadata to format the citations on-the-fly.

That was great because I didn’t have to write any code but the XSLT, and it worked! But things start to get more interesting when you think beyond this fairly simple model. Consider two examples that came out of collaborations with other projects:

In the first, Matthew Dovey at Oxford put together a simple web service that takes four parameters: document URL, data store type (eXist’s XQuery-over-HTTP or SRU), data store URL, and citation style. Here’s an example, where the document is on one server, and the bibliographic metadata is stored on another.
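To make those four parameters concrete, here is a rough Python sketch of how a client might assemble a request to a service like this. Everything specific in it is an assumption made for illustration: the endpoint, the parameter names (`document`, `storeType`, `store`, `style`), and the example.org hosts are all hypothetical, and the real service may look quite different.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_format_request(service_url, document_url, store_type, store_url, style):
    """Build a GET URL for a hypothetical citation-formatting service.

    The parameter names are invented for this sketch; they are not
    taken from the actual service.
    """
    if store_type not in ("exist", "sru"):
        raise ValueError("store_type must be 'exist' or 'sru'")
    params = {
        "document": document_url,   # where the document lives
        "storeType": store_type,    # how to talk to the metadata store
        "store": store_url,         # where the metadata lives
        "style": style,             # which citation style to apply
    }
    # urlencode percent-escapes the nested URLs so they survive as values.
    return service_url + "?" + urlencode(params)


# Document on one (made-up) server, metadata on another.
request_url = build_format_request(
    "http://service.example.org/format",
    "http://docs.example.org/article.xml",
    "sru",
    "http://biblio.example.net/sru",
    "apa",
)
print(request_url)
```

The point of the sketch is only that the document and the metadata are addressed independently, so neither has to know where the other is stored.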

The second example is similar, and a demo is included in the CiteProc release archive. If you run the docbook-test-sru-refbase.xml example with the refbase-xhtml.xsl stylesheet, the processor (Saxon for now) extracts the citations, constructs an SRU query, and issues it to a test server somewhere in Germany. The server returns the corresponding MODS records, which are then formatted, once again, on the fly.
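The extract-then-query step can be sketched outside XSLT too. The Python below is not how CiteProc does it; it is a minimal illustration that assumes citations appear as `biblioref` elements with a `linkend` key, and that the server exposes a CQL index (here `dc.identifier`, a guess) that matches those keys. The `operation`, `version`, `query`, `recordSchema`, and `maximumRecords` parameters are the standard SRU searchRetrieve ones; the server URL is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlsplit, parse_qs

SAMPLE_DOC = """<article xmlns="http://docbook.org/ns/docbook">
  <para>See <biblioref linkend="doe2005"/> and <biblioref linkend="smith1999"/>;
  compare <biblioref linkend="doe2005"/>.</para>
</article>"""


def citation_keys(docbook_xml):
    """Return the unique citation keys in document order."""
    keys = []
    for el in ET.fromstring(docbook_xml).iter():
        local_name = el.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
        key = el.get("linkend")
        if local_name == "biblioref" and key and key not in keys:
            keys.append(key)
    return keys


def sru_query_url(base_url, keys, index="dc.identifier"):
    """Build an SRU searchRetrieve URL asking for MODS records.

    The index name is an assumption; a real server defines its own.
    """
    cql = " or ".join('{}="{}"'.format(index, key) for key in keys)
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "recordSchema": "mods",
        "maximumRecords": str(len(keys)),
    }
    return base_url + "?" + urlencode(params)


keys = citation_keys(SAMPLE_DOC)
sru_url = sru_query_url("http://sru.example.org/search", keys)
print(keys)
print(sru_url)
```

Collapsing all the keys into a single CQL query keeps it to one round trip per document, which matters when the server is on the other side of the network.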

OK, this is starting to look very cool, and very useful. All of a sudden we have an easy, standards-based path to interoperability!

But given that I’ve been thinking about RDF again lately, I’m imagining extending this further. A simple solution would be a web service that could take a list of references, query distributed RDF stores, and return a collection of MODS records for processing. A more radical solution might be to use, say, a SPARQL XSLT extension to work with the RDF directly from within XSLT.
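The core of the simpler option is just a lookup that fans out across stores. Here is a very rough Python sketch of that logic, with big assumptions: each store is modeled as a plain function from a reference key to a MODS string (or `None`), and in-memory dictionaries stand in for what would really be remote RDF stores plus an RDF-to-MODS conversion. None of the names come from an existing service.

```python
def collect_records(keys, stores):
    """Look up each key in the stores, in order, taking the first hit.

    `stores` is a list of callables mapping a key to a MODS record
    string, or None when the store does not know the key.
    Returns (found, missing): a dict of key -> record, and a list of
    keys that no store could resolve.
    """
    found = {}
    missing = []
    for key in keys:
        for lookup in stores:
            record = lookup(key)
            if record is not None:
                found[key] = record
                break
        else:
            # The inner loop finished without a hit in any store.
            missing.append(key)
    return found, missing


# Stand-ins for two independent stores (hypothetical data).
library_store = {"doe2005": "<mods><titleInfo><title>A</title></titleInfo></mods>"}
personal_store = {"smith1999": "<mods><titleInfo><title>B</title></titleInfo></mods>"}

found, missing = collect_records(
    ["doe2005", "smith1999", "nobody2001"],
    [library_store.get, personal_store.get],
)
print(sorted(found))
print(missing)
```

Reporting the unresolved keys separately seems important here, since with distributed stores a missing record is an ordinary case rather than an error.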

In either case, my hunch is that there’s a lot of possibility in this idea, and that the old notion of every user having to store and manage their own citation metadata—or conversely that it all ought to be stored on a centralized server—is one that is seriously holding back innovation in this space. Why do I even have to maintain my own citation metadata anyway?



Creative Commons License