Citation Metadata Workflow

I got the following question in my inbox recently. I’ve edited it slightly to protect anonymity:

At some point, it would be useful to have a higher level “workflow” diagram of your vision. This would be especially helpful starting with the … research process and then ending in publication/distribution of a manuscript (analog (printed) and digital). You might even include the peer review cycle as well. What I’m trying to grasp is the various vectors or entry points for citations (how they are gathered/collected, organized and then deployed).

This is a great suggestion, but after sitting in front of OmniGraffle for a bit, I realize it’s actually quite difficult to diagram clearly both how much of a mess the current landscape is and the nirvana that I’d like to get to.

Alf Eaton did a decent job of capturing some of this a while back. It’s worth noting, however, that Alf’s diagram models the workflow of a hard scientist, who tends to cite only secondary academic literature. In many other fields in the social sciences and humanities–where one often cites primary data–the universe of citable content is significantly broader. Every time I come across a news article on the New York Times or BBC websites, or across information in a Lexis-Nexis database, that is potentially citable content for me.

What I want to show in my diagram, then, is something like the following:

  1. scholarly data and its metadata can come from many sources
  2. it has to transfer between different kinds of applications, and across formats
  3. because the current software landscape in this area is so fragmented and lacks any real standards, the content/metadata link is incredibly fragile, and authors almost always need to maintain those links manually (and therefore incredibly awkwardly)
  4. because applications and their file formats are similarly dumb, that link is again broken when authors release their work to the world, often through publishers

Leigh Dodds’ latest is on a similar theme: the metadata density of academic texts–he calls them “palimpsests”–and the need to break them open. As he writes:

I likened the process of authoring a scientific paper to that of the creation of a palimpsest. Starting from original research results and working through the synthesis of a cogent explanation of the results or discovery, at each step the content becomes more abstracted from the original results, the previous work being “lost” to the reader.

Data is presented in pre-analysed forms and is not amenable to reuse. Like the palimpsest the raw data has not really been lost, its just not (easily) accessible to the reader.

If the scriptio inferior, the underlying data, were made available to the reader, then there a lot of interesting possibilities arise.

Nice to have smart people in good places.

In his presentation, BTW, Leigh notes the utility of formats like OpenDocument for facilitating this sort of integration. Indeed, this is why the current metadata discussion is so important. So if I can’t quite work out a diagram that details all the links in the currently broken chain of the academic workflow, I can go back to this diagram I used to encapsulate the larger vision I presented at the Access 2005 conference:

2 Comments

  1. Bruce D'Arcus says:

    Trackbacks seems to be screwed up here, but Alf has an updated diagram here.

  2. OpenOffice Bibliographic Project

    The bibliographic project will design and build an easy to use and comprehensive bibliographic facility within OpenOffice. It will be easy to use for the casual user, but will meet all the requirements of the professional and academic writer. The


Creative Commons License