Archive for November, 2003

DocBook Example

Posted in General on November 11th, 2003 by darcusb – Comments Off

Markus has put together a customization layer that implements our RFE in DocBook. Here’s a minimal example document:

<!DOCTYPE book PUBLIC "-//Markus Hoenicka//DTD DocBook XML V4.1.2 extended citations//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<book>
  <title>The Example Book</title>
  <chapter>
    <title>First chapter</title>
    <para><citation renderas="full"><biblioref linkend="miller1999" unit="page" start="22"/></citation></para>
    <para>This is a <quote>quote<biblioref linkend="doe2001" unit="page" start="22"/></quote></para>
  </chapter>
</book>

Why we need better bibliographic metadata in XML

Posted in General on November 11th, 2003 by darcusb – 2 Comments

One frustration I have with all current bibliographic systems – both free and commercial – is their metadata models.

Consider the following scenario:

I go to an archive and find a report in a collection.

Question: How do I represent this in existing standards?

Answer: Awkwardly.

In Endnote, for example, there is – as in most of these standards – a report reference type. So I can code most of the information there. However, there are no fields to represent archival holdings, which are typically handled by a separate reference type of “manuscript.” Yet this makes little sense; a report is still a report, whether it is in an archive or not.

What about a postcard from the same archive? Here one generally falls back on a “miscellaneous” reference type, and so loses metadata in the process.

How about a newspaper or magazine article republished in a book? Here all of the metadata models – DocBook, Endnote, BibTeX, etc. – fall apart.

By contrast, MODS is cleanly and rigorously structured, and so can handle all of these examples gracefully. The first example is just a report contained in a collection. The second has the exact same structure, but with different resource type and genre values. The third is just an article contained in a book, but originally contained in a newspaper.
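
As a rough sketch of the first case (a report held in an archival collection), a MODS record might look something like the following. The titles, date, and archive name are invented placeholders, not real data; the element names, the `host` relation, and the `collection` attribute come from the MODS version 3 schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: all titles, dates, and locations are placeholders -->
<mods xmlns="http://www.loc.gov/mods/v3">
  <!-- The report itself: still typed as a report, wherever it is held -->
  <titleInfo>
    <title>Placeholder Report Title</title>
  </titleInfo>
  <typeOfResource>text</typeOfResource>
  <genre>report</genre>
  <originInfo>
    <dateIssued>1952</dateIssued>
  </originInfo>

  <!-- The archival collection that contains ("hosts") the report -->
  <relatedItem type="host">
    <titleInfo>
      <title>Placeholder Collection Title</title>
    </titleInfo>
    <typeOfResource collection="yes">mixed material</typeOfResource>
    <location>
      <physicalLocation>Placeholder University Archives</physicalLocation>
    </location>
  </relatedItem>
</mods>
```

For the postcard, the same skeleton would apply with only the values changed, for instance `still image` for the resource type and `postcard` for the genre. The republished article would presumably use one `relatedItem` of type `host` for the book and another of type `original` for the newspaper.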

I have been trying to convince people of the central importance of the model for bibliographic metadata. Thankfully, it seems I am having some success. To wit, the bibliographic project at OpenOffice plans to adopt MODS as its primary metadata format. Likewise, Chris Putnam is moving to MODS as his XML format for his bib conversion tools.

There’s still a lot more to be done, though. How about an elegant form-based web interface for MODS data entry, for example?

Proposed Solutions: DocBook, etc.

Posted in General on November 11th, 2003 by darcusb – 2 Comments

During the past year, I’ve been talking a fair bit to people with similar perspectives, including Markus Hoenicka (author of RefDB) and Peter Flynn. One result of these discussions is this RFE to improve citation support in DocBook.

The proposal is scheduled for discussion at the next TC meeting, which is sometime around November 20, so if you’re reading this and interested in these issues, please submit your comments at the above link!

We plan to make a similar proposal to the TEI group. If nothing else, it is my hope that we’ll see this functionality added to the next-generation versions of these schemas, both of which will be developed in RELAX NG.

The second project the three of us have been involved with is an all-XSLT bibliographic and citation processing framework called Bibliofile. Aside from a simple XML-based style spec, Bibliofile also has an XSL-based input-driver system. When finished, it ought to be possible to use it with any XML input data (including MODS), as well as with any document schema: DocBook, TEI, OpenOffice, even Word 2003.

Conclusion 5: XML document standards are poor for scholarly writing

Posted in General on November 11th, 2003 by darcusb – Comments Off

The unfortunate reality is that XML does not yet have an equivalent to BibTeX, much less anything better. Neither of the two dominant XML authoring standards – DocBook and TEI – has the level of citation and bibliographic support necessary for scholarly writing.

Even my beginning students are required to cite page numbers when they quote material, yet neither DocBook nor TEI supports structural encoding of this information.

The question of structure vs. presentation becomes essential to academic publishing when you consider the diversity of citation styles. If I submit an article to one journal, I might need to cite like so (Smith, 1999:33-35). If I submit to another journal I might need to have instead (Smith 1999, pp33-5). For still another journal those citations need to be footnotes.
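
To make the point concrete, here is a minimal XSLT 1.0 sketch of how one structurally encoded citation could be rendered in either of the first two styles. The `cite` element, its attributes, and the `style` parameter are all hypothetical names made up for illustration; they are not part of DocBook, TEI, or any existing tool.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input: <cite author="Smith" year="1999" start="33" end="35"/> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical switch: 'colon' or 'pp' selects the journal's style -->
  <xsl:param name="style" select="'colon'"/>

  <xsl:template match="cite">
    <xsl:text>(</xsl:text>
    <xsl:value-of select="@author"/>
    <xsl:choose>
      <xsl:when test="$style = 'colon'">
        <!-- Author, Year:start-end -->
        <xsl:value-of select="concat(', ', @year, ':', @start, '-', @end)"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Author Year, ppstart-end -->
        <xsl:value-of select="concat(' ', @year, ', pp', @start, '-', @end)"/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text>)</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The document itself never changes; only the parameter does. This sketch does not abbreviate the end page (it would give pp33-35 rather than pp33-5) and does not attempt the footnote style, both of which a real style system would have to handle.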

So, the XML world really needs:

  1. Better citation markup.
  2. An XML equivalent to BibTeX, with styles defined in XML files.

There is also a third improvement needed: support for emerging bibliographic metadata standards, such as MODS.

Conclusion 4: XML is ideal for scholars

Posted in General on November 11th, 2003 by darcusb – Comments Off

When lamenting my problems a while back to Hans Hagen (the author of the fantastic TeX macro system ConTeXt), he cryptically said something like “look into XML.” I remember at the time having no idea where to begin.

Since then, it has become clear to me that XML is ideally suited to the needs of academic work. It is designed to precisely and flexibly encode meaning in textual documents. This is what most of my professional life involves. I write a lecture, I assign meaning and structure to that content. I write an article, I do the same. When I take notes on research sources, I often want to code that information so I can access it later.

Example:

I am reading a newspaper article relating to my research interests, and it includes a quote from an interview source that I may want to use later. I can easily do something like the following:

<para>A paragraph with a <quote name="John Doe" name-id="doej" keywords="one two three">quote</quote>.</para>

XML is designed for precisely this kind of data coding, and is far better as a markup language than LaTeX. Throw in XSLT and its ability to transform XML markup to virtually anything else, and you have nirvana: open, rich, flexible data.

Conclusion 3: LaTeX and BibTeX are no solution

Posted in General on November 11th, 2003 by darcusb – Comments Off

While I quite like TeX as a typesetting engine, LaTeX is a horrible markup language. Beyond my issues with its convoluted markup, in the world of academic publishing in the social sciences and humanities, it is absolutely imperative to be able to submit final documents that precisely conform to publisher specs; specs which demand that files be readable in Word.

What about BibTeX? It was designed for hard scientists. I cite everything from journal articles to legal and government documents and media sources. The BibTeX data model is woefully inadequate to this task.

The two issues together make LaTeX + BibTeX unsuitable for my needs.

Conclusion 2: Endnote is every bit the monopoly product Word is

Posted in General on November 11th, 2003 by darcusb – Comments Off

Any academic thinking of alternatives to Word is immediately stuck with the issue of bibliographic data. Endnote developed as a well-integrated bibliographic plug-in to Word. It makes managing complex citations and bibliographies feasible.

However, Endnote was bought by ISI Researchsoft a few years back, which now holds a monopoly on the commercial bibliographic manager market. Not only do they now own Endnote, but also Reference Manager and ProCite.

The result: glacial innovation, buggy software, and regular paid updates. Indeed, version 6 of Endnote – the first compatible with Mac OS X – came very late, buggy, and missing key features. Endnote 7 was released roughly a year later as a full paid upgrade that a) fixes bugs, and b) adds as a “new” feature the ability to process RTF documents, functionality that had been around for years before ISI Researchsoft opted to drop it in the previous release.

Enough!

Nevertheless, for anyone who uses these two applications together, it becomes quite a challenge to find alternatives.

Conclusion 1: Word is not well-suited to academic work

Posted in General on November 11th, 2003 by darcusb – Comments Off

My first conclusion – which I reached a long time ago – is that Word is not well-suited to academic work.

For me, Word is a lumbering and unstable beast of an application. When I am writing a book, I cannot afford to worry about crashes, or binary-data inflexibility. In addition, I absolutely hate supporting a law-breaking monopolist.

There is nothing particularly illuminating about this observation, but best to start somewhere.

Welcome

Posted in General on November 11th, 2003 by darcusb – Comments Off

For the past year or so I’ve been looking for an alternative to Word and Endnote for my academic writing. I am going to use this blog to post some conclusions, just in case anyone out there is looking for something similar.

While I have other blogs I use for work-related material (namely teaching), this will – at least for now – be focused on the intersections of XML and bibliographic metadata.


Creative Commons License