Citation Formatting in Word 2007
Yesterday I examined the encoding of citations and bibliographic data in Microsoft’s Open XML formats. Today I’d like to discuss another crucial piece of the puzzle, which is citation style configuration and formatting.
Examining the contents of an example document, I came across the following attribute: SelectedStyle="\APA.XSL". This naturally suggested to me they’re using XSLT to do the formatting. A quick ping to M. David Peterson confirmed it.
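For anyone who wants to poke at their own files, here is a minimal Python sketch of reading that attribute. The only thing taken from the document I examined is the SelectedStyle attribute and its value. The b:Sources root element, the namespace URI, and the selected_style helper are my assumptions for illustration, and the files you have may differ.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the bibliography part; treat it as a placeholder.
B_NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"

# Hypothetical sample: only the SelectedStyle attribute comes from the post.
SAMPLE = f'<b:Sources xmlns:b="{B_NS}" SelectedStyle="\\APA.XSL"/>'


def selected_style(xml_text):
    """Return the stylesheet file name from the root element, or None."""
    root = ET.fromstring(xml_text)
    value = root.get("SelectedStyle")
    if value is None:
        return None
    # The value carries a leading backslash, so strip it to get the file name.
    return value.lstrip("\\")


if __name__ == "__main__":
    print(selected_style(SAMPLE))
```

The attribute is unqualified, so no namespace handling is needed to read it. The namespace only matters once you start walking the child elements.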
In general, this is a very good thing. They’re using a W3C standard technology in such a way that it ought to be possible to easily enhance it, or substitute alternate implementations. So if that’s all true, kudos to Microsoft!
Not surprisingly, though, given that I have a little experience using XSLT for these purposes, I have some thoughts/observations.
The first is that bibliographic and citation formatting is pretty complicated, and fully supporting a style like APA using XSLT 1.0 is going to be really difficult. Just take a close look at the output example from citeproc for the APA style. This is hard to do even with the much more advanced capabilities of XSLT 2.0. I’d venture to say it is impossible to fully implement in XSLT 1.0 without extensions.
Even if an XSLT expert manages to program it, it will be almost impossible for even tech-savvy users to create or edit styles in any significant way. I consider my XSLT skills strong, and I find understanding how I’d modify or implement a style really difficult. The code is really complicated.
Just as a hint of the complexity, the archive with the XSLTs (some generic processing files plus 10 styles) weighs in at 2.6 MB (!). The APA.XSL file alone is a whopping 340 KB. By contrast, the lib directory (which contains all the XSLT files) of the XSLT 2.0 version of citeproc weighs in at 584 KB. Though this doesn’t include the CSL files to configure the styles, those are each quite small (my APA style, which AFAIK fully implements the spec, is only 8 KB).
But what this does suggest to me is that it ought to be easy to swap in citeproc, or for Microsoft to port it to XSLT 1.0 if they like. The benefits to using a domain language like CSL for styling are significant. It becomes easy for users to create new styles, and for developers to create tools for it.
In other news, the XSLT gives insight into the data model, and things are a little better than I’d worried about earlier. The range of reference types, for example, is broader than those in BibTeX. OTOH, types such as “ElectronicSource” start to look quite dated. Most sources these days can be electronic, and the design should reflect that. Also, the model is indeed flat, with elements like b:JournalName.
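To make "flat" concrete, here is a rough Python sketch. Only b:JournalName comes from the XSLT I looked at. The other element names, the namespace URI, the sample values, and both helper functions are assumptions for illustration. The second function shows one possible relational shape, where the journal becomes its own record. It is not how Word or citeproc actually models things.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; treat it as a placeholder.
B_NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"

# Hypothetical source record. Only b:JournalName is taken from the post.
SAMPLE = f"""<b:Source xmlns:b="{B_NS}">
  <b:SourceType>JournalArticle</b:SourceType>
  <b:Title>An Example Article</b:Title>
  <b:JournalName>Journal of Examples</b:JournalName>
  <b:Year>2006</b:Year>
</b:Source>"""


def flat_fields(xml_text):
    """Read every child element into one flat dict keyed by local name."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        local_name = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        fields[local_name] = (child.text or "").strip()
    return fields


def nest_container(fields):
    """Move the journal out of the article record into its own nested record."""
    nested = dict(fields)
    journal = nested.pop("JournalName", None)
    if journal is not None:
        nested["container"] = {"type": "Journal", "title": journal}
    return nested


if __name__ == "__main__":
    flat = flat_fields(SAMPLE)
    print(flat)
    print(nest_container(flat))
```

In the flat form, the journal is just another property of the article. In the nested form, it is a thing in its own right that other references could share.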