Bleeding-Edge Workflow

Over on the OpenOffice bibliographic project user list, we’ve been having a conversation that started with me asking if anyone found the existing bibliographic support adequate. Every single answer was a resounding “no!”

One user complained that he may need to move from his current Linux-only solution to a dual-boot setup with Word and Endnote. Always wanting to encourage people to stay with free software, I told him to contact me off-list if he wanted to try a DocBook-based alternative.

So, here are my suggestions:

1) Get a good XML editor. I use both oXygen and Emacs nXML mode, but I’d recommend oXygen for new users. It has a free 30-day demo and is otherwise reasonably priced. Let’s assume oXygen for these instructions.

2) Download the appropriate DocBook DTD or RELAX NG schema. I despise DTD-based toolchains, so I recommend RELAX NG. You want v4.4 or the in-development v5, since both include enhanced citation support. The latter is available here. When you create a new document, use this schema.

3) You need to convert a bunch of references from Endnote? No problem; download these tools.

Export your references from Endnote in Endnote or Refer format, and use end2xml to convert them to MODS XML, preferably with the -s and -un options, so that each record becomes a separate file and the output is all Unicode.

4) Next, download and install the eXist XML DB. Using the Java client, create a collection called mods and load all your documents into it. Point your web browser here and make sure everything works.

5) To be able to process your documents with the citations, you need to download CiteProc.

6) In oXygen, with your example document open, set up a “transformation scenario” that uses one of the stylesheets in the citeproc/xsl/document directory; let’s say dbng-xhtml.xsl. Make sure you choose the “Saxon 8” processor option, and give it a “citation-style” parameter, say “author-year.”
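
For readers who would rather script steps 3, 4 and 6 than click through GUIs, here is a rough Python sketch. It is not part of CiteProc, eXist or end2xml, and it rests on several assumptions: that eXist’s REST interface answers at http://localhost:8080/exist/rest/db and accepts a PUT into the mods collection (you may need credentials, and the collection may need to exist first), that each end2xml output file holds either a bare mods record or a modsCollection wrapper, and that your Saxon 8 jar accepts the older positional command line (-o output, then source, stylesheet, then name=value parameters). The file names and all function names are hypothetical.

```python
"""Scripted stand-ins for steps 3, 4 and 6 (a sketch under stated assumptions)."""
import subprocess
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import quote

MODS_NS = "http://www.loc.gov/mods/v3"
# Assumption: default REST address of a local eXist install; check your own.
EXIST_REST = "http://localhost:8080/exist/rest/db"


def count_mods_records(xml_bytes):
    """Step 3 check: how many MODS records does one end2xml output file hold?"""
    root = ET.fromstring(xml_bytes)
    if root.tag == f"{{{MODS_NS}}}mods":
        return 1
    # Otherwise assume a <modsCollection> wrapper around <mods> records.
    return len(root.findall(f"{{{MODS_NS}}}mods"))


def put_request(xml_bytes, filename, collection="mods", base=EXIST_REST):
    """Step 4: build (without sending) an HTTP PUT that stores one document."""
    return urllib.request.Request(
        f"{base}/{collection}/{quote(filename)}",
        data=xml_bytes,
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


def load_directory(directory, collection="mods"):
    """Step 4: check and upload every *.xml file in `directory`."""
    for path in sorted(Path(directory).glob("*.xml")):
        data = path.read_bytes()
        if count_mods_records(data) != 1:
            print("skipping", path.name, "(expected exactly one record)")
            continue
        with urllib.request.urlopen(put_request(data, path.name, collection)) as resp:
            print(path.name, resp.status)


def saxon8_command(source, stylesheet, output, params, jar="saxon8.jar"):
    """Step 6: command line in Saxon 8's older positional syntax (assumed)."""
    cmd = ["java", "-jar", jar, "-o", output, source, stylesheet]
    # Stylesheet parameters follow the stylesheet as name=value pairs.
    cmd += [f"{name}={value}" for name, value in params.items()]
    return cmd


def run_transformation(source, output):
    """Step 6: run dbng-xhtml.xsl with the author-year citation style."""
    cmd = saxon8_command(
        source,
        "citeproc/xsl/document/dbng-xhtml.xsl",
        output,
        {"citation-style": "author-year"},
    )
    subprocess.run(cmd, check=True)
```

Nothing is sent or run until you call load_directory or run_transformation yourself; keeping request construction separate makes it easy to inspect the URL before touching your database.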

That’s it (OK, it’s a little long!). When you run the transformation, the XSLT processor will go through your document looking for citations, then request all those documents from eXist, and format them.
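
The transformation just described runs in three stages: collect the citations, fetch the referenced records in one request, and format them. The toy Python sketch below mirrors those stages only to make them concrete; it is not how CiteProc is implemented (CiteProc is a set of XSLT stylesheets run under Saxon). Assumptions, all ours: citation keys are written as whitespace-separated text inside DocBook citation elements, a small dictionary stands in for the eXist mods collection, and RECORDS, fetch, author_year and process are hypothetical names.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"

# Hypothetical records standing in for the eXist "mods" collection:
# citation key -> (family names of the authors, year).
RECORDS = {
    "smith2004": (["Smith"], "2004"),
    "doe2001": (["Doe", "Roe"], "2001"),
}


def fetch(keys):
    """Stand-in for the database request: return the records we have."""
    return {k: RECORDS[k] for k in keys if k in RECORDS}


def author_year(authors, year):
    """A minimal author-year label, e.g. 'Doe and Roe 2001'."""
    if len(authors) == 1:
        names = authors[0]
    elif len(authors) == 2:
        names = f"{authors[0]} and {authors[1]}"
    else:
        names = f"{authors[0]} et al."
    return f"{names} {year}"


def process(docbook_xml):
    root = ET.fromstring(docbook_xml)
    citations = list(root.iter(DB + "citation"))
    # 1. Go through the document collecting every cited key.
    keys = {k for c in citations for k in (c.text or "").split()}
    # 2. Request all of those records in one go.
    records = fetch(keys)
    # 3. Format each citation in place; unknown keys are flagged with '?'.
    for c in citations:
        labels = [
            author_year(*records[k]) if k in records else f"?{k}"
            for k in (c.text or "").split()
        ]
        c.text = "(" + "; ".join(labels) + ")"
    return root
```

Fetching all keys at once, rather than one query per citation, is the point of the first pass: a real database round trip is far more expensive than a dictionary lookup.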

RefDB is another DocBook solution for bibliographic storage and formatting.

PS: If you’re writing a dissertation, think about version-controlling it with CVS or Subversion.


5 Responses to “Bleeding-Edge Workflow”

  1. Software Documentation Weblog Says:

    Workflow for Documents with strong bibliographic requirements: Bruce D'Arcus recommends the following workflow to writers with a strong need for bibliographic data support. His solution includes not only DocBook XML (he recommends the new RELAX NG-based DocBook NG), but also some cutting-edge t…

  2. Maarten Sneep Says:

    Warning, rantish text ahead.

    Following some discussion on the Ars Technica OpenForum, I’ve experimented a little more. I’m afraid I have to agree with Kevin Walzer’s assessment, at least in part: the user tools are not good.

    Validation is great, but with the present tools (described by you as bleeding edge) there is no difference between validating and simply running the document through TeX to see whether it contains errors. Yes, I know that validation serves a different purpose than the typesetting TeX does, but to the end-user that difference is moot. Validation makes sure that a document conforms to a strict, formal standard, while the TeX way is ad hoc rather than formal validation. But both “validations” run after the fact, and to the end-user they’re both a pain.

    None of the sample documents I could find validated without errors. The error messages are more cryptic than those of TeX, and to some, TeX’s are already notorious for being vague.

    Installation is arcane at best, and doesn’t produce anything close to a toolchain that yields acceptable PDF output; even the HTML output didn’t quite work. Inexperience? Yes, of course, but if you deliver a set of tools, at least make sure they work on a default install with the sample documents you provide. I noticed that oXygen added some meta-comments saying where the documents describing the structure are to be found. Shouldn’t there be an OS-level install location, or at least a configuration library that tools can query, without needing system- and machine-dependent information in the document? Something like kpathsea in TeX installations.

    XML is great as an information interchange format, and allows applications of very different origins to co-operate instead of working against one another. And if an application uses it behind my back, I’m all for it. Manual intervention is a pain, given its verbosity, although that may just require some getting used to.

    No, thanks, I’ll stick to LaTeX. It may be arcane, but the quality of the output is good, I know how to write with it, and it just works™. If this is progress, they can keep it.

    With respect to bibliographic tools: Having an interchange format that actually works is great, but given the limitations (no maths in titles or abstracts, as you’ve indicated on the Ars Technica thread), I’ll stick to something that is arcane, poorly documented, has a formatting language that puts your mind into reverse gear, and is impossible to validate, but actually works.

  3. Bruce D'Arcus Says:

    It’s cool that you went through the trouble to actually try the tools, Maarten. However, you come to some awfully strong conclusions after such a short time. I have a strong feeling your first experience with TeX was also frustrating (I’ve never met anyone who found TeX immediately intuitive). Your comments about validation are simply not right; most XML editors do real-time validation. And if you want answers to your other questions, they’re out there.

    When I complain about TeX and BibTeX, I’m not speaking from a position of ignorance; I’ve used TeX for years. I still do! It’s great if you do math (no argument from me there), if all you ever cite is journal articles and conference proceedings, and if all you care about is creating PDF or PS output.

    If, OTOH, you need to output to different formats, if you need good international support, if you deal with complex archival material, if you want something which many tools can work with (XSLT processors, XML databases, etc., etc.), then TeX can’t help you.

    I’ll give you two examples of how I use XML practically:

    1. I wrote my own schema for course documents like syllabi and assignments, so the source is XML. I use XSLT to generate both TeX for print and HTML for the web, and CiteProc to format the course readings. I get high-quality (and flexible) output in both formats.

    2. I often do textual analysis of documents. If I have HTML source, I convert it to XHTML and mark up important passages with span (and q) tags and class attributes. The documents are stored in an XML DB. If I want all quotes with a class that includes “TeX”, I can instantly assemble them with a simple query, create an HTML output document however I want, etc., etc.

    If you’re happy with TeX, that’s great. But I hope you learn a little more about the possibilities of XML before spreading misinformation.

    Is there a learning curve? Yes. Should it be easier? Probably. Is there real value gained at the top of the curve? Absolutely!

  4. Maarten Sneep Says:

    A few added notes: I’m complaining about the tools; I do see the advantages of having a base format that is flexible and standardised – although it may not be for me, as I do need some maths (spot the understatement).

    My first experience with TeX was fairly smooth, actually, because the provided sample documents just worked: with a nice integrated installer, it was pretty much a single download and off I went. I won’t deny that TeX is frustrating at times, but at least the start was smooth – very important in attracting new users.

    When I look back at the code I wrote in my first year of using LaTeX, things get ugly, very ugly, and that is part of the reason why an XML-based workflow is probably better. However, I recently helped out in the student lab (physics undergraduates, first year), and one of the students picked up the basics of writing his reports in LaTeX within a day (others struggled on with Word, and asked later on if there was anything better out there – kids learn fast these days).

    What I think the DocBook/XML world needs is a CTAN/CPAN like initiative, to get tools, stylesheets, solutions to particular problems, documentation etc. in a central location. CDAN, anyone? The information may be out there, but it is scattered around.

    I know you are speaking from experience, and your reasons to want an XML-based workflow are very valid (no pun intended). The weak points in TeX are very familiar to me, but fortunately my requirements fall within the intended audience for (La)TeX, so I’m relatively untouched by them (international support when requiring other scripts is possible, but very painful; storing TeX material in a database and then using it is impossible).

    “If you’re happy with TeX, that’s great. But I hope you learn a little more about the possibilities of XML before spreading misinformation.”

    OK, my reply may have been a bit too harsh, and yes, I’m happy with LaTeX. Scratch my remark about validation, although I received no errors in oXygen while typing but did after pressing the validate button – probably a case of PEBKAC. I stand by my remark that there seems to be no install method that gets you a toolchain that works on the provided samples. To get structured documents to the masses, things have to improve there. And truth be told: we all have a lot to gain from easy-to-process, well-structured electronic documents.

    “Is there a learning curve? Yes. Should it be easier? Probably. Is there real value gained at the top of the curve? Absolutely!”

    I believe you. I wish it were easier to get started. I guess part of the problem is that the tools and the public schemata (like DocBook) are still progressing – and even though everything is XML and should be able to translate flawlessly, as a (potential) new user that would worry me a little.

    Of the examples you give, number 1 is valid for most people. Although creating your own schema may be beyond the reach of many, I guess it is comparable to writing a documentclass in LaTeX, at least in the role it plays within the workflow. Number 2 is hardly relevant to me; at least, I don’t see how it applies to me.

    Maarten

    PS. A related question: a few years back I read about a Brazilian company that was working on a truly live validating editor. It allowed you to select part of a document, with the selection colour changing depending on whether you had a valid range (no “[i] text [b] more text [/i] what’s this [/b]” in the document), and the elements (are those called elements?) you could assign to the selection were only those that led to a valid document. They had a working system under Linux. I didn’t find this in oXygen, and I lost track of the bookmark I created. Does this sound familiar to you?

  5. Bruce D'Arcus Says:

    My examples weren’t necessarily intended to resonate with you (I know math people don’t care about textual analysis), except to show you why some people might find it attractive.

    As for editors, I’m not sure; there are a variety of GUI XML editors. There’s one called Serna that I understand is pretty good.

    And of course Word and OpenOffice are both XML editors of sorts, by virtue of their XML file formats. OOo, I learned, also does MathML.

    In an editor like oXygen or Emacs nXML mode, the model is that you tell the editor which document type you want to create (often via config options), it loads the schema, and then when you start to type, for example by typing “<”, you get a pop-up (or similar) with the valid elements. When you choose the element you want and hit a space, you then get the valid attributes, etc.
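
The completion behaviour described in the last comment, and the “valid selection” check asked about in the PS of the comment before it, can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the content model is a hypothetical, heavily simplified subset loosely named after DocBook elements (it is not the real DocBook schema; a real editor derives suggestions from the full RELAX NG schema, including element order), the bracket-token notation is borrowed from that PS, and all function names are hypothetical.

```python
# Hypothetical, heavily simplified content model (NOT the real DocBook schema):
# element -> child elements allowed inside it.
CHILDREN = {
    "article": ["title", "info", "section", "para"],
    "section": ["title", "section", "para"],
    "para": ["emphasis", "citation", "link"],
}
# element -> attributes allowed on it (again a hypothetical subset).
ATTRIBUTES = {
    "section": ["xml:id", "role"],
    "link": ["linkend", "xlink:href"],
}


def complete_element(open_elements, typed=""):
    """After '<': offer children allowed in the innermost open element."""
    parent = open_elements[-1]
    return [name for name in CHILDREN.get(parent, []) if name.startswith(typed)]


def complete_attribute(element):
    """After the element name and a space: offer its allowed attributes."""
    return ATTRIBUTES.get(element, [])


def selection_is_well_nested(tokens):
    """True if every tag opened in the selection is closed inside it, in order.

    Tokens use bracket notation: '[b]' opens, '[/b]' closes, anything else is text.
    """
    stack = []
    for tok in tokens:
        if tok.startswith("[/") and tok.endswith("]"):
            # A close tag must match the most recent unclosed open tag.
            if not stack or stack[-1] != tok[2:-1]:
                return False
            stack.pop()
        elif tok.startswith("[") and tok.endswith("]"):
            stack.append(tok[1:-1])
    # Anything still open means the selection cuts an element in half.
    return not stack
```

A selection that passes selection_is_well_nested is only a necessary condition: the editor would still need to check that the wrapping element is allowed at that point, which is what a lookup like complete_element does.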

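Bruce’s two practical examples in his first reply (one XML source rendered to both TeX and HTML, and pulling class-tagged passages out of XHTML) rely on XSLT and an XML database. The Python stand-in below swaps in ElementTree for both, purely to show the shape of each task; it is not his stylesheets or queries. The syllabus/week/topic schema is invented for illustration, matching class values as whitespace-separated tokens is our assumption, and all function names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# --- Example 1: one XML source, two outputs (hypothetical syllabus schema) ---
TEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape characters that are special in TeX, one character at a time."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def syllabus_to_tex(root):
    lines = [r"\section*{" + tex_escape(root.findtext("title", "")) + "}",
             r"\begin{itemize}"]
    for week in root.findall("week"):
        n = week.get("n", "?")
        topic = tex_escape(week.findtext("topic", ""))
        lines.append(rf"  \item Week {n}: {topic}")
    lines.append(r"\end{itemize}")
    return "\n".join(lines)


def syllabus_to_html(root):
    items = []
    for week in root.findall("week"):
        n = html.escape(week.get("n", "?"))
        topic = html.escape(week.findtext("topic", ""))
        items.append(f"<li>Week {n}: {topic}</li>")
    title = html.escape(root.findtext("title", ""))
    return f"<h1>{title}</h1><ul>{''.join(items)}</ul>"


# --- Example 2: collect tagged passages from marked-up XHTML documents ---
XHTML = "{http://www.w3.org/1999/xhtml}"


def passages_with_class(xhtml_docs, wanted):
    """Text of every q or span whose class list contains `wanted`."""
    hits = []
    for doc in xhtml_docs:
        for el in ET.fromstring(doc).iter():
            if el.tag in (XHTML + "q", XHTML + "span") and wanted in el.get("class", "").split():
                hits.append("".join(el.itertext()).strip())
    return hits
```

In eXist itself the second task would be a query run against the stored collection rather than a loop over in-memory strings.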

Creative Commons License