BibYAML
Sometimes people misunderstand my objection to bibtex and advocacy of XML. While I do believe XML and associated tools have tremendous advantages for the sort of data and workflow I’m interested in, my objection to bibtex goes beyond simply thinking XML is a better markup language. The problem with bibtex is really the data model.
To illustrate, here’s what a compact non-XML bibliographic representation could look like:
> bib = <<EOL
" -
"   id: doe2000
"   title: A Book Title
"   creator: Doe, Jane; Smith, John
"   genre: book
"   origins:
"     year: 2000
"     publisher: ABC Books
"     place: New York
" -
"   id: smith2001
"   title: Article Title
"   creator: Smith, John
"   genre: article
"   container:
"     title: Journal of ABC
"     origins:
"       date: 2001-11
"     parts:
"       volume: 21
"       issue: 3
"       pages: 23-34
"     genre: academic journal
" EOL
> YAML.load bib
=> [{"creator"=>"Doe, Jane; Smith, John", "title"=>"A Book Title", "id"=>"doe2000",
"origins"=>{"place"=>"New York", "publisher"=>"ABC Books", "year"=>2000}, "genre"=>"book"},
{"container"=>{"title"=>"Journal of ABC", "parts"=>{"issue"=>3, "pages"=>"23-34", "volume"=>21},
"origins"=>{"date"=>"2001-11"}, "genre"=>"academic journal"}, "creator"=>"Smith, John",
"title"=>"Article Title", "id"=>"smith2001", "genre"=>"article"}]
Note: this is not a serious attempt; it took me a few minutes. Still, it shows you can represent a MODS/DC-like structure in a compact syntax (YAML) which languages like Ruby (which I’m using above) support out of the box.
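As a rough sketch of what "out of the box" buys you, the records above can be loaded and queried with nothing but the standard library. The helper names `creators` and `short_citation` are made up for this illustration, and the "Family, Given; Family, Given" convention for the creator field is only an assumption read off the sample data.

```ruby
require "yaml"

# The same two records as above; indentation carries the nesting.
bib = <<~EOL
  - id: doe2000
    title: A Book Title
    creator: Doe, Jane; Smith, John
    genre: book
    origins:
      year: 2000
      publisher: ABC Books
      place: New York
  - id: smith2001
    title: Article Title
    creator: Smith, John
    genre: article
    container:
      title: Journal of ABC
      origins:
        date: 2001-11
      parts:
        volume: 21
        issue: 3
        pages: 23-34
      genre: academic journal
EOL

records = YAML.load(bib)

# Hypothetical helper: split "Family, Given; Family, Given" into name hashes.
def creators(record)
  record.fetch("creator", "").split(";").map do |name|
    family, given = name.split(",", 2).map(&:strip)
    { "family" => family, "given" => given }
  end
end

# Hypothetical helper: a crude author-date line. The date comes from the
# record's own origins, or from its container's origins when it has none.
def short_citation(record)
  names = creators(record).map { |c| c["family"] }.join(" and ")
  origins = record["origins"] || record.dig("container", "origins") || {}
  date = origins["year"] || origins["date"]
  "#{names} (#{date}). #{record["title"]}."
end

records.each { |record| puts short_citation(record) }
```

The point is less the formatting than the lookup: because the article's date lives on its container, the code can find it by walking the structure rather than by guessing which flat field a particular entry type happens to use.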
I do like MODS XML, but bibtex is (or at least has been) “good enough” for most things. While XML is well-suited for complex documents, bibliographic content is not so rich that people can’t still argue XML is overkill.
Bibtex still offers some clear advantages. It plays extremely well with traditional text processing tools: no need for an XML parser. XML parsers do make for bigger, more complicated programs. (This bulk is “cheap” when extending a system like LaTeX or especially OpenOffice.org, but is still something to consider if you want people to be able to write a lot of small tools that handle your data format.) It can also be hard to see the CONTENT of XML, as it is buried by all the markup.
The true advantages of MODS will make up for these disadvantages: syntax checking is easier; because XML is (slowly) becoming a standard, the tools needed to play with the data will be legion; and it is more extensible and can store far more, and far more complex, information on references. MODS is clearly positioned to become the standard for libraries, XML has already been praised for rich document generation (through OO.o, docbook, etc.), and it just makes sense that MODS is the way to go in the future.
I don’t think it is nearly so obvious that people should become early adopters, as you have. Just as there are still people who find that LaTeX offers them tools that they can’t currently find in the various word processors, I think people will stay tied to bibtex for a little while longer.
Sorry Richard. I missed that you posted this a while back, and am just now approving it. I need to add a note that I have moderation turned on as well!
Anyway, your points are valid. I’m not arguing we ought to create BibYAML as a serious effort; it was just an illustration.
However, this sort of structure is much better suited than bibtex to representing the more complex metadata common in the social sciences and humanities: for example, a review published in a magazine, archival manuscripts, etc.
The problem with adding new fields and such is you break portability. I submit bibtex is only “good enough” when your data does not stray beyond its core types, which typically means you are in the hard sciences.
BTW, I use TeX too (but for typesetting, not generally authoring).
Regarding breaking portability, I anticipate this will be a problem for MODS as well. Unless authors adhere to using MARC types (which some find to be “not quite enough”), they are basically adding metadata that many programs would have no way of anticipating. Similarly, I suspect that XSLT files and programs that can handle MODS will still not be able to handle some of the complex records that can be created, or records that weren’t anticipated.
To borrow from the Mutt mantra: all bibliographic formats suck, but MODS sucks less. An XML format will hopefully have more tools & more people who can support those tools than bibtex. I imagine it would be a lot easier to get help with an inadequate XSLT file than an inadequate .bst file.
And BTW, I just added MODS XML export (currently our only export format) to refbase.
–Rick
Yes, yes, this is true … to some extent. As I have been designing my stylesheets and XML style language, I have made a number of assumptions:
The structural logic depends on correct coding of relatedItem elements, and inclusion of “issuance.”
“Type” is determined by an algorithm which is probably fragile, because it does not rely on genre as a single value but can concatenate genres from different levels; e.g. “article-magazine.” Still, this is not critical for formatting generally.
Dates need to be in the form of YYYY-MM-DD, YYYY-MM, or YYYY if you want the stylesheets to handle month formatting.
Names must be parsed if you want the stylesheets to handle proper formatting (order, initialization, etc.).
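To make the date assumption concrete, here is a minimal Ruby sketch of granularity-aware formatting. It is not the stylesheet logic itself (that is XSLT); `format_date`, the month-name table, and the output style are all assumptions chosen for illustration. Anything that does not match one of the three accepted forms is passed through untouched, which is roughly what "you handle month formatting yourself" amounts to.

```ruby
MONTH_NAMES = %w[
  January February March April May June
  July August September October November December
].freeze

# Accepts YYYY, YYYY-MM, or YYYY-MM-DD and nothing else.
DATE_PATTERN = /\A(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?\z/

# Hypothetical formatter: the output depends on how much of the date is given.
def format_date(value)
  text = value.to_s
  match = DATE_PATTERN.match(text)
  return text unless match          # unrecognized form: leave as-is

  year, month, day = match.captures
  return year if month.nil?         # year only

  month_index = month.to_i
  return text unless (1..12).cover?(month_index)

  month_name = MONTH_NAMES[month_index - 1]
  day ? "#{month_name} #{day.to_i}, #{year}" : "#{month_name} #{year}"
end

puts format_date("2001-11")      # November 2001
puts format_date("2001-11-05")   # November 5, 2001
puts format_date(2000)           # 2000
puts format_date("Nov. 2001")    # Nov. 2001 (passed through)
```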
So, yeah, all bib formats suck, but MODS will — I think — be more reliable across research fields than bibtex is.
Note: I don’t think MODS is perfect. I don’t like the name element, and I think in general it’s rather loose. Still, writing a bib schema is really difficult, in part because it involves balancing different priorities (ease-of-use vs. comprehensiveness, flexibility vs. reliable portability, etc.) that are impossible to fully resolve.
Just downloaded your XSLT stuff and managed to get it working. I am not familiar with this type of programming, yet I am frustrated enough by BibTex and EndNote that I want to look further into this. And I am not even in Law or Social Sciences…
The lack of unicode support in BibTex (and LaTeX) is my main problem with it. Admittedly I have not yet found an alternative, but I believe that with XML I might get much closer to one.
I am looking forward to future releases of this stuff, and will try to acquire some more knowledge about the techniques you’re using here, so I’d be able to contribute later on perhaps…
Yes, unicode and international support is a good reason to look at XML. While I have not yet implemented international support in the system, I will do so, and would like comments (and even help!). We’ve been discussing this a bit at the OOoBib dev list:
http://bibliographic.openoffice.org/servlets/SummarizeList?listName=dev
Forgot to ask: How is your project different from http://silmaril.ie/bibliox/ ?
It’s a fork/rewrite. I worked with Peter on BiblioX, but had some fundamental disagreements with the design. Among other things, I couldn’t figure out how to modify it.
So I rewrote it, using RELAX NG for the citation style language, and XSLT 2.0 for the processing code. Unlike BiblioX, I’ve also designed around MODS (though it would be possible to write drivers for other formats).
As they stand now, then, BiblioX handles more input formats, while my system offers more features (and, I’d argue, has a sounder design). I’m also committed to finishing it, and incorporating it into various applications: web-based, desktop-based like OpenOffice, etc. Peter is a bit overextended with other things, and so hasn’t had the time to spend on BiblioX.
One of the big design ideas in my system is the distinction between reference class and reference type, which forms a solid fallback system. For example, you can format almost any record reliably just by defining the three required types. In other systems, you generally need to define a new type for every possible input type, which is tedious, error-prone, etc.
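The fallback idea can be sketched in a few lines of Ruby. This is only an illustration of the lookup order, not the actual style language: the class names, type names, and templates below are placeholders invented for the example, since the names of the required definitions are not given here.

```ruby
# Placeholder class-level templates: the broad fallbacks that must exist.
CLASS_TEMPLATES = {
  "standalone"     => "%{creator}. %{year}. %{title}.",
  "part_of_book"   => "%{creator}. %{year}. %{title}. In %{container}.",
  "part_of_serial" => "%{creator}. %{year}. %{title}. %{container}."
}.freeze

# Placeholder type-level templates: optional refinements for specific types.
TYPE_TEMPLATES = {
  "article-magazine" => "%{creator}. %{title}. %{container}, %{year}."
}.freeze

# Prefer a template for the specific type; otherwise fall back to the class.
def template_for(type, ref_class)
  TYPE_TEMPLATES.fetch(type) { CLASS_TEMPLATES.fetch(ref_class) }
end

def render(fields, type:, ref_class:)
  format(template_for(type, ref_class), fields)
end

fields = {
  creator: "Smith, John", year: 2001,
  title: "Article Title", container: "Journal of ABC"
}

# A type nobody defined still formats, via its class.
puts render(fields, type: "review-newspaper", ref_class: "part_of_serial")
# A type with its own definition uses it.
puts render(fields, type: "article-magazine", ref_class: "part_of_serial")
```

An unanticipated type only costs you a less tailored rendering, rather than a failure or an unformatted record, which is the portability argument made above.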