BibTeX/Endnote/RIS Typing and Bibliographic Formatting
My last post seems to have stirred some controversy, which is good. My main purpose is that the next person who thinks “hmm ... maybe I ought to write an open source bibliographic application?” will think twice about how to do so.
In comments, Mark Grimshaw wrote:
[A] programmer designing conversions from bibliographic databases for such styles (as wikindx does) HAS to know what type a particular resource is as the presentation of the resource entry for a particular style very often depends directly on the type of resource. A journal article is displayed quite differently to a newspaper article, to a chapter in a book, to an article on the web or to a proceedings article.
I’ve had the same discussion recently with Paul Tremblay, and let me get quickly to the point and say that this is wrong. I put together a demonstration a while back that illustrates my argument. Moreover, I recently put together an XSLT stylesheet that shows fairly convincingly that the approach works.
A data model and formatting system that is based only on bibtex-like typing will fall down once it has to handle the needs of many scholars. Data will be inaccurate or vague and formatting styles will fail to format any record type not explicitly defined.
The solution is a system that has a rigorous generic fallback system based around structural class, and only secondarily on genre/type. This is nothing radical; it’s just an extension of the existing models in Endnote and Reference Manager.
update: ok, here’s a compromise example:
- use typing for main layout, but list of types is extensible in schema
- require definitions for article, book, and chapter, which serve as the generic fallbacks
- some logic then associates other record types with their appropriate fallback
- rendering of name roles and genre and media description is moved into separate elements
This has the advantage of being more-or-less familiar and author-friendly, while also being quite flexible.
<citationstyle>
  <info> placeholder for metadata </info>
  <content>
    <name-roles>
      <role>
        <term>editor</term>
        <renderas>
          <single>Ed.</single>
          <multiple>Eds.</multiple>
        </renderas>
      </role>
    </name-roles>
    <genres>
      <genre>
        <term>dissertation</term>
        <renderas>PhD Dissertation</renderas>
      </genre>
      <genre>
        <term>letter</term>
        <renderas>letter</renderas>
      </genre>
    </genres>
    <media>
      <medium>
        <term>cdrom</term>
        <renderas>CD-ROM</renderas>
      </medium>
    </media>
    <citation>
      <author-year>
        <names>
          <firstname/>
          <middlename/>
          <lastname/>
        </names>
        <entry>
          <creator/>
          <year before=", "/>
          <point before=": "/>
        </entry>
      </author-year>
    </citation>
    <bibliography>
      <names>
        <firstname/>
        <middlename/>
        <lastname/>
      </names>
      <entry>
        <reftype name="book">
          <creator/>
          <date before=" (" after=") ">
            <year/>
          </date>
          <title font-shape="italic" after=", "/>
          <origin>
            <place after=":"/>
            <publisher/>
          </origin>
          <physical-location before=", "/>
          <url before=", "/>
        </reftype>
        <reftype name="chapter">
          <creator/>
          <date before=" (" after=") ">
            <year/>
          </date>
          <title/>
          <container before=", In ">
            <creator after=", "/>
            <title/>
            <origin after=", ">
              <place/>
              <publisher before=":"/>
            </origin>
            <part-details>
              <pages/>
            </part-details>
          </container>
          <physical-location/>
          <url/>
        </reftype>
        <reftype name="article">
          <creator/>
          <date before=" (" after=") ">
            <year/>
          </date>
          <title before="“" after="”"/>
          <container>
            <title/>
            <origin/>
            <part-details>
              <volume/>
              <issue before="(" after=")"/>
            </part-details>
          </container>
        </reftype>
      </entry>
    </bibliography>
  </content>
</citationstyle>
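The "some logic then associates other record types with their appropriate fallback" step could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the `FALLBACKS` table, the `resolve_layout` function, and the choice to send unknown types to "book" are hypothetical, not part of any existing tool or of the XSLT stylesheet mentioned above.

```python
# Hypothetical sketch: all names here are illustrative assumptions.

# Assumed mapping from other record types to the required layout they resemble.
FALLBACKS = {
    "thesis": "book",
    "report": "book",
    "newspaper article": "article",
    "magazine article": "article",
    "conference paper": "chapter",
}

# Assumed default for types that have no mapping at all.
DEFAULT_LAYOUT = "book"


def resolve_layout(record_type, style_layouts):
    """Return the name of the layout to use for record_type.

    Prefer a layout the style defines for the exact type, then the
    mapped fallback, then the default.
    """
    if record_type in style_layouts:
        return record_type
    fallback = FALLBACKS.get(record_type, DEFAULT_LAYOUT)
    if fallback not in style_layouts:
        raise KeyError(f"style is missing required layout {fallback!r}")
    return fallback


# A style that defines the three required layouts plus one extra type.
style = {"article": "...", "book": "...", "chapter": "...", "thesis": "..."}

print(resolve_layout("thesis", style))            # the style defines this type itself
print(resolve_layout("magazine article", style))  # mapped to the article layout
print(resolve_layout("postcard", style))          # unmapped, uses the default
```

The point of the design is that a style author only has to write three layouts, and every record still gets formatted by something.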
Interesting - the website stuff is similar to the way I structured/analysed entries for the style formatting I did so obviously there’s validity in it ;).
However….
In the case of a style such as MHRA and two resources such as a journal article and a magazine article:
Journal article:- John D. Spikes, ‘The Jacobean History Play and the Myth of the Elect Nation’, [i]Renaissance Drama[/i], 8 (1970), 117-49.
Magazine article:- Mark Grimshaw, ‘My Life in Pictures’, [i]Pictorial Monthly[/i], May 1989, pp. 10-12.
(As they would be presented by MHRA.)
You might say that the only difference here, the only way a programmer could know that the latter is a magazine article as opposed to a journal article and hence format the required differences (pages for example), is that the latter has a month. There are some styles however (off the top of my head can’t remember which) which require the month of publication (if given) of a journal article to also be printed. If that month for the journal article were then stored (in the same field presumably as a magazine article’s month of publication), MHRA would then be confused. It finds author, title, publication title, month, year, page(s) - but is this a journal article or a magazine article, and how does it format it for display?
I can certainly see the reason for your argument and how useful it would be for those [i]entering[/i] resources and how it can be used without change for future types of resources, but a programmer’s life is made a lot easier by simply being told this is a journal article and this is a magazine article. In a sense, they’re two opposite poles.
Yes, some styles do have these sorts of genre-specific details, and I don’t deny that. Still, I’d rather my formatting system format 100% of my records 95% correctly than 50% of my records 100% correctly. By having both generic AND type/genre-specific logic, you can get closer to perfect.
As you think about this stuff, just think about how you handle stuff that I have dealt with in my research:
Without a good generic fallback system independent of type, people like me (and there are many of us in the humanities) are screwed.
BTW, I just reposted the stylesheet archive with example files.
I think the controversy is going to continue. There’s no reason why something that is not based on MODS, XML and XSLT can’t do everything that you mention. Down with mono-standards for storage. Let a hundred flowers bloom, and let’s have a common tongue to communicate with.
jl
Bruce,
Things like:
Can be easily handled by spending a couple of minutes setting up ‘types’ such as ‘postcard’ or ‘newspaper editorial’, etc., and then defining how they should be stored and displayed.
Why settle for anything less than 100% accuracy?
jl
There are some styles however (off the top of my head can’t remember which) which require the month of publication (if given) of a journal article to also be printed. If that month for the journal article were then stored (in the same field presumably as a magazine article’s month of publication), MHRA would then be confused. It finds, author, title, publication title, month, year, page(s) - but is this a journal article or a magazine article, how does it format for display?
Whether you have (May, 1999) or May, 1999 is a pretty trivial detail. If you felt it necessary to maintain the distinction, then have your generic definition be one, and create a genre/type-specific definition for the variant. Look at the genre/type in your record, and all is gracefully handled.
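A minimal sketch of that idea in Python, using the two MHRA examples quoted earlier in the comments: one generic periodical layout, plus one variant that is registered by genre. The record field names and the function names are hypothetical, and the output is plain text with straight quotes and no italics.

```python
# Hypothetical sketch; field names and functions are assumptions for illustration.

def format_generic_article(rec):
    """Generic layout: optional volume, parenthesised date, bare page range."""
    date = " ".join(part for part in (rec.get("month"), rec["year"]) if part)
    volume = f"{rec['volume']} " if rec.get("volume") else ""
    return (f"{rec['author']}, '{rec['title']}', {rec['container']}, "
            f"{volume}({date}), {rec['pages']}.")


def format_magazine_article(rec):
    """Variant layout: unparenthesised date and a 'pp.' prefix on the pages."""
    date = " ".join(part for part in (rec.get("month"), rec["year"]) if part)
    return (f"{rec['author']}, '{rec['title']}', {rec['container']}, "
            f"{date}, pp. {rec['pages']}.")


# Only the variant is keyed by genre; every other genre gets the generic layout.
VARIANTS = {"magazine article": format_magazine_article}


def format_entry(rec):
    formatter = VARIANTS.get(rec.get("genre"), format_generic_article)
    return formatter(rec)


journal = {
    "genre": "journal article",
    "author": "John D. Spikes",
    "title": "The Jacobean History Play and the Myth of the Elect Nation",
    "container": "Renaissance Drama",
    "volume": "8",
    "year": "1970",
    "pages": "117-49",
}
magazine = {
    "genre": "magazine article",
    "author": "Mark Grimshaw",
    "title": "My Life in Pictures",
    "container": "Pictorial Monthly",
    "month": "May",
    "year": "1989",
    "pages": "10-12",
}

print(format_entry(journal))
print(format_entry(magazine))
```

A journal article that happens to carry a month still goes through the generic layout, because the choice is made on the genre value and not on which fields are filled in.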
You can see an interesting application of something like this in the archive I posted if you change the genre value in one of the example mods records from “book” to “thesis.”
Jonathan — I posted something on this issue awhile ago, but you’re expecting a lot of users if you’re going to force them to create and/or edit style definitions for every type they might want to format. I want to design a system that will scale to thousands of styles, and tens-of-thousands of users, from all kinds of communities.
If you’ve ever downloaded an Endnote style file to format your dissertation and realized it didn’t work well, you’ll understand.
I was going to make my last line in my first comment above ‘I sense a compromise brewing’ but decided to leave (can’t think of a better word) the discussion to continue and more ideas to flow.
Your point about adding a specific genre/type: why not call a spade a spade?
I’m a little confused as to whether you’re proposing a new style guide that does away with the requirement to specify the type because the guide will format each field in exactly the same manner whatever the resource type, or whether you’re searching for a universal storage solution that satisfies all the thousands of style guides out there. In both cases I wish you luck ;). (Which is not to say it shouldn’t be done or isn’t worth discussing.)
The point about dates (bracketed or not) is certainly a trivial one and I’d quite happily line up those responsible for such a proliferation of style guides against the nearest wall. However, if you’ve ever had a paper submitted to a journal returned because your citations and/or bibliographies did not fit the house style (mainstream or private) then you’ll understand that it’s anything but trivial. (Actually, this has never happened to me but I have commiserated with those it has happened to.)
I myself work in the humanities (as I pointed out in an earlier post, I’d never heard of bibtex until a few months ago) and can understand your concerns. However, it is, as Jonathan points out, a case of simply adding that type with all its rules to the database and code. The system I’m working on (and I should point out that Jonathan (jl) is also a developer on wikindx) doesn’t yet handle all types that I myself, in my own research, require but I fully intend that it will. Furthermore, it is my intention that, somewhere down the development line, an administrator will be able to add new type definitions to the system.
(If you’d asked me 6 months ago whether I’d be spending so much time discussing bibliographies/styles etc. I would have laughed in your face…)
I have to admit I’m a little perplexed Bruce. You seem to be anti anything to do with typing a resource yet the mockup you presented on 21st May this year (http://www.users.muohio.edu/darcusb/misc/form-start.html) has fields for title, subtitle and genre (aka ‘type’). This last is a selection box with choices including book, edited book (presumably an anthology of some type), web article etc. This seems to be quite contrary to things you’ve been saying here.
I don’t mind types; I just don’t want to be limited by them. In the example I posted, there’s no fixed list of types; it would all be configurable, and internally the fields would be generic enough to allow that.
See my update for a compromise idea (which I would propose for BiblioX). There the stuff on top (what you see in the XML) would be type-based, but the stuff underneath would be more generic to take advantage of the structural power of stuff like MODS.
Better?
The system I’m working on (and I should point out that Jonathan (jl) is also a developer on wikindx) doesn’t yet handle all types that I myself, in my own research, require but I fully intend that it will.
If you design the model right, adding a type is just adding a value, and the mapping interface. Right?
(If you’d asked me 6 months ago whether I’d be spending so much time discussing bibliographies/styles etc. I would have laughed in your face…)
If you design the model right, adding a type is just adding a value, and the mapping interface. Right?
It should be - Jonathan already has a mock-up running.
Yes, and I’m very excited about it. The only thing left is to make it possible to add database fields in the admin section. Once that’s done, on one page (with lots of checkboxes and fields!) you will be able to define what database fields you want the type to encompass, how the fields should appear (the ‘title’ database field can appear as ‘journal title’ for journal article types and ‘book title’ for monographs), and what fields are required.
This is all stored in a separate table in the database with just two fields - name and rules. The rules, stored as an array, get serialized into a string using PHP’s serialize function. This makes it easy to deal with in the database, umm, or lack of it.
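For readers who want to picture the two-field table, here is a rough Python sketch of the same idea. It is not the wikindx code: the table name, the shape of the rules, and the helper functions are all assumptions, and JSON stands in for PHP's serialize.

```python
import json
import sqlite3

# Assumed shape of one type definition: which database fields the type uses,
# how each field is labelled on screen, and which fields are required.
journal_article = {
    "fields": ["creator", "title", "year", "pages"],
    "labels": {"title": "journal title"},
    "required": ["creator", "title", "year"],
}

# In-memory database with the two columns described above: name and rules.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource_type (name TEXT PRIMARY KEY, rules TEXT)")
conn.execute(
    "INSERT INTO resource_type (name, rules) VALUES (?, ?)",
    ("journal article", json.dumps(journal_article)),
)


def load_rules(conn, name):
    """Fetch and decode the rules stored for one type."""
    row = conn.execute(
        "SELECT rules FROM resource_type WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return json.loads(row[0])


def missing_required(rules, record):
    """List the required fields that are absent or empty in a record."""
    return [field for field in rules["required"] if not record.get(field)]


rules = load_rules(conn, "journal article")
print(rules["labels"]["title"])
print(missing_required(rules, {"creator": "Spikes", "title": "Renaissance Drama"}))
```

Adding a new type is then a single insert into the table, with no schema change.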
The wiki-like markup for styles is also coming along.
This is turning into an interesting development. I was dreading moving this stuff out of config files and into the database, but it’s been worth the pain.