Reference Types
When designing a citation format (or, um, microformat), one of the more difficult issues is deciding on types. “Book” and “article” are simple and straightforward. But what happens if you need to handle weblog posts, archival documents, dissertations, and so forth?
The conclusion I came to when designing an RDF representation was to use a hierarchical model. Graphically, that model might look like this:
[Figure: diagram of the hierarchical type model]
The advantage of this approach is that a developer can adopt different levels of granularity. If they want basic support, they just use the top-level classes. If they want a bit more richness, they can use the first-level subclasses. If they need still more, they can drill down further. Moreover, none of these levels is all-or-nothing; developers can pick and choose which types they need.
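To make the granularity idea concrete, here is a minimal Python sketch. The class names and the `nearest_supported` helper are hypothetical and chosen only for illustration; they are not taken from the RDF representation itself. The sketch shows how a tool that supports only part of the hierarchy could fall back to the closest ancestor type it knows about.

```python
# Hypothetical hierarchy; names are illustrative, not from any real schema.
class Document:
    """Top-level type: imprecise, but accurate."""

class Article(Document):
    """A first-level subclass."""

class Transcript(Document):
    """Another first-level subclass."""

class SpeechTranscript(Transcript):
    """A second-level subclass, for tools that want more detail."""

def nearest_supported(item_type, supported):
    """Return the most specific type in `supported` that item_type is or inherits from."""
    # __mro__ lists the class itself first, then its ancestors, most specific first.
    for candidate in item_type.__mro__:
        if candidate in supported:
            return candidate
    return None

basic_tool = {Document}                        # top-level classes only
richer_tool = {Document, Article, Transcript}  # adds first-level subclasses

print(nearest_supported(SpeechTranscript, basic_tool).__name__)
print(nearest_supported(SpeechTranscript, richer_tool).__name__)
```

A tool with only basic support still files the item under an accurate, if imprecise, type, while a richer tool keeps more of the detail.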
More importantly, this works better for end-users and metadata producers.
Case in point: I recently wanted to store the online transcript of a recent Bush speech using Zotero. So I have a “press release” and that is pretty much just a straight “transcript” of a “speech.” So, potentially three different types, none of which Zotero even remotely supports! Instead, I’m stuck using “Website,” which is just wrong.
Solution: add the top-level “Document” type. It’s not very precise, but at least it’s accurate.
The hierarchical type model, then, is not only good modeling practice for developers, but it’s good for users, who no longer need to feel confined by the straitjacket of a flat model.
Bruce, aren’t journals, magazines and newspapers also documents?
In a straight hierarchy like the one you’ve proposed, it seems that multiple inheritance is necessary to get the top levels you want and still capture all the meaning of the types. It really seems like there are two dimensions of type that you’re covering here: an organizational type (part of a series vs. standalone document) and a “physical” type (what the document itself is).
What do you think?
Maybe, Mike. As I’ve been thinking about it, periodicals are really document collections. They aren’t really cited per se. Maybe a periodical issue would be a document, though? I just chose not to model that separately (because it adds what seems like needless complexity).
In terms of different kinds of types, this is really what FRBR gets at. You have works (abstract creations), expressions (aka versions; how a work is expressed as text, sound, etc.), manifestations (or formats; the physical things like books, etc.), and items (or copies; the thing you have on your shelf).
I’m trying to keep this logic in mind when designing these classes, but to not complicate the model here. It’s hard to do.
How would you diagram what you are talking about above?
N.B.: RDF types are sets. There’s nothing stopping you defining PeriodicalIssue as a subclass of the intersection of Periodical and Document, for example.
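As a rough analogy only (plain Python sets standing in for RDF classes, with made-up item identifiers), the point can be sketched like this: if a class is just the set of its members, a subclass of an intersection is a set whose members all belong to both parents.

```python
# Hypothetical item identifiers; each class is modelled as the set of its members.
documents = {"book-1", "speech-transcript-1", "issue-12"}
periodicals = {"journal-1", "issue-12"}

# A subclass of the intersection: every member must belong to both parent sets.
periodical_issues = {"issue-12"}
is_valid_subclass = periodical_issues <= (documents & periodicals)

def types_of(item):
    """List every class (set) that an item belongs to."""
    classes = {
        "Document": documents,
        "Periodical": periodicals,
        "PeriodicalIssue": periodical_issues,
    }
    return sorted(name for name, members in classes.items() if item in members)

print(is_valid_subclass)
print(types_of("issue-12"))
```

Nothing here forces an item into a single branch of a tree; it simply has every type whose set it falls into.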
The main purpose of coming up with a ‘hierarchy’ like this is to relate different kinds of items together.
Rich: “nothing stopping” me but my own ignorance of the intricacies of RDF/OWL.
I think the concepts work, with some caveats. One is that I never met a classification system that worked without some sort of Miscellaneous category, the catch-all class. In this model, that might be a “Special” class that flags a citation for special, manual handling.
Another caveat is that more metadata types are needed for automated processing of citations, at least in the legal profession. There the bibliography is a table of authorities, with subdivisions, different groupings, and hierarchical sort orders based on different sorting keywords for the citations within each. Most of the major headings in a traditional table of authorities are Cases, Statutes & Constitutions, Legislative History, Rules & Regulations, Treatises, Law Journals, Encyclopediae, and Other. Within the Cases and Statutes & Constitutions classes, for example, there are groupings (no subheadings) by level of government, federal and state. But within those groupings, Cases are sorted alphanumerically by case title, whilst Statutes and Constitutions get a further grouping by particular bodies of legal code (e.g., U.S. Code; Oregon Revised Statutes) and, within those groupings, an alphanumeric sort by (in order) code title number, code abbreviation, and code section number.
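The ordering described above could be sketched in Python roughly as follows. The `Citation` fields, the `sort_key` helper, and the sample entries are assumptions made for illustration; putting federal before state is also an assumption, and real tables of authorities have more rules than this.

```python
from dataclasses import dataclass

# Heading order as listed above.
HEADINGS = ["Cases", "Statutes & Constitutions", "Legislative History",
            "Rules & Regulations", "Treatises", "Law Journals",
            "Encyclopediae", "Other"]
LEVELS = ["federal", "state"]  # assumed order of the government-level groupings

@dataclass
class Citation:  # hypothetical record layout
    heading: str
    level: str = ""
    title: str = ""         # case title or work title
    code_body: str = ""     # e.g. "U.S. Code"
    title_number: int = 0
    abbreviation: str = ""
    section: str = ""

def sort_key(c):
    """Order by heading, then level of government, then heading-specific keys."""
    heading_rank = HEADINGS.index(c.heading)
    level_rank = LEVELS.index(c.level) if c.level in LEVELS else len(LEVELS)
    if c.heading == "Statutes & Constitutions":
        # Group by body of code, then title number, abbreviation, section.
        # Sections compare as plain strings here, which is a simplification.
        detail = (c.code_body, c.title_number, c.abbreviation, c.section)
    else:
        detail = (c.title.lower(),)
    return (heading_rank, level_rank, detail)

# Sample entries for illustration; the case titles are invented placeholders.
citations = [
    Citation("Statutes & Constitutions", "federal", code_body="U.S. Code",
             title_number=42, abbreviation="U.S.C.", section="1983"),
    Citation("Cases", "state", title="Placeholder v. Example"),
    Citation("Statutes & Constitutions", "federal", code_body="U.S. Code",
             title_number=5, abbreviation="U.S.C.", section="552"),
    Citation("Cases", "federal", title="Sample v. Placeholder"),
]
ordered = sorted(citations, key=sort_key)
for c in ordered:
    print(c.heading, c.level, c.title or f"{c.title_number} {c.abbreviation} {c.section}")
```

Note that none of these keys is a citation type in the sense of the hierarchy; they are extra fields carried alongside it.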
The state of the art in software that automatically marks legal citations for indexing and then generates the indexed table of authorities (with page numbers for each page where a particular source is cited) has to work around the inability of the major word processors to include metadata with citations. Instead, such programs work from a database of known citation forms, some fairly smart algorithms, and fuzzy search techniques. Even at that, you typically wind up with several unclassified citations needing manual handling, and citations are often omitted from the table of authorities. Higher quality processing could be obtained using citation metadata.
There are other problems as well that suggest a need for more metadata. One is the necessity in the legal world of distinguishing between citations to legal authorities and citations to evidence. Citations to authorities wind up in the table of authorities whilst citations to evidence are normally processed only for generation of the Exhibit List.
I do not know enough about RDF to have a clue whether such metadata needs can be addressed in an RDF model. But if a metadata approach is taken to table of authorities generation, more than three classes would seem to be necessary.
There is a further practical matter: the time to generate the citation metadata is when the research note is taken, with the source material in hand, and both the citation and its metadata need to be easily exportable from the research assistant software to the word processing format.
I’ll be sending along by email some further detail on relevant unique characteristics of citation processing in the legal profession.
@marbux:
A top-level type like “Document” is in fact a kind of miscellaneous type, just a little more specific. So I don’t think there’s a need to support a separate catch-all class explicitly.
A lot of the metadata you note could be indicated in ways other than using types.