RDF and Bibliographic Metadata
An RDF ontology based on BibTeX.
I don’t understand RDF, as I’ve said here a few times, but I do understand bibliographic metadata enough to strongly believe that the BibTeX metadata model is horribly broken. Would be nice to see someone in the RDF community come up with a representation that draws inspiration from MODS instead.
I’ve pretty much dropped further development of my Biblio schema because I’m happy to use MODS instead. However, because it’s simpler than MODS, it could be a good blueprint to use for an RDF representation; at least Steve Cayzer seems to think so.
The one caveat is that I do not believe hard-coding typing (e.g. book, article, stamp, poster, statue, lecture, etc.) is a good idea. There is a specific reason I did it in biblio (to drive input validation), but my intention was always to use some RELAX NG magic to make this list of types flexible.
Hi,
I found your site in my referer logs. To be honest, I’m not well-versed in bibliographic metadata; BibTeX is something that I’ve worked with in the past and so it felt like a nice starting place for me.
I’d be interested in hearing your thoughts on better systems, especially MODS, which I’ve never heard of.
I’d drop you an e-mail, but I don’t see your address on the site…
Thanks,
Nick
Hi Nick,
The big thing is that I think it a mistake to think in terms of concrete classification (book, booklet, article, etc.). Better to think in terms of structural relationships as DC does (standalone objects versus those that are parts of other objects, etc.). The problem with DC is that it’s not fine-grained enough for bibliographic purposes. MODS fixes that problem, which is why I think you ought to study it a bit.
Also, I just stumbled on this interesting-but-not-yet-announced project:
http://disobey.com/noos/LibDB/index.cgi?HomePage
Thanks for the heads up Bruce. Couldn’t find the ontology itself but seems interesting…
Hi,
Have been reading this posting and agree that bibliographic data should not be limited to a list of types. For my 4th year dissertation (Heriot-Watt University, Edinburgh, Computer Science) I am attempting to develop a reasonable data model in RDF to replace BibTeX. Bruce has come across this before. I have decided that there would be the equivalent of 2 objects to hold information, and being RDF they (and the model) are easily extensible. The objects are sources and entities. The source object encompasses any possible type and can have any details required. A field such as “in” describes that the source is part of another resource, allowing nesting as far as necessary, e.g. an article in a book in a series, with each being treated as a separate object. The entities object encompasses person or entity (publisher, company or any other such entity) details such as name and address. Any comments on these ideas would be great. Thanks, Richard
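To make the two-object idea concrete, here is a minimal sketch of sources and entities as plain Python classes. It is an illustration only, not Richard's actual model: the class names, the field names (`part_of` stands in for the “in” field, since `in` is a Python keyword) and the sample titles are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """A person or organisation (hypothetical class name)."""
    name: str
    address: Optional[str] = None


@dataclass
class Source:
    """Any citable thing; deliberately no fixed list of types."""
    title: str
    creators: list = field(default_factory=list)
    publisher: Optional[Entity] = None
    part_of: Optional["Source"] = None  # plays the role of the "in" field


def containment_chain(source):
    """Return titles from the source outward through every container."""
    chain = []
    current = source
    while current is not None:
        chain.append(current.title)
        current = current.part_of
    return chain


# An article in a book in a series, each treated as a separate object.
series = Source(title="Example Series")
book = Source(title="Example Book", publisher=Entity("Example Press"), part_of=series)
article = Source(title="Example Article", creators=[Entity("A. Author")], part_of=book)

print(containment_chain(article))
```

Because nothing in `Source` says “book” or “article”, the nesting alone carries the structure, which is the point made in the post about avoiding hard-coded types.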
Hi Richard — sounds good on the face of it. How would you propose to draw on existing ontologies — FOAF, DC, and so forth — to represent these structures? Or not?
Richard,
Good to hear you’re working on this. A more useful project than most, I suspect, and there are plenty of interesting things to learn.
Regarding entities, I would strongly advocate using foaf:Agent, with its subclasses foaf:Person and foaf:Organization. There are a number of reasons.
First up, reinventing the wheel (or in this case the vocabulary) is a Bad Thing(tm). I summarised[1] my understanding of this on the RDF Interest mailing list in a discussion[2] of BuildOrBuyTerms[3].
[1] http://lists.w3.org/Archives/Public/www-rdf-interest/2004Jan/0076.html [2] http://lists.w3.org/Archives/Public/www-rdf-interest/2004Jan/0069.html [3] http://lists.w3.org/Archives/Public/www-rdf-interest/2004Jan/0069.html
Second, there’s an increasing amount of FOAF out there. Anyone who has a typepad web log has a FOAF file. What this means is that more people are already publicly represented in RDF using FOAF than any other vocabulary, and this in turn means that FOAF is most likely to get the traction to become truly universal. Don’t be distracted by discussions you see about FOAF being a social networking tool. It isn’t, though if you think social networking is interesting (I don’t) then FOAF could be used as an underlying layer for that.
Third, there is already some discussion about representing the finer details of people’s names and such in RDF.
If FOAF has limitations for your purposes, I suspect you will find the FOAF people receptive to input and keen to have someone else thinking about these things. Remember that as you said, RDF is open to extension, so you can declare properties in your own vocabulary to have range and domain foaf:Agent. In particular if there are authority records out there about people (Bruce mentioned this to me in the past) you could declare that
yourvocab:someAuthorityRef
has domain foaf:Person and is an inverse functional property (using OWL) of foaf:Person. That could be done in foaf itself if the foaf people think it is useful, or in your vocab quite independently. Either way it provides another way of unambiguously specifying that a node represents a person.
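The practical effect of an inverse functional property can be sketched in a few lines: two nodes that share a value for the property are taken to denote the same person. The sketch below uses plain tuples rather than an RDF library; the node identifiers, the `auth:` values and the helper name are hypothetical, and it assumes each node carries at most one authority reference.

```python
from collections import defaultdict

# Property name taken from the comment above; everything else is illustrative.
AUTHORITY_REF = "yourvocab:someAuthorityRef"


def merge_by_inverse_functional(triples, prop=AUTHORITY_REF):
    """Map each subject to a canonical node shared by all subjects
    that have the same value for the inverse functional property."""
    by_value = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == prop:
            by_value[obj].append(subject)

    canonical = {}
    for subjects in by_value.values():
        representative = min(subjects)  # deterministic, otherwise arbitrary
        for subject in subjects:
            canonical[subject] = representative
    return canonical


triples = [
    ("_:a", "foaf:name", "Jane Example"),
    ("_:a", AUTHORITY_REF, "auth:0001"),
    ("_:b", "foaf:name", "Example, Jane"),
    ("_:b", AUTHORITY_REF, "auth:0001"),
    ("_:c", AUTHORITY_REF, "auth:0002"),
]

same = merge_by_inverse_functional(triples)
print(same["_:a"] == same["_:b"])  # both nodes resolve to one person
```

An OWL-aware store would draw this conclusion itself once the property is declared inverse functional; the code only shows what that conclusion amounts to.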
Another point about “reuse rather than reinvent” is that if the data model can have an unambiguous 1:1 mapping with other bibliographic data formats - here MODS springs to mind - then you have a lot of extra power (and data) for free. Note that I don’t think this means you have to follow the MODS data model, just that part of the development process is to get as close as possible to being able to round-trip between MODS and your RDF. A good way of doing that would be building and maintaining a MODS-to-RDF translation tool (in both directions). The mental effort of doing this will, I suspect, be amply rewarded in an increased understanding of the data model and of pragmatics.
In passing I note that FOAF has a foaf:Document type. Since the FOAF vocab is at version 0.1, and they have a note on that term pointing out that it is limited, providing a vocabulary for describing “works” which FOAF could use would probably be valuable to them as well.
See http://xmlns.com/foaf/0.1/ for the draft FOAF spec.
I’ll second Hamish’s points; all of them!
It’s true the FOAF people are looking to broaden and deepen the spec, and I think it’s worth talking to them. Representation of personal names, for example, is up for revamping, which will be a good thing. I posted something on my blog on how I think naming ought to work in bib data. Interestingly, when I posted the example on the MODS lists, I got a positive comment on it from one of the key people involved in shaping the direction of the MODS metadata model.
As for documents and works, I seem to recall that the FOAF people have looked at the FRBR, which makes a distinction between works, expressions, manifestations, and items. The LibDB project (mentioned above) is based on this, and will be spitting out RDF, so that might be worth looking at. I still have the sense that the FRBR might be too much for serializing the sort of content we deal with, but I did post an example of how it might be useful a couple of weeks ago.
Also, re: the notion of building a translation tool, Chris Putnam has released the (GPL) source for his bibtex/ris/endnote –> mods conversion tools, and my understanding is it wouldn’t be hard to add an additional module to spit out some flavor of RDF.
It could actually be really useful to work with Chris on component-based GPL conversion tools. They’re written in C, BTW.
Re: Other ontologies. I totally agree on the use of other ontologies like DC. I intend to draw on existing well-documented (and standard) ontologies to identify relevant items such as creator and title (DC). I have a feeling that my data model is going to draw on a lot of these existing ontologies and define a method for using them together as a whole to store bibliographic data.
Re: FOAF. FOAF is one such ontology that I have looked at. In looking at FOAF, I see some limitations. It does not appear to support any multicultural issues with person names. For example, in Dutch names a “van Nistlerooy” is alphabetised with the Ns as “van” is so common. Yet in other cultures it would be with the Vs. This is still difficult to sort out. But there are also so many different ways in which a name can be represented. I have come across an XML standard, which I have previously mentioned to Bruce, at http://www.oasis-open.org/committees/ciq/ciq.html. They have developed an XML-based schema for xNAL (eXtensible Name and Address Language) that covers a lot of the issues to do with the various ways of writing person names and addresses. Having contacted them, I understand they are currently developing an RDF schema for xNAL, such that a name could be represented as a simple string or as the sum of as many parts as required. They claim to be able to represent a person name in at least 36 different ways. It is intended to be application independent. This would give any user of my data model the power to use whatever format they require for their own bibliographies instead of restricting them to just a few.
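The alphabetisation problem is easy to show once a name is stored as parts rather than as one string. The following is a rough sketch, not part of FOAF or xNAL: the particle list is illustrative and far from complete, and the function names are made up for the example.

```python
# Illustrative particle list; real-world rules vary by language and by person.
PARTICLES = {"van", "de", "der", "den", "von"}


def split_name(family_name):
    """Split a family name into (leading particles, main part)."""
    words = family_name.split()
    i = 0
    # Always leave at least one word as the main part.
    while i < len(words) - 1 and words[i].lower() in PARTICLES:
        i += 1
    return " ".join(words[:i]), " ".join(words[i:])


def sort_key(family_name, particle_first):
    """Build a sort key that either keeps or ignores leading particles."""
    particles, main = split_name(family_name)
    if particle_first or not particles:
        return family_name.lower()
    return f"{main} {particles}".lower()


names = ["van Nistlerooy", "Smith", "Olsen"]

# Dutch convention as described above: file under N.
dutch_style = sorted(names, key=lambda n: sort_key(n, particle_first=False))
# Convention that files the same name under V.
other_style = sorted(names, key=lambda n: sort_key(n, particle_first=True))

print(dutch_style)
print(other_style)
```

With a single opaque name string, neither ordering can be derived reliably, which is why a structured representation matters for bibliographies.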
Note: I corrected two typos in the oasis link.
On FOAF, the developers are aware of the problem, and there’s a proposal to revamp name representation laid out at http://rdfweb.org/topic/NamesInFoaf. I think those of us thinking about name representation in bibliographic metadata ought to help shape the direction of FOAF in this regard.
XNAL is indeed interesting, in part because it tackles name structures somewhat differently than the orthodoxy would suggest. For example, it has elements for first, last and middle names, which are typically seen as very Western (even U.S.) centric. I think I’ll post an entry in the blog on this.
Also, Richard, with respect to representing hierarchy, why not use DC isPartOf?
Of course! I overlooked the isPartOf dc term. Thanks
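As a rough illustration of how isPartOf could express the nesting, the sketch below writes an article/book/series chain as N-Triples using the DC title and DC Terms isPartOf properties. The `example.org` URIs, the titles and the helper name are placeholders, not taken from any real data set.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"


def to_ntriples(resources):
    """Serialise (uri, title, container_uri_or_None) tuples as N-Triples lines."""
    lines = []
    for uri, title, container in resources:
        # Escape backslashes and quotes so the literal stays valid.
        escaped = title.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{uri}> <{DC_TITLE}> "{escaped}" .')
        if container is not None:
            lines.append(f"<{uri}> <{DCTERMS_IS_PART_OF}> <{container}> .")
    return lines


# Placeholder resources: an article in a book in a series.
resources = [
    ("http://example.org/series/1", "Example Series", None),
    ("http://example.org/book/1", "Example Book", "http://example.org/series/1"),
    ("http://example.org/article/1", "Example Article", "http://example.org/book/1"),
]

ntriples = to_ntriples(resources)
print("\n".join(ntriples))
```

Each level stays a separate resource with its own description, and the hierarchy is carried entirely by the isPartOf links.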
I have an example, by the way, of some development ideas for bibliographic data in RDF. It uses a lot of the DC terms and elements and lacks some of the entity naming structure that I am looking for. I have a namespace “bib” which is designed as a catch-all for elements and predicates that could exist. As yet there is no RDFS for it, but the file does parse with the W3C Validator. It is intended as a first prototype to demonstrate my design ideas. Please feel free to comment on it as harshly (or not so harshly!) as required. Thanks
http://www.macs.hw.ac.uk/~ceerdl/Dissertation/example4.rdf
This will not be linked from my main dissertation webpage until later this week.