Archive for January, 2004

HubMed: RDF and Academic Literature

Posted in Uncategorized on January 20th, 2004 by darcusb – Comments Off

OK, one of the coolest things I’ve seen is HubMed. This presents an easy-to-use interface to PubMed data. One of the most interesting aspects of HubMed is that you can create an RSS feed of your chosen query.

Way-cool feature number two is graph-based display of related literature; precisely the sort of thing RDF tools are well-suited to. Here’s a screenshot:

The graph, incidentally, is not static; it can be transformed on the fly. If you hover over a citation, it lights up and offers a pop-up button you can click for more info (abstract and so forth).

Very nice job!

Now, if only I had something like this for my literature!

Creating an Endnote(/BibTeX)-Killer

Posted in Uncategorized on January 20th, 2004 by darcusb – 5 Comments

Art Rhyno has an interesting blog on library-oriented technology, and some thoughts on some of the pieces necessary for an Endnote-killer.

Having thought about this deeply for the past year or so, I think it best to start fresh. Certainly it is important to learn from existing solutions, but it would be a mistake to think there are not better ideas floating around.

My key argument would be that it is time to get rid of monolithic applications. Like Word, Endnote is a monolithic application: it is a database for reference data, an online search and query tool, and a citation formatter. These functions need not all be included in a single application, and we’ll get where we need to go more quickly and with greater flexibility if we not only recognize that, but exploit it.

So here are the pieces we need:

  1. A formatting engine independent of any particular data model. Whether the thing is written in Perl, or C, or Java, or – as Bibliofile (soon to be renamed BiblioX) is – XSLT, the sole job of this engine ought to be to suck in XML data, and spit out formatted citations. The formatting style specification must itself be XML. Aside from that it ought to be open to any format for which one can write an input driver. If anyone has XSLT skills and is willing to contribute, Bibliofile is the best place to start.
  2. Rich metadata serialized in XML and representable in RDF. BibTeX is not the place to start here; its data model is simply wrong. Likewise, the data model in applications like Endnote is also broken.
  3. An online query/download utility. Like the above, this should focus on acquiring metadata in XML if possible, and passing it to other applications and tools.

All of the above have the common theme that they are small and task-focused, and based on open metadata. They also must be open source. Different projects – OpenOffice, Chandler, etc. – can contribute unique end-user experience by drawing on those tools.
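To make the first piece concrete, here is a minimal sketch of a formatting engine whose only job is to take XML data plus an XML style specification and return a formatted citation. It is written in Python rather than XSLT purely for illustration. The record format, the style format, and the `format_citation` function are all hypothetical; none of them reflects how Bibliofile actually works.

```python
import xml.etree.ElementTree as ET

# Hypothetical input record; the element names are illustrative only.
RECORD = """
<record>
  <author>Smith, John</author>
  <year>2000</year>
  <title>An Example Article</title>
  <container>Journal of Examples</container>
</record>
"""

# Hypothetical style specification, itself XML: an ordered list of fields,
# each with optional literal text to put before and after the value.
STYLE = """
<style name="author-date">
  <field name="author" suffix=" "/>
  <field name="year" prefix="(" suffix="). "/>
  <field name="title" suffix=". "/>
  <field name="container" suffix="."/>
</style>
"""


def format_citation(record_xml: str, style_xml: str) -> str:
    """Render one record according to the field order given in the style."""
    record = ET.fromstring(record_xml.strip())
    style = ET.fromstring(style_xml.strip())
    parts = []
    for field in style.findall("field"):
        value = record.findtext(field.get("name"))
        if not value:
            continue  # skip fields the record does not carry
        parts.append(field.get("prefix", "") + value.strip() + field.get("suffix", ""))
    return "".join(parts)


print(format_citation(RECORD, STYLE))
```

The point of the sketch is the separation: swapping in a different style document changes the output without touching either the data or the engine.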

MT Design

Posted in General on January 19th, 2004 by darcusb – Comments Off

OK, after complaining about the MT templates, I finally figured it out. I just copied a rendered HTML page into an editor, added the CSS code directly to the file, then modified it until it was more-or-less what I wanted, and then pasted the code back into MT. Not so bad after all. I’ll probably be tweaking this over time, as I’m not quite satisfied with it yet.

BTW, for people using Safari on Panther, you’ll notice I’m taking advantage of the CSS “text-shadow” property for titles.

OpenOffice: Unrealized Opportunity?

Posted in Uncategorized on January 17th, 2004 by darcusb – 1 Comment

Quite a while back, David Wilson announced a project to revamp the bibliographic support in OpenOffice. This project presents a huge opportunity to add state-of-the-art bibliographic and citation support to the most significant open source alternative to MS Office.

In some ways, the project has come a long way. The key participants in the discussion have agreed on the basic functional requirements, which include a significantly improved data model that can support MODS, a desire to exploit XML and XSLT throughout, etc. If realized, the project’s vision, put simply, would make OpenOffice not only a free alternative to MS Word and Endnote, but a superior one.

In other ways, however, the project has languished a bit for lack of dedicated programming talent. Thankfully, one programmer has volunteered to tackle coding of a MODS-friendly entry UI. This is fantastic! But, we need more programmers. It would be really nice if:

  1. Sun realized how important this project is to its prospects in higher ed, and dedicated some programming talent to realizing it. Not only will it draw in new users, but it can show-case emerging XML-related technologies in OpenOffice.
  2. Higher-ed institutions themselves realized it’s in their interests to have open source alternatives. These can cut costs for institutions struggling with budget cuts, and they can also put some competitive pressure on the commercial vendors. Is it completely out of the question that a few institutions could dedicate some programming talent to OpenOffice as well?

More on People, Names and RDF

Posted in Uncategorized on January 12th, 2004 by darcusb – Comments Off

I’m new to RDF, so just trying to get a sense of the landscape. I fully get Hamish’s point about the utility of being able to integrate, for example, bibliographic metadata with contact metadata. A good representation of a person ought to work in both contexts; right?

However, this leaves an obvious question: to use FOAF or use vCard?

Earlier I posted an example where I wanted to be able to link quotes with encapsulated metadata to a person contact record. Let’s say I read a newspaper article with some interview content I want to cite. I want to attach a note to the person record that says not only what this person’s name is and where they are from, but also a little bit about them, including in what record they were referenced.

How best to do this in RDF (or even XML, perhaps using xlink), and with which of the person specs?

Parts, Details, Lists

Posted in Uncategorized on January 11th, 2004 by darcusb – Comments Off

Hamish asked for some more info.

On parts in MODS, here’s an example:

    <relatedItem type="host">
      <titleInfo>
        <title>Journal of Interdisciplinary History</title>
      </titleInfo>
      <typeOfResource>text</typeOfResource>
      <originInfo>
        <dateIssued>2000</dateIssued>
        <issuance>continuing</issuance>
      </originInfo>
      <genre>periodical</genre>
      <part>
        <detail type="volume"><number>31</number></detail>
        <detail type="issue"><number>2</number></detail>
        <extent unit="page">
          <start>259</start>
          <end>260</end>
        </extent>
        <extent unit="paragraph">
          <list>21, 24</list>
        </extent>
      </part>
    </relatedItem>
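For anyone who wants to see how a tool might consume this, here is a rough Python sketch that pulls the details and extents out of the part element. It assumes a namespace-free fragment like the one above (real MODS documents use the `http://www.loc.gov/mods/v3` namespace, which the lookups would need to account for), and `read_part` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# The <part> element from the MODS example, without a namespace.
PART = """
<part>
  <detail type="volume"><number>31</number></detail>
  <detail type="issue"><number>2</number></detail>
  <extent unit="page">
    <start>259</start>
    <end>260</end>
  </extent>
  <extent unit="paragraph">
    <list>21, 24</list>
  </extent>
</part>
"""


def read_part(part_xml):
    """Return (details, extents) dictionaries for one MODS-style <part>."""
    part = ET.fromstring(part_xml.strip())
    # detail elements are keyed by their "type" attribute
    details = {d.get("type"): d.findtext("number") for d in part.findall("detail")}
    # extent elements are keyed by their "unit" attribute
    extents = {}
    for extent in part.findall("extent"):
        unit = extent.get("unit")
        listed = extent.findtext("list")
        if listed is not None:
            # a list of discrete points, e.g. paragraphs 21 and 24
            extents[unit] = [item.strip() for item in listed.split(",")]
        else:
            # a contiguous range with a start and an end
            extents[unit] = (extent.findtext("start"), extent.findtext("end"))
    return details, extents


details, extents = read_part(PART)
print(details)
print(extents)
```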

As for the DocBook citation proposal … hmm, I’m not sure if the current version is posted anywhere. The original RFE is here, though it’s dated.

The substance of it is a new element called biblioref. Example:

<citation><biblioref linkend="cite-ID" unit="page" start="19" end="20"/></citation>

I argue that page ought to be understood as default by processors, and the proposal assumes a single point would simply have a start but no end attribute. Perhaps not ideal from a pure meta perspective, but you can’t be too puritanical when dealing with document markup. The bibliospec element in the original proposal was also dropped as too verbose.
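Here is a small sketch, in Python, of how a processor might apply those two rules: page as the default unit, and a single point expressed as a start with no end. The label table and the `locator` function are assumptions made for illustration and are not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical labels (singular, plural); a real style would supply its own.
LABELS = {"page": ("p.", "pp."), "paragraph": ("para.", "paras.")}


def locator(biblioref_xml):
    """Turn a biblioref element into a short locator string."""
    ref = ET.fromstring(biblioref_xml)
    unit = ref.get("unit", "page")  # page is assumed when no unit is given
    start, end = ref.get("start"), ref.get("end")
    if start is None:
        return ""  # no locator: the citation points at the whole work
    singular, plural = LABELS.get(unit, (unit, unit + "s"))
    if end is None or end == start:
        return f"{singular} {start}"  # a single point has a start but no end
    return f"{plural} {start}-{end}"


print(locator('<biblioref linkend="cite-ID" unit="page" start="19" end="20"/>'))
print(locator('<biblioref linkend="cite-ID" start="19"/>'))
```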

Modeling Bibliographic Data in RDF (round 2)

Posted in Uncategorized on January 11th, 2004 by darcusb – 1 Comment

Hamish continues with round 2 of his RDF experiment, and offers these comments. I’ll respond now before things get crazy next week.

I am unconvinced by the collation of publisher and place. I think this is an artefact of how information is used in citation, rather than a reflection of the structure of reality (whatever that is …) (are the address of a publisher and the place where a speech is delivered structurally similar in terms of citation, even?).

The way I look at this issue is as follows:

originInfo represents the source of a given object: where it came from and when.

Strictly speaking, there is a difference between the place of a publisher and a place a speech was given, but that difference doesn’t strike me as particularly significant, at least not with respect to citation formatting.

Still, I have argued that MODS needs some notion of an “event,” as distinct from a physical object. It could be useful, then, to distinguish an event place (where something happened) from a publisher place (where a published object came from).

On part numbers:

Those starts and ends still need to be generalised. I wonder if anyone has done any work on a “parts of things” vocabulary?

MODS distinguishes between “detail” and “extent” (the latter is what Hamish wants to call “range,” though I think it is slightly broader). An extent can contain “start,” “end,” and “list” elements. Somewhat strangely, MODS uses a “type” attribute on detail and a “unit” attribute (which sounds better to me) on extent to indicate the same thing: volumes, issues, pages, etc. Our DocBook citation proposal drew on this as well, using “unit” and “start” and “end” attributes.

Now here’s what I’m talking about

Posted in Uncategorized on January 11th, 2004 by darcusb – 2 Comments

Hamish Harvey has an interesting post on RDF and bibliographic metadata, complete with an example.

Let’s take a look:

    <biblio:Publication rdf:nodeID="oshinfDAbbott">
        <dc:title>Discussion of The relevance of Open Source to Hydroinformatics by Hamish Harvey and Dawei Han</dc:title>
        <biblio:author rdf:nodeID="abbott" />
        <biblio:availableFrom rdf:resource="http://www.iwaponline.com/jh/004/jh0040219.htm" />
        <biblio:partOf rdf:nodeID="jhinf5.3" />
        <biblio:startPage rdf:datatype="http://www.w3.org/2001/XMLSchema#int">203</biblio:startPage>
        <biblio:endPage rdf:datatype="http://www.w3.org/2001/XMLSchema#int">206</biblio:endPage>
        <!-- Something more specific than "cites" (such as "is discussion
        of") would be better here. Just imagine that publishers could be persuaded
        to generate this information. Then when you find an interesting article
        you could easily search for later discussion! Then again, publishers could
        easily add a forward hyperlink from an article to later discussion now, but
        don't necessarily do so. -->
        <biblio:cites rdf:nodeID="oshinf" />
    </biblio:Publication>

I like this, but do think Hamish could draw more on DC, use vCard for names (as he suggests himself), and with respect to page ranges:

  1. wrap them in the “partOf” element where they belong
  2. make them more generic (for legal scholars or film scholars, who might not only deal with pages)

The “cites” element, it seems to me, can be captured in MODS with the relatedItem “isReferencedBy” structure. DocBook has something similar. Doesn’t DC as well?

Regarding MODS, he observes:

This close relationship with MARC appears to carry over a lot of cruft, and I fear that MODS as a result is just too confusing to see widespread adoption in personal tools.

There is indeed a lot that is unnecessary for personal bibliographic management in MODS. Still, it seems to me better to build around something that offers too much, than too little. The key structures in MODS could easily be implemented in a nice clean RDF model and/or XML Schema:

  1. name with role (broader than just “author”)
  2. titleInfo
  3. originInfo (broader than “publisher”)
  4. location (for online and archival holding information)
  5. relatedItem (absolutely key for representing hierarchy)
  6. part (subelement of relatedItem, can capture all manner of details associated with parts: volume and issue numbers, but also paragraphs, sections, time segments, etc.)
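As a rough illustration of how small that core is, here is one way the six structures might be modeled, sketched as Python dataclasses rather than as an RDF model or XML Schema. The class and field names are my own simplifications and are not taken from the MODS schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Name:
    family: str
    given: str
    role: str = "author"  # role is broader than just "author"


@dataclass
class Part:
    # details such as volume and issue numbers, keyed by type
    details: dict = field(default_factory=dict)
    # extents such as page or paragraph ranges, keyed by unit
    extents: dict = field(default_factory=dict)


@dataclass
class Item:
    title: str                                        # titleInfo
    names: List[Name] = field(default_factory=list)   # name with role
    date_issued: Optional[str] = None                 # originInfo
    publisher: Optional[str] = None                   # originInfo
    place: Optional[str] = None                       # originInfo
    location: Optional[str] = None                    # online or archival holding
    host: Optional["Item"] = None                     # relatedItem type="host"
    part: Optional[Part] = None                       # position within the host


journal = Item(title="Journal of Interdisciplinary History", date_issued="2000")
article = Item(
    title="An Example Article",  # placeholder title
    names=[Name(family="Smith", given="John")],
    host=journal,
    part=Part(details={"volume": "31", "issue": "2"},
              extents={"page": ("259", "260")}),
)
print(article.host.title, article.part.details["volume"])
```

Because `host` is itself an `Item`, the same structure can nest as deeply as needed (an article in an issue in a journal, a chapter in a book in a series).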

A good RDF representation of bibliographic metadata might thus start with DC, and draw in structures from MODS where needed. I think Hamish is on the right track.

I’ve started to put together a document that shows the structural relationships that define citation formatting. I’ll add to it over time.

My argument is that the structure of citations closely relates to the structure of MODS metadata (which itself is influenced by DC).

Note: you need a good CSS browser to view this no doubt.

Hamish also mentioned the tricky name issue:

Just here is an example of the above mentioned conflict. Instead of author names as strings, I record a reference to a node representing that author, attached to which are details of the authors name. When I need to generate a reference list, how can I ensure that the author’s name is rendered correctly in each instance? Somehow with the link from publication to author needs to be recorded the way the name was listed on the publication itself (Hamish Harvey, or H. Harvey?)

I don’t think this is quite right. MODS has parsed names, allowing family, given, and terms of address (president, etc.). It also has a displayForm element, which could be used to handle full unparsed names, e.g.:

John Q. R. Smith, III

The trick is that this field ought to store the full name not as it is rendered in some journal style (which often bizarrely mangles names for absolutely no reason), but rather as the authors themselves intend it to be represented.

The job of handling hugely variable abbreviation for output formatting is really for the formatting engine and specification; not the metadata. If I record an article from one journal and cite it in another with a different style, the spec for the latter is what determines how to render the name.
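A minimal sketch of that division of labor, in Python: the metadata stores the parsed name once, and a style setting decides how it is abbreviated on output. The style names and the `render_name` function are invented for this example.

```python
def render_name(family, given, style):
    """Render one stored name according to a (hypothetical) style setting."""
    if style == "full":
        return f"{given} {family}"
    # reduce each given-name part to an initial, e.g. "John Q. R." -> "J. Q. R."
    initials = " ".join(part[0] + "." for part in given.split())
    if style == "initials-first":
        return f"{initials} {family}"
    if style == "family-first":
        return f"{family}, {initials}"
    raise ValueError(f"unknown name style: {style}")


# The same stored name, rendered under three different styles.
for style in ("full", "initials-first", "family-first"):
    print(render_name("Harvey", "Hamish", style))
```

The record never needs to know which journal it will be cited in; only the style changes.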

BibTeX, Endnote, RIS to MODS Conversion Tools

Posted in Uncategorized on January 10th, 2004 by darcusb – Comments Off

Say you have years of bibliographic data locked up in existing formats such as BibTeX, or in commercial applications like Endnote or Reference Manager. How do you convert that data to a format like MODS, or vice versa?

Chris Putnam has been working on just this issue, revamping his suite of bibliographic conversion tools to use MODS as his intermediate XML format. In the process, Chris has significantly improved the tools as well.

Alpha binaries available here, with a new release “coming soon.”

MODS eXist example

Posted in Uncategorized on January 8th, 2004 by darcusb – Comments Off

The latest snapshot of eXist includes an XQuery-based MODS example, available at:

http://localhost:8080/exist/xquery/mods.xml

I’ve not had time to test it, but this is nice!

Update: oops, seems the stylesheet to render the detailed view of individual records hasn’t been included yet.


Creative Commons License