Posts Tagged ‘Citations’

Opening Up Semantic Translators

Posted in Uncategorized on November 30th, 2007 by darcusb

Alf Eaton has been doing some important work on figuring out how to convert Mozilla-specific JavaScript translators from the Zotero project into cross-browser and server/client-agnostic alternatives.

So let’s see, completely open and flexible …

  • data = RDF
  • identity and authentication = OpenID (and FOAF)
  • citation styling = CSL
  • legacy data translators = Alf’s work

CSL News

Posted in Uncategorized on November 16th, 2007 by darcusb

Those of us working on CSL have been talking for a while about realizing a vision: one in which users never again have to worry about the arcane details of citation styles, and where users can seamlessly access new styles from a diversity of online locations. So imagine two use cases. First:

Jane Doe is a researcher at the University of Gotham. She and her team decide they’d like to submit an article manuscript to the Journal of Nuclear Physics. She thus goes to the journal site. Rather than reading through the style guide for authors, she instead clicks a link. The style information for the journal is loaded into her citation processing software (in this case Zotero, but it could be any software). If the journal makes any corrections or changes to the style, the software transparently updates it.

Second:

John Smith is a graduate student in philosophy. He comes across a repository that aggregates styles from the majority of relevant journals. The main web page for the repository lists a variety of categories. John clicks the “subscribe” link next to his area of interest, and his citation software makes that list of styles available for his use. The styles he activates are transparently updated as they are improved.

Zotero has taken the first step towards this vision by hosting a CSL repository. The styles are now all freely available for anyone to download. More importantly, each style has a URI id, which resolves to the location of the file. This means any tool, knowing the id for that style, can load it for use in formatting. Moreover, once Zotero releases 1.0.2, Zotero users will be able to click on the “install” link beside each style, and it will be available for immediate use.
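To make the “URI id resolves to the file” idea concrete, here is a minimal Python sketch of how a client might install styles by id and keep them transparently updated, as in the two use cases above. The `StyleRegistry` class and its `fetch` callable are hypothetical; the sketch assumes each style file carries its URI in `<info><id>` and a timestamp in `<info><updated>` (as CSL style files do), and it compares timestamps as plain strings, which only works if they share one ISO 8601 format.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

CSL_NS = "http://purl.org/net/xbiblio/csl"


def style_info(style_xml: str) -> Tuple[str, str]:
    """Return the (id, updated) pair from a CSL style's <info> block."""
    root = ET.fromstring(style_xml)
    info = root.find(f"{{{CSL_NS}}}info")
    return info.findtext(f"{{{CSL_NS}}}id"), info.findtext(f"{{{CSL_NS}}}updated")


class StyleRegistry:
    """Hypothetical client-side cache of styles, keyed by each style's URI id."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch  # assumed: resolves a style URI to the style's XML text
        self._styles: Dict[str, Tuple[str, str]] = {}  # id -> (updated, xml)

    def install(self, uri: str) -> str:
        """Fetch a style by URI (e.g. after clicking "install") and cache it."""
        xml = self._fetch(uri)
        style_id, updated = style_info(xml)
        self._styles[style_id] = (updated, xml)
        return style_id

    def get(self, style_id: str) -> str:
        return self._styles[style_id][1]

    def refresh(self) -> List[str]:
        """Re-fetch every installed style; keep copies whose timestamp is newer."""
        changed = []
        for style_id, (updated, _) in list(self._styles.items()):
            xml = self._fetch(style_id)  # the id itself resolves to the file
            _, new_updated = style_info(xml)
            # String comparison assumes both timestamps use the same ISO 8601 format.
            if new_updated > updated:
                self._styles[style_id] = (new_updated, xml)
                changed.append(style_id)
        return changed
```

Because the registry keys on the style's own id, any tool that knows that id can fetch and refresh the same file; nothing in the sketch is specific to Zotero.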

A second related development is that people are really stepping up to help make this vision happen. The award for best contributor of late definitely has to go to Julian Onions, who has been tirelessly doing a lot of the work I should have done over the past year. As a result, we now have a pretty good start on documentation for CSL, and a decent and growing collection of styles.

Finally, in other news, another contributor at the xbib project is working on fleshing out a Ruby implementation of a CSL engine and bibliographic ontology object model, and Peter Hedlund has been working on a CSL editing application.

Now just imagine if a few more people followed these people’s lead!

Resistance to Zotero?

Posted in Uncategorized on October 27th, 2007 by darcusb

A few weeks back, I came across a review of Zotero. In the comments on this very positive review, I found the following:

For Scott, who is not affiliated with an academic institution, Zotero makes a lot of sense because it’s free.

But many IHE readers are affiliated with a college or university. Those folks may want to find out if their library has RefWorks. It’s available free of charge to anyone at the institution — and if you ever leave you can take all of your citations with you.

I won’t turn this comment into an ad for Refworks, which to my way of thinking, does all that Zotero does and more (for example, integrating your citations from your personal database directly into your word documents in any of dozens of citation formats — and with RefShare — scholars or students in a class can start to share their references ). I’ll just suggest that like many other great library resources that are made available to the campus community, faculty should not overlook them — and they should be advocates for encouraging their students to use them as well.

What struck me as odd is that this was not, as I first suspected, a post from a RefWorks representative. That would make some sense. Rather it was from one Steven Bell, who lists himself as “associate university librarian at temple u.” Hmmm …

In a followup, I pointed out some factual errors and differences of opinion on the content of Steven’s comment. I also challenged libraries for paying large licensing fees for what is an inferior product when even a portion of those fees could be directed to Zotero. As I wrote:

Universities spend a lot of money for those licenses. Imagine if instead they invested in truly free solutions like Zotero (which I personally believe is superior to RefWorks in virtually every way; the only exception currently being the lack of server support)?

I then came across an even more bizarre comment from one H. Stephen McMinn:

I was going to enter the comment area and reply to the potential pros and cons of various bibliographic management software, but the level of discourse has discouraged me from even considering it. Zotero has some fine features which other bibliographic management software packages don’t but it also doesn’t have the functionality of others. I really don’t see the need to blast someone because he stated something is free when it is in fact subsidized by the university so it appears free to university community. Can’t we all get along?

Huh? So I was curious: is this another librarian sensitive to critique of vendor products? Well, it turns out, apparently yes!

So I’m just left scratching my head at this. Why on earth would librarians defend costly, limited and closed solutions while subtly taking digs at a project that is arguably better, certainly free, and developed by a group of scholars? It made no sense!

But I just came across another post that helps clarify what I would call the dysfunctional organizational politics of these positions. In a post titled “Zotero proselytizing,” a library and information science student observes the following:

I don’t talk about Zotero too much at work because we subscribe to, and are busy promoting- RefWorks. I feel sorta like a traitor. But in my own research, Zotero has been an absolute godsend. I truly believe students are better off using Zotero, because they can store, annotate, and, if they install on a portable version of Firefox as I have, take their database anywhere, even places without an internet connection. Not to mention, when they graduate, they can take all their research with them and not have to pay $100 a year.

Ah ha! This starts to give some insight. Sounds a little like what I imagine CIA employees skeptical of the “slam dunk” intelligence on Iraq’s WMDs must have felt like before the invasion!

I know from talking to library IT people that most are really psyched about Zotero. Many of them promote Zotero on their blogs, or use it for their own research, and some even hack on it. And clearly library people get the useful innovations that Zotero brings to their users.

But what about this business of feeling like “a traitor” for not promoting the party-line proprietary solution? It’s really a shame, since the only thing this student seems to be betraying in promoting Zotero is a rather narrow-minded organizational groupthink, not their end users.

Aside: it occurs to me that when I use the term “free” in these contexts it may be a little unclear exactly what I mean. I mean it in the “free as in free speech” sense, not simply cost-free.

I don’t think many people realize how crucial bibliographic data is to a scholar. A rather intense frustration can result from feeling that such crucial data is locked into closed products that have a history of glacial innovation. A lot of my interest in data and metadata modeling really comes from having been unable to represent much of my data in applications like Endnote and RefWorks, and not having any faith that their developers would improve those applications to accommodate my needs. With Zotero, by contrast, I know people like Dan Cohen have gone through similar frustrations, and that they will always strive to create a better tool regardless of market considerations. I am also confident that whatever work I directly or indirectly put into Zotero will have positive impacts beyond Zotero.

Social Networking Interop with OpenID

Posted in General on August 13th, 2007 by darcusb

Excellent overview of the possibilities of using OpenID to facilitate seamless exchange of profile information among social networking services and sites. Dare notes the business reasons why this is unlikely to happen in the for-profit world of Facebook, MySpace, et al. In the realm of open bibliographic services like Zotero, CiteULike, and RefBase, on the other hand, it seems a perfect solution.

What is a citation?

Posted in Uncategorized on June 7th, 2007 by darcusb

As part of recent Zotero and ODF discussions of citations, we’ve stumbled on a tricky issue that affects many different pieces of the puzzle of robust and reliable citation formatting. The question for today is: what is a citation?

Let’s examine an example rendered in author-date style:

(Doe, 1999:25; see also Smith, 2000)

So how to model this? What to do with the cited page number, and the “see also” bit? How to encode it so that it’s easy to sort and reformat?

One option is to just say that a citation may have one or more references, each of which can contain parameters for cited pages and prefixes or suffixes. Graphically (think in terms of RDF), this might look like:
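In code, this first model might be approximated with plain Python dataclasses standing in for the RDF graph. This is a minimal sketch: the class and field names (`Reference`, `locator`, `prefix`, `suffix`) are illustrative rather than taken from any real schema, and the renderer handles only the author-date form of the example above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reference:
    """One cited source plus the parameters that belong to this particular citing."""
    author: str
    year: int
    locator: Optional[str] = None  # cited page(s), e.g. "25"
    prefix: str = ""               # e.g. "see also "
    suffix: str = ""


@dataclass
class Citation:
    """A citation is just an ordered list of references."""
    references: list[Reference] = field(default_factory=list)


def render_author_date(citation: Citation) -> str:
    """Render a citation as "(Author, Year:page; ...)"."""
    parts = []
    for ref in citation.references:
        text = f"{ref.author}, {ref.year}"
        if ref.locator:
            text += f":{ref.locator}"
        parts.append(f"{ref.prefix}{text}{ref.suffix}")
    return "(" + "; ".join(parts) + ")"


example = Citation([
    Reference("Doe", 1999, locator="25"),
    Reference("Smith", 2000, prefix="see also "),
])
print(render_author_date(example))  # (Doe, 1999:25; see also Smith, 2000)
```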

This is rather complex; probably more than it needs to be.

Another option is to treat the cited page not as a parameter of some reference abstraction, but to more directly encode what the user intends: namely to cite the item fragment itself.

That’s simpler. Hmm …

Come to think of it, I’m not sure it’s correct; the “see also” prefix isn’t really for the source, but rather for the reference to the source. So maybe we can’t throw out the reference abstraction just yet.

Moreover, we’re left with another, even more fundamental, problem. The “see also” prefix in fact marks a different kind of reference. As such, it gets sorted differently within the citation. So if you have a style that says to sort references within the citation according to author-date, it needs to group such references at the end of the citation, sort within that group, and attach a prefix to the entire group.
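The grouping rule just described can be sketched in Python. The sketch assumes each reference carries a hypothetical `kind` field (`"primary"` or `"see-also"`); references are sorted by group first and by author and year within each group, and the prefix is attached once per group rather than once per reference.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Ref:
    author: str
    year: int
    kind: str = "primary"  # hypothetical reference kinds: "primary" or "see-also"


GROUP_ORDER = {"primary": 0, "see-also": 1}            # see-also group goes last
GROUP_PREFIX = {"primary": "", "see-also": "see also "}


def format_citation(refs):
    """Sort by group, then author-date within each group; prefix each group once."""
    ordered = sorted(refs, key=lambda r: (GROUP_ORDER[r.kind], r.author, r.year))
    groups = []
    for kind, members in groupby(ordered, key=lambda r: r.kind):
        body = "; ".join(f"{r.author}, {r.year}" for r in members)
        groups.append(GROUP_PREFIX[kind] + body)
    return "(" + "; ".join(groups) + ")"


refs = [Ref("Smith", 2000, "see-also"), Ref("Doe", 1999),
        Ref("Adams", 2001, "see-also"), Ref("Brown", 2003)]
print(format_citation(refs))
# (Brown, 2003; Doe, 1999; see also Adams, 2001; Smith, 2000)
```

Note that the input order is irrelevant: the user only has to say which references are “see also” ones, and the style does the rest.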

So a formatting system that fully supports real-world citation styles really ought to understand that this is a different kind of reference; something like:

Admittedly, this third option would require some cleverness to make it all fully intuitive and natural for the user. It would also likely limit the flexibility users are accustomed to in applications like Endnote. However, in those applications users are basically forced to handle all of this themselves, so a small loss in flexibility seems balanced by a fairly big payoff in automation.

Citations and Fields

Posted in Uncategorized on May 26th, 2007 by darcusb

I’ve been having an interesting discussion with people involved in implementing citation processing in Zotero. This is the functionality that allows you to add a citation to a Word or OOo Writer document and have both the citation and the bibliography generated automatically.

They’ve run into a rather large conceptual and practical stumbling block: how to implement note-based citations. If a user adds a citation to the document and it is automatically rendered as a footnote, is that object then a citation in a footnote, or a citation that is simply rendered as a footnote?

Use Cases

Allow me to explain with some use cases:

Basic Case

A user starts a new research paper. They select a footnote-based citation style. They add citations to the document, and each of them is automatically rendered as a footnote.

They then realize they need to use a different citation style, and choose instead an APA in-text author-date style. The footnoted citations are then automatically moved into the text in the proper form.

Complex Case 1

User wants to add a footnote to the document and include one or more citation references in it. They add the footnote, and then add both their commentary and the related citations. If they switch to a non-note-based citation style, this footnote remains a footnote; only the citation rendering changes.

Complex Case 2

User wishes to add commentary about the citations to the note itself (as opposed to the body text). User clicks in the body of the footnote and begins typing. If they switch to a non-note-based citation style, this footnote also remains a footnote.

Discussion

Citations can occur either in the main body text or in notes. Whatever the citation style, citations in notes (or rather their rendering) differ from body-text citations, because they occur in the context of note-based commentary. Their position in the note is thus not an artifact of the citation style, but fundamental to the content. Both the content of that note and its citations will remain in the note regardless.

There is no disagreement about the basic case. We all agree citations should be automatically footnoted in note-based citation styles. This is not some theoretical problem. Some fields use both note-based and in-text author-date styles, and absent automation, users wishing to switch from one to the other would have to manually move every single citation in and out of their notes, a tedious process. We all agree it’s a major shortcoming of existing applications (like Endnote) that they do not manage this issue for their users.

Where we diverge is on implementation details highlighted in the complex cases.

Complex Case 1 illustrates the clear distinction between the two: it is a citation within a footnote, rather than a style-dependent footnoted citation.

Complex Case 2, however, demonstrates a likely case where the user in essence might want to convert a footnoted citation into the first form.
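The distinction between a style-generated footnote and a user-created note can be captured with a single flag. In this minimal Python sketch, the names `CitationField`, `in_user_note`, and `promote_to_user_note` are hypothetical, and the two style classes are loosely modeled on CSL's in-text vs. note distinction. Complex Case 2 corresponds to promoting a generated footnote into a user note once the user types commentary into it.

```python
from dataclasses import dataclass
from enum import Enum


class StyleClass(Enum):
    IN_TEXT = "in-text"
    NOTE = "note"


class Placement(Enum):
    INLINE = "inline in body text"
    GENERATED_NOTE = "footnote generated by the style"
    USER_NOTE = "inside a note the user created"


@dataclass
class CitationField:
    text: str
    in_user_note: bool = False  # True only if the user explicitly put it in a note


def placement(citation: CitationField, style: StyleClass) -> Placement:
    """Decide where a citation is rendered under a given style class."""
    if citation.in_user_note:
        # Complex Case 1: the note is content, so it survives any style switch.
        return Placement.USER_NOTE
    if style is StyleClass.NOTE:
        # Basic case: the footnote is only an artifact of the style.
        return Placement.GENERATED_NOTE
    return Placement.INLINE


def promote_to_user_note(citation: CitationField) -> None:
    """Complex Case 2: typing commentary into a generated footnote makes it user content."""
    citation.in_user_note = True


c = CitationField("Doe, 1999")
print(placement(c, StyleClass.NOTE).name)     # GENERATED_NOTE
print(placement(c, StyleClass.IN_TEXT).name)  # INLINE
promote_to_user_note(c)
print(placement(c, StyleClass.IN_TEXT).name)  # USER_NOTE
```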

So there are two different issues of concern to me:

First, what should the user experience be here when a user would like to add commentary to citations?

Forget about footnotes. Consider short comments in in-text citations. Say I want to write (Doe, 1999; see also Smith, 2000, chapter 2). Can I do this? If so, how? If I do, how do I select the citation source?

Note: my questions above do not necessarily presume any answers. I am asking, though, because users sometimes do use notes in in-text citations.

Second, how should this be encoded in document formats (specifically ODF and OOXML) such that users can be confident of some acceptable level of interoperability in citations across different applications?

The debate we’ve been having touches on both dimensions of the question, but a bit more on the latter. In short, should a citation field in ODF or OOXML be allowed to contain a footnote or endnote, or must the citation always be wrapped in the note?

Allow me to illustrate using the new text:meta-field from the ODF metadata work. Let’s imagine a multi-reference citation with an author-date style. It might be done like so:

<text:meta-field xml:id="citation-1">
  (<text:meta-field xml:id="citation-1-r1">
    Doe, 1999
  </text:meta-field>;
  <text:meta-field xml:id="citation-1-r2">
    Smith, 2000
  </text:meta-field>)
</text:meta-field>

So we have a nested field. These fields are then hooked up (via a binding that uses the xml:id) to some RDF/XML in the file package.
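An application reading this markup would need to find the inner reference fields and look up each one's `xml:id` in the package's RDF. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It adds the ODF text namespace declaration that the fragment above omits, and the `BINDINGS` table is a hypothetical stand-in for the RDF/XML lookup, with made-up item URIs.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id
META_FIELD = f"{{{TEXT_NS}}}meta-field"

# Hypothetical binding table standing in for the RDF/XML in the package:
# xml:id of a reference field -> URI of the cited item.
BINDINGS = {
    "citation-1-r1": "http://example.org/items/doe-1999",
    "citation-1-r2": "http://example.org/items/smith-2000",
}

SAMPLE = f"""<text:meta-field xmlns:text="{TEXT_NS}" xml:id="citation-1">
  (<text:meta-field xml:id="citation-1-r1">
    Doe, 1999
  </text:meta-field>;
  <text:meta-field xml:id="citation-1-r2">
    Smith, 2000
  </text:meta-field>)
</text:meta-field>"""


def reference_fields(xml_text):
    """Return (xml:id, bound item URI, display text) for each nested reference field."""
    root = ET.fromstring(xml_text)
    return [
        (el.get(XML_ID), BINDINGS.get(el.get(XML_ID)), "".join(el.itertext()).strip())
        for el in root.iter(META_FIELD)
        if el is not root  # skip the outer citation field itself
    ]


for row in reference_fields(SAMPLE):
    print(row)
```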

To a user, this would display like:

(Doe, 1999; Smith, 2000)

They could individually select the references, which would be read-only.

So now: what happens if the user changes to a note-based style?

My argument is that because the footnote/endnote rendering is only an artifact of the processing, and does not reflect a user’s explicit choice, the XML encoding should reflect this by including the footnote within the outer field; something like:

<text:meta-field xml:id="citation-1">
  <text:note>
    <text:meta-field xml:id="citation-1-r1">
      Doe, 1999, Some Title, New York:ABC Books.
    </text:meta-field>;
    <text:meta-field xml:id="citation-1-r2">
      Smith, 2000, Some Other Title, London:XYZ Books.
    </text:meta-field>
  </text:note>
</text:meta-field>

The only time a citation should be contained within a note is when a user explicitly chooses to do so.
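This encoding rule (the outer citation field always wraps a style-generated note) can be sketched with ElementTree. It is a simplified illustration under stated assumptions: a real ODF `text:note` also needs `text:note-citation` and `text:note-body` children, which are omitted here just as in the snippet above, and punctuation such as parentheses and semicolons is left out. The function name `citation_field` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("text", TEXT_NS)  # serialize with the familiar text: prefix


def citation_field(cid, ref_texts, note_style):
    """Build a citation field; a style-generated note goes *inside* the outer field."""
    outer = ET.Element(f"{{{TEXT_NS}}}meta-field", {XML_ID: cid})
    # The note is an artifact of the style, so it never wraps the citation.
    container = ET.SubElement(outer, f"{{{TEXT_NS}}}note") if note_style else outer
    for i, text in enumerate(ref_texts, start=1):
        ref = ET.SubElement(container, f"{{{TEXT_NS}}}meta-field", {XML_ID: f"{cid}-r{i}"})
        ref.text = text
    return outer


footnoted = citation_field("citation-1", ["Doe, 1999", "Smith, 2000"], note_style=True)
print(ET.tostring(footnoted, encoding="unicode"))
```

Switching styles then only changes what sits inside the outer field; the field's `xml:id`, and so its binding to the RDF, stays the same.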

So the questions are, I suppose:

  1. Does this make sense from a user-experience and document-encoding perspective?
  2. Can this be implemented such that we can, at some point in the not-too-distant future, have interoperability across different editing and bibliographic applications?

To be more concrete, when MS adds support for note-based citations, how will they encode them in OOXML? When OOo developers add support for the new metadata field and citations, how will they do it?

[update: fixed some minor typos]

BibMe

Posted in Uncategorized on May 17th, 2007 by darcusb

I was wondering when someone would finally apply best-of-breed contemporary web design approaches to the realm of citations. Well, BibMe does just that: AJAX and Ruby on Rails underpinnings and a gorgeous interface. What more could a time-strapped undergraduate want?

There are lessons here for applications aimed at more professional scholarly users. Consider how clean and simple it is to enter a book. The default interface allows you to enter a title or ISBN:

[Screenshot: default book interface]

So I enter my book title, and some AJAX magic quickly brings up a results-list without loading a new page:

[Screenshot: results list]

Finally, I choose the correct item, and it gives me the pre-filled metadata:

[Screenshot: filled form]

If that auto-fill stuff doesn’t work, a quick JS-enabled flip to the “manual” form yields this:

[Screenshot: manual book entry form]

Nice; this is how all online bibliographic managers ought to work!

My only real critique (and it is fairly minor) is that they didn’t OpenID-enable the service.

And looking farther out, it’s really unfortunate that we’re faced with two levels of application: the simple, more-or-less manual citation process of BibMe and others, and the more robust and automated integration of Endnote, Zotero, BookEnds, etc. As word processors like Word, OpenOffice, Google Docs, etc. and their file formats (OOXML, ODF) start to get real citation support, though, this awkwardness ought to go away, and we can have richer, more automated and more interoperable solutions.

Note: BibMe prefills “2005” as the year for my book. While it is strictly true that it was published in 2005, the copyright date is actually 2006. I think this would result in a technically incorrect bibliographic entry, then (though I have seen others cite it this way, so who knows?).

Zotero and the Bazaar: What Zotero Should Learn From Successful Open Source Projects

Posted in Uncategorized on January 27th, 2007 by darcusb

For a while now I’ve been watching the Zotero project, and admiring their ability to deliver a compelling application that users (including me) have been hungering for. I appreciate that they have built the tool around one of the preeminent open source applications (Firefox), and that they have relied on open standards and development tools where possible. Finally, the code is all freely available and unencumbered. In all these ways, the Zotero team has gotten it right.

But … is this really enough to take it to the next stage?

Consider that there is no community involvement in Zotero development planning. The Zotero coders add features, based on their own internal assessment, adopting their own solutions, without wider public input or discussion. Third-party developers find out about what they are doing only after they release the code as more-or-less a fait accompli.

The latest news about word-processor integration is a perfect case in point. As a co-project lead for the OpenOffice bibliographic project, I’m thrilled to see them say elsewhere that they intend to support similar functionality there, but also perplexed that they plan to do this without apparent consultation with our project. I can only assume this because I have yet to see any evidence to the contrary.

Put simply, Zotero looks more and more like a proprietary project dressed in free software clothing.

My problem with this does not just reflect some hopeless idealism, though. The issues of process and governance have practical consequences. The Zotero dev list has been essentially dead since its opening, which I have to believe is a consequence of the fact that developers simply do not feel included in the process.

And Zotero needs outside developers. Innovation in this space depends on standardization of data formats and representations, document encoding, APIs, and so forth. And a lot of code ought to be able to be shared between projects. Without the collaboration that makes that possible, users will end up boxed into the same kind of corner that truly proprietary applications like Endnote have long painted us into.

Free software is not just about cost; it’s about an alternative development methodology. Eric Raymond famously described traditional closed development models as cathedral-like, and the free software revolution inaugurated by Linux as a bazaar. Raymond makes clear that the cathedral approach has its place, particularly in the early stages of a project. But beyond that, the advantages of truly open development are so compelling that they are hard to avoid. Indeed, Mozilla and Firefox are themselves a product of this methodology. That Zotero exists at all is because they have smartly built off of this free software infrastructure. Yet one gets the sense of a project at the portals of the cathedral gazing out at the bazaar, but not yet ready to step out the door.

Now—as they look to transition from a compelling first release to more advanced functionality—is a perfect time for the Zotero team to move from the hermetic world of the cathedral, to the open world of the bazaar. If done right, it will get us all where we want to go more quickly, and with better results. This has to mean cultivating a more collaborative and interactive community, particularly with developers. It has to mean publicly documenting and discussing what they want to do before they do it, so that other developers can give useful feedback, and in turn plan for forthcoming changes.

I should add that I hesitated to post this. But I’ve already gently pushed on this in both public and private, on more than one occasion, without much effect (though they did open up their code repository). I present this, then, in the spirit of constructive criticism.

System-Level Citation Services

Posted in Uncategorized on October 23rd, 2006 by darcusb

Been talking with Alf Eaton (now at Nature) and Thomas Zander (of KWord) about what it would take to make it easy for different bibliographic applications to plug into word processors: OOo, KWord, AbiWord, Google Docs, etc. Thomas put up a couple of use cases to help think this through.

I think it’s clear we’d need:

  1. standard citation IDs; URIs
  2. standard in-document citation fields, which hold URIs + local parameters
  3. some kind of system service or API

An idea that emerged from this discussion is a system-level bot that could take a list of URIs from a client, and then return the formatted strings and/or raw metadata. This bot would first look in a local database, and if the items were not present, could then go out on the net and find the relevant metadata.
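The lookup order of such a bot can be sketched in a few lines of Python. Everything here is hypothetical: `CitationService`, the in-memory `local_db` dict, and the `remote_lookup` and `formatter` callables stand in for a real metadata store, network resolver, and citation formatter, none of which the discussion has settled on.

```python
from typing import Callable, Dict, List, Optional

Metadata = Dict[str, str]


class CitationService:
    """Hypothetical system-level citation bot: local database first, network second."""

    def __init__(self, local_db: Dict[str, Metadata],
                 remote_lookup: Callable[[str], Optional[Metadata]],
                 formatter: Callable[[Metadata], str]):
        self.local_db = local_db            # e.g. the user's own bibliographic database
        self.remote_lookup = remote_lookup  # assumed: queries a network source, None if not found
        self.formatter = formatter          # assumed: renders metadata in the current style

    def metadata(self, uris: List[str]) -> Dict[str, Metadata]:
        """Return raw metadata for every URI that can be resolved."""
        found = {}
        for uri in uris:
            meta = self.local_db.get(uri)
            if meta is None:
                meta = self.remote_lookup(uri)
                if meta is not None:
                    self.local_db[uri] = meta  # cache so the next request stays local
            if meta is not None:
                found[uri] = meta
        return found

    def formatted(self, uris: List[str]) -> Dict[str, str]:
        """Return formatted strings for every resolvable URI."""
        return {uri: self.formatter(meta) for uri, meta in self.metadata(uris).items()}
```

A client (say, a word-processor plug-in) would only ever deal in URIs; unresolvable URIs are simply left out of the result, so the client can flag them to the user.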

Still to be settled, I think, is how to allow a tighter coupling of editor and service (for example, to be able to browse citations from within the editor), and the precise technology to make this all happen. It would be nice to see a generic mechanism for this, though, so that when new applications like Zotero come on the scene, they can instantly plug into this infrastructure.

XSLT 1.0 + EXSLT

Posted in Uncategorized on October 9th, 2006 by darcusb

If I (or, ahem, preferably someone else) were going to port my XSLT 2.0 version of CiteProc to 1.0, this is a hint of how to do it. As in the current version, use custom functions to do the heavy lifting, and keep the templates as clean as possible. It’s really not possible to do without support for EXSLT, though.


Creative Commons License