Posts Tagged ‘Citations’

citeproc-rb gem, citeproc-hs

Posted in Technology on July 4th, 2008 by darcusb – Comments Off

Liam Magee’s been busy trying to finish the Ruby port of citeproc that I started way back. Along that path, a first milestone:

> sudo gem install citeproc-rb
Password:
Updating metadata for 97 gems from http://gems.rubyforge.org/
.................................................................................................
complete
Successfully installed citeproc-rb-0.0.1
1 gem installed
Installing ri documentation for citeproc-rb-0.0.1...
Installing RDoc documentation for citeproc-rb-0.0.1...

While Liam describes it as at best an early alpha release, this is still an accomplishment worth noting.

In related news, Andrea Rossato has been working on a Haskell version, which he plans to integrate into pandoc.

With these two, plus two other in-progress ports (Johan Kool’s Python and Ron Jerome’s PHP), things should be getting interesting. And if these implementations also happened to support RDFa and bibo output, this could enable quite nice round-tripping of citation data.

Author Lists

Posted in Technology on April 6th, 2008 by darcusb – 5 Comments

As Fred and I gear up to finally release a formal first draft of the bibliographic ontology, one of the biggest decisions we need to make is how to represent different kinds of contributions. When you have a single book author, this is easy to do. But there are all kinds of complicated real-world examples that make this a difficult issue to resolve.

Let’s be concrete and look at an example from the journal Nature. We have here an article with 22 contributors. The list of contributors in turn has 12 notes attached to it, which for the most part indicate affiliation, but also group what seem to be primary authors. Finally, after the enumerated notes we have a note that indicates the corresponding author.

So the first question is, how does Nature represent this in a standard legacy format like RIS? Answer: they just have an ordered author list:

TY  - JOUR
AU  - Kleinman, Mark E.
AU  - Yamada, Kiyoshi
AU  - Takeda, Atsunobu
AU  - Chandrasekaran, Vasu
AU  - Nozaki, Miho
AU  - Baffi, Judit Z.
AU  - Albuquerque, Romulo J. C.
AU  - Yamasaki, Satoshi
AU  - Itaya, Masahiro
AU  - Pan, Yuzhen
AU  - Appukuttan, Binoy
AU  - Gibbs, Daniel
AU  - Yang, Zhenglin
AU  - Kariko, Katalin
AU  - Ambati, Balamurali K.
AU  - Wilgus, Traci A.
AU  - DiPietro, Luisa A.
AU  - Sakurai, Eiji
AU  - Zhang, Kang
AU  - Smith, Justine R.
AU  - Taylor, Ethan W.
AU  - Ambati, Jayakrishna

How to do this in a more relational model, though; say, a relational database, or RDF? Both are unordered models: neither rows in a table nor triples in a graph carry any inherent order.

One option is to simply translate this directly to RDF:

<http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html>
    a bibo:AcademicArticle ;
    dc:creator "Kleinman, Mark E." ;
    dc:creator "Yamada, Kiyoshi" ;
    dc:creator "Takeda, Atsunobu" ;
    dc:creator "Chandrasekaran, Vasu" ;
    dc:creator "Nozaki, Miho" ;
    dc:creator "Baffi, Judit Z." ;
    dc:creator "Albuquerque, Romulo J. C." ;
    dc:creator "Yamasaki, Satoshi" ;
    dc:creator "Itaya, Masahiro" ;
    dc:creator "Pan, Yuzhen" ;
    dc:creator "Appukuttan, Binoy" ;
    dc:creator "Gibbs, Daniel" ;
    dc:creator "Yang, Zhenglin" ;
    dc:creator "Kariko, Katalin" ;
    dc:creator "Ambati, Balamurali K." ;
    dc:creator "Wilgus, Traci A." ;
    dc:creator "DiPietro, Luisa A." ;
    dc:creator "Sakurai, Eiji" ;
    dc:creator "Zhang, Kang" ;
    dc:creator "Smith, Justine R." ;
    dc:creator "Taylor, Ethan W." ;
    dc:creator "Ambati, Jayakrishna" .

This is what Ingenta does in its RSS/RDF feeds. The problem here is that you lose order, and hence relative contribution. You also aren’t treating the authors as full objects, but just dumb strings. You can’t, for example, attach affiliation information to them.

Another option is an even simpler de-normalized form: a string with a delimited set of author names. In RDF, you’d basically join the creator strings into a single property.

This preserves order, but it doesn’t get you very far. From the data model perspective, the meaning of the data within that string is totally opaque. You can’t, for example, search based on author name without some programming gymnastics.
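To make the trade-off concrete, here is a minimal Python sketch of the delimited-string form. The "; " delimiter and the has_author helper are assumptions made up for illustration; they don't come from any actual feed or tool.

```python
# Hypothetical delimiter; any sequence that cannot occur inside a name would do.
DELIMITER = "; "

authors = ["Kleinman, Mark E.", "Yamada, Kiyoshi", "Takeda, Atsunobu"]

# De-normalized form: one opaque string that happens to preserve order.
creator_string = DELIMITER.join(authors)


def has_author(creator_string: str, family_name: str) -> bool:
    """Return True if any name in the delimited string has this family name."""
    # The "gymnastics": the string must be split apart before it can be searched.
    names = creator_string.split(DELIMITER)
    return any(name.split(",")[0].strip() == family_name for name in names)


print(creator_string)
print(has_author(creator_string, "Yamada"))
print(has_author(creator_string, "Ambati"))
```

The order survives the round trip, but every consumer has to know the delimiter and the "Family, Given" convention to get anything useful back out.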

The more normalized form would represent the contributions explicitly. So, imagine a contributions table with foreign key references to both an “agents” or “contributors” table and to the “references” (or whatever) table, plus a foreign key reference to a “roles” table, and an integer column that tracks the “position” within the list. While more complex, this gives some additional advantages, such as being able to distinguish the first three on the list as primary authors, and the rest as secondary. In RDF, a fragment would be:

<http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html>
    a bibo:AcademicArticle ;
    bibo:contribution [
        bibo:contributor [ foaf:name "Kleinman, Mark E." ] ;
        bibo:role bibo_roles:author ;
        bibo:position "1"
       ] .
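The same idea can be sketched outside RDF. The following Python fragment is only an illustration of the contributions-table model; the Contribution class, its field names, and the ordered_names helper are hypothetical stand-ins, not part of the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    contributor: str  # stands in for a full agent record (name, affiliation, ...)
    role: str         # e.g. "author"; a real schema would reference a roles table
    position: int     # 1-based place in the published list


# Stored deliberately out of order, as rows or triples might come back.
contributions = [
    Contribution("Takeda, Atsunobu", "author", 3),
    Contribution("Kleinman, Mark E.", "author", 1),
    Contribution("Chandrasekaran, Vasu", "author", 4),
    Contribution("Yamada, Kiyoshi", "author", 2),
]


def ordered_names(items, role):
    """Recover the published order for one role from unordered records."""
    matching = [c for c in items if c.role == role]
    return [c.contributor for c in sorted(matching, key=lambda c: c.position)]


authors = ordered_names(contributions, "author")
# Treat the first three as primary authors and the rest as secondary.
primary, secondary = authors[:3], authors[3:]
print(primary)
print(secondary)
```

Because the position lives on the contribution rather than on the storage order, the list can be rebuilt no matter how the underlying store returns the records.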

This has been the agonizing part of designing the new bibliographic ontology. We’ve adopted the normalized approach by adding an explicit Contribution class. It gives a whole lot of flexibility, and maps well to a relational database.

But for legacy data and such, I’d expect some developers might want to use the de-normalized approach above. Thankfully, one can always do both. Triples are pretty cheap, after all, and using one form does not negate the other.

I do wonder, though, if perhaps we need to distinguish among different kinds of contribution, so as to make it easier to scope positions within different lists (primary-contributions vs. secondary-contributions, etc.).

Drupal, CSL, and Google SOC

Posted in Technology on March 17th, 2008 by darcusb – Comments Off

Ron Jerome has recently started work on a PHP port of CiteProc, for integration with Drupal. This would add the sort of citation processing support one sees in Zotero to Drupal, and potentially any other PHP application.

Having gotten roughly halfway through the port, Ron got busy with other responsibilities, like updating his Biblio module for Drupal 6. So, instead, he’s decided to submit a project for consideration in the upcoming Google Summer of Code. It seems the idea piqued the interest of the right people, and it’s now on the “official” list of project ideas.

So if you’re a student with good PHP skills and interest in contributing in this space, feel free to apply. Or, if you know someone who might fit the bill, urge them to do so. If accepted, I’ll be a co-mentor, along with Ron.

RefDB and Word-Processor Integration

Posted in Technology on March 7th, 2008 by darcusb – Comments Off

RefDB author Markus Hoenicka discusses work he’s been doing on integrating the application with word-processors like OpenOffice. His argument:

Instead of expecting from all m bibliography tools out there to develop plugins for n word processors, thus placing a burden of maintaining m*n interfaces on the community, each word processor should implement a standardized interface to query and retrieve bibliographic data from any number of bibliography tools, which in turn have to support the same interface.

Indeed; I’ve been saying the same thing for years.
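The arithmetic behind that argument is easy to check. This small Python sketch just counts the pieces of glue code under each approach; the function names and the example counts are made up for illustration.

```python
def pairwise_plugins(tools: int, word_processors: int) -> int:
    """One plugin per (bibliography tool, word processor) pair."""
    return tools * word_processors


def shared_interface_adapters(tools: int, word_processors: int) -> int:
    """One adapter per application, each speaking the common interface."""
    return tools + word_processors


m, n = 5, 4  # made-up counts, for illustration only
print(pairwise_plugins(m, n))
print(shared_interface_adapters(m, n))
```

With five tools and four word-processors that is 20 plugins versus 9 adapters, and the gap widens as either number grows.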

Of course, the devil is in the details, and this is a complicated problem. If your goal is to standardize the interface so that it becomes easier for different tools to support a wider range of editors and word-processors, that doesn’t per se solve other problems. For example, it still leaves unanswered:

  1. How are citations encoded in the document?
  2. How are the citations and bibliography processed?

Even more fundamental from a use case perspective, if a user is now free to use different word-processors, are they also free to use different bibliographic data sources? Can they collaborate easily with their co-authors, who may or may not be using the same applications?

Markus’ proposed solution for the standard interface protocol is SRU, which is what we at the OOo bib project have advocated for quite a while. With respect to my questions above, he has chosen to:

  1. use plain text markup within the document (rather than, say, fields) to encode the citation, using local (not global) identifiers
  2. have a script scan the document, with RefDB then outputting a formatted RTF file

The implementation, then, has some limitations. As with Zotero, formatting is essentially specific to a particular application (and perhaps, even, database instance).

While I think Markus is right about the need for a standard interface, I think really solving the issues I note above may well require moving more of the data and formatting logic and processing into the word-processor.

So imagine a Python/Perl/Ruby/Java library that was installed within the word-processor, and whose job it was to read standardized citation fields, match them to embedded (RDF in the case of OOo) source data, and format the fields. So long as compliant applications could send the data in the right form, those documents would then be truly portable.
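As a rough sketch of what such a library's core loop might look like, here is a Python fragment. Everything specific in it is an assumption for illustration: the {cite:KEY} field syntax, the SOURCES dictionary standing in for embedded source data, and the fixed author-date output. A real implementation would read actual word-processor fields and defer formatting to a style.

```python
import re

# Hypothetical field syntax: {cite:KEY} or {cite:KEY1;KEY2}.
FIELD = re.compile(r"\{cite:([^}]+)\}")

# Stand-in for source data embedded in the document.
SOURCES = {
    "doe99": {"family": "Doe", "year": 1999},
    "smith00": {"family": "Smith", "year": 2000},
}


def format_field(match: re.Match) -> str:
    """Render one citation field as an author-date citation."""
    keys = [key.strip() for key in match.group(1).split(";")]
    parts = []
    for key in keys:
        source = SOURCES.get(key)
        # Leave unknown keys visible rather than silently dropping them.
        parts.append(f"{source['family']}, {source['year']}" if source else f"?{key}")
    return "(" + "; ".join(parts) + ")"


def process(text: str) -> str:
    """Replace every citation field in the text with its formatted form."""
    return FIELD.sub(format_field, text)


print(process("As argued before {cite:doe99;smith00}, order matters."))
```

The point of the sketch is the division of labor: the document carries the fields and the source data, so any application that can run the same processing step produces the same result.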

Remembrance Agent for Web 2.0

Posted in Technology on February 27th, 2008 by darcusb – 4 Comments

I’ve mentioned before an idea that Peter Flynn once put in my head. As I write this, I am taking a break from writing a manuscript (that is late!) using NeoOffice and Zotero. It all works fairly well, but I’m struck that the two together feel rather heavy. Consider the workflow if I need to add a citation and associated information:

  1. try to remember where I saw some information; go to Zotero to find it
  2. go back to NeoOffice to add content
  3. go back to Zotero to insert the citation

While each time I do this the process is fairly quick, if you multiply it by a hundred it becomes a significant waste of time. More importantly, it’s a distraction. Writing is hard enough without being distracted by interruptions of this sort.

OK, so back to the idea:

Peter once mentioned Remembrance Agent. A screenshot with its Emacs front-end:

Remembrance Agent

So the idea here is that a service scans the content you are working on and sends it to a backend, which looks through emails, documents, and bibliographic references to find items of potential relevance, presenting them to the user for quick-and-easy access.
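To give a feel for the backend half of that loop, here is a deliberately naive Python sketch that ranks library items by word overlap with the text being written. It is not how Remembrance Agent itself works; the stop-word list, the scoring, and the sample library are all made up for illustration.

```python
import re
from collections import Counter

# Tiny, made-up stop-word list; a real service would need a better one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "is"}


def terms(text: str) -> Counter:
    """Count lower-cased words, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def suggest(draft: str, library: dict, limit: int = 3) -> list:
    """Rank library items by how many draft terms they share."""
    draft_terms = terms(draft)
    scores = {}
    for key, description in library.items():
        shared = draft_terms & terms(description)  # multiset intersection
        score = sum(shared.values())
        if score:
            scores[key] = score
    # Highest score first; ties broken by key so the order is stable.
    return sorted(scores, key=lambda k: (-scores[k], k))[:limit]


library = {
    "ref1": "Citation styles and bibliographic formatting",
    "ref2": "Ordered author lists in RDF",
    "ref3": "Cooking with cast iron",
}
print(suggest("How should author lists be ordered in RDF?", library))
```

Even something this crude shows the shape of the service: the editor sends the current passage, and the backend answers with a short list of candidate sources.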

So here’s my thought: with all of the innovations in new Ajax-y applications, shouldn’t it be possible to do something like this with web applications?

I’m starting to wonder about a nice web editor for academics: something stripped down and simple (but extensible) like the Mac application WriteRoom, that used a simple Markdown-like syntax, and which could plug into a bibliographic service a la Remembrance Agent.

Hmm …

From Proposal to Example: CSL Gallery

Posted in Technology on February 25th, 2008 by darcusb – Comments Off

So rather than just a CSL creation wizard, I realized it might be more sensible to do a full-blown web app. Am not the best coder, but am making some progress. Here’s the list of categories:

CSL Gallery screen 1

Here’s the (start of the) detail view of the APA style:

CSL Gallery screen 2

I’m using Django, which is nice. I got this together—complete with a full admin interface and multi-user authentication backend that comes for free with Django—in a few days. There’s still a lot of work to do (previewing, feeds, actual CSL generation, etc.), but I think this is promising.

So the idea is really an extension of the Zotero CSL repository, where accessing a style by its URI in a browser will give you the HTML view, complete with preview, but requesting it with an alternative CSL mediatype will instead get you the actual XML style file.
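The selection logic for that kind of content negotiation is small. Here is a minimal Python sketch, independent of Django; the media type string and the choose_representation function are assumptions for illustration (the post does not name a media type), and q-value weighting is ignored for simplicity.

```python
# Assumed media type for CSL styles; the post does not name one.
CSL_MEDIA_TYPE = "application/vnd.citationstyles.style+xml"


def choose_representation(accept_header: str) -> str:
    """Return "csl" if the client asks for the CSL media type, else "html"."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()  # drop parameters such as q
        if media_type == CSL_MEDIA_TYPE:
            return "csl"
    return "html"


print(choose_representation("text/html,application/xhtml+xml"))
print(choose_representation(CSL_MEDIA_TYPE + ";q=0.9, text/html"))
```

A browser sending a typical HTML Accept header gets the gallery page, while a citation tool that asks for the style media type gets the XML from the same URI.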

Of course, the real hard part will be in making it really easy for end-users to create new styles. But, I think I have the solution for this: most of the styling work will get handled with pre-assembled macros. In essence, I’ve built the class/table model based on what I outlined in the earlier proposal.

MakeCSL: A Proposal

Posted in Technology on February 13th, 2008 by darcusb – Comments Off

An observation:

CSL is at the stage where the language is virtually stable, and has gone through enough refinement that it has achieved its objective of being a powerful, open and accessible language for encoding citation styling information. It has been implemented fully in Javascript, and there are other implementations in progress for PHP, Python, and Ruby. In addition, styles are being written and deployed in a publicly accessible style repository that allows styles to be accessed directly over HTTP.

However, it remains difficult for the average user to create new styles. There remains a large gap between the number of styles there are and the number there need to be for CSL to be declared a success.

If interested, read more about how I propose to resolve this issue here. The short version is, think MakeBST meets Web 2.0.

Alas, I don’t have the time or skill to do this all myself, so it won’t happen without help. Let me know if you’re interested.

Citations in HTML

Posted in Technology on January 24th, 2008 by darcusb – 5 Comments

So how to mark up citations in standard (X)HTML that conforms to the rigors of scholarly standards? I’ve wondered about this periodically, and have again been wondering about it when looking at the recent (X)HTML5 draft.

Goals: simple, standards-compliant, information-rich. In short, I should be able to author my manuscript using the approach. It also ought to be trivial to add the support to applications, like, say, Google Docs.

How about:

<cite>
  <a href="http://ex.net/1">Doe, 1999</a>;
  <a href="http://ex.net/2">Smith, 2000</a>
</cite>

… then add a bit of CSS:

cite { font-style: normal; }
cite:before { content: "(" }
cite:after { content: ")" }

… and voila:

HTML citation

With this sort of encoding, automatically generating the reference list with, say, Javascript, would be trivial.
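To back up the "trivial" claim, here is a short sketch of that step, written in Python with the standard-library HTML parser rather than in Javascript. The CiteLinkCollector class and reference_list function are hypothetical names, and the output is an unsorted list in order of first citation; a real version would sort and format entries according to a style.

```python
from html import escape
from html.parser import HTMLParser


class CiteLinkCollector(HTMLParser):
    """Collect (href, label) pairs for links that appear inside <cite>."""

    def __init__(self):
        super().__init__()
        self.cite_depth = 0
        self.current_href = None
        self.label = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self.cite_depth += 1
        elif tag == "a" and self.cite_depth:
            self.current_href = dict(attrs).get("href")
            self.label = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.label.append(data)

    def handle_endtag(self, tag):
        if tag == "cite" and self.cite_depth:
            self.cite_depth -= 1
        elif tag == "a" and self.current_href is not None:
            entry = (self.current_href, "".join(self.label).strip())
            if entry not in self.links:  # list each source only once
                self.links.append(entry)
            self.current_href = None


def reference_list(html: str) -> str:
    """Build an ordered HTML list from the links cited in the document."""
    collector = CiteLinkCollector()
    collector.feed(html)
    items = [
        f'  <li><a href="{escape(href)}">{escape(label)}</a></li>'
        for href, label in collector.links
    ]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"


doc = """<p>Text <cite>
  <a href="http://ex.net/1">Doe, 1999</a>;
  <a href="http://ex.net/2">Smith, 2000</a>
</cite></p>"""
print(reference_list(doc))
```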

Potential issues:

What to do about reference lists? They’re quite common across the sciences and social sciences, and in an HTML document, it’s useful to link the in-text citations in some form to their reference entries. But with the approach above, I’m in fact linking directly; bypassing the internal reference.

So one way around this is to use some indirection:

<cite>
  <a href="#do99">Doe, 1999</a>;
  <a href="#smith00">Smith, 2000</a>
</cite>

… and in the reference list:

<li>
  <a id="do99" href="http://ex.net/1">Doe, 1999</a>
</li>

Hmm … can I do that; link to a link?? Seems to work fine in Firefox at least. So that seems like the best approach for at least my workflow.

The other issue is that some fields (notably law and the humanities) often don’t use reference lists, and put all the information in notes. I suppose a hidden reference list is a possibility, but that’s rather awkward for a reader.

MakeCSL: A Proposal

Posted in Technology on January 7th, 2008 by darcusb – Comments Off

The following is the introduction to a proposal document I just checked into the XBib SVN repository. Unfortunately, while I can do a lot on the design end, my Javascript skills aren’t good enough to achieve what I think we need, and I don’t have the time to acquire that level of skill. If someone out there has those skills and the interest to experiment, let me know, or post a note to the xbib dev list.

Anyway, the nutshell of the idea …

CSL is at the stage where the language is virtually stable, and has gone through enough refinement that it has achieved its objective of being a powerful, open and accessible language for encoding citation styling information. It has been implemented fully in Javascript, and there is another implementation in progress for Ruby. In addition, styles are being written and deployed in a publicly accessible style repository that allows styles to be accessed directly over HTTP.

However, it remains difficult for the average user to create new styles. There remains a large gap between the number of styles there are and the number there need to be for CSL to be declared a success.

To rectify this situation, one obvious approach is to build a full-blown editing GUI, which allows a user to load existing styles, modify them, etc. However, such a task is not straightforward. Citation styling can be quite complex, and CSL is designed to accommodate that complexity. While it is certainly possible to do such a GUI, it will take time to realize.

Rather than taking the next step toward a fully implemented editor that can both read and write CSL styles, then, I propose instead a much simpler and more incremental enhancement that borrows from lessons of the past. Like the MakeBST utility that allowed BibTeX users to more easily create new citation styles by answering a series of questions, MakeCSL will make it much quicker for users to create new styles by focusing on writing new styles.

It seems to me such an approach is likely to have the most bang-for-buck in building the infrastructure that will allow the dramatic expansion of the number of freely available styles. Since it could be done using standard web technologies, it should open up the number of potential style contributors. In turn, the lessons learned from it can benefit more comprehensive editing GUIs.

Oh, I did start to put together a somewhat amateurish example of what I have in mind.

2collab

Posted in Uncategorized on December 1st, 2007 by darcusb – Comments Off

Somehow I stumbled on 2collab, a new social bookmarking site for scholars. From the about page of their website:

2collab is a social bookmarking site where you can store and organize your favorite internet resources – such as blogs, websites, research articles, and more. Then, in private or public groups you can decide to share your bookmarks with others – stimulating debate and discussion. Members of groups can evaluate these resources (by rating bookmarks, tagging and adding comments), or add their own bookmarks. You can browse public groups and bookmarks, but must register (your name and email address) to access the full functionality – such as creating groups, adding comments, and adding bookmarks.

FWIW, I haven’t the faintest interest in this service. It’s a publisher-led effort, with no obvious open source development model (e.g. it is not free software), and some worrying terms and conditions. I’d urge potential users and developers to be cautious about supporting it. There are better alternatives. If you’re looking forward to the next generation of such tools designed by and for scholars rather than publishers, look for Zotero 2.0. If you’re looking for a social bookmarking site now, look into CiteULike or Connotea.


Creative Commons License