Lawsuit

September 27th, 2008

Lawsuit; (a) definition:

What companies often do to smaller competitors when they fear they otherwise cannot compete on the merits of their products.

Make no mistake, this is a nuisance lawsuit designed to intimidate. To quote from the complaint:

A significant and highly touted feature of the new beta version of Zotero, however, is its ability to convert - in direct violation of the License Agreement - Thomson’s 3,500 plus proprietary .ens style files within the EndNote Software into free, open source, easily distributable Zotero .csl files.

So what are they complaining about here?

  1. They say that GMU reverse engineered Reuters’ EndNote software to create Zotero. They cannot possibly demonstrate any evidence to support this claim, as it is not true. What Zotero does in order to read EndNote style files is no different than what, say, OpenOffice.org does to read Microsoft Word binary files. All the Zotero team did was figure out how to map the style files to Zotero’s internal style structure, which has no connection to EndNote, but is in fact based on development work on CSL. Thomson’s complaint, then, has the same merit as if Microsoft were to sue OpenOffice.org for its ability to read .doc files.
  2. I suppose they might have some legal basis for complaining if Zotero distributed EndNote style files (no doubt most of which are developed by EndNote users), but it does not.
  3. Finally, Zotero does not technically convert EndNote style files to CSL files; any conversion happens only internally.

It is my hope that individuals and institutions see this lawsuit for what it is, and that it becomes yet another reason to support Zotero (and other free solutions) and move away from EndNote. It is also my hope that Zotero and George Mason are not intimidated by this, and that they might see some help on the legal front to fight it.

first citeproc-hs release

September 13th, 2008

For those (like me!) that might be interested in using a CSL-based citation and bibliographic formatting system with markdown, Andrea Rossato has announced the first release of his really promising new citeproc-hs. In conjunction with the excellent pandoc, one can now do serious academic writing in markdown, and easily output to a variety of formats: HTML, LaTeX, OpenDocument, etc. (though I’ve not yet tested all these formats; there may well be work to do on some details).

lcsh.info and sparql

July 13th, 2008

Ed recently added a SPARQL endpoint for his lcsh.info site. A simple example query to return the concept URI and label for all concepts whose preferred label matches a particular regular expression (in this case, those that start with “public”):

SELECT ?concept ?label WHERE {
  ?concept <http://www.w3.org/2004/02/skos/core#prefLabel> ?label .
  FILTER regex(?label, "^public", "i")
}
LIMIT 10
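
To run a query like this from a script, one would typically pass it as the `query` parameter of an HTTP GET request, as the SPARQL Protocol describes. What follows is only a sketch: the endpoint URL is a placeholder (the post does not give the real one), and `build_query` and `build_request_url` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Placeholder endpoint, for illustration only; substitute the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"


def build_query(prefix, limit=10):
    """Return a SPARQL query for concepts whose prefLabel starts with `prefix`.

    `prefix` is interpolated as-is, so it should be escaped first if it
    comes from untrusted input or contains regex or quote characters.
    """
    return (
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept <http://www.w3.org/2004/02/skos/core#prefLabel> ?label .\n"
        f'  FILTER regex(?label, "^{prefix}", "i")\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(endpoint, query):
    """Encode the query text as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query("public")
url = build_request_url(ENDPOINT, query)
print(url)
```

The resulting URL can be fetched with any HTTP client; which result formats come back depends on the endpoint.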

I’m integrating some of these concepts into my own RDF and (forthcoming) personal site.

citeproc-rb gem, citeproc-hs

July 4th, 2008

Liam Magee’s been busy trying to finish the Ruby port of citeproc that I started way back. Along that path, a first milestone:

> sudo gem install citeproc-rb
Password:
Updating metadata for 97 gems from http://gems.rubyforge.org/
.................................................................................................
complete
Successfully installed citeproc-rb-0.0.1
1 gem installed
Installing ri documentation for citeproc-rb-0.0.1...
Installing RDoc documentation for citeproc-rb-0.0.1...

While Liam describes it as at best an early alpha release, this is still an accomplishment worth noting.

In related news, Andrea Rossato has been working on a Haskell version, which he plans to integrate into pandoc.

With these two, plus two other in-progress ports (Johan Kool’s Python and Ron Jerome’s PHP), things should be getting interesting. And if these implementations happened to also support RDFa and bibo output, this could enable quite nice round-tripping of citation data.

bibo 1.0

June 5th, 2008

Yesterday, Fred announced the first formal (1.0) release of the Bibliographic Ontology. See there for details.

The primary change from previous drafts is in how we handle contributors. This was a difficult decision, but we decided to split the modeling of roles (editor vs. author vs. translator) from that of order. So, for example, we have a bibo:editor property that is a subproperty of dcterms:contributor, and we also have a bibo:editorList property to record the list proper.

We also added in some structures from related ontologies to handle events like broadcasts.

git as data store

May 16th, 2008

So the following has me intrigued: using git as a data store for a personal website. The Ruby-based git-wiki is a perfect minimalist example of how this could work, and there’s an interesting fork of it that builds on the git infrastructure for things like search.

See also git-python, and Ben O’Steen’s rather different tack on similar questions.

OpenOffice 3.0 Beta and Metadata

May 7th, 2008

The OpenOffice project has announced a first public beta of version 3.0 of the suite.

The most interesting among the list of new features from my standpoint? Easily the powerful new metadata support that will accompany the move to ODF 1.2. I spent a difficult year helping move this pretty ambitious new functionality through the ODF TC at OASIS, so it’s nice to see that it is making it not only into the spec, but also into ODF’s most high-profile implementation.

I don’t believe the new RDF API is in this beta version, but we ought to see it soon enough I imagine. For those that might be curious, the API will just be a wrapper for Redland.

Google Docs Adds (Some) Styles Support

May 6th, 2008

Google Docs has a lot of potential, but I’ve long been complaining about its lack of styles support. Well, Google has made a first step towards resolving this. The new “Edit CSS” support allows a user to pop up a dialog to edit the CSS file directly.

While this is a good first step, there are a number of noted limitations, as well as one obvious one that is not noted: one can only style existing HTML structures like headings, or blockquotes. One cannot do this and have the UI pick up the new style:

h1.title { font-size: 150%; color: blue; }

If Google manages to add that (which at least one of its competitors has had for at least a year), as well as programmatic access to create field-like structures in Google Docs, it might just develop into the serious tool it seems destined to be.

Another, more serious problem is that to really take advantage of the new CSS support, the underlying HTML needs to be pretty clean and correct. Right now, Google Docs creates a lot of crap. For example, I added this rule to my stylesheet:

p + p { text-indent: 0.5px; }

This rule says to indent every paragraph that immediately follows another paragraph, which leaves the first paragraph after another element (say a heading) unindented. When you have clean underlying HTML, it results in really nice, clean-looking output. But it breaks if, as in my example Google Docs document, half of your paragraphs aren’t actually paragraphs, but rather one big paragraph with a bunch of br elements! Grrr …
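
To make the difference concrete, here is a rough sketch that counts how many elements a `p + p` selector would match in clean versus br-laden markup. It assumes well-formed HTML with explicit closing tags, and the class name, helper function, and sample markup are all made up for illustration; it is not how a browser implements selectors.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for this sketch).
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class AdjacentParagraphCounter(HTMLParser):
    """Count <p> elements whose immediately preceding sibling element is a <p>."""

    def __init__(self):
        super().__init__()
        # One entry per open nesting level: the last element seen at that level.
        self.prev_sibling = [None]
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.prev_sibling[-1] == "p":
            self.matches += 1
        self.prev_sibling[-1] = tag
        if tag not in VOID_ELEMENTS:
            # Entering a new nesting level with no children seen yet.
            self.prev_sibling.append(None)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.prev_sibling) > 1:
            self.prev_sibling.pop()


def count_indented(html):
    counter = AdjacentParagraphCounter()
    counter.feed(html)
    counter.close()
    return counter.matches


clean = "<h1>Title</h1><p>One</p><p>Two</p><p>Three</p>"
messy = "<h1>Title</h1><p>One<br><br>Two<br><br>Three</p>"

print(count_indented(clean), count_indented(messy))  # 2 0
```

In the clean version the second and third paragraphs pick up the indent; in the br-separated version nothing matches, so the rule has no visible effect.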

Google App Engine

April 8th, 2008

While Microsoft was busy this weekend sending threatening letters to Yahoo in an effort to buy a real presence on the web, Planet Python is awash this morning in news of Google’s new web app effort; and with damned good reason! This will be a quantum boost for Django, a framework that was already building steady momentum.

Now, if Google could just make it brain-dead easy to integrate such applications with Google Docs, then things might get really interesting.

Author Lists

April 6th, 2008

As Fred and I gear up to finally release a formal first draft of the bibliographic ontology, one of the biggest decisions we need to make is how to represent different kinds of contributions. When you have a single book author, this is easy to do. But there are all kinds of complicated real-world examples that make this a difficult issue to resolve.

Let’s be concrete and look at an example from the journal Nature. We have here an article with 22 contributors. The list of contributors in turn has 12 notes attached to it, which for the most part indicate affiliation, but also group what seem to be primary authors. Finally, after the enumerated notes we have a note that indicates the corresponding author.

So the first question is, how does Nature represent this in a standard legacy format like RIS? Answer: they just have an ordered author list:

TY  - JOUR
AU  - Kleinman, Mark E.
AU  - Yamada, Kiyoshi
AU  - Takeda, Atsunobu
AU  - Chandrasekaran, Vasu
AU  - Nozaki, Miho
AU  - Baffi, Judit Z.
AU  - Albuquerque, Romulo J. C.
AU  - Yamasaki, Satoshi
AU  - Itaya, Masahiro
AU  - Pan, Yuzhen
AU  - Appukuttan, Binoy
AU  - Gibbs, Daniel
AU  - Yang, Zhenglin
AU  - Kariko, Katalin
AU  - Ambati, Balamurali K.
AU  - Wilgus, Traci A.
AU  - DiPietro, Luisa A.
AU  - Sakurai, Eiji
AU  - Zhang, Kang
AU  - Smith, Justine R.
AU  - Taylor, Ethan W.
AU  - Ambati, Jayakrishna

How to do this in a more relational model though; say a relational database, or RDF? Both of these are unordered models.

One option is to simply translate this directly to RDF:

<http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html>
    a bibo:AcademicArticle ;
    dc:creator "Kleinman, Mark E." ;
    dc:creator "Yamada, Kiyoshi" ;
    dc:creator "Takeda, Atsunobu" ;
    dc:creator "Chandrasekaran, Vasu" ;
    dc:creator "Nozaki, Miho" ;
    dc:creator "Baffi, Judit Z." ;
    dc:creator "Albuquerque, Romulo J. C." ;
    dc:creator "Yamasaki, Satoshi" ;
    dc:creator "Itaya, Masahiro" ;
    dc:creator "Pan, Yuzhen" ;
    dc:creator "Appukuttan, Binoy" ;
    dc:creator "Gibbs, Daniel" ;
    dc:creator "Yang, Zhenglin" ;
    dc:creator "Kariko, Katalin" ;
    dc:creator "Ambati, Balamurali K." ;
    dc:creator "Wilgus, Traci A." ;
    dc:creator "DiPietro, Luisa A." ;
    dc:creator "Sakurai, Eiji" ;
    dc:creator "Zhang, Kang" ;
    dc:creator "Smith, Justine R." ;
    dc:creator "Taylor, Ethan W." ;
    dc:creator "Ambati, Jayakrishna" .

This is what Ingenta does in its RSS/RDF feeds. The problem here is that you lose order, and hence relative contribution. You also aren’t treating the authors as full objects, but just dumb strings. You can’t, for example, attach affiliation information to them.
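
The loss of order is easy to demonstrate without any RDF library. In this minimal sketch a graph is modeled as a plain Python set of (subject, predicate, object) tuples, which is a simplification of what a real triple store does, and only the first three authors are used:

```python
# An RDF graph is a set of statements; here each statement is a plain tuple.
ARTICLE = "http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html"

authors_in_order = ["Kleinman, Mark E.", "Yamada, Kiyoshi", "Takeda, Atsunobu"]

# Assert the same creators in two different orders.
graph_a = {(ARTICLE, "dc:creator", name) for name in authors_in_order}
graph_b = {(ARTICLE, "dc:creator", name) for name in reversed(authors_in_order)}

# The two graphs are indistinguishable: nothing records who came first.
same_graph = graph_a == graph_b
print(same_graph)  # True
```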

Another option is an even simpler de-normalized form: a string with a delimited set of author names. In RDF, you’d basically join the creator strings into a single property.

This preserves order, but it doesn’t get you very far. From the data model perspective, the meaning of the data within that string is totally opaque. You can’t, for example, search based on author name without some programming gymnastics.
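
For a sense of what those gymnastics look like, here is a small sketch. The "; " delimiter and the helper names are assumptions for illustration, and the approach only works if the delimiter never appears inside a name:

```python
DELIMITER = "; "  # assumed delimiter; it must never occur inside a name

authors = ["Kleinman, Mark E.", "Yamada, Kiyoshi", "Takeda, Atsunobu"]

# The de-normalized form: one opaque string holding the whole ordered list.
creator_string = DELIMITER.join(authors)


def split_creators(value, delimiter=DELIMITER):
    """Recover the ordered list of names from the joined string."""
    return value.split(delimiter)


def position_of(value, family_name):
    """Return the 1-based position of the first author with this family name, or None."""
    for position, name in enumerate(split_creators(value), start=1):
        if name.split(",")[0].strip() == family_name:
            return position
    return None


print(creator_string)
print(position_of(creator_string, "Yamada"))  # 2
```

Every consumer of the data has to know the delimiter and the "Family, Given" convention in order to do even this much.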

The more normalized form would represent the contributions explicitly. So, imagine a contributions table with foreign key references to both an “agents” or “contributors” table and to the “references” (or whatever) table, plus a foreign key reference to a “roles” table, and an integer column that tracks the “position” within the list. While more complex, this gives some additional advantages, such as being able to distinguish the first three on the list as primary authors, and the rest as secondary. In RDF, a fragment would be:

<http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html>
    a bibo:AcademicArticle ;
    bibo:contribution [
        bibo:contributor [ foaf:name "Kleinman, Mark E." ] ;
        bibo:role bibo_roles:author ;
        bibo:position "1" 
       ]
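
The relational version of the same idea can be sketched with an in-memory SQLite database. The table and column names are my own illustrative choices (the references table is called `refs` here because REFERENCES is an SQL keyword), and only the first four authors are loaded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agents (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE roles  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE refs   (id INTEGER PRIMARY KEY, uri  TEXT NOT NULL UNIQUE);
CREATE TABLE contributions (
    ref_id   INTEGER NOT NULL REFERENCES refs(id),
    agent_id INTEGER NOT NULL REFERENCES agents(id),
    role_id  INTEGER NOT NULL REFERENCES roles(id),
    position INTEGER NOT NULL,           -- place within the list for this role
    UNIQUE (ref_id, role_id, position)
);
""")

uri = "http://www.nature.com/nature/journal/v452/n7187/full/nature06765.html"
conn.execute("INSERT INTO refs (id, uri) VALUES (1, ?)", (uri,))
conn.execute("INSERT INTO roles (id, name) VALUES (1, 'author')")

names = ["Kleinman, Mark E.", "Yamada, Kiyoshi", "Takeda, Atsunobu", "Chandrasekaran, Vasu"]
for position, name in enumerate(names, start=1):
    cursor = conn.execute("INSERT INTO agents (name) VALUES (?)", (name,))
    conn.execute(
        "INSERT INTO contributions (ref_id, agent_id, role_id, position) VALUES (1, ?, 1, ?)",
        (cursor.lastrowid, position),
    )

# Order is now queryable data: pick out the first three authors as "primary".
primary = [
    row[0]
    for row in conn.execute("""
        SELECT a.name
        FROM contributions c JOIN agents a ON a.id = c.agent_id
        WHERE c.ref_id = 1 AND c.position <= 3
        ORDER BY c.position
    """)
]
print(primary)
```

Because position is an ordinary column, affiliation or any other per-contribution detail could be added the same way.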

This has been the agonizing part of designing the new bibliographic ontology. We’ve adopted this last, more normalized approach by adding an explicit Contribution class. The approach gives a whole lot of flexibility, and maps well to a relational database.

But for legacy data and such, I’d expect some developers might want to use the de-normalized approach above. Thankfully, one can always do both. Triples are pretty cheap, after all, and using one form does not negate the other.

I do wonder, though, if perhaps we need to distinguish among different kinds of contribution, so as to make it easier to scope positions within different lists (primary-contributions vs. secondary-contributions, etc.).


Creative Commons License