Zotero Groups and Teaching

Posted in Teaching, Technology on September 24th, 2009 by darcusb – Comments Off

Like Sean Takats, I’ve been experimenting with using Zotero’s new groups functionality in a graduate seminar I teach. Here’s a quick report.

The course in question is a beginning seminar required of all grad students in my department (though this year I also have someone from history). Its purpose is to introduce them to the history of the discipline (geography), and to give them basic skills to analyze the development of literatures in more focused subfields.

The course involves weekly readings and reading responses. In the past, students posted the reading responses to a course listserv. The major product of the term is a literature review paper on the evolution of a subfield.

So my initial plan was:

  1. set up a private Zotero group for the course
  2. create collections for different broad topics, as well as weekly topics
  3. ditch the class listserv and have students comment on readings by adding notes to the Zotero items

How well did this work? Not exactly as planned. Item 3 above was a disaster, since Zotero groups are not set up to facilitate discussions. So I switched back to the listserv.

It’s been a challenge to get students up and running with Zotero, but they’re starting to adjust, and contributing to what may have a lot of promise: a collaborative annotated bibliography of sorts that will hopefully develop over time so that it can be a resource for future grad students.

But, issues:

  1. Tag management is a PITA for individual users, but unmanageable for groups. Automatic tagging is really more trouble than help, but before realizing this, you end up with dozens and dozens of useless tags, and no easy way to bulk manage them. Moreover, there appear to be some weird syncing-related bugs that happen when I edit or delete tags individually. This is a problem that I hope gets resolved.
  2. Occasional sync issues (which could be networking related; not sure).
  3. There’s no easy way to see who contributed what to a group library.
  4. Students have struggled a bit understanding how group items relate to personal items (they are copied, not shared).
  5. No annotated bib support (I ask them to submit one).

So I think it’s fair to say we’re finding promise in the group functionality, but that there’s still some work to do.

Promoting an Extended Date-Time Format

Posted in Technology on September 7th, 2009 by darcusb – Comments Off

If you work with bibliographic data, you quickly realize that standard date-time formats often don’t go far enough. So it’s nice to see this effort from the Library of Congress. It seems to be a clean superset of the ISO 8601 date-time format, and to cover the most important missing pieces. This is a datatype that can and should be used anywhere that people need to represent bibliographic dates. I’m interested in using it in both RDF (Dublin Core) and in JSON, for example.

Yet the first paragraph of the description page presents the effort much more narrowly:

There is no standard date/time format that meets the needs of various well-known XML metadata schemas, for example MODS, METS, PREMIS, etc. For several years there has been discussion of developing a reasonably comprehensive date/time definition for the bibliographic community, and submitting it either for standardization or some other mode of formalization - a W3C note for example, a NISO Profile, and/or an amendment to ISO 8601.

Issues:

  1. MODS, METS, etc. are not at all “well-known” except in the library world
  2. XML is not the only way to represent or move around data in 2009 (RDF and JSON are two I quite like, for example)
  3. XML Schema is not the only way to represent XML formats; many people avoid it like the plague
  4. formal standards are over-rated; better to first establish de facto standards through adoption

So my observation is simply this: the EDTF is a little gem, and can and should be widely used in a variety of different contexts. The LoC should recognize this and adjust details (documentation, examples, namespace URI) accordingly to promote it as such.
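
To make the "missing pieces" concrete: plain ISO 8601 has no way to say that a date is uncertain or approximate, which bibliographic data needs all the time. Here is a minimal sketch of a parser for a tiny subset. It assumes the syntax of the later published EDTF specification (a trailing "?" for uncertain, "~" for approximate, "/" between two dates for an interval); the 2009 draft may differ in detail, and the names FuzzyDate, parse_date and parse_edtf are hypothetical, not part of any LoC material.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Assumed syntax (from the later published EDTF spec, not the 2009 draft):
#   YYYY, YYYY-MM or YYYY-MM-DD, optionally followed by "?" (uncertain)
#   or "~" (approximate); two such dates joined by "/" form an interval.
# Month and day ranges are not validated in this sketch.
_DATE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?([?~])?$")


@dataclass
class FuzzyDate:
    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    uncertain: bool = False
    approximate: bool = False


def parse_date(text: str) -> FuzzyDate:
    """Parse a single date with an optional uncertainty marker."""
    match = _DATE.match(text)
    if match is None:
        raise ValueError(f"unsupported date: {text!r}")
    year, month, day, flag = match.groups()
    return FuzzyDate(
        year=int(year),
        month=int(month) if month else None,
        day=int(day) if day else None,
        uncertain=flag == "?",
        approximate=flag == "~",
    )


def parse_edtf(text: str) -> Union[FuzzyDate, Tuple[FuzzyDate, FuzzyDate]]:
    """Return a FuzzyDate, or a (start, end) tuple for an interval."""
    if "/" in text:
        start, end = text.split("/", 1)
        return parse_date(start), parse_date(end)
    return parse_date(text)
```

The same strings drop straight into a JSON value or an RDF literal, which is why a format like this is useful well beyond XML schemas.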

Public Work From the Start

Posted in Research on August 7th, 2009 by darcusb – 2 Comments

I’ve become increasingly disillusioned with the nature of academic publishing. Just today I had a manuscript accepted for publication in a special issue of a journal that I myself have no access to (!). I hate that ideas I may be working on only get airing either in conference presentations, or after going through the peer-review process, or by informally passing around a manuscript among friends. I hate that the readership of my work is severely constrained by the publishing model that predominates in 2009.

As I’m working on setting up a new source-control and backup system for my academic manuscripts, I’m wondering: why not put it all in public (say GitHub) repositories? It’s certainly much easier technically. And it can have other benefits if I want comments during the process.

Three obvious arguments against:

First, there’s a long history of researchers treating their work as proprietary. There are entirely rational reasons for this that have to do with the rewards structure of the academy. In short, you don’t get tenure without being able to brand your work, and there is a competition for new ideas, a concern about people borrowing or stealing those ideas, and so forth.

But, I’m not that concerned about this issue. If ideas are public from the start, the digital paper trail is there such that interested parties can fairly easily determine the provenance of ideas.

The second issue is potentially bigger: peer review. If all work is public from the start, then peer review in theory cannot be blind. But maybe this is all the more reason to push on this idea; blind peer review is both overrated, and a bit of a fiction anyway.

The third issue is related to both of the above: the “previously published” standard for publication. Publishers almost without exception demand you assign copyright of your work to them, and part of that involves guarantees that it has never been previously published. What does it mean to publish version-controlled draft work on the web, or to blog pieces as you go?

Maybe I ought to just go all public, and all open access. Hmm …

Ah, the above is obviously related to the much more catchy notion of the open scholar [via DigitalKoans]

Law and the Thomson Reuters-Zotero Suit

Posted in General on August 3rd, 2009 by darcusb – Comments Off

Sean Takats blogged a while back about the dismissal of the Thomson Reuters suit over Zotero. I had a chance to read the transcript of the hearing. As Sean wrote, the judge dismissed the Thomson Reuters complaint due to a lack of jurisdiction. What exactly does this mean? From my non-expert read, the dismissal was on a technicality: that Thomson Reuters asserted damages ($10 million/year worth) it could not demonstrate. There was never any discussion of the substance of the suit; rather, virtually the entire hearing focused on the question of how Thomson Reuters came up with the $10 million figure. Answer: a very precise 80% of a vague estimate of the number of downloads from the Zotero site, multiplied by $200 (the average price of Endnote software). The judge recognized this as ridiculous, and so threw out the case.
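
For the curious, the formula can be run backwards to see what download count it implies. This is only back-of-envelope arithmetic on the three figures reported above; the $10 million is presumably rounded, so the number actually used in the hearing may well differ, and the function name is my own.

```python
# Figures as reported above; everything else is derived from them.
CLAIMED_DAMAGES = 10_000_000   # dollars per year
SHARE_ASSUMED_LOST = 0.80      # the "very precise 80%"
ENDNOTE_PRICE = 200            # dollars, average price of Endnote


def implied_downloads(damages: float, share: float, price: float) -> float:
    """Solve damages = share * downloads * price for downloads."""
    return damages / (share * price)


downloads = implied_downloads(CLAIMED_DAMAGES, SHARE_ASSUMED_LOST, ENDNOTE_PRICE)
print(f"implied downloads per year: {downloads:,.0f}")
```

Note what the formula assumes: that four out of every five people who download a free tool would otherwise have paid full price for Endnote.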

Here’s hoping Thomson Reuters has learned a lesson here and backs off refiling.

Google Wave and Rich Text

Posted in Technology on July 30th, 2009 by darcusb – 2 Comments

I finally got an account for the Google Wave sandbox. Am still adjusting to the UI, and so don’t have a whole lot to say, except … it’s a rather large frustration to me to see Google reverting to the 1980s with its rich text widget. Yes, they have structured lists. That’s good. But, they have nothing approximating styles support. If you want to denote a section heading, you have to bold the damned text!

Come on Google, you can do better than this. Drop the damned font-family support and add at least basic support for headings (and farther out options for custom styles)!

HTML5 Process

Posted in Technology on June 9th, 2009 by darcusb – 2 Comments

Ben Adida on the microdata in HTML5 proposal:

So, I cannot live with something that throws away existing important implementations of the *exact* same use cases for no valid technical reason.

Ian’s response:

Indeed; I examined all the existing solutions that I could find closely as the first step (well, the second step, after collecting use cases). I didn’t go through all of them one by one in the e-mail, but I did explicitly examine Microformats and RDFa: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

If you go to that URI, here’s his explanation for why not RDFa:

- it uses prefixes, which most authors simply do not understand, and which many implementors end up getting wrong (e.g. SearchMonkey hard-coded certain prefixes in its first implementation, Google’s handling of RDF blocks for license declarations is all done with regular expressions instead of actually parsing the namespaces, etc). Even if implemented right, namespaces still lead to flaky copy-and-paste behaviour.

- it sometimes uses rel="" and sometimes uses property="" and it’s hard to know when to use one or the other.

- it introduces much more power than is necessary to solve this problem.

I think the first point is a reasonable one in the sense that prefixes have costs as well as benefits. But the same is true of unprefixed names. A balanced discussion of these tradeoffs seems warranted. Is it really (really!) worth it to invent an entirely new spec because of one fairly trivial issue? Is it really (really!) worth it to force tools developers and publishers to have to do double work?
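
To be clear about what is at stake with prefixes, here is a minimal sketch of how a prefixed name gets resolved. The helper expand_curie is hypothetical (it is not from any RDFa library); the two namespace URIs are the real Dublin Core ones.

```python
from typing import Mapping


def expand_curie(curie: str, prefixes: Mapping[str, str]) -> str:
    """Expand 'prefix:local' to a full URI using the bindings in scope."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no binding in scope for {curie!r}")
    return prefixes[prefix] + local


# Two pages are free to bind the same prefix to different namespaces,
# so a tool that hard-codes "dc" will get one of them wrong.
page_a = {"dc": "http://purl.org/dc/terms/"}
page_b = {"dc": "http://purl.org/dc/elements/1.1/"}

print(expand_curie("dc:title", page_a))
print(expand_curie("dc:title", page_b))

# A snippet pasted into a page without the declaration cannot be resolved.
try:
    expand_curie("dc:title", {})
except KeyError as err:
    print("unresolved:", err)
```

That is the cost side. The benefit is that the expanded name is globally unambiguous, which a bare "title" is not: without some other scoping rule, nothing says which vocabulary it belongs to.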

The other two points range from trivial to entirely ridiculous. Who really decides, for example, how much power is needed for extensible metadata in HTML? Surely the answer will depend a lot on particular use cases? For example, on the general citation case, Wikipedia may have less demanding needs than an academic or legal journal. Shouldn’t that understanding that one size does not fit all be at the center of any extensible metadata support in HTML5?

He then goes on to try to “fix” these problems by removing prefixing, and the rel/property ambiguity. Recognizing that removing the prefixing introduces other problems for readability, etc., he concludes that “This, though, is quite ugly.”

OK, so aesthetics are now a requirement shaping the design; I have no clue where that came from. To solve this problem he introduces an equally ugly, and completely arbitrary, new way to indicate a global name: the reverse DNS. Where’s the analysis that justifies these conclusions? Do we just accept these claims about aesthetics and usability without any kind of evidence?

Is there no sanity at all in the HTML5 process?

Thomson Reuters Suit Dismissed; Give it to Zotero

Posted in Technology on June 5th, 2009 by darcusb – 1 Comment

So the ridiculous nuisance suit that Thomson Reuters filed against GMU has been dismissed. Am curious to learn the precise details of the ruling, but those should be available soon enough.

In related news, I’d like to encourage people to do as one of the commenters to this story did; take what you would normally pay for an Endnote upgrade, and donate it to the CHNM instead.

Google Wave and Learning

Posted in Teaching, Technology on June 2nd, 2009 by darcusb – Comments Off

Michael Feldstein has two smart posts on the implications of Google Wave for learning:

Quick summary: like me, he thinks Wave is a potential game-changer that has major implications for learning. But he basically answers “no” to the question presented in his second post. His argument comes down to the core point that Wave is unstructured, and this is not always in sympathy with the goals of learning. The argument is twofold:

  1. Permissions: as he puts it, there are times when you want to control permissions, when you don’t want everything to be editable to everyone, when you want to steer a conversation or process in a particular direction. In those cases, the Wave Server as currently being demonstrated will not provide the necessary structure. It is possible that Google will implement fine-grained permissions structures in future versions, but I doubt it.
  2. Sequential structure: there are times when waves are exactly the opposite direction of where you want to go. I believe that half of good teaching is sequencing experiences such that students are more likely to learn in deep and meaningful ways…. Wave is not designed for that at all. To the contrary, it is designed to get out of the way of free-form communication.

I think Michael is spot on: how much of an LMS Wave could replace hinges on how much permissions control Google opts to add, and how well it can be integrated into other, more structured environments. I also suspect he’s right that Google is unlikely to add this sort of structure, but instead make it easy to integrate Wave into other, more structured, environments (such as an LMS).

But if we accept the conclusion that Wave is likely to be more a supplement than a replacement to a more structured LMS, this leads to I think an important question: how might the two worlds—structured and un/semi-structured—be best integrated? It doesn’t seem to me to necessarily follow, for example, that taking a big existing LMS and bolting Wave on is likely the best answer.

Google Wave Free Association

Posted in Teaching, Technology on May 29th, 2009 by darcusb – Comments Off

So Google’s announcement of Wave seems like a big deal. Rather than the typical deep and thoughtful post, which I’m sure others have done, some random thoughts:

  • from what I can tell, unlike much of Google’s current application infrastructure (GMail and Docs), the code will be open source; this would be really big
  • also unlike the current apps, Wave is distributed; this is also a really big deal
  • the collaborative document-editing seems wicked cool, and goes a fair bit beyond Docs
  • I absolutely cringe when I hear the phrase rich text in 2009, particularly when Google has so failed to get the basics of structured documents right in Docs
  • OTOH, HTML 5 provides some room for them to improve this if they put their mind to it
  • would really love if the extension mechanism was rich enough to allow integration of citations (say a Zotero extension; though perhaps something more distributed), and flexible enough to do it right (which by definition means not based on bibtex)
  • This has potentially big, though as yet unclear, implications for higher education, and for the sort of work that happens these days in LMSs.

On the Inclusion of BibTeX in HTML5

Posted in Technology on May 20th, 2009 by darcusb – 9 Comments

As part of the HTML5 effort, editor Ian Hickson has proposed a new way to encode structured data in HTML. Ian has since included within the proposal encodings of various widely used standards to describe events, contacts and citations. These vocabularies have normative status within the proposed spec, and have a privileged place within the DOM.

On the last use case, he has chosen BibTeX, on the basis that it is widely used and simple to author and process. Ian and I have chatted about this via email. To summarize my thoughts, then, I would like to argue against the inclusion of BibTeX based on the following points:

  1. BibTeX is designed for the sciences, which typically cite only secondary academic literature. It is thus neither adequate for, nor widely used in, many fields outside of the sciences: the humanities and law being quite obvious examples. For this reason, BibTeX cannot by default adequately represent even the use cases Ian has identified. For example, there are many citations on Wikipedia that can only be represented using effectively useless types such as “misc” and which require new properties to be invented.
  2. Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways.
  3. The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way. This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the first case, we introduce potential interoperability problems. If we assume the second, we have an organizational and process problem: that the WHATWG and/or the W3C—neither of which have expertise in this domain—become the gate-keepers for such extensions. In either case, we have a rather brittle and anachronistic approach to extension.
  4. The BibTeX model conflicts with Dublin Core and with vCard, both of which are quite sensibly used elsewhere in the microdata spec to encode information related to the document proper. There seems little justification in having two different ways to represent a document depending on whether it is THIS document or THAT document.
  5. Aspects of BibTeX’s core model are ambiguous/confusing. For example, what number does “number” refer to? Is it a document number, or an issue number? [note: it's actually both, depending on context; in a report it's the former, while in an article it's the latter]
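
The last point is easy to show in code. Any consumer of BibTeX data has to carry a lookup like the one below just to know what "number" means. This is a sketch covering only the two cases named in the note above; the output keys "issue" and "document_number" are names I made up for illustration.

```python
from typing import Dict

# Reading of "number" per BibTeX entry type, following the note above.
# Only the two cases named in the text are covered.
NUMBER_MEANING = {
    "article": "issue",
    "techreport": "document_number",
}


def normalize(entry_type: str, fields: Dict[str, str]) -> Dict[str, str]:
    """Replace the overloaded "number" field with an unambiguous key."""
    out = {key: value for key, value in fields.items() if key != "number"}
    if "number" in fields:
        meaning = NUMBER_MEANING.get(entry_type.lower())
        if meaning is None:
            raise ValueError(
                f"cannot tell what 'number' means for entry type {entry_type!r}"
            )
        out[meaning] = fields["number"]
    return out
```

So normalize("article", {"number": "3"}) yields an issue, while the same field on a "techreport" yields a document number, and for any other type the sketch simply gives up.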

My suggestion instead?

  1. reuse Dublin Core and vCard for the generic data: titles, creators/contributors, publisher, dates, part/version relations, etc., and only add those properties (volume, issue, pages, editors, etc.) that they omit
  2. typing should NOT be handled by a bibtex-type property, but the same way everything else is typed in the microdata proposal: a global identifier
  3. make it possible for people to interweave other, richer, vocabularies such as bibo within such item descriptions. In other words, extension properties should be URIs.
  4. define the mapping to RDF of such an “item” description; can we say, for example, that it constitutes a dct:references link from the document to the described source?

The result would be something more consistent, general and extensible, while also still being easy to author and process. From a DOM perspective, we’re just talking about things like ref1.type returning a URI rather than doing ref1.bibtex-type that returns a string, and accessing a periodical title like ref1.isPartOf.title rather than ref1.journal (which of course doesn’t work for newspapers, or magazines, or court reporters, or weblogs, all of which have the exact same characteristics: they’re publications of sorts).
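
Here is a rough sketch of that access pattern, using a plain Python class to stand in for the DOM object. The Item class, the container_title helper and the example titles are all hypothetical; the type URIs are from the bibo vocabulary mentioned above and should be treated as illustrative.

```python
from dataclasses import dataclass
from typing import Optional

BIBO = "http://purl.org/ontology/bibo/"


@dataclass
class Item:
    type: str                          # a global identifier (URI), not a bare string
    title: str
    isPartOf: Optional["Item"] = None  # the containing publication, if any


def container_title(ref: Item) -> Optional[str]:
    """Title of whatever the item was published in, if anything."""
    return ref.isPartOf.title if ref.isPartOf else None


journal = Item(type=BIBO + "Journal", title="Hypothetical Journal of Examples")
ref1 = Item(type=BIBO + "Article", title="An Example Article", isPartOf=journal)

newspaper = Item(type=BIBO + "Newspaper", title="The Example Times")
ref2 = Item(type=BIBO + "Article", title="An Example News Story", isPartOf=newspaper)

# One accessor covers both cases; no journal-specific property is needed.
print(container_title(ref1))
print(container_title(ref2))
```

The point of the design is that the container is just another item with its own type, so the same code path handles journals, newspapers, magazines, court reporters and weblogs.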


Creative Commons License