HTML5 Process

Posted in Technology on June 9th, 2009 by darcusb – 2 Comments

Ben Adida on the microdata in HTML5 proposal:

So, I cannot live with something that throws away existing important implementations of the *exact* same use cases for no valid technical reason.

Ian’s response:

Indeed; I examined all the existing solutions that I could find closely as the first step (well, the second step, after collecting use cases). I didn’t go through all of them one by one in the e-mail, but I did explicitly examine Microformats and RDFa: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

If you go to that URI, here’s his explanation for why not RDFa:

- it uses prefixes, which most authors simply do not understand, and which many implementors end up getting wrong (e.g. SearchMonkey hard-coded certain prefixes in its first implementation, Google’s handling of RDF blocks for license declarations is all done with regular expressions instead of actually parsing the namespaces, etc). Even if implemented right, namespaces still lead to flaky copy-and-paste behaviour.

- it sometimes uses rel="" and sometimes uses property="" and it’s hard to know when to use one or the other.

- it introduces much more power than is necessary to solve this problem.

I think the first point is a reasonable one in the sense that prefixes have costs as well as benefits. But the same is true of unprefixed names. A balanced discussion of these tradeoffs seems warranted. Is it really (really!) worth it to invent an entirely new spec because of one fairly trivial issue? Is it really (really!) worth it to force tools developers and publishers to have to do double work?
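
To make the prefix cost concrete, here is a minimal Python sketch of prefix expansion. The `expand_curie` helper and the prefix tables are hypothetical illustrations of my own, not part of RDFa or of any library; the only point is that a prefixed name stops resolving once it is pasted somewhere its prefix declaration did not follow.

```python
# Hypothetical sketch: a prefixed name (CURIE) only resolves against a prefix table.
# Neither the helper nor the tables come from a real RDFa implementation.

def expand_curie(name, prefixes):
    """Expand 'prefix:local' to a full URI, or return None if it cannot be resolved."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        return None  # unprefixed name, or the prefix declaration was lost
    return prefixes[prefix] + local


# The page where the markup was written declares the prefix...
source_page = {"dc": "http://purl.org/dc/terms/"}
# ...but the page it was pasted into does not.
pasted_page = {}

print(expand_curie("dc:title", source_page))  # http://purl.org/dc/terms/title
print(expand_curie("dc:title", pasted_page))  # None
```

An unprefixed name avoids the lookup entirely, but it also gives up the global name, which is the other side of the tradeoff.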

The other two points range from trivial to entirely ridiculous. Who really decides, for example, how much power is needed for extensible metadata in HTML? Surely the answer will depend a lot on particular use cases? For example, on the general citation case, Wikipedia may have less demanding needs than an academic or legal journal. Shouldn’t the understanding that one size does not fit all be at the center of any extensible metadata support in HTML5?

He then goes on to try to “fix” these problems by removing prefixing and the rel/property ambiguity. Recognizing that removing the prefixing introduces other problems for readability, etc., he concludes: “This, though, is quite ugly.”

OK, so aesthetics are now a requirement shaping the design; I have no clue where that came from. To solve this problem he introduces an equally ugly, and completely arbitrary, new way to indicate a global name: the reverse DNS. Where’s the analysis that justifies these conclusions? Do we just accept these claims about aesthetics and usability without any kind of evidence?

Is there no sanity at all in the HTML5 process?

Thomson Reuters Suit Dismissed; Give it to Zotero

Posted in Technology on June 5th, 2009 by darcusb – 1 Comment

So the ridiculous nuisance suit that Thomson Reuters filed against GMU has been dismissed. Am curious to learn the precise details of the ruling, but those should be available soon enough.

In related news, I’d like to encourage people to do as one of the commenters to this story did; take what you would normally pay for an Endnote upgrade, and donate it to the CHNM instead.

Google Wave and Learning

Posted in Teaching, Technology on June 2nd, 2009 by darcusb – Comments Off

Michael Feldstein has two smart posts on the implications of Google Wave for learning:

Quick summary: like me, he thinks Wave is a potential game-changer that has major implications for learning. But he basically answers “no” to the question presented in his second post. His argument comes down to the core point that Wave is unstructured, and this is not always in sympathy with the goals of learning. The argument is twofold:
  1. Permissions: as he puts it, there are times when you want to control permissions, when you don’t want everything to be editable to everyone, when you want to steer a conversation or process in a particular direction. In those cases, the Wave Server as currently being demonstrated will not provide the necessary structure. It is possible that Google will implement fine-grained permissions structures in future versions, but I doubt it.
  2. Sequential structure: there are times when waves are exactly the opposite direction of where you want to go. I believe that half of good teaching is sequencing experiences such that students are more likely to learn in deep and meaningful ways…. Wave is not designed for that at all. To the contrary, it is designed to get out of the way of free-form communication.
I think Michael is spot on: how much of an LMS Wave could replace hinges on how much permissions control Google opts to add, and how well it can be integrated into other, more structured environments. I also suspect he’s right that Google is unlikely to add this sort of structure, but instead make it easy to integrate Wave into other, more structured, environments (such as an LMS).

But if we accept the conclusion that Wave is likely to be more a supplement to than a replacement for a more structured LMS, this leads, I think, to an important question: how might the two worlds (structured and un/semi-structured) be best integrated? It doesn’t seem to me to necessarily follow, for example, that taking a big existing LMS and bolting Wave on is likely the best answer.

Google Wave Free Association

Posted in Teaching, Technology on May 29th, 2009 by darcusb – Comments Off

So Google’s announcement of Wave seems like a big deal. Rather than the typical deep and thoughtful post, which I’m sure others have done, some random thoughts:

  • from what I can tell, unlike much of Google’s current application infrastructure (GMail and Docs), the code will be open source; this would be really big
  • also unlike the current apps, Wave is distributed; this is also a really big deal
  • the collaborative document-editing seems wicked cool, and goes a fair bit beyond Docs
  • I absolutely cringe when I hear the phrase rich text in 2009, particularly when Google has so failed to get the basics of structured documents right in Docs
  • OTOH, HTML 5 provides some room for them to improve this if they put their mind to it
  • would really love if the extension mechanism was rich enough to allow integration of citations (say a Zotero extension; though perhaps something more distributed), and flexible enough to do it right (which by definition means not based on bibtex)
  • This has potentially big, though as yet unclear, implications for higher education, and for the sort of work that happens these days in LMSs.

On the Inclusion of BibTeX in HTML5

Posted in Technology on May 20th, 2009 by darcusb – 9 Comments

As part of the HTML5 effort, editor Ian Hickson has proposed a new way to encode structured data in HTML. Ian has since included within the proposal encodings of various widely used standards to describe events, contacts and citations. These vocabularies have normative status within the proposed spec, and have a privileged place within the DOM.

On the last use case, he has chosen BibTeX, on the basis that it is widely used and simple to author and process. Ian and I have chatted about this via email. To summarize my thoughts, then, I would like to argue against the inclusion of BibTeX based on the following points:

  1. BibTeX is designed for the sciences, which typically cite only secondary academic literature. It is thus neither adequate for, nor widely used in, many fields outside of the sciences: the humanities and law being quite obvious examples. For this reason, BibTeX cannot by default adequately represent even the use cases Ian has identified. For example, there are many citations on Wikipedia that can only be represented using effectively useless types such as “misc” and which require new properties to be invented.
  2. Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways.
  3. The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way. This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the first case, we introduce potential interoperability problems. If we assume the second, we have an organizational and process problem: that the WHATWG and/or the W3C—neither of which have expertise in this domain—become the gate-keepers for such extensions. In either case, we have a rather brittle and anachronistic approach to extension.
  4. The BibTeX model conflicts with Dublin Core and with vCard, both of which are quite sensibly used elsewhere in the microdata spec to encode information related to the document proper. There seems little justification in having two different ways to represent a document depending on whether it is THIS document or THAT document.
  5. Aspects of BibTeX’s core model are ambiguous/confusing. For example, what number does “number” refer to? Is it a document number, or an issue number? [note: it's actually both, depending on context; in a report it's the former, while in an article it's the latter]

My suggestion instead?

  1. reuse Dublin Core and vCard for the generic data: titles, creators/contributors, publisher, dates, part/version relations, etc., and only add those properties (volume, issue, pages, editors, etc.) that they omit
  2. typing should NOT be handled by a bibtex-type property, but the same way everything else is typed in the microdata proposal: with a global identifier
  3. make it possible for people to interweave other, richer, vocabularies such as bibo within such item descriptions. In other words, extension properties should be URIs.
  4. define the mapping to RDF of such an “item” description; can we say, for example, that it constitutes a dct:references link from the document to the described source?
The result would be something more consistent, general and extensible, while also still being easy to author and process. From a DOM perspective, we’re just talking about things like ref1.type returning a URI rather than doing ref1.bibtex-type that returns a string, and accessing a periodical title like ref1.isPartOf.title rather than ref1.journal (which of course doesn’t work for newspapers, or magazines, or court reporters, or weblogs, all of which have the exact same characteristics: they’re publications of sorts).
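
As a rough illustration of that difference, here is a Python sketch with plain objects standing in for DOM items. The `Item` class, the sample titles, and the choice of bibo type URIs are my assumptions for illustration only; none of the property names are taken from the microdata draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Stand-in for a described source; 'type' holds a global identifier (a URI)."""
    type: str
    title: str
    isPartOf: Optional["Item"] = None  # the containing publication, if any


# A journal article and a newspaper article share exactly one shape.
journal = Item(type="http://purl.org/ontology/bibo/Journal", title="Example Journal")
ref1 = Item(type="http://purl.org/ontology/bibo/Article",
            title="An Example Article", isPartOf=journal)

paper = Item(type="http://purl.org/ontology/bibo/Newspaper", title="Example Daily")
ref2 = Item(type="http://purl.org/ontology/bibo/Article",
            title="An Example Story", isPartOf=paper)

# The flat BibTeX-style record, by contrast, has a string type and a 'journal' key
# that has no obvious counterpart for the newspaper case.
bibtex_ref = {"bibtex-type": "article",
              "title": "An Example Article",
              "journal": "Example Journal"}

for ref in (ref1, ref2):
    print(ref.type, "|", ref.isPartOf.title)
print(bibtex_ref["bibtex-type"], "|", bibtex_ref["journal"])
```

The same `isPartOf.title` access works for both references; only the type URI of the container changes.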

A Home NAS and Backup Solution

Posted in General, Technology on May 18th, 2009 by darcusb – Comments Off

So for a while now I’ve been thinking I need to get more serious about a storage and backup solution for my personal and household data. After casually looking around at alternatives, I finally decided on a solution. I effectively took this information about hardware, with this and this information about using OpenSolaris and ZFS for software, and now have 1 TB of mirrored networked storage (and automated snapshots when I get to it), all for less than $500.

It was far more of a PITA getting OpenSolaris running as I wanted than I’d hoped, but I think the end product is both better and cheaper than the commercial alternatives.

HTML 5 Microdata Use Cases

Posted in Technology on May 10th, 2009 by darcusb – Comments Off

I mentioned in my previous post on the HTML 5 microdata draft that it included a use case from me; it’s this one:

A scholar and teacher wants other scholars (and potentially students) to be able to easily extract information about who he is to add it to their contact databases.

This is close to my description, but significantly narrower. Compare my words:

I want to be able to add structured data to my web site to denote who I am, what I have published, and what I teach in such a way that other scholars (and potentially students) can easily extract that information to add it to their contact databases, or to their bibliographic applications, or whatever. This involves contact data, for sure, but also other, domain specific, data, as well, and so presumes a flexible and extensible model and syntax.

The distinction is important because it makes clear that fixed encoding formats like hCard are not close to adequate; this is not just about a one-size-fits-all profile format, nor about possible integration into one particular kind of application (a contact database).

HTML5 Microdata Proposal

Posted in Technology on May 10th, 2009 by darcusb – 4 Comments

I’ve been following the discussion about extensible metadata in HTML 5 from afar, not really having the time to get any more involved. The bottom line for one of the primary use cases I provided was, can I represent what’s embedded in my home profile and publications pages? This isn’t just about data relating to me and my pages, but linking them to other data, elsewhere. For example, I will be changing my subject pages to link to the new Library of Congress id service, such as subject headings. Can I do that in HTML 5?

The group (well, let’s be real, Ian Hickson) released a first draft of a proposal today. I haven’t really looked at it carefully and thought through all the implications, but my initial take is it seems an attempt to split the difference between RDFa and microformats. So one can encode metadata properties, for example, using either plain string tokens (the microformat way), or using URIs (the RDF/RDFa way). I might well prefer to use RDFa, but perhaps with some tweaks, the microdata proposal might well allow the most important pieces of RDFa. At least I hope so.

But there are places where there seem some arbitrary restrictions. For example, I see no way to define a microdata item’s identity as anything but local to the document (the spec only allowing local IDs; not global URIs). If I have that right, that’s a critical and arbitrary flaw, and needs to be changed.

And, as Shelley Powers points out, it’s really, really strange and arbitrary to allow one to use a “reversed DNS identifier” as a global identifier alternative to an HTTP URI, but not allow other prefix mechanisms (such as CURIEs), particularly when the common argument against namespace prefixes in general and CURIEs in particular is that they are too difficult. I’d rather see all three, or only URIs.
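
For anyone who has not seen the three forms side by side, here is a small Python sketch that sorts example names into them. The `classify_name` rule is a crude assumption of mine, not the algorithm in the draft, and `org.example.title` is a made-up name.

```python
def classify_name(name):
    """Very rough classification of a property name; an assumed rule, not the spec's."""
    if "://" in name:
        return "URI"
    if ":" in name:
        return "CURIE"
    if "." in name:
        return "reversed DNS"
    return "plain token"


examples = ("http://purl.org/dc/terms/title", "dc:title", "org.example.title", "title")
for name in examples:
    print(name, "->", classify_name(name))
```

Note that this crude test would mistake a non-hierarchical URI such as a `urn:` name for a CURIE; the sketch only aims to show the forms next to each other.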

Finally, the “item” attribute is odd. It’s effectively equivalent to the RDFa typeOf attribute, in that it allows one to type the related properties. But then a) why not just call it typeOf?, and b) related to my point about identity above, the notion of an “item” is quite ambiguous, and seems to confuse identity and type.

I’d really love if the relevant open-minded experts in this space could find time to have a f2f meeting over this proposal, and iron out these sorts of details.

Thomson Reuters Wants Your Name

Posted in Technology on May 7th, 2009 by darcusb – 1 Comment

I recently learned that, as part of their lawsuit regarding Zotero, Thomson Reuters has successfully forced GMU to release the contact information for all 286 people who have SVN and Trac accounts at zotero.org.

I don’t personally care, because I’m sure these lawyers already know my name. But this seems nothing more than yet more thuggish intimidation.

New Laptop

Posted in Technology on May 6th, 2009 by darcusb – 3 Comments

It’s hard not to notice Microsoft’s new ad push against Apple. The punchline is that buying a “PC” (the ads never mention Windows, oddly enough) tends to give a consumer more choice and better value compared to buying a Mac.

As a longtime Mac user, I tend to agree. Except the logical extension to the argument is to point out that Windows isn’t the only non-Mac OS in town, and that Linux-based alternatives such as Ubuntu offer the same value proposition: more choice and better value (not to mention “free”).

So it’s with that thinking in mind that I finally bought a new laptop after casually looking around for something to replace my aging Mac iBook G4. I wanted a machine with the following characteristics:

  1. good battery life
  2. good screen
  3. excellent keyboard (since I intend to use it for writing and notetaking)
  4. light weight
  5. rugged
  6. inexpensive
  7. decent performance

I seriously considered one of the recent larger netbooks, but ultimately went with a Thinkpad X61s. I got a refurbished model for less than $700 direct from Lenovo, complete with 3 GB of RAM, a 9-cell battery, and a free bag.

Is it quite as elegant from a design standpoint as a Mac alternative? Not in the least! But despite being last year’s model, it’s really fast, it’s really light, it has very good battery life (haven’t really tested it, but I expect to get over four hours of real use out of it), and a great keyboard and screen. It’s also really nicely built.

So what about the OS? Some version of Linux was clear (I did boot into Vista at first in order to prepare the USB boot image, but subsequently wiped it out completely; good riddance), but which one? I started out with Arch, but gave up when I couldn’t establish a network connection to finish the basic installation. I then moved over to Ubuntu, which installed and configured without a hitch; everything simply worked: wireless network connection, suspend and wake, etc., etc.

But one thing I really like about Mac OS X is the design aesthetics. There’s something nice about working in a beautiful environment. Sadly, Ubuntu is not that for me. But xubuntu, on the other hand, is right up my alley! So a quick addition of the xubuntu packages and I’m happy.

The only thing that makes me a little hesitant to do a wholesale switch off of the Mac OS is its superior support in the image editing arena. If and when GIMP catches up to the ease-of-use and resolution-independent editing of Lightroom and Aperture, that will probably be it for me.


Creative Commons License