HTML5 Microdata Proposal

I’ve been following the discussion about extensible metadata in HTML 5 from afar, not really having the time to get any more involved. The bottom line for one of the primary use cases I provided was: can I represent what’s embedded in my home profile and publications pages? This isn’t just about data relating to me and my pages, but about linking them to other data, elsewhere. For example, I will be changing my subject pages to link to the new Library of Congress ID service, e.g. for subject headings. Can I do that in HTML 5?

The group (well, let’s be real, Ian Hickson) released a first draft of a proposal today. I haven’t really looked at it carefully or thought through all the implications, but my initial take is that it seems to be an attempt to split the difference between RDFa and microformats. So one can encode metadata properties, for example, using either plain string tokens (the microformat way) or URIs (the RDF/RDFa way). I might well prefer to use RDFa, but perhaps, with some tweaks, the microdata proposal could support the most important pieces of RDFa. At least I hope so.

But there are places where there seem to be some arbitrary restrictions. For example, I see no way to define a microdata item’s identity as anything but local to the document (the spec allows only local IDs, not global URIs). If I have that right, that’s a critical and arbitrary flaw, and it needs to be changed.

And, as Shelley Powers points out, it’s really, really strange and arbitrary to allow one to use a “reversed DNS identifier” as a global identifier alternative to an HTTP URI, but not allow other prefix mechanisms (such as CURIEs), particularly when the common argument against namespace prefixes in general, and CURIEs in particular, is that they are too difficult. I’d rather see all three, or only URIs.
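
To make the three identifier styles concrete, here is a minimal Python sketch of how a consumer might sort a property name into one of them and, for a CURIE, expand it to a full URI. The prefix table, the example names, and the function expand_property are my own illustration, not anything taken from the microdata draft or the RDFa spec, and the draft’s actual processing rules may well differ.

```python
# Hypothetical prefix declarations, standing in for whatever a page
# would declare on an ancestor element.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_property(name, prefixes):
    """Classify a property name and return a (kind, value) pair."""
    # A full HTTP URI is already a global identifier.
    if name.startswith(("http://", "https://")):
        return ("uri", name)
    # A CURIE only means something relative to a declared prefix.
    if ":" in name:
        prefix, _, reference = name.partition(":")
        if prefix not in prefixes:
            raise KeyError(f"undeclared prefix: {prefix!r}")
        return ("curie", prefixes[prefix] + reference)
    # Assumption: any dotted name is treated as a reversed DNS identifier.
    if "." in name:
        return ("reverse-dns", name)
    # Anything else is a plain string token, microformat style.
    return ("token", name)


print(expand_property("dc:title", PREFIXES))
print(expand_property("com.example.title", PREFIXES))
```

One thing the sketch makes visible: the CURIE is the only form that depends on a declaration made somewhere else, so expand_property("dc:title", {}) fails, while the full URI and the reversed DNS name stand on their own. That is the usual complaint about prefixes, but it is a cost of brevity, not an argument for allowing one shorthand and forbidding the other.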

Finally, the “item” attribute is odd. It’s effectively equivalent to the RDFa typeOf attribute, in that it allows one to type the related properties. But then a) why not just call it typeOf, and b) related to my point about identity above, the notion of an “item” is quite ambiguous, and seems to confuse identity and type.

I’d really love it if the relevant open-minded experts in this space could find time to have a f2f meeting over this proposal, and iron out these sorts of details.

4 Comments

  1. [...] mentioned in my previous post on the HTML 5 microdata draft that it included a use case from me; it’s this one: A scholar and teacher wants other [...]

  2. Ian Hickson says:

    It’s not clearly introduced in the spec yet, but the “about” property maps to the item’s identifier in RDF, if you want to identify items across documents. Alternatively, you can use an id="" and then use a fragment identifier into the page.

    I’m not sure how it follows that Java-like identifiers (com.example.foo) mean we should allow prefixes. Prefixes in general have proved quite confusing to most authors: separating where the prefix is declared from where the prefix is used results in people breaking pages during copy-and-paste, getting the declarations wrong, etc. Even implementors seem to have a hard time with them, with implementors hard-coding individual prefixes, using regexps to detect particular keywords, etc.

    item="" is similar to typeOf; the reason I didn’t use just typeOf is that it looks weird to say “typeOf” on its own. I discuss this in more detail in the e-mail: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

    HTH!

  3. darcusb says:

    Thanks for the note Ian.

    My first point was the most critical, so let me just add a couple of comments/questions.

    First, on identifying (and therefore being able to describe relationships to) non-document objects, given that it seems your concerns about RDFa were largely about syntax, is there not a better solution to this requirement than an about property? I’m imagining that this particular solution will force some pretty awkward syntax.

    Second, it’s unclear to me, at least from the examples, how one establishes a URI predicate on a relation. E.g., I want to say that “Jane Doe” co-authored a paper with me, where Jane has a profile page she uses to identify herself. So I want an a element that links to that page, and I want to embed a property that says she’s an author of that document. How can I do that with this proposal?

    Reading over Ian’s list email again, I think I have the next assumption wrong, and that one encodes the predicate using a property attribute on the “a” element. Am not totally sure though.

    From what I gather, the current answer is that I can’t (since rel values are only string tokens, not URIs). Under the section on converting to RDF, item 2.9 just seems bad, since it severely constrains what you can say about the data. For the sake of argument, is it possible to allow the rel/rev values to be URIs?

    The prefix stuff is less important to me, but my point is that the reverse DNS identifiers seem an arbitrary inclusion that isn’t warranted by the use cases.

    Also, it seems there’s been work in the RDFa community on removing the requirement for prefixes by using things like profiles (see also this similar work on a JSON serialization of RDF; basic approach is you define the mapping of URI to string token somewhere apart from the actual encoded properties). Do you see the microdata proposal being able to accommodate that potentially productive evolution?

    Finally, on “item” vs. “typeOf”, I guess this is exactly where I think you’re confusing type and identity. You say in the email you link to that “What we need is a way to say that a property is really a new set of name-value pairs.” Yes, correct. But in RDFa, this is exactly what the “about” attribute does; it switches the context of the statements. It’s just that in your proposal and associated examples, you always have the impulse to create blank nodes; i.e., to not give the object you’re describing a global ID.

    So to boil down my point here, you are looking to the wrong mechanism to solve the problem you identify in the quote above. Suffice it to say I think this area needs work.

  4. darcusb says:

    And I guess another obvious question: given that so much of what you’ve done here does not conflict with the design of RDFa, and in some places explicitly borrows from it, might it not be a better approach to itemize exactly what you see in RDFa as a problem (I take it CURIEs are the primary issue), and to either a) work on resolving that with the RDFa group, or b) perhaps simply remove the offending detail? I.e., apart from the narrow technical details of this proposal, I’m suggesting productive collaboration rather than stepping on people’s toes.


Creative Commons License