Posts Tagged ‘Metadata’

Personal Names

Posted in Technology on December 29th, 2007 by darcusb

An excellent post (seen on Planet Gnome) on designing data models and forms for internationally-diverse personal name traditions. The author concludes by suggesting:

If designing a form or database that will accept names from people with a variety of backgrounds, you should ask yourself whether you really need to have separate fields for given name and family name.

He seems to be suggesting that it’s better to model names based on context rather than breaking them into pieces, something along the lines of what Ian Davis suggested for FOAF awhile back. Such an approach has the advantage that it’s really simple, and really flexible.
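To make the idea concrete, here is a minimal sketch of a name model built around context rather than parts. The class and its field names (`full_name`, `sort_name`, `informal_name`) are my own illustrative assumptions; they are not taken from the post or from FOAF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # One free-form field holds the name exactly as the person writes it.
    full_name: str
    # Optional context-specific forms, instead of given/family parts.
    sort_name: Optional[str] = None      # how to file the name in a sorted list
    informal_name: Optional[str] = None  # how to address the person casually

    def sort_key(self) -> str:
        # Fall back to the full name when no sort form was supplied.
        return (self.sort_name or self.full_name).casefold()


people = [
    Person("Mao Zedong"),                       # family name already comes first
    Person("Ian Davis", sort_name="Davis, Ian"),
    Person("Cher"),                             # a single name, no parts at all
]

sorted_names = [p.full_name for p in sorted(people, key=Person.sort_key)]
```

Nothing here has to guess which token is the "family" name; each context simply asks for the form it needs and falls back to the full name.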

Data Integration with RDF

Posted in Uncategorized on November 28th, 2007 by darcusb

Leigh Dodds has a nice post explaining how and why data integration is easy in RDF.

Leopard Does ODF

Posted in Uncategorized on October 29th, 2007 by darcusb

Awhile back I’d noted that Apple was adding ODF support to Leopard, but that it remained to be seen how good the support was. Well, at least one early report seems to suggest … pretty damn good. They do import/export (though still unclear how well), and it’s integrated into Cocoa and the new QuickLook previewer. So despite my general dislike of Apple’s incoherent stance on standards, here’s one place where they’re doing the right thing. Kudos to Apple, then.

If I had a suggestion for Apple for the future, I’d look into a way to bridge the new enhanced metadata support in ODF 1.2 with their CoreData framework. That way, Mac developers could easily embed richer intelligence into their documents using a W3C standard extensible metadata framework.

OpenDocument TC Approves Enhanced Metadata

Posted in Uncategorized on July 6th, 2007 by darcusb

Today the OASIS OpenDocument Technical Committee approved enhanced metadata support [pdf] for inclusion in ODF 1.2.

What do we mean by enhanced metadata support? We mean that you can:

  • describe your document the way you want to describe it; no longer are you limited to a pre-selected list of properties
  • describe different parts within your document: images, tables, paragraphs, and so forth
  • tag pieces of document content as metadata; you can say this heading is a title, this fragment is the name of some person, and so forth
  • create dynamic content fields based on this flexible metadata system

So how did we do this? What technology did we use?

At the heart of the new metadata system is a standard model: RDF. We allow encoding of that model in two different ways. The first is as RDF/XML graphs in the file package. Here developers can include custom metadata in the file package. They simply register their files in the new RDF-based metadata manifest, which makes them available for integration with the document.

Second, we also borrow some concepts from RDFa for in-content tagging. Elias Torres from IBM helped us greatly with figuring out how to elegantly knit RDF into ODF content, showing us along the way a very cool demo his team had put together in which they reworked OpenOffice Calc to store RDF natively. In this case every cell in the spreadsheet became layered with additional metadata, and the content could be mashed up dynamically with a Google map. The in-content metadata attributes will allow this sort of use case on top of existing spreadsheets, word-processing documents, and presentations.

RDF, then, provides a simple, extensible, and mixable data model. Using this model means not just that we provide extensibility within a particular domain (say bibliographic citations), but the ability to seamlessly blend and merge information across domains (say contact data and citations, image metadata and rights, etc., etc.).
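A rough sketch of what "mixable" means in practice, using plain Python sets rather than a real RDF library. All of the URIs below are made-up placeholders, and the sketch assumes graphs without blank nodes (a proper RDF merge has to keep blank nodes from different graphs apart).

```python
# Triples are (subject, predicate, object) tuples; a graph is a set of them.
# Every URI here is a hypothetical placeholder for illustration.
EX = "http://example.org/"

contacts = {
    (EX + "people/jane", EX + "vocab/name", "Jane Doe"),
    (EX + "people/jane", EX + "vocab/mbox", "mailto:jane@example.org"),
}

citations = {
    (EX + "docs/report", EX + "vocab/title", "A Report"),
    (EX + "docs/report", EX + "vocab/creator", EX + "people/jane"),
}

# Merging the two domains is just set union; no schema changes are needed.
merged = contacts | citations


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def creator_names(graph, doc):
    """Follow creator links from a document to the creators' names."""
    names = set()
    for person in objects(graph, doc, EX + "vocab/creator"):
        names |= objects(graph, person, EX + "vocab/name")
    return names
```

The query in `creator_names` only finds an answer in the merged graph, because it crosses from the citation data into the contact data through the shared identifier for the person.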

All that remains is for developers to exploit the possibilities we’ve put in place. It will be interesting to see who implements it first. OpenOffice? KOffice? Google?

OOo: Quality Through Obsolescence?

Posted in General on June 2nd, 2007 by darcusb

Michael Meeks (from an interview) on what many of us have seen for a long time as serious problems within OpenOffice.org:

I would stress that there are people inside Sun that do ‘get it’. People that are open, and helpful, and really good. But there are also a large number who are very traditional, very staid people, particularly in quality assurance. You can’t argue with them, because they’re in their own self-reinforcing world view. They say specifications are necessary for product quality, and you say “That’s fine, but look at the quality. It’s still not very good.” They say more specifications are necessary! The answer is always more of the same, and you can’t argue with that. It leads to obsolescence - quality through obsolescence, is what I like to call it.

Michael notes a lot of progress on the OOo organizational front of late, such as the move to more frequent releases. But clearly the deeper organizational dysfunctions are seriously weighing on the capacity for OOo to innovate. I really hope they don’t slow down implementation of the new metadata support in ODF 1.2. It has the potential to be a killer innovation opportunity for OOo, but not if it gets delayed for five years by business as usual.

I’m cautiously optimistic, though.

More XMP Confusion

Posted in Uncategorized on May 26th, 2007 by darcusb

More uncritical praise for Adobe’s XMP. To quote the most problematic parts, first:

Luckily Adobe is not as protective and closed about standards as some software heavyweights, so when it drew up its own successor format, it chose to base it on XML and publish the specification. That spec is XMP: an XML for generic image metadata.

Ahem, once again this completely misses the RDF connection. XMP is not fundamentally an XML format; it is an implementation of RDF. That is to say, the value XMP offers is that it provides a general metadata model (borrowed from RDF) which allows one great flexibility in representing the metadata that you want to represent. That it has an XML serialization format is just gravy.

The second issue with this characterization is that it completely misses that XMP is currently effectively a proprietary subset of an open standard. As I wrote on another blog, because XMP is a rather bizarre subset of RDF defined by Adobe very early in the life of RDF, there is a whole lot of valid RDF that is not valid XMP. For this reason, while vanilla open source RDF tools can easily read XMP, they cannot reliably write it. Hence the reliance, for the most part, on Adobe tools, which are C++ only.

Continuing:

Adobe has been shipping XMP-aware apps since 2001, but up until now we have not seen an open source application step up and tackle XMP head on, providing read/write support, the core namespaces, and creation of custom namespaces. In fact, the apps that did address XMP all stopped at reading the data, not even doing anything useful with it.

The article implicitly criticizes open source developers for the lack of support for XMP, when the real blame lies with Adobe. If XMP supported RDF proper (the full model), I am confident there would be much more widespread support for both reading and writing XMP in files.

Just to give you a sense of why this matters, a developer for a popular open source RDF library added experimental support for XMP serialization awhile back. When I emailed him and told him he needed to severely constrain the modeling and serialization in ways that he was not then, he basically told me it’d be too much work to do for too little payoff. I don’t in the least bit blame him.

Finally:

The XMP spec is open, and better still, extensible through XML namespaces.

I’m not really sure what definition of open this author is choosing to use. Yes, the spec is published and can be freely implemented. But it is not an open standard in the sense that it has no vendor-neutral standards body overseeing its evolution beyond the state-of-the-art in 2000.

I really do admire Adobe’s foresight in building and implementing XMP. Having a generic and flexible metadata system that works across file formats and applications is a really visionary goal, and they have largely achieved that.

But to really realize these benefits beyond Adobe applications, Adobe needs to loosen up XMP. I’d like to see them dedicate the spec to the W3C as providing a basis for embedding metadata in files, and work with the RDF experts there to bring it up-to-date with current best practices.

Such a move would no doubt present some short-term difficulties for Adobe itself given the large installed base of applications already using XMP at Adobe, but it would ultimately grow the market for enhanced metadata in the future.

The idea that our word-processing, spreadsheet, image, presentation, audio, etc. files should contain rich metadata that travels with the file is one whose time has come. It is my hope that the OpenDocument metadata work will help contribute to this goal, but we really need for this metadata to be able to travel across disparate formats. I’d love to see Adobe help realize this goal, but if it doesn’t, perhaps the W3C ought to consider doing so on its own.

DCMI Abstract Model and RDF

Posted in Uncategorized on May 7th, 2007 by darcusb

Library blogs were buzzing last week about a recent announcement of a collaboration around RDA and DC [summary here]. For those not familiar with the acronyms, this basically means an effort to bring a very high-level next-generation approach to library cataloguing together with more grounded ways of encoding metadata in the DC world. It also represents an effort to bring this library expertise to the semantic web.

In general, the commentary is positive. However, Jenn Riley brings up an interesting critique.

This seems to me to be entirely backwards – trying to harmonize DC principles with RDA after the fact. Didn’t the DC community learn its lesson about the pitfalls of this approach when developing the Abstract Model, only realizing long after developing a metadata element set that it would benefit from an underlying model.

She goes on to explain:

This general approach failed miserably with the DC Libraries Application Profile. There, the application profile developers wanted to use some elements from MODS, but weren’t able to because MODS doesn’t conform to the DCMI Abstract Model. So basically what the DC community said here was that application profiles are great, they form the fundamental basis of DC extensibility, but, oh yeah, you can’t actually use elements from any other standards unless they conform to the Abstract Model, even though there are no approved encodings for even DC itself more than two years after the Abstract Model was released. OK then. Way to foster collaboration between metadata communities.

Ah, here’s this problem again: the DC group absolutely rightly argues that a model is essential for any real extensibility and interoperability. But the case for the value of the DCMI Abstract Model per se gets lost.

If I had a recommendation for the DCMI, it would be to drop the Abstract Model and use RDF. It would save a lot of technical and evangelism work. I’ve said this before, but the Abstract Model offers completely unconvincing value to me. Its model is essentially equivalent to RDF, and its claimed advantage is that it has a non-RDF XML syntax. But the problem is that this syntax is even uglier and more complicated than RDF/XML!

Turning the focus to RDF does not solve Jenn’s issue, of course. The whole point is you agree on a model so that you can go your own way on all the details that really matter: what you call a book title, how you represent related items, and so forth. But with RDF, you get a rich infrastructure of technology and tools that goes way beyond the DCMI Abstract Model. That infrastructure includes OWL, RDF Schema … and GRDDL.

MODS has no model, and so cannot really be harmonized properly with anything. The solution there is to create an RDF version that can. But with GRDDL, you can just write an XSLT to map your XML to RDF, and so get something like the best of both worlds: the flexibility to use your own XML, and to merge it with other kinds of metadata descriptions via a common model. It seems to me RDF + GRDDL is a better, more practical, solution to interoperability than the DCMI Abstract Model.
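To give a feel for the kind of mapping involved, here is a small sketch. GRDDL proper points a document at an XSLT transformation; the Python function below is only a stand-in for that transform, and the XML record is a made-up, loosely MODS-like example rather than real MODS. The mapping table and the example.org identifier are assumptions for illustration; the Dublin Core terms namespace is the real one.

```python
import xml.etree.ElementTree as ET

# A made-up, loosely MODS-like record; element names are illustrative only.
RECORD = """
<record id="http://example.org/books/1">
  <titleInfo><title>An Example Book</title></titleInfo>
  <name><namePart>Jane Doe</namePart></name>
</record>
"""

DC = "http://purl.org/dc/terms/"

# Hypothetical mapping from element paths in the custom XML to RDF predicates.
MAPPING = {
    "titleInfo/title": DC + "title",
    "name/namePart": DC + "creator",
}


def xml_to_triples(xml_text):
    """Turn one record into a set of (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    subject = root.get("id")
    triples = set()
    for path, predicate in MAPPING.items():
        for element in root.findall(path):
            triples.add((subject, predicate, element.text))
    return triples


triples = xml_to_triples(RECORD)
```

The XML stays whatever the community wants it to be; only the mapping has to know about both sides, and the resulting triples can be merged with any other RDF description.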

Ultimately the RDA/DC announcement will allow just this since it will include an RDF vocabulary, but the message is indeed a little confusing.

A Framework for the (Bibliographic) Future

Posted in Uncategorized on March 13th, 2007 by darcusb

William Denton has a link to a draft of a Framework for a Bibliographic Future. I don’t have time to read it in depth, much less comment on it, but I just wanted to pick out a few places that raise some concerns.

First:

Metadata schemas (sometimes called ‘element sets,’ ‘metadata formats’ or ‘data dictionaries’) define the actual properties that will carry values in the data set, as well as the relationships between those properties. Data elements can be defined at any relevant level of granularity. They can have hierarchical relationships between them or non-hierarchical relationships.

The problem I read into this is that this is bound to an XML view of the world. The language of hierarchy is just that kind of view, and it excludes the more flexible relational and graph-based views of relational databases and RDF. So in any case, I suggest purging the draft of any suggestion that a tree-based XML model ought to be in any way privileged.

The second follows just after:

FRBR defines data elements in its attributes, but they must be restructured in a way that allows the development of different levels of granularity and that promotes extensibility of the schema, both over time and across communities.

… and:

Crucial to the proper development of a metadata schema is a clear notion of requirements for technical expression of the attributes, and a plan for maintenance and growth. We have learned much in the library community about the importance of community consensus and how to maintain important standards over time.

So the group wants a clean but extensible model that can be serialized in different ways and, I presume, integrated with backend systems. They also claim a need for “community consensus,” which seems to suggest a requirement for centralized development and management.

While the first makes a lot of sense, the second seems more a consequence of limited technologies than a formal requirement. In fact, this is a major problem with MARC, MODS, MARCXML, MADS, etc. Wouldn’t it, for example, make much more sense to have a framework that could evolve in a distributed way, where different organizations and communities could extend it as needed without need for wider consensus?

I’m going to repeat my mantra here: look at RDF. It provides the common and extensible model you want here, in ways that are relatively more friendly than generic XML to the relational databases so widely in use. It also can map fairly cleanly to object oriented programming. Finally, RDF also notably does not require the kind of centralized development and management suggested above.

Microformats and RDF: Moving Beyond the Hype

Posted in Uncategorized on February 28th, 2007 by darcusb

Ian Davis has a typically smart, open-minded, and reasonable response to a comment from Tantek Çelik that repeats the old mantras: that microformats represent a kind of practical solution to the pipe-dream of RDF and the Semantic Web.

Ian does a great job of (politely) picking apart this argument, and also of emphasizing that the RDF and microformats community are really on the same side and have much to learn from each other. So I won’t really add much, except to say that I spent a lot of time on the microformats list last year trying to help with the hCite effort, and ultimately left in frustration.

There’s too much hype around microformats, too little willingness to consider serious engagement with contrary views (that are themselves based on practical experience), too much arbitrariness wrapped in the appearance of some kind of scientific rigor, and I have too little time in my life to bother.

After mentioning some of this last December here and here I just concluded my time was better spent elsewhere. I was particularly bothered to see the resistance to any discussion of adding new attributes to HTML 5, and so of harmonizing RDFa and microformats. It’s one thing to make do with the existing limitations of HTML, but quite another to claim they don’t exist and shouldn’t be fixed in future web standards.

RDF and Relational Databases

Posted in Uncategorized on November 6th, 2006 by darcusb

Bob DuCharme has had two recent posts on exposing relational databases as RDF (for access, say, using SPARQL) using D2RQ: here and here.
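For readers who haven’t seen the idea before, the general shape of a relational-to-RDF mapping can be sketched in a few lines: each row becomes a subject, each column a predicate, each cell an object. This is not D2RQ’s mapping language or API, just an illustration of the principle; the `book` table, the URI scheme, and the `table_to_triples` helper are all assumptions of mine.

```python
import sqlite3

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def table_to_triples(conn, table, key):
    """Yield one triple per non-null, non-key cell of the table.

    The table name is interpolated directly, so it must come from trusted code.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"       # row -> subject URI
        for column, value in record.items():
            if column != key and value is not None:
                # column -> predicate URI, cell -> object
                yield (subject, f"{BASE}{table}#{column}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.execute("INSERT INTO book VALUES (1, 'An Example Book', 2006)")
conn.execute("INSERT INTO book VALUES (2, 'Untitled Draft', NULL)")

triples = set(table_to_triples(conn, "book", "id"))
```

Note that the NULL year simply produces no triple, which is one of the ways the graph model is a little more forgiving than a fixed table layout.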


Creative Commons License