As I’ve been looking into revamping and expanding my personal website, I’ve been interested in the Tumblelog. A traditional weblog essentially has one main object: the post. A post is usually a chunk of (mostly text) content, with an author, a title, and so forth. A blog is thus a collection of posts, ordered by date.
A Tumblelog breaks out of the single object box. In addition to the post, depending on implementation you can also have links, people, places, photos, music, and quotes. That content can in turn be assembled from other sources: Delicious feeds, Flickr photo sets, etc.
From this perspective, then, a Tumblelog allows one to weave together a range of different kinds of content. The date-ordered list can include different kinds of objects, and these objects can also be woven together within, say, a single post.
So what lessons might this have for a scholar? What ideas might I steal from the Tumblelog, and how might I extend them?
I’d say the general approach goes really far. I think I would probably just get a little more generic. For example, a post and an article have little that distinguishes them, except the view. A draft manuscript isn’t conceptually any different than a draft blog post (unless you wanted to model sections). Notes are really just informal content, but still not really fundamentally different. Citations might be thought of as just a special kind of link.
So in the ideal CMS I am imagining, it would weave together links and associated metadata from Delicious and Zotero 2.0*, images from Flickr, and have a project view that allows me to group content and publications.
But what about the details? How to implement this?
In the world of Django, the approach seems to be to have different models for the different kinds of content, and then use a generic relation model to weave the content together easily. So, separate classes/tables for links, photos, quotes, and so forth. This approach seems to work well for Jeff Croft, Wilson Miner, and Nathan Borror.
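To make the generic relation idea concrete, here is a minimal sketch of the underlying pattern. It is not Django code: it uses Python's built-in sqlite3 module, and the table names, column names, and the helpers `add_item` and `stream` are all hypothetical, chosen only for illustration. Each content type gets its own table, and one generic table records which table and which row every stream entry points at.

```python
import sqlite3

# Hypothetical schema: one table per content type, plus one generic table
# that can point at any row in any of them.
SCHEMA = """
CREATE TABLE links  (id INTEGER PRIMARY KEY, url TEXT, title TEXT);
CREATE TABLE quotes (id INTEGER PRIMARY KEY, body TEXT, source TEXT);
CREATE TABLE stream_item (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,          -- which content type the item is
    object_id INTEGER NOT NULL,  -- primary key within that type's table
    published TEXT NOT NULL      -- ISO 8601 date, so text order is date order
);
"""

# Whitelist mapping kinds to tables; table names cannot be bound as parameters.
TABLES = {"link": "links", "quote": "quotes"}


def add_item(conn, kind, published, **fields):
    """Insert a row into its own table and register it in the generic stream.

    Column names come from the keyword arguments, so they must be trusted.
    """
    table = TABLES[kind]
    cols = ", ".join(fields)
    marks = ", ".join("?" for _ in fields)
    cur = conn.execute(
        f"INSERT INTO {table} ({cols}) VALUES ({marks})", tuple(fields.values())
    )
    conn.execute(
        "INSERT INTO stream_item (kind, object_id, published) VALUES (?, ?, ?)",
        (kind, cur.lastrowid, published),
    )
    return cur.lastrowid


def stream(conn):
    """Return the mixed stream, newest first, as (kind, published, fields)."""
    rows = conn.execute(
        "SELECT kind, object_id, published FROM stream_item "
        "ORDER BY published DESC"
    ).fetchall()
    items = []
    for kind, object_id, published in rows:
        cur = conn.execute(
            f"SELECT * FROM {TABLES[kind]} WHERE id = ?", (object_id,)
        )
        row = cur.fetchone()
        if row is None:
            continue  # dangling reference; skip it
        names = [d[0] for d in cur.description]
        items.append((kind, published, dict(zip(names, row))))
    return items
```

The cost of this design shows up in `stream`: each entry needs a second lookup in its own table, and every new content type means a new table and a new entry in the whitelist.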
I have to say, though, that after dealing a lot with RDF, a relational database feels a little claustrophobic: having to define an entire model upfront, and to worry about the consequences of changes later. And while I love the automatic Django admin interface, I’m starting to wonder if it’s really worth all the hassle. For a personal site, it’s not like I’m creating and managing that much structured data.
On the other hand, the (currently PHP-based but soon to include Ruby) Chyrp project takes a more generic approach, where there is essentially a single object again, but this can be extended. This makes sense, since projects like Chyrp are designed both as dedicated tools and to be easily extended with plug-ins.
But given the straitjacket restrictions of a traditional relational database, exactly how can one store quotes, and events, and images all in the same table? In the current implementation, it seems that extended data is embedded as XML in the database. Ouch, this just feels wrong! Extended data becomes essentially a second-class citizen.
This seems a perfect place to borrow from RDF, either in whole or in part. One approach would simply be to include an RDF store wholesale, as planned in Drupal. With an example like ARC, you can just have a few tables sit alongside the main application tables, and handle all the flexibility you want. If a plug-in developer wanted to add extended data, they could just register the common data in the post table, but then add the extended triples in the generic RDF tables. Since each post gets a URI, it’s easy to then merge the data.
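As a rough sketch of what that might look like, assume a single generic triple table sitting next to the post table. This is not ARC's actual schema, which I am not reproducing here; the table layout, the `describe` helper, and the example vocabulary URI are all made up for illustration, again using Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical layout: the application's own post table, plus one generic
# table of (subject, predicate, object) triples for extended data.
SCHEMA = """
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    uri TEXT UNIQUE NOT NULL,
    title TEXT,
    body TEXT
);
CREATE TABLE triple (
    subject TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object TEXT NOT NULL
);
"""


def describe(conn, uri):
    """Merge a post's core columns with any extended triples about its URI.

    Returns None when no post has that URI.
    """
    row = conn.execute(
        "SELECT title, body FROM post WHERE uri = ?", (uri,)
    ).fetchone()
    if row is None:
        return None
    merged = {"title": row[0], "body": row[1], "extended": {}}
    triples = conn.execute(
        "SELECT predicate, object FROM triple WHERE subject = ?", (uri,)
    )
    for predicate, obj in triples:
        # A predicate may occur more than once, so collect values in a list.
        merged["extended"].setdefault(predicate, []).append(obj)
    return merged
```

A plug-in could then add a new property just by inserting rows into `triple`, with no change to the post table; the post's URI is the only thing the two tables need to share.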
Of course, this raises the question: why not just go all RDF? If my project, publication, image, etc. metadata are all stored as RDF, then creating a Tumblelog could be a simple SPARQL query away.
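To give a feel for how little is needed, here is a small pure-Python sketch that works over a plain list of triples. It stands in for a real RDF store and SPARQL engine, neither of which is used here. The function name `tumblelog` is hypothetical, and the choice of `rdf:type` and Dublin Core `dc:date` as the two properties is my assumption. The comment shows roughly the SPARQL query it imitates.

```python
# Roughly the query being imitated:
#
#   PREFIX dc: <http://purl.org/dc/elements/1.1/>
#   SELECT ?item ?type ?date
#   WHERE { ?item a ?type ; dc:date ?date . }
#   ORDER BY DESC(?date)

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_DATE = "http://purl.org/dc/elements/1.1/date"


def tumblelog(triples):
    """Return (item, type, date) for every typed, dated item, newest first.

    Simplification: assumes at most one type and one date per item, and
    ISO 8601 date strings so that string order matches date order.
    """
    types = {}
    dates = {}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types[subject] = obj
        elif predicate == DC_DATE:
            dates[subject] = obj
    items = [(s, types[s], dates[s]) for s in dates if s in types]
    return sorted(items, key=lambda item: item[2], reverse=True)
```

The point is that nothing in the query cares whether an item is a post, a photo, or a citation: anything with a type and a date shows up in the stream.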
I hope to figure this all out soon, as I really want to get this new website up and forget about it!