Posts Tagged ‘Documents’

Learning from the Tumblelog

Posted in Technology on March 9th, 2008 by darcusb – Comments Off

As I’ve been looking into revamping and expanding my personal website, I’ve been interested in the Tumblelog. A traditional weblog essentially has one main object: the post. A post is typically a chunk of text content, with an author, a title, and so forth. A blog is thus a collection of posts, ordered by date.

A Tumblelog breaks out of the single object box. In addition to the post, depending on implementation you can also have links, people, places, photos, music, and quotes. That content can in turn be assembled from other sources: Delicious feeds, Flickr photo sets, etc.

From this perspective, then, a Tumblelog allows one to weave together a range of different kinds of content. The date-ordered list can include different kinds of objects, and these objects can also be woven together even within, say, a post.

So what lessons might this have for a scholar? What ideas might I steal from the Tumblelog, and how might I extend them?

I’d say the general approach goes really far. I think I would probably just get a little more generic. For example, a post and an article have little that distinguishes them, except the view. A draft manuscript isn’t conceptually any different than a draft blog post (unless you wanted to model sections). Notes are really just informal content, but still not really fundamentally different. Citations might be thought of as just a special kind of link.

So in the ideal CMS I am imagining, it would weave together links and associated metadata from Delicious and Zotero 2.0*, images from Flickr, and have a project view that allows me to group content and publications.

But what about the details? How to implement this?

In the world of Django, the approach seems to be to have different models for the different content, and then use a generic relation model to be easily able to weave together the content. So, separate classes/tables, for links, photos, quotes and so forth. This approach seems to work well for Jeff Croft, Wilson Miner, and Nathan Borror.
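To make the "separate models plus a generic relation" idea concrete, here is a minimal sketch in plain Python. It is not Django code and does not use Django's API; the class names (Link, Quote, Photo, StreamItem) and the sample data are hypothetical, chosen only to show how one date-ordered stream can point at objects of different types.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union


@dataclass
class Link:
    url: str
    title: str


@dataclass
class Quote:
    text: str
    source: str


@dataclass
class Photo:
    image_url: str
    caption: str


# Hypothetical stand-in for a generic relation: one stream entry can
# point at an object of any content type.
@dataclass
class StreamItem:
    published: date
    content: Union[Link, Quote, Photo]

    @property
    def kind(self) -> str:
        # The type name plays the role of a content-type column.
        return type(self.content).__name__.lower()


def tumblelog(items):
    """Return stream items newest first, whatever their content type."""
    return sorted(items, key=lambda item: item.published, reverse=True)


# Made-up sample data, for illustration only.
stream = tumblelog([
    StreamItem(date(2008, 3, 1), Link("http://example.org/", "An example link")),
    StreamItem(date(2008, 3, 9), Quote("An example quotation", "An example source")),
    StreamItem(date(2008, 2, 27), Photo("http://example.org/p.jpg", "An example photo")),
])
kinds = [item.kind for item in stream]
```

The cost of this design is the one discussed next: every new kind of content means a new class (or table) defined up front.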

I have to say, though, that after dealing a lot with RDF, a relational database feels a little claustrophobic: having to define an entire model upfront, and to worry about the consequences of changes later. And while I love the automatic Django admin interface, I’m starting to wonder if it’s really worth all the hassle. For a personal site, it’s not like I’m creating and managing that much structured data.

On the other hand, the (currently PHP-based but soon to include Ruby) Chyrp project takes a more generic approach, where there is essentially a single object again, but one that can be extended. This makes sense, since projects like Chyrp are designed both as dedicated tools and to be easily extended with plug-ins.

But given the straitjacket restrictions of a traditional relational database, exactly how can one store quotes, and events, and images all in the same table? In the current implementation, it seems that extended data is embedded as XML in the database. Ouch, this just feels wrong! Extended data becomes essentially a second-class citizen.

This seems a perfect place to borrow from RDF, either in whole, or in part. One approach would simply be to include an RDF store wholesale, as planned in Drupal. With an example like ARC, you can just have a few tables sit alongside the main application tables, and handle all the flexibility you want. If a plug-in developer wanted to add extended data, they could just register the common data in the post table, but then add the extended triples in the generic RDF tables. Since each post gets a URI, it’s easy to then merge the data.
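A rough sketch of that layout, using Python's built-in sqlite3 module: one fixed post table for the common data, one generic triple table for anything a plug-in wants to add, and the post URI as the join key. This is not ARC's actual schema; the table names, column names, and example.org vocabulary URIs are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (uri TEXT PRIMARY KEY, title TEXT, published TEXT);
    CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT);
""")

# Common data goes in the fixed post table.
post_uri = "http://example.org/posts/1"
conn.execute("INSERT INTO post VALUES (?, ?, ?)",
             (post_uri, "A quote I liked", "2008-03-09"))

# Extended data goes in the generic triple table (hypothetical vocabulary).
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", [
    (post_uri, "http://example.org/vocab/quoteText", "An example quotation"),
    (post_uri, "http://example.org/vocab/quoteSource", "An example source"),
])


def load_post(conn, uri):
    """Merge the fixed post columns with any extended triples for the URI."""
    row = conn.execute(
        "SELECT title, published FROM post WHERE uri = ?", (uri,)).fetchone()
    if row is None:
        return None
    merged = {"uri": uri, "title": row[0], "published": row[1]}
    for predicate, obj in conn.execute(
            "SELECT predicate, object FROM triple WHERE subject = ?", (uri,)):
        merged[predicate] = obj
    return merged


post = load_post(conn, post_uri)
```

Adding a new kind of extended data here needs no schema change, only new rows in the triple table.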

Of course, this raises the question: why not just go all RDF? If my project, publication, image, etc. metadata are all stored as RDF, then creating a Tumblelog could be a simple SPARQL query away.

I hope to figure this all out soon, as I really want to get this new website up and forget about it!

Remembrance Agent for Web 2.0

Posted in Technology on February 27th, 2008 by darcusb – 4 Comments

I’ve mentioned before an idea that Peter Flynn once put in my head. As I write this, I am taking a break from writing a manuscript (that is late!) using NeoOffice and Zotero. It all works fairly well, but I’m struck that the two together feel rather heavy. Consider the workflow if I need to add a citation and associated information:

  1. try to remember where I saw some information; go to Zotero to find it
  2. go back to NeoOffice to add content.
  3. go back to Zotero to insert the citation

While each time I do this the process is fairly quick, if you multiply it by a hundred it becomes a significant waste of time. More importantly, it’s a distraction. Writing is hard enough without being distracted by interruptions of this sort.

OK, so back to the idea:

Peter once mentioned Remembrance Agent. A screenshot with its emacs front-end:

[Screenshot: Remembrance Agent]

So the idea here is a service that scans the content you are working on and sends it to a backend, which looks through emails, documents, and bibliographic references to find items of potential relevance and presents them to the user for quick-and-easy access.
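As a toy illustration of that loop, the sketch below ranks a small reference library by how many terms each entry shares with the text being written. This is not how Remembrance Agent itself works; the term-overlap scoring, the stopword list, the function names, and the library entries are all assumptions invented for the example.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "on", "for", "is"}


def terms(text):
    """Lower-case word counts, ignoring a few common stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def suggest(draft, library, limit=3):
    """Rank library entries by how many terms they share with the draft."""
    draft_terms = terms(draft)
    scored = []
    for key, description in library.items():
        overlap = sum((draft_terms & terms(description)).values())
        if overlap:
            scored.append((overlap, key))
    # Highest overlap first; ties broken alphabetically by key.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [key for _, key in scored[:limit]]


# Hypothetical library: citation key -> short description.
library = {
    "ref-protest": "public space and protest in the city",
    "ref-typography": "typography and book design",
    "ref-rdf": "metadata and RDF for documents",
}
suggestions = suggest("Notes on protest and public space", library)
```

A real service would run something like suggest() against the paragraph under the cursor and show the results in a side panel, so the writer never has to leave the editor.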

So here’s my thought: with all of the innovations in new Ajax-y applications, shouldn’t it be possible to do something like this with web applications?

I’m starting to wonder about a nice web editor for academics: something stripped down and simple (but extensible) like the Mac application WriteRoom, that used a simple Markdown-like syntax, and which could plug in to a bibliographic service a la Remembrance Agent.

Hmm …

Reuters and the Semantic Web

Posted in Technology on February 3rd, 2008 by darcusb – Comments Off

The idea of the Reuters Calais semantic web service is to upload free text content to a web service, and receive back that content enhanced with embedded RDF. So, for example, let’s say your content includes the fragment:

… it will be possible to exchange tolar banknotes (unlimited) and coins (until 2016) only at the Bank of Slovenia.

The service will recognize “Bank of Slovenia” and send back the RDF, complete with a URI for the resource in question:

<rdf:Description 
  rdf:about="http://d.opencalais.com/comphash-1/65c45759-512c-3044-a47f-f74d42f14f4e">
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Company"/>
  <c:name>Bank of Slovenia</c:name>
</rdf:Description>
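For anyone wanting to consume a response like this, here is a minimal sketch using Python's standard xml.etree.ElementTree. The fragment above does not show its namespace declarations, so the wrapping rdf:RDF element and the URI bound to the "c" prefix are assumptions added to make the example parse; the entities() helper is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the fragment does not show which namespace "c" is bound to.
C_NS = "http://s.opencalais.com/1/pred/"

# The fragment from the post, wrapped in an assumed rdf:RDF root element.
response = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
  <rdf:Description
    rdf:about="http://d.opencalais.com/comphash-1/65c45759-512c-3044-a47f-f74d42f14f4e">
    <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Company"/>
    <c:name>Bank of Slovenia</c:name>
  </rdf:Description>
</rdf:RDF>"""


def entities(rdf_xml):
    """Yield (uri, type, name) for each rdf:Description element."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        uri = desc.get(f"{{{RDF_NS}}}about")
        type_el = desc.find(f"{{{RDF_NS}}}type")
        name_el = desc.find(f"{{{C_NS}}}name")
        yield (
            uri,
            type_el.get(f"{{{RDF_NS}}}resource") if type_el is not None else None,
            name_el.text if name_el is not None else None,
        )


found = list(entities(response))
```

Treating RDF/XML as plain XML like this only works for simple, regular output; a proper RDF parser would be the safer choice for anything beyond a quick experiment.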

From what I can tell, the service can only recognize the objects of description; it can’t identify relations. But certainly this is a nice start, and even nicer to see it’s free, and that they’re putting up a $5000 bounty to encourage a practical implementation in WordPress. It would also be nice if they could explore sending back the content as RDFa-enhanced XHTML (or maybe OpenDocument 1.2 once it’s released and its metadata functionality is implemented).

Design and Open Access

Posted in Uncategorized on January 25th, 2008 by darcusb – Comments Off

Dan Cohen has a note about an open access scholarly book project. It’s interesting because the discussion gets beyond the idealistic notion that scholarly content ought to be open access, to examining some of the challenges that work against it; perhaps most notably how administrators and colleagues will value such contributions in promotion and tenure decisions. It’s refreshing that the participants found many of these issues to be more-or-less non-issues.

I published my first book with a traditional publisher, and I think it would have probably been insane for me not to as a junior faculty member without an established name. The publisher advertised and promoted the book, and they were the ones that submitted it for consideration for a book award; something I would have never considered.

However, looking beyond that, I have the technical skills to have produced that text in electronic form of a quality virtually equal to that of the publisher. All it takes is TeX and some high-quality professional fonts, along with some design sense. So maybe the next book …

This leads me to one issue that I find problematic with most of the open access publications I’ve been exposed to: their production quality is absolute crap. The book that Dan points to, for example, is typeset directly from Word, using atrocious fonts like Arial and Times and Verdana.

I realize I may be rather unique in my attention to typographic detail, but I’m hardly alone. It negatively affects the quality of a reading experience if one has to read a poorly designed and executed text. Compare, for example, this PDF from the book that Dan discusses to this one. Both are open access publications. All else equal, which would you rather read?

[note: Alf Eaton had a really cool demo of using XHTML and CSS for scientific articles linked from this page, but for some reason it's now 411. Suffice to say I'm not suggesting prioritizing PDF at the expense of HTML.]

Obviously, there are some challenges in getting from here to there: in allowing the production of open access texts to be as easy as clicking a button in Word (or OpenOffice), but with results of much higher quality. I’m not sure exactly what the way forward might be, but one idea is to identify, as Alf has, the core structure one might want to see in output XHTML, and to design some openly available XSLT and CSS stylesheets that can achieve this. TeX is, unfortunately, more difficult given all the installation and compiling headaches, but PDF is arguably less important to the future of the open access world than HTML.

Leopard Does ODF

Posted in Uncategorized on October 29th, 2007 by darcusb – Comments Off

A while back I’d noted that Apple was adding ODF support to Leopard, but that it remained to be seen how good the support was. Well, at least one early report seems to suggest … pretty damn good. They do import/export (though still unclear how well), and it’s integrated into Cocoa and the new QuickLook previewer. So despite my general dislike of Apple’s incoherent stance on standards, here’s one place where they’re doing the right thing. Kudos to Apple, then.

If I had a suggestion for Apple for the future, I’d look into a way to bridge the new enhanced metadata support in ODF 1.2 with their CoreData framework. That way, Mac developers could easily embed richer intelligence into their documents using a W3C standard extensible metadata framework.

Comparing CDF and ODF

Posted in Uncategorized on October 26th, 2007 by darcusb – Comments Off

Gary Edwards describes his plans for using CDF in place of ODF.

The simple truth is that ODf was not designed to be compatible – interoperable with existing Microsoft documents, applications and processes. Nor was it designed for grand convergence… CDF on the other hand was designed exactly for grand convergence.

Now let’s compare this statement against the Compound Document Use Cases and Requirements document:

The Open Document Format … specifies an office application compatible style model, page layouts, index generations, text fields, table formulas which the CDF specifications will not address.

It’s not clear how one squares these two statements.

My prediction: if they even ship a solution that comes close to matching their lofty marketing rhetoric (doubtful), it will need to rely on non-standard extensions. If they try to standardize any of those extensions within the CDF group at the W3C, they will be rejected as out of scope.

Perhaps at that point we’ll hear noise about how the W3C is dominated by big vendors that are hostile to real-world interoperability and that CDF was never designed to meet market requirements.

Adobe Buzzword … Sigh

Posted in Uncategorized on October 1st, 2007 by darcusb – Comments Off

Wow, things are certainly interesting on the productivity application front these days. Today I see news that Adobe is buying a really interesting, if currently flawed, new web application.

This screenshot shows both how beautiful the application is, and also how limited.

[Screenshot: Buzzword; oops no styles]

So it is definitely the most beautiful word-processor I’ve seen recently. It also supports ODF.

But … no styles support … at all! This is a worrying trend I’m seeing; reinvent tools for the 21st century by stepping back in time to the 1980s.

Alas, Buzzword is still in development, so I will hold out some hope that they can correct course and implement elegant styles support before final rollout.

Oh, I would like to add that it’s really nice to see professional quality Adobe fonts like Minion and Myriad in Buzzword and the technically-similar presentation application SlideRocket rather than the absolutely atrocious Arial and Times we typically see. Kudos on that.

another update: So I got an account, and am playing with Buzzword. One word: wow! Also, they will be adding styles support; if it’s anything like the rest of the application, I’m sure they’ll do a good job with it.

Apple Pages and Styles Redux

Posted in Uncategorized on September 26th, 2007 by darcusb – Comments Off

A few days ago I commented positively on Apple’s Pages styles UI. This is without having actually used it. Having just tried the latest version of the application, I’m rather appalled at how limited the styles support really is.

  1. they effectively deprecated the style-based UI in the new version, switching to the more familiar (and broken) direct formatting approach
  2. one can only edit styles by directly editing text, and then telling Pages to redefine the style based on those changes
  3. no support for style hierarchy it seems (!)
  4. whoever designed the default templates and styles did a horrendously poor job of it. The default font almost everywhere is Helvetica (!). This is despite the fact that OS X ships with a really excellent text font (Hoefler Text).

Sigh … so much for good examples. The result would be a nightmare for users really, forcing them either to rely completely on presentational formatting if they didn’t want the defaults, or to modify every single style (since there’s no inheritance). For all the limitations I’ve noted in OOo, it’s way ahead of Apple on this count.
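To show why the missing hierarchy matters, here is a small sketch of style inheritance in Python. It is not how Pages or OpenOffice store styles; the Style class, its resolve() method, and the style names are hypothetical. The point is only that with a parent chain, changing the default font once changes it everywhere it has not been overridden.

```python
class Style:
    """A named style that inherits unset properties from its parent."""

    def __init__(self, name, parent=None, **properties):
        self.name = name
        self.parent = parent
        self.properties = properties

    def resolve(self, key):
        # Walk up the hierarchy until some style defines the property.
        style = self
        while style is not None:
            if key in style.properties:
                return style.properties[key]
            style = style.parent
        raise KeyError(key)


default = Style("Default", font="Helvetica", size=12)
heading = Style("Heading", parent=default, weight="bold")
heading1 = Style("Heading 1", parent=heading, size=18)

before = heading1.resolve("font")
# One change at the root reaches every style that does not override it.
default.properties["font"] = "Hoefler Text"
after = heading1.resolve("font")
```

Without the parent link, each of the three styles would carry its own copy of the font, and each would have to be edited by hand.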

Styles and Symphony

Posted in Uncategorized on September 19th, 2007 by darcusb – 1 Comment

IBM has released a free office suite based on OpenOffice called Symphony.

It seems they’ve done a nice job overall. The UI looks nice and clean and the website includes a list of easily accessible templates.

One problem with the UI, however, is its emphasis on presentational formatting. IBM is hardly unique here (Google Docs, for example, has no support for user-defined styles at all), but I’d like to think UI designers can do better.

Consider this screenshot:

Styles are present in the right-hand panel, but they are grouped within the local presentational styling section called “Font.” An average user will, not surprisingly, tend to fall back on the presentational attributes to get the formatting they want.

We know that semantic document authoring has all kinds of benefits; from easier document reuse and repurposing, to enhancing accessibility, and so forth. What if instead the panel had a top-level and more prominent “style” heading:

Style
-----
paragraph: ________
character: ________

Provide a wide variety of excellent templates with a full gamut of possible (semantic) styles, make them available on the internet with previews and browseable from within the application, and users can instantly see benefits from this.

Obviously one needs to make it easy for users to quickly modify formatting, but must this require a “font” panel? Is there really not a better way?

Here’s a nice example from Apple’s Pages:

Styles are front and center in the UI, users can instantly see what they will look like when chosen, and each of them is about the meaning (heading, caption, etc.) of the content, rather than what it looks like (big, bold, etc.). I’d like to see something like this from the OpenOffice universe.

Metadata SC Use Cases and Requirements Approved

Posted in Uncategorized on October 13th, 2006 by darcusb – Comments Off

ODF TC chair Michael Brauer has a quick summary of the approval of the ODF metadata use cases and requirements document I edited that will frame the proposal we will deliver sometime in the next few months. As he writes:

It will be the basis for the future work of the metadata subcommittee, and therefore provides an outlook in which direction OpenDocument moves regarding metadata. And because OpenDocument is OpenOffice.org’s native and default file format, I’m sure it also provides an outlook in which direction OpenOffice.org may move.

Creative Commons License