Posts Tagged ‘Documents’

Open XML Final Draft

Posted in Uncategorized on October 10th, 2006 by darcusb – Comments Off

Microsoft has released the final draft of their Open XML file format specification. I submitted a detailed list of comments to ECMA, and they did respond to them. However, it’s worth noting that they made no substantive changes at all. The most serious problem is that their name model is still U.S./Western-centric. They tried to get around the problem by adding a small editorial comment that a first name is equivalent to a given name, and a last name to a family name, but I hardly consider this an adequate response. Their continued use of “middle” name is even more annoying, given that it doesn’t even work for many Western names; consider “J. Edgar Hoover.”
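To make the naming complaint concrete, here is a rough Python sketch of a name model built on given and family parts rather than first/middle/last. The `PersonName` class, its fields, and the `family_first` flag are my own illustration, not anything defined in the Open XML or OpenDocument specs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PersonName:
    """Hypothetical name model: ordered given names plus a family name."""
    given: Tuple[str, ...]       # all given names, in the order the person uses them
    family: str
    family_first: bool = False   # display order, as in many East Asian names

    def display(self) -> str:
        given = " ".join(self.given)
        parts = [self.family, given] if self.family_first else [given, self.family]
        return " ".join(p for p in parts if p)

    def sort_key(self) -> str:
        # Sort on the family name, whatever the display order is
        return f"{self.family}, {' '.join(self.given)}"


# No need to decide which of "J." and "Edgar" is the "first" or "middle" name
hoover = PersonName(given=("J.", "Edgar"), family="Hoover")
mao = PersonName(given=("Zedong",), family="Mao", family_first=True)
```

Nothing here forces a name into a "middle" slot, and display order is kept separate from the parts themselves, which is roughly what I was asking for.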

I give Microsoft credit for opening up, but make no mistake: Open XML is really not that open. It is designed by and for Microsoft, and it’s clear that all decisions on the spec were driven by their product teams. The team implementing their new bibliographic support couldn’t be bothered to make what in effect were trivial changes, and so the ECMA TC45 couldn’t be bothered to fix the spec.

Contrast this with the OpenDocument process, where the people now driving the future of the specification are in many cases unaffiliated with any of the usual players: IBM, Sun, KOffice. Moreover, all our comments and responses to them are publicly available.

Nice Commenting UI

Posted in Uncategorized on October 9th, 2006 by darcusb – Comments Off

This is a fantastically cool and elegant way to add comments to content within a weblog post. Would be perfect bolted onto a scholarly-oriented CMS as a way to get comments from people on in-progress work.

Extensibility?

Posted in Uncategorized on August 12th, 2006 by darcusb – Comments Off

What does it mean to have extensible XML support?

This is a question that came up somewhat obliquely in the latest OpenDocument Metadata SC conference call, where I was presenting my draft requirements for the bibliographic use case, one of which was the need for extensibility. XML, after all, is an acronym for eXtensible Markup Language. Given my focus on metadata, I’ll restrict myself more to that realm.

It seems to me there are largely two views on this question. One perspective—I’ll call it the “document-based” view—says that extensibility is defined first through the simple ability to create new languages, and second within those languages to create strategic extension points.

Another view—I’ll call it the “module” view—sees metadata not fundamentally in terms of documents and complete schemas, but rather in terms of modules of descriptions that can be plugged together, mixed up, or otherwise interact, mostly independently.

This first view suggests to me an image of a book, complete with introduction and conclusion, index, and covers. It’s a more hermetic view of metadata.

The second view is, I think, the view of the web and hyperlinks, RDF, and more recently microformats. Why invent elaborate new schemas, this view says, when you can instead mix-and-match from a rich set of existing alternatives?

So when we at the Metadata SC talk about “extensibility,” then, as a requirement, what do we mean?

I can only really speak for myself, but to me—a partisan of the second view—extensibility has to mean both that one can add custom XML markup and that the markup conforms to some rules such that ad hoc mixing and interaction is possible.

Simply allowing anything-goes addition of arbitrary content achieves little that is useful. While there may well be use cases for this sort of thing—Microsoft’s custom schema functionality surely must be valuable in some contexts—it seems to me it would be counterproductive to not insist on some minimal expectations of interoperability across a document format’s metadata format.

This is not to say that all conforming applications must fully understand extension structures, but it is to insist on the need for at least minimal legibility (for example, the ability to display any foreign content).
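As a rough illustration of what I mean by minimal legibility, here is a Python sketch using the standard library’s `xml.etree.ElementTree`. The sample document, the `http://example.org/ns/bib` namespace, and the `describe` helper are all made up for the example; only the Dublin Core namespace URI is real. The idea is that elements from a vocabulary the application knows get handled specifically, while foreign ones are still displayed rather than dropped.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical metadata fragment mixing a known and a foreign vocabulary
SAMPLE = """\
<meta xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:ex="http://example.org/ns/bib">
  <dc:title>Extensibility?</dc:title>
  <dc:creator>darcusb</dc:creator>
  <ex:reviewStatus>draft</ex:reviewStatus>
</meta>
"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def describe(xml_text):
    """Return one display line per metadata element, known or foreign."""
    lines = []
    for child in ET.fromstring(xml_text):
        ns, local = split_tag(child.tag)
        text = (child.text or "").strip()
        if ns == DC:
            lines.append(f"{local}: {text}")
        else:
            # Foreign content: not understood, but still shown to the user
            lines.append(f"[{ns}] {local}: {text}")
    return lines
```

Calling `describe(SAMPLE)` yields a line for each element, with the foreign `reviewStatus` element labelled by its namespace instead of being silently discarded.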

New FUD Offensive

Posted in Uncategorized on July 27th, 2006 by darcusb – Comments Off

It seems Microsoft is gearing up for yet another new anti-ODF FUD offensive, and Brian Jones is leading it. I find responding to every little detail tiresome, so I will just address this point about how the standards are developed:

I think the key here is for everyone to just be clear on the goals. The ODF format is based on Sun’s StarOffice, and Open XML was based on the Microsoft Office formats. Both have the goals of being open, both have been submitted to standards bodies, and both have a commitment from the donating companies (Sun and Microsoft) that there will be no licensing restrictions and anyone is allowed to freely use the formats.

This is classic FUD: factually true enough, but false by way of omission.

If you want to understand the goals of ODF, just read the TC Charter. A few of its goals are conspicuously absent from Microsoft’s, notably friendliness to processing using XML tools and the reuse of existing standards. I happen to think those matter to developers and ultimately users.

FWIW, I am on the ODF TC. But I have also given MS plenty of constructive comments on the way they are implementing citation support in OXML, because my interest is in promoting better solutions in general. I’d rather have two excellent open XML formats, than two weak ones.

Perhaps this will be a good test of how well the two standards processes work? My guess is none of my comments will have any effect on OXML.

VML, SVG, Standards

Posted in Uncategorized on July 25th, 2006 by darcusb – Comments Off

Rob Weir, with yet more smart commentary on the MS ECMA spec and standards:

Now take a look at Chapter 23, VML, pages 3571-3795 (PDF pages 3669-3893). We see here 224 pages of “VML Reference Material”, which appears to be a rehash of the 1999 VML Reference from MSDN, and in this form it hides itself in a 4,081-page OOXML specification, racing through Ecma and then straight into ISO. Is this right? Should a rejected standard from 1998, be fast-tracked to ISO over a successful, widely implemented alternative like SVG?

Good question!

Rob makes some good points about why using standards matters in a really practical sense (they are often technically superior because they’ve gone through extensive review, they have knowledge and tools built around them, etc.). I wonder how these issues relate to Rick Jelliffe’s discussion of the developer-friendliness of the two formats?

Call for Comments: Draft ODF Metadata Use Case Document

Posted in Uncategorized on July 19th, 2006 by darcusb – Comments Off

The ODF metadata subcommittee has wrapped up work on a draft [ODF, PDF] of the use cases document we will be submitting for approval to the OASIS ODF TC. The document lays out our vision of what this new support ought to make possible, authored as it was by a group that represents various areas: academia and research, medicine, law, real estate, and of course the software engineers who make it all happen. In many ways, we believe this will go beyond what MS offers in their custom schema support.

If we missed something, please send us your comments.

We will use this document to derive a set of requirements, and, once the ODF TC approves it, then move on to actual implementation details.

Politics, MS and ODF

Posted in Uncategorized on July 19th, 2006 by darcusb – 1 Comment

Two comments from people at Microsoft on the suggestion (from me and others) that they join the OpenDocument Technical Committee to help ease interoperability gaps in the two formats going forward; first Brian Jones:

I think there are still plenty of ways we can help out the OASIS folks with the ODF format. The entire translator project is open source, so the conversion will be completely transparent and everyone will have the ability to benefit from what we discover as the transformations are built. In addition to that, as I’ve looked through our Ecma documentation, I’ve also been looking at the ODF spec as a point of comparison. As I come across areas that are either missing, or just not fully specified, I’ll be sure to point them out on my blog. That should help them in creating a list of areas to improve.

On one hand, this sounds quite generous. To this I say, sure Brian, that’d be great.

But if you parse the language (and my career consists of doing just that), it reflects the arrogance of a company that has for too long gotten by on the weight of its own monopoly position. Note: he does not acknowledge that MS might learn something from the experience (see below), and that OXML might be better for it. Likewise, he doesn’t acknowledge that OXML has already borrowed from ODF; for example, in its zipped package file structure.

Now, here’s Dare commenting on Brian’s post:

Unfortunately, the ODF discussion has seemed to be more political than technical which often obscures the truth. Microsoft is making moves to ensure that Microsoft Office not only provides the best features for its customers but ensures that they can exchange documents in a variety of document formats from those owned by Microsoft to PDF and ODF.

Make no mistake: there is something “political” in this position that MS is staking out, which seems to be:

  1. see, we are just as open as ODF?
  2. but ODF is a weak spec that pales in comparison to the technical excellence of Open XML
  3. MS is giving the people what they really want, which is file format support; witness the new BSD licensed ODF plug-in for Office

IBM’s Rob Weir is starting to pay some careful technical attention to these sorts of details. In his latest, he argues the heavy weight of OXML is going to introduce serious implementation, and thus interoperability, problems.

He addresses this through the 50+ pages of references to an obscure feature of page art borders. Yes, the spec actually includes these details! And as Rob points out, this sort of functionality is quite culturally-specific.

The images are heavily weighted to Western even Anglo-American celebratory icons, things like gingerbreadmen for Christmas or slices of Birthday cake, pumpkins for Halloween, or images of Cupid for St. Valentines day, or globes which are neatly centered on the United States.

Rob argues this is a perfect example of over-the-top spec bloat that will make implementation awkward for anyone but MS. Moreover, Rob actually provides an elegant alternative suggestion.

All of these problems (spec bloat, cultural bias, non-extensibility, copyright concerns) can be solved by one simple mechanism. Instead of having ST_Border be a fixed enumerated set of values, have it include only a small number of trivial values like the basic line styles, and have everything else (all of the Art Borders) be stored as a separate image file in the document archive.

Excellent!

Brian, you listening?

Elsewhere, Rob does a good job analyzing just how well MS is doing by their users in the ODF plug-in GUI and import quality.

Meanwhile, I have extensively pointed out where MS has fallen down in their new citation support. They have invented their own source format, have ignored library communications standards, and appear to be using critical citation coding that will be impossible for standard XPath-based XML tools to process. Some of this has implications for the file format, and I’ve yet to see any serious concern about the issues out of Redmond.

To be fair, them inventing their own source format is no big deal, since there aren’t any good standards here. Still, my other critiques apply.

Despite what it might seem, my position on these matters isn’t blindly political. I believe in open standards because I think in the end they yield better results for end users. I expect to prove that with the citation use case, but I really do want to raise the bar for academic end users all around. Enhancing interoperability between ODF and OXML is an important part of that, and both groups can learn from each other.

FUD, Formats, Applications

Posted in Uncategorized on July 6th, 2006 by darcusb – Comments Off

Brian Jones:

Today we are announcing the creation of the Open XML Translator project that will help translate between the Office Open XML formats and the OpenDocument format… [Y]ou’ll have the ability to save to and open ODF files directly within Office (just like any other format).

An enthusiastic Dare Obasanjo (MS):

It’ll be good to see the debate migrate away from support for file formats back to exactly which product’s features provides the best value for customers. Everybody wins.

An unimpressed Bob Sutor (IBM), complaining about anti-ODF FUD in the press release:

One of the arguments around ODF from the beginning was around the long term preservation of customer information. This is one of the reasons why ODF was created. It is still an important reason why momentum around it continues.

In contrast, ODF focuses on more limited requirements, is architected very differently and is now under review in OASIS subcommittees to fill key gaps such as spreadsheet formulas, macro support and support for accessibility options.

All right, all right, ODF is under active development by a worldwide community of experts not under the control of a single vendor who are making it state of the art in such areas as accessibility! We admit it!

Me? I actually see both sides of this. On one hand, being a part of the ODF TC is an interesting experience, as I see engineers and domain experts from different areas come together to solve important problems by consensus. While IBM and Sun, for example, have a big presence on the TC, whenever I have raised questions, people have always respectfully addressed them, and the discussion has improved the end result.

I should also note that at least some of the accessibility “gaps” the MS press release cites are shared by OXML! We now have a crack team of experts on the task, and the result will be that ODF will shortly include accessibility support that is superior to OXML. Ditto for metadata.

On the other hand, there’s no denying that Office 2007 will offer some important improvements, particularly for higher education and research, and that the ODF application market has a ways to go to realize the potential of the open format. The current OpenOffice and ODF citation support is really poor, for example.

So I guess if I want MS, Sun, IBM, etc. to do the right thing for users, I’d like:

  1. MS to follow Bob’s suggestion and participate in the ODF TC, to address concerns we might have about OXML (for example, my interest in the new citation support) and they might have about ODF so that we can ease interoperability in the future
  2. Sun to spin off OOo, and IBM to contribute to it; make the code more modular and easy to extend, and the community more open

Opening Up the Market

Posted in Uncategorized on June 13th, 2006 by darcusb – Comments Off

I said in my last post that:

I cannot emphasize enough how important it is that this stuff be standardized within document formats and included within editing applications. It’s critical, and the sad state of the current market is a direct consequence of the fact that it is not.

What I am saying may seem paradoxical: that including standard support commonly found in third-party plug-ins will actually open up the market, rather than close it. This is so only, however, if one can use alternate data sources. I should, put simply, be able to have Word access RefWorks, or Endnote, or whatever reference management software I want.

Thankfully, there’s a fairly easy way for Microsoft to allow this: tweak their Research Pane a bit.

Right now, the “insert citation” button on the Word ribbon includes an option to “search libraries.” When you click it, it brings the Research Pane up. Good!

Sadly, it doesn’t do anything useful (yet). What it should do is give default access to the Library of Congress SRU/W gateway, and to MS’s Academic search service. Further, it should be trivial to add any new data source to this.

Also, a user ought to be able to drag-and-drop the search results onto the document to cite them. I think this does suggest some enhancements to the Research Pane, including removing the requirement to use SOAP. RESTful web services are winning the day, and MS ought to support them.
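Part of the appeal of the REST-style approach is how little it takes to talk to something like an SRU gateway: a search is just a URL. Here is a rough Python sketch that builds an SRU 1.1 `searchRetrieve` request. The base URL is a placeholder rather than the actual Library of Congress endpoint, the `sru_search_url` helper is hypothetical, and the `mods` record schema is an assumption about what a given server offers.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10, record_schema="mods"):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                  # CQL, e.g. dc.title = "..."
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,       # assumed to be supported by the server
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder gateway address, for illustration only
url = sru_search_url("http://sru.example.org/catalog", 'dc.title = "open document"')
```

A plain HTTP GET on that URL is the whole request; there is no envelope to construct, which is why adding a new data source this way ought to be trivial.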

Problem solved … mostly. We now have good standard base support, but open up options for different kinds of users and user communities, as well as developers.

One problem with this approach, however, is that it puts a lot of burden on the source data format for interoperability, and right now, it is rather more limited than it should be to fulfill that requirement.

Incidentally, everything I’ve been saying is pretty much what we’ve been advocating at the OpenOffice bibliographic project. While it could be coincidence, I can’t help but wonder if people at MS haven’t been paying attention, and if we haven’t unintentionally done a bit of design work for them!

Update …

From MS’s Chris Pratley, on some forum, more info:

Word 2007 comes with a citation library capability, and by the time we ship it will have connections to on-line reference libraries so you can search for citations and download them to your local library. In beta 2 you have to manually enter citations, but you can keep them in your library and re-use them in different docs.

Word 2007 beta 2 has a set of the most common citation formats (MLA, APA, etc.), and this can be expanded either by end users (need to edit an XML file), or by third parties or Microsoft in the future. We expect a lot of people to add more formats you can download so you don’t have to make them yourself. We’re just two weeks into public beta so that hasn’t had a chance to happen yet.

So it seems like good news, though his explanation on citation styling is cryptic.

Plug-in vs. Standard, XSLT vs. CSL

Posted in Uncategorized on June 13th, 2006 by darcusb – 1 Comment

Peter again on citations in Word. He raises two issues; the first concerns my argument that MS ought to use a CSL (or CSL-like) abstraction on top of a generic XSLT:

Bruce has some concerns about the complexity and size of the XSLT involved, but I don’t think that matters so much -what matters is that XSLT is involved. All that’s required is an CSL to XSLT compiler. Feed CSL in one end and get a Word 2007 compatible stylesheet out the other. This could be done with a stand alone tool.

That would be possible, but not very realistic. It adds further steps to setting up a new style, and as I mentioned, each style file would be very large. We need to start thinking about open citation style repositories, where a user (or even just a processing tool) can grab a new definition as needed. That is only convenient where the files are:

  • self-contained
  • small

The questions Peter asks near the end (about adding and creating new styles, repositories, etc.) will all have fairly uninspiring answers with the current approach. With CSL, not only do we have a feature-rich language that satisfies the above requirements, but one that is both language and document-format agnostic. One can use the same style files with ANY document format: Open XML, OpenDocument, DocBook, XHTML, RTF; even TeX.
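To show what I mean by a small, self-contained, format-agnostic style, here is a toy Python sketch. To be clear, this is NOT CSL syntax and not how any CSL processor works; the rule tuples, the `render` function, and the placeholder reference are all invented for the example. The point is only that the style is a bit of declarative data, and the output format is a separate concern.

```python
from html import escape

# Hypothetical, heavily simplified style: an ordered list of
# (field, prefix, suffix, emphasis) rules. This is NOT CSL syntax.
AUTHOR_DATE_STYLE = [
    ("author", "", " ", False),
    ("year", "(", "). ", False),
    ("title", "", ".", True),
]


def render(reference, style, markup="text"):
    """Format one reference with the given style as plain text or XHTML."""
    out = []
    for field, prefix, suffix, emphasis in style:
        value = reference.get(field)
        if not value:
            continue  # skip missing fields rather than emit stray punctuation
        if markup == "xhtml":
            value = escape(value)
            if emphasis:
                value = f"<em>{value}</em>"
        out.append(f"{prefix}{value}{suffix}")
    return "".join(out)


# Placeholder reference data, not a real publication
ref = {"author": "Doe, J.", "year": "2006", "title": "An Example Title"}
```

The same `AUTHOR_DATE_STYLE` drives both `render(ref, AUTHOR_DATE_STYLE)` and `render(ref, AUTHOR_DATE_STYLE, markup="xhtml")`; supporting another document format means adding an output branch, not rewriting the style.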

The second big question Peter asks is whether citation support ought to be standard in Word (or OpenOffice for that matter).

And I’m still dubious about the value of having the bibliographic software built into Word 2007; Microsoft’s site clearly states that if you load a file with citations in it into an earlier version of Word they will be converted to plain text. This means that the feature will not be usable in a real-world context for several years. People have to collaborate with others, work from home and in internet cafes; we can’t mandate Word 2007 in all those places.

First, I think MS can do better than convert the citations to text. I suggest that with their patch to add OXML support to previous versions of Office, they include at least basic support to preserve the new citation logic, and perhaps a separate plug-in that provides basic GUI support that would allow compatibility with Word 2007.

I cannot emphasize enough how important it is that this stuff be standardized within document formats and included within editing applications. It’s critical, and the sad state of the current market is a direct consequence of the fact that it is not. So I’d emphasize again that I think there’s tremendous promise in this approach, and that it is just in need of some refinement.


Creative Commons License