RELAX NG, XSD, Schematron

Anyone writing a new XML language in 2006 faces a choice of schema languages. Despite all the marketing and engineering dollars thrown at XML Schema, it is a brain-dead specification: horribly complex where it doesn’t need to be, and incredibly dumb elsewhere. There are all sorts of practical XML constraints that simply cannot be modeled in XSD.

Want to condition the validation of child elements on an attribute? Sorry, you can’t do that.

Want to give users a choice between an empty element with attributes, or text content without attributes? Nope.

Want to define a content model where order is unimportant and some elements can repeat? Apart from the very restricted xs:all, sorry, you can’t do that either.

Thankfully, there is a better alternative in RELAX NG. Here’s an example (using the compact, non-XML syntax) from my Citation Style Language (CSL) schema, where I condition validation on the class attribute of the root element:

  CitationStyle =
    element cs:style {
      AuthorDateStyle
      | NumberStyle
      | LabelStyle
      | NoteStyle
      | AnnotatedStyle
      | CustomStyle
    }

The AuthorDateStyle pattern is then defined as:

  AuthorDateStyle =
    attribute class { "author-date" },
    Info,
    Terms?,
    Defaults,
    AuthorDateCitation,
    AuthorDateBibliography

So an author-date style requires both a citation and a bibliography element, the algorithm attribute on the bibliography’s sort element must be set to “author-date”, and so forth. The schema reflects the expectations tool developers should bring to the table when designing scripts, GUIs, or whatever.
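The other two complaints from the list above are just as short in RELAX NG. A sketch in the same compact syntax follows; the pattern names ExampleName and ExampleInfo, and the child elements and attributes they use, are hypothetical illustrations rather than part of the CSL schema, and the cs prefix is assumed to be declared as in the rest of the schema:

  # Hypothetical: either an empty element carrying attributes,
  # or text content with no attributes at all
  ExampleName =
    element cs:name {
      (attribute form { "long" | "short" },
       attribute delimiter { text }?,
       empty)
      | text
    }

  # Hypothetical: children in any order, using interleave (&)
  # instead of sequence (,); cs:author may repeat
  ExampleInfo =
    element cs:info {
      element cs:title { text }
      & element cs:author { text }+
      & element cs:updated { xsd:dateTime }
    }

The repeatable cs:author is what matters in the second pattern: XSD 1.0’s xs:all only accepts unordered children that each occur at most once, whereas RELAX NG’s interleave has no such limit.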

But what happens if you need to provide your RNG schema for validation in XSD-oriented workflows? Here’s my own conclusion:

Define the schema so that it is easy to write a customization that overrides the more complex restrictions: a simplified schema that Trang can convert automatically to valid XSD. It’s as simple as:

  include "csl-alt.rnc" {
    cs-citationstyle =
      element cs:style {
        attribute class { cs-classes },
        Info,
        Terms?,
        Defaults,
        Citation?,
        Bibliography?
      }
  }
  cs-classes = "author-date" | "number" | "label" | "annotated" | "note"

… where all of the patterns above are simple ones, free of the content restrictions that make XSD choke.

Trang will then happily create a valid XSD file from this simplified schema.

However, you end up with a much looser schema, so now what? There is little point in creating instances against such a loose schema when they may still be invalid against the normative spec and schema.

Answer: create some separate Schematron rules to model the constraints that XSD cannot. If you want to write them inside your RNG customization schema (they can then be extracted using Trang plus an XSLT stylesheet), just embed them as annotations like this:

    s:rule [
      context = "/cs:style[@class='author-date']"
      s:assert [
        test = "cs:bibliography/cs:sort/@algorithm='author-date'"
        "Must use author-date sorting for the author-date class."
      ]
      # self::cs:author matches on namespace, not on the instance's prefix
      s:assert [
        test = "cs:citation/cs:layout/cs:item/*[1][self::cs:author]"
        "The citation item layout must begin with an author element."
      ]
    ]
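The XPath tests in these rules use the cs: prefix, so the extracted Schematron has to know what that prefix means, and the s: prefix has to be bound in the RNC file itself. A minimal sketch of the declarations at the top of the customization file (RNC namespace declarations must come before any patterns), assuming the Schematron 1.5 namespace and http://purl.org/net/xbiblio/csl as the CSL namespace; check both against your own schema, and check that your extraction stylesheet passes ns elements through:

    # Assumed namespace URIs; adjust to match your schema
    namespace cs = "http://purl.org/net/xbiblio/csl"
    namespace s = "http://www.ascc.net/xml/schematron"

    # Binds cs for the XPath expressions in the extracted rules
    s:ns [ prefix = "cs" uri = "http://purl.org/net/xbiblio/csl" ]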

Finally, write a little shell script to run both validations.

It’s not nearly as elegant as the pure-RNG approach (and certainly does little for any of the real-time validating IDEs I know of), but it can ensure that instances match the expectations modeled in your RNG schema. And learning a little Schematron is probably good anyway, because it in turn can express things that RELAX NG cannot.

I’m personally hoping not to have to do this with CSL, though; it’s enough for me to worry about one schema.

Creative Commons License