I was recently pointed to a presentation about Open XML that raised my curiosity. It found its way to me because it included my picture, but the content is what’s on my mind. I take the Open XML discussion pretty seriously; I’ve had very interesting and stimulating discussions about Open XML with a lot of folks, but I’ve also seen a lot of the nonsense that makes the discussion cloudy and difficult (see below).
The slide references a comment I made in a ZDNet Australia interview about the advantages of XML-based document formats over binary formats for enabling better security. Specifically, having file formats represented in XML makes parsing simpler, because XML documents are expressed using a pre-defined (in this case public) schema. They can be easier to parse than binary formats, which can be opaque and obscure even when you already have their documentation. Given a choice, I’m sure that 99 out of 100 developers would prefer to work with an XML-based format over a binary format, if only for the sake of simplicity, and my comment here illustrates one of those reasons.
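To make the parsing point a little more concrete, here is a minimal sketch in Python. It is not taken from Open XML or any other real format: the element names, the attribute, and the length-prefixed binary layout are all assumptions I made up for illustration. The XML version can be read with a general-purpose parser, while the binary version needs a hand-written reader that only works if you already know the layout.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical XML fragment; the element and attribute names are invented.
xml_doc = '<doc><title>Report</title><para style="body">Hello</para></doc>'

# A generic XML parser is enough to pull the values out by name.
root = ET.fromstring(xml_doc)
title = root.findtext("title")
para_style = root.find("para").get("style")


def pack_field(text):
    """Encode one field in an invented layout: 2-byte little-endian
    length prefix followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<H", len(data)) + data


# The same three values in the invented binary layout.
binary_doc = pack_field("Report") + pack_field("body") + pack_field("Hello")


def read_fields(blob):
    """Walk the blob field by field. Nothing in the bytes says what each
    field means; the reader has to know the order in advance."""
    fields = []
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from("<H", blob, offset)
        offset += 2
        fields.append(blob[offset:offset + length].decode("utf-8"))
        offset += length
    return fields


fields = read_fields(binary_doc)
```

In the XML case the names travel with the data; in the binary case the meaning of `fields[1]` lives only in the documentation (or in someone's head).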
The deck goes on to state that Open XML allows “arbitrary binary blobs of data”, citing this as a “security hole” (this isn’t really anything new; it has been rehashed on several forums). I’ll just take a guess and say the presenter probably missed a few important references about ODF (search for “Binary” in the text), or within the ODF spec itself… Section 9.3 of the ODF specification discusses how frames can contain “Objects represented either in the OpenDocument Format or in a Object Specific Binary Format.” Section 9.3.5 describes the ability to add “plug-ins” to documents for “a media type that is not usually handled natively by office application software.” Base64Binary is a core data type of ODF, as described in section 16.1.
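For anyone who hasn’t worked with a base64 binary type, here is a rough Python sketch of what carrying binary data inside an XML document looks like in general. The `frame` and `binary-data` element names are placeholders I chose for the example, not a faithful reproduction of ODF or Open XML markup, and the four payload bytes are arbitrary.

```python
import base64
import xml.etree.ElementTree as ET

# Arbitrary bytes standing in for any binary payload (image, printer
# settings, embedded object, and so on).
payload = bytes([0x00, 0xFF, 0x10, 0x80])

# Base64 turns the bytes into plain ASCII text that is safe inside XML.
encoded = base64.b64encode(payload).decode("ascii")

# Hypothetical wrapper elements; real formats define their own names.
xml_doc = f"<frame><binary-data>{encoded}</binary-data></frame>"

# Reading it back: parse the XML, then decode the element text.
root = ET.fromstring(xml_doc)
decoded = base64.b64decode(root.findtext("binary-data"))
```

The XML parser treats the encoded text like any other character data; what the bytes mean is entirely up to the application that decodes them, whichever format they came from.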
Of course, both Open XML and ODF allow the embedding of binary content. So it’s not clear to me why we’re picking on the binary DevMode structure when (so-called “arbitrary”) binary data is supported in both formats (and probably in every other authoring file format in widespread use today). If the implication is that ODF doesn’t allow the inclusion of “arbitrary” binary information, the implication is absurd and false. By this logic, I’d guess it’s worth asking OASIS whether we should expect binary data to be removed from a future version of the ODF spec. I know the answer to that question; it’s not even worth asking.
I haven’t heard the deck presented, nor do I plan to tear the rest of it down (might be fun for a rainy day), but it looks to me like whoever created this slide deck is attempting to criticize a fundamental purpose of XML. Or maybe this is a criticism of the entire list of XML-based format specifications. Nothing about this criticism is specific to Open XML… it is an indictment of XML and document formats.
It seems odd to pick a fight with yourself (… very Fight Club-ish… “I am Jack’s Self-deprecating Argument”…).
The discussion about parsing XML formats vs. binary formats is equally applicable to Open XML, ODF, UOF, CDF, or (pick your XML-based format of the day). These slides contribute nothing to the XML formats discussion other than confusion. Part of the reason that the XML Formats debate exists is that (I think) we at least agree that XML offers us better opportunities for document format management than a binary format would… but according to their point of view, I seem to be mistaken on that point. I must also be seeing things, because when I read the ODF spec, I see a lot of “arbitrary” binary data types in there too… obviously I’ve missed something.
Silly me :)