What is "Custom XML?" … and the impact of the i4i judgment on Word


I recall saying recently that “this is my last post for 2009.” Whoops… I don’t think I was anticipating this. I watched with interest yesterday the coverage and reaction to the i4i judgment. I am not keen to share my own thoughts about the case here, but I would like to offer clarity around the specific area of Word in question, and suggestions for what people can do about it if they are using that functionality today. There is much confusion about the part of Word that is actually affected.

First, some things to understand:

We do not anticipate any interruption in the availability of Word or Office 2007. Additionally, this ruling has no impact on the scheduled availability of Office 2010, which is planned for the first half of CY2010.


Current users are not affected. If you are using the custom XML tags in Word 2003 or 2007 (these show up in Word as Pink Tags around tagged content), you are free to continue doing so with the products you have already purchased.


Open XML standards (all ECMA and ISO versions) are not affected. Even if Word’s specific implementation of custom XML support does infringe the i4i patent (which Microsoft does not believe to be the case), i4i has never claimed that its patent is essential to the OXML standard.


Content Controls of Word (screen shot below) are not affected. In Word 2007 and Word 2010, this is a common method of binding document content to data stored in a custom-defined schema within a document.


image


The functionality that is in question is indicated by the screen shot below. Custom XML Tags in Word documents are visible in the Word user interface as Pink Tags surrounding tagged content in a document.


image


What you can do if you have questions about your solutions that use Custom XML Tags:

First, download the Office 2010 beta and test your solution. If your solution works in Office 2010, it does not depend on the functionality in question. If your solution does utilize Custom XML Tags, consider re-implementing the solution using Content Controls. Detailed guidance on the use of Content Controls in Word 2007 can be found here. Also note the Word Content Controls Toolkit on CodePlex. The Open XML SDK, of course, is quite useful for getting people up to speed on developing solutions for Word and Open XML.


Update: Additional Detail


In response to several inquiries on the topic, I have included additional text describing the feature area that is affected vs. what is not affected, including links to KB articles which illustrate the capabilities in more detail. 


Affected:


Word 2003 and Word 2007 distributed prior to 1/11/2010 can read files that contain XML markup (ref: “Understanding Word’s XML Markup [Word 2003 XML Reference]”, http://msdn.microsoft.com/en-us/library/aa212889(office.11).aspx). When custom XML markup is present, Word delineates this content in a Word document, which allows it to later save the file to .DOCX, .DOCM, or .XML with that content marked up.


The Word 2007 product distributed by Microsoft after 1/10/2010 will no longer read the Custom XML markup contained within .DOCX, .DOCM, or .XML files.  These files will continue to open, but the Custom XML markup tags will be removed. Custom XML markup stored within .DOC files will not be affected by these changes.  Word 2003 and existing installations of Word 2007 will not be affected by this change.


Not Affected:


Word 2007 also added features allowing Content Controls to map to XML data stored in a DOCX or DOCM file (ref: “Mapping Word 2007 Content Controls to Custom XML Using the XML Mapping Object”, http://msdn.microsoft.com/en-us/library/bb510135.aspx). Content Controls and XML data stored within DOCX or DOCM files will not be affected by this change. 


 


 

Comments (56)

  1. Anonymous says:

    @Daniel,

    The Word object model method Range.InsertXML is a way of loading XML content into an existing Word document via the OM.  In the patched version of Word, the Range.InsertXML method will not read custom XML markup anymore.   Any custom markup in the XML you pass to Range.InsertXML is removed.

  2. Anonymous says:

    @Ernst,

    We don’t plan to publish the patch via Microsoft Update.

  3. Anonymous says:

    Shiv,

    Yes,  you can continue using your solution based on content controls mapped to custom XML. And you can expect to continue seeing the content control mapping inside the Word document.  Content controls are not affected by this.

  4. Anonymous says:

    @Brian,

    Word 2003 does not support content controls.   But as long as your customers who are using Word 2003 already have their copies (which they presumably purchased long before January 11, 2010) they can continue to use those copies and they don’t need to apply the patch.

  5. Anonymous says:

    @Barry,

    You will not lose your .docx and .docm files after January 11.   You will still be able to open them.

  6. Anonymous says:

    Ankish, I hope my addition to the post will answer your questions.

  7. Anonymous says:

    Thanks Gray for this post. The last thing we want (and our clients/partners need) is more confusion, and you do a great job of defusing it.

    Francis Dion

    CEO, Xpertdoc Technologies Inc.

    http://francisdion.blogs.com/software_process/

  8. Anonymous says:

    Volume licensees have been instructed to install the patch (already made available) for new licenses purchased after January 10, 2010.   Volume License customers can find instructions on the Microsoft Volume Licensing Service Center download pages. In addition, an updated version of Office 2003 Professional will be available to MSDN subscribers  on MSDN.   The updated version will no longer read the Custom XML markup contained within .XML files.     Office 2003 Standard was not affected because that version of Office 2003 did not have support for Custom XML markup to begin with.

  9. Anonymous says:

    @Kevin Forbes,

    The patch described here http://support.microsoft.com/kb/978951 can be applied to all language versions of Word 2007.   All versions of Word sold by Microsoft in the US  after Jan 10 (not just English) will remove custom XML markup when opening files.

  10. Anonymous says:

    @Andy Burns,

    The Open XML format (including document properties) is not affected. Typically these would be bound to a document using content controls, which are also not affected. So your metadata solutions should be in the clear.

  11. Anonymous says:

    @Brian again,

    Yes, your customers can process the XML parts inside the zip package which is a .docx file.   See Eric White’s guest post on my blog for some links to more resources.

  12. Anonymous says:

    @Rick R

    Hi Rick, thank you for your comment. First, let me just say that it’s great to see PHP development with Open XML. The best place to get help with this question is http://www.openxmldeveloper.org. There you will find a very large community of developers working with the formats, who can give you pointers on how to implement custom-defined schema as part of your solutions.

  13. Anonymous says:

    @Ron,

    I am unaware of any tool that will scan documents for the existence of CustomXML today.

    As we near the ship date for the patch, we will make more information available about its deployment.

  14. Anonymous says:

    Hi Jeff,

    I wonder if you have also considered using InfoPath as an alternative? Among the extensive capabilities InfoPath has for creating and managing arbitrary XML fragments, it includes a robust template version management capability set that might be ideal for helping with this type of solution. It is hard for me to offer a lot more guidance than that in this forum; feel free to use the email button on the blog and I can point you at more specific resources on Open XML Developer and MSDN for the use of content controls and for InfoPath.

  15. Anonymous says:

    Barry,

    Updated versions of Word can still read and write DOCX and DOCM (as well as all other formats previously supported.) We have not reverted to any prior formats.

  16. Anonymous says:

    Hi Andy,

    I am not in a position to evaluate your specific solution, but I can offer some guidance in areas of technology that may benefit the scenario you are describing.

    Content Controls can be used to indicate placeholders for editing in a document, and those can be bound to XML schemas. I would also encourage you to look at the new co-authoring capability of Word 2010, which allows sections or fragments of documents to be edited by multiple users (while locking down the remainder of the document). Lastly, I would suggest Word Services of SharePoint Server 2010 along with the Open XML SDK for server-side document automation.

    http://blogs.msdn.com/microsoft_office_word/archive/2009/09/09/co-authoring-in-word-2010.aspx

    http://blogs.msdn.com/microsoft_office_word/archive/2007/01/16/mapping-magic.aspx

    http://msdn.microsoft.com/en-us/library/bb448854(office.14).aspx

    http://blogs.msdn.com/microsoft_office_word/archive/2009/12/16/Word-Automation-Services_3A00_-What-It-Does.aspx

  17. Anonymous says:

    @Yuhong Bao,

    Unfortunately there is no reliable way to tell from looking at the file dates or file version numbers.   The easiest way to check will be to open a very simple xml file containing customXML markup,  as in the screen shot above.   If the “pink tags” do not appear then that version of Word does not have the ability to load custom XML markup.

  18. Anonymous says:

    Hi Jeff,

    I would recommend starting with this post: http://blogs.technet.com/gray_knowlton/archive/2010/01/15/associating-data-with-content-controls.aspx

    which discusses some of the aspects of binding custom XML to content controls. This might be a good way to approach building templates.

    Another great resource for Content Controls in Word is http://www.openxmldeveloper.org

  19. Anonymous says:

    Ryan,

    The patch only applies to newly sold licenses. Existing licenses of Office do not require the patch. Office 2003 is no longer for sale. The KB article for Office 2003 and its patch can be found here:

    http://support.microsoft.com/kb/979045/

    Also please note: .DOCX, .DOCM and .XML files are the file types that are affected. .DOC, the default format for 2003 is not affected.

  20. Anonymous says:

    What does ruling mean for custom Ribbon development in MS Word – I distribute a Ribbon to my customers – Ribbons have an XML component.  

    From the sounds of what I have read, it looks like custom Ribbon add-ins will be unaffected, and the impact will mostly be on large doc management projects.

    Could this be confirmed please?

    Thanks

  21. Anonymous says:

    Hi Rick,

    I don’t really want to talk too much about the case itself, only the outcomes. Right now my focus is helping people understand the impact (non impact) to their solutions.

  22. Anonymous says:

    @Robert te Kaat,

    Yes, you will still be able to attach a schema to .DOC document files and insert custom XML tags into .doc documents using the updated version of Word 2007. Note that you will also still be able to save these files to XML-based file formats such as .docx and the tags will be saved. However, if you open that .docx file again in the updated version of Word 2007, any custom XML markup will be removed at that point.

  23. Anonymous says:

    To all commenters, please accept my apologies for the quietness. I have been out of the office and am now back; I will attempt to answer your questions as soon as possible.

  24. Anonymous says:

    @"Brian,"

    I think it would be best to focus on migrating solutions to content controls and Open XML. If you have solutions that use the CustomXML functionality today, this is a good solution you can move toward as a replacement.

  25. Anonymous says:

    @Ty,

    Correct, the .DOC format is not affected, so using the .DOC format is one option to keep your solution going.

  26. Anonymous says:

    @Suzy Davis,

    You are correct, Ribbon and XML-based extensibility is not affected.

  27. Shiv Khare says:

    does this mean that we can still use our solution around content control mapped to custom XML within the document. Only change will be we cannot visualize the mapping inside the Word application????

  28. Ankush says:

    Hi,

    I have reviewed the blog and still have few questions:

    a. For ex: if my document contains an attached XML schema, will that be affected? if yes, then what uses will be affected?

    b. My solution which is based on content control and Custom XML  & mapping will continue to work?

    c. will that affect the Word 2003 XML Schema feature as well? (using Office Compatibility pack)

  29. James Toney says:

    Great blog entry – you’re the first to address the implications of the i4i ruling on vsto development, and I was wondering the same thing.

    I did a quick test on office 2010 beta release, and was able to bind custom xml data to the content controls with no problems.  So far so good…

  30. Ty Anderson says:

    Hi Gray,

    From what I gather, the .DOC format with attached Custom XML is unaffected. So the thing to do is use .DOC format for this type of solution. Correct?

  31. Robert te Kaat says:

    “Custom XML markup stored within .DOC files will not be affected by these changes.”

    – Does this mean I can still use 2007/2010-versions of Word to insert custom XML tags into .doc documents?

    (My open source document generater depends heavily on custom XML tags: http://flexdoc.codeplex.com, so I’m very very disappointed this feature is being dropped!)

  32. Yuhong Bao says:

    Thank you for explaining it. Now I have another question: How do you tell if the Custom XML capability has been removed from the version of Word installed? File dates? File version numbers? Error messages?

  33. Andy Burns says:

    Thanks Gray, that post was just the clarification I was looking for! It probably doesn’t help that document properties are stored in ‘Custom.xml’ in a docx file… :)

  34. Lauro Colasanti says:

    Hi Gray,

            first of all thanks for your post which will save me a lot of wasted time.

            I have an old project (template+VBA) written in Word XP in which I programmatically swap portions of the document and attach to them attributes hidden from the user. Of course I couldn’t use XML, so I achieved my goals using Hidden Text and Paragraph Styles. My solution was not very sound, so I’m in the process of rewriting it for Word 2007 and 2010 using Custom XML Markup.

           I have written MyXMLSchema.xsd with elements and attributes and tag programmatically portions of the document.

           But it seems that I’m exactly in the situation you are describing. When I open a tagged document in Word 2010 Beta the Custom XML (pink tags) doesn’t show. I’m still able to tag a document in Word 2010 Beta, but after I save the file and reopen it nothing is there and I have to go on retagging everything, although the Custom XML is still there in the document.xml of the Package.

          So I think I have to use, as you suggest, Content Controls, which is probably even better because I can avoid the user deleting the tags/controls.

         But I have some problems:

    1)  How could I attach Attributes to Content Controls?

    2)  I don’t need (or want) to map my Rich Text Controls to a Custom XML Part, and I’m afraid of making my files needlessly bigger.

         BTW, Do you think I’m on the right track or could I use a different approach?

        Thanks in advance, Lauro

  35. Rick Jelliffe says:

    I don’t understand this at all.  The i4i patent relates to a separation of metacodes and content doesn’t it? Where is the separation, if it is the pink tags only?  

    If this relates to concrete internal implementation issues only, how did i4i know without access to the source code? (Was the source code examined in the trial?)

    And what kind of addressing was actually used?  A kind of XPath? A kind of tumbler? A kind of ID? A kind of numeric offset?

  36. Ron McMahon says:

    Two questions:

    1) My organization has over 10,000 employees with more than 25,000,000 office files in active, spinning storage.  Does Microsoft have any tool that we can use for discovering if any of these Office files contain the impacted content?

    2) We (of course) have a pre 1/11/2010 version of Word installed now.  Will the removal of this functionality come in some form of a service pack, security patch or Office Update, or will this change in Word’s behavior only happen if we explicitly re-install Word from a post 1/10/2010 copy of Word?

  37. Rick R. says:

    I’m currently working on a system (in PHP) to manipulate Word documents as part of a DMS.

    This system replaces marked text with the Content Controls. However, Content Controls do not offer functionality to do similar things with tables.

    As a semi-nasty workaround I’m currently using the w:customXml tags (and w:element attribute) to tag server generated tables so I can keep them updated when a user has re-uploaded the document.

    I tried the new Office 2010 beta and found out that it removes these tags (as expected) when a user saves the document so I can no longer tag tables like this.

    So my question;

    Is there another tag like the w:customXml tags which I can (ab)use to tag my generated tables?

    I just need a way to tag a table which is preserved after a document is saved, I don’t particulary care if the solution is ‘clean’ as long as it’s likely to work with future versions of Word.

  38. Andy says:

    From my initial tests, I think the impact will be very much more serious than you seem to be implying.

    My application processes Word XML documents (and fragments thereof) on the server, and puts hidden XML tags inside the document, then allows the user to edit the document and return it to the server for further processing.

    From my initial investigations of the KB974631 patch, it seems that if the user simply opens and saves the document from Word, all extra XML information is deleted from the document. Surely this basically stops all applications that use custom XML tags dead in their tracks?

  39. Brian says:

    Gray,

    Thanks for fielding these questions.  A few for you:

    (1)  Given a valid Word .xml file with custom XML tags, when opened by a revised Word 2003 or 2007, it will remove the custom tags… but will it reliably leave the content within those tags?

    (2)  Given existing .xml files with custom XML tags, is there any way to have a revised Word 2003 or 2007 load up that .xml file and save out a .doc file without losing the custom XML tags?

    (3)  I understand why Microsoft must remove the existing implementation of custom XML tags… and why that means removal of the functionality FOR NOW… but will Microsoft be developing an alternative (non-infringing) implementation of that functionality in the near future?

    Thanks for any guidance you can provide,

       Brian

  40. This blog post was just what I’ve been looking for. Thank-you for taking the time to write this up in a simple and easy to understand way.

  41. Ryan says:

    Hi Gary,

    Thanks for a great post, and many great conversations inside as well.

    I Have a question on 2003 – My company has a big product relying on MS-Word 2003 Pro, and probably uses some custom XML markup.  

    We are trying to evaluate the impact, and have seen some notes on this at http://support.microsoft.com/kb/978951 ,

    this KB states that there is a new update for Word 2003, but it is probably not changing anything.

    Do you know if indeed there are impacts in Office 2003, and where this version can be found for testing ?  (the article promises it’ll be on the MSDN, but couldn’t find it).

  42. Daniel Fisherman says:

    Thanks, Gray.  Given that custom xml in .doc files should not be affected in any way by the ruling, it is perplexing that I cannot insert custom tags into a .doc file. I have tried to do this both using the XML Structure Task Pane, and programmatically, through calls to Range.InsertXML, where the xml to be inserted contains custom xml.

    a) Can you verify that, indeed, custom xml tags cannot be inserted into .doc files?

    b) If this is the behavior for you, too, is there an explanation as to why this is the case?  Could this be an oversight by the Word team?  Might Microsoft be amenable to removing these restrictions?

    Thanks for any help on the issue, as my company’s product uses these codes extensively in our application.

    Dan

  43. Brian says:

    Gary said:

    “I think it would be best to focus on migrating solutions to content controls and Open XML. If you have solutions that use the CustomXML functionality today, this is a good solution you can move toward as a replacement.”

    Gary,

    Does (or will) Word 2003 support content controls in Open XML documents?  (I can’t force my customers to move to Office 2007.)

    The ability to export .xml files was a huge feature for my software.  For those worried about getting locked into a proprietary application, I would just tell them that we can export into .xml files that Word can open and any XML app can open… and all the structured data is tagged in the XML so you can import it into anything.

    Switching to Open XML is okay… I can still say the same thing sorta… its a bit more complicated as the XML is nested in a ZIP file.

    Switching from Custom XML to “content controls” is a bigger concern.  What do content controls look like in the .ZIP file?  Can my customers get to them by processing the XML in the XML file inside the .ZIP file?

     Thanks, Brian

  44. Kevin Forbes says:

    Hi Gray,

    Thank you for this very helpful post.

    Do you know if the post 01/11/2010 update for Word 2007 applies to all international versions of Word, or just in the States?

    Thanks,

    Kevin

  45. Ernst Scheithauer says:

    Hi Gray,

    we are based in Europe and develop a solution for a multinational company heavily relying on CustomXML tags. We use them because ContentControls currently do not support some use cases we need.

    Now here are some more questions not answered yet:

    – The court ruling only applies to the US. Is Microsoft pulling the functionality only from all versions or US/English version only?

    – Are Word 2010 and further version going to support CustomXML tags?

    – Is the corresponding patch “Update for Microsoft Office Word 2007 (KB974631)” going to be published via Windows Update? Is it going to be an automatic update?

    Thanks in advance,

    Ernst

  46. Brian says:

    Sorry, my fingers keep typing "Gary" instead of "Gray"…

    feel free to type "Barin" instead of "Brian".  ;^)

  47. Barry says:

    hey Gray

    After finding out about this news. I was definitely surprised.

    Even though I purchased word 2007 from one of the stores out of town via phone around 2008.

    I am concerned about the files I had saved in docx and docm. I was wondering whether I might lose my files after January 11, and whether the files that were saved in docx and docm won’t open after January 11.

    I have so many files saved, that I had them stored in two 2GB USB drives.

    I mainly use my ms word to format all of my writing, whether it be screenplays and fiction writing.

    Unfortunately, it is too many to save my writings in PDF and RTF.

    A friend once stated that his wife has word 97 on her vista laptop. For some unknown reason, its been updating.

  48. Barry says:

    Oops,

    I was meant, doc and docx, something tells me that we would all find out.

    word 97 can’t open word 2003.

    Man I sure hope I can still open all of my files saved in docx and still save my files in that format post january 11

    I had a feeling that it would be an interesting year, and lo and behold, I was right.

  49. Brian says:

    Gray,

    Per Eric White’s guest post on your blog, you use content controls when you want to give semantic meaning to groups of paragraphs.  Although that’s one use of Custom XML, that doesn’t cover the common use:  identifying single words or numbers or phrases within a paragraph that have particular semantic meaning.

    Our customers are companies… companies that buy new copies of Office and have downgrade rights so that they can keep everybody on the same version… and many of those companies are sticking to Word 2003.  So, if a new company buys an additional license of Office and downgrades to Word 2003, can they legally downgrade to the version everybody else in the company has (supporting Custom XML)??  Or will new hires have to use a version sans Custom XML even though all the existing employees will have Custom XML?

    Thanks for all the answers.

  50. David Austin says:

    For Office 2003, what will happen to volume license/open license downgrades? Will the original software, or a patched version of office 2003 be available on partners/technet/MSDN?

  51. Jeff says:

    Currently my solution used custom xml schema (pink tags) that users can map to word 2003 and create templates as xml files.

    They normally create master templates and store them on their machine. Whenever there is a change in the schema they just add additional tags to their master templates.

    Now i want to upgrade the application to use word 2007. What is the best way to create templates in word 2007. If i map content controls to custom xml will it work the same way if i change the schema (add or delete the node)?

    Once the word 2007 template is created i want to fill the data using openxml sdk by using linq.

    Any link that has entire process will help me a lot.

  52. Jeff says:

    I appreciate your reply. Before i dig into it can i know if users can update the latest schema with changes that i expose from my web application without losing their changes?

    Due to the nature of my application the templates are very dynamic by nature and every few days i have to modify my xml schema for addition of new xml tags that users use in their template.

    Then they upload the template to my application and I grab them to create documents using WordML.

    My another requirement is to insert html text and images to word template and i guess i can do that with altchunk.

  53. Keith says:

    Thanks for the post, Gray. Our IT organization pushes updates, so the pink tags just mysteriously disappeared a few days ago. My organization uses Word 2007 as an editor for a custom XML language.  One style sheet is used to transform our custom XML into WordML for editing, then a second style sheet is used to transform WordML back into our custom XML when the file is saved. When the file is opened, the style sheet is listed in the XML data views pane along with the data only view. The pink tags provided a visual indication to authors that they needed to apply the style sheet. Sadly, no more. Is there a way to force Word 2007 to automatically apply the style sheet when the file is opened that I have overlooked?

  54. Barry says:

    A lot of people were inconvenienced after Jan 11. I think I am all right.

    I am just wondering would this updated versions of word 2007 be able to open and create doc.x and doc.m

    A friend of mine stated that MS word had reverted to MS word 97. which is old.

  55. robobot says:

    It’s an incredible decision. The custom XML tag is a very powerful tool, and a lot of users use it to extract well-formed and validated XML using their XML Schemas attached in Word.

    There is no alternative.

    Content Controls  cannot be validated against XML Schemas associated to the Document. The entire mechanism of validation, attributes insertion, XML extraction of data is not possible with Content Controls.

    If you have to write a normal document and you want to TAG portions of the document with XML tags respecting your Schema then you can’t now with the Patch.

    I am in Europe and I haven’t understood whether this patched Office is only for the USA or for the Europe area too.

    Please Do you have any fresh and good news?

    I