SQL 2005 - XML parsing... illegal xml character

 

You may run into an error when using SQL 2005 and a .NET 2.0 aspx or windows client UI.

 

Perhaps you’ve encoded your xml – you are using stored procedures and proper sqlParameters and everything – yet you still run into this exception.

 

Example repro:

DECLARE @xml xml

SET @xml = N'<A>&#x2;</A>'

SELECT @xml.query('<A>{substring(.,1,1)}</A>')

XML parsing: line 1, character 1, illegal xml character

 

This is because SQL 2000’s xml parsing was based on MSXML 3.0 – and SQL 2005’s xml parsing is based on MSXML 6.0. The rules have tightened up a bit. You may be able to store and work with xml characters in your .NET UI layer (ASPX or C# Windows Client) – but once you try to INSERT them you will get this exception.

 

As of SQL 2005 only W3C Compliant characters are allowed in XML. The range of valid XML characters as defined by W3C XML Language specifications 1.0 are #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. Any characters outside of that range (as in the example above) may actually be INPUT by the user, stored as xml in your client – but will throw an exception when the INSERT command is executed in your SQL server.

 

One answer to this problem (maybe not the most elegant) is to scrub user input via the method shown below:

 

 

   

    /// <summary>

    /// IsW3CCompliant

    /// https://w3c.org

    /// The valid XML characters and character ranges (hex values) as defined by the W3C XML language specifications 1.0:

    /// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

    /// </summary>

    /// <param name="c"></param>

    /// <returns></returns>

    static private bool IsW3CCompliant(char c)

    {

      int charInt = Convert.ToInt32(c);

      return charInt == 9 || charInt == 10 || charInt == 13 || (charInt >= 32 && charInt <= 55295) || (charInt >= 57344 && charInt <= 65533) || (charInt >= 65536 && charInt <= 1114111);

    }

    /// <summary>

    /// Scrub Xml ensures that each character is W3C compliant.

    /// This is a major performance hit. . .

    /// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

    /// </summary>

    /// <param name="xmlStr"></param>

    /// <returns></returns>

    static internal string ScrubString(string xmlStr)

    {

      if (xmlStr == string.Empty || xmlStr == null)

        return xmlStr;

      string pattern = @"[^\w\.@-]";

      StringBuilder strB = new StringBuilder(xmlStr.Length);

      //-- If there are no special chars just return the original (99%)

      Regex regex = new Regex(pattern);

      if (!regex.Match(xmlStr).Success)

        return xmlStr;

      char[] charArray = xmlStr.ToCharArray();

      for (int i = 0; i < charArray.Length; i++)

      {

        if (IsW3CCompliant(charArray[i]))

        {

  strB.Append(charArray[i]);

        }

      }//for

      return strB.ToString();

    }

  /// <summary>

  /// XmlTextWriter override that scrubs the attributes

  /// </summary>

  public class MyXmlTextWriter : XmlTextWriter

  {

    /// <summary>

    /// MyXmlTextWriter scrubs attribute values. Constructor just calls base.

    /// </summary>

    /// <param name="tw"></param>

    public MyXmlTextWriter(System.IO.TextWriter tw) : base(tw) { }

    /// <summary>

    /// MyXmlTextWriter scrubs attribute values. Constructor just calls base.

    /// </summary>

    /// <param name="filename"></param>

    /// <param name="encoding"></param>

    public MyXmlTextWriter(string filename, System.Text.Encoding encoding) : base(filename, encoding) { }

    /// <summary>

    /// MyXmlTextWriter scrubs attribute values. Constructor just calls base.

    /// </summary>

    /// <param name="w"></param>

    /// <param name="encoding"></param>

    public MyXmlTextWriter(System.IO.Stream w, System.Text.Encoding encoding) : base(w, encoding) { }

    /// <summary>

    /// This class scrubs the attribute value - the sole

    /// reason for this class.

    /// </summary>

    new public void WriteAttributeString(string localName, string value)

  {

      base.WriteAttributeString(localName, XmlHelper.ScrubString(value));

    }

    /// <summary>

    /// This class scrubs the attribute value - the sole

    /// reason for this class.

    /// </summary>

    public void WriteAttributeStringRaw(string localName, string value)

    {

      base.WriteAttributeString(localName, value);

    }

  }//MyXmlTextWriter