How to deal with invalid characters in SOAP responses from ASP.NET web services

ASP.NET web services use XML 1.0, which restricts the allowed character set to the following characters:

[2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
(Source: http://www.w3.org/TR/REC-xml/#charsets)

As you can see, most characters below 0x20 are not allowed in XML 1.0. This includes characters such as the vertical tab (0x0B), which is used fairly frequently.
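
The Char production above comes down to a simple range check on the Unicode code point. The following is a minimal C# sketch of that check. The class name Xml10Chars is hypothetical and not part of the .NET Framework. Below 0x20 it accepts only 0x9, 0xA and 0xD, so 29 of the 32 control code points are rejected.

```csharp
// Hypothetical helper: mirrors production [2] (Char) of the XML 1.0 specification.
public static class Xml10Chars
{
    // Returns true if the Unicode code point may appear in an XML 1.0 document.
    public static bool IsAllowed(int codePoint)
    {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)      // stops before the surrogate block D800-DFFF
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)    // excludes FFFE and FFFF
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```

For example, Xml10Chars.IsAllowed(0x0B) is false for the vertical tab, while Xml10Chars.IsAllowed(0x09) is true for the horizontal tab.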

For backward-compatibility reasons, .NET web services do not support XML 1.1, which would allow these control characters, as explained in the following article:

http://msdn.microsoft.com/en-us/xml/bb291067.aspx
W3C Recommendations NOT Supported at This Time

XML 1.1 – Microsoft has deliberately chosen not to support the XML 1.1 Recommendation unless there is significant customer demand.

Usually the XML 1.0 restriction causes no harm, except when the XML response sent back to the client contains one of the forbidden characters, such as the vertical tab.

An interesting detail is that the web service stub (server) routines implemented in the .NET Framework do not check for invalid characters when encoding the XML response. They encode them as numeric character references, such as &#xB; for the vertical tab. The problem occurs in the web service proxy (client) routines, which raise an exception when the response contains a character reference that is not allowed in XML 1.0:

There is an error in XML document (8, 1314).
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
   …

So the web service proxy and the stub handle the invalid characters inconsistently: the stub writes them, but the proxy refuses to read them.
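
The mismatch can be reproduced outside of ASP.NET with plain System.Xml classes. This sketch assumes that the legacy XmlTextWriter behaves like the writer used on the service side (it does not check characters and emits a hexadecimal character reference), while XmlReader.Create with default settings (CheckCharacters = true) stands in for the strict reader on the proxy side. The class and method names are hypothetical, and the actual proxy code path is not reproduced here.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper class illustrating the writer/reader mismatch.
public static class CharRefRoundTrip
{
    // Writes an element whose text contains a vertical tab (0x0B) using the
    // legacy XmlTextWriter, which does not reject the character but emits
    // a numeric character reference for it.
    public static string WriteWithVerticalTab()
    {
        var output = new StringWriter();
        var writer = new XmlTextWriter(output);
        writer.WriteElementString("value", "a\vb");
        writer.Flush();
        return output.ToString();
    }

    // Returns true if a reader created with default settings
    // (CheckCharacters = true) throws while parsing the document.
    public static bool ReaderRejects(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

The string returned by WriteWithVerticalTab() contains the reference &#xB;, and passing it to ReaderRejects returns true because the reader throws an XmlException for the 0x0B reference.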

Ideally, the ASP.NET web service classes would never send invalid numeric character references to the client, but this is not implemented in the .NET Framework. Because the client also cannot get hold of the content before the exception is raised, the issue has to be fixed in the application logic of the web service itself.

This means each web service method would have to replace the invalid characters before sending the XML content to the client.

The catch is that the XML response is usually created by the automatic XML serialization of managed objects, so fixing the issue inside each web service method is not trivial either.
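
To illustrate why the per-method approach is tedious: every string that ends up in the response would have to pass through a helper like the following before serialization. This is a minimal sketch that assumes .NET Framework 4.0 or later, where XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair are available; the class and method names are hypothetical.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper; XmlConvert.IsXmlChar / IsXmlSurrogatePair need .NET Framework 4.0+.
public static class XmlTextSanitizer
{
    // Replaces every character that is not allowed in XML 1.0 with a space,
    // keeping valid surrogate pairs (code points U+10000 to U+10FFFF) intact.
    public static string ReplaceInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var result = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // valid non-surrogate character
                result.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length
                     && XmlConvert.IsXmlSurrogatePair(text[i + 1], c)) // argument order: (lowChar, highChar)
            {
                result.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            else
            {
                // control character, lone surrogate, U+FFFE or U+FFFF
                result.Append(' ');
            }
        }
        return result.ToString();
    }
}
```

Every string property of every returned object, including nested objects, would need this treatment, which is what makes the approach hard to combine with the automatic serialization.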

This issue also affects the standard SharePoint web services, which provide access to SharePoint content.

To overcome this problem, the invalid characters have to be removed from the XML response (e.g. replaced with a space) after the response has been serialized to XML and before it is sent over the wire to the caller of the web service.

The good news is that ASP.NET has a way to achieve this: SoapExtensions. A SoapExtension can inspect and modify the SOAP message sent from client to server and vice versa.

Below is a SoapExtension that uses a regular expression to replace the numeric character references of the invalid control characters with a reference to the space character (&#32;, i.e. 0x20):

///
///  This source code is freeware and is provided on an “as is” basis without warranties of any kind, 
///  whether express or implied, including without limitation warranties that the code is free of defect, 
///  fit for a particular purpose or non-infringing.  The entire risk as to the quality and performance of 
///  the code is with the end user.
///

using System;
using System.IO;
using System.Web;
using System.Web.Services;
using System.Web.Services.Protocols;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace StefanG.SoapExtensions
{
    public class XmlCleanupSoapExtension : SoapExtension
    {
        private Regex replaceRegEx;

        private Stream oldStream;
        private Stream newStream;

        // to modify the content we redirect the stream to a memory stream to allow 
        // easy consumption and modification
        public override Stream ChainStream(Stream stream)
        {
            // keep track of the original stream
            oldStream = stream;

            // create a new memory stream and configure it as the stream object to use as input and output of the webservice
            newStream = new MemoryStream();
            return newStream;
        }

        public override object GetInitializer(LogicalMethodInfo methodInfo, SoapExtensionAttribute attribute)
        {
            // the extension is registered in web.config and applies to all methods;
            // it is not intended to be used via an attribute on individual methods
            throw new NotImplementedException("The method or operation is not implemented.");
        }

        public override object GetInitializer(Type serviceType)
        {
            // create a compiled instance of the Regular Expression for the chars we would like to replace
            // add all code points between 0 and 31 excluding the allowed white spaces (9=TAB, 10=LF, 13=CR),
            // in decimal (&#11;) as well as hexadecimal (&#xb;) notation
            StringBuilder RegExp = new StringBuilder("&#(0|x0");
            for (int i = 1; i <= 31; i++)
            {
                // ignore allowed white spaces
                if (i == 9 || i == 10 || i == 13) continue;

                // add decimal representation of the control character
                RegExp.Append("|");
                RegExp.Append(i.ToString());

                // add hex representation as well
                RegExp.Append("|x");
                RegExp.Append(i.ToString("x"));
            }
            RegExp.Append(");");
            string strRegExp = RegExp.ToString();

            // create the compiled regular expression (IgnoreCase also matches &#xB; and &#XB;)
            Regex regEx = new Regex(strRegExp, RegexOptions.Compiled | RegexOptions.IgnoreCase);

            // ASP.NET caches this return value and passes it to Initialize() for every instance of this class
            return regEx;

        public override void Initialize(object initializer)
        {
            // Initialize retrieves the compiled regular expression returned by GetInitializer
            replaceRegEx = initializer as Regex;
        }

        public override void ProcessMessage(SoapMessage message)
        {
            if (message.Stage == SoapMessageStage.AfterSerialize)
            {
                // process the response sent back to the client, i.e. ensure it is XML 1.0 compliant
                ProcessOutput(message);
            }
            if (message.Stage == SoapMessageStage.BeforeDeserialize)
            {
                // just copy the XML Soap message from the incoming stream to the outgoing
                ProcessInput(message);
            }
        }

        public void ProcessInput(SoapMessage message)
        {
            // no manipulation required on input data
            // copy content from http stream to memory stream to make it available to the web service

            TextReader reader = new StreamReader(oldStream);
            TextWriter writer = new StreamWriter(newStream);
            writer.WriteLine(reader.ReadToEnd());
            writer.Flush();

            // set position back to the beginning to ensure that the web service reads the content we just copied
            newStream.Position = 0;
        }

        public void ProcessOutput(SoapMessage message)
        {
            // rewind stream to ensure that we read from the beginning
            newStream.Position = 0;

            // copy the content of the stream into a memory buffer
            byte[] buffer = (newStream as MemoryStream).ToArray();

            // shortcut if stream is empty to avoid exception later
            if (buffer.Length == 0) return;

            // convert buffer to string to allow easy string manipulation
            string content = Encoding.UTF8.GetString(buffer);

            // replace invalid XML character references with a reference to the space character
            content = replaceRegEx.Replace(content, "&#32;");

            // convert back to byte buffer
            buffer = Encoding.UTF8.GetBytes(content);

            // stream byte buffer to the client app
            oldStream.Write(buffer, 0, buffer.Length);
        }

    }
}

The above code should be compiled into a C# class library project and signed with a strong name so that the DLL can be placed into the GAC (Global Assembly Cache).

Afterwards, the SoapExtension can be registered in the web.config of the affected web service. For SharePoint web services, this is the web.config in the following directory:

C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\ISAPI\

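The following entry needs to be added to the web.config (the PublicKeyToken shown matches the author's signing key; if you sign the assembly with your own key, use the token of that key instead):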

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<configuration>
    <system.web>
        <webServices>
            <soapExtensionTypes>
                <add type="StefanG.SoapExtensions.XmlCleanupSoapExtension, XmlCleanupSoapExtension, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0e15300fe8a7b210" priority="1" group="0" />
            </soapExtensionTypes>
        </webServices>
    </system.web>
</configuration>

After that, all responses from web service methods in the affected web application are cleaned up automatically.
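
One limitation of the regular expression built in GetInitializer is that it only recognizes the exact spellings &#N; and &#xN; without leading zeros, so equivalent references such as &#x0B; or &#011; would pass through unchanged, and references to other forbidden code points (for example &#xFFFE;) are not covered. One possible alternative, a swapped-in technique rather than the code above, is to match every numeric character reference and let a MatchEvaluator decide per match whether the referenced code point is allowed in XML 1.0. The class name CharRefCleaner is hypothetical; its Clean method could take the place of the replaceRegEx.Replace call in ProcessOutput.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical alternative to the pattern built in GetInitializer.
public static class CharRefCleaner
{
    // Matches any decimal (&#11;, &#011;) or hexadecimal (&#xB;, &#x0B;) character reference.
    private static readonly Regex CharRef =
        new Regex("&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));", RegexOptions.Compiled);

    // Replaces references to code points outside the XML 1.0 Char production
    // (and references whose value does not fit into an int) with &#32;.
    public static string Clean(string xml)
    {
        return CharRef.Replace(xml, match =>
        {
            int codePoint;
            bool parsed = match.Groups[1].Success
                ? int.TryParse(match.Groups[1].Value, NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture, out codePoint)
                : int.TryParse(match.Groups[2].Value, NumberStyles.None,
                               CultureInfo.InvariantCulture, out codePoint);
            return parsed && IsXmlChar(codePoint) ? match.Value : "&#32;";
        });
    }

    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static bool IsXmlChar(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```

Because the decision is based on the numeric value rather than on the spelling, this variant also catches references to surrogate code points and to FFFE/FFFF. The price is one delegate call per character reference in the response, which is usually negligible compared to the network round trip.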

The complete source code can also be downloaded from here: http://code.msdn.microsoft.com/XmlCleanupSoapExtens

 

16 Comments


  1. I'm wondering why you kept the default throw exception for the first method?

    If something calls it, wouldn't it be better to just call the base?

  2. Hi BinaryJam,

    the "base" class is an abstract class and the GetInitializer method is not implemented there.

    So you have to implement the method yourself.

    Cheers,

    Stefan

  3. Does this work for .NET web services that were created using VS’s wsdl.exe app? I’ve compiled a third party’s web service into a webservice.cs class in this way, and need some way to clean up the returned xml.

    Sorry if this question belongs elsewhere – I’m very new to implementing web services, so maybe there’s an even simpler answer – thanks for your help!

  4. Hi Nat,

    from my understanding this should work for any type of ASP.NET web service.

    Cheers,

    Stefan

  5. Instead of building the RegEx in GetInitializer, wouldn't a private shared variable make more sense?

    private static readonly Regex m_RegEx = new Regex("&#(0|1|x1|2|x2|3|x3|4|x4|5|x5|6|x6|7|x7|8|x8|11|xb|12|xc|14|xe|15|xf|16|x10|17|x11|18|x12|19|x13|20|x14|21|x15|22|x16|23|x17|24|x18|25|x19|26|x1a|27|x1b|28|x1c|29|x1d|30|x1e|31|x1f);", RegexOptions.Compiled | RegexOptions.IgnoreCase);

  6. Fantastic! I used this to fix my issue: the SharePoint web service method GetListItems failed with the error "'_', hexadecimal value 0x0B, is an invalid character." I modified the following method, as my invalid characters were coming back from SharePoint.

    public void ProcessInput(SoapMessage message)
    {
        // copy content from http stream to memory stream to make it available to the web service

        TextReader reader = new StreamReader(oldStream);
        TextWriter writer = new StreamWriter(newStream);
        string outputStr = reader.ReadToEnd();
        outputStr = replaceRegEx.Replace(outputStr, " ");
        writer.WriteLine(outputStr);
        writer.Flush();

        // set position back to the beginning to ensure that the web service reads the content we just copied
        newStream.Position = 0;
    }

  7. So when SharePoint sends a vertical tab, how can we stop it on the client side? SharePoint web services are not something that can easily be modified to fix what is clearly a bug.

  8. Hi Dale,
    you would need to read the stream in raw format as otherwise the code generated by .NET will run into the exception.
    Cheers,
    Stefan

  9. Hi Stefan GoBner
    I am getting Exception :Response is not well-formed XML. Inner ExceptionMsg :'', hexadecimal value 0x1F, is an invalid character. Line 1, position 1.

    How can I resolve this issue with your SoapExtension in a console application?

  10. Hi Tejas,
    this extension fixes the issue on the server, before the content is sent over the wire.
    This solution cannot be used to fix the issue on the receiver side.
    Cheers,
    Stefan

  11. Hi Stefan GoBner,
    Thanks for your reply,
    Is there any way to use this type of SOAP extension on the receiver side to resolve the issue?

  12. Hi Tejas,
    you cannot use this method on the receiving side, as you cannot modify the stream content before the exception occurs.
    Cheers,
    Stefan

  13. It is totally possible to implement a SOAP extension that works client side (receiving side) to clean up the response from the server. I implemented it myself, and it works great.
    It seems like Stefan was saying this is not possible, but it is absolutely possible.

  14. Hi Stefan,
    I tried your solution and I think I completed all the steps. But now I get a new error:
    “Client found response content type of '', but expected 'text/xml'. The request failed with an empty response.”
    When I delete the following lines from the web.service:
    the error doesn't appear (but then the original problem you tried to solve comes back...)
    Any idea what I can do?
    Thanks a lot for your help!!
    Nathan

    1. I meant that the lines you added to the ws cause the error.

    2. Hi Nathan,
      your comment seems to be incomplete.
      Which lines did you remove?
      The error message indicates that an empty response was received, which means that everything was removed from the response.
      My code does not do this so I think there is something different in your code.
      Cheers,
      Stefan
