Encapsulate This!



Heads up, gateway developers and consumers of raw MIME data from drop directories! If you do not develop gateways for Exchange 2007 or write MIME-devouring applications, I highly suggest you use this opportunity to test Internet Explorer 8’s new “Back” functionality. In IE 8, clicking that “Back” button moves you back to the page you were at before. Oh, ok, so it’s not really new functionality, but it’s important, and if you don’t develop gateways you are unlikely to find this topic of interest. For the three of you still with me, read on, because in the words of the immortal Dylan, the times, they are a-changin’.

We’re going to talk about Address Encapsulation today and a change in Service Pack 1, Rollup 7, what we are doing with it, why, and how to cope. Before we can go where we’re headed we have to know where we’ve been, so I invite you to come along with me as I set the time machine for 1996 and take a look at

The Email System That Time Forgot


The truth is that time didn’t really forget this email system.  That’s just wishful thinking, and wishing something doesn’t make it true.  For instance, I could wish that my house wasn’t infested with Koala bears, but that won’t make it so.  The email system in question is called Exchange, but it isn’t 2007, or 2k3, or 2000, or even 5.5.  No, this is Exchange 4.0, specifically version 4.0a, meaning we just applied service pack two and now our Directory service is capable of updating the schema without rebooting the box, which is really going to pan out well in the future.
This particular 4.0 box is different than the others, because:

a. It’s taking a break from crashing to transfer mail.
b. It is connected to the Internet.

This 4.0 server is home to an Internet Mail Connector (IMC), a wonderful little add-on that translates from Exchange 4.0’s internal format into something that is fit for sending out to other SMTP-speaking systems (like, for instance, MS-Mail 3.2).  And inside the IMC something is happening that is downright different.  Like many 4.0 systems, this one exists primarily to transfer mail between diverse sections of the company and this new “Internet” thing, and so everywhere in the company, from the MHS clients in the basement to the CompuServe dedicated terminal in the CEO’s office, they can now send mail beyond the limits of the company, into the Exchange server and out into the world.  If this sounds like a phenomenally bad idea you are probably on to something.

So that night the CEO decides to settle once and for all the issue of who is the better Captain of the Enterprise.  After four long hours of labor his eight page treatise is ready, and he carefully types in the name of a “Mailing list”, trektalk@trkfn.com.

“What are you doing?” asks the admin, who was summoned after the CEO repeatedly tried to find trkfn.com in his C:\DOS directory.

“Mailing the internets.”

“All of them?”

“I hope so.”

And with that the message goes out.  Out of the MHS system and into a transfer directory (conveniently named “OUT”).  Out of the Transfer directory and into the Exchange Gateway.  From the gateway to the MTA, from the MTA to the IMC, from the IMC to the – WAIT!

We can’t mail to the internet.  First off, it’s in MAPI format and no one speaks MAPI.  Secondly, the CEO doesn’t have an SMTP address.  He’s not even sure what his home address is without consulting his secretary.   The IMC sees the mail coming from MHS:ceobob{PicardForever/HUB}

Not to worry.  Through the magic of “Address Encapsulation” the IMC will provide a Miranda proxy for the  user (“If you cannot afford an internet address, one will be provided for you free of charge”).  To do this, it will start with the special keyword “IMCEA”.  For the rest of time this string of letters will be interpreted as meaning “This is an encapsulated address.”  The IMC then appends the address type and a dash, yielding:

IMCEAMHS-


Now comes the part where the encoding algorithm is invoked.  If you can locate your Tardis, you can zip forward to 2009 and read all about this algorithm on technet – http://technet.microsoft.com/en-us/library/bb430743.aspx.  (If you can find a Tardis, could you please do two things?  First, go back in time and find out where I left mine.  Second, there’s a Dalek invasion that is going to happen last Thanksgiving unless someone prevents it.) 


The algorithm works like so (for the click challenged).  Alphanumerics:  OK.  Slashes get converted to _.  Everything else gets “Plus” encoded.  That means there’s a + and the two digit hex value of the character.  Finally the Exchange server’s primary domain is appended (so that hopefully replies get back to it).  So Bob’s encapsulated address is


IMCEAMHS-ceobob+7BPicardForever_HUB+7D@contoso.com.
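For the code minded, here is a minimal Python sketch of those rules exactly as this post states them. It is not the Exchange implementation: the function name `encapsulate` and its parameters are made up for illustration, and the TechNet article remains the authority on any edge case this post glosses over.

```python
def encapsulate(address_type: str, address: str, domain: str) -> str:
    """Classic IMCEA encapsulation, following only the rules described above.

    Hypothetical helper for illustration; not an Exchange API.
    """
    encoded = []
    for ch in address:
        if ch.isascii() and ch.isalnum():
            # Alphanumerics pass through untouched.
            encoded.append(ch)
        elif ch == "/":
            # Slashes become underscores.
            encoded.append("_")
        else:
            code = ord(ch)
            if code > 0xFF:
                # Two hex digits cannot hold anything above 0xFF.
                raise ValueError(f"cannot plus-encode {ch!r} in two hex digits")
            # Everything else: '+' followed by the two digit hex value.
            encoded.append(f"+{code:02X}")
    # IMCEA + address type + dash + encoded address + primary domain.
    return f"IMCEA{address_type}-{''.join(encoded)}@{domain}"


print(encapsulate("MHS", "ceobob{PicardForever/HUB}", "contoso.com"))
# IMCEAMHS-ceobob+7BPicardForever_HUB+7D@contoso.com
```

Note that a literal underscore in the original address falls under "everything else" and comes out as +5F, which is what lets a decoder safely turn every bare _ back into a slash.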


When his message arrives it immediately generates a firestorm of replies (“A bald man in a jumpsuit?  Are you serious?”) addressed to Bob’s encapsulated address.  As these pour into the Exchange Server, it will perform reverse osmosis on the address, “De-encapsulating” it. First it looks for the super secret key, “IMCEA”. Check!


Next it looks for a dash that could function as an address type separator. There it is, right after MHS.


So the address type is MHS. Now it goes through and “Plus decodes” everything else. The resulting address is reconstituted as “MHS:ceobob{PicardForever/HUB}” and dutifully passed on.  Where once a person had to wait for WIVV boards to replicate in order to make an idiot out of themselves electronically, through Address Encapsulation it is now faster and easier.  That is what we call “progress.”
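The de-encapsulation walk just described can be sketched in a few lines of Python. This is an illustration under assumptions, not Exchange code: the name `decapsulate` is hypothetical, the domain is simply thrown away, and the first dash after IMCEA is taken to be the address type separator.

```python
import re


def decapsulate(smtp_address: str) -> str:
    """Reverse classic IMCEA encapsulation (hypothetical helper)."""
    # Drop the "@domain" part; only the local part carries the encoding.
    local_part = smtp_address.rsplit("@", 1)[0]

    # First, look for the super secret key.
    if not local_part.startswith("IMCEA"):
        raise ValueError("not an encapsulated address")

    # Next, the first dash separates the address type from the payload.
    address_type, dash, payload = local_part[len("IMCEA"):].partition("-")
    if not dash:
        raise ValueError("no address type separator found")

    # Underscores go back to slashes before plus decoding, so that an
    # encoded underscore (+5F) is not mistaken for a slash.
    payload = payload.replace("_", "/")
    payload = re.sub(
        r"\+([0-9A-Fa-f]{2})",
        lambda match: chr(int(match.group(1), 16)),
        payload,
    )
    return f"{address_type}:{payload}"


print(decapsulate("IMCEAMHS-ceobob+7BPicardForever_HUB+7D@contoso.com"))
# MHS:ceobob{PicardForever/HUB}
```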


Back to the Future


Here in 2009, address encapsulation isn’t a mystery.  It’s documented on technet.  Exchange servers are now tied to the Active Directory, they speak primarily SMTP, and the thought of there being a controversy about who is the better captain is a long distant oddity that people laugh about.


Exchange Server is widely used across the world in all kinds of businesses, all kinds of languages and many, many different address formats, all of this still using the same basic algorithm that first crept up in Exchange 4.0.  This is where things that worked fine in the world of a decade ago don’t work so well now.  You see, Exchange 4.0 was not the world’s most localized product.  It shipped in four languages, English, German, French and Japanese, and it worked in English.  We also supported a number of different character sets, as long as they worked fine when converted directly to raw ASCII.  Exchange 2007 lives in a different world.  Unicode should work.  UTF is the standard format, and generally speaking any place where the code handles a string it had better be Unicode.  


There are problems with Unicode addresses and Address Encapsulation.  In particular, address encapsulation supports single byte characters really well, and anything that’s not a single byte character not so well (where by “not so well” I mean “not at all”).  Plus encoding gives each character exactly two hex digits, and two hex digits stop at FF.


Our CEO from the previous example, he’s grown up as well.  He no longer argues over things like star trek captains.  He understands that James Tiberius Kirk would capture an alien enemy and beat the secret of their doomsday device from them with his bare hands.  If Piccard were to capture an alien enemy it would be to negotiate with, surrender to, or preferably negotiate an unconditional surrender to.   No, our CEO now has more mature tastes in entertainment.  He mails his video on demand service and they mail him the stream address.


So now when he takes a lunch break, he sends a mail to [VOD: Super Psychic Battle Angel さくら Part 2: ディレクターズエディション].  And he is in for a nasty surprise.  See, Sakura’s name consists of decidedly non-English characters.  The Exchange Server will start to encapsulate the address to transfer it to the Video On Demand gateway and things will not go well.


The CEO gets an NDR.  It’s pretty, in HTML, with helpful diagnostic information and a generous description of what is wrong:


Delivery has failed to these recipients or distribution lists:


Super Psychic Battle Angel さくら Part 2: ディレクターズエディション


The format of the recipient’s e-mail address isn’t valid. A valid address looks like this: username@contoso.com. Microsoft Exchange will not try to redeliver this message for you. Please check the e-mail address and try sending the message again, or provide the following diagnostic text to your system administrator.


While it’s a very nice NDR it contains neither super psychics nor battle angels and you can bet money it’s not what he had in mind when he pushed “Send”.  Furthermore, there’s absolutely nothing the CEO can do about it.  It’s a limitation of the encoding.


Extended Encapsulation


It is a limitation of IMCEA encoding that multibyte characters cannot be encoded, so if we want to be able to address them in encoded format, we need an extension, and an extension we have made.  Under the new rules, an existing address without multibyte characters will be encoded exactly as before.  That’s right: IMCEA<Address Space>-<encoded string in + format>. 


What about our Battle Angel Video?  Well, it definitely has extended characters.  And this is ok.  Now the conversion system will apply a new encoding:  The address will first be converted to UTF8.  The UTF8 address will then be plus encoded as needed.


So, [VOD: Super Psychic Battle Angel さくら Part 2: ディレクターズエディション] becomes


IMCEAVOD-UTF8-Super+20Psychic+20Battle+20Angel+20+E3+81+95+E3+81+8F+E3+82+89+20Part+202+3A+20+E3+83+87+E3+82+A3+E3+83+AC+E3+82+AF+E3+82+BF+E3+83+BC+E3+82+BA+E3+82+A8+E3+83+87+E3+82+A3+E3+82+B7+E3+83+A7+E3+83+B3@contoso.com
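Here is a rough Python sketch of the extended rules, for anyone who wants to check that address by hand. Treat it as an illustration, not the shipping code: `plus_encode` and `encapsulate` are hypothetical names, and the test for "needs the extended form" (any character that does not fit in a single byte) is my reading of the rule above. The Japanese characters in the title are the ones spelled out by the +E3... bytes in the encoded address.

```python
def plus_encode(data: bytes) -> str:
    """Plus-encode raw bytes: alphanumerics stay, '/' becomes '_', rest is +XX."""
    out = []
    for byte in data:
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif ch == "/":
            out.append("_")
        else:
            out.append(f"+{byte:02X}")
    return "".join(out)


def encapsulate(address_type: str, address: str, domain: str) -> str:
    """IMCEA encapsulation with the UTF8- extension (hypothetical helper)."""
    try:
        # Assumption: every character fits in one byte, so the address is
        # encoded exactly as before, with no extra prefix.
        payload = plus_encode(address.encode("latin-1"))
    except UnicodeEncodeError:
        # Otherwise: convert to UTF-8 first, plus-encode the bytes,
        # and flag the result with the "UTF8-" key.
        payload = "UTF8-" + plus_encode(address.encode("utf-8"))
    return f"IMCEA{address_type}-{payload}@{domain}"


title = "Super Psychic Battle Angel さくら Part 2: ディレクターズエディション"
print(encapsulate("VOD", title, "contoso.com"))
print(encapsulate("MHS", "ceobob{PicardForever/HUB}", "contoso.com"))
```

The first call should reproduce the IMCEAVOD-UTF8- address above, and the second shows that Bob's old MHS address comes out unchanged.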


The key addition here is the “UTF8-” key after IMCEA<TYPE>-.   One side effect of this is that if you had one off addresses that begin with UTF8-, Exchange Server will treat them as “Special” and attempt to decode them.  This means that DLs or recipients whose actual, non encoded non SMTP address is “UTF8-<something>” will NOT work correctly if the address is ever encapsulated.


Exchange Server 2003 does not understand the extended encoding format.  It does not create them, nor does it decode them.  It could not deliver to them before and it cannot do so now.  Exchange Server 2007, on the other hand, will be able to address and deliver to these addresses.  Such is the price of change.  If you are a gateway developer reading from the pickup directory, you too may encounter these addresses.  Fear not.  By turning the encoding steps backwards you can in fact decode the address.  In other words:


1.    Look for “IMCEA” as the start.  If it’s not there, bail, this isn’t an encoded address of either flavor.


2.    Everything between IMCEA and “-” is the address type. 


3.    Plus Decode as normal.


4.    Look for “UTF8-”.  If it’s not there, this is a standard IMCEA encoded address.  Process (or fail to process) as normal.


5.    If it is a UTF-8 encoded Encapsulated address, un-UTF8 encode it by stripping the “UTF8-” prefix and then decoding the rest of the string as UTF-8.
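For gateway developers who would rather read code than lists, the five steps above might look something like this in Python. It is a sketch under stated assumptions, not a reference decoder: `decode_encapsulated` is a made-up name, the "@domain" part is discarded up front, the plus-decoded payload is held as raw bytes so the final UTF-8 decode has something sensible to chew on, and a standard (non UTF8-) address is assumed to be single byte text.

```python
import re
from typing import Tuple


def decode_encapsulated(smtp_address: str) -> Tuple[str, str]:
    """Return (address type, decoded address) for either IMCEA flavor."""
    local_part = smtp_address.rsplit("@", 1)[0]

    # Step 1: no "IMCEA" at the start means this is not an encoded address.
    if not local_part.startswith("IMCEA"):
        raise ValueError("not an encapsulated address")

    # Step 2: everything between "IMCEA" and the first "-" is the type.
    address_type, dash, payload = local_part[len("IMCEA"):].partition("-")
    if not dash:
        raise ValueError("no address type separator found")

    # Step 3: plus decode as normal, keeping the result as bytes.
    raw = re.sub(
        rb"\+([0-9A-Fa-f]{2})",
        lambda match: bytes([int(match.group(1), 16)]),
        payload.replace("_", "/").encode("ascii"),
    )

    # Step 4: no "UTF8-" key means a standard IMCEA address.
    if not raw.startswith(b"UTF8-"):
        return address_type, raw.decode("latin-1")

    # Step 5: strip the key and decode what is left as UTF-8.
    return address_type, raw[len(b"UTF8-"):].decode("utf-8")


print(decode_encapsulated("IMCEAMHS-ceobob+7BPicardForever_HUB+7D@contoso.com"))
# ('MHS', 'ceobob{PicardForever/HUB}')
print(decode_encapsulated(
    "IMCEAVOD-UTF8-Super+20Psychic+20Battle+20Angel+20+E3+81+95+E3+81+8F"
    "+E3+82+89+20Part+202+3A+20+E3+83+87+E3+82+A3+E3+83+AC+E3+82+AF+E3+82+BF"
    "+E3+83+BC+E3+82+BA+E3+82+A8+E3+83+87+E3+82+A3+E3+82+B7+E3+83+A7+E3+83+B3"
    "@contoso.com"
))
```

The second call should hand back the VOD type and the full Battle Angel title, Japanese characters included.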


There we have it.  Technet will be updated to reflect this extension in time.  Mail to your video on demand services, your fax services, even your star trek mailing lists.  Use English, bad English, and non English characters.  We don’t mind.  Just don’t claim that Piccard was a better captain than Kirk.  Some things software simply can’t accommodate.


Jason Nelson






Comments (12)
  1. Mike Crowley says:

    Picard (with one "c") – way better, hands down!

    http://plaza.ufl.edu/joec/startrek_picardvkirk.htm

  2. Simon says:

    Amazing article

  3. tucker says:

    Hilarious.  You should be in charge of all MS documentation. (That might be a curse, so my apologies in advance.)

  4. Tarran says:

    Very good.

  5. Devin L. Ganger says:

    WIVV? I think you mean WWIVnet, which was a sad and lame network that only 14yo SysOps used. For extra geek cred, you should switch that out with FidoNet.

  6. Adam G says:

    So the long short is encapsulated UTF8 is now properly decoded?

    But, if I do not have the proper charactor set on my machine wont the non standard charactors still show as junk?

  7. Mark O says:

    Well written!  Good amount of history leading up to the topic at hand.

    Again, well written!

  8. Gloria L says:

    Enjoyed reading it! =)

    Yes, without the proper character set on the machine, the non-standard characters will look garbled.

  9. Adam G says:

    I think this post is a little out of character for a technically focused blog.

    I would recommend a shorter technically focused post with a link to the full cooler post on a personal blog.

    As always the technical information is superb.

  10. xC0000005 says:

    Adam, thank you for sharing your opinion on writing styles.  When I compose the blog post after the next one (ok, really the one after that) if I have questions about the tone and style I will leave a note to myself to study this comment and consider adapting the article accordingly.

    I’m not putting odds on that happening, just saying that if it did I promise I would read this comment and meditate on it for several seconds at least.  Unless something else came up.  Then I’m not sure I could promise several seconds, but I promise that whatever time I had before I was disturbed would be devoted at least in part to meditation on the nature of blog content and style in a serious presentation format like the internet.

  11. Adam G says:

    to xC0000005 – Ty, Very Picard-ish of you…

  12. Jason says:

    Update Roll-up 7 can cause a huge issue.  I’m working a case right now with Support – I am literally waiting for a call-back as I type this.  If, say for example, your CEO has a hidden mailbox he uses for email and sends emails outbound, it appears the reply-to will be in IMCEAEX_gibberish_something@yourdomain.com format.  People CANNOT reply to this address.

    This is a fundamental functionality change that should have been better advertised. I am very unhappy.  This needs to be communicated better.

Comments are closed.
