Following our recent announcement of Update Rollup 1 for Exchange 2010 Service Pack 2, you will see we released a ton of fixes. I wanted to blog about one specifically, and at the same time provide some background on how issues like these come about and how we go about fixing them.
The specific fix is one cunningly referred to as 2556113, with the title “It takes a long time for a user to download an OAB in an Exchange Server 2010 organization”.
With a title like that you might be thinking that we simply figured out a way to make OAB downloads ‘faster’. You might start thinking that we did that by randomly deleting some of the users in the OAB, those you don’t know, the people working in accounting on the fourth floor, for example. Or perhaps we had tried to reduce the details we included in the OAB, perhaps by just removing unnecessary information like family names, office location or phone numbers. Or maybe we simply increased the speed of the Internet. Because that’s really easy.
Well, we didn’t do any of those (though we are looking into that whole Internet thing to see what we can do about it, as it sounds awesome). Instead, we added some logic to ensure that Outlook tries to download the OAB from a CAS closest to itself.
“Why?” you ask. Well, it’s a good question, and I reply with “As the KB article says, ‘Consider the following scenario…’”
- You have two Active Directory sites on a slow network in a Microsoft Exchange Server 2010 organization.
- You have an Exchange Server 2010 Client Access server and an Exchange Server 2010 Mailbox server in one Active Directory site.
- You have an Exchange Server 2010 Client Access server and an Office Outlook user in the other Active Directory site.
- The user whose mailbox is located in the different Active Directory site tries to download the Exchange Offline Address Book (OAB).
In this scenario, it takes a long time to download the OAB.
Well yes. No kidding. It really can. If you have a large OAB, it can really, really take a long time. But let’s expand on the scenario a little, as frankly there’s a bit of information I think you need to know, and having an AD site with nothing but a CAS in it doesn’t seem like a very smart move to most people.
So consider this more detailed scenario instead:
- You have a centralized deployment. All mailboxes are in one central location.
- You have lots of small locations where people touch down and work.
- These locations are connected to the central site with poor networks. Satellite, ISDN, PSTN, tropospheric scatter (I had a customer with one of these once. Brilliant. Until there was a storm), wet piece of string, etc.
- Your OAB is big. It is large. It is not small. Take your pick of the definition you like best. Suffice to say, it’s big enough that you care.
- Your Outlook client tries to download the OAB, and it comes from the central datacenter. So does the Outlook client being used by the person sitting next to you, and the funny looking guy over there in the corner too. All of you are downloading the same OAB. Over the same wet piece of string. It’s getting very slow.
With luck you can see that you are all competing for the same bandwidth, while also trying to work, and even though the BITS client technology used for OAB downloads is good, it’s not really going to help you much.
So you add a CAS to each remote location, just as the diagram detailed in http://technet.microsoft.com/en-us/library/bb232155.aspx suggests. The idea is that the client computer will download the OAB it needs from the local CAS. Well, it might sound like a great idea – but that’s not how Exchange has ever worked. Prior to 2010 SP2 RU1 that is…
How did it work then? And why am I telling you that TechNet lied to you?
Well to answer the first question, the URL the client uses to download the OAB from is provided to the client by the AutoDiscover service. And the AutoDiscover code has always picked a URL for the OAB you should be downloading from the AD site that your mailbox is in, not the AD site your client computer is in.
To answer the second of those questions, you need to first understand that TechNet is never wrong (my friends in UE, like Scott Schnoll, get real touchy if you imply their articles are incorrect). It’s just that sometimes it isn’t right from a certain point of view, either. TechNet details this because it was part of the original PM specification back when 2007 was being designed. I probably shouldn’t have told you that, but heck, it was. And it didn’t get done. These things happen, you know, in a software product with over 20 million lines of code where stuff changes all the time. TechNet doesn’t usually lie. Well, not much.
Back to how it works. Just think about it for a moment. You have a 1 GB OAB, and you add a replica of that OAB to a CAS in the remote and distant AD site, where the users are. However, they never use it. (OK, unless their mailboxes are also in the same AD site, but that’s not the scenario, is it?) That kind of sucks, doesn’t it? Yes, it does, I hear you say. It looks a bit like this diagram.
Outlook uses the CAS closest to the client computer for the client’s AutoDiscover requests (well, it should, and we’ll come back to that in a moment) but the OAB URL it hands back is for the CAS in the same AD site as the mailbox. So even though we are replicating the OAB to AD Site B, the client pulls the OAB from AD Site A.
So, a large customer with lots of small sites and a whopping OAB tells us this won’t work and downloads are killing whatever WAN bandwidth they have. So, what can we do about this? It turns out there are a few ways to solve this, and I have to add that this is one of the fun bits of my job, trying to figure this kind of thing out. It’s a nerd thing.
- They could reduce the size of their OAB, speed up their WAN, move the remote offices closer etc. None of these will fly for them as a solution. Though we did ask.
- We could create lots of OABs that have the same content. And specify on a per-user, or per-database level the OAB the user should download. And then we only have that OAB available in the remote location. Therefore AutoDiscover will provide the only URL it can for it, in the remote location. Now this sounds good, except the users move from site to site. And a download then would mean a double slow network hop. Ouch. Scratch that.
- Same thing with mailboxes – move the mailboxes to the remote locations… well, they move around plus that would really complicate administration and High Availability and consequently increase cost.
- We could do some kind of reverse IP address to AD site mapping thing. Now I believe this was the original way we had planned to solve this, and it’s actually kind of hard. It’s hard because you need to ensure all subnets a client could come from are in AD Sites and Services, and then try and reverse engineer the AD site the user is in, and then look at site link costs and …you get the idea I hope. It’s complex, and defeated by NAT, or if the admin doesn’t list every possible subnet in AD Sites and Services.
- We could ‘interfere’ with DNS or the AutoDiscover XML to try and make the client think it is talking to the centralized location but in fact be talking to a local IIS instance. Again, it’s hard, tricky to implement and support and just plain ugly if you’re asking.
- Something else. I picked this one, as the others seemed really hard.
So cast your mind back just a few short paragraphs to the sentence that stated “Outlook uses the CAS closest to the client computer for the client’s AutoDiscover requests”, the one that I said I would come back to. Well, it is worth returning to because of something called AutoDiscoverServiceSiteScope.
AutoDiscoverServiceSiteScope is a CAS setting that helps the Outlook client map AD sites to CAS for the purposes of finding the closest CAS to the client for AutoDiscover requests. The client does this by seeking out Service Connection Points (SCPs), which are in fact pointers to the AutoDiscover service.
Here’s how it works. When an Outlook client starts up he heads off to the triangle, sometimes and otherwise known as ‘AD’, and looks for all the SCPs put there by Exchange setup. He finds a bunch (we hope), and on each is an attribute, the Keywords attribute, which is set/changed/sometimes messed up by the use of Set-ClientAccessServer –AutoDiscoverServiceSiteScope: ADSiteNameA, ADSiteNameB, etc. The Keywords attribute is used to specify which AD sites this CAS is responsible for, for AutoDiscover requests.
When the Outlook client finds more than one SCP he builds himself a list of usable SCPs by comparing the value stored on the Keywords attribute with his own AD site (which is dynamically updated by the local Netlogon service, when he starts up or changes IP address).
He then builds one list. Either all those that match his AD site (where Keywords attribute = client AD Site) or, if there are none, he puts every SCP in the list. These are the servers he can use for his AutoDiscover requests.
He then starts at the top of the list (which is always in the same order by the way, by date of install) and tries to connect to the URI contained within the ServiceBindingInformation attribute – which is the location of the AutoDiscover service itself. He then posts XML, gets a response etc., and then lives happily ever after. More details for all this good AutoDiscover stuff can be found here.
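The list-building steps above can be sketched in a few lines of Python. This is only a rough model of the behavior described here, not Outlook's actual code: the `Scp` class, its field names, the example URLs and the install dates are all made up for illustration, and the Keywords attribute is simplified to a plain set of AD site names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Scp:
    """Hypothetical stand-in for an AutoDiscover Service Connection Point."""
    service_binding_information: str  # URL of the AutoDiscover service
    site_scope: frozenset             # AD site names taken from Keywords (simplified)
    installed: date                   # models the fixed "date of install" ordering


def candidate_scps(scps, client_site):
    """Return the SCPs a client would try, in the order it would try them."""
    ordered = sorted(scps, key=lambda s: s.installed)
    in_site = [s for s in ordered if client_site in s.site_scope]
    # Fall back to every SCP only when none is scoped to the client's AD site.
    return in_site or ordered


# Assumed example data: one CAS scoped to SiteA, one scoped to SiteB.
scps = [
    Scp("https://cas-a.example.com/autodiscover/autodiscover.xml",
        frozenset({"SiteA"}), date(2010, 1, 5)),
    Scp("https://cas-b.example.com/autodiscover/autodiscover.xml",
        frozenset({"SiteB"}), date(2011, 3, 9)),
]

# A client in SiteB gets only the SiteB-scoped SCP.
urls_site_b = [s.service_binding_information for s in candidate_scps(scps, "SiteB")]
# A client in SiteC matches nothing, so it gets every SCP, oldest install first.
urls_site_c = [s.service_binding_information for s in candidate_scps(scps, "SiteC")]
```

Note what happens to the client in the unscoped site: it falls back to the whole list and starts with the oldest install, which may well be a CAS on the far side of the WAN. That is why getting the site scopes right matters.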
Why is this interesting? Well this AutoDiscoverServiceSiteScope thing helps Outlook find the CAS closest to the client’s location, assuming the admin has set up the site scopes correctly (and we do tell admins how to do that). So we really don’t need to figure out which CAS is closest to the client once we get the request, as that has already happened by the time the request reaches CAS.
Once that request hits CAS we figure out the settings to return to the client – but we always forgot one thing: the OAB the user needs could be local to the CAS we are executing the request on. Instead, we always gave the user a URL from a CAS way, way, over there. And that’s what we needed to fix.
The solution for this is therefore theoretically very simple and it means we don’t have to invent a new way to figure out the closest CAS to the client, as we already have one which works quite well thank you very much.
If we were to make the assumption that the admin has set up AutoDiscoverServiceSiteScope correctly, the CAS the client connects to for AutoDiscover will be the CAS closest to the client. If this assumption holds true, the CAS, when figuring out what to return in the AutoDiscover XML, simply needs to check whether he himself has a copy of the OAB the user should be using – and if so, he simply provides his own OAB URL, not that of a CAS in the AD site where the user’s mailbox is located. Of course if he doesn’t have a copy of the OAB the user needs, the old behavior should prevail, meaning the CAS will return the OAB URL of a CAS in the Mailbox AD site.
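As a minimal sketch of that decision, here is the rule written out in Python. It is an illustration of the logic as described in this post, not the shipped AutoDiscover code: the `Cas` class, the `pick_oab_url` function, the server names and the URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cas:
    """Hypothetical model of a Client Access server."""
    name: str
    ad_site: str
    # OAB name -> download URL, for each OAB this CAS holds a copy of.
    oab_urls: dict = field(default_factory=dict)


def pick_oab_url(handling_cas, mailbox_site_cas, user_oab):
    """Choose the OAB URL to return in the AutoDiscover response."""
    # New behavior: if the CAS answering the request has the user's OAB,
    # it hands back its own URL.
    if user_oab in handling_cas.oab_urls:
        return handling_cas.oab_urls[user_oab]
    # Old behavior as the fallback: a CAS in the mailbox's AD site.
    return mailbox_site_cas.oab_urls.get(user_oab)


# Assumed example: central CAS in SiteA, two branch CAS servers in other sites.
central = Cas("CAS-A", "SiteA", {"Default OAB": "https://cas-a.example.com/oab/"})
branch_with_copy = Cas("CAS-B", "SiteB", {"Default OAB": "https://cas-b.example.com/oab/"})
branch_without_copy = Cas("CAS-C", "SiteC")

local_url = pick_oab_url(branch_with_copy, central, "Default OAB")
fallback_url = pick_oab_url(branch_without_copy, central, "Default OAB")
```

Here `local_url` points at the branch CAS and `fallback_url` points back at the central site, which matches the two cases in the paragraph above.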
So basically the picture changes to look like this:
Now that’s much friendlier to the WAN isn’t it? One copy replicates over the WAN and all clients in that location will now get the OAB from the CAS local to them.
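To put a rough number on it, here is a back-of-the-envelope calculation. The 1 GB OAB is the figure used earlier in this post; the 25 clients per location is purely an assumed number, and the model is deliberately crude: it assumes every client pulls one full copy and ignores differential updates and compression.

```python
def wan_traffic_gb(oab_size_gb, clients, local_copy):
    """WAN traffic for one round of full OAB downloads at one remote location."""
    # With a local copy on the remote CAS, only that single replica crosses the WAN.
    copies_over_wan = 1 if local_copy else clients
    return oab_size_gb * copies_over_wan


before = wan_traffic_gb(1.0, 25, local_copy=False)  # every client pulls from the central site
after = wan_traffic_gb(1.0, 25, local_copy=True)    # only the CAS replica crosses the WAN
```

Under those assumptions the WAN carries 25 GB before the change and 1 GB after it, and the gap grows with every extra client in the remote location.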
What do you have to do to get this new behavior to kick in? Just two things. Deploy SP2 RU1 on the CAS, and ensure that your AutoDiscoverServiceSiteScope parameters are set up correctly.
I hope you find this useful, and may your WAN forever be a long fat pipe.
Principal Program Manager
Exchange Customer Experience