Office 365 Connectivity Guidance


It's been a while since my last blog post, mainly because my online effort has been going into our official guidance around ExpressRoute, at http://aka.ms/tune/ and www.office.com.

However, while we work on more detailed guidance, one area I felt was worth writing up is how to connect to your Office 365 implementation from your corporate environment. It's an area I speak to customers about on a daily basis, and the wealth of options can make it highly complex to get right. There isn't a one-size-fits-all model either: what's right for one customer will be wrong for another, which makes generalizing advice hard, and we need the details to make a correct recommendation.

There is also perhaps not quite enough information out there on Microsoft's global network infrastructure for customers to use it to their advantage. This is something I aim to address here and in the more detailed guidance being worked on, and the Azure networking team have also started publishing more on this subject.

The other complexity is that the cloud is a moving target for connectivity. Its very nature means endpoints change regularly, applications may change their connectivity type as improvements are rolled out, new services are added, and so on. As such, we need a network design which abstracts the organization from these changes, making the fluid nature of the cloud invisible to the end user whilst allowing the services to be delivered optimally and enabling the organization to consume service optimizations as they are rolled out.

What we need to do therefore is to drive for some standard connectivity principles, and by doing this we can achieve the 'north star' of optimal, flexible and abstract connectivity to our cloud services.

So, what are these standard approach elements from a Microsoft standpoint?

  1. Optimized connectivity to Microsoft's global network
  2. Localized network egress as close to the user as possible
  3. Unhindered access to the endpoints required
  4. Local DNS resolution

It's a simple list, but only if you know how to implement each element, so let's look in detail at how to achieve each:

  1. Optimized Connectivity to Microsoft's global network

Before the emergence of cloud services, network infrastructure was generally designed to connect an organization's user locations to their data, which is a simple planning methodology to implement. With cloud services such as Office 365, that methodology no longer makes sense: your data isn't in a single 'place', it is in many locations in the region where your tenant is. For Office 365 specifically, your data will be held simultaneously in up to four locations, so the connectivity model needs to reflect this. Rather than connecting to where your data is, we need to connect to Microsoft's global network quickly and allow Microsoft to route that connection to wherever the active endpoint is for a given request. It's also important to note that the nearest front end for a service (such as the Exchange or Skype front end) may be very close to your users, even though their data is in a tenant in a remote region. This advice applies to any location in the world where you have users: regardless of the actual location of the data, connecting locally and quickly to Microsoft's global network is the optimal approach.

Did you know Microsoft owns one of the largest WAN backbones in the world to enable us to do this? It's one of Microsoft's greatest unsung assets, and it's part of Microsoft's multi-billion dollar investment in a truly global, dynamic cloud infrastructure. This network consists of very high bandwidth, low latency, failover-capable links, with tens of thousands of route miles of privately owned dark fiber and multi-terabit connections from the datacenters to each other and to the internet edge. This network is what allows you to fully utilize the power of Office 365 without having to connect to multiple locations.

We peer with over 2500 ISPs globally and have 70 points of presence where we exchange traffic with ISPs, passing it from them onto our own network and back. This network is optimized to get your traffic to and from its destination as quickly as possible and covers an array of Microsoft traffic, from Azure to Office 365, to Xbox Live, to Bing. We can also deliver elements of some services from the edge of this network, or apply network optimizations to connections there, to improve performance. You can read more about this network here and here.

The list of peering points (correct as of April 2017) can be found below, and we often have multiple locations in these cities. For a full list, and to see which other ISPs have a presence in these locations, you can look this up on PeeringDB, a public site; Microsoft's ASN (autonomous system number, its network ID) is 8075. We are constantly adding to this global network as demand requires.

Table of Microsoft's global network peering locations.

| City | Country | City | Country | City | Country |
|---|---|---|---|---|---|
| Brisbane | Australia | Chennai | India | Taipei | Taiwan |
| Melbourne | Australia | Mumbai | India | London | UK |
| Perth | Australia | New Delhi | India | Slough | UK |
| Sydney | Australia | Dublin | Ireland | Manchester | UK |
| Vienna | Austria | Milan | Italy | Ashburn | USA |
| Brussels | Belgium | Turin | Italy | Atlanta | USA |
| Sofia | Bulgaria | Osaka | Japan | Boston | USA |
| Sao Paulo | Brazil | Tokyo | Japan | Chicago | USA |
| Rio de Janeiro | Brazil | Kuala Lumpur | Malaysia | Dallas | USA |
| Sofia | Bulgaria | Mexico City | Mexico | Denver | USA |
| Montreal | Canada | Amsterdam | Netherlands | Honolulu | USA |
| Toronto | Canada | Auckland | New Zealand | Las Vegas | USA |
| Vancouver | Canada | Wellington | New Zealand | Los Angeles | USA |
| Santiago de Chile | Chile | Warsaw | Poland | Miami | USA |
| Zagreb | Croatia | Lisbon | Portugal | New York | USA |
| Prague | Czech Republic | Bucharest | Romania | Palo Alto | USA |
| Copenhagen | Denmark | Moscow | Russia | Phoenix | USA |
| Helsinki | Finland | Singapore | Singapore | San Jose | USA |
| Marseille | France | Cape Town | South Africa | Seattle | USA |
| Paris | France | Johannesburg | South Africa | Ashburn | USA |
| Berlin | Germany | Seoul | South Korea | | |
| Frankfurt | Germany | Barcelona | Spain | | |
| Athens | Greece | Madrid | Spain | | |
| Hong Kong | Hong Kong | Stockholm | Sweden | | |
| Budapest | Hungary | Zurich | Switzerland | | |
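The PeeringDB lookup mentioned above can also be scripted. The following is a minimal sketch rather than an official tool: the only value taken from this article is ASN 8075, while the `https://www.peeringdb.com/api/net?asn=...` query form and the `data`, `name` and `asn` fields are assumptions about PeeringDB's public REST API that should be checked against its documentation. The helper names are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of PeeringDB's public REST API (not stated in the article).
PEERINGDB_API = "https://www.peeringdb.com/api"
MICROSOFT_ASN = 8075  # from the article


def network_query_url(asn: int) -> str:
    """Build the URL that asks PeeringDB for the network record of one ASN."""
    return f"{PEERINGDB_API}/net?{urlencode({'asn': asn})}"


def network_names(payload: dict, asn: int) -> List[str]:
    """Pull network names for the ASN out of a decoded PeeringDB response.

    Assumption: the response is a JSON object with a "data" list whose
    entries carry "name" and "asn" keys.
    """
    return [
        entry["name"]
        for entry in payload.get("data", [])
        if entry.get("asn") == asn
    ]


def fetch_network(asn: int, timeout: float = 10.0) -> dict:
    """Perform the live lookup (needs internet access; not called here)."""
    with urlopen(network_query_url(asn), timeout=timeout) as response:
        return json.load(response)
```

Calling `network_names(fetch_network(MICROSOFT_ASN), MICROSOFT_ASN)` would perform the live query; the parsing is kept separate from the network call so it can be exercised offline with a saved response.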

Given the scale of this network, in theory, your traffic should only be on the internet path for a very short period until your ISP hands it to Microsoft's network.

So, we know where Microsoft has peering points, but how do we check how well our ISP peers with this network? A simple tracert to any Microsoft endpoint is the answer; a good example would be outlook.office365.com. In this trace, we're not so much interested in the endpoint itself as in seeing the transition between the ISP network and Microsoft's global network, which should be the same regardless of the endpoint. What we're looking for is a sensible handoff (e.g. egress is in the UK and peering occurs in London) that happens in a reasonable time (e.g. <30ms for the example below).

In this example, the peering is likely happening at hop 7, on a BT router, in 13ms. Hop 8 shows a Microsoft router in London (we know this from the code LTS, which is Telehouse London), but by this point we're seeing the total latency of 26ms to the endpoint, which is a CAFE (Client Access Front End) server for Exchange Online in Dublin (db3). This rolled-up latency is down to seeing the response from the tunnel end, as we're on an MPLS switched network from hop 8. Up until hop 8 we're seeing normal inter-router ICMP responses, as this is a classic IP network. Most Microsoft routers have a similar code to reflect the metro area; they are often obvious (AMS for Amsterdam, for example) and similar to metro/IATA codes.


    tracert outlook.office365.com

    Tracing route to outlook-emeawest.office365.com [40.101.73.162] over a maximum of 30 hops:

     1     5 ms     5 ms    10 ms  bthub [192.168.1.254]
     3     *       13 ms    13 ms  31.55.187.177
     4    15 ms    13 ms    14 ms  31.55.187.184
     5    14 ms    16 ms    14 ms  195.99.127.26
     6    14 ms    12 ms    11 ms  peer1-et-3-1-0.redbus.ukcore.bt.net [62.172.103.195]
     7    13 ms    13 ms    13 ms  195.99.126.38
     8    26 ms    27 ms    26 ms  be-71-0.ibr02.lts.ntwk.msn.net [104.44.9.152]
     9    25 ms    33 ms    24 ms  be-3-0.ibr02.dub30.ntwk.msn.net [104.44.5.4]
    11    31 ms    23 ms    22 ms  ae12-0.db3-96c-2a.ntwk.msn.net [204.152.141.89]
    16    25 ms    24 ms    24 ms  40.101.73.162

The important thing here is that the peering is happening in a sensible place (London) for a client in the UK and in a reasonable time (13ms), so BT are doing an excellent job of handing my traffic quickly to the Microsoft global network to take on from there. Bear in mind that your ISP may not peer with Microsoft in the nearest exchange, but as long as the handoff happens in a sensible area and timeframe there is nothing to worry about. For example, an ISP may peer with Microsoft in Ashburn, VA instead of NYC, so your traffic is taken from New York to Ashburn to be handed to Microsoft. This is down to the ISP's preference, and as long as the time taken is minimal there is no cause for concern.
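The check described above can be roughed out in code. This is only a sketch under stated assumptions: it treats the first hop whose name ends in `.msn.net` as the handoff (which matches the traces in this article but is not a documented rule), it takes the lowest of the three probe times per hop, and it reuses the 30ms figure from the example as a default budget. The function and class names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# The hop lines of the first trace in this article, as printed by Windows tracert.
GOOD_TRACE = """\
1 5 ms 5 ms 10 ms bthub [192.168.1.254]
3 * 13 ms 13 ms 31.55.187.177
4 15 ms 13 ms 14 ms 31.55.187.184
5 14 ms 16 ms 14 ms 195.99.127.26
6 14 ms 12 ms 11 ms peer1-et-3-1-0.redbus.ukcore.bt.net [62.172.103.195]
7 13 ms 13 ms 13 ms 195.99.126.38
8 26 ms 27 ms 26 ms be-71-0.ibr02.lts.ntwk.msn.net [104.44.9.152]
9 25 ms 33 ms 24 ms be-3-0.ibr02.dub30.ntwk.msn.net [104.44.5.4]
11 31 ms 23 ms 22 ms ae12-0.db3-96c-2a.ntwk.msn.net [204.152.141.89]
16 25 ms 24 ms 24 ms 40.101.73.162
"""

# Assumption: Microsoft backbone routers resolve to names under msn.net,
# as they do in both traces shown in this article.
MICROSOFT_SUFFIX = ".msn.net"


class Hop(NamedTuple):
    number: int
    best_ms: Optional[int]  # lowest of the three probes, None if all timed out
    host: str


# Hop number, three probe results ("13 ms", "<1 ms" or "*"), then the host.
HOP_LINE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+\s*ms|\*)\s+){3})(\S+)")


def parse_tracert(text: str) -> List[Hop]:
    """Parse hop lines, ignoring headers and blank lines."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if match is None:
            continue
        # "<1 ms" is recorded as 1; "*" probes are skipped.
        times = [int(t) for t in re.findall(r"(\d+)\s*ms", match.group(2))]
        best = min(times) if times else None
        hops.append(Hop(int(match.group(1)), best, match.group(3)))
    return hops


def find_handoff(hops: List[Hop]) -> Optional[Tuple[Optional[Hop], Hop]]:
    """Return (last ISP hop, first Microsoft hop), or None if no Microsoft hop is seen."""
    for index, hop in enumerate(hops):
        if hop.host.lower().endswith(MICROSOFT_SUFFIX):
            return (hops[index - 1] if index > 0 else None), hop
    return None


def handoff_within_budget(hops: List[Hop], budget_ms: int = 30) -> bool:
    """True if the last ISP hop before the handoff answered within budget_ms."""
    handoff = find_handoff(hops)
    if handoff is None or handoff[0] is None or handoff[0].best_ms is None:
        return False
    return handoff[0].best_ms <= budget_ms
```

For the trace above, `find_handoff(parse_tracert(GOOD_TRACE))` returns hop 7 (13ms) as the last ISP hop and hop 8 as the first Microsoft router. Router ICMP replies are only an approximation of path latency, so treat the result as a prompt for a closer look rather than a verdict.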

However, let's take a look at a scenario where this is non-optimal:

Below you can see a customer in the UK whose traffic passes through London on the ISP network in hops 5-6. Here we would expect the traffic to peer onto the Microsoft network. However, this ISP keeps the traffic, and we see a jump in latency to 83ms at hop 7, where we're now in NYC but still on the ISP network. The traffic is finally handed to Microsoft in New York at hop 12 after around 85-87ms. Microsoft then has to backhaul it across the Atlantic to its destination in Amsterdam, which takes 149ms in total, nearly 6x the time it should take. Again, the location of the endpoint is irrelevant; the traffic should hit Microsoft in London to be taken to its destination.

     1    <1 ms    <1 ms    <1 ms  10.201.100.1
     2    <1 ms    <1 ms    <1 ms  10.201.0.1
     5    14 ms    14 ms    15 ms  ABC-e-0-0-0-0.londonuk5.badlypeeredISP.net [*.*.123.111]
     6    16 ms    15 ms    15 ms  AB2-e-0-0-2-0.londonuk1.badlypeeredISP.net [*.*.123.113]
     7    83 ms    83 ms    83 ms  AB1-tengig-0-0-0.newyork.badlypeeredISP.net [*.*.123.119]
     8    82 ms    82 ms    82 ms  AB2-e-0-1.jfk2.badlypeeredISP.net [*.*.123.120]
     9    82 ms    83 ms    82 ms  ab1-e-10-1-1.jfk2.badlypeeredISP.net [*.*.123.121]
    10    82 ms    82 ms    82 ms  nyc-br-01.badlypeeredISP.net [*.*.123.122]
    11    82 ms    82 ms    82 ms  nyc-edge-01.badlypeeredISP.net [*.*.123.124]
    12    85 ms    86 ms    87 ms  be-4-0.ibr02.nyc04.ntwk.msn.net [104.44.4.28]
    14   141 ms   143 ms   145 ms  xe-7-3-0-0.lts-96cbe-1a.ntwk.msn.net [207.46.43.45]
    15   149 ms     *        *     xe-9-1-1-0.ams-96c-1a.ntwk.msn.net [207.46.42.135]

So, what's the problem? This behavior doesn't benefit anyone. For the Microsoft customer, the traffic takes an unnecessary trip across the Atlantic, adding latency. For the ISP, it unnecessarily uses their own bandwidth to transport all Microsoft-bound traffic to New York, when Microsoft would prefer to take the traffic from them in London.
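To make the numbers in the second trace concrete, here is a small sketch that locates the largest latency jump on the ISP's side of the handoff and compares the end-to-end time with the well-peered example. The hop data is transcribed by hand from the trace (best of the three probes, with `<1 ms` recorded as 1), the `.msn.net` suffix test is an assumption based on the host names shown in this article, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Hop = Tuple[int, int, str]  # (hop number, best round-trip time in ms, host name)

# Transcribed from the poorly peered trace in this article.
BAD_TRACE: List[Hop] = [
    (1, 1, "10.201.100.1"),
    (2, 1, "10.201.0.1"),
    (5, 14, "ABC-e-0-0-0-0.londonuk5.badlypeeredISP.net"),
    (6, 15, "AB2-e-0-0-2-0.londonuk1.badlypeeredISP.net"),
    (7, 83, "AB1-tengig-0-0-0.newyork.badlypeeredISP.net"),
    (8, 82, "AB2-e-0-1.jfk2.badlypeeredISP.net"),
    (9, 82, "ab1-e-10-1-1.jfk2.badlypeeredISP.net"),
    (10, 82, "nyc-br-01.badlypeeredISP.net"),
    (11, 82, "nyc-edge-01.badlypeeredISP.net"),
    (12, 85, "be-4-0.ibr02.nyc04.ntwk.msn.net"),
    (14, 141, "xe-7-3-0-0.lts-96cbe-1a.ntwk.msn.net"),
    (15, 149, "xe-9-1-1-0.ams-96c-1a.ntwk.msn.net"),
]

# Assumption: Microsoft backbone routers resolve to names under msn.net.
MICROSOFT_SUFFIX = ".msn.net"


def is_microsoft(hop: Hop) -> bool:
    return hop[2].lower().endswith(MICROSOFT_SUFFIX)


def isp_hops(hops: List[Hop]) -> List[Hop]:
    """Hops seen before the first Microsoft router."""
    result = []
    for hop in hops:
        if is_microsoft(hop):
            break
        result.append(hop)
    return result


def first_microsoft_hop(hops: List[Hop]) -> Optional[Hop]:
    """The hop where traffic is first seen on Microsoft's network."""
    for hop in hops:
        if is_microsoft(hop):
            return hop
    return None


def largest_jump(hops: List[Hop]) -> Optional[Tuple[Hop, Hop, int]]:
    """Return (previous hop, next hop, increase in ms) for the biggest rise."""
    best = None
    for previous, current in zip(hops, hops[1:]):
        delta = current[1] - previous[1]
        if best is None or delta > best[2]:
            best = (previous, current, delta)
    return best


before, after, delta = largest_jump(isp_hops(BAD_TRACE))
print(f"Largest ISP-side jump: hop {before[0]} -> hop {after[0]} (+{delta} ms)")

handoff = first_microsoft_hop(BAD_TRACE)
print(f"Handoff to Microsoft at hop {handoff[0]} after {handoff[1]} ms")

# 149 ms end to end here versus 26 ms in the well-peered example.
print(f"Slowdown versus the good trace: {149 / 26:.1f}x")
```

The jump lands between hops 6 and 7, which is where the traffic crosses the Atlantic while still on the ISP's network, well before the handoff at hop 12.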

This was simply an issue of poor peering configuration: the nearest route to Microsoft's network that the ISP's network knew about was in NYC. The solution? The ISP can speak to Microsoft's peering team directly and work with us to set up peering in London, or as a Microsoft customer you can flag this with your account team and we can speak to the ISP on your behalf. It's worth pointing out once again that only obviously non-optimal peering, such as this example, is worth flagging. Microsoft's public policy on peering can be found here.

One major advantage of utilizing this network locally to its fullest is that Microsoft is increasing its use of optimizations at the edge of this network, depending on workload. For example, SharePoint/OneDrive is currently switching from a Unicast connectivity model to an Anycast model utilizing network nodes at the edge of our network. This means that connections to your SharePoint tenant are directed to the nearest edge node, where they can be optimized and put onto a hot TCP session to the endpoint, which improves performance over establishing a new connection. In some circumstances, services utilizing this method can see significant performance improvements.

So, in summary, ensuring good access to Microsoft's global network enables us to optimize your traffic for the majority of its journey and deliver services and enhancements locally, and ensures your traffic isn't on the internet for anything other than a short leg from the edge of your network to the edge of Microsoft's. A simple tracert to any Microsoft endpoint which resides on this network will show the point of handoff. However, to do this successfully we need localized network egress as close to the user as possible:

Next up – Part 2 Localized Network Egress as close to the user as possible


Comments (2)

  1. Gary says:

    Is there any way to determine which O365 URLs are routable via ExpressRoute and which must be routed via the Internet?
    Thanks

    1. Hi Gary, yes, these are listed here https://aka.ms/o365endpoints and each URL which is routable via ER is noted as such. I also wrote a PAC file which does this split for you here https://support.office.com/en-us/article/Managing-Office-365-endpoints-99cab9d4-ef59-4207-9f2b-3728eb46bf9a?ui=en-US&rs=en-US&ad=US#ID0EABAAA=2._Proxies

      Regards,

      Paul
