Office 365 Connectivity Guidance: Part 4

06/18/2019

4. Assess bypassing proxies, traffic inspection devices and duplicate security technologies

The final principle is to get your core Office 365 traffic to Microsoft as efficiently as possible. This means minimising the work done on the traffic, or ensuring that any work done doesn't affect it. As already noted, moving to the cloud means rethinking how you handle traffic leaving your managed network for trusted endpoints such as Office 365.

The bulk of an enterprise's cloud traffic is user-initiated: connections from the enterprise to Microsoft endpoints, which account for the majority of the traffic flow. That outbound traffic is what this page refers to. For inbound flows (initiated by Microsoft towards the enterprise), a higher degree of inspection and control may be desired, and the volume of traffic in this direction is comparatively low.

When there is a move to SaaS services like Office 365, numerous connections which were once kept entirely within the corporate network now traverse the egress equipment to reach endpoints in the cloud. As noted, this is often a large volume of traffic, and it puts a heavy load on the standard internet egress equipment, particularly where that equipment performs break and inspect on all traffic.

Understandably, the default security model will often be to break and inspect all outbound traffic, to ensure that the endpoint is one we want users to be able to access, that nothing malicious is being brought into the corporate network, and that nothing is being exfiltrated.

Whilst necessary, this inspection is computationally expensive and can impact performance. When users visit unknown websites, that impact is entirely acceptable to the business. With SaaS services like Office 365 this is certainly not the case: inspection can severely impact user experience, and scaling the equipment up requires massive spending that still does not solve all of the issues which arise.

Here is the question we need to ask ourselves, though. Does the same policy which applies to unmanaged and unknown endpoints on the internet need to apply to known, managed and trusted endpoints used for business-critical services? The answer is hopefully no. Services like Office 365 should be treated somewhere between how you treat internet traffic and how you treat traffic to your on-premises datacentres (which usually has no security applied).

Where exactly in between depends on how much you trust the endpoints or service in question, but it's certainly worth assessing whether what you're doing at the edge is already done, or can be done, in the backend, and is therefore unnecessary inline, where it can cause performance issues.

Data Loss Prevention can be done in Office 365 itself, and anti-virus/anti-malware scanning is also done by Office 365 against certain elements, so neither should be required inline for them. With Office 365 we're not going to unknown, untrusted or unmanaged endpoints as we are with general internet traffic; we're going to known, business-critical, managed and security-controlled endpoints. The Microsoft Trust Center holds all the relevant information on how we secure your data in the cloud.

Therefore, Microsoft strongly recommend that SSL interception, DLP and similar processing are not performed on the core Office 365 traffic. Previously it was very difficult to identify which traffic to apply this advice to, as Office 365 consists of various elements: some are Microsoft owned and IP provisioned, while others (CDN, CRL endpoints etc.) are not. As described in part one of this series, Microsoft now mark endpoints in one of three categories, "Optimize", "Allow" and "Default", which allows us to treat certain traffic elements selectively in different ways.

The advice is that at the very minimum, the 'Optimize' marked endpoints should be sent directly to their destination with no proxy or SSL break and inspect, egressing as close as possible to the user's location, essentially following the principles outlined in this set of posts to the letter.

For any other endpoints (e.g. the "Allow" group), if proxying or interception is done, it should be carefully managed so as not to cause a bottleneck. "Default" marked endpoints can be treated like normal web traffic without issue.
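Because every endpoint set is published with its category, this split can be automated rather than maintained by hand. The sketch below assumes the endpoint data has already been downloaded from the Office 365 IP Address and URL web service and parsed into Python dicts carrying `category` and `urls` keys. The `SAMPLE` entries and the `TREATMENT` labels are illustrative only, not a copy of the live list or the settings of any real device.

```python
from collections import defaultdict

# How each category is handled, paraphrasing the guidance in this post.
# These strings are descriptive labels only, not settings of any real device.
TREATMENT = {
    "Optimize": "direct, local egress; no proxy, no SSL break and inspect",
    "Allow": "direct preferred; if proxied, keep it light and well scaled",
    "Default": "treat like normal web traffic",
}


def urls_by_category(entries):
    """Map each category to the sorted list of URLs in that category.

    `entries` is assumed to be a list of dicts shaped like the endpoint sets
    returned by the web service, each with 'category' and optional 'urls'.
    """
    groups = defaultdict(set)
    for entry in entries:
        # Entries without an explicit category are treated as Default (assumption).
        category = entry.get("category", "Default")
        groups[category].update(entry.get("urls", []))
    return {category: sorted(urls) for category, urls in groups.items()}


# Illustrative sample only; always work from the live list instead.
SAMPLE = [
    {"id": 1, "serviceArea": "Exchange", "category": "Optimize",
     "urls": ["outlook.office.com", "outlook.office365.com"]},
    {"id": 2, "serviceArea": "Exchange", "category": "Allow",
     "urls": ["*.protection.outlook.com"]},
    {"id": 3, "serviceArea": "Common", "category": "Default",
     "urls": ["*.example-cdn.net"]},  # hypothetical hostname
]

if __name__ == "__main__":
    for category, urls in urls_by_category(SAMPLE).items():
        print(f"{category}: {TREATMENT[category]}")
        for url in urls:
            print(f"    {url}")
```

The resulting per-category lists are what you would feed into proxy bypass lists or firewall rules, so that "Optimize" traffic never reaches the inspection path.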

This brings us to the next point, what is Microsoft's recommended egress model? How do we recommend you get Office 365 traffic out of the corporate network and to Microsoft? There are essentially three main methods:

Direct Routing
Proxied Access
ExpressRoute (which is essentially direct routing via another path)

There are distinct pros and cons to each method and they may or may not be applicable depending on the type of traffic.

Direct Routing

Direct routing is similar to the connectivity you have at home: a single TCP session is used to connect to the endpoint, with (in most cases) the source IP simply translated from an internal address (e.g. 10.x.x.x) to a publicly routable one on the way out of the managed network. The egress device may also check that the destination IP and/or port is allowed. The endpoint therefore receives the request from the translated (public) source IP, while the client connects directly to the endpoint's public IP address.

This method is generally the recommended way to connect Office 365 services if possible.

Pros:

• Allows direct UDP traffic, meaning Skype can work at its best.

• Generally, no interference with the payload at egress, meaning optimal connectivity for all services

• Allows for local egress in most cases, meaning minimal latency to the service front door and Microsoft's global network

• Minimal work done on traffic means scaling is less demanding (whilst still necessary for the volume of connections and network address translation)

• Best connection method to Office 365 core endpoints for most customers when ISP routing is optimal

Cons:

• It is necessary to authorize Office 365 URLs/IPs and open the required ports on all firewalls used (if controlled egress is desired). These need to be monitored and the firewalls updated as they change, as missing updates to IP ranges can cause connectivity issues. However, the Office 365 IP Address and URL web service should allow for easy automation of these updates on supporting network equipment.

• Controlling egress by Office 365 IP address also means that endpoints for which we cannot provide IPs (such as CDNs, DNS and CRL lookups) have to be routed via another path with URL-based or unrestricted access.

• Routing to the appropriate egress needs to be managed internally and needs to include external IP address routing (which can be an issue for some customers)

• External DNS resolution is required (which can be an issue for some customers)

• Devices still need to scale to the increased connection count needed for Office 365 services
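The first two cons can be eased with automation: check whether a new version of the endpoint list has been published, then split the list into IP-based firewall rules and URL-only entries that need a URL-aware path. The sketch below assumes the web service URLs and JSON shape as documented at the time of writing (endpoint sets with `ips`, `urls`, `tcpPorts` and `udpPorts` fields, and a version value such as `2019061800`). Confirm both against current documentation; downloading and applying the rules to a specific firewall is deliberately left out.

```python
import ipaddress
import uuid

# Web service base URL as documented at the time of writing; confirm before use.
BASE = "https://endpoints.office.com"


def service_urls(instance="worldwide", request_id=None):
    """Build the version and endpoint-list URLs for one service instance."""
    request_id = request_id or str(uuid.uuid4())  # the service expects a client GUID
    query = f"?clientrequestid={request_id}"
    return (f"{BASE}/version/{instance}{query}",
            f"{BASE}/endpoints/{instance}{query}")


def needs_update(stored_version, latest_version):
    """True when the published list is newer than the one already applied.

    Versions are assumed to be numeric strings (e.g. '2019061800'), so an
    integer comparison orders them correctly.
    """
    return stored_version is None or int(latest_version) > int(stored_version)


def split_rules(entries):
    """Split endpoint sets into IP-based firewall rules and URL-only entries.

    Entries with 'ips' become allow rules carrying their TCP/UDP port strings;
    their URLs are assumed to be covered by those IP rules. Entries with only
    'urls' have no published IPs, so they need a URL-based path (e.g. a proxy).
    """
    ip_rules, url_only = [], []
    for entry in entries:
        if entry.get("ips"):
            for prefix in entry["ips"]:
                network = ipaddress.ip_network(prefix)  # raises on a malformed prefix
                ip_rules.append({
                    "network": str(network),
                    "tcp": entry.get("tcpPorts", ""),
                    "udp": entry.get("udpPorts", ""),
                })
        else:
            url_only.extend(entry.get("urls", []))
    return ip_rules, url_only
```

Running this on a schedule and only pushing changes when `needs_update` returns `True` keeps firewall rules in step with Microsoft's published ranges without manual tracking.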

Because this method of egress is efficient and low impact, letting connections flow directly using the protocol of choice, it is the recommended method to connect your Office 365 services for the Optimize marked endpoints, and ideally for the Allow marked endpoints too.

Proxied Access

This is a common method used to connect to the internet as it simplifies the connectivity process and allows a centralized device to control access and intercept traffic.

Pros:

• Easy to configure to get Office 365 connected
• Often the existing internet access method
• Small number of IP addresses for clients to direct traffic to
• Uses known ports for easy firewall traversal
• No need to route external IP addresses on the internal network
• Easy monitoring/auditing
• Provides a security barrier between clients and the internet

Some of these pros mean a proxy is the only way some customers can access the internet without major network redesign work. All Office 365 services will work through a proxy, even Skype for Business; however, there are a large number of drawbacks to this method.

Cons:

• Proxies generally do not handle UDP traffic, so Skype traffic is forced over TCP to traverse the proxy.
• Skype's coping mechanisms for poor networks are drastically reduced when TCP is used.
• The proxy functions can delay frames on their way through, adding jitter and latency.
• Older proxies often struggle to deal with the long-lived, high-throughput connections SaaS services entail.
• Proxies commonly alter TCP-level settings, which can cause performance issues.
• SSL issues can also occur, as the proxy is a "man in the middle".
• They often don't scale, and were not installed or designed with SaaS services in mind.
• Capacity upgrades to cope with the additional workload SaaS services add, in terms of ports, memory and processing, are likely to be very expensive.
• The end result is often poor-quality calls and performance.

As you can see, there are some considerable downsides to standard proxies when it comes to SaaS services. Skype traffic is highly likely to run into issues via standard on-premises proxies, because it is forced onto TCP (non-optimal for real-time voice/video traffic, as opposed to UDP), and the processing at the proxy layer is likely to introduce issues such as jitter. Imagine the load the proxy would be under handling thousands of real-time TCP media sessions during an all-hands call, for example: scaling up to handle this is likely to be very expensive, and still runs the risk of causing performance issues due to the protocol used.

The other thing to bear in mind is that proxies were likely designed for access to transient endpoints: a TCP connection is made to a website, the data is obtained, the session is closed, and the resources (memory, processing, ports) are returned to the pool. SaaS services tend to work very differently. Outlook, for example, will open multiple TCP connections per user and keep them in use all day, so the resources aren't returned to the pool as they would be with transient access, and again the devices need upgrading to deal with this extra load. We recommend around 2000-4000 clients per public IP address for network address translation.
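The NAT guidance above turns into simple capacity arithmetic. This sketch computes how many public IP addresses a given user count needs at the 2000-4000 clients-per-IP range, plus the number of long-lived sessions a device must hold at once. The `connections_per_user` figure is a hypothetical input; the post only says Outlook opens multiple connections per user, so measure your own clients rather than trusting any example value.

```python
import math


def public_ips_needed(users, clients_per_ip):
    """Public IPs needed to NAT `users` clients at `clients_per_ip` per address."""
    if users <= 0:
        return 0
    return math.ceil(users / clients_per_ip)


def nat_range(users, low=2000, high=4000):
    """(fewest, most) public IPs for the 2000-4000 clients-per-IP guidance."""
    return public_ips_needed(users, high), public_ips_needed(users, low)


def held_sessions(users, connections_per_user):
    """Long-lived sessions a proxy or NAT device must hold concurrently.

    `connections_per_user` is a hypothetical, site-specific measurement.
    """
    return users * connections_per_user
```

For example, `nat_range(10000)` returns `(3, 5)`, and with a hypothetical four persistent connections per user, `held_sessions(10000, 4)` is 40,000 sessions that stay open all day rather than returning to the pool.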

Because of the high risk of these devices causing performance issues, given that they were designed for a different purpose, Microsoft recommends you don't use these types of proxy solutions for the Optimize marked endpoints, and ideally not for the Allow marked ones either, unless absolutely necessary. If there is no other option, or there is a very strong requirement to use proxies, the following advice should be followed.

• Ensure the devices are scaled up to cope with SaaS services, in terms of memory, processing and NAT capability

• Avoid overly centralized proxies which can increase latency

• Ensure they are in the local region of the client

• Evaluate cloud proxy nodes if the above isn't possible, as these can often allow for localized, scalable proxy use

• Avoid packet inspection (i.e. SSL break & inspect)

• Ensure all settings are checked and optimized

• Avoid using Skype for Business through these devices unless UDP traffic can bypass them
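A common way to keep Optimize traffic off an on-premises proxy, while still proxying everything else, is a proxy auto-config (PAC) file that returns `DIRECT` for those hosts. The sketch below generates one from a list of host patterns using the standard PAC functions `FindProxyForURL` and `shExpMatch`. The proxy address `proxy.example.com:8080` is a hypothetical placeholder, and the direct path still needs a matching firewall rule. A PAC file only steers applications that honour it.

```python
def build_pac(direct_hosts, proxy="proxy.example.com:8080"):
    """Return PAC text sending `direct_hosts` DIRECT and all else via `proxy`.

    `direct_hosts` would typically be the Optimize (and possibly Allow) URLs.
    Wildcard patterns such as '*.example.com' are passed to shExpMatch as-is;
    patterns are lower-cased because shExpMatch is case-sensitive.
    """
    patterns = sorted({h.lower() for h in direct_hosts})
    body = []
    if patterns:
        tests = " ||\n        ".join(f'shExpMatch(host, "{p}")' for p in patterns)
        body.append(f'    if ({tests}) {{\n        return "DIRECT";\n    }}')
    # Everything not matched above goes through the (hypothetical) proxy.
    body.append(f'    return "PROXY {proxy}";')
    return "function FindProxyForURL(url, host) {\n" + "\n".join(body) + "\n}\n"


if __name__ == "__main__":
    print(build_pac(["outlook.office.com", "outlook.office365.com"]))
```

Generating the file from the published endpoint list, rather than editing it by hand, means the bypass stays accurate as Microsoft adds or retires endpoints.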

Whilst this is not a recommendation of any vendor over another, Microsoft are working with various vendors such as Zscaler and Blue Coat to help better align cloud proxy products with best practices for Office 365. Zscaler, for example, have a button which automatically optimizes Office 365 traffic (e.g. disables SSL offload) for customers who use the service.

If a proxy is a requirement for your business, it's worth checking that your current implementation is going to work well with SaaS services like Office 365. If not, it's worth talking to your chosen vendors about remediating this and about their alignment with the Office 365 (and cloud in general) connectivity principles discussed in this post, and ensuring those principles are followed during implementation.

A final point on proxies: they are absolutely a supported method for our customers to reach Office 365. However, they are very likely to cause performance issues if they are not redesigned and uplifted from their old internet access design for their new use with cloud services.

ExpressRoute

ExpressRoute is private peering with the Microsoft global network described above. Essentially, it's simply a private network connection from the edge of the customer network to the edge of Microsoft's network (the same network you'd reach over the internet) avoiding the leg which the internet takes in connecting to Microsoft. This private network can carry some elements of Microsoft bound traffic via two types of peering:

1. Azure Private – Connecting to virtual networks in Azure (e.g. to private IP addresses on virtual machines)

2. Microsoft Peering – Connecting to Azure public IP ranges and a subset of Office 365 endpoints (the same public endpoints reachable over the internet path)

These two types of peering are the recommended connection methods to Azure endpoints; they are relatively easy to configure, requiring little change to the corporate network infrastructure. The Office 365 elements of Microsoft peering are the area we're interested in with regard to Office 365 connectivity. This type of peering requires authorization from Microsoft to enable, for good reasons, but let's look at its pros first.

Pros:

• We can provide a 99.95% SLA for availability

• Because it's a dedicated circuit for Microsoft traffic and managed end-to-end, it can provide predictable performance and bandwidth

• It can provide better QoS capability than the internet path for Skype (all QoS markings are stripped at the edge over the internet path).

• It avoids the internet path for the bulk of Office 365 traffic; some organizations have a regulatory requirement for this.

• If configured to do so, it allows a customer to bypass network egress equipment doing SSL interception or other behaviour which may cause a bottleneck

• It allows Skype for Business to use UDP which is the preferred protocol for performance on real-time traffic
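QoS only helps where DSCP markings survive end to end, which is why the ExpressRoute path matters here. As an illustration of what a classifier on such a path might do, this sketch maps a Skype for Business client media source port to a traffic class and DSCP value. The port ranges (audio 50000-50019, video 50020-50039, application sharing 50040-50059) and DSCP values (EF 46, AF41 34, AF21 18) are assumptions based on commonly published Skype for Business Online QoS guidance; verify them against current Microsoft documentation and your own client configuration before use.

```python
# (first port, last port, traffic class, DSCP). Assumed values; verify them first.
MEDIA_CLASSES = [
    (50000, 50019, "audio", 46),        # EF
    (50020, 50039, "video", 34),        # AF41
    (50040, 50059, "app sharing", 18),  # AF21
]


def dscp_for(src_port):
    """Return (traffic class, DSCP) for a client source port.

    Anything outside the assumed media ranges is left as best effort (DSCP 0).
    """
    for first, last, traffic_class, dscp in MEDIA_CLASSES:
        if first <= src_port <= last:
            return traffic_class, dscp
    return "best effort", 0
```

Over the internet path these markings would be stripped at the edge, so the same classification only pays off on a network that honours it end to end.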

Cons:

• Performant, working internet connectivity is still required for endpoints which cannot use ExpressRoute (e.g. DNS, CDN, CRL checks). As such, if this internet pipe is not available, Office 365 is largely inoperable.

• A well configured internet connection can, in many cases, give a similar, or in some cases, better performance levels (for example if the internet peer point was closer than the nearest ExpressRoute peer point).

• Often encourages the hub and spoke model for connectivity which runs contrary to the high-level guidance of local connectivity.

• Often a higher cost of implementation and usage than a standard internet connection. (This isn't always the case depending on the locality and equipment upgrades required)

• Enabling this type of peering (Microsoft as opposed to Azure) is very complex, and without what we typically see as 2-6 months of planning and work from a large, cross-skilled team, it will very likely result in an outage of your Office 365 implementation

• If the direct method is used (i.e. non-proxied to the edge) then external DNS and IP routing needs to be available as with the direct internet path above.

• It is often possible to resolve performance issues more quickly, more easily and at a lower cost by isolating the issue and resolving it using an unhindered, direct internet peered connectivity model or another optimized method.

• Investment in your internet egress is likely to be beneficial to other cloud elements outside of Office 365
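The first con above is easy to underestimate, so it helps to list exactly which endpoint sets cannot be carried over Microsoft peering and must keep working over the internet. This sketch assumes each endpoint set from the Office 365 IP Address and URL web service carries a boolean `expressRoute` field (treated as `False` when missing) and an `id`. Endpoints that are not in the published list at all, such as general DNS resolution, also stay on the internet path.

```python
def split_by_path(entries):
    """Separate endpoint-set IDs that can use ExpressRoute (Microsoft peering)
    from those that still require the internet path.

    Each entry is assumed to have an 'id' and an optional boolean
    'expressRoute' field; a missing field is treated as False here.
    """
    expressroute, internet = [], []
    for entry in entries:
        target = expressroute if entry.get("expressRoute", False) else internet
        target.append(entry["id"])
    return expressroute, internet
```

Everything in the `internet` list needs a performant internet path regardless of any ExpressRoute investment, which is a useful input when weighing the cost of both.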

Because of these cons, especially the complexity and the high risk of outages if not correctly implemented, Microsoft have a policy of reviewing requests to use ExpressRoute for Office 365. This lets us discuss the pros and cons with the customer and ensure that all parties are aware of the 2-6 months of planning, the extra complexity, and what ExpressRoute can and cannot deliver. The end goal is that, if the customer chooses ExpressRoute, they have made a fully informed decision that it's the right thing for their business, and that they are aware of the guidance needed to make the implementation a success and deliver the desired benefits before spending the time, money and effort implementing it.

We have a wealth of technical guidance covering implementation and routing, along with training videos and Ignite 2016 content, which should give you a good overview of what is required to implement this type of peering. As you can see, this option is not for everyone; in fact, a direct internet path connection is best for the majority of use cases. However, if after reviewing the links here you think ExpressRoute might be the right thing for your organization's Office 365 traffic, contact your Microsoft account team for assistance in requesting a review for approval, and we can then work with you to help you make the right decision for your business.

So, in summary, there is an array of ways to get your traffic to Microsoft's network. Direct networking generally works best for the core endpoints, ensuring unhindered and quick access. Where you need to use a proxy, ensure it isn't causing a bottleneck to your traffic and that it follows the principles outlined in this post. Finally, ExpressRoute for Office 365 is not simple to implement and isn't for everyone.
