The Curious Case of QoS and the Intermediate Devices

Hi! My name is Gopal and I work with the Microsoft Networking Escalation team. I am writing this post to discuss an interesting scenario we recently ran into while troubleshooting QoS and DSCP, in the hope that it helps others out there who may be in a tight spot with their own implementation.

The scenario revolves around an issue on which our customer spent many sleepless hours, tirelessly trying to get QoS working, before eventually deciding to give us a ring.

Before we get into the scenario, below is a quick overview of QoS, along with references on how QoS works in Server 2003, Server 2008 and later.

What is QoS?

When packets are delivered over the network, the usual approach is “Best Effort” delivery. This means that no matter how many applications on a box or on a network need to communicate, the data for every one of them is delivered on the same best-effort basis.

So an application that can tolerate a certain amount of delay or data loss is treated just the same as another application that cannot. For example, in a video conferencing application the data is real time and can be rendered useless if it is not delivered on time. So if there are other applications contending for the network that can take a back seat, it makes more sense to prioritize delivery of the real-time data tied to the video application ahead of the others.

QoS helps us achieve just that. QoS helps applications and network administrators prioritize data for delivery. The goal of QoS is to provide preferential delivery service for the applications that need it by ensuring sufficient bandwidth, controlling latency and jitter, and reducing data loss. In short, the network characteristics that QoS helps to manage are bandwidth, latency, jitter and reliability.

For QoS to be implemented end-to-end it is important to ensure that the network infrastructure in between is enabled for QoS as well. For example, for QoS to function, the network cards, the switches, the routing infrastructure in between the source and the destination all have to be QoS aware. Though most of the devices these days are QoS aware, support for QoS is normally turned off by default. If QoS is not enabled on the devices in between, the packets are treated as any other normal packets and “best effort delivery” comes into the picture again.

Detailed information on how QoS works in Server 2003 can be found in the QoS technical reference:
http://technet.microsoft.com/en-us/library/cc739780(v=ws.10).aspx

QUICK RUNDOWN ON LAYER 2 AND LAYER 3 MARKINGS:

802.1p is a protocol that specifies prioritization at layer 2 (the MAC layer) of the OSI model. For layer 2 markings to be applied, 802.1p has to be enabled in the NIC driver, the layer 2 devices must support it, and the appropriate group policies have to be in place.
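As a rough illustration of where the 802.1p priority lives on the wire: it is carried in the 3-bit Priority Code Point of the 16-bit Tag Control Information (TCI) value inside an 802.1Q VLAN tag, next to a 1-bit drop-eligible flag and a 12-bit VLAN ID. The sketch below only does the bit arithmetic; the function names and the sample priority and VLAN ID are made up for the example.

```python
def build_tci(priority: int, vlan_id: int, dei: int = 0) -> int:
    """Pack an 802.1p priority, DEI bit and VLAN ID into a 16-bit TCI value."""
    if not 0 <= priority <= 7:
        raise ValueError("802.1p priority is a 3-bit value (0-7)")
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID is a 12-bit value (0-4095)")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    return (priority << 13) | (dei << 12) | vlan_id


def parse_tci(tci: int) -> dict:
    """Split a 16-bit TCI value back into its three fields."""
    if not 0 <= tci <= 0xFFFF:
        raise ValueError("TCI must fit in 16 bits")
    return {
        "priority": tci >> 13,       # top 3 bits: 802.1p priority
        "dei": (tci >> 12) & 0x1,    # next bit: drop eligible indicator
        "vlan_id": tci & 0x0FFF,     # low 12 bits: VLAN ID
    }


if __name__ == "__main__":
    # Hypothetical example: priority 5 on VLAN 100
    tci = build_tci(priority=5, vlan_id=100)
    print(hex(tci), parse_tci(tci))
```

Because the priority sits inside the VLAN tag, a switch port that strips or never adds the tag also loses the layer 2 marking.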

Support for 802.1p and the prioritization value are configured via GPO. The configuration for layer 2 markings is located under Computer Configuration->Administrative Templates->Network->QoS Packet Scheduler->Layer 2 priority value. Here the priority value can be set for each of the following service types:

Best Effort, Controlled Load, Guaranteed Service, Network Control, Qualitative Service.

DiffServ, on the other hand, is the mechanism used for prioritization at layer 3 of the OSI model. It enables routers and other layer 3 devices, such as layer 3 switches, between the source and the destination to look at the packet markings and prioritize delivery accordingly.

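The TOS (Type of Service) field in the IP header is what carries the DiffServ value. A Differentiated Services Code Point (DSCP) value is used to mark the packets according to the specified priority. The field is 8 bits wide: the DSCP occupies the first 6 bits, and the last 2 bits are not part of the DSCP (they were originally left unused and are now used for Explicit Congestion Notification, ECN).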
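To make the bit layout concrete, here is a minimal Python sketch that converts between a DSCP value and the full 8-bit field: the DSCP sits in the upper 6 bits and the remaining 2 low-order bits are the ones now used by ECN. The function names are my own, chosen for illustration only.

```python
def dscp_to_tos(dscp: int, ecn: int = 0) -> int:
    """Return the 8-bit TOS/DS byte for a 6-bit DSCP and a 2-bit ECN value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN is a 2-bit value (0-3)")
    return (dscp << 2) | ecn


def split_tos(tos: int) -> tuple:
    """Split an 8-bit TOS/DS byte into (dscp, ecn)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    return tos >> 2, tos & 0x3


if __name__ == "__main__":
    # DSCP 46 is the value used for the audio policy later in this post
    tos = dscp_to_tos(46)
    print(f"DSCP 46 -> TOS byte 0x{tos:02X}")
    print(split_tos(tos))
```

This is handy when a tool shows the raw byte rather than the DSCP: a byte of 0xB8 in a hex dump is DSCP 46, not "184".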

As with 802.1p for the layer 2 priority, the DSCP values are also controlled via GPO, located under Administrative Templates->Network->QoS Packet Scheduler->DSCP value of conforming packets. The service types that can be controlled are the same: Best Effort, Controlled Load, Guaranteed Service, Network Control and Qualitative Service.

WHERE ARE THE POLICIES LOCATED?

[Image: location of the QoS Packet Scheduler policies]

QoS AND THE 2008 WAY OF DOING IT:

With the advent of the new TCP/IP stack in Server 2008 and later, the implementation of QoS brought in more fine-grained control and a better user experience, thanks to better control over bandwidth usage and latency based on prioritization.

All the detail on how this works in 2008, along with a beautiful walkthrough of the architecture, can be found in our very own Cable Guy's article here 🙂

http://technet.microsoft.com/library/bb878009

As in 2003, policy-based QoS on 2008 can be controlled via GPO, located under Computer Configuration->Windows Settings->Policy-based QoS.

[Image: Policy-based QoS in the Group Policy editor]

References that give you all the detail that you need around QoS can be found below:

http://technet.microsoft.com/en-us/library/cc757120(v=ws.10).aspx

http://technet.microsoft.com/en-us/library/cc728211(v=ws.10).aspx

http://technet.microsoft.com/en-us/library/dd759093.aspx

http://technet.microsoft.com/en-us/library/cc771283.aspx

http://technet.microsoft.com/en-us/network/bb530836.aspx

http://technet.microsoft.com/library/bb878009

Now, without further ado, let me take you straight into the scenario that we battled against.

SCENARIO:

Our customer Scott (a fictional name) was battling against a deadline to set up QoS markings for Lync traffic in his environment. The 2012 Lync servers had been set up and were all up and running, doing their job wonderfully well. There was only one small problem, though.

Scott had set up the 2012 Lync servers for QoS markings; however, the network traces he had been taking did not show any packets being marked at all. The goal of setting this up was to improve the quality of voice calls by prioritizing voice traffic.

The server side policy setup looked quite simple as you can see below:

[Image: server-side QoS policy]

As you can see, the policy looked pretty generic and straightforward: it was set to tag traffic within a specific port range for audio with DSCP value 46, for both TCP and UDP, irrespective of the application.

With all of this in place, we still were not seeing any tagged packets from the server on the client side!

We went ahead and checked to see if the policy had made its way through to the client side. Looking in the registry, we clearly saw that the policy had come down under:

HKLM\software\policies\microsoft\windows\qos\policy_name

[Image: QoS policy in the registry]

We had live tracing going on at the time, with the trace running on the client side. From the trace we clearly saw that the packets from the Lync server were not showing up with the DSCP tag on the client side. The data was from the call I was having with Scott himself (pretty cool, huh?) and was taken from a mirrored port, so there was no way to misinterpret it: the packets from the server did not seem to have the tags at all!

We had to get back to doing some fundamental research and found the following articles on how to configure Lync for QoS:

Creating a QoS policy on a Lync server: http://technet.microsoft.com/en-us/library/gg405413(v=ocs.14).aspx

Creating a client QoS policy: http://technet.microsoft.com/en-us/library/gg405414(v=ocs.14).aspx

After going through them and making sure the settings were in place, we decided to do some testing, as Scott wanted this fixed as a priority due to business requirements. In a similar situation, you can always use the following PowerShell command to list the current QoS policies:

Get-NetQosPolicy

[Image: Get-NetQosPolicy output]

Whenever you run into a similar situation, a simple and safe test is to create a test QoS policy for an application such as telnet.exe. We did exactly that in this scenario: we opened the local group policy editor, added a new local QoS policy for the telnet client application and gave it a sample DSCP value.

We ran this test to see whether the telnet packets from the server were being QoS tagged, but the results we got from the subsequent tracing turned out to be extremely interesting. We took a trace on the server side while running telnet tests from the server and found that all the telnet packets were being tagged as expected.

However, since the voice traffic from the client was also reaching the server while the trace was running, we filtered the server-side trace for traffic involving the phone's IP address as well, to see what the VoIP traffic looked like. What we found now was that all traffic from the server to the phone was being tagged with DSCP value 46 as well.

The DSCP tag field can be found inside the IPv4 header, as shown below. (The packet below does not carry a tag; it is included only to show the packet structure for reference.)

+ NetEvent:
+ MicrosoftWindowsNDISPacketCapture: Packet Fragment (98 (0x62) bytes)
+ Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-15-5D-64-9C-0A],SourceAddress:[B4-14-89-E3-F1-C1]
- Ipv4: Src = 10.7.254.99, Dest = 10.107.100.72, Next Protocol = UDP, Packet ID = 54723, Total IP Length = 84
    + Versions: IPv4, Internet Protocol; Header Length = 20
    - DifferentiatedServicesField: DSCP: 0, ECN: 0
        DSCP: (000000..) Differentiated services codepoint 0
        ECT: (......0.) ECN-Capable Transport not set
        CE: (.......0) ECN-CE not set
    TotalLength: 84 (0x54)
    Identification: 54723 (0xD5C3)
    + FragmentFlags: 0 (0x0)
    TimeToLive: 122 (0x7A)
    NextProtocol: UDP, 17(0x11)
    Checksum: 62391 (0xF3B7)
    SourceAddress: 10.7.254.99
    DestinationAddress: 10.107.100.72
+ Udp: SrcPort = 4102, DstPort = 55822, Length = 64
+ Rtp: PayloadType = Audio, Codec: RT Audio, ClockRate: 8000, P-Times: 20,40,60, Channels: 1, SSRC = 3434382355, Seq = 3280, TimeStamp = 842549477

When tagged, the DifferentiatedServicesField would be tagged with the appropriate tag value.
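If you want to check the field yourself rather than rely on the parser in your capture tool, the DSCP can be read straight from the second byte of the IPv4 header. The Python sketch below is only an illustration: the 20 header bytes are my own reconstruction from the field values shown in the trace above (they are not copied from the capture), and the function name is made up.

```python
import socket
import struct

# Header bytes rebuilt from the fields in the trace above (an assumption,
# not raw capture data): version 4, header length 20, DSCP 0, total length 84,
# ID 54723, TTL 122, protocol UDP, checksum 0xF3B7, and the two addresses.
SAMPLE_HEADER = bytes.fromhex("45000054d5c300007a11f3b70a07fe630a6b6448")


def parse_ipv4_header(header: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header, including DSCP/ECN."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_ihl, tos, total_length, ident, _flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "dscp": tos >> 2,                       # upper 6 bits of the TOS byte
        "ecn": tos & 0x3,                       # lower 2 bits of the TOS byte
        "total_length": total_length,
        "identification": ident,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


if __name__ == "__main__":
    for name, value in parse_ipv4_header(SAMPLE_HEADER).items():
        print(f"{name}: {value}")
```

For a packet tagged with DSCP 46, the second byte would read 0xB8 instead of 0x00, and the same function would report a DSCP of 46.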

So here we had a scenario where all the QoS policies were in place on both the client and the server sides. From the client-side data, however, the packets sent by the server did not show up as tagged. At the same time, the server-side trace showed that the packets sent by the server were indeed being tagged, but the packets reaching the server from the client did not seem to have the tags (note that we clearly saw the tags being placed by the client in the client-side trace).

As far as the client and the server were concerned, they were doing their job of tagging the packets with the appropriate QoS tag values, however when the packets reached either side they were being stripped of their DSCP code value. This was a clear indication of the fact that some intermediate device on the network was taking the DSCP tag values out and thus making life painful in this scenario.

Whenever QoS needs to be implemented, it is absolutely crucial to ensure that the network in between can support the implementation: every device along the path needs to be both capable of and enabled for QoS tagging and prioritization. Even with all the necessary settings in place on the endpoints, a single device in between can wreak havoc on the entire implementation. Taking simultaneous traces on both sides always helps reveal the whole picture in circumstances like these.

With help from the data and a careful investigation of the setup, we narrowed Scott's problem down to a network device and got it fixed. I hope this article helps someone else out there as well.

Many thanks for patiently reading through this and for visiting our blog today. And always remember: when you need some help, just shout! 🙂
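The simultaneous-trace comparison can be sketched in a few lines of Python. This is only a rough illustration under some loud assumptions: that you have already exported the source, destination, IP identification and DSCP of each packet from both traces, that no NAT device rewrites addresses along the path, and that the captures are short enough for the 16-bit IP ID not to repeat. The Packet type, the function name and the sample values are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Minimal per-packet record exported from a trace (hypothetical format)."""
    src: str
    dst: str
    ip_id: int
    dscp: int


def find_dscp_changes(sent: Iterable[Packet],
                      received: Iterable[Packet]) -> List[Tuple[Packet, int]]:
    """Return (sent_packet, dscp_seen_at_receiver) for packets whose DSCP changed."""
    # Index the receiver-side trace by a key that should survive the trip
    received_by_key = {(p.src, p.dst, p.ip_id): p for p in received}
    changes = []
    for packet in sent:
        match = received_by_key.get((packet.src, packet.dst, packet.ip_id))
        if match is not None and match.dscp != packet.dscp:
            changes.append((packet, match.dscp))
    return changes


if __name__ == "__main__":
    # Made-up sample data: the sender tags with DSCP 46, the receiver sees 0
    sender_trace = [Packet("10.7.254.99", "10.107.100.72", 54723, 46),
                    Packet("10.7.254.99", "10.107.100.72", 54724, 46)]
    receiver_trace = [Packet("10.7.254.99", "10.107.100.72", 54723, 0),
                      Packet("10.7.254.99", "10.107.100.72", 54724, 0)]
    for packet, seen in find_dscp_changes(sender_trace, receiver_trace):
        print(f"ID {packet.ip_id}: sent with DSCP {packet.dscp}, arrived with DSCP {seen}")
```

If every matched packet leaves with one DSCP value and arrives with another, the endpoints are doing their job and the change is happening somewhere on the path between the two capture points.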

 

With support from Joel Christiansen.

– Gopalakrishnan Krishnan

Platforms Networking team