Top 10 Networking Features in Windows Server 2019: #10 Accurate Network Time

 
This blog is part of a series for the Top 10 Networking Features in Windows Server 2019!
-- Click HERE to see the other blogs in this series.

Look for the Try it out sections, then give us some feedback in the comments!
Don't forget to tune in next week for the next feature in our Top 10 list!

Windows Server 2019 provides regulatory compliance with highly accurate time that is traceable and UTC-compliant, including support for leap seconds. In this article, we’ll talk about the technical advances we made between Windows Server 2016 and Windows Server 2019, including true UTC-compliant leap second support, a new time protocol called Precision Time Protocol, and end-to-end traceability. But before we talk about the technical details, let’s talk about why this matters to you.

In the past, the requirement for time accuracy on Windows was limited to domain-based scenarios that required all devices to be synchronized within 5 minutes. Now, government regulations worldwide (for example, US: FINRA; EU: ESMA/MiFID II) demand much more accurate time, as stringent as 100 µs (microseconds). Self-proclaimed accuracy is not enough. You must also be able to prove, or “trace,” your time back to an authoritative time source (more on this later). ESMA justifies the accuracy and traceability requirements this way: “…It is also essential for conducting cross-venue monitoring of orders and detecting instances of market abuse and allows for a clearer comparison between the transaction and the market conditions prevailing at the time of their execution.”

As a result, we first brought 1 ms (millisecond) time accuracy to Windows Server 2016, meeting some of the regulatory requirements; this is supported in-market today. However, our work was not done, so Windows Server 2019 makes further improvements to comply with these regulations and to make Windows the preferred choice for workloads with time dependencies. Now, let’s talk a little bit about the features you’ll find in Windows Server 2019 and current Insider builds.

Important! While many of our efforts directly address concerns from regulated industries, 
this technology applies to any industry, application, or cloud service with a time dependency.

There’s a lot of content in this article (because we did a lot!), so here’s a quick summary of what you’ll find:

  • Compliant Leap Second Support
  • Accuracy Improvements (Precision Time Protocol, Software Time-stamping, Clock Source Stability)
  • Traceability (including system logging, performance counters, and our work with partners)

Leap Second Support

A leap second is an occasional 1-second adjustment to UTC. Now you may be thinking, “Why on earth would anybody need to adjust UTC?” As the earth’s rotation slows, UTC (an atomic timescale) diverges from mean solar time, or astronomical time. Leap seconds are inserted to keep UTC within 0.9 seconds of mean solar time. Since the practice of inserting leap seconds began in 1972, a leap second has typically occurred about every 18 months (for more information, please see the Leap Second FAQ).

In the US, the maximum end-to-end divergence from UTC(NIST) is 50 ms, and it’s even stricter in the EU. Windows Server 2019 must therefore maintain accuracy during a leap second.

Note: It’s not enough to apply leap seconds; it matters how you apply them.
Leap second smearing has been condemned by the time authorities at NIST and other national labs
around the world. As such, Microsoft will not include a smearing option in Windows Server 2019.

Keep reading to understand the difference between the Microsoft approach,
and the non-compliant practice of leap second smearing.

To most, this seems like such a simple idea: just add 1 more tiny, little, insignificant second to the day. As IT Pros, we remember all those Y2K shenanigans that had us (rightfully) a little… well… worried…

So how does a leap second actually work?  Normally, computers keep seconds from 0 through 59 for a total of 60 seconds.  When a leap second occurs, an extra second is added to the last minute of the UTC day and the clock goes from 0 through 60 for a total of 61 seconds.

On the clock it looks like this (in my time zone, the last minute of the UTC day is actually 4:59 PM local time):

Without a Leap Second    With a Leap Second
16:59:58                 16:59:58
16:59:59                 16:59:59
17:00:00                 16:59:60
17:00:01                 17:00:00
17:00:02                 17:00:01

Important: Some of the “gurus” out there (I’m looking at you, Neil deGrasse Tyson) might
rightfully say, “Technically, there can be both positive and negative leap seconds. A positive
leap second adds one second to the day, and a negative leap second removes one second from the day.”

Rest assured, Neil: while a negative leap second has never actually occurred, if one does, you
can still celebrate your leap seconds with very tiny bottles of champagne. We’ll support both 😊
Here's how it looks with a negative leap second:

Without a Leap Second    With a Leap Second
16:59:57                 16:59:57
16:59:58                 16:59:58
16:59:59                 17:00:00
17:00:00                 17:00:01
17:00:01                 17:00:02
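The second labels in the two tables above follow one rule: the last minute of the UTC day has 60 + L seconds, where L is +1 for a positive leap second, -1 for a negative one, and 0 otherwise. The Python sketch below regenerates both tables from that rule. The fixed 16:59/17:00 labels mirror the local time zone used in the tables and are an assumption of the example; nothing here describes how Windows represents time internally.

```python
def final_minute_seconds(leap: int) -> list[int]:
    """Second labels of the last minute of a UTC day.

    leap is +1 for a positive leap second (an extra :60),
    -1 for a negative leap second (no :59), and 0 for an ordinary day.
    """
    if leap not in (-1, 0, 1):
        raise ValueError("leap must be -1, 0, or +1")
    return list(range(60 + leap))


def clock_around_boundary(leap: int, before: int, after: int,
                          last_minute: str = "16:59",
                          next_minute: str = "17:00") -> list[str]:
    """Clock readings for the last `before` seconds of the UTC day followed by
    the first `after` seconds of the next day (local labels are assumed)."""
    if before < 1 or after < 0:
        raise ValueError("need before >= 1 and after >= 0")
    tail = final_minute_seconds(leap)[-before:]
    readings = [f"{last_minute}:{s:02d}" for s in tail]
    readings += [f"{next_minute}:{s:02d}" for s in range(after)]
    return readings


# Positive leap second table: ordinary day on the left, leap day on the right.
for row in zip(clock_around_boundary(0, 2, 3), clock_around_boundary(+1, 3, 2)):
    print("   ".join(row))
```

Calling clock_around_boundary(0, 3, 2) and clock_around_boundary(-1, 2, 3) in the same way reproduces the negative leap second table, where 16:59:59 never appears.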

The problem with Leap Second smearing

As noted above, we will not include a leap second smearing option. Leap second smearing (where you carve the extra second up into smaller units and add them throughout the day) has “an error of order ±0.5 s with respect to the definition of UTC” (see below). As noted previously, this will not meet the accuracy requirements in these regulated industries. And as outlined below, there is no standard method for applying smearing frequency adjustments, which can lead to disagreement between time stamps. As such, smearing does not meet customer regulatory requirements.

In their 2018 paper, “Metrological and legal traceability of time signals,” presented at the Precise Time and Time Interval Meeting, industry leaders from NIST and USNO outlined these two primary problems with leap second smearing:

Some corporations, in an attempt to minimize the impact on their systems and eliminate the discontinuity, have implemented “smears”, that slow down their clocks for a period around the time of the leap second insertion.

This method has the advantage that the time stamps are monotonically increasing even in the vicinity of the leap second, but it has an error of order ±0.5 s with respect to the definition of UTC.

In addition, there is no standard method for applying this frequency adjustment, so that different implementations may disagree among themselves in addition to the time error with respect to UTC.
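To see where the ±0.5 s figure and the disagreement between implementations come from, the Python sketch below models a hypothetical linear smear centered on a positive leap second and compares the smeared clock with UTC. The window lengths are illustrative assumptions, not descriptions of any vendor’s implementation, and the model deliberately skips the one second during which UTC itself reads 23:59:60.

```python
def linear_smear_offset(x: float, window: float) -> float:
    """Offset in seconds of a linearly smeared clock relative to UTC.

    x      -- elapsed SI seconds measured from the start of the inserted
              leap second (UTC reads 23:59:60 for 0 <= x < 1).
    window -- total smear duration in SI seconds; in this hypothetical model
              the smear starts window/2 seconds before the leap second.
    """
    if 0 <= x < 1:
        # UTC itself reads 23:59:60 here; this simple model does not cover it.
        raise ValueError("offset is not modeled during 23:59:60")
    start = -window / 2
    # Fraction of the extra second the smeared clock has absorbed so far.
    absorbed = min(max((x - start) / window, 0.0), 1.0)
    smeared = x - absorbed
    # After the leap second, UTC readings lag a continuous SI count by 1 s.
    utc = x if x < 0 else x - 1
    return smeared - utc


one_hour_before = -3600.0
a = linear_smear_offset(one_hour_before, 24 * 3600)  # hypothetical 24-hour smear
b = linear_smear_offset(one_hour_before, 10 * 3600)  # hypothetical 10-hour smear
print(f"{a:+.4f} s vs {b:+.4f} s; they disagree by {abs(a - b) * 1000:.1f} ms")
```

In this model, each smeared clock is about half a second behind UTC just before the leap second and about half a second ahead just after it. One hour before the leap second, the two hypothetical smears disagree with each other by roughly 58 ms, already more than the 50 ms US budget mentioned earlier.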

I’m sure there will be many implementation and application compatibility questions stemming from this article; please stay tuned for more detailed information. In the meantime, please note that for regular day-to-day operations, you won’t need to change anything. Check the “Leap Seconds for the Dev” validation guide for examples.


Ready to give it a shot!?   Download the latest Insider build and Try it out!
Leap Seconds for the IT Pro          Leap Seconds for the Dev

Accuracy Improvements

We’re also improving the platform’s inherent accuracy. First, why is it so hard to get the time right? While the answer may not be immediately apparent, there are a lot of pieces working against time-sensitive systems, some of which I’ve listed below:

Here’s some of the work we did to address each of the challenges listed above:

Precision Time Protocol:

In Windows Server 2019, Windows will include a new time synchronization protocol called Precision Time Protocol (PTP). You may be asking yourself, what’s wrong with NTP? It’s served us well for so many years!

Think back to the last thunderstorm you saw. Did you see lightning and hear thunder at the same time? Unless you’re very close to the storm, you’ll likely notice a delay between seeing the lightning and hearing the thunder. How long is that delay? It’s not based strictly on the speed of sound and your distance from the storm; it’s also affected by buildings or other influences that introduce additional acoustic delay. If you want to know just how close you are to the storm, you’d have to consider all the influences.

Likewise, there is delay (latency) introduced in the timing packets being passed from the time server across the network.  If that delay is not accounted for, or if it is not symmetric (equal in both directions – to and from the client), then it becomes increasingly difficult for the client to properly apply the time stamp sent from the time server.

Network Time Protocol (NTP) has long been the primary time synchronization method for Windows, but unfortunately NTP does not have a solution to this problem: NTP assumes that the network delay is symmetric, that is, that the request and the response each take half of the measured round-trip time.
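The symmetric-delay assumption is built into the standard NTP on-wire calculation from RFC 5905. The Python sketch below computes the clock offset and round-trip delay from the four timestamps of one exchange, then simulates two exchanges. The delay values are made up for illustration, and this is the textbook formula rather than the Windows Time service’s code.

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Classic NTP on-wire calculation (RFC 5905).

    t1: client transmit (client clock)   t2: server receive (server clock)
    t3: server transmit (server clock)   t4: client receive (client clock)
    Returns (offset of server clock relative to client clock, round-trip delay).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def exchange(true_offset: float, d_out: float, d_back: float, turnaround: float):
    """Simulate one request/response. The client clock reads true time and the
    server clock reads true time + true_offset. All values in microseconds."""
    t1 = 0.0
    t2 = d_out + true_offset            # server stamps arrival on its own clock
    t3 = t2 + turnaround                # server replies after processing
    t4 = d_out + turnaround + d_back    # client stamps arrival (true time)
    return t1, t2, t3, t4


# Symmetric path: the true 50 us offset is recovered exactly.
print(ntp_offset_delay(*exchange(true_offset=50, d_out=200, d_back=200, turnaround=10)))
# Asymmetric path (100 us out, 300 us back) with clocks that actually agree:
# the computed offset is wrong by (d_out - d_back) / 2 = -100 us.
print(ntp_offset_delay(*exchange(true_offset=0, d_out=100, d_back=300, turnaround=10)))
```

Both exchanges report the same 400 µs round trip, so the client has no way to tell the symmetric case from the asymmetric one; the asymmetry shows up only as an offset error.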

Enter Precision Time Protocol (IEEE 1588v2).  PTP enables network devices to add the latency introduced by each network device into the timing measurements thereby providing a far more accurate time sample to the endpoint (Windows Server 2019 or Windows 10, host or virtual machine).
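As a simplified illustration of how those per-device latencies are used: in the IEEE 1588 delay request-response mechanism, transparent clocks add their residence time to each message’s correction field, and the endpoint subtracts it before computing offset and path delay. The Python sketch below assumes a single correction value per direction and symmetric cable delay, and it is not the Windows PTP client’s implementation. Note that PTP reports the offset of the slave (client) clock from the master (server).

```python
def ptp_offset(t1, t2, t3, t4, corr_sync=0.0, corr_delay=0.0):
    """IEEE 1588 delay request-response calculation (simplified).

    t1: master sends Sync (master clock)     t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)  t4: master receives Delay_Req (master clock)
    corr_sync / corr_delay: residence time reported by transparent clocks
    (the correction field) for the Sync and Delay_Req directions.
    Returns (offset of slave clock from master, mean path delay).
    """
    mean_path_delay = ((t2 - t1 - corr_sync) + (t4 - t3 - corr_delay)) / 2
    offset_from_master = (t2 - t1) - corr_sync - mean_path_delay
    return offset_from_master, mean_path_delay


# Hypothetical network (microseconds): 200 us of cable delay each way, but a
# switch holds Sync for 50 us and Delay_Req for 400 us because of queueing.
TRUE_OFFSET = 30.0  # the slave clock runs 30 us ahead of the master
WIRE, RES_SYNC, RES_DELAY = 200.0, 50.0, 400.0

t1 = 0.0
t2 = WIRE + RES_SYNC + TRUE_OFFSET           # 280 on the slave clock
t3 = t2 + 1000.0                             # slave replies 1 ms later
t4 = (t3 - TRUE_OFFSET) + WIRE + RES_DELAY   # 1850 on the master clock

print(ptp_offset(t1, t2, t3, t4))                                          # no corrections
print(ptp_offset(t1, t2, t3, t4, corr_sync=RES_SYNC, corr_delay=RES_DELAY))  # corrected
```

Without the correction fields, the asymmetric queueing produces a 175 µs error (-145 µs computed versus 30 µs actual); with them, the true offset is recovered. PTP still assumes the remaining cable delay is symmetric.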

Precision Time Protocol is not for everyone; due to its network configuration requirements, NTP will remain the default protocol. However, customers with the highest accuracy requirements can drive towards even more accurate systems using our inbox PTP client in Windows Server 2019.


Ready to give it a shot!? Download the latest Insider build and Try it out!

Software Timestamping:

When a timing packet is received over the network from a time server, it must be processed by the OS networking stack before it is consumed by the time service. Each component in the networking stack introduces a variable amount of latency that affects the accuracy of the timing measurement. This may sound insignificant, but it can add 30 µs, and in extreme scenarios closer to 200 µs. As mentioned earlier in this article, some systems are targeting sub-100 µs accuracy!

In addition, there may be many other services on the system all looking for data from the network. As a simple example, imagine a SQL Server with remote databases, or file servers with SAN/NAS storage that also require time accuracy. Packets for these workloads all compete with Windows Time service packets as they traverse the networking stack, introducing additional delay.

To address this problem, we timestamp packets before and after the “Windows Networking Components” shown above. Now we can improve time accuracy by accounting for software delays!
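A hypothetical illustration of why timestamps taken low in the stack help: if the client records its transmit and receive times above the networking stack, the stack latency is folded into the NTP calculation. The Python sketch below applies the standard RFC 5905 offset formula with made-up latencies (30 µs on send, 200 µs on receive, echoing the range mentioned above) and assumes the lower timestamps are taken close to the wire; it does not describe how Windows implements software timestamping.

```python
def ntp_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """RFC 5905 clock offset from one request/response (microseconds here)."""
    return ((t2 - t1) + (t3 - t4)) / 2


# Hypothetical exchange: clocks already agree, 100 us network delay each way,
# 10 us server turnaround. "Wire" timestamps are taken where the packet
# enters/leaves the network; "app" timestamps are taken above the stack.
STACK_TX = 30.0   # send-path latency between the app timestamp and the wire
STACK_RX = 200.0  # receive-path latency between the wire and the app timestamp

t1_wire, t2, t3, t4_wire = 0.0, 100.0, 110.0, 210.0
t1_app = t1_wire - STACK_TX  # app stamped the request before the stack sent it
t4_app = t4_wire + STACK_RX  # app saw the reply only after the stack delivered it

print("offset from app timestamps: ", ntp_offset(t1_app, t2, t3, t4_app))
print("offset from wire timestamps:", ntp_offset(t1_wire, t2, t3, t4_wire))
```

With these numbers the app-level timestamps report an offset of (STACK_TX - STACK_RX) / 2 = -85 µs for clocks that actually agree, which on its own nearly exhausts a 100 µs budget; the wire-level timestamps report 0.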


Ready to give it a shot!?   Download the latest Insider build and Try it out!

Clock Source Stability

Our final accuracy-based improvement affects the stability of the clock. It’s not enough to have an accurate clock occasionally; you must maintain that accuracy over long periods of time. It’s important to understand that a host system receives time “samples” from its time server; however, it does not immediately apply these samples to the clock.

Imagine that a time sample is subject to variable network delay (among other unpredictable network challenges). If we immediately stepped the clock to match every time sample, the clock would likely be wrong fairly often; it could even move backwards, a problem that would certainly make for a rainy day in the life of an IT Pro…

Instead we take multiple time samples, eliminate the outliers, and discipline the clock with the goal of bringing the system closer and closer to synchronization with the time server.
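The Python sketch below illustrates the general idea of filtering samples and disciplining (slewing) the clock rather than stepping it. The filter rule (keep samples whose round-trip delay is within a margin of the best one, then take the median offset), the loop gain, the polling interval, and the rate limit are all hypothetical values chosen for illustration; this is not the Windows Time service’s algorithm.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    offset_us: float  # measured offset of the time server relative to the local clock
    delay_us: float   # round-trip delay of the exchange that produced it


def filtered_offset(samples: list[Sample], delay_margin_us: float = 50.0) -> float:
    """Drop outliers: keep samples whose delay is within delay_margin_us of the
    lowest observed delay, then return the median of their offsets."""
    best = min(s.delay_us for s in samples)
    kept = [s.offset_us for s in samples if s.delay_us <= best + delay_margin_us]
    return median(kept)


def slew_rate(offset_us: float, gain: float = 0.1, max_rate_us_per_s: float = 50.0,
              interval_s: float = 64.0) -> float:
    """Frequency correction (microseconds per second) that removes a fraction
    `gain` of the offset over the next polling interval, clamped so each
    adjustment stays a small change in rate instead of a jump in time."""
    rate = gain * offset_us / interval_s
    return max(-max_rate_us_per_s, min(max_rate_us_per_s, rate))


samples = [Sample(120, 400), Sample(130, 410), Sample(900, 2500),
           Sample(110, 420), Sample(-700, 1800)]
estimate = filtered_offset(samples)  # long-delay outliers (900, -700) are ignored
print(estimate, slew_rate(estimate))
```

Because the correction is applied as a small rate change, the clock keeps moving forward smoothly while it converges, and a single bad sample cannot yank it around.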

Disciplining the clock entails making small adjustments to gradually converge on the correct time. There is a natural limit to how small a change we can make, but the key is that smaller is better. Just how granular can we get? The answer is complicated, but it depends on the frequency of the QPC (QueryPerformanceCounter) clock.

For a more in-depth look at this subject including QPC, please reference this article.

Previous versions of Windows allowed for a QPC granularity (the smallest change we could make to the system clock) of 6.4 µs/second (microseconds / second).  In Windows Server 2019, the QPC granularity drops to 100 nanoseconds / second!  This is akin to the difference in clarity between 480p and 4K television.  There is much finer granularity in the 4K picture!
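A rough way to see why granularity matters: if a needed frequency correction must be rounded to a multiple of the granularity, up to half a step is left uncorrected, and that residual rate accumulates as time error until the next adjustment. The Python arithmetic below uses the two granularities quoted above in this simplified model; the one-hour interval between adjustments is a hypothetical value.

```python
def worst_case_drift_us(granularity_us_per_s: float, interval_s: float) -> float:
    """Worst-case time error (microseconds) accumulated over interval_s when a
    frequency correction is rounded to the nearest multiple of the granularity,
    leaving up to half a step uncorrected."""
    return (granularity_us_per_s / 2) * interval_s


INTERVAL_S = 3600.0  # hypothetical: one hour between frequency adjustments
for label, g in [("previous versions", 6.4), ("Windows Server 2019", 0.1)]:
    print(f"{label}: up to {worst_case_drift_us(g, INTERVAL_S):,.0f} us per hour")
```

In this simplified model the uncorrectable residual shrinks from roughly 11.5 ms to 0.18 ms per hour, which is the practical meaning of the 480p-versus-4K comparison.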

So why does all this matter? Accuracy measured over time reflects your stability: not only can we hit the bulls-eye, we can hit it over and over again. Our partners at Sync-N-Scale measured Windows Server 2019 pre-release bits over 3.5 days, and NIST corroborated the results. In the picture below, notice that the MIN Time Offset reports an RMS divergence of 41 µs (microseconds) from UTC(NIST)!

Note: The AVG method involves comparing the system under test to UTC(NIST) every 10 seconds, then averaging these measurements for 10 minutes (60 readings). UTC(NIST) is available with 0.0001 ms resolution. The difference between the two 10-minute averages is the difference between the time broadcast by the server and UTC(NIST).

The MIN method involves comparing each NTP server to UTC(NIST) every 10 seconds for a 10-minute interval (60 measurements). However, only one of the 60 measurements is saved: the one with the shortest round-trip delay. This method assumes that the NTP measurements with the shortest round-trip delays provide the best estimate of the true time difference.
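The two reduction methods described in the note can be sketched in Python as follows. The readings are synthetic values invented for illustration, and the RMS helper only shows what an RMS figure summarizes across many windows; none of this is Sync-N-Scale’s or NIST’s actual tooling.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import fmean


@dataclass
class Reading:
    offset_us: float      # system under test minus UTC(NIST), in microseconds
    round_trip_us: float  # round-trip delay of the measurement


def avg_method(readings: list[Reading]) -> float:
    """AVG: average all readings taken in the 10-minute window."""
    return fmean([r.offset_us for r in readings])


def min_method(readings: list[Reading]) -> float:
    """MIN: keep only the reading with the shortest round-trip delay."""
    return min(readings, key=lambda r: r.round_trip_us).offset_us


def rms(values: list[float]) -> float:
    """Root mean square of per-window results (e.g., over a multi-day run)."""
    return sqrt(fmean([v * v for v in values]))


# Synthetic 10-minute window: 60 readings, one every 10 seconds.
window = [Reading(offset_us=40 + (i % 7) * 5, round_trip_us=300 + (i % 11) * 20)
          for i in range(60)]
print(avg_method(window), min_method(window))
```

The AVG result is pulled around by every reading, while the MIN result comes from the single exchange least disturbed by network delay, which is why the two methods can report different numbers for the same run.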

This leads me to our last topic, Traceability.

Traceability

Self-proclaimed accuracy is not enough; you must be able to prove, or trace, your accuracy to a known reference time source. In the US, this would be UTC(NIST). Traceability is a multi-faceted aspect of the regulations. FINRA, for example, states:

Members must document and maintain their clock synchronization procedures. Among other requirements, members must keep a log of the times when they synchronize their clocks and the results of the synchronization process.

System Logging

The first step in meeting these requirements is auditing changes to, and synchronization of, the local system. To do this, Windows Server 2019 will include additional logging capabilities that can be used to audit the actions taken by the Windows Time service. We’ve documented the full list of events here. These logs can be used to answer questions such as:

  1. What is the chosen time server, and what is the synchronization frequency?
  2. When was the last synchronization, and what were its results?
  3. What actions were taken after the synchronization (did we discipline the clock?)

These logs are contained in a standard event log channel called Time-Service (more details in the link provided) and can be queried and forwarded by your SIEM of choice.

Performance Counters

We also have performance counters that allow you to observe and troubleshoot a number of critical time-related areas. In the picture below, you can see two of the included counters: the Computed Time Offset (in microseconds) and the NTP Roundtrip Delay (also in microseconds).

The Computed Time Offset is the absolute time offset between the system clock and the chosen time source, as computed by the W32Time service. It indicates how closely your clock is synchronized with the reference clock, so it should be as small as possible. The NTP Roundtrip Delay is the time elapsed on the NTP client between transmitting a request to the NTP server and receiving a valid response from the server; the higher this number, the harder it will be to maintain an accurate clock. There are other counters, and we encourage you to explore them and provide some feedback!

SCOM Management Pack

If your monitoring system includes SCOM, you could also leverage a SCOM management pack that allows you to monitor and alert when a specified NTP Offset threshold is exceeded for a particular node.
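As an illustration of threshold-based alerting on the offset counter, the hypothetical Python sketch below raises an alert only when the absolute offset exceeds a threshold for several consecutive samples, so a single noisy reading does not page anyone. The threshold, the consecutive-sample rule, and the sample values are assumptions for illustration; this is not the SCOM management pack’s logic.

```python
def offset_alerts(samples_us: list[float], threshold_us: float,
                  consecutive: int = 3) -> list[int]:
    """Return the indices at which an alert fires: the point where |offset| has
    exceeded threshold_us for `consecutive` samples in a row."""
    alerts, run = [], 0
    for i, offset in enumerate(samples_us):
        run = run + 1 if abs(offset) > threshold_us else 0
        if run == consecutive:  # fire once per excursion, not on every sample
            alerts.append(i)
    return alerts


# Hypothetical Computed Time Offset readings (microseconds), 100 us threshold.
readings = [12, -30, 150, 40, 130, 160, -210, 180, 20, 15]
print(offset_alerts(readings, threshold_us=100))
```

Here the isolated 150 µs spike is ignored, and a single alert fires at the third consecutive out-of-tolerance reading.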


Ready to give it a shot!?   Download the latest Insider build and Try it out!

Completing the Unbroken Chain

Dr. Judah Levine of NIST defines traceability as requiring an unbroken chain of measurements. While Windows can provide information about its local system, traceability also requires timing information from the entire chain of time sources, which is more than Windows alone can provide. Windows Server 2019 can participate in a fully traceable environment through partners like Sync-N-Scale and Spectracom. Shown here is the partner solution from Spectracom:

Summary

Previous time accuracy requirements were lax by today’s standards. Now, regulated industries have much more stringent accuracy requirements, but accuracy alone is not enough: your systems must also be traceable.

Windows Server 2019 meets the accuracy and regulatory requirements of today’s time-sensitive workloads through a variety of improvements: compliant and accurate time during a leap second, a new time synchronization method in Precision Time Protocol, inherent platform improvements for stability, and lastly (but equally important), system-wide and end-to-end traceability. You can use Windows Server 2019 for time-sensitive workloads, whether you’re in a regulated industry or running a time-dependent application or cloud service.

I’m sure there will be additional questions about some of these features as we near the Windows Server 2019 launch at Ignite; please stay tuned, as we’ll update our public documentation and provide additional blogs on this site as necessary. Please give our validation guides (shown in the Try it Out links above!) a shot! And most importantly, let us know what you think in the comments!

For the Windows Core Networking Team,

Dan “Sometimes my seconds Leap” Cuomo

 

Here’s a list of all the Try it Out! sections in this blog:

Leap Seconds for the IT Pro – Try it out!

Leap Seconds for the Dev – Try it out!

Precision Time Protocol – Try it out!

Software Timestamping – Try it out!

High Accuracy Validation Guide – Try it out!