ConfigMgr 2007: Current issues with Wake On LAN (WOL)

UPDATE: We released a hotfix for this issue under KB953148. The fix is also in SP1, so if you’re current you should be good to go.

========

We have received several customer reports describing problems with Wake on LAN – specifically the symptom of machines not waking up. There are two specific issues to be aware of that can contribute to this symptom (other than machines not being capable of or configured for Wake on LAN).

First, there is a recently diagnosed issue which results in only 31 machines receiving the Magic Packet. This affects both Unicast and Subnet Directed Wake on LAN traffic.

The tell-tale log entries can be seen in wolmgr.log on the site server:

//
Failed to add data record to message.
31 client network data processed.

//

Note that the “Failed to add…” message may be repeated multiple times, always ending with the “31 client…” entry.
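If you want to check a saved copy of wolmgr.log for this pattern, a small script along these lines may help. This is only a sketch: the function name find_truncated_batches and the sample text are made up for illustration, and because the full layout of a real log line isn't shown here, it simply matches on the two message substrings quoted above.

```python
import re

# The two message fragments quoted from wolmgr.log above.
FAILED = "Failed to add data record to message."
PROCESSED = re.compile(r"(\d+) client network data processed\.")


def find_truncated_batches(log_text):
    """Return (client_count, failure_count) for each 'processed' entry that was
    preceded by one or more 'Failed to add' entries since the previous
    'processed' entry."""
    hits = []
    failures = 0
    for line in log_text.splitlines():
        if FAILED in line:
            failures += 1
            continue
        match = PROCESSED.search(line)
        if match:
            if failures:
                hits.append((int(match.group(1)), failures))
            failures = 0  # start counting again for the next batch
    return hits


# Hypothetical sample; real log lines carry extra fields around the message.
sample = """\
Failed to add data record to message.
Failed to add data record to message.
31 client network data processed.
"""
print(find_truncated_batches(sample))  # [(31, 2)]
```

A non-empty result whose client count is 31 matches the symptom described above. An empty result does not prove the site is unaffected; it only means the pattern wasn't found in the text that was scanned.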

Our Sustained Engineering team is currently working on a code change (hotfix) to address this for the RTM release. We’re also working to roll it into SP1, but as we’re very late in the development cycle, it could end up as a post-SP1 fix instead. Once the KB article associated with this fix is ready, we’ll update this blog entry with the article ID and a link.

Second, there is a more general network limitation that we now expose with our Unicast implementation and machines on remote (from the site server) subnets. This is mentioned in other documentation but we wanted to review it here. This is not as prevalent with machines in the same subnet as the site server, and doesn’t occur with subnet directed broadcasts.

The issue is that a Unicast Magic Packet may have been sent, but the machine (again assuming all configurations are correct) still doesn’t wake up if powered off (S5 power state). This is because the MAC address of the target machine is no longer in the ARP cache of its local router. Without that entry in the ARP cache, the router doesn’t know how to reach the machine, and because the machine is powered off, it won’t respond to the ARP request sent from that router. Even with a couple of retries from the site server, the machine still won’t wake up, and the router gives up because the target is unreachable.
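To make the difference between the two delivery methods concrete, here is a rough sketch of what a Magic Packet looks like and where it gets sent. It is not how ConfigMgr itself builds or sends the packet. The function names, the example addresses, and the choice of UDP port 9 are assumptions for illustration. The payload layout (6 bytes of 0xFF followed by the target MAC address repeated 16 times) is the standard Magic Packet format.

```python
import ipaddress
import socket


def build_magic_packet(mac):
    """Standard Magic Packet payload: 6 x 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16


def directed_broadcast(ip, prefix_len):
    """Subnet-directed broadcast address of the subnet that contains ip."""
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network.broadcast_address)


def send_wol(mac, destination, port=9):
    """Send the packet over UDP. The port is an assumption; use whatever your
    environment is configured for."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Needed when the destination is a broadcast address.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (destination, port))


# Hypothetical target machine.
packet = build_magic_packet("00:11:22:33:44:55")
print(len(packet))                              # 102
print(directed_broadcast("192.168.10.37", 24))  # 192.168.10.255

# Unicast: the last-hop router must resolve 192.168.10.37 to a MAC via ARP.
#   send_wol("00:11:22:33:44:55", "192.168.10.37")
# Subnet-directed broadcast: the router floods the frame on the subnet, so no
# ARP entry for the sleeping machine is needed (routers must allow this).
#   send_wol("00:11:22:33:44:55", directed_broadcast("192.168.10.37", 24))
```

The payload is identical in both cases. Only the destination address differs, and that is what determines whether the last router depends on its ARP cache to deliver the packet.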

The time to live (TTL) for data in the ARP cache of a router will vary depending on the network environment. The lower the TTL, the more prevalent this symptom will become. To avoid this issue, the TTL should be at least as large as the typical time interval between when a computer is powered off and when the WoL magic packet is sent for an off-hours update or other operation. If you want to lengthen the ARP cache life, check your router manufacturer documentation for instructions on how to increase the ARP TTL.
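As a worked example of that rule of thumb, the check below compares the gap between power-off and wake-up with the router's ARP cache lifetime. It is a simplification: it assumes the ARP entry was last refreshed at the moment the machine powered off, and the times and timeout values are made up for illustration. Check your own router for its actual setting.

```python
from datetime import datetime, timedelta


def arp_entry_survives(powered_off_at, wake_at, arp_timeout):
    """True if the router's ARP entry would still be cached when the packet is
    sent, assuming the entry was last refreshed at power-off."""
    return wake_at - powered_off_at <= arp_timeout


# Hypothetical schedule: users shut down at 18:00, updates start at 02:00.
powered_off = datetime(2008, 5, 1, 18, 0)
wake = datetime(2008, 5, 2, 2, 0)

print(arp_entry_survives(powered_off, wake, timedelta(hours=4)))   # False
print(arp_entry_survives(powered_off, wake, timedelta(hours=12)))  # True
```

With an eight-hour gap, a four-hour cache lifetime means the entry has already aged out when the unicast packet arrives, while a twelve-hour lifetime would still cover it.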

Within the confines of our architecture today, this isn’t something that we can readily program around. In other words, there isn’t a ‘hotfix’ that could change this; it’s inherent in the networks themselves when combined with our approach to WoL.
There are larger scale changes that could be made to the product, and those are being investigated. However, something of that scale would most likely be released in the next version of the product (SCCM v5).

Machines that are in a Standby or Hibernate (S1/3/4) power state and have a fairly modern network controller can be configured to wake on that ARP request from the router. Depending on the make/model of the NIC, this setting could appear as “Wake on ARP” or “Wake on Network Access”, for example. Enabling it may require additional configuration in the advanced properties of the NIC, as well as in its power management properties. More detailed documentation on this topic will be forthcoming this summer.

Brian Huneycutt | System Center Sustained Engineering