Remote Wipe Now and MDM Alerter troubleshooting
A key security feature of SCMDM is the ability to wipe a device remotely. Often time is of the essence, so it is important to know whether a wipe was successful. Here we will discuss how remote wipe works and how to troubleshoot it.
Remote wiping of a device is a security feature that allows administrators to remotely send a command to an MDM managed device that causes it to erase all data and return to factory defaults. This is useful if the device has been lost or stolen. In a remote wipe, the flash and storage card data are overwritten, leaving only the base OS on the device and no user data. Administrators can also deploy the Self Service Portal which allows users to wipe their own devices.
For wipe operations, time is critical - that's why we tend to call the feature "Wipe Now" internally. Without the Now you just have a Wipe Soon operation. Wiping in 8 hours doesn't work great when your device and all its data are lost or stolen. Remote wipe commands in MDM depend heavily on networking between the server and the device and this is where problems can often occur.
In pilot environments administrators often run MDM Remote Wipe tests because it is easy to see it working. However, this is one area in deployment where it is easy to make mistakes and cause the wipe to take longer than expected. Below are troubleshooting steps to help determine why a Wipe Now command is taking longer than the expected 3-15 minutes.
How Remote Wipe Works
Before going into troubleshooting, here's a brief overview of how Wipe Now works.
- An administrator or user submits a wipe request through the console, MDM Shell, or Self Service Portal.
- The wipe request is stored in the DM Engine database for the device to pick up at its next scheduled OMA session.
If we were to stop at this point it would be a "Wipe Soon": the device would connect at its regular interval (4 hours, for example) and pick up the wipe. Wipes are always submitted as a "Wipe Now" command, and thus we have to go a step further.
- In parallel to adding the wipe request for retrieval, the wipe driver also calls the Alerter component to inform the device of a pending wipe request.
- The Alerter sends an alert to the device over the Mobile VPN. The alert can only be sent through the VPN tunnel, and thus Wipe Now requires VPN connectivity.
- The Alerter client on the device receives this alert and immediately starts a management session with the Device Management server.
- The device picks up its wipe request from the Device Management server, sends back an acknowledgement that it has started the wipe, and starts the wipe process.
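The steps above can be sketched as a toy model. This is only an illustration of the flow, not MDM code: the class and method names (`Device`, `DeviceManagementServer`, `submit_wipe`, `send_alert`, `management_session`) are hypothetical, and the real components talk over OMA sessions and the Mobile VPN rather than in-process calls.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    device_id: str
    vpn_connected: bool
    wiped: bool = False


@dataclass
class DeviceManagementServer:
    # Hypothetical stand-in for the wipe requests stored in the DM Engine database.
    pending_wipes: set = field(default_factory=set)

    def submit_wipe(self, device: Device) -> bool:
        """Queue the wipe, then try to alert the device.

        Returns True if the alert reached the device ("Wipe Now"),
        False if the wipe has to wait for the next session ("Wipe Soon").
        """
        self.pending_wipes.add(device.device_id)
        return self.send_alert(device)

    def send_alert(self, device: Device) -> bool:
        # The alert can only travel through the VPN tunnel.
        if not device.vpn_connected:
            return False
        # The Alerter client reacts by starting a management session at once.
        self.management_session(device)
        return True

    def management_session(self, device: Device) -> None:
        # The device picks up any pending wipe during the session.
        if device.device_id in self.pending_wipes:
            self.pending_wipes.discard(device.device_id)
            device.wiped = True


server = DeviceManagementServer()
online = Device("phone-1", vpn_connected=True)
offline = Device("phone-2", vpn_connected=False)

alerted_online = server.submit_wipe(online)    # wiped right away
alerted_offline = server.submit_wipe(offline)  # stays queued
server.management_session(offline)             # next scheduled session picks it up
```

The offline device in this sketch is still wiped eventually, which mirrors the point of the overview: without VPN connectivity the wipe degrades from "Wipe Now" to "Wipe Soon".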
Troubleshooting Wipes that are Taking Longer than Expected
From the moment a wipe request is submitted, a wipe should take approximately 3-15 minutes depending on the network and other factors. If it is taking longer than that, below are some troubleshooting steps that you can perform to determine the cause of the delayed wipe request.
1. Verify that Management Sessions are operating as expected
Ensure that typical device management sessions, not related to wipe, are working as expected. You can download the MDM Connect Now tool located in the SCMDM 2008 Resource Kit Client Tools package here: http://technet.microsoft.com/en-us/scmdm/cc304591.aspx Using this tool and its associated documentation, you can verify that management sessions for devices are successful.
If management sessions are not working, you will need to fix this problem before devices can successfully receive the remote wipe command.
2. Verify the device is connected to the VPN
As discussed in the above "How Remote Wipe Works" section, remote wipe relies on the VPN being connected in order to send alerts to the device. The device may not be connected to the Mobile VPN for many reasons, but some of the most common are:
- The device is switched off
- The Mobile VPN is disabled (if users have the ability to disable it)
- The user is roaming and VPN is off
- The data connection is improperly configured
- The user is temporarily out of service coverage area
In all the above cases, the device will not receive the Alert message as it relies on the VPN tunnel being up. When the device reconnects to the VPN, it will receive an Alert message or will start a management session immediately if it has missed its regularly scheduled session.
3. Verify the Gateway Server is not behind a Network Address Translator (NAT)
Once you have verified that management sessions are operating and the VPN was up on the device you were attempting to wipe, you need to check that the Gateway is not behind a Network Address Translator. MDM does not support locating a Gateway Server behind a NAT. There are several reasons for this requirement; one of the reasons has to do with the Alerter.
The Alerter does not work with Gateway Servers behind a NAT for security reasons. We'll talk about this some more, but for added security the Alerter checks that the alert it received really came from the MDM Gateway Server and not from a potential attacker.
The Gateway Server must have a public IP address and must not sit behind a NAT, or the Alerter cannot verify that the alert is valid. The Alerter discards invalid alerts. Ensure that your Gateway is not behind a NAT.
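As a quick first check, you can test whether the address configured on the Gateway's external interface is publicly routable. The sketch below uses Python's standard `ipaddress` module; the function name `looks_like_nat` is hypothetical and the addresses are examples only. It is a heuristic: a private or otherwise non-global address strongly suggests a NAT in front of the Gateway, but a public address does not by itself prove that no translation happens elsewhere on the path.

```python
import ipaddress


def looks_like_nat(gateway_address: str) -> bool:
    """Return True if the address is not globally routable.

    Private ranges such as 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16
    are not global, so a Gateway using one of them must be reached
    through some form of address translation.
    """
    return not ipaddress.ip_address(gateway_address).is_global


behind_nat = looks_like_nat("192.168.1.10")  # private address
public = looks_like_nat("8.8.8.8")           # publicly routable address
```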
4. Check the Event Log on the MDM Gateway
On the MDM Gateway server, open the "VPN Policy Engine" Event Log and search for events 5506 and 5507.
Event 5506 indicates that the Alerter received a response from the device.
Event 5507 indicates that the Alerter sent a number of retries, but never heard from the device.
Further debugging information is needed for the following scenarios:
- 5507 events for every wipe issued
- 5507 event for a wipe issued in a controlled environment, where you know the device is connected, online and available
- Inconsistent 5507 and 5506 events
A mixture of 5507 and 5506 events is expected in normal operation, as some devices may be offline, out of service range, or not connected for another reason.
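If you collect the event IDs seen for a batch of wipes, the guidance above can be turned into a rough triage helper. This is a minimal sketch: the function name `triage_alerter_events` and the wording of its results are hypothetical, and it assumes you have already extracted the event IDs from the log yourself. Only the meaning of events 5506 and 5507 comes from the description above.

```python
from collections import Counter

RESPONSE_RECEIVED = 5506  # Alerter received a response from the device
NO_RESPONSE = 5507        # Alerter retried but never heard from the device


def triage_alerter_events(event_ids, devices_known_online=False):
    """Classify a list of event IDs gathered for a set of wipe requests."""
    counts = Counter(e for e in event_ids if e in (RESPONSE_RECEIVED, NO_RESPONSE))
    responded = counts[RESPONSE_RECEIVED]
    timed_out = counts[NO_RESPONSE]

    if responded == 0 and timed_out == 0:
        return "no alerter events found"
    if timed_out == 0:
        return "all alerts answered"
    if responded == 0:
        return "every alert timed out: investigate further"
    if devices_known_online:
        # In a controlled environment any 5507 is suspicious.
        return "timeouts for devices known to be online: investigate further"
    return "mixed results: expected when some devices are offline"


all_failed = triage_alerter_events([5507, 5507, 5507])
mixed = triage_alerter_events([5506, 5507, 5506])
```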
In our next post we will look at how to use some of the client tools available in the resource kit, and how to use the Device Log to further narrow down any issues.