Updating Firmware for Disk Drives in Windows Server 2016 (TP4)

Updating the firmware for disks has historically been a cumbersome task with a potential for downtime, which is why we’re making improvements to Storage Spaces and Windows Server 2016 to enable you to more easily update disk firmware prior to placing a server in production. You can also use this new functionality to update the firmware of in-production disks if there’s a critically important disk firmware advisory from your hardware vendor or OEM, and your hardware supports this. However, if you’re going to update the firmware of a production drive, make sure to read our tips on how to minimize the risk while using this powerful new functionality.

The Goal

The long-term goal is to provide a simple way to update disk firmware without downtime when using Storage Spaces.

Note: Technical Preview 4 (TP4) includes APIs and Windows PowerShell cmdlets that help you update the firmware of individual disks. It does not yet coordinate updates to mitigate the impact on I/O workloads, and it does not yet orchestrate updates across a cluster.

Warning: Firmware updates are a potentially risky maintenance operation and should only be performed after thorough testing of the new firmware image. New firmware on unsupported hardware could negatively affect reliability and stability, or even cause data loss. Administrators should read the release notes that accompany an update to determine its impact and applicability.

First Steps

To ensure common device behavior, we began by defining new, currently optional Hardware Lab Kit (HLK) requirements for SAS, SATA, and NVMe devices. These requirements outline which commands a SATA, SAS, or NVMe device has to support in order to be firmware-updatable using the new, Windows-native PowerShell cmdlets. To support these requirements, there is a new HLK test that verifies whether vendor products support the right commands, which should help get those commands implemented in future product revisions.

PowerShell cmdlets

The two cmdlets added to Windows Server 2016 are:

  • Get-StorageFirmwareInformation
  • Update-StorageFirmware

The first cmdlet provides detailed information about a device’s capabilities, firmware images, and revisions. Here’s an example from a machine that contains a single SATA SSD with one firmware slot:

PS C:\> Get-PhysicalDisk | Get-StorageFirmwareInformation
SupportsUpdate        : True
NumberOfSlots         : 1
ActiveSlotNumber      : 0
SlotNumber            : {0}
IsSlotWritable        : {True}
FirmwareVersionInSlot : {J3E16101}
PS C:\>

Note: SAS devices will always report “SupportsUpdate” as “True”, since there is no way of explicitly querying the device for support of these commands.
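The output above reports slots as collections (SlotNumber, IsSlotWritable, FirmwareVersionInSlot). As a small illustration, the following sketch looks up the firmware version held in the active slot. It assumes, which is not stated above, that SlotNumber and FirmwareVersionInSlot are index-aligned. Get-ActiveFirmwareVersion is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns the firmware version
# stored in the active slot of a Get-StorageFirmwareInformation result.
# Assumption: SlotNumber and FirmwareVersionInSlot are index-aligned.
function Get-ActiveFirmwareVersion {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $FirmwareInfo
    )
    process {
        # Wrap in @() so single-slot drives are handled like multi-slot drives.
        $slots    = @($FirmwareInfo.SlotNumber)
        $versions = @($FirmwareInfo.FirmwareVersionInSlot)
        for ($i = 0; $i -lt $slots.Count; $i++) {
            if ($slots[$i] -eq $FirmwareInfo.ActiveSlotNumber) {
                return $versions[$i]
            }
        }
    }
}

# Usage on a machine where the storage cmdlets are available:
# Get-PhysicalDisk | Get-StorageFirmwareInformation | Get-ActiveFirmwareVersion
```

If the index-alignment assumption does not hold for a given drive, compare the raw cmdlet output instead of relying on this helper.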

The second cmdlet lets the administrator update the drive firmware with an image file. You should obtain this image file directly from the OEM or drive vendor.

Note: Before updating any production hardware, ensure the particular firmware image has been successfully tested on identical hardware in a lab setting.

The disk first loads the new firmware image into an internal staging area. I/O typically continues while this happens. After the download completes, the image activates. During activation the disk performs an internal reset and cannot respond to I/O commands, so no data is served from the disk, and an application accessing data on it has to wait for a response until the firmware update is completed. Here’s an example of the cmdlet in action, where $pd holds the physical disk object to update (for example, one returned by Get-PhysicalDisk):

PS C:\> $pd | Update-StorageFirmware -ImagePath C:\Firmware\J3E160@3.enc -SlotNumber 0
PS C:\> $pd | Get-StorageFirmwareInformation
SupportsUpdate        : True
NumberOfSlots         : 1
ActiveSlotNumber      : 0
SlotNumber            : {0}
IsSlotWritable        : {True}
FirmwareVersionInSlot : {J3E160@3}
PS C:\>

Since these cmdlets are usable through PowerShell, it is also possible to script their use.
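As one possible way to script this, the sketch below checks the reported capabilities before attempting an update. Test-FirmwareSlotWritable and Update-DiskFirmwareIfWritable are hypothetical helper names introduced here for illustration. The sketch assumes that SlotNumber and IsSlotWritable are index-aligned, and it uses only the two cmdlets and the parameters shown above.

```powershell
# Hypothetical helper: $true only if the drive reports update support and the
# requested slot is listed as writable.
# Assumption: SlotNumber and IsSlotWritable are index-aligned.
function Test-FirmwareSlotWritable {
    param(
        [Parameter(Mandatory)] $FirmwareInfo,
        [Parameter(Mandatory)] [int] $SlotNumber
    )
    if (-not $FirmwareInfo.SupportsUpdate) { return $false }
    $slots    = @($FirmwareInfo.SlotNumber)
    $writable = @($FirmwareInfo.IsSlotWritable)
    for ($i = 0; $i -lt $slots.Count; $i++) {
        if ($slots[$i] -eq $SlotNumber) { return [bool]$writable[$i] }
    }
    return $false   # the requested slot is not reported by the drive
}

# Hypothetical wrapper around the two cmdlets described in this article.
function Update-DiskFirmwareIfWritable {
    param(
        [Parameter(Mandatory)] $PhysicalDisk,
        [Parameter(Mandatory)] [string] $ImagePath,
        [int] $SlotNumber = 0
    )
    if (-not (Test-Path -LiteralPath $ImagePath -PathType Leaf)) {
        throw "Firmware image not found: $ImagePath"
    }
    $info = $PhysicalDisk | Get-StorageFirmwareInformation
    if (-not (Test-FirmwareSlotWritable -FirmwareInfo $info -SlotNumber $SlotNumber)) {
        Write-Warning "Skipping disk: slot $SlotNumber is not reported as updatable."
        return
    }
    $PhysicalDisk | Update-StorageFirmware -ImagePath $ImagePath -SlotNumber $SlotNumber
    # Emit the refreshed firmware information so the caller can confirm the new version.
    $PhysicalDisk | Get-StorageFirmwareInformation
}
```

Keep in mind that SAS devices always report SupportsUpdate as True, so for SAS drives this check cannot rule out a drive that lacks the required commands.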

Note: Drives typically do not complete I/O requests while they activate a new firmware image. How long a drive takes to activate depends on its design and the type of firmware you update. We have observed update times ranging from less than 5 seconds to more than 30 seconds.

This particular drive performed the firmware update within ~5.8 seconds, as shown here:

PS C:\> Measure-Command {$pd | Update-StorageFirmware -ImagePath C:\Firmware\J3E16101.enc -SlotNumber 0}
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 5
Milliseconds      : 791
Ticks             : 57913910
TotalDays         : 6.70299884259259E-05
TotalHours        : 0.00160871972222222
TotalMinutes      : 0.0965231833333333
TotalSeconds      : 5.791391
TotalMilliseconds : 5791.391

Updating drives in production

Before placing a server into production, we recommend updating the firmware of your drives to the firmware recommended by the hardware vendor or OEM that sold and supports your solution (storage enclosures, drives, and servers).

Once a server is in production, it’s generally a good idea to make as few changes to the server as is practical. However, there may be times when your solution vendor advises you that there is a critically important firmware update for your drives. If this occurs, here are a few good practices to follow before applying any drive firmware updates:

  1. Review the firmware release notes and confirm that the update addresses issues that could affect your environment, and that the firmware doesn’t contain any known issues that could adversely affect you.
  2. Install the firmware on a server in your lab that has identical drives (including the revision of the drive if there are multiple revisions of the same drive), and test the drive under load with the new firmware. For info about doing synthetic load testing, see Test Storage Spaces Performance Using Synthetic Workloads.

FAQ

  • Can I update firmware on my SAN through this mechanism?
    • No – SANs usually have their own utilities and interfaces for such maintenance operations. This new mechanism is for directly attached storage, such as SATA, SAS, or NVMe devices.
  • Will this work on any storage device?
    • This will work on drives that implement the correct commands in their firmware. The Get-StorageFirmwareInformation cmdlet shows whether a drive’s firmware supports the correct commands (for SATA/NVMe), and the HLK test allows vendors and OEMs to test this behavior.
  • From where do I get the firmware image?
    • You should always obtain any firmware directly from your OEM, solution vendor, or drive vendor and not download it from other parties. Windows provides the mechanism to get the image to the drive, but cannot verify its integrity.
  • Will this work on clustered disks?
    • The cmdlets can perform their function on clustered disks as well, but keep in mind that no orchestration exists in TP4 to mitigate the I/O impact on running workloads. In general, it is best to perform disk firmware updates when there is no workload, or only a minimal workload, on the underlying drives.
  • What happens when I update firmware on Storage Spaces?
    • With TP4, all commands on the affected Storage Spaces will be paused for the duration of the update, as the disk cannot accept any commands during the activation of the firmware image. We are working on a mechanism that will allow these updates to happen online in a future release.
  • What happens when the update fails?
    • The update could fail for various reasons, including: 1) The drive doesn’t support the correct commands for Windows to update its firmware. In this case the new firmware image is never activated and the drive continues functioning with the old image. 2) The image cannot be downloaded to or applied to this drive (version mismatch, corrupt image, …). In this case the drive is expected to fail either the download or the activate command. Again, the old firmware image should continue to function.
    • If the drive does not respond after a firmware update, you are most likely hitting a bug in the drive firmware itself. This is why all firmware updates should first be tested in a lab environment before putting them in production. The only remediation may be to replace the drive.
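Given the failure cases above, one cautious way to script updates for several drives is to update them one at a time and stop at the first error. The sketch below is only an illustration. Update-DiskFirmwareSequentially is a hypothetical helper name, and the sketch assumes that Update-StorageFirmware reports a failed download or activation as an error that -ErrorAction Stop turns into a catchable exception. It does not add the I/O orchestration that TP4 lacks.

```powershell
# Hypothetical helper: updates disks one at a time and stops at the first failure,
# so that a problematic image is not pushed to the remaining drives.
function Update-DiskFirmwareSequentially {
    param(
        [Parameter(Mandatory)] [object[]] $PhysicalDisks,
        [Parameter(Mandatory)] [string] $ImagePath,
        [int] $SlotNumber = 0
    )
    foreach ($disk in $PhysicalDisks) {
        try {
            # Assumption: a failed update surfaces as an error that Stop makes terminating.
            $disk | Update-StorageFirmware -ImagePath $ImagePath -SlotNumber $SlotNumber -ErrorAction Stop
            Write-Output "Updated: $($disk.FriendlyName)"
        }
        catch {
            Write-Warning "Update failed on $($disk.FriendlyName): $_"
            break   # leave the remaining drives on their current firmware
        }
    }
}

# Usage (the image path is the one from the example earlier in this article):
# Update-DiskFirmwareSequentially -PhysicalDisks (Get-PhysicalDisk) -ImagePath C:\Firmware\J3E16101.enc
```

A drive that stops responding after activation may not produce a clean error, so checking each drive with Get-StorageFirmwareInformation after the run is still advisable.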