Microsoft Reinvents Datacenter Power Backup with New Open Compute Project Specification

Guest post by Shaun Harris, Principal Hardware Engineer

Microsoft has been designing, building and operating datacenters for over two decades and this experience has led to several key insights into driving efficiency at all levels of the datacenter design. This is why we chose to join the Open Compute Project (OCP) just over a year ago – to share our learnings with the community. Since joining, we have delivered the Open CloudServer (OCS) specification to the project, in addition to numerous other innovations.

Today at the Open Compute Project (OCP) U.S. Summit, Microsoft announced the contribution of a novel distributed Uninterruptible Power Supply (UPS) technology, which we call Local Energy Storage (LES). We are excited to make this hardware innovation available to the OCP community so that its benefits can be realized by a broader set of IT customers.

LES offers an integrated power supply and battery combination that is fully compatible with the Open CloudServer (OCS) v2 chassis system. The new LES units are mechanically interchangeable with the previous PSUs (common slot compliant) that were announced at the OCS v2 launch at the OCP EU Summit in Paris last year. The IT administrator can now choose which PSU to use for a specific type of deployment, depending on existing datacenter topology and battery backup requirements.

Let’s take a deeper look at the challenges behind traditional UPS designs, the key technological innovation behind LES, and the advantages of deploying LES as a distributed power backup solution.

Challenges with traditional facility UPS power backup solutions

In a traditional datacenter environment, the UPS system is housed in large separate rooms and typically consists of a large array of flooded lead-acid batteries strung together to provide interim backup power for the IT load during utility power outages (until the generators are turned on). The graphic shows this type of traditional design with the dual-redundant facility UPS. This type of UPS design leads to several inefficiencies for the datacenter deployment:

  • The UPS and battery equipment room footprint accounts for 150,000 ft2 or approximately 25 percent of the total facility footprint (for a typical 25MW datacenter). Assuming a rate of $220/ft2, the UPS room construction and buildout accounts for $31 million in datacenter capex spend.
  • The power stages upstream of the IT load consist of AC/AC or AC/DC conversions before delivering the input voltage to the server. This type of double-conversion UPS design leads to inefficiency in the power backup system, which negatively impacts the Datacenter PUE by up to 17 percent. This consists of 8 percent set aside for battery recharge, 1 percent for internal systems management and 9 percent for losses in double conversion.
  • Traditional facility UPS systems have a large failure domain that impacts availability for the entire IT load supported by the power backup solution. UPS failures can lead to unexpected service outages that impact hardware availability and customer SLAs. To mitigate this risk, datacenter designers add reserve UPS systems, coordinated power switchover systems, and safety margin to the UPS and battery systems. This creates a complicated solution for ensuring high-availability power backup, which increases both the solution-level capex and opex costs.

Leveraging our learnings from building and managing datacenters, we moved the mechanical/cooling architecture to an adiabatic-based design, which provided significant cost savings for the datacenter build and simplified the operational aspects. The graphic below shows the modified datacenter with the adiabatic system, but still with the dual-redundant UPS design.

As we continued to evolve our datacenter designs for improved efficiency, we started to take a look at the power system distribution, so that we could address the various challenges highlighted above with the traditional UPS topology. The result of this research was a radical simplification to how the power solution could be designed. The key insight we had was that by completely eliminating the facility UPS and moving that capability directly into the IT load, we could not only achieve cost reductions and operational simplicity, but could open up a whole new area of architectural innovation by tighter integration of the battery system with the IT management and controls system. These observations led to the design innovations for LES, as outlined in the next section.

 

Key technology innovation in LES

Every server using a modern switch-mode power supply unit (PSU) comes with active power factor correction, capacitive bulk storage, and an isolated DC/DC output stage. The DC/DC output stage pulls its energy from the bulk storage and maintains the proper output voltage tolerances across the blade load profile. The LES topology reuses the PSU design and control loops (ensuring maximum leverage of already proven designs), adding only components such as batteries, a battery management controller, a low-current isolated charger and a low-current 380VDC isolated output.

The hand tool and electric vehicle industries have created a market for low-cost, high-performance, high-quality Li-Ion cells. These same cells are used in the LES battery pack, so that the LES solution can leverage industry volume economics and supply chain.

The LES design innovation takes commodity energy storage devices (batteries) and an industry proven PSU power train, fusing these together in a single package to maximize energy delivery efficiency and minimize cost overheads. LES integrates the battery in the 380VDC bulk capacitance section of the PSU.

Advantages of LES for Datacenter deployments

The simplicity of the LES design coupled with its cost structure and service model provides significant benefits for datacenter deployments.

  • Up to 5x cost reduction over traditional facility UPS, achieved by extreme simplification of the datacenter power delivery solution and moving the energy storage function to a high volume commodity supply chain.
  • Moving the energy storage local to the server eliminates up to 9 percent of the losses associated with conventional UPS systems. The LES topology and lithium-ion batteries require only 2 percent charge overhead, versus conventional UPS systems (which require up to 8 percent charge overhead and 1 percent operating overhead). The net result is up to 15 percent improvement in Datacenter PUE.
  • Given no requirement for a UPS or battery room, the facility footprint can be reduced by 25 percent, which yields capex savings on the Datacenter build.
  • Significantly improved serviceability model when compared to flooded lead-acid batteries in a traditional UPS solution. LES units are hot swappable and safe upon removal, without any dangers associated with exposure to high voltage or chemicals. Distributing the battery in small power blocks delivers a ‘fail small’ system, minimizing potential failure impact zones.
  • By locally integrating the energy storage devices, we have enabled low-latency detection and controls that are not possible in a conventional centralized UPS system. The LES unit, when tightly coupled with the IT management system (OCS chassis manager), can enable new architectural scenarios for utilization efficiency, such as peak shaving, trough charge, and processor state control from the row distribution to the server. We will release more details on these opportunities in the near future.
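
To give a feel for what peak shaving and trough charge could look like once the battery sits next to the load, here is a minimal sketch in Python. It is not the LES controller or the OCS chassis manager interface; the `Battery` class, the `step` and `simulate` functions, and every number in the example are hypothetical and chosen only for illustration. The idea is simply that the battery discharges when the load exceeds a power cap and recharges when there is headroom below it.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    # All fields are hypothetical illustration values, not LES specifications.
    capacity_wh: float   # usable energy when full
    charge_wh: float     # energy currently stored
    max_power_w: float   # charge/discharge power limit


def step(battery, load_w, cap_w, dt_h):
    """Return the power drawn from the upstream feed for one interval."""
    if load_w > cap_w:
        # Peak shaving: the battery covers as much of the excess as it can,
        # limited by its power rating and by the energy it still holds.
        excess = load_w - cap_w
        discharge = min(excess, battery.max_power_w, battery.charge_wh / dt_h)
        battery.charge_wh -= discharge * dt_h
        return load_w - discharge
    # Trough charge: use the headroom below the cap to refill the battery,
    # limited by its power rating and by the space left in it.
    headroom = cap_w - load_w
    room = (battery.capacity_wh - battery.charge_wh) / dt_h
    charge = min(headroom, battery.max_power_w, room)
    battery.charge_wh += charge * dt_h
    return load_w + charge


def simulate(loads_w, cap_w, battery, dt_h=1.0):
    """Apply the policy to a sequence of load samples (watts)."""
    return [step(battery, load, cap_w, dt_h) for load in loads_w]


if __name__ == "__main__":
    pack = Battery(capacity_wh=100, charge_wh=100, max_power_w=200)
    # Four 15-minute samples against a 1000 W cap.
    print(simulate([800, 1100, 1000, 700], cap_w=1000, battery=pack, dt_h=0.25))
```

In this toy run the 1100 W peak is held at the 1000 W cap by the battery, and the energy it gave up is put back during the later 700 W trough. A real policy would also need to reserve enough charge for an actual utility outage, which this sketch deliberately leaves out.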

We are excited to share the innovative LES design with the OCP community, and will be making all design collateral for LES available as part of extending the Open CloudServer (OCS) specification. If you’re at the summit, please stop by our booth (B2) to learn more about OCS and LES.

Shaun Harris
Director of Engineering, Cloud Server Infrastructure, Microsoft
