(dsymalla Posting on Behalf of Roger Osborne)
Update: A Server 2012 version has been posted here.
Have you ever had someone, whether a customer or a colleague, build a mission-critical server without taking time, up-front, to ensure they are following best practices? Or what about the build document you spent untold hours creating, only to have it completely ignored? If neither of these has happened to you, consider yourself lucky! For the rest of us, however, this happens far too often.
In my first official posting on Ask Premier Field Engineering (PFE) Platforms, I would like to share a checklist I developed over time while working with customers’ Hyper-V environments. I find it’s a great tool to use not only when reviewing an existing Hyper-V implementation, but one which can be easily leveraged as part of pre-planning stages, to ensure best practices are implemented from the start.
Although the majority of the items within this checklist still apply to Hyper-V in Server 2012, I will be sharing an updated checklist in the near future specific to the latest Hyper-V version. Stay tuned, folks!! :)
My colleague, Mike Hildebrand, posted a somewhat similar entry, which I strongly encourage you to review, as well: http://blogs.technet.com/b/askpfeplat/archive/2012/03/18/the-journey-of-a-thousand-vms-begins-with-a-few-steps.aspx
Disclaimer: As with all Best Practices, not every recommendation can – or should – be applied. Best Practices are general guidelines, not hard, fast rules that must be followed. As such, you should carefully review each item to determine if it makes sense in your environment. If implementing one (or more) of these Best Practices seems sensible, great; if it doesn't, simply ignore it. In other words, it's up to you to decide if you should apply these in your setting.
⎕ Use Server Core if possible, to reduce OS overhead, reduce potential attack surface, and to minimize reboots (due to fewer software updates)
⎕ Hyper-V services should be configured to start automatically, to ensure uninterrupted VM services after reboots. (Verify in Administrative Tools -> Services):
· Hyper-V Virtual Machine Management Service (To set to auto: sc config vmms start= auto)
· Hyper-V Networking Management Service (To set to auto: sc config nvspwmi start= auto)
· Hyper-V Image Management Service (To set to auto: sc config vhdsvc start= auto)
⎕ Ensure hosts are up-to-date with recommended Microsoft updates, to ensure critical patches and updates – addressing security concerns or fixes to the core OS – are applied.
⎕ Ensure all applicable Hyper-V hotfixes and Cluster hotfixes (if clustered) have been applied. Review the following sites and compare them against your environment, since not every hotfix will apply:
⎕ Install the latest PowerShell version (currently 3.0) on each Hyper-V host:
⎕ Download and install the Hyper-V PowerShell Management Library
⎕ Ensure hosts have the latest BIOS version, to address any known issues/supportability
⎕ Host should be domain joined, unless security standards dictate otherwise. Doing so makes it possible to centralize the management of policies for identity, security, and auditing. Additionally, hosts must be domain joined before you can create a Hyper-V High-Availability Cluster.
⎕ RDP Printer Mapping should be disabled on hosts, to remove any chance of a printer driver causing instability issues on the host machine.
· Preferred method: Use Group Policy with host servers in their own separate OU
o Computer Configuration -> Policies -> Administrative Templates -> Windows Components -> Remote Desktop Services -> Remote Desktop Session Host -> Printer Redirection -> Do not allow client printer redirection -> Set to "Enabled"
⎕ Set host power plan to High Performance, to ensure maximum CPU performance.
· Preferred method: Use Group Policy with host servers in their own separate OU
Computer Configuration -> Preferences -> Control Panel Settings -> Power Options
Once there, create a new Power Plan (using the Vista or higher selection) and assign it "High Performance".
⎕ Do not install any other Roles on a host besides the Hyper-V role
· When the Hyper-V role is installed, the host OS becomes the "Parent Partition" (a quasi-virtual machine), and the hypervisor is placed between the parent partition and the hardware. As a result, it is not recommended to install additional roles, services, etc.
⎕ The only Features that should be installed on the host are: Failover Clustering (if host will become part of a cluster) and Multipath I/O (if host will be connecting to an iSCSI SAN, for example). (See explanation above for reasons why installing additional features is not recommended.)
⎕ Anti-virus software can be installed, if desired; however, be sure to exclude Hyper-V specific files using KB 961804:
o Default virtual machine configuration directory (C:\ProgramData\Microsoft\Windows\Hyper-V)
o Custom virtual machine configuration directories, if applicable
o Default virtual hard disk drive directory
o Custom virtual hard disk drive directories
o Snapshot directories
o Vmms.exe (Note: May need to be configured as process exclusions within the antivirus software)
o Vmwp.exe (Note: May need to be configured as process exclusions within the antivirus software)
o Additionally, when you use Cluster Shared Volumes, exclude the CSV path "C:\ClusterStorage" and all its subdirectories.
⎕ Default VM path and VHD path should be set to a non-system drive, because keeping them on the system drive can cause disk latency and creates the potential for running out of disk space.
⎕ Enable iSCSI Service TCP-In (for Inbound) and iSCSI Service TCP-Out (for outbound) in Firewall settings on host (Port 3260), to allow iSCSI traffic to pass to and from host and SAN device. Not enabling these rules will prevent iSCSI communication.
⎕ Periodically run performance counters against the host, to ensure optimal performance.
Recommend using the Hyper-V R2 SP1 performance counter that can be extracted from the (free) Codeplex PAL application:
· Install PAL on a workstation and open it, then click on the Threshold File tab.
o Select "Microsoft Hyper-V R2 SP1" from the Threshold file title, then choose Export to Perfmon template file. Save the XML file to a location accessible to the Hyper-V host.
· Next, on the host, open Server Manager -> Diagnostics -> Performance -> Data Collector Sets -> User Defined. Right click on User Defined and choose New -> Data Collector Set. Name the collector set "Hyper-V Performance Counter Set" and select Create from a template (Recommended) then choose Next. On the next screen, select Browse and then locate the XML file you exported from the PAL application. Once done, this will show up in your User Defined Data Collector Sets.
· Run these counters in Performance Monitor for 30 minutes to 1 hour (during high usage times) and look for disk latency, memory and CPU issues, etc.
⎕ If server has more than 32 physical cores, do not enable Hyper Threading, as it creates more logical cores than Hyper-V supports on Server 2008 R2. (Max is 64.)
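The arithmetic behind this item can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes Hyper-Threading exposes exactly two logical processors per physical core. The limit of 64 is the Server 2008 R2 maximum quoted above.

```python
# Maximum logical processors Hyper-V supports on Server 2008 R2
# (taken from the checklist item above).
MAX_LOGICAL_PROCESSORS = 64


def hyperthreading_is_safe(physical_cores, threads_per_core=2):
    """Return True if enabling Hyper-Threading keeps the host within the limit.

    Assumes each physical core exposes `threads_per_core` logical processors
    (2 is an assumption that holds for typical Hyper-Threading hardware).
    """
    return physical_cores * threads_per_core <= MAX_LOGICAL_PROCESSORS


print(hyperthreading_is_safe(32))  # True: 32 * 2 = 64 logical processors
print(hyperthreading_is_safe(40))  # False: 40 * 2 = 80 logical processors
```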
⎕ Ensure NICs have the latest firmware, which often addresses known issues with hardware.
⎕ Ensure latest NIC drivers have been installed on the host, which resolve known issues and/or increase performance.
⎕ Consider disabling Chimney Offload, as it has been found to cause slowness of virtual machines.
From an elevated command-prompt, type the following:
netsh int tcp set global chimney=disabled
⎕ Jumbo frames should be turned on and set for 9000 or 9014 (depending on your hardware) for CSV, iSCSI and Live Migration networks. This can significantly increase throughput (by as much as 6x) while also reducing CPU cycles.
· End-to-End configuration must take place – NIC, SAN, Switch must all support Jumbo Frames.
· You can enable Jumbo frames when using crossover cables (for Live Migration and/or Heartbeat), in a two node cluster.
· To verify Jumbo frames have been successfully configured, run the following command from each of your Hyper-V hosts to your iSCSI SAN:
o ping 192.168.1.130 -f -l 8000
· This command will ping the SAN (e.g. 192.168.1.130) with an 8K packet from the host. If replies are received, Jumbo frames are properly configured.
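To see why an 8000-byte ping is a useful test, it helps to add up the size of the packet on the wire. The sketch below is a minimal illustration, assuming IPv4 with no IP options (20-byte header) plus the 8-byte ICMP echo header; the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def ping_packet_size(payload_bytes):
    """Total IP packet size produced by `ping -l <payload_bytes>`."""
    return payload_bytes + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def fits_without_fragmenting(payload_bytes, mtu):
    """With -f (Don't Fragment) set, the ping only succeeds if the packet fits in the MTU."""
    return ping_packet_size(payload_bytes) <= mtu


print(ping_packet_size(8000))                # 8028
print(fits_without_fragmenting(8000, 1500))  # False: a standard 1500-byte MTU cannot carry it
print(fits_without_fragmenting(8000, 9000))  # True: a 9000-byte jumbo MTU can
```

Because the -f switch forbids fragmentation, a reply means every device in the path accepted a frame far larger than the standard 1500 bytes.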
⎕ NICs used for iSCSI communication should have all Networking protocols (on the Local Area Connection Properties) unchecked, with the exception of:
· Manufacturer's protocol (if applicable)
· Internet Protocol Version 4
· Internet Protocol Version 6.
Unbinding other protocols (not listed above) helps eliminate non-iSCSI traffic/chatter on these NICs.
⎕ When creating virtual switches, uncheck the "Allow management operating system to share this network adapter" option, in order to create a dedicated network for your VM(s) to communicate with other computers on the physical network.
⎕ Recommended network configuration when clustering:
Min # of Networks on Host
VM Network Access
** CSV & Live Migration Networks can be crossover cables, if you are building a 2 node cluster **
VIRTUAL NETWORK ADAPTERS (NICs):
⎕ Legacy Network Adapters (a.k.a. Emulated NIC drivers) should only be used for PXE booting a VM or when installing non-Hyper-V aware Guest operating systems. Hyper-V's synthetic NICs (the default NIC selection; a.k.a. Synthetic NIC drivers) are far more efficient, because they use a dedicated VMBus to communicate between the virtual NIC and the physical NIC; as a result, they consume fewer CPU cycles and require far fewer hypervisor/guest transitions per operation.
⎕ Disks should be Fixed or Pass-Through in a production environment, to increase disk throughput. Differencing and Dynamic disks are not recommended for production, due to possible data loss (differencing disks) and increased disk read/write latency times (differencing/dynamic disks).
· See http://technet.microsoft.com/en-us/library/cc720381(v=WS.10).aspx for more information
⎕ Disable snapshots on all production VMs. Snapshots can cause disk space issues, as well as additional physical I/O overhead.
· Set the snapshot path for each VM to a non-existent location, so user gets an error if they attempt to create a snapshot.
· If snapshots are mandatory, the snapshot location should not be the host OS drive.
⎕ The physical format of hard disk drives used for hosting VMs should be 512-byte sectors, to prevent compatibility issues (see http://support.microsoft.com/kb/2515143).
It is not recommended to use 512e formatting for disks that will house VHDs, because internal testing has shown a performance degradation of around 30% for most workloads.
Regarding 4K Disks:
“The VHD driver in Server 2008 R2 assumes that the physical sector size of the disk to be 512 bytes and issues 512 byte IOs, which makes it incompatible with these disks. The VHD stack fails to open the VHD files on physical 4kB sector disks for this reason.”
Taken from: http://support.microsoft.com/kb/2515143
Side-Note: Windows Server 2012 fully supports 4K disks out of the box.
⎕ Page file on Hyper-V Host should be set to a fixed size (4GB max) on the system drive, since most Hyper-V implementations have large amounts of physical memory, and, by default, the page file is the same size as the physical amount of memory.
· Can be placed on a SAN drive, if desired
· Should not be on a VM volume, to reduce possible disk latency if page file is being used by host
· Setting location: System Properties -> Advanced Tab -> Virtual Memory section, select Change -> Uncheck “Automatically manage paging file size for all drives” -> Click “Custom size” radio button and input desired Initial size (MB) and Maximum size (MB) (e.g. input “4096” for both to have a fixed page file size).
⎕ Set reserved Hyper-V Parent Host memory, to ensure memory is set aside for the host, itself.
· To determine minimum host memory reserve, follow these guidelines:
o Use the following calculation:
384 MB + (30 MB * physical memory in GB)
For example: 384 + (30 * 48) = 1824 MB minimum reserve recommendation on a host with 48 GB of memory
· To set memory reserve, change the following:
o Registry Key:
Value is set in Decimal, and is in megabytes (e.g. 4096)
Requires a reboot to take effect
2-4 GB Minimum on average
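As a quick worked example of the reserve calculation above, the following Python sketch applies the formula to a few host sizes. The function name is hypothetical, and physical memory is assumed to be given in GB, with the result in MB.

```python
def min_host_memory_reserve_mb(physical_memory_gb):
    """Minimum host reserve in MB: 384 MB plus 30 MB per GB of physical memory."""
    return 384 + 30 * physical_memory_gb


# The 48 GB case reproduces the example from the checklist.
for gb in (48, 96, 128):
    print(gb, "GB host ->", min_host_memory_reserve_mb(gb), "MB minimum reserve")
```

For the 48 GB host this gives 1824 MB, matching the figure in the checklist.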
⎕ Use Dynamic Memory on all VMs (unless the workload does not support it, e.g. Lync).
· Dynamic Memory adjusts the amount of memory available to a virtual machine, based on changes in memory demand using a memory balloon driver, which helps use memory resources more efficiently.
⎕ Guest OS should be configured with (minimum) recommended memory
· 2048MB For Windows Server 2008, including R2 (e.g. 2048 – 4096 Dynamic Memory)
· 1024MB For Windows 7 (e.g. 1024 – 2048 Dynamic Memory)
· 1024MB For Windows Vista (e.g. 1024 – 2048 Dynamic Memory)
· 256MB For Windows Server 2003 (e.g. 256 – 2048 Dynamic Memory)
· 128MB For Windows XP. Important: XP does not support Dynamic Memory. (The minimum supported is 64 MB). Note: Support for Windows XP Ends April 2014!
⎕ Ensure Integration Components (ICs) have been installed on all VMs (Pre 2008/Pre Win 7/Other OS). ICs significantly improve interaction between the VM and the physical host.
· Enlightened OSes (Server 2008 or higher, Windows 7 or higher) don't need ICs installed.
⎕ Set preferred network for CSV communication, to ensure the correct network is used for this traffic. (Note: This will only need to be run on one of your Hyper-V nodes.)
· The lowest metric in the output generated by the following PowerShell command will be used for CSV traffic
o Open a PowerShell command-prompt (using “Run as administrator”)
o First, you’ll need to import the “FailoverClusters” module. Type the following at the PS command-prompt:
· Import-Module FailoverClusters
o Next, we’ll request a listing of networks used by the host, as well as the metric assigned. Type the following:
· Get-ClusterNetwork | ft Name, Metric, AutoMetric, Role
o In order to change which network interface is used for CSV traffic, use the following PowerShell command:
· (Get-ClusterNetwork "CSV Network").Metric=900
· This will set the metric of the network named "CSV Network" to 900
⎕ Set preferred network for Live Migration, to ensure the correct network(s) are used for this traffic:
· Open Failover Cluster Manager, Expand the Cluster then Expand Services and applications
· Under Services and applications, click once on any of the VMs listed in the left pane
· Next, in the middle pane (under the title “Virtual Machine”), right click your VM and choose properties
· Click on the Network for live migration tab
o Use the Up / Down buttons to list the networks in order from most preferred (at the top) to least preferred (at the bottom)
o Uncheck any networks you do not want used for Live Migration traffic
o Select Apply and then press OK
· Once you have made this change, it will be used for all VMs in the cluster
⎕ The Cluster Shutdown Time (ShutdownTimeoutInMinutes registry entry) should be set to an acceptable number
· Default is set using the following calculation (which can be too high, depending on how much physical memory is installed)
o (100 / 64) * physical RAM
o For example, a 96GB system would have a 150-minute timeout! (100/64)*96 = 150
· Consider setting the timeout to 10, 15 or 30 minutes, depending on the number of VMs
o Registry Key: HKLM\Cluster\ShutdownTimeoutInMinutes
· Enter minutes in Decimal value.
· Note: Requires a reboot to take effect
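To get a feel for how quickly the default timeout grows with installed memory, the calculation above can be sketched in Python. The function name is hypothetical, and RAM is assumed to be given in GB.

```python
def default_shutdown_timeout_minutes(physical_ram_gb):
    """Default cluster shutdown timeout in minutes: (100 / 64) * physical RAM in GB."""
    return (100 / 64) * physical_ram_gb


print(default_shutdown_timeout_minutes(96))  # 150.0, the example from the checklist

# A few other host sizes, for comparison with a 10-30 minute target.
for gb in (64, 128, 256):
    print(gb, "GB ->", default_shutdown_timeout_minutes(gb), "minutes")
```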
⎕ Each node in the cluster requires an identically named (case sensitive!) virtual switch. Failovers and Live Migrations will fail without identically named switches
· If any changes are made to the virtual switch on any node, you must refresh the virtual machine configuration (Failover Cluster Manager -> <Cluster Name> -> Services and applications -> <VM name> -> More Actions (in right pane) -> Refresh virtual machine configuration). Repeat this process for each VM listed in Services and applications.
⎕ Run Cluster Validation periodically to remediate any issues
· NOTE: If all LUNs are part of the cluster, the validation test will skip all disk checks. It is recommended to set up a small test-only LUN and share it on all nodes, so full validation testing can be completed.
VIRTUAL DOMAIN CONTROLLERS (DCs):
⎕ It is recommended to partially disable the time synchronization between the VM DC and the host (using registry change). This enables the guest DC to synchronize time for the domain hierarchy, but protects it from having a time skew if it is restored from a saved state:
· On the virtual DC, enter the following from an elevated command-prompt:
reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\VMICTimeProvider /v Enabled /t reg_dword /d 0
· Once done, you can leave the "Time Synchronization" enabled on Integration Services, under the DC's Hyper-V Settings
⎕ DC VMs should have "Shut down the guest operating system" in the Automatic Stop Action setting applied (in the settings on the Hyper-V Host)
⎕ If VHDs are IDE/ATA drives, ensure disk write caching is disabled, to reduce the chance of AD corruption.