A few months ago, Bruce Langworthy wrote an excellent article on some new recommendations for setting the Windows Disk Timeout value: http://blogs.msdn.com/b/san/archive/2011/08/15/the-windows-disk-timeout-value-understanding-why-this-should-be-set-to-a-small-value.aspx.
This post got me thinking about Exchange and how we deal with I/O problems. If you haven't read Bruce’s article, it explains that with the default disk timeout of 60 seconds, Windows will not report a hung I/O for 60 seconds and won’t retry it for 8 minutes. 8 minutes is far too long to wait before retrying a hung I/O, so Microsoft is releasing new guidance that recommends changing the Windows Disk Timeout setting to a value that aligns with your storage architecture.
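To make the 60-second and 8-minute figures above a little more concrete, here is a minimal sketch of the arithmetic. It assumes the 8-minute figure is simply the timeout multiplied by a fixed number of attempts (8 is inferred from 480 s / 60 s, not quoted from Bruce's article), and the function name is hypothetical. The timeout itself lives in the `TimeOutValue` registry value under `HKLM\SYSTEM\CurrentControlSet\Services\Disk`.

```python
# Rough model only: worst-case wait = per-attempt timeout * number of attempts.
# ASSUMED_ATTEMPTS is an assumption inferred from 8 minutes / 60 seconds.
ASSUMED_ATTEMPTS = 8


def worst_case_delay_seconds(timeout_seconds: int, attempts: int) -> int:
    """Return the longest time a hung I/O could wait under this simple model."""
    if timeout_seconds <= 0 or attempts <= 0:
        raise ValueError("timeout_seconds and attempts must be positive")
    return timeout_seconds * attempts


if __name__ == "__main__":
    # Compare the default timeout with two smaller, purely illustrative values.
    for timeout in (60, 30, 10):
        delay = worst_case_delay_seconds(timeout, ASSUMED_ATTEMPTS)
        print(f"timeout={timeout}s: up to {delay}s ({delay / 60:.1f} min)")
```

Under this model, lowering the timeout shrinks the worst-case wait proportionally, which is the intuition behind aligning the value with what your storage can actually tolerate.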
The question in my mind for Exchange was simple: how does this disk timeout behavior affect Exchange DAG deployments? More specifically, should I reduce the Windows Disk Timeout on my Exchange servers as per the new recommendations, or leave things alone?
To answer this question, I approached some of our ESE developers to get their thoughts. This is what came from that discussion:
- The Windows Disk Timeout value is mainly intended for event logging and I/O retry.
- Prior to Exchange Server 2010, Exchange did not take any action for slow I/O other than report it in the event log.
- Exchange Server 2010 RTM introduced pre-emptive page patching (clean page overwrite) for pages affected by slow I/O.
- Exchange Server 2010 SP1 is the first version of Exchange to include intelligence for dealing with hung I/O and will actively fail (bugcheck) the server if the hung I/O is affecting active databases on a DAG node.
Before we could determine what to do with our disk timeout settings, I decided we first needed to understand what intelligence Exchange Server 2010 SP1 introduced and how it might interact with disk timeouts.