SQL Server 2008 Storage

Here are some notes on “SQL Server 2008 Storage” I took while attending an advanced class on SQL Server taught by Greg Low (from http://sqlblog.com/blogs/greg_low/ and http://www.sqldownunder.com/).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.

Disk history and basics

  • Stone, paper, magnetic tape, magnetic disk 🙂
  • Amount of store increasing rapidly
  • Magnetic disk – Non volatile, random address, re-writable
  • Discussion - Reading data later – how far ahead can you guess (10 years?), especially removable media
  • Functions – Seek to, Random from, Write to, atomic access to single block (512 bytes)
  • Physical geometry obscured with LBAs – Reality of the geometry could be completely different
  • Discussion - For how long will we have mechanical drives?

Disk structure

  • Structure of disks - platters, head, tracks, cylinders, sector, density changes, more data on outer tracks
  • Discussion – Small drives or large drives
  • Spindles – Seek to track or stay in the track, rotating spindles, actuators
  • Discussion – Drive reliability, MTBF
  • Removable disks – Disk packs. Spin up/load heads/seek/read/write. Alignment packs.
  • Discussion – Engineering of old style removable disk.
  • Fixed disk – Winchester. Heads land and take off, like a plane. Radial connections.

Disks details

  • Media – magnetic (disks/tapes), semiconductor (flash/RAM), optical (CD/DVD/BD), hybrid, holographic?
  • Discussion – Trust the media? Keeping track of many tapes over time. How many copies?
  • Discussion – Magnetic physical limitations. RAMSAN, FusionIO, SSD. Video = array of 24 SSD drives.
  • Semiconductor limitations – SLC vs. MLC, wear leveling, block erases
  • Magnetics x Semiconductor – Fast seeks and low power vs. magnetic resilience
  • How/why disks fail – electrical/mechanical domains, wearing out, trying forever, data corruption, ECC
  • “Not quite broken” – transitory errors, connection bus, vibration, heat, RFI, static
  • Discussion - desktop drives x enterprise drives, error rates
  • Disk terminology – tracks per inch, bits per inch, seek latency, access time, response time, IOPs, bandwidth
  • IOPS – I/Os per second - max IOPS = bandwidth/IO size. IOPs per spindle. Controller capacity.
  • Latency – How long does it take? – IOPs/Bandwith vs. Latency – RPM vs. Latency – 7.2K vs 10K vs 15K
  • Discussion – How is “average seek” time calculated?

Disk topics

  • Real-time – What is real time? SAN QOS, SSD, SQL snapshot isolation, Resource Governor
  • Power – Lower power, lower heat, 15K disk vs. SSD, power is a significant part of the overall cost
  • How much data – How many copies of the data do your keep? Online, backup, compliance, archive?
  • Backup options – D2T, D2D, D2D2T, D2D2D, etc...
  • Future: Storage Utility? Compute Utility? Services for data access, validity, recovery. Cloud?
  • Discussion – Pods. Datacenters in containers. 3,500 servers/pod. 3 connections: power, network, cooling

Accessing data

  • LBA or LBN is disk address, not head/cylinder/sector anymore
  • It’s all a big bunch of sectors and blocks (512 bytes)
  • Initators (find and mounts drives/LUNs) and targets
  • Disk ordering of commands (fifo, seek time, latency, size of stream, direction)

Drive options

  • Parallel attachment – IDE, ATA, PATA, SCSI
  • Serial attachment – FC, SATA, SAS, Ethernet
  • Classes of disks – Desktop (SATA, 7.2K RPM), Enterprise (SAS/FC, 15K RPM)
  • Size and density of disks – 3.5” (standard) and 2.5” (small form factor, SFF)
  • Duty cycle
  • Serial interfaces (SATA/SAS/FC)
  • Channel Performance (USB/IDE/SATA/FC), Network Performance (10M/100M/1G/10G)
  • RAID – Software, Direct-Attach RAID, External RAID (SAN)
  • SSD –  Low (no) latency, low (no) seek time, high read IOPs, good bandwidth per drive
  • SSD – More expensive, lower capacity, higher cost,


  • What is an IO? What is the path from the App to the physical disk?
  • SQL Server – SQL Server protocol, query processor, SQL OS, storage engine, buffer manager, buffer pool, OS
  • “Hardware” I/O – Driver, HBA controller, buffer, FC switch, “SAN OS”, controller, cache, backend switch, disk
  • I/O request lifecycle – Should all happen in milliseconds. Better if can be done via buffer/cache.
  • I/O Latency – Optimal – Log: <5ms (ideally <1ms), OLTP: <20ms (ideally <10ms), OLAP: <30ms
  • I/O Latency – Depends on technology, should be consistent


  • Retains information on power removal. Disk, tape, optical.
  • Assumed to be reliable, fair amount of solution cost, regulatory requirements to keep data
  • Storage: Initiator, Target, LUN

SQL Server storage

  • 3 basic requirements: stable media, write ordering, torn I/O prevention
  • Applied also to any storage replication scheme – sync or async
  • The idea is to avoid data corruption and/or loss of transactional consistent

Disk structures

  • Platters, heads, tracks and sectors – typical sector size is 512 bytes
  • SQL Server uses 4KB sectors
  • Disk sector – Preamble, data sync, user data (512 bytes), ECC (40 bits), gap
  • Data is stored in sectors, accessed by LBN or block address
  • For SQL, should always have some sort of RAID construct for fault tolerance
  • NTFS – Default 4KB cluster size
  • Consider using 64KB cluster - aligns with SQL extents, improves read-ahead, reduces split I/O
  • Disable NTFS compression
  • GPT vs. MBR – MBR is the old partitioning. GPT is newer and allows volumes >2TB.

Sector alignment

SQL Server I/O

  • Write-ahead logging (WAL) - Key to make it ACID
  • All log records must go to stable media BEFORE data pages does.
  • If these are reordered and stopped in the middle, bad things can happen
  • Stable media – must be able to survive failure. Cache is OK, if battery backed.
  • Multiple channels, Load balancing, Mirroring, Remote Mirroring – Also need to guarantee order
  • Torn I/O – two bits on each of the 512 sectors on the 8K page
  • Page checksum – checksum for the entire page, replaces Torn I/O if enabled
  • Force unit access – CreateFile with FILE_FLAG_WRITEHTROUGH flag
  • Hardware read cache – In controller, reads more data than requested. Good for OLAP, reporting.
  • Latching – Data synchronization, reader/writer “lock” in SQL Server when writing page


  • Scatter-Gather I/O: Instead of doing individual I/Os, try to do multiple. For instance: entire extents.
  • Lazy Writer: On memory pressure, free pages. LRU stays. Up to 256 pages.
  • Checkpoint: Flush dirty pages in buffer pool to stable media. Up to 16 pages.
  • Eager Write: Minimally-logged, goes to disk without going to the buffer pool
  • Read-ahead: Read multiple pages ahead. Up to 128 pages (Enterprise Edition, up to 1024).
  • Enterprise Edition also can prime the cache on start.
  • Log manager: Writes log files, sequential. 8 pages (32 in 64-bit) and 480KB 32 outstanding log writes.
  • Orderly shutdown: Will checkpoint all databases, close out internal structures.
  • Don’t kill SQL Server process. You might need to increase shutdown time
  • See http://support.microsoft.com/kb/146092


  • Careful - Stale reads, lost writes, filter drivers, online file copy, drive fragmentation
  • Error 823 – Error, some data returned - Usually, something close to being broken (or broken)
  • Error 824 – Error, no data returned - Usually, something broken
  • Error 832 – Stale I/O or checksum issue
  • Error 833 – Hung I/O – nothing comes back (could be anti-virus, compression, network issue, etc)
  • Stale Read Protection – Holds some info on written pages, checked on read – Limited to ~320MB of data on 64-bit
  • Read Retry: Drives will try, OS will also try, SQL Server will retry on top of that - can mask issues
  • Page and log audit – trace flags to do checks
  • Idle server – Suspend/resume SQL Server – By default only on Express/Workgroup. Careful!

Scalable Shared Databases

Instant File Initialization

  • Both OS and FS must support IFI.
  • SQL Server account needs SE_MANAGE_VOLUME_NAME.
  • Used in data files, not logs.
  • Careful – Files will not be initialized.



  • Storage Area Network. Fibre Channel or iSCSI. Comprised of fabric and end points.
  • Terminology: HBA, array, frame, port, frames, PDU, fabrics, zoning, VSAN, masking, WWN, MPIO.
  • See http://www.snia.org/images/tutorial_docs/Networking/MKJibbe-Wilson_Fibre_Channel_Technologies-rev.pdf
  • Terminology: Write Atomicity, Restripe, Concatenation, Thin provisioning, LUN growth/shrink
  • “Magic SAN dust” – Scalability, Availability, Redundancy, Data Management/Mobility, Snaps, Close, Mirroring
  • Easy and flexible provisioning
  • Generally used with Failover Clustering
  • Careful – Shared storage performance – How to guarantee?
  • Careful – RAID and stripe size – Alignment still matters
  • Careful – LUN security
  • Discussion – SAN vs. DAS – Pros and cons of each…


  • Maintenance: Index rebuilds, updating statistics
  • Fragmentation: Only important if you’re doing scans. Watching for them is important.
  • Identify the few very large tables that need attention. Consider partitioning
  • Backup and Restore strategy


  • IOPS
  • Minimize Log write waits
  • Throughput vs. Latency
  • Reduce SQL locking
  • Add CPUs
  • Reduce I/O path

Multiple spindles

  • Capacity – Striping (RAID 0)
  • Availability – Mirroring (RAID1, RAID10), Parity (RAID5, RAID6)
  • Performance – Striping (RAID 0)
  • Tempdb in RAID0? SQL Server needs tempdb to be there all the time…
  • RAID5/RAID6 have a performance penalty


  • Determine capacity of I/O system prior to deployment
  • Tools: SQLIO, IOMeter, SQLIOSim (not for performance)
  • See http://sqlblog.com/blogs/kevin_kline/archive/2007/06/28/understanding-sqliosim-output.aspx
  • SQLIO: IOPS, MB/s, latency. One or many files. Single I/O type per run.
  • IOMeter: IOPS, MB/s, latency. Can combine many IO types per run.
  • IO sizes: SQL can use 8/64/128/256/1024KB. Random - look at IOPS. Sequential – look at MB/s.
  • Workloads: OLTP - heavy reads, writes during checkpoints, 60K sequential read/write for logs
  • Workloads: OLAP - sequential reads and writes, larger I/O sizes, more scans
  • Determining Saturation Point – IOPS constant, latency increasing
  • Careful – HBA can saturate before the SAN disks, consider multiple channels
  • Careful – HBA drivers/configuration, controller cache configuration, volume alignment
Comments (1)

  1. Anonymous says:

    Here are some notes on “SQL Server 2008 I/O Performance” I took while attending an advanced class on

Skip to main content