SQL Server 2008 DR, Testing, Waits and Queues

Here are some notes on “SQL Server 2008 DR, Testing, Waits” I took while attending an advanced class on SQL Server taught by Ron Talmage (from https://www.solidq.com/na/MentorDetail.aspx?Id=38).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.


Disaster Recovery

  • RTO (Recovery Time Objective), RPO (Recovery Point Objective), RTA (Recovery Time Actual)
  • Backup Media (disk/tape) and Location (local, remote, offsite)
  • Recovery Technology (Backup/Restore, Mirroring, Log Shipping, Clustering, Replication)

DR and HA

  • What’s the difference between HA and DR?
  • How would you prepare for a disaster?
  • What happens if you lose the entire data center?
  • Primary data center, standby data center.
  • Consider risks, cost, time to invest.
  • Big question: What are you trying to protect against?

Class discussion

Technologies

Benchmarking and baselining

  • Load-testing, benchmarking and baselining
  • Meet performance requirements
  • Verify correctness
  • Determine system limits
  • Determine impact of changes

Class Discussion

  • Baseline - Ground zero, known state, Point of comparison
  • Benchmark - Measurement under specified conditions, goal, used to estabilish baseline
  • TPC - https://www.tpc.org/,  TPC-E for OLTP, TPC-H for DW/OLAP. Reports trans/sec and response time
  • Load test - Application of a load, looking for a sufficient or expected load
  • Stress test - Increase load. Done with concrete in construction.
  • Saturation test - Add load until it fails.
  • Endurance test – With specified load for a long time

White Paper

Performance counters

Tools

Waits and Queues

From the white paper, regarding OLTP workloads

  • Database Design
    • High Frequency queries having a high number of table joins (>4)
    • Frequently updated tables having # indexes (>3)
    • Big IOs - Table Scans, Range Scans (>1)
    • Unused Indexes
  • CPU
    • Signal Waits (>25%)
    • Plan reuse (<90%)
    • Parallelism: Cxpacket waits (>5%)
  • Memory
    • Page life expectancy (<300 sec)
    • Page life expectancy (Drops by 50%)
    • Memory Grants Pending (>1)
    • SQL cache hit ratio (<90%)
  • Disk
    • Average Disk sec/read (>20 ms) – Comment: this should be lower (>5, >8)
    • Average Disk sec/write (>20 ms) -- Comment: this should be lower (>5, >8)
    • Big IOs - Table Scans, Range Scans (>1)
    • ASYNCH_IO_COMPLETION, IO_COMPLETION, LOGMGR, WRITELOG, PAGEIOLATCH_x (in top 2)
    • Low bytes per sec
  • Blocking
    • Block percentage (>2%)
    • Block process report (30 sec)
    • Average Row Lock Waits (>100ms)
    • LCK_M_% (in top 2)
    • 5, High number of deadlocks (>5 per hour)
  • Network
    • High network latency + many round trips to DB (Output queue length >2)
    • Network bandwidth used up (Packets Outbound Discarded, Packets Outbound Errors, Packets Received Discarded, Packets Received Errors)

Troubleshooting Performance Problems in SQL Server 2005
https://download.microsoft.com/download/1/3/4/134644fd-05ad-4ee8-8b5a-0aed1c18a31e/TShootPerfProbs.doc

Diagnosing Transaction Log Performance Issues and Limits of the Log Manager
https://sqlcat.com/technicalnotes/archive/2008/12/09/diagnosing-transaction-log-performance-issues-and-limits-of-the-log-manager.aspx