SQL Server 2008 SQLOS

Article
03/30/2009

Here are some notes on “SQL Server 2008 SQLOS” I took while attending an advanced class on SQL Server taught by Gert Drapers (from https://dbproj.com/ and https://blogs.msdn.com/gertd/).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.

Processes

Processes (context, access token, virtual address space descriptors, handle table)
SQL is a user mode process
Threads (unit of execution, access token, shares process address space, can synchronize)
Threads can have the same access token as process, or another one (impersonation)
Every process starts with one thread - “main”
Context switches
Fibers (lightweight thread, need a thread to schedule fibers, fibers share same thread context)

Scheduling

Pre-emptive scheduler (Windows decides when and how long you run)
See https://windowsitpro.com/Articles/Index.cfm?IssueID=22&ArticleID=302
Windows Internals Book
See https://technet.microsoft.com/en-us/sysinternals/bb963901.aspx

Hardware

Defining: Processor, CPU, Socket, Core, Logical Processor, NUMA, NUMA node
Example: Notebook, dual-core, hyper-threaded = 2 cores, 4 logical processor
Windows has 64 threads of execution in Windows Server 2008 – R2 will change that.
NUMA node – group of logical processors and cache that are “near each other”
SMP – Symmetric Multi-Processing, Front Bus to access memory, contention
SMP layout – CPU, Northbridge (memory, graphics), Shouthbridge (PCI)

NUMA

NUMA = Non-Uniform Memory Access / Architecture, ccNUMA
NUMA nodes – contain own RAM, CPUs, memory controller, talk to other nodes
Goal: minimize/eliminate front-bus contention
Performance penalty to access memory in other nodes (foreign node)
Applications need to be NUMA-aware to take advantage
Soft-NUMA – try out how things behave – only performs on NUMA hardware
Support for NUMA – BIOS setting: Node interleaving disabled = NUMA
How does the OS knows? ACPI and SRAT – Static Resource Affinity Table
If Multiple nodes > 1 – You have NUMA, Number of procs – actual sockets populated
Thread scheduling with NUMA (how “ideal processor” is chosen)
Important for the app to know the nodes and available memory on each
https://www.microsoft.com/whdc/archive/numa_isv.mspx

Windows Server 2008 and NUMA

Optimizations for NUMA, like dividing kernel’s non-paged memory across nodes
Hardware has evolved and the OS added constructs to help apps
See https://msdn.microsoft.com/en-us/library/aa363804.aspx

Windows Server 2008 R2

Support for greater than 64 logical processors
See https://code.msdn.microsoft.com/64plusLP
Groups of CPUs, backward compatible with 64-bit affinity
New APIs to use groups, set the affinity mask and use more than 64 LPs
Nehalem System Architecture – See Intel QuickPath Architecture
See https://www.intel.com/technology/quickpath/demo/demo.htm
Example: HP – 64 dual-core Itanium
Example: Unisys – 32 dual core, hyper-threaded Xeon

SQL Server Scheduling

SQL Server 6.x scheduling – multi-threaded – worker thread pool – OS pre-empts threads
SQL Server 7/2000 – User mode schedulers, 1 per LP – UMS work queues – fewer content switches
- Users sessions assigned to UMS scheduler, round robin, no re-balancing
- Bulk Insert is single threaded, not in parallel within the same scheduler
- Memory: memtoleave, threads, bpool – simple calculations to divide the memory
SQL Server 2005/2008 – SQLOS

SQLOS

User mode layer between SQL Server and the Windows OS.
Thread mode or fiber mode. Default is thread.
See https://msdn.microsoft.com/en-us/library/ms189248.aspx
Scheduling/memory/buffer pool/hosting/exceptions/locks/latching
SQLOS Diagram – Protocols/Query Compilation and Execution Engine/Storage Engine/SQLOS
SQLCLR, Reporting Services are hosted process in SQL OS
NUMA – SQLOS directly maps to the hardware architecture, understands NUMA
NUMA – SQLOS (only one) / SOS_MemoryNode / SOS_CPUNode / SOS_Scheduler / SOS_Task
NUMA – Performance counters on per-node basis
SMP – Only one SOS_MemoryNode
See https://blogs.msdn.com/slavao/archive/2005/02/05/367816.aspx
See https://blogs.msdn.com/slavao/articles/441058.aspx
See https://www.redbooks.ibm.com/redpapers/pdfs/redp4093.pdf
Hot-add CPU? Depends – Windows Server 2003 or 2008, SQL Server 2005 or 2008
I/O Port – Per SOS_CPUNode
See https://blogs.msdn.com/slavao/archive/2006/04/12/575185.aspx

SQLOS Architecture

SOS_OS – Singleton – Manages nodes, system level information
SOS_Node – 1 per NUMA node (or 1 in SMP) – Memory node, Scheduler Management
SOS_Scheduler – CPU abstraction, binds tasks and worker threads, manages affinity
Affinity. See https://msdn.microsoft.com/en-us/library/ms187104.aspx
SOS_Tasks – Executes request from user, executes on a worker, abortable, timer tasks

Scheduler

Queue : Runnable / Pending / Timer / IO
Worker Pool, Current Work, Idle Worker, Wake Event, Abort Tasks List
States – New, Pending, Runnable, Running, Suspended, Preemptive, Done, Monitor
Non-preemptive – task runs until – yield, wait on sync. object, quantum expires
Preemptive – threads not in SQLOS control – external code – wait stat skewed
I/O completion – Worker on I/O completion port
See https://msdn.microsoft.com/en-us/library/aa365198.aspx
Scheduler Monitor – Preemptive/long running tasks, non-yielding, deadlock, maintenance
Dedicated Admin Connection – DAC – Only one, reserved memory at startup, own scheduler
DAC: Connection using sqlcmd.exe –a or server name = admin:servername
Backup– also has its scheduler
Ring buffer – Key construct for communication between multiple producers/consumers

DMVs

SELECT * FROM sys.dm_os_sys_info
SELECT * FROM sys.dm_os_schedulers
SELECT * FROM sys.dm_os_workers
SELECT * FROM sys.dm_os_tasks
SELECT * FROM sys.dm_os_waiting_tasks
SELECT * FROM sys.dm_os_wait_stats
- Look at wait_time_ms – signal_wait_time_ms, not just wait_time_ms
- Clear waits with dbcc sqlperf('sys.dm_os_wait_stats',clear)
- More detailed wait stats – In SQL Server 2008 with XEvents
SELECT * FROM sys.dm_os_threads
SELECT * FROM sys.dm_io_pending_io_requests
See https://msdn.microsoft.com/en-us/library/ms176083.aspx

OS Memory Management

Windows provide virtual memory services (except for AWE and PAE)
32-bit gives you up to 4GB – 64-bit, in theory, up to 16 EB
Three buckets: committed memory, reserved memory (not physically used) and free memory
Committed: working set (RAM), paged out (pagefile) and mapped (shared components, DLLs)
If you try to access reserved or free memory: exception

Memory Concepts

Allocation: Reserve with VirtualAlloc() in 64KB chunks, allocation in 4KB/8KB chunks
See https://msdn.microsoft.com/en-us/library/aa366887(VS.85).aspx
Memory Limits for Windows - See https://msdn.microsoft.com/en-us/library/aa366778(VS.85).aspx
/3GB – Instead of a 2/2 split for the 4GB, uses a 3/1 split – more pressure on kernel memory
/PAE – Access to memory between 4GB and 64GB – 36 bits to address memory
AWE – How an application can reach beyond the 4GB limit, when using /PAE
See https://msdn.microsoft.com/en-us/library/aa366796(VS.85).aspx
Case: Comparing 32bit, 32-bit (AWE), 32-bit (/3GB), 32-bit (AWE/3GB) and 64-bit
WOW – Running 32-bit applications on 64-bit Windows
See https://msdn.microsoft.com/en-us/library/aa384209(VS.85).aspx
64-bit – 16TB total, 8TB for kernel, 8TB application.

SQL Server Memory

32-bit: MemToLeave, thread stacks, page cache, plan cache, query workspace, locks, SQL Server, OS
64-bit: No MemToLeave, more thread stacks space
Lock pages in memory – Only EE – See https://support.microsoft.com/kb/918483
See https://blogs.msdn.com/psssql/archive/2007/10/18/do-i-have-to-assign-the-lock-privilege-for-local-system.aspx
Large pages in 64-bit – Only EE – T834 – See https://support.microsoft.com/kb/920093
Case: Running 64-bit, EE and 16 GB – By default, you are not running with large pages
Careful – Large pages can lead to increased memory fragmentation, set SQL memory to a fixed size
See https://msdn.microsoft.com/en-us/library/ms143432.aspx
Calculating SQL Server max memory
See https://msdn.microsoft.com/en-us/library/ms178067.aspx
Configuring Windows pagefile – Minimum size in x64 is 8GB. Physical memory + 4GB
SQL Server will swap to page file if – Not 64-bit, Not using EE, not setting “lock pages in memory”
See https://support.microsoft.com/kb/889654

SQL Server Memory Management

Memory Cycle: Memory broker / resource monitor / cache / heap / memory allocation
Components: SQLSO / Memory Node / SOS_Node / Memory Clerk / Memory Object
Components: Buffer Pool / Resource Monitor / Caching Framework / Memory Pools and Brokers
Allocators: Single Page / Multi-Page / Large Page / …
See https://support.microsoft.com/kb/907877
Buffer pool: shared, every node has a piece of it.
Buffer pool: reserved up front, committed/mapped on demand. Has a clerk, but not an allocator
Stolen pages: piece of buffer pool “borrowed” for other purposes, like procedure cache
Memory clerk: The actual page allocator, the one that actually gives you the memory
Allows us to know where the memory is going, caches are memory clerks too
Memory objects: memory allocators for arbitrary size
Caching framework – common way to build caches
Cache Store – User Store – Clock Algorithm – Clock Hands – Clock Entry Info

DMVs

DBCC MEMORYSTATUS – A good starting point – Overview on a per node and per clerk details
See https://support.microsoft.com/kb/271624
sys.dm_os_memory_clerks – multiple types, per node – By default not fully populated
See https://technet.microsoft.com/en-us/library/ms175019.aspx
sys.dm_os_memory_objects
See https://msdn.microsoft.com/en-us/library/ms179875.aspx
sys.dm_os_memory_pools
See https://msdn.microsoft.com/en-us/library/ms175022.aspx
sys.dm_os_memory_cache%
sys.dm_os_ring_buffers – Documented via KB articles
See https://support.microsoft.com/kb/920093

Resource Monitor

Monitors a set of processes and indicators, using a simple state machine
Also Important when you change things like max memory, affinity mask, etc.
Runs per node, on its own hidden scheduler, non-preemptive.
Sends notifications to clerks – “Hey! We’re running low on resources!”
Monitors: Low physical resources / Low Virtual Address Space / High physical resources
Reactive or Proactive: Memory node fails to allocate 4MB / Resource monitor itself tries to allocate 4MB
Internal pressure: Example: SHRINK notification from broker / Cache over 75% of target
External pressure: Example: Signaled by OS via events – LowMemoryCondition
See https://msdn.microsoft.com/en-us/library/aa490194.aspx
Careful – Trend to consolidate multiple instances of SQL on a single, large box
See https://blogs.msdn.com/slavao/archive/2005/02/19/376714.aspx
Memory pressure:
See https://blogs.msdn.com/sqlprogrammability/archive/2007/01/16/9-0-memory-pressure-limits.aspx

Memory Broker

Dynamic memory distribution: buffer pool / optimizer / query execution / caches
Always running, tunes memory consumption
DMV: sys.dm_os_memory_brokers
See https://msdn.microsoft.com/en-us/library/bb522548.aspx

Books of interest

SQL Server 2005 Practical Troubleshooting
Ken Henderson & others
https://www.amazon.com/SQL-Server-2005-Practical-Troubleshooting/dp/0321447743
SQL Server 2008 Internals
Kalen Delaney & others
https://www.microsoft.com/learning/en/us/Books/12967.aspx

Bonus topic: Extended Events

Captures event information, like SQL Server Profiler, but more lightweight on capture
Managed using CREATE/ALTER/DROP EVENT SESSION
Very flexible way to determine which events are capture.
Can save to another database, file., later use ETW tools to view, correlate with Windows events
See https://msdn.microsoft.com/en-us/library/bb630319.aspx
See https://www.microsoft.com/whdc/DevTools/tools/EventTracing.mspx

SQL Server 2008 SQLOS

Additional resources