First, a quick introduction. My name is Boris Baryshnikov and I'm a program manager in SQL Server Database Engine group. As of today, Resource Governor is the most recent large project I worked on. Not surprisingly, I will talk about the Resource Governor here.
While Resource Governor spans multiple components of SQL Server Engine, it seems logical to discuss the feature in this blog as the resource management in general is so close to the scope of SQLOS.
Without further delay, let's get to point. As we have recently released a public CTP (Community Technology Preview) of SQL Server 2008 (aka SQL Server 2008 November CTP), which has Resource Governor functionality, and a number of presentations with Resource Governor demo on various conferences, I started getting a lot of questions about the demo itself which made me write this post.
We created this demo when Resource Governor was only a prototype to illustrate the concept, but probably its simplicity made it so attractive that it was included in a number of talks and demos by different people all over the world.
The Demo in a Nutshell
In the demo we will create two workload groups which share a common resource pool and one workload group which has its own dedicated resource pool. Each group gets a CPU intensive workload. Using the Resource Governor you will be able to affect CPU distribution between these competing workloads. The effects of the resource management are observed using performance counters specific to Resource Governor.
Step 1: Initial Demo Setup
On a newly installed server (i.e. no prior Resource Governor configuration) you may need to run these setup steps. I normally doing my demo on a dual core laptop and for the sake of simplicity I'm using a single CPU for SQL Server. To do so, I adjust CPU affinity mask as follows:
Using 1 CPU for SQL Server on a dual proc machine has an interesting side-effect: we normalize "CPU usage %" counter to number of CPUs on the box and thus, the values will hover around 50% as maximum and not 100% as you might expect. I will illustrate this below.
Of course you can use both CPUs but this complicates the demo. This is a subject for whole another post - what happens when you have multiple CPUs.
In addition to that for demo purposes I will set min/max server memory to a fixed value, since it will improve predictability of the demo on the laptop.
Step 2: Workload groups and Resource Pools
Now we will be setting up the following hierarchy of workload groups and resource pools:
Each corresponding workload group contains queries of the corresponding class or department (i.e. Marketing, Adhoc, and VP). Note that, Marketing and Adhoc queries share the same resource pool, while workload group VP has its own similarly named pool. The reason of such separation will become clear as we see how we adjust Resource Governor controls.
You will start building the above configuration in a bottom up manner (i.e. starting from pools and going up)
To do so, we execute the following T-SQL:
Step 3: Classification
Now, what you have just done is created hierarchy of the groups and pools, however, how does the server know about which query goes where? This is where classification comes in. The above picture becomes:
There is a couple of things:
- To do the classification you will need to create a user-defined function that will be executed for every new connection and it will place these new connections in the corresponding workload groups.
- How will we separate different connections? For demo purposes we will use 3 separate login names which we will check and use inside of the function
To implement the above 2 steps we will run the following:
Step 4: Are we there yet?
After all this work, can we start workloads and see what happens? The answer, as you have guessed by the question is - no. What's left? Again, a couple of steps:
- We need to tell Resource Governor to use the function that we just created
- Make all the changes effective
First step is done by
For the second step, let's compare output of catalog views with in-memory information (note difference in names of catalog views and dynamic management views (DMVs) which are prefixed with dm_:
Now transfer changes from metadata to memory by running the following statement. Also, do not confuse it with already existing RECONFIGURE command:
And rerun the above query on metadata and DMVs and you should see that new groups, pools and classifier function ID are present in corresponding DMVs.
Step 5: Running the workloads
The easiest way to simulate a CPU intensive workload is to run the following in a loop:
Also, instead of running this query from the Management Studio, consider saving it in a file and running from a command prompt by using a script similar to the below. Note that we are using 3 different user names to connect to the server.
To observe the effects of the load, add the following performance counters in the perfmon:
- We will monitor CPU usage per group in the 1st instance of perfmon; add "SQLServer:Workload Group Stats object", "CPU usage %" counter for "GroupMarketing", "GroupAdhoc" and "GroupVP" instances
- We will monitor CPU usage per pool 2nd instance of perfmon, add "SQLServer:Resource Pool Stats object", "CPU usage %" counter for "PoolMarketingAdhoc" and "GroupVP" instances
Before you start the next workload, observe the counters for pools and groups for a number of seconds, you should see approximately the following:
For groups (click on the image to open in a new window):
Few things to note:
- Remember I set affinity mask to 1? This is why the maximum on the figures is only 50% (it is for a single CPU while usage is normalized to all CPUs on the machine, 2 in my case)
- At point A, I started Marketing workload, CPU usage went to the maximum on the CPU for a single workload, 50% in this case
- At point B, I started VP workload, which, as you remember, belongs to a separate resource pool and it made Marketing group share half of the CPU with VP group. The same is true for the pools (because we have 1 to 1 match of active requests in groups to pools at this point)
- At point C, I started Adhoc workload, which has its own group but shares the pool with Marketing workload. What happened here on the groups is that all 3 of them are now sharing the CPU getting approximately 1/3 of it or roughly 17%. On the pool side, however, we can see that MarketingAdhoc pool which shows aggregate resource usage by all groups inside of it, has 2/3 or the CPU while PoolVP has only 1/3. This behavior is exactly what you get on SQL Server 2005. Resources are distributed as they are being requested and whoever needs more CPU simply gets it. This was one of the goals: to make behavior as close as possible to SQL Server 2005 when you do not use the Resource Governor or adjust any parameters of it. Note, however, by creating the groups and pools you are already slightly altering the behavior of SQL Server 2008, but more on this in the next few posts.
Step 6: Management Actions
Now we came to the point where we want to apply action to change the above picture. Specifically, we want our VP workload to proceed faster and thus, limit CPU usage by Marketing and Adhoc workloads to 50% of the CPU.
To do this, we alter the PoolMarketingAdhoc using the following syntax (remember, we created the pool using all default parameters):
Remember to make changes effective:
Now, let's look at the counters:
What you will see is what happened at point D: Pool usage by PoolMarketingAdhoc went back to half of the CPU while PoolVP took the remaining part of it. Note that, on group side GroupVP usage went up to use half of the CPU (it has a single workload) while groups Marketing and Adhoc divide another half equally (12.5% each).
Further step is to alter IMPORTANCE parameters of the Marketing and Adhoc groups. IMPORTANCE affects CPU distribution when two groups share the same pool. Value of importance has a numeric meaning for CPU bandwidth distribution. Ratio of Low:Medium:High IMPORTANCE is equal to 1:3:9.
To illustrate this, let's run the following:
When changes become effective corresponds to the point E on the above graphs. Note that distribution between pools does not change, amount of CPU available to GroupVP also does not change (there are no other groups in this pool), but GroupMarketing takes about 10 times of the bandwidth available to GroupAdhoc (remember the ratio of Low:High = 1:9, but together they stay within PoolMarketingAdhoc limits.
That's enough for the first post. In the upcoming post I will try to go over the cases when the demo does not seem to work or shows "unusual" behavior and possible explanations of it.