Power Management Solution in the Windows HPC Server 2008 R2 (SP1) Monitoring Management Pack


Summary

As part of the Green IT offering for Windows HPC Server 2008 R2 (SP1), we have enabled a power management solution in the Windows HPC Server 2008 R2 (SP1) Monitoring Management Pack. It provides two configurable rules: the “Calendar-based Power Management Rule” and the “Consumption-based Power Management Rule”.

· Calendar-based Power Management Rule

With the calendar-based power management rule, you can define a time period each day during which a specified portion of the compute nodes hibernate to save power. You can also restrict this policy to certain days of the week.

· Consumption-based Power Management Rule

With the consumption-based power management rule, the cluster utilization over a time period is evaluated together with the number of queued jobs to decide whether a portion of the compute nodes should hibernate to save power.

We have defined three cluster capacity levels. Each time the hibernate condition is met, the cluster moves from its current capacity level to a lower one; conversely, each time the wake-up condition is met, the cluster moves from its current capacity level to a higher one.
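The level transitions described above can be sketched as a small state machine. This is only an illustrative Python sketch, not the management pack's implementation: the level names follow the capacity-level parameters of the rule, while the function name `next_level`, the event strings, and the choice to move exactly one level per evaluation are assumptions.

```python
# Illustrative sketch only; all names here are hypothetical.
LEVELS = ["Low", "Medium", "High"]  # ordered from lowest to highest capacity


def next_level(current, event):
    """Return the capacity level after one evaluation.

    event is "hibernate", "wake_up", or None (no condition met).
    Assumption: each met condition moves the cluster by exactly one level.
    """
    index = LEVELS.index(current)
    if event == "hibernate" and index > 0:
        return LEVELS[index - 1]  # step down one level
    if event == "wake_up" and index < len(LEVELS) - 1:
        return LEVELS[index + 1]  # step up one level
    return current  # already at the boundary, or no condition met


if __name__ == "__main__":
    level = "High"
    for event in ["hibernate", "hibernate", "hibernate", "wake_up"]:
        level = next_level(level, event)
        print(event, "->", level)
```

In this sketch a cluster that is already at the lowest level stays there when the hibernate condition is met again, and likewise at the highest level for the wake-up condition.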

Configure the Rules

Both rules are disabled by default. After importing the Windows HPC Server 2008 R2 Monitoring Management Pack into the SCOM server, an administrator can easily enable and configure them.

Open “System Center Operations Manager”, go to the “Authoring” wunderbar, select “Rules”, and search for the keywords “Power Management”. The two rules are then listed as follows.

 

The following tables list the important settings that you can override for the two rules, along with their default values:

Calendar-based Rule

Parameter Name | Default Value | Notes
Enabled | False | The rule is disabled by default.
Start Time | 0:00 | The time each day when power-saving mode for compute nodes starts.
End Time | 6:00 | The time each day when power-saving mode for compute nodes ends.
Exclude Days | (blank) | A list of days each week when compute nodes are excluded from entering power-saving mode, in a format like “Saturday, Sunday”.
Power On Percentage | 70 | The percentage of compute nodes that remain powered on during power-saving mode.
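To make the interaction of these parameters concrete, here is a minimal Python sketch of how a calendar-based decision could be evaluated. It is not the rule's actual implementation. The function names are hypothetical, and two behaviors are assumptions that the parameter descriptions do not settle: exclusion is decided by the weekday of the current time, and the number of powered-on nodes is rounded down.

```python
from datetime import datetime, time

# Index matches datetime.weekday(), where Monday is 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def in_power_saving_window(now, start, end, exclude_days=()):
    """Return True if `now` falls inside the daily power-saving window."""
    # Assumption: exclusion is decided by the weekday of `now` itself.
    if DAY_NAMES[now.weekday()] in exclude_days:
        return False
    current = now.time()
    if start <= end:
        return start <= current < end
    # The window crosses midnight, for example 22:00 to 7:00.
    return current >= start or current < end


def nodes_to_hibernate(total_nodes, power_on_percentage):
    """Return how many compute nodes hibernate during the window."""
    # Assumption: the number of powered-on nodes is rounded down.
    nodes_on = total_nodes * power_on_percentage // 100
    return total_nodes - nodes_on


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 3, 0)  # a Monday, 03:00
    print(in_power_saving_window(now, time(0, 0), time(6, 0)))
    print(nodes_to_hibernate(4, 60))
```

For example, under the rounding assumption above, `nodes_to_hibernate(4, 60)` returns 2.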

Consumption-based Rule

Parameter Name | Default Value | Notes
Enabled | False | The rule is disabled by default.
HighCapacityLevel | 100 | The percentage of compute nodes that defines the high capacity level.
MediumCapacityLevel | 80 | The percentage of compute nodes that defines the medium capacity level.
LowCapacityLevel | 60 | The percentage of compute nodes that defines the low capacity level.
UpperQueueLength | 5 | The length of the job queue above which the rule can move the compute nodes to a higher capacity level.
LowerQueueLength | 1 | The length of the job queue below which the rule can move the compute nodes to a lower capacity level.
LowConsumption | 40 | The compute node consumption percentage below which the rule can move the compute nodes to a lower capacity level.
Number of Samples | 6 | The number of samples used to identify low consumption (LowConsumption) before the compute nodes can enter a lower capacity level. Samples are taken every “Interval Seconds”.
Interval Seconds | 300 | The sampling interval in seconds.
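The following Python sketch shows one way these thresholds could combine into a single evaluation. It is a sketch under stated assumptions, not the rule's actual logic: the function name `evaluate` and its return values are hypothetical, and the parameter descriptions do not say how the conditions are combined. Here it is assumed that a long queue alone triggers a wake-up, and that hibernation requires every one of the last N samples to be below LowConsumption while the queue is also shorter than LowerQueueLength.

```python
def evaluate(consumption_samples, queue_length,
             upper_queue_length=5, lower_queue_length=1,
             low_consumption=40, number_of_samples=6):
    """Return "wake_up", "hibernate", or None for one evaluation.

    consumption_samples: consumption percentages, oldest first,
    one per sampling interval (300 seconds by default).
    """
    # Assumption: a long job queue alone is enough to raise capacity.
    if queue_length > upper_queue_length:
        return "wake_up"

    recent = consumption_samples[-number_of_samples:]
    # Assumption: all of the last N samples must be below the threshold,
    # and the queue must be short at the same time.
    if (len(recent) == number_of_samples
            and all(sample < low_consumption for sample in recent)
            and queue_length < lower_queue_length):
        return "hibernate"

    return None  # neither condition is met; keep the current level


if __name__ == "__main__":
    print(evaluate([30, 25, 35, 20, 10, 15], queue_length=0))
    print(evaluate([30, 25, 35, 20, 10, 15], queue_length=6))
```

With the default values, six samples at a 300-second interval mean that low consumption has to persist for about 30 minutes before this sketch reports the hibernate condition.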

Power Saving Evaluation

To evaluate the power-saving efficiency and the impact on job throughput, we conducted the following experiment:

(1) Set up an HPC cluster with 1 head node, 1 broker node, and 4 compute nodes.

(2) Set up a job submission simulation for one typical working day as follows:

The job length is distributed as follows:

(3) Compare the power-saving efficiency and the impact on job throughput for the following three scenarios:

a. Disable the power management rules;

b. Enable only the calendar-based power management rule;

c. Enable only the consumption-based power management rule.

Note:

We slightly adjusted the configuration of both rules for the experiment:

· Calendar-based rule:

o Set StartTime to 22:00, EndTime to 7:00, PowerOnPercentage to 60%.

· Consumption-based rule:

o Set UpperQueueLength to 2.

Here are the experimental results:

· Power saving efficiency.

We evaluate power-saving efficiency as the number of hibernated nodes multiplied by the length of time they stay hibernated. With the calendar-based rule, 2 nodes hibernated from 22:00 to 7:00; with the consumption-based rule, 2 nodes hibernated from 21:00 to 9:00. Both rules saved power for the cluster, and the consumption-based rule saved more because its nodes stayed hibernated longer (12 hours versus 9 hours).

 

· Utilization on available cores

The consumption-based power management rule achieved the highest utilization of available cores (49.1%), followed by the calendar-based rule (47.1%) and no rule enabled (42.7%).

· Impact on job throughput

Job throughput measures the average number of jobs completed per hour over a day. The throughput is the same for all three scenarios.

· Impact on job turnaround

Job turnaround measures how long a job waits compared to how long it runs. Job turnaround increases only minimally with the consumption-based rule (from 0.436 to 0.437); it is the same with the calendar-based rule as with no rule enabled.
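The metrics used in these results can be sketched in a few lines of Python. The function names are hypothetical, and the exact formulas are assumptions: the text only states that power saving is the number of hibernated nodes multiplied by time, that throughput is completed jobs per hour, and that turnaround compares wait time with run time. In particular, the turnaround below is taken as total wait time divided by total run time, which may differ from how the reported values were computed.

```python
def saved_node_hours(hibernated_nodes, start_hour, end_hour):
    """Hibernated nodes multiplied by the hours they stay hibernated."""
    # The modulo handles windows that cross midnight (whole hours only).
    hours = (end_hour - start_hour) % 24
    return hibernated_nodes * hours


def throughput(completed_jobs, hours=24):
    """Average number of completed jobs per hour."""
    return completed_jobs / hours


def turnaround(jobs):
    """Ratio of total wait time to total run time.

    `jobs` is a list of (wait_time, run_time) pairs in the same unit.
    Assumption: the exact formula is not given in the text.
    """
    total_wait = sum(wait for wait, _ in jobs)
    total_run = sum(run for _, run in jobs)
    return total_wait / total_run


if __name__ == "__main__":
    print(saved_node_hours(2, 22, 7))  # calendar-based rule: 18 node-hours
    print(saved_node_hours(2, 21, 9))  # consumption-based rule: 24 node-hours
```

Applied to the hibernation windows reported above, the sketch gives 18 node-hours for the calendar-based rule and 24 node-hours for the consumption-based rule.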

Conclusion

Based on the above evaluation, the power management rules save power effectively for the cluster without a noticeable impact on job throughput or job turnaround. They can help you achieve the Green IT goal for your cluster.