Activation Filter - New Options in Beta 2

Activation Filters give the administrator of a Windows HPC cluster a mechanism to verify and modify jobs before they run. For example, a Windows HPC Server 2008 Activation Filter receives the job XML, which might contain license requests. The filter could parse the XML, find the license requests, and determine whether the licenses are available. If they are, the filter returns 0 to tell the scheduler to run the job. A return value of 1 (or any other value) tells the scheduler to keep the resources allocated for this job and to call the Activation Filter again on each scheduling pass until the licenses are available. Unfortunately, holding the resources blocks other jobs that could use them and don’t need the licenses. That is the behavior of a First Come First Served scheduling algorithm, but it isn’t necessarily what the administrator wants in every circumstance.
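As a rough illustration of the license check described above, here is a minimal Python sketch of a filter that uses only the 0 and 1 return values. Several details are assumptions made for the example: that the filter is started with the path of the job XML file as its first argument, that license requests appear in a `SoftwareLicense` attribute of the form `feature:count,feature:count` on the root element, and that the `AVAILABLE_LICENSES` table stands in for a query to a real license server.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical license inventory; a real filter would query a license server.
AVAILABLE_LICENSES = {"solver": 4, "mesher": 1}


def parse_license_requests(job_root):
    """Return {feature: count} from the assumed 'SoftwareLicense' attribute,
    formatted like "solver:2,mesher:1", on the root element of the job XML."""
    requests = {}
    for item in job_root.get("SoftwareLicense", "").split(","):
        item = item.strip()
        if not item:
            continue
        feature, _, count = item.partition(":")
        # A feature listed without a count is treated as a request for one license.
        requests[feature.strip()] = int(count) if count.strip() else 1
    return requests


def activation_filter(job_root, available=AVAILABLE_LICENSES):
    """Return 0 to run the job, or 1 to keep it waiting with its resources."""
    for feature, count in parse_license_requests(job_root).items():
        if available.get(feature, 0) < count:
            return 1
    return 0


def main(argv):
    # Assumption: argv[1] is the path of the job XML file.
    return activation_filter(ET.parse(argv[1]).getroot())


if __name__ == "__main__" and len(sys.argv) == 2:
    # The process exit code is the value the scheduler sees.
    sys.exit(main(sys.argv))
```

A job that requests no licenses passes straight through with 0, so the filter only delays jobs that actually need a license.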

With Windows HPC Server 2008 R2 beta 2, we are introducing additional return codes from the Activation Filter and a hold-until property on jobs to provide additional flexibility. The return codes in beta 2 are:

0 – run the job (same as Windows HPC Server 2008)

1 – do not run the job. Keep the resources and don’t run anything else until this job starts (same as Windows HPC Server 2008)

2 – do not run the job. Keep the resources allocated for this job, but other jobs may be started on other resources.

3 – do not run the job and do not reserve any resources. The job is put on “hold” for a period of time, and the scheduler will not attempt to schedule the job again until the hold time has passed.

4 – fail the job
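To show how a filter might choose among these five values, here is a short Python sketch. The numeric values come from the list above; the enum member names, the `choose_result` function, and its decision policy (fail a request that can never be satisfied, otherwise fall back to a configurable "busy" result) are hypothetical and only one of many possible policies.

```python
from enum import IntEnum


class FilterResult(IntEnum):
    """Beta 2 return codes; the member names are invented for this sketch."""
    RUN = 0           # run the job
    BLOCK_QUEUE = 1   # keep resources, run nothing else until this job starts
    RESERVE = 2       # keep resources, let other jobs use other resources
    HOLD = 3          # reserve nothing, retry after the hold time
    FAIL = 4          # fail the job


def choose_result(needed, free, total, when_busy=FilterResult.HOLD):
    """Pick a return code from license counts (hypothetical policy)."""
    if needed > total:
        # The request can never be satisfied, so waiting is pointless.
        return FilterResult.FAIL
    if needed <= free:
        return FilterResult.RUN
    # Licenses exist but are in use; the administrator decides how to wait.
    return when_busy
```

Because `IntEnum` members are integers, the chosen result can be passed directly to `sys.exit` as the filter's exit code.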

 

Option 2 is relatively straightforward. As with return code 1, the job is not run, but in this case other jobs can run if other resources are available.

Option 3 is a new feature we think will be valuable. It allows the job to be held for some amount of time. No resources are reserved, and the scheduler will not attempt to schedule the job until the hold-until time has passed. There is a default cluster-wide value for the hold interval. The default is 15 minutes, but the cluster administrator can change it. The scheduler adds the default hold interval to the current time to get the hold-until time, which is then set on the job. The Activation Filter can also set the hold-until time to a specific value by using the API. The administrator can even set a hold-until time on a job manually by using the Command Line Interface.
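The hold-until arithmetic described above can be sketched in a few lines of Python. This models only the calculation, not the scheduler itself; the function names are hypothetical, and treating a job as eligible exactly at the hold-until time (rather than strictly after it) is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Default cluster-wide hold interval from the text; administrators can change it.
DEFAULT_HOLD_INTERVAL = timedelta(minutes=15)


def compute_hold_until(now, interval=DEFAULT_HOLD_INTERVAL, explicit=None):
    """Return the hold-until time for a job whose filter returned 3.

    An explicit value (set by the filter or by the administrator) wins;
    otherwise the hold interval is added to the current time."""
    if explicit is not None:
        return explicit
    return now + interval


def is_schedulable(now, hold_until):
    """A job with no hold-until time, or one that has passed, may be scheduled."""
    return hold_until is None or now >= hold_until
```

For example, a job held at 12:00 with the default interval would not be considered again until 12:15.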

Option 4 allows the filter to fail the job without running it.

These new options and the hold feature provide more flexibility for cluster administrators and more options for license allocation.