With automation getting increasing focus in our stack and in datacenter/cloud discussions, we figured it would be interesting to take a step back and look at some of the use cases for automation. These apply to any automation engine, but happen to be easier to create and maintain with the Microsoft automation stack. Beyond words, this blog post series will feature sample Runbooks to showcase what we are really talking about, and some of these are being made available as part of a download on TechNet Gallery.
Before diving in, I guess it’s also important to set expectations: this series should mainly benefit those of you who are starting your automation journey and are wondering about use cases which may benefit your organization. If you’re a seasoned Orchestrator user, you will probably not learn a lot from this series, except maybe a few tips and tricks along the way.
The business case for automation
While not the main focus of this blog post, it might be worth mentioning some of the reasons why you would want to automate tasks. Who knows, your CIO might be asking you what the fuss is all about and what the benefits would be. So, in general, automation achieves the following:
– Integrate siloed environments and processes, leading to more agility and better service delivery performance and reliability
– Automate recurring manual tasks : This helps minimize costs, lets operations teams focus on more valuable – and sometimes more interesting! – tasks, and reduces error-prone manual activities
– Standardize and document processes : The combination of technology integration and the reduction of manual tasks helps standardize processes, and enhances service delivery predictability
Yes, I know, definitely a lot of big words in only three sentences… So let’s go through a few use cases, to see which ones may resonate better in your specific situation.
And remember: at the end of the day, the goal will not be to automate everything. If you look at a classic 80%/20% rule, where a small recurring set of tasks tends to add the most churn or work, what you will want to do is identify a handful of items worth automating, and start from there. It then becomes a virtuous circle where the time you freed can be used to focus on getting other things done… or automating more stuff!
What are the use cases we are going to cover in this series?
Through this 5-part series, each post will cover a specific use case. A purely subjective categorization is below, as shown in the Orchestrator console.
The full table of contents for this series follows:
- #1 : Alert Remediation, where automation is used to monitor specific situations, and react automatically. This is also the introduction post to the series (this post!)
- #2 : Maintenance tasks, where recurring tasks are being handled in a consistent and automated manner, triggered manually or when a specific condition is met
- #3 : Provisioning and Change Management Automation, where automation handles backend processing or user requests from a service catalog or any provisioning process worth automating in your context
- #4 : Cross-technology integration. Here automation can be used to integrate otherwise siloed technologies, or help in “better together” and migration scenarios (integrate a monitoring solution into a manager of managers, or into a ticketing solution)
- #5 : “Miscellaneous” scenarios, such as dynamic resource allocation and new user onboarding
Most Runbooks presented in this series are being made available as part of this download.
Note : These Runbooks should be mostly considered “design samples” as they are here to illustrate the use cases. More specifically:
- Runbooks which have been tested and sanitized specifically for use in another environment : “Alert Remediation” (Folder #1)
- Runbooks which used to run in my current demo environment or in a previous environment, meaning they should be fairly close to what could be implemented, activities properties included : “Service Catalog and Provisioning” (Folder #3), “Dynamic Resource Allocation” (Folder #5), “Line of Business and Others” including the new user onboarding scenario (Folder #6)
- Runbooks which are mostly there to illustrate the use cases, and may not have all the activities configured yet : “Maintenance Tasks” (Folder #2), “Cross-Technology Integration” (Folder #4)
Use Case #1 : Alert Remediation
Today, this post #1 will be about the “Alert Remediation” use case, where automation is used to monitor specific situations, and react automatically. The logic is: “If someone goes through a predefined set of steps to try to solve the issue, and moreover if this happens a lot and consumes a fair amount of time or manpower, you might as well try to automate it”. Even if only a few steps of a decision tree can be automated before a human being looks at the data and makes an informed decision, automating might be worth it. Two use cases will be covered to illustrate this: a classic free space issue on application servers, and dealing with Active Directory machine authentication failures from a central location.
Scenario #1 : Free space issues on application servers
Note – This Runbook sample can be downloaded here
Let’s take an example, where managing disk space takes a lot of time on a specific set of application servers. When a disk is low on space, resolution steps might be well documented for the operations team (often in a document that is, well, ironically, sometimes called a “runbook”!).
Transposing such a process into an automation solution like Orchestrator is quite easy, and would look like this in the designer, as an Orchestrator Runbook:
Going into the basics of designing Runbooks is not the core of this post, but the different building blocks are called “activities”, and all the ones used here come either out of the box (“standard activities”) or as a download off the Microsoft website (to integrate with other System Center components for example).
In this example, disk free space is being monitored using System Center 2012 Operations Manager. Since Operations Manager is a central monitoring solution, the nice thing here is that Orchestrator is just polling the Operations Manager server and not every agent in the environment. The pattern to look for in Operations Manager alerts can easily be defined in the activity properties, and knowing which alert name to enter can be achieved by looking at an actual alert in Operations Manager.
The “Delete Files” activity would be reaching out to the affected servers, to delete specific files, with an optional “age filter”. In this Runbook, the path on the remote machine is found in a variable, but could be hardcoded or queried from an application configuration item in a CMDB…
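To make the “age filter” idea concrete outside of the designer, here is a minimal Python sketch of the same logic. It is not how the Orchestrator activity is implemented; the function names, the `*.log` pattern and the 7-day default are assumptions chosen for illustration only.

```python
import time
from pathlib import Path


def find_old_files(folder, pattern="*.log", min_age_days=7, now=None):
    """Return files in `folder` matching `pattern` whose last modification
    is at least `min_age_days` days old (the "age filter").
    Names and defaults are hypothetical, not Orchestrator settings."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds in a day
    return sorted(
        p for p in Path(folder).glob(pattern)
        if p.is_file() and p.stat().st_mtime <= cutoff
    )


def delete_old_files(folder, pattern="*.log", min_age_days=7, dry_run=True):
    """Delete the matching files and return them.
    With dry_run=True (the default) nothing is deleted, only reported."""
    targets = find_old_files(folder, pattern, min_age_days)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Defaulting to a dry run is a deliberate choice here: when automating deletions, it usually pays to review what would be removed before letting the process run unattended.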
Skipping a few activities – since there is a lot to cover in this post! – you can see that, when the Runbook is able to restore free space over the threshold, it can also close the original alert.
This step is actually optional in the case of Operations Manager, since Operations Manager would auto-resolve the alert. One benefit of doing it is to add custom data in the alert properties, to provide background information for operations (these fields could even be displayed in Operations Manager views; we’ll see more of that in the next example).
Finally, automation is also about an end-to-end process and bringing consistency (I really meant it in the introduction). So, assuming it cannot restore enough free disk space, the Runbook would notify the right team and open a ticket in the right ticketing system. Depending on the solution you use for ticketing and
The Runbook in action:
Assuming a new alert just came up…
…the Runbook waiting for this type of condition processes the new alert, while a new instance is being spawned to wait for future alerts…
When running, this Runbook goes through this branch…
…and then it resolves the alert
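The branching in this scenario boils down to one check after the cleanup step: is free space back over the threshold or not? A rough Python sketch of that decision follows. The 10% threshold and the step names are assumptions made for illustration, not values taken from the sample Runbook.

```python
import shutil


def free_space_percent(path):
    """Percentage of free space on the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def next_step(free_percent, threshold_percent=10.0):
    """Pick the branch to follow after the cleanup.
    The threshold and the returned step names are hypothetical."""
    if free_percent >= threshold_percent:
        # Enough space was restored: the original alert can be closed.
        return "close_alert"
    # Cleanup was not enough: hand over to a human through a ticket.
    return "notify_and_open_ticket"
```

For example, `next_step(free_space_percent("."))` evaluates the volume holding the current directory.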
Scenario #2 : Active Directory machine authentication failures
To be fair, this second scenario (Active Directory machine authentication failures) might not be the top candidate I’ve seen for automation, but looking at what it would look like as a Runbook brings a few interesting twists.
The overall Runbook could look something like this:
In a nutshell, the idea would be to monitor authentication failure alerts (event 5805 in the System log on a domain controller) and then execute a “netdom” or “nltest” command to reset the secure channel (a command your Active Directory administrators are likely already familiar with). The Runbook also checks if the machine is actually online and, if needed, tries to use the iLO integration pack to start it before trying to reset the secure channel. If any of these fail, an incident could be created.
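The decision tree just described can be sketched in a few lines of Python. Every callable passed in (`is_online`, `power_on`, `reset_secure_channel`, `create_incident`) is a hypothetical stand-in for the corresponding Runbook activity, so this only illustrates the order of the checks, not the actual Runbook.

```python
def remediate_auth_failure(machine, is_online, power_on,
                           reset_secure_channel, create_incident):
    """Walk the remediation decision tree for one machine.
    All four callables are hypothetical placeholders for activities
    (ping, iLO power-on, netdom/nltest reset, ticket creation)."""
    if not is_online(machine):
        # Try to start the machine, then check again that it answers.
        if not power_on(machine) or not is_online(machine):
            create_incident(machine, "machine unreachable")
            return "incident"
    if reset_secure_channel(machine):
        return "resolved"
    create_incident(machine, "secure channel reset failed")
    return "incident"
```

Passing the steps in as functions keeps the flow readable and lets you try it with fake steps before pointing it at real systems.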
A few notes and tips/tricks along the way:
1. The “Classify Alert” is actually an Operations Manager “Update Alert” activity. It does not play any role in the remediation itself; its goal is just to populate custom fields in the Operations Manager alert.
That way, views in Operations Manager could be used to categorize and delegate access to these alerts (e.g. “listing all open alerts from the last 24 hours, with customfield2 set to ‘AD’ and customfield3 set to ‘AUTH’”). This is a common feature request for Operations Manager, and Orchestrator does a great job in helping to achieve this.
2. PowerShell can be used to parse the output of the previous “ping” command, and pass the right information to the iLO connection. While PowerShell is not mandatory for this activity (you could use a combination of data manipulation activities built into the standard activities or available as community integration packs), PowerShell usually provides a nice way to achieve these items in a single activity with only a couple of script lines. Plus, as you saw, PowerShell is a key pillar of our automation story moving forward with Service Management Automation (SMA), so I would recommend using PowerShell activities as much as you can (more information on SMA itself can be found in this blog series from my peers Charles and Jim).
3. Just in case you are wondering, yes, you can achieve a similar scenario for hardware other than HP. For Dell and IBM hardware, you could use command lines provided by these hardware vendors, and Cisco also provides an integration pack.
4. By the way, regarding the scenario itself, the specific alert to look for was monitored out of the box in previous versions of the Active Directory management pack for Operations Manager, but the rule has been deprecated and changed to a report collection rule. You could easily add back a custom rule to look for event 5805 if you want to achieve this.
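To illustrate tip 2, here is what parsing “ping” output can look like. The Runbook does this in a PowerShell activity; the sketch below uses Python only to show the idea, and it assumes the English output format of the Windows ping command (a successful reply line containing “bytes=” and “TTL=”). The function name and the sample lines are made up for illustration.

```python
import re

# A successful echo reply on English Windows looks like:
#   Reply from 10.0.0.5: bytes=32 time<1ms TTL=128
# "Destination host unreachable" replies carry no bytes=/TTL= part.
_REPLY = re.compile(
    r"Reply from (?P<ip>[\d.]+): bytes=\d+ .*TTL=\d+", re.IGNORECASE
)


def parse_ping(output):
    """Return (online, ip) from raw ping output.
    Assumes English Windows ping text; other locales need another pattern."""
    match = _REPLY.search(output)
    if match:
        return True, match.group("ip")
    return False, None
```

Looking for the “TTL=” part rather than just “Reply from” matters, because an unreachable host can still produce a “Reply from” line sent by a router or by the local machine.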
Thanks for reading this post, I hope you found it useful! Next time, post #2 will cover how automation can help with maintenance tasks, triggered manually or when an external condition is met. Specific examples I will cover are “advanced” patching (executing pre-flight and post-patching checks, restarting servers in the right order,…) and SQL Server maintenance tasks.