Evaluating the "Reach" of Our Opalis Infrastructure at Microsoft

Hello readers. This is my first technical post, of hopefully many, on all things Opalis and Orchestrator! This product keeps me up at night (in a good way), so I should have plenty of interesting content as time goes on.

Challenge and Initial Questions

Considering the vast collection of systems that we support across the world here at Microsoft, it was clear that we needed to analyze what our initial Opalis 6.3.3 environment should look like. With various sites and bandwidths, some on high-latency links, we needed a way to determine how far our Opalis Action Server could reach into our infrastructure and still perform certain actions. The goal was to use a single Management Server co-hosting the SQL DB for Opalis, and a single, separate Action Server to manage our policy execution.

Answering the Challenge

So what better way to answer the question of reach than to use a workflow within Opalis to run scenario-based tests?

Main Orchestration Workflow

image

The main orchestration workflow (shown above) breaks out into the following components:

  • Scheduler: We set up a schedule so that the workflow fires every 4 hours. Running it automatically on a schedule lets us collect historical data that can be correlated later for more interesting trend analysis.
  • Table Creation: This task creates a status table (if it doesn’t exist) to hold the reach data that we gather as part of this analysis workflow.
  • Get Computers: This activity reads in an array of computer systems for processing by pulling a list of systems from a text file sitting on a share.
  • Get Ping and Service Status: This activity triggers the sub workflow that gathers our analysis data and logs that information in the status table created by the Table Creation activity above.
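To make the flow of the main workflow concrete, here is a minimal Python sketch of the same three steps: create the status table if it is missing, read the computer list from a text file, and hand each name to a per-computer worker. This is an illustration, not the Opalis workflow itself. It assumes SQLite in place of the SQL Server database, and the table name `reach_status`, its columns, and the function names are all hypothetical, since the post does not spell out the real schema.

```python
import sqlite3
from pathlib import Path

# Hypothetical schema: the real status table's columns are not given in the post.
# ping_ok is 1 for success and 0 for failure; latency_ms and service_state
# stay NULL when the ping fails; latency_delta_ms holds the variance
# against the previous run.
CREATE_STATUS_TABLE = """
CREATE TABLE IF NOT EXISTS reach_status (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    computer_name    TEXT NOT NULL,
    run_time         TEXT NOT NULL,
    ping_ok          INTEGER NOT NULL,
    latency_ms       REAL,
    service_state    TEXT,
    latency_delta_ms REAL
)
"""


def ensure_status_table(conn):
    """Table Creation: safe to call on every scheduled run."""
    conn.execute(CREATE_STATUS_TABLE)
    conn.commit()


def read_computers(path):
    """Get Computers: one computer name per line, blank lines ignored."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def run_main_workflow(conn, list_path, process_computer):
    """Run one pass; process_computer stands in for the sub workflow."""
    ensure_status_table(conn)
    names = read_computers(list_path)
    for name in names:
        process_computer(conn, name)
    return len(names)
```

The schedule itself is left out on purpose: in Opalis the Scheduler activity handles the 4-hour trigger, so a sketch like this would simply be called once per run.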

Ping and Service State Sub Workflow

image

Now to break out the sub workflow components (shown above).  This is where the heavy lifting happens!

  • Initiate Worker: This activity is a custom start object that holds a computer name from the list of computers gathered in the main workflow above.
  • Get Ping Data: This activity is a PowerShell script that pings the host we are evaluating and stores the result, along with the latency measured during that ping.
    • If the ping fails, the workflow goes directly to Log Data into Status Table and then moves to the next system.
    • If the ping is successful, the workflow moves to Get Service Status.
  • Get Service Status: This activity uses the computer name pulled from the Initiate Worker activity above and checks the state of a predefined service we are interested in (in our case, SMS Agent Host). The service state is passed on to the Log Data into Status Table activity.
  • Log Data into Status Table: This activity takes the computer name, the status of the ping (success/failure), the latency data, and the service state, and inserts them as a row in the status table for this machine.
  • Update Variance info: This activity takes the latency data from the previous run of this workflow for the computer being analyzed and calculates the variance (+ / -) against the current run. Essentially, this tells you whether the ping latency was higher or lower than in the last run, potentially giving you an idea of trends in your network connectivity.
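The branching in the sub workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ping` and `get_service_state` callables are hypothetical stand-ins for the PowerShell ping script and the service check, the field names are made up, and the variance is computed inline here, whereas the real workflow does it in a separate activity with a SQL query.

```python
from typing import Callable, Optional


def check_computer(
    name: str,
    ping: Callable[[str], Optional[float]],
    get_service_state: Callable[[str], str],
    previous_latency_ms: Optional[float] = None,
) -> dict:
    """Build one status row for `name`, mirroring the sub workflow's branches.

    `ping` returns the latency in milliseconds, or None if the ping failed.
    """
    latency_ms = ping(name)
    row = {
        "computer_name": name,
        "ping_ok": latency_ms is not None,
        "latency_ms": latency_ms,
        "service_state": None,
        "latency_delta_ms": None,
    }

    # Failed ping: skip the service check and go straight to logging.
    if latency_ms is None:
        return row

    # Successful ping: check the service we care about.
    row["service_state"] = get_service_state(name)

    # Variance: positive means slower than the last run, negative means faster.
    if previous_latency_ms is not None:
        row["latency_delta_ms"] = latency_ms - previous_latency_ms
    return row


# Usage with fake probes (illustrative values only):
row = check_computer(
    "srv01",
    ping=lambda host: 42.0,
    get_service_state=lambda host: "Running",
    previous_latency_ms=30.0,
)
print(row["latency_delta_ms"])  # prints 12.0
```

Passing the probes in as functions keeps the sketch runnable anywhere and makes it easy to swap the service check for something else, such as a file copy test.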

Results

So what do we get with all of this? We get a table. However, that table contains historical data that can be analyzed over time for trends and for the success or failure of activities, and it can inform decisions about how far your Opalis Action Servers can reach within your organization.
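As one example of the kind of analysis such a table allows, here is a small Python sketch that summarizes success rate and latency per computer. It is an assumption-heavy illustration: it uses SQLite rather than SQL Server, the table name `reach_status` and its columns are hypothetical, and the demo rows are made up purely to show the shape of the output.

```python
import sqlite3

# Hypothetical table and column names; adjust to match your own status table.
SUMMARY_QUERY = """
SELECT computer_name,
       COUNT(*)             AS runs,
       AVG(ping_ok) * 100.0 AS success_pct,
       AVG(latency_ms)      AS avg_latency_ms,
       MAX(latency_ms)      AS max_latency_ms
FROM reach_status
GROUP BY computer_name
ORDER BY computer_name
"""


def summarize_reach(conn):
    """Return one summary row per computer across all logged runs."""
    return conn.execute(SUMMARY_QUERY).fetchall()


def build_demo_table():
    """Create an in-memory table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reach_status ("
        "computer_name TEXT, ping_ok INTEGER, latency_ms REAL)"
    )
    conn.executemany(
        "INSERT INTO reach_status VALUES (?, ?, ?)",
        [
            ("srv01", 1, 40.0),
            ("srv01", 1, 60.0),
            ("srv02", 0, None),   # failed ping: no latency recorded
            ("srv02", 1, 200.0),
        ],
    )
    return conn


for summary in summarize_reach(build_demo_table()):
    print(summary)
```

Because `AVG` skips NULL values, failed pings lower the success percentage without dragging down the average latency.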

Example Data

image

For us, it showed we had quite a bit of reach from our Opalis Action Server, even over high latency. The fine print is that “your mileage may vary”, and it likely will, depending on the health of your network and what you are attempting to do over the links at the edge of your network from your core Opalis Action Server. The scenario I walked through above can certainly be modified: grab the attachment provided and update it to suit your own needs. Take out the service check and add a file copy, event log combing, etc. The rest is up to you.

Note: A huge thank you goes out to Benjamin Reynolds (our local SQL guru within the MPSD Platform and Service Delivery teams) for helping me with the variance data query provided in this workflow. A final obligatory note: use at your own risk and support, and only after testing in your environment. And have fun building automation!

Download the workflow here: ReachFiles.zip

A final note: The workflow attached in the download above has logging turned on so that you can see logging information during execution. If you decide to implement this in production, it is best practice to turn these options off to avoid the excessive logging that can build up given how often the workflow runs and how many servers you may run it against. If you leave these settings as they are, the sub workflow (Ping and Service Check) will eventually lock up when viewed in the OIS client, because of the logs being populated at the bottom of the designer.

image