Monitoring Azure applications - Part 1

I've been working a lot on monitoring of Azure applications lately, and while I've found a number of helpful blog posts on various topics surrounding the matter, I have yet to find one comprehensive set of posts that walks you through the process from start to finish. What I want to do with this series is…

  • Show the basic components and configurations of Azure that are relevant to monitoring
  • Discuss how the Azure MP works and how to get it set up (based on the latest release of the Azure MP - 6.1.7224.0)
  • Walk through how to set up discovery for an Azure application, how to validate that the discovery worked and (most importantly, IMHO) how to troubleshoot things if it doesn't work the first time
  • Walk through how I have gone about looking at the instrumentation available for a given Azure application in order to figure out what I needed to monitor
  • Provide a few supplements to the existing guidance on creating your own monitors/rules that leverage the Azure MP, including the use of synthetic transactions from the management pack templates
  • Aggregate all of the helpful links that I've come across along the way

The series assumes that folks have a novice-to-intermediate understanding of OpsMgr and (like me) are relative newcomers to Windows Azure.  These posts also assume that the following has already been done:

  • An OpsMgr environment has been set up with the necessary prerequisites to monitor Azure applications (refer to Walter Myers III’s great primer for a jump start)
  • An Azure subscription has been created with a base role defined

We’re not in the data center any more…

With traditional on-premise applications, I’m used to getting a list of servers and a list of instrumentation, pushing agents to those servers and then building an MP to collect or alert on the instrumentation.  While there are similarities, there are some fundamental differences about Azure applications that make that process not entirely portable.

The three most significant differences that come to my mind are the following:

  • Azure focuses on the service/role, not the instances:  While health models do provide a structure of sorts for monitoring, I still think of things mostly on a per-server basis on-premise.  Within Windows Azure, though, applications are structured by default (Subscription -> Hosted Service(s) -> Deployment(s) -> Role(s) -> Role Instance(s)).  This allows us to raise our eyes up the stack (much like a management pack does) and focus on services and roles instead of instances.  In fact, Azure manages the instances completely, so keeping track of them ourselves is contrary to what we are paying Azure for in the first place.
  • Instrumentation is all in one place: While it is true that each instance has its own local store of its own instrumentation, Azure also allows you to forward the relevant instrumentation from all of your role instances within a hosted service to a single location (i.e. Windows Azure Table Storage).  This is really handy because now my OpsMgr infrastructure only has to connect to a single location for an entire application, as opposed to maintaining agents across all of my servers.  Granted, this approach is not as fully featured as a full-blown OpsMgr agent (e.g. running scripts, using advanced OpsMgr MP modules), but it is great for the basics.
  • Instrumentation is not “on” by default: By default, none of the role instances will forward their instrumentation to the storage account, where it can be retrieved by OpsMgr.  And by none, I mean absolutely nothing: no Windows event logs, no processor counters, nothing.  It’s straightforward enough to set up, but it is also a big shift from on-premise.

I’ll cover the impact of the first bullet in another post, but for starters I want to focus on bullets 2 and 3, as they both relate to getting things configured in Azure.

Getting the Azure application ready to be monitored

So in order to turn the instrumentation “on” and to get all of it into one place, where it can be consumed by OpsMgr, we need to start working with Windows Azure Diagnostics (WAD).  Here are the steps involved:

  • Create a new storage account for diagnostics data:  The first thing you need to do is create the storage account that all of the diagnostic data will be written to.  As a best practice, this account should be used solely for storing diagnostic data, because that allows you to 1) manage access to each data set separately and 2) eliminate the risk of issues with the diagnostic data impacting the actual application’s data.  Follow the instructions provided here to set up the new account.
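
    Once the account exists, make a note of its name and primary access key; the connection string you’ll use in the next step generally takes the following form (the account name and key here are placeholders of my own):

      DefaultEndpointsProtocol=https;AccountName=mydiagstorage;AccountKey=[your-key]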

  • Enable diagnostics for the role(s):  Next, you need to give your roles a couple of instructions to enable diagnostics and to allow those diagnostics to be written to Azure storage.  These instructions are entered into the .csdef and .cscfg files respectively, so that means you’re making some changes to your app.  Follow the instructions provided here to initialize the diagnostic monitor.  After that, follow the instructions provided here (or at the bottom of the previous link) to tell your roles to write their diagnostic data to the storage account you just created.  A sketch of the relevant fragments follows the note below.

    Note: The configuration setting that you use to store your connection string is actually something that the Azure MP will need to access as well.  With the newer Visual Studio templates the setting is named Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString by default, but the Azure MP assumes it will be “DiagnosticsConnectionString” (refer to the Important box on this page).  I personally prefer to leave the connection string at the new default and override the discovery in the MP, but if you want, you can rename the setting to “DiagnosticsConnectionString” at this time.
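
    To make this concrete, here is a minimal sketch of the relevant fragments (the role name “MyWebRole” and the storage account values are placeholders of my own; with the SDK 1.3+ Visual Studio templates the import and setting are added for you):

      <!-- ServiceDefinition.csdef: import the Diagnostics plugin for the role -->
      <WebRole name="MyWebRole">
        <Imports>
          <Import moduleName="Diagnostics" />
        </Imports>
      </WebRole>

      <!-- ServiceConfiguration.cscfg: point the plugin at the diagnostics storage account -->
      <Role name="MyWebRole">
        <ConfigurationSettings>
          <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString"
                   value="DefaultEndpointsProtocol=https;AccountName=mydiagstorage;AccountKey=[your-key]" />
        </ConfigurationSettings>
      </Role>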

  • Configure the roles to submit their data to the storage account:  So now your instances are able to submit instrumentation data to a central store, and they know which store to send that data to, but they do not know what instrumentation to send.  There are a few different ways to do this (described here), but within Microsoft IT we prefer to use diagnostics.wadcfg.  The reasons we prefer this method are 1) it is not hard-coded into the source and can be changed more easily as a result and 2) it is supported on all role types, so we don’t have to vary our approach based on whether it is a web, worker or VM role.  A sample file follows the note below.

    Note: Although WAD supports 8 different types of instrumentation, the Azure MP currently only has the ability to work with Windows Event Logs (WADWindowsEventLogsTable), Performance Counters (WADPerformanceCountersTable) and Windows Azure Logs (WADLogsTable).
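
    As an illustration, here is a minimal diagnostics.wadcfg sketch that forwards the three data sets the Azure MP can consume; the quotas, transfer periods and counter list are examples I picked for illustration, not recommendations:

      <?xml version="1.0" encoding="utf-8"?>
      <DiagnosticMonitorConfiguration
          xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration"
          configurationChangePollInterval="PT1M"
          overallQuotaInMB="4000">
        <!-- Trace logs, transferred to WADLogsTable -->
        <Logs bufferQuotaInMB="256" scheduledTransferLogLevelFilter="Warning"
              scheduledTransferPeriod="PT5M" />
        <!-- Performance counters, transferred to WADPerformanceCountersTable -->
        <PerformanceCounters bufferQuotaInMB="256" scheduledTransferPeriod="PT5M">
          <PerformanceCounterConfiguration
              counterSpecifier="\Processor(_Total)\% Processor Time"
              sampleRate="PT1M" />
          <PerformanceCounterConfiguration
              counterSpecifier="\Memory\Available MBytes"
              sampleRate="PT1M" />
        </PerformanceCounters>
        <!-- Application/System event logs, transferred to WADWindowsEventLogsTable -->
        <WindowsEventLog bufferQuotaInMB="256" scheduledTransferLogLevelFilter="Warning"
                         scheduledTransferPeriod="PT5M">
          <DataSource name="Application!*" />
          <DataSource name="System!*" />
        </WindowsEventLog>
      </DiagnosticMonitorConfiguration>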

With all of that in place, your Azure application is now capable of sending its instrumentation to Windows Azure Storage.  You can verify this with one of the many tools available (a few are listed here).  I personally use the Server Explorer in Visual Studio.
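
If you’d rather verify programmatically, here is a small sketch using the StorageClient library that ships with the Azure SDK (the connection string below is a placeholder); it simply lists the tables in the diagnostics account so you can confirm the WAD tables have appeared:

    // Sketch: list the tables in the diagnostics storage account to confirm
    // that WAD has started transferring data.  Assumes a reference to
    // Microsoft.WindowsAzure.StorageClient from the Azure SDK; the
    // connection string is a placeholder.
    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    class VerifyWadTables
    {
        static void Main()
        {
            CloudStorageAccount account = CloudStorageAccount.Parse(
                "DefaultEndpointsProtocol=https;AccountName=mydiagstorage;AccountKey=[your-key]");
            CloudTableClient tableClient = account.CreateCloudTableClient();

            // The WAD tables are created lazily, on the first scheduled transfer,
            // so expect WADWindowsEventLogsTable, WADPerformanceCountersTable
            // and/or WADLogsTable once data starts flowing.
            foreach (string tableName in tableClient.ListTables())
            {
                Console.WriteLine(tableName);
            }
        }
    }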

In the next post we’ll cover getting management certificates setup in both Azure and OpsMgr.