Status and State and Backlog, Oh My

08/21/2007

We've had an interesting conundrum lately while our “localizers” were translating System Center Configuration Manager 2007 (aka “ConfigMgr”). It turns out that "state" translates to exactly the same word as "status" in most languages. For example, when you ran the Configuration Manager console in Chinese and opened a drop-down box that let you choose between "state" and "status", you saw the exact same term twice. Fortunately, someone caught it and filed a bug. To resolve the bug, we had to figure out how these two terms should be translated. To science people, like many of the PMs, the distinction was clear and obvious. To the word people, the words have the same root, so the distinction is not clear.

I used to be a trainer, and still think like a trainer, so I created an analogy. It helped the localization team; maybe it will help you. At the suggestion of the rest of the writer team, I am also calling out some late-breaking performance information you should know about state messages.

What’s the difference between a state message and a status message?

Let’s say I want to know what my kid is doing when she gets home from school. (Disclaimer: I do not really let my daughter come home from school by herself. Please do not call the child protective authorities on me. :-)) I could have my girl send me status messages. “Mom, I arrived home from school.” “Mom, you have a permission slip to sign.” “Mom, I ate a snack.” “Mom, I fed the gerbils.”

These messages are helpful, but they don’t really give me a sense of what she is up to at any given time. And I have to follow a lot of messages and their time stamps to piece together what her afternoon is like. These are status messages. You could also think of them as “transaction messages” or “event messages”.

But what I really care about is, did she eat her snack, did she clean her room, and is she still parked in front of the TV? (We could call that “compliance status”.)

So I have my programmer husband develop a new system. We have her punch a button when she starts her snack and a button when she finishes her snack. The button generates an email, but the email is automatically processed by a state system so I never see the actual mails. Here at my computer, I just see a dashboard that says “Snack: not started”. If I don’t check for a while, when I do check back it says “Snack: complete” but I missed it when she was in “Snack: acquiring snack” state. It doesn’t matter to me, because all I care about is that she ate something and won’t be cranky when I get home. She pushes a button when she sits down to watch TV and she should push a button when she turns the TV off. But when I check, I see that TV is still on, so I call her to remind her to turn it off and do her homework.

While she was getting her snack, she could send different transaction (status) messages telling me for example 1) we ran out of popcorn, 2) she spilled the milk, 3) she doesn’t like the brand of string cheese on hand, and 4) the dishwasher is full so don't blame her for not putting her plate away. None of those events change that she was in “Snack: acquiring snack” state. When she is done eating (and leaving her plate on the counter because the dishwasher is full), she goes to “Snack: complete” state, and that is what I see in my dashboard.

In ConfigMgr, we don’t just piece together status messages to determine state; we use an entirely different mechanism. Status messages can come in and be filtered and viewed individually. For state, we just store the last state and you get to see it in reports. A client can send many status messages that don't necessarily change the state the client is in.
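To make the difference concrete, here is a minimal sketch of the analogy in Python. It is not how ConfigMgr is implemented; the `Tracker` class and its method names are made up for illustration. Status messages are appended to a log that keeps every entry, while state keeps only the most recent value per topic.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Hypothetical tracker from the analogy, not a ConfigMgr API."""
    status_log: list = field(default_factory=list)     # every message is kept
    current_state: dict = field(default_factory=dict)  # only the last state per topic

    def record_status(self, timestamp, text):
        # Status: append the event; nothing is overwritten.
        self.status_log.append((timestamp, text))

    def record_state(self, topic, state):
        # State: overwrite whatever was stored for this topic.
        self.current_state[topic] = state


tracker = Tracker()
tracker.record_state("Snack", "not started")
tracker.record_state("Snack", "acquiring snack")
tracker.record_status("15:05", "We ran out of popcorn")
tracker.record_status("15:07", "I spilled the milk")
tracker.record_state("Snack", "complete")

print(tracker.current_state)    # the dashboard view
print(len(tracker.status_log))  # the individual messages are all still there
```

The two status messages never touch the "Snack" state, and the intermediate "acquiring snack" state is gone once "complete" arrives, which is exactly what I see (and miss) on my dashboard.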

State Message Backlogs

In our glossary, we defined state messages as a "quick and efficient lightweight mechanism", and it is true. However, we have seen that in very large environments, these quick little messages can achieve critical mass. It's a bit like paper: negligible as a single piece, manageable as a single book, but impossible if you have to process an entire library in a few days before more books come in.

When the release notes come out, if you will be working in large environments and using lots of features, you really must read the relnote about state messages (title: “State message backlogs can create reporting delays”). If you aren’t careful, you could end up with a backlog of state messages. In some cases, we’ve seen sites get so behind in their state processing that they simply cannot catch up.

Why is it bad to get a backlog of state messages? Going back to our analogy for a minute, let’s say we open up the state system to all of the kids at her school, but we don’t increase our processing capacity. The messages back up, so when I go to check my dashboard it says “State: getting ready for school” even though it’s 3:30 in the afternoon. And the system doesn’t get around to processing the “Snack: acquiring snack” state message until 11:45 at night. Obviously, that won’t be very helpful to me.
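Here is a rough sketch of why the backlog never clears once messages arrive faster than they can be processed. The function name and all of the numbers are invented for the analogy; they are not measurements from ConfigMgr.

```python
def backlog_over_time(arrivals_per_hour, capacity_per_hour):
    """Return the number of unprocessed messages at the end of each hour.

    arrivals_per_hour: list of message counts arriving in each hour (made-up numbers).
    capacity_per_hour: how many messages the system can process per hour (assumed fixed).
    """
    backlog = 0
    history = []
    for arrived in arrivals_per_hour:
        # Whatever cannot be processed this hour carries over to the next one.
        backlog = max(0, backlog + arrived - capacity_per_hour)
        history.append(backlog)
    return history


# One kid: the system keeps up easily.
one_kid = backlog_over_time([10, 10, 10, 10], capacity_per_hour=20)

# The whole school with the same processing capacity: the pile only grows.
whole_school = backlog_over_time([500, 500, 500, 500], capacity_per_hour=20)

print(one_kid)
print(whole_school)
```

As long as arrivals stay above capacity, every hour adds to the pile, so the newest state on the dashboard keeps falling further behind reality.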

I won’t try to twist the analogy to cover how you get into this backlog state. Going back to the real product, there are a few things that tend to produce large numbers of state messages. When you install a client, it sends a bunch of state messages. And then the first time the client scans for software updates, it sends another bunch of state messages. Putting those together, you can see that installing clients will cause a super huge bunch, because the clients will probably scan pretty soon after they first install.

Then you factor in the reality that some client deployment methods don’t give you a way to control how many clients you deploy at any one time. If you do client push using the wizard, you can pick just a few smaller collections at a time. If you do client push by configuring the client push settings, discovering everything, and then letting it push to the discovered computers, you could be pumping out a lot of state messages all at the same time.

Patch Tuesday can also generate large piles of state messages, so don’t deploy clients on or around Patch Tuesday. Even if you start client deployment the day before Patch Tuesday, you could still be processing all of the deployment state, so when the clients start scanning, the state system is already backlogged.

Two other big generators of state messages are Asset Intelligence Client Access License (CAL) data collection and configuration baselines in desired configuration management. ConfigMgr generates a state message per configuration baseline per client. CAL data collection can generate piles of really helpful state messages, as long as they can get processed without overloading the system.
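The "one state message per configuration baseline per client" rule makes for easy back-of-the-envelope math. The sketch below is just that arithmetic; the function name and the client and baseline counts are hypothetical, and it assumes every client reports on every one of its baselines once.

```python
def baseline_state_messages(clients, baselines_per_client):
    """Estimate state messages for one round of baseline reporting.

    Assumes one state message per configuration baseline per client.
    """
    return clients * baselines_per_client


# Hypothetical environments, for illustration only.
small = baseline_state_messages(clients=500, baselines_per_client=2)
large = baseline_state_messages(clients=50_000, baselines_per_client=10)

print(small)
print(large)
```

The count grows with both numbers at once, which is why a handful of extra baselines that look harmless in a lab can matter a lot in a very large site.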

Unlike status messages, state messages are more of a black box. You can’t view state messages in any sort of message viewer like you can with status messages. You can’t see the individual state messages winding their way through the system and up the site hierarchy. If you have a child site generating a bunch of state messages, they will eventually hit the parent site, too. If you have a large backlog in Statsys.box\incoming (for state messages in the site) or Replmgr.box\incoming (for state messages moving through the hierarchy), it might be tempting to go and start deleting the files, hoping that it will give ConfigMgr a chance to catch up. However, another generator of state messages is resynchronization of state. If we can’t tell what state a client is in, we tell it to send us all of its state again. It can be iffy to determine when you’d be better off living with the resynchs than trying to process a load you will never get through, so we didn’t even document that as an option in the release notes.

You are better off never getting into the state backlog state in the first place. Here are the recommendations we came up with from the relnotes:

Do not deploy clients using the Configuration Manager 2007 software updates feature at the same time you are deploying large numbers of security updates, for example on or around the second Tuesday of the month.
Limit the number of clients deployed at any one time. For example, use the Client Push Installation Wizard instead of enabling site-wide client push.
Do not configure the Asset Intelligence CALCollectionFrequencyDays more frequently than the default (one time per week), and set the CALCollectionType to only the minimum data required. For more information, see "About enabling Asset Intelligence Data Collection" in the Configuration Manager Documentation Library.
If using the desired configuration management feature, limit the number of configuration baselines per client and the number of configuration items per configuration baseline.

Now, just because I put these recommendations in here, it does NOT excuse you from reading the relnotes. :-) Do make sure you read them thoroughly before deploying. There are two hotfixes you need that are not listed in the prereq checker. And, the relnotes are the only place you can read about the cool new prerequisite downloader. (How’s that for a cliffhanger?)

Cathy Moya, with help from Carol Bailey and Marc Umeno

This posting is provided “AS IS” with no warranties and confers no rights.
