Network Bandwidth Utilization for the various OpsMgr 2007 Roles
I recently took ownership of performance and scale for OpsMgr, and it is definitely one of the most interesting and challenging areas I have ever worked on. I know the first question popping up in most of your minds is: why is console performance so darn slow in OpsMgr 2007? There are various reasons for this, which I will divulge at another time, but the one thing I will assure you is that console performance with Service Pack 1 is a lot faster (Geo Metro to BMW M3 faster, if that is a valid comparison). Today, though, I want to dedicate the blog to network bandwidth utilization, as it is a question a lot of customers have been asking. There are essentially three sections to discuss: a) Agent to Root Management Server\Management Server\Gateway Server, b) Root Management Server\Management Server to Database, and c) Audit Forwarders to Audit Collectors.
a) Agent to Root Management Servers\ Management Server (MS)\ Gateway Servers
The amount of data sent to an MS depends on which Management Packs (MPs) you have in your environment and how you, as the end user, have tuned them. Some MPs send more data to the MS by default than others; to get an idea of which ones, you can view the table attached in the doc.
Data sent between agents and MSs is always compressed. In our test environment with the Active Directory, Base OS, DNS and OpsMgr MPs, the estimated bandwidth utilization was about 500 bytes per sec for a single agent and about 75 kilobytes per sec on the server for 150 agents. We also ran a test with all the out-of-the-box MPs plus additional stress test MPs (simulating real-world load) and saw about 200 Kbytes per sec received by a Management Server for 2000 agents. As these numbers show, the compressed data sent between the agent and the MS is small.
One common question we get is: the minimum bandwidth requirement for agent to MS in the supported configurations document is 64Kbps, but my customer has a few servers that only have 56Kbps. Can we support this? If your agent data packet size is small enough for the bandwidth you have, then it should not be a problem. Our Microsoft product support folks (PSS) have always been very accommodating, so while you may be outside the support requirements, they will not stop helping you troubleshoot your issues. This is definitely one of the big perks of using Microsoft products and having Microsoft support.
Gateway Servers are basically proxy agents that tunnel data from multiple agents to a Management Server. We have seen bandwidth utilization of about 22 Kbytes per sec received by Gateway Servers for 400 agents with all the out-of-the-box Management Packs and some stress Management Packs.
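The figures quoted above can be sanity-checked with a little arithmetic. The following is only a rough sketch: the function names are made up for illustration, 1 KB is treated as 1000 bytes, and the numbers are averages, so they say nothing about bursts.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in this section.
# All function names here are hypothetical helpers, not part of OpsMgr.

def per_agent_bytes_per_sec(total_bytes_per_sec, agent_count):
    """Average bytes per second contributed by each agent."""
    return total_bytes_per_sec / agent_count

def link_capacity_bytes_per_sec(kbps):
    """Convert a link speed in kilobits per second to bytes per second."""
    return kbps * 1000 / 8

def fits_on_link(agent_bytes_per_sec, link_kbps):
    """True if the average agent traffic is within the raw link capacity."""
    return agent_bytes_per_sec <= link_capacity_bytes_per_sec(link_kbps)

# Figures from the post (1 KB treated as 1000 bytes for simplicity).
print(per_agent_bytes_per_sec(75_000, 150))    # AD, Base OS, DNS, OpsMgr MPs
print(per_agent_bytes_per_sec(200_000, 2000))  # all out-of-the-box MPs plus stress MPs
print(per_agent_bytes_per_sec(22_000, 400))    # traffic received by a Gateway Server
print(fits_on_link(500, 56))                   # 500 bytes/sec average on a 56 Kbps link
```

An average of 500 bytes per sec is roughly 4 Kbps, which leaves plenty of headroom on a 56 Kbps link on paper. Real traffic is bursty and the link is usually shared, so treat this as a first estimate only.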
b) Root Management Server (RMS) \ Management Server(MS) to Database(DB) and Data Warehouse (DW)
In OpsMgr 2007 the RMS and MS both write directly to the DB and DW. There are no DTS jobs like we had in MOM 2005. Since the RMS and MS write directly to the DB and DW, the data is not compressed and the size of the data is larger as well. What we recommend to many customers is to keep their RMS\MS close to the DB and DW and to have fast links between them. It is much better to have agents in remote geographic locations report to an MS than to have a Management Server in a remote geographic location write to the DB and DW.
c) Audit Forwarders to Audit Collectors
While I am no expert in Audit Collection Services (ACS) as a feature, I will share with you the data we have collected for ACS so far. My colleague Joseph is the definitive ACS guru and should be the ultimate source of information on this topic.
ACS forwards events in near real time, rather than batching them together. This is different from how MOM 2005 sends events. Therefore, when we say ACS bandwidth utilization is 100 bytes per event and a system generates ~27 events/sec, you can literally translate that to 2.7KB per sec.
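Because events are forwarded individually, the bandwidth estimate is a straight multiplication. A minimal sketch of that calculation, using a hypothetical helper name and treating 1 KB as 1000 bytes:

```python
# Steady-state ACS forwarder bandwidth: bytes per event times events per second.
# acs_bandwidth_bytes_per_sec is an illustrative helper, not an ACS API.

def acs_bandwidth_bytes_per_sec(bytes_per_event, events_per_sec):
    """Estimated forwarder-to-collector traffic in bytes per second."""
    return bytes_per_event * events_per_sec

rate = acs_bandwidth_bytes_per_sec(100, 27)
print(f"{rate} bytes/sec = {rate / 1000:.1f} KB/sec")
```

Swap in your own measured event rate to size the link for a particular forwarder.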
If there is a loss of network connectivity, the forwarder will resend all security events that the collector has not confirmed as written to the DB. The forwarder sends a heartbeat (in the form of an event) to the collector every 45 seconds. If the collector misses 3 heartbeats from the forwarder (3 is the default), the collector will drop the connection, and the forwarder will automatically re-initiate the connection (if it is still alive).
The size of a typical security event while it is being transmitted from the managed system to the ACS collector is usually less than 100 bytes. The size of a typical security event once it has been recorded in the ACS SQL database is less than 0.5KB. As for resource utilization on an agent with ACS functionality enabled on the managed system, CPU is typically less than 1% and memory is about 4-6 MB.
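The stored event size gives a rough upper bound on database growth. The sketch below assumes the ~27 events/sec rate from the earlier example and uses 0.5KB per event as a ceiling (the post says "less than"); the function name is hypothetical and no retention or grooming settings are modeled.

```python
# Rough upper bound on ACS database growth from a single forwarder.
# acs_db_growth_kb_per_day is an illustrative helper, not an ACS API.

SECONDS_PER_DAY = 24 * 60 * 60

def acs_db_growth_kb_per_day(events_per_sec, kb_per_event=0.5):
    """Upper-bound KB written to the ACS database per day."""
    return events_per_sec * kb_per_event * SECONDS_PER_DAY

growth_kb = acs_db_growth_kb_per_day(27)
print(f"{growth_kb:,.0f} KB/day (about {growth_kb / 1_000_000:.2f} GB/day)")
```

Multiply by the number of forwarders and the number of days you intend to keep events to get a first-pass disk estimate.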
In one of his mail threads to an OpsMgr discussion alias, Joseph wrote a quick and dirty script, attached to this blog (rename it back to .vbs), that helps you count the number of events generated per sec on the local computer.
(It can also run against a remote computer if you supply the computer name as an argument, but that seems to be slow…) Run it like “CScript SecurityEventPerSecond.vbs >> NumOfEvtsGenPerSec.csv” and just load the csv in Excel to calculate the average. This can be useful in situations where it is not possible to install a pilot ACS collector in order to measure the incoming event rate (by looking at the perf counter ACS Collector\Incoming Event per Sec).
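If Excel is not handy, the average can also be computed with a few lines of Python. This is only a sketch: the script's output format is not shown in this post, so the code assumes each data line ends with a numeric events-per-second value as its last comma-separated field, and it skips anything else (such as the banner CScript prints when run without //NoLogo). The function name is made up.

```python
# Average the per-second event counts captured in NumOfEvtsGenPerSec.csv.
# ASSUMPTION: the last comma-separated field of each data line is a number.
# Lines that do not parse (banner, headers, blanks) are ignored.

def average_events_per_sec(lines):
    """Return the mean of the numeric samples found in the given lines."""
    values = []
    for line in lines:
        field = line.strip().split(",")[-1].strip()
        try:
            values.append(float(field))
        except ValueError:
            continue  # not a numeric sample, skip it
    if not values:
        raise ValueError("no numeric samples found")
    return sum(values) / len(values)

# Made-up sample lines, purely to show the call; real output may differ.
sample = ["Microsoft (R) Windows Script Host", "", "25", "29", "27"]
print(average_events_per_sec(sample))

# Against the real file:
# with open("NumOfEvtsGenPerSec.csv") as f:
#     print(average_events_per_sec(f))
```

If the real output uses a different layout, only the line that extracts the field needs to change.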