My experience upgrading to OpsMgr R2 RTM


I upgraded my test lab from SP1 to R2-RTM this weekend.

 

My current test lab consists of the following servers:

OMRMS – Server 2003 - RMS role

OMMS3 – Server 2008 - MS role, Web Console

OMMS – Server 2003 - MS role, ACS collector

OMDB – Server 2003/SQL 2005 - OperationsManager Database

OMDW – Server 2003/SQL 2005 - OperationsManagerDW database, Reporting, SRS, ACSDB roles

 

There are 18 agents reporting to this management group.

 

So – I start – with a little light reading.

I begin with the release notes.  These are available on the R2 CD, and on the web at Operations Manager 2007 R2 Release Notes.  I don't see anything in there that is terribly applicable to me, but these are good to commit to short-term memory – in case we hit a snag during/after the upgrade.

Next – I move on to the Upgrade Guide.  This is available in the TechNet Library at Operations Manager 2007 Upgrade Guide.  I need to spend a little time on this one, mapping out the pre-upgrade steps, and then planning the order of my upgrade based on how my management group is deployed.

 

So – I start by running down the pre-upgrade checklist at: Preparing to Upgrade Operations Manager 2007

I record my service accounts, make sure my DBs have plenty of free space, and confirm my t-logs are sized big enough.  I make sure the volume with TempDB has plenty of free disk space in case TempDB needs to auto-grow.

Next – I map out my plan – and order of operations, for my management group, and share the plan with my team:

  1. Get the most recent backup of the database and the encryption key, and export unsealed MPs for safekeeping.
  2. Go to pending actions – and reject/remove anything in there.
  3. Verify free space on SQL database and validate log size is appropriate.
  4. I need to uninstall the agent from OMTERM – my terminal server which has a console and an agent only.  I decide to go ahead and uninstall the agent, the console, and the SP1 authoring console as well, since I will be replacing it with the R2 auth console.  I will replace the agent and consoles when the upgrade is complete for the management group.
  5. I need to disable all my notification subscriptions, and disable my product connectors.  I am running a custom internal product connector – which runs as a service and updates alert properties – so I will stop and disable that service for the duration of the upgrade.
  6. I see a section on Improving Upgrade Performance so I will add that step here – right before I upgrade the first component.
  7. I am now ready to establish the upgrade order for my management group – this is available at: Planning your Operations Manager 2007 Upgrade
  8. RMS (OMRMS)
  9. Reporting Server (OMDW)
  10. Stand Alone Consoles (None – I uninstalled this already in my case)
  11. Management Servers (OMMS3, OMMS)
  12. Gateway Servers (None)
  13. Agents
  14. Web Console (on OMMS3 and OMMS)
  15. Post-Upgrade validation steps
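
The upgrade order in the plan above can also be written down as data, so a small script can tell you which role comes next and complain if something was done out of order. This is only a minimal sketch: the role names are copied from the plan, while `UPGRADE_ORDER` and `next_role` are hypothetical names made up for illustration. Roles with nothing to upgrade (stand-alone consoles and gateways in my lab) are simply marked as completed.

```python
# Upgrade order copied from the plan above.
# UPGRADE_ORDER and next_role are hypothetical names used only for illustration.
UPGRADE_ORDER = [
    "RMS",
    "Reporting Server",
    "Stand Alone Consoles",
    "Management Servers",
    "Gateway Servers",
    "Agents",
    "Web Console",
]


def next_role(completed):
    """Return the next role to upgrade, or None when every role is done.

    Raises ValueError if an unknown role is listed, or if a later role was
    marked complete before an earlier one in UPGRADE_ORDER.
    """
    done = set(completed)
    unknown = done - set(UPGRADE_ORDER)
    if unknown:
        raise ValueError(f"Unknown role(s): {sorted(unknown)}")
    for index, role in enumerate(UPGRADE_ORDER):
        if role not in done:
            # Anything after this role must not have been upgraded yet.
            skipped_ahead = [r for r in UPGRADE_ORDER[index + 1:] if r in done]
            if skipped_ahead:
                raise ValueError(f"{skipped_ahead[0]} was upgraded before {role}")
            return role
    return None
```

Calling `next_role(["RMS", "Reporting Server"])` returns the stand-alone consoles entry, which matches the order in the plan.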

Ok – that's my plan.  Time to get rolling.

The SP1 to R2 steps are outlined here:  Upgrading from Operations Manager 2007 SP1 to R2

I know from experience with customers – the success of your upgrade HINGES on how well you read AND follow the upgrade steps – VERBATIM.  The majority of issues we see (especially on clustered RMS) are when a customer does not follow the steps exactly as written, in the correct order.

 

I complete steps 1-7 in the plan above, and then start the RMS upgrade at step 8.  I run “SetupOM.exe” and kick off the pre-req checker before starting the install, where I hit my first snag.  I need to install WS-Management v1.1, because I do plan on monitoring Unix/Linux machines in the future with this management group.  (This was documented in the release notes, and in the upgrade guide – so I was expecting this… I should have added this to my plan.)  So I install WS-Man from the link provided in the pre-req checker, which just takes a few minutes.  Now – it looks much better in the pre-req checker:

image

 

The install instructions provided on TechNet are very straightforward.  The install took about 20 minutes for my small environment.  It waited the longest on “Loading Management Packs” on the screen in my environment.  It finally ended with an error:

 

image

 

The guide has a note on this – about the fact you might get a warning that a service failed to start – and to hit OK.  However – this is a different error – this is a service failing to stop…   I click OK, and then a few minutes later – setup completes.  I uncheck the boxes to start the console and to back up the encryption key.

 

I then ran the RMS upgrade validation steps – checking the registry and the services.  Registry setup version shows me all is good. 

***Note:  We have changed the service display names for R2.  See below:

image

 

I moved on to Reporting.  My SRS, Reporting, and DataWarehouse are all shared on a single server – OMDW.

 

As I read the guide at Upgrading from Operations Manager 2007 SP1 to R2 I notice this little tidbit – which needs to be given STRONG attention before I kick off the upgrade:

Prior to running the upgrade on the Reporting server, you must remove the Operations Manager 2007 agent; the upgrade will fail if this is not done.

So – I kick off the uninstall of the agent on the Reporting/SRS server (OMDW in my case) from Add/Remove programs – before I start the upgrade.  Missing little steps like this will drive you nuts if you aren't methodical.

After the agent uninstall – I pick back up on the guide – and kick off “SetupOM.exe”.  Since I am a freak – I go ahead and run a pre-req check just to make sure all is good:

image

 

Moving on…. I start the install according to the guide.  The install goes without a hitch, and took about 10 minutes to complete.

 

Next up – Management servers.  I start with OMMS3.  I hit the pre-req check – and I notice I already have WS-Man installed – so away I go.  The installer immediately failed with a pre-req failure.  I realized – I have the web console installed on this management server, and I forgot to add that when running the pre-req check manually.  When I do – I see: 

image

 

So – I need to grab the ASP.NET Ajax extensions…. this is to support the new cool health explorer in the Web Console.  I click “More” on the pre-req check – which gives me a link to the download.

After this little hurdle – the management servers upgraded very quickly.  Once again – I got an expected error about a failure to stop a service.

 

image

 

Click ok and setup completes.  I repeat this upgrade on the other management server (OMMS) and these are done.  A quick check of the registry – and the setup version is indeed 6.1.7221.0
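
If you are checking the setup version on more than a couple of servers, comparing version strings by eye gets old quickly. Here is a minimal sketch of the comparison, assuming you have already read the version string from each server by whatever means you prefer. The R2 RTM build number is the one quoted above; `parse_version` and `is_r2_rtm_or_later` are hypothetical helper names.

```python
# R2 RTM build number as quoted in the post.
R2_RTM = (6, 1, 7221, 0)


def parse_version(text):
    """Turn a dotted version string such as '6.1.7221.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_r2_rtm_or_later(text):
    """True if the version string is at or above the R2 RTM build."""
    # Tuples compare element by element, so 6.1.x sorts above 6.0.x.
    return parse_version(text) >= R2_RTM
```

Comparing tuples of integers avoids the classic string-comparison trap where "6.10" would sort below "6.9".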

 

I don't have any gateways in this lab – so next up is agents.

 

Lucky me – all 18 agents show up in pending actions for an update.  I will approve them all – and let the management server push the update down and upgrade them. 

***Note – do not upgrade more than 299 agents in this manner at a time.  This is documented in the Upgrade Guide.
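
In a larger environment, that limit means splitting the pending agents into batches before approving them. A minimal sketch of the batching logic follows; the limit of 299 comes from the note above, and the function name and agent names are hypothetical. It only splits a list and does not talk to OpsMgr at all.

```python
def agent_batches(agents, limit=299):
    """Yield successive slices of the agent list, each at most `limit` long."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(agents), limit):
        yield agents[start:start + limit]


if __name__ == "__main__":
    # Hypothetical agent names, purely for illustration.
    pending = [f"agent{n:04d}" for n in range(700)]
    for number, batch in enumerate(agent_batches(pending), start=1):
        print(f"Batch {number}: {len(batch)} agents")
```

You would approve one batch, wait for it to finish, and then move on to the next.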

All my agents upgraded successfully except for two.  BOTH that failed happened to be the two servers that I manually removed the SP1 agent from – OMTERM and OMDW.  (I forgot to delete their “agent managed” object from the management group)  Both have a different error.  OMTERM is failing to install with a push failure for MOMAgentInstaller.  I have had trouble with this agent before – possibly because of the TS role - so I just do a manual agent install here.  OMDW is different – the console push said it was a success – however – the System Center Management Service (HealthService) will not start – it gives an error:

Event Type:    Error
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7024
Date:        5/23/2009
Time:        1:09:37 AM
User:        N/A
Computer:    OMDW
Description:
The System Center Management service terminated with service-specific error 2147500037 (0x80004005).
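
The event shows the same error code twice: once in decimal and once in hex. When a log only gives you one form, a quick conversion makes it easier to search on. A minimal sketch with hypothetical helper names:

```python
def to_hex(code):
    """Format an error code as an 8-digit hex string, e.g. 0x80004005."""
    return f"0x{code & 0xFFFFFFFF:08X}"


def to_signed(code):
    """Return the signed 32-bit form that some tools print instead."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code >= 0x80000000 else code


if __name__ == "__main__":
    print(to_hex(2147500037))
    print(to_signed(2147500037))
```

0x80004005 is the generic E_FAIL ("unspecified error") code, so the number by itself does not say much about the root cause.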

I ran a repair action from the console – but got the same error here.  So – I manually uninstalled the broken agent – and deleted the agent from the Agent Managed section of the console – and re-pushed the agent.  I had a little trouble getting these two to come into the management group… but eventually after a couple delete/reinstalls they finally appear to be working ok.  I’d recommend uninstalling them from the console next time…. so this will remove both the agent and the computer object from the console.

 

Next on the list:  Web Console

From the upgrade guide I see this note….

If your Web console server is on the same computer as a management server, the Web console server is upgraded when the management server is upgraded, rendering this upgrade procedure unnecessary. You can still run the verification procedure to ensure that the Web console server upgrade was successful.

Good – my web console is not a stand-alone – it was running on a management server (OMMS3) so that is already taken care of.

Aha – I found something we forgot on the plan…. the ACS Collector.  This role is missing from the table at Planning your Operations Manager 2007 Upgrade so I completely missed this as a planning step.  However the process is documented at Upgrading from Operations Manager 2007 SP1 to R2.  So – we need to do this – I will assume last since it is last on the upgrade detailed steps.  Following the guide…. I walked through the steps – no issues.

 

Looks like we are done!  I will now start the post-upgrade validation steps to make sure my management group is actually working as it should without any major issues.

There is a list of post-upgrade checks at Completing the Post-Upgrade Tasks

 

I am going to walk through those here:

1.  I open up discovered inventory – and change target to “Health Service Watcher” and compare this to the list I had before the upgrade.  These are agents that have a problem from the management server perspective – which causes them to appear “grey” in all other views.  My list is the same as before I started – I have 6 in this list as critical – 5 of them are agents on VMs that are currently down – so this is good.  1 of them is an old management server… for some reason we don't groom these out of the view/database – and these seem to stick around forever in this view.

2.  I review the event logs on the RMS and all MS roles.  I am seeing some errors like below:

Event Type:    Warning
Event Source:    HealthService
Event Category:    Health Service
Event ID:    2120
Date:        5/23/2009
Time:        10:02:15 AM
User:        N/A
Computer:    OMRMS
Description:
The Health Service has deleted one or more items for management group "OPS" which could not be sent in 1440 minutes.

This is normal – it happens when you have agents that are down in your environment.

Event Type:    Error
Event Source:    Health Service Modules
Event Category:    Data Warehouse
Event ID:    31552
Date:        5/23/2009
Time:        10:03:38 AM
User:        N/A
Computer:    OMRMS
Description:
Failed to store data in the Data Warehouse.
Exception 'SqlException': Sql execution failed. Error 777971002, Level 16, State 1, Procedure StandardDatasetGroom, Line 303, Message: Sql execution failed. Error 2812, Level 16, State 62, Procedure StandardDatasetGroom, Line 145, Message: Could not find stored procedure 'KMS_EventGroom'.

One or more workflows were affected by this. 

Workflow name: Microsoft.SystemCenter.DataWarehouse.StandardDataSetMaintenance
Instance name: KMS Activation Event Data Set
Instance ID: {800D8126-6F72-CA84-A76B-A94F7E3C93CF}
Management group: OPS

This is not normal – this looks like an issue with the KMS MP – and R2’s advanced logging is picking up on an error that's been there all along, I just didn't know it.

That is all from the RMS – pretty clean.  On the Management servers…. I found a bit more – but they were all due to the problems I was having with a handful of agents.  Once I removed and fixed those agents – the MS logs are clean.

3.  No cluster in this lab – so nothing to test there.

4.  Review alerts in the console.  I sort by Repeat Count and LastModified (I add these to all my alert views) and look for anything that stands out as repeating a LOT, or something new that looks like a problem.  I don't see anything here – so that is good!

5.  DB server in perfmon looks good.  I examine % Processor Time, and Logical Disk Avg disk sec/read and Avg disk sec/write.  Those are both avg under 15ms (.015) on the DB and log volumes - so that looks good.   CPU is avg under 25%.

6.  Check all the console views.  Much snappier than in SP1.  Nice.

7.  I opened up reporting – and ran the “Microsoft ODR Report Library > Most Common Alerts” report – to test out reporting.  It ran with no issues.  I test a few of my saved custom and favorite reports – no errors – all good.

8.  Authoring pane looks good – I can see my groups, monitors, rules – and wow – they open a LOT faster than before.  Very nice.

9.  I check out my MP versions.  The install upgraded all my core MPs to 6.1.7221.0.   I was already pretty current on my MPs – so there is not much here that needs my immediate attention.

10.  Re-enable notification subscriptions and product connectors.  I turn my subscriptions back on – and fire off a test event that I use to generate an alert and email me a notification.  Works great.  Next – I go to my custom product connector – and enable the service and start it back up again.  I run some test alerts – to make sure my product connector is taking all the necessary actions on the alerts – and forwarding them as appropriate.  All good.

11.  Review My Workspace.  Yep – all my old custom views are there.

12.  Re-deploy agents.  I already did this.  Perhaps I should have waited on this step…. because I spent so much time troubleshooting those last few pesky agents that seem to have trouble.

13.  Oh – the BIG ONE.  This step is a bit odd – we tell you to go run this SQL query.  LET ME WARN YOU – this is not a “quick job”.  This is the script that is documented and discussed in my blog post:  Does your OpsDB keep growing? Is your localizedtext table using all the space?  Don't take this step lightly – running this script could take several hours – so plan accordingly.  Read the link above for the details – and consider skipping this step for now…. until you are sure you are ready to execute it.  Make some calculations based on the blog post above – how long it will take, and how severely you are impacted (row count of your localizedtext table) – and make sure you have a LOT of free space for the tempDB and tempDBlog to grow if needed.  My LT table was already really small – so no issues for me running this – it completed in less than a minute.

Done!  (with the “official” steps)
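
The perfmon check in step 5 above boils down to averaging a counter over some samples and comparing the result with a threshold. A minimal sketch, assuming you have exported the counter samples to a list of numbers: the 0.015 second and 25% figures are the ones quoted in step 5 and are used here as illustrative thresholds, not official limits, and `under_threshold` is a hypothetical helper name.

```python
from statistics import mean

# Thresholds taken from the figures quoted in step 5 (illustrative only).
DISK_SEC_LIMIT = 0.015   # Avg. Disk sec/Read and sec/Write, in seconds
CPU_PERCENT_LIMIT = 25   # % Processor Time


def under_threshold(samples, limit):
    """Return (average, True/False) for whether the average is below the limit."""
    average = mean(samples)
    return average, average < limit


if __name__ == "__main__":
    # Made-up sample values, purely for illustration.
    disk_read = [0.010, 0.012, 0.020]
    cpu = [18.0, 22.5, 30.0]
    print(under_threshold(disk_read, DISK_SEC_LIMIT))
    print(under_threshold(cpu, CPU_PERCENT_LIMIT))
```

Averages hide spikes, so it is still worth eyeballing the graph for sustained peaks.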

 

Now – I just have a couple cleanup steps I need to do – like go back and install the Ops Console and the Auth Console back on my terminal server.  Did that without issue.  All looks good.

 

And then I realized – we are missing another step in our plan – under the post-upgrade tasks – make sure the web console is working!  I saw lots of items in the release notes about how this might break…. and I imagine someone will complain rather quickly if it isn't working – so we better go check that out.

Sweet! I hit up the web console and it is all good.  I check out several of the new views – and run health explorer from the web console.  I have tasks, maintenance mode, and health explorer.  Very cool.  I even execute some of my favorite reports under “My Workspace” just to make sure those are good – ouch – not working.  I will have to look into that one.

 

Ok – that’s enough for today.  All in all – a successful upgrade.  A good plan written out at the beginning, based on the upgrade guide - makes all the difference.

Comments (21)

  1. Anonymous says:

    Here’s a great, great walk through of the R2 Upgrade process. A must read if the upgrade is on your horizon

  2. Kevin Holman says:

    You can upgrade agents whenever you want.  As you apply R2 to each Management server – we will place those agents into pending actions – that report to that management server as primary.  This is just flipping a bit in the database essentially.  You can approve or reject those anytime you want.

    So – my recommendation would be to apply R2 to the entire management core infrastructure first (RMS, all MS/Gateway, reporting, etc…) and THEN start your agent upgrades.

    When an agent is shown in pending actions – it is ABSOLUTELY still monitored as an SP1 agent.  We do not stop or interrupt monitoring for any reason like this.  

    If you are not planning on upgrading those agents soon – then I would reject the pending action for the agents, and later – when you are ready – simply execute a "repair" routine from the console – which is essentially the same thing as approving an agent for a pending upgrade.

    We just put them into pending to make this process a bit easier.

  3. Ted T Hacker says:

    Where can I get the current version of an mp file updated during the upgrade to 2012 R2 with UR2? One of the management packs updated but I cannot find the mp file. I need the mp file in order to seal a management pack containing an override to an object in that old management pack. The sealing is looking for microsoft.windows.server.2000.mp, "Windows Server 2000 Operating System", with a version of 6.0.7011.0. The latest version I have is 6.0.6989.0.

    It was not in "C:\Program Files (x86)\System Center Management Packs" or in the source of the R2 upgrade in "ManagementPacks" folder.

  4. Anonymous says:

    Kevin –

    I referenced your blog entry in a posting I made about upgrading to OpsMgr R2.

    Thanks again for sharing, it made my upgrade pretty easy…

    🙂

    Cheers,

    John

  5. The error about the service stop or start is just a timeout. I have hit that occasionally on multiple SP1 and R2 beta setups… but really, everything works anyway after that.

    The KMS error is known (at least to the community – I hope there is at least a bug filed for it): that stored procedure does not exist, and we never groom the KMS stuff from the DW. It was found by Daniele Gandini (a partner) and it is described here: http://nocentdocent.wordpress.com/2009/02/14/kms-management-pack-bug/

  6. Anonymous says:

    Kevin,

    Like you we are seeing this error from event 31552 repeatedly.  I thought this was due to an issue with the KMS MP I read about so we removed the MP, but the error has persisted.  Did you ever resolve this in your environment?

    Vratix

  7. Anonymous says:

    Hi Kevin,

    We have upgraded our infrastructure to SCOM 2007 R2, and it went fine…. recently I observed that there are 3 different versions of agents reporting to my management servers (V6.0.6278.0, V6.0.6278.32, V6.1.7221.0) and all are in a healthy state….. my question is: will the monitoring work for all the agents?

  8. Kevin Holman says:

    I would follow the steps of the upgrade guide….

    1.  RMS

    2.  Reporting server

    3.  Stand alone consoles

    4.  Management servers

    It doesn't really care where your reporting server is installed…

  9. Kevin Holman says:

    No – there is no requirement to remove consoles.  Per the upgrade guide – you just need to remove the agent – in order to upgrade a console from SP1 to R2.  I just chose to remove the console in my lab… because I had a lot of SP1 stuff loaded.  Normally – you would follow the upgrade guide – remove agents from stand-alone console machines – then upgrade the stand-alone consoles…. then later install the agent again.  If you don't have an agent running on a machine with a stand-alone console – you simply upgrade it.

  10. Kevin Holman says:

    This is caused by a few issues with the current KMS MP.  

    When you remove the MP – it does not remove the database objects, so the error persists.  If you open a case with MS – they can help you remove this or keep the KMS MP and fix it – but it requires editing your database table…. which they don't like to do.  🙂  I had planned a blog post on this issue but haven’t heard from a lot of people affected by it.  You can ping me via the email contact form if you want the explicit instructions.

  11. Kevin Holman says:

    Yes – SP1 agents are supported reporting to a R2 server.

    However – we recommend you get these SP1 agents upgraded ASAP.

  12. John Wieda says:

    Very clear and well documented upgrade experience, thanks!

    Should individuals that have just the Ops Mgr 2007 SP1 console installed, uninstall that console and then install Ops Mgr 2007 R2 console?

    Or is there an upgrade path?

    Also, if someone, such as myself, is running Windows Server 2008 on their workstation and the Ops Mgr 2007 agent is installed – should both the console and the agent be uninstalled prior to the upgrade to Ops Mgr R2?

    Thanks,

    John in Chicago (Fiserv)

  13. LayneR says:

    Kevin, excellent work as usual.  Nice meeting you at MMS as well.  Silly question about agent upgrades, as it is not clear to me from the various upgrade docs.  How soon after you upgrade server roles are you supposed to upgrade your agents?

    If after upgrading the server roles the agents appear in the Agent Requires Update section under Pending Management, they will not be monitored until you approve or reinstall manually, correct?

  14. Brian Hansen says:

    I need to upgrade agents using SCCM rather than the console. I ran into problems upgrading to SP1 and had to actually uninstall and reinstall agents rather than upgrading.

    Has anyone tried command line upgrades to R2? Any recommendations for command line (SCCM) upgrades?

  15. Nicole Welch says:

    For Brian-

    We upgraded agents (~5K) to SP1 via SMS without issues.  Below is the cmd I used in my VBS wrapper.

    If ADInt = FALSE Then
        Set objExecObject = objShell.Exec("msiexec.exe /i " & MSI & " SP1UPGRADE=1 SET_ACTIONS_ACCOUNT=0 REINSTALLMODE=vomus REINSTALL=All /quiet /norestart /l*v c:ntutilsWASUP-SCOMAgentMSIUpgrade.log")
    End If

    If ADInt = TRUE Then
        Set objExecObject = objShell.Exec("msiexec.exe /i " & MSI & " SP1UPGRADE=1 SET_ACTIONS_ACCOUNT=0 REINSTALLMODE=vomus REINSTALL=All USE_MANUALLY_SPECIFIED_SETTINGS=0 USE_SETTINGS_FROM_AD=1 /quiet /norestart /l*v c:ntutilsWASUP-SCOMAgentMSIUpgrade.log")
    End If

    I’m working to confirm what the SP1Upgrade=1 switch is now for the R2 upgrade.  Changing it to "R2Upgrade=1" appears to work in my lab — still testing though.

  16. Jim Kiniry says:

    We’re in the planning stage of performing the upgrade from SP1 to R2. We’re also wanting to upgrade the OS from WS2003 SP2 to WS2008 R2. Any recommendations as to when it would be best to perform the OS upgrades? Should we upgrade to WS2008 first or can we go direct to WS2008 R2. We have split the OpsMgr roles to individual servers: RMS, DB, DW, RS, MS, MS2, and ND (Network Devices MS).

    Thanks,

    Jim

  17. nick says:

    we have a clean build of Ops Mngr R2 (ie we did not upgrade from earlier version) with no additional mgt packs installed other than the defaults and are getting 31552 events logged. just to add that to the mix – was it ever resolved – can you please post resolution if so.

    thanks

    Nick

  18. babu says:

    How to Migrate from Single RMS to Clustered RMS in OpsMgr R2.

    I can't see any documentation on the web for this.

  19. Chris Walker says:

    Excellent documentation.  I’ve used your blog, the upgrade guide, and a few other resources.

    One point that I would add for others upgrading:  check the event log after each server that is upgraded.  Do NOT proceed until you receive a 1210 event within the Operations Manager event log (except for a clustered RMS, where you can proceed with each subsequent node until the cluster is upgraded, but do not start any other servers until the final clustered node that will hold the resources presents a 1210 message).

    We found that we were getting random Health Service State corruption if this was not finished first.

  20. samson says:

    Hi Kevin,

    It is a nice document. We are planning to upgrade our SCOM SP1 infrastructure to R2. The current SCOM structure consists of the following servers:

    Server 2003 – with RMS role, Web Console

    Server 2003 – with MS role, Reporting, SRS

    Server 2003/SQL 2005 – with OperationsManagerDW DB, OperationsManager DB

    Because our MS is also our Reporting server with SQL 2005 RS what is the best step to upgrade after RMS upgrade? First upgrade MS and then Reporting server or vice versa?
