Deployment–Troubleshooting PDT

The PowerShell Deployment Toolkit (PDT) performs distributed installations of System Center 2012 SP1, including SQL and all prerequisites.  If you are doing a full production, highly available, scale-out deployment, that can span a significant number of servers.  Keeping track of the status of all that is one of the interesting challenges that PDT addresses, but what do you do when something goes wrong?  The good news is that PDT gracefully handles failures mid-flow across that distributed installation, and also lets you restart partially failed deployments.

Here’s what happens in the inner workings of PDT.  For each server in a deployment, PDT dynamically determines the set of items that need to be installed and configured, and the order in which they need to happen.  Then, for each item, it determines whether that item has already been done, using one of a number of validation types.  If it has, PDT just skips that step.  If it has not, PDT performs the necessary actions for that item, then re-runs the validation to make sure it worked.  If the validation fails, PDT does not continue for that server, so any items after the failed item are not completed.  If a server has a dependency on an item on another server in the deployment (for example, a management server needs SQL to be installed on another server before it can be installed), it waits for that other server to complete the item before it continues.  If the other server fails on any item up to and including the one being waited for, the dependent server fails at that point too.

The result of all this is that, if something fails, you can wait for everything else in the deployment to complete, then fix the condition that caused the failure, then just run Installer.ps1 again.  Everything that worked the first time through will not be done again because PDT will validate that those items are already in place – so it effectively picks up where the failures in the previous run occurred.
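The flow described above can be sketched in a few lines.  This is a simplified model in Python, not PDT's actual code: the names Item, run_server and run_deployment, the server names DB01 and MS01, and the "broken" set that simulates a failing SQL install are all made up for illustration.  It also processes servers one after another instead of in parallel, so "waiting for a dependency" becomes a simple check of whether the other server got that far.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Item:
    name: str
    validate: Callable[[], bool]   # True when the item is already in place
    install: Callable[[], None]    # performs the install/configure action
    depends_on: Optional[Tuple[str, str]] = None  # (server, item) on another server

def run_server(server, items, done, log):
    """Process one server's items in order; return True if all succeeded."""
    for item in items:
        # A dependency that was never completed fails the dependent server.
        if item.depends_on is not None and item.depends_on not in done:
            log.append(f"{server}: {item.name} failed, dependency not met")
            return False
        if item.validate():
            log.append(f"{server}: {item.name} already in place, skipped")
        else:
            item.install()
            if not item.validate():   # re-run the validation after installing
                log.append(f"{server}: {item.name} failed validation, stopping")
                return False
            log.append(f"{server}: {item.name} installed")
        done.add((server, item.name))
    return True

def run_deployment(deployment):
    done, log = set(), []
    results = {s: run_server(s, items, done, log) for s, items in deployment.items()}
    return results, log

# Hypothetical state standing in for what is really installed on each server.
installed = set()
broken = {("DB01", "SQL")}   # simulated fault: this install has no effect

def make_item(server, name, depends_on=None):
    key = (server, name)
    def install():
        if key not in broken:
            installed.add(key)
    return Item(name, lambda: key in installed, install, depends_on)

deployment = {
    "DB01": [make_item("DB01", "Prerequisites"), make_item("DB01", "SQL")],
    "MS01": [make_item("MS01", "Prerequisites"),
             make_item("MS01", "ManagementServer", depends_on=("DB01", "SQL"))],
}

first, first_log = run_deployment(deployment)    # SQL fails, so MS01 fails too
broken.clear()                                   # "fix the condition"
second, second_log = run_deployment(deployment)  # finished items are skipped
print(first)
print(second)
```

On the second run the prerequisites on both servers validate immediately and are skipped, so only SQL and the management server are actually installed.  That is the "picks up where it failed" behavior, and it comes purely from validating before installing.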

So, how can you tell what went wrong?  There are two sets of log files to help you diagnose a failure: the PDT log files themselves, and the log files for the items being installed.  The PDT log files are in the folder C:\Users\<username>\AppData\Local\Installer on the system running Installer.ps1.  There is one log file per server being deployed to, as well as a consolidated log file, Installer.log.  All files are in a format that can easily be read by the CMTrace.exe utility from Configuration Manager (remember I am an old ConfigMgr guy at heart).  The PDT logs list everything PDT is doing: getting information, setting variables, checking if something needs to be installed, creating a task, waiting for that task to finish, waiting for dependencies, and so on.  If something fails, the log will tell you what failed.  The log files for the items being installed are collected by PDT and copied to the installer machine at the end of the deployment.  They can be found in C:\Temp\<guid>, where <guid> is a unique identifier assigned to each run of the deployment.  These logs are generated by each individual setup, so each has its own format.  PDT collects them so that they are easy for you to find, but also because the way PDT runs tasks against remote machines means that some of the log files get deleted as part of the process, and we need to make sure you have access to them.
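If you would rather scan the PDT logs with a script than open each one in CMTrace, something like the following Python sketch could pull out the error entries.  It assumes the logs use the standard CMTrace line layout (a message wrapped in <![LOG[ ... ]LOG]!> followed by attributes, with type="3" marking an error); I have not confirmed that every PDT log line follows that layout, so treat it as a starting point.  The function names are made up, and the sample lines are synthetic, not real PDT output.

```python
import re
from pathlib import Path

# Assumed CMTrace-style layout:
#   <![LOG[message]LOG]!><time="..." date="..." component="..." type="3" ...>
LINE = re.compile(r'<!\[LOG\[(?P<msg>.*?)\]LOG\]!><(?P<attrs>[^>]*)>', re.DOTALL)
TYPE = re.compile(r'type="(\d)"')

def find_errors(text):
    """Return the messages of all entries marked type="3" (error)."""
    errors = []
    for match in LINE.finditer(text):
        kind = TYPE.search(match.group("attrs"))
        if kind and kind.group(1) == "3":
            errors.append(match.group("msg"))
    return errors

def scan_folder(folder):
    """Map each *.log file name in folder to its error messages, if any."""
    results = {}
    for path in sorted(Path(folder).glob("*.log")):
        errors = find_errors(path.read_text(errors="replace"))
        if errors:
            results[path.name] = errors
    return results

# Synthetic sample entries, for illustration only.
sample = (
    '<![LOG[synthetic informational message]LOG]!>'
    '<time="10:00:00.000+000" date="01-01-2013" component="Sample" '
    'context="" type="1" thread="1" file="sample">\n'
    '<![LOG[synthetic error message]LOG]!>'
    '<time="10:00:01.000+000" date="01-01-2013" component="Sample" '
    'context="" type="3" thread="1" file="sample">\n'
)
sample_errors = find_errors(sample)
print(sample_errors)
```

Pointing scan_folder at the PDT log folder on the machine that ran Installer.ps1 would give you a per-server list of error messages to start from.  The setup logs collected for the installed items each have their own format, so this approach does not apply to them.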

So, that’s how to start troubleshooting failures, plus a little insight into how PDT actually works.  More on that in future posts!