Greetings – Hilde here to pass along some wisdom for AD shops everywhere.
Recently, I was part of a conversation with a handful of true Active Directory rock-stars here in Premier Field Engineering who have done a lot of AD Risk Assessment Program (RAP) deliveries.
As a reminder, the "RAP as a Service" delivery includes a very in-depth scan of a technology (AD, GPO, Failover Cluster, Desktop OS, etc.) and a thorough review of the scan results, as well as a conference call with a RAP-accredited PFE to discuss them. Reports and remediation plans are generated for the environment, and you are licensed to use the RAP as a Service "client" (the scanning tool) for a year to re-scan and review the environment as often as you'd like. Another value-add is the portal for reviewing, analyzing and interacting with your data, results and reports. The portal has evolved into a superb part of the RAP as a Service platform and benefits from frequent updates and continuous improvement.
Bryan Zink and the YYZ PFE have been involved in delivering AD Risk Assessments since they helped develop the program almost 15 years ago.
Two other veteran PFEs, Doug Gabbard and David Morillo, have been delivering AD Risk Assessments for years and also have tremendous depth and experience in AD.
Some of those names you may recognize from this blog or perhaps from your own AD Risk Assessment; these guys are all members of the Hall of Justice for AD.
As you review this, take note of the patterns and use the information to improve the AD environments you are responsible for.
After more than 12 years and 500 on-site assessments of customer Active Directory environments, lots of unusual and interesting experiences come to mind. I've had the pleasure of working with customers across all sorts of industries, with AD forests ranging in size from two Domain Controllers all the way up to more than 3,000. What's probably most noteworthy, though, are the common scenarios. In no particular order:
Membership counts – As an AD Admin, invest your time and effort to really understand delegation and deploy a manageable least privileged access model. Also, make it your business to keep your groups manageable.
Ignorance is no excuse – Know your subnets and where they're in use, and make sure they map to the correct Active Directory sites. LANs and WANs tend to change much more frequently than your AD topology. If you don't stay on top of this, users and the helpdesk will be the first to let you know. They will experience everything from slow logons to poor performance to missing data and applications because policies fail to apply.
Embrace your inner 'Bob Ross' – Performance analysis is more art than science. Size your Domain Controllers with a purpose. Understand how Windows uses CPU, Memory and Disk. Measure and understand what normal really looks like. This will keep late night troubleshooting down to a minimum.
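To make the subnet-to-site tip concrete, here is a minimal Python sketch of how a client IP address resolves to a site. It assumes a simplified model in which the most specific (longest-prefix) defined subnet containing the address wins, which is how AD subnet matching generally behaves. The subnet and site names are hypothetical, and a real check would read subnet definitions from AD Sites & Services rather than a hard-coded dict. An address that matches no defined subnet comes back as None, the kind of gap that leads to the slow logons described above.

```python
import ipaddress
from typing import Optional

# Hypothetical subnet-to-site definitions (illustration only; in practice,
# export these from AD Sites & Services). Overlapping prefixes are intentional.
SUBNET_SITES = {
    "10.0.0.0/8": "HQ",
    "10.20.0.0/16": "Branch-East",
    "10.20.5.0/24": "Branch-East-Lab",
}


def site_for_ip(ip: str, subnet_sites: dict[str, str]) -> Optional[str]:
    """Return the site of the most specific (longest-prefix) subnet that
    contains ip, or None if no defined subnet covers it."""
    addr = ipaddress.ip_address(ip)
    best_net = None
    best_site = None
    for cidr, site in subnet_sites.items():
        net = ipaddress.ip_network(cidr)
        # Membership is always False across IPv4/IPv6, so mixed input is safe.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_site = net, site
    return best_site


if __name__ == "__main__":
    for client in ["10.20.5.17", "10.20.9.4", "10.99.1.1", "192.168.1.10"]:
        site = site_for_ip(client, SUBNET_SITES)
        print(client, "->", site or "NO SITE (unmapped subnet)")
```

Running the same check over a list of real client addresses (for example, from DHCP leases) is one way to find ranges that nobody has mapped to a site yet.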
YYZ PFE from AZ
Strict Replication is your friend…
Don't ignore FRS problems, and get moving toward DFSR for SYSVOL replication (if you haven't already)…
Backups are cool…
Monitor AD replication…
Subnet definitions are critical…
Change notification should be evaluated…
Preferred Bridgeheads are usually a bad idea…
Limit your time sync/drift thresholds…
Review your application partitions in terms of DNS (DomainDNSZones and ForestDNSZones) – you just might find conflicts, duplicates or other stale/half-moved DNS data …
There aren't many (any?) good reasons for more than two Sites per Site Link…
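As a rough illustration of the Site Link tip, the following Python sketch flags any site link that contains more than two sites. The `SITE_LINKS` data is hypothetical (in practice you would export it from AD Sites & Services or from the data your RAP as a Service client collects), and the `max_sites=2` default simply encodes the rule of thumb above.

```python
# Hypothetical export of site links and their member sites (illustration only).
SITE_LINKS = {
    "HQ-BranchEast": ["HQ", "Branch-East"],
    "DEFAULTIPSITELINK": ["HQ", "Branch-East", "Branch-West", "DR"],
    "HQ-DR": ["HQ", "DR"],
}


def oversized_site_links(site_links: dict[str, list[str]],
                         max_sites: int = 2) -> dict[str, list[str]]:
    """Return {link name: member sites} for every site link that contains
    more than max_sites sites."""
    return {link: sites for link, sites in site_links.items()
            if len(sites) > max_sites}


if __name__ == "__main__":
    for link, sites in oversized_site_links(SITE_LINKS).items():
        print(f"{link}: {len(sites)} sites -> {', '.join(sites)}")
```

Each link it reports is a candidate for review, not an automatic error; the point is to make the exceptions visible so someone can justify or fix them.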
Educate multiple engineers on how to update/use the RAP as a Service client/scanning tool to collect data. Too often, only one person knows how to use the tool.
Likewise, educate multiple engineers on how to use the portal to review collected data, reports, issues, etc.
Budget time (or plan time, or call it whatever you like) to perform remediation. From the earliest days of risk assessments right up to today, the pattern is the same: data is collected and reports are delivered, but little to no remediation gets done. Also, the environment is often never scanned again, and the data collection tools sit idle.
Trust and verify – a slightly friendlier twist on "trust but verify." This is one of the best ways to learn more deeply how AD works.
My top recommendation after 5 years of risk assessments is to decide on an interval, create a reminder and rerun the scanning tool on a regular basis. 7 to 9 out of 10 customers I've visited don't do much (if anything) with the tool after the initial engagement concludes.
Enforcing and blocking inheritance on GPOs should be used sparingly as these are advanced features and can complicate troubleshooting.
Using SYSVOL to house and replicate file types such as .exe, .msi or DLLs is not recommended; doing so can delay the promotion of a DC, increase replication traffic and cause excessive disk utilization, among other things.
Often there is a lack of clarity and understanding between the "backup team" and the "AD team," and backups either aren't configured, aren't configured properly or don't follow best practices. Backing up 2 domain controllers per domain with Windows Server Backup is simple to configure and a very cost-effective insurance policy in case a recovery is needed.
Environments without subnets defined in AD Sites & Services are very common. Missing subnets can have a big impact on client performance and are very easy to remediate in most situations.
Keeping up with patches should be a normal process for most IT shops. However, we often find one or two DCs that are missing one or more critical security patches and/or DCs that are missing non-security updates, such as the rollups that have become very important for proactive operational health. One such example is the "Enterprise hotfix rollup" for Windows Server 2008 R2 (it applies to Windows 7, too, so get it on your clients): https://support.microsoft.com/en-us/kb/2775511/en-us. No good conversation about patches should stop at the OS. We see old drivers and old firmware, and another often-overlooked item is the VM integration components, which are out of date, mismatched across DCs, or both. Be disciplined.
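The backup advice above (at least two DCs per domain) is easy to spot-check. Here is a minimal Python sketch over a hypothetical inventory of each DC's last successful backup time. The inventory format, the domain and DC names, and the seven-day `max_age_days` threshold are all assumptions for illustration; adjust them to however your backup tooling reports status and whatever recovery window you have agreed on.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical inventory: domain -> {DC name: last successful backup, or None}.
LAST_BACKUP: dict[str, dict[str, Optional[datetime]]] = {
    "corp.contoso.com": {
        "DC01": datetime(2024, 5, 1),
        "DC02": datetime(2024, 5, 2),
        "DC03": None,
    },
    "emea.corp.contoso.com": {
        "EDC01": datetime(2024, 3, 1),
        "EDC02": None,
    },
}


def domains_lacking_backups(last_backup, now, min_dcs=2, max_age_days=7):
    """Return {domain: recently backed-up DCs} for every domain where fewer
    than min_dcs DCs have a backup newer than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    short = {}
    for domain, dcs in last_backup.items():
        recent = [dc for dc, ts in dcs.items() if ts is not None and ts >= cutoff]
        if len(recent) < min_dcs:
            short[domain] = recent
    return short


if __name__ == "__main__":
    print(domains_lacking_backups(LAST_BACKUP, now=datetime(2024, 5, 3)))
```

A check like this only confirms that backups exist; it says nothing about whether they restore, which is what regular DR practice is for.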
A few from Hilde
Practice DR – set up Outlook calendar entries with reminders to test your recovery processes (restore a user account, a group, an OU, a GPO, a DC, the whole forest) at recurring intervals throughout the year. Don't wait until you need your recovery skills, documentation and data to discover that there is a problem with one or more of them.
Accurately document the environmental and system settings (and maintain the docs). AD Sites, Site Links, DC/DNS configurations and many other settings are automatically captured for an entire AD forest by the RAP as a Service client tools, which is extremely helpful. You can copy/paste that collected data into Word/Excel files as an excellent starting point for documentation, but don't stop there; consider GP Links, DNS zone replication details, member server NIC settings, DHCP scope settings, etc. We all know there tends to be configuration drift and subtle (or drastic) environmental change over the years, so take a look at your docs (if there are any). I'd bet they are no longer accurate.
Protect AD from accidents – broadly enable "accidental deletion" protection on AD objects, including OUs, DNS zones, etc. We all make mistakes unless we are prevented from doing so.
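One way to turn the "Practice DR" tip into a plan is to generate the drill dates up front and then create the calendar entries from the list. This minimal Python sketch cycles through the recovery scenarios named above at a fixed interval. The start date, the 60-day interval and the number of drills are arbitrary assumptions, and creating the actual Outlook entries is left to you.

```python
from datetime import date, timedelta

# Recovery scenarios from the "Practice DR" tip.
SCENARIOS = ["user account", "group", "OU", "GPO", "DC", "whole forest"]


def dr_drill_schedule(start: date, interval_days: int,
                      scenarios: list[str], count: int) -> list[tuple[date, str]]:
    """Return count (drill date, scenario) pairs, one every interval_days
    starting at start, cycling through scenarios in order."""
    return [
        (start + timedelta(days=i * interval_days), scenarios[i % len(scenarios)])
        for i in range(count)
    ]


if __name__ == "__main__":
    # Hypothetical plan: one drill roughly every two months for a year.
    for when, scenario in dr_drill_schedule(date(2025, 1, 6), 60, SCENARIOS, 6):
        print(f"{when.isoformat()}: practice restoring a {scenario}")
```

Cycling through the list means every scenario gets exercised at least once a year, including the whole-forest recovery that is easiest to put off.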
There you have it, folks. A handful of great tips for improved AD health and reduced risk.
Notice, there was nothing unexpected here. No magic wands or secret potions. No cryptic registry settings or secret hotfixes.
Just ditches to be dug.
Here's your shovel … get to work!
If this post goes over well, keep your ear to the tracks for a "Volume #2" ...