First, the warnings:
1. Sometimes I am a bit of a salmon, meaning that I have a tendency to swim upstream, metaphorically speaking.
More specifically, I like to take current thoughts around “best practices” and pick them apart to see if they actually make sense as a best practice. One of my favorite words is “specious”. A specious argument is one that seems to make sense on the surface, but when actually evaluated, turns out not to make so much sense, after all.
2. Anything I say in this blog is purely my personal opinion and may not reflect the opinions of my employer, my colleagues, my mother or anybody else. They’re just my thoughts, and there are no warranties associated with them. I also tend to have a sense of humor that may perplex some. If something comes across as obnoxious, it was probably intended to be humorous. Please don’t take offense; I certainly don’t intend any.
Now, the point of this post:
For some time now, I’ve been a proponent of using virtual machines to host offline CAs- with a lot of specific caveats about how to do it. I’ve started discussions on this subject in various forums (electronic or otherwise), including internally at Microsoft.
Some months back, somebody posted to an internal distribution list asking whether installing an offline CA in a virtualized environment is a supported scenario, and whether many companies are doing so. That query ended up spurring a relatively lengthy debate between me and a few others on the DL, and I think a summary of the discussion might be useful for those of you out there who are wondering whether or not you can build your offline CAs in virtualized environments.
I’ll give you my thoughts as to when and how virtualized offline CAs may be appropriate, and then I’ll give you the arguments against the idea that I’ve received in the past and my rebuttals to those arguments.
Should I Build My Offline CAs in Virtualized Environments?
Okay, so you’re looking at building or updating your PKI and you’d like to put your offline CAs (root, policy, whatever) in virtual machines (AKA partitions in Windows Server 2008 Hyper-V). I am absolutely in favor of your doing this if you do the following:
1. Build your offline CA in a virtual machine that is hosted on a removable/external drive. If that is not an option, build it on a separate disk that you can reformat at the end of this process. Don’t build this on one of your “regular” servers- build this whole shebang on a machine that you’ve built specifically for the purpose of building your offline CA, and don’t connect the host machine to a network.
2. Use a network-attached HSM with a crossover cable connected to the host machine to generate and secure your CA’s keys (most major HSM vendors support virtualized CAs these days).
3. Back up your keys and store them in a secure location. As in, a safe, not your desk drawer.
4. Shut down your offline CA once you’ve done what you need to do with it.
5. Burn at least two copies of your offline CA’s virtual machine to removable media such as external hard drives.
6. DELETE the virtual machine from the host server. If you built the virtual machine on a removable or external drive, all you’re deleting is the configuration information that your virtual machine host maintains about the virtual machine. If you did not build the virtual machine on a removable/external drive, perform a DoD wipe of the disk on which you built it. Even better is to just wipe the entire machine that you used as your host.
7. Place one copy of the virtual machine in your primary datacenter’s safe along with a backup of its keys. Place the second copy in your disaster recovery datacenter’s safe with a second backup of its keys. Ideally, courier a third copy of the virtual machine and backed-up keys to your company’s lawyers with instructions that they must secure it in a safe and must notify your company whenever the CA is due to be brought up for maintenance, CRL publishing, certificate renewal, subordinate CA certificate issuance/renewal, etc.
8. At each CRL publication interval (or any other time at which you’re required to spin up the CA), do the following things:
- Fire up the virtualization software you use, [very, very, very] preferably on a freshly-built machine that has never been connected to a network. For example, grab a spare laptop, wipe it, and install your virtualization server of choice (Windows Server 2008 Hyper-V rocks, btw).
- Obtain all copies of your backed up virtual machine and keys.
- Connect everything up, using your primary copy of the virtual machine and keys, and load the virtual machine into your virtualization host.
- Bring up the offline CA. Even if you only brought it up to publish the CRL, take advantage of the fact that it is up and running. Apply any patches that need to be applied to the CA. Apply any updates that need to be applied to your virtualization software. If your virtual disk format needs updating, update it. Publish your CRL. Renew certificates that need renewing. Back up your keys again, if necessary.
- Assuming the above all went well, shut down the CA. Burn fresh copies of the virtual machine to each removable/external drive, fire them up to make sure they work, then shut them down and lock everything back up in the safes. Repeat this process whenever you need to bring up your offline CA(s). If you broke something when you were updating your primary copy of the virtual machine, start over with your secondary copy and don’t do whatever you did that broke the first one. Then do the copy-and-lock-up thing.
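Before you lock everything back up, it doesn't hurt to confirm that each freshly burned copy is byte-for-byte identical to the primary. Here's a minimal sketch of one way to do that by comparing SHA-256 hashes. The function names and the file paths are my own hypothetical examples, not part of any particular virtualization product, and a matching hash is a supplement to firing up each copy, not a substitute for it.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a (potentially very large) virtual disk file in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_copies(primary: Path, copies: list[Path]) -> list[Path]:
    """Return every copy whose hash differs from the primary's hash."""
    expected = sha256_of_file(primary)
    return [copy for copy in copies if sha256_of_file(copy) != expected]


if __name__ == "__main__":
    # Hypothetical paths: substitute wherever your removable drives are mounted.
    primary = Path("E:/offline-ca/rootca.vhd")
    copies = [Path("F:/offline-ca/rootca.vhd"), Path("G:/offline-ca/rootca.vhd")]
    if all(p.exists() for p in [primary, *copies]):
        bad = mismatched_copies(primary, copies)
        print("All copies match." if not bad else f"Mismatched copies: {bad}")
```

Write the resulting hash into your ceremony documentation, too. The next time you open the safe, you can check that what comes out is what went in.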
The Really Important Warnings
Here are the key concepts in the above:
1. Don’t build the virtual machines on a production server that is connected to your network. Build them on a disconnected machine that was built specifically for the purpose of temporarily hosting these virtual machines.
2. If you want to keep the host machine, put it in the safe with your removable drive containing the virtual machine. Do NOT just build all of this on a laptop and stick it in a safe “as is” and hope that when you fire it up again, everything works. Get the virtual machine off of that host machine’s hard drive.
3. If you need to bring up the CA in an emergency, having a ready-to-run machine in the safe with the backed-up virtual machine and keys is very handy. However, every time you bring up that machine you still need to do the things above to remove the virtual machine from the host, because you do not want to find out five years from now, while scrambling to bring up your root CA for an emergency CRL publication, that the hardware has failed. If you’re going to store a machine in the safe with the virtual machine, then every time you maintain the CA you should also evaluate that host machine to determine whether it should be replaced with a newer one.
4. Every time you bring up the CA, make sure that you update and re-burn the virtual machine.
5. Never leave the virtual machines sitting on any host that is accessible in any way other than opening the safe and pulling everything out (or calling your lawyers, telling them that you had a massive disaster and that you need them to courier their copy from their safe).
6. Do not sit alone at your desk building these things. Build them in accordance with your company’s security policies, with more than one person sitting there during the entire process.
7. DOCUMENT everything, including the processes around the above, the instructions regarding when the CAs are brought up, instructions for your lawyers, etc.
In other words, treat that virtual machine as you would any crucial and highly secured piece of your infrastructure, albeit doing certain things virtually rather than physically.
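If you keep your ceremony documentation in any kind of structured form, you can make the "never alone" rule hard to skip. Below is a minimal sketch of a log record that refuses to exist unless at least two distinct people are listed as present. The class name, the fields and the sample values are all hypothetical; adapt them to whatever your company's security policies actually require.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CeremonyRecord:
    """One entry in the log of offline CA builds and maintenance sessions."""

    performed_on: date
    purpose: str                    # e.g. "CRL publication"
    witnesses: tuple[str, ...]      # everybody who was in the room
    actions: tuple[str, ...] = ()   # what was actually done, in order

    def __post_init__(self) -> None:
        # Normalize names so "Alice" and " alice " count as the same person.
        distinct = {name.strip().lower() for name in self.witnesses if name.strip()}
        if len(distinct) < 2:
            raise ValueError("at least two distinct people must be present")


if __name__ == "__main__":
    record = CeremonyRecord(
        performed_on=date(2009, 1, 5),
        purpose="CRL publication",
        witnesses=("Alice", "Bob"),
        actions=("patched CA", "published CRL", "re-burned VM copies"),
    )
    print(record)
```

A record like this proves nothing by itself, of course. It just makes the gap obvious when somebody tries to log a session that nobody else attended.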
Arguments and Rebuttals
On to the last part- here are some of the objections I’ve received in the past when I’ve espoused virtualized offline CAs, along with my rebuttals.
“A virtual machine isn’t as secure as a physical server- somebody could just put it in his/her pocket and walk out with it.”
No, somebody cannot do that, because you’re going to follow my instructions and never have this thing sitting in an accessible location, nor will anybody ever be left alone with it. See why I insist upon multiple people being there for the build and maintenance processes?
“You can’t just stick a laptop in a safe- it’s going to get old and die when you least expect it, and then what?”
That’s why I tell you not to build the virtual machine on a host and leave it there. The whole point of a virtual machine is that it’s hardware agnostic. That means that you can build out a new host each time you crack open the safe rather than praying that the hardware didn’t die while it was gathering dust. Besides, if you built the whole shebang on a physical server in a locked cage in your datacenter, how is that physical machine any less likely to die than the one in the safe? Take the hardware out of the equation.
“How are you going to guard it? Cage-locked servers are more secure.”
Cage-locked servers in a secure datacenter are no more secure than something that is locked in a safe and completely physically untouchable. In fact, at a company where I worked and built a PKI with virtualized offline CAs in the past, one of the chief arguments in favor of virtualization was this- our VP of Operations could walk into our datacenters and do whatever he wanted to do to a physical server in the guise of “maintenance”, and he wouldn’t be challenged because, well, he was the VP of Operations and was a hands-on kind of guy. However, for him to retrieve a virtual machine from a safe in the secure datacenter not only required many more hoops for him to jump through, but would also raise much more suspicion if he tried to do it covertly. He simply couldn’t go to the safe, crack it open and start mucking with what was inside. Nuh-uh. No way. Not gonna happen without security guards tackling him to the floor, sorry.
“Virtualization software changes over time. What if your virtual machine is no longer compatible with your virtualization software?”
That’s why you update the virtualization software and virtual machine format whenever you bring up the virtual machine. Additionally, when you stick the virtual machines in the safes, it’s a good idea to also stick in a DVD containing a copy of the virtualization software that you used at the time you spun up the virtual machine. Worst case scenario, you end up having to load some old virtualization software on a sacrificial machine the next time you spin up the virtual machine. And then, of course, you’re going to update everything, remember? Lather, rinse, repeat. This is a regular, albeit infrequent and stringently controlled, operational task. Treat it as such.
“A virtualized CA requires a netHSM, which is more expensive than a dedicated HSM.”
You should be using a netHSM for your issuing CAs, anyway. Use that same netHSM to generate and back up your offline CAs’ keys. But remember, don’t leave them on the HSM. And whenever you need to use the netHSM (you really should have two for redundancy), connect it to the virtual machine host with a crossover cable- while it is not connected to your network. Alternatively, waste (er, spend) your money on a dedicated machine with a dedicated HSM and stick that in the safe, then evaluate it every time you fire up your CA to determine if it needs updating. I don’t recommend this, personally. It’s unnecessary expense and increases your risk of hardware failure. So, let’s see- use a device you’re buying anyway (the netHSM), OR buy an additional device (the dedicated HSM) that you’re going to use perhaps forty times, at most, over the next couple of decades. And pray that it doesn’t die. Hm, better buy two of them, just in case. Or maybe three. Or just use the netHSM that you should already be buying anyway.
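If you want to sanity-check that "forty times" figure against your own schedule, the arithmetic is easy to sketch. The example below assumes the offline CA's CRL is republished every six months over a twenty-year lifetime. Both numbers are my illustrative assumptions, so plug in your own, and note that the function name is hypothetical as well.

```python
from datetime import date


def power_on_dates(start: date, years: int, interval_months: int) -> list[date]:
    """List the first-of-month dates on which the offline CA must be brought up."""
    dates = []
    total_months = years * 12
    for offset in range(interval_months, total_months + 1, interval_months):
        # Count months from January of the start year, then split into year/month.
        month_index = (start.month - 1) + offset
        dates.append(date(start.year + month_index // 12, month_index % 12 + 1, 1))
    return dates


if __name__ == "__main__":
    schedule = power_on_dates(date(2009, 1, 1), years=20, interval_months=6)
    print(f"{len(schedule)} scheduled power-ons, first on {schedule[0]}, last on {schedule[-1]}")
```

A list like this is also a handy thing to hand to whoever is responsible for reminding you that it's time to open the safe.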
“Most virtualization software supports scripting APIs, so somebody could remotely attack the virtual machine via the host. Or it could get infected with a Trojan.”
Uh, nope. Host never connects to network. Host never hosts virtual machines without multiple people sitting there watching each other. Host can’t be infected via the network. Host can’t be infected via human being unless multiple human beings are in collusion and don’t value their jobs, and then what are you doing letting them handle such security-sensitive stuff in the first place?
“You rely on the host/parent security as well as the virtual machine security when you build an offline CA in this configuration, while in a dedicated machine scenario you rely on one layer only; you’re increasing your attack surface.”
This is one of those specious arguments I mentioned earlier. It sounds like it makes sense until you really think about it. Let’s think about it.
Host is never connected to a network.
Host is built from scratch every time it’s used (except in the emergency scenario where you might have that laptop that was sitting in the safe that you had to fire up for this emergency). Sounds to me like a heck of a lot more secure host than a physical server sitting in my datacenter where somebody has years to try to figure out a way to get at it (and where it’s probably connected to my network, anyway).
“The procedures that have been taken to secure the host machine will have to be done every time you want to bring the offline CA online since the host hardware is usually reused.”
I smell speciousness again. First, there’s an assumption of hardware reuse, which I have pretty clearly recommended against. Second, how is securing your freshly-built, never-connected-to-the-network host any harder than securing a machine that you’ve left up and running and sitting in your datacenter connected to your network?
“I just don’t like it. We’ve always said it’s not a best practice to virtualize offline CAs.”
Any argument that begins with “we’ve always done it that way” has instantly lost credibility with me. Times change. Hardware changes. Technologies change. Therefore, best practices change. Call me a Darwinist, but I’m of the opinion that dinosaurs are extinct for a reason.
Thanks for reading this very long post, and for those who are wondering- yes, we do support virtualized offline CAs, although I’d strongly recommend that you create and use them with Windows Server 2008 Hyper-V, because we can’t fully support other vendors’ virtualization products.
At some point, I’ll probably write up another really long post around the things that I think you should always do from a policy and procedure perspective when you’re designing and building a PKI.