Have you looked at the Data Deduplication feature in Windows Server 2012? I thought it was “enabled by default”, but it’s not. We have a great article that covers deduplication in depth; below are the high points of setting it up.
Data Deduplication Overview
Step 1 is to make sure the Data Deduplication role service is installed (it sits under File and Storage Services in the Add Roles and Features wizard):
Once you install the role, you then need to configure it. Until you configure deduplication, nothing happens, so here’s how you Configure Data Deduplication.
Make sure that you enable data deduplication and then Set the Deduplication Schedule…:
Take note of the "Deduplicate files older than (in days)" setting. For production environments, my suggestion is to set this number higher, say 20 days, so that files still being actively changed are left alone. In my test environment, I set it to one day so I would recover disk space now, not later.
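To get a feel for what a given age threshold would cover before committing to it, here is a minimal Python sketch that lists the files older than a chosen number of days. It is only an illustration of the policy, not how Windows evaluates it: the function name `files_in_policy` is made up for this post, and it assumes age is judged by last-modified time, which may not match the timestamp the deduplication engine actually uses.

```python
import os
import time

SECONDS_PER_DAY = 86400


def files_in_policy(root, min_age_days, now=None):
    """Return files under `root` last modified at least `min_age_days` ago.

    Assumption: "age" means time since last modification. This is a
    stand-in for the real policy check, not a copy of it.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY
    eligible = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Older files have a smaller mtime than the cutoff.
            if os.path.getmtime(path) <= cutoff:
                eligible.append(path)
    return sorted(eligible)
```

Calling `files_in_policy("D:\\", 20)` and then `files_in_policy("D:\\", 1)` shows how many more files the one-day setting would sweep in.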
Here is where you Enable the background optimization and Enable throughput optimization.
I love that I can set up two schedules. I could have one for weekdays and one for the weekends.
After I set up deduplication on my 2 TB drive, I recovered 652 GB of disk space. Your mileage may vary, since the savings depend on how much duplicate data the volume holds, but for this volume that was a lot of space.
I took a 10 GB file and made four additional copies of it in four separate directories on the same drive (D:). Initially these four copies consumed an additional 40 GB, but after deduplication ran, it recovered that 40 GB and then some! Take note that deduplication works on blocks of data, not whole files. If you have two files that are almost identical, deduplication can still deduplicate the sections the two files have in common.
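The block-level idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Windows implementation: it cuts files into tiny fixed-size chunks and keys them by SHA-256 hash, and the names `dedup_store` and `rebuild` are invented for this example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the demo is easy to follow


def dedup_store(files, chunk_size=CHUNK_SIZE):
    """Split each file into chunks and keep one copy of each unique chunk."""
    store = {}    # chunk hash -> chunk bytes, stored once
    recipes = {}  # file name -> ordered list of chunk hashes
    for name, data in files.items():
        hashes = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # only the first copy is kept
            hashes.append(digest)
        recipes[name] = hashes
    return store, recipes


def rebuild(name, store, recipes):
    """Reassemble a file from its list of chunk hashes."""
    return b"".join(store[digest] for digest in recipes[name])


files = {
    "a.bin": b"AAAABBBBCCCCDDDD",
    "b.bin": b"AAAABBBBCCCCDDDD",  # exact copy of a.bin
    "c.bin": b"AAAABBBBXXXXDDDD",  # almost identical: one chunk differs
}

store, recipes = dedup_store(files)
logical_bytes = sum(len(data) for data in files.values())
physical_bytes = sum(len(chunk) for chunk in store.values())
print(f"logical: {logical_bytes} bytes, stored: {physical_bytes} bytes")
```

The exact copy costs nothing extra, and the almost identical file only adds the one chunk that differs, which is the same effect I saw with the copies of the 10 GB file.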
Of course, you also have the ability to exclude particular folders from deduplication.
Until next time,