Working with Large Data Warehouses


So you are designing an environment with enough agents, and a long enough retention policy, to necessitate a terabyte-scale data warehouse. In reality, it takes some time to populate a terabyte warehouse. Of course the SAN space has been reserved, but who is to say you need to start with a terabyte warehouse? What I recently devised for a client was an incremental approach spanning just over 12 months: I calculated how much space the DW would require at the 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360, and 390 day marks. At 390 days the maximum size was approximately 1.6 TB, and that is how much space was secured on the SAN.
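The 30-day-mark math above can be sketched in a few lines. The agent count and per-agent ingest rate below are hypothetical placeholders, not real SCOM figures; substitute the output of your own sizing model:

```python
# Hypothetical sizing sketch: estimate warehouse size at each 30-day mark.
# AGENTS and MB_PER_AGENT_PER_DAY are made-up placeholders.
AGENTS = 500
MB_PER_AGENT_PER_DAY = 8.4

def dw_size_gb(days):
    """Rough cumulative warehouse size in GB after `days` of retention."""
    return AGENTS * MB_PER_AGENT_PER_DAY * days / 1024

# Size targets at each 30-day mark, out to 390 days.
schedule = {d: round(dw_size_gb(d), 1) for d in range(30, 391, 30)}
for day, gb in schedule.items():
    print(f"day {day:3d}: ~{gb:,.1f} GB")
```

With these placeholder numbers the 390-day figure works out to roughly 1.6 TB, which is the kind of final target you would then reserve on the SAN.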

However, the data warehouse database's initial size was set to the 30-day figure. At day 25 it was increased to the 60-day size, and so on. In addition to increasing the database size, the data retention policy must of course be changed as well. I use dwdatarp, available at 
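The grow-five-days-early rhythm can be captured in a small helper that tells you which mark to pre-size for and what to hand the DBA. Everything here is illustrative: the size table, the logical file name, and the dataset/aggregation names are placeholders, and the dwdatarp switches are shown as I recall them (-s server, -d database, -ds dataset, -a aggregation, -m max age in days), so verify them against the tool's own help output:

```python
# Sketch of the grow-early schedule described above.
MARKS = list(range(30, 391, 30))       # the 30-day retention marks
SIZE_GB = {m: m * 4 for m in MARKS}    # hypothetical size per mark

def target_mark(day, lead=5):
    """Return the mark to be pre-sized for; we grow `lead` days early."""
    for m in MARKS:
        if day < m - lead:
            return m
    return MARKS[-1]

def grow_commands(day):
    """Commands to hand the DBA at this point in the schedule."""
    m = target_mark(day)
    return [
        # Pre-size the data file so SQL Server never has to autogrow it.
        # The logical file name MOM_DATA is a placeholder.
        f"ALTER DATABASE OperationsManagerDW "
        f"MODIFY FILE (NAME = MOM_DATA, SIZE = {SIZE_GB[m]}GB);",
        # Raise retention for one example dataset/aggregation to match.
        f'dwdatarp.exe -s SQLSERVER -d OperationsManagerDW '
        f'-ds "Performance data set" -a "Hourly aggregations" -m {m}',
    ]
```

At day 25, for example, `target_mark` returns 60, so the database is grown to the 60-day size just before the 30-day mark is reached.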

The primary benefits are:

  1. It places no undue burden on the DBAs (the holiday season is approaching, so you want to be on their good side). If the database holds 22 GB of data but is 1+ TB in size, someone will be grumpy.

  2. It provides a mechanism to grow the data warehouse in an exacting fashion, taking up no more space than necessary.

  3. It makes forecasting easier. The sizing model is a reference for a specific, static environment, and factoring in institutional knowledge is virtually impossible. With this approach you can capture the actual effect of such factors and adjust accordingly.

  4. Performance – small databases are fast databases. Again, why make it any bigger than it needs to be?

I am sure there are many opinions as to why this may not be a good idea, and I am sure they all have merit. I just wanted to share it in case you can benefit from it.

Comments (2)

  1. Anonymous says:

    daviesg – thanks for the feedback. Don’t ever be sorry to share your opinion or thoughts. You bring up some very good points and perhaps I published too quickly before investigating further.

  2. Anonymous says:

    Sorry but I have to disagree. Be very careful with this approach. What you might find is that performance is great at the beginning, but you will get significant disk (OS-level) fragmentation as the data warehouse grows (to an extent depending on the growth rate set).

    1) SQL DBAs are always grumpy 😉 Don’t design something to keep someone happy. Design it for performance.

    2) If the space on the SAN is allocated to the Data Warehouse then this is irrelevant.

    3) How does it make forecasting easier? Just monitor the data file size over time. That is what Operations Manager is for.

    4) Wrong. If you have 200 GB of data, there will be the same amount of data in a 1 TB database as in a 300 GB database. The rest is white space.

    I realise you say that there may be reasons why it isn’t a good idea… and I (personally) think they outweigh any potential benefit that might be gained. But that is just my opinion 😉

    Have fun