Working with Large Data Warehouses


So you are designing an environment with enough agents, and a long enough retention policy, to necessitate a terabyte-scale data warehouse. In reality, it takes some time to populate a terabyte warehouse. Of course the SAN space has been reserved, but who is to say you need to start with a terabyte warehouse? What I recently devised for a client was an incremental approach spanning just over 12 months: I calculated how much space the DW would require at the 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360, and 390 day marks. At 390 days the maximum size was approximately 1.6 TB, and that is how much space was secured on the SAN.
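The 30-day-mark math above can be sketched in a few lines. The agent count and per-agent ingest rate below are hypothetical placeholders, not real SCOM figures; substitute the output of your own sizing model:

```python
# Hypothetical sizing sketch: estimate warehouse size at each 30-day mark.
# AGENTS and MB_PER_AGENT_PER_DAY are made-up placeholders.
AGENTS = 500
MB_PER_AGENT_PER_DAY = 8.4

def dw_size_gb(days):
    """Rough cumulative warehouse size in GB after `days` of retention."""
    return AGENTS * MB_PER_AGENT_PER_DAY * days / 1024

# Size targets at each 30-day mark, out to 390 days.
schedule = {d: round(dw_size_gb(d), 1) for d in range(30, 391, 30)}
for day, gb in schedule.items():
    print(f"day {day:3d}: ~{gb:,.1f} GB")
```

With these placeholder numbers the 390-day figure works out to roughly 1.6 TB, which is the kind of final target you would then reserve on the SAN.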

However, the data warehouse database's initial size was set to the 30-day figure. At day 25 it was increased to the 60-day size, and so on. In addition to increasing the database size, the data retention policy must of course be changed as well. I use dwdatarp, available at 
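The grow-five-days-early rhythm can be captured in a small helper that tells you which mark to pre-size for and what to hand the DBA. Everything here is illustrative: the size table, the logical file name, and the dataset/aggregation names are placeholders, and the dwdatarp switches are shown as I recall them (-s server, -d database, -ds dataset, -a aggregation, -m max age in days), so verify them against the tool's own help output:

```python
# Sketch of the grow-early schedule described above.
MARKS = list(range(30, 391, 30))       # the 30-day retention marks
SIZE_GB = {m: m * 4 for m in MARKS}    # hypothetical size per mark

def target_mark(day, lead=5):
    """Return the mark to be pre-sized for; we grow `lead` days early."""
    for m in MARKS:
        if day < m - lead:
            return m
    return MARKS[-1]

def grow_commands(day):
    """Commands to hand the DBA at this point in the schedule."""
    m = target_mark(day)
    return [
        # Pre-size the data file so SQL Server never has to autogrow it.
        # The logical file name MOM_DATA is a placeholder.
        f"ALTER DATABASE OperationsManagerDW "
        f"MODIFY FILE (NAME = MOM_DATA, SIZE = {SIZE_GB[m]}GB);",
        # Raise retention for one example dataset/aggregation to match.
        f'dwdatarp.exe -s SQLSERVER -d OperationsManagerDW '
        f'-ds "Performance data set" -a "Hourly aggregations" -m {m}',
    ]
```

At day 25, for example, `target_mark` returns 60, so the database is grown to the 60-day size just before the 30-day mark is reached.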

The primary benefits are:

  1. It places no undue burden on the DBAs (the holiday season is approaching, so you want to be on their good side). If the database holds 22 GB of data but is 1+ TB in size, someone will be grumpy.

  2. It provides a mechanism to grow the data warehouse in an exacting fashion, taking up no more space than necessary.

  3. It makes forecasting easier. The sizing model is a reference for a specific, static environment, and factoring in institutional knowledge is virtually impossible. With this approach you can capture the actual effect of such factors and adjust accordingly.

  4. Performance – small databases are fast databases. Again, why make it any bigger than it needs to be?

I am sure there are many opinions as to why this may not be a good idea, and I am sure they all have merit. I just wanted to share it in case you can benefit from it.

Comments (2)

  1. Anonymous says:

    daviesg – thanks for the feedback. Don’t ever be sorry to share your opinion or thoughts. You bring up some very good points and perhaps I published too quickly before investigating further.

  2. Anonymous says:

    Sorry but I have to disagree. Be very careful with this approach. What you might find is that performance is great at the beginning, but you will get significant disk (OS-level) fragmentation as the data warehouse grows (to an extent depending on the growth rate set).

    1) SQL DBAs are always grumpy 😉 Don’t design something to keep someone happy. Design it for performance.

    2) If the space on the SAN is allocated to the Data Warehouse then this is irrelevant.

    3) How does it make forecasting easier? Just monitor the data file size over time. That is what Operations Manager is for.

    4) Wrong. If you have 200 GB of data, there will be the same amount of data in a 1 TB database as in a 300 GB database. The rest is white space.

    I realise you say that there may be reasons why it isn’t a good idea… and I (personally) think they outweigh any potential benefit that might be gained. But that is just my opinion 😉

    Have fun