One of the best parts of my current job is that the email distribution lists I’m on allow me to be a “fly on the wall” for some very interesting, educational conversations. Terry started this one off by asking a question:
I’ve heard it recommended that the number of partitions for an OLTP table should really not go beyond the low hundreds as a maximum. Is this simply for manageability, or is there anything else behind this? E.g. In the case where the degree of parallelism is high due to little/no partition elimination for a particular query
The always-insightful Lubor Kollar offered a response which I found so brilliant in its brevity that I asked his permission to share it here; I was thrilled when he consented:
If you are accessing the table with partitioning column value known (partitions are eliminated) then the number of partitions does not matter. If you are say partitioning on DATE column and you are using CUSTOMER_KEY to seek the records, we must access all partitions and that costs you so many seeks as you have partitions. There the number of partitions linearly increases the elapsed time.
I think I finally understand partitioning now (I’m reminded of college, when I came to understand calculus when I was helping my roommate’s girlfriend with her homework, two years after I’d kissed the ground after I walked out of the last class). If we can use the partitioning column to triage our disk seeks, we will; if we can’t, we have to seek on each partition.
Lubor is one of those people who makes me smarter just by letting me be in the same room with him. Thanks, Lubor, for allowing me to share your insight here.