Dare has some good points on the debate around Twitter’s scalability, relating it to similar challenges Exchange has faced over the years. Another related issue is single-instance storage. When you keep one shared copy of the content and store pointers to it, retrieving data takes more read I/Os, because the pointer has to be followed to reach the content. That can end up being worse overall than simply keeping multiple copies of each piece of data, since you run out of IOPS long before you run out of capacity.
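To make the IOPS-versus-capacity trade-off concrete, here is a back-of-the-envelope sketch in Python. Every number in it is hypothetical. It also assumes, as a simplification rather than a description of how Exchange actually lays out data, that following a pointer in the single-instance layout costs one extra uncached random read per retrieval. The names `Workload`, `Disk`, and `disks_needed` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    messages: int                # distinct messages stored (hypothetical)
    recipients_per_message: int  # mailboxes that reference each message
    message_size_kb: int         # average message size in KB
    retrievals_per_sec: int      # client message reads per second


@dataclass
class Disk:
    capacity_gb: int  # usable capacity per disk
    iops: int         # random IOs per second one disk can sustain


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def disks_needed(w: Workload, d: Disk, single_instance: bool) -> dict:
    if single_instance:
        # One shared copy; assumed: each retrieval reads the pointer, then the content.
        stored_kb = w.messages * w.message_size_kb
        reads_per_retrieval = 2
    else:
        # One copy per recipient; each retrieval reads the content directly.
        stored_kb = w.messages * w.recipients_per_message * w.message_size_kb
        reads_per_retrieval = 1

    disk_capacity_kb = d.capacity_gb * 1024 * 1024
    required_iops = w.retrievals_per_sec * reads_per_retrieval

    by_capacity = ceil_div(stored_kb, disk_capacity_kb)
    by_iops = ceil_div(required_iops, d.iops)
    # You must buy enough disks to satisfy whichever constraint is tighter.
    return {"by_capacity": by_capacity, "by_iops": by_iops,
            "disks": max(by_capacity, by_iops)}


if __name__ == "__main__":
    w = Workload(messages=1_000_000, recipients_per_message=4,
                 message_size_kb=100, retrievals_per_sec=2000)
    d = Disk(capacity_gb=500, iops=150)
    print("single instance:", disks_needed(w, d, single_instance=True))
    print("multiple copies:", disks_needed(w, d, single_instance=False))
```

With these made-up figures both layouts fit on one disk by capacity, but the single-instance layout needs 27 disks to serve the reads versus 14 for the copied layout. That is the sense in which you run out of IOPS before you run out of space. Caching and a realistic read/write mix would change the numbers considerably.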
We’ve focused a lot on improving scale over the years, but the number one factor in that scale, for Exchange’s usage profile, has been disk IO. That is why we focused so heavily on it in Exchange 2007, reducing disk IO by as much as 70% when you follow the configuration guidelines. We’re like a geeky band of seven dwarfs, working and humming: “IO, IO, increase the scale we go…” For years I’ve heard people outside the company say “Exchange can’t scale because it’s on JET; Exchange needs to move to SQL,” which is just patently ridiculous. The version of JET that Exchange uses has been carefully tuned over the years for Exchange’s usage profile, which involves random IO and a comparatively high proportion of writes, since mail is constantly flowing in and out of the system. Compare this to the IO profiles of many apps built on SQL, which have a much higher ratio of reads to writes.
Of course, I’m simplifying. As Dare says in the Twitter debate, there are far more issues at play here. Thankfully, there are a bunch of brilliant folks working on this area in Exchange who do the heavy lifting of architecting the system properly so that customers can enjoy the benefits.