Log Buffer #118: a Carnival of the Vanities for the DBA


UPDATED 16 Oct 2008 to fix a broken link. UPDATED 10 Oct 2008 to reflect Giuseppe's comment.

 

Welcome to installation #118 of Log Buffer, the weekly review of database blogs. Today, we’re gonna party like it’s 1999!

For those of you who might be new to this little corner of the Internets, my name is Ward Pond, and I’m a Technology Architect for the Microsoft Services Managed Solutions SQL Server Center of Excellence. My responsibilities include development of best practices and training materials around OLTP design, development, and operations, as well as Business Intelligence. Before taking this position, I was a database application designer/developer and support person for 28 years, with most of that spent at Microsoft and a Fortune 50 petrochemical company.

Remember Vulcan? I do.

With apologies to Prince, why am I partying like it’s 1999 in this post? Because in June of that year, I surrendered my platform neutrality and went to work in Redmond. Since the Log Buffer series is database-neutral, I jumped at Dave Edwards’ invitation to contribute an entry – it offers a great motivation to revisit platforms I haven’t used extensively since... well... the last millennium.

So, with the introductions out of the way, let’s party...

Over at The Pythian Group (Log Buffer’s home base), Grégory Guillou wonders how fast your data can be with Oracle’s Exadata. He offers a SQLPA script to estimate its impact on a sample query. It’s very interesting stuff, but it should be noted that Grégory ends his article by characterizing the predicted 100% improvement as “a bit optimistic” (a workload metric changing from 80388096 to 0 certainly qualifies as “a bit optimistic” in my book) and remarking that he only needs USD$140,000 for an Exadata box on which to test his theory. Sorry, Grégory, but I’m tapped out at the moment...

Gerwin Hendriksen notes that Exadata isn’t for everybody, and offers a good list of reasons why, including high cost and niche applicability. He even has pictures of the machines from Oracle OpenWorld at the Moscone Center, which he nearly ran afoul of a security guard to get. Later in the week, he points out some issues with the Oracle data cache.

Giuseppe Maxia, the Data Charmer, has a couple of interesting MySQL-related posts this week. In one, he searches for responsible uses for Federated tables; in the other, he discusses the new Event Scheduler in MySQL 5.1.

At the Data Charmer’s other lair, this one over at Sun, he laments the current dearth of “daring” proposals for the pending MySQL Conference and Expo. One of the comments to this post notes, “a big part of that is probably due to the delays in 5.1 (and thus 6.0 and 7.0). Some of the really cool things that came out 2 conferences ago are not out yet because they all depend on 5.1.” This scenario is certainly a major challenge to a community development model, where great ideas can languish as proposals for a long time.

Slashdot notes the resignation of MySQL’s co-founder from Sun. Cause and effect? Also, read the comments on this post for a primer on the pitfalls of betting corporate data assets on an open source platform.

Matt Reid has a parallel mysqldump backup script that he’d like your help testing.

Denis Gobo has an interesting discussion regarding alternatives for storing IP addresses. If storage space and programmability are key concerns for you, there are some great insights here. The code is all T-SQL, but the techniques should apply cross-platform. Denis also has an interview with Denny Cherry this week.
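For readers who haven’t seen the integer approach before, here’s a minimal sketch in Python (not Denis’s T-SQL) of one common alternative: packing an IPv4 dotted-quad string into a single 32-bit integer, which takes 4 bytes instead of the up to 15 characters a varchar(15) column would hold. It assumes IPv4 only, and the function names are my own hypothetical ones. Because SQL Server’s INT is signed, the sketch also shows an assumed offset of 2^31 that keeps every address within INT range while preserving sort order for range queries.

```python
def ipv4_to_int(address: str) -> int:
    """Pack a dotted-quad IPv4 string into an unsigned 32-bit integer."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        if not octet.isdigit() or not 0 <= int(octet) <= 255:
            raise ValueError(f"bad octet {octet!r} in {address!r}")
        # Shift the bits accumulated so far left by one byte and append this octet.
        value = (value << 8) | int(octet)
    return value


def int_to_ipv4(value: int) -> str:
    """Unpack an unsigned 32-bit integer back into dotted-quad form."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"value out of 32-bit range: {value}")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


# Assumption: the target column is a signed 32-bit INT, so offset by 2**31.
# This keeps 0.0.0.0 .. 255.255.255.255 in ascending order, which matters for range scans.
def to_signed_int32(value: int) -> int:
    return value - 2**31


def from_signed_int32(stored: int) -> int:
    return stored + 2**31


if __name__ == "__main__":
    packed = ipv4_to_int("192.168.1.10")
    print(packed, int_to_ipv4(packed), to_signed_int32(packed))
```

Python’s standard ipaddress module does the same packing (int(ipaddress.IPv4Address("192.168.1.10"))), so in practice you’d likely lean on it; the hand-rolled version just makes the bit-shifting visible.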

Euan Garden is one of those people who makes me smarter whenever I’m lucky enough to share a room with him. This week, he’s got a pointer to the new high-performance SSIS loaders for Oracle and Teradata. If you’re an Oracle or Teradata DBA and haven’t checked out SSIS yet, this news provides an ideal motivation to do so. Most people who try SSIS are hooked...

The PSS SQL Server Support Engineers report that Microsoft has updated support policies for SQL Server installations running in hardware virtualization environments. Mark Pohto, my former manager in the SQL CoE (a great guy and a fellow musician), is now leading a huge virtualization effort within Microsoft, and there’s been a steady wave of virtualization-related emails moving through my inbox lately. Virtualization is going to be a key component in managing TCO and overhead; you’d be well served to get out in front of this wave if your organization is at all concerned with controlling costs (and these days, who isn’t?).

Mosha Pasumansky checks in with two posts and a wrap-up from this year’s Microsoft BI conference. Project Gemini is the big news, of course, but there is some very cool work going on in the Excel and SharePoint spaces as well.

By the way, if you're looking for a great SharePoint blog, check out my CoE colleague Mike Watson's entry on blogs.msdn.com. Mike's also a great guy, also a fellow musician, and one of the smartest SharePoint guys on the planet.

Paul Nielsen points out the folly of “denormalizing for performance” in his latest post. This is a topic very close to my heart, as I taught data modeling to four groups of MCA: Database (“SQL Ranger”) candidates. Full normalization was a surprisingly hard (but ultimately successful) sell in each of these groups. Getting these design skills out into the world is a very important prerequisite to unleashing the full power of our technology. For the record, I agree with Paul’s contention that a typical fully normalized schema (properly indexed) will outperform a denormalized schema. However, there are enough poorly designed schemas loose in the world that we’re never going to escape the need to deal with them.

David Reed, whom I first met when he came through SQL Ranger training, has a post out on the SQL Heroes blog touting the latest finalist in the community coding competition. This week’s contender is a highly granular CLR-based time measurement facility written by Gorm Braarvig of Norway. Good luck, Gorm! Including Gorm’s project, there are currently five finalists up on the site.

Brent Ozar has a very funny post linking to Robert Cringely’s prediction that cloud computing will “kill” databases. Brent struggles with both the prediction and a modicum of self-loathing for exposing himself to it at all (“Note to self: it’s my own fault for reading Cringely’s column in the first place. He’s like the Enquirer of IT.”). The picture of Brent in “Hulk Hands” (one of my younger grandson’s favorite toys) is, for me, just the cherry on top of a fine presentation. Brent has now been added to my list of people I need to meet.

Tony Rogerson’s latest rambling on SQL Server (it’s the name of his blog, not a value judgment on my part) involves programmatically escaping double-quote characters out of source files for bulk inserts. It’s a very simple and useful technique if you need it, although it should be noted that Tony himself suspects a better approach may be available.
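Tony’s exact technique isn’t reproduced here. As a hedged illustration of the general idea, this Python sketch applies one common convention (RFC 4180 style): wrap each field in double quotes and double any embedded ones. It assumes your bulk-load tool understands that convention, which is worth confirming in the tool’s own documentation before you rely on it; the function names are hypothetical.

```python
import csv
import io


def quote_field(value: str) -> str:
    """Wrap a field in double quotes, doubling any embedded double quotes (RFC 4180 style)."""
    return '"' + value.replace('"', '""') + '"'


def to_csv_line(fields: list[str]) -> str:
    """Join raw (unquoted) field values into one fully quoted CSV line."""
    return ",".join(quote_field(field) for field in fields)


if __name__ == "__main__":
    row = ['Acme "Best" Widgets', "12", 'He said "hi", then left']
    line = to_csv_line(row)
    print(line)
    # Round-trip through the standard csv module to confirm the escaping parses back.
    parsed = next(csv.reader(io.StringIO(line)))
    assert parsed == row
```

The round-trip check is the useful habit here: whatever escaping scheme you choose, parse the output back with the same rules the loader will use before pointing a bulk insert at it.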

Arnie Rowland’s latest “Rambling of a Harried Technogeek” ponders the applicability of SQL Server to an EAV scenario. Arnie’s concern that the approach may be detrimentally elegant will be familiar to long-time visitors to this space. One of the lessons our business teaches us is that the ability to do a thing doesn’t necessarily make it a good idea; indeed, Arnie’s plaintive question at the end of his post could inspire a volume of books: “when does something that is elegant from one perspective become burdensome and inelegant from another?” My answer would start with, “when it’s difficult for the average geek to maintain,” but that’s clearly just the tip of a very slippery iceberg.
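To make Arnie’s trade-off concrete, here’s a toy Python sketch of the generic entity-attribute-value shape (my illustration, not Arnie’s schema; the data and function names are hypothetical). Adding a new attribute needs no schema change, which is the elegant part. The burdensome part shows up as soon as you want a normal-looking row back, because every read becomes a pivot and every value has lost its type.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity_id, attribute, value). Every value is stored as text,
# so type information (e.g. that weight_kg is numeric) lives only in application code.
EAV_ROWS = [
    (1, "name", "Widget"),
    (1, "color", "red"),
    (2, "name", "Gadget"),
    (2, "weight_kg", "0.5"),  # a brand-new attribute: no ALTER TABLE needed
]


def pivot(rows):
    """Reassemble EAV triples into one dict per entity (the read-side cost of EAV)."""
    entities = defaultdict(dict)
    for entity_id, attribute, value in rows:
        entities[entity_id][attribute] = value
    return dict(entities)


if __name__ == "__main__":
    for entity_id, attributes in sorted(pivot(EAV_ROWS).items()):
        print(entity_id, attributes)
```

Note that nothing in this shape stops a row like (2, "weight_kg", "heavy") from going in; a conventional numeric column with a CHECK constraint would reject it. That’s one small, concrete answer to Arnie’s closing question.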

Craig Freedman’s blog, which I last commended to you for its outstanding 2006 series on JOIN semantics, has a marvelous post on random prefetching. One of the hallmarks of Craig’s writing is that he’s always sensitive to the fact that a good idea in one scenario can be a bad idea in another, which leads to incredibly helpful insights such as, “For systems with many hard drives, random prefetching can dramatically improve performance. However, prefetching can adversely affect concurrency as I explained in this post.”

Microsoft MVP Uri Dimant’s Dimant DataBase Solutions blog has a great reminder about limiting the scope of index rebuilds, and some code to help you do that.

My admiration for Kim Tripp knows no bounds; as I’ve noted previously, she gave me some great advice several years ago which was instrumental in moving my career onto its current path. Her latest blog post is a link to a RunAs Radio interview titled Kim Tripp Indexes Everything. She disavows the title, but not the concept; if you’ve had the privilege of sitting in one of her query tuning classes, you know how profoundly a sound indexing strategy can impact SQL Server query performance. In fact, I’ll be presenting on this very topic at TechEd Developers EMEA in November.

Finally, the world lost one of its most distinctive and iconic voices with the recent passing of George Carlin. Check out the intro video at his website (don’t click “skip intro”!), which for some reason brings to mind the boot-up sequence of a circa-1999 laptop. Perhaps you’ll also be inspired by his final instructions, which appear when the video completes. I know I was. An excerpt:

I prefer a private gathering at my home… The exact nature of this gathering shall be determined by my surviving family... It should be extremely informal, they should play rhythm and blues music, and they should laugh a lot.

Amen to that.

Thus concludes my initial foray into the Log Buffer. Hopefully, all of these structures have successfully flushed to disk for you, and your transactions are in a consistent state. With that, I’ll pass the torch back to Dave Edwards at Pythian, with my sincere thanks to him for the invitation to write, and to you for your time in reading.

                -wp