I spent last Thursday at Black Marble’s Architect Forum and my slot was on SQL, NoSQL, some SQL. It went down very well but I used OneNote on my tablet not PowerPoint and like all modern art my drawing needs a bit of explaining..
Databases were originally used to store transactions which are highly structured, lots of little fields grouped together in tables which in turn had hard relationships plus an environment where a the transaction was either committed in its entirety or it was rolled back. These simple transactions got more complex and other kinds of data got associated with the transaction, the contract, a picture of the product and so on. Nowadays all sorts of stuff get thrown into databases like SQL Server, for example all the content from a SharePoint site. SQL Server has evolved to make this a lot easier and a lot more useful with technologies like FileStream, and the new FileTable in SQL Server 2012 so that the contents can be used as they are still individual files while remaining part of a SQL Server database (in a special file group) . The point of storing all this data is to be able to retrieve it, and if those unstructured files have text in them then they can be included in a full text index so the contents can be searched as well as the metadata about the file.
The next thing to consider is how are we using the database:
I have already mentioned transactions and content management, and two other key uses are business intelligence and as a backend for web sites. This extended usage doesn’t really cause any problems in itself until we consider the users:
The Developer, don’t like SQL and this phobia started the No SQL movement which has changed to be Not Only SQL so (NOSQL). You don’t have to use SQL to develop with there is entity framework and link to bridge the gap between the relational world I know and the object world of the modern developer. You might need SQL to create objects or modify them and of course keep that under source control, however the SQL Server Developer Tools mean that it’s not necessary to get your hands dirty if you don’t want to. Another problem for corporate developers is the need to hand of the project top the IT department for deployment as developers are rarely allowed access to a production environment. so there’s the Data Tier Application for that which wraps up all the database code for deployment by the DBA.
The DBA. The challenge of SQL Server for the modern DBA is that there can be quite a lot of databases which are only getting larger and many businesses don’t have a dedicated DBA anyway. So there’s extensive tooling for managing lots of database servers using Policies (like there are policies in Windows) and PowerShell support which should be familiar to a part time DBA and allows scripts to act on SQL Server as part of a bigger script to provision virtual machines create logins in Active Directory etc.
The Information Workers (IW) . This is Microsoft speak for the end user and they should not have to learn SQL. They should also be insulated from having to know too much about the detailed data structure of the systems they want to analyse and report from, so in many BI solutions there’s a semantic layer to allow users to drag and drop data without understanding the SQL or the relationships. Having said that the power users often do have some of this knowledge and they do need to understand how to join sets of data together. In PowerPivot 2012 for Excel these users actually create an entity framework which they can then share with their less technical colleagues either via SharePoint or as a BI Semantic Model deployed from analysis services in SQL Server.
There’s a bunch of technical developments which have helped to keep databases in general and specifically SQL Server relevant and fast while data volumes continue to explode:
Column based indexing rather than storing data in blocks of rows gives great read performance, the indexes can be seriously compressed which then means its possible to cache them in memory for further performance increases. solid state storage just make things run faster with no need to change the design of a database but their expensive and possibly less resilient so a good first step is to use them for caching e.g. to put tempdb on them while the actual database still resides on a SAN behind a cluster.
I have covered off FileTable above and spatial is actually a type of structured data which I should have drawn on the left, however it worth noting again that this isn’t just about storing that data it’s about having a rich set of functions to query the data and fast indexing to ensure the queries run quickly .
The current hot topic is Big Data, the ability to store anything and everything without worrying to much about structure. What’s important here as with any data store is the ability to search it, analyse it and make decision from it. In an agile world this needs to be done quickly and by the business not the techies who understand the complexities of that data. So while it’s nice to see that Hadoop is going to run on Azure to store mountains of data what interests me is the tooling that will be put into Excel to make that store of data useful and directly accessible to the user.
Another problem for a database is velocity; the ability to be able to react quickly to incoming data and make decisions from it. Stream Insight is actually not really anything to with SQL Server in that you don’t need a database to use it but it is part of the product, and there are occasions where the end product cold be a feed into a database so I think I can include it here. It is a set of classes to proceeds feeds of data near real time (i.e. sub one second) by aggregating it and raising events off of it to other systems. It is a sort of real time map reduce and uses LINQ so no actual SQL is required to code it either.
Putting all this together there we have Not Only SQL Server (NO SQL Server ?) databases an ecosystem around the actual database engine which is still part of the platform in SQL Server, where you can elect how to store and process data using familiar tools in unfamiliar ways, to create information and insight. Most of the stuff I have mentioned here is doable in the current version (SQL Server 2008 R2) however File Table is new for SQL Server 2012 and PowerPivot has been enhanced to work with the new BI semantic model so you’ll need to look at the current beta if that sounds interesting