SQL Server 2008 Client Design

Here are some notes on "SQL Server 2008 Client Design" I took while attending an advanced class on SQL Server taught by Adam Machanic (https://sqlblog.com/blogs/adam_machanic/default.aspx).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it in a test environment. Please keep in mind that some of these notes will be hard to follow without the right prerequisite knowledge and context. Reading the post from top to bottom will help.


Talking about code

  • ADO.NET SqlClient, SQLXML and SQLCLR
  • Could you write a small ADO.NET application in a few hours?
  • Could you write a few queries and return tables?
  • Is XML a developer technology? Yes, but also an admin one...
  • Need to understand what a developer feels like
  • Most of the time waiting for the SQL Server...

Interfaces, ADO.NET

  • Interface for an application to get to the data
  • Chunky vs Chatty - Big vs Small interfaces
  • Impacts performance - Each has its place

Chunky Interfaces

  • Extreme case: SELECT with no WHERE clause
  • Looks like: GetAllProducts()
  • Get lots of data in a single call
  • Large network hit, locks held longer, longer load time
  • Fewer batches, potentially less overall I/O and network traffic

Chatty Interface

  • Extreme case: Always SELECT ... WHERE Column=value
  • Looks like: GetProducts(int[])
  • Get just the exact data you need at that time
  • Less network traffic and shorter-held locks per call; more seeks, more batches
  • Potentially more overall I/O and network traffic
  • Sometimes unnecessary calls over time

Finding a balance

  • How much data does your app need to work at once?
  • Is a redesign possible?
  • From GetAllProducts() to GetProducts(int[]) – see the sketch below
  • From single updates to bulk/batch updates
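
To make the two shapes concrete, here is a small sketch of what the data-access interface might look like. The type and member names are placeholders of mine, not something from the class:

    using System.Collections.Generic;

    public class Product            // placeholder entity
    {
        public int ProductID { get; set; }
        public string Name { get; set; }
    }

    public interface IProductRepository
    {
        // Chunky: one big call, everything comes back at once
        IList<Product> GetAllProducts();

        // Chattier: ask only for the products needed right now
        IList<Product> GetProducts(int[] productIds);
    }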

Demo

  • SQL Profiler running on the SQL side
  • RowGetter demo application on client - Queries on SalesDetails table
  • App options for MinSalesID/MaxSalesID, Chunky/Chatty, etc...
  • Comparing a batch query (chunky) vs. one request per row (chatty)
  • SQL can also execute the batch as chatty/chunky/semi-chatty

General Rules

  • Think also about the network (not only CPU and I/O).
  • The queries are sometimes larger than the results. Think outbound and inbound.
  • Prioritize performance first, then maintainability, then productivity
  • If it performs and is hard to maintain, they'll use it and curse you
  • If it does not perform, they'll not even use it
  • Can you fix a performance problem in the app later? It won't be cheap...

Service Orientated Architecture (SOA)

  • Boundaries are explicit
  • Services are autonomous
  • Services share schema and contract, not class
  • Compatibility is based upon policy

Mapping Stored Procedures to SOA

  • We've been doing this all along with stored procedures
  • Explicit - Access only via SP
  • Autonomy - As long as the SP interfaces don't change, the database can be deployed independently
  • Contract - SPs are a black-box interface to the DB
  • Policies - Access control to SPs
  • You don't necessarily need queues and other things to get to SOA
  • Don't over think the problem and waste money

SQLClient Classes

  • SQLConnection - connection to database
  • SQLCommand - command or stored procedure
  • SQLDataReader - connected, tabular "firehose"
  • DataTable - disconnected, tabular result
  • DataSet - collection of DataTables

SQLConnection

  • Exploit connection pool as much as possible
  • Golden rule: open late, close early – or, better worded, acquire late, release early
  • This is important - worth making your code “less maintainable” to achieve it
  • The pool's scope is the app domain
  • 16KB for each connection
  • Connection pooling – same string, same context, connections are reused
  • Connection pooling is not a SQL Server feature, it’s part of ADO.NET

SQLCommand

  • Parameterize your queries
  • Cache SQLCommand objects when possible
  • “Non-result” methods – could be a lot faster
  • ExecuteScalar – gets the first column of the first row; better than a DataReader for single values (use TOP 1 so the server doesn't send extra rows)
  • ExecuteNonQuery – No result sets, but can use output parameters (could be faster - test it!)
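
A quick sketch of the two “non-result” styles (the table name is just an AdventureWorks-style example, so adjust to taste):

    using System.Data;
    using System.Data.SqlClient;

    static class CommandExamples
    {
        // ExecuteScalar: only the first column of the first row comes back
        public static int GetOrderCount(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("SELECT COUNT(*) FROM Sales.SalesOrderHeader", conn))
            {
                conn.Open();
                return (int)cmd.ExecuteScalar();
            }
        }

        // ExecuteNonQuery: no result set at all; the value travels back in an output parameter
        public static int GetOrderCountViaOutputParam(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "SELECT @Count = COUNT(*) FROM Sales.SalesOrderHeader", conn))
            {
                SqlParameter count = cmd.Parameters.Add("@Count", SqlDbType.Int);
                count.Direction = ParameterDirection.Output;
                conn.Open();
                cmd.ExecuteNonQuery();
                return (int)count.Value;
            }
        }
    }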

Prepared command

  • Use parameters in your commands, call command.Prepare(), and reuse the command (see the sketch below)
  • ADO.NET will use sp_prepare, sp_execute – first sends text, then sends handle and parameters
  • Fewer bytes over the wire if you’re calling this a number of times
  • This does not apply to stored procedures, since they’re already handled in a similar fashion
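
Here is a minimal sketch of a prepared, reused command. The Production.Product table is just an AdventureWorks-style example:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    static class PreparedCommandDemo
    {
        public static void PrintProductNames(string connectionString, int[] productIds)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "SELECT Name FROM Production.Product WHERE ProductID = @ProductID", conn))
            {
                cmd.Parameters.Add("@ProductID", SqlDbType.Int);
                conn.Open();
                cmd.Prepare();   // the text goes over the wire once; later calls send the handle and parameters

                foreach (int id in productIds)
                {
                    cmd.Parameters["@ProductID"].Value = id;
                    using (SqlDataReader reader = cmd.ExecuteReader())
                    {
                        while (reader.Read())
                            Console.WriteLine(reader.GetString(0));
                    }
                }
            }
        }
    }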

SQLDataReader

  • Fastest method to get data; forward only, connected while reading the data. Very fast, but inflexible
  • Bad pattern – the DataReader reads a row, calls a web service, then moves to the next one (the connection stays open the whole time)
  • Buffering of data for DataReader could be on the server or on the client, you don’t know.
  • Release your connections explicitly and early. Or else you end up with orphaned connections.
  • Use batches and .NextResult to get to the other result sets. This is forward only.
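
A sketch of reading two result sets from a single batch with .NextResult (table names assumed):

    using System;
    using System.Data.SqlClient;

    static class BatchDemo
    {
        public static void PrintBatchCounts(string connectionString)
        {
            const string batch =
                "SELECT COUNT(*) FROM Sales.SalesOrderHeader; " +
                "SELECT COUNT(*) FROM Sales.SalesOrderDetail;";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(batch, conn))
            {
                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    do
                    {
                        while (reader.Read())
                            Console.WriteLine(reader.GetInt32(0));
                    }
                    while (reader.NextResult());   // forward only: no going back to a previous result set
                }
            }
        }
    }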

Closing the SQLDataReader

  • The garbage collector will eventually release readers once they go out of scope, but it can take a while.
  • Use .ExecuteReader(CommandBehavior.CloseConnection) to close the connection when the reader is closed.
  • Use .ExecuteReader(CommandBehavior.SequentialAccess) to read one column at a time (good for LOBs).
  • Calling .Close drains all the remaining rows in the result set (for instance, a report interrupted by the user).
  • Use .Cancel/.Dispose instead of .Close if you don’t care about the rest of the rows and the output parameters.
  • You can monitor attention events in SQL Profiler to see if app is using .Cancel/.Dispose
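
A sketch of handing a reader back to a caller with CommandBehavior.CloseConnection, so that disposing the reader also releases the connection (names are placeholders):

    using System.Data;
    using System.Data.SqlClient;

    static class ReaderFactory
    {
        public static SqlDataReader GetProductReader(string connectionString)
        {
            var conn = new SqlConnection(connectionString);
            var cmd = new SqlCommand("SELECT ProductID, Name FROM Production.Product", conn);
            conn.Open();
            // When the caller closes/disposes the reader, the connection is closed as well
            return cmd.ExecuteReader(CommandBehavior.CloseConnection);
        }
    }

The caller still needs to dispose the reader (a using block is fine); with CloseConnection, that is the moment the connection goes back to the pool.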

DataTable

  • Cached, in memory access to all of the rows you requested.
  • Disconnected access, can be cached, passed around, read backwards, etc.
  • Sacrifice performance for ultimate flexibility.
  • The DataTable is first filled in memory using a DataAdapter, which uses a DataReader.

DataReader is better than DataTable

  • Forward-only
  • Populating lists, hydrating objects
  • Too much data to fit in memory

DataTable is better than DataReader

  • Need caching
  • If you need to do Sorting, Filtering, DataViews
  • If you need to bind to a Grid or Chart control
  • If you have related sets of data (DataSet – also see MARS)
  • Simplification of disconnected updates (though you could code it more efficiently)
  • Complex processing per row (processing will scale better)

MARS

GetOrdinal

  • Column name indexer calls GetOrdinal under the covers
  • You can optimize by caching the ordinal outside the loop: int columnX = reader.GetOrdinal("ColumnX")
  • This can make a huge difference in client performance in long loops
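
A sketch of hoisting the GetOrdinal calls out of the read loop (table and column names assumed):

    using System;
    using System.Data.SqlClient;

    static class OrdinalDemo
    {
        public static void PrintProducts(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("SELECT ProductID, Name FROM Production.Product", conn))
            {
                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    // One name lookup per column, instead of one per column per row
                    int idOrdinal = reader.GetOrdinal("ProductID");
                    int nameOrdinal = reader.GetOrdinal("Name");

                    while (reader.Read())
                        Console.WriteLine(reader.GetInt32(idOrdinal) + ": " + reader.GetString(nameOrdinal));
                }
            }
        }
    }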

ADO.NET is slower than SSMS

  • Queries can perform more slowly when run from ADO.NET than when run from SSMS
  • Main culprit is usually the ARITHABORT setting (affects things like indexed views and indexes on computed columns)
  • Look at sys.dm_exec_plan_attributes WHERE (attribute = 'set options') and (value & 64 = 0)
  • See https://msdn.microsoft.com/en-us/library/ms189472.aspx for a sample query
  • Fix by changing the server-wide defaults with sp_configure 'user options', or by setting the option from the client (see the sketch below)
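
One client-side workaround, assuming the difference really is ARITHABORT: explicitly match SSMS's default right after opening the connection (pooled connections generally have their SET options reset when reused, so do it after each Open):

    using System.Data.SqlClient;

    static class ArithAbortDemo
    {
        public static void RunWithSsmsLikeOptions(string connectionString, string query)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                using (var set = new SqlCommand("SET ARITHABORT ON;", conn))
                    set.ExecuteNonQuery();

                using (var cmd = new SqlCommand(query, conn))
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read()) { /* ... */ }
                }
            }
        }
    }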

Parameterization vs. Concatenation

  • Security – prevents SQL injection
  • Performance – Fewer plans in cache, less compilation
  • Consider using “force parameterization” option in database
  • In the app, replace cmd.CommandText = "SELECT… WHERE ID=" + value with cmd.Parameters.Add("@ID", SqlDbType.Int).Value = value;
  • Specify the data size explicitly; otherwise it is inferred from the actual value and you get a separate plan per size
  • To overcome that, use cmd.Parameters.Add("@ID", SqlDbType.VarChar, 4000).Value = stringValue; (see the sketch after this list)
  • Do not use Parameters.AddWithValue, since it does not specify size
  • Careful with “parameter sniffing” – SQL can optimize the plan for the first value used
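
A sketch of the parameterized version (the table and column names are only examples):

    using System;
    using System.Data;
    using System.Data.SqlClient;

    static class ParameterDemo
    {
        public static void PrintProducts(string connectionString, int id, string color)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "SELECT Name FROM Production.Product WHERE ProductID = @ID OR Color = @Color", conn))
            {
                cmd.Parameters.Add("@ID", SqlDbType.Int).Value = id;
                // Explicit size: every call compiles against varchar(15), so the plan is reused
                cmd.Parameters.Add("@Color", SqlDbType.VarChar, 15).Value = color;

                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine(reader.GetString(0));
                }
            }
        }
    }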

Connection Pooling

  • New connections are expensive. Pooling caches open connections. Acquire late, release early.
  • Eliminates much of the cost; ideally there is always a connection available in the pool.
  • Pooling is controlled by unique connection string and user identity.
  • Application roles will create a security context that cannot be reset.
  • If your security model uses per-user authentication, it will likely disable pooling.
  • Avoid switching databases via the “initial catalog”/“database” connection string option; connect to one database and use three-part naming so connections can share a pool.
  • See https://msdn.microsoft.com/en-us/library/8xx3tyca.aspx

SQLConnectionStringBuilder

  • SQLConnectionStringBuilder knows how to build connection strings.
  • It’s great for setting just the properties you need.
  • It can also read an existing connection string in and change just a few properties (see the sketch below).
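
A small sketch (the property values are arbitrary):

    using System.Data.SqlClient;

    static class ConnectionStringDemo
    {
        public static string Build(string existingConnectionString)
        {
            // Start from an existing string, then change just the pieces you care about
            var builder = new SqlConnectionStringBuilder(existingConnectionString)
            {
                ApplicationName = "RowGetter",
                MaxPoolSize = 200,
                MinPoolSize = 5
            };

            return builder.ConnectionString;
        }
    }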

Closing connection

  • Understand IDisposable. Put a “using” block or “try/finally” around the block that uses the connection.
  • Acquire, process and release in the same method (a DataAdapter’s Fill does the open/close for you).
  • Remember ExecuteReader(CommandBehavior.CloseConnection).
  • Avoid connection leaks. Don’t leave cleanup to the garbage collector…
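
A sketch of the acquire-process-release pattern kept in one method (the query is a placeholder):

    using System.Data;
    using System.Data.SqlClient;

    static class DataAccess
    {
        public static DataTable GetProducts(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("SELECT ProductID, Name FROM Production.Product", conn))
            using (var adapter = new SqlDataAdapter(cmd))
            {
                var table = new DataTable();
                adapter.Fill(table);   // Fill opens and closes the connection for you if it was closed
                return table;
            }   // Dispose returns the connection to the pool even if an exception was thrown
        }
    }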

Pool Size

  • Default Maximum is 100. Consider increasing the maximum for busy apps.
  • Default Minimum is 0. Consider increasing the minimum for a faster warm-up.
  • Both settings are controlled by the connection string: “Max Pool Size” and “Min Pool Size”
  • This is a client-side setting. Monitor using Performance Monitor in the client.
  • Different for web farm, hosted environment, desktop client, etc.
  • See https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring.aspx

Monitoring

  • .NET data provider performance counters:
  • - Hard connects/disconnects per second
  • - Soft connects/disconnects per second
  • - Reclaimed connections – should be 0 (a non-zero value means the garbage collector had to clean up leaked connections)
  • - Active pools – should not keep climbing
  • SQL Trace - New behavior for Audit Login and Audit Logout
  • - Hard (nonpooled) logins/logouts – EventSubClass is 1
  • - Soft (pooled) logins/logouts – EventSubClass is 2

Data Caching

  • Proper use of caching can be key to application scalability
  • Consider granularity (user/server), location (client or server), update frequency, expiration, form (data/object)

Query Notification

  • The application tells SQL Server to watch the query and send a notification when the underlying data changes
  • Case: Run “SELECT … FROM Orders WHERE Customer=10” with query notification.
  • Case: If new orders for customer 10 show up, SQL will send notification.
  • Queries must meet indexed-view-style restrictions; notifications are delivered via Service Broker. We don’t have to worry so much about when to update the cache.
  • See https://msdn.microsoft.com/en-us/library/ms175110.aspx
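
On the ADO.NET side this surfaces as SqlDependency. A rough sketch (dbo.Orders and its columns are made-up names, and the query has to follow the notification rules: explicit column list, two-part table names, and so on):

    using System;
    using System.Data;
    using System.Data.SqlClient;

    static class NotificationDemo
    {
        public static void LoadOrdersWithNotification(string connectionString)
        {
            SqlDependency.Start(connectionString);   // once per app domain

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @CustomerID", conn))
            {
                cmd.Parameters.Add("@CustomerID", SqlDbType.Int).Value = 10;

                var dependency = new SqlDependency(cmd);
                dependency.OnChange += (sender, e) =>
                {
                    // Fired when the watched data changes; typically invalidate and reload the cache here
                    Console.WriteLine("Cache invalidated: " + e.Info);
                };

                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // ... populate the cache ...
                    }
                }
            }
        }
    }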

Query Notification Issues

  • Indexed views are not free, there’s an overhead.
  • Cache updates can be expensive. Strike a balance.
  • Consider splitting large cache into several smaller caches, with more restrictive queries.
  • Mark cache as “shareable” if possible. Avoid per-user notification.
  • Case: application-global data that is only occasionally updated.
  • Consider pairing with Change Tracking to reduce potential issues.

SQLBulkCopy

  • .NET class that exposes Bulk Copy API
  • Sends rows to SQL, minimally-logged inserts
  • Input via DataTable or IDataReader
  • More efficient than multiple INSERT statements, especially when minimally logged
  • Still faster than row-by-row inserts even under the FULL recovery model (where the inserts are not minimally logged)
  • Could be used for Bulk updates as well
  • Consider other options like bcp and SSIS, but this is definitely a good one
  • Demo: 10,000 rows using Row-by-row inserts and Bulk Insert
  • Row-by-row inserts take 16.7s, while the Bulk Insert takes less than 1s

SQLBulkCopy - How
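
Roughly, the typical usage looks like the sketch below (the destination table name and the source DataTable are assumptions of mine):

    using System;
    using System.Data;
    using System.Data.SqlClient;

    static class BulkCopyDemo
    {
        public static void BulkLoad(string connectionString, DataTable sourceRows)
        {
            using (var bulkCopy = new SqlBulkCopy(connectionString))
            {
                bulkCopy.DestinationTableName = "dbo.SalesStaging";
                bulkCopy.BatchSize = 10000;      // the SQLCAT "sweet spot" mentioned in the Details below
                bulkCopy.NotifyAfter = 1000;     // progress callback granularity; don't make it too chatty

                bulkCopy.SqlRowsCopied += (sender, e) =>
                    Console.WriteLine(e.RowsCopied + " rows copied so far");

                // The source can also be any IDataReader instead of a DataTable
                bulkCopy.WriteToServer(sourceRows);
            }
        }
    }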

SQLBulkCopy - Details

  • Column mapping overrides are supported via the ColumnMappings property
  • Event notification to update UI (every 1000 rows, for instance)
  • Don’t notify too frequently or it will slow things down
  • Tune batch size. SQLCAT found that 10,000 is a sweet spot for ETL
  • See https://msdn.microsoft.com/en-us/library/dd425070.aspx

Table-valued parameters

  • Create a named type that is a table. New in SQL Server 2008; available from ADO.NET 3.5
  • Syntax: CREATE TYPE name AS TABLE…
  • It’s a table entirely in memory that can be used to pass tables to stored procedures
  • Table types support CHECK, DEFAULT, PRIMARY KEY and UNIQUE constraints
  • Must specify READONLY option in stored procedure parameter (cannot be OUTPUT)

TVP – How

  • Create stored procedures with table type parameter
  • Set up parameters collection as usual
  • Set SqlDbType = SqlDbType.Structured
  • Argument typed as DataTable, DbDataReader or IEnumerable<SqlDataRecord>
  • Recommended in Books Online for up to 1,000 rows (SQLBulkCopy for higher)
  • It uses the bulk API, but SQL Profiler shows the rows as individual inserts (so do not replay the trace)
  • See https://msdn.microsoft.com/en-us/library/bb675163.aspx
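
A hedged end-to-end sketch; the type name (dbo.IntList), the procedure (dbo.GetProductsByIds) and the table are examples of mine, not from the class:

    // T-SQL side (run once):
    //   CREATE TYPE dbo.IntList AS TABLE (Id int PRIMARY KEY);
    //   CREATE PROCEDURE dbo.GetProductsByIds @Ids dbo.IntList READONLY AS
    //       SELECT p.ProductID, p.Name
    //       FROM Production.Product AS p
    //       JOIN @Ids AS i ON i.Id = p.ProductID;

    using System;
    using System.Data;
    using System.Data.SqlClient;

    static class TvpDemo
    {
        public static void PrintProductsByIds(string connectionString, int[] productIds)
        {
            var ids = new DataTable();
            ids.Columns.Add("Id", typeof(int));
            foreach (int id in productIds)
                ids.Rows.Add(id);

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("dbo.GetProductsByIds", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;

                SqlParameter p = cmd.Parameters.Add("@Ids", SqlDbType.Structured);
                p.TypeName = "dbo.IntList";   // the table type created above
                p.Value = ids;                // DataTable, DbDataReader or IEnumerable<SqlDataRecord>

                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine(reader.GetString(1));
                }
            }
        }
    }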

The Object-Relational Issue

  • Database normalization and object-oriented normalization are different
  • A whole breed of tools called Object-Relational Mappers (ORMs) – they create an abstraction layer for the developer
  • Eliminates type issues, better tools, compile-time verification of queries
  • .NET 3.5 shipped with two ORM-style frameworks: Entity Framework (EF) and LINQ to SQL (L2S)

LINQ

  • LINQ = Language Integrated Query – FROM… WHERE… SELECT…
  • It’s actually a good order; it makes IntelliSense work better (see the sketch below)
  • This is not a database technology
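
A tiny query-syntax sketch; the Product class and the in-memory list are just stand-ins (with LINQ to SQL or LINQ to Entities, products would be a table or entity set instead):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Product
    {
        public int ProductID { get; set; }
        public string Name { get; set; }
        public decimal ListPrice { get; set; }
    }

    static class LinqDemo
    {
        static void Main()
        {
            var products = new List<Product>
            {
                new Product { ProductID = 1, Name = "Road Bike", ListPrice = 1431.50m },
                new Product { ProductID = 2, Name = "Water Bottle", ListPrice = 4.99m }
            };

            // from ... where ... select: the range variable is typed before the select clause,
            // which is what lets IntelliSense help while you type it
            var expensive =
                from p in products
                where p.ListPrice > 1000m
                orderby p.Name
                select new { p.Name, p.ListPrice };

            foreach (var item in expensive)
                Console.WriteLine(item.Name + ": " + item.ListPrice);
        }
    }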

EF

  • XML-based ORM technology – new layer between app code and data access code
  • Two ways
  • – Entity SQL (eSQL) – a SQL dialect designed to work with entities
  • – LINQ to Entities (L2E) – Very similar to LINQ to SQL
  • EF queries are always parameterized
  • Developer productivity; reduces the amount of bad SQL, cursor use and temp table usage

EF details