SQL Server 2008 Client Design

Article
03/23/2009

Here are some notes on "SQL Server 2008 Client Design" I took while attending an advanced class on SQL Server taught by Adam Machanic (https://sqlblog.com/blogs/adam_machanic/default.aspx).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.

Talking about code

ADO.NET SqlClient, SQLXML and SQLCLR
Could you write a small ADO.NET in a few hours
Could you write a few queries and return tables
Is XML developer technology? Yes, but also admin...
Need to understand what a developer feels like
Most of the time waiting for the SQL Server...

Interfaces, ADO.NET

Interface for an application to get to the data
Chunky vs Chatty - Big vs Small interfaces
Impacts performance - Each has its place

Chunky Interfaces

Extreme case: SELECT with no WHERE clause
Looks like: GetAllProducts()
Get lots of data in single a calls
Large network hit, locks longer, longer load time
Less batches, potentially less overall I/O and network traffic

Chatty Interface

Extreme case: Always SELECT ... WHERE Column=value
Looks like: GetProducts(int[])
Get just the exact data you need at that time
Less network traffic, less locking, more seeks, more batches
Potentially more overall I/O and network traffic
Sometimes unecessary calls over time

Finding a balance

How much data does your app need to work at once?
Is a redesign possible?
From GetProductsAll() to GetProducts(int[])
From single updates to bulk/batch updates

Demo

SQL Profiler running on the SQL side
RowGetter demo application on client - Queries on SalesDetails table
App options from MinSalesID/MaxSalesID, Chunky/Chatty, etc...
Comparing batch query (chunky) vs. One request per row (chatty)
SQL can also execute the batch as chatty/chunky/semi-chatty

General Rules

Think also about network (not only CPU and IO).
The queries are sometimes larger than the results. Think outbound and inbound.
Prioritize performance first, then maintainability, then productivity
If it performs and is hard to maintain, they'll use it and curse you
If it does not perform, they'll not even use it
Can you fix the performance problem later on the app? Not cheap...

Service Orientated Architecture (SOA)

Boundaries are explicit
Services are autonomous
Services share schema and contract, not class
Compatibility is based upon policy

Mapping Stored Procedures to SOA

We've been doing this all along with stored procedures
Explicit - Access only via SP
Autonomy - If SPs don't change, can be independently deployed
Contract - SPs are black block interface to DB
Policies - Access control to SPs
You don't necessarily need queues and other things to get to SOA
Don't over think the problem and waste money

SQLClient Classes

SQLConnection - connection to database
SQLCommand - command or stored procedure
SQLDataReader - connected, tabular "firehose"
DataTable - disconnected, tabular result
DataSet - collection of DataTables

SQLConnection

Exploit connection pool as much as possible
Golden rule: open late, close early – or, better worded – acquire later, release early
This is important - worth making your code “less maintainable” to achieve it
The scope is the app domain
16KB for each connection
Connection pooling – same string, same context, connections are reused
Connection pooling is not a SQL Server feature, it’s part of ADO.NET

SQLCommand

Parameterize your queries
Cache SQLCommand objects when possible
“Non-result” methods – could be a lot faster
ExecuteScalar – gets first column of first row, better than DataReader (use TOP,
ExecuteNonQuery – No result sets, but can use output parameters (could be faster - test it!)

Prepared command

Use parameters in your commands and turn it on using command.Prepare(), reuse the command
ADO.NET will use sp_prepare, sp_execute – first sends text, then sends handle and parameters
Fewer bytes over the wire if you’re calling this a number of times
This does not apply to SP, since they’re already handled in a similar fashion

SQLDataReader

Faster method to get data, forward only, connected while reading the data. Very fast, but inflexible
Bad pattern – DataReader read a row, queries a web service, then moves to next one.
Buffering of data for DataReader could be on the server or on the client, you don’t know.
Release your connections explicitly and early. Or else you end up with orphaned connections.
Use batches and .NextResult to get to the other result sets. This is forward only.

Closing the SQLDataReader

Garbage collector will release readers once you get out of scope, but it could take a while.
Use .ExecuteReader(CommandBehavior.CloseConnection) to close connection when reader is closed.
Use .ExecuteReader(CommandBehavior.SequentialAccess) to read only column at a time (good for LOBs).
If you call .Close it will get all the other rows in the set (for instance, report interrupted by user).
Use .Cancel/.Dispose instead of .Close if you don’t care about the rest of the row and output parameters.
You can monitor attention events in SQL Profiler to see if app is using .Cancel/.Dispose

DataTable

Cached, in memory access to all of the rows you requested.
Disconnected access, can be cached, passed around, read backwards, etc.
Sacrifice performance for ultimate flexibility.
DataTable is first filled in memory using a DataAdaptor, which uses the DataReader.

DataReader is better than DataTable

Forward-only
Populating lists, hydrating objects
Too much data to fit in memory

DataTable is better than DataReader

Need caching
If you need to do Sorting, Filtering, DataViews
If you need to bind to a Grid or Chart control
If you are related sets of data (DataSet – also see MARS)
Simplification of disconnected updates (though you could code it more efficiently)
Complex processing per row (processing will scale better)

MARS

Multiple Active Result Sets
Single batch can return multiple results, without a DataTable
Multiple Readers interleaved on related keys
Fast loading of related data, similar to a “merge join”
See https://msdn.microsoft.com/en-us/library/ms345109.aspx

GetOrdinal

Column name indexer calls GetOrdinal under the covers
You can optimize by using <int columnx = reader.GetOrdinal(“columnx”)> outside the loop
This can make a huge difference in client performance in long loops

ADO.NET is slower SSMS

Queries will run perform more slowly in ADO.NET than when running with SSMS
Main culprit is usually ARITHABORT setting (affects things like indexed views, indexes on computed columns)
Look at sys.dm_exec_plan_attributes WHERE (attribute=’set options’) and (value & 64 = 0)
See https://msdn.microsoft.com/en-us/library/ms189472.aspx for a sample query
Fix setting user options in the client with “sp_configure ‘user options’”

Parameterization vs. Concatenation

Security – prevents SQL injection
Performance – Fewer plans in cache, less compilation
Consider using “force parameterization” option in database
In the app, replace cmd.Text = “SELECT… WHERE ID=”+value with cmd.Parameter.Add(“@ID”,sqlDBType.int)=value;
It is optional to specify data size or else it will vary based on the actual size.
To overcome, make sure to use cmd.Parameter.Add(“@ID”,sqlDBType.VarChar,4000)=string
Do not use Parameters.AddWithValue, since it does not specify size
Careful with “parameter sniffing” – SQL can optimize the plan for the first value used

Connection Pooling

New connections are expensive. Pooling caches open connections. Acquire late, release early.
Eliminates much of cost, make sure there is always a connection available in the pool.
Pooling is controlled by unique connection string and user identity.
Application roles will create security context that cannot be reset.
If your security model uses per-user authentication, it will likely disable pooling.
Avoid “initial catalog” option. Use “database” option with three-part naming.
See https://msdn.microsoft.com/en-us/library/8xx3tyca.aspx

SQLConnectionStringBuilder

SQLConnectionStringBuilder knows how to build connection strings.
It’s great for setting just the properties your need.
It can also read connection string in, change just a few properties.

Closing connection

Understand .iDisposable. “Using” or “try/finally” around the block that uses the connection.
Acquire, process and release in the same method. Benefit for DataAdapter.
Remember SQLDataReader(CommandBehavior.CloseConnection).
Connection leaks. Don’t leave it to the garbage collection…

Pool Size

Default Maximum is 100. Consider increasing the maximum for busy apps.
Default Minimum is 0. Consider increasing the minimum for a faster warm-up.
Both settings are controlled by the connection string: “Max Pool Size” and “Min Pool Size”
This is a client-side setting. Monitor using Performance Monitor in the client.
Different for web farm, hosted environment, desktop client, etc.
See https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring.aspx

Monitoring

.NET provider
- Hard connects/disconnects per second
- Soft connects/disconnects per second
- Reclaimed connections – should be 0 (garbage collection happened)
- Active Pools – should not climb
SQL Trace - New behavior for Audit Login and Audit Logout
- Hard (nonpooled) logins/logouts – EventSubClass is 1
- Soft (pooled) logins/logouts – EventSubClass is 2

Data Caching

Proper use of caching can be key to application scalability
Consider granularity (user/server), location (client or server), update frequency, expiration, form (data/object)

Query Notification

Application tells SQL to watch the query, send notification when it sends
Case: Run “SELECT … FROM Orders WHERE Customer=10” with query notification.
Case: If new orders for customer 10 show up, SQL will send notification.
Uses indexed views, broker. We don’t have to worry so much about when to update the cache.
See https://msdn.microsoft.com/en-us/library/ms175110.aspx

Query Notification Issues

Indexed views are not free, there’s an overhead.
Cache updates can be expensive. Strike a balance.
Consider splitting large cache into several smaller caches, with more restrictive queries.
Mark cache as “shareable” if possible. Avoid per-user notification.
Case: Application global data this is only occasionally updated.
Consider pairing with Change Tracking to reduce potential issues.

SQLBulkCopy

.NET class that exposes Bulk Copy API
Sends rows to SQL, minimally-logged inserts
Input via DataTable or iDataReader
More efficient than multiple insert statements, especially if minimally-logged
Could even be used with FULL recovery mode for better performance
Could be used for Bulk updates as well
Consider other options like bcp and SSIS, but this is definitely a good one
Demo: 10,000 rows using Row-by-row inserts and Bulk Insert
Row-by-row Inserts takes 16.7s while Bulk Insert takes less than 1s

SQLBulkCopy - How

Extremely easy to use:
- Instantiate SqlBulkCopy class
- Specify destination table
- Call WriteToServer passing DataTable or iDataReader
For ML insert, use SqlBulkCopyOptions and specify TableLock option.
See https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx

SQLBulkCopy - Details

Column mapping overrides supported ColumnMappings property
Event notification to update UI (every 1000 rows, for instance)
Don’t notify too frequently or it will slow things down
Tune batch size. SQLCAT found that 10,000 is a sweet spot for ETL
See https://msdn.microsoft.com/en-us/library/dd425070.aspx

Table-valued parameters

Create a named type that is a table. SQL Server 2008, available from ADO.NET 3.5
Syntax: CREATE TYPE name AS TABLE…
It’s a table entirely in memory, can be use to pass tables to stored procedure
Table types support CHECK, DEFAULT, PRIMARY KEY and UNIQUE constraints
Must specify READONLY option in stored procedure parameter (cannot be OUTPUT)

TVP – How

Create stored procedures with table type parameter
Set up parameters collection as usual
Set SqlDBType = SQLDBType.Structure
Argument typed as DataTable, dbDataReader or iList<SqlDataRecord>
Recommended in books online for up to 1,000 rows (SQLBulkCopy for higher)
It uses the Bulk API, but SQL Profiler shows as individual insers (do not replay)
See https://msdn.microsoft.com/en-us/library/bb675163.aspx

The Object-Relational Issue

Database normalization and object-oriented normalization are different
A whole breed of tools called Object-Relation Mappers (ORM) – Create a layer for the Developer
Eliminates type issues, better tools, compile-time verification of queries
.NET 3.5 shipped with two ORM-style frameworks: Entity Framework (EF) and LINQ to SQL (L2S)

LINQ

LINQ = Language Integrated Query – FROM… WHERE… SELECT…
It’s actually a good order, makes intellisense work better.
This is not a database technology

XML-based ORM technology – new layer between app code and data access code
Two ways
– Entity SQL (eSQL) – SQL dialect design to work with entities
– LINQ to Entities (L2E) – Very similar to LINQ to SQL
EF queries are always parameterized
Developer productivity, reduce amount of bad SQL, reduce cursors, temp table usage

EF details

Careful: Learning curve, Variable-length parameters, automated T-SQL, permissions at database
Best practices: Log and monitor SQL being produced, consider using SP
See https://code.msdn.microsoft.com/esql
See https://www.thedatafarm.com/blog/2009/02/16/LoggingEntityFrameworkQueriesToLookForPerfImprovementOpportunities.aspx
Treat EF layer as Data Access layer

SQL Server 2008 Client Design

Additional resources