Here are some notes on “SQL Server 2008 Client Design” I took while attending an advanced class on SQL Server taught by Adam Machanic (http://sqlblog.com/blogs/adam_machanic/default.aspx).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
Talking about code
- ADO.NET SqlClient, SQLXML and SQLCLR
- Could you write a small ADO.NET in a few hours
- Could you write a few queries and return tables
- Is XML developer technology? Yes, but also admin…
- Need to understand what a developer feels like
- Most of the time waiting for the SQL Server…
- Interface for an application to get to the data
- Chunky vs Chatty – Big vs Small interfaces
- Impacts performance – Each has its place
- Extreme case: SELECT with no WHERE clause
- Looks like: GetAllProducts()
- Get lots of data in single a calls
- Large network hit, locks longer, longer load time
- Less batches, potentially less overall I/O and network traffic
- Extreme case: Always SELECT … WHERE Column=value
- Looks like: GetProducts(int)
- Get just the exact data you need at that time
- Less network traffic, less locking, more seeks, more batches
- Potentially more overall I/O and network traffic
- Sometimes unecessary calls over time
Finding a balance
- How much data does your app need to work at once?
- Is a redesign possible?
- From GetProductsAll() to GetProducts(int)
- From single updates to bulk/batch updates
- SQL Profiler running on the SQL side
- RowGetter demo application on client – Queries on SalesDetails table
- App options from MinSalesID/MaxSalesID, Chunky/Chatty, etc…
- Comparing batch query (chunky) vs. One request per row (chatty)
- SQL can also execute the batch as chatty/chunky/semi-chatty
- Think also about network (not only CPU and IO).
- The queries are sometimes larger than the results. Think outbound and inbound.
- Prioritize performance first, then maintainability, then productivity
- If it performs and is hard to maintain, they’ll use it and curse you
- If it does not perform, they’ll not even use it
- Can you fix the performance problem later on the app? Not cheap…
Service Orientated Architecture (SOA)
- Boundaries are explicit
- Services are autonomous
- Services share schema and contract, not class
- Compatibility is based upon policy
Mapping Stored Procedures to SOA
- We’ve been doing this all along with stored procedures
- Explicit – Access only via SP
- Autonomy – If SPs don’t change, can be independently deployed
- Contract – SPs are black block interface to DB
- Policies – Access control to SPs
- You don’t necessarily need queues and other things to get to SOA
- Don’t over think the problem and waste money
- SQLConnection – connection to database
- SQLCommand – command or stored procedure
- SQLDataReader – connected, tabular “firehose”
- DataTable – disconnected, tabular result
- DataSet – collection of DataTables
- Exploit connection pool as much as possible
- Golden rule: open late, close early – or, better worded – acquire later, release early
- This is important – worth making your code “less maintainable” to achieve it
- The scope is the app domain
- 16KB for each connection
- Connection pooling – same string, same context, connections are reused
- Connection pooling is not a SQL Server feature, it’s part of ADO.NET
- Parameterize your queries
- Cache SQLCommand objects when possible
- “Non-result” methods – could be a lot faster
- ExecuteScalar – gets first column of first row, better than DataReader (use TOP,
- ExecuteNonQuery – No result sets, but can use output parameters (could be faster – test it!)
- Use parameters in your commands and turn it on using command.Prepare(), reuse the command
- ADO.NET will use sp_prepare, sp_execute – first sends text, then sends handle and parameters
- Fewer bytes over the wire if you’re calling this a number of times
- This does not apply to SP, since they’re already handled in a similar fashion
- Faster method to get data, forward only, connected while reading the data. Very fast, but inflexible
- Bad pattern – DataReader read a row, queries a web service, then moves to next one.
- Buffering of data for DataReader could be on the server or on the client, you don’t know.
- Release your connections explicitly and early. Or else you end up with orphaned connections.
- Use batches and .NextResult to get to the other result sets. This is forward only.
Closing the SQLDataReader
- Garbage collector will release readers once you get out of scope, but it could take a while.
- Use .ExecuteReader(CommandBehavior.CloseConnection) to close connection when reader is closed.
- Use .ExecuteReader(CommandBehavior.SequentialAccess) to read only column at a time (good for LOBs).
- If you call .Close it will get all the other rows in the set (for instance, report interrupted by user).
- Use .Cancel/.Dispose instead of .Close if you don’t care about the rest of the row and output parameters.
- You can monitor attention events in SQL Profiler to see if app is using .Cancel/.Dispose
- Cached, in memory access to all of the rows you requested.
- Disconnected access, can be cached, passed around, read backwards, etc.
- Sacrifice performance for ultimate flexibility.
- DataTable is first filled in memory using a DataAdaptor, which uses the DataReader.
DataReader is better than DataTable
- Populating lists, hydrating objects
- Too much data to fit in memory
DataTable is better than DataReader
- Need caching
- If you need to do Sorting, Filtering, DataViews
- If you need to bind to a Grid or Chart control
- If you are related sets of data (DataSet – also see MARS)
- Simplification of disconnected updates (though you could code it more efficiently)
- Complex processing per row (processing will scale better)
- Multiple Active Result Sets
- Single batch can return multiple results, without a DataTable
- Multiple Readers interleaved on related keys
- Fast loading of related data, similar to a “merge join”
- See http://msdn.microsoft.com/en-us/library/ms345109.aspx
- Column name indexer calls GetOrdinal under the covers
- You can optimize by using <int columnx = reader.GetOrdinal(“columnx”)> outside the loop
- This can make a huge difference in client performance in long loops
ADO.NET is slower SSMS
- Queries will run perform more slowly in ADO.NET than when running with SSMS
- Main culprit is usually ARITHABORT setting (affects things like indexed views, indexes on computed columns)
- Look at sys.dm_exec_plan_attributes WHERE (attribute=’set options’) and (value & 64 = 0)
- See http://msdn.microsoft.com/en-us/library/ms189472.aspx for a sample query
- Fix setting user options in the client with “sp_configure ‘user options’”
Parameterization vs. Concatenation
- Security – prevents SQL injection
- Performance – Fewer plans in cache, less compilation
- Consider using “force parameterization” option in database
- In the app, replace cmd.Text = “SELECT… WHERE ID=”+value with cmd.Parameter.Add(“@ID”,sqlDBType.int)=value;
- It is optional to specify data size or else it will vary based on the actual size.
- To overcome, make sure to use cmd.Parameter.Add(“@ID”,sqlDBType.VarChar,4000)=string
- Do not use Parameters.AddWithValue, since it does not specify size
- Careful with “parameter sniffing” – SQL can optimize the plan for the first value used
- New connections are expensive. Pooling caches open connections. Acquire late, release early.
- Eliminates much of cost, make sure there is always a connection available in the pool.
- Pooling is controlled by unique connection string and user identity.
- Application roles will create security context that cannot be reset.
- If your security model uses per-user authentication, it will likely disable pooling.
- Avoid “initial catalog” option. Use “database” option with three-part naming.
- See http://msdn.microsoft.com/en-us/library/8xx3tyca.aspx
- SQLConnectionStringBuilder knows how to build connection strings.
- It’s great for setting just the properties your need.
- It can also read connection string in, change just a few properties.
- Understand .iDisposable. “Using” or “try/finally” around the block that uses the connection.
- Acquire, process and release in the same method. Benefit for DataAdapter.
- Remember SQLDataReader(CommandBehavior.CloseConnection).
- Connection leaks. Don’t leave it to the garbage collection…
- Default Maximum is 100. Consider increasing the maximum for busy apps.
- Default Minimum is 0. Consider increasing the minimum for a faster warm-up.
- Both settings are controlled by the connection string: “Max Pool Size” and “Min Pool Size”
- This is a client-side setting. Monitor using Performance Monitor in the client.
- Different for web farm, hosted environment, desktop client, etc.
- See http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring.aspx
- .NET provider
- – Hard connects/disconnects per second
- – Soft connects/disconnects per second
- – Reclaimed connections – should be 0 (garbage collection happened)
- – Active Pools – should not climb
- SQL Trace – New behavior for Audit Login and Audit Logout
- – Hard (nonpooled) logins/logouts – EventSubClass is 1
- – Soft (pooled) logins/logouts – EventSubClass is 2
- Proper use of caching can be key to application scalability
- Consider granularity (user/server), location (client or server), update frequency, expiration, form (data/object)
- Application tells SQL to watch the query, send notification when it sends
- Case: Run “SELECT … FROM Orders WHERE Customer=10” with query notification.
- Case: If new orders for customer 10 show up, SQL will send notification.
- Uses indexed views, broker. We don’t have to worry so much about when to update the cache.
- See http://msdn.microsoft.com/en-us/library/ms175110.aspx
Query Notification Issues
- Indexed views are not free, there’s an overhead.
- Cache updates can be expensive. Strike a balance.
- Consider splitting large cache into several smaller caches, with more restrictive queries.
- Mark cache as “shareable” if possible. Avoid per-user notification.
- Case: Application global data this is only occasionally updated.
- Consider pairing with Change Tracking to reduce potential issues.
- .NET class that exposes Bulk Copy API
- Sends rows to SQL, minimally-logged inserts
- Input via DataTable or iDataReader
- More efficient than multiple insert statements, especially if minimally-logged
- Could even be used with FULL recovery mode for better performance
- Could be used for Bulk updates as well
- Consider other options like bcp and SSIS, but this is definitely a good one
- Demo: 10,000 rows using Row-by-row inserts and Bulk Insert
- Row-by-row Inserts takes 16.7s while Bulk Insert takes less than 1s
SQLBulkCopy – How
- Extremely easy to use:
- – Instantiate SqlBulkCopy class
- – Specify destination table
- – Call WriteToServer passing DataTable or iDataReader
- For ML insert, use SqlBulkCopyOptions and specify TableLock option.
- See http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
SQLBulkCopy – Details
- Column mapping overrides supported ColumnMappings property
- Event notification to update UI (every 1000 rows, for instance)
- Don’t notify too frequently or it will slow things down
- Tune batch size. SQLCAT found that 10,000 is a sweet spot for ETL
- See http://msdn.microsoft.com/en-us/library/dd425070.aspx
- Create a named type that is a table. SQL Server 2008, available from ADO.NET 3.5
- Syntax: CREATE TYPE name AS TABLE…
- It’s a table entirely in memory, can be use to pass tables to stored procedure
- Table types support CHECK, DEFAULT, PRIMARY KEY and UNIQUE constraints
- Must specify READONLY option in stored procedure parameter (cannot be OUTPUT)
TVP – How
- Create stored procedures with table type parameter
- Set up parameters collection as usual
- Set SqlDBType = SQLDBType.Structure
- Argument typed as DataTable, dbDataReader or iList<SqlDataRecord>
- Recommended in books online for up to 1,000 rows (SQLBulkCopy for higher)
- It uses the Bulk API, but SQL Profiler shows as individual insers (do not replay)
- See http://msdn.microsoft.com/en-us/library/bb675163.aspx
The Object-Relational Issue
- Database normalization and object-oriented normalization are different
- A whole breed of tools called Object-Relation Mappers (ORM) – Create a layer for the Developer
- Eliminates type issues, better tools, compile-time verification of queries
- .NET 3.5 shipped with two ORM-style frameworks: Entity Framework (EF) and LINQ to SQL (L2S)
- LINQ = Language Integrated Query – FROM… WHERE… SELECT…
- It’s actually a good order, makes intellisense work better.
- This is not a database technology
- XML-based ORM technology – new layer between app code and data access code
- Two ways
- – Entity SQL (eSQL) – SQL dialect design to work with entities
- – LINQ to Entities (L2E) – Very similar to LINQ to SQL
- EF queries are always parameterized
- Developer productivity, reduce amount of bad SQL, reduce cursors, temp table usage
- Careful: Learning curve, Variable-length parameters, automated T-SQL, permissions at database
- Best practices: Log and monitor SQL being produced, consider using SP
- See http://code.msdn.microsoft.com/esql
- See http://www.thedatafarm.com/blog/2009/02/16/LoggingEntityFrameworkQueriesToLookForPerfImprovementOpportunities.aspx
- Treat EF layer as Data Access layer