My T-SQL Coding Guidelines

So a few years ago I compiled a list of coding guidelines, tips, and tricks for T-SQL performance. I am posting the content of that document here for general information.

Version 1.0

10/14/2013

This document is intended to provide a condensed summary of best practices and methods to improve code efficiency. These will be periodically reviewed and updated as needed. If you have any questions about any of these, please reach out to a member of the Database Administration team.

  1. Working with data inside a database is not like procedural programming. Avoid working with data one row at a time; design queries that work with sets of data.
  2. When referring to objects in the database, always use the schema name, even if it is dbo.
  3. Ordering data takes longer. ORDER BY clauses should be reserved for final display, or better yet, let the application that handles the returned data do the sorting. If you need to sort in order to use TOP, consider first dumping the entire result set into a temp table and then taking the TOP off the temp table, as in the sketch below.
    1. Inside the database engine, an ORDER BY forces a sort operation: the engine must work through the entire result set before it can return the first row, instead of streaming rows back as it finds them.
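
For example, a minimal sketch of the temp-table approach (dbo.Orders and its columns are made up for illustration):

-- Stage the full result set without sorting
SELECT o.OrderId, o.OrderTotal
INTO #TopSource
FROM dbo.Orders AS o
WHERE o.OrderDate >= '20130101';

-- Sort only the staged rows to take the TOP
SELECT TOP (10) OrderId, OrderTotal
FROM #TopSource
ORDER BY OrderTotal DESC;
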
  4. Keep queries as simple as possible. The more complex the logic in a single query, the harder it is for the optimizer to come up with a good plan. It may not be a problem while there is a small amount of data in the database or table, but as that data grows it will make finding an efficient plan harder.
    1. Limit the number of joins and subqueries in a single query to five or fewer, and always make sure that inner joins are done first. If you need to break down a large query, you can use a temp table to store the combined result of several inner joins first and then use that against the outer joins.
      1. If there are several joins and other criteria in the WHERE clause, any criterion that applies to an inner join should be included in the join. This helps the database engine settle on a better query plan faster.
      2. Remember that for inner joins the ON clause is a filtering clause; for outer joins the ON clause is a matching clause, and all filtering happens in the WHERE clause (see the sketch below).
      3. Joins are evaluated left to right, so the result of the first join is the input for the next join.
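
A quick sketch of the ON-versus-WHERE distinction above (tables and columns are hypothetical):

-- Inner join: the filtering criterion can be included in the join
SELECT c.CustomerId, o.OrderId
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
    ON o.CustomerId = c.CustomerId
    AND o.Status = 'Shipped';   -- filter moved into the ON clause

-- Outer join: ON only matches rows; filtering belongs in the WHERE clause
SELECT c.CustomerId, o.OrderId
FROM dbo.Customers AS c
LEFT OUTER JOIN dbo.Orders AS o
    ON o.CustomerId = c.CustomerId
WHERE c.Region = 'West';        -- filter applied in the WHERE clause
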
  5. If you are writing a subquery to find a certain set of values, check whether there is a list table that contains a much smaller set of data but still holds all the possible values; pulling the list from it beats doing a DISTINCT against a table with a large amount of data.
    1. Subqueries are also good candidates to be broken out into temp tables ahead of time; this is especially true if you are using the same subquery in multiple places, as sketched below.
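
As a sketch, breaking a repeated subquery out ahead of time (table names are made up):

-- Pull the value list once from the small list table
SELECT RegionCode
INTO #Regions
FROM dbo.RegionList;    -- small list table instead of a DISTINCT on a big table

-- Reuse the temp table everywhere the subquery was repeated
SELECT s.SaleId
FROM dbo.Sales AS s
WHERE s.RegionCode IN (SELECT RegionCode FROM #Regions);
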
  6. Avoid negative logic; this includes <>, !=, NOT IN, and NOT EXISTS. Negative logic forces the database engine to scan the entire column used in the criteria to make sure the value is not in the data, which makes it harder for the query plan to use indexes.
    1. One option for turning negative logic into positive logic is to load the possible values into a temp table and then delete the unwanted values from the temp table (sketched below).
    2. If it is not possible to eliminate the negative logic, apply it against as small a dataset or table as possible.
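
A minimal sketch of the load-then-delete technique (hypothetical tables):

-- Instead of NOT IN, load the candidate keys...
SELECT CustomerId
INTO #Wanted
FROM dbo.Customers;

-- ...then delete the unwanted ones
DELETE w
FROM #Wanted AS w
INNER JOIN dbo.ExcludedCustomers AS e
    ON e.CustomerId = w.CustomerId;

-- #Wanted now holds the old NOT IN result, expressed with positive logic
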
  7. If you are writing an IN clause with literal values that will match more than a third of the table, consider breaking it into a UNION in which each branch handles only one or two of the literals. Each branch of the UNION gets its own plan, so the engine does not have to work through a large amount of data in any single branch, as sketched below.
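
A sketch with a hypothetical table; UNION ALL is used here because the branches cannot overlap, which avoids the duplicate-removal step a plain UNION would add:

-- Instead of: SELECT OrderId FROM dbo.Orders WHERE Status IN ('A', 'B', 'C', 'D')
SELECT OrderId FROM dbo.Orders WHERE Status IN ('A', 'B')
UNION ALL
SELECT OrderId FROM dbo.Orders WHERE Status IN ('C', 'D');
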
  8. Avoid using SELECT *; it returns more columns than you need, and if the underlying structure of the database changes it can break code.
    1. Also remember that the order of the columns in a table makes no difference to the database engine. When columns are added they are added at the end of the table, which is why it is a best practice to always specify the precise columns in the order they are to be returned.
    2. On INSERT statements, always specify the columns being inserted into: INSERT INTO dbo.MyTable (Column1, Column2) VALUES (1, 2)
  9. If parameters are being passed in that will be used as criteria against a table, make sure they have the same data type as the column they will be compared against, as in the sketch below.
    1. If a new table is created with a foreign key to another table, those columns need the same data types so that joins between them are more efficient.
      1. If a column needs to be added, it should use the smallest data type possible, because this helps speed up queries.
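
A small sketch of the matching data type point (assuming a hypothetical dbo.Orders.CustomerId column of type int):

-- Declare the parameter with the same type as the column it is compared against
DECLARE @CustomerId int = 42;    -- matches dbo.Orders.CustomerId

SELECT OrderId
FROM dbo.Orders
WHERE CustomerId = @CustomerId;  -- no implicit conversion needed
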
  10. If table-valued functions are used to return data to a query for joins, they should only return a small number of rows. The database engine cannot estimate how many rows the function might return, so if a large number of rows come back, performance will slow down.
    1. The same applies to table variables as well.
  11. NULLs are special cases. Whenever criteria are evaluated, the database engine evaluates each comparison to true, false, or unknown (NULL), and an unknown result is treated as false regardless of the operator. There is a setting (ANSI_NULLS) that changes this behavior so that NULL = NULL evaluates to true, but changing it in the connection string goes against ANSI standards and is a very bad practice. If there are NULLs in the columns used by a query's criteria, they must be handled explicitly unless the query only wants data with known values (see the sketch below).
    1. One way to reduce the number of NULL values is to have applications, load processes, or in some cases the columns themselves supply default values when none is provided.
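
To illustrate the NULL behavior (hypothetical table):

-- Rows where Color is NULL are NOT returned, even by <>
SELECT ProductId FROM dbo.Products WHERE Color <> 'Red';

-- Handle the NULLs explicitly if those rows are wanted
SELECT ProductId FROM dbo.Products WHERE Color <> 'Red' OR Color IS NULL;
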
  12. If a criterion uses a LIKE comparison, it will perform better when the first part of the pattern is static: WHERE Column1 LIKE 'aaaa%'. If there is an index on the column, the database engine can use the index to limit the amount of data it has to work through to complete the comparison.
    1. Avoid using functions on columns in criteria, because this can prevent the database engine from using indexes; WHERE LEFT(Column1, 4) = 'aaaa' will not perform as well as WHERE Column1 LIKE 'aaaa%'.
  13. If data is being loaded and some rows may be inserted while others are updated, it is more efficient to write a MERGE statement so that the database engine only has to read through the data once.
    1. If the load also needs to delete rows, T-SQL provides a way to do that within the MERGE statement as well (consult a senior DBA if you need more information on this). The sketch below shows all three actions.
    2. If the data being loaded comes from a file, loading it into a permanent staging table first is the recommended method. This allows for recoverability if issues arise during the load and allows useful indexes to already be established on the table.
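
A minimal MERGE sketch covering the insert, update, and optional delete actions (table and column names are made up):

MERGE dbo.Customers AS target
USING dbo.CustomerStage AS source
    ON target.CustomerId = source.CustomerId
WHEN MATCHED THEN
    UPDATE SET target.CustomerName = source.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, CustomerName)
    VALUES (source.CustomerId, source.CustomerName)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;    -- optional: removes rows that are missing from the load
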
  14. Avoid using deprecated features; this makes it more difficult to upgrade in the future. A full list of deprecated features for the current version of SQL Server and the previous two to three versions can be found at http://technet.microsoft.com/en-us/library/ms143729.aspx
  15. Avoid complex logic in triggers; a trigger should be a very quick bit of code that runs and finishes fast.
  16. Encapsulate more complex tasks in stored procedures rather than views; this allows the use of temp tables to break the complex logic into simpler pieces.
    1. Some applications may not allow stored procedures to be used. Provided the database is SQL Server 2005 or later, writing queries that use Common Table Expressions (CTEs) gives similar benefits for breaking the complex logic into smaller chunks, as sketched below.
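
A sketch of the CTE approach (all names hypothetical):

-- Each CTE acts as a named building block for the more complex logic
WITH RecentOrders AS
(
    SELECT CustomerId, OrderTotal
    FROM dbo.Orders
    WHERE OrderDate >= '20130101'
)
SELECT c.CustomerId, SUM(ro.OrderTotal) AS RecentTotal
FROM dbo.Customers AS c
INNER JOIN RecentOrders AS ro
    ON ro.CustomerId = c.CustomerId
GROUP BY c.CustomerId;
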
  17. Look at the existing indexes; the development and testing environments should have the same structure as production unless a change is currently being developed, so reviewing the existing indexes can help you structure queries more efficiently. Indexes with multiple columns, if they are to be used, need the criteria in the same order as the index. The first column should be the most selective column, meaning it has the lowest number of repeated values. Columns that are not used as criteria but are returned in the SELECT should be added as included columns (see the example below).
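
For example, an index built along those lines (hypothetical table; CustomerId assumed to be the most selective column):

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
ON dbo.Orders (CustomerId, OrderDate)    -- most selective column first
INCLUDE (OrderTotal);                    -- returned in the SELECT but not used as criteria
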
  18. Use of linked servers should be kept to a minimum, but if one is needed it is best to use a simple query to pull the data back from the linked server and store it in a temp table or a staging table rather than doing complex joins across the linked server.
  19. Always SET NOCOUNT ON as the first command in stored procedures.
  20. When writing stored procedures that issue DDL statements, group the DDL statements first in the stored procedure and then execute the DML statements.
  21. Use BEGIN/END blocks for stored procedures, IF logic, and WHILE loops so that code blocks are easier to identify.
  22. When calling stored procedures, it is always best to use the parameter names in the call:

EXEC dbo.myproc @var1 = 'test';
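
Putting items 19 through 22 together, a skeleton stored procedure might look like this (all names are hypothetical):

CREATE PROCEDURE dbo.LoadDailyOrders
    @LoadDate date
AS
BEGIN
    SET NOCOUNT ON;    -- item 19: first command in the procedure

    -- item 20: DDL grouped first
    CREATE TABLE #Staging (OrderId int NOT NULL);

    -- item 21: BEGIN/END makes the IF block easy to identify
    IF @LoadDate IS NULL
    BEGIN
        SET @LoadDate = CAST(GETDATE() AS date);
    END;

    -- DML after the DDL
    INSERT INTO #Staging (OrderId)
    SELECT OrderId
    FROM dbo.Orders
    WHERE OrderDate = @LoadDate;
END;
GO

-- item 22: call with the parameter name
EXEC dbo.LoadDailyOrders @LoadDate = '20131014';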


The Case of the Automated DBA

If you are thinking the title of this blog post sounds like an Erle Stanley Gardner book, that is where the inspiration comes from. Gardner wrote the books about Perry Mason and his cases that were the basis for the Raymond Burr television series. Sometimes as DBAs we wish we had a Perry Mason type figure to defend us, but we rarely ever get that.

In this story we’ll pretend that we have a defense attorney, because the prosecution is making a case to put us in jail for life. Maybe it isn’t quite that extreme, but for some people it feels that way. The prosecutor claims that the defendant (the DBAs) can’t keep the systems running and performing. Additionally, time and time again when they make changes, the changes are not done correctly and cause further impact. For all these charges the prosecution seeks the strictest punishment in the land: life in jail (AKA automate the DBA and fire the person).

I know many people out there have been in this position before; I have. At first I challenged the thought of automation. “How can they automate this?”, “They are going to automate me out of a job!”, and “No computer can do what I do!” were some of the thoughts and statements I made. The turning point for me came when I started seeing all the things that I couldn’t get done. Things like patching SQL Server, planning and executing migrations to newer versions of SQL Server, and sleeping at night, because I would get paged out for something that was a 5-second fix and, as I said at 3am, “A monkey could do this, why am I getting paged?”

So, a show of hands: how many people have been in that place before? It is time to be realistic about what automation can do to help us and to dispel the fear of it replacing us.

So why should we automate?

  1. Consistency
  2. Remove Human Error
  3. Reduce alerts
  4. Reduce repetitive tasks
  5. Improve reliability
  6. Sleep easier

What should be automated?

  1. SQL Installs
  2. Patching
  3. Anything that triggers an Alert and is fixed with a simple script
  4. Deployments (The great white whale of the DBA)

I plan to dive into these topics in more depth over time. What I want to illustrate is that while the prosecution is going overboard by asking to get rid of the DBA in favor of automation, DBAs are going overboard in refusing to consider it. We need to admit that there are things better suited to automation, and agree to a bargain with the prosecution that meets in the middle. In the end, the goal should be letting the DBA focus on bigger issues and reducing how many times they get called at night.