My T-SQL Coding Guidelines

So a few years ago I compiled a list of coding guidelines, tips, and tricks for T-SQL performance. I am posting the content of that document here for general information.

Version 1.0

10/14/2013

This document is intended to provide a condensed summary of best practices and methods to improve code efficiency. These will be periodically reviewed and updated as needed. If you have any questions about any of these, please reach out to a member of the Database Administration team.

  1. Working with data inside a database is not like procedural programming. Avoid working with data one row at a time; design queries that work with sets of data.
  2. When referring to objects in the database, always use the schema name, even if it is dbo.
  3. Ordering data takes longer. ORDER BY clauses should be reserved for final display, or better yet, let the application that handles the returned data do the sorting. If you need to sort in order to use TOP, consider first dumping the entire result set into a temp table and then taking the TOP off the temp table, as in the sketch below.
    1. Inside the database engine, an ORDER BY forces a sort operation: the engine must work through the entire result set before it can return the first row, instead of streaming rows back as it finds them.
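
For example, a minimal sketch of the temp-table approach (dbo.Orders and its columns are made up for illustration):

-- Stage the full result set without sorting
SELECT o.OrderId, o.OrderTotal
INTO #TopSource
FROM dbo.Orders AS o
WHERE o.OrderDate >= '20130101';

-- Sort only the staged rows to take the TOP
SELECT TOP (10) OrderId, OrderTotal
FROM #TopSource
ORDER BY OrderTotal DESC;
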
  4. Keep queries as simple as possible. The more complex the logic in a single query, the harder it is for the optimizer to come up with a good plan. It may not be a problem while there is a small amount of data in the database or table, but as that data grows it will make finding an efficient plan harder.
    1. Limit the number of joins and subqueries in a single query to five or fewer, and always make sure that inner joins are done first. If you need to break down a large query, you can use a temp table to store the combined result of several inner joins first and then use that against the outer joins.
      1. If there are several joins and other criteria in the WHERE clause, any criterion that applies to an inner join should be included in the join. This helps the database engine settle on a better query plan faster.
      2. Remember that for inner joins the ON clause is a filtering clause; for outer joins the ON clause is a matching clause, and all filtering happens in the WHERE clause (see the sketch below).
      3. Joins are evaluated left to right, so the result of the first join is the input for the next join.
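
A quick sketch of the ON-versus-WHERE distinction above (tables and columns are hypothetical):

-- Inner join: the filtering criterion can be included in the join
SELECT c.CustomerId, o.OrderId
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
    ON o.CustomerId = c.CustomerId
    AND o.Status = 'Shipped';   -- filter moved into the ON clause

-- Outer join: ON only matches rows; filtering belongs in the WHERE clause
SELECT c.CustomerId, o.OrderId
FROM dbo.Customers AS c
LEFT OUTER JOIN dbo.Orders AS o
    ON o.CustomerId = c.CustomerId
WHERE c.Region = 'West';        -- filter applied in the WHERE clause
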
  5. If you are writing a subquery to find a certain set of values, check whether there is a list table that contains a much smaller set of data but still holds all the possible values; pulling the list from it beats doing a DISTINCT against a table with a large amount of data.
    1. Subqueries are also good candidates to be broken out into temp tables ahead of time; this is especially true if you are using the same subquery in multiple places, as sketched below.
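
As a sketch, breaking a repeated subquery out ahead of time (table names are made up):

-- Pull the value list once from the small list table
SELECT RegionCode
INTO #Regions
FROM dbo.RegionList;    -- small list table instead of a DISTINCT on a big table

-- Reuse the temp table everywhere the subquery was repeated
SELECT s.SaleId
FROM dbo.Sales AS s
WHERE s.RegionCode IN (SELECT RegionCode FROM #Regions);
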
  6. Avoid negative logic; this includes <>, !=, NOT IN, and NOT EXISTS. Negative logic forces the database engine to scan the entire column used in the criteria to make sure the value is not in the data, which makes it harder for the query plan to use indexes.
    1. One option for turning negative logic into positive logic is to load the possible values into a temp table and then delete the unwanted values from the temp table (sketched below).
    2. If it is not possible to eliminate the negative logic, apply it against as small a dataset or table as possible.
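
A minimal sketch of the load-then-delete technique (hypothetical tables):

-- Instead of NOT IN, load the candidate keys...
SELECT CustomerId
INTO #Wanted
FROM dbo.Customers;

-- ...then delete the unwanted ones
DELETE w
FROM #Wanted AS w
INNER JOIN dbo.ExcludedCustomers AS e
    ON e.CustomerId = w.CustomerId;

-- #Wanted now holds the old NOT IN result, expressed with positive logic
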
  7. If you are writing an IN clause with literal values that will match more than a third of the table, consider breaking it into a UNION in which each branch handles only one or two of the literals. Each branch of the UNION gets its own plan, so the engine does not have to work through a large amount of data in any single branch, as sketched below.
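
A sketch with a hypothetical table; UNION ALL is used here because the branches cannot overlap, which avoids the duplicate-removal step a plain UNION would add:

-- Instead of: SELECT OrderId FROM dbo.Orders WHERE Status IN ('A', 'B', 'C', 'D')
SELECT OrderId FROM dbo.Orders WHERE Status IN ('A', 'B')
UNION ALL
SELECT OrderId FROM dbo.Orders WHERE Status IN ('C', 'D');
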
  8. Avoid using SELECT *; it returns more columns than you need, and if the underlying structure of the database changes it can break code.
    1. Also remember that the order of the columns in a table makes no difference to the database engine. When columns are added they are added at the end of the table, which is why it is a best practice to always specify the precise columns in the order they are to be returned.
    2. On INSERT statements, always specify the columns being inserted into: INSERT INTO dbo.MyTable (Column1, Column2) VALUES (1, 2)
  9. If parameters are being passed in that will be used as criteria against a table, make sure they have the same data type as the column they will be compared against, as in the sketch below.
    1. If a new table is created with a foreign key to another table, those columns need the same data types so that joins between them are more efficient.
      1. If a column needs to be added, it should use the smallest data type possible, because this helps speed up queries.
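
A small sketch of the matching data type point (assuming a hypothetical dbo.Orders.CustomerId column of type int):

-- Declare the parameter with the same type as the column it is compared against
DECLARE @CustomerId int = 42;    -- matches dbo.Orders.CustomerId

SELECT OrderId
FROM dbo.Orders
WHERE CustomerId = @CustomerId;  -- no implicit conversion needed
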
  10. If table-valued functions are used to return data to a query for joins, they should only return a small number of rows. The database engine cannot estimate how many rows the function might return, so if a large number of rows come back, performance will slow down.
    1. The same applies to table variables as well.
  11. NULLs are special cases. Whenever criteria are evaluated, the database engine evaluates each comparison to true, false, or unknown (NULL), and an unknown result is treated as false regardless of the operator. There is a setting (ANSI_NULLS) that changes this behavior so that NULL = NULL evaluates to true, but changing it in the connection string goes against ANSI standards and is a very bad practice. If there are NULLs in the columns used by a query's criteria, they must be handled explicitly unless the query only wants data with known values (see the sketch below).
    1. One way to reduce the number of NULL values is to have applications, load processes, or in some cases the columns themselves supply default values when none is provided.
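
To illustrate the NULL behavior (hypothetical table):

-- Rows where Color is NULL are NOT returned, even by <>
SELECT ProductId FROM dbo.Products WHERE Color <> 'Red';

-- Handle the NULLs explicitly if those rows are wanted
SELECT ProductId FROM dbo.Products WHERE Color <> 'Red' OR Color IS NULL;
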
  12. If a criterion uses a LIKE comparison, it will perform better when the first part of the pattern is static: WHERE Column1 LIKE 'aaaa%'. If there is an index on the column, the database engine can use the index to limit the amount of data it has to work through to complete the comparison.
    1. Avoid using functions on columns in criteria, because this can prevent the database engine from using indexes; WHERE LEFT(Column1, 4) = 'aaaa' will not perform as well as WHERE Column1 LIKE 'aaaa%'.
  13. If data is being loaded and some rows may be inserted while others are updated, it is more efficient to write a MERGE statement so that the database engine only has to read through the data once.
    1. If the load also needs to delete rows, T-SQL provides a way to do that within the MERGE statement as well (consult a senior DBA if you need more information on this). The sketch below shows all three actions.
    2. If the data being loaded comes from a file, loading it into a permanent staging table first is the recommended method. This allows for recoverability if issues arise during the load and allows useful indexes to already be established on the table.
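
A minimal MERGE sketch covering the insert, update, and optional delete actions (table and column names are made up):

MERGE dbo.Customers AS target
USING dbo.CustomerStage AS source
    ON target.CustomerId = source.CustomerId
WHEN MATCHED THEN
    UPDATE SET target.CustomerName = source.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, CustomerName)
    VALUES (source.CustomerId, source.CustomerName)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;    -- optional: removes rows that are missing from the load
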
  14. Avoid using deprecated features; this makes it more difficult to upgrade in the future. A full list of deprecated features for the current version of SQL Server and the previous two to three versions can be found at http://technet.microsoft.com/en-us/library/ms143729.aspx
  15. Avoid complex logic in triggers; a trigger should be a very quick bit of code that runs and finishes fast.
  16. Encapsulate more complex tasks in stored procedures rather than views; this allows the use of temp tables to break the complex logic into simpler pieces.
    1. Some applications may not allow stored procedures to be used. Provided the database is SQL Server 2005 or later, writing queries that use Common Table Expressions (CTEs) gives similar benefits for breaking the complex logic into smaller chunks, as sketched below.
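
A sketch of the CTE approach (all names hypothetical):

-- Each CTE acts as a named building block for the more complex logic
WITH RecentOrders AS
(
    SELECT CustomerId, OrderTotal
    FROM dbo.Orders
    WHERE OrderDate >= '20130101'
)
SELECT c.CustomerId, SUM(ro.OrderTotal) AS RecentTotal
FROM dbo.Customers AS c
INNER JOIN RecentOrders AS ro
    ON ro.CustomerId = c.CustomerId
GROUP BY c.CustomerId;
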
  17. Look at the existing indexes; the development and testing environments should have the same structure as production unless a change is currently being developed, so reviewing the existing indexes can help you structure queries more efficiently. Indexes with multiple columns, if they are to be used, need the criteria in the same order as the index. The first column should be the most selective column, meaning it has the lowest number of repeated values. Columns that are not used as criteria but are returned in the SELECT should be added as included columns (see the example below).
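
For example, an index built along those lines (hypothetical table; CustomerId assumed to be the most selective column):

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
ON dbo.Orders (CustomerId, OrderDate)    -- most selective column first
INCLUDE (OrderTotal);                    -- returned in the SELECT but not used as criteria
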
  18. Use of linked servers should be kept to a minimum, but if one is needed it is best to use a simple query to pull the data back from the linked server and store it in a temp table or a staging table rather than doing complex joins across the linked server.
  19. Always SET NOCOUNT ON as the first command in stored procedures.
  20. When writing stored procedures that issue DDL statements, group the DDL statements first in the stored procedure and then execute the DML statements.
  21. Use BEGIN/END blocks for stored procedures, IF logic, and WHILE loops so that code blocks are easier to identify.
  22. When calling stored procedures, it is always best to use the parameter names in the call:

EXEC dbo.myproc @var1 = 'test';
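
Putting items 19 through 22 together, a skeleton stored procedure might look like this (all names are hypothetical):

CREATE PROCEDURE dbo.LoadDailyOrders
    @LoadDate date
AS
BEGIN
    SET NOCOUNT ON;    -- item 19: first command in the procedure

    -- item 20: DDL grouped first
    CREATE TABLE #Staging (OrderId int NOT NULL);

    -- item 21: BEGIN/END makes the IF block easy to identify
    IF @LoadDate IS NULL
    BEGIN
        SET @LoadDate = CAST(GETDATE() AS date);
    END;

    -- DML after the DDL
    INSERT INTO #Staging (OrderId)
    SELECT OrderId
    FROM dbo.Orders
    WHERE OrderDate = @LoadDate;
END;
GO

-- item 22: call with the parameter name
EXEC dbo.LoadDailyOrders @LoadDate = '20131014';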


The Case of the Automated DBA

If you are thinking the title of this blog post sounds like an Erle Stanley Gardner book, that is where the inspiration comes from. Gardner wrote the books about Perry Mason and his cases that were the basis for the Raymond Burr television series. Sometimes as DBAs we wish we had a Perry Mason type figure to defend us, but we rarely ever get that.

In this story we’ll pretend that we have a defense attorney, because the prosecution is making a case to put us in jail for life. Maybe it isn’t quite that extreme, but for some people it feels that way. The prosecutor claims that the defendant (the DBAs) can’t keep the systems running and performing. Additionally, time and time again when they make changes, the changes are not done correctly and cause further impact. For all these charges the prosecution seeks the strictest punishment in the land: life in jail (AKA automate the DBA and fire the person).

I know many people out there have been in this position before; I have. At first I challenged the thought of automation. “How can they automate this?”, “They are going to automate me out of a job!”, and “No computer can do what I do!” were some of the thoughts and statements I made. The turning point for me came when I started seeing all the things that I couldn’t get done. Things like patching SQL Server, planning and executing migrations to newer versions of SQL Server, and sleeping at night, because I would get paged out for something that was a 5-second fix and, as I said at 3am, “A monkey could do this, why am I getting paged?”

So, a show of hands: how many people have been in that place before? It is time to be realistic about what automation can do to help us and to dispel the fear of it replacing us.

So why should we automate?

  1. Consistency
  2. Remove Human Error
  3. Reduce alerts
  4. Reduce repetitive tasks
  5. Improve reliability
  6. Sleep easier

What should be automated?

  1. SQL Installs
  2. Patching
  3. Anything that triggers an Alert and is fixed with a simple script
  4. Deployments (The great white whale of the DBA)

I plan to dive into these topics in more depth over time. What I want to illustrate is that while the prosecution is going overboard by asking to get rid of the DBA in favor of automation, DBAs are going overboard in refusing to consider it. We need to admit that there are things better suited to automation, and agree to a bargain with the prosecution that meets in the middle. In the end, the goal should be letting the DBA focus on bigger issues and reducing how many times they get called at night.