Recently a colleague asked a rather interesting question: should we use a temporary table instead of a Common Table Expression (CTE) for this problem? He then added, “CTEs are usually slower than temp tables”. For that specific problem – aggregating and filtering 10 records – a CTE seemed to be the right solution; however, this is not always the case. What if, instead of 10 records, we had to deal with 10,000? Would a CTE still be a valid solution? As always in Computer Science, there is no black-and-white answer to most problems; therefore, I decided to do a post on this subject to show the different characteristics of each object and hopefully improve our understanding of when to use each one. To make things a bit more interesting, I decided to add table variables to the mix.
Let’s start by defining and mentioning the characteristics of each data container:
Temporary tables (temp tables)
These are “almost full-featured” tables that are stored in tempdb and are valid within the scope in which they were created. There are two types of temporary tables: local (visible in the current session only) and global (visible to all sessions). Overall, think of temporary tables as regular tables (with some minor limitations) that are created in the tempdb database.
On the one hand, since temp tables are “regular” tables, they can benefit from indexing and statistics. This means that with temp tables we can effectively improve join performance (seek operations) and benefit from better execution plans produced by the SQL Server Query Optimizer (which takes advantage of the statistics). On the other hand, they are susceptible to the same issues suffered by ordinary tables: lock contention (e.g. deadlocks), more expensive insert/delete operations when indexes are present, index fragmentation, the cost of maintaining statistics, the inherited performance issues of the tempdb database, and so on.
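To make this concrete, here is a minimal sketch of a local temp table with a nonclustered index; the table and column names are hypothetical, not taken from any real schema:

```sql
-- Hypothetical example: a local temp table that benefits from indexing.
CREATE TABLE #Orders
(
    OrderID    INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID INT NOT NULL,
    Amount     DECIMAL(10,2) NOT NULL
);

-- Temp tables support nonclustered indexes, just like regular tables.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON #Orders (CustomerID);

INSERT INTO #Orders (CustomerID, Amount)
VALUES (1, 10.00), (2, 25.50), (1, 4.75);

-- The optimizer can use the index (seek) and auto-created statistics.
SELECT CustomerID, SUM(Amount) AS Total
FROM #Orders
GROUP BY CustomerID;

DROP TABLE #Orders;  -- or simply let it go out of scope
```

The `#` prefix makes it a local temp table; a `##` prefix would make it global and visible to all sessions.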
Table variables
These are “partially-featured” data types that are declared like a regular variable and behave as a table.
According to Microsoft they are minimally logged, have a well-defined scope, and are not affected by rollbacks; that is, they may need fewer resources and they are “self-contained”. However, they cannot be assigned to another table variable (e.g. it is not possible to do SET @tv_a = @tv_b), and they do not have distribution statistics.
It is important to note that there is a widespread misconception on the web about table variables. It is usually stated that table variables are “faster” than temporary tables because they are stored in memory. This is false: under the covers, table variables are also stored in tempdb.
(Table variables… not quite a data type per se, nor a full-featured table. At this point feel free to ask along with me: what the heck are you, Mr. Table Variable?)
Having said this, not everything is bad with table variables. Since the estimated cardinality of a table variable is always 1, they cause fewer recompilations when used in stored procedures (they have no recompilation threshold); they are automatically cleaned up; they can be used as rollback-immune objects; and they make it possible to create table-valued functions.
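The rollback-immune behaviour is worth a quick sketch. In this hypothetical example (the variable and message are made up for illustration), the row inserted inside the transaction survives the rollback:

```sql
-- Hypothetical example: a table variable is not affected by rollbacks.
DECLARE @ErrorLog TABLE
(
    LogID   INT IDENTITY(1,1),
    Message NVARCHAR(200)
);

BEGIN TRANSACTION;
    INSERT INTO @ErrorLog (Message) VALUES (N'Something went wrong');
ROLLBACK TRANSACTION;

-- The row is still here, because table variables ignore the rollback.
SELECT LogID, Message FROM @ErrorLog;
```

This makes table variables handy for capturing diagnostic information (such as error details) that must survive even when the surrounding transaction is rolled back.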
Common Table Expressions (CTEs)
These can be defined as temporary, named result sets that are available only within the single statement in which they are defined.
CTEs are not persistent objects; therefore, they are not cached and must be re-evaluated every time they are referenced. A CTE’s result set cannot be indexed and does not have distribution statistics.
Even though CTEs can provide more or less the same functionality as temp tables and table variables (as data containers), they should not be compared directly. CTEs are particularly useful for implementing recursive logic and make it easier to apply multiple transformations to a dataset. Think of CTEs as a nice alternative to derived-table queries.
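The recursive case is where CTEs really shine. A minimal sketch, assuming a hypothetical dbo.Employees table with EmployeeID and ManagerID columns, would look like this:

```sql
-- Hypothetical example: a recursive CTE walking an employee hierarchy.
WITH EmployeeTree (EmployeeID, ManagerID, Depth) AS
(
    -- Anchor member: top-level employees (no manager)
    SELECT EmployeeID, ManagerID, 0
    FROM dbo.Employees
    WHERE ManagerID IS NULL

    UNION ALL

    -- Recursive member: employees reporting to the previous level
    SELECT e.EmployeeID, e.ManagerID, t.Depth + 1
    FROM dbo.Employees AS e
    INNER JOIN EmployeeTree AS t ON e.ManagerID = t.EmployeeID
)
SELECT EmployeeID, ManagerID, Depth
FROM EmployeeTree
ORDER BY Depth;
```

Doing the same with a derived table is simply not possible, since a derived table cannot reference itself.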
We can put all this information into the following table:
|Feature/Object|Temp tables|Table variables|CTEs|
|---|---|---|---|
|Can be indexed|Yes|No|No|
|Has distribution statistics|Yes|No|No|
|Affected by rollbacks|Yes|No|No|
|Allows recursive logic|No|No|Yes|
|Has reduced scope|No|Yes|Yes|
Comparison of temp tables, table variables, and CTEs.
Do not use CTEs or table variables to manipulate large data sets (where “large” is probably anything above 100 records, though only measurement will tell). In that situation you are better off using temporary tables, as these can be indexed and have distribution statistics available (the former optimises the record-finding process, the latter enables the Query Optimizer to choose better query plans). With temporary tables, the right index, and the right data set, your joins will be faster!
Conversely, if the data set is smaller than N records (or N bytes), you might be better off using a table variable (or a CTE if the logic requires multiple result sets). The reason for using a table variable is not that they require fewer resources or are faster, but that their intrinsic limitations remove some of the overhead that a temporary table would otherwise add. Why would you use a fully featured object such as a temp table when all you need is to process a few records? Additionally, table variables make it possible to create table-valued functions, thereby expanding our tool-set as programmers.
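That last point deserves a sketch: a multi-statement table-valued function returns its result through a table variable, so table variables are what make this construct possible at all. The function, table, and column names below are hypothetical:

```sql
-- Hypothetical example: a multi-statement table-valued function.
-- Its return type is a table variable (@Result).
CREATE FUNCTION dbo.GetTopCustomers (@MinTotal DECIMAL(10,2))
RETURNS @Result TABLE (CustomerID INT, Total DECIMAL(10,2))
AS
BEGIN
    INSERT INTO @Result (CustomerID, Total)
    SELECT CustomerID, SUM(Amount)
    FROM dbo.Orders
    GROUP BY CustomerID
    HAVING SUM(Amount) >= @MinTotal;
    RETURN;
END;
```

The function can then be queried like a table, e.g. `SELECT * FROM dbo.GetTopCustomers(100.00);`.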
Regarding CTEs, I personally find them quite powerful for replacing some sequential logic (implemented through evil cursors) with set-based logic. They are also simpler and easier to maintain than derived-table queries. That is, from my perspective, one should use CTEs when there is a need to recurse or to improve readability (compared with derived tables).
In conclusion, it is important to understand the features of each object and the particulars of the problem at hand in order to decide which object is the best option. Hopefully, in a subsequent post I will add some performance metrics to back up what has been said here.
I hope this helps!