A common scenario in many client-server applications is allowing the end user to dictate the sort order of results. Some people want to see the lowest priced items first, some want to see the newest items first, and some want to see them alphabetically. This is a complex thing to achieve in Transact-SQL because you can't just say:
CREATE PROCEDURE dbo.SortOnSomeTable @SortColumn NVARCHAR(128) = N'key_col', @SortDirection VARCHAR(4) = 'ASC' AS BEGIN ... ORDER BY @SortColumn; -- or ... ORDER BY @SortColumn @SortDirection; END GO
This is because T-SQL does not allow variables in these locations. If you just use @SortColumn, you receive:
The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.
(And when the error message says, "an expression referencing a column name," you might find it ambiguous, and I agree. But I can assure you that this does not mean a variable is a suitable expression.)
If you try to append @SortDirection, the error message is a little more opaque:
Incorrect syntax near '@SortDirection'.
There are a few ways around this, and your first instinct might be to use dynamic SQL, or to introduce the CASE expression. But as with most things, there are complications that can force you down one path or another. So which one should you use? Let's explore how these solutions might work, and compare the impacts on performance for a few different approaches.
Sample Data
Using a catalog view we all probably understand quite well, sys.all_objects, I created the following table based on a cross join, limiting the table to 100,000 rows (I wanted data that filled many pages but that didn't take significant time to query and test):
CREATE DATABASE OrderBy; GO USE OrderBy; GO SELECT TOP (100000) key_col = ROW_NUMBER() OVER (ORDER BY s1.[object_id]), -- a BIGINT with clustered index s1.[object_id], -- an INT without an index name = s1.name -- an NVARCHAR with a supporting index COLLATE SQL_Latin1_General_CP1_CI_AS, type_desc = s1.type_desc -- an NVARCHAR(60) without an index COLLATE SQL_Latin1_General_CP1_CI_AS, s1.modify_date -- a datetime without an index INTO dbo.sys_objects FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2 ORDER BY s1.[object_id];
(The COLLATE trick is because many catalog views have different columns with different collations, and this ensures that the two columns will match for the purposes of this demo.)
Then I created a typical clustered / non-clustered index pair that might exist on such a table, prior to optimization (I can't use object_id for the key, because the cross join creates duplicates):
CREATE UNIQUE CLUSTERED INDEX key_col ON dbo.sys_objects(key_col); CREATE INDEX name ON dbo.sys_objects(name);
Use Cases
As mentioned above, users may want to see this data ordered in a variety of ways, so let's set out some typical use cases we want to support (and by support, I mean demonstrate):
- Ordered by key_col ascending ** default if user doesn't care
- Ordered by object_id (ascending/descending)
- Ordered by name (ascending/descending)
- Ordered by type_desc (ascending/descending)
- Ordered by modify_date (ascending/descending)
We'll leave the key_col ordering as the default because it should be the most efficient if the user doesn't have a preference; since the key_col is an arbitrary surrogate that should mean nothing to the user (and may not even be exposed to them), there is no reason to allow reverse sorting on that column.
Approaches That Don't Work
The most common approach I see when someone first starts to tackle this problem is introducing control-of-flow logic to the query. They expect to be able to do this:
SELECT key_col, [object_id], name, type_desc, modify_date FROM dbo.sys_objects ORDER BY IF @SortColumn = 'key_col' key_col IF @SortColumn = 'object_id' [object_id] IF @SortColumn = 'name' name ... IF @SortDirection = 'ASC' ASC ELSE DESC;
This obviously doesn't work. Next I see CASE being introduced incorrectly, using similar syntax:
SELECT key_col, [object_id], name, type_desc, modify_date FROM dbo.sys_objects ORDER BY CASE @SortColumn WHEN 'key_col' THEN key_col WHEN 'object_id' THEN [object_id] WHEN 'name' THEN name ... END CASE @SortDirection WHEN 'ASC' THEN ASC ELSE DESC END;
This is closer, but it fails for two reasons. One is that CASE is an expression that returns exactly one value of a specific data type; this merges data types that are incompatible and therefore will break the CASE expression. The other is that there is no way to conditionally apply the sort direction this way without using dynamic SQL.
Approaches That Do Work
The three primary approaches I've seen are as follows:
Group compatible types and directions together
In order to use CASE with ORDER BY, there must be a distinct expression for each combination of compatible types and directions. In this case we would have to use something like this:
CREATE PROCEDURE dbo.Sort_CaseExpanded @SortColumn NVARCHAR(128) = N'key_col', @SortDirection VARCHAR(4) = 'ASC' AS BEGIN SET NOCOUNT ON; SELECT key_col, [object_id], name, type_desc, modify_date FROM dbo.sys_objects ORDER BY CASE WHEN @SortDirection = 'ASC' THEN CASE @SortColumn WHEN 'key_col' THEN key_col WHEN 'object_id' THEN [object_id] END END, CASE WHEN @SortDirection = 'DESC' THEN CASE @SortColumn WHEN 'key_col' THEN key_col WHEN 'object_id' THEN [object_id] END END DESC, CASE WHEN @SortDirection = 'ASC' THEN CASE @SortColumn WHEN 'name' THEN name WHEN 'type_desc' THEN type_desc END END, CASE WHEN @SortDirection = 'DESC' THEN CASE @SortColumn WHEN 'name' THEN name WHEN 'type_desc' THEN type_desc END END DESC, CASE WHEN @SortColumn = 'modify_date' AND @SortDirection = 'ASC' THEN modify_date END, CASE WHEN @SortColumn = 'modify_date' AND @SortDirection = 'DESC' THEN modify_date END DESC; END
You might say, wow, that's an ugly bit of code, and I would agree with you. I think this is why a lot of folks cache their data on the front end and let the presentation tier deal with juggling it around in different orders. :-)
You can collapse this logic a little bit further by converting all the non-string types into strings that will sort correctly, e.g.
CREATE PROCEDURE dbo.Sort_CaseCollapsed @SortColumn NVARCHAR(128) = N'key_col', @SortDirection VARCHAR(4) = 'ASC' AS BEGIN SET NOCOUNT ON; SELECT key_col, [object_id], name, type_desc, modify_date FROM dbo.sys_objects ORDER BY CASE WHEN @SortDirection = 'ASC' THEN CASE @SortColumn WHEN 'key_col' THEN RIGHT('000000000000' + RTRIM(key_col), 12) WHEN 'object_id' THEN RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0') + REPLICATE('0', 23) + RTRIM([object_id]), 24) WHEN 'name' THEN name WHEN 'type_desc' THEN type_desc WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120) END END, CASE WHEN @SortDirection = 'DESC' THEN CASE @SortColumn WHEN 'key_col' THEN RIGHT('000000000000' + RTRIM(key_col), 12) WHEN 'object_id' THEN RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0') + REPLICATE('0', 23) + RTRIM([object_id]), 24) WHEN 'name' THEN name WHEN 'type_desc' THEN type_desc WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120) END END DESC; END
Still, it's a pretty ugly mess, and you have to repeat the expressions twice to deal with the different sort directions. I would also suspect that using OPTION RECOMPILE on that query would prevent you from being stung by parameter sniffing. Except in the default case, it's not like the majority of the work being done here is going to be compilation.
Apply a rank using window functions
I discovered this neat trick from AndriyM, though it is most useful in cases where all of the potential ordering columns are of compatible types, otherwise the expression used for ROW_NUMBER() is equally complex. The most clever part is that in order to switch between ascending and descending order, we simply multiply the ROW_NUMBER() by 1 or -1. We can apply it in this situation as follows:
CREATE PROCEDURE dbo.Sort_RowNumber @SortColumn NVARCHAR(128) = N'key_col', @SortDirection VARCHAR(4) = 'ASC' AS BEGIN SET NOCOUNT ON; ;WITH x AS ( SELECT key_col, [object_id], name, type_desc, modify_date, rn = ROW_NUMBER() OVER ( ORDER BY CASE @SortColumn WHEN 'key_col' THEN RIGHT('000000000000' + RTRIM(key_col), 12) WHEN 'object_id' THEN RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0') + REPLICATE('0', 23) + RTRIM([object_id]), 24) WHEN 'name' THEN name WHEN 'type_desc' THEN type_desc WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120) END ) * CASE @SortDirection WHEN 'ASC' THEN 1 ELSE -1 END FROM dbo.sys_objects ) SELECT key_col, [object_id], name, type_desc, modify_date FROM x ORDER BY rn; END GO
Again, OPTION RECOMPILE can help here. Also, you might notice in some of these cases that ties are handled differently by the various plans – when ordering by name, for example, you will usually see key_col come through in ascending order within each set of duplicate names, but you may also see the values mixed up. To provide more predictable behavior in the event of ties, you can always add an additional ORDER BY clause. Note that if you were to add key_col to the first example, you'll need to make it an expression so that key_col is not listed in the ORDER BY twice (you can do this using key_col + 0, for example).
Dynamic SQL
A lot of people have reservations about dynamic SQL – it's impossible to read, it's a breeding ground for SQL injection, it leads to plan cache bloat, it defeats the purpose of using stored procedures… Some of these are simply untrue, and some of them are easy to mitigate. I've added some validation here that could just as easily be added to any of the above procedures:
CREATE PROCEDURE dbo.Sort_DynamicSQL @SortColumn NVARCHAR(128) = N'key_col', @SortDirection VARCHAR(4) = 'ASC' AS BEGIN SET NOCOUNT ON; -- reject any invalid sort directions: IF UPPER(@SortDirection) NOT IN ('ASC','DESC') BEGIN RAISERROR('Invalid parameter for @SortDirection: %s', 11, 1, @SortDirection); RETURN -1; END -- reject any unexpected column names: IF LOWER(@SortColumn) NOT IN (N'key_col', N'object_id', N'name', N'type_desc', N'modify_date') BEGIN RAISERROR('Invalid parameter for @SortColumn: %s', 11, 1, @SortColumn); RETURN -1; END SET @SortColumn = QUOTENAME(@SortColumn); DECLARE @sql NVARCHAR(MAX); SET @sql = N'SELECT key_col, [object_id], name, type_desc, modify_date FROM dbo.sys_objects ORDER BY ' + @SortColumn + ' ' + @SortDirection + ';'; EXEC sp_executesql @sql; END
Performance Comparisons
I created a wrapper stored procedure for each procedure above, so that I could easily test all scenarios. The four wrapper procedures look like this, with the procedure name varying of course:
CREATE PROCEDURE dbo.Test_Sort_CaseExpanded AS BEGIN SET NOCOUNT ON; EXEC dbo.Sort_CaseExpanded; -- default EXEC dbo.Sort_CaseExpanded N'name', 'ASC'; EXEC dbo.Sort_CaseExpanded N'name', 'DESC'; EXEC dbo.Sort_CaseExpanded N'object_id', 'ASC'; EXEC dbo.Sort_CaseExpanded N'object_id', 'DESC'; EXEC dbo.Sort_CaseExpanded N'type_desc', 'ASC'; EXEC dbo.Sort_CaseExpanded N'type_desc', 'DESC'; EXEC dbo.Sort_CaseExpanded N'modify_date', 'ASC'; EXEC dbo.Sort_CaseExpanded N'modify_date', 'DESC'; END
And then using SQL Sentry Plan Explorer, I generated actual execution plans (and the metrics to go along with it) with the following queries, and repeated the process 10 times to sum up total duration:
DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE; EXEC dbo.Test_Sort_CaseExpanded; --EXEC dbo.Test_Sort_CaseCollapsed; --EXEC dbo.Test_Sort_RowNumber; --EXEC dbo.Test_Sort_DynamicSQL; GO 10
I also tested the first three cases with OPTION RECOMPILE (doesn't make much sense for the dynamic SQL case, since we know it will be a new plan each time), and all four cases with MAXDOP 1 to eliminate parallelism interference. Here are the results:
Conclusion
For outright performance, dynamic SQL wins every time (though only by a small margin on this data set). The ROW_NUMBER() approach, while clever, was the loser in each test (sorry AndriyM).
It gets even more fun when you want to introduce a WHERE clause, never mind paging. These three are like the perfect storm for introducing complexity to what starts out as a simple search query. The more permutations your query has, the more likely you'll want to throw readability out the window and use dynamic SQL in combination with the "optimize for ad hoc workloads" setting to minimize the impact of single-use plans in your plan cache.
The post Conditional Order By appeared first on SQLPerformance.com.