When the developer was testing the conversion function, he noticed that the query slowed waaaayyyyy down when he added AT TIME ZONE. Before adding AT TIME ZONE to the query, STATISTICS TIME for the query was: CPU time: 145549 ms, elapsed time: 21693 ms.. It returned 8,996 rows, but if I removed the DISTINCT, it returned over 72M rows. That’s a lot of clams … er, data.
Select Distinct Convert(datetime, KTime, 0), DDate, Category From dbo.DataTable Where RecordType = 1;
After adding AT TIME ZONE, STATISTICS TIME for the query was: CPU time: 3596391 ms, elapsed time: 470998 ms.. STATISTICS IO was the same for both queries, but the time went from less than 22 seconds to almost 8 minutes.
Select Distinct Convert(datetime, KTime at time zone 'EASTERN STANDARD TIME', 0), DDate, Category From dbo.DataTable Where RecordType = 1;
And the plans look almost identical. No scans instead of seeks, no extra operators. Nothing that would lead you to expect the performance to be that different.
So what could make it run so much slower? Maybe there’s a clue on
AT TIME ZONE implementation relies on a Windows mechanism to convert datetime values across time zones.
SQL Server is not performing the time zone conversion, it is shelling out to the OS and having it perform the conversion. And if you know anything about how functions in the SELECT clause are handled, the conversion is done once per row. It is shelling out to the OS 72M+ times to convert the data. YOWZA!
I queried sys.dm_exec_session_wait_stats after running the query expecting to see some PREEMPTIVE_xxxx waits, but I did not. It shows that most of the rows were waiting on CXPACKET waits:
I thought if I could get it to perform the DISTINCT before it performed the conversion, it would be much faster. Testing that theory proved better than I anticipated. I moved the entire CONVERT() and AT TIME ZONE outside of the DISTINCT, and the whole query was actually 2 seconds faster than the original query even with the AT TIME ZONE clause.
Normally, as a best practice, I would say not to use functions in the SELECT list if you can avoid it, but quite often you cannot. And quite often the recordset returned is small enough that it doesn’t matter. If you have to have functions in the SELECT list (especially with a function as bad as AT TIME ZONE), then do what you can to reduce the number of rows on which it has to operate.
That of course led me to this final question for the developer: do you really need to return this much data? He answered that he does not, but he wanted a simple query with which to test the clause.
EDIT: Submitted a Connect item for this after talking to Adam Machanic (blog | @AdamMachanic): https://connect.microsoft.com/SQLServer/feedback/details/3144414.