Users are horrible, I know
You can just never rely on them to pass in reasonable search strings. When you write stored procedures to retrieve information for them, it’s really tempting to define all your string variables as MAX length. As long as the actual data type, N/VARCHAR matches, you’re cool, right? Avoid that pesky implicit conversion and live another day.
Well, not really. Let’s test it out. We’ll be using StackOverflow, and hitting the Users table. We have a column, DisplayName, that’s an NVARCHAR(40), which sets us up well enough to demo.
My Favorite Martian
There’s a user with the handle Eggs McLaren, which cracks me up. It reminds me of this old Red Stripe commercial. I use him for all my demos, even though he doesn’t really exist.
So what happens when we go looking for Eggs?
DECLARE @DisplayName NVARCHAR(MAX) = 'Eggs McLaren' SELECT * FROM dbo.Users AS u WHERE u.DisplayName = @DisplayName GO
We get the correct results, but the execution plan has some bad news for us.
In short, we read everything in the clustered index. This could be mitigated with a smaller index, sure, but you’d still read all 5.2 million rows of it, and pass them into a filter operator. I’m using the clustered index here to highlight why this can be extra bad. We read and passed 22 GB of data into that filter operator, just to get one row out.
Why is this bad, and when is it different?
SQL Server makes many good and successful attempts at something called predicate pushdown, or predicate pushing. This is where certain filter conditions are applied directly to the data access operation. It can sometimes prevent reading all the rows in a table, depending on index structure and if you’re searching on an equality vs. a range, or something else.
What it’s really good for is limiting data movement. When rows are filtered at access time, you avoid needing to pass them all to a separate operator in order to reduce them to the rows you’re actually interested in. Fun for you! Be extra cautious of filter operators happening really late in your execution plans.
Even adjusting the variable type to NVARCHAR(4000) gets us there. If your users need to pass in search strings longer than 4000 characters, you have some serious thinking to do.
DECLARE @DisplayName NVARCHAR(4000) = 'Eggs McLaren' SELECT * FROM dbo.Users AS u WHERE u.DisplayName = @DisplayName GO
Rather than 22 GB, we’re looking at passing 4850 bytes to the next operator. This seems much more optimal to me. Next up is figuring out which columns we actually need, so we’re not running SELECT * every time.
That’s another bummer.
Thanks for reading!