Most of the time, I love Entity Framework, and ORMs in general. These tools make it easier for companies to ship applications. Are the apps perfect? Of course not – but they’re good enough to get to market, bring in revenue to pay salaries, and move a company forward.
However, just like any tool, if you don’t know how to use it, you’re gonna get hurt.
One classic example popped up again last month with a client who’d used EF Core to design their database for them. The developers just had to say which columns were numbers, dates, or strings, and EF Core handled the rest.
But if you create a string without specifying its length, EF defaults to using NVARCHAR(MAX). That is not a bug. That is by design, and it’s explained in the documentation, which describes SQL Server as the one mapping string properties to nvarchar(max) columns.
The wording on that really pisses me off because NO, IT IS NOT SQL SERVER DOING THIS MAPPING. There is absolutely nothing in the database engine that’s saying strings are nvarchar(max). This is an Entity Framework problem, so stop pinning the blame on the innocent database engine.
The documentation goes on to explain how you can manually set the right column lengths, and I’ve sat through several development conference sessions that emphasize how important it is for developers to do that. The problem here is that most folks don’t read the documentation, let alone attend conferences to learn how to use their tools. (I don’t blame the folks – I blame the companies who are pressuring developers to ship quickly without training.)
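If you’ve inherited a database that EF designed, you can get a rough idea of how widespread the problem is by asking SQL Server which columns ended up as NVARCHAR(MAX). This is a minimal sketch, not something from the EF documentation: it relies on sys.columns reporting a max_length of -1 for MAX datatypes, and it only looks at user tables in the current database.

```sql
/* List every NVARCHAR(MAX) column in the current database.
   sys.columns reports max_length = -1 for the MAX datatypes. */
SELECT  SCHEMA_NAME(t.schema_id) AS SchemaName,
        t.name                   AS TableName,
        c.name                   AS ColumnName
FROM    sys.columns AS c
JOIN    sys.tables  AS t ON t.object_id = c.object_id
WHERE   c.max_length = -1
  AND   TYPE_NAME(c.system_type_id) = N'nvarchar'
ORDER BY SchemaName, TableName, c.column_id;
```

Not every column on that list is wrong, since some strings really are long, but anything holding names, codes, or email addresses is a good candidate for an explicit length.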
Demoing One of the Problems It Causes
Let’s create a table with two string columns – one NVARCHAR(100) and one NVARCHAR(MAX). Then, let’s load a million rows into it, putting the same contents in both the short and long columns.
DROP TABLE IF EXISTS dbo.Test;
CREATE TABLE dbo.Test
    (Id INT PRIMARY KEY CLUSTERED,
     ShortString NVARCHAR(100),
     LongString NVARCHAR(MAX));
GO
/* GENERATE_SERIES requires SQL Server 2022 or newer */
INSERT INTO dbo.Test (Id, ShortString, LongString)
SELECT value, N'Brent Ozar', N'Brent Ozar'
FROM generate_series(1, 1000000);
Then, we’ll run identical queries against the short & long string version:
SELECT TOP 250 ShortString, Id
FROM dbo.Test
ORDER BY ShortString;

SELECT TOP 250 LongString, Id
FROM dbo.Test
ORDER BY LongString;
And review their actual execution plans:
The bottom query is the one that hits the NVARCHAR(MAX) column. Your first signs of danger are the yellow bang on the SELECT, and the 99% query cost estimate on the second query, indicating that SQL Server thinks the NVARCHAR(MAX) one is going to be quite a bit more expensive. However, as is often the case with SQL Server, the really big danger isn’t even shown visually.
Hover your mouse over each SELECT operator, and you’ll get a popup tooltip. One of the lines on that tooltip will say Memory Grant. Here’s the one for the NVARCHAR(100) query:
When the datatype is NVARCHAR(100), SQL Server allocates 210MB of memory to run the query because it believes it won’t need too much memory to sort that small of a column. However, check the same metric on the NVARCHAR(MAX) query:
Because SQL Server thinks that the contents of NVARCHAR(MAX) columns are larger, this query gets a 5GB memory grant, about 24x larger. Depending on your server size and workloads, this can be a real problem because the more queries you have running simultaneously, the quicker your database server will run out of memory.
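You don’t have to hover over plans one at a time to catch this. As a rough sketch, assuming you have VIEW SERVER STATE permission, the query below lists the memory grants for queries that are running right now, biggest request first. It only shows in-flight queries, so a quick demo query like the ones above may finish before you can catch it.

```sql
/* Show memory grants for currently executing queries,
   largest request first. Values are converted from KB to MB. */
SELECT  session_id,
        requested_memory_kb / 1024.0 AS requested_mb,
        granted_memory_kb   / 1024.0 AS granted_mb,
        used_memory_kb      / 1024.0 AS used_mb,
        max_used_memory_kb  / 1024.0 AS max_used_mb
FROM    sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;
```

A big gap between granted_mb and max_used_mb is the tell: the query asked for far more memory than it ever touched.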
There’s a Fast Fix. It Doesn’t Work Here.
If you frequently search or order by a column, all you have to do is index it, right? Let’s try it:
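The exact statements aren’t shown here, so treat this as a sketch of what that attempt would look like. The index names are made up for illustration.

```sql
/* Expected to succeed: NVARCHAR(100) is a valid index key column. */
CREATE INDEX IX_ShortString ON dbo.Test (ShortString);
GO
/* Expected to fail: NVARCHAR(MAX) cannot be used as an index key column. */
CREATE INDEX IX_LongString ON dbo.Test (LongString);
GO
```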
The ShortString index gets created – but SQL Server can’t create an index on LongString because we can’t use NVARCHAR(MAX) as a key column in an index.
That’s a bummer, and you could argue that it’s a SQL Server limitation that could be fixed. For example, you can already do the work by hand by creating an indexed view on a shortened version of the column:
CREATE OR ALTER VIEW dbo.Test_View
WITH SCHEMABINDING AS
SELECT Id, CAST(LongString AS NVARCHAR(100)) AS LongString_Truncated
FROM dbo.Test;
GO
CREATE UNIQUE CLUSTERED INDEX CLIX
    ON dbo.Test_View (LongString_Truncated, Id);
There’s nothing to say that SQL Server couldn’t do similar work in order to index abridged versions of NVARCHAR(MAX) columns when it could check to see if there are any truly long values in that column. It just doesn’t, though, and I understand that it would require Microsoft to do some work.
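The same idea also works with a computed column directly on the table. This is a minimal sketch: the column name LongString_Short and the index name are hypothetical, and it assumes nothing in the column is actually longer than 100 characters, because the CAST silently truncates anything that is.

```sql
/* Add a computed column holding the first 100 characters,
   then index it like any other NVARCHAR(100) column. */
ALTER TABLE dbo.Test
    ADD LongString_Short AS CAST(LongString AS NVARCHAR(100));
GO
CREATE INDEX IX_LongString_Short ON dbo.Test (LongString_Short);
GO
/* Queries should reference the computed column to use the index. */
SELECT TOP 250 LongString_Short, Id
FROM dbo.Test
ORDER BY LongString_Short;
```

Either way, you’re maintaining a second copy of the data just to work around a datatype that should have had a length in the first place.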
It’s much easier for Microsoft to say, “Yo, EF developers, read the docs and you’ll never have problems like this.” To some extent, that’s fair, because I can see how someone would expect people to be well-trained on the tools they use every day to do their jobs. This is only one tiny example of the many, many problems you can hit with Entity Framework if you don’t get trained on how to use it first.
If you’re looking for training resources, start with anything by Julie Lerman in whatever way you enjoy consuming training material – videos, books, etc.