I wanna dance with common problems
One of the most common issues that I’ve seen with Entity Framework isn’t technically an Entity Framework problem at all. The N + 1 problem is an anti-pattern found in ORMs in general, and it most often occurs with lazy loading. There was a lot going on in that sentence, so let’s break it down.
The N + 1 problem occurs when an application gets data from the database, then loops through the results and runs another query for each row. That means we call the database again and again and again. In total, the application calls the database once for every row returned by the first query (N) plus the original query (+ 1).
All of those calls could have been accomplished in one simple call. Let’s look at an example in code:
using (var context = new StackOverflowContext())
{
    var posts = context.Posts
        .Where(t => t.PostTags.Any(pt => pt.Tag == "sqlbulkcopy"))
        .Select(p => p);

    foreach (var post in posts)
    {
        foreach (var linkPost in post.LinkedPosts)
        {
            // Do something important.
        }
    }
}
Here’s the SQL generated from this code:
SELECT
    [Extent1].[Id] AS [Id],
    /* All columns from the Post table are in the SELECT. Extra columns removed for brevity */
    [Extent1].[TagsVarchar] AS [TagsVarchar]
FROM [dbo].[Posts] AS [Extent1]
WHERE EXISTS (
    SELECT 1 AS [C1]
    FROM [dbo].[PostTags] AS [Extent2]
    WHERE ([Extent1].[Id] = [Extent2].[PostId])
      AND (N'sqlbulkcopy' = [Extent2].[Tag])
)
In this example, we’re getting rows from the Posts table that have a matching row in the PostTags table where the Tag equals “sqlbulkcopy”. The problem starts to occur in this line:
foreach (var linkPost in post.LinkedPosts)
Do you see it?
The problem is that in our original query we’re not getting data from the LinkedPosts entity, just data from Posts (filtered through PostTags). Entity Framework knows that it doesn’t have the data for the LinkedPosts entity, so it very kindly gets the data from the database for each row in the query results.
Whoops!
Obviously, making multiple calls to the database instead of one call for the same data is slower. This is a perfect example of RBAR (row by agonizing row) processing.
This is the SQL that Entity Framework generates each time the loop reads post.LinkedPosts:
exec sp_executesql N'SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[CreationDate] AS [CreationDate],
    [Extent1].[PostId] AS [PostId],
    [Extent1].[RelatedPostId] AS [RelatedPostId],
    [Extent1].[LinkTypeId] AS [LinkTypeId]
    FROM [dbo].[PostLinks] AS [Extent1]
    WHERE [Extent1].[PostId] = @EntityKeyValue1',
    N'@EntityKeyValue1 int',
    @EntityKeyValue1=23868934
This query is sent to SQL Server 449 times, and the only thing that’s changing is the @EntityKeyValue1 value.
Ugh.
How can we fix it?
There is one fast way. It’s not optimal, but it will be better! Use an Include (also called eager loading) in the LINQ statement. Using an Include will add ALL of the data from the LinkedPosts entity, but it’s a simple fix without much retesting. Who likes testing code? No one. That’s why companies pay through the nose for software that does it automatically.
// The lambda overload of Include needs: using System.Data.Entity;
var posts = context.Posts
    .Where(t => t.PostTags.Any(pt => pt.Tag == "sqlbulkcopy"))
    .Include(p => p.LinkedPosts)
    .Select(p => p);
Now when the loop reads post.LinkedPosts, each Post already has all of its LinkedPosts data loaded, so Entity Framework will not make any additional calls to the database. That’s a good thing, right? Databases are cranky. That’s why DBAs are cranky.
Here’s the SQL that’s generated:
SELECT
    [Project2].[Id] AS [Id],
    /* All columns from the Post table are in the SELECT. Extra columns removed for brevity */
    [Project2].[LinkTypeId] AS [LinkTypeId]
FROM (
    SELECT
        [Extent1].[Id] AS [Id],
        /* All columns from the Post table are in the SELECT. Extra columns removed for brevity */
        [Extent1].[TagsVarchar] AS [TagsVarchar],
        [Extent2].[Id] AS [Id1],
        [Extent2].[CreationDate] AS [CreationDate1],
        [Extent2].[PostId] AS [PostId],
        [Extent2].[RelatedPostId] AS [RelatedPostId],
        [Extent2].[LinkTypeId] AS [LinkTypeId],
        CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
    FROM [dbo].[Posts] AS [Extent1]
    LEFT OUTER JOIN [dbo].[PostLinks] AS [Extent2]
        ON [Extent1].[Id] = [Extent2].[PostId]
    WHERE EXISTS (
        SELECT 1 AS [C1]
        FROM [dbo].[PostTags] AS [Extent3]
        WHERE ([Extent1].[Id] = [Extent3].[PostId])
          AND (N'sqlbulkcopy' = [Extent3].[Tag])
    )
) AS [Project2]
ORDER BY [Project2].[Id] ASC, [Project2].[C1] ASC
See what I mean by it not being optimal? We could rewrite the LINQ statement to have it generate a more optimal query, but that’s not the point of this post. If the performance of the query isn’t satisfactory, you can go down the route of rewriting the LINQ statement.
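If you want to put rough numbers on the lazy and eager versions, here’s a minimal sketch that counts round trips instead of talking to a real database. It doesn’t use Entity Framework at all: FakeDatabase, NPlusOneDemo, and every method on them are hypothetical names made up for this illustration, and the 449 is simply the post count from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database. It holds no real data worth
// reading; its only job is to count how many times it gets called.
public sealed class FakeDatabase
{
    private readonly Dictionary<int, List<int>> _linksByPostId;

    public int RoundTrips { get; private set; }

    public FakeDatabase(int postCount)
    {
        // Give every post a single link so the loops have something to read.
        _linksByPostId = Enumerable.Range(1, postCount)
            .ToDictionary(id => id, id => new List<int> { id + 1000 });
    }

    // One round trip: fetch the posts only.
    public List<int> GetPostIds()
    {
        RoundTrips++;
        return _linksByPostId.Keys.ToList();
    }

    // One round trip per call: fetch the links for a single post.
    public List<int> GetLinksForPost(int postId)
    {
        RoundTrips++;
        return _linksByPostId[postId];
    }

    // One round trip: fetch every post together with its links.
    public Dictionary<int, List<int>> GetPostsWithLinks()
    {
        RoundTrips++;
        return _linksByPostId;
    }
}

public static class NPlusOneDemo
{
    // Mimics lazy loading: one query for the posts, then one more per post.
    public static int CountLazyRoundTrips(int postCount)
    {
        var db = new FakeDatabase(postCount);
        foreach (var postId in db.GetPostIds())
        {
            foreach (var link in db.GetLinksForPost(postId))
            {
                // Do something important.
            }
        }
        return db.RoundTrips;
    }

    // Mimics eager loading: posts and links arrive in a single call.
    public static int CountEagerRoundTrips(int postCount)
    {
        var db = new FakeDatabase(postCount);
        foreach (var post in db.GetPostsWithLinks())
        {
            foreach (var link in post.Value)
            {
                // Do something important.
            }
        }
        return db.RoundTrips;
    }

    public static void Main()
    {
        Console.WriteLine(CountLazyRoundTrips(449));  // prints 450 (N + 1)
        Console.WriteLine(CountEagerRoundTrips(449)); // prints 1
    }
}
```

The sketch only counts calls. It says nothing about how heavy the single eager query is, which is exactly the trade-off with the Include fix.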
How can we find N + 1 issues?
Not to toot the company horn (but I’m totally going to), one of my favorite ways to find N + 1 problems from the database is by using sp_BlitzCache. After running sp_BlitzCache @SortOrder = 'executions', I get this:
Look at those tasty executions!
Captain, I think we found the problem. Now, it doesn’t tell me what line of code is causing the issue, but it does give the SQL statement. I’m sure if you work with the devs, you can figure out where the problem is and fix it. Having the problem statement makes searching the code base a little easier, and there’s a good chance someone will recognize where it comes from.