
A Common Query Error


So Many Choices

When you only need stuff from a table where there’s matching (or not!) data in another table, the humble query writer has many choices.

  • Joins
  • In/Not In
  • Exists/Not Exists

No, this isn’t about how NOT IN breaks down in the presence of NULLs, nor is it a performance comparison of the possibilities.

Wrong Turn At Albuquerque

This is a more fundamental problem that I see people running into quite often: dealing with duplicates.

Take this query, and the results…

SELECT   TOP 100
         u.DisplayName
FROM     dbo.Users AS u
JOIN     dbo.Posts AS p
ON       u.Id = p.OwnerUserId
ORDER BY u.Id;

D-D-D

It produces a number of duplicates!

This is what we’d want if we were getting any data from the Posts table, aggregating it, or if we needed it to join off somewhere else.

But we’re not, and now we’re going to make a very common mistake: We’re going to change the wrong part of our query.

A Million

A touch of distinct…

SELECT   DISTINCT TOP 100
         u.DisplayName
FROM     dbo.Users AS u
JOIN     dbo.Posts AS p
ON       u.Id = p.OwnerUserId
ORDER BY u.Id;

Oh, but..

Msg 145, Level 15, State 1, Line 18
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

Maybe though…

SELECT   TOP 100
         u.DisplayName
FROM     dbo.Users AS u
JOIN     dbo.Posts AS p
ON       u.Id = p.OwnerUserId
GROUP BY u.DisplayName
ORDER BY u.Id;

Msg 8127, Level 16, State 1, Line 31
Column "dbo.Users.Id" is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.

Yeah nah.

This is when I start to see all sorts of creative stuff, like ordering by MIN or MAX, wild subqueries, temp tables, dynamic SQL.

Calm down.

Yelling Geronimo

Maybe a join isn’t what you’re after. Maybe you need something else.

We got you covered.

SELECT   TOP 100
         u.DisplayName
FROM     dbo.Users AS u
WHERE    EXISTS ( SELECT 1 / 0
                  FROM dbo.Posts AS p
                  WHERE p.OwnerUserId = u.Id )
ORDER BY u.Id;

This results in an already distinct list of Display Names that can be ordered without trial or tribulation.

Figuring.

A subquery would also work here.
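If you prefer the IN flavor, here’s roughly what that might look like – same idea, just a different spelling, though per the style note below I’d still reach for EXISTS when a subquery is involved:

SELECT   TOP 100
         u.DisplayName
FROM     dbo.Users AS u
WHERE    u.Id IN ( SELECT p.OwnerUserId
                   FROM dbo.Posts AS p )
ORDER BY u.Id;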

Style Guide

I don’t have too many rules for how queries should be written, but I do have this one:

I use EXISTS or NOT EXISTS if I’m referencing a subquery, and IN/NOT IN when I have a list of literal values.

Thanks for reading!


How Trace Flag 2335 Affects Memory Grants


This trace flag is documented in KB #2413549, which says, “Using large amounts of memory can result in an inefficient plan in SQL Server.” The details are a little light, so let’s run a quick experiment with:

  • SQL Server 2017 CU 8 (14.0.3029.16)
  • VM with 4 cores, 32GB RAM, max memory set to 28GB
  • Stack Overflow database (circa March 2016, 90GB)

We’ll use a simple query that wants a memory grant (but doesn’t actually use it):

SELECT TOP 101 *
FROM dbo.Users
ORDER BY DisplayName, Location, WebsiteUrl;

The Users table is less than 1GB, but because the DisplayName, Location, and WebsiteUrl columns are relatively large datatypes, SQL Server somehow thinks 22GB will come out of the clustered index scan, and go into the sort, as shown in the actual plan:

Hello

This affects the query’s memory grant.

Default memory grant

When I right-click on the select icon and go into properties to look at the memory grant, Desired Memory is 29GB! SQL Server wanted 29GB to run this query.

However, because my server isn’t that large, the query was “only” granted 5GB. It used less than 1GB because of course there’s just not that much data in the table.

If my server was larger, the query could get up to 29GB every time it runs. Run a bunch of those at once, and hello, RESOURCE_SEMAPHORE poison waits.
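If you want to watch this happen on your own server, a quick (hedged) starting point is the sys.dm_exec_query_memory_grants DMV, which shows what each running query asked for, was granted, and actually used – plus how long it sat in the memory grant queue:

SELECT   session_id,
         requested_memory_kb,
         granted_memory_kb,
         used_memory_kb,
         ideal_memory_kb,
         wait_time_ms
FROM     sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;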

That’s where it sounds like trace flag 2335 would come in. The KB article says:

One of the factors that impacts the execution plan generated for a query is the amount of memory that is available for SQL Server. In most cases SQL Server generates the most optimal plan based on this value, but occasionally it may generate an inefficient plan for a specific query when you configure a large value for max server memory, thus resulting in a slow-running query.

This is one of the trace flags that can be enabled with QUERYTRACEON, so let’s give it a shot:

SELECT TOP 101 *
FROM dbo.Users
ORDER BY DisplayName, Location, WebsiteUrl
OPTION (QUERYTRACEON 2335);

The new query plan looks and performs the same – but what about the memory grants? Are those more accurate?

Memory grant with 2335 enabled

No: the memory grant is still aiming for 29GB.

That’s because this trace flag isn’t directly about memory grants for the same operation. It’s about indirectly reducing your memory grants by changing the way SQL Server decides to build the plan in the first place.

Keep looking down, and check out the Optimizer Hardware Dependent Properties. These are some of the numbers SQL Server used when building the execution plan – assumptions it made about the hardware it was running on.

Estimated Available Memory Grant:

  • Default = 716MB
  • With 2335 = 26MB

Estimated Pages Cached:

  • Default = 179,200 (which is pretty odd, given that the table only has 80,026 pages)
  • With 2335 = 1,638 pages

If you think 2335 is right for you, look for plans where 2335 changes the entire shape of the plan, getting you a query plan that aims to use less memory overall.

One place to start looking for those plans is:

sp_BlitzCache @SortOrder = 'memory grants'

Scroll across to the right and check out the memory grants columns:

sp_BlitzCache memory grants

You’re not just looking for queries with unused grants – 2335 could (in theory) help your queries with large USED memory grants by changing the shape of the plan. A successful deployment of 2335 would mean a differently shaped plan that still performs fast, but desires (and uses) way less memory.

Of course, this is a last resort – if you can change the query by injecting this trace flag, then you should probably start by tuning the query first instead.

Oh, and you might be wondering – do I actually use this trace flag? Hell no – I just ran across it in the wild, found the documentation to be pretty lacking, and figured I’d document my research here for the next person who finds it turned on somewhere.

Wait Stats When VSS Snaps Are Slow


Deus Redux

A while back I wrote about the Perils of VSS Snaps.

After working with several more clients having similar issues, I decided it was time to look at things again. This time, I wanted blood. I wanted to simulate a slow VSS Snap and see what kind of wait stats I’d have to look out for.

Getting software and rigging stuff up to be slow would have been difficult.

Instead, we’re going to cheat and use some old DBCC commands.

Hold It Now Hit It

Whenever we want to do something important we’re going to freeze IO and then observe what it’s doing:

DBCC FREEZE_IO('Crap');
DBCC THAW_IO('Crap');

Whenever we want to find out what we’re waiting on, we’re going to run:

EXEC sp_BlitzWho;

There were a couple things that weren’t blocked, like creating temp tables, and running select queries.

Anything we did that attempted to create or modify something in Crap was blocked when IO was frozen, though.

That means Insert, Update, and Delete queries absolutely blocked read queries.

Some Pictures

Creating a table waits on DISKIO_SUSPEND.

What was really interesting here is that it would wait on it for a couple seconds, then the wait would cycle back to 0.

But at the session level, the wait accumulated quite a bit of total time.

Trying to insert into a table generated long WRITELOG waits.

They’d just keep piling up.

And they’d block other queries.

Trying to create indexes and constraints would generate PAGEIOLATCH_EX waits, which would also block queries.

Trying to create most things, like functions and procedures also generated WRITELOG waits.

You probably don’t need more screencaps of that.

Sort of curiously, if I froze IO during DBCC CHECKDB, I got a bunch of PAGEIOLATCH_UP waits.

CH-CH-CHECK

Trying to take a log backup while IO was frozen was cute. BACKUPIO and DISKIO_SUSPEND waits seemed to cycle a bit, and between them only added up to the total wall clock time.

Right To Choose

 

Fudgey Bottoms

This is about where I ran out of stuff I wanted to see blocked. If there’s anything you’re interested in, well, you now have UNLICENSED DBCC COMMANDS to play with.

So, if you’re seeing long pauses between IO being frozen and thawed in your error log, these are waits you may be able to look for to corroborate a problem with VSS Snaps.
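One hedged way to do that corroboration: sample sys.dm_os_wait_stats before and after the snap window (the numbers are cumulative since the last restart or stats clear) and diff the wait types above.

SELECT   wait_type,
         waiting_tasks_count,
         wait_time_ms
FROM     sys.dm_os_wait_stats
WHERE    wait_type IN (N'DISKIO_SUSPEND', N'WRITELOG', N'PAGEIOLATCH_EX', N'PAGEIOLATCH_UP', N'BACKUPIO')
ORDER BY wait_time_ms DESC;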

Thanks for reading!

A Presenter’s Guide to the Stack Overflow Database


You present on SQL Server topics at user groups and conferences, and you’ve been wondering how to get started with the Stack Overflow public database. Here’s a quick list of things to know.

For stable demos, use StackOverflow2010. This smaller 10GB database has data from the first years of Stack Overflow’s history. It doesn’t change, so you don’t have to worry about updating your screenshots and metrics every time Stack releases a new data dump.

Your attendees can download it from BrentOzar.com/go/querystack as a 1GB zip file with a SQL 2008 database, and then extract it to the full database. No registration is required. You’re also welcome to distribute that database yourself.

Every table has Id as a clustering key. That by itself isn’t a big deal, but here’s where it gets awesome: Stack Overflow’s URLs are all driven by that key. Look at your URL bar in your browser as you’re surfing StackOverflow.com, and you’ll start to recognize what’s going on:

https://stackoverflow.com/users/22656/jon-skeet

Simply drop off the strings at the end of the URLs, and these work as well, making it really fun to show the pages of the data you’re looking at:

https://stackoverflow.com/users/22656

Speaking of 22656, Jon Skeet’s data is unusual. Jon Skeet is a legendary user at Stack Overflow with over a million reputation points. He’s user id #22656, and that’s one number you’ll probably end up memorizing. If you want to do a parameter sniffing demo or show wild swings in data distribution, 22656 is your man.
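If you want a skeleton for that parameter sniffing demo, something like this works – the procedure name is invented for illustration, and it assumes you’ve created a nonclustered index on dbo.Posts (OwnerUserId) so the seek-versus-scan choice actually matters:

CREATE PROCEDURE dbo.usp_PostsByUser
    @OwnerUserId INT
AS
BEGIN
    SELECT p.Id, p.Title, p.CreationDate, p.Score
    FROM   dbo.Posts AS p
    WHERE  p.OwnerUserId = @OwnerUserId;
END;
GO

/* Compile the plan for the heavily skewed user first... */
EXEC dbo.usp_PostsByUser @OwnerUserId = 22656;
/* ...then call it again with a low-activity Id from dbo.Users and compare the reused plan. */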

dbo.Posts contains both questions and answers. Most of Stack’s tables are fairly intuitive, but this one is a bit of a gotcha. To join them, use the ParentId field:

SELECT *
  FROM dbo.Posts q
  LEFT OUTER JOIN dbo.Posts a ON q.Id = a.ParentId;

Note that I’m using a left outer join because not all questions have answers.

Posts by PostType in StackOverflow2010

Posts are more than just questions and answers, too – the dbo.PostTypes table lists other kinds of posts, like Wiki, TagWiki, TagWikiExcerpt. Again, note the lumpy data distribution – this database is absolutely fantastic for lumpy distribution by date, time, scores, lengths of strings, you name it. It’s real data from real humans – just like your day job – and it’s fantastically unreliable and fun.

Data.StackExchange.com has lots of useful queries. When I wanna find real-world queries to show for tuning examples, it’s a great place to start. Just make sure you properly credit the query’s author and link back to the query’s page. Note that when you click on a query link, you’ll see the results instantly – that doesn’t mean the query is fast. Stack caches those query results.

The database schema isn’t exactly Stack’s current live schema. The database reflects the public data dump, not a backup of Stack Overflow’s database. For example, on dbo.Posts, the Tags column stores the tags for a particular question. If you want to find questions with a given tag, you have to do a string search for ‘%<sql-server>%’ – but that isn’t necessarily indicative of how the live site searches for tags today. I love it, though, because it shows how a lot of real-world databases work.
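For example, finding questions with a given tag turns into a leading-wildcard string search, something like:

SELECT COUNT(*)
FROM   dbo.Posts AS p
WHERE  p.Tags LIKE N'%<sql-server>%';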

For questions about it, hit Meta. Meta.StackExchange.com is the Q&A site that asks questions about Stack Overflow itself. There’s a good starter post for the database documentation. When you see the term SEDE, it’s referring to Stack Exchange Data Explorer, aka data.stackexchange.com.

[Video] Office Hours 2018/8/15 (With Transcriptions)


This week, Brent, Tara, Erik, and Richie discuss error log issues, issues with moving a 2-node AG to a different VM, adding a rowversion column gotchas, using indexes, Docker & CI/CD, RESOURCE_SEMAPHORE query compile, table locks, Query Store, and best places to eat in NYC.

Here’s the video on YouTube:

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

If you prefer to listen to the audio:

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours – 8-15-18

 

How should I track down login failures?

Brent Ozar: So we’ll start with Lee. Lee says, “A vendor made a change on an app server and now my error log is full of these errors; login failed for user X, could not find a name matching. Any ideas on what I should tell them to look for? I don’t have access to that app server.”

Tara Kizer: I used to have to just ignore that message when I worked at Qualcomm. There was some issue with the SCOM server, something like that, and they couldn’t figure out why it was doing it. But eventually, I just stopped looking at the error log. It was just clogged because – can you turn off failed login attempts? You can turn off successful; I don’t know that you can turn off the failed ones. So at some point, the spamming of the error logs – I guess I can’t use that tool anymore, except to filter it to find what I want to look for. But start with the – you’ve got the IP address, so you know what box is doing it. It’s really hard to say what to do from there, but look for a scheduled task, an application running…

Brent Ozar: And leave it. Who cares?

Tara Kizer: Well, the only thing I care about is the spamming of the error log. The error log is supposed to be used for troubleshooting issues. My client this week has successful logins going into the error log and it’s happening multiple times per second, I think it was, or per minute maybe. I was like, this is unusable now. I can filter for the four things I filter for, but that’s not going to find anything else.

Brent Ozar: Do you know about the minus sign trick with error IDs?

Erik Darling: Well that’s only event viewer…

Brent Ozar: Oh event viewer, not the error log, that’s true.

Tara Kizer: Even event viewer’s becoming a not usable tool.

Brent Ozar: Yeah, the minus sign filter helps in there.

 

We’re thinking about changing a lot on an AG…

Brent Ozar: Glen says, “We have to move a two node AG to a different VM. I’ve been told that the servers have to be powered down in order to migrate…” I already have so many questions, “Any thoughts on the best way to bring down the AG with a cluster to accomplish the move without going to hell in a handbasket? Also, a new IP for the heartbeat network…” Oh come on, man.

Tara Kizer: At that point, I think I would just build new servers on this new VM. It sounds too complicated and I suspect Availability Groups is not going to be happy with the exchanges.

Erik Darling: No, I’d probably just want to build out whatever new environment I’m going to migrate to and set up AGs over there and then reset them up again. There’s just too many changes all at once. That’s a lot of moving pieces, VMs, AGs, IPs.

 

How do you order an autographed copy of Erik’s book?

Brent Ozar: Michael Tilly asks, “How does one go about ordering an autographed copy of the book Great Post, Erik?” So there may be a book signing event at PASS. We’re waiting to see how that goes. But even if not, what you should do is bring your book to the bar whenever Erik’s around there, and he will sign Itzik Ben-Gan.

Erik Darling: Mister, buy me a drink…

 

Any gotchas with adding a row version column?

Brent Ozar: Chris says, “I’m going to be adding a row version column to some of our tables for sending back to a data warehouse. We used to use a changed-on-UTC column and trigger on that. Are there any gotchas I should be looking out for when adding a row version column?”

Erik Darling: I mean, adding any column to a table’s going to have – not like adding the column. If you add a NOT NULL column to a table, you’re not going to do too much damage. But if you have, like, a default for it or if you need to go populate that column later, obviously, you could run into some issues with locking and all that good stuff. So I would be pretty judicious in how I populate that column. I wouldn’t want to just have it all filled in at once.

Brent Ozar: Chris also says he’s going to New York City, “What are Brent’s favorite places to eat there?” Well, Erik lives in Brooklyn, so he should answer this too. Erik, what are your favorite places to eat in New York City? He says he’s staying near Times Square.

Erik Darling: So tip number one, get the hell out of Times Square. Like, just leave, avoid at all costs, don’t go to Ruby Tuesday, don’t go to Bubba Gump Shrimp or whatever the crap it is…

Richie Rump: Don’t take a picture with Spiderman…

Tara Kizer: Visit it for tourist things but leave 15 minutes later. It’s just crowded.

Richie Rump: Go to Midtown Comics and then leave.

Erik Darling: Times Square Elmo will mess you up. He has a dirty gym bag…

Brent Ozar: I would say, my huge resource – all my favorite restaurants have closed. This bums me out so much, but nyeater.com – this has their best restaurants of New York – and they do this for all kinds of cities; San Diego, all kinds of places. And out of there, I’m going to scan down and see if I’ve hit any of them. I haven’t had Ping Seafood…

Erik Darling: I’ve had Ping.

Brent Ozar: Was it any good?

Erik Darling: It’s okay…

Brent Ozar: Katz’s Deli…

Erik Darling: It’s okay. If you want a really big $20 sandwich then go to Katz’s. I don’t know that I’d want it.

Brent Ozar: Momofuku – that’s the real name, yeah. What’s the dessert one? Momofuku has a dessert one – Milk Bar, I think it’s called. I really like that one.

Richie Rump: Is that David Chang’s restaurant?

Brent Ozar: Yeah, so anyway, I would start here. Start there and run from there.

 

Can an index really make that big of a difference?

Brent Ozar: Sheila says, “I added an index last week. This week, my batch process is running far better. The index shouldn’t have made that big of a deal. Would a query plan change occur from the index and make it that much better?”

Erik Darling: You know how indexes work, right?

Brent Ozar: No, probably no.

Erik Darling: If I’m thinking about ways to change a query plan, an index is going to be one of the first things. If I’m looking at clogging a weird process like that, indexes are going to be one of the first things I look at; not just adding them but getting rid of ones that – if I have a write-intensive process, getting rid of a whole bunch of indexes that aren’t living up to their duty, I’m going to get rid of those too. But adding an index is, like, hands down, one of the top things that you can do, even before rewriting a query, using a temp table, doing other stuff, adding an index is, like, what I’m going to go for. That’s like my first stop. What index can I – to make this less horrible…

Tara Kizer: And maybe you didn’t think it was going to help, but maybe it got rid of an expensive key lookup and that was the whole bottleneck of the query.

Erik Darling: Sort, key lookup, improved join, improved some sort of aggregation. Who knows? It could have even helped more than one query. That’s the beauty of indexes. It’s not like you add an index and you’re like, you’re for this query; no other query can use you, you’re special. No, lots of stuff can use them. So if you found that query in your missing index DMVs or something then it’s totally possible that more than one query was able to benefit from it.

Brent Ozar: Or it might have been close enough that other queries, even if it wasn’t their ideal, it was good enough.

Tara Kizer: And to help answer this question for the future on her server, set up logging to a table via sp_WhoIsActive, and if you had that in place, you can go back in time and look at the execution plan from that process and then compare it to what it is now and you’d be able to answer it yourself.

 

How many includes are too many?

Brent Ozar: Speaking of indexes, Steve says, “When it comes to missing index advice from execution plans, what’s the best rule of practice or a good rule of thumb for includes when you think there are too many?”

Tara Kizer: I know Brent’s rule of thumb…

Brent Ozar: What is Brent’s rule of thumb?

Tara Kizer: Well just see, are there more than five indexes per table and no more than five columns per index, and that includes the includes, you know, the key plus includes. My restriction isn’t that low, but when it’s recommending 50, I’m like, okay, that’s enough. That is way too many. Just chop off the includes at that point and then check the execution plans, if you can figure out what query it was targeting and see is there an expensive key lookup. Do you really need to return 50 columns from this query?

Erik Darling: You know, I think my rule is probably a little bit closer to 10 in 10, but that’s because I don’t do a lot of pure – at least historically, I haven’t done a lot of purely OLTP work. My stuff is always, like, a big hunk of reporting on top of stuff. So I’m a little bit more kind to having some extra indexes around. But yeah, Tara’s right about that. I also have a session at GroupBy that’s free that you can go watch about improving select star query performance. So if you go watch that, you can learn a way to change the way a query is written so that you can take advantage of narrower indexes without having to worry about adding all 50 includes because those missing index requests are kind of idiots. They’re just like bad teenage cries for help. They’re going to ask for every single included column. There’s no filter on the kind of columns that get included. You can end up with these long string columns in there, XML columns, like any idiotic data type. Anything that the optimizer is, like, oh but it will be cheaper, it will just, yeah include it in the index, I don’t care. Like, no penalty – everything’s free. It’s just an include. Don’t worry.

Tara Kizer: I mean, some of those are going to fail, you know. Varchar max, that’s just not possible in the index.

 

Easiest way to reinitialize merge replication?

Brent Ozar: Paul asks a question I think we’re all going to arm-wrestle to answer. We’ll be so excited…

Tara Kizer: It’s going to be Richie for sure…

Brent Ozar: Paul says, “When I’m running merge replication, is there a way other than initialization to re-sync all the data from the publisher to the subscriber?”

Tara Kizer: If you don’t mind a full copy of your database over there, just do backup restore. Sync it up that way and you could apply transaction logs to get it more in sync. And then once it’s in sync with the publisher, then set up replication and tell it, I’m already ready to go, I’ve already manually synced it on my own. But, you know, if you’re only going to be replicating some of your tables, that might not be a good solution. But if you’re going to be replicating all of them or most of them then backup and restore is a really good solution for that. And if you need to drop tables, you can do that at that point, but it at least gets you past the point where the initialize is going to take you several hours. And the initialize affects the publisher as it’s happening, whereas backup and restore wouldn’t affect the publisher.

 

What are your thoughts on Docker and CI/CD?

Brent Ozar: Sri asks – now we’re going to make Richie come back to the microphone…

Richie Rump: Are the bad questions on?

Brent Ozar: Only the first one. Sri asks, “What are your thoughts on the Dockers and CICD?” Richie, what do you think?

Richie Rump: I love continuous integration, I love continuous deployment. And not yesterday, as the team could tell you, as things started breaking and I couldn’t figure out why – here’s a quick hint, it was someone else’s software. It wasn’t ours. It wasn’t Brent’s either.

Brent Ozar: For once.

Richie Rump: Exactly. Continuous integration and continuous development, they’re phenomenal. They’ve been, I think, a boon to us developers as far as being able to get things out quickly and with a high level of confidence of quality in that bugs will not be infecting our code. So now that we’ve got Amazon publishing, what, a new release every 12 seconds or something crazy like that, we couldn’t do that before. So CICD is something every software team – I mean, I’m a software team of one right now and I’m using CICD, so there you go, there’s the value right there. Dockers, I haven’t used Docker; sorry. I haven’t had really that much of an opinion. So again, team of one needs to be sliding things in and out or do anything like that, haven’t really needed it. I’ve got no opinion on it or the Kubernetes or whatever the cool kids are doing these days; don’t know. I am blissfully ignorant.

Brent Ozar: It feels like there’s – if you listen to the Microsoft buzz, it seems like they latch onto any buzzword that’s flying by and they try to stick it to the SQL Server product with Velcro. Artificial intelligence, got it, let’s smear some of that on there. R and Python, yeah, smear that on there. Oh, here comes Docker; grab that…

Erik Darling: Linux, machine learning – because they’ve just missed the boat so terribly on so many things. It’s like – I’m surprised that SQL Server doesn’t have like an internet browser in it.

Richie Rump: The internet, that will never be a thing. What are you talking about?

Erik Darling: Outlook for SQL Server, I’m like, what the…

Richie Rump: I don’t know, for me, as far as Docker on the server, it still doesn’t make a lot of sense for me, right. Maybe because I’ve been playing in the cloud too long and I just let the cloud vendor handle how they want to do implementation, but if I was in-house and somebody said, hey why don’t we just throw a docker out there for SQL Server, I’m like, why? What is it really buying us? And I haven’t really been able to grok my head around that quite yet.

Brent Ozar: Especially compared to platform as a service, where they just manage everything for you.

Richie Rump: Yeah, and it’s like, well why don’t we just throw it out there? I guess then we start talking business reasons of why we would do this, Docker versus the cloud or something like that or VM in-house versus something else. But I just don’t think it’s that big of a deal, when we’re talking about a server, to run the installer.

Brent Ozar: Okay, so hopefully, Sri, there’s your answer.

 

What causes resource_semaphore_query_compile?

Brent Ozar: Rakesh – Rakesh experienced an issue in production with lots of queries waiting on resource semaphore query compile – wants to know if that’s the cause or the effect. When you start seeing resource semaphore query compile, what do you look at next?

Erik Darling: I look at what’s running and I look at how big that query plan is. So when resource semaphore query compile hits, to back up a little bit on that, when queries compile, there are different classes of query depending on how much memory they need. There are queries that don’t need any memory to compile. That’s very tiny low-cost plans and plans that are already in cache. So they can just go and compile immediately. Then, there’s small gateway queries, which require, I think, like 380K of memory to get the query plan compiled for them. And then they kind of step up from there. And as you step up, you can have fewer and fewer queries that go into that gateway.

So, there’s no memory, small memory, medium, and then big. Up to 2014, you could only have one of those big queries going at a time. 2014 had a trace flag. I forget what the trace flag is. But then 2016 and up had this different algorithm where they scale up the number of big queries you could have compiling at once depending on how much memory you have. So with a 768GB server, I’m really concerned, not only because you have queries coming in that need to compile all the time, and enough that you got blocked up on that. Queries that, with 768GB, your plan cache should have the queries you need in there. Maybe forced parameterization is a good idea. Maybe optimized for ad hoc workloads is a good idea to help, kind of, reduce that.

But resource semaphore query compile is when you hit one of those gateways and you just have too many queries trying to go through it at once and they sit around waiting to get additional memory to compile a query. Like, they’re not running, they’re not getting data, they’re not going out and getting locks, they’re not doing anything. They are stuck waiting just to get a query plan, so it’s definitely a big enough problem that you’re going to want to address that and you’re going to want to find a root cause on.

 

Does lock escalation happen with deletes?

Brent Ozar: Mark says, “So table locks happen if you update 5000 rows or more. Does the same locking happen if you have 5000 or more deletes?”

Erik Darling: Yeah.

Brent Ozar: There we go. That might be the very first time we’ve been done with a question in…

 

What are your thoughts on Query Store?

Brent Ozar: Keith says, “What are everyone’s thoughts on Query Store?”

Erik Darling: The new Query Store? What’s the new Query Store?

Brent Ozar: This question says the new Query Store…

Erik Darling: There’s that old Query Store that’s been around since 2016. [crosstalk 0:16:02.9] It’s nifty. I like it, but I can understand people’s reticence in using it because you can’t really choose where that data gets stored. You might be storing some crazy PII in there. It doesn’t quite have the management features, I think, that a lot of people would want in order to start using it. That, and everyone who I talk to – not everyone, but a lot of people who I talk to at conferences are like, you know – because I talk about sp_BlitzQueryStore because I wrote a whole stored procedure. I was that excited about it, I was like look, I’m going to write a stored procedure. It’s going to do the same stuff as BlitzCache but with Query Store, and no one uses it because no one has Query Store turned on.

When I talk to people about it at conferences, they’re like, we turned on Query Store and CPU use went through the roof, bad stuff happened, like the drive filled up. I’m like, man, I wasted how many hours of my life writing a stored procedure for something that makes CPU go through the roof and fills up drives. I can understand why people don’t use it. I like it in theory. I like the prospect of being able to have long-term plan data in there and be able to trend queries over time rather than just depending on that one plan that’s in the plan cache. But you know, like, a lot of pushed out the door features for things in SQL Server, I’m not sure that it was quite 100% ready, like extended events.

Brent Ozar: Oh, that’s mean…

Erik Darling: Extended events is the mobile browsing of user experiences. There’s a reason that everyone has an app instead of making people use a mobile site, because it’s just miserable and painful. No matter what you do, you’re just in for pain and suffering and disappointment.

Brent Ozar: So why is it that the people who evangelize Query Store also evangelize extended events? There’s something in common. I want to believe in Query Store. Like, I think that if I was a database administrator, I like to think that I would turn it on on all my servers and just watch out for all the CUs, because this just came out in the most recent cumulative update; database performance is bad without this cumulative update. So clearly there was a performance problem, but if it wasn’t that big of a problem, I’d sure like to enable it.

Richie Rump: Can we get you a poster of, like, SQL Server Query Store in the background and it just says, I want to believe.

Erik Darling: I mean, it’d be nice because, like, the plan cache is just so temperamental. It clears out, it doesn’t have all the plans in there, it doesn’t keep a lot of historical information, so it would be beautiful to have that kind of stuff. I’m glad that it seems to be working for the robots up in Azure or whatever. It’s helping Microsoft choose between plan A and plan B, you know. For regular people, it’s…

Richie Rump: I just was cranking on a little bit a few weeks ago…

Brent Ozar: What were you cranking on?

Richie Rump: Plan cache.

Brent Ozar: Oh yes…

Erik Darling: Yeah, tell us more…

Richie Rump: We started collecting in the new version of ConstantCare. We haven’t implemented any rules for it yet, but we’re getting there. Getting the data is the first part, and then start applying rules to those plans.

Erik Darling: So you don’t know how we’re going to go through those plans, what we’re going to do with them?

Brent Ozar: One of the users replied in because we sent out an email, hey, there’s an update to it, just to, like, early access beta users. Hey, there’s an update. If you want, you can send us your query plans, you don’t have to. One of the users who emailed in copied in all his database admins and he goes, “I don’t care what it takes, you get the Ozar people everything.”

 

Following up on resource_semaphore_query_compile

Brent Ozar: Rakesh follows up with his resource semaphore query compile and says, “We have both optimized for ad hoc and forced parameterization on.” He also mentioned that he started a case with Microsoft. That’s awesome, it’s just that you’re not going to get an answer down to root cause analysis inside a free webcast. There’s just no way we can pull that off. However, I want to leave you with a couple places you can go for help. If you go to – let me go find the page for it – dba.stackexchange.com, you can post a multi-paragraph question. Just be cautious there because it’s going to be super-specific to your company. They’re going to want evidence, and you’re going to have to be able to post it publicly. It’s not free consulting; it’s free help, but you’re getting to a point where Microsoft couldn’t solve the problem. What you may ask from the community might be kind of tough to do. The other thing you can do is, we have actual consulting. It just so happens that this is what we do for a living. If you go to brentozar.com and click on Critical Care up at the top, we do this three-day consulting thing where we get to the root cause of your performance problems. So it may be the point where you need that as well.

Tara Kizer: He also mentioned that the change to the latest cardinality estimator, you know, resolved the issue. So if I wanted root cause analysis, I would switch it back and start troubleshooting what are the queries that are having this issue, what are the queries that have large unused memory grants? And maybe specifically, on problematic queries, switch those guys to the legacy cardinality estimator, but not the whole box.

Brent Ozar: And to some extent, if you flip the CE and it suddenly started working, there’s your root cause. You’ve got queries that don’t work well with that CE. You can either change the queries or you can change the CE.

Erik Darling: That is also valuable feedback for Microsoft. If flipping the CEs has that profound of an effect on a server where you go from being at a complete standstill with resource semaphore query compile waits to not having any and everything being fine and dandy, that’s valuable feedback for them that should be shared, I think. You know, as much as we poke and prod, we do like to see a good competitive product that we have to work with day in and day out…

Brent Ozar: And they want these edge cases too. They want to know when these edge cases hit that are so bad.

Erik Darling: So that’s connor.cunningham@…

Brent Ozar: And his home phone number is… So that’s it for Office Hours this week, everybody. Thanks, everybody, for hanging out and we will see y’all next week. Later.

Building SQL ConstantCare®: Now Free for Consulting Customers


We’re kinda like an emergency room for SQL Server: we specialize in a 3-day SQL Critical Care® where we work side by side with you, talking about your database pains, showing you the root cause, and then teaching you how to get permanent pain relief. That works really well, and we’ve kinda got it down to a science.

When we sign a contract with a client and pick a start date, we tell ’em it’s really important not to restart the SQL Server in the week leading up to the engagement. SQL Server keeps so much good stuff in memory – wait stats, file stats, index usage & recommendations – and all of that gets wiped out on an instance restart.

Nobody wants to restart their SQL Server instance.


But you know how it is.

Life finds a way.

Some well-meaning sysadmin applies a patch, or doesn’t know about the engagement, or folks just plain old run into an emergency and have to fail over.

So we thought, why not get new clients started with SQL ConstantCare® as soon as they sign a contract? We could just give them free access to start right away, sending in their server’s metrics every day.

This is so useful for so many reasons:

  • Clients start getting advice faster – like warning them about backup issues, easy misconfigurations, and things they can fix without waiting for us
  • We get better historical data – so useful in cases like a dramatically underpowered server that doesn’t have enough memory to keep plan cache contents around for more than a few hours
  • We can spot things that only happen rarely – for example, if a server has a pattern where the wait stats look wildly different on Monday mornings, we can narrow that down and understand why
  • And even after the engagement – we can keep an eye on a client’s server and know if they’re making progress on their homework, and whether they’re seeing permanent pain relief

My favorite example of how it’s been useful was a client that emailed in one Monday morning and said, “It just happened again! We had another performance-wrecking emergency on Sunday. Did SQL ConstantCare give you enough data to tell you what it was?”

Whoomp

Ordinarily, that’d be really hard to do as a consultant – I just don’t have the ability to time travel backwards and see what was going on. But since they’d been in the program already, I was able to just open up my Power BI dashboard, look at their data, and say, “Yep.”

It’s about to get even better: our beta customers with the latest ConstantCare.exe can now send in queries and query plans, too. We’re not giving automated query advice yet – that’s coming next – but at least we can capture the query plans so consulting customers can get better answers about which queries were causing the big slowdown on Sunday.

Foundational Material: Microsoft SQL Server Book and Blogs From The Past


What Did Dinosaurs Watch On TV?

These are some of my favorite books and blogs from Microsoft from the way-back machine.

I can’t say every bit of information is still 100% true and should be followed to the letter, but hey, that’s what happens.

This is stuff I consider foundational material, though. I’ve learned a lot from them, and I think most people who use SQL Server regularly would benefit from reading them, if they haven’t already.

They’re mostly long defunct, so don’t hold your breath on comment replies.

Blogs

Defunct:
  • Craig Freedman: This blog is amazing. I wish Craig still wrote things. Anything, really.
  • Conor Cunningham: Should need no introduction, and has the best blog title of all time.
  • Bart Duncan: Bart was blogging about some pretty crazy problems back before a lot of people even knew these problems existed.
  • Query Optimizer Team: This preceded the current Query Optimizer Team blog, and bonus points for Microsoft’s first attempt at automatic indexing.
  • Ian Jose: Not the longest or most in-depth blogs, but I like me some straight and to the point wisdom too.
Storied History:

Books

Yes, I own all of these. The bottle of wine over there is empty, but it’s one of my favorites.

Do The Worm

Ken Henderson:

Ken’s books are amazingly detailed and still surprisingly relevant.

Practical Troubleshooting was a group effort, and features a chapter from Bob Ward.

Now, I have to point something out, here.

Bob is a badass.

This book was published in 2007. That means Bob has been working with SQL Server for like 25 years.

Bob deserves some kind of award.

Kalen Delaney:

These books are totally worth it for the pictures alone.

Aw lookit the baaaaabiesss

Back To The Future

These aren’t the only SQL Server books I own, and there’s a lot of great, newer stuff out there that you should probably read too.

With SQL Server’s new rapid development cycle, we’re not likely to see this kind of in-depth technical book about a specific release or technology. It would simply become outdated too quickly. Even online documentation becomes difficult. A good blog one day could be mooted by a CU the next.

It’s even more frantic in the cloud, where Azure routinely has features added and removed.

Thanks for reading!

It’s Time to Improve DBCC CHECKDB.


Microsoft has been resting on Paul Randal’s laurels for far too long.

From 1999 to 2007, Paul poured his heart and soul into rewriting SQL Server’s code to check for and repair database corruption. (For more about his illustrious career, read his bio and enjoy the infectious enthusiasm in his bio photo.)

Paul did great work – his baby has lived on for over a decade, and it’s an extremely rare cumulative update that fixes a bug in CHECKDB. I’d like to think it’s not because nobody’s looking, but because he wrote good, solid code that got the job done.

But Microsoft is coasting.

LSI MegaRAID 9285CV-8e

This is a $30 RAID controller.

Meet the LSI MegaRAID SAS 9285CV-8e, one of the most junior RAID controllers you can buy for a server. When he’s bored, he has a couple of homework tasks he likes to perform: Patrol Read and Consistency Check. Between these two, he’s checking all of the drives in the array to make sure they match each other, and that they can successfully read and write data.

This helps catch storage failures earlier with less data loss.

You don’t have to configure this or set up a schedule – he just knows to do it because that’s what he does. It’s his job. You trusted him with your data, so every now and then, he does his homework.

SQL Server needs to do that.

Some of the pieces are there – for example, SQL Server already has the ability to watch for idle CPU times and run Agent jobs when it’s bored. For starters, that’d probably be good enough to save a lot of small businesses from heartache. For the databases over, say, 100GB, it’d be really awesome to have resumable physical_only corruption checking – tracking which pages have been checked (just like how the differential bitmap tracks page changes), with page activity reset when the page is changed (again, just like the differential bitmap.) This wouldn’t count the same as a real CHECKDB, which needs to do things like compare index contents – but holy mackerel, it’d be better than what we have now.
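Until something like that ships, the do-it-yourself version is an Agent job that runs during your quiet hours – a minimal sketch, with the database name as a placeholder (PHYSICAL_ONLY keeps the runtime and impact down, at the cost of skipping the logical checks):

DBCC CHECKDB (N'YourDatabase') WITH PHYSICAL_ONLY, NO_INFOMSGS;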

Because I’m just so tired of seeing corruption problems, and we can’t expect admins to know how this stuff works. I know, dear reader, you think admins should know how to set up and run corruption checking because it’s just so doggone important, you say.

But if it’s so important…

Why isn’t SQL Server doing it in the background automatically like $30 RAID cards have been doing for decades?

Want it? Cast your vote here.


Tall Tales From Table Variables


Secret Squirrel

When you modify a table with multiple indexes, SQL Server may choose either a narrow plan, if it doesn’t think all that many rows are going to change, or a wide plan if it thinks many will.

In narrow plans, the work SQL Server has to do to modify many indexes is hidden from you. However, these plan choices are prone to the same issues with estimates that any other plan choices are. During a conversation about when temp tables or table variables are appropriate, it came up that table variables are better for modification queries, because not all the indexes had to be updated at once.

When we looked at the plan together, we all had a good laugh and no one wept into their lumbar supports.

Pantsburner

I created some nonclustered indexes on the Posts table that all had the Score column in them, somewhere. Without them, there wouldn’t be much of a story.
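The exact definitions don’t matter much – a hedged sketch of the kind of thing I mean, with index names and key columns invented for illustration:

CREATE INDEX ix_Posts_OwnerUserId  ON dbo.Posts (OwnerUserId) INCLUDE (Score);
CREATE INDEX ix_Posts_CreationDate ON dbo.Posts (CreationDate, Score);
CREATE INDEX ix_Posts_PostTypeId   ON dbo.Posts (PostTypeId) INCLUDE (Score);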

When we use this query to update…

BEGIN TRAN

    DECLARE @bad_idea TABLE (id INT NOT NULL);

    INSERT @bad_idea ( id )
    SELECT TOP 1 u.Id
    FROM dbo.Users AS u
    ORDER BY u.Reputation DESC;

    UPDATE p
    SET p.Score += 1000
    FROM dbo.Posts AS p
    JOIN @bad_idea AS bi
    ON bi.id = p.OwnerUserId;

    --ROLLBACK

We get this plan…

itsy bitsy teenie weenie LIAR

If you’re playing along at home, the single row estimate that comes out of the Hash Match persists along the plan path right up to the Clustered Index Update.

Since a one row modification likely won’t qualify for a per-index update, all of the updated objects are stashed away behind the Clustered Index Update.

It’s a cover up, Scully. This one goes all the way up the plan tree.

Crackdown

Swapping our table variable out, and running this query…

BEGIN TRAN

    CREATE TABLE #better_idea (id INT NOT NULL);

    INSERT #better_idea ( id )
    SELECT TOP 1 u.Id
    FROM dbo.Users AS u
    ORDER BY u.Reputation DESC;

    UPDATE p
    SET p.Score += 1000
    FROM dbo.Posts AS p
    JOIN #better_idea AS bi
    ON bi.id = p.OwnerUserId;

    --ROLLBACK

We get a much more honest plan…

Best Policy

The estimates are accurate, so the optimizer chooses the wide plan.

I can see why this would scare some people, and they’d want to use the table variable.

The thing is, they both have to do the same amount of work.

Warnings

If you use our First Responder Kit, you may see warnings from sp_BlitzCache about plans that modify > 5 indexes. In sp_BlitzIndex, we warn about this in a bunch of different ways: aggressive locking, unused indexes, indexes with a poor read to write ratio, tables with > 7 indexes, etc.

You can validate locking issues by running sp_BlitzFirst and looking at your wait stats. If you see lots of LCK_ waits piling up, you’ve got some work to do, and I don’t mean adding NOLOCK to all your queries.
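If you want the concrete version of that check, this is a reasonable place to start (@SinceStartup = 1 returns the cumulative wait stats rather than a live sample):

EXEC sp_BlitzFirst @SinceStartup = 1;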

DBAs Need a Jump Box or Jump Server.


Every now and then, you’re going to need to run a query that takes a long time. You’re going to want to make sure that it succeeds and that you can see the full output – even if your workstation disconnects – or maybe you want to check the status from home later.

You’re going to be tempted to remote desktop directly into the SQL Server itself and run the query there.

Don’t do that.

Did I ever tell you about the time a training class student started a long query, then came back to find that it had returned millions of rows, blew up SSMS’s memory, ended up filling the drive, and causing the VM to crash? That was awesome. Thank goodness nothing like that has ever happened in a production environment. <coughs>

What you want instead is a jump box or jump server: a virtual machine that lives where your servers live, so you don’t have to worry about uptime or connectivity. Install your tools there, the client for your monitoring software, SentryOne Plan Explorer, etc.

Not only does this come in handy for long-running queries, but also for emergency troubleshooting. Disasters are carefully timed to strike when you’re at your parents’ house, when somebody else was supposed to be on call, and when you didn’t bring your laptop with you. A jump box means you only have to get onto your company’s VPN, then remote desktop into your jump box, and you’re right at home.

Just be careful with capacity planning. When disaster strikes, if there’s only one jump box VM available, knife fights will break out for who’s able to log in. I’m personally a fan of a jump box for every single admin – that way if somebody hoses up their own jump box with a crappy installation or they want to reboot to fix something, it doesn’t break anyone else’s productivity. When your Recovery Time Objective is measured in minutes, you can’t afford to be waiting for a jump box.

Jump boxes were a big part of what enabled me to switch from Windows to Mac over a decade ago and continue to work happily there today. I get way less nervous about updates to my client machine when I know that the only stuff installed there is productivity applications. Worst case scenario, if my entire desktop or laptop blows chunks, I can still just remote into my jump box and keep right on working.

Building SQL ConstantCare®: Updating ConstantCare.exe


When we started designing SQL ConstantCare® (back before we had a name for it), I listed out the main components at the beginning of the design doc:

The beginning of a giant to-do list

The collector (which later became ConstantCare.exe) was the only piece that would run client-side. I wanted to keep end user support work to a bare minimum – if I could put something in the cloud, I wanted it in the cloud to keep our support costs low. It’s really easy to hop into your own cloud to dig into a problem – it’s much more painful to coordinate support calls with end users all over the world.

Then for each part (collector, ingestion, lab tech), I wrote up a list of things we had to have in the first private alphas, the first public betas, and the first version of the paid-for product. We had to make a lot of tough decisions along the way to the Minimum Viable Product – after all, I could only afford to hire one developer, and I wanted to ship sooner rather than later.

Here were the requirements for the first private build of ConstantCare.exe:

ConstantCare.exe v1 goals

Then for v2, as we started to learn more and scale it:

ConstantCare.exe v2 goals

Note that last line – “Self-updating.” ConstantCare.exe was the only part that would require end user intervention in order to upgrade, and I wanted to avoid that hassle. We had a brand new application, and I figured we’d be shipping rapid changes to ConstantCare.exe in order to fix bugs or collect different kinds of data.

Throughout development, I told Richie that my goals weren’t set in stone. If he tried to implement a particular goal, and it turned out to be painful, we could talk through it and change our minds. Some things turned out to be easy, so he added ’em in earlier builds, whereas some things turned out to suck.

Auto-updating ConstantCare.exe turned out to suck pretty bad.

Richie kinda-sorta got it to work with Squirrel, but for it to work, it needed to run the update as an administrator. I really didn’t want to have to hassle with end users setting up a scheduled task to run under an administrator account – if something went wrong, it could go really wrong, and our goal was lower support workloads, not higher.

We ended up scratching that goal and changed the way we ship ConstantCare.exe updates. We stayed in private beta for a longer period of time, making sure things were working well for the end users. Out of the private beta applications, we purposely picked a wide variety of server versions & types to get as much coverage as we could. Then, when we finally went public, we held off updating ConstantCare.exe as long as we could, focusing on cloud side improvements instead. Looking back, I’m glad we made that decision because we just haven’t needed to update ConstantCare.exe much, and I don’t see that changing – the big ROI is the data analysis in the cloud.

Having said that – we’ve published an update to ConstantCare.exe. To update yours, open a command prompt as administrator and type:

cd %localappdata%\ConstantCare\current
ConstantCare.exe

ConstantCare.exe will then do its regular thing of polling your servers for data, and then at the end, it’ll download the latest ConstantCare.exe and update itself from v0.16.1 to v0.20.17. You can tell which version you’re on by looking at the folder names in %localappdata%\ConstantCare.

This version runs faster because it collects less data about backups and Agent jobs – and then it uses some of that given-back time to collect index metadata. You won’t see an immediate difference in your emails, but we’re starting to build index recommendations. Stay tuned!


Do You Have Tables In Your Tables?


This Isn’t A Trick Question

Hopefully it’ll get you thinking about your tables, and how they’re designed. One of the most consistent problems I see with clients is around wide tables.

I don’t mean data types, I mean the number of columns.

Going back to Michael Swart’s 10% Rule, if your tables have > 100 columns in them, you’re likely going to run into trouble.

What Makes Them Bad?

They’re nearly impossible to index efficiently:

  • Queries will hit them in many different ways
  • WHERE clauses will be unpredictable
  • SELECT lists will vary wildly

When indexes pile up to support all these different queries, locking and blocking will start to become larger issues.

While some of them can be solved with optimistic isolation levels, writer on writer conflicts are really tough to avoid.

First Sign Of Problems: Prefixed Columns

Do you have columns with similar prefixes?

Iffy Kid

If you have naming patterns like this, it’s time to look at splitting those columns out.

I took the Users and Posts tables from Stack Overflow and mangled them a bit to look like this.

You may not have tables with this explicit arrangement, but it could be implied all over the place.

One great way to tell is to look at your indexes. If certain groups of columns are always indexed together, or if there are lots of missing index requests for certain groups of columns, it may be time to look at splitting them out into different tables.
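One hedged way to do that survey is to point sp_BlitzIndex at the table and scan the existing definitions and missing index requests for repeating groups of columns – the database and table names here are just examples:

EXEC sp_BlitzIndex @DatabaseName = N'StackOverflow', @SchemaName = N'dbo', @TableName = N'Users';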

Second Sign Of Problems: Numbered Columns

Do you allow people multiple, optional values?

Maybe So.

The problems you’ll run into here will be searching across all of those.

You’ll end up with queries like this

SELECT *
FROM dbo.Yuck
WHERE Tag1 = '...'
OR    Tag2 = '...'
OR    Tag3 = '...'
OR    Tag4 = '...'
OR    Tag5 = '...'

Which can throw the optimizer a hard curve ball, and make indexing awkward.

This should also most likely be broken out into a table of its own that tracks the Post Id and Tag Id, along with a table that tracks the Ids of each Tag.

A wider index across a narrower table is typically less troublesome.
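A rough sketch of that shape, with table and column names invented for illustration:

CREATE TABLE dbo.Tags
(
    Id      INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    TagName NVARCHAR(150) NOT NULL
);

CREATE TABLE dbo.PostsTags
(
    PostId INT NOT NULL,
    TagId  INT NOT NULL,
    CONSTRAINT PK_PostsTags PRIMARY KEY CLUSTERED (PostId, TagId)
);

Searching for one tag then becomes an equality predicate against PostsTags, instead of five ORed columns or a leading-wildcard LIKE.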

Third Sign Of Problems: Lists In Columns

Poor Tags

This should be obvious, and has a similar solution to the problem up there.

Your queries will end up doing something like this:

SELECT 'stuff'
FROM dbo.Posts AS p
WHERE p.Tags LIKE '%...%'

Which can’t be indexed terribly well, even if you go out of your mind with trigrams.

I See Tables Within Tables

If you have tables with these patterns, it’s time to take a really close look at them.

I was not here

If you’re totally lost on this, check out Louis Davidson’s book on relational design.

Stuff like this is easy to sketch out, but often difficult to get fixed. It requires application changes, moving lots of data, and probably dropping indexes.

It’s totally worth it when you get it done though, because it makes your tables far easier to index and manage.

You’ll need far fewer insanely wide indexes to compensate for bad design, and you’ll have way fewer head-scratcher missing index requests to sort through.

Thanks for reading!


What Were Your Game-Changing Discoveries in SQL?


If you like learning random tips & tricks, there’s a great discussion going on in Reddit:

What are your game-changing discoveries in SQL?

I’m only going to give away the first one just to get you started: if you need to repeatedly comment and un-comment groups of code, do this:

-- /* First line. Removing the two dashes activates the block comment
SELECT 
patientname,
Patientid,
Language
FROM whatevertable
Where name = 'Whatever'

-- */ Last line. When the block comment is on, this terminates it

When you want to comment the whole thing out, just remove the top two dashes. You don’t have to put in the ending */ because it’s already there at the end, just silently getting ignored until it’s needed.

GENIUS. Head over to the Reddit thread for more.

Free online training next Friday - register now for GroupBy September.

What’s Different About SQL Server in Cloud VMs?


When you start running SQL Server in cloud VMs – whether it’s Amazon EC2, Google Compute Engine, or Microsoft Azure VMs – there are a few things you need to treat differently than on-premises virtual machines.

Fast shared storage is really expensive – and still slow. If you’re used to fancypants flash storage on-premises, you’re going to be bitterly disappointed by the “fast” storage in the cloud. Take Azure “Premium” Storage:

Premium compared to what, exactly

For about $260 per month, a P40 premium disk gets you 2TB of space with 7,500 IOPs and 250 MB per second – that’s about 1/10th of the IOPs and half the speed of a $300 2TB SSD. It’s not just Azure, either – everybody’s cloud storage is expensive and slow for the price.

So using local SSDs is unheard of in on-premises VMs, but common in the cloud. Ask your on-premises VMware or Hyper-V admin to give you 1TB of local SSD for TempDB in one of your guests, and they’ll look at you like you’re crazy. “If I do that, I can’t vMotion a guest from host to host! That makes my maintenance terrible!” In the cloud, it’s called ephemeral storage, and it’s so insanely fast (compared to the shared storage) that it’s hard to ignore. Not necessarily smart to use for local databases without a whole lot of planning and protection – but a slam-dunk no-brainer for TempDB.

Be ready to fix bad code with hardware. It’s so much easier in the cloud to just say, “Throw another 8 cores in there,” or “Gimme another 64GB RAM,” or “We’re hammering TempDB, and we really need something with faster latency for that volume.” On-premises, these things take planning and coordination between teams. In the cloud, it’s only a budget question: if the manager is willing to pay more, then you can have more in a matter of minutes. But in order to make that change, you really want to stand up a new VM with the power you need, and then fail over to it, which means…

“Alright, who ran Books Online through the Cloud to Butt Plugin?

Start with mirroring or Availability Groups. On-premises, you might be able to skate by with just a single SQL Server. You figure you hardly ever change CPU/memory/storage on an existing VM, so why bother planning for that? Up in the cloud, you’ll be doing it more often – and having your application’s connection strings already set up for the database mirroring failover partner or the Always On Availability Groups listener means you’ll be able to make these changes with less work required from your application teams.

Disaster recovery on demand is cheaper – but not faster. Instead of having a bunch of idle high-powered hardware, you can start with either no VMs, or a small VM in your DR data center. When disaster strikes, you can spin up VMs, restore your backups, and go live. However, you still need a checklist for everything that isn’t included in your backups: think trace flags, specialized settings, logins (since you’re probably not planning on restoring the master database), linked servers, etc. Thing is, people don’t do that. They think they’ll postpone the planning until the disaster strikes – at which point they’re fumbling around building servers from scratch and guessing about their configuration, things you would have already taken care of if you’d budgeted the hardware (or used something like VMware SRM to sync VMs between data centers).

You can save money by making long term commitments. I talked a lot about flexibility above, the ability to slide your VM sizing dials around at any time, but with Amazon and Azure, you can save a really good chunk of money by reserving your instance sizes for 1-3 years. I tell clients to use on-demand instances for the first few months to figure out how performance is going to settle out, and then after 3 months, have a discussion about maybe sticking with a set of instance sizes by reserving them for a year. The reservations aren’t tied to specific VMs, either – you can pass sizes around between departments. (This is one area where Google has everybody beat – their sustained use discounts just kick in automatically over time, no commitment required, but if you’d like to make a commitment, they have discounts for that too.)

Free online training next Friday - register now for GroupBy September.

What Kind Of Statistics Updates Invalidate Plans?


Basics

If you update statistics when the underlying objects haven’t been modified, plans won’t be invalidated.

That makes total sense if your statistics update doesn’t also change the statistics. But what about when it does?

That seemed like a no-brainer to me. What if you used a higher sampling percentage and got a more accurate (or just different) histogram?

It turns out that doesn’t always trigger a recompile. At least not according to Extended Events.

Setup

This is the XE session I’m using. The settings aren’t very good for tracking recompiles generally in production.

You wouldn’t wanna use NO_EVENT_LOSS and a one-second dispatch latency there.

Just an FYI, copy and paste cowboys and girls.

CREATE EVENT SESSION recompile
    ON SERVER
    ADD EVENT sqlserver.sql_statement_recompile
    ( SET collect_object_name = ( 1 ), collect_statement = ( 1 )
     WHERE ( sqlserver.session_id = ( 59 )))
    ADD TARGET package0.event_file
    ( SET filename = N'c:\temp\recompile' )
    WITH ( MAX_MEMORY = 4096KB,
           EVENT_RETENTION_MODE = NO_EVENT_LOSS,
           MAX_DISPATCH_LATENCY = 1 SECONDS,
           MAX_EVENT_SIZE = 0KB,
           MEMORY_PARTITION_MODE = NONE,
           TRACK_CAUSALITY = OFF,
           STARTUP_STATE = OFF );
GO

Here’s the table I’m using, which is simple enough.

CREATE TABLE dbo.stats_test (id INT, filler_bunny VARCHAR(100));

INSERT dbo.stats_test ( id, filler_bunny )
SELECT x.n, REPLICATE(ASCII(x.n), 50)
FROM (
	SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY @@SPID)
	FROM sys.messages AS m
	CROSS JOIN sys.messages AS m2
) AS x (n);

First, I’m going to create some statistics with a really low sampling rate.

CREATE STATISTICS sst ON dbo.stats_test(id) WITH SAMPLE 1 PERCENT;

It’s going to be the only stats object on the table.

EXEC sp_helpstats @objname = N'dbo.stats_test', @results = 'ALL';

If I run this query, the plan will compile.

SELECT COUNT(*)
FROM dbo.stats_test AS st
WHERE st.id > 10000;

Then update stats with FULLSCAN and re-run the query above…

UPDATE STATISTICS dbo.stats_test WITH FULLSCAN;

And my Extended Event session is empty. Unless I create stats on a column my query isn’t touching.

CREATE STATISTICS sst2 ON dbo.stats_test(filler_bunny) WITH SAMPLE 1 PERCENT;

Because I know you’re going to ask — yes, the histogram is different.
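If you want to check it yourself, compare the histogram before and after the FULLSCAN update:

DBCC SHOW_STATISTICS ('dbo.stats_test', 'sst') WITH HISTOGRAM;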

Apparently this doesn’t change SQL Server’s view of things.

Before

After

When Does It Change?

I’m starting to really hate trivial plans (more). If I change my query to this:

SELECT COUNT(*)
FROM dbo.stats_test AS st
WHERE st.id > 10000
AND 1 = (SELECT 1);

Now, if I create the statistics, run the query, and then update the statistics with FULLSCAN, a recompile is triggered.

WHY YOU

Stored Procedures, Too

It’s not just the Trivial Plan, it’s also the Simple Parameterization, which means…

CREATE PROCEDURE dbo.test (@i INT)
AS
BEGIN
SELECT COUNT(*)
FROM dbo.stats_test AS st
WHERE st.id > @i
AND 1 = (SELECT 1);
END

Even with a stats update using FULLSCAN, this won’t recompile.

Unlike the ad hoc query, this won’t recompile if I create an unrelated statistics object.

Using a more complicated example in Stack Overflow results in the same thing.

CREATE PROCEDURE dbo.stack_test (@i INT)
AS
BEGIN

SELECT COUNT(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON p.OwnerUserId = u.Id
AND u.Reputation = @i

END

EXEC dbo.stack_test @i = 1

UPDATE STATISTICS dbo.Users WITH FULLSCAN;
UPDATE STATISTICS dbo.Posts WITH FULLSCAN;

What Does This Mean For You?

When you update statistics and data hasn’t changed, your plans won’t recompile. This is sensible.

When you update statistics and change your histograms, your plans may not recompile if they’re trivial and simple parameterized, or parameterized in a stored procedure.

This is perhaps less sensible, if you were counting on stats updates to trigger a recompilation because you’re trying to fix parameter sniffing, or another plan quality issue.
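If you’re in that situation and you genuinely need a fresh plan after a stats update, you’ll have to ask for one explicitly. A minimal sketch of the usual options, using the objects from this post:

-- Flag the procedure so its next execution compiles a new plan
EXEC sp_recompile N'dbo.test';

-- Or recompile one statement every time it runs
SELECT COUNT(*)
FROM dbo.stats_test AS st
WHERE st.id > 10000
AND 1 = (SELECT 1)
OPTION (RECOMPILE);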

Thanks for reading!

Free online training next Friday - register now for GroupBy September.


Free SQL Server Training Next Week at GroupBy


It’s time for another day of free training for the community, by the community. Here’s the lineup you voted in for next Friday’s free GroupBy.org conference:

Register now to watch live for free. If you can’t make it, no worries – sessions will be recorded and you can watch past sessions for free.

Free online training next Friday - register now for GroupBy September.

First Responder Kit Release: What Does A Fish Know About Friday?


I know, it seems like just yesterday I was doing one of these releases.

But no, it was three weeks ago.

You’ve just been drunk for a really long time.

You can download the updated FirstResponderKit.zip here.

sp_Blitz Improvements
#1698 – Now warns you about any SQL Modules that have ANSI_NULLS or QUOTED_IDENTIFIER off. Thanks @MisterZeus!
#1719 – @TheUsernameSelectionSucks pointed out a typo. Model, msdb. Who can tell the difference?

sp_BlitzCache Improvements
#1352 – If you ever looked at the “weight” columns and thought they looked weird, we were right there with you. They should be fixed now. If you find anything really far off, let us know!
#1706 – We now warn you about compilation metrics, like compile time, cpu, and memory, if they surpass certain thresholds.
#1724 – After calling index spools the most passive aggressive plan operator for ages, I finally decided to do something about it. I break them down and report them as missing index requests in the clickable missing index column. The improvements aren’t an exact science, but it’s a good start.
#1732 – Found some weird cases where missing index counts were off when multiple plans were in the cache for the same query. Fixed!

sp_BlitzFirst Improvements
#1708 – Switched from a static list of ignorable waits to using the temp table we build, so they only have to be maintained in one place.
#1735 – Ignore poison waits unless they’re > 1 second.

sp_BlitzIndex Improvements
#1705 – In some of the checks around nonclustered indexes, we were counting disabled and hypotheticals towards the total.

sp_BlitzWho Improvements
#1721 – @osumatt fixed things up so we get all the fancy memory grant columns in 2017!

sp_AllNightLog and sp_AllNightLog_Setup Improvements
#1727 – @dalehhirt checked in a change that allows you to ignore databases on the restore side.

sp_BlitzQueryStore Improvements
Nothing this time around – WON’T SOMEONE PLEASE USE THE QUERY STORE?

sp_DatabaseRestore Improvements
Nothing this time around

PowerBI
Nothing this time around

sp_BlitzLock
Nothing this time around

sp_BlitzInMemoryOLTP Improvements
Nothing this time around

sp_BlitzBackups Improvements
Nothing this time around

sp_foreachdb Improvements
Nothing this time around

You can download the updated FirstResponderKit.zip here.

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs, heh.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes us!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.

Free online training next Friday - register now for GroupBy September.

[Video] Office Hours 2018/8/29 (With Transcriptions)


This week, Brent, Erik, and Richie discuss Microsoft cumulative updates, Always Encrypted, query tuning, poison waits, the DBA career, CXPACKET waits, THREADPOOL issues, reporting services, and more.

Here’s the video on YouTube:

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

If you prefer to listen to the audio:

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours Webcast – 2018-08-29

 

Brent Ozar: John asks a question that is a stumper. John says, “Microsoft released 2017 Cumulative Update 10 yesterday. I tried to look through the hotfixes included, but I don’t see any reference to the brand new security hotfix they just put out. How can we tell if CU10 includes the security hotfix or not?” None of us know. For those, if you’re only listening to the podcast, we’re all doing various interpretive dance here.

Erik Darling: Yeah, you have to install it to find out what’s in it.

Brent Ozar: Yeah, and then you would even have to know how to trigger whatever the GDR hotfix was, how to trigger whatever thing it’s doing in order to improve your security. That is, what we call, disappointing.

Erik Darling: Yeah, well I mean, Microsoft’s documentation is supposed to be open source, so you could maybe ask them to improve upon that.

Brent Ozar: Or, you could submit two pull requests. You could submit one pull request that says it has the security fix and one that says it doesn’t and see which one they accept.

Erik Darling: Yeah, play both sides of that coin.

Richie Rump: You don’t ever do that in my code base, Brent Ozar. You don’t ever get to do that.

Brent Ozar: Oh god, I check in some pretty crappy pull requests, I will say that. Richie found some of my terrible SQL the other day and he’s like, “Brent, this can’t possibly be right.”

Richie Rump: I’m not saying I spend all afternoon fixing that yesterday. I’m not saying that at all; none whatsoever. But the unit tests passed and that’s the important thing.

Erik Darling: Ladies and gentlemen, we have a man who has eaten a $1000 pizza by telling people to make their queries SARGable using upper in a where clause. That’s where we’re at.

Brent Ozar: You know what, it wasn’t – even worse, I’ll raise you more than that. I’ll raise you a $1000 pizza over that. It wasn’t in the where clause; it was in a join.

Richie Rump: It was in a join. It totally was in a join.

Brent Ozar: I was uppercasing two sides of a join. And, of course, Richie, god bless him, has to keep the poker face when he comes in and asks me, “Hey, Brent, can you tell me a little bit about what’s going on?”

Richie Rump: No, I’m pretty sure it was, “It’s Brent Ozar’s fault. Look at this line…” And then he goes, “We need this.” And I’m like, “Okay.” And I start working some derived table magic and make it look like Frankenstein’s query.

Brent Ozar: That’s not pleasant.

Richie Rump: But it went from like a minute 20 with this one particular set of data, and it went to two seconds and I’m like, “Okay, we’ll call this one done.”

 

Brent Ozar: Yeah, on a related basis, Nick also asks – he says, “It seems like the quality’s been going down on Cumulative Updates lately. We saw the ones recently where they pushed out a Cumulative Update and then rolled it back for the security fix, pushed out again another change and rolled it back. Do you have any comments on that?” My thought is, yeah it does seem like the monthly cadence for patches is a little bit more than SQL Server can handle right now, or a little bit more than Microsoft’s testing seems to be able to handle right now. I don’t blame them. It’s hard. There’s a huge surface area to cover, you know, but yeah. I have a much lower confidence level in Cumulative Updates than I used to have.

Richie Rump: And we’re a small shop, right. I mean, we just have a code base that we started a year and a half ago and I’ve got, you know, a whole slew of unit tests and it goes through. And it just this last week alone, I go and put a fix in and then something else breaks, you know. Everything passed, everything looks great, but something else that we didn’t consider came up in fix. So imagine something a code base, a freaking SQL Server that’s been around for 30 years and how you would test all the permutations of all the crazy stuff that we see out in the wild. That’s really, really hard, especially when you’re trying to pump these things out as quickly as possible.

Erik Darling: I mean, we do a monthly release of the First Responder Kit, but thankfully the change churn is a lot smaller. I’m not saying, like, everything always goes well 100% of the time. There’s obviously some craziness out there in the world that we can’t account for either, but I’d like to do some due diligence testing and at least make sure everything compiles without too much red text. That’s just always a good sign.

Richie Rump: Are you guys hearing all that noise in the background?

Erik Darling: No…

Richie Rump: Oh good, my audio is – my parents are getting a screened in pool and they’re installing it right now, so they’re drilling all over the place.

Brent Ozar: Wow, nice. Are they using that with your rent money? They’re taking your rent money and putting it towards it?

Richie Rump: No, my rent money is still going to my house to my mortgage I still have to pay even though we’re not living there.

 

Brent Ozar: Tammy asks, “What are some reasons not to use Always Encrypted?”

Richie Rump: I guess the question for me would be, what are the reasons you should use Always Encrypted?

Erik Darling: What do you need to encrypt? What are you going after? Do your queries need to search encrypted things? These are all questions that I would want to ask up front.

Brent Ozar: Do you use linked servers or replication with it too, because that can throw some monkey wrenches in your ability to replicate that data from one place to another and decrypt it on the other side.

Erik Darling: Like, do you have to restore stuff to dev or do refreshes and stuff, because then you have to deal with whatever certificates, moving those things around. Obviously, security makes things more complicated. I know from the fact that last week, I entered an RSA token roughly 3000 times. But yeah, obviously, security makes things far more complicated. You know, if you want reasons not to use it, because it makes your life more complicated. If you want reasons to use it, because it probably may not get you fired having some.

Brent Ozar: And I know a lot of shops that needed encryption and they already rolled it in the app layer before SQL 2016 came out or when it was still Enterprise Edition only. So if you already needed it…

Erik Darling: What’s that plug-in – NetLib – that does the TDE?

Brent Ozar: Yeah, NetLib Encryptionizer, I think it’s called.

Richie Rump: Yeah, and some of the projects I’ve worked on, they just encrypted the data that needed to be encrypted, as opposed to just kind of doing everything. And with TDE, of course, you don’t wanna lose the cert – I mean, there’s a lot of things that you could really screw up on some of that stuff.

 

Brent Ozar: Lee says, “Thank you all for the First Responder Kit and the other information and scripts you share. I’m being given new servers that the vendors are abandoning to use, so all these things help me get a handle on the new-to-me servers and fix the weirdness of vendor installs.” You’re welcome; Erik busts his hump on that.

Erik Darling: Vendors are the worst…

Brent Ozar: Especially us.

 

Brent Ozar: Pablo asks, “What influences SQL Server to choose between a hash match and a nested loops join? Identical dev and prod servers are showing this difference and one query took five minutes on one side and two hours on the other side.”

Erik Darling: Boy, there are so many things that could be different between dev and prod. It’s a terrifying question. I would start looking at the plans themselves. Like, you know, aside from just looking at the join type, look at how many rows we’re expecting from one place to another. Beyond that, look at what indexes are getting used. So cardinality estimation is going to be one, indexes in use are going to be one. Obviously, there are some join strategies that work better with indexed data, whereas hash joins can excel with unindexed data, because it’s going to hash all the values anyway. Memory is another big one; available memory. If SQL Server is, like, no, you know what, we’re just not going to get a good enough chunk of memory to do this hash join. I am going to go with this nested loops join. So there’s all sorts of strange considerations. There’s a lot to look at for you, young man.

Brent Ozar: Damn, the memory one’s a good one. I forget about that.

 

Brent Ozar: Gordon says, “This isn’t really a replication question…” Gordon, you’re pushing the limits. Gordon says, “I’m looking at migrating an on-premises environment including merge replication up to Azure SQL DB. Would SQL Data Sync be a replacement for the replication?”

Richie Rump: Do they have that in AWS with Postgres; I’m confused?

Brent Ozar: Merge replication – my question would be, what are you doing merge replication for? If you’re doing it for high availability between two places then Azure SQL DB kind of replaces that. If you’re doing it to let people, like, change stuff on-premises and change stuff up in the cloud and keep it in sync between the two, I would not do that. I don’t have any good points I would – like, so many times, I see people doing merge replication and they’re doing it between two shoe boxes. Like, one box can’t keep up with the load, and I look and it’s a VM with four cores and a VM with three cores and they’re like, we had to scale across separate boxes; maybe scale up a little bit?

 

Brent Ozar: Steve asks – he says, “ConstantCare sent me a poison waits email. Can you talk about poison waits a little?”

Erik Darling: Which one?

Brent Ozar: I should go look at Steve’s data. Hold on, let’s see.

Erik Darling: You should. Let’s pause for a moment.

Richie Rump: Yeah, we have, what, three poison waits warnings in ConstantCare now?

Brent Ozar: We do, so let’s see. He has – the one he’s getting is – I don’t see poison waits in that one. Where on earth – which poison wait is he getting? Steve, tell us about the one you’re actually getting.

Erik Darling: Steve, you’ve confounded us.

Brent Ozar: Yeah, I’m going in to look – oh, resource semaphore. Okay, so we got one on resource semaphore.

Erik Darling: Ooh, you’ve got some memory issues. More specifically, when SQL Server goes to run queries, it has these queues of memory that it sticks them in. And when certain queues fill up, they’re prioritized and they will hold on to – and other queries have to wait for those queries to finish and give up their memory so they can run. When you run into that, you’ve got resource semaphore. So it’s a memory wait and it’s caused by a mix of queries asking for large memory grants and queries asking for any memory grants getting prioritized.

So usually, what you want to start doing is, if you’re on a version of SQL Server that supports it, maybe Brent can tell me from the magical data – if you’re on, like, 2016 or 17 or 2014 with a Service Pack or 2012 SP3 or something, you can start looking at memory grants. You can use sp_BlitzCache and you can sort by memory grant and you can start looking at what queries are using a lot of memory. But that kind of downside to this is, if you’re hitting resource semaphore waits, you are most likely going to be hitting some cleared plan cache issues. So, like, when SQL Server goes to use memory for a query, if you are running out of it, like instance-wise, you don’t have enough memory to cache your data and then you don’t have enough memory to also run these queries, that memory has to come from somewhere.

The stuff that gets cleared out when a query needs memory is the plan cache and the buffer pool. One query can ask for, by default, up to about 25% of max server memory. And queries as a whole can actually take just about 75% of your server memory running all at once. So you could be clearing out a significant amount of plan cache and buffer pool data to run these big queries.

So take a look. If you can, run BlitzCache. Figure out what’s big by memory grant, if you’re on a version that supports it, start looking at those. Otherwise, you’re stuck running some crappy DMV queries that just constantly look at what’s running in the memory grants there.
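A quick sketch of the sp_BlitzCache call Erik is describing; the parameter values are just a starting point:

-- Top queries by memory grant; needs a build of SQL Server that exposes grant info (see above)
EXEC sp_BlitzCache @SortOrder = 'memory grant', @Top = 10;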

Brent Ozar: And you also don’t take biblical action based on what comes out of this sp_BlitzCache just the first time you run it. Because remember, like you said, maybe your memory’s been cleared out or parts of your buffer pool or plan cache have been cleared out. Run this a few times over the course of several days, just to get a rough idea of which plans are in there recently.

Erik Darling: To kind of get a feel for that, Brent, you don’t mind scrolling down a little bit in the second window, there will be a little bit of information about what’s in your plan cache. So it will tell you how many plans you have and what percentage of them were created over different chunks of time. So if it’s, like, a lot of plans getting created in the last four hours, then obviously you have less plan cache stability than you’d want to start making those big decisions based on.

Brent Ozar: Good point.

 

Brent Ozar: Next up is [Xuan], I believe it is, says, “How can I grant users without sysadmin roles the rights to modify agent jobs that are owned by SA?” So like, they want to play around with agent jobs but not give people SA.

Erik Darling: I don’t know that you can.

Brent Ozar: I want to say that isn’t there a role for – I’m in the wrong place, of course. I think there’s a server level role for agent stuff. I could have sworn there was.

Erik Darling: I don’t know, but if you’re running into the situation where you have so many people who need to mess with agent jobs, I would just want to give them an agent jobs server for themselves that they can just go and do whatever to their heart’s content that doesn’t make them SA or doesn’t promote them beyond the point that you would be comfortable with on the prod server.

Brent Ozar: I adore that so much.

Erik Darling: Can you do that with SQL Server Express, or do you actually need Standard?

Erik Darling: Well, Express doesn’t have agent.

Richie Rump: Oh, that’s right.

Brent Ozar: Diabolical.

Erik Darling: Stupid Express…

 

Brent Ozar: Samantha, who shares the name of one of my favorite Big Brother people this season – Samantha asks, “Have any of y’all ever reached a point in your career where you doubted if it was the right path for you? There are days where I feel like a rock star and there are days when I wonder how the hell I got here. Does anyone else feel that way? I should have been a geologist; rocks don’t change this fast.” When was the last time – yeah, I’m sure it’s happened to all of us. When was the last time it happened to you guys?

Erik Darling: I don’t – I think, once I kind of got into SQL Server, I knew that’s what I wanted to do. Whether it’s what I always want to do in my life is another story, but you know. I’m comfortable with it as a career path now. I like the rate of change, you know. It’s nice having new stuff to look forward to and learn about. To be honest, if SQL Server is going to continue the every two years thing that they were on, or even longer stretches earlier, I think I would get bored because you sit there and you stare at the same product for however long. Like, I’m sick of 2008 R2. I can’t wait for those to go away.

Richie Rump: Well, but to be fair, you don’t look at the whole product. You look at, especially, the engine, right? I mean, you’re not SSIS on top of that.

Erik Darling: I would be really bored if I was doing that.

Richie Rump: Oh no, absolutely, yeah.

Brent Ozar: Yeah, I hit it big time when I was on Richie’s track. When I was a developer in the late 90s, early 2000s, I was like, I am on the complete wrong track. I suck at learning languages. I love learning; I suck at learning languages. It’s like, syntaxes that are different between platforms, and the more that I sat on the developer track, I’m like, I am never going to get better at this. I don’t enjoy it, so I’m going to go somewhere where I still have to learn but the language doesn’t change. Like, SQL has been essentially the same for forever, so I totally felt that way and that’s when I switched over to databases fulltime.

Richie Rump: I think I was a little different because I started off as a developer. I kind of always knew I wanted to be a developer when I got to college. I was going to do electrical engineering, and then psychology, and I kind of fell into the computer thing and I’m like, writing software, that’s my jam. And then I kind of moved up in the company I was in. I was a senior and then they made me a manager and then I was doing project management stuff and doing all this other manager type stuff. I did that for about five years, and then I’m like, I’m not happy. I don’t really like what I’m doing. I don’t really like dealing with some of these people and having to listen to what they think the problem is. That’s not the problem, and all this other stuff.

And then when I went back to development, I was so thrilled. And I consider working with SQL Server and doing database development, development. So that’s still development in my head, but man, I strayed from my path and I was just so miserable. It was so awful.

Brent Ozar: I think it’s also – I was just talking to a client the other day about this – there’s never been a better time if you want to leave database administration. There’s never been a better time to go down all kinds of different tracks. Same thing with development. I mean, the whole world is open. There’s all kinds of different data jobs that you can go pursue if you want to. I don’t want people to think that the DBA job is dead, that the developer job is dead, that database development is dead, anything like that.

These careers are hugely strong. But if you find out that it’s not for you, there’s never been a better time to go look at what else is happening in your organization and go, how could I spend more of my free time doing that the next fulltime position I get is only doing that piece?

And when you write your resume, I know a lot of folks who put one page of qualifications of all the things that they do that they’ve done in the past and three quarters of them, they hate doing. It just burns their soul to have to deal with that stuff. Don’t put it on your resume. Don’t put linked server troubleshooting on the front page of your resume if that’s something that you hate.

Richie Rump: So I’m taking VB3 off my resume right now.

Brent Ozar: Bad news, it’s now supported in serverless. We’re going to move exclusively to Visual Basic.

Richie Rump: In Access 2.0…

Brent Ozar: Let’s see, Samantha says, “My manager wants to make me a supervisor now. I haven’t even reached the pinnacle of my career yet.” Well I hope not. None of us have reached the pinnacle of our career yet. And says, “I love IT. I love DBA. I’ve hit a crossroad.” Okay, so that I really can’t speak to because I went down that path. I thought that there was a ceiling limit on database administration, and when my manager said, do you want to lead the team? I was like, sure, management? Why not? I hated management. I suck at management, as all of our employees can tell you. I am hurp a durp completely at management. It is a totally different skill set than technical and I have huge respect for people who are good at management; people who can manage other people, drive them toward a particular goal or whatever. And if you don’t like it, don’t. There are so many places that you can go technically without having to manage people.

Richie Rump: Yeah, I think I was a little different from your experience, Brent, where I actually did it. I was good at it. I delivered the largest project that the company had ever even approached. But what I didn’t like was having to deal with middle-management because there’s such a power struggle. Dealing with the CIO and CEOs, they’re phenomenal. They’re great people. They want things to get done. Dealing with the people that actually get the work done, the lower level workers, they’re great because they just want to get the work done.

And then you get these power struggles within middle-management and I moved into middle-management and it was awful. I mean, people were backstabbing me and they were backstabbing each other and they’re all jockeying to get upper-level stuff. And I’m like, I just want to get this project done. Why are you guys – this is terrible. This is awful. We’re supposed to be working as a team, right? I did a trust-fall thing and you caught me and we were supposed to be a team, and it’s not the way some organizations work.

So I guess, if you think that you’d be good at it and it’s interesting to you, then go for it. Try it out. Maybe you could do it on a trial basis. But if it’s something like maybe, maybe not, I kind of like what I’m doing now, then hang back.

Erik Darling: Until this day, they sabotage Richie’s audio.

Richie Rump: Again?

Brent Ozar: Yes.

 

Brent Ozar: Let’s see. Joseph says, “Can y’all give a sentence or two about how CXPACKET waits in SQL Server 2016 Service Pack 2 can be caused by bad query estimates?” Let me rephrase it, how can bad estimates cause CXPACKET waits?

Erik Darling: I don’t necessarily think that it’s a bad estimate thing. I think that it’s a bad distribution thing. So usually, when I see CXPACKET waits that are like bad, bad, it’s for one of a few reasons. Either the CPUs are just tanked and they can’t talk to each other fast enough, so CXPACKET ramps up. Because if you have threads are just sitting there saying I need to talk to you but I can’t talk to you yet, they’re getting scheduler yielded all over the place, they’re in balanced power mode or something and they’re just not running up at full speed. They’re running at like half or 75% speed, so they’re not talking to each other as quickly as they should be.

And then, finally, there is just when parallelism is skewed all over the place. And I wrote a blog post about it recently where you can see how that will trigger bad CXCONSUMER waits; a query that ran for two days with one thread getting bajillions of rows. So that’s really where I see it going. I don’t think a bad estimate – maybe if like, you know, SQL Server really overestimates the number of rows that are going to come out of something and decides to use a parallel plan instead of a serial plan then maybe. Because that would potentially lead to the kind of skews I’m talking about where, like, one value or one set of values ends up demolishing one thread and all the other threads are sitting around doing absolutely nothing, or close to nothing.

So that’s what I’d check. The only thing is, that stuff isn’t in cached plans or anything that I could tell you anything useful about or like how to go find it. It’s something that you really have to catch in action. You have to, like, narrow it down to which plans are generating that kind of rotten CXPACKET and then go troubleshoot them one by one.

 

Brent Ozar: Joe asks, “Similar to a previous question, why would we not want to use row or page compression; it seems like a no-brainer?”

Erik Darling: It’s like – heavily modified tables, I think, have issues with compression. I can’t remember the ins and outs of it because it’s been a long time since I’ve thought about row and page compression in depth. But I think, if it’s not a heap and you insert the data to it then compression is delayed or something, or compression doesn’t kick in until you rebuild or something weird. I don’t know, there’s something odd about it. [Rinchy Shay] wrote amazing blog posts about it ten years ago that…

Brent Ozar: Yeah, and every time somebody asks us that, we have to still go back to the books. There’s also something to do with if the data set already fits in memory and you just incur extra CPU on crunching it, that was another one.

Erik Darling: You know what I always hated about it is, like, when – I was just like, cool, data compression. We’ll use this stuff, we’ll compress stuff and then maybe we’ll do better with memory usage. But as soon as you read that stuff from disk, it’s uncompressed and you need the same amount of memory to manage the uncompressed data. I was like, I really wanted that to be a thing.

Brent Ozar: Plus these days, a lot of times when people are asking about compression – oh, there’s two other things; that it doesn’t compress off-row data, so if you have big strings. But these days, when people ask about compression, I’m usually like, haven’t you checked out column store? If you really want amazing compression, that seems to be a better way to do it with the kind of data that people expect to compress.

Erik Darling: Especially the kind of indexes that people are making when they want the data compression. Like, these big honking indexes with a billion includes and key columns everywhere and it’s never anything pretty. It’s never for a pretty reason that people are like, maybe we’ll compress a few things. It’s always some big awful ugly table that is full of giant, giant columns and you’re just like, no…
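If you’re weighing compression for a specific table anyway, a quick sanity check looks something like this (a sketch; swap in your own schema, table, and compression type):

-- Estimate space savings before committing to PAGE compression
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'Posts',
     @index_id         = NULL,   -- all indexes
     @partition_number = NULL,   -- all partitions
     @data_compression = N'PAGE';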

 

Brent Ozar: Next, Jeremy asks, “We’re running Always On Availability Groups and we have a File Share Witness. The policy default is if the resource fails, try to restart a max one time in 15 minutes. What do you recommend?” I don’t think we’ve ever told anyone to change that. I don’t think I’ve even ever looked at it. But the two people I would ask are either Edwin Sarmiento, who teaches Availability Groups classes for us; Edwin Sarmiento or Allan Hirt, who has a book coming out any day now about SQL Server Availability Groups and he may even cover it in there.

 

Brent Ozar: Steven says, “Any suggestions for tracking down the cause of this error; SQL Server failed to connect with error code… to spawn a thread to process a new login or connection?”

Erik Darling: Yeah, baby, you got THREADPOOL. You ran out of worker threads.

Brent Ozar: And where would you go troubleshooting that?

Erik Darling: Well, let’s see. That’s a tough one to catch, because it’s either happening and hitting F5 is really difficult, or it’s not happening and it doesn’t look like anything of the sort is going on. This is where a monitoring tool – actually no, because monitoring tools blank out for that too. Monitoring tools don’t have threads.

So this is where the remote DAC comes in really handy, because I would pay attention to when you see this in the error log. I would pay attention to your wait stats. So if you see this THREADPOOL wait inching up, then you need to have the remote DAC turned on, unless you’re connecting locally. But hopefully, you’re not already RDPing into a server with THREADPOOL issues and running Management Studio and doing other stuff. So I’m going to assume that you’re a good person and you’re running remote SSMS to get into the server.

Turn on the remote DAC, get in there, use sp_BlitzWho or sp_WhoIsActive. Hit F5 when things start to go south or if you notice things getting weird, and you’ll be able to catch all the THREADPOOL goodness in action. THREADPOOL is, like I said, running out of worker threads. Don’t try to adjust the max worker threads sp_configure option, because what that does is basically – picture 30 kids screaming in a classroom. If you turn that up, you just add more screaming kids to the classroom. It might fix the THREADPOOL, but your CPU’s now just overwhelmed with screaming kids. So don’t do that.

It really comes down to, usually, figuring out if the number of CPUs you have is good for your workload, which if you’re hitting THREADPOOL it most likely isn’t. And then, of course, tuning queries and indexes to prevent the kind of awful blocking scenarios that go on, because typically, when we see folks hitting THREADPOOL issues, it’s a lot of parallel queries starting up, running, reserving threads and getting blocked. And those threads don’t get given back, right. Those threads are reserved for those queries until the queries are done with them. So you just have all these queries piling up, reserving threads, blocked, sitting there doing nothing. The server might look bored too because it’s not doing anything. But anyway, that’s where I’d look.
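For reference, turning on the remote DAC is a one-time server setting (a minimal sketch):

-- Allow the Dedicated Admin Connection from a remote machine
EXEC sp_configure 'remote admin connections', 1;
RECONFIGURE;
-- Then connect with the admin: prefix (for example, admin:YourServerName) and run sp_BlitzWho or sp_WhoIsActive.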

Brent Ozar: Screaming kids, I don’t know that we’re ever going to beat that analogy; a group of screaming kids, just adding more screaming kids to the room. That’s fantastic.

Erik Darling: Yeah, that’s all it is.

 

Brent Ozar: And the last question we’ll take for today, Darshan asks, “For reporting purposes, we created a table and we dump data into it so users can query their PowerBI reports with it. Now the table is huge, so what would be the ideal solution to move it?” The thing I would say is, define what huge is, because for the rest of your life, you’re always going to be working with the largest table you’ve ever worked with.

Generally, when they talk about a very large database, a VLDB, the general standard is a billion rows in one table or 1TB worth of data. So when you say huge, generally, you’re talking over a billion rows in it. He says, “Should I think about moving it to Azure SQL DW?” You could. There are all kinds of things that you can think of that point architecturally.

But when you come to the point where you’re thinking about a different database platform or different backend for reporting, then I would stop to think about bringing somebody in who’s done it before. And that’s not a sales pitch for us because we don’t do data warehouse architecture either. But you’re getting ready to make a decision that will haunt you for the next five years or more.

Bring in a specialist on the platform you’re thinking about working with just because if you only read Books Online, every one of them looks like it can handle huge datasets. You want to find somebody who’s done it and found the gotchas.

Richie Rump: And, you know, the add-on to that, I consulted with the company that brought in Teradata and they spent millions on this server; millions. And then things weren’t performant and it’s supposed to be fast, but because of the way they architected the data, it wasn’t. There you go.

Erik Darling: Yeah, if you’re on a version that supports it, I would want to look at column store for that very big table, just to see if it continues being very big, because it sounds like if you have that kind of table where folks are dumping lots of stuff in, that’s not going to be a long table; that’s going to be a wide table. There will be all sorts of dumped, badly named columns in there. So I would want to look at column store, either clustered or non-clustered, especially because you’re saying Azure SQL DW, those are pretty much your only options once you get up there. So if you want a preview of how things might look there then column store on-prem, if it’s available, would most likely be a good way to test that out.

Richie Rump: And for the record, Brent, ConstantCare now has over a billion rows in one table…

Erik Darling: What kind of compression are you using?

Richie Rump: What’s that? I don’t know, never heard of it.

Brent Ozar: And I can’t pull any of them down with PowerBI. It’s so funny, PowerBI, all the ads are like, just connect in and get all the data. That is not how it works. Alright, well thanks, everybody, for hanging out with us this week and we’ll see y’all next week on Office Hours.

Registering for Free Training Webcasts

Free online training next Friday - register now for GroupBy September.

Building SQL ConstantCare®: You Should Probably Learn Power BI. I Did.


When I was prepping for the PASS Summit last year, I wanted to unveil something awesome and free that would help you analyze your SQL Server’s performance. I ended up building the Power BI Dashboard for DBAs, an easy way for you to visualize your server’s wait stats.

When we started building SQL ConstantCare®, our mentoring service, I did not want to build a user interface. I know, it sounds crazy, but I wanted to just send you regular emails about what actions you needed to take on your SQL Server. I didn’t want to give you numbers – I just wanted to help you take your next steps towards healthier, faster databases.

Uh oh, things are getting worse

I didn’t want to send you numbers.

However, I needed those numbers, so I use Microsoft Power BI.

You might say, “Brent, that’s because you’re a Microsoft shill,” but get a load of this: Power BI Desktop is the only Microsoft component in our entire stack! The data is uploaded to Amazon Web Services, where Richie imports it into AWS Aurora PostgreSQL, and sends you automated emails with AWS Lambda code written in JavaScript.

If you work with data, Power BI Desktop is like a next-level Excel. I bet you know how to use Excel, and I bet you’re not wildly happy about it – you just use it in your development and DBA jobs for 100-level data plumbing tasks.

I firmly believe you will be a better developer and DBA when you learn Power BI Desktop. You won’t be great at it, and you’ll resent it just as much as you resent Excel, but it’ll serve you and your career better. It’s just a better way to render data.

“Okay, I’m in. What the hell is Power BI Desktop?”

Let’s start with Excel. There are two ways to get data into Excel:

  1. Manually enter it. This is bad, but let’s be honest, this is how most of the world uses Excel. They create a spreadsheet, put data in, and then it becomes the single source of the truth.
  2. Create a data source, like point to a SQL Server. Build your queries in Excel, import the data, and then whenever you want to see it updated, refresh it. This kinda uses Excel as a front end, like Access, giving you Excel’s cool graphing capabilities. This is awesome, but…not a lot of folks use Excel this way.

The free Power BI Desktop is specifically designed for that latter use case. It ain’t the home for your data – it’s just a front end for data that lives somewhere else. Like Excel in the latter use case, all your logic lives in a file, except instead of .xlsx, it’s .pbix. You can hand the .pbix file to somebody else and they can see your reports as-is, or hit Refresh and fetch the latest data from the sources you defined.

I’m a huge fan, but before you go hog wild, I wanna caution you about two signs to look for that’ll tell you when it’s time to bring in a Power BI pro:

  1. Watch out for files that go viral. Sooner or later, you’re going to hand that .pbix file to somebody else, and they’ll make their own changes, and your changes and data will get all out of sync. (That’s not a Power BI Desktop problem: that’s a source control and discipline problem.) That’s where Microsoft’s online services for Power BI come in. You can upload/publish your report to PowerBI.Microsoft.com or to your own on-premises report server, define access permissions for who can see it, configure how to connect to the database, and set up regular refresh rates for your data so everybody’s seeing the same stuff at the same time.
  2. Watch out for rapidly changing multi-GB data sets. If you build your report in a way that you have to import the entire data set every time you want to get the latest changes, you can have a bad time as your database grows. You’ll hit a point where you’ll want to switch to incremental refreshes of your data, and right now, that is by no means a trivial change. (That’s not a Power BI Desktop problem either – it’s just the same problem that every data warehouse load process faces as it switches from skunkworks to enterprise-grade.)

As long as you stay mindful of those two warning signs, you can just get by with the totally free Power BI Desktop, building reports for your own personal & occasional team use.

“So Brent, what are you doing with Power BI?”

In the last week, 247 customers uploaded 9,913 collections of DMV data for 1,482 servers. Just to pick a random DMV, we’ve got 117,633,525 rows in sys.dm_os_wait_stats right now. When you wanna be the Senior DBA for 1,482 servers, you gotta think at a larger scale.

I can’t import all of that data into Power BI Desktop, so my general workflow goes like this:

  1. In Postgres, write DMV queries to gather the right raw data (filtering out things that don’t matter, and creating new calculated columns that I can use for analysis)
  2. In Power BI, use those queries as data sources, and then build visualizations to spot problems faster
  3. Every day, open Power BI, look for customer servers that need attention. Sometimes, I can get to the root cause just with Power BI. Other times, I use it as a launching point, then go run Postgres queries directly against the customer’s DMV data. If I find that I’ve needed a query to solve a particular problem several times, then I try to take that query back up to step 2, and bring the data into Power BI so I can move faster the next time I see that problem.
  4. Really long term – if I see a problem happening over and over – turn that thought process and query into an automated rule that Richie can build.

In short, I’m using Power BI Desktop and Postico (like SSMS for Postgres) the same way I’d have used Excel and SSMS ten years ago. Power BI Desktop has gradually replaced Excel in my data work. It’s not perfect by any means – no copy/paste, can’t sort on two columns, hard to reset filters (that item isn’t completed in Power BI Desktop), hard to hyperlink out of it, no Mac client, etc. – but for me, the tradeoffs are worth the end result.

Ready to learn Power BI with me this month?

Dashboard in a Day

On September 18th, the Dashboard in a Day class will teach you how to:

  • Connect to, import, and transform data from a variety of sources
  • Define business rules and KPIs
  • Explore your data with powerful visualization tools
  • Build stunning reports
  • Share your dashboards with your team and/or the world

It’s taught by analytics pro and MVP Steph Locke. I’ll be in there myself – I’m invested in this tool, and I need to sharpen my skills.

Free online training next Friday - register now for GroupBy September.

Single-Column-Key Missing Index Recommendations are Usually Wrong.


When you’re looking at an index recommendation – whether it’s in an execution plan or the missing index DMVs – it helps to understand Clippy’s blind spots.

Let’s start with the small StackOverflow2010 database so you can follow along. (It’s just a 1GB direct download, and it expands to a 10GB database that reflects StackOverflow.com circa 2010.) With no nonclustered indexes present yet, run this query:

SELECT *
  FROM dbo.Users
  WHERE Location = 'India'
  ORDER BY DisplayName;

The execution plan pipes up with an index recommendation, but it’s terrible, because it would double the size of our table, including every single column:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Users] ([Location])
INCLUDE ([AboutMe],[Age],[CreationDate],[DisplayName],
  [DownVotes],[EmailHash],[LastAccessDate],[Reputation],
  [UpVotes],[Views],[WebsiteUrl],[AccountId])
GO

And here’s the kicker: note that the only key in the index is Location. Clippy says, “Oh, you don’t need to sort them by DisplayName – I’ll take care of that for you.”

After we create Clippy’s notoriously bad index, sure, he uses the index – but then he turns right around and sorts all of the results by DisplayName, every single time the query runs:

Clippy’s all out of ideas

Clippy says, “Man, I just have no idea how I might possibly make this query go faster. If you need me, I’ll be over here manually sorting rows. I wanna make sure you get the most of your $7,000 per core licensing for Enterprise Edition, so a-sorting I go!”

When you see an expensive sort operator in a plan, sure, you could just hover your mouse over that Sort, look at the “Order By” at the bottom of the tooltip, and use that knowledge to craft a better index than Clippy came up with.

This stuff is easy to spot in plans, but less easy to see in the DMVs.

When you’re looking at a query plan with a ridiculous index suggestion like that, it’s easy to say, “Yeah, Clippy’s been drinking on the job again.”

However, when you’re looking at the output of the missing index DMVs – especially with a tool like sp_BlitzIndex – I’ve seen a lot of folks say, “Well, that must be the right index.”

Here’s an easy way to improve Clippy’s index recommendations: when you see a single-key recommended index with just 1-2 included columns, think about moving those includes to the key. In most cases, it doesn’t cost that much more in terms of index space or performance, and it can eliminate those extra pesky sorts even before you spot the query plans involved.
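For the query in this post, that means promoting DisplayName into the key so the index hands back rows already sorted. A sketch (the index name is mine; your own queries dictate the exact key order and includes):

-- Location satisfies the WHERE, DisplayName satisfies the ORDER BY, so the Sort disappears
CREATE NONCLUSTERED INDEX IX_Location_DisplayName
ON dbo.Users (Location, DisplayName);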

And if you see a single-column key with a ton of includes, it likely means someone’s doing a SELECT *. Clippy’s just throwing everything in the include – but you might actually need some of those columns to be sorted, too.

We’ll be talking about how to do that in this week’s Mastering Index Tuning class.

Free online training next Friday - register now for GroupBy September.
