Quantcast
Channel: Brent Ozar Unlimited®
Viewing all articles
Browse latest Browse all 3153

How many CPUs is my parallel query using in SQL Server?

$
0
0

Parallelism can be confusing. A single query can have multiple operators that run at the same time. Each of these operators may decide to use multiple threads.

You set SQL Server’s “max degree of parallelism” to control the number of processors that can be used, but it’s not immediately obvious what this mean. Does this setting limit the total number of CPU cores the entire query statement can use? Or does it limit the total number of CPU cores that a single parallel operator within the query can use?

Good news: we can see how it works by running a simple test query and looking at some SQL Server DMVs.

My parallelism test setup

This test is run against a virtualized SQL Server with 4 virtual CPUs. SQL Server’s “max degree of parallelism” setting is at 2. My test instance is on SQL Server 2014.

My simple parallel query

All of my observations are taken while running the following test query against a restored copy of the StackOverflow database. I have ‘actual execution’ plans turned on so that when I query sys.dm_exec_query_profiles, I get some interesting details back.

SELECT p.Id, a.Id
FROM dbo.Posts as p
JOIN dbo.Posts as a on p.AcceptedAnswerId=a.Id;
GO

Here’s what the execution plan looks like:

parallel plan

The plan has seven operators with parallel indicators. Two of those operators are scanning nonclustered indexes on the Posts table.

The question we’re exploring here is whether the scans of those nonclustered indexes will use multiple threads that share the same two CPU cores, or whether they will each get two different CPU cores (and use all four).

Starting simple: tasks and workers

While my parallel ‘Posts’ query is executing on session 53, I can spy on it by querying SQL Server’s DMVs from another session:


select ost.session_id,
    ost.scheduler_id,
    w.worker_address,
    ost.task_state,
    wt.wait_type,
    wt.wait_duration_ms
from sys.dm_os_tasks ost
left join sys.dm_os_workers w on ost.worker_address=w.worker_address
left join sys.dm_os_waiting_tasks wt on w.task_address=wt.waiting_task_address
where ost.session_id=53
order by scheduler_id;

Here are a sample of the results:
schedulers-workers

The scheduler_id column is key. Each scheduler is mapped to one of my virtual CPU cores. My query is using 2 virtual CPU cores. At this moment I have two tasks on scheduler_id 0, and three tasks on scheduler_id 2.

But why leave it at that, when we can overcomplicate things? Let’s poke around a little more.

dm_exec_query_profiles and query plan nodes

There’s one more thing you need to know about our query plan. Each node in the plan has a number. The index scans are node 4 and node 6:
parallel plan -nodes
If I run my query with ‘actual execution plans’ enabled, I can spy on my query using the sys.dm_exec_query_profiles DMV like this:

select ost.session_id,
    ost.scheduler_id,
    w.worker_address,
    qp.node_id,
    qp.physical_operator_name,
    ost.task_state,
    wt.wait_type,
    wt.wait_duration_ms,
    qp.cpu_time_ms
from sys.dm_os_tasks ost
left join sys.dm_os_workers w on ost.worker_address=w.worker_address
left join sys.dm_os_waiting_tasks wt on w.task_address=wt.waiting_task_address
    and wt.session_id=ost.session_id
left join sys.dm_exec_query_profiles qp on w.task_address=qp.task_address
where ost.session_id=53
order by scheduler_id, worker_address, node_id;
cpus-threads-query-nodes

Click to nerd out on this in a larger view

Here’s a sample of the output:

I’ve only got two schedulers being used again – this time it happened to be scheduler_id 1 and scheduler_id 2. Looking at the node_id column, I can see that the index scan on query plan node 4 is using both scheduler_id 1 and scheduler_id 2: the very top line and the bottom line of the output show the current row_count for the runnable tasks. The scan on query plan node 6 isn’t really doing work right at the instance this snapshot was taken.

Recap: Maxdop limits the cpu count for the query

Even if your query has multiple parallel operators, the operators will share the CPUs assigned to the query, which you can limit by the ‘max degree of parallelism’ setting.

Credits: thanks to Paul White confirming that I had the basics of this concept right. If you liked this post, you’d love his great post, Parallelism Execution Plans Suck.

Announcing our Black Friday 2015 Sale.


Viewing all articles
Browse latest Browse all 3153

Trending Articles