SQL Server processor performance metrics – Part 1 – the most important CPU metrics (sqlshack.com)
Processor: % Processor Time
The % Processor Time counter shows the percentage of time that “the processor actually spends working on productive threads and how often it was busy servicing requests.”
As soon as the computer is turned on, the processor starts executing threads. It is never completely idle: when no user or system threads are ready to run, it executes the “idle thread”.
By design, there is only one idle thread per processor, and it has the lowest priority of all threads. The basic priority classes are idle, normal, high, and real-time. The idle thread therefore runs on a processor only when no other thread is ready to run. The idle process isn’t a real process that “eats” processor resources; it only occupies the processor until a productive thread appears. A high System Idle Process percentage shows that the processor is unused most of the time.
The % Processor Time counter is calculated as the difference between the total elapsed time and the time the idle thread was running, expressed as a percentage of the total.
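The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Windows implements the counter; the function name and the sample values are hypothetical.

```python
def processor_time_percent(idle_time: float, elapsed_time: float) -> float:
    """Return the share of an interval spent on non-idle threads, in percent."""
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    if not 0 <= idle_time <= elapsed_time:
        raise ValueError("idle_time must be between 0 and elapsed_time")
    # Busy time is whatever part of the interval the idle thread did not use.
    return 100.0 * (elapsed_time - idle_time) / elapsed_time


# Hypothetical sample: the idle thread ran for 7.5 s of a 10 s interval.
print(processor_time_percent(idle_time=7.5, elapsed_time=10.0))  # 25.0
```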
On a multi-processor machine, the % Processor Time counter has a separate instance for every processor on the server. On a four-processor machine, the instances are numbered 0 to 3. Where a processor supports multiple hardware threads, each logical processor gets its own instance.
On a virtual machine, % Processor Time shows the value for the virtual, not the physical machine
The recommended value for % Processor Time is below 80%.
If the % Processor Time value is constantly high, check the disk and network metrics first. If they are low, the processor might be under stress. To confirm this, check the average and current Processor Queue Length values. If they are higher than recommended, this clearly indicates a processor bottleneck. The usual solution in this situation is to add more processors, as this enables more requests to be executed simultaneously.
If the Processor Queue Length value is low, consider using more powerful processors instead.
If the disk and network metrics are elevated, start the analysis and troubleshooting with these metrics first
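The order of checks described above can be written down as a small decision function. This is a minimal sketch: the function name, parameters, and return strings are hypothetical, and the default limits (80% CPU, a queue of 5 per processor) are simply the recommended values quoted in this article.

```python
def triage_cpu(cpu_pct: float, queue_length: float, processors: int,
               disk_elevated: bool, network_elevated: bool,
               cpu_limit: float = 80.0, queue_per_cpu_limit: float = 5.0) -> str:
    """Suggest where to look first, following the order of checks in the text."""
    if cpu_pct < cpu_limit:
        return "CPU usage is within the recommended range"
    # High CPU: rule out disk and network problems before blaming the processor.
    if disk_elevated or network_elevated:
        return "start with the disk and network metrics"
    # Disk and network look fine, so use the queue length to confirm.
    if queue_length > queue_per_cpu_limit * processors:
        return "processor bottleneck: consider adding processors"
    return "queue is short: consider more powerful processors"


# Hypothetical reading from a four-processor server.
print(triage_cpu(cpu_pct=95.0, queue_length=30, processors=4,
                 disk_elevated=False, network_elevated=False))
```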
% Processor Time is also shown in Windows Task Manager, but when multiple SQL Server instances run on the same machine, this information is of little use for deeper analysis and troubleshooting, as it doesn’t indicate which instance is responsible for the load
% Processor time shown in Windows Task Manager
To troubleshoot processor issues, it’s necessary to know which processor is under stress and which SQL Server instances are affected. You can find this out by monitoring additional parameters, such as ProcessID, and then identifying the SQL Server instance that has that ProcessID. Another solution is to use a monitoring tool that shows processor usage per SQL Server instance out of the box
Graph showing values and threshold for % Processor Time
Process: % Processor Time
Windows Performance Monitor offers two counters with similar names: Processor: % Processor Time and Process: % Processor Time. It’s important to distinguish between these two metrics and understand the information they show
As described above, Processor: % Processor Time shows the percentage of time that the processor works on non-idle threads
The Process: % Processor Time counter splits the processor time percentage per process, so each process is shown as a separate item in the graph. For more useful results, exclude the Idle and _Total instances
The total value for all processes can be misleading. If the value is 100%, it can mean that all processes are using an equal share of processor time, or that one is using 90% while the others are struggling. That’s why monitoring the processor time of each process is recommended for troubleshooting
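To illustrate why the per-process breakdown matters, the following sketch ranks processes by their share of processor time and leaves out the Idle and _Total instances. The function name, the process names, and the sample values are hypothetical; how the samples are collected is out of scope here.

```python
def top_consumers(samples: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n busiest processes, ignoring the Idle and _Total instances."""
    real = {name: pct for name, pct in samples.items()
            if name not in ("Idle", "_Total")}
    return sorted(real.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical snapshot: the total is 100%, but one process takes almost all of it.
snapshot = {"_Total": 100.0, "Idle": 0.0,
            "sqlservr": 90.0, "backup_agent": 6.0, "monitoring": 4.0}
for name, pct in top_consumers(snapshot):
    print(f"{name}: {pct:.1f}%")
```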
Processor Queue Length
The Processor Queue Length counter shows “a measure of the instantaneous size of the queue for all processors at the moment that the measurement was taken. The resulting value is a measure of how many threads are in the Ready state waiting to be processed.”
Note that the threads currently running in the processor are not included. Even on a multi-processor machine, there is only one queue for all tasks that are waiting to be processed
The typical value for this counter is 0 or 1. The recommended value is under 5 per processor. Some DBAs consider the situation to be alarming even when Processor Queue Length is constantly higher than 2. Along with high % Processor Time, a high Processor Queue Length value is a clear indicator of a busy processor
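Because the counter is instantaneous, a single high reading says little; what matters is whether the queue stays long. The sketch below checks whether every sample in a series exceeds a per-processor limit. The function name and the sample values are hypothetical, and the limits of 5 and 2 are the ones mentioned above.

```python
def queue_sustained_above(samples: list[float], processors: int,
                          per_cpu_limit: float = 5.0) -> bool:
    """True if every sampled queue length, per processor, exceeds the limit."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    # An empty series proves nothing, so it never counts as sustained.
    return bool(samples) and all(s / processors > per_cpu_limit for s in samples)


# Hypothetical readings from a four-processor machine.
readings = [12, 14, 11]
print(queue_sustained_above(readings, processors=4))                     # default limit of 5
print(queue_sustained_above(readings, processors=4, per_cpu_limit=2.0))  # stricter limit of 2
```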
The Processor Queue Length value can increase because of activity from applications other than SQL Server, too many SQL Server instances on a single machine, a high number of compilations and recompilations, etc
A high number of threads waiting to be processed combined with high CPU usage requires immediate attention. Start by checking Compilations/sec and Re-Compilations/sec. There is no specific threshold for these metrics; monitor them for a while and set a baseline for typical behavior. A high number of compilations and recompilations usually indicates poor reuse of query plans. This can be fixed by optimizing your queries and stored procedures
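One simple way to turn a monitoring period into a baseline is to flag values that lie well above the historical mean. The sketch below uses the mean plus three standard deviations; that rule is an assumption chosen for illustration, not something this article prescribes, and the function names and sample values are hypothetical.

```python
import statistics


def baseline_threshold(history: list[float], k: float = 3.0) -> float:
    """Mean of the observed values plus k sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two observations to build a baseline")
    return statistics.mean(history) + k * statistics.stdev(history)


def is_unusual(value: float, history: list[float], k: float = 3.0) -> bool:
    """True if the new value is above the baseline threshold."""
    return value > baseline_threshold(history, k)


# Hypothetical Compilations/sec readings collected during normal operation.
normal_load = [100, 110, 90, 105, 95]
print(is_unusual(400, normal_load))
print(is_unusual(115, normal_load))
```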
However, there are some specific actions (such as creating a compressed full database backup) that use a lot of processor resources and cause other tasks to be queued.
Graph showing values and threshold for % Processor Time and Processor Queue Length metrics
Processor: % User Time
There are two modes for all processes executed on a Windows operating system: a user mode and privileged (kernel) mode. The operations that require direct access to memory and machine hardware are executed in the privileged mode (I/O operations, disk access, system services, etc.). All user applications and processes (this is where SQL Server belongs) are executed in the user mode
The Processor: % User Time value “Corresponds to the percentage of time that the processor spends on executing user processes such as SQL Server.”
In other words, this is the percentage of processor non-idle time spent on user processes
If high Processor: % User Time values are caused by SQL Server, the reason can be a non-optimal SQL Server configuration, excessive querying, schema issues, etc. Further troubleshooting requires finding the process or batch that uses so much processor time
Processor: % User Time values close to 100% are a clear indication of a processor bottleneck. They mean that user processes are using the processor all the time. To troubleshoot the issue, first determine which user application causes the bottleneck
In an environment with multiple SQL Server instances, it’s important to monitor the performance of each instance individually, so that excessive processor usage is easier to troubleshoot
There is no specific recommended value for this counter; monitor it for a while and set a baseline. However, as the recommended Processor: % Processor Time value is below 80% and the recommended Processor: % Privileged Time is at most 10%, it follows that Processor: % User Time should stay below roughly 70%
Processor: % Privileged Time
The Processor: % Privileged Time counter shows the percentage of time the processor spends executing non-user processes, i.e. privileged (kernel) mode operations
“If the processor is very busy and this mode is high, it is usually an indication of some type of NT service having difficulty, although user mode programs can make calls to the Kernel mode NT components to occasionally cause this type of performance issue.”
This is another important counter that helps determine whether processor problems originate from internal Windows processes or are caused by a user application
The sum of Processor: % Privileged Time and Processor: % User Time gives Processor: % Processor Time. There’s no need to monitor all three counters, as any one of them can be calculated from the other two. Monitoring any two of these is sufficient
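The relationship between the three counters is plain arithmetic, sketched below with hypothetical function names. It also reproduces the derivation of the 70% figure for user time from the recommended 80% and 10% values given in this article.

```python
def derive_processor_time(privileged_pct: float, user_pct: float) -> float:
    """Processor time is the sum of privileged time and user time."""
    return privileged_pct + user_pct


def derive_user_time(processor_pct: float, privileged_pct: float) -> float:
    """User time is what remains of processor time after privileged time."""
    return processor_pct - privileged_pct


# Recommended limits from the article: 80% processor time, 10% privileged time.
print(derive_user_time(processor_pct=80.0, privileged_pct=10.0))   # 70.0
print(derive_processor_time(privileged_pct=10.0, user_pct=70.0))   # 80.0
```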
“the proportion of % Privileged Time to % User Time will depend on the system’s workload. A couple of common misconceptions are that there will be a balance between these two areas, and that each system will look the same. That having been said, a system performing the same workload over time should show little change in the ratio of Privileged to User time” [2]
If the Processor: % Privileged Time value is high, kernel mode processes are using a lot of processor time. The machine is then busy executing basic operating system tasks and cannot run user processes and other applications, such as SQL Server. Using a more powerful processor can help
The recommended values for Processor: % Privileged Time are 5 to 10%, or at most 30% of the % Total Processor Time (described next)
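The recommendation above can be read in more than one way. The sketch below assumes that a value is acceptable if it meets either limit: at most 10% in absolute terms, or at most 30% of the total processor time. That reading, the function name, and the sample values are assumptions made for illustration.

```python
def privileged_time_ok(privileged_pct: float, total_pct: float,
                       abs_limit: float = 10.0, share_limit: float = 0.30) -> bool:
    """True if privileged time meets the absolute limit or the share-of-total limit."""
    within_absolute = privileged_pct <= abs_limit
    within_share = privileged_pct <= share_limit * total_pct
    return within_absolute or within_share


# Hypothetical readings as (privileged %, total processor %).
for privileged, total in [(8.0, 50.0), (20.0, 80.0), (30.0, 60.0)]:
    print(privileged, total, privileged_time_ok(privileged, total))
```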
Values and threshold for Processor: % Privileged Time, % User Time and % Processor Time shown in a graph
Total Processor Time
“This counter groups the activity of all the processors together to report the total performance of the entire system. On a single processor machine, this value will equal the %Processor Time value of the processor object.”
The Total Processor Time counter shows the percentage of time during which the machine’s processors are busy, i.e. the percentage of time the processors were executing useful operations. The time spent running the idle thread is excluded
The difference between this counter and Processor: % Processor Time is that Total Processor Time shows the sum of the processor time percentages divided by the number of processors, i.e. an average value for all processors, while the latter shows one entry for each processor on the machine
The recommended value for Total Processor Time is up to 80% during normal operation. The remaining 20% is left for complex operations that cause peaks in processor usage, to avoid bottlenecks. If the value is constantly close to 100%, this indicates a processor bottleneck. To start troubleshooting, check the Processor: % Processor Time values to determine which processor handles most of the load
When one of the processors is using much more processor time than the others, the Total Processor Time value might be misleading, so it’s better to monitor both the total and the per-processor values
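A short sketch shows how the average can hide a saturated processor. The function names and the per-processor values are hypothetical; the averaging itself follows the definition given above.

```python
def total_processor_time(per_cpu: list[float]) -> float:
    """Average of the per-processor % Processor Time values."""
    if not per_cpu:
        raise ValueError("need at least one processor value")
    return sum(per_cpu) / len(per_cpu)


def busiest_processor(per_cpu: list[float]) -> tuple[int, float]:
    """Return the index and value of the most heavily loaded processor."""
    index = max(range(len(per_cpu)), key=lambda i: per_cpu[i])
    return index, per_cpu[index]


# Hypothetical four-processor machine: processor 0 is saturated, the rest are quiet.
usage = [100.0, 20.0, 20.0, 20.0]
print(total_processor_time(usage))  # 40.0, well under the 80% limit
print(busiest_processor(usage))     # (0, 100.0)
```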
Values and threshold for Total Processor time shown in a graph
% Total User Time
“This is the total user time of all the processors on the system.”
The Processor: % User Time counter shows the user time on a single processor
% Total Privileged Time
“This is the total privilege time for all processors on the system collectively”
Similarly, Processor: % Privileged Time shows the privileged time on a single processor
Thread: % Processor Time
This counter shows an instance for each thread on the processor. On a single-processor machine that supports two threads, two instances are shown. To distinguish a specific thread from the others, use the ThreadID value. A thread’s processor usage can be influenced by adjusting the thread priority
Other metrics that can be monitored per thread are: Thread State, Priority Base, Current Priority, Context Switches/sec, % Privileged Time, and % User Time.