Essential Counters
While the PAL template for SQL Server is very comprehensive, I also maintain a short list of performance counters that I use for spot checking server performance periodically. To simplify covering each of the counters, why it’s important and what to look for, we’ll look at counters for CPU, memory, and the disk separately, starting with CPU.CPU Usage
- Processor
- %Processor Time
- %Privileged Time
- Process (sqlservr.exe)
- %Processor Time
- %Privileged Time
Memory Usage
- Memory
- Available Mbytes
- SQL Server:Buffer Manager
- Lazy writes/sec
- Page life expectancy
- Page reads/sec
- Page writes/sec
- SQL Server:Memory Manager
- Total Server Memory (KB)
- Target Server Memory (KB)
After confirming that Windows has available memory to prevent SQL Server from shrinking memory usage, the next counters I generally look at are Buffer Manager\Page life expectancy and Lazy writes/sec. The Page life expectancy (PLE) should be generally consistent on average, but may fluctuate with changes in the server workload, with lower values at peak periods. In general, the more memory allocated to a SQL Server instance, the higher I’d like to see this counter’s normal value. For example, a server with 230GB RAM allocated to the SQL Server buffer pool and a PLE of 300 would equate roughly to 785MB/sec of I/O activity to maintain the page churn inside of the buffer pool. While it might be possible for the I/O subsystem to keep up with this demand, this represents a significant amount of page churn in the buffer pool and may be a sign of missing indexes, implicit conversions due to mismatched data types, and all kinds of other problems that can be fixed if identified as the root cause.
If you find a low PLE for the server, look at the other performance counters in the Buffer Manager category for correlation of overall memory pressure inside of SQL Server. If Lazy writes/sec is consistently experiencing non-zero values with a low PLE and elevated values for Page reads/sec and Page writes/sec the server is experiencing buffer pool contention and you will need to go about troubleshooting this problem further. If the Memory\Available MBytes counter was fluctuating and under the 64MB low memory threshold, looking at the Memory Manager\Total Server Memory (KB) and Target Server Memory (KB) counters will tell you if that has resulted in SQL Server reducing the size of the buffer pool in response. On a stable system Total Server Memory (KB) will be lower than Target Server Memory (KB) during the initial buffer pool ramp up, but then remain equal under normal operation unless a low memory condition occurs. If the server is a VM running on VMware, look for memory ballooning by the host with the VM Memory\Memory Ballooned (MB) counter for non-zero values to see if the host is causing the low memory condition.
Disk Usage
- Physical Disk
-
Avg. Disk sec/Read
- Avg. Disk Bytes/Read
-
Avg. Disk sec/Write
- Avg. Disk Bytes/Write
-
- Paging File
- %Usage
- SQL Server:Access Methods
- Forwarded Records/sec
- Full Scans/sec
- Index Searches/sec
- < 8ms: excellent
- < 12ms: good
- < 20ms: fair
- > 20ms: poor
Summary
An important consideration when looking at performance counters, or any monitoring data from SQL Server for that matter, is that no single data point will tell you the root cause of a problem, if one occurs. For performance counters, you need to look across multiple counters for correlating information to pinpoint the root of problems.For example, if we look at disk I/O and see high latency numbers with lower I/O sizes on average, we could infer that we have a disk bottleneck and we need to improve our I/O performance. However, if we look at the Buffer Manager\Page life expectancy and find that it is lower than our normal baseline numbers for the server, and then see the Buffer Manager\Page Reads/sec is higher than usual, the data would point to a memory contention in the buffer pool which is going to affect I/O performance to keep up with the page churn occurring in the buffer pool.
Understanding the counters and how they relate makes it much easier to spot when problems are occurring and then pinpoint where the actual root of the problem might be.
Nenhum comentário:
Postar um comentário