quinta-feira, 29 de outubro de 2015

SQL SERVER COUNTER - Essential Counters

Essential Counters

While the PAL template for SQL Server is very comprehensive, I also maintain a short list of performance counters that I use for spot checking server performance periodically.  To simplify covering each of the counters, why it’s important and what to look for, we’ll look at counters for CPU, memory, and the disk separately, starting with CPU.

CPU Usage

  • Processor
    • %Processor Time
    • %Privileged Time
  • Process (sqlservr.exe)
    • %Processor Time
    • %Privileged Time
The Processor\%Processor Time counter is probably the most familiar counter in performance monitor and gives us information about the total CPU usage for the server we are looking at unless it is a virtual machine.  For virtual machines (VMs), the Processor\%Processor Time counter is not a reliable measure of actual CPU usage for the server, and instead shows the percentage of the allocated CPU resources that the VM is actually using at a given point in time.  High values for % Processor  Time in a VM need to be checked against the host performance counters for the VM to determine if the VM is contending for physical CPU allocations with other guest VMs on the same host or not. In addition to the Processor counter we also want to collect the Process (sqlservr.exe)\%Processor Time counter to be able to balance whether a high CPU condition is actually the result of SQL Server utilization or another application that might be running on the server.  For most SQL Server workloads the average processor usage over time should be fairly consistent and unexplained increases from the baseline trending should be investigated further.  Values greater than 80% consistently should also be investigated to determine if the workload is exceeding the current hardware capabilities or for problems that are increasing the overall CPU usage for the instance.

Memory Usage

  • Memory
    • Available Mbytes
  • SQL Server:Buffer Manager
    • Lazy writes/sec
    • Page life expectancy
    • Page reads/sec
    • Page writes/sec
  • SQL Server:Memory Manager
    • Total Server Memory (KB)
    • Target Server Memory (KB)
The first counter that I start with when looking at memory usage on a SQL Server is the Memory\Available MBytes counter.  I start with this counter first to determine if Windows is experiencing memory pressure, which would affect the SQL Server performance counter values potentially.  The Memory\Available MBytes counter should ideally be above 150-300MB for the server, which leaves memory available for other applications to run without pushing Windows into a low memory condition.  When the Available MBytes counter drops below 64MB, on most servers Windows signals a low memory notification that SQL Server monitors for, and the SQLOS ( the ‘operating system’) inside of SQL Server will reduce memory usage as a result of this notification occurring.
After confirming that Windows has available memory to prevent SQL Server from shrinking memory usage, the next counters I generally look at are Buffer Manager\Page life expectancy and Lazy writes/sec.  The Page life expectancy (PLE) should be generally consistent on average, but may fluctuate with changes in the server workload, with lower values at peak periods.  In general, the more memory allocated to a SQL Server instance, the higher I’d like to see this counter’s normal value.  For example, a server with 230GB RAM allocated to the SQL Server buffer pool and a PLE of 300 would equate roughly to 785MB/sec of I/O activity to maintain the page churn inside of the buffer pool.  While it might be possible for the I/O subsystem to keep up with this demand, this represents a significant amount of page churn in the buffer pool and may be a sign of missing indexes, implicit conversions due to mismatched data types, and all kinds of other problems that can be fixed if identified as the root cause.
If you find a low PLE for the server, look at the other performance counters in the Buffer Manager category for correlation of overall memory pressure inside of SQL Server.  If Lazy writes/sec is consistently experiencing non-zero values with a low PLE and elevated values for Page reads/sec and Page writes/sec the server is experiencing buffer pool contention and you will need to go about troubleshooting this problem further.  If the Memory\Available MBytes counter was fluctuating and under the 64MB low memory threshold, looking at the Memory Manager\Total Server Memory (KB) and Target Server Memory (KB) counters will tell you if that has resulted in SQL Server reducing the size of the buffer pool in response.  On a stable system Total Server Memory (KB) will be lower than Target Server Memory (KB) during the initial buffer pool ramp up, but then remain equal under normal operation unless a low memory condition occurs.  If the server is a VM running on VMware, look for memory ballooning by the host with the VM Memory\Memory Ballooned (MB) counter for non-zero values to see if the host is causing the low memory condition.

Disk Usage

  • Physical Disk
    • Avg. Disk sec/Read
    • Avg. Disk Bytes/Read
    • Avg. Disk sec/Write
    • Avg. Disk Bytes/Write
  • Paging File
    • %Usage
  • SQL Server:Access Methods
    • Forwarded Records/sec
    • Full Scans/sec
    • Index Searches/sec
The Avg. Disk sec/Read and /Write counters provide the current average latency for each of the disks on the server.  Latency is one of the most important metrics for SQL Server I/O performance, but the latency should be compared to the size of I/Os that are being performed before determining whether or not a specific value is an indicator of a problem or not.  As the size of the I/O operations increases, so does the latency for the operation, so it would be normal to see higher latency values for a reporting workload doing large table scans vs. a transactional workload with smaller I/O operations occurring.  The Microsoft general recommendations for I/O latency for SQL Server are:
  • < 8ms: excellent
  • < 12ms: good
  • < 20ms: fair
  • > 20ms: poor
Over the last two years consulting, only a few of the servers that I’ve looked at during health checks meet the <20ms latency numbers for data and log files.  Most servers tend to fall into the < 30ms range for I/O latency per disk on average.  SSDs are changing this rapidly, and we’re seeing more and more disk configurations that include SSDs and the result is very low I/O latency.  When looking at our overall I/O for the server, reviewing the Access Method counters for Full Scans/sec, Forwarded Records/sec, and Index Searches/sec can give us a clue to the type of workload that is occurring to generate the I/O.

Summary

An important consideration when looking at performance counters, or any monitoring data from SQL Server for that matter, is that no single data point will tell you the root cause of a problem, if one occurs. For performance counters, you need to look across multiple counters for correlating information to pinpoint the root of problems.
For example, if we look at disk I/O and see high latency numbers with lower I/O sizes on average, we could infer that we have a disk bottleneck and we need to improve our I/O performance. However, if we look at the Buffer Manager\Page life expectancy and find that it is lower than our normal baseline numbers for the server, and then see the Buffer Manager\Page Reads/sec is higher than usual, the data would point to a memory contention in the buffer pool which is going to affect I/O performance to keep up with the page churn occurring in the buffer pool.
Understanding the counters and how they relate makes it much easier to spot when problems are occurring and then pinpoint where the actual root of the problem might be.

Nenhum comentário:

Postar um comentário