High Response Time can mean: a network problem, a load generator (LG) issue, a system issue, or an application issue.
eg :- Application Issue: a submitted page takes longer to respond because it returns a large amount of data.
If high Response Time is accompanied by a memory leak, first carry out a heap dump analysis and check the frequency of GC in the JVM.
If Transaction Response Time is high, we need to analyze:
1. Web page Diagnostics Graph
2. Page Breakdown Graph.
Note : Ideally, Response Time should stay stable as the number of users increases. If Response Time rises along with the user count, that is not a good result.
If Response Time increases while the number of running users decreases, the likely cause is that users are hitting Timeout errors.
Note : If Response Time increases suddenly, merge the Response Time graph with the Throughput graph and check whether Throughput dropped at the same point in time.
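The merge described above can be sketched in code: given two aligned series of measurements, flag the interval where response time spikes and report whether throughput fell there too. This is a minimal illustration with made-up sample numbers, not real test results; the function name and the 2x "spike" factor are assumptions.

```python
# Hypothetical sketch: correlate a sudden response-time spike with a
# throughput drop, as the note above suggests doing by merging graphs.

def find_spike(resp_times, throughput, spike_factor=2.0):
    """Return (index, throughput_dropped) pairs for intervals where
    response time at least doubled versus the previous interval."""
    findings = []
    for i in range(1, len(resp_times)):
        if resp_times[i] >= spike_factor * resp_times[i - 1]:
            throughput_dropped = throughput[i] < throughput[i - 1]
            findings.append((i, throughput_dropped))
    return findings

# Example: response time jumps at interval 3 while throughput falls,
# pointing at a server-side bottleneck rather than a client-side one.
resp = [1.1, 1.2, 1.3, 3.2, 3.5]   # seconds per transaction (sample data)
tput = [900, 950, 940, 600, 580]   # throughput, scaled (sample data)
print(find_spike(resp, tput))      # -> [(3, True)]
```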
Que :- If I run a scenario with 1 user for 10 iterations, how can I see the time for each transaction rather than the Average Transaction Response Time?
Ans : Check the raw data from the Average Transaction Response Time graph.
[Processor] "% Processor Time": if this counter is constantly high, say above 90%, use the other counters described below to further determine the root cause of the CPU pressure, and also check the disk and network metrics.
[Processor] "% Privileged Time": a high percentage of privileged time, anything consistently above 25%, usually points to a driver or hardware issue and should be investigated.
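The two CPU checks above can be expressed as a small sketch. The thresholds (90% processor time, 25% privileged time) come from the notes; the function name and the use of a simple average over samples are assumptions for illustration.

```python
# Sketch: flag CPU pressure from sampled Processor counters.
# samples: list of (processor_time_pct, privileged_time_pct) readings.

def diagnose_cpu(samples):
    avg_proc = sum(s[0] for s in samples) / len(samples)
    avg_priv = sum(s[1] for s in samples) / len(samples)
    issues = []
    if avg_proc > 90:
        issues.append("CPU pressure: % Processor Time consistently above 90%")
    if avg_priv > 25:
        issues.append("Possible driver/hardware issue: % Privileged Time above 25%")
    return issues

# Sample readings: both thresholds exceeded, so both issues are reported.
print(diagnose_cpu([(95, 30), (97, 28), (93, 27)]))
```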
Disk Queue Length should be minimal: always 1 or 2, not more than this.
A high Disk Queue Length causes high Response Time.
HTTP/Req : The number of HTTP requests made by the client to the server.
End to End Response Time : The total time a user waits for a response after submitting a request.
End to End Response Time = GUI Response Time + Network Time + Server Response Time.
Network Delay (or) Latency = Response Time - Process Time.
DB Time = CPU Processing Time + Non idle Wait Time.
Elapsed Time = End Snap Time - Begin Snap Time
DB Time per Second (Average Active Sessions) = DB Time / Elapsed Time
Available CPU (seconds) = Number of CPUs * Elapsed Time (minutes) * 60
Throughput = (Physical Reads + Physical Writes) * Block Size.
Step Download Timeout = Request Time + Process Time + Response Time.
Number of Active Vusers = Total connections at that time / Connections made by one user.
(Note : Graphs -> Add New Item -> Add New Graph -> Web Resources -> Connections graph).
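The AWR formulas above can be worked through with hypothetical numbers. The function below is a sketch: the parameter names and the sample values are made up, snap times are taken in minutes (so Available CPU comes out in CPU-seconds), and an 8 KB block size is assumed for the throughput calculation.

```python
# Sketch applying the AWR-style formulas above to hypothetical numbers.

def awr_metrics(begin_snap_min, end_snap_min, db_cpu_s, non_idle_wait_s,
                num_cpus, phys_reads, phys_writes, block_size=8192):
    elapsed_min = end_snap_min - begin_snap_min            # Elapsed Time
    db_time_s = db_cpu_s + non_idle_wait_s                 # DB Time
    available_cpu_s = num_cpus * elapsed_min * 60          # CPU-seconds in window
    avg_active_sessions = db_time_s / (elapsed_min * 60)   # DB Time / Elapsed Time
    io_throughput = (phys_reads + phys_writes) * block_size  # bytes
    return {
        "elapsed_min": elapsed_min,
        "db_time_s": db_time_s,
        "available_cpu_s": available_cpu_s,
        "avg_active_sessions": avg_active_sessions,
        "io_throughput_bytes": io_throughput,
    }

# A 1-hour snapshot window on a 4-CPU host (all values hypothetical).
m = awr_metrics(begin_snap_min=0, end_snap_min=60, db_cpu_s=1800,
                non_idle_wait_s=5400, num_cpus=4,
                phys_reads=100_000, phys_writes=20_000)
print(m["db_time_s"], m["available_cpu_s"], m["avg_active_sessions"])
# -> 7200 14400 2.0
```

An average of 2.0 active sessions against 4 CPUs suggests the database is busy but not CPU-bound in this hypothetical window.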
Note :- A large number of hard page faults can signify that you need to increase the amount of memory or reduce the cache size on the server.
Note :- A hard parse rate > 100 per second indicates that bind variables are not being used effectively.
Note : If AWR reports high DB Time and high DB CPU time, that is what is causing the delayed Response Time.
Physical Disk :
1. PhysicalDisk / % Idle Time – should not be less than 60%. Preferably staying at the top of your chart at all times.
2. PhysicalDisk / Avg. Disk sec/read – should not be higher than 20ms.
3. PhysicalDisk / Avg. Disk sec/write – should not be higher than 20ms.
4. PhysicalDisk / Current Disk Queue Length – should not be higher than 2.
Counters (2) and (3) are the ones in Performance Monitor that measure I/O latency.
Avg. Disk sec/Read : the disk is a read bottleneck when this counter is consistently above 15 ms.
Avg. Disk sec/Write : the disk is a write bottleneck when this counter is consistently above 15 ms.
Avg. Disk sec/Transfer : the disk is a transfer bottleneck when this counter is consistently above 15 ms.
Avg. Disk Queue Length : the disk is a bottleneck when the average queue length is more than 2.
Current Disk Queue Length : the disk is a bottleneck when the current queue length is more than 2.
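The disk rules of thumb above can be collected into one check. This is a sketch using the notes' thresholds (%Idle Time at least 60, latency at most 20 ms, queue length at most 2); the function name and parameters are assumptions.

```python
# Sketch evaluating PhysicalDisk counters against the thresholds above.

def check_disk(idle_pct, read_latency_ms, write_latency_ms, queue_len):
    alerts = []
    if idle_pct < 60:
        alerts.append("% Idle Time below 60% - disk is busy")
    if read_latency_ms > 20:
        alerts.append("Avg. Disk sec/Read above 20 ms - read latency bottleneck")
    if write_latency_ms > 20:
        alerts.append("Avg. Disk sec/Write above 20 ms - write latency bottleneck")
    if queue_len > 2:
        alerts.append("Disk Queue Length above 2 - I/O requests are piling up")
    return alerts

# Hypothetical busy disk: low idle time, slow reads, long queue.
print(check_disk(idle_pct=40, read_latency_ms=25, write_latency_ms=10, queue_len=5))
```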
TIP: If Process/Private Bytes is increasing but # Bytes in all Heaps remains stable, unmanaged memory is leaking.
TIP: If an application's logical thread count is increasing unexpectedly, thread stacks are leaking.
TIP: If both Private Bytes and # Bytes in all Heaps are increasing, memory in the managed heaps is building up.
TIP: By default, the stack size on modern desktop and server versions of Windows is 1 MB. So if an application's Process/Private Bytes is periodically jumping in 1 MB increments with a corresponding increase in .NET CLR LocksAndThreads/# of current logical Threads, a thread stack leak is very likely the culprit.
TIP: If total memory use is increasing but # Bytes in all Heaps (measuring managed heap memory) is not increasing, there is a leak in the unmanaged heap.
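The leak TIPs above amount to a small decision table over counter trends. The sketch below is illustrative only: "increasing" is crudely defined as the last sample sitting noticeably above the first, and the function and parameter names are assumptions; a real analysis would look at a long-running trend.

```python
# Sketch classifying a leak from counter trends, following the TIPs above.

def classify_leak(private_bytes, heap_bytes, logical_threads, tol=0.05):
    def rising(series):
        # Crude trend test: last sample more than tol above the first.
        return series[-1] > series[0] * (1 + tol)

    pb, hb, th = rising(private_bytes), rising(heap_bytes), rising(logical_threads)
    if pb and hb:
        return "managed heap growth"
    if pb and th and not hb:
        return "likely thread stack leak"
    if pb and not hb:
        return "unmanaged memory leak"
    return "no clear leak"

# Private Bytes climbs while managed heaps stay flat -> unmanaged leak.
print(classify_leak([100, 120, 150], [50, 51, 50], [20, 20, 20]))
# -> unmanaged memory leak
```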
TIP : A high Disk Queue Length causes high Response Time (Disk Queue Length should be minimal, ideally < 2).
TIP : If the Disk Time and Processor Time values are low but the network values are very high, there might be a capacity problem; resolve it by optimizing the network card settings.
Memory :
- Memory / Available MBytes – a minimum of 10% of memory should be free and available. Less than that usually indicates insufficient memory, which can increase paging activity. Consider adding more RAM if that happens.
- Memory / Pages/sec – should not be higher than 1000. A higher number, as a result of excessive paging, usually indicates a possible memory leak.
- Memory / Cache Bytes – indicates the amount of memory being used for the file system cache. There may be a disk bottleneck if this value is greater than 300MB.
- Page Reads/sec is high --> the problem here is insufficient RAM.
- Page Faults/sec : a page fault occurs when the requested page is not in memory.
- Pages Input/sec : the total number of pages read from disk.
- Pages Output/sec : the total number of pages removed from memory and written to disk.
NETWORK : -
1. Network Interface / Bytes Total/sec – measures the rate at which bytes are sent and received over each network adapter.
healthy – less than 40% of the interface consumed
caution – 41% – 60%
critical – 61% – 100%
2. Network Interface / Output Queue Length – measures the length of the output packet queue in packets.
healthy – 0
caution – 1-2
critical – >2
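The utilisation bands above can be mapped to a status label. This sketch is an assumption-laden illustration: the function name is made up, bandwidth is taken as the interface speed in bits/sec, and Bytes Total/sec is converted to bits before comparing.

```python
# Sketch mapping Network Interface utilisation to the bands above.

def interface_status(bytes_total_per_sec, bandwidth_bits_per_sec):
    used_pct = (bytes_total_per_sec * 8) / bandwidth_bits_per_sec * 100
    if used_pct <= 40:
        return "healthy"
    if used_pct <= 60:
        return "caution"
    return "critical"

# A 1 Gbit/s link carrying 60 MB/s is at 48% utilisation -> caution.
print(interface_status(60_000_000, 1_000_000_000))  # -> caution
```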
Paging:
Paging is the process of moving pages between memory and disk.
Page Fault. A page fault occurs when a program attempts to access a block of memory that is not stored in the physical memory, or RAM. The fault notifies the operating system that it must locate the data in virtual memory, then transfer it from the storage device, such as an HDD or SSD, to the system RAM.
The performance of applications will suffer when there is insufficient RAM and excessive hard page faults occur
Hard page faults occur when the page is not located in physical memory or a memory-mapped file created by the process
Soft page fault occurs when the page is resident elsewhere in memory.
Note : Paging File / % Usage – should not be greater than 10%.
Pages/sec : High Pages/sec values can indicate insufficient RAM memory.
Process (_Total) \ Private Bytes : A consistently increasing value may be indicative of a memory leak.
Memory \ Pool Paged Bytes : Paged pool is a larger resource than nonpaged pool; however, if this value is consistently greater than 70% of the maximum configured pool size, you may be at risk of paged pool depletion.
Memory \ Committed Bytes : if the value is constantly increasing without leveling off, you should investigate.
Memory \ Available Bytes : if this value falls below 5% of installed RAM on a consistent basis, you should investigate. If it drops below 1% of installed RAM on a consistent basis, there is a definite problem!
Memory \ %Committed Bytes in Use : if this value is consistently over 80%, your page file may be too small.
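The memory checks above can be combined into one sketch using the notes' thresholds: Available Bytes below 5% of RAM (below 1% is a definite problem), %Committed Bytes in Use over 80%, and Pages/sec over 1000. The function and parameter names are assumptions.

```python
# Sketch evaluating the Memory counters against the thresholds above.

def check_memory(available_bytes, installed_ram_bytes,
                 committed_pct, pages_per_sec):
    alerts = []
    avail_pct = available_bytes / installed_ram_bytes * 100
    if avail_pct < 1:
        alerts.append("Available memory below 1% of RAM - definite problem")
    elif avail_pct < 5:
        alerts.append("Available memory below 5% of RAM - investigate")
    if committed_pct > 80:
        alerts.append("%Committed Bytes in Use over 80% - page file may be too small")
    if pages_per_sec > 1000:
        alerts.append("Pages/sec over 1000 - excessive paging, possible memory leak")
    return alerts

# Hypothetical 16 GB box with only 300 MB free, heavy commit and paging.
print(check_memory(300 * 2**20, 16 * 2**30, committed_pct=85, pages_per_sec=2500))
```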
PROCESSOR QUEUE LENGTH : Threads in the processor queue are ready to run but can’t, due to another thread running on the processor. Queues with sustained element counts greater than 2 are indicative of a bottleneck.
Idle Workers :
The number of idle workers can tell you if you’re short of resources or if you have them in abundance. A lack of idle workers means that there are too many requests coming in and your server will have to create new threads and processes to handle them. This could cause each request to take more time to process, which increases the latency of each request. If you’re facing this issue, you should consider increasing the resource allocation to your Apache web server.
On the other hand, if there’s always a large number of idle workers, you have allocated too many resources. You might as well use those resources for other services running on the same machine.
In either case, your decision completely depends on the amount of traffic you are getting on your web server and the amount of resources you have on your machines. Before making any changes to your resources, it's best to observe the idle worker count over a few weeks. This will eliminate the possibility of having low traffic in an off-season.
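Observing the idle worker count can be done by polling Apache's mod_status machine-readable output. The sketch below assumes the standard scoreboard encoding from `server-status?auto` ("_" = waiting for connection, "." = open slot, other letters = busy states); the sample text is made up.

```python
# Sketch: count idle workers from Apache mod_status "?auto" output.

def idle_workers(status_text):
    """Return the number of idle ("_") workers in the scoreboard line,
    or None if no Scoreboard line is present."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return board.count("_")
    return None

# Made-up sample of the machine-readable status page.
sample = "Total Accesses: 1234\nScoreboard: __WW_K__....\n"
print(idle_workers(sample))  # -> 5
```

In practice you would fetch the page periodically (e.g. with `urllib.request`), log the count, and review the trend over a few weeks before resizing the worker pool.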