Saturday, February 26, 2022

Processor: % User Time, Processor: % Privilege Time, total times and thread metrics

SQL Server processor performance metrics – Part 1 – the most important CPU metrics (sqlshack.com)

Processor: % Processor Time
The % Processor Time counter shows “the percentage of time that the processor actually spends working on productive threads and how often it was busy servicing requests”.
As soon as the computer is turned on, the processor starts executing threads with instructions. The processor is always active: even when there are no user or system threads to run, it is not completely idle, as it executes the “idle thread” instead.
By design, there can be only one idle thread per processor. It has the lowest priority among all processor threads. The basic priority classes are idle, normal, high, and real-time. This means that the idle thread runs on a processor only when there are no other threads. The idle process isn’t a real process that “eats” processor resources; it only occupies the processor until a real productive thread appears. A high percentage of idle-process time shows that the processor is unused most of the time.
The % Processor Time counter is calculated as the difference between the total processor time and the time the idle thread was running.
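As a quick sketch, the counter can be derived from a sampling interval and the idle-thread time. The numbers below are made-up sample values, not real measurements:

```python
# Hypothetical sample: the counter was read over a 10-second interval
# during which the idle thread ran for 2.5 seconds.
sample_interval = 10.0
idle_thread_time = 2.5

# % Processor Time = (total time - idle time) / total time, as a percentage
pct_processor_time = 100.0 * (sample_interval - idle_thread_time) / sample_interval
print(pct_processor_time)  # -> 75.0
```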
On a multi-processor machine, an instance of the % Processor Time counter is shown for every processor on the server. On a four-processor machine, the % Processor Time instances will be enumerated 0 to 3. Also, an instance is shown for each processor thread
On a virtual machine, % Processor Time shows the value for the virtual, not the physical machine
The recommended value for % Processor Time is below 80%

If the % Processor Time value is constantly high, check the disk and network metrics first. If they are low, the processor might be under stress. To confirm this, check the average and current Processor Queue Length value. If these values are higher than the recommended, it clearly indicates a processor bottleneck. The final solution for this situation is adding more processors, as this will enable more requests to be executed simultaneously. 
If the Processor Queue Length value is low, consider using more powerful processors
If the disk and network metrics are elevated, start the analysis and troubleshooting with these metrics first
% Processor Time is also shown in Windows Task Manager, but in the case of multiple SQL Server instances running on the same machine, this information is not useful for deeper analysis and troubleshooting, as it doesn’t indicate which instance is represented.
% Processor time shown in Windows Task Manager
To be able to troubleshoot processor issues, it’s necessary to know which processor is under stress and which SQL Server instances have issues. You can achieve this by monitoring additional parameters, such as the process ID, and then finding the SQL Server instance that has that process ID. Another solution is to use a monitoring tool that shows the processor usage per SQL Server instance out-of-the-box.
Graph showing values and threshold for % Processor Time
Process: % Processor Time
Windows Performance Monitor offers two counters with similar names: Processor: % Processor Time and Process: % Processor Time. It’s important to distinguish between these two metrics and understand the information they show.
As described above, % Processor Time shows the percentage of time that the processor works on non-idle threads
The Process: % Processor Time counter splits the processor time percentage per process, so each process is shown as a separate item in the graph. For more useful results, exclude the idle threads and total value
The total value for the processes time can be misleading. If the value is 100%, it can mean that all processes are using an equal share of processor time, or that one is using 90%, while others are struggling. That’s why monitoring the processor time for each process is recommended for troubleshooting
Processor Queue Length
The Processor Queue Length counter shows “a measure of the instantaneous size of the queue for all processors at the moment that the measurement was taken. The resulting value is a measure of how many threads are in the Ready state waiting to be processed.”
Note that the threads currently running in the processor are not included. Even on a multi-processor machine, there is only one queue for all tasks that are waiting to be processed
The typical value for this counter is 0 or 1. The recommended value is under 5 per processor. Some DBAs consider the situation to be alarming even when Processor Queue Length is constantly higher than 2. Along with high % Processor Time, a high Processor Queue Length value is a clear indicator of a busy processor
The Processor Queue Length value can be increased due to activity of the applications other than SQL Server, having more than the optimal number of SQL Server instances on a single machine, high number of compilations and recompilations, etc
A high number of processes waiting to be processed and high CPU usage require immediate attention. Start with checking Compilations/sec and Re-Compilations/sec. There is no specific threshold for these metrics – monitor them for a while and set a baseline for typical behavior. A high number of compilations and recompilations usually indicates poor reuse of the query plans. This can be fixed by optimizing your queries and stored procedures
However, there are some specific actions (such as creating a compressed full database backup) that use a lot of processor resources and cause other tasks to be queued. 
Graph showing values and threshold for % Processor Time and Processor Queue Length metrics

Processor: % User Time
There are two modes for all processes executed on a Windows operating system: a user mode and privileged (kernel) mode. The operations that require direct access to memory and machine hardware are executed in the privileged mode (I/O operations, disk access, system services, etc.). All user applications and processes (this is where SQL Server belongs) are executed in the user mode

The Processor: % User Time value “corresponds to the percentage of time that the processor spends on executing user processes such as SQL Server.”

In other words, this is the percentage of processor non-idle time spent on user processes

If high Processor: % User Time values are caused by SQL Server, the reason can be non-optimal SQL Server configuration, excessive querying, schema issues, etc. Further troubleshooting requires finding the process or batch that uses so much processor time.

Processor: % User Time values close to 100% are a clear indication of a processor bottleneck. It means that user processes are using the processor all the time. To troubleshoot the issue, the first step is to determine the user application that causes the bottleneck.

In an environment with multiple SQL Server instances running, it’s important to monitor each instance’s performance individually, to make it easier to troubleshoot excessive processor usage.

There is no specific threshold for this counter; you should monitor its value for a while and set a baseline. However, as the recommended Processor: % Processor Time value is below 80% and the recommended Processor: % Privilege Time is at most 10%, this leads to the conclusion that the recommended Processor: % User Time is below 70%.

Processor: % Privilege Time
The Processor: % Privilege Time counter shows how much time the processor spends on executing non-user processes, i.e. privilege (kernel) mode operations

If the processor is very busy and this mode is high, it is usually an indication of some type of NT service having difficulty, although user mode programs can make calls to the Kernel mode NT components to occasionally cause this type of performance issue.

This is another important counter that helps determine whether the processor problems originate from internal Windows processes, or are caused by a user application.

The sum of Processor: % Privilege Time and Processor: % User Time gives Processor: % Processor Time. There’s no need to monitor all three of these counters, as the third value can be calculated from the other two. Monitoring any two of them is sufficient.

“The proportion of % Privileged Time to % User Time will depend on the system’s workload. A couple of common misconceptions are that there will be a balance between these two areas, and that each system will look the same. That having been said, a system performing the same workload over time should show little change in the ratio of Privileged to User time” [2]

If the Processor: % Privilege Time value is high, kernel-mode processes are using a lot of processor time: the machine is busy executing basic operating system tasks and cannot run user processes and other applications, such as SQL Server. Using a more powerful processor can help.

The recommended value for Processor: % Privilege Time is 5 to 10%, or at most 30% of the % Total Processor Time (described next).

Values and threshold for Processor: % Privilege Time, % User Time and % Processor Time shown in a graph
Total Processor Time
“This counter groups the activity of all the processors together to report the total performance of the entire system. On a single processor machine, this value will equal the %Processor Time value of the processor object.”

The Total Processor Time counter shows the percentage of time during which the machine’s processors are busy, i.e. the percentage of time the processors were executing useful operations. The time needed for idle thread processing is excluded.

The difference between this counter and Processor: % Processor Time is that Total Processor Time shows a sum of processor time percentages divided by the number of processors, i.e. an average value for all processors, while the latter shows one entry for each processor on the machine
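A small sketch of the averaging, using hypothetical per-processor percentages for a four-processor machine:

```python
# Hypothetical Processor: % Processor Time samples, one per processor.
per_processor = [85.0, 40.0, 30.0, 25.0]

# Total Processor Time is the average over all processors.
total_processor_time = sum(per_processor) / len(per_processor)
print(total_processor_time)  # -> 45.0
```

Note how the busy processor at 85% is hidden behind the modest 45% average, which is why it is better to monitor both the total and the per-processor values.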

The recommended value for Total Processor Time is up to 80% during normal operation. The remaining 20% are left for complex operations that cause peaks in processor usage, to avoid bottlenecks. If the value is constantly close to 100%, this is an indication of a processor bottleneck. To start troubleshooting, check the Processor: % Processor Time value to determine what processor handles most load

When one of the processors is using much more processor time than the others, the Total Processor Time value might be misleading, so it’s better to monitor both the total and the per-processor values.

Values and threshold for Total Processor Time shown in a graph
% Total User Time
“This is the total user time of all the processors on the system.”

The Processor: % User Time counter shows the user time on a single processor

% Total Privilege Time
“This is the total privilege time for all processors on the system collectively”

Similarly, Processor: % Privilege Time shows the privilege time on a single processor.

Thread: % Processor Time
This counter shows an instance for each thread on the processor. On a single-processor machine that supports two threads, two instances are shown. To distinguish a specific thread among the others, use the ThreadID value. A thread’s processor usage can be modified by adjusting the thread priority.

Other metrics that can be monitored per thread are: Thread State, Priority Base, Current Priority, Context Switches/sec, % Privileged Time, and % User Time

Friday, February 18, 2022

Throughput

Latency (Network Delay):

Latency is the travel time of a request from the client to the server, plus the travel time of the response from the server back to the client. Its measuring units are milliseconds, seconds, minutes, or hours. Let’s say:

  • A request starts at t=0
  • It reaches the server in 1 second (at t=1)
  • The server takes 2 seconds to process it (at t=3)
  • The response reaches the client in 1.2 seconds (at t=4.2)
So, the network latency will be 2.2 seconds (= 1 + 1.2).
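The same arithmetic as a short sketch, with the times taken from the example above:

```python
# One-way travel times and server processing time, in seconds.
request_travel = 1.0     # client -> server
server_processing = 2.0  # time spent processing on the server
response_travel = 1.2    # server -> client

latency = request_travel + response_travel      # network delay only
response_time = latency + server_processing     # total time the user waits

print(round(latency, 1))        # -> 2.2
print(round(response_time, 1))  # -> 4.2
```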

Bandwidth:
Bandwidth shows the capacity of the pipe (communication channel); it indicates the maximum amount of water that can pass through the pipe. In performance testing terms, the maximum amount of data that can be transferred per unit of time through a communication channel is called the channel’s bandwidth. Let’s say an ISDN line has 64 Kbps of bandwidth; we can increase it by adding one more 64 Kbps channel, so the total bandwidth will be 128 Kbps, meaning a maximum of 128 Kbps of data can be transferred through the ISDN channel.

Throughput:
The amount of data moved successfully from one place to another in a given time period is called ‘Data Throughput’. The higher the throughput, the more information a server can process, the better it performs, and the more users it can serve. If a website can process 100 hits per second during its first test and 150 hits per second after an update, that means 50 more people can view the website at once without waiting.
It is typically measured in bits per second (bps), as in megabits per second (Mbps) or gigabits per second (Gbps). Let’s say 20 bits of data are transferred at the 4th second; then the throughput at t=4 is 20 bps.
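A minimal sketch of the calculation; the second transfer is a made-up larger example:

```python
# Throughput = data transferred / elapsed time.
bits_transferred = 20
window_seconds = 1
print(bits_transferred / window_seconds)  # -> 20.0 (bps)

# Made-up larger transfer: 250 million bits moved in 2 seconds.
bits = 250_000_000
seconds = 2
print(bits / seconds / 1_000_000)  # -> 125.0 (Mbps)
```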

Note : Data Throughput can never be more than Network Bandwidth.

Response Time:
Response time is the amount of time from the moment that a user sends a request until the time that the application indicates that the request has completed and the response reaches back to the user. In the latency example, the response time will be 4.2 seconds (1 + 2 + 1.2).

Some important points for Throughput:

  • Solving bandwidth is easier than solving latency.
  • If throughput is nearly equal to bandwidth, it means the full capacity of the network is being utilized, which may lead to a network bandwidth issue.
  • Increase in response time with flat throughput graph shows a network bandwidth issue. This bottleneck can be rectified by adding extra channels i.e. by increasing network bandwidth.
  • Ideally, consistent throughput indicates an expected capacity of network bandwidth.
  • Some tools do not express the throughput in units per unit of time but in clock periods. This is incorrect but commonly used because of convenience.
  • Ideally, response time is directly proportional to throughput during the user ramp-up period. If throughput decreases with an increase in response time then it indicates instability of application/system.
  • Ideally, response time and throughput should be constant during steady state. A less deviation in both the terms indicates the stability of the application.
  • The Number of threads is directly proportional to the throughput.
  • If you have low latency and small bandwidth then it will take a longer time for data to travel from point A to point B compared to a connection that has low latency and high bandwidth.  
  • Latency is affected by connection type, distance and network congestion.
Q: Can you tell a scenario where throughput increases along with response time, i.e. when they are directly proportional?
Answer: Yes, it can happen when your application has lots of CSS (Cascading Style Sheets) that take a lot of time to display. In this type of situation, throughput will increase along with the response time.

Sunday, February 13, 2022

HEAP Dump

to view HEAP Dump and Thread Dump : Heap dump analysis using Eclipse Memory Analyzer Tool (MAT) (cleantutorials.com)
Troubleshooting CPU spike in a Major Trading Application – Fast thread
Powerful troubleshooting: Marrying Thread dumps and top -H – Fast thread
Brilliant Graphs, metrics and thread dump analysis patterns (fastthread.io)
LESS KNOWN FACTS ABOUT DAEMON AND NON-DAEMON THREADS – Fast thread
Threads with same state (fastthread.io)
Threads in same state (fastthread.io)
GC LOG ANALYSIS COMPLIMENTS APM – tier1app
How to Read a Thread Dump - DZone Java
Shallow Heap and Retained Heap
Gc Log Report :Brilliant GC graphs, metrics and KPIs (gceasy.io)

Heap Memory Leak and OutofMemory
Note: The heap and GC largely decide the performance of the JVM.


Available Memory = Total Memory - Used Memory
or Available Memory = Free Memory + Buffer/Cache Memory
Heap is the area where the Objects and References are Stored.

References

  1. Soft
  2. Weak
  3. Strong
  4. Phantom
All object-related data is stored in the heap area.
Every JVM has only one heap area.
The heap holds object information, object runtime data, and all instance variable information; these can be accessed by multiple threads.
Heap dump analysis using Eclipse Memory Analyzer Tool (MAT)
A heap dump captures the state of the objects on the heap and is used to find memory leaks.
Using a heap dump we are able to find:

Memory Leaks
Large Object Allocation
Inefficient Memory usage
A heap dump is a snapshot of all the Java objects that exist in the heap space. The heap dump file is usually stored with .hprof extension.
(hprof :- HEAP Profiling Tool or CPU Profiling Tool)
Note: When heap usage exceeds the -Xmx limit, the JVM throws an OutOfMemoryError.
Why and When should I take the Heap dump?
If your Java application is taking up more memory than you expected or 
your Java application crashed with OutOfMemoryError. 
Whenever the Available Memory goes Down
Some sudden Failures in the Application. Eg: High CPU Utilization, Suddenly Response Time increase, etc...
Analyzing the heap dump will lead us to the root cause of the anomaly.
Using the heap dump we can find details like the memory usage per class, number of objects per class, etc. We can also go into fine details and find out the amount of memory retained by a single Java object in the application. These details can help us pinpoint the actual code that is causing the memory leak issues.
How do you analyze very large Heap dumps?
Usually analyzing heap dump takes even more memory than the actual heap dump size and this may be problematic if you are trying to analyze heap dump from a large server on your development machine. For instance, a server may have crashed with a heap dump of size 24 GB and your local machine may only have 16 GB of memory. Therefore, tools like MAT, Jhat won’t be able to load the heap dump file. In this case, you should either analyze the heap dump on the same server machine which doesn’t have memory constraint or use live memory sampling tools provided by VisualVM.
How to take a Heap dump of your running Java Application
There are several ways to take a heap dump. We will talk about the easiest ways to do it.

Using jmap command
Using jcmd command on terminal
Using JMX Console
Using the JVisualVM tool
Identifying HeapDumpOnOutOfMemory
Using HotSpotDiagnosticMBean by writing a program
1. jmap Command to generate the Heap Dump
            jmap -dump:live,file=<file-name>.hprof <pid>
            or
            jmap -dump:[live],format=b,file=<file-path> <pid>
live: This parameter is optional. If set, only objects that have active references (i.e. live objects) are dumped.

format=b means the heap dump file is written in binary format. It is not necessary to set this parameter.
file=<file-path> indicates where the heap dump file will be generated.
<pid> :- process id of the java process
2. Using jcmd command on terminal
This command sends a request to the JVM to generate a heap dump. One of its parameters is GC.heap_dump. It is as shown below:
            jcmd <pid> GC.heap_dump <file-path>
<pid> - Process id of java process
<file-path> - Path where the heap dump is to be generated
3. VisualVM to generate the Heap Dump
VisualVM makes it very easy to take a heap dump of an application running on your local machine. The following steps can be used to generate a heap dump using VisualVM:

a. Start Visual VM and connect your local Java Application to it.
b. Under the Monitor Tab, click on Heap Dump

c. After clicking on the heap dump you will be redirected to a new tab from which you can find out the location of your heap dump.
4. JConsole to generate the Heap dump
1. Connect your application to JConsole.
2. Switch to MBeans tab and select com.sun.management > HotSpotDiagnostic > Operations > dumpHeap.
3. Before clicking on the dumpHeap operation, set the parameters p0 and p1 described below.


a. The parameter p0 is the location and the name of the heap dump file. Ensure that you add the “.hprof” extension at the end of the file name.
b. The parameter p1, if set to true, performs a GC before dumping the heap so that only live objects are present in the heap dump.

5. Identifying HeapDumpOnOutOfMemory
It is ideal to capture heap dumps when an application experiences a java.lang.OutOfMemoryError. Heap dumps help identify the live objects sitting in memory and the percentage of memory they occupy.
        -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file-path>
When this JVM option is set while running your Java application, the JVM takes a snapshot of the heap when it encounters an OutOfMemoryError.
https://www.geeksforgeeks.org/understanding-outofmemoryerror-exception-java/
Which tools can be used to analyze the heap dump or open the .hprof file?
Once you have the heap dump, the next step is to analyze it using a tool. There are multiple paid and equally good open-source tools available to analyze a heap dump. Memory Analyzer (MAT) is one of the best open-source tools; it can be used as a plugin with Eclipse, or as a standalone application if you don’t have the Eclipse IDE installed. Apart from MAT, you can use Jhat or VisualVM. However, in this post, we will discuss the features provided with MAT.
Downloading the Memory Analyzer (MAT)
There are two ways to use the Memory Analyzer tool.
Integrating MAT plugin with Eclipse
1. Open Eclipse IDE and select Help > Eclipse Marketplace.
2. Search for Memory Analyzer and install it.
memory analyzer tool (MAT) installation steps using Eclipse IDE



3. Restart Eclipse and the plugin is ready to be used.

Downloading the standalone version of Eclipse MAT
1. Download and install the Java Development Kit.
2. Download the standalone MAT application from the Eclipse MAT website.
3. After extracting the package open MemoryAnalyzer application to start using the standalone version of MAT.
Eclipse memory analyzer installation steps

Loading Heap dump file in Eclipse MAT
We will be analyzing the heap dump generated by this Java application. The memory leak in the application
is discussed in depth in this tutorial. And the screenshots posted below are from the MAT plugin used with
Eclipse IDE.

The steps to load the heap dump are as follows.

1. Open Eclipse IDE or the standalone MAT Tool.
2. From the toolbar, Select Files > Open File from the dropdown menu.
3. Open the heap dump file with the extension .hprof and you should see the overview page as shown below.
loading a heap dump using memory analyzer

We will go through some of the important tools like Histogram, Dominator Tree and Leak Suspect report
which can be used to identify memory leaks.
Histogram
Histogram lists all the different classes loaded in your Java Application at the time of heap dump. It also lists
the number of objects per class along with the shallow and retained heap size. Using the histogram,
it is hard to identify which object is taking the most memory. However, we can easily identify which class
type holds the largest amount of memory. For instance, in the screenshot below byte array holds the
largest amount of memory. But, we cannot identify which object actually holds that byte array.
Shallow Heap v/s Retained Heap
Shallow Heap is the size of the object itself. For instance, in the screenshot below byte array itself holds
the largest amount of memory. Retained Heap is the size of the object itself as well as the size of all the
objects retained in it. For instance, in the screenshot below the DogShelter object itself holds a size of
16 bytes. However, it has a retained heap size of more than 305Mb which means it likely holds the byte
array which contributes to the very large retained heap size.
eclipse memory analyzer histogram tab


Finally, from the Histogram, we infer that the problem suspect is byte[] which is retained by the object of
class DogShelter or Dog.
Dominator Tree
The dominator tree of the Java objects allows you to easily identify the objects holding the largest chunks of
memory. For instance, we can see from the snippet below that the main thread object holds the largest
amount of memory. On expanding the main thread tree, we can see that the instance of class DogShelter holds a
HashMap holding over 300 MB of memory.

The Dominator tree is useful when you have a single object that is eating up a large amount of memory.
The Dominator tree wouldn’t make much sense if multiple small objects are leading to a memory leak.
In that case, it would be better to use the Histogram to find out the instances of classes that consume the
most amount of memory.

eclipse memory analyzer dominator tree tab


From the Dominator Tree, we infer that the problem suspect is the DogShelter class.
Duplicate Classes
The duplicate class tab will list down the classes that are loaded multiple times. If you are using ClassLoaders in your code, you can use the Duplicate Classes to ensure that the code is functioning properly and classes
are not loaded multiple times.
Leak Suspect
Finally, the Leak suspect report runs a leak suspect query that analyzes the Heap dump and tries to find the
memory leak. For non-trivial memory leaks, the Leak suspect query may not be able to identify the memory
leak and it’s up to the developer with the knowledge of the program to pinpoint the leak using the tools
discussed above.
Since we had a very trivial memory leak, the inference that we derived manually using Histogram and
Dominator Tree is the same as the inference from the leak suspect report as seen below.





Saturday, February 12, 2022

LR Architecture





Simultaneous users and Concurrent users

Q. What are the simultaneous user? What is the use of simultaneous users?
Ans: When all the users wait at a point and then hit the server at the same point of time without any delay, such users are called simultaneous users.

Simultaneous users apply the full load on a particular functionality to find out its performance. A load of simultaneous users confirms whether or not a particular page or functionality of the application can handle the desired load at one time.

Example: Tatkal reservation at IRCTC site

Q. What are the concurrent users? What is the use of concurrent users?
Ans: Concurrent users are those users who are all active in the system at a point in time but can be doing different tasks. You can simply say that they are parallel active users performing different activities on an application.

Concurrent users simulate the real-world scenario in the testing environment. There are very few moments when active users simultaneously hit the same functionality of the application else they remain concurrent.

Example: Behavior of the bank customers on a banking website

There is one practice scenario for simultaneous users and Concurrent Users which states:

The application must successfully handle 10 concurrent travel agents.
The application must be able to process 10 simultaneous flight bookings with response time not exceeding 90 seconds.
Point 1 implies that the application must be able to handle a load of 10 travel agents (over a period of time) irrespective of their activities. Hence the term ‘concurrent’ is used.

Point 2 implies that the application must be able to handle 10 simultaneous bookings (at a point in time). That means all the travel agents must click the ‘Book the Flight’ button at the same time. You can simulate this scenario via a rendezvous point.

Some important points:
  • Both the words mean “occurring at the same time”, but “concurrent” represents the events that occur over a period of time whereas “simultaneous” represents the events that occur at a point in time.
  • The simultaneous user is a subset of the concurrent user.
  • Simultaneous users are always concurrent users, but concurrent users are not necessarily simultaneous users.
  • Generally, the number of concurrent users on an application is more than the number of simultaneous users.
  • All the simultaneous users must be active and perform the same activities at a point in time.
  • Concurrent users may be active or inactive and may perform different activities.


Thursday, February 10, 2022

Check Points (Validations)

 Sample Script :-

web_reg_find("Text=bing","SaveCount=welcome",LAST);

lr_start_transaction("Google_com");

web_url("Google.com",
"URL=http://www.Google.com",
"TargetFrame=",
"Resource=0",
"Referer=",
"RecContentType=text/html",
"Snapshot=t1.inf",
"Mode=HTML",
LAST);

HttpRetCode = web_get_int_property(HTTP_INFO_RETURN_CODE);

if((atoi(lr_eval_string("{welcome}"))>0)&&(HttpRetCode==200)) {
        lr_end_transaction("Google_com", LR_PASS);
   } else {
        lr_error_message ("Google_com failed with status code,%d", HttpRetCode);
        lr_end_transaction("Google_com", LR_FAIL);
   }
Q387. Users log in to the application. Only if a user logs in successfully is the page redirected to the next page; otherwise, the user has to perform the same login transaction again. How do you handle this?
Answer:
I will insert the text verification function web_reg_find to check whether the user logged in to the application or not.
After that, using if/else, the strcmp function, and a user-defined function, I will complete this scenario.
Syntax:
            web_reg_find("Text=welcome","SaveCount=textcount",LAST);
            /* ... the login request goes here ... */
            if (strcmp(lr_eval_string("{textcount}"),"0")==0) {
                lr_output_message("the login failed");
                login();    /* user-defined function that repeats the login steps */
            } else {
                lr_output_message("the login is successful");
            }
Q. 33 What are the functions of content check?
Ans: LoadRunner content check functions are:
        web_reg_find()
        web_global_verification()
        web_image_check()
Q. 34 Which is a deprecated function of content-check?
Ans: web_find()
Q383. How to know that whether a file is uploaded or not?
Answer: Once the file is uploaded, a message like “file successfully uploaded” appears immediately. So, using the text verification function web_reg_find(), we can verify whether the file was uploaded or not.
Q343.How to know that how many times a text is present in the response?
Answer:
To verify whether a particular text is present in the response, we use the web_reg_find() function. If we use the SaveCount argument in this function, it will tell you how many times that text is present in the response.
Syntax:
web_reg_find("Text=<text>","SaveCount=count",LAST);
SaveCount lets you know how many times a particular text is present in the response.
Q :- Assume that you are working for a school application , once the exam result was released. How to know how many students are passed and how many are failed in a particular subject ?
Answer: I will insert the text verification function web_reg_find():
                web_reg_find("Text=Pass","SaveCount=passed_count",LAST);
                web_reg_find("Text=Fail","SaveCount=failed_count",LAST);
Once the script is replayed,
passed_count and failed_count let you know how many students passed and how many failed the exam.
Q263. You have an application which shows the exam results of the student. Corresponding to name of each student its mentioned whether he passed or failed the exam with the label of “Pass” and “Fail”. How will you identify the number of passed and failed student in VuGen script?
Answer: For this, a text check is used on the web page for the texts “Pass” and “Fail”. Through the function web_reg_find, we can capture the number of matches found on the web page with the help of “SaveCount”. SaveCount stores the number of matches found. For example-
web_reg_find("Text=Pass","SaveCount=Pass_Student", LAST);
web_reg_find("Text=Fail", "SaveCount=Fail_Student",LAST);

Wednesday, February 9, 2022

90 Percentile

what is this 90th percentile exactly?

Let’s try to understand with an example. Suppose you have 10 sheep and each sheep eats some kilograms of grass on a daily basis. One day you weighed the grass and noted the figures of each sheep’s intake. Refer to the table below:

Sheep#        S1    S2    S3    S4    S5    S6    S7    S8    S9    S10
Grass (kg)    3     3.2   4     4.8   3.6   2.9   3.4   3     3.8   3.9
Now, you need to find out what amount of grass covers the consumption of 90% of the sheep. Simply sort the sheep by grass consumed and ignore the last value.

Rank          1st   2nd   3rd   4th   5th   6th   7th   8th   9th   10th
Sheep#        S6    S1    S8    S2    S7    S5    S9    S10   S3    S4
Grass (kg)    2.9   3     3     3.2   3.4   3.6   3.8   3.9   4     4.8
The 90th percentile of 10 entries is the 9th value, which is 4; so just ignore S4 with 4.8 (keep it hungry for a few days, it eats so much).

The conclusion is that 90% of the sheep eat 4 kg of grass or less, so you have an upper limit on grass consumption. In terms of performance testing, you sort the response times of a particular transaction or request in increasing order and then ignore the 10% of the total count having the highest values. The highest number among the remaining values is the 90th percentile.

Example:
A performance test script is executed for 25 iterations. The response time of the login transaction of each iteration is:

S. No.   Iteration No.   Login Response Time (sec)
1        1               1.5
2        2               1.6
3        3               1.1
4        4               0.9
5        5               2.1
6        6               1.9
7        7               1.4
8        8               1
9        9               0.8
10       10              1.5
11       11              1.8
12       12              1.1
13       13              1.6
14       14              1.7
15       15              1.3
16       16              0.9
17       17              1
18       18              1.5
19       19              2.3
20       20              1.9
21       21              1.8
22       22              1.2
23       23              1.4
24       24              0.9
25       25              1.5
Now, sort the list in increasing order with respect to response time.

S. No.   Iteration No.   Login Response Time (sec)
1        9               0.8
2        4               0.9
3        16              0.9
4        24              0.9
5        8               1
6        17              1
7        3               1.1
8        12              1.1
9        22              1.2
10       15              1.3
11       7               1.4
12       23              1.4
13       1               1.5
14       10              1.5
15       18              1.5
16       25              1.5
17       2               1.6
18       13              1.6
19       14              1.7
20       11              1.8
21       21              1.8
22       6               1.9
23       20              1.9
24       5               2.1
25       19              2.3
Now, 90% of the number of transactions (25) is 22.5:

=> 25 x (90/100) = 22.5

Round up to 23. So the 23rd value is the 90th percentile, which is 1.9 seconds. This means 90% of the total iterations have a response time of 1.9 seconds or less. Similarly, you can calculate other percentile values such as the 70th, 80th or 95th percentile.
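The manual steps above can be sketched in Python. This follows the simple sort-and-index method described here (not Excel's interpolating formula), using the 25 login response times from the table:

```python
# 90th percentile by the "sort, then take the ceil(0.9 * n)-th value" method.
import math

times = [1.5, 1.6, 1.1, 0.9, 2.1, 1.9, 1.4, 1.0, 0.8, 1.5,
         1.8, 1.1, 1.6, 1.7, 1.3, 0.9, 1.0, 1.5, 2.3, 1.9,
         1.8, 1.2, 1.4, 0.9, 1.5]

times.sort()
rank = math.ceil(len(times) * 0.9)   # 25 * 0.9 = 22.5 -> 23
p90 = times[rank - 1]                # 23rd value (1-based)
print(p90)  # 1.9
```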

How is the 90th percentile calculated in MS Excel?
MS Excel computes the rank position of the 90th percentile as:

Rank = 0.9 * (Number of Values – 1) + 1

and then linearly interpolates between the sorted values on either side of that rank.
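A sketch of that interpolating calculation (the behaviour of Excel's PERCENTILE / PERCENTILE.INC) in Python, using the login data from the example above:

```python
def excel_percentile(values, p):
    """PERCENTILE.INC-style: rank k = p * (n - 1) + 1, then interpolate."""
    s = sorted(values)
    k = p * (len(s) - 1) + 1          # 1-based fractional rank
    lo = int(k)                       # position just below the rank
    frac = k - lo                     # fractional part for interpolation
    if lo >= len(s):
        return s[-1]
    return s[lo - 1] + frac * (s[lo] - s[lo - 1])

times = [1.5, 1.6, 1.1, 0.9, 2.1, 1.9, 1.4, 1.0, 0.8, 1.5,
         1.8, 1.1, 1.6, 1.7, 1.3, 0.9, 1.0, 1.5, 2.3, 1.9,
         1.8, 1.2, 1.4, 0.9, 1.5]
print(excel_percentile(times, 0.9))  # rank 22.6 -> between 1.9 and 1.9 -> 1.9
```

Here rank 22.6 falls between the 22nd and 23rd sorted values, which are both 1.9, so the interpolated result matches the simple method.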

Why do we need the 90th percentile in Performance Testing?
The percentile is often used as a performance goal. If the given SLA has a 90th-percentile NFR and it is met during the test, that shows that 90% of the users have an experience matching your performance goals. It gives the client additional confidence in the application.
Sometimes the average response time appears extremely high even though the individual data points seem normal. Even a couple of peaks in response time skew the average and distort the test result. In such scenarios, the 90th percentile (or other percentile values) eliminates the unusual spikes from the result.
In reality, most applications have a few high spikes in the graph; a statistician would say the curve has a long tail. A long tail does not imply many slow transactions, but a few that are magnitudes slower than the norm. In that case, the 90th percentile is helpful because it ignores the 10% of requests containing the spikes.
If the 50th percentile (median) of response time is 5 seconds that means that 50% of the transactions are either as fast or faster than 5 seconds. If the 90th percentile of the same transaction is at 8 seconds it means that 90% are as fast or faster and only 10% are slower. The average, in this case, could either be lower than 5 seconds or somewhere in between. A percentile gives a much better sense of real-world performance because it shows a slice of response time curve.
If we take the difference between the 90th-percentile value and the average response time, and divide that difference by the average, we get an idea of the spread of the data points. If the ratio is very small, the average and the 90th percentile are close to each other, indicating good, consistent performance of the application. If the ratio is large, it shows high deviation in response time and non-uniform performance. This is one situation where the 90th percentile is useful, although I would recommend drawing your conclusion using the standard deviation.
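That spread check can be expressed directly; the average and percentile figures below are illustrative, not taken from a real test:

```python
# Spread of response times: (90th percentile - average) / average.
# The figures passed in below are illustrative only.
def spread_ratio(avg_rt, p90_rt):
    return (p90_rt - avg_rt) / avg_rt

print(round(spread_ratio(2.0, 2.2), 2))  # 0.1  -> consistent performance
print(round(spread_ratio(2.0, 5.0), 2))  # 1.5  -> high deviation
```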
Percentiles are a really great and easy way of understanding the real performance characteristics of your application. They also provide a great basis for automatic base-lining, application behavioural learning and optimizing your application with a proper focus. However, averages are ineffective because they are too simplistic and one-dimensional.  In short, percentile (90th, 95th, 99th) is great in performance testing world!

Sunday, February 6, 2022

Parameters

Parameter Types: 

File
Date & Time
Iteration Number
Random Number
Unique Number
VUser ID
Group Name
LG Name
Table
XML
User Defined Function


Assignment Methods:
  1.     Sequential
  2.     Random
  3.     Unique
Update Methods: 
  1.     Each Iteration
  2.     Each Occurrence
  3.     Once



Unique + Each Iteration options: Continue with last value
                                        Continue in a cycling manner
                                        Block allocation
Block:
            e.g.: data = 10, users = 2, iterations = 2
Here:
    the 1st user gets data 1 to 5 for its 2 iterations
    the 2nd user gets data 6 to 10 for its 2 iterations
Here we can use the block option to control how the data is allocated to each user.
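A minimal Python sketch of this block allocation, assuming each Vuser walks sequentially through its own block, one value per iteration:

```python
# Sketch of Unique / Each-Iteration block allocation:
# 10 values, 2 Vusers, a block of 5 values per user, 2 iterations.
def allocate_blocks(data, users, block, iterations):
    """Return the block and the values actually used by each Vuser."""
    result = {}
    for user in range(users):
        chunk = data[user * block:(user + 1) * block]   # this user's block
        result[user + 1] = (chunk, chunk[:iterations])  # one value per iteration
    return result

alloc = allocate_blocks(list(range(1, 11)), users=2, block=5, iterations=2)
for user, (chunk, used) in alloc.items():
    print(f"Vuser {user}: block {chunk}, uses {used}")
```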






Saturday, February 5, 2022

AWR


Before getting an AWR report:
 Collect Multiple AWR Reports
 Stick to a Particular Time
 Split a Large AWR Report into Smaller Reports
Elapsed Time = difference between TWO snapshots.
If DB CPU is high in the AWR report, check the OS statistics, where you will see Busy Time, Idle Time, and Num CPUs.
eg :    DB CPU = 3150 sec
            Num CPUs = 4
            Snapshot time = 1 hr
            so available CPU time = 4*60*60 = 14400 sec; here 3150 sec of CPU time has been used out of 14400 sec.
so % used = (3150/14400)*100 ≈ 22%, i.e. the CPU is about 22% busy.
Then check the SQL IDs that are taking high CPU resources so you can tune them.
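The arithmetic above, as a quick Python check:

```python
# Available CPU seconds vs. DB CPU seconds from the AWR example above.
db_cpu_sec = 3150
num_cpus = 4
snapshot_sec = 60 * 60                   # 1-hour snapshot

available_sec = num_cpus * snapshot_sec  # total CPU seconds available
busy_pct = db_cpu_sec / available_sec * 100
print(available_sec, round(busy_pct, 1))  # 14400 21.9
```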
eg : CPU time = 40 sec
        Wait time = 20 sec, then DB time = ?
        so DB time here is 60 sec (40 + 20)
DB Time = DB CPU + Non-Idle Wait Time
Non-Idle Wait Time = DB Time - DB CPU
Note : If number of waits for any class of Waits is > 1% of the total number of Logical reads Then add more Rollback Segments.
DB time / Elapsed time = the average number of active DB sessions during that period.
DB Time < Elapsed Time = there is no Bottleneck in DB.

Throughput = (Physical Reads + Physical Writes) * Block Size
eg :  Physical Reads = 3027.2 per sec
        Physical Writes = 98.4 per sec
        Block Size = 8 KB
so, Throughput = (Physical Reads + Physical Writes) * Block Size
    Throughput = (3027.2 + 98.4) * 8 = 25004.8 KB/sec
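The same throughput calculation as a quick Python check:

```python
# I/O throughput from the AWR load-profile figures above.
physical_reads = 3027.2    # physical reads per second
physical_writes = 98.4     # physical writes per second
block_size_kb = 8          # database block size in KB

throughput_kb_per_sec = (physical_reads + physical_writes) * block_size_kb
print(round(throughput_kb_per_sec, 1))  # 25004.8
```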

Note : Below are the causes of delayed response time:
    1. High Elapsed Time
    2. High DB CPU Time
    3. High DB Time
Hard Parsing : 
    1. Loading into Shared pool ( The SQL source code is loaded into RAM for Parsing).
    2. Syntax Verification
    3. Verify the Table Authorization
    4. Transformation of SQL queries from Complex to Simple
    5. Preparing Execution Plan
    6. Executing the SQL query
    7. Fetch the Data from the Table.
Soft Parsing :
    1. Syntax Verification
    2. Verify the Table Authorization
    3. Search the shared pool for an existing parsed version of the SQL
    4. Reuse the existing Execution Plan (no new plan is prepared and no shared-pool reload is needed)
    5. Executing the SQL query
    6. Fetch the Data from the Table.
Some high-level important tips regarding AWR:
1. Collect Multiple AWR Reports: It is beneficial to have two AWR reports, one for a good period and another for when performance is poor. Alternatively, create three reports (before/during/after) around the time frame in which the problem was experienced and compare them.
2. Stick to a Particular Time: You must know the specific time when the database was slow so that you can choose a shorter time frame and get a more precise report.
3. Split a Large AWR Report into Smaller Reports: Instead of one report covering a long period, such as 3 hours, it is better to have three reports of one hour each. This helps isolate the problem.
4. For RAC, take each instance’s individual report: In a RAC environment, generate the report separately for each instance in the RAC to see whether all the instances are balanced the way they should be.
5. Use ASH as well: Use AWR to identify the troublesome areas and then use ASH to confirm them.
6. Increase the retention period: For instances where you see more performance issues, increase the retention time so that you have historical data to compare against.

Reading AWR Reports (Basic Approach)
If you are new to AWR reports, the first thing you should probably do is run the ADDM report for the specific time period. The ADDM report provides root-cause analysis of the parts of the system consuming the most time, and it helps narrow down your area of focus in the AWR report.
When looking at an AWR report, a good place to start is the "Top 5 Timed Foreground Events" section. 
DB CPU
DB file Sequential read
DB file Scattered read.
Direct path read
Direct Path read temp
Direct Path write temp

This gives you an indication of the bottlenecks in the system during this sample period.
Once you've identified the top events, drill down to see what SQL and PL/SQL are consuming the majority of those resources. On the "Main Report" section, click the "SQL Statistics" link.

On the "SQL Statistics" section, click the "SQL ordered by ??" link that most closely relates to the wait event you identified in the "Top 5 Timed Foreground Events" section. In this case, the "DB CPU" was the top event, so it would seem sensible to try the "SQL ordered by CPU Time" link first.
AWR (Automatic Workload Repository) & Its Features
Wait events used to identify performance problems.
Time model statistics indicating the amount of DB time associated with a process from the V$SESS_TIME_MODEL and V$SYS_TIME_MODEL views.
Active Session History (ASH) statistics from the V$ACTIVE_SESSION_HISTORY view.
Some system and session statistics from the V$SYSSTAT and V$SESSTAT views.
Object usage statistics.
Resource intensive SQL statements.

How to read AWR report?
Report Summary: This gives an overall summary of the instance during the snapshot period, and it contains important aggregate summary information.

Cache Sizes (end): This shows the size of each SGA region after AMM has changed them. This information can be compared to the original init.ora parameters at the end of the AWR report.

Load Profile: This important section shows important rates expressed in units of per second and transactions per second.

Instance Efficiency Percentages: With a target of 100%, these are high-level ratios for activity in the SGA.

Shared Pool Statistics: This is a good summary of changes to the shared pool during the snapshot period.

Top 5 Timed Events: This is the most important section in the AWR report.  It shows the top wait events and can quickly show the overall database bottleneck.

Wait Events Statistics Section: This section shows a breakdown of the main wait events in the database including foreground and background database wait events as well as time model, operating system, service, and wait classes statistics.

Wait Events: This AWR report section provides more detailed wait event information for foreground user processes which includes Top 5 wait events and many other wait events that occurred during the snapshot interval.

Background Wait Events: This section is relevant to the background process wait events.

Time Model Statistics: Time mode statistics report how database-processing time is spent. This section contains detailed timing information on particular components participating in database processing.

Operating System Statistics: The stress on the Oracle server is important, and this section shows the main external resources including I/O, CPU, memory, and network usage.

Service Statistics: The service statistics section gives information about how particular services configured in the database are operating.

SQL Section: This section displays top SQL, ordered by important SQL execution metrics.
SQL Ordered by Elapsed Time: Includes SQL statements that took significant execution time during processing.
SQL Ordered by CPU Time: Includes SQL statements that consumed significant CPU time during its processing.
SQL Ordered by Gets: These SQLs performed a high number of logical reads while retrieving data.
SQL Ordered by Reads: These SQLs performed a high number of physical disk reads while retrieving data.
SQL Ordered by Parse Calls: These SQLs experienced a high number of reparsing operations.
SQL Ordered by Sharable Memory: Includes SQL statements cursors which consumed a large amount of SGA shared pool memory.
SQL Ordered by Version Count: These SQLs have a large number of versions in shared pool for some reason.
Instance Activity Stats: This section contains statistical information describing how the DB operated during the snapshot period.
Instance Activity Stats (Absolute Values): This section contains statistics that have absolute values not derived from end and start snapshots.
Instance Activity Stats (Thread Activity): This report section reports a log switch activity statistic.
I/O Section: This section shows the all important I/O activity for the instance and shows I/O activity by tablespace, data file, and includes buffer pool statistics.
          Tablespace IO Stats 
          File IO Stats 
          Buffer Pool Statistics 
Advisory Section: This section show details of the advisories for the buffer, shared pool, PGA and Java pool.
          Buffer Pool Advisory 
PGA Aggr Summary: PGA Aggr Target Stats; PGA Aggr Target Histogram; and PGA Memory Advisory. 
          Shared Pool Advisory 
          Java Pool Advisory 
PGA (Program Global Area): a memory area (RAM) private to a server process.
SGA (System Global Area): a shared memory area (RAM) used by all Oracle processes.
Buffer Wait Statistics: This important section shows buffer cache waits statistics.
Note : If Cache Hit ratio <15 % ---> here DB is fine
            if Cache Hit ratio >15 % ---> try to increase Shared pool size
Enqueue Activity: This section shows how enqueue operates in the DB. Enqueues are special internal structures which provide concurrent access to various DB resources.
Undo Segment Summary: This section gives a summary about how undo segments are used by the DB.
Undo Segment Stats: This section shows detailed history information about undo segment activity.
Latch Activity: Latches are a lightweight serialization mechanism that is used to single-thread access to internal Oracle structures.
          Latch Sleep Breakdown
          Latch Miss Sources
          Parent Latch Statistics
          Child Latch Statistics

Segment Section: This report section provides details about hot segments using the following criteria:
Segments by Logical Reads: Includes top segments which experienced high number of logical reads.
Segments by Physical Reads: Includes top segments which experienced high number of disk physical reads.
Segments by Buffer Busy Waits: These segments have the largest number of buffer waits caused by their data blocks.
Segments by Row Lock Waits: Includes segments that had a large number of row locks on their data.
Segments by ITL Waits: Includes segments that had a large contention for Interested Transaction List (ITL). The contention for ITL can be reduced by increasing INITRANS storage parameter of the table.
Dictionary Cache Stats: This section exposes details about how the data dictionary cache is operating.
Library Cache Activity: Includes library cache statistics describing how shared library objects are managed by Oracle.
SGA Memory Summary: This section provides summary information about various SGA regions.
init.ora Parameters: This section shows the original init.ora  parameters for the instance during the snapshot period.

Paging is the process of moving pages between RAM and the hard disk.
To detect a shortage of memory, look at the frequency of paging.
Pages/sec: a high rate of Pages/sec indicates excessive paging.
Page Faults/sec: the sum of hard page faults and soft page faults.
Hard Page Faults: the requested page has to be retrieved from disk.
Soft Page Faults: the requested page is found elsewhere in physical memory.
Difference between a Soft Parse and a Hard Parse? Which one is better?
A soft parse is better.
Oracle SQL is parsed before execution, and checked for syntax (and parts of the semantic check) before the SQL is loaded into the library cache.
soft parse does not require a shared pool reload (and the associated RAM memory allocation).
In a soft parse, Oracle must still perform a syntax parse and semantic check, because it is possible that a DDL change altered one of the target tables or views.
Whenever we run a SQL query, a soft parse reuses the parsed statement directly from the library cache, so it executes much faster.
Hard page faults are tracked by the counters Pages Input/sec and Page Reads/sec (Pages Input/sec >= Page Reads/sec, since one read operation can bring in multiple pages).
Note : A large quantity of hard page faults could signify that you need to increase the amount of memory, or reduce the cache size on the server.

A hard parse is when your SQL must be re-loaded into the shared pool.  
A hard parse is worse than a soft parse because of the overhead involved in shared pool RAM allocation and memory management.
A hard parse takes more time to execute the SQL query.
Note :- A hard-parse rate > 100 per second indicates that bind variables are not being used effectively.
Bind Variables :- Bind variables are variables you create in SQL*Plus and then reference in PL/SQL. If you create a variable in SQL*Plus, you can use it as you would a declared variable in your PL/SQL subprogram and then access it from SQL*Plus.
Note : If AWR reported High DB Time and High DB CPU time which is causes the Delayed Response Time.
Note : If DB CPU is high in AWR Report need to check OS Statistics and you will see Busy Time, Idle Time and Num of CPU's
Eg :- DB CPU = 3150 sec
        Num CPUs = 4
        Snapshot time = 1 hr
                so available CPU time = 4*60*60 = 14400 sec
        out of 14400 sec of available CPU time, only 3150 sec has been used.
                so, % used = (3150/14400)*100 ≈ 22% busy.
So in this case the CPU is not busy at all.
But if the CPU is busy, say above 80%, go to the "SQL ordered by CPU Time" report and find the SQL IDs that keep the CPU busy.
When you find a SQL ID that is taking high CPU resources, you can tune that SQL query's performance.

Oracle Database Memory Structures


High total disk reads mean a SQL statement is reading a lot of data from disks rather than being able to access that data from the db block buffers
READS means physical reads. This means the data is not in the data buffer so Oracle must read the data from disk. Reading disk is very slow compared to searching data buffers (logical reads aka buffer gets).
Buffer Gets (also called Logical Reads) means the data is already in the data buffer cache and Oracle is trying to locate the rows that match the WHERE clause.
A high number of Buffer Gets means two things: (1) Oracle is working very hard to locate the matching rows because of an unrestrictive WHERE clause or bad/missing indexes; and (2) high CPU since Oracle counts the time it’s doing Buffer Gets as CPU time.

HOW BUSY IS YOUR DATABASE SERVER? 
The Operating System Statistics section says there are 8 CPUs.
The elapsed time for this AWR is 120.11 minutes (so 120.11 * 8 CPUs = 960.88 minutes).
Therefore, the total number of available CPU minutes is 960.88. 
The Top 5 Timed Events section says that CPU TIME is 14,195 seconds, which is 236.5 CPU minutes (14,195/60 = 236.5 minutes).
Therefore, your database server is 236.5 / 960.88 = 24.6% busy.
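The busy-percentage arithmetic above, as a quick Python check:

```python
# "How busy is your database server?" calculation from the AWR figures above.
num_cpus = 8
elapsed_min = 120.11
cpu_time_sec = 14195

available_cpu_min = elapsed_min * num_cpus   # 960.88 available CPU minutes
cpu_min = cpu_time_sec / 60                  # ~236.6 CPU minutes consumed
busy_pct = cpu_min / available_cpu_min * 100
print(round(busy_pct, 1))  # 24.6
```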

DB_TIME = DB_CPU + WAIT_TIME
                Where WAIT_TIME = I/O_WAIT + OTHER_WAIT
Ideal distribution of time is
                DB_TIME = DB_CPU + I/O_WAIT + OTHER_WAIT
                        100% = 70%         + 20%           + 10%
The goal is to first reduce the wait time and then reduce the CPU time
Note :- Writes to the redo log should take 4–10 ms at most.
Note : Soft Parse % + Hard Parse % = 100; the Soft Parse % should be close to 100, otherwise there is a parsing issue in the DB.
Note : If DB time is 3 or more times the elapsed time, there is an issue in the DB.
eg : Elapsed time = 120 mins
        DB time = 360 mins
                here there is an issue in the DB.
eg : Elapsed time = 120 mins
        DB time = 300 mins
                here the DB is good.
eg : Elapsed time = 180 sec
        DB time = 320 sec
so 320/180 ≈ 1.8 average active sessions.
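The DB time / elapsed time ratio can be sketched as:

```python
# Average active sessions = DB time / elapsed time (same units for both).
def avg_active_sessions(db_time, elapsed):
    return db_time / elapsed

print(round(avg_active_sessions(320, 180), 1))  # 1.8
print(round(avg_active_sessions(360, 120), 1))  # 3.0 -> worth investigating
```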
Note : Non-Parse CPU should always be close to 100%.
Note : 
Sizing the Redo Log Buffer :
  • The size of the redo log buffer is determined by the
    • LOG_BUFFER parameter
    • Remaining space in the fixed area granule
  • Default value can range from 5MB to 32MB
  • The log Buffer is written at
    • Commit
    • One-Third full
    • DBWR request
Sizing the Redo Log Files :
  • The size of redo log files can influence performance
  • Larger redo log files provide better performance
  • Generally, redo log files should range between 100MB and a few gigabytes
  • Switch redo log files at most once every 20 minutes
  • Use the Redo Log file Size Advisor to correctly size your redo logs
Increasing the Performance of Archiving :
  • Share archiving work during a temporary increase in workload
    • ALTER SYSTEM ARCHIVE LOG ALL TO <log_archive_dest>
  • Increase the number of archiver processes with LOG_ARCHIVE_MAX_PROCESSES
  • Multiplex the redo log files, and add more members
  • Change the number of archive destinations :
  • LOG_ARCHIVE_DEST_n
Slow DB Qry :-
