Before getting an AWR report:
- Collect multiple AWR reports
- Stick to a particular time
- Split a large AWR report into smaller reports
Elapsed Time = Difference between TWO snapshots.
If DB CPU is high in the AWR report, check the OS Statistics section, where you will see Busy Time, Idle Time, and Num CPUs.
e.g.: DB CPU = 3150 sec
Num CPUs = 4
Snapshot time = 1 hr
Total available CPU time = 4 * 60 * 60 = 14400 sec, and 3150 sec of CPU time was used out of those 14400 sec.
% used = (3150/14400) * 100 ≈ 22%, so the CPUs are roughly 22% busy.
Then check which SQL IDs are consuming the most CPU resources so you can tune them.
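As a rough cross-check outside the report, the same busy/idle/CPU-count figures can be read live from V$OSSTAT. A minimal sketch (BUSY_TIME and IDLE_TIME are cumulative since instance startup, in centiseconds):

-- Approximate host CPU busy % and CPU count from V$OSSTAT
SELECT ROUND(busy.value / (busy.value + idle.value) * 100, 1) AS pct_busy,
       cpus.value AS num_cpus
FROM   (SELECT value FROM v$osstat WHERE stat_name = 'BUSY_TIME') busy,
       (SELECT value FROM v$osstat WHERE stat_name = 'IDLE_TIME') idle,
       (SELECT value FROM v$osstat WHERE stat_name = 'NUM_CPUS') cpus;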
e.g.: CPU time = 40 sec, wait time = 20 sec, then DB time = ?
DB time here is 60 sec (40 + 20).
DB time = DB CPU + non-idle wait time
Non-idle wait time = DB time - DB CPU
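The same relationship can be read live from V$SYS_TIME_MODEL; a minimal sketch (values are cumulative since instance startup, in microseconds), with the non-idle wait time derived by subtraction as above:

-- DB time, DB CPU, and derived non-idle wait time, in seconds
SELECT MAX(CASE WHEN stat_name = 'DB time' THEN value END) / 1e6 AS db_time_sec,
       MAX(CASE WHEN stat_name = 'DB CPU'  THEN value END) / 1e6 AS db_cpu_sec,
       (MAX(CASE WHEN stat_name = 'DB time' THEN value END)
        - MAX(CASE WHEN stat_name = 'DB CPU' THEN value END)) / 1e6 AS non_idle_wait_sec
FROM   v$sys_time_model
WHERE  stat_name IN ('DB time', 'DB CPU');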
Note: If the number of waits for any wait class is > 1% of the total number of logical reads, then add more rollback segments.
DB time / elapsed time gives the average number of active sessions in the DB during that period.
DB time < elapsed time means there is no bottleneck in the DB.
Throughput = (physical reads/sec + physical writes/sec) * block size
e.g.: Physical reads/sec = 3027.2
Physical writes/sec = 98.4
Block size = 8 KB
Throughput = (3027.2 + 98.4) * 8 = 25004.8 KB/sec
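In the report these rates come from the Load Profile section; as a rough live equivalent (a sketch, not the AWR calculation itself), V$SYSMETRIC already exposes read and write throughput in bytes per second:

-- Recent physical I/O throughput (KB/sec) from V$SYSMETRIC
SELECT metric_name,
       ROUND(value / 1024, 1) AS kb_per_sec,
       begin_time,
       end_time
FROM   v$sysmetric
WHERE  metric_name IN ('Physical Read Total Bytes Per Sec',
                       'Physical Write Total Bytes Per Sec');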
Note: The following are causes of delayed response time:
1. High Elapsed Time
2. High DB CPU Time
3. High DB Time
Hard Parsing :
1. Loading into Shared pool ( The SQL source code is loaded into RAM for Parsing).
2. Syntax Verification
3. Verify the Table Authorization
4. Transformation of SQL queries from Complex to Simple
5. Preparing Execution Plan
6. Executing the SQL query
7. Fetch the Data from Table.
Soft Parsing :
1. Syntax Verification
2. Verify the Table Authorization
3. Transformation of SQL queries from Complex to Simple
4. Preparing Execution Plan
5. Executing the SQL query
6. Fetch the Data from Table.
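To see how much of the parsing on a running instance is hard versus soft, a simple sketch against V$SYSSTAT (cumulative counters since startup) can be compared with the Instance Efficiency "Soft Parse %" figure in the report:

-- Hard parse vs. total parse counts, and the resulting soft parse %
SELECT MAX(CASE WHEN name = 'parse count (total)' THEN value END) AS total_parses,
       MAX(CASE WHEN name = 'parse count (hard)'  THEN value END) AS hard_parses,
       ROUND(100 * (1 - MAX(CASE WHEN name = 'parse count (hard)' THEN value END)
                        / MAX(CASE WHEN name = 'parse count (total)' THEN value END)), 2) AS soft_parse_pct
FROM   v$sysstat
WHERE  name IN ('parse count (total)', 'parse count (hard)');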
Some high-level important tips regarding AWR:
1. Collect Multiple AWR Reports: It is beneficial to have two AWR reports, one for a good time and the other for when performance is poor. Alternatively, create three reports (before/during/after) around the time frame when the problem was experienced and compare them with the time frames before and after.
2. Stick to a Particular Time: You need the specific time when the database was slow so that you can choose a shorter time frame and get a more precise report.
3. Split a Large AWR Report into Smaller Reports: Instead of one report covering a long period, such as 3 hours, it is better to have three reports of one hour each. This helps isolate the problem.
4. For RAC, take each instance's individual report: In a RAC environment, generate the report separately for each instance to see whether all the instances are balanced the way they should be.
5. Use ASH also : Use AWR to identify the troublesome areas and then use ASH to confirm those areas.
6. Increase the retention period: For instances where you see more performance issues, increase the retention time so that you have historical data to compare against.
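A sketch of how these tips are usually put into practice from SQL*Plus: generating the AWR report (per instance for RAC), generating an ASH report, and raising the retention. The 43200/30 values are example settings only, not recommendations.

-- Generate an AWR report interactively
@?/rdbms/admin/awrrpt.sql
-- For RAC, awrrpti.sql lets you pick a specific instance:
-- @?/rdbms/admin/awrrpti.sql
-- ASH report, to confirm what AWR points at:
-- @?/rdbms/admin/ashrpt.sql

-- Example only: keep 30 days of snapshots, taken every 30 minutes
BEGIN
  DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
    retention => 43200,   -- minutes (30 days)
    interval  => 30);     -- minutes
END;
/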
Reading AWR Reports (Basic Approach)
If you are new to AWR reports, the first thing you should probably do is run the ADDM report for the specific time period. The ADDM report provides root cause analysis of the parts of the system consuming the most time. Use the ADDM report to help narrow down your area of focus in the AWR report.
When looking at an AWR report, a good place to start is the "Top 5 Timed Foreground Events" section.
DB CPU
DB file Sequential read
DB file Scattered read.
Direct path read
Direct Path read temp
Direct Path write temp
This gives you an indication of the bottlenecks in the system during this sample period.
Once you've identified the top events, drill down to see what SQL and PL/SQL are consuming the majority of those resources. On the "Main Report" section, click the "SQL Statistics" link.
On the "SQL Statistics" section, click the "SQL ordered by ??" link that most closely relates to the wait event you identified in the "Top 5 Timed Foreground Events" section. In this case, the "DB CPU" was the top event, so it would seem sensible to try the "SQL ordered by CPU Time" link first.
AWR (Automatic Workload Repository) & Its Features
Wait events used to identify performance problems.
Time model statistics indicating the amount of DB time associated with a process from the V$SESS_TIME_MODEL and V$SYS_TIME_MODEL views.
Active Session History (ASH) statistics from the V$ACTIVE_SESSION_HISTORY view.
Some system and session statistics from the V$SYSSTAT and V$SESSTAT views.
Object usage statistics.
Resource intensive SQL statements.
How to read AWR report?
Report Summary: This gives an overall summary of the instance during the snapshot period, and it contains important aggregate summary information.
Cache Sizes (end): This shows the size of each SGA region after AMM has changed them. This information can be compared to the original init.ora parameters at the end of the AWR report.
Load Profile: This important section shows important rates expressed in units of per second and transactions per second.
Instance Efficiency Percentages: With a target of 100%, these are high-level ratios for activity in the SGA.
Shared Pool Statistics: This is a good summary of changes to the shared pool during the snapshot period.
Top 5 Timed Events: This is the most important section in the AWR report. It shows the top wait events and can quickly show the overall database bottleneck.
Wait Events Statistics Section: This section shows a breakdown of the main wait events in the database including foreground and background database wait events as well as time model, operating system, service, and wait classes statistics.
Wait Events: This AWR report section provides more detailed wait event information for foreground user processes which includes Top 5 wait events and many other wait events that occurred during the snapshot interval.
Background Wait Events: This section is relevant to the background process wait events.
Time Model Statistics: Time model statistics report how database processing time is spent. This section contains detailed timing information on particular components participating in database processing.
Operating System Statistics: The stress on the Oracle server is important, and this section shows the main external resources including I/O, CPU, memory, and network usage.
Service Statistics: The service statistics section gives information about how particular services configured in the database are operating.
SQL Section: This section displays top SQL, ordered by important SQL execution metrics.
SQL Ordered by Elapsed Time: Includes SQL statements that took significant execution time during processing.
SQL Ordered by CPU Time: Includes SQL statements that consumed significant CPU time during its processing.
SQL Ordered by Gets: These SQLs performed a high number of logical reads while retrieving data.
SQL Ordered by Reads: These SQLs performed a high number of physical disk reads while retrieving data.
SQL Ordered by Parse Calls: These SQLs experienced a high number of reparsing operations.
SQL Ordered by Sharable Memory: Includes SQL statements cursors which consumed a large amount of SGA shared pool memory.
SQL Ordered by Version Count: These SQLs have a large number of versions in shared pool for some reason.
Instance Activity Stats: This section contains statistical information describing how the DB operated during the snapshot period.
Instance Activity Stats (Absolute Values): This section contains statistics that have absolute values not derived from end and start snapshots.
Instance Activity Stats (Thread Activity): This report section reports a log switch activity statistic.
I/O Section: This section shows the all important I/O activity for the instance and shows I/O activity by tablespace, data file, and includes buffer pool statistics.
Tablespace IO Stats
File IO Stats
Buffer Pool Statistics
Advisory Section: This section shows details of the advisories for the buffer cache, shared pool, PGA, and Java pool.
Buffer Pool Advisory
PGA Aggr Summary: PGA Aggr Target Stats; PGA Aggr Target Histogram; and PGA Memory Advisory.
Shared Pool Advisory
Java Pool Advisory
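The same advisory data can also be queried live from the advice views; a minimal sketch (a size factor of 1 marks the current size):

-- Buffer cache advisory: estimated physical reads at candidate cache sizes
SELECT size_for_estimate AS mb, size_factor, estd_physical_reads
FROM   v$db_cache_advice
WHERE  name = 'DEFAULT'
ORDER  BY size_for_estimate;

-- Shared pool advisory: estimated library cache load time saved at candidate sizes
SELECT shared_pool_size_for_estimate AS mb, shared_pool_size_factor, estd_lc_time_saved
FROM   v$shared_pool_advice
ORDER  BY shared_pool_size_for_estimate;

-- PGA advisory: estimated cache hit % at candidate PGA_AGGREGATE_TARGET sizes
SELECT ROUND(pga_target_for_estimate / 1024 / 1024) AS mb,
       pga_target_factor,
       estd_pga_cache_hit_percentage
FROM   v$pga_target_advice
ORDER  BY pga_target_for_estimate;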
PGA (Program Global Area): a memory area (RAM) private to each server process.
SGA (System Global Area): a shared memory area (RAM) used by the instance's server and background processes.
Buffer Wait Statistics: This important section shows buffer cache waits statistics.
Note: If the cache miss ratio is < 15%, the DB is fine.
If the cache miss ratio is > 15%, try to increase the shared pool size.
Enqueue Activity: This section shows how enqueues operate in the DB. Enqueues are special internal structures which provide concurrent access to various DB resources.
Undo Segment Summary: This section gives a summary about how undo segments are used by the DB.
Undo Segment Stats: This section shows detailed history information about undo segment activity.
Latch Activity: Latches are a lightweight serialization mechanism that is used to single-thread access to internal Oracle structures.
Latch Sleep Breakdown
Latch Miss Sources
Parent Latch Statistics
Child Latch Statistics
Segment Section: This report section provides details about hot segments using the following criteria:
Segments by Logical Reads: Includes top segments which experienced high number of logical reads.
Segments by Physical Reads: Includes top segments which experienced high number of disk physical reads.
Segments by Buffer Busy Waits: These segments have the largest number of buffer waits caused by their data blocks.
Segments by Row Lock Waits: Includes segments that had a large number of row locks on their data.
Segments by ITL Waits: Includes segments that had heavy contention for the Interested Transaction List (ITL). ITL contention can be reduced by increasing the INITRANS storage parameter of the table, as sketched below.
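The same hot-segment view exists outside AWR in V$SEGMENT_STATISTICS; the sketch below lists the segments with the most ITL waits and shows the INITRANS change mentioned above (the table name and value are hypothetical examples):

-- Segments with the most ITL waits since instance startup
SELECT owner, object_name, object_type, value AS itl_waits
FROM   v$segment_statistics
WHERE  statistic_name = 'ITL waits'
AND    value > 0
ORDER  BY value DESC;

-- Example only: raise INITRANS on a hypothetical hot table (applies to newly formatted blocks)
ALTER TABLE app_owner.orders INITRANS 8;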
Dictionary Cache Stats: This section exposes details about how the data dictionary cache is operating.
Library Cache Activity: Includes library cache statistics describing how shared library objects are managed by Oracle.
SGA Memory Summary: This section provides summary information about various SGA regions.
init.ora Parameters: This section shows the original init.ora parameters for the instance during the snapshot period.
Paging is the process of moving pages from RAM to hard disk.
To detect a shortage of memory, look at the frequency of paging.
Pages/sec: a high pages/sec rate indicates excessive paging.
Page faults: the sum of hard page faults and soft page faults.
Hard page faults: the requested page must be retrieved from disk.
Soft page faults: the requested page is found elsewhere in physical memory.
Difference between a soft parse and a hard parse? Which one is better?
A soft parse is better.
Oracle SQL is parsed before execution and checked for syntax (and parts of the semantic check) before the SQL is loaded into the library cache.
A soft parse does not require a shared pool reload (and the associated RAM allocation).
In a soft parse, Oracle must still perform a syntax parse and semantic check because it is possible that a DDL change altered one of the target tables or views.
Whenever we run a SQL query, with a soft parse the statement is taken directly from the library cache in the shared pool and executes very quickly.
Hard page fault counters: Pages Input/sec >= Page Reads/sec.
Note: A large quantity of hard page faults could signify that you need to increase the amount of memory or reduce the cache size on the server.
If the cache miss ratio is < 15%, the DB is fine.
If the cache miss ratio is > 15%, the DB has an issue; you need to increase the shared pool size.
A hard parse is when your SQL must be re-loaded into the shared pool.
A hard parse is worse than a soft parse because of the overhead involved in shared pool RAM allocation and memory management.
A hard parse takes more time to execute the SQL query.
Note: A hard parse rate > 100 per second indicates that bind variables are not being used effectively.
Bind variables: Bind variables are variables you create in SQL*Plus and then reference in PL/SQL. If you create a variable in SQL*Plus, you can use it as you would a declared variable in your PL/SQL subprogram, and you can access the variable from SQL*Plus.
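A small SQL*Plus sketch of the idea; the EMPLOYEES table and its columns are hypothetical examples. Because the literal is replaced by :dept_id, repeated executions share the same cursor and avoid hard parses:

-- Declare a bind variable in SQL*Plus and set it from PL/SQL
VARIABLE dept_id NUMBER
EXEC :dept_id := 10

-- Reference it in SQL; the statement text stays constant across values
SELECT employee_id, last_name
FROM   employees
WHERE  department_id = :dept_id;

-- Re-run with a new value: soft parse, same shared cursor
EXEC :dept_id := 20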
Note: If AWR reports high DB time and high DB CPU time, that is what causes the delayed response time.
Note: If DB CPU is high in the AWR report, check the OS Statistics section, where you will see Busy Time, Idle Time, and the number of CPUs.
e.g.: DB CPU = 3150 sec
Num CPUs = 4
Snapshot time = 1 hr
Total available CPU time = 4 * 60 * 60 = 14400 sec.
Out of those 14400 sec of CPU time, only 3150 sec has been used.
So, % used = (3150/14400) * 100 ≈ 22% busy.
So in this case the CPUs are not very busy at all.
But if the CPU is busy, say above 80%, go to the "SQL ordered by CPU Time" report and find the SQL IDs that keep the CPU busy.
When you find a SQL ID that is taking high CPU resources, you can tune that SQL query's performance.
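For statements still in the cursor cache (as opposed to the AWR history), a similar hedged query can be run against V$SQL:

-- Top cached statements by CPU time (times in V$SQL are in microseconds)
SELECT *
FROM  (SELECT sql_id,
              ROUND(cpu_time / 1e6, 1)     AS cpu_sec,
              ROUND(elapsed_time / 1e6, 1) AS elapsed_sec,
              executions,
              buffer_gets
       FROM   v$sql
       ORDER  BY cpu_time DESC)
WHERE  rownum <= 10;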
Oracle Database Memory Structures
High total disk reads mean a SQL statement is reading a lot of data from disks rather than being able to access that data from the db block buffers
READS means physical reads. This means the data is not in the data buffer so Oracle must read the data from disk. Reading disk is very slow compared to searching data buffers (logical reads aka buffer gets).
Buffer Gets (also called Logical Reads) means the data is already in the data buffer cache and Oracle is trying to locate the rows that match the WHERE clause.
A high number of Buffer Gets means two things: (1) Oracle is working very hard to locate the matching rows because of an unrestrictive WHERE clause or bad/missing indexes; and (2) high CPU since Oracle counts the time it’s doing Buffer Gets as CPU time.
HOW BUSY IS YOUR DATABASE SERVER?
The Operating System Statistics section says there are 8 CPU’s.
The ELAPSED time for this AWR is 120.11 minutes (120.11 minutes * 8 CPUs = 960.88 CPU minutes).
Therefore, the total number of available CPU minutes is 960.88.
The Top 5 Timed Events section says that CPU TIME is 14,195 seconds which is 236.5 CPU minutes.(14,195/60=236.5 Minutes)
Therefore, your database server is 236.5 / 960.88 or 24.6% busy.
DB_TIME = DB_CPU + WAIT_TIME
Where WAIT_TIME = I/O_WAIT + OTHER_WAIT
Ideal distribution of time is
DB_TIME = DB_CPU + I/O_WAIT + OTHER_WAIT
100% = 70% + 20% + 10%
The goal is to first reduce the wait time and then reduce the CPU time
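A rough way to see how DB time splits between CPU, I/O waits, and other waits on a live instance (a sketch; times are cumulative since startup, and TIME_WAITED is in centiseconds):

-- Wait time by wait class (seconds), excluding idle waits
SELECT wait_class, ROUND(time_waited / 100, 1) AS wait_sec
FROM   v$system_wait_class
WHERE  wait_class <> 'Idle'
ORDER  BY time_waited DESC;

-- DB CPU (seconds) for comparison
SELECT ROUND(value / 1e6, 1) AS db_cpu_sec
FROM   v$sys_time_model
WHERE  stat_name = 'DB CPU';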
Note: Writes to the redo log should take 4-10 ms at most.
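To check redo write latency against that guideline, a sketch using V$SYSTEM_EVENT: the LGWR write itself is 'log file parallel write', and 'log file sync' is what foreground sessions wait on at commit.

-- Average redo-related wait times in milliseconds
SELECT event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_ms
FROM   v$system_event
WHERE  event IN ('log file parallel write', 'log file sync');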
Note: Soft parse % + hard parse % should add up to 100; if not, there is an issue in the DB.
Note: If DB time is more than 3 times the elapsed time, there is an issue in the DB.
e.g.: Elapsed time = 120 mins, DB time = 360 mins: here there is an issue in the DB.
e.g.: Elapsed time = 120 mins, DB time = 300 mins: here the DB is good.
e.g.: Elapsed time = 180 sec, DB time = 320 sec, so 320/180 ≈ 1.8 average active sessions.
Note: Non-Parse CPU % should always be close to 100%.
Sizing the Redo Log Buffer:
- The size of the redo log buffer is determined by:
  - The LOG_BUFFER parameter
  - The remaining space in the fixed area granule
- The default value can range from 5 MB to 32 MB
- The log buffer is written at:
  - Commit
  - One-third full
  - DBWR request
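A sketch for checking whether the current LOG_BUFFER is making sessions wait for buffer space; 'redo buffer allocation retries' should stay near zero relative to 'redo entries':

-- Current setting (SQL*Plus) and redo buffer retry statistics
SHOW PARAMETER log_buffer

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo entries',
                'redo buffer allocation retries',
                'redo log space requests');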
Sizing the Redo Log Files :
- The size of redo log files can influence performance
- Larger redo log files provide better performance
- Generally, redo log files should range between 100 MB and a few gigabytes
- Switch redo log files at most once every 20 minutes
- Use the Redo Log file Size Advisor to correctly size your redo logs
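Two small checks for the sizing guidance above: the switch rate from V$LOG_HISTORY, and the size suggested by the Redo Logfile Size Advisor via V$INSTANCE_RECOVERY (the advisor is populated when FAST_START_MTTR_TARGET is set):

-- Log switches per hour for the last day
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*)                               AS switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY 1;

-- Suggested redo log file size (MB) from the advisor
SELECT optimal_logfile_size FROM v$instance_recovery;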
Increasing the Performance of Archiving :
- Share archiving work during a temporary increase in workload
- ALTER SYSTEM ARCHIVE LOG ALL TO <log_archive_dest>
- Increase the number of archiver processes with LOG_ARCHIVE_MAX_PROCESSES
- Multiplex the redo log files, and add more members
- Change the number of archive destinations :
- LOG_ARCHIVE_DEST_n
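Illustrative commands for the points above; the destination path, member path, and process count are placeholder values, not recommendations:

-- Example only: raise the number of ARCn processes
ALTER SYSTEM SET log_archive_max_processes = 4;

-- Example only: add a second archive destination
ALTER SYSTEM SET log_archive_dest_2 = 'LOCATION=/u02/arch' SCOPE=BOTH;

-- Example only: multiplex by adding a member to an existing redo log group
ALTER DATABASE ADD LOGFILE MEMBER '/u03/oradata/redo01b.log' TO GROUP 1;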
Slow DB Query: