天天看点

sql trace基础

一、 基础.SQL_TRACE

1. 在全局启用

在参数文件(pfile/spfile)中指定:

sql_trace =true

在全局启用SQL_TRACE会导致所有进程的活动被跟踪,包括后台进程及所有用户进程,这通常会导致比较严重的性能问题,所以在生产环境中要谨慎使用.

提示: 通过在全局启用sql_trace,我们可以跟踪到所有后台进程的活动,很多在文档中的抽象说明,通过跟踪文件的实时变化,我们可以清晰的看到各个进程之间的紧密协调.

2. 在当前session级设置

大多数时候我们使用sql_trace跟踪当前进程.通过跟踪当前进程可以发现当前操作的后台数据库递归活动(这在研究数据库新特性时尤其有效), 研究SQL执行,发现后台错误等.

在session级启用和停止sql_trace方式如下:

SQL> alter session set sql_trace=true;

SQL> SQL statements ......

SQL> alter session set sql_trace=false;

3. 跟踪其他用户进程

在很多时候我们需要跟踪其他用户的进程,而不是当前用户,这可以通过Oracle提供的系统包

DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION来完成

SET_SQL_TRACE_IN_SESSION过程序要提供三个参数:

SQL> desc dbms_system

......

PROCEDURE SET_SQL_TRACE_IN_SESSION

Argument Name                           Type                    In/Out Default?

------------------------------           ---------------------    ------ --------

SID                                      NUMBER                  IN

SERIAL#                                  NUMBER                  IN

SQL_TRACE                                BOOLEAN                 IN

......

通过v$session我们可以获得sid、serial#等信息:

获得进程信息,选择需要跟踪的进程:

SQL> select sid,serial#,username from v$session where username is not null;

SID    SERIAL# USERNAME

---------- ---------- ------------------------------

10       9068 BIGBEAR

SQL> exec dbms_system.set_sql_trace_in_session(10,9068,true)

SQL> exec dbms_system.set_sql_trace_in_session(10,9068,false)

二、 基础.10046事件

10046事件是Oracle提供的内部事件,是对SQL_TRACE的增强.

10046事件可以设置以下四个级别:

1 - 启用标准的SQL_TRACE功能,等价于sql_trace

4 - Level 1 加上绑定值(bind values)

8 - Level 1 + 等待事件跟踪

12 - Level 1 + Level 4 + Level 8

类似sql_trace,10046事件可以在全局设置,也可以在session级设置。

1. 在全局设置

在参数文件中增加:

event="10046 trace name context forever,level 12"

此设置对所有用户的所有进程生效、包括后台进程.

2. 对当前session设置

通过alter session的方式修改,需要alter session的系统权限:

SQL> alter session set events '10046 trace name context forever';

SQL> alter session set events '10046 trace name context forever, level 8';

......

SQL> alter session set events '10046 trace name context off';

3. 对其他用户session设置

通过DBMS_SYSTEM.SET_EV系统包来实现:

SQL> desc dbms_system

...

PROCEDURE SET_EV

Argument Name                  Type                    In/Out Default?

------------------------------ ----------------------- ------ --------

SI                             BINARY_INTEGER          IN

SE                             BINARY_INTEGER          IN

EV                             BINARY_INTEGER          IN

LE                             BINARY_INTEGER          IN

NM                             VARCHAR2                IN

...

其中的参数SI、SE来自v$session视图:

查询获得需要跟踪的session信息:

SQL> select sid,serial#,username from v$session where username is not null;

SID    SERIAL# USERNAME

---------- ---------- ------------------------------

10       9068 BIGBEAR

SQL> exec dbms_system.set_ev(10,9068,10046,8,'bigbear');

......

SQL> exec dbms_system.set_ev(10,9068,10046,0,'bigbear');

三、TKPROF格式化输出

TKPROF bigbear_ora_27292.trc output-file

  EXPLAIN=scott/tiger

  TABLE=scott.temp_plan_table_a

  INSERT=STOREA.SQL

  SYS=NO

  SORT=(EXECPU,FCHCPU)

  PRINT=5

list description
prscnt

number of times parse was called

解析被调用次数

prscpu

cpu time parsing

解析占用CPU时间(秒)

prsela

elapsed time parsing

解析占用总时间(秒)

prsdsk

number of disk reads during parse

解析期间磁盘读取的数据块数

prsqry

number of buffers for consistent read during parse

解析期间一致读所访问到的缓冲块的数量

prscu

number of buffers for current read during parse

解析期间当前读所访问到的缓冲块的数量

prsmis

number of misses in library cache during parse

解析期间从库缓存中丢失的块的数量

execnt number of execute was called
execpu cpu time spent executing
exeela elapsed time executing
exedsk number of disk reads during execute                   
exeqry number of buffers for consistent read during execute  
execu number of buffers for current read during execute     
exerow number of rows processed during execute               
exemis number of library cache misses during execute         
fchcnt number of times fetch was called                      
fchcpu cpu time spent fetching                               
fchela elapsed time fetching                                 
fchdsk number of disk reads during fetch                     
fchqry number of buffers for consistent read during fetch    
fchcu number of buffers for current read during fetch       
fchrow number of rows fetched                                
userid userid of user that parsed the cursor                 

四、TKPROF输出说明

list description
PARSE

Translates the SQL statement into an execution plan, including checks for proper security authorization and checks for the existence of tables, columns, and other referenced objects.

将SQL语句转换为执行计划,包括检查安全认证,还有表/列及其它相关对象的存在性

EXECUTE

Actual execution of the statement by Oracle. For INSERT, UPDATE, and DELETE statements, this modifies the data. For SELECT statements, this identifies the selected rows.

实际提交给ORACLE执行。对于INS/UPD/DEL这个指标反映修改后的数据,对于SELECT反映的是选中的数据。

FETCH

Retrieves rows returned by a query. Fetches are only performed for SELECT statements.

对于SELECT取回数据的行数。

column description
COUNT

Number of times a statement was parsed, executed, or fetched.

次数

CPU

Total CPU time in seconds for all parse, execute, or fetch calls for the statement. This value is zero (0) if TIMED_STATISTICS is not turned on.

占CUP的秒数

ELAPSED

Total elapsed time in seconds for all parse, execute, or fetch calls for the statement. This value is zero (0) if TIMED_STATISTICS is not turned on.

整个流逝的时间

DISK

Total number of data blocks physically read from the data files on disk for all parse, execute, or fetch calls.

读磁盘的块数

QUERY

Total number of buffers retrieved in consistent mode for all parse, execute, or fetch calls. Usually, buffers are retrieved in consistent mode for queries.

一致读所访问的缓冲块数

CURRENT

Total number of buffers retrieved in current mode. Buffers are retrieved in current mode for statements such as INSERT, UPDATE, and DELETE.

SQL所影响到的当前的缓冲块数

ROWS

Total number of rows processed by the SQL statement. This total does not include rows processed by sub-queries of the SQL statement.

SQL总处理的行数,但不包括子查询所访问的行。

Timing statistics have a resolution of one hundredth of a second; therefore, any operation on a cursor that takes a hundredth of a second or less might not be timed accurately. Keep this in mind when interpreting statistics. In particular, be careful when interpreting the results from simple queries that execute very quickly.

基于时间的统计是每百秒一次,因此任何基于指针的操作如果小于100秒将不会被精确统计。特别提醒:对于那些简单的执行速度很快的query的统计要十分地小心

Sometimes, in order to execute a SQL statement issued by a user, Oracle must issue additional statements. Such statements are called recursive calls or recursive SQL statements. For example, if you insert a row into a table that does not have enough space to hold that row, then Oracle makes recursive calls to allocate the space dynamically. Recursive calls are also generated when data dictionary information is not available in the data dictionary cache and must be retrieved from disk.

有时候为了执行用户的一条SQL语句,ORACLE需要调用其它的SQL语句。这样的SQL我们称为递归调用,或者称为递归SQL语句。例如:当为一张表插入一行数据时,发现没有足够的空间,这时ORACLE将产生递归调用为其分配空间。当数据字典不可用时,会从磁盘上读取,这一过程也产生递归调用。

If recursive calls occur while the SQL Trace facility is enabled, then TKPROF produces statistics for the recursive SQL statements and marks them clearly as recursive SQL statements in the output file. You can suppress the listing of Oracle internal recursive calls (for example, space management) in the output file by setting the SYS command-line parameter to NO. The statistics for a recursive SQL statement are included in the listing for that statement, not in the listing for the SQL statement that caused the recursive call. So, when you are calculating the total resources required to process a SQL statement, consider the statistics for that statement as well as those for recursive calls caused by that statement.

在递归调用发生在SQL TRACE中时,TKPROF会统计这些信息,并明确地在输出文件中表明。你可以通过设置参数禁止上述递归调用信息的产生,即:只有产生递归调用的SQL被记录,相关的递归调用过程不被记录。所以,当要计算整个资源消耗时,要考虑是否将上述信息算在内。

Deciding Which Statements to Tune

You need to find which SQL statements use the most CPU or disk resource. If the TIMED_STATISTICS parameter is on, then you can find high CPU activity in the CPU column. If TIMED_STATISTICS is not on, then check the QUERY and CURRENT columns.

判断哪些SQL需要调整

如果设置了TIMED_STATISTICS可以查看CPU占用,如果没有设置TIMED_STATISTICS,查看QUERY和CURRNET。

With the exception of locking problems and inefficient PL/SQL loops, neither the CPU time nor the elapsed time is necessary to find problem statements. The key is the number of block visits, both query (that is, subject to read consistency) and current (that is, not subject to read consistency). Segment headers and blocks that are going to be updated are acquired in current mode, but all query and sub-query processing requests the data in query mode. These are precisely the same measures as the instance statistics CONSISTENT GETS and DB BLOCK GETS. You can find high disk activity in the disk column.

当产生锁异常或者低效率的PL/SQL,不管是CPU time还是elapsed time都不能发现有问题的SQL。这时的关键是访问的数据块的数目,即:QUERY和CURRENT两项中对应的数值。

The following listing shows TKPROF output for one SQL statement as it appears in the output file:

SELECT *

FROM emp, dept

WHERE emp.deptno = dept.deptno;

call   count   cpu       elapsed  disk      query     current rows

----   ------- ------- ------- -------- -------- ------- ------

Parse   11     0.08      0.18     0          0        0         0

Execute 11     0.23      0.66     0          3        6         0

Fetch   35     6.70      6.83     100        12326    2         824

------------------------------------------------------------------

total  57      7.01      7.67     100        12329    8         826

Misses in library cache during parse: 0

If it is acceptable to have 7.01 CPU seconds and to retrieve 824 rows, then you need not look any further at this trace output. In fact, a major use of TKPROF reports in a tuning exercise is to eliminate processes from the detailed tuning phase.

You can also see that 10 unnecessary parse call were made (because there were 11 parse calls for this one statement) and that array fetch operations were performed. You know this because more rows were fetched than there were fetches performed. A large gap between CPU and elapsed timings indicates Physical I/Os (PIOs).

可以发现有10个无用的解析被执行(因为对于同一个SQL解析了11次),并且执行的是“批取回”,因为取回的行数比结果集中实际存在行数多。如果CPU与elapsed相关比较大,则认为有物理I/O,即:(PIOs)。

五、保存 SQL_TRACE 统计结果

When you run TKPROF, use the INSERT parameter to specify the name of the generated SQL script. If you omit this parameter, then TKPROF does not generate a script.

如果使用INSERT参数,可以生成相应的CREATE/INSERT语句。

默认条件下不会生成。

下面表中内容是对于TKPROF_TABLE中部分字段的解释:

column description
SQL_STATEMENT

This is the SQL statement for which the SQL Trace facility collected the row of statistics. Because this column has data-type LONG, you cannot use it in expressions or WHERE clause conditions.

这里保存的是SQL语句,注意保存类型为LONG,所以不能使用WHERE条件对其做为查询条件

DATE_OF_INSERT

This is the date and time when the row was inserted into the table. This value is not exactly the same as the time the statistics were collected by the SQL Trace facility.

这个值是反映插入这个表的时间,并不是执行TKPROF的时间。

DEPTH

This indicates the level of recursion at which the SQL statement was issued. For example, a value of 0 indicates that a user issued the statement. A value of 1 indicates that Oracle generated the statement as a recursive call to process a statement with a value of 0 (a statement issued by a user). A value of n indicates that Oracle generated the statement as a recursive call to process a statement with a value of n-1.

指嵌套查询的嵌套深度。

USER_ID

This identifies the user issuing the statement. This value also appears in the formatted output file.

用户唯一的标识,同时也会出现在格式输出文件的文件名上

CURSOR_NUM

Oracle uses this column value to keep track of the cursor to which each SQL statement was assigned.

ORACLE用来记录游标数,这些游标是为每个SQL分配的

六、TKPROF的缺陷

避免参数(Argument)缺陷

If you are not aware of the values being bound at run time, then it is possible to fall

into the argument trap. EXPLAIN PLAN cannot determine the type of a bind variable from the text of SQL statements, and it always assumes that the type is varchar. If the bind variable is actually a number or a date, then TKPROF can cause implicit data conversions, which can cause inefficient plans to be executed. To avoid this, experiment with different data types in the query.

如果在运行时绑定变量不小心,就有可能遇到这个缺陷。EXPLAIN PLAN从SQL语句中不能判断绑定变量的类型,并且总是将其认为是varchar类型的。如果SQL语句中实际绑定的变量是数字或者日期型的,则TKPROF将会做隐式的数据转换,这样会导致低效的计划被执行。为了避免这个不足,可以绑定不同的数据类型的变量做测试。

避免读一致性(Read Consistency)缺陷

The next example illustrates the read consistency trap. Without knowing that an uncommitted transaction had made a series of updates to the NAME column, it is very difficult to see why so many block visits would be incurred.

如果不知道一个未提交的事务,其产生一系列的更新在NAME列上,就非常难于明白为什么有如此多的数据块被访问。

Cases like this are not normally repeatable: if the process were run again, it is unlikely that another transaction would interact with it in the same way.

像这样的一些例子不经常出现,如果将上述过程再运行一次,未必另一个事务会出现同样的结果。

SELECT name_id

FROM cq_names

WHERE name = ’FLOOR’;

call count cpu elapsed disk query current rows

---- ----- --- ------- ---- ----- ------- ----

Parse   1 0.10 0.18 0 0   0 0

Execute 1 0.00 0.00 0 0   0 0

Fetch   1 0.11 0.21 2 101 0 1

Misses in library cache during parse: 1

Parsing user id: 01 (USER1)

Rows Execution Plan

---- --------- ----

0 SELECT STATEMENT

1 TABLE ACCESS (BY ROWID) OF ’CQ_NAMES’

2 INDEX (RANGE SCAN) OF ’CQ_NAMES_NAME’ (NON_UNIQUE)

避免读计划(schema)缺陷

This example shows an extreme (and thus easily detected) example of the schema trap. At first, it is difficult to see why such an apparently straightforward indexed query needs to look at so many database blocks, or why it should access any blocks at all in current mode.

SELECT name_id

FROM cq_names

WHERE name = ’FLOOR’;

call count cpu elapsed disk query current rows

-------- ------- -------- --------- ------- ------ ------- ----

Parse   1 0.06 0.10 0  0  0 0

Execute 1 0.02 0.02 0  0  0 0

Fetch   1 0.23 0.30 3131 3 1

Misses in library cache during parse: 0

Parsing user id: 02 (USER2)

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT

2340 TABLE ACCESS (BY ROWID) OF ’CQ_NAMES’

0 INDEX (RANGE SCAN) OF ’CQ_NAMES_NAME’ (NON-UNIQUE)

Two statistics suggest that the query might have been executed with a full table scan. These statistics are the current mode block visits, plus the number of rows originating from the Table Access row source in the execution plan. The explanation is that the required index was built after the trace file had been produced, but before TKPROF had been run.

Generating a new trace file gives the following data:

SELECT name_id

FROM cq_names

WHERE name = ’FLOOR’;

call count cpu elapsed disk query current rows

----- ------ ------ -------- ----- ------ ------- -----

Parse   1 0.01 0.02 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch   1 0.00 0.00 0 2 0 1

Misses in library cache during parse: 0

Parsing user id: 02 (USER2)

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT

1 TABLE ACCESS (BY ROWID) OF ’CQ_NAMES’

2 INDEX (RANGE SCAN) OF ’CQ_NAMES_NAME’ (NON-UNIQUE)

One of the marked features of this correct version is that the parse call took 10 milliseconds of CPU time and 20 milliseconds of elapsed time, but the query apparently took no time at all to execute and perform the fetch. These anomalies arise because the clock tick of 10 milliseconds is too long relative to the time taken to execute and fetch the data. In such cases, it is important to get lots of executions of the statements, so that you have statistically valid numbers.

由于在执行TKPORF之前没有建立索引,一但建立索引,执行TKPROF会是另外一种结果。

避免读时间(time)缺陷

Sometimes, as in the following example, you might wonder why a particular query has taken so long.

UPDATE cq_names SET ATTRIBUTES = lower(ATTRIBUTES)

WHERE ATTRIBUTES = :att

call count cpu elapsed disk query current rows

-------- ------- -------- --------- -------- -------- ------- ----------

Parse   1 0.06  0.24 0  0   0  0

Execute 1 0.62 19.62 22 526 12 7

Fetch   0 0.00  0.00 0  0   0  0

Misses in library cache during parse: 1

Parsing user id: 02 (USER2)

Rows Execution Plan

------- ---------------------------------------------------

0 UPDATE STATEMENT

2519 TABLE ACCESS (FULL) OF ’CQ_NAMES’

Again, the answer is interference from another transaction. In this case, another transaction held a shared lock on the table cq_names for several seconds before and after the update was issued. It takes a fair amount of experience to diagnose that interference effects are occurring. On the one hand, comparative data is essential when the interference is contributing only a short delay (or a small increase in block visits in the previous example). On the other hand, if the interference is contributing only a modest overhead, and the statement is essentially

efficient, then its statistics might not need to be analyzed.

还是要从事务角度考虑,当执行这个UPDATE之前几秒,可能有另外一个事务在同一表上分配了共享锁。处理这样的冲突事件需要有权衡的能力。一方面,当这个冲突在瞬间发生时访问到一定量的数据(或者访问有小量的增长)。另一方面,如果这一事件不是很过份,并且这个SQL语句本质上是有效率的,那么这个统计信息可能被忽略。

避免触发器(trigger)缺陷

The resources reported for a statement include those for all of the SQL issued while the statement was being processed. Therefore, they include any resources used within a trigger, along with the resources used by any other recursive SQL (such as that used in space allocation). With the SQL Trace facility enabled, TKPROF reports these resources twice. Avoid trying to tune the DML statement if the resource is actually being consumed at a lower level of recursion.

SQL语句的资源报告包括它们被执行时全部的SQL声明。因此上述资源包括在触发器中调用的任何资源,连同被其它嵌套占用的资源(例如调用空间分配的SQL)。在执行SQL_TRACE后,TKPROF会报告这些资源两次被使用。最好不要调整这些DML语句,如果这些资源确实被以低度嵌套SQL所消耗。

If a DML statement appears to be consuming far more resources than you would expect, then check the tables involved in the statement for triggers and constraints that could be greatly increasing the resource usage.

如果一个DML语句消耗的资料比预期的大,则检查与这些语句相关的触发器和约束,他们也许会极大地增加资源的占用。