
DSI301: Advanced Server Support Skills

The DSI3 series is quite old; it consists of the following courses:
DSI301: Advanced Server Support Skills
DSI302: Data Management
DSI303: Database Backup and Recovery
DSI304: Query Management
DSI305: Database Tuning
DSI306: Very Large Databases
DSI307: Distribution and Replication
DSI308: Parallel Server

Below are my own study notes; I will organize them further when I have time.

kt --->>> kernel transaction
kd --->>> kernel data
kc --->>> kernel cache
ks --->>> kernel services

Setting events:
1> The event initialization parameter
2> The alter session set events or alter system set events commands
3> The sys.dbms_system.set_ev procedure
4> The oradebug utility

Oracle internal error numbers are stored in the following file:
$ORACLE_HOME/rdbms/mesg/oraus.msg

When setting events, the following three values are supported:
1> crash --->>> causes an Oracle crash, for testing recovery
2> debugger --->>> invokes a system debugger, if any
3> trace --->>> context-specific or named context-independent traces

Setting multiple events:
1> In the initialization file, write them on separate lines; the lines must be consecutive:
event = "10015 trace name context forever"
event = "10046 trace name context forever, level 4"

If they are split up as below, only the second event takes effect:
event = "10015 trace name context forever"
db_name = ORACLE
db_files = 20
event = "10046 trace name context forever, level 4"
2> Use the colon (:) separator:
event = "10015 trace name context forever: 10046 trace name context forever, level 4"

Setting Events from SQL
By default dbms_system is accessible only to the sys user; execute privilege can be granted to other users. The dbms_system package is created by dbmsutil.sql.
sys.dbms_system.set_ev(sid, serial#, event, level, 'action');
Example:
execute sys.dbms_system.set_ev(8, 219, 10046, 12, '');

Setting Events from ORADEBUG
SQL> oradebug help
HELP           [command]                 Describe one or all commands
SETMYPID                                 Debug current process
SETOSPID       <ospid>                   Set OS pid of process to debug
SETORAPID      <orapid> ['force']        Set Oracle pid of process to debug
SETORAPNAME    <orapname>                Set Oracle process name to debug
SHORT_STACK                              Get abridged OS stack
CURRENT_SQL                              Get current SQL
DUMP           <dump_name> <lvl> [addr]  Invoke named dump
PDUMP          [interval=<interval>] [ndumps=<count>] <dump_name> <lvl> [addr]  Invoke named dump periodically
DUMPSGA        [bytes]                   Dump fixed SGA
DUMPLIST                                 Print a list of available dumps
EVENT          <text>                    Set trace event in process
SESSION_EVENT  <text>                    Set trace event in session
DUMPVAR        <p|s|uga> <name> [level]  Print/dump a fixed PGA/SGA/UGA variable
DUMPTYPE       <address> <type> <count>  Print/dump an address with type info
SETVAR         <p|s|uga> <name> <value>  Modify a fixed PGA/SGA/UGA variable
PEEK           <addr> <len> [level]      Print/Dump memory
POKE           <addr> <len> <value>      Modify memory
WAKEUP         <orapid>                  Wake up Oracle process
SUSPEND                                  Suspend execution
RESUME                                   Resume execution
FLUSH                                    Flush pending writes to trace file
CLOSE_TRACE                              Close trace file
TRACEFILE_NAME                           Get name of trace file
SETTRACEFILEID <identifier name>         Set tracefile identifier
LKDEBUG                                  Invoke global enqueue service debugger
NSDBX                                    Invoke CGS name-service debugger
-G             <Inst-List | def | all>   Parallel oradebug command prefix
-R             <Inst-List | def | all>   Parallel oradebug prefix (return output
SETINST        <instance# .. | all>      Set instance list in double quotes
SGATOFILE      <SGA dump dir>            Dump SGA to file; dirname in double quotes
DMPCOWSGA      <SGA dump dir>            Dump & map SGA as COW; dirname in double quotes
MAPCOWSGA      <SGA dump dir>            Map SGA as COW; dirname in double quotes
HANGANALYZE    [level] [syslevel]        Analyze system hang
FFBEGIN                                  Flash Freeze the Instance
FFDEREGISTER                             FF deregister instance from cluster
FFTERMINST                               Call exit and terminate instance
FFRESUMEINST                             Resume the flash frozen instance
FFSTATUS                                 Flash freeze status of instance
SKDSTTPCS      <ifname> <ofname>         Helps translate PCs to names
WATCH          <address> <len> <self|exist|all|target>  Watch a region of memory
DELETE         <local|global|target> watchpoint <id>    Delete a watchpoint
SHOW           <local|global|target> watchpoints        Show watchpoints
DIRECT_ACCESS  <set/enable/disable command | select query>  Fixed table access
IPC                                      Dump ipc information
UNLIMIT                                  Unlimit the size of the trace file
CALL           [-t count] <func> [arg1]...[argn]  Invoke function with arguments
CORE                                     Dump core without crashing process
PROCSTAT                                 Dump process statistics

Four Event Categories
1. Dump diagnostic information on request
2. Dump diagnostic information when an error occurs
3. Change database behavior
4. Produce trace diagnostics as the database runs

Category 1: Immediate Dumps
Common immediate dumps are:
1. File header (controlf, redohdr, file_hdrs)
2. System state (systemstate)
3. Process state (processstate)

oradebug dump <dump> <level>
dbms_system.set_ev(sid, serial#, 65535, <level>, '<dump>')

Example:
alter session set events 'immediate trace name controlf level 10';
oradebug setmypid
oradebug dump controlf 10
exec dbms_system.set_ev(7,10,65535,10,'controlf');

Category 2: On-Error Dumps
alter session set events '<error> trace name errorstack level <level>';
oradebug event <error> trace name errorstack level <level>

The error stack dump levels are as follows:
<level>  Description
  0      Error stack only
  1      Error stack and function call stack (if implemented)
  2      As 1 plus the process state
  3      As 2 plus the context area (all cursors, with the current cursor highlighted)

Example:
alter session set events '60 trace name errorstack level 1';
oradebug setospid 4018
oradebug unlimit
oradebug dump errorstack 1

Category 3: Change Behavior
With this event category, you can:
1> Change the Oracle server's behavior
2> Enable hidden features
3> Work around problems
4> Perform specialized tuning

Example:
event = "10269 trace name context forever, level 10"
----Setting this event prevents SMON from coalescing free space (7.3 and above).

Category 4: Trace Events
Trace events produce a diagnostic trace as processes are running. Use trace events to gather additional information so that a problem can be understood and resolved.

Examples:
To dump the rollback segment recovery information during database startup, you can set event code 10015 (undo segment recovery) in your parameter file:
event = "10015 trace name context forever, level 10"

Event 10046 is equivalent to sql_trace=true:
alter session set events '10046 trace name context forever, level 12';
10046 event levels are:
<level>   Description
  1       Enable standard sql_trace functionality (default)
  4       As level 1 plus trace bind values
  8       As level 1 plus waits (especially useful for spotting latch waits, but can also be used to spot full table scans and index scans)
  12      As level 1 plus both trace bind values and waits
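Level 8 and 12 traces emit WAIT lines that can be aggregated per event. The sketch below is a hedged illustration: the exact trace line layout varies by version, and the sample lines and function name are invented here, not taken from a real trace.

```python
import re
from collections import defaultdict

# Assumed WAIT line shape (version dependent):
#   WAIT #<cursor>: nam='<event name>' ela= <elapsed> ...
WAIT_RE = re.compile(r"WAIT #\d+: nam='([^']+)' ela=\s*(\d+)")

def summarize_waits(trace_lines):
    """Sum the ela= values per wait event name over 10046 trace lines."""
    totals = defaultdict(int)
    for line in trace_lines:
        m = WAIT_RE.search(line)
        if m:
            totals[m.group(1)] += int(m.group(2))
    return dict(totals)

# Invented sample lines, for illustration only:
sample = [
    "WAIT #1: nam='db file sequential read' ela= 250 file#=5 block#=100",
    "WAIT #1: nam='db file sequential read' ela= 410 file#=5 block#=101",
    "WAIT #2: nam='latch free' ela= 90",
]
print(summarize_waits(sample))
```

Summing per event name like this is a quick way to see where a traced session spent its wait time before reaching for a full profiler.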

Event Groups
There are two groups of events in an instance: process events and session events. Process events are initialized at process startup time with the initialization parameter 'event'. Session events are modified dynamically with an alter session or alter system command. When checking for posted events, the Oracle server first checks the session events, and then the process events.
Note: None of the debug facilities are available until after the process has actually initialized its fixed PGA part, which happens at connect time.

In a hang situation you see 0% CPU usage and no activity; in a loop situation you typically see 100% CPU usage and a large amount of activity.

Diagnostic Files:
The alert.log file
Trace files
Application log files, such as SQL*Plus trace files
Core dump files
System log files

Process State Dumps
Component    Definition
SO           State object address
type         Index entry for object type, printed on the next line (process, session, and so on)
owner        State object address that owns this state object
flag         INIT: object is initialized; FLST: on freelist; CLN: object freed by PMON
DID          Resource ID (Oracle8 only)

The Potential for Corruption
Memory corruption, known as heap corruption in the RDBMS, occurs when the wrong data is written to memory. It is most commonly caused by a rogue process accidentally updating the wrong memory location. Heap corruption can happen in private or shared memory. Shared memory corruption can, and usually does, impact more than one process.

Heap Corruption Diagnostics
To identify the corrupter, you can use three methods to force failures at the point of corruption:
1> The _db_block_cache_protect parameter
2> Event 10235
3> Event 10049

_db_block_cache_protect:
_db_block_cache_protect=true protects the buffer cache from accidental writes:
1> Each buffer is made read-only on startup.
2> Each buffer must be explicitly set to read-write before an update, and then returned to read-only afterwards.

Diagnostics: Event 10235
1> "Check memory manager internal structures"
2> Enables additional checking and processing in the heap manager
3> Enables memory protection on some platforms (higher levels)
4> Detects heap corruption more quickly, and therefore closer to when the corruption occurred

event = "10235 trace name context forever, level <level>"
<level>  Description
   1     Fast check on heap free (kghfrh)
   2     Do 1 and fill memory with junk on alloc/free
   3     Do 2 and ensure the chunk belongs to the given heap on free
   4     Do 3 and make permanent chunks freeable so they can also be checked
Higher levels are bitwise triggers:
8xN      Do 4 and check the given heap on every operation
16xN     Also check top PGA (recursively) and top SGA (non-recursively) on each operation

Diagnostics: Event 10049
1> Protect library cache memory heaps
2> You can use this event to enable library cache object protection

event = "10049 trace name context forever, level <level>"

Memory Dumps
1> Gather more evidence by dumping memory.
2> Examples:
---heapdump: Dump PGA, SGA, UGA
---heapdump_addr: Dump a subheap
---row_cache: Dump the dictionary cache
---buffers: Dump the buffer cache
---library_cache: Dump the library cache

Heap Dumps
There are three ways to dump heaps:
1> Using the alter session command
2> Using the oradebug utility
3> Using the event initialization parameter

Memory Dumps
A heap dump consists of three parts:
1> Dump of the heap descriptor
2> Dump of the chunks within each extent on the heap's extent list
3> Dump of free lists, LRU lists, permanent chunk lists, and marked chunk lists

Oracle8 Restricted ROWID Format
BBBBBBBB.RRRR.FFFF
Block number.Row number.File number

Oracle8 ROWID Format
OOOOOOFFFBBBBBBSSS
O --->>> Data object number
F --->>> Relative file number
B --->>> Block number
S --->>> Slot (row) number
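To make the OOOOOOFFFBBBBBBSSS layout concrete, the sketch below packs the four components into an 18-character string with a base-64 alphabet. The alphabet and the 6-bits-per-character widths are assumptions chosen for illustration, not a guaranteed match for Oracle's internal encoding.

```python
# Assumed base-64 alphabet; widths 6/3/6/3 characters mirror OOOOOOFFFBBBBBBSSS.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")

def _enc(value, width):
    """Encode `value` as `width` characters, 6 bits per character."""
    chars = []
    for _ in range(width):
        chars.append(ALPHABET[value & 0x3F])
        value >>= 6
    return "".join(reversed(chars))

def encode_rowid(data_obj, rfile, block, slot):
    """Pack object#, relative file#, block#, slot# into an 18-char string."""
    return _enc(data_obj, 6) + _enc(rfile, 3) + _enc(block, 6) + _enc(slot, 3)

def decode_rowid(rowid):
    """Unpack the four components back out of the 18-char string."""
    def _dec(s):
        n = 0
        for c in s:
            n = (n << 6) | ALPHABET.index(c)
        return n
    return (_dec(rowid[0:6]), _dec(rowid[6:9]),
            _dec(rowid[9:15]), _dec(rowid[15:18]))

print(encode_rowid(1234, 5, 678, 9))
```

The round trip (encode, then decode) recovers the original components, which is the property the later "insert around the corrupt block" examples rely on.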

Oracle Block Dumps
You can use two methods to dump Oracle blocks:
1> Raw dumps performed with operating system utilities
2> Formatted dumps performed through the RDBMS

OS Dumps for Oracle7 and Oracle8
On UNIX platforms, the dd utility is used to dump blocks. Do not place spaces around the equal sign (=) when using dd.

dd bs=db_block_size if=dbfile.dbf skip=(block-1) count=3|od -xv >file.out

UNIX dd utility:
bs: Block size
if: Input file
skip: Number of blocks to skip
count: Number of blocks to dump
od: UNIX octal dump utility
-x: Option to the od command to convert output to hex
-v: Show all input data (verbose). Without the -v option, all groups of output lines that would be identical to the preceding line are replaced with a line containing only an asterisk (*).

Example: suppose the alert log reports a corrupt block at file 1, block 100, and the database block size is 8K:
dd if=system01.dbf bs=8192 skip=99 count=3 | od -xv > file2.out
Note: Remember that dd reads from disk. If the database is up, force a checkpoint to flush dirty buffers to disk.
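The skip arithmetic above (dd counts blocks from 0, while the alert log numbers them from 1) can be wrapped in a small helper; the helper name is made up here.

```python
def dd_args(block, db_block_size, count=3):
    """Return bs/skip/count for dumping `count` Oracle blocks with dd.

    dd numbers blocks from 0 while the alert log numbers them from 1,
    hence skip = block - 1.
    """
    return {
        "bs": db_block_size,
        "skip": block - 1,
        "count": count,
        "byte_offset": (block - 1) * db_block_size,  # where the dump starts
    }

# Block 100 of an 8K-block data file:
print(dd_args(100, 8192))
```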

Oracle 8 Data Block Layout

Oracle8 Data Block Layout
Typ: Block type (see one of the following pages for accepted values)
Fmt: Block format; in Oracle8 this is 0x02, converted on the fly
Filler: Not currently used
RDBA: Relative database address of the block
SCNBase: SCN base
SCNWrap: SCN wrap
Seq: Sequence number; incremented for every change made to the block at the same SCN
Flg: Flag (defined in kcbh.h)
#define KCBHFNEW 0x01
#define KCBHFDLC 0x02
#define KCBHFCKV 0x04
#define KCBHFTMP 0x08
ChkVal: Optional check value for the block
Tail: Consistency information to verify that the beginning and the end of the block are of the same version (lower order 2 bytes of SCNBase, plus block type, plus SCN Seq number)
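Since the Flg byte is a bitmask of the kcbh.h values just listed, a small decoder makes that concrete (the function name is made up for this sketch).

```python
# Bit values copied from the kcbh.h defines above.
KCBH_FLAGS = {
    0x01: "KCBHFNEW",
    0x02: "KCBHFDLC",
    0x04: "KCBHFCKV",
    0x08: "KCBHFTMP",
}

def decode_flg(flg):
    """Return the names of the kcbh.h flag bits set in the Flg byte."""
    return [name for bit, name in sorted(KCBH_FLAGS.items()) if flg & bit]

print(decode_flg(0x05))
```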

Note:If there are 254 changes at the same SCN,then it is necessary to force a new SCN allocation on the  next change of the block.

Oracle 7 Data Block Layout

Oracle7 Data Block Layout
The bytes in the header are defined as follows:
Typ: Block type (see one of the following pages for accepted values)
Ver: Block version; in Oracle7 this is 0x01
Filler: Not used at present
DBA: Database address of the block
Seq: Sequence number of the block (a change version for the current incarnation)
ChkSum: A simple checksum on the contents of the block (It uses the routine smschk() and is only present in data blocks if DB_BLOCK_CHECKSUM is set. However, it is always present in file header blocks.)

Type: Block Type (defined in kcb.h)
ID  Type
1   Undo segment header
2   Undo data block
3   Save undo header
4   Save undo data block
5   Data segment header (temp, index, data, and so on)
6   KTB-managed data block (with ITL)
7   Temp table data block (no ITL)
8   Sort key
9   Sort run
10  Segment free list block
11  Data file header
Note: These block types are valid for both Oracle7 and Oracle8 blocks.

Oracle Formatted Block Dumps
1> You can dump the block from the RDBMS.
2> A trace file containing the dump is generated in the user dump destination.
Oracle8: alter system dump datafile 5 block 13;
Oracle7: alter session set events 'immediate trace name blockdump level 67110390';

Oracle8 Formatted Block Dump
In an Oracle block dump you should see a block header:
Rdba: Relative DBA of the block
Scn: SCN number
Seq: Sequence number; ranges from 0 to (UB1MAXVAL-1), with UB1MAXVAL marking the block soft corrupt
Flg: Flag (defined in kcbh.h)
Tail: Consistency data used to verify that the beginning and the end of the block are of the same version (consists of the lower order two bytes of SCNBase, plus block type, plus SCN Seq number)
Fmt: Block format; in Oracle8 this is 2
Chkval: Optional check value for the block if DB_BLOCK_CHECKSUM=TRUE
Type: Block type (defined in kcb.h)

Rdba: Relative DBA of the block
Seg/Obj: Segment/Object ID
Csc: SCN at last block cleanout
Itc: Number of ITL slots
Flg: 0=on the freelist
Typ: 1=DATA; 2=INDEX
Fsl: ITL TX freelist slot
Fnx: DBA of the next block on the freelist
ITL: Interested transaction list index (number of ITLs determined by initrans and maxtrans)
Xid: Transaction ID (UndoSeg.Slot.Wrap)
Uba: Undo address (UndoDBA.SeqNo.RecordNo)
Flg: C=committed; U=commit upper bound; T=active at CSC; B=rollback of this UBA gives the before image of the ITL
Lck: Number of rows affected by this transaction
Scn/Fsc: Scn=SCN of committed TX; Fsc=free space credit (bytes)

Tsiz: Total data area size
Hsiz: Data header size
Pbl: Pointer to the buffer holding the block
Bdba: Block relative data block address (RDBA)
Flag: N=pctfree hit (cluster); F=do not put on free list; K=flushable cluster keys
Ntab: Number of tables (>1 in clusters)
Nrow: Number of rows
Frre: First free row index entry; -1 = you have to add one
Fsbo: Free space begin offset
Fseo: Free space end offset
Avsp: Available space in the block
Tosp: Total available space when all TXs commit
Nrow: Number of rows for the first table

Oracle8 Formatted Block Dump: Data Layer
Tab: Table 0, row 8, offset
Cc: Number of columns in this row piece
Lb: Lock byte; the ITL entry that has this row locked
Fb: Flag byte. H=head of row piece; K=cluster key; C=cluster table member; D=deleted row; F=first data piece; L=last data piece; P=first column continues from previous piece; N=last column continues in next piece
Tl: Row size (number of bytes plus data)
Col: Column data

7. Block Corruption Diagnostics and Recovery
What is Block Corruption?
Whenever the RDBMS reads or writes a block to or from disk, a consistency check is performed:
1> Block version
2> The DBA value in the cache is compared to the DBA value in the block buffer
3> Block checksum, if enabled

Which checks are performed?
In Oracle7:
1> The incarnation and sequence number in the header
2> The IncSeq in the footer
3> The block type

In Oracle8:
1> The SCNBase and Seq in the header
2> The SCNBase and Seq in the footer
3> The block type

Which checks are performed? The block type is stored in the first word of the data block.

Block Corruption Types
Media corrupt:
Oracle7: INC=0
Oracle8: SCN=0
Soft corrupt:
Oracle7: Seq=0
Oracle8: SCN=XXX, Seq=UB1MAXVAL
where XXX means that only a change at a higher SCN, which reconstructs the entire block, can be applied to the block.

Media corrupt blocks:The information in the block does not make any sense after the read from disk. Soft corrupt blocks:The RDBMS marked the block corrupt after detecting a corruption.

In Oracle8, the Seq value in a block has a maximum value of UB1MAXVAL-1. Therefore, the value UB1MAXVAL can be used to mark a block as "soft corrupt".

How to Handle Corruptions
1> Check the alert file and system log file
2> Use available diagnostic tools to find out the type of corruption
3> Perform a block dump
4> Use patch utilities to see what is wrong with the block
5> Determine if the error persists; run checks multiple times
6> Recover data from the corrupted object if necessary

How to Handle Corruptions
Always try to find out if the error is permanent. Run the analyze command multiple times or, if possible, perform a shutdown and a startup and try again to perform the operation that failed earlier.
Find out if there are more corruptions. If you encountered one, there may be other corrupted blocks as well. Use tools like DBVERIFY for this.
Before you try to salvage the data, perform a block dump as evidence to identify the actual cause of the corruption.
1> Make a hex dump of the bad block, using UNIX dd and od -x.
2> Consider performing a redo log dump to check all the changes that were made to the block, so that you can discover when the corruption occurred.
Note: Remember that when you have a block corruption, performing media recovery is the recommended process after the hardware is verified.

How to Handle Corruptions
Resolve any hardware issues:
1> Memory boards
2> Disk controllers
3> Disks

Recover or restore data from the corrupt object if necessary.

There is no point in continuing to work if there are hardware failures. When you encounter hardware problems, the vendor should be contacted and the machine should be checked and fixed before continuing. Full hardware diagnostics should be run.
Many types of hardware failures are possible:
1> Bad I/O hardware or firmware
2> Operating system I/O or caching problems
3> Memory or paging problems
4> Disk repair utilities

DBVERIFY Utility
1> Data file verification utility that comes with Oracle7.3
2> Only works on data files; redo log files cannot be checked
3> Checks block consistency
4> Opens data files read-only; can be run while the database is up and running
5> The page number is the block number within the data file
6> DBVERIFY uses the standard kcb routines to check block integrity
7> If head and tail do not match, DBVERIFY reads the block again. If they then match, an influx block is reported; otherwise a corruption.
Influx blocks are split blocks. If DBVERIFY does not report a corruption, it means that when it read the block the first time, DBW0 was writing a new version and it caught part of the old and part of the new version of the block; thus, head and tail did not match.
Note: DBVERIFY only checks for logical corruption; that is, it uses kcb.h routines to check header/footer information only. Therefore, it is possible for corruption to occur above the high-water mark.

The analyze command
1> Performs a logical block check
2> Does not mark blocks as soft corrupt; only reports them
3> Validates index and table entries

analyze table xxx validate structure cascade;
analyze index xxx validate structure;
analyze cluster xxx validate structure;
The cascade option validates an object, including all related objects.

analyze table xxx partition(p1) validate structure into invalid_rows;
---For a partitioned table, the Oracle server also verifies that each row belongs to the correct partition. If a row does not collate correctly, its rowid is inserted into the invalid rows table.

Note: A simple select statement (select * from <table>) does a full table scan, which means that it reads all the data blocks up to the high-water mark of the table. You can use this to perform a quick check for corruptions in your current table data.

Diagnostic Events 1>Event 10231 "Skip corrupted blocks during a full table scan" 2>Event 10233 "Skip corrupted data or index blocks on index range scans"

Event specification syntax in the parameter file: event = "10231 trace name context forever, level 10" event = "10233 trace name context forever, level 10" Note:An index range scan is used when you query the database based on a range of values,or if your query uses a  nonunique index.

Event 10232: "Dump corrupted blocks in a trace file " Event 10210: "Force logical block checking"

Whenever a table data block is modified, an integrity check is done;bad blocks are marked as soft corrupt. Event 10211:As event 10210 for index blocks Event 10212:As event 10210 for cluster blocks

event = "xxx trace name context forever, level 10"
where xxx is 10210, 10211, or 10212.
Note: These events are expensive; make sure to turn them off when not needed. Events 10231 and 10233 only skip soft corrupt blocks.

db_block_checking 1>Introduced in Oracle8i 2>Can be set by using the alter session or alter system deferred command 3>Replaces events 10210,10211,10212 4>Default value is false

The db_block_checking initialization parameter invokes the same kdb checks as events 10210,10211,and 10212. The default setting is false and is provided for compatibility with earlier releases where block checking is  disabled as a default.

Because the parameter is dynamic,it provides more flexibility than events 10210,10211,and 10212,which it will  replace in future releases.Note that the setting of db_block_checking overrides any setting of events 10210,10211, and 10212.

Note:Oracle release 7.2 and later includes a special provision,the db_block_checksum initialization parameter,to add checksums to Oracle database blocks.This helps detect media corruptions when a block is next read by the Oracle server.

If db_block_checksum is set to true,DBWn and the direct loader calculate a checksum and store it in the cache header  of every data block when writing it to disk.When the block is subsequently read,the checksum is recomputed and the  stored value is checked with this computed value.
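Oracle's actual checksum algorithm is internal; the sketch below uses a simple XOR fold over 16-bit words purely to illustrate the store-on-write, verify-on-read idea described above.

```python
def xor_checksum(block: bytes) -> int:
    """Illustrative 16-bit XOR-fold checksum (not Oracle's algorithm)."""
    chk = 0
    for i in range(0, len(block) - 1, 2):
        chk ^= (block[i] << 8) | block[i + 1]
    return chk

def verify(block: bytes, stored: int) -> bool:
    """Recompute on read and compare with the stored value."""
    return xor_checksum(block) == stored

blk = bytes(range(16))       # stand-in for a data block
stored = xor_checksum(blk)   # the writer would store this in the cache header
print(verify(blk, stored))
print(verify(bytes([blk[0] ^ 0xFF]) + blk[1:], stored))  # one flipped byte
```

A single flipped byte changes the recomputed value, so the mismatch is caught on the next read, which is exactly the media-corruption detection the parameter provides.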

The Export Utility 1> A full export can be used to check the logical consistency of a database. 2> Export performs a full table scan on all the tables to retrieve the data. 3> Export does not detect all corruptions ---1. Does not detect disk corruptions above the high-water mark. ---2. Does not detect corruptions in indexes,or in free or temporary extents.

You do not have to create a dump file when you perform an export,if all you want is to check the consistency of  the data.On UNIX systems,use /dev/null as the file name;for OpenVMS use NULL;and for Windows NT use NUL.When you  do this,the export utility reads all the data but writes it to the null device,which is basically not writing at all. This is a good method for checking the logical consistency of the database and detecting corruption on a regular basis, both for physical corruption(in used blocks) and for logical corruption(data-dictionary issues).

Example: exp system/oracle file=/dev/null full=y

Export only reads:
1> User data below the high-water mark
2> Parts of the data dictionary, while looking up information concerning the objects being exported

Note:Export does not read all of the data dictionary,so there may be undetected corruptions in the SYSTEM tablespace.

Media Recovery The database must be in archivelog mode.Perform the following steps: 1>Resolve any hardware issues;for example,replace the disk. 2>Find out which data file contains the corrupted blocks. 3>Restore a backup of the data file. 4>Recover the data file.

Media Recovery
Find out which absolute data file number contains the corrupted block, and then query v$datafile or dba_data_files to retrieve the file name.
Make sure that any hardware problems are resolved before you restore the backup of the data file. If the disk is bad and you have enough room on another disk, it is always possible to put the backup on a good disk, rename the data file to the new location, and recover. Also, after the recovery you may want to check the recovered data file for corruptions. If you do not have a hot backup, you must go back to restoring a full cold backup.

Which Object is Corrupted?
1> Index: just drop and re-create the index.
2> Rollback segment: see one of the later lessons.
3> Table: the data in the corrupted block is lost.
---1. Drop the table and re-create it, and import data from an export dump.
---2. Set event 10231 to skip corrupt blocks on export, drop the table, and re-create it from the export dump.
---3. Use SQL or PL/SQL to pull data out of the table into a newly created table.

Which Object is Corrupted?
If you do not plan to restore data files and recover, use the following statement to determine which object has corrupted blocks. The absolute file number (for example, 5) and block number (for example, 2) can be found in the error message; for example:
ORA-01578: ORACLE data block corrupted (file #5, block #2)
In Oracle7, issue:
select segment_name, segment_type from dba_extents
where file_id = 5 and 2 between block_id and block_id + blocks - 1;
In Oracle8, issue:
select segment_name, segment_type, relative_fno from dba_extents
where file_id = 5 and 2 between block_id and block_id + blocks - 1;

Using SQL or PL/SQL 1> If the corrupted table has a unique index,you can use the index. 2> Do a range scan with the ROWID hint to select around the corrupted block. 3> You must use PL/SQL to retrieve data from tables containing a LONG column.

Using SQL
If you use SQL, you can create a new table using:
create table temp as select * from tab_name where 1=2;
Create the required ROWIDs with the dbms_rowid.rowid_create function:
function rowid_create (rowid_type IN number, object_number IN number, relative_fno IN number, block_number IN number, row_number IN number) return rowid;
pragma RESTRICT_REFERENCES(rowid_create, WNDS, RNDS, WNPS, RNPS);

Then insert the required rows into that table around the corruption: insert into temp select * from tab_name where rowid<DBMS_ROWID.ROWID_CREATE(1,2168,5,3,0); insert into temp select * from tab_name where rowid>DBMS_ROWID.ROWID_CREATE(1,2168,5,1,0);

Because the corrupt block is block 2, the last possible ROWID before the corrupt block contains block 1, and the first ROWID after the corrupt block contains block 3. Use row number 0, because you do not know how many rows are in the blocks.

To retrieve the object number, query dba_objects for nonpartitioned tables and tabpart$ for partitioned tables:
select object_id from dba_objects where object_name='TAB_NAME';
select obj# from tabpart$ where file#=5 and block#=2;

Chained Rows
If you have chained rows that have row pieces in the corrupt block, those rows are also lost. To find out which rows you must additionally skip, select ROWIDs from the table with a WHERE clause that skips the rows in the corrupt blocks. If an error occurs, look at the last returned ROWID, and you know that the next ROWID has a problem. Then also exclude this one and relaunch the altered query.

LONG Columns
If the table contains a LONG column, you have to write a small PL/SQL block that handles the LONG column. This has restrictions on the maximum size of the data, for example, 32KB (port-specific). If the LONG data is too large, writing a Pro*C or OCI program might be an alternative.
SQL*Plus can copy LONG data by specifying the longchunksize parameter and using the SQL*Plus COPY command. However, there is a limitation on the size of the LONG, and this limitation is OS dependent. Check the user's guide.

The dbms_repair Package
1> Introduced in Oracle8i
2> Works with corruption in the transaction and data layers
3> Only marks the block "software corrupt" in the initial release

The dbms_repair package can indicate but not repair block corruption, and can help to extract meaningful data even if the block has not yet been marked as corrupt.
1> After the block is marked as corrupt, the entire block must be skipped.
2> For more details on this package, refer to chapter 18 in the Oracle8i Administrator's Guide, the Oracle8i Supplied Packages Reference, and to <Note:68013.1>.
The following section covers setting up the dbms_repair tables and checking the objects for corruptions.
Example:
First create the repair tables and views with the following procedure:
execute dbms_repair.admin_tables(table_name => 'REPAIR_TABLE', table_type => dbms_repair.repair_table, action => dbms_repair.create_action);
Use the check_object procedure to check the specified object and populate the repair table:
dbms_repair.check_object(schema_name => 'SYSTEM', object_name => 'T1', repair_table_name => 'REPAIR_TABLE', corrupt_count => :count);

admin_tables: create the repair tables
check_object: populate the repair tables with information about corruptions and repair directives
fix_corrupt_blocks: fix the corrupt blocks in specified objects based on information in the repair table (ORA-01578)
dump_orphan_keys: report on index entries that point to rows in corrupt data blocks (optional)
skip_corrupt_blocks: skip corrupt blocks during index and table scans
Example:
print count
count: 1

select object_name, block_id, corrupt_type, marked_corrupt, corrupt_description, repair_description from repair_table;
Now query the corrupt object and extract as much information as possible. Dump the corrupted block and use tools such as BBED or ORAPATCH (see the following section) to mine data from the hex dumps.

Patching with ORAPATCH
ORAPATCH acts like a pure disk or file editor. "ORAPATCH help" lists all commands. You can view and modify blocks in hexadecimal for all files and block types supported by the RDBMS. It is possible to patch everything, but it is very time-consuming to rebuild an entire block. ORAPATCH is an internal support tool and is not provided to customers.
Note: ORAPATCH works on 512-byte blocks. If your alert.log file reported that block 10 was corrupted, then multiply 10 by 8 if the Oracle block size is 4KB (4096 divided by 512 = 8), and then add 1 for the file header block. As a result, specify block 81.
Example:
orapatch open systemTEAMA01.dbf write
patch> set hex
patch> display 81
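The note's arithmetic can be captured in a one-line helper (the function name is made up for this sketch):

```python
def orapatch_block(oracle_block, db_block_size, sector_size=512):
    """512-byte block index for ORAPATCH, following the rule above:
    multiply by (Oracle block size / 512), then add 1 for the file header."""
    sectors_per_block = db_block_size // sector_size  # 4096 // 512 = 8
    return oracle_block * sectors_per_block + 1

print(orapatch_block(10, 4096))  # the example above: block 81
```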

Patching with BBED
1> BBED stands for block browser/editor.
2> It is a tool to browse and edit disk data structures while the database is open.
3> It supports physical (byte-level) and symbolic (structure- and field-level) interfaces for displaying and editing.
BBED is shipped with Oracle8 releases and with some Oracle7.3 releases. It includes all the features of ORAPATCH, and also displays separate data structures within blocks. This can be very helpful in determining what exactly is wrong in the block. Another advantage that BBED has over ORAPATCH is the fact that BBED saves the "before image", so you can always undo changes.
On UNIX, you need to relink BBED with the following command:
make -f ins_rdbms.mk $ORACLE_HOME/rdbms/lib/bbed
Note: The link command does not work on all platforms for release 8.0. You must edit the make file to create an executable.
On Windows NT, a BBED.exe is shipped as an executable and is protected with the password "blockedit".

BBED parameters:
DATAFILE: File to browse or edit
BLOCKSIZE: Block size (in bytes)
MODE: Browse or edit
REVERT: To revert changes made in a previous session (y[es]/n[o])
SILENT: y[es]/n[o]
SPOOL: y[es]/n[o]
LISTFILE: List of files; the format is <absolute file number> <name> [size in bytes]
CMDFILE: Command file name
BIFILE: Before-image file; default is bifile.bbd
LOGFILE: User log file; default is log.bbd
PARFILE: Parameter file

Patching with BBED or ORAPATCH
1> ORAPATCH and BBED are tools that can potentially corrupt the database.
2> Never use a customer system as a test system.
3> Make sure to address customer expectations before you make an attempt.
Note: Patching utilities like ORAPATCH and BBED are useful when logical corruption has occurred; that is, when the header and footer do not agree. They are not useful when physical corruption has occurred and data has been corrupted.

8. Rollback Segment Corruption Recovery
Rollback segments are central to the RDBMS, being used both for rollback (transaction) recovery and for building consistent read database snapshots. When they become corrupt, database availability is affected, and if the correct preparations are not made beforehand, it may be impossible to restore data consistency.

When customers call support, it is typically because they have failed to recover the database after a corruption. They may be under pressure to get the database working again and to prove that they are in control. It is critically important that, despite these distractions, you take the time to thoroughly understand the problem and, using a knowledge of how the RDBMS works and the tools available, propose and justify a strategy that will return the database to a consistent state. When total consistency cannot be achieved, you should be able to explain the areas of risk and suggest effective remedial action.

This lesson covers the internals of transaction management, database startup, and rollback segment acquisition, and how rollback segment corruption can interfere with them. This will help you understand the symptoms of rollback segment corruption, additional diagnostics, and how the RDBMS behavior can be changed by using certain undocumented parameters.

Transaction Internals
A transaction begins when a slot is allocated in the transaction table at the head of a rollback segment. This is the physical manifestation of a transaction. The transaction identifier (txid) is a pointer to this location.
Transaction identifiers have the following structure:
txid = usn.slot.wrap (undo segment number.transaction slot id.SCN wrap)
Before a transaction inserts, updates, or deletes a table row, an ITL is allocated in the block containing the row. The ITL is used to mark the row as locked until the transaction is either committed or rolled back. The ITL contains the transaction identifier.
When the change is applied to the block, undo information is also generated and is stored in the rollback segment. The transaction table slot contains a pointer to the undo.
Note: When you query v$lock, usn.slot shows up as id1 and wrap shows up as id2.
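A query such as the following sketch shows the usn.slot.wrap components of active transactions via v$transaction (columns xidusn, xidslot, xidsqn) alongside the matching TX lock in v$lock. The assumption that id1 packs the usn in the high-order 16 bits and the slot in the low-order 16 bits is mine, not from the course text:

```sql
-- Map active transactions (usn.slot.wrap) to their TX enqueues.
select t.xidusn  usn,
       t.xidslot slot,
       t.xidsqn  wrap,
       l.id1, l.id2
from   v$transaction t, v$lock l
where  l.type = 'TX'
and    l.id2  = t.xidsqn
and    trunc(l.id1/65536) = t.xidusn
and    mod(l.id1,65536)   = t.xidslot;
```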

Read Consistency
Suppose a different session wants to read the same block, and the first transaction has not yet been committed. The session, when reading the block (possibly still in cache), finds the open ITL. It checks the transaction status (by reading the rollback segment header) and finds that it is still active. This means that the session must roll back the changes made by the active transaction to create a view (snapshot) of the block before the first change was made. This is called making a consistent read (CR) copy of the block, and is achieved by first cloning the block and then rolling back the latest changes by applying undo records from the rollback segment.
Note that CR requires access to the data block, rollback segment header, and undo records. If any of these are corrupted, CR is affected.
Note: This is a simplified view of the CR mechanism. See the DSI302 course for a more detailed description.

Locking
Continuing the same example, suppose another session wants to change (update or delete) the row that was updated by the first transaction, and the first transaction has not yet been committed.
The new session, when reading the block (possibly still in cache), will find the open ITL. It checks the transaction status (by reading the rollback segment header) and finds that it is still active. This tells the session that progress cannot be made at this stage, so it waits for the transaction to complete (commit or roll back).
Note that it is necessary to read the rollback segment header to establish the status of the ITL. If the rollback segment header is corrupted, then locking is affected.

Commit and Delayed Block Cleanout
Continuing the same example, suppose the first transaction is now committed. The event is immediately recorded by marking the transaction table slot as inactive; however, the data block itself may not be updated until later. This means that the ITL in the block may remain open for some time after the commit.
When the RDBMS next reads the block, the transaction table is checked, the commit is confirmed, and the ITL is closed. This is known as delayed block cleanout. If the rollback segment is corrupted, then delayed block cleanout is affected.
In release 7.3, a new feature called delayed logging block cleanout (DLBC) changed the RDBMS behavior. This feature is enabled by setting delayed_logging_block_cleanouts to true, which is the default setting. With DLBC, ITL cleanout is partially completed at commit time for blocks that remain in the cache. References to the rollback segment header are no longer required to clean out those ITLs, so an I/O is saved. The ITL cleanout is completed when the ITL is reused or when a row covered by the ITL is next locked.
In release 8.1.3, the delayed_logging_block_cleanouts parameter was removed and its behavior is incorporated into the RDBMS.
Note: The above description is a simplification of the exact DLBC mechanism.

Transaction Recovery
Transaction recovery (the process of rolling back transactions) is performed:
1> By the shadow process when a rollback statement is issued
2> By PMON when a session (process) crashes with a transaction in progress
3> By SMON or a shadow process on opening a database that crashed with active transactions

Transaction Recovery on Rollback
When a rollback statement is issued, the RDBMS scans the undo records for the current transaction in reverse order (latest first) and applies them to the database. When the statement returns, all the block changes have been undone and the ITL has been cleared. There is no delay on rollback.
The rollback operation requires access to the rollback segment header, undo records, and data blocks. If any of these are corrupted, then rollback will be affected.
Note that the process of performing rollback generates new redo, which is written to the redo logs.

Transaction Recovery After a Process Crash
If the RDBMS shadow process crashes with an active transaction, PMON detects the failure and immediately rolls back the transaction. You can monitor this by setting the following events in the init.ora file or on the PMON process:
10012 trace name context forever, level 1 (abort transaction)
10246 trace name context forever, level 1 (trace PMON actions to trace file + I/O slave trace)
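One way to set event 10246 directly on a running PMON process is with ORADEBUG, using the commands listed earlier. A sketch; the Oracle PID of PMON (2 here) is an assumption and should be looked up first:

```sql
-- Find PMON's Oracle process id:
select pid from v$process where program like '%PMON%';

-- Attach to that process (PID 2 assumed) and set the event:
oradebug setorapid 2
oradebug event 10246 trace name context forever, level 1
oradebug tracefile_name
```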

Transaction Recovery After an Instance Failure
If the RDBMS instance crashes before the transaction is committed, there is no time to roll back the transaction. The next time the database is opened, crash recovery rolls the database forward, returning it to its precrash state (as above). It is then the responsibility of transaction (rollback) recovery to remove any incomplete transactions.

Transaction Recovery: Database Open
1> Active transactions in the system rollback segment are immediately rolled back.
2> Active transactions in other rollback segments are marked as "dead".
3> At a later time, SMON scans the segments again and performs a rollback on dead transactions.

Transaction Recovery at Database Open
The RDBMS starts up as fast as possible after an untidy shutdown. This is especially useful when downtime must be kept to an absolute minimum. Another benefit is that if a new transaction wants to update a row locked by a dead transaction, the new transaction rolls back the dead transaction itself and does not have to wait for SMON.

In release 7.2 and earlier, all active transactions are rolled back before the database is opened; a corrupt rollback segment typically prevents the database from opening.
A side effect of the release 7.3 changes is that the database now opens, in most cases, even if a rollback segment is corrupt, with errors logged to the alert and trace files (SMON). This makes it much easier to diagnose the failure, because you have access to the database. However, it would be easy for customers to run for some time without realizing that they have a problem. Note that even in Oracle8, if the server is unable to read the rollback segment header (because the data file is offline or corrupted), then you cannot open the database.

Transaction Recovery at Database Open
In Oracle8, you can list dead transactions by issuing the following query:
select * from x$ktuxe where ktuxecfl='DEAD';
You can also find dead transactions by dumping the rollback segment header and checking the cflags column of the transaction table dump. To dump the rollback segment header, use the following command:
alter system dump undo header R01;
where R01 is the rollback segment name. A dead transaction is identified by having cflags='0x10'.

Rollback Segment Acquisition
1> After transaction recovery, rollback segments are acquired by the instance.
-- Segments specified by the rollback_segments parameter are acquired first.
-- Public rollback segments may also be acquired when necessary.
2> Acquisition failure typically causes ORA-1545: "Rollback segment %s specified not available."

Rollback Segment Acquisition
The number of rollback segments acquired by an instance is governed by the transactions and transactions_per_rollback_segment initialization parameters.
The requirement is: 1 (for system) + transactions/transactions_per_rollback_segment
This number is rounded up to the next whole number.
A segment with a status of "Needs Recovery" cannot be acquired when the database is opened, so if it is specified in rollback_segments, the instance will not open (ORA-1545).
Rollback segments containing dead transactions are available to be acquired for new transactions.
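The acquisition arithmetic above can be sketched as follows; the function name and the example parameter values are illustrative only:

```python
import math

def rollback_segments_required(transactions, transactions_per_rollback_segment):
    """1 for SYSTEM, plus transactions/transactions_per_rollback_segment
    rounded up to the next whole number."""
    return 1 + math.ceil(transactions / transactions_per_rollback_segment)

# transactions=100, transactions_per_rollback_segment=30
# requires 1 + ceil(100/30) = 1 + 4 = 5 segments
print(rollback_segments_required(100, 30))
```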

Diagnostic Events
Event 10013: Monitor transaction recovery during startup. To set in init.ora:
10013 trace name context forever, level 10
Event 10015: Dump rollback segment headers (excluding those not listed in undo$) before and after transaction recovery. To set in init.ora:
10015 trace name context forever, level 10

Undocumented Parameters _offline_rollback_segments _corrupted_rollback_segments

Undocumented and unsupported initialization parameters, used as a last resort.
Specify a comma-separated list of rollback segment names; for example:
_offline_rollback_segments = (r01,r02,r03)
Neither the SYSTEM rollback segment nor segments already listed in rollback_segments can be specified.

Using the _offline_rollback_segments or _corrupted_rollback_segments parameters changes the behavior of the RDBMS when:
1> Opening the database
2> Performing consistent read and delayed block cleanout
3> Dropping a rollback segment

When opening a database, any rollback segments listed in _offline_rollback_segments or _corrupted_rollback_segments:
1> Are not scanned, and any active transactions are neither marked as dead nor rolled back
2> Appear offline in dba_rollback_segs (undo$)
3> Cannot be acquired by the instance for new transactions
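After such a startup, the effect can be confirmed from the data dictionary; a sketch, with illustrative segment names:

```sql
-- Segments listed in _offline_rollback_segments or
-- _corrupted_rollback_segments should show a status of OFFLINE:
select segment_name, status
from   dba_rollback_segs
where  segment_name in ('R01','R02');
```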

When using these undocumented parameters, the transaction table is not read when the database is opened, so transactions are not marked as dead or rolled back. This is where the situation becomes dangerous, because if there are active transactions in the segment, they become highly abnormal. They are not dead, and yet they belong to a session that was active before the instance started.
Unexpected and undesirable effects will occur; remember: the database is in an unsupported state.

When opening a database, _corrupted_rollback_segments additionally causes all distributed transactions for a named rollback segment to be marked as "forced commit". This is done by updating pending_trans$ and dba_2pc_pending (two-phase commits).

In the situation described in the slide, the rollback segment itself is not updated; the Oracle server just performs an update on pending_trans$ and dba_2pc_pending.

CR and Cleanout
If an open ITL is found to be associated with an _offline segment, the segment is read to find the transaction status.
1> If committed, the block is cleaned out.
2> If active and you want to read the block, a CR copy is constructed using undo from the segment.
3> If active and you want to lock the row, undesirable behavior may result.

Note that although the rollback segment is _offline, the Oracle server actually reads the segment to find transaction statuses and to gather undo records to perform rollback. If a corrupt block is found, the Oracle server will still fail.
When you update a block covered by an active dead transaction, the shadow process loops indefinitely, consuming CPU. Presumably, this is because the old transaction appears active, so the Oracle server attempts to wait for the TX enqueue. It gets the enqueue immediately, because the old transaction is no longer there. The Oracle server then checks the block again and finds it is still active, so the process is repeated again and again.

If an open ITL is found to be associated with a _corrupted segment, the segment is not read to find the transaction status. It is as if the rollback segment had been dropped; the transaction is assumed to be committed and delayed block cleanout is performed. If the transaction was not committed, logical corruption will occur.
Most important, the Oracle server does not read the segment in this case. It is as if the segment has been dropped. This is the most important difference between _offline and _corrupted.
Note that if some ITLs are cleared and the rollback segment is then reintroduced to the database (by removing the _corrupted parameter), you may try to roll back a block that you just committed. Tests show that this typically causes data or block corruption. To prevent this from happening, always drop a rollback segment if you have ever marked it as _corrupted.

Drop Rollback Segment
1> Typically, a rollback segment is prevented from being dropped if it contains active transactions.
2> When dropping an _offline or _corrupted rollback segment, this check is not performed.
3> As a result, the segment may be dropped even if it contains active transactions.

Normally you cannot drop a rollback segment if it contains active transactions. You can circumvent this by using the parameters discussed in this lesson.
If you drop an _offline or _corrupted rollback segment that contains active transactions, you risk logical corruption, possibly in the data dictionary.
Always make sure to change your database back into a supported state by solving the problems, removing the special settings in your parameter file, shutting the instance down, and performing a normal startup.

Although the database may seem to run smoothly,certain corruption problems can come back even after a long time,potentially causing a lot more problems than they did in the first place.

Recovery from Corruption
1> In all cases, media recovery is the preferred recovery approach, because it is supported and guarantees consistency.
2> When media recovery is impossible, alternative techniques are suggested and the risks explained.

In this lesson, the term object corruption means that a block has become corrupt in a user object (table, index, cluster) that is part of an active transaction. The block is typically found to be corrupt when the RDBMS attempts to roll back the transaction.

Recovery Guidelines
1> Rollback segment corruption can interfere with CR, row locking, block cleanout, and rollback.
2> If a rollback segment header is corrupt, database startup is prevented.
3> Undocumented parameters may enable recovery; however, they should only be used as a last resort, and only when the many risks are understood and explained to the customer.

9. Data Salvage Using the Data Unloader Utility (DUL)
Two configuration files are necessary:
1> init.dul
2> control.dul

init.dul contains all configuration parameters, such as size of caches, details of header layout, Oracle block size, and output file format.
In the control file, control.dul, the data file names and the Oracle file numbers must be specified.
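As an illustration only — the parameter names, values, and file paths below are assumptions based on typical DUL setups, not taken from this course material:

```text
# init.dul -- illustrative sketch
db_block_size = 4096        # Oracle block size of the source database
compatible    = 8           # RDBMS release the data files came from
export_mode   = true        # write output in export-file format

# control.dul -- illustrative sketch
# <file number>  <data file name>
1  /u01/oradata/PROD/system01.dbf
2  /u01/oradata/PROD/users01.dbf
```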
