Canceling an IRP causes a blue screen (BugCheck 0x18)

    Yesterday I wrote a simple driver whose write routine pends each IRP by putting it on a custom queue; a separate thread later cancels the pended IRPs:

NTSTATUS SampleCharWriteAsync(PDEVICE_OBJECT devObj, PIRP irp)
{
  KIRQL oldIrql;
  SampleCharDevContext* devCtx = (SampleCharDevContext*)devObj->DeviceExtension;
  
  IoMarkIrpPending(irp);
  KeAcquireSpinLock(&devCtx->devWriteLock, &oldIrql);

  IoSetCancelRoutine(irp, SampleCharCancelIrp);

  if (irp->Cancel == TRUE)
  {
    if (IoSetCancelRoutine(irp, NULL) != NULL)
    {
      irp->IoStatus.Status = STATUS_CANCELLED;
      irp->IoStatus.Information = 0;
      IoCompleteRequest(irp, IO_NO_INCREMENT);

      KeReleaseSpinLock(&devCtx->devWriteLock, oldIrql);

      //return STATUS_PENDING;
      return STATUS_CANCELLED; // returning STATUS_CANCELLED here leads to the blue screen
    }
  }

  InsertTailList(&devCtx->pendWrListHead, &irp->Tail.Overlay.ListEntry);
  KeReleaseSpinLock(&devCtx->devWriteLock, oldIrql);

  return STATUS_PENDING;
}      

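    At first I had no test code, so I set irp->Cancel to TRUE by hand in the debugger. With that change the routine took the if branch and called IoCompleteRequest without a hitch, which I had not expected. As soon as return STATUS_CANCELLED executed, however, the blue screen arrived: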

kd> kb
ChildEBP RetAddr  Args to Child              
8cccac34 82a814bc 851d06e8 868473f8 868473f8 SampleChar!SampleCharWriteAsync+0x3a [c:\studio\samplechar0x44\samplechar\samplechar.c @ 243]
WARNING: Stack unwind information not available. Following frames may be wrong.
8cccac4c 82c82eee 86916038 868473f8 8684748c nt!IofCallDriver+0x64
8cccac6c 82c837a2 851d06e8 86916038 00000001 nt!RtlRandomEx+0x1340
8cccad08 82a8842a 851d06e8 0000008c 00000000 nt!NtWriteFile+0x6ee
kd> ?? irp
struct _IRP * 0x868473f8
   +0x024 Cancel           : 0 ''
kd> eb 0x868473f8+0x024 1
kd> ?? irp
struct _IRP * 0x868473f8
   +0x024 Cancel           : 0x1 ''      
kd> !analyze -v
REFERENCE_BY_POINTER (18)
Arguments:
Arg1: 84feb378, Object type of the object whose reference count is being lowered
Arg2: 86c76a08, Object whose reference count is being lowered
...

Debugging Details:
------------------

IRP_ADDRESS: 00129450

LAST_CONTROL_TRANSFER:  from 82b24e71 to 82ab3394

STACK_TEXT:  
...
a49c1bec 82ab0ed0 86c76a08 82aea363 612ca77a nt!ObfDereferenceObjectWithTag+0x4b
a49c1bf4 82aea363 612ca77a 87086f80 87129450 nt!ObfDereferenceObject+0xd
a49c1c38 82c85f36 00129490 a49c1c64 00000000 nt!IopCompleteRequest+0x24d
a49c1c6c 82c867a2 c0000120 87086f80 00000001 nt!IopSynchronousServiceTail+0x240
a49c1d08 82a8b42a 86312b98 0000008c 00000000 nt!NtWriteFile+0x6e8
a49c1d08 773464f4 86312b98 0000008c 00000000 nt!KiFastCallEntry+0x12a
00f6d5dc 77345ebc 756b90e3 00000080 0000008c ntdll!KiFastSystemCallRet
00f6d5e0 756b90e3 00000080 0000008c 00000000 ntdll!ZwWriteFile+0xc
00f6d644 7708121a 00000080 00f6e778 0000000b KERNELBASE!WriteFile+0xaa
00f6d660 01138af4 00000080 00f6e778 0000000b kernel32!WriteFileImplementation+0x76
...      

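Among the !analyze -v output I noticed the address of the IRP at the time of the failure. On the off chance that it would help, I tried to dump that IRP: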

Debugging Details:
------------------
IRP_ADDRESS: 00129450 <---- IRP address at the time of the failure

kd> !irp 00129450
00129450: Could not read Irp      

Unfortunately WinDbg could not read the IRP, so I had to work out the cause from the stack backtrace instead:

NtWriteFile runs a series of checks on the write request and then calls IopSynchronousServiceTail, which in turn calls IoCallDriver to send the IRP down the device stack:

status = IoCallDriver( DeviceObject, Irp );

    if (DeferredIoCompletion) {

        if (status != STATUS_PENDING) {

            PKNORMAL_ROUTINE normalRoutine;
            PVOID normalContext;
            KIRQL irql = PASSIVE_LEVEL; // Just to shut up the compiler

            ASSERT( !Irp->PendingReturned );

            if (!SynchronousIo) {
                KeRaiseIrql( APC_LEVEL, &irql );
            }
            IopCompleteRequest( &Irp->Tail.Apc,
                                &normalRoutine,
                                &normalContext,
                                (PVOID *) &FileObject,
                                &normalContext );      

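According to the backtrace, the kernel had already returned from IoCallDriver and entered IopCompleteRequest to complete the IRP when it crashed. At this point the cause is nearly obvious: my driver had already called IoCompleteRequest on this IRP, and when the dispatch routine returned, the I/O manager tried to complete the same IRP a second time, hence the crash. (During debugging WinDbg also reported BugCheck 0x44, MULTIPLE_IRP_COMPLETE_REQUESTS, directly.) The I/O manager called IopCompleteRequest because status != STATUS_PENDING: my driver had returned STATUS_CANCELLED. So the mistake was simply a wrong return value from the dispatch routine: once a dispatch routine has called IoMarkIrpPending, it must return STATUS_PENDING, even if it goes on to complete the IRP itself.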

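The problem is located, but a few more pieces of evidence are worth collecting to back up the analysis and confirm that the blue screen really was caused by my driver: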

Frame 06 of the call stack just before the blue screen shows:

06 a49c1c38 82c85f36 00129490 a49c1c64 00000000 nt!IopCompleteRequest+0x24d      

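The first argument to IopCompleteRequest is 0x00129490, which is close to the IRP address 0x00129450 that !analyze -v reported, so the two may well be related. Look at the layout of the IRP structure: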

kd> dt ntkrpamp!_IRP
   +0x000 Type             : Int2B
   +0x002 Size             : Uint2B
   +0x004 MdlAddress       : Ptr32 _MDL
...
   +0x02c UserEvent        : Ptr32 _KEVENT
   +0x030 Overlay          : <unnamed-tag>
   +0x038 CancelRoutine    : Ptr32     void 
   +0x03c UserBuffer       : Ptr32 Void
   +0x040 Tail             : <unnamed-tag>      

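The Tail member does sit 0x40 bytes past the start of the IRP, and 0x00129490 - 0x40 = 0x00129450 matches the faulting IRP from the bugcheck analysis ("IRP_ADDRESS: 00129450"). The source shows that the I/O manager passes &Irp->Tail.Apc as the first argument when it calls IopCompleteRequest. That at least tells us what the IRP flagged by WinDbg was doing when the failure occurred: the I/O manager was completing it.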

The WinDbg help says that the second argument of BugCheck 0x18 is the object whose reference count was wrongly lowered, here 0x86c76a08. Inspect that object:

kd> !object 86c76a08 9
Object: 86c76a08  Type: (84feb378) Event
    ObjectHeader: 86c769f0 (new version)
    HandleCount: 1  PointerCount: 0
kd> !findhandle 86c76a08
Now checking process 84fcea20...Now checking process 863764c0...
                   [86ecb030 onlyForWrite.e] #<-------- process that owns the handle to object 86c76a08: 0x86ecb030
    8c: Entry a5320118 Granted Access 1f0003      

Listing all processes with their IDs confirms that 86ecb030 is indeed onlyForWrite.exe:

kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
...
PROCESS 88892d40  SessionId: 1  Cid: 0dfc    Peb: 7ffdb000  ParentCid: 01ac
    DirBase: 7eb7a440  ObjectTable: a52b8c30  HandleCount:  52.
    Image: conhost.exe

PROCESS 86ecb030  SessionId: 1  Cid: 0e50    Peb: 7ffdc000  ParentCid: 0df4
    DirBase: 7eb7a3a0  ObjectTable: 98258ec8  HandleCount:  50.
    Image: onlyForWrite.exe      

Finally, I wanted to confirm that this Event is the OVERLAPPED.hEvent handle that onlyForWrite passes in when it calls WriteFile:

OVERLAPPED overlapRd = { 0 },overlapWr = {0};
  overlapRd.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
  overlapWr.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);

  char writeBuff[4096] = { "_ThreadProc" }, readBuff[4096] = { 0 };
  DWORD len = 0, writeLen, readLen;

  while (1)
  {
    WriteFile(hDev, writeBuff, strlen(writeBuff), &writeLen, &overlapWr); // the overlapWr.hEvent event handle is passed to WriteFile here
    //WaitForSingleObject(overlapWr.hEvent, INFINITE);
    Sleep(1000);
  }      

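I was not debugging onlyForWrite when the blue screen happened, so I had no record of the handle value stored in overlapWr.hEvent. It can still be recovered from the arguments of the ZwWriteFile call: according to MSDN, the second parameter is the OVERLAPPED.hEvent that was passed in.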

NTSTATUS ZwWriteFile(
  _In_     HANDLE           FileHandle,
  _In_opt_ HANDLE           Event,
  _In_opt_ PIO_APC_ROUTINE  ApcRoutine,
  _In_opt_ PVOID            ApcContext,
  _Out_    PIO_STATUS_BLOCK IoStatusBlock,
  _In_     PVOID            Buffer,
  _In_     ULONG            Length,
  _In_opt_ PLARGE_INTEGER   ByteOffset,
  _In_opt_ PULONG           Key
);
Parameters
Event [in, optional]

    Optionally, a handle to an event object to set to the signaled state after the write operation completes. Device and intermediate drivers should set this parameter to NULL.      

In the backtrace, the handle passed as the second argument to ZwWriteFile is 0x8c:

0a 00f6d5dc 77345ebc 756b90e3 00000080 0000008c ntdll!KiFastSystemCallRet
0b 00f6d5e0 756b90e3 00000080 0000008c 00000000 ntdll!ZwWriteFile+0xc
0c 00f6d644 7708121a 00000080 00f6e778 0000000b KERNELBASE!WriteFile+0xaa      

With the handle value in hand, I only need to switch to the onlyForWrite.exe process context and check whether handle 0x8c refers to object 0x86c76a08:

kd> .process 86ecb030  
Implicit process is now 86ecb030
WARNING: .cache forcedecodeuser is not enabled
kd> !handle 0000008c 

PROCESS 86ecb030  SessionId: 1  Cid: 0e50    Peb: 7ffdc000  ParentCid: 0df4
    DirBase: 7eb7a3a0  ObjectTable: 98258ec8  HandleCount:  50.
    Image: onlyForWrite.exe

Handle table at 98258ec8 with 50 entries in use

008c:<- handle value on the left, kernel object on the right -> Object: 86c76a08  GrantedAccess: 001f0003 Entry: a5320118
Object: 86c76a08  Type: (84feb378) Event
    ObjectHeader: 86c769f0 (new version)
        HandleCount: 1  PointerCount: 0