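Yesterday I wrote a simple driver whose write routine marks each IRP as pending and puts it on a custom queue; another thread later cancels these pending IRPs: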
NTSTATUS SampleCharWriteAsync(PDEVICE_OBJECT devObj, PIRP irp)
{
    KIRQL oldIrql;
    SampleCharDevContext* devCtx = (SampleCharDevContext*)devObj->DeviceExtension;

    IoMarkIrpPending(irp);
    KeAcquireSpinLock(&devCtx->devWriteLock, &oldIrql);
    IoSetCancelRoutine(irp, SampleCharCancelIrp);
    if (irp->Cancel == TRUE)
    {
        // The IRP is already cancelled: take the cancel routine back and complete it here.
        if (IoSetCancelRoutine(irp, NULL) != NULL)
        {
            irp->IoStatus.Status = STATUS_CANCELLED;
            irp->IoStatus.Information = 0;
            IoCompleteRequest(irp, IO_NO_INCREMENT);
            KeReleaseSpinLock(&devCtx->devWriteLock, oldIrql);
            //return STATUS_PENDING;
            return STATUS_CANCELLED; // returning STATUS_CANCELLED here causes a blue screen
        }
    }
    // Not cancelled: queue the pending IRP for the other thread.
    InsertTailList(&devCtx->pendWrListHead, &irp->Tail.Overlay.ListEntry);
    KeReleaseSpinLock(&devCtx->devWriteLock, oldIrql);
    return STATUS_PENDING;
}
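At first I had no test code, so I decided to set irp->Cancel to TRUE by hand in the debugger. With that change, execution entered the if branch and IoCompleteRequest ran without complaint, which I had not expected. But as soon as return STATUS_CANCELLED had executed, the blue screen arrived: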
kd> kb
ChildEBP RetAddr Args to Child
8cccac34 82a814bc 851d06e8 868473f8 868473f8 SampleChar!SampleCharWriteAsync+0x3a [c:\studio\samplechar0x44\samplechar\samplechar.c @ 243]
WARNING: Stack unwind information not available. Following frames may be wrong.
8cccac4c 82c82eee 86916038 868473f8 8684748c nt!IofCallDriver+0x64
8cccac6c 82c837a2 851d06e8 86916038 00000001 nt!RtlRandomEx+0x1340
8cccad08 82a8842a 851d06e8 0000008c 00000000 nt!NtWriteFile+0x6ee
kd> ?? irp
struct _IRP * 0x868473f8
+0x024 Cancel : 0 ''
kd> eb 0x868473f8+0x024 1
kd> ?? irp
struct _IRP * 0x868473f8
+0x024 Cancel : 0x1 ''
kd> !analyze -v
REFERENCE_BY_POINTER (18)
Arguments:
Arg1: 84feb378, Object type of the object whose reference count is being lowered
Arg2: 86c76a08, Object whose reference count is being lowered
...
Debugging Details:
------------------
IRP_ADDRESS: 00129450
LAST_CONTROL_TRANSFER: from 82b24e71 to 82ab3394
STACK_TEXT:
...
a49c1bec 82ab0ed0 86c76a08 82aea363 612ca77a nt!ObfDereferenceObjectWithTag+0x4b
a49c1bf4 82aea363 612ca77a 87086f80 87129450 nt!ObfDereferenceObject+0xd
a49c1c38 82c85f36 00129490 a49c1c64 00000000 nt!IopCompleteRequest+0x24d
a49c1c6c 82c867a2 c0000120 87086f80 00000001 nt!IopSynchronousServiceTail+0x240
a49c1d08 82a8b42a 86312b98 0000008c 00000000 nt!NtWriteFile+0x6e8
a49c1d08 773464f4 86312b98 0000008c 00000000 nt!KiFastCallEntry+0x12a
00f6d5dc 77345ebc 756b90e3 00000080 0000008c ntdll!KiFastSystemCallRet
00f6d5e0 756b90e3 00000080 0000008c 00000000 ntdll!ZwWriteFile+0xc
00f6d644 7708121a 00000080 00f6e778 0000000b KERNELBASE!WriteFile+0xaa
00f6d660 01138af4 00000080 00f6e778 0000000b kernel32!WriteFileImplementation+0x76
...
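Among the many lines of !analyze -v output, I noticed the address of the IRP at the time of the failure. On the off chance that it would help, I tried to dump that IRP: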
Debugging Details:
------------------
IRP_ADDRESS: 00129450 <---- IRP address at the time of the failure
kd> !irp 00129450
00129450: Could not read Irp
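Unfortunately, WinDbg could not parse the IRP, so I had to fall back on the stack trace to find the cause of the failure:
NtWriteFile performs a series of checks on the write request and then calls IopSynchronousServiceTail, which calls IoCallDriver to send the IRP down the device stack: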
status = IoCallDriver( DeviceObject, Irp );
if (DeferredIoCompletion) {
    if (status != STATUS_PENDING) {
        PKNORMAL_ROUTINE normalRoutine;
        PVOID normalContext;
        KIRQL irql = PASSIVE_LEVEL; // Just to shut up the compiler
        ASSERT( !Irp->PendingReturned );
        if (!SynchronousIo) {
            KeRaiseIrql( APC_LEVEL, &irql );
        }
        IopCompleteRequest( &Irp->Tail.Apc,
                            &normalRoutine,
                            &normalContext,
                            (PVOID *) &FileObject,
                            &normalContext );
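According to the stack trace, by the time of the failure the kernel had already returned from IoCallDriver and entered IopCompleteRequest to finish completing the IRP. At this point the answer is almost in plain sight: my driver had already completed this IRP with IoCompleteRequest, and when the driver returned, the I/O manager tried to complete the same IRP again, which is what went wrong (while debugging, WinDbg also reported bug check code 0x44, MULTIPLE_IRP_COMPLETE_REQUESTS, directly). The I/O manager called IopCompleteRequest because status != STATUS_PENDING: my driver had returned STATUS_CANCELLED. So it really was a careless mistake in the return value of the dispatch routine: once a dispatch routine has called IoMarkIrpPending, it must return STATUS_PENDING, even if it completes the IRP before returning.
Although the problem has been located, there is some further information worth recording that supports this analysis and confirms that the blue screen really was caused by my driver:
Frame 06 of the call stack just before the blue screen shows: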
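The double completion can be illustrated with a small user-mode model. This is only a sketch under stated assumptions, not kernel code: ToyIrp, toy_dispatch, toy_final_completion and toy_io_manager_write are hypothetical names invented here, the final completion is modelled as running immediately, and the only state tracked is one reference on the caller's event. The two status values are the real NTSTATUS constants.

```c
#include <stdbool.h>

/* Values of the real NTSTATUS constants; the TOY_ names are hypothetical. */
#define TOY_STATUS_PENDING   0x00000103u
#define TOY_STATUS_CANCELLED 0xC0000120u

/* Hypothetical stand-in for an IRP: only the fields this model needs. */
typedef struct {
    bool marked_pending; /* set by the driver, like IoMarkIrpPending */
    int  completions;    /* how many times final completion has run */
    int  event_refs;     /* references held on the caller's event */
} ToyIrp;

/* Final completion: releases the reference taken on the event for this request. */
static void toy_final_completion(ToyIrp *irp)
{
    irp->completions++;
    irp->event_refs--;
}

/* Model of the dispatch routine: marks the request pending, completes it
 * itself, then returns whatever status the caller of this model asks for. */
static unsigned toy_dispatch(ToyIrp *irp, unsigned status_to_return)
{
    irp->marked_pending = true;
    toy_final_completion(irp);
    return status_to_return;
}

/* Model of the I/O manager side: if the driver did not return "pending",
 * it finishes the request inline. Returns the event's final reference count. */
static int toy_io_manager_write(unsigned driver_return)
{
    ToyIrp irp = { false, 0, 1 }; /* one reference taken on the event */
    unsigned status = toy_dispatch(&irp, driver_return);
    if (status != TOY_STATUS_PENDING) {
        toy_final_completion(&irp); /* second completion of the same request */
    }
    return irp.event_refs;
}
```

In this model, toy_io_manager_write(TOY_STATUS_PENDING) leaves the count at 0, while toy_io_manager_write(TOY_STATUS_CANCELLED) drives it to -1, which is the toy counterpart of the event being dereferenced once too often.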
06 a49c1c38 82c85f36 00129490 a49c1c64 00000000 nt!IopCompleteRequest+0x24d
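The first argument to IopCompleteRequest is 0x00129490, which is close to the IRP address 0x00129450 reported by !analyze -v, so the two may well be related. Here is the layout of the IRP structure: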
kd> dt ntkrpamp!_IRP
+0x000 Type : Int2B
+0x002 Size : Uint2B
+0x004 MdlAddress : Ptr32 _MDL
...
+0x02c UserEvent : Ptr32 _KEVENT
+0x030 Overlay : <unnamed-tag>
+0x038 CancelRoutine : Ptr32 void
+0x03c UserBuffer : Ptr32 Void
+0x040 Tail : <unnamed-tag>
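The Irp!Tail member is indeed 0x40 bytes past the start of the IRP structure, and 0x00129490 - 0x40 = 0x00129450, which matches the "IRP_ADDRESS: 00129450" from the bug check analysis. According to the source code, the I/O manager passes &Irp->Tail.Apc as the first argument when it calls IopCompleteRequest. So at least we now know what the IRP flagged by WinDbg was doing in the device stack when the failure happened.
The WinDbg help says that the second parameter of bug check 0x18 is the object that was dereferenced incorrectly, here 0x86c76a08. Let's look at that object: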
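The address arithmetic can be written down as a short sketch. MockIrp32 and irp_from_tail_apc are hypothetical names used only for illustration; the mock structure reproduces nothing but the 0x40-byte offset of Tail from the dt output above, and it assumes Tail.Apc sits at the very start of Tail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit layout: only the offset of Tail matters here. */
typedef struct {
    uint8_t  before_tail[0x40]; /* Type ... UserBuffer, per the dt output */
    uint32_t tail_apc;          /* first bytes of Tail (assumed to be Tail.Apc) */
} MockIrp32;

/* Given the address of Tail.Apc, step back to the start of the IRP. */
static uint32_t irp_from_tail_apc(uint32_t tail_apc_address)
{
    return tail_apc_address - (uint32_t)offsetof(MockIrp32, tail_apc);
}
```

Calling irp_from_tail_apc(0x00129490) yields 0x00129450, the same value WinDbg printed as IRP_ADDRESS.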
kd> !object 86c76a08 9
Object: 86c76a08 Type: (84feb378) Event
ObjectHeader: 86c769f0 (new version)
HandleCount: 1 PointerCount: 0
kd> !findhandle 86c76a08
Now checking process 84fcea20...Now checking process 863764c0...
[86ecb030 onlyForWrite.e] #<-------- process that owns a handle to object 86c76a08: 0x86ecb030
8c: Entry a5320118 Granted Access 1f0003
Listing all processes with their process IDs confirms that 86ecb030 is indeed onlyForWrite.exe:
kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
...
PROCESS 88892d40 SessionId: 1 Cid: 0dfc Peb: 7ffdb000 ParentCid: 01ac
DirBase: 7eb7a440 ObjectTable: a52b8c30 HandleCount: 52.
Image: conhost.exe
PROCESS 86ecb030 SessionId: 1 Cid: 0e50 Peb: 7ffdc000 ParentCid: 0df4
DirBase: 7eb7a3a0 ObjectTable: 98258ec8 HandleCount: 50.
Image: onlyForWrite.exe
Finally, I wanted to confirm that this Event is the OVERLAPPED!hEvent handle that onlyForWrite passes in when it calls WriteFile:
OVERLAPPED overlapRd = { 0 }, overlapWr = { 0 };
overlapRd.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
overlapWr.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
char writeBuff[4096] = { "_ThreadProc" }, readBuff[4096] = { 0 };
DWORD len = 0, writeLen, readLen;
while (1)
{
    WriteFile(hDev, writeBuff, strlen(writeBuff), &writeLen, &overlapWr); // the overlapWr!hEvent event handle is passed in with this WriteFile call
    //WaitForSingleObject(overlapWr.hEvent, INFINITE);
    Sleep(1000);
}
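When the blue screen happened I was not running onlyForWrite under a debugger, so I had not recorded the handle value stored in overlapWr!hEvent. It can still be recovered from the arguments passed to ZwWriteFile, though. According to MSDN, the second parameter, Event, is where the OVERLAPPED!hEvent handle ends up: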
NTSTATUS ZwWriteFile(
_In_ HANDLE FileHandle,
_In_opt_ HANDLE Event,
_In_opt_ PIO_APC_ROUTINE ApcRoutine,
_In_opt_ PVOID ApcContext,
_Out_ PIO_STATUS_BLOCK IoStatusBlock,
_In_ PVOID Buffer,
_In_ ULONG Length,
_In_opt_ PLARGE_INTEGER ByteOffset,
_In_opt_ PULONG Key
);
Parameters
Event [in, optional]
Optionally, a handle to an event object to set to the signaled state after the write operation completes. Device and intermediate drivers should set this parameter to NULL.
Looking at the stack trace, the handle value passed as the second argument to ZwWriteFile is 0x8C:
0a 00f6d5dc 77345ebc 756b90e3 00000080 0000008c ntdll!KiFastSystemCallRet
0b 00f6d5e0 756b90e3 00000080 0000008c 00000000 ntdll!ZwWriteFile+0xc
0c 00f6d644 7708121a 00000080 00f6e778 0000000b KERNELBASE!WriteFile+0xaa
With this handle value, all I need to do is switch to the onlyForWrite.exe process context and check whether handle 0x8c refers to object 0x86c76a08:
kd> .process 86ecb030
Implicit process is now 86ecb030
WARNING: .cache forcedecodeuser is not enabled
kd> !handle 0000008c
PROCESS 86ecb030 SessionId: 1 Cid: 0e50 Peb: 7ffdc000 ParentCid: 0df4
DirBase: 7eb7a3a0 ObjectTable: 98258ec8 HandleCount: 50.
Image: onlyForWrite.exe
Handle table at 98258ec8 with 50 entries in use
008c:<- handle value on the left, kernel object on the right -> Object: 86c76a08 GrantedAccess: 001f0003 Entry: a5320118
Object: 86c76a08 Type: (84feb378) Event
ObjectHeader: 86c769f0 (new version)
HandleCount: 1 PointerCount: 0