The Microsoft DDK documentation specifies the following rules for callers of KeWaitForSingleObject:
Callers of KeWaitForSingleObject must be running at IRQL <= DISPATCH_LEVEL.
However, if Timeout = NULL or *Timeout != 0, the caller must be running at IRQL <= APC_LEVEL and in a nonarbitrary thread context.
(If Timeout != NULL and *Timeout = 0, the caller must be running at IRQL <= DISPATCH_LEVEL.)
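In other words, calling KeWaitForSingleObject with a nonzero timeout (or with Timeout == NULL, which means wait indefinitely) requires IRQL < DISPATCH_LEVEL; to call it at IRQL == DISPATCH_LEVEL, the timeout must be 0. The passage is only a few lines long, but it carries three important points: 1. how the timeout must be set at different IRQLs; 2. a DPC routine must not perform a wait with a nonzero timeout; 3. at IRQL >= DISPATCH_LEVEL in general, a nonzero wait is not allowed either. This article explains the three points, drawing on my own understanding of ReactOS.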
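The rule quoted above can be written down as a small predicate. This is only a user-mode sketch for checking one's reading of the documentation: `wait_call_is_legal` is a hypothetical helper, not a kernel API, the IRQL constants are the x86 values, and the "nonarbitrary thread context" requirement is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86 IRQL values; a user-mode stand-in for the kernel's KIRQL. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

/* Hypothetical helper: true when the documented rule allows calling
 * KeWaitForSingleObject at `irql` with this timeout pointer.
 * `timeout` mimics PLARGE_INTEGER: NULL means "wait indefinitely". */
bool wait_call_is_legal(int irql, const int64_t *timeout)
{
    assert(irql >= PASSIVE_LEVEL);

    if (irql > DISPATCH_LEVEL)
        return false;               /* never legal above DISPATCH_LEVEL */

    bool may_block = (timeout == NULL) || (*timeout != 0);
    if (may_block)
        return irql <= APC_LEVEL;   /* a blocking wait needs IRQL <= APC_LEVEL */

    return true;                    /* zero timeout is a poll: fine up to DISPATCH_LEVEL */
}
```

For example, `wait_call_is_legal(DISPATCH_LEVEL, NULL)` is false, while the same call with a pointer to a zero timeout is true.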
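1. When IRQL >= DISPATCH_LEVEL, must the timeout be 0?
KeWaitForSingleObject is a loop of waiting, waking up, and waiting again: each time the thread wakes, it checks whether the wait condition is satisfied, and if not it goes back to waiting. One of the things it checks is whether Timeout has expired.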
NTSTATUS
NTAPI
KeWaitForSingleObject(IN PVOID Object,
IN KWAIT_REASON WaitReason,
IN KPROCESSOR_MODE WaitMode,
IN BOOLEAN Alertable,
IN PLARGE_INTEGER Timeout OPTIONAL)
{
for (;;)
{
if (Timeout)
{
/* Check if the timer expired */
InterruptTime.QuadPart = KeQueryInterruptTime();
if ((ULONGLONG)InterruptTime.QuadPart >=
Timer->DueTime.QuadPart)
{
/* It did, so we don't need to wait */
WaitStatus = STATUS_TIMEOUT;
goto DontWait;
}
/* It didn't, so activate it */
Timer->Header.Inserted = TRUE;
}
...
WaitStatus = KiSwapThread(Thread, KeGetCurrentPrcb());
WaitStart:
Thread->WaitIrql = KeRaiseIrqlToSynchLevel();
KxSingleThreadWait();
KiAcquireDispatcherLockAtDpcLevel();
} //end for(;;)
KiReleaseDispatcherLock(Thread->WaitIrql);
return WaitStatus;
DontWait:
KiReleaseDispatcherLockFromDpcLevel();
KiAdjustQuantumThread(Thread);
return WaitStatus;
}
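The code shows that when Timeout != NULL and the timeout has already expired, execution jumps out of the for(;;) loop and returns from KeWaitForSingleObject; otherwise it may reach KiSwapThread(Thread, KeGetCurrentPrcb()), which switches threads and thereby implements the sleeping wait. So a call whose timeout has not expired (a nonzero *Timeout, or Timeout == NULL) can put the thread to sleep. To keep a thread running at DISPATCH_LEVEL from being switched out because it sleeps, the only option is a timeout of 0, which makes KeWaitForSingleObject return immediately. This explains the DDK's calling convention for Timeout.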
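The decision made on each pass through the loop can be modeled in a few lines of user-mode C. This is a toy sketch, not ReactOS code: `model_wait_pass` and the `wait_result` values are invented for illustration, times are arbitrary ticks, and the "already signaled" check stands in for the part of the function elided above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum wait_result {
    WAIT_SUCCESS,     /* object already signaled, no wait needed      */
    WAIT_TIMEOUT,     /* timeout expired: the DontWait path           */
    WAIT_WOULD_SWAP   /* the real code would call KiSwapThread here   */
};

/* One pass through the wait loop.
 * has_timeout == false models Timeout == NULL (wait indefinitely).
 * A zero timeout is modeled as due == now. */
enum wait_result model_wait_pass(bool signaled, bool has_timeout,
                                 uint64_t now, uint64_t due)
{
    assert(has_timeout || due == 0);   /* `due` is unused without a timeout */

    if (signaled)
        return WAIT_SUCCESS;
    if (has_timeout && now >= due)
        return WAIT_TIMEOUT;           /* returns without switching threads */
    return WAIT_WOULD_SWAP;            /* thread sleeps; illegal at DISPATCH_LEVEL */
}
```

With `due == now` (a zero timeout) the model never reaches the swap, which is the only outcome that is safe at DISPATCH_LEVEL.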
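2. Why can a DPC routine not perform a wait with a nonzero timeout?
A common explanation found online goes like this: threads run below DISPATCH_LEVEL; when IRQL == DISPATCH_LEVEL the thread is suspended and the OS starts scheduling and switching threads; only when IRQL drops below DISPATCH_LEVEL again does the scheduled thread continue running. If the thread goes to sleep at this point, it cannot be switched back in, and the result is a BSOD.
This explanation is rather strained. First, "dropping below DISPATCH_LEVEL" is a vague moment: does the switch happen on the falling edge, or only once the drop has completed? Second, is a thread really suspended merely because it called RaiseIrql? More importantly, the statement invites a misunderstanding: that Windows has a dedicated kernel thread in charge of scheduling, which schedules and switches threads only at DISPATCH_LEVEL. Windows has no fixed scheduler thread. Instead, scheduling happens all over the place: any call to LowerIrql or KiExitDispatcher can trigger it (a kind of distributed scheduling?).
In addition, a close look at the implementation of KiSwapThread/SwapContext shows that the switched-in thread already resumes execution inside SwapContext; lowering the IRQL merely provides the opportunity for a thread switch. That is why analyzing the source of LowerIrql (and of other operations that lower the CPU's current IRQL) matters.
Here is the code of LowerIrql (HalpLowerIrql in ReactOS):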
VOID
HalpLowerIrql(KIRQL NewIrql)
{
if (NewIrql >= PROFILE_LEVEL)
{
KeGetPcr()->Irql = NewIrql;
return;
}
...
if (NewIrql >= DISPATCH_LEVEL)
{
KeGetPcr()->Irql = NewIrql;
return;
}
KeGetPcr()->Irql = DISPATCH_LEVEL;
if (((PKIPCR)KeGetPcr())->HalReserved[HAL_DPC_REQUEST])
{
((PKIPCR)KeGetPcr())->HalReserved[HAL_DPC_REQUEST] = FALSE;
KiDispatchInterrupt();
}
KeGetPcr()->Irql = APC_LEVEL;
if (NewIrql == APC_LEVEL)
{
return;
}
if (KeGetCurrentThread() != NULL &&
KeGetCurrentThread()->ApcState.KernelApcPending)
{
KiDeliverApc(KernelMode, NULL, NULL);
}
KeGetPcr()->Irql = PASSIVE_LEVEL;
}
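The stepwise lowering in HalpLowerIrql can be replayed with a small user-mode model. This is a sketch under simplifying assumptions: `cpu_model` and `model_lower_irql` are invented names, counters replace the real calls to KiDispatchInterrupt and KiDeliverApc, and the PROFILE_LEVEL branch and the elided part are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

/* Toy per-CPU state; the counters record what the real code would call. */
struct cpu_model {
    int  irql;
    bool dpc_requested;   /* stands in for HalReserved[HAL_DPC_REQUEST] */
    bool apc_pending;     /* stands in for ApcState.KernelApcPending    */
    int  dpc_dispatches;  /* times KiDispatchInterrupt would have run   */
    int  apc_deliveries;  /* times KiDeliverApc would have run          */
};

void model_lower_irql(struct cpu_model *cpu, int new_irql)
{
    assert(cpu != NULL);

    if (new_irql >= DISPATCH_LEVEL) {   /* no software interrupt is serviced */
        cpu->irql = new_irql;
        return;
    }
    cpu->irql = DISPATCH_LEVEL;
    if (cpu->dpc_requested) {           /* DPCs run, and a thread switch may follow */
        cpu->dpc_requested = false;
        cpu->dpc_dispatches++;
    }
    cpu->irql = APC_LEVEL;
    if (new_irql == APC_LEVEL)
        return;
    if (cpu->apc_pending) {
        cpu->apc_pending = false;
        cpu->apc_deliveries++;
    }
    cpu->irql = PASSIVE_LEVEL;
}
```

In this model, lowering to DISPATCH_LEVEL leaves a pending DPC request untouched; only a drop to APC_LEVEL or PASSIVE_LEVEL services it.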
As noted earlier, lowering the IRQL can cause a thread switch: the code calls KiDispatchInterrupt() to run DPC routines and service software interrupt requests:
.func KiDispatchInterrupt@0
_KiDispatchInterrupt@0:
/* Deliver DPCs */
mov ecx, [ebx+KPCR_PRCB]
call @KiRetireDpcList@4
...
/* Set APC_LEVEL and do the swap */
mov cl, APC_LEVEL
call @KiSwapContextInternal@0
/* Restore registers */
mov ebp, [esp+0]
mov edi, [esp+4]
mov esi, [esp+8]
add esp, 3*4
Return:
/* All done */
ret
...
.endfunc
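call KiRetireDpcList walks the Prcb->DpcData queue, dequeuing and running each DPC routine, while call KiSwapContextInternal performs the actual thread switch. I will not go deeper into that source here; Mao Decao's scenario-analysis book (情景分析) covers it. What is worth pointing out is that KiSwapContextInternal really does check whether the current thread switch is happening inside a DPC routine: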
.globl @KiSwapContextInternal@0
.func @KiSwapContextInternal@0, @KiSwapContextInternal@0
@KiSwapContextInternal@0:
...
/* DPC shouldn't be active */
cmp byte ptr [ebx+KPCR_PRCB_DPC_ROUTINE_ACTIVE], 0
jnz BugCheckDpc
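In the assembly, KPCR_PRCB_DPC_ROUTINE_ACTIVE refers to a field of the PRCB, Prcb->DpcRoutineActive. The code tests whether the field is 0 and jumps to the bugcheck (blue screen) if it is not. So when is this field set?
Precisely when KiRetireDpcList is about to invoke the DPC routines: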
VOID
FASTCALL
KiRetireDpcList(IN PKPRCB Prcb)
{
...
DpcData = &Prcb->DpcData[DPC_NORMAL];
ListHead = &DpcData->DpcListHead;
/* Main outer loop */
do
{
/* Set us as active */
Prcb->DpcRoutineActive = TRUE;
...
DeferredRoutine(Dpc,
DeferredContext,
SystemArgument1,
SystemArgument2);
ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
...
Prcb->DpcRoutineActive = FALSE;
Prcb->DpcInterruptRequested = FALSE;
...
}
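Prcb->DpcRoutineActive is set to TRUE before the DPC objects are dequeued and their routines run, and reset to FALSE afterwards. This neatly explains why a DPC routine cannot perform a wait with a nonzero timeout: the wait triggers a thread switch, and as soon as execution reaches KiSwapContextInternal, the nonzero Prcb->DpcRoutineActive causes a blue screen.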
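The interaction between the flag and the check can be replayed in user mode. This is a sketch with invented names (`prcb_model`, `model_swap_context`, `model_retire_dpc_list`, and the two sample DPC routines); a `bugchecked` flag stands in for the jump to BugCheckDpc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct prcb_model {
    bool dpc_routine_active;  /* stands in for Prcb->DpcRoutineActive   */
    bool bugchecked;          /* set where the kernel would blue-screen */
};

typedef void (*dpc_routine)(struct prcb_model *);

/* Models the check at the top of KiSwapContextInternal. */
void model_swap_context(struct prcb_model *prcb)
{
    assert(prcb != NULL);
    if (prcb->dpc_routine_active)
        prcb->bugchecked = true;      /* real code: jnz BugCheckDpc */
}

/* A DPC routine that performs a blocking wait, which leads to a swap. */
void blocking_dpc(struct prcb_model *prcb) { model_swap_context(prcb); }

/* A well-behaved DPC routine that never waits. */
void polling_dpc(struct prcb_model *prcb) { (void)prcb; }

/* Models the loop in KiRetireDpcList: flag set, routine run, flag cleared. */
void model_retire_dpc_list(struct prcb_model *prcb,
                           dpc_routine *list, size_t count)
{
    assert(prcb != NULL && list != NULL);
    for (size_t i = 0; i < count; i++) {
        prcb->dpc_routine_active = true;
        list[i](prcb);
        prcb->dpc_routine_active = false;
    }
}
```

Retiring a list that contains only `polling_dpc` leaves `bugchecked` false; adding `blocking_dpc` to the list sets it, because the swap is attempted while the flag is still set.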
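3. The above only scratches the surface: why a wait cannot be performed at DISPATCH_LEVEL is still unexplained
To explain this, first read the article I reposted, 從IRQ到IRQL(PIC版) ("From IRQ to IRQL (PIC version)"), to see how a high IRQL masks execution at lower IRQLs at the hardware level, then come back and read on. We also need to keep looking at KfLowerIrql.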
.func @KfLowerIrql@4
_@KfLowerIrql@4:
@KfLowerIrql@4:
/* Save flags since we'll disable interrupts */
pushf
/* Validate IRQL */
movzx ecx, cl
#if DBG
cmp cl, PCR[KPCR_IRQL]
ja InvalidIrql
#endif
/* Disable interrupts and check if IRQL is below DISPATCH_LEVEL */
cmp dword ptr PCR[KPCR_IRQL], DISPATCH_LEVEL
cli
jbe SkipMask
/* Clear interrupt masks since there's a pending hardware interrupt */
mov eax, KiI8259MaskTable[ecx*4]
or eax, PCR[KPCR_IDR]
out 0x21, al
shr eax, 8
out 0xA1, al
SkipMask:
/* Set the new IRQL and check if there's a pending software interrupt */
mov PCR[KPCR_IRQL], ecx
mov eax, PCR[KPCR_IRR]
mov al, SoftIntByteTable[eax]
cmp al, cl
ja DoCall3
/* Restore interrupts and return */
popf
ret
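When LowerIrql is called here, the current IRQL == DISPATCH_LEVEL, so execution goes to SkipMask. The IRR field of the PCR records whether new software interrupts were requested while a software interrupt was being serviced. Because the IRQL being restored is DISPATCH_LEVEL, no value returned from the SoftIntByteTable array can be greater than it, so the ja DoCall3 branch is never taken: LowerIrql returns without doing any thread switching, and one chance to schedule threads is lost. The table is listed below, followed by the tail of KiSwapThread, where KeLowerIrql(WaitIrql) restores the IRQL the waiting thread was running at (DISPATCH_LEVEL in this scenario).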
SoftIntByteTable:
.byte PASSIVE_LEVEL /* IRR 0 */
.byte PASSIVE_LEVEL /* IRR 1 */
.byte APC_LEVEL /* IRR 2 */
.byte APC_LEVEL /* IRR 3 */
.byte DISPATCH_LEVEL /* IRR 4 */
.byte DISPATCH_LEVEL /* IRR 5 */
.byte DISPATCH_LEVEL /* IRR 6 */
.byte DISPATCH_LEVEL /* IRR 7 */
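The comparison against the table can be checked exhaustively in user-mode C. This sketch copies the table contents shown above; `lowering_delivers_soft_interrupt` is a hypothetical name for the `cmp al, cl` / `ja DoCall3` test, and IRR is assumed to be limited to the eight values the table covers.

```c
#include <assert.h>
#include <stdbool.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

/* Same contents as SoftIntByteTable: for each IRR value 0..7, the
 * highest software interrupt level that is pending. */
static const unsigned char soft_int_table[8] = {
    PASSIVE_LEVEL,  PASSIVE_LEVEL,  APC_LEVEL,      APC_LEVEL,
    DISPATCH_LEVEL, DISPATCH_LEVEL, DISPATCH_LEVEL, DISPATCH_LEVEL
};

/* Mirrors `cmp al, cl` / `ja DoCall3`: a pending software interrupt is
 * delivered only if its level is strictly above the new IRQL. */
bool lowering_delivers_soft_interrupt(unsigned irr, unsigned new_irql)
{
    assert(irr < 8);
    return soft_int_table[irr] > new_irql;
}
```

For every IRR value the function returns false when `new_irql` is DISPATCH_LEVEL, which is the "lost scheduling opportunity" described above; with `new_irql` at APC_LEVEL or PASSIVE_LEVEL, a pending DPC-level request (IRR 4 to 7) is delivered.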
NTSTATUS
FASTCALL
KiSwapThread(IN PKTHREAD CurrentThread,
IN PKPRCB Prcb)
{
....
WaitIrql = CurrentThread->WaitIrql;
...
ApcState = KiSwapContext(CurrentThread, NextThread);
...
if (ApcState)
{
/* Lower to APC_LEVEL */
KeLowerIrql(APC_LEVEL);
/* Deliver APCs */
KiDeliverApc(KernelMode, NULL, NULL);
ASSERT(WaitIrql == 0);
}
KeLowerIrql(WaitIrql);
}