Linux shutdown and suspending (reposted): Principles and Implementation of Linux Hibernation

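1. Introduction

Linux hibernation provides a hibernation mode similar to the one in Windows. When the user hibernates the system, the current contents of memory are saved to the hard disk, specifically to the swap partition. When the computer is restarted, the system reloads the saved memory data, including process data and register values, and restores the state it was in before shutdown. Documents do not need to be reloaded and applications do not need to be reopened, so resuming from hibernation is much faster than a normal boot.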

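2. Principles of Linux Hibernation

To implement hibernation, you first need to understand how Linux manages memory. Standard Linux paging uses a three-level page table structure: page directory, page middle directory and page table. i386 uses a two-level structure, page directory and page table, with no middle directory. The 4 GB linear address space has only one page directory. That directory has at most 1024 entries, each directory entry covers 1024 page-table entries, and each page is 4 KB. The paging mechanism manages memory by relocating pages of the linear address space into the physical address space. Each 4 KB page is mapped as a single unit and aligned on a 4 KB boundary. As a result, the low 12 bits of a linear address pass through the paging mechanism unchanged and serve directly as the low 12 bits of the physical address. The following figure shows how a linear address is mapped to a physical address on x86:

[Figure: mapping of a linear address to a physical address on x86]

Hibernation has two phases, SUSPEND and RESUME, and RESUME is the inverse of SUSPEND. The SUSPEND phase saves process data to disk and powers the machine off. The RESUME phase reads the saved process data back from disk and restores the system to the state it was in before shutdown.

The most important problems hibernation must solve are saving the memory data and restoring it afterwards. Memory page data is easy to obtain, so the main task of SUSPEND is to save the pages that need saving. However, the page table that records the addresses of these pages must also be saved. This page table is only a linked list that serves as an intermediate translation, so it can be built temporarily during SUSPEND, with the addresses of the memory pages recorded in it. In the RESUME phase, the saved pages and page table are written back into memory pages. Once that is done, updating the page directory data completes the restore of memory.

The analysis above gives the overall structure of hibernation, shown below:

[Figure: overall schematic of hibernation]

As the figure shows, implementing SUSPEND requires three main steps: freezing the active processes in the system, preparing to save memory data, and writing the memory data to disk.

Freezing active processes: this covers three main sources of activity, namely user-space processes and kernel threads, device drivers, and active timers.
Preparing to save data: count the memory pages that need saving, allocate memory to hold the process data, and copy the process data into the allocated memory.
Saving data to disk: write the memory pages that need saving to disk.

RESUME is the inverse of SUSPEND. It allocates memory for reading the process data from disk, reads the data from disk, remaps the page table addresses, updates the segment descriptor tables, and so on.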
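A small user-space model can check the address split described at the start of this section. On i386 without PAE, a 32-bit linear address divides into three fields: a 10-bit page-directory index, a 10-bit page-table index and a 12-bit offset. That gives 10 + 10 + 12 = 32 bits, matching 1024 × 1024 × 4 KB = 4 GB. The sketch assumes simplified entries that hold only a frame number; real entries also carry present and permission bits. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES    1024u   /* entries per page directory and per page table */
#define PAGE_SHIFT 12u     /* 4 KB pages: the low 12 bits are the offset */

/* Hypothetical simplified page table: index -> physical frame number. */
typedef struct {
    uint32_t frame[ENTRIES];
} page_table_t;

/* Hypothetical simplified page directory: index -> page table (NULL = empty). */
typedef struct {
    page_table_t *table[ENTRIES];
} page_dir_t;

/* Bits 31..22 select the page-directory entry. */
static uint32_t dir_index(uint32_t linear) { return linear >> 22; }

/* Bits 21..12 select the entry within the page table. */
static uint32_t table_index(uint32_t linear) { return (linear >> PAGE_SHIFT) & (ENTRIES - 1); }

/* Bits 11..0 are the offset inside the 4 KB page. */
static uint32_t page_offset(uint32_t linear) { return linear & ((1u << PAGE_SHIFT) - 1); }

/* Two-level walk: returns 0 and stores the physical address in *phys,
 * or -1 if the directory entry has no page table. */
static int translate(const page_dir_t *pd, uint32_t linear, uint32_t *phys)
{
    assert(pd != NULL && phys != NULL);
    const page_table_t *pt = pd->table[dir_index(linear)];
    if (pt == NULL)
        return -1;
    uint32_t frame = pt->frame[table_index(linear)];
    /* The offset bits are copied through unchanged. */
    *phys = (frame << PAGE_SHIFT) | page_offset(linear);
    return 0;
}
```

For example, suppose directory entry 1 points at a table whose entry 3 holds frame 0x12345. The linear address 0x00403ABC (directory index 1, table index 3, offset 0xABC) then translates to 0x12345ABC.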

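3. Implementation of Linux Software Hibernation

Hibernation is implemented as a module, so users can choose whether to load it. However, RESUME must restore the memory contents and CPU state from before shutdown. The module should therefore be loaded automatically by the ramdisk's init, before the root filesystem is mounted.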

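3.1 SUSPEND Phase

3.1.1 Freezing Active Processes

A process changes state as it executes. The main process states in Linux are:

TASK_RUNNING: runnable
TASK_INTERRUPTIBLE: interruptible wait
TASK_UNINTERRUPTIBLE: uninterruptible wait
TASK_ZOMBIE: zombie
TASK_STOPPED: stopped
TASK_SWAPPING: being swapped in/out

A running operating system typically has a dozen or even several dozen processes. The process carrying out SUSPEND holds the execution resources (it is the current process, current), so it must not be frozen or stopped; otherwise the remaining steps could not complete. Some processes cannot be frozen or do not need to be:

- processes flagged PF_NOFREEZE or PF_FROZEN;
- processes in the TASK_ZOMBIE, TASK_DEAD or TASK_STOPPED state.

All other processes must be frozen, which means their flags are changed to PF_FREEZE. A process flagged PF_FREEZE can no longer obtain resources and therefore stays quiescent.

3.1.2 Preparing to Save Data

Every memory page is examined. For each page not marked PG_reserved, the count of pages to save is incremented by one. When the scan is complete, the result is the number of pages to save, nr_copy_pages:

    for (pfn = 0; pfn < max_pfn; pfn++) {
        page = pfn_to_page(pfn);
        if (!PageReserved(page)) {
            /* ... */
            nr_copy_pages++;
            /* ... */
        }
        /* ... */
    }

Based on nr_copy_pages, a matching number of free pages is taken from memory to hold the page directory. At the same time, nr_copy_pages free pages are allocated, and the page directory records and manages their addresses. Besides the process data, the current register contents are saved in global variables. These include the descriptor tables, segment registers, control registers and general-purpose registers. The memory pages to be saved are then copied into the newly allocated free pages:

    for (pfn = 0; pfn < max_pfn; pfn++) {
        /* ... */
        if (pagedir_p) {
            pagedir_p->orig_address = ADDRESS(pfn);
            copy_page((void *)pagedir_p->address,
                      (void *)pagedir_p->orig_address);
            pagedir_p++;
        }
        /* ... */
    }

3.1.3 Saving Data to the Swap Partition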
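The pages written to swap in this step are the copies produced in 3.1.2. The loops quoted there depend on kernel internals (pfn_to_page(), PageReserved(), copy_page()), so what follows is only a minimal user-space model of the same count-then-copy idea. It rests on several assumptions:

- A hypothetical `reserved` array stands in for PG_reserved.
- Each hypothetical `struct pagedir_entry` pairs an original page with its copy; its field names follow the quoted snippet.
- `restore_pages()` copies the saved pages back, which mirrors the RESUME step described in section 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical model of physical memory with max_pfn page frames. */
struct memory {
    size_t max_pfn;
    unsigned char (*frames)[PAGE_SIZE];   /* frames[pfn] is one 4 KB page */
    const bool *reserved;                 /* reserved[pfn] models PG_reserved */
};

/* One entry of the temporary page directory. */
struct pagedir_entry {
    unsigned char *orig_address;   /* the page being saved */
    unsigned char *address;        /* the newly allocated copy */
};

/* Count the pages that need saving: every page not marked reserved. */
static size_t count_copy_pages(const struct memory *mem)
{
    size_t nr_copy_pages = 0;
    for (size_t pfn = 0; pfn < mem->max_pfn; pfn++)
        if (!mem->reserved[pfn])
            nr_copy_pages++;
    return nr_copy_pages;
}

/* Free the first n copies and the directory itself. */
static void free_pagedir(struct pagedir_entry *pagedir, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(pagedir[i].address);
    free(pagedir);
}

/* Allocate a copy for each non-reserved page, record it in the page
 * directory and copy the page contents. Returns NULL on allocation failure. */
static struct pagedir_entry *copy_pages(const struct memory *mem, size_t nr_copy_pages)
{
    /* calloc(0, ...) may return NULL, so always allocate at least one entry. */
    struct pagedir_entry *pagedir = calloc(nr_copy_pages ? nr_copy_pages : 1, sizeof *pagedir);
    if (pagedir == NULL)
        return NULL;
    size_t i = 0;
    for (size_t pfn = 0; pfn < mem->max_pfn; pfn++) {
        if (mem->reserved[pfn])
            continue;
        assert(i < nr_copy_pages);
        pagedir[i].orig_address = mem->frames[pfn];
        pagedir[i].address = malloc(PAGE_SIZE);
        if (pagedir[i].address == NULL) {
            free_pagedir(pagedir, i);
            return NULL;
        }
        memcpy(pagedir[i].address, pagedir[i].orig_address, PAGE_SIZE);
        i++;
    }
    return pagedir;
}

/* RESUME side of the model: write every saved copy back to its original page. */
static void restore_pages(const struct pagedir_entry *pagedir, size_t nr_copy_pages)
{
    for (size_t i = 0; i < nr_copy_pages; i++)
        memcpy(pagedir[i].orig_address, pagedir[i].address, PAGE_SIZE);
}
```

In the real implementation, the copies would be written to the swap partition between copy_pages() and restore_pages(), and read back from it on the next boot. The model skips that I/O and keeps the copies in RAM.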

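Abstract: Hibernation saves the current system's process data and CPU state to disk. When the system is powered off and restarted, the saved data is read back automatically and the original system state is restored, which greatly reduces startup time. Implementing hibernation mainly involves memory management, process management and swap operations, so studying it helps build a deeper understanding of the Linux operating system.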

Keywords: Linux; kernel; hibernation; swap

Freezing of tasks

I. What is the freezing of tasks?

The freezing of tasks is a mechanism by which user space processes and some kernel threads are controlled during hibernation or system-wide suspend (on some architectures).

II. How does it work?

There are four per-task flags used for that: PF_NOFREEZE, PF_FROZEN, TIF_FREEZE and PF_FREEZER_SKIP (the last one is auxiliary). The tasks that have PF_NOFREEZE unset (all user space processes and some kernel threads) are regarded as 'freezable' and treated in a special way before the system enters a suspend state as well as before a hibernation image is created (in what follows we only consider hibernation, but the description also applies to suspend).

Namely, as the first step of the hibernation procedure the function freeze_processes() (defined in kernel/power/process.c) is called. It executes try_to_freeze_tasks(), which sets TIF_FREEZE for all of the freezable tasks and either wakes them up, if they are kernel threads, or sends fake signals to them, if they are user space processes. A task that has TIF_FREEZE set should react to it by calling the function refrigerator() (defined in kernel/power/process.c), which sets the task's PF_FROZEN flag, changes its state to TASK_UNINTERRUPTIBLE and makes it loop until PF_FROZEN is cleared for it. Then, we say that the task is 'frozen' and therefore the set of functions handling this mechanism is referred to as 'the freezer' (these functions are defined in kernel/power/process.c and include/linux/freezer.h). User space processes are generally frozen before kernel threads.

It is not recommended to call refrigerator() directly. Instead, it is recommended to use the try_to_freeze() function (defined in include/linux/freezer.h), which checks the task's TIF_FREEZE flag and makes the task enter refrigerator() if the flag is set.
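The handshake described above can be modeled in user space with POSIX threads. In outline:

1. The freezer sets a per-task request flag.
2. The task notices the flag in try_to_freeze().
3. The task marks itself frozen in refrigerator() and waits there.
4. Thawing clears the frozen flag and lets the task continue.

This is a minimal sketch, not kernel code. The `model_task` structure and the mutex/condition-variable pair that replaces the scheduler are assumptions. The function names only echo try_to_freeze(), refrigerator(), freeze_processes() and thaw_processes(). Compile with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-task state. */
struct model_task {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool freeze_requested;   /* analogue of TIF_FREEZE */
    bool frozen;             /* analogue of PF_FROZEN */
    bool stop;
    unsigned long work_done;
};

/* Analogue of refrigerator(), called with t->lock held: mark the task frozen,
 * clear the request, and sleep until the freezer clears "frozen" again. */
static void refrigerator(struct model_task *t)
{
    t->frozen = true;
    t->freeze_requested = false;
    pthread_cond_broadcast(&t->cond);   /* let the freezer see we are frozen */
    while (t->frozen)
        pthread_cond_wait(&t->cond, &t->lock);
}

/* Analogue of try_to_freeze(): enter the refrigerator only if asked to. */
static bool try_to_freeze(struct model_task *t)
{
    bool froze = false;
    pthread_mutex_lock(&t->lock);
    if (t->freeze_requested) {
        refrigerator(t);
        froze = true;
    }
    pthread_mutex_unlock(&t->lock);
    return froze;
}

/* A freezable worker: one unit of work per iteration, checking for freezing each time. */
static void *worker(void *arg)
{
    struct model_task *t = arg;
    const struct timespec tick = { 0, 1000000L };   /* 1 ms */
    for (;;) {
        try_to_freeze(t);
        pthread_mutex_lock(&t->lock);
        bool stop = t->stop;
        if (!stop)
            t->work_done++;
        pthread_mutex_unlock(&t->lock);
        if (stop)
            return NULL;
        nanosleep(&tick, NULL);
    }
}

/* Freezer side for one task: set the request and wait until the task is frozen. */
static void freeze_task(struct model_task *t)
{
    pthread_mutex_lock(&t->lock);
    t->freeze_requested = true;
    while (!t->frozen)
        pthread_cond_wait(&t->cond, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

/* Thaw side: clear the frozen flag and wake the task. */
static void thaw_task(struct model_task *t)
{
    pthread_mutex_lock(&t->lock);
    t->frozen = false;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

static int start_task(struct model_task *t, pthread_t *tid)
{
    assert(t != NULL && tid != NULL);
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->freeze_requested = t->frozen = t->stop = false;
    t->work_done = 0;
    return pthread_create(tid, NULL, worker, t);
}

/* Call only on a thawed task: a frozen task would never see "stop". */
static void stop_task(struct model_task *t, pthread_t tid)
{
    pthread_mutex_lock(&t->lock);
    t->stop = true;
    pthread_mutex_unlock(&t->lock);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&t->cond);
    pthread_mutex_destroy(&t->lock);
}

static unsigned long read_work_done(struct model_task *t)
{
    pthread_mutex_lock(&t->lock);
    unsigned long n = t->work_done;
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

Once freeze_task() returns, the worker's counter stops advancing until thaw_task() is called.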

For user space processes try_to_freeze() is called automatically from the signal-handling code, but the freezable kernel threads need to call it explicitly in suitable places or use the wait_event_freezable() or wait_event_freezable_timeout() macros (defined in include/linux/freezer.h) that combine interruptible sleep with checking if TIF_FREEZE is set and calling try_to_freeze(). The main loop of a freezable kernel thread may look like the following:

    set_freezable();
    do {
        hub_events();
        wait_event_freezable(khubd_wait,
                !list_empty(&hub_event_list) ||
                kthread_should_stop());
    } while (!kthread_should_stop() || !list_empty(&hub_event_list));

(from drivers/usb/core/hub.c::hub_thread()).

If a freezable kernel thread fails to call try_to_freeze() after the freezer has set TIF_FREEZE for it, the freezing of tasks will fail and the entire hibernation operation will be cancelled. For this reason, freezable kernel threads must call try_to_freeze() somewhere or use one of the wait_event_freezable() and wait_event_freezable_timeout() macros.
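The all-or-nothing rule above can be shown in a deterministic, single-threaded sketch. If any freezable task fails to reach the refrigerator, freezing fails and the whole operation is cancelled. In the model, each task either calls try_to_freeze() after some number of polling rounds or never does. Several details are assumptions for illustration:

- the round limit;
- the `respond_after` field;
- the choice to thaw every task on failure;
- the -EBUSY return value.

The real try_to_freeze_tasks() waits against a time limit rather than counting rounds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NEVER (-1)   /* this task never calls try_to_freeze() */

/* Hypothetical simulated task. */
struct sim_task {
    bool freezable;          /* PF_NOFREEZE clear */
    int respond_after;       /* polling rounds before it calls try_to_freeze(), or NEVER */
    bool freeze_requested;   /* analogue of TIF_FREEZE */
    bool frozen;             /* analogue of PF_FROZEN */
};

/* Clear all freeze state: used to cancel a failed attempt, and after resume. */
static void thaw_all(struct sim_task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        tasks[i].freeze_requested = false;
        tasks[i].frozen = false;
    }
}

/* Request freezing of every freezable task and poll for up to max_rounds
 * rounds. Roll everything back if some task never reaches the refrigerator. */
static int freeze_all(struct sim_task *tasks, size_t n, int max_rounds)
{
    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tasks[i].freezable)
            tasks[i].freeze_requested = true;

    for (int round = 0; round <= max_rounds; round++) {
        size_t todo = 0;
        for (size_t i = 0; i < n; i++) {
            struct sim_task *t = &tasks[i];
            if (!t->freeze_requested)
                continue;
            if (t->respond_after != NEVER && round >= t->respond_after) {
                t->frozen = true;             /* it called try_to_freeze() */
                t->freeze_requested = false;
            } else {
                todo++;                       /* still running */
            }
        }
        if (todo == 0)
            return 0;
    }
    thaw_all(tasks, n);                       /* cancel the whole operation */
    return -EBUSY;
}
```

A single task with respond_after set to NEVER is enough to make freeze_all() fail, however many other tasks froze successfully.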

After the system memory state has been restored from a hibernation image and devices have been reinitialized, the function thaw_processes() is called in order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that have been frozen leave refrigerator() and continue running.

III. Which kernel threads are freezable?

Kernel threads are not freezable by default. However, a kernel thread may clear PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE directly is strongly discouraged). From this point it is regarded as freezable and must call try_to_freeze() in a suitable place.
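A small user-space model can summarize which tasks the freezer tries to freeze. It combines the eligibility rules in sections II and III with the list in section 3.1.1 of the article above. Under that list, the freezer skips the current task, tasks flagged PF_NOFREEZE or PF_FROZEN, and zombie, dead or stopped tasks. The struct and state enum are hypothetical stand-ins for task_struct fields, and the flag bit values are illustrative only; just the flag names come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; the kernel's real values differ. */
#define PF_NOFREEZE 0x1u
#define PF_FROZEN   0x2u

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE,
                  TASK_STOPPED, TASK_ZOMBIE, TASK_DEAD };

/* Hypothetical stand-in for the relevant task_struct fields. */
struct task {
    unsigned int flags;
    enum task_state state;
};

/* User space processes start freezable; kernel threads start with PF_NOFREEZE. */
static void model_init_task(struct task *t, bool is_kernel_thread)
{
    t->flags = is_kernel_thread ? PF_NOFREEZE : 0u;
    t->state = TASK_RUNNING;
}

/* Models set_freezable(): the thread opts in by clearing PF_NOFREEZE. */
static void model_set_freezable(struct task *t)
{
    t->flags &= ~PF_NOFREEZE;
}

/* Should the freezer try to freeze t? `self` is the task driving the suspend. */
static bool should_freeze(const struct task *t, const struct task *self)
{
    assert(t != NULL && self != NULL);
    if (t == self)
        return false;                 /* never freeze the caller itself */
    if (t->flags & (PF_NOFREEZE | PF_FROZEN))
        return false;                 /* opted out, or already frozen */
    switch (t->state) {
    case TASK_ZOMBIE:
    case TASK_DEAD:
    case TASK_STOPPED:
        return false;                 /* skipped per the list in 3.1.1 */
    default:
        return true;
    }
}
```

In this model, a kernel thread becomes a freeze candidate only after model_set_freezable(). From that point it must also cooperate by calling try_to_freeze(), as section II explains.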

IV. Why do we do that?

Generally speaking, there are a couple of reasons to use the freezing of tasks:

1. The principal reason is to prevent filesystems from being damaged after hibernation. At the moment we have no simple means of checkpointing filesystems, so if there are any modifications made to filesystem data and/or metadata on disks, we cannot bring them back to the state from before the modifications. At the same time each hibernation image contains some filesystem-related information that must be consistent with the state of the on-disk data and metadata after the system memory state has been restored from the image (otherwise the filesystems will be damaged in a nasty way, usually making them almost impossible to repair). We therefore freeze tasks that might cause the on-disk filesystems' data and metadata to be modified after the hibernation image has been created and before the system is finally powered off. The majority of these are user space processes, but if any of the kernel threads may cause something like this to happen, they have to be freezable.

2. Next, to create the hibernation image we need to free a sufficient amount of memory (approximately 50% of available RAM) and we need to do that before devices are deactivated, because we generally need them for swapping out. Then, after the memory for the image has been freed, we don't want tasks to allocate additional memory and we prevent them from doing that by freezing them earlier. [Of course, this also means that device drivers should not allocate substantial amounts of memory from their .suspend() callbacks before hibernation, but this is a separate issue.]

3. The third reason is to prevent user space processes and some kernel threads from interfering with the suspending and resuming of devices. A user space process running on a second CPU while we are suspending devices may, for example, be troublesome and without the freezing of tasks we would need some safeguards against race conditions that might occur in such a case.

Although Linus Torvalds doesn't like the freezing of tasks, he said this in one of the discussions on LKML:

"RJW:> Why we freeze tasks at all or why we freeze kernel threads?

Linus: In many ways, 'at all'.

I _do_ realize the IO request queue issues, and that we cannot actually do s2ram with some devices in the middle of a DMA. So we want to be able to avoid *that*, there's no question about that. And I suspect that stopping user threads and then waiting for a sync is practically one of the easier ways to do so.

So in practice, the 'at all' may become a 'why freeze kernel threads?' and freezing user threads I don't find really objectionable."

Still, there are kernel threads that may want to be freezable. For example, if a kernel thread that belongs to a device driver accesses the device directly, it in principle needs to know when the device is suspended, so that it doesn't try to access it at that time. However, if the kernel thread is freezable, it will be frozen before the driver's .suspend() callback is executed and it will be thawed after the driver's .resume() callback has run, so it won't be accessing the device while it's suspended.

4. Another reason for freezing tasks is to prevent user space processes from realizing that a hibernation (or suspend) operation is taking place. Ideally, user space processes should not notice that such a system-wide operation has occurred and should continue running without any problems after the restore (or resume from suspend). Unfortunately, in the most general case this is quite difficult to achieve without the freezing of tasks. Consider, for example, a process that depends on all CPUs being online while it's running. Since we need to disable nonboot CPUs during the hibernation, if this process is not frozen, it may notice that the number of CPUs has changed and may start to work incorrectly because of that.

V. Are there any problems related to the freezing of tasks?

Yes, there are.

First of all, the freezing of kernel threads may be tricky if they depend one on another. For example, if kernel thread A waits for a completion (in the TASK_UNINTERRUPTIBLE state) that needs to be done by freezable kernel thread B and B is frozen in the meantime, then A will be blocked until B is thawed, which may be undesirable. That's why kernel threads are not freezable by default.
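A user-space model makes the dependency problem concrete. Thread A waits for a completion that only thread B can signal, while B is "frozen", which here just means blocked until a thaw flag is set. A pthread condition variable stands in for the kernel's completion. The timeout in wait_for_completion_ms() exists only so the example terminates; in the scenario from the text, A would stay blocked in TASK_UNINTERRUPTIBLE until B is thawed. All names are hypothetical. Compile with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical analogue of a kernel completion. */
struct completion_model {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static struct completion_model comp = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false };
static pthread_mutex_t fridge_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t fridge_cond = PTHREAD_COND_INITIALIZER;
static bool b_frozen = true;   /* B starts out frozen */

/* Thread B: stays in the "refrigerator" until thawed, then completes A's request. */
static void *thread_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&fridge_lock);
    while (b_frozen)
        pthread_cond_wait(&fridge_cond, &fridge_lock);
    pthread_mutex_unlock(&fridge_lock);

    pthread_mutex_lock(&comp.lock);
    comp.done = true;                        /* analogue of complete() */
    pthread_cond_broadcast(&comp.cond);
    pthread_mutex_unlock(&comp.lock);
    return NULL;
}

/* Thread A's side: wait for the completion for at most ms milliseconds.
 * Returns 0 if it completed, ETIMEDOUT otherwise. */
static int wait_for_completion_ms(long ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = 0;
    pthread_mutex_lock(&comp.lock);
    while (!comp.done && rc == 0)
        rc = pthread_cond_timedwait(&comp.cond, &comp.lock, &deadline);
    int result = comp.done ? 0 : ETIMEDOUT;
    pthread_mutex_unlock(&comp.lock);
    return result;
}

/* Analogue of thawing B. */
static void thaw_b(void)
{
    pthread_mutex_lock(&fridge_lock);
    b_frozen = false;
    pthread_cond_broadcast(&fridge_cond);
    pthread_mutex_unlock(&fridge_lock);
}
```

While B is frozen, A's wait cannot succeed. Once thaw_b() runs, the completion arrives and A proceeds.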

Second, there are the following two problems related to the freezing of user space processes:

1. Putting processes into an uninterruptible sleep distorts the load average.

2. Now that we have FUSE, plus the framework for doing device drivers in userspace, it gets even more complicated because some userspace processes are now doing the sorts of things that kernel threads do.

Problem 1 seems to be fixable, although it hasn't been fixed so far. The other one is more serious, but it seems that we can work around it by using hibernation (and suspend) notifiers (in that case, though, we won't be able to avoid the realization by the user space processes that the hibernation is taking place).

There are also problems that the freezing of tasks tends to expose, although they are not directly related to it. For example, if request_firmware() is called from a device driver's .resume() routine, it will time out and eventually fail, because the user land process that should respond to the request is frozen at this point. So, seemingly, the failure is due to the freezing of tasks. Suppose, however, that the firmware file is located on a filesystem accessible only through another device that hasn't been resumed yet. In that case, request_firmware() will fail regardless of whether or not the freezing of tasks is used. Consequently, the problem is not really related to the freezing of tasks, since it generally exists anyway.

A driver must have all the firmware images it may need in RAM before suspend() is called. If keeping them is not practical, for example due to their size, they must be requested early enough using the suspend notifier API described in notifiers.txt.
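The recommended pattern is to load firmware into RAM before the freezer runs, so that .resume() never needs user space. The sketch below models it as a suspend/hibernation notifier in user space. The event names mirror the kernel's PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION, PM_SUSPEND_PREPARE and PM_POST_SUSPEND notifications. Everything else is hypothetical: the callback signature, its 0/-1 return convention, the `fw_cache` structure and `load_firmware_from_userspace()`. A real driver would register its callback with register_pm_notifier() and fetch the data with request_firmware().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Event names mirror the kernel's PM notifier events; the rest is a model. */
enum pm_event {
    PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION,
    PM_SUSPEND_PREPARE, PM_POST_SUSPEND
};

static bool userspace_frozen;   /* true while the freezer holds user space */

/* Hypothetical in-RAM firmware cache. */
struct fw_cache {
    char data[64];
    bool valid;
};

/* Stand-in for request_firmware(): it needs a live user-space helper, so it
 * fails (in reality, times out) while user space is frozen. */
static bool load_firmware_from_userspace(struct fw_cache *fw)
{
    if (userspace_frozen)
        return false;
    strncpy(fw->data, "example-firmware-blob", sizeof fw->data - 1);
    fw->data[sizeof fw->data - 1] = '\0';
    fw->valid = true;
    return true;
}

/* Notifier callback: PREPARE events arrive before tasks are frozen,
 * POST events after they have been thawed. Returns 0 on success. */
static int fw_pm_notify(struct fw_cache *fw, enum pm_event event)
{
    assert(fw != NULL);
    switch (event) {
    case PM_HIBERNATION_PREPARE:
    case PM_SUSPEND_PREPARE:
        return (fw->valid || load_firmware_from_userspace(fw)) ? 0 : -1;
    case PM_POST_HIBERNATION:
    case PM_POST_SUSPEND:
        fw->valid = false;      /* mark the RAM copy as no longer needed */
        return 0;
    }
    return 0;
}

/* .resume() analogue: relies only on the copy already in RAM. */
static bool device_resume(const struct fw_cache *fw)
{
    return fw->valid;
}
```

Fetching at PREPARE time succeeds because user space is still running. Resume then works from the cached copy even while user space is frozen.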