Linux Kernel Crash Analysis

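Kernel crashes come up regularly at work. This article analyzes one such crash, working from the information the kernel printed when it went down. The kernel version used is Linux 2.6.32.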

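Over its lifetime, which can range from a few milliseconds to several months, almost every process interacts with the kernel, for example when a user-space program enters kernel space through a system call. While it runs in the kernel, the process no longer uses its user-space stack but its corresponding kernel stack. For each process, the Linux kernel packs two different data structures into a single memory area allocated for that process: one is the kernel-mode process stack, and the other is thread_info, a small structure closely tied to the process descriptor and known as the thread descriptor. The kernel stack is normally 8 KB, that is, 8192 bytes, occupying two pages (with 4 KB pages). In the linux-2.6.32 kernel, the file thread_info.h defines the kernel stack size: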

#define THREAD_SIZE 8192

The kernel uses the following union, defined in include/linux/sched.h, to represent a process's thread descriptor together with its kernel stack.

union thread_union {
    struct thread_info thread_info;
    unsigned long stack[THREAD_SIZE/sizeof(long)];
};

This structure is a union. C textbooks explain unions; The C Programming Language describes them as follows:

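1) a union is a structure;

2) all of its members have offset 0 from the base address;

3) the structure is big enough to hold the "widest" member;

4) its alignment is appropriate for all of its members.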

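From this description, thread_union is 8192 bytes in size, which is the size of the stack array, whose elements are of type unsigned long. Because the members of a union share the same memory, it is commonly assumed that only one member of a union instance can be used at a time, since writing one member overwrites the others. That is strictly true only when the members are the same size; when they differ in size, a write overwrites only the bytes that the written member occupies. For thread_union, both members can therefore be used at the same time, provided the addresses of the two members are obtained correctly: thread_info occupies the lowest addresses of the region, while the stack grows down from the highest.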
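The union properties listed above can be checked in user space. The following is a minimal sketch, not kernel code: demo_thread_info is a hypothetical, much smaller stand-in for the real struct thread_info, and DEMO_THREAD_SIZE simply mirrors the 8192-byte value quoted earlier.

```c
#include <stddef.h>

#define DEMO_THREAD_SIZE 8192   /* mirrors THREAD_SIZE from the article */

/* Hypothetical stand-in for the kernel's struct thread_info. */
struct demo_thread_info {
    unsigned long flags;
    int preempt_count;
    void *task;
};

/* Same shape as union thread_union: both members start at offset 0. */
union demo_thread_union {
    struct demo_thread_info thread_info;
    unsigned long stack[DEMO_THREAD_SIZE / sizeof(long)];
};

/* Size of the union: that of its widest member, the stack array. */
size_t demo_union_size(void)
{
    return sizeof(union demo_thread_union);
}

/* Bytes of the region not covered by thread_info, i.e. the room
 * that remains for stack data above it. */
size_t demo_stack_room(void)
{
    return sizeof(union demo_thread_union) - sizeof(struct demo_thread_info);
}
```

Here demo_union_size() returns 8192, and offsetof() reports 0 for both members, which is exactly why thread_info and the low end of the stack array name the same bytes.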

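When a process uses too much stack space in the kernel, the kernel stack overflows into the thread_info area, which causes serious problems (a system reboot). Typical causes are recursion that nests too deeply and data structures defined inside a function that are too large.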

Figure: the relationship between thread_info, task_struct, and the kernel stack of a process

Now let us look at the thread_info structure:

struct thread_info {
    unsigned long flags;              /* low-level flags */
    int preempt_count;                /* 0 => preemptible, <0 => bug */
    mm_segment_t addr_limit;          /* process address space limit */
    struct task_struct *task;         /* pointer to the current process's task_struct */
    struct exec_domain *exec_domain;  /* execution domain */
    __u32 cpu;                        /* current cpu */
    __u32 cpu_domain;                 /* cpu domain */
    struct cpu_context_save cpu_context; /* cpu context */
    __u32 syscall;                    /* syscall number */
    __u8 used_cp[16];                 /* thread used copro */
    unsigned long tp_value;
    struct crunch_state crunchstate;
    union fp_state fpstate __attribute__((aligned(8)));
    union vfp_state vfpstate;
#ifdef CONFIG_ARM_THUMBEE
    unsigned long thumbee_state;      /* ThumbEE handler base register */
#endif
    struct restart_block restart_block; /* used for the signal mechanism (restarting interrupted system calls) */
};

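PS: (1) flags holds various process-specific flags. The two most important are TIF_SIGPENDING, which is set when the process has pending signals, and TIF_NEED_RESCHED, which indicates that the scheduler should pick another process to run in place of this one.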
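Each TIF_ name is a bit number within flags. As a rough user-space illustration of how such bits are set and tested, the sketch below uses hypothetical DEMO_ constants and helper names; the bit numbers are assumed for illustration and should be checked against asm/thread_info.h in the tree at hand.

```c
/* Bit numbers assumed for illustration only. */
#define DEMO_TIF_SIGPENDING   0
#define DEMO_TIF_NEED_RESCHED 1

/* Return 1 if bit 'nr' is set in 'flags', else 0. */
int demo_test_flag(unsigned long flags, int nr)
{
    return (int)((flags >> nr) & 1UL);
}

/* Return 'flags' with bit 'nr' set. */
unsigned long demo_set_flag(unsigned long flags, int nr)
{
    return flags | (1UL << nr);
}
```

Setting DEMO_TIF_NEED_RESCHED on an empty flags word and then testing DEMO_TIF_SIGPENDING gives 0, because each flag lives in its own bit.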

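With this background, let us look at what the kernel prints when it dumps the stack. The output below comes from a case encountered at work. It shows the kernel stack information: the pc is inside dev_get_by_flags, and the kernel virtual address that could not be accessed is 45685516. Addresses that the kernel can normally access have the form 0xcxxxxxxx.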

Unable to handle kernel paging request at virtual address 45685516

pgd = c65a4000

[45685516] *pgd=00000000

Internal error: Oops: 1 [#1]

last sysfs file: /sys/devices/form/tpm/cfg_l3/l3_rule_add

Modules linked in: splic mmp(P)

CPU: 0 Tainted: P (2.6.32.11 #42)

PC is at dev_get_by_flags+0xfc/0x140

LR is at dev_get_by_flags+0xe8/0x140

pc : [<c06bee24>] lr : [<c06bee10>] psr: 20000013

sp : c07e9c28 ip : 00000000 fp : c07e9c64

r10: c6bcc560 r9 : c646a220 r8 : c66a0000

r7 : c6a00000 r6 : c0204e56 r5 : 30687461 r4 : 45685516

r3 : 00000000 r2 : 00000010 r1 : c0204e56 r0 : ffffffff

Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel

Control: 0005397f Table: 065a4000 DAC: 00000017

Process swapper (pid: 0, stack limit = 0xc07e8270)

Stack: (0xc07e9c28 to 0xc07ea000)

9c20: c0204e56 c6a00000 45685516 c69ffff0 c69ffff0 c69ffff0

9c40: c6a00000 30687461 c66a0000 c6a00000 00000007 c64b210c c07e9d24 c07e9c68

9c60: c071f764 c06bed38 c66a0000 c66a0000 c6a00000 c6a00000 c66a0000 c6a00000

9c80: c07e9cfc c07e9c90 c03350d4 c0334b2c 00000034 00000006 00000100 c64b2104

9ca0: 0000c4fb c0243ece c66a0000 c0beed04 c033436c c646a220 c07e9cf4 00000000

9cc0: c66a0000 00000003 c0bee8e8 c0beed04 c07e9d24 c07e9ce0 c06e4f5c 00004c68

9ce0: 00000000 faa9fea9 faa9fea9 00000000 00000000 c6bcc560 c0335138 c646a220

9d00: c66a0000 c64b2104 c085ffbc c66a0000 c0bee8e8 00000000 c07e9d54 c07e9d28

9d20: c071f9a0 c071ebc0 00000000 c071ebb0 80000000 00000007 c67fb460 c646a220

9d40: c0bee8c8 00000608 c07e9d94 c07e9d58 c002a100 c071f84c c0029bb8 80000000

9d60: c07e9d84 c0beee0c c0335138 c66a0000 c646a220 00000000 c4959800 c4959800

9d80: c67fb460 00000000 c07e9dc4 c07e9d98 c078f0f4 c0029bc8 00000000 c0029bb8

9da0: 80000000 c07e9dbc c6b8d340 c66a0520 00000000 c646a220 c07e9dec c07e9dc8

9dc0: c078f450 c078effc 00000000 c67fb460 c6b8d340 00000000 c67fb460 c64b20f2

9de0: c07e9e24 c07e9df0 c078fb60 c078f130 00000000 c078f120 80000000 c0029a94

9e00: 00000806 c6b8d340 c0bee818 00000001 00000000 c4959800 c07e9e64 c07e9e28

9e20: c002a030 c078f804 c64b2070 00000000 c64b2078 ffc45000 c64b20c2 c085c2dc

9e40: 00000000 c085c2c0 00000000 c0817398 00086c2e c085c2c4 c07e9e9c c07e9e68

9e60: c06c2684 c0029bc8 00000001 00000040 00000000 c085c2dc c085c2c0 00000001

9e80: 0000012c 00000040 c085c2d0 c0bee818 c07e9ed4 c07e9ea0 c00284e0 c06c2608

9ea0: bf00da5c 00086c30 00000000 00000001 c097e7d4 c07e8000 00000100 c08162d8

9ec0: 00000002 c097e7a0 c07e9f14 c07e9ed8 c00283d0 c0028478 56251311 00023c88

9ee0: c07e9f0c 00000003 c08187ac 00000018 00000000 01000000 c07ebc70 00023cbc

9f00: 56251311 00023c88 c07e9f24 c07e9f18 c03391e8 c0028348 c07e9f3c c07e9f28

9f20: c0028070 c03391b0 ffffffff 0000001f c07e9f94 c07e9f40 c002d4d0 c0028010

9f40: 00000000 00000001 c07e9f88 60000013 c07e8000 c07ebc78 c0868784 c07ebc70

9f60: 00023cbc 56251311 00023c88 c07e9f94 c07e9f98 c07e9f88 c025c3e4 c025c3f4

9f80: 60000013 ffffffff c07e9fb4 c07e9f98 c025c578 c025c3cc 00000000 c0981204

9fa0: c0025ca0 c0d01140 c07e9fc4 c07e9fb8 c0032094 c025c528 c07e9ff4 c07e9fc8

9fc0: c0008918 c0032048 c0008388 00000000 00000000 c0025ca0 00000000 00053975

9fe0: c0868834 c00260a4 00000000 c07e9ff8 00008034 c0008708 00000000 00000000

Backtrace:

[<c06bed28>] (dev_get_by_flags+0x0/0x140) from [<c071f764>] (arp_process+0xbb4/0xc74)

r7:c64b210c r6:00000007 r5:c6a00000 r4:c66a0000
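The symbol+offset/size notation in the pc, lr, and backtrace lines can be cross-checked with simple arithmetic: subtracting the offset from the address gives the start of the function. The helper names below are hypothetical and only illustrate that calculation.

```c
#include <stdint.h>

/* Start address of the function, given an address and its "+0x<offset>". */
uint32_t demo_symbol_start(uint32_t addr, uint32_t offset)
{
    return addr - offset;
}

/* Bytes left in the function after 'offset', given its "/0x<size>". */
uint32_t demo_bytes_to_end(uint32_t offset, uint32_t size)
{
    return size - offset;
}
```

For pc = c06bee24 at dev_get_by_flags+0xfc, the start works out to c06bed28, the same address the backtrace prints for dev_get_by_flags+0x0; lr = c06bee10 at +0xe8 gives the same start.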

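(1) First, find where in the kernel this stack information is printed: it comes from the function __do_kernel_fault in fault.c. The message "Unable to handle kernel paging request at virtual address 45685516" in the output above means that this address cannot be accessed from kernel space.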

static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr, struct pt_regs *regs)
{
    /*
     * Are we prepared to handle this kernel fault?
     */
    if (fixup_exception(regs))
        return;

    /*
     * No handler, we'll have to terminate things with extreme prejudice.
     */
    bust_spinlocks(1);
    printk(KERN_ALERT
        "Unable to handle kernel %s at virtual address %08lx\n",
        (addr < PAGE_SIZE) ? "NULL pointer dereference" : "paging request", addr);

    show_pte(mm, addr);
    die("Oops", regs, fsr);
    bust_spinlocks(0);
    do_exit(SIGKILL);
}
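The wording of the first line of the oops depends only on the faulting address. The sketch below mirrors that choice in user space; the DEMO_ constants are assumptions (4 KB pages and a 3G/1G split with kernel addresses starting at 0xc0000000, matching the 0xcxxxxxxx remark earlier), and the function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE   4096u        /* assumed 4 KB pages */
#define DEMO_PAGE_OFFSET 0xc0000000u  /* assumed 3G/1G user/kernel split */

/* Mirrors the ternary in __do_kernel_fault: a fault inside the first page
 * is reported as a NULL pointer dereference, anything else as a paging request. */
const char *demo_fault_kind(uint32_t addr)
{
    return (addr < DEMO_PAGE_SIZE) ? "NULL pointer dereference"
                                   : "paging request";
}

/* 1 if addr lies in the kernel part of the address space under the assumed split. */
int demo_is_kernel_address(uint32_t addr)
{
    return addr >= DEMO_PAGE_OFFSET;
}
```

With these assumptions, 0x45685516 is classified as a "paging request" and is not a kernel address, whereas the pc value 0xc06bee24 is.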

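(2) The two pgd lines in the output (pgd = c65a4000 and [45685516] *pgd=00000000) are printed by the function show_pte. They involve the page global directory and page tables, which are not analyzed here and will be covered later.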

void show_pte(struct mm_struct *mm, unsigned long addr)
{
    pgd_t *pgd;

    if (!mm)
        mm = &init_mm;

    printk(KERN_ALERT "pgd = %p\n", mm->pgd);
    pgd = pgd_offset(mm, addr);
    printk(KERN_ALERT "[%08lx] *pgd=%08lx", addr, pgd_val(*pgd));

    ……………………
}
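As a rough illustration of what pgd_offset does with the faulting address, the sketch below computes which first-level entry show_pte reads. The constants are assumptions about the classic two-level ARM layout (each pgd entry covers 2 MB and is 8 bytes wide) and the function names are hypothetical; the page-table analysis itself is left for later, as noted above.

```c
#include <stdint.h>

#define DEMO_PGDIR_SHIFT    21  /* assumed: one pgd entry maps 2 MB */
#define DEMO_PGD_ENTRY_SIZE 8u  /* assumed: pgd_t holds two 32-bit words */

/* Index of the first-level entry that covers 'addr'. */
uint32_t demo_pgd_index(uint32_t addr)
{
    return addr >> DEMO_PGDIR_SHIFT;
}

/* Virtual address of that entry, given the pgd base printed as "pgd = ...". */
uint32_t demo_pgd_entry_addr(uint32_t pgd_base, uint32_t addr)
{
    return pgd_base + demo_pgd_index(addr) * DEMO_PGD_ENTRY_SIZE;
}
```

Under these assumptions, address 45685516 selects entry 0x22b, located at c65a5158 when the pgd base is c65a4000. The printed value *pgd=00000000 says that entry is empty, so no mapping exists for the address.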

(3) The die function obtains the address of the thread_info structure:

struct thread_info *thread = current_thread_info();

static inline struct thread_info *current_thread_info(void)
{
    register unsigned long sp asm ("sp");
    return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

With sp = 0xc07e9c28, current_thread_info yields the address of thread_info:

(0xc07e9c28 & 0xffffe000) = 0xc07e8000 (the address of thread_info, which is also the bottom of the stack area, i.e. its lowest address)
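The masking step can be reproduced in user space. This is a minimal sketch with a hypothetical function name; it takes sp as a plain 32-bit number rather than reading the real stack pointer register.

```c
#include <stdint.h>

#define DEMO_THREAD_SIZE 8192u  /* mirrors THREAD_SIZE */

/* Clear the low 13 bits of sp: every address inside one 8 KB,
 * 8 KB-aligned stack region maps to the same base, where thread_info sits. */
uint32_t demo_thread_info_base(uint32_t sp)
{
    return sp & ~(DEMO_THREAD_SIZE - 1u);
}
```

demo_thread_info_base(0xc07e9c28) gives 0xc07e8000, and so does any other sp value between 0xc07e8000 and 0xc07e9fff. This only works because the region is both THREAD_SIZE bytes long and THREAD_SIZE aligned.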

(4) The following lines are printed by the __die function:

last sysfs file: /sys/devices/form/tpm/cfg_l2/l2_rule_add

r7 : c6a00000 r6 : c0204e56 r5 : 30687461 r4 : 30687461

Call chain: die("Oops", regs, fsr) -> __die(str, err, thread, regs)

The definition of __die is as follows:

static void __die(const char *str, int err, struct thread_info *thread, struct pt_regs *regs)
{
    struct task_struct *tsk = thread->task;
    static int die_counter;

    /* Internal error: Oops: 1 [#1] */
    printk(KERN_EMERG "Internal error: %s: %x [#%d]" S_PREEMPT S_SMP "\n",
           str, err, ++die_counter);

    /* last sysfs file: /sys/devices/form/tpm/cfg_l2/l2_rule_add */
    sysfs_printk_last_file();

    /* Modules loaded in the kernel: Modules linked in: splic mmp(P) */
    print_modules();

    /* Print the register contents */
    __show_regs(regs);

    /*
     * Process swapper (pid: 0, stack limit = 0xc07e8270)
     * tsk->comm: the comm member of task_struct holds the executable name
     * with the path stripped. swapper here is the idle process, pid 0,
     * which creates the kernel init process. stack limit = 0xc07e8270
     * points to the end address of thread_info.
     */
    printk(KERN_EMERG "Process %.*s (pid: %d, stack limit = 0x%p)\n",
           TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), thread + 1);

    /* dump_mem prints the memory from the current sp up to the end of the stack area */
    if (!user_mode(regs) || in_interrupt()) {
        dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp, THREAD_SIZE + (unsigned long)task_stack_page(tsk));
        dump_backtrace(regs, tsk);
        dump_instr(KERN_EMERG, regs);
    }
}

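This function relies mainly on the relationship between thread_info, task_struct, and sp. The stack member of task_struct points to the bottom of the stack area (its lowest address), which is also the address of the corresponding thread_info structure. Stack data is stored downward starting from that address + 8 KB, and sp points to the current top of the stack. The expression (unsigned long)task_stack_page(tsk) uses this macro:

#define task_stack_page(task) ((task)->stack)

Given a task_struct, it returns the bottom of the stack area, which is the thread_info address.

#define task_thread_info(task) ((struct thread_info *)(task)->stack)

Given a task_struct, this macro returns the thread_info pointer.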
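The addresses that __die prints can be tied together numerically. The sketch below is a user-space model with hypothetical names. The thread_info size of 0x270 bytes is not stated anywhere in the output; it is inferred here from the printed stack limit (0xc07e8270) minus the base (0xc07e8000).

```c
#include <stdint.h>

#define DEMO_THREAD_SIZE 8192u  /* mirrors THREAD_SIZE */

struct demo_stack_view {
    uint32_t base;   /* task->stack: lowest address, where thread_info starts */
    uint32_t limit;  /* thread + 1: first byte past thread_info */
    uint32_t top;    /* base + THREAD_SIZE: where the stack dump ends */
    uint32_t used;   /* bytes between sp and top */
    uint32_t free;   /* bytes between limit and sp */
};

/* Derive the stack region layout from sp and the size of thread_info. */
struct demo_stack_view demo_make_stack_view(uint32_t sp, uint32_t thread_info_size)
{
    struct demo_stack_view v;

    v.base  = sp & ~(DEMO_THREAD_SIZE - 1u);
    v.limit = v.base + thread_info_size;
    v.top   = v.base + DEMO_THREAD_SIZE;
    v.used  = v.top - sp;
    v.free  = sp - v.limit;
    return v;
}
```

For sp = 0xc07e9c28 this reproduces the "Stack: (0xc07e9c28 to 0xc07ea000)" range and the stack limit 0xc07e8270, with 0x3d8 bytes in use and 0x19b8 bytes still free. An sp below the limit would mean the stack has already overwritten thread_info.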

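(5) The dump_backtrace function

This function prints the function call chain. fp is the frame pointer, which is used to trace back through the program, following the calling functions in reverse. The function mainly validates fp to see whether a backtrace is possible; if it is, it calls c_backtrace, which is implemented in assembly in arch/arm/lib/backtrace.S.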

static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
{
    unsigned int fp, mode;
    int ok = 1;

    printk("Backtrace: ");

    if (!tsk)
        tsk = current;

    if (regs) {
        fp = regs->ARM_fp;
        mode = processor_mode(regs);
    } else if (tsk != current) {
        fp = thread_saved_fp(tsk);
        mode = 0x10;
    } else {
        asm("mov %0, fp" : "=r" (fp) : : "cc");
        mode = 0x10;
    }

    if (!fp) {
        printk("no frame pointer");
        ok = 0;
    } else if (verify_stack(fp)) {
        printk("invalid frame pointer 0x%08x", fp);
        ok = 0;
    } else if (fp < (unsigned long)end_of_stack(tsk))
        printk("frame pointer underflow");
    printk("\n");

    if (ok)
        c_backtrace(fp, mode);
}
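To make the frame-pointer walk concrete, the sketch below follows the chain through a few words copied from the stack dump earlier in the article, starting at fp = c07e9c64. It is a user-space model with hypothetical names, not the c_backtrace implementation. The frame layout it assumes (pc at [fp], lr at [fp-4], sp at [fp-8], caller's fp at [fp-12]) is the APCS layout, which is consistent with the values seen in this dump.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_word { uint32_t addr, val; };

/* Words copied from the stack dump, four per frame:
 * [fp-12] caller's fp, [fp-8] sp, [fp-4] lr, [fp] pc. */
static const struct demo_word demo_dump[] = {
    { 0xc07e9c58u, 0xc07e9d24u }, { 0xc07e9c5cu, 0xc07e9c68u },
    { 0xc07e9c60u, 0xc071f764u }, { 0xc07e9c64u, 0xc06bed38u },
    { 0xc07e9d18u, 0xc07e9d54u }, { 0xc07e9d1cu, 0xc07e9d28u },
    { 0xc07e9d20u, 0xc071f9a0u }, { 0xc07e9d24u, 0xc071ebc0u },
    { 0xc07e9d48u, 0xc07e9d94u }, { 0xc07e9d4cu, 0xc07e9d58u },
    { 0xc07e9d50u, 0xc002a100u }, { 0xc07e9d54u, 0xc071f84cu },
};

/* Look up one word of the copied dump; returns 0 if it was not copied. */
static int demo_read(uint32_t addr, uint32_t *val)
{
    size_t i;

    for (i = 0; i < sizeof(demo_dump) / sizeof(demo_dump[0]); i++) {
        if (demo_dump[i].addr == addr) {
            *val = demo_dump[i].val;
            return 1;
        }
    }
    return 0;
}

/* Follow the frame chain from fp, storing each frame's saved lr.
 * Stops when a word is missing or the chain stops moving upward. */
int demo_walk_frames(uint32_t fp, uint32_t *lrs, int max)
{
    int n = 0;
    uint32_t lr, prev_fp;

    while (n < max && demo_read(fp - 4u, &lr) && demo_read(fp - 12u, &prev_fp)) {
        lrs[n++] = lr;
        if (prev_fp <= fp)  /* callers live at higher addresses */
            break;
        fp = prev_fp;
    }
    return n;
}
```

Starting from c07e9c64, the first saved lr is c071f764, which is the arp_process+0xbb4 return address shown in the backtrace line; the next two frames give c071f9a0 and c002a100. The walk then stops because the words for the following frame were not copied into the table.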

(6) dump_instr

Based on the pc and the instruction mode, it prints the instruction words currently being executed:

Code: 0a000008 e5944000 e2545000 0a000005 (e4153010)
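The Code line shows the four instruction words before pc followed by the word at pc in parentheses. The sketch below rebuilds that layout in user space for ARM mode; the function name is hypothetical, and it formats values handed to it instead of reading kernel memory as dump_instr does.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format five 32-bit instruction words as a "Code:" line.
 * words[0..3] precede pc; words[4] is the instruction at pc
 * and is wrapped in parentheses. Returns what snprintf returns. */
int demo_format_code(const uint32_t words[5], char *buf, size_t len)
{
    return snprintf(buf, len,
                    "Code: %08" PRIx32 " %08" PRIx32 " %08" PRIx32
                    " %08" PRIx32 " (%08" PRIx32 ")",
                    words[0], words[1], words[2], words[3], words[4]);
}
```

Passing the five words from the crash above reproduces the Code line, with e4153010 marked as the instruction that was executing when the fault occurred.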

Function call relationships in the kernel

Original publication date: 2014-07-28

This article comes from "Linux China" (linux中國), a Yunqi (雲栖) partner.
