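Anyone who writes C/C++ has probably run into segmentation faults more than once. I recently had some time, so here is a summary of the methods I use to analyze this kind of error.

1. What is a segmentation fault

In one sentence: a segmentation fault occurs when a program accesses memory outside the address space the system has set up for it, for example an address that does not exist, an address protected by the system, or a read-only address that it tries to write. Here is a precise definition of "segmentation fault" (https://en.wikipedia.org/wiki/Segmentation_fault):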

In computing, a segmentation fault (often shortened to segfault) or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory (a memory access violation). On standard x86 computers, this is a form of general protection fault. The OS kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own,[1] but otherwise the OS default signal handler is used, generally causing abnormal termination of the process (a program crash), and sometimes a core dump.

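2. Causes of segmentation faults

Accessing a memory address that does not exist
Accessing a memory address protected by the system
Writing to a read-only memory address
Stack overflow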
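
Two of the causes listed above can be reproduced in a controlled way. The following is a minimal sketch, assuming Linux and a compiler that supports C++11; the names BadAccess and signal_from_bad_access are hypothetical helpers introduced only for this illustration. Each invalid access runs in a forked child so that the parent survives and can report which signal killed the child.

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

// Which kind of invalid access the child process should attempt.
enum class BadAccess { Unmapped, ReadOnly };

// Performs one invalid write in a forked child and returns the number of the
// signal that killed it. Returns 0 if the child was not killed by a signal
// and -1 if fork or waitpid failed.
int signal_from_bad_access(BadAccess kind)
{
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        // Child: use the default SIGSEGV action and suppress core files.
        signal(SIGSEGV, SIG_DFL);
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);

        long page = sysconf(_SC_PAGESIZE);
        if (page <= 0) _exit(2);

        int prot = (kind == BadAccess::ReadOnly) ? PROT_READ
                                                 : (PROT_READ | PROT_WRITE);
        void *mem = mmap(nullptr, page, prot,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) _exit(2);

        // For the "unmapped" case, remove the mapping so the address
        // no longer exists in this process.
        if (kind == BadAccess::Unmapped) munmap(mem, page);

        volatile int *p = static_cast<volatile int *>(mem);
        *p = 1;      // invalid write: the kernel delivers SIGSEGV here
        _exit(0);    // reached only if the write unexpectedly succeeded
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux both signal_from_bad_access(BadAccess::Unmapped) and signal_from_bad_access(BadAccess::ReadOnly) are expected to return SIGSEGV. Stack overflow is left out because how deep a recursion must go depends on the stack limit and on compiler optimizations.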

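3. Methods for analyzing segmentation faults

Item 1: Logging

The simplest and crudest method, and often an effective one, although sometimes the logs reveal nothing.

To make this method convenient, wrap the printf calls in the conditional compilation directives #ifdef DEBUG and #endif. Compiling with -DDEBUG then enables the debug output; without that flag, nothing is printed.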
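
One way to package the #ifdef DEBUG idea is a small macro. This is only a sketch: DEBUG_LOG and checked_divide are hypothetical names chosen for the example. It writes to stderr rather than stdout because stderr is unbuffered by default, so messages printed just before a crash are less likely to be lost.

```cpp
#include <cstdio>

// DEBUG_LOG (a hypothetical helper name) prints a message prefixed with the
// source file and line when compiled with -DDEBUG, and expands to an empty
// statement otherwise.
#ifdef DEBUG
#define DEBUG_LOG(...)                                            \
    do {                                                          \
        std::fprintf(stderr, "[%s:%d] ", __FILE__, __LINE__);     \
        std::fprintf(stderr, __VA_ARGS__);                        \
        std::fputc('\n', stderr);                                 \
    } while (0)
#else
#define DEBUG_LOG(...) do { } while (0)
#endif

// Example function instrumented with the macro.
int checked_divide(int a, int b)
{
    DEBUG_LOG("checked_divide(%d, %d)", a, b);
    if (b == 0) {
        DEBUG_LOG("division by zero, returning 0");
        return 0;
    }
    return a / b;
}
```

Building with g++ -g -DDEBUG turns the messages on; building without -DDEBUG compiles them out entirely, so the release binary pays no cost for them.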

Item 2: A custom SIGSEGV handler with backtrace()

Sample code

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <execinfo.h>
#include <signal.h>

int *result = 0;

void add(int a, int b)
{
    *result = a + b;
}

void subtract(int a, int b)
{
    *result = a - b;
}

void handler(int sig) {
  void *array[10];
  size_t size;

  // get void*'s for all entries on the stack
  size = backtrace(array, 10);

  // print out all the frames to stderr
  fprintf(stderr, "Error: signal %d:\n", sig);
  backtrace_symbols_fd(array, size, STDERR_FILENO);
  exit(1);
}

int main()
{
    signal(SIGSEGV, handler);   // install our handler

    int ret;
    int pagesize;

    // Get the size of one OS page, usually 4 KB == 4096
    pagesize = sysconf(_SC_PAGE_SIZE);
    if (pagesize == -1) {
        perror("sysconf");
        return -1;
    }
    printf("pagesize is: %d Byte\n", pagesize);

    // Allocate one page of page-aligned memory; result will be an address divisible by the page size (0x1000 == 4096)
    ret = posix_memalign((void**)&result, pagesize, pagesize);
    if (ret != 0) {
        // posix_memalign returns the error code and does not set errno, so perror cannot be used
        printf("posix_memalign fail, ret %d\n", ret);
        return -1;
    }
    printf("posix_memalign mem %p\n", (void*)result);

    add(1, 1); // writes the result to *result
    printf("the result is %d\n", *result);

    // Protect the memory that result points to by making it read-only
    ret = mprotect(result, pagesize, PROT_READ);
    if (ret == -1) {
        perror("mprotect");
        return -1;
    }

    subtract(1, 1); // writes to *result, which is now read-only, triggering a segmentation fault
    printf("the result is %d\n", *result);

    free(result);
    return 0;
}

Compile with the -g option

g++ -g  -o mproject_test mproject_test.cc

Run command

./mproject_test 2>&1 |cut -d '[' -f 2|grep -o '0x[0-9a-z].*' | xargs addr2line -Cfe mproject_test

Output

handler(int)
??:0
??
??:0
subtract(int, int)
??:0
main
??:0
??
??:0
_start
??:0
??
??:0

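The output shows that the crash occurred in the subtract function.

Because the SIGSEGV signal is caught, no core file is produced and nothing is recorded in dmesg. addr2line converts the faulting addresses into the corresponding function names and source locations (on Ubuntu I never got line numbers, while on CentOS I did).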
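
A variation on the handler approach is to install it with sigaction and the SA_SIGINFO flag, which also gives the handler the address whose access faulted (si_addr). The sketch below is an assumption-laden alternative, not a replacement for the sample above: format_hex, safe_write, segv_info_handler and install_segv_info_handler are hypothetical names. It restricts itself to write() and _exit(), which are async-signal-safe, whereas fprintf() and exit() are not guaranteed to be safe inside a signal handler.

```cpp
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

// Formats value as "0x<hex>" into buf without using printf.
// buf must hold at least 19 bytes. Returns the length without the NUL.
size_t format_hex(uintptr_t value, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[16];
    size_t n = 0;
    do {
        tmp[n++] = digits[value & 0xf];   // collect digits, lowest first
        value >>= 4;
    } while (value != 0);

    size_t len = 0;
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0) buf[len++] = tmp[--n];  // emit digits, highest first
    buf[len] = '\0';
    return len;
}

// Writes n bytes to stderr, ignoring short writes and errors.
static void safe_write(const char *s, size_t n)
{
    ssize_t rc = write(STDERR_FILENO, s, n);
    (void)rc;
}

// SA_SIGINFO-style handler: info->si_addr is the address that faulted.
void segv_info_handler(int, siginfo_t *info, void *)
{
    static const char prefix[] = "SIGSEGV at address ";
    char addr[19];
    size_t len = format_hex(reinterpret_cast<uintptr_t>(info->si_addr), addr);
    safe_write(prefix, sizeof(prefix) - 1);
    safe_write(addr, len);
    safe_write("\n", 1);
    _exit(1);
}

// Installs the handler for SIGSEGV. Returns 0 on success, -1 on failure.
int install_segv_info_handler()
{
    struct sigaction sa = {};
    sa.sa_sigaction = segv_info_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr);
}
```

Calling install_segv_info_handler() at the start of main takes the place of the signal(SIGSEGV, handler) call. The reported address is the data address that was accessed, not the address of the faulting instruction, so it complements a backtrace rather than replacing it.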

Item 3: dmesg + objdump

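Comment out signal(SIGSEGV, handler); in the C code so that the program no longer handles SIGSEGV. The crash then leaves a record in dmesg, and objdump -d can disassemble the binary so that you can locate the address where the crash occurred (do not compile with -O, otherwise the optimizer changes the generated assembly).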

View the segfault record with dmesg

dmesg | tail
[257215.924911] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[257526.392613] e1000: eth0 NIC Link is Down
[257528.397505] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[257669.623324] mproject_test[29542]: segfault at 180c000 ip 000000000040083c sp 00007fffcb1018e0 error 7 in mproject_test[400000+1000]

The address of the faulting instruction (ip) is 000000000040083c

Disassemble with objdump

objdump -d mproject_test > mproject_test.dump

Analyze the disassembly file

$ vi mproject_test.dump
0000000000400825 <_Z8subtractii>:
  400825:       55                      push   %rbp
  400826:       48 89 e5                mov    %rsp,%rbp
  400829:       89 7d fc                mov    %edi,-0x4(%rbp)
  40082c:       89 75 f8                mov    %esi,-0x8(%rbp)
  40082f:       48 8b 05 42 08 20 00    mov    0x200842(%rip),%rax        # 601078 <result>
  400836:       8b 55 fc                mov    -0x4(%rbp),%edx
  400839:       2b 55 f8                sub    -0x8(%rbp),%edx
  40083f:       c3                      retq

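Address 40083c falls between 400839 and 40083f, so the faulting instruction is inside the subtract function (_Z8subtractii).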
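
The dmesg segfault line carries more than the instruction pointer. On x86, the number after "error" is the page-fault error code, whose low three bits say whether the page was mapped, whether the access was a write, and whether it came from user mode. The trailing name[base+size] field gives the load address of the object, which matters for shared libraries and position-independent executables, where ip must be turned into an offset before looking it up in the objdump output. The sketch below illustrates both computations; describe_fault_error and offset_in_object are hypothetical helper names.

```cpp
#include <stdint.h>
#include <string>

// Describes the low three bits of the x86 page-fault error code that the
// kernel prints as "error N" in a segfault line:
//   bit 0: 0 = page not mapped, 1 = protection violation on a mapped page
//   bit 1: 0 = read access,     1 = write access
//   bit 2: 0 = kernel mode,     1 = user mode
std::string describe_fault_error(unsigned code)
{
    std::string out;
    out += (code & 0x4) ? "user-mode " : "kernel-mode ";
    out += (code & 0x2) ? "write access" : "read access";
    out += (code & 0x1) ? " to a mapped page (protection violation)"
                        : " to an unmapped page";
    return out;
}

// Offset of the faulting instruction inside the mapped object, computed from
// the "ip" value and the base taken from the "name[base+size]" field.
uint64_t offset_in_object(uint64_t ip, uint64_t base)
{
    return ip - base;
}
```

For the line shown in this section, describe_fault_error(7) yields "user-mode write access to a mapped page (protection violation)", which matches a write to a page that mprotect made read-only, and offset_in_object(0x40083c, 0x400000) yields 0x83c. Because this binary is loaded at its link address, ip can also be looked up directly in the disassembly.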

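Item 4: Using catchsegv

The catchsegv command is designed specifically to catch segmentation faults. It uses the preload mechanism (LD_PRELOAD) of the dynamic loader (ld-linux.so) to load a prebuilt library (/lib/libSegFault.so) that captures diagnostic information when a segmentation fault occurs.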

$ catchsegv ./mproject_test

Backtrace:
??:0(_Z8subtractii)[0x40083c]
??:0(main)[0x4009ab]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7ff4721747ed]
??:0(_start)[0x400719]

Item 5: gdb + core

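This approach is also very common: find where the SIGSEGV or exception occurred, then run bt to see the code path that led to the crash. It requires a sufficiently large core file size limit, so I will not go into it here.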
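
The core file size limit is usually raised in the shell with ulimit -c, but a process can also raise its own soft limit up to the hard limit. The following is a minimal sketch of that idea using getrlimit and setrlimit; enable_core_dumps is a hypothetical helper name, and whether a core file is actually written, and where, still depends on system settings such as /proc/sys/kernel/core_pattern.

```cpp
#include <sys/resource.h>

// Raises the soft core-file size limit of the current process to its hard
// limit. Returns 0 on success and -1 on failure.
int enable_core_dumps()
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) return -1;
    lim.rlim_cur = lim.rlim_max;   // soft limit may go up to the hard limit
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Calling enable_core_dumps() early in main, with no SIGSEGV handler installed, lets the crash produce a core file if the hard limit allows it. The file can then be opened with gdb ./mproject_test followed by the core file path, and bt prints the backtrace.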

