Linux Operating System Study Notes (16): Inter-Process Communication with Signals

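1. Preface

As is well known, Linux offers many inter-process communication (IPC) mechanisms: pipes and named pipes, signals, sockets, and the System V IPC family of message queues, shared memory, and semaphores. Starting with this article we examine these mechanisms and their underlying implementation one by one, beginning with signals.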

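2. Signal Basics

A signal is a way for a process to deal with urgent events. It carries no complex data structure; it is essentially just a number that acts as a code. Linux provides dozens of signals, each with its own meaning. The kill -l command lists them: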

# kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

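A signal can be sent to a process at any time. A process may install a handler function for a signal; when the signal arrives that function runs, and if no handler is installed the signal's default action is taken. man 7 signal describes the meaning of each signal and its default action: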

Signal     Value     Action   Comment
──────────────────────────────────────────────────────────────────────
SIGHUP        1       Term    Hangup detected on controlling terminal
                              or death of controlling process
SIGINT        2       Term    Interrupt from keyboard
SIGQUIT       3       Core    Quit from keyboard
SIGILL        4       Core    Illegal Instruction

SIGABRT       6       Core    Abort signal from abort(3)
SIGFPE        8       Core    Floating point exception
SIGKILL       9       Term    Kill signal
SIGSEGV      11       Core    Invalid memory reference
SIGPIPE      13       Term    Broken pipe: write to pipe with no
                              readers
SIGALRM      14       Term    Timer signal from alarm(2)
SIGTERM      15       Term    Termination signal
SIGUSR1   30,10,16    Term    User-defined signal 1
SIGUSR2   31,12,17    Term    User-defined signal 2
……

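As the table shows, a signal can be handled in one of three ways:

Perform the default action. Linux defines a default action for every signal. Term in the table above means terminate the process. Core means terminate the process and produce a core dump, that is, save the run-time state of the process to a file so that programmers can analyze the problem afterwards.
Catch the signal. We can define a handler function for a signal; when the signal occurs, that handler runs.
Ignore the signal. When we do not want to act on a signal we can ignore it and do nothing. Two signals can be neither caught nor ignored by an application: SIGKILL and SIGSTOP. They guarantee that a process can always be terminated or stopped.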
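
The three dispositions can be tried from user space. The following is a minimal sketch, assuming Linux with glibc and a single-threaded caller that does not block SIGUSR1 or SIGUSR2; the helper names set_disposition, demo_catch, demo_ignore, and demo_sigkill_refused are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t caught_count = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    caught_count++;              /* only touch sig_atomic_t data in a handler */
}

/* Install `handler` (a function, SIG_IGN, or SIG_DFL) for `signo`. */
static int set_disposition(int signo, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Catch: the handler runs once for the raised signal. Returns 1 on success. */
int demo_catch(void)
{
    int rc = set_disposition(SIGUSR1, on_sigusr1);

    assert(rc == 0);
    caught_count = 0;
    raise(SIGUSR1);              /* the handler has finished when raise() returns */
    return caught_count == 1;
}

/* Ignore: the raised signal is discarded and the process keeps running. */
int demo_ignore(void)
{
    int rc = set_disposition(SIGUSR2, SIG_IGN);

    assert(rc == 0);
    raise(SIGUSR2);
    return 1;                    /* reached only because SIGUSR2 was ignored */
}

/* SIGKILL accepts neither a handler nor SIG_IGN: sigaction() fails with EINVAL. */
int demo_sigkill_refused(void)
{
    int rc;

    errno = 0;
    rc = set_disposition(SIGKILL, SIG_IGN);
    return rc == -1 && errno == EINVAL;
}
```

Calling the three helpers from main() should leave the process alive in every case; the default action is what you get when none of these calls is made.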

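3. Signals and Interrupts

Signals and interrupts have a lot in common:

Both register handler functions.
Both are used to act on the current task, for example to schedule or stop it.

In practice, however, they differ considerably. Their different purposes lead to different execution logic, which in turn shows up as different design characteristics in the code. The main differences are:

Both interrupts and signals can originate from hardware or software. An interrupt handler, however, is registered in the kernel and runs in the kernel, whereas a signal handler is registered by user-space code. When a signal is delivered, the kernel looks up the handler in the signal-related data structures of the current task's task_struct, and the handler ultimately runs in user mode.
Interrupts act on the kernel as a whole, while signals act on the current task (process). A signal usually affects only one process, whereas a bug in interrupt handling can crash the entire Linux kernel.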

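4. Registering Signal Handlers

Sometimes we want a signal to trigger custom behavior, which is what user-defined signal handlers are for. The two main registration APIs are signal() and sigaction(); sigaction() is the recommended one.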

typedef void (*sighandler_t)(int);
sighandler_t signal(int signum, sighandler_t handler);

int sigaction(int signum, const struct sigaction *act,
                     struct sigaction *oldact);

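The main difference is that sigaction() binds the signal signum to a struct sigaction rather than to a bare handler of type sighandler_t. This allows finer control over signal handling, with different parameters producing different behavior. For example, sa_flags can be set to:

SA_ONESHOT (POSIX name SA_RESETHAND): the handler takes effect only once; afterwards the default action is restored.
SA_NOMASK (POSIX name SA_NODEFER): the signal is not blocked while its own handler runs, so the handler can be interrupted by another instance of the same signal. Which other signals are blocked during the handler is controlled by sa_mask.
SA_INTERRUPT: a system call interrupted by this signal is not resumed; it returns -EINTR and hands control back to the caller. This is a historical flag that current kernels treat as a no-op, because the same behavior is obtained by simply not setting SA_RESTART.
SA_RESTART: the opposite of SA_INTERRUPT; the interrupted system call is restarted automatically.

sa_restorer holds the address of the function to run immediately after sa_handler finishes, that is, the return address of the handler.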

struct sigaction {
    __sighandler_t sa_handler;
    unsigned long sa_flags;
    __sigrestore_t sa_restorer;
    sigset_t sa_mask;    /* mask last for extensibility */
};
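
To see one of these flags in action from user space, the sketch below installs a one-shot handler and then asks the kernel what the disposition is after the first delivery. It assumes Linux with glibc and a single-threaded caller; oneshot_resets_to_default and on_signal are hypothetical names, and SA_RESETHAND is used as the POSIX spelling of SA_ONESHOT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t runs = 0;

static void on_signal(int signo)
{
    (void)signo;
    runs++;
}

/*
 * Install a one-shot handler for SIGUSR1, deliver the signal once, and
 * report whether the disposition was reset to SIG_DFL afterwards.
 */
int oneshot_resets_to_default(void)
{
    struct sigaction sa, cur;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_RESETHAND | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGTERM);      /* also block SIGTERM while the handler runs */

    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    runs = 0;
    raise(SIGUSR1);                       /* first delivery: on_signal() runs */

    rc = sigaction(SIGUSR1, NULL, &cur);  /* act == NULL: query only */
    assert(rc == 0);
    return runs == 1 && cur.sa_handler == SIG_DFL;
}
```

A second raise(SIGUSR1) after this point would take the default action and terminate the process, which is why the sketch only inspects the disposition instead of raising again.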

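sigaction() is itself a function wrapped by glibc; the underlying system call is rt_sigaction(). It first copies the user-space struct sigaction into a kernel-side k_sigaction and then calls do_sigaction() to install the corresponding signal action.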

SYSCALL_DEFINE4(rt_sigaction, int, sig,
    const struct sigaction __user *, act,
    struct sigaction __user *, oact,
    size_t, sigsetsize)
{
    struct k_sigaction new_sa, old_sa;
    int ret = -EINVAL;
......
    if (act) {
      if (copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)))
        return -EFAULT;
    }

    ret = do_sigaction(sig, act ? &new_sa : NULL, oact ? &old_sa : NULL);

    if (!ret && oact) {
        if (copy_to_user(oact, &old_sa.sa, sizeof(old_sa.sa)))
            return -EFAULT;
    }
out:
    return ret;
}

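do_sigaction() stores the handler passed in from user space in the sighand->action[] array of the current task (current, a task_struct), in the slot that belongs to signal sig (index sig-1), where it can be found when the signal is delivered later.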

int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
{
    struct task_struct *p = current, *t;
    struct k_sigaction *k;
    sigset_t mask;
......
    k = &p->sighand->action[sig-1];

    spin_lock_irq(&p->sighand->siglock);
    if (oact)
        *oact = *k;

    if (act) {
        sigdelsetmask(&act->sa.sa_mask, sigmask(SIGKILL) | sigmask(SIGSTOP));
        *k = *act;
......
  }

  spin_unlock_irq(&p->sighand->siglock);
  return 0;
}
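
Two details of do_sigaction() are observable from user space: the previous action is copied out through oact, and SIGKILL and SIGSTOP are removed from sa_mask before the action is stored. The following sketch checks both, assuming Linux with glibc; stored_mask_bits and noop_handler are hypothetical names used only here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static void noop_handler(int signo)
{
    (void)signo;
}

/*
 * Install a handler whose sa_mask asks to block SIGTERM and SIGKILL, read
 * the stored action back, then restore the previous action.
 * Returns a bit field:
 *   bit 0: SIGTERM is in the stored mask
 *   bit 1: SIGKILL is in the stored mask
 *   bit 2: the previous handler is in place again after the restore
 */
int stored_mask_bits(void)
{
    struct sigaction sa, old, cur, check;
    int rc, bits = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGTERM);
    sigaddset(&sa.sa_mask, SIGKILL);        /* the kernel is expected to drop this */

    rc = sigaction(SIGUSR1, &sa, &old);     /* old receives the previous action */
    assert(rc == 0);
    rc = sigaction(SIGUSR1, NULL, &cur);    /* read back what was stored */
    assert(rc == 0);

    if (sigismember(&cur.sa_mask, SIGTERM) == 1)
        bits |= 1;
    if (sigismember(&cur.sa_mask, SIGKILL) == 1)
        bits |= 2;

    rc = sigaction(SIGUSR1, &old, NULL);    /* put the previous action back */
    assert(rc == 0);
    rc = sigaction(SIGUSR1, NULL, &check);
    assert(rc == 0);
    if (check.sa_handler == old.sa_handler)
        bits |= 4;
    return bits;
}
```

On a kernel that behaves like the excerpt above, the result is 5: SIGTERM stays in the mask, SIGKILL is stripped, and the saved action can be reinstalled unchanged.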

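5. Sending Signals

Signals come from many sources: user space, hardware, or the kernel itself.

Typing certain key combinations in a terminal sends a signal to the foreground process: Ctrl+C generates SIGINT and Ctrl+Z generates SIGTSTP. Likewise, kill -9 pid sends SIGKILL to a process and kills it.
Hardware exceptions also generate signals. For instance, executing a divide-by-zero instruction raises a CPU exception and SIGFPE is sent to the process. Similarly, when a process accesses invalid memory, the memory-management subsystem detects the fault and SIGSEGV is sent to the process.
The kernel itself sends signals in certain situations. Writing to a pipe whose read end has been closed generates SIGPIPE, and when a child process exits the kernel sends SIGCHLD to its parent.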
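
A process can also send signals programmatically. The sketch below uses sigqueue(), which goes through the rt_sigqueueinfo system call and lets the sender attach an integer payload. It assumes Linux with glibc and a single-threaded caller that does not block SIGUSR1; send_value_to_self, last_code_was_si_queue, and on_queued are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_value = -1;
static volatile sig_atomic_t got_code = 0;

/* SA_SIGINFO handlers receive a siginfo_t describing who sent what. */
static void on_queued(int signo, siginfo_t *info, void *ucontext)
{
    (void)signo;
    (void)ucontext;
    got_value = info->si_value.sival_int;
    got_code = info->si_code;
}

/* Queue SIGUSR1 to ourselves with `value` attached; return what the handler saw. */
int send_value_to_self(int value)
{
    struct sigaction sa;
    union sigval payload;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_queued;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    got_value = -1;
    payload.sival_int = value;
    rc = sigqueue(getpid(), SIGUSR1, payload);  /* delivered before sigqueue() returns */
    assert(rc == 0);
    return got_value;
}

/* si_code tells the receiver the signal came from sigqueue(), not kill(). */
int last_code_was_si_queue(void)
{
    return got_code == SI_QUEUE;
}
```

With kill() instead of sigqueue() the handler would still run, but si_code would be SI_USER and no payload would be attached.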

Whether a signal is sent with the kill or sigqueue system calls, or to a specific thread with tkill or tgkill, the kernel always ends up in do_send_sig_info(). The call chains are:

kill()->kill_something_info()->kill_pid_info()->group_send_sig_info()->do_send_sig_info()
    
tkill()->do_tkill()->do_send_specific()->do_send_sig_info()
    
tgkill()->do_tkill()->do_send_specific()->do_send_sig_info()
    
rt_sigqueueinfo()->do_rt_sigqueueinfo()->kill_proc_info()->kill_pid_info()->group_send_sig_info()->do_send_sig_info()

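do_send_sig_info() calls send_signal(), which in turn calls __send_signal(). The code is fairly involved; its main logic is:

Decide, based on how the signal was sent, whether it is shared by the whole process or private to one thread, and set pending accordingly. A signal sent with kill targets the whole process and goes to t->signal->shared_pending, which holds the signals shared by all threads of the process. A signal sent with tkill targets one thread and goes to t->pending, which is private to that thread's task_struct.
Call legacy_queue() to check whether this is an unreliable (non-real-time) signal that is already pending; if so, return immediately without queuing it again.
Call __sigqueue_alloc() to allocate a struct sigqueue and append it with list_add_tail() to the list inside struct sigpending.
Call complete_signal() to pick a thread to handle the signal.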

int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
            enum pid_type type)
{
......
    ret = send_signal(sig, info, p, type);
......
}

static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
            enum pid_type type)
{
......
    return __send_signal(sig, info, t, type, from_ancestor_ns);
}

static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
            enum pid_type type, int from_ancestor_ns)
{
    struct sigpending *pending;
    struct sigqueue *q;
......
    pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
......
    if (legacy_queue(pending, sig))
        goto ret;
......
    /*
     * Real-time signals must be queued if sent by sigqueue, or
     * some other real-time mechanism.  It is implementation
     * defined whether kill() does so.  We attempt to do so, on
     * the principle of least surprise, but since kill is not
     * allowed to fail with EAGAIN when low on memory we just
     * make sure at least one signal gets delivered and don't
     * pass on the info struct.
     */
    if (sig < SIGRTMIN)
        override_rlimit = (is_si_special(info) || info->si_code >= 0);
    else
        override_rlimit = 0;
    q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit);
    if (q) {
        list_add_tail(&q->list, &pending->list);
        switch ((unsigned long) info) {
        case (unsigned long) SEND_SIG_NOINFO:
            clear_siginfo(&q->info);
            q->info.si_signo = sig;
            q->info.si_errno = 0;
            q->info.si_code = SI_USER;
            q->info.si_pid = task_tgid_nr_ns(current,
                            task_active_pid_ns(t));
            q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
            break;
        case (unsigned long) SEND_SIG_PRIV:
            clear_siginfo(&q->info);
            q->info.si_signo = sig;
            q->info.si_errno = 0;
            q->info.si_code = SI_KERNEL;
            q->info.si_pid = 0;
            q->info.si_uid = 0;
            break;
        default:
            copy_siginfo(&q->info, info);
            if (from_ancestor_ns)
                q->info.si_pid = 0;
            break;
        }
        userns_fixup_signal_uid(&q->info, t);
    }
......
out_set:
    signalfd_notify(t, sig);
    sigaddset(&pending->signal, sig);
......
    complete_signal(sig, t, type);
ret:
    trace_signal_generate(sig, info, t, type != PIDTYPE_PID, result);
    return ret;
}
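
The bookkeeping in __send_signal() can be modeled in a few lines of user-space C. This is a toy model under stated assumptions, not kernel code: toy_sigpending, toy_legacy_queue, toy_send_signal, and the TOY_ constants are all hypothetical, and locking, siginfo, and resource limits are left out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SIGRTMIN   32   /* kernel-side boundary between legacy and real-time signals */
#define TOY_QUEUE_MAX  16   /* arbitrary capacity for this model */

struct toy_sigpending {
    uint64_t signal;              /* bitmap: bit (sig - 1) is set while sig is pending */
    int queue[TOY_QUEUE_MAX];     /* queued signal numbers in arrival order */
    int queued;                   /* number of valid entries in queue[] */
};

/* A legacy signal that is already pending must not be queued a second time. */
static int toy_legacy_queue(const struct toy_sigpending *p, int sig)
{
    return sig < TOY_SIGRTMIN && (p->signal & (UINT64_C(1) << (sig - 1))) != 0;
}

/*
 * Record `sig` (1..64) as pending. Returns 0 if the signal was dropped as a
 * duplicate legacy signal, 1 otherwise. As in the kernel excerpt, the bitmap
 * bit is set even when no queue entry could be added.
 */
int toy_send_signal(struct toy_sigpending *p, int sig)
{
    if (toy_legacy_queue(p, sig))
        return 0;
    if (p->queued < TOY_QUEUE_MAX)
        p->queue[p->queued++] = sig;
    p->signal |= UINT64_C(1) << (sig - 1);
    return 1;
}
```

Sending signal 10 twice leaves one queue entry, while sending signal 40 twice leaves two, which is the difference between unreliable and reliable signals discussed next.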

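legacy_queue() decides whether the signal has to be treated as an unreliable one. If the signal number is below SIGRTMIN, which is 32 inside the kernel, and the signal is already in the pending set, the function returns true and __send_signal() exits immediately. (User space sees SIGRTMIN as 34 in the kill -l listing above because glibc reserves signals 32 and 33 for its own use.) Signals 1 to 31 are called unreliable for historical reasons. Early UNIX systems defined only these signals, and experience showed them to be unreliable in two ways: a process could react to a signal incorrectly, and signals could be lost. On those systems the handler had to be reinstalled after every delivery, which was error-prone.

Linux also supports unreliable signals but improves the mechanism: after a handler has run, it does not have to be installed again (the installation function is implemented on top of the reliable mechanism). On Linux, therefore, the unreliability of these signals comes down to the possibility that a signal is lost.

Signals are lost because they can arrive in rapid succession. How many of them get handled depends on when the handler is invoked and how frequently the signal is sent, and the time at which the handler runs is not deterministic. That is why these signals are called unreliable. The remaining signals are called reliable signals and are queued for delivery.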

static inline int legacy_queue(struct sigpending *signals, int sig)
{
    return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}

#define SIGRTMIN  32
#define SIGRTMAX  _NSIG
#define _NSIG    64
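
The difference can be observed directly: block a signal, send it several times, unblock it, and count how often the handler runs. This is a minimal sketch assuming Linux with glibc and a single-threaded caller; deliveries_after_burst and count_delivery are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t deliveries = 0;

static void count_delivery(int signo)
{
    (void)signo;
    deliveries++;
}

/* Block `signo`, send it `times` times to ourselves, unblock, count handler runs. */
int deliveries_after_burst(int signo, int times)
{
    struct sigaction sa;
    sigset_t block;
    int rc, i;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = count_delivery;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(signo, &sa, NULL);
    assert(rc == 0);

    sigemptyset(&block);
    sigaddset(&block, signo);
    rc = sigprocmask(SIG_BLOCK, &block, NULL);
    assert(rc == 0);

    deliveries = 0;
    for (i = 0; i < times; i++)
        kill(getpid(), signo);                   /* stays pending while blocked */

    rc = sigprocmask(SIG_UNBLOCK, &block, NULL); /* pending instances are delivered now */
    assert(rc == 0);
    return deliveries;
}
```

On Linux, deliveries_after_burst(SIGUSR1, 3) is expected to report a single delivery because the second and third instances hit legacy_queue(), while deliveries_after_burst(SIGRTMIN, 3) is expected to report three.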

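For a signal that passes this check, __sigqueue_alloc() allocates a sigqueue object and links it into the list in sigpending. Finally complete_signal() is called to find a thread to handle it. Its main logic is:

First check whether the target thread p itself wants the signal; if so it is chosen directly. If it does not, and the signal is thread-directed or the process has only one thread, there is nobody to wake and the function returns.
Otherwise walk the thread group starting at signal->curr_target. If no thread wants the signal, return and let an eligible thread pick it up from the queue later.
If a thread was found and the signal is fatal, such as SIGKILL, start taking down the whole thread group immediately.
Call signal_wake_up() to wake the chosen thread.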

static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
{
    struct signal_struct *signal = p->signal;
    struct task_struct *t;
    /*
     * Now find a thread we can wake up to take the signal off the queue.
     *
     * If the main thread wants the signal, it gets first crack.
     * Probably the least surprising to the average bear.
     */
    if (wants_signal(sig, p))
        t = p;
    else if ((type == PIDTYPE_PID) || thread_group_empty(p))
        /*
         * There is just one thread and it does not need to be woken.
         * It will dequeue unblocked signals before it runs again.
         */
        return;
    else {
        /*
         * Otherwise try to find a suitable thread.
         */
        t = signal->curr_target;
        while (!wants_signal(sig, t)) {
            t = next_thread(t);
            if (t == signal->curr_target)
                /*
                 * No thread needs to be woken.
                 * Any eligible threads will see
                 * the signal in the queue soon.
                 */
                return;
        }
        signal->curr_target = t;
    }
    /*
     * Found a killable thread.  If the signal will be fatal,
     * then start taking the whole group down immediately.
     */
    if (sig_fatal(p, sig) &&
        !(signal->flags & SIGNAL_GROUP_EXIT) &&
        !sigismember(&t->real_blocked, sig) &&
        (sig == SIGKILL || !p->ptrace)) {
        /*
         * This signal will be fatal to the whole group.
         */
        if (!sig_kernel_coredump(sig)) {
            /*
             * Start a group exit and wake everybody up.
             * This way we don't have other threads
             * running and doing things after a slower
             * thread has the fatal signal pending.
             */
            signal->flags = SIGNAL_GROUP_EXIT;
            signal->group_exit_code = sig;
            signal->group_stop_count = 0;
            t = p;
            do {
                task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
                sigaddset(&t->pending.signal, SIGKILL);
                signal_wake_up(t, 1);
            } while_each_thread(p, t);
            return;
        }
    }
    /*
     * The signal is already in the shared-pending queue.
     * Tell the chosen thread to wake up and dequeue it.
     */
    signal_wake_up(t, sig == SIGKILL);
    return;
}
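
The thread selection at the top of complete_signal() can be approximated with a small model. This is a simplified sketch: toy_thread and toy_pick_thread are hypothetical, and wants_signal() is reduced to a single "blocks the signal" flag, whereas the real function also considers other task states.

```c
#include <assert.h>

struct toy_thread {
    int blocks_signal;   /* 1 if this thread currently blocks the signal */
};

/*
 * Choose the thread to wake for a signal.
 *   target          index of the thread the signal was sent to
 *   thread_directed non-zero for tkill()/tgkill() style delivery
 *   curr_target     rotating cursor, updated when a thread is found
 * Returns the chosen index, or -1 if no thread needs to be woken.
 */
int toy_pick_thread(const struct toy_thread *threads, int nthreads,
                    int target, int thread_directed, int *curr_target)
{
    int t;

    if (!threads[target].blocks_signal)
        return target;                   /* the target gets first crack */
    if (thread_directed || nthreads == 1)
        return -1;                       /* nobody else may take the signal */

    t = *curr_target;
    while (threads[t].blocks_signal) {
        t = (t + 1) % nthreads;          /* stand-in for next_thread() */
        if (t == *curr_target)
            return -1;                   /* every thread blocks the signal */
    }
    *curr_target = t;                    /* remember where to start next time */
    return t;
}
```

The cursor makes successive process-directed signals start their search where the previous one ended instead of always at the first thread.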

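The main logic of signal_wake_up() is:

Set the TIF_SIGPENDING flag.
Try to wake the thread or process.

Signal handling is deferred with a flag, much like task scheduling. When a signal arrives the kernel does not handle it immediately; it sets the TIF_SIGPENDING flag to record that a signal is waiting. The signal is then handled when the task returns from kernel mode to user mode, at the end of a system call or of interrupt handling.

Waking the process or thread eventually calls try_to_wake_up(), just as in task scheduling, so the details are not repeated here. If wake_up_state() returns 0, the process or thread is already in the TASK_RUNNING state. If it is running on another CPU, kick_process() sends an inter-processor interrupt that forces it to reschedule; once that is done it returns to user mode, which is where the pending signal is noticed.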

static inline void signal_wake_up(struct task_struct *t, bool resume)
{
    signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
}

void signal_wake_up_state(struct task_struct *t, unsigned int state)
{
    set_tsk_thread_flag(t, TIF_SIGPENDING);
    /*
     * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
     * case. We don't check t->state here because there is a race with it
     * executing another processor and just now entering stopped state.
     * By using wake_up_state, we ensure the process will wake up and
     * handle its death signal.
     */
    if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
        kick_process(t);
}

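6. Handling Signals

We use reading from a tap network device as the example for analyzing how a signal is handled. This part touches on system calls, task scheduling, and interrupts, so it also serves as a review of the earlier articles. Reading from the device enters the kernel through a system call, and the system call table leads to the function that does the work. During the read, if no data is available, the function calls schedule() to give up the CPU voluntarily, goes to sleep, and waits to be woken again.

The main logic of tap_do_read() is:

Set the state of the current process or thread to TASK_INTERRUPTIBLE, which is what allows this system call to be interrupted.
Interruptible system calls tend to be slow ones that call schedule() to give up the CPU and wait when data is not ready. When a signal is sent, the kernel not only sets _TIF_SIGPENDING on the target process or thread but also tries to wake it, moving it from the waiting state to TASK_RUNNING. When the task runs again it returns from schedule() and goes around the while loop once more. Because it was woken by a signal rather than by arriving data, there is nothing to read. signal_pending(), however, sees _TIF_SIGPENDING, which means the system call has not really finished, so the function sets the error ERESTARTSYS and returns from the system call with it.
If no signal is pending, call schedule() again to give up the CPU.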

static ssize_t tap_do_read(struct tap_queue *q,
         struct iov_iter *to,
         int noblock, struct sk_buff *skb)
{
......
    while (1) {
        if (!noblock)
            prepare_to_wait(sk_sleep(&q->sk), &wait, TASK_INTERRUPTIBLE);

        /* Read frames from the queue */
        skb = skb_array_consume(&q->skb_array);
        if (skb)
            break;
        if (noblock) {
            ret = -EAGAIN;
            break;
        }
        if (signal_pending(current)) {
            ret = -ERESTARTSYS;
            break;
        }
        /* Nothing to read, let's sleep */
        schedule();
    }
......
}

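When a system call or an interrupt returns to user mode, exit_to_usermode_loop() is called. For task scheduling the flag it checks is _TIF_NEED_RESCHED, which leads to schedule(); for signals the flag is _TIF_SIGPENDING.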

static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
{
    while (true) {
......
        if (cached_flags & _TIF_NEED_RESCHED)
            schedule();
......
        /* deal with pending signal delivery */
        if (cached_flags & _TIF_SIGPENDING)
            do_signal(regs);
......
        if (!(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
            break;
    }
}

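do_signal() calls handle_signal(). One problem makes the logic here rather complicated: the signal handler is defined in user space, while the dispatching takes place in kernel mode.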

/*
 * Note that 'init' is a special process: it doesn't get signals it doesn't
 * want to handle. Thus you cannot kill init even with a SIGKILL even by
 * mistake.
 */
void do_signal(struct pt_regs *regs)
{
    struct ksignal ksig;
    if (get_signal(&ksig)) {
        /* Whee! Actually deliver the signal.  */
        handle_signal(&ksig, regs);
        return;
    }
    /* Did we come from a system call? */
    if (syscall_get_nr(current, regs) >= 0) {
        /* Restart the system call - no handlers present */
        switch (syscall_get_error(current, regs)) {
        case -ERESTARTNOHAND:
        case -ERESTARTSYS:
        case -ERESTARTNOINTR:
            regs->ax = regs->orig_ax;
            regs->ip -= 2;
            break;
        case -ERESTART_RESTARTBLOCK:
            regs->ax = get_nr_restart_syscall(regs);
            regs->ip -= 2;
            break;
        }
    }
    /*
     * If there's no signal to deliver, we just put the saved sigmask
     * back.
     */
    restore_saved_sigmask();
}

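handle_signal() checks whether we arrived here from a system call. If the error code is ERESTARTSYS, it knows that the system call was interrupted before completing and, unless the handler was installed with SA_RESTART, sets the return value to EINTR. Because the task will not return directly to the user-mode state recorded on kernel entry but will enter the registered signal handler instead, setup_rt_frame() is called to build a signal frame on the user stack and rewrite the register set pt_regs.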

static void
handle_signal(struct ksignal *ksig, struct pt_regs *regs)
{
    bool stepping, failed;
......
    /* Are we from a system call? */
    if (syscall_get_nr(current, regs) >= 0) {
        /* If so, check system call restarting.. */
        switch (syscall_get_error(current, regs)) {
        case -ERESTART_RESTARTBLOCK:
        case -ERESTARTNOHAND:
            regs->ax = -EINTR;
            break;
        case -ERESTARTSYS:
            if (!(ksig->ka.sa.sa_flags & SA_RESTART)) {
                regs->ax = -EINTR;
                break;
            }
        /* fallthrough */
        case -ERESTARTNOINTR:
            regs->ax = regs->orig_ax;
            regs->ip -= 2;
            break;
        }
    }
......
    failed = (setup_rt_frame(ksig, regs) < 0);
......
    signal_setup_done(failed, ksig, stepping);
}
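
The ERESTARTSYS branch is visible from user space as the difference between a read() that fails with EINTR and one that is transparently restarted. The sketch below blocks in read() on an empty pipe and lets a one-shot timer interrupt it; the handler writes one byte so that a restarted read() has something to return. It assumes Linux with glibc, a single-threaded caller, and that the 50 ms timer fires only after read() has gone to sleep; interrupted_read and on_alarm are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int pipe_fds[2];

/* Runs after the kernel has already decided how the interrupted read() ends. */
static void on_alarm(int signo)
{
    ssize_t n;

    (void)signo;
    n = write(pipe_fds[1], "x", 1);           /* write() is async-signal-safe */
    (void)n;
}

/*
 * Block in read() on an empty pipe and let SIGALRM interrupt it.
 * Returns the number of bytes read, or -errno if read() failed.
 */
long interrupted_read(int use_sa_restart)
{
    struct sigaction sa;
    struct itimerval timer;
    char byte;
    ssize_t n;
    long result;
    int rc;

    rc = pipe(pipe_fds);
    assert(rc == 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 50 * 1000;       /* one shot, 50 ms from now */
    rc = setitimer(ITIMER_REAL, &timer, NULL);
    assert(rc == 0);

    n = read(pipe_fds[0], &byte, 1);          /* sleeps until the signal arrives */
    result = (n < 0) ? -(long)errno : (long)n;

    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return result;
}
```

Without SA_RESTART the call is expected to return -EINTR even though the handler has put a byte into the pipe by then; with SA_RESTART the system call is re-executed after the handler returns and reads that byte.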

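setup_rt_frame() mainly calls __setup_rt_frame(), whose main logic is:

Call get_sigframe() to read the sp register from regs, that is, the top of the user-mode stack of the original process, and subtract sizeof(struct rt_sigframe) from it, which pushes a new frame onto that stack.
Call put_user_ex() to store sa_restorer in frame->pretcode, following the normal call-stack convention. A stack frame contains the address to jump back to when the function finishes; when sa_handler returns, the frame that is popped is frame, so execution jumps to the address of sa_restorer.
Call setup_sigcontext() to save the original pt_regs in uc_mcontext inside frame.
Fill in regs: set regs->ip to the user-defined handler sa_handler and set the stack pointer regs->sp to the address of the new frame.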

static int
setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
{
......
        return __setup_rt_frame(ksig->sig, ksig, set, regs);
......
}

static int __setup_rt_frame(int sig, struct ksignal *ksig,
                sigset_t *set, struct pt_regs *regs)
{
    struct rt_sigframe __user *frame;
    void __user *fp = NULL;
    int err = 0;
    frame = get_sigframe(&ksig->ka, regs, sizeof(struct rt_sigframe), &fp);
......
    put_user_try {
......
        /* Set up to return from userspace.  If provided, use a stub
           already in userspace.  */
        /* x86-64 should always use SA_RESTORER. */
        if (ksig->ka.sa.sa_flags & SA_RESTORER) {
            put_user_ex(ksig->ka.sa.sa_restorer, &frame->pretcode);
        } else {
            /* could use a vstub here */
            err |= -EFAULT;
        }
    } put_user_catch(err);
    err |= setup_sigcontext(&frame->uc.uc_mcontext, fp, regs, set->sig[0]);
    err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
    if (err)
        return -EFAULT;
    /* Set up registers for signal handler */
    regs->di = sig;
    /* In case the signal handler was declared without prototypes */
    regs->ax = 0;
    /* This also works for non SA_SIGINFO handlers because they expect the
       next argument after the signal number on the stack. */
    regs->si = (unsigned long)&frame->info;
    regs->dx = (unsigned long)&frame->uc;
    regs->ip = (unsigned long) ksig->ka.sa.sa_handler;
    regs->sp = (unsigned long)frame;
    /*
     * Set up the CS and SS registers to run signal handlers in
     * 64-bit mode, even if the handler happens to be interrupting
     * 32-bit or 16-bit code.
     *
     * SS is subtle.  In 64-bit mode, we don't need any particular
     * SS descriptor, but we do need SS to be valid.  It's possible
     * that the old SS is entirely bogus -- this can happen if the
     * signal we're trying to deliver is #GP or #SS caused by a bad
     * SS value.  We also have a compatbility issue here: DOSEMU
     * relies on the contents of the SS register indicating the
     * SS value at the time of the signal, even though that code in
     * DOSEMU predates sigreturn's ability to restore SS.  (DOSEMU
     * avoids relying on sigreturn to restore SS; instead it uses
     * a trampoline.)  So we do our best: if the old SS was valid,
     * we keep it.  Otherwise we replace it.
     */
    regs->cs = __USER_CS;
    if (unlikely(regs->ss != __USER_DS))
        force_valid_ss(regs);
    return 0;
}
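
Part of the frame built here is visible to a handler installed with SA_SIGINFO: its third argument points at the saved user context, whose uc_sigmask field holds the signal mask that will be restored when the handler returns. The sketch below compares the mask inside the handler, the mask saved in the frame, and the mask after the handler has returned. It assumes Linux with glibc and a single-threaded caller that does not block SIGUSR1; mask_bits_around_handler and inspect_frame are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t blocked_in_handler = -1;
static volatile sig_atomic_t blocked_in_saved_frame = -1;

static void inspect_frame(int signo, siginfo_t *info, void *context)
{
    ucontext_t *uc = context;      /* saved user context inside the signal frame */
    sigset_t now;

    (void)info;
    sigprocmask(SIG_BLOCK, NULL, &now);                 /* mask while the handler runs */
    blocked_in_handler = sigismember(&now, signo);
    blocked_in_saved_frame = sigismember(&uc->uc_sigmask, signo);
}

/*
 * Returns a bit field:
 *   bit 0: SIGUSR1 is blocked inside its own handler
 *   bit 1: SIGUSR1 is blocked in the mask saved in the frame
 *   bit 2: SIGUSR1 is blocked after the handler has returned
 */
int mask_bits_around_handler(void)
{
    struct sigaction sa;
    sigset_t after;
    int rc, bits = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = inspect_frame;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    rc = kill(getpid(), SIGUSR1);  /* the handler has run by the time kill() returns */
    assert(rc == 0);

    sigprocmask(SIG_BLOCK, NULL, &after);
    if (blocked_in_handler == 1)
        bits |= 1;
    if (blocked_in_saved_frame == 1)
        bits |= 2;
    if (sigismember(&after, SIGUSR1) == 1)
        bits |= 4;
    return bits;
}
```

The expected result is 1: the signal is blocked only while its handler runs, the frame remembers the earlier mask, and that mask is back in force once the handler has returned.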

sa_restorer is set to restore_rt in glibc's __libc_sigaction(). restore_rt does nothing more than invoke the system call __NR_rt_sigreturn:

RESTORE (restore_rt, __NR_rt_sigreturn)

#define RESTORE(name, syscall) RESTORE2 (name, syscall)
# define RESTORE2(name, syscall) \
asm                                     \
  (                                     \
   ".LSTART_" #name ":\n"               \
   "    .type __" #name ",@function\n"  \
   "__" #name ":\n"                     \
   "    movq $" #syscall ", %rax\n"     \
   "    syscall\n"                      \
......

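__NR_rt_sigreturn corresponds to the kernel function sys_rt_sigreturn(), which calls restore_sigframe() to restore pt_regs to the state of the original process so that execution continues after the interrupted call. Note that the excerpt below is the ARM implementation, as the ARM_sp and ARM_r0 fields show, while the earlier excerpts come from x86.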

asmlinkage int sys_rt_sigreturn(struct pt_regs *regs)
{
    struct rt_sigframe __user *frame;
    /* Always make any pending restarted system calls return -EINTR */
    current->restart_block.fn = do_no_restart_syscall;
    /*
     * Since we stacked the signal on a 64-bit boundary,
     * then 'sp' should be word aligned here.  If it's
     * not, then the user is trying to mess with us.
     */
    if (regs->ARM_sp & 7)
        goto badframe;
    frame = (struct rt_sigframe __user *)regs->ARM_sp;
    if (!access_ok(frame, sizeof (*frame)))
        goto badframe;
    if (restore_sigframe(regs, &frame->sig))
        goto badframe;
    if (restore_altstack(&frame->sig.uc.uc_stack))
        goto badframe;
    return regs->ARM_r0;
badframe:
    force_sig(SIGSEGV, current);
    return 0;
}

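Summary

Sending and handling a signal is a complicated process, so here is a recap.

Suppose process A reads data from a tap network device. Its main function makes a system call and traps into the kernel.
Following the usual system call mechanism, the user-mode state is saved in pt_regs, which records that user-mode execution had reached line A.
The kernel executes the system call and tries to read data.
On finding nothing to read, the task goes to sleep and calls schedule() to give up the CPU.
The task state is set to TASK_INTERRUPTIBLE, the interruptible sleep state, which means that an incoming signal can wake it.
Another process or the shell sends a signal by calling kill(), tkill(), tgkill(), or rt_sigqueueinfo(). In the kernel all four end up in do_send_sig_info().
do_send_sig_info() calls send_signal() to send the signal to process A. This amounts to locating A's task_struct and recording the signal there: the signal is added to the pending signal set and a sigqueue entry is appended to the pending list, except that an unreliable signal that is already pending is not queued again.
do_send_sig_info() then, through complete_signal(), calls signal_wake_up() to wake process A.
Process A returns to the TASK_RUNNING state and continues from schedule().
Once woken, process A checks whether a signal has arrived. If not, it loops back to the start and tries to read again; if there is still no data it goes back to sleep in TASK_INTERRUPTIBLE.
If a signal has arrived, it leaves the system call in progress and returns an error indicating that the system call was interrupted.
On the way out of the system call, exit_to_usermode_loop() runs; this is the point at which signals are handled.
do_signal() is called to start handling the signal.
The handler sa_handler is looked up for the signal. The user-mode state in pt_regs is modified so that it points at sa_handler, and a new frame is pushed onto the user-mode stack. The frame stores the original pt_regs, which points at line A, and has sa_restorer as its return address, so that sa_handler jumps to sa_restorer when it finishes.
On return to user mode, execution starts in sa_handler because pt_regs now points there.
When sa_handler finishes, the signal handler is done and execution jumps to sa_restorer.
sa_restorer makes the rt_sigreturn system call and enters the kernel again.
In the kernel, rt_sigreturn restores the original pt_regs, which again points at line A.
Returning to user mode from rt_sigreturn goes through exit_to_usermode_loop() once more.
This time pt_regs points at line A, so process A continues right after the system call; the system call itself reports the error that it was interrupted before completing.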

