[apue] File record locks cannot implement alternating execution between parent and child processes

"Synchronization? Not going to happen, not in this lifetime." - File locks

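Alternating execution between a parent and a child process means using a synchronization primitive so that at any moment only one of the two processes runs, and it then hands control to the other. The following code illustrates the idea: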

SYNC_INIT(); 

    int i=0, counter=0; 
    pid_t pid = fork (); 
    if (pid < 0)
        err_sys ("fork error"); 
    else if (pid > 0)
    {
        // parent
        for (i=0; i<NLOOPS; i+=2)
        {
            counter = update ((long *)area); 
            if (counter != i)
                err_quit ("parent: expected %d, got %d", i, counter); 
            else 
                printf ("parent increase to %d based %d\n", i+1, counter); 

            SYNC_TELL(pid, 1); 
            SYNC_WAIT(0); 
        }

        printf ("parent exit\n"); 
    }
    else 
    {
        for (i=1; i<NLOOPS+1; i+=2)
        {
            SYNC_WAIT(1); 
            counter = update ((long *)area); 
            if (counter != i)
                err_quit ("child: expected %d, got %d", i, counter); 
            else 
                printf ("child increase to %d based %d\n", i+1, counter); 

            SYNC_TELL(getppid (), 0);
        }

        printf ("child exit\n"); 
    }      

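Here area is an address in shared memory, and update increments the long that area points to and returns the value it held before the increment. After fork, the parent and child take turns updating this value.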

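The code relies on a few abstract synchronization primitives: SYNC_INIT initializes the synchronization facility, SYNC_WAIT waits for a notification from the other process, and SYNC_TELL sends a notification to the other process.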

Below is the output of a successfully synchronized run (assuming NLOOPS is 100):

create shared-memory 3801126 with size 4 ok
attach shared-memory at 0xb7733000
parent increase to 1 based 0
child increase to 2 based 1
parent increase to 3 based 2
child increase to 4 based 3
parent increase to 5 based 4
child increase to 6 based 5
parent increase to 7 based 6
child increase to 8 based 7
parent increase to 9 based 8
child increase to 10 based 9
parent increase to 11 based 10
child increase to 12 based 11
parent increase to 13 based 12
child increase to 14 based 13
parent increase to 15 based 14
child increase to 16 based 15
parent increase to 17 based 16
child increase to 18 based 17
parent increase to 19 based 18
child increase to 20 based 19
parent increase to 21 based 20
child increase to 22 based 21
parent increase to 23 based 22
child increase to 24 based 23
parent increase to 25 based 24
child increase to 26 based 25
parent increase to 27 based 26
child increase to 28 based 27
parent increase to 29 based 28
child increase to 30 based 29
parent increase to 31 based 30
child increase to 32 based 31
parent increase to 33 based 32
child increase to 34 based 33
parent increase to 35 based 34
child increase to 36 based 35
parent increase to 37 based 36
child increase to 38 based 37
parent increase to 39 based 38
child increase to 40 based 39
parent increase to 41 based 40
child increase to 42 based 41
parent increase to 43 based 42
child increase to 44 based 43
parent increase to 45 based 44
child increase to 46 based 45
parent increase to 47 based 46
child increase to 48 based 47
parent increase to 49 based 48
child increase to 50 based 49
parent increase to 51 based 50
child increase to 52 based 51
parent increase to 53 based 52
child increase to 54 based 53
parent increase to 55 based 54
child increase to 56 based 55
parent increase to 57 based 56
child increase to 58 based 57
parent increase to 59 based 58
child increase to 60 based 59
parent increase to 61 based 60
child increase to 62 based 61
parent increase to 63 based 62
child increase to 64 based 63
parent increase to 65 based 64
child increase to 66 based 65
parent increase to 67 based 66
child increase to 68 based 67
parent increase to 69 based 68
child increase to 70 based 69
parent increase to 71 based 70
child increase to 72 based 71
parent increase to 73 based 72
child increase to 74 based 73
parent increase to 75 based 74
child increase to 76 based 75
parent increase to 77 based 76
child increase to 78 based 77
parent increase to 79 based 78
child increase to 80 based 79
parent increase to 81 based 80
child increase to 82 based 81
parent increase to 83 based 82
child increase to 84 based 83
parent increase to 85 based 84
child increase to 86 based 85
parent increase to 87 based 86
child increase to 88 based 87
parent increase to 89 based 88
child increase to 90 based 89
parent increase to 91 based 90
child increase to 92 based 91
parent increase to 93 based 92
child increase to 94 based 93
parent increase to 95 based 94
child increase to 96 based 95
parent increase to 97 based 96
child increase to 98 based 97
parent increase to 99 based 98
child increase to 100 based 99
child exit
parent exit
remove that shared-memory
      

These primitives can be implemented in several ways, for example with pipes, XSI semaphores, or even plain signals. Some examples follow:

1. Using pipes

#ifdef USE_PIPE_SYNC

// pp: the parent writes to it to notify, the child reads from it to wait
// pc: the child writes to it to notify, the parent reads from it to wait
static int pp[2], pc[2]; 

void SYNC_INIT (void)
{
    if (pipe (pp) < 0 || pipe(pc) < 0)
        err_sys ("pipe error"); 
}

void SYNC_TELL (pid_t pid, int child)
{
    // close the pipe ends this process does not use, so that poll in
    // SYNC_WAIT ignores them; this can NOT be done in SYNC_INIT because
    // fork has not happened yet (later calls find -1 and close fails harmlessly)
    if (child) { 
        close (pp[0]); 
        close (pc[1]); 
        pp[0] = pc[1] = -1; 
    } else { 
        close (pc[0]); 
        close (pp[1]); 
        pc[0] = pp[1] = -1; 
    }

    if (write (child ? pp[1] : pc[1], child ? "p" : "c", 1) != 1)
        err_sys ("write error"); 
}

void SYNC_WAIT (int child /* unused */)
{
    int n = 0, m = 0; 
    struct pollfd fds[2] = {{ 0 }}; 
    // poll ignores an entry whose fd is -1, so it only acts as a placeholder
    //if (pp[0] != -1) 
    {
        fds[n].fd = pp[0]; 
        fds[n].events = POLLIN; 
        n++; 
    }

    //if (pc[0] != -1) 
    { 
        fds[n].fd = pc[0]; 
        fds[n].events = POLLIN; 
        n++; 
    }

    int ret = poll (fds, n, -1); 
    if (ret == -1)
        err_sys ("poll error"); 
    else if (ret > 0) { 
        char c = 0; 
        //printf ("poll %d from %d\n", ret, n); 
        for (m=0; m<n; ++m) {
            //printf ("poll fd %d event 0x%08x\n", fds[m].fd, fds[m].revents); 
            if (fds[m].revents & POLLIN) { 
                if (fds[m].fd == pp[0]) { 
                    if (read (pp[0], &c, 1) != 1)
                        err_sys ("read parent pipe error"); 
                    if (c != 'p')
                        err_quit ("wait parent pipe but got incorrect data %c", c); 
                }
                else {
                    if (read (pc[0], &c, 1) != 1) 
                        err_sys ("read child pipe error"); 
                    if (c != 'c') 
                        err_quit ("wait child pipe but got incorrect data %c", c); 
                }
            }
        }
    }
    else 
        printf ("poll return 0\n"); 
}

#endif      

With pipes, TELL writes one byte to a pipe, and WAIT blocks reading one byte from the corresponding read end.

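Note that WAIT does not use its child parameter here; instead it uses poll to watch both read ends and consumes a byte from whichever one has data. Reading the corresponding end directly would be simpler.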

2. Using XSI semaphores

#ifdef USE_SEM_SYNC

union semun
{
  int val;                //<= value for SETVAL
  struct semid_ds *buf; //        <= buffer for IPC_STAT & IPC_SET
  unsigned short int *array;//         <= array for GETALL & SETALL
  struct seminfo *__buf;    //        <= buffer for IPC_INFO
};

static int semid = -1;  

void SYNC_INIT ()
{
    int mode = 0666;  // 0; 
    int flag = IPC_CREAT; 
#ifdef USE_EXCL
    flag |= IPC_EXCL; 
#endif

    semid = semget (IPC_PRIVATE, 2, flag | mode); 
    if (semid < 0)
        err_sys ("semget for SYNC failed"); 

    printf ("create semaphore %d for SYNC ok\n", semid); 

    union semun sem; 
    //sem.val = 1;
    //int ret = semctl (semid, 0, SETVAL, sem); 
    //if (ret < 0)
    //    err_sys ("semctl to set val failed"); 
    
    unsigned short arr[2] = { 0 };  // must match the type of semun.array
    sem.array = arr; 
    int ret = semctl (semid, 0, SETALL, sem); 
    if (ret < 0)
        err_sys ("semctl to set all val failed"); 

    printf ("reset all semaphores ok\n"); 
}

void SYNC_TELL (pid_t pid, int child)
{
    struct sembuf sb; 
    sb.sem_op = 1; 
    sb.sem_num = child ? 1 : 0; 
    sb.sem_flg = 0;  // IPC_NOWAIT, SEM_UNDO
    int ret = semop (semid, &sb, 1); 
    if (ret < 0)
        printf ("semop to release resource failed, ret %d, errno %d\n", ret, errno); 
    else 
        printf ("release %d resource %d\n", sb.sem_op, ret); 
}

void SYNC_WAIT (int child)
{
    struct sembuf sb; 
    sb.sem_op = -1; 
    sb.sem_num = child ? 1 : 0; 
    sb.sem_flg = 0;  // IPC_NOWAIT, SEM_UNDO
    int ret = semop (semid, &sb, 1); 
    if (ret < 0)
        printf ("semop to require resource failed, ret %d, errno %d\n", ret, errno); 
    else 
        printf ("require %d resource %d\n", sb.sem_op, ret); 
}

#endif      

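With XSI semaphores, TELL performs a V operation on the corresponding semaphore, releasing one resource; WAIT performs a P operation on the corresponding semaphore, acquiring one resource, and blocks if none is available.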

3. Using signals

#ifdef USE_SIGNAL_SYNC

static volatile sig_atomic_t sigflag; 
static sigset_t newmask, oldmask, zeromask; 

static void sig_usr (int signo)
{
    // only set the flag here: printf is not async-signal-safe
    (void) signo; 
    sigflag = 1; 
}

void SYNC_INIT ()
{
    if (apue_signal (SIGUSR1, sig_usr) == SIG_ERR)
        err_sys ("signal (SIGUSR1) error"); 
    if (apue_signal (SIGUSR2, sig_usr) == SIG_ERR)
        err_sys ("signal (SIGUSR2) error"); 

    sigemptyset (&zeromask); 
    sigemptyset (&newmask); 
    sigaddset (&newmask, SIGUSR1); 
    sigaddset (&newmask, SIGUSR2); 

    if (sigprocmask (SIG_BLOCK, &newmask, &oldmask) < 0)
        err_sys ("SIG_BLOCK error"); 
}

void SYNC_TELL (pid_t pid, int child)
{
    kill (pid, child ? SIGUSR1 : SIGUSR2); 
}

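void SYNC_WAIT (int child /* unused */)
{
    // SIGUSR1/SIGUSR2 stay blocked outside sigsuspend, so a signal that
    // arrives between the flag check and sigsuspend stays pending instead
    // of being lost; restoring oldmask here would reopen that race on the
    // next loop iteration
    while (sigflag == 0)
        sigsuspend (&zeromask); 

    sigflag = 0; 
}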

#endif      

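With plain signals, SIGUSR1 is sent to notify the child and SIGUSR2 to notify the parent. TELL sends the signal to the other process; WAIT suspends in sigsuspend and returns only once a signal has arrived.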

Note that TELL must be given the process ID of the target, hence the extra pid parameter, which the two previous methods do not use. This is one drawback of signals.

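However, the last exercise of APUE chapter 15 asks for this alternating execution to be implemented with file record locks, and that turns out to be an impossible task!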

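Suppose WAIT is implemented by locking the file (or one byte of it) and TELL by unlocking it. Record locks then show the following shortcomings, which make them unfit for the job: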

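1. Record locks are tied to a file and a process: a child created by fork does not inherit the locks its parent acquired, so for the child the locks set before fork are gone;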

2. When a process locks a file, or a byte of it, that it has already locked, the call behaves as if there were no lock: it succeeds immediately and never blocks;

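For problem 1, the direct effect is that after the parent acquires its locks and forks, the child starts with no initial locks at all, which makes it hard to synchronize the two.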

This can be partly worked around by re-initializing the locks in the child, but a race between the two processes remains, so the result is not airtight.

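For problem 2, when a process in its work loop TELLs the other process and then WAITs on its own primitive (implemented by locking), the WAIT succeeds immediately even if the other process has not unlocked the file or byte, because the waiting process already holds that lock. As a result one process may run many iterations in a row while the other never gets a turn (although the two still never run at the same time).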

The conclusion: for alternating execution, pipes, semaphores, and signals all work, while file locks do not.

Test program

Implementations