
SIGSEGV (Segmentation Fault) Caused by Allocating Too Large an Array

I. Background

While coding today, I ran into a segmentation fault.

-> % ./a.out 
the size is: 
[]     segmentation fault (core dumped)  ./a.out 
           

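Tracing the program with print statements showed that the fault occurs where the array is defined. The code looked fine to me, so I ran it under gdb, with this result: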

(gdb) r 
Starting program: /home/signal/a.out 
the size is: 

Program received signal SIGSEGV, Segmentation fault.
 in main (argc=, argv=) at sigsegv.c:
      bzero(test, sizeof(test));
(gdb) s

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit
           

So I set out to test this signal specifically: SIGSEGV.

II. Locating the Problem

1. Test program

By now I had a rough idea that the array was asking for too much memory, so I wrote a small test program:

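#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <strings.h>  /* bzero */

int main(int argc, char *argv[])
{
    int size;

    if (argc != 2) {
        printf("Usage: %s [size]\n", argv[0]);
        return -1;
    }
    size = atoi(argv[1]);
    printf("the size is: 0x%x\n", size);
    /* Variable-length array: its storage comes from the stack. */
    char test[size];
    /* First write to the array; gdb reports the fault on this line. */
    bzero(test, sizeof(test));

    return 0;
}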
           

Running it gives:

-> % ./a.out 
the size is: 
[]     segmentation fault (core dumped)  ./a.out 

-> % ./a.out 
the size is: 
           

As the two runs show, the program segfaults once the requested size exceeds a certain value.

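2. Debugging the core file with gdb

Running the program under gdb prints the error shown earlier. If the core file size limit is raised with ulimit -c (for example, ulimit -c unlimited), the crash leaves a core file, which can be loaded into gdb: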

-> % gdb -c core ./a.out 
GNU gdb (Ubuntu -ubuntu5~) 
Copyright (C)  Free Software Foundation, Inc.
License GPLv3+: GNU GPL version  or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...done.
[New LWP ]
Core was generated by `./a.out '.
Program terminated with signal SIGSEGV, Segmentation fault.
#   in main (argc=, argv=) at sigsegv.c:
      bzero(test, sizeof(test));
(gdb) s
The program is not being run.
(gdb) bt
#   in main (argc=, argv=) at sigsegv.c:
(gdb) 
           
3. Tracing system calls with strace

Tracing the system calls with strace prints the following:

-> % strace ./a.out 
execve("./a.out", ["./a.out", "9000000"], [/*  vars */]) = 
brk()                                  = 
access("/etc/ld.so.nohwcap", F_OK)      = - ENOENT (No such file or directory)
mmap2(NULL, , PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -, ) = 
access("/etc/ld.so.preload", R_OK)      = - ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 
fstat64(, {st_mode=S_IFREG|, st_size=, ...}) = 
mmap2(NULL, , PROT_READ, MAP_PRIVATE, , ) = 
close()                                = 
access("/etc/ld.so.nohwcap", F_OK)      = - ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 
read(, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\233\1\0004\0\0\0"..., ) = 
fstat64(, {st_mode=S_IFREG|, st_size=, ...}) = 
mmap2(NULL, , PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, , ) = 
mmap2(, , PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, , ) = 
mmap2(, , PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -, ) = 
close()                                = 
mmap2(NULL, , PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -, ) = 
set_thread_area({entry_number:- -> , base_addr:, limit:, seg_32bit:, contents:, read_exec_only:, limit_in_pages:, seg_not_present:, useable:}) = 
mprotect(, , PROT_READ)   = 
mprotect(, , PROT_READ)    = 
mprotect(, , PROT_READ)   = 
munmap(, )               = 
fstat64(, {st_mode=S_IFCHR|, st_rdev=makedev(, ), ...}) = 
mmap2(NULL, , PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -, ) = 
write(, "1the size is: 0x895440\n", 231the size is: 
) = 
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=} ---
+++ killed by SIGSEGV (core dumped) +++
[]     segmentation fault (core dumped)  strace ./a.out 
           

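This more or less confirms the diagnosis: right after the size is printed, the process is killed by SIGSEGV with si_code=SEGV_MAPERR, meaning the faulting address is not mapped into the process.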

III. Analysis and Solution

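SIGSEGV: indicates that the process has made an invalid memory reference (which usually means the program has a bug, such as dereferencing an uninitialized pointer). The name SEGV stands for "segmentation violation".

The default action for SIGSEGV is terminate + core.

(from Advanced Programming in the UNIX Environment)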

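A program may receive SIGSEGV when it handles memory incorrectly. An array defined inside a function, including a variable-length array such as char test[size], is stored on the program stack, and every operating system limits the size of the stack. If the array needs more space than the stack provides, accessing it becomes an invalid memory reference. The operating system may report the nature of the error to the application through the signal stack. The default action when a program receives SIGSEGV is abnormal termination: the process ends, and a core file may be generated to help with debugging.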

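SIGSEGV can be caught. That is, an application can request the action it wants in place of the default one: ignore the signal, call a function, or restore the default action. Ignoring a SIGSEGV that was raised by a real memory fault (rather than sent with kill or raise) results in undefined behavior.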

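When a segmentation fault caused by SIGSEGV turns up again in future debugging, check the program's memory usage carefully for invalid references, and keep large buffers off the stack.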