Thread deadlock caused by a malloc failure

Environment: Linux3.44 / libc.so.6 2.17

Error stack trace:

Thread 1 (Thread 0x7fcae15e9740 (LWP 17012)):
#0  0x00007fcadededbd8 in pthread_once () from /lib64/libpthread.so.0
#1  0x00007fcadeb2a08c in backtrace () from /lib64/libc.so.6
#2  0x00007fcadea95dd4 in __libc_message () from /lib64/libc.so.6
#3  0x00007fcadea9bbf7 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fcadea9f125 in _int_malloc () from /lib64/libc.so.6
#5  0x00007fcadeaa011c in malloc () from /lib64/libc.so.6
#6  0x00007fcae13ee8a3 in _dl_map_object () from /lib64/ld-linux-x86-64.so.2
#7  0x00007fcae13f98d1 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#8  0x00007fcae13f5314 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9  0x00007fcae13f925b in _dl_open () from /lib64/ld-linux-x86-64.so.2
#10 0x00007fcadeb50912 in do_dlopen () from /lib64/libc.so.6
#11 0x00007fcae13f5314 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#12 0x00007fcadeb509d2 in __libc_dlopen_mode () from /lib64/libc.so.6
#13 0x00007fcadeb29f75 in init () from /lib64/libc.so.6
#14 0x00007fcadededbe0 in pthread_once () from /lib64/libpthread.so.0
#15 0x00007fcadeb2a08c in backtrace () from /lib64/libc.so.6
#16 0x00007fcadea95dd4 in __libc_message () from /lib64/libc.so.6
#17 0x00007fcadea9bbf7 in malloc_printerr () from /lib64/libc.so.6
#18 0x00007fcadea9f125 in _int_malloc () from /lib64/libc.so.6
#19 0x00007fcadeaa0b3a in calloc () from /lib64/libc.so.6
#20 0x0000000000510ad0 in pal_mem_calloc (type=MTYPE_LS_PREFIX, size=12) at pal_memory.c:52
#21 0x00000000005d19af in mfh_calloc (type=MTYPE_LS_PREFIX, size=12) at memory.c:168
#22 0x000000000056c6cd in ls_prefix_new (size=8) at ls_prefix.c:23
#23 0x000000000054e9c2 in ls_node_set (table=0x1c640d0, prefix=0x7fffeb616490) at ls_table.c:67
#24 0x000000000054f238 in ls_node_get (table=0x1c640d0, p=0x7fffeb616490) at ls_table.c:362
#25 0x00000000004b01e6 in ospf6_lsdb_add (lsdb=0x1c63a30, lsa=0x1d77f20) at ospf6_lsdb.c:316
#26 0x000000000049bca4 in ospf6_ls_retransmit_add (nbr=0x1c62b40, lsa=0x1d77f20) at ospf6_flood.c:140
#27 0x000000000049c5e8 in ospf6_flood_through_interface (oi=0x1c65d50, inbr=0x0, lsa=0x1d77f20) at ospf6_flood.c:396
#28 0x000000000049ca08 in ospf6_flood_through_as (top=0x1c5e2c0, inbr=0x0, lsa=0x1d77f20) at ospf6_flood.c:501
#29 0x000000000049cac0 in ospf6_flood_through (inbr=0x0, lsa=0x1d77f20) at ospf6_flood.c:519
#30 0x000000000049507a in ospf6_lsa_originate (top=0x1c5e2c0, type=5, param=0x1d78c60) at ospf6_lsa.c:3339
#31 0x00000000004cf778 in ospf6_redist_map_lsa_refresh (top=0x1c5e2c0, map=0x1d78c60) at ospf6_nsm.c:726
#32 0x00000000004cfa89 in ospf6_redist_map_update (table=0x1c5ea90, ri=0x1f40e40, type=5 '\005', parent=0x1c5e2c0) at ospf6_nsm.c:851
#33 0x00000000004d0b09 in ospf6_redistribute_timer (t=0x7fffeb616870) at ospf6_nsm.c:1240
#34 0x0000000000552526 in thread_call (thread=0x7fffeb616870) at thread.c:1283
#35 0x0000000000469290 in ospf6_start (daemon_mode=1, config_file=0x0, vty_port=2606, progname=0x7fffeb6176b4 "ospf6d") at ospf6_main.c:207
#36 0x0000000000468e6f in main (argc=2, argv=0x7fffeb616a18, envp=0x7fffeb616a30) at ../../platform/linux/ospf6.c:170

           

I searched online for information and am recording the possible causes here:

Problems with signal handlers

Across open-source code, hardly anyone puts much code inside a signal handler. Why is that?

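The reason is that a signal can interrupt the code a thread is running at any moment. When the signal handler is inserted into the execution flow at that point, some functions can end up being re-entered. In the example above, thread1 was creating a new object and was in the middle of malloc, allocating memory, when a signal suddenly interrupted it, and the signal handler itself called malloc. But malloc is not reentrant (it is not async-signal-safe), so the process hung.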
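One way to avoid this kind of re-entry is to restrict a handler to async-signal-safe calls. The sketch below is only an illustration, not code from the program in the trace: the names `write_only_handler`, `install_write_only_handler`, and `g_handler_fd` are hypothetical. The handler reports the signal with write(2), which POSIX lists as async-signal-safe, instead of printf or anything else that might allocate.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: descriptor the handler writes to (stderr by default). */
static volatile sig_atomic_t g_handler_fd = STDERR_FILENO;

/* Uses only async-signal-safe operations: no malloc, no stdio, no locks. */
static void write_only_handler(int signo)
{
    static const char msg[] = "signal received\n";
    int saved_errno = errno;   /* write() may overwrite errno */
    ssize_t n;

    (void)signo;
    n = write((int)g_handler_fd, msg, sizeof msg - 1);
    (void)n;                   /* nothing safe to do on failure here */
    errno = saved_errno;
}

/* Installs the handler for signo; returns 0 on success, -1 on error. */
int install_write_only_handler(int signo, int fd)
{
    struct sigaction sa;

    g_handler_fd = fd;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = write_only_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Calling `install_write_only_handler(SIGUSR2, STDERR_FILENO)` and then sending SIGUSR2 to the process should print the message without touching the heap, so the handler cannot collide with a malloc that was interrupted.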

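Another problem is that a child process inherits many resources from its parent, including signal settings. The program in question set up its signal handling first and only then called pthread_create to start many worker threads, without blocking the signal in them. As a result, every thread could run that signal handler, and all of the threads hung.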

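There are many possible fixes. The usual one is to do only a small amount of work in the signal handler and notify the other threads so that they release their own resources.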
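A minimal sketch of the "do very little in the handler" approach, assuming a simple flag is enough to notify the rest of the program. The names `g_signal_seen`, `install_flag_handler`, and `signal_was_seen` are made up for illustration. The handler only sets a `volatile sig_atomic_t` flag, and ordinary code polls it and does the real cleanup outside signal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by normal (non-signal) code. */
static volatile sig_atomic_t g_signal_seen = 0;

/* The handler does nothing except record that the signal arrived. */
static void on_signal(int signo)
{
    (void)signo;
    g_signal_seen = 1;
}

/* Installs the flag-setting handler; returns 0 on success, -1 on error. */
int install_flag_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  /* restart interrupted system calls */
    return sigaction(signo, &sa, NULL);
}

/* Polled from a main loop or worker loop; non-zero once the signal came. */
int signal_was_seen(void)
{
    return g_signal_seen != 0;
}
```

A worker loop would then look roughly like `while (!signal_was_seen()) { do_work(); }` followed by its own cleanup, where `do_work` stands for whatever the thread normally does. Freeing memory or taking locks happens in that normal context, never inside the handler.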

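For a multithreaded program, it is better to dedicate a single thread to handling signals in a blocking fashion, which fits the spirit of multithreaded design. For example, before spawning the child threads, use pthread_sigmask so that signals will not interrupt them, and in the main thread use the blocking sigwait call to handle signals synchronously. Complex operations can be done there without worrying about reentrancy.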
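The pthread_sigmask plus sigwait pattern can be sketched roughly as follows. This is an illustration under stated assumptions, not the original program's code: `handle_one_signal_synchronously`, `worker`, and `struct worker_arg` are hypothetical names, and the function sends the signal to its own process only so that the example finishes on its own. The signal is blocked before pthread_create, so the worker inherits the blocked mask, and the calling thread then collects the signal with sigwait.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical argument block for the worker thread. */
struct worker_arg {
    int signo;    /* signal whose mask state the worker checks */
    int blocked;  /* set to 1 if signo is blocked in the worker */
};

/* Worker: records whether it inherited the blocked signal mask. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    sigset_t current;

    /* With a NULL set, pthread_sigmask only reports the current mask. */
    pthread_sigmask(SIG_BLOCK, NULL, &current);
    arg->blocked = sigismember(&current, arg->signo);
    return NULL;
}

/* Blocks signo, starts one worker, then waits for signo synchronously.
 * Returns the signal number received, or -1 on error. */
int handle_one_signal_synchronously(int signo, int *worker_blocked)
{
    struct worker_arg arg = { signo, 0 };
    sigset_t set;
    pthread_t tid;
    int received = 0;
    int rc;

    sigemptyset(&set);
    sigaddset(&set, signo);

    /* Block BEFORE creating threads: new threads inherit this mask. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, &arg) != 0)
        return -1;

    /* Demo only: signal ourselves. It stays pending until sigwait. */
    rc = kill(getpid(), signo);
    if (rc == 0)
        rc = sigwait(&set, &received);

    pthread_join(tid, NULL);
    *worker_blocked = arg.blocked;

    /* Ordinary thread context here: malloc, stdio, locks are all fine. */
    return rc == 0 ? received : -1;
}
```

Because sigwait returns in normal thread context rather than inside a handler, the code that follows it can allocate memory or log freely. On older glibc versions this needs to be built with `-pthread`.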

A case closer to this failure:

nginx: worker process: malloc(): memory corruption

I used valgrind to check for memory leaks, and it detected some errors:

==2243== Invalid write of size 1
==2243== at 0x4A08088: memcpy (mc_replace_strmem.c:628)
==2243== by 0x4448C9: ngx_http_proxy_subs_headers (ngx_http_proxy_subs_filter.c:149)
==2243== by 0x45B2FB: ngx_http_proxy_create_request (ngx_http_proxy_module.c:1235)
==2243== by 0x43EA7E: ngx_http_upstream_init_request (ngx_http_upstream.c:505)
==2243== by 0x43EE92: ngx_http_upstream_init (ngx_http_upstream.c:446)
==2243== by 0x4361C0: ngx_http_read_client_request_body (ngx_http_request_body.c:59)
==2243== by 0x459972: ngx_http_proxy_handler (ngx_http_proxy_module.c:703)
==2243== by 0x42BD23: ngx_http_core_content_phase (ngx_http_core_module.c:1396)
==2243== by 0x4269A2: ngx_http_core_run_phases (ngx_http_core_module.c:877)
==2243== by 0x426A9D: ngx_http_handler (ngx_http_core_module.c:860)
==2243== by 0x430661: ngx_http_process_request (ngx_http_request.c:1874)
==2243== by 0x430D97: ngx_http_process_request_headers (ngx_http_request.c:1318)
==2243== Address 0x5a1f29a is not stack'd, malloc'd or (recently) free'd
==2243==
==2243== Invalid write of size 8
==2243== at 0x4A080B3: memcpy (mc_replace_strmem.c:628)
==2243== by 0x4448C9: ngx_http_proxy_subs_headers (ngx_http_proxy_subs_filter.c:149)
==2243== by 0x45B2FB: ngx_http_proxy_create_request (ngx_http_proxy_module.c:1235)
==2243== by 0x43EA7E: ngx_http_upstream_init_request (ngx_http_upstream.c:505)
==2243== by 0x43EE92: ngx_http_upstream_init (ngx_http_upstream.c:446)
==2243== by 0x4361C0: ngx_http_read_client_request_body (ngx_http_request_body.c:59)
==2243== by 0x459972: ngx_http_proxy_handler (ngx_http_proxy_module.c:703)
==2243== by 0x42BD23: ngx_http_core_content_phase (ngx_http_core_module.c:1396)
==2243== by 0x4269A2: ngx_http_core_run_phases (ngx_http_core_module.c:877)
==2243== by 0x426A9D: ngx_http_handler (ngx_http_core_module.c:860)
==2243== by 0x430661: ngx_http_process_request (ngx_http_request.c:1874)

The errors were due to an out-of-bounds ngx_copy(), caused by my own code. I modified the corresponding code, and it has been running fine since.
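The actual fix in that report is not shown, so the following is only a guess at the general shape of such a correction. It assumes the usual pattern of appending into a fixed buffer with a copy routine that returns a pointer just past the copied bytes, which is how ngx_copy() is commonly used. The helper `bounded_copy` is hypothetical and is not part of nginx. It refuses to copy when the remaining space is too small, rather than writing past the end and corrupting malloc's bookkeeping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes from src to dst only if they fit
 * before end (one past the last valid byte of the buffer).
 * Returns the position just past the copied bytes, or NULL if the copy
 * would overflow, in which case nothing is written. */
unsigned char *bounded_copy(unsigned char *dst, const unsigned char *end,
                            const void *src, size_t n)
{
    if (dst == NULL || dst > end || (size_t)(end - dst) < n)
        return NULL;

    memcpy(dst, src, n);
    return dst + n;
}
```

A caller would keep a cursor into the buffer, pass `buf + sizeof buf` as `end`, and treat a NULL return as "allocate a larger buffer or fail the request" instead of continuing to write.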

pthread_once() self deadlock
