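Background
The Android emulator runs on a PC, and Android apps run inside the emulator. When hardware virtualization (VT-x, Intel's hardware virtualization technology, or AMD-V, the equivalent on AMD CPUs) is not enabled in the PC's BIOS, games that use ARM native libraries crash inside the emulator, either right away or after running for a while. The crash point is __get_tls()+6 (the backtrace below labels this frame __get_thread). Taking the game 當樂.apk as an example: delete the x86 libraries under libs, keep only the ARM libraries, then install and run it. The full crash log is as follows: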
- :: E/ZKOPCountUtil( ): find Name = 當樂
- :: D/dalvikvm( ): GC_CONCURRENT freed K, % free K/K, paused ms+ms, total ms
- :: D/dalvikvm( ): WAIT_FOR_CONCURRENT_GC blocked ms
- :: D/dalvikvm( ): WAIT_FOR_CONCURRENT_GC blocked ms
- :: D/dalvikvm( ): WAIT_FOR_CONCURRENT_GC blocked ms
- :: W/View ( ): requestLayout() improperly called by android.support.v7.widget.AppCompatTextView{52831f4c V.ED.... ......I. 20,0-148,91 #7f0d0438 app:id/expand_title} during layout: running second layout pass
- :: D/Volley ( ): [] b.a: HTTP response for request=<[ ] http://res5.d.cn/cp/img/502487/o_1bbl6epie170sbec184qs9i1ggou.png 0x22e400ee LOW 2> [lifetime=4156], [size=67], [rc=200], [retryCount=0]
- :: F/libc ( ): Fatal signal (SIGSEGV) at x24244c8d (code=), thread (Thread-)
- :: I/DEBUG ( ): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
- :: I/DEBUG ( ): Build fingerprint: 'SAMSUNG/hlteatt/hlteuc:4.4.4/tt/eng.jenkins.20170306.140753:userdebug/test-keys'
- :: I/DEBUG ( ): Revision: '0'
- :: I/DEBUG ( ): pid: , tid: , name: Thread- >>> com.diguayouxi <<<
- :: I/DEBUG ( ): signal (SIGSEGV), code (SEGV_MAPERR), fault addr c8d
- :: D/dalvikvm( ): GC_CONCURRENT freed K, % free K/K, paused ms+ms, total ms
- :: I/GAv4-SVC( ): Google Analytics . is starting up.
- :: I/DEBUG ( ): eax c89 ebx b76b7fcc ecx edx
- :: I/DEBUG ( ): esi b76c694c edi
- :: I/DEBUG ( ): xcs xds b xes b xfs b xss b
- :: I/DEBUG ( ): eip b76343c6 ebp esp cc flags
- :: D/dalvikvm( ): GC_CONCURRENT freed K, % free K/K, paused ms+ms, total ms
- :: I/DEBUG ( ):
- :: I/DEBUG ( ): backtrace:
- :: I/DEBUG ( ): #00 pc c6 /system/lib/libc.so (__get_thread+)
- :: I/DEBUG ( ): #01 pc de2d /system/lib/libc.so (pthread_mutex_lock+)
- :: I/DEBUG ( ): #02 pc a745 /system/lib/libc.so (flockfile+)
- :: I/DEBUG ( ): #03 pc f /system/lib/libc.so (fread+)
- :: I/DEBUG ( ): #04 pc f6a /system/lib/libc.so (android_getaddrinfo_proxy+)
- :: I/DEBUG ( ): #05 pc c30 /system/lib/libc.so (android_getaddrinfoforiface+)
- :: I/DEBUG ( ): #06 pc e97 /system/lib/libc.so (getaddrinfo+)
- :: I/DEBUG ( ): #07 pc /system/lib/libjavacore.so (Posix_getaddrinfo(_JNIEnv*, _jobject*, _jstring*, _jobject*)+)
- :: I/DEBUG ( ): #08 pc a4ab /system/lib/libdvm.so (dvmPlatformInvoke+)
- :: I/DEBUG ( ): #09 pc a27 [heap]
- :: I/DEBUG ( ): #10 pc da2 /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+434)
03-27 15:: I/DEBUG ( ): #11 pc b8 /system/lib/libdvm.so
- :: I/DEBUG ( ): #12 pc cf7 <unknown>
- :: I/DEBUG ( ): #13 pc b962 /system/lib/libdvm.so (dvmMterpStd(Thread*)+)
- :: I/DEBUG ( ): #14 pc /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+217)
03-27 15:: I/DEBUG ( ): #15 pc bd027 /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, char*)+759)
03-27 15:: I/DEBUG ( ): #16 pc bd437 /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+55)
03-27 15:: I/DEBUG ( ): #17 pc c3 /system/lib/libdvm.so (interpThreadStart(void*)+)
- :: I/DEBUG ( ): #18 pc bc3c /system/lib/libc.so (__thread_entry+)
- :: I/DEBUG ( ): #19 pc e1b5 /system/lib/libc.so (__pthread_clone+)
- :: I/DEBUG ( ): #20 pc fdf /system/lib/libdvm.so (internalThreadStart(void*)+)
- :: I/DEBUG ( ):
- :: I/DEBUG ( ): stack:
- :: I/DEBUG ( ): c b4db080e /system/lib/libdvm.so (dvmMterp_OP_RETURN_VOID_BARRIER+)
- :: I/DEBUG ( ): b8cadbc0 [heap]
- :: I/DEBUG ( ):
- :: I/DEBUG ( ):
- :: I/DEBUG ( ): c b7629f39 /system/lib/libc.so (pthread_mutex_unlock+)
- :: I/DEBUG ( ): a0
- :: I/DEBUG ( ): a4 db6fdee /data/dalvik-cache/[email protected]@[email protected]
- :: I/DEBUG ( ): a8 dce4
- :: I/DEBUG ( ): ac b7629fba /system/lib/libc.so (pthread_mutex_unlock+)
- :: I/DEBUG ( ): b0
- :: I/DEBUG ( ): b4 b8cadbd0 [heap]
- :: I/DEBUG ( ): b8 dd30518 /dev/ashmem/dalvik-LinearAlloc (deleted)
- :: I/DEBUG ( ): bc b7629fba /system/lib/libc.so (pthread_mutex_unlock+)
- :: I/DEBUG ( ): c0
- :: I/DEBUG ( ): c4 b8cae030 [heap]
- :: I/DEBUG ( ): c8 b7629d69 /system/lib/libc.so (pthread_mutex_lock+)
- :: I/DEBUG ( ): #00 cc b7629e2e /system/lib/libc.so (pthread_mutex_lock+)
- :: I/DEBUG ( ): #01 d0 a59e7eec /dev/ashmem/dalvik-heap (deleted)
- :: I/DEBUG ( ): d4 b8ea6808 [heap]
- :: I/DEBUG ( ): d8 b76bc718
- :: I/DEBUG ( ): dc b762ed4f /system/lib/libc.so (dlmalloc+)
- :: I/DEBUG ( ): e0 b76bc800
- :: I/DEBUG ( ): e4 b8cae030 [heap]
- :: I/DEBUG ( ): e8
- :: I/DEBUG ( ): ec
- :: I/DEBUG ( ): f0
- :: I/DEBUG ( ): f4 b8e2bee8 [heap]
- :: I/DEBUG ( ): f8 b7629d69 /system/lib/libc.so (pthread_mutex_lock+)
- :: I/DEBUG ( ): fc b76b7fcc /system/lib/libc.so
- :: I/DEBUG ( ): b8ea6808 [heap]
- :: I/DEBUG ( ):
- :: I/DEBUG ( ): b76c63a0
- :: I/DEBUG ( ): c b7676746 /system/lib/libc.so (flockfile+)
- :: I/DEBUG ( ): #02 b76c694c
- :: I/DEBUG ( ): b8e2bee8 [heap]
- :: I/DEBUG ( ):
- :: I/DEBUG ( ): c b76b7fcc /system/lib/libc.so
- :: I/DEBUG ( ): da [stack:]
- :: I/DEBUG ( ): b7676726 /system/lib/libc.so (flockfile+)
- :: I/DEBUG ( ): b76b7fcc /system/lib/libc.so
- :: I/DEBUG ( ): c b7662520 /system/lib/libc.so (fread+)
- :: I/DEBUG ( ):
- :: I/DEBUG ( ): memory map around fault addr c8d:
- :: I/DEBUG ( ): c142000-c145000 rw-
- :: I/DEBUG ( ): (no map for address)
- :: I/DEBUG ( ): d000-e000 ---
- :: I/PhenotypeConfigurator( ): Scheduling Phenotype for one-off execution seconds from now ()
- :: D/dalvikvm( ): GC_CONCURRENT freed K, % free K/K, paused ms+ms, total ms
Locating the problem
Following the crash log leads to the function __get_tls(), which is implemented in the source as follows:
//android-4.4.4\bionic\libc\arch-x86\bionic\__get_tls.c
/* see the implementation of __set_tls and pthread.c to understand this
* code. Basically, the content of gs:[0] always is a pointer to the base
* address of the tls region
*/
void* __get_tls(void)
{
    void* tls;
    asm ( " movl %%gs:0, %0" : "=r"(tls) );
    return tls;
}
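As the code comment says, the gs register addresses the TLS (Thread Local Storage) region: gs:[0] always holds a pointer to its base address. IDA shows the crash point more directly. Below is this code in libc.so as opened in IDA, where the function is labelled __get_thread. The crash at __get_tls()+6 is the line mov eax, [eax+4], an indirect load that faults.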
.text:C0
.text:C0 ; =============== S U B R O U T I N E =======================================
.text:C0
.text:C0
.text:C0 public __get_thread
.text:C0 __get_thread proc near ; CODE XREF: __pthread_cleanup_push+Bp
.text:C0 ; __pthread_cleanup_pop+Bp ...
.text:C0 mov eax, large gs:0
.text:C6 mov eax, [eax+4]
.text:C9 nop
.text:CA nop
.text:CB nop
.text:CC nop
.text:CD retn
.text:CD __get_thread endp
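So here is the question: eax is loaded from gs:0, and the indirect access at eax+4 then fails, so the value obtained through gs must be wrong. According to the crash log, eax holds exactly the value read from gs:0, and that address is invalid. What we need to find out now is where the gs register is set up and what it is used for.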
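The arithmetic behind that conclusion can be sketched in C. This is only an illustration: the helper names are made up here, the slot index 1 (named TLS_SLOT_THREAD_ID in bionic, as far as I know) is my assumption about what [eax+4] reads, and 0x24244c8d is the fault address from the "Fatal signal" line of the log, whose leading 0 appears to have been lost.

```c
#include <stdint.h>

/* Assumption: on 32-bit x86 every TLS slot is one 4-byte pointer. */
#define TLS_SLOT_SIZE 4u

/* Assumption: slot 1 is the slot that __get_thread reads. */
#define TLS_SLOT_THREAD_ID 1u

/* Address dereferenced by "mov eax, [eax+4]" for a given TLS base. */
uint32_t tls_slot_address(uint32_t tls_base, uint32_t slot)
{
    return tls_base + slot * TLS_SLOT_SIZE;
}

/* Inverse: recover the value read from gs:0 from the reported fault address. */
uint32_t tls_base_from_fault(uint32_t fault_addr, uint32_t slot)
{
    return fault_addr - slot * TLS_SLOT_SIZE;
}
```

With the fault address 0x24244c8d, tls_base_from_fault(0x24244c8d, TLS_SLOT_THREAD_ID) gives 0x24244c89, whose low digits match the c89 shown for eax in the register dump. In other words, the pointer read from gs:0 itself points into unmapped memory.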
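Since the code comment says gs is what holds the TLS base pointer, and the TLS segments are described by entries in the kernel's GDT, gs should be set up by the kernel. Taking the x86 segment layout as an example, the segments are defined in asm\Segment.h, as follows: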
// genymotion_kernel_3.10\arch\x86\include\asm\Segment.h
/*
* The layout of the per-CPU GDT under Linux:
*
* 0 - null
* 1 - reserved
* 2 - reserved
* 3 - reserved
*
* 4 - unused <==== new cacheline
* 5 - unused
*
* ------- start of TLS (Thread-Local Storage) segments:
*
* 6 - TLS segment #1 [ glibc's TLS segment ]
* 7 - TLS segment #2 [ Wine's %fs Win32 segment ]
* 8 - TLS segment #3
* 9 - reserved
* 10 - reserved
* 11 - reserved
*
* ------- start of kernel segments:
*
* 12 - kernel code segment <==== new cacheline
* 13 - kernel data segment
* 14 - default user CS
* 15 - default user DS
* 16 - TSS
* 17 - LDT
* 18 - PNPBIOS support (16->32 gate)
* 19 - PNPBIOS support
* 20 - PNPBIOS support
* 21 - PNPBIOS support
* 22 - PNPBIOS support
* 23 - APM BIOS support
* 24 - APM BIOS support
* 25 - APM BIOS support
*
* 26 - ESPFIX small SS
* 27 - per-cpu [ offset to per-cpu data area ]
* 28 - stack_canary-20 [ for stack protector ]
* 29 - unused
* 30 - unused
* 31 - TSS for double fault handler
*/
... ...
// part of the code omitted
/*
* Save a segment register away
*/
#define savesegment(seg, value) \
asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
/*
* x86_32 user gs accessors.
*/
#ifdef CONFIG_X86_32
#ifdef CONFIG_X86_32_LAZY_GS
#define get_user_gs(regs) (u16)({unsigned long v; savesegment(gs, v); v;})
#define set_user_gs(regs, v) loadsegment(gs, (unsigned long)(v))
#define task_user_gs(tsk) ((tsk)->thread.gs)
#define lazy_save_gs(v) savesegment(gs, (v))
#define lazy_load_gs(v) loadsegment(gs, (v))
#else /* X86_32_LAZY_GS */
#define get_user_gs(regs) (u16)((regs)->gs)
#define set_user_gs(regs, v) do { (regs)->gs = (v); } while (0)
#define task_user_gs(tsk) (task_pt_regs(tsk)->gs)
#define lazy_save_gs(v) do { } while (0)
#define lazy_load_gs(v) do { } while (0)
#endif /* X86_32_LAZY_GS */
#endif /* X86_32 */
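To connect this layout to the value user space loads into gs: an x86 segment selector packs the descriptor index, a table indicator and the requested privilege level into 16 bits. Below is a small sketch of that encoding. The helper names are mine, and the idea that the thread's TLS descriptor sits in GDT entry 6 is an assumption based on the table above, not something the crash log confirms.

```c
#include <stdint.h>

/* x86 selector layout: bits 15..3 = index, bit 2 = table (0 GDT, 1 LDT),
 * bits 1..0 = requested privilege level (RPL). */
uint16_t make_selector(uint16_t index, uint16_t use_ldt, uint16_t rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}

/* Descriptor table index encoded in the selector. */
uint16_t selector_index(uint16_t sel)
{
    return (uint16_t)(sel >> 3);
}

/* 1 if the selector refers to the LDT, 0 if it refers to the GDT. */
uint16_t selector_is_ldt(uint16_t sel)
{
    return (uint16_t)((sel >> 2) & 1u);
}

/* Requested privilege level (3 for user space). */
uint16_t selector_rpl(uint16_t sel)
{
    return (uint16_t)(sel & 3u);
}
```

Under that assumption, make_selector(6, 0, 3) yields 0x33, i.e. GDT entry 6 at user privilege. A gs selector whose index falls outside 6..8 would not refer to one of the TLS segments in this layout.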
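Solution
The table above shows how the whole GDT is partitioned, including the TLS segments. The key part is at the end: the accessors for the gs register. With CONFIG_X86_32 set, there are two ways of obtaining the value of gs, selected by whether the kernel macro CONFIG_X86_32_LAZY_GS is defined.
Checking the kernel configuration shows that CONFIG_X86_32_LAZY_GS is enabled. In that case get_user_gs() ignores regs and returns the local variable v. Note that v is not read uninitialized: it is the output operand of savesegment(gs, v), so it holds whatever the live %gs register contains at that moment. Without CONFIG_X86_32_LAZY_GS, the value is taken from the saved register frame, (regs)->gs; where regs->gs gets set is left aside for now. This may still not make clear what role gs plays across the kernel or how it flows through it. That is fine, and is a topic for a deeper look later. As for fixing this problem: since CONFIG_X86_32_LAZY_GS changes how gs is obtained, I reconfigured the kernel without CONFIG_X86_32_LAZY_GS, rebuilt it, and verified that 當樂.apk runs normally. This shows that the option affects the value seen for the gs register.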
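The difference between the two get_user_gs() variants can be modelled in user space. This is a toy model only: fake_regs, fake_live_gs and both function names are invented for illustration, and nothing here touches a real segment register or explains why the live value goes wrong under the emulator.

```c
#include <stdint.h>

/* Stand-in for the saved register frame (pt_regs) of a task. */
struct fake_regs {
    uint16_t gs;
};

/* Stand-in for the hardware %gs register; the kernel would read the
 * real one with savesegment(gs, v). */
uint16_t fake_live_gs;

/* Lazy variant: regs is ignored and the live register is returned. */
uint16_t get_user_gs_lazy(const struct fake_regs *regs)
{
    (void)regs;
    return fake_live_gs;
}

/* Non-lazy variant: the value saved in the register frame is returned. */
uint16_t get_user_gs_saved(const struct fake_regs *regs)
{
    return regs->gs;
}
```

As long as the live register and the saved frame agree, both variants return the same selector. They only diverge when the live register holds something other than what was saved, which is the kind of situation the configuration change sidesteps.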
The fix is the following patch; merge it into the x86 defconfig file:
@@ -37,7 +37,6 @@ CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
-CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_CPU_PROBE_RELEASE=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
@@ -452,7 +451,7 @@ CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
# CONFIG_EFI is not set
# CONFIG_SECCOMP is not set
-# CONFIG_CC_STACKPROTECTOR is not set
+CONFIG_CC_STACKPROTECTOR=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
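- CONFIG_CC_STACKPROTECTOR and CONFIG_X86_32_LAZY_GS above depend on each other: removing CONFIG_X86_32_LAZY_GS requires selecting CONFIG_CC_STACKPROTECTOR=y.
- If enabling this kernel option causes a kernel build error, see my other article: "Undefined __stack_chk_guard error when building an x86 Linux kernel: error: undefined reference to '__stack_chk_guard'".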
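The dependency in the first bullet can be checked against a generated .config. The sketch below is a hedged illustration: both function names are hypothetical, and the rule it encodes (X86_32_LAZY_GS defaults to on whenever X86_32 is set and CC_STACKPROTECTOR is not) is my reading of arch/x86/Kconfig for kernels of this era, so verify it against your own tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if config_text contains a line that is exactly "<symbol>=y". */
bool config_is_enabled(const char *config_text, const char *symbol)
{
    size_t sym_len = strlen(symbol);
    const char *line = config_text;

    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len == sym_len + 2 &&
            strncmp(line, symbol, sym_len) == 0 &&
            line[sym_len] == '=' && line[sym_len + 1] == 'y')
            return true;

        line = end ? end + 1 : NULL;
    }
    return false;
}

/* Assumed Kconfig rule: lazy gs is on when X86_32 && !CC_STACKPROTECTOR. */
bool lazy_gs_expected(const char *config_text)
{
    return config_is_enabled(config_text, "CONFIG_X86_32") &&
           !config_is_enabled(config_text, "CONFIG_CC_STACKPROTECTOR");
}
```

Comment lines such as "# CONFIG_CC_STACKPROTECTOR is not set" do not match, so a config in that state still reports lazy gs as expected, which is consistent with the patch having to turn the stack protector on.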
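Summary
That settles this problem, but many open questions remain, which is the worst part: as a developer, not understanding the whole flow leaves me uneasy. Still, it has to be taken one step at a time. The next step is to study the GDT and memory management as a whole.
Acknowledgements
2017 ……, roll up your trouser legs and run, roll up your sleeves and get to work!
yanxiangyfg's column: "Stay true to practice, record every bit"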