文章目錄
- 1 概述
- 2 SWT機制
-
- 2.1 原理設計框圖
- 2.2 Watchdog的初始化
- 2.3 Watchdog的運作
- 3 導緻 SWT 重新開機原因
- 4 log分析
-
- 4.1 搜尋關鍵 watchdog
- 4.2 搜尋關鍵字 held by
- 4.3 檢查Binder的Server 端
-
- 4.3.1 重要資訊
- 4.3.2 如何确定binder的對端
- 4.4 Native 方法執行時間過長導緻重新開機
- 4.5 SurfaceFlinger 卡住導緻重新開機
- 4.6 Zygote Fork 程序時卡住
- 4.7 Dump 時間過長
- 5 案例
1 概述
一般手機異常reboot的原因:
Reboot rootcause
1.JE on the system process
search keywords “FATAL EXCEPTION IN SYSTEM PROCESS:”in main_log
2.NE on the system process
search keywords “Fatal signal”in main_log
3.SWT on the system process
Search keywords “WATCHDOG KILLING SYSTEM PROCESS”in main_log
4.KE occurs
Search keywords “Kernel panic”in kernel_log
Android SWT 即 Android Software Watchdog Timeout。
System Server程序是Android的一個核心程序,裡面為APP運作提供了核心的服務。如果System Server的一些核心服務和重要線程卡住,就會導緻相應的功能異常。
如手機發生hang機,輸入無響應,無法啟動APP等一些不正常的情況。而且,如果沒有一種機制,讓這些服務複位的話,那麼将嚴重影響客戶體驗。尤其是目前大多數手機把電池封裝在手機裡面的這種,想撥電池重新開機都很難。
是以有必要在核心服務和核心線程卡住的時候,讓系統有自動複位的機會。于是,google引入了Sytem Server watchdog機制。這個機制來監控核心服務和核心線程是否卡住。
Framework層的Watchdog監控系統的重要服務和程序是否死鎖,例如AMS、WMS、PMS、MS、IMS等。在解析init.rc時初始化和啟動,通過addMonitor添加需要監控對象,然後開啟30s循環,如果檢測到死鎖,則dumpTrace并再檢測一輪,如果還是死鎖,則dumpTrace并殺掉systemServer讓zygote重新開機。
watchdog is a thread of system_server, it will mornitor some core service of system_server, such as:
AcitivityManagerService,WindowManagerService,PowerManagerService
SWT主要用來監控SystemServer 重要線程/Service 的運作情況。如果判斷阻塞 60s ,就會把系統重新開機,來保證系統恢複正常狀态.
2 SWT機制
2.1 原理設計框圖
最開始設計隻是在main looper 裡面執行register 的monitor 對象 的monitor 方法. 後續Google 改進,通過HandlerChecker 來達成,在HandlerChecker 中審查注入的montior 對象是否能快速執行。即在foreground thread 來執行register 的monitor 對象,而對于其他的線程,則是審查規定時間内是否可以達到idle,不是一直卡死在某個message執行上。
需要注意的是, SystemServer Watchdog啟動是在SystemServer init 的後期, 如果SystemServer 在init 的過程中卡死了,那麼就意味着watchdog 不會有任何的作用.
2.2 Watchdog的初始化
Android的Watchdog是一個單例線程,在System Server啟動時就會init &start Watchdog:
private void startOtherServices() {
...
traceBeginAndSlog("InitWatchdog");
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
traceEnd();
...
traceBeginAndSlog("StartWatchdog");
Watchdog.getInstance().start();
traceEnd();
...
}
Watchdog在初始化時,會建構很多HandlerChecker,大緻可以分為兩類:
Monitor Checker,用于檢查是Monitor對象可能發生的死鎖, AMS, PKMS, WMS等核心的系統服務都是Monitor對象。
在代碼裡搜尋關鍵字addMonitor,會檢測如下資訊:
LINUX/android/frameworks/base/services/core/java/com/android/server/input/
InputManagerService.java 349 Watchdog.getInstance().addMonitor(this); in start()
LINUX/android/frameworks/base/services/core/java/com/android/server/wm/
WindowManagerService.java 1161 Watchdog.getInstance().addMonitor(this); in WindowManagerService()
LINUX/android/frameworks/base/services/core/java/com/android/server/am/
ActivityManagerService.java 2988 Watchdog.getInstance().addMonitor(this); in ActivityManagerService()
LINUX/android/frameworks/base/services/core/java/com/android/server/media/
MediaRouterService.java 95 Watchdog.getInstance().addMonitor(this); in MediaRouterService()
......
各個server的monitor函數也隻是檢測是否可以獲得要檢測的鎖對象,這些service通過
Watchdog.getInstance().addMonitor(this)将自己(實作了Watchdog.Monitor)添加到
Watchdog.mMonitorChecker.mMonitors清單中,該清單會不斷調用Monitor.Monitor()函數。
WindowManagerService.java
// Called by the heartbeat to ensure locks are not held indefnitely (for deadlock detection).
@Override
public void monitor() {
synchronized (mWindowMap) { }
}
具體看各個service中實作的monitor函數,發現這個函數很簡單,就是去擷取對應鎖,如果線程死鎖或其他原因阻塞,那麼必然無法擷取鎖,monitor()函數執行必然會阻塞。Watchdog就是利用這個原理來判斷是否死鎖。
Looper Checker,用于檢查線程的消息隊列是否長時間處于工作狀态。Watchdog自身的消息隊列,Ui, Io, Display這些全局的消息隊列都是被檢查的對象。此外,一些重要的線程的消息隊列,也會加入到Looper Checker中,譬如AMS, PKMS,這些是在對應的對象初始化時加入的。
代碼裡搜尋關鍵字addThread,會檢測如下資訊:
LINUX/android/frameworks/base/services/core/java/com/android/server/power/
PowerManagerService.java 775 Watchdog.getInstance().addThread(mHandler); in onStart()
LINUX/android/frameworks/base/services/core/java/com/android/server/am/
ActivityManagerService.java 2989 Watchdog.getInstance().addThread(mHandler); in ActivityManagerService()
LINUX/android/frameworks/base/services/core/java/com/android/server/pm/
PackageManagerService.java 2512 Watchdog.getInstance().addThread(mHandler, WATCHDOG_TIMEOUT); in PackageManagerService()
addThread()将PowerManagerService、PackageManagerService、ActivityManagerService
等幾個主線程Handler儲存到Watchdog.mHandlerCheckers清單中;同時還會把上面提到的mMonitorChecker也儲存到Watchdog.mHandlerCheckers中;另外還會将foreground thread、ui thread、i/o thread 、display thread 、main thread的Handler也儲存到Watchdog.mHandlerCheckers中來;
/frameworks/base/services/core/java/com/android/server/Watchdog.java
// The shared foreground thread is the main checker. It is where we
// will also dispatch monitor checks and do other work.
mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
"foreground thread", DEFAULT_TIMEOUT);
mHandlerCheckers.add(mMonitorChecker);
// Add checker for main thread. We only do a quick check since there
// can be UI running on the thread.
mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
"main thread", DEFAULT_TIMEOUT));
// Add checker for shared UI thread.
mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
"ui thread", DEFAULT_TIMEOUT));
// And also check IO thread.
mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
"i/o thread", DEFAULT_TIMEOUT));
// And the display thread.
mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
"display thread", DEFAULT_TIMEOUT));
Watchdog會不斷判斷這些線程的Looper是否空閑,如果一直非空閑,那麼必然被blocked住了。
2.3 Watchdog的運作
通過前面的初始化,已經将watchdog需要監測的對象全部準備就緒。接下來就要看它具體是如何去監測的了。
watchdog本身就是一個線程,我們想知道它是如何去監測各個對象的?那就直接從它的run方法來看就好:
@Override
public void run() {
boolean waitedHalf = false;//辨別第一個30s逾時
boolean mSFHang = false;//辨別surfaceflinger是否hang?
mProcessStats.init(); // mtk: get cpu info when SWT occur.
while (true) {
final List<HandlerChecker> blockedCheckers;
final String subject;
final String name;
mSFHang = false;
if (exceptionHWT != null && waitedHalf == false ) {
exceptionHWT.WDTMatterJava(300);//hang_detect機制相關,用來重置hang_detect的計數,這裡表示正常階段
具體可以檢視MOL > Quick Start > Hang Detect 問題快速分析 專題!
}
final boolean allowRestart;//發生SWT要不要重新開機
int debuggerWasConnected = 0;
if (DEBUG)
Slog.w(TAG, "SWT Watchdog before synchronized:" + SystemClock.uptimeMillis());
synchronized (this) {
if (DEBUG)
Slog.w(TAG, "SWT Watchdog after synchronized:" + SystemClock.uptimeMillis());
long timeout = CHECK_INTERVAL;
long SFHangTime;
// Make sure we (re)spin the checkers that have become idle within
// this wait-and-check interval
//1、排程所有的HandlerChecker
for (int i=0; i<mHandlerCheckers.size(); i++) {
//mHandlerCheckers中包含一個mMonitorChecker和幾個重要線程的HandlerChecker
(System_Server程序是最重要的線程):該for循環主要檢測mMonitorChecker中重要的
幾把鎖(檢測死鎖)、幾個重要線程的消息隊列是否空閑(檢測是否blocked),當然判斷是
否死鎖和被blocked的依據便是是否逾時。
HandlerChecker hc = mHandlerCheckers.get(i);
hc.scheduleCheckLocked();//
shceduleCheckLocked()所做的事情可以想象成給所有的目标thread發放任務。
//如果一個線程死鎖或被blocked,那麼該HandlerChecker的mCompleted = false、
mStartTime=阻塞的初始時間點,是判斷的基礎
}
if (debuggerWasConnected > 0) {
debuggerWasConnected--;
}
// NOTE: We use uptimeMillis() here because we do not want to increment the time we
// wait while asleep. If the device is asleep then the thing that we are waiting
// to timeout on is asleep as well and won't have a chance to run, causing a false
// positive on when to kill things.
//2、開始定期檢查
long start = SystemClock.uptimeMillis();
while (timeout > 0) {
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
try {
wait(timeout);
} catch (InterruptedException e) {
Log.wtf(TAG, e);
}
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
}
//3、檢查HandlerChecker的完成狀态
//MTK enhance
SFHangTime = GetSFStatus();
if (DEBUG) Slog.w(TAG, "**Get SF Time **" + SFHangTime);
if (SFHangTime > TIME_SF_WAIT * 2) {
Slog.v(TAG, "**SF hang Time **" + SFHangTime);
mSFHang = true;
blockedCheckers = getBlockedCheckersLocked();
subject = "";
} //@@
else {
boolean fdLimitTriggered = false;
if (mOpenFdMonitor != null) {
fdLimitTriggered = mOpenFdMonitor.monitor();
}
if (!fdLimitTriggered) {
final int waitState = evaluateCheckerCompletionLocked();
//根據mCompleted 、mStartTime值評估等待狀态
if (waitState == COMPLETED) {//檢測完成并正常,繼續檢查
// The monitors have returned; reset
waitedHalf = false;
continue;
} else if (waitState == WAITING) {//30秒之内,繼續檢查
// still waiting but within their configured intervals;
// back off and recheck
continue;
} else if (waitState == WAITED_HALF) {//30~60秒之内,dump一些資訊并繼續檢查
if (!waitedHalf) {
// We've waited half the deadlock-detection interval. Pull a stack
// trace and wait another half.
if (exceptionHWT != null) {
exceptionHWT.WDTMatterJava(360);//hang_detect機制相關,表示進入dump階段
}
ArrayList<Integer> pids = new ArrayList<Integer>();
pids.add(Process.myPid());
ActivityManagerService.dumpStackTraces(true, pids, null, null,
getInterestingNativePids());
mProcessStats.update();
waitedHalf = true;
}
continue;
}
//4、收集逾時的HandlerChecker
// something is overdue!
blockedCheckers = getBlockedCheckersLocked();
//超了60秒,此時便出問題了,收集逾時的HandlerChecker
subject = describeCheckersLocked(blockedCheckers);
} else {
blockedCheckers = Collections.emptyList();
subject = "Open FD high water mark reached";
}
}
allowRestart = mAllowRestart;
}
// If we got here, that means that the system is most likely hung.
// First collect stack traces from all threads of the system process.
// Then kill this process so that the system will restart.
//5、儲存一些重要日志,并根據設定,來判斷是否需要重新開機系統
Slog.e(TAG, "**SWT happen **" + subject);
if (exceptionHWT != null) {
exceptionHWT.switchFtrace(2);
}
name = (mSFHang && subject.isEmpty()) ? "surfaceflinger hang." : "";
EventLog.writeEvent(EventLogTags.WATCHDOG, name.isEmpty() ? subject : name);
//将阻塞線程資訊列印到Event日志中
if (exceptionHWT != null) {
exceptionHWT.WDTMatterJava(420);//hang_detect機制相關,表示發生了SWT
}
mProcessStats.update();
final String cpuInfo = mProcessStats.printCurrentState(SystemClock.uptimeMillis());
Slog.d(TAG, mProcessStats.printCurrentLoad());
ArrayList<Integer> pids = new ArrayList<>();
pids.add(Process.myPid());
if (mPhonePid > 0) pids.add(mPhonePid);
// Pass !waitedHalf so that just in case we somehow wind up here without having
// dumped the halfway stacks, we properly re-initialize the trace file.
final File stack = ActivityManagerService.dumpStackTraces(
!waitedHalf, pids, null, null, getInterestingNativePids());
// Give some extra time to make sure the stack traces get written.
// The system's been hanging for a minute, another second or two won't hurt much.
SystemClock.sleep(2000);
// Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
doSysRq('w');
doSysRq('l');
/// M: WDT debug enhancement
/// need to wait the AEE dumps all info, then kill system server @{
// Try to add the error to the dropbox, but assuming that the ActivityManager
// itself may be deadlocked. (which has happened, causing this statement to
// deadlock and the watchdog as a whole to be ineffective)
Slog.v(TAG, "** save all info before killnig system server **");
Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
public void run() {
Slog.v(TAG, "** start addErrorToDropBox **");
mActivity.addErrorToDropBox(
"watchdog", null, "system_server", null, null,
name.isEmpty() ? subject : name, cpuInfo, stack, null);
}
};
dropboxThread.start();
try {
dropboxThread.join(2000); // wait up to 2 seconds for it to return.
} catch (InterruptedException ignored) {}
IActivityController controller;
synchronized (this) {
controller = mController;
}
if ((mSFHang == false) && (controller != null)) {
Slog.i(TAG, "Reporting stuck state to activity controller");
try {
Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
Slog.i(TAG, "Binder.setDumpDisabled");
// 1 = keep waiting, -1 = kill system
int res = controller.systemNotResponding(subject);
if (res >= 0) {
Slog.i(TAG, "Activity controller requested to coninue to wait");
waitedHalf = false;
continue;
}
Slog.i(TAG, "Activity controller requested to reboot");
} catch (RemoteException e) {
}
}
// Only kill the process if the debugger is not attached.
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
if (debuggerWasConnected >= 2) {
Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
} else if (debuggerWasConnected > 0) {
Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
} else if (!allowRestart) {
Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
} else {
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
/// @}
Slog.w(TAG, "*** GOODBYE!");
exceptionHWT.WDTMatterJava(330); // 330 means watchdog exit successfully.
// MTK enhance
if (mSFHang)
{
Slog.w(TAG, "SF hang!");
if (GetSFReboot() > 3)
{
Slog.w(TAG, "SF hang reboot time larger than 3 time, reboot device!");
rebootSystem("Maybe SF driver hang,reboot device.");
}
else
{
SetSFReboot();
}
}
//@
if (mSFHang) {
Slog.v(TAG, "killing surfaceflinger for surfaceflinger hang");
String[] sf = new String[] {"/system/bin/surfaceflinger"};
int[] pid_sf = Process.getPidsForCommands(sf);
if (pid_sf[0] > 0) {
Process.killProcess(pid_sf[0]);
}
Slog.v(TAG, "killing surfaceflinger end");
} else {
Process.killProcess(Process.myPid());
}
System.exit(10);
}
waitedHalf = false;
}
}
以上代碼片段主要的運作邏輯如下:
1、Watchdog運作後,便開始無限循環,依次調用每一個HandlerChecker的scheduleCheckLocked()方法
2、排程完HandlerChecker之後,便開始定期檢查是否逾時,每一次檢查的間隔時間由CHECK_INTERVAL常量設定,為30秒
3、每一次檢查都會調用evaluateCheckerCompletionLocked()方法來評估一下HandlerChecker的完成狀态:
a、COMPLETED表示已經完成
b、WAITING和WAITED_HALF表示還在等待,但未逾時
c、OVERDUE表示已經逾時。預設情況下,timeout是1分鐘,但監測對象可以通過傳參自行設定,譬如PKMS的Handler Checker的逾時是10分鐘
4、如果逾時時間到了,還有HandlerChecker處于未完成的狀态(OVERDUE),則通過getBlockedCheckersLocked()方法,擷取阻塞的HandlerChecker,生成一些描述資訊
5、儲存日志,包括一些運作時的堆棧資訊,這些日志是我們解決Watchdog問題的重要依據。如果判斷需要殺掉system_server程序,則給目前程序(system_server)發送signal 9
3 導緻 SWT 重新開機原因
線程死鎖
Binder的server端卡住
Native方法執行時間過長
SurfaceFlinger卡住
Dump時間過長
Zygote fork程序時卡住
Binder used up
…
4 log分析
4.1 搜尋關鍵 watchdog
搜尋關鍵 watchdog初步分析rootcause
03-29 16:59:14.818 748 4784 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.input.InputManagerService on foreground thread (android.fg)
03-29 16:59:14.818 748 4784 W Watchdog: foreground thread stack trace:
03-29 16:59:14.819 748 4784 W Watchdog: at com.android.server.input.InputManagerService.nativeMonitor(Native Method)
03-29 16:59:14.819 748 4784 W Watchdog: at com.android.server.input.InputManagerService.monitor(InputManagerService.java:1404) 03-29 16:59:14.819 748 4784 W Watchdog: at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:179)
03-29 16:59:14.819 748 4784 W Watchdog: at android.os.Handler.handleCallback(Handler.java:739) 03-29 16:59:14.819 748 4784 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:95)
03-29 16:59:14.819 748 4784 W Watchdog: at android.os.Looper.loop(Looper.java:135)
03-29 16:59:14.819 748 4784 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:61)
03-29 16:59:14.819 748 4784 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:46)
03-29 16:59:14.819 748 4784 W Watchdog: *** GOODBYE!
可見WD檢測到InputManagerService發生Block:
03-29 16:59:14.818 748 4784 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.input.InputManagerService on foreground thread (android.fg)
在 android.fg線程發生block,檢視對應trace:
"android.fg" prio=5 tid=17 Native
| group="main" sCount=1 dsCount=0 obj=0x12e7a120 self=0xb7b20160
| sysTid=771 nice=0 cgrp=default sched=0/0 handle=0xb7b20538
| state=S schedstat=( 1927397505 30260078920 12131 ) utm=138 stm=54 core=0 HZ=100
| stack=0xa51df000-0xa51e1000 stackSize=1036KB
| held mutexes=
kernel: (couldn't read /proc/self/task/771/stack)
native: #00 pc 0000f9b0 /system/lib/libc.so (syscall+28)
native: #01 pc 0001318d
/system/lib/libc.so (__pthread_cond_timedwait_relative(pthread_cond_t*, pthread_mutex_t*, timespec const*)+56)
native: #02 pc 00025315 /system/lib/libinputflinger.so (android::InputReader::monitor()+28)
native: #03 pc 0000f56d /system/lib/libandroid_servers.so (???)
native: #04 pc 0011441d
/system/framework/arm/services.odex (Java_com_android_server_input_InputManagerService_nativeMonitor__J+88)
at com.android.server.input.InputManagerService.nativeMonitor(Native method)
at com.android.server.input.InputManagerService.monitor(InputManagerService.java:1404)
at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:179)
at android.os.Handler.handleCallback(Handler.java:739)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:135)
at android.os.HandlerThread.run(HandlerThread.java:61)
at com.android.server.ServiceThread.run(ServiceThread.java:46)
Wd監控IMS的InputReader及InputDispatcher線程,從以上trace可知InputReader可能發生死鎖,是以繼續檢視InputReader線程:
"InputReader" prio=10 tid=34 Native
| group="main" sCount=1 dsCount=0 obj=0x13100220 self=0xb7c0dc80
| sysTid=4310 nice=-8 cgrp=default sched=0/0 handle=0xb7b098a0
| state=S schedstat=( 62300529241 31784888301 33265 ) utm=5130 stm=1100 core=3 HZ=100
| stack=0xa3b7b000-0xa3b7d000 stackSize=1012KB
| held mutexes=
kernel: (couldn't read /proc/self/task/4310/stack)
native: #00 pc 0000f9b0 /system/lib/libc.so (syscall+28)
native: #01 pc 000a8c4b /system/lib/libart.so (art::ConditionVariable::Wait(art::Thread*)+82)
native: #02 pc 001abc1b /system/lib/libart.so (art::JNI::IsInstanceOf(_JNIEnv*, _jobject*, _jclass*)+1022)
native: #03 pc 0001fe2f /system/lib/libjavacore.so (???)
native: #04 pc 0001fe71 /system/lib/libjavacore.so (???)
native: #05 pc 00283a93 /system/framework/arm/boot.oat (Java_libcore_io_Posix_writeBytes__Ljava_io_FileDescriptor_2Ljava_lang_Object_2II+142)
at libcore.io.Posix.writeBytes(Native method)
at libcore.io.Posix.write(Posix.java:258)
at libcore.io.BlockGuardOs.write(BlockGuardOs.java:313)
at libcore.io.IoBridge.write(IoBridge.java:497)
at java.io.FileOutputStream.write(FileOutputStream.java:186)
at java.io.OutputStreamWriter.flushBytes(OutputStreamWriter.java:167)
- locked <@addr=0x14079080> (a java.io.FileOutputStream)
at java.io.OutputStreamWriter.close(OutputStreamWriter.java:140)
- locked <@addr=0x14079080> (a java.io.FileOutputStream)
at com.android.internal.policy.impl.PhoneWindowManager.saveCoverStatus(PhoneWindowManager.java:6266)
at com.android.internal.policy.impl.PhoneWindowManager.interceptKeyBeforeQueueing(PhoneWindowManager.java:5823)
at com.android.server.wm.InputMonitor.interceptKeyBeforeQueueing(InputMonitor.java:370)
at com.android.server.input.InputManagerService.interceptKeyBeforeQueueing(InputManagerService.java:1491)
可知在 saveCoverStatus時write發生等待,是以一方面檢測函數調用。
另外從全局來檢視下InputReader是否正常:
按power鍵點亮螢幕:
03-29 16:53:36.248 748 4310 D InputReader: Perf-Track KeyboardInputMapper Up keyCode=26
進行觸屏操作:
03-29 16:53:52.894 748 4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Down:x=396.44937, y=1192.06873
03-29 16:53:53.007 748 4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Up:x=396.44937, y=1192.06873
03-29 16:53:53.954 748 4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Down:x=646.10266, y=853.33331
03-29 16:53:54.112 748 4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Up:x=532.26074, y=902.29510
03-29 16:53:54.457 748 4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Down:x=370.48544, y=1029.19592
03-29 16:53:54.595 748 4310 D InputReader: Perf-Track AppLaunch_dispatchPtr:Up:x=518.28015, y=920.28101
傳入了鍵值keyCode=0 (KEYCODE_UNKNOWN = 0):
03-29 16:57:35.743 748 4310 D InputReader: Perf-Track KeyboardInputMapper Down keyCode=0, scanCode=251, DownTime=0
03-29 16:57:35.743 748 4310 D InputReader: Perf-Track KeyboardInputMapper Up keyCode=0
是以另一方面檢查keyCode=0 (KEYCODE_UNKNOWN = 0)産生的原因。一方面檢查函數調用是否有異常,另一方面從鍵值映射方面來檢查。
此外 ,我們可以從中吸取一些經驗就是盡量避免在重要服務進行大量耗時操作。
4.2 搜尋關鍵字 held by
搜尋關鍵字held by判斷線程是否被 Block,是否有死鎖問題
例如:
Cmd line: system_server
"ActivityManager" prio=5 tid=14 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12d46f90 self=0xb732ba18
| sysTid=840 nice=-2 cgrp=apps sched=0/0 handle=0xb732c040
| state=S schedstat=( 3934680041 3104214846 9460 ) utm=166 stm=227 core=2 HZ=100
| stack=0xa6106000-0xa6108000 stackSize=1036KB
| held mutexes=
at com.android.server.net.NetworkStatsService.setUidForeground(NetworkStatsService.java:641)
- waiting to lock <0x2e5f7eff> (a java.lang.Object) held by thread 35
at com.android.server.net.NetworkPolicyManagerService.updateRulesForUidLocked(NetworkPolicyMana
gerService.java:2081)
waiting to lock <0x2e5f7eff> (a java.lang.Object) held by thread 35, 檢視thread 35
"NetworkStats" prio=5 tid=35 TimedWaiting
| group="main" sCount=1 dsCount=0 obj=0x1393f9e0 self=0xb758ce48
| sysTid=2021 nice=0 cgrp=apps sched=0/0 handle=0xb76fab58
| state=S schedstat=( 402053186 255954674 601 ) utm=27 stm=13 core=0 HZ=100
| stack=0xa32db000-0xa32dd000 stackSize=1036KB
| held mutexes=
at java.lang.Object.wait!(Native method)
- waiting on <0x16b70c1b> (a java.lang.Object)
at java.lang.Thread.parkFor(Thread.java:1220)
- locked <0x16b70c1b> (a java.lang.Object)
at sun.misc.Unsafe.park(Unsafe.java:299)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:197)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueue
dSynchronizer.java:2055)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:388)
at com.android.server.NativeDaemonConnector$ResponseQueue.remove(NativeDaemonConnector.jav
a:614)
at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:406)
at com.android.server.NativeDaemonConnector.executeForList(NativeDaemonConnector.java:360)
at com.android.server.NetworkManagementService.getNetworkStatsTethering(NetworkManagementSer
vice.java:1731)
4.3 檢查Binder的Server 端
檢查Binder的Server 端是否卡住
4.3.1 重要資訊
如果線程的狀态是Native,并且callstack中含有IPCThreadState::waitForResponse–>IPCThreadState::talkWithDriver的資訊就可以初步說明是卡在對端,下一步就是要找到對端
找到事件以及發生的時間點
01-01 03:05:34.411 639 1064 I watchdog: Blocked in handler on main thread (main)
block module: main timeout: 60s
檢視時間段内對應線程的 call stack,初步說明是卡在對端
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x721e0870 self=0xa805d000
| sysTid=941 nice=0 cgrp=default sched=0/0 handle=0xabfe74a4
| state=S schedstat=( 6662448086 9845197785 9495 ) utm=548 stm=118 core=1 HZ=100
| stack=0xbe090000-0xbe092000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/941/stack)
native: #00 pc 00049304 /system/lib/libc.so (__ioctl+8)
native: #01 pc 0001deef /system/lib/libc.so (ioctl+38)
native: #02 pc 0004242f /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+170)
native: #03 pc 00042de9 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
native: #04 pc 0003d2e5 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
native: #05 pc 000bcdad /system/lib/libandroid_runtime.so (???)
native: #06 pc 0074ec65 /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:764)
at android.content.om.IOverlayManager$Stub$Proxy.getOverlayInfo(IOverlayManager.java:254)
at com.android.systemui.statusbar.phone.StatusBar.isUsingDarkTheme(unavailable:-1)
at com.android.systemui.statusbar.phone.StatusBar.updateTheme(unavailable:-1)
at com.android.systemui.statusbar.phone.StatusBar.onColorsChanged(unavailable:-1)
at com.android.internal.colorextraction.ColorExtractor.triggerColorsChanged(ColorExtractor.java:186)
at com.android.systemui.colorextraction.SysuiColorExtractor.setWallpaperVisible(unavailable:-1)
at com.android.systemui.colorextraction.SysuiColorExtractor$1.lambda$-com_android_systemui_colorextraction_SysuiColorExtractor$1_3105(unavailable:-1)
at com.android.systemui.colorextraction.-$Lambda$j2m7lOWVNe22BvvVwNuW1ftTq4c.$m$0(unavailable:-1)
at com.android.systemui.colorextraction.-$Lambda$j2m7lOWVNe22BvvVwNuW1ftTq4c.run(unavailable:-1)
at android.os.Handler.handleCallback(Handler.java:790)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:164)
at android.app.ActivityThread.main(ActivityThread.java:6523)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:857)
systemui調用getOverlayInfo 出問題是因為Binder對端
檢視SYS_BINDER_INFO,确定Binder對端tid以及sysTid,這裡無法确認,但是從getOverlayInfo 來看可能是sysTid=4447
"Binder:639_17" prio=5 tid=115 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x13e4a4a0 self=0xa0310e00
| sysTid=4447 nice=0 cgrp=default sched=0/0 handle=0x85a75970
| state=S schedstat=( 1407904321 1543930926 3137 ) utm=101 stm=39 core=2 HZ=100
| stack=0x8597b000-0x8597d000 stackSize=1006KB
| held mutexes=
at com.android.server.om.OverlayManagerService$1.getOverlayInfo(OverlayManagerService.java:511)
- waiting to lock <0x0d53be75> (a java.lang.Object) held by thread 1
at android.content.om.IOverlayManager$Stub.onTransact(IOverlayManager.java:79)
at android.os.Binder.execTransact(Binder.java:697)
而0x0d53be75鎖被sysTid=639 持有,持有不釋放是639的Binder調用出了問題
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x721e0870 self=0xa805d000
| sysTid=639 nice=-2 cgrp=default sched=0/0 handle=0xabfe74a4
| state=S schedstat=( 38966905233 5299817531 20464 ) utm=3323 stm=573 core=0 HZ=100
| stack=0xbe090000-0xbe092000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/639/stack)
native: #00 pc 00049304 /system/lib/libc.so (__ioctl+8)
native: #01 pc 0001deef /system/lib/libc.so (ioctl+38)
native: #02 pc 0004242f /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+170)
native: #03 pc 00042de9 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
native: #04 pc 0003d2e5 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
native: #05 pc 000bcdad /system/lib/libandroid_runtime.so (???)
native: #06 pc 0074ec65 /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:764)
at android.os.IInstalld$Stub$Proxy.idmap(IInstalld.java:936)
at com.android.server.pm.Installer.idmap(Installer.java:327)
at com.android.server.om.IdmapManager.createIdmap(IdmapManager.java:62)
at com.android.server.om.OverlayManagerServiceImpl.updateState(OverlayManagerServiceImpl.java:493)
at com.android.server.om.OverlayManagerServiceImpl.updateAllOverlaysForTarget(OverlayManagerServiceImpl.java:237)
at com.android.server.om.OverlayManagerServiceImpl.onTargetPackageChanged(OverlayManagerServiceImpl.java:187)
at com.android.server.om.OverlayManagerService$PackageReceiver.onPackageChanged(OverlayManagerService.java:392)
- locked <0x0d53be75> (a java.lang.Object)
at com.android.server.om.OverlayManagerService$PackageReceiver.onReceive(OverlayManagerService.java:350)
at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$-android_app_LoadedApk$ReceiverDispatcher$Args_53034(LoadedApk.java:1323)
binder調用卡住導緻問題發生,這種情況需要檢查binder對端的行為
若binder對端已無空閑binder線程,則需要從以下兩個方面做進一步分析和解決:
1) 為何binder線程執行如此長時間?
2) 是否有太多重複binder請求導緻binder線程資源被占用,這種請求是否合理?
4.3.2 如何确定binder的對端
前提:有binder block的callstack的特征,線程狀态為Native
IPCThreadState::waitForResponse-->IPCThreadState::talkWithDriver
Cmd line: system_server
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x721b49b0 self=0xa97b2000
| sysTid=815 nice=-2 cgrp=default sched=0/0 handle=0xae13a4a4
确定方法:
第一步:擷取時間點 17:54:29.433
06-22 17:54:29.433 815 1035 I watchdog: Blocked in handler on main thread (main)
block module: main timeout: 60s
第二步:檢視時間段内對應線程的 call stack , 擷取 sysTid 以及卡住的具體接口
----- pid 815 at 2018-06-22 17:53:54 -----
Cmd line: system_server
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x721b49b0 self=0xa97b2000
| sysTid=815 nice=-2 cgrp=default sched=0/0 handle=0xae13a4a4
| state=S schedstat=( 18775190570 13246793364 27928 ) utm=1310 stm=566 core=0 HZ=100
| stack=0xbe1be000-0xbe1c0000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/815/stack)
native: #00 pc 00048c8c /system/lib/libc.so (__ioctl+8)
native: #01 pc 0001dd65 /system/lib/libc.so (ioctl+32)
native: #02 pc 00042535 /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+168)
native: #03 pc 00042ee9 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
native: #04 pc 0003d309 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
native: #05 pc 000bcf89 /system/lib/libandroid_runtime.so (???)
native: #06 pc 002a90a5 /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:764)
at com.android.internal.telephony.ITelephony$Stub$Proxy.invokeOemRilRequestRaw(ITelephony.java:4634)
at android.telephony.TelephonyManager.invokeOemRilRequestRaw(TelephonyManager.java:5716)
at com.android.server.audio.AudioService.sendATCommand(AudioService.java:1068)
at com.android.server.audio.AudioService.setModemAllPowerdown(AudioService.java:1083)
at com.android.server.audio.AudioService$AsOnAudioPortUpdateListener.onAudioPatchListUpdate(AudioService.java:1045)
at android.media.AudioPortEventHandler$1.handleMessage(AudioPortEventHandler.java:108)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loop(Looper.java:164)
at com.android.server.SystemServer.run(SystemServer.java:442)
at com.android.server.SystemServer.main(SystemServer.java:283)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:837)
----- pid 815 at 2018-06-22 17:54:29 -----
Cmd line: system_server
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x721b49b0 self=0xa97b2000
| sysTid=815 nice=-2 cgrp=default sched=0/0 handle=0xae13a4a4
| state=S schedstat=( 18775421647 13246793364 27930 ) utm=1310 stm=566 core=0 HZ=100
| stack=0xbe1be000-0xbe1c0000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/815/stack)
native: #00 pc 00048c8c /system/lib/libc.so (__ioctl+8)
native: #01 pc 0001dd65 /system/lib/libc.so (ioctl+32)
native: #02 pc 00042535 /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+168)
native: #03 pc 00042ee9 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+236)
native: #04 pc 0003d309 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+36)
native: #05 pc 000bcf89 /system/lib/libandroid_runtime.so (???)
native: #06 pc 002a90a5 /system/framework/arm/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+132)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:764)
at com.android.internal.telephony.ITelephony$Stub$Proxy.invokeOemRilRequestRaw(ITelephony.java:4634)
at android.telephony.TelephonyManager.invokeOemRilRequestRaw(TelephonyManager.java:5716)
at com.android.server.audio.AudioService.sendATCommand(AudioService.java:1068)
at com.android.server.audio.AudioService.setModemAllPowerdown(AudioService.java:1083)
at com.android.server.audio.AudioService$AsOnAudioPortUpdateListener.onAudioPatchListUpdate(AudioService.java:1045)
at android.media.AudioPortEventHandler$1.handleMessage(AudioPortEventHandler.java:108)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loop(Looper.java:164)
at com.android.server.SystemServer.run(SystemServer.java:442)
at com.android.server.SystemServer.main(SystemServer.java:283)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:837)
可見是卡在com.android.internal.telephony.ITelephony S t u b Stub StubProxy.invokeOemRilRequestRaw(ITelephony.java:4634),卡住的接口是invokeOemRilRequestRaw, sysTid = 815
第三步:Check "SYS_BINDER_INFO"找到815的通信對端
比如
如果 sysTid = 2938,在Check "SYS_BINDER_INFO"找到如下資訊:
proc 838
thread 2938:
outgoing transaction 139654941: ffffffc0877d87ab from 838:2938 to 9111:9123 code 1 flags 10 priv -6...
然後從"SYS_PROCESSES_AND_THREADE" 找到9111是程序monkey
root 9111 9100 1150100 21780 0 20 0 0 0 fg fffff 2487234d S 64 com.android.monkey
root 9123 9111 1150100 21780 2 14 -6 0 0 fg 00000 5456456c R 64 bonder_1
如果不能從SYS_BINDER_INFO找到資訊,也可以從kernel log去找,或者在目前卡住的callstack中,僅從名字上結合code,确認卡住的對端和具體的接口
如上例在binder info中找不到815的通信對端,在kernel log中發現binder release的資訊:
[11645.672135] (3)[23724:kworker/u8:6]binder: release 815:815 transaction 3729140 out, still active
<6>[11647.223911] (1)[23672:kworker/u8:1]binder: release 1207:1229 transaction 3729140 in, still active
<6>[11647.223954] (1)[23672:kworker/u8:1]binder: send failed reply for transaction 3729140, target dead
在binder_info中查找對端,确認對端是1207:1229
incoming transaction 3729140: cc857480 from 0:0 to 1207:1229 code 79 flags 10 pri 0:118 r1 node 23923 size 120:0 data e6280068
檢視1207:1229的backtrace:
----- pid 1207 at 2018-06-22 17:54:30 -----
Cmd line: com.android.phone
"Binder:1207_2" prio=5 tid=9 Waiting
| group="main" sCount=1 dsCount=0 flags=1 obj=0x13c01a90 self=0x9fc53c00
| sysTid=1229 nice=-2 cgrp=default sched=0/0 handle=0x91ca7970
| state=S schedstat=( 10524016079 5984437221 36564 ) utm=647 stm=404 core=0 HZ=100
| stack=0x91bad000-0x91baf000 stackSize=1006KB
| held mutexes=
at java.lang.Object.wait(Native method)
- waiting on <0x0bbbfc71> (a com.android.phone.PhoneInterfaceManager$MainThreadRequest)
at com.android.phone.PhoneInterfaceManager.PhoneInterfaceManager.java(PhoneInterfaceManager.java:1031)
- locked <0x0bbbfc71> (a com.android.phone.PhoneInterfaceManager$MainThreadRequest)
at com.android.phone.PhoneInterfaceManager.sendRequest(PhoneInterfaceManager.java:1010)
at com.android.phone.PhoneInterfaceManager.invokeOemRilRequestRaw(PhoneInterfaceManager.java:3114)
at com.android.internal.telephony.ITelephony$Stub.onTransact(ITelephony.java:1448)
at android.os.Binder.execTransact(Binder.java:697)
看起來應該是卡在phone這邊的invokeOemRilRequestRaw操作
4.4 Native 方法執行時間過長導緻重新開機
線程狀态Native,檢視是否有PowerManagerService.nativeSetAutoSuspend
根據callstack找到是何Native方法時間太久,是否等待硬體傳回資訊,或者硬體本身有問題
4.5 SurfaceFlinger 卡住導緻重新開機
搜尋關鍵字 I watchdog ,檢視是否有 surfaceflinger hang,預設卡住40s,就會重新開機。
4.6 Zygote Fork 程序時卡住
線程狀态Native,檢視是否有 Process.zygoteSendArgsAndGetResult
4.7 Dump 時間過長
Dump 超過60s 可能會引起手機重新開機。
搜尋關鍵字dumpStackTraces 或 dumpStackTraces process
一般前面會有ANR發生
前面有fatal JE NE K等Exception
自動化腳本有call “dumpsys” 去dump 系統資訊
一般來說上面的情況enduser不會遇到,上面原因一般是系統loading過重或者dump的資訊過多
5 案例
現象:
安裝應用時swt,系統重新開機
1.小機率登入賬戶背景在更新google應用,
2.連接配接電腦用豌豆莢安裝應用時,
點選螢幕解鎖然後就重新開機卡死了
搜尋"watchdog":
07-15 15:13:52.132 984 1174 I watchdog: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg),
Blocked in handler on main thread (main),
Blocked in handler on i/o thread (android.io),
Blocked in handler on display thread (android.display),
Blocked in handler on ActivityManager (ActivityManager),
Blocked in handler on PowerManagerService (PowerManagerService)
block module: android.fg timeout: 60s
檢視對應時間線程的call stack
1、----- pid 984 at 2017-07-15 15:13:17 -----
Cmd line: system_server
"android.fg" prio=5 tid=16 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12ca59c0 self=0x9de63800
| sysTid=1004 nice=0 cgrp=default sched=0/0 handle=0x92238920
| state=S schedstat=( 817332233 838592314 1765 ) utm=68 stm=13 core=2 HZ=100
| stack=0x92136000-0x92138000 stackSize=1038KB
| held mutexes=
at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:22667)
- waiting to lock <0x04cc3347> (a com.android.server.am.ActivityManagerService) held by thread 11
at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:207)
at android.os.Handler.handleCallback(Handler.java:836)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:203)
at android.os.HandlerThread.run(HandlerThread.java:61)
at com.android.server.ServiceThread.run(ServiceThread.java:46)
//android.fg等待的鎖,被thread 11拿着;
----- pid 984 at 2017-07-15 15:13:17 -----
Cmd line: system_server
"ActivityManager" prio=5 tid=11 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12c532e0 self=0x9de61f00
| sysTid=999 nice=-2 cgrp=default sched=0/0 handle=0x92751920
| state=S schedstat=( 12393939024 9340072944 12993 ) utm=679 stm=560 core=2 HZ=100
| stack=0x9264f000-0x92651000 stackSize=1038KB
| held mutexes=
at com.android.server.pm.PackageManagerService.prepareUserData(PackageManagerService.java:20141)
- waiting to lock <0x0d65c6e0> (a java.lang.Object) held by thread 23
at com.android.server.pm.UserManagerService.onBeforeUnlockUser(UserManagerService.java:2883)
at com.android.server.am.UserController.finishUserUnlocking(UserController.java:292)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.UserController.unlockUserCleared(UserController.java:973)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.UserController.maybeUnlockUser(UserController.java:938)
at com.android.server.am.UserController.finishUserBoot(UserController.java:267)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.UserController.finishUserBoot(UserController.java:222)
at com.android.server.am.UserController.finishUserSwitch(UserController.java:177)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.ActivityStackSupervisor.activityIdleInternalLocked(ActivityStackSupervisor.java:1739)
at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.activityIdleInternal(ActivityStackSupervisor.java:3902)
- locked <0x04cc3347> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.handleMessage(ActivityStackSupervisor.java:3941)
at android.os.Handler.dispatchMessage(Handler.java:110)
at android.os.Looper.loop(Looper.java:203)
at android.os.HandlerThread.run(HandlerThread.java:61)
at com.android.server.ServiceThread.run(ServiceThread.java:46)
//此處為thread 11 的call stack ,看此部分也在等待鎖,此鎖被thread 23 拉着,
檢視thread 23 的狀态:
---- pid 984 at 2017-07-15 15:13:17 -----
Cmd line: system_server
"PackageManager" prio=5 tid=23 Native
| group="main" sCount=1 dsCount=0 obj=0x12e4b380 self=0x9de65b00
| sysTid=1015 nice=10 cgrp=bg_non_interactive sched=0/0 handle=0x9183b920
| state=S schedstat=( 3538952919 2746466081 4616 ) utm=294 stm=59 core=1 HZ=100
| stack=0x91739000-0x9173b000 stackSize=1038KB
| held mutexes=
at android.net.LocalSocketImpl.readba_native(Native method)
at android.net.LocalSocketImpl.-wrap1(LocalSocketImpl.java:-1)
at android.net.LocalSocketImpl$SocketInputStream.read(LocalSocketImpl.java:110)
- locked <0x05fd685e> (a java.lang.Object)
at libcore.io.Streams.readFully(Streams.java:81)
at com.android.internal.os.InstallerConnection.readFully(InstallerConnection.java:222)
at com.android.internal.os.InstallerConnection.readReply(InstallerConnection.java:237)
at com.android.internal.os.InstallerConnection.transact(InstallerConnection.java:91)
- locked <0x0966443f> (a com.android.internal.os.InstallerConnection)
at com.android.internal.os.InstallerConnection.execute(InstallerConnection.java:124)
at com.android.internal.os.InstallerConnection.dexopt(InstallerConnection.java:147)
at com.android.server.pm.Installer.dexopt(Installer.java:153)
at com.android.server.pm.PackageDexOptimizer.performDexOptLI(PackageDexOptimizer.java:257)
at com.android.server.pm.PackageDexOptimizer.performDexOpt(PackageDexOptimizer.java:101)
- locked <0x0d65c6e0> (a java.lang.Object)
at com.android.server.pm.PackageManagerService.installPackageLI(PackageManagerService.java:15729)
at com.android.server.pm.PackageManagerService.installPackageTracedLI(PackageManagerService.java:15394)
at com.android.server.pm.PackageManagerService.-wrap25(PackageManagerService.java:-1)
at com.android.server.pm.PackageManagerService$9.run(PackageManagerService.java:12798)
- locked <0x0d65c6e0> (a java.lang.Object)
at android.os.Handler.handleCallback(Handler.java:836)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:203)
at android.os.HandlerThread.run(HandlerThread.java:61)
at com.android.server.ServiceThread.run(ServiceThread.java:46)
從以上的依賴可以看出:
984(system_server):1004(android.fg)—>984(system_server):999(ActivityManager)—>984(system_server):1015(PackageManager)
是以需要檢視上面PackageManager 的行為:看起來在read 資訊,如下backtrace ,read 為何這部分read 操作那麼久??如下,在read 前還有dexoat 優化,安裝,執行的操作,
是以應該是這部分的操作影響到如下這個call stack 執行力較長時間導緻SWT
at android.net.LocalSocketImpl.readba_native(Native method)
at android.net.LocalSocketImpl.-wrap1(LocalSocketImpl.java:-1)
at android.net.LocalSocketImpl$SocketInputStream.read(LocalSocketImpl.java:110)
- locked <0x05fd685e> (a java.lang.Object)
at libcore.io.Streams.readFully(Streams.java:81)
at com.android.internal.os.InstallerConnection.readFully(InstallerConnection.java:222)
at com.android.internal.os.InstallerConnection.readReply(InstallerConnection.java:237)
at com.android.internal.os.InstallerConnection.transact(InstallerConnection.java:91)
- locked <0x0966443f> (a com.android.internal.os.InstallerConnection)
at com.android.internal.os.InstallerConnection.execute(InstallerConnection.java:124)
at com.android.internal.os.InstallerConnection.dexopt(InstallerConnection.java:147)
at com.android.server.pm.Installer.dexopt(Installer.java:153)
at com.android.server.pm.PackageDexOptimizer.performDexOptLI(Packag
如以上分析,是在等鎖逾時,SWT,導緻逾時的原因應該是安裝一些應用時,dex2oat 操作等時間較長:如下為SWT時間時,dex2oat 的時間
07-15 15:12:36.600 12505 12505 I dex2oat : /system/bin/dex2oat --compiler-filter=interpret-only -j1
07-15 15:12:36.600 12505 12505 I dex2oat : /system/bin/dex2oat --compiler-filter=interpret-only -j1
07-15 15:12:46.360 12505 12505 I dex2oat : Large app, accepted running with swap.
07-15 15:12:50.775 12505 12505 W dex2oat : Before Android 4.1, method void com.tencent.biz.pubaccount.AccountDetail.activity.EqqAccountDetailActivity.l() would have incorrectly overridden the package-private method in com.tencent.biz.pubaccount.AccountDetailActivity
07-15 15:12:51.150 12505 12505 W dex2oat : Before Android 4.1, method void com.tencent.mobileqq.activity.fling.ContentWrapView.ensureTransformationInfo() would have incorrectly overridden the package-private method in android.view.View
07-15 15:13:19.736 12505 12505 W dex2oat : Verification of java.util.List com.tencent.mobileqq.activity.contact.addcontact.AddContactsView.b() took 151.200ms
07-15 15:13:54.697 12505 12505 I dex2oat : dex2oat took 78.103s (threads: 1) arena alloc=2MB (2243640B) java alloc=50MB (53066064B) native alloc=45MB (47911536B) free=4MB (4910480B) swap=32MB (33554432B)
07-15 15:13:54.697 12505 12505 I dex2oat : dex2oat took 78.103s (threads: 1) arena alloc=2MB (2243640B) java alloc=50MB (53066064B) native alloc=45MB (47911536B) free=4MB (4910480B) swap=32MB (33554432B)
可以 針對個别apk由于的dex2oat原因安裝失敗/ 安裝慢 / lunch慢進行處理
關于個别apk由于的dex2oat原因安裝失敗/ 安裝慢 / lunch慢進行處理可參考
看上面的應用:是微信,qq,之前我們還有處理過可能會導緻這樣的問題的apk:包括qq、微信、GMS
Guys,oppo内推啦,歡迎發履歷給我
https://blog.csdn.net/ch853199769/article/details/114842209