
最近組裡用第三方給的 SDK 搞了點開發,最近線上突然開始報錯,并且 發現一個特别奇怪的問題 ,組員和我說, 代碼運作到一半不走了,跳過了<typo id="typo-72" data-origin="一段" ignoretag="true">一段</typo> (這代碼是剛參加<typo id="typo-83" data-origin="東奧會" ignoretag="true">東奧會</typo>參加跳遠麼???)。
代碼如下,邏輯非常簡單:
try {
log.info("initiate client with conf: {}", conf);
SDKClient client = new SDKClient(conf);
client.init();
log.info("client initiated");
} catch (Exception e) {
log.error("initiate client failed", e);
}
log.info("start to manipulate...");
我們發現 client 實際上沒有初始化成功,後面的業務處理一直在報錯。檢視日志,發現:
initiate client with conf: xxxxx
start to manipulate...
這就是組員說的代碼發生了跳躍。因為既沒有列印 client initiated ,也沒有列印 initiate client failed ...就直接 start to manipulate... 了。
[圖檔上傳失敗...(image-5e5fb7-1628659037883)]
老讀者知道,我們的線上是 k8s + Docker,并且每個 鏡像中内置了 Arthas ,并且 Java 版本是 Java 16,并且啟用了 JFR。日志中具有鍊路資訊,通過 ELK Agent 拉取到統一日志伺服器。
這個 SDK 裡面要通路的遠端位址都有 IP 白名單,我們為了安全本地并不能直接使用 SDK 通路對方的線上環境。在本地測試連接配接的是對方的測試環境,是沒有問題的。是以這裡,我們還是得 通過 Arthas 進行定位 。
首先得看看線上運作的源碼是否和本地我們看到的一緻呢?這個可以通過 jad 指令:
jad 要看的類全限定名稱
檢視後發現,反編譯後的代碼,和我們的源碼一緻诶。
然後我們看看代碼的實際執行:
trace 要看的類全限定名稱 方法
之後重新執行這個方法,檢視 trace 發現,初始化的時候确實抛出異常了:
# 省略我們這裡不關心的
+---[min=0.010174ms,max=0.01184ms,total=0.022014ms,count=2] org.apache.logging.log4j.Logger:info() #130
+---[min=599.388978ms,max=630.23967ms,total=1229.628648ms,count=2] com.dasha13.sdk.SDKClient:<init>() #131
+---[min=203.617545ms,max=221.785512ms,total=425.403057ms,count=2] com.dasha13.sdk.SDKClient:init() #132 [throws Exception,2]
+---[min=0.034798ms,max=0.084505ms,total=0.119303ms,count=2] org.apache.logging.log4j.Logger:error() #136
+---[min=0.010174ms,max=0.01184ms,total=0.022014ms,count=2] org.apache.logging.log4j.Logger:info() #138
但是, 這個異常日志,為何沒有列印出來呢 ?我們繼續檢視下這個異常,使用 watch 方法,并且指定檢視深度為 2,這樣 期望能列印出堆棧以及 Message :
watch com.dasha13.sdk.SDKClient init {throwExp} -x 2
但是,這裡 隻列印了一個看似是 Message 的資訊 :
method=com.dasha13.sdk.SDKClient init location=AtExceptionExit
ts=2021-08-10 02:58:15; [cost=131.20209ms] result=ERROR DATA!!! object class: class java.util.ArrayList, exception class: class com.google.common.util.concurrent.UncheckedExecutionException, exception message: java.lang.IllegalArgumentException
這很奇怪,正常來說,指定深度為 2,如果有異常抛出, 那麼這個輸出資訊,會包含異常的 Message 以及堆棧資訊的 。這是怎麼回事呢?我們來分别擷取堆棧以及資訊試試:
首先擷取堆棧:
watch com.dasha13.sdk.SDKClient init {throwExp.getStackTrace()} -x 2
重新執行出問題的方法,堆棧正常輸出,沒啥問題,不過看堆棧應該問題和 Google 的依賴翻轉 Bean 管理架構(類似于 Spring) Guice 載入某個 Bean 出異常有關:
ts=2021-08-10 03:03:37; [cost=146.644563ms] result=@ArrayList[
@StackTraceElement[][
@StackTraceElement[com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1025)],
@StackTraceElement[com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)],
@StackTraceElement[com.dasha13.sdk.SDKClient.init(SDKClient.java:482)],
# 省略之後的
再來看異常資訊:
watch com.dasha13.sdk.SDKClient init {throwExp.getMessage()} -x 2
重新執行出問題的方法,這時候發現 watch 失敗:
watch failed, condition is: null, express is: {throwExp.getMessage()}, com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException, visit /app/arthas/arthas.log for more details.
我們按照提示,檢視 arthas 日志,發現的異常堆棧:
2021-08-10 03:07:11 [XNIO-2 task-3] ERROR c.t.a.c.command.express.OgnlExpress -Error during evaluating the expression:
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
at com.google.inject.internal.util.StackTraceElements.forMember(StackTraceElements.java:66)
at com.google.inject.internal.Errors.formatSource(Errors.java:806)
at com.google.inject.internal.Errors.formatSource(Errors.java:785)
at com.google.inject.internal.Errors.formatInjectionPoint(Errors.java:839)
at com.google.inject.internal.Errors.formatSource(Errors.java:800)
at com.google.inject.internal.Errors.formatSource(Errors.java:785)
at com.google.inject.internal.Errors.format(Errors.java:584)
at com.google.inject.ProvisionException.getMessage(ProvisionException.java:60)
cause by: MethodNotFoundException: Method not found: class com.google.common.xxxxxxxxx
我們發現,居然是 ProvisionException 的 getMessage() 發生了異常,也就是 異常的 getMessage() 發生了異常 .檢視 異常的 Cause 我們也定位出來,是 Guava 版本與 guice 版本不相容導緻 ,其根本原因是三方接口逾時,導緻初始化異常,有異常抛出被封裝成 ProvisionException , ProvisionException 異常的 getMessage 依賴 Guava Cache 緩存一些異常資訊,但是我們項目中 Guava 版本與 guice 版本不相容,導緻某些方法不存在,是以 ProvisionException 異常的 getMessage 也會有異常。 之前運作沒問題是因為三方沒有還沒有過初始化的時候接口逾時抛異常 。。。
[圖檔上傳失敗...(image-b1e869-1628659037882)]
我們使用的 log4j2 異步日志配置,并且将異常作為最後一個參數傳入日志方法中,正常情況下,會輸出這個 異常的 Message 以及異常堆棧 .但從上面的分析我們知道,擷取 Message 的時候,抛出了異常。Log4j 的設計是使用了 日志事件的生産消費 這種架構。這裡是消費者擷取異常的 Message 以及異常堆棧,并且在擷取 Message 的時候,發現有異常。對于 Log4j2 異步日志,發現有異常的時候,原有日志事件會被直接抛棄,并将異常輸出到 StatusLogger 中( 底層其實就是标準異常輸出 )中,這裡對應 log4j 的源碼:
AppenderControl
private void tryCallAppender(final LogEvent event) {
try {
//調用 appender 輸出日志
appender.append(event);
} catch (final RuntimeException error) {
//處理 RuntimeException
handleAppenderError(event, error);
} catch (final Exception error) {
//處理其他 Exception
handleAppenderError(event, new AppenderLoggingException(error));
}
}
private void handleAppenderError(final LogEvent event, final RuntimeException ex) {
appender.getHandler().error(createErrorMsg("An exception occurred processing Appender "), event, ex);
if (!appender.ignoreExceptions()) {
throw ex;
}
}
ErrorHandler 一般都是預設實作,即 DefaultErrorHandler;DefaultErrorHandler 是輸出到一個 StatusLogger:
DefaultErrorHandler
private static final Logger LOGGER = StatusLogger.getLogger();
public void error(final String msg, final LogEvent event, final Throwable t) {
final long current = System.nanoTime();
if (current - lastException > EXCEPTION_INTERVAL || exceptionCount++ < MAX_EXCEPTIONS) {
LOGGER.error(msg, t);
}
lastException = current;
if (!appender.ignoreExceptions() && t != null && !(t instanceof AppenderLoggingException)) {
throw new AppenderLoggingException(msg, t);
}
}
StatusLogger 其實就是标準異常輸出 System.err :
this.logger = new SimpleLogger("StatusLogger", Level.ERROR, false, true, showDateTime, false,
dateFormat, messageFactory, PROPS,
//标準異常輸出
System.err);
2021-08-10 03:30:29,810 Log4j2-TF-10-AsyncLoggerConfig-3 ERROR An exception occurred processing Appender file com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
at com.google.inject.internal.util.StackTraceElements.forMember(StackTraceElements.java:66)
at com.google.inject.internal.Errors.formatSource(Errors.java:806)
at com.google.inject.internal.Errors.formatSource(Errors.java:785)
at com.google.inject.internal.Errors.formatInjectionPoint(Errors.java:839)
at com.google.inject.internal.Errors.formatSource(Errors.java:800)
at com.google.inject.internal.Errors.formatSource(Errors.java:785)
at com.google.inject.internal.Errors.format(Errors.java:584)
at com.google.inject.ProvisionException.getMessage(ProvisionException.java:60)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:105)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:93)
at org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629)
at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63)
at org.springframework.boot.logging.log4j2.ExtendedWhitespaceThrowablePatternConverter.format(ExtendedWhitespaceThrowablePatternConverter.java:50)
at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:38)
at org.apache.logging.log4j.core.layout.PatternLayout$PatternSerializer.toSerializable(PatternLayout.java:345)
at org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:244)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:229)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:543)
at org.apache.logging.log4j.core.async.AsyncLoggerConfig.callAppenders(AsyncLoggerConfig.java:127)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:502)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:485)
at org.apache.logging.log4j.core.async.AsyncLoggerConfig.log(AsyncLoggerConfig.java:121)
at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logToAsyncLoggerConfigsOnCurrentThread(AsyncLoggerConfig.java:169)
at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisruptor.java:111)
at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisruptor.java:97)
at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalArgumentException
at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source)
at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source)
at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source)
at com.google.inject.internal.util.LineNumbers.<init>(LineNumbers.java:65)
at com.google.inject.internal.util.StackTraceElements$1.load(StackTraceElements.java:46)
at com.google.inject.internal.util.StackTraceElements$1.load(StackTraceElements.java:43)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
... 41 more