
Investigating a JVM Metaspace OOM Caused by retransformClasses

Preface

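This article takes an in-depth look at an issue in Arthas 3.3.0 through 3.4.1 where tracing a large method can cause a JVM Metaspace OOM. We analyze the enhanced bytecode generated by the trace command and debug how the JVM processes retransformClasses, identify the cause of the Metaspace OOM, and then present a fix.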

Problem Description

We constructed a large test method, demo.BigMethod250.test(), containing roughly 500 method calls and more than 250 string constants.

package demo;

public class BigMethod250
{
   public static void test()
   {
      final String someString = "Dustin";

      if (someString == null || someString.isEmpty())
      {
         print("The String is null or empty.");
      }
      else if (someString.equals("a0"))
      {
         print("You found me!");
      }
      else if (someString.equals("a1"))
      {
         print("You found me!");
      }
      else if (someString.equals("a2"))
      {
         print("You found me!");
      }
      ...
      else if (someString.equals("a249"))
      {
         print("You found me!");
      }
      else if (someString.equals("a250"))
      {
         print("You found me!");
      }
      else
      {
         print("No matching string found.");
      }
   }
}

We used JDK 11.0.7 and compared Arthas 3.2.0 with 3.4.1, running the following command:

trace demo.BigMethod250 test

1) Metaspace usage when running the command on Arthas 3.2.0

After attaching Arthas 3.2.0, Metaspace was at 20.8 MB:

% jcmd arthas-demo VM.metaspace
14780:

Total Usage ( 134 loaders):

  Non-Class:  474 chunks,     18.17 MB capacity,    17.91 MB ( 99%) used,   228.27 KB (  1%) free,   488 bytes ( <1%) waste,    29.62 KB ( <1%) overhead, deallocated: 153 blocks with 80.33 KB
      Class:  199 chunks,      2.28 MB capacity,     2.19 MB ( 96%) used,    76.82 KB (  3%) free,    16 bytes ( <1%) waste,    12.44 KB ( <1%) overhead, deallocated: 64 blocks with 30.07 KB
       Both:  673 chunks,     20.44 MB capacity,    20.10 MB ( 98%) used,   305.09 KB (  1%) free,   504 bytes ( <1%) waste,    42.06 KB ( <1%) overhead, deallocated: 217 blocks with 110.40 KB

Virtual space:
  Non-class space:       20.00 MB reserved,      18.42 MB ( 92%) committed
      Class space:     1016.00 MB reserved,       2.38 MB ( <1%) committed
             Both:        1.01 GB reserved,      20.80 MB (  2%) committed
.......           

The trace command succeeded, and Metaspace grew to 21.5 MB, an increase of less than 1 MB:

[arthas@14780]$ trace demo.BigMethod250 test
Press Q or Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 129 ms.           
% jcmd arthas-demo VM.metaspace
14780:

Total Usage ( 134 loaders):

  Non-Class:  486 chunks,     18.92 MB capacity,    18.67 MB ( 99%) used,   225.09 KB (  1%) free,   488 bytes ( <1%) waste,    30.38 KB ( <1%) overhead, deallocated: 338 blocks with 83.47 KB
      Class:  202 chunks,      2.34 MB capacity,     2.24 MB ( 96%) used,    87.20 KB (  4%) free,    16 bytes ( <1%) waste,    12.62 KB ( <1%) overhead, deallocated: 67 blocks with 31.26 KB
       Both:  688 chunks,     21.26 MB capacity,    20.91 MB ( 98%) used,   312.29 KB (  1%) free,   504 bytes ( <1%) waste,    43.00 KB ( <1%) overhead, deallocated: 405 blocks with 114.73 KB

Virtual space:
  Non-class space:       20.00 MB reserved,      19.17 MB ( 96%) committed
      Class space:     1016.00 MB reserved,       2.38 MB ( <1%) committed
             Both:        1.01 GB reserved,      21.55 MB (  2%) committed
......           

2) Metaspace usage when running the command on Arthas 3.4.1

After attaching Arthas, and before running trace, Metaspace was at 21 MB:

% jcmd arthas-demo VM.metaspace
15090:

Total Usage ( 134 loaders):

  Non-Class:  476 chunks,     18.42 MB capacity,    18.16 MB ( 99%) used,   226.62 KB (  1%) free,   504 bytes ( <1%) waste,    29.75 KB ( <1%) overhead, deallocated: 473 blocks with 136.73 KB
      Class:  201 chunks,      2.34 MB capacity,     2.26 MB ( 97%) used,    69.33 KB (  3%) free,     0 bytes (  0%) waste,    12.56 KB ( <1%) overhead, deallocated: 65 blocks with 28.10 KB
       Both:  677 chunks,     20.75 MB capacity,    20.42 MB ( 98%) used,   295.95 KB (  1%) free,   504 bytes ( <1%) waste,    42.31 KB ( <1%) overhead, deallocated: 538 blocks with 164.84 KB

Virtual space:
  Non-class space:       20.00 MB reserved,      18.67 MB ( 93%) committed
      Class space:      492.00 MB reserved,       2.38 MB ( <1%) committed
             Both:      512.00 MB reserved,      21.05 MB (  4%) committed
......           

The trace command failed, and Metaspace grew to 462 MB, an increase of roughly 441 MB:

[arthas@15090]$ trace demo.BigMethod250 test
Affect(class count: 1 , method count: 1) cost in 10144 ms, listenerId: 1
Enhance error! exception: java.lang.InternalError
error happens when enhancing class: null, check arthas log: /Users/xxx/logs/arthas/arthas.log           
% jcmd arthas-demo VM.metaspace
15090:

Total Usage ( 140 loaders):

  Non-Class: 1751 chunks,    449.91 MB capacity,   449.35 MB (>99%) used,   234.66 KB ( <1%) free,   229.11 KB ( <1%) waste,   109.44 KB ( <1%) overhead, deallocated: 2558 blocks with 340.38 MB
      Class:  212 chunks,      2.47 MB capacity,     2.40 MB ( 97%) used,    60.89 KB (  2%) free,     0 bytes (  0%) waste,    13.25 KB ( <1%) overhead, deallocated: 70 blocks with 29.30 KB
       Both: 1963 chunks,    452.38 MB capacity,   451.74 MB (>99%) used,   295.55 KB ( <1%) free,   229.11 KB ( <1%) waste,   122.69 KB ( <1%) overhead, deallocated: 2628 blocks with 340.41 MB

Virtual space:
  Non-class space:      558.00 MB reserved,     459.63 MB ( 82%) committed
      Class space:      492.00 MB reserved,       2.50 MB ( <1%) committed
             Both:        1.03 GB reserved,     462.13 MB ( 44%) committed

......

Waste (percentages refer to total committed size 462.13 MB):
              Committed unused:    198.00 KB ( <1%)
        Waste in chunks in use:    229.11 KB ( <1%)
         Free in chunks in use:    295.55 KB ( <1%)
     Overhead in chunks in use:    122.69 KB ( <1%)
                In free chunks:      9.56 MB (  2%)
Deallocated from chunks in use:    340.41 MB ( 74%) (2628 blocks)
                       -total-:    350.79 MB ( 76%)           

Running trace a second time pushed Metaspace past the configured MaxMetaspaceSize of 500 MB, and an OutOfMemoryError was thrown:

% jcmd arthas-demo VM.metaspace
15090:

Total Usage ( 99 loaders):

  Non-Class: 1717 chunks,    487.71 MB capacity,   487.17 MB (>99%) used,   198.55 KB ( <1%) free,   251.93 KB ( <1%) waste,   107.31 KB ( <1%) overhead, deallocated: 2872 blocks with 465.54 MB
      Class:  171 chunks,      2.43 MB capacity,     2.38 MB ( 98%) used,    41.80 KB (  2%) free,     0 bytes (  0%) waste,    10.69 KB ( <1%) overhead, deallocated: 71 blocks with 29.81 KB
       Both: 1888 chunks,    490.14 MB capacity,   489.55 MB (>99%) used,   240.34 KB ( <1%) free,   251.93 KB ( <1%) waste,   118.00 KB ( <1%) overhead, deallocated: 2943 blocks with 465.56 MB

Virtual space:
  Non-class space:      596.00 MB reserved,     497.50 MB ( 83%) committed
      Class space:      492.00 MB reserved,       2.50 MB ( <1%) committed
             Both:        1.06 GB reserved,     500.00 MB ( 46%) committed

......

Waste (percentages refer to total committed size 500.00 MB):
              Committed unused:     42.00 KB ( <1%)
        Waste in chunks in use:    251.93 KB ( <1%)
         Free in chunks in use:    240.34 KB ( <1%)
     Overhead in chunks in use:    118.00 KB ( <1%)
                In free chunks:      9.82 MB (  2%)
Deallocated from chunks in use:    465.56 MB ( 93%) (2943 blocks)
                       -total-:    476.02 MB ( 95%)


MaxMetaspaceSize: 500.00 MB
InitialBootClassLoaderMetaspaceSize: 4.00 MB
UseCompressedClassPointers: true
CompressedClassSpaceSize: 492.00 MB           
Exception in thread "as-shutdown-hooker" java.lang.OutOfMemoryError: Metaspace
    at java.base/java.lang.ClassLoader.defineClass1(Native Method)
    at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
    at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
    at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:550)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:458)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:452)           

Bytecode Analysis

We dumped each class file with javap -verbose and compared the results:

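| Class file | File size (bytes) | Constants | StackMapTable |
|---|---|---|---|
| original class | 10035 (9.8KB) | 555 | 254 frames in total (248 of them frame_type = 17) |
| class enhanced by 3.2.0 | 69367 (67.7KB) | 606 (the first 555 are identical to the original class) | 760 frames in total (253 each of frame_type = 251 and frame_type = 247) |
| class enhanced by 3.4.1 | 1109038 (1083KB) | 1598 (order differs greatly from the original class) | 1522 frames in total (1269 of them frame_type = 255) |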

The main StackMapTable frame types involved:

frame_type description
17 same
247 same_locals_1_stack_item_frame_extended
251 same_frame_extended
255 full_frame

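The full_frame entries in the class enhanced by 3.4.1 contain far more data than those in the other two classes, including a very large number of top entries, for example: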

frame_type = 255 /* full_frame */
  offset_delta = 34
  locals = [ class java/lang/String, null, class java/lang/Class, class java/lang/String, 
    class "[Ljava/lang/Object;", top, top, top, top, top, top, top, top, top, top, top, top, top, 
    top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, 
    top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, 
    top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, 
    top, top, top, top, null, class java/lang/Class, class java/lang/String ]
  stack = [ class java/lang/String ]           

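Comparing the bytecode, the two biggest differences are the constant pool and the StackMapTable. We tried modifying the 3.4.1 code to strip the StackMapTable from the enhanced class; this only slowed down the Metaspace growth, which remained much higher than with 3.2.0.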

Investigating JVM retransformClasses

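Comparing the bytecode did not reveal an obvious starting point, so we went back to the JVM itself and debugged the retransformClasses process, hoping to find the branch where the two Arthas versions diverge.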

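Because the trace command in 3.4.1 allocates a large amount of Metaspace, we set a conditional breakpoint (chunk_word_size >= 8000) in the allocation method SpaceManager::get_new_chunk. The following call stack was hit many times: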

metaspace::SpaceManager::get_new_chunk(unsigned long) spaceManager.cpp:387
metaspace::SpaceManager::grow_and_allocate(unsigned long) spaceManager.cpp:197
metaspace::SpaceManager::allocate_work(unsigned long) spaceManager.cpp:450
metaspace::SpaceManager::allocate(unsigned long) spaceManager.cpp:421
ClassLoaderMetaspace::allocate(unsigned long, Metaspace::MetadataType) metaspace.cpp:1480
Metaspace::allocate(ClassLoaderData*, unsigned long, MetaspaceObj::Type, Thread*) metaspace.cpp:1282
MetaspaceObj::operator new(unsigned long, ClassLoaderData*, unsigned long, MetaspaceObj::Type, Thread*) allocation.cpp:83
ConstMethod::allocate(ClassLoaderData*, int, InlineTableSizes*, ConstMethod::MethodType, Thread*) constMethod.cpp:46
Method::allocate(ClassLoaderData*, int, AccessFlags, InlineTableSizes*, ConstMethod::MethodType, Thread*) method.cpp:87
Method::clone_with_new_data(methodHandle const&, unsigned char*, int, unsigned char*, int, Thread*) method.cpp:1523
Relocator::insert_space_at(int, int, unsigned char*, Thread*) relocator.cpp:159
VM_RedefineClasses::rewrite_cp_refs_in_method(methodHandle, methodHandle*, Thread*) jvmtiRedefineClasses.cpp:2133
VM_RedefineClasses::rewrite_cp_refs_in_methods(InstanceKlass*, Thread*) jvmtiRedefineClasses.cpp:2041
VM_RedefineClasses::rewrite_cp_refs(InstanceKlass*, Thread*) jvmtiRedefineClasses.cpp:1876
VM_RedefineClasses::merge_cp_and_rewrite(InstanceKlass*, InstanceKlass*, Thread*) jvmtiRedefineClasses.cpp:1834
VM_RedefineClasses::load_new_class_versions(Thread*) jvmtiRedefineClasses.cpp:1405
VM_RedefineClasses::doit_prologue() jvmtiRedefineClasses.cpp:170
VMThread::execute(VM_Operation*) vmThread.cpp:534
JvmtiEnv::RetransformClasses(int, _jclass* const*) jvmtiEnv.cpp:451
jvmti_RetransformClasses(_jvmtiEnv*, int, _jclass* const*) jvmtiEnter.cpp:3969
retransformClasses JPLISAgent.c:1183
Java_sun_instrument_InstrumentationImpl_retransformClasses0 InstrumentationImplNativeMethods.c:109
......
thread_entry(JavaThread*, Thread*) jvm.cpp:3004
JavaThread::thread_main_inner() thread.cpp:2010
JavaThread::run() thread.cpp:1993
Thread::call_run() thread.cpp:395
thread_native_entry(Thread*) os_bsd.cpp:702
_pthread_start 0x00007fff6c6b6109
thread_start 0x00007fff6c6b1b8b           

Widening the ldc instruction

Going through each method on this call stack, we found one suspicious call path:

VM_RedefineClasses::rewrite_cp_refs_in_method() -> Relocator::insert_space_at()

The relevant fragment of rewrite_cp_refs_in_method is:

// Rewrite constant pool references in the specific method. This code
// was adapted from Rewriter::rewrite_method().
void VM_RedefineClasses::rewrite_cp_refs_in_method(methodHandle method,
       methodHandle *new_method_p, TRAPS) {
    ......
    for (int bci = 0; bci < code_length; bci += bc_length) {
        address bcp = code_base + bci;
        Bytecodes::Code c = (Bytecodes::Code)(*bcp);
        ......
        switch (c) {
          case Bytecodes::_ldc:
          {
            int cp_index = *(bcp + 1);
            int new_index = find_new_index(cp_index);
    
            if (StressLdcRewrite && new_index == 0) {
              // If we are stressing ldc -> ldc_w rewriting, then we
              // always need a new_index value.
              new_index = cp_index;
            }
            if (new_index != 0) {
              // the original index is mapped so we have more work to do
              if (!StressLdcRewrite && new_index <= max_jubyte) {
                // The new value can still use ldc instead of ldc_w
                // unless we are trying to stress ldc -> ldc_w rewriting
                *(bcp + 1) = new_index;
              } else {
                // the new value needs ldc_w instead of ldc
                u_char inst_buffer[4]; // max instruction size is 4 bytes
                bcp = (address)inst_buffer;
                // construct new instruction sequence
                *bcp = Bytecodes::_ldc_w;
                bcp++;
                // Rewriter::rewrite_method() does not rewrite ldc -> ldc_w.
                // See comment below for difference between put_Java_u2()
                // and put_native_u2().
                Bytes::put_Java_u2(bcp, new_index);
            
                Relocator rc(method, NULL /* no RelocatorListener needed */);
                methodHandle m;
                {
                  PauseNoSafepointVerifier pnsv(&nsv);
            
                  // ldc is 2 bytes and ldc_w is 3 bytes
                  m = rc.insert_space_at(bci, 3, inst_buffer, CHECK);
                }
            
                // return the new method so that the caller can update
                // the containing class
                *new_method_p = method = m;
                // switch our bytecode processing loop from the old method
                // to the new method
                code_base = method->code_base();
                code_length = method->code_size();
                bcp = code_base + bci;
                ......           

Here max_jubyte is a constant with the value 0xFF:

const jubyte  max_jubyte  = (jubyte)-1;  // 0xFF       largest jubyte           

The call that appears in the stack above is rc.insert_space_at():

// ldc is 2 bytes and ldc_w is 3 bytes
      m = rc.insert_space_at(bci, 3, inst_buffer, CHECK);           

Reading the code together with its comments, rewrite_cp_refs_in_method rewrites a method's constant pool references. Its main logic is as follows (irrelevant steps omitted):

1) Loop over the method's bytecode and decode each opcode (Bytecodes::Code c); the opcode determines the length of the operand data that follows it;

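2) If the cp_index operand of an ldc instruction has a mapped value new_index (merging the new class's constant pool maps identical entries onto the corresponding entries of the old constant pool), and new_index is greater than max_jubyte (0xFF), the ldc instruction must be widened to ldc_w;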

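3) ldc takes a 1-byte operand while ldc_w takes a 2-byte operand. Because the lengths differ, rewriting the instruction requires inserting space into the bytecode (rc.insert_space_at);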

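4) Every time Relocator::insert_space_at() modifies the bytecode, it copies the current Java method's bytecode along with its other data (parameter table, exception table, StackMapTable, and so on).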
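The per-instruction decision in step 2 can be sketched in plain Java. This is a simplified model, not HotSpot code: the hypothetical class LdcRewriteDecision treats a missing key in indexMap as find_new_index() returning 0, ignores the StressLdcRewrite debug flag, and the mappings in main (2 -> 262, 3 -> 7) are illustrative values.

```java
import java.util.Map;

final class LdcRewriteDecision {
    enum Action { KEEP, PATCH_IN_PLACE, WIDEN_TO_LDC_W }

    static final int MAX_JUBYTE = 0xFF; // mirrors HotSpot's max_jubyte

    // indexMap: new-pool index -> merged-pool index; an absent key means "not mapped" (new_index == 0)
    static Action decide(int cpIndex, Map<Integer, Integer> indexMap) {
        Integer newIndex = indexMap.get(cpIndex);
        if (newIndex == null || newIndex == 0) {
            return Action.KEEP;                 // index unchanged, nothing to rewrite
        }
        if (newIndex <= MAX_JUBYTE) {
            return Action.PATCH_IN_PLACE;       // still fits in ldc's 1-byte operand
        }
        return Action.WIDEN_TO_LDC_W;           // needs a 2-byte operand: space must be inserted
    }

    // Counts how many ldc instructions (given by their operands) would need widening.
    static long countWidenings(int[] ldcOperands, Map<Integer, Integer> indexMap) {
        long widenings = 0;
        for (int cpIndex : ldcOperands) {
            if (decide(cpIndex, indexMap) == Action.WIDEN_TO_LDC_W) {
                widenings++;
            }
        }
        return widenings;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> indexMap = Map.of(2, 262, 3, 7);
        System.out.println(decide(2, indexMap)); // WIDEN_TO_LDC_W
        System.out.println(decide(3, indexMap)); // PATCH_IN_PLACE
        System.out.println(decide(9, indexMap)); // KEEP
        System.out.println(countWidenings(new int[] {2, 3, 2, 9}, indexMap)); // 2
    }
}
```

Only WIDEN_TO_LDC_W leads to Relocator::insert_space_at(); PATCH_IN_PLACE just overwrites the one-byte operand, which is why the 0xFF boundary matters so much.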

About the ldc and ldc_w instructions:

ldc: Push a single word constant.
ldc_w: Push a single word constant. (16-bit ref in constant pool)
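The byte-level difference between the two encodings can be sketched as follows. Per the JVM specification, ldc is opcode 0x12 with a one-byte unsigned index and ldc_w is opcode 0x13 with a two-byte big-endian index; the class name LdcEncoding is hypothetical, and long/double constants (ldc2_w) are out of scope.

```java
import java.util.Arrays;

final class LdcEncoding {
    static final int LDC = 0x12;
    static final int LDC_W = 0x13;
    static final int MAX_JUBYTE = 0xFF; // same limit as HotSpot's max_jubyte

    // Encode "load constant #cpIndex" using the shortest form allowed by the index.
    static byte[] encode(int cpIndex) {
        if (cpIndex < 1 || cpIndex > 0xFFFF) {
            throw new IllegalArgumentException("constant pool index out of range: " + cpIndex);
        }
        if (cpIndex <= MAX_JUBYTE) {
            return new byte[] {(byte) LDC, (byte) cpIndex};                        // 2 bytes
        }
        return new byte[] {(byte) LDC_W, (byte) (cpIndex >>> 8), (byte) cpIndex}; // 3 bytes
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(2)));   // [18, 2]
        System.out.println(Arrays.toString(encode(262))); // [19, 1, 6]
    }
}
```

Index 262 (0x0106) does not fit in one byte, so the same constant costs one extra byte of code, which is exactly the byte that insert_space_at() has to make room for.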

Cloning the Method data

Next, the code of Relocator::insert_space_at():

// size is the new size of the instruction at bci. Hence, if size is less than the current
// instruction size, we will shrink the code.
methodHandle Relocator::insert_space_at(int bci, int size, u_char inst_buffer[], TRAPS) {
  _changes = new GrowableArray<ChangeItem*> (10);
  _changes->push(new ChangeWiden(bci, size, inst_buffer));

  ......

  if (!handle_code_changes()) return methodHandle();

  // Construct the new method
  methodHandle new_method = Method::clone_with_new_data(method(),
                              code_array(), code_length(),
                              compressed_line_number_table(),
                              compressed_line_number_table_size(),
                              CHECK_(methodHandle()));

  // Deallocate the old Method* from metadata
  ClassLoaderData* loader_data = method()->method_holder()->class_loader_data();
  loader_data->add_to_deallocate_list(method()());

  set_method(new_method);

  ......

  return new_method;
}           

And the code of Method::clone_with_new_data:

methodHandle Method::clone_with_new_data(const methodHandle& m, 
        u_char* new_code, 
        int new_code_length, 
        u_char* new_compressed_linenumber_table, 
        int new_compressed_linenumber_size, 
        TRAPS) {
  // Code below does not work for native methods - they should never get rewritten anyway
  assert(!m->is_native(), "cannot rewrite native methods");
  // Allocate new Method*
  AccessFlags flags = m->access_flags();

  ConstMethod* cm = m->constMethod();
  int checked_exceptions_len = cm->checked_exceptions_length();
  int localvariable_len = cm->localvariable_table_length();
  int exception_table_len = cm->exception_table_length();
  int method_parameters_len = cm->method_parameters_length();
  int method_annotations_len = cm->method_annotations_length();
  int parameter_annotations_len = cm->parameter_annotations_length();
  int type_annotations_len = cm->type_annotations_length();
  int default_annotations_len = cm->default_annotations_length();
  ...
  ClassLoaderData* loader_data = m->method_holder()->class_loader_data();
  Method* newm_oop = Method::allocate(loader_data,
                                      new_code_length,
                                      flags,
                                      &sizes,
                                      m->method_type(),
                                      CHECK_(methodHandle()));
  methodHandle newm (THREAD, newm_oop);

  // Create a shallow copy of Method part, but be careful to preserve the new ConstMethod*
  ConstMethod* newcm = newm->constMethod();
  int new_const_method_size = newm->constMethod()->size();

  // This works because the source and target are both Methods. Some compilers
  // (e.g., clang) complain that the target vtable pointer will be stomped,
  // so cast away newm()'s and m()'s Methodness.
  memcpy((void*)newm(), (void*)m(), sizeof(Method));

  // Create shallow copy of ConstMethod.
  memcpy(newcm, m->constMethod(), sizeof(ConstMethod));

  // Reset correct method/const method, method size, and parameter info
  newm->set_constMethod(newcm);
  newm->constMethod()->set_code_size(new_code_length);
  newm->constMethod()->set_constMethod_size(new_const_method_size);

  // Copy new byte codes
  memcpy(newm->code_base(), new_code, new_code_length);
  // Copy line number table
  if (new_compressed_linenumber_size > 0) {
    memcpy(newm->compressed_linenumber_table(),
           new_compressed_linenumber_table,
           new_compressed_linenumber_size);
  }
  // Copy method_parameters
  if (method_parameters_len > 0) {
    memcpy(newm->method_parameters_start(),
           m->method_parameters_start(),
           method_parameters_len * sizeof(MethodParametersElement));
  }
  // Copy checked_exceptions
  if (checked_exceptions_len > 0) {
    memcpy(newm->checked_exceptions_start(),
           m->checked_exceptions_start(),
           checked_exceptions_len * sizeof(CheckedExceptionElement));
  }
  // Copy exception table
  if (exception_table_len > 0) {
    memcpy(newm->exception_table_start(),
           m->exception_table_start(),
           exception_table_len * sizeof(ExceptionTableElement));
  }
  // Copy local variable number table
  if (localvariable_len > 0) {
    memcpy(newm->localvariable_table_start(),
           m->localvariable_table_start(),
           localvariable_len * sizeof(LocalVariableTableElement));
  }
  // Copy stackmap table
  if (m->has_stackmap_table()) {
    int code_attribute_length = m->stackmap_data()->length();
    Array<u1>* stackmap_data =
      MetadataFactory::new_array<u1>(loader_data, code_attribute_length, 0, CHECK_(methodHandle()));
    memcpy((void*)stackmap_data->adr_at(0),
           (void*)m->stackmap_data()->adr_at(0), code_attribute_length);
    newm->set_stackmap_data(stackmap_data);
  }

  // copy annotations over to new method
  newcm->copy_annotations_from(loader_data, cm, CHECK_(methodHandle()));
  return newm;
}           

The largest objects copied in this process (sizes observed in the debugger):

  • ConstMethod: 35729 bytes (34.9KB)
  • stackmap_data: 1037645 bytes (1013KB)

The key divergent branch

Debugging showed that 3.2.0 and 3.4.1 diverge at the if (_index_map_count == 0) check in merge_cp_and_rewrite.

  • In 3.4.1, _index_map_count = 1597, so execution enters rewrite_cp_refs and the ldc widening is performed.
  • In 3.2.0, _index_map_count = 0, so rewrite_cp_refs is never reached and no ldc widening is needed. This is easy to see: if the new class's constant pool only appends new entries after those of the original class, no entries are remapped, so no ldc operand can grow beyond what fits in a single byte.
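The effect of appending versus regenerating the constant pool can be illustrated with a toy merge. This is a deliberately simplified model, assumed for illustration only and not the real merge_constant_pools: entries are plain strings describing their resolved content, the merged pool starts as a copy of the old pool, and each new entry either stays at its own index, is found elsewhere in the merged pool, or is appended. HotSpot also handles two-slot long/double entries, bootstrap-method operands and cross-references between entries; CpMergeModel is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CpMergeModel {
    // Returns new-pool index -> merged-pool index for every entry whose index changes.
    // Index 0 is unused in a constant pool, so both lists carry a null placeholder there.
    static Map<Integer, Integer> merge(List<String> oldCp, List<String> newCp, List<String> merged) {
        merged.clear();
        merged.addAll(oldCp);                      // merged pool starts as a copy of the old pool
        Map<Integer, Integer> indexMap = new HashMap<>();
        for (int i = 1; i < newCp.size(); i++) {
            String entry = newCp.get(i);
            if (i < oldCp.size() && entry.equals(oldCp.get(i))) {
                continue;                          // same entry at the same index: nothing to map
            }
            int found = merged.indexOf(entry);     // does an equal entry already exist?
            if (found < 0) {
                merged.add(entry);                 // no: append it at the end
                found = merged.size() - 1;
            }
            if (found != i) {
                indexMap.put(i, found);            // index changed: references must be rewritten
            }
        }
        return indexMap;
    }

    public static void main(String[] args) {
        List<String> oldCp = Arrays.asList(null, "Utf8 foo", "Class demo/BigMethod250", "Utf8 demo/BigMethod250");
        // 3.2.0 style: original entries kept in place, new ones appended
        List<String> appended = Arrays.asList(null, "Utf8 foo", "Class demo/BigMethod250",
                "Utf8 demo/BigMethod250", "Utf8 SpyAPI");
        // 3.4.1 style: pool regenerated in a different order
        List<String> regenerated = Arrays.asList(null, "Utf8 demo/BigMethod250", "Class demo/BigMethod250", "Utf8 foo");
        List<String> merged = new ArrayList<>();
        System.out.println(merge(oldCp, appended, merged));    // {}
        System.out.println(merge(oldCp, regenerated, merged)); // {1=3, 3=1}
    }
}
```

The append-only pool yields an empty map, matching the 3.2.0 branch, while the regenerated pool yields mappings, which is what sends 3.4.1 into rewrite_cp_refs.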

The code of merge_cp_and_rewrite is as follows:

// Merge constant pools between the_class and scratch_class and
// potentially rewrite bytecodes in scratch_class to use the merged
// constant pool.
jvmtiError VM_RedefineClasses::merge_cp_and_rewrite(
             InstanceKlass* the_class, InstanceKlass* scratch_class,
             TRAPS) {
  // worst case merged constant pool length is old and new combined
  int merge_cp_length = the_class->constants()->length()
        + scratch_class->constants()->length();

  // Constant pools are not easily reused so we allocate a new one
  // each time.
  // merge_cp is created unsafe for concurrent GC processing.  It
  // should be marked safe before discarding it. Even though
  // garbage,  if it crosses a card boundary, it may be scanned
  // in order to find the start of the first complete object on the card.
  ClassLoaderData* loader_data = the_class->class_loader_data();
  ConstantPool* merge_cp_oop =
    ConstantPool::allocate(loader_data,
                           merge_cp_length,
                           CHECK_(JVMTI_ERROR_OUT_OF_MEMORY));
  MergeCPCleaner cp_cleaner(loader_data, merge_cp_oop);

  HandleMark hm(THREAD);  // make sure handles are cleared before
                          // MergeCPCleaner clears out merge_cp_oop
  constantPoolHandle merge_cp(THREAD, merge_cp_oop);

  // Get constants() from the old class because it could have been rewritten
  // while we were at a safepoint allocating a new constant pool.
  constantPoolHandle old_cp(THREAD, the_class->constants());
  constantPoolHandle scratch_cp(THREAD, scratch_class->constants());

  // If the length changed, the class was redefined out from under us. Return
  // an error.
  if (merge_cp_length != the_class->constants()->length()
         + scratch_class->constants()->length()) {
    return JVMTI_ERROR_INTERNAL;
  }

  // Update the version number of the constant pools (may keep scratch_cp)
  merge_cp->increment_and_save_version(old_cp->version());
  scratch_cp->increment_and_save_version(old_cp->version());

  ResourceMark rm(THREAD);
  _index_map_count = 0;
  _index_map_p = new intArray(scratch_cp->length(), scratch_cp->length(), -1);

  _operands_cur_length = ConstantPool::operand_array_length(old_cp->operands());
  _operands_index_map_count = 0;
  int operands_index_map_len = ConstantPool::operand_array_length(scratch_cp->operands());
  _operands_index_map_p = new intArray(operands_index_map_len, operands_index_map_len, -1);

  // reference to the cp holder is needed for copy_operands()
  merge_cp->set_pool_holder(scratch_class);
  bool result = merge_constant_pools(old_cp, scratch_cp, &merge_cp,
                  &merge_cp_length, THREAD);
  merge_cp->set_pool_holder(NULL);

  if (!result) {
    // The merge can fail due to memory allocation failure or due
    // to robustness checks.
    return JVMTI_ERROR_INTERNAL;
  }

  // Save fields from the old_cp.
  merge_cp->copy_fields(old_cp());
  scratch_cp->copy_fields(old_cp());

  log_info(redefine, class, constantpool)("merge_cp_len=%d, index_map_len=%d", merge_cp_length, _index_map_count);

  // key branch
  if (_index_map_count == 0) {
    // there is nothing to map between the new and merged constant pools

    if (old_cp->length() == scratch_cp->length()) {
      // The old and new constant pools are the same length and the
      // index map is empty. This means that the three constant pools
      // are equivalent (but not the same). Unfortunately, the new
      // constant pool has not gone through link resolution nor have
      // the new class bytecodes gone through constant pool cache
      // rewriting so we can't use the old constant pool with the new
      // class.

      // toss the merged constant pool at return
    } else if (old_cp->length() < scratch_cp->length()) { // ** 3.2.0 takes this branch **
      // The old constant pool has fewer entries than the new constant
      // pool and the index map is empty. This means the new constant
      // pool is a superset of the old constant pool. However, the old
      // class bytecodes have already gone through constant pool cache
      // rewriting so we can't use the new constant pool with the old
      // class.

      // toss the merged constant pool at return
    } else {
      // The old constant pool has more entries than the new constant
      // pool and the index map is empty. This means that both the old
      // and merged constant pools are supersets of the new constant
      // pool.

      // Replace the new constant pool with a shrunken copy of the
      // merged constant pool
      set_new_constant_pool(loader_data, scratch_class, merge_cp, merge_cp_length,
                            CHECK_(JVMTI_ERROR_OUT_OF_MEMORY));
      // The new constant pool replaces scratch_cp so have cleaner clean it up.
      // It can't be cleaned up while there are handles to it.
      cp_cleaner.add_scratch_cp(scratch_cp());
    }
  } else {
    if (log_is_enabled(Trace, redefine, class, constantpool)) {
      // don't want to loop unless we are tracing
      int count = 0;
      for (int i = 1; i < _index_map_p->length(); i++) {
        int value = _index_map_p->at(i);

        if (value != -1) {
          log_trace(redefine, class, constantpool)("index_map[%d]: old=%d new=%d", count, i, value);
          count++;
        }
      }
    }

    // We have entries mapped between the new and merged constant pools
    // so we have to rewrite some constant pool references.
    if (!rewrite_cp_refs(scratch_class, THREAD)) {  // ** 3.4.1 takes this branch **
      return JVMTI_ERROR_INTERNAL;
    }

    // Replace the new constant pool with a shrunken copy of the
    // merged constant pool so now the rewritten bytecodes have
    // valid references; the previous new constant pool will get
    // GCed.
    set_new_constant_pool(loader_data, scratch_class, merge_cp, merge_cp_length,
                          CHECK_(JVMTI_ERROR_OUT_OF_MEMORY));
    // The new constant pool replaces scratch_cp so have cleaner clean it up.
    // It can't be cleaned up while there are handles to it.
    cp_cleaner.add_scratch_cp(scratch_cp());
  }

  return JVMTI_ERROR_NONE;
} // end merge_cp_and_rewrite()           

Looking back at the bytecode in this case, one constant pool entry is remapped extremely often:

  • In the original class:
      #262 = Class              #544          // demo/BigMethod250
      ......
      #544 = Utf8               demo/BigMethod250
  • In the new class:
      #1 = Utf8               demo/BigMethod250
      #2 = Class              #1           // demo/BigMethod250

In the new class this entry is #2, but it maps to #262 in the original class. Because #262 > 0xFF, every ldc #2 has to be widened to ldc_w!

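In the new class, ldc #2 appears as many as 761 times, which means ldc has to be widened 761 times! The debugging data shows that widening a single ldc can require allocating more than 1 MB, so the hundreds of widenings in this case drove Metaspace usage up by a staggering 400+ MB.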
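A rough upper bound puts these numbers together. Assuming (our simplification) that each of the 761 widenings copies the ConstMethod (35729 bytes) and the stackmap_data (1037645 bytes) measured in the debugger in full, and ignoring the Method object itself, the one-byte growth of the code per widening, and any block reuse:

$$
35729 + 1037645 = 1073374\ \text{bytes} \approx 1.02\ \text{MiB per widening}
$$

$$
761 \times 1073374\ \text{bytes} = 816837614\ \text{bytes} \approx 779\ \text{MiB}
$$

The measured growth is well below this bound, which is plausible given that some deallocated blocks are reused after the occasional Metaspace GC, as explained in the following section.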

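The new class contains so many ldc #2 instructions because, when the Arthas trace command enhances the bytecode, it inserts three callbacks around every method call (atBeforeInvoke, atInvokeException and atAfterInvoke), and each of them takes the current class as an argument. The decompiled code looks like this: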

if ("Dustin".equals("a0")) {
        var10000 = "You found me!";
        String var21 = "demo/BigMethod250|print|(Ljava/lang/String;)V|15";
        Class var20 = BigMethod250.class;
        Object var19 = null;
        SpyAPI.atBeforeInvoke(var20, var21, var19);
        
        try {
            print(var10000);
        } catch (Throwable var2797) {
            String var1540 = "demo/BigMethod250|print|(Ljava/lang/String;)V|17";
            Class var1539 = BigMethod250.class;
            Object var1538 = null;
            SpyAPI.atInvokeException(var1539, var1540, var1538, var2797);
            throw var2797;
        }
        
        String var780 = "demo/BigMethod250|print|(Ljava/lang/String;)V|17";
        Class var779 = BigMethod250.class;
        Object var778 = null;
        SpyAPI.atAfterInvoke(var779, var780, var778);
    } else if ("Dustin".equals("a1")) {           

Statements such as Class var20 = BigMethod250.class; are decompiled from the instruction ldc #2 // class demo/BigMethod250.

Block reuse

When widening the bytecode triggers a method clone, the old method is added to the list of metadata awaiting deallocation (deallocate_list):

// Deallocate the old Method* from metadata
  ClassLoaderData* loader_data = method()->method_holder()->class_loader_data();
  loader_data->add_to_deallocate_list(method()());           

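However, the space occupied by the methods on the deallocate_list cannot be reused right away; it is returned to the block_freelists of the class loader's SpaceManager at the next GC. Blocks in block_freelists are not released directly; instead, later allocations search block_freelists first and reuse a block if one large enough is found.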
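The consequence of this deferred reuse can be seen in a toy model. DeferredFreeListModel is hypothetical and much simpler than HotSpot's real BlockFreelist (no splitting of large blocks, no size-class lists, no chunk management); it only captures the ordering described above: freed blocks wait on a pending list until a GC moves them to a free list, and allocations try the free list first.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class DeferredFreeListModel {
    private final List<Integer> pending = new ArrayList<>();  // stands in for deallocate_list
    private final List<Integer> freeList = new ArrayList<>(); // stands in for block_freelists
    private long committed;                                   // fresh space taken so far

    // Try the free list first (first fit, remainder ignored); otherwise take fresh space.
    int allocate(int size) {
        for (Iterator<Integer> it = freeList.iterator(); it.hasNext(); ) {
            int block = it.next();
            if (block >= size) {
                it.remove();
                return block;
            }
        }
        committed += size;
        return size;
    }

    // Freed blocks are NOT reusable yet: they only wait on the pending list.
    void deallocate(int blockSize) {
        pending.add(blockSize);
    }

    // A GC hands the pending blocks over to the free list.
    void gc() {
        freeList.addAll(pending);
        pending.clear();
    }

    // Clone a method `clones` times, freeing the previous copy each time;
    // gcEvery <= 0 means no GC happens during the run.
    static long simulateClones(int methodSize, int clones, int gcEvery) {
        DeferredFreeListModel space = new DeferredFreeListModel();
        int current = space.allocate(methodSize);
        for (int i = 1; i <= clones; i++) {
            int copy = space.allocate(methodSize); // like clone_with_new_data()
            space.deallocate(current);             // like add_to_deallocate_list()
            current = copy;
            if (gcEvery > 0 && i % gcEvery == 0) {
                space.gc();
            }
        }
        return space.committed;
    }

    public static void main(String[] args) {
        System.out.println(simulateClones(100, 10, 0)); // 1100: no GC, nothing is ever reused
        System.out.println(simulateClones(100, 10, 5)); // 600
        System.out.println(simulateClones(100, 10, 1)); // 200: GC after every clone
    }
}
```

With the same number of clones, the space taken grows as GC becomes rarer, which is consistent with the observation that follows about infrequent Metaspace GCs during retransformClasses.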

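The JVM log shows that Metaspace GCs are triggered only rarely during retransformClasses. As a result, much of the method space waiting on the deallocate_list cannot be reused effectively, and a large amount of new Metaspace has to be allocated.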

The following command turns on JVM logging so you can watch Metaspace GC activity:

% jcmd arthas-demo VM.log what="metaspace*=info,stackmap*=info"
88451:
Command executed successfully
% jcmd arthas-demo VM.log list
88451:
Available log levels: off, trace, debug, info, warning, error
Available log decorators: time (t), utctime (utc), uptime (u), ....
Described tag sets:
 logging: Logging for the log framework itself
Log output configuration:
 #0: stdout all=warning,metaspace*=info,stackmap*=info uptime,level,tags (reconfigured)
 #1: stderr all=off uptime,level,tags
           

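By default the JVM log is written to stdout (lines without a [time][level][tags] prefix are the demo application's own output):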

[48.318s][info][gc,metaspace] GC(1) Metaspace: 20736K->20736K(524288K)
illegalArgumentCount: 25, number is: -100438, need >= 2
illegalArgumentCount: 26, number is: -154116, need >= 2
[50.278s][info][gc,metaspace] GC(3) Metaspace: 22190K->22190K(524288K)
[50.550s][info][gc,metaspace] GC(4) Metaspace: 22428K->22428K(526336K)
[50.780s][info][gc,metaspace] GC(5) Metaspace: 22457K->22457K(526336K)
[51.040s][info][gc,metaspace] GC(6) Metaspace: 34393K->34393K(540672K)
illegalArgumentCount: 27, number is: -111050, need >= 2
[51.534s][info][gc,metaspace] GC(8) Metaspace: 70774K->70774K(585728K)
[51.589s][info][gc,metaspace] GC(9) Metaspace: 76161K->76161K(593920K)
[51.643s][info][gc,metaspace] GC(10) Metaspace: 81547K->81547K(600064K)
illegalArgumentCount: 28, number is: -45880, need >= 2
[52.878s][info][gc,metaspace] GC(12) Metaspace: 149700K->149700K(684032K)
14847=3*7*7*101
[55.057s][info][gc,metaspace] GC(14) Metaspace: 263767K->263767K(825344K)
illegalArgumentCount: 29, number is: -96539, need >= 2
[58.963s][info][gc,metaspace] GC(16) Metaspace: 451906K->451906K(1060864K)
illegalArgumentCount: 30, number is: -93735, need >= 2           

Solution

Copy the original class's constant pool

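The enhanced class must copy the original class's constant pool instead of generating a new one. Otherwise constant pool indices change, entries get remapped, and ldc instructions have to be widened.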

An ASM sketch of the fix:

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.tree.ClassNode;

    // Parse the bytecode
    ClassReader reader = new ClassReader(classBytes);
    ClassNode classNode = new ClassNode();
    reader.accept(classNode, ClassReader.SKIP_FRAMES);

    // Enhance the class
    ......

    // Generate the bytecode
    int flags = ClassWriter.COMPUTE_FRAMES | ClassWriter.COMPUTE_MAXS;
    // Passing the original ClassReader makes ClassWriter copy the original constant pool as-is
    ClassWriter writer = new ClassWriter(reader, flags);
    classNode.accept(writer);
    return writer.toByteArray();

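Note: if you need to specify the ClassLoader (with COMPUTE_FRAMES, ClassWriter may have to load classes to find their common super class), see com.taobao.arthas.bytekit.asm.ClassLoaderAwareClassWriter.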

The ASM ClassWriter Javadoc describes the constant pool behavior as follows:

public ClassWriter(@Nullable ClassReader classReader, int flags)

Constructs a new ClassWriter object and enables optimizations for "mostly add" bytecode transformations. These optimizations are the following:

  • The constant pool and bootstrap methods from the original class are copied as is in the new class, which saves time. New constant pool entries and new bootstrap methods will be added at the end if necessary, but unused constant pool entries or bootstrap methods won't be removed.
  • Methods that are not transformed are copied as is in the new class, directly from the original class bytecode (i.e. without emitting visit events for all the method instructions), which saves a lot of time. Untransformed methods are detected by the fact that the ClassReader receives MethodVisitor objects that come from a ClassWriter (and not from any other ClassVisitor instance).
Params:
  • classReader – the ClassReader used to read the original class. It will be used to copy the entire constant pool and bootstrap methods from the original class and also to copy other fragments of original bytecode where applicable.
  • flags – option flags that can be used to modify the default behavior of this class. Must be zero or more of COMPUTE_MAXS and COMPUTE_FRAMES. These option flags do not affect methods that are copied as is in the new class. This means that neither the maximum stack size nor the stack frames will be computed for these methods.

Other issues

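1) When the enhanced method contains many statements, the StackMapTable regenerated by ASM becomes very large and consumes more Metaspace. In our tests the class still loaded and ran correctly without a StackMapTable. Is it acceptable not to generate one, and would that cause compatibility problems across JDK versions? We have not looked into this in depth.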

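    // Generate the bytecode: without ClassWriter.COMPUTE_FRAMES no StackMapTable is generated
    // (the original frames were already dropped by reading with ClassReader.SKIP_FRAMES)
    int flags = ClassWriter.COMPUTE_MAXS;
    // Passing the original ClassReader makes ClassWriter copy the original constant pool as-is
    ClassWriter writer = new ClassWriter(reader, flags);
    classNode.accept(writer);
    return writer.toByteArray();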

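2) Every time Relocator::insert_space_at() modifies the bytecode, it clones the method data; it does not account for the cost of a very large number of calls in extreme cases.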

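When Relocator widens an instruction, it first makes a copy of the method's bytecode (code_array) and modifies the copy, rather than modifying the original method's bytecode in place.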

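The new method created by Method::clone_with_new_data() copies the bytecode (code_array) modified by Relocator, together with the original method's parameter table, exception table, stackmap, and so on.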

In this case the method's stackmap is very large (over 1 MB), so every widening wastes a large amount of memory.
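The copy-then-modify pattern and its cost can be modeled in a few lines of Java. CloneOnWidenModel is a hypothetical toy, not Relocator: it widens every ldc of one given index to ldc_w by building a new code array each time, and charges a fixed sideDataSize per clone as a stand-in for the stackmap and other tables that clone_with_new_data() copies. It does not adjust branch offsets, exception tables or the stackmap, and it assumes every opcode other than ldc/ldc_w in its input is one byte long.

```java
final class CloneOnWidenModel {
    static final byte LDC = 0x12;
    static final byte LDC_W = 0x13;

    long bytesAllocated; // total bytes of all method copies created so far

    // Build a NEW code array in which the ldc at bci becomes "ldc_w newIndex" (copy, then modify).
    byte[] widenAt(byte[] code, int bci, int newIndex, int sideDataSize) {
        if (code[bci] != LDC) {
            throw new IllegalArgumentException("no ldc at bci " + bci);
        }
        byte[] out = new byte[code.length + 1];       // ldc is 2 bytes, ldc_w is 3
        System.arraycopy(code, 0, out, 0, bci);       // bytes before the instruction
        out[bci] = LDC_W;
        out[bci + 1] = (byte) (newIndex >>> 8);       // 16-bit index, big-endian
        out[bci + 2] = (byte) newIndex;
        System.arraycopy(code, bci + 2, out, bci + 3, code.length - bci - 2); // shifted tail
        bytesAllocated += out.length + sideDataSize;  // whole method cloned, side data included
        return out;
    }

    // Widen every "ldc oldIndex", switching to the freshly cloned code after each rewrite.
    byte[] widenAll(byte[] code, int oldIndex, int newIndex, int sideDataSize) {
        int bci = 0;
        while (bci < code.length) {
            int op = code[bci] & 0xFF;
            if (op == LDC && (code[bci + 1] & 0xFF) == oldIndex) {
                code = widenAt(code, bci, newIndex, sideDataSize);
                bci += 3;
            } else if (op == LDC) {
                bci += 2;
            } else if (op == LDC_W) {
                bci += 3;
            } else {
                bci += 1;                             // toy assumption: other opcodes are 1 byte
            }
        }
        return code;
    }

    public static void main(String[] args) {
        // ldc #2; pop; ldc #2; pop; ldc #2; pop   (0x57 is pop)
        byte[] code = {LDC, 2, 0x57, LDC, 2, 0x57, LDC, 2, 0x57};
        CloneOnWidenModel model = new CloneOnWidenModel();
        byte[] widened = model.widenAll(code, 2, 262, 1_000);
        System.out.println(widened.length);       // 12
        System.out.println(model.bytesAllocated); // 3033 = (10 + 1000) + (11 + 1000) + (12 + 1000)
    }
}
```

The allocated total grows roughly as (number of widenings) × (method size including side data), so with a stackmap of about 1 MB and several hundred widenings the cost quickly reaches hundreds of megabytes.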

// Construct the new method
  methodHandle new_method = Method::clone_with_new_data(method(),
                              code_array(), code_length(),
                              compressed_line_number_table(),
                              compressed_line_number_table_size(),
                              CHECK_(methodHandle()));           

Summary

This was a serious incident caused by the class constant pool: the fix is simple, but understanding it was not.

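By digging into how the JVM handles retransformClasses, we gained a basic understanding of hot-updating class bytecode in the JVM and a deeper understanding of the class constant pool. We also picked up a few commands for inspecting JVM Metaspace and VM logs. When similar problems come up in the future, the jcmd command is worth exploring further; newer JDKs (15/16) provide even more functionality.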