
Memory and garbage collector (GC) for NodeJS V8 engine

Author: Java mechanic

1. Why GC is needed

Programs need memory to run, and two memory regions come up constantly in discussion: the stack and the heap.

The stack is a linear, last-in-first-out region that is freed automatically when a function returns, while the heap is free dynamic memory: heap memory is either allocated and freed manually or managed automatically by the garbage collector (hereinafter GC).

In the early days of software development, and in some languages such as C and C++, heap memory is allocated and freed manually. This gives precise control over memory and the best possible utilization, but development is slow and error-prone: improper memory handling easily causes bugs like leaks and crashes.

As technology evolved, most high-level languages (such as Java and Node.js) stopped requiring developers to manage memory by hand; the runtime allocates and frees space automatically. Alongside this, the GC (garbage collector) was born to reclaim and organize memory. In most cases developers no longer need to think about memory at all and can focus on business logic. The rest of this article focuses on heap memory and GC.

2. GC development

Running the GC consumes CPU, and a GC run triggers STW (stop-the-world) pauses that suspend the business code's threads. Why STW? It ensures that objects created while the GC is working do not conflict with the collection in progress.

GC has mainly evolved alongside growing memory sizes, in roughly three representative stages:

  • Stage 1: single-threaded GC (representative: Serial)
A single-threaded GC must completely pause all other worker threads while collecting. This is the earliest stage of GC and the worst performing.
  • Stage 2: parallel multi-threaded GC (representatives: Parallel Scavenge, ParNew)
On multi-CPU machines, multiple GC threads run in parallel, shortening both the collection time and the time user threads are stalled. These algorithms still fully STW all other worker threads.
  • Stage 3: concurrent multi-threaded GC (representatives: CMS (Concurrent Mark Sweep), G1)

Concurrency here means that the GC's threads can run at the same time as the business code.

In the first two stages the GC is fully STW; in concurrent GC, some phases run concurrently with business code, keeping STW pauses short. This mode can mark objects incorrectly, because new objects may be created while the GC is running, but the algorithms themselves correct for this and resolve the problem.

These three stages do not mean a given GC is exactly one of the above. GCs in different programming languages combine a variety of algorithms according to their needs.

3. V8 memory partitions and GC

The heap memory layout is closely tied to the GC design. V8 divides the heap into several large regions and adopts a generational strategy.

  • New space (young generation): small space, divided into two semi-spaces; the data in it is short-lived.
  • Old space (old generation): large, growable space; the data in it is long-lived.
  • Large object space: by default, objects larger than 256 KB live here, as explained below.
  • Code space: where the just-in-time compiler (JIT) stores compiled code.
  • Cell space: stores small, fixed-size values such as numbers and booleans.
  • Property cell space: stores special JavaScript objects, such as accessor properties and certain internal objects.
  • Map space: stores meta-information for JavaScript objects, such as hidden classes (maps) and other internal data structures.

3.1 Generational strategy: new generation and old generation


In Node.js, the GC adopts a generational strategy with a new-generation and an old-generation region; most memory data lives in these two regions.

3.1.1 The new generation

The new generation is a small, fast memory pool for young objects, divided into two semi-spaces: one half is free (called the to-space) and the other half holds data (called the from-space).

When objects are first created they are allocated in the new generation's from-space, with an age of 1. When the from-space runs low or exceeds a certain size, a minor GC (the copying Scavenge algorithm) is triggered. The GC pauses the application (STW, stop-the-world), marks all live objects in the from-space, then copies them compactly into the other, free semi-space (the to-space). Finally the original from-space is freed entirely and becomes the free space, the two spaces swap the from and to roles, and the cycle repeats. The copying algorithm trades space for time.

The new generation's space is small, so it triggers GC more frequently; but because the space to scan is small, each GC costs little and its execution time is short.

Each minor GC adds 1 to the age of surviving objects; objects that survive multiple minor GCs (age greater than N) are promoted to the old-generation memory pool.

3.1.2 The old generation

The old generation is a large memory pool for long-lived objects. It is collected with Mark-Sweep and Mark-Compact; one such run is called a major GC. When the old generation fills past a certain proportion, that is, when the ratio of live objects to total space exceeds a threshold, a mark-sweep or mark-compact is triggered.

Because the space is larger, its GC execution time is also longer, and it runs less frequently than the new generation's. If space is still insufficient after an old-generation collection, V8 requests more memory from the system.

You can call the global.gc() method, with different parameters, to trigger GC manually. Note, however, that Node.js disables this method by default. To enable it, add the --expose-gc flag when starting the Node.js application, for example:

node --expose-gc app.js
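A sketch of triggering a collection manually (the array size is arbitrary, and the script degrades gracefully when the flag is absent):

```javascript
// Run with: node --expose-gc gc-demo.js
function heapUsedMB() {
  return (process.memoryUsage().heapUsed / 1048576).toFixed(1) + ' MB';
}

let data = new Array(1e6).fill('some payload'); // allocate a few MB
console.log('after allocation:', heapUsedMB());

data = null; // drop the only reference so the array becomes garbage

if (typeof global.gc === 'function') {
  global.gc(); // only available when started with --expose-gc
  console.log('after manual GC:', heapUsedMB());
} else {
  console.log('start node with --expose-gc to enable global.gc()');
}
```

With the flag enabled, the second reading should be visibly lower once the dropped array is reclaimed.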

V8 mainly uses a combination of Mark-Sweep and Mark-Compact for garbage collection in the older generation.

Mark-Sweep means mark-and-sweep, and it has two phases: marking and sweeping. In the marking phase it walks all objects in the heap and marks the live ones; in the subsequent sweep phase, only the unmarked objects are cleared.

Mark-Sweep's biggest problem is that memory is left discontinuous after a collection. This fragmentation hurts later allocations: a large object may need contiguous space that none of the fragments can provide, triggering an early, otherwise unnecessary garbage collection.

Mark-Compact (mark-and-compact) was proposed to solve Mark-Sweep's fragmentation problem and is an evolution of it. The difference is that after marking, live objects are moved toward one end of the space during compaction, and once the move completes, all memory beyond the boundary is cleared in one step. V8 also returns a certain amount of freed memory to the system according to its own heuristics.

3.2 Large object space

Large objects are created directly in the large object space and are never moved to other spaces. So how large must an object be to be created there, rather than in the new generation's from-space? Consulting the documentation and source code gives the answer: 256 KB by default. V8 does not appear to expose a flag to change it; the v8_enable_hugepage option in the source is a build-time setting.

chromium.googlesource.com/v8/v8.git/+…
// There is a separate large object space for objects larger than
 // Page::kMaxRegularHeapObjectSize, so that they do not have to move during
 // collection. The large object space is paged. Pages in large object space
 // may be larger than the page size.
source.chromium.org/chromium/ch…
(1 << (18 - 1)) = 128 KB
(1 << (19 - 1)) = 256 KB
(1 << (21 - 1)) = 1 MB (if hugepage is enabled)

4. V8 new- and old-generation partition sizes

4.1 Old-generation partition size

Prior to v12.x:

To keep GC execution time within bounds, V8 capped the maximum memory space with a default limit for the old generation: about 1.4 GB on 64-bit systems and about 700 MB on 32-bit systems. Exceeding it crashes the application.

If you want more memory, use --max-old-space-size to raise the maximum (unit: MB):

node --max-old-space-size=<MB> app.js

After v12:

V8 now sizes the old generation, which effectively means the heap, based on available memory, so heap size is no longer hard-limited. The old cap was arguably unreasonable: it restricted V8's capacity, refusing to let a program keep running merely because GC would take longer. Later versions have also further optimized GC, and ever-growing memory is simply what development demands.

If you still want a limit, --max-old-space-size remains available; since v12 its default value is 0, meaning no limit.

Reference documentation: nodejs.medium.com/introducing...

4.2 New-generation partition size

A single semi-space in the new generation defaults to 16 MB on 64-bit systems and 8 MB on 32-bit systems; since there are two semi-spaces, the totals are 32 MB and 16 MB respectively.

--max-semi-space-size

--max-semi-space-size sets the maximum size of a new-generation semi-space, in MB.

Bigger is not better for this space: the larger it is, the longer each scan takes. In most cases this partition should not be modified, unless you are optimizing for a specific business scenario; use with care.

--max-new-space-size

--max-new-space-size was said to set the new generation's maximum space in KB (it no longer exists).

Many articles mention this flag, but I went through the nodejs.org documentation for v4, v6, v7, v8, and v10 and could not find it, nor does it show up in node --v8-options. Perhaps some old versions had it; today you should use --max-semi-space-size instead.

5. Memory analysis related APIs

5.1 v8.getHeapStatistics()

Run v8.getHeapStatistics() to inspect the V8 heap, including the maximum heap size heap_size_limit, which of course covers the new generation, the old generation, the large object space, and so on. On my machine with 8 GB of RAM and Node 16.x, heap_size_limit is about 4 GB:

{
  total_heap_size: 6799360,
  total_heap_size_executable: 524288,
  total_physical_size: 5523584,
  total_available_size: 4340165392,
  used_heap_size: 4877928,
  heap_size_limit: 4345298944,
  malloced_memory: 254120,
  peak_malloced_memory: 585824,
  does_zap_garbage: 0,
  number_of_native_contexts: 2,
  number_of_detached_contexts: 0
}

Querying Node.js applications in k8s containers on v12, v14, and v16 gives the table below. heap_size_limit appears to be half of the system's (container's) maximum memory. Why is it 256 MB when the container limit is 128 MB? Because the container also has swap: the effective memory ceiling is the container memory limit × 2, the limit plus an equal amount of swap.

So the conclusion is that in most cases heap_size_limit defaults to half of system memory. If that value is exceeded and the system still has space, V8 will request more; and as usage grows, this maximum can itself grow while system memory remains sufficient. It is a heuristic, not an exact rule.

| Maximum container memory | heap_size_limit |
| --- | --- |
| 4G | 2G |
| 2G | 1G |
| 1G | 0.5G |
| 1.5G | 0.7G |
| 256M | 256M |
| 128M | 256M |

5.2 process.memoryUsage

process.memoryUsage()
{
  rss: 35438592,
  heapTotal: 6799360,
  heapUsed: 4892976,
  external: 939130,
  arrayBuffers: 11170
}

This call reports the current process's memory usage: heapTotal, heapUsed, and so on. Polling it on a schedule and plotting a line chart helps analyze memory usage over time. Below is the equivalent feature provided by Easy-Monitor:


It is best used in a local development environment: start the monitor, fire a large number of requests, and watch the memory curve climb; after the requests stop, the curve should drop once GC runs. Repeat this several times. If memory keeps growing and each trough is higher than the last, you may have a memory leak.
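A minimal sampler along those lines (the 5-second interval and the in-memory buffer are arbitrary choices for illustration; a real setup would ship samples to a metrics system):

```javascript
// Collect process.memoryUsage() on a fixed interval for later charting.
const samples = [];

function sampleMemory() {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  samples.push({ time: Date.now(), rss, heapTotal, heapUsed });
  console.log(`heapUsed: ${(heapUsed / 1048576).toFixed(1)} MB`);
}

sampleMemory(); // take one sample immediately
const timer = setInterval(sampleMemory, 5000);
timer.unref(); // do not keep the process alive just for sampling
```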

5.3 Printing GC events

How to use

node --trace_gc app.js
// or
v8.setFlagsFromString('--trace_gc');
  • --trace_gc
[40807:0x148008000]   235490 ms: Scavenge 247.5 (259.5) -> 244.7 (260.0) MB, 0.8 / 0.0 ms  (average mu = 0.971, current mu = 0.908) task 
[40807:0x148008000]   235521 ms: Scavenge 248.2 (260.0) -> 245.2 (268.0) MB, 1.2 / 0.0 ms  (average mu = 0.971, current mu = 0.908) allocation failure 
[40807:0x148008000]   235616 ms: Scavenge 251.5 (268.0) -> 245.9 (268.8) MB, 1.9 / 0.0 ms  (average mu = 0.971, current mu = 0.908) task 
[40807:0x148008000]   235681 ms: Mark-sweep 249.7 (268.8) -> 232.4 (268.0) MB, 7.1 / 0.0 ms  (+ 46.7 ms in 170 steps since start of marking, biggest step 4.2 ms, walltime since start of marking 159 ms) (average mu = 1.000, current mu = 1.000) finalize incremental marking via task GC in old space requested
GCType <heapUsed before> (<heapTotal before>) -> <heapUsed after> (<heapTotal after>) MB

Scavenge and Mark-sweep above are the GC types: Scavenge is the new-generation copying event, and Mark-sweep is the old-generation mark-and-sweep event. Before the arrow is the heap actually used before the event; after the arrow, the heap used afterward, with the total heap size in parentheses. You can see that new-generation events are frequent, while the later old-generation event frees up total heap space.

  • --trace_gc_verbose

Displays the details of the heap space

v8.setFlagsFromString('--trace_gc_verbose');

[44729:0x130008000] Fast promotion mode: false survival rate: 19%
[44729:0x130008000]    97120 ms: [HeapController] factor 1.1 based on mu=0.970, speed_ratio=1000 (gc=433889, mutator=434)
[44729:0x130008000]    97120 ms: [HeapController] Limit: old size: 296701 KB, new limit: 342482 KB (1.1)
[44729:0x130008000]    97120 ms: [GlobalMemoryController] Limit: old size: 296701 KB, new limit: 342482 KB (1.1)
[44729:0x130008000]    97120 ms: Scavenge 302.3 (329.9) -> 290.2 (330.4) MB, 8.4 / 0.0 ms  (average mu = 0.998, current mu = 0.999) task 
[44729:0x130008000] Memory allocator,       used: 338288 KB, available: 3905168 KB
[44729:0x130008000] Read-only space,        used:    166 KB, available:      0 KB, committed:    176 KB
[44729:0x130008000] New space,              used:    444 KB, available:  15666 KB, committed:  32768 KB
[44729:0x130008000] New large object space, used:      0 KB, available:  16110 KB, committed:      0 KB
[44729:0x130008000] Old space,              used: 253556 KB, available:   1129 KB, committed: 259232 KB
[44729:0x130008000] Code space,             used:  10376 KB, available:    119 KB, committed:  12944 KB
[44729:0x130008000] Map space,              used:   2780 KB, available:      0 KB, committed:   2832 KB
[44729:0x130008000] Large object space,     used:  29987 KB, available:      0 KB, committed:  30336 KB
[44729:0x130008000] Code large object space,     used:      0 KB, available:      0 KB, committed:      0 KB
[44729:0x130008000] All spaces,             used: 297312 KB, available: 3938193 KB, committed: 338288 KB
[44729:0x130008000] Unmapper buffering 0 chunks of committed:      0 KB
[44729:0x130008000] External memory reported:  20440 KB
[44729:0x130008000] Backing store memory:  22084 KB
[44729:0x130008000] External memory global 0 KB
[44729:0x130008000] Total time spent in GC  : 199.1 ms
  • --trace_gc_nvp

Details of each GC event: the GC type, time spent in each phase, memory changes, and so on.

v8.setFlagsFromString('--trace_gc_nvp');

[45469:0x150008000]  8918123 ms: pause=0.4 mutator=83.3 gc=s reduce_memory=0 time_to_safepoint=0.00 heap.prologue=0.00 heap.epilogue=0.00 heap.epilogue.reduce_new_space=0.00 heap.external.prologue=0.00 heap.external.epilogue=0.00 heap.external_weak_global_handles=0.00 fast_promote=0.00 complete.sweep_array_buffers=0.00 scavenge=0.38 scavenge.free_remembered_set=0.00 scavenge.roots=0.00 scavenge.weak=0.00 scavenge.weak_global_handles.identify=0.00 scavenge.weak_global_handles.process=0.00 scavenge.parallel=0.08 scavenge.update_refs=0.00 scavenge.sweep_array_buffers=0.00 background.scavenge.parallel=0.00 background.unmapper=0.04 unmapper=0.00 incremental.steps_count=0 incremental.steps_took=0.0 scavenge_throughput=1752382 total_size_before=261011920 total_size_after=260180920 holes_size_before=838480 holes_size_after=838480 allocated=831000 promoted=0 semi_space_copied=4136 nodes_died_in_new=0 nodes_copied_in_new=0 nodes_promoted=0 promotion_ratio=0.0% average_survival_ratio=0.5% promotion_rate=0.0% semi_space_copy_rate=0.5% new_space_allocation_throughput=887.4 unmapper_chunks=124
[45469:0x150008000]  8918234 ms: pause=0.6 mutator=110.9 gc=s reduce_memory=0 time_to_safepoint=0.00 heap.prologue=0.00 heap.epilogue=0.00 heap.epilogue.reduce_new_space=0.04 heap.external.prologue=0.00 heap.external.epilogue=0.00 heap.external_weak_global_handles=0.00 fast_promote=0.00 complete.sweep_array_buffers=0.00 scavenge=0.50 scavenge.free_remembered_set=0.00 scavenge.roots=0.08 scavenge.weak=0.00 scavenge.weak_global_handles.identify=0.00 scavenge.weak_global_handles.process=0.00 scavenge.parallel=0.08 scavenge.update_refs=0.00 scavenge.sweep_array_buffers=0.00 background.scavenge.parallel=0.00 background.unmapper=0.04 unmapper=0.00 incremental.steps_count=0 incremental.steps_took=0.0 scavenge_throughput=1766409 total_size_before=261207856 total_size_after=260209776 holes_size_before=838480 holes_size_after=838480 allocated=1026936 promoted=0 semi_space_copied=3008 nodes_died_in_new=0 nodes_copied_in_new=0 nodes_promoted=0 promotion_ratio=0.0% average_survival_ratio=0.5% promotion_rate=0.0% semi_space_copy_rate=0.3% new_space_allocation_throughput=888.1 unmapper_chunks=124

5.4 Memory snapshots

const v8 = require('node:v8');
v8.writeHeapSnapshot();

Generating a snapshot stops the world: the service stops responding, and the larger the memory footprint, the longer it takes. The call itself is time-consuming, so be patient while the snapshot is produced.

Note: while a memory snapshot is being generated, the program pauses (STW) and is almost unresponsive. If the container uses a health check that then fails, the container may be restarted and the snapshot lost. If you need a snapshot, consider disabling the health check first.

Compatibility issue: this API is not supported on the arm64 architecture; executing it hangs the process, generating an empty snapshot file and then no further response. If you use the heapdump library, an error is reported directly:

(mach-o file, but is an incompatible architecture (have (arm64), need (x86_64))

The API produces a snapshot file with the .heapsnapshot suffix. Import it with the Memory panel of the Chrome debugger to see the number and size of objects in heap memory, as well as each object's distance from the GC root. You can also diff two snapshot files taken at different times and see what data changed between them.

6. Use memory snapshots to analyze memory leaks

A Node application kept being restarted because its memory exceeded the container limit, and the container monitoring dashboard showed the application's memory curve climbing steadily, which pointed to a memory leak.

Using the Chrome debugger, I compared snapshots taken over time. The category that grew the most was closure functions; expanding the full list showed that much of the data was mongo document objects, i.e. data held in closures that was never released. The Object list likewise contained many objects, with Mongoose's Connection object showing in the outermost details.


At this point the leak was roughly located near one class's mongo data-storage logic.

There were also many Timeout objects, very deep by distance from the GC root. Clicking into the details, the nesting pointed to an exact position in the code: that class runs a timer job that uses setInterval to batch some non-urgent tasks, and calls clearInterval when a batch finishes.


Leak resolution and optimization

Analyzing the code logic finally surfaced the problem: the clearInterval trigger condition was faulty, so the timer was never cleared and kept looping. With the timer always live, that code and the data in its closure could not be reclaimed by the GC, so memory grew and grew until the limit was hit and the process crashed.

setInterval was not a reasonable fit here anyway, so in passing the code was changed to process the queue sequentially with for await, avoiding a burst of concurrent work and making the code much clearer. The code was old enough that it was not worth reconstructing why setInterval was chosen in the first place.
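A reconstruction of the shape of the bug and the fix (hypothetical names; the original business code is not shown in the article, and a plain sequential loop stands in for the "for await queue"):

```javascript
// Buggy shape: if the stop condition is never met, the interval -- and every
// variable captured by its closure -- stays reachable forever.
function startBatchTimer(fetchTasks) {
  const timer = setInterval(async () => {
    const tasks = fetchTasks();
    if (tasks.length === 0) {
      clearInterval(timer); // the leak: this branch was never reached in practice
      return;
    }
    for (const task of tasks) await task();
  }, 1000);
  return timer;
}

// Fix: run the batch as a plain sequential async loop; nothing outlives the call,
// so the GC can reclaim everything once the returned promise settles.
async function runBatch(tasks) {
  for (const task of tasks) {
    await task(); // strictly one at a time, no concurrent burst
  }
}
```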

After the new version shipped, more than ten days of observation showed memory holding steady at a bit over 100 MB on average, with GC normally reclaiming temporary spikes in a wave-shaped curve; no further leaks occurred.


That completes using a memory snapshot to analyze and resolve the leak. In practice the analysis was more tortuous: snapshot contents are not easy to read and not so direct. The data is aggregated by type, so you have to find clues by looking across different constructors and inspecting internal data details, combined with a careful reading of your own code. In my snapshot, for example, the growing data included closures, strings, mongo model classes, Timeout, Object, and so on; all of it was referenced by the problematic code and therefore could not be reclaimed by the GC.

7. Finally

GC implementations are different for different languages, such as Java and Go:

Java: if you know the JVM (the counterpart of V8 for Node), Java also adopts a generational strategy, and its new generation additionally has an eden area where new objects are created. V8's new generation has no eden area.

Go: uses mark-sweep with a tri-color marking algorithm.

GC implementations differ across languages, but they are all built from combinations of the same underlying algorithms. Different combinations bring different performance in different dimensions; each is a trade-off, biased toward its intended application scenarios.