
Summary of JVM virtual machine knowledge points


JVM Basics

Learn about HotSpot

When we talk about "the JVM" in daily work, we usually mean the HotSpot virtual machine. Originally, Java compiled source code into bytecode and executed it entirely in the virtual machine's interpreter, which was relatively slow. HotSpot additionally compiles bytecode into native code, which improves overall execution efficiency.

HotSpot contains an interpreter and two JIT compilers (client and server) and runs in a mixed mode that combines interpretation and compilation.

Compiler: javac first compiles the source code into bytecode; HotSpot's client/server compilers then compile hot bytecode into native code

Interpreter: interprets and executes bytecode instruction by instruction

Client compiler: fast startup, small memory footprint, execution efficiency lower than the server compiler, dynamic compilation not enabled by default; suitable for desktop applications

Server compiler: slower startup, larger memory footprint, higher execution efficiency, dynamic compilation enabled by default; suitable for server-side applications

You can use the following commands to check the execution mode of the current virtual machine:

$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)


$ java -Xint -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, interpreted mode)


$ java -Xcomp -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, compiled mode)           

Java compilation principle

What are bytecode, machine code, and native code?

Bytecode: .class files. The javac tool compiles a .java file into a .class file; that .class file is the bytecode

Machine code: machine instructions, the language that the operating system (CPU) can execute directly

Native code: also machine instructions, i.e., code generated for the local platform that can be executed directly
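
A quick way to see the layers in practice (the class name HelloBytecode is illustrative): compile a class with javac to get bytecode, then inspect that bytecode with the javap disassembler:

// HelloBytecode.java -- illustrative example
public class HelloBytecode {
    public static void main(String[] args) {
        int sum = 1 + 2;
        System.out.println(sum);
    }
}

// Compile to bytecode (a .class file):
//   $ javac HelloBytecode.java
// Inspect the bytecode with the javap disassembler:
//   $ javap -c HelloBytecode
// The output lists bytecode instructions such as getstatic, invokevirtual
// and return, which the JVM interpreter or JIT later turns into machine code.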

Compilation process


C and C++ compile source code directly into machine language so that the machine can execute it. The disadvantage is that the result has to be adapted to each operating system, so "compile once, run everywhere" is impossible.

The Java language compiles source code into bytecode files; different JVM builds adapt the bytecode to each platform and ultimately generate machine instructions, achieving "compile once, run everywhere". The most important instruction-translation work is hidden inside the JVM, so programmers only need to think about their own code and do not have to worry about differences between operating systems.

JIT

Background to the generation of JIT

The JVM's interpreter translates bytecode into machine language line by line: it reads one instruction and translates it as it goes. This interpreted execution is much slower than running a native binary, and that is all the traditional JVM interpreter (Interpreter) does. To solve the slow-execution problem, JIT just-in-time compilation came into being.

JIT just-in-time compiler

When the JVM discovers hot code, the JIT (Just-In-Time) compiler compiles it into machine code for the local platform and applies optimizations at various levels, thereby improving the execution efficiency of the code.

  • Hot code: A method or block of code that is executed frequently
  • Purpose: To improve the efficiency of code execution

When the JVM executes code, the JIT does not compile everything immediately. If a piece of code runs only once, it is cheaper to let the interpreter execute it directly, because JIT compilation itself is time-consuming and only pays off for frequently executed code. The interpreter + compiler mixed mode therefore combines the advantages of both.
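
A small sketch for observing this behavior (the class name JitDemo and the iteration count are illustrative): a method invoked many times becomes hot and gets compiled, which can be watched with the standard HotSpot flag -XX:+PrintCompilation:

public class JitDemo {
    // A tiny method that will be invoked very frequently.
    static long square(long x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        // After enough invocations the JIT compiles square() to native code.
        for (int i = 0; i < 1_000_000; i++) {
            sum += square(i);
        }
        System.out.println(sum);
    }
}

// Run with:
//   $ java -XX:+PrintCompilation JitDemo
// Lines mentioning JitDemo::square in the output indicate that the method
// was handed to the JIT compiler after it became hot.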


Hot code detection mechanism

As mentioned earlier, the JIT compiles hot code into machine code to improve execution efficiency. So how is hot code identified?

  • Sample-based hot code detection: periodically check the top of each thread's stack; a method that keeps appearing at the top of the stack is considered hot
  • Counter-based hot code detection (used by HotSpot): maintain a counter for each method (or code block) that records how many times it executes; once the count exceeds a threshold, the code is treated as hot code

Counter-based hot code detection

Counters = method invocation counter + back-edge counter; when a counter reaches its threshold, compilation is requested from the JIT compiler.

  1. Method invocation counter: as the name suggests, it records how many times a method has been called. (PS: the invocation counter counts calls within a certain time window; if the threshold is still not reached when that window expires, the counter value is halved. This process is called "counter heat decay".)
  2. Back-edge counter: counts how many times loop bodies are executed. A bytecode instruction that makes control flow jump backwards is called a "back edge".
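
A minimal sketch of code driven by the back-edge counter rather than the invocation counter (the class name LoopHeat and the loop bound are illustrative): main() is called only once, but its long loop can still be compiled while it runs, via on-stack replacement (OSR):

public class LoopHeat {
    public static void main(String[] args) {
        long sum = 0;
        // main() is invoked only once, so its invocation counter stays at 1,
        // but every iteration of this loop increments the back-edge counter.
        for (int i = 0; i < 50_000_000; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}

// Run with:
//   $ java -XX:+PrintCompilation LoopHeat
// When the back-edge counter crosses its threshold, HotSpot compiles the
// running loop via on-stack replacement (OSR), shown with a "%" marker
// in the -XX:+PrintCompilation output.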

JVM runtime data area

Runtime data area: the memory area where data is stored while a Java program runs. It is divided into 5 regions: the method area, the heap, the virtual machine stack, the native method stack, and the program counter.


The method area stores data such as class information, the constant pool, static variables, and the native code produced by JIT compilation.

Java heap

  1. The heap is the largest chunk of memory in the JVM
  2. All object instances and arrays are allocated here

Virtual machine stacks

  1. The virtual machine stack is thread-private
  2. The life cycle of the virtual machine stack is the same as that of the thread. Each virtual machine stack is made up of units called stack frames, and one method invocation corresponds to one stack frame. A stack frame stores the local variable table, the operand stack, the dynamic link, and the method exit (return address), among other things. Parsing a stack frame:

Local variable table: stores the method's temporary values of the 8 primitive data types, object reference addresses, and returnAddress values. (A returnAddress is the address of the bytecode instruction to execute after the method returns.)

Operand stack: the working stack for computation. For example, for i = 6 * 6 in the code, the JVM reads the code, performs the calculation on the operand stack, and then stores the result into the local variable table.

Dynamic link: if the method calls another method, say service.add(), it needs a link to that method; this link is what the dynamic link stores.

Exit: the way the method leaves. A normal exit is a return; an abnormal exit is an exception being thrown.
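
To see the local variable table and operand stack at work, you can disassemble a trivial method (a sketch; the class name FrameDemo is illustrative, and the bytecode shown is the typical javap output for such a method):

public class FrameDemo {
    // A frame for this method holds locals a, b and result plus a small operand stack.
    static int add(int a, int b) {
        int result = a + b;
        return result;
    }
}

// Disassemble with:
//   $ javac FrameDemo.java && javap -c FrameDemo
// Typical bytecode for add(int, int):
//   iload_0   // push local variable slot 0 (a) onto the operand stack
//   iload_1   // push local variable slot 1 (b)
//   iadd      // pop two values, push their sum
//   istore_2  // store the sum into local variable slot 2 (result)
//   iload_2
//   ireturn   // exit the frame, returning the top of the operand stack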

Native method stack

  1. Serves methods modified with native
  2. Implemented underneath in C or C++, working with operating-system-level instructions
  3. The lifecycle of the native method stack is the same as that of the thread

Program counters

  1. The program counter is a small memory area that records the address of the bytecode instruction the current thread is executing
  2. Since the JVM implements multithreading by switching threads and allocating CPU time slices, each thread needs a program counter so that it can resume at the correct position after a switch. Every thread has its own independent program counter, and they do not affect each other.
  3. Program counters have the same lifespan as threads.

Garbage collection mechanism

When programming in C or C++, programmers often have to write code to release memory themselves; this is hard to get right and prone to memory leaks and memory overflows.

Therefore, the Java language is designed to encapsulate memory management in the JVM, so programmers do not need to care about how the memory is recycled, and only need to write business code in the upper-layer application. This memory management pattern is known as garbage collection (GC).

How do we determine whether an object is garbage?

As you are probably aware, the common algorithms for identifying garbage objects are reference counting and reachability analysis.

1. Reference counting

How it works: when an object is created, a counter is bound to it; the counter goes up when a reference to the object is added and down when a reference is removed. If the reference count of an object is 0, nothing references it any more and it is judged to be a garbage object.

Advantages: The reference counting method is simple to implement and highly efficient

Disadvantages: it cannot handle circular references. It is generally not used; at least the mainstream JVMs do not use it.

// Reference-counting example: a circular reference that reference counting cannot reclaim
class User {
    User user;
}

// Wrapper class added so the example compiles and runs as-is
public class ReferenceCountingDemo {
    public static void main(String[] args) {
        User a = new User();
        User b = new User();
        // Circular reference
        a.user = b;
        b.user = a;
        // Point a and b to null
        a = null;
        b = null;
        // The two objects in the heap still reference each other, so their
        // reference counts never drop to 0 and pure reference counting
        // could never reclaim them.
    }
}

2. Reachability analysis

How it works: start from the root nodes and search downward; the path traversed is called a reference chain. When an object has no reference chain connecting it to any root node, the object is unreachable and can be reclaimed. The root nodes are also called GC Roots.


So what exactly are GC Roots? Which kinds of objects can serve as GC Roots?

1. Objects referenced from the virtual machine stack (the local variable table in stack frames);
2. Objects referenced by class static properties in the method area;
3. Objects referenced by constants in the method area;
4. Objects referenced by JNI (i.e., native methods) in the native method stack.
...
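
A small sketch of where typical GC Roots live (the class and field names are purely illustrative):

public class GcRootsDemo {
    // A static field in the method area: its reference is a GC Root.
    static final Object CACHE = new Object();

    // A constant in the method area referencing an object.
    static final String GREETING = "hello";

    public static void main(String[] args) {
        // A local variable in the current stack frame: while main() runs,
        // the object referenced by localUser is reachable from a GC Root.
        Object localUser = new Object();

        // An object with no reference chain back to any root:
        new Object(); // eligible for collection right away

        System.out.println(CACHE + " " + GREETING + " " + localUser);
    }
}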
           

Garbage collection algorithms

When an object has been identified as garbage, cleanup can begin. The common garbage collection algorithms are:

  • Copying
  • Mark-Sweep
  • Mark-Compact
  • Generational collection

Mark-Sweep

The mark-sweep algorithm has two stages:

Mark stage: starting from GC Roots, mark every object on a reference chain; the unmarked objects are garbage objects

Sweep stage: reclaim the memory occupied by the unmarked garbage objects

Disadvantages:

  1. Neither the mark stage nor the sweep stage is particularly efficient
  2. Low space utilization: sweeping leaves a lot of memory fragmentation, and when a relatively large object has to be allocated, garbage collection may be triggered again
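
To make the two phases concrete, here is a toy sketch in which objects are modeled as graph nodes (the Node class and the explicit heap list are purely illustrative; a real collector works on raw memory, not Java collections):

import java.util.ArrayList;
import java.util.List;

public class MarkSweepSketch {
    static class Node {
        boolean marked;
        List<Node> references = new ArrayList<>();
    }

    // Mark phase: walk the reference chains starting from the roots.
    static void mark(Node node) {
        if (node == null || node.marked) return;
        node.marked = true;
        for (Node ref : node.references) {
            mark(ref);
        }
    }

    // Sweep phase: reclaim (here: drop) every object that was never marked.
    static void sweep(List<Node> heap) {
        heap.removeIf(n -> !n.marked);
        heap.forEach(n -> n.marked = false); // reset marks for the next cycle
    }

    public static void main(String[] args) {
        List<Node> heap = new ArrayList<>();
        Node root = new Node();
        Node live = new Node();
        Node garbage = new Node();
        heap.add(root); heap.add(live); heap.add(garbage);
        root.references.add(live); // reachable from the root
        // 'garbage' has no reference chain from the root

        mark(root);
        sweep(heap);
        System.out.println("objects left on the heap: " + heap.size()); // 2
    }
}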

Mark-Compact

Mark-compact is an improved version of mark-sweep that mainly solves the memory fragmentation problem.

Mark stage: starting from GC Roots, mark every object on a reference chain; the unmarked objects are garbage objects

Compact stage: move the marked (surviving) objects toward one end of memory

Sweep stage: reclaim the memory occupied by the unmarked garbage objects


Advantages: reduces memory fragmentation, so more large objects can be stored. Disadvantages: compared with the mark-sweep algorithm, mark-compact performs an extra memory-moving step, which lowers efficiency.

Copying algorithm

To improve efficiency, you either trade time for space or space for time.

The mark-sweep and mark-compact algorithms are relatively inefficient because they involve two passes, which is why the copying algorithm exists.

How it works: the copying algorithm divides memory into two regions at a 1:1 ratio, and only one region, the active region, is in use at any time. When garbage collection is triggered, the following stages run:

Copy stage: find the surviving objects via GC Roots, copy them into the other memory region, and mark that region as the new active region

Clean-up stage: empty the old active region (the most time-consuming part)
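
A toy sketch of the copy and clean-up stages using two semispaces (the Obj class and the list-based spaces are purely illustrative; survivorship is hard-coded to stand in for reachability analysis):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CopyingSketch {
    static class Obj {
        final String name;
        Obj(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        List<Obj> activeSpace = new ArrayList<>();   // the active 50%
        List<Obj> reserveSpace = new ArrayList<>();  // the idle 50%

        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj dead = new Obj("dead");
        activeSpace.add(a); activeSpace.add(b); activeSpace.add(dead);

        // Pretend reachability analysis found only a and b alive.
        List<Obj> survivors = Arrays.asList(a, b);

        // Copy stage: move survivors into the reserve space.
        for (Obj o : activeSpace) {
            if (survivors.contains(o)) {
                reserveSpace.add(o);
            }
        }
        // Clean-up stage: wipe the old active space, then swap the roles.
        activeSpace.clear();
        List<Obj> newActive = reserveSpace;
        System.out.println("live objects after copying GC: " + newActive.size()); // 2
    }
}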


Brief summary:

  1. Speed comparison: copying > mark-sweep > mark-compact
  2. Space utilization comparison: mark-sweep ≈ mark-compact > copying (copying keeps half of the memory idle)

Technology is always at the service of the business

When we do garbage collection and cannot afford too much memory fragmentation (fragmentation leads to frequent GC), we can only consider the mark-compact or copying algorithms.

Imagine you have a region of memory:

  1. If a large number of objects in that region die quickly, there are many garbage objects and few survivors. In this case the copying algorithm works best: only the small number of surviving objects needs to be copied.
  2. If the region is heavily "aged", there are few garbage objects and many survivors. Here the copying algorithm loses its advantage, because copying so many objects degrades overall performance. For memory regions where a large number of objects survive we therefore use the mark-compact algorithm, for two reasons:
     a. higher memory utilization;
     b. when a large number of objects survive, the copying and mark-compact algorithms perform similarly. Think about it: if 98% of the objects in memory are alive, the copying algorithm has to copy 98% of the data in a single pass, which would kill the JVM. Mark-compact only marks and compacts in the first pass, and the second pass only has to remove the few garbage objects, so overall it is more efficient than copying.

Hence the conclusion:

When the object survival rate is low, use the copying algorithm

When the object survival rate is high, use the mark-compact algorithm (or the mark-sweep algorithm)

After all this analysis, we have hardly mentioned the veteran mark-sweep algorithm. Is mark-sweep about to be abandoned?


In fact, each algorithm has its own business scenarios. When the survival rate of objects is high and they are all small objects, the extra compacting step looks like a waste of performance, and the mark-sweep algorithm is the better fit in that case. Such scenarios are not all that common, however, which creates the illusion that mark-compact is simply better than mark-sweep.


Generational collection algorithm

The generational collection algorithm is an abstract concept rather than a concrete implementation: it classifies memory and combines the other garbage collection algorithms to improve the overall performance of the virtual machine. As mentioned earlier, the choice of garbage collection algorithm is tied to object survival rates, so generational collection divides objects into two regions: the young generation (low object survival rate) and the old generation (high object survival rate).

Young generation

The young generation uses the copying algorithm, but memory is not split 1:1, because 50% utilization would be too low. To solve the utilization problem, after much measurement the community settled on the best split for the young generation: eden : survivor1 : survivor2 = 8 : 1 : 1 (for brevity, S stands for survivor below).

Glossary
eden: the Eden space. Think of the story of Adam and Eve, with the Garden of Eden as the origin of all life; accordingly, all objects are first created in this region
s1: survivor space 1
s2: survivor space 2
           

Workflow: only one of S1 and S2 is active at a time, and the other acts as a spare (why does "spare tire" come to mind? Don't ask; if you have to ask, you just don't know).

First, the JVM marks S1 as active at startup. When an object is created, the JVM allocates memory for it from the Eden space. When Eden no longer has enough space for a new allocation, a Minor GC is triggered.
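
A small way to observe this (a sketch; the class name MinorGcDemo, the heap sizes, and the allocation pattern are illustrative; -XX:+PrintGCDetails is the JDK 8-style GC logging flag):

public class MinorGcDemo {
    public static void main(String[] args) {
        // Keep allocating short-lived byte arrays; once Eden cannot satisfy
        // a new allocation, the JVM triggers a Minor GC.
        for (int i = 0; i < 10_000; i++) {
            byte[] chunk = new byte[256 * 1024]; // 256 KB, dies immediately
        }
    }
}

// Run with an explicit young generation and GC logging, e.g.:
//   $ java -Xms64m -Xmx64m -Xmn16m -XX:SurvivorRatio=8 -XX:+PrintGCDetails MinorGcDemo
// The log shows "[GC (Allocation Failure) ..." entries: Eden fills up,
// surviving objects are copied into the inactive survivor space,
// and Eden is emptied.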

Old generation

To avoid wasting space, the old generation uses only one memory region, so when collecting it you can choose either "mark-sweep" or "mark-compact". Which one is actually used depends on the garbage collector selected when the JVM runs.

Common garbage collectors

What is a garbage collector?

A garbage collector is a concrete implementation of the mark-sweep, copying, and mark-compact algorithms.


Young generation

Serial, ParNew, Parallel Scavenge

Old generation

CMS, Serial Old, Parallel Old

Young generation - Serial

A single-threaded garbage collector: garbage collection uses only one CPU, and all other threads are paused during GC. Features:

  1. Uses the copying algorithm
  2. Suited to single-CPU environments; with no context-switching overhead it runs efficiently
-XX:+UseSerialGC: selects the serial collector (young generation)

Young Generation - ParNew

A multi-threaded garbage collector: multiple GC threads run at the same time, and application threads are paused during GC. Features:

  1. Uses the copying algorithm
  2. Suitable for multi-CPU environments
  3. Can only work together with CMS as the old-generation collector
-XX:+UseParNewGC: selects the parallel ParNew collector (young generation)

Young generation - Parallel Scavenge

A multi-threaded collector whose program throughput can be tuned. Features:

  1. Uses the copying algorithm
  2. Suitable for multi-CPU environments
  3. Adjustable throughput
-XX:+UseParallelGC: selects the Parallel Scavenge collector (young generation), which aims for controllable throughput
Note: throughput = program running time / (program running time + GC time). If the program runs for 99 seconds and GC takes 1 second, throughput = 99 / (99 + 1) = 99%.

Parallel Scavenge uses the -XX:MaxGCPauseMillis and -XX:GCTimeRatio parameters to control throughput. -XX:MaxGCPauseMillis sets the maximum GC pause time in milliseconds and suits scenarios where responsiveness matters; it is usually better not to force this value and let the JVM adapt it automatically. Reducing the GC pause time comes at a cost: Minor GC becomes more frequent, which lowers the system's throughput. That is why MaxGCPauseMillis is difficult to tune well.
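
As a purely illustrative command line (app.jar and the numbers are placeholders, not recommendations), a Parallel Scavenge setup usually expresses goals and lets the adaptive size policy do the tuning:

$ java -XX:+UseParallelGC \
       -XX:MaxGCPauseMillis=200 \
       -XX:GCTimeRatio=99 \
       -XX:+UseAdaptiveSizePolicy \
       -jar app.jar

Here -XX:GCTimeRatio=99 asks for GC to take roughly 1/(1+99) = 1% of total time, i.e. about 99% throughput, and the adaptive size policy resizes the generations to chase these goals.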

Old generation - Serial Old

Single-threaded collector

Features:

  1. Uses the mark-compact algorithm
  2. Single-threaded collector
-XX:+UseSerialGC: selects the serial collector combination; Serial Old serves as its old-generation collector

Old generation - Parallel Old

Features:

  1. Uses the mark-compact algorithm
  2. Multi-threaded collector
-XX:+UseParallelOldGC: selects the parallel collector for the old generation

Old generation - CMS

Features:

  1. Uses the mark-sweep algorithm
  2. Concurrent collection with low pauses
  3. During most GC phases, application threads are not paused
-XX:+UseConcMarkSweepGC: selects the concurrent CMS collector (old generation)

G1

A generational collector

Features:

  1. Multi-threaded collector
  2. GC runs concurrently with application threads
  3. Collects both the young generation and the old generation
  4. Uses the copying and mark-compact algorithms
-XX:+UseG1GC: selects the G1 collector; it is the default collector since JDK 9

Memory tuning

VM options

Perhaps you have seen parameters such as -Xmx, -Xms, and -Xss and know they are used for JVM memory tuning, without knowing exactly what they mean or what they can adjust. Then this section is for you. First, let's look at the three kinds of VM options:

  • -: standard VM options, defined by the VM specification
  • -X: non-standard VM options, not guaranteed to be supported by every VM
  • -XX: advanced options for advanced features; these options are not guaranteed to be stable

Common JVM parameters

-X parameter

  • -Xmx (memory maximum): maximum heap size, equivalent to -XX:MaxHeapSize
  • -Xms (memory startup): initial heap size
  • -Xmn (memory new): initial size of the young generation in the heap; it can be refined with -XX:NewSize for the initial size and -XX:MaxNewSize for the maximum size
  • -Xss (stack size): thread stack size, equivalent to -XX:ThreadStackSize

-XX parameter

  • -XX:NewSize=n: sets the size of the young generation
  • -XX:NewRatio=n: sets the ratio of the young generation to the old generation. For example, n=2 means young : old = 1 : 2.
  • -XX:SurvivorRatio=n: sets the ratio of the Eden space to a Survivor space within the young generation. For example, n=4 means Eden : Survivor = 4 : 1*2 (there are two Survivor spaces)
  • -XX:-UseAdaptiveSizePolicy: disables the default dynamic sizing of the ratios; the options above only take effect when this is set
  • -XX:MaxPermSize=n: sets the permanent generation size
  • -XX:+PrintTenuringDistribution: prints tenuring age information
  • -XX:+HeapDumpOnOutOfMemoryError: writes a heap dump on OOM
  • -XX:HeapDumpPath=${directory}: the path where the dump file is generated; a file name can also be specified, e.g. -XX:HeapDumpPath=${directory}/java_heapdump.hprof. If no file name is specified, the default is java_<pid><date><time>_heapDump.hprof.
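
Putting several of the options above together, a purely illustrative startup command (the sizes, dump path, and app.jar are placeholders, not recommendations) might look like this:

$ java -Xms4g -Xmx4g \
       -Xmn1536m \
       -Xss512k \
       -XX:SurvivorRatio=8 \
       -XX:+HeapDumpOnOutOfMemoryError \
       -XX:HeapDumpPath=/data/dumps/java_heapdump.hprof \
       -jar app.jar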

Tuning summary

  1. In practice, we can simply set the initial heap size equal to the maximum heap size, which reduces the number of garbage collections while the program is running and thus improves efficiency.
  2. The larger the initial and maximum heap, the higher the throughput, but the values must be weighed against the actual memory of your machine (server).
  3. Prefer a parallel collector, because its throughput is higher than a serial collector's. Of course, the server needs multiple CPU cores for this to pay off.
  4. Set the ratio of the young generation to the old generation in the heap to 1:2 or 1:3. The default is 1:2.
  5. Reduce old-generation GC. Set a maximum tenuring age for objects in the young generation, and avoid allocating Java objects that need large amounts of contiguous memory, because they go straight into the old generation and will trigger GC when memory runs short.

Note: in the end the hardware matters most; if the hardware cannot keep up, no amount of software tuning helps. Note: old-generation GC is very slow, while young-generation GC is comparatively cheap. Note: the default JVM heap size is roughly a quarter of the machine's physical memory.

Class loaders

Class loading process

Load-link-initialize

  1. Loading: the JVM reads the class file into memory, converts its static data into the runtime data structures of the method area, and creates a corresponding java.lang.Class object in the heap.
  2. Linking: once the class has been loaded and its Class object created, the linking phase begins. Linking merges the class's binary data into the JVM's runtime environment and is further divided into the following three steps:
  3. 2-1. Verification: checks that the loaded class file conforms to the JVM specification, mainly the binary format of the class data.
  4. 2-2. Preparation: allocates memory for the class's static variables and assigns their default initial values.
  5. 2-3. Resolution: replaces the symbolic references in the class's binary data with direct references. To illustrate: a symbolic reference is a group of symbols that describes the referenced target; the symbols can be any form of literal, as long as they locate the target without ambiguity, and they have nothing to do with memory layout. A direct reference is a pointer, offset, or handle to the target; it depends on the memory layout, so the target must already be loaded.
  6. Initialization: assigns the correct initial values to the class's static variables.

PS: note that the preparation phase and the initialization phase do not conflict with each other. Let's look at an example.

class Demo{
    private static int a = 10;
}           

Phase 1 - Loading: the JVM loads the Demo.class file and generates the corresponding Class object

Phase 2.1 - Verification: check the safety of the Demo.class file, verify its data format, and so on

Phase 2.2 - Preparation: find the static variable a; since a is of type int, give it the default initial value 0 (the default for integers)

Phase 2.3 - Resolution: replace symbolic references with direct references (which can be understood as pointers)

Phase 3 - Initialization: find the static variable a, execute the assignment expression, and assign 10 to a

Types of class loaders

  1. Bootstrap class loader
  2. Extension class loader
  3. Application class loader
  4. User-defined (custom) class loader

1. Bootstrap ClassLoader: loads the core Java classes. It is written in native code (C or C++) and is responsible for loading all class files in jre/lib/rt.jar. It has no parent loader.

2. Extension ClassLoader: the sun.misc.Launcher$ExtClassLoader class implemented by Sun (later acquired by Oracle), written in Java. It is responsible for loading all jar packages under jre/lib/ext. Its parent class loader is null (the bootstrap loader is not represented by a Java object).

3. Application ClassLoader, also known as the System ClassLoader: responsible for loading the jar packages and class files on the -classpath (generally the user's own Java project). If no custom class loader is specified, the application class loader is used by default. Its parent class loader is the Extension ClassLoader.

4. Custom ClassLoader: a class loader written by the user, which must extend ClassLoader. Its parent class loader is the Application ClassLoader.
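
A minimal sketch of a custom class loader, assuming you only need to load .class files from a directory on disk (the class name DiskClassLoader, the directory handling, and the usage path are illustrative). It extends ClassLoader and overrides findClass(), so the parent delegation described in the next section still happens first:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DiskClassLoader extends ClassLoader {
    private final Path classDir;

    public DiskClassLoader(Path classDir) {
        this.classDir = classDir;
    }

    // Called only after the parent loaders have failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try {
            Path classFile = classDir.resolve(name.replace('.', '/') + ".class");
            byte[] bytes = Files.readAllBytes(classFile);
            // Turn the raw bytecode into a Class object.
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

// Usage sketch:
//   ClassLoader loader = new DiskClassLoader(Paths.get("/tmp/classes"));
//   Class<?> clazz = loader.loadClass("com.example.Foo");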

Parent delegation mechanism

Class loading follows the parent delegation model: when a class needs to be loaded, a class loader does not load it immediately. If the current loader has a parent, it delegates the request upward, level by level, up to the top-level Bootstrap ClassLoader. Only when the parent fails to load the class does the child loader try to load it itself. (You can also think of it as: whatever your parents cannot handle, you deal with yourself.)

The parent delegation mechanism exists for security: it guarantees that the types defined in the Java core API cannot be replaced. Suppose a class named java.lang.Integer arrives over the network. Through parent delegation the request is passed up to the bootstrap class loader, which finds that this class has already been loaded; it will not load the java.lang.Integer received from the network and simply returns the already loaded Integer.class.
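
A tiny runnable sketch of the hierarchy this delegation walks (the class name DelegationDemo is illustrative; the commented output assumes a JDK 8-style setup, where the parent of the application class loader is the extension class loader; on JDK 9+ it is the platform class loader):

public class DelegationDemo {
    public static void main(String[] args) {
        // The application class loader loaded this class...
        ClassLoader appLoader = DelegationDemo.class.getClassLoader();
        System.out.println(appLoader);                         // sun.misc.Launcher$AppClassLoader...
        // ...and delegates upward through its parent chain.
        System.out.println(appLoader.getParent());             // sun.misc.Launcher$ExtClassLoader...
        System.out.println(appLoader.getParent().getParent()); // null -> the bootstrap loader
        // Core classes such as String are loaded by the bootstrap loader,
        // which is reported as null because it has no Java object.
        System.out.println(String.class.getClassLoader());     // null
    }
}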

Author: Yan Yiqiang

Source-WeChat public account: Sanqi Mutual Entertainment Technical Team

Source: https://mp.weixin.qq.com/s/GFozIQqu5fKb1xCWqgKSgQ