English | 简体中文 | 繁體中文 | Русский язык | Français | Español | Português | Deutsch | 日本語 | 한국어 | Italiano | بالعربية
Detailed Explanation of Java GC Mechanism and Memory Allocation Strategy
Collection algorithms are the methodology of memory recycling, and garbage collectors are the concrete implementation of memory recycling
Automatic memory management solves the problem of: allocating memory to objects and reclaiming the memory allocated to objects
Why do we need to understand and learn about GC and memory allocation?
With the help of the JVM's automatic memory management mechanism, it is no longer necessary to write a paired delete for each new operation/Free code. However, when memory leaks and overflow problems occur, if you do not understand how the virtual machine uses memory, it will be a very difficult task to debug errors.
Before the GC (garbage collector) recycles the heap, it first determines which objects are 'alive' and which have already 'died'. Therefore, there is the object survival determination algorithm.
Object survival determination algorithm
Reference counting algorithm:
Algorithm idea: Add a reference counter to the object, and increase the counter value every time it is referenced by somewhere1When the reference becomes invalid, the counter value is reduced1At any moment, an object with a counter of 0 is impossible to be used again.
Advantages: Simple implementation, and the judgment efficiency is also very high
Disadvantages: It is difficult to solve the problem of circular references between objects.
Reachability analysis algorithm:
Algorithm idea: Starting from a series of 'GC Roots' objects as starting points, search downward from these nodes, and the path searched is called the reference chain. When an object has no reference chain connected to GC Roots, it proves that this object is unusable.
As shown in the figure: object5、object6、object7Although they are related to each other, they are unreachable to GC Roots, so they will be judged as recyclable objects.
Objects that can be GC Roots include the following:
1. Objects referenced by the virtual machine stack
2. Objects referenced by class static attributes in the method area
3. Objects referenced by constant references in the method area
4. Objects referenced by JNI in the local method stack
To live or to die?
Even objects that are unreachable in the reachability analysis algorithm are not 'unavoidably dead'. At this time, they are temporarily in a 'probation' stage. To truly declare an object dead, it must go through at least two marking processes:
If an object, after performing reachability analysis, finds that there is no reference chain connected to GC Roots, it will be marked for the first time and filtered, the filter condition is whether this object needs to execute the finalize() method. When the object does not override the finalize() method, or the finalize() method has been called by the virtual machine, the virtual machine considers both cases as 'no need to execute'.
If this object is determined to need to execute the finalize() method, then this object will be placed in a called F-In the Finalizer Queue, and later executed by a low-priority Finalizer thread automatically established by the virtual machine, where the term 'execution' refers to the virtual machine triggering this method, but does not promise to wait for it to complete. The reason for this is that if an object performs slowly in the finalize() method or encounters a deadlock, it may very likely cause the F-The other objects in the Finalizer Queue are permanently waiting, which may even cause the entire memory collection system to crash.
The finalize() method is the last chance for an object to escape its fate of death, later the GC will be executed on the F-The objects in the Queue are marked for the second time on a small scale, if an object wants to successfully save itself in the finalize() method, it only needs to establish an association with any object on the reference chain, for example, assigning itself (the this keyword) to a class variable or an object's member variable, then it will be removed from the 'about to be collected' set during the second marking. If the object has not escaped at this point, it is basically collected.
In addition, the finalize() method of any object will only be automatically called by the system once. All the work that can be done by finalize() can be done using try-Finally, or other methods can be better and more timely, so it is recommended that everyone can completely forget about the existence of this method in the Java language. See 'In-Depth Understanding of the Java Virtual Machine'.
Garbage collection algorithm
Marking-Sweeping algorithm:
The algorithm is divided into two stages: 'marking' and 'sweeping'.
First, mark all objects that need to be recycled, and then recycle all marked objects uniformly after marking is completed.
Shortcomings:
1.Efficiency issue, the efficiency of both marking and sweeping processes is not high.
2.Space issue: After marking and sweeping, a large number of discontinuous memory fragments will be produced. Too many space fragments may cause it to be unable to find enough continuous memory when a larger object needs to be allocated during program execution, and may have to trigger another garbage collection action prematurely.
Copying algorithm:
Algorithm implementation: Divide the available memory into two blocks of equal size, using only one block at a time. When this block of memory is used up, the surviving objects are copied to the other block, and then the used memory space is cleaned up once.
This makes it so that each time the entire half-space is used for memory recycling, memory allocation does not need to consider complex situations such as memory fragmentation. It is only necessary to move the heap top pointer and allocate memory in order. The implementation is simple and runs efficiently. However, the cost of this algorithm is that the memory is reduced to half of the original, which is too high.
Marking-Compaction algorithm:
Algorithm implementation: Mark all objects that need to be recycled, and let all surviving objects move to one end. Then, directly clean up the memory outside the end boundary.
Generational collection algorithm:
Algorithm implementation: The memory is divided into several blocks according to the different survival cycles of objects. Generally, the Java heap is divided into the young generation and the old generation, so that the most appropriate collection algorithm can be adopted according to the characteristics of each generation.
In the young generation, it is found that a large number of objects die each time garbage collection occurs, with only a small number surviving. In this case, the copying algorithm is chosen. The collection can be completed by paying only the cost of copying a small number of surviving objects.
And in the old generation, because the survival rate of objects is high and there is no additional space allocated to guarantee it, it is necessary to use 'marking'.-Cleaning or marking-The 'compaction' algorithm is used for recycling.
Young generation: It is divided into three spaces, one Eden space, and two survivor spaces.
The vast majority of the most recently created objects are allocated here. Since most objects become unreachable soon after creation, many objects are created in the young generation and then disappear. The process of objects disappearing from this area is what we call 'minor GC'.
Old generation: Objects that have not become unreachable and have survived from the young generation will be copied here. The space they occupy is more than that of the young generation. It is also due to its relatively large space that the garbage collection (GC) that occurs in the old generation is much less than that in the young generation. The process of objects disappearing from the old generation is what we call 'major GC' (or 'full GC').
The vast majority of objects created just recently will be placed in the Eden space. After the first GC is executed in the Eden space, the surviving objects will be moved to one of the survivor spaces. After that, after GC is executed in the Eden space again, the surviving objects will accumulate in the same survivor space. When a survivor space is saturated, the surviving objects will be moved to another survivor space. Then, the saturated survivor space will be cleared.
Objects that still survive after repeating the above steps will be moved to the old generation.
Garbage Collector
As shown in the figure, garbage collectors acting on different generations. If there is a connection between two collectors, they can be used together. The area of the virtual machine indicates whether it belongs to the young generation collector or the old generation collector.
Before learning about various garbage collectors, let's first understand 'Stop the World'. 'Stop the World' occurs in any GC algorithm. 'Stop the World' means that the JVM has stopped the execution of the application to perform garbage collection. When 'Stop the World' occurs, all threads except the GC threads are in a waiting state until the GC task is completed. GC optimization often refers to reducing the time 'Stop the World' occurs.
Understanding 'Stop the World' in this way is very vivid: when your mother is cleaning your room, she will definitely ask you to sit quietly on the chair or wait outside the room. If you throw paper屑 while she is cleaning, can the room be cleaned up?
Serial Collector: Single-threaded, young generation collector, using copying algorithms. It will only use one CPU or one collection thread to complete the garbage collection task. During garbage collection, it must 'Stop the World', pausing all other working threads until it is completed.
ParNew Collector: The multi-threaded version of the Serial collector, with identical control parameters, collection algorithms, Stop the World, object allocation rules, and garbage collection strategies as the Serial collector.
Parallel Scavenge Collector: A young generation collector that uses copying algorithms and parallel multi-threading.
Serial Old Collector: The old generation version of the Serial collector, single-threaded, using 'marking'.-Garbage Collection Algorithm.
Parallel Old Collector: The old generation version of the Parallel Scavenge collector, multi-threaded, using 'marking'.-Garbage Collection Algorithm
CMS Collector: A collector that aims to achieve the shortest pause time for garbage collection, based on 'marking'.-Cleaning algorithm. The operation process is divided into four steps: initial marking, concurrent marking, re-marking, and concurrent clearing.
Initial marking and re-marking still need to 'Stop the World'. Initial marking only marks the objects that can be directly associated with GC Roots, which is very fast. The concurrent marking stage is the process of GC Roots Tracing, and the re-marking stage is to correct the marking records of the part of the objects that have changed due to the continued operation of the user program during the concurrent marking period. The pause time of this stage is generally slightly longer than that of the initial marking stage, but much shorter than that of the concurrent marking period.
The entire process takes the longest time in the concurrent marking and concurrent clearing process, and the collector thread can work together with the user thread, so the memory recycling process of the CMS collector is executed concurrently with the user thread.
Advantages: concurrent collection, low pause
Disadvantages: very sensitive to CPU resources, unable to handle floating garbage, based on the marking-sweep algorithm, a large amount of control fragments are produced at the end of collection
G1Collector: G1The collector is one of the most advanced achievements in the development of collector technology today, a garbage collector oriented to server-side applications.
G1Characteristics: parallel and concurrent, generational collection, space integration, predictable pause
The operation process is as follows: initial marking, concurrent marking, final marking, and screening recycling.
The initial marking stage only marks the objects that can be directly associated with GC Roots, and modifies the value of TAMS, so that the user program can create new objects in the correct and available Region when it runs concurrently in the next stage. This stage requires stopping threads, but takes a very short time.
The concurrent marking stage starts from the GC Roots to perform reachability analysis on the objects in the heap, finding the surviving objects. This stage takes a long time, but can be executed concurrently with the user program.
The final marking stage is to correct the part of the marking records that have changed due to the continued operation of the user program during the concurrent marking period. The virtual machine records the object change during this period in the thread Remembered Set Logs. The final marking stage needs to merge the data of Remembered Set Logs into Remembered Set, which requires stopping threads, but can be executed in parallel.
Finally, in the screening recycling stage, the recycling value and cost of each Region are sorted first. The recycling plan is formulated according to the expected GC pause time by the user.
Memory allocation and recycling strategy
Objects are preferably allocated in the Eden space: Most of the time, objects are allocated in the young generation Eden space. When there is not enough space in the Eden space for allocation, the virtual machine will initiate a Minor GC.
Large objects directly enter the old generation: Large objects refer to Java objects that require a large amount of contiguous memory control, the most typical of which are very long strings and arrays.
Long-term surviving objects will enter the old generation: The virtual machine adopts the idea of generational collection to manage memory, so when memory recycling occurs, it must be able to identify which objects should be placed in the young generation and which should be placed in the old generation. In order to achieve this, the virtual machine defines an object age counter for each object. If an object is born in Eden and survives after the first Minor GC, and can be accommodated in the Survivor, it will be moved to the Survivor space, and the object age will be set to1, the age of an object in the Survivor area will increase every time it "survives" a Minor GC1years old, when its age increases to a certain degree (the default is15years old, it will be promoted to the old generation.
Dynamic object age determination: The virtual machine does not always require that the age of an object must reach MaxTenuringThreshold before it can be promoted to the old generation. If the total size of all objects of the same age in the Survivor space is greater than half of the Survivor space, objects with an age greater than or equal to that age can directly enter the old generation without waiting for the age required by MaxTenuringThreshold.
Space allocation guarantee: Before a Minor GC occurs, the virtual machine will first check whether the maximum available continuous space in the old generation is greater than the total space of all objects in the young generation. If this condition is met, then the Minor GC can ensure it is safe. If not, the virtual machine will check whether the HandlePromotionFailure setting value allows the guarantee to fail.
Summary
Memory recycling and garbage collectors are often one of the main factors affecting system performance and concurrency. The reason why the virtual machine provides a variety of different collectors and a large number of adjustable parameters is that only the optimal collection method can be selected according to the actual application requirements and implementation methods to achieve the highest performance. There is no fixed collector, parameter combination, nor an optimal tuning method, so the virtual machine does not have any inevitable memory recycling behavior.
The above is the summary notes I made while studying the book 'Deep Understanding of Java Virtual Machine'.
Reference book 'Deep Understanding of Java Virtual Machine'
Thank you for reading, I hope it can help everyone. Thank you for your support of our website!