(post from the series “I have cloned my hotspot source code, let's look at them together”)Everyone who faces multithreaded problems (be it performance or incomprehensible heisenbags), inevitably encounters in the process of solving problems like inflation, contention, membar, biased locking, thread parking and the like. But does everyone really know what is behind these terms? Unfortunately, as practice shows,
not all .
Hoping to correct the situation, I decided to write a series of articles on this topic. Each of them will be built on the principle of
"first briefly describe what should happen in theory, and then go to the source and see how it happens there .
" Thus, the first part is largely applicable not only to Java, and therefore developers for other platforms may find something useful for themselves.
Before reading in-depth descriptions, it is useful to make sure that you are sufficiently familiar with the Java Memory Model. You can study it, for example, on the
slides of Sergei
Walrus Kuksenko or on my
early topic . This
presentation is also a great material, starting with slide # 38.
Theoretical minimum
As you know, every object in java has its own
monitor , and therefore, unlike the same C ++, there is no need to guard access to the objects by separate mutexes. To achieve the effects of mutual exclusion and synchronization of threads use the following operations:
- monitorenter : capture monitor. Only one stream can own a monitor at a time. If the monitor is busy at the time of the capture attempt, the stream attempting to capture it will wait until it is free. At the same time, there may be several threads in the queue.
- monitorexit : monitor release
- wait : move the current thread to the so-called monitor wait set and wait for
notify
occur. Exiting the wait
method may also be false. After the thread owning the monitor has made a wait
, any other stream can take over the monitor. - notify (all) : one (or all) threads are awakened, which are now in the monitor's wait set. To get control, the awakened stream must successfully capture the monitor (monitorenter)
')
Before continuing, we define an important concept:
contention - a situation when several entities simultaneously try to own the same resource that is intended for exclusive use
How contention to owning a monitor really depends on how it is captured. The monitor may be in the following states:
- init : the monitor has just been created, and so far no one has been captured
- biased : (smart optimization, which did not appear immediately) The monitor is “reserved” for the first stream that captured it. In the future, to capture this stream does not need expensive operations, and the capture occurs very quickly. When a capture tries to produce another stream, either the monitor is re-backed up for it ( rebias ), or the monitor changes to the thin ( revoke bias ) state. There are also additional optimizations that act immediately on all instances of the class of an object whose monitor is trying to capture ( bulk revoke / rebias )
- thin : the monitor is trying to capture multiple streams, but there is no contention (i.e., they capture it not simultaneously, or with very little overlap) . In this case, the capture is performed using a relatively cheap CAS . If contention occurs, the monitor changes to the inflated state.
- fat / inflated : synchronization is performed at the operating system level. The stream parks and sleeps until it is its turn to capture the monitor. Even if we forget about the cost of changing the context, then when the flow receives control, it also depends on the system scheduler, and therefore the time can pass significantly more than we would like. When contention disappears, the monitor may return to the thin state.
This is where abstract reasoning ends, and we dive into how it is implemented in hotspot.
Object Headers
Inside a virtual machine, object headers generally contain two fields: the
mark word and a pointer to the class of the object. In particular cases, there may be something added: for example, the length of the array. These headers are stored in the so-called
oop -
ohs (Ordinary Object Pointer), and you can look at their structure in the file
hotspot/src/share/vm/oops/oop.hpp
. We will study in more detail what the mark word is, which is described in the
markOop.hpp
file located in the same folder.
(Do not pay attention to the inheritance from oopDesc
: it is only for historical reasons) In an amicable way, it should be carefully read, paying attention to detailed comments, but for those who are not very interested, below is a brief description of what is in this mark Word contains and in what cases. You can still see
this presentation here from the 90th slide.
Content mark words
condition | Tag | Content |
unlocked thin | 01 | Identity hashcode | age | 0 |
locked, thin, unbiased | 00 | mark word |
inflated | ten | |
biased | 01 | id - | epoch | age | 1 |
marked for GC | eleven | |
Here we see several new values. First,
identity hash code is the object hash code that is returned when
System.identityHashCode
called. Secondly,
age - how many garbage assemblies survived the object. And there is also
epoch , which indicates the number of bulk revocation or bulk rebias for the class of this object. To why this is necessary, we will come later.
Have you noticed that in the case of biased there was not enough space at the same time for identity hash code and threadID + epoch? And this is true, and from here there is an interesting consequence: in hotspot, a call to System.identityHashCode
will result in a revoke bias object.
Further, when the monitor is busy,
a mark is stored in the mark word to the place where the real mark word is stored . In the stack of each thread there are several "sections" in which different things are stored. We are interested in the one where the lock record is stored. This is where we copy the mark word of the object for lightweight locking. Because, by the way, thin-locked objects are called
stack locked . A swollen monitor can be stored both on the thread that inflated it and in the global pool of thick monitors.
It's time to go to the code.
A simple example of using synchronized
Let's start with this class:
1 2 3 4 5 6 7 | public class SynchronizedSample { void doSomething() { synchronized(this) { |
and see what it compiles into:
javac SynchronizedSample.java && javap -c SynchronizedSample
I will not give a full listing, but I will only manage with the body of the
doSomething
method, providing it with comments.
void doSomething(); Code: 0: aload_0
Here we are interested in the instructions
monitorenter
and
monitorexit
. You can, of course, search for what they do in the
Yandex search system at your discretion, but this is fraught with misinformation, and not in a patzanski way. In addition, I have the OpenJDK sources at hand,
which you can have fun with at all . In these source codes it is quite easy to see what happens to the byte code in the interpretation mode. There is only one caveat: Lesha
TheShade Shipilev
said that
In general, the VM helper code for some action may differ in content from that pasted by JIT. Up to the point that some optimizations by JIT can simply not be ported to the interpreter.
Lesha also recommended PrintAssembly in the teeth and look at the compiled and JIT-compiled code right away, but I decided to start from a path of less resistance, and then see how it
really is.monitorenter
The interpreter sources are in the
hotspot/src/share/vm/interpreter
daddy, and there are a lot of them. Rereading everything at this stage is not very appropriate, therefore with the help of
grep
we will find places in which, probably, what we need happens. First of all, it’s worth looking at the ads in
bytecodes.hpp
and
bytecodes.cpp
:
./bytecodes.hpp:235: _monitorenter = 194, // 0xc2 ./bytecodes.cpp:489: def(_monitorenter, "monitorenter", "b", NULL, T_VOID, -1, true);
As you can easily guess, the human enum constant for bytecode
.hpp
is defined in
.hpp
, and this operation is registered in the
.cpp
using the def method. Talking about it separately in this article makes no sense: it is enough to clarify that the monitorenter command is
monitorenter
, which is a single byte-code without parameters (
b
), returns nothing, pulls one value from the stack and can provoke a lock or a safepoint call ( about the latter later).
The following is of interest file
bytecodeInterpreter.cpp
. It has a wonderful method
BytecodeInterpreter::run(interpreterState istate)
, which takes only about 2,200 lines, and generally rotates in a loop until the body of the method being processed ends. (In fact, another large piece deals with other useful things such as method initialization, locking, if the method is
synchronized
, and so on). Finally, starting at line
1667
, what happens when a
monitorenter
operation is
monitorenter
. First of all, there is a free monitor on the stream stack (if there are none, then it is requested from the interpreter using
istate->set_msg(more_monitors)
), and an unlocked copy of the mark word is placed there. After that, using CAS, we try to write a pointer to this copy in the mark word of the object, which is called the
displaced header .
CAS - Compare-and-swap - atomically compares *dest
and compare_value
, and if they are equal, *dest
and exchange_value
places. The initial value *dest
returned. (This is guaranteed bilateral membar, but about them in the next article)
If CAS is a success, then the victory (and with it the monitor) is ours, and this can be completed (the tag is contained in the pointer to the displaced header itself - optimization). If not, then go ahead, but first pay attention to an important point:
we didn’t check if this monitor wasn’t biased . Remembering the warnings of Leshina, we understand that we encountered optimization, which did not reach the interpreter. By the way, when processing
synchronized
methods everything is checked normally, but it will be a little later.
If CAS did not succeed, then we check whether we are already owners of the monitor (recursive capture); and if so, then success is with us again, the only thing we are doing is writing to the displaced header on our
NULL
stack (we will see later why this is necessary). Otherwise, we make the following call:
CALL_VM(InterpreterRuntime::monitorenter(THREAD, entry), handle_exception)
The
CALL_VM
macro
CALL_VM
all sorts of technical operations such as creating frames and performs the transferred function. In our case, this function is
InterpreterRuntime::monitorenter
, which is located in the new
interpreterRuntime.cpp
file. The
monitorenter
method, depending on whether
UseBiasedLocking
is
UseBiasedLocking
, calls either
ObjectSynchronizer::fast_enter
or
ObjectSynchronizer::slow_enter
. Let's start with the first.
fast_enter
First a couple of words about the feasibility of such optimization. Some Tokyo scientists from IBM Research Labs calculated statistics by some unknown method and found that in most cases synchronization is uncontended. Moreover, an even stronger statement was proposed: the monitors of most objects throughout their life are captured by only one stream. And so the idea of biased locking was born: the one who stood up first, the one and the slippers. You can read more about biased locking, for example, on
these slides (although they are slightly out of date), and we will return to the hotspot sources. Now we are interested in the
src/share/vm/runtime/synchronizer.cpp
file, starting at line 169. First we should try to do a rebias for ourselves, and if it does not work out - do a revoke and go to the usual thin slow_enter. Optimistic attempts occur in the
BiasedLocking::revoke_and_rebias
, which is in the
biasedLocking.cpp
file. We describe them in more detail:
- Rebias is a flow change that reserves the right to quickly capture a monitor. It is permissible in simple cases: for example, when the monitor is actually still someone did not have time to capture ( anonymously biased ), or if bias has been preserved since the previous era (the number of bulk revocation for a particular class, see below), which is essentially similar the fact that the monitor is not over-bias-il. In addition, rebiasing can be banned completely by calling
fast_enter
( attempt_rebias = false
). - Revoke - disable biased locking for the monitor. Used in most cases, especially when
attempt_rebias
set to false
. If the monitor is now owned by another live stream, then rebias will require this stream to stop at safepoint , after which it will run over the stack of this stream and fix the monitor header stored there on unbiased. - Bulk rebias - rebias of all living and later allocated instances of this class. It is executed when the number of revocations or rebiasing operations for a given class during this epoch exceeds the
BiasedLockingBulkRebiasThreshold
. Provokes a change of epoch, runs at a global safepoint. - Bulk revoke - revoke all live and later instances of this class. It is executed when the number of revocations or rebiasing for this class during this era exceeds the
BiasedLockingBulkRevokeThreshold
. Provokes a change of epoch, runs at a global safepoint.
Let me remind those who
knew, but forgot tm , what is a safepoint:
safepoint is a state of the virtual machine in which the execution of threads is stopped in safe places. This allows for intrusive operations, such as revoke bias at the monitor, which the stream currently owns, deoptimizing or taking the thread dump.
In our case, the
attempt_rebias
parameter is always
true
, but sometimes it may turn out to be
false
: for example, in the case when the call comes from VM Thread.
As you can guess, bulk operations are tricky optimizations that make it easy to transfer a large number of objects between threads. If it were not for this optimization, it would be dangerous to include
UseBiasedLocking
by default, since then a large class of applications would always be engaged in revocations and rebiasing.
If the fast path to capture the stream failed (that is, revoke bias was made), we proceed to capture the thin-lock.
slow_enter
The method that interests us now is in the
src/share/vm/runtime/synchronizer.cpp
file. Here we have several options for the development of events.
- Case one, “good” : the object monitor is currently free. Then we try to take it with the help of CAS. CAS is already architecture-dependent (and, as a rule, natively supported by the processor) instruction, so there is no special meaning to go deeper. (Although Ruslan # let me know the cheremin and was outraged that I "did not get to the running of electrons in the sources-drains of MOS transistors") . If CAS succeeds, then victory: the monitor is captured. If it fails, we are filled with sadness and proceed to the third case.
- The second case is also “good” : the object monitor is not free, but is occupied by the same stream that is now trying to capture it. This is a recursive capture. In this case, for simplicity, we write to the displaced header in our NULL stack, because we previously captured this monitor and already have information about it in our stack.
- The third case, the “bad” inflate : So, all attempts to be cunning with the collapse failed, and we have nothing to do but to sulk, to show the vile user the language and fall off the segfault. Haha Joke. However, I didn’t just say the same about inflating: in the case when you have to act “honestly”, resorting to OS-level primitives, they say that monitor was inflated . In the code, this behavior is described in the
ObjectSynchronizer::inflate
, where we will not look particularly carefully: in fact, the method is thread safe and taking into account some technical subtleties sets the monitor a flag that it is bloated.
After inflating the monitor it is necessary to enter it. The ObjectMonitor::enter
method does just that, it uses every conceivable and inconceivable trick to avoid parking the stream. These tricks include, as you might have guessed, attempts to capture using a spin loop, using one-off CAS and other “free methods”. By the way, it seems I have found a slight discrepancy between comments and what is happening. once we try to enter the spin loop monitor, claiming that we do it only once:
352 353 354 355 356 357 358 359 360 361 362 363 | // Try one round of spinning *before* enqueueing Self // and before going through the awkward and expensive state // transitions. The following spin is strictly optional ... // Note that if we acquire the monitor from an initial spin // we forgo posting JVMTI events and firing DTRACE probes. if (Knob_SpinEarly && TrySpin (Self) > 0) { assert (_owner == Self , "invariant") ; assert (_recursions == 0 , "invariant") ; assert (((oop)(object()))->mark() == markOopDesc::encode(this), "invariant") ; Self->_Stalled = 0 ; return ; } |
But a little further, in the called enterI
method, enterI
do it again, again speaking of only once:
485 486 487 488 489 490 491 492 493 494 495 496 497 | |
Hmm, parking at the operating system level is so scary that we are ready for almost everything to avoid it. Let's see what is so terrible in her.
Parking flows in reverse
I must note that we have now come to the code that was written a long time ago, and this is noticeable. There are a lot of duplication, reengineering and other amenities. However, the presence of comments like “remove this crutch” and “combine these with that” is slightly reassuring.
So what is parking streams? Everyone probably heard that each monitor has a so-called Entry List (not to be confused with the Waitset ) So: he really does have it, although it is actually a queue. After all the failed attempts to enter the monitor cheaply, we add ourselves to this queue, after which we park:
591 592 593 594 595 596 597 598 599 600 601 | // park self if (_Responsible == Self || (SyncFlags & 1)) { TEVENT (Inflated enter - park TIMED) ; Self->_ParkEvent->park ((jlong) RecheckInterval) ; // Increase the RecheckInterval, but clamp the value. RecheckInterval *= 8 ; if (RecheckInterval > 1000) RecheckInterval = 1000 ; } else { TEVENT (Inflated enter - park UNTIMED) ; Self->_ParkEvent->park() ; } |
Before going directly to the parking lot, note that it may be timed or not timed, depending on whether the current thread is responsible. Responsible flows are always no more than one, and they are needed in order to avoid the so-called stranding: a sadness when the monitor is free, but all the threads in the wait set are still parked and waiting for a miracle. When there is a person in charge, he automatically wakes up from time to time (the more time a futile wakeup occurs , an awakening, after which the lock cannot be captured - the longer the parking time. Note that it does not exceed 1000 ms) and tries to enter the monitor. The rest of the streams can wait for awakening for even ages.
Now it is time to go to the very essence of parking. As you already understood, this is something semantically similar to wait/notify
familiar to every java-developer, but it happens at the operating system level. For example, in linux and bsd, as one would expect, POSIX threads are used, in which pthread_cond_timedwait
(or pthread_cond_wait
) are called to wait for the release of the monitor. These methods change the status of the Linux flow to WAITING and ask the system sheduler to wake them up when some event occurs (but not later than after a certain period of time, if the flow is responsible).
Thread scheduling at OS level
Well, it's time to get into the linux kernel and see how the scheduler works there. The linux sources are known to be in git, and you can clone the scheduler like this:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-rt.git
Let's go to the folder ... Okay, okay, just kidding. The features of a sheduler in Linux up to the lines in the source code are already too deep for a harmless article, which, let me remind, began with a simple synchronized
block :) In order to reduce hardcore, I will tell you how sheaders work in general; both in linux and in alternative systems. - , , : — , . — quantum — , , , . linux , 10-200 , 1 . windows , 2-15 , — 10 15 .
, , , . , , (, - I/O-), , - . - . , , , -, , .
, , , — . , , , , , , .
, , : , — , . : , , -, .
, , , . , contention, .
monitorexit
biased locking , -, . , displaced header NULL, . : biased lock
IllegalMonitorStateException
( -).
unbiased locking
InterpreterRuntime::monitorexit
. (, :
IllegalMonitorStateException
)
ObjectSynchronizer::slow_exit
, ,
fast_exit
.
, fast path . :
stack-locked ,
inflating inflated . : , , . , - .
, , - , . , , ,
TrySpin
(. ). , . , , , .
, , (
Knob_QMode
. , , ,
0
, .
, , , , , ), , . , , , . -
os::PlatformEvent::unpark()
, . , linux bsd
pthread_cond_signal
.
, , , .
NB: synchronized
-
java- :
synchronized void doSomething() {
- :
synchronized void doSomething(); Code: 0: return
bytecodeInterpreter.cpp
synchronized
- ,
767
.
if (METHOD->is_synchronized())
.
if
-, biased locking. ,
monitorenter
. , , , ( CAS) biased -.
, , synchronized.
Wait notify
synchronizer.cpp
, 377 . wait/notify inflated, , — inflate' .
wait
notify
.
wait set ( ) , ( , ; - notify).
Notify
wait set , , - , .
NotifyAll
, wait set .
Memory effects
, JMM, ,
happens-before . thin CAS-; inflated
OrderAccess::fence();
. biased, , : program order . revoke HB monitorexit, , ( thin, inflated), enter ( thin, inflated).
wait fence, HB.
. aka Disclaimer
, , . , JIT . . , , « hotspot» , .
Stay tuned
, ,
memory barriers ,
happens-before JMM.
volatile
, . final- ,
TheShade cheremin , (
). , ,
PrintAssembly , , JIT.
And one more thing ©
:
144f8a1a43cb jdk7u. , — ..
Biased locking ,
BiasedLockingStartupDelay
(4000 ). , , safepoints, revoke bias .
safepoint
ObjectSynchronizer::deflate_idle_monitors
, , .
TheShade ,
artyushov ,
cheremin AlexeyTokar , (|) , , - , .