Description
TL;DR:
- Add owned-locks array to
java.lang.Thread
- Implement new locking procedure in
java.lang.Thread
(phase 1, fast lock/heap lock, no nursery) - Implement new AQS-based
Monitor
class injava.lang
without condition - Implement new AQS-based
Condition
class injava.lang
- Implement nursery GC support (TBD, see below for rough ideas)
- Revise new locking procedure (phase 2, nursery support and GC-based monitor location)
- Drop monitor field from
java.lang.Object
- Drop monitor class from qbicc runtime API
Long version:
Right now monitors are implemented using a single lazily-allocated, ReentrantLock
-based monitor class. This approach is simple and functional but has several drawbacks:
- The
monitor
field is an oop-sized field that always exists on every object, even if that object's monitor is never used - When a lock is taken, even if it is uncontended, the full monitor object must be allocated
- The lock includes a condition object, even if wait/notify is never used
- The monitor uses a minimum of three objects: the monitor itself, a
ReentrantLock
instance, and aCondition
instance
Ideally, we would want an implementation with the following characteristics:
- Objects which are never locked should not have measurable any overhead at all (no heap objects, no extra fields)
- Objects which are locked but can be statically shown to be uncontendable should have their corresponding monitor operations deleted or replaced with functional equivalent lockless operations at compile time (specifically,
wait
could be converted into an equivalent sleep operation) - Objects which are locked without contention should have minimal overhead (e.g. a one-bit lock)
- Objects which are contended for should only allocate the necessary objects for managing waiters on the lock, and only once contention actually happens (i.e. no lock object until a second locker arrives, and no condition object until/unless it is required)
- Avoid the overhead of a field or fields on an object pointing to its monitor and/or condition even when one exists
- Avoid the overhead of additional indirection to lock and/or condition sub-objects
- If a nursery is used, have a low-overhead mechanism to deregister nursery objects when their corresponding heap objects are GC'd or the nursery object is moved out of the nursery
- Initial heap objects which are locked but can somehow be statically shown to be contended could be preinitialized with a monitor (and optionally a condition)
Many of these characteristics are not possible to achieve under our current architecture, but we can take some steps in this direction.
I propose a design where we do the following:
- Drop the current monitor implementation and field
- Introduce a lock nursery, essentially a
ConcurrentReferenceHashMap
from reference-to-Object to monitor object instance - Introduce a new
Reference
subtype,NurseryReference
, which is weaker thanWeakReference
orFinalReference
but stronger thanPhantomReference
(i.e. can be read by GC code) and which is also queued when an object no longer needs to be tracked in the nursery, for use as the key reference type in the nursery map (note that this is contingent on implementingReference
support in GC, which is not yet done at the time of this writing) - Organize fully-inflated locks such that the lock object always appears directly after the object that owns it in the parseable heap
- Add two bits to the object header, reflecting the following states:
00
- object is not locked and has no monitor01
- object is "fast-locked", no waiters are contending for the lock10
- object has an inflated monitor in the nursery, lock state is there11
- object has an inflated monitor on the heap consecutively after this object itself
- Introduce an embedded array field of small size (5 to 10 slots) on
Thread
to act as a set or list of currently-owned fast-locked monitors, to facilitateThread.holdsLock
for objects which are fast-locked
The fast lock algorithm is contingent on identifying matching monitor enter-exit pairs, and would be something like this:
- On enter,
- If the thread's owned-lock array is full, skip fast locking and go directly to inflation
- If the lock state is
00
(unlocked)- CAS it to
01
- Record the object in the embedded owned-lock array
- Return
true
(monitor needs to be released i.e. this is the outermost owner)
- CAS it to
- Else if the lock state is
01
(fast-locked)- If the object is present in the embedded owned-lock array, return
false
(monitor does not need to be released)
- If the object is present in the embedded owned-lock array, return
- Else go to inflated lock procedure
- On exit,
- If the fast-lock attempt returned
true
- Remove the object from the embedded owned-lock array
- If the lock state is
01
(still fast-locked)- CAS it to
00
- CAS it to
- Else if the lock state is
10
or11
(lock was inflated)- Go to inflated unlock procedure
- If the fast-lock attempt returned
The lock inflation procedure would be something like this:
- Construct a new lock instance, and add it to the nursery
- The owner of the lock is
null
unless the corresponding object appears in the calling thread's owned-lock array, in which case we can set the owner
- The owner of the lock is
- CAS the lock state to
10
- If the CAS fails, clean up the mess appropriately
- If we set the owner of the lock to our own thread, remove the object from the owned-lock array
The inflated unlock procedure would be like this:
- If the owner of the lock is
null
, it was inflated while we held the lock (verify assertion by checking the owned-lock array)- After unlock, remove the object from the owned-lock array
- Perform the normal
unlock()
call on the monitor object
The inflated monitor itself would be a subclass of AbstractQueuedSynchronizer
with only simple lock/unlock and ownership operations.
This process does not take into account condition objects. In order to facilitate conditions, we would extend this process to account for a second object: the Condition
object.
A Condition
object would normally not exist for an object, until/unless it performs a condition operation. Once a valid condition operation (i.e. one which happens while the corresponding lock is held) takes place, we would then need to inflate a condition object to mediate the operation. In order to track the condition's inflation without needing extra bits on the source object, we can use the same two lock bits on the inflated lock's instance (which itself may not have a monitor, leaving these bits free to use). The bits would be defined as follows:
00
- No condition exists10
- The condition is inflated in the nursery11
- The condition is inflated on-heap directly after the lock object
The following would be done:
- The lock would be inflated if it is not already inflated
- The condition would be inflated using an equivalent process
GC
The procedure to move objects out of the nursery could work like this:
- When marking an object, if it has an on-heap monitor, also mark the monitor. If there is an on-heap condition, also mark the condition.
- When copying an object to the destination space, if the nursery monitor is also marked, and there is sufficient room in the destination space, then preemptively move the monitor object to the destination space directly after its host object. Queue and clear the nursery reference for reference queue processing.
- Follow a similar process for conditions.
Note that this procedure relies on being able to read (but not write) the nursery map during safepoint/GC, which is itself a tricky detail.
Deflation
With a system like this, it is theoretically possible to track the object's use of monitors somehow (perhaps by keeping a counter on the monitor instance itself which is incremented on GC and cleared on usage) such that if a monitor is unused (perhaps for a certain number of GC cycles, or for some time duration), it can be discarded and the lock state of the host object can be set back to 00
. Likewise if a condition object is unused for a certain length of time, it too could be discarded in a similar way.