Psychosomatic, Lobotomy, Saw: What do Atomic*::lazySet/Atomic*FieldUpdater::lazySet/Unsafe::putOrdered* actually mean?

Tuesday 20 December 2016

What do Atomic::lazySet/AtomicFieldUpdater::lazySet/Unsafe::putOrdered* actually mean?

Paved with well intended definitions it is.

lazySet/putOrdered (or an ordered store) was added as a bit of a rushed/non-commital afterthought after the JMM was done, so it's description is subject to many debates on the mailing lists, stack overflow and watercoolers the world over. This post merely tries to provide a clear definition with references to relevant/reputable sources.

Definition:

An ordered store is a weakened volatile store[2][5]. It prevents preceding stores and loads from being reordered with the store[1][3][6], but does not prevent subsequent stores and loads from being reordered with it[2][4].

If there was a JMM cookbook entry for ordered store defining it with barriers in mind it would seem that the consensus is that ordered stores are preceded by a StoreStore AND a LoadStore barrier[4][6].

Ordered store is practically the same as a C++ memory_release_store[5][7].

References:
[1] Original bug:

"lazySet provides a preceding store-store barrier (which is either a no-op or very cheap on current platforms), but no store-load barrier"
See here: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6275329

[2] java.util.concurrent docs:

"lazySet has the memory effects of writing (assigning) a volatile variable except that it permits reorderings with subsequent (but not previous) memory actions that do not themselves impose reordering constraints with ordinary non-volatile writes."
See here: https://docs.oracle.com/javase/8/docs/api/?java/util/concurrent/package-summary.html

[3] JMM cookbook: Defining barriers meaning here: http://g.oswego.edu/dl/jmm/cookbook.html

[4] concurrency-interest Q&A with Doug Lea, October 2011:

"[Ruslan]:... If it happens (== we see spin-wait loop finished) -- does it mean,that all writes preceding lazySet are also done, committed, and visible to thread 2, which finished spin-wait loop?

[Doug]: Yes, although technically, you cannot show this by reference to the Synchronization Order in the current JLS.
...
lazySet basically has the properties of a TSO store"
See here: http://cs.oswego.edu/pipermail/concurrency-interest/2011-October/008296.html

The discussion is REALLY worth reading, involving Hans, Doug, Vitaly, Ruslan and other such respectable members of this excellent mailing list. Go on, I'll wait.

The discussion on that thread concludes the following is required:
...
LoadStore + StoreStore
st [Y],X // store X into memory address Y
...

Outcome: Stores before and after are now prevented from floating across the barrier. Loads before the barrier are also prevented from floating down. Later loads are free to float up. Note that st may in theory be delayed indefinitely, certainly other loads and stores are allowed to float up between it and the barrier.

[5] concurrency-interest Q&A with Aleksey Shipilev, May 2016:

"putOrdered is a release in disguise, most of the C++11 std::atomic(...,

mem_order_release) reasoning applies here."

"acquire/release are the relaxations from the usual volatile

rules -- while producing happens-before-s, they drop from total

synchronization order, thus breaking sequential consistency."

And adds some fine examples:

"Safe publication still works:


                       int x; volatile int y;
-----------------------------------------------------------------------
    put(x, 1);                   |  r1 = get{Acquire|Volatile}(y);
    put{Release|Volatile}(y, 2); |  r2 = get(x);

(r1, r2) = (2, 0) is forbidden.

But anything trickier that requires sequential consistency fails. IRIW
fails, because no consistent write order observed by all threads. Dekker
fails, because release stores followed by loads may or may not be
visible in program order:

                     volatile int x; volatile int y;
-----------------------------------------------------------------------
    putRelease(x, 1);            |    putRelease(y, 1);
    r1 = getAcquire(y);          |    r2 = getAcquire(x);

(r1, r2) = (0, 0) is allowed. Forbidden if all ops are volatile.


Safe construction still does not work (even for volatiles!):

                                A global;
-----------------------------------------------------------------------
    A a = <alloc>;                  |  A a = global;
    put{Release|Volatile}(a.x, 1);  |  r1 = get{Acquire|Volatile}(a.x);
    global = a;                     |

(r1) = (0) is allowed."

See here: http://cs.oswego.edu/pipermail/concurrency-interest/2016-May/015104.html

[6] concurrency-interest Q&A with Aleksey Shipilev, March 2016:

"> int a, b;

> 
> boolean tryLock() {
>     UNSAFE.putOrdered(a, 1); // Declare intent.
> 
>     // No StoreLoad here as store is not volatile.
> 
>     if (UNSAFE.getVolatile(b) == 1)) {
>         // Reset intent and return false;
>     }
> 
>     return true;
> }

Even in the naive barrier interpretation that usually gives stronger
answers, you have:

 [LoadStore|StoreStore]
 a = 1;

 r1 = b;

 [LoadLoad|LoadStore]"

See here: http://cs.oswego.edu/pipermail/concurrency-interest/2016-March/015037.html

[7] C++ memory_order_release definition:

"A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable (see Release-Acquire ordering below) and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic (see Release-Consume ordering below)."

See here: http://en.cppreference.com/w/cpp/atomic/memory_order

Many thanks to A. Shipilev, M. Thompson, D. Lawrie and C. Ruslan  for reviewing, any remaining errors are their own and they shall be most severely reprimanded for them.

9 comments:

Ruslan Cheremin20 Dec 2016, 09:45:00
>Note that st may in theory be delayed indefinitely, certainly other loads and stores are allowed to float up between it and the barrier.

This also true for ordinary volatile stores: they can be delayed infinitely (in theory).

I was always found confusing why lazySet javadocs state it is "eventually visible" store -- while it seems no more "eventually" than any other plain/volatile store.
ReplyDelete
Replies
forked_franz21 Dec 2016, 11:05:00
Now all the comments on JCtools with something like this:

soElement(buffer, offset, e); // StoreStore

need to be updated to:

soElement(buffer, offset, e); // LoadStore + StoreStore

:P

With no jokes, that could mean that when you have something like:

if(getAndAdd(x,1)==0){
mutuallyExclusiveOperation();
putOrdered(x,0);
}

it is valid and effectively prevents the mutuallyExclusiveOperation instructions to float outside the "guarded" block?

ReplyDelete
Replies
eugene22 Dec 2016, 14:31:00
This has puzzled me for quite a while too, and while your article(s) is/are very good, I still have a few questions if I may.

Strictly speaking about x86 : the usual volatile store will emit an mfence, that basically says "flush the Load Buffer to L1" before this barrier. About the re-orderings x86 is already a strict memory model, so when reading a volatile variable, there's no need for a lfence.

lazySet on the other hand does not flush the Load Buffer, all it does is prevent re-orderings, as you call it a weaker volatile store (which I like a lot). So it does not trigger the flush, someone else has to, but it ensures the order of the writes (I assume this is what make it such a good tool in the SingleWriter principle).

My question is actually simple : do i make any sense? :) thank you

ReplyDelete
Replies
Kannan Makino12 Feb 2018, 11:54:00
Thank You for Great Informative article. Keep it Up.
ReplyDelete
Replies

Add comment

Note: only a member of this blog may post a comment.