Illuminating Unnoticed Efficiency Pitfalls with AtomicLongMap

Illuminating Unnoticed Efficiency Pitfalls with AtomicLongMap

Benchmark testing for AtomicLongMap

Why am I doing this benchmark test?

It’s important to verify if a particular change for a package, in this case Google Guava performs in an optimized manner within a multi-threaded environment and doesn’t trigger a thread contention state. Well, I came across one such change in the implementation for AtomicLongMap which suffers from thread contention with the new releases.

Which method within AtomicLongMap is suffering from thread contention?

The underlying method – addAndGet(K key, int delta)

When was a new implementation introduced for addAndGet method in Google Guava?

Google Guava – v21.0

Why was this change introduced?

According to the release notes:

AtomicLongMap: added a number of methods such as accumulateAndGet(K, LongBinaryOperator) that take advantage of new Java functional types.

How did I notice it?

I bumped my java package to use GoogleGuava – v32.0.0 from GoogleGuava – v20.0 and my package blew up on load-testing. It took me a bit of digging through profiling and monitoring and was able to pinpoint that the trigger for the issue is none other than AtomicLongMap! This sparked the curiousity into understanding why did a new release start suffering from thread contention and that’s the reason for the benchmark testing.

Enough story time, let’s dig into the implementation difference


via Giphy

Google Guava 20.0 Implementation Flow [Psuedo]

Link to code: AtomicLongMap.java

addAndGet(K key, int delta) > atomic = map.putIfAbsent(key, delta)
> if atomic.get() == 0 :
map.replace(atomic.get(), atomic, delta)
return delta
> else:
val = atomic.get()
atomic.compareAndSet(val, val + delta)
return val + delta

Google Guava 32.0.0 Implementation Flow [Psuedo]

Note: Upon back-tracking, the below implementation flow was introduced in Google Guava v21.0 so this degradation is existing since v21.0 : https://github.com/google/guava/wiki/Release21 however the benchmark testing is carried out using Google Guava v32.0.0

Link to code: AtomicLongMap.java

addAndGet(K key, int delta) > accumulateAndGet(key, delta, Long::sum as aFunc) >
updateAndGet(key, o -> aFunc.applyAsLong(o, delta) as uFunc) >
map.compute(key, (k, x) -> uFunc.applyAsLong(x == null ? 0 : x.longValue()))

What’s the possible difference in their behavior within a multi-threaded environment?

It is use of the specific underlying method of ConcurrentMap that is being utilised and the impact in their performance is majorly driven by how well it behaves in highly contended environment.

In case of Google Guava 20.0, putIfAbsent is utilised and when using putIfAbsent, the operation succeeds only if the key is not already present in the map. If the key is present, the operation does not perform any update and returns the existing value associated with the key. This behavior helps reduce contention because it avoids the need for complex coordination between threads when updating values. This is followed by using compareAndSet which is another atomic operation which may offer better performance in a multi-threaded environment compared to the compute method, especially when you’re dealing with atomic updates on a single shared variable. This is because compareAndSet operates directly on a single atomic variable and doesn’t involve the overhead of interacting with a concurrent data structure like a map.

In case of Google Guava 32.0.0, compute method is utilized and using compute method
with an accumulator function involves a more complex operation. It requires reading the current value associated with the key, applying the accumulatorfunction to compute a new value based on the old value (if any) and the input,
and then updating the map with the new value. In highly contended scenarios, multiple threads attempting to perform such operations concurrently may lead to increased contention and synchronization overhead, potentially affecting performance.

LET’S GET INTO TESTING !!


via Giphy

Note: Setup of the testing is present in the End of the blog

Case 1 : NumThreads = 2

Who won the battle? Google Guava v20.0

What’s the difference in performance? 78.6%

Version = 20.0

Trial
Time to Execute (s)

1
14

2
14

3
14

4
14

5
14

Average
14

Version = 32.0.0

Trial
Time to Execute (s)

1
25

2
25

3
25

4
25

5
25

Average
25

Case 2 : NumThreads = 10

Who won the battle? Google Guava v20.0

What’s the difference in performance? 73.2%

Version = 20.0

Trial
Time to Execute (s)

1
73

2
73

3
73

4
74

5
73

Average
73.2

Version = 32.0.0

Trial
Time to Execute (s)

1
126

2
126

3
127

4
127

5
128

Average
126.8

Case 3 : NumThreads = 32

Who won the battle? Google Guava v20.0

What’s the difference in performance? 72.5%

Version = 20.0

Trial
Time to Execute (s)

1
236

2
235

3
234

4
234

5
235

Average
234.8

Version = 32.0.0

Trial
Time to Execute (s)

1
403

2
405

3
401

4
410

5
406

Average
405

Case 4 : NumThreads = 64

Who won the battle? Google Guava v20.0

What’s the difference in performance? 70.7%

Version = 20.0

Trial
Time to Execute (s)

1
474

2
474

3
473

4
473

5
472

Average
473.2

Version = 32.0.0

Trial
Time to Execute (s)

1
804

2
811

3
809

4
809

5
805

Average
807.6

Conclusion


via Giphy

Google Guava v20.0 has a simplified implementation which performs 70% better than the later releases for AtomicLongMap.java. Will love to hear other people’s opinion on the same and why do you think the new implementation of AtomicLongMap is suffering from thread contention!

Setup of the test

It is a simple benchmark test to test the performance of the 2 GoogleGuava releases addAndGet() method defined within the AtomicLongMap class in a multi-threaded environment

Main.java

import java.time.Clock;
import main.java.com.benchmark.Benchmark;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
private static Clock clock = Clock.systemDefaultZone();
private static Benchmark benchmark = new Benchmark();
private static final long NUM_CALLS = 1000000000;
private static final int NUM_THREADS = 1;

public static void main(String[] args) {
long current = clock.millis();
System.out.println(“Current time : “ + current);

ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
IncrementTask task = new IncrementTask(benchmark, NUM_CALLS);

for (int threadPool = 0; threadPool < NUM_THREADS * 2; ++threadPool) {
executor.submit(task);
}

executor.shutdown();

try {
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}

long completed = clock.millis();
long difference = completed current;
System.out.println(“Completed time: “ + completed);
System.out.println(“Time difference for Google Guava: “ + difference/1000 + ” seconds.”);
}
}

class IncrementTask implements Runnable {

private Benchmark benchmark;
private long numCalls;

public IncrementTask(Benchmark benchmark, long numCalls) {
this.benchmark = benchmark;
this.numCalls = numCalls;
}

@Override
public void run() {
long threadId = Thread.currentThread().getId();
String threadName = Thread.currentThread().getName();
System.out.println(“Task executed by Thread “ + threadId + ” (“ + threadName + “)”);

synchronized (benchmark) {
for (int trial = 0; trial < numCalls ; ++trial) {
benchmark.increment(1);
}
}
}
}

Benchmark.java

public class Benchmark {

private final AtomicLongMap<String> counters = AtomicLongMap.create();
private final String KEY = “Key”;

public void increment(int delta) {
counters.addAndGet(KEY, delta);
}
}

Leave a Reply

Your email address will not be published. Required fields are marked *