Understanding Java Virtual Threads | by The Bored Dev | Nov, 2022

A discussion of how they address the problems of asynchronous programming


In the JDK 19 release, we can find the first preview of something JDK developers have worked on for a long time, Project Loom. This first preview is available as part of JEP 425, and it will allow the creation of “virtual threads.” As it is still a preview, you will have to enable previews when you compile your program in Java 19. You can check how to enable preview features in Java in our article “How to enable previews in Java.”

More recently, the second preview of virtual threads has been released as part of JEP 436. Some of the changes introduced in the first preview have been made final; we are one step closer to gaining full access to virtual threads.

In this article, we will give you solid background on why virtual threads are badly needed in the JVM ecosystem, along with the basics you need to understand them.


Parity between OS threads and platform threads

Currently, in the JDK, a one-to-one relationship exists between Java threads (also called “platform” threads) and OS threads.

This means that when a thread is waiting to complete an IO operation, the underlying OS thread remains blocked and unused until the operation completes. This has always been a big scalability problem in the Java ecosystem, because our application is limited by the number of threads available on the host.

In the last decade, we have tried to address this problem using asynchronous processing libraries and using futures. For example, using CompletableFuture we can achieve a non-blocking approach, although the readability of these models is, in many cases, not what we would expect. If you are interested in looking at some CompletableFuture examples, you can read our article “Multiple API calls with CompletableFuture.”

Although async programming is a viable solution to the thread limitation, writing asynchronous code is more complex than writing sequential code. The developer has to define callbacks to apply operations based on the response to a given task, which makes the code difficult to follow and reason about.
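To make the callback style concrete, here is a minimal sketch using CompletableFuture. The methods fetchUser and fetchOrderCount are hypothetical stand-ins for remote calls, not part of any real API. Notice how the logic is spread across chained callbacks instead of reading top to bottom.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical example: fetchUser and fetchOrderCount stand in for remote calls.
class AsyncPipelineSketch {

    // Simulates a remote call that resolves a user name from an id.
    static CompletableFuture<String> fetchUser(int id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    // Simulates a second remote call that depends on the first result.
    static CompletableFuture<Integer> fetchOrderCount(String user) {
        return CompletableFuture.supplyAsync(() -> user.length());
    }

    // The workflow is expressed as callbacks; each stage may run on a different thread.
    static CompletableFuture<String> describe(int id) {
        return fetchUser(id)
                .thenCompose(user -> fetchOrderCount(user)
                        .thenApply(count -> user + " has " + count + " orders"))
                .exceptionally(error -> "lookup failed: " + error.getMessage());
    }

    public static void main(String[] args) {
        // join() blocks only here, at the edge of the program, to print the result.
        System.out.println(describe(42).join());
    }
}
```

Even in this tiny case, error handling and the dependency between the two calls live in separate lambdas, which is where readability starts to suffer.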

Another big issue is that debugging these applications becomes hard. Multiple threads can handle a given request, so debugging, logging, and analysing stack traces all get difficult.

In terms of flexibility and maintainability, asynchronous programming is very limited too. We have to give up certain sequential control structures, such as loops. This means that a piece of code written sequentially cannot easily be transformed into asynchronous code, and the same is true in the opposite direction.

Last but not least, writing explicit asynchronous code is more error-prone due to the complexities that come with it.

Another problem with platform threads is that they are heavy objects which are expensive to create. Therefore, we need to create them beforehand and store them in a pool of threads to avoid creating new threads every time that we need a thread to run our code. Why are they expensive?

The creation of Java threads is expensive because it involves allocating memory for the thread, initialising the thread stack, and also making OS calls to register the OS thread.

When we consider both problems, the limit in OS threads and how expensive it is to create platform threads, we need bounded pools of threads to run our applications safely. If we don’t use a bounded pool of threads, we risk running out of resources with dramatic consequences for our system.
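As a rough illustration of what a bounded pool means in practice, the sketch below submits more blocking tasks than there are pool threads and records how many run at the same time. The class and method names are made up for this example, and Thread.sleep stands in for a blocking IO call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: measures how many blocking tasks a bounded pool runs at once.
class BoundedPoolSketch {

    // Runs `tasks` blocking tasks on `poolSize` platform threads and returns
    // the highest number of tasks observed running at the same time.
    static int maxConcurrency(int poolSize, int tasks) throws InterruptedException {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(20); // simulated blocking IO
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Peak concurrent tasks: " + maxConcurrency(4, 20));
    }
}
```

The pool keeps the system safe, but it also caps concurrency: the remaining tasks sit in a queue while the pool threads are blocked doing nothing useful.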

The other problem with this design is how expensive context switching is. When there is a context switch, a CPU core stops running one OS thread, and therefore its platform thread, and starts running another. The OS has to save the current thread’s registers, local data, and memory pointers and load the ones for the next thread. This is a costly operation because it takes many CPU cycles every time one thread is paused and another is resumed.

So, how do we solve these problems? This is where Project Loom and its virtual threads come into play. Let’s see how!

Virtual threads in Java get their name from an analogy with virtual memory. This is because we are provided with an illusion of having an almost unlimited number of threads available (figuratively speaking), in a similar way to what virtual memory does.


Virtual threads resolve one of the main scalability problems in the JDK, but how do they do it? The answer is, mainly, by breaking the one-to-one association between a Java thread and an OS thread.

Many applications in the JVM ecosystem break way before reaching their CPU or memory limits, mainly due to this parity between platform threads and OS threads. The creation of platform threads is very expensive. Therefore, there’s a need to use thread pools. We are always limited by the number of processing units (CPUs) available in our host.

On the other hand, a virtual thread contributes minimal overhead to the system. Therefore, we can have thousands of them in our application. Every virtual thread requires an OS thread to do some work, but it does not hold the OS thread while waiting for resources. This means that virtual threads can wait for IO, free the platform thread they are currently using so another virtual thread can use it to do some work, and resume their work later on when the IO operation gets done.
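The following sketch tries to make this visible. It assumes JDK 21 or later, where virtual threads are final (on JDK 19 or 20 you would need to enable previews), and the class and method names are invented for the example. It starts many virtual threads that block at the same time and compares the peak number of blocked threads with the number of processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: counts how many virtual threads are blocked at the same time.
class BlockedVirtualThreadsSketch {

    // Starts `threads` virtual threads that each sleep for `sleepMillis` and
    // returns the highest number of threads observed sleeping concurrently.
    static int peakBlocked(int threads, long sleepMillis) throws InterruptedException {
        AtomicInteger blocked = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            started.add(Thread.ofVirtual().start(() -> {
                peak.accumulateAndGet(blocked.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(sleepMillis); // the carrier is released while we wait
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    blocked.decrementAndGet();
                }
            }));
        }
        for (Thread thread : started) {
            thread.join();
        }
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Peak blocked virtual threads: " + peakBlocked(1_000, 500));
    }
}
```

If each waiting virtual thread held on to an OS thread, the peak could not grow far beyond the carrier pool. Because a waiting virtual thread gives up its carrier, the peak is typically much higher.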

What is the main advantage of this? One of the answers is cheap context switching! Let’s see why!

As we mentioned earlier, context switching is very expensive in Java due to having to save and load thread stacks every time it happens.

The difference with virtual threads is that, because they are under the control of the JVM, their stacks are stored as objects in heap memory rather than in a stack reserved by the OS. This means that restoring the stack of an awakened virtual thread becomes much cheaper. The process of loading a virtual thread’s stack onto the “carrier” thread’s stack when it gets assigned to its carrier is called mounting. The opposite process is called unmounting.

Let’s take a brief look at thread scheduling now.

The operating system schedules traditional platform threads, whereas the JDK runtime schedules virtual threads.

In the case of platform threads, the OS scheduler is responsible for assigning CPU time to each OS thread. It does this by handing out time slices. When a thread’s slice is up, it is another thread’s turn to get CPU time to do some work. This is how the OS tries to ensure a fair(ish) distribution of CPU time among the existing threads and processes.

On the other hand, virtual threads are scheduled directly by the JDK runtime, which uses a ForkJoinPool internally. This is a dedicated pool used only as the virtual thread scheduler, meaning that it is a different instance from the common pool returned by ForkJoinPool.commonPool().

Image credit: Author

The JDK scheduler does not use time slices. Instead, a virtual thread yields and gives up its carrier thread when it waits for the response to a blocking operation. The main consequence is much better resource utilisation and, therefore, increased throughput in our application.

It’s worth mentioning that the underlying platform threads, also called carrier threads in terms of scheduling, are still managed by the OS scheduler. They’re now a layer of abstraction completely invisible to the developer writing concurrent code.
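One way to peek at this layer is to ask a virtual thread to describe itself. The sketch below assumes JDK 21 or later, and the class name is made up for the example. On the JDK builds used for the outputs later in this article, the description of a running virtual thread includes the name of the ForkJoinPool worker that is currently carrying it.

```java
// Hypothetical example: captures how a running virtual thread describes itself.
class CarrierInspectionSketch {

    // Starts one virtual thread and returns the result of its toString(),
    // evaluated while the thread is mounted on a carrier.
    static String describeVirtualThread() throws InterruptedException {
        String[] description = new String[1];
        Thread thread = Thread.ofVirtual()
                .start(() -> description[0] = Thread.currentThread().toString());
        thread.join(); // join() makes the write to description[0] visible here
        return description[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(describeVirtualThread());
    }
}
```

The exact text is an implementation detail and may differ between JDK versions, so it is useful for exploration and logging rather than something to rely on in application logic.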

Another aspect to consider here is that virtual threads do not add parallelism by themselves. Work only runs in parallel across the carrier threads, whose number is tied to the available processing units. What is happening is that the processing units’ time gets distributed more efficiently: each carrier switches from one virtual thread to another much more frequently and cheaply.

Now that we have seen how virtual threads work, one question remains. Will every application benefit from the introduction of virtual threads? Not really. Let’s see why that is.

Not every application will benefit from a big performance improvement after adopting virtual threads. We will only observe a huge benefit when our application is IO-bound.

What does that mean? IO-bound applications spend considerable time waiting for the response of IO operations such as network calls, file reads and writes, or database access. This describes the majority of applications nowadays.

The benefit of using virtual threads is huge in IO-bound applications because virtual threads are very good at waiting, meaning that a thread can wait and resume in a very cheap manner in terms of performance.

In this situation, the virtual thread blocks while it waits, but the platform thread won’t. The platform thread will be assigned to a different virtual thread to continue doing useful work instead of waiting. This means we will have a much better resource utilisation in our system!

In the example shown below, we have two platform threads mapped to a corresponding OS thread in our operating system. We can see how platform threads are always occupied doing some work instead of waiting for the completion of IO.

Every time that a virtual thread waits for IO, it yields to free its carrier thread. Once the IO operation gets completed, the virtual thread is put back into the FIFO queue of the ForkJoinPool and will wait until a carrier thread is available.

Image Credit: Author

This also means that we can considerably increase our applications’ throughput. Virtual threads achieve this by increasing the number of tasks we can process concurrently, not by reducing latency. To make it clear, virtual threads are not faster than platform threads. They are just more efficient in how they wait and distribute the work.

For CPU-bound applications, we have other tools, like parallel tasks or work stealing in ForkJoinPool, to improve their performance. Virtual threads will have a minimal impact on them. Please remember the difference between the two kinds of application to avoid getting unexpected results!
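For contrast, here is a small sketch of the kind of tool that suits CPU-bound work: a parallel stream, which splits a computation across the common ForkJoinPool. The prime-counting task and the class name are just illustrative choices.

```java
import java.util.stream.LongStream;

// Hypothetical example: a CPU-bound computation split across cores with a parallel stream.
class CpuBoundSketch {

    // Trial division; deliberately simple and CPU-heavy for large inputs.
    static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        for (long divisor = 2; divisor * divisor <= n; divisor++) {
            if (n % divisor == 0) {
                return false;
            }
        }
        return true;
    }

    // Counts the primes in [2, limit], letting the common ForkJoinPool divide the range.
    static long countPrimes(long limit) {
        return LongStream.rangeClosed(2, limit)
                .parallel()
                .filter(CpuBoundSketch::isPrime)
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Primes up to 1,000,000: " + countPrimes(1_000_000));
    }
}
```

There is no waiting here for virtual threads to hide. The speed-up comes from using several cores at once, which is bounded by the hardware rather than by the number of threads we create.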

What other benefits do virtual threads bring to our applications? There is a very important one. We can now write non-blocking concurrent code in a synchronous manner.

With the introduction of virtual threads in Java, writing concurrent code gets simplified enormously. Our code becomes easier to read and understand. Readability is one of the big problems of asynchronous programming nowadays, where complexity can sometimes get out of hand.

We could now write concurrent code without orchestrating the interactions that could happen asynchronously. The JDK runtime will deal with it for us, assigning the available OS threads among the existing virtual threads.
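Here is a minimal sketch of what that looks like, assuming JDK 21 or later. The methods fetchUser and fetchOrderCount are hypothetical blocking calls, simulated with Thread.sleep. The workflow reads top to bottom like ordinary sequential code, and only the place where it runs changes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: blocking, sequential-looking code executed on a virtual thread.
class SequentialStyleSketch {

    // Simulates a blocking remote call that resolves a user name from an id.
    static String fetchUser(int id) throws InterruptedException {
        Thread.sleep(10);
        return "user-" + id;
    }

    // Simulates a second blocking call that depends on the first result.
    static int fetchOrderCount(String user) throws InterruptedException {
        Thread.sleep(10);
        return user.length();
    }

    // Plain sequential logic: no callbacks, and ordinary try/catch or loops would work too.
    static String describe(int id) throws InterruptedException {
        String user = fetchUser(id);
        int count = fetchOrderCount(user);
        return user + " has " + count + " orders";
    }

    // Runs the sequential logic on a virtual thread; blocking there does not hold an OS thread.
    static String describeOnVirtualThread(int id) throws Exception {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> result = executor.submit(() -> describe(id));
            return result.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeOnVirtualThread(7));
    }
}
```

A stack trace from describe would show fetchUser or fetchOrderCount directly beneath it, which is exactly the debugging experience that callback chains take away.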

What happens if we maintain an old application using the traditional concurrency mechanisms available in Java?

There’s good news if you were wondering what would happen to code that uses synchronized blocks or any of the traditional concurrency mechanisms after migrating to Java 19. Old concurrent code will work with virtual threads without any modification. In many cases you probably won’t even need to recompile and build new artefacts, because the JDK runtime deals with all this. In other cases, the changes required to take full advantage of virtual threads will be minimal.
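As a quick sanity check, the sketch below runs a classic synchronized counter on virtual threads, assuming JDK 21 or later. The class is a made-up example of “old” code, and nothing in it was written with virtual threads in mind.

```java
// Hypothetical example: a traditional synchronized counter used from virtual threads.
class LegacyCounterSketch {

    private int count;

    // Classic monitor-based mutual exclusion, unchanged for virtual threads.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Starts `threads` virtual threads that each increment a shared counter once.
    static int countWith(int threads) throws InterruptedException {
        LegacyCounterSketch counter = new LegacyCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = Thread.ofVirtual().start(counter::increment);
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Final count: " + countWith(1_000));
    }
}
```

One caveat worth knowing: JEP 425 notes that a virtual thread which blocks while inside a synchronized block or method stays pinned to its carrier. The code remains correct, but long blocking operations under a monitor are one of the places where some changes may pay off.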

Let’s look at what the JDK API looks like now!

There have been a few changes in the JDK proposed under JEP 425. We will see that it’s quite straightforward to write code that takes advantage of virtual threads.

You can write code in the same way that you always do. Virtual threads are a built-in feature of the JDK. Therefore, you don’t need to do much to take advantage of them.

One of the good things is that a virtual thread is an instance of the existing Thread class. There is no need to learn a new thread type.

The only changes are in how we define whether a thread we create is a virtual or a platform thread. To achieve this, the JDK brings a Thread.Builder interface to instantiate and configure both easily.

The Thread class provides two static methods that return such a builder. Thread.ofPlatform() returns a builder for traditional platform threads. To instantiate a virtual thread, we’ll have to use Thread.ofVirtual() instead.
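As a small sketch of both builders side by side, assuming JDK 21 or later (the thread names and the class name are arbitrary choices for this example):

```java
// Hypothetical example: creating one platform thread and one virtual thread via builders.
class ThreadBuilderSketch {

    // Starts both kinds of thread and reports whether each one is virtual.
    static boolean[] startBoth() throws InterruptedException {
        Runnable task = () -> System.out.println("Running in " + Thread.currentThread());

        // start() creates and starts the thread in one step.
        Thread platform = Thread.ofPlatform().name("platform-worker").start(task);

        // unstarted() only creates the thread; we start it ourselves afterwards.
        Thread virtual = Thread.ofVirtual().name("virtual-worker").unstarted(task);
        virtual.start();

        platform.join();
        virtual.join();
        return new boolean[] { platform.isVirtual(), virtual.isVirtual() };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] kinds = startBoth();
        System.out.println("platform.isVirtual() = " + kinds[0]);
        System.out.println("virtual.isVirtual() = " + kinds[1]);
    }
}
```

Both calls hand back an ordinary Thread, so existing code that accepts a Thread, joins it, or interrupts it keeps working with either kind.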

Another change is the inclusion of a new ExecutorService, which starts a new virtual thread for every submitted task. It can be obtained by calling the Executors.newVirtualThreadPerTaskExecutor() method.

Let’s see how it works with a couple of examples!

This new method makes it easy to move existing concurrent code to virtual threads. Let’s see how in the following example:

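import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

final LongAdder adder = new LongAdder();
Runnable task = () -> {
    try {
        Thread.sleep(10); // simulates a short IO operation
        System.out.println("I'm running in thread " + Thread.currentThread());
        adder.increment();
    } catch (InterruptedException e) {
        // restore the interrupt flag instead of clearing it
        Thread.currentThread().interrupt();
    }
};
long start = System.nanoTime();
// close() at the end of the try block waits for all submitted tasks
try (ExecutorService executorService = Executors.newCachedThreadPool()) {
    IntStream.range(1, 10000)
        .forEach(number -> executorService.submit(task));
}
long end = System.nanoTime();
System.out.println("Completed " + adder.intValue() + " tasks in " + (end - start)/1000000 + "ms");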

You can see how the example above uses a cached thread pool to submit 9,999 tasks (IntStream.range(1, 10000) excludes the upper bound), each simulating a small IO operation that takes 10ms plus the time taken to print to the console and increment a counter.

If we run this code, we get the following results:

...
I'm running in thread Thread[#1271,pool-1-thread-1242,5,main]
I'm running in thread Thread[#1260,pool-1-thread-1231,5,main]
I'm running in thread Thread[#928,pool-1-thread-899,5,main]
I'm running in thread Thread[#275,pool-1-thread-246,5,main]
Completed 9999 tasks in 4740ms

I’ve only included the last few lines and the final result for brevity. You can see how we use platform threads from a cached thread pool. It takes 4.7 seconds to run something as simple as this.

Let’s see what happens when we use the new virtual thread executor:

long start = System.nanoTime();
try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
    IntStream.range(1, 10000)
        .forEach(number -> executor.submit(task));
}
long end = System.nanoTime();
System.out.println("Completed " + adder.intValue() + " tasks in " + (end - start)/1000000 + "ms");

As you will notice, switching to virtual threads is as simple as switching to the new executor service. The rest of the code stays the same! This is awesome, right?

What about the performance using virtual threads? These are the results we got:

I'm running in thread VirtualThread[#10029]/runnable@ForkJoinPool-1-worker-10
I'm running in thread VirtualThread[#10031]/runnable@ForkJoinPool-1-worker-10
I'm running in thread VirtualThread[#10027]/runnable@ForkJoinPool-1-worker-10
I'm running in thread VirtualThread[#10028]/runnable@ForkJoinPool-1-worker-10
Completed 9999 tasks in 760ms

It only took 760ms with virtual threads! Why is that? As we saw earlier, platform threads don’t get blocked while virtual threads wait for IO operations. Therefore, additional tasks can be processed while virtual threads are waiting. This is huge for the JVM ecosystem!

Now let’s look at a similar example. In this case, we will use Thread.ofPlatform() and Thread.ofVirtual() to specify what kind of threads we’ll be using in the test.

We will run an example using Thread.ofPlatform() first:

import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

long start = System.nanoTime();
int[] numbers = IntStream.range(1, 10000).toArray();
// start one platform thread per number, all running the same task
List<Thread> threads = Arrays.stream(numbers).mapToObj(num ->
    Thread.ofPlatform()
        .start(task)
).toList();
// wait for every thread to finish
threads.parallelStream().forEach(thread -> {
    try {
        thread.join();
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
});
long end = System.nanoTime();
System.out.println("Completed " + adder.intValue() + " tasks in " + (end - start)/1000000 + "ms");

We start 9,999 threads to run the same task we used in our previous example. We then wait for their completion by using join().

If we run this test, it takes around 2–3 seconds to complete. Here are the results:

I'm running in thread Thread[#10023,Thread-9993,5,main]
I'm running in thread Thread[#10025,Thread-9995,5,main]
I'm running in thread Thread[#10026,Thread-9996,5,main]
I'm running in thread Thread[#10028,Thread-9998,5,main]
Completed 9999 tasks in 2394ms

What happens if we use the same example but instantiate virtual threads?

List<Thread> threads = Arrays.stream(numbers).mapToObj(num ->
    Thread.ofVirtual()
        .start(task)
).toList();

As we observed in the previous example, virtual threads provide much better throughput, as seen in the results below:

I'm running in thread VirtualThread[#10029]/runnable@ForkJoinPool-1-worker-4
I'm running in thread VirtualThread[#10030]/runnable@ForkJoinPool-1-worker-4
I'm running in thread VirtualThread[#10031]/runnable@ForkJoinPool-1-worker-4
I'm running in thread VirtualThread[#9954]/runnable@ForkJoinPool-1-worker-5
Completed 9999 tasks in 722ms

Again, virtual threads beat platform threads by a considerable margin.

Please keep in mind that these timings are only indicative because we’re not running proper benchmarks. We are not warming up the JVM to give the JIT compiler time to optimise the code, and we’re measuring a single execution. Even so, they give a rough idea of how much our throughput can improve with virtual threads!
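If you want slightly more trustworthy numbers without setting up a dedicated benchmarking harness such as JMH, one option is to add a few warm-up rounds and take the median of several measured runs. The sketch below is one way to do that, assuming JDK 21 or later. The helper names and the task and run counts are arbitrary choices for this example, and the result is still not a rigorous benchmark.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical example: a rough timing harness with warm-up rounds and a median.
class RepeatedTimingSketch {

    // Submits `tasks` sleeping tasks to a fresh executor and returns elapsed milliseconds.
    static long timeOnce(Supplier<ExecutorService> executors, int tasks) {
        long start = System.nanoTime();
        try (ExecutorService executor = executors.get()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(10); // simulated IO wait
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Discards `warmups` runs, then returns the median of `runs` measured runs.
    static long medianMillis(Supplier<ExecutorService> executors, int tasks, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            timeOnce(executors, tasks);
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = timeOnce(executors, tasks);
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        long platform = medianMillis(Executors::newCachedThreadPool, 1_000, 2, 5);
        long virtual = medianMillis(Executors::newVirtualThreadPerTaskExecutor, 1_000, 2, 5);
        System.out.println("Cached pool median: " + platform + "ms");
        System.out.println("Virtual threads median: " + virtual + "ms");
    }
}
```

Printing is left out of the task on purpose, since console output can easily dominate the time being measured.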

We wanted to mention that virtual threads also open the door to introducing structured concurrency in Java. This change will make Java code much safer when running multiple concurrent tasks at different nested levels.

Structured concurrency, built around something called scopes, is arriving as an incubating API under JEP 428, with the related scoped values proposed in JEP 429. This is quite similar to what Kotlin does with its coroutines.

In this article, we have seen how virtual threads will solve one of the major problems in the Java ecosystem. The existing parity between OS threads and platform threads was a huge limiting factor for some applications due to the limit on the number of OS threads in a host.

Asynchronous programming has been our saviour for a long time. However, we think virtual threads will be a big factor in the death of asynchronous programming as we know it. We expect simpler concurrency paradigms to be adopted once this change is made available in one of the next JDK releases.

That’s all from us today! We hope you have enjoyed this article and learned something new about the JVM ecosystem. We think the future is bright for the JVM ecosystem and for all the developers and languages that are part of this community after this upcoming change.

Thanks for reading.
