Node.js programming model - async/await vs Promise

Before I seriously started to learn Node.js, I had heard many times that a Node.js application is a single-threaded Event-Loop application where all your code runs in this main thread. I also saw quite a few articles on the Internet comparing the performance and throughput of Node.js applications against Tomcat or Spring Boot based Java applications, where the throughput of Node.js looks much better. I was really suspicious about how this could be achievable for a normal application where we need to read/write records in a database and call REST services (SOAP services in the old days) to integrate with other systems. Doesn't your code need to wait for the response to come back?

In the Java world, we are so used to the thread pool concept that J2EE servers (Tomcat, WebSphere, WebLogic, JBoss) manage a pool of worker threads for us. Most of the time your code runs in one thread taken from the thread pool to serve one request, until the response is sent back to the client. That means during the request, all the database queries and downstream API calls your code makes are synchronous, and the thread your code is running in is blocked. This thread cannot be used by the server to serve other requests in the queue until it is released. This is a fundamental concept of J2EE: you are writing single-threaded code even though your server cluster handles hundreds, thousands, or even millions of concurrent requests at the same time.

In certain extreme scenarios in my past projects, I would consciously submit Callable tasks to a thread pool (or spin off a few threads myself) so that I could run long database queries or slow external service calls concurrently to improve performance. I guess a lot of good Java developers would brag about their multi-threading experience in their resumes.

So when I started to learn Node.js, the first culture shock was that almost all functions are written in an asynchronous, non-blocking style, because all I/O intensive calls like database CRUD operations, REST service calls, and file reads are handled by separate threads in the Worker Pool. You can read this article which explains the difference between the Java programming model and the Node.js one.
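To make that style concrete, here is a minimal sketch (not from any real project; the config.json path is just a placeholder) of what a non-blocking file read looks like with the built-in fs/promises module:

            import { readFile } from "fs/promises";

            async function loadConfig(path: string): Promise<string> {
                // readFile() hands the actual disk I/O to the Worker Pool and returns a
                // Promise immediately, so the main Event-Loop thread is not blocked.
                let content: string = await readFile(path, "utf-8");
                return content;
            }

            // The main thread is free to do other work while the read is in flight.
            loadConfig("./config.json").then((text) => console.log(text.length));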

I quickly realized (actually I always knew) that asynchronous code has to be treated like multi-threaded code. For example:

            let a : ClassA = new ClassA();
            this.asyncCall1(a); 
            this.asyncCall2(a);
        
If the asyncCall2 method depends on changes that asyncCall1 makes to the properties of the instance a, this piece of code fails quickly, because there is a good chance that the changes asyncCall2 is waiting for have not happened yet when it runs.
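A minimal, self-contained sketch of one fix (the ClassA value field, the timer delay, and the function bodies are made up for illustration) is to await the first call before making the second, so the dependency is explicit:

            class ClassA {
                value: number = 0;
            }

            async function asyncCall1(a: ClassA): Promise<void> {
                // pretend this is a slow I/O call that eventually mutates a
                await new Promise<void>((resolve) => setTimeout(resolve, 100));
                a.value = 42;
            }

            async function asyncCall2(a: ClassA): Promise<void> {
                console.log(a.value); // sees 42 only if asyncCall1 has already finished
            }

            async function callBoth(): Promise<void> {
                let a: ClassA = new ClassA();
                await asyncCall1(a); // wait until asyncCall1 has finished mutating a
                await asyncCall2(a); // asyncCall2 now sees the updated property
            }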

These toy examples may not look realistic. In real life, when we query the database or make a REST call to get some objects, that method returns a Promise. We can use Promise.then() to manage the callbacks, but that can quickly become very complicated and turn into the well-known "Promise hell" of nested .then(...) calls, like the example here:

            fetchA().then(
                (a : A) => {
                    fetchB().then(
                        (b : B) => {
                            fetchC().then(
                                (c : C) => {
                                    ...
                                }
                            )
                        }
                    );        
                }
            );
        

To solve this nesting problem, ES2017 introduced the async/await syntax to reduce the boilerplate around Promises. When you call an async function you prepend await, and the calling code is suspended until the Promise is resolved or rejected. The sample code above can be rewritten in a much simpler way:

            let a : A = await fetchA();
            let b: B = await fetchB();
            let c: C = await fetchC();
        
According to the Node.js documentation, "They make the code look like it's synchronous, but it's asynchronous and non-blocking behind the scenes." This is only partly true. According to Event Loop Explained:

When a Promise is returned to the calling code (that is, when the await keyword is used), the callback function is placed into the poll queue. The main Event-Loop thread can then continue to process the next callback function in the poll queue, which might be a REST GET or POST service call from a mobile app or a browser. So the Node.js application stays responsive and non-blocking: it can continue to process other requests from everywhere. This is the desired behavior in a high-volume backend service scenario where requests keep coming in; everybody gets a chance to share the CPU time, do something, and eventually receive a response. But in a low-traffic scenario, the code above executes linearly, because it stops and waits three times, after calling fetchA, fetchB, and fetchC. There is no difference from writing traditional Java code.
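Here is a small, self-contained sketch of both sides of this behavior (not code from the project; fakeFetch just simulates a slow I/O call with a timer): the three awaits run strictly one after another, yet the Event-Loop thread stays free to run other callbacks in the meantime.

            function delay(ms: number): Promise<void> {
                return new Promise<void>((resolve) => setTimeout(resolve, ms));
            }

            async function fakeFetch(name: string): Promise<string> {
                await delay(1000); // pretend this is a slow REST call or database query
                return name + " result";
            }

            async function handleRequest(): Promise<void> {
                let a: string = await fakeFetch("fetchA"); // suspended here for about 1s...
                let b: string = await fakeFetch("fetchB"); // ...then here...
                let c: string = await fakeFetch("fetchC"); // ...then here: roughly 3s in total
                console.log(a, b, c);
            }

            // This timer keeps firing while handleRequest() is awaiting,
            // which shows the main Event-Loop thread is never blocked.
            let timer = setInterval(() => console.log("Event Loop is still free"), 500);
            handleRequest().then(() => clearInterval(timer));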

Here is a real example where I have a long-running job which goes through years of stock trades and calculates the market value at the end of each day based on the close price of each stock. At first I started with the async/await syntax so that the code looks much cleaner.

			for (let openPosition of thisDayOpenPositionList) {
			    let symbol : string = openPosition.symbol;
			    let closePrice : number = await HttpUtil.get("xxx"); //Get the close price of a stock on a given date from the HistoryData REST service
			    ...
			}
        
The whole job took around 4 minutes 29 seconds.
            2022-03-14 19:38:45.683 - INFO [AccountDailyStatsService]: Recalculate Account Daily Stats from begining for Account AAA
            2022-03-14 19:43:14.098 - INFO [AccountDailyStatsService]: Finished the recalculation
        

But from top, you can see the overall CPU usage is 11.9%, with the main Event-Loop thread taking 11.3% and one Worker Thread using around 1.0%, while all the other threads are almost idle. This is pretty slow because each REST GET call in the for loop is put onto the Node.js poll queue one at a time, and the rest of the code waits for closePrice to be returned before moving on to the next symbol.

Obviously I was not happy at all. I had spent so much time learning Node.js and its asynchronous coding model; how could I accept the fact that it performs as slowly (or as fast) as my good old Java code? Even in the Java world, I could create a Callable task for each REST GET, submit them to a thread pool, wait for all the tasks to finish, and get the results from the Futures. Couldn't I do the same thing in Node.js?

I immediately found the answer - Promise.all(). The new code looks like:

			let promiseList : Promise<number>[] = [];
			let symbolListInPromiseList : string[] = [];
			for (let openPosition of thisDayOpenPositionList) {
			    symbolListInPromiseList.push(openPosition.symbol); //Remember which symbol each Promise belongs to
			    promiseList.push(HttpUtil.get("xxx")); //Push all Promises into a list without awaiting them
			}
			let closePriceList : number[] = await Promise.all(promiseList); //Wait for all the REST calls to finish
			for (let i = 0; i < closePriceList.length; i++) {
			    ... //closePriceList[i] is the close price for symbolListInPromiseList[i]
			}
        
The new code took 2 minutes 32 seconds, cutting almost 50% off the processing time.
			2022-03-14 20:23:43.649 - INFO [AccountDailyStatsService]: Recalculate Account Daily Stats from begining for Account AAA
			2022-03-14 20:26:15.037 - INFO [AccountDailyStatsService]: Finished the recalculation
        

Now from top, you can see the overall CPU usage jumps to 53.3%, with the main Event-Loop thread taking 38.0%. There are 4 Worker Threads using 2.3% of CPU each and another 4 Worker Threads using 1.0% each. Finally I got my CPU busy. The busier the CPU is, the happier I am!

The calling service and the REST service being called are running in the same Linux VM. I believe that if these 2 services were running on different servers across the Internet, the performance difference would be even more dramatic because of network latency. As usual, after this exercise I tried to standardize my code templates as:

  • Use await on an async function when there is one query or REST API call, followed by sequential insert/update/delete database calls. That code couldn't be optimized much anyway.
  • Use Promise.all when I need to make multiple independent REST calls or database queries, so that they run concurrently on the Node.js Worker Threads (a rough sketch of both templates follows this list).
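Here is what those two templates roughly look like (fetchPrice, saveTrade, recordTrade, and fetchAllPrices are hypothetical names, not code from the project):

			// Hypothetical stand-ins for a REST call and a database write.
			async function fetchPrice(symbol: string): Promise<number> { return 100; }
			async function saveTrade(symbol: string, price: number): Promise<void> { }

			// Template 1: one lookup followed by dependent writes - sequential awaits are fine.
			async function recordTrade(symbol: string): Promise<void> {
			    let price: number = await fetchPrice(symbol); // single REST/database call
			    await saveTrade(symbol, price);               // depends on the result above
			}

			// Template 2: many independent lookups - start them all, then await Promise.all().
			async function fetchAllPrices(symbols: string[]): Promise<number[]> {
			    let promises: Promise<number>[] = symbols.map((s) => fetchPrice(s));
			    return await Promise.all(promises); // the REST calls are in flight concurrently
			}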
One last comment: when I tune application performance, besides optimizing database queries (which is the easiest and first thing to do), I always read the code of the hot spots and try to rewrite it in a more efficient way.

Again what a fun evening!