
In the last post, we dove deep into load balancing: what it is, when to use it, and how to build a basic system with Express and Node.js. If you need a primer on load balancing, go ahead and read that article first.
In this post, we will go deep into how to leverage the Node.js cluster module to build a load-balancing system and improve your application’s performance.
What is the Cluster Module?
In Node.js, clustering is a mechanism for spawning multiple child processes that can handle incoming requests concurrently, enhancing the application’s performance and availability.
The Node.js cluster module makes it possible to create concurrent child processes (often called “workers”) that share the same server port. Each spawned child has its own event loop, memory, and V8 instance. Child processes communicate with the parent (master) process over IPC (inter-process communication).
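To illustrate that IPC channel, here is a minimal sketch; the message contents are made up for the example:

```javascript
// Minimal master/worker IPC sketch (message shapes are hypothetical)
const cluster = require("cluster");

if (cluster.isMaster) {
  const worker = cluster.fork();

  // The master receives messages from the worker over the IPC channel
  worker.on("message", (msg) => console.log("Master received:", msg));

  // ...and can send messages back
  worker.send({ cmd: "start" });
} else {
  // The worker listens for messages from the master
  process.on("message", (msg) => console.log(`Worker ${process.pid} received:`, msg));

  // ...and reports back to the master
  process.send({ status: "ready", pid: process.pid });
}
```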
The cluster module can be used to facilitate load balancing across multiple CPU cores. It is built on the child_process module’s fork mechanism. Clustering lets us duplicate the main process once for each available CPU core. All incoming requests go to the main process, which then distributes the load across the forked processes.
Incoming connections are distributed among child processes in one of two ways:
- The master process listens for connections on a port and distributes them across the workers in a round-robin fashion. This is the default approach on all platforms except Windows.
- The master process creates a listen socket and sends it to interested workers, which can then accept incoming connections directly. The snippet below shows how to choose between these two policies.
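If you want to pick the strategy yourself rather than rely on the platform default, the cluster module exposes a scheduling-policy setting. A minimal sketch; it must be set before forking:

```javascript
const cluster = require("cluster");

// SCHED_RR: the master accepts connections and hands them to workers round-robin
// (the default everywhere except Windows).
// SCHED_NONE: the master hands the listen socket to the workers and lets the OS
// decide which worker gets each connection.
cluster.schedulingPolicy = cluster.SCHED_RR; // set this before calling cluster.fork()

// The same choice can be made with the NODE_CLUSTER_SCHED_POLICY environment
// variable, using the values "rr" or "none".
```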
Create a load-balancing server
This example shows the benefits of clustering in Node.js. We’ll build a sample app without clustering, then compare it to a clustered version and measure the performance differences with load tests.
Without the cluster module

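A minimal sketch of what that server.js can look like, assuming Express is installed (the exact response text is an assumption):

```javascript
// server.js — non-clustered version
const express = require("express");

const app = express();
const port = 3000;

// Fast route: responds immediately
app.get("/", (req, res) => {
  res.send("Hello World!");
});

// Slow route: adds the numbers up to n, simulating CPU-heavy work
app.get("/api/:n", (req, res) => {
  const n = parseInt(req.params.n, 10);
  let count = 0;
  for (let i = 0; i <= n; i++) {
    count += i;
  }
  res.send(`Final count is ${count}`);
});

app.listen(port, () => {
  console.log(`App listening on port ${port}`);
});
```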
It’s a bit hard to find a real-world situation, but the above example suits our needs. The server.js has two routes: one that returns the simple string “Hello World!”, and a second that takes a route parameter n, adds the numbers up to n to a variable count, and returns the final count. The second route simulates a very heavy operation: feed it a large value of n and it has to perform O(n) additions.
Now launch the application using node server.js and supply a small value for n (e.g., http://localhost:3000/api/50000); it will run rapidly and respond right away. However, if you raise the value of n (e.g., http://localhost:3000/api/500000000), the request will take a few seconds to respond.
Now, if you open another browser tab and submit another request to the server, that request will also take a few seconds to complete, because the program runs on a single thread and is busy finishing the first lengthy operation. The second job will only be handled once the first one is finished.
Using the cluster module

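Below is a minimal sketch of the clustered server.js, assuming the same Express routes as before:

```javascript
// server.js — clustered version
const cluster = require("cluster");
const os = require("os");
const express = require("express");

const port = 3000;

if (cluster.isMaster) {
  // The master forks one worker per CPU core
  const cpuCount = os.cpus().length;
  console.log(`Master ${process.pid} is running, forking ${cpuCount} workers`);

  for (let i = 0; i < cpuCount; i++) {
    cluster.fork();
  }
} else {
  // Each worker runs its own copy of the Express app; they all share port 3000
  const app = express();

  app.get("/", (req, res) => {
    res.send("Hello World!");
  });

  app.get("/api/:n", (req, res) => {
    const n = parseInt(req.params.n, 10);
    let count = 0;
    for (let i = 0; i <= n; i++) {
      count += i;
    }
    res.send(`Final count is ${count} (handled by worker ${process.pid})`);
  });

  app.listen(port, () => {
    console.log(`Worker ${process.pid} listening on port ${port}`);
  });
}
```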
We have now modified server.js to use clustering, and the app does the same thing as before. Multiple child processes now share port 3000 to handle requests, and we can create as many child processes as there are CPU cores on the machine.
The master process acts as the boss, creating and managing worker processes. On launch, the app checks whether it is the master using cluster.isMaster. If it is, it forks new processes to match the available CPU cores using the cluster.fork() method. It is like hiring a team based on the open tasks.
To test clustering’s power, open multiple tabs with the same request. Notice the speed? No delays, right? That’s because multiple worker processes now handle requests in parallel, keeping things fast and smooth.
Performance Metrics
Let’s run a load test on our two apps to see how each handles a large number of incoming requests. We will use the autocannon package for this test.
Autocannon lets you test your API’s limits by mimicking a crowd of users, revealing how it handles heavy traffic.
First, install the autocannon package globally:
npm install -g autocannon
Then run the app that you want to test with node server.js. We will start with the version that doesn’t use clustering. Run the app in one terminal, open another terminal, and run the following load test command:
autocannon -c 100 http://localhost:3000/api/50000000
The above command opens 100 concurrent connections to the given URL for 10 seconds (autocannon’s default duration). The following output is from running the above command.

The load test revealed that the server handled 465 requests when faced with 100 concurrent connections over 10 seconds. The average requests per second is 36.5, and the average latency is 2366.21 milliseconds.
Now stop the non-clustered app, run the clustered one, and finally run the same load test using the same command as before.

The above image shows that the server responded to 2000 requests when running 100 concurrent connections for 10 seconds. This time the server handles 163.4 requests per second, and the average latency is reduced to 581.88 milliseconds.
Enjoyed this post?
If you liked this article, share it and subscribe for more tech articles. Please share it with your friends 👦 and co-workers 👭 🧑‍🤝‍🧑
I appreciate your support. 💚 Thanks for reading! 🙏
Happy Coding… 😁