A few days ago, I was working on a project that required me to use Node.js. I was helping a colleague set up the Docker container used to run the project, and we had to choose between the Alpine and Debian base images. I was heavily against using Alpine, and had to explain why to my colleague. I decided to write this article to share my stance and the rationale behind it, and as an exercise in collecting detailed proof of my POV on the matter.
If you don't know what Alpine is, don't talk to me. Alpine Linux is a lightweight Linux distribution that is designed with security and simplicity in mind. It is known for its small size and minimalistic design, making it a popular choice for containerized applications. Alpine Linux uses musl as its standard C library, which is known for its small size and performance optimizations.
If you don't know what Debian is, consider a career in carpentry. However, for the sake of completeness: Debian is a popular Linux distribution that is known for its stability and large package repository. It is widely used in production environments and is a popular choice for server deployments. C code compiled on Debian systems is compiled against glibc, the GNU C library, which is known for its feature-richness and compatibility with a wide range of software.
You might've read the previous sections and thought, "Who cares about the standard C library used in a Docker container? I'm not a kernel developer, I'm just trying to run my Node.js application." And you'd be right. For most applications, the choice of standard C library doesn't matter. However, if you know me at all, you know that I care about the nitty-gritty low-level details of software development. And I won't let my application run on anything whose implementation details I'm ignorant of.
Node.js is a C++ application at its core, and it interacts with the operating system through system calls and the standard C library. The choice of standard C library can have an impact on the performance of the application, especially when it comes to I/O-bound operations, and thinking that a JIT-compiled language like JavaScript is immune to these low-level details is naive.
Syscalls and I/O operations are the most common ways a Node.js application interacts with the operating system, and most of them go through the C library's functions (I'm talking malloc, memcpy, printf, etc.), which in glibc are heavily optimized. malloc, for example, is implemented in a way that makes allocating and deallocating memory fast and efficient, and it differs from musl's implementation in several respects.
Why is this important? Well, malloc is our key to the magical heap.
The heap is essentially a list of memory regions that a running program uses to store data dynamically. Accessing memory on the heap is OFC slower than working with the stack: in a stack frame, data (or instructions) is simply pushed to and popped from memory in a sequential, atomic fashion, so there's no need to seek out particular memory addresses. On the heap, however, the program must search for a free memory block that is large enough to hold the data. I don't need to tell you that this is a much slower process than a stack operation, so the way memory allocators optimize the allocation and deallocation of memory blocks is crucial for the performance of the program.
If the stack is so much faster than the heap, why would we even need the heap at all? Because sometimes we need to allocate memory dynamically, at runtime, and the heap is designed to cope with exactly that need. The data stored in heap regions is requested during runtime; this also means that if the program never calls functions like malloc, no heap will be set up in its memory space.
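To make the distinction concrete, here's a minimal C sketch (illustrative only; the sizes and names are mine) contrasting a stack allocation with a heap allocation:

```c
#include <stdio.h>
#include <stdlib.h>

void stack_vs_heap(void) {
    // Stack: the compiler reserves space in the current stack frame;
    // "allocating" is just moving the stack pointer, and the memory is
    // reclaimed automatically when the function returns.
    int on_stack[64];
    on_stack[0] = 42;

    // Heap: malloc has to find (or request from the kernel) a free block
    // big enough for the request, and we must free it ourselves.
    int *on_heap = malloc(64 * sizeof(int));
    if (on_heap == NULL) {
        perror("malloc");
        return;
    }
    on_heap[0] = 42;

    printf("stack: %d, heap: %d\n", on_stack[0], on_heap[0]);
    free(on_heap); // hands the block back to the allocator, not necessarily to the kernel
}

int main(void) {
    stack_vs_heap();
    return 0;
}
```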
musl and glibc are two different implementations of the standard C library, and they have different design philosophies.
glibc has been around for a long time (since 1987, probably older than both of us) and is known for its feature-richness and compatibility with a wide range of software. It has accumulated decades of triage and the optimizations that came out of it, and it's the default C library on most Linux distributions. However, glibc is also known for its large size and memory footprint, which can be a problem for resource-constrained environments.
musl, on the other hand, is a newer implementation of the standard C library (its first release was in 2011) that is designed with simplicity and a small footprint in mind. It is known for its small size and minimalistic design, and it is optimized for static linking, which lets developers ship applications as small, single binaries with faster startup times and a low runtime memory footprint. A godsend for embedded systems and IoT devices. musl is the default C library on Alpine Linux, and it has become a popular choice for containerized applications because of its minimalistic design philosophy. However, musl is also known for its lack of compatibility with some software that expects glibc-specific behavior (or maybe glibc is the one known for doing sh!t its own way) and for putting simplicity over tunability, and this has a certain impact on the performance of some applications compiled against it.
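As a small aside, "glibc-specific behavior" is usually detected at compile time: glibc advertises itself through a couple of macros, while musl deliberately defines no identifying macro of its own. A minimal, illustrative C sketch (the "not glibc, so probably musl" assumption only holds on Linux):

```c
#include <stdio.h>

int main(void) {
#ifdef __GLIBC__
    // glibc defines these macros in <features.h>, which any standard header pulls in.
    printf("glibc %d.%d\n", __GLIBC__, __GLIBC_MINOR__);
#else
    // musl intentionally defines no __MUSL__ macro, so "not glibc" is the
    // usual (imperfect) heuristic for assuming musl on Linux.
    printf("not glibc (likely musl, e.g. on Alpine)\n");
#endif
    return 0;
}
```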
Well, now you know the difference between musl and glibc, and you might've guessed that the choice of standard C library is what makes the difference in performance between Alpine and Debian. But how does it actually affect the performance of a Node.js application on a practical level? The internet is full of theories with no practical examples, but you know this blog's existence is entirely devoted to clearing the fog of confusion around software development, so let's provide some straightforward explanations.
glibc uses the ptmalloc ("pthreads malloc") implementation of malloc, a modern derivative of dlmalloc that is optimized for multi-threaded applications. It uses sorted lists instead of binary tries in the bins for large blocks, plus a special array of fast bins for small blocks of memory. When freed, these blocks are kept marked as used and put into the appropriate fast bin. Allocation is satisfied using an exact fit from the fast bins when possible, and the fast bins are emptied under multiple heuristic conditions. The big difference between ptmalloc and dlmalloc, though, is the support for concurrent allocations. ptmalloc maintains multiple arenas, each an independent instance of the allocator protected by its own lock. Upon allocation, the thread invoking the allocator attempts to lock an arena, starting with the one it used last. If an arena is locked successfully, it is used to satisfy the allocation request; otherwise a new arena is allocated. This architecture avoids lock contention as much as possible. But its strength is (as always in cases like this) also in a much better API, which makes supporting slabs and arrays of struct elements super neat via independent_comalloc, which speeds up my compiled code by ~20%. Its thread stack size is variable, depending on resource constraints, but can be set to up to ~10MB, making it suitable for workloads split across a large number of concurrent threads.
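To get a feel for the arena behavior, here's a rough, illustrative C sketch (thread and iteration counts are arbitrary) that hammers malloc/free from several threads. Compile it with -pthread and time it on a glibc-based image versus an Alpine one, or rerun it on glibc with MALLOC_ARENA_MAX=1 in the environment to force a single arena and see how much the multi-arena design buys you under contention:

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define THREADS    8
#define ITERATIONS 1000000

// Each thread allocates and frees small blocks in a tight loop.
// Under glibc's ptmalloc, threads spread across independent arenas,
// so they rarely fight over the same lock.
static void *hammer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        void *p = malloc(128);
        if (p == NULL)
            abort();
        free(p);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[THREADS];
    for (int i = 0; i < THREADS; i++)
        pthread_create(&threads[i], NULL, hammer, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(threads[i], NULL);
    puts("done");
    return 0;
}
```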
musl uses a dlmalloc-style implementation of malloc, which is optimized for simplicity and minimalism. It uses a single heap for all threads, and it hasn't received the same level of triage and tuning as ptmalloc. It uses a binary trie structure for small blocks and a sorted, doubly-linked list for large blocks. It doesn't really support concurrent allocations: there's a single lock shared by all threads, which can lead to lock contention in multi-threaded applications. Memory management is also differentiated for small and large blocks of memory:
For small (< 256KB) blocks, brk - a kernel-level memory allocation call - is used, with one detail: more memory than needed is requested, and the surplus is gonna serve later allocations without diving down to the kernel again. On free, the memory isn't handed back to the kernel directly; it's just marked as free.
For large (> 256KB) blocks, mmap and munmap are called directly, so no caching is involved. This is a good thing for blocks that big, because it's a lot faster than shuffling them through brk-backed caching. It would be a terrible strategy for small blocks, though, because paying a syscall on every allocation and free is much slower than reusing memory the allocator already holds.
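A quick way to watch this split is to allocate one block on each side of the threshold and run the program under strace. A minimal sketch; note that the exact cutoff is an allocator detail that varies by libc and version, so the sizes below are just illustrative:

```c
#include <stdlib.h>
#include <string.h>

int main(void) {
    // Small request: typically served from memory the allocator already
    // obtained via brk (or keeps cached), without a syscall per call.
    char *small = malloc(4 * 1024);

    // Large request: typically forwarded straight to mmap, and handed
    // back to the kernel with munmap when freed.
    char *large = malloc(4 * 1024 * 1024);

    if (small == NULL || large == NULL)
        return 1;

    // Touch the memory so the allocations aren't optimized away.
    memset(small, 0, 4 * 1024);
    memset(large, 0, 4 * 1024 * 1024);

    free(large);
    free(small);
    return 0;
}
```

Running it as strace -e trace=brk,mmap,munmap ./a.out makes it easy to see which requests go straight to the kernel and which are absorbed by the allocator.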
Its default thread stack size is fixed at ~128KB, which is a lot smaller than glibc's. This can be a problem for workloads that spawn a large number of concurrent threads whose work needs more stack than that, as the small fixed default becomes a limiting factor.
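If your workload does spawn stack-hungry threads, the portable workaround is to set the stack size explicitly instead of relying on whatever the libc default happens to be. A minimal sketch (the 1MB figure is arbitrary; compile with -pthread):

```c
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    (void)arg;
    // Deeply recursive or stack-heavy code would live here; with musl's
    // ~128KB default it would be much easier to overflow the stack.
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    // Request an explicit 1MB stack so behavior doesn't depend on
    // whether the binary runs against glibc or musl.
    pthread_attr_setstacksize(&attr, 1024 * 1024);

    pthread_t t;
    if (pthread_create(&t, &attr, worker, NULL) != 0) {
        perror("pthread_create");
        return 1;
    }
    pthread_join(t, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```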
With all that said, it's clear that the choice of standard C library can have a significant impact on the performance of a Node.js application.
Most web applications do heavy string processing, I/O operations, and/or delegate workloads to native dependencies, all factors that can be affected by the memory management and thread allocation strategies of the standard C library sitting a few layers below the Node.js runtime.
In general, if you're running a Node.js application, want to maximize its runtime performance, and expect a high number of concurrent threads, then glibc (and therefore Debian) is the way to go. If you're running a Node.js application in a resource-constrained environment and want to minimize its memory footprint and startup time, then musl (and therefore Alpine) is probably the way to go for you.