
My JavaScript is faster than your Rust

Translated by | Nuclear Coke, Chu Xingjuan

Josh Urbane has been a software architect for years and enjoys sharing technical perspectives on social media. Recently, he wrote an article about winning a bet with a new developer; the claim that "my JavaScript is faster than your Rust" comes from that bet. His story illustrates how much operational strategy matters in day-to-day engineering practice.

For me, one of the most enjoyable parts of the software architect's job is guiding developers toward the latest concepts and shaping their technical judgment. Some developers are rather cocky, and it falls to theory and reality to punch them in the face; the architect is also responsible for keeping the lesson entertaining, so that young, headstrong developers can grow up and mature.

What satisfies me most is when a headstrong developer jumps out to challenge my technical advice (from the developer's point of view, architects are a bunch of fools who keep making "wrong" suggestions) and bets the house that his solution is better.

The problem is, I've been in this business long enough that I don't need to guess at the correct answer. So we put it to the test, and a few years later I wrote the story up as today's article.

Going all in is a kind of "wisdom"

Honestly, this all happened years ago, so I can't remember many of the details. Roughly speaking, after weighing the team's knowledge base, the available tools and libraries, and the existing technical debt, my advice was that everyone should use Node.js.

A new junior developer, confident in his freshly earned computer science degree, wanted to knock me down a peg with a show of skill. He had heard that I had only minored in computer science, so he assumed I didn't understand how computers work underneath. In fact, when I first graduated I thought I understood it quite well, but the longer I've been in this business, the more computer systems have come to feel like magic...

His confidence was not unfounded; a claim like "C++ is faster than JavaScript" is basically industry consensus. But, like a typical architect, I insisted that it depends on the situation.

More precisely, fully optimized C++ does run faster than JavaScript with the same level of optimization effort; after all, JavaScript carries unavoidable execution overhead (even so, JS can be compiled into a static program with performance remarkably close to C++). Anyway, you get the idea.

Surprisingly, when we put the two versions head to head, the JavaScript code really was a little faster than the C++ version, and from an architectural standpoint, the JS version could be maintained by the existing team without borrowing technical capacity from other departments.

To be honest, I was not 100% sure I was right, but given that the memory objects in this use case could be dynamically sized, and that the young developer really was inexperienced, I was willing to take the gamble.

How can JS be faster than C++?

I'm guessing this result baffles most developers. It seems to contradict the basic rule that "compiled" languages are faster than "interpreted" ones and that "static" programs are faster than "VM" programs. But note that these are rules of thumb, not truths.

As I mentioned before, "optimization" is the key to speed. Even with all of C++'s raw performance advantages, poorly written code will drag a program into the mud. Node.js, on the other hand (built on the C++/C-based V8 engine and the libuv library), is heavily optimized, so its real-world speed is respectable. You could even say that, given equally poor code quality, the JS program may perform a little better. But this is only the macro view; let's look at some details.

Memory is key

Most developers are familiar with the concepts of the stack and the heap, but that understanding is usually only superficial; for example, knowing just that the stack is linear while the heap is a "blob" with pointers (not strict terminology, but you get the idea).

More importantly, "stack" and "heap" are concepts with many different implementations and approaches behind them. The underlying hardware has no idea what a "heap" is; how memory is managed is defined in software, and the choice of memory management strategy inevitably has a huge impact on a program's final performance.
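To make the distinction concrete, here is a minimal Rust sketch (Rust chosen purely as the illustration language; the original bet was C++ versus Node.js) of a stack allocation sitting next to a heap allocation:

```rust
fn main() {
    // Stack: size known at compile time, lives in this function's stack frame,
    // reclaimed automatically when the frame is popped.
    let on_stack: [i64; 2] = [3, 4];

    // Heap: the size comes from a runtime value, so Vec asks the allocator for
    // a block of memory and keeps a pointer to it.
    let len = 1024;
    let on_heap = vec![0u8; len];

    println!("stack: {:?}, heap buffer of {} bytes", on_stack, on_heap.len());
    // `on_heap` is dropped here, which calls back into the allocator to free
    // the block; that is exactly the dealloc cost discussed later.
}
```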

You can dig much deeper into this, and it is well worth doing. Modern hardware and kernels are quite complex and contain a large number of special-purpose optimization mechanisms, such as making more efficient use of advanced memory layouts. This means software can (or must) lean on the memory management features the hardware provides. And then there's the impact of virtualization... but I won't go further into that here.

The heart of the magic: garbage collection

Yes, the Node.js solution definitely takes longer to start, because the JIT compiler has to load and then run the script. But once it's loaded, Node.js code actually has a mysterious advantage: the garbage collector.

In a C++ program, the application tends to create dynamically sized objects on the heap and delete them later. That means the program's allocator must allocate and free heap memory over and over again. The operation itself is slow, and actual performance is largely determined by the allocator's algorithm. In most cases dealloc is especially slow, and even a stripped-down alloc isn't much better.

For the Node.js program, the trick is that it exits after a single run. Node.js also runs the script and allocates the memory it needs, but the subsequent deletions are deferred, left for the garbage collector to pick up in idle time.

It's true that garbage collection is not inherently better or worse than other memory management strategies (everything is a trade-off), but in the particular program we bet on, garbage collection provided a significant performance boost because the program is so short-lived that the collector never really has to run at all. We just cram a bunch of objects into memory and discard them all at once on exit.

Garbage collection certainly has a cost: the Node.js process consumes significantly more memory than the C++ program. This is the classic dilemma of trading memory for CPU or CPU for memory, but my goal was to punch the kid in the face, so spending some memory didn't matter.

And I only won because the other side chose a naive strategy. In fact, his best path to victory would have been to leak memory on purpose and deliberately keep every allocation alive. That way the C++ program's memory footprint would still be smaller, yet it would run much faster than before. Alternatively, he could have pushed performance further with a stack-allocated buffer, a technique widely used in real production code.
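As a rough sketch of that leak-on-purpose strategy, here is a minimal Rust example (Rust standing in for the C++ side, with a made-up workload): one loop frees each buffer as it goes, the other deliberately leaks everything and lets the operating system reclaim it all at process exit. Whether the leaking version actually wins depends entirely on the allocator and the workload.

```rust
use std::time::Instant;

/// Allocate `n` buffers and free each one immediately: the allocator pays
/// for every alloc *and* every dealloc.
fn free_as_you_go(n: usize) {
    for i in 0..n {
        let buf = vec![i as u8; 4096];
        drop(buf); // explicit here, but this happens at end of scope anyway
    }
}

/// Allocate `n` buffers and never free them: `Box::leak` throws away the
/// destructor, so cleanup is deferred to process exit, roughly what the
/// short-lived Node.js program does when the GC never gets around to running.
fn leak_everything(n: usize) {
    for i in 0..n {
        let buf = vec![i as u8; 4096];
        Box::leak(buf.into_boxed_slice());
    }
}

fn main() {
    let n = 100_000;

    let t = Instant::now();
    free_as_you_go(n);
    println!("free as you go: {:?}", t.elapsed());

    let t = Instant::now();
    leak_everything(n);
    println!("leak everything: {:?}", t.elapsed());
}
```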

There is also the question of how to choose a performance baseline. Generally speaking, what people compare is operations per second. The JS-versus-C++ result here is a good example of why "understand the total performance cost before making a choice" is usually the more reliable principle. In software architecture, we have to keep an eye on the total cost of ownership at the resource level.

Into the modern era: Rust takes the field

Rust is one of my favorite languages at the moment. It offers a lot of modern features, is fast, has a good memory model, and generates fairly safe code.

Rust is certainly not perfect: it takes a long time to compile and involves plenty of strange semantics, but overall I still recommend it. Rust gives you flexible control over how memory is managed, but its "stack" memory always follows the ownership model, which underpins its vaunted safety guarantees.

One of the projects I'm currently working on is a FaaS (Function as a Service) host written in Rust that executes WASM (WebAssembly) functions. It runs isolated functions quickly and safely, minimizing the operational overhead of FaaS. It is also fast, handling roughly 90,000 simple requests per second per core. Better still, its total memory footprint is only about 20 MB, which is fairly remarkable.
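The article doesn't say which WASM runtime the host embeds, so purely as an illustration (and an assumption on my part), here is a minimal Rust sketch using the wasmtime crate: the host compiles a guest module once, instantiates it in its own isolated store, and calls an exported function.

```rust
// Cargo.toml (assumed for this sketch): wasmtime = "*", anyhow = "*"
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    // A trivial guest function, written inline as WAT just for the example.
    let wat = r#"
        (module
          (func (export "add") (param i32 i32) (result i32)
            local.get 0
            local.get 1
            i32.add))
    "#;

    let engine = Engine::default();
    let module = Module::new(&engine, wat)?;  // compile the guest once
    let mut store = Store::new(&engine, ());  // isolated state for this call
    let instance = Instance::new(&mut store, &module, &[])?;

    // Look up the exported function with a typed signature and call it.
    let add = instance.get_typed_func::<(i32, i32), i32>(&mut store, "add")?;
    println!("guest returned: {}", add.call(&mut store, (2, 3))?);
    Ok(())
}
```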

But what does this have to do with the Node.js versus C++ bet?

In simple terms, I treat Node.js as a "reasonable" performance baseline (Go is my "outrageous" baseline, since its performance is simply not comparable to languages that weren't designed for web services, so there's no dimensionality-reduction attack here). After all, the early C++ version of our program really didn't perform badly; its only real advantage was a memory footprint less than one-tenth of the Node.js version's.

While there's nothing wrong with getting code running first and optimizing later, losing to JavaScript with a "fast" language like C++ can be frustrating. The reason I dared to go all in on the spot was a basic judgment about the obvious bottleneck: memory management.

Each guest function is allocated a memory array, but allocating memory inside the function and copying data between function memory and host memory inevitably introduces significant overhead. Because dynamically sized data is scattered everywhere, the allocator takes heavy hits from every direction. As for the solution: cheat!

One heap, two heaps, three heaps...

Essentially, the heap is a region of memory that the allocator manages with a map. The program requests N units of memory, and the allocator searches the available memory pool (or requests more memory from the host), noting which units are already occupied, and returns a pointer to the location it found. When the program is done with that memory, it tells the allocator, which updates the map so it knows those units are available again. Pretty simple, right?
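As a toy illustration of that "map of occupied units", here is a deliberately naive Rust sketch (not any real allocator, just the concept): alloc scans the map for a run of free cells and marks them used, dealloc marks them free again.

```rust
/// A toy allocator over a fixed pool of equally sized cells. It only keeps an
/// occupancy map; real allocators also track sizes, alignment, thread caches...
struct ToyHeap {
    used: Vec<bool>, // one flag per cell: the "map" described above
}

impl ToyHeap {
    fn new(cells: usize) -> Self {
        ToyHeap { used: vec![false; cells] }
    }

    /// Find `n` contiguous free cells, mark them used, return the start index.
    fn alloc(&mut self, n: usize) -> Option<usize> {
        let mut run = 0;
        for i in 0..self.used.len() {
            run = if self.used[i] { 0 } else { run + 1 };
            if run == n {
                let start = i + 1 - n;
                self.used[start..=i].iter_mut().for_each(|c| *c = true);
                return Some(start);
            }
        }
        None // pool exhausted (a real allocator would ask the OS for more)
    }

    /// Mark `n` cells starting at `start` as free again.
    fn dealloc(&mut self, start: usize, n: usize) {
        self.used[start..start + n].iter_mut().for_each(|c| *c = false);
    }
}

fn main() {
    let mut heap = ToyHeap::new(8);
    let a = heap.alloc(3).unwrap(); // cells 0..3
    let b = heap.alloc(2).unwrap(); // cells 3..5
    heap.dealloc(a, 3);             // fragmentation: a 3-cell hole before `b`
    println!("a={a}, b={b}, map={:?}", heap.used);
}
```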

But trouble starts when we need to allocate a whole pile of memory units with different lifetimes and sizes. That inevitably creates a lot of fragmentation, which in turn magnifies the cost of allocating new memory. The performance penalty then sets in, because at its core the allocator is doing something very simple: searching for a usable storage location.

There is no silver bullet here. There are many allocation algorithms to choose from, but each comes with its own trade-offs, and we have to pick the one that best fits the characteristics of the use case (or, like most developers, just go with the default).

Now for the cheating, and there is more than one way to cheat. For FaaS, we can skip per-allocation deallocs and simply clear the entire heap after each run completes; we can also use different allocators at different stages of a function's lifecycle, for example drawing an explicit line between the initialization phase and the run phase. That gives both clean functions (reset to the same initial memory state on every run) and stateful functions (which preserve state between runs) a memory strategy optimized for them.

In our FaaS project, we ended up building a dynamic allocator that selects an allocation algorithm based on usage, and that selection persists between runs.

For "less utilized" functions (that is, most functions), just use a simple stack allocator to point to the next free slot. When dealloc is called, the pointer is rolled back if the cell is the last cell on the stack; if it is not the last cell, no operation. When the function completes, the pointer is set to 0 (equivalent to Node.js exits before garbage collection). If the function's dealloc failures and usage reach a certain threshold, a different allocation algorithm is used in the remaining calls. As a result, this scheme significantly speeds up memory allocation in most cases.

Another "heap" is also used in the runtime – the host (or function shared memory). It uses the same dynamic allocation strategy and allows bypassing the copy step in earlier C++ versions and writing directly to function memory. This allows I/O to copy guest functions directly from the kernel and bypass the host runtime, significantly increasing throughput.

Node.js vs. Rust

After optimization, the Rust FaaS runtime ended up more than 70% faster than our Node.js reference implementation, with a memory footprint less than one-tenth as large.

But the key word is "optimized": the initial implementation was actually slower. The optimizations also impose some restrictions on WASM functions, though these are handled transparently at compile time and rarely cause incompatibilities.

The Rust version's biggest advantage is its small memory footprint; the RAM saved can be put to other uses such as caching or distributed in-memory storage. That further reduces I/O overhead and makes production runs more efficient, with an effect even more pronounced than scaling up the CPU.

We have more optimizations planned, but they mainly address issues in the host layer with significant security implications. Although that has nothing to do with memory management or performance, it does lend support to the "Rust is faster than Node" camp.

Summary

Honestly, having written all this down, I can't draw a particularly crisp conclusion. Here are just a few surface-level takeaways:

Memory management is fun, and every approach is a trade-off. With the right strategy, any language can get a huge performance boost.

I still recommend choosing between Node.js and Rust flexibly based on your actual goals, so I won't pass judgment here. JavaScript really is more portable and is particularly well suited to cloud-native development scenarios, but Rust may be the better choice if performance is a major concern.

I've said "JavaScript" throughout, but what I actually mean is TypeScript.

In the end, we have to choose the technical solution that best fits the actual situation. The more we understand the characteristics of different stacks, the easier that choice becomes.

https://medium.com/@jbyj/my-javascript-is-faster-than-your-rust-5f98fe5db1bf
