Some package managers are faster than others. The early JavaScript package managers, npm and yarn, are commonly replaced these days by faster alternatives like bun and pnpm. I've also seen benchmarks between package managers where the performance gap is rather large, but it wasn't obvious to me why one package manager would ever be significantly faster than another.

To understand more about package manager performance, I traced some call paths through bun's Zig codebase and pnpm's TypeScript codebase, but I was still missing some details about the performance challenges these projects were taking on.
So I built my own toy package manager called caladan. For now, it just does two things: install npm packages from a valid `package-lock.json` file and run bin scripts.
I wanted to get close to the cold install performance of bun, and I'm pretty happy with the results. Benchmarks are usually incorrect, so there's a good chance I'm being unfair to bun here. Here are the results nonetheless:
```
# ran on m1 mac w/ 600mbps network, bun v1.2.5
# both have an equivalent lockfile with 16 packages (311mb on disk)
# cache is cleared before each run with `bun pm cache rm && rm -rf node_modules`
./benchmark.sh

Benchmark 1: ./caladan install-lockfile fixtures/1
  Time (mean ± σ):     1.767 s ±  0.052 s    [User: 2.168 s, System: 2.236 s]
  Range (min … max):   1.729 s …  1.857 s    5 runs

Benchmark 2: bun install --force --ignore-scripts --no-cache --network-concurrency 64
  Time (mean ± σ):     1.587 s ±  0.097 s    [User: 0.496 s, System: 1.293 s]
  Range (min … max):   1.486 s …  1.693 s    5 runs

Summary
  bun install --force --ignore-scripts --no-cache --network-concurrency 64 ran
    1.11 ± 0.08 times faster than ./caladan install-lockfile fixtures/1
```
The much lower user time of bun points to its efficient Zig codebase. Seeing similar-ish system times and overall wall-clock times suggests that both tools have the same fundamental limits (whether network, disk I/O, or system call overhead). On a faster and more capable machine, bun would be able to make better use of the available resources.
To verify that my package manager is doing the same work, I checked that the sizes of the directories inside `node_modules` were comparable, and that the bin scripts ran without any errors (e.g. `nanoid`, `next`, and `image-size`).
```
./caladan run fixtures/1 nanoid
Running nanoid with args: []
Working directory: fixtures/1
guxvWmbNcvIuAowqzrnEu
```
The benchmark script is open source and hopefully you'll correct me if I've set it up unfairly.
I'll outline my efforts to get close to bun's cold install performance in the following sections.
Installing a Package
`package-lock.json` is generated automatically by a previous install to lock the exact versions of all dependencies (and their dependencies) in a Node.js project. It ensures consistent installations across different environments by recording the precise dependency tree that was resolved at that install.
It's mostly made up of dependency entries like this:
"dependencies": {// .."date-fns": {"version": "2.29.3","resolved": "<https://registry.npmjs.org/date-fns/-/date-fns-2.29.3.tgz>","integrity": "sha512-dDCnyH2WnnKusqvZZ6+jA1O51Ibt8ZMRNkDZdyAyK4YfbDwa/cEmuztzG5pk6hqlp9aSBPYcjOlktquahGwGeA=="},
Our job, as a minimal package manager, is to install all of these dependencies.
- Parse `package-lock.json` (see the parsing sketch below)
- Download the compressed files from `resolved`
- Verify their `integrity` by calculating the hash of these files
- Extract them to `node_modules`
- Parse `node_modules/$package/package.json` and check for a `bin` property
- (If so, create a symlink inside `node_modules/.bin/$package`)
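For the first step, parsing is mostly a matter of describing the lockfile's shape to `encoding/json`. Here's a minimal sketch based on the entry shown above; the type and field names (and the fixture path) are my own, not necessarily what caladan uses:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// PackageLock models just the parts of package-lock.json we need here.
type PackageLock struct {
	Dependencies map[string]PackageInfo `json:"dependencies"`
}

// PackageInfo mirrors the per-dependency fields shown in the snippet above.
type PackageInfo struct {
	Version   string `json:"version"`
	Resolved  string `json:"resolved"`  // tarball URL
	Integrity string `json:"integrity"` // e.g. "sha512-<base64 digest>"
}

// parseLockfile reads and decodes a package-lock.json file.
func parseLockfile(path string) (*PackageLock, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var lock PackageLock
	if err := json.Unmarshal(data, &lock); err != nil {
		return nil, err
	}
	return &lock, nil
}

func main() {
	lock, err := parseLockfile("fixtures/1/package-lock.json")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(len(lock.Dependencies), "dependencies to install")
}
```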
Not listed here are other features, like pre- and post-install scripts, that I haven't implemented. I think I'm also missing some validation steps (e.g. checking if `package.json` differs from the lockfile).
To get everything working, I started by implementing these steps to run sequentially. It was very slow, taking ~30 seconds to install all the packages for my small project.
I got a 2x improvement by skipping packages that I didn't need to install (i.e. by filtering by OS). On my MacBook, I don't need to install `node_modules/@next/swc-darwin-x64`, but I do need to install `node_modules/@next/swc-darwin-arm64`.
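A rough sketch of what that check can look like. This version assumes the platform is encoded in the package name (as it is for the `@next/swc-*` packages), which isn't necessarily how caladan does it; newer lockfile formats also expose explicit `os` and `cpu` fields that could be used instead.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// skipForPlatform reports whether a platform-specific package (such as
// "@next/swc-darwin-x64") targets a different OS/arch than this machine.
// Sketch only: it just inspects the package name suffix.
func skipForPlatform(pkgName string) bool {
	hostOS := runtime.GOOS // "darwin", "linux", "windows", ...
	if hostOS == "windows" {
		hostOS = "win32" // npm's name for Windows
	}
	// Map Go's arch names to npm's naming convention.
	hostArch := map[string]string{"amd64": "x64", "arm64": "arm64", "386": "ia32"}[runtime.GOARCH]

	for _, osName := range []string{"darwin", "linux", "win32"} {
		for _, arch := range []string{"x64", "arm64", "ia32"} {
			if strings.HasSuffix(pkgName, "-"+osName+"-"+arch) {
				return osName != hostOS || arch != hostArch
			}
		}
	}
	return false // not platform-specific; always install
}

func main() {
	// On an M1 MacBook this prints: true false
	fmt.Println(
		skipForPlatform("@next/swc-darwin-x64"),
		skipForPlatform("@next/swc-darwin-arm64"),
	)
}
```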
The next big improvement was to run things in parallel. I put each package's download-and-extract step in its own goroutine and stuck them in an errgroup.
```go
g := errgroup.Group{}

// Process each package in parallel
for pkgName, pkgInfo := range packages {
	g.Go(func() error {
		// Skip OS-specific packages that don't match current OS
		// ..

		// Create package directory
		// ..

		// Normalize package path
		// ..

		// Download the package tarball
		return DownloadAndExtractPackage(
			ctx,
			httpSemaphore,
			tarSemaphore,
			client,
			pkgInfo.Resolved,
			pkgInfo.Integrity,
			pkgPath,
		)
	})
}

// Wait for all packages to complete
err := g.Wait()
// ..
```
This was much faster than doing everything sequentially. However, without limits on parallelism, there was resource contention in two areas: HTTP requests and unzipping files.
Comparing CPU Profiles
From reading their codebases, I knew that bun and pnpm used different levels of concurrency for HTTP requests and unzipping files.
When I added separate semaphores around these steps, the performance of my install step improved by ~20% for the small project I've been testing. I knew intuitively that these semaphores helped with resource contention, but I thought it would be interesting to prove this using profiling tools.
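For reference, a minimal sketch of how the two semaphores can be constructed and used with `golang.org/x/sync/semaphore`; the limits shown are illustrative rather than caladan's actual values (sizing extraction at roughly 1.5x the core count is discussed below):

```go
package main

import (
	"context"
	"fmt"
	"runtime"

	"golang.org/x/sync/semaphore"
)

func main() {
	ctx := context.Background()

	// One limiter caps in-flight registry downloads, the other caps
	// concurrent tarball extractions.
	httpSemaphore := semaphore.NewWeighted(64)
	tarSemaphore := semaphore.NewWeighted(int64(float64(runtime.NumCPU()) * 1.5))

	// Each package's goroutine then brackets the expensive steps:
	_ = httpSemaphore.Acquire(ctx, 1)
	fmt.Println("downloading tarball...")
	httpSemaphore.Release(1)

	_ = tarSemaphore.Acquire(ctx, 1)
	fmt.Println("extracting tarball...")
	tarSemaphore.Release(1)
}
```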
I've chosen to highlight the effect of adding the semaphore for unzipping files as the performance improvement is more significant there.
In my program, I have an env var that allows me to output CPU profiles:
```go
if cpuProfilePath := os.Getenv("CPU_PROFILE"); cpuProfilePath != "" {
	f, err := os.Create(cpuProfilePath)
	if err != nil {
		fmt.Printf("Error creating CPU profile file: %v\n", err)
		os.Exit(1)
	}
	pprof.StartCPUProfile(f)
	defer pprof.StopCPUProfile()
	fmt.Printf("CPU profiling enabled, writing to: %s\n", cpuProfilePath)
}
```
I used pprof's `-text` output to compare the two profiles (with the unzip semaphore and without it) side by side in my code editor:
```
go tool pprof -text cpu_without_sema.prof > cpu_without_sema.txt
go tool pprof -text cpu_with_sema.prof > cpu_with_sema.txt
```
Decompression Performance Improvement
With the semaphore, the core decompression functions accounted for a smaller share of overall program time and were also quicker to run. Below is the profile data for `huffmanBlock` (decoding a single Huffman block) and `huffSym` (reading the next Huffman-encoded symbol).
```
# with semaphore
 flat  flat%   sum%    cum   cum%
0.11s  1.94% 89.22%  0.42s  7.42%  compress/flate.(*decompressor).huffmanBlock
0.11s  1.94% 87.28%  0.19s  3.36%  compress/flate.(*decompressor).huffSym

# without semaphore
 flat  flat%   sum%    cum   cum%
0.11s  1.88% 88.57%  0.51s  8.70%  compress/flate.(*decompressor).huffmanBlock
0.19s  3.24% 82.08%  0.29s  4.95%  compress/flate.(*decompressor).huffSym
```
There was also a ~5% decrease in the time spent waiting on system calls (`syscall.syscall`) and I/O (`os.(*File).Write` and `os.(*File).ReadFrom`).
More Detail on Why
The semaphore limits the number of concurrent extraction operations, reducing CPU, memory, and I/O contention. By matching the extraction concurrency to the available CPU resources (using 1.5x the number of cores), the system avoids thrashing and excessive context switching.
Notably, there was an increase in "scheduling time". This may seem counterintuitive, but here it's desirable: it means synchronization is more orderly and predictable, with less chaotic contention for system resources:
```
runtime.schedule      +2.70%
runtime.park_m        +1.23%
runtime.gopreempt_m   +0.42%
runtime.goschedImpl   +0.42%
runtime.notewakeup    +0.21%
runtime.lock          +1.31%
runtime.lockWithRank  +1.31%
runtime.lock2         +1.31%
```
We traded a small amount of scheduling time for faster I/O and faster decompression (CPU).
Keeping Things in Memory
One of the ways you can be fast is to avoid disk operations altogether. This was the final optimization I added. Initially, I downloaded each package to a temporary file and then extracted it into `node_modules`.
I realized I could do everything at the same time using the HTTP response stream:
- Download the bytes of the archive
- Extract directly to the final location
- Calculate the hash as we go so we can verify each package's integrity
```go
// DownloadAndExtractPackage downloads a package tarball and extracts it
func DownloadAndExtractPackage(ctx context.Context, httpSemaphore, tarSemaphore *semaphore.Weighted, client *http.Client, url, integrity, destPath string) error {
	httpSemaphore.Acquire(ctx, 1)
	defer httpSemaphore.Release(1)

	// Request the tarball
	resp, err := client.Get(url)
	if err != nil {
		return fmt.Errorf("error downloading package: %v", err)
	}
	defer resp.Body.Close()

	// Setup hash verification
	var hash interface {
		io.Writer
		Sum() []byte
	}
	// ..

	// Use a TeeReader to compute hash while reading
	teeReader := io.TeeReader(resp.Body, hash)
	reader := teeReader

	tarSemaphore.Acquire(ctx, 1)
	defer tarSemaphore.Release(1)

	// Extract directly from the download stream
	err = extractTarGz(reader, destPath)
	if err != nil {
		return fmt.Errorf("error extracting package: %v", err)
	}

	// Compare hashes
	// ..

	return nil
}
```
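The hash setup and comparison are elided above. As a rough sketch of what the verification involves (the `integrity` field is `<algorithm>-<base64 digest>`, sha512 in the entries shown earlier), here's one way it could be done using the standard `hash.Hash` interface rather than the custom interface in the snippet; this is illustrative, not caladan's code:

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
	"fmt"
	"hash"
	"strings"
)

// verifyIntegrity compares a computed digest against an npm integrity
// string of the form "<algorithm>-<base64 digest>". Sketch: only sha512
// is handled, which covers the lockfile entries shown earlier.
func verifyIntegrity(h hash.Hash, integrity string) error {
	expected, ok := strings.CutPrefix(integrity, "sha512-")
	if !ok {
		return fmt.Errorf("unsupported integrity algorithm: %q", integrity)
	}
	actual := base64.StdEncoding.EncodeToString(h.Sum(nil))
	if actual != expected {
		return fmt.Errorf("integrity mismatch: got %s, want %s", actual, expected)
	}
	return nil
}

func main() {
	// The hash would normally be fed by the io.TeeReader during download;
	// here we simulate the stream with a fixed payload.
	h := sha512.New()
	h.Write([]byte("pretend this is the tarball stream"))
	integrity := "sha512-" + base64.StdEncoding.EncodeToString(h.Sum(nil))

	fmt.Println(verifyIntegrity(h, integrity)) // <nil>
}
```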
In a way, everything gets blocked on the semaphore that wraps the extraction step. But since extraction is an order of magnitude faster than downloading the bytes over the network, it feels like a good design.
Running Scripts
The final part of my package manager program configures the symlinks for any bin scripts that the packages might have. It also runs them when invoked with `caladan run <directory> <script> <args>`.
After a package is downloaded to `node_modules/$package/`, it has a `package.json` file which may have a `bin` property.
For example, `nanoid` has:
"bin": "./bin/nanoid.cjs",
Which means there's a file at `node_modules/nanoid/bin/nanoid.cjs` that we need to create an executable symlink for at `node_modules/.bin/nanoid`.
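A sketch of how that symlink can be created in Go; the function and parameter names here are mine, and it only handles the string form of `bin` (the field can also be an object mapping several command names to files):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkBin creates node_modules/.bin/<name> pointing at the file that the
// package's "bin" field references, and marks the script as executable.
func linkBin(nodeModules, pkgName, binRelPath, binName string) error {
	target := filepath.Join(nodeModules, pkgName, binRelPath) // node_modules/nanoid/bin/nanoid.cjs
	link := filepath.Join(nodeModules, ".bin", binName)       // node_modules/.bin/nanoid

	if err := os.MkdirAll(filepath.Dir(link), 0o755); err != nil {
		return err
	}
	// The script itself needs the executable bit set.
	if err := os.Chmod(target, 0o755); err != nil {
		return err
	}
	// Use a relative symlink so node_modules can be relocated.
	rel, err := filepath.Rel(filepath.Dir(link), target)
	if err != nil {
		return err
	}
	// Replace any stale link left over from a previous install.
	_ = os.Remove(link)
	return os.Symlink(rel, link)
}

func main() {
	// Assumes nanoid has already been extracted into node_modules/nanoid.
	if err := linkBin("node_modules", "nanoid", "bin/nanoid.cjs", "nanoid"); err != nil {
		fmt.Println("link error:", err)
	}
}
```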
The hardest part is getting the relative file paths correct and ensuring that args are passed correctly. Running the script isn't too hard; it's effectively just `exec.Command`:
```go
func Run(directory string, args []string) {
	// Split the script name from its arguments (caladan run <directory> <script> <args>)
	scriptName, scriptArgs := args[0], args[1:]

	// Set up command to run script using project-relative path
	binScriptName := filepath.Join("./node_modules/.bin", scriptName)
	cmd := exec.Command("sh", "-c", binScriptName+" "+strings.Join(scriptArgs, " "))

	// Set working directory to the specified directory (project root)
	cmd.Dir = directory
	fmt.Printf("Working directory: %s\n", directory)

	// Connect standard IO
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.Stdin = os.Stdin

	// Run the command and wait for it to finish
	err := cmd.Run()
	// ..
```
To Conclude
All this to have a package manager that implements 2% of the spec that users expect, hah.
It's ~700 lines of Go (open source) and it was fun to write. I now have a better understanding of the upper end of the performance that's possible when installing packages.
I'd like to be able to handle a cold install from a `package.json` (creating and updating the lockfile) at similar speeds. I hope to put together a follow-up post when I'm able to get my dependency resolution and hoisting to match how `npm` does it.
I'd also like to look into the cache optimizations that `bun` uses for repeat package installs, which in some cases take only tens of milliseconds.
After getting up close to the basics of package manager-ing over the past week, I feel like JavaScript doesn't cut it as far as the required performance is concerned. I used to think that package managers were network-bound, but now I've changed my mind.
The raw performance (and the concurrency primitives) of a systems-y language like Go gives you so much more power.
To end on a Jarred Sumner post:
A lot of performance optimizations come from looking closely at things people assume is "just the network" or "just I/O bound"