You run npm install dozens of times a week. But what actually happens between pressing Enter and seeing "added 847 packages"?
Step 1: Read package.json
npm reads your package.json and builds a dependency tree. Every package you listed, plus every package those packages need, recursively.
A project with 15 dependencies in package.json might resolve to 847 packages because of transitive dependencies: dependencies of dependencies of dependencies.
Step 2: Resolve versions
For each package, npm needs to figure out which exact version to install. Your package.json says "react": "^18.2.0", and that's a range, not a specific version.
npm checks the registry (registry.npmjs.org) for all versions of react that satisfy ^18.2.0. That means any version >= 18.2.0 and < 19.0.0.
It picks the highest matching version unless your package-lock.json already specifies one. This is why the lockfile matters: without it, two developers running npm install on the same package.json might get different versions.
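The core of this resolution step can be sketched in a few lines. This is a simplified model, not npm's actual implementation (npm uses the full `semver` package, which also handles prereleases, 0.x caret semantics, and operators like `~` and `||`); it covers only plain `^x.y.z` ranges:

```javascript
// Does `version` satisfy a caret range like "^18.2.0"?
// Simplification: no prereleases, no special 0.x handling.
function satisfiesCaret(version, range) {
  const base = range.slice(1).split('.').map(Number); // "^18.2.0" -> [18, 2, 0]
  const v = version.split('.').map(Number);
  if (v[0] !== base[0]) return false;                 // must stay within the same major
  for (let i = 0; i < 3; i++) {                       // check v >= base, field by field
    if (v[i] > base[i]) return true;
    if (v[i] < base[i]) return false;
  }
  return true;                                        // exactly equal to the base version
}

// Pick the highest available version that satisfies the range -- roughly
// what npm does when the lockfile has no entry for the package.
function resolve(range, available) {
  return available
    .filter(v => satisfiesCaret(v, range))
    .sort((a, b) => {
      const [x, y] = [a, b].map(s => s.split('.').map(Number));
      return (x[0] - y[0]) || (x[1] - y[1]) || (x[2] - y[2]);
    })
    .pop();
}

console.log(resolve('^18.2.0', ['18.1.0', '18.2.0', '18.3.1', '19.0.0'])); // "18.3.1"
```

Note that 19.0.0 is excluded even though it is "higher": the caret promises no breaking (major) changes.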
Step 3: Check the cache
Before downloading anything, npm checks its local cache (~/.npm/_cacache/). If you've installed react@18.2.0 before on this machine, it's already cached. No network request needed.
This is why your second npm install is faster than the first.
Step 4: Fetch packages
For anything not cached, npm downloads tarballs from the registry. Each package is a .tgz file containing the source code, package.json, and whatever the author published.
npm downloads in parallel, fetching multiple packages at once. The progress bar you see is tracking these parallel downloads.
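Where do the tarball URLs come from? The registry's package document (a GET to https://registry.npmjs.org/react) maps every published version to its metadata, including the download URL. Here is a trimmed, hand-written slice of that shape (an illustration, not a live response; the integrity value is elided):

```javascript
// Illustrative slice of a registry package document ("packument").
const packument = {
  name: 'react',
  'dist-tags': { latest: '18.3.1' },
  versions: {
    '18.2.0': {
      dist: {
        // The .tgz npm actually downloads in this step:
        tarball: 'https://registry.npmjs.org/react/-/react-18.2.0.tgz',
        integrity: 'sha512-...', // elided; checked after download
      },
    },
  },
};

// Look up the tarball URL for the version resolved in step 2.
const version = '18.2.0';
console.log(packument.versions[version].dist.tarball);
```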
Step 5: Extract to node_modules
npm extracts each tarball into node_modules/. But the structure isn't straightforward.
The flat structure (npm v3+):
node_modules/
  react/
  react-dom/
  scheduler/        ← dependency of react-dom
  loose-envify/     ← dependency of react
npm tries to flatten everything to the top level. This avoids deeply nested paths (Windows has a 260-character path limit) and allows shared dependencies.
When flattening fails: If two packages need different versions of the same dependency, npm nests the conflicting one:
node_modules/
  package-a/
  package-b/
    node_modules/
      lodash/       ← package-b needs an older lodash (3.0.0)
  lodash/           ← everyone else uses this (4.0.0)
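The hoisting rule behind this layout can be sketched as follows. This is a rough model under simplifying assumptions (real npm also dedupes by compatible semver ranges, handles peer dependencies, and considers tree order more carefully); here, exact-version match shares, anything else nests:

```javascript
// Each dependency is { name, version, deps }. Place each package at the
// top of node_modules unless a *different* version of the same name is
// already there, in which case nest it under its parent.
function hoist(tree) {
  const top = new Map();   // name -> version living at the node_modules root
  const nested = [];       // [parentName, name, version] placed one level down
  function walk(pkg) {
    for (const dep of pkg.deps || []) {
      const existing = top.get(dep.name);
      if (existing === undefined) {
        top.set(dep.name, dep.version);                 // hoist to top level
      } else if (existing !== dep.version) {
        nested.push([pkg.name, dep.name, dep.version]); // conflict: nest it
      }                                                 // same version: shared
      walk(dep);
    }
  }
  walk({ name: '(root)', deps: tree });
  return { top, nested };
}

const tree = [
  { name: 'package-a', version: '1.0.0',
    deps: [{ name: 'lodash', version: '4.0.0' }] },
  { name: 'package-b', version: '1.0.0',
    deps: [{ name: 'lodash', version: '3.0.0' }] },
];
const { top, nested } = hoist(tree);
console.log(top.get('lodash')); // "4.0.0" -- first one encountered wins the top spot
console.log(nested);            // package-b's lodash 3.0.0 gets nested
```

A side effect worth knowing: which version wins the top-level spot depends on traversal order, which is one reason regenerating a tree without a lockfile can shuffle your node_modules layout.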
Step 6: Run lifecycle scripts
After installation, npm runs scripts in this order:
- preinstall
- install (often compiles native modules with node-gyp)
- postinstall
This is where things sometimes break: native modules need compilers (python, make, g++), and if they're missing, you get cryptic errors.
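These hooks are declared in the package's own package.json. A minimal illustrative example (the package name and echo commands are placeholders; `node-gyp rebuild` is the typical install step for a package shipping a native addon):

```json
{
  "name": "example-native-package",
  "version": "1.0.0",
  "scripts": {
    "preinstall": "echo preparing...",
    "install": "node-gyp rebuild",
    "postinstall": "echo done"
  }
}
```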
Step 7: Write package-lock.json
npm writes the exact resolved tree to package-lock.json. Every package, every version, every integrity hash. This file is the source of truth for reproducible installs.
npm install vs npm ci
npm install:
- Reads package.json
- May update package-lock.json
- Installs missing packages, keeps existing ones

npm ci:
- Reads package-lock.json only
- Deletes node_modules/ entirely
- Installs exactly what the lockfile says
- Faster, deterministic, used in CI/CD
Why node_modules is so big
A typical Next.js project has 200-400MB in node_modules. Why?
- Transitive dependencies. You install 20 packages, they bring 800 friends.
- No deduplication across versions. If three packages need three different versions of the same library, you get three copies.
- Published junk. Many packages include test files, documentation, TypeScript source, and build artifacts that arenβt needed at runtime.
The full timeline
npm install
├── Read package.json (1ms)
├── Resolve dependency tree (200-500ms)
├── Check cache (50ms)
├── Fetch missing packages (1-30s, depends on network)
├── Extract to node_modules (2-10s)
├── Run lifecycle scripts (0-60s)
└── Write package-lock.json (100ms)
Total: anywhere from 3 seconds (everything cached) to 2 minutes (fresh install, slow network, native modules).
Now you know what's happening behind that progress bar.