
Exploring front-end package managers: npm, yarn, and pnpm

author:Flash Gene

introduction

Most languages and platforms have their own package managers: Python has pip, Rust has cargo, and there are others such as rpm and maven.

Modern front-end development is no different. Package managers such as bower, npm, yarn, cnpm, and pnpm simplify how dependencies are referenced and make development more efficient. This article reviews the history of package managers, compares the three mainstream tools (npm, yarn, and pnpm), and discusses their advantages and the scenarios they suit.

Ancient times

Before Node.js, using a third-party library such as jQuery usually meant one of the following:

  • Download the zip archive, decompress it, copy the resource files into the project, and reference them.
  • Reference the resource from a CDN with a script tag in the HTML.

Both approaches caused problems: confused version management, bloated project files, CDN resources that stopped working, and painful dependency upgrades.

With the rise of Node.js and the arrival of modular development, npm appeared. Initially npm was only a package manager for server-side Node.js, but as the front-end community grew, it came to be used in client-side development as well.

So how did package management tools gradually solve the problems above? That story starts with their history.

History of package management tools

npm

npm v1 and v2

In July 2011, npm released version 1.0. What did the node_modules folder look like at that time?

node_modules
└─ foo
   ├─ index.js
   ├─ package.json
   └─ node_modules
      └─ bar
         ├─ index.js
         └─ package.json
           

Problems

  • The node_modules folder is too large. For example, when several packages depend on lodash, lodash is installed several times in node_modules. The disk fills up quickly, and node_modules has to be deleted frequently with rm -rf node_modules.
  • The nesting is too deep. Nesting stops only at a leaf package that has no dependencies of its own, so paths become very long, and on Windows deleting node_modules can fail because of it.
  • Installation is very slow. Part of the reason is the nested directories, and part is the installation logic: packages are processed through a queue, so only one module is downloaded, parsed, and installed at a time.
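The first two problems can be sketched with a small model. The snippet below is only an illustration, assuming a made-up, acyclic dependency graph (the names app, foo, baz, bar, and lodash are hypothetical); it lists every directory a fully nested layout would create.

```python
# Hypothetical dependency graph: package name -> direct dependencies.
# The graph must be acyclic, otherwise the recursion below never ends.
DEPS = {
    "app": ["foo", "baz"],
    "foo": ["bar"],
    "baz": ["bar"],
    "bar": ["lodash"],
    "lodash": [],
}


def nested_install_paths(package, graph, prefix=""):
    """Return every install path a fully nested layout creates below `package`."""
    paths = []
    for dep in graph[package]:
        path = f"{prefix}node_modules/{dep}"
        paths.append(path)
        # Each dependency gets its own private node_modules folder.
        paths.extend(nested_install_paths(dep, graph, prefix=path + "/"))
    return paths


paths = nested_install_paths("app", DEPS)
lodash_copies = [p for p in paths if p.endswith("/lodash")]
deepest = max(paths, key=len)

print(len(lodash_copies))  # number of separate lodash copies on disk
print(deepest)             # the path grows with every level of nesting
```

Even in this tiny graph, lodash and bar are each installed twice, and the deepest path already contains three node_modules segments.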

npm v3

To solve the problems above, the npm team rethought the structure of node_modules and introduced a flattening strategy: dependencies buried in deep layers are hoisted toward the top so that as few nested layers as possible remain. In npm v3, node_modules looks like this:

node_modules
├─ foo
|  ├─ index.js
|  └─ package.json
└─ bar
   ├─ index.js
   └─ package.json
           

The flat strategy does reduce deep nesting and duplicate installations, but it solves only part of the problem. Consider the following layout:

[Figure: node_modules layout in which B 1.0.0 is hoisted to the top level and B 2.0.0 is installed in nested copies]

As you can see, the project depends on both B 1.0.0 and B 2.0.0. Only B 1.0.0 is installed at the top level, so B 2.0.0 is still installed more than once. It could just as well be the other way round, with B 2.0.0 at the top level and B 1.0.0 installed repeatedly. Which version gets hoisted follows a first-come, first-served rule, so the resulting layout is hard to predict.
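The order dependence can be shown with a deliberately simplified model. This is a sketch of the first-come, first-served idea only, not npm's actual algorithm; the package names A, C, and D and the hoist function are hypothetical.

```python
def hoist(requests):
    """Place dependencies with a first-come, first-served flattening rule.

    `requests` is an ordered list of (dependent, dependency, version) tuples.
    Returns the hoisted top-level versions and the copies that stay nested.
    """
    top_level = {}  # dependency name -> version that reached the top level
    nested = []     # (dependent, name, version) copies kept under the dependent
    for dependent, name, version in requests:
        if name not in top_level:
            top_level[name] = version  # first version seen wins the top slot
        elif top_level[name] != version:
            nested.append((dependent, name, version))  # conflict: stay nested
    return top_level, nested


# The same three requirements, processed in two different orders.
order_1 = [("A", "B", "1.0.0"), ("C", "B", "2.0.0"), ("D", "B", "2.0.0")]
order_2 = [("C", "B", "2.0.0"), ("D", "B", "2.0.0"), ("A", "B", "1.0.0")]

top_1, nested_1 = hoist(order_1)
top_2, nested_2 = hoist(order_2)

print(top_1, len(nested_1))
print(top_2, len(nested_2))
```

With the first order, B 1.0.0 is hoisted and B 2.0.0 is duplicated under both C and D; with the second order, B 2.0.0 is hoisted and only one nested copy of B 1.0.0 remains.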

Problems

  • Duplicate installations are not completely eliminated.
  • Ghost dependencies appear. For example, B 1.0.0 is hoisted to the top level during installation; although it is not declared in the project's package.json, the project can still reference it.
  • There is no offline cache mode, and installation is still slow.
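Ghost dependencies follow from the way Node.js looks up packages: it searches the node_modules folder next to the importing file and then each parent directory in turn. The sketch below simulates that lookup on plain strings rather than a real filesystem; the paths, the installed set, and the declared set are hypothetical.

```python
from pathlib import PurePosixPath


def candidate_dirs(importing_file):
    """List the node_modules directories that are searched, nearest first."""
    dirs = []
    for parent in PurePosixPath(importing_file).parents:
        # A directory that is itself named node_modules is skipped.
        if parent.name != "node_modules":
            dirs.append(str(parent / "node_modules"))
    return dirs


def resolve(name, importing_file, installed_dirs):
    """Return the first installed location of `name`, or None if not found."""
    for directory in candidate_dirs(importing_file):
        candidate = f"{directory}/{name}"
        if candidate in installed_dirs:
            return candidate
    return None


# Flat layout: A is a declared dependency, B was only hoisted next to it.
installed = {"/project/node_modules/A", "/project/node_modules/B"}
declared = {"A"}  # what the project's package.json lists

resolved = resolve("B", "/project/src/index.js", installed)
is_ghost = resolved is not None and "B" not in declared

print(resolved, is_ghost)
```

The lookup succeeds for B even though the project never declared it, which is exactly what makes the dependency a ghost: it disappears as soon as the package that pulled it in stops depending on it.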

yarn

yarn addressed many of npm's problems at the root, including inconsistent dependency versions and slow installation.

Resource Consistency Solution: Version locking

Arguably yarn's biggest contribution was the introduction of yarn.lock, which solved the problem of dependency version confusion. npm followed in yarn's footsteps a year later and introduced package-lock.json in npm@5.

Differences between npm V5 and yarn in the way they handle flattening:

// A project has the following dependencies:
node_modules
├─ htmlparser2@^3.10.1
|  └─ entities@^1.1.1
├─ dom-serializer@^0.2.2
|  └─ entities@^2.0.0
└─ entities@^2.1.0
           

After installing the dependencies via npm install, the resulting package-lock.json and node_modules structure is as follows:

[Figure: package-lock.json and node_modules structure produced by npm install]

After installing the dependencies through yarn, the resulting yarn.lock and node_modules structure is as follows:

[Figure: yarn.lock and node_modules structure produced by yarn]

The comparison shows that:

  • In the yarn.lock file, all dependency descriptions are flat and easy to read.
  • In yarn.lock, entries for the same package are merged when their semver ranges can be satisfied by the same version; otherwise the package has several version descriptions.
  • yarn.lock records the exact version of each dependency. Whenever the dependencies are reinstalled, yarn uses the same versions, which keeps dependency versions consistent.
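The effect of a lock file can be sketched in a few lines. This is a simplified model, not yarn's or npm's resolver: ranges are reduced to a major version, versions are tuples, and the install function and registry contents are hypothetical.

```python
def install(requests, registry, lock=None):
    """Resolve `name -> major` requests, reusing locked versions when present.

    Versions are (major, minor, patch) tuples; a request for major N stands in
    for the range ^N.0.0. Returns the lock mapping after installation.
    """
    lock = dict(lock or {})
    for name, major in requests.items():
        key = f"{name}@^{major}"
        if key not in lock:
            # No lock entry yet: pick the newest version within the range.
            candidates = [v for v in registry[name] if v[0] == major]
            lock[key] = max(candidates)
    return lock


requests = {"entities": 2}

# First install: the newest matching version is recorded in the lock.
registry = {"entities": [(1, 1, 2), (2, 0, 0), (2, 1, 0)]}
lock = install(requests, registry)

# Later a newer version is published to the (hypothetical) registry.
registry["entities"].append((2, 2, 0))

with_lock = install(requests, registry, lock)
without_lock = install(requests, registry)

print(with_lock, without_lock)
```

With the lock, the second install still gets 2.1.0; without it, the same package.json silently resolves to 2.2.0, which is the version drift that lock files are meant to prevent.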

SemVer specification

SemVer stands for Semantic Versioning, which specifies the format of package version numbers. A version consists of three parts: the major version number, the minor version number, and the patch number.

[Figure: MAJOR.MINOR.PATCH version format]
  • MAJOR: must be incremented when an API change is not backward compatible and would break existing code.
  • MINOR: is incremented when functionality is added in a backward-compatible way. New features are available, but existing features are unaffected.
  • PATCH: is incremented when a backward-compatible bug fix is made. The new version only fixes bugs from the previous version, adds no new features, and stays compatible with the previous version.
  • Pre-release and build metadata (TAG): appended with the connectors "-" and "+", for example: 2.1.3-beta.1+build3.2
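The version format and the caret ranges used earlier (such as entities@^2.0.0) can be sketched as follows. This is a minimal illustration, not a full SemVer implementation: the regular expression is simplified, pre-release versions are simply rejected by the range check, and the parse and satisfies_caret helpers are hypothetical names.

```python
import re

SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"      # MAJOR.MINOR.PATCH
    r"(?:-([0-9A-Za-z.-]+))?"    # optional pre-release after "-"
    r"(?:\+([0-9A-Za-z.-]+))?$"  # optional build metadata after "+"
)


def parse(version):
    """Split a version string into ((major, minor, patch), prerelease, build)."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch, prerelease, build = match.groups()
    return (int(major), int(minor), int(patch)), prerelease, build


def satisfies_caret(version, base):
    """Check a release `version` against the range `^base`."""
    (v, v_pre, _), (b, _, _) = parse(version), parse(base)
    if v_pre is not None or v < b:
        return False
    # A caret range keeps the leftmost non-zero component fixed.
    if b[0] > 0:
        return v[0] == b[0]
    if b[1] > 0:
        return v[:2] == b[:2]
    return v == b


print(parse("2.1.3-beta.1+build3.2"))
print(satisfies_caret("2.2.0", "2.0.0"), satisfies_caret("2.2.0", "1.1.1"))
```

In this model a single version such as 2.2.0 satisfies both ^2.0.0 and ^2.1.0 but not ^1.1.1, which is why two of the three entities requirements in the earlier example can share one installed copy while the third cannot.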

Did yarn make npm obsolete?

In fact, yarn still downloads the same npm packages; it is simply optimized for the pain points of npm v3:

  • Caching mechanism: yarn stores all dependencies in a global cache directory, while npm uses a decentralized cache directory structure, which makes yarn's cache easier to manage and maintain. yarn also has an offline mode: when you run yarn install, it first tries the local cache, and dependencies that were cached earlier can be installed without a network connection.
  • Parallel installation: yarn was designed with parallel installation in mind. By default it downloads and installs packages in parallel, which makes installation faster.
  • More stable version locking: as analyzed above, the yarn.lock file is flatter and more precise, which minimizes problems caused by multiple versions of the same dependency.

Someone in the community compared the performance of yarn and npm (source: github.com/appleboy/npm-vs-yarn):

                                               npm install   npm ci   yarn
install without cache (without node_modules)   3m            3m       1m
install with cache (without node_modules)      1m            18s      30s
install with cache (with node_modules)         54s           21s      2s
install without internet (with node_modules)   -             -        2s

pnpm

Why is it called the most advanced package management tool?

The pnpm project set out with the following goals:

  • Save disk space
  • Increase installation speed
  • Create a non-flat node_modules directory

Save disk space

When dependencies are installed with npm, projects that share the same dependencies each get their own copy. With pnpm, dependencies are kept in a content-addressable store and linked into projects with hard links (store + hardLink):

  • When a project uses a different version of a dependency, pnpm adds only the files that differ between the versions to the store. For example, if a new version of a dependency changes only one file, pnpm update adds only that one file to the store instead of storing the whole package again.
  • All package files live in the global store directory. When a project installs dependencies, the files are hard-linked into the project, so nothing is installed again and no additional disk space is used.
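The store + hardLink idea can be sketched with the standard library. This is an illustration only, not pnpm's real store layout: the SHA-256 file names, the directory names, and the add_to_store and link_package helpers are assumptions, and hard links require the store and the projects to be on the same filesystem.

```python
import hashlib
import os
import tempfile


def add_to_store(store_dir, content):
    """Store `content` under its SHA-256 digest and return the stored path."""
    digest = hashlib.sha256(content).hexdigest()
    path = os.path.join(store_dir, digest)
    if not os.path.exists(path):  # identical content is written only once
        with open(path, "wb") as handle:
            handle.write(content)
    return path


def link_package(store_dir, package_dir, files):
    """Hard-link every file of a package from the store into `package_dir`."""
    for relative_name, content in files.items():
        target = os.path.join(package_dir, relative_name)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.link(add_to_store(store_dir, content), target)


root = tempfile.mkdtemp()
store = os.path.join(root, "store")
os.makedirs(store)

# Two versions of a hypothetical package foo that differ in one file only.
v1 = {"index.js": b"module.exports = 1\n", "package.json": b'{"version": "1.0.0"}\n'}
v2 = {"index.js": b"module.exports = 1\n", "package.json": b'{"version": "1.0.1"}\n'}

project_a = os.path.join(root, "project-a", "node_modules", "foo")
project_b = os.path.join(root, "project-b", "node_modules", "foo")
link_package(store, project_a, v1)
link_package(store, project_b, v2)

stored_files = len(os.listdir(store))
same_inode = os.path.samefile(
    os.path.join(project_a, "index.js"), os.path.join(project_b, "index.js")
)
print(stored_files, same_inode)
```

Four files are linked into the two projects, but the store holds only three: the unchanged index.js is stored once and both projects point at the same data on disk.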

Increase installation speed

As mentioned above, pnpm manages and installs dependencies with store + hardLink.

  • When several projects use the same dependency, pnpm downloads it only once and references it from each project through hard links.
  • pnpm installs dependencies in parallel, downloading several packages at the same time to speed up installation further.

Create a non-flat node_modules directory

When installing dependencies with npm or yarn, all packages are hoisted to the root of the modules directory. As a result, source code can access packages that the project never declared as dependencies.

First, let's take a look at how pnpm solves the problem of nested dependencies:

-> denotes a symlink (or junction on Windows)

node_modules
├─ foo -> .registry.npmjs.org/foo/1.0.0/node_modules/foo
└─ .registry.npmjs.org
   ├─ foo/1.0.0/node_modules
   |  ├─ bar -> ../../bar/2.0.0/node_modules/bar
   |  └─ foo
   |     ├─ index.js
   |     └─ package.json
   └─ bar/2.0.0/node_modules
      └─ bar
         ├─ index.js
         └─ package.json
           

In node_modules folders created by pnpm, all packages have their own dependencies grouped together, but the directory tree is never as deep as npm@2. pnpm keeps all dependencies flat, but uses symbolic links to group them together.
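The layout above can be reproduced in a temporary directory to see why it works. This is a sketch under assumptions: it uses absolute symlinks instead of the relative ones shown in the tree, it needs a platform where creating symlinks is permitted, and the write_package helper is hypothetical. It relies on the fact that Node.js by default resolves a symlinked package to its real path before looking up that package's own dependencies.

```python
import os
import tempfile

root = tempfile.mkdtemp()
modules = os.path.join(root, "node_modules")
virtual_store = os.path.join(modules, ".registry.npmjs.org")


def write_package(name, version):
    """Create the real package directory inside the virtual store."""
    package_dir = os.path.join(virtual_store, name, version, "node_modules", name)
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "index.js"), "w") as handle:
        handle.write(f"// {name}@{version}\n")
    return package_dir


foo_dir = write_package("foo", "1.0.0")
bar_dir = write_package("bar", "2.0.0")

# foo's dependency bar becomes a sibling of the real foo, via a symlink.
os.symlink(bar_dir, os.path.join(os.path.dirname(foo_dir), "bar"))
# Only the direct dependency foo is exposed at the project root.
os.symlink(foo_dir, os.path.join(modules, "foo"))

# Follow the root symlink to foo's real location, then look for bar there.
real_foo = os.path.realpath(os.path.join(modules, "foo"))
bar_seen_by_foo = os.path.realpath(os.path.join(os.path.dirname(real_foo), "bar"))
bar_visible_at_root = os.path.exists(os.path.join(modules, "bar"))

print(bar_seen_by_foo, bar_visible_at_root)
```

From foo's real location, bar is found in the sibling node_modules folder, while the project root exposes only foo. The project therefore cannot import bar by accident, which removes the ghost dependency problem without deep nesting.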

Performance comparison

[Figure: installation performance benchmark chart]

Limitations of pnpm

Here's a description from the official website:

  1. npm-shrinkwrap.json and package-lock.json are ignored. Unlike pnpm, npm can install the same name@version multiple times with different sets of dependencies. npm's lock file is designed to reflect the flat node_modules layout; because pnpm creates an isolated layout by default, it cannot honor npm's lock file format. If you want to convert a lock file to pnpm's format, take a look at pnpm import (https://pnpm.io/zh/cli/import).
  2. Binstubs (files in node_modules/.bin) are always shell files, not symbolic links to JS files. The shell files are created to help pluggable CLI applications find their plugins in the unusual node_modules structure. This is rarely a problem; if you need the file to be a JS file, reference the original file directly, as described in #736 (https://github.com/pnpm/pnpm/issues/736).

summary

npm, yarn, and pnpm are all excellent package management tools, and which one to choose depends on the team's projects and personal preference. npm is part of the Node.js ecosystem, yarn offers faster dependency installation and version locking, and pnpm focuses on reducing disk usage and installation time.

Attached is a comparison of the three in some features (https://pnpm.io/zh/feature-comparison):

[Figure: feature comparison table of npm, yarn, and pnpm]

Author: Song Yongjie

Source-WeChat public account: Goodme front-end team

Source: https://mp.weixin.qq.com/s/KfGy90i8qoSlIQVbQ7bU8Q
