Clean up the node_modules for a lighter Lambda Function

Any Node.js project carries a bulky folder, node_modules, that holds all the modules and dependencies the application might need. If you peep into that folder, you see a huge number of folders and files. That often makes me wonder: are these really required? Does my application use so much?

Not just that, each of these modules comes with several versions of the code: the dist, the prod, and the elaborate, bulky src folder. Along with that, it has a ton of readme files and license agreements. A few of them even have a photograph of the developers! With due regard to each of these, I feel they are not required on my production deployment. That is a big waste of disk space.

People who deploy on a bare server or an EC2 instance may not mind all of this. Not because cost and compute are free, but because they have already resigned themselves to overprovisioning. So such problems may be a low priority.

But for someone who is conscious of this and goes for Lambda functions, it can be a big concern: each millisecond of compute time is valuable, and so is the memory used.

One may get generous about provisioning RAM, but the deployment package has to restrict to 500MB. An ugly node_modules folder can easily grow well beyond that and put us in trouble. Also, a larger deployment size means longer warmup times. So we should do everything to ensure a compact node_modules folder and get cleaner deployments.

Here are some of the techniques that helped me.

Check the Dependencies

First of all, we have to overcome the shock - why is my node_modules so huge?

{
  "name": "layerjs",
  "version": "1.0.0",
  "description": "Lambda function triggered by event, to generate daily reports",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "aws-sdk": "^2.805.0",
    "jsonwebtoken": "^8.5.1",
    "pdfkit": "^0.11.0",
    "uuid4": "^2.0.2",
    "xlsx": "^0.16.9"
  }
}

Consider, for example, this simple and small package.json. It pulls in a node_modules folder of 117 MB!

$  sudo du -sh node_modules
117M    node_modules

I need to know what is going on here. What does it pull in?

I found a very good tool for this: NPM Graph. It is very simple to use and provides a graphical view of everything that goes into the node_modules. Just drop the package.json in there and it shows all that gets pulled in.

[Figure: NPM Graph view of the layerjs dependencies]

That's HUGE! Let's try to reduce it now.
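For a quick view from the terminal of which top-level folders take up the space, something like the following can help. It is only a sketch and not part of the workflow above: largest_packages is a hypothetical helper name, and it assumes a POSIX shell with du, sort, and head available.

```shell
# List the largest entries directly under a node_modules directory.
# Sizes are in kilobytes; a scoped folder such as @types counts as one entry.
largest_packages() {
  dir="${1:-node_modules}"   # directory to inspect (defaults to ./node_modules)
  count="${2:-10}"           # number of entries to print
  du -sk "$dir"/* | sort -rn | head -n "$count"
}
```

For example, `largest_packages node_modules 5` prints the five largest entries, biggest first.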

AWS SDK modules

This is a very common mistake. A lot of developers who want to test things locally include the AWS SDK in the package.json. That is fine. The problem starts when it gets pushed into our deployment package.

The Lambda runtime environment carries its own AWS SDK. Unless you have to make a lot of tweaks in there and need a highly customized version, it is really not required in your deployment package. This can be achieved simply by making it a dev dependency:

$ npm install PACKAGE --save-dev

This makes the package a dev dependency. We can use it for development and testing, but it is pruned off when we make a production deployment.

We can do the same about many other modules that we need only in our development environment.
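When a package is already listed under dependencies, one way to move it is to remove it and add it back as a dev dependency. The helper below is a minimal sketch of that; move_to_dev is a hypothetical name, and it assumes npm is on the PATH and that you run it from the project root.

```shell
# Move one or more packages from "dependencies" to "devDependencies".
move_to_dev() {
  for pkg in "$@"; do
    # Drop the runtime entry first, then add the package back as a dev dependency.
    npm uninstall "$pkg" && npm install "$pkg" --save-dev || return 1
  done
}
```

For the package.json above, that would be `move_to_dev aws-sdk`. Reinstalling may pick up a newer version than the one recorded before; passing a pinned name such as `aws-sdk@2.805.0` avoids that.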

Production Flag

This follows from the previous one. It is the simplest technique, and yet an often ignored one. Just delete the node_modules folder and install it again using the --production flag.

Any package that we have marked as a dev dependency will not be part of the deployment. Not just that, any package that was pulled in only by those dev dependencies drops off as well.

With this, the package.json becomes

{
  "name": "layerjs",
  "version": "1.0.0",
  "description": "This is the lambda layer generated for the service",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "jsonwebtoken": "^8.5.1",
    "pdfkit": "^0.11.0",
    "uuid4": "^2.0.2",
    "xlsx": "^0.16.9"
  },
  "devDependencies": {
    "aws-sdk": "^2.805.0"
  }
}

Now, we install it with the production flag

$ rm -rf node_modules
$ npm install --production
$ sudo du -sh node_modules
59M     node_modules

Now, the node_modules folder is 59 MB. Note that this reduction is mainly because of the AWS SDK. If everyone had followed good coding practices, this step would have made a huge difference. But... So we may not see miracles here, but it can reduce the deployment size to some extent.

Remove Unnecessary Files

Now that we have dropped the unnecessary packages, we can start cleaning the packages themselves. For that, we have some good utilities.

Node Prune

$ npm install -g node-prune

When we run this in the root folder of the project, it strips out more of what is not useful.

$ node-prune
Before: 59M .
Files: 5696
After: 47M .
Files: 4115

That was good. But it could be better. Let's top it up with other utilities.

ModClean

$ npm install modclean -g

Then, use it to clean up the node_modules

$ modclean -n default:safe,default:caution -r

MODCLEAN  Version 3.0.0-beta.1

✔ Found 689 files to remove
[==============================] 100% (689/689) 0.0s

✔ Found 546 empty directories to remove
[==============================] 100% (546/546) 0.0s


FILES/FOLDERS DELETED
    Total:    1235
    Skipped:  0
    Empty:    546


$

It did some work. Now, the size is 43MB

$ sudo du -sh node_modules
43M     node_modules
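To compare cleanup tools on the same footing, a small wrapper can print the folder size before and after each run. This is a sketch under stated assumptions: with_size_report is a hypothetical helper name, and it relies on du reporting kilobytes with -k.

```shell
# Run a cleanup command and report the directory size before and after (in KB).
with_size_report() {
  dir="$1"; shift                       # first argument: directory to measure
  before=$(du -sk "$dir" | cut -f1)     # size before the command runs
  "$@"                                  # remaining arguments: the command itself
  status=$?
  after=$(du -sk "$dir" | cut -f1)      # size after the command finishes
  echo "$dir: ${before}K -> ${after}K"
  return "$status"
}
```

It could be used as `with_size_report node_modules node-prune`, or with the modclean command shown above in place of node-prune.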

Uglify Code

We have come down from 117MB to 43MB. That is good, but not as much as one would want. Considering the amount of junk in the node_modules folder, we need something better. White space still occupies a lot of that space, so we work on that next. Uglifying code certainly reduces the file size.

There are several node modules that can help you uglify code, but a lot of them are not compatible with ES2015 and above. Uglify ES is a good one. Let's start by installing it.

$ npm install uglify-es -g

With this in, let's uglify each JavaScript file in the node_modules folder.

find node_modules -name '*.js' | while read -r a
do
  echo "$a"
  uglifyjs "$a" -o "$a"
done

This takes a long time, as it has to access and analyze each JS file in there.

At times, this generates a heap overflow error. It seems that, because uglifyjs is asynchronous, running it in a loop ends up with too many instances at work, which causes trouble. Adding a sleep 1 in the loop can solve the problem, but it increases the runtime further. In any case, it is worth the effort.
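Since the loop writes each result over its own input, a run that fails midway could leave a damaged file behind. A more defensive variant, sketched below under a few assumptions, writes to a temporary file first and replaces the original only when the minifier succeeds. minify_tree is a hypothetical helper name; the minifier command is passed in as an argument and is assumed to accept the `INPUT -o OUTPUT` form used with uglifyjs above.

```shell
# Minify every .js file under a directory, keeping the original when minification fails.
minify_tree() {
  dir="$1"; shift                               # remaining arguments: minifier command
  find "$dir" -type f -name '*.js' ! -name '*.min.js' | while IFS= read -r file; do
    out="$file.tmp.$$"                          # temporary output next to the source file
    if "$@" "$file" -o "$out" && [ -s "$out" ]; then
      cat "$out" > "$file"                      # overwrite contents, keep file permissions
    else
      echo "skipped: $file" >&2                 # leave the original untouched
    fi
    rm -f "$out"
  done
}
```

With uglify-es installed, this would be called as `minify_tree node_modules uglifyjs`. Files that already end in .min.js are skipped, on the assumption that they are minified already.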

$ sudo du -sh node_modules
37M     node_modules

There, now we have 37MB. That is good! It reduces my warmup time.

anatawa12

In my opinion,

$ npm prune --production

is better for removing devDependencies, because npm install downloads dependencies, whereas npm prune only removes devDependencies and doesn't download anything.

Small Benchmark with your example package.json:

$ cat package.json
{
  "name": "layerjs",
  "version": "1.0.0",
  "description": "This is the lambda layer generated for the service",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "jsonwebtoken": "^8.5.1",
    "pdfkit": "^0.11.0",
    "uuid4": "^2.0.2",
    "xlsx": "^0.16.9"
  },
  "devDependencies": {
    "aws-sdk": "^2.805.0"
  }
}
$ npm install # reinstall all dependencies
...
$ rm -rf node_modules
$ time npm install --production
...
added 146 packages from 120 contributors and audited 158 packages in 3.491s
...
npm install --production  3.29s user 1.81s system 99% cpu 5.122 total
$ npm install # reinstall all Dependencies
...
$ time npm prune --production
...
removed 12 packages and audited 146 packages in 1.3s
...
npm prune --production  1.52s user 0.44s system 95% cpu 2.054 total

Vikas Solegaonkar

Thanks for the insight. What I showed above is just a reminder to delete the node_modules that was already collected in the previous step.

Of course, in real life, we don't have to install, delete, and install once again. A simple npm install --production is enough. Perhaps I need to elaborate on that point.

Thanks for pointing that out.

Amaan Ahmad

Hey, that's a really awesome amount of value you shared with the community! Thanks for it, saved for later! It was much needed 😄

Abdullah T

Nice article, I didn't know about the prune. Normally I just use webpack and serverless framework and run the optimisations on webpack itself.

but the deployment package has to restrict to 500MB

I think you mean 250 MB.

From the docs, the deployment package can be:

  • 50 MB (zipped, for direct upload)
  • 250 MB (unzipped, including layers)
  • 3 MB (console editor)

or now up to 10 GB if you are using a container image.

Vikas Solegaonkar

Thanks for your inputs. docs.aws.amazon.com/lambda/latest/dg/runtim..

500 MB (512 to be precise) is the net allowed, including the runtime itself, that is, if you take up a custom runtime.

Of course, things have changed with the container in Lambda. But that has impact on the pricing as well.

The serverless framework itself brings in so much to the deployment - it ends up blowing up the deployment size.

I had not tried webpack before. I tried it after your comment, but I saw it had problems with non-JS code in the node_modules. It was wonderful with pure JS code. Is there any way to overcome that?