Zipping Large S3 Folders and Files Using a Node.js Lambda and EFS
By Hyuntaek Park
Senior full-stack engineer at Twigfarm
AWS S3 is a very convenient cloud storage service. You can upload and download files easily in various ways with the AWS CLI, SDKs, APIs, and so on. But can you download an entire folder, with its sub-folders and files, recursively? Unfortunately, S3 does not provide such a feature, so we need to develop our own way to recursively zip a folder and make the zip file available for download.
Requirements
Our goal is to zip entire folders, their sub-folders, and the files under them in our S3 bucket while preserving the folder tree structure. The files can be large (> 512 MB, which is the size of Lambda's temporary storage).
How files are treated in S3
We have created folders and uploaded files in our S3 bucket as follows.
However, to be precise, these are not folders in S3. There are just four files with the following keys:
- folder1/sub1/image.png
- folder1/sub2/test.txt
- folder2/large.mov
- folder2/test2.pdf
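You can see this by listing the bucket with the AWS SDK. Here is a minimal sketch (assuming the AWS SDK v3 S3 client); it returns the four flat keys, with no folder objects:

```javascript
// Sketch: listing the bucket returns flat keys, not folder objects.
// Assumes the AWS SDK v3 S3 client (@aws-sdk/client-s3) is installed.
const { S3Client, ListObjectsV2Command } = require('@aws-sdk/client-s3');

const s3 = new S3Client({});

async function listKeys(bucket) {
  const { Contents = [] } = await s3.send(new ListObjectsV2Command({ Bucket: bucket }));
  return Contents.map((obj) => obj.Key);
}

// listKeys('YOUR_BUCKET_NAME') resolves to something like:
// [ 'folder1/sub1/image.png', 'folder1/sub2/test.txt',
//   'folder2/large.mov', 'folder2/test2.pdf' ]
```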
Solution
Although S3 does not have a concept of folders, the key of each file carries the folder information as a prefix. Each folder level is delimited by ‘/’ and followed by the file name (e.g., folder1/sub1/image.png).
Using the folder information in each key’s prefix, we can create the corresponding folders in EFS and then download the file from S3 into them.
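For example, a key’s prefix can be turned into a directory path under the EFS mount before the object is downloaded into it. A minimal sketch (efsRoot stands for the EFS mount path, which we configure later in this article):

```javascript
// Sketch: recreate a key's folder prefix as directories; the object can then be
// downloaded to the returned path. efsRoot is the EFS mount path inside Lambda.
const fs = require('fs');
const path = require('path');

function localPathForKey(efsRoot, key) {
  const localPath = path.join(efsRoot, key);                  // e.g. <efsRoot>/folder1/sub1/image.png
  fs.mkdirSync(path.dirname(localPath), { recursive: true }); // like `mkdir -p folder1/sub1`
  return localPath;
}
```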
Then the Lambda function simply does the zipping and uploads the zip file back to S3. The following diagram shows the sequence of our implementation and how files are represented differently in S3 and EFS.
One thing to keep in mind is that our Lambda and the EFS must be in the same VPC.
Create EFS (Elastic File System) and access point
There are a couple of reasons why Amazon EFS comes in handy.
- EFS works just like a Linux file system. You can use file commands such as mkdir, ls, cp, rm, etc.
- Lambda’s temporary storage has a size limit of 512 MB, which is not enough for our large files.
Let’s create an EFS. Go to Elastic File System in AWS console and click Create file system.
Then click Create.
Now it is time to create an access point, which will be used in the Lambda function later. Choose the file system we just created, then click Access points –> Create access point.
Here are the input values you should enter:
- Root directory path: /efs
- POSIX user
- User ID: 1000
- Group ID: 1000
- Root directory creation permissions
- Owner user ID: 1000
- Owner group ID: 1000
- POSIX permissions to apply to the root directory path: 0777
Create and configure EFS attached Lambda function
Let’s create a Node.js Lambda function as follows:
Once the Lambda function is created, click Configuration –> File systems –> Add file system.
Choose the EFS access point that we have just created and enter /mnt/efs for Local mount path. This is important because /mnt/efs will be your EFS folder inside the Lambda function.
Click Save; now you have access to /mnt/efs from the Lambda function.
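As a quick sanity check, a minimal handler sketch like the following can read and write under /mnt/efs as if it were an ordinary directory (the work sub-folder name is just an example):

```javascript
// Sketch: the EFS mount behaves like an ordinary directory inside the handler.
const fs = require('fs');

exports.handler = async () => {
  fs.mkdirSync('/mnt/efs/work', { recursive: true });        // 'work' is just an example folder
  fs.writeFileSync('/mnt/efs/work/hello.txt', 'hello from Lambda');
  return fs.readdirSync('/mnt/efs/work');                    // => [ 'hello.txt' ]
};
```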
Access to S3 from Lambda
VPC Endpoints
According to https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints.html,
A VPC endpoint enables connections between a virtual private cloud (VPC) and supported services, without requiring that you use an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
To access S3 buckets from Lambda functions inside a VPC, we need to set up a VPC endpoint for S3. Go to VPC and click Endpoints –> Create endpoint, then fill in the inputs as follows:
Then click Create endpoint. Technically the Lambda functions within the VPC can reach S3 now, but one more step is required to actually access a specific S3 bucket.
Lambda role
An execution role was created along with our Lambda function. You can use an existing role instead, but here we use the newly created one. Go to our Lambda function, click Configuration –> Permissions, then choose the role under Execution role.
Then go to Permissions policies –> click Add permissions –> Create inline policy. On the next screen, choose the JSON tab, then copy and paste the following, replacing YOUR_BUCKET_NAME with your own bucket name.
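An inline policy along these lines works; this is a minimal sketch granting only the bucket-level list action and the object-level get / put actions the function needs, so adjust it to your own requirements:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}
```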
Click Review policy. Enter the policy name you like and then click Create policy.
More Lambda configuration
Since downloading takes time and file sizes can be hundreds of megabytes, Lambda’s default memory size (128 MB) and timeout (3 seconds) are not enough. For this demonstration, the memory size and timeout are set to 4096 MB and 2 minutes, respectively, in Configuration –> General configuration.
Lambda code
Here’s the final Lambda code. The code implements what we have discussed.
- Copies folders / files from S3 to EFS
- Zips the downloaded files in EFS
- Uploads the zip file back to S3
- Removes the temporary EFS files
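Below is a minimal sketch of such a handler, assuming a Node.js 18+ runtime with the AWS SDK v3 S3 client and the archiver package bundled with the function; the bucket name, the work folder, and the my-archive.zip file name are placeholders.

```javascript
// A minimal sketch of the handler described above (not the exact original code).
// Assumes @aws-sdk/client-s3 and archiver are bundled with the function,
// the EFS access point is mounted at /mnt/efs, and YOUR_BUCKET_NAME is a placeholder.
const {
  S3Client,
  ListObjectsV2Command,
  GetObjectCommand,
  PutObjectCommand,
} = require('@aws-sdk/client-s3');
const archiver = require('archiver');
const fs = require('fs');
const path = require('path');
const { pipeline } = require('stream/promises');

const s3 = new S3Client({});
const BUCKET = 'YOUR_BUCKET_NAME';
const EFS_ROOT = '/mnt/efs';

exports.handler = async () => {
  const workDir = path.join(EFS_ROOT, 'work');
  const zipPath = path.join(EFS_ROOT, 'my-archive.zip');

  // 1. Copy folders / files from S3 to EFS, preserving each key's folder structure.
  //    (ListObjectsV2 returns up to 1,000 keys; paginate for larger buckets.)
  const { Contents = [] } = await s3.send(new ListObjectsV2Command({ Bucket: BUCKET }));
  for (const { Key } of Contents) {
    if (Key.endsWith('/')) continue; // skip zero-byte "folder" placeholder objects
    const localPath = path.join(workDir, Key);
    fs.mkdirSync(path.dirname(localPath), { recursive: true });
    const { Body } = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key }));
    await pipeline(Body, fs.createWriteStream(localPath)); // stream to EFS, not to memory
  }

  // 2. Zip the downloaded tree in EFS with archiver.
  const output = fs.createWriteStream(zipPath);
  const archive = archiver('zip', { zlib: { level: 9 } });
  const finished = new Promise((resolve, reject) => {
    output.on('close', resolve);
    archive.on('error', reject);
  });
  archive.pipe(output);
  archive.directory(workDir, false); // keep the folder tree, drop the work-dir prefix
  await archive.finalize();
  await finished;

  // 3. Upload the zip file back to S3.
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: 'my-archive.zip',
    Body: fs.createReadStream(zipPath),
    ContentLength: fs.statSync(zipPath).size,
  }));

  // 4. Remove the temporary EFS files.
  fs.rmSync(workDir, { recursive: true, force: true });
  fs.rmSync(zipPath, { force: true });

  return { statusCode: 200, body: 'my-archive.zip created' };
};
```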
I hope the code itself is self-explanatory. One thing to mention is that we used an open-source Node.js package called archiver for zipping folders and files; there are many ways to zip files in Node.js, so choose whatever suits you best.
Obviously there should be try / catch blocks to deal with error cases, but we omit them here for simplicity.
Results
Let’s go check our S3 bucket.
As you can see, there is a new zip file called my-archive.zip. Let’s click the file name, then download and unzip the file.
The folder and file structure is exactly the same as the one shown at the top of this article.
We had to follow many steps to meet this simple requirement of zipping folders and files in S3, but they are pretty standard when you have to deal with AWS:
- Create and launch the AWS services
- Give appropriate permissions
- Execute the logic
It took a while for me to get used to it! :)
Thanks for reading.