Zipping Large S3 Folders and Files Using a Node.js Lambda and EFS
By Hyuntaek Park
Senior full-stack engineer at Twigfarm
AWS S3 is a very convenient cloud storage service. You can upload and download files in various ways: with the AWS CLI, SDK, API, and so on. But can you download an entire folder, with its sub-folders and files, recursively? Notably, S3 does not provide such a feature. We need to develop our own way to recursively zip the folder and make the zip file available for download.
Requirements
Our goal is to zip entire folders, sub-folders, and the files under them in our S3 bucket while preserving the folder structure. The files can be large (> 512 MB, which is the size of Lambda's temporary storage).
How files are stored in S3
We have created folders and stored files in our S3 bucket as follows.
However, to be precise, there are no folders in S3. There are just four files with the following keys:
- folder1/sub1/image.png
- folder1/sub2/test.txt
- folder2/large.mov
- folder2/test2.pdf
Solution
Remember, S3 does not have a concept of folders; the key of each file carries the folder information as a prefix. Each folder level is delimited by '/', followed by the file name (e.g., folder1/sub1/image.png).
Using the folder prefix in each key, we can create the corresponding directories in EFS and then download the file from S3 into them.
Then the Lambda simply does the zipping and uploads the zip file back to S3. The following diagram shows the sequence of our implementation and how files are laid out in S3 and EFS.
One thing to keep in mind is that our Lambda and the EFS must be in the same VPC.
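To make the key-to-folder mapping concrete, here is a minimal sketch in Node.js; the sample key and the /mnt/efs mount path (configured later in this article) are assumptions for illustration:

```javascript
// Minimal sketch: turn an S3 key's folder prefix into a directory tree on EFS.
// The sample key and the /mnt/efs mount path are illustrative assumptions.
const fs = require('fs');
const path = require('path');

const key = 'folder1/sub1/image.png';
const localPath = path.join('/mnt/efs', key);               // /mnt/efs/folder1/sub1/image.png
fs.mkdirSync(path.dirname(localPath), { recursive: true }); // creates folder1/sub1 if missing
// The object downloaded from S3 can then be written to localPath.
```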
Create EFS (Elastic File System) and access point
There are a couple of reasons why Amazon EFS comes in handy.
- EFS behaves just like a Linux file system. You can use file commands such as mkdir, ls, cp, rm, etc.
- Lambda's temporary storage has a size limit of 512 MB, which is not enough for our large files.
Let's create an EFS. Go to Elastic file system in the AWS console and click Create file system.
Then click create.
Now it is time to create an access point, which will be used by the Lambda function later. Choose the file system we just created, then click Access points —> Create access point.
Here are the input values you should enter (an SDK equivalent is sketched after the list):
- Root directory path: /efs
- POSIX users
- User ID: 1000
- Group ID: 1000
- Root directory creation permissions
- Owner user ID: 1000
- Owner Group ID: 1000
- POSIX permissions to apply to the root directory path: 0777
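If you prefer to script this step rather than clicking through the console, a rough equivalent with the AWS SDK for JavaScript might look like the sketch below; the file system ID is a placeholder:

```javascript
// Sketch: create the EFS access point with the AWS SDK instead of the console.
// 'fs-12345678' is a placeholder file system ID.
const AWS = require('aws-sdk');
const efs = new AWS.EFS();

efs.createAccessPoint({
  FileSystemId: 'fs-12345678',
  PosixUser: { Uid: 1000, Gid: 1000 },
  RootDirectory: {
    Path: '/efs',
    CreationInfo: { OwnerUid: 1000, OwnerGid: 1000, Permissions: '0777' },
  },
}).promise().then((res) => console.log(res.AccessPointArn));
```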
Create and configure EFS attached Lambda function
Let's create a Node.js Lambda function as follows:
Once the Lambda function is created, click configuration —> File systems —> Add file system.
Choose the EFS access point that we have just created and enter /mnt/efs for the local mount path. This is important because /mnt/efs will be your EFS folder.
Click Save, and now you have access to /mnt/efs from the Lambda function.
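The same attachment can be done with the SDK; in the sketch below the function name and access point ARN are placeholders:

```javascript
// Sketch: attach the EFS access point to the Lambda programmatically.
// The function name and access point ARN are placeholders.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

lambda.updateFunctionConfiguration({
  FunctionName: 'zip-s3-folder',
  FileSystemConfigs: [{
    Arn: 'arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0123456789abcdef0',
    LocalMountPath: '/mnt/efs',
  }],
}).promise().then(() => console.log('EFS attached'));
```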
Access to S3 from Lambda
VPC Endpoints
According to https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints.html,
A VPC endpoint enables connections between a virtual private cloud (VPC) and supported services, without requiring that you use an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
To access S3 buckets from Lambdas inside a VPC, we need to set up a VPC endpoint for S3. Go to VPC and click Endpoints —> Create endpoint, then select the inputs as follows:
Then click Create endpoint. The Lambdas within the VPC can reach S3 now, but one more step is required to actually access a specific S3 bucket.
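If you want to create the gateway endpoint from code instead of the console, a sketch with the AWS SDK (the VPC ID, route table ID, and region are placeholders) could look like this:

```javascript
// Sketch: create a gateway VPC endpoint for S3 so Lambdas in the VPC can reach S3.
// The VPC ID, route table ID, and region are placeholders.
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2();

ec2.createVpcEndpoint({
  VpcEndpointType: 'Gateway',
  VpcId: 'vpc-0123456789abcdef0',
  ServiceName: 'com.amazonaws.us-east-1.s3',
  RouteTableIds: ['rtb-0123456789abcdef0'],
}).promise().then((res) => console.log(res.VpcEndpoint.VpcEndpointId));
```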
Lambda role
A Lambda role is created when we create our Lambda function. You can use an existing role for the Lambda; here we just created a new one. Go to our Lambda function and click configuration —> Permissions, then choose the role under Execution role.
Then go to Permissions policies —> Add permissions —> Create inline policy. On the next screen, choose the JSON tab, then copy and paste the following, replacing YOUR_BUCKET_NAME with your own bucket name.
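The exact policy from the original screenshot isn't reproduced here; a minimal policy granting the Lambda list, read, and write access to the bucket might look like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}
```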
Click Review policy. Enter the policy name you like and then click Create policy.
More Lambda configurations
Since zipping takes time and file sizes can run into the hundreds of megabytes, Lambda's default memory size (128 MB) and timeout (3 seconds) are not enough. For this purpose, set the memory size to 4096 MB and the timeout to 2 minutes under configuration —> General configuration.
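These two settings can also be applied with the same updateFunctionConfiguration call used earlier; the function name remains a placeholder:

```javascript
// Sketch: raise the memory size and timeout programmatically.
// 'zip-s3-folder' is a placeholder function name.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

lambda.updateFunctionConfiguration({
  FunctionName: 'zip-s3-folder',
  MemorySize: 4096, // MB
  Timeout: 120,     // seconds (2 minutes)
}).promise();
```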
Lambda code
Here's the final Lambda code. The code implements what we have described:
- Copies folders/files from S3 to EFS
- Zips the files stored in EFS
- Uploads the zip file back to S3
- Deletes the temporary EFS files
I hope the code is self-explanatory. Just one thing to mention: we used an open-source Node.js package called archiver for zipping folders and files. There are many ways to zip files in Node.js; you can choose whatever suits you best.
Strictly speaking, there should be try/catch blocks to deal with error cases, but here we just omit them for simplicity.
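The original listing isn't reproduced above, so here is a condensed sketch of what such a handler could look like; it assumes the aws-sdk v2 bundled with the Node.js (14+) Lambda runtime, the archiver package deployed with the function, and hypothetical bucket, prefix, and zip-name values passed in the event:

```javascript
// Condensed sketch of the handler: copy a folder prefix from S3 to EFS,
// zip it with archiver, upload the zip back to S3, then clean up EFS.
// The bucket, prefix, and zip key come from the event and are illustrative assumptions.
const AWS = require('aws-sdk');
const archiver = require('archiver');
const fs = require('fs');
const path = require('path');

const s3 = new AWS.S3();
const EFS_ROOT = '/mnt/efs';

exports.handler = async (event) => {
  const { bucket, prefix, zipKey } = event; // e.g. { bucket: 'my-bucket', prefix: 'folder1/', zipKey: 'my-archive.zip' }
  const workDir = path.join(EFS_ROOT, 'work');
  const zipPath = path.join(EFS_ROOT, zipKey);

  // 1. Copy every object under the prefix from S3 to EFS, recreating the folder tree.
  //    (Pagination beyond 1,000 keys is not handled; very large objects could instead be
  //    streamed with s3.getObject(...).createReadStream() to avoid buffering in memory.)
  const { Contents } = await s3.listObjectsV2({ Bucket: bucket, Prefix: prefix }).promise();
  for (const { Key } of Contents) {
    if (Key.endsWith('/')) continue; // skip zero-byte "folder" markers
    const localPath = path.join(workDir, Key);
    fs.mkdirSync(path.dirname(localPath), { recursive: true });
    const { Body } = await s3.getObject({ Bucket: bucket, Key }).promise();
    fs.writeFileSync(localPath, Body);
  }

  // 2. Zip the files stored in EFS, preserving the folder structure.
  await new Promise((resolve, reject) => {
    const output = fs.createWriteStream(zipPath);
    const archive = archiver('zip');
    output.on('close', resolve);
    archive.on('error', reject);
    archive.pipe(output);
    archive.directory(workDir, false);
    archive.finalize();
  });

  // 3. Upload the zip file back to S3.
  await s3.upload({ Bucket: bucket, Key: zipKey, Body: fs.createReadStream(zipPath) }).promise();

  // 4. Delete the temporary EFS files.
  fs.rmSync(workDir, { recursive: true, force: true });
  fs.rmSync(zipPath, { force: true });

  return { statusCode: 200, body: `${zipKey} uploaded` };
};
```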
Results
Let's go check our S3 bucket.
As you can see, there is a new zip file called my-archive.zip. Let's click the file name, download it, and unzip it.
The folder and file structure is exactly the same as the one at the top of this article.
We had many steps to follow to achieve this simple requirement, zipping folders and files in S3, but they are pretty standard when you have to deal with AWS.
- Create and launch AWS services
- Give appropriate permissions
- Execute the logic
It took a while for me to get used to it! :)
Thanks for reading.