Zipping Large S3 Folders and Files Using a Node.js Lambda and EFS
By Hyuntaek Park
Senior full-stack engineer at Twigfarm
AWS S3 is a very convenient cloud storage service. You can upload and download files easily in various ways with the AWS CLI, SDKs, APIs, and so on. But can you download an entire folder, with its sub-folders and files, recursively? Unfortunately, S3 does not provide such a feature, so we need to develop our own way to recursively zip a folder and make the zip file available for download.
Requirements
Our goal is to zip entire folders, their sub-folders, and the files under them in our S3 bucket while preserving the folder tree structure. The files can be large (> 512 MB, which is the size of Lambda's temporary storage).
How files are treated in S3
We have created folders and uploaded files in our S3 bucket as follows.
However, to be precise, these are not folders in S3. There are just four files with the following keys:
- folder1/sub1/image.png
- folder1/sub2/test.txt
- folder2/large.mov
- folder2/test2.pdf
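You can see this by listing the bucket with the AWS SDK. Here is a minimal sketch (assuming the AWS SDK v3 S3 client); it returns the four flat keys, with no folder objects:

```javascript
// Sketch: listing the bucket returns flat keys, not folder objects.
// Assumes the AWS SDK v3 S3 client (@aws-sdk/client-s3) is installed.
const { S3Client, ListObjectsV2Command } = require('@aws-sdk/client-s3');

const s3 = new S3Client({});

async function listKeys(bucket) {
  const { Contents = [] } = await s3.send(new ListObjectsV2Command({ Bucket: bucket }));
  return Contents.map((obj) => obj.Key);
}

// listKeys('YOUR_BUCKET_NAME') resolves to something like:
// [ 'folder1/sub1/image.png', 'folder1/sub2/test.txt',
//   'folder2/large.mov', 'folder2/test2.pdf' ]
```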
Solution
Although S3 does not have a concept of folders, the key of each file carries the folder information as a prefix. Each folder level is delimited by ‘/’ and followed by the file name (e.g., folder1/sub1/image.png).
Using the folder information in each key’s prefix, we can create the corresponding folders in EFS and then download the file from S3 into them.
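For example, a key’s prefix can be turned into a directory path under the EFS mount before the object is downloaded into it. A minimal sketch (efsRoot stands for the EFS mount path, which we configure later in this article):

```javascript
// Sketch: recreate a key's folder prefix as directories; the object can then be
// downloaded to the returned path. efsRoot is the EFS mount path inside Lambda.
const fs = require('fs');
const path = require('path');

function localPathForKey(efsRoot, key) {
  const localPath = path.join(efsRoot, key);                  // e.g. <efsRoot>/folder1/sub1/image.png
  fs.mkdirSync(path.dirname(localPath), { recursive: true }); // like `mkdir -p folder1/sub1`
  return localPath;
}
```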
Then the Lambda function simply does the zipping and uploads the zip file back to S3. The following diagram shows the sequence of our implementation and how files are represented differently in S3 and EFS.
One thing to keep in mind is that our Lambda and the EFS must be in the same VPC.
Create EFS (Elastic File System) and access point
There are a couple of reasons why Amazon EFS comes in handy.
- EFS works just like a Linux file system. You can use file commands such as mkdir, ls, cp, rm, etc.
- Lambda’s temporary storage has a size limit of 512 MB, which is not enough for our large files.
Let’s create an EFS. Go to Elastic File System in AWS console and click Create file system.
Then click Create.
Now it is time to create an access point, which will be used in the Lambda function later. Choose the file system we just created, then click Access points –> Create access point.
Here are the input values you should enter:
- Root directory path: /efs
- POSIX user
- User ID: 1000
- Group ID: 1000
- Root directory creation permissions
- Owner user ID: 1000
- Owner group ID: 1000
- POSIX permissions to apply to the root directory path: 0777
Create and configure EFS attached Lambda function
Let’s create a Node.js Lambda function as follows:
Once the Lambda function is created, click Configuration –> File systems –> Add file system.
Choose the EFS access point that we have just created and enter /mnt/efs for Local mount path. This is important because /mnt/efs will be your EFS folder inside the Lambda function.
Click Save; now you have access to /mnt/efs from the Lambda function.
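As a quick sanity check, a minimal handler sketch like the following can read and write under /mnt/efs as if it were an ordinary directory (the work sub-folder name is just an example):

```javascript
// Sketch: the EFS mount behaves like an ordinary directory inside the handler.
const fs = require('fs');

exports.handler = async () => {
  fs.mkdirSync('/mnt/efs/work', { recursive: true });        // 'work' is just an example folder
  fs.writeFileSync('/mnt/efs/work/hello.txt', 'hello from Lambda');
  return fs.readdirSync('/mnt/efs/work');                    // => [ 'hello.txt' ]
};
```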
Access to S3 from Lambda
VPC Endpoints
According to https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints.html,
A VPC endpoint enables connections between a virtual private cloud (VPC) and supported services, without requiring that you use an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
To access S3 buckets from Lambda functions inside a VPC, we need to set up a VPC endpoint for S3. Go to VPC and click Endpoints –> Create endpoint, then fill in the inputs as follows:
Then click Create endpoint. Technically the Lambda functions within the VPC can reach S3 now, but one more step is required to actually access a specific S3 bucket.
Lambda role
An execution role was created along with our Lambda function. You can use an existing role instead, but here we use the newly created one. Go to our Lambda function, click Configuration –> Permissions, then choose the role under Execution role.
Then go to Permissions policies –> click Add permissions –> Create inline policy. On the next screen, choose the JSON tab, then copy and paste the following, replacing YOUR_BUCKET_NAME with your own bucket name.
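An inline policy along these lines works; this is a minimal sketch granting only the bucket-level list action and the object-level get / put actions the function needs, so adjust it to your own requirements:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}
```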
Click Review policy. Enter the policy name you like and then click Create policy.
More Lambda configuration
Since downloading takes time and file sizes can be hundreds of megabytes, Lambda’s default memory size (128 MB) and timeout (3 seconds) are not enough. For this demonstration, the memory size and timeout are set to 4096 MB and 2 minutes, respectively, in Configuration –> General configuration.
Lambda code
Here’s the final Lambda code. The code implements what we have discussed.
- Copies folders / files from S3 to EFS
- Zips the downloaded files in EFS
- Uploads the zip file back to S3
- Removes the temporary EFS files
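Below is a minimal sketch of such a handler, assuming a Node.js 18+ runtime with the AWS SDK v3 S3 client and the archiver package bundled with the function; the bucket name, the work folder, and the my-archive.zip file name are placeholders.

```javascript
// A minimal sketch of the handler described above (not the exact original code).
// Assumes @aws-sdk/client-s3 and archiver are bundled with the function,
// the EFS access point is mounted at /mnt/efs, and YOUR_BUCKET_NAME is a placeholder.
const {
  S3Client,
  ListObjectsV2Command,
  GetObjectCommand,
  PutObjectCommand,
} = require('@aws-sdk/client-s3');
const archiver = require('archiver');
const fs = require('fs');
const path = require('path');
const { pipeline } = require('stream/promises');

const s3 = new S3Client({});
const BUCKET = 'YOUR_BUCKET_NAME';
const EFS_ROOT = '/mnt/efs';

exports.handler = async () => {
  const workDir = path.join(EFS_ROOT, 'work');
  const zipPath = path.join(EFS_ROOT, 'my-archive.zip');

  // 1. Copy folders / files from S3 to EFS, preserving each key's folder structure.
  //    (ListObjectsV2 returns up to 1,000 keys; paginate for larger buckets.)
  const { Contents = [] } = await s3.send(new ListObjectsV2Command({ Bucket: BUCKET }));
  for (const { Key } of Contents) {
    if (Key.endsWith('/')) continue; // skip zero-byte "folder" placeholder objects
    const localPath = path.join(workDir, Key);
    fs.mkdirSync(path.dirname(localPath), { recursive: true });
    const { Body } = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key }));
    await pipeline(Body, fs.createWriteStream(localPath)); // stream to EFS, not to memory
  }

  // 2. Zip the downloaded tree in EFS with archiver.
  const output = fs.createWriteStream(zipPath);
  const archive = archiver('zip', { zlib: { level: 9 } });
  const finished = new Promise((resolve, reject) => {
    output.on('close', resolve);
    archive.on('error', reject);
  });
  archive.pipe(output);
  archive.directory(workDir, false); // keep the folder tree, drop the work-dir prefix
  await archive.finalize();
  await finished;

  // 3. Upload the zip file back to S3.
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: 'my-archive.zip',
    Body: fs.createReadStream(zipPath),
    ContentLength: fs.statSync(zipPath).size,
  }));

  // 4. Remove the temporary EFS files.
  fs.rmSync(workDir, { recursive: true, force: true });
  fs.rmSync(zipPath, { force: true });

  return { statusCode: 200, body: 'my-archive.zip created' };
};
```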
I hope the code itself is self-explanatory. One thing to mention is that we used an open-source Node.js package called archiver for zipping folders and files; there are many ways to zip files in Node.js, so choose whatever suits you best.
Obviously there should be try / catch blocks to deal with error cases, but we omit them here for simplicity.
Results
Let’s go check our S3 bucket.
As you can see, there is a new zip file called my-archive.zip. Let’s click the file name, then download and unzip the file.
The folder and file structure is exactly the same as the one shown at the top of this article.
We had to follow many steps to meet this simple requirement of zipping folders and files in S3, but they are pretty standard when you have to deal with AWS:
- Create and launch the AWS services
- Give appropriate permissions
- Execute the logic
It took a while for me to get used to it! :)
Thanks for reading.