AWS Knowledge Series: Serverless Image Processing With S3 and AWS Lambda

Sanjay Dandekar
Dec 10, 2020

In this article we will see how you can build a serverless image processor using S3 and Lambda. Let's first look at a use case where serverless image processing is required. Assume you are building a photo album application where people can upload their photos and then access them on the web or in a mobile application.

Such applications very often require that, once the original image is uploaded to the server, it is processed further for purposes such as:

  • Resizing images to make them suitable for display across multiple form factors (web vs. mobile)
  • Adding watermarks
  • Image metadata extraction — geolocation, camera, exposure info, etc.
  • Face / object detection
  • Intelligent cropping
  • Nudity / graphic violence detection and reporting

The diagram below depicts a very simplistic implementation of the online photo album use case described above.

This is the complete data flow:

  1. The user uploads a photo using the web or mobile application, and it is stored in an S3 bucket.
  2. The S3 bucket has an "object created" trigger configured, which invokes a Lambda function whenever a new image is added to the bucket.
  3. The Lambda function performs the required post-processing of the uploaded image.
  4. After post-processing the image, the Lambda function stores the processed image in another S3 bucket (we will talk about why we need a second S3 bucket in a while).
  5. A CloudFront distribution then serves the processed images to the web and mobile applications.

This article focuses on the data flow and processing that happens in the highlighted red dotted box above. We will implement a simple “resize” post processing use case using the Sharp NodeJS library.

Using Sharp NodeJS Module

Sharp is one of the most widely used image processing NodeJS modules. In order to use the Sharp NodeJS module with our image processing Lambda function, we have to make it available to the NodeJS runtime executing our Lambda function. There are two ways of doing this — one is to package the Sharp module along with the Lambda function code, and the second is to create a Lambda layer and then attach that layer to our Lambda function. The advantage of using a Lambda layer is that it keeps the size of the Lambda code package small and also allows us to reuse the packaged modules across multiple Lambda functions (think of it as a DLL for serverless!).

Open a command prompt and create a folder structure that resembles the following:

  • lambda — This folder will contain the Lambda function code
  • layer / nodejs — This folder will contain the Sharp module
  • terraform — This folder will contain the terraform deployment scripts
  • test — This folder will contain the test script to test our solution

So let's create a Lambda layer that contains the Sharp NodeJS module. First, download the Sharp module using npm. Open a command prompt on your Mac, navigate to the "layer" folder and issue the following command:

npm install --arch=x64 --platform=linux --prefix nodejs sharp

Not specifying the arch and platform arguments will download the darwin-x64 compatible Sharp module (when running on a Mac), and hence it will not work on the NodeJS Lambda runtime, which runs on Linux x64. The next step is to create a zip file containing the downloaded Sharp module. On the command prompt, execute the following command:

zip -r9 sharplayer ./nodejs

This will create a file named sharplayer.zip in the layer folder. Upload the zip file to an S3 bucket. We will then use Terraform to create the layer resource from the zip file as shown below.

# Layer resource - sharp nodejs module
resource "aws_lambda_layer_version" "sharplib_lambda_layer" {
  # The S3 bucket where the layer ZIP file was uploaded
  s3_bucket           = "sdsip-code-bucket"
  s3_key              = "sharplayer.zip"
  layer_name          = "sharplib_lambda_layer"
  source_code_hash    = filebase64sha256("../layer/sharplayer.zip")
  compatible_runtimes = ["nodejs12.x"]
}
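As an aside, the reason for the --prefix nodejs option and the nodejs folder is that, for NodeJS runtimes, Lambda only adds the layer content under nodejs/node_modules to the module search path. The archive layout therefore looks roughly like this (abridged):

sharplayer.zip
  └── nodejs
      └── node_modules
          └── sharp
              └── (JS sources and the linux-x64 native binary)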

Create S3 Buckets

The next step is to create the S3 buckets where the original and processed images are stored. Open the AWS console and navigate to the S3 service. Click the "Create bucket" button to open the following form:

Specify the name of the bucket, uncheck the "Block all public access" checkbox, acknowledge the warning that disabling "Block all public access" is not recommended (we do this only for this sample) and hit "Create bucket". As shown in the diagram above, we will create two buckets, so repeat the process and create the second bucket where the processed images will be stored. For this sample, I created the following two buckets:

sdsip-in-image-bucket — This will store the original images

sdsip-out-image-bucket — This will store the processed resized images.
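If you prefer to keep everything in code, the two buckets can also be created with Terraform; the following sketch mirrors what we just did in the console (the bucket names are the ones used in this sample, and the public-access block is relaxed only because this sample serves objects with a public-read ACL):

# Sketch - creating the two image buckets via Terraform instead of the console
resource "aws_s3_bucket" "in_image_bucket" {
  bucket = "sdsip-in-image-bucket"
}

resource "aws_s3_bucket" "out_image_bucket" {
  bucket = "sdsip-out-image-bucket"
}

# Disable "Block all public access" on the output bucket so the processed
# images can be served with a public-read ACL (sample only - not recommended
# for production).
resource "aws_s3_bucket_public_access_block" "out_image_bucket_public_access" {
  bucket                  = aws_s3_bucket.out_image_bucket.id
  block_public_acls       = false
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}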

Create Lambda Execution Role

In order to access the image objects in an S3 bucket, the Lambda function needs to be given the required permissions. For this, we need to author a policy and attach it to the Lambda execution role.

The policy document is as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::sdsip-in-image-bucket/*",
        "arn:aws:s3:::sdsip-out-image-bucket/*"
      ]
    }
  ]
}

The above policy grants get/put object and put-object-ACL permissions on the two image buckets, along with the CloudWatch Logs permissions the function needs for logging. Next, the following Terraform script creates the required policy, role and role-policy attachment.

# Role assigned to Lambda function
resource "aws_iam_role" "SDSIPLambdaRole" {
  name               = "SDSIPLambdaRole"
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

# Policy assigned to Lambda function role
data "template_file" "policy" {
  template = "${file("${path.module}/policy.json")}"
}

resource "aws_iam_policy" "SDSIPLambdaPolicy" {
  name        = "SDSIPLambdaPolicy"
  path        = "/"
  description = "IAM policy for lambda functions"
  policy      = data.template_file.policy.rendered
}

# Assigning policy to role
resource "aws_iam_role_policy_attachment" "SDSIPRolePolicy" {
  role       = aws_iam_role.SDSIPLambdaRole.name
  policy_arn = aws_iam_policy.SDSIPLambdaPolicy.arn
}

Create Lambda Function

In the lambda folder create a file named resizeimage.js and paste the following code.

const sharp = require('sharp');
const path = require('path');
var AWS = require('aws-sdk');

// Set the REGION
AWS.config.update({
    region: "ap-south-1"
});
var s3 = new AWS.S3();

// This Lambda function is attached to an S3 bucket. When any object is added to the S3
// bucket this handler will be called. This Lambda function "assumes" that only image files
// are added to the S3 bucket. When an image file is added to the S3 bucket, this function
// creates a square thumbnail of 300px x 300px size and it also creates a cover photo of
// 800px x 800px size. It then stores the thumbnail and cover photo in a separate output
// S3 bucket (sdsip-out-image-bucket).
exports.handler = async (event, context, callback) => {
    // Print the event that we got
    console.log(JSON.stringify(event));
    var records = event.Records;

    // How many records do we have? Each record represents one object in S3
    var size = records.length;

    // Iterate over all the records for which this handler was called
    for (var index = 0; index < size; index++) {
        var record = records[index];
        console.log(record);

        // Extract the file name, path and extension
        var fileName = path.parse(record.s3.object.key).name;
        var filePath = path.parse(record.s3.object.key).dir;
        var fileExt = path.parse(record.s3.object.key).ext;

        // Log file name, path and extension
        console.log("filePath:" + filePath + ", fileName:" + fileName + ", fileExt:" + fileExt);

        // Read the image object that was added to the S3 bucket
        var imageObjectParam = {
            Bucket: record.s3.bucket.name,
            Key: record.s3.object.key
        }
        console.log(JSON.stringify(imageObjectParam));

        var imageObject = await s3.getObject(imageObjectParam).promise();

        // Use sharp to create a 300px x 300px thumbnail
        // withMetadata() keeps the header info so the rendering engine can read
        // orientation properly.
        var resizePromise1 = sharp(imageObject.Body)
            .resize({
                width: 300,
                height: 300,
                fit: sharp.fit.cover
            })
            .withMetadata()
            .toBuffer();

        // Use sharp to create a 800px x 800px cover photo
        var resizePromise2 = sharp(imageObject.Body)
            .resize({
                width: 800,
                height: 800,
                fit: sharp.fit.cover
            })
            .withMetadata()
            .toBuffer();

        var promises = [];
        promises.push(resizePromise1);
        promises.push(resizePromise2);

        // Execute the resize operations - Promises offer a good way of
        // doing "parallel" work.
        var result = await Promise.all(promises);
        console.log("result:" + JSON.stringify(result));

        // Now save the thumbnail object to the output S3 bucket.
        // We give public read permission so that client apps can read
        // it easily.
        //
        // CAUTION - When writing an "s3 create object" handler make sure that you are
        // not doing "put object" in the same bucket in the handler. As a best practice,
        // always do "put object" in a separate S3 bucket. Doing put object in the same
        // bucket has a very high possibility of ending up in an infinite loop
        // which will not only cost you but will end up creating thousands of
        // objects in a matter of seconds. So remember - ALWAYS do put object in
        // a different bucket.
        //
        // In this example, the original object lives in sdsip-in-image-bucket.
        // The processed images are written to sdsip-out-image-bucket.
        //
        var putObjectParam1 = {
            Body: result[0],
            Bucket: "sdsip-out-image-bucket",
            Key: fileName + "_thumbnail" + fileExt,
            ACL: "public-read",
            CacheControl: "max-age=3600",
            ContentType: "image/" + fileExt.substring(1)
        }
        console.log("putObjectParam1:" + JSON.stringify(putObjectParam1));

        var putResult1 = await s3.putObject(putObjectParam1).promise();
        console.log("putResult1:" + JSON.stringify(putResult1));

        // Now save the cover photo object to the output S3 bucket.
        // We give public read permission so that client apps can read
        // it easily.
        var putObjectParam2 = {
            Body: result[1],
            Bucket: "sdsip-out-image-bucket",
            Key: fileName + "_coverphoto" + fileExt,
            ACL: "public-read",
            CacheControl: "max-age=3600",
            ContentType: "image/" + fileExt.substring(1)
        }
        console.log("putObjectParam2:" + JSON.stringify(putObjectParam2));

        var putResult2 = await s3.putObject(putObjectParam2).promise();
        console.log("putResult2:" + JSON.stringify(putResult2));
    }
}
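For reference, each record in the incoming event looks roughly like this (abridged; the bucket name, key and size values are illustrative):

{
  "Records": [
    {
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": {
          "name": "sdsip-in-image-bucket",
          "arn": "arn:aws:s3:::sdsip-in-image-bucket"
        },
        "object": {
          "key": "myphoto.jpg",
          "size": 1048576
        }
      }
    }
  ]
}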

In the lambda folder, execute the following command to create a zip file of the Lambda code.

zip ./sdsip_lambda.zip *.js

Upload the zip file to an S3 bucket, and we can then use the following Terraform script to deploy the Lambda function.

# Lambda function resource
resource "aws_lambda_function" "ServerlessImageHandler" {
  function_name = "ServerlessImageHandler"

  # The bucket where the lambda zip file is uploaded.
  s3_bucket = "sdsip-code-bucket"
  s3_key    = "sdsip_lambda.zip"

  handler = "resizeimage.handler"
  runtime = "nodejs12.x"

  source_code_hash = filebase64sha256("../lambda/sdsip_lambda.zip")
  layers           = ["${aws_lambda_layer_version.sharplib_lambda_layer.arn}"]

  # The IAM role that gives permission to get / put objects
  # from the S3 buckets
  role = aws_iam_role.SDSIPLambdaRole.arn

  # Timeout for this lambda function is 5 minutes
  timeout = "300"

  # The max memory is specified as 2 GB - If you are processing
  # bigger images this may need to be increased.
  memory_size = "2048"
}

Attaching Lambda to S3 Events

Now that we have created and deployed the Lambda function, the next step is to configure it to process the object-created notifications. To do this, navigate to the S3 bucket you created for ingesting original images, click the "Properties" tab and navigate to the "Event notifications" section.

Click the Create Event Notification button to bring up this form:

Give an appropriate name for the event name field. Check the “All object create events” checkbox. Navigate down to the destination section to specify the Lambda function as shown below:

Save the changes. With this configuration, whenever an object is added to the S3 bucket, the "ServerlessImageHandler" Lambda function will be invoked.
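If you would rather keep this wiring in Terraform along with the rest of the infrastructure, the trigger can be expressed roughly as shown below. This is a sketch that reuses the resource names from the earlier scripts; note that S3 must also be granted permission to invoke the function:

# Sketch - wiring the S3 trigger in Terraform instead of the console.
# Allow the input S3 bucket to invoke the Lambda function.
resource "aws_lambda_permission" "AllowS3Invoke" {
  statement_id  = "AllowExecutionFromS3"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.ServerlessImageHandler.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = "arn:aws:s3:::sdsip-in-image-bucket"
}

# Invoke the Lambda function for every object created in the input bucket.
resource "aws_s3_bucket_notification" "InImageBucketNotification" {
  bucket = "sdsip-in-image-bucket"

  lambda_function {
    lambda_function_arn = aws_lambda_function.ServerlessImageHandler.arn
    events              = ["s3:ObjectCreated:*"]
  }

  depends_on = [aws_lambda_permission.AllowS3Invoke]
}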

Test Serverless Image Processor

Now that everything is set up, we can test whether it works as expected. Upload an image file to the S3 bucket meant for storing original images (sdsip-in-image-bucket in my case). After a few seconds, check the target S3 bucket meant to store processed images (sdsip-out-image-bucket in my case). You will see two files: one will be a 300px by 300px thumbnail and the other an 800px by 800px cover photo. As you can see, it is very easy to set up image processing using serverless technologies.
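If you prefer to script this test (this is what the test folder in the layout above is for), a minimal sketch using the AWS SDK could look like the following. The bucket names and region match this sample, while the image file name is a placeholder:

// test/testupload.js - a sketch of a test script; testimage.jpg is a placeholder,
// the region and bucket names are the ones used in this sample.
var AWS = require('aws-sdk');
var fs = require('fs');

AWS.config.update({
    region: "ap-south-1"
});
var s3 = new AWS.S3();

// Upload a local test image to the input bucket; the S3 trigger will invoke
// the ServerlessImageHandler Lambda function, which writes the resized images
// to sdsip-out-image-bucket.
s3.putObject({
    Bucket: "sdsip-in-image-bucket",
    Key: "testimage.jpg",
    Body: fs.readFileSync("./testimage.jpg"),
    ContentType: "image/jpeg"
}).promise()
    .then(function () {
        console.log("Uploaded testimage.jpg - check sdsip-out-image-bucket in a few seconds");
    })
    .catch(function (err) {
        console.error("Upload failed:", err);
    });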

Why do we need two buckets?

One question you may have is: why do we need two buckets? Can we not work with just one? The short answer: it is a recommended best practice that avoids infinite looping of Lambda executions. Consider the following scenario with just one bucket:

  1. Image is added to the S3 bucket.
  2. The object created notification Lambda handler function is invoked.
  3. The Lambda function processes the image and saves it back into the same S3 bucket.
  4. The save in step 3 again invokes the object-created notification handler, and the loop continues!

So one way to prevent this infinite loop is to follow some kind of naming convention. For example, in the code we implemented, the processed images have names ending in "<orig img name>_thumbnail.<extn>" / "<orig img name>_coverphoto.<extn>". We can therefore add a condition that skips processing when the file name contains "_thumbnail" or "_coverphoto", as sketched below. Even if the object-created notification handler is invoked for such an object, it will be a NO-OP. However, this has to be thoroughly tested, because if the termination condition has any bug, you will end up within a few seconds with thousands of objects in your S3 bucket along with a huge AWS bill!
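For illustration, a guard along these lines could be added at the top of the record loop in the handler. This is a sketch only; the sample in this article keeps the two-bucket approach instead:

// Sketch: skip objects that this function itself produced (only needed if
// the input and output share a single bucket).
var objectKey = record.s3.object.key;
if (objectKey.indexOf("_thumbnail") !== -1 || objectKey.indexOf("_coverphoto") !== -1) {
    console.log("Skipping already processed image: " + objectKey);
    continue; // NO-OP for our own output objects
}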

So, to ensure that you never end up in such a situation, a safe and "free" way around the issue is to use two separate buckets.

If you do get into this infinite loop situation — what is the way out? Well there are a couple of things you can do:

  • Modify the S3 bucket event notification configuration and remove the handler function.
  • Modify your handler function code by commenting the “s3.putObject” calls and deploy the function again.

I hope this was useful. The full code, along with the Terraform deployment scripts, is available on GitHub:

