AWS Knowledge Series: DynamoDB — Part 2

Sanjay Dandekar
6 min read · Sep 6, 2020

Please read the first article in the series here:

The code samples in this article use the DynamoDB table we designed in the above article.

In this article we will look at the following aspects of DynamoDB:

  • Exponential backoff when doing batch operations
  • Updating attributes of a record
  • Atomic update of a numeric attribute of a record
  • Accessing the DynamoDB service from a Lambda hosted in a VPC

Batch Operation and Exponential Backoff

One of the most efficient ways to fetch / update / create / delete multiple items in DynamoDB is to use the batch APIs it offers. Using the batch APIs avoids multiple round trips and improves the latency of the service / operation you are performing. DynamoDB offers two APIs to perform batch read / write / update / delete operations:

  • Batch Get Item — This API can be used to fetch up to 100 items in a single call (the total size of the returned data must not exceed 16 MB).
  • Batch Write Item — This API can be used to update / create / delete up to 25 items in a single call (the total size of data written must not exceed 16 MB, and each individual item being written must not exceed 400 KB).

If you execute a batch get request that tries to fetch more than 100 items, or a batch write request that tries to update / create / delete more than 25 items, it will fail immediately with a validation exception. One important aspect to keep in mind is that even if you request 100 items, it is not guaranteed that the API will return all 100. The same applies to writes: if you request an update / create / delete of 25 items, it is not guaranteed that the API will process all 25. There can be multiple reasons why not all items in a batch request are processed. We will not go into the reasons here, but we will look at how you can implement the batch operation so that all items are eventually processed!

When a batch API does not process all the items it was asked to process, the response includes an attribute ("UnprocessedKeys" for reads, "UnprocessedItems" for writes) containing the items that it did not process. Developers are expected to retry the batch operation with these unprocessed items until the API has finished processing all of them. If you look at the documentation of the above APIs (links below) you will see the following statement:

“If DynamoDB returns any unprocessed items, you should retry the batch operation on those items. However, we strongly recommend that you use an exponential backoff algorithm. If you retry the batch operation immediately, the underlying read or write requests can still fail due to throttling on the individual tables. If you delay the batch operation using exponential backoff, the individual requests in the batch are much more likely to succeed.”

So how does one implement exponential backoff? The following code shows how it can be done in Node.js (use case: fetch 100 customer profiles from the ECOMMERCEDB DynamoDB table):
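A minimal sketch of the retry loop, using the AWS SDK v2 DocumentClient. The key layout (PK = CUSTOMER#<id>, SK = PROFILE) and the delay constants are assumptions for illustration — use the schema from Part 1 of the series.

```javascript
const TABLE_NAME = 'ECOMMERCEDB';
const BASE_DELAY_MS = 50;
const MAX_RETRIES = 8;

// Exponential backoff: baseMs * 2^attempt milliseconds.
function backoffDelay(attempt, baseMs = BASE_DELAY_MS) {
  return baseMs * Math.pow(2, attempt);
}

// Build the BatchGetItem keys for a list of customer ids (max 100 per call).
function buildCustomerKeys(customerIds) {
  return customerIds.map((id) => ({ PK: `CUSTOMER#${id}`, SK: 'PROFILE' }));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// docClient is an initialised DocumentClient, e.g.:
//   const AWS = require('aws-sdk');
//   const docClient = new AWS.DynamoDB.DocumentClient();
async function batchGetCustomers(docClient, customerIds) {
  let requestItems = { [TABLE_NAME]: { Keys: buildCustomerKeys(customerIds) } };
  const items = [];

  for (let attempt = 0; ; attempt++) {
    const response = await docClient.batchGet({ RequestItems: requestItems }).promise();
    items.push(...(response.Responses[TABLE_NAME] || []));

    // UnprocessedKeys is empty ({}) once everything has been fetched.
    requestItems = response.UnprocessedKeys;
    if (!requestItems || Object.keys(requestItems).length === 0) return items;

    if (attempt >= MAX_RETRIES) throw new Error('Batch get retry limit exceeded');
    await sleep(backoffDelay(attempt)); // back off before retrying the leftovers
  }
}
```

Capping the number of retries is a design choice: without it, a persistently throttled table would keep the loop spinning forever.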

As you can see — when the API returns unprocessed keys, we introduce an exponential backoff and keep calling the batch get API till it stops returning the unprocessed keys indicating that all the keys have been processed.

Batch Get Item API Documentation:

Batch Write Item API Documentation:

Updating attributes of a record

As mentioned previously, DynamoDB is schemaless, so one can add an arbitrary number of attributes to a given record. This is what allows your data model entities to morph and evolve over time, letting you add richness to your data model without much headache.

Here is how you can use the update item API to update specific attributes of an item. In this example I am updating the name and email address of a customer record.
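A sketch of such an update, again with the AWS SDK v2 DocumentClient. The key layout and the NAME / EMAIL_ID attribute names are assumptions for illustration; substitute the attribute names from your own table.

```javascript
// Build the UpdateItem parameters for changing a customer's name and email.
// Only the attributes named in the UpdateExpression are touched; all other
// attributes of the item are left as-is.
function buildUpdateParams(customerId, name, email) {
  return {
    TableName: 'ECOMMERCEDB',
    Key: { PK: `CUSTOMER#${customerId}`, SK: 'PROFILE' },
    UpdateExpression: 'set #name = :name, #email = :email',
    // NAME is aliased via ExpressionAttributeNames since it can clash with
    // DynamoDB reserved words.
    ExpressionAttributeNames: { '#name': 'NAME', '#email': 'EMAIL_ID' },
    ExpressionAttributeValues: { ':name': name, ':email': email },
    ReturnValues: 'UPDATED_NEW', // return only the attributes we changed
  };
}

// Usage (assuming an initialised DocumentClient):
// const AWS = require('aws-sdk');
// const docClient = new AWS.DynamoDB.DocumentClient();
// await docClient.update(buildUpdateParams('C1001', 'Jane Doe', 'jane@example.com')).promise();
```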

Atomic update of numeric attribute of record

With the data model we developed in the first part of this series, let us say there is now a need to track the number of orders for a given customer. Given that DynamoDB is schemaless, adding a property called "NO_OF_ORDERS" to the customer item is just a matter of passing a value for this attribute while creating / updating the item. This is where the atomic counters feature of DynamoDB is very useful: if there are multiple concurrent updates to this attribute, they must be serialised, and atomic counters give us an ACID-like transactional update for this property without the headache of starting / committing transactions.

The update item API that we saw in the previous section can be used to atomically update the "NO_OF_ORDERS" attribute of the customer item. The following code demonstrates how this can be done.
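A sketch of the atomic counter update, under the same assumed key layout as above. The UpdateExpression is the part doing the work: DynamoDB evaluates the arithmetic server-side, so no read-modify-write round trip is needed.

```javascript
// Build the UpdateItem parameters for atomically incrementing NO_OF_ORDERS.
// if_not_exists supplies :zero for customer records created before the
// attribute was introduced.
function buildIncrementParams(customerId, incrementBy = 1) {
  return {
    TableName: 'ECOMMERCEDB',
    Key: { PK: `CUSTOMER#${customerId}`, SK: 'PROFILE' },
    UpdateExpression: 'set NO_OF_ORDERS = if_not_exists(NO_OF_ORDERS, :zero) + :val',
    ExpressionAttributeValues: { ':zero': 0, ':val': incrementBy },
    ReturnValues: 'UPDATED_NEW', // returns the post-increment counter value
  };
}

// Usage (assuming an initialised DocumentClient):
// await docClient.update(buildIncrementParams('C1001')).promise();
```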

Let me explain what is happening in this statement from the code above:

set NO_OF_ORDERS = if_not_exists(NO_OF_ORDERS, :zero) + :val

As indicated before, the NO_OF_ORDERS attribute was added at a later date than the creation of the ECOMMERCEDB DynamoDB table. This means you may have customer records where this attribute does not exist. The "if_not_exists" function provides a value to be used in case the NO_OF_ORDERS attribute does not exist for the given record. When you implement the update of the NO_OF_ORDERS attribute as shown above, DynamoDB guarantees that all write requests are applied in the order in which they are received. So even if there are concurrent modifications for a given customer, they will be applied serially without overwriting each other.

Accessing DynamoDB service from a Lambda hosted in VPC

In most production setups, you will have a situation where a REST API needs to access both resources hosted in a VPC and DynamoDB (or any other such service that is not hosted / available in the VPC). Take a look at the example below.

The client application accesses a REST API via the API Gateway. The API Gateway is configured to call a Lambda function. The Lambda function needs to access an Aurora Serverless DB hosted in a VPC. This means the Lambda also needs to be configured to run in the same VPC, otherwise it will not be able to reach the Aurora Serverless DB. The Lambda function also needs access to DynamoDB. The DynamoDB service is not hosted inside your VPC, so with the default VPC configuration there is no route from the VPC to DynamoDB, and the Lambda function cannot reach it. To allow the Lambda function to access the DynamoDB service, we have to configure an endpoint that routes traffic from the VPC to the DynamoDB service. So the configuration looks like the following:

The following steps describe how you can configure an endpoint to access DynamoDB from a VPC:

  • Log in to the AWS Console and navigate to the VPC service (ensure that you are in the correct region).
  • From the left side menu, select "Endpoints" and click "Create Endpoint" to create a new endpoint.
  • Select "AWS Services" in the Service category radio group.
  • In "Service Name", search for "DynamoDB" and hit enter. Select the DynamoDB service that is listed.
  • From the VPC dropdown, select the VPC from which you want to access DynamoDB.
  • Click "Create endpoint" to create the endpoint.

Once the endpoint is created, you will be able to access the DynamoDB service from the Lambda function.
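The same endpoint can also be created programmatically via the EC2 API, which is handy for infrastructure-as-code setups. A sketch using the AWS SDK v2; the VPC and route table IDs shown in the usage comment are placeholders.

```javascript
// Build the CreateVpcEndpoint parameters for a DynamoDB gateway endpoint.
// DynamoDB is exposed as a *gateway* endpoint: AWS adds routes to DynamoDB
// into the route tables you list here.
function buildEndpointParams(region, vpcId, routeTableIds) {
  return {
    VpcEndpointType: 'Gateway',
    VpcId: vpcId,
    ServiceName: `com.amazonaws.${region}.dynamodb`,
    RouteTableIds: routeTableIds,
  };
}

// Usage (assuming credentials with ec2:CreateVpcEndpoint permission):
// const AWS = require('aws-sdk');
// const ec2 = new AWS.EC2({ region: 'us-east-1' });
// await ec2.createVpcEndpoint(buildEndpointParams('us-east-1', myVpcId, [myRouteTableId])).promise();
```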
