DynamoDB errors: handle with care! (pt 3)

Retries, idempotency, exactly-once processing and your data integrity.

You made it to part 3! This is the last part of the blog - I promise. But it is probably the one I'm most excited about - consider it the crescendo, if you please. In 2018, DynamoDB launched support for distributed transactions. These allow you to extend DynamoDB's atomicity and isolation to multi-item actions (the items can be in any tables in the same account and region) on an all-or-nothing basis. This is incredibly powerful stuff, opening up the ability to ensure data integrity across a range of new scenarios.

Transaction basics

There's a lot of great information already out there that can give you a foundational understanding of how to put DynamoDB transactions to work - one of the best is Alex DeBrie's page here: https://www.alexdebrie.com/posts/dynamodb-transactions/

I'll cover only what you need to know to understand the rest of this article. When a set of DynamoDB actions is combined into an atomic unit for all-or-nothing handling, DynamoDB essentially runs a two-phase commit under the covers for writes; for transactional reads, it retrieves each item twice and verifies that the item images are unchanged from one result to the next. It makes sense, then, that each read or write is metered twice. It is the price to be paid for multi-item guarantees in a distributed database.
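To make the all-or-nothing behavior concrete, here is a minimal sketch of a two-item transfer expressed as a TransactWriteItems request. The table and attribute names ("accounts", "account_id", "balance") are hypothetical placeholders for your own schema; the request is built as plain data so you can inspect it before submitting it with an SDK.

```python
def build_transfer(from_id: str, to_id: str, amount: int) -> dict:
    """Build TransactWriteItems parameters that debit one account and
    credit another. If either update fails - including the overdraw
    condition on the source account - neither write is applied."""
    return {
        "TransactItems": [
            {
                "Update": {
                    "TableName": "accounts",  # hypothetical table
                    "Key": {"account_id": {"S": from_id}},
                    "UpdateExpression": "SET balance = balance - :amt",
                    # Guard: don't allow the source balance to go negative.
                    "ConditionExpression": "balance >= :amt",
                    "ExpressionAttributeValues": {":amt": {"N": str(amount)}},
                }
            },
            {
                "Update": {
                    "TableName": "accounts",
                    "Key": {"account_id": {"S": to_id}},
                    "UpdateExpression": "SET balance = balance + :amt",
                    "ExpressionAttributeValues": {":amt": {"N": str(amount)}},
                }
            },
        ]
    }

# With boto3 this would be submitted as:
#   client = boto3.client("dynamodb")
#   client.transact_write_items(**build_transfer("alice", "bob", 25))
```

Note the doubled metering described above applies to both updates: each is charged twice, once per commit phase.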


Knowing that developers would use transactions for things like numeric transfers from one item to another (a bank account transfer, for example), the DynamoDB team was concerned that these actions are not idempotent, and about the potential for repeated processing of a transaction that's intended to occur only once.

The magic token ride

To address the problem of repeated application of transactions involving non-idempotent actions, the DynamoDB team added a "client token" to the TransactWriteItems API. This parameter accepts a value intended to uniquely identify the intent of a transactional write. If the transaction succeeds, the token is stored by DynamoDB for approximately 10 minutes. If the same transaction request is seen again (with the same token) within that 10-minute window, the transactional writes are not applied again, but the client receives a success response as though they had been. The idea is to let DynamoDB clients retry transactions properly without applying the same transactional writes twice. If you're using TransactWriteItems via one of the standard SDKs, you're getting this benefit - even if you weren't aware of it. What's really nice is that a deduplicated retry does not consume any write units: when the transaction has already succeeded on a prior attempt, managing the client token is metered as just one read for each item in the transaction - much cheaper than the writes.
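A sketch of how the token fits into the request parameters - the SDKs generate one automatically per logical call, but supplying your own (as shown here) lets multiple retry attempts share it. The 36-character limit reflects the documented maximum length of ClientRequestToken:

```python
import uuid
from typing import Optional


def with_client_token(params: dict, token: Optional[str] = None) -> dict:
    """Return a copy of TransactWriteItems parameters carrying a
    ClientRequestToken (up to 36 ASCII characters). If no token is
    supplied, mint a fresh UUID - matching what the SDKs do per call."""
    return {**params, "ClientRequestToken": token or str(uuid.uuid4())}


# Retrying with the same token within ~10 minutes of a success is safe:
# DynamoDB reports success without reapplying the writes.
request = with_client_token({"TransactItems": []}, token="transfer-2024-0001")
```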

What if you want to extend this to cover retries further up your stack - perhaps in a Step Functions workflow, or when processing requests from a queue? Just take a unique identifier for each change intent at the source, and supply it as the client token for all associated calls to TransactWriteItems. Now your higher-level retries are covered too - but only within that 10-minute window, of course. Unfortunately, if you want this functionality for a single-item change, the only way to get it is to wrap that one action in a transaction and pay double (for no reason - the two phases are meaningless, since a single-item change is fundamentally atomic). Yep - it's a rip-off, folks (as is the metering of conditional write failures - but I digress). Client token support really should be extended to UpdateItem, and possibly PutItem and DeleteItem too.
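As one example of sourcing the token further up the stack, a queue consumer can reuse the message's own stable identifier. This is a hypothetical SQS-driven handler (the `parse_transfer` deserialization is a placeholder for your own message schema); because the SQS messageId stays the same across redeliveries, a redelivered message within the 10-minute window won't apply the writes twice:

```python
import json


def parse_transfer(body: str) -> list:
    """Placeholder deserialization: turn a message body into a
    TransactItems list. Your real message schema goes here."""
    return json.loads(body)


def handle_record(record: dict, client) -> None:
    """Process one SQS record, using its messageId as the client token
    so retries and redeliveries are deduplicated by DynamoDB."""
    client.transact_write_items(
        TransactItems=parse_transfer(record["body"]),
        ClientRequestToken=record["messageId"],  # stable across redeliveries
    )
```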


You say you want more?

Okay - what if 10 minutes is not long enough? What if there is batch processing involved in addition to online transactions, and when a batch fails you want to be able to just replay it without making a huge mess? You need to persist those unique transaction intent identifiers for longer. The answer is to store them as their own items in a separate table that tracks applied transaction intents: use the identifier as the (simple) primary key, and make each identifier's non-existence a condition check when applying the corresponding transaction. How long should you store the applied transaction identifiers? That depends on how long you think you need to protect against duplicate application of the same intent. You can use TTL on those items to expire them and keep things efficient - maybe a month is suitable for you... or a year?
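The pattern above can be sketched as a small helper. The table name ("applied_intents") and attribute names ("intent_id", "expire_at") are my own hypothetical choices: the transaction gains one extra conditional Put that records the intent, and if that intent item already exists, the condition fails and the entire transaction is cancelled - nothing is applied twice.

```python
import time


def build_guarded_transaction(intent_id: str, writes: list,
                              retain_days: int = 30) -> dict:
    """Prepend an intent-marker Put to a list of transactional writes.
    If the intent was already applied, the conditional Put fails and
    DynamoDB cancels the whole transaction."""
    marker = {
        "Put": {
            "TableName": "applied_intents",  # hypothetical dedup table
            "Item": {
                "intent_id": {"S": intent_id},
                # TTL attribute: epoch seconds after which DynamoDB may
                # expire the marker, keeping the table small.
                "expire_at": {"N": str(int(time.time()) + retain_days * 86400)},
            },
            # The heart of the pattern: succeed only if this intent has
            # never been recorded before.
            "ConditionExpression": "attribute_not_exists(intent_id)",
        }
    }
    return {"TransactItems": [marker, *writes]}
```

A replayed batch can then resubmit every transaction blindly: already-applied intents fail their condition check and are skipped, while missed ones go through.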

For a complete example of this and a very simple playground where you can experiment and learn about client tokens and transactions, take a look at the simple online bank demo I've created (using only the AWS CLI):

https://github.com/pete-naylor/ddb_tx-basic-demo

You can rely on the long-term intent storage to protect over longer periods, while still benefiting from the client token as a cheaper way to achieve the same deduplication in the short term.

Okay, now I'm really done

Hopefully this blog has made you aware of the importance of handling errors appropriately when making writes to DynamoDB - and given you an understanding of why the default retry behavior in the SDKs can both help and hurt you. You can harness some of the powerful functionality in DynamoDB to use retries to your advantage without accepting risk to the integrity of your data. DynamoDB's strong leader-based replication makes it different from many other non-relational databases you may hear about - you don't have to accept eventual consistency or divergence, you can easily enforce constraints, and you can make atomic all-or-nothing changes across multiple items based on conditions you choose. If you build something cool using transactions, please drop me a message to tell me about it!