Skip to content

Dynamodb

  • Dynamodb is a fast and flexible NoSQL
  • Designed for apps that need single-digit millisecond latency at any scale
  • Full managed databse that supports both document and key-value data models
  • Has flexible data model and reliable performance
  • Great fit for mobile, web, gaming, ad-tech, IoT applicaitons
  • Dynamodb is serverless and scalable
  • Backed by SSD Storage
  • Spread accross 3 geographically distinc data centers
  • Choice of 2 consitency models:
  • Eventual consistent Reads (Default)
    • Consistency across all copies of data that is usually reached within a second
  • Strongly consistent Reads
    • Returns a result that reflects all writes that received a successful response prior to the read
  • Dynamodb make up:
  • Tables
  • Items (think row in table)
  • Attribute (Think Column of table)
  • Key (identifier for data)
  • Value (actual data)
  • Documents can be written in JSON, HTML, XML

Dynamodb Primary Keys

  • Dynamodb stores and retrieves data based on Primary Key
  • Two types of primary keys
  • Partition Key
    • Unique attribute (example user ID)
    • Value of partition key is input to internal hash function which determines the partion or physical location on which the data is stored
    • If using partition key as your primary key, then no two items can have the same partition key
  • Composite Key (Partition Key + Sort Key) in combination. Same user posting mutiple times in forum
    • Primary key would be COmposite Key consisting of:
    • Partition Key - User ID
    • Sort key - Timestamp of the post
    • 2 items may have the same partition key but they must have a different Sort key
    • All items with same Partition Key are stored together, then sorted according to the sort key value
    • Allows you to store multiple items with the same Partition Key

Dynamodb Access Control

  • Authentication and Access Control managed using AWS IAM
  • Create an IAM user within your AWS accoun which has specific permissions to access and create DynamoDB tables
  • Create IAM role which can enable you to obtain temporary access keys
  • Can create special IAM condition to restrict user access to only their records.

Exam Tips

  • Low latency
  • Supports both document and key value data models
  • JSON, HTML, XML
  • 2 types of primary key
  • Partion key
  • Composite Key (Partition + Sort Keys)
  • Consistency models
  • Strongly Consitent
  • Eventually Consistent

Indexies

  • 2 types of indexes supported to help speed up queries
  • Local Secondary Index
    • Can only be created when creating table
    • Cannot be added or modified later
    • Has same partition key as table
    • Has different sort key than table
    • ANy queries based on this sort Key are much faster than using the index than the main table
  • Global Secondary Index
    • Much more flexible
    • Create index whenever you like
    • Choose different partion key and sort key to original table
    • Speeds up any queries relating to this alternative partion and sort key

Indexies Exam Tips

  • Indexies enable faster queries on data
  • Gives different view of your data based on alternative Partion / Sort keys
  • Differences in 2 indexies
  • Local created at table creation. Same partion different sort
  • Global secondary created whenever and has different partion and different sort keys

Dynamodb Scan vs Query API Calls

Query

  • A query is operation that finds items in your table based on primary key attribute and distinct value to search for
  • If looking for user with ID of 212 then you would query based off primary user ID and value 212
  • This will select all attributes for document
  • Can use optional sort key name and value to refine results
  • By default query returns all attributes for item
  • ProjectionExpression parameter will allow you to specify what attributes should be return
  • Results always sorted by sort key
  • Numeric order by default in ascending order (1,2,3,4)
  • Can reverse order by setting ScanIndexForward (Tricky because this is for Query not SCAN!) paramerte to false
  • By default Queries are eventually consistent
  • Can be explicitly set for Stongly consistent

Scan

  • A scan operation examines ever item in the table
  • By default returns all data attributes
  • Can also use ProjectionExpression parameter to set what attributes to return
  • Can filter results of scan after they have been run

Query Or Scan

  • Query is more efficient than Scan
  • Scan dumps the entire table then filters out the values to provide
  • Scans take longer the larger the table gets
  • Scan operation on a large table can use up the provisioned throughput for a table in just one single operation
  • You can reduce impact of query or scan by setting smaller page size
  • could set page size to 40
    • larger number of operations with smaller size each
  • Avoid using scan operations if you can
  • design tables in a way that you can use the query, GET, or BatchGetItem APIs
  • By default scan operation processes data sequentially then returns in 1 MB incements
  • only scans one partition at a time
  • Can configure scan to run in parrallel if you divide a table or index into segments

Scan and Query exam tips

  • Query operation finds items in table using only the primary key attribute
  • You provide the primary key name and distinct value to search
  • Scan operation examines every item in the table
  • Both return all data attributes by default
  • Use ProjectionExpression parameter to refine results
  • Query results alwasy sorted by Sort key
  • Sorted in ascending order
  • Set ScanIndexForward parameter to false to reverser the order (QUERIES ONLY)
  • Query operation is generally more efficient than scan
  • Reduce page size to make scan more efficient
  • Can make parrallel scan
  • Design tables to use query, GET, or BatchGetItem APIs

DynamoDb Provisioned Throughput

  • Throughput measured in capacity Units
  • Two Types of Capacity Units
  • Write Capacity Unit
    • 1 x Write Capacity Unit = 1 x 1KB Write per second
  • Read Capacity Unit
    • 1 x Read Capacity Unit = 1 x Strongly Consistent Read of 4KB per second OR 2 x Eventually Consistent Reads of 4KB per second (DEFAULT)
  • If application reads or writes large objects it will cost more

Provisioned Throughput Exam Tips

  • Provisioned throughput is measured in capacity units
  • 1 x Write Capacity = 1 x 1KB Write per second
  • 1 x Read Capacity Unit = 1 x Strongly Consistent Read of 4KB per second OR 2 x Eventually Consistent Reads of 4KB per second (DEFAULT)
  • When Calculating take number of reads or writes a second multiply that by the size per operation / the per second of operation rounded up.

DynamoDB OnDemand Capacity

  • New Pricing option
  • Chareges apply for Reading, Writing, and Storing data
  • Do not need to specify Read and Write capacity
  • Will scale up and down based on read and writes to your database
  • Ideal for unpredictable workloads
  • Allows you to pay for only what you use (pay per request)

DynamoDB Accelerator (DAX)

  • Fully managed clustered in-memory cache for DynamoDB
  • ONLY FOR READ operations
  • Delivers up to 10x read performance
  • Microsecond performance for millions of requests per second
  • Ideal for Read-Heavy and bursty workloads

  • How does it work

  • DAX is write-through caching service
    • When data is written to dynamodb it is also written to DAX
    • Point Dynamodb API Calls to DAX cluster
    • If item is in DAX it will be returned from DAX
    • If item does not exist then DAX will retrieve item from DynamoDB and write it in cache for further requests
  • Allows ability to reduce Provisioned Read Capacity

  • Not GOOD for

  • Caters ONLY for Eventually consitent reads
    • CANNOT do Strongly Consitent
  • Write intensive
  • Applications that do not perform many read operations
  • Applications that dont require microsecond response times

Elasticache with DynamoDB

  • In memory caching that sits infront of many RDS databases
  • This can also sit infront of Dynamodb
  • Sits between application and database
  • Takes load off database
  • good if your database is particularly read-heavy and the data is not changing frequently
  • Supports Memcached and Redis
  • 2 Stategies available
  • Lazy Loading
    • Loads data into cache only when necessary
    • if requested data is in cache Elasticache returns data to application
    • If not in cache or has expired Elasticache returns a null
    • Application then fetches data from the database and writes the data recieved into cache so that is available next time
    • Advantages:
    • Only requested data cached
    • Node failures are not fatal
    • Disadvantages:
    • Cache miss penalty (initial request query to database and write data after done)
    • Stale data - if data is only updated when cache is updated it can become out of date and does not automatically update
    • Time To Live (TTL)
    • How to deal with stale data
    • Sets a number of seconds until data expires
    • Hits on expired data will be treated as miss
  • Write-Through
    • adds or updates data to cache whenever data is written to database
    • Advantages:
    • Data is never stale
    • Users are generally more tolerant of additional latency when updating data than when retrieving it
    • Disadvantages
    • Does invoke write penalty for having to write twice
    • If node fails and new one is spun up data is missing until added or updated
      • This can be mitigated by implementing lazy loading in conjucture with write-through)
    • Wasted resources if most of data is never read

DAX vs Elasticache

  • DAX optimized just for DynamoDB
  • DAX ONLY supports Write-Through
  • If you need lazy loading you have to use Elasticache

DynamoDB Transactions

  • DynamoDB Transactions were designed for mission critical operations
  • Transactions
  • ACID Transactions (Atomic, Consitent, Isolated, Durable)
  • Read or write multiple items across multiple tables as an all or nothing operation
  • Check for a pre-requisite condition before writing to a table
  • Implement complex business logic into single atomic transaction

DynamoDB TTL

  • DynamoDB Time To Live (TTL)
  • Time To Live is attribute that defines an expiry time for your data
  • Expired items marked for deletion
  • If data is marked it will be deleted with in 48 hours
  • Good for removing irrelevant or old data
  • Session data
  • Event Logs
  • Temporary data
  • Reduce cost by removing data no longer relevant
  • TTL is expressed as epoch/unix time
  • numeric value represents the number of seconds that have elapsed since 12am January 1 1970
  • When current time is greater than TTL the item will become expired and marked for deletion
  • You can filter out expired items from your queries and scans
  • This is useful because deletes can take 48 hours

DynamoDB Streams

  • Streams are Time-ordered sequence of itme level modifications (insert, update, delete)
  • Logs are encrypted at rest and stored for 24 hours
  • Accessed using a dedicated endpoint
  • By default the Primary key is recorded
  • Before and After images can be captured
  • Used for triggers
  • Accessed through their own endpoints
  • Events are recorded in near real time
  • Really good for serverless and lambda
  • Application can take actions based on content
  • Lambda can pull stream and trigger events based on stream events

Provisioned Throughput Exceeded Exception

  • Will see this if your request read/write capacity provisioned is exceeded for DynamoDB table
  • SDK will autoretry until successful
  • If not using SDK
    • Reduce request frequency
    • Implement Exponetial backoff

Exponential Backoff

  • Components in network can generate errors due to being overloaded
  • Usally dealt with by implementing retries (which SDK does)
  • In addition to reties SDK also uses exponential Backoff
  • Progressively longer waits between consecutive retries (50 ms, 100 ms, 200ms)
  • if after 1 minute does not work, your request size may be exceeding the throughput for read/write capacity
  • Exponential backoff is used for more than Dynamodb. Every feature of AWS SDK.
  • Applies to many services in AWS
    • S3
    • CloudFormation
    • SES
  • If not using SDK will need to implement this yourself in the application
Back to top