DynamoDB
- DynamoDB is a fast and flexible NoSQL database service
- Designed for apps that need single-digit millisecond latency at any scale
- Fully managed database that supports both document and key-value data models
- Has a flexible data model and reliable performance
- Great fit for mobile, web, gaming, ad-tech, and IoT applications
- DynamoDB is serverless and scalable
- Backed by SSD storage
- Spread across 3 geographically distinct data centers
- Choice of 2 consistency models:
- Eventually consistent reads (default)
- Consistency across all copies of data is usually reached within a second
- Strongly consistent reads
- Returns a result that reflects all writes that received a successful response prior to the read
- DynamoDB is made up of:
- Tables
- Items (think row in a table)
- Attributes (think column in a table)
- Key (identifier for data)
- Value (actual data)
- Documents can be written in JSON, HTML, XML
DynamoDB Primary Keys
- DynamoDB stores and retrieves data based on the primary key
- Two types of primary keys
- Partition Key
- Unique attribute (example user ID)
- The value of the partition key is input to an internal hash function, which determines the partition (physical location) on which the data is stored
- If using the partition key as your primary key, no two items can have the same partition key
- Composite Key (Partition Key + Sort Key in combination), e.g. the same user posting multiple times in a forum
- The primary key would be a Composite Key consisting of:
- Partition Key - User ID
- Sort Key - Timestamp of the post
- 2 items may have the same partition key, but they must have a different sort key
- All items with the same partition key are stored together, then sorted according to the sort key value
- Allows you to store multiple items with the same Partition Key
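A rough in-memory sketch of the composite key behavior described above, assuming hypothetical attribute names UserID (partition key) and PostedAt (sort key). DynamoDB's real hash function and partition management are internal; hashlib.md5 and the fixed partition count below are stand-ins chosen only for illustration.

```python
import bisect
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages partitions itself


def partition_for(partition_key: str) -> int:
    # Stand-in for DynamoDB's internal hash function: key value -> partition
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


class CompositeKeyTable:
    def __init__(self):
        # partition number -> partition key -> list of (sort key, item), kept sorted
        self.partitions = defaultdict(lambda: defaultdict(list))

    def put(self, item: dict) -> None:
        pk, sk = item["UserID"], item["PostedAt"]
        rows = self.partitions[partition_for(pk)][pk]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sk)
        if i < len(keys) and keys[i] == sk:
            rows[i] = (sk, item)  # same partition key + sort key: replace the item
        else:
            rows.insert(i, (sk, item))  # keep items ordered by sort key

    def query(self, pk: str) -> list:
        # All items sharing one partition key, already in sort-key order
        return [item for _, item in self.partitions[partition_for(pk)][pk]]
```

Writing a second item with the same UserID and PostedAt replaces the first, mirroring how PutItem overwrites an item that has the same primary key.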
DynamoDB Access Control
- Authentication and access control are managed using AWS IAM
- Create an IAM user within your AWS account which has specific permissions to access and create DynamoDB tables
- Create an IAM role, which enables you to obtain temporary access keys
- Can create a special IAM condition (the dynamodb:LeadingKeys condition key) to restrict users to accessing only their own records
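One way to express the "only their own records" restriction is an IAM policy using the dynamodb:LeadingKeys condition key, which compares the caller's identity with the partition key of the items being accessed. The sketch below builds such a policy as a Python dict. The table ARN is hypothetical, and the ${www.amazon.com:user_id} variable assumes users sign in through web identity federation with Login with Amazon; other identity providers use different policy variables.

```python
import json


def leading_keys_policy(table_arn: str, user_id_variable: str) -> dict:
    """Allow item access only where the partition key equals the caller's ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:Query",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                ],
                "Resource": [table_arn],
                "Condition": {
                    # dynamodb:LeadingKeys refers to the partition key values accessed
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": [user_id_variable]
                    }
                },
            }
        ],
    }


policy = leading_keys_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/ForumPosts",  # hypothetical ARN
    "${www.amazon.com:user_id}",  # assumes Login with Amazon federation
)
print(json.dumps(policy, indent=2))
```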
Exam Tips
- Low latency
- Supports both document and key value data models
- JSON, HTML, XML
- 2 types of primary key
- Partition Key
- Composite Key (Partition + Sort Keys)
- Consistency models
- Strongly Consistent
- Eventually Consistent
Indexes
- 2 types of index are supported to help speed up queries
- Local Secondary Index
- Can only be created when creating the table
- Cannot be added or modified later
- Has the same partition key as the table
- Has a different sort key from the table
- Any queries based on this sort key are much faster using the index than the main table
- Global Secondary Index
- Much more flexible
- Create index whenever you like
- Choose a different partition key and sort key from the original table
- Speeds up any queries relating to this alternative partition and sort key
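A sketch of the keyword arguments you might pass to a boto3 DynamoDB client's create_table call, declaring one index of each type. The table, attribute, and index names are hypothetical, and on-demand billing (PAY_PER_REQUEST) is assumed so that no ProvisionedThroughput settings are needed. The LSI keeps the table's partition key with a new sort key, while the GSI uses a different partition key.

```python
def forum_table_definition() -> dict:
    # Keyword arguments for create_table; nothing is sent to AWS here
    return {
        "TableName": "ForumPosts",  # hypothetical
        "BillingMode": "PAY_PER_REQUEST",
        # Only attributes used in table or index keys are declared
        "AttributeDefinitions": [
            {"AttributeName": "UserID", "AttributeType": "S"},
            {"AttributeName": "PostedAt", "AttributeType": "N"},
            {"AttributeName": "Likes", "AttributeType": "N"},
            {"AttributeName": "Topic", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "UserID", "KeyType": "HASH"},     # partition key
            {"AttributeName": "PostedAt", "KeyType": "RANGE"},  # sort key
        ],
        # LSI: declared at creation; same partition key, different sort key
        "LocalSecondaryIndexes": [{
            "IndexName": "UserID-Likes-index",
            "KeySchema": [
                {"AttributeName": "UserID", "KeyType": "HASH"},
                {"AttributeName": "Likes", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        # GSI: a different partition key from the table
        "GlobalSecondaryIndexes": [{
            "IndexName": "Topic-PostedAt-index",
            "KeySchema": [
                {"AttributeName": "Topic", "KeyType": "HASH"},
                {"AttributeName": "PostedAt", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
    }
```

A GSI can also be added to an existing table later through update_table (GlobalSecondaryIndexUpdates with a Create action); there is no equivalent for LSIs.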
Indexes Exam Tips
- Indexes enable faster queries on data
- Give a different view of your data based on alternative partition/sort keys
- Differences between the 2 indexes:
- Local secondary index: created at table creation; same partition key, different sort key
- Global secondary index: created at any time; different partition key and different sort key
DynamoDB Scan vs Query API Calls
Query
- A query operation finds items in your table based on the primary key attribute and a distinct value to search for
- If looking for a user with an ID of 212, you would query on the primary key (user ID) with the value 212
- This returns all attributes for the matching items
- Can use an optional sort key name and value to refine results
- By default a query returns all attributes for an item
- The ProjectionExpression parameter lets you specify which attributes should be returned
- Results are always sorted by the sort key
- Ascending order by default (e.g. numeric order 1, 2, 3, 4)
- Can reverse the order by setting the ScanIndexForward parameter to false (tricky, because this is for Query, not Scan!)
- By default queries are eventually consistent
- Can be explicitly set to strongly consistent
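A sketch of the request parameters for a Query that uses the options above: an optional sort key condition, ProjectionExpression, ScanIndexForward set to false, and ConsistentRead. It only builds the keyword arguments for a boto3 client's query method; the table and attribute names (ForumPosts, UserID, PostedAt, Message) are hypothetical. Attribute names go through ExpressionAttributeNames placeholders to avoid clashes with DynamoDB reserved words.

```python
def latest_posts_query(user_id: str, since_epoch: int, limit: int = 10) -> dict:
    return {
        "TableName": "ForumPosts",  # hypothetical table
        # Partition key equality is required; the sort key condition is optional
        "KeyConditionExpression": "#pk = :uid AND #sk > :since",
        "ExpressionAttributeNames": {"#pk": "UserID", "#sk": "PostedAt", "#msg": "Message"},
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":since": {"N": str(since_epoch)},  # numbers are sent as strings
        },
        "ProjectionExpression": "#sk, #msg",  # return only these attributes
        "ScanIndexForward": False,            # descending sort-key order (newest first)
        "ConsistentRead": True,               # default would be eventually consistent
        "Limit": limit,
    }
```

With boto3 installed and credentials configured, this would be sent as boto3.client("dynamodb").query(**latest_posts_query("212", 1700000000)).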
Scan
- A scan operation examines every item in the table
- By default returns all data attributes
- Can also use the ProjectionExpression parameter to set which attributes to return
- Can filter the results, but the filter is only applied after the scan has read the items
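The cost of a Scan filter is easy to miss, so here is a pure-Python simulation (not the real service) of the order of operations: every item is read first, and only then are the filter and projection applied. The returned Count and ScannedCount mirror the fields of the same names in a real Scan response; everything else is illustrative.

```python
def simulated_scan(items, filter_fn=None, projection=None):
    """Mimic Scan: read every item, then filter, then project."""
    scanned = 0
    results = []
    for item in items:           # every item is examined (and read capacity spent)
        scanned += 1
        if filter_fn and not filter_fn(item):
            continue             # discarded only after it was read
        if projection:
            item = {k: item[k] for k in projection if k in item}
        results.append(item)
    # A real Scan response reports both numbers as Count and ScannedCount
    return {"Items": results, "Count": len(results), "ScannedCount": scanned}
```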
Query Or Scan
- Query is more efficient than Scan
- Scan reads the entire table, then filters out values to provide the desired result
- Scans take longer the larger the table gets
- A scan operation on a large table can use up the provisioned throughput for the table in just one single operation
- You can reduce the impact of a query or scan by setting a smaller page size
- e.g. set the page size (the Limit parameter) to 40
- Results in a larger number of operations, each with a smaller size
- Avoid using scan operations if you can
- Design tables so that you can use the Query, GetItem, or BatchGetItem APIs
- By default a scan operation processes data sequentially, returning results in 1 MB increments
- Only scans one partition at a time
- Can configure a scan to run in parallel by dividing a table or index into segments
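A sketch of a paginated, parallel scan. Each worker receives its own Segment out of TotalSegments, uses Limit as the page size (40 here, matching the example above), and follows LastEvaluatedKey/ExclusiveStartKey to fetch the next page. The scan_fn argument is assumed to behave like a boto3 DynamoDB client's scan method; passing it in keeps the sketch testable without AWS access. The thread pool and default of 4 segments are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_scan_params(table_name, total_segments, page_size=40):
    # One set of scan kwargs per worker; each worker reads only its own segment
    return [
        {
            "TableName": table_name,
            "Segment": segment,               # 0-based segment this worker owns
            "TotalSegments": total_segments,  # same value for every worker
            "Limit": page_size,               # smaller pages: less capacity per call
        }
        for segment in range(total_segments)
    ]


def scan_segment(scan_fn, params):
    """Page through one segment until no LastEvaluatedKey is returned."""
    items, start_key = [], None
    while True:
        call = dict(params)
        if start_key:
            call["ExclusiveStartKey"] = start_key  # resume where the last page ended
        page = scan_fn(**call)
        items.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return items


def parallel_scan(scan_fn, table_name, total_segments=4):
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        pages = pool.map(lambda p: scan_segment(scan_fn, p),
                         segment_scan_params(table_name, total_segments))
        return [item for segment_items in pages for item in segment_items]
```

For example, parallel_scan(boto3.client("dynamodb").scan, "ForumPosts"), where the table name is hypothetical.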
Scan and Query Exam Tips
- A query operation finds items in a table using only the primary key attribute
- You provide the primary key name and a distinct value to search for
- A scan operation examines every item in the table
- Both return all data attributes by default
- Use the ProjectionExpression parameter to refine results
- Query results are always sorted by sort key
- Sorted in ascending order
- Set the ScanIndexForward parameter to false to reverse the order (QUERIES ONLY)
- A query operation is generally more efficient than a scan
- Reduce the page size to make a scan less disruptive
- Can run a parallel scan
- Design tables to use the Query, GetItem, or BatchGetItem APIs
DynamoDB Provisioned Throughput
- Throughput is measured in capacity units
- Two types of capacity unit
- Write Capacity Unit
- 1 x Write Capacity Unit = 1 x 1 KB write per second
- Read Capacity Unit
- 1 x Read Capacity Unit = 1 x strongly consistent read of 4 KB per second OR 2 x eventually consistent reads of 4 KB per second (eventually consistent is the DEFAULT)
- If your application reads or writes larger items, it consumes more capacity units and costs more
Provisioned Throughput Exam Tips
- Provisioned throughput is measured in capacity units
- 1 x Write Capacity Unit = 1 x 1 KB write per second
- 1 x Read Capacity Unit = 1 x strongly consistent read of 4 KB per second OR 2 x eventually consistent reads of 4 KB per second (eventually consistent is the DEFAULT)
- When calculating: divide the item size by the unit size (4 KB for reads, 1 KB for writes) and round up, then multiply by the number of reads or writes per second; halve the result for eventually consistent reads
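The calculation above as a small Python helper, with a worked example: 10 strongly consistent reads per second of 6 KB items need ceil(6 / 4) = 2 units per read, so 20 RCU, or 10 RCU if eventually consistent. Rounding an odd eventually consistent total up to a whole unit is an assumption of this sketch.

```python
import math


def read_capacity_units(reads_per_sec, item_size_kb, strongly_consistent=True):
    # Each read is charged in 4 KB chunks, rounded up per item
    units_per_read = math.ceil(item_size_kb / 4)
    rcu = reads_per_sec * units_per_read
    if not strongly_consistent:
        rcu = math.ceil(rcu / 2)  # eventually consistent reads cost half
    return rcu


def write_capacity_units(writes_per_sec, item_size_kb):
    # Each write is charged in 1 KB chunks, rounded up per item
    return writes_per_sec * math.ceil(item_size_kb / 1)
```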
DynamoDB On-Demand Capacity
- New pricing option
- Charges apply for reading, writing, and storing data
- Do not need to specify read and write capacity
- Scales up and down based on the reads and writes to your database
- Ideal for unpredictable workloads
- Allows you to pay for only what you use (pay per request)
DynamoDB Accelerator (DAX)
- Fully managed, clustered in-memory cache for DynamoDB
- ONLY for READ operations
- Delivers up to a 10x read performance improvement
- Microsecond performance for millions of requests per second
- Ideal for read-heavy and bursty workloads
How does it work
- DAX is a write-through caching service
- When data is written to DynamoDB it is also written to DAX
- Point your DynamoDB API calls at the DAX cluster
- If the item is in DAX, it is returned from DAX
- If the item does not exist in DAX, DAX retrieves it from DynamoDB and writes it to the cache for further requests
- Allows you to reduce provisioned read capacity
Not GOOD for
- Caters ONLY for eventually consistent reads
- CANNOT do strongly consistent reads
- Write-intensive applications
- Applications that do not perform many read operations
- Applications that don't require microsecond response times
ElastiCache with DynamoDB
- In-memory cache that can sit in front of many databases, such as RDS
- It can also sit in front of DynamoDB
- Sits between the application and the database
- Takes load off the database
- Good if your database is particularly read-heavy and the data does not change frequently
- Supports Memcached and Redis
- 2 strategies available
- Lazy Loading
- Loads data into the cache only when necessary
- If the requested data is in the cache, ElastiCache returns it to the application
- If it is not in the cache or has expired, ElastiCache returns null
- The application then fetches the data from the database and writes it into the cache so that it is available next time
- Advantages:
- Only requested data cached
- Node failures are not fatal
- Disadvantages:
- Cache miss penalty (the initial request queries the database, then writes the data to the cache)
- Stale data: the cache is only updated on a cache miss, so its data can become out of date and is not automatically updated when the database changes
- Time To Live (TTL)
- How to deal with stale data
- Sets a number of seconds until data expires
- Hits on expired data are treated as a miss
- Write-Through
- Adds or updates data in the cache whenever data is written to the database
- Advantages:
- Data is never stale
- Users are generally more tolerant of additional latency when updating data than when retrieving it
- Disadvantages
- Write penalty: every write involves two trips (to the cache and to the database)
- If a node fails and a new one is spun up, data is missing until it is added or updated in the database
- This can be mitigated by implementing lazy loading in conjunction with write-through
- Wasted resources if most of the data is never read
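A minimal in-memory sketch of both strategies. The Cache class is a hypothetical stand-in for an ElastiCache node (a dict with a per-key TTL), and a plain dict plays the database; the clock is injectable so expiry can be demonstrated without waiting.

```python
import time


class Cache:
    """Tiny in-memory stand-in for an ElastiCache node, with a per-key TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl, self.clock, self.store = ttl_seconds, clock, {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or entry[1] <= self.clock():
            return None  # miss, or expired entry treated as a miss
        return entry[0]

    def set(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)


def lazy_load_read(cache, db, key):
    value = cache.get(key)
    if value is None:  # cache miss penalty: extra trip to the database
        value = db.get(key)
        if value is not None:
            cache.set(key, value)  # populate the cache for the next request
    return value


def write_through(cache, db, key, value):
    db[key] = value         # write to the database...
    cache.set(key, value)   # ...and to the cache (the write penalty)
```

Walking through it: a lazy-loaded value stays stale after the database changes until its TTL passes, whereas write_through updates the cache and the database together.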
DAX vs ElastiCache
- DAX is optimized just for DynamoDB
- DAX ONLY supports write-through
- If you need lazy loading, you have to use ElastiCache
DynamoDB Transactions
- DynamoDB Transactions were designed for mission critical operations
- Transactions
- ACID transactions (Atomic, Consistent, Isolated, Durable)
- Read or write multiple items across multiple tables as an all-or-nothing operation
- Check for a prerequisite condition before writing to a table
- Implement complex business logic in a single atomic transaction
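A sketch of the keyword arguments for a boto3 DynamoDB client's transact_write_items call that combines a prerequisite check with two writes, so either all three succeed or none are applied. The Customers, Orders, and Products tables and their attributes are hypothetical.

```python
def place_order_transaction(customer_id, order_id, product_id):
    return {
        "TransactItems": [
            {   # 1. Prerequisite: the customer must already exist
                "ConditionCheck": {
                    "TableName": "Customers",
                    "Key": {"CustomerId": {"S": customer_id}},
                    "ConditionExpression": "attribute_exists(CustomerId)",
                }
            },
            {   # 2. Create the order, but never overwrite an existing one
                "Put": {
                    "TableName": "Orders",
                    "Item": {
                        "OrderId": {"S": order_id},
                        "CustomerId": {"S": customer_id},
                        "ProductId": {"S": product_id},
                    },
                    "ConditionExpression": "attribute_not_exists(OrderId)",
                }
            },
            {   # 3. Decrement stock only if at least one unit is left
                "Update": {
                    "TableName": "Products",
                    "Key": {"ProductId": {"S": product_id}},
                    "UpdateExpression": "SET #stock = #stock - :one",
                    "ConditionExpression": "#stock >= :one",
                    "ExpressionAttributeNames": {"#stock": "Stock"},
                    "ExpressionAttributeValues": {":one": {"N": "1"}},
                }
            },
        ]
    }
```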
DynamoDB TTL
- DynamoDB Time To Live (TTL)
- Time To Live is an attribute that defines an expiry time for your data
- Expired items are marked for deletion
- Marked items are deleted within 48 hours
- Good for removing irrelevant or old data
- Session data
- Event logs
- Temporary data
- Reduces cost by removing data that is no longer relevant
- TTL is expressed as epoch/Unix time
- The numeric value represents the number of seconds that have elapsed since midnight UTC on January 1, 1970
- When the current time is greater than the TTL, the item becomes expired and is marked for deletion
- You can filter out expired items from your queries and scans
- This is useful because deletion can take up to 48 hours
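A sketch of the TTL mechanics: computing an epoch-seconds expiry, filtering expired items client-side, and equivalent FilterExpression parameters to add to a query or scan request. The attribute name expires_at is hypothetical. Items without the TTL attribute never expire, so both filters keep them.

```python
import time


def expiry_timestamp(seconds_from_now, now=None):
    # The TTL attribute must be a Number holding epoch seconds
    now = time.time() if now is None else now
    return int(now) + seconds_from_now


def not_expired(items, ttl_attribute="expires_at", now=None):
    # Expired items can linger until background deletion, so drop them here
    now = int(time.time() if now is None else now)
    return [i for i in items if ttl_attribute not in i or i[ttl_attribute] > now]


def unexpired_filter(now_epoch, ttl_attribute="expires_at"):
    # Extra kwargs to merge into a query/scan request
    return {
        "FilterExpression": "attribute_not_exists(#exp) OR #exp > :now",
        "ExpressionAttributeNames": {"#exp": ttl_attribute},
        "ExpressionAttributeValues": {":now": {"N": str(now_epoch)}},
    }
```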
DynamoDB Streams
- Streams are a time-ordered sequence of item-level modifications (insert, update, delete)
- Logs are encrypted at rest and stored for 24 hours
- Accessed using a dedicated endpoint
- By default the Primary key is recorded
- Before and After images can be captured
- Used for triggers
- Accessed through their own endpoints
- Events are recorded in near real time
- Really good for serverless applications and Lambda
- Applications can take actions based on the content of the stream
- Lambda can poll the stream and trigger functions based on stream events
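A minimal Lambda handler sketch for a DynamoDB stream trigger. Each stream record carries an eventName (INSERT, MODIFY, or REMOVE) and a dynamodb section with Keys; NewImage and OldImage appear only when the stream's view type captures them. What the function does with each change is left hypothetical; here it just collects a summary.

```python
def handler(event, context):
    """Summarize each change in a DynamoDB stream batch (sketch)."""
    actions = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        keys = change["Keys"]  # the primary key is always recorded
        if record["eventName"] == "INSERT":
            actions.append(("created", keys, change.get("NewImage")))
        elif record["eventName"] == "MODIFY":
            # Before/after images are present only if the view type includes them
            actions.append(("updated", keys, change.get("OldImage"), change.get("NewImage")))
        elif record["eventName"] == "REMOVE":
            actions.append(("deleted", keys, change.get("OldImage")))
    return actions
```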
Provisioned Throughput Exceeded Exception
- You will see this if your requests exceed the read/write capacity provisioned for a DynamoDB table
- The AWS SDKs automatically retry requests that receive this exception
- If not using the SDK:
- Reduce request frequency
- Implement exponential backoff
Exponential Backoff
- Components in a network can generate errors due to being overloaded
- Usually dealt with by implementing retries (which the SDK does)
- In addition to simple retries, the SDK also uses exponential backoff
- Progressively longer waits between consecutive retries (50 ms, 100 ms, 200 ms)
- If it still does not work after 1 minute, your request size may be exceeding the throughput for your read/write capacity
- Exponential backoff is used for more than DynamoDB: it is a feature of every AWS SDK
- Applies to many services in AWS
- S3
- CloudFormation
- SES
- If not using the SDK, you will need to implement this yourself in the application
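A sketch of retries with exponential backoff for code that does not use an AWS SDK. ThroughputExceeded is a hypothetical stand-in for whatever error your client raises on throttling. The 50 ms base delay, the retry limit, and the optional jitter (a randomized wait commonly recommended alongside backoff) are illustrative assumptions.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def with_backoff(operation, max_retries=6, base_delay=0.05, max_delay=20.0,
                 jitter=False, sleep=time.sleep):
    """Retry operation, doubling the wait each time (50 ms, 100 ms, 200 ms, ...)."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_retries:
                raise  # give up after the final retry
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay) if jitter else delay)
```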