Query 24 Hours' Worth of Data on Amazon DynamoDB Using Scan, Filter, and BatchGetItem, Without a GSI
I’m testing how to query data in DynamoDB when the job always retrieves yesterday’s data, without using a Global Secondary Index.
This is purely an exercise to explore other ways of querying data for a specific timeframe.
Data from DynamoDB needs to be batch processed daily (the last 24 hours) into an external datasource. Data is written to the base table as usual, but the HashKey (uuid) and RangeKey (timestamp) of each item are duplicated into a daily table. Only those two attributes are copied, and only that day’s items are written into the table whose name carries that day’s datestamp.
Let’s say data for 2018-10-30 needs to be written to our external datasource: we first do a Scan on table tbl-test_20181030, and from the response we get a list of keys (uuid) which we use in a BatchGetItem call against our base table, tbl-test_base. This effectively grabs all the data for that day.
If the day’s data needs to be narrowed down further, a FilterExpression can be applied to the Scan, so that only the filtered-down keys, and therefore only the matching items, are fetched from the base table.
Note: The base table might hold millions of items, so a Scan operation on the base table would be very expensive, as it reads every item in the table.
Once the data has been processed, the daily or metadata table can be removed.
DynamoDB Table Design
The base table: tbl-test_base will have:
HashKey: uuid (string)
RangeKey: timestamp (number)
Attributes: city, stream, transaction_date, name, metric_uri
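The schema above can be sketched as boto3 create_table parameters. This is a hypothetical setup snippet, not from the original article; the daily table would reuse the same keys under a datestamped name, and the billing mode is an assumption. Note that only key attributes belong in AttributeDefinitions:

```python
# Key schema for the base table described above; non-key attributes
# (city, stream, transaction_date, name, metric_uri) are schemaless.
BASE_TABLE_SPEC = {
    "TableName": "tbl-test_base",
    "KeySchema": [
        {"AttributeName": "uuid", "KeyType": "HASH"},        # HashKey
        {"AttributeName": "timestamp", "KeyType": "RANGE"},  # RangeKey
    ],
    "AttributeDefinitions": [
        {"AttributeName": "uuid", "AttributeType": "S"},      # string
        {"AttributeName": "timestamp", "AttributeType": "N"}, # number
    ],
    "BillingMode": "PAY_PER_REQUEST",  # assumption; provisioned works too
}


def create_base_table():
    """Create the base table from the spec above."""
    import boto3  # imported lazily so the spec is usable without AWS access
    return boto3.client("dynamodb").create_table(**BASE_TABLE_SPEC)
```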
Getting the data for 20181030, but also filtering on the timestamp attribute: only items with a timestamp greater than 1540841144 (epoch time), which gives us about 254 items.
BatchGetItem supports up to 100 items per call, so we will also limit the Scan to 100 items per page, then paginate using ExclusiveStartKey set to the LastEvaluatedKey we get from the previous response:
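Putting the two calls together, here is a minimal boto3 sketch of the flow, assuming the table names above; `chunk`, `fetch_day`, and `min_timestamp` are names I made up, and production code would also retry any `UnprocessedKeys` returned by BatchGetItem:

```python
DAILY_TABLE = "tbl-test_20181030"  # daily metadata table for the target day
BASE_TABLE = "tbl-test_base"


def chunk(items, size=100):
    """Yield slices of at most `size` keys; BatchGetItem caps at 100 per call."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fetch_day(min_timestamp: int):
    import boto3  # imported lazily so chunk() is testable without AWS access
    client = boto3.client("dynamodb")

    # 1. Paginated Scan of the small daily table to collect the keys.
    keys, start_key = [], None
    while True:
        params = {
            "TableName": DAILY_TABLE,
            "Limit": 100,
            # 'timestamp' is a DynamoDB reserved word, hence the #ts alias.
            "FilterExpression": "#ts > :min_ts",
            "ExpressionAttributeNames": {"#ts": "timestamp"},
            "ExpressionAttributeValues": {":min_ts": {"N": str(min_timestamp)}},
        }
        if start_key:
            params["ExclusiveStartKey"] = start_key
        resp = client.scan(**params)
        keys.extend(
            {"uuid": item["uuid"], "timestamp": item["timestamp"]}
            for item in resp["Items"]
        )
        start_key = resp.get("LastEvaluatedKey")
        if not start_key:
            break

    # 2. Hydrate the full items from the base table, 100 keys at a time.
    items = []
    for batch in chunk(keys):
        got = client.batch_get_item(RequestItems={BASE_TABLE: {"Keys": batch}})
        items.extend(got["Responses"][BASE_TABLE])
        # Production code: retry got.get("UnprocessedKeys") with backoff.
    return items
```

Two caveats worth noting: a FilterExpression is applied after the read, so the Scan still consumes read capacity for every item in the daily table (cheap here, since it only holds keys), and `Limit` bounds items evaluated per page, not items returned after filtering.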