How does the DynamoDB Item Size Calculator and Capacity Simulator work?
The DynamoDB Item Size Calculator and Capacity Simulator is a free online tool that calculates the size of DynamoDB items. It provides detailed information on how much space an item takes, which helps determine the required Read Capacity Units (RCU) and Write Capacity Units (WCU) for various operations. Additionally, the tool helps assess table capacity based on the calculated item size, providing information on how RCUs and WCUs will impact the number of specific operations the table can handle before throttling.
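As a rough illustration of how item size translates into capacity, the sketch below applies the publicly documented DynamoDB rules: one WCU covers one standard write per second for an item of up to 1 KB, and one RCU covers one strongly consistent read per second for an item of up to 4 KB (half that for eventually consistent reads, double for transactional operations). The function names are hypothetical and are not taken from the tool itself.

```typescript
type ReadMode = 'eventual' | 'strong' | 'transactional';

// Capacity units consumed by a single write of an item of the given size.
function writeCapacityUnits(itemSizeBytes: number, transactional = false): number {
  const units = Math.max(1, Math.ceil(itemSizeBytes / 1024)); // rounded up to the next 1 KB
  return transactional ? units * 2 : units;
}

// Capacity units consumed by a single read of an item of the given size.
function readCapacityUnits(itemSizeBytes: number, mode: ReadMode = 'strong'): number {
  const units = Math.max(1, Math.ceil(itemSizeBytes / 4096)); // rounded up to the next 4 KB
  if (mode === 'eventual') return units / 2;
  if (mode === 'transactional') return units * 2;
  return units;
}

// How many operations per second a provisioned capacity can sustain before throttling.
function operationsPerSecond(provisionedUnits: number, unitsPerOperation: number): number {
  return Math.floor(provisionedUnits / unitsPerOperation);
}

console.log(writeCapacityUnits(1500));                                   // 2
console.log(readCapacityUnits(5000, 'strong'));                          // 2
console.log(operationsPerSecond(10, readCapacityUnits(100, 'eventual'))); // 20
```

Because both unit types round up per item, a small change in item size near a 1 KB or 4 KB boundary can double the cost of an operation, which is why an exact byte count matters.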
How item size is calculated in DynamoDB
The DynamoDB engine uses a specific JSON format with a strict data type for every attribute (conversion to this format is available directly in the tool or in the more advanced, dedicated DynamoDB two-way converter tool). The size of an item is the sum of the sizes of its attributes, and each attribute contributes the size of its name plus the size of its value, which depends on the data type. AWS publicly documents how to assess item size in DynamoDB Item sizes and formats.
String (S): Represents a string data type.
Strings are Unicode with UTF-8 binary encoding. The size of a string is the number of UTF-8-encoded bytes of the attribute name plus the number of UTF-8-encoded bytes of the string value.
Example:
{ "id": { "S": "uniqueIdString" } }
First, encode both the attribute name and string value into UTF-8 and then calculate the byte length. This can be evaluated with the following Node.js function:
Buffer.byteLength(key, 'utf8');
Thus, the calculation will be:
id (2) + uniqueIdString (14) = 16 bytes
Common issue:
A common error is to measure the attribute name and the string value by their character length instead of their UTF-8 byte length. The two differ as soon as a non-ASCII character appears, as in the following example:
{ "caféName": { "S": "Mocca" } }
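The correct calculation is:
caféName (9) + Mocca (5) = 14 bytes
The character é takes 2 bytes in UTF-8, so caféName is 9 bytes long even though it has only 8 characters. If you consider only the character length, the total comes to 13 bytes, which is incorrect.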
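The difference between the two approaches can be sketched in a few lines of TypeScript. The function names below are hypothetical; only Buffer.byteLength comes from Node.js.

```typescript
// Size of a String (S) attribute: UTF-8 bytes of the name plus UTF-8 bytes of the value.
function stringAttributeSize(name: string, value: string): number {
  return Buffer.byteLength(name, 'utf8') + Buffer.byteLength(value, 'utf8');
}

// Naive version that counts characters (UTF-16 code units) instead of bytes.
// It under-counts as soon as the text contains non-ASCII characters.
function naiveStringAttributeSize(name: string, value: string): number {
  return name.length + value.length;
}

// \u00e9 is the precomposed character "é".
console.log(stringAttributeSize('id', 'uniqueIdString'));         // 16
console.log(stringAttributeSize('caf\u00e9Name', 'Mocca'));       // 14
console.log(naiveStringAttributeSize('caf\u00e9Name', 'Mocca'));  // 13
```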
Number (N): Represents a number.
Numbers are variable length, with up to 38 significant digits. Leading and trailing zeroes are trimmed. The size of a number is approximately the number of UTF-8-encoded bytes of the attribute name plus 1 byte per two significant digits plus 1 byte.
Example:
{ "id": { "N": "777" } }
Start by determining which digits are insignificant and drop them. These are the leading zeros before the first non-zero digit (e.g., in 002321 only 2321 is significant) and the trailing zeros after the last non-zero digit of a decimal fraction (e.g., 0.00200 is treated as 0.002).
With only the significant digits left, divide their count by 2 (1 byte per two significant digits). Because AWS states that this is only an approximation, it is safer to round the result up using the ceil function.
Math.ceil((significantDigits ? significantDigits.length : 0) / 2);
Finally, add 1 byte of overhead for the Number data type. Note that negative numbers consume an additional byte; this is not officially disclosed by AWS but is easy to reverse-engineer. Thus, the calculation will be:
id (2) + 777 (2) + (1) = 5 bytes
and for
{ "id": { "N": "7777" } }
It will still be 5 bytes because we count only 1 byte per two significant digits, so
7777: Math.ceil("7777".length / 2) = 2
id (2) + 7777 (2) + (1) = 5 bytes
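The steps above can be put together in a short TypeScript sketch. It rests on several assumptions: the number arrives as a plain decimal string (no exponent notation), leading and trailing zeros are trimmed regardless of where the decimal point sits (so "100" counts one significant digit, which may differ from what the tool does), and a negative sign costs one extra byte as described above. The function names are hypothetical.

```typescript
// Count the significant digits of a number given as a decimal string,
// e.g. "002321" -> 4, "0.00200" -> 1, "-7777" -> 4.
function significantDigitCount(value: string): number {
  const digits = value
    .replace(/^[-+]/, '') // the sign is not a digit
    .replace('.', '')     // drop the decimal point
    .replace(/^0+/, '')   // trim leading zeros
    .replace(/0+$/, '');  // trim trailing zeros (assumption, see the text above)
  return digits.length;
}

// Approximate size of a Number (N) attribute.
function numberAttributeSize(name: string, value: string): number {
  const nameBytes = Buffer.byteLength(name, 'utf8');
  const digitBytes = Math.ceil(significantDigitCount(value) / 2); // 1 byte per two digits
  const signByte = value.startsWith('-') ? 1 : 0; // reverse-engineered, not documented by AWS
  return nameBytes + digitBytes + 1 + signByte;   // + 1 byte of Number overhead
}

console.log(numberAttributeSize('id', '777'));  // 5
console.log(numberAttributeSize('id', '7777')); // 5
console.log(numberAttributeSize('id', '-777')); // 6
```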
Null (NULL): Represents a null value.
The size of a null attribute or a Boolean attribute is the number of UTF-8-encoded bytes of the attribute name plus 1 byte.
Example:
{ "phoneNumber": { "NULL": true } }
As with the String data type, calculate the UTF-8 byte length of the attribute name, then add 1 byte for the value. Thus, the calculation will be:
phoneNumber (11) + NULL (1) = 12 bytes
Boolean (BOOL): Represents a boolean value (true or false).
The size of a Boolean attribute is the number of UTF-8-encoded bytes of the attribute name plus 1 byte.
Example:
{ "isActive": { "BOOL": true } }
Thus, the calculation will be:
isActive (8) + BOOL (1) = 9 bytes
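Since Null and Boolean attributes follow the same rule, one small helper covers both. This is only a sketch, and the function name is hypothetical.

```typescript
// Size of a NULL or BOOL attribute: UTF-8 bytes of the name plus 1 byte for the value.
function flagAttributeSize(name: string): number {
  return Buffer.byteLength(name, 'utf8') + 1;
}

console.log(flagAttributeSize('phoneNumber')); // 12
console.log(flagAttributeSize('isActive'));    // 9
```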
Binary (B): Represents binary data.
A binary value must be encoded in base64 format before it can be sent to DynamoDB, but the value's raw byte length is used for calculating size. The size of a binary attribute is the number of UTF-8-encoded bytes of the attribute name plus the number of raw bytes.
Example:
{ "picture": { "B": "U29tZSBiaW5hcnkgZGF0YQ==" } }
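Similar to strings, use the byteLength function to calculate the raw byte length, this time with the 'base64' encoding so that the decoded bytes are counted rather than the base64 characters.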
Thus, the calculation will be:
picture (7) + "U29tZSBiaW5hcnkgZGF0YQ==" (16) = 23 bytes
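A minimal sketch of this calculation, assuming the binary value arrives as a base64 string as in the example above (the function name is hypothetical):

```typescript
// Size of a Binary (B) attribute: UTF-8 bytes of the name plus the raw (decoded) byte length.
function binaryAttributeSize(name: string, base64Value: string): number {
  // With the 'base64' encoding, Buffer.byteLength reports the decoded length.
  return Buffer.byteLength(name, 'utf8') + Buffer.byteLength(base64Value, 'base64');
}

const picture = 'U29tZSBiaW5hcnkgZGF0YQ==';
console.log(picture.length);                          // 24 base64 characters
console.log(Buffer.byteLength(picture, 'base64'));    // 16 raw bytes
console.log(binaryAttributeSize('picture', picture)); // 23
```

Counting the 24 base64 characters instead of the 16 decoded bytes would overestimate the value by a third.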
List (L): Represents an ordered collection of values.
An attribute of type List or Map requires 3 bytes of overhead, regardless of its contents. The size of a List or Map is the number of UTF-8-encoded bytes of the attribute name plus the sum of the sizes of nested elements plus 3 bytes. The size of an empty List or Map is the number of UTF-8-encoded bytes of the attribute name plus 3 bytes.
Each List or Map element also requires 1 byte of overhead.
Example:
{"interests": { "L": [
{ "S": "Reading" },
{ "S": "Traveling" },
{ "N": "77" }
]}}
The easiest way to calculate a list is to use the reduce function with recursion to iterate over all list elements. The starting value for an empty list is 3, and each list item adds its own size plus 1 byte of overhead. It could look like this:
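type DynamoDBValue = { [dataType: string]: unknown };

function calculateSizes(key: string, value: DynamoDBValue): number {
  const dataType = Object.keys(value)[0];
  const dataValue = value[dataType];
  // Attribute name size in UTF-8 bytes; list elements pass an empty key.
  const keySize = Buffer.byteLength(key, 'utf8');
  let valueSize = 0;
  switch (dataType) {
    // Cases for the other data types (S, N, BOOL, ...) are omitted here.
    case 'L':
      // 3 bytes of list overhead, plus each element's size and 1 byte per element.
      valueSize = (dataValue as DynamoDBValue[]).reduce((acc, item) => acc + calculateSizes('', item) + 1, 3);
      break;
    default:
      throw new Error(`Unknown data type: ${dataType}`);
  }
  return keySize + valueSize;
}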
Thus, the calculation will be:
interests (9) + (3) + Reading (7) + (1) + Traveling (9) + (1) + 77 (2) + (1) = 33 bytes
Map (M): Represents a collection of key-value pairs.
Maps are calculated similarly to Lists but are a little harder to implement. A Map is larger than a List holding the same values because every entry also carries an attribute name, whereas a List element carries only its data type.
Example:
{ "address": { "M": {
"street": { "S": "Wall Street" },
"city": { "S": "New York" },
"state": { "S": "NY" },
"zipCode": { "S": "12345" }
}}}
Thus, the calculation will be:
address (7) + (3) + street (6) + Wall Street (11) + (1) + city (4) + New York (8) + (1) + state (5) + NY (2) + (1) + zipCode (7) + 12345 (5) + (1) = 62 bytes
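Lists and Maps can share one recursive routine. The sketch below is an illustration under stated assumptions, not the tool's actual code: the AttributeValue type is a hypothetical, reduced model that covers only S, N, L and M, and negative-number handling is left out for brevity.

```typescript
// Hypothetical reduced model of a DynamoDB-JSON value.
type AttributeValue =
  | { S: string }
  | { N: string }
  | { L: AttributeValue[] }
  | { M: { [name: string]: AttributeValue } };

const utf8Bytes = (text: string): number => Buffer.byteLength(text, 'utf8');

// Size of a value on its own, without any attribute name.
function valueSize(value: AttributeValue): number {
  if ('S' in value) return utf8Bytes(value.S);
  if ('N' in value) {
    const digits = value.N.replace('.', '').replace(/^0+/, '').replace(/0+$/, '');
    return Math.ceil(digits.length / 2) + 1;
  }
  if ('L' in value) {
    // 3 bytes of overhead, then each element plus 1 byte per element.
    return value.L.reduce((total, element) => total + valueSize(element) + 1, 3);
  }
  // Map: like a List, but every entry also pays for its own name.
  return Object.entries(value.M).reduce(
    (total, [name, nested]) => total + utf8Bytes(name) + valueSize(nested) + 1,
    3,
  );
}

function attributeSize(name: string, value: AttributeValue): number {
  return utf8Bytes(name) + valueSize(value);
}

const interests: AttributeValue = { L: [{ S: 'Reading' }, { S: 'Traveling' }, { N: '77' }] };
console.log(attributeSize('interests', interests)); // 33
```

Passing a Map such as the address example through attributeSize works the same way, and nesting a List inside a Map (or the reverse) needs no extra code because valueSize calls itself.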
String Set (SS): Represents a set of unique strings.
AWS doesn't provide official guidelines for String Sets, but empirical tests indicate that the size is the sum of the string sizes, without the initial overhead and per-item overhead that apply to Lists.
Example:
{ "interests": { "SS": ["Reading", "Traveling", "Cooking"] } }
Thus, the calculation will be:
interests (9) + Reading (7) + Traveling (9) + Cooking (7) = 32 bytes
Number Set (NS): Represents a set of unique numbers.
As with String Sets, only empirical tests were available to reverse-engineer how AWS calculates this DynamoDB data type. In short, each element is calculated the same way as a Number, and the results are summed for the total size.
Example:
{
"favoriteNumbers": { "NS": ["7", "14", "21"] }
}
Thus, the calculation will be as follows:
favoriteNumbers (15) + 7 (2) + 14 (2) + 21 (2) = 21 bytes
Binary Set (BS): Represents a set of unique binary values.
Similarly to String Sets and Number Sets, AWS hasn't provided specific instructions on how to calculate them. However, tests have shown that it is just a sum of calculations related to binary data types.
Example:
{
"preferences": { "BS": [
"U29tZSBiaW5hcnkgZGF0YTE=",
"U29tZSBiaW5hcnkgZGF0YTI=",
"U29tZSBiaW5hcnkgZGF0YTM="
]}
}
Thus, the calculation will be as follows:
preferences (11) + "U29tZSBiaW5hcnkgZGF0YTE=" (17) + "U29tZSBiaW5hcnkgZGF0YTI=" (17) + "U29tZSBiaW5hcnkgZGF0YTM=" (17) = 62 bytes
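The three set types can be sketched with one helper. This rests on the empirical observations described above rather than on AWS documentation, and it assumes that every element is sized exactly like a single value of the matching type (UTF-8 bytes for strings, 1 byte per two significant digits plus 1 byte for numbers, decoded bytes for base64 binaries) with no per-set or per-element overhead. The type and function names are hypothetical.

```typescript
// Hypothetical model of the three DynamoDB set types.
type SetValue = { SS: string[] } | { NS: string[] } | { BS: string[] };

// Size of one number element: 1 byte per two significant digits plus 1 byte.
function numberElementSize(value: string): number {
  const digits = value.replace(/^-/, '').replace('.', '').replace(/^0+/, '').replace(/0+$/, '');
  return Math.ceil(digits.length / 2) + 1 + (value.startsWith('-') ? 1 : 0);
}

// Size of a set attribute: name bytes plus the plain sum of the element sizes.
function setAttributeSize(name: string, value: SetValue): number {
  const nameBytes = Buffer.byteLength(name, 'utf8');
  if ('SS' in value) {
    return value.SS.reduce((total, s) => total + Buffer.byteLength(s, 'utf8'), nameBytes);
  }
  if ('NS' in value) {
    return value.NS.reduce((total, n) => total + numberElementSize(n), nameBytes);
  }
  // BS: count the decoded bytes, not the base64 characters.
  return value.BS.reduce((total, b) => total + Buffer.byteLength(b, 'base64'), nameBytes);
}

console.log(setAttributeSize('interests', { SS: ['Reading', 'Traveling', 'Cooking'] })); // 32
```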