Hello.
This is a translation of the first article in a series comparing the services offered by Windows Azure and Amazon, written by Gaurav Mantri, a specialist well known in cloud circles.
In this article, I will compare Windows Azure Table Storage and Amazon DynamoDB — WATS and ADDB, respectively.
In terms of functionality, WATS and ADDB provide similar capabilities. Both are NoSQL systems for storing large amounts of data. Amazon also has another NoSQL database, SimpleDB.
An important point to note is that ADDB is not just a NoSQL database; it is a database service. Yes, it is used to manage data, but you also control how far the system scales by specifying the throughput you need. In this sense it closely resembles the compute instances of Amazon or Windows Azure: with a compute service you choose the instance size you need, and the system provisions it. Similarly, with ADDB you tell the system how many read and write operations your application will perform against a table, and ADDB allocates the required capacity.
Conceptually, both systems are similar:
- Both are non-relational (NoSQL) systems.
- Both are essentially key-value stores.
- Neither supports relationships of the kind available in a relational database.
- Both are designed for high availability and scalability.
- Both systems provide a REST API for working with tables and data, plus libraries for high-level languages, which are usually wrappers around the REST API. In both systems each API release has its own version, expressed as a date. At the time of writing, the current versions are: WATS - 2011-08-18, ADDB - 2011-12-05.
Naturally, there are several significant differences:
- In ADDB, you allocate the throughput you need when you start working with the system; in WATS, throughput is controlled by the system. ADDB is therefore more flexible, but it requires more up-front planning.
- Unlike SimpleDB, where a domain is limited to 10 GB, ADDB has no such restriction: you can store as much data as you like. WATS also places no hard limit on the data in a table, but you are limited by the size of the storage account (currently 100 TB).
- ADDB can index your data on attributes you choose, unlike WATS. Technically, WATS also indexes your data, but only on fixed attributes (PartitionKey, RowKey), and secondary indexes in WATS are one of the most requested features.
Concepts
Table: when we think of a table, the first thing that comes to mind is "something that consists of rows and columns". A table in WATS or ADDB may look like that, but in fact it is not. Think of a table as a container holding collections of key-value pairs that represent data. In the relational model, we define columns for a table, the rows contain the data, and columns must be defined before data can be stored. A table in WATS or ADDB has no schema, so there is no need to define columns. In short, think of the table as a bag into which you put the data you need.
Although conceptually the tables in both systems are containers for storing data, there are several differences between them:
- By default, ADDB limits you to 256 tables (this can be increased on request). WATS has no such restriction: you can have as many tables as you like, subject to the storage account limit (currently 100 TB).
- When you define an ADDB table, you must define a primary key for it, consisting of one attribute or two. All objects in the table must have unique primary key values. In WATS, the primary key is predefined by the system and is a combination of the PartitionKey and RowKey attributes. Each entity in the table must have a unique combination of these attributes.
- When creating a table in ADDB, you must specify the provisioned throughput (the number of read and write operations), a concept that does not exist in WATS. You can later change this throughput through the ADDB API. When the provisioned throughput is exceeded, ADDB begins to throttle requests.
Entity and object: these are what hold the data in a table. Each entity (in WATS) or object (item, in ADDB) consists of one or more attributes. Each attribute is a name-value pair (name, value, and data type in WATS). In a relational database, this would be a row. Here, a row in a table has no links to other rows. Each entity in WATS is uniquely identified by two attributes, PartitionKey and RowKey; treat them as a composite primary key. Each entity must have a unique combination of these attributes. In ADDB, each object is uniquely identified by a primary key, which is made up of attributes of the object. All objects in an ADDB table must have a primary key.
There are several differences between an entity and an object:
- In WATS, an entity has a maximum of 256 attributes; in ADDB there is no limit. Each WATS entity has three system attributes (PartitionKey, RowKey, and Timestamp), so the number of user-defined attributes is reduced to 253. You set the values of PartitionKey and RowKey yourself, while Timestamp is set by the system and contains the time the entity was last modified. PartitionKey and RowKey are of type String.
- The maximum size of an entity in WATS is 1 MB; the maximum size of an ADDB object is 64 KB.
- In WATS, attribute values can have one of 8 data types: Binary, Boolean, DateTime, Double, Int32, Int64, Guid, and String, which provides a rich data model. In ADDB, the available types are String, Number, and String / Number Sets (arrays of strings or numbers).
- In WATS, data is indexed only by PartitionKey and RowKey; indexing by other attributes is not yet available. Data in Windows Azure is partitioned by the value of PartitionKey, so it must be chosen carefully, since a poor choice can significantly reduce performance. A wonderful article can be read here. In ADDB, data is indexed by the attributes that make up the primary key of the table.
ADDB supports two types of primary keys:
- Hash Type Primary Key: the primary key consists of a single attribute, the hash attribute. ADDB builds an unordered hash index on this attribute.
- Hash and Range Type Primary Key: the primary key consists of two attributes. The first is the hash attribute, the second is the range attribute. ADDB builds an unordered hash index on the hash attribute and a sorted range index on the range attribute.
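The two index types above can be pictured with a small in-memory model. This is only an illustrative sketch, not how ADDB is implemented: the class `HashRangeTable` and its methods are hypothetical names. A dictionary stands in for the unordered hash index, and a sorted list per hash value stands in for the range index.

```python
import bisect


class HashRangeTable:
    """Toy model of a hash-and-range primary key (hypothetical, for illustration)."""

    def __init__(self):
        self._range_keys = {}  # hash key -> sorted list of range keys
        self._items = {}       # (hash key, range key) -> stored object

    def put(self, hash_key, range_key, item):
        keys = self._range_keys.setdefault(hash_key, [])
        if (hash_key, range_key) not in self._items:
            bisect.insort(keys, range_key)  # keep the range index sorted
        self._items[(hash_key, range_key)] = item

    def query(self, hash_key, low, high):
        """Return objects with this hash key whose range key is in [low, high]."""
        keys = self._range_keys.get(hash_key, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [self._items[(hash_key, k)] for k in keys[start:end]]
```

Looking up the hash key is a direct dictionary access, while the range condition is answered by a binary search over sorted keys, which is why a range attribute makes ordered queries within one hash value cheap.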
Throughput allocation
One of the most important features of ADDB is provisioned throughput, which lets you configure the throughput your application needs. In short, it determines how many reads and writes per second can be performed on an ADDB table. Based on the values you provide, ADDB allocates the appropriate resources, and you can update the configuration on the fly using the API or the Amazon Management Console.
Provisioned throughput is expressed in two terms: Read Capacity Units for read operations and Write Capacity Units for write operations.
A Read Capacity Unit (RCU) is defined as one read operation per second on an object of up to 1 KB. So, if you requested 10 RCUs, you can perform consistent reads on 10 objects of up to 1 KB each per second. If an object is larger than 1 KB, the number of objects you can read per second will be lower. For example, if your objects are between 1 and 2 KB in size, you can perform only 5 consistent reads per second before the system starts throttling you. If you use eventually consistent reads (instead of consistent reads), the throughput usually doubles: with 10 RCUs you can perform 20 eventually consistent reads per second on objects of 1 KB or less.
The Write Capacity Unit (WCU) is defined similarly: one write operation per second on an object of up to 1 KB. If you requested 10 WCUs, you can write 10 objects of up to 1 KB each per second. If the object size exceeds 1 KB, the number of objects you can write per second decreases. For example, if your objects are between 1 and 2 KB in size, you can perform 5 writes per second before the system starts throttling.
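The capacity-unit arithmetic described above can be sketched as follows. This is a minimal illustration under the rules stated in this article (sizes round up to the next 1 KB, eventually consistent reads cost half as much); the function names are hypothetical and are not part of any Amazon library.

```python
import math


def read_units(item_size_kb, reads_per_second, consistent=True):
    """Estimate the read capacity units needed for a given workload."""
    units_per_read = max(1, math.ceil(item_size_kb))  # round up to 1 KB blocks
    units = units_per_read * reads_per_second
    if not consistent:
        # Eventually consistent reads get roughly double the throughput.
        units = math.ceil(units / 2)
    return units


def write_units(item_size_kb, writes_per_second):
    """Estimate the write capacity units needed for a given workload."""
    return max(1, math.ceil(item_size_kb)) * writes_per_second
```

For instance, `read_units(1.5, 5)` and `write_units(1.5, 5)` both come out at 10 units, matching the 1 to 2 KB examples in the text.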
Please note that provisioned throughput has its own pricing, as ADDB prices are set separately from other services. Essentially, you pay for the read and write capacity you reserve. At the time of this writing, you pay $0.01 / hour for every 10 units of write capacity and $0.01 / hour for every 50 units of read capacity in the US East (Virginia) data center. The pricing resembles that of compute instances, where you request a virtual machine of a certain size (with a certain CPU capacity and RAM) and pay for it hourly, regardless of whether you fully utilize it. Similarly, in ADDB you pay by the hour for the throughput you requested from Amazon, regardless of how much of it you use.
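As a rough sketch of this pricing model, the hourly charge can be computed from the reserved units. The prices below are the US East figures quoted in this article at the time of writing ($0.01 per hour per 10 write units and per 50 read units) and should be treated as assumptions; the function name is hypothetical.

```python
# Prices quoted in the article for US East; assumed, and likely outdated.
PRICE_PER_10_WRITE_UNITS = 0.01  # USD per hour
PRICE_PER_50_READ_UNITS = 0.01   # USD per hour


def hourly_cost(read_capacity_units, write_capacity_units):
    """Hourly charge in USD for the reserved capacity, used or not."""
    read_cost = read_capacity_units / 50 * PRICE_PER_50_READ_UNITS
    write_cost = write_capacity_units / 10 * PRICE_PER_10_WRITE_UNITS
    return read_cost + write_cost
```

Calling `hourly_cost(5, 5)` gives the charge for the smallest table configuration, roughly $0.006 per hour under these assumed prices.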
When it comes to provisioned throughput, there are a few things to consider:
- It is configured per table.
- The minimum throughput is 5 RCU and 5 WCU per table, so for each table you pay at least $0.001 ($0.01 * 5/50) per hour for consistent reads and $0.005 per hour for writes, even if you do not use the table.
- An increase or decrease in provisioned throughput must differ from the previous value by at least 10%. For example, if you now have 100 read capacity units and want to increase this value, the new value must be 110 or greater.
- When you increase capacity, you can at most double the current value in a single request. For example, if you now have 100 read capacity units, you can increase this value to at most 200.
- You can reduce the provisioned throughput only once a day.
- By default, you can allocate at most 10,000 read capacity units and 10,000 write capacity units per table. Across all tables in an account, the default maximum is 20,000 read capacity units and 20,000 write capacity units. These values can be increased by contacting Amazon.
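The rules in the list above lend themselves to a client-side sanity check before a change is requested. The following is a sketch only: the function name and messages are hypothetical, the limits are the defaults quoted in this article, and the once-a-day rule for decreases is not modeled.

```python
def check_capacity_change(current, requested, minimum=5, maximum=10_000):
    """Return a list of rule violations; an empty list means the change looks valid."""
    problems = []
    if requested < minimum:
        problems.append(f"below the minimum of {minimum} units")
    if requested > maximum:
        problems.append(f"above the per-table maximum of {maximum} units")
    if requested > current * 2:
        problems.append("an increase may at most double the current value")
    # Integer arithmetic avoids floating-point surprises in the 10% rule.
    if abs(requested - current) * 10 < current:
        problems.append("the change must be at least 10% of the current value")
    return problems
```

With 100 units currently allocated, a request for 110 or 200 passes, while 105 (less than 10%) and 250 (more than double) are reported as violations.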
Prices
Before we discuss the functionality each system provides, let's look at pricing. Neither system has up-front costs. Pricing consists of the following components:
- Transactions: in WATS, you pay for the number of transactions at a fixed rate ($0.01 per 10,000 transactions). To calculate the final price, multiply the number of transactions by that rate.
- Provisioned throughput: in ADDB, you pay for the throughput you allocate, at fixed prices for read and write capacity. You can calculate the total price by multiplying the number of allocated RCUs and WCUs by the price per hour.
- Data transfer: you pay for the amount of data transferred to and from the system. At the time of writing, both systems provide free incoming traffic. Data transferred between ADDB and Amazon EC2 within one region is free. Data transferred between ADDB and Amazon EC2 in different regions is charged at the standard rates. In WATS only outgoing traffic is charged.
Pricing in ADDB is more predictable than in WATS; however, you need to estimate the required throughput correctly, so that you neither pay for capacity you do not use nor have your requests throttled by the system.
Feature list
Let us consider each function from the list in more detail.
| | WATS | ADDB |
|---|---|---|
| Create Table / CreateTable | Yes | Yes |
As the name suggests, this function creates a table in WATS or ADDB. Unlike CreateDomain in SimpleDB, which is idempotent, this operation is not idempotent in ADDB: if you try to create a table with the name of an existing table, the system throws an error.
There are several table naming rules, summarized in the table below.
| | WATS | ADDB |
|---|---|---|
| Min / Max Length | 3/63 | 3/255 |
| Case sensitivity | Mixed case | Mixed case |
| Allowed characters | Alphanumeric | Alphanumeric, hyphen (-), underscore (_), period (.) |
There are a few more points:
- In WATS, a table name cannot begin with a digit. Table names keep the case in which they were created, but they are case-insensitive when used.
- As mentioned above, by default you can create up to 256 tables per ADDB account. To increase this value, you can send a request to Amazon: http://www.amazon.com/gp/html-forms-controller/DynamoDB_Limit_Increase_Form
- This operation is asynchronous in ADDB, whereas in WATS it is synchronous. When ADDB receives a request to create a table, it starts a number of processes (resource allocation), and you cannot use the table until they have all completed and the table is in the Active state.
- When creating a table in ADDB, you must specify its primary key and the required throughput. The throughput can be changed later using UpdateTable, but the primary key cannot.
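The naming rules above can be checked on the client before a table is created. The patterns below are a sketch derived only from the rules listed in this article (length limits, allowed characters, and the WATS rule about the first character); the function names are hypothetical.

```python
import re

# WATS: 3-63 alphanumeric characters, must not begin with a digit.
WATS_TABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{2,62}")
# ADDB: 3-255 characters; alphanumeric plus hyphen, underscore, and period.
ADDB_TABLE_NAME = re.compile(r"[A-Za-z0-9_.-]{3,255}")


def is_valid_wats_table_name(name):
    return WATS_TABLE_NAME.fullmatch(name) is not None


def is_valid_addb_table_name(name):
    return ADDB_TABLE_NAME.fullmatch(name) is not None
```

A name such as `Orders` is accepted by both checks, while `my-table.v1` is accepted only by the ADDB check.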
| | WATS | ADDB |
|---|---|---|
| Query Tables / ListTables | Yes | Yes |
The function returns a list of tables. A single call returns up to 1000 tables in WATS and all tables in ADDB. If more tables remain, a continuation token is also returned, which gives access to the next set of tables.
| | WATS | ADDB |
|---|---|---|
| Maximum number of entries per function call | 1000 | - |
| Return continuation token | Yes | Yes |
| | WATS | ADDB |
|---|---|---|
| Delete Table / DeleteTable | Yes | Yes |
The function deletes the table. In ADDB this operation is not idempotent.
To delete a table in ADDB, the table must be in the Active state. The operation is asynchronous in ADDB. In WATS it appears to be synchronous, but it is in fact asynchronous as well. When a delete request is sent to WATS, the system marks the table for deletion and makes it unavailable; the table is actually removed only during garbage collection, so the time the deletion takes can vary with the amount of data in the table. In my experience, deleting a very large table can take hours. During this time, an attempt to create a table with the same name results in an error (Conflict error - HTTP Status Code 409).
| | WATS | ADDB |
|---|---|---|
| UpdateTable | No | Yes |
The function is used to update the provisioned throughput of a table in ADDB. You can increase or decrease the throughput, provided that:
- The new throughput is within the limits and does not break the rules (see the section on throughput allocation above).
- The table is in the Active state.
| | WATS | ADDB |
|---|---|---|
| DescribeTable | No | Yes |
The DescribeTable function is used to get the following information about a table:
- CreationDateTime: the date the table was created, in UNIX epoch time.
- ItemCount: the number of objects in the table. It is updated approximately every 6 hours, so recent changes may not be reflected in this value immediately.
- KeySchema: the primary key structure (simple or composite).
- ProvisionedThroughput: the throughput for the table, consisting of LastIncreaseDateTime (if available), LastDecreaseDateTime (if available), ReadCapacityUnits, and WriteCapacityUnits. If the throughput of the table has never been changed, ADDB does not return the LastIncreaseDateTime and LastDecreaseDateTime values.
- TableSizeBytes: the total size of the table in bytes. Amazon DynamoDB updates this value approximately every 6 hours, so recent changes may not be reflected in this value immediately.
- TableStatus: the current state of the table (CREATING, ACTIVE, DELETING, or UPDATING).
Please note that the results of this operation are eventually consistent, so you are not guaranteed to receive the latest values.
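Because table creation and updates are asynchronous and DescribeTable is eventually consistent, client code typically polls TableStatus until the table becomes ACTIVE. The sketch below shows only the polling logic. The `describe_table` argument is a hypothetical callable that you supply (for example, a thin wrapper around whichever client library you use) and that returns a dictionary containing a `TableStatus` key; it is an assumption, not a real library function.

```python
import time


def wait_until_active(describe_table, table_name,
                      timeout_s=300.0, poll_s=5.0, sleep=time.sleep):
    """Poll the table status until it is ACTIVE; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        # describe_table is caller-supplied (hypothetical); see the text above.
        status = describe_table(table_name)["TableStatus"]
        if status == "ACTIVE":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)  # wait before asking again
```

Passing the `sleep` function as a parameter keeps the helper easy to test without real delays.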
| | WATS | ADDB |
|---|---|---|
| CRUD on one entity / object | Yes | Yes |
Both systems allow you to perform Create, Read, Update, Delete (CRUD) operations on a single entity / object.
What you need to remember:
- WATS limits an entity to 256 attributes; ADDB has no limit. Since WATS has 3 system attributes (PartitionKey, RowKey, and Timestamp), you can define up to 253 attributes of your own.
- In WATS, attribute values can be of 8 types: Binary, Boolean, DateTime, Double, Int32, Int64, Guid, String. In ADDB: String, Number, and String / Number Sets (arrays of strings or numbers).
- The maximum size of an object in ADDB is 64 KB; in WATS an entity can be up to 1 MB in size.
Create
WATS offers several operations for creating an entity; in ADDB they are all combined into one function (PutItem). The PutItem operation creates an object or, if the table already contains an object with the specified primary key, completely replaces that object. WATS has three functions for creating an entity:
- Insert Entity: creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, an error is thrown.
- Insert or Merge Entity: creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is merged with the new entity: attributes that exist in both entities are updated, attributes that exist only in the new entity are added, and attributes that exist only in the old entity are left unchanged.
- Insert or Replace Entity: creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is replaced with the new entity, that is, the old entity is deleted and a new one is created with the specified PartitionKey and RowKey values.
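The difference between the three WATS insert operations can be illustrated with plain dictionaries. This is a toy model, not the storage client API: the table is a `dict` keyed by (PartitionKey, RowKey), and all function and exception names are hypothetical.

```python
class EntityAlreadyExists(Exception):
    """Raised by insert_entity when the key is already present."""


def _key(entity):
    return (entity["PartitionKey"], entity["RowKey"])


def insert_entity(table, entity):
    if _key(entity) in table:
        raise EntityAlreadyExists(_key(entity))
    table[_key(entity)] = dict(entity)


def insert_or_merge_entity(table, entity):
    merged = dict(table.get(_key(entity), {}))  # keep old-only attributes
    merged.update(entity)                       # update shared, add new ones
    table[_key(entity)] = merged


def insert_or_replace_entity(table, entity):
    # The old entity is discarded entirely, like PutItem in ADDB.
    table[_key(entity)] = dict(entity)
```

After a merge, an attribute that existed only in the stored entity is still present; after a replace, it is gone.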
Read
In both systems, a read operation consists of querying the attributes of an entity / object. In WATS, this is done with Query Entities, passing PartitionKey and RowKey as arguments. In ADDB, this is done with GetItem, passing the object's primary key as an argument.
Note that by default the GetItem operation performs an eventually consistent read. However, you can make it perform a consistent read using the optional parameter ConsistentRead.
Update
WATS offers several ways to update an entity, whereas ADDB offers only two:
- PutItem: creates an object or, if an object with the specified primary key already exists, completely replaces it.
- UpdateItem: if you need to change some attributes of an existing object rather than replace it entirely, you can use this function, which gives flexible control over attribute changes.
WATS has four functions for updating an entity:
- Merge Entity: if an entity with the specified PartitionKey and RowKey values already exists, it is merged with the new entity: attributes that exist in both entities are updated, attributes that exist only in the new entity are added, and attributes that exist only in the old entity are left unchanged.
- Update Entity: replaces an existing entity with a new one, deleting the old entity and creating a new one with the specified PartitionKey and RowKey values.
- Insert or Merge Entity: creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is merged with the new entity as described for Merge Entity.
- Insert or Replace Entity: creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is replaced with the new entity, that is, the old entity is deleted and a new one is created with the specified PartitionKey and RowKey values.
Conditional updates: both systems support conditional updates, but the mechanisms work differently. In ADDB, you define conditions on the values of existing attributes; for example, you specify that ADDB should update the value of attribute1 only if the value of another attribute, attribute2, equals some value. Conditional updates in ADDB also support checking whether an attribute exists. WATS is different: everything depends on the ETag value of the entity. To update an entity conditionally, you provide the entity's ETag value in one of the request headers (when using the REST API); WATS then compares this value with the current ETag of the entity being updated and performs the update only if the two values match.
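The two approaches to conditional updates can be contrasted with a small in-memory sketch. Neither function calls a real service; the names, the `MISSING` sentinel, and the record layout (a dictionary with `etag` and `attributes` keys) are assumptions made for illustration.

```python
import uuid


class ConditionFailed(Exception):
    """Raised when the stored state does not satisfy the caller's condition."""


MISSING = object()  # sentinel meaning "this attribute must not exist"


def addb_style_update(item, updates, expected):
    """Apply updates only if every expected attribute value matches."""
    for name, wanted in expected.items():
        if item.get(name, MISSING) != wanted:
            raise ConditionFailed(name)
    item.update(updates)


def wats_style_update(record, new_attributes, if_match_etag):
    """Replace attributes only if the caller holds the current ETag."""
    if record["etag"] != if_match_etag:
        raise ConditionFailed("etag")
    record["attributes"] = dict(new_attributes)
    record["etag"] = uuid.uuid4().hex  # every successful write changes the ETag
```

The attribute-based form lets the condition mention any attribute, while the ETag form only answers one question: has the entity changed since it was read.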
Delete
To delete an entity in WATS, you use Delete Entity, passing the PartitionKey and RowKey of the entity as input arguments. Similarly, to delete an object in ADDB, you use DeleteItem, passing the primary key of the object as an input argument.
DeleteItem in ADDB is idempotent: if you try to delete a non-existent object, ADDB will not throw an error unless you use a conditional delete. A conditional delete is not idempotent in ADDB. In WATS, an attempt to delete a non-existent entity throws an error (NotFound error - HTTP Status Code 404).
Conditional deletes: both systems support conditional deletes, but the mechanisms work differently. In ADDB, you define conditions on the values of existing attributes; for example, you specify that ADDB should delete an object only if the value of attribute2 equals some value. Conditional deletes in ADDB also support checking whether an attribute exists. WATS is different: everything depends on the ETag value of the entity. To delete an entity conditionally, you provide the entity's ETag value in one of the request headers (when using the REST API); WATS then compares this value with the current ETag of the entity being deleted and deletes it only if the two values match.
| | WATS | ADDB |
|---|---|---|
| CRUD for multiple entities / objects | Yes | Yes |
Both systems support the execution of CRUD operations for several entities / objects within the same service call.
In WATS, you can use Entity Group Transactions for this. In ADDB, you can use BatchWriteItem. You can also use BatchGetItem to read multiple objects from multiple tables by their primary keys.
Comments:
- In WATS, this is a transactional operation: the entire operation either succeeds or fails. In ADDB it is not. Some objects may not be processed, in which case ADDB returns a list of the objects for which the operation failed, so that they can be processed later.
- You cannot update an object using BatchWriteItem, only create and delete.
- In ADDB, the maximum request size is 1 MB; in WATS it is 4 MB.
- In ADDB, a single BatchWriteItem operation is limited to 25 objects. In WATS, a single entity group transaction is limited to 100 entities.
- BatchWriteItem in ADDB lets you work with several tables in one request, whereas an entity group transaction in WATS must operate within a single table, and all entities in it must have the same PartitionKey value.
- BatchGetItem in ADDB performs eventually consistent reads. Consistent reads are not possible.
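The batch limits in the list above mean that a larger set of objects has to be split on the client. The following sketch shows one way to do that, using the limits quoted in this article (25 objects per BatchWriteItem, 100 entities with the same PartitionKey per entity group transaction). The function names are hypothetical, and the request size limits (1 MB and 4 MB) are not checked here.

```python
def addb_batches(items, batch_size=25):
    """Split items into chunks small enough for one BatchWriteItem call."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def wats_entity_groups(entities, max_entities=100):
    """Group entities by PartitionKey, then split each group into chunks."""
    by_partition = {}
    for entity in entities:
        by_partition.setdefault(entity["PartitionKey"], []).append(entity)
    groups = []
    for same_partition in by_partition.values():
        for i in range(0, len(same_partition), max_entities):
            groups.append(same_partition[i:i + max_entities])
    return groups
```

Since ADDB batches are not transactional, a caller would still need to resubmit whatever objects the service reports as unprocessed.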
| | WATS | ADDB |
|---|---|---|
| Query Entities / Query (Scan) | Yes | Yes |
Used to retrieve one or more entities / objects from a table based on a criterion.
Comments:
- In WATS, Query Entities is used to obtain a list of entities, while ADDB uses two functions, Query and Scan. The difference is that Scan does not require a primary key. Since data in ADDB is indexed by the primary key, Query is much faster than Scan, which simply scans the entire table. Query is available only on tables with a hash-and-range primary key. I think, and I may be mistaken, that with Query in ADDB you can filter only on the attributes that make up the primary key; to filter on other attributes you must use Scan. In WATS, you specify the filter using the $filter query option (WCF Data Services syntax).
- Both systems are designed for high availability, so they may time out your requests and return results in parts. The documentation does not make clear when a request times out in ADDB. WATS times out a request after 5 seconds of execution.
- In WATS, a query may also return partial results if it crosses a partition boundary.
- In ADDB, the response size is limited to 1 MB. If the query matches a larger set of data, only part of the result is returned.
- You may get all the requested records, part of the result, or no result at all, even when matching data exists. When part of the result or an empty set is returned although more matching data exists, the service always returns a continuation token. It is therefore very important to implement continuation token handling in your application.
- Both systems let you return all attributes or a subset of them. In ADDB, you use the AttributesToGet parameter for this. In WATS, you list the attribute names you want returned in the $select query option, for example, $select=PartitionKey,RowKey,Attribute1,...
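The continuation token handling described above follows the same pattern in both systems: keep requesting pages until no token is returned, and do not stop just because a page is empty. Below is a minimal sketch of that loop. The `fetch_page` argument is a hypothetical callable that you supply; it is assumed to take a token (or `None` for the first page) and return a `(results, next_token)` pair.

```python
def fetch_all(fetch_page):
    """Yield every result, following continuation tokens until none is left."""
    token = None
    while True:
        results, token = fetch_page(token)  # caller-supplied, hypothetical
        yield from results                  # a page may legitimately be empty
        if token is None:
            break
```

The key point is that the loop ends on a missing token, not on an empty page, because a page can be empty while more matching data remains.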
Summary
To summarize, both systems are comparable in functionality. There are some differences, and if the developer keeps them in mind during planning and development, it is possible to build a system that uses both services, perhaps even integrating them. Each system has its advantages and disadvantages, and we should weigh them to decide which system best suits our needs.