Comparing Windows Azure Table Storage and Amazon DynamoDB

Hello.
I would like to present a translation of the first article in a series comparing the services offered by Windows Azure and Amazon, written by Gaurav Mantri, a specialist well known in cloud circles.

In this article, I will compare Windows Azure Table Storage and Amazon DynamoDB — WATS and ADDB, respectively.


In terms of functionality, WATS and ADDB provide similar capabilities. Both are NoSQL systems for storing large amounts of data. Amazon also has another NoSQL database, SimpleDB.
An important point is that ADDB is not just a NoSQL database; it is a database service. It does manage your data, but you also control how far the system scales by specifying the throughput you need. In this sense it closely resembles a compute instance in Amazon or Windows Azure: there, you choose the instance size you need and the system provisions it. Similarly, with ADDB you tell the system how many read and write operations your application will perform against a table, and ADDB allocates the required capacity.

Conceptually, both systems are similar:
  1. Both systems are non-relational (NoSQL).
  2. Both systems are essentially key-value stores.
  3. Neither supports the relationships available in relational databases.
  4. Both are designed for high availability and scalability.
  5. Both systems provide a REST API for working with tables and data, plus higher-level language libraries that are usually wrappers around the REST API. In both systems each API release has its own version, expressed as a date. At the time of writing the versions are: WATS 2011-08-18, ADDB 2011-12-05.

Naturally, there are several significant differences:
  1. In ADDB you provision the throughput you need when you start working with the system, whereas in WATS throughput is controlled by the system. ADDB is therefore more flexible, but it requires more upfront planning.
  2. Unlike SimpleDB, which places a 10 GB limit on a domain, ADDB has no such restriction: you can store as much data as you like. WATS also places no hard limit on the data in a table, but you are limited by the size of the storage account (currently 100 TB).
  3. ADDB indexes your data on a primary key that you choose, unlike WATS. Technically, WATS also indexes your data, but only on fixed attributes (PartitionKey and RowKey); secondary indexes are one of the features WATS users request most.

Concepts

Table: when we think of a table, the first thing that comes to mind is “something that consists of rows and columns”. A table in WATS or ADDB may look like that, but it is not. Think of a table as a container holding collections of key-value pairs that represent data. In the relational model, we define columns for a table and the rows contain data; to store data in a table, you must first define its columns. A table in WATS or ADDB has no schema, so there is no need to define columns. In short, think of the table as a bag into which you put whatever data you need.

Although conceptually the tables in both systems are containers for storing data, there are several differences between them:
  1. By default, ADDB limits you to 256 tables (the limit can be raised on request). WATS has no such limit: you can have as many tables as you like, within the storage account limit (currently 100 TB).
  2. When you define an ADDB table, you must define a primary key for it, which can consist of one attribute or two. All objects in the table must have unique primary key values. In WATS, the primary key is predefined by the system as the combination of the PartitionKey and RowKey attributes, and every entity in a table must have a unique combination of the two.
  3. When creating a table in ADDB, you must specify its provisioned throughput (the number of read and write operations per second); WATS has no equivalent setting. You can later change this throughput through the ADDB API. When requests exceed the provisioned throughput, ADDB begins to throttle them.

Entity and object: these are what hold the data in a table. Each entity (in WATS) or object (item, in ADDB) consists of one or more attributes. An attribute is a key-value pair (in WATS, a key, a value and a data type). In a relational database, an entity or object would be a row. Here, a row in a table has no links to other rows. Each entity in WATS is uniquely identified by two attributes, PartitionKey and RowKey; treat them as a composite primary key. Each entity must have a unique combination of these attributes. In ADDB, each object is uniquely identified by its primary key, which is made up of attributes of the object. All objects in an ADDB table must have a primary key.
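To make the two addressing schemes concrete, here is a minimal in-memory sketch using plain Python dictionaries. It is not either service's API; the table contents, attribute names and the helper `attribute_names` are invented for illustration.

```python
# Illustrative model only: plain dictionaries standing in for the two services.

# WATS: an entity is addressed by the (PartitionKey, RowKey) pair.
wats_table = {
    ("customers", "0001"): {"Name": "Alice", "City": "Seattle"},
    ("customers", "0002"): {"Name": "Bob"},            # no City attribute
    ("orders", "0001"): {"Total": 42, "Paid": True},   # entirely different attributes
}

# ADDB (hash-type key): an object is addressed by a single primary key attribute.
addb_table = {
    "alice@example.com": {"Name": "Alice", "Tags": {"vip", "beta"}},
    "bob@example.com": {"Name": "Bob", "Age": 31},
}


def attribute_names(table):
    """Return the union of attribute names used by any entry in the table."""
    names = set()
    for attributes in table.values():
        names.update(attributes)
    return names
```

Because there is no schema, entries in the same table can carry different attributes; `attribute_names` simply collects whatever names happen to be in use.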

There are several differences between an entity and an object:
  1. In WATS, an entity has a maximum of 255 attributes; in ADDB there is no limit on the number. Each WATS entity has three system attributes (PartitionKey, RowKey and Timestamp), so the number of user-defined attributes is reduced to 252. You set the values of PartitionKey and RowKey yourself, while Timestamp is set by the system and contains the time the entity was last modified. PartitionKey and RowKey are of type String.
  2. The maximum size of an entity in WATS is 1 MB; the maximum size of an ADDB object is 64 KB.
  3. In WATS, attribute values can have one of 8 data types: Binary, Boolean, DateTime, Double, Int32, Int64, Guid and String, which provides a rich data model. In ADDB, the available types are String, Number, and String and Number Sets (sets of strings or of numbers).
  4. In WATS, data is indexed only by PartitionKey and RowKey; indexing on other attributes is not yet available. Data in WATS is partitioned by the value of PartitionKey, so it must be chosen carefully: a poor choice can significantly reduce performance. A wonderful article can be read here. In ADDB, data is indexed by the attributes that make up the table's primary key.

ADDB supports two types of primary keys:
  1. Hash Type Primary Key: the primary key consists of a single attribute, the hash attribute. ADDB builds an unordered hash index on this attribute.
  2. Hash and Range Type Primary Key: the primary key consists of two attributes. The first is the hash attribute and the second is the range attribute. ADDB builds an unordered hash index on the hash attribute and a sorted range index on the range attribute.
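The difference between the two index types can be pictured with a toy model. The sketch below is not how ADDB is implemented; `HashRangeTable` and its methods are hypothetical names, and the class only illustrates an unordered lookup on the hash attribute combined with a sorted index on the range attribute.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Toy model of a table with a hash-and-range primary key."""

    def __init__(self):
        # hash value -> sorted list of range values
        self._ranges = defaultdict(list)
        # (hash value, range value) -> stored object
        self._items = {}

    def put(self, hash_key, range_key, item):
        """Store an object, keeping the range values of each hash key sorted."""
        if (hash_key, range_key) not in self._items:
            bisect.insort(self._ranges[hash_key], range_key)
        self._items[(hash_key, range_key)] = item

    def query(self, hash_key, low, high):
        """Return objects for one hash value with low <= range key <= high, in order."""
        keys = self._ranges.get(hash_key, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [self._items[(hash_key, k)] for k in keys[start:end]]
```

With a hash-only key, only exact lookups by key are efficient; adding a range attribute makes ordered range queries within one hash value possible.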

Provisioned throughput

One of the most important features of ADDB is provisioned throughput, which lets you configure the throughput your application needs. In short, provisioned throughput determines how many reads and writes per second can be performed on an ADDB table. Based on the values you provide, ADDB allocates the appropriate resources, and you can update the configuration on the fly using the API or the Amazon Management Console.

Provisioned throughput is expressed in two units: Read Capacity Units (RCU) for read operations and Write Capacity Units (WCU) for write operations.

A Read Capacity Unit is defined as one read operation per second on an object of up to 1 KB. So, if you requested 10 RCUs, you can perform consistent reads of 10 objects of up to 1 KB each per second. If an object is larger than 1 KB, the number of objects you can read per second is lower. For example, if your objects are between 1 and 2 KB in size, you can perform only 5 consistent reads per second before the system starts throttling you. If you use eventually consistent reads instead of consistent reads, throughput usually doubles: with 10 RCUs you can perform 20 eventually consistent reads per second on objects of 1 KB or less.

A Write Capacity Unit is defined similarly, as one write operation per second on an object of up to 1 KB. With 10 WCUs you can write 10 objects of up to 1 KB each per second. If an object is larger than 1 KB, the number of objects you can write per second decreases. For example, if your objects are between 1 and 2 KB in size, you can perform 5 writes per second before the system starts throttling.
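The arithmetic in these two paragraphs can be written down as a small calculator. This is a sketch under the rules stated in the text (object size rounded up to whole kilobytes, eventually consistent reads costing half as much); the function names are my own and are not part of any ADDB library.

```python
import math


def units_per_operation(item_size_kb):
    """Capacity units consumed by one operation: object size rounded up to whole KB."""
    return max(1, math.ceil(item_size_kb))


def max_reads_per_second(rcu, item_size_kb, consistent=True):
    """Number of reads per second that fit into `rcu` read capacity units."""
    # An eventually consistent read costs half as much, so the budget doubles.
    budget = rcu if consistent else rcu * 2
    return budget // units_per_operation(item_size_kb)


def max_writes_per_second(wcu, item_size_kb):
    """Number of writes per second that fit into `wcu` write capacity units."""
    return wcu // units_per_operation(item_size_kb)
```

For instance, `max_reads_per_second(10, 1.5)` reproduces the case of 10 RCUs and objects between 1 and 2 KB described in the text.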

Note that provisioned throughput has pricing implications of its own, because ADDB is priced differently from other services. Essentially, you pay for the read and write capacity you reserve. At the time of writing, you would pay $0.01/hour for every 10 units of write capacity and $0.01/hour for every 50 units of read capacity in the US East (Virginia) data center. Pricing is similar to that of compute instances: there you request a virtual machine of a certain size (with a certain CPU capacity and RAM) and pay for it by the hour, whether or not you fully utilize it. The same holds for ADDB: you pay by the hour for the throughput you requested from Amazon, regardless of how much of it you use.
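As a rough illustration of this pricing model, the sketch below computes the hourly charge for a table. It assumes the rates quoted in this article ($0.01/hour per 10 write units and $0.01/hour per 50 read units), which will have changed since; the constant and function names are hypothetical.

```python
# Rates as quoted in the article for US East (Virginia); treat them as historical.
PRICE_PER_10_WCU_HOUR = 0.01
PRICE_PER_50_RCU_HOUR = 0.01


def addb_hourly_cost(rcu, wcu):
    """Hourly charge for provisioned capacity, billed whether or not it is used."""
    read_cost = rcu / 50 * PRICE_PER_50_RCU_HOUR
    write_cost = wcu / 10 * PRICE_PER_10_WCU_HOUR
    return read_cost + write_cost
```

Under these rates a table provisioned with 100 RCU and 50 WCU would cost about $0.07 per hour, idle or busy.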

When it comes to provisioned throughput, there are a few things to consider:
  1. It is configured per table.
  2. The minimum throughput is 5 RCU and 5 WCU per table, so for each table you pay at least $0.001 ($0.01 * 5/50) per hour for reads and $0.005 ($0.01 * 5/10) per hour for writes, even if you do not use the table.
  3. An increase or decrease must differ from the previous value by at least 10%. For example, if you now have 100 read capacity units and want to increase this value, the new value must be at least 110.
  4. A single request can at most double the current value. For example, if you now have 100 read capacity units, you can increase this value to at most 200.
  5. You can reduce the provisioned throughput once a day.
  6. By default you can provision a maximum of 10,000 read capacity units and 10,000 write capacity units per table. The default maximum across all tables in an account is 20,000 read capacity units and 20,000 write capacity units. These limits can be raised by contacting Amazon.
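Several of these rules are easy to check before sending a request. The following is a minimal sketch of such a check, covering only the per-table minimum and maximum, the 10% rule and the doubling rule; the once-a-day limit on decreases and the account-wide limits are not modeled, and the function name is my own.

```python
MIN_UNITS = 5        # minimum capacity units per table, from the list above
MAX_UNITS = 10_000   # default per-table maximum, from the list above


def is_valid_capacity_change(current, new):
    """Check a requested capacity value against the rules listed above."""
    if not MIN_UNITS <= new <= MAX_UNITS:
        return False
    if new > current * 2:
        # A single request can at most double the current value.
        return False
    # The change must be at least 10% of the current value, in either direction.
    return abs(new - current) * 10 >= current
```

The comparison is done in integers (`abs(new - current) * 10 >= current`) to avoid floating-point surprises at the exact 10% boundary.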

Prices

Before we talk about the functionality provided by each of the systems, let's look at pricing. Neither system has upfront costs. Pricing is made up of the following components:
  1. Transactions: In WATS, you pay per transaction at a fixed rate ($0.01 per 10,000 transactions). To calculate the total, multiply the number of transactions by the unit price.
  2. Provisioned throughput: In ADDB, you pay for provisioned throughput at fixed rates for read and write capacity. To calculate the total, multiply the number of provisioned RCUs and WCUs by their hourly prices.
  3. Data transfer: You pay for the amount of data transferred to and from the system. At the time of writing, inbound traffic is free in both systems. Data transferred between ADDB and Amazon EC2 within one region is free; data transferred between ADDB and Amazon EC2 in different regions is charged at the standard rates. In WATS, only outbound traffic is charged.

Pricing in ADDB is more predictable than in WATS, but you must estimate the throughput you need correctly, so that you neither pay for capacity you do not use nor have your requests throttled.
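To see why the WATS bill depends on actual traffic, here is a small sketch of the transaction charge. It uses the $0.01 per 10,000 transactions rate quoted above and ignores storage and data transfer; the function names are hypothetical.

```python
WATS_PRICE_PER_10K_TRANSACTIONS = 0.01  # rate quoted in the article


def wats_transaction_cost(transactions):
    """Charge for a given number of WATS transactions."""
    return transactions / 10_000 * WATS_PRICE_PER_10K_TRANSACTIONS


def wats_hourly_cost(requests_per_second):
    """Charge for one hour of sustained traffic at the given request rate."""
    return wats_transaction_cost(requests_per_second * 3600)
```

An idle WATS table incurs no transaction charge at all, whereas a sustained 10 requests per second comes to roughly $0.036 per hour at this rate.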

Feature list

| Feature | WATS | ADDB |
|---|---|---|
| Create Table / CreateTable | Yes | Yes |
| Query Tables / ListTables | Yes | Yes |
| Delete Table / DeleteTable | Yes | Yes |
| UpdateTable | No | Yes |
| DescribeTable | No | Yes |
| CRUD on one entity / object | Yes | Yes |
| CRUD on multiple entities / objects | Yes | Yes |
| Query Entities / Query (Scan) | Yes | Yes |

Let us consider in more detail all the functions from the list.

| Function | WATS | ADDB |
|---|---|---|
| Create Table / CreateTable | Yes | Yes |

As the name suggests, this function creates a table in WATS or ADDB. Unlike CreateDomain in SimpleDB, which is idempotent, CreateTable in ADDB is not: if you try to create a table with the name of an existing table, the system returns an error.

The table naming rules are summarized below.

| Rule | WATS | ADDB |
|---|---|---|
| Min / max length | 3 / 63 | 3 / 255 |
| Case sensitivity | Mixed case | Mixed case |
| Allowed characters | Alphanumeric | Alphanumeric, hyphen (-), underscore (_), period (.) |


There are a few more points:


| Function | WATS | ADDB |
|---|---|---|
| Query Tables / ListTables | Yes | Yes |

The function returns a list of tables. A single call returns up to 1000 tables in WATS and all tables in ADDB. If more tables remain, a continuation token is also returned, which gives access to the next set of tables.

| | WATS | ADDB |
|---|---|---|
| Maximum number of entries per function call | 1000 | - |
| Returns a continuation token | Yes | Yes |
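The continuation-token pattern is the same regardless of the service, so it can be sketched without a real client. In the code below, `list_page` is a stand-in for a list-tables call and simply slices an in-memory list; both function names and the integer token are assumptions made for the example.

```python
def list_page(all_tables, page_size, token=None):
    """Stand-in for a list-tables call: returns (names, continuation_token)."""
    start = token or 0
    page = all_tables[start:start + page_size]
    next_start = start + len(page)
    # A token is returned only when more tables remain.
    next_token = next_start if next_start < len(all_tables) else None
    return page, next_token


def list_all_tables(all_tables, page_size=1000):
    """Follow continuation tokens until the service stops returning one."""
    names, token = [], None
    while True:
        page, token = list_page(all_tables, page_size, token)
        names.extend(page)
        if token is None:
            return names
```

A real client would pass the token back to the service unchanged rather than interpret it; the loop structure stays the same.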

| Function | WATS | ADDB |
|---|---|---|
| Delete Table / DeleteTable | Yes | Yes |

The function deletes a table. In ADDB, this operation is not idempotent.
To delete a table in ADDB, the table must be in the Active state. The operation is asynchronous in ADDB. In WATS it appears to be synchronous, but it is in fact asynchronous as well. When a delete request is sent to WATS, the system marks the table for deletion and makes it unavailable; the table is actually removed later, during garbage collection, so the time the deletion takes depends on the amount of data in the table. In my experience, deleting a very large table can take hours. During this time, an attempt to create a table with the same name results in an error (Conflict, HTTP status code 409).
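One way to cope with the Conflict error while a deleted table is still being cleaned up is to retry the creation with a growing delay. The sketch below is only an illustration of that idea: `TableConflict` stands in for the HTTP 409 response, and `create_table` is whatever callable your client library provides; neither name comes from a real SDK.

```python
import time


class TableConflict(Exception):
    """Stand-in for the HTTP 409 Conflict returned while a table is being deleted."""


def create_with_retry(create_table, name, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call create_table(name), retrying with a doubling delay while it raises TableConflict."""
    for attempt in range(attempts):
        try:
            return create_table(name)
        except TableConflict:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            sleep(delay_seconds * 2 ** attempt)
```

Since deleting a large table can take hours, in practice it is often simpler to create the replacement table under a different name than to wait.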

| Function | WATS | ADDB |
|---|---|---|
| UpdateTable | No | Yes |

This function updates the provisioned throughput of a table in ADDB. You can both increase and decrease the provisioned throughput.



| Function | WATS | ADDB |
|---|---|---|
| DescribeTable | No | Yes |

The DescribeTable function is used to get the following information about a table:


Note that the results of this operation are eventually consistent, so you are not guaranteed to see the most recent updates.

| Function | WATS | ADDB |
|---|---|---|
| CRUD on one entity / object | Yes | Yes |

Both systems allow you to perform Create, Read, Update, Delete (CRUD) operations on a single entity / object.
What you need to remember:


Creation


In WATS there are several operations for creating an entity; in ADDB they are all combined into one function (PutItem). PutItem creates an object or, if the table already contains an object with the specified primary key, replaces that object completely. WATS has three functions for creating an entity:

  1. Insert Entity: Creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, an error is returned.
  2. Insert or Merge Entity: Creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is merged with the new entity: the values of attributes present in both entities are updated, attributes that exist only in the new entity are added, and attributes that exist only in the old entity are left unchanged.
  3. Insert or Replace Entity: Creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is replaced with the new entity: the old entity is deleted and a new one is created with the specified PartitionKey and RowKey values.
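The difference between the three operations is easiest to see on an in-memory dictionary. The sketch below models only the semantics described above, not the REST calls; the function names and `EntityExists` are hypothetical, and `key` stands for the (PartitionKey, RowKey) pair.

```python
class EntityExists(Exception):
    """Raised when a plain insert targets a key that is already present."""


def insert(table, key, entity):
    """Insert Entity: fails if the key already exists."""
    if key in table:
        raise EntityExists(key)
    table[key] = dict(entity)


def insert_or_merge(table, key, entity):
    """Insert or Merge Entity: existing attributes are kept unless overwritten."""
    table.setdefault(key, {}).update(entity)


def insert_or_replace(table, key, entity):
    """Insert or Replace Entity: the stored entity is discarded and replaced wholesale."""
    table[key] = dict(entity)
```

PutItem in ADDB behaves like `insert_or_replace` here: an existing object with the same primary key is replaced completely.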


Reading


In both systems, read operations consist in querying the attributes of an entity / object. In WATS, this is implemented using Query Entities and passing PartitionKey and RowKey as arguments. In ADDB, this is implemented using GetItem and passing the object's primary key as arguments.

Note that by default GetItem performs an eventually consistent read. However, you can make it perform a consistent read by setting the optional ConsistentRead parameter.

Update


There are several ways to update entities in WATS, but only two in ADDB:

  1. PutItem : The PutItem operation creates an object, or if an object with the specified primary key already exists, completely replaces it.
  2. UpdateItem : If you need to replace several attributes of an existing object instead of a full replacement, you can use this functionality, which provides flexible control over attribute changes.


WATS has four functions for updating an entity:

  1. Merge Entity: If an entity with the specified PartitionKey and RowKey values already exists, it is merged with the new entity: the values of attributes present in both entities are updated, attributes that exist only in the new entity are added, and attributes that exist only in the old entity are left unchanged.
  2. Update Entity: The operation replaces an existing entity with a new entity, deleting the old entity and creating a new one with the specified PartitionKey and RowKey values.
  3. Insert or Merge Entity: Creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is merged with the new entity: the values of attributes present in both entities are updated, attributes that exist only in the new entity are added, and attributes that exist only in the old entity are left unchanged.
  4. Insert or Replace Entity: Creates a new entity in the table. If an entity with the specified PartitionKey and RowKey values already exists, it is replaced with the new entity: the old entity is deleted and a new one is created with the specified PartitionKey and RowKey values.


Updating by condition (Conditional Updates): Both systems support conditional updates, but the mechanisms work differently. In ADDB, you define conditions on the values of existing attributes; for example, you specify that ADDB should update the value of attribute1 only if another attribute, attribute2, is equal to some value. Conditional updates in ADDB also support checking whether an attribute exists. WATS is different: everything depends on the entity's ETag value. To update an entity conditionally, you supply the entity's ETag value in one of the request headers (when using the REST API); WATS compares this value with the current ETag of the entity being updated and performs the update only if the values match.
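The two styles of condition can be contrasted with a short in-memory model. This is a sketch of the semantics only: `wats_update`, `addb_update` and `PreconditionFailed` are invented names, and the storage layout (an `etag` stored next to the data) is an assumption made for the example.

```python
import uuid


class PreconditionFailed(Exception):
    """Raised when the condition attached to an update does not hold."""


def wats_update(table, key, new_entity, etag):
    """ETag style: update only if the caller holds the current version tag."""
    stored = table[key]
    if stored["etag"] != etag:
        raise PreconditionFailed("entity changed since it was read")
    # Every successful update produces a fresh version tag.
    table[key] = {"data": dict(new_entity), "etag": uuid.uuid4().hex}
    return table[key]["etag"]


def addb_update(table, key, changes, expected):
    """Attribute style: update only if the listed attributes have the expected values.

    An expected value of None means the attribute must not exist.
    """
    item = table[key]
    for name, value in expected.items():
        if item.get(name) != value:
            raise PreconditionFailed(name)
    item.update(changes)
```

The ETag approach guards against any concurrent change to the entity, while attribute conditions let you express business rules such as "only if status is still new".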

Deletion


To delete an entity in WATS, use Delete Entity, passing the PartitionKey and RowKey of the entity as input arguments. Similarly, to delete an object in ADDB, use DeleteItem, passing the primary key of the object as an input argument.

DeleteItem in ADDB is idempotent: if you try to delete a non-existent object, ADDB does not return an error, unless you use a conditional delete. A conditional delete is not idempotent in ADDB. In WATS, an attempt to delete a non-existent entity results in an error (NotFound, HTTP status code 404).

Delete by condition: Both systems support conditional deletes, but the mechanisms work differently. In ADDB, you define conditions on the values of existing attributes; for example, you specify that ADDB should delete an object only if the value of attribute2 is equal to some value. Conditional deletes in ADDB also support checking whether an attribute exists. WATS is different: everything depends on the entity's ETag value. To delete an entity conditionally, you supply the entity's ETag value in one of the request headers (when using the REST API); WATS compares this value with the current ETag of the entity being deleted and deletes it only if the values match.

| Function | WATS | ADDB |
|---|---|---|
| CRUD on multiple entities / objects | Yes | Yes |

Both systems support performing CRUD operations on several entities / objects within a single service call.

In WATS you can use Entity Group Transactions for this. In ADDB you can use BatchWriteItem. You can also use BatchGetItem to read multiple objects from multiple tables by their primary keys.
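Before sending a batch, the entities usually have to be grouped. The article does not state the batch constraints, so the sketch below rests on two assumptions taken from the Table service documentation as I understand it: all entities in one Entity Group Transaction share a PartitionKey, and a batch holds at most 100 entities. The function name is hypothetical and the cap is a parameter so that it can be adjusted.

```python
from collections import defaultdict


def group_for_batches(entities, max_batch_size=100):
    """Split entities into batches that share a PartitionKey and respect a size cap."""
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)

    batches = []
    for partition_entities in by_partition.values():
        # Cut each partition's entities into chunks no larger than the cap.
        for i in range(0, len(partition_entities), max_batch_size):
            batches.append(partition_entities[i:i + max_batch_size])
    return batches
```

The same chunking idea applies to BatchWriteItem in ADDB, with whatever per-request limit that operation imposes and without the shared-partition requirement.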

Comments:


| Function | WATS | ADDB |
|---|---|---|
| Query Entities / Query (Scan) | Yes | Yes |

Used to retrieve one or more entities / objects from a table based on a criterion.

Comments:


Summary


In summary, the two systems are comparable in functionality. There are some differences, and if developers keep them in mind during planning and development, it is possible to build a system that uses both services, perhaps even in an integrated way. Each system has its advantages and disadvantages, and we should weigh them to decide which system best suits our needs.

Source: https://habr.com/ru/post/144762/

