This is a partial translation of the Netflix article; it concerns only problematic transitions from Oracle DB to Amazon SimpleDB and how the company resolves them.
Around the end of 2008, Netflix had only one data center. And this DC has put before us a few questions. As the only point of failure, for example, due to problems with electricity, it could lead to dissatisfaction of our users with the service. In addition, with the simultaneous growth of streaming traffic and subscriptions to services, Netflix would soon outgrow this data center - we saw an inevitable need for electricity, better cooling, we needed more space and more equipment.
Alternatively, it was possible to build new data centers. However, besides the high costs, this effort would have resulted in our technical staff not being able to deal with new products, so they would be busy expanding the DC. In addition, we understood that managing multiple data centers is a difficult task. Building and maintaining multiple data centers seemed like a dangerous distraction from our core business.
Instead of taking this path, we chose a more radical one. We switched to the IAAS (Infrastructure as a Service) solution offered at that time by the Amazon web service. With a multitude of data centers already operating, multiple redundancy levels of various services (such as S3 and SimpleDB), AWS promised better availability and scalability in a relatively short time.
')
Putting various networking and background tasks into outsourcing, Netflix focused on its core business, delivering movies and TV shows.
In the process of transition to the AWS infrastructure, we formulated a set of best practices needed to work with
AP systems such as SimpleDB.
Leaving the DBMS Behind
Partial or no SQL support at all. In general, SimpleDB supports submultiply SQL
- Use GROUP BY and JOIN operations at the application level.
- One way to avoid having to use a JOIN is to denormalize multiple tables into one logical SimpleDB domain.
Lack of connections between domains
- Implement communication at the application level
No transaction
- Use the SimpleDB API: ConditionalPut and Conditional Delete
No triggers
- It is possible to do without them.
There is no schema support — and besides, this is not obvious. A request with an invalid attribute name does not cause an error
- Implementing schema validation at the application data access level.
Lack of sequence support
Sequences are often used as primary keys.
- In this case, it is necessary to use a natural unique key, for example, in the customer contact domain, use the client’s mobile phone as the key.
- If there are no natural keys, you must use a UUID.
Sequences are also often used for order numbers.
- Use a distributed sequence generator.
No operations to work with time
There is no support for restrictions, in particular, there are no restrictions on the uniqueness of the field, no control of the foreign key, no restrictions on the integrity.
- The application can check the constraints at the time of reading the data and correct the problem after the fact. This is called read-repair. Read recovery should use the ConditionalPut or ConditionalDelete API to make the changes atomic.
New challenges posed by SimpleDB
In addition, there were features that we encountered that were specific to SimpleDB. Here are some of them:
SimpleDB domains provide the maximum write speed, if you split the data into several domains.
- All Netflix data with a significant write load was distributed across multiple domains.
No support for native data types. All data is stored and processed as strings.
All comparisons (i.e. WHERE conditions) and sorting occur only with strings.
- Store all dates in ISO 8601
- Add zeros in front of the numbers that are used when sorting and / or WHERE comparison.
Two separate API calls for DeleteAtributes and PutAttributes.
How to perform an atomic operation that requires both deleting one attribute and updating another attribute in the same line?
- The simplest option is to use pseudo zeros (for example, the word NULL) instead of the DeleteAttributes operation.
- This negates the optimization of the free space of the SimpleDB tables and leads to inflating data.
Using case-sensitive domain names and attributes
In many DBMS, the names of tables and columns are case-insensitive, and in SimpleDB, on the contrary, in this connection, the put, delete, and select operations may not work correctly without even reporting an error. The programmer’s task is to detect all inconsistencies in the name register.
- Accept a convention whereby you can only use domain and attribute names in upper case.
- In the application at the level of data access it is necessary to provide automatic reduction to upper case.
Operations with typos in select, put, or delete attribute names can end without error notification.
Unlike case sensitivity, this problem occurs due to the lack of schema checking. Simple DB is not only a sparse database, but also without schema support.
- Implement a single point of access to the layer data in the application and implement the verification there.
If you forget to specify LIMIT in select, you may need several queries in order to get all the data.
Multiple requests reduce the likelihood that the site will receive all the data for the required time interval.
- The name of the single point of access to data at the application level, you can install LIMIT directly in it.
- The maximum value can be changed at any time if Amazon increases the limit.
It is also necessary to bear in mind the problems of “integrity ultimately” ( eventual consistency ).
Anti-pattern reading should be avoided immediately after writing.
- Avoid reading directly after writing.
- If this is not possible use ConsistentRead
Non-indexed queries can be very expensive.
Anti-pattern: SELECT * FROM MY_DOMAIN WHERE MY_ATTR_1 IS NULL
- Use a separate flag attribute with the string value TRUE or FALSE instead of checking for NULL. Thus, the query will use the index: SELECT * FROM MY_DOMAIN WHERE MY_ATTR_1 = 'FALSE'
Some queries are slow, although indexed.
Index selectivity affects speed just like other SQL engines.
- As in other databases, the performance of queries for sampling data is determined by the selectivity of the indices specified in WHERE. Make sure that you understand the selectivity of your indexes and that your WHERE expression contains the best one.
As with any other multi-user system, significant jumps in response time may appear.
- Protect the application against jumps in SimpleDB or S3 with a caching front-end, such as MemCached
Deciding to switch to SimpleDB and S3, Netflix very quickly migrated to the cloud infrastructure, fully armed with new aggressive product launch plans and traffic growth. There were problems, but we coped with almost everyone.