
Methods of designing databases in Android

Working as an Android developer, I have had to deal with two different approaches to database design in mobile applications. Some of what is said here may seem obvious, but perhaps it will give someone a new idea or save someone from mistakes. So, without a long introduction, let's get to the point...

Two views of the problem


As you know, universities teach you to build databases by the book: decompose the subject area into entities, select attributes and identify primary keys, define relationships between entities, bring it all to at least third normal form, and so on. One of the “side effects” of this approach is a drop in read performance: with heavy decomposition and normalization, queries have to perform more joins, and the more rows the tables contain, the longer those joins take.

Add to this the very limited hardware of mobile platforms, in particular the tiny amount of RAM. It is already small, and on top of that Android limits the amount of RAM available per process to somewhere between 16 and 48 MB, depending on the OS version. And of those few megabytes the DBMS gets only a part, because the application itself needs memory too. Finally, SQLite, due to its design, effectively supports only two transaction isolation modes: either transactions are serialized, or isolation is disabled altogether.
When application performance starts to be limited by the performance of the DBMS, an alternative approach can come to the rescue; let's call it key-value oriented. Instead of decomposing an entity into attributes and creating a separate table column for each attribute, the entity is stored “as is” in a single BLOB column, in other words, it is serialized.

For clarity, consider an example. Suppose our data model in Java looks like this:
    class Group {
        private Long _id;
        private String number;
        private List<Student> students;
        // getters and setters ...
    }

    class Student {
        private Long _id;
        private String name;
        private String surname;
        private Group group;
        // getters and setters ...
    }

Thus, with the “standard” approach we get two tables with the corresponding sets of attributes.
    create table "Group" (
        _id integer primary key autoincrement,
        number text
    );

    create table Student (
        _id integer primary key autoincrement,
        name text,
        surname text,
        group_id integer references "Group"(_id)
    );

In a real project there are far more entities and attributes, plus various utility fields are added, such as the date of the last synchronization with the server, or a flag indicating whether the entity needs to be sent to the server to push changed data, etc.

With the key-value approach, the same tables will look like this:
    create table "Group" (
        _id integer primary key autoincrement,
        value blob
    );

    create table Student (
        _id integer primary key autoincrement,
        value blob,
        group_id integer references "Group"(_id)
    );

Here groups and students are serialized separately into different tables. Or, going even further:
    create table "Group" (
        _id integer primary key autoincrement,
        value blob
    );

when the group is serialized together with all of its students into a single table.
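For illustration, here is a minimal sketch of how an entity could be written to and read from such a BLOB column using standard Java serialization and Android's SQLiteDatabase. The helper class and method names are made up for this example, and the Group and Student classes from the model above are assumed to implement Serializable.

    import android.content.ContentValues;
    import android.database.sqlite.SQLiteDatabase;
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    class GroupBlobDao {

        // Serialize any Serializable entity into a byte array for the BLOB column.
        static byte[] toBlob(Serializable entity) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(entity);
            out.close();
            return bytes.toByteArray();
        }

        // Deserialize a BLOB read from the database back into a Group.
        static Group fromBlob(byte[] blob) throws IOException, ClassNotFoundException {
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob));
            return (Group) in.readObject();
        }

        // Store the whole group, students included, as a single row.
        static long saveGroup(SQLiteDatabase db, Group group) throws IOException {
            ContentValues values = new ContentValues();
            values.put("value", toBlob(group));
            return db.insert("\"Group\"", null, values);
        }
    }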
Now let's consider the advantages and disadvantages of both approaches and what can be gained from each.

Comparison of approaches, pros and cons


Relational Algebra Features

With the standard approach we get all the advantages we are used to from the relational model, namely the SQL language for convenient selection, filtering and sorting of data, as well as for modifying the database schema. To get a collection of entities, we only need to form the required condition and fetch the data from the database. With the key-value approach, the task of filtering or ordering the data falls on the developer's shoulders.
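A rough sketch of that difference (the query and helper names are illustrative, not from the original article; the getters are assumed from the “getters and setters” stubs in the model above):

    import android.database.Cursor;
    import android.database.sqlite.SQLiteDatabase;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    class QueryComparison {

        // Standard approach: filtering and sorting are a single SQL statement,
        // executed entirely inside SQLite.
        static Cursor studentsOfGroup(SQLiteDatabase db, long groupId) {
            return db.rawQuery(
                    "select name, surname from Student where group_id = ? order by surname",
                    new String[]{ String.valueOf(groupId) });
        }

        // Key-value approach: the group has already been read and deserialized
        // from its BLOB; filtering and sorting now have to be coded by hand.
        static List<Student> studentsSortedBySurname(Group group) {
            List<Student> students = new ArrayList<Student>(group.getStudents());
            Collections.sort(students, new Comparator<Student>() {
                @Override
                public int compare(Student a, Student b) {
                    return a.getSurname().compareTo(b.getSurname());
                }
            });
            return students;
        }
    }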

DB file size

With the standard approach the database file is usually smaller, thanks to the lack of redundancy that normalization provides. In theory, the higher the degree of normalization, the lower the redundancy, but the higher the read load on the database, since significant resources are spent on table joins. With the key-value approach the degree of redundancy is higher, because the level of normalization is usually much lower, which leads to a larger database file.

Flexibility when changing the database schema

As a project evolves, the database schema usually changes more than once: new fields are added, entities that are no longer needed are removed, an entity may be split into several new ones, or, on the contrary, denormalized so that several tables are merged into one. If we can sacrifice the data accumulated in the database when updating the schema, everything is simple: on every schema update we create a new database file and delete the old one. But what if the data has to be preserved and converted to the new format?
In this case the standard approach has the advantage. It is enough to write migration scripts that convert the schema to the required form and fill the new fields with default values, or compute them with whatever logic is needed. With serialization, updating the schema is no longer such an easy task: the schema has to be converted while preserving all the data, and the data itself has to be updated by deserializing it, initializing the new fields and serializing it back. Both the logical complexity of the operation and the time needed for the update grow.
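A sketch of what such a migration could look like with the standard approach, using an SQLiteOpenHelper; the sync_date column, the database name and the version numbers are invented for the example:

    import android.content.Context;
    import android.database.sqlite.SQLiteDatabase;
    import android.database.sqlite.SQLiteOpenHelper;

    class SchoolDbHelper extends SQLiteOpenHelper {

        SchoolDbHelper(Context context) {
            // Bumping the version number triggers onUpgrade on existing installations.
            super(context, "school.db", null, 2);
        }

        @Override
        public void onCreate(SQLiteDatabase db) {
            db.execSQL("create table Student(" +
                    "_id integer primary key autoincrement, " +
                    "name text, surname text, group_id integer)");
        }

        @Override
        public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
            // Standard approach: a plain SQL script converts the schema and
            // fills the new field with a default value; the data stays in place.
            if (oldVersion < 2) {
                db.execSQL("alter table Student add column sync_date integer default 0");
            }
            // With serialized BLOBs the same change would mean reading every row,
            // deserializing it, initializing the new field and serializing it back.
        }
    }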

Synchronizing access to entity instances

One of the main drawbacks of the key-value approach, it seems to me, is that in order to change just one field of an entity we have to deserialize the whole object, which greatly complicates access to objects. For example, when a group is serialized into the database together with all of its students, changing the surname of a single student means reading the entire group from the database, changing one surname and saving the group back. If the application has several threads, services and/or content providers that can work with the same entities, the task becomes many times harder: the more potential “writers” there are, the more locking is needed and the harder it is to synchronize access to the objects. With the standard approach this problem is solved at the DBMS level.
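To make the difference concrete, here is a sketch of the two update paths (the getId() getter and the lock object are assumptions made for the example; toBlob is the serialization helper from the sketch above):

    import android.content.ContentValues;
    import android.database.sqlite.SQLiteDatabase;
    import java.io.IOException;

    class SurnameUpdater {

        // Standard approach: a single atomic UPDATE, locking is the DBMS's problem.
        static void renameStudent(SQLiteDatabase db, long studentId, String newSurname) {
            db.execSQL("update Student set surname = ? where _id = ?",
                    new Object[]{ newSurname, studentId });
        }

        // Key-value approach: the whole group is rewritten, and the
        // read-modify-write cycle has to be guarded by hand against other writers.
        private static final Object GROUP_LOCK = new Object();

        static void renameStudent(SQLiteDatabase db, Group group, Student student,
                                  String newSurname) throws IOException {
            synchronized (GROUP_LOCK) {
                student.setSurname(newSurname);                  // modify the deserialized copy
                ContentValues values = new ContentValues();
                values.put("value", GroupBlobDao.toBlob(group)); // re-serialize the whole group
                db.update("\"Group\"", values, "_id = ?",
                        new String[]{ String.valueOf(group.getId()) });
            }
        }
    }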

Performance

On the one hand, the key-value approach gives higher performance when selecting small amounts of data: the number of joins is reduced, so an individual query, and the DBMS as a whole, works faster. On the other hand, with large amounts of data, if we need to filter or sort by a field that is serialized together with the rest of the object, we first have to read all the entities and only then discard what we do not need, which can lead not to a performance gain but to a further loss. Alternatively, the fields involved in filtering or sorting can be stored the standard way, as separate columns, and the rest of the entity as a BLOB, but such a hybrid is hard to maintain.
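A sketch of what such a hybrid might look like: the columns used in WHERE and ORDER BY stay relational (and can be indexed), while the rest of the entity lives in the BLOB. Table, column and index names are illustrative:

    import android.database.Cursor;
    import android.database.sqlite.SQLiteDatabase;

    class HybridStudentTable {

        static void create(SQLiteDatabase db) {
            db.execSQL("create table Student(" +
                    "_id integer primary key autoincrement, " +
                    "surname text, " +      // filter/sort column
                    "group_id integer, " +  // filter column
                    "value blob)");         // the rest of the entity, serialized
            db.execSQL("create index idx_student_surname on Student(surname)");
        }

        // SQLite can still filter and sort without touching the BLOBs ...
        static Cursor findBySurname(SQLiteDatabase db, String surname) {
            return db.query("Student", new String[]{ "_id", "value" },
                    "surname = ?", new String[]{ surname }, null, null, "surname");
        }
        // ... but every field that appears in a query now has to be kept in sync
        // with its copy inside the BLOB, which is what makes the hybrid hard to maintain.
    }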

Amount of code

With the standard approach the amount of SQL code grows: scripts for creating and modifying the schema, queries and conditions, DAO objects, and so on. With key-value the amount of such code shrinks, but the amount of code that performs sorting, grouping and filtering grows, because all of this has to be done “by hand”, whereas with the standard approach the DBMS does it and we only have to write the required query.

Serialization

Another downside of the key-value approach can be the performance hit from standard Java serialization/deserialization, which is known to be slow. As an alternative you can use one of the libraries that solve this problem, for example protobuf from Google. Besides speed, protobuf has the additional advantage of supporting message versioning.
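A minimal sketch of the protobuf variant, assuming a GroupProto.Group message generated by the protobuf compiler from a .proto definition that mirrors the model above (all names here are made up; toByteArray() and parseFrom() are the standard generated-message API):

    import com.google.protobuf.InvalidProtocolBufferException;

    class ProtoBlobCodec {

        // Serialize the generated message for the BLOB column.
        static byte[] toBlob(GroupProto.Group group) {
            return group.toByteArray();
        }

        // Deserialize a BLOB back into the message. Fields added in a newer
        // version of the .proto file simply come back with default values when
        // old data is parsed, which is what makes schema evolution easier here.
        static GroupProto.Group fromBlob(byte[] blob) throws InvalidProtocolBufferException {
            return GroupProto.Group.parseFrom(blob);
        }
    }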

Conclusion


This came out a bit messy, but what I wanted to say is: both approaches are valid, and you should choose according to the situation, weighing all the pros and cons listed above. As a rule, if there are no performance problems, it is better to use the standard approach for its greater flexibility. If problems start to appear, try denormalization; if there are only a few critical spots in the program, that alone may solve everything. If you keep running into performance problems and denormalization does not help, it is worth taking a closer look at the key-value approach.

Source: https://habr.com/ru/post/145442/

