
Taking MySQL data out of the production environment

As soon as your information system goes into production, it becomes necessary to keep at least two copies of its database. The first, the backup, is created at some regular interval with standard utilities and is a consistent dump. Its purpose is disaster recovery.
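As a minimal sketch of that backup step, here is how a consistent dump invocation might be assembled (the mysqldump flags are real; the host, user, database, and file names are made up for illustration):

```python
import subprocess  # only needed if you actually run the command

def build_backup_cmd(host: str, user: str, database: str, out_file: str) -> list:
    """Build a mysqldump invocation that produces a consistent dump.

    --single-transaction takes a consistent InnoDB snapshot without
    locking tables for the whole duration of the dump.
    """
    return [
        "mysqldump",
        "--host=" + host,
        "--user=" + user,
        "--single-transaction",         # consistent snapshot (InnoDB)
        "--routines",                   # include stored procedures/functions
        "--triggers",                   # include triggers
        "--result-file=" + out_file,
        database,
    ]

cmd = build_backup_cmd("db.example.org", "backup", "appdb", "/backups/appdb.sql")
# To actually run it: subprocess.run(cmd, check=True)
print(" ".join(cmd))
```

Note that `--single-transaction` gives a consistent snapshot only for transactional engines such as InnoDB; for MyISAM tables a different locking strategy is needed.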

Here we will consider the creation of the second copy, which is needed to keep working on the project. The article is aimed at developers just setting out on the thorny path of quality management, and is useless to those who already know that the "second copy" is not really the second, and not quite a copy.

Divide and rule


The industry recommends three to four independent environments for an information system. Besides the production environment mentioned above, these are development, testing, and staging.

Since deploying and maintaining each environment is a direct cost, their use is determined by the specifics of the project. The higher the cost of an error in production, the more environments are included in the development process, and vice versa. The last two can be replaced by a single testing environment if the process does not call for separate acceptance tests or demos. Likewise, all three can be one and the same development environment if the process involves neither regression testing nor continuous integration. In the limiting case, all three may not exist at all: for example, for the static website of a little-known company, updated over FTP.
Besides experimenting with functionality and fixing bugs, independent environments let you scale the project team. When the programmer programs and the administrator administers, that is good; it is called segregation of duties. It is recommended to confine the developer to the development environment and the tester to the testing environment. Such an organizational and technical approach is believed to reduce operational risk, and therefore greatly pleases the owners of information systems.

So what about that second copy? The easiest and most obvious way to get one is to copy the backup, and from it bring up any additional environment wherever needed.

You cannot simply take


and copy the database from production into the development or testing environment.

The fact is that the key risk for information systems today is the leakage of non-public data. For example, according to Verizon's 2013 Data Breach Investigations Report, which covered 47,000 incidents, 69% of data breaches were discovered by third parties. That is, the leak is found by outsiders, including our own customers. And more than half of the insider cases involved former employees who used their forgotten, still-active accounts or backdoors.

Therefore, direct access to production should be limited to those who work with it directly, for example the database administrator, and not every programmer in the company "because debugging is easier that way". Yes, a non-disclosure agreement (NDA) in the contract is a good preventive measure, but it is only an organizational one, and therefore insufficient. Organizational measures should be backed by technical controls.

Data is getting more expensive to maintain


More and more protection requirements are imposed on it: from voluntary compliance (for example, with the ISO 27k standards) up to certification by a local regulator (in the EU, the data protection authority). In the end it all comes down to minimizing the damage to the company from a leak of non-public data. Moreover, the protection of personal data has already overshadowed the protection of commercial secrets. And the use of cloud storage located who-knows-where and operated by who-knows-whom only adds to the severity of the problem.

The standard practice for protecting data outside the production environment is anonymization (sanitization). The data to be protected is grouped by type: for example, bank account number, date of birth, last name, password hash. Then one of the anonymization techniques is applied to each group: full or partial masking, clearing, shuffling, substitution, encryption, or hashing. We will not cover them here; that is a topic for a separate article. As a bonus, if you ever get to write anonymization of invoices, take a look at Sample Account Data: it contains both valid and invalid bank details by country.

Data is growing in volume


The manual method of deploying a test database boils down to restoring the backup into an isolated environment and anonymizing the data there.

To transfer the now "clean" database further, to the development environment or outside, a backup copy of it is made.
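The manual flow can be sketched as a sequence of shell commands (assembled here in Python; the host, file, and database names are hypothetical, and sanitize.sql stands in for whatever anonymization script the project uses):

```python
def manual_pipeline(backup_file: str, db: str, clean_file: str) -> list:
    """Return the three shell commands of the manual test-base flow:
    restore the production backup on an isolated host, anonymize it
    there, then dump the clean copy for other environments."""
    return [
        # 1. restore the production backup on an isolated host
        "mysql {} < {}".format(db, backup_file),
        # 2. apply the anonymization script
        "mysql {} < sanitize.sql".format(db),
        # 3. dump the now-clean database for further transfer
        "mysqldump --single-transaction {} > {}".format(db, clean_file),
    ]

for step in manual_pipeline("/backups/appdb.sql", "appdb", "/out/appdb_clean.sql"):
    print(step)
```

Each step would normally be run over SSH or in a CI job; the point of the sketch is only the ordering: the data never leaves the isolated host until after step 2.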

The advantages of this approach are obvious: high speed on small volumes and consistent relational data at the output. But it has drawbacks, which show up in full at volumes of tens of gigabytes and above.


Solution


While preparing for a Java exam, I wrote the org.crystalcopy utility for MySQL, which is free of these drawbacks (and of some of these advantages).
The utility supports tables, records, triggers, stored procedures, indexes, and foreign keys, and can perform both full and partial copies.

With partial copying, relational links between fields are preserved, but consistency of all records is not guaranteed. The speed of a full copy from one database to another on my machine is about 2 GB per hour. The utility is launched from the command line and can be embedded in a continuous-integration build. It is available on the Internet under a non-commercial license.
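To illustrate what "preserving relational links" means for a partial copy, here is a toy sketch of the idea (my own illustration, not the utility's actual code): when a subset of parent rows is selected, exactly the child rows that reference them are copied too, so every foreign key in the subset still resolves.

```python
def partial_copy(parents, children, parent_ids):
    """Copy a subset of parent rows plus exactly the child rows that
    reference them, so foreign keys in the subset still resolve."""
    keep = set(parent_ids)
    parent_subset = [p for p in parents if p["id"] in keep]
    child_subset = [c for c in children if c["parent_id"] in keep]
    return parent_subset, child_subset

customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [
    {"id": 10, "parent_id": 1},
    {"id": 11, "parent_id": 2},
    {"id": 12, "parent_id": 1},
]
cust, ords = partial_copy(customers, orders, parent_ids=[1])
# cust holds one customer; ords holds only the orders referencing it
```

This also shows why consistency of *all* records cannot be guaranteed: rows outside the selected subset, and any aggregates computed over the full tables, are simply absent from the copy.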

Status: beta, so I will be glad to receive your feedback!

Source: https://habr.com/ru/post/224821/
