Generating dummy data with Mimesis: Part II

We previously published an article on generating dummy data with Mimesis, a library for the Python programming language. This article continues it and does not repeat the basics of working with the library. If you missed the first part, now is a good time to read it, because we assume the reader is already familiar with those basics. In this part we cover best practices and several features of the library that we find useful.

A note

First of all, I would like to note that Mimesis was not designed for use with a specific database or ORM. The library's main task is to provide valid data. For this reason there are no strict rules for working with it, only recommendations that help keep your test environment in order and prevent entropy from growing in the project. The recommendations are quite simple and fully in the spirit of Python (if you disagree, we look forward to your comments).

Structuring

Although the library is not tailored to any specific database or ORM, the need for test data most often arises in web applications that perform operations (usually CRUD) on a database. Here are our recommendations for organizing test data generation in web applications.

Functions that generate data and write it to the database should be kept next to the models or, even better, be static methods of the model they belong to, following the example of the _bootstrap() method from the previous article. This saves you from hunting through files when the model's structure changes and you need to add a new field. The Patient model from the previous article demonstrates the idea well:

    from sqlalchemy.exc import IntegrityError

    # `db` is the application's Flask-SQLAlchemy instance.


    class Patient(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        email = db.Column(db.String(120), unique=True)
        phone_number = db.Column(db.String(25))
        full_name = db.Column(db.String(100))
        weight = db.Column(db.String(64))
        height = db.Column(db.String(64))
        blood_type = db.Column(db.String(64))
        age = db.Column(db.Integer)

        def __init__(self, **kwargs):
            super(Patient, self).__init__(**kwargs)

        @staticmethod
        def _bootstrap(count=500, locale='en', gender=None):
            # `gender` needs a default because it follows parameters
            # that already have defaults.
            from mimesis import Personal

            person = Personal(locale)

            for _ in range(count):
                patient = Patient(
                    email=person.email(),
                    phone_number=person.telephone(),
                    full_name=person.full_name(gender=gender),
                    age=person.age(minimum=18, maximum=45),
                    weight=person.weight(),
                    height=person.height(),
                    blood_type=person.blood_type()
                )

                db.session.add(patient)
                try:
                    db.session.commit()
                except IntegrityError:
                    # For example, a duplicate value in the unique email column.
                    db.session.rollback()

Keep in mind that the example above is a model from a Flask application that uses SQLAlchemy. Dummy data generators for applications built with other frameworks are organized in a similar way.
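The same layout can be sketched without any framework. The snippet below is only an illustration and is not taken from Mimesis or Flask: the model is a plain dataclass, the bootstrap() method and its seed parameter are hypothetical names, and the standard random module stands in for a Mimesis provider so that the code runs on its own.

```python
import random
from dataclasses import dataclass


@dataclass
class Patient:
    """Plain model; in a real project this would be an ORM model."""
    email: str
    age: int
    blood_type: str

    @staticmethod
    def bootstrap(count=5, seed=None):
        """Build `count` dummy patients.

        The generator lives on the model, so a new field is added
        to the class and to this method in one place.
        """
        # Stand-in for a Mimesis provider (an assumption of this sketch).
        rnd = random.Random(seed)
        blood_types = ['O+', 'A+', 'B+', 'AB+', 'O-', 'A-', 'B-', 'AB-']
        return [
            Patient(
                email='user{}@example.org'.format(rnd.randint(1000, 9999)),
                age=rnd.randint(18, 45),
                blood_type=rnd.choice(blood_types),
            )
            for _ in range(count)
        ]


patients = Patient.bootstrap(count=3, seed=42)
```

When a field is added to the model, the only other place to touch is the bootstrap() method right below it.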

Creating objects

If your application expects data in a single language only, it is best to use the Generic() class, which gives access to all provider classes through a single object, instead of creating several provider instances separately. Generic() saves you extra lines of code.

Right:

    >>> from mimesis import Generic
    >>> generic = Generic('ru')
    >>> generic.personal.username()
    'sherley3354'
    >>> generic.datetime.date()
    '14-05-2007'

Wrong:

    >>> from mimesis import Personal, Datetime, Text, Code
    >>> personal = Personal('ru')
    >>> datetime = Datetime('ru')
    >>> text = Text('ru')
    >>> code = Code('ru')

At the same time, this is also fine:

    >>> from mimesis import Personal
    >>> p_en = Personal('en')
    >>> p_sv = Personal('sv')
    >>> # ...

That is, importing provider classes separately makes sense only if the class you import covers all the data you need; in all other cases we recommend Generic().

Writing data to the database

If you need to generate data and write it to the database, we strongly recommend generating it in chunks rather than 600k records at a time. Remember that the database, the ORM, and other layers may impose their own limits. Smaller batches are generally written faster.

Good:

    >>> User._bootstrap(count=2000, locale='de')

Very bad:

    >>> User._bootstrap(count=600000, locale='de')
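If a large data set really is needed, one option is to split it into chunks and commit each chunk separately. The sketch below is a minimal illustration using sqlite3 from the standard library; the users table, the chunk_sizes() and bootstrap_in_chunks() helpers, and the placeholder rows are all hypothetical, and in a real project the rows would come from Mimesis.

```python
import sqlite3


def chunk_sizes(total, chunk_size):
    """Yield batch sizes that add up to `total`."""
    while total > 0:
        size = min(chunk_size, total)
        yield size
        total -= size


def bootstrap_in_chunks(conn, total, chunk_size=2000):
    """Insert `total` dummy rows, committing after every chunk."""
    written = 0
    for size in chunk_sizes(total, chunk_size):
        # Placeholder rows; a real generator would call Mimesis here.
        rows = [
            ('user{}@example.org'.format(written + i),)
            for i in range(size)
        ]
        conn.executemany('INSERT INTO users (email) VALUES (?)', rows)
        conn.commit()
        written += size
    return written


conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)'
)
total_written = bootstrap_in_chunks(conn, total=5000, chunk_size=2000)
row_count = conn.execute('SELECT COUNT(*) FROM users').fetchone()[0]
```

With this layout only one chunk of rows is held in memory at a time, and a failure affects at most the chunk being written.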

Loading images

The Internet() class has several methods that generate image links. For testing, links to images hosted on remote resources are usually enough. If you still want a set of random images locally, you can download the images produced by those methods with the download_image() function from the utils module:

    >>> from mimesis.utils import download_image
    >>> from mimesis import Internet
    >>> img_url = Internet().stock_image(category='food', width=1920, height=1080)
    >>> download_image(url=img_url, save_path='/some/path/')

Custom providers

The library covers a large amount of data, which is sufficient in most cases. For those who need more specific data, custom providers are supported and are created as follows:

    >>> from mimesis import Generic
    >>> generic = Generic('en')

    >>> class SomeProvider():
    ...     class Meta:
    ...         name = "some_provider"
    ...
    ...     @staticmethod
    ...     def one():
    ...         return 1

    >>> class Another():
    ...     @staticmethod
    ...     def bye():
    ...         return "Bye!"

    >>> generic.add_provider(SomeProvider)
    >>> generic.add_provider(Another)
    >>> # ...
    >>> generic.some_provider.one()
    1
    >>> generic.another.bye()
    'Bye!'

Everything here is clear without comments, so we will clarify only one thing: the name attribute of the Meta class is the name through which the methods of the custom provider are accessed. By default, it is the class name in lowercase.
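The naming rule can be illustrated with a toy container. This is a sketch of the rule as described above, not the actual Mimesis implementation; MiniGeneric is a hypothetical class written only for this example.

```python
class MiniGeneric:
    """Toy container that mimics the naming rule described above."""

    def add_provider(self, cls):
        # Prefer Meta.name; otherwise fall back to the lowercased class name.
        meta = getattr(cls, 'Meta', None)
        name = getattr(meta, 'name', None) or cls.__name__.lower()
        setattr(self, name, cls())
        return name


class SomeProvider:
    class Meta:
        name = 'some_provider'

    @staticmethod
    def one():
        return 1


class Another:
    @staticmethod
    def bye():
        return 'Bye!'


generic = MiniGeneric()
first_name = generic.add_provider(SomeProvider)
second_name = generic.add_provider(Another)
```

Here SomeProvider is registered under the name from its Meta class, while Another, which has no Meta, is registered as another.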

Built-in providers

Most countries where a given language is official have data that is specific to those countries only, for example, CPF for Brazil or SSN for the USA. Such data would cause inconvenience and break the order (or at least annoy) if it were present in every object regardless of the chosen locale. You can see what we mean from an example of how it would look (this code will not work):

    >>> from mimesis import Personal
    >>> person = Personal('ru')
    >>> person.ssn()
    >>> person.cpf()

I think everyone will agree that this looks rather bad. Being perfectionists, we made sure that the Brazilian CPF does not bother a “Pole”: the provider classes that supply this kind of country-specific data were moved into a separate subpackage (mimesis.builtins) so that the structure of the classes and their objects stays the same for all languages.

Here is how it works:

    >>> from mimesis import Generic
    >>> from mimesis.builtins import BrazilSpecProvider
    >>> generic = Generic('pt-br')

    >>> class BrazilProvider(BrazilSpecProvider):
    ...
    ...     class Meta:
    ...         name = "brazil_provider"
    ...
    >>> generic.add_provider(BrazilProvider)
    >>> generic.brazil_provider.cpf()
    '696.441.186-00'

In general, you do not need to add built-in classes to the Generic() object. The example does so only to show a case where adding a built-in provider class to Generic() would be appropriate. You can use a built-in provider directly, as shown below:

    >>> from mimesis.builtins import RussiaSpecProvider
    >>> ru = RussiaSpecProvider()
    >>> ru.patronymic(gender='female')
    ''
    >>> ru.patronymic(gender='male')
    ''
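Country-specific identifiers usually have an internal structure, which is part of what "valid data" means for them. As an illustration, the sketch below checks the two check digits of a CPF using the commonly documented modulo-11 rule. The cpf_check_digits() and is_valid_cpf() helpers are hypothetical names written for this example and are not part of Mimesis; the sample value is the one from the Brazilian example above.

```python
def cpf_check_digits(base):
    """Return the two check digits for the first nine CPF digits."""
    digits = [int(ch) for ch in base]
    for _ in range(2):
        # Weights run from len(digits) + 1 down to 2.
        weights = range(len(digits) + 1, 1, -1)
        total = sum(d * w for d, w in zip(digits, weights))
        check = (total * 10) % 11
        digits.append(0 if check == 10 else check)
    return digits[-2], digits[-1]


def is_valid_cpf(cpf):
    """Check a CPF given as 'XXX.XXX.XXX-XX' or as 11 plain digits."""
    raw = ''.join(ch for ch in cpf if ch.isdigit())
    if len(raw) != 11:
        return False
    return cpf_check_digits(raw[:9]) == (int(raw[9]), int(raw[10]))


sample_ok = is_valid_cpf('696.441.186-00')
sample_bad = is_valid_cpf('696.441.186-01')
```

This sketch covers only the check digits; production validators typically apply additional rules, such as rejecting numbers made of a single repeated digit.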

What data do you need most often in your work? What is missing from the library and should be added right away? We would be very glad to hear your wishes, recommendations, and comments.

Link to the project: here.
Link to the documentation: here.
Link to the first part of the article: here.

That is all from me, friends. Good luck with your tests, and may the Force be with you!

Source: https://habr.com/ru/post/319880/
