Trafaret: a library for checking and converting data

The starting point: you are building a service, and you know that you will receive data from the outside in a certain format.
Suppose it is JSON whose structure is not up to you, and it looks roughly like this:

    sample_data = {
        'userNameFirst': 'Adam',
        'userNameSecond': 'Smith',
        'userPassword': 'supersecretpassword',
        'userEmail': 'adam@smith.math.edu',
        'userRoles': 'teacher, worker, admin',
    }

Inside the project, of course, I would like the structure to be different:

    import hashlib

    desired_data = {
        'name': 'Adam',
        'second_name': 'Smith',
        # hashlib works on bytes, so the string is encoded first
        'password': hashlib.md5('supersecretpassword'.encode('utf-8')).hexdigest(),
        'email': 'adam@smith.math.edu',
        'roles': ['teacher', 'worker', 'admin'],
    }

Evidently we have to convert to the internal format. Let's try this:

    new_data = {
        'name': sample_data['userNameFirst'],
        'second_name': sample_data['userNameSecond'],
        'password': hashlib.md5(sample_data['userPassword'].encode('utf-8')).hexdigest(),
        'email': sample_data['userEmail'],
        'roles': [s.strip() for s in sample_data['userRoles'].split(',')],
    }
    assert new_data == desired_data, 'Uh oh'

But then the suspicion creeps in that the sample sent does not include all possible fields. We take this into account and write a more flexible version:

    # external field name -> internal field name
    FIELDS = {
        'userNameFirst': 'name',
        'userNameSecond': 'second_name',
        'userEmail': 'email',
    }
    new_data = {n2: sample_data[n1] for n1, n2 in FIELDS.items()}
    new_data['roles'] = [s.strip() for s in sample_data['userRoles'].split(',')]
    new_data['password'] = hashlib.md5(sample_data['userPassword'].encode('utf-8')).hexdigest()
    assert new_data == desired_data, 'Uh oh'

Not bad: flexible and easy to extend to a large number of fields. As soon as the full specification becomes available, we will easily extend our code to cover it.

And then a small update arrives: userEmail is an optional field. In addition, a userTitle field appears, which should default to 'Bachelor' if it is not sent.
We are not the idle type, so while waiting for the full information to finally arrive we add support for optional fields and default values.

    desired_data['title'] = 'Bachelor'  # the expected result now includes the default title

    # a tuple value means (internal name, default); '__optional' marks a field that may be absent
    FIELDS = {
        'userNameFirst': 'name',
        'userNameSecond': 'second_name',
        'userEmail': ('email', '__optional'),
        'userTitle': ('title', 'Bachelor'),
    }
    new_data = {}
    for old, new in FIELDS.items():
        if isinstance(new, tuple):
            new, default = new
            if old not in sample_data:
                if default != '__optional':
                    new_data[new] = default
                continue
        # the field is present (or required): copy it, KeyError if it is missing
        new_data[new] = sample_data[old]
    new_data['roles'] = [s.strip() for s in sample_data['userRoles'].split(',')]
    new_data['password'] = hashlib.md5(sample_data['userPassword'].encode('utf-8')).hexdigest()
    assert new_data == desired_data, 'Uh oh'

Damn, so few fields and so much code. It was simpler before; for now it is better to solve the problem head-on.

    new_data = {
        'name': sample_data['userNameFirst'],
        'second_name': sample_data['userNameSecond'],
        'password': hashlib.md5(sample_data['userPassword'].encode('utf-8')).hexdigest(),
        'roles': [s.strip() for s in sample_data['userRoles'].split(',')],
    }
    if 'userEmail' in sample_data:  # optional field
        new_data['email'] = sample_data['userEmail']
    new_data['title'] = sample_data.get('userTitle', 'Bachelor')  # field with a default
    assert new_data == desired_data, 'Uh oh'

Ah, good, familiar code without undue complexity. But what happens when the full specification arrives? Apparently we will go back to the second option, add data checking and good error messages to it, package it as a library, and use it.
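
As a rough idea of what "the second option plus checks and good error messages" might grow into, here is a minimal standard-library sketch. The names `SPEC`, `MISSING`, `OPTIONAL`, `ConversionError`, `ensure_str` and `convert` are hypothetical and invented for this illustration; this is not trafaret's code.

```python
import hashlib

MISSING = object()   # sentinel: no default, the field is required
OPTIONAL = object()  # sentinel: skip the field when it is absent


class ConversionError(Exception):
    """Carries a dict of {input field name: human-readable message}."""

    def __init__(self, errors):
        super().__init__(errors)
        self.errors = errors


def ensure_str(value):
    if not isinstance(value, str):
        raise ValueError('value is not a string')
    return value


def md5_hex(value):
    return hashlib.md5(ensure_str(value).encode('utf-8')).hexdigest()


def split_roles(value):
    return [s.strip() for s in ensure_str(value).split(',')]


# input name -> (output name, converter, default)
SPEC = {
    'userNameFirst': ('name', ensure_str, MISSING),
    'userNameSecond': ('second_name', ensure_str, MISSING),
    'userPassword': ('password', md5_hex, MISSING),
    'userEmail': ('email', ensure_str, OPTIONAL),
    'userTitle': ('title', ensure_str, 'Bachelor'),
    'userRoles': ('roles', split_roles, MISSING),
}


def convert(data, spec=SPEC):
    """Rename and convert fields, collecting every error before failing."""
    result, errors = {}, {}
    for old, (new, converter, default) in spec.items():
        if old not in data:
            if default is MISSING:
                errors[old] = 'is required'
            elif default is not OPTIONAL:
                result[new] = default
            continue
        try:
            result[new] = converter(data[old])
        except ValueError as exc:
            errors[old] = 'cannot be converted: %s' % exc
    if errors:
        raise ConversionError(errors)
    return result
```

Even this small version already needs sentinels, its own exception type and error collection, and it still knows nothing about nested structures.
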
Hmm, but such a library already exists. See:

    import trafaret as t

    # hashlib works on bytes, so the string is encoded first
    hash_md5 = lambda d: hashlib.md5(d.encode('utf-8')).hexdigest()
    comma_to_list = lambda d: [s.strip() for s in d.split(',')]

    # t.Key(...) >> 'name' renames the key in the converted result
    converter = t.Dict({
        t.Key('userNameFirst') >> 'name': t.String,
        t.Key('userNameSecond') >> 'second_name': t.String,
        t.Key('userPassword') >> 'password': hash_md5,
        t.Key('userEmail', optional=True) >> 'email': t.Email,
        t.Key('userTitle', default='Bachelor') >> 'title': t.String,
        t.Key('userRoles') >> 'roles': comma_to_list,
    })
    assert converter.check(sample_data) == desired_data

Get it here: github.com/Deepwalker/trafaret

The full code from this post is available as a script here: gist.github.com/2023370

A small addition about errors, at nimnull's request.

Errors need to be caught and reported back in human language. Usually (there may be exceptions, of course) a person can correct the mistake, so the message must be written in a way the recipient will understand.

Of the examples above, only the last one produces proper errors, because I did not want to complicate the code in the others. I decided that errors belong to validation, and in trafaret validation is only the foundation: trafaret itself is about data conversion.

More precisely, github.com/barbuza/contract, from which trafaret was made, was about validation and nothing more. It is a good tool that does its job well. But I had a slightly different task, plus the impression left by '>>' in funcparserlib. In fact, '>>' in funcparserlib does exactly the same thing as in trafaret: it passes the collected data to a user function for processing.
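
To make the idea behind '>>' concrete, here is a toy sketch of how such an operator can be built with Python's `__rshift__` method. The `Check` class and `as_int` function are hypothetical names made up for this example; neither trafaret nor funcparserlib is implemented this way as far as this sketch is concerned.

```python
class Check:
    """Toy checker: validates a value, then optionally post-processes it."""

    def __init__(self, validate, convert=None):
        self._validate = validate
        self._convert = convert

    def __rshift__(self, func):
        # `checker >> func` returns a new checker that feeds the
        # validated value into `func`; several `>>` can be chained.
        previous = self._convert
        if previous is None:
            combined = func
        else:
            combined = lambda value: func(previous(value))
        return Check(self._validate, combined)

    def check(self, value):
        value = self._validate(value)
        return value if self._convert is None else self._convert(value)


def as_int(value):
    return int(value)  # raises ValueError on bad input


double = Check(as_int) >> (lambda n: n * 2)
print(double.check('21'))  # 42
```

The checker on the left decides whether the data is acceptable, and the function on the right only ever sees data that has already passed the check.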

Let's go back to errors. Trafaret errors are instances of trafaret.DataError. Each DataError has an error attribute. For simple types such as Float, Int, String and so on, it is a string describing the error in English. For Dict, Mapping and List it is a dictionary. For Dict and Mapping the layout is obvious: the dictionary elements are the errors collected from the field checks. For List, the keys are numbers, namely the positions of the failing items. Other ways of organizing this seemed inappropriate.

Here is an example:

    >>> import trafaret as t
    >>> c = t.Dict({'a': t.List(t.Int)})
    >>> c.check({'a': [4, 5]})
    {'a': [4, 5]}
    >>> c.check({'a': [4, 'a', 6]})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "trafaret/__init__.py", line 110, in check
        return self._convert(self._check_val(value))
      File "trafaret/__init__.py", line 804, in _check_val
        raise DataError(error=errors)
    trafaret.DataError: {'a': DataError({1: DataError(value cant be converted to int)})}

Yes, as the example shows, the error has to be caught, but that is not the point here. The point is that we received nested errors, and thanks to this we can determine exactly where each error occurred and what it was.
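
The nested layout is easy to model without the library. The sketch below is a toy stand-in: the `DataError` class here is written for the illustration and only mimics the `error` attribute described above, and `check_int_list` and `check_dict` are hypothetical helpers. It shows a list checker keying errors by position, a dict checker nesting them under the field name, and the caller catching the result.

```python
class DataError(ValueError):
    """Toy stand-in: `error` is a message string or a dict of DataErrors."""

    def __init__(self, error):
        super().__init__(error)
        self.error = error


def check_int_list(items):
    """Convert every item to int; key the failures by item position."""
    result, errors = [], {}
    for position, item in enumerate(items):
        try:
            result.append(int(item))
        except (TypeError, ValueError):
            errors[position] = DataError('value cant be converted to int')
    if errors:
        raise DataError(errors)
    return result


def check_dict(data, checkers):
    """Run one checker per field; key the failures by field name."""
    result, errors = {}, {}
    for key, checker in checkers.items():
        if key not in data:
            errors[key] = DataError('is required')
            continue
        try:
            result[key] = checker(data[key])
        except DataError as exc:
            errors[key] = exc
    if errors:
        raise DataError(errors)
    return result


try:
    check_dict({'a': [4, 'a', 6]}, {'a': check_int_list})
except DataError as exc:
    # Walk the nesting: field 'a', then list position 1, then the message.
    print(exc.error['a'].error[1].error)  # value cant be converted to int
```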

There is a small helper to translate the error into a more convenient form:

    >>> t.extract_error(c, {'a': [4, 'a', 6]})
    {'a': {1: 'value cant be converted to int'}}

And even more convenient, for some purposes:

    >>> from trafaret.utils import unfold
    >>> unfold(t.extract_error(c, {'a': [4, 'a', 6]}), prefix='form')
    {'form__a__1': 'value cant be converted to int'}
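
How this kind of flattening can work is easy to see in a few lines. This is a standalone sketch, not the implementation of `unfold`; the function name `flatten_errors` and its `delimiter` parameter are assumptions made for the example.

```python
def flatten_errors(errors, prefix='', delimiter='__'):
    """Turn nested {key: {key: message}} dicts into one flat dict."""
    flat = {}
    for key, value in errors.items():
        # Join the path so far with the current key.
        name = '%s%s%s' % (prefix, delimiter, key) if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_errors(value, name, delimiter))
        else:
            flat[name] = value
    return flat


nested = {'a': {1: 'value cant be converted to int'}}
print(flatten_errors(nested, prefix='form'))
# {'form__a__1': 'value cant be converted to int'}
```

Flat keys of this kind can be matched against field names when the errors have to be shown next to the inputs that caused them.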

About the last example: no, trafaret does not work with forms. That is, it contains no widgets and does not build forms from mappers, SQLAlchemy tables or anything of the kind. But it is perfectly capable of checking the data that comes from an HTML form.

That is probably all. Ask questions and I can add more.

To finish, here is a hardcore example of usage and flexibility:

    >>> from datetime import datetime
    >>> todt = lambda m: datetime(*[int(i) for i in m.groups()])
    >>> (t.String(regex=r'^year=(\d+),month=(\d+),day=(\d+)$') >> todt).check('year=2011,month=07,day=23')
    datetime.datetime(2011, 7, 23, 0, 0)
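
For comparison, roughly the same check-then-convert step written with only the standard library might look like the sketch below. `parse_date` and `DATE_RE` are hypothetical names, and raising `ValueError` on a mismatch is an arbitrary choice made for the illustration.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r'^year=(\d+),month=(\d+),day=(\d+)$')


def parse_date(text):
    """Check the string against the pattern, then build a datetime."""
    match = DATE_RE.match(text)
    if match is None:
        raise ValueError('value does not match the expected pattern')
    return datetime(*[int(part) for part in match.groups()])


print(parse_date('year=2011,month=07,day=23'))  # 2011-07-23 00:00:00
```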

Source: https://habr.com/ru/post/139927/
