There comes a point in a language's development when its compiler gets written in the language itself.
To prove how cool the trafaret library is, I decided to do something similar:
something recursive, where you have to go deeper.
We will write a JSON Schema parser in trafaret that returns
a ready-made trafaret for checking documents against that schema.
In other words, a certain object of type Trafaret which, when fed a valid JSON Schema document,
returns another Trafaret object, to which you can feed documents
matching the description.
How do you do this with a validation library? With a validation library you can't, but with a universal transformer it is easy. Trafaret is a data parser, more precisely a parser combinator library. The difference is that the parsers you have heard of, such as parsec and funcparserlib, parse strings, while Trafaret parses anything that comes to hand, whatever the author's talent allows him to encode.
JSON Schema is specified in a set of documents, of which the most approachable is probably this one: http://json-schema.org/latest/json-schema-validation.html
It describes the set of keywords you can use to state the correctness criteria for a document, while $ref, wonderful and subtle to implement, is mentioned in just one place, in passing.
In the base case, implementing JSON Schema is quite simple. It consists of keywords such as maximum (the maximum value for a number), pattern (the regular expression a string is checked against), and items (a child schema, or an array of schemas, for checking array elements).
All these keywords are applied independently. If we meet maximum, we simply check the number against that upper bound. So you can take a schema, for example:
    { "type": "number", "maximum": 45 }
and break it down into its components: just a list of checks, all of which must pass.
    validations = []
    for key, value in schema.items():  # schema is the dict parsed from JSON
        if key == 'type':
            if value == 'number':
                # is_a_number is a check function defined elsewhere
                validations.append(is_a_number)
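A self-contained version of the loop above could look like the sketch below. The helper names (`is_a_number`, `make_maximum_check`, `build_validations`, `validate`) are invented for illustration and are not part of trafaret or of any JSON Schema library.

```python
def is_a_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def make_maximum_check(limit):
    # Return a check that passes when the value does not exceed `limit`
    def check(value):
        return value <= limit
    return check


def build_validations(schema):
    """Turn a schema dict into a flat list of independent checks."""
    validations = []
    for key, value in schema.items():
        if key == 'type' and value == 'number':
            validations.append(is_a_number)
        elif key == 'maximum':
            validations.append(make_maximum_check(value))
    return validations


def validate(schema, document):
    # The document is valid only if every check passes;
    # all() stops at the first failing check
    return all(check(document) for check in build_validations(schema))


schema = {"type": "number", "maximum": 45}
print(validate(schema, 44))  # True
print(validate(schema, 46))  # False
```

This works for two keywords, but the result is a bare list of functions with no error messages, which is where a real parser starts to pay off.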
But wait, the schema itself has to be checked too. Let's be done with back-of-the-envelope examples and start writing a parser. A JSON Schema is a dictionary, object, map; in short, keys and values in curly brackets {}. So we will be checking a dictionary. Let's try:
    import trafaret as t

    json_schema_type = t.Enum('null', 'boolean', 'object', 'array', 'number', 'integer', 'string')

    json_schema_keys = t.Dict(
        t.Key('type', optional=True, trafaret=json_schema_type),
        t.Key('maximum', optional=True, trafaret=t.Int()),
    )
We took only a couple of keywords so as not to tire the reader with scrolling. Validation will work, but we need not only to check the schema but also to get validators out of it. And this is exactly what trafaret, unlike many libraries, does in no time, though you do have to think a little.
There is the & operation. It takes two trafarets and applies the second one to the output of the first, provided the first raised no validation errors. For example:
    check_password = (
        t.String()
        & (lambda value: value if value == 'secret' else t.DataError('Wrong password'))
    )
If the input fails the first check, as in check_password(123), we immediately get a message that the value is not a string, and it is never compared with the string 'secret'.
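To make the short-circuit behaviour of & concrete, here is a toy model in plain Python. It is not trafaret's implementation: the `Check` class, the `CheckError` exception and the two check functions are hypothetical names used only in this sketch.

```python
class CheckError(Exception):
    """Raised by a check when the value is rejected."""


class Check:
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, value):
        return self.fn(value)

    def __and__(self, other):
        # Run `self` first; `other` only sees the output if `self` did not raise
        return Check(lambda value: other(self(value)))


def string(value):
    if not isinstance(value, str):
        raise CheckError('value is not a string')
    return value


def is_secret(value):
    if value != 'secret':
        raise CheckError('Wrong password')
    return value


check_password = Check(string) & Check(is_secret)

print(check_password('secret'))  # secret
```

Calling `check_password(123)` in this model raises `CheckError('value is not a string')` from the first check, so `is_secret` never runs.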
To check arbitrary Python values for equality, trafaret has Atom.
So the types could be described like this:
    json_schema_type = (
        t.Atom('null') & t.Null()
        | t.Atom('boolean') & t.Bool()
    )
But this is not what we want. We want to return the trafaret, not apply it right away,
and applying it is obviously wrong anyway: the string 'null' is definitely not None.
So we write a helper that is itself a trafaret and returns the given trafaret:
    def just(trafaret):
        """Return the given trafaret, ignoring the input value."""
        def create(value):
            return trafaret
        return create

And apply it:

    json_schema_type = (
        t.Atom('null') & just(t.Null())
        | t.Atom('boolean') & just(t.Bool())
        | t.Atom('object') & just(t.Type(dict))
        | t.Atom('array') & just(t.Type(list))
        | t.Atom('number') & just(t.Float())
        | t.Atom('integer') & just(t.Int())
        | t.Atom('string') & just(t.String())
    )
Now the call json_schema_type('null') returns an instance of t.Null(). The parser has started to produce the final result.
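The idea of a parser whose output is itself a checker can be modelled without trafaret. The sketch below is plain Python; `atom`, `just`, `then`, `either` and the two checker functions are hypothetical stand-ins for the trafaret pieces used above, not the library's API.

```python
class Mismatch(Exception):
    """Raised when a parser or checker rejects its input."""


def atom(expected):
    # Parser that accepts exactly one value
    def parse(value):
        if value != expected:
            raise Mismatch('%r is not %r' % (value, expected))
        return value
    return parse


def just(result):
    # Parser that ignores its input and returns a fixed result
    return lambda value: result


def then(first, second):
    # Sequential composition, the role `&` plays above
    return lambda value: second(first(value))


def either(*parsers):
    # Try alternatives in order, the role `|` plays above
    def parse(value):
        for parser in parsers:
            try:
                return parser(value)
            except Mismatch:
                continue
        raise Mismatch('no alternative matched %r' % (value,))
    return parse


def check_null(value):
    if value is not None:
        raise Mismatch('value is not null')
    return value


def check_string(value):
    if not isinstance(value, str):
        raise Mismatch('value is not a string')
    return value


schema_type = either(
    then(atom('null'), just(check_null)),
    then(atom('string'), just(check_string)),
)

checker = schema_type('string')  # parsing the schema yields a checker
print(checker('hello'))          # hello
```

The schema is parsed once, and the returned checker can then be applied to any number of documents.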
The first difficulty level is cleared: we have implemented type. Overjoyed, we do enum, const, multipleOf, maximum and so on in the same way.
Almost all keywords in JSON Schema are independent, but a few of them depend on each other. These are the keywords for arrays and objects. additionalItems is a child schema for checking array elements that are not described in items. For example, "items": [{"type":"string"}, {"type":"boolean"}] checks the first two elements, but if the document being checked has three or more, the rest must be checked via additionalItems, if it is specified, or
that is an error in itself.
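The interplay of items and additionalItems can be sketched in plain Python as follows. `make_array_check` and the small type checkers are hypothetical names written for this illustration; the function takes already-built checker callables rather than raw schemas, and it follows the rule as stated above, treating extra elements with no additionalItems checker as an error.

```python
def make_array_check(items, additional_items=None):
    """Build a checker for a list.

    items: list of per-position checkers (callables returning True/False)
    additional_items: optional checker for elements beyond len(items)
    """
    def check(array):
        for index, element in enumerate(array):
            if index < len(items):
                ok = items[index](element)
            elif additional_items is not None:
                ok = additional_items(element)
            else:
                ok = False  # extra element and nothing to check it with
            if not ok:
                return False
        return True
    return check


def is_string(value):
    return isinstance(value, str)


def is_boolean(value):
    return isinstance(value, bool)


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


strict = make_array_check([is_string, is_boolean])
loose = make_array_check([is_string, is_boolean], additional_items=is_number)

print(strict(['a', True]))      # True
print(strict(['a', True, 3]))   # False
print(loose(['a', True, 3]))    # True
print(loose(['a', True, 'x']))  # False
```

The point is that neither keyword can be turned into a checker on its own: the array checker needs both values at once.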
The second case is additionalProperties. For checking objects, JSON Schema has properties and patternProperties, and for everything not described by those two, additionalProperties is used.
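A rough plain-Python sketch of how the three object keywords fit together is shown below. `make_object_check` and the helper checkers are hypothetical names. One assumption to note: when no additional_properties checker is given, this sketch accepts keys that nothing else covers.

```python
import re


def make_object_check(properties=None, pattern_properties=None,
                      additional_properties=None):
    """Build a checker for a dict.

    properties: {name: checker}
    pattern_properties: {regex string: checker}
    additional_properties: checker for keys matched by neither, or None
    """
    properties = properties or {}
    patterns = [(re.compile(pattern), checker)
                for pattern, checker in (pattern_properties or {}).items()]

    def check(obj):
        for key, value in obj.items():
            matched = False
            if key in properties:
                matched = True
                if not properties[key](value):
                    return False
            for regex, checker in patterns:
                if regex.search(key):
                    matched = True
                    if not checker(value):
                        return False
            # Only keys covered by neither mechanism reach this checker
            if not matched and additional_properties is not None:
                if not additional_properties(value):
                    return False
        return True
    return check


def is_string(value):
    return isinstance(value, str)


def is_int(value):
    return isinstance(value, int) and not isinstance(value, bool)


def never(value):
    return False


check = make_object_check(
    properties={'name': is_string},
    pattern_properties={r'^x-': is_int},
    additional_properties=never,
)
print(check({'name': 'Ann', 'x-age': 30}))  # True
print(check({'name': 'Ann', 'other': 1}))   # False
```

To know which keys are "additional", the checker has to track which keys the other two keywords already claimed.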
This is, in principle, a long-solved problem in trafaret building: special keys are used for it. But since not 100% of the population is engaged in trafaret building yet, let's dwell on it a little longer.
Trafaret's approach to checking dictionaries is not quite standard. In fact,
dictionaries in trafaret are handled by keys, Key in particular, and Dict itself
is a wrapper that collects the results of running all the keys on the given object.
In mypy terms, the type of a key looks like this:
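    KeyT = Callable[                      # a key is a callable (__call__)
        [Mapping],                        # it takes a Mapping (e.g. a dict)
        Sequence[                         # and returns a sequence
            Tuple[                        # of tuples:
                str,                      # the output key name
                Union[Any, t.DataError],  # the value, or a DataError
                Sequence[str]             # names of the dictionary keys it touched
            ]
        ]
    ]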
Look at it carefully, three times if needed, and note the last line before the closing bracket: the key reports all the dictionary keys it has touched. That is, a key can pull several keys from the dictionary and can also return any number of keys. Knowing which dictionary keys a key has touched is necessary to find out whether the dictionary contains any extra or additional elements.
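The key protocol described above can be modelled in a few lines of plain Python. This is a simplified sketch, not trafaret's code: `simple_key` and `run_keys` are hypothetical names, `ValueError` stands in for `t.DataError`, and the context argument is left out.

```python
def simple_key(name, check):
    """A key that reads one dictionary entry and checks it."""
    def key(data):
        if name not in data:
            yield name, ValueError('is required'), (name,)
        else:
            try:
                yield name, check(data[name]), (name,)
            except ValueError as error:
                yield name, error, (name,)
    return key


def run_keys(keys, data):
    """Collect key outputs and report entries that no key touched."""
    result, errors, touched = {}, {}, set()
    for key in keys:
        for out_name, value, names in key(data):
            touched.update(names)
            if isinstance(value, ValueError):
                errors[out_name] = str(value)
            else:
                result[out_name] = value
    for extra in set(data) - touched:
        errors[extra] = 'is not allowed'
    return result, errors


keys = [simple_key('age', int), simple_key('name', str)]
print(run_keys(keys, {'age': '42', 'name': 'Ann'}))
# ({'age': 42, 'name': 'Ann'}, {})
print(run_keys(keys, {'age': '42', 'name': 'Ann', 'x': 1}))
# ({'age': 42, 'name': 'Ann'}, {'x': 'is not allowed'})
```

The wrapper never needs to know what each key does internally; the set of touched names is enough to spot the leftovers.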
It follows that a key can take several dictionary keys at once and check them together. This was originally the answer to what to do with password & password_confirmation when the keys themselves are so independent. In our case the task is somewhat trickier than comparing two keys for equality, but the built-in flexibility allows for much more than that.
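As a concrete illustration of one key consuming two dictionary entries, here is a plain-Python sketch of a confirmation key. `confirm_key` is a hypothetical name; the function only imitates the shape of a trafaret key (a callable yielding name, value-or-error, touched names), with `ValueError` standing in for `t.DataError`.

```python
def confirm_key(name, confirm_name):
    """A key that reads two entries and emits one, if they are equal."""
    def key(data):
        touched = (name, confirm_name)  # both entries count as handled
        if name not in data or confirm_name not in data:
            yield name, ValueError('both fields are required'), touched
        elif data[name] != data[confirm_name]:
            yield confirm_name, ValueError('must be equal to %s' % name), touched
        else:
            yield name, data[name], touched
    return key


key = confirm_key('password', 'password_confirmation')
print(list(key({'password': 's3', 'password_confirmation': 's3'})))
# [('password', 's3', ('password', 'password_confirmation'))]
```

Two entries go in, one comes out, and both are reported as touched, so neither is later flagged as an extra key.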
Meet subdict:
    def subdict(name, *keys, **kw):
        trafaret = kw.pop('trafaret')  # coz py2k

        def inner(data, context=None):
            errors = False
            preserve_output = []  # raw key outputs, replayed if anything failed
            touched = set()
            collect = {}
            for key in keys:
                for k, v, names in key(data, context=context):
                    touched.update(names)
                    preserve_output.append((k, v, names))
                    if isinstance(v, t.DataError):
                        errors = True
                    else:
                        collect[k] = v
            if errors:
                for out in preserve_output:
                    yield out
            elif collect:
                # all keys passed: hand the collected values to the trafaret
                yield name, t.catch(trafaret, **collect), touched

        return inner
And something like this is used in the depths of trafaret_schema:
    subdict(
        'array',
        t.Key('items', optional=True, trafaret=ensure_list(json_schema)),
        t.Key('additionalItems', optional=True, trafaret=json_schema),
        trafaret=check_array,
    ),
As for state: since birth, trafaret has been thoroughly functional at heart. Whatever goes onto the conveyor does not affect its neighbours. And this is great! Our good-natured and not at all elitist adherents of functional programming have chewed this over in mathematical terms many times.
But at the next level of JSON Schema a mega boss awaits us: $ref. It is a very sensible thing that lets you refer to another schema already defined somewhere. For example, the schema may be defined in definitions or in another document altogether.
So while parsing a schema we need to collect all schema definitions, together with their addresses, in one place. Then we check that every $ref met in the document has a definition. At execution time things are straightforward: the trafaret for $ref just pulls the target trafaret from the registry.
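The three steps just described (collect definitions, verify references, resolve lazily) can be sketched in plain Python. `SchemaRegistry` and its method signatures are assumptions made for this illustration and are not the actual Register class from trafaret_schema.

```python
class SchemaRegistry:
    """Collects schema definitions and the references made to them."""

    def __init__(self):
        self.schemas = {}        # address -> checker
        self.references = set()  # every $ref seen while parsing

    def save_schema(self, address, checker):
        self.schemas[address] = checker

    def reg_reference(self, address):
        self.references.add(address)

    def missing_references(self):
        # References that point at nothing; should be empty after parsing
        return sorted(self.references - set(self.schemas))

    def ref(self, address):
        """Return a checker that resolves `address` only when called."""
        self.reg_reference(address)

        def check(value):
            return self.schemas[address](value)
        return check


def is_zip(value):
    return isinstance(value, str) and value.isdigit()


registry = SchemaRegistry()
check_zip = registry.ref('#/definitions/zip')  # used before it is defined
registry.save_schema('#/definitions/zip', is_zip)

print(registry.missing_references())  # []
print(check_zip('12345'))             # True
```

Because the lookup happens at call time, a reference may appear in the document before the definition it points to.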
Well, writing a registry is a piece of cake:

    class Register:
        def __init__(self):
            pass
But then trafaret itself had to be refined: in addition to the standard value argument, you can now pass context=Any down through the standard trafarets. And our freshly written Register is exactly that context.
We use something like this to define the trafaret for $ref:
    def ref_field(reference, context=None):
        register = context  # the context is our register
        register.reg_reference(reference)  # remember the $ref so it can be verified later

        def inner(value, context=None):
            # runs at validation time, when the schemas are already registered
            schema = register.get_schema(reference)
            return schema(value, context=context)

        return inner
Of course, the most confusing part is collecting references to the deepest subschemas.
Here is an example for a child schema whose address we want to save:
    def deep_schema(key):
        def inner(data, context=None):
            register = context
            register.push(key)  # step into the path segment `key`
            try:
                schema = json_schema(data, context=register)
                register.save_schema(schema)  # store it under the current path
                return schema
            finally:
                register.pop()  # step back out, even on errors
        return t.Call(inner)
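The push/pop bookkeeping can be illustrated with a small plain-Python model. `PathRegister`, its `address` method and the `walk` function are hypothetical; the real Register in trafaret_schema may track paths differently. For brevity this sketch records every nested dict as if it were a schema, which a real parser would not do.

```python
class PathRegister:
    def __init__(self):
        self.path = []     # current position inside the schema document
        self.schemas = {}  # JSON-pointer-like address -> schema object

    def push(self, key):
        self.path.append(key)

    def pop(self):
        self.path.pop()

    def address(self):
        return ('#/' + '/'.join(self.path)) if self.path else '#'

    def save_schema(self, schema):
        self.schemas[self.address()] = schema


def walk(node, register):
    """Record every dict node of a nested document under its address."""
    if isinstance(node, dict):
        register.save_schema(node)
        for key, child in node.items():
            register.push(key)
            try:
                walk(child, register)
            finally:
                register.pop()  # always restore the path, even on errors


document = {'definitions': {'zip': {'type': 'string'}}}
register = PathRegister()
walk(document, register)
print(sorted(register.schemas))
# ['#', '#/definitions', '#/definitions/zip']
```

The try/finally mirrors the one in deep_schema: the path stack must be restored whether or not parsing the child succeeds.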
The best of both worlds: JSON Schema is widespread and supported in every language, and trafaret is the best library for transformation with validation in Python. More precisely, the only one. And most importantly, for the format keyword you can plug in any trafaret, like this:
    import trafaret as t
    from trafaret_schema import json_schema, Register

    my_reg = Register()
    my_reg.reg_format('any_ip', t.IPv4 | t.IPv6)

    check_address = json_schema(open('address.rjson').read(), context=my_reg)
    check_person = json_schema(open('person.json').read(), context=my_reg)
trafaret_schema works and is ready to use. Write to us if something in trafaret_schema is wrong and we will fix it. See https://github.com/Deepwalker/trafaret_schema or run pip install trafaret_schema.
Trafaret has gained context support and, along with it, async/await.
Source: https://habr.com/ru/post/336282/