LIVR - “language independent validation rules” or data validation without “problems”

Every programmer has repeatedly had to validate user input. In more than 10 years of web development I have tried many libraries, but never found a single one that solved all the tasks I set.

Main problems encountered in data validation libraries

Problem number 1. Many validators check only the data for which validation rules are described. For me, it is important that any user input that is not explicitly allowed is ignored. That is, the validator must cut out all data for which no validation rules are described. This is simply a fundamental requirement.

Problem number 2. A procedural description of validation rules. I do not want to think about the validation algorithm every time; I just want to describe declaratively what correct data should look like. In effect, I want to define a data schema (why not “JSON Schema”? See the end of the post).

Problem number 3. Description of the validation rules in the form of code. It might not seem so bad, but it immediately rules out serializing the validation rules and using the same rules on the backend and the frontend.

Problem number 4. Validation stops at the first field with an error. This approach makes it impossible to highlight all the erroneous / required fields in a form at once.

Problem number 5. Non-standardized error messages. For example, "Field name is required". I cannot show such an error to the user for several reasons:

the field may have a completely different name in the interface
the interface may not be in English
error types need to be distinguished, for example to show empty-value errors in a special way

That is, the validator should return standardized error codes rather than error messages.

Problem number 6. Numeric error codes. They are simply inconvenient to use. I want the error codes to be intuitive. You will agree that the error code "REQUIRED" is clearer than the code "27". The logic is similar to working with exception classes.
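To make problems 1, 4, 5 and 6 concrete, here is a minimal sketch of the behaviour I want. It is not the LIVR API: the names validateFlat, demoRules and loginSchema are invented for this illustration, and the email check is deliberately simplistic.

```javascript
// Hypothetical mini-validator (not the LIVR API): each rule is a function
// that returns a string error code, or undefined when the value is fine.
const isBlank = (value) => value === undefined || value === null || value === '';

const demoRules = {
  required: (value) => (isBlank(value) ? 'REQUIRED' : undefined),
  // Simplistic email check, good enough for the illustration only.
  email: (value) =>
    isBlank(value) || /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(String(value)) ? undefined : 'WRONG_EMAIL',
};

function validateFlat(schema, input) {
  const output = {};
  const errors = {};
  for (const [field, ruleNames] of Object.entries(schema)) {
    // Run the field's rules in order and keep the first error code.
    const code = ruleNames
      .map((name) => demoRules[name](input[field]))
      .find((result) => result !== undefined);
    if (code !== undefined) errors[field] = code; // keep checking the other fields
    else if (field in input) output[field] = input[field]; // only declared fields survive
  }
  return Object.keys(errors).length > 0 ? { errors } : { output };
}

const loginSchema = { email: ['required', 'email'], password: ['required'] };

// Both broken fields are reported at once, as string codes.
console.log(validateFlat(loginSchema, { email: 'user_at_mail_com' }));
// The undeclared "is_admin" field is dropped from the output.
console.log(validateFlat(loginSchema, { email: 'a@b.co', password: 'x', is_admin: true }));
```

The point of the sketch is the shape of the result: either a cleaned copy of the input or a map of field names to readable codes, never a free-form message.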

Problem number 7. No way to check hierarchical data structures. Today, in the age of JSON APIs, you simply cannot do without it. Besides validating the hierarchical data itself, the validator must return an error code for each field.

Problem number 8. Limited set of rules. Standard rules are always lacking. The validator must be extensible and allow the addition of rules of any complexity.

Problem number 9. Too wide area of responsibility. The validator should not generate forms, should not generate code, should not do anything except validation.

Problem number 10. Inability to do additional data processing. Almost always, where there is validation, there is a need for some additional (often preliminary) data processing: cutting out forbidden characters, converting to lower case, removing extra spaces. Removing spaces at the beginning and at the end of a string is especially important: in 99% of cases they are not needed. I know that I just said the validator should not do anything except validation.

Three years ago we decided to write a validator that would have none of the problems described above. This is how LIVR (Language Independent Validation Rules) appeared. There are implementations in Perl, PHP, JavaScript and Python (we do not write in Python, so I cannot comment on that implementation). The validator has been used in production for several years in almost every project of the company. It works both on the server and on the client. You can play with the validator here - webbylab.imtqy.com/livr-playground .

The key idea was that the core of the validator should be minimal, with all the validation logic living in the rules (or rather, in their implementation). That is, the validator sees no difference between the rules “required” (checks that a value is present), “max_length” (checks the maximum length), “to_lc” (converts data to lower case) and “list_of_objects” (helps to describe the rules for an array of objects).

In other words, the validator knows nothing:

about error codes
that it can validate hierarchical objects
that it can convert / clean data
about many other things

All this is the responsibility of the validation rules.
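A rough sketch of what such a minimal core could look like. The names ruleBuilders and runFieldRules are hypothetical, and the (value, allData, outputArr) signature is my assumption modelled on the JavaScript implementation, so treat this as an illustration of the idea rather than the real code.

```javascript
// Hypothetical registry: every rule is a builder that returns a function
// (value, allData, outputArr). The core below never inspects rule names.
const ruleBuilders = {
  required: () => (value) =>
    value === undefined || value === null || value === '' ? 'REQUIRED' : undefined,
  max_length: (max) => (value) =>
    value !== undefined && String(value).length > max ? 'TOO_LONG' : undefined,
  // A converting rule: it reports no error, it only pushes a new value.
  to_lc: () => (value, allData, outputArr) => {
    if (typeof value === 'string') outputArr.push(value.toLowerCase());
  },
};

// The whole "core": apply the rules of one field in order, stop at the first
// error, and pass along whatever value the previous rule produced.
function runFieldRules(ruleSpecs, value, allData) {
  let current = value;
  for (const [name, ...args] of ruleSpecs) {
    const outputArr = [];
    const errorCode = ruleBuilders[name](...args)(current, allData, outputArr);
    if (errorCode) return { errorCode };
    if (outputArr.length > 0) current = outputArr[0];
  }
  return { value: current };
}

console.log(runFieldRules([['required'], ['to_lc'], ['max_length', 5]], 'ADMIN', {}));
console.log(runFieldRules([['required'], ['max_length', 5]], 'administrator', {}));
```

Note that the loop treats “required”, “max_length” and “to_lc” identically: error codes and data conversion exist only inside the rules.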

LIVR specification

Since the goal was a validator that does not depend on a programming language (something like mustache / handlebars, but in the world of data validation), we began by writing a specification.

Specification Objectives:

Standardize data description format.
Describe the minimum set of validation rules that must be supported by each implementation.
Standardize error codes.
Serve as the single base documentation for all implementations.
Provide a set of test data that allows you to check an implementation for compliance with the specification.

The specification is available at livr-spec.org.

The basic idea was that the description of the validation rules should look like a data schema and be as close to the structure of the data as possible, only with rules in place of values.

Example of description of validation rules for authorization form ( demo ):

{ email: ['required', 'email'], password: 'required' }

Example of validation rules for registration form ( demo ):

 { name: 'required', email: ['required', 'email'], gender: { one_of: ['male', 'female'] }, phone: {max_length: 10}, password: ['required', {min_length: 10} ], password2: { equal_to_field: 'password' } }

Example of validation of a nested object ( demo ):

 { name: 'required', phone: {max_length: 10}, address: { 'nested_object': { city: 'required', zip: ['required', 'positive_integer'] }} }

Validation Rules

How are validation rules described? Each rule consists of a name and arguments (much like a function call) and in general is written as {"RULE_NAME": ARRAY_OF_ARGUMENTS}. Each field gets an array of rules, which are applied in order.

For example,

 { "login": [ { length_between: [ 5, 10 ] } ] }

That is, we have a “login” field and a “length_between” rule, which has 2 arguments (“5” and “10”). This is the most complete form, but the following simplifications are allowed.

If a field has only one rule, the array is optional.
If a rule has one argument, you can pass it directly (without wrapping it in an array).
If a rule has no arguments, you can simply write the name of the rule.

All 3 notations are equivalent:

 "login": [ { required: [] } ]

 "login": [ "required" ]

 "login": "required"

More details are in the "How it works" section of the specification.
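The simplifications above can be undone mechanically. Here is a small sketch of how a shorthand description might be expanded into the full form; expandFieldRules is a name I made up for this example and is not taken from any implementation.

```javascript
// Expand any shorthand description of a field into the full form:
// [ { rule_name: [arg1, arg2, ...] }, ... ]
function expandFieldRules(fieldRules) {
  // A single rule may be given without the surrounding array.
  const list = Array.isArray(fieldRules) ? fieldRules : [fieldRules];
  return list.map((rule) => {
    // A rule without arguments may be given as a bare name: "required".
    if (typeof rule === 'string') return { [rule]: [] };
    const [name] = Object.keys(rule);
    const args = rule[name];
    // A single argument may be given without an array: { max_length: 10 }.
    // An array is taken as the argument list itself.
    return { [name]: Array.isArray(args) ? args : [args] };
  });
}

console.log(JSON.stringify(expandFieldRules('required')));
console.log(JSON.stringify(expandFieldRules(['required', { max_length: 10 }])));
console.log(JSON.stringify(expandFieldRules({ length_between: [5, 10] })));
```

All three notations of “required” from the list above expand to the same structure, which is why the validator can treat them as identical.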

Supported Rules

All rules can be divided into 3 global groups:

Rules that validate data (numbers, strings, etc.). For example, "max_length".
Rules that allow you to make more complex rules with simpler ones. For example, "nested_object".
Rules that convert data. For example, "to_lc"

However, the validator itself does not distinguish between them; to it they are all equal.

Here is a general list of rules that should be supported by each validator implementation:

Basic rules

required - the field is required and the value must not be empty
not_empty - the field is optional, but if it is present, it cannot be empty
not_empty_list - the value must contain a non-empty array

Rules for checking strings

one_of
max_length
min_length
length_between
length_equal
like

Rules for checking numbers

integer
positive_integer
decimal
positive_decimal
max_number
min_number
number_between

Rules for special formats

email
url
iso_date
equal_to_field

Rules for describing more complex rules (meta-rules)

nested_object - describes the rules for the nested object
list_of - describes the rules with which each element of the list must comply
list_of_objects - the value should be an array of objects in the required format
list_of_different_objects - use when you need to check an array of objects of different types.

Rules for data conversion (names begin with a verb)

trim - removes spaces at the beginning and at the end
to_lc - converts to lower case
to_uc - converts to upper case
remove - removes the specified characters
leave_only - leaves only the specified characters

Meta-rules

An example and the error codes for each rule can be found in the LIVR specification. Here we will dwell a little more only on meta-rules. Meta-rules are rules that let you combine simple rules into more complex ones in order to validate complex hierarchical data structures. It is important to understand that the validator does not distinguish between simple rules and meta-rules. Meta-rules are no different from the same “required” (yes, I am repeating myself).

nested_object
Allows you to describe validation rules for nested objects. You will use this rule all the time.
The error code depends on the nested rules. If the nested value is not a hash (dictionary), the field will contain the error “FORMAT_ERROR”.
Example of use ( demo ):

 address: { 'nested_object': { city: 'required', zip: ['required', 'positive_integer'] }}
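To show why a meta-rule needs no special support from the core, here is a sketch in which “nested_object” is an ordinary rule that simply calls the validation loop again. The helper names (validateObject, nestedObject and so on) are hypothetical and the rules are written as plain functions for brevity.

```javascript
// Hypothetical mini-core: a schema maps field names to arrays of rule
// functions. Each rule returns an error (string or nested object) or undefined.
function validateObject(schema, data) {
  const errors = {};
  for (const [field, rules] of Object.entries(schema)) {
    for (const rule of rules) {
      const error = rule(data[field]);
      if (error !== undefined) {
        errors[field] = error;
        break; // first failing rule wins for this field
      }
    }
  }
  return Object.keys(errors).length > 0 ? errors : undefined;
}

const isEmpty = (v) => v === undefined || v === null || v === '';
const required = (v) => (isEmpty(v) ? 'REQUIRED' : undefined);
const positiveInteger = (v) =>
  isEmpty(v) || /^[1-9][0-9]*$/.test(String(v)) ? undefined : 'NOT_POSITIVE_INTEGER';

// The meta-rule is an ordinary rule that happens to call the core again.
const nestedObject = (schema) => (v) => {
  if (isEmpty(v)) return undefined;
  if (typeof v !== 'object' || Array.isArray(v)) return 'FORMAT_ERROR';
  return validateObject(schema, v);
};

const addressSchema = {
  address: [nestedObject({ city: [required], zip: [required, positiveInteger] })],
};

console.log(validateObject(addressSchema, { address: { city: 'NYC' } }));
console.log(validateObject(addressSchema, { address: 'NYC' }));
```

The nested error object comes back in the same position as the nested data, which is what makes the error structure mirror the input.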

list_of
Allows you to describe validation rules for a list of values. Each rule will be applied to each item in the list.
The error code depends on the nested rules.
Example of use ( demo ):

 { product_ids: { 'list_of': [ 'required', 'positive_integer'] }}

list_of_objects
Allows you to describe the validation rules for an array of hashes (dictionaries). Similar to nested_object, but expects an array of objects. The rules apply to each element in the array.
The error code depends on the nested rules. In case the value is not an array, the code “FORMAT_ERROR” will be returned for the field.
Example of use ( demo ):

 products: ['required', { 'list_of_objects': { product_id: ['required','positive_integer'], quantity: ['required', 'positive_integer'] }}]

list_of_different_objects
Similar to “list_of_objects”, but sometimes the array we receive contains objects of different types. We can determine the type of an object by some field, for example “type”. “list_of_different_objects” allows you to describe the rules for a list of objects of different types.
The error code depends on the nested validation rules. If the nested object is not a hash, then the field will contain the error “FORMAT_ERROR”.
Example of use ( demo ):

 {
     products: ['required', { 'list_of_different_objects': [
         'product_type', {
             material: {
                 product_type: 'required',
                 material_id: ['required', 'positive_integer'],
                 quantity: ['required', {'min_number': 1} ],
                 warehouse_id: 'positive_integer'
             },
             service: {
                 product_type: 'required',
                 name: ['required', {'max_length': 20} ]
             }
         }
     ]}]
 }

In this example, the validator will look at the “product_type” in each hash and, depending on the value of this field, will use the appropriate validation rules.
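The dispatch on the selector field can be sketched roughly like this. Everything here (listOfDifferentObjects, checkTyped, the simplified schemas) is a hypothetical illustration; in particular, reporting “FORMAT_ERROR” for an unknown type and null for valid items is my reading of the behaviour, so check the specification before relying on it.

```javascript
const blank = (v) => v === undefined || v === null || v === '';
const requiredRule = (v) => (blank(v) ? 'REQUIRED' : undefined);
const maxLength = (max) => (v) =>
  !blank(v) && String(v).length > max ? 'TOO_LONG' : undefined;

// Validate one object against a schema of rule-function arrays.
function checkTyped(schema, obj) {
  const errors = {};
  for (const [field, rules] of Object.entries(schema)) {
    const code = rules.map((rule) => rule(obj[field])).find((c) => c !== undefined);
    if (code !== undefined) errors[field] = code;
  }
  return Object.keys(errors).length > 0 ? errors : undefined;
}

// selectorField picks the schema for each item; items that are not objects
// or have an unknown type get FORMAT_ERROR, valid items are reported as null.
const listOfDifferentObjects = (selectorField, schemas) => (list) => {
  if (blank(list)) return undefined;
  if (!Array.isArray(list)) return 'FORMAT_ERROR';
  const errors = list.map((item) => {
    const isObject = item !== null && typeof item === 'object' && !Array.isArray(item);
    const schema = isObject ? schemas[item[selectorField]] : undefined;
    return schema ? (checkTyped(schema, item) ?? null) : 'FORMAT_ERROR';
  });
  return errors.some((e) => e !== null) ? errors : undefined;
};

const productsRule = listOfDifferentObjects('product_type', {
  material: { product_type: [requiredRule], material_id: [requiredRule] },
  service: { product_type: [requiredRule], name: [requiredRule, maxLength(20)] },
});

console.log(
  productsRule([
    { product_type: 'material', material_id: 5 },
    { product_type: 'service' },
    { product_type: 'unknown' },
  ])
);
```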

Error format

As already mentioned, the rules return string error codes that are understandable to the developer, for example “REQUIRED”, “WRONG_EMAIL”, “WRONG_DATE”, and so on. The developer can now understand what the error is; what remains is a convenient way to convey which fields it occurred in. For this, the validator returns a structure that mirrors the one passed in for validation, but contains only the fields in which errors occurred, with string error codes in place of the original values.

For example, there are rules:

 { name: 'required', phone: {max_length: 10}, address: { 'nested_object': { city: 'required', zip: ['required', 'positive_integer'] }} }

and data for validation:

 { phone: 12345678901, address: { city: 'NYC' } }

we get the following errors as output:

 { "name": "REQUIRED", "phone": "TOO_LONG", "address": { "zip": "REQUIRED" } }

demo validation
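Because the error structure mirrors the data, a client can walk it to find every field that needs highlighting. Below is one possible sketch; flattenErrorPaths is a hypothetical helper of mine, not part of LIVR, and the dotted path format is just one convenient choice.

```javascript
// Turn a nested error structure into a flat map of "path -> error code".
// Arrays are walked by index; null entries (valid list items) are skipped.
function flattenErrorPaths(errors, prefix = '') {
  const result = {};
  for (const [key, value] of Object.entries(errors)) {
    if (value === null) continue;
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === 'string') result[path] = value;
    else Object.assign(result, flattenErrorPaths(value, path));
  }
  return result;
}

const sampleErrors = {
  name: 'REQUIRED',
  phone: 'TOO_LONG',
  address: { zip: 'REQUIRED' },
};

console.log(flattenErrorPaths(sampleErrors));
```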

REST API and error format

Returning sane errors always requires extra effort from developers, and very few REST APIs give detailed information in their errors. Often it is just “Bad request” and nothing more. Looking at an error, I want to see which field it belongs to, and plain field paths are not enough, because the data can be hierarchical and contain arrays of objects. In our company we proceed as follows: we describe validation rules with LIVR for absolutely every request. If validation fails, we return an error object to the client. The error object contains a global error code and the errors received from the LIVR validator.

For example, you transfer data to the server:

 { "email": "user_at_mail_com", "age": 10, "address": { "country": "USQ" } }

and in return you receive ( demo validation on livr playground ):

 {"error": { "code": "FORMAT_ERROR", "fields": { "email": "WRONG_EMAIL", "age": "TOO_LOW", "fname": "REQUIRED", "lname": "REQUIRED", "address": { "country": "NOT_ALLOWED_VALUE", "city": "REQUIRED", "zip": "REQUIRED" } } }}

This is much more informative than some kind of “Bad request”.
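On the server side, the wrapping step might look roughly like the sketch below. The function handleValidatedRequest, the stub validator and the 400 / 200 status codes are my assumptions for illustration; only the shape of the error envelope comes from the example above.

```javascript
// validate is any function returning { output } or { errors } with
// LIVR-shaped errors; the status codes here are assumptions.
function handleValidatedRequest(validate, payload, action) {
  const { output, errors } = validate(payload);
  if (errors) {
    return { status: 400, body: { error: { code: 'FORMAT_ERROR', fields: errors } } };
  }
  // Only the cleaned data reaches the business logic.
  return { status: 200, body: action(output) };
}

// Stub validator standing in for a real LIVR validator instance.
const stubValidate = (payload) =>
  payload.email && payload.email.includes('@')
    ? { output: { email: payload.email } }
    : { errors: { email: payload.email ? 'WRONG_EMAIL' : 'REQUIRED' } };

console.log(handleValidatedRequest(stubValidate, { email: 'user_at_mail_com' }, (data) => data));
console.log(handleValidatedRequest(stubValidate, { email: 'user@mail.com', role: 'admin' }, (data) => data));
```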

Working with aliases and registering your own rules

The specification contains only the most commonly used rules, but each project has its own specifics, and there are always situations where the standard rules are not enough. For this reason, one of the key requirements for the validator was that it can be extended with custom rules of any kind. Initially each implementation had its own mechanism for describing rules, but starting with version 0.4 of the specification we introduced a standard way of creating rules based on other rules (creating aliases), which covers 70% of situations. Let us consider both options.

Creating an alias
How an alias is registered depends on the implementation, but how an alias is described is governed by the specification. This approach makes it possible, for example, to serialize alias descriptions and use them with different implementations (for example, on a Perl backend and a JavaScript frontend).

 // Register the "valid_address" alias
 validator.registerAliasedRule({
     name: 'valid_address',
     rules: { nested_object: {
         country: 'required',
         city: 'required',
         zip: 'positive_integer'
     }}
 });

 // Register the "adult_age" alias
 validator.registerAliasedRule({
     name: 'adult_age',
     rules: [ 'positive_integer', { min_number: 18 } ]
 });

 // Now the aliases can be used like any other rule
 {
     name: 'required',
     age: ['required', 'adult_age' ],
     address: ['required', 'valid_address']
 }

Moreover, you can set your own error codes for the rules.

For example,

 validator.registerAliasedRule({ name: 'valid_address', rules: { nested_object: { country: 'required', city: 'required', zip: 'positive_integer' }}, error: 'WRONG_ADDRESS' });

and in case of an error during address validation, we get the following:

 { address: 'WRONG_ADDRESS' }
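Conceptually, an alias is just a new rule that runs a list of existing rules and optionally replaces their error code. A simplified sketch of that idea follows; aliasBuilders and registerAliasSketch are hypothetical names, and unlike the real alias format this sketch only accepts an array in rules.

```javascript
// Hypothetical registry of rule builders: builder(arg) returns rule(value).
const aliasBuilders = {
  positive_integer: () => (v) =>
    /^[1-9][0-9]*$/.test(String(v)) ? undefined : 'NOT_POSITIVE_INTEGER',
  min_number: (min) => (v) => (Number(v) < min ? 'TOO_LOW' : undefined),
};

// An alias description is plain data, so it can be serialized and shared.
function registerAliasSketch({ name, rules, error }) {
  aliasBuilders[name] = () => (value) => {
    for (const rule of rules) {
      const [ruleName, arg] = typeof rule === 'string' ? [rule] : Object.entries(rule)[0];
      const code = aliasBuilders[ruleName](arg)(value);
      // The alias-level error code, if given, replaces the inner one.
      if (code) return error || code;
    }
    return undefined;
  };
}

registerAliasSketch({ name: 'adult_age', rules: ['positive_integer', { min_number: 18 }] });
registerAliasSketch({
  name: 'adult_age_strict',
  rules: ['positive_integer', { min_number: 18 }],
  error: 'WRONG_AGE',
});

console.log(aliasBuilders.adult_age()(15));
console.log(aliasBuilders.adult_age_strict()(15));
```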

Registering a full-fledged rule, using the JavaScript implementation as an example
Rules are implemented as callback functions that validate values. Let's try to describe a new rule called “strong_password”. We will check that the value is at least 8 characters long and contains digits and letters in upper and lower case.

 var LIVR = require('livr');

 var rules = { password: ['required', 'strong_password'] };
 var validator = new LIVR.Validator(rules);

 validator.registerRules({
     strong_password: function() {
         return function(val) {
             // Skip empty values. The "required" rule is responsible for checking presence
             if (val === undefined || val === null || val === '') return;

             val = String(val);

             if (val.length < 8 || !/[0-9]/.test(val) || !/[a-z]/.test(val) || !/[A-Z]/.test(val)) {
                 return 'WEAK_PASSWORD';
             }

             return;
         };
     }
 });

Now we add the ability to set the minimum number of characters in the password and register this rule as global (available in all instances of the validator).

 var LIVR = require('livr');

 var rules = { password: ['required', { 'strong_password': 10 }] };
 var validator = new LIVR.Validator(rules);

 var strongPassword = function(minLength) {
     if (!minLength) throw new Error('[minLength] parameter required');

     return function(val) {
         // Skip empty values. The "required" rule is responsible for checking presence
         if (val === undefined || val === null || val === '') return;

         val = String(val);

         if (val.length < minLength || !/[0-9]/.test(val) || !/[a-z]/.test(val) || !/[A-Z]/.test(val)) {
             return 'WEAK_PASSWORD';
         }

         return;
     };
 };

 LIVR.Validator.registerDefaultRules({ strong_password: strongPassword });

That is how simple registering new rules is. If you need to describe more complex rules, the best option is to look at how the standard rules are implemented in the validator.

It is also possible to register rules that not only validate a value but also change it, for example convert it to upper case or remove extra spaces.
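A sketch of such a value-changing rule follows. I am assuming the convention of the JavaScript implementation, where the rule function receives (value, params, outputArr) and pushing into outputArr replaces the value; the rule name collapse_spaces is invented, and you should check the implementation's documentation for the exact signature.

```javascript
// Hypothetical modifier rule: trims the string and collapses runs of
// whitespace into a single space. It never reports an error.
const collapse_spaces = function () {
  return function (value, params, outputArr) {
    // Skip empty and non-string values; other rules decide whether they are errors.
    if (value === undefined || value === null || value === '') return;
    if (typeof value !== 'string') return;
    outputArr.push(value.trim().replace(/\s+/g, ' '));
  };
};

// Calling the rule by hand, the way a validator core would.
const collapsed = [];
collapse_spaces()('  John   Smith ', {}, collapsed);
console.log(collapsed[0]); // "John Smith"
```

Under that assumption, the rule would be registered exactly like “strong_password” above, for example through registerRules or registerDefaultRules.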

Your own implementation according to the specification

If you want to write your own validator implementation, a set of test cases has been created to make the task easier. If your implementation passes all the tests, it can be considered correct. The test suite consists of 4 groups:

"positive" - positive tests for the basic rules
"negative" - negative tests for the basic rules
"aliases_positive" - positive tests for rule aliases
"aliases_negative" - negative tests for rule aliases

In fact, each test contains several files:

rules.json - description of validation rules
input.json - the structure that is passed to the validator for verification
output.json - the cleaned structure that is obtained after validation

Each negative test instead of “output.json” contains “errors.json” with a description of the error that should result from validation. In alias tests, there is an aliases.json file with aliases that must be registered in advance.
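A test runner for such a suite can be very small. The sketch below works on already parsed JSON objects instead of reading files, and all names in it (passesTestCase, sameStructure, toyValidate) are hypothetical; the toy implementation under test understands only “required” and exists just to make the example runnable.

```javascript
// Structural comparison that ignores key order.
function sameStructure(a, b) {
  if (a === b) return true;
  if (a === null || b === null || typeof a !== 'object' || typeof b !== 'object') return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every(
      (key) => Object.prototype.hasOwnProperty.call(b, key) && sameStructure(a[key], b[key])
    )
  );
}

// testCase holds the parsed contents of rules.json, input.json and either
// output.json (positive test) or errors.json (negative test).
function passesTestCase(validate, testCase) {
  const result = validate(testCase.rules, testCase.input);
  return 'errors' in testCase
    ? sameStructure(result.errors, testCase.errors)
    : result.errors === undefined && sameStructure(result.output, testCase.output);
}

// Toy implementation under test: it only understands the "required" rule.
function toyValidate(rules, input) {
  const output = {};
  const errors = {};
  for (const field of Object.keys(rules)) {
    const value = input[field];
    if (value === undefined || value === null || value === '') errors[field] = 'REQUIRED';
    else output[field] = value;
  }
  return Object.keys(errors).length > 0 ? { errors } : { output };
}

const positiveCase = {
  rules: { name: 'required' },
  input: { name: 'Ann', extra: 1 },
  output: { name: 'Ann' },
};
const negativeCase = {
  rules: { name: 'required' },
  input: { extra: 1 },
  errors: { name: 'REQUIRED' },
};

console.log(passesTestCase(toyValidate, positiveCase), passesTestCase(toyValidate, negativeCase));
```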

Why not JSON Schema?

Frequently asked question. In short, there are several reasons:

Difficult format for rules. I would like the structure with rules to be as close as possible to the structure with data. Try to describe this example in JSON Schema.
The error format is not specified in any way and different implementations return errors in different formats.
There is no data conversion, for example "to_lc".

JSON Schema contains interesting things, such as the ability to specify the maximum number of elements in the list, but in LIVR this is implemented simply by adding another rule.


UPD:

LIVR 2.0 has been released ( http://livr-spec.org/ ). New features:

A consistent approach to working with types.
“Base rules” renamed to “Common rules”.
“Filter rules” renamed to “Modifiers”, because they do not validate, they only modify the data.
“Helper rules” renamed to “Metarules”.
Added the rule “any_object”, which checks that the value is an object.
Added the string rule “string”: simply a string of any kind.
Added the rule “eq”, which checks for string equality.
Added the meta-rule “variable_object”. It dynamically determines which validation to use depending on the fields in the object.
Added the meta-rule “or”. It applies the rules one after another until the first match.
Added the modifier “default” for setting a default value if the user has not sent anything.
Significantly expanded test suite.

The JavaScript implementation already supports all the new features; the other implementations are being updated.

Source: https://habr.com/ru/post/246521/
