Among PHP programs, a procedural or, in recent versions, a partially object-oriented programming style prevails. But you can write differently, in connection with which you want to talk about the functional style, the benefit of some tools for this are in PHP.
Therefore, we will consider the implementation of the JSON parser in the form of the simplest functions and their combining functions into more complex ones, gradually reaching the full-fledged JSON parser format. Here is an example of the code that we get:
$jNumber = _do(function() { $number = yield literal('-')->orElse( literal('+') )->orElse( just('') ); $number .= yield takeOf('[0-9]')->onlyIf( notEmpty() ); if ( yield literal('.')->orElse( just(false) ) ) { $number .= '.'. yield takeOf('[0-9]'); } return +$number; });
In addition to the functional approach itself, attention can be paid to the use of classes for creating a DSL-like syntax and to the use of generators to simplify the syntax of combinators.
UPDATE by itself JSON parsing has long been a problem solved and, of course, a ready and tested C function will work better. The article uses this task as an example to explain the functional approach. It is also not advocated the use of just such a code in production, everyone can learn some ideas that can simplify the code and life.
The full code is on github .
How does a programmer cope with the enormous complexity of programs? He takes simple blocks and builds more complex ones out of which even more complex blocks are built and, finally, a program. At least it was after the appearance of the first languages with subroutines.
The procedural style is based on the description of the procedures that cause other procedures together to change some common data. The object-oriented style adds the ability to describe data structures composed of other data structures. The functional style uses the composition (connection) of functions.
What is the difference between the composition of functions and the composition of procedures and objects? The basis of the functional approach is the purity of the functions, which means that the result of the functions depends only on the input parameters. If the functions are pure, it is much easier to predict the result of their composition and even create ready-made functions for converting other functions.
The functions that receive and / or return other functions as a result are called higher order functions and represent the topic of this article.
For example, let's take a task that is not very easy to solve entirely and see how the functional approach will help us simplify this task.
For example, let's try to create a JSON parser, which from the JSON string will receive the corresponding PHP object: a number, string, list, or associated list (with all possible nested values, of course).
Let's start with the simplest elements: we write a parser, what is it? A parser is a function that takes a string and, if successful, returns a pair of values: the result of parsing and the remainder of the string (if the parsed value did not occupy the entire line) or an empty set if the line could not be parsed:
Parser: string => [x,string] | []
For example, if we have a function-parser number
then we could write such tests:
assert(number('123;') === [123,';']); assert(number('none') === []);
DISCLAMER : PHP has a not very convenient syntax for working with functions, so for simpler and more understandable code we will use a class that is nothing but a wrapper around the parser function and is needed to specify types and use convenient chaining syntax, let's talk more about what next.
class Parser { const FAILED = []; private $parse; function __construct(callable $parse) { $this->parse = $parse; } function __invoke(string $s): array { return ($this->parse)($s); } }
But let's remember that Parser
is nothing more than the function string => array
.
For convenience, we also introduce the parser
function, which we will use instead of calling the new Parser
constructor for short:
function parser($f, $scope = null) { return new Parser($f->bindTo($scope)); }
So, we figured out what parsers are, but we never wrote one, let's fix it. Here is an example of a parser that always returns 1
, regardless of the source string:
$alwaysOne = parser(function($s) { return [1, $s]; }); assert($alwaysOne('123') === [1, '123']);
The usefulness of this function is not obvious, let's make it more general and declare a function that allows you to create a similar parser for any value:
function just($x): Parser { return parser(function($s) use ($x) { return [ $x, $s ]; }); }
So far, everything is simple, but still not very useful, because we want to parse a string, and not always return the same thing. Let's make a parser that returns the first few characters of the input string:
function take(int $n): Parser { return parser(function($s) use ($n) { return strlen($s) < $n ? Parser::FAILED : [ substr($s, 0, $n), substr($s, $n) ]; }); } test(take(2), 'abc', ['ab','c']); test(take(4), 'abc', Parser::FAILED);
Already better, our first parser, which really parses the string! Let me remind you: we describe the simplest bricks from which we can build a more complex parser. Therefore, to complete the picture, we lack only a parser, which does not parse anything at all.
function none(): Parser { return parser(function($s) { return Parser::FAILED; }); }
He is still useful to us.
That's all the parsers we need. This is enough to parse JSON. Do not believe? It remains to think of a way to assemble these bricks into more complex blocks.
Since we decided to do functional programming, it is logical to use functions to combine parser functions into more complex parsers!
For example, if we have first
and second
parsers and we want to apply any of them to the string, we can define a parser combinator — a function that creates a new parser based on the existing ones:
function oneOf(Parser $first, Parser $second): Parser { return parser(function($s) use ($first,$second) { $result = $first($s); if ($result === Parser::FAILED) { $result = $second($s); } return $result; }); } test(oneOf(none(),just(1)), '123', [1,'123']);
But, as mentioned above, such syntax can quickly become unreadable (for example, oneOf($a,oneOf($b,oneOf($c,$d)))
), so we will rewrite this (and all the following) functions as class methods Parser
:
function orElse(Parser $alternative): Parser { return parser(function($s) use ($alternative) { $result = $this($s); if ($result === Parser::FAILED) { $result = $alternative($s); } return $result; }, $this); // <- parser bindTo: $this } test(none()->orElse(just(1)), '123', [1,'123']);
This is better, you can write $a->orElse($b)->orElse($c)->orElse($d)
instead of what was higher.
And one more, not so simple, but much more powerful function:
function flatMap(callable $f): Parser { return parser(function($s) use ($f) { $result = $this($s); if ($result != Parser::FAILED) { list ($x, $rest) = $result; $next = $f($x); $result = $next($rest); } return $result; }, $this); }
Let's deal with it in more detail. It takes the function f: x => Parser
, which takes the result of parsing our existing parser and returns on its basis a new parser, which continues to parse the string from the place where our previous parser stopped.
For example:
test(take(1), '1234', ['1','234']); test(take(2), '234', ['23', '4']); test( take(1)->flatMap(function($x) { # x -- take(1) return take(2)->flatMap(function($y) use ($x) { # y -- take(2) return just("$x~$y"); # -- }); }), '1234', ['1~23','4'] );
So we combined take(1)
, take(2)
and just("$x~$y")
and got a rather complicated parser that first parses one character, followed by two more and returns them as $x~$y
.
The main feature of the work done is that we describe what to do with the results of parsing, but the string itself is not involved here, we do not have the opportunity to be mistaken about what part of the string to send. And then we will see how to make the syntax of such combinations more simple and readable.
This feature will allow us to describe several other useful combinators:
function onlyIf(callable $predicate): Parser { return $this->flatMap(function($x) use ($predicate) { return $predicate($x) ? just($x) : none(); }); }
This combinator allows you to specify the action of the parser and check its result for compliance with some criterion. For example, with its help we build a very useful parser:
function literal(string $value): Parser { return take(strlen($value))->onlyIf(function($actual) use ($value) { return $actual === $value; }); } test(literal('test'), 'test1', ['test','1']); test(literal('test'), 'some1', []);
We have already described the simplest parsers take
, just
and none
, methods of combining them ( orElse
, flatMap
, onlyIf
) and even described using their literal parser.
Now we will start building more complex parsers, but before that I would like to make the way to describe them more simple: the flatMap
combining function allows us a lot, but it doesn't look that good.
In this regard, we shall see how other languages solve this problem. So in the languages of Haskell and Scala there is a very convenient syntax for working with such things (they even have their own name - monads), it is called (in Haskell) DO-notation.
What does flatMap
essentially do? It allows you to describe what to do with the result of parsing without performing the actual parsing. Those. the procedure, as it were, is suspended until the intermediate result is obtained. To implement this effect, you can use the new PHP syntax - generators.
Let's make a small digression and consider what generators are. In PHP 5.5.0 and higher, it is possible to describe a function:
function generator() { yield 1; yield 2; yield 3; } foreach (generator() as $i) print $i; # -> 123
What is more interesting for us is that you can not only receive data from the generator, but also transfer it to it via yield
, and even from version 7 get the result of the generator via getReturn
:
function suspendable() { $first = yield "first"; $second = yield "second"; return $first.$second; } $gen = suspendable(); while ($gen->valid()) { $current = $gen->current(); print $current.','; $gen->send($current.'!'); } print $gen->getReturn(); # -> first,second,first!second!
This can be used to hide flatMap
calls from a programmer.
flatMap
using combinators function _do(Closure $gen, $scope = null) { $step = function ($body) use (&$step) { if (! $body->valid()) { $result = $body->getReturn(); return is_null($result) ? none() : just($result); } else { return $body->current()->flatMap( function($x) use (&$step, $body) { $body->send($x); return $step($body); }); } }; $gen = $gen->bindTo($scope); return parser(function($text) use ($step,$gen) { return $step($gen())($text); }); }
This function takes each yield
in the generator (which contains a parser, the result of which we want to get) and combines it with the remaining code snippet (in the form of the recursive step
function) via flatMap
.
The same thing without recursion and flatMap
functions could be written like this:
function _do(Closure $gen, $scope = null) { $gen = $gen->bindTo($scope); # $this return parser(function($text) use ($gen) { $body = gen(); while ($body->valid()) { $next = $body->current(); $result = $next($text); if ($result === Parser::FAILED) { return Parser::FAILED; } list($x,$text) = $result; $body->send($x); } return $body->getReturn(); }); }
But the first entry is more interesting because it is not specifically tied to parsers, they only have the functions flatMap
, just
and none
(and even that could be rewritten just
to handle the null in a special way and do without none
).
Objects that can be combined using the two methods flatMap
and just
call monads (this is a slightly simplified definition) and the same code can be used to write combinators for promises, optional values (Maybe, Option), and many others.
But for the sake of what did we write this not very simple function? To make further use of flatMap
much easier. Compare the same code with a clean flatMap
:
test( take(1)->flatMap(function($x) { return take(2)->flatMap(function($y) use ($x) { return just("$x~$y"); }); }), '1234', ['1~23','4'] );
and the same code, but written in _do
:
test( _do(function() { $x = yield take(1); $y = yield take(2); return "$x~$y"; }), '1234', ['1~23','4'] );
The resulting parser does the same thing in the same way, but reading and writing such code is much easier!
Now, using this notation, we can write some more useful parsers:
function takeWhile(callable $predicate): Parser { return _do(function() use ($predicate) { $c = yield take(1)->onlyIf($predicate)->orElse(just('')); if ($c !== '') { $rest = yield takeWhile($predicate); return $c.$rest; } else { return ''; } }); } function takeOf(string $pattern): Parser { return takeWhile(function($c) use ($pattern) { return preg_match("/^$pattern$/", $c); }); } test(takeOf('[0-9]'), '123abc', ['123','abc' ]); test(takeOf('[az]'), '123abc', [ '','123abc']);
And useful Parser
methods for repeating elements:
function repeated(): Parser { $atLeastOne = _do(function() { $first = yield $this; $rest = yield $this->repeated(); return array_merge([$first],$rest); },$this); return $atLeastOne->orElse(just([])); } function separatedBy(Parser $separator): Parser { $self = $this; $atLeastOne = _do(function() use ($separator) { $first = yield $this; $rest = yield $this->prefixedWith($separator)->repeated(); return array_merge([$first], $rest); },$this); return $atLeastOne->orElse(just([])); }
Each of the parsers and combinators we wrote separately is simple (well, maybe except flatMap
and _do
, but there are only two of them and they are very versatile), but using them now we can easily write a JSON parser.
jNumber = ('-'|'+'|'') [0-9]+ (.[0-9]+)?
$jNumber = _do(function() { $number = yield literal('-')->orElse(literal('+'))->orElse(just('')); $number .= yield takeOf('[0-9]'); if (yield literal('.')->orElse(just(false))) { $number .= '.'. yield takeOf('[0-9]'); } if ($number !== '') return +$number; });
The code is completely self-documenting, reading and looking for errors in it is quite simple.
jBool = true | false
$jBool = literal('true')->orElse(literal('false'))->flatMap(function($value) { return just($value === 'true'); });
jString = '"' [^"]* '"'
$jString = _do(function() { yield literal('"'); $value = yield takeOf('[^"]'); yield literal('"'); return $value; });
jList = '[' (jValue (, jValue)*)? ']'
$jList = _do(function() use (&$jValue) { yield literal('['); $items = yield $jValue->separatedBy(literal(',')); yield literal(']'); return $items; });
jObject = '{' (pair (, pair)*)? '}'
$jObject = _do(function() use (&$jValue) { yield literal('{'); $result = []; $pair = _do(function() use (&$jValue,&$result) { $key = yield takeOf('\\w'); yield literal(':'); $value = yield $jValue; $result[$key] = $value; return true; }); yield $pair->separatedBy(literal(',')); yield literal('}'); return $result; });
jValue = jNull | jBool | jNumber | jString | jList | jObject
$jValue = $jNull->orElse($jBool)->orElse($jNumber)->orElse($jString)->orElse($jList)->orElse($jObject);
Here is the JSON jValue
parser! And it looks not so incomprehensible, as it seemed at the beginning. There are some performance notes, but they are solved by replacing the way the string is divided (for example, instead of string => [x, string]
you can use [string,index] => [x,string,index]
and avoid multiple splitting of the string). And to change this kind, just rewrite just
, take
and flatMap
, the rest of the code built on their basis will remain unchanged!
My goal was, of course, not to write the next JSON parser, but to demonstrate how to write small simple (and functionally pure) functions, as well as simple ways to combine them, allows you to build complex functions in a simple way.
And in a simple and clear code and errors are less. Do not be afraid of the functional approach.
Source: https://habr.com/ru/post/309962/
All Articles