
Badoo PHP Code Formatter. Now in open source!

A few years ago, Badoo began to grow significantly in headcount, from 20 employees to 100 and more. This required a major overhaul of many development processes. One of the problems we faced was how to make all developers follow a single coding standard, so that all our code looks uniform and is easy to maintain.

To solve this problem, we decided to implement a code formatting tool that could do the following:

  1. display messages about non-compliance with the formatting standard as a list, without touching the file itself;
  2. automatically fix all formatting problems found;
  3. format only part of a file (we did not want to reformat the whole repository at once, so as not to lose history).

We considered two projects as a possible basis for such a tool: PHP Beautifier and PHP Code Sniffer. The first could format code but could not print diagnostics; the second, on the contrary, could print diagnostics but could not format files. Unfortunately, by our estimate neither project was well suited to having the missing functionality added, so we wrote a new utility, phpcf (PHP Code Formatter). For two years now it has been running as a git pre-receive hook configured to reject (!) changes that do not follow our coding standard.
Finally it is time to open the source code of our utility to the general public: github.com/badoo/phpcf

Functionality


The formatter was created primarily to change whitespace: line breaks, indentation, spaces around operators, and so on. Thus, phpcf does not replace similar utilities such as the aforementioned PHP Code Sniffer or PHP Coding Standards Fixer by Fabien Potencier. It complements them by doing the “dirty work” of arranging spaces and line breaks in the file properly. It is important to note that our utility takes the original formatting of the file into account and changes only the whitespace that does not meet the selected standard (unlike some other solutions, which first remove all whitespace tokens and only then begin formatting).

The utility is extensible and supports arbitrary sets of styles. You can quite easily define your own formatting style that implements a standard different from ours (the coding standard in our company is very close to PSR).

Example of use (the command “phpcf apply <filename>” formats the specified file, and “phpcf check <filename>” checks the formatting and returns a non-zero exit code if there are unformatted fragments):

    $ cat minifier.php
    <?php
    $tokens=token_get_all(file_get_contents($argv[1]));$contents='';foreach($tokens as $tok){if($tok[0]===T_WHITESPACE||$tok[0]===T_COMMENT)continue;if($tok[0]===T_AS||$tok[0]===T_ELSE)$contents.=' '.$tok[1].' '; else $contents.=is_array($tok)?$tok[1]:$tok;}echo$contents."\n";

    $ phpcf apply minifier.php
    minifier.php formatted successfully

    $ cat minifier.php
    <?php
    $tokens = token_get_all(file_get_contents($argv[1]));
    $contents = '';
    foreach ($tokens as $tok) {
        if ($tok[0] === T_WHITESPACE || $tok[0] === T_COMMENT) continue;
        if ($tok[0] === T_AS || $tok[0] === T_ELSE) $contents .= ' ' . $tok[1] . ' ';
        else $contents .= is_array($tok) ? $tok[1] : $tok;
    }
    echo $contents . "\n";

    $ phpcf check minifier.php; echo $?
    minifier.php does not need formatting
    0


In addition to formatting an entire file, our utility can format just part of it. To do this, append the line ranges to the file name, separated from it by a colon:

    $ cat zebra.php
    <?php
    echo "White "."strip".PHP_EOL;
    echo "Black "."strip".PHP_EOL; // not formatted
    echo "Arse".PHP_EOL;

    $ phpcf apply zebra.php:1-2,4
    zebra.php formatted successfully

    $ cat zebra.php
    <?php
    echo "White " . "strip" . PHP_EOL;
    echo "Black "."strip".PHP_EOL; // not formatted
    echo "Arse" . PHP_EOL;

    $ phpcf check zebra.php
    zebra.php issues:
        Expected one space before binary operators (= < > * . etc) on line 3 column 14
        Expected one space after binary operators (= < > * . etc) on line 3 column 15
        ...

    $ echo $?
    1
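
To make the “file:ranges” notation concrete, here is a minimal sketch of how such an argument could be split into a file name and a set of line numbers. The function parseFileSpec and the shape of its result are our own illustration and are not taken from the phpcf sources; the sketch also assumes that the file name itself contains no colon.

```php
<?php
/**
 * Hypothetical helper (not part of phpcf): splits "file.php:1-2,4" into
 * a file name and the set of selected line numbers.
 */
function parseFileSpec(string $spec): array
{
    $pos = strrpos($spec, ':');
    if ($pos === false) {
        return ['file' => $spec, 'lines' => null]; // null means "the whole file"
    }

    $lines = [];
    foreach (explode(',', substr($spec, $pos + 1)) as $range) {
        $parts = explode('-', $range, 2);
        $from = (int)$parts[0];
        $to = isset($parts[1]) ? (int)$parts[1] : $from; // "4" is the range 4-4
        for ($line = $from; $line <= $to; $line++) {
            $lines[$line] = true; // keys are line numbers, so lookups are a cheap isset()
        }
    }

    return ['file' => substr($spec, 0, $pos), 'lines' => $lines];
}

$spec = parseFileSpec('zebra.php:1-2,4');
echo $spec['file'], ': ', implode(',', array_keys($spec['lines'])), "\n"; // zebra.php: 1,2,4
```

Keeping the line numbers as array keys means that a check such as isset($lines[$current_line]) is all that is needed to decide whether a given line was selected.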


Even though the utility is written in PHP, most files are formatted in a fraction of a second. But we have a large repository and a lot of code, so we wrote an extension that, when enabled, speeds things up about a hundredfold: our entire repository of 2 million lines is formatted in 8 seconds on a Core i7 laptop. To use the extension, build it from the “ext/” directory and install it, then either set “enable_dl = On” in php.ini or register it there as a regular extension.

I would like to emphasize once again that phpcf primarily changes whitespace and can perform only simple transformations of the code itself: for example, replace a short opening tag with the long one or remove the final closing tag from a file. In addition, phpcf can automatically replace Cyrillic characters in function names with Latin ones. It also leaves expressions that were aligned manually with spaces untouched. These limits follow from the architecture: the formatter works as a state machine driven by user-defined rules rather than as a set of hard-coded replacements (it ships with a default config that matches our formatting rules). Therefore, if you want to automatically replace “var” with “public” or do similar things, we recommend taking a look at PHP-CS-Fixer: it pays little attention to whitespace (unlike phpcf), but it can rewrite tokens.

PHP version support


Initially, our formatter worked on PHP 5.3 and only supported it. At the moment, we fully support the PHP 5.4 and 5.5 syntax, and the formatter requires PHP version 5.4 or higher. If you want to format the code intended for earlier versions of PHP, then you can do it, but phpcf itself should be run using PHP 5.4+.

I would like to note separately that phpcf cannot format “unfinished” files that contain, for example, unbalanced brackets: in this case an error message is displayed and the file is not formatted. At the same time, in some cases it can format code that is invalid from the point of view of the PHP interpreter, because the formatter itself does not check the syntax of the file.

As for support of future PHP versions, the phpcf architecture is such that “unfamiliar” keywords and tokens encountered in a file are simply ignored and left as is. Thus, phpcf already supports future versions of PHP, with the proviso that no formatting rules are applied to unknown tokens.

Git integration support


When you download our utility, you will most likely notice that besides the “check”, “preview” and “apply” actions there are the same actions with the “-git” suffix. At Badoo we use Git as our version control system, and by default only modified lines are checked and formatted. So that developers do not have to remember the numbers of the lines they changed, we made “*-git” commands that work as follows:

  1. Look at uncommitted changes and at changes added to the index.
  2. Look at changes that were made in the current branch but are not yet in origin/master or origin/<current-branch> (the corresponding branches are updated by git push / git pull), in other words, changes that have not yet been sent to the repository.
  3. Apply formatting only to the lines found in (1) and (2).
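
One possible way to obtain the modified line numbers for steps (1) and (2) is to parse the hunk headers of a unified diff. The sketch below is only an illustration under our own assumptions: it expects the text produced by “git diff -U0” (zero context lines), and the function name changedLinesFromDiff is hypothetical. It is not a description of how the “-git” commands are actually implemented.

```php
<?php
/**
 * Returns [file => [line number => true]] for the lines added or changed
 * according to a unified diff made with zero context lines ("git diff -U0").
 */
function changedLinesFromDiff(string $diff): array
{
    $result = [];
    $file = null;
    $body = 0; // number of hunk body lines that still have to be skipped

    foreach (explode("\n", $diff) as $line) {
        if ($body > 0) {
            // "\ No newline at end of file" markers are not counted in the hunk header
            if ($line === '' || $line[0] !== '\\') {
                $body--;
            }
            continue;
        }
        if (strncmp($line, '+++ ', 4) === 0) {
            // "+++ b/path/to/file.php"; "/dev/null" means that the file was deleted
            $path = substr($line, 4);
            $file = ($path === '/dev/null') ? null : preg_replace('~^b/~', '', $path);
        } elseif (preg_match('~^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@~', $line, $m)) {
            // "@@ -oldStart,oldCount +newStart,newCount @@"; an omitted count means 1
            $old = ($m[1] === '') ? 1 : (int)$m[1];
            $start = (int)$m[2];
            $new = (isset($m[3]) && $m[3] !== '') ? (int)$m[3] : 1;
            $body = $old + $new; // without context lines the body is removed + added lines
            for ($i = 0; $file !== null && $i < $new; $i++) {
                $result[$file][$start + $i] = true;
            }
        }
    }

    return $result;
}

$diff = implode("\n", [
    'diff --git a/test.php b/test.php',
    '--- a/test.php',
    '+++ b/test.php',
    '@@ -3 +3,2 @@ function f()',
    '-old line',
    '+new line 1',
    '+new line 2',
    '@@ -10,0 +12 @@',
    '+added line',
]);
print_r(changedLinesFromDiff($diff));
```

For the sample diff the result contains lines 3, 4 and 12 of test.php. Skipping the hunk bodies keeps an added line whose text happens to start with “++ ” from being mistaken for a file header.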

We develop in feature branches, while the master branch holds our production code, so the “-git” commands are tailored to this workflow and determine the modified lines using the algorithm above.

Usage example:
    (master) $ git checkout -b some_feature
    (some_feature) $ vim test.php    # edit test.php
    (some_feature) $ phpcf apply-git
    test.php formatted successfully



Using the phpcf classes directly


In addition to using phpcf as a utility, you can use the phpcf classes directly, including together with the extension. This can be useful for various tasks, for example to build a web service that formats PHP files. We use it in our code review process: when viewing the changes made in a branch, we can skip changes that are purely formatting-related (to do this, both versions of the file, the old and the new, are fully formatted, and then the diff between them is computed).
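
The code review idea can be sketched as follows. This is a simplified illustration, not our review tool: the formatter is passed in as an arbitrary callable (with phpcf it could presumably wrap a call to the formatter class, which is an assumption on our part), the function name isFormattingOnlyChange is hypothetical, and instead of computing a diff the sketch only checks whether the two formatted versions are identical.

```php
<?php
/**
 * Returns true when two versions of a file differ only in formatting,
 * i.e. they become identical once both are run through the same formatter.
 * $format is any callable of the form string -> string.
 */
function isFormattingOnlyChange(string $old, string $new, callable $format): bool
{
    return $old !== $new && $format($old) === $format($new);
}

// Toy stand-in for a real formatter: collapses runs of spaces and tabs
// and strips trailing spaces at the end of each line.
$toyFormat = function (string $code): string {
    $code = preg_replace('~[ \t]+~', ' ', $code);
    return preg_replace('~ +$~m', '', $code);
};

var_dump(isFormattingOnlyChange("<?php\n\$a  =  1;\n", "<?php\n\$a = 1;\n", $toyFormat)); // bool(true)
var_dump(isFormattingOnlyChange("<?php\n\$a = 1;\n", "<?php\n\$a = 2;\n", $toyFormat)); // bool(false)
```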

An example of using phpcf classes directly:
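    <?php
    // load phpcf
    require_once __PHPCF_SRC . '/src/init.php';

    // formatter options
    $Options = new \Phpcf\Options();
    $Options->setTabSequence(' ');        // sequence used for one level of indentation
    $Options->setMaxLineLength(130);      // maximum line length (120 by default)
    $Options->setCustomStyle('style');    // name of a custom formatting style
    $Options->toggleCyrillicFilter(true); // true or false: turn the Cyrillic filter on or off
    $Options->usePure(true);              // use the pure PHP implementation instead of the extension

    $Formatter = new \Phpcf\Formatter($Options);

    // format a file, entirely or only the given lines
    $Result = $Formatter->formatFile('file.php');
    $Result = $Formatter->formatFile('file.php:1-40,65');

    // format a string, entirely or only the given lines
    $Result = $Formatter->format('<?php phpinfo()');
    $Result = $Formatter->format($code, [1, 2, 10]);

    // each of these calls returns a \Phpcf\FormattingResult
    $Result->getContent();   // the formatted code
    $Result->wasFormatted(); // bool, whether the code was formatted
    $Result->getIssues();    // array of the formatting issues found
    $Result->getError();     // \Exception|null, the error if there was one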



IDE integration using PHPStorm as an example


If you want to format PHP code with our formatter from within PHPStorm, follow these steps:

1. git clone https://github.com/badoo/phpcf.git
2. In PHPStorm go to the settings and find the section "External Tools".
3. Click "Add ..." and fill in the fields:

Name: format whole file
Group: phpcf
Untick "Open console" (so that it does not get in the way)
Program: php
Parameters: path_to_phpcf.git/phpcf apply $FilePath$
Working directory: any

Name: format selection
Group: phpcf
Untick "Open console" (so that it does not get in the way)
Program: php
Parameters: path_to_phpcf.git/phpcf apply $FilePath$:$SelectionStartLine$-$SelectionEndLine$
Working directory: any


After that, you can bind the hotkeys to the corresponding actions and format either the entire file or just the selected fragments.

In addition, you can configure phpcf check-git --emacs in the same way: in this mode phpcf prints file names with line numbers in emacs style, so you can jump to the line mentioned in the output by clicking on the link.

Implementation


In the course of its work, the formatter goes through the following stages:

  1. Preparing the list of file names and line numbers to be formatted.
  2. Getting the list of tokens for a file by calling token_get_all (prepareTokens).
  3. Converting the tokens into a single format, with the ability to run hooks that can replace some tokens with others.
  4. Calling the process() method, which runs through all the tokens using the Phpcf\Impl\Fsm state machine and compiles an array of formatting actions (exec).
  5. Calling the exec() method, which processes the generated array of actions and turns it into the final string.
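
Stages 2 and 3 can be illustrated with a small sketch. token_get_all() returns arrays for named tokens and plain strings for single characters such as ";" or "(", so the first thing a formatter has to do is bring them to one shape. The function normalizeTokens and the record keys 'code', 'text' and 'line' below are our own simplification for this article; phpcf itself uses its own constants for the keys and runs hooks at this point.

```php
<?php
/**
 * Converts the mixed output of token_get_all() into uniform records
 * with a token code, the token text and a line number.
 */
function normalizeTokens(string $code): array
{
    $tokens = [];
    $line = 1;
    foreach (token_get_all($code) as $tok) {
        if (is_array($tok)) {
            // named token: array(id, text, line number)
            [$id, $text, $line] = $tok;
            $tokens[] = ['code' => token_name($id), 'text' => $text, 'line' => $line];
        } else {
            // single-character token: carries no line number, so the tracked one is used
            $text = $tok;
            $tokens[] = ['code' => $tok, 'text' => $tok, 'line' => $line];
        }
        // keep the line counter correct for a following token without a line number
        $line += substr_count($text, "\n");
    }
    return $tokens;
}

foreach (normalizeTokens("<?php\necho 1;\n") as $token) {
    printf("%-14s line %d  %s\n", $token['code'], $token['line'], json_encode($token['text']));
}
```

With every token in the same shape, the later stages no longer have to care where a token came from.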


More details

Description of the Phpcf\Impl\Fsm class, a finite state machine for parsing tokens


The Phpcf\Impl\Fsm class is a finite state machine whose state is represented as a stack (array). The top element of the stack is used to select the state transition rules:

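    <?php
    $fsm_context_rules = array(
        'CTX_SOMETHING' => array( // rules used when the top of the stack is CTX_SOMETHING
            'T_1' => 'CTX_OTHER_THING',        // on T_1, switch to the CTX_OTHER_THING context
            'T_2' => array('CTX_OTHER_THING'), // on T_2, push(CTX_OTHER_THING)
            'T_3' => -2,                       // on T_3, pop() 2 elements off the stack
            // on T_4, rule N is applied right away and rule M is delayed
            'T_4' => array(PHPCF_CTX_NOW => N, PHPCF_CTX_NEXT => M),
            // on T_5, a sequence of rules: pop() 2 elements, then push(CTX_OTHER_THING)
            'T_5' => array('REPLACE' => array(-2, array('CTX_OTHER_THING'))),
        ),
    );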



In general, the rules for the state machine are as follows:

    <?php
    $fsm_context_rules = array(
        '<context_name>[ ... <context_name>]' => array(
            '<token_code>[ ... <token_code>]' => <context_rule>,
        ),
    );

    $context_rule = '<context_name>';        // switch the current context to <context_name>
    $context_rule = array('<context_name>'); // push <context_name> onto the stack; it becomes the current context
    $context_rule = -N;                      // pop N elements off the stack, N is an integer
    // composite rule: the rule under PHPCF_CTX_NOW is applied right away,
    // the one under PHPCF_CTX_NEXT is delayed (the debug output calls it a "delayed rule")
    $context_rule = array(PHPCF_CTX_NOW => <context_rule>, PHPCF_CTX_NEXT => <context_rule>);
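
To show how a stack-based state machine of this kind behaves, here is a minimal sketch that supports only the three basic rule forms: a string switches the current context (we read this as replacing the top of the stack, which is our interpretation), an array pushes a context, and a negative integer pops. The function names, the context names and the sample rules are hypothetical; composite rules with PHPCF_CTX_NOW / PHPCF_CTX_NEXT and space-separated lists of contexts or tokens are deliberately left out.

```php
<?php
/**
 * Applies one context rule to a context stack and returns the new stack:
 *   'CTX_NAME'         replaces the top of the stack
 *   array('CTX_NAME')  pushes a new context
 *   -N                 pops N contexts
 */
function applyContextRule(array $stack, $rule): array
{
    if (is_string($rule)) {
        array_pop($stack);
        $stack[] = $rule;
    } elseif (is_array($rule)) {
        $stack[] = $rule[0];
    } elseif (is_int($rule) && $rule < 0) {
        $stack = array_slice($stack, 0, $rule); // a negative length drops elements from the end
    }
    return $stack;
}

/**
 * Feeds token codes through the rules; the rule set is chosen by the top of the stack.
 */
function runContexts(array $rules, array $tokenCodes, string $initial): array
{
    $stack = [$initial];
    foreach ($tokenCodes as $code) {
        $top = end($stack);
        if ($top !== false && isset($rules[$top][$code])) {
            $stack = applyContextRule($stack, $rules[$top][$code]);
        }
        // tokens without a matching rule leave the stack unchanged
    }
    return $stack;
}

$rules = [
    'CTX_DEFAULT' => ['(' => ['CTX_PARENS'], 'T_CLASS' => 'CTX_CLASS'],
    'CTX_PARENS'  => ['(' => ['CTX_PARENS'], ')' => -1],
];
print_r(runContexts($rules, ['(', '(', ')'], 'CTX_DEFAULT'));
```

After the three tokens the stack holds CTX_DEFAULT and one CTX_PARENS: two pushes followed by one pop. Keeping the state as a stack is what lets the same rules handle arbitrarily nested constructs.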



Array of formatting rules: $controls


The formatter is designed so that, in essence, it only regulates the contents of whitespace tokens (with the exception of hooks). All formatting rules are defined in the $controls array, which looks as follows:

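    <?php
    $controls = array(
        '<token_code>[ ... <token_code>]' => array( // one or more token codes the rules apply to
            ['<context>' => <formatting_rule>,] // <formatting_rule> for a particular context
            PHPCF_KEY_ALL => <formatting_rule>, // <formatting_rule> used when no context-specific rule matches
        ),
    );

    $formatting_rule = array(
        PHPCF_KEY_DESCR_LEFT => '<description>',  // message shown when the whitespace to the left of the token is wrong
        PHPCF_KEY_LEFT => PHPCF_EX_<action>,      // action applied to the whitespace to the left of the token
        PHPCF_KEY_DESCR_RIGHT => '<description>', // the same message for the right side
        PHPCF_KEY_RIGHT => PHPCF_EX_<action>,     // action applied to the whitespace to the right
    );

    // Example:
    'T_AS T_ELSEIF T_ELSE T_CATCH' => array( // rules for "as", "elseif", "else" and "catch"
        PHPCF_KEY_ALL => array( // in any context
            PHPCF_KEY_DESCR_LEFT => 'One space before as, elseif, else, catch',
            PHPCF_KEY_LEFT => PHPCF_EX_SHRINK_SPACES_STRONG,  // shrink the whitespace on the left to one space
            PHPCF_KEY_DESCR_RIGHT => 'One space after as, elseif, else, catch',
            PHPCF_KEY_RIGHT => PHPCF_EX_SHRINK_SPACES_STRONG, // the same on the right
        )
    ),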



When different rules are defined for the same whitespace, for example in "$a = $b;" one space is required after "=" while all spaces must be removed before $b, the order in which the rules are applied depends on their priority. The priority of operations is described in the “PHPCF_EX constants” section: the higher a rule is listed, the higher its priority.

Hooks on tokens


The $token_hook_names property defines the names of the methods that are called when the prepareTokens method comes across the corresponding token. Hooks are defined as follows:

    <?php
    namespace Phpcf\Impl;

    class Pure implements \Phpcf\IFormatter
    {
        /*
         * $idx_tokens is an array of the form array(T_SOMETHING => 'T_SOMETHING')
         * that maps token codes to their names
         *
         * $i_value is the current token in the form returned by token_get_all
         *
         * The tokens that follow can be read from $this->tokens using each(),
         * for an example see tokenHookStr
         *
         * A hook returns an array of tokens in the formatter's own format
         *
         * A hook that does nothing looks like this:
         */
        private function tokenHookDoNothing($idx_tokens, $i_value)
        {
            if (is_array($i_value)) {
                $this->current_line = $i_value[2];
                $result = array(
                    array(
                        PHPCF_KEY_CODE => $idx_tokens[$i_value[0]],
                        PHPCF_KEY_TEXT => $i_value[1],
                        PHPCF_KEY_LINE => $this->current_line,
                    )
                );
                // set correct current line for next token if it does not have line number
                // (this has to happen before returning, otherwise it is never executed)
                $this->current_line += substr_count($i_value[1], "\n");
                return $result;
            }

            return array(
                array(
                    PHPCF_KEY_CODE => $i_value,
                    PHPCF_KEY_TEXT => $i_value,
                    PHPCF_KEY_LINE => $this->current_line,
                )
            );
        }
    }



Description of hooks for tokens


A brief description of the token hooks, with the reason for each hook and what it does:

  • tokenHookHeredoc, tokenHookStr: by default, PHP tokenizes the text inside heredocs, double quotes and backticks and singles out the variables in it. Since the formatter must not touch strings, the contents are merged into a single token;
  • tokenHookOpenBrace turns the "(" token into "(_LONG" if the expression in parentheses is long (120 characters by default) or contains a line break. It is used to distinguish between "long" and "short" arrays, as well as function calls and definitions;
  • tokenHookCheckUnary determines whether an operator is unary (for example "+", "-" and "&"). It is used to reduce the number of transitions between contexts in the rules;
  • tokenHookStatic separates uses like "static::HELLO" from uses of the form "public static function". It also serves to simplify the state transition logic;
  • tokenHookClassdef detects a line break after the keyword (for example "const\n"); it is used to format constructs of the form "const\n var1 = 1,\n var2 = 2;" correctly;
  • tokenHookOpenTag checks that the opening tag is the long one and also separates the whitespace from the opening tag (turning "<?php\n" into two tokens: "<?php" and "\n");
  • tokenHookCloseTag checks that there is no closing tag at the end of the file;
  • tokenHookIncrement determines on which side of the variable the "++" or "--" operator is placed. It exists to simplify the logic of transitions between contexts;
  • tokenHookWhiteSpace detects expressions that are aligned with spaces and replaces T_WHITESPACE with T_WHITESPACE_ALIGNED, which is left in place during formatting;
  • tokenHookElse determines whether an else is single-line or contains a block. It exists to simplify the logic;
  • tokenHookComment checks that single-line comments begin with "//" and also separates the line break from the token ("// something\n" becomes "// something" and "\n");
  • tokenHookTString renames T_STRING to T_FUNCTION_NAME when this T_STRING is the name of a method or function. It exists to simplify the logic of transitions between contexts;
  • tokenHookBinary handles the case where a binary operator is moved to the next line;
  • tokenHookComma turns "," into ",_LONG" for commas after which a long array can be broken onto a new line;
  • tokenHookFunction distinguishes anonymous functions from named ones.
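
As an illustration of what a hook such as tokenHookOpenBrace has to decide, the sketch below classifies an opening parenthesis as "(" or "(_LONG". It is only our guess at the general idea: the function classifyOpenBrace is hypothetical, it works on a plain list of token texts, and the way the length is measured (the sum of the token texts from "(" to the matching ")") is an assumption rather than the rule phpcf actually uses.

```php
<?php
/**
 * Hypothetical classifier: returns "(_LONG" when the parenthesised expression
 * that starts at $openIdx contains a line break or is longer than $maxLength
 * characters, and "(" otherwise.
 */
function classifyOpenBrace(array $texts, int $openIdx, int $maxLength = 120): string
{
    $depth = 0;
    $length = 0;
    for ($i = $openIdx, $n = count($texts); $i < $n; $i++) {
        $text = $texts[$i];
        if ($text === '(') {
            $depth++;
        } elseif ($text === ')') {
            $depth--;
        }
        if (strpos($text, "\n") !== false) {
            return '(_LONG'; // a line break inside the parentheses
        }
        $length += strlen($text);
        if ($depth === 0) {
            break; // reached the parenthesis that closes the one at $openIdx
        }
    }
    return $length > $maxLength ? '(_LONG' : '(';
}

var_dump(classifyOpenBrace(['foo', '(', '$a', ',', ' ', '$b', ')'], 1));             // string(1) "("
var_dump(classifyOpenBrace(['array', '(', "\n    ", '1', ',', "\n", ')', ';'], 1)); // string(6) "(_LONG"
```

Encoding this decision in the token code itself means that the transition and formatting rules can simply list "(" and "(_LONG" as different tokens instead of measuring expressions themselves.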


If a hook changes the contents of a token, it must first check whether it is allowed to do so ($can_change_tokens = !isset($this->lines) || isset($this->lines[$this->current_line])). If it is not, the hook must not change the contents of tokens, but it may change their number and split tokens into their constituent parts. Take tokenHookOpenTag as an example: it does not check the contents of the opening tag if the user asked to format only part of the file, but it still separates the whitespace from the opening tag. Separating the whitespace is required in order to correctly account for the number of line breaks (and the indentation) after the opening tag and after single-line comments.
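
The rule can be sketched like this. The check in canChangeTokens mirrors the expression quoted above, with null standing for “no line restriction”; the splitOpenTag function is a hypothetical, simplified stand-in for a hook and not the actual tokenHookOpenTag code.

```php
<?php
/**
 * $lines is null when the whole file is being formatted, otherwise an array
 * whose keys are the line numbers selected by the user.
 */
function canChangeTokens(?array $lines, int $currentLine): bool
{
    return $lines === null || isset($lines[$currentLine]);
}

/**
 * Simplified hook: always separates the trailing whitespace from the opening tag,
 * but replaces the short tag with the long one only when it is allowed to.
 */
function splitOpenTag(string $text, ?array $lines, int $currentLine): array
{
    $tag = rtrim($text);                       // the tag without trailing whitespace
    $whitespace = substr($text, strlen($tag)); // trailing space or line break, may be empty
    if ($tag === '<?' && canChangeTokens($lines, $currentLine)) {
        $tag = '<?php'; // content change: only on lines that may be formatted
    }
    return $whitespace === '' ? [$tag] : [$tag, $whitespace];
}

var_dump(splitOpenTag("<?\n", null, 1));        // whole file: the tag is rewritten and split
var_dump(splitOpenTag("<?\n", [5 => true], 1)); // line 1 is not selected: split only
```

In both calls the token is split in two, so the line accounting stays correct; only the first call is allowed to touch the tag text.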


Links


PHP Beautifier - pear.php.net/package/PHP_Beautifier
PHP Code Sniffer - pear.php.net/package/PHP_CodeSniffer
PHP CS Fixer - github.com/fabpot/PHP-CS-Fixer

Our phpcf utility - github.com/badoo/phpcf

Thanks for reading our article, we are ready to listen to your suggestions and comments. We hope you enjoy using our utility.

Yuri youROCK Nasretdinov, PHP developer at Badoo
Alexander alexkrash Krasheninnikov, PHP developer at Badoo

Source: https://habr.com/ru/post/232133/

