
Working with the Tokenizer in PHP

For reference:
The tokenizer (lexer) provides an interface to PHP's own lexical analysis of code. With it you can write code analysis utilities without having to deal with the language specification at the lexical level.
The tokenizer extension has been enabled in PHP by default since version 4.3.
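To make the token format concrete, here is a minimal sketch that tokenizes a one-line snippet and prints each token. The sample snippet and the $names variable are illustrative assumptions and do not come from the original article.

```php
<?php
// Tokenize a tiny snippet and print each token's name and text.
$tokens = token_get_all('<?php echo 1 + 2; // sum');

$names = array();
foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array tokens have the form [id, text, line number].
        $names[] = token_name($token[0]);
        printf("%-16s %s\n", token_name($token[0]), var_export($token[1], true));
    } else {
        // Single-character tokens such as ";" or "+" are plain strings.
        $names[] = $token;
        printf("%-16s %s\n", 'char', var_export($token, true));
    }
}
```

Each element is either a plain string (a one-character token) or an array with the token id, its text, and its line number, which is why the examples in this article check is_array() first.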

What tasks can be solved with the tokenizer?
A wide variety, all of them related to analyzing and modifying code.

Remove comments from code


The simplest example, removing comments from code, is given on php.net:

<?php
function strip_comments($fileName)
{
    // Read the source code of the file
    $source = file_get_contents($fileName);
    $tokens = token_get_all($source);
    $result = '';

    foreach ($tokens as $token) {
        if (!is_array($token)) {
            // Simple 1-character token
            $result .= $token;
        } else {
            // Token array
            list($id, $value) = $token;
            switch ($id) {
                case T_COMMENT:
                case T_DOC_COMMENT:
                    // Skip comments
                    break;
                default:
                    // Anything else -> output "as is"
                    $result .= $value;
                    break;
            }
        }
    }

    return $result;
}
?>

As the code shows, we get an array of tokens and, depending on the type of each token, either keep it or skip it.
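The same keep-or-skip idea can be generalized to any set of token types. The sketch below is only an illustration: drop_tokens() is a hypothetical helper name, and it works on a source string rather than a file so that it can be run on its own.

```php
<?php
// Hypothetical helper: rebuild the source while dropping the given token types.
function drop_tokens($source, array $skipIds)
{
    $result = '';
    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {
            // One-character tokens are always kept.
            $result .= $token;
        } elseif (!in_array($token[0], $skipIds, true)) {
            // Keep the text of every token whose id is not in the skip list.
            $result .= $token[1];
        }
    }
    return $result;
}

$source = "<?php\n/** Doc block */\n\$a = 1; // note\necho \$a;\n";
$clean = drop_tokens($source, array(T_COMMENT, T_DOC_COMMENT));
echo $clean;
```

Passing a different list of token ids, for example array(T_INLINE_HTML), would filter other parts of the file with the same loop.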
More interesting problems can be solved the same way, for example generating a map of a project's classes from its PHP files for use by an autoloader.

Getting a list of classes from a file


To get the list of classes declared in a file, I wrote this function:

<?php
// Note: written against the PHP 5.3-7.x token stream. Since PHP 8.0 a qualified
// namespace name such as Foo\Bar arrives as a single T_NAME_QUALIFIED token,
// so multi-part namespaces need an extra case there.
function getClasses($fileName)
{
    $result = array();
    $content = file_get_contents($fileName);
    $tokens = token_get_all($content);

    $waitingClassName = false;
    $waitingNamespace = false;
    $waitingNamespaceSeparator = false;
    $namespace = array();

    for ($i = 0, $c = count($tokens); $i < $c; $i++) {
        if (is_array($tokens[$i])) {
            list($id, $value) = $tokens[$i];
            switch ($id) {
                case T_NAMESPACE:
                    // A new namespace declaration starts: reset the collected parts
                    $waitingNamespace = true;
                    $waitingNamespaceSeparator = false;
                    $namespace = array();
                    break;
                case T_CLASS:
                case T_INTERFACE:
                    // The next T_STRING is the class or interface name
                    $waitingClassName = true;
                    break;
                case T_STRING:
                    if ($waitingNamespace) {
                        // Next part of the namespace name
                        $namespace[] = $value;
                        $waitingNamespace = false;
                        $waitingNamespaceSeparator = true;
                    } elseif ($waitingClassName) {
                        // Prefix the class name with the current namespace
                        if (!empty($namespace)) {
                            $value = sprintf('%s\\%s', implode('\\', $namespace), $value);
                        }
                        $result[] = $value;
                        $waitingClassName = false;
                    }
                    break;
                case T_NS_SEPARATOR:
                    if ($waitingNamespaceSeparator && !$waitingNamespace && !empty($namespace)) {
                        $waitingNamespace = true;
                        $waitingNamespaceSeparator = false;
                    }
                    break;
            }
        } else {
            // "{" or ";" ends the namespace declaration
            if (($waitingNamespace || $waitingNamespaceSeparator)
                && ($tokens[$i] == '{' || $tokens[$i] == ';')) {
                $waitingNamespace = false;
                $waitingNamespaceSeparator = false;
            }
        }
    }

    return $result;
}
?>


Then I went a step further and wrote a small utility that generates an autoloader from the project's files.
It analyzes all "*.php" files in the specified folder and builds a class map (including namespaces, of course), from which the autoloader is then generated.
You can find it on github.com.
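The class-map idea can be sketched roughly as follows. This is not the code of the utility mentioned above: find_classes() and build_class_map() are hypothetical names, the demo directory and the Demo\Util\Greeter class are made up for the example, and the map is registered at run time instead of being written out to a generated file. The name finder joins all tokens between "namespace" and ";" or "{", which is an assumption intended to cover both the PHP 7 and the PHP 8 token stream.

```php
<?php
// Hypothetical helper: collect fully qualified class/interface names from PHP source.
function find_classes($source)
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $namespace = '';
    $classes = array();

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i])) {
            continue;
        }
        $id = $tokens[$i][0];

        if ($id === T_NAMESPACE) {
            // Join every non-whitespace token up to ";" or "{".
            $namespace = '';
            for ($i++; $i < $count; $i++) {
                $t = $tokens[$i];
                if ($t === ';' || $t === '{') {
                    break;
                }
                if (is_array($t) && $t[0] !== T_WHITESPACE) {
                    $namespace .= $t[1];
                }
            }
        } elseif ($id === T_CLASS || $id === T_INTERFACE) {
            // Skip whitespace; a T_STRING right after the keyword is the declared name.
            // Anonymous classes and Foo::class have no T_STRING here and are ignored.
            $j = $i + 1;
            while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
                $j++;
            }
            if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $classes[] = ltrim($namespace . '\\' . $tokens[$j][1], '\\');
            }
        }
    }
    return $classes;
}

// Hypothetical helper: map every class found under $dir to the file declaring it.
function build_class_map($dir)
{
    $map = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            foreach (find_classes(file_get_contents($file->getPathname())) as $class) {
                $map[$class] = $file->getPathname();
            }
        }
    }
    return $map;
}

// Demo: create a throwaway project with one namespaced class.
$dir = sys_get_temp_dir() . '/classmap_demo_' . uniqid();
mkdir($dir . '/src', 0777, true);
file_put_contents(
    $dir . '/src/Greeter.php',
    "<?php\nnamespace Demo\\Util;\nclass Greeter { public function hi() { return 'hi'; } }\n"
);

$map = build_class_map($dir);

// Load classes on demand from the map.
spl_autoload_register(function ($class) use ($map) {
    if (isset($map[$class])) {
        require $map[$class];
    }
});

$greeter = new \Demo\Util\Greeter();
echo $greeter->hi(), "\n";

// Clean up the throwaway project.
unlink($dir . '/src/Greeter.php');
rmdir($dir . '/src');
rmdir($dir);
```

A generator in the spirit of the article would presumably dump the map to a PHP file (for instance with var_export()) so that the scan runs once rather than on every request.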

Disable and redefine standard functions


The other day I remembered how I used to tinker with the runkit extension. Of its features, I was particularly interested in redefining standard functions and in the sandbox, which made it possible to prohibit the use of certain functions.
So I wondered whether such functionality could be implemented without the extension. It turned out that the tokenizer is well suited to the job.
This is how the Runtime library was born: with it you can forbid the use of any standard function while a script is executed, or override it.
Here are some usage examples:

<?php
use Dm\Runtime;

$code = <<<CODE
<?php
echo str_replace(0, 1, 100);
?>
CODE;

// Throws an Exception, because str_replace is disabled
Runtime::code($code)
    ->disableFunction('str_replace')
    ->execute();
?>


<?php
use Dm\Runtime;

$code = <<<CODE
<?php
echo str_replace(0, 1, 100);
?>
CODE;

// Prints 000, not 111
Runtime::code($code)
    ->overrideFunction('str_replace', function ($search, $replace, $subject) {
        // Swap arguments 1 and 2
        echo str_replace($replace, $search, $subject);
    })
    ->execute();
?>


How you use these features is up to you, but they should be used with care.
For me this was a small research project, and I was pleased with the result.
As for Runtime, it is hard to say where it can be applied and where it cannot. But the library itself clearly demonstrates how the tokenizer works and what it is capable of.
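To illustrate how a tokenizer can support this kind of check, here is a rough sketch. It is not the Runtime library's implementation: rewrite_calls(), is_plain_call() and reversed_replace() are hypothetical names, and the detection rule (a T_STRING followed by "(" and not preceded by "->", "::", "function" or "new") is a simplifying assumption that ignores cases such as fully qualified names, the nullsafe operator, or functions called through variables.

```php
<?php
// Hypothetical helper: true when the T_STRING at index $i looks like a plain function call.
function is_plain_call(array $tokens, $i)
{
    // The next non-whitespace token must be "(".
    $next = $i + 1;
    while (isset($tokens[$next]) && is_array($tokens[$next]) && $tokens[$next][0] === T_WHITESPACE) {
        $next++;
    }
    if (!isset($tokens[$next]) || $tokens[$next] !== '(') {
        return false;
    }

    // The previous non-whitespace token must not mark a method call or a declaration.
    $prev = $i - 1;
    while ($prev >= 0 && is_array($tokens[$prev]) && $tokens[$prev][0] === T_WHITESPACE) {
        $prev--;
    }
    if ($prev < 0 || !is_array($tokens[$prev])) {
        return true;
    }
    $excluded = array(T_OBJECT_OPERATOR, T_DOUBLE_COLON, T_FUNCTION, T_NEW);
    return !in_array($tokens[$prev][0], $excluded, true);
}

// Hypothetical helper: reject calls to $disabled functions and rename calls
// to functions listed in $overrides (original name => replacement name).
function rewrite_calls($source, array $disabled, array $overrides)
{
    $tokens = token_get_all($source);
    $result = '';

    foreach ($tokens as $i => $token) {
        if (is_string($token)) {
            $result .= $token;
            continue;
        }
        list($id, $text) = $token;

        if ($id === T_STRING && is_plain_call($tokens, $i)) {
            $name = strtolower($text); // function names are case-insensitive
            if (in_array($name, $disabled, true)) {
                throw new RuntimeException("Function $text() is disabled");
            }
            if (isset($overrides[$name])) {
                $text = $overrides[$name];
            }
        }
        $result .= $text;
    }
    return $result;
}

// Replacement that swaps the first two arguments, as in the override example above.
function reversed_replace($search, $replace, $subject)
{
    return str_replace($replace, $search, $subject);
}

$code = '<?php echo str_replace(0, 1, 100);';

// Override: the call is renamed in the rewritten source.
$rewritten = rewrite_calls($code, array(), array('str_replace' => 'reversed_replace'));
echo $rewritten, "\n";

// Disable: the same call is rejected before anything is executed.
try {
    rewrite_calls($code, array('str_replace'), array());
    $blocked = false;
} catch (RuntimeException $e) {
    $blocked = true;
}
```

The rewritten source could then be executed in place of the original. A static check like this cannot see calls made through variables or callbacks, which is one reason to treat such a sandbox with care.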

Links

  1. Tokenizer
  2. runkit
  3. Autoloader generation on github.com
  4. Runtime Library on github.com

Source: https://habr.com/ru/post/176725/

