
PHP syntax hack

Have you ever thought about how to extend the core of PHP? What would it take to create a new keyword or even develop new syntax? If you have a basic knowledge of C, making small changes should not be a problem. Yes, I understand that this may be a little pointless, but it doesn't matter: it's fun, after all.

Let's create an alternative way to define a class. The simplest class definition PHP allows is as follows:

<?php class ClassName {} 

We can simplify the syntax and replace the curly brackets with a semicolon.
 <?php class ClassName; 

If you try to execute this code, it will obviously generate an error. Not a problem, we can fix it.

The first step is to install the required tools.

 $ sudo apt-get install bison re2c 

PHP is written in C, but its parser is generated with Bison, a parser generator. The official site describes it as a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR(1) parser tables.

This is a very powerful piece of software, one you could write a whole book about. If you want to know more, I advise reading the documentation. It is not easy reading, but it contains good examples. And if you ever want to create a programming language, it can be a good starting point.

Now go to http://php.net and download the latest PHP source.

 $ tar xvjf php-5.4.14.tar.bz2
 $ cd php-5.4.14
 $ ./configure
 $ cd Zend
 $ ls

Take off your hat, because in front of you is the core of PHP. The code in these files runs on a huge number of web servers. Let's explore it.

Bison grammar files use the “.y” extension by default.

 $ ls *.y
 zend_ini_parser.y zend_language_parser.y

We do not want to mess around with the “ini” syntax, so only “zend_language_parser.y” remains. Open it with your favorite editor.

Now, if you search for the word "class", you can find the following:

 %token T_CLASS "class (T_CLASS)" 

The parser works with tokens. The token for the class keyword is T_CLASS. If you search the text for T_CLASS, you will find something like this:

 class_entry_type:
         T_CLASS            { $$.u.op.opline_num = CG(zend_lineno); $$.EA = 0; }
     |   T_ABSTRACT T_CLASS { $$.u.op.opline_num = CG(zend_lineno); $$.EA = ZEND_ACC_EXPLICIT_ABSTRACT_CLASS; }
     |   T_TRAIT            { $$.u.op.opline_num = CG(zend_lineno); $$.EA = ZEND_ACC_TRAIT; }
     |   T_FINAL T_CLASS    { $$.u.op.opline_num = CG(zend_lineno); $$.EA = ZEND_ACC_FINAL_CLASS; }
 ;

These are the four ways to begin a class definition:
  1. class
  2. abstract class
  3. trait
  4. final class

Inside the curly braces you can see several low-level assignments. They appear to record the current line number and the class type flags, but I can only guess at the details. Let's not touch them.
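To make the idea of tokens more concrete, here is a minimal sketch of a tokenizer in C. It is not PHP's lexer: the names tok_kind and tok_next are hypothetical, the token set is reduced to what a bare class definition needs, and the opening <?php tag is ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; PHP's real ones are generated by Bison. */
enum tok_kind {
    TOK_EOF = 0,
    TOK_CLASS,    /* the keyword "class", like T_CLASS */
    TOK_STRING,   /* any other identifier, like T_STRING */
    TOK_LBRACE,
    TOK_RBRACE,
    TOK_SEMI,
    TOK_UNKNOWN
};

/* Read one token at *cursor and move the cursor past it. */
enum tok_kind tok_next(const char **cursor)
{
    const char *p;
    const char *start;
    size_t len;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;

    while (isspace((unsigned char)*p)) {
        p++;
    }
    if (*p == '\0') {
        *cursor = p;
        return TOK_EOF;
    }
    if (isalpha((unsigned char)*p) || *p == '_') {
        start = p;
        while (isalnum((unsigned char)*p) || *p == '_') {
            p++;
        }
        len = (size_t)(p - start);
        *cursor = p;
        /* Case-sensitive for brevity; real PHP keywords are case-insensitive. */
        if (len == 5 && strncmp(start, "class", len) == 0) {
            return TOK_CLASS;
        }
        return TOK_STRING;
    }
    *cursor = p + 1;   /* consume a single punctuation character */
    switch (*p) {
    case '{': return TOK_LBRACE;
    case '}': return TOK_RBRACE;
    case ';': return TOK_SEMI;
    default:  return TOK_UNKNOWN;
    }
}
```

Calling tok_next repeatedly on the string "class ClassName {}" yields TOK_CLASS, TOK_STRING, TOK_LBRACE, TOK_RBRACE and then TOK_EOF. The grammar rules only ever see such a sequence, never the raw characters.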

We are on the right track, but this is not exactly what we are looking for. Search for “class_entry_type”, the rule that groups those four variants. It will lead you to your destination. The rule is easy to understand, but hard to read the first time.

 unticked_class_declaration_statement:
         class_entry_type T_STRING extends_from
             { zend_do_begin_class_declaration(&$1, &$2, &$3 TSRMLS_CC); }
             implements_list
             '{'
                 class_statement_list
             '}' { zend_do_end_class_declaration(&$1, &$3 TSRMLS_CC); }
     |   interface_entry T_STRING
             { zend_do_begin_class_declaration(&$1, &$2, NULL TSRMLS_CC); }
             interface_extends_list
             '{'
                 class_statement_list
             '}' { zend_do_end_class_declaration(&$1, NULL TSRMLS_CC); }
 ;

There are two declarations here: one for classes, the other for interfaces. We are interested in the first. It starts with class_entry_type, which matches class, abstract class, trait, or final class. The next element is the token T_STRING, which holds the name of the class. extends_from is a group: it expands either to "extends T_STRING" or to nothing.

After that, the parser calls the Zend engine to start the class declaration.

 { zend_do_begin_class_declaration(&$1, &$2, &$3 TSRMLS_CC); } 

You can find this function in the zend_compiler.c file.

 void zend_do_begin_class_declaration(const znode *class_token, znode *class_name, const znode *parent_class_name TSRMLS_DC) 

The first argument is the class token from class_entry_type, the second is the class name from T_STRING, and the last is the parent class from extends_from.

Next comes the implements_list group. I am sure you know what it is for: it lists the interfaces the class implements. The following elements form the mandatory class body: the opening brace "{", the class_statement_list group, and the closing brace "}". Finally, the parser informs the Zend engine that the class declaration is over.

 { zend_do_end_class_declaration(&$1, &$3 TSRMLS_CC); } 

We need to duplicate this code, but without the body of the class.

 unticked_class_declaration_statement:
         class_entry_type T_STRING extends_from
             { zend_do_begin_class_declaration(&$1, &$2, &$3 TSRMLS_CC); }
             ';' { zend_do_end_class_declaration(&$1, &$3 TSRMLS_CC); }
     |   class_entry_type T_STRING extends_from
             { zend_do_begin_class_declaration(&$1, &$2, &$3 TSRMLS_CC); }
             implements_list
             '{'
                 class_statement_list
             '}' { zend_do_end_class_declaration(&$1, &$3 TSRMLS_CC); }
     |   interface_entry T_STRING
             { zend_do_begin_class_declaration(&$1, &$2, NULL TSRMLS_CC); }
             interface_extends_list
             '{'
                 class_statement_list
             '}' { zend_do_end_class_declaration(&$1, NULL TSRMLS_CC); }
 ;
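The effect of the new alternative can be illustrated with a small hand-written check in C. This is only a sketch under loud assumptions: Bison generates a table-driven parser, not code like this. The names decl_token, decl_trace and decl_parse are hypothetical, and extends_from, implements_list and the class body statements are left out. The two flags stand in for the calls to zend_do_begin_class_declaration and zend_do_end_class_declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
enum decl_token {
    DECL_CLASS,    /* like T_CLASS */
    DECL_STRING,   /* like T_STRING */
    DECL_LBRACE,
    DECL_RBRACE,
    DECL_SEMI
};

/* Records which of the two (imaginary) engine hooks were reached. */
struct decl_trace {
    int begun;   /* stands in for the "begin class declaration" action */
    int ended;   /* stands in for the "end class declaration" action */
};

/*
 * Accept exactly one of:
 *     CLASS STRING ';'
 *     CLASS STRING '{' '}'
 * Returns 1 when the whole token array matches, 0 otherwise.
 */
int decl_parse(const enum decl_token *toks, size_t count, struct decl_trace *trace)
{
    size_t i;

    assert(toks != NULL && trace != NULL);
    trace->begun = 0;
    trace->ended = 0;

    /* Shared prefix of both alternatives: the class header. */
    if (count < 3 || toks[0] != DECL_CLASS || toks[1] != DECL_STRING) {
        return 0;
    }
    i = 2;
    trace->begun = 1;   /* the "begin" action runs once the header is seen */

    if (toks[i] == DECL_SEMI) {
        i += 1;         /* new alternative: a semicolon instead of a body */
    } else if (toks[i] == DECL_LBRACE && i + 1 < count && toks[i + 1] == DECL_RBRACE) {
        i += 2;         /* original alternative: an (empty) body in braces */
    } else {
        return 0;
    }
    if (i != count) {
        return 0;       /* trailing tokens are not part of the declaration */
    }
    trace->ended = 1;   /* the "end" action runs after the terminator */
    return 1;
}
```

Both token sequences, the one ending in a semicolon and the one ending in a pair of braces, pass through the same begin and end steps, which is why the engine treats class ClassName; as an ordinary empty class.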

It was pretty simple, right? Now you just have to compile the changes.

 $ cd ..
 $ make

The first compilation always takes some time.

 $ vim test.php 

Enter the code for testing.

 <?php
 class FooBar;
 $a = new FooBar;
 $a->bar = 10;
 print_r($a);

Now test it.

 $ sapi/cli/php test.php
 FooBar Object
 (
     [bar] => 10
 )

Great, you did it!

Let's do one more thing. In PHP, you declare a class using the "class" keyword. How about making it shorter? I think "cls" will do.

Find the lexer files:

 $ cd Zend/
 $ ls *.l
 zend_ini_scanner.l zend_language_scanner.l

The Bison file operates on tokens. The lexer decides how source code is converted into tokens. Open zend_language_scanner.l and look for the word "class".

 <ST_IN_SCRIPTING>"class" { return T_CLASS; } 

Duplicate this block and change "class" to "cls".

 <ST_IN_SCRIPTING>"cls" { return T_CLASS; }
 <ST_IN_SCRIPTING>"class" { return T_CLASS; }
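What these two rules achieve can be sketched in plain C as a keyword table in which two spellings map to one token. This is an illustration only: the real scanner is generated by re2c from the rules above and does not use a lookup table like this, and the names kw_token, kw_table and kw_lookup are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch. */
enum kw_token {
    KW_NONE = 0,   /* not a keyword: an ordinary identifier */
    KW_CLASS       /* like T_CLASS */
};

struct kw_entry {
    const char *spelling;
    enum kw_token token;
};

/* Two spellings, one token: the parser cannot tell them apart. */
static const struct kw_entry kw_table[] = {
    { "cls",   KW_CLASS },
    { "class", KW_CLASS },
};

/* Return the token for a word, or KW_NONE when it is not a keyword. */
enum kw_token kw_lookup(const char *word)
{
    size_t i;

    assert(word != NULL);
    for (i = 0; i < sizeof kw_table / sizeof kw_table[0]; i++) {
        if (strcmp(word, kw_table[i].spelling) == 0) {
            return kw_table[i].token;
        }
    }
    return KW_NONE;
}
```

Because the grammar only sees the token, no change to the parser file is needed for the new spelling. One side effect to keep in mind: "cls" becomes a reserved word, so existing code that uses it as an identifier would stop parsing.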

That's it. Recompile, and you can use the keyword "cls" instead of "class".

Isn't it fun? I hope you enjoyed it as much as I did. Stay curious and explore. And if you really liked it, consider fixing some of the bugs on https://bugs.php.net/ .

Source: https://habr.com/ru/post/179441/

