
Taking PHP seriously

Image: a Soyuz rocket, delivered by train to the launch pad. Public-domain photo from NASA.

This is a translation of the article Taking PHP Seriously, written by one of the engineers behind the well-known Slack application. He discusses the strengths and weaknesses of PHP, as well as the Hack language and the HHVM virtual machine, to which Slack has almost finished migrating.

Slack uses PHP for most of its server logic, which is not the most popular choice these days. Why did we decide to write a new project in this language? Should you do the same?
Most programmers who have only dabbled in PHP know two things about it: that it is a bad language, which they would never use if they had a choice, and that some of the most extraordinarily successful projects in history use it. This is not quite a contradiction, but it should make us curious. Did Facebook, Wikipedia, WordPress, Etsy, Baidu, Box and, more recently, Slack all succeed in spite of using PHP? Would they have been more successful if they had used Ruby? Erlang? Haskell?

Perhaps not. PHP the language has many flaws, which have undoubtedly slowed these projects down, but PHP the environment has virtues that more than compensate for those flaws. The options for fixing PHP's language-level problems are also quite impressive. On balance, PHP provides a better foundation for building, changing and operating a successful web project than competing environments. I would start a new project in PHP today, with a couple of reservations, but with no apologies.

Historical excursion


Uniquely among modern languages, PHP was born in a web server environment. Its strengths are tied to the context of request-oriented execution on the server.

PHP originally stood for "Personal Home Page". It was first released in 1995 by Rasmus Lerdorf, with a focus on supporting small, simple dynamic web applications, such as the guest books and visitor counters that were popular in the early days of the web.

Since its release, PHP has been used for far more complex projects than its authors originally expected. It has gone through several major revisions, each of which brought new mechanisms for taming these more complex applications. Today, in 2016, it is a feature-rich member of the family of mixed-paradigm developer-productivity languages (MPDPL) [1], which includes JavaScript, Python, Ruby and Lua. If you last tried PHP in the early 2000s, a modern PHP codebase may surprise you with traits, closures and generators.

PHP virtues


PHP has some very deep and definitely unique features.

First, state. Each web request starts from a completely clean slate. Its namespace and global variables are uninitialized, except for the standard globals, functions and classes that provide primitive functionality and life support. By starting each request from a known state, we get a kind of built-in fault isolation: if request t hits a software defect and fails, that bug does not directly interfere with the execution of the subsequent request t + 1. Of course, application state also lives in places other than the program heap, and it is entirely possible to corrupt a database, memcache or the file system. But PHP shares this weakness with every conceivable environment that allows persistent state. Meanwhile, isolating request heaps from one another reduces the cost of most software defects.
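The clean-slate behaviour can be seen in a tiny sketch. The helper name nextRequestLocalId is hypothetical, and the sketch assumes the usual deployment in which every request starts with a fresh program heap (for example under mod_php or PHP-FPM):

```php
<?php
// Hypothetical helper: counts how often it has been called.
// The static variable lives in the heap of the current request only.
function nextRequestLocalId(): int
{
    static $counter = 0;
    $counter++;
    return $counter;
}

$first  = nextRequestLocalId();
$second = nextRequestLocalId();

// Within one request the counter advances. The next request starts
// from the initial state again, so it prints the same two ids.
echo "ids in this request: $first, $second\n";
```

Anything that has to survive between requests must be written somewhere explicit, such as a database or memcache, which is exactly the state the paragraph above warns can still be corrupted.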

Second, concurrency. An individual request runs in a single PHP thread. At first glance, this seems like a silly restriction. But since our application runs in the context of a web server, we have a natural source of concurrency: web requests. By simply sending asynchronous requests to ourselves, we easily get concurrency with isolated state and copy-in/copy-out calls. In practice, this is safer and more resilient to errors than the locks-and-shared-state approach used in most other general-purpose languages.
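One way to send several requests at once from a single-threaded PHP request is the curl extension's multi interface. The following is only a sketch under assumptions: the function name fetchConcurrently and the URLs in the usage comment are hypothetical, and the curl extension must be installed.

```php
<?php
// Sketch: run several HTTP requests in parallel and collect the bodies.
// Each target request executes in its own isolated PHP heap; only the
// request parameters go in and only the response body comes back.
function fetchConcurrently(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $stillRunning);
        if ($stillRunning > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // avoid a busy loop if select() cannot wait
        }
    } while ($stillRunning > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $handle) {
        $bodies[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }
    curl_multi_close($multi);

    return $bodies;
}

// Hypothetical usage, with made-up local endpoints:
// $parts = fetchConcurrently([
//     'sidebar' => 'http://localhost/internal/sidebar.php',
//     'feed'    => 'http://localhost/internal/feed.php',
// ]);
```

The design choice is that nothing is shared between the workers, so there are no locks to get wrong; the price is that all inputs and outputs must be serialized into the request and the response.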

Finally, the fact that PHP programs operate at the request level means that the programmer's workflow is fast and efficient, and stays fast as the application changes. Many productivity languages claim this, but if they do not reset their state on each request, and the main event loop shares program-level state between requests, they almost always need some startup time. For a typical Python application server, the debugging cycle looks something like “think, edit, restart the server, send a few test requests”. Even if “restart the server” takes only a few seconds of wall-clock time, that is a large bite out of the 15-30 seconds during which our human brains can keep the relevant state in our heads.

I argue that PHP's simpler “think, edit, reload the page” cycle makes developers more productive. Over the long and complex life cycle of a software project, these gains compound.

Arguments against PHP


If all this is true, why is PHP so hated? Once you boil away the colorful hyperbole, the main complaints about PHP cluster around the following problems:

  1. Surprising type conversions. Almost all languages today let us compare, for example, an integer and a float with the >= operator; heck, even C allows it. It is perfectly clear what is meant. Comparing strings and numbers with == is less obvious, and different languages have made different choices. PHP's choices here are especially perverse, which leads to surprises and undetected errors. For example, 123 == '123foo' evaluates to true (see what it is doing there?), but 0123 == '0123foo' is false (hmm).

  2. Inconsistency around reference and value semantics. PHP 3 had clear semantics: assignment, argument passing and return were all by value, creating a logical copy of the data in question. The programmer could opt into reference semantics with the & annotation [2]. This collided with the introduction of object-oriented programming facilities in PHP 4 and 5. Most of PHP's OO notation was borrowed from Java, and Java has semantics in which objects are passed by reference while primitive types are passed by value. As a result, the current state of PHP semantics is that objects are passed by reference (choosing Java over, say, C++), primitive types are passed by value (here Java, C++ and PHP agree), but the old reference semantics and the & annotation remain, occasionally interacting with the new world in strange ways.

  3. A failure-oblivious computing philosophy. PHP tries very, very hard to keep the request running, even when it has already done something deeply strange. For example, division by zero does not throw an exception, does not return INF, and does not fatally terminate the request. By default, it simply warns and evaluates to false. Since false is silently treated as 0 in numeric contexts, many applications are deployed and run with undiagnosed divisions by zero. This particular problem was addressed in PHP 7, but the design impulse to keep going, even when that no longer makes sense, permeates the libraries as well.

  4. Inconsistencies in the standard library. When PHP was young, its audience was most familiar with C, and many APIs followed the design of the C standard library: six-character lowercase names, success or failure reported in the return value, with the real value returned through an out parameter, and so on. As PHP evolved, the C style of namespacing through prefixes with _ became more common: mysql_..., json_..., and so on. More recently, the Java style of camelCase methods on CamelCase classes has become the most common way of introducing new functionality. So we sometimes see code that jarringly mixes expressions like DirectoryIterator($path) with if (!($f = fopen($p, 'w+')) ...
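The first and third complaints can be made concrete with a short sketch. It deliberately avoids the loose == comparison itself, because its result depends on the PHP version: the outcomes quoted in the list hold for PHP 7 and earlier, while PHP 8 compares a number with a non-numeric string as strings. The helper name percentOrNull is hypothetical.

```php
<?php
// A leading zero makes an integer literal octal, so 0123 is 83 in decimal.
$octalLiteral = 0123;

// An explicit cast reads the leading decimal digits of a string and
// drops the rest, so both strings below become 123.
$fromPlain  = (int) '123foo';
$fromPadded = (int) '0123foo';

// Strict comparison (===) never converts its operands: an int and a
// string are simply different values.
$strictlyEqual = (123 === '123foo');

// Hypothetical helper: intdiv() throws DivisionByZeroError on PHP 7+,
// so the failure is handled here instead of leaking out as a bogus value.
function percentOrNull(int $part, int $whole)
{
    try {
        return intdiv($part * 100, $whole);
    } catch (DivisionByZeroError $e) {
        return null;
    }
}

var_dump($octalLiteral, $fromPlain, $fromPadded, $strictlyEqual);
var_dump(percentOrNull(1, 4), percentOrNull(1, 0));
```

This explains the "hmm" in the first item: under the older conversion rules the left side of 0123 == '0123foo' is the octal number 83, while the right side converts to the decimal number 123.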

Lest I seem like an unreflective PHP apologist: these are all serious problems that make defects more likely. They are also unforced errors. There is no inherent trade-off between the Good Parts of PHP and these problems. It should be possible to build a PHP that limits these downsides while preserving all the good parts.
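The second complaint, about reference and value semantics, is easiest to see side by side. In the sketch below all function and class names are hypothetical; the point is that the three calls look alike at the call site although only one of them works on a copy.

```php
<?php
// The & appears only in the declaration, never at the call site.
function markSeen(array &$record)
{
    $record['seen'] = true;
}

// Without &, the array is logically copied and the caller's value
// stays untouched; the changed copy has to be returned explicitly.
function markSeenOnCopy(array $record)
{
    $record['seen'] = true;
    return $record;
}

class Counter
{
    public $value = 0;
}

// Objects need no &: the function receives a handle to the same object,
// so the mutation is visible to the caller.
function increment(Counter $counter)
{
    $counter->value++;
}

$byValue = ['id' => 1];
markSeenOnCopy($byValue);   // return value ignored, $byValue unchanged

$byReference = ['id' => 1];
markSeen($byReference);     // same-looking call, but $byReference changes

$counter = new Counter();
increment($counter);        // object state changes without any &

var_dump($byValue, $byReference, $counter->value);
```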

HHVM and Hack


That successor to PHP is called Hack [3].

Hack is what programming language people call a gradual type system for PHP. "Type system" means that it lets the programmer express automatically verifiable invariants about the data that flows through the code: this function takes a string and a number and returns a list of Fribbles, just as in Java, C++, Haskell or whichever statically typed language you prefer. "Gradual" means that some parts of your codebase can be statically typed while other parts remain in rough-and-tumble, dynamic PHP. The ability to mix the two approaches allows large codebases to be migrated gradually.

Rather than spill a ton of ink here describing Hack's type system and how it works, just go and play with it. I will be here when you get back.

It is a neat system, and quite ambitious in what it lets you express. The option of gradually migrating a project to Hack, in case it grows larger than you first expected, is a unique advantage of the PHP ecosystem. Hack's type checking preserves the “think, edit, reload the page” workflow, because the type checker runs in the background, incrementally updating its model of the codebase whenever it sees modifications in the file system. The Hack project provides integrations with all the popular editors and IDEs, so feedback about type errors arrives as soon as you finish typing, just like in the web demo.
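The mix of typed and untyped code can be approximated in plain PHP 7, which is what the sketch below uses; it is not Hack. PHP enforces these declarations only at run time when a call crosses the boundary, whereas Hack's checker reports such mismatches before the code runs. All function names are hypothetical.

```php
<?php
// "Classic" untyped PHP: any value is accepted and converted on demand.
function describeLoosely($value)
{
    return 'value: ' . $value;
}

// Annotated function: the declared types are checked at the call boundary.
function repeatLabel(string $label, int $times): string
{
    return str_repeat($label, $times);
}

// Untyped caller that forwards whatever it was given to the typed function.
function tryRepeat($label, $times): string
{
    try {
        return repeatLabel($label, $times);
    } catch (TypeError $e) {
        return 'rejected: ' . get_class($e);
    }
}

echo describeLoosely(42), "\n";
echo tryRepeat('ab', 3), "\n";
echo tryRepeat('ab', 'three'), "\n"; // a non-numeric string is not an int
```

Typed and untyped functions live in the same file and call each other freely, which is the property that makes a piecemeal migration possible.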

Let's evaluate the real risks that PHP poses in the light of Hack:

  1. Surprising type conversions become errors in Hack files. The whole class of problems goes away.
  2. Reference and value semantics are cleaned up in Hack by simply banning old-style references, since they are unnecessary in new codebases. This leaves the same objects-by-reference-and-everything-else-by-value semantics as in Java or C#.
  3. PHP's failure-oblivious computing is more a property of the runtime and libraries, and it is harder for a semantic checker like Hack to reach directly into these. In practice, however, most forms of failure-oblivious computing need those same surprising conversions to get very far. For example, a problem that arises from division by zero returning false does not travel far before it crosses a type-checked boundary [4], which fails on the attempt to convert a boolean to a number. These boundaries are more common in Hack codebases. By making such types easy to write down, Hack in practice shortens the “stopping distance” of many buggy executions.
  4. Finally, the inconsistencies in the standard library remain. The most Hack hopes to do is make it less painful to wrap them in safer abstractions.
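The "stopping distance" idea from the third item can be sketched in plain PHP 7 rather than Hack. The sketch assumes strict typing is switched on with declare(strict_types=1), because PHP's default coercive mode would quietly turn false into 0; the function names are hypothetical, and legacyRatio only imitates the old false-on-division-by-zero behaviour.

```php
<?php
declare(strict_types=1);

// Imitates legacy behaviour: signals division by zero by returning false.
function legacyRatio(int $hits, int $total)
{
    if ($total === 0) {
        return false;
    }
    return intdiv($hits * 100, $total);
}

// Typed boundary: only a real integer may pass.
function formatPercent(int $percent): string
{
    return $percent . '%';
}

function renderHitRate(int $hits, int $total): string
{
    try {
        // false never reaches the output as "0%": it is stopped here.
        return formatPercent(legacyRatio($hits, $total));
    } catch (TypeError $e) {
        return 'n/a';
    }
}

echo renderHitRate(1, 4), "\n";
echo renderHitRate(1, 0), "\n";
```

Without the typed boundary the bogus false would keep flowing through the program as a zero; with it, the bad value is caught one call after it was produced.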

Hack provides an option that no other popular member of the MPDPL family has: the ability to introduce a type system after initial development, and only in those parts of the system where the value outweighs the cost.

HHVM


Hack was originally developed as part of the HipHop Virtual Machine, or HHVM, an open source virtual machine for PHP. HHVM provides another important option for a successful project: the ability to run your site faster and more economically. Facebook reports an 11.6x improvement in CPU efficiency over the PHP interpreter, and Wikipedia reports a 6x improvement.

Slack recently migrated its web environments to HHVM and saw significant drops in latency at all endpoints, but at the time of writing we lack an apples-to-apples measurement of CPU load. We are also in the process of moving our codebase to Hack and will report on our experience here.

Looking forward


We started with the apparent paradox that PHP is a very bad language that is used in many successful projects. We find that its reputation as a poor language is, in isolation, well deserved. The success of the projects that use it has much more to do with the properties of the PHP environment and the fast development cycle that PHP enables. And the benefits of this environment (reduced cost of bugs through fault isolation; safe concurrency; high developer throughput) are more valuable than the problems that the language's flaws create.

In addition, unlike the other members of the MPDPL family, PHP offers a clear migration path to a faster, safer and more maintainable environment in the form of Hack and HHVM. Slack is in the final stages of its transition to HHVM and in the early stages of its transition to Hack, and we are optimistic that they will let us produce better software, faster.



Notes (also from the author's blog post):

[1] I made up the term MPDPL. While there are few direct genetic links between them, these languages have strongly influenced one another. Looking past the syntax, they have much more in common than they have differences. In a universe of programming languages that includes MIPS assembly, Haskell, C++, Forth and Erlang, it is hard to deny that the MPDPLs form a tight cluster in the space of language designs.

[2] Unfortunately, the & is marked in the callee, not at the call site. So when a programmer declares that a function receives its parameters by reference, nothing at the call site shows it. This makes it harder to understand the code and to see what may change, and it makes PHP programs harder to execute efficiently. See Figure 2 in dl.acm.org/citation.cfm?id=2660199 .

[3] Yes, Hack is a nearly un-Googleable name for a programming language. Hacklang is sometimes used when ambiguity is possible. If Google itself can give its popular language the even less Googleable name Go, then why not?

[4] Hack's type annotations are also enforced at run time by default, since they are built on top of PHP's type hint system. This increases safety in mixed codebases, where Hack and classic PHP are interleaved.

Source: https://habr.com/ru/post/314970/

