Hybrid PHP / Go Application Development Using RoadRunner

The classic PHP application is single-threaded, heavy loading (unless of course you write on microframes) and the inevitable death of the process after each request ... Such an application is heavy and slow, but we can give it a second life by hybridization. To speed up - we demonize and optimize memory leaks to achieve better performance - we will introduce our own Golang RoadRunner PHP application server to add flexibility - we simplify the PHP code, expand the stack and share responsibility between the server and the application. In essence, we will make our application work as if we were writing it in Java or another language.

Thanks to hybridization, a previously slow application stopped suffering from 502 errors under load, the average response time to requests decreased, productivity increased, and deployment and assembly became easier due to unification of the application and getting rid of unnecessary bindings in the form of nginx + php-fpm.

Anton Titov ( Lachezis ) is CTO and co-founder of SpiralScout LLC with 12 years of active commercial development experience in PHP. Over the past few years, he has been actively implementing Golang on the company's development stack. Anton spoke about one example at PHP Russia 2019 .

PHP Application Life Cycle

Schematically, the device of an abstract application with a certain framework looks like this.
')

When we send a request to a process, it happens:

project initialization;
loading shared libraries, frameworks, and ORMs;
loading libraries required for a specific project;
routing;
request routing to a specific controller;
response generation.

This is the principle of operation of a classic single-threaded application with a single entry point, which after each execution is completely destroyed or clears its state. All code is unloaded from memory, the worker is cleared, or simply resets its state.

Lazy-loading

The standard and simple way to speed up is the implementation of the Lazy-loading system or On-demand-loading libraries.

With Lazy-loading we request only the necessary code.

When accessing a specific controller, only the necessary libraries will be loaded into memory, processed, and then unloaded. This allows you to reduce the average response time of the project and greatly facilitate the process of working on the server. In all the frameworks we are currently using, the principle of lazy loading is implemented.

Cache frequent calculations

The method is more complicated and actively used, for example, in the Symfony framework, template engines, ORM schemes, and routing. This is not caching like memcached or Redis for user data. This system warms up parts of the code in advance . At the first request, the system generates code or a cache file, and at subsequent requests, these calculations, necessary, for example, to compile a template, will no longer be performed.

Caching significantly speeds up the application , but at the same time complicates it . For example, there are problems with invalidating the cache and updating the application. Do not confuse the user cache with the application cache - in one, the data changes over time, in the other only when the code is updated.

Processing request

When a request is received from an external PHP-FPM server, the request entry point and initialization will match.

It turns out that the client’s request is the state of our process.

The only way to change this state is to completely destroy the worker and start over with a new request.

This is a single-threaded classic model with its advantages.

All workers at the end of the request die.
Memory leaks, race condition, deadlocks are not inherent in PHP. You can not worry about it.
The code is simple: we write, process the request, die and move on.

On the other hand, for each request, we completely load the framework, all the libraries, perform some calculations, recompile the templates. With each request in a circle we produce a lot of manipulation and unnecessary work.

How it works on the server

Most likely, a bunch of nginx and PHP will work. Nginx will work as a reverse proxy: give users part of the statics, and delegate part of the requests to the PHP process manager PHP-FPM below. Already the manager raises a separate worker for the request and processes it. After that, the worker is destroyed or cleared. Next, a new worker is created for the next request.

Such a model works stably - the application is almost impossible to kill. But under heavy loads, the amount of work for initializing and destroying workers affects the system performance, because even for a simple GET request, we often have to pull a bunch of dependencies and re-raise the database connection.

Speeding up the application

How to speed up the classic application after introducing cache and Lazy-loading? What other options are there?

Turn to the language itself .

Use OPCache. I think no one is running PHP on production without OPCache enabled?
Wait for RFC: Preloading . It allows you to preload a set of files into a virtual machine.
JIT - seriously accelerates the application on CPU-bound tasks. Unfortunately, with tasks related to databases, it will not help much.

Use alternatives . For example, the HHVM virtual machine from Facebook. It executes code in a more streamlined environment. Unfortunately, HHVM is not fully compatible with PHP syntax. As an alternative, kPHP compilers from VK or PeachPie, which completely converts code to .NET C #, are an alternative.

Fully rewrite to another language. This is a radical option - completely get rid of code loading between requests.

You can completely store the state of the application in memory , actively use this memory for work, and forget about the concept of a dying worker and completely clear the application between requests.

To achieve this, we remove the entry point, which used to be together with the initialization point, deep into the application.

Transferring entry point - demonization

This is creating an infinite loop in the application: an incoming request - run it through the framework - we generate a response to the user. This is a serious saving - all bootstrapping, all framework initialization is performed only once, and then several requests are processed by the application.

We adapt the application

Interestingly, we can focus on optimizing only that part of the application that will run in runtime : controllers, business logic. In this case, you can abandon the Lazy-loading model. We will take part of the bootstrapping of the project to the beginning - at the time of initialization. Preliminary calculations: routing, templates, settings, ORM schemes will inflate initialization, but in the future they will save processing time for one specific request.

I do not recommend compiling templates when downloading a worker, but downloading, for example, all configurations is useful.

Compare Models

Compare the demonized (left) and classic models.

The demonized model from the moment the process was created until the moment the response is returned to the user takes longer. The classic application is optimized for quick creation, processing and destruction.

However, all subsequent requests after warming up the code are much faster. The framework, application, container is already in memory and ready to accept requests and respond quickly.

Problems of the long-lived model

Despite the advantages, the model has a set of limitations.

Memory leaks. The application lies in memory for a very long time, and if you use the "curves" of the library, the wrong dependencies or global states - the memory will begin to leak. At some point, a fatal error will appear that will break the user's request.

The problem is solved in two ways.

Write accurate code, use proven libraries.
Actively monitor workers. If you suspect that memory is leaking inside the process, proactively change it to an analog with a lower limit, that is, simply to a new copy that has not yet managed to accumulate uncleaned memory.

Data leaks . For example, if during an incoming request we save the current user of the system in some global variable and forget to reset this variable after the request, then there is a chance that the next user of the system will accidentally gain access to data that he should not see.

The problem is solved at the application architecture level.

Do not store an active user in a global context. All data that is specific to the request context is discarded and cleared before the next request.
Handle session data carefully. Sessions in PHP - with the classical approach, this is a global object. Wrap it correctly so that it is reset on a subsequent request.

Resource management .

Monitor connections to the database. If the application hangs in the memory for a month or two, then the open connection will most likely close in this time: the database will be re-installed, rebooted or the firewall will reset the connection. At the code level, consider reconnect or, after each request, reset the connection and re-raise it at the next request.
Avoid long-lived file lock. If your worker writes some information to a file, there is no problem. But if this file is open and has a lock on it, then no other process in your system will have access to it until the lock is released.

Explore the long-lived model

Consider the long-lived worker model — demonizing an application — and explore ways to implement it.

Non-blocking approach

We use asynchronous PHP - we load the application once into memory and process incoming HTTP requests inside the application. Now the application and the server are one process . When the request arrives, we create a separate coroutine or in the event loop we give a promise, process it and give it to the user.

The undeniable advantage of the approach is maximum performance. It is also possible to use interesting tools, for example, configure WebSocket directly on your application .

However, the approach significantly increases the complexity of the development . You need to install ELDO, remember that not all database drivers will be supported, and the PDO library is excluded.

To solve problems in the case of demonization with a non-blocking approach, you can use well-known tools: ReactPHP , amphp and Swoole - an interesting development in the form of a C-extension. These tools work quickly, they have a good community and good documentation.

Blocking approach

We do not raise coroutines inside the application, but do it from the outside.

We just pick up a few application processes , as PHP-FPM would do. Instead of transmitting these requests in the form of a process state, we deliver them from the outside in the form of a protocol or messaging.

We write the same single-threaded code that we know, we use all the same libraries and the same PDO. All the hard work of working with sockets, HTTP, and other tools is done outside of a PHP application .

Of the minuses: we must monitor the memory and remember that communication between two different processes is not free , but we need to transfer data. This will create a slight overhead.

To solve the problem, there is already a PHP-RM tool that is written in PHP. On the ReactPHP library, it has integration with several frameworks . However, PHP-PM is very slow, it leaks memory at the server level and under load it does not show as much growth as PHP-FRM.

We write our application server

We wrote our application server , which is similar to PHP-RM, but there is more functionality. What did we want from the server?

Combine with existing frameworks. We would like to have flexible integration with almost all frameworks on the market. I don’t feel like writing a tool that works only in a particular particular case.

Different processes for server and application . Possibility of a hot reboot, so that during local development, press F5 and see the new updated code, as well as be able to expand them individually.

High speed and stability . Still, we are writing an HTTP server.

Easy extensibility . We want to use the server not only as an HTTP-Server, but also for individual scenarios like a queue server or a gRPC server.

Work out of the box wherever possible: Windows, Linux, ARM CPU.

Ability to write very fast multi-threaded extensions specific to our application.

As you already understood, we will write in Golang.

RoadRunner Server

To create a PHP server, you need to solve 4 main problems:

Establish communication between Golang and PHP processes.
Process management: the creation, destruction, monitoring of workers.
Balancing tasks - efficient distribution of tasks to workers. Since we are implementing a system that blocks an individual worker for a particular specific incoming task, it is important to create a system that would quickly say that the process has finished work and is ready to accept the next task.
HTTP stack - sending HTTP request data to the worker. It is a simple task to write an incoming point to which the user sends a request, which is passed to PHP and returned.

Variants of interaction between processes

First, let's solve the communication problem between Golang and PHP processes. We have several ways.

Embedding: embedding a PHP interpreter directly in Golang. This is possible, but requires a custom PHP assembly, complex setup, and a common process for the server and PHP. Like in go-php , for example, where the PHP interpreter is integrated into Golang.

Shared Memory - The use of shared memory space, where processes share this space . It takes painstaking work. When exchanging data, you will have to synchronize the state manually and the amount of errors that may occur is quite large. Shared Memory also depends on the OS.

Writing your transport protocol - Goridge

We went along a simple path that is used in almost all solutions on Linux systems - we used the transport protocol. It is written on top of the standard PIPES and UNIX / TCP SOCKETS .

It has the ability to transfer data in both directions, detect errors, and also tag requests and put headers on them. An important nuance for us is the ability to implement the protocol without dependencies both on the side of PHP and Golang - without C-extensions in a pure language.

As with any protocol, the foundation is a data packet. In our case, the packet has a fixed header of 17 bytes.

The first byte is allocated to determine the type of packet. This can be a stream or a flag that indicates the type of data serialization. Then two times we pack the data size into Little Endian and Big Endian. We use this legacy to detect transmission errors. If we see that the size of the packed data in two different orders does not match, most likely a data transfer error has occurred. Then the data is transmitted.

In the third version of the package, we will get rid of such a legacy, introduce a more classical approach with a checksum, and also add the ability to use this protocol with asynchronous PHP processes.

To implement the protocol in Golang and PHP, we used standard tools.

On Golang: encoding / binary libraries and io and net libraries for working with standard pipes and UNIX / TCP sockets.

In PHP: the familiar function for working with binary data pack / unpack and the extensions streams and sockets for pipes and sockets.

An interesting side effect arose during implementation. We integrated it with the standard Golang net / rpc library, which allows us to call service code from Golang directly in the application.

We write a service:

//  sample type  struct{} // Hi returns greeting message. func (a *App) Hi(name string, r *string) error { *r = fmt.Sprintf("ll, %s!", name) return nil }

We call it a small amount of code from the application:

 <?php use Spiral\Goridge; require "vendor/autoload.php"; $rpc = new Goridge\RPC( new Goridge\SocketRelay("127.0.0.1", 6001) ); echo $rpc->call("App.Hi", "Antony");

PHP Process Manager

The next part of the server is the management of PHP workers.

Worker is a PHP process that we constantly monitor from Golang. We collect the log of its errors in the STDERR file, communicate with the worker via the Goridge transport protocol, and collect statistics on memory consumption, task execution, and blocking.

The implementation is simple - this is the standard functionality of os / exec, runtime, sync, atomic. To create workers we use Worker Factory .

Why Worker Factory? Because we want to communicate both on standard pipes and on sockets. In this case, the initialization process is slightly different. When creating a worker who communicates by pipe, we can create it immediately and send data directly. In the case of sockets, you need to create a worker, wait until it reaches the system, make a PID handshake, and only then continue working.

Task balancer

The third part of the server is the most important for performance.

For implementation, we use the standard Golang functionality - a buffered channel . In particular, we create several workers and put them in this channel as a LIFO stack.

Upon receiving tasks from the user, we send a request to the LIFO stack and ask for the first free worker to be issued. If the worker cannot be allocated for a certain amount of time, then the user receives an error of the type “Timeout Error”. If the worker is allocated - it gets out of the stack, is blocked, after which it receives the task from the user.

After the task is processed, the response is returned to the user, and the worker stands at the end of the stack. He is ready to carry out the next task again.

If an error occurs, then the user will receive an error, as the worker will be destroyed. We ask Worker Pool and Worker Factory to create an identical process and replace it on the stack. This allows the system to work even in the event of fatal errors by simply re-creating workers by analogy with PHP-FPM.

As a result, it turned out to implement a small system that works very quickly - 200 ns for the allocation of the worker . It is able to work even in case of fatal errors. Each worker at one point in time processes only one task, which allows us to use the classic blocking approach .

Proactive monitoring

A separate part of both the process manager and the task balancer is the proactive monitoring system.

This is a system that once a second polls workers and monitors indicators: it looks at how much memory they consume, how much they are in, whether they are IDLE. In addition to tracking, the system monitors memory leaks. If the worker exceeds a certain limit, we will see it and carefully remove it from the system before a fatal leak occurs.

HTTP stack

The last and simple part.

How is it implemented:

raises an HTTP point on the Golang side;
we receive a request;
convert to PSR-7 format;
send the request to the first free worker;
Unpack the request into a PSR-7 object;
we process;
we generate the answer.

For implementation, we used the standard Golang NET / HTTP library . This is a well-known library with many extensions. Able to work both over HTTPS and over the HTTP / 2 protocol.

On the PHP side, we used the PSR-7 standard . It is an independent framework with many extensions and Middlewares. The PSR-7 is immutable in design , which fits well with the concept of long-lived applications and avoids global query errors.

Both structures in both Golang and PSR-7 are similar, which significantly saved time for mapping a request from one language to another.

To start the server requires a minimum binding :

 http: address: 0.0.0.0:8080 workers: command: "php psr-worker.php" pool: numWorkers: 4

Moreover, from version 1.3.0 the last part of the config can be omitted.

Download the server binary file, put it in the Docker container or in the project folder. Alternatively, globally we write a small configuration file that describes which pod we are going to listen to, which worker is the entry point, and how many are required.

On the PHP side, we write a primary loop that receives a PSR-7 request, processes it, and returns a response or an error back to the server.

 while ($req = $psr7->acceptRequest()) { try { $resp = new \Zend\Diactoros\Response(); $resp->getBody()->write("hello world"); $psr7->respond($resp); } catch (\Throwable $e) { $psr7->getWorker()->error((string)$e); } }

Assembly To implement the server, we chose an architecture with a component approach. This makes it possible to assemble the server for the needs of the project, adding or removing individual pieces depending on the requirements of the application.

 func main() { rr.Container.Register(env.ID, &env.Service{}) rr.Container.Register(rpc.ID, &rpc.Service{}) rr.Container.Register(http.ID, &http.Service{}) rr.Container.Register(static.ID, &static.Service{}) rr.Container.Register(limit.ID, &limit.Service{} // you can register additional commands using cmd.CLI rr.Execute() }

Use cases

Consider the options for using the server and modifying the structure. To begin, consider the classic pipeline - the server’s work with requests.

Modularity

The server receives the request to an HTTP point and passes it through a set of Middleware, which are written in Golang. An incoming request is converted to a task that the worker understands. The server gives the task to the worker and returns it back.

At the same time, the worker, using the Goridge protocol, communicates with the server, monitors its status and transfers data to it.

Middleware on Golang: authorization

This is the first thing to do. In our application, we wrote Middleware to authorize a user by JWT token . Middleware is written similarly for any other type of authorization. A very banal and simple implementation is to write Rate-Limiter or Circuit-Breaker.

Authorization is quick . If the request is not valid - just do not send it to the PHP application and do not spend resources on processing useless tasks.

Monitoring

The second use case. We can integrate the monitoring system directly into Golang Middleware. For example, Prometheus, to collect statistics on the speed of response points, the number of errors.

You can also combine monitoring with application-specific metrics (available as standard with 1.4.5). For example, we can send the number of requests to the database or the number of processed specific requests to the Golang server, and then to Prometheus.

Distributed Tracing and Logging

We write Middleware with a process manager. In particular, we can connect to the realtime system for monitoring logs and collect all the logs in one central database , which is useful when writing distributed applications.

We can also tag requests , give them a specific ID and pass this ID to all downstream services or communication systems between them. As a result, we can build a distributed trace and see how the application logs go.

Record your query history

This is a small module that records all incoming requests and stores them in an external database. The module allows you to make replay requests in the project and implement an automatic testing system, a load testing system, or just checking the operation of the API.

How did we implement the module?

We process part of the requests for Golang . We write Middleware in Golang and we can send part of the requests to Handler, which is also written in Golang. If any point in the application is worrying in terms of performance, we rewrite it to Golang and drag the stack from one language to another.

We are writing a WebSocket server . Implementing a WebSocket server or push notification server is becoming a trivial task.

Golang service at the server level.
For communication we use Goridge.
Thin service layer in PHP.
We implement the notification server.

We receive a request and raise a WebSocket connection. If the application needs to send some kind of notification to the user, it launches this message via the RPC protocol to the WebSocket server.

Manage your PHP environment. When creating a Worker Pool, RoadRunner has full control over the state of environment variables and allows you to change them as you like. If we are writing a large distributed application, we can use a single source of configuration data and connect it as a system to configure the environment. If we raise a set of services, all these services will knock on one single system, configure and then work. This can greatly simplify the deployment, as well as get rid of .env files.

Interestingly, the env variables that are available inside the worker are not global within the system. This slightly improves container safety.

Golang library integration in PHP

We used this option on the official website of RoadRunner . This is an integration of a practically full-fledged database with full-text search BleveSearch inside the server.

We indexed the documentation pages: we placed them in Bolt DB, after which we performed a full-text search without a real database like MySQL, and without a search cluster like Elasticsearch. The result was a small project where some of the functionality is in PHP, but the search is in Golang.

Implementing Lambda Functions

You can go further and completely get rid of the HTTP layer. In this case, implementing, for example, Lambda functions is a simple task.

For implementation, we use the standard AWS runtime for the Lambda function. We write a small binding, completely cut out the HTTP servers and send the data in binary format to the workers. We also have access to the environment settings, which allows us to write functions that are configured directly from the Amazon admin panel.

Workers are in memory for the entire life of the process, and the Lambda function after the initial request remains in memory for 15 minutes. At this time, the code does not load and responds quickly. In synthetic tests, we received up to 0.5 ms per one incoming request .

gRPC for PHP

The more difficult option is to replace the HTTP layer with the gRPC layer. This package is available on GitHub .

We can completely proxy all incoming Protobuf requests to a subordinate PHP application, there they can be unpacked, processed and answered back. We can write code in both PHP and Golang, combining and transferring functionality from one stack to another. The service supports Middleware. The standalone application can work as well as in conjunction with HTTP.

Queue server

The last and most interesting option is the implementation of the queue server .

On the PHP side, all we do is get a binary payload, unpack it, do the work, and tell the server about the success. On the Golang side, we are fully engaged in managing connections with brokers. It could be RabbitMQ, Amazon SQS or Beanstalk.

On the Golang side, we implement the “ Graceful shutdown” of workers. We can beautifully wait for the implementation of the “durable connection” - if the connection with the broker is lost, the server waits for a while using the “back-off strategy”, it lifts the connection and the application does not even notice it.

We can process these requests in both PHP and Golang, and queue them on both sides:

from the PHP side through the Goridge protocol Goridge RPC;
from Golang - communicating with the SDK library.

If payload falls, then not the entire Consumer falls, but only one separate process. The system immediately raises it, the task is sent to the next worker. This allows you to perform non-stop tasks.

We implemented one of the brokers directly in the server memory and used the Golang functionality. This allows us to write an application using queues before choosing the final stack. We lift the application locally, run it, and we have queues that work in memory and behave the same way they would behave on RabbitMQ, Amazon SQS or Beanstalk.

When using two languages in such a hybrid combination, it is worth remembering how to separate them.

Separate domain domains

Golang is a multi-threaded and fast language that is suitable for writing infrastructure logic and user monitoring and authorization logic.

It is also useful for implementing custom drivers for accessing data sources - these are queues, for example, Kafka, Cassandra.

PHP is a great language for writing business logic.

This is a good system for HTML rendering, ORM and working with the database.

Tool comparison

Several months ago on Habré compared PHP-FPM, PHP-PM, React-PHP, Roadrunner and other tools. The benchmark was held on a project with real Symfony 4.

RoadRunner under load shows good results and is ahead of all servers. Compared with PHP-FPM, the performance is 6-8 times more.

In the same benchmark, RoadRunner did not lose a single request, everything was worked out 100%. Unfortunately, React-PHP lost 8-9 requests under loads - this is unacceptable. We would like the server not to crash and to work stably.

Since the publication of RoadRunner in the public domain on GitHub, we have received more than 30,000 installations. The community has helped us write a specific set of extensions, improvements and believe that the solution has the right to life.

RoadRunner is good if you want to significantly speed up the application, but are not yet ready to jump into asynchronous PHP . This is a compromise that will require a certain amount of effort, but not as significant as a complete rewrite of the code base.

Take RoadRunner if you want more control over the PHP life cycle , if there aren’t enough PHP capabilities, for example, for the queue system or Kafka, and when your popular Golang library solves your problem, which doesn't exist in PHP, and writing takes time, which you don’t have either.

Summary

What we got by writing this server and using it in our production infrastructure.

They increased the reaction speed of application points by 4 times compared to PHP-FPM.
Completely got rid of 502 errors under loads . At peak loads, the server just waits a little longer and responds as if there were no loads.
After optimizing memory leaks, workers hang in memory for up to 2 months . This helps when writing distributed applications, since all requests between services are already cached at the socket level.
We use Keep-Alive. This significantly speeds up communication between a distributed system.
Inside the real infrastructure, we put everything in the Alpine Docker in Kubernetes . The deployment system and project builds are now easier. All that is required is to build a custom RoadRunner build for the project, put it in the Docker project, fill in the Docker image, and then calmly upload our pod to Kubernetes.
According to the actual timing of one of the projects to individual points that do not have access to the database, the average response time is 0.33 ms .

The next professional conference for PHP developers PHP Russia only next year. For now, we offer the following:

Pay attention to GolangConf if you are interested in the Go part and want to know more details or hear arguments in favor of switching to this language. If you yourself are ready to share your experience - most likely send abstracts .
Take part in HighLoad ++ in Moscow, if everything is important for you that is associated with high performance, submit a report by September 7, or book a ticket.
Subscribe to the newsletter and the telegram channel in order to receive an invitation to PHP Russia 2020 earlier than others.

Source: https://habr.com/ru/post/461827/

All Articles