OpenResty: turning NGINX into a full-fledged application server
Once again we are publishing a transcript of a talk from the HighLoad++ 2016 conference, held in Skolkovo near Moscow on November 7–8 last year. Vladimir Protasov explains how to extend the functionality of NGINX using OpenResty and Lua.
Hello everyone, my name is Vladimir Protasov, I work at Parallels. A little about myself: I have spent three quarters of my life writing code. I'm a programmer to the core in the literal sense: I sometimes see code in my dreams. A quarter of my life has been industrial development, writing code that goes straight into production. Code that some of you use without knowing it.
So you understand how bad things were: when I was a little junior, they handed me a two-terabyte database. These days that isn't highload at all. I went to a conference and asked: "Guys, tell me, you're doing big data, everything's cool? How big are your databases?" They answered: "We have 100 gigabytes!" I said: "Cool, 100 gigabytes!" while thinking about how to keep a straight poker face. You think, yeah, those guys are cool, and then you go back and tinker with those multi-terabyte databases. And that's as a junior. Imagine what a blow that is.
I know more than 20 programming languages. That's just what I had to figure out in the course of my work. You get handed code in Erlang, C, C++, Lua, Python, Ruby, something else, and you have to deal with it all. I couldn't calculate the exact number; somewhere past 20 I lost count.
Since everyone present knows what Parallels is and what we do, I won't talk about how cool we are. I'll only tell you that we have 13 offices around the world, more than 300 employees, and development in Moscow, Tallinn and Malta. If you feel like it, you can pick up and move to Malta, if winter is cold where you are and you need to warm up.
Our department specifically writes in Python 2. We're busy doing business and have no time to adopt modern technologies, so we suffer. We have Django, because it has everything, and we took what we didn't need and threw it out. Also MySQL, Redis and NGINX. We have a lot of other cool stuff too: MongoDB, RabbitMQ running around, you name it, but that's not mine and I don't work on it.
OpenResty
That's it about me. Let's see what I'll be talking about today:
What OpenResty is and what it's good for.
Why reinvent the wheel when we have Python, NodeJS, PHP, Go and other cool stuff that everyone is happy with?
And a few examples from real life. I had to cut the talk down, because in full it ran 3.5 hours, so there will be only a few examples.
OpenResty is NGINX. Thanks to that we get a full-fledged web server that is well written and fast. I think most of us run NGINX in production. You all know it's fast and cool. Its non-blocking I/O is done well, so we don't have to build anything ourselves, the way we built bicycles in Python on top of gevent. Gevent is cool and awesome, but if you write synchronous code and something goes wrong inside gevent, you'll go crazy debugging it. I've had the experience: it took two whole days to figure out what went wrong. If someone hadn't dug into the same problem weeks earlier, found it and written about it on the internet, and Google hadn't surfaced it, we would have been completely stuck.
NGINX already handles caching and static content. You don't have to sweat over how to do that properly, how to avoid slowing down somewhere or leaking file descriptors somewhere. NGINX is very convenient to deploy: you don't have to think about what to pick, WSGI, PHP-FPM, Gunicorn or Unicorn. NGINX gets installed, the admins are given it, they know how to work with it. NGINX processes requests in phases. I'll say a bit more about this later; in short, there's a phase where it has just accepted the request, a phase where it processes it, and a phase where it serves the content to the user.
NGINX is cool, but there's one problem: it's not flexible enough, even with all the cool features the guys have crammed into the config, no matter how much you can configure. That power isn't enough. So the guys from Taobao, quite a long time ago, about eight years back it seems, embedded Lua into it. What does that give you?
Size. It's small. LuaJIT adds somewhere around 100–200 kilobytes of memory overhead and minimal performance overhead.
Speed. LuaJIT is close to C in many situations; in some it loses to Java, in some it beats it. For a while it was considered the state of the art, the coolest JIT compiler. Now there are cooler ones, but they are very heavy, the same V8 for example. Some JS interpreters and Java HotSpot are faster in places, but still lose in others.
Easy to learn. If you have, say, a Perl codebase and you're not Booking, you won't find Perl programmers: there aren't any left, they've all been snapped up, and training new ones is long and hard. If you want programmers for something else, you may have to retrain or hunt for them too. With Lua everything is simple: any junior learns Lua in three days. It took me about two hours to figure it out. Two hours later I was already writing code, and about a week later it went straight into production.
As a result, it looks like this:
And there's a lot in it. OpenResty bundles a bunch of modules, both Lua and NGINX ones. Everything is ready for you, and it works well.
Examples
Enough lyrics, let's get to the code. Here is a little Hello World:
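The slide with the config is missing from the transcript; it looked roughly like this (a minimal sketch, where the location path and greeting text are my assumptions):

```nginx
location /hello {
    content_by_lua_block {
        -- format a greeting with the client address and send it back
        ngx.say(string.format("Hello, %s!", ngx.var.remote_addr))
    }
}
```

Hitting that location from a local client should return something like `Hello, 127.0.0.1!`.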
What do we have here? An ordinary NGINX location. We don't sweat over writing our own routing or pulling in a ready-made one: NGINX already has it, so we live well and lazily.
content_by_lua_block is a block that says we serve the content with a Lua script. We take the variable remote_addr and pass it through string.format . It's the same as sprintf , only in Lua, only done right. And we hand the result to the client.
As a result, it will look like this:
But back to the real world. In production nobody deploys Hello World. Our application usually goes to a database or somewhere else and spends most of its time waiting for a response.
It just sits and waits. That's not great: when 100,000 users arrive, things get very hard for us. So as an example let's sketch a simple application. We'll search for pictures, of cats for instance. But we won't just search: we'll expand the keywords, so that if the user searched for "kittens", we also find cats, fluffballs and so on. First we need to get the request data on the backend. It looks like this:
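The slide is missing; those two lines are roughly this (the parameter name `q` is my assumption):

```lua
-- inside the content_by_lua_block of the search location
local args = ngx.req.get_uri_args()
local query = args.q          -- the search keyword, e.g. "kittens"
```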
Two lines let you take the GET parameters, no difficulty at all. Then, say, from a database table that maps a keyword to its expansions, we fetch that information with an ordinary SQL query. It's simple. It looks like this:
We pull in the resty.mysql library, which is already in the bundle. We don't need to install anything, everything is ready. We specify how to connect and make a SQL query:
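A sketch of that code; the connection parameters and the table/column names are my assumptions, the keyword from the GET parameters is assumed to be in a variable `query`, and I've included the LIMIT that the speaker says he forgot on the slide:

```lua
-- inside the content_by_lua_block, after reading the GET parameters
local mysql = require "resty.mysql"

local db = mysql:new()
db:set_timeout(1000)                        -- 1 second

local ok, err = db:connect{
    host     = "127.0.0.1",
    port     = 3306,
    database = "search",
    user     = "app",
    password = "secret",
}
if not ok then
    ngx.log(ngx.ERR, "mysql connect failed: ", err)
    return ngx.exit(500)
end

-- expand the keyword: "kittens" -> cats, fluffballs and so on
local rows, err = db:query(
    "SELECT expansion FROM keywords WHERE keyword = "
    .. ngx.quote_sql_str(query) .. " LIMIT 10")
```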
It looks a little scary, but it works. Here 10 is the limit: we pull out 10 records, we're lazy, we don't want to show more. In the SQL on the slide I forgot about the limit.
Next we find pictures for all the requests. We collect a batch of requests, fill a Lua table called reqs , and do ngx.location.capture_multi .
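Roughly like this, assuming the expanded keywords from the SQL step are in a table `rows` (the /fetch URL and the argument name are my assumptions):

```lua
-- one subrequest per expanded keyword
local reqs = {}
for _, row in ipairs(rows) do
    table.insert(reqs, { "/fetch", { args = { keyword = row.expansion } } })
end

-- all subrequests run in parallel; one response comes back per request
local resps = { ngx.location.capture_multi(reqs) }
```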
All these requests go out in parallel, and we get the answers back. The running time equals the slowest response time: if every backend answers within 50 milliseconds and we sent a hundred requests, the answer comes back in 50 milliseconds.
Since we're lazy and don't want to write HTTP handling and caching, we'll make NGINX do it all for us. As you saw, the requests went to the URL /fetch; here it is:
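The location behind those subrequests might look like this (the cache path, zone name and the image-backend upstream are my assumptions):

```nginx
# http-level: where and how NGINX should cache
proxy_cache_path /var/cache/nginx/images keys_zone=images:10m max_size=1g;

# the location that the capture_multi subrequests hit
location /fetch {
    internal;                               # not reachable from outside
    proxy_pass http://image-backend/search$is_args$args;
    proxy_cache images;
    proxy_cache_valid 200 10m;              # keep good answers for 10 minutes
}
```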
We write a simple proxy_pass , specify where and how to cache, and everything just works.
But that's not all, we still need to give the data to the user. The simplest idea is to serialize everything to JSON, easily, in two lines: we set the Content-Type and hand over the JSON.
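Those two lines, using the cjson module bundled with OpenResty (the `results` table holding the collected pictures is my assumption):

```lua
-- inside the content_by_lua_block, after collecting the responses
local cjson = require "cjson"

ngx.header["Content-Type"] = "application/json"
ngx.say(cjson.encode(results))
```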
But there's one difficulty: the user doesn't want to read JSON. We'd need to bring in front-end developers, and sometimes we don't want to do that right away. And the SEO folks won't care that we're searching for pictures: if we serve them content like that, they'll say search engines won't index anything of ours.
What do we do about it? We give the user HTML, of course. Generating it by hand is not comme il faut, so we want to use templates. For that there is the lua-resty-template library.
You probably noticed three scary letters: OPM. OpenResty comes with its own package manager, through which you can install a bunch of different modules, in particular lua-resty-template . It's a simple template engine, close to Django templates. In it you can write code and substitute variables.
As a result, everything will look something like this:
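Roughly like this (the template file name and the shape of the context table are my assumptions):

```lua
-- inside the content_by_lua_block
local template = require "resty.template"

-- render index.html with the pictures we collected
template.render("index.html", { pictures = results })
```

where index.html itself might contain something like `{% for _, pic in ipairs(pictures) do %}<img src="{{pic.url}}">{% end %}` in lua-resty-template's Django-like syntax.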
We took the data and rendered the template, again in two lines. The user is happy, he got his cats. And since we expanded the query, he also got a fur seal along with the kittens. Who knows, maybe that's exactly what he was looking for but couldn't phrase the query right.
Everything is cool, but we're still in development and don't want to show it to users yet. Let's do authorization. To do that, let's look at how NGINX processes a request, in OpenResty terms:
The first phase is access : the user has just arrived, and we look at his headers, his IP address and other data. We can cut him off right away if we don't like him. This phase can be used for authorization, or, if lots of requests come in, for dropping them cheaply.
rewrite . We rewrite parts of the request data.
content . We serve the content to the user.
headers filter . We substitute response headers. If we used proxy_pass , we can rewrite some of the headers before serving them to the user.
body filter . We can rewrite the body.
log : logging. You can write logs straight into Elasticsearch without an extra layer.
Our authorization will look something like this:
We’ll add this to the location we’ve described before, and stick this code in there:
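A sketch of that code (the /auth redirect target is my assumption):

```nginx
# added to the location described earlier
access_by_lua_block {
    -- NGINX exposes the "token" cookie as $cookie_token
    local token = ngx.var.cookie_token
    if not token then
        return ngx.redirect("/auth")
    end
}
```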
We check whether we have a token cookie. If not, we throw the user to authorization. But users are tricky and might guess that they need to set a token cookie themselves. So we also store the token in Redis:
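With the Redis lookup added, the access phase might look like this (the connection details and the key scheme are my assumptions):

```nginx
access_by_lua_block {
    local token = ngx.var.cookie_token
    if not token then
        return ngx.redirect("/auth")
    end

    -- the token must also exist in Redis, otherwise it was forged
    local redis = require "resty.redis"
    local red = redis:new()
    red:set_timeout(100)                  -- milliseconds

    local ok, err = red:connect("127.0.0.1", 6379)
    if not ok then
        ngx.log(ngx.ERR, "redis connect failed: ", err)
        return ngx.exit(500)
    end

    local user, err = red:get("token:" .. token)
    if not user or user == ngx.null then
        return ngx.redirect("/auth")
    end
}
```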
The code for working with Redis is very simple and no different from any other language. And all the I/O, there and here, is non-blocking: you write synchronous-looking code and it runs asynchronously. Just like with gevent, only done well.
Let's do the authorization itself:
We say that we need to read the request body, take the POST arguments and check that the login and password are correct. If they're wrong, we throw the user back to authorization. If they're right, we write the token to Redis:
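A sketch of that handler; check_password and generate_token are hypothetical helpers standing in for the real credential check and token generator, and the Redis key scheme is an assumption:

```nginx
location = /auth {
    content_by_lua_block {
        -- the body is not read by default; ask for it explicitly
        ngx.req.read_body()
        local args = ngx.req.get_post_args()

        -- check_password is a stand-in for your real credential check
        if not check_password(args.login, args.password) then
            return ngx.exit(ngx.HTTP_FORBIDDEN)
        end

        -- generate_token is a stand-in for your random-token generator
        local token = generate_token()

        local redis = require "resty.redis"
        local red = redis:new()
        red:set_timeout(100)
        red:connect("127.0.0.1", 6379)
        red:set("token:" .. token, args.login)
        red:expire("token:" .. token, 3600)   -- valid for an hour
    }
}
```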
And don't forget to set the cookie; that is also done in two lines:
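Something like this, inside the same handler after the token is written to Redis (the redirect target is an assumption):

```lua
ngx.header["Set-Cookie"] = "token=" .. token .. "; Path=/; HttpOnly"
return ngx.redirect("/search")
```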
The example is simple and contrived; we're certainly not going to build a service that shows people cats. Although, who knows. So let's go over what can actually be done in production.
Minimalistic backend . Sometimes we need to serve just a little data from the backend: substitute a date somewhere, output some list, show how many users are on the site right now, bolt on a counter or some statistics. Little things like that are very easy to do this way. It's quick, easy and great.
Preprocessing data . Sometimes we want to embed ads on our page, and we fetch the ads with API requests. That's very easy to do right here, without loading our backend, which is already working hard. We can collect it all in this layer. Likewise we can glue in some JS or, on the contrary, strip it out, preprocessing things before serving them to the user.
Facade for microservices . That's also a very good case; I've implemented it. Before that I worked at Tenzor, which does electronic reporting and handles reports for about half the legal entities in the country. We built a service where many things were done with exactly this mechanism: routing, authorization and more.
OpenResty can serve as glue for your microservices, providing a single entry point and a single interface. Microservices get written so that here you have Node.js, there PHP, there Python, and over there something in Erlang, and we understand that we don't want to rewrite the same code everywhere. So we can put OpenResty in front of it all.
Statistics and analytics . NGINX usually stands at the entrance and all requests go through it, so this is a very convenient place to collect data. You can compute something on the fly and push it somewhere, the same Elasticsearch or Logstash for example, or simply write it to the log and ship it off later.
Multi-user systems . Online games, for example, are also a great fit. Today in the Cape Town hall, Alexander Gladysh will talk about how to quickly prototype a multiplayer game using OpenResty.
Request filtering (WAF) . It's fashionable now to build all kinds of web application firewalls, and many services provide them. With OpenResty you can build yourself a web application firewall that simply and easily filters requests according to your own requirements. If your stack is Python, you know that nobody is going to sneak PHP onto you, unless of course you run it from the console. You know you have MySQL and Python, so someone will probably try a directory traversal or try to break into the database. So you can filter out the dumb requests quickly and cheaply right at the front.
Community. Since OpenResty is built on top of NGINX, it gets a bonus: the NGINX community . It's very large, and a decent share of the questions you'll have at first have already been solved by it.
Lua developers . Yesterday I talked to people who came for the HighLoad++ school day and heard the opinion that only Tarantool is written in Lua. That's not true; a lot of things are written in Lua. Examples: OpenResty, the XMPP server Prosody, the Love2D game engine; Lua is used for scripting in Warcraft and elsewhere. There are a lot of Lua developers, and they have a large and responsive community. All my Lua questions were resolved within a few hours. When you write to the mailing list, literally within minutes there's a bunch of answers explaining what's what. It's great; unfortunately, not every community is that welcoming.
OpenResty has a GitHub, where you can file an issue if something is broken. There's a mailing list on Google Groups where you can discuss general questions, and there's a list in Chinese too: you never know, maybe you don't speak English but know Chinese.
Results
I hope I managed to convey that OpenResty is a very convenient framework, honed for the web.
It has a low entry threshold, since the code is similar to what we already write, and the language is quite simple and minimalistic.
It provides asynchronous I/O without callbacks, so we get none of the callback noodles we sometimes write in NodeJS.
It's easy to deploy, since we only need NGINX with the required module and our code, and everything works right away.
Large and responsive community.
I didn't go into detail about how the routing is done; that would have been a very long story.