Prologue: First, a few words about the project itself, so you get an idea of how we worked on it and can feel some of the pain we went through.
I joined the project as a developer in 2015 or 2016 (I don't remember exactly), but it had already been running for 2-3 years by then. The project was very popular in its field, namely game servers. Strange as it may sound, game-server projects are still being developed today: I recently saw job openings and worked for a while in one such team. Since a game server is built on top of an existing game, development is done in the scripting language embedded in that game's engine.
We are developing a project on Garry's Mod (Gmod) almost from scratch. Note that at the time of writing, Garry is already building a new project, S&Box, on Unreal Engine, while we are still stuck on Source.
And Source is, in general, poorly suited to our server's theme.

“So what's so scary about your story?” you ask.
Our server has a demanding theme, “Stalker”, with role-playing (RP) elements on top, so the question immediately arises: “How do we fit all of this onto one server?”
The Source engine is old (Gmod uses the 2013 version, and it is 32-bit), so you can't build large maps, and there are tight limits on the number of entities, meshes and many other things.
Anyone who has worked with the engine will understand.
So the task looks nearly impossible: a pure multiplayer Stalker with quests, the RPG elements of the original and, ideally, a small storyline.
The initial development was hard: many actions such as dropping an item or picking one up had to be written from scratch. We hoped it would get easier later, but the requirements kept growing. The core game mechanics were ready; what remained was the AI, the upgrades and all sorts of other things. Overall, we coped as best we could.

Problems started as soon as the first release was in operation: lag and server delays.
You would think a powerful server could easily handle the requests and run the whole gamemode.
A simple description of a gamemode: it is the set of scripts that describes the mechanics of the server itself.
For example, if we want the now-popular “Battle Royale” theme, the mechanics have to match that theme as well: “players spawn on a plane, can pick up items and talk to each other, can't wear more than one helmet, etc.” All of this is described by the game mechanics on the server.
There was lag on the server side because of the large number of players, since each player consumes a lot of RAM, about 80-120 MB (not counting inventory items, skills, etc.), and on the client side FPS dropped sharply.
There was not enough CPU power to process physics, so we had to use fewer objects with physical properties.
On top of that came our home-grown scripts, which were not optimized at all.

First, of course, we read up on Lua optimization. It even got to the point where we wanted to write a DLL in C++, but the problem was getting clients to download the DLL from the server. A C++ DLL can quietly intercept data, so the Gmod developers put that file extension on the list of files clients will not download (for security, although in practice there never was much of it). It would have been convenient and made Gmod more flexible, but also more dangerous.
Then we looked at the profiler output (luckily, smart people had already written a profiler) and were horrified: it turned out that the Gmod library itself contained some very slow functions.
If you've ever written for Gmod, you know perfectly well that it has a built-in library called math.
And the slowest functions in it are, of course, math.Clamp and math.Round.
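Outside Gmod these two helpers don't exist, so here is a plain-Lua sketch of what they do, assuming math.Clamp(value, low, high) bounds a value to a range and math.Round(num, decimals) rounds half up to the given number of decimals. clamp and round are hypothetical stand-ins, not the Gmod source:

```lua
-- Hypothetical stand-ins for Gmod's math.Clamp and math.Round
-- (assumed behavior, not copied from the Gmod source).
local floor, min, max = math.floor, math.min, math.max

-- Keep `value` within [low, high].
local function clamp(value, low, high)
  return min(max(value, low), high)
end

-- Round `num` to `decimals` decimal places (0 by default), halves rounded up.
local function round(num, decimals)
  local mult = 10 ^ (decimals or 0)
  return floor(num * mult + 0.5) / mult
end

print(clamp(101, 0, 100), round(14.9122133), round(12.111, 1))
```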
Digging through other people's code, we noticed that these functions were scattered everywhere: they are used almost all over the place, and used incorrectly!
Let's get practical. Say we want to round the coordinates of a position vector to move an entity (for example, the player).
local x = 12.5
local y = 14.9122133
local z = 12.111
-- Lua is case-sensitive: the library is math, not Math
LocalPlayer():SetPos( Vector( math.Round(x), math.Round(y), math.Round(z) ) )
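The same rounding can be done with math.floor fetched once and reused. A minimal sketch, assuming round-half-up to whole numbers is what the snippet needs; roundPos is a hypothetical helper, and since Vector and LocalPlayer exist only in Gmod, it returns three plain numbers:

```lua
-- Fetch math.floor once; each call inside roundPos then reads an upvalue
-- instead of looking up the global `math` table and its `floor` field.
local floor = math.floor

-- Hypothetical helper: round each coordinate to the nearest integer (halves up).
local function roundPos(x, y, z)
  return floor(x + 0.5), floor(y + 0.5), floor(z + 0.5)
end

local rx, ry, rz = roundPos(12.5, 14.9122133, 12.111)
-- In Gmod these would feed Vector(rx, ry, rz).
print(rx, ry, rz)
```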
That's three rounding calls, which is nothing serious unless they sit in a loop or run often; Clamp is heavier still.
Code like the following is common in projects, and nobody wants to change it.
self:setLocalVar("hunger", math.Clamp(current + 1, 0, 100))
Here self refers to the player object, which has a local variable we invented; it is reset to zero when the player rejoins the server. math.Clamp is used here like a step in a loop: called repeatedly, it gives a smooth, bounded change, much as you would drive a smoothly animated interface with Clamp.
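To illustrate how such a counter behaves when the update runs repeatedly, here is a plain-Lua sketch. The Player table with setLocalVar/getLocalVar is a hypothetical stand-in for the real player object, not the actual Gmod or framework API, and hungerTick is an illustrative name:

```lua
local min, max = math.min, math.max
local function clamp(value, low, high)
  return min(max(value, low), high)
end

-- Hypothetical stand-in for a player object with per-player variables.
local Player = {}
Player.__index = Player

function Player.new()
  return setmetatable({ vars = {} }, Player)
end

function Player:setLocalVar(name, value)
  self.vars[name] = value
end

function Player:getLocalVar(name, default)
  local value = self.vars[name]
  if value == nil then return default end
  return value
end

-- One update step: raise hunger by 1, never past 100.
local function hungerTick(ply)
  local current = ply:getLocalVar("hunger", 0)
  ply:setLocalVar("hunger", clamp(current + 1, 0, 100))
end

local ply = Player.new()
for _ = 1, 150 do hungerTick(ply) end
print(ply:getLocalVar("hunger")) --> 100
```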
Problems arise when this runs for every player who joins the server. It doesn't happen often, but if 5-15 players join at the same moment (depending on the server configuration) and this small, simple function starts running for all of them, the server gets noticeable CPU stalls. It is even worse if math.Clamp is called in a loop.
The optimization is actually very simple: localize the expensive functions. It sounds primitive, but I've seen this slow code in three gamemodes and many add-ons.
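A minimal sketch of the localization idiom in plain Lua: library functions are cached in locals once at file scope (in Gmod the same idiom would be, for example, local Clamp = math.Clamp). clampAll is a hypothetical helper used only to show the pattern:

```lua
-- Cache library functions once per file: calls then read upvalues
-- instead of looking up the global `math` table and its fields each time.
local min, max = math.min, math.max

-- Hypothetical helper: clamp every value in an array in place.
local function clampAll(values, low, high)
  for i = 1, #values do
    values[i] = min(max(values[i], low), high)
  end
  return values
end

local hunger = clampAll({ -5, 50, 150 }, 0, 100)
print(hunger[1], hunger[2], hunger[3])
```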
If you need a value and will use it later, don't fetch it again as long as it doesn't change. A player joining the server gets a hunger of 100 in any case, so this code is several times faster:
local value = math.Clamp(current + 1, 0, 100)
self:setLocalVar("hunger", value)
So far so good. We kept digging into how everything worked and ended up optimizing everything.
We noticed that the standard for loop was slow, so we decided to reinvent our own, faster wheel (with blackjack, naturally), and the game began.

Spoiler: we even managed to write the fastest loop in Gmod Lua, provided there are more than 100 elements.
Given the time we spent on this loop and how little it is used in the code, the effort was in vain: it found a use only in spawning anomalies on the map after an emission and in cleaning them up.
Now to the code. Say you need to find all entities whose class name starts with anom_, which is the class prefix we use for anomalies.
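ents.FindByClass with the anom_* wildcard is Gmod-only. To experiment with the loops that follow in plain Lua, here is a sketch of a hypothetical stand-in that filters entity-like tables by class-name prefix; findByClassPrefix and the class field are assumptions for illustration:

```lua
-- Hypothetical stand-in for ents.FindByClass("anom_*"): return every
-- entity-like table whose `class` field starts with `prefix`.
local function findByClassPrefix(entities, prefix)
  local found = {}
  local len = #prefix
  for i = 1, #entities do
    local ent = entities[i]
    if ent.class:sub(1, len) == prefix then
      found[#found + 1] = ent
    end
  end
  return found
end

local world = {
  { class = "anom_electra" },
  { class = "npc_bandit" },
  { class = "anom_fruitpunch" },
}
local anomalies = findByClassPrefix(world, "anom_")
print(#anomalies) --> 2
```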
Here is the version for a normal Gmod Lua scripter:
local anomtable = ents.FindByClass("anom_*")
for k, v in pairs(anomtable) do
  v:Remove()
end
And here is the smoker's version:
At first glance it's obvious that such crappy code would be slower than the standard “for ... in pairs”, but as it turned out, it isn't.
local b, key = ents.FindByClass("anom_*"), nil
repeat
  key = next(b, key)
  -- next returns nil after the last key: guard the call and stop on nil
  if key ~= nil then b[key]:Remove() end
until key == nil
To analyze these loop variants properly, we need to translate them into plain Lua.
Let anomtable have 5 elements.
Removal is replaced by a simple addition. What matters is the difference in the number of instructions between the two loop implementations.
Vanilla loop:
local anomtable = { 1, 2, 3, 4, 5 }
for k, v in pairs(anomtable) do
  v = v + 1
end
Ours, the great one:
local b, key = { 1, 2, 3, 4, 5 }, nil
repeat
  key = next(b, key)
  b[key] = b[key] + 1
until key ~= nil
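One thing to keep in mind when reading this loop: repeat...until exits when its condition becomes true, so `until key ~= nil` stops right after next returns the first key, and on a non-empty table the body runs once. A plain-Lua sketch (function names are hypothetical) that counts the passes of that form and of a next-based loop that visits every key:

```lua
-- Same shape as the loop above: exits as soon as `key ~= nil` is true.
-- (On an empty table it would never exit, since next keeps returning nil.)
local function visitsAsWritten(t)
  local visits, key = 0, nil
  repeat
    key = next(t, key)
    visits = visits + 1
  until key ~= nil
  return visits
end

-- next-based traversal that visits every key: stop when next returns nil.
local function addOneToAll(t)
  local visits = 0
  local key = next(t)
  while key ~= nil do
    t[key] = t[key] + 1 -- assigning to an existing key during next() is allowed
    visits = visits + 1
    key = next(t, key)
  end
  return visits
end

print(visitsAsWritten({ 1, 2, 3, 4, 5 })) --> 1
print(addOneToAll({ 1, 2, 3, 4, 5 })) --> 5
```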
Let's look at the interpreter bytecode (it's like assembler, so high-level programmers are advised not to open the spoiler).
Just in case, keep the juniors away from the screens. I warned you.
Vanilla loop disassembly:
; Name: for1.lua
; Defined at line: 0
; #Upvalues: 0
; #Parameters: 0
; Is_vararg: 2
; Max Stack Size: 7
1 [-]: NEWTABLE R0 5 0 ; R0 := {}
2 [-]: LOADK R1 K0 ; R1 := 1
3 [-]: LOADK R2 K1 ; R2 := 2
4 [-]: LOADK R3 K2 ; R3 := 3
5 [-]: LOADK R4 K3 ; R4 := 4
6 [-]: LOADK R5 K4 ; R5 := 5
7 [-]: SETLIST R0 5 1 ; R0[(1-1)*FPF+i] := R(0+i), 1 <= i <= 5
8 [-]: GETGLOBAL R1 K5 ; R1 := pairs
9 [-]: MOVE R2 R0 ; R2 := R0
10 [-]: CALL R1 2 4 ; R1,R2,R3 := R1(R2)
11 [-]: JMP 13 ; PC := 13
12 [-]: ADD R5 R5 K0 ; R5 := R5 + 1
13 [-]: TFORLOOP R1 2 ; R4,R5 := R1(R2,R3); if R4 ~= nil then begin PC = 12; R3 := R4 end
14 [-]: JMP 12 ; PC := 12
15 [-]: RETURN R0 1 ; return
Our loop disassembly:
; Name: for2.lua
; Defined at line: 0
; #Upvalues: 0
; #Parameters: 0
; Is_vararg: 2
; Max Stack Size: 6
1 [-]: NEWTABLE R0 5 0 ; R0 := {}
2 [-]: LOADK R1 K0 ; R1 := 1
3 [-]: LOADK R2 K1 ; R2 := 2
4 [-]: LOADK R3 K2 ; R3 := 3
5 [-]: LOADK R4 K3 ; R4 := 4
6 [-]: LOADK R5 K4 ; R5 := 5
7 [-]: SETLIST R0 5 1 ; R0[(1-1)*FPF+i] := R(0+i), 1 <= i <= 5
8 [-]: LOADNIL R1 R1 ; R1 := nil
9 [-]: GETGLOBAL R2 K5 ; R2 := next
10 [-]: MOVE R3 R0 ; R3 := R0
11 [-]: MOVE R4 R1 ; R4 := R1
12 [-]: CALL R2 3 2 ; R2 := R2(R3,R4)
13 [-]: MOVE R1 R2 ; R1 := R2
14 [-]: GETTABLE R2 R0 R1 ; R2 := R0[R1]
15 [-]: ADD R2 R2 K0 ; R2 := R2 + 1
16 [-]: SETTABLE R0 R1 R2 ; R0[R1] := R2
17 [-]: EQ 1 R1 K6 ; if R1 == nil then PC := 9
18 [-]: JMP 9 ; PC := 9
19 [-]: RETURN R0 1 ; return
At a quick, inexperienced glance, the normal loop looks faster because it has fewer instructions (15 vs 19).
But don't forget that every interpreter instruction costs CPU cycles, and not all instructions cost the same.
Judging by the disassembly, the first loop uses TFORLOOP, a ready-made instruction for generic iteration: the global pairs is fetched and called once, and then the loop steps through the elements and adds a constant.
The second variant works differently and relies more on memory: on every pass it calls next, reads the table element, changes it, writes it back to the table, checks the key for nil and jumps back.
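For context on how the two variants relate: in Lua 5.1 (which Gmod's LuaJIT follows), pairs(t) simply returns next, t and nil, so the generic for in the vanilla loop is itself driven by next. A small plain-Lua sketch showing that a generic for over `next, t` visits the same elements as one over pairs(t):

```lua
local t = { 1, 2, 3, 4, 5 }

-- pairs(t) hands the generic for the iterator triple next, t, nil ...
local viaPairs = 0
for _, v in pairs(t) do
  viaPairs = viaPairs + v
end

-- ... so passing next and t directly runs the same TFORLOOP-style iteration.
local viaNext = 0
for _, v in next, t do
  viaNext = viaNext + v
end

print(viaPairs, viaNext)
```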
In our view, the second loop is faster because the first one packs too many conditions and actions into a single instruction (R4,R5 := R1(R2,R3); if R4 ~= nil then begin PC = 12; R3 := R4 end), so that instruction eats a lot of CPU cycles to execute, and it is also more tied to memory.
With a large number of elements, the TFORLOOP-based loop loses to ours in the time it takes to pass over all the elements. This is because addressing an element directly is faster and carries none of the extras that come with pairs. (And our loop contains no negation.)
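Claims like this are worth re-measuring on your own setup. Below is a rough plain-Lua sketch (not the authors' benchmark; N and the time helper are arbitrary assumptions) that times pairs, a next-based while loop and a numeric for over the same array with os.clock. No particular result is implied: timings depend on the interpreter and machine, and Gmod runs LuaJIT, where JIT compilation can change the picture.

```lua
-- Hypothetical micro-benchmark: time three traversals of the same array.
-- os.clock() measures CPU time; results vary by Lua build and machine.
local N = 200000
local t = {}
for i = 1, N do t[i] = i end

local function time(label, fn)
  local start = os.clock()
  local result = fn()
  print(string.format("%-8s %.4f s", label, os.clock() - start))
  return result
end

local sumPairs = time("pairs", function()
  local s = 0
  for _, v in pairs(t) do s = s + v end
  return s
end)

local sumNext = time("next", function()
  local s, k = 0, next(t)
  while k ~= nil do
    s = s + t[k]
    k = next(t, k)
  end
  return s
end)

local sumIndex = time("numeric", function()
  local s = 0
  for i = 1, #t do s = s + t[i] end
  return s
end)

-- All three traversals must agree; only the timings are of interest.
assert(sumPairs == sumNext and sumNext == sumIndex)
```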
Between us, any use of negation in code slows it down; we have confirmed this with tests and over time. Our explanation is that negative logic runs slower because the processor's ALU has a separate “inverter” unit, and applying a unary operator (not, !) has to go through it, which takes extra time.
Conclusion: the standard way is not always the best, and reinventing your own wheel can pay off, but on a real project you shouldn't do it if you care about release speed. As a result, development has run from 2014 to this day, making the project our own kind of “Zhdun” that everyone keeps waiting for. It may look like an ordinary game server that is set up in one day and fully configured for play in two, but you have to be able to bring something new to it.
This long-running project did eventually get a second version, with heavy optimization throughout the code, but I'll cover those other optimizations in future articles. Criticism and comments are welcome; correct me if I'm wrong.