📜 ⬆️ ⬇️

Black holes in the development of Web project

It's about PHP, JavaScript and MySQL as a starting point. I will also give some figures of performance tests and time losses that can kill a project, using the example of one of the products that I recently had to open to find problem areas, and show how to kill a project in three steps.



Preface.


Recently, I was given the task to help find problems in one of the web projects created as usual in PHP + MySQL, and all this is also wrapped up in the Symfony Framework. The database began to grow strongly, as people collected behavioral events (for example, a fleet of vehicles) that flowed literally every 5 minutes. Naturally, the event table has grown, and in order for MySQL to cope with it at least somehow, it was broken into partitions. In the end, it all came down to all sorts of samples and reports, i.e. analytics. As a result, even a simple sample for the period, plus a small calculation, took from 11 seconds and more. Apparently, therefore, it was decided to limit the selected period in days.
')


And so it went:

Problem 1 - Base


The base was chosen MySQL. Apparently the fact that it is free, bribes everyone. But I consider weighing on the third-tier base, heavy analytics, I consider the first mistake and the most important one. On a test sample of 700K entries in the event table, the sample for the period increased to 30 seconds . All the complexity was in the data. The events table had a “different type of event” entry with the ON or OFF property, i.e. the beginning and the end of a certain event. Everything is pretty standard, and anyone would easily make a sample:

SELECT ev1.event_point,ev1.event_date, (select event_date from test_events where ev1.event_point = event_point and event_type = (CASE WHEN ev1.event_type = 1 THEN 2 ELSE 1 END) and event_date > ev1.event_date limit 1) as end_date FROM test_events as ev1 WHERE ev1.event_date BETWEEN '2012-1-1' AND '2012-1-31' AND ev1.event_type IN (1 , 2) 


for example and test, one type of event 1 beginning and 2 end was used, although in the base there are about 40 pairs. As a result, simply selecting the records for the period, with the addition of the end of this event immediately, takes about 20-30 seconds from MySQL.
For comparison, the free database, SQL Server 2008 in the Web edition, was also taken, all of which was launched on MS Server 2008 R2 in the VirtualBox virtual environment. I will say right away that the Web edition, though free, does not have a number of important options, especially important for analytics, for example, there is no optimization and caching for View and Procedure.
All tests showed that MS SQL easily executes a query in less than 1 second with 1KK data.
Next, it was necessary to make a simple calculation and choose how long those or other types of events took for a certain piece of time. This is not difficult to calculate at the SQL server level, we impose a condition on top and we get a ready-made report:

 SELECT event_point,SUM(TIMESTAMPDIFF(HOUR,event_date,end_date)) FROM ( SELECT ev1.event_point,ev1.event_date, (select event_date from test_events where ev1.event_point = event_point and event_type = (CASE WHEN ev1.event_type = 1 THEN 2 ELSE 1 END) and event_id > ev1.event_id limit 1) as end_date FROM test_events as ev1 WHERE ev1.event_date BETWEEN '2012-1-1' AND '2012-1-31' AND ev1.event_type IN (1 , 2) ) report GROUP BY event_point 


Such an example on the same MS SQL database is performed instantly, in less than 1 second. But with MySQL you can not wait for the result, increasing the period, the base went tight. Apparently, those who collected the project understood this and decided to stop processing the sample right in PHP. And in order not to lose everything as well 20-30 seconds per JOIN, it was decided to make a simple SELECT for the period, throw everything into Array and there, by running, find the beginning and the end and then easily give the amounts. What was done, in the end, I again came to a dead end, the growth of the period began to take up to 1 minute. on the calculation. And I began to study the second problem, trying to understand why just running an Array takes such a long time. I was hoping that there would be just a typo or mistake, but it turned out that just running Array takes a lot of time. Here is the second problem.

Problem 2 - Array in PHP


Having studied all the data, I realized that PHP cannot quickly handle large arrays. After reading in various forums, I saw discussions confirming this version. This is a PHP problem. It was decided to write a simple test to verify the facts.

 <?php $summary[] = array(); $count = 5000; for($i1=0;$i1<$count;++$i1){ $summary[$i1] = $i1; } $t = microtime(true); for($i1=0;$i1<$count;++$i1){ for($i2=0;$i2<$count;++$i2){ if("5468735354987"!="654654655465"){ $summary[$i1] = $i1*$i2; } } } echo "<li>time: ".(microtime(true)-$t).' ms</li>'; $sum = 0; for($i1=0;$i1<$count;++$i1){ $sum = $sum + $summary[$i1]; } echo "<li>test["+$sum+"]"; ?> 


The code was created based on the task in the project. In this test, the search is 5000 * 5000 = 25KK cycles, and even ++ ++ i was used instead of $ i ++ to raise the speed. As a result, this test gave 11 seconds . Attempting to run this code just directly in the console without a Web server, gave me 10 seconds . And it all started on my computer and not on the host or virtual machine, and all with the configuration: Intel Core i5, 8GB. PHP 5.3.9
Realizing that for PHP this is the limit that you can squeeze, I decided to check this code on other platforms, having a virtual machine with MS Server 2008 R2 on hand. The test code was easily transferred to ASP, ASPX, WSC, VBS and NodeJS. As a result, I received the following data:

image

In the screenshot we see three test options. 1) virtual machine 2) my computer 3) servers available to me on the Internet.

1. PHP surprisingly gave very close results, considering that everything was run perfectly on different resources and capacities.
2. ASP is a close analogue of PHP that gave a worse result, and it also depends on the capacity of the environment.
3. WCS was very disappointing, given that Microsoft once described it as a compiled version of ASP, which should work much faster, which turned out to be completely different.
4. VBS is a purely console version of the script, although it showed better results than PHP, but for a web project it is unacceptable.
5. ASPX showed just great results. Here I am not surprised, all the same it is C #
6. NodeJS also issued, as expected, just excellent performance.

Here are all the script options:

test.php
 <?php $summary[] = array(); $count = 5000; for($i1=0;$i1<$count;++$i1){ $summary[$i1] = $i1; } $t = microtime(true); for($i1=0;$i1<$count;++$i1){ for($i2=0;$i2<$count;++$i2){ if("5468735354987"!="654654655465"){ $summary[$i1] = $i1*$i2; } } } echo "<li>time: ".(microtime(true)-$t).' ms</li>'; $sum = 0; for($i1=0;$i1<$count;++$i1){ $sum = $sum + $summary[$i1]; } echo "<li>test["+$sum+"]"; ?> 


test.asp
 <% count = 5000 Dim summary(5000) for i1=0 to count summary(i1) = i1 next t = timer() for i1=0 to count for i2=0 to count if ("5468735354987"<>"654654655465") then summary(i1) = i1*i2 end if next next response.write timer()-t sum = 0 for i1=0 to count sum = sum + summary(i1) next response.write "sum:"&sum response.write "<br>test-wsc<br>" Dim obj Set obj = GetObject("script:"&Server.MapPath("test.wsc")) Call obj.run() %> 


test.wsc
 <?xml version="1.0"?> <component> <?component error="true" debug="true"?> <registration description="test" progid="test" version="0.1" classid="{13e4b1b3-c698-40ea-8450-9cbc9b33ef03}" > </registration> <public> <method name="run"/> </public> <implements type="ASP" id="ASP"/> <script language="VBScript"> <![CDATA[ Function run() count = 5000 Dim summary(5000) for i1=0 to count summary(i1) = i1 next Dim t : t = timer() for i1=0 to count for i2=0 to count if ("5468735354987"<>"654654655465") then summary(i1) = i1*i2 end if next next response.write (timer()-t) sum = 0 for i1=0 to count sum = sum + summary(i1) next response.write "sum:"&sum End Function ]]> </script> </component> 


test.vbs
 count = 5000 Dim summary(5000) for i1=0 to count summary(i1) = i1 next t = timer() for i1=0 to count for i2=0 to count if ("5468735354987"<>"654654655465") then summary(i1) = i1*i2 end if next next msgbox (timer()-t) sum = 0 for i1=0 to count sum = sum + summary(i1) next msgbox "sum:"&sum 


test.aspx
 <%@ Page validateRequest="false" Debug="true" %> <%@ Import Namespace="System.IO" %> <%@ Import Namespace="System.Data" %> <%@ Import Namespace="System.Diagnostics" %> <%@ Import Namespace="System.Threading" %> <script language="C#" runat="server"> public void Page_Load(Object sender, EventArgs E) { int count = 5000; int[] summary = new int[5000]; for (int i1 = 0; i1 < count; i1++) { summary[i1] = i1; } Stopwatch stopwatch = new Stopwatch(); stopwatch.Start(); for (int i1 = 0; i1 < count; ++i1) { for (int i2 = 0; i2 < count; ++i2) { if("5468735354987"!="654654655465"){ summary[i1] = i1*i2; } } } stopwatch.Stop(); Response.Write("<li>time: " + (stopwatch.ElapsedMilliseconds) + " ms</li>"); long sum = 0; for (int i1 = 0; i1 < count; ++i1) { sum = sum + summary[i1]; } Response.Write("<li>test[" + sum + "]"); } </script> 


test_nodejs.js
 var fs = require('fs'), http = require('http'); http.createServer(function (req, res) { res.writeHead(200, {'Content-Type': 'text/html'}); summary = new Array(); count = 5000; for(var i1=0;i1<count;++i1){ summary[i1] = i1; } var start = new Date().getTime(); for(var i1=0;i1<count;++i1){ for(var i2=0;i2<count;++i2){ if("5468735354987"!="654654655465"){ summary[i1] = i1*i2; } } } res.write("<li>time: "+(new Date().getTime() - start)+" ms"); var sum = 0; for(i1=0;i1<count;++i1){ sum = sum + summary[i1]; } res.write("<li>sum: ["+sum+"]"); res.end(); }).listen(1337, '127.0.0.1'); console.log('Server running at http://127.0.0.1:1337/'); 



The data obtained suggested that JavaScript in the browser is similar to ASPX and NodeJS, so it was decided to check the same thing just in the browser. Again, the code is instantly converted to a test under HTML and we get the following code:

 <!DOCTYPE html> <html> <head> <script> function testClear() { var summary = []; var count = 5000; for(var i1=0;i1<count;++i1){ summary[i1] = i1; } var start = new Date().getTime(); for(var i1=0;i1<count;++i1){ for(var i2=0;i2<count;++i2){ if("5468735354987"!=="654654655465"){ summary[i1] = i1*i2; } } } document.getElementById("testClear").innerHTML = "time: "+(new Date().getTime() - start)+" ms"; var sum = 0; for(i1=0;i1<count;++i1){ sum = sum + summary[i1]; } alert("sum: ["+sum+"]"); } </script> </head> <body> <p><a href="#" onclick="testClear();"><div>test clear:<span id='testClear'></span></div></a></p> </body> </html> 


Running in the browser gave me 75 ms , which is just fine. The solution is clear right away - cut the array processing from PHP and transfer it to the client in the array, and there to do the calculation. What was done ... BUT! .. time at startup again turned out to be over 10 sec . All tests and searches in the code, brought me to a new problem called Prototype .

Problem 3 - Prototype and AJAX


The fact is that in the project interface, everything was built on the version with AJAX. Those. developers have done in the spirit of Web 2.0, fashionable and beautiful. The user selects data selection options, clicks CREATE REPORT and receives a loaded report by Ajax below. Everyone thinks it is now standard, to issue information without rebooting. Here lies the treacherous enemy.
It turns out that Prototype has two options for loading the page, for which the evalScripts [true / false] parameter is responsible. When (evalScripts = false), Prototype loads the information and displays it on the screen simply as text, and naturally all the blocks with JavaScript will not be executed. This is where the dynamically loaded pages are processed and evalScripts = true is set, and it just parses with the execution of eval () for the script part. As a result, instead of a clean code with its 75 ms, we get about 11 seconds !!! For me it was a shock. It turns out that you can not download the Ajax code and execute it. And this fear was confirmed as soon as I calculated the specific report in the interface header. Those. in the loaded page, left Array issued by PHP and made an external function call with the calculation from the global environment passing the array as a parameter, and got its 75 ms back .

This fact prompted me to write another test, with the same code, but to test different browsers and also compare it with the Ajax behavior in JQuery .

In the end, I got the following data:

image

Code samples can be seen here:

test.htm
 <!DOCTYPE html> <html> <head> <script> function testClear() { var summary = []; var count = 5000; for(var i1=0;i1<count;++i1){ summary[i1] = i1; } var start = new Date().getTime(); for(var i1=0;i1<count;++i1){ for(var i2=0;i2<count;++i2){ if("5468735354987"!=="654654655465"){ summary[i1] = i1*i2; } } } document.getElementById("testClear").innerHTML = "time: "+(new Date().getTime() - start)+" ms"; var sum = 0; for(i1=0;i1<count;++i1){ sum = sum + summary[i1]; } alert("sum: ["+sum+"]"); } </script> </head> <body> <p><a href="#" onclick="testClear();"><div>test clear:<span id='testClear'></span></div></a></p> </body> </html> 


test_jquery.htm
 <!DOCTYPE html> <html> <head> <script type='text/javascript' src='jquery.js'></script> <script> function test() { $('#test').html('loading...'); $.get('test_ajax.html', function(data) { $('#test').html(data); }); } </script> </head> <body> <p><a href="#" onclick="test();"><div>test jquery ajax:<span id='test'></span></div></a></p> </body> </html> 


test_prototype.htm
 <!DOCTYPE html> <html> <head> <script type='text/javascript' src='prototype.js'></script> <script> function testFn() { $('test').update('loading...'); new Ajax.Updater('test','test_ajax.html', { method: 'get', asynchronous:true, evalScripts:true, onSuccess: function(transport) { $('test').update('in process...'); }, onFailure: function(transport) { alert("Error"); } }); } </script> </head> <body> <p><a href="#" onclick="testFn();"><div>test prototype ajax:<span id='test'></span></div></a></p> </body> </html> 


test_ajax.html
 <script> summary = new Array(); count = 5000; for(i1=0;i1<count;++i1){ summary[i1] = i1; } var start = new Date().getTime(); for(var i1=0;i1<count;++i1){ for(var i2=0;i2<count;++i2){ if("5468735354987"!="654654655465"){ summary[i1] = i1*i2; } } } document.getElementById("test").innerHTML = "time: "+(new Date().getTime() - start)+" ms"; var sum = 0; for(i1=0;i1<count;++i1){ sum = sum + summary[i1]; } alert("sum: ["+sum+"]"); </script> 



As a result, the picture came out pretty scary.
1. Why 64bit version of IE with clean code gave such a loss of speed, working in 64bit environment, I did not understand
2. The speed of Javascript from under the Prototype is just awful. Between the clean code and the option with Ajax, the difference is huge.
3. jQuery surprised too. Although not as much as the Propotype, it is also far from the pure execution of the JS code.
4. Further analysis leads to a complete dead end. Why Chrome and FireFox gave such terrible results compared to IE. Moreover, IE9 was in a virtual machine, and IE10 was already just on the computer.

There were no other options to double-check. What was at hand, and then used. You can take and check for yourself. But the situation is, in principle, clear.

Problem 4 - String Replace


Literally at the time of this writing, I came across one more problem, but already in another web project, and already on ASP. There, the base issued reports, and as a result, an output of about 3.5Mb was obtained, with placed markers for subsequent filling. As a result, only the operation with Replace () took about 11 seconds. The breakdown into pieces with a run, as well as a regular schedule, did not yield a speed increase. As a result, another test was written to compare different platforms. Having a small HTML report, so under 3.5Mb in size, I decided to feed it to the code and perform replays under the same conditions in order to compare it in the same way as with Array.

So, we have a test file of 3.5 M in size, with tags placed in the text for the test, as well as the same version of the code imported to different platforms.

Having banished the test, we get the following picture:

image

In this test (local host) it’s just UsbWebServer running on my computer, the rest are different providers that were accessed.

1. ASP has huge problems with strings. Regular version gave the result only worse, as well as breaking into small pieces and processing. There is also a huge dependence on the server and its capacity.
2. It also turned out with an ASPX script. In a virtual machine, it gave 13 seconds , but on a simple web server, in the simplest version of a web host that offers 1and1, the speed dropped 4 times, even though I ran the script in a virtual machine with dedicated resources for it.
3. PHP again showed a certain stability, but now the Kiev hoster failed, apparently there were very few resources allocated for the host, or the server was busy at night.
4. NodeJS gave the best result in this race. But, unfortunately, I was not pleased, as more than 2 seconds - this is quite critical, despite the fact that the report was not the largest. So the problem with parsing had to be solved by searching for other methods.

Code options:

test_replace.asp
 <% 'time: 13.58594 'time: 58.01563 Set FSO = CreateObject("Scripting.FileSystemObject") Set File = FSO.GetFile(Server.MapPath("test_replace.txt")) Set Reader = FSO.OpenTextFile(File, 1,False) buffer = Reader.ReadAll Reader.Close t = timer() For rec=1 to 10 For col=0 to 100 buffer = Replace(buffer,"{{"&rec&":"&col&"}}","-|-") Next Next response.write "time: "&timer()-t response.write "<hr>" response.write buffer %> 


test_replace.aspx
 <%@ Page validateRequest="false" Debug="true" %> <%@ Import Namespace="System.IO" %> <%@ Import Namespace="System.Data" %> <%@ Import Namespace="System.Diagnostics" %> <%@ Import Namespace="System.Threading" %> <script language="C#" runat="server"> public void Page_Load(Object sender, EventArgs E) { //time: 8736 ms //time:[1and1] 16183 ms string buffer = File.ReadAllText(Server.MapPath("test_replace.txt")); Stopwatch stopwatch = new Stopwatch(); stopwatch.Start(); for (int rec = 0; rec < 10; rec++) { for (int col = 0; col < 100; col++) { buffer = buffer.Replace("{{" + rec + ":" + col + "}}", "-|-"); } } stopwatch.Stop(); Response.Write("<li>time: " + (stopwatch.ElapsedMilliseconds) + " ms</li>"); Response.Write(buffer); } </script> 


test_replace.php
 <?php //time: 4.570 ms //time:ks.ua 10.3491752148 ms //time:servmax 5.7612838745117 ms $buffer = file_get_contents('test_replace.txt'); $t = microtime(true); for($rec=0;$rec<10;++$rec){ for($col=0;$col<100;++$col){ $buffer = str_replace( "{{".$rec.":".$col."}}","-|-", $buffer ); } } echo "time: ".(microtime(true)-$t).' ms'; echo "<hr>"; echo $buffer; ?> 


test_replace_nodejs.js
 var fs = require('fs'), http = require('http'); http.createServer(function (req, res) { fs.readFile('test_replace.txt', function (err, data) { data = String(data); if (err) { res.writeHead(404); res.end(JSON.stringify(err)); return; } res.writeHead(200, {'Content-Type': 'text/html'}); var start = new Date().getTime(); for (var rec = 0; rec < 10; ++rec) { for (var col = 0; col < 100; ++col) { data = data.replace(new RegExp( "{{" + rec + ":" + col + "}}", "g" ), "-|-"); } } console.log(new Date().getTime()-start); res.end(data); }); }).listen(1337, '127.0.0.1'); console.log('Server running at http://127.0.0.1:1337/'); 



Total


PS I wanted to describe briefly, issuing only numbers, but without explanation, it would probably be incomprehensible to someone. So I apologize for the long text, well, the style of writing, for me it is still quite a big problem ... I am not a writer.

PSS Very confused by the tendency to load all projects by default on MySQL. Still, this is the base of the third echelon, and it is more suitable for Wordpress, forum and other minor tasks. The fact that the base is free, in the end goes much more expensive. Instead of doing all the analytics right in the database, instructing it to be engaged in optimization and caching, the developers are engaged in writing PHP code kilometers instead of a single query in the database. If the question is only in price, then take MS SQL Server in the Web edition, which is also free, but it will give you all the advantages of analytics, or use Oracle, after all, these are high-end databases. PostgresSQL did not produce an acceptable result, so it was immediately dropped for further testing. If someone suddenly finds it interesting, here is an analogue of the SQL Server code that you can export to this or that database:

test.sql
 CREATE TABLE test_events ( event_id int IDENTITY (1, 1), event_type int, event_point int, event_date datetime ) -------------- ALTER TABLE [test_events] ADD CONSTRAINT [IDX_event_type] DEFAULT ((1)) FOR [event_type] GO ALTER TABLE [test_events] ADD CONSTRAINT [IDX_event_point] DEFAULT ((1)) FOR [event_point] GO ALTER TABLE [test_events] ADD CONSTRAINT [IDX_event_date] DEFAULT (NULL) FOR [event_date] GO -------------- DECLARE @counter bigint = 0; DECLARE @event_type smallint = 1; -------------- BEGIN TRANSACTION; WHILE @counter < 1000000 BEGIN IF @event_type=1 SET @event_type=2 ELSE SET @event_type=1 INSERT INTO test_events (event_type,event_point,event_date) VALUES (@event_type,ROUND(5 * RAND() + 100,0),DATEADD(MINUTE,@counter*10,'1/1/2011')) SET @counter = @counter + 1 END COMMIT TRAN; -------------- SET NOCOUNT ON SELECT event_point,SUM(DATEDIFF(hour,event_date,end_date)) FROM ( SELECT ev1.event_point,ev1.event_date, (select top 1 event_date from test_events where ev1.event_point = event_point and event_type = (CASE WHEN ev1.event_type = 1 THEN 2 ELSE 1 END) and event_id > ev1.event_id) as end_date FROM test_events as ev1 WHERE ev1.event_date BETWEEN '2012-1-1' AND '2012-1-31' AND ev1.event_type IN (1 , 2) ) report GROUP BY event_point 



Thanks to all! I hope that this information will be useful to you.

Source: https://habr.com/ru/post/162885/


All Articles