📜 ⬆️ ⬇️

What I do not like modern web



The very first step in working with the web is sending data to your server application. And if the parsing of a dozen small lines can be entrusted to the framework, then what about loading files?

Take for example PHP, although the described is true for 99% of other languages ​​and technologies. Suppose we want to allow the user to upload pictures to the site, for this we make a file type field and ... Outwardly, everything is very simple, only a few bytes have changed in the form and in the code, but now instead of text from forms you can work with files! But not everything is so simple, your file must first be located somewhere in / tmp /, until the request is complete, your script just does not get control and you can not do anything with it. For example, instead of a picture, the user has downloaded the exe-file, but you will know about it only after the download is completed. Thus, an attacker for some time may clog the channel and time of your disk subsystem, pretending that it loads useful files, and you will not even know about it. If the caching server is in front of the application server, then the situation is even worse: for example, nginx creates temporary files, i.e. First, the request from the user will settle on the disk, as soon as it is completed, the file is re-read and quickly sent to the application server (in our case, php), where it is saved to disk BACK. Total triple use of the disc, even if the user just needs to display the message “you forgot to enter the captcha”.

I'm not talking about the fact that more fun things with this approach can not be done. A simple feature like the "file download indicator" becomes inaccessible. For more complex examples: Youtube shows footage from a movie that is still being downloaded, but not up to the end. You won’t get any control (and even notifications!) Before the entire movie (2 gigabytes, for example) is loaded. You will not even know that someone has burned 1.5 gigabytes of your disk, but then closed the browser or clicked the “refresh” button in the browser, without waiting for anything.
')
Of course, there are various crutches of varying degrees of curvature, which allow you to solve typical tasks, such as “getting request statistics via json”, implemented as web server modules, but such things have to be done independently and / or attached to the environment, the application ceases to be independent and grows addicted to specific applications and their libraries.

Cache


Caches are vital. The caching technique allows you to speed up the responsiveness of your site and reduce the load on it, allowing you not to do the same type of operation for multiple requests. For example, how many do not do 2 + 2, there will always be 4, but calculating 2 + 2 takes server resources, it is much more profitable to calculate this value 1 time, when the first visitor arrives, and then write down somewhere, so that all other users will issue this ready result.

Do not confuse this caching with http-headers, they only have an effect on a specific client (in the original also on intermediate proxies), while caching on the server is designed to give the same content to multiple users.

Alas, it’s not profitable to give caching to an intermediate server; somepage.html? someshit = 12345 will break through the cache and a new page will be formed, which will not even take this parameter into account, but nevertheless, there will be expenses on its creation, which can again be used by an attacker. Therefore, the intermediate cache server is now very few people use and rely on them is already very difficult.

It remains to cache everything inside the application, almost every framework provides its own crutches, and ready-made like all sorts of memeshchedov and the like shit. Usually, novice developers simply write with boiling water when they find out that when generating a page, you can make 500 requests to memshed and there will be nothing for it (unlike all your favorite mysql). As a result, the entire code is covered with wrappers, which first check the result of the request in memkesh, and then climb over the result in mysql. I do not argue, manual control over the cache is necessary, but full manual control leads to possible errors, where caching can simply be forgotten to include what, according to the law of meanness, will be a critical place.

Interfaces


What interface should the site have? Just do not say that greenbacks.

Previously, as a rule, the presentation of the site was one and indivisible. However, on the large portals there were buttons “version for printing”, or even a “wap-version”, which was later replaced by the “PDA-version” - already plain HTML, only more simple. Although looking where, if you take Twitter, then this is the only readable version of him. Time passed and there was a need to edit the sites not only for printers and phones, but also for all ipads and refrigerators with HTML5 support. Naturally, the developers loved it all, because they had to actually do 10 versions of the same site, and they decided to do something universal. Some kind of API for the site.

What is the API - I do not know. Honestly, I do not know. Usually this is some kind of shit when you spit on a server with a piece of urlencoded string, and in return you get a piece of json / xml / plain text, how lucky it is. Of course, no standards, even primitive data types can be anything, an empty result can also be anything from null to "" (empty quotes), or even no result at all. It is good if the authors read about such a word as REST and rushed to implement it, but against the background of the rest it does not make any sense. The functionality is also not clear, if by requesting an HTML page we can get everything at once (latest news, comments, likes, etc.), how many requests we need to make in the case of the API - only the author of this API knows, it is quite possible that comments can be obtained, but likes to them are not. In fact, an API is a way to separate client and server development into completely different development teams that interact poorly with each other. And there can be no talk about the usefulness of this API.

And often it happens that for access to the API requires a certain key. Keys can often be obtained for free. Why not get such a key? The problem is that this leads us to explicitly taking into account requests, to some internal accounting. And who knows when the author wants to monetize all this? It is possible that after a while the free keys will be turned off or severely restricted by offering to use the paid rate. And sometimes in general, the issuance of all keys is suspended, and service is stopped, even on a commercial basis, this also happens. Well, why API? It is easier for me to pull off a page and parse it with regulars, than to tolerate such a mess. Therefore I almost never use these your API, especially I am not going to deal with their features.

Source: https://habr.com/ru/post/426297/


All Articles