Short announcement
I was going to add a couple of notes to the write-up on authorization and registration on the site, but, whether from lack of sleep or from the numerous cups of coffee drunk today, I was drawn into the weeds and sketched out a whole set of notes. Maybe some of you will find something new here, maybe someone will pass a new rule on to someone else, and maybe someone will correct me, teaching me and others a lesson. Below are some notes on working with the database, further down several notes on the work of the server itself, and so on.
Let's get started
In fact, almost every article I found on the Internet covers either the registration/authorization mechanism of a site or the security of such a mechanism. This write-up is rather loose and covers only the main points of the logic behind a site's registration/authorization mechanism.
0) User anycolor comments:
In PHP it is recommended to use the crypt() function. It's time to forget about md5. With the arrival of PHP 5.5 there will be built-in functions for password hashing.
Below, md5 is used, but I don't think using crypt() instead will be a particular problem.
Moreover, as alexkbs points out:
All passwords made of lowercase Latin letters and digits, 4 to 7 characters long and hashed with a salt as md5(md5()), can be brute-forced on modern publicly available hardware in less than a minute.
Link. Likewise, as user charon suggests, it makes sense to consider:
bcrypt is an adaptive cryptographic hash function designed for secure password storage. Based on Blowfish. Wiki
PBKDF2 (Password-Based Key Derivation Function) is a standard for deriving a key from a password. Wiki
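Since the comments above recommend crypt() over plain md5, here is a minimal sketch of bcrypt hashing through crypt(); it is an illustration rather than anyone's production code, and it assumes PHP 5.3.7+ (for the $2y$ prefix) and the openssl extension:

// A sketch of bcrypt via crypt(); function names are made up for the example.
function bcryptHash($password, $cost = 10) {
    // 22 salt characters drawn from the bcrypt alphabet [./A-Za-z0-9]
    $salt = substr(strtr(base64_encode(openssl_random_pseudo_bytes(16)), '+', '.'), 0, 22);
    return crypt($password, sprintf('$2y$%02d$%s', $cost, $salt));
}
function bcryptCheck($password, $hash) {
    // crypt() re-reads the cost and salt embedded in $hash itself
    return crypt($password, $hash) === $hash;
}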
1) Store passwords with salt.
Out of laziness to rewrite everything, and because it would be pointless, below is a link to the original post, from which I will take only a part:
Link
The goal is to prevent a password from being recovered from its hash (some Internet services offer exactly that, for a fee or for free) in case the hash goes astray (a database leak or something else).
We store the password not in the clear but hashed, adding a few random characters unique to each user (the so-called salt). Better still, store a double md5 with salt. Cracking a hash stored this way is almost impossible. In the user table we need two fields:
1. salt field for salt storage;
2. password field to store the hash md5(md5(password) + salt).
For example, user authorization:
// $user is the user's row fetched from the database
if (md5(md5($_POST['password']) . $user['salt']) == $user['password']) {
    // the password is correct: log the user in
}
Where do we get the salt? We need to generate a salt for each new user in the registration script. For this, you can use the following function:
function generateSalt() {
    $salt = '';
    $length = rand(5, 10); // random salt length, from 5 to 10 characters
    for ($i = 0; $i < $length; $i++) {
        $salt .= chr(rand(33, 126)); // a random printable character from the ASCII table
    }
    return $salt;
}
Various salting options in well-known engines:
md5($pass . $salt) - used in Joomla
md5(md5($pass) . $salt) - used in vBulletin
md5(md5($salt) . md5($pass)) - used in the new IP.Board
This method gives a significant increase in security in the event of a database leak.
Comment by maximw:
Superposition of hashes increases hashing time, but it also increases the probability of collisions. It is therefore better to use one slow algorithm from the start than a fast one several times over. Accordingly, php.net does not recommend the md5 and sha1 algorithms, suggesting Blowfish hashing instead.
PHP has a great uniqid() function for salt generation. Using a prefix you can easily and quickly get a guaranteed unique salt.
2) Password brute-forcing applies, essentially and methodically, both to admin and to ordinary user privileges. To make sure this method does not bring much pleasure (many will remember captcha, and rightly so, but the tendency is that whatever mechanism a captcha uses, algorithms for bypassing or recognizing it always appear), it is enough to turn password guessing into a long and painful hell by adding a delay to the authorization function: a sleep() call right before the user verification code itself. Some add busy loops and the like, but I don't think it's worth reinventing the wheel. Some 10 seconds is insignificant for an individual user, but during a brute-force run every attempt stretches by +10 seconds: 1000 attempts are already 10,000 seconds, which is almost three hours.
But here we must also take parallelization of authorization requests into account: an attacker can fire off several authorization requests at once, which significantly speeds up the guessing. We therefore need to limit this possibility, while still letting several users behind the same provider log in. In essence, we keep in memory the IPs that have just attempted authorization and delete them after a certain timeout; say, at most three authorization attempts within 15 seconds from one IP. If three instances of the same IP are already present, we sleep for a few seconds with sleep() and recheck, after which we either show a page with the error "Number of authorization attempts from one IP exceeded. Try again in a couple of minutes", or sleep and recheck again; that part is up to you. The main thing to remember is not to overload this mechanism with excesses either, since that translates directly into load on the server. You can act even more cunningly at the server firewall level: iptables, for example, can limit the number of connections from one IP and drop anything over the limit. I will not give the rules here, since this article is more about the rationale than the technical implementation.
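A very rough sketch of such a limiter, assuming a login_attempts table with ip and attempted_at columns and a PDO connection in $pdo (all names are illustrative):

$ip = $_SERVER['REMOTE_ADDR'];

// How many attempts has this IP made in the last 15 seconds?
$stmt = $pdo->prepare(
    'SELECT COUNT(*) FROM login_attempts
     WHERE ip = ? AND attempted_at > NOW() - INTERVAL 15 SECOND');
$stmt->execute(array($ip));

if ($stmt->fetchColumn() >= 3) {
    sleep(5); // stall once more, then refuse
    exit('Number of authorization attempts from one IP exceeded. Try again in a couple of minutes.');
}

$pdo->prepare('INSERT INTO login_attempts (ip, attempted_at) VALUES (?, NOW())')
    ->execute(array($ip));

sleep(10); // the flat per-attempt delay from point 2
// ...and only now run the actual password check.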
As user youlose points out:
sleep is also a wrong decision, because you will simply have idle threads of the PHP interpreter hanging around, and they consume a lot of memory; by the way, this is another vector for DoS attacks. Hehe, with such protection I could send your hosting down through proxies. Read about limit_req_zone for nginx or mod_evasive for apache.
It is worth considering.
So that the user is not alarmed that authorization is slow, it is enough to display a message explaining that this is done to protect the users' own data, and to show a progress indicator so it is clear the process is running rather than hung.
Another point is blocking an account after several unsuccessful authorization attempts; everyone probably knows this already, but it should not be forgotten.
As user amarao points out, accounts can be deliberately locked out this way. Apparently, one should block by IP, and lift the block from the IP after a timeout.
3) Since I have decided to speculate on the ways a site can be brought down, the next idea that came to me was that the method above should also be applied to registration, since one can mock the database and squat all the popular nicknames on the site so that users cannot register. Thus, to destroy the site and deprive it of life, we deprive it of new users. We fight this as before: we limit registration by time and by IP. Suppose someone registers nicknames on the site with a script, pulling them from a file with some leaked database. As protection, so that such a mass registration can be rolled back, it is better to record the IP from which each nickname was registered, for example in a LastIp field, and naturally to require registration with unique emails that must be confirmed. Then such registered nicknames can be quickly removed from the database, freeing the namespace for regular users.
Comment by maximw:
To avoid "cybersquatting" of logins, it is enough to use the confirmation email as the login, and to limit the time allowed for confirmation, for example to a day.
4) To spare the user the step of confirming the mailbox through a link during registration, it is easier to send an automatically generated password to the specified mailbox, so that the user can change the password to his own after logging in to the site.
As user xnim pointed out, the generated password itself should already imply a sufficient degree of security; randomization and cryptographic methods must be used. For example, in PHP:
$pswd = substr(md5(date("l dS of F Y h:i:s A") . rand()), 5, 15);
5) Password recovery happens only by first sending a key to the mailbox; the user must then enter the key received in the mailbox and only then change the password to his own. Alternatively, a password is generated and sent to him automatically, with the suggestion to change it after logging in to the site.
A comment by youlose that is worth considering:
You need to send a link with this key already inserted, so that following the link brings up the password change form right away. The less useless information a user has to enter, the more users your websites will have =)
6) Naturally, in addition to salting and hashing and not storing the password in the clear, you need to check whether the user's cookies have been stolen. In essence, we verify with a certain algorithm, for example:
if ($sess_key == md5(md5($ip) . md5($uagent)))
We check the user's browser and IP address; if something has changed, we destroy the session:
session_destroy();
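Put together, a sketch of that check might look like this (assuming session_start() has already been called; the key name is illustrative):

function sessionFingerprint() {
    // the same algorithm as above: hash of IP plus hash of user agent
    return md5(md5($_SERVER['REMOTE_ADDR']) . md5($_SERVER['HTTP_USER_AGENT']));
}

// at login:
$_SESSION['sess_key'] = sessionFingerprint();

// on every subsequent request:
if (!isset($_SESSION['sess_key']) || $_SESSION['sess_key'] !== sessionFingerprint()) {
    session_destroy(); // IP or browser changed: treat the cookie as stolen
    header('Location: /login');
    exit;
}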
7) Be sure to check user-entered fields for code injection and SQL injection; there are various functions for this, for example:
strip_tags()
stripslashes()
htmlentities()
If you need to keep part of the HTML, there is the function:
string strip_tags(string $str [, string $allowable_tags])
This function attempts to return the string str with HTML and PHP tags stripped out. It issues a warning in the case of incomplete or bogus tags.
You can also find libraries on the net without any particular trouble that strip unwanted tags, detect and close unclosed ones, remove dangerous pieces such as XSS attempts via injected JS code, and more.
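A small sketch of the built-in functions in action; the table and variable names are made up, and the prepared statement (here via PDO) is what actually guards against SQL injection:

$comment = trim($_POST['comment']);

// For output into HTML: turn markup into harmless text, so injected <script> is displayed, not run
$safeHtml = htmlentities($comment, ENT_QUOTES, 'UTF-8');

// Or keep a whitelist of harmless tags and strip the rest
$partialHtml = strip_tags($comment, '<b><i><p>');

// For SQL: bound parameters keep user data out of the query text entirely
$stmt = $pdo->prepare('INSERT INTO comments (user_id, body) VALUES (?, ?)');
$stmt->execute(array($userId, $safeHtml));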
8) Use GET requests only when really necessary. That is, we mainly use POST; the advantages are obvious: nothing extra is visible, the data is hidden from prying eyes, and the URL stays neat and does not prevent the user from sharing a clean link if need be. These are not the only benefits.
9) Pass the session identifier through cookies; otherwise it ends up in the URL, which essentially "dirties" the link. Search engines do not like this, since for them each such URL is new, and that has consequences for the site.
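In PHP this behavior can be forced with two ini settings; a minimal sketch:

ini_set('session.use_only_cookies', 1); // ignore session ids passed in GET/POST
ini_set('session.use_trans_sid', 0);    // never rewrite URLs to append PHPSESSID
session_start();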
10) Do not forget to log errors. Every exception should be recorded in the logs so that you can see what is going on and whether someone is torturing your site.
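For instance, a minimal sketch of a global exception handler that appends to a log file (the log path is an assumption; adjust to your setup):

set_exception_handler(function ($e) {
    $line = date('c') . ' ' . get_class($e) . ': ' . $e->getMessage()
          . ' in ' . $e->getFile() . ':' . $e->getLine() . "\n";
    error_log($line, 3, '/var/log/mysite/errors.log'); // type 3 = append to file
    header('HTTP/1.1 500 Internal Server Error');
    echo 'Something went wrong.'; // never show the user a stack trace
});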
11) Select only the necessary data from the database. If you write select * from user; and the data gets lured out, or if you write like that everywhere out of habit, sooner or later you will make a mistake and user data will start leaking onto the network. Secondly, the more data you pull from the database, the longer the query runs, and under load all of this will certainly affect performance. You need to get used to working not in the easiest way but in the proper way; then problems will not arise.
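The difference is one line; column names here are illustrative:

// Bad habit: drags out the hash, the salt, and everything else
// $stmt = $pdo->query('SELECT * FROM user WHERE id = 42');

// Better: name exactly the columns this page needs
$stmt = $pdo->prepare('SELECT id, nickname, email FROM user WHERE id = ?');
$stmt->execute(array(42));
$user = $stmt->fetch(PDO::FETCH_ASSOC);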
12) Generate an XML sitemap file for search engines, and meta tags. To some extent this improves search rankings. Ready-made algorithms for generating meta tags from the content can be found on the net.
13) Use a framework or a CMS/CMF according to the task. Every site is a fairly large amount of work, and a good foundation greatly simplifies your life, as does the portability of your framework (accumulated over the years) to other projects that need it. If, for example, it is just a blog, you can take one of the CMSs, say drupal, wordpress, and others; if the task is more global and an engine would be too hard to rework, it is better to take a framework such as kohana, yii, zend, and others. Such frameworks already contain database query builders, a caching system (which benefits performance), routing, image processing, ORM, data validation, and the ability to plug in additional modules developed by the authors or the community. This significantly reduces development and debugging time, thanks to the polish of the code and the attention paid to performance.
14) Instead of relying on the height and width attributes of the img tag, it is better to save several versions of an image with different parameters, for example full_img.png (full size) and preview_img.png (a small copy). That way you do not transfer large pictures unnecessarily (which would slow page loading and eat bandwidth), and the browser does not have to scale them, which also affects page rendering speed.
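A sketch of generating such a preview with the GD extension at upload time; the file names match the example above, the rest is assumed:

function makePreview($srcPath, $dstPath, $maxWidth = 200) {
    list($w, $h) = getimagesize($srcPath);
    $newW = $maxWidth;
    $newH = (int)($h * $maxWidth / $w); // keep the aspect ratio

    $src = imagecreatefrompng($srcPath);
    $dst = imagecreatetruecolor($newW, $newH);
    imagecopyresampled($dst, $src, 0, 0, 0, 0, $newW, $newH, $w, $h);
    imagepng($dst, $dstPath, 9); // 9 = maximum compression

    imagedestroy($src);
    imagedestroy($dst);
}

makePreview('full_img.png', 'preview_img.png');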
15) Index fields in the database. Indexed fields greatly affect the performance of both the database and the site as a whole. With a large number of queries they also greatly reduce the load on the server and cut costs.
anycolor:
You need to index fields wisely, or you can create plenty of problems for yourself. Not every index brings good. First profile the slow queries, look at their EXPLAIN, and only then decide whether to create keys and which ones.
16) Use caching for both data and queries. For example, in kohana 3 you can cache a database query by adding ->cached(30) to the query builder chain, which caches the query for 30 seconds; and whole blobs of data can be stored with the cache of your choice: a file cache, memcached, sqlite, etc. But remember to keep to the golden mean with the cache, because data can grow stale: for example, if a user edits an article, we must reset the saved cache value so as not to serve old data back to him. He would certainly not understand such a reaction.
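Outside of kohana the same pattern is easy to do by hand; a sketch with the Memcached extension, where key and table names are made up:

$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

function getArticle($id, $pdo, $cache) {
    $key = 'article_' . $id;
    $article = $cache->get($key);
    if ($article === false) { // cache miss: go to the database
        $stmt = $pdo->prepare('SELECT id, title, body FROM articles WHERE id = ?');
        $stmt->execute(array($id));
        $article = $stmt->fetch(PDO::FETCH_ASSOC);
        $cache->set($key, $article, 30); // keep it for 30 seconds
    }
    return $article;
}

function updateArticle($id, $title, $body, $pdo, $cache) {
    $pdo->prepare('UPDATE articles SET title = ?, body = ? WHERE id = ?')
        ->execute(array($title, $body, $id));
    $cache->delete('article_' . $id); // drop the stale copy right away
}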
17) If you have a dedicated server, use nginx as the frontend and apache2 (or an alternative) as the backend, dividing the load between them. Much has been written about this on the net.
18) Use xcache/eaccelerator/another accelerator for opcode caching, which also contributes to performance.
19) One of the basic rules: normalizing and designing a database is not just a word. It is a science of sorts and the golden rule of those who do it professionally. You should not build a site on a database where one table has, say, 80 fields for all occasions; such a site can already be considered dead. Logical partitioning of data into tables, use of keys, and the other normalization rules (within reason, of course) affect performance in the most direct way.
Comment by maximw:
In web development right now, denormalizing the database to speed up queries is very relevant. It would be nice to mention it.
Denormalization is described in this post by user JuliaTem.
20) One of the most important rules of administration goes: "First forbid everything, and only then allow just what is needed." For many the opposite is easier, "allow everything, then forbid", but with that approach you are sure to forget something, and that will seriously undermine the security of the site.
21) Using a JOIN in a query is better than several separate queries. If you are on good terms with the JOIN operator, I think the rest needs no explanation. The fewer queries to the database, the faster the site and the lower the load on the server/hosting; all the more so since joins with the JOIN operator should be done on indexed fields.
A refinement from user maximw:
A rather controversial statement about JOIN. A great deal depends on the specific situation. It happens, for example, that two consecutive SELECTs are faster.
In short: the right tool should be used in the right place. =)
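For illustration, a sketch of both variants; the schema is assumed:

// One round trip: a JOIN over indexed columns
$stmt = $pdo->query(
    'SELECT a.id, a.title, u.nickname
     FROM articles a
     JOIN user u ON u.id = a.user_id');

// Versus the two consecutive SELECTs maximw mentions, which can sometimes win:
// SELECT id, title, user_id FROM articles;
// SELECT id, nickname FROM user WHERE id IN (...);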
22) Integrity of the data in the database. Always keep an eye on related data, so that you do not delete a user while leaving his articles, or delete an article without deleting its comments. Otherwise the database swells and accumulates garbage, which is far harder to track down and clean later than it is to handle the entire logical chain at once.
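One way to have the database watch the chain for you is InnoDB foreign keys; a sketch, with all DDL names assumed:

// Deleting an article now removes its comments automatically
$pdo->exec(
    'CREATE TABLE comments (
         id INT AUTO_INCREMENT PRIMARY KEY,
         article_id INT NOT NULL,
         body TEXT,
         FOREIGN KEY (article_id) REFERENCES articles(id) ON DELETE CASCADE
     ) ENGINE=InnoDB');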
23) Data becomes fragmented, which affects performance, so the data in the database must be defragmented from time to time. In MySQL everything for this is already provided.
Link to the FAQ.
Q. How to optimize storage in MySQL?
Clean up the “holes” (defragmentation), update the statistics and sort the indices:
OPTIMIZE TABLE tablename;
or use: myisamchk --quick --check-only-changed --sort-index --analyze
Attention, myisamchk needs to be started when mysqld is _not_ running, otherwise you need to use the mysqlcheck utility
(mysqlcheck --repair --analyze --optimize --all-databases --auto-repair)
Optimizer statistics update:
ANALYZE TABLE tablename;
or use: myisamchk --analyze
It is recommended to regularly perform:
isamchk -r --silent --sort-index -O sort_buffer_size=16M db_dir/*.ISM
myisamchk -r --silent --sort-index -O sort_buffer_size=16M db_dir/*.MYI
24) Track slow queries and try to optimize them as much as possible:
In the MySQL config (my.cnf) you need to add the following two lines:
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 1
An explanation:
log_slow_queries is the file in which we save the log;
long_query_time is the query execution time, in seconds, that we already consider long enough to log.
25) Consider different options... Not so much a technical detail, but... There are currently many DBMSs, both SQL and NoSQL, plus frameworks, libraries, and so on. All of this is highly desirable to weigh at the design stage: if your project grows into something powerful, all your mistakes will surely surface. Take even the storage engines available in a MySQL database; there are enough of them that the advantages of each are worth considering. Or take, say, MariaDB... or PostgreSQL... And besides SQL solutions, there are NoSQL ones worth a look as well.
26) Do not forget about configuring the services themselves. A great deal has been written about tuning MySQL, Apache2, and Nginx; read up on it.
27) Serve static content from the frontend. nginx handles static files superbly, without involving the backend at all.
28) The choice of file system also plays its role: each has its own characteristics and suits its own tasks. Ext2/3/4, ReiserFS, FAT, NTFS, and others behave differently under different loads.
29) Tune the parameters of the operating system kernel itself. On Linux, BSD, and others this is done through the sysctl mechanism. To list all available parameters:
sysctl -a
To set a parameter:
sysctl net.ipv4.ip_forward=1
To have such settings applied again after a reboot, they can be added, for example, to:
/etc/rc.local
Values set with sysctl take effect immediately on a running system, but are lost on reboot unless persisted this way.
There are plenty of such knobs on *nix systems, and a fair amount has been written about them, for example about IO.
30) Systemd versus classic init systems. Even Debian has been discussing a move to it. Briefly, the points in SystemD's favor over SysV init:
- Dependencies between services are declared explicitly (what is required by what) instead of being implied by the fragile startup ordering of SysV scripts.
- SysV init scripts shell out to external utilities such as grep and find on every boot; systemd units need none of that.
- SysV services are started through sh scripts; systemd does without them.
- Processes are tracked through cgroups, so a service cannot quietly escape its supervisor, for example by double-forking.
- systemd supervises services itself and can restart them when they die. Yes! No more monit watching pid files!
- Mounting is handled natively alongside services, instead of fstab plus /etc/init.d/<service> start/stop hand-offs.
- Integration with D-Bus.
31) Building from source. Gentoo is built entirely around the idea of compiling software for your own hardware; on Debian there is apt-build for the same purpose. Compiling critical packages yourself with make and suitable optimization flags can squeeze out additional performance, at the cost of build time. Whether that trade-off is worth it depends on the project.
32) Protect the server itself, not only the site. First of all, secure ssh access. fail2ban watches the ssh logs and, after repeated failed attempts, blocks the offending IPs via iptables or hosts.deny. Portsentry detects port scans and reacts to them. chkrootkit checks the system for rootkits. And configure iptables itself sensibly. This is no longer about PHP, but it matters no less.