📜 ⬆️ ⬇️

Protection against SQL injection in PHP and MySQL

To my surprise, I did not find an exhaustive article on the topic of protection against injections on Habré. Therefore, I decided to write my own.

Somewhat extensive disclaimer, not directly related to the issue
Let's recognize the fact: the number of articles (and comments) on the topic of protection against SQL injections that have appeared on Habré lately, tells us that the glade is not so well trampled as some believe. Moreover, the repetition of the same mistakes suggests that some errors are too stable, and not just a listing of standard techniques is required, but a detailed explanation of how they work and in what cases should be used (and in which cases not).

The article turned out to be rather long - it contains research results for several years - but I will try to set out the most important information in a very compact way at the very beginning, and bring more detailed arguments and illustrations, as well as various curiosities and interesting facts, at the end. Also, I will try to finally dispel the multiple misconceptions and superstitions associated with the topic of protection from injections.
')
I will not try to portray the polyglot and write recommendations for all databases and languages ​​at once. I have enough experience only in web development, on a bunch of PHP / MySQL. Therefore, all practical examples and recommendations will be given for these technologies. Nevertheless, the theoretical principles outlined below are applicable, of course, for any other languages ​​and DBMS.

I’ll immediately reply to the standard remark about ORM, Active record and other query builders: first, all these wonderful tools are not born by magic of the sea foam, but are written by programmers using all the same sinful SQL. Secondly, we will be realistic: the listed technologies are good, but in practice, raw SQL is constantly encountered in our work - whether it is a legacy code or a spreading JOIN, which is translated to ORM - more expensive. So let's not bury our heads in the sand and pretend that there is no problem.

Although I have tried to highlight in detail all the nuances, but it is quite possible that some of my conclusions may not seem obvious. I fully admit that my context and reader contexts may vary. And things that seem self-evident to me are not so for some readers. In this case, I will be glad to questions and clarifications that will help me correct the article, making it more understandable and informative.

Having just started to be interested in the topic of protection against injections, I always wanted to formulate a set of rules that would be both exhaustive and compact. Over time, I managed it:

The rules, the observance of which guarantees us from injections


  1. We substitute data into the request only through placeholders
  2. identifiers and keywords are substituted only from the white list, written in our code.

Only two points.
Of course, the practical implementation of these rules needs more detailed coverage.
But this list has a great advantage - it is accurate and comprehensive. Unlike the mass-conscious rules of “driving user input through mysql_real_escape_string” or “always use prepared expressions,” my set of rules is not a catastrophic error (like the first) or incomplete (like the second).

But forward, reader - let's proceed to a detailed analysis.

Placeholders - Data Substitution

In principle, everything is simple: any data should fall into the query not directly, but through a certain representative, wildcard expression.
The request is written in the form, for example,
SELECT * FROM table WHERE id > ? LIMIT ? 
and the data is added and processed separately.
But why is this better than “ordinary escape”? And all:

Therefore, the best option would be to format the data immediately before the execution of the request - so we will always be sure that the data is formatted correctly, this is done only once, and the formatted data will go strictly according to its purpose - in the database and nowhere else.
And that's just the purpose of such - timely, safe and correct - data processing and placeholders serve, giving us a guarantee of security , at the same time simplifying the code.

Examples of using:

From the manual:
 $stmt = $dbh->prepare("SELECT * FROM REGISTRY where name LIKE ?"); $stmt->execute(array("%$_GET[name]%")); $data = $stmt->fetchAll(); 

An example of what you should strive for:
 $ban = $db->getRow("SELECT 1 FROM ban WHERE ip = inet_aton(s:)", $ip); 

As you can see, it is not difficult, and also - with skillful use - it turns out much shorter than making a request with your hands. Do you still want to write in the old-fashioned way?

Important note: Of course, data substitution through placeholders must always be done, regardless of the source of the data or any other conditions.

Identifiers and Keywords - Whitelists

The overwhelming majority of articles devoted to injections, completely overlook this moment. But the reality is that in it we are faced with the need to substitute in the query not only data, but also other elements - identifiers (field and table names) and even syntax elements, keywords. Even insignificant such as DESC or AND, but the safety requirements for such substitutions should still be no less strict!

Let us examine a rather banal case.
We have a database of products that is displayed to the user in the form of an HTML table. The user can sort this table by one of the fields, in any direction.
That is, at least from the user's side, the name of the column and the direction of sorting come to us.
Substitute them in the request directly - guaranteed injection. The usual formatting methods will not help here. Prepared expressions with neither identifiers nor keywords will lead to nothing but an error message.
The only solution is whitelisting .
This is, of course, not a Newton binomial, and many developers easily implement this paradigm along the way, faced for the first time with the need to substitute the field name into a query. However, the article on protection against injection without this rule will be incomplete, and the defense itself will be full of holes.

The essence of the method lies in the fact that all possible choices should be strictly written in our code, and only they should be included in the request , based on user input.

Application example:

 $order = isset($_GET['order']) ? $_GET['order'] : ''; //     $sort = isset($_GET['sort']) ? $_GET['sort'] : ''; $allowed = array("name", "price", "qty"); //  $key = array_search($sort,$allowed); //      $orderby = $allowed[$key]; //  (,     - ) . $order = ($order == 'DESC') ? 'DESC' : 'ASC'; //    $query = "SELECT * FROM `table` ORDER BY $orderby $order"; // 100%  

I used to assume that a placeholder is sufficient for identifiers. But over time, an understanding of the shortcomings of this method came:

So now I use both methods:
First I get the ID from the white list.
And then I add it through the placeholder - just to avoid manual formatting. And for uniformity. In this case, the last line will look like
 $query = "SELECT * FROM `table` ORDER BY n: $order"; 

In principle, this information is enough to start writing completely secure queries. But, as usual, in real life there are different nuances, and I would like to understand the mechanism of placeholders in more detail.

Work with placeholders


First you need to understand that there are two options for the implementation of placeholders - server and client:
Each of the methods has its advantages and disadvantages, which we consider below.
It should also be remembered that the PDO sits on these two chairs at the same time - by default it works according to the second variant, only emulating the first one. This functionality can be disabled by forcing the PDO to send data to the server separately from the request.
 $dbh->setAttribute( PDO::ATTR_EMULATE_PREPARES, false ); 
But since even emulation is done without the participation of a programmer, below we will consider PDO as a representative of server placeholders.

Server placeholders

To begin with, let's formulate - why are injections possible at all?
In fact, the SQL query is a program. A full program - with operators, variables and string literals. The problem is that we are collecting this program dynamically, on the go. Unlike our PHP scripts, which are written once and for all, and do not change based on the incoming data, the SQL query is dynamically re-generated each time. And, as a result, incorrectly formatted data can spoil the request, or even change it by substituting operators for which we have not anticipated. Actually, this is the essence of injection.

What does the server handling of placeholders offer us?
A very simple thing: we bring into our program such a thing as ... variables ! Yes, the placeholder is a common variable that is hard coded in our SQL “script” and does not change depending on the data. And the data itself goes to the server separately from the request, and never intersects with it. Only after the request is interpreted, the data will be used directly at the execution stage.
In practice, it looks like this: when you call prepare (), our request goes to the server directly in this form - with placeholders / variables, the server parses it and signals - “okay, ready to accept data” (well, or reports an error). And then, when executing execute (), data is already being sent to the server (and not in text form, but in a binary package, in a structure resembling the one in which the query result is returned), and participate directly in the execution.

In theory, it sounds very tempting.
However, in practice, unfortunately, in the existing libraries for working with Mysql in PHP, the implementation of working with prepared expressions is still very far from ideal.
It is enough to cite such, for example, facts:
These facts tell us that both libraries are still quite raw, and we can expect from them a lack of other necessary functionality.

We list the main disadvantages of existing libraries


and analyze them more

Verbosity.

Take such, for example, a demanded operation, like getting all the rows of the result of a query into a two-dimensional array. There is still no such function in mysqli. Or, say, to bind variables to placeholders there can only be a separate function.
As a result, it will take at least nine (!) Lines of code to retrieve the data of a single request:
 $data = array(); $query = "SELECT Name, Population, Continent FROM Country WHERE Continent=? ORDER BY Name LIMIT 1"; $stmt->prepare($query); $stmt->bind_param("s", $continent); $stmt->execute(); $result = $stmt->get_result(); while ($row = $result->fetch_array(MYSQLI_NUM)) { $data[] = $row; } 

Moreover, most of this code does not carry any semantic load, is absolutely the same for all executed queries and is repeated many times in hundreds of scripts.
while only two lines are enough to get data:
 $query = "SELECT Name, Population, Continent FROM Country WHERE Continent=? ORDER BY Name LIMIT 1"; $data = $db->getAll($query,$continent); 

PDO has a little better with this - they guessed to allow the transfer of an array to execute () and there is a method fetchAll (). But you still have to write a lot of unnecessary uniform code for the most simple operations.
In particular - let's return to the topic of our article - binding. Here I have the variable $ _GET ['id']; I want to tie her to a placeholder. Great, I can do it right in execute () ... but only by making it an array. What for? Is the driver himself unable to do this for me?

Function failure

Another problem that has already been raised in the comments to a recent article is the IN () operator. Making substitutions for it is a very nontrivial task. Although it would seem, it was for such cases that the placeholders were invented:
 $conts = array('Europe','Africa','Asia','North America'); $query = "SELECT * FROM Country WHERE Continent IN(?) ORDER BY Name LIMIT 1"; $data = $db->getAll($query,$conts); 

Instead of dozens of lines of clever code.
What conclusion can we draw from this? It is necessary to finish.

Inability to get SQL query

Someone will say that he does not need it, someone dodges and writes a request with his hands or uses sly software. We will respect the opinion of these people, but the fact remains that the functionality of the output of a ready request is convenient for debugging, and server placeholders do not allow it.

Performance

Usually, apologists for server-prepared expressions stress the fact that the request is parsed only once.
Unfortunately, this does not work for a web application. That copy of the script that executed prepare () executes execute () for this query exactly 1 time and safely dies. And a new copy of re-do prepare. As a result, there is more work where they wanted to save.
In the comments to a recent article, I was pointed to another potential speed gain - caching the query plan. Indeed, even if we prepare for each request, the database can cache the same request for different data. And get a execution plan without parsing, but with a simple string comparison!
Alas, I am not so strong in the internal MySQL device to confirm or deny both the presence of the mechanism itself and its practical effectiveness.
At the same time, as far as I know, under large real loads, server-prepared expressions lose speed in terms of standard SQL queries.
In any case, the topic is open and still waiting for its researcher. In the end, DBMS versions grow and test results become obsolete.

In general, we found out that the tools provided by the DBMS and the drivers for working with it were not as good as they appeared in the advertisement. And the question arises - can we implement the work with the placers on our own? And there is an answer - we can!

Self implementation of placeholders


In general, in fact, no one bothers us to improve the usability of existing libraries, without fussing on placeholders. Say, write a wrapper over PDO that implements the missing functionality using PDO's own placeholders (or mysqli, as was done in the library, published a year ago in Habré).

But we have several reasons to consider homemade placeholders:
First, as we have already seen, the set of placeholders available in standard libraries is not enough.
Secondly, server placeholders for some reason may not be suitable for us.
Thirdly, on the example of self-processing of placeholders, we consider the nuances of correctly formatting SQL queries.

Principles of formatting various SQL query elements

Implement homemade placeholders is a snap. The primitive parser is already built into PHP. The only thing we need is to learn to distinguish the various elements of the request. But this is a very important point, which is worth staying in more detail.
Since the formatting rules depend on the type of the element, we need, first, to clearly understand which element of the query we substitute into it. And secondly, we need to somehow communicate this information to the processor of the placers.

To begin with, let's determine which elements a query can consist of?
Take, for example, the following SQL:
 INSERT INTO `db`.`table` as `t1` VALUES('string',1,1.5,NOW()); 
It can be divided into three main groups of elements:
We will be interested in the last two points, since only they require special formatting.

Now we will think about how to communicate information about the type of inline data to our handler. The existing solutions make it crooked (which should not surprise you anymore). There are two options here: either you have to set the type by calling the binding function (which immediately complicates the code many times), or not to set it at all, as the PDO does, if you pass the data straight to execute (). But it is impossible to do without the type at all, and therefore all PDO data transmitted to execute is treated as strings. That leads to funny consequences if PDO works in compatibility mode: when you try to pass parameters for LIMIT to execute (), PDO will crash with an error message, in which you can see quotes that the fair library has framed with an offset and limit!

In general, you need another solution. And it is! But we'll talk about it below, but for now let's consider the formatting rules.

Identifier formatting

In general, the rules for naming identifiers are quite extensive . But given that we use white lists for security, and the placeholder only for formatting, these two will suffice:
 function escapeIdent($value) { return "`".str_replace("`","``",$value)."`"; } 

Lyrical digression:
Do I need or not need to format identifiers? After all, in most cases it is not required?
If the request is written by hand, the need can be determined on the spot: the request is working - you can not format it; crashes with an identifier - must be formatted.
If we use a placeholder - that is, we add the identifier to the request dynamically - it is necessary to format it, because we do not know which field name will be inserted into the request, and, therefore, whether it needs formatting or not. So we will format everything.

Actually, if we talk about the use of placeholders in general, it is precisely the consistent, without exception, application of the formatting rules and allows us to speak of guaranteed protection against injections and, which is important, of errors. After all, formatting is primarily done in order to ensure the syntactic correctness of the request. And protection from injections is just a side effect.
Therefore, I prefer to use the term "formatting" rather than "escaping"

Formatting string literals

The most beaten seemingly topic. , , , — . , SQL.



, ? , .
PHP, mysql_real_escape_string() : « , SQL-. » — , - !

PDO — , : PDO::quote() , — . :
  function escapeString($value) { return "'".mysqli_real_escape_string($this->connect,$value)."'"; } 

, « » mysqli_set_charset / mysql_set_charset , PDO — DSN.


, , .
, . , .
— PHP. , PHP_INT_MAX, . intval().

, PHP , - DECIMAL MySQL.

« »

, .
, — sprintf(). , . ! , , , . , , — PDO, mysqli . .

, sprintf . , sprintf , . — . , . , — -.
, , , . . , .

. , . , sprintf() — , .
? , - — ! , . , .

 function query(){ $query = array_shift($args); $query = str_replace("%s","'%s'",$query); foreach ($args as $key => $val) { $args[$key] = mysql_real_escape_string($val); } $query = vsprintf($query, $args); if (!$query) return FALSE; $res = mysql_query($query) or trigger_error("db: ".mysql_error()." in ".$query); return $res; } 

.
: , , , %s %d LIMIT, , ( ).

, printf() — a , , , .
, , , , , 3-4 .
 function dbget() { /* usage: dbget($mode, $query, $param1, $param2,...); $mode - "dimension" of result: 0 - resource 1 - scalar 2 - row 3 - array of rows */ $args = func_get_args(); if (count($args) < 2) { trigger_error("dbget: too few arguments"); return false; } $mode = array_shift($args); $query = array_shift($args); $query = str_replace("%s","'%s'",$query); foreach ($args as $key => $val) { $args[$key] = mysql_real_escape_string($val); } $query = vsprintf($query, $args); if (!$query) return false; $res = mysql_query($query); if (!$res) { trigger_error("dbget: ".mysql_error()." in ".$query); return false; } if ($mode === 0) return $res; if ($mode === 1) { if ($row = mysql_fetch_row($res)) return $row[0]; else return NULL; } $a = array(); if ($mode === 2) { if ($row = mysql_fetch_assoc($res)) return $row; } if ($mode === 3) { while($row = mysql_fetch_assoc($res)) $a[]=$row; } return $a; } ?> 


:
 $name = dbget(1,"SELECT name FROM users WHERE id=%d",$_GET['id']); //      $user = dbget(2,"SELECT * FROM users WHERE id=%d",$_GET['id']); //     $sql = "SELECT * FROM news WHERE title LIKE %s LIMIT %d,%d"; $news = dbget(3,$sql,"%$_GET[search]%",$start,$per_page); //   

, , : KISS , DRY.

, . .
, , . .

-, , . , - — , , , , .

-, . PDO .

 [az]:[az]* 

For example,
 i: 
or
 s:name 

, — .
, , — .

-, IN() ( )
 function createIN($data) { if (!is_array($data)) { throw new E_DB_MySQL_parser("Value for a: type placeholder should be array."); } if (!$data) { throw new E_DB_MySQL_parser("Empty array for a: type placeholder."); } $query = $comma = ''; foreach ($data as $key => $value) { $query .= $comma.$this->escapeString($value); $comma = ","; } return $query; } 

.
, . Why?
mysql IN(), .
- implode , : — , .

DmitryKoterov dbSimple — IN , , , . , , , .

Update:
, zerkms david_mz , , !
 IN(NULL) 
FALSE — .
, NULL:
 function createIN($data) { if (!is_array($data)) { throw new E_DB_MySQL_parser("Value for a: type placeholder should be array."); } if (!$data) { return 'NULL'; } $query = $comma = ''; foreach ($data as $key => $value) { $query .= $comma.$this->escapeString($value); $comma = ","; } return $query; } 

, :
, . , , .
, , . , .
</update>

C . — parse(), , SQL … (, , !). , — , :
 if (is_array($array) and $array) { $sql .= $db->parse(" AND type IN(a:)",$array); } 

WHERE:
 $w = array(); $where = ''; if (!empty($_GET['type'])) $w[] = $db->parse("type = s:", $_GET['type']); if (!empty($_GET['rooms'])) $w[] = $db->parse("rooms IN (a:)",$_GET['rooms']); if (!empty($_GET['max_price'])) $w[] = $db->parse("price <= i:", $_GET['max_price']); if (count($w)) $where = "WHERE ".implode(' AND ',$w); $data = $db->getArr("SELECT * FROM table $where LIMIT i:,i:",$start,$per_page); 

DbSimple , , , ( , , 100500 ), , 10 , - .


.
-, . , .
, : , . , . . .
, . , , ( !) , .

mysql_* deprecated PHP

.
, .

-, «», .
-, deprecated, discouraged, , . , .
-, . , — , . , - , . , , mysql .

, . , .
« mysql_*», ! mysqli_*, pdo_* «» API. — , : API , . mysql(i)_query(), mysql(i)_fetch_array() , , .
, , : API . , , . , , . , API , , — , … !
PDO , , , . PDO .
— mysql_* - . , .


— . , .
, . , — .


. , , , :

«»

«UNION», . . , - , , — , ? - , «» «-».


. « , „ “ !» — .
, «» . , . .
, , . , — . , , . magic_quotes, , , . , , , ( ).
, , — SQL .

Encodings

. .
, « » , mysql_real_escape_string() (!)
, , mysql_real_escape_string() « », . — latin1, , mysql_real_escape_string() mysql_escape_string().
2006 mysql_real_escape_string() mysql . , , — mysql_set_charset() . SET NAMES.

, . UTF-8 ( , addslashes()).

, . — , .
, .
MySQL.
 mysql> select version(); +---------------------+ | version() | +---------------------+ | 5.0.45-community-nt | +---------------------+ 1 row in set (0.00 sec) mysql> CREATE TABLE users ( -> username VARCHAR(32) CHARACTER SET GBK, -> password VARCHAR(32) CHARACTER SET GBK, -> PRIMARY KEY (username) -> ); Query OK, 0 rows affected (0.08 sec) mysql> insert into users SET username='ewrfg', password='wer44'; Query OK, 1 row affected (0.02 sec) mysql> insert into users SET username='ewrfg2', password='wer443'; Query OK, 1 row affected (0.03 sec) mysql> insert into users SET username='ewrfg4', password='wer4434'; Query OK, 1 row affected (0.00 sec) 

Php
 <pre><?php echo "PHP version: ".PHP_VERSION."\n"; mysql_connect(); mysql_select_db("test"); mysql_query("SET NAMES GBK"); $_POST['username'] = chr(0xbf).chr(0x27).' OR username = username /*'; $_POST['password'] = 'guess'; $username = addslashes($_POST['username']); $password = addslashes($_POST['password']); $sql = "SELECT * FROM users WHERE username = '$username' AND password = '$password'"; $result = mysql_query($sql) or trigger_error(mysql_error().$sql); var_dump($username); var_dump(mysql_num_rows($result)); var_dump(mysql_client_encoding()); $username = mysql_real_escape_string($_POST['username']); $password = mysql_real_escape_string($_POST['password']); $sql = "SELECT * FROM users WHERE username = '$username' AND password = '$password'"; $result = mysql_query($sql) or trigger_error(mysql_error().$sql); var_dump($username); var_dump(mysql_num_rows($result)); var_dump(mysql_client_encoding()); mysql_set_charset("GBK"); $username = mysql_real_escape_string($_POST['username']); $password = mysql_real_escape_string($_POST['password']); $sql = "SELECT * FROM users WHERE username = '$username' AND password = '$password'"; $result = mysql_query($sql) or trigger_error(mysql_error().$sql); var_dump($username); var_dump(mysql_num_rows($result)); var_dump(mysql_client_encoding()); 


Result
 PHP version: 5.3.3 string(29) "ї\' OR username = username /*" int(3) string(6) "latin1" string(29) "ї\' OR username = username /*" int(3) string(6) "latin1" string(30) "\ї\' OR username = username /*" int(0) string(3) "gbk" 


, :
PDO . , mysql_set_charset() PDO , DSN 5.3 charset, , .
, , , , , PDO .

« LIKE»

LIKE : .
- . , LIKE , . LIKE . , % _, , — , . That's all.

Conclusion


mysql, , , , , , . , «» .
«» , , , , , .
, .

Update: mysql SQL , .

Source: https://habr.com/ru/post/148701/


All Articles