Attacks on web systems exploit bugs in input processing. By taking advantage of software vulnerabilities and developer inattention, an attacker can craft data that compromises the system when it is processed: gaining shell access, obtaining unencrypted passwords, or even emulating user actions in the browser (for example, transferring money out of an account in a banking system).
One popular attack vector is injection: HTML, JavaScript, SQL. Many reputable sources therefore recommend escaping user input with various incantations. In this post I will show one simple rule that, if you follow it, is guaranteed to solve the injection problem.

The rule is simple:
Do not mix data formats!
- Do not assign strings of different formats:
lblName.Text = person.Name;
Bad: HTML ← plain text
lblName.Text = HtmlEncode(person.Name);
Good: HTML ← HTML
- Do not concatenate strings of different formats:
"<span>" + person.Name + "</span>"
Bad: HTML + Plain Text + HTML
"<span>" + HtmlEncode(person.Name) + "</span>"
Good: HTML + HTML + HTML
- Be careful with nested formats:
label.Text = "<script>" + string.Format(script, string.Format(summary, string.Format(personUrl, id))) + "</script>";
Bad: HTML + JavaScript + URI + Text + HTML
label.Text = "<script>" + HtmlEncode(string.Format(script, JavaScriptEncode(string.Format(person.Summary, UrlEncode(string.Format(personUrl, id)))))) + "</script>";
Good: HTML + HTML(JavaScript(URI(text))) + HTML
Format
All data has a format. The simplest format is plain text, where all characters are equal and carry no additional meaning. Any other text-based format, be it CSV, HTML, or JavaScript, has control characters: commas, tags, special characters. When formats are mixed as shown above, dangerous situations arise because one format is treated as another.
For example, the Text property of ASP.NET Label controls is treated as HTML. If we assign this property a value in another format (for example, plain text), we open a potential HTML injection vulnerability. That is why you need special adapters between formats that escape special characters. The plain text to HTML adapter turns "<script>" into "&lt;script&gt;". The plain text to URI adapter turns "example.com?<script>alert('pwned');</script>" into "example.com%3F%3Cscript%3Ealert('pwned')%3B%3C%2Fscript%3E".
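The two adapters described above can be sketched in JavaScript. This is only an illustration: htmlEncode is hand-written here (it is not a built-in), urlEncode is just an alias for the built-in encodeURIComponent, and the id value, the /person URL, and the tab parameter are made up for the example.

```javascript
// Plain text -> HTML adapter (hand-written sketch, not a built-in).
function htmlEncode(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Plain text -> URI component adapter: alias for the built-in function.
const urlEncode = encodeURIComponent;

console.log(htmlEncode("<script>"));
// &lt;script&gt;
console.log(urlEncode("example.com?<script>alert('pwned');</script>"));
// example.com%3F%3Cscript%3Ealert('pwned')%3B%3C%2Fscript%3E

// Nested formats: text -> URI component -> HTML attribute value.
const id = "42&admin=1"; // hostile plain text (hypothetical value)
const personUrl = "/person?id=" + urlEncode(id) + "&tab=info"; // URI(text)
const linkHtml = '<a href="' + htmlEncode(personUrl) + '">profile</a>'; // HTML(URI(text))
console.log(linkHtml);
// <a href="/person?id=42%26admin%3D1&amp;tab=info">profile</a>
```

Each value is encoded exactly once, at the moment it crosses from one format into another, and the innermost format is encoded first.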
Similar adapters exist in almost all languages and platforms: HtmlEncode and UrlEncode in .NET; htmlentities and urlencode in PHP; escape, encodeURI, and encodeURIComponent in server-side JavaScript. For an SQL query, it is sufficient to use parameterized queries, which are processed by the database itself.
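To illustrate how a parameterized query keeps the two formats apart, here is a minimal JavaScript sketch. The sql helper and the people table are hypothetical. The $1, $2 placeholders follow the PostgreSQL convention, and the { text, values } result is the kind of object that drivers such as node-postgres accept as a query config; that is an assumption about whichever driver you use.

```javascript
// Tagged template: literal SQL stays in `text`; every interpolated value is
// moved into `values` and replaced by a numbered placeholder.
function sql(strings, ...values) {
  // reduce without an initial value starts from strings[0], so the index i
  // begins at 1 and doubles as the placeholder number.
  const text = strings.reduce((acc, part, i) => acc + "$" + i + part);
  return { text, values };
}

const userInput = "Robert'); DROP TABLE people; --"; // hostile plain text
const query = sql`SELECT * FROM people WHERE name = ${userInput}`;

console.log(query.text);
// SELECT * FROM people WHERE name = $1
console.log(query.values.length);
// 1
```

The user's text never becomes part of the SQL string, so there is nothing to escape: the query text is SQL and the values stay plain text all the way to the database.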
Here are 3 simple points that will make your life easier:
- Know the format of every text variable, property, and field you use. For convenience, you can add suffixes to their names: HeaderHtml, AvatarUrl.
- Make an adapter library: HtmlEncode, UrlEncode, JavaScriptEncode.
- Be careful when working with data in different formats.
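One way to make the format of a value explicit, beyond naming suffixes, is to wrap already-encoded HTML in its own type. The following JavaScript sketch is only one possible design: the Html class and the html template tag are hypothetical names, not part of any library.

```javascript
// Marks a string that is already HTML, so it is never encoded a second time.
class Html {
  constructor(value) { this.value = value; }
  toString() { return this.value; }
}

// Plain text -> HTML adapter (hand-written sketch).
function htmlEncode(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tagged template: the literal parts are HTML; interpolated values are treated
// as plain text and encoded, unless they are already Html instances.
function html(strings, ...values) {
  let out = strings[0];
  values.forEach((v, i) => {
    out += (v instanceof Html ? v.value : htmlEncode(v)) + strings[i + 1];
  });
  return new Html(out);
}

const personName = "<b>Bob</b>";                   // plain text
const nameHtml = html`<span>${personName}</span>`; // HTML <- HTML(text)
const cardHtml = html`<div>${nameHtml}</div>`;     // HTML <- HTML, no double encoding
console.log(String(cardHtml));
// <div><span>&lt;b&gt;Bob&lt;/b&gt;</span></div>
```

With such a wrapper, the Html suffix in a name like nameHtml is backed by a check at run time rather than by convention alone.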
UPD: About PHP injection
In one of the discussions of this article, someone argued that my rule does not work against PHP injection. I want to show that it does. Consider this trivial code:
include($_GET['path']);
If I pass "evil.php" in the GET request, the script will happily run the specified exploit. Now consider this situation in terms of my rule. From the description it is clear that the include function expects a file path in the format of the current file system, or a URI. The content of $_GET['path'], however, is plain text. How do we avoid the injection? Write a pathEncode function that converts this text to a valid path or, if that is impossible, returns an empty string (for example, it can check a whitelist of allowed paths). In this case
include(pathEncode($_GET['path']));
will either fail with an exception or load the file. The main thing is to understand clearly what format include expects and what format comes from the user.
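The whitelist variant of pathEncode can be sketched as follows. The sketch is in JavaScript rather than PHP, and the page names and file paths in ALLOWED_PAGES are invented for illustration.

```javascript
// Hypothetical allow list: the only names a request may select,
// each mapped to the file path it stands for.
const ALLOWED_PAGES = new Map([
  ["home", "pages/home.php"],
  ["about", "pages/about.php"],
]);

// Adapter from plain text (user input) to a file path.
// Returns "" when the text has no valid path representation.
function pathEncode(text) {
  return ALLOWED_PAGES.get(text) ?? "";
}

console.log(pathEncode("home"));
// pages/home.php
console.log(pathEncode("evil.php") === "");
// true
```

Because the result is looked up rather than built from the input, the user's text never reaches the file system as a path.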