HTTP in simple terms

This is an overview of the main aspects of HTTP, the network protocol that has let your browser load web pages from the early 1990s to this day. The article is written for those who are just starting to work with computer networks and to develop network applications, and who still find it difficult to read the official specifications on their own.

HTTP is a widely used data transfer protocol, originally designed for transferring hypertext documents (that is, documents that can contain links to other documents).

The abbreviation HTTP stands for HyperText Transfer Protocol. In terms of the OSI model, HTTP is an application-layer protocol (the uppermost, 7th layer). At the time of writing, the current version of the protocol, HTTP 1.1, is described in RFC 2616 (since superseded by RFCs 7230 to 7235, and later by RFCs 9110 to 9112).

HTTP follows a client-server model. The client application forms a request and sends it to the server. The server software processes the request, generates a response and sends it back to the client. The client can then send further requests, which are processed in the same way.

The classic task HTTP solves is the exchange of data between a web server and a user application that accesses web resources (usually a web browser). Today, it is HTTP that makes the World Wide Web work.

HTTP is also often used to carry data for other application-layer protocols, such as SOAP, XML-RPC and WebDAV. In this case, HTTP is said to be used as a "transport".

Many software products also expose APIs that use HTTP for data transfer. The data itself can be in any format, for example XML or JSON.

As a rule, HTTP data is carried over TCP/IP connections. Server software usually listens on TCP port 80, although any other port can be used. If a URL does not specify a port explicitly, client software normally connects to port 80 by default.
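As an illustration of the default-port rule, here is a minimal Python sketch built on the standard `urllib.parse` module. The helper name `connection_target` is hypothetical. The `https` entry (port 443) anticipates the HTTPS section and is an addition for completeness.

```python
from urllib.parse import urlsplit

# Well-known default ports, used only when the URL does not name a port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_target(url: str) -> tuple[str, int]:
    """Return the (host, port) pair a client would open a TCP connection to."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # parts.port is None when the URL has no explicit ":port" suffix.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


print(connection_target("http://alizar.habrahabr.ru/"))       # ('alizar.habrahabr.ru', 80)
print(connection_target("http://alizar.habrahabr.ru:8080/"))  # ('alizar.habrahabr.ru', 8080)
```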

How do you send an HTTP request?


The easiest way to get to grips with HTTP is to try accessing some web resource by hand. Imagine that you are a browser, and you have a user who really wants to read the articles of Anatoly Alizar.

Suppose that he entered the following in the address bar:

http://alizar.habrahabr.ru/

Accordingly, as a web browser, you now need to connect to the web server at alizar.habrahabr.ru.

To do this, you can use any suitable command line utility. For example, telnet:

telnet alizar.habrahabr.ru 80

A quick note: if you change your mind, press Ctrl+], then type quit and press Enter to close the connection. Instead of telnet you can also use nc (or ncat), whichever you prefer.

After connecting to the server, you need to send an HTTP request. This, incidentally, is very easy: an HTTP request can consist of just two lines.

To form an HTTP request, you need a start line and at least one header: the Host header, which is mandatory in every request.

The reason is that the domain name is resolved to an IP address on the client side. When a TCP connection is opened, the remote server therefore has no information about which name was used to reach it. It could have been, for example, alizar.habrahabr.ru, habrahabr.ru or m.habrahabr.ru, and the response may differ in each case. In all these cases the connection is actually opened to the same host, 212.24.43.44. Even if a domain name rather than this IP address was given when opening the connection, the server is not told about it in any way. That is why the name has to be passed in the Host header.

The start line (request line) of an HTTP 1.1 request is composed as follows:

Method URI HTTP/Version

For example (such a starting line may indicate that the main page of the site is requested):

GET / HTTP/1.1

The method (English-language literature uses the word method, and sometimes verb) is a sequence of any characters except control characters and separators. It determines the operation to be performed on the specified resource. The HTTP 1.1 specification does not limit the number of methods that can be used. However, to follow common practice and stay compatible with the widest possible range of software, usually only a handful of standard methods are used, whose meaning is clearly defined in the protocol specification.

The URI (Uniform Resource Identifier) is the path to a specific resource (for example, a document) on which the operation is to be performed. With the GET method, for example, the resource is retrieved. Some requests do not refer to any resource at all; in that case, an asterisk (the "*" character) can be placed in the start line instead of a URI. This may be, for example, a request about the web server itself rather than about any particular resource. The start line might then look like this:

OPTIONS * HTTP/1.1

The version specifies which version of the HTTP standard the request conforms to. It is written as two numbers separated by a period (for example, 1.1).
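The request-line structure can be checked with a small parser. This is a Python sketch under simple assumptions:

- a single space separates the three parts;
- the method must be a "token" (visible characters other than separators, as described above);
- the version is one digit, a period, and one digit.

The function name `parse_request_line` is hypothetical.

```python
import re

# A method is an HTTP "token": visible ASCII characters other than
# separators such as spaces, quotes, parentheses or slashes.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
REQUEST_LINE = re.compile(rf"^({TOKEN}) (\S+) HTTP/(\d)\.(\d)$")


def parse_request_line(line: str) -> tuple[str, str, tuple[int, int]]:
    """Split a request line into (method, URI or "*", (major, minor))."""
    match = REQUEST_LINE.match(line)
    if match is None:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, major, minor = match.groups()
    return method, target, (int(major), int(minor))


print(parse_request_line("GET / HTTP/1.1"))      # ('GET', '/', (1, 1))
print(parse_request_line("OPTIONS * HTTP/1.1"))  # ('OPTIONS', '*', (1, 1))
```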

In order to access the web page at a specific address (in this case, the path to the resource is “/”), we should send the following request:

GET / HTTP/1.1
Host: alizar.habrahabr.ru

Keep in mind that each line must end with a Carriage Return character followed by a Line Feed character (CRLF). After the last header, this line-break sequence appears twice; in other words, the headers are followed by an empty line.

However, the HTTP specification recommends that servers treat the LF character as the line delimiter when processing requests, and ignore a preceding CR if there is one. In practice, therefore, most servers will also correctly process a request whose headers are separated by bare LF characters, with LF repeated twice after the last header.
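The tolerant line handling described above can be sketched in a few lines of Python. The helper name `split_head_lines` is hypothetical. Decoding as ISO-8859-1 is a simplifying assumption that maps each byte to one character.

```python
def split_head_lines(head: bytes) -> list[str]:
    """Split a header section into lines, accepting CRLF or a bare LF.

    LF is treated as the line terminator and a CR directly before it is dropped.
    """
    if head.endswith(b"\n"):
        head = head[:-1]  # a trailing terminator should not yield an extra empty line
    lines = []
    for raw in head.split(b"\n"):
        if raw.endswith(b"\r"):
            raw = raw[:-1]  # ignore the optional preceding CR
        lines.append(raw.decode("iso-8859-1"))
    return lines


strict = b"GET / HTTP/1.1\r\nHost: alizar.habrahabr.ru\r\n"
lenient = b"GET / HTTP/1.1\nHost: alizar.habrahabr.ru\n"
print(split_head_lines(strict) == split_head_lines(lenient))  # True
```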

If you want to send a request that strictly follows the specification, you can use the escape sequences \r and \n:

echo -en "GET / HTTP/1.1\r\nHost: alizar.habrahabr.ru\r\n\r\n" | ncat alizar.habrahabr.ru 80
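The same request can be produced and sent from Python with the standard `socket` module. This is a minimal sketch; the function names are hypothetical.

- `build_request` shows the exact bytes, including the CRLF after each line and the empty line that closes the headers.
- `send_request` is the rough equivalent of piping the bytes into ncat. It returns only the first chunk the server sends back, and it needs network access.

The 2014 host used in the article may respond differently today.

```python
import socket

CRLF = "\r\n"


def build_request(method: str, path: str, host: str) -> bytes:
    """Assemble a minimal HTTP/1.1 request: request line, Host header, empty line."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    # Every line ends with CRLF; one extra CRLF (an empty line) ends the headers.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


def send_request(host: str, port: int, request: bytes, timeout: float = 10.0) -> bytes:
    """Send the request over a plain TCP connection; return the first chunk of the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        return sock.recv(65536)  # may be only part of the response


request = build_request("GET", "/", "alizar.habrahabr.ru")
print(request)  # b'GET / HTTP/1.1\r\nHost: alizar.habrahabr.ru\r\n\r\n'
# reply = send_request("alizar.habrahabr.ru", 80, request)  # requires network access
```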

How do you read the response?


The start line of the response has the following structure:

HTTP/Version Status-Code Reason-Phrase

The protocol version is written in the same format as in the request.

The status code (Status Code) consists of three digits that determine the result of the request; the first digit indicates the class of the status. Some examples:

- If the GET method was used and the server provides the resource with the specified identifier, the code is 200.
- If the server reports that no such resource exists, the code is 404.
- If the server reports that it cannot grant access to the resource because the client lacks the necessary privileges, the code is 403.

The HTTP 1.1 specification defines 40 different status codes, and the protocol can be extended with additional ones.

The reason phrase (Reason Phrase) is a textual explanation of the status code, intended to make the response easier for a human to read; it must not contain the CR or LF characters. Client software may ignore the reason phrase, and some server implementations use a phrase that differs from the standard one.
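Here is a hedged Python sketch of reading a status line; the function name `parse_status_line` is hypothetical. Following the text above, only the first digit of the code is used to classify the result. The reason phrase is returned as-is and is not interpreted.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str) -> tuple[str, int, str, str]:
    """Split a status line into (version, code, reason phrase, status class)."""
    version, code_text, *rest = line.split(" ", 2)
    if not version.startswith("HTTP/") or len(code_text) != 3 or not code_text.isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    code = int(code_text)
    reason = rest[0] if rest else ""  # the reason phrase may be empty
    # The first digit alone determines the class; the phrase is only a hint for humans.
    return version, code, reason, STATUS_CLASSES.get(code // 100, "unknown")


print(parse_status_line("HTTP/1.1 404 Not Found"))
# ('HTTP/1.1', 404, 'Not Found', 'client error')
```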

After the start line come the headers, followed by the response body. For example:

HTTP/1.1 200 OK
Server: nginx/1.2.1
Date: Sat, 08 Mar 2014 22:53:46 GMT
Content-Type: application/octet-stream
Content-Length: 7
Last-Modified: Sat, 08 Mar 2014 22:53:30 GMT
Connection: keep-alive
Accept-Ranges: bytes

Wisdom

The response body starts after an empty line (two consecutive line breaks) following the last header. The end of the body is determined from the value of the Content-Length header. Here the body is 7 octets (bytes): the word "Wisdom" plus a newline character.
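The framing rule above (headers end at the empty line, body length comes from Content-Length) can be sketched in Python. This is deliberately minimal and assumes CRLF line endings and a Content-Length header. Other ways of delimiting the body, such as chunked transfer encoding, are not handled. The function name is hypothetical.

```python
def parse_response(data: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw response into (status line, headers, body).

    Assumes CRLF line endings and a Content-Length header; other body
    framings (e.g. chunked transfer encoding) are not handled.
    """
    head, sep, rest = data.partition(b"\r\n\r\n")  # the empty line ends the headers
    if not sep:
        raise ValueError("incomplete header section")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    length = int(headers["content-length"])
    if len(rest) < length:
        raise ValueError("body is shorter than Content-Length; read more data")
    # Anything after `length` bytes belongs to the next response on a keep-alive connection.
    return status_line, headers, rest[:length]


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Length: 7\r\n"
    b"\r\n"
    b"Wisdom\n"
)
status, headers, body = parse_response(raw)
print(status, body)  # HTTP/1.1 200 OK b'Wisdom\n'
```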

However, for the request we made earlier, the web server will respond not with code 200 but with code 302. This tells the client that the resource is temporarily available at a different address.

See for yourself:

HTTP/1.1 302 Moved Temporarily
Server: nginx
Date: Sat, 08 Mar 2014 22:29:53 GMT
Content-Type: text/html
Content-Length: 154
Connection: keep-alive
Keep-Alive: timeout=25
Location: http://habrahabr.ru/users/alizar/

<html>
<head><title>302 Found</title></head>
<body bgcolor="white">
<center><h1>302 Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

The new address is given in the Location header. The URI (resource identifier) has now changed to /users/alizar/. This time the request must go to the server at habrahabr.ru (although here it happens to be the same server), and that name must also be given in the Host header.

That is:

GET /users/alizar/ HTTP/1.1
Host: habrahabr.ru
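Turning a Location header into the next request's Host value and path can be sketched with `urllib.parse` in Python. The helper name `redirect_target` is hypothetical. `urljoin` also resolves relative Location values against the URL that was just requested. Using `netloc` keeps an explicit port, as the Host header should; URLs containing `user:password@` are not handled.

```python
from urllib.parse import urljoin, urlsplit


def redirect_target(current_url: str, location: str) -> tuple[str, str]:
    """Return (Host header value, request path) for the URL named in Location."""
    # urljoin copes with both absolute and relative Location values.
    target = urlsplit(urljoin(current_url, location))
    path = target.path or "/"
    if target.query:
        path += "?" + target.query
    return target.netloc, path


print(redirect_target("http://alizar.habrahabr.ru/", "http://habrahabr.ru/users/alizar/"))
# ('habrahabr.ru', '/users/alizar/')
```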

In response to this request, the Habrahabr web server returns code 200 and a fairly large HTML document.

If you have got used to the role by now, you can read the HTML code received from the server, take a pencil and a notebook, and draw Alizar's profile page. In essence, that is what your browser would do at this point.

What about security?


HTTP itself does not encrypt the information it transfers. However, there is a widely used extension of HTTP that wraps the transmitted data in the SSL or TLS cryptographic protocol.

This extension is called HTTPS (HyperText Transfer Protocol Secure). HTTPS connections commonly use TCP port 443. HTTPS is widely used to protect information from interception. As a rule, it also protects against man-in-the-middle attacks, provided that all of the following hold:

- the client verifies the certificate;
- the certificate's private key has not been compromised;
- the user has not accepted an unsigned certificate;
- no certificates belonging to the attacker have been installed on the user's computer.
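The condition "the certificate is verified on the client" corresponds to concrete settings in a TLS library. This is a minimal Python sketch using the standard `ssl` module; the function names are hypothetical.

- `create_default_context()` already enables hostname checking and certificate verification. The sketch sets both explicitly to show which settings matter.
- `https_get_first_chunk` sends the same kind of plain-text request as before, but inside a TLS connection. It requires network access.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS settings that verify the server's certificate chain and hostname."""
    context = ssl.create_default_context()  # loads the system's trusted CA certificates
    # Already the defaults; spelled out because they are what guards against
    # man-in-the-middle attacks. Turning them off removes that protection.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def https_get_first_chunk(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send an HTTP/1.1 GET inside a TLS connection; return the first chunk of the reply."""
    # "Connection: close" asks the server to close the connection after replying.
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=10) as raw:
        # server_hostname is sent via SNI and is the name the certificate is checked against.
        with make_tls_context().wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(request)
            return tls.recv(65536)
```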

Currently, HTTPS is supported by all popular web browsers.

Are there any additional features?


HTTP offers plenty of room for extension. In particular, the HTTP 1.1 specification provides the Upgrade header for switching to exchanging data over another protocol. The client sends a request with this header. If the server requires a switch to another protocol, it can return a response with the status "426 Upgrade Required"; the client can then send a new request that includes the Upgrade header.

This mechanism is used, in particular, to set up data exchange over the WebSocket protocol (described in RFC 6455), which lets either side send data whenever it needs to without additional HTTP requests. The standard "handshake" comes down to the client sending an HTTP request with an Upgrade header set to "websocket". The server replies with the status "101 Switching Protocols", after which either side can start sending data over WebSocket.
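For a closer look at the handshake, here is a hedged Python sketch. Besides Upgrade, it uses three details that come from RFC 6455 rather than from this article:

- the Connection, Sec-WebSocket-Key and Sec-WebSocket-Version headers;
- the fixed GUID;
- the rule for computing Sec-WebSocket-Accept: Base64 of the SHA-1 of the key concatenated with the GUID.

The host, path and function names are hypothetical. The sample key and its expected answer are the worked example given in RFC 6455.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def handshake_request(host: str, path: str) -> tuple[bytes, str]:
    """Build a client opening handshake; return (request bytes, the key that was sent)."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")  # random 16-byte nonce
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii"), key


def expected_accept(key: str) -> str:
    """Value the server must return in Sec-WebSocket-Accept with its 101 response."""
    digest = hashlib.sha1((key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455:
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client would compare the server's Sec-WebSocket-Accept value with `expected_accept(key)` before treating the connection as a WebSocket.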

Is anything else used, by the way?


There are currently other protocols designed to transfer web content. In particular, SPDY (pronounced like the English word speedy; it is not an abbreviation) is a modification of HTTP that aims to reduce latency when loading web pages and to provide additional security.

The speed-up comes from compressing, prioritizing and multiplexing the additional resources a web page needs, so that all the data can be transferred over a single connection.

The draft HTTP 2.0 specification published in November 2012 is based on the SPDY specification. HTTP 2.0 is the next version after HTTP 1.1, whose final specification was published in 1999; it was later standardized as RFC 7540 in 2015.

Many of the architectural solutions used in SPDY, and in the other proposals the httpbis working group considered while preparing the HTTP 2.0 draft, had already been explored during the development of HTTP-NG. Work on HTTP-NG, however, was discontinued in 1998.

At the time of writing, SPDY is supported in Firefox, Chromium/Chrome, Opera, Internet Explorer and the Amazon Silk browser.

Is that all?


Basically, yes. We could describe specific methods and headers, but that knowledge is mainly needed when you are writing something specific, for example a web server or client software that communicates with servers over HTTP. It is not required for a basic understanding of how the protocol works. Besides, all of this is easy to find via Google: the information is in the specifications, on Wikipedia and in many other places.

However, if you read English and want to study not only HTTP itself but also the protocols used to transmit TCP/IP packets, I recommend reading this article.

And, of course, do not forget that any technology becomes much simpler and clearer when you actually start using it.

Good luck and fruitful learning!

Source: https://habr.com/ru/post/215117/

