What actually happens when a user types google.com into a browser

This article attempts to answer the classic interview question: “What happens when you type google.com into the address bar and press Enter?” We will go through the process in as much detail as possible.

Note: the publication is based on the content of the repository What happens when ...
The text uses a large number of technical terms, and some of them may be translated inaccurately. If you find a mistake in our translation, send us a personal message and we will fix it.

We have moved the translation to a GitHub repository and sent a pull request to the author of the original material. Leave your changes to the text there, and together we can significantly improve it.

1. "g" key pressed

The following sections explain how the physical keyboard works and how the operating system handles interrupts. But a lot happens besides that: when you press the "g" key, the browser receives the event and the autocomplete mechanism kicks in. Depending on the browser's algorithm and its mode (whether or not incognito mode is on), the drop-down box under the URL bar offers the user a number of suggestions.

Most autocomplete algorithms rank suggestions based on search history and bookmarks. Some browsers (for example, Rockmelt) even suggest Facebook friends' profiles. While the user types “google.com” into the address bar, none of this matters much, but a large amount of code still runs, and the suggestions are refined with each new letter. The browser may offer google.com before the user has typed the whole address.

2. The “Enter” key bottoms out

As a zero point, we can pick the moment when the Enter key is fully pressed and sits at the bottom of its travel. At this point, the electrical circuit specific to this key closes and a small amount of current flows into the keyboard's logic circuitry, which scans the state of each key switch and converts the signal into an integer key code (in this case, 13). The keyboard controller then encodes the key code for transmission to the computer. Today the transfer usually goes over USB or Bluetooth; earlier keyboards were connected through PS/2 or ADB connectors.

In the case of a USB keyboard:

The keyboard's USB circuitry is powered by the 5 V supply provided by the computer's USB host controller.
The generated key code is stored in a register in the keyboard's internal circuitry called an “endpoint”.
The host USB controller polls this endpoint roughly every 10 ms and reads the key code stored there.
This value is then passed to the USB SIE (Serial Interface Engine), which converts it into one or more USB packets that follow the low-level USB protocol.
These packets are sent as a differential electrical signal over the D+ and D- pins at a maximum speed of 1.5 Mb/s, since HID (Human Interface Device) keyboards are typically declared as “low-speed” devices.
This serial signal is then decoded by the computer's host USB controller and interpreted by the universal HID keyboard driver. The key code value is then passed to the operating system's hardware abstraction layer.
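To make the data that the HID driver receives more concrete, here is a minimal sketch that decodes an 8-byte report in the HID "boot protocol" keyboard layout (one modifier byte, one reserved byte, up to six key usage IDs). The usage IDs 0x0A ("g") and 0x28 (Enter) come from the USB HID usage tables rather than from this article, and the function name `decode_boot_report` is made up for illustration.

```python
# Boot-protocol keyboard report layout (8 bytes):
#   byte 0    : modifier bitmask
#   byte 1    : reserved
#   bytes 2-7 : up to six usage IDs of the keys currently held down
MODIFIERS = ["LeftCtrl", "LeftShift", "LeftAlt", "LeftGUI",
             "RightCtrl", "RightShift", "RightAlt", "RightGUI"]

# Tiny excerpt of the HID keyboard usage table, for illustration only.
USAGE_NAMES = {0x0A: "g", 0x28: "Enter"}


def decode_boot_report(report: bytes):
    """Return (modifiers, keys) for one boot-protocol keyboard report."""
    if len(report) != 8:
        raise ValueError("a boot keyboard report must be exactly 8 bytes")
    modifiers = [name for bit, name in enumerate(MODIFIERS)
                 if report[0] & (1 << bit)]
    # A usage ID of 0 means "no key in this slot".
    keys = [USAGE_NAMES.get(code, hex(code)) for code in report[2:] if code]
    return modifiers, keys


if __name__ == "__main__":
    print(decode_boot_report(bytes([0x00, 0x00, 0x28, 0, 0, 0, 0, 0])))
```

A real driver would of course map the whole usage table and track key-up transitions between consecutive reports.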

In the case of a virtual keyboard (touchscreen):

When the user puts a finger on a modern capacitive touchscreen, a tiny amount of current is transferred to the finger. This completes the circuit through the electrostatic field of the conductive layer and creates a voltage drop at that point on the screen. The screen controller then raises an interrupt reporting the coordinates of the touch.
The mobile OS then notifies the currently focused application of a press event in one of its GUI elements (in this case, a virtual keyboard button).
The virtual keyboard raises a software interrupt to send a “key pressed” message back to the OS.
This interrupt notifies the currently focused application of the “key pressed” event.

2.1 An interrupt has occurred [not for USB keyboards]

The keyboard sends a signal on its interrupt request line (IRQ), which the interrupt controller maps to an interrupt vector (an integer). The CPU uses the Interrupt Descriptor Table (IDT) to map interrupt vectors to kernel functions (interrupt handlers). When an interrupt arrives, the CPU indexes the IDT with the interrupt vector and runs the corresponding handler. Thus, the kernel is entered.

2.2 (On Windows) `WM_KEYDOWN` message sent to application

The HID transport passes the key-down event to the KBDHID.sys driver, which converts the HID usage into a scan code. In this particular case, the key maps to the virtual-key code VK_RETURN (0x0D). The KBDHID.sys driver interfaces with the KBDCLASS.sys driver (the keyboard class driver), which is responsible for handling all keyboard input securely. That driver then calls into Win32K.sys (possibly after passing the message through installed third-party keyboard filters). All of this happens in kernel mode.

Win32K.sys determines which window is currently active using the GetForegroundWindow() function. This API returns the window handle of the browser's address bar. The main Windows “message pump” then calls SendMessage(hWnd, WM_KEYDOWN, VK_RETURN, lParam). lParam is a bitmask that carries further information about the key press: the repeat count (0 in this case), the actual scan code (which can depend on the OEM, although for VK_RETURN it usually does not), whether extended keys (for example, Alt, Shift, Ctrl) were also pressed (they were not), and some other state.

The Windows API function SendMessage places the message in the queue for a particular window handle (hWnd). Later, the main message-processing function (WindowProc) assigned to that hWnd is called to process each message in the queue.

The currently active window (hWnd) is an edit control, and its WindowProc has a handler for WM_KEYDOWN messages. This code looks at the third parameter passed to SendMessage (wParam) and, because it is VK_RETURN, knows that the user has pressed the ENTER key.
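The lParam bitmask is easier to follow when unpacked field by field. The sketch below assumes the bit layout documented for WM_KEYDOWN in the Win32 API (bits 0-15 repeat count, bits 16-23 scan code, bit 24 extended-key flag, bit 29 context code, bit 30 previous key state, bit 31 transition state). The sample value 0x001C0001 and the helper name `decode_keydown_lparam` are hypothetical and only serve as an illustration.

```python
def decode_keydown_lparam(lparam: int) -> dict:
    """Split a WM_KEYDOWN lParam value into its documented bit fields."""
    return {
        "repeat_count": lparam & 0xFFFF,              # bits 0-15
        "scan_code": (lparam >> 16) & 0xFF,           # bits 16-23
        "extended_key": bool((lparam >> 24) & 1),     # bit 24
        "context_code": bool((lparam >> 29) & 1),     # bit 29
        "previous_key_down": bool((lparam >> 30) & 1),  # bit 30
        "transition_state": bool((lparam >> 31) & 1),   # bit 31
    }


if __name__ == "__main__":
    # Illustrative value only: repeat count 1, scan code 0x1C.
    for field, value in decode_keydown_lparam(0x001C0001).items():
        print(f"{field}: {value}")
```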

2.3 (On OS X) `NSEvent KeyDown` sent to application

The interrupt signal triggers an interrupt event in the I/O Kit keyboard driver. The driver translates the signal into a key code, which is then passed to the OS X WindowServer process. WindowServer in turn dispatches the event to any appropriate (active or “listening”) applications through their Mach port, where it is placed into an event queue. Events can then be read from this queue by threads with sufficient privileges that call the mach_ipc_dispatch function. Most commonly this is handled by the NSApplication main event loop, via an NSEvent of NSEventType KeyDown.

2.4 (On GNU/Linux) Xorg server listens to keyboard codes

When a graphical X server is used, it acquires the keypress through the generic evdev event driver. Key codes are remapped to scan codes using X server specific keymaps and rules. Once the mapping of the pressed key is complete, the X server sends the character to the window manager (DWM, metacity, i3), which in turn sends it to the focused window. The graphical API of the window that receives the character prints the corresponding font symbol in the focused field.

3. Parsing URL

The browser now has the following URL information:

Protocol "http"
Use "Hyper Text Transfer Protocol"

Resource "/"
Retrieve the main (index) page
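As a rough illustration of this step, the following sketch splits a URL into the same two pieces using Python's standard `urllib.parse.urlsplit`. Browsers use their own URL parsers, so this only mirrors the idea; the function name `describe_url` is made up for the example.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Extract the protocol, host and resource path from a URL."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # An empty path means the main (index) page, i.e. "/".
        "resource": parts.path or "/",
    }


if __name__ == "__main__":
    print(describe_url("http://google.com/"))
```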

3.1 Is this a URL or search query?

When the user enters no protocol or no valid domain name, the browser passes the typed text to its default search engine. In many cases a special piece of text is appended to the URL to tell the search engine that the query came from a particular browser's URL bar.

3.2 Checking the HSTS list

The browser checks its “preloaded HSTS (HTTP Strict Transport Security)” list. This is a list of sites that have asked to be contacted over HTTPS only.
If the site is on the list, the browser sends its request over HTTPS instead of HTTP. Otherwise, the initial request is sent over HTTP. (A site can still use the HSTS policy without being on the HSTS list: in that case the first HTTP request receives a response telling the browser to send its requests over HTTPS only. That single HTTP request, however, can leave the user vulnerable to a downgrade attack, which is why modern browsers include the HSTS list.)
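The preload check can be sketched as a simple lookup before the request is sent. The set `PRELOADED_HSTS_HOSTS` below is a hypothetical stand-in for the browser's built-in list, and real browsers also handle details such as subdomain rules that are left out here.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for the browser's preloaded HSTS list.
PRELOADED_HSTS_HOSTS = {"google.com"}


def apply_hsts(url: str, preloaded=PRELOADED_HSTS_HOSTS) -> str:
    """Upgrade http:// to https:// when the host is on the preload list."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in preloaded:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(apply_hsts("http://google.com/"))
    print(apply_hsts("http://example.org/"))
```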

3.3 Converting Non-ASCII Unicode Characters to a Host Name

The browser checks the host name for characters other than a-z, A-Z, 0-9, - or . (a period).
Since the host name here is google.com there will be none, but if the domain contained non-ASCII characters, the browser would apply Punycode encoding to the host name portion of the URL.
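For a feel of what Punycode does, the sketch below uses Python's built-in `idna` codec, which applies the Punycode transformation label by label. The host name `bücher.example` is an illustrative value that does not come from the article, and browsers may follow newer IDNA rules than this codec does.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Convert a host name to its ASCII (Punycode) form, label by label."""
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_hostname("google.com"))      # plain ASCII stays unchanged
    print(to_ascii_hostname("bücher.example"))  # becomes an xn-- label
```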

4. DNS lookup

The browser checks whether the domain is in its cache.
If it is not, the browser calls the gethostbyname library function (which varies by OS) to do the lookup.
Before trying to resolve the domain through DNS, gethostbyname checks whether the host name can be resolved from the local hosts file (whose location varies by OS).
If the domain is neither cached nor listed in the hosts file, gethostbyname sends a request to the DNS server configured in the network stack. This is typically the local router or the ISP's caching DNS server.
If the DNS server is on the same subnet, the network library follows the ARP process described below for the DNS server.
If the DNS server is on a different subnet, the network library follows the ARP process described below for the default gateway IP.
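The lookup order described above (browser cache, then hosts file, then DNS server) can be sketched as follows. The dictionaries and the `query_dns` callback are hypothetical stand-ins for the real cache, the hosts file and the network resolver, and the address 192.0.2.10 in the example is a documentation-only placeholder.

```python
def resolve(hostname, browser_cache, hosts_file, query_dns):
    """Return (address, source) following the cache -> hosts -> DNS order."""
    if hostname in browser_cache:
        return browser_cache[hostname], "browser cache"
    if hostname in hosts_file:
        return hosts_file[hostname], "hosts file"
    # Only now does a request leave the machine.
    address = query_dns(hostname)
    browser_cache[hostname] = address  # remember the answer for next time
    return address, "dns server"


if __name__ == "__main__":
    cache = {}
    hosts = {"localhost": "127.0.0.1"}
    fake_dns = lambda name: "192.0.2.10"  # placeholder resolver
    print(resolve("localhost", cache, hosts, fake_dns))
    print(resolve("google.com", cache, hosts, fake_dns))
    print(resolve("google.com", cache, hosts, fake_dns))
```

A real resolver would also honor the time-to-live of each cached record, which this sketch ignores.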

4.1 The process of sending an ARP request

To send an ARP (Address Resolution Protocol) broadcast, the network stack needs the target IP address to look up, as well as the MAC address of the interface it will use to send the broadcast.

The ARP cache is first checked for an entry for the target IP. If there is one, the library function returns the result: Target IP = MAC.

If there is no cache entry:

The routing table is consulted to see whether the target IP address is on any of the subnets in the local route table. If it is, the library uses the interface associated with that subnet. If it is not, the library uses the interface of the subnet that contains the default gateway.
The MAC address of the selected network interface is looked up.
The network library sends a Layer 2 (data link layer) ARP request:

ARP request:

Sender MAC: interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
Target IP: target.ip.goes.here

Depending on what type of hardware sits between the computer and the router:

Directly connected:

If the computer is connected directly to the router, the router responds with an ARP reply.

Hub:

If the computer is connected to a hub, the hub broadcasts the ARP request out of all its other ports. If the router is connected to the same “wire”, it responds with an ARP reply.

Switch:

If the computer is connected to a switch, the switch checks its local CAM/MAC table to see which port has the MAC address being looked for. If the switch has no entry for the MAC address, it rebroadcasts the ARP request to all other ports.
If the switch has an entry in its table, it sends the ARP request to the port that has the MAC address being looked for.
If the router is on the same “wire”, it responds with an ARP reply.

ARP response:

Sender MAC: target:mac:address:here
Sender IP: target.ip.goes.here
Target MAC: interface:mac:address:here
Target IP: interface.ip.goes.here
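To show what the request looks like on the wire, here is a minimal sketch that assembles an Ethernet frame carrying an ARP request for IPv4. The MAC and IP addresses in the example are placeholders. Note one simplification relative to the listing above: in the frame built here the broadcast address FF:FF:FF:FF:FF:FF goes into the Ethernet destination field, while the target hardware address inside the ARP payload is left as zeros because it is not yet known.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame containing an ARP request (42 bytes)."""
    src = mac_to_bytes(sender_mac)
    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP).
    ethernet = b"\xff" * 6 + src + struct.pack("!H", 0x0806)
    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, operation 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += src + socket.inet_aton(sender_ip)
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)  # target MAC unknown
    return ethernet + arp


if __name__ == "__main__":
    frame = build_arp_request("02:00:00:00:00:01", "192.0.2.2", "192.0.2.1")
    print(len(frame), frame.hex())
```

Actually sending such a frame requires a raw socket and elevated privileges, which is outside the scope of this sketch.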

Now that the network library has the IP address of either the DNS server or the default gateway, it can resume the DNS lookup:

The DNS client opens a socket to UDP port 53 on the DNS server (if the response is too large, TCP is used instead).
If the local or ISP DNS server does not have the answer, a recursive search is requested; it travels up the list of DNS servers until the SOA is reached, and then the answer is returned.
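The UDP payload sent to port 53 has a compact binary layout. The sketch below builds a query for an A record with the "recursion desired" flag set, following the standard DNS message format; the query ID 0x1234 is an arbitrary example value and the function name is made up for illustration.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answer, 0 authority, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE 1 (A record), QCLASS 1 (IN, Internet).
    return header + qname + struct.pack("!HH", 1, 1)


if __name__ == "__main__":
    print(build_dns_query("google.com").hex())
```

Such a payload could be handed to a `socket.SOCK_DGRAM` socket with `sendto`, although in practice the system resolver does this on the browser's behalf.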

5. Opening Socket

Once the browser receives the IP address of the destination server, it takes that address and the port number from the URL (port 80 for HTTP, 443 for HTTPS), calls the system library function named socket and requests a TCP stream socket: AF_INET and SOCK_STREAM.

This request is first passed to the transport layer, where a TCP segment is built. The destination port is added to the header, and a source port is chosen from the kernel's dynamic port range (ip_local_port_range in Linux).
The segment is sent to the network layer, which wraps it in an additional IP header. The IP addresses of the destination server and of the current machine are inserted, forming a packet.
The packet then arrives at the link layer. A frame header is added that includes the MAC address of the machine's network card (NIC) and the MAC address of the gateway (local router). As before, if the kernel does not know the MAC address of the gateway, it must broadcast an ARP query to find it.
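The socket call and the kernel-chosen source port can be observed without touching the network. The sketch below is an assumption-laden illustration: instead of google.com it connects to a throwaway listener on the loopback interface, and the function name `open_loopback_connection` is made up for this example.

```python
import socket


def open_loopback_connection() -> dict:
    """Open a TCP connection to a local listener and report its ports."""
    # Throwaway server on the loopback interface; port 0 lets the OS pick one.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    destination = server.getsockname()

    # The client side: the same call a browser's network library would make.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(destination)
    accepted, peer = server.accept()

    info = {
        "family": client.family,
        "type": client.type,
        "source_port": client.getsockname()[1],  # chosen by the kernel
        "destination_port": destination[1],
        "peer_port": peer[1],                    # source port as the server sees it
    }
    for sock in (accepted, client, server):
        sock.close()
    return info


if __name__ == "__main__":
    print(open_loopback_connection())
```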

At this point, the packet is ready to be transmitted over the physical link.

For most home or small-business Internet connections, the packet passes from the computer, possibly through a local network, and then through a modem (MOdulator/DEModulator), which converts digital ones and zeros into an analog signal suitable for transmission over telephone, cable, or wireless telephony connections. On the other end of the connection is another modem, which converts the analog signal back into digital data and passes it to the next network node, where the source and destination addresses are analyzed further.

Eventually, the packet reaches the router that manages the local subnet. From there it keeps traveling from one router to the next until it arrives at the destination server. Each router along the way extracts the destination address from the IP header and routes the packet to the appropriate next hop. The TTL (time to live) field in the IP header is decremented by one at each router that the packet passes. The packet is dropped if the TTL field reaches zero, or if the current router has no space left in its queue (for example, because of network congestion).
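The TTL rule can be illustrated with a toy simulation. The router names and TTL values below are made up, and queue overflow is not modeled; the sketch only shows the decrement-and-drop behavior.

```python
def forward_packet(ttl: int, routers: list):
    """Simulate hop-by-hop forwarding.

    Returns (delivered, remaining_ttl, dropped_at).
    """
    for router in routers:
        ttl -= 1  # each router decrements the TTL by one
        if ttl <= 0:
            return False, 0, router  # TTL exhausted: the packet is dropped here
    return True, ttl, None


if __name__ == "__main__":
    path = ["home-router", "isp-router", "backbone-router"]
    print(forward_packet(64, path))
    print(forward_packet(2, path))
```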

This send-and-receive cycle repeats many times over the life of a TCP connection.

5.1 TCP Connection Life Cycle

a. The client selects the initial sequence number (ISN) and sends the packet to the server with the SYN bit set to open the connection.

b. The server receives the packet with the SYN bit and, if it is willing to accept the connection:

Chooses its own initial sequence number;
Sets the SYN bit to indicate that it is choosing its ISN;
Copies the client's ISN + 1 into the ACK field and sets the ACK flag to acknowledge receipt of the first packet.

c. The client acknowledges the connection by sending a packet that:

Increases its own sequence number;
Increases the receiver acknowledgment number;
Sets the ACK flag.

d. Data is transferred as follows:

When one side sends N bytes of data, it increases its SEQ by that number.
When the other side acknowledges receipt of that packet (or a string of packets), it sends an ACK packet whose ACK value is the next sequence number it expects, that is, the last sequence number received plus the length of the data.

e. To close the connection:

The side that wants to close sends a FIN packet;
The other side ACKs the FIN packet and sends its own FIN;
The closer acknowledges the other side's FIN with an ACK.
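The sequence-number arithmetic of the handshake and of data transfer can be sketched as below. The ISN values used in the example are arbitrary, real stacks choose them unpredictably, and the dictionaries are only an illustrative stand-in for TCP segments.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as simple dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {"flags": "SYN,ACK", "seq": server_isn,
               "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"flags": "ACK", "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


def next_seq(seq: int, payload_length: int) -> int:
    """Sequence number after sending payload_length bytes."""
    return (seq + payload_length) % SEQ_SPACE


if __name__ == "__main__":
    for segment in three_way_handshake(1000, 5000):
        print(segment)
    # After sending 300 bytes starting at seq 1001, the peer ACKs 1301.
    print(next_seq(1001, 300))
```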

6. TLS handshake

The client computer sends a ClientHello message to the server with its TLS version, a list of supported cipher algorithms and compression methods.
The server replies with a ServerHello message containing the TLS version, the selected cipher, the selected compression methods and the server's public certificate signed by a CA (certificate authority). The certificate contains a public key that the client will use to encrypt the rest of the handshake until a symmetric key can be agreed upon.
The client verifies the server's certificate against its list of trusted CAs. If trust can be established on the basis of the CA, the client generates a string of pseudo-random bytes and encrypts it with the server's public key. These random bytes can be used to determine the symmetric key.
The server decrypts the random bytes using its private key and uses them to generate its own copy of the symmetric master key.
The client sends a Finished message to the server, encrypting a hash of the transmission up to this point with the symmetric key.
The server generates its own hash and then decrypts the client-sent hash to verify that it matches. If it does, the server sends its own Finished message to the client, also encrypted with the symmetric key.
From then on, the TLS session carries the application (HTTP) data encrypted with the agreed symmetric key.
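Application code rarely performs these steps by hand; a TLS library does. As a rough sketch, the snippet below creates a client-side context with Python's standard `ssl` module and inspects the settings that correspond to the certificate check described above. It does not open a connection, and the commented-out lines at the end are only a hint at how the context would be used.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a TLS client context that verifies server certificates."""
    # Loads the system's trusted CA certificates and enables verification.
    return ssl.create_default_context()


if __name__ == "__main__":
    context = make_client_context()
    print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
    print("host name checked:", context.check_hostname)
    # Wrapping an already connected TCP socket would run the handshake:
    #   tls_sock = context.wrap_socket(tcp_sock, server_hostname="google.com")
```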

7. HTTP protocol

If the browser was written by Google, then instead of sending an HTTP request to retrieve the page it will send a request to try to negotiate with the server an “upgrade” from HTTP to the SPDY (“speedy”) protocol.

If the client uses the HTTP protocol and does not support SPDY, it sends a request of the following form to the server:

GET / HTTP/1.1
Host: google.com
Connection: close
[ ]

where [ ] refers to a series of colon-separated key: value pairs, each followed by a line break. (This assumes that the browser has no bugs that violate the HTTP spec. It also assumes that the browser uses HTTP/1.1; otherwise it may not include the Host header in the request, and the version specified in the GET request will be either HTTP/1.0 or HTTP/0.9.)

HTTP/1.1 defines the “close” connection option so that the sender can signal that the connection will be closed after the response is complete. For example:

Connection: close

HTTP/1.1 applications that do not support persistent connections must include the “close” connection option in every message.

After sending the request line and headers, the browser sends a single blank line to the server, indicating that the request is complete.
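The exact bytes of such a request, including the terminating blank line, can be assembled as in the sketch below. The helper name `build_get_request` and the optional extra headers are illustrative; lines are terminated with CRLF as the HTTP/1.1 message format requires.

```python
def build_get_request(host: str, path: str = "/", extra_headers=None) -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; one extra CRLF (a blank line) ends the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(build_get_request("google.com").decode("ascii"))
```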

The server responds with a response code that denotes the status of the request, in the following form:

200 OK
[ ]

This is followed by a single blank line and then the rest of the response: the HTML content of www.google.com. The server may then close the connection or, if the headers sent by the client requested it, keep the connection open to be reused for further requests.

If the HTTP headers sent by the browser contained enough information for the server to determine that the version of the file cached by the browser has not been modified since the last retrieval, the response may instead take the following form:

304 Not Modified
[ ]

In that case no content is sent to the client; instead, the browser retrieves the HTML from its cache.
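One common way for the server to decide between the two responses is an entity-tag comparison: the browser sends the tag of its cached copy in an If-None-Match header, and the server compares it with the tag of the current version. The sketch below shows only that comparison; the tag values are made up, and real servers also support other validators such as If-Modified-Since.

```python
def respond(current_etag: str, request_headers: dict, body: bytes):
    """Return (status_line, body) for a conditional GET based on ETags."""
    if request_headers.get("If-None-Match") == current_etag:
        # The cached copy is still valid: send no body at all.
        return "HTTP/1.1 304 Not Modified", b""
    return "HTTP/1.1 200 OK", body


if __name__ == "__main__":
    page = b"<html>...</html>"
    print(respond('"v2"', {}, page))
    print(respond('"v2"', {"If-None-Match": '"v2"'}, page))
```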

After parsing the HTML, the browser (and the server) repeats this process for every resource (image, CSS, script, favicon.ico, and so on) referenced by the HTML page, except that instead of GET / HTTP/1.1 the request will be GET /$(URL relative to www.google.com) HTTP/1.1.

If the HTML references a resource on a domain other than google.com, the browser goes back to the steps involved in resolving that domain and follows all the steps up to this point for it. The Host header in the request will be set to the appropriate server name instead of google.com.

7.1 Processing HTTP requests on the server

HTTPD (HTTP Daemon) is the software that handles requests and responses on the server side. The most common HTTPD servers are Apache or nginx for Linux and IIS for Windows.

- The HTTPD (HTTP Daemon) receives the request.

- The server breaks the request down into the following parameters:

The HTTP request method (GET, POST, HEAD, PUT or DELETE). For a URL typed directly into the address bar, this will be GET.
The domain, in this case google.com.
The requested path or page, in this case / (no particular path was requested, so / is the default).

- The server verifies that a virtual host corresponding to google.com is configured.

- The server verifies that google.com can accept GET requests.

- The server verifies that the client is allowed to use this method (based on IP address, authentication, and so on).

- If the server has a rewrite module installed (mod_rewrite for Apache or URL Rewrite for IIS), it tries to match the request against one of the configured rules. If a matching rule is found, the server uses it to rewrite the request.

- The server pulls the content that corresponds to the request, which in our case falls back to the index file.

- The server then parses the file with the appropriate handler. If Google is running on PHP, the server uses PHP to interpret the index file and streams the output to the client.
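The first of these steps, breaking the request down into method, domain and path, can be sketched as follows. This is a simplified illustration rather than how Apache, nginx or IIS actually implement it, and the function name `parse_request` is made up.

```python
SUPPORTED_METHODS = {"GET", "POST", "HEAD", "PUT", "DELETE"}


def parse_request(raw: bytes) -> dict:
    """Extract the method, path, version and Host header from a request."""
    # Everything before the first blank line is the request line plus headers.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return {"method": method, "path": path, "version": version,
            "host": headers.get("host")}


if __name__ == "__main__":
    raw = b"GET / HTTP/1.1\r\nHost: google.com\r\nConnection: close\r\n\r\n"
    print(parse_request(raw))
```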

8. Behind the scenes of the browser

The task of the browser is to present the web resource the user chose, by requesting it from the server and displaying it in the browser window. The resource is usually an HTML document, but it may also be a PDF, an image, or some other type of content. The location of the resource is specified by its URL.

The way the browser interprets and displays HTML files is specified in the HTML and CSS specifications. These specifications are maintained by the World Wide Web Consortium (W3C), the standards organization for the web.

Browser user interfaces have a lot in common with each other. Among the common elements are:

An address bar for entering URLs;
Back and forward buttons;
Bookmarking options;
Refresh and stop buttons for refreshing the page or stopping the loading of the current document;
A “home” button that takes the user to the home page.

High-level browser structure

The browser includes the following components:

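User interface: The address bar, back/forward buttons, bookmarks menu, and so on: every part of the browser display except the window in which the requested page is shown.
Browser engine: Marshals actions between the user interface and the rendering engine.
Rendering engine: Responsible for displaying the requested content. For example, if the requested content is HTML, the rendering engine parses the HTML and CSS and displays the parsed content on the screen.
Networking: Handles network calls such as HTTP requests, using different implementations for different platforms.
UI backend: Draws basic widgets such as combo boxes and windows.
JavaScript engine: Parses and executes JavaScript code.
Data storage: A persistence layer. The browser may need to save all sorts of data locally (for example, cookies). Browsers also support storage mechanisms such as localStorage, IndexedDB, WebSQL and FileSystem.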

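9. HTML parsing

The rendering engine starts receiving the contents of the requested document from the networking layer, usually in 8 KB chunks. The main job of the HTML parser is to turn the markup into a parse tree.

The output tree (the “parse tree”) is a tree of DOM elements and attribute nodes. DOM stands for Document Object Model. It is the object representation of the HTML document and the interface of HTML elements to the “outside world” (for example, JavaScript code). The root of the tree is the “Document” object.

HTML cannot be parsed with “regular” parsers (top-down or bottom-up). There are several reasons for this:

The forgiving nature of the language;
The fact that browsers traditionally tolerate errors in order to support well-known cases of invalid HTML.
The parsing process is reentrant. In other languages the source does not change during parsing, but in HTML dynamic code (for example, script elements containing document.write() calls) can add extra tokens, so parsing actually modifies the input.

Since regular parsing techniques do not apply, the browser uses a custom parser for HTML. The parsing algorithm is described in detail in the HTML5 specification.

The algorithm consists of two stages: tokenization and tree construction.

When parsing is finished, the browser starts fetching the external resources linked to the page (CSS, images, JavaScript files, and so on).

At this stage the browser marks the document as interactive and starts parsing scripts in “deferred” mode: those that should run after the document has been parsed. The document state is then set to “complete” and a “load” event fires.

Note: an HTML page never produces an “Invalid Syntax” error; browsers “fix” any invalid content and carry on.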
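The two stages of HTML parsing, tokenization and tree construction, can be imitated with Python's standard `html.parser` module, which acts as the tokenizer, plus a small stack that builds the tree. This is a toy sketch under stated assumptions and not the HTML5 algorithm: the class name `TreeBuilder`, the node dictionaries and the short list of void elements are all made up for illustration. It does show the forgiving behavior, though: an unclosed tag is closed implicitly instead of raising a syntax error.

```python
from html.parser import HTMLParser

# Elements that never have children (short illustrative list).
VOID_ELEMENTS = {"br", "img", "meta", "link", "input", "hr"}


class TreeBuilder(HTMLParser):
    """Build a nested dictionary tree from the tokens HTMLParser emits."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element and everything nested in it;
        # stray end tags with no matching open element are ignored.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index]["tag"] == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["children"].append(
                {"tag": "#text", "text": data.strip()})


def parse_html(markup: str) -> dict:
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


if __name__ == "__main__":
    # Note the unclosed <b>: it is closed implicitly when </p> arrives.
    print(parse_html("<html><body><p>Hello<b>world</p></body></html>"))
```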

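10. CSS interpretation

The browser parses CSS files, the contents of <style> tags and the values of “style” attributes using the “CSS lexical and syntax grammar”.
Each CSS file is parsed into a StyleSheet object, each of which contains CSS rules with selectors and objects corresponding to the CSS grammar.
A CSS parser can be top-down or bottom-up, depending on the parser generator used.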

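11. Page rendering

The browser creates a “frame tree” or “render tree” (Render Tree or Frame Tree) by traversing the DOM nodes and calculating the CSS style values for each node.
The preferred width of each node in the tree is calculated bottom-up, by summing the preferred widths of its child nodes and the node's horizontal margins, borders, and padding.
The actual width of each node is calculated top-down (by allocating each node's available width to its children).
The height of each node is calculated bottom-up, by applying text wrapping and summing the heights of its child nodes and the node's margins, borders, and padding.
The coordinates of each node are calculated (using the information computed above).
More complicated steps are taken when elements are floated, positioned absolutely or relatively, or use other complex features. See the CSS specifications for details.
Layers are created to describe which parts of the page can be animated as a group without being re-rasterized. Each frame (render) object is assigned to a layer.
Textures are allocated for each layer of the page.
The frame (render) objects of each layer are traversed and drawing commands are executed for their layer. The layer may be rasterized by the CPU or drawn directly on the graphics processor (GPU) using D2D/SkiaGL.
All of the steps above may reuse values calculated the last time the page was rendered, so that incremental changes require less work.
The page layers are sent to the compositing process, where they are combined with the layers of other visible content (such as the browser chrome, iframes and addon panels).
The final layer positions are computed and the compositing commands are issued via Direct3D/OpenGL. The GPU command buffers are flushed to the GPU for asynchronous rendering, and the frame is sent to the window server.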

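12. GPU rendering

During rendering, the graphical computing layers can use either the general-purpose CPU or the graphics processor (GPU).
When the GPU is used for rendering computations, the graphical software layers split the task into multiple pieces so that it can take advantage of the GPU's massive parallelism for the floating-point calculations that rendering requires.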

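13. Post-rendering and user-induced execution

After rendering has completed, the browser executes JavaScript code as a result of some timing mechanism (such as a Google Doodle animation) or of user interaction (typing a query into the search box and receiving suggestions). Plugins such as Flash or Java may execute as well (although not at this time on the Google homepage). Scripts can cause additional network requests to be performed, and can also modify the page or its layout, triggering another round of page rendering and painting.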

Source: https://habr.com/ru/post/254825/
