📜 ⬆️ ⬇️

Diagnosing Network Problems with Looking Glass

Looking glass

For many of our clients, the criterion for choosing server services is network connectivity. Network connectivity is understood as the degree of interaction of a network of one operator with the networks of other operators and, as a result, the number of routes and the number of intermediate nodes for Internet traffic.

It is convenient to test network connectivity using services called “Looking Glass” (translated from English as a mirror). They allow you to check the routing from a remote network. Many organizations have such services (information about all the Looking Glass of the world can be found, for example, here ).
')
Our company also has its “Looking Glass” service. It is located at: http://lg.selectel.ru . Diagnostics with the help of "Looking Glass" allows you to specify the malfunction and put forward a reasonable assumption about its cause. The diagnostic procedures that can be carried out using this service will be discussed below.

The interface of our “Looking Glass” is extremely simple:
Looking Glass Interface
All commands are executed from two routers located in St. Petersburg and in Moscow (you can select a router by checking the Router box; you can check from two routers at the same time).

Our “Looking Glass” service executes the following commands (the command is selected in the “Operation” dropdown menu):

After selecting a command, you need to specify the address of the destination host in the “Host” field (you can enter both an IP address and a DNS name) and click on the “Do it” button. Let us consider in more detail how each of these commands works and how their results should be interpreted.

Ping command


Using this command, you can check the availability of the specified host using the special ICMP protocol. The principle of its operation is as follows: having received the ping command, the router sends requests to the specified host, which, in turn, sends it back (to the address where it came from). From the results of the ping request, you can find out (1) whether the sent packets returned back and (2) how long (in milliseconds) it took for the packets to go back and forth.

The results of the ping command might look like this:
  PING 217.69.139.201 (217.69.139.201): 56 data bytes
 64 bytes from 217.69.139.201: icmp_seq = 0 ttl = 62 time = 8.776 ms
 64 bytes from 217.69.139.201: icmp_seq = 1 ttl = 62 time = 8.676 ms
 64 bytes from 217.69.139.201: icmp_seq = 2 ttl = 62 time = 10.024 ms

 --- 217.69.139.201 ping statistics ---
 3 packets transmitted, 3 packets received, 0% packet loss
 round-trip min / avg / max / stddev = 8.676 / 9.159 / 10.024 / 0.613 ms 

From this example, it can be seen that the destination node regularly responds to requests, there are no packet losses. Consider another example:

  PING www.jnto.go.jp (210.165.34.236): 56 data bytes
 64 bytes from 210.165.34.236: icmp_seq = 0 ttl = 241 time = 325.734 ms
 64 bytes from 210.165.34.236: icmp_seq = 1 ttl = 241 time = 325.894 ms
 64 bytes from 210.165.34.236: icmp_seq = 2 ttl = 241 time = 334.896 ms

 --- www.jnto.go.jp ping statistics ---
 3 packets transmitted, 3 packets received, 0% packet loss
 round-trip min / avg / max / stddev = 325.734 / 328.841 / 334.896 / 4.282 ms 

In this case, the results show that the destination node is available, there are no packet losses, but the response time is slightly increased due to geographical distance: the destination node is in Japan, and requests are sent from St. Petersburg.

Sometimes this situation may occur:
  PING 89.108.112.69 (89.108.112.69): 56 data bytes

 --- 89.108.112.69 ping statistics ---
 3 packets transmitted, 0 packets received, 100% packet loss 

The results of the command in this case show that none of the packets reached the target and that the node is most likely unavailable or not responding to ICMP requests.

Thus, using a ping request, you can check the connection to the remote host. The very fact of the successful return of requests gives reason to assume that all devices on the way from the router to the specified node are working normally. It should be noted that packet loss can occur even in the absence of malfunctions: it can be caused, for example, by network congestion. It often happens that routers give diagnostic packets a low priority. But if at least one of the sent packets is returned, this already indicates the presence of a connection and the availability of the node.

Based on the results of the ping request, it is impossible to make a clear conclusion about the presence of a fault. You can only put forward an assumption that needs additional verification using other tools.

Traceroute command


Problems with the availability of web services may also be due to technical problems at the intermediate nodes through which the packets pass on the way to the destination host. You can identify these problems using the traceroute command.

How does this command work? One of the characteristics of packets sent over the network is the time to live (English time to live, abbreviated to TTL) - the number of hops (English hop - jump, that is, the transfer of a packet from one router to another) or time in milliseconds, during which package may be online. Each router that processes the packet decreases the TTL value by one. When the TTL becomes zero, the packet is destroyed and the ICMP message time exceeded is sent to the sender (timed out). Initially, this mechanism was used to prevent infinite copying of network packets during an erroneous looping of the network.

Having received the traceroute command, the router sends a packet with TTL = 1 in the direction of the destination host. The node from which the response time exceeded comes is defined as the first hop (i.e., the first step on the way to the goal). Then, packets with TTL = 2,3,4,5 are sent alternately, and so on, until one of the packets reaches the destination node and receives a response from it.

As a result, a list of all intermediate nodes in the path of the packet from the router to the destination node is displayed. It may look, for example, like this:

  traceroute to 89.108.112.69 (89.108.112.69), 30 hops max, 40 byte packets
  1 sap-b2-link.telia.net (213.248.86.117) 6.278 ms 62.572 ms 9.668 ms
  2 hls-b2-link.telia.net (213.155.131.128) 9.812 ms hls-b2-link.telia.net (80.91.245.172) 12.487 ms hls-b2-link.telia.net (80.91.245.170) 12.088 ms
  3 hls-b1-link.telia.net (213.155.133.132) 15.486 ms 12.360 ms s-bb2-link.telia.net (213.155.133.74) 90.086 ms
  4 s-b2-link.telia.net (213.155.131.33) 22.196 ms s-b2-link.telia.net (213.155.133.143) 22.272 ms s-b2-link.telia.net (80.91.246.235) 21.197 ms
  5 s-b2-link.telia.net (213.155.133.137) 21.807 ms s-b2-link.telia.net (80.91.247.217) 22.231 ms vimpelcom-ic-152423-s-b2.c.telia.net (213.155 .129.118) 21.936 ms
  6 * * *
  7 * 81.211.13.162 (81.211.13.162) 58.058 ms 152.715 ms
  8 te6-1.rt1.dc5.agava.net (89.108.112.242) 40.044 ms 32.834 ms 39.235 ms
  9 te6-1.rt1.dc5.agava.net (89.108.112.242) 40.780 ms 43.640 ms *
 ten * * *
 eleven * * * 

What can be seen from the presented result?

In the ninth transition, an asterisk is displayed instead of one of the response times. This means that no reply was received to any of the packets sent. This can be caused, for example, by network overload. Or the fact that many routers drop low-priority ICMP packets. The appearance of such asterisks in the traceroute output is a typical situation that is not cause for concern.

If three asterisks are displayed for one of the intermediate routers, this means that no response was received from it. But one should not conclude that there is a malfunction on the basis of these asterisks alone. Their appearance may be due to a variety of reasons. For example, routers are often configured so that they “silently” discard obsolete packets. In this case, the packets safely pass to the next router.

The reason may be that the response packets from this router take too long, and Looking Glass stops waiting for them to arrive. If three asterisks are displayed for nodes at the end of the route (as in the example above), then this most likely indicates that the packets did not reach the destination node.

Interpreting traceroute findings is a more complex and delicate task than it might seem at first glance; You can read more about this here and here (in English).

BGP diagnostic commands


BGP (English Board Gateway Protocol, Border Gateway Protocol) is the main dynamic routing protocol on the Internet. It is designed to exchange information not between individual routers, but between autonomous systems. An autonomous system, as defined in RFC1930, is a system of IP networks and routers, managed by one or several operators and having a separate routing policy with the Internet. Actually, the Internet can be viewed as a set of interconnected autonomous systems.

Using the BGP protocol, autonomous systems communicate with each other: (1) the fact of their existence and (2) which networks can be obtained through them. They also collect information on how to get to other networks on the Internet. Receiving information about routes to the destination, they determine the best from them (based on network rules, not technical metrics) and add to their routing tables. That is why the BGP protocol is sometimes called the glue that binds the entire Internet.

BGP was created at a time when there were not many problems and dangers on the Internet that exist today, and it is characterized by increased vulnerability. Errors in the operation of the protocol, which can be caused both by technical problems of a particular autonomous system, and by deliberate actions of intruders, can lead to very serious consequences (for more information about this you can read, for example, here ). As a result of these errors, the traffic is redirected and / or dropped, without reaching the destination network - because of this, there are problems with network availability.

You can check BGP protocol operation using bgp route detail, bgp route terse and bgp summary commands. The result of the bgp route detail command is a routing table with a list of all autonomous stations in the packet path. The bgp terse command displays an abbreviated version of the routing table. The bgp summary command lists all autonomous systems with which our routers have connectivity.

Conclusion


In this article, we have provided only a brief description of the capabilities of our “Looking Glass” service. Diagnosing network problems is a topic that cannot be covered in a single publication. If you have any questions about the work of our «Looking Glass» - welcome to the comments. We will also be happy to constructive comments and suggestions on the service.

For those who can not comment on posts on Habré, we invite to our blog .

Source: https://habr.com/ru/post/191700/


All Articles