The mistakes of our youth. Or how a network engineer learns the nature of the console.

Hello! Today I would like to talk about the typical mistakes we make when configuring network equipment. I think everyone knows the feeling: you enter yet another command and suddenly realize that the console has stopped responding, and the device on the other side of the Earth (the one you were just configuring) no longer answers ping. At that moment your breath stops for a second and you begin to hear your own heartbeat clearly. Your sharpened hearing focuses entirely on the phone call that is about to come. Your body tenses and you realize that this is the end. Of course, it is not really the end, but the beginning of a heroic struggle with the situation you have just put yourself in. Then comes the feat, the story about the feat and, perhaps, even a standing ovation from colleagues. Most importantly, you once again vow never to do this again. Many will object that this is not about them, that they are real professionals who do not let this happen. That is wonderful, and I am genuinely happy for them. But real professionals are not born that way, and they do not become professionals overnight.

Today I would like to look at situations that are covered in any boot camp for console novices, but that acquire real meaning only after you have stepped on them yourself. Initially I wanted to dwell on specific mistakes in configuring Cisco equipment, but while preparing the article I decided to cover more general points. Some readers will find them trivial, some will smile, remembering that they once did the same, and some, perhaps, will use them to stay out of an unpleasant situation.
Let's do it quietly

Something in the network needs to be changed. We decide to do it quietly, without warning anyone. With luck it will go unnoticed, and we will not have to get the reconfiguration approved and spend time on that. After all, if you ask for approval, you may, first, be asked to do the work outside business hours. Second, approval of even minor work usually requires a detailed procedure listing the risks, the time windows, the instructions for rolling back to the previous working configuration and a plan for testing the new one. As a result, applying one or two commands requires a lot of preparatory work. Without all that, you can just configure and forget. It is only a small change, after all. But, by a well-known law, doing it quietly does not always work out. It is good if everything is over quickly and there are few victims. It is worse if the CEO of the company learns about your small network reconfiguration.

Recommendation

Do not rush. It is better to weigh everything once more and, if possible, warn others. Forewarned is forearmed, as they say. If you are not completely sure that everything will go smoothly, move the work to the least painful time. And even if the work is done outside business hours, it is still better to notify others. Perhaps today is exactly the day someone decided to stay late at work.

One hundred troubles - one reset

Fully confident in your abilities, you decide to change some important settings on a device remotely. The device, moreover, is located in a distant and beautiful city of our Motherland. Nobody will travel there for this: it would be possible, but management would not understand. So we arm ourselves with remote access and pick a time when short network interruptions are acceptable. We connect. Click, click. And then, unexpectedly, we lose access to the device. There can be many reasons, from plain inattention (typing a two instead of a one in an address) to not knowing how a particular technology works (well, we did not read up on it, it happens to everyone). It no longer matters. Now someone has to be asked to reboot the device: we had not saved the configuration yet, so a reboot will bring back the old one. But it turns out that nobody can reboot it. Quite simply, it is night there and nobody is on site, because it is a completely different time zone. We will have to wait until morning, and morning there begins when it is night here. In short, as the saying goes, a foolish head gives the feet no rest.

Recommendation

You can schedule an automatic reboot of the device after a set time:

cbs-rtr# reload in 10
* In this example, the router will reboot in 10 minutes

If something goes wrong and you lose access to the device, it will reboot after the specified time with the last saved configuration, and you can start all over again. Of course, this command can be used only if rebooting the device is acceptable, and it helps only as long as you have not saved the changes.

Do not forget to cancel the scheduled reboot once the work is done. Otherwise you are in for a small surprise later on:

cbs-rtr# reload cancel
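
Before and after the work, it helps to check whether a reboot is actually pending. A minimal sketch, assuming a classic IOS device (the output wording differs between releases):

cbs-rtr# show reload // Shows whether a reload is scheduled and how much time is left

On releases that support the configuration archive and rollback feature, a timed rollback can serve as a gentler alternative to a full reboot. The archive path flash:rollback is a hypothetical example, not part of the original setup:

cbs-rtr(config)# archive
cbs-rtr(config-archive)# path flash:rollback // Where rollback copies of the configuration are kept (example path)
cbs-rtr(config-archive)# end
cbs-rtr# configure terminal revert timer 10 // Enter configuration mode; changes are rolled back in 10 minutes unless confirmed
cbs-rtr# configure confirm // Run from exec mode after checking access, to keep the changes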

Now how about you?

Very often, while solving a problem, you need to run debugging on the device (from here on, the more familiar word “debug” is used). Sometimes the debug we need generates a lot of messages, especially on a device that is busy earning its keep. So we start the debug, having first enabled its output in the terminal session. Naturally, we want to find the error right away, analyzing on the fly. But then the console starts to lag, and you suddenly forget that you ever had some minor problem. It is good if you manage, with difficulty, to type the cherished “undebug all”. It is worse if the only thing you can observe is the still image of a hung console.

Recommendation

We recommend sending debug output exclusively to the buffer, and setting the buffer size beforehand. It should not be too small (otherwise not all of the lines we want will fit), but also not too large (otherwise the device may fail, since all log messages are stored in its RAM).

cbs-rtr(config)# logging console informational // Stop sending debug messages to the console
cbs-rtr(config)# logging buffered debugging // Log debug messages to the device buffer
cbs-rtr(config)# logging buffered 100000 // Set the buffer size, here 100,000 bytes
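
With the output going to the buffer, the debug session can be read back at your own pace. A minimal sketch using standard IOS commands; the debug chosen here, debug ip icmp, is only an illustrative assumption:

cbs-rtr# debug ip icmp // Start the debug we need (example)
cbs-rtr# show logging // Read the buffered messages
cbs-rtr# undebug all // Stop all debugging when finished
cbs-rtr# clear logging // Optionally empty the buffer before the next attempt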

When the debug produces a great many log records, you can send them to an ordinary syslog server so that they are not stored on the device at all:

cbs-rtr(config)# logging host 10.0.0.1 // Specify the IP address of the syslog server
cbs-rtr(config)# logging trap debugging // Enable sending debug messages to the syslog server

So what was this about?

The network, like a living organism, changes all the time. New services appear, new equipment is connected, old equipment is disconnected. Changes have to be made constantly, and it is impossible to remember them all. So sometimes, having connected to a device, you lose a lot of time just trying to recall what a particular route was configured for, or why a particular line in an access list (ACL) is needed. The situation gets worse if several people manage the network. So what can be done? The first answer is to document. I agree, there is no way around it. But in real life it is very hard to keep documentation up to date, especially detailed documentation. As a compromise, the device configuration itself can serve as a reminder.

Recommendation

When creating various objects in the configuration, we recommend using self-explanatory names. For example, an ACL that will be applied to the external interface of the router in the inbound direction can be called “FW-OUTSIDE-IN”. Later, when you look through the configuration and see an ACL with that name, it is immediately clear why it is there. Such names can also be given to a class-map, policy-map, object-group, route-map and so on.
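
As an illustration, a named ACL and its binding might look like this. Only the name FW-OUTSIDE-IN comes from the text above; the interface and the single rule are assumptions made up for the example:

cbs-rtr(config)# ip access-list extended FW-OUTSIDE-IN
cbs-rtr(config-ext-nacl)# permit tcp any host 192.0.2.10 eq 443
cbs-rtr(config-ext-nacl)# exit
cbs-rtr(config)# interface GigabitEthernet0/0
cbs-rtr(config-if)# ip access-group FW-OUTSIDE-IN in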

The second point: do not forget to add descriptions to lines in the configuration. This can be done, for example, with the following commands:

for the interface - description;
for crypto map - description;
for route-map - description;
for ACL - remark;
for static route - name.
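
In a configuration, some of these commands could be used as follows. All names, addresses and description texts are invented for illustration:

cbs-rtr(config)# interface GigabitEthernet0/1
cbs-rtr(config-if)# description Uplink to ISP-A
cbs-rtr(config-if)# exit
cbs-rtr(config)# ip access-list extended FW-OUTSIDE-IN
cbs-rtr(config-ext-nacl)# remark Allow HTTPS to the public web server
cbs-rtr(config-ext-nacl)# permit tcp any host 192.0.2.10 eq 443
cbs-rtr(config-ext-nacl)# exit
cbs-rtr(config)# ip route 10.20.0.0 255.255.0.0 192.0.2.1 name BRANCH-OFFICE-2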

Nothing is more permanent than temporary.

When solving problems, you sometimes have to run various debugs, capture traffic (capture), mirror traffic (SPAN / RSPAN / ERSPAN), use test ACLs and so on. As soon as the problem is solved, relief sets in (sometimes even euphoria: you have just become the master of Ciscos), and there is no particular desire left to deal with all the temporary settings. It gets even worse when the fight with the problem is going on several fronts at once. The glory of the victory on one of them keeps us from noticing the other fronts, where the debugs, captures and other tools thrown into the attack are still heroically fighting an enemy that no longer exists. I think there is no need to describe in detail what this can lead to. At the very least, it puts extra load on the device and even on the whole network, which can work against us at the most inopportune moment.

Another side of fighting problems is temporary routing or traffic handling schemes, disabled functions (when we try to pin down the problem by elimination) and so on. It is not always easy to remember later what you tried to do in a panic, or what you disabled while solving the problem.

Recommendation

Do not forget to turn off debugs and captures and to delete test ACLs and other temporary configuration. All functions that were disabled while searching for the problem must be re-enabled.
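
A short cleanup pass at the end of troubleshooting might look like this. It is a sketch for a Cisco IOS device; the session number 1 and the ACL name TEST-ACL are hypothetical, and capture commands differ too much between platforms to be shown here:

cbs-rtr# show debugging // Check which debugs are still running
cbs-rtr# undebug all // Turn all of them off
cbs-rtr# show monitor session all // Look for leftover SPAN sessions (on platforms that support SPAN)
cbs-rtr# configure terminal
cbs-rtr(config)# no monitor session 1 // Remove the temporary SPAN session
cbs-rtr(config)# no ip access-list extended TEST-ACL // Remove the test ACL, after unbinding it from any interface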

Problems again…

Sometimes the network fails. Yes, that happens too. The main thing is that it happens less often. At such a moment it is very important not to panic but to look for the problem with a cool head. If you find the cause quickly enough, the incident passes with little or no consequences and without disturbing your peace of mind. It is quite another matter when all attempts to understand why things are not working and to fix the situation are in vain. In that case, panic grows in direct proportion to the time elapsed since the fault was discovered. All kinds of requests from colleagues, saying that it is extremely important to fix everything as soon as possible, or else..., act as an additional catalyst. I think there are many variations of what follows the words “or else”, and they can hardly be considered words of support.

In such a situation, the main thing is to structure the troubleshooting process and under no circumstances to tweak things haphazardly here and there in the hope of guessing where it hurts. Such behavior very often traps a person in a loop, constantly trying to solve the problem by doing roughly the same thing over and over.

Recommendation

It is highly advisable to think the troubleshooting process through in advance, breaking it into stages and choosing a specific approach (for example, bottom-up: we start by checking the physical layer of the OSI model and then move upwards). It is very important to analyze the results continuously in order to adjust the next steps. In the end, you will at least be able to localize the problem, and then either solve it or come up with a workaround.
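
A bottom-up pass could be sketched with standard show commands, one layer at a time. The interface name and the address 10.0.0.1 are placeholders:

cbs-rtr# show interfaces GigabitEthernet0/0 // Layers 1-2: line status, errors, drops
cbs-rtr# show ip interface brief // Layer 3: addresses and interface state
cbs-rtr# show ip arp // Layers 2-3: is the next hop resolved?
cbs-rtr# show ip route 10.0.0.1 // Layer 3: which route is used for the destination
cbs-rtr# ping 10.0.0.1 // Basic reachability
cbs-rtr# traceroute 10.0.0.1 // Where along the path the traffic stops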

New does not mean better

Replacing IOS is a common procedure. We do it to get rid of some problem or to obtain new functionality. But a newer IOS is not always more reliable. It happens that by installing an IOS in which the bug that bothered us is fixed, we get a new one as a gift. As they say, we cure one thing and cripple another. Of course, this does not happen that often, but you should be prepared for it. It is not for nothing that large networks usually run specific reference versions of IOS, and replacing them is quite an undertaking.

Recommendation

After replacing IOS, it is recommended to check that the device performs its functions correctly. Of course, it is impossible to check everything at once. Nor is it worth it, as that would look a little paranoid. It is enough to keep in mind that if a problem appears on this device in the near future, the recent IOS replacement may be one of the causes.
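
One way to keep a path back is to leave the previous image in flash and list it as a fallback in the boot sequence. A sketch, assuming flash has room for both images; the file names are hypothetical:

cbs-rtr# show version // Note the current version and image before the upgrade
cbs-rtr# dir flash: // Make sure both images are present
cbs-rtr# verify /md5 flash:new-image.bin // Compare the hash with the one published by the vendor
cbs-rtr# configure terminal
cbs-rtr(config)# no boot system // Clear the existing boot statements
cbs-rtr(config)# boot system flash:new-image.bin // Try the new image first
cbs-rtr(config)# boot system flash:old-image.bin // Fall back to the old one if the new image fails to load
cbs-rtr(config)# end
cbs-rtr# copy running-config startup-config // Boot statements take effect only once saved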

Conclusion

Oh, how great it would be to read about all these situations and immediately start configuring things correctly. But in IT you very often have to step on all these rakes yourself before you begin to think about the consequences of each step. Only after experiencing each situation firsthand can you prevent it in the future.

Source: https://habr.com/ru/post/272541/
