
How to kill your network with Ansible

Translator's note: This article, written by a network engineer from Sweden, covers some of the nuances of working with templates in Ansible and, most importantly, teaches one simple and obvious rule that helps you avoid shooting yourself in the foot (and not only in the foot, and not only your own) when automating the management of a large number of devices or servers. The example it describes will be useful to every system administrator and DevOps engineer. (The emphasis in the text is not the author's; it was added in translation to highlight several points.)



I not only use Ansible but also write about it and try to help others understand how it works. Recently I was answering questions from Ansible users. One of them did not understand why the ios_config module applied their template incorrectly. After explaining what was wrong with the template and thinking about the problem further, I realized that such a mistake can be really dangerous. Dangerous enough to bring down your infrastructure.

Scenario


Imagine that you are a network administrator who has just been assigned to roll out a new IP network to all branches. You are using Ansible 2.4 (things may work differently in later versions once they are released) [2.4.0.0 is the current stable version of Ansible, released on September 19, 2017. Translator's note]. The details of the new addresses are already recorded in the IPAM system, and a Dynamic Inventory script will fetch the necessary information for you. All you need to do is update the template with the new networks and roll out the change. In addition, someone once put "Wormhole exit" in the description of every WAN-facing (Wide Area Network) interface, and the person responsible for managing the networks asks: "When you add these networks, please change the description to 'WAN'." The added network will be used by a new digital signage solution.

You start by looking at the current template, which looks like this:

    interface FastEthernet0
     description Wormhole exit
     ip address {{ wan_ip }} {{ wan_mask }}
    interface FastEthernet1
     description POS
     ip address {{ pos_ip }} {{ pos_mask }}
    interface FastEthernet2
     description OFFICE
     ip address {{ office_ip }} {{ office_mask }}

Logging in to the router in Wisconsin, you see the following configuration:

    interface FastEthernet0
     description Wormhole exit
     ip address 172.29.58.161 255.255.255.224
    interface FastEthernet1
     description POS
     ip address 10.17.80.1 255.255.255.0
    interface FastEthernet2
     description OFFICE
     ip address 10.17.81.1 255.255.255.0
    interface FastEthernet3
     no ip address

It looks quite simple: the new network will be connected to FastEthernet3. You open your editor and update the template. The new file looks like this:

    interface FastEthernet0
     description WAN
     ip address {{ wan_ip }} {{ wan_mask }}
    interface FastEthernet1
     description POS
     ip address {{ pos_ip }} {{ pos_mask }}
    interface FastEthernet2
     description OFFICE
     ip address {{ office_ip }} {{ office_mask }}
    interface FastEthernet3
    description SIGNAGE
    ip address {{ signage_ip }} {{ signage_mask }}

The template is ready to use! You open a terminal, push the new template to your Git repository, and run the playbook:

ansible-playbook network-baseline.yml

Meet the problem


You rejoice at the work done and pause to enjoy this little victory. But the pleasure lasts only until someone asks you, “Why did Wisconsin just disappear from the map?”

Very strange: you only added a new network, one that is not even tied to the Wisconsin office, right? And then an unpleasant feeling overtakes you... Was the question asked because your change broke a single office, or have you just killed every branch?



What happened?


Before going into the details, it is worth understanding how such incidents happen in general. You can make mistakes, and software can have bugs. That is why it is really important to first try what you are doing in a safe environment. In this case, a good idea would be to use Ansible's check mode (-C) together with the verbose option (-v). That way you can see which configuration would be sent to the device without actually applying the changes. Another important point: do not run anything against the entire network if you are not sure how it will end. Use the --limit option and start with a few devices.

So, using the verbose option, we can understand what went wrong:



It looks like the configuration actually sent to the affected device was:

    interface FastEthernet0
     description WAN
    description SIGNAGE
    ip address 10.17.82.1 255.255.255.0

Great: the playbook reconfigured the WAN interface, giving it the IP address that was intended for FastEthernet3. At this point you can call your therapist, Red Hat tech support, or perhaps a lawyer... Or keep reading to find out why this happened.

How Ansible parses templates for network devices


Like the rest of Ansible, the network modules use the Jinja2 templating engine. However, templates for network devices are handled a bit differently from configuration templates for nginx and other services. Ansible parses the current device configuration and, based on it, decides what needs to be updated on the device. For example, nothing has changed for FastEthernet1 and FastEthernet2, so Ansible will not attempt to change anything on those interfaces.

After rendering the template, Ansible applies only the configuration that is not yet on the device. However, Ansible does not really understand the configuration or what it does. Instead, it parses the configuration according to a set of predefined rules. Take the description of the WAN interface: it obviously needs to be changed. However, a description cannot be set with a standalone command; it has to be configured within the interface. Since the description line is indented and comes after interface FastEthernet0, Ansible treats the interface line as the parent of the lines that follow. So Ansible first sends the interface command and then the description command. This is why the updates sent by Ansible begin with:

    interface FastEthernet0
     description WAN

And this is what the final part of the rendered template looks like:

    interface FastEthernet3
    description SIGNAGE
    ip address 10.17.82.1 255.255.255.0

Since there is no indentation, Ansible does not understand that interface FastEthernet3 is the parent command for description and ip address. Instead, it simply treats these commands as part of the global config and, since the global config contains no such description or IP address lines, includes them in the list of commands sent to the device.
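The behaviour described above can be reproduced with a small Python sketch. It is not Ansible's actual implementation, only a simplified model. It assumes that hierarchy is derived purely from leading spaces, and that a line counts as present when the same command already exists under the same parents. The names parse and missing_commands are made up for illustration.

```python
def parse(config):
    """Return (parents, command) pairs, treating indentation as hierarchy."""
    entries, stack = [], []  # stack holds (indent, command) of open sections
    for raw in config.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        # A line closes every open section that is indented as much or more.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        command = raw.strip()
        entries.append((tuple(cmd for _, cmd in stack), command))
        stack.append((indent, command))
    return entries


def missing_commands(candidate, running):
    """List the commands to send: candidate lines absent from running."""
    have = set(parse(running))
    commands, entered = [], ()
    for parents, command in parse(candidate):
        if (parents, command) in have:
            continue
        # Repeat only the parent lines not already entered by the last command.
        shared = 0
        while (shared < min(len(parents), len(entered))
               and parents[shared] == entered[shared]):
            shared += 1
        commands.extend(parents[shared:])
        commands.append(command)
        entered = parents + (command,)
    return commands


RUNNING = "\n".join([
    "interface FastEthernet0",
    " description Wormhole exit",
    " ip address 172.29.58.161 255.255.255.224",
    "interface FastEthernet1",
    " description POS",
    " ip address 10.17.80.1 255.255.255.0",
    "interface FastEthernet2",
    " description OFFICE",
    " ip address 10.17.81.1 255.255.255.0",
    "interface FastEthernet3",
    " no ip address",
])

# Rendered template: the last two lines are missing their leading space.
CANDIDATE = "\n".join([
    "interface FastEthernet0",
    " description WAN",
    " ip address 172.29.58.161 255.255.255.224",
    "interface FastEthernet1",
    " description POS",
    " ip address 10.17.80.1 255.255.255.0",
    "interface FastEthernet2",
    " description OFFICE",
    " ip address 10.17.81.1 255.255.255.0",
    "interface FastEthernet3",
    "description SIGNAGE",
    "ip address 10.17.82.1 255.255.255.0",
])

for line in missing_commands(CANDIDATE, RUNNING):
    print(line)
```

Run as written, the sketch prints interface FastEthernet0, description WAN, description SIGNAGE and the 10.17.82.1 address line, in that order. Nothing in the list returns to global config, and interface FastEthernet3 is never sent because that line already exists on the device.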

If we directly enter the description and IP address in the global config, we get an error:

    WIS-RTR-01(config)#description SIGNAGE
                       ^
    % Invalid input detected at '^' marker.
    WIS-RTR-01(config)#ip address 10.17.82.1 255.255.255.0
                       ^
    % Invalid input detected at '^' marker.
    WIS-RTR-01(config)#

However, in our case we also changed the description of FastEthernet0, so the session is still in the config-if context when these commands arrive. Since no exit command is sent to return to global configuration (i.e., to go from (config-if)# to (config)#), the description and IP address meant for FastEthernet3 are applied to the FastEthernet0 interface instead. The final configuration is as follows:

    interface FastEthernet0
     description SIGNAGE
     ip address 10.17.82.1 255.255.255.0
    interface FastEthernet1
     description POS
     ip address 10.17.80.1 255.255.255.0
    interface FastEthernet2
     description OFFICE
     ip address 10.17.81.1 255.255.255.0
    interface FastEthernet3
     no ip address

Oops ...
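To make the role of the CLI context concrete, here is a toy Python model of the session. It is only a sketch under stated assumptions: it knows just the interface, exit, description and ip address commands, and it treats the latter two as valid only inside an interface. The function name apply_commands and the error text are invented for illustration; this is not how IOS is implemented.

```python
import copy


def apply_commands(commands, interfaces):
    """Replay commands against a toy IOS-like CLI; return (interfaces, errors)."""
    interfaces = copy.deepcopy(interfaces)
    current = None  # None means global config, otherwise the interface being edited
    errors = []
    for command in commands:
        if command.startswith("interface "):
            current = command.split(" ", 1)[1]
            interfaces.setdefault(current, {})
        elif command == "exit":
            current = None
        elif command.startswith(("description ", "ip address ")):
            if current is None:
                # Assumed to be rejected at global level, as in the session above.
                errors.append("% Invalid input: " + command)
                continue
            key = "description" if command.startswith("description ") else "ip address"
            interfaces[current][key] = command[len(key) + 1:]
        else:
            errors.append("% Invalid input: " + command)
    return interfaces, errors


before = {
    "FastEthernet0": {
        "description": "Wormhole exit",
        "ip address": "172.29.58.161 255.255.255.224",
    }
}
sent = [
    "interface FastEthernet0",
    "description WAN",
    "description SIGNAGE",
    "ip address 10.17.82.1 255.255.255.0",
]

after, errors = apply_commands(sent, before)
print(after["FastEthernet0"], errors)

# Without the leading interface command the same two lines are simply rejected.
print(apply_commands(sent[2:], before)[1])
```

In this model the first call ends with FastEthernet0 carrying the SIGNAGE description and the 10.17.82.1 address and no errors at all, while the second call only produces two error messages and leaves the interface untouched.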

What you need to remember


I think it is safe to say that the scenario described above requires a certain amount of bad luck: two spaces were missing at the same time as an unrelated description was changed. But even when no such catastrophe occurs, you can still end up applying configuration where you did not want it.

Let me repeat: make sure you test and validate what you intend to do! Use a dry run and look at the result so that you know what is going to happen.
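A cheap check on the rendered template could complement a dry run. The Python sketch below is a heuristic, not a feature of Ansible: it assumes that description, ip address and no ip address are only ever valid inside an interface section, and flags any such line that starts in the first column. The names SECTION_ONLY and unindented_children are hypothetical.

```python
# Assumption: these commands are only valid inside an interface section.
SECTION_ONLY = ("description ", "ip address ", "no ip address")


def unindented_children(rendered):
    """Return (line_number, text) for section-only commands found at column 0."""
    return [
        (number, line)
        for number, line in enumerate(rendered.splitlines(), start=1)
        if line.startswith(SECTION_ONLY)
    ]


good = "interface FastEthernet3\n description SIGNAGE\n ip address 10.17.82.1 255.255.255.0"
bad = "interface FastEthernet3\ndescription SIGNAGE\nip address 10.17.82.1 255.255.255.0"

print(unindented_children(good))  # nothing to report
print(unindented_children(bad))   # lines 2 and 3 are flagged
```

A check like this only catches the specific mistake from this article, so it does not replace looking at the dry-run output.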

Another approach


If the example above sounds scary to you, remember that you do not have to use templates in this way. The ios_config module also offers the lines and parents options. You can also take a look at the NAPALM library, especially the napalm_install_config module for Ansible. As mentioned above, Ansible's core network modules parse the running configuration and try to figure out which configuration the device is missing in order to decide which commands to send. NAPALM does not look at the device configuration itself and leaves it up to the device to decide what to apply. In the case of an IOS device, NAPALM copies the entire generated template to the device's file system, evaluates whether changes are needed, and then merges them with the current configuration (or replaces the entire configuration if you prefer).
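Outside of Ansible, the same NAPALM workflow can be driven directly from Python. The sketch below is a minimal example, assuming a NAPALM device object is created elsewhere. The helper name preview_merge, the host name, the credentials and the file path are all hypothetical, while open, load_merge_candidate, compare_config, discard_config and close are methods of NAPALM's driver API.

```python
def preview_merge(device, config_path):
    """Load a candidate config, return the diff computed by the device, discard it."""
    device.open()
    try:
        device.load_merge_candidate(filename=config_path)
        diff = device.compare_config()
        # Nothing is committed here; the candidate is thrown away after the diff.
        device.discard_config()
        return diff
    finally:
        device.close()


# Possible usage (requires the napalm package and a reachable device;
# host name, credentials and path are placeholders):
#
#   from napalm import get_network_driver
#   driver = get_network_driver("ios")
#   device = driver("wis-rtr-01.example.com", "admin", "secret")
#   print(preview_merge(device, "rendered/wis-rtr-01.cfg"))
```

Reviewing the returned diff before calling commit_config serves the same purpose as the dry run recommended above.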

Conclusion


I hope this article helps you understand how Ansible handles templates when applying them to network devices, and why correct indentation matters. The most important takeaway is to always check every configuration change before you roll it out.

Finally, I hope that no network, in Wisconsin or anywhere else, will suffer at your hands.

Source: https://habr.com/ru/post/339482/

