
How we implemented SD-Access, and why it was needed


The main monitoring page.

SD-Access is Cisco's implementation of a new approach to building local area networks. Network devices are joined into a fabric, an overlay is built on top of it, and the whole thing is controlled by a central component, DNA Center. All of this grew out of network monitoring systems, except that the mutated monitoring system now does more than monitor: it collects detailed telemetry, configures the entire network as a single device, finds problems in it, suggests fixes, and enforces security policies as well.

Looking ahead, I will say that the solution is rather cumbersome and, at the moment, not trivial to master, but the larger the network and the more security matters, the more it pays to migrate: it seriously simplifies management and troubleshooting.

Background: how did we decide on this?


The customer was moving from a rented office to a new, freshly purchased one. The local network was going to be built the traditional way: core switches, access switches, plus some familiar monitoring. At that time we had just deployed an SD-Access test bed in our lab and had managed to get a feel for the solution and get trained by an expert from Cisco’s French office, who happened to visit Moscow at just the right moment.

After talking with the vendor, both we and the customer decided to build the network in a new way. Here are the benefits:


We saw the flaws later.

Planning


We worked out the top-level design. The planned architecture began to look like this:



Underneath is the underlay, built on conventional protocols (IS-IS at the base), but the idea of the solution is that the details of its operation should not concern us. The overlay is built on LISP and VXLAN. The solution's logic prefers 802.1x authentication on access ports, and the customer intended to make it mandatory for everyone from the start. You can do without 802.1x and configure the network almost “the old way”, but then you have to configure the IP address pools manually and then assign each port to its IP pool by hand. Copy-paste, as in the command line, is not an option: everything goes through the web interface only. With this approach, the advantages of the solution turn into a big minus. Such a scheme should be used only where it is unavoidable, not across the entire network. Access rights are enforced with SGTs.

We ordered the equipment and software, and while everything was on its way we began to “ground” the design to understand what exactly we were going to configure. Here we hit the first difficulty: previously we had to agree on IP subnets and a set of VLAN numbers with the customer so that they fit the schemes the customer had adopted, but now none of that interested us. Instead we needed to understand which groups of users and devices work on the network, how they interact with each other, and which network services they use. This was unusual both for us and for the customer, and such information was harder to obtain. In theory, this is the data you should always start from when designing a network, but in practice a standard set of VLANs was almost always laid down, and reality was then crammed into it during operation by the callused hands of the admins. In the SD-Access paradigm there is no choice: the network is built “for the business”.
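To make the shift from VLAN lists to groups and interactions concrete, here is a minimal sketch of the kind of information that had to be collected. It is not a DNA Center or ISE API: the `PolicyMatrix` class, the group names, and the service names are all hypothetical, and the default-deny behaviour is an assumption made for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class PolicyMatrix:
    """Hypothetical model of group-to-group access rules (not a vendor API)."""

    # Maps (source group, destination group) to the set of permitted services.
    contracts: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def permit(self, src: str, dst: str, *services: str) -> None:
        """Record that src may reach dst over the given services."""
        self.contracts.setdefault((src, dst), set()).update(services)

    def is_allowed(self, src: str, dst: str, service: str) -> bool:
        # Assumption for this sketch: anything not explicitly permitted is denied.
        return service in self.contracts.get((src, dst), set())


# Illustrative groups and services only; the article does not list the real ones.
matrix = PolicyMatrix()
matrix.permit("employees", "printers", "ipp")
matrix.permit("employees", "servers", "https", "dns")
```

The point of the sketch is the shape of the data: the design starts from who talks to whom and over what, and subnets never appear in it.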

Deadlines were getting tight and the equipment was arriving. It was time to configure.

How we implemented it


The implementation process differs from the old schemes even more than the planning process does. Previously, an engineer connected the devices to each other, configured them one by one, and brought up working network segments one after another. With SD-Access, the deployment process is as follows:

  1. Interconnect all network switches.
  2. Bring up all DNA Center controllers.
  3. Integrate them with ISE (all authorization goes through it).
  4. Use DNA Center to turn the network switches into a fabric.
  5. Assign the roles of the switches in the fabric (Edge Node, Control Node, Border Node).
  6. Configure groups of end devices and network users, and virtual networks, in DNA Center.
  7. Configure the rules of interaction between them.
  8. Apply the device groups and rules to the fabric.

That is for the first deployment. In addition, for the initial deployment DNA Center requires DNS, NTP, and access to the Cisco cloud to download updates (with a Smart Account). In our implementation it turned out that DNA Center loves to update itself during the initial installation: it took about two days to bring all its components to the current versions, although this happened mostly without our participation.


An example of an assembled fabric.

Once DNA Center is up and running, bringing up a new office only requires repeating steps 1, 4, 5, and 8. Thanks to the Plug-and-Play Agent, the new switches receive the DNA Center address via a DHCP option, pull their preliminary configs from it, and become visible in the DNA Center management interface. All that remains is to assign their roles (Edge / Control / Border), and the new fabric is ready. Existing device groups and policies can be reused on it.
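The difference between a first deployment and adding another office can be written down as a small sketch. The step texts paraphrase the numbered list above, and the subset for a new office is the one named in the text; the `plan` function and the constant names are hypothetical and exist only for this illustration.

```python
from typing import List

# The eight deployment steps, paraphrased from the list in the article.
DEPLOYMENT_STEPS = {
    1: "Interconnect all network switches",
    2: "Bring up the DNA Center controllers",
    3: "Integrate DNA Center with ISE",
    4: "Use DNA Center to join the switches into a fabric",
    5: "Assign fabric roles (Edge, Control, Border)",
    6: "Define groups of end devices and users, and virtual networks",
    7: "Configure the rules of interaction between groups",
    8: "Apply the groups and rules to the fabric",
}

# Steps the article says are enough once DNA Center is already running.
NEW_SITE_STEPS = (1, 4, 5, 8)


def plan(first_deployment: bool) -> List[str]:
    """Return the ordered steps for a first deployment or for an additional site."""
    numbers = sorted(DEPLOYMENT_STEPS) if first_deployment else NEW_SITE_STEPS
    return [DEPLOYMENT_STEPS[n] for n in numbers]


for step in plan(first_deployment=False):
    print(step)
```

Everything tied to the controller, ISE, and policy design (steps 2, 3, 6, and 7) is done once and reused, which is where the savings on the second and later sites come from.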

Of course, when you face such a process for the first time, it is hard to know where to start. In addition, along with the SD-Access paradigm and related products, Cisco has coined so many new terms and definitions that even an experienced CCIE gets to feel young again. Here are the main ones:


In general, the concepts need to be learned properly both by those who implement the solution and by those who will operate it. Out of ignorance, implementers miss deadlines, and then the admins' KPIs drop. That is a way to lose your bonus. And the customer's management losing trust in the chosen solution is a problem for everyone.

Because the customer already had to move into the new office, we rolled the solution out as follows:

  1. We created a single group and a single virtual network for everyone in OpenAuth mode, without enforced authorization, only connection logging.
  2. The admins connected workstations, printers, and so on to the network; users moved to the new office and started working.
  3. Next, we picked one user who logically should belong to another group.
  4. We configured that group in DNA Center, along with the policy for its interaction with other groups.
  5. We moved the user to the new group and enabled ClosedAuth with authorization for that user.
  6. Together with the customer's specialists, we identified the access problems the user ran into and adjusted the contract settings (the policies for interaction between the user's group and the others).
  7. Once we were sure the user was working without problems, we moved the other users who belonged in that group into it and watched what happened.

Steps 3 through 7 then had to be repeated for new groups until all users and devices connected to the network were in their own groups. In OpenAuth mode, the client device attempts authorization. If it succeeds, the port it is connected to applies the settings of the group the device belongs to; if it fails, the device falls into the IP pool preconfigured on the switch port, a kind of rollback to the traditional mode of a local network.
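The port behaviour described above can be summarized in a short sketch. This models only the decision logic, not any switch or ISE API: the `resolve_port` function, the `AuthMode` enum, and the returned strings are hypothetical, and the outcome for a failed authorization in ClosedAuth mode (no access) is an assumption, since the text only spells out the OpenAuth fallback.

```python
from enum import Enum
from typing import Optional


class AuthMode(Enum):
    OPEN = "OpenAuth"
    CLOSED = "ClosedAuth"


def resolve_port(mode: AuthMode, authorized_group: Optional[str], fallback_pool: str) -> str:
    """Return what an access port ends up applying to a connecting client.

    authorized_group is the group obtained on successful authorization,
    or None if authorization failed.
    """
    if authorized_group is not None:
        # Success: the port applies the settings of the client's group.
        return f"group:{authorized_group}"
    if mode is AuthMode.OPEN:
        # Failure in OpenAuth: fall back to the IP pool preconfigured on the port.
        return f"pool:{fallback_pool}"
    # Assumption: in ClosedAuth a failed authorization leaves the client without access.
    return "denied"


print(resolve_port(AuthMode.OPEN, None, "default-pool"))
```

This is why the staged rollout works: while a group is still being tuned, everyone outside it stays on the OpenAuth fallback and keeps working.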

Of course, as with any new product, we spent quite a few hours updating software and tracking down bugs. Fortunately, Cisco TAC helped with this promptly. One morning, logging into the DNA Center web interface, we found the entire network shown as down. At the same time, there was not a single complaint from users: the office was working and drinking its morning coffee. We dug through the logs, and it turned out there was a problem with SNMP, through which DNA Center receives information about the state of the fabric. The network was not visible, but it was there. Excluding some of the OIDs from polling helped.
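The article does not say which OIDs were excluded or how the exclusion was configured, so the following is only a generic sketch of the idea: removing a subtree of OIDs from a polling list. The `filter_oids` function is hypothetical, and the OIDs shown are standard MIB-II examples chosen for illustration, not the ones from this incident.

```python
from typing import Iterable, List


def filter_oids(polled: Iterable[str], excluded_prefixes: Iterable[str]) -> List[str]:
    """Drop every OID that equals an excluded prefix or sits under it."""
    prefixes = list(excluded_prefixes)

    def is_excluded(oid: str) -> bool:
        # Match on whole sub-identifiers so that "1.3.2" does not exclude "1.3.22".
        return any(oid == p or oid.startswith(p + ".") for p in prefixes)

    return [oid for oid in polled if not is_excluded(oid)]


# Illustrative polling list built from well-known MIB-II objects.
polled = [
    "1.3.6.1.2.1.1.3.0",       # sysUpTime
    "1.3.6.1.2.1.2.2.1.10.1",  # ifInOctets, interface index 1
    "1.3.6.1.2.1.2.2.1.16.1",  # ifOutOctets, interface index 1
]

# Exclude the whole ifTable subtree (illustrative choice only).
remaining = filter_oids(polled, ["1.3.6.1.2.1.2.2"])
print(remaining)
```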


Component version page.

How to operate it?


DNA Center collects a bunch of useful SNMP, NetFlow, and Syslog data from the fabric and knows how to present it. This is especially useful for intermittent problems like “yesterday a lot of phones dropped off for some reason, although everything seems fine now”. You can dig into the Application Experience data and try to understand what was happening. That gives you a chance to fix the problem before it hits again. Or to prove that the network had nothing to do with it.


Application quality data.

For many of the problems that DNA Center reports as alarms, it also tells you where to dig.


An example of an OSPF adjacency failure message with a hint about what to do.

Routine analysis has become easier. For example, you can quickly trace the path of traffic through the network without logging into the devices one by one. With authorization through ISE, DNA Center picks up and shows client names, including on the wired network, so there is no need to go hunting by IP address.


An example of tracing traffic through the network. A red mark on one of the devices indicates that traffic is blocked there by an access control list.

You can quickly see which network segment is affected by a problem (the switches in DNA Center are grouped into locations, sites, and floors).

The “gamified” application health indicator, shown as a percentage, lets you assess the state of the network at a glance and see whether it is getting worse over time.


Application health indicator.

Like Prime Infrastructure before it, DNA Center also provides software version control on network devices. It maintains its own repository, where images can be uploaded manually or downloaded automatically from Cisco.com and then deployed to devices. You can also write and run scripts that verify the network is healthy before and after the update. The standard check script, for example, verifies that there is enough free space on the flash, checks the config-register value, and checks whether the config has been saved. Software patching is supported on devices capable of it.
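The three checks mentioned above can be sketched as follows. This is not the DNA Center script or its API: the `DeviceState` structure and the `precheck` function are hypothetical, and the expected config-register value `0x2102` is a common IOS default used here only as a placeholder assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeviceState:
    """Hypothetical snapshot of the device facts that the checks look at."""
    flash_free_bytes: int
    config_register: str
    config_saved: bool


def precheck(state: DeviceState, image_size_bytes: int,
             expected_register: str = "0x2102") -> List[str]:
    """Return a list of problems; an empty list means the device looks ready."""
    problems = []
    if state.flash_free_bytes < image_size_bytes:
        problems.append("not enough free space on flash")
    # Assumption: the expected value is site-specific; 0x2102 is only a placeholder.
    if state.config_register.lower() != expected_register.lower():
        problems.append("unexpected config-register value")
    if not state.config_saved:
        problems.append("running config is not saved")
    return problems


ready = DeviceState(flash_free_bytes=2_000_000_000, config_register="0x2102", config_saved=True)
print(precheck(ready, image_size_bytes=1_000_000_000))
```

Running the same function before and after the update gives a simple before/after comparison per device.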


Software repository in DNA Center.

And, of course, command-line access to the network hardware is still there.

Summary


The product is new and so are the approaches; it can be implemented, but with care. Because the code is new, there are bugs, but Cisco technical support responds promptly and the developers release updates regularly. Because the network management approach is new, the probability of errors in the early stages of operation is quite high, but administrators gradually get used to it, and errors become fewer than when supporting a traditional LAN. It is worth thinking in advance about how to test everything on a subset of users before applying it to everyone (although with experience you understand that this is useful when implementing any IT solution, even the most understandable and proven).

What is the benefit? Automation, faster routine operations, less downtime from configuration errors, and better network reliability because the causes of failures become known immediately. According to Cisco, an IT administrator will save 90 days a year. Security is a separate point: with the Zero Trust approach, an epic incident that ends up in the press can be avoided, but for obvious reasons very few people appreciate this.

Source: https://habr.com/ru/post/457738/

