
Inter-AS routing. Can I save on a BGP router?

As a preface: yesterday I presented the ideas below at a local meeting of administrators. After the talk, a representative of a network equipment vendor approached me and asked: "Have you published this anywhere? Share the presentation, I'll send it to my colleagues to look at." Indeed, why not publish it? As we say in Ukraine, "i mi, Khimko, people." If even someone from a vendor took an interest, however remote, then there is probably someone in the community who will find the ideas interesting too. Besides, I plan to use this solution myself. I'll say right away that there is no 100% finished result here, but there is an intermediate one, which is enough for ersatz routing, plus some information for continuing work in this direction. Let's go!

To be clear, the title of the article is about the router as a physical device, not about the BGP protocol. And no, you cannot live without the BGP protocol, at least for now, but you know that yourself. Why the protocol is needed, what it is good for, what it can do and how to configure it: plenty of books and articles have been written about this, both on Habr and on third-party resources, so I'll skip the topic. If static routes are enough for you, or if there is someone in the neighboring autonomous system you can ask to change the announcement policies, you can safely close the post: you do not need this information.

If you're still reading, let me start with the design, since the article is more about designing a solution than about the solution itself. So, we are the service provider ISP A, with its own address pool and the need to announce these addresses to the Internet. The general scheme of logical connections is shown in the figure.

image
At the border of our network stands the router RTR A, which has an eBGP session with a neighboring peer. Over this session we receive a FullView from the neighbor with the next-hop set to the IP address of RTR B. In return, we announce our internal networks with RTR A as the next-hop. I should note right away that this is just one of many possible ways to organize BGP peering: there may be several border routers as well as several neighbors, the neighbors may not be directly connected, we may receive more than one FullView, have backup links, and so on. However, I will skip the analysis of the various peering and redundancy schemes and stick to the simplest case. This does not change the essence, but it makes things easier to understand. Inside our autonomous system an IGP is running, through which we distribute the reachability of the ISP A networks. Or it is not running: again, this is only one of the possible options.

User traffic passes through the core, CoreA, then through the router RTR A, and goes on to the Internet. As I understand it, this (with all possible variations) is the classic way to organize the network perimeter and BGP peering. Now let's see how much this hardware solution costs.

I will start from a required bandwidth of 10 Gb/s. At a minimum, the solution must have 10 Gb/s interfaces and room for a later upgrade. Juniper offers the MX104-40G (or, alternatively, the MX80) for 40 thousand dollars, with two (four for the MX80) 10 Gb/s interfaces "on board" and a routing performance of 40 Gb/s (80 for the MX80). Cisco responds with the Cisco ASR 1001-X, with a base capacity of 2.5 Gb/s and two 10 Gb/s interfaces "on board", for $17,000 plus the price of a license to raise performance (up to 20 Gb/s) and to activate additional interface slots. I'll say right away that I did not set out to compare the devices rigorously. The post is not about that, after all, but some figures are needed, since our main task is to reduce the cost of the solution.

So, at least 17 thousand dollars. What useful work does our router RTR A do? Not much, really: it maintains a BGP session (or several) with the neighbor and forwards traffic to the Internet. Can we do without it? To answer that, let's look at the following topology.

image

We removed the physical router and brought up BGP peering on the core device. Is that possible? Yes, fortunately the L3 switches that are now everywhere support running BGP. However, this solution has at least two weak points. First, most switches were not designed for full routing and therefore have a limited routing table size. For example, the Juniper EX4550 has a limit of 14000 IPv4 unicast routes, and the Cisco Nexus3k has 16000. Second, to run BGP you will need to purchase a license, which costs 8 (Cisco Nexus3k) or 10 (Juniper EX4550) thousand dollars. If we need redundant switches, the figures double. In addition, you will need to negotiate with the upstream provider to summarize networks or, alternatively, to send you a default route. Nevertheless, such a design still lets you avoid buying a dedicated router and at the same time enjoy the benefits of BGP. Another possible variation on this theme is given below.
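To put the figures quoted so far side by side, here is a minimal sketch. It uses only the prices mentioned in the text (the ASR 1001-X figure is the base price, without the performance license) and assumes that redundancy simply doubles the license cost, as stated above. The function name `bgp_license_cost` is made up for illustration.

```python
# Prices quoted in the text, in US dollars. The ASR figure excludes the
# extra performance license, so the real router cost would be higher.
ROUTER_PRICE = {
    "Juniper MX104-40G": 40_000,
    "Cisco ASR 1001-X (base)": 17_000,
}
BGP_LICENSE = {
    "Cisco Nexus3k": 8_000,
    "Juniper EX4550": 10_000,
}


def bgp_license_cost(switch, redundant=False):
    """License cost for one switch, or for a redundant pair."""
    return BGP_LICENSE[switch] * (2 if redundant else 1)


cheapest_router = min(ROUTER_PRICE.values())
for switch in BGP_LICENSE:
    for redundant in (False, True):
        cost = bgp_license_cost(switch, redundant)
        label = "pair" if redundant else "single"
        print(f"{switch} ({label}): ${cost}, "
              f"difference vs cheapest router: ${cheapest_router - cost}")
```

With a redundant pair the license bill approaches, or for the EX4550 exceeds, the base price of the cheaper router, which is why the license question keeps coming up later in the article.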

image

We run the BGP process on a physical server or a virtual machine, which holds an eBGP session with RTR B and an iBGP session with the core device. On the virtual machine we install one of the available packages for running BGP, for example Quagga, Vyatta or BIRD.

One of the great features of the BGP protocol is the ability to change the next-hop when announcing updates, and we will use it to avoid a situation where user traffic has to be forwarded through the BGP speaker. In other words, within the autonomous system we separate the device that holds the routing information (the virtual machine) from the device that forwards traffic (CoreA). Accordingly, RTR B receives the address of CoreA as the next-hop, and vice versa. This is a control plane versus forwarding plane split. The idea itself is not new and is actively used at exchange points, only over eBGP sessions.
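The next-hop substitution described above can be modelled in a few lines. This is only an illustrative sketch of the idea, not the configuration of any real BGP daemon: the `Route` class, the helper `rewrite_next_hop` and all addresses (taken from the RFC 5737 documentation ranges) are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Address, IPv4Network


@dataclass(frozen=True)
class Route:
    prefix: IPv4Network
    next_hop: IPv4Address


def rewrite_next_hop(routes, new_next_hop):
    """Return copies of the routes pointing at the given forwarding device."""
    return [replace(r, next_hop=new_next_hop) for r in routes]


# Hypothetical addresses, for illustration only.
VM = IPv4Address("192.0.2.10")     # BGP speaker (control plane only)
CORE_A = IPv4Address("192.0.2.1")  # CoreA (forwarding plane)
RTR_B = IPv4Address("192.0.2.2")   # upstream neighbor

# Our own prefix as the VM would originate it: next-hop is the VM itself.
own_routes = [Route(IPv4Network("198.51.100.0/24"), VM)]
# What RTR B should see: the next-hop is replaced with CoreA.
to_rtr_b = rewrite_next_hop(own_routes, CORE_A)

# Routes learned from RTR B keep RTR B as next-hop when passed to CoreA.
learned = [Route(IPv4Network("203.0.113.0/24"), RTR_B)]
to_core_a = list(learned)

# The VM never appears as a next-hop, so no user traffic flows through it.
assert all(r.next_hop != VM for r in to_rtr_b + to_core_a)
```

The point of the model is the final check: in both directions the next-hop is a device that actually forwards packets, while the virtual machine only exchanges routing information.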

This is a more interesting scenario, since now we can receive a FullView, or even several, and filter and summarize routes locally, without having to call the provider. Another interesting feature of the solution is that we don't even need to populate the kernel routing table on the virtual machine running BGP. Those who have configured Quagga, for example, know that you first need to enable the "ip forwarding" option and then pass the routes that the daemon received to the kernel (that is, to the host routing table) in order to forward traffic correctly. All of this is unnecessary here: the virtual machine only announces BGP information and does not take part in forwarding traffic, and filling the table inside Quagga takes only as long as is needed to transfer the routing table itself, about 10 seconds.
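As a rough illustration of local filtering and summarization, the sketch below uses Python's standard `ipaddress` module. It drops prefixes longer than a chosen length and merges adjacent prefixes. It assumes all routes share the same next-hop (true for the single-upstream case discussed here), because it ignores every BGP attribute. The function name `summarize`, the `/24` cut-off and the sample prefixes are hypothetical.

```python
from ipaddress import collapse_addresses, ip_network


def summarize(prefixes, max_prefix_len=24):
    """Drop prefixes longer than max_prefix_len, then merge adjacent ones."""
    nets = [ip_network(p) for p in prefixes]
    kept = [n for n in nets if n.prefixlen <= max_prefix_len]
    return list(collapse_addresses(kept))


# Hypothetical sample of received prefixes.
received = [
    "10.0.0.0/24",
    "10.0.1.0/24",
    "10.0.2.0/24",
    "10.0.3.0/24",
    "203.0.113.0/24",
    "192.0.2.0/28",   # too specific, filtered out
]

summary = summarize(received)
print([str(n) for n in summary])
```

Here the four adjacent /24 networks collapse into a single 10.0.0.0/22, so six received prefixes turn into two table entries. How far a real FullView shrinks depends entirely on how contiguous the prefixes are.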

This is closer to the desired solution, but the license issue remains, because the virtual machine and CoreA still talk to each other over BGP. Are there other options? Can we do without the license fee? And here we come to the main point of this post. Take a look at the topology.

image

The basic idea is the same, to run eBGP on a virtual machine, but inside the autonomous system we now use some IGP, for example OSPF, as shown in the figure. The part with the eBGP session is unchanged, and there are still no problems there. But with the IGP there are: none of them was designed to carry a next-hop that is not directly connected. Incidentally, the Nexus3k also requires a license for OSPF, but that is a detail: my network runs Juniper, and on the Nexus you could use RIP :). One way or another, a different next-hop has to be passed along, otherwise user traffic will go through the virtual machine and the solution will not work. So we need some kind of crutch that allows the "impossible": announcing a route with a next-hop that is not the local one. While working through ideas, I tried the following options:


By the way, about the last item, exporting to the forwarding table: it lets you do per-flow BGP ECMP, at least on Juniper. If anyone is interested, I can share the config as thanks for your attention to the post.

So, unfortunately, none of the above works. Quagga and Juniper silently ignored my tinkering with the policies, and BIRD complained immediately when I tried to change the "next-hop" parameter in the announcement. That is how tritely and insultingly my idea crashed on the rocks of the vendors' lack of understanding. Along the way I even googled the problem, and it turned out that I'm not the only one to have had this clever idea. But nobody actually had a solution, apart from pointing out that Cisco has the "forwarding address" feature (you can read about it here), but that is not quite it.

Almost in despair, I turned to colleagues for help. Andrushko Dmitry, Kovalenko Alexander (@alk0v) and Simonenko Dmitry, thank you: the country should know its heroes! So, there are options.

First, there is a ready-made solution for software-defined networks called the Atrium project (read). In addition, if I heard correctly, Mellanox makes devices with Quagga / BIRD inside. Strictly speaking, SDN is a great thing: do what you want, the way you want. But SDN means new equipment, and my task is to solve everything on the existing one.

Next, if I understood correctly ("if" is the key word, since I am not strong in *NIX systems), the daemons in Quagga (for example, ospfd) communicate with the kernel through the iproute2 module, so in theory you can intercept a packet on its way out of ospfd and modify it. I don't know whether my reasoning is right or whether this is possible at all, but there it is.

And finally, the surefire option: Scapy, which lets you generate packets with arbitrary content. Indeed, we know the structure of the OSPF packet, and we know which value to change. All that is left is to implement it. This is where I have stopped for now.

The way I see it, the solution must first of all be dynamic. Otherwise, why all this dancing around protocols? In principle, you could even bring up one virtual machine per eBGP peer. The cost of a virtual machine is negligible, and this simplification would let you modify all outgoing OSPF packets in the same way, replacing one next-hop with another.

But until I get around to implementing such a solution, I have decided that for my task I will run eBGP on a virtual machine and use static routes on the core (CoreA). Not elegant, yes, but it lets me do without buying a router, at least at first.
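For the interim setup with static routes on CoreA, the static entries could be generated by a small script instead of being typed by hand. The sketch below assumes CoreA runs Junos (the article only says the network is Juniper) and emits `set routing-options static route ... next-hop ...` lines. The function name `junos_static_routes`, the prefixes and the next-hop address are hypothetical, and the output is meant to be reviewed before being applied to a device.

```python
from ipaddress import ip_address, ip_network


def junos_static_routes(prefixes, next_hop):
    """Build Junos 'set' commands for static routes via one next-hop."""
    nh = ip_address(next_hop)  # validates the address
    unique = sorted({ip_network(p) for p in prefixes})
    return [
        f"set routing-options static route {net} next-hop {nh}"
        for net in unique
    ]


# Hypothetical summarized prefixes and the address of RTR B.
commands = junos_static_routes(
    ["203.0.113.0/24", "10.0.0.0/22", "203.0.113.0/24"],
    "192.0.2.2",
)
for line in commands:
    print(line)
```

Duplicates are removed and the prefixes are sorted, so repeated runs produce the same list and the result is easy to diff against what is already configured.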

I understand that such a solution is not suitable for transit autonomous systems or for places where additional services such as MPLS are needed. There may be problems with geo-filtering or, more precisely, with preferring a specific peer for non-contiguous address blocks, where optimal summarization is difficult. You also have to take into account the relatively slow propagation of routing information via an IGP. However, for stub ASes and simpler tasks the solution is quite suitable.

Those are the ideas. I hope someone finds them interesting and puts them to use.

Source: https://habr.com/ru/post/328850/

