In the article on EVPN, I mentioned the need to enable composite-next-hop for EVPN, after which at least ten people asked me the same question: what exactly is a composite next-hop? It seems that for many engineers composite-next-hop is a mysterious technology that somehow drastically reduces the number of next-hops. This topic is covered very well in the book "MPLS in the SDN Era", and below I will briefly describe how it works, based on a chapter of that book.
I think every engineer who deals with routers knows that there is a routing information base (RIB) and a forwarding information base (FIB). The RIB is built from information obtained from dynamic routing protocols, as well as from static routes and connected interfaces. Each protocol, be it BGP or IS-IS, installs routes into its own database, from which the best routes, based on protocol preference (administrative distance in Cisco terms), are installed into the routing table (RIB), and, very importantly, it is from the RIB that routes are advertised further. Here, for example, is the RIB entry for the prefix 10.0.0.0/24:
bormoglotx@RZN-PE2> show route table VRF1.inet.0 10.0.0.0/24

VRF1.inet.0: 8 destinations, 13 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.0.0.0/24        *[BGP/170] 15:42:58, localpref 100, from 62.0.0.64
                      AS path: I, validation-state: unverified
                      to 10.0.0.0 via ae0.1, Push 16
                    > to 10.0.0.6 via ae1.0, Push 16, Push 299888(top)
                    [BGP/170] 15:42:58, localpref 100, from 62.0.0.65
                      AS path: I, validation-state: unverified
                      to 10.0.0.0 via ae0.1, Push 16
                    > to 10.0.0.6 via ae1.0, Push 16, Push 299888(top)
The RIB gives us complete information about each route: the protocol that installed it, the route metrics, the presence of equal-cost paths, communities, and so on, as well as the reasons why a route is currently inactive. But the RIB is purely a control-plane table, and it is not suitable for forwarding lookups. Therefore, based on the RIB, the router (to be precise, the RE) builds the FIB and pushes it down to all the PFEs. The FIB no longer carries redundant information about protocols and metrics: all the PFE needs to know is the prefix itself, the next-hop through which this prefix is reachable, and the labels to push when sending a packet:
bormoglotx@RZN-PE2> show route forwarding-table vpn VRF1 destination 10.0.0.0/24
Routing table: VRF1.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
10.0.0.0/24        user     0                    indr  1048576     4
                                                 ulst  1048575     2
                            0:5:86:71:49:c0 Push 16        572     1 ae0.1
                            0:5:86:71:9d:c1 Push 16, Push 299888(top)
                                                            583     1 ae1.0
Note: usually only one route makes it into the FIB, but we are using ECMP load balancing, so the RE pushes both routes down to the PFE when equal-cost paths exist.
Today we will talk about next-hop. On Juniper equipment there are several types of next-hops:
VMX(RZN-PE2 vty)# show nhdb summary detail
     Type      Count
---------  ---------
  Discard         18
   Reject         17
  Unicast         47
  Unilist          4
  Indexed          0
 Indirect          4
     Hold          0
  Resolve          5
    Local         20
     Recv         17
 Multi-RT          0
    Bcast          9
    Mcast         11
   Mgroup          3
 mdiscard         11
    Table         17
     Deny          0
  Aggreg.         18
   Crypto          0
   Iflist          0
   Sample          0
    Flood          0
  Service          0
 Multirtc          0
   Compst          7
DmxResolv          0
   DmxIFL          0
  DmxtIFL          0
    LITAP          0
     Limd          0
       LI          0
   RNH_LE          0
     VCFI          0
     VCMF          0
Many of them are self-explanatory, and some of the above you will never encounter in practice. We will focus on a few of them, starting with the simplest one: the direct next-hop, which in Juniper terms is called unicast.
Unicast next-hop.

604(Unicast, IPv4->MPLS, ifl:340:ge-0/0/2.0, pfe-id:0) <<<<<<
605(Unicast, IPv4->MPLS, ifl:341:ge-0/0/3.0, pfe-id:0) <<<<<<
606(Unicast, IPv4->MPLS, ifl:342:ge-0/0/4.0, pfe-id:0) <<<<<<
The simplest kind of next-hop is the direct next-hop. It points straight to the physical interface through which the prefix is reachable. If this were the only next-hop type, a separate next-hop would be created for every prefix, no matter which routing table the prefix sits in, a vrf or the GRT. Yes, this is very simple and understandable, but not everything an engineer understands at first glance is good. An example: if we have 100 vrfs with 100 prefixes each, we end up with 10,000 physical next-hops (and that is only for the vrf prefixes). Add on top of that routes from protocols such as IS-IS, LDP, RSVP, and so on.
Note: for simplicity of reasoning, we will assume that there are no equal-cost paths and no aggregated interfaces. We will get to the next-hop hierarchy in their presence a little later.
As a result, the next-hop limit can be reached very quickly. But that is not even the main problem: modern hardware can hold more than 1M IPv4 prefixes in the FIB. The real issue is that if one of the interfaces goes down and routes are recalculated, the router has to rewrite every next-hop currently installed in the forwarding table (in our case, all 10,000). Yes, IGP routes will be rewritten quickly, there are not that many of them, but vpnv4/l2vpn/evpn routes usually number in the tens, and sometimes hundreds, of thousands. Naturally, rewriting that many next-hops takes time, and some traffic may be lost. And we have not even accounted for the possibility of carrying a full view on the box, which by now is about 645K routes.
The most interesting thing about direct next-hops is that even if all 10,000 prefixes arrive from the same PE (that is, they have the same protocol next-hop), you still have to update all 10,000 next-hops. But if you think about it, in this situation we really have only 100 unique next-hops (assuming per-vrf service-label allocation), which differ only in the service label: the transport label and the outgoing interface are exactly the same. These days you will not find direct next-hops for prefixes in a vrf (on Junos, anyway); to be precise, on cards with the Trio chipset you cannot even enable direct next-hops for L3VPN and other services, it is simply not supported. But unicast next-hops cannot disappear entirely either: a unicast next-hop points straight to the interface the packet should be sent out of, and with a hierarchical next-hop (which we will discuss later) the last level of the hierarchy is precisely a unicast next-hop. How could it be otherwise? It should also be mentioned that, besides the outgoing interface, this kind of next-hop includes the label stack and the encapsulation, but more on that later.
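The arithmetic above can be sketched in a few lines of Python. This is a toy model, not Junos internals: the vrf and prefix counts, the label values, and the interface name are the article's example, chosen arbitrarily for illustration.

```python
# Toy model: with flat (direct) next-hops, every prefix gets its own
# next-hop, even when the label stack and egress interface coincide.
VRFS = 100
PREFIXES_PER_VRF = 100

# Per-vrf label allocation: one service label per VRF, same remote PE,
# so the transport label and egress interface are identical everywhere.
routes = [
    {"service_label": 16 + vrf,        # hypothetical per-vrf service label
     "transport_label": 299888,        # same PE -> same transport label
     "interface": "ae1.0"}
    for vrf in range(VRFS)
    for _ in range(PREFIXES_PER_VRF)
]

flat_next_hops = len(routes)  # one direct next-hop per prefix
unique_next_hops = len({(r["service_label"], r["transport_label"], r["interface"])
                        for r in routes})

print(flat_next_hops)    # 10000
print(unique_next_hops)  # 100 -- only the service label differs, per VRF
```

The gap between the two numbers is exactly the redundancy that the hierarchical next-hop types below are designed to eliminate.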
This is what a unicast next-hop looks like for a route learned via IS-IS:
bormoglotx@RZN-PE2> show route forwarding-table destination 10.0.0.2/31 table default
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
10.0.0.2/31        user     0 10.0.0.6           ucst      693    19 ae1.0

bormoglotx@RZN-PE2> show route forwarding-table destination 10.0.0.2/31 table default extensive
Routing table: default.inet [Index 0]
Internet:

Destination:  10.0.0.2/31
  Route type: user
  Route reference: 0                   Route interface-index: 0
  Multicast RPF nh index: 0
  Flags: sent to PFE, rt nh decoupled
  Nexthop: 10.0.0.6
  Next-hop type: unicast               Index: 693      Reference: 19
  Next-hop interface: ae1.0
Aggregate next-hop

584(Aggreg., IPv4, ifl:326:ae0.1, pfe-id:0) <<<<<<<<
    585(Unicast, IPv4, ifl:337:ge-0/0/0.1, pfe-id:0)
    586(Unicast, IPv4, ifl:339:ge-0/0/1.1, pfe-id:0)
603(Aggreg., IPv4->MPLS, ifl:327:ae1.0, pfe-id:0) <<<<<<<
    604(Unicast, IPv4->MPLS, ifl:340:ge-0/0/2.0, pfe-id:0)
    605(Unicast, IPv4->MPLS, ifl:341:ge-0/0/3.0, pfe-id:0)
    606(Unicast, IPv4->MPLS, ifl:342:ge-0/0/4.0, pfe-id:0)
Here everything is very simple: as you can probably guess, this hierarchy appears when a prefix is reachable through an aggregated interface. An aggregate next-hop is essentially a list of the real next-hops (physical interfaces) that belong to the aggregate. If you use aggregated interfaces, the number of next-hops grows in proportion to the number of links in the aggregate. In the output above you see two aggregate next-hops, each of which in turn points to the physical next-hops that belong to it.
Unilist-next-hop

1048574(Unilist, IPv4, ifl:0:-, pfe-id:0) <<<<<<<<
    584(Aggreg., IPv4, ifl:326:ae0.1, pfe-id:0)
        585(Unicast, IPv4, ifl:337:ge-0/0/0.1, pfe-id:0)
        586(Unicast, IPv4, ifl:339:ge-0/0/1.1, pfe-id:0)
    603(Aggreg., IPv4->MPLS, ifl:327:ae1.0, pfe-id:0)
        604(Unicast, IPv4->MPLS, ifl:340:ge-0/0/2.0, pfe-id:0)
        605(Unicast, IPv4->MPLS, ifl:341:ge-0/0/3.0, pfe-id:0)
        606(Unicast, IPv4->MPLS, ifl:342:ge-0/0/4.0, pfe-id:0)
Actually, this is also a very simple hierarchy and is somewhat similar to the aggregate. It appears only when we have equivalent paths and, in essence, is simply a listing of all equivalent paths. In our case, we have two equivalent paths and both through aggregates.
Note: in our case it just so happened that the unicast IDs (585, 586) numerically follow the aggregate ID (584); that refers to the numbering, not the order of the hierarchy, and it is not always the case.
All the listed next-hops do not help to reduce the number of physical next-hops, but on the contrary increase their number. The next two types of next-hop are designed to optimize FIB and reduce the number of unicast next-hops.
Indirect next-hop.

1048577(Indirect, IPv4, ifl:327:ae1.0, pfe-id:0, i-ifl:0:-) <<<<<<
    1048574(Unilist, IPv4, ifl:0:-, pfe-id:0)
        584(Aggreg., IPv4, ifl:326:ae0.1, pfe-id:0)
            585(Unicast, IPv4, ifl:337:ge-0/0/0.1, pfe-id:0)
            586(Unicast, IPv4, ifl:339:ge-0/0/1.1, pfe-id:0)
        603(Aggreg., IPv4->MPLS, ifl:327:ae1.0, pfe-id:0)
            604(Unicast, IPv4->MPLS, ifl:340:ge-0/0/2.0, pfe-id:0)
            605(Unicast, IPv4->MPLS, ifl:341:ge-0/0/3.0, pfe-id:0)
            606(Unicast, IPv4->MPLS, ifl:342:ge-0/0/4.0, pfe-id:0)
Literally, the word indirect means exactly that: not direct. This type of next-hop is used to reduce the number of physical next-hops, because the 10,000 next-hops we arrived at by simple arithmetic when discussing unicast next-hops are clearly too many. Recall our scenario: we have 100 vrfs with 100 prefixes each (labels are generated per-vrf), all advertised from the same PE. It follows that all these prefixes have the same protocol next-hop (the remote PE's loopback) and the same outgoing interface (and, consequently, the same transport label). The only difference is the service label, and since we generate labels per-vrf, there are only 100 of them in total. As a result, the 10,000 direct next-hops can be aggregated into 100 next-hops with identical label stacks.
The concept of indirect next-hop allows all prefixes reachable through the same protocol next-hop to share the same indirect next-hop. Note that the aggregation happens on the protocol next-hop, since there may be no service label at all (for example, for Internet routes); but when a service label is present, it strongly affects the indirect next-hop.
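The benefit of this extra level of indirection can be sketched with a toy Python model. This is an illustration only, not Junos internals: the class name, label values, and interface names are invented, and the point is simply that many FIB entries hold a pointer to one shared object, so re-pointing that single object re-routes all of them at once.

```python
# Toy sketch of why an indirect level cuts FIB churn: many routes share a
# pointer to one indirect next-hop object instead of each owning a copy
# of the forwarding chain.
class IndirectNH:
    def __init__(self, unicast):
        self.unicast = unicast          # current (interface, label stack)
        self.refcount = 0

indirect = IndirectNH(unicast=("ae1.0", ["Push 24008", "Push 299920"]))

fib = {}
for i in range(10_000):                 # 10,000 prefixes share one chain
    fib[f"10.{i // 256}.{i % 256}.0/24"] = indirect
    indirect.refcount += 1

# Interface failure: rewrite ONE object instead of 10,000 FIB entries...
indirect.unicast = ("ae0.1", ["Push 24008", "Push 16"])

# ...and every prefix immediately forwards via the new chain.
assert all(entry.unicast[0] == "ae0.1" for entry in fib.values())
print(indirect.refcount)  # 10000
```

This is exactly the failure-recovery scenario described earlier: with flat next-hops, the same event would mean rewriting every entry individually.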
Alas, the main problem with the indirect next-hop is that it points to a unicast next-hop that carries the full label stack, including the service label:
bormoglotx@RZN-PE2> show route forwarding-table table VRF1 destination 10.2.0.0/24 extensive
Routing table: VRF1.inet [Index 9]
Internet:

Destination:  10.2.0.0/24
  Route type: user
  Route reference: 0                   Route interface-index: 0
  Multicast RPF nh index: 0
  Flags: sent to PFE
  Next-hop type: indirect              Index: 1048587  Reference: 2
  Nexthop: 10.0.0.6
  Next-hop type: Push 24008, Push 299920(top)    Index: 706    Reference: 2
  Load Balance Label: None
  Next-hop interface: ae1.0
This line shows the full label stack:
Next-hop type: Push 24008, Push 299920(top) Index: 706 Reference: 2
As you can see, label 24008 is the service label, and it is pushed at the last level of the next-hop hierarchy. Because of this, several indirect next-hops cannot point to the same physical next-hop: each has a different service label. Besides, L2CKT and VPLS, for example, use different encapsulations. Therefore, under the conditions described above, the indirect next-hop may not yield any benefit at all.
It is not hard to guess that with per-prefix label allocation (for some reason this is the default allocation mode on Cisco and Huawei), the indirect next-hop does not help us much, since there is now a separate service label for each prefix. As a result, we cannot merge several prefixes into one next-hop: they are reachable through the same protocol next-hop but have different service labels, which in our worst-case scenario again produces 10,000 next-hops, now indirect instead of direct. As the proverb goes, six of one, half a dozen of the other... On top of that, all L2CKTs, even those terminated on the same pair of PEs, will have different labels (and there is nothing you can do about it: you cannot generate one label for several L2CKTs). How the developers solved this problem will be described later.
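The effect of the label-allocation mode on indirect next-hop counts can be checked with a toy Python calculation. The counts and label ranges follow the article's running example; the protocol next-hop address and labels are hypothetical placeholders, and the model assumes one indirect chain per unique (protocol next-hop, service label) pair, as described above.

```python
# Toy comparison: how many indirect next-hop chains result from per-prefix
# vs per-vrf label allocation, for 100 vrfs x 100 prefixes from one PE.
VRFS, PREFIXES_PER_VRF = 100, 100
PROTOCOL_NH = "62.0.0.63"   # hypothetical remote-PE loopback

def indirect_count(per_prefix_labels: bool) -> int:
    chains = set()
    fresh_labels = iter(range(24000, 24000 + VRFS * PREFIXES_PER_VRF))
    for vrf in range(VRFS):
        vrf_label = 16 + vrf            # one label per vrf
        for _ in range(PREFIXES_PER_VRF):
            service = next(fresh_labels) if per_prefix_labels else vrf_label
            # a chain is unique per (protocol next-hop, service label)
            chains.add((PROTOCOL_NH, service))
    return len(chains)

print(indirect_count(per_prefix_labels=True))   # 10000 -- no better than direct
print(indirect_count(per_prefix_labels=False))  # 100
```

With per-prefix labels the indirection buys nothing, which is precisely the "six of one, half a dozen of the other" situation above.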
Of course, in real life the indirect next-hop significantly reduces the number of next-hops (few people avoid vrf-table-label or per-vrf label allocation). Moreover, on Juniper MX the indirect next-hop is enabled by default, and this feature cannot be disabled. In addition, if you carry a full view on the router, those prefixes have no service label at all (unless, of course, you put the full view into a vrf), and all Internet prefixes share one and the same indirect next-hop.
But there is no limit to perfection, and we want an even more scalable solution. Besides, I repeat, L2CKTs always have different service labels and therefore different indirect next-hops. The solution that fixes this problem is called chained-composite-next-hop (in Juniper terms; Cisco names it a bit differently).
Chained-composite-next-hop

607(Compst, IPv4->MPLS, ifl:0:-, pfe-id:0, comp-fn:Chain) <<<<<<<
    1048577(Indirect, IPv4, ifl:327:ae1.0, pfe-id:0, i-ifl:0:-)
        1048574(Unilist, IPv4, ifl:0:-, pfe-id:0)
            584(Aggreg., IPv4, ifl:326:ae0.1, pfe-id:0)
                585(Unicast, IPv4, ifl:337:ge-0/0/0.1, pfe-id:0)
                586(Unicast, IPv4, ifl:339:ge-0/0/1.1, pfe-id:0)
            603(Aggreg., IPv4->MPLS, ifl:327:ae1.0, pfe-id:0)
                604(Unicast, IPv4->MPLS, ifl:340:ge-0/0/2.0, pfe-id:0)
                605(Unicast, IPv4->MPLS, ifl:341:ge-0/0/3.0, pfe-id:0)
                606(Unicast, IPv4->MPLS, ifl:342:ge-0/0/4.0, pfe-id:0)
As we found out, the indirect next-hop is a nesting doll of next-hops. The chained-composite-next-hop is exactly the same kind of nesting doll, but with one more level in the hierarchy. How else can prefixes be grouped and tied to the same next-hop? What else do all L3VPNs, or all L2CKTs, have in common? Right: the address family. At the very top of the next-hop hierarchy sits the composite next-hop, which groups routes by service label; unlike the indirect case, it is the composite next-hop that carries this label. That is, the service label is now specified not at the last level of the hierarchy (unicast), but at the first. This overcomes the problem we identified when discussing the indirect next-hop. For example, here is the FIB entry for the same prefix 10.2.0.0/24, but with composite-next-hop enabled:
bormoglotx@RZN-PE2> show route forwarding-table table VRF1 destination 10.2.0.0/24 extensive
Routing table: VRF1.inet [Index 9]
Internet:

Destination:  10.2.0.0/24
  Route type: user
  Route reference: 0                   Route interface-index: 0
  Multicast RPF nh index: 0
  Flags: sent to PFE
  Nexthop:
  Next-hop type: composite             Index: 608      Reference: 2
  Load Balance Label: Push 24008, None
  Next-hop type: indirect              Index: 1048578  Reference: 3
  Nexthop: 10.0.0.6
  Next-hop type: Push 299920           Index: 664      Reference: 3
  Load Balance Label: None
  Next-hop interface: ae1.0
The line Load Balance Label contains the service label.
Load Balance Label: Push 24008, None
By simple deduction we arrive at the following conclusion: there will be as many composite next-hops as there are service labels. At the next level of the hierarchy sits the indirect next-hop, though not the one we discussed earlier. With composite next-hops in use, the job of the indirect next-hop is to aggregate by address family and protocol next-hop. That is, all vpnv4 prefixes with the same protocol next-hop share the same indirect next-hop. The indirect next-hop then points to the real next-hop (as a rule, a unilist or an aggregate). Most importantly, several indirect next-hops can now point to the same unicast next-hop, since the unicast next-hop no longer carries the full label stack, only the transport label, and, as you understand, the transport label toward the same PE is the same.
Now let's return to our case with 100 vrfs. In the worst-case scenario with indirect next-hops we got 10,000 indirect next-hops and, consequently, just as many real next-hops. Let's see what composite-next-hop gives us. At the top sits the composite level: with per-prefix labels we get 10,000 different service labels and therefore the same number of composite next-hops. But unlike the previous case, a composite next-hop refers not to a real next-hop but to an indirect next-hop, which aggregates the vpnv4 prefixes by address family and protocol next-hop, and this sharply reduces the number of real next-hops. In our scenario there is only one address family (vpnv4) and one protocol next-hop, which means all 10,000 composite next-hops refer to one single indirect next-hop, and it, in turn, points to one real next-hop. That is, we end up with just one real next-hop!
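The whole three-level chain can be summarized in a toy Python count. Again, this only mirrors the article's worked example (per-prefix labels, one PE, one address family); the addresses, labels, and interface name are illustrative placeholders, not real Junos state.

```python
# Toy count of objects at each level of the chained composite hierarchy:
# composite (per service label) -> indirect (per family + protocol NH)
# -> real next-hop (transport label + egress only).
VRFS, PREFIXES_PER_VRF = 100, 100

composites, indirects, unicasts = set(), set(), set()
service_label = 24000                        # per-prefix allocation
for vrf in range(VRFS):
    for _ in range(PREFIXES_PER_VRF):
        composites.add(service_label)        # one composite per service label
        service_label += 1
        indirects.add(("vpnv4", "62.0.0.63"))        # (family, protocol NH)
        unicasts.add(("ae1.0", "Push 299920"))       # transport label only

print(len(composites), len(indirects), len(unicasts))  # 10000 1 1
```

So even in the worst labeling mode, the churn-sensitive bottom of the hierarchy collapses to a single real next-hop, which is exactly the result stated above.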
From my own practice I can say that enabling composite-next-hop for ingress LSPs reduces the total number of next-hops by a factor of 5 to 8. For example, real figures: a drop from 1.1M next-hops (before this feature) to 170K (after enabling it), a 6.5x reduction; a good result, you must agree.
Note: with composite-next-hop enabled, you will not see the label stack in the plain forwarding-table output, since it is split across two levels of the hierarchy and is shown only in the extensive output. For example:
Indirect-next-hop:
bormoglotx@RZN-PE2> show route forwarding-table table VRF1 destination 10.0.0.0/24
Routing table: VRF1.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
10.0.0.0/24        user     0                    indr  1048578     4
                                                 ulst  1048577     2
                            0:5:86:71:49:c0 Push 16        699     1 ae0.1
                            0:5:86:71:9d:c1 Push 16, Push 299888(top)
                                                            702     1 ae1.0
Composite-next-hop:
bormoglotx@RZN-PE2> show route forwarding-table table VRF1 destination 10.0.0.0/24
Routing table: VRF1.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
10.0.0.0/24        user     0                    comp      608     2
Note: if a Juniper MX chassis contains DPC cards (except service ones), composite-next-hop cannot be enabled, as this note on the Juniper website indicates:
On the MX Series 3D Universal Edge Routers, the chained composite next hops are disabled by default. To use them, the chassis must be running in the enhanced network services mode.
This does not say outright that composite-next-hop cannot be enabled, but, and this may come as a surprise to some, DPC cards do not support the enhanced network services mode:
Only Multiservices DPCs (MS-DPCs) and MS-MPCs are powered on with enhanced network services options. No other DPCs function with the enhanced network services mode options.
But do not think that the composite next-hop is needed exclusively on PE routers, although it is mostly useful there (P routers, as a rule, do not carry that many routes). Composite-next-hop can be enabled for ingress LSPs (on a PE) and for transit LSPs (on a P). Besides, you may have beefy PEs that also play the role of P routers (or your network design does not provide a BGP-free core at all), or your core (the P level) may learn routes to other autonomous systems (Option C) not through redistribution of BGP-LU routes into the IGP on the border routers, but through a BGP labeled-unicast session with the route reflectors.
While for ingress LSPs we can enable composite-next-hop only for services such as L3VPN, L2VPN, and EVPN, as well as BGP-LU:
evpn          Create composite-chained nexthops for ingress EVPN LSPs
fec129-vpws   Create composite-chained nexthops for ingress fec129-vpws LSPs
l2ckt         Create composite-chained nexthops for ingress l2ckt LSPs
l2vpn         Create composite-chained nexthops for ingress l2vpn LSPs
l3vpn         Create composite-chained nexthops for ingress l3vpn LSPs
labeled-bgp   Create composite-chained nexthops for ingress labeled-bgp LSPs
for transit devices the composite next-hop options cover LDP, RSVP, and even static LSPs:
l2vpn         Create composite-chained nexthops for transit l2vpn LSPs
l3vpn         Create composite-chained nexthops for transit l3vpn LSPs
labeled-bgp   Create composite-chained nexthops for transit labeled BGP routes
ldp           Create composite-chained nexthops for LDP LSPs
ldp-p2mp      Create composite-chained nexthops for LDP P2MP LSPs
rsvp          Create composite-chained nexthops for RSVP LSPs
rsvp-p2mp     Create composite-chained nexthops for RSVP p2mp LSPs
static        Create composite-chained nexthops for static LSPs
This significantly reduces the number of next-hops for transit LSPs. For example, my lab has only 5 devices, so the mpls.0 table on the P router is, to put it mildly, small:
bormoglotx@RZN-P2> show route table mpls.0 | find ^2[0-9]+
299872             *[LDP/9] 1w1d 12:59:13, metric 1
                    > to 10.0.0.5 via ae0.0, Pop
299872(S=0)        *[LDP/9] 1w1d 12:59:13, metric 1
                    > to 10.0.0.5 via ae0.0, Pop
299888             *[LDP/9] 1w1d 12:58:30, metric 1
                    > to 10.0.0.5 via ae0.0, Swap 299792
299904             *[LDP/9] 1w1d 12:55:57, metric 1
                    > to 10.0.0.7 via ae1.0, Pop
299904(S=0)        *[LDP/9] 1w1d 12:55:57, metric 1
                    > to 10.0.0.7 via ae1.0, Pop
299920             *[LDP/9] 1w1d 12:47:06, metric 1
                    > to 10.0.0.5 via ae0.0, Swap 299824
But even in such a small lab the effect of enabling composite-next-hop for LDP is visible. Here is the total number of next-hops before enabling it:
VMX(RZN-P2 vty)# show nhdb summary
Total number of NH = 116

VMX(RZN-P2 vty)# show nhdb summary detail
     Type      Count
---------  ---------
  Discard         12
   Reject         11
  Unicast         32  <<<<<<<<<<<<<<
  Unilist          0
  Indexed          0
 Indirect          0
     Hold          0
  Resolve          2
    Local         13
     Recv          8
 Multi-RT          0
    Bcast          4
    Mcast          7
   Mgroup          1
 mdiscard          7
    Table         11
     Deny          0
  Aggreg.          8  <<<<<<<<<<<<<<
   Crypto          0
   Iflist          0
   Sample          0
    Flood          0
  Service          0
 Multirtc          0
   Compst          0  <<<<<<<<<<<<<<
DmxResolv          0
   DmxIFL          0
  DmxtIFL          0
    LITAP          0
     Limd          0
       LI          0
   RNH_LE          0
     VCFI          0
     VCMF          0
Now turn on the chained-composite-next-hop for ldp and check the result:
bormoglotx@RZN-P2> show configuration routing-options
router-id 62.0.0.65;
autonomous-system 6262;
forwarding-table {
    chained-composite-next-hop {
        transit {
            ldp;
        }
    }
}
In the routing table everything looks the same, although the routes have been refreshed, as their age shows (keep this in mind when enabling composite-next-hop on transit devices):
bormoglotx@RZN-P2> show route table mpls.0 | find ^2[0-9]+
299872             *[LDP/9] 00:00:57, metric 1
                    > to 10.0.0.5 via ae0.0, Pop
299872(S=0)        *[LDP/9] 00:00:57, metric 1
                    > to 10.0.0.5 via ae0.0, Pop
299888             *[LDP/9] 00:00:57, metric 1
                    > to 10.0.0.5 via ae0.0, Swap 299792
299904             *[LDP/9] 00:00:57, metric 1
                    > to 10.0.0.7 via ae1.0, Pop
299904(S=0)        *[LDP/9] 00:00:57, metric 1
                    > to 10.0.0.7 via ae1.0, Pop
299920             *[LDP/9] 00:00:57, metric 1
                    > to 10.0.0.5 via ae0.0, Swap 299824
And now let's check the total number of next-hops in the FIB:
VMX(RZN-P2 vty)# show nhdb summary
Total number of NH = 94

VMX(RZN-P2 vty)# show nhdb summary detail
     Type      Count
---------  ---------
  Discard         12
   Reject         11
  Unicast         10  <<<<<<<<<<<<<<
  Unilist          0
  Indexed          0
 Indirect          0
     Hold          0
  Resolve          2
    Local         13
     Recv          8
 Multi-RT          0
    Bcast          4
    Mcast          7
   Mgroup          1
 mdiscard          7
    Table         11
     Deny          0
  Aggreg.          2  <<<<<<<<<<<<<<
   Crypto          0
   Iflist          0
   Sample          0
    Flood          0
  Service          0
 Multirtc          0
   Compst          6  <<<<<<<<<<<<<<
DmxResolv          0
   DmxIFL          0
  DmxtIFL          0
    LITAP          0
     Limd          0
       LI          0
   RNH_LE          0
     VCFI          0
     VCMF          0
Since we had several transit labels, each of them had its own unicast next-hop, and Junos was not bothered by the fact that we have only two interfaces through which transit traffic can be sent. As a result we ended up with 8 aggregate next-hops, which naturally inflated the number of unicast next-hops as well. After enabling composite-next-hop, instead of generating its own chain for each label, the composite next-hops refer to the two existing aggregate next-hops.

I would like to add that when you enable composite-next-hop for ingress LSPs, all BGP sessions will flap; when you enable it for transit LSPs, the sessions do not flap (even with BGP-LU enabled), but all MPLS labels are withdrawn and re-installed in the forwarding table.

In conclusion, I would like to compare indirect-next-hop and composite-next-hop visually, in pictures. Three L3VPNs were set up in the lab; the prefixes from PE3 (10.2.0.0/24 and 10.3.0.0/24) are advertised with per-prefix labels, and those from PE1 with per-vrf labels:
and three L2CKTs: two to PE1 and one to PE3:
In addition, the topology is built on aggregated interfaces, and there are equal-cost paths to PE1. This illustration shows the next-hop "tree" when indirect-next-hop is used:
For the prefixes 10.0.0.0/24, 10.0.1.0/24, and 10.0.2.0/24 we have one service label, and likewise for 20.0.0.0/24, 20.0.1.0/24, and 20.0.2.0/24: they are all advertised from PE1. As you can see, these prefixes are reachable through the same indirect next-hop. But 10.2.0.0/24 and 10.3.0.0/24 have different labels (their labels are generated per-prefix), which means different indirect next-hops. With the L2CKTs, I think, everything is clear: each has its own service label and its own indirect next-hop. As a result, we end up with 29 unicast next-hops. Now the same thing, but with composite-next-hop enabled:
Here there are fewer next-hops. Prefixes with the same service label are reachable through the same composite next-hop. As you remember, the service label now lives at the composite level of the hierarchy. All composite next-hops, in turn, refer to indirect next-hops. In the diagram above we have two protocol next-hops (PE1 and PE3) and two services, L3VPN and L2CKT. The result is four indirect next-hops:

L3VPN, PE1
L3VPN, PE3
L2CKT LDP, PE1
L2CKT LDP, PE3

And since the unicast level of the hierarchy now carries only the transport label, several indirect next-hops can refer to the same unicast next-hop. As a result, the number of unicast next-hops dropped from 29 to 8.

Thanks for your attention.