
Redundant internal and external communication channels and static routing: a corporate network on MikroTik

I work as a technical support engineer at an ISP. In this article I will share my experience of building a corporate network for a retail chain on a limited budget. The network uses static routing, redundant communication channels and automatic e-mail notification of outages. Experienced network engineers will probably find little new here, but administrators facing a similar task may find it useful.

I believe that for this task dynamic routing would not work as quickly, and probably not as reliably, as the project requires. I have nothing against dynamic routing. However, two things tipped the choice toward static routes and scripts: negative feedback about its operation on MikroTik equipment, and some specifics of the network (more on this below).

Part 0. The starting point


I was approached by a customer, a local retail chain. We provide it with a local network that connects its stores across the city.

Technically, the provider allocates a separate VLAN for the chain in its network. All stores (12 of them) are connected to the ISP over optical fiber using two technologies: FTTH and PON.
The enterprise network diagram before the upgrade is shown in the picture.



Two stores and the central office are connected via Ethernet (FTTH). The remaining 9 stores are connected via PON (Passive Optical Network). The PON connections use Huawei HG810 terminals, also called ONUs (Optical Network Units). You can read about PON technology here.

Huawei's equipment has some specific behaviors. For the ISP's home users they are irrelevant, and they even play a positive role in the design of the subscriber access network. For corporate clients, however, they can have a negative effect.

Let's take a closer look at them:

  1. In a PON network built on Huawei equipment, traffic between ONUs served by the same base station (OLT, Optical Line Terminal) is blocked by default. We solved this problem with special profiles for corporate VLANs.
  2. The ONU does not pass DHCP packets from the subscriber into the provider's network, although everything passes in the opposite direction. So if the main office with the DHCP server is connected to the distributed corporate network via an ONU, the office server cannot hand out addresses to hosts outside the office.
  3. Multicast packets have the same problem: none of them pass through the ONU, so they are not seen in other parts of the network.

All other traffic passes without problems, filtering or restrictions.

Regarding problems 2 and 3: if any readers use Huawei PON in their networks and know how to let such traffic through, I would be glad to hear your advice.

When the customer came to me, the chain's network was flat and unmanaged, with a single router running Kerio Control Server.

All IP devices in all stores could see each other. The FDB table on the provider's switch showed more than 350 devices in the chain's VLAN, all of them in one large broadcast domain.

Because of this, various network failures disrupted the stores' work, so the network needed to be segmented.

Sometimes an outage on the provider's side cut the connection between the office and individual stores.

Worse was a loss of connectivity between the central office and the provider's network. In that case all 12 stores across the city were left without access to the servers in the office and without Internet access. During such periods the stores' work was severely limited, and they could not:

  1. Accept cashless payments.
  2. Receive goods and carry out stocktaking.
  3. Synchronize prices and stock levels.

The central office was connected via Ethernet because it had to serve DHCP to devices in all stores. The fiber from the office runs to the nearest apartment building, which houses the provider's node. Whenever power was lost in that building, or in buildings further along the line, all 12 stores lost their connection to the office.

To keep working during power failures, a PON line was installed at the main office. It served only as a backup if the Ethernet link failed, because DHCP packets did not pass through it. Switching between the Ethernet and PON channels was done manually.

My tasks were:

  1. Segment the network into many small broadcast domains so that a problem in one of them does not affect the whole network.
  2. Switch internal communication channels automatically when the main office loses its connection to the provider via Ethernet or PON.
  3. Switch the connection to the office automatically when a particular store loses its connection to the ISP, which means losing both the local link to the office and Internet access.
  4. Notify the company's system administrators automatically about an outage on a particular network segment (loss of the connection to the office or of the backup Internet).

Part 1. Solving the tasks


MikroTik equipment was purchased for these tasks: an RB1100AHx2 for the central office and a MikroTik hEx (RB750Gr2) for each of the 12 stores.

A second provider, Rostelecom, was connected at the central office and in all stores. The company buys only Internet access from it. The central office is connected by cable (FTTH), and the stores via ADSL. The modems are leased from the provider and work strictly in bridge mode.

A distributed addressing scheme was introduced in the enterprise network:


For routing between the sites, two auxiliary networks were introduced. The MikroTik routers communicate with each other over these networks:


The main office has access to the Internet through 3 channels:



Below is an example configuration:

[s@MAIN-BORDER-ROUTER] > ip address export
# nov/27/2015 22:43:50 by RouterOS 6.32.2
#
/ip address
add address=10.10.10.1/24 comment=ISP-LOCAL-ADDRESS interface=eth-1 network=10.10.10.0
add address=10.10.20.1/24 comment=ISP-RESERVE-LOCAL-ADDRESS interface=eth-2 network=10.10.20.0
add address=1.1.1.1/30 comment=ISP1-MAIN-INET-ADDRESS interface=eth-1 network=1.1.1.0
add address=2.2.2.2/30 comment=ISP1-RESERVE-INET interface=eth-2 network=2.2.2.0
add address=192.168.1.1/24 comment=OFFICE-LOCAL-ADDRESS interface=bridge-MAIN-OFFICE network=192.168.1.0

For the PPPoE connection to the second provider:

[s@MAIN-BORDER-ROUTER] > interface pppoe-client print
Flags: X - disabled, R - running
 0  R name="RT-PPPoE" max-mtu=1480 max-mru=1480 mrru=1600 interface=eth-3
      user="U" password="P" profile=default keepalive-timeout=30 service-name=""
      ac-name="" add-default-route=no dial-on-demand=no use-peer-dns=no
      allow=pap,chap,mschap1,mschap2

Remote stores must keep working when a store loses its connection to ISP-1. For this, two VPN users were created for each store on the office's main router. Each store therefore always has two active connections over the Internet, to two external IP addresses of the office, one from each provider.

We introduce two more auxiliary networks for exchanging traffic between the office and the stores, this time over VPN.


We enable the L2TP server on the router and create the VPN users (here is an example for one store):

/interface l2tp-server server
set enabled=yes keepalive-timeout=15
/ppp secret
add local-address=10.20.30.1 name=VERTOLET-VPN password=Pass profile=default-encryption remote-address=10.20.30.15 service=l2tp
add local-address=10.30.40.1 name=VERTOLET-VPN-RESERVE password=Pass profile=default-encryption remote-address=10.30.40.15 service=l2tp
/interface l2tp-server
add name=15.VERTOLET-VPN user=VERTOLET-VPN
add name=15.VERTOLET-VPN-RESERVE user=VERTOLET-VPN-RESERVE

With /interface l2tp-server I create a static server binding for each store's user in the PPP section. This makes it easy to see which stores are connected and which interface carries which traffic.
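Every store gets the same pair of PPP secrets and server bindings, differing only in the store number and name. The commands can therefore be generated rather than typed twelve times. Below is a minimal Python sketch, assuming the numbering convention used here: the main tunnel lives in 10.20.30.0/24, the reserve tunnel in 10.30.40.0/24, and the store number is the last octet. The function name and the single shared password parameter are hypothetical, for illustration only.

```python
def l2tp_server_config(store_no: int, name: str, password: str) -> str:
    """Build office-side /ppp secret and /interface l2tp-server lines for one store.

    Hypothetical helper; assumes the article's convention that the store number
    is the last octet of the store's tunnel address in both VPN networks.
    """
    lines = ["/ppp secret"]
    for suffix, net in (("", "10.20.30"), ("-RESERVE", "10.30.40")):
        lines.append(
            f"add local-address={net}.1 name={name}-VPN{suffix} password={password} "
            f"profile=default-encryption remote-address={net}.{store_no} service=l2tp"
        )
    # static server bindings, named "<store number>.<user>" as in the article
    lines.append("/interface l2tp-server")
    for suffix in ("", "-RESERVE"):
        lines.append(f"add name={store_no}.{name}-VPN{suffix} user={name}-VPN{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(l2tp_server_config(15, "VERTOLET", "Pass"))
```

The output can be pasted into the router terminal after the password has been checked.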

We get four networks for traffic exchange.

[s@MAIN-BORDER-ROUTER] > ip address print
Flags: X - disabled, I - invalid, D - dynamic
 0   ;;; IT-MAIN-LOCAL-ADDRESS
     10.10.10.1/24      10.10.10.0      eth-1
 1   ;;; IT-RESERVE-LOCAL-ADDRESS
     10.10.20.1/24      10.10.20.0      eth-2
 2 D 10.30.40.1/32      10.30.40.15     2.VERTOLET-VPN-RESERVE
 3 D 10.20.30.1/32      10.20.30.15     2.VERTOLET-VPN

For convenience, I planned the addressing so that the store network 192.168.15.0/24 is reachable via 10.10.10.15, 10.10.20.15, 10.20.30.15 and 10.30.40.15. The other stores' subnets follow the same pattern with their own numbers.
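The numbering convention is simple enough to express as a function. Below is a small Python sketch, assuming every store's number N is used as the third octet of its LAN (192.168.N.0/24) and as the last octet of its address in each of the four transit networks. The function name is hypothetical. Store number 1 is rejected because the office already uses 192.168.1.0/24 and the .1 transit addresses.

```python
import ipaddress


def store_addressing(store_no: int) -> dict:
    """Return a store's LAN and its four next-hop addresses (hypothetical helper).

    Assumes store number N maps to 192.168.N.0/24 and to x.x.x.N in each of
    10.10.10.0/24, 10.10.20.0/24, 10.20.30.0/24 and 10.30.40.0/24.
    """
    if not 2 <= store_no <= 254:
        # .1 belongs to the office, and the number must fit into one octet
        raise ValueError("store number must be between 2 and 254")
    return {
        "lan": ipaddress.ip_network(f"192.168.{store_no}.0/24"),
        "next_hops": [
            ipaddress.ip_address(f"{prefix}.{store_no}")
            for prefix in ("10.10.10", "10.10.20", "10.20.30", "10.30.40")
        ],
    }


if __name__ == "__main__":
    plan = store_addressing(15)
    print(plan["lan"], [str(hop) for hop in plan["next_hops"]])
```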

Now create the routes.

[sbl@MAIN-BORDER-ROUTER] > ip route export
# nov/27/2015 23:24:47 by RouterOS 6.32.2
#
/ip route
add comment=1.VERTOLET distance=10 dst-address=192.168.15.0/24 gateway=10.10.10.15
add comment=2.VERTOLET distance=20 dst-address=192.168.15.0/24 gateway=10.10.20.15
add comment=3.VERTOLET distance=30 dst-address=192.168.15.0/24 gateway=10.20.30.15
add comment=4.VERTOLET distance=40 dst-address=192.168.15.0/24 gateway=10.30.40.15

I use different administrative distances for different routes. Normally, traffic to the store goes via 10.10.10.15, because that route has the lowest administrative distance, 10. The 10.10.10.0/24 network is reached through eth-1, which is the main Ethernet channel from ISP-1.

If the eth-1 channel fails, traffic goes through the 10.10.20.0/24 network on eth-2, over PON. If that fails too, the VPN tunnels take over, with the VPN over PPPoE from ISP-2 as the last resort.
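The failover order can be modelled in a few lines. Among the routes to the same prefix whose gateway is reachable, the one with the lowest distance wins. This is a simplified Python sketch, not RouterOS's actual route selection: it models only distance and gateway reachability, and the function name is hypothetical. The route list mirrors the office router's export for 192.168.15.0/24.

```python
from typing import Optional

# (comment, distance, gateway) for 192.168.15.0/24 on the office router
ROUTES = [
    ("1.VERTOLET", 10, "10.10.10.15"),
    ("2.VERTOLET", 20, "10.10.20.15"),
    ("3.VERTOLET", 30, "10.20.30.15"),
    ("4.VERTOLET", 40, "10.30.40.15"),
]


def active_route(routes, reachable: set) -> Optional[tuple]:
    """Return the reachable route with the lowest distance, or None.

    Simplified model: only gateway reachability and distance are considered.
    """
    candidates = [r for r in routes if r[2] in reachable]
    return min(candidates, key=lambda r: r[1], default=None)


if __name__ == "__main__":
    # eth-1 (10.10.10.0/24) is down: traffic should move to the PON route
    print(active_route(ROUTES, {"10.10.20.15", "10.20.30.15", "10.30.40.15"}))
```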

An example of a network connection in the office is shown in the picture below.



Similar settings are made on the router in a remote store. First, assign the addresses:

[s@VERTOLET-GW] > ip address export
# nov/27/2015 23:47:45 by RouterOS 6.32.3
#
/ip address
add address=192.168.15.2/24 interface=bridge-VERTOLET network=192.168.15.0
add address=10.10.10.15/24 comment=LOCAL-MAIN-ADDRESS interface=ether1 network=10.10.10.0
add address=10.10.20.15/24 comment=LOCAL-RESERVE-ADDRESS interface=ether1 network=10.10.20.0
add address=192.168.15.253/30 interface=ether2 network=192.168.15.252

Create the L2TP VPN connections:
[s@VERTOLET-GW] > interface l2tp-client export
# nov/27/2015 23:54:11 by RouterOS 6.32.3
#
/interface l2tp-client
add connect-to=2.2.2.2 disabled=no keepalive-timeout=45 mrru=1600 name=VPN-OFFICE password=Pass user=VERTOLET-VPN
add connect-to=3.3.3.3 disabled=no keepalive-timeout=45 mrru=1600 name=VPN-OFFICE-RESERVE password=Pass user=VERTOLET-VPN-RESERVE

Here is the store's connection diagram:



If the eth-1 channel fails at a remote store, the store automatically loses contact with the office over both local routes, since both go through ISP-1. This is where the VPN tunnels to 10.20.30.1 and 10.30.40.1 help. They are always up, and they are always established over the store's backup Internet channel!

To make this work, I created a separate routing table for ISP-2. This also ensures that the router always answers requests arriving via ISP-2 through the same interface, but I will not go into detail on that here.

Create a routing table for ISP-2 in the store:

[s@VERTOLET-GW] > ip route export
# nov/28/2015 00:13:41 by RouterOS 6.32.3
#
/ip route
add distance=1 gateway=RT-INET-Reserve routing-mark=ISP2-Reserve

Next we create routing rules so that traffic to both VPN server IP addresses in the office goes only through the backup Internet channel:

[s@VERTOLET-GW] > ip route export
# nov/28/2015 00:13:41 by RouterOS 6.32.3
/ip route rule
add action=lookup-only-in-table dst-address=2.2.2.2/32 table=ISP2-Reserve
add action=lookup-only-in-table dst-address=3.3.3.3/32 table=ISP2-Reserve

Now the VPN is always up, whichever channel the store router is currently using to reach the office. The VPN always runs over the backup channel only, ready to take over communication with the office at any moment.

By default, the store's Internet access itself goes through ISP-1 via the office. Two separate routing tables are therefore also created for reaching the Internet through the office:

[s@VERTOLET-GW] > ip route export
# nov/28/2015 00:13:41 by RouterOS 6.32.3
#
/ip route
add distance=1 gateway=10.10.10.1 pref-src=10.10.10.2 routing-mark=ISP1-A
add distance=1 gateway=10.10.20.1 pref-src=10.10.20.2 routing-mark=ISP1-B

We need to make sure that traffic to 10.10.10.1 and 10.10.20.1 does not go via the default route, through which a reply might still occasionally arrive. To do this, I bind lookups for these two addresses to specific routing tables:

[s@VERTOLET-GW] > ip route rule export
# nov/28/2015 00:13:41 by RouterOS 6.32.3
/ip route rule
add action=lookup-only-in-table dst-address=10.10.10.1/32 table=ISP1-A
add action=lookup-only-in-table dst-address=10.10.20.1/32 table=ISP1-B
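The effect of the store's four "lookup-only-in-table" rules can be sketched as a lookup function. Rules are checked top to bottom, and the first one whose destination matches decides the table. Because the action is lookup-only-in-table, a matching packet never falls back to the main table. Packets matching no rule use the main table. This is a simplified Python model, with a hypothetical function name, not a reimplementation of RouterOS policy routing.

```python
import ipaddress

# (destination, table) pairs mirroring the store's /ip route rule entries
RULES = [
    ("2.2.2.2/32", "ISP2-Reserve"),
    ("3.3.3.3/32", "ISP2-Reserve"),
    ("10.10.10.1/32", "ISP1-A"),
    ("10.10.20.1/32", "ISP1-B"),
]


def table_for(dst: str, rules=RULES) -> str:
    """Return the routing table a packet to dst is looked up in (simplified model)."""
    addr = ipaddress.ip_address(dst)
    for net, table in rules:
        if addr in ipaddress.ip_network(net):
            # lookup-only-in-table: no fallback to "main" for matching packets
            return table
    return "main"


if __name__ == "__main__":
    for dst in ("2.2.2.2", "10.10.20.1", "192.168.1.10"):
        print(dst, "->", table_for(dst))
```

So the VPN servers are only ever reached via ISP-2, and the office's local gateways only via their own ISP-1 tables.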

The last step for the store is to create the routes to the office.

[s@VERTOLET-GW] > ip route export
# nov/28/2015 00:13:41 by RouterOS 6.32.3
/ip route
add comment=1.Local-NET-MAIN-IT distance=10 dst-address=192.168.1.0/24 gateway=10.10.10.1
add comment=2.Local-NET-RESERVE-IT distance=20 dst-address=192.168.1.0/24 gateway=10.10.20.1
add comment=3.Local-NET-RESERVE-INET distance=30 dst-address=192.168.1.0/24 gateway=10.20.30.1
add comment=4.Local-NET-RESERVE-INET distance=40 dst-address=192.168.1.0/24 gateway=10.30.40.1

That's all for the routing tables. Now we need to set up fast, automatic switching between these channels.

Part 2. Setting up automatic switching


At the beginning of the article I wrote that, in my opinion, dynamic routing is not very suitable here, although it does win on simplicity: there is simply less to create and write.

But first, most stores are connected via PON, which does not pass multicast. Neither OSPF nor RIP would even come up over the LAN.

Second, I have little experience with OSPF, and I am not sure exactly how it would behave if the channel via ISP-1 were still locally reachable but had 20-25% packet loss or more. Traffic would still flow and OSPF Hello packets would be seen, but real traffic would suffer.

Third is the speed of reaction and switching. By default, the OSPF Router Dead Interval is 40 seconds, which is quite long for a store where customers are waiting at the checkout. It can of course be reduced, but how stable would OSPF be then?

The final argument in favor of static routes is the considerable criticism and dissatisfaction among MikroTik users about OSPF stability, as written, for example, here.

Honestly, I have nothing against OSPF. But in this case I decided to play it safe and do the switching with scripts.

Alas, I have no experience writing scripts, so some of my edits to borrowed scripts (the original sources are given) may seem rather hacky to you. Criticism is always welcome.

The script for checking the availability of local channels is based on a script by Habr user magnitudo.

Script to check the availability of local channels
name="CHECK-LOCAL-ALARM" owner="admin" policy=read,write,policy,test,sniff,sensitive
# DEFINE GLOBAL VARIABLES for LOCAL-REACHABLE-STATUS
# GlobalITFail keeps the link state between runs (true = both local channels are down)
:global GlobalITFail
:if ([:typeof $GlobalITFail] = "nothing") do={ :set GlobalITFail false }
# DEFINE INTERNAL PING TARGETS
:local PingCount 7
# MAIN LOCAL CENTRAL-GW IP ADDRESS
:local PingTarget1 10.10.10.1
# RESERVE LOCAL CENTRAL-GW IP ADDRESS
:local PingTarget2 10.10.20.1
# RESERVE VPN LOCAL CENTRAL-GW IP ADDRESS (tunnel to the office's ISP1-B address)
:local PingTarget3 10.20.30.1
# CHECK MAIN LOCAL SERVER ADDRESS
:local MainLocalServerOK false;
# 7 pings of 1500 bytes; the channel counts as OK if at least 5 replies come back
:local PingResult1 [/ping $PingTarget1 count=$PingCount size=1500]
:set MainLocalServerOK ($PingResult1 >= 5)
# CHECK RESERVE LOCAL SERVER ADDRESS
:local ReserveLocalServerOK false;
:local PingResult2 [/ping $PingTarget2 count=$PingCount size=1500]
:set ReserveLocalServerOK ($PingResult2 >= 5)
# CHECK VPN LOCAL SERVER ADDRESS
# the VPN runs over the Internet, so a milder test: 5 default-size pings, at least 4 replies
:local VpnLocalServerOK false;
:local PingResult3 [/ping $PingTarget3 count=5]
:set VpnLocalServerOK ($PingResult3 >= 4)
# debug output, visible when started manually with /system script run <name>
:put "MainLocalServerOK=$MainLocalServerOK"
:put "ReserveLocalServerOK=$ReserveLocalServerOK"
:put "VpnLocalServerOK=$VpnLocalServerOK"
# DEFINE GATEWAYS Administrative Distances (read the current values)
:local MainLocalServerGWDistance [/ip route get [find comment="1.Local-NET-MAIN-IT"] distance]
:local ReserveLocalServerGWDistance [/ip route get [find comment="2.Local-NET-RESERVE-IT"] distance]
:local VpnLocalServerGWDistance [/ip route get [find comment="3.Local-NET-RESERVE-INET"] distance]
:put "MainLocalServerGWDistance=$MainLocalServerGWDistance"
:put "ReserveLocalServerGWDistance=$ReserveLocalServerGWDistance"
:put "VpnLocalServerGWDistance=$VpnLocalServerGWDistance"
# SETUP ADMINISTRATIVE DISTANCE: base value if the channel is OK, +100 if it is not
# FROM MAIN LOCAL SERVER
:if ($MainLocalServerOK) do={
    :if ($MainLocalServerGWDistance != 10) do={ /log warning "Switch LOCAL-ROUTE to MAIN LOCAL SERVER" }
    /ip route set [find comment="1.Local-NET-MAIN-IT"] distance=10
}
:if (!$MainLocalServerOK) do={ /ip route set [find comment="1.Local-NET-MAIN-IT"] distance=110 }
# FROM RESERVE LOCAL SERVER
:if (!$MainLocalServerOK && $ReserveLocalServerOK) do={ /log warning "Switch LOCAL-ROUTE to RESERVE LOCAL SERVER" }
:if ($ReserveLocalServerOK && ($ReserveLocalServerGWDistance != 20)) do={ /ip route set [find comment="2.Local-NET-RESERVE-IT"] distance=20 }
:if (!$ReserveLocalServerOK && ($ReserveLocalServerGWDistance != 120)) do={ /ip route set [find comment="2.Local-NET-RESERVE-IT"] distance=120 }
# FROM VPN LOCAL SERVER
:if (!$MainLocalServerOK && !$ReserveLocalServerOK && $VpnLocalServerOK) do={ /log warning "Switch LOCAL-ROUTE to VPN LOCAL SERVER" }
:if ($VpnLocalServerOK && ($VpnLocalServerGWDistance != 30)) do={ /ip route set [find comment="3.Local-NET-RESERVE-INET"] distance=30 }
:if (!$VpnLocalServerOK && ($VpnLocalServerGWDistance != 130)) do={ /ip route set [find comment="3.Local-NET-RESERVE-INET"] distance=130 }
# INFORMING
:local ITfail false;
# both 10.10.10.0/24 and 10.10.20.0/24 are unreachable: ISP-1 is down for this store, traffic goes via VPN
:if (!$MainLocalServerOK && !$ReserveLocalServerOK) do={ :set ITfail true; }
# at least one local channel works, so there is no ISP-1 failure
:if ($MainLocalServerOK) do={ :set ITfail false; }
:if ($ReserveLocalServerOK) do={ :set ITfail false; }
:put "1.ITfail=$ITfail"
:put "1.1.GlobalITFail=$GlobalITFail"
# send mail only when the state differs from the one saved on the previous run
:if ($ITfail != $GlobalITFail) do={
    :if ($ITfail && !$GlobalITFail) do={
        :set GlobalITFail true;
        /log error "WARNING!!!! IT-MAIN LINK IS DOWN!!!!"
        :delay 8
        /system script run EMAIL-IT-FAIL
    }
    :if (!$ITfail && $GlobalITFail) do={
        :set GlobalITFail false;
        /log warning "IT-MAIN LINK RECOVERED!!!!!"
        /system script run EMAIL-IT-RECOVER
    }
}


The principle of the script is simple. We ping each of the two local addresses of the main router 7 times with large 1500-byte packets. The result is considered satisfactory if at least 5 replies come back. This method is very sensitive to communication problems in the channel: if there are any, the channel is considered unavailable. The VPN address is checked more gently, with 5 ordinary pings of which at least 4 must come back.
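The pass/fail rule boils down to a reply threshold. Below is a tiny Python sketch of that rule with the script's numbers as defaults: 7 pings and 5 required for the local channels, 5 and 4 for the VPN. The function names are hypothetical. The second function shows how much loss a single run tolerates: with 7 pings and 5 required, at most 2 lost packets, roughly 29%, still pass.

```python
def channel_ok(replies: int, sent: int = 7, required: int = 5) -> bool:
    """True when at least `required` of `sent` pings were answered."""
    if not 0 <= replies <= sent:
        raise ValueError("replies must be between 0 and sent")
    return replies >= required


def max_loss_tolerated(sent: int, required: int) -> float:
    """Highest packet-loss ratio that still passes a single check."""
    return (sent - required) / sent


if __name__ == "__main__":
    print(channel_ok(5), channel_ok(4))                    # local channel
    print(channel_ok(4, sent=5, required=4))               # VPN channel
    print(f"{max_loss_tolerated(7, 5):.0%} loss tolerated locally")
```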

Depending on the result, the script sets the route's administrative distance, increasing it by 100 if the channel is unavailable.
If both local channels go down, the script launches another script that sends an e-mail about the failure, and another e-mail when the channels are restored.

You may notice that there are four routes, but the script checks only three. This saves time, because all three of those paths (two local, one via the Internet) depend on the main provider. If the main provider fails on all three, only the last backup external VPN via ISP-2 remains, and it always has AD = 40.

This is the script from the store. A similar script runs on the main router, a separate one for each store.

Some will wonder: how many scripts will be running constantly? How long does a script take to run? And how often should it run?

For me, the reaction time to a route becoming unavailable is critical. While testing the script, I measured its run time: when everything is normal, it takes about 7 seconds.

If one of the channels is down and the script has to wait for ping timeouts, the time increases to about 15 seconds.
That is still much faster than the 40 seconds OSPF waits by default.

How often should the script run? Not on a schedule at all! I did not create a scheduler task for it.

This made the reaction even faster. Thanks to Netwatch, I achieved an almost instant reaction, about 5 seconds in practice!

Once the routes are in place and the scripts for checking channel health and reporting outages are written, we need a trigger to launch these scripts.

Create a Netwatch entry for each of the 3 addresses (here is the one for 10.10.10.1):

[s@VERTOLET-GW] > tool netwatch export
# nov/28/2015 01:53:17 by RouterOS 6.32.3
/tool netwatch
add down-script="/ip route set [find comment=\"1.Local-NET-MAIN-IT\"] distance=110\r\
    \n/system script run CHECK-INET-ALARM\r\
    \n/system script run CHECK-LOCAL-ALARM\r\
    \n" host=10.10.10.1 interval=10s timeout=2s up-script=\
    "/system script run CHECK-INET-ALARM\r\
    \n/system script run CHECK-LOCAL-ALARM"

Let me explain. Netwatch pings host 10.10.10.1 every 10 seconds with a 2-second timeout. If the host goes down, we immediately raise the route's administrative distance by 100 as a precaution, taking the route out of use.
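A back-of-the-envelope estimate shows why Netwatch reacts faster than default OSPF timers. The sketch assumes that Netwatch declares the host down after a single unanswered probe; that assumption and the function name are illustrative. Under it, the delay from a link failure to the down-script lies between the timeout (2 s) and interval plus timeout (12 s). The roughly 5 seconds observed in practice falls inside that range, and the worst case is still well under OSPF's default 40-second dead interval.

```python
# Assumption: Netwatch marks the host down after one unanswered probe.
OSPF_DEAD_INTERVAL_S = 40  # OSPF default RouterDeadInterval on broadcast networks


def netwatch_detection_bounds(interval_s: float, timeout_s: float) -> tuple:
    """Return (best, worst) seconds from link failure to the down-script."""
    best = timeout_s                # the link breaks just before a probe is sent
    worst = interval_s + timeout_s  # the link breaks right after a successful probe
    return best, worst


if __name__ == "__main__":
    best, worst = netwatch_detection_bounds(interval_s=10, timeout_s=2)
    print(f"Netwatch: {best}-{worst} s, OSPF dead interval: {OSPF_DEAD_INTERVAL_S} s")
```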

Then we launch the script that checks the state of the local network more precisely. On a false alarm it restores the route's priority. If both local channels really are down, it sends an e-mail to the admins.

When ping replies return, we do not immediately make the route active again. Instead we run the more detailed check again, and it decides whether it is safe to return to the main channel.
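The asymmetry between the down and up handlers can be sketched as a small function. It returns the sequence of distances the main route goes through after a Netwatch event. On "down", the route is demoted at once and the detailed check may then restore it (false alarm). On "up", nothing is restored directly and only the detailed check decides. The `full_check_ok` callable is a hypothetical stand-in for CHECK-LOCAL-ALARM; the distances 10 and 110 come from the script.

```python
from typing import Callable, List


def handle_netwatch(event: str, full_check_ok: Callable[[], bool]) -> List[int]:
    """Distances the 1.Local-NET-MAIN-IT route takes after a Netwatch event (model)."""
    steps = []
    if event == "down":
        steps.append(110)  # down-script: demote immediately, before any detailed check
    elif event != "up":
        raise ValueError("event must be 'up' or 'down'")
    # in both cases the detailed check has the final word
    steps.append(10 if full_check_ok() else 110)
    return steps


if __name__ == "__main__":
    print(handle_netwatch("down", lambda: True))   # false alarm: demoted, then restored
    print(handle_netwatch("up", lambda: False))    # ping is back, but the check still fails
```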

Such Netwatch entries are set up for all three internal addresses in the ISP-1 network. The routers on both sides ping each other regularly. When problems appear, they instantly change the AD and launch the detailed check script.

Below is the script that reports the loss and restoration of the connection with the office. The notification scripts are based on an article by seventh.

Script that reports a communication channel failure in the store
EMAIL-IT-FAIL
:local sysname [/system identity get name];
:local smtpserv [:resolve "you_mail_server"];
:local Eaccount "you_username";
:local pass "you_password";
:local date [/system clock get date];
:local time [/system clock get time];
:local mailto "you@mail.yu";
/tool e-mail send to=$mailto from=you@mail. \
    user=$Eaccount password=$pass server=$smtpserv port=587 start-tls=yes \
    subject=("$sysname-ALARM!!!") \
    body=("Router $sysname has lost its local connection to the office! Traffic to the office now goes via VPN. Time: $time $date")


The recovery script for EMAIL-IT-RECOVER is identical, except for the text.

The end


That's all, although I have not covered everything I wanted to. Left out of scope:
the Internet redundancy itself in the office and the branches;
notifications about Internet outages and recoveries;
counters of how long each Internet channel has been up;
how I caught a Wi-Fi printer that was wandering through OSPF.

Thanks to everyone who read to the end. I look forward to your criticism, advice and suggestions, and I will be happy to answer questions.
If there is interest, I will write a few more articles about this project. They will describe some workarounds in the network that required non-trivial solutions.

Finally, here is the overall connection diagram for the office and one of the 12 stores.

Source: https://habr.com/ru/post/272061/

