How we designed and implemented a new network on Huawei in the Moscow office, part 2

Previously: Jet moved to a new network built on equipment from a well-known vendor. The first part covers how we audited the systems, collected wish lists and tamed the “mutant reserve”.

This time I will describe how we migrated users (more than 1,600 people) from the old network to the new one. Anyone interested, read on.

So, the existing network of the company as of last summer:

the simplest "collapsed backbone" topology: the network core (which is also the distribution layer) consists of two Layer 3 switches combined into a VSS cluster;
the access layer consists of Layer 2 switches and switch stacks installed in wiring closets, sometimes directly in corridors and even in work areas;
some access switches are daisy-chained, that is, a switch is connected to another switch, and only that one is connected to the core;
fault tolerance of the access switch uplinks to the core is provided mainly by LACP, and in some cases not at all;
access is segmented by VLANs, with inter-VLAN routing on the core;
the access switches come from three manufacturers and four generations, and users connect at speeds from 100 Mbps to 1 Gbps;
some users still have analog phones, some have IP phones powered through PoE injectors, and a minority have IP phones that receive PoE power from the access switches.

Task:

bring the network into decent shape;
avoid disrupting employees' work during the modernization;
fix as many as possible of the problems that accumulated over the previous 15 years of operation.

Old life

Previously, the office network was operated and expanded as follows. Suppose we needed to set up workplaces for new employees. Additional space was rented in the business center, renovated and fitted with structured cabling (SCS), and then a computer and telephone network was deployed.

Depending on the needs of the department, each workplace had from 2 to 4 RJ-45 outlets. One outlet was designated for the telephone (and marked green); the remaining one to three were designated for computers (and marked blue).

Outlets at a typical workplace

When the network was built, each computer outlet was connected to a separate access switch port, and each telephone outlet was connected either to an analog port of the telephone exchange (in old premises) or to an access switch port pre-configured for VoIP (in new premises).

This scheme was convenient for operations. Users moved into new premises and plugged a computer into any free computer outlet and a telephone into any free telephone outlet.

After that, technical support staff, going by the MAC addresses of the connected equipment, configured the necessary VLANs and other parameters on the access switch ports.

But the approach had a flaw: it was wasteful, with up to half of the ports left unused. Such a reserve is useful if additional workplaces are set up in a room over time, but we never once managed to use the full 50% reserve.

Preparing for migration

At the first stage, we decided to replace all access switches. We chose a model from the Huawei S5720 family, the S5720-52X-PWR-SI-AC, as the standard. Its characteristics:

connects to the backbone at 10 Gbit/s;
stacks over ordinary 10G interfaces;
supports 1 Gbit/s on all user ports;
provides PoE on all user ports.

Thus, any port can take a computer, an IP phone, a computer connected through an IP phone, a wireless access point, a surveillance camera or other devices.
We had to find out which outlets in the rooms were actually in use and what was connected to them. We had the following to work with:

design documents from old and not-so-old structured cabling projects;
"not quite up-to-date" cable journals for all company premises;
MAC address tables from the existing access switches, collected with the built-in tools of the network equipment;
tables mapping the MAC addresses of telephone sets to internal extension numbers, exported from the telephone exchange.

Next, we wrote a small script that used this data to build patching and migration tables (a separate table for each upcoming stage of work). They contained the following information:

room;
port marking at the workplace;
marking of the outlet on the patch panel in the wiring closet;
the name (hostname) of the old switch (stack);
port number on the old switch;
VLAN ID;
MAC address of the connected device (or several);
the type of the connected device (determined by the MAC address);
the name (hostname) of the new switch (stack);
port number on the new switch.
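A table with these fields could be assembled roughly as follows. This is a minimal sketch, not the script we actually used: the input formats, the field names and the sample values are all hypothetical, and the new switch and port are left empty because they are assigned later, when the new layout is designed.

```python
from dataclasses import dataclass


@dataclass
class MigrationRow:
    room: str
    outlet: str            # port marking at the workplace
    patch_panel: str       # outlet marking on the patch panel
    old_switch: str
    old_port: str
    vlan_ids: list         # VLANs in which MACs were learned on the old port
    macs: list             # MAC addresses learned on the old port
    new_switch: str = ""   # assigned later, when the new layout is designed
    new_port: str = ""


def build_rows(cable_journal, mac_table):
    """Join cable-journal lines with learned MACs on the (switch, port) key."""
    learned = {}
    for entry in mac_table:
        slot = learned.setdefault((entry["switch"], entry["port"]),
                                  {"vlans": set(), "macs": set()})
        slot["vlans"].add(entry["vlan"])
        slot["macs"].add(entry["mac"].lower())

    rows = []
    for line in cable_journal:
        slot = learned.get((line["switch"], line["port"]))
        if slot is None:
            continue  # no MAC learned on this port: the outlet looks unused
        rows.append(MigrationRow(
            room=line["room"],
            outlet=line["outlet"],
            patch_panel=line["patch_panel"],
            old_switch=line["switch"],
            old_port=line["port"],
            vlan_ids=sorted(slot["vlans"]),
            macs=sorted(slot["macs"]),
        ))
    return rows


# Hypothetical sample data, for illustration only.
journal = [
    {"room": "301", "outlet": "301-12", "patch_panel": "PP3-12",
     "switch": "old-sw-3", "port": "Gi1/0/12"},
    {"room": "301", "outlet": "301-13", "patch_panel": "PP3-13",
     "switch": "old-sw-3", "port": "Gi1/0/13"},
]
macs = [
    {"switch": "old-sw-3", "port": "Gi1/0/12", "vlan": 110, "mac": "AA:BB:CC:00:00:01"},
    {"switch": "old-sw-3", "port": "Gi1/0/12", "vlan": 210, "mac": "AA:BB:CC:00:00:02"},
]
rows = build_rows(journal, macs)
```

Outlets with no learned MAC address are skipped here, which is one simple way to separate sockets that are actually in use from the unused reserve.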

Script results

From such a table it was immediately obvious what exactly was connected to a specific port: a computer, a phone, a computer through a phone (if there were two MAC addresses), or something more complicated.
Based on the tables, the design engineers developed new cable journals, and the implementation engineers pre-configured the new switches, which were installed in the wiring closets at the next migration stage.
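The way a port's contents can be read off its MAC addresses might be sketched like this. It assumes the set of phone MAC addresses comes from the telephone exchange table mentioned earlier; the function name and the labels are illustrative, and calling every non-phone device a "computer" is a simplification (it could equally be a printer).

```python
def classify_port(macs, phone_macs):
    """Guess what is plugged into a port from the MACs learned on it."""
    known_phones = {m.lower() for m in phone_macs}
    phones = [m for m in macs if m.lower() in known_phones]
    others = [m for m in macs if m.lower() not in known_phones]

    if not macs:
        return "unused"
    if len(phones) == 1 and not others:
        return "phone"
    if not phones and len(others) == 1:
        return "computer"
    if len(phones) == 1 and len(others) == 1:
        return "computer through a phone"
    return "something more complicated"


# Hypothetical phone MAC, as it might appear in the exchange's table.
PHONES = {"00:11:22:00:00:01"}
example = classify_port(["00:11:22:00:00:01", "AA:BB:CC:00:00:02"], PHONES)
```

Ports that fall into the last category (for example, a small unmanaged switch under a desk) are the ones worth checking by hand before migration.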

Fragment of a new cable journal

Access Port Settings

Thus, the work was as follows:

disconnect the old access switches and remove the patch cords;
install the new access switches and connect power;
patch the ports according to the cable journal;
walk through the work areas and make sure that phones and computers work normally.

It looks simple, but ...

The first stage: replacing the access switches, three months of work without days off

Starting in mid-2018, we spent three months replacing access switches every weekend and re-patching workplaces according to our migration tables (about 3,500 ports in total).
The first migration took us more than 12 hours; in that time we managed to replace one access stack of five switches and re-patch approximately 200 ports connected to it.
Most of the time went into ... preparing patch cords. Each patch cord had to be unpacked and labeled with a numbered tag at both ends. Only then could it be used for patching.

In later migrations we optimized the process and prepared the patch cords in advance. As a result, the last migration took the same 12 hours, but in that time we replaced five access stacks of 3 to 5 switches each and re-patched more than 1,000 ports.

What did we get in the end?

First, an up-to-date cable journal, which we handed over to the operations team. The team has developed its own web application for keeping the patching records current, so the journal can now always be consulted on an internal resource.

Second, workplaces connected to new, modern access switches. Along the way, we finally got rid of analog phones and separate power supplies for IP phones, and updated and unified the firmware on the IP phones so that the phone and the switch correctly identify each other via LLDP. This is how the phone learns which VLAN to use for voice frames and which VLAN to use for frames from equipment connected through the phone. As a result, the phone can reach the telephone exchange servers, and users can connect computers at their workplaces either directly to the outlet or through the phone.

Third, we disabled all unused ports on the active equipment and configured port security. In parallel, we drafted, agreed and approved connection regulations, so there are no more off-the-books connections.

Thus, we:

brought order to all office equipment connections and documented them;
got rid of old switches, PoE injectors and analog phones;
reduced the energy consumption of the equipment (by up to 30% for individual wiring closets and work areas);
improved the user experience while keeping a reasonable reserve of active equipment ports (at the cost of somewhat more work for technical support specialists, especially when moving employees to new premises).

Migration in full swing: switching to the new backbone

After the first stage was complete, user connections were in order, but nothing had changed in the backbone. As mentioned above, before the upgrade it was arranged quite simply: about 100 VLANs were routed on the central switch, or rather on two switches clustered using VSS.

In parallel with the first stage of migration, we built and tested a new backbone in accordance with the principles outlined in the first article:

installed Huawei CE8850 core switches;
installed Huawei CE6870 distribution switches;
laid additional fiber;
completed all connections;
configured the overlay and underlay routing protocols (but until the first stage was completed, the switches sat idle and just warmed the air).

Then the next stage of migration began. Not very long, but the most difficult.

To begin with, we developed and agreed on a new IP addressing plan that takes our current and future needs into account. The new plan allocates separate ranges to all planned L3VPNs: for ordinary users and technical center users, for telephony, video conferencing, video surveillance cameras, demo stands of various types and other needs.
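Carving one address block into non-overlapping per-service ranges can be illustrated with Python's standard ipaddress module. This is only a sketch of the idea: the 10.0.0.0/16 supernet, the /20 range size and the service names are assumptions made for the example, not our actual plan.

```python
import ipaddress


def allocate(supernet, services, new_prefix):
    """Give each service the next free subnet of the requested size."""
    free = ipaddress.ip_network(supernet).subnets(new_prefix=new_prefix)
    plan = {}
    for name in services:
        try:
            plan[name] = next(free)
        except StopIteration:
            raise ValueError(
                f"{supernet} has no /{new_prefix} left for {name!r}"
            ) from None
    return plan


# Hypothetical service list and address space, for illustration only.
SERVICES = ["users", "technical-center", "telephony",
            "video-conferencing", "video-surveillance", "demo-stands"]
plan = allocate("10.0.0.0/16", SERVICES, new_prefix=20)
```

Because every range is taken from the same generator, the ranges cannot overlap, and whatever the generator has not yet handed out remains as a reserve for future needs.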

Then we connected the entire old network to one pair of distribution switches as if it were a single access stack. After that, we began moving stacks to the new backbone, changing VLAN numbers and, accordingly, moving the IP addresses of connected users to the new ranges.

We planned the work so that a sufficiently large group of access switches was switched over at a time. We worked either at night or on weekends, depending on the working patterns of the departments being switched.

Before starting work, we performed the following operations in advance:

configured the corporate DHCP server to issue IP addresses from the new ranges to the users being switched;
configured the ports on the distribution switches to which access switches would be connected during migration;
used another purpose-built script to prepare modified configurations for the switches being migrated.
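A configuration rewrite of this kind might look like the sketch below. It is not our actual script: it assumes access ports are described by lines of the form "port default vlan N" (as in Huawei VRP configurations), it deliberately ignores trunk ports and VLAN definitions, and the old-to-new VLAN mapping is hypothetical.

```python
import re

# Matches an access-port line such as " port default vlan 10".
ACCESS_VLAN = re.compile(r"^(\s*port default vlan )(\d+)\s*$")


def renumber_vlans(config_text, vlan_map):
    """Rewrite access-port VLAN IDs according to vlan_map (old ID -> new ID)."""
    result = []
    for line in config_text.splitlines():
        match = ACCESS_VLAN.match(line)
        if match and int(match.group(2)) in vlan_map:
            line = f"{match.group(1)}{vlan_map[int(match.group(2))]}"
        result.append(line)
    return "\n".join(result)


# Hypothetical fragment of an access switch configuration.
old_config = """interface GigabitEthernet0/0/1
 port link-type access
 port default vlan 10
interface GigabitEthernet0/0/2
 port link-type access
 port default vlan 20
interface GigabitEthernet0/0/3
 port default vlan 99"""

new_config = renumber_vlans(old_config, {10: 20, 20: 30})
```

Each line is rewritten at most once, so a mapping in which a new number coincides with another old number (10 to 20 and 20 to 30 in the example) does not get applied twice to the same port.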

We worked in the following order:

moved the access switches from the old backbone to the new one;
made sure that the switches were reachable via the management interface;
uploaded the previously prepared configuration with the new VLAN numbers to the migrated switches.

Before this work, the VLAN containing the management interfaces of all access switches was “stretched” between the old and the new backbone, so the second step usually took minimal time. In the simplest case, user workstations (as well as telephones, printers and other devices) immediately received IP addresses from the new range from the DHCP server and carried on working as before.
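The reachability check in the second step can be automated along these lines. This is a sketch under assumptions: it treats a switch as reachable if a TCP connection to its management address succeeds on port 22 (SSH), and the host names are hypothetical. The probe is passed in as a parameter so that the logic can be exercised without a real network.

```python
import socket


def reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_switches(hosts, probe=reachable):
    """Return the switches that did not answer on the management interface."""
    return [host for host in hosts if not probe(host)]


# Example with a stand-in probe instead of real network access.
def fake_probe(host):
    return host != "acc-sw-07"


missing = unreachable_switches(["acc-sw-05", "acc-sw-06", "acc-sw-07"], probe=fake_probe)
```

An empty result means the whole group survived the move to the new backbone and the prepared configurations can be uploaded.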

Where would we be without problems?

Along the way we ran into several difficulties.
First, printers stopped working for many users. On some users' workstations, access to printers had been configured by a specific IP address. When the printers were moved to a new range, they received new addresses and users lost access to them. To solve the problem, the day after each migration we assigned one support specialist for half a working day to go around the migrated users and reconfigure the printers to use DNS names instead of IP addresses.
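Workstations that would be affected could, in principle, be found in advance by checking whether a printer connection points at a bare IP address. The sketch below assumes such a list of (user, printer target) pairs is available from somewhere, which was not the case for us at the time; the names and addresses are invented for the example.

```python
import ipaddress


def is_ip_literal(target):
    """Return True if target is an IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(target)
        return True
    except ValueError:
        return False


def printers_to_fix(connections):
    """Return (user, target) pairs whose printer is addressed by IP."""
    return [(user, target) for user, target in connections if is_ip_literal(target)]


# Hypothetical inventory of printer connections.
connections = [
    ("ivanov", "192.0.2.15"),
    ("petrov", "printer-3f.example.com"),
    ("sidorov", "192.0.2.16"),
]
to_fix = printers_to_fix(connections)
```

Connections that already use a DNS name keep working after renumbering, as long as the DNS record is updated together with the printer's address.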

During the migration of the very first group of users, we had to sort out the firewall configuration so that the moved users could reach the necessary corporate resources. For all subsequent migrations we knew in advance what needed to be configured.

Last of all, we moved the service center, namely the duty shift staff. The duty shift must work continuously, around the clock, and always have access to the resources it needs and to customers' information systems. For these people we set up temporary workplaces in separate rooms at a pre-agreed time. First, one member of the duty shift moved to this room and made sure that all the necessary systems were accessible. After that, the rest moved there.

Then we migrated the department, asked one member of the duty shift to return to their usual place and check that all necessary resources were available, and promptly fixed any problems. There were few. After that, the duty shift moved back from the "temporary shelter" to normal operation at their own workplaces, with the full set of information systems they needed.

At some point we found that not a single user workstation was left on the old network! ~~And opened the champagne.~~ Next, we began migrating the server segment. It required a different scheme, since servers usually cannot be stopped even for 15 minutes. I will cover this separately in the next article.

Maxim Klochkov
Senior Consultant, Network Audit and Complex Projects Group
Network Solution Center
Jet Infosystems

Source: https://habr.com/ru/post/459118/
