After reading the post about the metro, I wanted to comment, but decided to write it up separately.
We took part in building public networks with a distributed captive portal and stepped on almost every rake along the way, so I want to share the experience.
To begin with, a little theory on how this works and how distributed portals differ from centralized ones. Conceptually, what we habitually call a captive portal actually consists of three components:
- web frontend - interacts with the user, collects information through forms and shows them advertising. If we are going to ask the user for personal information and passwords, then HTTPS should be used, which means the server needs a proper certificate. If we only ask them to tick a box under the user agreement, then HTTP is enough.
- captive portal itself - an agent whose job is to receive the information gathered by the web frontend, analyze it, possibly make additional requests on its own behalf (for example, to RADIUS), and report its decision either directly to the user or through the web frontend. If the decision is positive, the captive portal opens the necessary holes in the firewall for the user. After a set period of time the holes close and the user lands back on the web frontend. The session may also be closed early because of user inactivity. Often the only reason to limit the session time at all is the desire to show the user advertising again (assuming we do not want to act like the metro and disfigure the design of other people's sites).
- firewall - controls the access of individual users to the network. If access is denied for policy reasons, it redirects the user to the web frontend. If access is unavailable for technical reasons (the gateway does not ping), you can tell the firewall to redirect the user to a special page saying "there is no service, but we are fixing it with all our might".
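For concreteness, here is a minimal sketch, in Python over iptables, of what "opening holes in the firewall" can mean on a Linux box; the interface name, the frontend port and the rule layout are assumptions for the example, not a description of any particular product.

import subprocess

LAN_IF = "eth1"        # interface facing the clients (assumption)
PORTAL_PORT = "8080"   # port of the local web frontend (assumption)

def iptables(*args):
    subprocess.check_call(["iptables"] + list(args))

def init_portal_rules():
    # Unauthorized HTTP traffic from the LAN is redirected to the web frontend.
    iptables("-t", "nat", "-A", "PREROUTING", "-i", LAN_IF, "-p", "tcp",
             "--dport", "80", "-j", "REDIRECT", "--to-ports", PORTAL_PORT)
    # Nothing is forwarded by default; authorized clients are whitelisted below.
    iptables("-P", "FORWARD", "DROP")

def open_hole(client_ip, client_mac):
    # A positive decision: punch the holes for this client.
    iptables("-t", "nat", "-I", "PREROUTING", "-i", LAN_IF, "-s", client_ip,
             "-m", "mac", "--mac-source", client_mac, "-j", "ACCEPT")
    iptables("-I", "FORWARD", "-s", client_ip,
             "-m", "mac", "--mac-source", client_mac, "-j", "ACCEPT")

def close_hole(client_ip, client_mac):
    # Session over: remove the holes, the client lands on the frontend again.
    iptables("-t", "nat", "-D", "PREROUTING", "-i", LAN_IF, "-s", client_ip,
             "-m", "mac", "--mac-source", client_mac, "-j", "ACCEPT")
    iptables("-D", "FORWARD", "-s", client_ip,
             "-m", "mac", "--mac-source", client_mac, "-j", "ACCEPT")

In a centralized setup (next paragraph) this really is just a handful of such scripts; in a distributed one, equivalent state has to be kept in sync across the access points.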
In the case of a centralized captive portal, all three components obviously sit on the same machine (device), which greatly simplifies the task. The firewall in this case often also does NAT, and the captive portal can be implemented as a bunch of scripts that tweak the local iptables. There is a great temptation to fill the network with cheap access points that dump all users onto Ethernet, or at best into a separate VLAN. The problems here are:
- Security problems. We restrict access to the uplink, but on the local network everything is wide open. Since the network is open, any user can answer ARP on behalf of our default gateway, collect other users' traffic and engage in phishing. Nothing forbids setting up a rogue DHCP server and, within a certain delta-neighborhood, pestering users with statements like "your browser is hopelessly outdated". If the captive portal and the user are separated by a router, the captive portal has no way to verify the MAC-to-IP correspondence, with all the ensuing consequences. Communication between wireless clients also becomes possible. You can forbid wireless clients from talking to each other on a cheap access point, but the clients of other points are still reachable over Ethernet.
- Traffic problems. There is a lot of unnecessary traffic on the local network. Until the captive portal is opened, it would be better not to let clients past the access point at all.
- Scalability problems. With a large number of clients, any of the three components of the portal can become a bottleneck.
As you may have guessed, a distributed captive portal is designed to solve all these problems. By "distributed" we mean that the components can be placed on different devices. This lets us build a reliable system that provides the necessary level of security and service while leaving plenty of room to scale. The problem we have to solve is the interaction between the components of the captive portal. So where should the components live?
The firewall should be as close to the client as possible, i.e. unambiguously at the access point. Since there are several access points and each has its own firewall, their work must be synchronized within a certain space or area in which client roaming is expected; otherwise clients will run into connectivity problems when roaming. In modern networks, the task of synchronizing something within a certain area (an RF domain) is handled by appointed arbiters (RF-domain managers) and was solved long ago, with no regard to distributed captive portals. For such a system, firewall synchronization is just one more process that must be performed consistently across the domain, alongside (for example) traffic switching, synchronization of access point configurations, or statistics collection.
The location of the web frontend depends heavily on the complexity of the tasks it has to solve. If you only need to show pages that require no server-side processing and nothing complicated like sending SMS, then a web server on the access point is quite enough. It, again, sits as close to the client as possible and provides the most efficient interaction. Synchronizing the web server content across access points will be handled by (surprise) the RF-domain manager.
The location of the captive portal depends on the position of the web frontend and on the capabilities of the access points. Since the captive portal's job is to tweak the firewall, it must have its own representative (agent) at every point. However, the web frontend can talk to any copy of these agents, because their state (you guessed it) is also synchronized within the domain.
Thus we arrive at a situation where, for a client that has successfully passed authorization, the captive portal opens immediately across the entire domain, and from then on the firewall for that client is configured identically on all access points of the domain at any moment.
Subtleties
The method of interaction with the captive portal. We need a mechanism by which we can tell the portal the results of the user interaction. In our case, HTTP GET was chosen as that mechanism. If we need to open the portal, we send an HTTP GET to any of its agents. The set of parameters passed in the GET depends on the mode in which the portal operates. Here are a few options:
- The portal always opens. Optionally, the fact can be recorded in the log.
- The portal opens when the GET contains a variable reflecting agreement with the terms (agreement).
- The username and password are passed in the GET; the portal itself goes to RADIUS with these attributes and opens after receiving an ACCEPT.
- A single (universal) attribute is passed in the GET; the portal presents it as both the username and the password when querying RADIUS and opens after receiving an ACCEPT. Obviously, such a user must exist in RADIUS.
Anything outside this logic has to be implemented in the web frontend. For example, you can ask the user for a phone number, send them a text message and check the code. Based on the result, register the user in RADIUS (for example) with username = phone_number and password = their_IP, and then send a GET to the portal with these values.
How does the portal, upon receiving a GET, know which user it concerns? When a user is redirected to the web frontend, the portal attaches a rather long variable to the request, which we must return to it among the parameters of the request that opens the portal.
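A rough sketch of how that exchange might look from the web frontend's side, assuming the username/password mode described above; the agent URL, the parameter names and the name of the token variable are invented for the example, not a documented API.

from urllib.parse import urlparse, parse_qs, urlencode
from urllib.request import urlopen

def extract_portal_token(redirect_url):
    # The portal appended a long opaque variable when it redirected the client
    # to us; it must be returned untouched. "hs_token" is a made-up name.
    query = parse_qs(urlparse(redirect_url).query)
    return query.get("hs_token", [""])[0]

def open_portal(agent_host, token, username, password):
    # Any agent in the domain will do; their state is synchronized anyway.
    params = urlencode({
        "hs_token": token,      # ties the request to the client's session
        "username": username,   # e.g. the phone number collected by the frontend
        "password": password,   # e.g. the client's IP address
    })
    with urlopen(f"http://{agent_host}/portal/open?{params}", timeout=5) as resp:
        return resp.status == 200   # the agent queries RADIUS and answers

# Example: the client originally arrived at
#   http://frontend.local/welcome?hs_token=...long-opaque-value...
# and after verification we report to the nearest agent:
#   open_portal("1.1.1.1", token, "+79990001122", "10.20.30.40")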
Ideally, the access point does bridging (layer 2 forwarding) between the SSID and some VLAN on the wire. That is, the firewall operates at the second (MAC) layer. Since the firewall sees the DHCP offer arriving for the client from the depths of your network, it knows the client's IP for sure, answers ARP instead of the client, and rigidly filters all ARP and DHCP on the wireless segment.
The absence of an IP address in the user VLAN removes the possibility of the user talking directly to the access point. Sometimes, however, we need that possibility, namely when the web frontend and the portal sit right on the point. In this case the dummy address 1.1.1.1 is used.
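For illustration, a sketch of the kind of layer-2 hygiene described above, expressed as ebtables rules driven from Python; the interface name and the gateway address are placeholders, and a real access point implements this in its own firmware rather than with a script like this.

import subprocess

WLAN_IF = "wlan0"         # wireless side of the bridge (assumption)
GATEWAY_IP = "10.0.0.1"   # the legitimate default gateway (assumption)

def ebtables(*args):
    subprocess.check_call(["ebtables"] + list(args))

def harden_wireless_segment():
    # Nobody on the wireless side may answer ARP on behalf of the gateway.
    ebtables("-A", "FORWARD", "-i", WLAN_IF, "-p", "ARP",
             "--arp-ip-src", GATEWAY_IP, "-j", "DROP")
    # Nobody on the wireless side may act as a DHCP server (UDP source port 67).
    ebtables("-A", "FORWARD", "-i", WLAN_IF, "-p", "IPv4",
             "--ip-proto", "udp", "--ip-source-port", "67", "-j", "DROP")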
What does Apple have to do with it?
And why we, just like in the metro, convince iPhones everywhere that there is no portal.
Judging by the way iPhones behave in wireless networks, I am firmly convinced that the creators of this mega-product had only one scenario in mind: a single access point. That is, either at home or in a hipster cafe. In the second case there is a non-illusory chance of meeting a captive portal.
What does the iPhone do when it encounters several points with a single SSID and a captive portal? It tries every available one. On each, it connects, requests an address, checks a random URL from its long list (it used to be just one), realizes that the network is captive, gives the address back (DHCP release) and disconnects. Since in our case one SSID is broadcast from every point in both 2.4 and 5 GHz, everything is multiplied by two. Having come to the logical conclusion "yes, there is an ambush everywhere!", the iPhone reconnects to one of the points and draws its own mini-browser. In the terminology of our customers and clients this process is called "my latest iPhone takes forever to connect to your network" and "at home everything flies on a 1000-ruble access point". In a coordinated network (as opposed to standalone points), on every connection the point sends the domain manager a message saying "we have a new passenger", and in the case of MESH, in parallel in both directions. The whole process takes up to 20 seconds.
What does the iPhone do when it encounters the same SSID in both 2.4 and 5 GHz at once? You thought you would be able to balance clients between channels, points and bands, making the most of the clients' and the network's capabilities? Not with Apple products! On the network side, hearing requests from the client in both bands, we may reasonably assume that we can steer the client to connect where we want by ignoring its requests at the points where we do not want it. Typically, clients take the hint and connect, for example, in 5 GHz. The iPhone will keep hammering at 2.4 until the very end. For the stubborn there is a separate counter (20 consecutive requests by default). That also takes time.
The two processes described above occur not only when connecting to the network, but also when roaming, if you walk far enough. "Oh, there are new points here. Well, let's check them..."
What does the iPhone do if it has launched the mini-browser and we (suddenly) need to send the client an SMS? It shows the SMS in a small window at the top, displayed for about 3 seconds. A blonde is not able to memorize 6 (six!) digits in that time. The window goes away, the user taps the SMS, the mini-browser closes, DHCP release, disconnect, welcome to 3G. The user somehow manages to memorize the code, digs into the settings, connects to the network, enters the phone number, gets a new SMS. And so on, and so on... In the terminology of customers and users this is called "your main portal does not work on my latest iPhone" and "they have already fixed it even in the metro".
The situation can be corrected by passing the user's MAC (which we are able to do) to the web frontend, remembering there that we have already sent them a text message, and asking only for the code on the second visit. Because this mini-browser does not support cookies.
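A minimal sketch of that cookie-less workaround: keep per-MAC state on the web frontend so that the second visit only asks for the code. The data structure, the timeout and the SMS helper below are invented for the example.

import random
import time

PENDING = {}        # client MAC -> (phone, code, sent_at)
CODE_TTL = 600      # let a sent code live for 10 minutes (assumption)

def send_sms(phone):
    # Stub: a real frontend would call an SMS gateway here.
    code = f"{random.randint(0, 999999):06d}"
    print(f"would send code {code} to {phone}")
    return code

def handle_request(client_mac, phone=None, code=None):
    entry = PENDING.get(client_mac)
    if entry and time.time() - entry[2] < CODE_TTL:
        # We have already texted this MAC: skip the phone form, ask for the code.
        saved_phone, saved_code, _ = entry
        if code == saved_code:
            del PENDING[client_mac]
            return "verified", saved_phone   # now register in RADIUS and open the portal
        return "ask_code", None
    if phone:
        PENDING[client_mac] = (phone, send_sms(phone), time.time())
        return "ask_code", None
    return "ask_phone", None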
What is the reason for such inadequate behavior? It is simple: the creators of the device set out never to leave you without connectivity.
Suppose you came to visit someone. They have a closed network, but the kind hosts told you the password and voilà, there is the Internet. Your smartphone remembered the network and connected to it automatically on your next visit. But the hosts forgot to pay the provider, and this time the router let nothing through. That is, you did nothing, did not even take out the phone, yet without knowing it you were left cut off from the outside world. This is very bad. To avoid it, modern mobile devices perform a kind of multi-step procedure on every connection, the purpose of which is not to leave you without connectivity:
- We cannot get an IP - disconnect
- We see no ARP from the default gateway - disconnect
- None of the DNS servers on the list respond - disconnect
- We request a URL from one of our own domains and hope to see
<HTML><HEAD><TITLE>Success</TITLE></HEAD><BODY>Success</BODY></HTML>
If the last step succeeds, we assume the Internet is there and get off 3G. And we do this on every connection to Wi-Fi. Even at home.
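Roughly, that last step can be emulated like this (captive.apple.com is one of the well-known check URLs; the exact list and behavior belong to Apple and change between iOS versions):

from urllib.request import urlopen

def looks_like_open_internet(url="http://captive.apple.com/hotspot-detect.html"):
    try:
        body = urlopen(url, timeout=5).read().decode("utf-8", "replace")
    except OSError:
        return False              # no answer at all: stay on cellular
    return "Success" in body      # anything else smells like a captive portal

print("open internet" if looks_like_open_internet() else "captive portal or no network")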
If we see something other than "Success", then this is a captive portal. Time to launch the mini-browser. If the user could not come to terms with the portal in that window right away, we disconnect. The problem with the iPhone is that it hopes for the best until the very end. If you ask it to connect to a network, and that network is visible on more than one point, all options will be tried. Time will pass. Most other devices, having seen the portal once, assume it is probably the same everywhere.
The only way to stop this thrashing is to bypass portal detection. It can be done in two ways: by filtering on "User-Agent: CaptiveNetworkSupport" or by letting traffic through to a certain list of domains. In the metro, for example, iMessage works with the portal closed.
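A toy sketch of the first method: on the redirect path, answer the detection probe with the Success page instead of sending it to the portal. The handler below is only an illustration of the idea, not the vendor's implementation; the redirect target is a placeholder.

from http.server import BaseHTTPRequestHandler, HTTPServer

SUCCESS = b"<HTML><HEAD><TITLE>Success</TITLE></HEAD><BODY>Success</BODY></HTML>"

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("User-Agent", "").startswith("CaptiveNetworkSupport"):
            # Lie to the detector: pretend the Internet is reachable.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(SUCCESS)
        else:
            # Everyone else is still sent to the web frontend.
            self.send_response(302)
            self.send_header("Location", "http://1.1.1.1/welcome")
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), RedirectHandler).serve_forever()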
As a result, with the portal bypassed, the network is reachable either not at all or only partially. Either way this is very bad, because it effectively leaves the user without connectivity in a way they do not notice.
On our hardware, detection is turned off with one command:
ap7131-ABCDEF(config-captive-portal-XXXXX)#bypass ?
  captive-portal-detection  Captive portal detection requests (e.g., Apple Captive Network Assistant)
ap7131-ABCDEF(config-captive-portal-XXXXX)#bypass captive-portal-detection