If you work in a company with several hundred employees, you inevitably start thinking about the security of the network, the data and the employees' workstations.
Below I will describe several approaches that help solve some corporate security problems using a proxy server.
Myths about proxy servers
Myth 1: A proxy server is EVIL.
A proxy server is not evil in itself; it becomes evil in the clumsy hands of administrators who often do not understand how it works and tune it incorrectly for the company's needs.
Myth 2: Administrators need a proxy server to humiliate users.
System administrators who use a proxy server to restrict access are not trying to become gods over the users in the hierarchy of the company they serve. Restricting access to certain categories of sites adds security for both the users and the company as a whole. If you are ready to argue with that, below I give a few examples of how a proxy server protects users.
Myth 3: Traffic is so cheap that using a proxy server is simply unprofitable.
Nowadays, when everyone, even housewives and children, has a network connection, the load on some network resources becomes a headache for the administrators of those resources, as well as for the network administrators providing access to them. A proxy server can reduce the load on the resources the company cares about, cut access times to many objects by caching them, and reduce the amount of traffic arriving at the company's uplink.
The main problems and limitations with a large number of users
One of the biggest problems is authenticating a large number of users in order to grant network access. The problem is not so much technical as architectural. With the authentication mechanisms used in proxy servers (in this article we consider Squid), the possibility of transparently filtering user requests is lost; conversely, with transparent filtering, user authentication becomes impossible. That is, the two approaches are mutually exclusive.
The authentication options fall into several types; they are external helpers that check the data received from the client against some database. The main authentication methods are described in the FAQ at wiki.squid-cache.org/Features/Authentication, so there is no point in describing them again.
Each type of authentication has its drawbacks:
LDAP - in a large domain it can significantly increase the load on the domain controllers.
NCSA - inconvenient from the point of view of keeping passwords in sync between the domain and the proxy server.
SMB (MSNT) - convenient to use, but there is a real risk that an accidental drop out of the domain will cost you control over the server, since the Samba suite is used for authorization and the proxy server joins the domain as a member.
In Squid's architecture all successful authentication results are cached, which, depending on the lifetime of the cached data, can in principle let blocked accounts keep using the server for a while.
ident - can increase the load on the network and slow down the proxy server's responses when client machines are heavily loaded. Ident lookups also fail fairly often, returning NT\SYSTEM instead of the real user working at the machine.
IP address - implies static IP allocation on the local network or address reservations on the DHCP server.
Unfortunately, with transparent caching none of the authentication methods except ident and IP works, which imposes certain restrictions on access control. Since a large number of requests to the domain controllers risks overloading them and taking them down, we will consider IP-based user authentication. This may look wrong to many of you, but in most cases there is not much choice.
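For reference, a minimal sketch of what IP-based access control looks like in squid.conf; the subnet and the file of banned domains are made-up examples, not part of the setup described here:

# hypothetical office subnet and an external file with banned domains
acl office_lan src 192.168.0.0/24
acl banned_sites dstdomain "/usr/local/squid/etc/banned_domains.txt"
http_access deny office_lan banned_sites
http_access allow office_lan
http_access deny all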
Automatic proxy configuration on client workstations
Since configuring proxy settings on workstations by hand is tedious and thankless work, there are several ways to point browsers at the right address automatically.
The first way is to use domain group policies. The method is simple and convenient, except for the moment when the proxy server's name changes or it moves to another IP address. Depending on the settings, domain policies are refreshed at different intervals (by default once every 45 minutes), so new settings may be applied with a delay.
The second way is to push the proxy server address via DHCP. The method is good, except that the computer has to be rebooted or the IP address renewed on the user's machine for the change to take effect. Unfortunately, it only works for Internet Explorer.
The third way is to create an auxiliary WPAD (Web Proxy Auto-Discovery) entry in the domain and a wpad.dat file, which let you change the proxy connection settings dynamically, simply by closing and reopening the browser. For the mechanism to work you need to create a WPAD A record in the current domain (the default DNS zone) pointing to the IP address of the web server from which the auto-configuration file will be downloaded, and place the wpad.dat file in that server's document root. The file is a piece of JavaScript code that, depending on various conditions, can return different proxy addresses for different client addresses and destination domains, and also specify how resources should be accessed. A description of the wpad.dat format is easy to find on the Internet. The one condition for using the wpad file successfully is to disable its caching in the enterprise domain policy: the default cache time for this file is long enough that it is better to make sure it is re-fetched when a new session is opened.
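As a rough illustration (the proxy name, port and subnet here are assumptions, not a recommendation), a minimal wpad.dat might look like this:

function FindProxyForURL(url, host) {
    // plain host names and the local subnet go directly
    if (isPlainHostName(host) || isInNet(host, "192.168.0.0", "255.255.0.0"))
        return "DIRECT";
    // everything else goes through the proxy, falling back to a direct connection
    return "PROXY proxy.example.local:3128; DIRECT";
}

The file is usually served with the application/x-ns-proxy-autoconfig MIME type.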
The perfect ACL
Squid allows you to create access control rules based on various conditions and on the responses of external programs or authentication modules.
Changing the Squid configuration file, as well as the files it references (external lists of users, domains and so on), requires restarting the squid process, and restarting with a broken configuration can crash the server process. Unfortunately, on a heavily loaded system with hundreds of users an accidental outage means the phone starts ringing red-hot within seconds. Finding the problem probably will not take long, but angry users will have already frayed your nerves.
The ideal option is a single external ACL that controls access instead of Squid itself but does not require restarting the proxy process. Such an external ACL can be two-tier or three-tier; it all depends on which architecture suits you.
Many administrators use access control add-ons such as SAMS, SquidGuard, Rejik and others. These use a two-tier structure: Squid calls an external redirector and analyzes its responses, and alongside it there is a small set of utilities or a web interface for editing the redirector's configuration files. But when those files are updated, Squid, as the redirectors' parent process, has to be restarted or reconfigured for the redirectors to pick up the new settings.
Almost perfect is a scheme with squid plus a redirector (containing the logic) on one side, and an independent settings store plus a management server on the other. That is, the settings store is, say, a SQL database or memcached; the redirector is a query aggregator that pulls the information it needs from that store to decide on user access; the management server is the tool used to build the access database. With this design, stopping and restarting the Squid server is not required, and all settings can be changed in real time.
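To make the scheme concrete, here is a minimal sketch of such a redirector in Perl. It assumes the classic single-threaded redirector protocol (no concurrency channel IDs); the block-page URL and the allowed() stub are placeholders for the real lookup against the settings store:

#!/usr/bin/perl
# minimal squid redirector skeleton: one request per line on stdin, one answer per line on stdout
use strict;
use warnings;

$| = 1;    # unbuffered output: squid waits for an answer line for every request

while (my $line = <STDIN>) {
    chomp $line;
    # squid passes: URL client_ip/fqdn ident method
    my ($url, $client, $ident, $method) = split /\s+/, $line;
    my ($ip) = split m{/}, $client;

    if (allowed($ip, $url)) {
        print "\n";    # empty line: leave the URL untouched
    } else {
        # the 302: prefix makes squid answer with an HTTP redirect to the block page
        print "302:http://proxy.example.local/blocked.html\n";
    }
}

sub allowed {
    my ($ip, $url) = @_;
    # placeholder: the real check goes to the settings store (SQL, memcached, Cache::FastMmap)
    return 1;
}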
A practical solution
A reasonable question arises: "Why is this still not implemented?"
The answer is simple: "There is no universal tool for all occasions; everyone picks the tool that suits them."
As it happened, our search for such a tool for our company was not successful. Some tools lacked flexible configuration, others lacked fast and reliable redirectors. Mixing several tools into a single setup often leads to outright incompatibility and higher maintenance costs.
So we decided to implement our own version. It is a working prototype, aimed mainly at testing the load that a proxy server handling 600-1000 req/sec places on such a scheme.
Let me say right away that the main task of this prototype was to restrict user access by certain categories, with the access lists changing dynamically in real time.
A more specific task was to protect users from visiting phishing sites, as well as sites with malware and other nastiness.
The OpenDNS service, which categorizes sites and can restrict access to certain addresses (including a parental control function), was taken as the example categorizer.
What is the point of the OpenDNS service?
OpenDNS lets a user select a number of categories and point their DNS servers at the OpenDNS servers to get filtering. When a visited site falls into one of the selected categories, the user receives a substituted IP address that leads to a blocking page listing the site's categories.
Taking this page as a basis, you can write a small script that returns a list of site categories upon request.
#!/bin/sh
# fetch the OpenDNS block page for the domain passed as $1 and pull out its category list
wget "http://block.opendns.com/controller.php?url=$1&ablock=" -q -O - | grep '<p class="light">' | sed -E 's/(.*)in: (.*)/\2/' | sed -E 's#</.>##'
The list of site categories is finite; it can be moved to a separate file and numbered for further use.
So: we have a file with categories numbered, say, from 1 to 40, and a list of users (IP addresses) whom we want to protect from visiting malware and phishing sites. How should the redirector check a site's category?
There are several options, each with its own advantages and disadvantages. By default we cache sites and their categories: before checking a site's category we first look in the cache, and only if it is not there do we query the categorizer. Several redirector processes should be running to avoid delays and increase throughput.
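A sketch of that flow in Perl, assuming the shell script above is saved as /usr/local/bin/opendns_categories.sh (a made-up path) and using a plain in-process hash with a TTL as the cache; the real prototype uses a shared store, described below:

# cache-first category lookup: on a miss, ask the external categorizer
my %cache;         # domain => { cats => "category list", ts => epoch seconds }
my $ttl = 3600;    # keep a cached answer for an hour

sub site_categories {
    my ($domain) = @_;
    my $entry = $cache{$domain};
    return $entry->{cats} if $entry && time - $entry->{ts} < $ttl;

    # fall back to the categorizer (the wget/grep/sed script shown earlier)
    my $cats = `/usr/local/bin/opendns_categories.sh $domain`;
    chomp $cats;
    $cache{$domain} = { cats => $cats, ts => time };
    return $cats;
}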
Option 1: SQL table
Users and the categories allowed to them live in an SQL table; cached domain names and their categories live there too. The whole lookup costs one or two queries.
The main problem comes when the number of cached domains and categories balloons. Proper indexes partly solve it, but at 600 requests per second a select from a table of several thousand rows can become a long and resource-hungry operation. To keep the data fresh we need a timestamp field and have to delete records older than a certain age, either from a cron script or directly from the redirector after every N requests.
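A rough illustration of that cleanup with Perl DBI, assuming a hypothetical MySQL table domain_cache(domain, categories, updated_at) and made-up connection details:

use DBI;

# purge cached categories older than one day; run it from cron or after every N requests
my $dbh = DBI->connect('DBI:mysql:database=proxyacl;host=localhost',
                       'squid', 'secret', { RaiseError => 1 });
$dbh->do('DELETE FROM domain_cache WHERE updated_at < NOW() - INTERVAL 1 DAY');
$dbh->disconnect;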
Option 2: MemCached
In my view this option is more convenient, since it keeps the tables simple and works with the data as "key = value" pairs. Reads from memcached are much faster than from SQL, so the server load can be reduced considerably. However, it requires running an additional memcached daemon and having enough memory on the machine. The scheme allows service requests to be clustered, and one instance can serve several proxy servers.
Option 3: Perl Cache::FastMmap
A lightweight memcached, figuratively speaking, that uses a file on disk as a shared-memory object. Several clients can work with the file simultaneously, reading and storing data as "key = value" pairs. The scheme also keeps the data safe if the redirectors or the Squid daemon itself crash.
We chose the last option, as it gave us more flexibility when working on a single local machine.
The scheme is also good because you can load any external lists of site addresses, as well as blocks of IP addresses, as categories. With this design you can use any number of categorizers. Say, to block the Tor network it is enough to download the list of Tor servers, refresh it from crontab, and assign the Tor servers their own category; the same goes for malware domains. To speed up lookups by IP address you can use a radix tree, but that depends on the tasks you are solving.
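For the IP-based lists, one Perl option is the Net::Patricia module, which implements exactly this kind of radix (Patricia) tree; the file name below is an assumption:

use Net::Patricia;

# load the downloaded Tor server list (one IP or CIDR per line) into a radix tree
my $tor = Net::Patricia->new;
open my $fh, '<', '/usr/local/etc/tor_servers.txt' or die "cannot open Tor list: $!";
while (my $net = <$fh>) {
    chomp $net;
    $tor->add_string($net, 'tor') if length $net;
}
close $fh;

# a membership check is a single tree lookup: returns 'tor' or undef
my $category = $tor->match_string('198.51.100.7');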
Working with the Cache::FastMmap store, we settled on a layout that speeds up data processing as much as possible.
Categories for users are described in the format: ip_cat = perm
where ip is the user's IP address,
cat is the category number,
perm is the rule (allow / block).
The entry for address 0.0.0.0 is common to everyone and lets you block or open the necessary categories for all users, while keeping the option of personal blocking or unblocking for each specific case.
Site categories are stored in the format domainname = cat1;cat2;cat3, and so on.
Since Cache::FastMmap has a built-in mechanism for expiring stale data, the problem of keeping the cached data up to date disappeared by itself.
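A sketch of the access check with Cache::FastMmap, following the ip_cat = perm and domainname = cat1;cat2 formats described above; the share file path, expiry time and the "allow unknown sites" default are assumptions of this example:

use Cache::FastMmap;

# file-backed shared cache; stale entries expire automatically
my $cache = Cache::FastMmap->new(
    share_file  => '/var/spool/squid/aclcache.mmap',
    expire_time => '1h',
);

# decide whether a user may open a domain
sub check_access {
    my ($ip, $domain) = @_;
    my $cats = $cache->get($domain);
    return 1 unless defined $cats;    # unknown site: allow it (or send it to the categorizer first)
    for my $cat (split /;/, $cats) {
        # a personal rule wins; otherwise fall back to the 0.0.0.0 defaults
        my $perm = $cache->get("${ip}_${cat}");
        $perm = $cache->get("0.0.0.0_${cat}") unless defined $perm;
        return 0 if defined $perm && $perm eq 'block';
    }
    return 1;
}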
The caching of results turned out to be quite effective:
Sites placed in the cache: 125077 objects
Sites read from the cache: 4226793 objects
These figures do not account for repeated placements of the same object; they are just the overall write/read totals.
This is actually what I want to close this post with.
Using modern data processing mechanisms, you can build a fast and productive system that meets all the requirements for restricting user access.
What system you build is up to you; the mechanisms described here are easy to reproduce in any programming language.
If anyone is interested in the source code of the Perl redirector, I can post it, though there is little of interest there ;)
UPD: So as not to be unsubstantiated, I am posting the scripts for working with OpenDNS.
aborche.com/tst/squid/catserver.pl.txt - the category daemon that pulls categories from opendns.com
aborche.com/tst/squid/testfilter.pl.txt - a script that reads a domain name from standard input and sends it to the category daemon (used for testing and verification)
aborche.com/tst/squid/pradm.pl.txt - the redirector for squid that asks the category daemon about sites and decides whether to let the request through
aborche.com/tst/squid/categories.conf - file with categories
aborche.com/tst/squid/squid.conf - an example of a redirector call from squid.
© Aborche 2009