
Organization of content filtering in educational institutions

By now it is hard to find a system administrator working in education who does not know what Federal Law 436-FZ "On protecting children from information harmful to their health and development" is, with all the consequences that follow from it. The problem became most acute for me after I received an order from the director to prepare for a visit from the prosecutor's office.

Of the solutions known to me at the time, none seemed attractive. With a low-powered server and 50 workstations to protect, I wanted a Unix-like solution, and obviously there was no getting around Squid. So the search began for something that would meet the established requirements. As a result, an interesting option turned up from the not-unknown company Entensys, maker of the UserGate software; its content-filtering product is called UserGate WebFilter. Based on the experience of long-gone years, when Internet traffic cost more than gold and a proxy was a must, I had disliked UserGate for its glitchiness and appetite for resources (in the context of those same years); nevertheless, even though the product is proprietary, I decided to give it a try.

PART 0. Product features

I will list the features that were most important to me:


All other information, including prices and a full list of features, is available on the Entensys website.

PART 1. Preparation for installation


Regarding the system requirements: as stated by the manufacturer, the software will run on the following operating systems:

Minimum hardware requirements for up to 100 users: Intel Atom D2500 1.86 GHz, 2 GB RAM, 500 GB HDD.
And, of course, you need a Squid proxy server compiled with ICAP client support, as well as client machines pre-configured to use the proxy. Squid version requirements are not specified, but intuition suggests that anything from 3.0 up should do.
What I actually had: an Intel Core2Duo with 4 GB RAM, Debian 7 on board, and Squid 3 in transparent mode.
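
For reference, transparent mode itself boils down to roughly the following. This is only a sketch under my assumptions: Squid 3.1+ listening on port 3128, with eth0 as the LAN interface (both are illustrative):
 # /etc/squid/squid.conf - accept intercepted traffic
 http_port 3128 intercept
 # redirect LAN HTTP traffic into Squid
 iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-ports 3128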

PART 2. Installation


Entensys maintains its own repositories, so installation is trivial to the point of indecency:
sudo apt-get install webfilter3 
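
Before that, the repository has to be connected; roughly like this, where the repository URL is a placeholder to be taken from the Entensys documentation:
 # /etc/apt/sources.list.d/entensys.list (the URL below is a placeholder, not the real one)
 deb http://<entensys-repo-url>/debian wheezy main
followed by sudo apt-get update and the install command above.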

The initial setup goes as follows:
  1. Go to the web interface at http://<server-ip>:4040
  2. Select the node type "Main"
  3. Specify the passwords and click "Install"
  4. Configure Squid according to the instructions in the documentation (a sketch is given below)
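
The ICAP hookup in squid.conf boils down to something like the following. This is only a sketch under my assumptions: the filter's ICAP service listens on 127.0.0.1:1344, and the service names and paths are illustrative; the documentation gives the exact directives.
 # /etc/squid/squid.conf - ICAP client hookup (service names/paths are assumptions)
 icap_enable on
 icap_service wf_req reqmod_precache bypass=off icap://127.0.0.1:1344/reqmod
 adaptation_access wf_req allow all
 icap_service wf_resp respmod_precache bypass=off icap://127.0.0.1:1344/respmod
 adaptation_access wf_resp allow all
After editing the config, sudo squid3 -k reconfigure applies it without restarting the proxy.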

WebFilter generates all the necessary configuration files and starts its daemons.
Immediately after installation I was alarmed by the number of listening ports, but in my situation, with a DROP policy on the INPUT chain of the filter table, this poses no particular threat. Among this heap of listening ports, 1344 (the ICAP server), 4040 (the web interface) and 10053 (the backend for DNS queries) stand out.
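
To see for yourself what is listening, something along these lines will do (ss ships with iproute2; netstat -tulnp works just as well):
 sudo ss -tulnp | grep -E '1344|4040|10053'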

In the course of fitting the new software into the local infrastructure, I ran into the following feature: besides the main webfilter3 daemon there is also an init script, webfilter3_rules, which on start adds iptables rules that redirect all incoming DNS traffic to port 10053 for filtering, and also redirect HTTP traffic. For me (a paranoid admin) with a hand-built firewall, having something else meddle in my iptables tables was simply unacceptable, therefore:
 sudo /etc/init.d/webfilter3_rules stop
 sudo insserv -r webfilter3_rules

Now the question is how to filter incoming DNS requests. It seems logical to redirect them via iptables from port 53 to 10053. For those who have no DNS records of their own and simply forward all DNS traffic to another server, this solution is fine (or just leave webfilter3_rules enabled). I, however, had static entries in /etc/hosts and in the dnsmasq config, and dnsmasq was also started with special options. Therefore, I decided to do the following:
 # /etc/dnsmasq.conf
 no-resolv
 # <server-ip> is the address of the machine the filter is installed on
 server=<server-ip>#10053

With this configuration, dnsmasq will forward any queries it cannot answer itself to the DNS filter.
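
For reference, the plain iptables redirect mentioned above, essentially what webfilter3_rules arranges for DNS, would look roughly like this, and the filter port can be checked directly with dig:
 # redirect incoming DNS queries to the filter (a sketch of the webfilter3_rules approach)
 iptables -t nat -A PREROUTING -p udp --dport 53 -j REDIRECT --to-ports 10053
 iptables -t nat -A PREROUTING -p tcp --dport 53 -j REDIRECT --to-ports 10053
 # query the filter port directly to make sure it answers
 dig @127.0.0.1 -p 10053 example.com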

PART 3. Setup


The filter is configured through the web interface. All the settings are described in detail in the documentation. A brief algorithm for a minimal setup looks like this:
  1. Add a user group and select the lists it will use. In my case, all the built-in lists were used.
  2. Add users. The following authorization mechanisms are available:
    • IP address
    • IP range
    • login/password via a RADIUS server

  3. Create filtering rules:
    1. For the strictest filtering, the rule logic should be "OR"
    2. Select the categories of sites to be blocked (pornography and violence, fraudulent sites, etc.)
    3. Select the morphology categories to be taken into account when analyzing page content
    4. Adjust the rule's individual schedule. (If the rule logic is "OR", it makes sense to leave all the days empty; otherwise the rule will fire on any request made on a marked day)
  4. Activate the created filtering rule in the settings of the user or group
  5. Change the address of the page the user will be redirected to when a page is blocked
  6. Check that the rules work:
    1. Go to the "Check URL" menu
    2. Use pornhub.com or pornolab.net as the address to check
    3. Press "Check"
    4. If everything is configured properly, the result will show a non-empty value in the "Block by rules" field, which means the rule is enabled and working

That completes the minimal setup, which is enough for filtering to function fully; from here, use the documentation to tailor the filtering to your own needs.

PART 4. Testing


Speed

The logical question: how much does filtering slow down page loading? Initially I wanted to measure and compare the load times of various sites and present the results as a table. The measurements were made with the developer tools built into Chrome. While without the filter I could compute an average load time over 10 requests, under the filter the load time fluctuated wildly, in some cases from 100 to 500 ms, so I decided such a comparison would prove nothing. The bottom line is that load time does increase; the worst slowdown I managed to catch was a factor of 3. However, on a fast Internet connection the difference between 100 ms and 300 ms is invisible to the eye, and the difference between 200 ms and 600 ms is barely perceptible and causes no real discomfort. Overall, by subjective feel, sites load quickly.
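
Those who want to reproduce such measurements without a browser can get a rough equivalent with curl (the URL is just an example):
 # time a full page fetch; repeat ~10 times and average the numbers
 for i in $(seq 10); do curl -o /dev/null -s -w '%{time_total}\n' http://example.com/; done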

Filtering

UserGate WebFilter's filtering is excellent. The list of banned domains is very extensive: most of the "bad" sites I tried to visit are rejected by the domain lists alone, so things never even get as far as morphological analysis.
As for the morphological analysis, it also performs very well. As a test I tried opening sites from the list of extremist materials approved by the Ministry of Justice. It works fine. That said, there were cases of unwanted content slipping through, but their number tends to zero.
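
Blocking can also be verified from the command line. Assuming the proxy is reachable at <server-ip>:3128 (address and port are illustrative), the status code and redirect target tell the story:
 curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' -x http://<server-ip>:3128 http://pornolab.net/
A blocked request should come back as a redirect (3xx) to the block page configured in step 5 above.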

Fault tolerance

At the time of writing, the server had been running for a little over a month and had not needed a single restart, neither of the webfilter daemon nor of the server as a whole. During peak hours, HTTP traffic passes through the server at up to 8 MB/s for long stretches. Users have not complained about glitches, freezes, or other malfunctions.

Conclusion


Working with this software left a very pleasant impression. All the declared functionality works properly. Given the low price (for example, a 50-PC license costs 13,500 rubles a year), in my personal opinion this software is an ideal solution for content filtering in educational institutions.
