
Law No. 139-FZ: a view from a small provider

A couple of weeks ago, colleagues from Roskomnadzor reprimanded a good friend of mine (a part-time system administrator at a city provider, though not a very large one). He does block banned sites by IP address, of course, but not by URL or domain name at all, so whenever a prohibited resource changes its IP address, it becomes accessible again. My friend complained to me about his lot, hinting at the same time that he had already been fined personally once for failing to comply with the requirements of the notorious Law No. 139-FZ.


The connection scheme for this provider is as follows:


We thought it over and googled around for makeshift layer-7 filtering. What we learned is that such things are the prerogative of expensive equipment, and that l7-filter for iptables stopped being developed long ago. So we started thinking in the direction of proxy servers. None of us had worked with squid on an industrial scale, but we had done quite a few experiments with nginx, Igor Sysoev's very powerful product.
The overall scheme we worked out is as follows:


Clearly, both the nginx configuration and the list of DNS zones for forbidden resources have to be updated dynamically: Roskomnadzor's requirements specify an update frequency of three times a day. The drawback is just as obvious. HTTPS access will either have to be blocked outright, or allowed only through a self-signed certificate. Otherwise the traffic passing through our nginx stays encrypted, and nginx cannot do its main job (filtering). In the second case, clients' browsers will inevitably complain about the wrong certificate. Moreover, if an address from one of Google's domains ends up in the registry and the client uses Google Chrome, the browser will refuse to open those sites at all. But with limited time we could not come up with any other makeshift option. So here is what we have:
  1. A script that downloads the list of prohibited sites. The list is plain XML, returned by a SOAP service in response to a corresponding request; authorization requires a certificate unique to each provider.
  2. A script that loads the list of zones into the DNS server.
  3. A script that generates the necessary nginx configuration.


I will not publish the first script: providers already know how to obtain this data, and disclosing it to ordinary users is not recommended.

The second script (SQL for working with the PowerDNS database):

```sql
USE pdns;

-- Raw <domain> values from the registry dump; ExtractValue() returns their text content.
CREATE TEMPORARY TABLE IF NOT EXISTS dump (domain TEXT);
TRUNCATE TABLE dump;
LOAD DATA LOCAL INFILE '/opt/zapret/dump.xml' INTO TABLE dump
    LINES STARTING BY '<domain>' TERMINATED BY '</domain>'
    (@tmp) SET domain = ExtractValue(@tmp, '/');

-- Unique, lower-case domains to block (PowerDNS stores names in lower case).
CREATE TEMPORARY TABLE IF NOT EXISTS locked (domain VARCHAR(767) PRIMARY KEY);
TRUNCATE TABLE locked;
INSERT INTO locked SELECT DISTINCT LCASE(domain) FROM dump WHERE domain <> '';

-- MySQL cannot refer to a temporary table twice in one statement, hence the copy.
CREATE TEMPORARY TABLE IF NOT EXISTS locked1 (domain VARCHAR(767) PRIMARY KEY);
TRUNCATE TABLE locked1;
INSERT INTO locked1 SELECT * FROM locked;

-- If both example.com and www.example.com are listed, drop the bare name:
-- the www. entry is stripped next and must not collide with it.
DELETE FROM l USING locked l INNER JOIN locked1 l1
    ON l.domain = SUBSTR(l1.domain FROM 5) AND LEFT(l1.domain, 4) = 'www.';
UPDATE locked SET domain = SUBSTR(domain FROM 5) WHERE LEFT(domain, 4) = 'www.';

-- Domains that are never redirected to the proxy.
DELETE FROM locked
    WHERE domain LIKE '%youtube.com' OR domain LIKE '%google.com' OR domain LIKE '%google.ru';

-- Zones created on earlier runs (they point at the proxy address 1.2.3.4).
CREATE TEMPORARY TABLE IF NOT EXISTS old_locked (id INT, domain VARCHAR(767) PRIMARY KEY);
TRUNCATE TABLE old_locked;
INSERT INTO old_locked
    SELECT DISTINCT d.id, d.name AS domain
    FROM domains d INNER JOIN records r ON d.id = r.domain_id
    WHERE r.content = '1.2.3.4' AND d.name NOT LIKE '%provider.ru';

-- Create zones for newly blocked domains.
INSERT INTO domains (name, master, last_check, type, notified_serial, account)
    SELECT n.domain, NULL, NULL, 'NATIVE', NULL, NULL
    FROM locked n LEFT JOIN old_locked o ON n.domain = o.domain
    WHERE o.domain IS NULL;

-- SOA record for each new zone.
INSERT INTO records (domain_id, name, type, content, ttl, prio, change_date, ordername, auth)
    SELECT d.id, d.name, 'SOA',
           'ns.provider.ru dns.provider.ru 2014022701 28800 7200 604800 86400',
           86400, 0, UNIX_TIMESTAMP(), '', 1
    FROM domains d INNER JOIN locked l ON d.name = l.domain
    LEFT JOIN old_locked o ON d.name = o.domain
    WHERE o.domain IS NULL;

-- A records pointing the zone itself and all its subdomains at the proxy
-- (a "*." wildcard does not match the zone apex).
INSERT INTO records (domain_id, name, type, content, ttl, prio, change_date, ordername, auth)
    SELECT d.id, d.name, 'A', '1.2.3.4', 86400, 0, UNIX_TIMESTAMP(), '', 1
    FROM domains d INNER JOIN locked l ON d.name = l.domain
    LEFT JOIN old_locked o ON d.name = o.domain
    WHERE o.domain IS NULL;
INSERT INTO records (domain_id, name, type, content, ttl, prio, change_date, ordername, auth)
    SELECT d.id, CONCAT('*.', d.name), 'A', '1.2.3.4', 86400, 0, UNIX_TIMESTAMP(), '', 1
    FROM domains d INNER JOIN locked l ON d.name = l.domain
    LEFT JOIN old_locked o ON d.name = o.domain
    WHERE o.domain IS NULL;

-- Remove zones that are no longer in the registry.
DELETE FROM r, d USING records r
    INNER JOIN domains d ON r.domain_id = d.id
    INNER JOIN old_locked o ON d.name = o.domain
    LEFT JOIN locked l ON o.domain = l.domain
    WHERE l.domain IS NULL;
```
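Once you strip away the temporary-table plumbing, the SQL above is set arithmetic on domain names. The Python sketch below restates that logic so it can be checked outside MySQL. It assumes the raw `<domain>` values have already been pulled out of dump.xml. The function names `normalize_domains` and `plan_zone_changes` are hypothetical. The exclusion suffixes are the ones from the `DELETE ... LIKE` statement.

```python
# Suffixes excluded in the SQL via LIKE '%youtube.com' etc. (plain suffix match).
EXCLUDED_SUFFIXES = ("youtube.com", "google.com", "google.ru")


def normalize_domains(raw_domains, excluded=EXCLUDED_SUFFIXES):
    """Return the sorted list of zone names to create.

    Mirrors the SQL: lower-case, strip a leading "www.", drop duplicates,
    and drop anything that ends with an excluded suffix.
    """
    result = set()
    for domain in raw_domains:
        domain = domain.strip().lower()
        if not domain:
            continue
        if domain.startswith("www."):
            domain = domain[4:]
        if domain.endswith(excluded):  # str.endswith accepts a tuple
            continue
        result.add(domain)
    return sorted(result)


def plan_zone_changes(wanted, existing):
    """Return (zones to add, zones to remove), like the INSERT and DELETE steps."""
    wanted, existing = set(wanted), set(existing)
    return sorted(wanted - existing), sorted(existing - wanted)
```

`plan_zone_changes` corresponds to the `LEFT JOIN ... IS NULL` inserts (new zones) and to the final multi-table `DELETE` (zones that have left the registry).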


After running the SQL script, don't forget to run

```
# pdnssec rectify-all-zones
```

so that PowerDNS picks up the changes (the command fills in the `ordername` and `auth` fields of the new records).
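Once the zones are in place, clients reach the proxy through ordinary DNS wildcard matching. The sketch below models only the `*.` record, using the standard rule from RFC 4592: a wildcard answers for names below the zone but not for the zone name itself, so the bare domain needs an A record of its own. `proxy_answer` is a hypothetical helper. The model ignores explicitly defined names under the zone, since these generated zones contain none.

```python
def proxy_answer(qname, blocked_zones, proxy_ip="1.2.3.4"):
    """Return proxy_ip if a '*.<zone>' wildcard of a blocked zone covers qname.

    Simplified RFC 4592 rule: the wildcard matches any name strictly below
    the zone, but not the zone apex itself. Returns None otherwise.
    """
    name = qname.lower().rstrip(".")
    for zone in blocked_zones:
        if name.endswith("." + zone.lower()):
            return proxy_ip
    return None
```

For example, `proxy_answer("www.badsite.org", ["badsite.org"])` gives the proxy address, while `proxy_answer("badsite.org", ["badsite.org"])` gives `None`.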

The third script (generating the nginx configuration for blocked resources):
```php
<?php
// Generates nginx server blocks for the resources listed in the registry dump.
// Blocked requests get "301 http://eais.rkn.gov.ru/", everything else is proxied.
$xml = simplexml_load_file('/opt/zapret/dump.xml');
$dirty = array();
// Domains never filtered by URL (compare the DELETE in the SQL script).
$excl = array('youtube.com', 'google.ru', 'google.com', 'badsite.org');

// True if $domain equals an excluded domain or is a subdomain of one.
function is_excluded($domain, $excl) {
    foreach ($excl as $e) {
        if ($domain === $e || substr($domain, -strlen($e) - 1) === '.' . $e) return true;
    }
    return false;
}

// Wrap a path or query in \Q...\E so nginx's PCRE treats it literally;
// a literal "\E" inside is split out as an escaped backslash followed by "E".
function pcre_quote($s) {
    return '\\Q' . str_replace('\\E', '\\E\\\\E\\Q', $s) . '\\E';
}

foreach ($xml as $node) {
    $domain = strtolower((string)$node->domain);
    if (strncmp($domain, 'www.', 4) == 0) $domain = substr($domain, 4);
    if (strlen($domain) == 0 || is_excluded($domain, $excl)) continue;

    // An entry may list several URLs; an entry without URLs blocks the whole domain.
    $urls = array();
    foreach ($node->url as $u) $urls[] = (string)$u;
    if (count($urls) == 0) $urls[] = 'http://' . $domain . '/';

    foreach ($urls as $url) {
        $parsed = parse_url($url);
        if ($parsed === false) continue;
        $scheme = isset($parsed['scheme']) ? strtolower($parsed['scheme']) . '://' : 'http://';
        $https = ($scheme == 'https://');
        if (isset($parsed['port'])) $port = ':' . $parsed['port'] . ($https ? ' ssl' : '');
        else $port = $https ? ':443 ssl' : ':80';
        $port .= ';';
        // A URL without a path means the site root.
        $path = (isset($parsed['path']) && strlen($parsed['path']) > 0) ? $parsed['path'] : '/';
        $loc = ($path === '/') ? '/' : pcre_quote($path);
        $query = (isset($parsed['query']) && strlen($parsed['query']) > 0) ? pcre_quote($parsed['query']) : '';
        $dirty[] = array('domain' => $domain, 'url' => $url, 'loc' => $loc,
                         'query' => $query, 'port' => $port, 'scheme' => $scheme);
    }
}

// badsite.org is blocked entirely, even though the registry lists only some of its URLs.
$dirty[] = array('domain' => 'badsite.org', 'url' => 'badsite.org', 'loc' => '/',
                 'query' => '', 'port' => ':80;', 'scheme' => 'http://');

// Group entries by domain, then by location; "/" sorts before the \Q...\E
// locations, so a whole-domain block is always seen first.
$domains = array_unique($dirty, SORT_REGULAR);
usort($domains, function ($a, $b) {
    return strcmp($a['domain'], $b['domain'])
        ?: strcmp($a['loc'], $b['loc'])
        ?: strnatcasecmp($a['url'], $b['url']);
});

$old_domain = ''; $old_loc = '';
$wasroot = false;   // a root location has been emitted for this domain
$alldomain = false; // the whole domain is blocked
$allloc = false;    // the whole current location is blocked
foreach ($domains as $node) {
    $domain = $node['domain']; $loc = $node['loc']; $query = $node['query'];
    if ($domain !== $old_domain) {
        loc_close($old_loc, $allloc);
        dom_close($old_domain, $wasroot);
        dom_open($domain, $node['port']);
        $old_loc = ''; $alldomain = false; $wasroot = false; $allloc = false;
    }
    if (!$alldomain) {
        if ($loc !== $old_loc) {
            loc_close($old_loc, $allloc);
            loc_open($loc);
            $allloc = false;
        }
        if ($loc === '/') $wasroot = true;
        if (strlen($query) == 0) {
            $allloc = true;
            if ($loc === '/') $alldomain = true;
        } elseif (!$allloc) {
            args_check($query);
        }
    }
    $old_loc = $loc; $old_domain = $domain;
}
loc_close($old_loc, $allloc);
dom_close($old_domain, $wasroot);

function loc_open($_loc) {
    echo "    location ~* " . $_loc . "* {\n";
}

// Ends a location: either redirect everything or proxy what was not caught by args checks.
function loc_close($_loc, $_blockall) {
    if (strlen($_loc) == 0) return;
    if ($_blockall) {
        echo "        return 301 http://eais.rkn.gov.ru/;\n";
    } else { ?>
        include /etc/nginx/proxy_params;
        if ($args = '') { proxy_pass $scheme://$host$uri; }
        if ($args != '') { proxy_pass $scheme://$host$uri?$args; }
<?php }
    echo "    } # location\n";
}

// $_port already holds the listen suffix, e.g. ":80;" or ":443 ssl;".
function dom_open($_domain, $_port) {
    echo "server {\n";
    echo "    listen 1.2.3.4" . $_port . "\n";
    echo "    server_name " . $_domain . " *." . $_domain . ";\n";
}

// Without a root rule, the rest of the domain is proxied as usual.
function dom_close($_dom, $_wasroot) {
    if (strlen($_dom) == 0) return;
    if (!$_wasroot) { ?>
    location / {
        include /etc/nginx/proxy_params;
        if ($args = '') { proxy_pass $scheme://$host$uri; }
        if ($args != '') { proxy_pass $scheme://$host$uri?$args; }
    } #root location
<?php }
    echo "} #domain\n";
}

function args_check($_query) {
    echo "        if (\$args ~* \"" . $_query . "*\") {\n";
    echo "            return 301 http://eais.rkn.gov.ru/;\n";
    echo "        } #args\n";
}
```
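The core of the third script is turning one registry URL into the three pieces nginx needs: the `listen` suffix, the location regex and the argument regex. Below is a rough Python equivalent. It uses `urllib.parse.urlsplit` instead of PHP's `parse_url`, and the two handle malformed URLs differently. `pcre_literal` and `url_to_rule` are hypothetical names, and the URLs are assumed to carry a scheme.

```python
from urllib.parse import urlsplit


def pcre_literal(text):
    r"""Wrap text in \Q...\E so PCRE (and therefore nginx) matches it literally.

    A literal "\E" inside the text is split out as \E\\E\Q, as the PHP does.
    """
    return "\\Q" + text.replace("\\E", "\\E\\\\E\\Q") + "\\E"


def url_to_rule(url):
    """Split a registry URL into (listen suffix, location regex, query regex)."""
    parts = urlsplit(url)
    scheme = (parts.scheme or "http").lower()
    ssl = " ssl" if scheme == "https" else ""
    port = parts.port or (443 if scheme == "https" else 80)
    listen = f":{port}{ssl};"
    path = parts.path or "/"  # no path means the site root
    location = path if path == "/" else pcre_literal(path)
    query = pcre_literal(parts.query) if parts.query else ""
    return listen, location, query
```

For instance, `url_to_rule("http://badsite.com/forum/viewforum.php?f=6")` returns `(":80;", r"\Q/forum/viewforum.php\E", r"\Qf=6\E")`, the same pieces that appear in the phpBB example of the third block type.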


Running the third script produces an nginx configuration that proxies the domain names we received. If an address is blocked, nginx answers with an unconditional redirect (301) to eais.rkn.gov.ru, the registry of prohibited sites.
There are three types of blocks:

1. Whole domain. For such sites we get the following entry:
```nginx
server {
    listen 1.2.3.4:80;
    server_name badsite.org *.badsite.org;
    location ~* /* {
        return 301 http://eais.rkn.gov.ru/;
    } # location
} #domain
```


2. Specific URLs in the domain. In this case, we get another entry:
```nginx
server {
    listen 1.2.3.4:80;
    server_name badsite.hk *.badsite.hk;
    location ~* \Q/h/\E* {
        return 301 http://eais.rkn.gov.ru/;
    } # location
    location ~* \Q/h/res/214.html\E* {
        return 301 http://eais.rkn.gov.ru/;
    } # location
    location / {
        include /etc/nginx/proxy_params;
        if ($args = '') { proxy_pass $scheme://$host$uri; }
        if ($args != '') { proxy_pass $scheme://$host$uri?$args; }
    } #root location
} #domain
```


3. Certain arguments at a specific URL (for example, a specific page of a phpBB forum):
```nginx
server {
    listen 1.2.3.4:80;
    server_name badsite.com *.badsite.com;
    location ~* \Q/forum/viewforum.php\E* {
        if ($args ~* "\Qf=6\E*") {
            return 301 http://eais.rkn.gov.ru/;
        } #args
        if ($args ~* "\Qf=6&start=25\E*") {
            return 301 http://eais.rkn.gov.ru/;
        } #args
        include /etc/nginx/proxy_params;
        if ($args = '') { proxy_pass $scheme://$host$uri; }
        if ($args != '') { proxy_pass $scheme://$host$uri?$args; }
    } # location
    location / {
        include /etc/nginx/proxy_params;
        if ($args = '') { proxy_pass $scheme://$host$uri; }
        if ($args != '') { proxy_pass $scheme://$host$uri?$args; }
    } #root location
} #domain
```
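The three cases follow from how the generator groups one domain's entries. The Python sketch below shows the per-domain decision. It assumes the entries are `(location, query)` pairs, already quoted the way the third script quotes them; `plan_domain` is a hypothetical helper.

```python
from collections import defaultdict


def plan_domain(entries):
    """Group one domain's (location, query) pairs into a blocking plan.

    Returns {"/": None} when the whole domain is blocked (type 1). Otherwise
    maps each location regex to None (redirect every request, type 2) or to
    the sorted list of argument regexes that trigger the redirect (type 3);
    all other requests are proxied.
    """
    by_loc = defaultdict(set)
    for loc, query in entries:
        by_loc[loc].add(query)
    if "" in by_loc.get("/", set()):  # root URL without arguments
        return {"/": None}
    return {
        loc: None if "" in queries else sorted(queries)
        for loc, queries in by_loc.items()
    }
```

A bare root URL overrides everything else listed for its domain, which is why the whole-domain case produces a single `return 301`.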


And, of course, the default servers forward all other requests to their actual addresses (just in case):
```nginx
server {
    listen 1.2.3.4:80 default_server;
    location / {
        include /etc/nginx/proxy_params;
        if ($args = '') { proxy_pass $scheme://$host$uri; }
        if ($args != '') { proxy_pass $scheme://$host$uri?$args; }
    }
}
server {
    listen 1.2.3.4:443 ssl default_server;
    location / {
        include /etc/nginx/proxy_params;
        if ($args = '') { proxy_pass $scheme://$host$uri; }
        if ($args != '') { proxy_pass $scheme://$host$uri?$args; }
    }
}
```


After generating the configuration, remember to run

```
# service nginx reload
```

This tells nginx to reload its configuration, gracefully shutting down the old worker processes.

The nginx setup was put to a stress test a week ago, when a single youtube.com video was added to the registry. Apart from increased memory consumption, no side effects were observed, and we brought memory usage down by disabling keep-alive for client connections. User convenience, of course, suffered. Watching and uploading videos on youtube.com mostly worked, but many videos are embedded in other pages over HTTPS, and browsers refused to display them with the substituted certificate. So the provider made a deliberate decision to put the google.com, google.ru and youtube.com domains on the exclusion list. One site went onto a list of "reverse exceptions" instead: there is a long-standing decision to block it entirely, but the registry lists only two of its URLs.
Overall, this solution has proved quite workable for a small provider that wants to keep operating under the difficult conditions of our Russian legislation.

Source: https://habr.com/ru/post/216209/

