
Ten examples of how not to write PAC files



Introduction


Virtually any deployment of a Web Security Gateway, whether a cloud-based SaaS solution such as Zscaler or an on-premises appliance such as Cisco WSA (IronPort), at some point requires configuring proxy servers in browsers, so I often come across proxy auto-configuration (PAC) files. In this article I would like to look at a few examples of optimizing their performance.


Why this article is needed


Why did I decide to write this article, and is there any benefit in it? I hope so, and here is why. Essentially, a PAC file is a JavaScript function that matches strings or substrings of the url and host arguments and either returns the proxy server to use for a resource or tells the browser to access the resource directly, bypassing the proxy. Like any other code, JavaScript can be optimized for execution. In large enterprises with a complex, distributed Internet access infrastructure and, as a result, PAC files several hundred lines long, optimizing them no longer seems pointless: the fraction of execution time lost to a single suboptimal or misplaced function is multiplied by the number of times it occurs in the code.





Disclaimer


This article does not describe the functions used in PAC files in detail. For a fuller description of each function, I recommend a reference such as http://findproxyforurl.com . The browser performance results in this article are debatable and make no claim to absolute truth, although the author made every effort to obtain accurate results. For that reason no summary of test results is given; readers are invited to verify the advantage of each approach for themselves, and links to the tests are provided wherever possible. For my research I relied mainly on jsperf.com, where anyone with a GitHub account can create test cases and measure the performance of JavaScript code; the test cases for this article are stored there.


Well, let's get down to business.


Example one


The first item on my list is the simplest case: choosing a proxy for one specific host. Very often colleagues for some reason check the url where it is not appropriate, or wrap the string in "*" wildcards. This typically happens when the PAC file is already over a hundred lines long and someone adds a new condition by analogy, like this:


else if (shExpMatch(url,"*cisco.com*")) return 'PROXY MyProxy1:3128'; 

or like this, with the host:


 else if (shExpMatch(host,"cisco.com")) return 'PROXY MyProxy1:3128'; 

The problem here is not even that in Chrome 58 and IE11 the first version is almost 1% slower (in Firefox 53 there is almost no difference between them). The worst part, as many readers have probably guessed, is that for a URL such as "http://www.noncisco.com/evil/cisco.com" the first check returns true, and so does the host check as soon as its pattern is wrapped in "*" in the same way. (Without wildcards, the host check matches only the exact string cisco.com, which makes shExpMatch a roundabout equality test.) The solution is very simple: do not use shExpMatch, which is not intended for this, and look for an exact match instead:


  if (host == 'www.cisco.com') return 'PROXY MyProxy1:3128'; 

The downside is equally obvious: the code above does not match http://cisco.com . Of course, nothing stops you from doing two checks:


  if ((host == 'www.cisco.com') || (host == 'cisco.com')) return 'PROXY MyProxy1:3128'; 

The link to the test, for those who want to check the performance themselves: https://jsperf.com/inefficient-shexpmatch
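The false positive described above can be reproduced outside the browser. The sketch below is a minimal illustration, not browser code: shExpMatch is a simplified stand-in written for this example (it supports only the "*" and "?" wildcards), and the "DIRECT" fallback and the function names wildcardRule and exactRule are assumptions added so the snippet runs on its own.

```javascript
// Simplified stand-in for the PAC helper shExpMatch (assumption: only "*" and
// "?" wildcards are supported, and the whole string must match the pattern).
function shExpMatch(str, pattern) {
  var escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  var source = "^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$";
  return new RegExp(source).test(str);
}

// Risky variant: substring-style wildcard match on the full URL.
function wildcardRule(url, host) {
  if (shExpMatch(url, "*cisco.com*")) return "PROXY MyProxy1:3128";
  return "DIRECT"; // fallback added only for this demo
}

// Safer variant: exact comparison of the host name.
function exactRule(url, host) {
  if (host == "www.cisco.com" || host == "cisco.com") return "PROXY MyProxy1:3128";
  return "DIRECT"; // fallback added only for this demo
}

var evilUrl = "http://www.noncisco.com/evil/cisco.com";
var evilHost = "www.noncisco.com";

console.log(wildcardRule(evilUrl, evilHost)); // the unrelated site is sent to MyProxy1
console.log(exactRule(evilUrl, evilHost));    // the unrelated site is not matched
```

The exact comparison costs one extra condition for the bare domain, but it cannot be fooled by a path or a longer host name that merely contains the string.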


By the way, how do you test PAC files? Surely not by clicking through them manually in the browser? Although that is the definitive check, for this purpose I use http://home.thorsen.pm/proxyforurl , which has not let me down so far.
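Besides online checkers, the routing logic can be exercised with a small table of cases in any JavaScript runtime. The following is only a sketch under assumptions: FindProxyForURL here is a toy function standing in for a real PAC file, and checkPac with its case format is a hypothetical helper invented for this illustration. A real PAC file would also need stand-ins for the helper functions it calls.

```javascript
// Toy PAC function under test (assumption: stands in for your real PAC logic).
function FindProxyForURL(url, host) {
  if (host == "www.cisco.com" || host == "cisco.com") return "PROXY MyProxy1:3128";
  return "DIRECT";
}

// Hypothetical helper: run a table of cases and collect the mismatches.
function checkPac(findProxy, cases) {
  var mismatches = [];
  for (var i = 0; i < cases.length; i++) {
    var c = cases[i];
    var actual = findProxy(c.url, c.host);
    if (actual !== c.expected) {
      mismatches.push(c.url + ": expected " + c.expected + ", got " + actual);
    }
  }
  return mismatches;
}

var failures = checkPac(FindProxyForURL, [
  { url: "http://www.cisco.com/", host: "www.cisco.com", expected: "PROXY MyProxy1:3128" },
  { url: "http://cisco.com/", host: "cisco.com", expected: "PROXY MyProxy1:3128" },
  { url: "http://www.noncisco.com/evil/cisco.com", host: "www.noncisco.com", expected: "DIRECT" }
]);

console.log(failures.length === 0 ? "all cases pass" : failures.join("\n"));
```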


Example two


Problem number two is very close in essence to the one above, and once again my least favorite function, shExpMatch, is used by default.
The task is to use MyProxy2 for a specific domain. What people write:


 if (shExpMatch(host, "*.linkedin.com")) return 'PROXY MyProxy2:3128'; 

Again, this is neither optimal nor quite appropriate. Use the dedicated function instead:


 if (dnsDomainIs(host,".linkedin.com")) return 'PROXY MyProxy2:3128'; 

Strictly speaking, in Firefox and Chrome the two functions perform almost identically, but in IE11 the difference is significant: more than 1%.
The test:
https://jsperf.com/dnsdomainis-vs-shexpmatch
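To make the behavior of the domain check concrete, here is a small sketch. The dnsDomainIs below is a simplified stand-in modeled on a plain suffix comparison; treat it as an approximation rather than any browser's exact implementation. The "DIRECT" fallback is an assumption added for the demo.

```javascript
// Simplified stand-in for the PAC helper dnsDomainIs (assumption: a plain
// suffix comparison of the host against the domain string).
function dnsDomainIs(host, domain) {
  return host.length >= domain.length &&
    host.substring(host.length - domain.length) == domain;
}

function FindProxyForURL(url, host) {
  // The leading dot restricts the rule to subdomains of linkedin.com.
  if (dnsDomainIs(host, ".linkedin.com")) return "PROXY MyProxy2:3128";
  return "DIRECT"; // fallback added only for this demo
}

console.log(FindProxyForURL("https://www.linkedin.com/", "www.linkedin.com"));
console.log(FindProxyForURL("https://linkedin.com/", "linkedin.com"));
console.log(FindProxyForURL("https://notlinkedin.com/", "notlinkedin.com"));
```

With the leading dot, the bare domain linkedin.com is not matched, so it needs its own exact comparison if it should use the same proxy.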


Example three


Problem number three, which usually arises from administrators' inattention, is duplicate conditions. Example:


 else if (shExpMatch(host, "192.168.*")) return "DIRECT";
 else if (shExpMatch(host, "10.*.*.*")) return "DIRECT";
 …
 else if (isInNet(host, "192.168.88.0", "255.255.255.0")) return "DIRECT";
 …
 else if (shExpMatch(url, "*10.10.*")) return "DIRECT";

Not only do the conditions duplicate (overlap) each other; the shExpMatch checks also cannot work at all if the user types a domain name rather than an IPv4 address into the browser's address bar, because host and url then contain the name, not the address. The solution is to eliminate the duplicate conditions and resolve the name first, at least as follows:


 else if (shExpMatch(dnsResolve(host), "192.168.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "10.*.*.*")) return "DIRECT";

Example four


Problem number four is a continuation of the previous section. Suppose the administrator wants clients to bypass the proxy for all subnets in the RFC 1918 range and writes this:


 else if (shExpMatch(dnsResolve(host), "192.168.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "10.*.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.16.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.17.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.18.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.19.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.20.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.21.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.22.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.23.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.24.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.25.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.26.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.27.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.28.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.29.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.30.*.*")) return "DIRECT";
 else if (shExpMatch(dnsResolve(host), "172.31.*.*")) return "DIRECT";
 …

The obvious improvement is to resolve the name once and then work with a variable that holds the result:


 var resolved_ip = dnsResolve(host);
 ...
 else if (shExpMatch(resolved_ip, "192.168.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "10.*.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.16.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.17.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.18.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.19.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.20.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.21.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.22.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.23.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.24.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.25.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.26.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.27.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.28.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.29.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.30.*.*")) return "DIRECT";
 else if (shExpMatch(resolved_ip, "172.31.*.*")) return "DIRECT";
 ...

Instead of 18 name resolution requests in the worst case (when evaluation reaches the last else if), we get predictable performance and request name resolution only once, which is clearly more efficient even without additional tests.
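The difference in the number of lookups can be counted with a stub. This is a sketch under assumptions: dnsResolve is replaced by a counting stub that always returns a fixed public address (so no rule matches and every condition is evaluated), startsWith is a simplified substitute for the shExpMatch prefix patterns, the conditions are written as a loop only to keep the example short, and the fallback proxy string is illustrative.

```javascript
var resolveCalls = 0;

// Counting stub for dnsResolve (assumption: every host resolves to a fixed
// public address, so no rule matches and every check is evaluated).
function dnsResolve(host) {
  resolveCalls++;
  return "203.0.113.10";
}

// Simplified prefix check standing in for shExpMatch(ip, "a.b.*").
function startsWith(ip, prefix) {
  return ip.indexOf(prefix) === 0;
}

// RFC 1918 prefixes from the example: 192.168, 10 and 172.16 to 172.31.
var prefixes = ["192.168.", "10."];
for (var n = 16; n <= 31; n++) prefixes.push("172." + n + ".");

// Variant A: resolve inside every condition.
function resolveEveryTime(host) {
  for (var i = 0; i < prefixes.length; i++) {
    if (startsWith(dnsResolve(host), prefixes[i])) return "DIRECT";
  }
  return "PROXY MyProxy1:3128";
}

// Variant B: resolve once and reuse the result.
function resolveOnce(host) {
  var resolved_ip = dnsResolve(host);
  for (var i = 0; i < prefixes.length; i++) {
    if (startsWith(resolved_ip, prefixes[i])) return "DIRECT";
  }
  return "PROXY MyProxy1:3128";
}

resolveCalls = 0;
resolveEveryTime("example.com");
var callsEveryTime = resolveCalls;

resolveCalls = 0;
resolveOnce("example.com");
var callsOnce = resolveCalls;

console.log(callsEveryTime, callsOnce); // 18 and 1 with the stub above
```

In a real browser the cost of repeated lookups may be softened by caching, but the single-variable form does not depend on that.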


Example five


We continue down the optimization path. The block of code from the previous problem practically begs to be optimized because of its size alone. Suppose the function initially looked like this:


 function FindProxyForURL(url, host) {
   var resolved_ip = dnsResolve(host);
   if (shExpMatch(resolved_ip, "127.*.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "192.168.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "0.*.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "10.*.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.16.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.17.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.18.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.19.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.20.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.21.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.22.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.23.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.24.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.25.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.26.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.27.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.28.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.29.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.30.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "172.31.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "169.254.*.*")) return "DIRECT";
   else if (shExpMatch(resolved_ip, "192.88.99.*")) return "DIRECT";
   ...
 }

Even if we optimize the code above by combining the conditions with logical OR (||) instead of else if, the tests in the three browsers mentioned above showed that the best option is a regular expression test, as follows:


 function FindProxyForURL(url, host) {
   var privateIP = /^(0|10|127|192\.168|172\.1[6789]|172\.2[0-9]|172\.3[01]|169\.254|192\.88\.99)\.[0-9.]+$/;
   var resolved_ip = dnsResolve(host);
   if (privateIP.test(resolved_ip)) { return "DIRECT"; }
 }

Note that this is not only more efficient but also more compact and elegant. But there is always room for improvement: with the specialized function isInNet you can achieve even better results. Compared to the previous example, the code below gives a performance boost of 0.2% to 2% depending on the browser:


 function FindProxyForURL(url, host) {
   var resolved_ip = dnsResolve(host);
   if ((isInNet(resolved_ip, "192.168.0.0", "255.255.0.0")) ||
       (isInNet(resolved_ip, "172.16.0.0", "255.240.0.0")) ||
       (isInNet(resolved_ip, "10.0.0.0", "255.0.0.0")) ||
       (isInNet(resolved_ip, "127.0.0.0", "255.0.0.0")) ||
       (isInNet(resolved_ip, "0.0.0.0", "255.0.0.0")) ||
       (isInNet(resolved_ip, "169.254.0.0", "255.255.0.0")) ||
       (isInNet(resolved_ip, "192.88.99.0", "255.255.255.0")))
     return "DIRECT";
 }

You can repeat the test yourself here: https://jsperf.com/privateip-test-vs-shexpmatch
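Before swapping one form for the other, it is worth confirming that the regular expression and the mask-based checks classify addresses the same way. The sketch below does that for a few sample addresses. It makes no performance claim; isInNet here is a simplified stand-in that accepts only dotted IPv4 strings, and ipToNumber, isPrivateByRegex and isPrivateByMask are hypothetical helper names introduced for this illustration.

```javascript
// Convert a dotted IPv4 string to an unsigned 32-bit number.
function ipToNumber(ip) {
  var p = ip.split(".");
  return ((Number(p[0]) << 24) | (Number(p[1]) << 16) |
          (Number(p[2]) << 8) | Number(p[3])) >>> 0;
}

// Simplified stand-in for isInNet (assumption: the first argument is already
// a dotted IPv4 address, not a host name).
function isInNet(ip, network, mask) {
  var m = ipToNumber(mask);
  return ((ipToNumber(ip) & m) >>> 0) === ((ipToNumber(network) & m) >>> 0);
}

// The regular expression from the example above.
var privateIP = /^(0|10|127|192\.168|172\.1[6789]|172\.2[0-9]|172\.3[01]|169\.254|192\.88\.99)\.[0-9.]+$/;

function isPrivateByRegex(ip) {
  return privateIP.test(ip);
}

function isPrivateByMask(ip) {
  return isInNet(ip, "192.168.0.0", "255.255.0.0") ||
    isInNet(ip, "172.16.0.0", "255.240.0.0") ||
    isInNet(ip, "10.0.0.0", "255.0.0.0") ||
    isInNet(ip, "127.0.0.0", "255.0.0.0") ||
    isInNet(ip, "0.0.0.0", "255.0.0.0") ||
    isInNet(ip, "169.254.0.0", "255.255.0.0") ||
    isInNet(ip, "192.88.99.0", "255.255.255.0");
}

var samples = ["10.1.2.3", "172.31.255.1", "172.32.0.1",
               "192.168.88.5", "169.254.10.10", "8.8.8.8"];
for (var i = 0; i < samples.length; i++) {
  console.log(samples[i], isPrivateByRegex(samples[i]), isPrivateByMask(samples[i]));
}
```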


Example six


A slight departure from the main line for a change. Consider the two lines of code below:


 dnsDomainIs("www.notmycompany.com", "mycompany.com")
 dnsDomainIs("www.MyCompany.com", "mycompany.com")

The first line returns true (yes, it really is true: the function merely compares the end of the host string with the domain). The second line, despite what looks like an exact match, returns false because of the difference in letter case (reportedly Firefox 52 would still return true, but the author read this somewhere online and did not verify it).


To avoid the problem, it is highly advisable to convert the url and host strings to lowercase at the very beginning of the PAC file if they are used later on, for example like this:


 var url_lc = url.toLowerCase();
 var host_lc = host.toLowerCase();

And then work only with url_lc and host_lc.
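Both effects, the suffix match and the case sensitivity, can be seen in a few lines. This is a sketch under assumptions: dnsDomainIs is a simplified case-sensitive stand-in (a plain suffix comparison), the leading dot in ".mycompany.com" is added here to exclude lookalike domains, and the proxy string is illustrative.

```javascript
// Simplified, case-sensitive stand-in for dnsDomainIs (assumption: a plain
// suffix comparison; actual browser behavior may differ).
function dnsDomainIs(host, domain) {
  return host.length >= domain.length &&
    host.substring(host.length - domain.length) == domain;
}

function FindProxyForURL(url, host) {
  // Normalize once at the top, then use only the lowercase copies.
  var url_lc = url.toLowerCase();
  var host_lc = host.toLowerCase();
  if (dnsDomainIs(host_lc, ".mycompany.com")) return "DIRECT";
  return "PROXY MyProxy1:3128"; // illustrative default
}

console.log(dnsDomainIs("www.notmycompany.com", "mycompany.com")); // true: suffix match
console.log(dnsDomainIs("www.MyCompany.com", "mycompany.com"));    // false: case differs
console.log(FindProxyForURL("http://www.MyCompany.com/", "www.MyCompany.com"));
```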


Example seven


A very simple problem, but no less widespread for that. Like many others, it arises from inattention. The example below is taken from a real deployment; the names are of course fictional, and any resemblance is purely coincidental:


 ...
 else if (shExpMatch(host, "files.company.ru")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(host, "mail.company.ru")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(host, "company.com")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(url, "*downloads*.company.com*")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(host, "global.company.com")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(host, "db.company.com")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(host, "test.company.com")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(host, "blog.company.com")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(host, "servicedesk.company.com")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(url, "*training.company.com*")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(url, "*farm.company.com*")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(host, "*.fa.company.com")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(url, "https://servicedesk.company.com/*")) return "PROXY companyproxy7:3128";
 else if (shExpMatch(url, "https://servicedesk.company.com:8080/*")) return "PROXY companyproxy7:3128";
 …

And at the end of the file we see something like this (with a comment, no less!):


 /* Default Traffic Forwarding */
 return "PROXY companyproxy7:3128";

Apart from the problems already discussed, the block of code above is completely useless and only slows down the browser, because the same proxy server is used by default anyway.


Example eight


Problem number eight is related to the internal resources of the company.
Let's say we have this code:


 if (host == 'internal') return 'DIRECT';
 if (host == 'mail') return 'DIRECT';
 if (host == 'files') return 'DIRECT';

That is, the PAC file checks whether the browser is accessing an internal resource by host name without a domain, which works because search domains are configured on the PC. A good way to optimize such checks is to use a single function:


  if (isPlainHostName(host)) return 'DIRECT';
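As a rough illustration of what the single check covers, here is a sketch with a simplified stand-in for isPlainHostName (assumption: it returns true when the host contains no dots); the default proxy string is illustrative. Note that this rule is broader than the three explicit comparisons: any dotless name goes direct.

```javascript
// Simplified stand-in for isPlainHostName (assumption: true when the host
// name contains no dots).
function isPlainHostName(host) {
  return host.indexOf(".") === -1;
}

function FindProxyForURL(url, host) {
  if (isPlainHostName(host)) return "DIRECT";
  return "PROXY MyProxy1:3128"; // illustrative default
}

console.log(FindProxyForURL("http://mail/", "mail"));
console.log(FindProxyForURL("http://files/", "files"));
console.log(FindProxyForURL("http://www.example.com/", "www.example.com"));
```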

Example nine


Optimization problem number nine is close to the previous one and again concerns the organization's internal resources. What if a user can request the same site with or without the domain, as in the example below?


 if (host == 'mail') return 'DIRECT';
 if (host == 'mail.resource.lan') return 'DIRECT';
 if (host == 'files') return 'DIRECT';
 if (host == 'files.resource.lan') return 'DIRECT';

It is quite simple to optimize such checks if you use the localHostOrDomainIs function:


 if (localHostOrDomainIs(host, "mail.resource.lan")) return 'DIRECT';
 if (localHostOrDomainIs(host, "files.resource.lan")) return 'DIRECT';
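The sketch below shows how one call can cover both the short and the fully qualified name. The localHostOrDomainIs here is a simplified stand-in (assumption: it matches when the host equals the full name or is a dot-terminated prefix of it), and the default proxy string is illustrative.

```javascript
// Simplified stand-in for localHostOrDomainIs (assumption: true when host
// equals hostdom, or when hostdom starts with host followed by a dot).
function localHostOrDomainIs(host, hostdom) {
  return host == hostdom || hostdom.indexOf(host + ".") === 0;
}

function FindProxyForURL(url, host) {
  if (localHostOrDomainIs(host, "mail.resource.lan")) return "DIRECT";
  if (localHostOrDomainIs(host, "files.resource.lan")) return "DIRECT";
  return "PROXY MyProxy1:3128"; // illustrative default
}

console.log(FindProxyForURL("http://mail/", "mail"));
console.log(FindProxyForURL("http://mail.resource.lan/", "mail.resource.lan"));
console.log(FindProxyForURL("http://mail.other.lan/", "mail.other.lan"));
```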

Example ten


The debatable problem of optimizing if conditions. The point is that in PAC files, in the overwhelming majority of cases, a matched if condition is immediately followed by a return from FindProxyForURL, so why use the construct below?


 if .. else if .. else 

Firefox test results suggest that combining conditions with OR (||) is optimal:


[Chart: Firefox test results]


In Chrome the result is somewhat different, but the if / else if block loses again:


[Chart: Chrome test results]


Readers can repeat the test in their own environment here: https://jsperf.com/shexpmatch-vs-host-string-vs-dnsdomainis


Based on the above, we conclude that a plain if statement, or if combined with OR, is preferable, although testing in IE11 gives the completely opposite result. It might also be a good idea to use the switch statement, but I am finishing this opus late at night, so I leave testing that case to the esteemed reader.
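For readers who want to benchmark the variants themselves, here are the four forms side by side. This is only a sketch with hypothetical function names and host names borrowed from the earlier examples; it shows that the forms return the same results when every branch ends in a return, and says nothing about which is faster. Note that switch compares with strict equality, so it suits exact host matches only.

```javascript
// Variant 1: if / else if / else.
function withElseIf(host) {
  if (host == "mail.resource.lan") return "DIRECT";
  else if (host == "files.resource.lan") return "DIRECT";
  else return "PROXY MyProxy1:3128";
}

// Variant 2: independent if statements; each match returns immediately.
function withPlainIf(host) {
  if (host == "mail.resource.lan") return "DIRECT";
  if (host == "files.resource.lan") return "DIRECT";
  return "PROXY MyProxy1:3128";
}

// Variant 3: one if with conditions combined by logical OR.
function withOr(host) {
  if (host == "mail.resource.lan" || host == "files.resource.lan") return "DIRECT";
  return "PROXY MyProxy1:3128";
}

// Variant 4: switch on the host (strict equality, exact matches only).
function withSwitch(host) {
  switch (host) {
    case "mail.resource.lan":
    case "files.resource.lan":
      return "DIRECT";
    default:
      return "PROXY MyProxy1:3128";
  }
}

var hosts = ["mail.resource.lan", "files.resource.lan", "www.example.com"];
for (var i = 0; i < hosts.length; i++) {
  console.log(hosts[i], withElseIf(hosts[i]), withPlainIf(hosts[i]),
              withOr(hosts[i]), withSwitch(hosts[i]));
}
```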


Instead of a conclusion


Instead of a conclusion, I would like to thank every reader who has made it to the end of this short piece. I sincerely hope it will be useful not only as a set of optimization tips but even more as a source of new knowledge or ideas for working with an enterprise's Internet access infrastructure.


Useful links


Cisco Systems about PAC files
Useful site with a description of the functions used in the PAC-files
A website that provides a convenient interface for testing the performance of JavaScript code
Here you can check the logic of the PAC file
Another resource where you can debug JavaScript code, including PAC files



Source: https://habr.com/ru/post/328316/

