
Electronic libraries at work and at home

Inspired by the article about science under lock and key. My article is not quite about that, but it is also about access to electronic libraries, just of a different kind.

I work at a Western... no, rather a Northern university, and I have to read quite a few papers in my field. Fortunately, the university library here subscribes to many electronic libraries (I wonder how much this pleasure costs... no, not like that: how much are the bourgeois making on our own articles?). In my field there are three such libraries - ACM, IEEE and Springer - and between them they hold the lion's share of what I need. And everything would be great, but there is one BUT.

Electronic libraries (DLs) apparently identify users by IP address. I may be wrong, but access to them is open only when the computer is physically connected (over twisted pair) to the university network; at least, the DL sites then show under whose subscription I can download articles. It is enough to connect to the university network wirelessly, and the DL immediately starts asking for money for an article. Naturally, outside the university I see only the price list for reading the work of my colleagues - and my own.

And what if I decided to work from home (I'm kidding, of course: who could manage that with three little rascals around...), or needed to check something on a business trip, refine a detail, or simply read a paper after all (a very real case)? At some point I decided that this had to be solved somehow. The local admin pleased me only with the news that I was not the first to ask him to come up with a way to access the DLs remotely - without promising to actually do anything. Later I learned from a colleague that there is some solution on a national resource, but it works unreliably: even though the colleague showed me how she connects to the DLs through it, the same actions on my side for some reason did not lead to the desired result.
That is when I decided to do everything myself, since I have some modest PHP and JS programming skills, and since there are user folders on the local network (from which the DL subscription is valid) where PHP scripts can be placed. The idea was this: a PHP proxy script sits in such a folder on the university network, and a Chrome extension on the user's side rewrites the PDF links on DL pages so that they go through that proxy.

A search turned up a simple PHP proxy script (thanks to Eric-Sebastien Lachance) that forwards all HTTP headers (including the user's cookies). True, it had to be extended considerably, so that 1) the set of domains a user can reach through it is limited, and 2) only students and employees of the university can use it (you understand why it cannot be left open to everyone...), the latter being a check of whether the user is logged in to the local network (the Intranet, which can also be logged into remotely; by the way, this step can be skipped if you use the proxy only yourself). To this end, every request, besides the actual URL of the article's PDF, must contain the ID of the user's Intranet session. The proxy requests an Intranet page with the corresponding cookie (JSESSIONID in my case) set to the transmitted value, and only if the Intranet responds positively does it perform the request to the DL.
User check
function verifyUser($authCookieValue) {
    global $authCheckURL;    // Intranet page used for the check
    global $authSuccessHTML; // HTML fragment present only for logged-in users
    global $authCookieName;  // e.g. JSESSIONID
    $result = false;
    $error = '';
    // create a new cURL resource
    $ch = curl_init();
    if ($ch !== false) {
        // set URL and other appropriate options
        $header = array();
        $header[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
        $header[] = 'Accept-Language: en-US,en;q=0.5';
        $header[] = 'Connection: close';
        $header[] = 'DNT: 1';
        curl_setopt($ch, CURLOPT_URL, $authCheckURL);
        curl_setopt($ch, CURLOPT_HEADER, false);
        // note: CURLOPT_USERAGENT takes the bare value, without a 'User-Agent: ' prefix
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0');
        curl_setopt($ch, CURLOPT_COOKIESESSION, true);
        curl_setopt($ch, CURLOPT_COOKIE, $authCookieName.'='.$authCookieValue);
        curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
        curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // fetch the Intranet page with the user's session cookie attached
        $response = curl_exec($ch);
        if ($response === false) {
            $error = curl_error($ch);
        } else {
            // the user is logged in if the marker HTML appears in the response
            $result = strpos($response, $authSuccessHTML) !== false;
        }
        curl_close($ch);
    } else {
        $error = 'cannot initialize cURL';
    }
    return array(
        'succeeded' => $result,
        'error'     => $error
    );
}

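To show how the pieces fit together, here is a minimal sketch of the proxy's entry point, using the parameter names the extension passes (urlForProxy, idForProxy). It is a simplification, not the actual script: the Springer host name and the bare cURL pass-through are assumptions for illustration, and the real proxy also reassembles the DL query string from the remaining parameters and rewrites headers as described below.

<?php
// simplified sketch; verifyUser() is the function from the listing above
$allowedDomains = array('dl.acm.org', 'ieeexplore.ieee.org', 'link.springer.com'); // Springer host assumed

$url = isset($_GET['urlForProxy']) ? $_GET['urlForProxy'] : '';
$sessionId = isset($_GET['idForProxy']) ? $_GET['idForProxy'] : '';

// 1) only users logged in to the university Intranet may use the proxy
$auth = verifyUser($sessionId);
if (!$auth['succeeded']) {
    header('HTTP/1.0 403 Forbidden');
    exit('Not authorized. '.$auth['error']);
}

// 2) only the subscribed digital libraries may be requested
$host = parse_url($url, PHP_URL_HOST);
if (!in_array($host, $allowedDomains, true)) {
    header('HTTP/1.0 403 Forbidden');
    exit('Domain not allowed.');
}

// 3) forward the request with the client's headers and stream the answer back
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'User-Agent: '.$_SERVER['HTTP_USER_AGENT'],
    'Cookie: '.(isset($_SERVER['HTTP_COOKIE']) ? $_SERVER['HTTP_COOKIE'] : '')
));
$body = curl_exec($ch);
header('Content-Type: '.curl_getinfo($ch, CURLINFO_CONTENT_TYPE));
echo $body;
curl_close($ch);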

On the user's side, a Google Chrome extension does the work (it consists of two parts: the code embedded in the page, and the code with access to tabs and cookies that runs in the background). If the URL belongs to one of the three libraries, the page is scanned for certain elements that appear on every page linking to an article's PDF (different elements for each DL, of course). If such elements are found, a proxy request with a well-formed link to the PDF is inserted there. (Figuring out how, and from which data on the page, to correctly build that link took quite some time.)
ACM Example
// proxy config
var PROXY_URL = 'http://university.org/~user/proxy.php?';
var PROXY_URL_QUERY = 'urlForProxy=';
var PROXY_ID_QUERY = 'idForProxy=';

// page search and modification constants
var ACM_PDF_LINK_NAME = 'FullTextPDF';
var ACM_ARTICLE_ID_NAME = 'id';
var ACM_PURCHASE_LINK = 'https://dl.acm.org/purchase.cfm';
var ACM_QUERY_URL = 'http://dl.acm.org/ft_gateway.cfm';
var ACM_QUERY = 'id={0}';
var ACM_LINK = '<a name="' + ACM_PDF_LINK_NAME + '" title="FullText PDF" href="{0}" target="_blank">' +
    '<img src="imagetypes/pdf_logo.gif" alt="PDF" class="fulltext_lnk" border="0">PDF</a> [proxy]';

// requests to the background page
var REQUEST_AUTH = 'auth';

// '{0}'-style placeholder substitution - not built into JS (assumed to be defined in the full source)
String.prototype.format = function () {
    var args = arguments;
    return this.replace(/\{(\d+)\}/g, function (m, i) { return args[i]; });
};

function setACMLink() {
    // do nothing if the page already has a real PDF link (i.e. we are on the wired network)
    var pdfLink = document.getElementsByName(ACM_PDF_LINK_NAME)[0];
    if (!pdfLink) {
        // extract the article id from the page URL's query string
        var i, id, param;
        var params = window.location.search.substr(1).split('&');
        for (i = 0; i < params.length; i++) {
            param = params[i].split('=');
            if (param[0] === ACM_ARTICLE_ID_NAME) {
                id = param[1].indexOf('.') > 0 ? param[1].split('.')[1] : param[1];
                break;
            }
        }
        if (id) {
            // build the proxied link to the PDF
            var link = PROXY_URL + ACM_QUERY.format(id) + '&' + PROXY_URL_QUERY + encodeURIComponent(ACM_QUERY_URL);
            // the purchase link is a placeholder for a link to the PDF
            var a, container;
            var links = document.getElementsByTagName('a');
            for (i = 0; i < links.length; i++) {
                a = links[i];
                if (a.href.indexOf(ACM_PURCHASE_LINK) === 0) {
                    container = a.parentNode;
                    container.innerHTML = ACM_LINK.format('#');
                    setClick(container.childNodes[0], link);
                    break;
                }
            }
        }
    }
}

function setClick(elem, link) {
    elem.addEventListener('click', function (e) {
        // ask the background page to attach the Intranet session id and navigate
        commPort.postMessage({name: REQUEST_AUTH, href: link});
        e.preventDefault();
        return false;
    });
}


When this link is clicked, the value of the cookie holding the university Intranet session ID is added to the request.
Adding the Intranet session identifier
In the background code:
// config
var AUTH_URL = 'https://university.org/intranet';
var AUTH_COOKIE = 'JSESSIONID';

// const
var REQUEST_AUTH = 'auth';

chrome.runtime.onConnect.addListener(function (port) {
    port.onMessage.addListener(function (request) {
        var answer = {toRequest: request.name};
        if (request.name === REQUEST_AUTH) {
            // check the authorization on the selected web-site
            answer.href = request.href;
            answer.result = false;
            answer.id = '';
            chrome.cookies.get({url: AUTH_URL, name: AUTH_COOKIE}, function (cookie) {
                if (cookie) {
                    answer.result = true;
                    answer.id = cookie.value;
                }
                port.postMessage(answer);
            });
        }
    });
});

In the code embedded in the page:
var commPort = chrome.runtime.connect();

commPort.onMessage.addListener(function (answer) {
    if (answer.toRequest === REQUEST_AUTH) {
        // add the authorization id and send the request to the proxy
        window.location = answer.href + '&' + PROXY_ID_QUERY + answer.id;
    }
});


True, opening articles on IEEE took a lot of extra tinkering. It turns out that for IEEE you first have to open some page of its DL site through the proxy (in Chrome an additional tab opens, which closes immediately after the content is loaded) in order to receive cookies carrying the subscriber's identifiers. On top of that, I had to juggle the domain and path values of the received cookies in PHP (the latter turned out to be optional) so that they would then be sent automatically along with the request for the PDF to the proxy: after this additional identifier request they arrived with domain=ieeexplore.ieee.org, while the link to the PDF pointed not to ieeexplore.ieee.org/query but to university.org/~user/proxy?url=ieeexplore.ieee.org%2Fquery, so they had to be rewritten to domain=.university.org, path=/~user/proxy.
Alteration of cookies received from IEEE
// config
$cookieDomain = '.university.org';
$cookiePath = '/~user';

$headerArray = explode("\r\n", $response['header']);
$js = '';
if (strpos($destinationURL, 'ieee.org') !== false) {
    foreach ($headerArray as $headerLine) {
        if (strpos($headerLine, 'Set-Cookie: ') !== false) {
            // rebuild the Set-Cookie line, substituting our own domain and path values
            $cookieArray = explode(': ', $headerLine, 2);
            $headerLine = $cookieArray[0].': ';
            $cookieDataArray = explode('; ', $cookieArray[1]);
            $isFirstKey = true;
            $js .= ' document.cookie = "';
            foreach ($cookieDataArray as $cdKey => $cookieData) {
                list($cname, $cvalue) = array_merge(explode('=', $cookieData), array(''));
                if ($cname === 'domain') {
                    $cvalue = $cookieDomain;
                    $cookieDataArray[$cdKey] = $cname.'='.$cvalue;
                }
                if ($cname === 'path') {
                    $cvalue = $cookiePath;
                    $cookieDataArray[$cdKey] = $cname.'='.$cvalue;
                }
                $headerLine .= ($isFirstKey ? '' : '; ').$cookieDataArray[$cdKey];
                $js .= ($isFirstKey ? '' : '; ').$cookieDataArray[$cdKey];
                $isFirstKey = false;
            }
            $js .= "\";\r\n";
        }
        header($headerLine);
    }
    if (strlen($js) > 0) {
        // insert JS code into the page to set the IEEE session and identification cookies
        echo "\r\n<script>\r\n".$js.'</script>';
    }
} else {
    // non-IEEE responses: pass the headers through unchanged
    foreach ($headerArray as $headerLine) {
        header($headerLine);
    }
}

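For illustration, here is what that rewriting does to a single Set-Cookie line (the cookie name and value here are made up); the browser will then attach such cookies to requests to university.org/~user/..., which is exactly where the proxied PDF link points:

<?php
// standalone demo of the rewrite above (cookie name and value are made up)
$line  = 'Set-Cookie: SESSION=abc123; domain=ieeexplore.ieee.org; path=/';
$parts = explode('; ', substr($line, strlen('Set-Cookie: ')));
foreach ($parts as $i => $part) {
    list($name) = explode('=', $part);
    if ($name === 'domain') { $parts[$i] = 'domain=.university.org'; }
    if ($name === 'path') { $parts[$i] = 'path=/~user'; }
}
echo 'Set-Cookie: '.implode('; ', $parts);
// prints: Set-Cookie: SESSION=abc123; domain=.university.org; path=/~user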

The last thing I want to point out is that this implementation has its weak points - for one, the Intranet session ID travels openly as a URL parameter.

And here is the full source code.

P.S. If someone knows a better and equally convenient solution, please share. You are welcome to take the code, use it, edit it, share it, and so on. In general, there is an idea to put it up on GitHub, and if such wishes appear in the comments, I will.
P.P.S. Don't judge the code too harshly - figuring out which requests had to be sent (especially for the IEEE DL) took more time than the code itself. It may also be incomplete: it does not always manage to substitute the correct link.

Source: https://habr.com/ru/post/192276/

