Sometimes you need to browse the pages of a site that has suddenly gone down or shut down for good, and Google's search cache has long been the tool of choice here. The one problem is that "browsing" this way turns into sheer torture: view a page, copy the address of the link you want to follow, paste it into the search box, and prepend the "cache:" operator. That is far too many steps for a single link. For the impatient, here is the link to my solution:
GCB 2.0.
Google Cache Browser 1.0 and its problems
Several years ago I tried to solve this problem and even built a small Google Cache Browser service that worked on the proxy principle: it downloaded the cached page, rewrote all the links in it so that they pointed back to the service itself, and served the result to the user's browser (the idea is sketched after the list below). The service, however, had several significant flaws:
- It consumed a fair amount of traffic.
- It regularly ran into Google's ban.
- It put a noticeable load on the CPU (applying regular expressions to large pages is a thankless task).
- Despite all my tricks, it did not rewrite every link. On some sites the links were constructed in ways that made you wonder; the validity of the HTML was not even worth discussing.
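For the curious, the 1.0 rewriting step was roughly this kind of thing. This is a minimal JavaScript sketch of the idea rather than the original server code; the proxy URL shape is made up for illustration:

```js
// Sketch of the 1.0 "proxy" idea (illustrative, not the original code):
// take the HTML of a cached page and rewrite every absolute link so that
// it points back at the proxy service.
function rewriteLinks(html, proxyBase) {
  // Naive regex rewriting -- the CPU-hungry part that creative markup
  // could easily defeat.
  return html.replace(/href="(https?:\/\/[^"]+)"/g, function (match, url) {
    return 'href="' + proxyBase + '?url=' + encodeURIComponent(url) + '"';
  });
}

// rewriteLinks('<a href="http://example.org/page">x</a>', 'http://proxy.example/view')
// => '<a href="http://proxy.example/view?url=http%3A%2F%2Fexample.org%2Fpage">x</a>'
```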
As a result, the service gradually fell into disuse, and at some point I simply did not renew the domain.
Google Cache Browser 2.0 and JS-Fu
After that, my thoughts kept returning to the problems that sank the service, and they mostly revolved around how nice it would be to move all of that processing to the client side: the browser is far better suited to manipulating the contents of a web page than regular expressions are. And just recently I found a way to do it!
The main obstacle was that, for my purposes, I had to run my JavaScript in the context of the domain webcache.googleusercontent.com. About a week ago I noticed that cached pages still load and execute their scripts, and not the cached copies but the current versions from the original site. From that point on, all that remained was to get a suitable page with an attached script into Google's cache and start working in the context of Google's domain.
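In other words, roughly the following (a minimal sketch under my own assumptions; the file names, example.org, and the exact cache URL form are illustrative, not taken from the actual service). You let Google index and cache an entry page that references your script by an absolute URL, so even the cached copy pulls the live script from your server:

```html
<!-- entry.html: the page you let Google index and cache. The absolute
     script URL means the cached copy still loads the *live* bootstrap.js
     from your own server, which then runs on webcache.googleusercontent.com. -->
<!DOCTYPE html>
<html>
  <head><meta charset="utf-8"><title>Cache browser entry point</title></head>
  <body>
    <p>Loading…</p>
    <script src="http://example.org/gcb/bootstrap.js"></script>
  </body>
</html>
```

Once that script is executing on Google's domain, it is same-origin with the cache, so it can fetch further cached pages itself and process their links through the DOM instead of regular expressions. A sketch of such a bootstrap:

```js
// bootstrap.js (hypothetical name): runs in the context of
// webcache.googleusercontent.com once the cached entry page loads it.
(function () {
  'use strict';

  // Build the cache URL for an arbitrary page of the unreachable site.
  function cacheUrl(target) {
    return 'http://webcache.googleusercontent.com/search?q=cache:' +
           encodeURIComponent(target);
  }

  // Same origin as the cache, so a plain XMLHttpRequest works.
  function loadCached(target, onDone) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', cacheUrl(target), true);
    xhr.onreadystatechange = function () {
      if (xhr.readyState === 4 && xhr.status === 200) {
        onDone(xhr.responseText);
      }
    };
    xhr.send();
  }

  loadCached('http://example.org/some/page.html', function (html) {
    // Naive rendering, purely for illustration; the real work is walking
    // the fetched document and fixing up its links via the DOM.
    document.open();
    document.write(html);
    document.close();
  });
})();
```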
All of this coincided nicely with SOPA and the temporary blackout of good sites like Wikipedia, so last night I sat down and finished the service: it now runs entirely in the browser (not a single server-side script) in recent versions of Firefox, Chrome, Opera, and in IE8. I did not have time to test other browsers, so send bug reports! :-)
And one last good thing: I have published the full source code of the service on GitHub under the terms of the GPLv3. Feel free to fork!
Results
Humanity is blessed with the opportunity to at least read Wikipedia for today, and I got a lot of pleasure out of exercising my half-forgotten JS-Fu, since at work I spend most of my time on the server side.
ToDo
As usual, there are still plenty of improvements that would complement the service nicely. Here are the most interesting ones:
- Make a bookmarklet for the service. The same bookmarklet could both jump to the cached version of the current page and inject the service's functionality into a cached page that is already open (a sketch of the redirect half follows this list).
- Iron out some sporadic layout glitches.
- Guard against the entry-point page being dropped from Google's index.
- Test cross-browser behavior more thoroughly.
- Possibly move the service to a separate, nicer domain.
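A possible sketch of the redirect half of such a bookmarklet (my own guess at how it might look, not code from the repository):

```js
// Bookmarklet sketch: installed as a single-line javascript: URL, shown
// expanded here. It sends the browser to Google's cached copy of the
// currently open page.
javascript:(function () {
  location.href = 'http://webcache.googleusercontent.com/search?q=cache:' +
                  encodeURIComponent(location.href);
})();
```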
By the way, you can easily lend a hand with this: the source code is on GitHub ;-)