Many of us know that a website's appearance can be changed locally, without modifying any files on the server. There are various ways to do this; the most popular are user scripts and user styles that the browser applies automatically to the loaded page. Of course, the possibilities of such editing are quite limited, and it will not help with a serious problem that requires a non-standard query to the database.
However, even then things may not be entirely hopeless. In this article I want to describe my own experience of creating a local add-on for the phpBB2 forum engine that corrects the display of the read status of topics and posts. Although the attempt produced a perfectly workable tool that I now use all the time, the purpose of this article is not to present a product but to describe the approach to the problem. I am publishing the code of the resulting program, but because of the nature of the task it is not universal and cannot be used as is, without prior (and rather painstaking) tuning for the engine of a particular site. I am saying this up front so that nobody is disappointed.
Now for the problem statement in more detail.
Although the third version of phpBB has long been the current one, the second is still quite common on the Internet. Here is a list of the phpBB2 flaws that my tool is meant to fight:
- After logging in again or restarting the browser, as well as after a certain period of inactivity, all unread topics are automatically marked as read.
- When you open a topic, the whole topic is marked as read, no matter which page you opened.
- The read status is stored in cookies as a serialized array. The maximum size of a cookie is usually limited to 4 KB, which is enough to store the status of only about 140 topics. For an active forum this is very little.
Some may shrug: so what is the problem? The problem is that if you try to keep up with everything, the orange leaf icon (in the base skin) on unread topics and messages helps you quickly see which topics have been updated and which have not. Suppose you drop into the forum to quickly reply to one message and postpone reading the other updated topics until later. On your next visit the unread status will have been reset, and you will have to hunt for new messages manually, recalling when you last visited each specific topic and checking post dates or texts to find what has appeared since then. And even if you always read all the threads in one go, that is no guarantee: browsers sometimes crash, computers fail, and the power can go out (and not everyone has a UPS).
Of course, the most appropriate solution would be to modify the forum engine, but in practice this is not always possible. So I decided to try to solve the problem myself, on the client side. Naturally, a complete, ideal solution is out of the question without direct access to the database on the server. In particular, in most cases it is impossible to determine unambiguously whether a given sub-forum contains unread topics, in order to display its icon correctly on the main page (unless you download and analyze every page of its topic list, which takes a long time). However, some significant improvements are quite realistic.
The overall architecture of the add-on looks like this: a local database on the client machine stores the timestamp of the last visit for each topic, and a proxy server passes all forum pages through itself, correcting the marks on topics and messages according to the actual read status taken from the local database (and, of course, updating this database as needed).
As the database format I chose a text file made up of lines like "00000000000000". Each line corresponds to one topic: the line number (counted from zero) is the topic identifier, and the content of the line is the timestamp of the last visit in UNIX time format. Topic zero does not exist, so line zero is used in a special way: it stores the date and time for the forum as a whole, so that all forum topics can be marked as read at once. Thus, for each topic, the effective time of the last visit is the maximum of two values: the forum-wide one and the one for this specific topic.
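The record layout described above could be sketched roughly as follows. This is only an illustration in Python (the actual proxy is written in Perl); the function names are hypothetical, and the record width of 14 digits plus one newline byte is an assumption read off the sample line.

```python
RECORD_WIDTH = 14            # digits per record, as in the "00000000000000" sample (assumption)
LINE_LEN = RECORD_WIDTH + 1  # plus one "\n" byte; assumes Unix line endings


def read_stamp(f, topic_id):
    """Return the timestamp stored on line `topic_id` (0 if the line is missing)."""
    f.seek(topic_id * LINE_LEN)          # fixed-length lines allow a direct jump
    raw = f.read(RECORD_WIDTH)
    return int(raw.decode("ascii")) if len(raw) == RECORD_WIDTH else 0


def write_stamp(f, topic_id, stamp):
    """Overwrite the record for `topic_id` in place; the line must already exist."""
    f.seek(topic_id * LINE_LEN)
    f.write(str(stamp).zfill(RECORD_WIDTH).encode("ascii"))


def last_visit(f, topic_id):
    """Effective last-visit time: the maximum of the forum-wide stamp (line 0)
    and the stamp of the topic itself."""
    return max(read_stamp(f, 0), read_stamp(f, topic_id))
```

The file object is expected to be opened in binary read/write mode (for example `open(path, "r+b")`), so that byte offsets and line numbers correspond exactly.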
Why did I settle on a text format? First, it makes it easy to fix errors when necessary, without resorting to a binary editor. Second, my proxy server is written in Perl, where text files are more convenient to work with than binary ones. Parsing a number from a text string is not a resource-intensive operation, and thanks to the fixed line length you can jump directly to the desired record by index, without reading the whole file line by line.
As for the proxy server implementation, I chose Perl because it is very convenient for working with text (and HTML parsing is, of course, the key part). The resulting speed turned out to be quite acceptable by my standards (in any case, the potential performance gain from switching to another language seems less significant to me than the time and effort it would cost). The proxy server listens on its port and, on receiving a request from the browser, forwards the request to the target server, reads the response and sends the contents of the received page back to the browser. On its way, the page passes through a filter whose behavior depends on the target address. If the address belongs to one of the forums we need, and specifically to one of the scripts viewtopic.php, viewforum.php or index.php (or just the root URL of the forum), the filter parses the page and replaces the marks on topics and messages based on the date and time of the last visit taken from the local database. Otherwise the filter does nothing and simply passes the unmodified content on to the browser.
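The address check that decides whether a page goes through the filter might look something like the sketch below. It is written in Python for illustration only; the host name `forum.example.org` and both function names are hypothetical, and treating any path that ends in a slash as the forum root is a deliberate simplification.

```python
from urllib.parse import urlsplit

FORUM_HOSTS = {"forum.example.org"}  # hypothetical list of supported forums
HANDLED_SCRIPTS = {"viewtopic.php", "viewforum.php", "index.php"}


def needs_filtering(url):
    """True if the URL is one of the forum pages whose marks must be rewritten."""
    parts = urlsplit(url)
    if parts.hostname not in FORUM_HOSTS:
        return False
    script = parts.path.rsplit("/", 1)[-1]   # last path component, "" for a root URL
    return script == "" or script in HANDLED_SCRIPTS


def filter_page(url, body, rewrite):
    """Apply `rewrite` to forum pages only; pass everything else through untouched."""
    return rewrite(body) if needs_filtering(url) else body
```

Here `rewrite` stands for whatever function performs the actual HTML parsing and relabeling for the forum in question.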
The most difficult and unpredictable part is the parsing. The problem is that each specific forum may have different themes and extensions installed, so the HTML code of the pages received from the server varies widely. To account for the peculiarities of different forums, the main module relies only on the key structural features of the phpBB2 engine, while the specific marker strings are moved into separate modules that the proxy server loads at startup, so that each forum is processed according to its own set of rules. Of course, if you need to add support for a new forum, you will have to select and tune the marker strings manually by analyzing the HTML of the pages received from the server. If the changes to the engine are small, that will be all. But it may also turn out that the engine has been seriously reworked and the HTML structure is completely different. Then the entire main module will have to be redone, and there is no guarantee that multi-forum support can be kept at all. It may well be easier to maintain a separate version of the proxy server tailored specifically to one such "tricky" forum.
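To illustrate the idea of per-forum rule sets, here is a minimal sketch in Python. Everything in it is hypothetical: the `ForumRules` structure, the host name, the icon file names and the topic link pattern are assumptions chosen for the example, not the contents of the actual rule modules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ForumRules:
    """Marker strings for one forum (all values are illustrative assumptions)."""
    topic_id: re.Pattern   # finds the topic id inside a topic row
    read_icon: str         # icon file name used for read topics
    unread_icon: str       # icon file name used for unread topics


RULES = {
    "forum.example.org": ForumRules(
        topic_id=re.compile(r"viewtopic\.php\?t=(\d+)"),
        read_icon="folder.gif",
        unread_icon="folder_new.gif",
    ),
}


def relabel_row(row_html, rules, is_unread):
    """Swap the icon in one topic row; `is_unread` maps a topic id to True/False."""
    match = rules.topic_id.search(row_html)
    if match is None:
        return row_html                      # not a topic row, leave it alone
    if is_unread(int(match.group(1))):
        wanted, other = rules.unread_icon, rules.read_icon
    else:
        wanted, other = rules.read_icon, rules.unread_icon
    return row_html.replace(other, wanted)
```

With this layout, supporting another forum would mean adding one more entry to `RULES`, as long as the overall page structure stays close to stock phpBB2.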
For parsing to work properly, you will also have to change your forum profile. By default the engine outputs the date and time of messages without seconds, and since there is nowhere else to get the time from, the marks would be placed with an error of up to one minute, which is of course too much. Therefore you need to choose a more complete time format, with seconds, in your profile settings. I settled on the “D dmY, G:i:s” format, which displays the date in a form like “Mon 02/14/2011, 10:57:44 AM” (the proxy server does not need the day of the week, of course, but one should not forget one's own convenience either). If you prefer a different format, you will need to adjust the timestamp parsing function accordingly. Also keep in mind the phpBB modifications that insert the words "today" or "yesterday" instead of the date. Unfortunately, all of this requires further work on the code as well.
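A timestamp function for a date shown as in the example above could be sketched like this. It is a Python illustration under stated assumptions: the month/day/year order and the optional AM/PM suffix are taken from the sample string, the function name and the `utc_offset_hours` parameter are hypothetical, and "today"/"yesterday" substitutions are not handled.

```python
import calendar
import re

# Matches e.g. "02/14/2011, 10:57:44 AM"; the weekday prefix is simply ignored.
_STAMP = re.compile(
    r"(\d{2})/(\d{2})/(\d{4}), (\d{1,2}):(\d{2}):(\d{2})(?: (AM|PM))?"
)


def parse_post_time(text, utc_offset_hours=0):
    """Convert a displayed post time to UNIX time, or return None if not found.

    `utc_offset_hours` is the forum time zone relative to UTC (assumption:
    whole hours, taken from the user's profile settings).
    """
    match = _STAMP.search(text)
    if match is None:
        return None
    month, day, year, hour, minute, second = (int(g) for g in match.groups()[:6])
    suffix = match.group(7)
    if suffix == "PM" and hour < 12:
        hour += 12
    elif suffix == "AM" and hour == 12:
        hour = 0
    epoch = calendar.timegm((year, month, day, hour, minute, second))
    return epoch - utc_offset_hours * 3600
```

Returning `None` for unrecognized text lets the caller leave the original mark untouched instead of guessing.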
Finally, a few more details are worth mentioning.
- Perhaps the main problem is that using such a proxy server on a permanent basis is extremely inconvenient. A slowdown while working with a single forum can be tolerated, but putting up with it in everyday browsing would be very unpleasant. I solved this for myself as follows: I have long used Proxomitron as an ad blocker and as a way to add all sorts of extras. Among other things, it can use different proxy servers depending on the requested address. I simply created rules in it so that the forums I need are loaded through my proxy (and not entirely, but only the pages that are processed), while everything else goes directly, so this drawback causes me no inconvenience. Those who do not use Proxomitron can look for alternatives or simply set up quick manual proxy switching in the browser (if, of course, the browser offers that).
- I did not want to implement a full-fledged HTTP server or look for a ready-made implementation, so I limited myself to a simple, home-grown and rather limited version (only HTTP/1.0 is supported, with gzip compression forcibly disabled). This is enough for me personally, but it may not be enough for someone else.
- phpBB has another problem that has worn me down quite a bit. If I open or refresh several forum pages at the same time, the engine logs me out of the forum (which, combined with the reset of the read status of topics, was simply infuriating). I have not yet figured out the reasons for this behavior, but my proxy server happened to provide a way around it. Initially it was multithreaded, so that processing one page did not slow down the processing of others. I commented out the code responsible for multithreading and got slower but sequential page processing, so now I can open a bunch of links in background tabs at once, almost without worrying about being logged out (this is not fatal, of course, but I do not feel like entering my login and password yet again and then refreshing a bunch of already opened pages). This solution has its drawbacks, of course: if one of the pages hangs for some reason, the rest will not load, and you will have to either wait for the timeout or restart the proxy server. And when working with several forums, it is not right for one to wait while another is loading. A possible solution is to run several independent threads, one per forum, or to launch several proxy server instances on different ports. Just in case, I did not delete the multithreading code but only commented it out, so you can uncomment it and use the normal multithreaded version.
- The proxy server does not keep track of the number of lines in the database and does not add new ones as needed (to be honest, I was simply too lazy to implement that), so you first have to create a file filled with zero lines for it. The number of lines depends on the forum (more precisely, on the number of topics in it). To avoid problems, it is better to create the file with a generous margin right away.
- The proxy server also supports the "View a single message" phpBB mod ( viewpoint.php script). When a single post is viewed, the visit time of the topic is not updated in the database, because otherwise it would be easy to miss unread messages that precede the one being viewed. If someone thinks this is wrong, the code is easy to change by removing one if.
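Pre-filling the database file with zero lines, as mentioned in the list above, takes only a few lines of code. The sketch below is in Python and makes its own assumptions: the function name `create_db` is hypothetical, and the default width of 14 digits is read off the "00000000000000" sample line.

```python
def create_db(path, topic_count, width=14):
    """Create a database file with room for topics 1..topic_count.

    Line 0 holds the forum-wide timestamp, so topic_count + 1 lines are written.
    Mode "xb" makes the call fail if the file already exists, so an existing
    database is never overwritten by accident.
    """
    line = b"0" * width + b"\n"
    with open(path, "xb") as f:
        for _ in range(topic_count + 1):
            f.write(line)
```

For a forum with, say, 30,000 topics, calling `create_db("forum.db", 100000)` would leave a comfortable margin at a cost of about 1.5 MB of disk space.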
Well, if all this complexity has not scared you off yet, the server itself can be downloaded from here (archive, 5 KB). It includes a set of rules for the official Total Commander forum, which I used as the basis for my experiments and for the sake of which, mostly, this whole affair was started.