From time to time, so as not to gather dust, I try to build interesting things that could make someone's life easier. I try to make sure they are more useful than a social network for cats. One recent example is a Telegram bot that lets you find known Wi-Fi access points near given coordinates and see their passwords.
This time was no exception: I decided to create a bot that would let you watch your favorite movies and TV shows with maximum comfort and minimum effort, and that would also offer the content in several voice-over versions. No sooner said than done. Now that this iron friend of man is happily serving users their favorite shows, I would like to talk about what accompanied the bot's creation, which problems stood in my way, and how they were solved. The first chapter covers Go through the eyes of a PHP developer, the second covers the search for Zen while parsing Kinopoisk, and the third covers an undocumented feature of Telegraph.

1. $alexander->useLanguage(GOLANG);
My name is Alexander, I'm 21 years old. I do web development and most often write in PHP.
I cannot say that PHP is the language of dreams. Like any other language, it has strengths and weaknesses. However, I began to notice that I was growing tired of developing in PHP: of its teething problems, such as similar functions that take similar arguments in a different order, of its not always predictable behavior and, of course, of its weak typing. So for the next product I decided to use Go. When I started, this is what I knew about it:
- Strong typing
- Not very many keywords
- Goroutines - convenient parallelism out of the box
- The language is said to be simple and predictable
I had also once leafed through a Go book out of boredom. At first everything felt very unusual... well, for the first 3-5 hours. Yes, the barrier to entry is very low. The lack of magic, the small number of keywords, and the predictable behavior do their job: if you already know any programming language, getting into Go will most likely not take you long. One important caveat: if you have spent three years building landing pages and your experience ends there, I take my words back. A predictable language and strong typing let you write a large amount of code without compiling a binary to run and check it. Of course, runtime errors still happen, but after PHP it is a breath of fresh air: you know that you made the mistake yourself, and the cause is not some non-obvious behavior of the language.
Code organization in Go is simple: "Here is a directory; it also serves as the namespace, by the way. Keep everything here." And... it works. It is so simple to design and maintain that tears of happiness come on their own. To be honest, I do not know how large a project can get with this approach. I looked through the repositories of several large libraries and they look sane, but I cannot speak about maintaining them. Subjectively, a PHP code base of the same size is harder to maintain than a Go one.
To be fair, convenient and obvious handling of arrays (slices) is not one of Go's strengths. From Go's point of view everything is logical, but from a human point of view it is a bit strange. This topic is covered in more detail here.
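One illustration of the kind of slice behavior that tends to surprise newcomers (this is a sketch of my choosing, not the example from the linked article, and the function name `appendAliased` is made up): a sub-slice shares its backing array with the slice it was cut from, so `append` can silently overwrite the original's elements.

```go
package main

import "fmt"

// appendAliased shows that a sub-slice shares the backing array of the
// slice it was cut from. The function name is hypothetical.
func appendAliased() (orig, sub []int) {
	orig = []int{1, 2, 3, 4}
	sub = orig[:2]        // len 2, cap 4: same backing array as orig
	sub = append(sub, 99) // fits into the spare capacity, so it overwrites orig[2]
	return orig, sub
}

func main() {
	orig, sub := appendAliased()
	fmt.Println(orig) // [1 2 99 4]
	fmt.Println(sub)  // [1 2 99]
}
```

Logical once you know that a slice is a view onto an array with a length and a capacity, but not what most people expect from "appending to a copy".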
For parallel work Go uses goroutines (lightweight threads), while in PHP it is common to use forks (processes). My project does not have many places where goroutines apply, but where they are used, they look so logical and simple that I have no desire to go back to forks. Since forked processes are independent of each other, they usually communicate through a third party such as Redis or Memcache. Go solves the same problem with channels, a part of the language that is available out of the box. Just think about it: parallel work out of the box, with synchronization support included. I never even dreamed of such a thing before. I do not think I am asking too much of PHP, because parallel tasks are common in modern backend development. Nor do I want to say that Go is a cure for all of mankind's problems, but after PHP, solving similar tasks in Go brings not only a result but also pleasure.
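To make the goroutines-plus-channels idea concrete, here is a minimal sketch, not code from the bot. The function name `sumOfSquares` and the workload are invented for illustration: a few worker goroutines receive jobs over one channel and hand results back over another, with no external broker involved.

```go
package main

import (
	"fmt"
	"sync"
)

// sumOfSquares fans the inputs out to a fixed number of goroutines and
// collects their results over a channel. Names and workload are illustrative.
func sumOfSquares(inputs []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs { // ends when jobs is closed
				results <- n * n
			}
		}()
	}

	// Feed the jobs, then close the channel so the workers stop.
	go func() {
		for _, n := range inputs {
			jobs <- n
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3, 4}, 2)) // 30
}
```

The channels carry both the data and the synchronization: the final loop cannot finish until every worker is done and `results` has been closed.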
2. Alexander.NeedInfo()
At some point, the API I had been using to get movie information from Kinopoisk stopped working.
And, apparently, for good. So I decided to write my own Kinopoisk parser (guys from the Kinopoisk team, please do not throw slippers at me; better make a public API).
v1 - Lonely Hero
The first implementation was simple and naive: a lonely PHP script lived in the project, and on each call it took the address of a random proxy server from a queue and sent the request for the film to Kinopoisk through it. The page itself was also parsed in PHP. Because the lonely hero did not use cookies, Kinopoisk banned each address (started showing a captcha) after a single request, and on top of that not all proxy servers were fast.
It would seem enough to add cookie support and be done with it. However, I noticed that even with cookies Kinopoisk showed a captcha to my parser more often than it showed one to me in the browser. I decided not to dig deeper into how Kinopoisk protects itself from parsing, because it was starting to smell like JS code executed on the client.
v2 - Full client
The next version of the parser was a web server written in Go that, on a GET request, launched PhantomJS with the necessary parameters and the film ID passed in. It worked. I no longer needed proxy servers; I went to Kinopoisk directly from my own IP. I had session support, a full browser and, on the whole, everything was convenient. But it was very slow: PhantomJS honestly waited until all static assets had loaded and all the necessary JS code had run. Besides being slow, it was very expensive in terms of resources: parsing one page took 100-150 MB of RAM. What finally killed this version was PhantomJS's gluttony and instability. For example, its processes did not always terminate; they stayed hanging and never freed their memory. I tried different versions of PhantomJS and tried killing its processes from the web server that launched them, but the result was always the same: yes, it works, but it is voracious and unstable, although, of course, convenient.
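A rough sketch of what such a handler could look like; this is a reconstruction under assumptions, not the original code. The script name `parse.js`, the `id` query parameter, the 30-second limit, and the helper `phantomArgs` are all hypothetical. `exec.CommandContext` kills the child process when the deadline passes, which is one way to attempt the cleanup described above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os/exec"
	"strconv"
	"time"
)

// phantomArgs validates the film ID and builds the PhantomJS argument list.
// The script name is supplied by the caller; "parse.js" below is hypothetical.
func phantomArgs(script, filmID string) ([]string, error) {
	if _, err := strconv.ParseUint(filmID, 10, 64); err != nil {
		return nil, errors.New("film id must be a non-negative integer")
	}
	return []string{script, filmID}, nil
}

// handleFilm runs PhantomJS for the requested film and returns its stdout.
func handleFilm(w http.ResponseWriter, r *http.Request) {
	args, err := phantomArgs("parse.js", r.URL.Query().Get("id"))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// The child process is killed if it outlives the deadline.
	ctx, cancel := context.WithTimeout(r.Context(), 30*time.Second)
	defer cancel()

	out, err := exec.CommandContext(ctx, "phantomjs", args...).Output()
	if err != nil {
		http.Error(w, "render failed: "+err.Error(), http.StatusBadGateway)
		return
	}
	w.Write(out)
}

func main() {
	// A real server would call http.HandleFunc("/film", handleFilm) and
	// http.ListenAndServe. Here the handler is exercised in-process with
	// an invalid ID, so PhantomJS is never started.
	rec := httptest.NewRecorder()
	handleFilm(rec, httptest.NewRequest(http.MethodGet, "/film?id=abc", nil))
	fmt.Println(rec.Code) // 400
}
```

Validating the ID before it reaches the command line matters here, since the value comes straight from the request.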
v99
In my search for the Holy Grail of Kinopoisk parsing, I lost count of how many parsers and modifications I had created, so I simply called the next version the ninety-ninth. It was written in PHP. I used Guzzle (an HTTP client for PHP), kept a session, and tried to behave as much like a user's browser as possible. I gave up on JS support. Captchas still appear, of course, but much less often than with the first version of the parser, and in principle this option can be called comfortable. This is the version I settled on.
I also know that Kinopoisk can provide access to its API on request, but I did not consider this option: even if I had access, it would be a potential point of failure, because access can be revoked at any time.
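The v99 parser itself was written in PHP with Guzzle; the sketch below only illustrates the same idea (a persistent cookie session plus browser-like headers) in Go, to stay in one language with the other examples. The function names, header values, and URL are placeholders, not what the real parser sends.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"time"
)

// newSessionClient returns an HTTP client that keeps cookies between
// requests, so consecutive requests look like one browser session.
func newSessionClient() (*http.Client, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	return &http.Client{Jar: jar, Timeout: 15 * time.Second}, nil
}

// newBrowserRequest builds a GET request with browser-like headers.
// The header values are placeholders chosen for illustration.
func newBrowserRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0")
	req.Header.Set("Accept", "text/html,application/xhtml+xml")
	req.Header.Set("Accept-Language", "ru-RU,ru;q=0.9,en;q=0.8")
	return req, nil
}

func main() {
	client, err := newSessionClient()
	if err != nil {
		panic(err)
	}
	req, err := newBrowserRequest("https://example.com/film/326/")
	if err != nil {
		panic(err)
	}
	// client.Do(req) would send the request; cookies set by the response
	// are stored in the jar and replayed on later requests to that host.
	fmt.Println(client.Jar != nil, req.Header.Get("Accept-Language"))
}
```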
3. Video.Publish()
After the war with Kinopoisk, I found myself ready to give the user a link to a video but with nowhere to play it: the Telegram Bot API does not provide convenient functionality for showing a video by link, and I did not want to register a domain, host anything other than the bot and the parser, or get into front-end development.
What to do?
We will publish the video somewhere. After a little thought, I decided that Telegraph could well pass for "somewhere". A site that is de facto used to publish articles from Telegram? Just what I need! One problem: you cannot publish a video by link (except from YouTube or Vimeo).
What if we look closer?
Seeing how easily and dynamically blocks are created on the page, and how an article is published with a single button press, you can't help wondering how it works, especially if you are looking for a place to publish content. I decided to find out.
And what did I see?
[
  { "tag": "p", "children": ["Story"] },
  { "tag": "p", "children": [{ "tag": "br" }] },
  { "tag": "figure", "children": [
    { "tag": "div", "attrs": { "class": "figure_wrapper" }, "children": [
      { "tag": "img", "attrs": { "src": "/file/a2e8087fbc53679c14fa1.jpg" } }
    ] },
    { "tag": "figcaption", "children": ["Pff"] }
  ] },
  { "tag": "p", "children": [{ "tag": "br" }] }
]
The POST request for publishing contains JSON that looks suspiciously like HTML markup. What if we try to add a video tag, following the structure we have? Let's try. A little patience, and we get this...
Structure
[
  { "tag": "p", "children": ["Story"] },
  { "tag": "p", "children": [{ "tag": "br" }] },
  { "tag": "figure", "children": [
    { "tag": "div", "attrs": { "class": "figure_wrapper" }, "children": [
      { "tag": "img", "attrs": { "src": "/file/a2e8087fbc53679c14fa1.jpg" } }
    ] },
    { "tag": "figcaption", "children": ["Pff"] }
  ] },
  { "tag": "p", "children": [
    { "tag": "video", "attrs": { "src": "https://www.w3schools.com/html/mov_bbb.mp4" } }
  ] }
]
If you send a POST request for editing with the structure above, an arbitrary video will be added to the publication by link. Just what was needed.
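The node structure above is easy to generate from code. The sketch below builds a paragraph containing a video node and serializes it. The type `Node` and the helpers `videoParagraph` and `pageContent` are names I made up, modeled only on the JSON shown above. How the edit request is addressed and authenticated is left out, since this article does not describe it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node mirrors the element objects seen in the captured request:
// a tag name, optional attributes, and optional children.
// Children may be plain strings or nested nodes.
type Node struct {
	Tag      string            `json:"tag"`
	Attrs    map[string]string `json:"attrs,omitempty"`
	Children []interface{}     `json:"children,omitempty"`
}

// videoParagraph wraps a video node pointing at src in a paragraph node.
func videoParagraph(src string) Node {
	return Node{
		Tag: "p",
		Children: []interface{}{
			Node{Tag: "video", Attrs: map[string]string{"src": src}},
		},
	}
}

// pageContent serializes a list of nodes into a JSON array of the same
// shape as the one carried by the publishing request.
func pageContent(nodes ...Node) (string, error) {
	b, err := json.Marshal(nodes)
	return string(b), err
}

func main() {
	content, err := pageContent(
		Node{Tag: "p", Children: []interface{}{"Story"}},
		videoParagraph("https://www.w3schools.com/html/mov_bbb.mp4"),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(content)
}
```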
Not so fast
Everything works, and there are no problems with that. The trouble is that most attributes are not supported, so you can forget about subtitles or, say, a video poster. In other words, the solution falls into the category of "be thankful there is anything at all". Without thinking twice, I decided to try XSS in order to customize the player. Normal development probably ends somewhere around here, but there was nowhere to retreat: the video had to be published somehow. I tried different ways of injecting third-party code into the page, even through a picture, but it was all in vain, and Telegraph held out heroically. Then again, I am not an information security expert. Perhaps, had I spent more time, I would have found a working XSS for Telegraph, which I would have used exclusively for customizing the player; however, I dropped the idea. I tried several more sites for publishing my content, but everywhere something was missing or did not work. So in the end I did implement the video player on my own side...
However, that is another story.
P.S. If Telegraph developers are reading this article: please add publishing a video by link to the interface, since the functionality is already there.