📜 ⬆️ ⬇️

How I made a news aggregator

Today I want to share with the community about the history of the development of rating news aggregator TOP.ST.



Start


It all began in the New Year holidays, when the rest is already tired, and before the start of working days there was still plenty of time. As usual, various ideas came to my mind and I wanted to implement one of them. And not in a way to sketch out a concept and abandon (it is already difficult for me to count the number of such undertakings), but in order to go all the way from scratch to launch, without delaying it indefinitely.
')
The idea of ​​a rating news aggregator is not new. The network has both successful projects and not so much. All with their pluses and minuses. I love to read the news, I like to read them fresh, so I often used similar projects and gradually formed a vision of how I would implement this idea myself. And I started.

First of all, I promised myself that I would not delay the development, as is often the case with side-projects, and I will launch a working prototype as soon as possible. For this, it was necessary to use the most familiar stack of technologies in order not to waste time on “entering”. At that time, it was PHP + Phalcon Framework as a backend, MySQL as a database, Beanstalkd as a queue manager.

I started by developing an application architecture plan. As a result, it consisted of 4 parts practically independent of each other:


All these parts interact with each other through the queue manager Beanstalkd and are combined into a single virtual network through OpenVPN.

The process has started


On the very first day of the work on the scripts, the first disappointment hit me. I ran into serious problems with performance and memory leaks, on seemingly simple operations of parsing RSS feeds, parsing HTML and some related tasks. As it turned out, many (even popular) third-party libraries are still not ready to work in daemon mode. When the script works according to the scheme: it started, worked, gave content and died - everything is fine. But with long work, memory leaks and sudden drops start to manifest themselves. Then I realized that if I start to deal with all of this, I’ll get stuck for a long time, but there’s no time for that and I uncovered my node.js.

I had quite a lot of experience working with node.js, but it turned out that by that time I didn’t write on it for a little more than half a year, so I had to go back to the topic. Contrary to my fears, in two days I was again completely imbued with its atmosphere, as if this break had not taken place and things were turning around with new turns.

Design


The design is very difficult for me, because I am not a designer. I believe that I have a good sense of taste, I can distinguish a good interface from a bad one, I love it when everything is done beautifully, and even somewhere in the back of my head I see “how it should be”. But the removal of this vision beyond the head is given to me with great difficulty. But the design needed to be made and wanted to be made good. I spent a lot of time trying to create a "light" pleasant interface. And at that time he even seemed to me beautiful. But now I remember him with horror. The first version looked like this:



After the launch of the first version, when my head got a little cold, I looked at “this” once more and realized that it looks awful and even I myself would not have used it. And I started from the beginning. I don’t know how it happened, but I liked what I got and like so far. The site has received a new face and now it is it that meets visitors.

First working version


It took me about a month and a half to create the first working version, which could already be quietly released to the light. By that time, I had already purchased a domain and had free hosting, as I thought, of suitable capacity. After deploying the system and collecting primary information, I started debugging. It took about two weeks to identify the largest pitfalls, debugging algorithms and collecting data sources.
The site began to work, but the power was quickly missed. VPS with 4Gb of memory and 4 cores of CPU obviously did not cope with its task anymore. And I had a choice: to optimize or add capacity. Optimization is a long, laborious and not always predictable process. Looking ahead, I’m saying that I’m still engaged in optimization and am still engaged. As a result, it was possible to greatly reduce the overall load on the system, but that server would still not be enough, so then I made a decision - I need more mineral resources.

Moving


Prices for servers with the characteristics I needed at that time seemed to me not to be justified. In simple words: I was so sorry to pay so much and I began to look for a way out of the situation.

Since the main load on the server was created by the database and the part of the system responsible for basic calculations and task scheduling, I decided to take it to the server to my home. At that time, I had a MacBook Air 2013 with a Core i7 and 8Gb of memory on board, which I did not use due to switching to a MacBook Pro Retina. I deployed a server on it and connected via VPN to the rest of the network. Fortunately, at home I had a stable Internet, the speed of which was enough for the normal execution of tasks. MacBook Air, thanks in large part to its fast SSD, began to cope perfectly with the task.

It was not so painful to make changes to the database structure, which by that time had already grown over several gigabytes, well, in general, physical access to the server added conveniences, including faster creation of backups. So the site worked for about a month, slowly finishing and improving. The increase in the amount of data processed led to the fact that the load on the MacBook Air has increased significantly. He still managed to cope with it without any problems, but he began to warm up and make noise with the fan accordingly. I felt sorry for him, all the same, he was not intended for such tasks, although he withstood them with dignity.

At this point, I decided to transfer the server to the machine more suitable for this purpose. There was a temptation to buy powerful desktop hardware, but the fear of electricity outages led me to buy a powerful laptop. The choice fell on a laptop company DNS. These laptops can not boast a good exterior: the screen, keyboard, case leaves much to be desired, but the price / filling ratio is very good. I took a model with a 4-core Core i7 and 8Gb of memory. I immediately expanded the memory to 16Gb and inserted the Kingston KC300 SSD to 180Gb. It cost me 35,000 rubles, which, after the fall of the ruble, already seemed to be quite a budget decision.

The server has been working properly for about four months in 24x7 mode and completely copes with its task. In order to cool the system a bit, I installed it between the speakers, so that air could circulate freely under the bottom of the laptop and lowered the processor frequency slightly. In this mode, even in summer, the processor temperature does not rise above 65 degrees, which is quite a good indicator.

During this time, I came across several short-term outages of electricity and the Internet, but since the work of the site does not depend directly on this server, it does not interfere much with visitors. This is manifested only in the fact that the data cease to be updated for the idle time, which is not very critical at short intervals and at this stage you can live with it.

Normal flight


At the moment, the service monitors publications in 28 countries. The interface is translated into 10 languages, the language is selected based on the browser settings. The website design is responsive, it looks equally good on a desktop computer, tablet and mobile phone. For mobile devices, it is possible to save a shortcut for the site to work in application mode. The data on the site is updated in real time, and the interface is implemented on AngularJS.



The service has become a member of Microsoft's BizSpark program. The Azure hosting provided by the program has greatly helped and continues to help in the development of the project, for which I am very grateful to Microsoft. BizSpark is a really valuable contribution to the development of start-up projects.

Lecture hall


The very first announcement of the site was published in the site Site of the day portal Ferra.ru. I also several times laid out a link to Reddit, HackerNews, Twitter and took advantage of the Zuckerberg project Stand. Call. They took the service quite well and gave a few valuable tips. At the moment this is all website promotion. Over time, several reviews of the site appeared on foreign resources, and from there came the first regular visitors not from Russia. A few days ago, a link to the site hit the top of the portal DesignerNews.

In general, analytics show that visitors like the project. In recent months, the average duration of a user’s session on the site was 30 minutes, the bounce rate fluctuates around 10-12%, and the number of returned users is 65-70%. At the same time, 55% of traffic is direct, 35% are links from reviews and blogs. The remaining 20% ​​are social networks and search. Interested users return to the site and turn into regular visitors, which is very pleasing. Although in absolute numbers there are still less visitors than we would like.

Plans


There are many plans to improve the functionality and I slowly implement them, unfortunately not as fast as I would like, because I am engaged in the project in my spare time.

In terms of internal architecture, I first of all want to transfer the entire system to use Docker containers, refactor parts of the code that stopped me arranging, set up a centralized collection of logs from all working parts of the project and, of course, improve the ranking algorithm and its speed.

Of the visible changes, I plan to introduce categories or tags so that you can navigate through news topics, add the ability to subscribe to news digests, finally solve the problem of broken and duplicate links and try to introduce the possibility of commenting.

With the latter, the hardest thing is to figure out how to control this opportunity so that the trolls and other inadequate people will not arrange there another place for the sracha.

In dreams, there is learning how to combine news of one subject into plots, but so far this has been postponed until better times.

Mobile applications in the plans and they will appear as soon as there is an obvious need for them among the audience.

I noticed that the speed of development of the project is directly proportional to the feedback from users. Every time I see a link to a site in a blog, a review on a news resource, or even a letter from a satisfied user in the mail, the work starts moving at a new speed. Feedback is indeed a very important motivating factor.

Conclusion


I'll be brief, because article and so it turned out more than I expected. Thanks to everyone who read it. I will try to answer all the questions in the comments, who do not have such an opportunity, can feel free to write to mail@top.st. Well welcome .

Source: https://habr.com/ru/post/259421/


All Articles