
Media Restart: Overview



I worked (fb) at the online publication Lenta.ru, where I went from developer to technical director and successfully carried out a full restart, doing similar projects on a smaller scale along the way. My team and I are now preparing the restart of the online newspaper Vedomosti (fb).

I'll tell you about developing media publication projects. It will be a gallop across Europe, touching on the main topics. Dear readers, please point out the questions that should be covered in more detail. For example, my colleague plans to write about deploying the system and the fault-tolerant scheme of the site.



Technology


One of the questions that interests the publication's management is which language the project will be developed in. The question arises for purely practical reasons: the cost of developers. The answers are fairly simple.

First, the choice of language depends on how well the lead developers know it. If your developers work in language A, it is foolish to require them to know language B. Replacing lead developers is not an easy task.

Second, there are popular languages on the market: PHP, Python, Ruby, and others that are less popular. There is a misconception that some are cheaper than others, or that developers in one language are easier to find than developers in another. The fallacy is that it is very hard to compare the level of professionalism of developers across languages. Things get even more confusing when you compare each candidate's experience with the pay they ask for. As a result, a suitable developer is equally hard to find for any programming language: either there are many candidates but all with low qualifications, or they ask for a lot, or they are specialists of a different profile, and so on.

There is one more factor in choosing a development platform. Publications are usually part of some fairly large holding, and depending on how developed its corporate culture is, a particular decision may be imposed on the publication.

At one time, Lenta.ru switched from Perl to Ruby, while Rambler used Python. Now Vedomosti is switching from PHP to Ruby, although PHP was not imposed on us.

The choice of related tools, such as the operating system or the database, usually does not bother anyone.

Ultimately, it does not matter to the publication what language the project is written in or what libraries and services it uses. What matters more is that the team building the project has professional command of its tools.

Formulation of the problem


A full restart is a long process. But that is no reason to relax at the beginning and work at the limit at the end. Managers define the goals they want to achieve and the tasks they want to accomplish. An editorial office is dozens of people, and the publication's entire staff can easily exceed a hundred. If the restart implies significant changes, a very large number of people will have to change how they work.

To do something well, you need to step back from the current state, model the ideal world, and only then build a strategy for moving from one state to the other. Otherwise you risk walking in circles around the old well. This problem is solved by the editor-in-chief and the publication's management. By analyzing statistics, studying best practices, and setting ambitious goals, they form a concept.

As a result, designers get the task of visualizing the publication's new face. Developers work out a strategy for implementing the idea. The difficulty is that, among other things, a tool must be created for each department: editorial, commercial, marketing, support. That is, besides building the public site in its various forms, you need to build management systems for the editorial office, the commercial department, marketing, and the support service.

In reality, every detail of the public site will be thought through by the editor-in-chief and drawn by the designer. Building the content management systems is often left to the developers. This is not complete amateurism, of course: requirements are gathered and work scenarios are described, all in the best traditions of the clever books. But there is a lot of room for imagination.

Puzzle


When building a media publication project, a service-oriented architecture is convenient. It allows independent parts of the project to be developed in parallel. The main thing is to keep the big picture in view and fix the interfaces between services. System components can share common code, talk to each other over some protocol on top of a TCP connection, and use shared data storage.

The public site is almost always worth separating from all internal services. To run the site, you need an application that generates HTML pages or JSON for an API. Depending on the project's architecture, the application can connect directly to a conventional database (MySQL, PostgreSQL, MongoDB ...) or go through a service layer. Such an application scales easily. All internal services, including the content management systems, can be restricted by IP address, which is a point in favor of security.
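As a rough illustration of closing an internal service by IP address, a minimal nginx sketch might look like the following. The host name, the upstream port, and the addresses (taken from documentation-only ranges) are assumptions for the example, not values from the projects described here.

```nginx
# Sketch only: cms.example.internal, port 3000 and all IP addresses
# are hypothetical placeholders.
server {
    listen 80;
    server_name cms.example.internal;

    allow 192.0.2.0/24;    # assumed editorial office network
    allow 198.51.100.17;   # assumed VPN gateway
    deny  all;             # everyone else gets 403

    location / {
        proxy_pass http://127.0.0.1:3000;  # the internal CMS application
    }
}
```

The allow and deny rules are checked in order until the first match, so the final deny all rejects every address not listed above it.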

The choice of database depends on what data structures you need to store and retrieve.

For example, at Lenta.ru we used MongoDB for the public site. We needed to fetch documents with a complex tree structure. The main database was MySQL (later migrated to PostgreSQL), where everything was stored in normalized form. In MongoDB, the data was kept in a form convenient for consumption. To synchronize the databases, we used a background service that tracked changed data, built a new representation for it, and wrote it to MongoDB. The service was built on a queue in Redis that held messages about changes.

In the case of Vedomosti, we chose a different path. In addition to classic data types, PostgreSQL offers types such as array, hstore, and jsonb. They make it possible to simplify storing links between associated entities, dynamic attributes, and complex tree structures. Thus, within a single service we get normalized data that is also available in a form convenient for the public site, though of course with some compromise compared to MongoDB.

The editorial content management system (CMS) is the newsroom's main tool. It is used to create content, manage and organize the editorial process, and shape the picture of the day on the public site. It shares only the database with the public site. It is a standalone application accessible only after authorization. Even better, access is limited to a specific list of IP addresses. Development of the project begins with this system.

The statistics collection service has special performance requirements. To begin with, let's answer the question: why is it needed? Publications often need to collect statistics broken down by parameters that third-party services do not have. For example, Vedomosti uses subscription access, so there is a need to collect data for a specific group of users. We do not pass subscription status to third-party services. At the same time, we do not need to build a full replacement for public metrics services, only minimal functionality that covers analysts' requests.

Such a service is also split into components: collecting events into an intermediate queue, processing them and writing to a normalized database, and querying aggregated data. We use Go for the first two components, Redis for the intermediate queue, PostgreSQL for normalized data, and Ruby for querying aggregated data.

The media content storage service is also split out into an independent application. Its task is to accept a file, save it to disk, and return a data structure describing the necessary metadata: path, size, type. For images, the service should also generate versions in different dimensions. Something similar was described in an article on Habr. There are a few points to note regarding media content.

The first is access control for specific files. You may publish an article that uses some images. At some point you may need to make an image unavailable by its link without removing it from the internal photo bank. At Lenta, we used symbolic links. If a file was public, a link was created for it. If the file had to be hidden, including when a linked article was hidden, the symbolic link was deleted. At Vedomosti, the split into public and private images is implemented through file permissions.

The second is the content delivery network. Once upon a time, Lenta served video to users directly from its own servers. Then came the moment when a popular high-quality video saturated our gigabit link, and the website began to slow down. The right decision here is to use a CDN, since today it is a very affordable service. As a test, we at one point used a third-party CDN for static images. But at that time the practical cost was judged too high, and the finance people denied us that pleasure.

The third is image scaling. Depending on where it is used, an image is scaled into several versions. In media publications there is a dedicated person, the photo editor, who checks how the images look after resizing and cropping. If they do not like something, they need a tool that lets them replace a specific version of the image. Otherwise you can end up with women with their heads cropped off in the preview of a Miss World photo gallery.

Media content also includes audio and video. Converting video to different quality levels is best entrusted to a professional video platform.

It is useful to have a separate authorization service for site users. It is often closely tied to the commenting system.

Development process


Earlier, I wrote that you need to pay due attention to design. At the same time, you should not be afraid to write code. It is impossible to foresee everything and build a good system from scratch in one go. The system has to be developed iteratively. Write the planned functionality, analyze it, and be ready to rewrite it. This is normal. The time you spend writing code is much less than the time you spend thinking. Your code is not carved in stone.

A short paragraph: version control systems. For some reason, it is still important to draw attention to this.

The modern approach to development involves writing tests. The difficulty is that in publications with a legacy, managers have an established idea of development speed. And here you come, with fashionable technologies and the slogan that everything will now be easier. As a result, similar tasks take either the same time as before or even longer. You understand that tests are a good thing, and you try to convey this truth to the editor-in-chief. Most likely you will fail. Editors need a product; developers need tests. The fact that product quality requires spending time on tests simply has to be accepted as a given.

Do not change your working environment during the active phase of a development stage. When an operating system update comes out, developers who love being on the cutting edge often lose a day or more. It is like a soldier's health in the army: you have to be fit for duty. Your workstation must not fail just because a new version of your favorite OS came out yesterday.

A slightly odd piece of advice: besides your main work, find time for side projects. The goal is an insidious one: to try out new technologies and practices. It is not always possible to use various interesting solutions in the production system. Do it in small side projects.

Dynamic programming languages allow extensive use of metaprogramming. This can make life much harder in the future for both you and your colleagues. Where possible, prefer simple code to magic spells.

Monitoring and profiling


Editors use various metrics to analyze how well they are doing. Developers should take care of profiling and monitoring tools for their applications in advance. Running a system without tracking its state is very dangerous. It is not only about memory consumption and CPU time. To build a good application, you need to track the program's entire stack. Take the time to set up such an environment.

Database queries may change as project requirements change. Here again profiling comes to the rescue: you will quickly learn what needs to be optimized.

For load testing, prepare data that is as close to reality as possible. Run the tests over a long period and learn your application's weak points.

Process of moving


Moving from the old platform to the new one involves both the editorial office and the archive. The editors must restructure their processes and get used to the new content management system. The archive must be transferred with referential integrity preserved.

For the editorial office there are three stages. In the first stage, they test the CMS in free mode. At this time something may break or change on request. The second stage begins after the final import of all content into the new platform. The new site is not yet publicly available, and the editors work in two systems, the old and the new. This is because most often there is no backward compatibility between the old and new content structures. This period usually lasts a week. The third stage begins at the moment of public restart, when everyone is happy and the editors work on the new platform. The old site stops working.

Moving the archive takes quite a long time to prepare. Each import run reveals some bugs, and after they are fixed the import starts from the beginning. Articles may change their addresses. In this case it is considered good practice for requests to the old links to redirect to the new address. To do this, you need to prepare a routing table in advance. It will also be needed in the future: there are situations when a page's address has to change while requests to the old address must still redirect to the new one. You simply keep a list of associated addresses, one of which is marked as the main one.
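One possible way to serve such a routing table is an nginx map, sketched below. This is an illustration under assumptions, not the configuration used at Lenta or Vedomosti: the variable name $canonical_uri, the host name, and all paths are made up for the example.

```nginx
# http context. All paths and names below are hypothetical examples.
map $uri $canonical_uri {
    default                      "";
    # several old addresses may point to one main address
    /news/2010/01/15/old-slug/   /articles/2010/01/15/new-slug/;
    /news/2010/01/15/alt-slug/   /articles/2010/01/15/new-slug/;
}

server {
    listen 80;
    server_name www.example.com;

    # A non-empty value means the requested address is an old alias
    if ($canonical_uri) {
        return 301 $canonical_uri;
    }
}
```

In practice the table body could be generated from the database during import and pulled in with an include directive, so the list of associated addresses stays in one place.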

Mobile version


We made a mistake, albeit a forced one, when we restarted Lenta.ru without a browser-based mobile version of the site. But it was a deliberate risk, and we promptly corrected it; moreover, we released two versions at once, pda and mobile. The first was designed for old phones, with a minimum of images. We called such phones “alconocia”. The second was for smartphones with large displays, à la you know what. Over time, we began to focus on the mobile version by implementing an automatic redirect from the desktop version.

Besides the automatic redirect, we implemented the ability to remember the user's choice of version. That is, if you come from a phone, you are sent to the mobile version, but if you do not want it, you choose the desktop one. From then on, on each subsequent visit, you will not be redirected anywhere automatically.

It is also a good idea to show users a message saying that we automatically redirected them to another version.

We implemented this logic in nginx. A scary regular expression determined the device type, mobile or not, and set the $ismobile flag to 1. We also looked at the value of the cookie named view_version, which stored the preferred version. On the first visit to the site, this value is not defined. Below is an example of the code that determined whether to redirect or not:
    # Default: mobile devices are redirected
    if ($ismobile = 1) {
        set $mobile_rewrite 1;
    }
    # A stored preference overrides device detection
    if ($cookie_view_version = 'm') {
        set $mobile_rewrite 1;
    }
    if ($cookie_view_version = 'www') {
        set $mobile_rewrite 0;
    }

Accordingly, if $mobile_rewrite equals one, we redirect to the mobile version and at the same time set a one-time cookie that serves as the trigger for displaying the informational message.
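The redirect step itself is not shown in the original, so the following is only a sketch of how it might look. The mobile host m.example.com, the cookie domain, and the cookie name mobile_redirected are hypothetical; only $mobile_rewrite comes from the snippet above.

```nginx
# Sketch only: m.example.com, .example.com and the cookie name
# "mobile_redirected" are hypothetical placeholders.
# Assumes $mobile_rewrite was set by the checks shown earlier.
location / {
    if ($mobile_rewrite = 1) {
        # Short-lived cookie the mobile site can read to show the notice
        add_header Set-Cookie "mobile_redirected=1; Domain=.example.com; Path=/; Max-Age=60";
        return 302 http://m.example.com$request_uri;
    }
    # ... normal handling of the desktop site goes here ...
}
```

A temporary redirect (302) is used here on the assumption that the decision depends on the visitor's device and cookie, so it should not be cached as permanent.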

Setting up services


Continuing with web server configuration, a few points are worth noting. While the main data transfer protocol on the web is HTTP/1.1, it is important to serve static assets from several domains. If you use custom fonts on the site or call an API from the client page, do not forget to set the correct CORS headers in the configuration of the corresponding web server.
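For fonts served from a separate static domain, a CORS setup might be sketched as follows. The host names, the document root, and the list of font extensions are assumptions for the example.

```nginx
# Sketch for a static-assets host; host names and paths are placeholders.
server {
    listen 80;
    server_name static1.example.com;
    root /var/www/static;

    location ~* \.(woff|woff2|ttf|otf|eot)$ {
        # Allow pages served from the main site to load these fonts
        add_header Access-Control-Allow-Origin "http://www.example.com";
        expires 30d;
    }
}
```

Without the Access-Control-Allow-Origin header, browsers refuse to use a font loaded from a different origin than the page.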

When building a service-oriented architecture, some of your internal services may turn out not to be protected by application-level authorization. One example is a separate image upload service. Only authorized editors should have access to it. Suppose you also have a separate authorization service for those same editors. The authorization service is primitive: it receives the request headers and responds positively or negatively. With the ngx_http_auth_request_module module, we can make a subrequest to the authorization service for each request to the image upload service. A live sample configuration can be viewed here.

There are only two hard things in Computer Science: cache invalidation and naming things.
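A minimal sketch of such a setup, following the pattern from the module's documentation, is shown below. The location names, ports, and the /check path of the authorization service are assumptions, not the author's actual configuration.

```nginx
# Sketch only: paths and upstream addresses are hypothetical.
location /upload/ {
    # Every request first triggers a subrequest to /_auth;
    # a 2xx answer lets it through, 401 or 403 rejects it.
    auth_request /_auth;
    proxy_pass http://127.0.0.1:4000;      # image upload service
}

location = /_auth {
    internal;                              # not reachable from outside
    proxy_pass http://127.0.0.1:5000/check;  # authorization service
    # The auth service only needs headers (e.g. cookies), not the file body
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Original-URI $request_uri;
}
```

Dropping the request body in the subrequest matters for an upload service: otherwise each uploaded file would also be sent to the authorization service.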

- Phil Karlton


For host names on the servers at Lenta, we used cigarette brands. For Vedomosti we choose from the names of star systems. Application names are chosen from bird species. For example, rooster handled the generation of static files in older versions. At Vedomosti, the mascot is a big fish. Business sharks.

Media publication projects in Russia are not high-load systems. The latest traffic peak at Lenta, in the spring, was about 20 million views per day. It is fairly easy to build a system that withstands such loads without caching. We nevertheless use a cache with a short lifetime, a few tens of seconds. This removes the problem of cache invalidation: a typo fix appears on the site with a delay of less than a minute. It also lets you use a wonderful nginx option, proxy_cache_lock. Out of ten identical simultaneous requests to the web server, only one is sent to the backend. This evens out the load on the application.
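A short-lived cache with proxy_cache_lock might be configured roughly like this. The cache path, zone name, sizes, the 30-second lifetime, and the backend address are illustrative assumptions.

```nginx
# http context. Paths, sizes and timings are assumed example values.
proxy_cache_path /var/cache/nginx/site levels=1:2
                 keys_zone=site_cache:10m max_size=1g inactive=1m;

server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_cache           site_cache;
        proxy_cache_valid     200 30s;   # keep pages for a few tens of seconds
        proxy_cache_lock      on;        # one request populates a missing entry
        proxy_cache_lock_timeout 5s;     # the others wait up to this long
        # Serve the stale copy while a single request refreshes it
        proxy_cache_use_stale updating error timeout;
        proxy_pass http://127.0.0.1:3000;
    }
}
```

With the lock enabled, concurrent requests for the same uncached page wait for the first one to fill the cache instead of all hitting the application at once.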

[Figure: visualization of a small DDoS attack]


Backup


It goes without saying that data must be backed up. It is convenient to have both a hot standby and a periodically executed full backup.

A hot standby will save you when you need to restore the most recent version of the data. The easiest way is simple real-time replication.

If an important piece of data is deleted by a careless action, it can be at least partly restored from the daily backup.

People


Be attentive. Over time, some processes become routine. Naturally, there is a desire to automate something. At the end of a development stage, you look back and your hands itch to do it better. You have to try to understand each situation patiently. Stop and try to look at your work differently. It may turn out that the tasks that seem important to you are not, and vice versa.

Plan your work. Spend enough time on design. There is no need to quickly assemble a prototype with your favorite framework just because you enjoy it and know what you will be doing for the next couple of hours or days. Before you know it, you will have done a lot of work that will most likely have to be redone or thrown out entirely.

Be friendly and patient. When working in a team, it is important to be able to communicate with colleagues. Usually more than one person works on a project, which means relationships will form between you and your colleagues. The result of all your work depends on how well you treat one another, both professionally and personally. Tension will grow as the project deadline approaches. And the team's cohesion depends not only on Vasya and Petya, but also on you.

Your team should develop a format for professional communication. The simplest example is stating and completing a task. You must formulate tasks clearly. This would seem to be the most obvious point, yet people keep stepping on the same rake. The fact that a thought is in your head does not mean it is in your colleague's head. A task that takes several minutes to retell is very hard to describe in two words. Try to put your thoughts into words “on paper”. And after a colleague has read your opus, talk! Discuss the task and make sure your thought has been understood correctly. The same principle works in the opposite direction: if you have completed a task, say so. You cannot imagine how much time this will save you.

Department positioning


I would like to say a little about the role of developers in a publication. You have to understand that development is, in a sense, service staff for the editorial office. Our task is to provide a convenient tool. It is unacceptable to be arrogant toward editors who do not understand why a one-to-one relationship cannot simply be turned into a one-to-many one. So what if they uploaded a video file instead of a picture? The system must not crash on incorrect user input. If you expected a number and a string arrived, it is your own fault for allowing it, not the editor's for being incompetent. Try writing a dozen or two news stories in a day without copy-paste, analyzing sources to confirm the facts.

This is not about fulfilling every whim of the editorial office. You just need to be patient with your colleagues' wishes, because you are working toward a common goal.

Summary


Restarting a media publication is not much different from restarting any other project. Team cohesion is still important. You should be one step ahead of the editorial office, anticipating the new functionality they will want to bring to life. We design and build, then design and rebuild. Underestimating the importance of monitoring the state of servers, applications, and the client side is risky. After the restart, you will have to pay a lot of attention to analyzing the behavior of your new platform. Get through it.

We look forward to your questions, and perhaps to a continuation.

Source: https://habr.com/ru/post/243939/

