[Part I. Delivery of video content] [Part II. CDN do it yourself]
Continuing the topic of video content delivery: we have taken care of storing and processing content, so how do we now deliver it so that it is as close as possible to the consumer? Most of this article is devoted to a general approach to geographically distributed content delivery; at the end, as an example, the approach is applied to delivering video files and broadcasts to end users.
Beyond simply getting content to the user, we must ensure the quality of delivery. For an FLV video file, this means the delivery speed must be greater than or equal to the bitrate of the stream; otherwise playback will stutter for the user.
It also makes sense to bring content geographically closer to the user. The reasons are channel bandwidth (good backbone links are sometimes lacking) and the difference in cost between local and external traffic for the end user (for example, in regions of the Russian Federation).
This step is necessary for anyone who wants to enter the international market or expand regionally within the Russian Federation. Today the most popular sites in the regions are very often regional portals that provide various services, including video hosting, and their popularity is due both to the cost of traffic and to access speed and response time. A user may be willing to wait for a page to open or for the player to load, but it is hard to imagine a user agreeing to watch videos that are constantly interrupted by buffering, or a broadcast that arrives as a slideshow.
Thus, having recognized the need for geographic distribution of content, we buy or rent servers in close proximity to the consumer: in Europe, the USA, Ukraine, Yekaterinburg, and so on.
How it works
In the abstract, two entities take part in content delivery: a resource, which is our content (for example, a broadcast or a video), and a visitor. We first need to find out where the consumer of the resource, a visitor to our site, is located. We cannot count on visitors telling us this themselves, so we take the visitor's IP address and look it up in a GeoIP database (there are quite a few today, both paid and free). The lookup returns the visitor's location: country, region, city, and the name of the provider.
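The lookup step can be sketched as follows. This is a minimal illustration, not a real GeoIP database: the table `GEOIP_TABLE`, the `Location` type, the `locate` function, the address ranges (reserved documentation prefixes) and the provider names are all hypothetical. A real deployment would query one of the paid or free GeoIP databases instead.

```python
import ipaddress
from typing import NamedTuple, Optional


class Location(NamedTuple):
    country: str
    region: str
    city: str
    provider: str


# Hypothetical GeoIP table: network prefix -> location.
# The prefixes are reserved documentation ranges, not real allocations.
GEOIP_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"),
     Location("RU", "Sverdlovsk Oblast", "Yekaterinburg", "ExampleNet")),
    (ipaddress.ip_network("198.51.100.0/24"),
     Location("UA", "Kyiv", "Kyiv", "SampleTelecom")),
    (ipaddress.ip_network("198.51.100.128/25"),
     Location("UA", "Lviv Oblast", "Lviv", "SampleTelecom")),
]


def locate(ip: str) -> Optional[Location]:
    """Return the location for an IP address, or None if it is unknown."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, loc) for net, loc in GEOIP_TABLE if addr in net]
    if not matches:
        return None
    # When several prefixes match, the most specific (longest) one wins.
    _, location = max(matches, key=lambda match: match[0].prefixlen)
    return location


print(locate("192.0.2.17"))
print(locate("203.0.113.1"))
```

A linear scan is enough for a sketch; a real table with many prefixes would normally use a sorted structure or a prefix tree.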

The resource (content) is always located on a specific server, and the server has a physical location that we know in advance. Through its server, therefore, the resource also has a geographic location: the same country, region, city, and so on. In addition, we can create copies of the resource on special mirroring servers, so the same resource becomes available on several servers and therefore at several geographic points.

Choosing the resource nearest to the visitor
Now we have a visitor, the visitor's location, the resource the visitor wants to receive, and all copies of this resource with their locations. We need to select the copy of the resource that is closest to the visitor. How can we do that?
The logical approach is to calculate the distance from the visitor to the resource and to each of its copies, and to select the copy closest to the visitor. But how do we define this distance?

This distance does not always coincide with the distance on the map between two points; it is rather a measure of the quality and capacity of the channels between regions, countries or cities (or even between individual providers). As a first approximation, it is enough to set the distances between individual countries and cities. An example of such a distance assignment is shown in the figure above.
Thus, we get a weighted directed graph on which we need to solve the shortest path problem (minimizing the sum of the weights of the edges on the path). This is a classic problem that can be solved in polynomial time. Although the graph is sparse, it is still quite large (the example above shows only a small part of the real graph), so computing the path in real time for each visitor is not the most efficient solution. But we can cheat a little: for every shortest path search, the end point of the path is a place where servers with resources are located, so the number of distinct end points is fixed and relatively small. It is therefore enough to calculate and cache (for example, in memcached) the lengths of the shortest paths from all vertices of the graph (where visitors to our site may be located) to the locations of the resources.
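One way to do this precomputation is sketched below: run Dijkstra's algorithm once per server location on the reversed graph, which yields the distance from every vertex to that location. The choice of Dijkstra, the edge list `EDGES`, its weights, `SERVER_LOCATIONS`, and the function names are assumptions made for illustration; a plain dictionary stands in for memcached.

```python
import heapq
from collections import defaultdict

# Hypothetical directed edges (origin, destination, weight); the weights are
# made-up channel "distances", not map distances.
EDGES = [
    ("Yekaterinburg", "Moscow", 3),
    ("Moscow", "Yekaterinburg", 3),
    ("Moscow", "Kyiv", 2),
    ("Kyiv", "Moscow", 2),
    ("Kyiv", "Frankfurt", 2),
    ("Frankfurt", "Kyiv", 2),
    ("Moscow", "Frankfurt", 5),
    ("Frankfurt", "New York", 6),
]
SERVER_LOCATIONS = ["Moscow", "Frankfurt"]  # assumed places where servers stand


def distances_to(target, edges):
    """Shortest distance from every vertex that can reach `target` to `target`.

    Runs Dijkstra from `target` over the reversed edges.
    """
    incoming = defaultdict(list)
    for origin, destination, weight in edges:
        incoming[destination].append((origin, weight))

    dist = {target: 0}
    heap = [(0, target)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for origin, weight in incoming[node]:
            candidate = d + weight
            if candidate < dist.get(origin, float("inf")):
                dist[origin] = candidate
                heapq.heappush(heap, (candidate, origin))
    return dist


def build_cache(edges, server_locations):
    """Map (visitor location, server location) -> shortest path length."""
    cache = {}
    for server in server_locations:
        for origin, d in distances_to(server, edges).items():
            cache[(origin, server)] = d
    return cache


CACHE = build_cache(EDGES, SERVER_LOCATIONS)
print(CACHE[("Yekaterinburg", "Frankfurt")])
```

Pairs with no path between them are simply absent from the cache, so a lookup can treat a missing key as an infinite distance.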
This precomputed information is then used in real time when a visitor requests a resource. If several copies of the resource are at the same shortest distance from the visitor, we select one of them at random (possibly in proportion to the weight of the server that holds the copy).
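The selection step might look like the sketch below. The function `choose_copy`, the server names, the sample distances and the server weights are hypothetical; the distance cache is assumed to map (visitor location, server location) pairs to precomputed shortest path lengths, with missing pairs meaning that no path exists.

```python
import random

INF = float("inf")


def choose_copy(visitor_location, copies, distance_cache, server_weights, rng=random):
    """Pick the server holding the nearest copy of a resource.

    copies: list of (server name, server location) pairs.
    Ties are broken at random, in proportion to the server weights.
    Returns None when no copy is reachable from the visitor.
    """
    best = INF
    nearest = []
    for server, location in copies:
        d = distance_cache.get((visitor_location, location), INF)
        if d < best:
            best, nearest = d, [server]
        elif d == best and d != INF:
            nearest.append(server)
    if not nearest:
        return None
    weights = [server_weights.get(server, 1) for server in nearest]
    return rng.choices(nearest, weights=weights, k=1)[0]


# Made-up sample data for illustration.
cache = {("Yekaterinburg", "Moscow"): 3, ("Yekaterinburg", "Frankfurt"): 7}
copies = [("msk-1", "Moscow"), ("msk-2", "Moscow"), ("fra-1", "Frankfurt")]
weights = {"msk-1": 3, "msk-2": 1}  # msk-1 is chosen about three times as often

print(choose_copy("Yekaterinburg", copies, cache, weights))
```

Because only dictionary lookups happen at request time, the cost per request grows with the number of copies rather than with the size of the graph.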
Copying resources
The selection scheme works well when copies of the resource already exist on servers scattered around the world. But how are these copies made? Copying every resource to every server would be unreasonable; it is better to copy only those resources that a specific audience will actually request.
To do this, we propose the following solution: whenever a visitor requests a resource, every geographic location where the resource could be placed, but is not currently present, receives a bonus:

where
distance is the distance from the visitor to the given geographic location, and
k is a coefficient that determines how quickly the resource will be moved to the server at that location. If there is no path from the visitor to this location (i.e., the visitor is too far away), the distance is infinite and the bonus is zero. If a path exists, the resource is copied to the server at this location sooner the closer the location is to the visitor and the more such visitors request the resource.
As soon as the accumulated bonus exceeds a certain threshold, the resource is copied to the server at this geographic location, and from then on the copy selection algorithm described above takes over.
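The accumulation logic can be sketched as follows. The exact bonus formula is assumed here to be k / distance, which matches the properties described (zero for an infinite distance, larger for nearby visitors) but is only one possible form. The class `ReplicationTracker`, the values of `K`, `THRESHOLD` and `MIN_DISTANCE`, and the clamping of very small distances are likewise illustrative assumptions.

```python
import math
from collections import defaultdict

K = 10.0            # assumed coefficient k
THRESHOLD = 5.0     # assumed copy threshold
MIN_DISTANCE = 1.0  # assumed lower bound, avoids division by zero


class ReplicationTracker:
    """Accumulates bonuses per (resource, location) and reports when to copy."""

    def __init__(self, k=K, threshold=THRESHOLD):
        self.k = k
        self.threshold = threshold
        self.bonus = defaultdict(float)

    def record_request(self, resource, candidate_distances):
        """Register one visitor request.

        candidate_distances maps each location that lacks a copy of the
        resource to its distance from the visitor (math.inf if unreachable).
        Returns the locations whose bonus has just exceeded the threshold.
        """
        to_copy = []
        for location, distance in candidate_distances.items():
            if math.isinf(distance):
                continue  # no path: the bonus is zero
            key = (resource, location)
            # Assumed formula: bonus = k / distance.
            self.bonus[key] += self.k / max(distance, MIN_DISTANCE)
            if self.bonus[key] > self.threshold:
                to_copy.append(location)
                del self.bonus[key]  # a copy will now exist at this location
        return to_copy


tracker = ReplicationTracker()
for _ in range(2):
    print(tracker.record_request("video42", {"Moscow": 3, "Frankfurt": math.inf}))
```

With these sample values, a location at distance 3 earns a bonus of about 3.33 per request, so the second request pushes it over the threshold of 5.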
Geographically Distributed Video Files and Broadcasts
Now we can apply the approach described above to video hosting content: video files and broadcasts.
Video files. Here the resource is the video file itself (of any type: the FLV, for example, or the original video). Copies of the resource are copies of the file that reside on mirroring file servers. Accessing the resource means downloading the file, for example when a visitor plays an FLV video in Flash Player. Copying the resource is simply copying the file from the main file server to one of the mirroring file servers located at different points.
Broadcasts. For broadcasts the situation is very similar: the resource is the broadcast itself, located on its main broadcast server (the server to which the broadcast's author is connected). Copies of the resource are all relays of this broadcast on other broadcast servers. Accessing the resource is a new visitor joining the broadcast. Copying the resource is opening a relay on a broadcast server located at a particular point in the world.
With broadcasts, it is interesting that relaying is simultaneously a way to handle high load and a means of bringing content closer to the consumer. Relaying therefore works on two levels at once: on the one hand, every user's request should be served by the nearest available content; on the other hand, each geographic point needs enough relays to keep the load on its broadcast servers within normal limits.