Tutu.ru's experience: how the commuter train schedule works

Commuter trains (electric trains) remain one of the most popular types of passenger transport in Russia. Every year millions of passengers use them, traveling a total of hundreds of billions of kilometers on thousands of trains. In January 2017 alone, according to Moscow Department of Transport data published in the unified data warehouse of the Moscow Government (EXD), suburban rail carried 42.6 million passengers, 4.1% more than a year earlier.

The availability and accuracy of the commuter train timetable matter to every passenger, and for those who compile it and deliver it to people it is an important and very difficult task.

My name is Alexander Podlevskikh. I am a lead development engineer at Tutu.ru and the team lead of the commuter trains team. In this article I will cover the technical details and difficulties of building an online schedule: how it all works, how we use the data provided by Russian Railways, and how our users help us keep the schedule up to date without realizing it.
A train schedule is a plot of train movements in a Cartesian coordinate system, typically time against distance along the line. This is the form in which the railway itself represents its timetable.

There are about 30 suburban passenger companies (PPK) in Russia, each responsible for passenger service in a certain area. Based on the transport orders of the regions, passenger requests and its own research, each PPK prepares proposals for changing the timetable, which are sent to Russian Railways once a year (around the beginning of summer).

Having received the requests for commuter, long-distance passenger and freight trains, JSC “Russian Railways” spends several months developing a new train schedule and puts it into operation at the end of the year. This schedule, which is in force from the second Sunday of December of one year to the second Saturday of December of the next, is called the normative or base schedule. As a rule, it is posted on large boards at stations and platforms and printed in booklets that can be bought at ticket offices at a number of stations.

At the same time, most passengers at large railway hubs (Moscow, for example) know that the base schedule is of little use on sections with heavy traffic. The railway needs regular repairs, which in most cases cannot be carried out without partially suspending train traffic. JSC “Russian Railways” then develops a variant schedule that provides, for example, for trains in both directions to take turns on a single track on one of the stretches. In other words, temporary changes are made to the base schedule. Such changes are made constantly, and not only because of repair work.

Before online schedules appeared, passengers learned about temporary changes mainly from notices at stations. Such a notice is not a schedule for a specific day but, as a rule, an A4 sheet listing changes to the base schedule, and there could be several such sheets for a single day. For example, on one of them a train departed at 15:50 instead of 15:30; on another, the same train skipped a number of stops; and a third listed changes for the late evening, where few people looked (for example, a train was scheduled at 15:40 instead of 22:00). Many stations still post such notices, by the way. A real-life example: a colleague of mine from Tutu.ru once decided to travel from Moscow to the Rzhev district with a transfer. He took a commuter train from Moscow to Volokolamsk and found out there that the connecting train to Rzhev would not run until the next day; in Moscow there had been no information about this.

With the advent of online resources, finding out the schedule became much easier: you open the site or the mobile application, enter the departure and destination stations and the date, and the system shows which trains will run that day, taking into account all planned temporary changes known so far. There is no need to study a pile of paper notices. Tutu.ru became the first online resource in Russia to publish, as early as 2003, not only the base schedule but also temporary changes.

Creating such a resource and keeping the schedule up to date was not easy. Changes had to be tracked manually: the founders of the service drove around the stations themselves, photographing and copying down schedules and notices. It was physically impossible to visit every station, so inaccuracies crept into the timetable changes. Here our users helped a lot: they wrote to and called Tutu.ru and gave us first-hand information.

Still, there were mistakes in the schedule, so we began to look for additional sources of information. Soon after the suburban passenger companies (PPK) appeared, which were also interested in informing passengers correctly, we agreed with them on receiving data on the schedule and its changes for all trains at all stations. This source significantly improved the quality of the schedule. In 2005 almost every user encountered at least one error on the site; 10 years later the overwhelming majority of users always saw an accurate and reliable schedule.

Errors in the PPK data are rare but do occur, and our operators also sometimes make mistakes, so we did not stop there and connected one more source: the central suburban timetable database of the Main Computing Center (MCC) of Russian Railways, into which Russian Railways employees enter commuter train schedules and their changes for all of Russia.

How the Tutu.ru service stays up to date

Tutu.ru now provides timetables and routes of commuter trains in 17 "regions" (a conventional division of the territory that roughly follows the areas of responsibility of the corresponding suburban passenger companies). Tutu.ru receives the base schedule before it takes effect, as well as information about temporary changes (variant timetables for specific days).

Operators enter this information into our database through an interface, in a semi-manual mode. For areas where we have no partnerships, specialists browse the timetable sites and enter the data entirely by hand. This approach takes a lot of effort and can lead to errors, as a result of which our schedule does not fully match the real one.

When we decided to connect the MCC database, we did not know exactly how we would use it. Initially we assumed it would be an additional source of more, and possibly more accurate, data. We knew that some details of the schedule model differ between our system and the MCC system. For example, a train follows one route, arrives at its terminal station, stands there for some time, then changes its number and continues on a different schedule and route. As a rule, the MCC system records these as two different trains, and a search on the RZD website from a station on the first section to a station on the second will not find this train. We handle such situations individually: if we are confident that the train simply stands at the station and then continues under a new number, we enter it as a single object. It gets a composite number, made of the numbers of the original trains joined with the separator “/”, and it appears in search results between stations on different sections.

The changes that some PPKs (for example, CPPK or SZ PPK) send to partners and subscribers do not cover all of a train's stops, only individual timing points (railway station, block post, passing loop, junction, etc.). The times at the intermediate points where the train nevertheless stops (stopping points, platforms and so on) are calculated by each partner in its own way.

Consider an example: commuter train No. 6600 on the Riga direction runs daily on its usual schedule and stops at Nakhabino at 5:04, Opalikha at 5:10, Krasnogorsk at 5:14, Pavshino at 5:18, and so on. On July 9 the schedule changes, and the carrier informs us that the train will leave Nakhabino at 4:57 and will follow the standard schedule from Pavshino onward.

The MCC enters the data as follows: Nakhabino and Pavshino get 4:57 and 5:18 respectively, and the times at the intermediate stops are calculated in proportion to the original running times, i.e. in the ratio 6:4:4 (as if the train ran more slowly on this section). The 21 minutes between the two stations are thus split into 9, 6 and 6 minutes, so the stop at Opalikha moves to 5:06 and the stop at Krasnogorsk to 5:12. For a long time the calculation on the Tutu.ru website was similar, and in 99% of cases this is exactly how the train runs. But there were cases when the reason for the change (a repair, for example) had disappeared and the train ran through the section at normal speed. In our example this would mean that it reaches Opalikha in 6 minutes (5:03), Krasnogorsk 4 minutes later (5:07) and Pavshino 4 minutes after that (5:11). The train would then stand at Pavshino until 5:18 and continue on schedule.
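The proportional calculation described above can be reproduced with a short sketch. This is only an illustration under assumptions: the function names `parse` and `proportional_times` are hypothetical, times are "HH:MM" strings within one day, and results are rounded to whole minutes. It is not the actual MCC or Tutu.ru code.

```python
from datetime import datetime, timedelta


def parse(hhmm: str) -> datetime:
    """Parse an 'HH:MM' string (the date part is irrelevant here)."""
    return datetime.strptime(hhmm, "%H:%M")


def proportional_times(base: list[str], new_start: str, new_end: str) -> list[str]:
    """Stretch the base running times so the trip fits between two known times.

    base      -- original times at every stop of the section
    new_start -- changed time at the first stop
    new_end   -- known time at the last stop
    """
    base_t = [parse(t) for t in base]
    start, end = parse(new_start), parse(new_end)
    base_total = (base_t[-1] - base_t[0]).total_seconds()
    new_total = (end - start).total_seconds()
    result = []
    for t in base_t:
        # Offset from the first stop, scaled by new_total / base_total.
        offset_min = (t - base_t[0]).total_seconds() * new_total / base_total / 60
        result.append((start + timedelta(minutes=round(offset_min))).strftime("%H:%M"))
    return result


# Train No. 6600: Nakhabino, Opalikha, Krasnogorsk, Pavshino
base = ["05:04", "05:10", "05:14", "05:18"]
print(proportional_times(base, "04:57", "05:18"))
# ['04:57', '05:06', '05:12', '05:18']
```

The 14 minutes of the base schedule are stretched to 21, so each interval grows by a factor of 1.5, which gives the 5:06 and 5:12 from the example.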

What would this mean for users? A passenger who arrived at Krasnogorsk at 5:10 would end up waiting for the next train. Because of such cases, the times at stopping points for which the exact time is unknown are now calculated on Tutu.ru with an algorithm different from the MCC's. The time is based on the original running times or, more generally, on the minimum running time of the train between the given stations. The time we show is therefore most likely a few minutes earlier than the train will actually pass. It is better to be on the platform a couple of minutes early than a couple of minutes late.
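A minimal sketch of this "earliest possible" calculation, assuming the original running times serve as the minimum running times. The function name `earliest_times` and the capping at the known end time are assumptions made for illustration, not a description of the production algorithm.

```python
from datetime import datetime


def parse(hhmm: str) -> datetime:
    """Parse an 'HH:MM' string (the date part is irrelevant here)."""
    return datetime.strptime(hhmm, "%H:%M")


def earliest_times(base: list[str], new_start: str, new_end: str) -> list[str]:
    """Earliest time the train can be at each stop of the section.

    Intermediate stops keep the base running times counted from the new
    start; the last stop keeps its known time, and no intermediate stop
    is placed later than that known time.
    """
    base_t = [parse(t) for t in base]
    start, end = parse(new_start), parse(new_end)
    times = [min(start + (t - base_t[0]), end) for t in base_t[:-1]]
    times.append(end)
    return [t.strftime("%H:%M") for t in times]


# Train No. 6600: Nakhabino, Opalikha, Krasnogorsk, Pavshino
base = ["05:04", "05:10", "05:14", "05:18"]
print(earliest_times(base, "04:57", "05:18"))
# ['04:57', '05:03', '05:07', '05:18']
```

With these times the passenger at Krasnogorsk is told 5:07 rather than 5:12 and does not miss the train if it runs at normal speed.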

In addition, we noticed human errors made when the schedule was entered into the MCC system. For these and other reasons we decided that importing data directly from the MCC was undesirable. It is more important to find the differences between the data stored in the MCC and in our own database. Based on these differences and on other sources (including, in difficult cases, actually checking trains at stations), specialists decide which data is “more accurate” (or more useful to users).

But before anything can be compared, at least some links between the objects have to be established. Initially we had no correspondence between trains, no correspondence between stations, and no fields by which such a correspondence could be strictly established. The MCC database contained about 25 thousand station objects and 15 thousand train objects, which made a brute-force search impractical, i.e. comparing every station with every station and every train with every train.

Given the differences in how the models are filled in, described above, the comparison would have to be fuzzy. That is, we would be looking not for exact equality of objects but for objects with minor differences in one of the data fields, for example a departure time that differs by 2-3 minutes on one of the running dates. Fuzzy comparison is a fairly expensive operation, and since there would be hundreds of millions of pairs to compare, this approach would not finish in a reasonable time. It would also establish few correspondences, because at the start not all the peculiarities of the data were known.
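To make "fuzzy" concrete, here is a small sketch of such a comparison for two trains. The `Stop` structure, the `fuzzy_same_train` name and the 3-minute tolerance are assumptions for illustration; the article does not describe the real comparison criteria in this detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    station: str  # station identifier already mapped to a common key
    minute: int   # departure time in minutes after midnight


def fuzzy_same_train(a: list[Stop], b: list[Stop], tolerance_min: int = 3) -> bool:
    """Treat two routes as the same train if they visit the same stations
    in the same order and every time differs by at most tolerance_min."""
    if len(a) != len(b):
        return False
    return all(
        x.station == y.station and abs(x.minute - y.minute) <= tolerance_min
        for x, y in zip(a, b)
    )


ours = [Stop("Nakhabino", 297), Stop("Opalikha", 303), Stop("Pavshino", 318)]
mcc = [Stop("Nakhabino", 297), Stop("Opalikha", 306), Stop("Pavshino", 318)]
print(fuzzy_same_train(ours, mcc))                   # True: the largest gap is 3 minutes
print(fuzzy_same_train(ours, mcc, tolerance_min=2))  # False
```

Even a cheap check like this becomes expensive when it has to run over every possible pair of objects, which is why the matching was narrowed down in stages.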

Matching station and train objects

This was done in several stages and several passes. First, correspondences between station objects had to be established. The identifiers we had could not be matched unambiguously against the MCC database. For example, there are 9 stopping points named "105 km" and 17 named "106 km" in Russia. Comparing names was therefore not very effective: only about 10% of stations had a unique name for which a unique station could be found in the MCC database.
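The name-based step can be sketched as follows. The function name, the `(id, name)` input format and the case-insensitive normalization are assumptions for illustration only.

```python
from collections import defaultdict


def match_by_unique_name(ours, theirs):
    """Pair stations whose name occurs exactly once in each data set.

    ours, theirs -- iterables of (station_id, name)
    Returns {our_station_id: their_station_id}.
    """
    def index(stations):
        by_name = defaultdict(list)
        for station_id, name in stations:
            by_name[name.strip().lower()].append(station_id)
        return by_name

    our_index, their_index = index(ours), index(theirs)
    return {
        ids[0]: their_index[name][0]
        for name, ids in our_index.items()
        if len(ids) == 1 and len(their_index.get(name, [])) == 1
    }


ours = [(1, "105 km"), (2, "105 km"), (3, "Nakhabino")]
theirs = [("a", "105 km"), ("b", "Nakhabino"), ("c", "Opalikha")]
print(match_by_unique_name(ours, theirs))
# {3: 'b'}
```

Ambiguous names such as "105 km" are simply left unmatched at this stage, which is why this step alone covers so few stations.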

Here our timetable specialist Alexey Derkachev helped a lot: he dug up a table of correspondences between the seven-digit Express-3 station codes (widely used as one of the station identifiers) and the station codes from the MCC database. With this table we found a pair for about half of the stations that take part in our commuter train schedules. With that many stations matched, we could move on to the next stage: trying to find identical trains.

To do this, an automated script went through all the matched pairs of stations and fetched the trains calling at each station from both sources. The two sets of trains were then compared, and when an exact match was found (i.e. the number of stations on the route is the same, the arrival and departure times at every station are the same, the train numbers are similar and the weekly running pattern is the same), the duplicates were removed so that only one train remained in our schedule.

In this way a pair was found for some of the trains. Once train correspondences were established, we could return to the stations: walk along the matched pairs of trains and, since the trains are the same, assume that the stations on their routes are most likely the same too. This yielded some more stations, after which we could look for matching trains again. Along the way we could experiment with different search parameters and assumptions and take into account more and more peculiarities of how the schedule is built and stored. After a dozen iterations the correspondence database was good enough to use.
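The alternation between matching trains and matching stations can be sketched roughly like this. Everything here is a simplified assumption: routes are lists of `(station_id, minute)` pairs, trains match only on exact times, and the names `compatible` and `iterate_matching` are hypothetical.

```python
def compatible(our_stops, their_stops, station_map):
    """Same length, same times, and no contradiction with known station pairs.
    At least one stop must be an already matched station."""
    if len(our_stops) != len(their_stops):
        return False
    known = 0
    for (s_our, t_our), (s_their, t_their) in zip(our_stops, their_stops):
        if t_our != t_their:
            return False
        if s_our in station_map:
            if station_map[s_our] != s_their:
                return False
            known += 1
    return known > 0


def iterate_matching(our_trains, their_trains, seed_stations, max_rounds=10):
    """Alternate between matching trains and deriving new station pairs."""
    station_map, train_map = dict(seed_stations), {}
    for _ in range(max_rounds):
        progress = False
        # Step 1: match trains using the stations known so far.
        for our_id, our_stops in our_trains.items():
            if our_id in train_map:
                continue
            candidates = [
                their_id for their_id, their_stops in their_trains.items()
                if their_id not in train_map.values()
                and compatible(our_stops, their_stops, station_map)
            ]
            if len(candidates) == 1:  # accept only unambiguous matches
                train_map[our_id] = candidates[0]
                progress = True
        # Step 2: matched trains share a route, so pair up their stations.
        for our_id, their_id in train_map.items():
            for (s_our, _), (s_their, _) in zip(our_trains[our_id], their_trains[their_id]):
                if s_our not in station_map:
                    station_map[s_our] = s_their
                    progress = True
        if not progress:
            break
    return station_map, train_map


our_trains = {"T1": [("A", 300), ("B", 310), ("C", 320)], "T2": [("C", 400), ("D", 415)]}
their_trains = {"m1": [(1, 300), (2, 310), (3, 320)], "m2": [(3, 400), (4, 415)]}
stations, trains = iterate_matching(our_trains, their_trains, {"A": 1})
print(stations)  # {'A': 1, 'B': 2, 'C': 3, 'D': 4}
print(trains)    # {'T1': 'm1', 'T2': 'm2'}
```

In the example only station A is known at the start. Train T1 is matched through it, which reveals stations B and C, and station C in turn makes it possible to match train T2 on the next pass.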

Continuous search for schedule differences

Commuter train schedule data changes quite often: hundreds of changes are made per day, and before the start and the end of the summer season the number can reach several thousand per day. Moreover, changes do not always reach our database and the MCC database at the same time, especially since under the data exchange rules we can download updates only at fixed times twice a day, while our specialists can enter data into our own database around the clock.

Reconciling all trains every time is a time-consuming operation. The comparison criteria may change (for example, some one-minute discrepancies can be considered insignificant and ignored), the data itself can change in the meantime, and new trains can be added for which no match has been found yet. Our users help us find the inconsistencies: on average, the site handles 10 schedule searches every second.

For each search, the corresponding data is requested in the background from a local copy of the MCC data and compared. If two trains are similar (by number, weekly running pattern and time at the station) but no correspondence exists for them yet, one is established. If a correspondence exists but the data differ, the discrepancy is saved. Later, schedule specialists can view the overall list of discrepancies and the discrepancies for a particular pair of trains, find out why the data differ, and decide whether to change the data on our site.
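The per-search logic can be sketched as below. The `Train` fields, the similarity tolerance and the `reconcile` function are assumptions made for the example; the real criteria are not given in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Train:
    id: str
    number: str
    weekdays: frozenset  # running days, e.g. frozenset({0, 1, 2, 3, 4})
    departure: int       # minutes after midnight at the searched station


def similar(a: Train, b: Train, tolerance_min: int = 3) -> bool:
    """Same number and weekly pattern, departure within the tolerance."""
    return (a.number == b.number and a.weekdays == b.weekdays
            and abs(a.departure - b.departure) <= tolerance_min)


def reconcile(ours, mcc, links, discrepancies):
    """Compare one search result with MCC data; update links and discrepancies."""
    mcc_by_id = {t.id: t for t in mcc}
    for train in ours:
        linked_id = links.get(train.id)
        if linked_id is None:
            # No correspondence yet: link only if exactly one free candidate fits.
            candidates = [t for t in mcc
                          if similar(train, t) and t.id not in links.values()]
            if len(candidates) == 1:
                links[train.id] = candidates[0].id
        elif linked_id in mcc_by_id:
            other = mcc_by_id[linked_id]
            if (train.weekdays, train.departure) != (other.weekdays, other.departure):
                discrepancies.append((train.id, other.id))


days = frozenset({0, 1, 2, 3, 4})
ours = [Train("o1", "6600", days, 297), Train("o2", "6602", days, 330)]
mcc = [Train("m1", "6600", days, 299), Train("m2", "6602", days, 340)]
links, discrepancies = {"o2": "m2"}, []
reconcile(ours, mcc, links, discrepancies)
print(links)          # {'o2': 'm2', 'o1': 'm1'}
print(discrepancies)  # [('o2', 'm2')]
```

Because the check runs only for the trains that users actually search for, the most popular routes are reconciled most often.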

At the moment the whole system is still being refined, both in how the models are compared and in how conveniently the results are displayed and worked with: there are many discrepancies, most of them minor, and those do not need to be shown first. Differences between the models and between the principles used to maintain the schedule may also affect the accuracy of the comparison.

Technical implementation

Several years ago, thanks to the DevOps team, our company gained the ability to build microservices. New functionality could now be implemented in a service of its own, separately from the monolith.

This is how we got a microservice that stores all the schedule data from the MCC in the same format in which the MCC database returns it and exposes an API that answers search queries (trains from one station to another on a given date, and the route of a particular train). Further microservices respond to search requests with data from our own storage, compare the two data sets, and store the discrepancies between the models.

Populating the discrepancy database

When a user searches for commuter trains on any route, the website calls the timetable service, and before returning the result the service publishes an event with the calculated data to the bus. The comparison service listens for these events; on receiving one, it requests the same data from the MCC data storage service and compares the two sets.

If the data differ for trains that are already linked, another event is generated, which the discrepancy storage service listens for. If some trains are not linked yet but the data show that they are very similar, a link between them is created.
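The event flow can be illustrated with a tiny in-process stand-in for the bus. The `Bus` class, the topic names and the event fields are all hypothetical; the article does not say which message bus or event format is used in production.

```python
from collections import defaultdict


class Bus:
    """A minimal in-process publish/subscribe bus (a stand-in for a real one)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in list(self._handlers[topic]):
            handler(payload)


def wire_comparison(bus, mcc_lookup, store):
    """Comparison service: listen for searches, compare, emit discrepancies."""

    def on_search(event):
        # Ask the MCC storage service for the same search.
        mcc = mcc_lookup(event["from"], event["to"], event["date"])
        for number, departure in event["trains"].items():
            if number in mcc and mcc[number] != departure:
                bus.publish("discrepancy.found", {
                    "number": number, "date": event["date"],
                    "ours": departure, "mcc": mcc[number],
                })

    bus.subscribe("schedule.searched", on_search)
    # Discrepancy storage service: here it simply appends to a list.
    bus.subscribe("discrepancy.found", store.append)


store = []
bus = Bus()
wire_comparison(bus, lambda src, dst, date: {"6600": "05:06"}, store)

# The timetable service publishes what it is about to return to the site.
bus.publish("schedule.searched", {
    "from": "Nakhabino", "to": "Pavshino", "date": "2017-07-09",
    "trains": {"6600": "05:03"},
})
print(store)
```

Decoupling the services through events means the comparison runs in the background and does not slow down the response to the user.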

How schedule specialists work with the discrepancy database

How the discrepancies are updated

Conclusion

The system does not stand still, and we are constantly refining it. So far there are differences in how the schedule data models are built and filled in. Because of them, the final list of differences accumulates a significant number of records that do not need to be taken into account, which makes the list hard to navigate and, as a result, makes it harder to respond to real problems. We are working on it.

We continue to work on automatically establishing correspondences between station and train objects. The commuter trains team is constantly improving the service to make it convenient for you to use.

In the next article I plan to describe the structure of the models in detail and dwell on the algorithm for comparing them. I will also describe the differences we found and how the system evolves based on them. If you have questions about the article, or suggestions and wishes about the product itself, be sure to write.

Source: https://habr.com/ru/post/333038/
