
How we added a public transport schedule to 2GIS



2GIS helps you navigate the city. You open the app, type a street or organization name into the search, find it, and rejoice. Once the organization is found, a reasonable question arises: how do you get there? We have recently paid a lot of attention to car scenarios, while public transport route search was somewhat neglected. I will tell you how the public transport route search was built and share the intricacies of collecting and processing the data.

Where the task came from


We love talking to our users. At the end of 2016, we ran a survey to find out how they use public transport. The results were curious, so we are sharing them.


“How often do you use public transport?”

Across all cities where 2GIS is present, more than half of the respondents use public transport every day. The larger the city, the more people use it daily. On weekdays it is most popular with residents of Moscow and St. Petersburg, while in other cities people also actively use public transport on weekends.

With the frequency of use figured out, it is time to see which types of public transport are the citizens' favorites.



“Which modes of transport do you prefer?”

As for the top positions, the result is quite expected. In large cities, the preferred mode of transport is the metro, with buses in second place. More than a third of Muscovites use the train. In Novosibirsk people ride minibuses, and in St. Petersburg, among other things, they like trams.

An interesting discovery was that half of the respondents could walk to their destination.

The next step was to find the weak spots of 2GIS. We went to users with a question: what are we missing?



“What is missing in 2GIS?”

We solved the problem of choosing specific modes of transport with the recently released filters, where users can specify which type of transport they want to use. But the question “When will the bus arrive?” remained relevant for 64% of the respondents.

It was at this moment that we started thinking about adding a public transport timetable to 2GIS and about how to make it happen.

Where to get the data?


This was the first question we ran into. In most cases we collect the data for our product ourselves. When a new microdistrict is built in a city, a data collector travels to the site and verifies all the necessary details in the field. With the new feature, the usual approach would not work. Sending specialists to bus stops is a waste of time and effort, because the schedule is constantly changing: new carriers appear, routes are optimized, winter gives way to spring. The planned schedules and intervals would simply become outdated while they were being collected.

Yes, unfortunately, data goes stale, and this was the second problem we faced. The logical and fairly obvious solution was to turn to those who draw up and control the schedules: the subordinate institutions. Often a constructive dialogue began only after a thorny bureaucratic path and advice such as: "Write to the Ministry of Transport / Department of Transport, and then we'll talk."

Finding the responsible people and starting the dialogue was only half the battle.

Then a marathon began:

  1. Explaining what we want and why we need it.
  2. Convincing them that our idea is useful for residents and visitors.
  3. Proving that 2GIS does not monetize route building that takes service frequency into account.
  4. Assuring them that it is safe.

Victory? Not quite.

The most curious part is the technical side, in particular the data transfer format. Yes, some cities have automated systems for maintaining the schedule and an API for accessing the data (GTFS, or JSON in a custom format), but that was not the case everywhere. In some places we were simply offered the option of parsing the website, again for security reasons, without access to the databases. In others they were ready to send us files (.xls, .doc, .pdf), but only once, with no way to update the information in our directory in a timely manner.

First place for originality went to a photo of a sheet of paper with a public transport schedule.

And at first the task had seemed trivial: get publicly available data from the original source!

Loading data into the internal system


Having obtained access and downloaded the data, we faced another problem. One does not simply upload someone else's data into the internal system.

Why?


It is time to explain how the source data is stored inside 2GIS.
We develop all our internal products for collecting and storing information ourselves. The software for cartographers (who are responsible not only for the map, but also for transport) is called Fiji; there is a separate detailed story about it. In brief, cartographers draw the road graph in Fiji, enter public transport data, and store the timetable there. All the collected routes had already been entered into the system.

The first analysis showed that the routes in our system and the suppliers' routes differ, in places drastically. We had to somehow map our own routes to the suppliers' routes. This can, of course, be done manually, but we decided to write our own matcher.



We chose GTFS as the intermediate storage format because it is a generally accepted standard, and some suppliers can already provide schedules in it. For the intermediate database that the matcher works on we chose PostgreSQL, and the matcher itself we wrote in Python for simplicity.

Matching simply by route type and name did not work, since route names differ greatly between us and the suppliers. Matching by stop names failed for the same reason. As a result, the matcher follows a rather complicated scheme that takes into account the route geometry and the type of transport, and only then the stop names and route numbers.
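
To make the order of checks more concrete, here is a minimal sketch of such a scoring scheme. It is not the actual 2GIS matcher: the `Route` structure, the 150 m tolerance, the 0.8 geometry threshold, and the weights are all assumptions chosen for illustration, and the geometry is assumed to be non-empty and already projected to planar coordinates in metres.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import hypot


@dataclass
class Route:
    # Hypothetical structure, not the Fiji or GTFS schema.
    number: str
    transport_type: str                # e.g. "bus", "tram"
    points: list[tuple[float, float]]  # planar coordinates, metres
    stop_names: list[str]


def geometry_score(a: Route, b: Route, tolerance_m: float = 150.0) -> float:
    """Share of points of each route lying near the other route (worst of both directions)."""
    def covered(src, dst):
        hits = sum(
            1 for (x, y) in src
            if min(hypot(x - u, y - v) for (u, v) in dst) <= tolerance_m
        )
        return hits / len(src)
    return min(covered(a.points, b.points), covered(b.points, a.points))


def name_score(a: Route, b: Route) -> float:
    """Fuzzy similarity of the concatenated stop names, from 0 to 1."""
    left = " ".join(a.stop_names).lower()
    right = " ".join(b.stop_names).lower()
    return SequenceMatcher(None, left, right).ratio()


def match_score(ours: Route, theirs: Route) -> float:
    # Transport type and geometry act as hard filters; names and numbers only refine.
    if ours.transport_type != theirs.transport_type:
        return 0.0
    geo = geometry_score(ours, theirs)
    if geo < 0.8:  # assumed threshold
        return 0.0
    same_number = ours.number.strip().lower() == theirs.number.strip().lower()
    return 0.7 * geo + 0.2 * name_score(ours, theirs) + 0.1 * float(same_number)


def best_match(ours: Route, candidates: list[Route]):
    """Return the best-scoring supplier route, or None if nothing passes the filters."""
    scored = [(match_score(ours, c), c) for c in candidates]
    score, route = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return route if score > 0 else None
```

In this sketch a supplier route numbered "36A" can still match our route "36" when the geometry and type agree, which is the point of not relying on names and numbers first.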

Even so, matching errors remain, since suppliers have a very large number of route variants: a separate one for each weekday and for each day off. Errors also occur when matching circular routes if they are set up differently at the supplier and in our internal system (Fiji).

Therefore, the final decision still rests with a person: the cartographer can manually cancel a schedule match if they see that the algorithm got it wrong.

Algorithm


The core of the search algorithm is written in C++. In fact, public transport route search is not one algorithm but several. The walk to the nearest stops is computed by our pedestrian routing algorithm, which in turn consists of two algorithms: the “pixel” one (which builds a path across territory that has no road graph) and the regular one (which runs on the pedestrian graph up to the stop).
To search for travel between stops we use a heavily modified A*, to which we added schedule support. Previously, the waiting time at a stop was a kind of “average” time for each project and each type of transport; now either the exact or the interval schedule is taken into account.
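
The difference between the three kinds of waiting time can be sketched as follows. This is only an illustration in Python, not the C++ core: the function name, the parameters, and the choice of half the interval as the expected wait are assumptions. In a schedule-aware A*, a value like this would be added to the cost of boarding a route at a stop.

```python
from bisect import bisect_left


def waiting_time(arrival_s, departures=None, interval_s=None, average_s=600):
    """Estimate the wait at a stop in seconds.

    arrival_s   - moment the passenger reaches the stop, seconds since midnight
    departures  - sorted exact departure times in seconds (exact schedule), if known
    interval_s  - headway in seconds (interval schedule), if known
    average_s   - hypothetical per-city "average" used when nothing else is known
    """
    if departures:
        # Exact schedule: wait for the first departure at or after arrival.
        i = bisect_left(departures, arrival_s)
        if i < len(departures):
            return departures[i] - arrival_s
        return None  # no more departures in this list
    if interval_s is not None:
        # Interval schedule: assume the passenger arrives at a random moment.
        return interval_s / 2
    return average_s


print(waiting_time(100, departures=[50, 130, 200]))  # 30
print(waiting_time(100, interval_s=600))             # 300.0
print(waiting_time(100))                             # 600
```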

The algorithm also had to take into account many amusing nuances in the data. For example, a route may have a departure time from a stop of 25 or even 47 hours. From the data's point of view, this means that it is the same trip that started on the previous day and simply has not finished yet. You also need to account for the fact that a trip may start running “tomorrow”: if the user searches for a route at the end of the current day, you need to look into the next day as well (this matters if you store the schedule by day).
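
GTFS writes such times as `HH:MM:SS` with hours allowed to exceed 24. A minimal sketch of handling both nuances is below; the function names are hypothetical, and for simplicity it assumes that the same list of stop times applies to yesterday, today, and tomorrow, which a real service calendar would not guarantee.

```python
DAY = 24 * 3600


def parse_gtfs_time(text):
    """Convert 'HH:MM:SS' to seconds since the start of the service day.

    Hours may exceed 23, e.g. '25:10:00' is 01:10 on the next calendar day.
    """
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def next_departure(now_s, stop_times, day_offsets=(-1, 0, 1)):
    """Seconds until the next departure, or None.

    now_s is seconds since midnight of the current day. Yesterday's service
    day is checked for trips that run past midnight, and tomorrow's for
    searches made late in the evening.
    """
    candidates = [
        parse_gtfs_time(text) + offset * DAY
        for offset in day_offsets
        for text in stop_times
    ]
    upcoming = [t for t in candidates if t >= now_s]
    return min(upcoming) - now_s if upcoming else None


# 00:30 now, the trip departs at "25:10" of yesterday's service day, i.e. 01:10 today.
print(next_departure(parse_gtfs_time("00:30:00"), ["25:10:00"]))  # 2400
# 23:50 now, the first trip of the day is at 00:05, so it is found "tomorrow".
print(next_departure(parse_gtfs_time("23:50:00"), ["00:05:00"]))  # 900
```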

Separately, we solved the problem of combining data with and without a schedule. In the end, we decided that routes with no timetable still take part in the search; they just get less weight. If a route without a timetable shares its stop platforms with a route that has a timetable, we simply attach it to that result. If it goes some other way, it becomes a separate travel option with less weight, since we do not know the waiting time at the stop.

Since 2GIS works both online and offline, the algorithm runs both inside the application and on the server. Even though the schedule data is more or less static, we still use server-side search, because on slow devices, when the Internet is available, a server request is processed much faster than a local search. For server-side search we have 8 search backends located in three data centers: Novosibirsk, Moscow, and Dronten (the Netherlands).
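
The merging rule for routes with and without a timetable can be illustrated with a small sketch. The `TravelOption` structure and the penalty factor of 1.3 are assumptions made up for the example; the article does not say how the lower weight is actually computed.

```python
from dataclasses import dataclass, field

UNSCHEDULED_PENALTY = 1.3  # hypothetical factor for options with unknown waiting time


@dataclass
class TravelOption:
    platforms: tuple[str, ...]  # sequence of stop platforms used by the option
    travel_time_s: float
    has_schedule: bool
    routes: list[str] = field(default_factory=list)

    @property
    def cost(self) -> float:
        # Lower cost ranks higher; unscheduled options are penalized.
        if self.has_schedule:
            return self.travel_time_s
        return self.travel_time_s * UNSCHEDULED_PENALTY


def merge_options(options):
    """Attach unscheduled routes to scheduled options with the same platforms.

    Note: the scheduled options passed in are modified in place.
    """
    result = [o for o in options if o.has_schedule]
    by_platforms = {}
    for option in result:
        by_platforms.setdefault(option.platforms, option)
    for option in options:
        if option.has_schedule:
            continue
        twin = by_platforms.get(option.platforms)
        if twin is not None:
            twin.routes.extend(option.routes)  # same platforms: show together
        else:
            result.append(option)              # different path: separate, penalized
    return sorted(result, key=lambda o: o.cost)


ranked = merge_options([
    TravelOption(("A", "B", "C"), 1200, True, ["bus 5"]),
    TravelOption(("A", "B", "C"), 1100, False, ["minibus 12"]),
    TravelOption(("A", "D", "C"), 1000, False, ["minibus 44"]),
])
for option in ranked:
    print(option.routes, option.cost)
```

Here "minibus 12" is attached to the "bus 5" option because it uses the same platforms, while "minibus 44" stays a separate option and drops below it despite the shorter nominal travel time.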

Release


You can evaluate the final result of adding the public transport timetable to 2GIS in our mobile application on Google Play and the App Store. The web version will appear a little later.

After the release, we received quite a lot of feedback. Having analyzed the negative responses, we identified two main causes of complaints:

  1. We did not properly tell users that the timetable is now used in route search, and this broke their usual scenarios for working with the application.
    When searching for a route in the evening or at night, users lost their usual routes from the results. The control for choosing the date and time of travel when building a route went unnoticed.

    Most of the calls to technical support looked like this:

    - Hello, your public transport route search has become inadequate, because... / description of a specific problem.
    - You know, we have released the schedule in public transport route search. Here is the control where you can set the time for which the trip is planned.
    - I see, thank you very much!
  2. The algorithm tried not to offer routes without a schedule (or ranked them lower in the results) if there was an alternative with a schedule. Because of this, in some cases the results became less relevant.

    Our technologists had to urgently put together an interval schedule by hand for all the remaining modes of transport to bring them back into the results, and we had to do additional tuning of the search algorithms.

Captain Obvious conclusions


What conclusions can be drawn from the launch?


P.S. About completeness


We have not yet managed to reach an agreement with every Department / Ministry of Transport, so for now the schedule is available only in Moscow, St. Petersburg, Novosibirsk, Yekaterinburg, Krasnoyarsk, Omsk, Chelyabinsk, Krasnodar, and Rostov-on-Don. We will increase timetable coverage in these cities and add new cities as we receive data.

Source: https://habr.com/ru/post/343924/

