Data-driven city design. A lecture at Yandex
Below the cut is a transcript of a lecture by Andrei Karmatsky. For a long time he headed design for Yandex's geo-information services, and then founded Urbica, a company that analyzes and processes urban data. Andrei gives examples of how a data-driven approach helps improve urban services. The lecture was given at the event "Data & Science: the city".
Most of the slides are also under the cut.
I am Andrei Karmatsky, and I lead the company Urbica. It was originally conceived as a data visualization studio, but we realized that working with data is not about making beautiful pictures.
I have not found a good translation for this word. For those who know English well, “tinkering” is something akin to soldering or repairing a Primus stove: to get a beautiful picture for a slide, you have to do a lot of tricky hacks and experiment with the data. It is fashionable now to talk about machine learning, but ultimately that also comes down to tuning parameters and finding a model, so it seems we are very similar in this respect. This is what our team is constantly doing: we work in an experimental, applied engineering process where something is soldered, cooked and tweaked over and over, and then suddenly something works.
I want to share a story that inspires me a lot. It comes from the end of the 19th century and a remarkable industrialist, Charles Booth. In case someone has not heard of the London poverty map, I will briefly retell the story. Booth, a fairly successful entrepreneur, decided to try to tackle the problem of poverty in England at the end of the 19th century. What did he do?
He assembled a team to study the urban environment. For 17 years his assistants walked around the city with special notebooks in which they recorded their observations. They lived with families, interviewed people, and tried to understand how the city lived and what problems and difficulties it faced with regard to poverty.
Ultimately, Booth published two maps ten years apart. All the data the team had collected in notebooks and notes was put on the map. The map shows a parameter that we would now call the "income level of the population". No one had done this before. On the map it is very clear that the black houses are poor residents and the red houses are the well-to-do, wealthy stratum. It showed very clearly how stratified society was and how close to each other these groups lived. Bear in mind that the end of the 19th century was not a pleasant time: first, it was the era of industrialization, steam engines, soot and dirt; second, there were unsanitary conditions. Different strata living side by side meant a higher crime rate and problems for public health in general. If you want to look at it in detail, the London poverty map is one of the fundamental, or at least most inspiring, stories, because it was followed by a number of significant legislative changes in Parliament. The introduction of the old-age pension, among other things, is directly attributed to the chain of events that stemmed from this study.
While working with data, I came to one thought... This is not an axiom; I just wrote two words and put a taxi-order visualization in the background so the slide would not be so boring. It is just a visualization. The important thing is that when we do data projects, there are always two crucial ingredients. First, to get anything done you need to ask yourself: what problem do we want to solve, why are we doing this, and what should the result be? Second, judging by the heap of questions to the previous speaker, there are always questions about the data: "Where is the data? What do we need to explore?" We keep seeing two or three situations. The ideal one is when we have a good task and good, structured data, for example data from mobile operators. It is neatly laid out in CSV and everything is great. We can process it all and get some kind of answer.
Another situation is when we have only a problem and no data. The third situation relates to the background of this slide: "We have great data. Let's spin something up, it will be fun." The result is a cool but useless visualization.
When there is a lot of data and it is well structured... Many of you work with data, so you know perfectly well that you can easily get to the stage where patterns become visible. For example, take a well-structured dataset of bike trips in New York (there is open data on rentals and stations): we can see patterns and find out, for example, where people ride or how bikes are rebalanced. After the break my colleague will talk in detail about the Moscow “Velobike” system, where things are less obvious: there is no such rich dataset that reveals the patterns, so the task becomes many times more interesting.
Nevertheless, our experiment with the New York open data illustrates, among other things, what Egor said. There are a number of stations, and if we take each station's average occupancy over the course of a weekday and group the stations into clusters, we see three completely clear patterns. Some stations are full of bicycles in the morning and empty during the day, while others work the other way around. What does this mean?
This is nothing other than commuting, the daily pendulum migration to work and back.
People who live, for example, somewhere on the East Side and work in central Manhattan take a bike and ride to work. This is clearly visible, and such patterns are easy to trace when the data is plentiful and well structured: we can take a dataset and move very quickly to the stage where patterns appear. From there, there is room for creativity with the data.
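To make the station patterns a little more concrete, here is a minimal sketch. It is not the method used in the talk: instead of clustering, it uses a simple rule that compares a station's morning and midday fill levels. The hour windows, the threshold and the three toy profiles are all assumptions made up for illustration.

```python
from statistics import mean

MORNING = range(7, 10)   # hours 7, 8, 9 (assumed window)
MIDDAY = range(11, 17)   # hours 11..16 (assumed window)


def classify_station(fill_by_hour, threshold=0.2):
    """Label a station by how its average weekday fill level changes.

    fill_by_hour: 24 values in [0, 1], the share of docks occupied per hour.
    threshold: minimal difference treated as a real pattern (assumption).
    """
    morning = mean(fill_by_hour[h] for h in MORNING)
    midday = mean(fill_by_hour[h] for h in MIDDAY)
    if midday - morning > threshold:
        return "fills during the day"    # bikes arrive: looks like a work district
    if morning - midday > threshold:
        return "empties during the day"  # bikes leave: looks like a residential district
    return "balanced"


def profile(night, day):
    """Toy 24-hour profile: one fill level from 10:00 to 17:59, another otherwise."""
    return [day if 10 <= h < 18 else night for h in range(24)]


# Hypothetical stations, not real data.
stations = {
    "residential": profile(0.9, 0.2),
    "office": profile(0.1, 0.8),
    "mixed": profile(0.5, 0.55),
}
labels = {name: classify_station(p) for name, p in stations.items()}
print(labels)
```

With real trip data, the profiles would first have to be computed from rentals and returns per station and hour, and a clustering method would probably replace the fixed rule.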
What do we do when we have no data? That is much more interesting to talk about, right? The simplest thing is to go and buy the data. If you have a lot of money, you go and buy it. If you do not know where to buy data, you find someone who does. They take a percentage and find or collect the data, structure it, and so on.
In our modest experience there was one such project... Last year, as some of you may know, there was an open competition for the reconstruction of Soviet cinemas. A developer bought 39 old Moscow cinemas, built in the 1960s, from the city and committed not to turn them into shopping centers but to give them a socially significant function. These sites are scattered across the city, right inside residential districts, and the idea was to make them more useful than yet another shopping center. The aim is of course commercial success, but a certain social function still had to be fulfilled. So the developer announced a competition along the lines of "Let's think of something." We helped one of the competing teams, Aventica and Nowadays Office, to process the data and propose functional content.
What has to be considered in this case? Simplified as far as possible, the conditions are very close to a geomarketing task. If you have heard of geomarketing or retail network development, what do we need to do? We need to understand demand at a particular location. That means either BTI data or, for example, the number of residents according to mobile operators. In this case we simply did the calculation from the housing and utilities reform data: there is an open dataset, and knowing the average number of people per household you can estimate how many people live in an area. And here is an important detail: the circle is completely wrong, because all it describes is a radius of one and a half kilometers. This example is Ryazansky Avenue, the "Sunrise" cinema. The real walking zone there reaches only a little beyond Ryazansky Avenue. If you know the east of Moscow well, there is a railway line and Kuskovo Park, and Ryazansky Avenue divides the territory. There are also industrial zones, so the real walking distance is completely different.
That is, there are not 52 thousand people within a 15-minute walk but 20% fewer, because a real city is not an open field where you can walk in any direction. It is very important to talk about walking areas that take the topology into account. I will explain in more detail with the next example.
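The difference between a circle and a real walking area can be sketched roughly as follows. This is an illustration under assumptions, not the calculation from the project: the blocks, their populations, the walkable links and the 1500 m limit are invented toy values, and the shortest walking distances are found with a plain Dijkstra search.

```python
import heapq
import math


def walking_distances(graph, source):
    """Dijkstra: shortest walking distance in metres from source to each reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry, a shorter way was already found
        for neighbour, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(neighbour, math.inf):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return dist


# Hypothetical blocks: (x, y) offset from the cinema in metres, and residents.
blocks = {
    "cinema": ((0, 0), 0),
    "north": ((0, 800), 12000),
    "east": ((900, 0), 15000),
    "south": ((0, -700), 10000),  # close on the map, but behind a railway line
}
# Hypothetical walkable links with lengths in metres. The railway can only be
# crossed far to the east, so "south" is a 2600 m walk from the cinema.
edges = [("cinema", "north", 850), ("cinema", "east", 950), ("east", "south", 1650)]

graph = {}
for a, b, length in edges:
    graph.setdefault(a, []).append((b, length))
    graph.setdefault(b, []).append((a, length))

LIMIT = 1500  # metres, the same value as the radius of the naive circle
dist = walking_distances(graph, "cinema")

in_circle = sum(pop for (x, y), pop in blocks.values() if math.hypot(x, y) <= LIMIT)
on_foot = sum(pop for name, (_, pop) in blocks.items()
              if dist.get(name, math.inf) <= LIMIT)
print(in_circle, on_foot)
```

In this toy setup the circle counts all three blocks, while the walking network drops the one behind the railway, which is the same kind of overestimate the talk describes.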
The second dataset is supply. Where supply and demand intersect, we can propose something, or see some dependency and work out how effective the site will be. Our task was complicated further by the fact that we were not given any data, so we could only use open data. The task was to propose functional content: what should the function of the site be?
It is one thing when we are building a hardware store or an Auchan. We know very well that we are interested in a fairly narrow range of services, so we can look at the competitive environment, see how well this need is already being met in the area, and draw some conclusions.
What do we do when that is not the case? Clearly, you can go through the options one way or another. In our case we had not machine learning but human learning, in the form of the analysts who did the research.
We did a super simple thing: we bolted a small interface onto the Yandex.Maps API, which has a directory of organizations, and made it possible to simply type in different queries. Here is a good example that came out of it. One of the sites is located near the Voykovskaya metro station. As you may know, the huge "Metropolis" shopping center is there. What function could you possibly propose when, two minutes away, there is a machine that has everything: food, clothes, whatever else you might want? Yet we needed a socially useful function that would still be commercially interesting to the developer.
Through this kind of search we found that the sports function is not covered there: there are no good fitness centers within a 10-15-minute walk of Voykovskaya. I will say right away that it did not all turn out super beautifully: we did not win this competition, we took third place. We were praised for good analytics, but there was also an architectural concept and a number of special requirements. Nevertheless, we proposed turning the cinema into a kind of sports facility where you can practice yoga, or a space for holding classes. That is one example.
Another example of our experiments with data, and with its absence, confirms the saying that necessity is the mother of invention. It is an analysis of the urban environment for pedestrians. This was one of the ideas we had when the studio started, and I really wanted to develop it. We now work on it as a pet project because it interests us: collecting data on the quality of the urban environment for pedestrians. Of course, thanks to the "My Street" program the sidewalks are getting wider and the environment more comfortable. But I was really hooked by one talk at TEDx, “Happy Maps” by Daniele Quercia, which resonated with me. He said that you go to work from point A to point B by the same route and may not suspect that the route along the next street is much happier and costs you just one or two extra minutes. So it became interesting to try to make the city more pleasant for a person: add that minute, but also add a little happiness. Clearly, happiness cannot be measured and does not fit into charts, but I wanted to try this out at the level of sensations. So we started working actively on this project. And of course, with some expertise in maps...
Here is one simple example. We collect data about greenery and about noise, and each dataset comes from a different open source, since this is an experiment in a field that has no open data.
Take city noise. It clearly affects your health; there is a lot of research on sleep disturbance at night from construction sites and so on. But no one in Moscow currently measures city noise systematically, and even in many countries that pay attention to data, noise is usually measured reactively, for example when there are complaints about construction sites. There is no uniform coverage that shows how the city's noise landscape is arranged.
So the simplest idea was this: take an app. You have a phone, and it has a microphone. In fact it has two, which are even used for noise cancellation, and there are already apps that measure noise level. There was a cool app called InstaDB. Not only does it measure noise, it also posts a photo of the street to Instagram. I created a special account and shot just under 200 points. After processing them, I found that there are classes of streets where you can talk comfortably and quietly, and classes of streets like Vernadsky Avenue or Bolshaya Yakimanka Street, where it is 80-85 decibels and talking is absolutely impossible. So even this experimental dataset shows some patterns. We can replicate them on a model of the city, for example using OpenStreetMap data.
I will say right away that we came across a guy from Sweden who had done this. He published an open model in which road types are assigned numerical indices describing the likely noise level on them. Strictly speaking, the science says the noise landscape should be modelled from traffic intensity; in other words, you need to know how many cars travelled along each edge. Unfortunately we did not have that data, so we continued with the experiment. Clearly, there may be many questions about the accuracy of such a model.
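A road-type noise model of this kind could be sketched roughly like this. It is an illustration, not the Swedish model or the studio's code: the sample readings are invented, and only the `highway` tag values are real OpenStreetMap categories. One detail is worth showing: decibels are logarithmic, so readings are averaged on the energy scale rather than arithmetically.

```python
import math
from collections import defaultdict


def energy_mean_db(levels):
    """Average decibel readings on the energy scale (decibels are logarithmic)."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels) / len(levels))


# Hypothetical phone measurements: (OSM highway tag of the street, level in dB).
samples = [
    ("primary", 82), ("primary", 85),
    ("residential", 58), ("residential", 62),
    ("footway", 50),
]

by_type = defaultdict(list)
for tag, db in samples:
    by_type[tag].append(db)

# One typical level per road type, derived from the measured points.
noise_by_type = {tag: energy_mean_db(levels) for tag, levels in by_type.items()}


def estimate_noise(way, default=None):
    """Assign the typical level of its road type to an OSM-like way."""
    return noise_by_type.get(way["highway"], default)


# Hypothetical ways; a type without any measurements stays unknown.
ways = [{"id": 1, "highway": "primary"}, {"id": 2, "highway": "service"}]
for way in ways:
    print(way["id"], estimate_noise(way))
```

Such a lookup ignores actual traffic intensity, which is exactly the accuracy concern raised above.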
Wondering how to refine the model, we again decided to try a very simple thing: use an Arduino and various improvised parts to assemble a small sensor that sends the noise level to your phone over Bluetooth or Wi-Fi every second. We are still experimenting, because the hardware turned out to be the hardest part. We are somewhere on the edge between measurement accuracy and budget. Professional devices exist but are expensive, so we have to stay within reasonable limits, for example to make 10-20 devices for interested people so that we can measure at least the city center.
As for why this data is collected: you can, for example, help city dwellers build routes that take these parameters into account, so that it helps you personally. The city may be interested in this data for improvement work, but every resident can also simply use the app to see where it is quiet and where it is noisy. We are now finishing a beta version of the app; I think it will make it to the App Store. It simply shows how to walk along quiet, green streets. The data on greenery was shared with us by the guys from Greenpeace, and the air quality data by the company Aerostate. So you can guide the user along the nicest routes. Of course, this is only one possible application of this dataset.
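Routing along quieter streets can be sketched as an ordinary shortest-path search in which noisy edges cost more than their length. This is only a guess at the general idea, not the app's algorithm: the cost formula, the 55 dB comfort threshold and the three-street network are assumptions.

```python
import heapq


def best_route(edges, start, goal, noise_weight=0.0):
    """Cheapest path where each edge costs length * (1 + noise penalty).

    edges: undirected (a, b, length_m, noise_db) tuples.
    noise_weight: 0 gives the shortest path; larger values avoid loud streets.
    The 55 dB threshold below which noise is not penalised is an assumption.
    """
    graph = {}
    for a, b, length, noise in edges:
        cost = length * (1 + noise_weight * max(0, noise - 55) / 10)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, step in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return None  # goal is not reachable


# Hypothetical streets: a loud direct avenue and a quiet detour 100 m longer.
streets = [("A", "B", 500, 80), ("A", "C", 350, 50), ("C", "B", 250, 52)]

shortest = best_route(streets, "A", "B")
quiet = best_route(streets, "A", "B", noise_weight=1.0)
print(shortest, quiet)
```

Greenery or air quality could be added to the edge cost in the same way, each with its own weight.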
Have you seen the blue buses? Honestly, a year ago I knew nothing about transport professionally. It so happened that Moscow decided to optimize its route network. In the city center this has already been done, under the name “Magistral”. I want to talk about our part in that process.
To begin with, as I said, we knew nothing about transport planning. There are special people who have been doing this for many years. Still, transport planning is one of the notable applications with complex models and large datasets. Buses carry GLONASS trackers, and ticket validations are recorded. So there are various large volumes of data, structured to varying degrees.
It is also worth dispelling the myth that in central Moscow a metro station is always within a five-minute walk. This is not true. We have already talked about walking accessibility zones. Compare with Paris: there is a noticeable difference.
What did we do? We worked roughly the way these London Underground employees did. The Moscow transport system has evolved historically over quite a long time. Clearly there are no ideal systems, and I would not say we did exactly the same thing: we did not count ticket stubs to understand who travels where. Nevertheless, we managed... I think we saw everything. Or not everything, but what we saw was enough that we are no longer afraid of anything, and could go to the circus without cracking a smile.
We took the various data that existed. These guys, transport planners, gathered at a workshop where they were optimizing the route network. We made a small tool for them in which all this data was presented in a unified form, so you can see, for example, how demand intersects with supply and how transport moves, quickly or slowly. The GLONASS signals turned into a very familiar picture, similar to Yandex.Traffic but specifically for public transport, so that you can see, for example, where buses on average stand idle at traffic lights.
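Finding the places where a bus stands still from its tracker signal might look roughly like this. It is a sketch under assumptions, not the tool from the talk: each ping is simplified to a time and a distance along the route, and the 100 m cell size and the 1 m/s speed threshold are made up.

```python
from collections import defaultdict


def idle_seconds_by_cell(pings, cell_m=100, max_speed=1.0):
    """Sum the time a bus moves slower than max_speed (m/s), per route cell.

    pings: (time_s, distance_along_route_m) pairs for one trip, sorted by time.
    Returns {cell_index: idle_seconds}, where a cell is cell_m metres of route.
    """
    idle = defaultdict(float)
    for (t0, s0), (t1, s1) in zip(pings, pings[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order pings
        if (s1 - s0) / dt < max_speed:
            idle[int(s0 // cell_m)] += dt
    return dict(idle)


# Hypothetical trip: the bus crawls for 30 seconds around the 160-170 m mark.
trip = [(0, 0), (10, 80), (20, 160), (30, 165), (40, 165), (50, 170), (60, 260)]
idle = idle_seconds_by_cell(trip)
print(idle)
```

Averaging such results over many trips would show where buses typically lose time; separating traffic lights from scheduled stops would need the stop locations as well.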
Briefly about the task and what these guys were doing.
They wrote down some of the principles they followed when optimizing the route network. And it is absolutely clear that high passenger flow and high density mean high demand.