It's no secret that you can find photos of any large city on Instagram. What if we try to reconstruct the whole picture from these fragments? The result gives an impression of unfamiliar places and can be useful to travelers as a complement to traditional travel guides.
The idea of analyzing cities through photos is not new [1, 2, 3], but, to be honest, the articles we found do not tell much.
How to collect data from Instagram has been covered several times and is outside the scope of this article. Some of it can be pulled through the API; if you have no access to it, there are alternatives.
Our main tools are Python and Plotly. At the end of the article there are links to GitHub and to Jupyter notebooks for different cities (the charts there are interactive and therefore contain more information, so looking at them directly is recommended). The repository also includes the data collection scripts.
In this article we will take a walk around Berlin. The dataset used here contains about 100k photos from roughly 2k locations.
For each location we need:
- Title
- Coordinates
- Photos (10-100 pcs.)
- Number of posts (the variable edge_location_to_media on the location page; it is not documented, but judging by indirect evidence it means exactly that)
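As an illustration, one location record could be held in a small data class like the one below. This is only a sketch: the class and field names are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Location:
    """One Instagram location (hypothetical layout, for illustration only)."""
    title: str
    lat: float
    lng: float
    post_count: int  # value of edge_location_to_media for this location
    photo_paths: List[str] = field(default_factory=list)  # 10-100 downloaded photos
```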
First walk
We put the data on the map. To highlight the liveliest places, we merge locations on the same street into one marker. The maps are drawn with Mapbox.
Map of Berlin. Markers show the number of locations.
Let's get acquainted with the main place names. To do this, we need to convert the coordinates of places into addresses, which is the reverse geocoding task. The Google Geocoding API was used to solve it. After collecting the geodata, we sort the streets and districts by the number of locations.
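The reverse geocoding step might look roughly like the sketch below. It builds a request URL for the Geocoding API and pulls a street and a district out of a decoded JSON response. The function names are hypothetical, and treating the "sublocality" component as the district is an assumption that should be checked against real responses for Berlin.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_reverse_geocode_url(lat, lng, api_key):
    """Build a reverse geocoding request URL for one coordinate pair."""
    query = urlencode({"latlng": f"{lat},{lng}", "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def extract_street_and_district(response):
    """Return (street, district) from a decoded Geocoding API response.

    The street is the first "route" component; the district is assumed
    to be the first "sublocality" component. Missing parts stay None.
    """
    street = district = None
    for result in response.get("results", []):
        for component in result.get("address_components", []):
            types = component.get("types", [])
            if street is None and "route" in types:
                street = component["long_name"]
            if district is None and "sublocality" in types:
                district = component["long_name"]
    return street, district
```

Fetching the URL (for example with urllib.request) and decoding the JSON body is left out so that the parsing logic can be tried on saved responses.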
For cities such as Moscow, information about districts is not very important, since everything in the center is roughly the same. Berlin is more heterogeneous, so it is useful to distinguish, for example, Kreuzberg from Prenzlauer Berg.
Look at the list of places sorted by popularity.
Top locations
location, edge_location_to_media
Alexanderplatz Berlin, 695533
East Side Gallery, 537034
Brandenburger Tor, 525004
Berliner Dom, 411376
Berlin Kreuzberg, 364077
Berlin Mitte, 340891
Memorial to the Murdered Jews of Europe, 251433
Berlin Wall, 228749
Kreuzberg Berlin Germany, 218383
Potsdamer Platz, 182316
Checkpoint Charlie, 171895
Brandenburg Gate, 143530
Mercedes Benz Arena Berlin, 143498
Zoo Berlin, 140465
Berlin Hauptbahnhof, 138153
Gendarmenmarkt Berlin, 114615
Berliner Fernsehturm, 106127
Friedrichshain, 104376
Reichstag dome, 101895
Berlin Germany, 97402
East Side Gallery Berlin Wall, 96385
Jüdisches Museum Berlin Jewish Museum Berlin, 94647
Berlin the place to be, 92444
FAR AWAY, 91062
Berlin Reichstag, 90945
Museum Island, 84010
Potsdamer Platz Berlin, 80733
Hamburger Bahnhof Museum für Gegenwart Berlin, 79323
Kurfürstendamm, 75632
KaDeWe, 73312
Pergamonmuseum, 71524
Tempelhofer Feld, 70472
Azad Gence, 69566
Reichstag building, 69028
Tiergarten Berlin Germany, 65391
Berghain Panorama Bar, 60807
Mall of Berlin, 60718
Schöneberg Berlin Germany, 60482
Tiergarten Berlin, 60210
Hackescher Markt, 59899
Klunkerkranich, 59661
Berlin Victory Column, 57304
Berlin Prenzlauer Berg, 56705
Madame Tussauds Berlin, 55351
Hackesche Höfe, 55183
Bikini Berlin, 50920
Alexanderplatz, 48875
Alte Nationalgalerie, 48346
Museum für Naturkunde Berlin, 46786
The Wall Of Berlin, 46708
NENI Berlin Monkey Bar, 44770
Flughafen Berlin Tempelhof, 44197
Columbiahalle, 43717
Brandenburger Tor, 43484
Berlin Germany, 42739
Warschauer Straße, 41897
Reichstag, 41321
Berlin Holocaust Memorial, 39930
Brandebourg Tor Berlin, 38949
Berlinische Galerie, 37947
Sony Center, 37539
Berliner Philharmonie, 37431
Konzerthaus Berlin, 36905
Tempodrom, 35982
Berlin Mitte, 35895
Friedrichshain, 34693
Urban Spree, 34613
Kraftwerk Berlin, 34392
Bode Museum, 34205
Bundestag, 33998
SONY Center Berlin am Potsdamer Platz, 33628
Berlin Brandenburger Tor, 33098
Brandenburger Tor, 32857
Berlin Zoological Garden, 32718
Deutsches Historisches Museum, 32604
Humboldt Universität zu Berlin, 32308
C/O Berlin, 32294
Astra Kulturhaus Berlin, 30082
Badeschiff Berlin, 30007
Markthalle Neun, 29989
Michelberger Hotel, 29444
Altes Museum, 29009
Hotel Adlon Kempinski Berlin, 28889
Mauerpark, 28282
YAAM Berlin, 27925
Mitte, 27681
Hofbräu Berlin, 27561
Huxleys Neue Welt, 27546
Oberbaum Bridge, 27131
Friedrichstadt Palast Berlin, 27009
STATION Berlin, 26816
Velodrom Berlin, 26385
Moabit, 26350
Neues Museum, 26346
Gedächtniskirche, 26316
The list mixes "formal" places (monuments, museums, galleries) with "informal" ones (clubs, bars, shops). To separate them, we need data from Wikipedia; unlike Instagram, its API is fully available to everyone. On one axis we plot the number of Instagram posts, on the other the number of views of the Wikipedia article about the place. In this chart, more “formal” places sit higher and more popular ones sit further to the right.
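The article does not say which Wikipedia endpoint was used. One option is the Wikimedia pageviews REST API, sketched below under that assumption. The helper names, the English-language project, and the monthly granularity are illustrative choices, not the author's.

```python
from urllib.parse import quote

PAGEVIEWS_URL = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_pageviews_url(title, start, end, project="en.wikipedia"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    article = quote(title.replace(" ", "_"), safe="")
    return f"{PAGEVIEWS_URL}/{project}/all-access/user/{article}/monthly/{start}/{end}"


def total_views(response):
    """Sum the "views" field over all items of a decoded pageviews response."""
    return sum(item["views"] for item in response.get("items", []))
```

The total for an article then becomes the vertical coordinate of the corresponding point, and the Instagram post count becomes the horizontal one.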
To reduce errors, we group locations by street, as on the map. Some data is lost when matching articles to locations, so the chart contains fewer points than the map.
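A minimal sketch of this grouping, using only the standard library, is shown below. The dictionary keys are hypothetical, and summing the values per street is an assumption, since the text does not say how the values are combined within a group.

```python
from collections import defaultdict


def aggregate_by_street(locations):
    """Sum Instagram posts and Wikipedia views per street.

    `locations` is an iterable of dicts with the hypothetical keys
    "street", "posts" and "wiki_views". Locations without a street or
    without a matched article are dropped, which is why the chart ends
    up with fewer points than the map.
    """
    totals = defaultdict(lambda: {"posts": 0, "wiki_views": 0})
    for loc in locations:
        if loc.get("street") is None or loc.get("wiki_views") is None:
            continue
        bucket = totals[loc["street"]]
        bucket["posts"] += loc["posts"]
        bucket["wiki_views"] += loc["wiki_views"]
    return dict(totals)
```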
“Insta-wiki” diagram. The most significant places are labeled for several streets. For more details, see the notebook.
Where should you go to take a selfie? Let's estimate the share of photos that contain faces. OpenCV and a Haar cascade will help us here.
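The face share could be estimated along these lines. This is a sketch, not the repository's code: it assumes the frontal-face cascade bundled with the opencv-python package, and the detection parameters are common defaults rather than tuned values. The function names are hypothetical.

```python
def make_haar_face_detector():
    """Return a function image_path -> bool based on an OpenCV Haar cascade."""
    import cv2  # imported lazily so face_share() can be used without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )

    def has_face(image_path):
        image = cv2.imread(image_path)
        if image is None:  # missing or unreadable file
            return False
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        return len(faces) > 0

    return has_face


def face_share(image_paths, has_face):
    """Fraction of images for which has_face(path) is true (0.0 if empty)."""
    paths = list(image_paths)
    if not paths:
        return 0.0
    return sum(1 for path in paths if has_face(path)) / len(paths)
```

Calling face_share(paths, make_haar_face_detector()) on the photos of one location gives its horizontal position in the chart. Keeping the detector as a parameter makes it easy to swap in another face detector later.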
Share of photos containing faces. Points on the right side of the chart are popular selfie spots (or vanity fairs).
Going deeper
Next, we apply a neural network to determine the setting shown in each photo. We used the Places365 CNN, trained on a dataset collected at MIT [4]. The tags most relevant to this task were selected. Let's find out which of them are the most common:
Tag rating. The names are left as in the original. They should not be taken literally: martial_arts_gym is more likely just a gym, and the discotheque tag may simply mark a dark room.
Let's see which tags correspond to which streets:
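Once every photo has a predicted tag, both the overall rating and a per-street tag reduce to counting. The sketch below assumes the input is a list of (street, tag) pairs, one per photo, and takes the most frequent tag as the characteristic one. That is a simplification; the article does not define how "characteristic" is measured.

```python
from collections import Counter, defaultdict


def tag_rating(photo_tags):
    """Return (tag, count) pairs over all photos, most common first."""
    return Counter(tag for _, tag in photo_tags).most_common()


def top_tag_by_street(photo_tags):
    """Return the most frequent tag for each street."""
    per_street = defaultdict(Counter)
    for street, tag in photo_tags:
        per_street[street][tag] += 1
    return {
        street: counts.most_common(1)[0][0]
        for street, counts in per_street.items()
    }
```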
The same tags on the map:
Map of Berlin with the most characteristic tags. Note the discotheque tag on the right: this is Friedrichshain, a district with a vibrant nightlife.
Hello, Hallo, Hola
One way to learn something about a new city is to compare it with one you already know. We take the feature vectors of locations in two cities and use t-SNE to get two-dimensional coordinates. For clarity, locations that fall inside the other city's region are hidden in the figure.
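A sketch of the projection step follows. The important detail is that both cities are embedded in a single t-SNE run, because coordinates from separate runs are not comparable. The function name and the `reducer` parameter are hypothetical; by default the sketch assumes scikit-learn's TSNE, whose default perplexity requires more than 30 locations in total.

```python
def embed_two_cities(features_a, features_b, reducer=None):
    """Project feature vectors of two cities into one shared 2-D space.

    Returns a list of (city_label, x, y) tuples, city "a" first.
    `reducer` maps a list of vectors to a sequence of (x, y) pairs.
    """
    if reducer is None:
        # Lazy imports: only needed when no custom reducer is supplied.
        import numpy as np
        from sklearn.manifold import TSNE

        def reducer(rows):
            data = np.asarray(rows, dtype=float)
            return TSNE(n_components=2, random_state=0).fit_transform(data)

    features_a, features_b = list(features_a), list(features_b)
    stacked = features_a + features_b  # one joint run for both cities
    labels = ["a"] * len(features_a) + ["b"] * len(features_b)
    coords = reducer(stacked)
    return [(label, float(x), float(y)) for label, (x, y) in zip(labels, coords)]
```

The labels can then be used to color the points by city when plotting.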
Comparison of locations in Berlin and Moscow. Labels indicate the dominant feature in a given region. Clusters of different colors located close together indicate points of contact between the cities, i.e. similar places.
Let's look at the difference in features:
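One simple way to compute such a difference is to subtract the mean feature vectors of the two cities, as sketched below. Using the mean is an assumption (the article does not state the aggregation), and the function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty collection of equal-length vectors."""
    vectors = list(vectors)
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def feature_difference(features_a, features_b, names):
    """Return (name, mean_a - mean_b) pairs, largest difference first."""
    mean_a = mean_vector(features_a)
    mean_b = mean_vector(features_b)
    diffs = {name: a - b for name, a, b in zip(names, mean_a, mean_b)}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)
```

Positive values mark features that are more typical of the first city, negative values those more typical of the second.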
The difference in features between Berlin and Moscow. It seems that in our capital people take photos in gyms and fitting rooms more often.
Articles
1. How to Study the City on Instagram
2. What We Instagram: A First Analysis Of Instagram Photo Content and User Types
3. Zooming into an Instagram City: Reading the local through social media
4. Places: A 10 million Image Database for Scene Recognition
Notebooks
Tula, Moscow, St. Petersburg, Berlin, Rome, Hong Kong
GitHub
github.com/pskryuchkov/voyage