
I'll track you down through the networks: using the APIs of the largest social networks for my own selfish purposes



It's no secret that modern social networks are huge databases containing a lot of interesting information about the private lives of their users. You won't pull much data through the regular web interface, but each network has its own API... So let's see how it can be used to find users and collect information about them.

American intelligence has a discipline called OSINT (open source intelligence), which is responsible for finding, collecting, and selecting information from publicly available sources. Social networks are among the largest providers of such information. After all, almost every one of us has an account (some have more than one) in one or more social networks. Here we share our news, personal photos, and tastes (by liking posts or joining groups), and reveal our circle of acquaintances. We do this of our own free will, hardly thinking about the possible consequences. This magazine has repeatedly covered how various tricks can be used to pull interesting data out of social networks, but usually that meant performing some manipulation by hand. For serious reconnaissance it is far more practical to use specialized utilities. There are several open source tools that can pull information about users from social networks.

Creepy


One of the most popular is Creepy. It is designed to collect geolocation information about a user based on data from their Twitter, Instagram, Google+, and Flickr accounts. The advantages of this tool, which ships with Kali Linux, are an intuitive interface, a very convenient process for obtaining API tokens for the services, and the display of results as markers on a map (which, in turn, lets you follow all of the user's movements). I would call its rather weak functionality a disadvantage. The tool can collect geotags from the listed services and display them on a Google map, show who retweeted the user and how many times, and compute statistics on the devices from which tweets were posted and on the times of their publication. But since this is an open source tool, you can always extend its functionality yourself.
We won't go over how to use the program here: everything is clearly shown in the official video, after which no questions about working with the tool should remain.



fbStalker


Two more tools that are less well known but have strong functionality and deserve your attention are fbStalker and geoStalker.

fbStalker is designed to collect information about a user based on their Facebook profile and can pull a variety of data from it.

To use this tool you will need Google Chrome and ChromeDriver, which is installed as follows:

    wget http://goo.gl/Kvh33W
    unzip chromedriver_linux32_23.0.1240.0.zip
    cp chromedriver /usr/bin/chromedriver
    chmod 777 /usr/bin/chromedriver

In addition, you will need Python 2.7 and pip, to install the following packages:

    pip install pytz
    pip install tzlocal
    pip install termcolor
    pip install selenium
    pip install requests --upgrade
    pip install beautifulsoup4

Finally, you need a library for parsing GraphML files:

    git clone https://github.com/hadim/pygraphml.git
    cd pygraphml
    python2.7 setup.py install

After that, edit `fbstalker.py`, entering your email, password, and username there, and you can start the search. Using the tool is quite simple:

    python fbstalker.py -user [username]

geoStalker


geoStalker is much more interesting. It collects information around the coordinates you give it from a variety of sources (judging by its dependencies: Google services, Instagram, Twitter, Foursquare, LinkedIn, and others).

To use the tool, as in the previous case, you will need Chrome with ChromeDriver, Python 2.7, and pip (to install the following packages: google, python-instagram, pygoogle, geopy, lxml, oauth2, python-linkedin, pygeocoder, selenium, termcolor, pysqlite, TwitterSearch, foursquare), as well as pygraphml and gdata:

    git clone https://github.com/hadim/pygraphml.git
    cd pygraphml
    python2.7 setup.py install
    wget https://gdata-python-client.googlecode.com/files/gdata-2.0.18.tar.gz
    tar xvfz gdata-2.0.18.tar.gz
    cd gdata-2.0.18
    python2.7 setup.py install

After that, edit `geostalker.py`, filling in all the required API keys and access tokens (if these are not specified for some social network, it simply will not take part in the search). Then run the tool with `sudo python2.7 geostalker.py` and specify an address or coordinates. As a result, all the data is collected, placed on a Google map, and saved to an HTML file.

Getting down to business


So far we have been talking about ready-made tools. In most cases their functionality will not be enough, and you will have to either extend them or write your own: every popular social network provides an API. It usually lives on a separate subdomain that accepts our GET requests and returns XML/JSON responses. For Instagram that is `api.instagram.com`, for VKontakte it is `api.vk.com`. Of course, most of these APIs have client libraries, but we want to understand how things work ourselves, and bloating the script with external libraries for the sake of one or two functions is not comme il faut. So let's write our own tool that searches for photos from VK and Instagram by given coordinates within a given time interval.
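To get a feel for the format before writing any real logic, here is a minimal sketch: a bare GET request to `utils.getServerTime`, a trivial VK method that historically could be called without a token in the old unversioned API this article relies on (if that ever changes, any other open method will do):

    import urllib2
    import json

    # A bare GET request to the VK API; the answer comes back as JSON.
    # utils.getServerTime simply returns the current server time as a Unix timestamp.
    response = urllib2.urlopen('https://api.vk.com/method/utils.getServerTime')
    print json.load(response)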

Using the API documentation for VK and Instagram, we put together requests for a list of photos by geographic position and time.

Instagram API Request:

    # timestamps are ints, so wrap them in str() before concatenating
    url = ("https://api.instagram.com/v1/media/search?"
           + "lat=" + location_latitude
           + "&lng=" + location_longitude
           + "&distance=" + distance
           + "&min_timestamp=" + str(timestamp)
           + "&max_timestamp=" + str(timestamp + date_increment)
           + "&access_token=" + access_token)

VKontakte API Request:

    url = ("https://api.vk.com/method/photos.search?"
           + "lat=" + location_latitude
           + "&long=" + location_longitude
           + "&count=100"
           + "&radius=" + distance
           + "&start_time=" + str(timestamp)
           + "&end_time=" + str(timestamp + date_increment))

Here are the variables used:

  • location_latitude: latitude of the point of interest;
  • location_longitude: longitude of the point of interest;
  • distance: search radius in meters;
  • timestamp: start of the time interval, in Unix time;
  • date_increment: length of one time slice in seconds;
  • access_token: the Instagram API token (see the sidebar).

As it turns out, an access_token is required to access the Instagram API. Getting one is easy, but involves a bit of fiddling (see the sidebar). VKontakte is more lenient toward strangers, which suits us very well.

Getting Instagram Access Token



First, register on Instagram. Then go to the following link:

instagram.com/developer/clients/manage

Click **Register a New Client**. Enter your phone number, wait for the SMS, and enter the code. In the client-creation window that opens, the fields that matter to us should be filled in as follows:
  • OAuth redirect_uri: localhost
  • Disable implicit OAuth: checkbox should be unchecked

Fill in the remaining fields arbitrarily. Once everything is filled in, create the new client. Now you need to get a token. To do so, enter the following URL in your browser:
 https://instagram.com/oauth/authorize/?client_id=[CLIENT_ID]&redirect_uri=http://localhost/&response_type=token 

where instead of [CLIENT_ID] you specify the Client ID of the client you just created. Follow the resulting link, and if you did everything right, you will be redirected to localhost with the Access Token written in the address bar:
  http://localhost/#access_token=[Access Token] 

You can read more about this method of obtaining a token at the following link: jelled.com/instagram/access-token.
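If you end up doing this often, even the copy-paste step can be scripted. Below is a small sketch (the helper names are mine, not part of any official client): one function assembles the authorization URL from your Client ID, the other pulls the token out of the redirect URL you land on:

    import urlparse

    def build_auth_url(client_id):
        # The same authorization URL as above, assembled from your Client ID.
        return ('https://instagram.com/oauth/authorize/?client_id=' + client_id
                + '&redirect_uri=http://localhost/&response_type=token')

    def token_from_redirect(redirect_url):
        # The token arrives in the URL fragment: http://localhost/#access_token=...
        fragment = urlparse.urlparse(redirect_url).fragment
        return dict(urlparse.parse_qsl(fragment)).get('access_token')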


Automating the process


So, we have learned to build the necessary requests, but manually parsing the server's response (JSON/XML) is not the most enjoyable task. It is much more convenient to write a small script that does it for us. Again, we will use Python 2.7. The logic is as follows: we look for all photos that fall within a given radius of the given coordinates within a given time period. But keep one very important point in mind: the API returns a limited number of photos per request. So for a long time period we will have to make several requests over intermediate time intervals (that is exactly what date_increment is for). Also bear in mind the accuracy of the coordinates, and do not set a radius of just a few meters. And do not forget that the time must be given as a Unix timestamp.
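In isolation, the chunking idea looks something like this (a throwaway sketch; the full functions below inline the same loop instead of calling a helper):

    def time_chunks(min_ts, max_ts, step):
        # Yield (start, end) pairs covering [min_ts, max_ts] in slices of
        # `step` seconds, so each API request stays within its photo limit.
        start = min_ts
        while start < max_ts:
            end = min(start + step, max_ts)
            yield start, end
            start = end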

Let's start coding. First, we import all the libraries we need:

    import httplib
    import urllib
    import json
    import datetime

Next we write functions for fetching data from the APIs over HTTPS. From the arguments passed in, we assemble a GET request and return the server's response as a string.

    def get_instagram(latitude, longitude, distance, min_timestamp, max_timestamp, access_token):
        get_request = '/v1/media/search?lat=' + latitude
        get_request += '&lng=' + longitude
        get_request += '&distance=' + distance
        get_request += '&min_timestamp=' + str(min_timestamp)
        get_request += '&max_timestamp=' + str(max_timestamp)
        get_request += '&access_token=' + access_token
        local_connect = httplib.HTTPSConnection('api.instagram.com', 443)
        local_connect.request('GET', get_request)
        return local_connect.getresponse().read()

    def get_vk(latitude, longitude, distance, min_timestamp, max_timestamp):
        # build the query from the function's own parameters
        get_request = '/method/photos.search?lat=' + latitude
        get_request += '&long=' + longitude
        get_request += '&count=100'
        get_request += '&radius=' + distance
        get_request += '&start_time=' + str(min_timestamp)
        get_request += '&end_time=' + str(max_timestamp)
        local_connect = httplib.HTTPSConnection('api.vk.com', 443)
        local_connect.request('GET', get_request)
        return local_connect.getresponse().read()

We also add a small function for converting a timestamp into human-readable form:

    def timestamptodate(timestamp):
        # fromtimestamp() uses the machine's local time zone, so the ' UTC'
        # label is only accurate on a machine whose clock runs on UTC.
        return datetime.datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S') + ' UTC'

Now we write the main logic of the photo search: the time interval is split into chunks up front, and the results are saved to an HTML file. The function looks cumbersome, but its only real difficulty is splitting the time interval into blocks; the rest is ordinary JSON parsing and writing the required data to HTML.

    def parse_instagram(location_latitude, location_longitude, distance, min_timestamp, max_timestamp, date_increment, access_token):
        print 'Starting parse instagram..'
        print 'GEO:', location_latitude, location_longitude
        print 'TIME: from', timestamptodate(min_timestamp), 'to', timestamptodate(max_timestamp)
        file_inst = open('instagram_' + location_latitude + location_longitude + '.html', 'w')
        file_inst.write('<html>')
        local_min_timestamp = min_timestamp
        while (1):
            if (local_min_timestamp >= max_timestamp):
                break
            local_max_timestamp = local_min_timestamp + date_increment
            if (local_max_timestamp > max_timestamp):
                local_max_timestamp = max_timestamp
            print timestamptodate(local_min_timestamp), '-', timestamptodate(local_max_timestamp)
            local_buffer = get_instagram(location_latitude, location_longitude, distance, local_min_timestamp, local_max_timestamp, access_token)
            instagram_json = json.loads(local_buffer)
            for local_i in instagram_json['data']:
                file_inst.write('<br>')
                file_inst.write('<img src=' + local_i['images']['standard_resolution']['url'] + '><br>')
                file_inst.write(timestamptodate(int(local_i['created_time'])) + '<br>')
                file_inst.write(local_i['link'] + '<br>')
                file_inst.write('<br>')
            local_min_timestamp = local_max_timestamp
        file_inst.write('</html>')
        file_inst.close()

The HTML format was chosen for a reason: it lets us avoid saving the pictures themselves and store only links to them. When you open the results page in a browser, the images load automatically.
We write an identical function for VKontakte.

    def parse_vk(location_latitude, location_longitude, distance, min_timestamp, max_timestamp, date_increment):
        print 'Starting parse vkontakte..'
        print 'GEO:', location_latitude, location_longitude
        print 'TIME: from', timestamptodate(min_timestamp), 'to', timestamptodate(max_timestamp)
        file_inst = open('vk_' + location_latitude + location_longitude + '.html', 'w')
        file_inst.write('<html>')
        local_min_timestamp = min_timestamp
        while (1):
            if (local_min_timestamp >= max_timestamp):
                break
            local_max_timestamp = local_min_timestamp + date_increment
            if (local_max_timestamp > max_timestamp):
                local_max_timestamp = max_timestamp
            print timestamptodate(local_min_timestamp), '-', timestamptodate(local_max_timestamp)
            vk_json = json.loads(get_vk(location_latitude, location_longitude, distance, local_min_timestamp, local_max_timestamp))
            for local_i in vk_json['response']:
                # the first element of 'response' is the total photo count, not a photo object
                if type(local_i) is int:
                    continue
                file_inst.write('<br>')
                file_inst.write('<img src=' + local_i['src_big'] + '><br>')
                file_inst.write(timestamptodate(int(local_i['created'])) + '<br>')
                file_inst.write('http://vk.com/id' + str(local_i['owner_id']) + '<br>')
                file_inst.write('<br>')
            local_min_timestamp = local_max_timestamp
        file_inst.write('</html>')
        file_inst.close()

And of course, the function calls themselves:

    parse_instagram(location_latitude, location_longitude, distance, min_timestamp, max_timestamp, date_increment, instagram_access_token)
    parse_vk(location_latitude, location_longitude, distance, min_timestamp, max_timestamp, date_increment)



The result of our script in the console


One of the results of parsing Instagram


Result of parsing VKontakte

Baptism of fire


The script is ready; all that remains is to try it in action. And here an idea occurred to me. Those who attended PHD'14 surely remember the very cute promo girls from Mail.Ru. Well, let's try to catch up: find them and get acquainted.

Actually, what do we know about PHD'14? The venue and the dates of the event: May 21-22, 2014.

Converting this into a form the script understands, we get the following data set:

    location_latitude = '55.740701'
    location_longitude = '37.609161'
    distance = '100'
    min_timestamp = 1400619600
    max_timestamp = 1400792400
    date_increment = 60 * 60 * 3  # every 3 hours
    instagram_access_token = [Access Token]
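The timestamps here are ordinary Unix time. If you don't feel like looking them up by hand, they are easy to compute; a quick sketch (assuming Moscow time, which was UTC+4 in May 2014):

    import calendar
    import datetime

    def msk_to_timestamp(dt, utc_offset=4):
        # Treat the naive datetime as MSK local time and convert to Unix time.
        return calendar.timegm(dt.timetuple()) - utc_offset * 3600

    print msk_to_timestamp(datetime.datetime(2014, 5, 21, 1, 0))  # 1400619600
    print msk_to_timestamp(datetime.datetime(2014, 5, 23, 1, 0))  # 1400792400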

Useful tips


If the script turns up too few photos, try changing the `date_increment` variable, since it controls the time intervals over which photos are collected. If the place is popular, the intervals should be short (decrease `date_increment`); if the place is a backwater where photos get posted once a month, there is no point collecting photos in one-hour slices (increase `date_increment`).


We run the script and move on to analyzing the results. Aha: one of the girls posted a photo taken in a restroom mirror, with the coordinates attached! Naturally, the API did not forgive such a mistake, and the pages of all the other promo girls were soon found. As it turned out, two of them are twins :).


That same photo of a PHD'14 promo girl, taken in the restroom

Instructive example


As a second example, I would like to recall one of the tasks from the final of the CTF at PHD'14. In fact, it was this task that got me interested in the topic. Its essence was as follows.

There is an evil hacker who has developed a piece of malware. We are given a set of coordinates, with corresponding timestamps, from which he connected to the Internet. We need to recover his name and a picture of him. The coordinates were as follows:

55.7736147, 37.6567926 - 30 Apr 2014 19:15 MSK;
55.4968379, 40.7731697 - 30 Apr 2014 23:00 MSK;
55.5625259, 42.0185773 - 1 May 2014 00:28 MSK;
55.5399274, 42.1926434 - 1 May 2014 00:46 MSK;
55.5099579, 47.4776127 - 1 May 2014 05:44 MSK;
55.6866654, 47.9438484 - 1 May 2014 06:20 MSK;
55.8419686, 48.5611181 - 1 May 2014 07:10 MSK.

First of all, we naturally looked at what places these coordinates correspond to. As it turned out, they are Russian Railways stations: the first coordinate is Kazansky Station in Moscow, the last is Zelyony Dol (Zelenodolsk), and the rest are stations in between. So he was online from a train. The departure times let us identify the train in question, and its destination turned out to be Kazan. Then the main question arose: where to look for the name and the picture. The logic was as follows: since a picture is wanted, it is quite reasonable to assume it should be sought in social networks. The main candidates were VKontakte, Facebook, Instagram, and Twitter. Foreign teams took part in the competition alongside the Russian ones, so we figured the organizers would hardly have chosen VKontakte. It was decided to start with Instagram.
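Checking what places such coordinates correspond to is easy to automate, by the way: for example, with geopy, which already appeared among geoStalker's dependencies. A minimal sketch using OpenStreetMap's Nominatim geocoder (the `user_agent` string is an arbitrary value that its usage policy requires you to set):

    from geopy.geocoders import Nominatim

    # Reverse-geocode a couple of the track points to see what they correspond to.
    geolocator = Nominatim(user_agent='osint-article-example')
    points = [(55.7736147, 37.6567926), (55.8419686, 48.5611181)]
    for lat, lon in points:
        print geolocator.reverse((lat, lon))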

At the time we had no scripts of our own for searching photos by coordinates and time, so we had to use public services that could do it. There turned out to be quite a few of them, but their interfaces are rather poor. After viewing hundreds of photos at each station along the train's route, the right one was finally found.

In the end, finding the train and the missing stations, and working out the logic of the further search, took less than an hour; hunting down the right photo took far longer. Which once again underlines how important it is to have the right, convenient tools in your arsenal.

WWW


You can find the source code of the script discussed here in my Bitbucket repository.


Conclusions


The article has come to an end, and it is time to draw a conclusion. The conclusion is simple: think twice before uploading geotagged photos. Competitive intelligence people will seize any opportunity to get new information, and social network APIs can help them a great deal. While writing this article, I looked at several other services, namely Twitter, Facebook, and LinkedIn, to see whether they offer similar functionality. Only Twitter gave positive results, which is certainly encouraging. Facebook and LinkedIn were a disappointment, although all is not lost; perhaps they will expand their APIs in the future. In general, be careful when posting geotagged photos: you never know who might stumble upon them. :)


First published in Hacker magazine, 02/2015.
Author: Arkady Litvinenko (@BetepO_ok)


Source: https://habr.com/ru/post/254129/

