
How I looked at everyone...

...and then handed out a pile of likes.

It all started when I discovered that one dating site stores uploaded photos without any processing.

Of course, for avatars, feeds and the like, the photos were scaled, cropped and otherwise transformed, but when viewing a photo you could click the “Original photo” link and open the actual original with all its contents (EXIF), if there was any. Rarjpeg also worked. The one exception: photos uploaded in PNG format were converted to JPEG.
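Both properties mentioned above, EXIF surviving and Rarjpeg working, can be checked directly on a downloaded file. The following is a minimal Python sketch, not the author's code: it walks the JPEG marker segments looking for an APP1 segment with an “Exif” header, and looks for a RAR signature after the first end-of-image marker. The function names are illustrative, and it assumes a plain JPEG without fill bytes between segments and an appended RAR 4 or RAR 5 archive.

```python
# RAR 4.x ("Rar!\x1a\x07\x00") and RAR 5 ("Rar!\x1a\x07\x01\x00") share this prefix.
RAR_SIGNATURE = b"Rar!\x1a\x07"


def has_exif(data: bytes) -> bool:
    """True if a JPEG contains an APP1 segment starting with the 'Exif' header."""
    if not data.startswith(b"\xff\xd8"):  # SOI marker missing: not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:  # lost segment alignment; give up
            return False
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, metadata is over
            return False
        # Segment length is big-endian and includes its own 2 bytes.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if marker == 0xE1 and data[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + length
    return False


def has_appended_rar(data: bytes) -> bool:
    """Heuristic Rarjpeg check: a RAR signature somewhere after the first EOI marker."""
    eoi = data.find(b"\xff\xd9")
    return eoi != -1 and data.find(RAR_SIGNATURE, eoi + 2) != -1
```

A re-encoding pipeline would normally drop both the APP1 segment and anything after the end-of-image marker, so either check returning True on an “Original photo” download suggests the file was served untouched.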

I decided to download the photos and see what people had made publicly available about themselves without realizing it. Unless the user has configured otherwise, a profile with one photo is visible even to unregistered visitors. I was mainly interested in geotags; everything else was secondary.
To start, we limit ourselves to female profiles aged 0 to 99 with photos. At the time there were 10,611 such profiles. Next came choosing a tool to process information about the profiles, and then from the profiles themselves. The choice fell on C#. Searching does not even require logging in to the site.

We check how many result pages were found, loop over them and download the search results: 443 pages in total. The next step is to go through these pages and extract the profile ids from them. At this stage we had a database of id, nickname and age.
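The article gives only the number of profiles and the number of pages, but those two figures pin down the page size. A small Python check (the names are illustrative; the page size itself is inferred here, not stated in the article) tries every page size and keeps the ones that produce exactly 443 pages:

```python
# Numbers taken from the article.
TOTAL_PROFILES = 10_611
TOTAL_PAGES = 443


def pages_needed(total: int, per_page: int) -> int:
    """Pages required to list `total` results, `per_page` at a time (integer ceiling)."""
    return -(-total // per_page)


def consistent_page_sizes(total: int, pages: int) -> list[int]:
    """Every page size that yields exactly `pages` pages for `total` results."""
    return [n for n in range(1, total + 1) if pages_needed(total, n) == pages]


print(consistent_page_sizes(TOTAL_PROFILES, TOTAL_PAGES))  # [24]
```

Only 24 works, since 442 × 24 = 10,608 falls just short of 10,611 while 443 × 24 = 10,632 covers it; the download loop then simply walks page numbers 1 through 443.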


Next, the profile pages had to be downloaded, and for that I needed to be logged in to the site. All my attempts to log in via WebClient failed, as did the attempt to take cookies from the WebBrowser control. This is where wget comes in. Using Excel, I generated a cmd file of 10,611 lines, each of the form “call getPage” followed by a profile id, plus a getPage.cmd file that called wget with options to keep the session, load cookies from a cookie.txt file created with Firebug, and send the same Firefox user agent. Incidentally, the pages were downloaded from the old version of the site, because it displayed the profile's registration date, last-edit date and last-activity date, and sometimes a “name” field if the user had filled it in. The output was 850 MB of HTML files. When the script finished, the “You viewed” section of the site held about 450 pages of viewed profiles.

To process the profiles I went back to C#. After parsing the profile files, the name and the dates were added to the existing database. There were some very interesting specimens, for example an active profile created in 2005 and never edited since.
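The Excel step only produced one line per profile id, so the same pair of scripts can be generated in a few lines of Python. This is a sketch under stated assumptions: build_master_cmd, write_scripts, GET_PAGE_CMD and the getAll.cmd file name are hypothetical, the profile URL is an example.com placeholder, and the user-agent string stands in for whatever Firefox actually sent. The wget options used (--load-cookies, --save-cookies, --keep-session-cookies, --user-agent, -O) are standard wget flags.

```python
# Hypothetical getPage.cmd body: %1 is the profile id passed by "call getPage <id>".
# The URL is a placeholder; the user agent should match the browser the cookies came from.
GET_PAGE_CMD = (
    "wget --load-cookies cookie.txt --save-cookies cookie.txt --keep-session-cookies "
    '--user-agent="Mozilla/5.0 (placeholder Firefox UA)" '
    '-O %1.html "https://example.com/profile/%1"\r\n'
)


def build_master_cmd(profile_ids: list[int]) -> str:
    """Return the master script: one 'call getPage <id>' line per profile id."""
    # cmd.exe scripts conventionally use CRLF line endings.
    return "".join(f"call getPage {pid}\r\n" for pid in profile_ids)


def write_scripts(profile_ids: list[int]) -> None:
    """Write getAll.cmd (hypothetical name) and getPage.cmd to the current directory."""
    # newline="" stops Python from translating "\r\n" into "\r\r\n" on Windows.
    with open("getAll.cmd", "w", newline="") as f:
        f.write(build_master_cmd(profile_ids))
    with open("getPage.cmd", "w", newline="") as f:
        f.write(GET_PAGE_CMD)
```

Keeping the wget call in a separate getPage.cmd means the cookie and user-agent settings live in one place, and the master script stays a plain list of ids.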


While processing the profile files, it turned out that some profiles were blocked and no information could be extracted from them. The photo links were collected as well, and the download began. The result was 46,235 files, 12.5 GB of photos. Some users had already deleted their photos, in which case a 1×1 pixel file was downloaded instead. For working with the metadata I chose exiftool.
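The 1×1 stubs for deleted photos can be filtered out before running exiftool. A minimal Python sketch, assuming (this is not stated in the article) that every deleted photo was replaced by the same byte-identical file: hash each file and flag any content that occurs many times. repeated_files and scan_photos are illustrative names, and the threshold of 100 copies is an arbitrary choice.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Iterable, Iterator


def repeated_files(files: Iterable[tuple[str, bytes]], min_copies: int = 100) -> list[str]:
    """Return names of files whose exact bytes occur at least `min_copies` times."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files:
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return sorted(name for names in groups.values() if len(names) >= min_copies for name in names)


def scan_photos(root: str) -> Iterator[tuple[str, bytes]]:
    """Yield (path, bytes) for every .jpg under `root`, reading one file at a time."""
    for path in Path(root).rglob("*.jpg"):
        yield str(path), path.read_bytes()
```

Usage would be along the lines of repeated_files(scan_photos("photos")). If the placeholders turned out not to be identical, checking image dimensions instead would need a JPEG header parser or an image library.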


exiftool processed all the subdirectories by itself and wrote the results to text files next to the photos. It turned out that only 1% of the 46 thousand photos contained geotags. Around this point, the people whose profiles I had viewed started visiting my page; by evening there were nearly a hundred visitors. Some left likes, which pushed my photos into various top lists. I decided to return the favor.
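exiftool's -w option writes each file's output to a text file next to it (for example, exiftool -r -w txt followed by the photo directory), which matches the setup described. Counting how many of those reports actually contain coordinates is then a pattern match. This sketch assumes exiftool's default human-readable output, where tags appear as “GPS Latitude : ...”; has_geotag and count_geotagged are illustrative names. Matching only Latitude, Longitude or Position lines avoids counting photos whose GPS block holds, say, a GPS Version ID but no coordinates.

```python
import re
from pathlib import Path

# Matches exiftool's default text output, e.g. 'GPS Latitude    : 55 deg 45\' 20.52" N'.
# "GPS Latitude Ref" and "GPS Version ID" lines deliberately do not match.
GPS_COORD_LINE = re.compile(r"^GPS (?:Latitude|Longitude|Position)\s*:", re.MULTILINE)


def has_geotag(exif_text: str) -> bool:
    """True if the exiftool output for one photo contains GPS coordinates."""
    return GPS_COORD_LINE.search(exif_text) is not None


def count_geotagged(root: str) -> tuple[int, int]:
    """Return (reports with coordinates, total .txt reports) under `root`."""
    reports = list(Path(root).rglob("*.txt"))
    tagged = sum(has_geotag(p.read_text(encoding="utf-8", errors="replace")) for p in reports)
    return tagged, len(reports)
```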

I did a practice run on my own profile. It turned out that everything goes through AJAX: a JSON request that specifies the action (in our case, “like”), the photo id, the id of the profile the photo belongs to, and the id of the JSON request itself. The target list was narrowed to 1,500 profiles. Using C#, I generated 7,334 JSON files, which were sent to the site with wget.
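Generating the request bodies is a straightforward serialization job. The article lists which values the request carries but not the field names, so the names below are hypothetical, as are the function names; the real ones are whatever the site's own AJAX requests use. 7,334 files for 1,500 profiles works out to just under five requests per profile, presumably one per photo.

```python
import json


# HYPOTHETICAL field names: the article only says the request carries an action,
# a photo id, the owning profile's id and a request id.
def like_request(request_id: int, profile_id: int, photo_id: int) -> str:
    """Serialize one 'like' request body as JSON."""
    return json.dumps({
        "id": request_id,
        "action": "like",
        "profile_id": profile_id,
        "photo_id": photo_id,
    })


def like_requests(photos: list[tuple[int, int]]) -> list[str]:
    """One request per (profile_id, photo_id) pair, with sequential request ids from 1."""
    return [like_request(i, profile, photo) for i, (profile, photo) in enumerate(photos, start=1)]
```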


A quick check in the browser showed that the carpet-bombing with likes had worked. The bottom line: 12.5 GB of photos and increased popularity on a dating site.


Source: https://habr.com/ru/post/282107/

