📜 ⬆️ ⬇️

GeoIP, unconventional use attempts

It was done in the evening, there was nothing to do ...

Prehistory


At one point, accidentally stumbled upon an old service that allows you to hang a picture on your website, which shows the geographical position of visitors. There are many and different.
The thing itself is not so useful, I like the data of awstats'a enough for the eyes.
But for demonstrating to the visitor that he is not a random idiot, a passer-by who has wandered onto a dead site is a suitable case.
Further, as it should be, I spread my brains on the topic, but how do they do it, figured it out and sort of calmed down ...

But the patient's head doesn’t give rest to his hands, so now I will show you one funny experiment that does not carry the meaningful load.
image
This is a sample of all possible coordinates of IP-Schnick (from rounded to integer).
It can be said that this is a photograph of the Internet habitat TCP / IP

Attention, the author is not a coder in a good sense of the word, therefore, those who can be shocked by the codec, please leave.

')
From where to get the data, it immediately became clear, this is all known to GeoIP, which has rescued us more than once, since GeoLiteCity has such a vital important thing as coordinates.
#geoiplookup -f /usr/local/share/GeoIP/GeoLiteCity.dat 90.155.128.74
GeoIP City Edition, Rev 1: RU, 48, Moscow, N/A, 55.752201, 37.615601, 0, 0

True, she believes that I moved from Butovo to Alexander Garden . But nothing, we do not need such accuracy.

Inspection of the GeoLiteCity.dat file showed that it is very similar to a binary tree, but unfortunately I have not yet lit up how to parse it quickly and painlessly.
The first thought immediately came the option of iterating over all possible ip addresses in the entry in the database of unique coordinates.
256 ^ 4 = 4294967296 even minus backup, local, etc. still leads to estimated time, which does not please. We finish in six months, this is already good.
After an active approach to the task, I remembered that GeoIP had other options for storing their data, and the bingo CSV format saved the father of Russian democracy.

Sample file:
5751,"US","NY","Albany","12209",42.6390,-73.7890,532,518
5752,"US","NY","Hillsdale","12529",42.2164,-73.5413,532,518
5753,"US","NY","Albany","12225",42.6706,-73.7791,532,518


Total 310070 entry.
As it’s not kosher to generate a picture for all 300 hundred points, we make a script that will parse the CSV file and enter unique rounded positions in the database, and then select the database and draw the image.
Why rounded, I did not need high accuracy, so that the point of length and breadth was sufficient.
If you want to do something like the size of 3600 by 1800, then nothing prevents you from repeating this experiment to finish reading the post to the end.

Selection of tools.
MySQL is already at hand. one table is created there, with 4 columns (x, y, kol, two were enough, but I added the last one to see where the records are most concentrated)
Perl is for writing the CSV parser and entering it into the database. It is also at hand, plus it is more convenient to write it to joe via SSH from the communicator. Proffesionalov perlistov please do not look at the code.
PHP with the GD library, actually for generating the actual image. Why? Experience with the GD library from php is great.

So, download the latest CSV file .

Create a database and a table:
CREATE DATABASE IF NOT EXISTS `geo`;
CREATE TABLE IF NOT EXISTS `all` (
`x` int(3) NOT NULL,
`y` int(3) NOT NULL,
`kol` int(64) NOT NULL
);


Making a parser:
#! / usr / local / bin / perl
use Mysql;
print "start \ n";
my $ host = "localhost";
my $ database = "geo";
my $ user = "geo";
my $ table = "all";
my $ password = "ololoparolko";
my $ db = Mysql-> Connect ($ host, $ database, $ user, $ password);
$ sql = "SELECT * FROM` $ table` ";
$ sth = $ db-> Query ($ sql);
arr = $ sth-> FetchRow;
my $ vsego = 0;
my $ unik = 0;
my $ ok = 0;
# open the file, throw the data into the array and process it.
open (GL, "/home/klef/geo/GeoLiteCity-Location.csv");
@ geo_arr =;
close (GL);
$ max = $ # geo_arr;
for ($ i = 0; $ i <= $ # geo_arr; $ i ++) {
$ vsego ++;
@ geo_a = split (",", $ geo_arr [$ i]);
if ($ # geo_a> 6) {
$ ok ++;
$ x = sprintf ("%. 0f", $ geo_a [5]);
$ y = sprintf ("%. 0f", $ geo_a [6]);
# are we looking for any such coordinates?
$ sql = "SELECT * FROM` $ table` WHERE x = $ x AND y = $ y ";
$ sth = $ db-> Query ($ sql);
arr = $ sth-> FetchRow;
if ($ # arr> = 0) {
#there is? fine we change the counter
$ nov_s = $ arr [3] +1;
$ sql = "UPDATE` $ database`.` $ table` SET `kol` = '$ nov_s' WHERE` $ table`.`x` = $ x AND `$ table`.`y` = $ y AND` $ table``kol` = $ arr [3] LIMIT 1; ";
$ db-> Query ($ sql);
} else {
# no, well, it's fixable
$ sql = "INSERT INTO` $ database`.` $ table` (`x`,` y`, `kol`) VALUES ('$ x', '$ y', '1');";
$ db-> Query ($ sql);
print "New";
$ unik ++;
}
}
}
# this is so that in the end we were happily shown how much he otlapatil
$ vsego_p = sprintf ("%. 2f", $ vsego * 100 / $ max);
$ ok_p = sprintf ("%. 2f", $ ok * 100 / $ max);
$ unik_p = sprintf ("%. 2f", $ unik * 100 / $ max);
$ itog = "\ nObrabotano: $ vsego_p% ($ vsego) \ nNormalnih: $ ok_p% ($ ok) \ nUnikalnih: $ unik_p% ($ unik) \ n";
print $ itog;
print "END \ n";


And it started, the main problem in processing time was, not in the fact that there is a lot of data, but the fact that we are sorting them out in order and without beautiful solutions.

php script for image generation:
#! / usr / local / bin / php
<? php
$ host = "localhost: 3306";
$ user = "geo";
$ data = "geo";
$ pass = "again parolko";
// where to save the picture
$ patch = "/ data / web / geo.ip.png";
$ z = 0;
// create image and color
$ img = imagecreatetruecolor (600, 400);
$ ink = imagecolorallocate ($ img, 0, 255, 0);
// access the database and select the data
$ db = mysql_connect ($ host, $ user, $ pass) or die (“Bolt MySQL„);
$ db_sel = mysql_select_db ($ data, $ db) or die (“Bolt BD„);
$ sql = mysql_query (“SELECT * FROM` all`; ") or die (" Bolt zapros ");
$ num_rows = mysql_num_rows ($ sql) or die ("Bolt Num Stroki");
$ vib = mysql_num_rows ($ sql) or die ("Bolt Stroki");
for ($ i = 1; $ i <= $ num_rows; $ i ++) {
$ tmp = mysql_fetch_array ($ sql);
$ y = $ tmp ['x'];
$ x = $ tmp ['y'];
// put a point with the desired coordinates
imagesetpixel ($ img, 300 + $ x, 200- $ y, $ ink);
}
mysql_close ($ db);
// finally put a mark and save the file
ImageString ($ img, 2, 10, 5, "Geo IP:" .date ("YM j H: i"). "Kolvo:". $ Num_rows. ", $ Ink);
imagetruecolortopalette ($ img, true, 255);
imagepng ($ img, $ patch);
imagedestroy ($ img);
?>


Total parser:
Total 310072 lines in file
unique coordinates (rounded) 8226, which is only 2.65%
Size table 384 kilobytes :)
Time of processing:
To fill the database: about a couple of hours.
On the formation of images in the aisle of milliseconds.

ps later re-processed, but rounded to the first decimal point, the result is a resolution of 3600 by 1800
When resizing to the desired resolution of the desktop, we get a good wallpaper.

pps later tried to generate a file size of 36,000 by 18,000 (respectively, rounding to two decimal places), but php paired up with GD a little overstrained with such a resolution, why and why there was no longer any desire or strength.

Source: https://habr.com/ru/post/121085/


All Articles