
Remote execution of system commands on request via sockets in Python 3, or how I downloaded websites

The project was written for educational purposes (to learn network programming in Python) rather than practical ones. The article plays the same role, because hardly anyone these days will download sites just to read a couple of articles (except for the cases where this can really come in handy).

Not so long ago, the quality of the mobile Internet in my city began to deteriorate gradually because of the growing load on the operators' networks, and some sites that require a large number of connections (for dependent page files) began to load VERY slowly. In the evenings, the speed drops so much that some sites take a few tens of seconds to load fully.

There are several ways to solve this problem, but I chose one that is a bit unusual for our time: I decided to download sites. Of course, this method is not suitable for large sites such as Habr, where it is wiser to use a parser, but you can download a separate hub, a list of users, or just your own publications using HTTrack Website Copier with filters applied. For example, to download the Python hub from Habr, you need to apply the filter "+habrahabr.ru/hub/python/*".

This method can serve several other purposes as well. For example, you can download a site, or part of it, before you find yourself without an Internet connection, say, on an airplane. You can also download websites blocked in the Russian Federation, either through Tor, which will be very slow, or via a computer in another country where the website is not banned, and then transfer the result to a computer located in the Russian Federation, which will be much faster for multipage sites. Thus, we can download, for example, xHamster Wikipedia through a server in Germany or the Netherlands and get the site in compressed form via SFTP, FTP, HTTP or another protocol that is convenient for you. If, of course, there is enough space for such a large site :)

So, let's start! The application will gradually become more complicated as new features are added to it, which will let us understand what is happening and how it all works. I will accompany the code with enough comments that even a person who does not know Python can understand it, but I will not comment again on pieces of code and functions that have already been described, so as not to clutter up the code. Both the server and the client were written and tested under Linux, but, in theory, they should work on other platforms if all the necessary applications are installed, namely httrack and tar, and the correct path is set in the configuration file that we will create below. If you have problems running the code on your platform, write in the comments.

To begin with, we will implement a simple server that sends the string it receives back to the client.

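    # FILE: server.py
    import socket

    # Create an IPv4 stream socket (TCP/IPv4)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Bind the socket to localhost on port 65042
    sock.bind(("localhost", 65042))
    # Start listening; the argument is the backlog of pending connections
    sock.listen(1)
    # Serve clients one at a time, forever
    while True:
        # Wait for a client to connect
        conn, addr = sock.accept()
        # Show the client's address
        print('Connected by', addr)
        # Read the data sent by the client, 1024 bytes at most
        data = conn.recv(1024)
        # Send the same data back to the client
        conn.sendall(data)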

Now we implement an even simpler client that prints the line it receives (that is, the line it has just sent to the server).

    # FILE: client.py
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Connect to the server
    sock.connect(("localhost", 65042))
    sock.sendall(b"Hello, world")
    # Read the server's reply, 1024 bytes at most
    data = sock.recv(1024)
    # Close the connection
    sock.close()
    # Print the reply as text
    print(data.decode("utf-8"))

For the output, we used the decode method to get a string from the byte array. To decode an array of bytes, you need to specify the encoding; in our case it is UTF-8.
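As a small illustration of the point about encodings, the sketch below converts a string to bytes and back. The sample text is an arbitrary choice made for this example; it only shows that the number of bytes can differ from the number of characters.

```python
# Sockets carry bytes, so text is encoded before sending
# and decoded after receiving.
message = "naïve café"
payload = message.encode("utf-8")    # str -> bytes
restored = payload.decode("utf-8")   # bytes -> str

# Two of the ten characters need two bytes each in UTF-8.
print(len(message), len(payload))
print(restored == message)
```

Decoding with a different encoding than the one used for encoding may raise UnicodeDecodeError or produce garbled text, which is why both sides have to agree on UTF-8.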

Now we need to pause for a moment and think about how we will use our application, which commands it will accept, and what the communication between the client and the server will look like.

Since we plan to use the application only occasionally, we need not worry too much about convenience. What should our application be able to do? First of all, download sites. Well, the server application has downloaded our site, now what? We really want to see it, right? To do this, the site has to be transferred from the server machine to the client machine, and since the number of files is very large and establishing connections is our main problem, it would be a good idea to pack everything into one archive, preferably a well-compressed one. It would also be nice to be able to view the list of downloaded sites, but more on that later.

Commands transmitted to the server will have the following format:
 <command> [args] 

For example:
    dl site.ru 0 gz
    list
    list during
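A request in this format can be split into a command and its arguments with str.split. The function below is a minimal sketch; parse_command is a hypothetical name introduced for this example and is not part of the article's code.

```python
def parse_command(raw):
    """Split a raw request into (command, params). Hypothetical helper."""
    parts = raw.decode("utf-8").split()
    if not parts:
        # An empty or whitespace-only request has no command at all.
        return None, []
    return parts[0], parts[1:]


print(parse_command(b"dl site.ru 0 gz"))
print(parse_command(b"list"))
```

Returning None for an empty request lets the caller answer with an error instead of failing on an index lookup.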

To begin with, we modify our client a little. Replace
  sock.sendall(b"Hello, world") 

with
  sock.sendall(bytes(input(), encoding="utf-8")) 

Now we can send arbitrary commands entered from the keyboard to the server.

Now let's turn to the server, where things are more complicated.

First, create two files: httrack.py and config.py. The first will contain the functions for managing HTTrack, and the second will hold the configuration shared by the client and the server. If you wish, you can keep separate configuration files for the server and the client, and use an .ini file or something similar instead of a Python module.

With the second file, everything is simple and clear:
    from os import path

    host = 'localhost'
    port = 65042
    # Directory for downloaded sites; by default, the Sites folder
    # in the user's home directory.
    sites_directory = path.expanduser("~") + "/Sites"

Before moving on to the first file, I'll say a few words about the call function from the standard subprocess library.
 subprocess.call(args) 

The function runs the command given in the args list, waits for the called program to finish, and returns its exit code. It also accepts the cwd parameter, which sets the directory in which the command is executed.
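The behaviour of call can be tried without HTTrack installed. The sketch below runs a child Python process as a stand-in command, an assumption made purely for illustration, inside a temporary working directory and reads back its exit code.

```python
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # The child process starts in workdir and exits with code 3.
    # call() blocks until the child has finished.
    exit_code = subprocess.call(
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        cwd=workdir,
    )

print(exit_code)
```

Passing the command as a list, as here, means no shell is involved in interpreting the arguments.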

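Now we will write our first, and so far only, HTTrack control function, which downloads a site into the directory we need:
    # FILE: httrack.py
    from subprocess import call
    from os import makedirs
    # Shared configuration (host, port, sites directory)
    import config


    def download(url):
        # Strip the protocol (http://, https://) if there is one.
        # find() returns -1 when "//" is absent, so test membership instead.
        if "//" in url:
            url = url[url.find("//") + 2:]
        # Strip the trailing slash
        if url[-1:] == '/':
            url = url[:-1]
        site = config.sites_directory + '/' + url
        print("Downloading ", url, " started.")
        # Create the sites directory if it does not exist yet
        makedirs(config.sites_directory, mode=0o755, exist_ok=True)
        # Run HTTrack inside the sites directory and wait for it to finish
        call(["httrack", url], cwd=config.sites_directory)
        print("Downloading is complete")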
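The address clean-up step is easy to get wrong, because str.find returns -1 (a truthy value) when the substring is missing and 0 (a falsy value) when it sits at the very start. The sketch below isolates that step in a function so it can be checked on its own; normalize_url is a hypothetical name used only in this example.

```python
def normalize_url(url):
    """Strip the protocol prefix and one trailing slash. Hypothetical helper."""
    scheme_end = url.find("//")
    # Compare with -1 explicitly: "if url.find(...)" would be true
    # for a missing "//" and false for a leading one.
    if scheme_end != -1:
        url = url[scheme_end + 2:]
    if url.endswith("/"):
        url = url[:-1]
    return url


for address in ("http://verysimplesites.co.uk/", "site.ru", "//example.com"):
    print(address, "->", normalize_url(address))
```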

Change server.py :

    import socket
    import threading
    # Our own modules
    import httrack
    import config


    def handle_commands(connection, command, params):
        if command == "dl":
            # Run HTTrack in a separate thread so the server is not blocked.
            # Note the trailing comma: args must be a tuple.
            htt_thread = threading.Thread(target=httrack.download, args=(params[0],))
            # Start the thread
            htt_thread.start()
            connection.sendall(b'Downloading has started')
        else:
            connection.sendall(b"Invalid request")


    def args_analysis(connection, args):
        # Split the request into words:
        # "dl site.ru 0 gz" becomes ["dl", "site.ru", "0", "gz"].
        args = args.decode("utf-8").split()
        # args[1:] is a slice: everything except the first item, the command.
        handle_commands(connection=connection, command=args[0], params=args[1:])


    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((config.host, config.port))
    sock.listen(1)
    while True:
        conn, addr = sock.accept()
        print('Connected by ', addr)
        data = conn.recv(1024)
        args_analysis(connection=conn, args=data)

I think everything here is clear. The code is split into more functions than strictly necessary; they could be merged to simplify it, but they will help us with later changes.
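The dispatch logic can be exercised without a network or HTTrack. The sketch below is not the article's server: it uses socket.socketpair for a pair of already connected sockets, and fake_download is a hypothetical stand-in for httrack.download that only records the address it was given.

```python
import socket
import threading

downloaded = []


def fake_download(url):
    # Hypothetical stand-in for httrack.download.
    downloaded.append(url)


def handle(connection, raw):
    command, *params = raw.decode("utf-8").split()
    if command == "dl":
        # args has to be a tuple: (x,) with a comma, not (x).
        worker = threading.Thread(target=fake_download, args=(params[0],))
        worker.start()
        worker.join()  # joined here only to keep the demo deterministic
        connection.sendall(b"Downloading has started")
    else:
        connection.sendall(b"Invalid request")


server_side, client_side = socket.socketpair()
handle(server_side, b"dl example.com")
first_reply = client_side.recv(1024)
handle(server_side, b"hello")
second_reply = client_side.recv(1024)
server_side.close()
client_side.close()

print(first_reply, second_reply, downloaded)
```

Without the trailing comma, Thread would treat the string itself as the argument sequence and pass each character as a separate argument, so the target would fail with a TypeError.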

At the moment, we can start server.py first, and then client.py . In the client application, enter the following command:
 dl http://verysimplesites.co.uk/ 

After about a minute, depending on your Internet connection, the server application will display "Downloading is complete", and a Sites folder will appear in your home directory. Inside it there will be a verysimplesites.co.uk directory containing the downloaded site, which can be viewed in a browser without an Internet connection.

But this is not enough for us, because we want to receive the site in compressed form, as an archive. Suppose now that the dl command takes three arguments instead of one. The first stays the same: it is the site to download. The second is a flag indicating whether to delete the site's directory once the download is complete. The third is the archive format in which the site will be packed after downloading (and before deletion, if deletion is requested).
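One possible way to turn these three arguments into typed values is sketched below. It is an illustration rather than the article's code: parse_dl_params is a hypothetical helper, and treating a missing flag as "do not delete" is an assumption of this sketch.

```python
def parse_dl_params(params):
    """Return (url, remove, archive_format). Hypothetical helper."""
    if not params:
        raise ValueError("dl needs at least a site address")
    url = params[0]
    # Anything other than "0" switches removal on; a missing flag
    # is treated as "keep the directory" (assumption of this sketch).
    remove = len(params) > 1 and params[1] != "0"
    # A missing format means the site is not packed.
    archive_format = params[2] if len(params) > 2 else None
    return url, remove, archive_format


print(parse_dl_params(["site.ru", "1", "gz"]))
print(parse_dl_params(["site.ru", "0"]))
```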

The function that checks the status of the HTTrack thread, in server.py:
    def dl_status_checker(thread, connection):
        # is_alive() has to be called; the bare attribute is always truthy
        if thread.is_alive():
            connection.sendall(b'Downloading has started')
        else:
            connection.sendall(b'Downloading has FAILED')

The dl command in server.py:
        if command == "dl":
            # The removal flag is off only when the argument is "0"
            if params[1] == '0':
                params[1] = False
            else:
                params[1] = True
            # If the site is not to be removed and no archive format is given,
            # the site will not be packed
            if not params[1] and len(params) == 2:
                params.append(None)
            htt_thread = threading.Thread(target=httrack.download,
                                          args=(params[0], params[1], params[2]))
            htt_thread.start()
            # After 2 seconds, check whether HTTrack is still running
            dl_status = threading.Timer(2.0, dl_status_checker, args=(htt_thread, connection))
            dl_status.start()
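The delayed status check relies on threading.Timer, which runs a function in its own thread after a delay. The sketch below reproduces the idea with short delays and a sleeping function in place of HTTrack; slow_job and report are hypothetical names used only here.

```python
import threading
import time


def slow_job():
    # Stands in for a long HTTrack run.
    time.sleep(0.5)


def report(thread, results):
    # is_alive() is True while the thread's target is still running.
    results.append(thread.is_alive())


results = []
worker = threading.Thread(target=slow_job)
worker.start()

# Check the worker 0.1 s after it started, while it is still sleeping.
checker = threading.Timer(0.1, report, args=(worker, results))
checker.start()
checker.join()   # Timer is a Thread subclass, so it can be joined

worker.join()
report(worker, results)   # the worker has finished by now

print(results)
```

A thread that is still alive after the delay only suggests that the download is in progress; it does not show that HTTrack itself started successfully.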

httrack.py :
    from subprocess import call
    from os import makedirs
    from shutil import rmtree
    import config


    def download(url, remove, archive_format):
        if "//" in url:
            url = url[url.find("//") + 2:]
        if url[-1:] == '/':
            url = url[:-1]
        site = config.sites_directory + '/' + url
        print("Downloading ", url, " started.")
        makedirs(config.sites_directory, mode=0o755, exist_ok=True)
        call(["httrack", url], cwd=config.sites_directory)
        print("Downloading is complete")
        if archive_format:
            if archive_format == "gz":
                # Example: tar -czf /home/user/Sites/site.ru.tar.gz -C /home/user/Sites site.ru
                call(["tar", "-czf", config.sites_directory + '/' + url + ".tar.gz",
                      "-C", config.sites_directory, url], cwd=config.sites_directory)
            elif archive_format == "bz2":
                call(["tar", "-cjf", config.sites_directory + '/' + url + ".tar.bz2",
                      "-C", config.sites_directory, url], cwd=config.sites_directory)
            elif archive_format == "tar":
                call(["tar", "-cf", config.sites_directory + '/' + url + ".tar",
                      "-C", config.sites_directory, url], cwd=config.sites_directory)
            else:
                print("Archive format is wrong")
        else:
            print("The site is not packed")
        if remove:
            rmtree(site)
            print("Removing is complete")
        else:
            print("Removing is canceled")

A lot of new code has appeared, but there is nothing complicated in it, just a few new conditions. The only new function is rmtree, which removes the directory passed to it along with all of its contents.
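The article packs sites by calling the external tar program. As an alternative that does not depend on tar being installed, the same pack-then-remove sequence can be sketched with the standard tarfile module. This is a swapped-in technique, not the author's method; pack_site and the MODES table are hypothetical names for this example.

```python
import os
import tarfile
import tempfile
from shutil import rmtree

# Map the article's format names to tarfile modes and file suffixes.
MODES = {"gz": ("w:gz", ".tar.gz"), "bz2": ("w:bz2", ".tar.bz2"), "tar": ("w", ".tar")}


def pack_site(sites_directory, name, archive_format):
    """Pack sites_directory/name into an archive beside it. Hypothetical helper."""
    mode, suffix = MODES[archive_format]
    archive_path = os.path.join(sites_directory, name + suffix)
    with tarfile.open(archive_path, mode) as archive:
        # arcname keeps the stored paths relative, like "tar -C".
        archive.add(os.path.join(sites_directory, name), arcname=name)
    return archive_path


with tempfile.TemporaryDirectory() as sites:
    site = os.path.join(sites, "example.com")
    os.makedirs(site)
    with open(os.path.join(site, "index.html"), "w") as page:
        page.write("<h1>hello</h1>")

    archive_path = pack_site(sites, "example.com", "gz")
    rmtree(site)  # remove the unpacked copy, including its contents
    site_removed = not os.path.exists(site)

    with tarfile.open(archive_path) as archive:
        names = archive.getnames()

print(names)
```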

You can add a simple list command with no parameters to the handle_commands function:
        elif command == "list":
            # Get the names of everything in the sites directory
            file_list = listdir(config.sites_directory)
            folder_list = []
            archive_list = []
            # Sort the names into site folders and archives, skipping the HTTrack cache
            for file in file_list:
                if path.isdir(config.sites_directory + '/' + file) and file != "hts-cache":
                    folder_list.append(file)
                # endswith accepts a tuple and avoids miscounted slice lengths
                if path.isfile(config.sites_directory + '/' + file) and \
                        file.endswith((".tar.gz", ".tar.bz2", ".tar")):
                    archive_list.append(file)
            site_string = ""
            folder_found = False
            # Build the reply
            if folder_list:
                site_string += "List of folders:\n" + "\n".join(folder_list)
                folder_found = True
            if archive_list:
                if folder_found:
                    site_string += "\n================================================================================\n"
                site_string += "List of archives:\n" + "\n".join(archive_list)
            if site_string == "":
                site_string = "Sites not found!"
            connection.sendall(bytes(site_string, encoding="utf-8"))

This requires one more import at the top of server.py:
 from os import listdir, path 
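The sorting of directory entries into folders and archives can be tried on a temporary directory. The sketch below is a standalone variant, with classify as a hypothetical name. It uses str.endswith with a tuple of suffixes, since comparing a fixed-length slice such as name[-5:] with a four-character suffix like ".tar" can never match.

```python
import os
import tempfile

ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar")


def classify(directory):
    """Return (folders, archives) found in directory. Hypothetical helper."""
    folders, archives = [], []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path) and name != "hts-cache":
            folders.append(name)
        elif os.path.isfile(full_path) and name.endswith(ARCHIVE_SUFFIXES):
            archives.append(name)
    return folders, archives


with tempfile.TemporaryDirectory() as sites:
    os.makedirs(os.path.join(sites, "example.com"))
    os.makedirs(os.path.join(sites, "hts-cache"))
    for filename in ("a.tar", "b.tar.gz", "notes.txt"):
        open(os.path.join(sites, filename), "w").close()
    folders, archives = classify(sites)

print(folders, archives)
```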

It would also be a good idea to increase the maximum amount of data the client reads from the server in client.py:
  data = sock.recv(65536) 
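A larger buffer raises the limit but does not guarantee that one recv call returns the whole reply, because TCP delivers a stream of bytes rather than whole messages. A common pattern is to read in a loop until the peer closes the connection. The sketch below assumes the sender closes its socket after replying, which the server in this article does not do yet; recv_all and send_and_close are hypothetical names.

```python
import socket
import threading


def recv_all(sock, chunk_size=4096):
    """Read until the peer closes the connection. Hypothetical helper."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # empty bytes: the other side has closed
            break
        chunks.append(chunk)
    return b"".join(chunks)


def send_and_close(sock, data):
    sock.sendall(data)
    sock.close()  # closing signals the end of the reply


left, right = socket.socketpair()
payload = b"x" * 100000  # larger than a single recv(4096) can return

# Send from a separate thread so a full socket buffer cannot block the demo.
sender = threading.Thread(target=send_and_close, args=(left, payload))
sender.start()
received = recv_all(right)
sender.join()
right.close()

print(len(received))
```

Another option is to prefix every message with its length, so the connection can stay open between requests.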

Now restart server.py and run client.py. To begin with, let's ask the server to download a website, pack it into a tar.gz archive, and then delete the directory:
 dl http://verysimplesites.co.uk/ 1 gz 

After that, we will download another site, this time without packing it and, of course, without deleting it:
 dl http://example.com/ 0 

And after about a minute, check the list of sites:
 list 

If you entered the same commands, you should receive the following response from the server:
    List of folders:
    example.com
    ================================================================================
    List of archives:
    verysimplesites.co.uk.tar.gz


That, I think, is all for today. This is, of course, far from everything that can and should be implemented, but it is enough to understand the general principle of how such applications work. If you are interested in this topic, write about it in the comments; if there is interest, I will try to find the time to write an article about extending this application, including viewing the download status of a site, and to show several ways to protect it against intruders, including protection against breaking into the shell.

UPD: Part 2. Data transfer protocol .

Source: https://habr.com/ru/post/268993/

