Rename files downloaded from rutracker.org

I want to share a small script that does one simple but very useful thing for me: it batch-renames downloaded files so that each file gets a readable name taken from its page on the torrent tracker site.
As a result, instead of "God.Bless.America.2011.HDTVRiP720.mkv" I get a file named ", God Bless America ( Bobcat Goldthwait) (2011) , , , , HDTVRip-AVC.mkv".
This continues the topic "Can I tidy up my computer once and for all?"

In general, the script works as follows: it scans the downloads directory for downloaded files and looks up the corresponding .torrent file for each one. It then reads the URL of the rutracker.org page from the .torrent file, loads that page, and extracts the title of the release. Finally, it renames the downloaded file based on that title (instead of renaming, you can choose to copy files to another directory, move them, or create hard links).
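The step of pulling the page URL out of the .torrent file can be sketched as follows. A .torrent file is bencoded, so a string is stored as its length, a colon, and then the data; rutracker puts the page URL into the "comment" field. This is only a minimal sketch that reads the length prefix instead of assuming a two-digit length. The names extract_comment and bencode_str, the tracker address, and the topic number in the sample are made up for illustration.

```python
import re


def extract_comment(torrent_bytes):
    """Return the value of the bencoded 'comment' key as text, or None."""
    # A bencoded string is b"<length>:<data>", so the key appears as b"7:comment"
    # and is followed directly by the length-prefixed value.
    match = re.search(rb"7:comment(\d+):", torrent_bytes)
    if match is None:
        return None
    length = int(match.group(1))
    start = match.end()
    value = torrent_bytes[start:start + length]
    return value.decode("utf-8", errors="replace")


def bencode_str(data):
    """Encode bytes as a bencoded string (helper for building the sample)."""
    return str(len(data)).encode("ascii") + b":" + data


# Hypothetical hand-made torrent fragment; the topic id is invented.
sample = (
    b"d"
    + bencode_str(b"announce") + bencode_str(b"http://tracker.example/ann")
    + bencode_str(b"comment")
    + bencode_str(b"http://rutracker.org/forum/viewtopic.php?t=1234567")
    + b"e"
)
page_url = extract_comment(sample)
print(page_url)
```

A full bencode parser would be more reliable, because the byte sequence "7:comment" could in principle also occur inside other data; for a quick script the search above is probably enough.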

To run the script you need Python 3 and the uTorrent client with one small setting:
Preferences -> Other -> Store .torrent files in:
Here you specify the directory into which uTorrent will copy .torrent files. When uTorrent copies a .torrent file into this directory, it gives it the same name as the downloaded file, and that is exactly what the script relies on.
In the script, this directory must be specified as TORRENT_DIR.

You also need to set INPUT_DIR, the directory in which the script looks for downloaded files,
and OUTPUT_DIR, the directory into which the script copies or moves files or creates hard links.
The MOVE_ALGORITHM variable selects how files are transferred:

'link' - create hard links to the files. This works only on an NTFS file system and only within one disk (volume). Once a hard link is created, the file exists until its last link is removed. In essence, an NTFS hard link is another file name that is equivalent to the original one.
'copy' - copy the files.
'move' - move the files from INPUT_DIR to OUTPUT_DIR. With this option, be careful with active torrents if INPUT_DIR points to the directory from which files are still being seeded.
'move' with INPUT_DIR == OUTPUT_DIR - rename the files in place.
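The behaviour of the 'link' mode described above can be checked with a short experiment. This is only a sketch: it uses os.link from the standard library rather than the script's own CreateHardLink, it runs in a temporary directory, and the readable file name is made up for illustration.

```python
import os
import tempfile


def demo_hard_link(work_dir):
    """Create a file, hard-link it under a new name, then delete the old name."""
    original = os.path.join(work_dir, "God.Bless.America.2011.HDTVRiP720.mkv")
    linked = os.path.join(work_dir, "God Bless America (2011).mkv")  # made-up name
    with open(original, "wb") as f:
        f.write(b"video data")
    # Both names now refer to the same file on disk; nothing is copied.
    os.link(original, linked)
    same_file = os.path.samefile(original, linked)
    # Removing the original name leaves the data reachable through the link.
    os.remove(original)
    with open(linked, "rb") as f:
        data = f.read()
    return same_file, data


with tempfile.TemporaryDirectory() as tmp:
    same_file, data = demo_hard_link(tmp)
print(same_file, data)
```

This is why 'link' suits a directory that is still being seeded: the torrent client keeps its original file name, while the readable name appears in OUTPUT_DIR without using extra disk space.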

# -*- encoding: utf-8 -*-
import os
import re
import urllib.request
import shutil
import sys
import ctypes

INPUT_DIR = 'D:/Downloads/uTorrent/Completed'
OUTPUT_DIR = 'D:/Video/Movies'
TORRENT_DIR = 'D:/Downloads/uTorrent/torrent'
MOVE_ALGORITHM = 'link'


def GetMoveAlgorithms():
    # Maps the MOVE_ALGORITHM setting to the function that does the transfer
    return {'move': MoveFile, 'copy': CopyFile, 'link': CreateHardLink}


def Print(msg):
    # Replace characters that the cp866 Windows console cannot display
    msg = msg.encode('cp866', 'replace')
    msg = msg.decode('cp866')
    print(msg)


def GetTorrentFilePath(fileName):
    # uTorrent stores "<download name>.torrent" in TORRENT_DIR
    filePath = os.path.join(TORRENT_DIR, fileName + '.torrent')
    if not os.path.exists(filePath):
        Print('Skipped, .torrent is not found: "%s"' % filePath)
        return None
    return filePath


def GetTrackerUrl(torrentFilePath):
    try:
        with open(torrentFilePath, 'r', encoding='ascii', errors='replace') as torrentFile:
            fileData = torrentFile.read()
        # The page URL is stored in the bencoded "comment" field: comment<len>:<url>
        trackerUrlLen, trackerUrl = re.search(r'comment([0-9]{2}):(.+)', fileData).groups()
        trackerUrl = re.search(r'(.{' + trackerUrlLen + '})', trackerUrl).groups()[0]
        return trackerUrl
    except Exception:
        Print("Error, can't extract tracker url from .torrent file %s" % torrentFilePath)
        return None


def LoadTrackerPage(trackerUrl):
    try:
        response = urllib.request.urlopen(trackerUrl)
        htmlPage = response.read()
    except Exception:
        Print("Error, Can't load tracker page '%s'" % trackerUrl)
        return None
    htmlPage = htmlPage.decode('cp1251')
    return htmlPage


def PrepareFileName(fileName):
    try:
        # remove special symbols
        fileName = re.sub(r'[\\/:"\*?<>|]+', '', fileName, 0, re.UNICODE)
        # remove repeating spaces
        fileName = re.sub(r'[ ]+', ' ', fileName, 0, re.UNICODE)
        fileName = fileName.strip()
    except Exception:
        Print("Error, can't prepare file name '%s'" % fileName)
        return None
    return fileName


class FileInfo:
    pass


def ParseTrackerPage(htmlPage):
    try:
        pageTitle = re.search(r'<title>(.+?) :: .+?</title>', htmlPage, re.UNICODE).groups()[0]
    except Exception:
        Print("Error, Can't parse <title>")
        return None
    fileInfo = FileInfo()
    fileInfo.name = ""
    fileInfo.year = ""
    fileInfo.descr = ""
    try:
        # Expected form: "Name [2011, ..., description]"
        fileInfo.name, fileInfo.year, fileInfo.descr = re.search(
            r'(.+?) \[([0-9]{4}).*?, (.+?)\]', pageTitle, re.UNICODE).groups()
    except Exception:
        Print("Warning, Can't parse page title: %s" % pageTitle)
        try:
            # Fallback form without square brackets
            fileInfo.name, fileInfo.year, fileInfo.descr = re.search(
                r'(.+?)([0-9]{4}).*?, (.+?)$', pageTitle, re.UNICODE).groups()
        except Exception:
            Print("Warning, Can't parse page title: %s" % pageTitle)
            fileInfo.name = pageTitle
    return fileInfo


def GetDataFromTorrent(fileName):
    torrentFilePath = GetTorrentFilePath(fileName)
    if not torrentFilePath:
        return None
    trackerUrl = GetTrackerUrl(torrentFilePath)
    if not trackerUrl:
        return None
    htmlPage = LoadTrackerPage(trackerUrl)
    if not htmlPage:
        return None
    return ParseTrackerPage(htmlPage)


def PrepareNewFileName(fileName, fileInfo):
    tmp, ext = os.path.splitext(fileName)
    toPrepare = fileInfo.name + ' (' + fileInfo.year + ') ' + fileInfo.descr
    cleanName = PrepareFileName(toPrepare)
    newFileName = cleanName + ext
    return newFileName


def MoveFile(src, dst):
    shutil.move(src, dst)


def CopyFile(src, dst):
    if os.path.isdir(src):
        for fileName in os.listdir(src):
            if not os.path.exists(dst):
                os.mkdir(dst)
            subSrc = os.path.join(src, fileName)
            subDst = os.path.join(dst, fileName)
            # Recurse into the child entry (the original passed src and dst
            # again here, which never terminated)
            CopyFile(subSrc, subDst)
    else:
        if not os.path.exists(dst):
            shutil.copy2(src, dst)


def CreateHardLink(src, dst):
    CreateHardLinkW = ctypes.windll.kernel32.CreateHardLinkW
    CreateHardLinkW.argtypes = (ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_void_p)
    CreateHardLinkW.restype = ctypes.c_int
    if os.path.isdir(src):
        for fileName in os.listdir(src):
            if not os.path.exists(dst):
                os.mkdir(dst)
            subSrc = os.path.join(src, fileName)
            subDst = os.path.join(dst, fileName)
            CreateHardLink(subSrc, subDst)
    else:
        if not os.path.exists(dst):
            # CreateHardLinkW takes the new link name first and returns 0 on failure
            if CreateHardLinkW(dst, src, 0) == 0:
                raise IOError


def main():
    Print('Hello, Find downloads in "%s" :' % INPUT_DIR)
    totalCount = 0
    processedCount = 0
    for fileName in os.listdir(INPUT_DIR):
        totalCount = totalCount + 1
        Print('Process a file: "%s"' % fileName)
        fileInfo = GetDataFromTorrent(fileName)
        if fileInfo is None:
            continue
        sNewFileName = PrepareNewFileName(fileName, fileInfo)
        if sNewFileName:
            oldFilePath = os.path.join(INPUT_DIR, fileName)
            newFilePath = os.path.join(OUTPUT_DIR, sNewFileName)
            try:
                GetMoveAlgorithms()[MOVE_ALGORITHM](oldFilePath, newFilePath)
                processedCount = processedCount + 1
            except Exception:
                Print("Error, Can't move file from %s to %s" % (oldFilePath, newFilePath))
    Print("%d files were moved from %d total found files" % (processedCount, totalCount))


if __name__ == "__main__":
    main()

The script was tested on Windows 7 x86 and Windows 7 x64.
In theory, the script can be ported to Linux.
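One part that would need to change for such a port is hard-link creation, since the script calls the Windows API through ctypes. A possible replacement, given here only as a sketch and not tested as part of the script, is os.link from the standard library, which is available on both Windows and Linux in Python 3. The name link_tree and the sample file names are made up for illustration.

```python
import os
import tempfile


def link_tree(src, dst):
    """Hard-link a single file, or every file under a directory, from src to dst."""
    if os.path.isdir(src):
        # Directories cannot be hard-linked, so recreate them and link the contents.
        os.makedirs(dst, exist_ok=True)
        for name in os.listdir(src):
            link_tree(os.path.join(src, name), os.path.join(dst, name))
    elif not os.path.exists(dst):
        os.link(src, dst)


# Small self-check in a temporary directory with invented names.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "in", "shapito.shou")
    os.makedirs(src_dir)
    with open(os.path.join(src_dir, "part1.avi"), "wb") as f:
        f.write(b"data")
    dst_dir = os.path.join(tmp, "out", "Readable name (2010)")
    link_tree(src_dir, dst_dir)
    linked_ok = os.path.samefile(
        os.path.join(src_dir, "part1.avi"),
        os.path.join(dst_dir, "part1.avi"),
    )
print(linked_ok)
```

The Print helper, which re-encodes messages for the cp866 console, would probably also need adjusting on Linux, where terminals normally use UTF-8.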

The script is tailored to rutracker, but it will probably work with other sites as well,
at least after some tweaking.

To download the HTML page I used the urllib2 library (urllib.request in the Python 3 version of the script above).
To parse the .torrent files and the HTML I used regular expressions.
They turned out to be quite convenient to use in Python.
In principle, it would be better to parse the HTML with CSS selectors, but for now this will do.
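CSS selectors require a third-party library, so the sketch below swaps in a different technique: the standard library's html.parser, which also avoids running a regular expression over the whole page. It is only an illustration; the names TitleParser and get_page_title and the sample page are made up, and the title format is assumed to follow the "name :: site" pattern that the script's regular expression expects.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def get_page_title(html_page):
    parser = TitleParser()
    parser.feed(html_page)
    return parser.title.strip()


# Hypothetical page, not a real tracker response.
sample_page = (
    "<html><head><title>God Bless America [2011, HDTVRip-AVC] "
    ":: Example Tracker</title></head><body></body></html>"
)
release_name = get_page_title(sample_page).split(" :: ")[0]
print(release_name)
```

The part before " :: " could then be passed on to the existing year and description parsing unchanged.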

Since Python is not my main language, I would welcome advice from experienced developers.

Update1:
In case the example with God.Bless.America.2011.HDTVRiP720.mkv does not seem convincing, here is a list of the top files from rutracker.
In my opinion, you can strain both your eyes and your brain trying to work out which movie is behind each of these names.
And of course, cataloging programs such as AllMyMovies or Movienizer cannot automatically find information about these files in the IMDb and Kinopoisk databases.

Na_grani_DVDRp_ [rutracker.org] _by_Inh.avi
Ohotniki za golovani.chopper887.mkv
Shvatka.chopper887.mkv
prizrachnyi.gonschik_2.2012.hdrip.ac3.1450mb.by.riperrr.avi
Dom_grez_BDRip_dub_ [rutracker.org] _by_Scarabey.avi
Lubov.zhivet.tri.goda.2011.BDRip.avi
Nepricosaemie.chopper887
Missiya-Fantom.chopper887.avi
Njanki.2012.O.DVDRip.IRONCLUB.avi
samoubyici.2012.dvdrip.ac3.1450mb.by.riperrr.avi
belyi.tigr.2012.dvdrip.ac3.2050mb.by.riperrr.avi
Zhila.byla.odna.baba.2011.BDRip.1.46.avi
svidanie.2012.dvdrip.ac3.1450mb.by.riperrr.avi
Moy.Paren.Angel.BDRip.avi
Visotskiy_Spasibo_Chto_Jivoy_2011_DVDRip_1.46_Menen.avi
Inadequate people_1.46.avi
Den.vyborov.2007.avi
shapito.shou_1.2010.hdrip.ac3.1450mb.by.riperrr.avi
belyi.tigr.2012.dvdrip.ac3.1450mb.by.riperrr.avi
2 days_1.46.avi

Update2:
Following suggestions in the comments, I updated the script:

Moved to Python 3
Separated input and output directories
Added support for creating HardLinks for NTFS
Added support for copying files

Source: https://habr.com/ru/post/144938/
