
Rename files downloaded from rutracker.org

I want to share a small script that does a simple but very useful thing for me: it batch-renames downloaded files so that each file gets a readable name taken from its page on the torrent tracker.
As a result, instead of "God.Bless.America.2011.HDTVRiP720.mkv" I get a file named ", God Bless America ( Bobcat Goldthwait) (2011) , , , , HDTVRip-AVC.mkv"
This continues the topic "Can I tidy up my computer once and for all?"

In general, the script works like this: it scans the downloads directory for downloaded files and looks up the corresponding .torrent files. From each .torrent file it takes the URL of the page on rutracker.org, loads that page and reads the title of the release. Based on that title, the script renames the downloaded file (instead of renaming, you can choose to copy the files to another directory, move them or create hard links).
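The last step, turning a page title into a file name, can be sketched roughly as follows. This is a minimal illustration that reuses the first title pattern from the script; the function name `build_name` and the sample title are assumptions made up for the example, not real tracker output.

```python
import os
import re


def build_name(page_title, original_name):
    """Build a readable file name from a title like 'Name [2011, descr]'.

    Returns None if the title does not match the expected layout.
    """
    match = re.search(r"(.+?) \[([0-9]{4}).*?, (.+?)\]", page_title)
    if match is None:
        return None
    name, year, descr = match.groups()
    new_name = "%s (%s) %s" % (name, year, descr)
    # Drop characters that Windows forbids in file names.
    new_name = re.sub(r'[\\/:"*?<>|]+', "", new_name)
    # Collapse runs of spaces and trim the ends.
    new_name = re.sub(r" +", " ", new_name).strip()
    # Keep the extension of the downloaded file.
    return new_name + os.path.splitext(original_name)[1]


# Hypothetical page title, for illustration only.
title = "God Bless America (Bobcat Goldthwait) [2011, USA, comedy, HDTVRip-AVC]"
print(build_name(title, "God.Bless.America.2011.HDTVRiP720.mkv"))
```

A title without the bracketed part makes `build_name` return None, so the caller can decide whether to skip the file or fall back to the raw title.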

For the script to work, you need Python 3 and the uTorrent client with one small setting:
Preferences -> Other -> Store .torrent files in:
Here you specify the directory where uTorrent will copy the .torrent files. When uTorrent copies a .torrent file to this directory, it gives it the same name as the downloaded file, and that is exactly what the script takes advantage of.
In the script, this directory must be specified as TORRENT_DIR.
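As an illustration of how this naming convention can be used, here is a small sketch that finds the .torrent for a download and reads the `comment` field, where the script expects the tracker page URL to be. The helper names `find_torrent` and `extract_comment` are hypothetical. The sketch searches the raw bytes with a regular expression instead of fully decoding the bencoded file, so it is an approximation rather than a real bencode parser.

```python
import os
import re


def find_torrent(download_name, torrent_dir):
    """Return the path of the .torrent saved under the download's name, or None."""
    path = os.path.join(torrent_dir, download_name + ".torrent")
    return path if os.path.isfile(path) else None


def extract_comment(data):
    """Return the 'comment' string from raw .torrent bytes, or None.

    A bencoded string is stored as '<length>:<bytes>', so the field
    looks like '7:comment<length>:<url>' inside the file.
    """
    match = re.search(rb"7:comment(\d+):", data)
    if match is None:
        return None
    length = int(match.group(1))
    start = match.end()
    return data[start:start + length].decode("utf-8", errors="replace")
```

Unlike the two-digit length pattern in the script, reading the length as an integer also handles URLs shorter than 10 or longer than 99 characters.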

You also need to set INPUT_DIR, the directory where the script looks for downloaded files,
and OUTPUT_DIR, the directory into which the script copies the files (or moves them, or creates hard links).
Using the MOVE_ALGORITHM variable, you can choose how the files get into OUTPUT_DIR:
  1. 'move': move the file
  2. 'copy': copy the file
  3. 'link': create a hard link


# -*- coding: utf-8 -*-
import ctypes
import os
import re
import shutil
import urllib.request

INPUT_DIR = 'D:/Downloads/uTorrent/Completed'
OUTPUT_DIR = 'D:/Video/Movies'
TORRENT_DIR = 'D:/Downloads/uTorrent/torrent'
MOVE_ALGORITHM = 'link'  # 'move', 'copy' or 'link'


def GetMoveAlgorithms():
    return {'move': MoveFile, 'copy': CopyFile, 'link': CreateHardLink}


def Print(msg):
    # Replace characters that the Windows console (cp866) cannot display
    msg = msg.encode('cp866', 'replace')
    msg = msg.decode('cp866')
    print(msg)


def GetTorrentFilePath(fileName):
    # uTorrent stores the .torrent under the same name as the download
    filePath = os.path.join(TORRENT_DIR, fileName + '.torrent')
    if not os.path.exists(filePath):
        Print('Skipped, .torrent is not found: "%s"' % filePath)
        return None
    return filePath


def GetTrackerUrl(torrentFilePath):
    try:
        with open(torrentFilePath, 'r', encoding='ascii', errors='replace') as torrentFile:
            fileData = torrentFile.read()
        # The comment field looks like "comment<length>:<url>"
        trackerUrlLen, trackerUrl = re.search(r'comment([0-9]{2}):(.+)', fileData).groups()
        return trackerUrl[:int(trackerUrlLen)]
    except Exception:
        Print("Error, can't extract tracker url from .torrent file %s" % torrentFilePath)
        return None


def LoadTrackerPage(trackerUrl):
    try:
        response = urllib.request.urlopen(trackerUrl)
        htmlPage = response.read()
        # The tracker pages are served in cp1251
        return htmlPage.decode('cp1251')
    except Exception:
        Print("Error, Can't load tracker page '%s'" % trackerUrl)
        return None


def PrepareFileName(fileName):
    try:
        # remove special symbols
        fileName = re.sub(r'[\\/:"\*?<>|]+', '', fileName, 0, re.UNICODE)
        # remove repeating spaces
        fileName = re.sub(r'[ ]+', ' ', fileName, 0, re.UNICODE)
        fileName = fileName.strip()
    except Exception:
        Print("Error, can't prepare file name '%s'" % fileName)
        return None
    return fileName


class FileInfo:
    pass


def ParseTrackerPage(htmlPage):
    try:
        pageTitle = re.search(r'<title>(.+?) :: .+?</title>', htmlPage, re.UNICODE).groups()[0]
    except Exception:
        Print("Error, Can't parse <title>")
        return None
    fileInfo = FileInfo()
    fileInfo.name = ""
    fileInfo.year = ""
    fileInfo.descr = ""
    try:
        # Expected layout: "<name> [<year>..., <description>]"
        fileInfo.name, fileInfo.year, fileInfo.descr = re.search(
            r'(.+?) \[([0-9]{4}).*?, (.+?)\]', pageTitle, re.UNICODE).groups()
    except Exception:
        Print("Warning, Can't parse page title: %s" % pageTitle)
        try:
            # Fallback: the same fields without square brackets
            fileInfo.name, fileInfo.year, fileInfo.descr = re.search(
                r'(.+?)([0-9]{4}).*?, (.+?)$', pageTitle, re.UNICODE).groups()
        except Exception:
            Print("Warning, Can't parse page title: %s" % pageTitle)
            fileInfo.name = pageTitle
    return fileInfo


def GetDataFromTorrent(fileName):
    torrentFilePath = GetTorrentFilePath(fileName)
    if not torrentFilePath:
        return None
    trackerUrl = GetTrackerUrl(torrentFilePath)
    if not trackerUrl:
        return None
    htmlPage = LoadTrackerPage(trackerUrl)
    if not htmlPage:
        return None
    return ParseTrackerPage(htmlPage)


def PrepareNewFileName(fileName, fileInfo):
    tmp, ext = os.path.splitext(fileName)
    toPrepare = fileInfo.name + ' (' + fileInfo.year + ') ' + fileInfo.descr
    cleanName = PrepareFileName(toPrepare)
    if not cleanName:
        return None
    return cleanName + ext


def MoveFile(src, dst):
    shutil.move(src, dst)


def CopyFile(src, dst):
    if os.path.isdir(src):
        if not os.path.exists(dst):
            os.mkdir(dst)
        for fileName in os.listdir(src):
            subSrc = os.path.join(src, fileName)
            subDst = os.path.join(dst, fileName)
            # Recurse into the child entry (the original passed src, dst here)
            CopyFile(subSrc, subDst)
    elif not os.path.exists(dst):
        shutil.copy2(src, dst)


def CreateHardLink(src, dst):
    # Windows-only: calls CreateHardLinkW from kernel32
    CreateHardLinkW = ctypes.windll.kernel32.CreateHardLinkW
    CreateHardLinkW.argtypes = (ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_void_p)
    CreateHardLinkW.restype = ctypes.c_int
    if os.path.isdir(src):
        if not os.path.exists(dst):
            os.mkdir(dst)
        for fileName in os.listdir(src):
            subSrc = os.path.join(src, fileName)
            subDst = os.path.join(dst, fileName)
            CreateHardLink(subSrc, subDst)
    elif not os.path.exists(dst):
        if CreateHardLinkW(dst, src, 0) == 0:
            raise IOError


def main():
    Print('Hello, Find downloads in "%s" :' % INPUT_DIR)
    totalCount = 0
    processedCount = 0
    for fileName in os.listdir(INPUT_DIR):
        totalCount = totalCount + 1
        Print('Process a file: "%s"' % fileName)
        fileInfo = GetDataFromTorrent(fileName)
        if fileInfo is None:
            continue
        sNewFileName = PrepareNewFileName(fileName, fileInfo)
        if sNewFileName:
            oldFilePath = os.path.join(INPUT_DIR, fileName)
            newFilePath = os.path.join(OUTPUT_DIR, sNewFileName)
            try:
                GetMoveAlgorithms()[MOVE_ALGORITHM](oldFilePath, newFilePath)
                processedCount = processedCount + 1
            except Exception:
                Print("Error, Can't move file from %s to %s" % (oldFilePath, newFilePath))
    Print("%d files were moved from %d total found files" % (processedCount, totalCount))


if __name__ == "__main__":
    main()

The script was tested on Windows 7 x86 and Windows 7 x64.
In theory, the script can be ported to Linux.
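One part that would need attention in such a port is the hard-link step, because the script calls the Windows API through ctypes. A possible portable replacement, sketched here under the assumption that Python 3.2 or newer is used (where `os.link` also works on Windows), could look like this. The function name `create_hard_link` is made up for the example.

```python
import os


def create_hard_link(src, dst):
    """Hard-link src to dst, recursing into directories.

    Directories cannot be hard-linked, so they are recreated and
    their files are linked one by one. Existing targets are kept.
    """
    if os.path.isdir(src):
        os.makedirs(dst, exist_ok=True)
        for entry in os.listdir(src):
            create_hard_link(os.path.join(src, entry),
                             os.path.join(dst, entry))
    elif not os.path.exists(dst):
        os.link(src, dst)
```

Hard links only work when source and destination are on the same volume, so INPUT_DIR and OUTPUT_DIR would still have to live on one file system. The `cp866` console encoding in `Print` and the drive-letter paths would also need adjusting on Linux.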

The script is tailored to rutracker, but it will probably work with other sites,
at least after some tweaking.

To download the HTML page I used the urllib.request module (urllib2 in the original Python 2 version).
To parse the .torrent files and the HTML I used regular expressions.
They turned out to be quite convenient to use in Python.
Ideally the HTML would be parsed with CSS selectors, but regular expressions will do for now.
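As a middle ground between regular expressions and a CSS-selector library, the page title could also be read with the standard library's `html.parser`. The sketch below is not a CSS-selector solution and is not part of the script. The names `TitleParser` and `get_page_title` are invented for the example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def get_page_title(html_page):
    """Return the stripped <title> text, or an empty string if absent."""
    parser = TitleParser()
    parser.feed(html_page)
    parser.close()
    return parser.title.strip()
```

The site suffix after " :: " can then be cut off with an ordinary string split before the title is passed on to the name-parsing step.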

Since Python is not my main language, I would welcome advice from experienced developers.

Update1:
In case the example with God.Bless.America.2011.HDTVRiP720.mkv does not seem convincing, here is a list of the top files from rutracker.
To my mind, you can strain your eyes and rack your brain trying to work out which movie is behind some of these names.
And of course, cataloging programs such as AllMyMovies or Movienizer cannot automatically find information about these files in the IMDb and KinoPoisk databases.

  1. Na_grani_DVDRp_ [rutracker.org] _by_Inh.avi
  2. Ohotniki za golovani.chopper887.mkv
  3. Shvatka.chopper887.mkv
  4. prizrachnyi.gonschik_2.2012.hdrip.ac3.1450mb.by.riperrr.avi
  5. Dom_grez_BDRip_dub_ [rutracker.org] _by_Scarabey.avi
  6. Lubov.zhivet.tri.goda.2011.BDRip.avi
  7. Nepricosaemie.chopper887
  8. Missiya-Fantom.chopper887.avi
  9. Njanki.2012.O.DVDRip.IRONCLUB.avi
  10. samoubyici.2012.dvdrip.ac3.1450mb.by.riperrr.avi
  11. belyi.tigr.2012.dvdrip.ac3.2050mb.by.riperrr.avi
  12. Zhila.byla.odna.baba.2011.BDRip.1.46.avi
  13. svidanie.2012.dvdrip.ac3.1450mb.by.riperrr.avi
  14. Moy.Paren.Angel.BDRip.avi
  15. Visotskiy_Spasibo_Chto_Jivoy_2011_DVDRip_1.46_Menen.avi
  16. Inadequate people_1.46.avi
  17. Den.vyborov.2007.avi
  18. shapito.shou_1.2010.hdrip.ac3.1450mb.by.riperrr.avi
  19. belyi.tigr.2012.dvdrip.ac3.1450mb.by.riperrr.avi
  20. 2 days_1.46.avi


Update2:
Following suggestions in the comments, I updated the script:
  1. Moved to Python 3
  2. Separated the input and output directories
  3. Added support for creating hard links on NTFS
  4. Added support for copying files
Source: https://habr.com/ru/post/144938/

