📜 ⬆️ ⬇️

Backup on hardlinks under Windows

I, like many, thought about backups. Slowly I thought and thought, considered different options, until the hard drive burned on my wife's laptop. This sad event spurred my activity, the result of which I want to present in this article.

It's about backup. I will discuss in detail exactly my task. Some people may have different circumstances, but there should be a lot of people like me. Therefore, I hope my advice will be useful to a large number of people.

To begin with, we decide what to keep. For me, in the first place photos. In 2003, I bought the first digital camera, and since then all my photos are stored in digital form. My parents have an album with a photo of my great-grandmothers and great-grandfathers. I will not leave anything on paper newer than 2003. Therefore, we must especially take care of the safety of digital photos. Now my folder with the photo "weighs" 180 gigabytes and grows in gigabytes another way per month.

In addition to the photo there are files with a variety of projects. For example, this article will be one of them. These are text files, programs. Places occupy a little, but a lot of work is invested in them. There is also a collection of music, part of which I personally digitized from CDs. At one time I was keen on video editing - since then there are about 300 gigabytes of operating time left. All this would be very sad to lose. Another data storage is a working laptop. In our company, everything valuable is stored centrally: the code in SVN, documents at Sharepoint, etc. IT is responsible for their safety, and so far there have been no problems. However, I have a collection of files on my machine that I made for myself. It increases my productivity, and I would like to have some guarantees that all this will not disappear if the hard drive once orders to live long.
image
')
A few words about what I do not need to copy. It makes no sense to save distros and movies. What is downloaded from the network, you can download again. And distributions lose relevance after the release of newer versions. I also do not need to back up the system. In my experience, the collapse of the system happens quite rarely, and is a good reason to upgrade the OS version, the programs used, and even upgrade the hardware.

All my computers work under Windows. I thought about putting Linux on a second system specifically for backup, but then I decided not to do that. I have very little experience with unix systems, which means that I will spend too much time setting up, I’m probably going to do something wrong. Backup will be inconvenient, it will be rarely done.

Now, about what threats I'm trying to protect. First, it is a sudden hard drive failure. A RAID array helps with this scourge. My current computer has RAID-1, but, on reflection, I decided not to limit myself to this. In life there is anything. For example, they talk about unexpected power surges, as a result of which the entire contents of the system unit burns. Also, it is much more likely that you can get under the attack of a particularly evil virus with very sad consequences. I once had problems without a virus: I installed Windows from the distribution with an older service pack than I had before. I started the check disk, which did not recognize the files created by the previous system, and began to “fix” them, causing chaos and destruction. In general, you can remove valuable information with your own hands, by mistake.

In my opinion, the ideal solution is to backup to an external USB-drive. Now you can buy enough discs. If you connect it to the computer only occasionally, you can not be afraid of power surges and virus attacks. The reliability of such a disk is no worse than the reliability of conventional hard drives.

It remains to decide how to make backups. The easiest way: copy everything. However, in the long run it does not work. Files on the computer are constantly changing. To make a copy of everything every time is not an option. The catalog with pictures, for example, basically remains the same, only something is added (although small changes in old files cannot be ruled out). So you have to keep track of the difference. Immediately there is a lot of questions: how to do it, whether it is necessary to store old copies, etc. Surely all these issues have somehow been resolved. I tried to find out how other people solve these problems. Ideally, you need some kind of software that summarizes the experience of generations of computer scientists, and would do everything in the best possible way. Here is the story of my searches.
The first thing I was advised Norton Ghost. However, it turned out that this product is no longer for sale, but instead some kind of Symantec system recovery. This software is sharpened by the backup system. It is probably well suited to those who support a large fleet of similar machines. Did something happen to the accountant's computer? It does not matter, for half an hour we restore a clean system from the archive, and the data is still in 1C. I, in which case, it is better to rearrange the system, combining it with the transition to more recent versions of the operating system and programs that I use.

The next item was considered "regular funds". This is “data archiving” in Windows 7, or ntbackup in XP. I studied the question a bit, and it turned out that Windows 7 makes an incremental backup, however, in some strange and intricate way. In the vastness of the network, I found this screenshot:


According to it and the descriptions we can conclude that the system for the first time packs all the data into the archive, and the next time it finds the changed files, and creates an archive with a delta. I do not know for whom this algorithm is intended, but it obviously does not suit me. First of all, my main volume is created by pictures that are not necessary to reap. The data that zhmutsya well, I have not a very large amount. In general, now even office documents are a zip archive. Secondly, it is absolutely not clear how to administer this set of zipnikov. Is it possible, for example, to remove one of them if space is urgently needed? Can I leave some specific versions of the folder I need and delete the rest? And finally, the most unpleasant - there is no certainty that this data can be restored without problems in the future. The backup algorithm changed when moving from XP to 7-ke. There is no certainty that any Windows 10 will not forget about these backups. Or maybe I'll switch to Mac or Linux or some future Android for desktops - who knows. In general, the mechanism that suits me should give a result in an understandable form, using common formats, and so that it can be deployed without proprietary software.

Another potential data storage solution is to keep everything online. But here, too, there are nuances. First, it is expensive. I need to store up to a terabyte of data. How much does it cost?


In addition to the high cost, it doesn’t leave anxiety, but what if on one not very beautiful day the files are inaccessible? For example, hackers can take possession of my account, use it to send spam, and the administration will close it (so I was spared from an account on Odnoklassniki). You can also see the inscription “this service is not available in your country” (yes, as on YouTube). Theoretically, the service itself could lose popularity and close (by the way, how is my website doing there on narod.ru?). It is terrible to imagine what will become of my data if the company that holds them one day goes bankrupt.

Fortunately, on the Internet there was a way to solve the problem with backup under windows: blog.jay2k1.com/2011/08/13/how-to-create-rsync-like-hard-link-backups-with-vss-on -windows

The problem was solved by running Linux rsync using the cygwin library and transferring data to it through the Volume Shadow Copy Service or VSS. What does this give?
The following is a free translation of a piece of an article by reference.

Firstly, a small educational program about hardlinks for those who do not know what it is (I did not know about it). Suppose you have a text file with the words "hello the bear" at c: \ files \ hello.txt. Its contents are recorded on a disk in some place, suppose at position 10246, and this is reflected in the "table of contents" of the disk. The table of contents states that "there is a file hello.txt, it lies in the c: \ files folder, and its contents lie in position 10246". The link between the file name and the piece of data is called hardlink. When you delete a file, the data is not erased, but only the hardlink is deleted. Without hardlinking, the system no longer knows what data is at that address, and if no other hardlink indicates position 10246, this position is considered free. Over time, it can rub down other data. In general, “deleting a file” is actually a hardlink removal.

Writing in the "disk contents" does not take up much space, so many hardlinks to the same file are allowed. In our example, you can create a second hardlink to our file, say, c: \ test.txt. Then both descriptors (c: \ test.txt and c: \ files \ hello.txt) will point to the same content (“translated bear” at position 10246). Everything will look as if there are 2 files with the same content, but in fact it is the same file, only with different names. If you open c: \ files \ hello.txt and enter “author drink yadu” there, then close this file and open c: \ test.txt - we will see these changes in it.

Hardlinks for backups have two very useful properties: 1) they take almost no space, and 2) the file is not deleted as long as at least one hard link refers to it. The backup algorithm based on this works like this:
First, we copy all the data into the backup folder, say, in “f: \ backups \ 1 \“. The next time you start, compare the file with the previous backup set.



All this magic on working with hardlinks is done using the rsync utility, which comes from * nix systems, but works under Windows using cygwin's libraries. But that is not all. The fact is that the Windows prohibits copying some files if they are blocked by the programs that opened them. To get around this, rsync copies files using VSS. As Wikipedia says, VSS is a service of the Windows operating system that allows you to copy files that are currently being worked on. You can even copy system and locked files. The service is necessary for the operation of the following programs: system recovery, backup programs (Acronis True Image and others).

This link (http://pub.jay2k1.com/rsyncbackup.zip) provides an assembly that performs a backup. It works according to the following algorithm: in the vss-exec.cmd file, you must specify which disk will be backed up and where the copies will be added. Also in the rsync-excludes.txt file, you can specify exception folders that are not subjected to backup.
The backup is started using _start_backup.cmd. A folder with a computer name is created in the backup folder, and a set of backup folders are created in it, which are called according to the backup time. In vss-exec.cmd, the maximum number of backups is configured. If this number is reached, the script deletes the first one, and then creates another backup of the above algorithm.

This is practically what I need, but there are a couple of moments. This backup script is designed for a situation where the entire contents of a disk is copied to some other disk in the same computer. My paranoia requires backup to external media, and in this case there are some problems:



I corrected the script by implementing the following script:




The script can be downloaded (along with all called utilities) here:
www.dropbox.com/s/un82qyrsuq3r3j0/rsync_backup_for_win.zip

I want to immediately say a few words in my defense. I never seriously wrote cmd files, and it turned out to be a real torture for me. The program stops working if you put a space after the equal sign. Sometimes it even reacts to spaces at the end of a line. A separate song is a proprietary chip with the scope of variables. The initial script, apparently, was written by a German-speaking programmer, so do not be surprised at structures like “set stunde =% ZEIT: ~ -8,2%”. A piece of code that parses the ini files, I took from the network with minimal changes. This was the first thing that was found on the request “reading the configuration from a file in cmd-script” Therefore, an ini-file is used, although its capabilities are clearly unnecessary. An example of the backup_folders.ini file is below:

[backup folders] photo=F:\Photo lit=F:\literature music=F:\MUSIC desktop=F:\Desktop 


The category name [backup folders] does not affect anything, but it is better to leave it as it is. The name of the variable (the characters before the equal sign) also do not go anywhere - treat them as a comment. The path to the backup folder is after the equal sign. It can be taken from the address bar of windows explorer. As you can guess, in this utility vinigrette each has its own view on which slashes to use along the path to the file, and which letter (large or small) to call disks. The script adapts the path from windows explorer to the whims of these utilities.
The script leaves a log with the results of rsync.

Reported problem: does not want to copy files with Japanese characters and other non-standard characters in the names. Search forums gave nothing. They write that this is cygwin's bug, it is recommended to put Japanese in the system language. But then, files with Russian letters will stop being copied. I do not know what to do, I decided that this problem can be ignored.

The rest works great. Shadow Copy allows you to copy everything, you can safely work on the machine while copying, and nothing falls off. The time of the initial backup depends on the speed of the disk. I have more than a million files on 200 gigabytes copied about 8 hours. If there are no changes, and there is nothing to copy - the backup is done in the order of tens of minutes.

Source: https://habr.com/ru/post/205204/


All Articles