Continuing the topic of snapshots raised in a recent article, today I will talk about how Veeam Backup & Replication helps minimize the impact of snapshots on the environment.
For various reasons, snapshots sometimes become “invisible” to vCenter: they do not show up in any reports or anywhere in the UI, yet they continue to live on the storage system. The virtual machine quietly keeps using such a snapshot, and this is exactly what can lead to problems: storage space gets eaten up and performance drops. Frankly, problems caused by an “invisible” snapshot are a common reason for contacting support.
To learn about methods of dealing with snapshots (visible and invisible), read on.

A little one made a snapshot and asked: "Is a snapshot good, or is it bad?"
As you know, in a VMware environment snapshots are additional disks on the storage system; all write operations go to them, while the previous point stays read-only. Along with the benefits, a snapshot brings some unpleasant consequences: the disk space it occupies is taken from valuable storage capacity, and because reads and writes are spread across different virtual disks, virtual machine performance may drop.
VMware administrators know that leaving a virtual machine snapshot open for a long time is very likely to cause problems for the virtual infrastructure. It therefore makes sense to track such snapshots, for example by periodically running the Active Snapshots report from the VMware report pack included in Veeam ONE.

This is a good way to control snapshots. However, there are situations where it may not be enough.
What are invisible snapshots, and how do they appear?
Let's take a closer look: in Veeam Backup & Replication, any backup or replication job starts by creating a virtual machine snapshot. This makes it possible to perform "freezing" (quiescence) correctly, i.e. to pause operations on the data stored on the virtual disk, which ensures the consistency of the backed-up data. So, first of all, Veeam Backup & Replication sends vCenter a request to create a snapshot. Once that is done, the data of the “frozen” virtual disk is copied in whole or in part (if an incremental backup is performed). Veeam then sends vCenter a commit request: all changes written to the snapshot delta file while the data was being copied must be merged into the VMDK, and the snapshot must be deleted. This procedure is called consolidation.
The following can then happen: even though vCenter reports that the snapshots were deleted, the deletion may not actually have taken place, and the unclosed snapshots go on living a life of their own (without vCenter knowing about it), with all the negative consequences. For example, an attempt to remove a snapshot while the virtual machine disk is still attached to a HotAdd proxy can cause an “invisible” snapshot to appear (for more on HotAdd proxy, see here, in English).
Veeam Backup & Replication v8 addresses the problem of “invisible” snapshots that a backup or replication run may leave behind. It does so with a feature called Snapshot Hunter. The "hunter" tracks down such snapshots and deletes them automatically.
What do we see on the screen?
As soon as vSphere finishes (or thinks that it has completed) working with the snapshot of the virtual machine, a corresponding notification will appear in the client UI:

Snapshot Hunter then immediately connects to the virtual infrastructure and reads data from the storage system where the files of this virtual machine are stored. If the snapshot created during the backup is still there, this is reported in the current job session in the Veeam Backup & Replication console, and automatic consolidation begins.

You can also watch the “hunter” at work in the History view: in the tree on the left, select the System node (system tasks) and filter the list of system tasks using the search bar. The right pane then lists the tasks that perform snapshot consolidation. Each such task is a Snapshot Hunter run:

What happens, and when?
Immediately after a task has been processed and the snapshot commit has been reported (recall that in Veeam Backup terms a “task” is one virtual machine, or one virtual disk if the machine has several), a check is performed against the following conditions:
- In vSphere, we look at the Needs Consolidation attribute of the virtual machine: if its value is Yes, consolidation has not taken place.
- Using the vCenter Server database, we compare the number of registered snapshots with the number of delta disks: if the values differ, consolidation has also failed.
If the check shows that forced consolidation is needed, Veeam Backup schedules a system task (that is, Snapshot Hunter) to run in a separate thread.
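The two conditions above can be expressed as a small predicate. This is only a sketch under stated assumptions: the `VmSnapshotState` type and its field names are hypothetical, and the values would have to be gathered from vSphere and the vCenter Server database by other means. It is not Veeam's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class VmSnapshotState:
    """Hypothetical summary of what the check inspects for one VM."""
    needs_consolidation: bool   # vSphere "Needs Consolidation" attribute (Yes -> True)
    registered_snapshots: int   # snapshots registered in vCenter
    delta_disks: int            # delta disks counted for the VM


def consolidation_failed(state: VmSnapshotState) -> bool:
    """Return True if either condition from the article holds."""
    if state.needs_consolidation:
        return True
    # A mismatch means some delta disk has no registered snapshot (or vice versa).
    return state.registered_snapshots != state.delta_disks
```

A state with the flag cleared and matching counts passes the check; anything else would cause the system task to be scheduled.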
The “hunter” then carries out forced consolidation and snapshot removal using the following algorithm (after each step, the check described above is repeated):
- First we try the standard tool, Consolidate (the same native VMware mechanism that runs when you select the Snapshot > Consolidate command for the virtual machine in the vSphere client).
- If that does not help, we perform a hard consolidation, that is, the pair of operations “create snapshot, then delete”. According to VMware, this should force the removal of all “invisible” snapshots regardless of their origin (visible snapshots, e.g. those created by a user, remain intact).
- Finally, we perform a hard consolidation with quiescence, that is, the pair of operations “create snapshot, then delete” carried out with “freezing” (this should have the same effect on “invisible” snapshots without touching visible ones).
If this algorithm does not produce the desired result the first time (for example, the check still shows Needs Consolidation = Yes), two more attempts are made at 4-hour intervals.
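The escalation and retry logic described above can be sketched as follows. The function and parameter names are hypothetical; the three steps, the check, and the waiting are passed in as callables so that the sketch stays independent of any real vSphere API.

```python
from typing import Callable, Sequence


def run_hunter(
    steps: Sequence[Callable[[], None]],
    still_needs_consolidation: Callable[[], bool],
    wait: Callable[[float], None],
    attempts: int = 3,
    interval_hours: float = 4.0,
) -> bool:
    """Run the steps in order, re-checking after each one.

    Returns True as soon as the check passes, or False when every
    attempt has been used up (the point at which the user is notified).
    """
    for attempt in range(attempts):
        if attempt > 0:
            wait(interval_hours * 3600)  # pause between attempts, in seconds
        for step in steps:
            step()
            if not still_needs_consolidation():
                return True
    return False
```

In practice `steps` would hold the three operations in order (Consolidate, create and delete a snapshot, the same with quiescence), and `wait` could simply be `time.sleep`.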
If after 12 hours the snapshot still cannot be removed correctly (after all three steps), Veeam Backup notifies the user about the stuck snapshot, since the problem most likely requires manual intervention. Specifically, if you have configured email notifications in the general settings (as described here), you will receive an email with the following content:
"VM virtual_machine_name needs snapshot consolidation, but all automatic snapshot consolidation attempts have failed. Most likely reason is a virtual disk being locked by some external process. Please troubleshoot the locking issue, and initiate snapshot consolidation manually in vSphere Client."
And what if you need to consolidate manually?
Once you have figured out what exactly is preventing the consolidation and removal of snapshots and have eliminated the root cause, it is recommended to follow the procedures prescribed by VMware.
What snapshots does it work for?
Snapshot Hunter runs for all backup and replication jobs (for replication, snapshots on the source side are monitored), both regular ones and those that use storage snapshots; it also works for vCloud Director backups and for VeeamZIP. Only snapshots created by these jobs in the course of Veeam Backup & Replication's work are hunted; snapshots created by users themselves, for example, are not affected.
The same mechanism is also used to find “invisible” snapshots left behind by jobs run under earlier versions of Veeam Backup & Replication.
Does Snapshot Hunter affect performance?
Snapshot Hunter treats infrastructure resources quite gently. For example, if a backup job includes several machines (or one machine with several virtual disks), snapshots for them are tracked and removed in parallel. If, however, the storage system turns out to be overloaded in terms of latency (that is, read-write intensity has reached the threshold), the “hunter” will not start consolidation until the load decreases.
Note also that the “hunter” respects the backup window, provided that a schedule is configured and enabled for the backup job (as described here). In that case, before performing consolidation, Snapshot Hunter checks whether the window reserved for backup has closed. If on any of the three attempts (including the first) it turns out that consolidation does not fit into the window, it is not started; instead, the user receives a notification that manual intervention is needed.
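The two gates described in this section (storage latency and the backup window) can be combined into one decision function. This is a simplified sketch: the names, the order of the two checks, and the latency threshold are assumptions, and "fits into the window" is approximated by checking whether the current time lies inside the window.

```python
from datetime import time
from enum import Enum


class Decision(Enum):
    RUN = "run"        # start consolidation now
    WAIT = "wait"      # storage is busy, try again later
    NOTIFY = "notify"  # outside the window, ask the user to act


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` lies in [start, end); handles windows crossing midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def decide(now: time, window_start: time, window_end: time,
           latency_ms: float, latency_limit_ms: float) -> Decision:
    if not in_window(now, window_start, window_end):
        return Decision.NOTIFY
    if latency_ms >= latency_limit_ms:
        return Decision.WAIT
    return Decision.RUN
```

For a window from 22:00 to 06:00, a call at 23:00 with low latency yields `Decision.RUN`, while a call at 07:00 yields `Decision.NOTIFY`.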
Where is its button?
Snapshot Hunter is always on by default, that is, you do not need to configure anything manually. If you need to disable it, set the value DisableAutoSnapshotConsolidation (DWORD) = 1 under the HKLM\SOFTWARE\Veeam\Veeam Backup and Replication registry key. However, if it then turns out that consolidation is needed, the user will receive a notification about the need to perform it manually, as described in the VMware article.
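For those who prefer to script the change, the registry value can be set with the standard Windows `reg add` command. The helper below (a hypothetical name) only builds the argument list and does not execute anything. Treat it as a sketch: it assumes the key path and value name are exactly as given in this article.

```python
from typing import List


def disable_hunter_command() -> List[str]:
    """Build a `reg add` command that sets DisableAutoSnapshotConsolidation to 1."""
    key = r"HKLM\SOFTWARE\Veeam\Veeam Backup and Replication"
    return [
        "reg", "add", key,
        "/v", "DisableAutoSnapshotConsolidation",  # value name
        "/t", "REG_DWORD",                         # value type
        "/d", "1",                                 # 1 disables Snapshot Hunter
        "/f",                                      # overwrite without prompting
    ]
```

On the backup server, the list could be passed to `subprocess.run(..., check=True)` from an elevated session.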
Additional Information: