markjramos

VM Snapshot & Retention Policy

Updated: Mar 18, 2023


What is a Snapshot and what files are generated by a snapshot?

A VMware snapshot preserves the state of the virtual machine's disk file (VMDK) at a given point in time. Snapshots provide a change log for the virtual disk and are used to restore a VM to a particular point in time when a failure or system error occurs. A snapshot captures the data on the virtual machine's disks as it existed at that moment. Taking a snapshot creates files with the extensions .vmdk, -delta.vmdk, .vmsd, and .vmsn, which are stored with the VM base files. The delta files are stored with the base VMDK file, which is kept in read-only mode to preserve its state. The VMSD and VMSN files are stored in the VM directory.


When you take a snapshot, the original VMDK file with the current disk state is preserved in read-only mode, and the guest OS can no longer make changes to it. Instead, a delta (child) disk is created to which the guest OS can write. It records the changes made since the snapshot was taken, so the current disk state is the base disk plus the delta. The base disk consists of a small descriptor file (for example vps.vmdk) and a flat file with the raw data (vps-flat.vmdk). The delta disk likewise has two files: a descriptor (for example vps-000001.vmdk) that points to its parent disk, and a data file (vps-000001-delta.vmdk) that holds the changed blocks. Information about the VM snapshots, like the relationships between snapshots and the child disks for each snapshot, is kept in the snapshot descriptor file (with extension .vmsd).


Optionally, you can take a memory snapshot, which also captures the memory state of the virtual machine. A memory snapshot includes a memory state file (with extension .vmsn) that holds the memory of the VM at the time of the snapshot capture. The size of the memory file and the time it takes to capture the memory state depend on the configured maximum memory for the original/parent VM.


Retention Policy & Best Practices

Follow these best practices when using VMware snapshots in the vSphere environment:

•Do not use VMware snapshots as backups.

The snapshot file is only a change log of the original virtual disk. It creates a placeholder disk, virtual_machine-00000x-delta.vmdk, to store data changes since the time the snapshot was created. If the base disks are deleted, the snapshot files are not sufficient to restore a virtual machine.

•A maximum of 32 snapshots is supported in a chain. However, for better performance use only 2 to 3 snapshots.

•Do not use a single snapshot for more than 72 hours.



Do not trust the Snapshot Manager!

The snapshot manager is a great utility, but it often has the wrong information in it. It can show that there are no snapshots on the VM when in reality the VM has snapshots. The cause is that the snapshot commit process removes the entry from the snapshot manager configuration file before it commits the snapshot. If the commit does not complete, the snapshot manager shows that there are no snapshots while the VM actually has many snapshots on it. The snapshot manager gets its information from a configuration file in the VM’s directory. This file has the extension .vmsd. It is the virtual machine snapshot descriptor file. Below we have an example of a .vmsd with a single snapshot in it.


# cat vps.vmsd
.encoding = "UTF-8"
snapshot.lastUID = "4"
snapshot.current = "2"
snapshot0.uid = "2"
snapshot0.filename = "vps-Snapshot2.vmsn"
snapshot0.displayName = "test1"
snapshot0.createTimeHigh = "310596"
snapshot0.createTimeLow = "938781285"
snapshot0.numDisks = "1"
snapshot0.disk0.fileName = "vps.vmdk"
snapshot0.disk0.node = "scsi0:0"
snapshot.numSnapshots = "1"


We can see that snapshot.numSnapshots = 1, so we only have a single snapshot. That snapshot is identified by the displayName, test1, and refers to vps-Snapshot2.vmsn. The vps-Snapshot2.vmsn file is the snapshot memory file. It actually contains a little more than just the memory. If you were to hexdump this file, you would see that it is a binary file that contains a copy of the VMX as it was at the time of the snapshot, and more. The file points to the delta disk files as well, so it contains everything needed to restore the VM to the state it was running in when the snapshot was taken. In this case we did not include the memory in the snapshot, so the file is small and does not contain the memory dump.

Back to the Snapshot Manager: we can see that the .vmsd file is really just a text file that contains snapshot information. If this file has a syntax error or was not updated properly, the snapshot manager will not have the correct information. It is always worth verifying whether the VM actually has snapshots.
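Since the .vmsd is plain text, it is easy to read from a script. The following is a minimal sketch, not a VMware tool: the function names (vmsd_get, vmsd_created_epoch, vmsd_summary) are made up for this example. It assumes the key = "value" layout shown above, that snapshots are numbered from snapshot0 upward, and that createTimeHigh and createTimeLow are the upper and lower 32 bits of a creation time in microseconds since the Unix epoch. For the sample file that assumption gives 1334000601 seconds, a time on 9 April 2012, which agrees with the file dates shown later in this post. It also assumes a shell with 64-bit arithmetic and a date command that supports +%s.

```shell
# Print the value of one key from a .vmsd file, without the quotes.
# $1 = key (for example snapshot0.displayName), $2 = path to the .vmsd file
vmsd_get() {
    awk -v key="$1" -F' = ' '$1 == key { gsub(/"/, "", $2); print $2; exit }' "$2"
}

# Print the creation time of snapshot N as seconds since the Unix epoch.
# Assumption: High/Low are the two 32-bit halves of a microsecond timestamp.
# $1 = snapshot index (0, 1, ...), $2 = path to the .vmsd file
vmsd_created_epoch() {
    high=$(vmsd_get "snapshot$1.createTimeHigh" "$2")
    low=$(vmsd_get "snapshot$1.createTimeLow" "$2")
    echo $(( (high * 4294967296 + low) / 1000000 ))
}

# Print the snapshot count, then one line per snapshot with its age in hours.
# Snapshots older than 72 hours are flagged, following the retention rule above.
# $1 = path to the .vmsd file
vmsd_summary() {
    count=$(vmsd_get snapshot.numSnapshots "$1")
    echo "snapshots recorded: ${count:-0}"
    i=0
    while [ "$i" -lt "${count:-0}" ]; do
        name=$(vmsd_get "snapshot$i.displayName" "$1")
        created=$(vmsd_created_epoch "$i" "$1")
        age_hours=$(( ($(date +%s) - created) / 3600 ))
        if [ "$age_hours" -gt 72 ]; then
            echo "snapshot$i: $name, age ${age_hours}h (older than 72 hours)"
        else
            echo "snapshot$i: $name, age ${age_hours}h"
        fi
        i=$((i + 1))
    done
}

# Usage: vmsd_summary vps.vmsd
```

Keep in mind that this only reports what the .vmsd says, so it inherits the same blind spot as the Snapshot Manager. It is a quick first look, not proof that the VM has no snapshots.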

Using the command line to see if a VM has snapshots

The best way to determine if a VM is running on a snapshot is to query the VMX file to see which disk it has attached. There may be snapshot files in the VM’s directory that are not actually attached to the VM, or there may be many different snapshot chains, but the VMX has the current chain. In the example below we can see that the VM is running off snapshot vps-000001.vmdk.

# grep vmdk vps.vmx

scsi0:0.fileName = "vps-000001.vmdk"


In the example below the VM is running off a base disk on a different datastore.

# grep vmdk vps.vmx

scsi0:0.fileName = "/vmfs/volumes/4e4043f8-160ce551-98d6-00221591be89/vps/vps.vmdk"
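The same check can be wrapped in a small script when there are many VMs to look at. This is a rough sketch with made-up function names (vmx_disks, is_snapshot_disk, report_vmx). It assumes disk entries in the VMX look like the scsi0:0.fileName lines above, and that snapshot descriptors follow the name-000001.vmdk pattern seen in this post. A disk that was renamed by hand would not be classified correctly.

```shell
# Print every VMDK path attached in a VMX file, one per line.
# $1 = path to the .vmx file
vmx_disks() {
    sed -n 's/^[a-z]*[0-9]*:[0-9]*\.fileName *= *"\(.*\.vmdk\)" *$/\1/p' "$1"
}

# Succeed when the file name ends in -NNNNNN.vmdk (six digits).
# Assumption: that naming pattern marks a snapshot descriptor.
# $1 = VMDK path or file name
is_snapshot_disk() {
    case "$(basename "$1")" in
        *-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk) return 0 ;;
        *) return 1 ;;
    esac
}

# Print each attached disk and whether it looks like a snapshot or a base disk.
# $1 = path to the .vmx file
report_vmx() {
    vmx_disks "$1" | while read -r disk; do
        if is_snapshot_disk "$disk"; then
            echo "$disk: snapshot disk"
        else
            echo "$disk: base disk"
        fi
    done
}

# Usage: report_vmx vps.vmx
```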


Old Snapshot files can be left around in the VM’s Directory

It is not uncommon to find VMs that have old snapshots lying around that are not actually attached. This happens for various reasons. Most often it happens when someone has manually modified the VMX or cloned the VM.

Before modifying anything, it is best to confirm that the snapshots are not actually in use. To confirm we go to the command line and see what disk the VM is running on.

# grep vmdk vps.vmx

scsi0:0.fileName = "vps.vmdk"


In this case it is running off of the base disk, so let’s look at the VMDKs in the VM’s directory.

# ls *.vmdk

vps-000001-delta.vmdk vps-000001.vmdk vps-flat.vmdk vps.vmdk


We can see above that there is indeed a snapshot in the directory that is not in use by this VM. Since this VM is turned on and using the base disk, this snapshot would be invalidated. Let’s confirm that it was pointing to the base disk, vps.vmdk.

# grep parentFileNameHint vps-000001.vmdk

parentFileNameHint="vps.vmdk"


Since the parentFileNameHint is vps.vmdk, we know that it was pointing to the base disk. Since the timestamp of vps-flat.vmdk is newer than that of vps-000001-delta.vmdk, the base disk has been written to after the snapshot was last used, so this snapshot is likely invalidated.


# ls -lrt *.vmdk
-rw------- 1 root root 332 Apr 9 20:01 vps-000001.vmdk
-rw------- 1 root root 33579008 Apr 9 22:08 vps-000001-delta.vmdk
-rw------- 1 root root 513 Apr 9 22:10 vps.vmdk
-rw------- 1 root root 10737418240 Apr 9 22:11 vps-flat.vmdk
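When a VM has more than one snapshot, only the top of the chain is named in the VMX, so a descriptor that is missing from the VMX can still be in use further down the chain. The sketch below follows parentFileNameHint from the disk in the VMX down to the base and lists any snapshot descriptors in the directory that are not part of that chain. The function names (parent_of, disk_chain, unused_snapshots) are made up for this example, and it assumes the descriptors are the small text files shown above and that relative hints are relative to the child descriptor's directory.

```shell
# Print the parentFileNameHint of a descriptor, or nothing for a base disk.
# $1 = path to a descriptor .vmdk
parent_of() {
    sed -n 's/^parentFileNameHint *= *"\(.*\)" *$/\1/p' "$1" | head -n 1
}

# Print the descriptor the VMX points at, then each parent down to the base.
# $1 = path to the descriptor named in the VMX
disk_chain() {
    disk=$1
    depth=0
    # Stop after 33 links: up to 32 snapshots plus the base disk.
    while [ -f "$disk" ] && [ "$depth" -lt 33 ]; do
        echo "$disk"
        parent=$(parent_of "$disk")
        [ -n "$parent" ] || break
        case "$parent" in
            /*) disk=$parent ;;                     # absolute hint
            *)  disk=$(dirname "$disk")/$parent ;;  # relative to the child
        esac
        depth=$((depth + 1))
    done
}

# List snapshot descriptors in a VM directory that are not in the live chain.
# $1 = VM directory, $2 = descriptor named in the VMX, given as a path under $1
unused_snapshots() {
    chain=$(disk_chain "$2")
    for f in "$1"/*-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk; do
        [ -f "$f" ] || continue
        echo "$chain" | grep -qxF "$f" || echo "$f"
    done
}

# Usage: unused_snapshots . ./vps.vmdk
```

The output is only a list of candidates. It is still worth checking the timestamps by hand, as above, before moving anything.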


We can move these files out of the way and remove them at a later time. The idea behind removing them later is for the sake of data integrity. If you remove a file that contains important data, it could cost thousands of dollars to get a data recovery company to recover it.


# mkdir deleteme
# mv vps-000001.vmdk vps-000001-delta.vmdk deleteme/

Delete All Snapshots

The “Delete All” option will commit the current state of the VM into the base disk and get rid of all of the snapshot files. This will clean up the snapshots and get the VM back on the base disk even if the snapshots are not showing up in Snapshot Manager.


If the “Delete All” operation has previously failed for any reason, or if there are many snapshots on the VM, it should not be considered a data safe operation. In this situation, you should assume that the backups may not be complete or working. It is worth taking the extra time to get a file level backup before doing a snapshot commit. The operation is not considered safe because it modifies the original files. There is no “undo button” for a snapshot commit, so make sure that the VM is in the right state before doing the “Delete All”. That said, I have not seen an issue where a customer has run into data loss with a “Delete All”.


If the VM has not had any problems doing a commit and the snapshots are not very large, a “Delete All” operation is the best choice. The downsides come when there are many snapshots or the snapshots are very large. When the snapshots are very large, it can take a long time for the host to read the snapshot data, calculate the current state of the data, and then write it back to the base disk one snapshot at a time. This is also very I/O intensive and has been seen to hang VMs. It is a better idea to commit large snapshots outside production hours to avoid downtime and responsiveness issues.


When using “Delete All” snapshots, VMware will commit the snapshots one by one into the base disk. This was changed from previous versions, where it would commit the snapshots from top to bottom. Committing the snapshots from top to bottom inflated the intermediate snapshots and took a large amount of space, which often caused customers to run out of space when committing snapshots.


Always Clone Snapshots!

Say that you were to find a VM that has snapshots on it. It could have been running off of these snapshots for a very long time, which means that you have a lot of data in the delta files. If you are using a VMDK level backup solution, you can assume that your backups are invalid as they would have been backing up the delta files.


Always clone the snapshots if there is any doubt about the backups or the integrity of the data. Cloning will not modify the original snapshots, so we can always go back to the original files if we encounter problems. It is considered a data safe procedure for data integrity.


Another benefit is that cloning the snapshots is faster than committing when the snapshots are large. Cloning reads through the snapshots, determines the current state of the disk, and then writes that to a new file. This means that the maximum amount of data written is the size of the base disk. Deleting snapshots can write much more.


VMDK level clones

Clones can be taken from the VM level and from the VMDK level. The VMDK level clones offer better flexibility in terms of what snapshot to clone from along with where it is going. The downside is that the VM should be off for the process.


When cloning a VMDK, the new VMDK will not have any snapshots, but it will contain all of the data of the last snapshot that it was cloned from. The benefits of VMDK level clones are listed below.

•A cloned VMDK does not have any snapshots

•Flexible

•Can clone one VMDK at a time

•Can clone from different snapshot levels

•Can be canceled safely


So how do we clone a VMDK? We do this from the command line. First determine which snapshot you want to clone, then find a location for the clone. Once you know where you can put the new clone, you can use the vmkfstools command to clone the VMDK.


The syntax for the clone is shown below.

This example will clone the current snapshot to the same folder with a new name. The disk will be in ZeroedThick format.

# grep vmdk vps.vmx

scsi0:0.fileName = "vps-000001.vmdk"


# vmkfstools -i vps-000001.vmdk vps.new.vmdk
Destination disk format: VMFS zeroedthick
Cloning disk 'vps-000001.vmdk'...
Clone: 100% done.


The example below will do the same thing, but make the destination disk thin provisioned.

# grep vmdk vps.vmx

scsi0:0.fileName = "vps-000001.vmdk"

# vmkfstools -d thin -i vps-000001.vmdk vps.new.vmdk
Destination disk format: VMFS thin-provisioned
Cloning disk 'vps-000001.vmdk'...
Clone: 100% done.


The example below will clone the snapshot to another datastore using thin.


# vmkfstools -d thin -i vps-000001.vmdk /vmfs/volumes/iscsi_dev/vps/vps.vmdk
Destination disk format: VMFS thin-provisioned
Cloning disk 'vps-000001.vmdk'...
Clone: 100% done.
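To avoid typing the wrong source disk, the grep and the clone can be tied together. The sketch below only prints the vmkfstools command it would run. It does not execute anything, so it can be reviewed first. The function name clone_cmd is made up for this example, it only looks at scsi0:0 as in the examples above, and it uses just the -i and -d options already shown.

```shell
# Print the vmkfstools command that would clone the disk attached at scsi0:0.
# Nothing is executed; copy the printed line once it looks right.
# $1 = path to the .vmx file
# $2 = destination VMDK path
# $3 = optional destination disk format (for example thin)
clone_cmd() {
    src=$(sed -n 's/^scsi0:0\.fileName *= *"\(.*\)" *$/\1/p' "$1" | head -n 1)
    if [ -z "$src" ]; then
        echo "no scsi0:0 disk found in $1" >&2
        return 1
    fi
    # Refuse to print a command that would target an existing file.
    if [ -e "$2" ]; then
        echo "destination $2 already exists" >&2
        return 1
    fi
    if [ -n "$3" ]; then
        echo "vmkfstools -d $3 -i \"$src\" \"$2\""
    else
        echo "vmkfstools -i \"$src\" \"$2\""
    fi
}

# Usage: clone_cmd vps.vmx vps.new.vmdk thin
```

Because the source is read from the VMX, the clone always starts from the disk the VM is actually running on, not from whatever the Snapshot Manager shows.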
