Over the years various backup technologies have been developed in an attempt to minimize the amount of space required to store backup files, and to reduce the bandwidth required to transfer those files to remote locations. When faced with the different backup methods that many programs offer, it is easy to become confused, since the terminology used is often not very clear, and it is hard to know the benefits or drawbacks of any one technology. This article is meant to be a simple guide to help cut down on the frustration that many experience when they don’t know what certain terms mean, and how different options are best used.
Note: This is not, by far, an exhaustive glossary of backup terms. If you have questions about any terms that are not covered below, please feel welcome to ask in the comment section, and we will attempt to answer them for you.
Common Backup Methods
- Full Backups
- Differential Backups
- Incremental Backups
- Delta Block-level Backups
- Mirror Backups (Simple Copy)
Other Backup Methods and Techniques
A full backup is just what it sounds like: a complete backup of all the data that a user selects when configuring a backup job. The copied files are usually placed into a single archive file and compressed to help save space. Every time another full backup is made, all the files in the source are once again copied into an archive. The problem is that often there are only a few new or changed files, and continuously making full backups will end up copying a lot of extra files that don’t really need to be backed up again. This uses a lot of extra storage and wastes time. You can of course delete older backups to free up space, but the time is still lost. The extra wear on hard disks and the amount of bandwidth used to make frequent full backups must be considered too.
It is a much better idea to make a full backup once in a while, and then figure out a way to only copy the new or changed files on a more frequent basis. Several different methods, described below, have been created to implement this very thing.
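As a rough illustration of what a full backup does, the sketch below copies an entire folder into a single compressed archive using Python's standard `tarfile` module. The function name `make_full_backup` is made up for this example; real backup programs use their own archive formats and add features such as cataloging and verification.

```python
import tarfile
from pathlib import Path


def make_full_backup(source_dir: str, archive_path: str) -> list[str]:
    """Copy every file under source_dir into one gzip-compressed tar archive.

    Returns the names of the regular files stored in the archive.
    Assumption: archive_path is located outside source_dir.
    """
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps the paths inside the archive relative to the folder name.
        archive.add(str(source), arcname=source.name)
    # Re-open the archive to list what was actually stored.
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Running this again tomorrow would copy every file a second time, changed or not, which is exactly the cost described above.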
Benefits and Disadvantages of Full Backups
- Faster restore of all files -- When a full restore is necessary, full backups are quick because you are only dealing with one archive file.
- Full backups are large and time consuming to make -- They are not well suited for regular backups such as those performed hourly or daily.
After a full backup archive has been created, a differential backup helps to reduce the size of subsequent backups by doing a “differential” comparison of the current files and the last full backup. All new and modified files are copied to an archive alongside the full backup. The important thing to understand is that differential backups are cumulative. Each differential backup backs up everything that is different since the last full backup, even if those files are already included in a previous differential. Since differentials back up only new or changed files, they are a faster backup method than creating a full backup each time. Differential backups are well suited for daily or less frequent backup strategies.
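The selection rule for a differential can be sketched in a few lines. This example assumes that changes are detected by comparing modification times against the time of the last full backup; actual programs may rely on archive bits, checksums, or catalogs instead. The file names and times are invented for illustration.

```python
def select_differential(current_mtimes: dict[str, float],
                        last_full_backup_time: float) -> list[str]:
    """Return files created or modified after the last full backup.

    The reference point stays fixed until a new full backup is made, which is
    why each differential repeats what earlier differentials already held.
    """
    return sorted(path for path, mtime in current_mtimes.items()
                  if mtime > last_full_backup_time)


# Hypothetical data: modification times in days relative to the full backup.
full_backup_time = 0.0
day1 = {"report.doc": -3.0, "budget.xls": 0.5, "notes.txt": -1.0}
day2 = {"report.doc": 1.5, "budget.xls": 0.5, "notes.txt": -1.0}

print(select_differential(day1, full_backup_time))  # budget.xls only
print(select_differential(day2, full_backup_time))  # budget.xls again, plus report.doc
```

Note that `budget.xls` is selected on both days even though it did not change in between. That repetition is the cumulative behavior described above.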
Benefits and Disadvantages of Differential Backups
- Faster to restore than some other methods -- To do a full restore of all backup files, you only need the full backup and the last differential backup.
- Differential Backups are more demanding on storage than some of the other backup methods, because of data redundancy.
- Each subsequent differential grows significantly until it becomes necessary to create a new full backup. Then the process starts over.
An incremental backup works similarly to a differential backup, but with one important difference that addresses the high level of data redundancy in differentials. Each incremental contains only the files that were created or modified since the last backup of any kind, whether that was the full backup or the previous incremental. Incrementals contain far less redundant data than differentials, but some redundancy remains: a file that was already backed up and is then modified again will be copied again in its entirety. Incremental backups are a good solution for more frequent backups such as those performed on an hourly basis.
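To make the contrast with differentials concrete, the following sketch simulates three backup runs after a full backup and lists what each method would copy on each run. The function names and the sample file names are illustrative only.

```python
def incremental_sets(changes_per_run):
    """Each incremental holds only what changed since the previous backup."""
    return [set(changed) for changed in changes_per_run]


def differential_sets(changes_per_run):
    """Each differential holds everything changed since the full backup."""
    seen, result = set(), []
    for changed in changes_per_run:
        seen |= set(changed)
        result.append(set(seen))
    return result


# Hypothetical history: files that changed before each of three backup runs.
changes = [{"a.doc"}, {"b.xls"}, {"a.doc", "c.txt"}]

incr_total = sum(len(s) for s in incremental_sets(changes))
diff_total = sum(len(s) for s in differential_sets(changes))
print(incr_total, diff_total)  # 4 file copies versus 6
```

In this small history the incrementals copy four files in total and the differentials copy six. The file `a.doc` still appears in two incrementals because it was modified twice.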
Benefits and Disadvantages of Incremental Backups
- Incremental backups can be completed more quickly than differential backups because there is less redundant data being copied.
- Incremental backups are smaller than differential backups.
- The number of successive incrementals that can be made between full backups, while still remaining manageable, is much greater than with differentials.
- Incremental backups may take considerably longer to do complete restores than differential backups because all the individual archives must be merged together one by one with the full backup.
- It should be noted that a restore from an incremental backup may fail if one of the sequential backups is lost or damaged, although everything up to the damaged backup should still be recoverable.
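The restore behavior of the two methods can be sketched as a small function that works out which archives are needed. This is a simplified model: it assumes a schedule uses either differentials or incrementals (not both) and that the full backup itself is intact. The names `restore_chain`, `"diff"`, and `"incr"` are inventions of this example.

```python
def restore_chain(backups, damaged=frozenset()):
    """Return the archive names needed to restore the newest recoverable state.

    backups: chronological list of (name, kind) with kind in
    {"full", "diff", "incr"}.
    """
    # Start from the most recent full backup.
    start = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    chain = [backups[start][0]]
    last_diff = None
    for name, kind in backups[start + 1:]:
        if kind == "incr":
            if name in damaged:
                break             # later incrementals depend on this one
            chain.append(name)
        elif kind == "diff" and name not in damaged:
            last_diff = name      # only the newest intact differential matters
    if last_diff is not None:
        chain.append(last_diff)
    return chain


incr_schedule = [("full", "full"), ("i1", "incr"), ("i2", "incr"), ("i3", "incr")]
diff_schedule = [("full", "full"), ("d1", "diff"), ("d2", "diff"), ("d3", "diff")]

print(restore_chain(incr_schedule))                  # full plus every incremental
print(restore_chain(incr_schedule, damaged={"i2"}))  # stops before the damaged one
print(restore_chain(diff_schedule))                  # full plus the last differential
```

A differential restore never needs more than two archives, while an incremental restore needs the whole chain and can only go as far as the last undamaged link.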
The term “delta” is often used rather flexibly in reference to different backup technologies, but when paired with other terms, as in “Delta Backup,” “Delta Block Backup,” and “Delta-Style Backup,” it generally refers to the same basic backup method. Deltas are best described as block-level technology, whereas incrementals and differentials are file-level technologies. It is important to note that delta block techniques are only applied to modified files, not new files. New files are of course just backed up in the normal fashion.
File-level backups will back up a changed file in its entirety, even if it has only changed slightly. While this may not be much of a problem for small text documents, it can quickly become a problem with very large files like databases. Take, for example, email clients like Outlook, which save all received email and attachments in single-file databases. Even if only one email has been received, the entire database file has changed and is backed up again. Since these databases can easily grow to several hundred megabytes in size, you once again end up with a lot of data redundancy.
Delta backups deal with this problem by backing up only the parts of files that have changed instead of the whole file. Each changed file is broken down into fixed-size blocks, and those blocks are compared with the original file. (The block size depends on the particular program, or perhaps on a user-chosen setting. Block sizes generally range between 1 and 32 kilobytes.) Only those blocks that contain differences are extracted and backed up. Deltas can be confusing because they can be applied in a couple of different ways. There are differential deltas and incremental deltas. These work on the same principle as the differential and incremental file backups explained above, but at a much more granular level. Similarly, each type of delta inherits the same advantages and disadvantages.
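The block comparison described above might look something like the following sketch. It assumes a 4 KB block size and compares blocks by position using SHA-256 hashes; the function names and the delta layout are made up for this example, and commercial products will differ in the details.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed value within the 1 to 32 KB range mentioned above


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_delta(old: bytes, new: bytes, size: int = BLOCK_SIZE) -> dict:
    """Record only the blocks of `new` that differ from the same block of `old`."""
    old_hashes = [hashlib.sha256(b).digest() for b in split_blocks(old, size)]
    changed = {}
    for index, block in enumerate(split_blocks(new, size)):
        is_new_block = index >= len(old_hashes)
        if is_new_block or hashlib.sha256(block).digest() != old_hashes[index]:
            changed[index] = block
    return {"length": len(new), "block_size": size, "blocks": changed}


def apply_delta(old: bytes, delta: dict) -> bytes:
    """Rebuild the new file from the old file plus the stored changed blocks."""
    blocks = split_blocks(old, delta["block_size"])
    for index, block in delta["blocks"].items():
        if index < len(blocks):
            blocks[index] = block   # overwrite a changed block
        else:
            blocks.append(block)    # the file grew past its old length
    # Trim in case the new file is shorter than the old one.
    return b"".join(blocks)[:delta["length"]]
```

One limitation of this positional approach is worth noting: if a byte is inserted near the start of a file, every later block shifts and is flagged as changed, so the delta can end up nearly as large as the file.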
Deltas are especially advantageous in technologies where files are backed up immediately after they are created or modified. This is known as real-time backup or continuous data protection. Deltas are also very beneficial when used to back up files over networks with limited bandwidth or to remote servers such as online storage.
Benefits and Disadvantages of Delta Style Backups
- Delta Backups are extremely fast because of the small amount of data being transferred.
- Deltas produce much less redundancy, and the resulting backups are a fraction of the size of those produced by incremental or differential backups. This dramatically reduces the demands on storage and bandwidth.
- Deltas of modified files do not produce whole files in the backup, and thus restores absolutely depend on the program that created them to do the restoration.
- Deltas are slower to restore because the individual files must be reconstructed from their various parts.
Binary patch technology was originally developed as a way for software developers to easily update their programs on customers’ computers over the Internet by sending “patches” that would replace the parts of files that needed modification. Recently it has started to be adapted into backup technologies as well. The most relevant example is a backup technology called FastBit™, which is employed by a number of online storage vendors.
Binary patch backups work very similarly to deltas, the primary difference being that they are even more granular. Deltas work at the block level, while binary patches work at the, well, binary level. Because deltas back up only the modified parts of files in fixed-size blocks, part of each block may contain some unchanged data. Binary patches avoid this by copying only the actual bytes that have changed.
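To give a feel for byte-level patching, here is a toy sketch built on `difflib.SequenceMatcher` from the Python standard library. It is not the algorithm used by FastBit or any other product, whose internals are not described here; the patch format (`"copy"` and `"data"` entries) is invented for this example, and the approach is only practical for small inputs.

```python
from difflib import SequenceMatcher


def make_patch(old: bytes, new: bytes) -> list[tuple]:
    """Describe `new` as copies of byte ranges from `old` plus literal new bytes."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    patch = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            patch.append(("copy", i1, i2))       # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            patch.append(("data", new[j1:j2]))   # bytes that exist only in new
        # "delete" needs no entry: those old bytes are simply never copied
    return patch


def apply_patch(old: bytes, patch: list[tuple]) -> bytes:
    """Rebuild the new file from the old file and the patch."""
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old_version = b"The quick brown fox jumps over the lazy dog"
new_version = b"The quick red fox jumps over the very lazy dog"
patch = make_patch(old_version, new_version)
literal_bytes = sum(len(op[1]) for op in patch if op[0] == "data")
print(literal_bytes, "of", len(new_version), "bytes had to be stored")
```

Only the `"data"` entries carry file content; the `"copy"` entries are just pairs of offsets, which is why a patch can be smaller than a set of changed blocks.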
Benefits and Disadvantages of Binary Patch Backups
Note: Due to the very limited application of binary patching technology in actual backup software, as well as very sparse information on the subject, the author is very uncertain about the benefits and/or limitations that may be inherent to the technique.
- Virtually eliminates all data redundancy, and produces the smallest backups possible with current technologies.
- It is even less bandwidth intensive than deltas.
- The production of the actual patch may be more demanding on system resources and more time consuming than deltas, although the loss may be regained in bandwidth and transfer costs.
- No information is available about how file reconstruction is handled or how efficient it is.
Most backup programs list mirror backups as an alternative to full, differential, or incremental backups. Some programs use an alternate term for mirrors, such as “simple copy.” Mirror backups are basically the simplest type of backup. There are no real backup technologies being employed when making a mirror-style backup, only copy technology. If you copy and paste a folder from one drive to another, you have created a mirror backup of that folder. The mirrored files generally exist in the same state they did in the source, not compressed into archives as with a full backup. (Some programs do, however, support compressing each file individually and adding encryption.)
When to Use Mirrored Backups
Mirror-style backups without compression are good to use when you are backing up a lot of files that already have compression applied to them. For example, music files in mp3 or wma format, images in jpg or png format, videos in DivX, mov, or flv format, and most program install or setup files are already compressed. If you include these files in a normal backup that applies compression, you will often notice that it is very slow and that you gain very little extra compression by doing so. It is best to set up separate backup jobs for compressed and non-compressed files. If your backup program supports include and exclude filters, they can be used to automatically select or deselect the compressed files.
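The idea of splitting files into two jobs with an extension filter can be sketched as follows. The extension list is only a sample based on the formats named above, and the function name `split_jobs` is hypothetical; a real backup program would expose this through its own include and exclude filter settings.

```python
from pathlib import PurePath

# Sample list of already-compressed formats (illustrative, not exhaustive).
COMPRESSED_EXTENSIONS = {".mp3", ".wma", ".jpg", ".png", ".mov", ".flv"}


def split_jobs(paths):
    """Return (mirror_job, compressed_archive_job) as two lists of paths."""
    mirror_job, archive_job = [], []
    for path in paths:
        if PurePath(path).suffix.lower() in COMPRESSED_EXTENSIONS:
            mirror_job.append(path)    # already compressed: just copy it
        else:
            archive_job.append(path)   # compressible: worth archiving
    return mirror_job, archive_job


mirror, archive = split_jobs(["song.MP3", "notes.txt", "photo.jpg", "mail.pst"])
print(mirror)   # files for the uncompressed mirror job
print(archive)  # files for the compressed backup job
```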
Benefits and Disadvantages of Mirror Backups
- Mirror backups are much faster when working with compressed files.
- Because mirrored files are not placed in single archive files there is less concern about corruption.
- Since mirror backups generally don’t use compression, they can require large amounts of storage space unless other techniques such as hard linking are also employed.
Synthetic Full Backup is a term you will see from time to time. It should be understood that it is not a backup method like those above, but rather a technology that may be applied to one of the above methods to make full restores more efficient and require less downtime.
Synthetics are generally only applied in client-server backup systems. A client computer may perform a backup by any method (incremental, delta, etc.) and then transfer that backup to a server. At some point the server combines several of the individual backup archives to form a synthetic full backup. Because of this, after the initial full backup, the client machine only needs to perform backups of new or modified files; another full backup will never be necessary.
The benefits of this approach are twofold. First, the backup speed of technologies like differentials won’t degrade over time because of the growing size of cumulative archives since a synthetic will be made on a regular basis. Secondly, when a full restore needs to be made on a client machine, no reconstruction of files or file parts needs to be done. The reconstruction has already been performed by the server allowing the client machine the fastest possible recovery time.
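The merging step performed by the server can be pictured with the simplified model below. Each backup is represented as a dictionary mapping a file path to its content, and a value of `None` is used to mark a file that was deleted on the client. Both conventions are assumptions of this sketch, not a description of any particular product.

```python
def synthesize_full(full: dict, increments: list[dict]) -> dict:
    """Merge a full backup with later incrementals into one up-to-date full.

    increments must be ordered oldest first so that newer versions win.
    """
    synthetic = dict(full)                 # leave the original full untouched
    for increment in increments:
        for path, content in increment.items():
            if content is None:
                synthetic.pop(path, None)  # file was deleted on the client
            else:
                synthetic[path] = content  # new or modified file
    return synthetic


# Hypothetical backups received from a client.
full_backup = {"a.doc": "v1", "b.xls": "v1"}
incrementals = [{"a.doc": "v2"}, {"c.txt": "v1", "b.xls": None}]
print(synthesize_full(full_backup, incrementals))
```

Because the server does this work ahead of time, a restore only has to read one already-assembled backup.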
Some backup software has the ability to employ multiple hard links to preserve space when you wish to save multiple full mirror style backups of the same set of files.
To understand what a hard link is, consider how files are stored on a hard drive. When you save a document file, the physical data can be written anywhere on the disk. The file system then makes a reference, or hard link, to that physical data with the file name you specify. With some file systems it is possible to create more than one reference to that physical data. Using multiple hard links, it is possible to assign any number of file names in different folders to the same physical data.
When using backup programs that support creating hard links to make several backups of the same files, the program will build hard links for all the files that have not changed. For example, if you create two copies of a folder that contains 100MB of data, they normally would end up using 200MB of space. With hard links they would only use 100MB of space. If you changed one 2MB file before you make the second copy using hard links, the two folders would consume 102MB of space.[1] The first folder would contain the original 2MB file while the second would contain the modified one.
It should be mentioned that if you decide to delete one of the backups containing hard links, it is not a problem, as all the other hard links will be unaffected. The physical file on the disk is only deleted when all the hard links to it are removed. Also, hard links can only exist within the same volume (i.e., they cannot span different partitions or drives). On Windows-based file systems, NTFS supports hard links, while FAT does not.
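A hard-link-based mirror can be sketched with Python's `os.link`, which creates an additional name for existing file data. The function below is a minimal illustration, assuming that the previous and new snapshot folders live on the same volume and that files are compared by content; the name `snapshot_with_hard_links` is made up for this example.

```python
import filecmp
import os
import shutil


def snapshot_with_hard_links(source: str, previous: str, target: str) -> None:
    """Create `target` as a full mirror of `source`, hard-linking every file
    that is identical to the copy already stored in the `previous` snapshot."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            old = os.path.normpath(os.path.join(previous, rel, name))
            new = os.path.normpath(os.path.join(target, rel, name))
            if os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)        # unchanged: a second name, same data
            else:
                shutil.copy2(src, new)   # new or modified: a real copy
```

Every snapshot folder looks like a complete copy when browsed, yet only new or changed files take up additional space. Deleting an older snapshot removes only its names, so files shared with newer snapshots remain available.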
1. Windows Explorer does not report file space as one would expect when using hard links. If a 100MB file has two hard links, both links will be reported as consuming 100MB of space, for a total of 200MB used. However, the space saved by the hard links is reflected in the amount of free space on the drive: only 100MB will have been consumed.