Configuring Amanda For Parallel Backups
Abstract
In today's computing environment, running backups in parallel can provide several benefits, including a shorter backup window and increased throughput to the backup media. Understanding how to control your backup system's capability to run parallel backups is important if you want to optimize your backup environment. Amanda allows you to back up client systems in parallel. This short article describes some of the parameters that control parallel backups, and how they might be used to optimize your backups.
Audience
This paper is intended for people who are relatively new to Amanda planning and configuration. However, we assume you are familiar with the Amanda configuration file amanda.conf and with the format of the disklist file. If you are unfamiliar with either of these, refer to the amanda.conf and amanda man pages (see http://wiki.zmanda.com/index.php/Man_pages). We also assume that you have been introduced to the concept of holding disks.
The illustration below shows the potential benefit of running the backups in parallel: the total time for the backup can be less than when the same backups are run sequentially.
Amanda uses several parameters and settings to determine the number of parallel backups. In summary, they are:

- The amount of space available on the holding disk(s).
- The amount of estimated network bandwidth that will be consumed by each backup.
- The Parallel Backups setting in the Zmanda Management Console, which corresponds to the inparallel setting in the amanda.conf file. This parameter limits the total number of parallel backups dispatched by the Amanda server. The default is 10.
- The Parallel Backups (Clients) setting in the Zmanda Management Console, which corresponds to the maxdumps setting for a given dumptype. This parameter limits the number of parallel backups dispatched for a single client.
We will examine each one separately, and include several examples of how they interact.
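In amanda.conf terms, the last two of these limits look like this. The values shown are the default or illustrative, and the dumptype name is hypothetical:

```
# amanda.conf -- server-wide cap on simultaneous backups
inparallel 10        # never dispatch more than 10 dumps at once (the default)

# per-client cap, set inside a dumptype
define dumptype example-clients {      # hypothetical dumptype name
    comment "allow up to two simultaneous dumps from any one client"
    maxdumps 2
}
```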
[Figure: file system sizes used in the example. /filesystem001: 100 GB, /filesystem02: 5 GB, /filesystem03: 40 GB, /filesystem04: 25 GB, /filesystem05: 30 GB (200 GB in total).]
What about the remaining file systems? Since we have defined the backups to proceed from the smallest to the largest, /filesystem03 (40 GB) will be written to tape while the three file systems above are written to the holding disk.
Amanda knows that /filesystem001 will never fit on the holding disk. Since it is the largest backup, it will be dispatched last, and will go straight to tape. By increasing the holding disk to 100 GB we could achieve simultaneous backups of all 5 file systems: four would be dumped to the holding disk, while the largest is dumped to tape. The four being written to the holding disk will be flushed to the tape when the largest dump has completed. If we increase the holding disk size to 200 GB, then there would be enough room to dump all 5 file systems to the holding disk at the same time.
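The holding disk sizes discussed above are defined in amanda.conf; a sketch, with an illustrative name and directory path:

```
# amanda.conf -- holding disk definition
holdingdisk hd1 {
    comment "main holding area"
    directory "/dumps/amanda"    # illustrative path
    use 100 Gb                   # space Amanda may use; 200 Gb would hold all five dumps
}
```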
The above interface definitions are straightforward: when the first interface is used, Amanda will allow up to 500 kilobytes per second of backup data to be dispatched; the second will allow up to 1000 kilobytes per second. But how does Amanda know which interface a backup will use?
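For reference, interface definitions of this kind are written in amanda.conf as follows. The name fromchicago appears elsewhere in this discussion; the second interface's name and both comments are illustrative:

```
# amanda.conf -- network interface definitions with bandwidth caps
define interface fromchicago {
    comment "WAN link from the Chicago office"
    use 500 kbps
}

define interface local {
    comment "local network"
    use 1000 kbps
}
```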
How does this alter which backups Amanda will dispatch? Based on the order of dumping (smallest to largest), Amanda would like to dispatch the backups for /filesystem02 and /filesystem04 first. However, both of these are defined to use fromchicago, which allows only 500 Kbps of backup traffic to be dispatched. /filesystem02 is estimated to run at 400 Kbps, so it can be dispatched. /filesystem04, however, cannot be dispatched, since it is estimated to need 500 Kbps, which would push the total for fromchicago over its limit.
The same process is repeated for the remaining backups in the list, which are specified to use the local interface. Up to 1500 Kbps of backups will be dispatched.
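Amanda learns which interface a backup uses from the dumptype assigned to that disk in the disklist. A sketch, with a hypothetical host name and dumptype name:

```
# amanda.conf -- a dumptype that charges its traffic against fromchicago
define dumptype comp-chicago {
    comment "clients reached over the Chicago link"
    interface fromchicago
}

# disklist -- one line per disk: host, disk, dumptype
chicago-host  /filesystem02  comp-chicago
chicago-host  /filesystem04  comp-chicago
```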
Let's assume that you have enough holding disk space and network bandwidth to back up all six file systems in parallel. To make this happen, you will need to change the Parallel Backups (Clients) parameter in the Zmanda Management Console to 3, since there are three file systems on each client. Note that running these dumps in parallel is beneficial only if the three file systems are on different spindles. If the three file systems are on the same spindle, dump performance can suffer due to disk head thrashing.
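Setting Parallel Backups (Clients) to 3 in the Zmanda Management Console is equivalent to setting maxdumps in the dumptype used by these clients; a sketch, with a hypothetical dumptype name:

```
# amanda.conf -- dump all three file systems on a client simultaneously
define dumptype three-per-client {
    comment "three file systems per client, ideally on separate spindles"
    maxdumps 3
}
```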
In the first test, no holding disk was used, so no parallel backups were performed; it took 498 seconds to complete the backup of both clients. In the second test, a holding disk was used, and the network bandwidth settings were set to allow parallel backups. It took only 352 seconds to complete the backup of both clients, a savings of approximately 30%. An interesting note is the amount of time it took the clients to dump to the holding disk: 184 seconds. From the client systems' perspective this represents significant savings, since once this step is complete, no further resource demands are made of the client systems.
Conclusion
As you can see, Amanda gives you great flexibility, allowing you to specify how parallel backups are to be controlled:

- Holding disks
- Network bandwidth
- The global inparallel parameter
- The maxdumps parameter
By using these parameters you can tune your configuration to complete backups in the shortest possible backup window. You should carefully monitor your backups to make sure you are achieving the degree of parallelism that you need.