File Protection - Using Rsync Whitepaper
Contents
1. Introduction
Documentation
Licensing
Terminology
Adding Rsync backups to your backup strategy is an excellent way to insure yourself against data loss.
Critical files can be copied to a secure, offsite location, away from your office, and backing up across
the internet overcomes the need to swap tapes or hard drives. Once you've selected the host where
your data will be stored, no further equipment or maintenance is required. Additional storage space
can be easily added to the data host as your data requirements grow, so you don't have to worry
about purchasing replacement hardware. Best of all, your critical files are available whenever you need
them and can be accessed from wherever you are, using BackupAssist.
Documentation
This whitepaper provides a comprehensive guide to File Protection using Rsync and can be used in
conjunction with other BackupAssist guides.
For information on BackupAssist File Protection, see the BackupAssist File Protection Whitepaper.
For information on the BackupAssist Backup tab, see the BackupAssist Backup Tab Whitepaper.
For information on the BackupAssist Restore tab, see the BackupAssist Restore Tab Whitepaper.
For information on the BackupAssist Recover tab, see the BackupAssist Recover Tab Whitepaper.
Licensing
File Protection is a standard feature included with the BackupAssist license. Backing up data across the
internet with Rsync requires the Rsync Add-on license once the initial trial period has expired. Please
contact your local BackupAssist reseller or distributor for pricing information, or visit
www.BackupAssist.com.
For instructions on how to activate / deactivate license keys, visit our Licensing BackupAssist page.
Terminology
To avoid confusion about the use of the words “client”, “server”, “Windows Server” and “Rsync
Server”, we will use the following terms:
Data Host: The remote machine that will be used as your backup destination.
Rsync Server: The same as the data host, but specifically referring to the machine running Rsync that
accepts incoming connections and data from Rsync clients.
Rsync Client: A machine that contains your working data (typically a file server) that has BackupAssist
installed. BackupAssist comes packaged with the Rsync libraries necessary to transfer data to the Rsync
Server during a backup.
Rsync uses a checksum method to transfer only the parts of a file that have changed. Rsync first checks
whether a file has changed by comparing its size and modification date. If nothing has changed, Rsync
will not transfer any data, saving time and bandwidth. If a file does not match, Rsync applies a method
called a rolling checksum to the changed file to find where it has been altered or appended. It will
then transfer only the altered or appended data within the file.
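The rolling checksum idea can be sketched in a few lines of Python. This is only an illustration of the general technique, assuming a simple two-part weak checksum; the function names and the modulus are hypothetical choices for this sketch and are not taken from BackupAssist or from the Rsync source code. A real Rsync transfer also confirms candidate matches with a strong hash, whereas this sketch simply compares the bytes directly.

```python
M = 1 << 16  # modulus for both checksum halves (an illustrative choice)


def weak_checksum(block: bytes) -> tuple:
    """Compute the two-part weak checksum (a, b) of a block from scratch."""
    a = sum(block) % M
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple:
    """Slide the window one byte to the right without rescanning the block."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def find_block(data: bytes, block: bytes) -> int:
    """Return the first offset where block occurs in data, or -1 if absent."""
    n = len(block)
    if n == 0 or len(data) < n:
        return -1
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    for offset in range(len(data) - n + 1):
        # The cheap checksum filters candidates; the byte comparison confirms.
        if (a, b) == target and data[offset:offset + n] == block:
            return offset
        if offset + n < len(data):
            a, b = roll(a, b, data[offset], data[offset + n], n)
    return -1
```

Because each step of the window costs a constant amount of work, a known block can be located even after data has been inserted in front of it, which is why shifted data does not need to be sent again.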
Rsync can cater for data that has been inserted, added, removed and shifted, with a minimum transfer
overhead. In real terms, that means more efficient use of your bandwidth and data allowances. As
Rsync will only transfer data that has changed (and knows when file alterations or movements have
occurred), your Internet based backups will take a lot less time when compared to other methods such as
FTP.
Implementation
To help better understand how Rsync transfers work we will take a look at a hypothetical three day
backup scenario.
The scenario examines three different backup methods: Rsync, FTP and incremental drive imaging.
Data transferred:
Rsync: ~2 GB (2:1 compression)
FTP: ~4 GB
Incremental drive image: ~4 GB
Looking at this first backup we see that for the initial data transfer there is a 100% transfer for both
incremental drive imaging and FTP. Thanks to Rsync's packet compression we see a 50% reduction
in the initial transfer. Depending on your Rsync server's setup this initial overhead can be removed by
seeding your backup server locally, a method we will discuss later in this paper.
We can see that both FTP and incremental drive imaging perform a full backup of the file. Rsync only
backs up the changed data within the file, and compresses the sent data, resulting in a 50 MB transfer.
Day 3: This day no data has been added, but data has been shifted within the file.
Rsync is able to recognize that the data is already on the backup server and will reorganize the file with
a minimal instruction file. Incremental drive imaging is also aware that the data was moved; however, it
must back up the moved data again because this section no longer matches the data source. FTP once
again has to do a full backup of the source data.
Summary
As demonstrated in this example, Rsync delivers substantial performance gains. Because it can
check what data is still the same, then append, remove or modify data as necessary to match the local
source, it can greatly reduce backup overhead.
For more information on how to get the most out of Rsync, visit our Video Presentations page.
Amazon account
In your Amazon Web Services account, you will need to obtain your Access Key ID and generate a
Secret Access Key. Then you will need to create an S3 bucket to use for your backups. See this
article for a guide to the Amazon S3 Simple Storage Service.
S3Rsync account
When you sign up for an s3rsync.com account, you will be given a username and a private SSH key
file. Save the SSH key file somewhere on the machine on which you wish to run BackupAssist.
Once you have performed these steps, you can set up your job in BackupAssist using the S3Rsync
Destination selection. See the Creating a File Protection backup using Rsync section of this
whitepaper for more information.
Do-it-yourself host
Any Rsync Server, such as an Rsync-enabled NAS device or a Windows or Unix machine, can be used to
store backups using Rsync. The do-it-yourself approach has the advantage of keeping data in your
control, with no monthly hosting fees and no limits on the amount of data backed up.
Using your existing internet connection and hardware can be a cost effective solution. A popular
choice of destination is an Rsync-enabled NAS device placed in the business owner's home. Legal firms
especially appreciate this approach, since control over information is their primary concern.
In the following sections, the Windows and Linux data hosts support Rsync over SSH. However, some
NAS devices do not, and Daemon mode must be used instead. Daemon mode is still an acceptable
solution provided a secured LAN/WAN (such as site-to-site VPN) is used.
Prerequisites:
Windows Server 2003 (or later) machine with network connectivity and space to store backup data.
Windows Server 2008 or 2012 are highly recommended because of their support for both backup
histories and single-instance store in Rsync backup solutions.
Windows Small Business Servers (SBS) should not be used as Rsync hosts.
The cwRsyncServer installer.
The CopSSH installer.
BackupAssist v5.1.0 or later installed on the Windows machine you want to back up (i.e. the client).
Installing cwRsync:
Installing CopSSH:
Activating a user
If you are planning to use SSH, then before you register a BackupAssist client with your Rsync server,
you must activate a user with CopSSH.
1. In the Start menu, under All Programs -> CopSSH, select. The CopSSH Control Panel will open.
2. To start the process to activate a user, click on the Users tab across the top of the user interface.
3. Click on the Add button to bring up the wizard to activate a user.
DO NOT ACTIVATE USING YOUR ADMINISTRATOR ACCOUNT. Doing so will cause a lock down on
the account due to CopSSH's security settings. We recommend activating a newly created account.
7. On the fourth screen, click on Apply to complete the wizard and activate the user.
The user should now be showing as activated within the CopSSH Control Panel.
Your user's home directory will be located at (for example) C:\Program Files\ICW\home\user.
The location of this directory can be changed by editing the file C:\Program Files\ICW\etc\passwd.
Note: If you uninstall the Rsync server, be aware that the Windows service users SvcCOPSSH and
SvcCWRSYNC are not removed. If you then re-install the cwRsyncServer package, the Windows users
cannot be recreated because the passwords will not match, and the CopSSH and Rsync services will not
start on the server. The fix is to uninstall, remove the users manually, and then re-install so that the
users are added again with known passwords.
Note: You can choose to run Rsync as a daemon on your Linux server. (For security reasons, we do not
recommend this – use Rsync over SSH instead.) If you choose to run Rsync in daemon mode, you will not need
to have the SSH service installed. For instructions on setting up BackupAssist to connect to an Rsync daemon
please view the Configuring the BackupAssist client for a NAS server section below.
To determine if your system has the prerequisites installed, log into your system, start a shell and type:
man rsync – this should return the man page for Rsync if installed. Type 'q' to exit the man page.
man sshd – this should return the man page for sshd if installed. Type 'q' to exit the man page.
If these packages are not already installed, use your distribution's software package manager to install
them. Most commonly they can be found under the Server or Security categories.
The next step is to create logons on your data host. We recommend creating a separate logon for each
client. For example, if you host data for 5 different companies, create 5 different accounts so that each
company will only be able to see its own data. You should also make sure that each client's home
directory is on a partition that contains sufficient space to host its data.
You must also change the permissions on each user's home directory, otherwise most SSH daemons
will not allow you to connect to the server using the public/private key method (which BackupAssist
uses). To do this, use the chmod command. For example, for a user “fred”, type the following (when
logged on as root):
chmod 700 /home/fred
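The permission requirement above can be verified with a short script. The following is a minimal sketch in Python, assuming it is run on the Linux data host by a user who can read the directory; the function name home_dir_is_private is a hypothetical helper for this example and is not part of BackupAssist.

```python
import os
import stat


def home_dir_is_private(path: str) -> bool:
    """Return True if path is a directory with mode 700 (owner access only)."""
    st = os.stat(path)
    return stat.S_ISDIR(st.st_mode) and stat.S_IMODE(st.st_mode) == 0o700
```

For the example user above, home_dir_is_private("/home/fred") should return True once the chmod command has been applied.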
A NAS that is running Rsync as a daemon, or one that has Rsync and an SSH service running.
Set up a share to act as a root directory for your Rsync backups and allow read and write
permissions to that directory.
If your NAS requires a password to connect to the Rsync service, you will need BackupAssist to
authenticate to it.
Your NAS will need to have the correct ports open for your Rsync Daemon or SSH service (873 and
22 respectively).
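Whether these ports are reachable from the client can be checked before configuring a backup job. The sketch below is a minimal Python example that attempts a TCP connection; the function name port_is_open and the host name used in the usage note are hypothetical, and a successful connection only shows that something is listening on the port, not that it is an Rsync or SSH service.

```python
import socket

RSYNC_DAEMON_PORT = 873  # default Rsync daemon port, as listed above
SSH_PORT = 22            # default SSH port, as listed above


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, port_is_open("nas.example.local", RSYNC_DAEMON_PORT) could be run from the machine that has BackupAssist installed, where nas.example.local stands in for the address of your own NAS.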
The options vary from device to device. You will need to consult your manual to set up the destination.
Vendor websites: QNAP, Drobo, NETGEAR, Synology.
Tip: Make sure it’s Rsync compatible
Description: When you select hardware to use as an Rsync server, make sure the hardware can support
the Rsync protocol. If you select a Windows system, it must be able to run cwRsync.

Tip: Ensure there is plenty of disk space available
Description: Although you may think you have enough disk space available when you first implement
your Rsync solution, a common cause of Rsync problems is that the storage space eventually runs out.
Some of the BackupAssist backup schemes are designed to retain significant amounts of data – meaning
the space you have can be used up faster than you expect! Running out of disk space is a common
problem and it can cause a lot of problems when it occurs. For this reason, the available storage space
on your Rsync host should be monitored.

Tip: Seed your backup
Description: If you're planning on using a NAS device, you can run your seed backup by connecting
your NAS device directly to the local network. This avoids having to seed to a USB drive, and then
running the seed to the NAS device in a two-step process (saving you a lot of time).

Tip: Double check permissions
Description: Even though you are logged in as a Domain Admin, most NAS devices require users to be
set up locally within the unit and have permissions configured locally as well. If you receive permission
issues, this is usually the reason as to why.
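The advice to monitor storage space on the Rsync host can be automated with a small script. This is a minimal Python sketch, assuming it runs on the data host itself; the function names and the 20% threshold are illustrative assumptions, not BackupAssist settings.

```python
import shutil


def free_space_fraction(path: str) -> float:
    """Return the fraction of the volume holding path that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def is_low_on_space(path: str, min_free_fraction: float = 0.2) -> bool:
    """Return True if less than min_free_fraction of the volume is free."""
    return free_space_fraction(path) < min_free_fraction
```

A scheduled task could call is_low_on_space on the backup directory and raise an alert when it returns True, giving you time to add storage before a backup fails.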
Exchange VM Detection
If your backup job contains a Hyper-V guest with an Exchange Server, the authentication information
for that guest should be entered into the Exchange VM Detection tab on the Selection screen when
you create the backup job. With these credentials, BackupAssist can detect what guests have an
Exchange Server, and list the EDB file available for each guest when you perform a restore using the
Exchange Granular Restore console.
The Exchange VM Detection tab will appear when the Hyper-V role is installed and running on the
server. If you are backing up multiple Exchange guests, each one should have the same username and
password.
The Hyper-V process is automated but the restore requires both the Exchange Granular Restore Add-
on and the Hyper-V Granular Restore Add-on licenses.
In some cases, only applications that are running will be detected. If an application is not listed, try
restarting the application and the VSS service and then click the Refresh button in BackupAssist.
For Windows Small Business Server 2003, a registry entry modification is needed to see an Exchange
Server. See our online blog post, Backing up Exchange with SBS 2003, for more information.
A recovery is the process by which a computer is recovered after hardware has been replaced or an
operating system failure has occurred, and your computer can no longer start itself. To perform a
recovery you need a bootable media to start your computer, and an image backup that the bootable
media can use to recover your operating system, data and applications.
For more information on data recovery, see the Recover tab & RecoverAssist Whitepaper.
The use of Rsync is best suited to a regular file system. Due to the creation of rolling checksums on
altered backup files, it is disadvantageous to have files combined into an archive. This is because only
files that are flagged as altered will have the rolling checksum performed on them. If you have a very
large single archive file (>100 GB) it will take much longer to complete the rolling checksum process,
even if only a small element has changed. This may or may not be a problem, depending on the
processing power of your Rsync server.
DEVICE QNAP TS-209II with Rsync Ubuntu 9.04 desktop with Rsync
Single-Instance store
File Protection backups cannot use single-instance store when the backup is saved on a ReFS
formatted destination. This means all of the data will be backed up each time the backup job runs.
Use Rsync to back up data straight from the file system. This will make sure that the data is in the
smallest data blocks, resulting in the fastest possible backup.
Simultaneous backups
With Rsync, simultaneous connections may become unreliable with heavy transfer loads. It is therefore
recommended that you limit connections to your own server to 5 at any one time. Depending on data
storage requirements and the bandwidth speeds available, you may increase this number with caution.
This table outlines what attributes are preserved with the NTFS metadata option:
The file system is backed up via Windows Imaging, which results in a 50GB .vhd file.
Rsync will detect that the single .vhd file has changed, and needs to determine the in-file deltas. It
needs to calculate checksums on 50GB of data, which may take hours. Additionally, we have found that
even if the underlying file system changes very little, about 10% of a .vhd file changes from day to day
and needs to be transferred. So, about 5GB will be transferred.
We see here that it is greatly preferable in terms of bandwidth and CPU time to operate Rsync on the
underlying file system rather than a backup of that file system.
We have run tests on several different file systems – a typical file system of 70,000 files and 24 GB with
fewer than 50 MB of daily changes can be synced in around 10 minutes. The largest file system we've
tested is of 200,000 files and 100 GB, which took 20 minutes to sync minimal changes.
Note: If you enable or disable encryption for an Rsync job, BackupAssist will need to re-seed the backup to the
host with a full set of data (i.e. the next backup will be a full backup).
Note: Data compression has a significant performance impact when encrypting your files for Rsync.
BackupAssist's settings can be entered and modified using the selections available in the Settings tab.
Clicking on the Settings tab will display the selections as icons. Four of these are used when creating
a new backup job and each one is described below:
A video explaining the creation of a backup user identity can be found on our Videos webpage.
Network paths
This option allows you to enter access credentials for networks, domains and drives that the default
account (specified in the Backup user identity) does not have access to. Enter or browse to the location
and add it to the Path list. The Edit option will allow you to enter an authentication account, specifically
for that path. When you create a backup job to a remote location, that location will be automatically
added here.
Having multiple connections to a resource using the same logon credentials can generate a Windows
error, such as the BA260 NAS error. It is therefore recommended that you avoid having mapped shares
on the computer running BackupAssist that are the same as the paths configured in BackupAssist.
1. Select the Backup tab, and click Create a new backup Job
If this is the first time you have created a backup job, you will be asked to provide a Backup user
identity if one has not been defined. See the section above, BackupAssist settings, for guidance.
3. Selections: The selections screen is used to select the data and applications that you would like to
back up. Any VSS applications detected will be displayed here as application directory containers.
An Exchange VM Detection tab will be available if you are backing up an Exchange VM guest.
Select the volumes, folders, files and applications that you want to back up, and click Next.
4. Destination media: The destination screen is used to select the type of media that you want to
back your data up to. This step's name will change to “Rsync” when you click Next.
Select Rsync or S3Rsync for your backup destination, and click Next.
The S3Rsync option is for backups to Amazon S3 via the s3rsync.com service.
Select Enable Rsync file based encryption if you want the backup data to be encrypted before
being transmitted.
For more information about creating custom schedules, refer to the Backup tab whitepaper.
6. Set up destination. The screen is used to configure your Rsync destination. The configuration
screen displayed will depend on whether Rsync or S3Rsync was selected.
If the standard Rsync Destination was selected, follow the guidelines below:
b. Server Type: Select Rsync over SSH, Rsync Daemon or Rsync Daemon over SSH tunnel.
c. Port: The default port will display for the server type selected.
d. Path on server: It is best to use a new, empty directory for this path. The parent directory must
exist although the sub directories will be created when the job is first run:
/parent/sub_directory/.
If your data host is running Windows, you can enter a normal Windows path here, such as
“C:\Backups”. You can also enter a path relative to the user's home directory by starting
with a tilde (e.g. “~/Backups”).
If your data host is running Linux, you can use an absolute path by starting with a slash or
a path relative to the user's home directory by starting with a tilde (e.g. “~/Backups”).
e. Username: Enter the username that was activated while setting up your Rsync host.
f. Register with server: Select this option and you will be prompted to enter the password.
BackupAssist will then create a public/private key pair to authenticate you to the data host.
g. Test connection: Click to test your connection to the Rsync server. If this step fails but the
registration succeeded, it is probable that the Path on server cannot be accessed.
a. Rsync Server: This should be farm.s3rsync.com (the default setting) unless you have been
advised otherwise by s3rsync.com.
c. Amazon S3 bucket: You can leave this blank unless you want to set up multiple backup jobs
using the same bucket (not recommended).
d. Set Path: Specify any folders you have created in the bucket.
g. S3rsync username: Your username supplied by s3rsync.com (note: this is different to your
Amazon username).
h. S3Rsync SSH key path: The location of the saved SSH key file provided by S3rsync.com.
i. If you selected Enable Rsync file based encryption, you will be prompted to create a password.
Note: It is important that you keep a copy of your password in a safe place, as we cannot retrieve
passwords if they are lost or forgotten.
For information on configuring S3Rsync, see the Third Party data host: setting up S3Rsync section
of this whitepaper.
7. Notifications: Once a backup job has completed, BackupAssist can send an email to inform
selected recipients of the result. This email notification can be enabled during the creation of a
backup job, if the mail server has been configured.
After the backup job has been created, you can modify the notifications by adding and removing
recipients, setting additional notification conditions and including print and file notification types.
To learn more about notification options, see the BackupAssist Backup tab whitepaper.
8. Prepare media: This step will be skipped because Rsync backups do not use removable media.
9. Name your backup: Provide a name for your backup. Click Finish.
The File Protection with Rsync backup job has now been created.
Important: Once a backup job has been created, it should be reviewed and run using the Manage
menu. This menu provides additional options to configure your backup. See the section, File Protection
using Rsync backup management, for more information.
Important: Once a backup job has been run and a backup created, a MANUAL test restore should be
performed to ensure the backup is working as intended. To perform a test restore, refer to the section,
Restoring from a File Protection backup.
To restore data from a File Protection backup, start BackupAssist and follow these steps:
2. From the Home page, select the type of restore you want to perform. When you select one of the
restore categories provided, BackupAssist will locate the corresponding backups for you.
Files and folders will display all data backups and all VSS application backups.
Applications will display backups that contain VSS applications, and exclude data only backups.
Exchange, SQL or Hyper-V, will display all backups that contain the selected application.
Selecting an application type will display application specific restore tools (e.g. Hyper-V
Granular Restore and SQL Restore) as well as the Restore Console.
3. Once you have selected the type of restore you want to perform, the Home page will display all
backups catalogued by BackupAssist that match your selection. The backups will be grouped by
the backup's source location, and by the restore tool that can be used.
If a backup can be used by two restore tools, it will appear in two groupings.
If a backup contains data from multiple locations, it will appear in a grouping for each location.
If your backup included both data and VSS applications, both will be available to restore once the
backup has been loaded in step 4, regardless of the restore type selected.
The BackupAssist Restore Console will open and load all of the backups that were listed on the
Home page. The next step is to locate the data you want to restore, from the loaded backups.
The Browse tab. Select this tab if you know the backup and date you wish to restore from, or if
you need to restore an entire backup set.
a. Use the drop-down menu to choose the backup that you want to restore from.
b. Use the calendar to select the date you want to restore from.
c. Use the middle panes to expand the backup set.
d. Select the data to restore.
e. Click Restore to at the bottom right of the window.
The Search tab. Select this tab to search all of the loaded backups for the data you want to
restore. You can display data filtered by name, date, size and type, for all backups. The results
can be compared (e.g. the dates of two files) to identify the correct data selection.
a. Enter your search term (The search accepts wild card searches, such as *.log or *.doc).
b. Select a filter/s if required.
c. Click the Search button.
d. Select the data to restore.
e. Click Restore to at the bottom right of the window.
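The wildcard patterns mentioned in the Search tab steps follow the familiar shell style, where * matches any run of characters. As an illustration only, the Python sketch below shows how such patterns select file names; it uses the standard fnmatch module and a hypothetical helper name, and it is not a description of how the Restore Console implements its search.

```python
import fnmatch


def match_names(names: list, pattern: str) -> list:
    """Return the names that match a shell-style wildcard pattern."""
    # fnmatchcase keeps matching case-sensitive on every platform.
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]
```

For example, match_names(["server.log", "report.doc", "notes.txt"], "*.log") selects only server.log.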
If the backup is not present, or if you wish to load additional backups, select the Load backups
option. Click Load all known backups to load all backup catalogues.
For more information about data selection, refer to the Restore tab whitepaper.
b. Review Restore to: Leave the Original location selected or choose an Alternative path.
Restoring to an alternate location will use a minimal path. For example, restoring a single file
to an alternate location will copy the file to the location without re-creating the original folder
structure.
d. Selecting Create a log file listing all processed files will create a file that lists the success or
failure of each file. The log is opened by selecting the log file's link in the backup report.
e. Queue all backup jobs when a restore is running is selected by default.
f. Click the Restore button.
The Restore Console will connect to your Rsync host and restore the selected files.
The restore will run from the destination window and a Report link will appear once the
restore has finished.
g. Select Done.
Your File Protection using Rsync restore has now been completed.
Important: Only backups made with BackupAssist v5.3 or later will show up in the Restore Console.
Important: The Restore Console can restore encrypted files, but you will need to supply the password.
It is important that you keep a copy of your password in a safe place, as we cannot assist you with
opening password encrypted files if your password is lost or forgotten.
Helpful hint: These instructions explain how to restore data using the BackupAssist Restore console. If
you do not have BackupAssist installed and need to restore a File Protection backup, you can manually
browse the Rsync destination and transfer data back using any method permissible by your host.
© Cortex I.T. 2001-2015 Whitepaper: Version Jan 29 2014
9. File Protection using Rsync backup management
Once you have created a backup job, you can modify the settings and access advanced
configuration options using the Manage menu.
To learn more about the backup management options, see the Backup tab whitepaper.
2. You will be prompted to Rerun a past backup or to Run a future backup now.
3. When the backup job starts, the screen will change to the Monitor view.
4. Once the backup has been completed, select the Report button and review the results.
Seeding your backup via a slow internet connection may not be practical, so two methods are
provided here to seed your data host. Once the initial seed to the data host is complete, each
successive backup will be an incremental backup of data that has changed.
You can use BackupAssist to automatically seed data offsite using removable media, which can be
physically transported to the data host so that the data can be uploaded locally. Seeding your data
using this method is simple:
3. Click the Seed backup button and select the location of an empty folder on your portable media.
5. Transport the portable media containing the seed to the site where your Rsync server is located.
6. Connect the device to the Rsync server and copy the seed to it:
For a Windows server (assuming the seed is located on E:\SeedFolder)
a. Go to the Start menu > CopSSH > Start a Unix BASH shell.
b. Enter the following command: bash "/cygdrive/e/SeedFolder/seed.sh".
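The command above refers to the seed folder on drive E: using the /cygdrive form that the CopSSH BASH shell expects. If your seed is on a different drive or folder, the path needs to be translated in the same way. The Python sketch below shows the mapping, assuming a path that begins with a drive letter; the function name to_cygdrive is a hypothetical helper for this example.

```python
from pathlib import PureWindowsPath


def to_cygdrive(windows_path: str) -> str:
    """Convert a drive-letter Windows path to its /cygdrive equivalent."""
    p = PureWindowsPath(windows_path)
    if len(p.drive) != 2 or p.drive[1] != ":":
        raise ValueError("expected a path that starts with a drive letter")
    drive_letter = p.drive[0].lower()
    # p.parts[0] is the drive anchor (e.g. "E:\\"); the rest are folder names.
    return "/".join(["/cygdrive", drive_letter, *p.parts[1:]])
```

With the example location from this section, to_cygdrive(r"E:\SeedFolder\seed.sh") gives /cygdrive/e/SeedFolder/seed.sh, which matches the path used in the command above.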
VSS applications are displayed under the Files and applications menu item. You can modify your
backup job by selecting entire VSS applications or drilling down to individual components. In some
cases, only applications that are currently running will be detected. If an application is not listed, try
restarting it and then click the Refresh button in BackupAssist.
Rsync options
Select the Rsync options item from the left hand menu of the Manage jobs screen. The Rsync options
item contains 15 different configurations for backing up your data across the internet.
Similarly, if you have any suggestions for additional functionality in BackupAssist, or new products or
add-ons, please also forward your feedback to [email protected]
Troubleshooting FAQ
Test connection failed: Ensure that you are able to ping your Rsync server from your BackupAssist
server and that you have opened up the appropriate ports on your firewall. Make sure that the
username can access the path you have specified.
SSH Connection Refused: Ensure that the services Openssh SSHD and RsyncServer are started on the
data host machine (Administrative Tools > Services). Make sure your firewall is not blocking the
attempt.
Register with server failed: Ensure that you have the correct username and password set up on your
Rsync server.
Appendix
Data host: The server that has been set up to host backup data.
Client: The machine that BackupAssist is installed on, that sends data to the data host.
SSH Authentication: For SSH communication, we use a public / private key method of authentication,
meaning that you will only be asked for your password once (when registering with the server), and
your public key will be uploaded to the server, enabling BackupAssist to log into the server in the
future in a secure, password-less manner. For more information on public / private key authentication,
visit the following Wikipedia article: Wikipedia Public Key Cryptography
For example, selecting the Learn more about Backup link will open the Welcome Screen with the
Backup introduction selected. This screen provides an overview of the tab's functions and features, and
links to documentation and resources.