Using rsync with SSH

Need a secure backup plan?

The question of server backups comes up again and again. There are many very good solutions from a variety of companies, but with software costing thousands of dollars, they are out of reach for small web hosts, service providers, and others. If you are on a smaller budget, there is a good solution for you: using rsync with SSH.

What is rsync? Why use SSH?

What is rsync?

From the rsync web site, "rsync is a file transfer program for Unix systems. rsync uses the 'rsync algorithm' which provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand."

The description may sound somewhat confusing, but basically rsync examines files for differences and then sends only the changes to the backup copy. This technique reduces the amount of data transferred, saving bandwidth and time. For you technical types, you can dig into the rsync algorithm for more details. For you non-techies, just know that rsync is an easy way to back up your data.

What's this about SSH?

One issue with rsync is that it does not encrypt data by itself. This means that any data you send via rsync could be captured by someone. Packet sniffing can be quite easy with the right tools, and reassembling entire files is not that difficult. Rsync does include an option to specify an alternate remote shell program. For security, we use SSH to encrypt the data transfer, thus combining the security of SSH with the benefits of rsync.

Installing rsync

Although you may already have a copy of rsync, I highly recommend installing rsync from source. Rsync has had a few security issues (mostly with the default rsh or anonymous modes), and by building from source you get the latest version available. Also, since rsync is a standalone binary, the source build works great -- there are no start-up scripts or other such configurations that might be in an rpm version but not in the source version.

First you need to get the rsync source from the rsync web site. Or you can just grab the source directly: rsync source. Always use the latest stable version.

Now that you have the rsync source, you need to build it. For most people the standard options are fine, so installation is just three commands:

Configure:

[root@gigan rsync-2.5.5]# ./configure

Make:

[root@gigan rsync-2.5.5]# make

Install:

[root@gigan rsync-2.5.5]# make install

Version Confusion

Note that if you already have a version of rsync from an rpm, it will most likely be in /usr/bin/rsync on RedHat systems. The source install goes in /usr/local/bin. Either always specify the full path or remove the old one. To see which rsync is in your path, use:

[root@gigan rsync-2.5.5]# which rsync
/usr/bin/rsync

In this case, I already had a version of rsync from an rpm install. To get the version, use:

[root@gigan rsync-2.5.5]# /usr/bin/rsync --version
rsync version 2.4.6  protocol version 24

Written by Andrew Tridgell and Paul Mackerras

The version in my path is the old one. I need to use the one in /usr/local/bin explicitly.

[root@gigan rsync-2.5.5]# /usr/local/bin/rsync --version
rsync  version 2.5.5  protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others

Capabilities: 64-bit files, socketpairs, hard links \
IPv6, 64-bit system inums, 64-bit internal inums \

rsync comes with ABSOLUTELY NO WARRANTY.  

[root@gigan rsync-2.5.5]#

As you can see, the source version I just installed is /usr/local/bin/rsync, so I need to type the full path to use it or remove the other version. (Note: during the configure step you can pass the --prefix option, for example --prefix=/usr, to put rsync in /usr/bin, but prefixes and configure are another topic.)
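If you are not sure how many copies of rsync are installed, a short loop over the PATH can list them all with their versions. This is only a sketch: the function name list_on_path is made up for this example and is not part of rsync or the shell.

```shell
# list_on_path NAME
# Prints every executable called NAME found on the PATH, in lookup order.
# The body runs in a subshell so the IFS and globbing changes stay local.
list_on_path() (
    name=$1
    IFS=:
    set -f
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)

# Show the first line of --version output for each copy that was found.
list_on_path rsync | while IFS= read -r bin; do
    printf '%s: ' "$bin"
    "$bin" --version | head -n 1
done
```

The first path printed is the one the shell runs when you type plain "rsync".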

Transferring Data with rsync and SSH

Now that we have rsync installed, transferring files is easy. In the example below, I am going to move a tar file from one server to my backup server using SSH to encrypt the data.

 

/usr/local/bin/rsync --archive --rsh=/usr/bin/ssh \
--verbose backup.tar admin@www.domain.com:./
admin@www.domain.com's password:

I have used a couple of options here. The "archive" option both preserves permissions and acts recursively. If backup.tar had been a directory instead of a file, then the entire directory would have been copied. I also set the "rsh" flag to use ssh instead of rsh. By doing the latter, I use ssh as the transfer agent, thus securing the data transfer. The "verbose" flag just tells rsync to be noisy about what it is doing.

Last is the target information: "admin@www.domain.com:./". The part preceding the colon specifies the user and target host -- just like using ssh. The part after the colon specifies the target location. In this case, it is the admin's home directory, but any path could be used. After entering this command, you are prompted for the password. If this is the first time you have connected to domain.com, then ssh will prompt you about accepting a host key, which you should accept once you are satisfied it belongs to your server. Once you enter the password, the transfer will occur.
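The same transfer can also be written with rsync's short options: -a for --archive, -v for --verbose, and -e for --rsh. The sketch below wraps that in a small function. The names push_file and RSYNC are illustrative only, and the host is the placeholder used above. Adding -n (--dry-run) makes rsync list what it would send without copying anything.

```shell
# Path to the rsync binary; set RSYNC beforehand to pick a different build.
RSYNC=${RSYNC:-/usr/local/bin/rsync}

# push_file SOURCE TARGET [extra rsync options...]
# Copies SOURCE to TARGET over ssh, preserving permissions and recursing
# into directories. Extra options are passed straight through to rsync.
push_file() {
    src=$1
    dest=$2
    shift 2
    "$RSYNC" -a -v -e /usr/bin/ssh "$@" "$src" "$dest"
}

# Example calls (placeholder host, not run here):
#   push_file backup.tar admin@www.domain.com:./        # real transfer
#   push_file backup.tar admin@www.domain.com:./ -n     # dry run only
```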

building file list ... done
backup.tar
wrote 1433875 bytes  read 36 bytes  13338.71 bytes/sec
total size is 1433600  speedup is 1.00

Here you can see the transfer has happened. When you first transfer a file, rsync saves neither bandwidth nor time because the file is not yet on the target server. However, if you immediately run the command again, you will see rsync at work:

building file list ... done
wrote 63 bytes  read 20 bytes  18.44 bytes/sec
total size is 1433600  speedup is 17272.29

Notice the speedup value and the wrote/read byte counts. These give you an indication of how much rsync has helped. Since the source and target files are identical, the speedup is very high. The only information transferred is the overhead of the rsync algorithm itself. Even if the file has only slight changes, you can routinely see 30-50% bandwidth savings. Saving bandwidth also means that the backup completes faster, thus allowing your system to handle other tasks.
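The speedup figure appears to be the total file size divided by the bytes actually exchanged (written plus read), and the numbers in both runs above are consistent with that reading. A quick check with awk, where the function name speedup is just a label for this example:

```shell
# speedup TOTAL_SIZE BYTES_WRITTEN BYTES_READ
# Prints total size divided by bytes exchanged, to two decimal places.
speedup() {
    awk -v total="$1" -v sent="$2" -v recv="$3" \
        'BEGIN { printf "%.2f\n", total / (sent + recv) }'
}

speedup 1433600 1433875 36   # first run: prints 1.00
speedup 1433600 63 20        # second run: prints 17272.29
```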

Automating Rsync

Backing up your files is a really boring task, and one well suited to automation. To automate data transfer, you have to set up SSH keys between the two servers. We will cover SSH keys in another rackTIP, but for now you have the basics to start using rsync to transfer data files manually. If you know how to set up SSH keys, you can write a script or set up a cron job to automate your transfers.

For more examples of how to use rsync, see the examples section at the rsync web site. Rsync is also a great tool for server migration and mirroring. For some clients that have redundant servers, we use rsync combined with IP takeover to provide an inexpensive solution to server redundancy.

Automated Backups

Now that you can use rsync, you just need to write a script to back up the file(s) that you want. You can then set up your SSH keys and automate the backup via cron. That is how our backup service works.
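As a rough idea of what such a script could look like, here is a minimal sketch. It assumes an SSH key pair is already in place. The key file, target directory, script location, and cron schedule are all placeholders chosen for this example, not details of any real setup.

```shell
#!/bin/sh
# Sketch of a nightly push backup. The key file, target, and paths
# below are placeholders; adjust them to your own servers.
RSYNC=${RSYNC:-/usr/local/bin/rsync}
SSH_KEY=${SSH_KEY:-$HOME/.ssh/backup_key}
TARGET=${TARGET:-admin@www.domain.com:backups/}

# run_backup ITEM...
# Pushes each file or directory to TARGET over ssh using SSH_KEY.
# Returns non-zero if any transfer failed.
run_backup() {
    rc=0
    for item in "$@"; do
        if ! "$RSYNC" --archive --rsh="/usr/bin/ssh -i $SSH_KEY" \
                "$item" "$TARGET"; then
            echo "backup of $item failed" >&2
            rc=1
        fi
    done
    return "$rc"
}

# In a real script, finish with the paths you want to protect, e.g.:
#   run_backup /etc /var/www
# and schedule the script from cron (here 02:30 every night):
#   30 2 * * * /usr/local/sbin/rsync-backup.sh
```

Reporting failures on stderr means cron will mail them to the job's owner on systems where cron mail is configured.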

Push vs. Pull

You can use rsync to both send and retrieve data by simply swapping the source and destination arguments:

This command sends the file to the target server:

/usr/local/bin/rsync --archive --rsh=/usr/bin/ssh \
--verbose backup.tar admin@www.domain.com:./

This command fetches the file to your local machine:

/usr/local/bin/rsync --archive --rsh=/usr/bin/ssh \
--verbose admin@www.domain.com:./backup.tar ./

Whether to push or pull depends on what permissions you require. For security, it is better to have root push data to a non-root account than to log in as root on the remote server and pull the data to your machine. Although SSH provides a secure transfer medium, you should still avoid sending the root password over the network whenever you can.

Compression

A note on gzip compression: compressing data with gzip may actually increase the amount of data rsync transfers in some cases, because of how the gzip and rsync algorithms process data. For text-based files and general web server files, the size reduction is significant, so compressing the data first will often prove beneficial. For media such as mp3, mpeg, gif, jpg, and other files that are already compressed, rsync may work better without compressing the data first. Only by testing can you determine whether or not to compress your data.
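One rough first test is to see how much gzip actually shrinks a given file. The helper below, whose name gzip_percent is invented for this example, prints the compressed size as a percentage of the original. It only measures file size; it does not measure how a later rsync run behaves on the compressed file, so treat it as a starting point.

```shell
# gzip_percent FILE
# Prints the gzip-compressed size as a whole percentage of the original.
# The original file is left untouched (gzip -c writes to stdout).
gzip_percent() {
    original=$(wc -c < "$1" | tr -d ' ')
    compressed=$(gzip -c "$1" | wc -c | tr -d ' ')
    if [ "$original" -eq 0 ]; then
        echo "gzip_percent: $1 is empty" >&2
        return 1
    fi
    echo $(( compressed * 100 / original ))
}

# Example (not run here):
#   gzip_percent access.log    # a low number means gzip helps a lot
#   gzip_percent song.mp3      # a value near 100 means little or no gain
```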

Guru Tip

Rsync can be very efficient and can saturate your network connection if you have limited bandwidth. Fortunately, rsync has built-in bandwidth limiting via the "bwlimit" flag. By setting this flag you can place a cap on the bandwidth that rsync uses. This is great if you are on broadband; you can limit the backup to 50-70% of your bandwidth and use the remainder to keep on working.
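The --bwlimit value is given in kilobytes per second, while link speeds are usually quoted in kilobits per second, so a little arithmetic is needed. The sketch below does an approximate conversion; the function name bwlimit_for and the 8000 kbit/s link are assumptions made for the example. It only prints the resulting command rather than running it.

```shell
# bwlimit_for LINK_KBIT PERCENT
# Converts a link speed in kilobits/s and a percentage cap into an
# approximate --bwlimit value in kilobytes/s (integer arithmetic).
bwlimit_for() {
    echo $(( $1 * $2 / 100 / 8 ))
}

limit=$(bwlimit_for 8000 60)    # 8000 kbit/s link capped at 60% gives 600
echo "rsync --archive --rsh=/usr/bin/ssh --bwlimit=$limit backup.tar admin@www.domain.com:./"
```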

©2000-2008 rackAID LLC