10. How do I do parallel rsync?
Doing a normal rsync is quite simple. In the example below the files are copied from a remote server to a local directory. The flag -a is for archive mode, -v is for verbose output and --progress shows the progress of each transfer.
If you know you have hard links, access control lists, extended attributes or sparse files, you may want the flags -HAXS as well. This is usually not the case.
rsync -av --progress username@server:/directory/. /local/directory/.
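If you are unsure what a transfer will do, the same command can be previewed first with -n (--dry-run), a standard rsync option that is not part of the original example. The sketch below is only an illustration: it uses two temporary local directories as hypothetical stand-ins for the remote source and the local destination.

```shell
# Local stand-ins for the remote source and the local destination
# (hypothetical; in real use these would be username@server:/directory/.
# and /local/directory/.).
src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/file.txt"

# -n (--dry-run): list what would be copied, but write nothing.
rsync -avn "$src/." "$dst/."
if [ -e "$dst/file.txt" ]; then dry_run_copied=yes; else dry_run_copied=no; fi

# The same command without -n performs the actual copy.
rsync -av "$src/." "$dst/."
```

After the dry run the destination is still empty; only the second command creates file.txt there.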
A single rsync like the one above is, however, run by a single process and may hit bottlenecks such as single CPU core performance, file system metadata latency or the throughput of a single network transfer. Running several rsync processes in parallel may improve performance.
To do a parallel rsync initiated from the sender (pushing data), use the command parsync, a parallel rsync wrapper for large data transfers. Parsync has been packaged in the UUPEL repository.
parsync --maxload=16 --NP=12 --startdir=/tank/MOL-EXTBMC EXTBMC root@bmc-pcfs3:/tank/bmc-pcfs2.bmc.uu.se/tank/MOL-EXTBMC
In this example the directory EXTBMC in the directory /tank/MOL-EXTBMC will be synced to the host bmc-pcfs3 into the destination directory /tank/bmc-pcfs2.bmc.uu.se/tank/MOL-EXTBMC.
Doing a parallel rsync initiated from the receiver (pulling data) is harder. One way of solving this is to:
- Do an initial rsync that deletes all files and directories that should be deleted.
- Sync all directories but no files.
- Sync the files in each directory, with several directories being synced in parallel.
- Do a final rsync.
U=username
H=host
SRC=/source/directory
DST=/destination/directory

# Delete everything that is supposed to be deleted.
rsync -r --delete --existing --ignore-existing $U@$H:$SRC/. $DST/.

# Sync directories but no files.
rsync -a -f"+ */" -f"- *" $U@$H:$SRC/. $DST/.

# For every directory, sync the files in that directory. Run 10 in parallel.
# find runs inside $DST so that the relative paths match on both sides.
(cd $DST && find . -type d -print0) | xargs -P 10 -I {} -0 rsync -vlptgoxSH $U@$H:$SRC/{}/\* $DST/{}/.

# Run a final sync with delete to make sure everything is ok.
rsync -aSH --delete $U@$H:$SRC/. $DST/.
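
Before launching the parallel step it can help to preview the commands that xargs would start. The following is a minimal sketch and not part of the recipe itself: it builds a throwaway directory tree with mktemp (the directory names a and b/c are hypothetical) and puts echo in front of rsync, so nothing is transferred. The user, host and source path are the same placeholders as above.

```shell
U=username
H=host
SRC=/source/directory

# Throwaway destination tree standing in for $DST (hypothetical layout).
DST=$(mktemp -d)
mkdir -p "$DST/a" "$DST/b/c"

# Same find | xargs pipeline as the parallel step, but 'echo' prints each
# rsync command instead of running it. -P is left out because nothing is
# executed; sort only makes the output order stable.
preview=$(cd "$DST" && find . -type d -print0 |
  xargs -0 -I {} echo rsync -vlptgoxSH "$U@$H:$SRC/{}/*" "$DST/{}/." | sort)

printf '%s\n' "$preview"
rm -rf "$DST"
```

One line is printed per directory found, including the top-level directory itself (.), so the tree above yields four rsync commands. Removing the echo and adding -P 10 again gives the real parallel step.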