Move or Copy Data

Copying data between file systems is a common data management task that nearly all users perform frequently. The cp command is convenient when the amount of data or the number of files is small and the copy finishes quickly. However, cp does not handle interruptions gracefully, and for large directory trees it can be unwieldy, error-prone, and slow to complete. For such tasks, the copy process should be convenient, resilient, and verifiable. This “How To” explains how to use the screen and rsync utilities to achieve these goals when moving or copying your data.

Screen

The screen command provides virtual terminal sessions in which you can run interactive commands. In a screen session, you can start a long-running command, detach from the session, and reattach to it later, even after logging out and back in or after your network connection is disrupted.

Rsync

The rsync command is an efficient and versatile copying tool that works with local or remote sources and destinations. By default, rsync compares the size and modification time of each file in the source with its counterpart in the destination and transfers only the files that are missing or different. This is a major advantage over cp, since only the data that actually needs to be copied is transferred rather than everything. It also means that if the operation is interrupted, rerunning the same command picks up where it left off, skipping files that were already copied completely. You can also rerun rsync to check that all files from a previous rsync or cp were copied. Because the default check relies on sizes and modification times, add the --checksum option when you want rsync to compare file contents, for example after a copy made with cp, which does not preserve modification times unless given -p.

Screen + Rsync

Let’s put these two commands together in an example that shows how to copy a large directory tree recursively in a convenient and fault-tolerant way. In this example, user jane wants to copy her /scratch/lfs/jane/abcd directory to her long-term storage directory, /lfs/group/jane, and she expects the process to take a long time. To do this, jane logs into an interactive node, dev1 for example, and runs the following commands.

$ screen
$ rsync -av /scratch/lfs/jane/abcd /lfs/group/jane/

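The first command starts a virtual terminal session in which interactive commands may be run. The second copies the abcd directory and everything in it into /lfs/group/jane, creating /lfs/group/jane/abcd. Because the source path has no trailing slash, rsync copies the directory itself; writing /scratch/lfs/jane/abcd/ instead would copy only its contents directly into /lfs/group/jane. Rsync may spend some time scanning the source tree before it reports any files, so a lack of output does not mean the command has stalled. At this point jane can detach from the screen session by pressing Control-a followed by d, shut down her laptop, and go home if she wants. When she is ready to check on the progress of her copy, she can log back into dev1, the node where the session is running, and reattach to it by typing screen -r. When the copy is complete, she can verify that all her files were copied by running the same rsync command again; if it lists no files apart from its summary, nothing remains to be transferred.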

Moving vs. Copying

The example above will “copy” your data, leaving the original in /scratch/lfs/jane/abcd and a new, identical copy in /lfs/group/jane/abcd. Sometimes the goal is to actually “move” the data, which means a “copy” followed by a “delete” at the source. Rsync will do this for you if you add the --remove-source-files option, which deletes each file from the sending side once it has been successfully copied to (or found up to date on) the destination. Directories are not removed, so when the transfer finishes you are left with an empty directory tree at the source, which can then be removed quickly with rm -rf. Make sure rsync finished without errors before running rm -rf: any file that failed to transfer is still in the source tree and would be deleted along with it.

$ screen
$ rsync -av --remove-source-files /scratch/lfs/jane/abcd /lfs/group/jane/ && rm -rf /scratch/lfs/jane/abcd

Learning More

Screen Quick Reference

Rsync Man Page

Screen Man Page