NSLU2-Linux

How To Backup Your Windows Files Remotely

For a couple of weeks now I have had an automated setup that makes incremental backups of all important data on my Windows box and sends them to my NSLU2 at a remote location.

About This How-To

The nslu2 is my first (but certainly not my last) Linux box, and I am a bash scripting novice. Therefore, many of the things I describe here can undoubtedly be done in a more efficient way. Please correct me where I go wrong.

It has taken me a few weeks of self-study and trial and error to get this setup to work the way I want it, and I encourage the reader to work in the same way. For that reason, I will not give a detailed outline of how to set things up for your system. Instead I will point to more information and leave it up to you to do the reading and trying.

This how-to will give you the following:

  1. Automated backup performed daily with no user interaction.
  2. Incremental backups, meaning additional backups will take virtually no space at all if no changes have been made on the source machine.
  3. Configurable number of backup increments.
  4. A neatly packaged report e-mailed to you after every backup.

About the Setup

The source machine is a desktop Windows box in my home running Windows XP Professional. At any given time, it may be on, off or in standby (S3). The remote machine is the NSLU2 running Debian. It sits off-site in my employer's server room and is on 24/7.

The setup depends on three bash shell scripts, one on the source side and two on the remote side. Backup is started by executing run_remote_jobs on the source side. run_remote_jobs will then, via ssh, call the script prepare_backup on the remote side. prepare_backup will prepare backup dirs to receive a new increment (e.g. shift daily.0 to daily.1). After that, run_remote_jobs will perform the actual backup jobs (listed in the file jobs.txt). Finally, run_remote_jobs will call the remote script finish_backup, which will package and e-mail a report to me.

Step 1: Getting Cygwin

Since rsync is primarily a *nix tool, you need to get a Windows port. Of course, you could try a minimalistic approach like cwRsync (http://www.itefix.no/cwrsync/) or DeltaCopy (http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp), but I advise you not to. The reason is that neither of them gives you the tremendous benefit of bash scripting. Instead, install Cygwin (http://www.cygwin.com/) on your machine. It's tried, tested and maintained, and it works, at least as far as I can tell. Oh, and install the rsync and shutdown packages while you're at it.

Step 2: Talking to Your NSLU2 Without a Password

I followed the nice tutorial available at http://troy.jdmz.net/rsync/index.html. Since I am not completely security paranoid, I didn't follow all the steps.
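
The gist of that tutorial is to create a key pair and authorize the public key on the slug. A minimal sketch (the key path and remote user/host here are made-up examples, not the tutorial's exact commands):

```shell
# Generate a passwordless key pair for the backup scripts to use.
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$keydir/backup_key"
ls "$keydir"    # backup_key  backup_key.pub
# Then append backup_key.pub to ~/.ssh/authorized_keys on the nslu2, e.g.:
#   ssh-copy-id -i "$keydir/backup_key.pub" bjorn@remote.example.org
# and point the SSH_COMMAND variable in the scripts below at the private key.
```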

Step 3: Understanding rsync

Read the man pages at http://rsync.samba.org/. Yes, all of them! If the NSLU2 doesn't have rsync yet, just run aptitude install rsync as root.

Step 4: Scripting It

Mike Rubel has an excellent tutorial (http://www.mikerubel.org/computers/rsync_snapshots/) on how to script incremental backups with rsync. I've ripped many ideas from him.

There are a few different setups to consider when backing up with rsync.

  1. Run rsync in daemon mode, either
    • on the source side, with the remote machine pulling the backup from it, or
    • on the remote side, with the source machine pushing the backup to it.
  2. Use rsync over a plain remote shell (ssh) connection,
    • which can likewise be done in push or pull fashion.

We will use the latter setup in a push fashion.

Now, the scripts below are presented rather bluntly. That does not mean I read bash scripts the way others read the morning paper: armed with Google, the Advanced Bash Scripting Guide (http://tldp.org/LDP/abs/html/) and a few very basic programming ideas, you can pull this off as well.
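
The key trick from Rubel's article is that unchanged files in a new snapshot are hard links into the previous one, so they cost almost no extra disk space. A quick local demonstration using cp -al (rsync's --link-dest option achieves the same thing per file):

```shell
# Demonstrate the hard-link snapshot trick in a scratch directory.
set -e
work=$(mktemp -d)
mkdir "$work/src"
echo "hello" > "$work/src/file.txt"
cp -a  "$work/src" "$work/daily.1"      # full copy: new inodes, real space
cp -al "$work/daily.1" "$work/daily.0"  # "copy" as hard links: no new data
# Both snapshots now point at the same inode, so the link count is 2:
stat -c '%h' "$work/daily.0/file.txt"   # prints 2
```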

The aim of the scripts is to give this:

  • 7 daily backups
    • daily.0
    • daily.1
    • ...
    • daily.6
  • 4 weekly backups in the same fashion
  • 3 monthly backups in the same fashion
  • 4 quarterly backups in the same fashion
  • Extensive logs saved on the nslu2
  • A compressed log e-mailed to me after every run

That is a total of 18 increments and a whole year to regret mistakes. I've found that an extra increment uses only roughly 30 MB for 80,000+ files if there are no changes (thanks to the hard-linking of unchanged files). Since 30 MB isn't much these days, one can easily afford more than 18 increments. Just tweak the scripts to suit your needs.

Source Side Scripts

On the Windows side you may end up with something like this.

This is script run_remote_jobs.sh, performing all rsync jobs on the source machine.

#!/bin/bash

# ------------- Variables ----------------------------------------------
TIME="/usr/bin/date +%T"
RM=/usr/bin/rm

JOBS=`cat /home/Björn/rsync/jobs.txt`
DAYDATE=`date +%Y%m%d`
LOGFILE=/home/bjorn/backup/logs/"$DAYDATE"_log.log   # on the nslu2 (used via --rsync-path)
LOGFILE_STD=/home/Björn/rsync/"$DAYDATE"_std.log     # local, under Cygwin
LOGFILE_ERR=/home/Björn/rsync/"$DAYDATE"_err.log     # local, under Cygwin
RSYNC_PATH="/usr/bin/rsync --log-file=$LOGFILE --log-file-format=%o\ %i\ %n"
SSH_COMMAND="/usr/bin/ssh -i /path/to/key"
SCP_COMMAND="/usr/bin/scp -i /path/to/key"
LOCATION=remote.example.org
BACKUP_DEST=$LOCATION:/home/bjorn/backup

# Counters for the loop below
COUNTER=1
SRC_COUNTER=1
DEST_COUNTER=1

# Assign every source and destination its own variable SOURCE_no and DEST_no,
# and collect all destinations in DEST_ALL
for i in $JOBS ; do
    if [ `expr $COUNTER % 2` -eq 1 ] ; then     # Modulus check: odd positions are sources
        eval "SOURCE_$SRC_COUNTER=$i"
        let SRC_COUNTER=SRC_COUNTER+1
    else
        eval "DEST_$DEST_COUNTER=$i"
        DEST_ALL="$DEST_ALL $i"
        let DEST_COUNTER=DEST_COUNTER+1
    fi
    let COUNTER=COUNTER+1
done

# ------------- The script itself --------------------------------------
# Save, then redirect, stdout and stderr
exec 4>&1
exec 6>&2

exec 1>>$LOGFILE_STD
exec 2>>$LOGFILE_ERR

echo "Backup jobs started at `$TIME`."

# Allow network to recover after S3 standby
sleep 30

# Prepare dirs on nslu2 for incremental backup
$SSH_COMMAND $LOCATION /home/bjorn/backup/scripts/prepare_backup.sh $DEST_ALL

# Perform backup jobs
for ((i = 1 ; i <= $(($SRC_COUNTER - 1)) ; i += 1)) ; do
    eval "TMP_SOURCE=$"SOURCE_$i""
    eval "TMP_DEST_SHORT=$"DEST_$i""
    eval "TMP_DEST=$BACKUP_DEST/$TMP_DEST_SHORT/daily.0"
    RSYNC_COMMAND="/usr/bin/rsync -a --no-p --chmod=ugo=rwX --stats \
                  -e \"$SSH_COMMAND\" --delete --modify-window=1 \
                  --link-dest=/home/bjorn/backup/$TMP_DEST_SHORT/daily.1 \
                  --rsync-path=\"$RSYNC_PATH\""
    echo
    echo "############### Starting remote job $i of $(($SRC_COUNTER - 1)) \
          ($TMP_DEST_SHORT) ###############"
    eval "$RSYNC_COMMAND \"$TMP_SOURCE\" \"$TMP_DEST\""
done

echo
echo "Backup jobs finished at `$TIME`."

# Restore stdout and stderr
exec 1>&4 4>&-
exec 2>&6 6>&-

# Transfer log file(s)
$SCP_COMMAND $LOGFILE_STD bjorn@$LOCATION:/home/bjorn/backup/logs/"$DAYDATE"_std.log

# Transfer std_err file only if size > 0
if [ -s $LOGFILE_ERR ] ; then
    $SCP_COMMAND $LOGFILE_ERR bjorn@$LOCATION:/home/bjorn/backup/logs/"$DAYDATE"_err.log
fi

# Delete local log file(s)
$RM -f $LOGFILE_STD
$RM -f $LOGFILE_ERR

# Order nslu2 to create and send e-mail report
$SSH_COMMAND $LOCATION /home/bjorn/backup/scripts/finish_backup.sh

# Re-suspend machine (requires Cygwin "shutdown" package)
shutdown -p now

exit 0
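
The file-descriptor juggling at the top and bottom of the script (exec 4>&1 and friends) saves stdout and stderr, points them at the log files for the duration of the run, and restores them afterwards. A minimal standalone illustration of the same trick:

```shell
# Save stdout on fd 4, point it at a log file, then restore it,
# just as run_remote_jobs.sh does with fds 4 and 6.
LOG=$(mktemp)
exec 4>&1            # fd 4 now duplicates the original stdout
exec 1>>"$LOG"       # stdout appends to the log file from here on
echo "captured in the log"
exec 1>&4 4>&-       # restore stdout and close fd 4
cat "$LOG"           # prints: captured in the log
```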

The file jobs.txt, used by run_remote_jobs.sh, lists all rsync jobs to be run. The first field on each line is the source path on the Windows machine; the second is the destination directory on the nslu2.

This is file jobs.txt.

"/cygdrive/e"                           dokument
"/cygdrive/c/Docume~1/Anna/Favori~1"    favoriter.anna
"/cygdrive/c/Docume~1/Bjrn~1/Favori~1"  favoriter.bjorn
"/cygdrive/f"                           lagring
"/cygdrive/c/Docume~1/Anna/Skrivb~1"    skrivbord.anna
"/cygdrive/c/Docume~1/Bjrn~1/Skrivb~1"  skrivbord.bjorn

Remote Side Scripts

On the nslu2 side you may end up with something like this.

This is script prepare_backup.sh, called from the source machine via ssh.

#!/bin/bash

# ------------- Variables ----------------------------------------------
RM=/bin/rm
MV=/bin/mv
LS=/bin/ls
GREP=/bin/grep
TAIL=/usr/bin/tail
MKDIR=/bin/mkdir
RSYNC=/usr/bin/rsync

NR_OF_DAILY=7
NR_OF_WEEKLY=4
NR_OF_MONTHLY=3
NR_OF_QUARTERLY=4

DAY=`date +%d`
MONTH=`date +%m`
DOW=`date +%u`
DAYDATE=`date +%Y%m%d`
LOGFILE_TMP=/tmp/"$DAYDATE"_tmp.log

# Which to shift?
if [[ $DOW -eq 1 ]] ; then
    SHIFT_WEEK=true
    if [[ $DAY -le 7 ]] ; then
        SHIFT_MONTH=true
        if [[ $MONTH -eq 1 ]] || [[ $MONTH -eq 4 ]] || \
           [[ $MONTH -eq 7 ]] || [[ $MONTH -eq 10 ]] ; then
            SHIFT_QUARTER=true
        fi
    fi
fi

# ------------- The script itself --------------------------------------
# Performed on the first Monday of every quarter
if [[ $SHIFT_QUARTER = true ]] ; then
    echo "First Monday of quarter. Shifting quarterly." >> $LOGFILE_TMP

# Do for each backup job
    for job in $@ ; do
        SNAPSHOT=/home/bjorn/backup/$job

# Find latest monthly snapshot
        LATEST_MONTHLY=`$LS $SNAPSHOT | $GREP monthly | $TAIL -1`

# Shift snapshots
# Step 1: delete the oldest snapshot, if it exists
        if [ -d $SNAPSHOT/quarterly.$(($NR_OF_QUARTERLY - 1)) ] ; then
            $RM -rf $SNAPSHOT/quarterly.$(($NR_OF_QUARTERLY - 1))
        fi

# Step 2: shift the other snapshots(s) by one, if they exist
        for ((quarter = $(($NR_OF_QUARTERLY - 2)) ; quarter >= 0 ; quarter -= 1)) ; do
            if [ -d $SNAPSHOT/quarterly.$quarter ] ; then
                $MV $SNAPSHOT/quarterly.$quarter $SNAPSHOT/quarterly.$(($quarter + 1))
            fi
        done

# Step 3: move the latest monthly snapshot up to quarterly, if one exists
        if [ -n "$LATEST_MONTHLY" ] ; then
            $MV $SNAPSHOT/$LATEST_MONTHLY $SNAPSHOT/quarterly.0
        fi
    done
fi

# Do the same for months on the first Monday of every month
if [[ $SHIFT_MONTH = true ]] ; then
    echo "First Monday of month. Shifting monthly." >> $LOGFILE_TMP

    for job in $@ ; do
        SNAPSHOT=/home/bjorn/backup/$job
        LATEST_WEEKLY=`$LS $SNAPSHOT | $GREP weekly | $TAIL -1`

        if [ -d $SNAPSHOT/monthly.$(($NR_OF_MONTHLY - 1)) ] ; then
            $RM -rf $SNAPSHOT/monthly.$(($NR_OF_MONTHLY - 1))
        fi

        for ((month = $(($NR_OF_MONTHLY - 2)) ; month >= 0 ; month -= 1)) ; do
            if [ -d $SNAPSHOT/monthly.$month ] ; then
                $MV $SNAPSHOT/monthly.$month $SNAPSHOT/monthly.$(($month + 1))
            fi
        done

        if [ -n "$LATEST_WEEKLY" ] ; then
            $MV $SNAPSHOT/$LATEST_WEEKLY $SNAPSHOT/monthly.0
        fi
    done
fi

# And for weeks on every monday
if [[ $SHIFT_WEEK = true ]] ; then
    echo "Monday. Shifting weekly." >> $LOGFILE_TMP

    for job in $@ ; do
        SNAPSHOT=/home/bjorn/backup/$job
        LATEST_DAILY=`$LS $SNAPSHOT | $GREP daily | $TAIL -1`

        if [ -d $SNAPSHOT/weekly.$(($NR_OF_WEEKLY - 1)) ] ; then
            $RM -rf $SNAPSHOT/weekly.$(($NR_OF_WEEKLY - 1))
        fi

        for ((week = $(($NR_OF_WEEKLY - 2)) ; week >= 0 ; week -= 1)) ; do
            if [ -d $SNAPSHOT/weekly.$week ] ; then
                $MV $SNAPSHOT/weekly.$week $SNAPSHOT/weekly.$(($week + 1))
            fi
        done

        if [ -n "$LATEST_DAILY" ] ; then
            $MV $SNAPSHOT/$LATEST_DAILY $SNAPSHOT/weekly.0
        fi
    done
fi

# Days shall always be shifted
for job in $@ ; do
    SNAPSHOT=/home/bjorn/backup/$job

    if [ -d $SNAPSHOT/daily.$(($NR_OF_DAILY - 1)) ] ; then
        $RM -rf $SNAPSHOT/daily.$(($NR_OF_DAILY - 1))
    fi

    for ((day = $(($NR_OF_DAILY - 2)) ; day >= 0 ; day -= 1)) ; do
        if [ -d $SNAPSHOT/daily.$day ] ; then
            $MV $SNAPSHOT/daily.$day $SNAPSHOT/daily.$(($day + 1))
        fi
    done

    $MKDIR -p $SNAPSHOT/daily.0    # -p also creates $SNAPSHOT itself for brand-new jobs
done

exit 0
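
The shift logic above can be exercised on its own against a scratch directory. A sketch for the daily case, using plain commands instead of the script's variables:

```shell
# Exercise the daily shift loop from prepare_backup.sh in a scratch dir.
set -e
NR_OF_DAILY=3
SNAPSHOT=$(mktemp -d)
mkdir "$SNAPSHOT/daily.0" "$SNAPSHOT/daily.1"
touch "$SNAPSHOT/daily.0/marker"    # yesterday's newest increment
# The oldest increment (daily.2 here) is deleted, if it exists...
if [ -d "$SNAPSHOT/daily.$(($NR_OF_DAILY - 1))" ] ; then
    rm -rf "$SNAPSHOT/daily.$(($NR_OF_DAILY - 1))"
fi
# ...the rest shift up by one...
for ((day = $NR_OF_DAILY - 2 ; day >= 0 ; day -= 1)) ; do
    if [ -d "$SNAPSHOT/daily.$day" ] ; then
        mv "$SNAPSHOT/daily.$day" "$SNAPSHOT/daily.$(($day + 1))"
    fi
done
# ...and an empty daily.0 awaits the incoming rsync run.
mkdir "$SNAPSHOT/daily.0"
ls "$SNAPSHOT"
```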

This is script finish_backup.sh, called from the source machine via ssh.

#!/bin/bash

# ------------- Variables ----------------------------------------------
HEAD=/usr/bin/head
TAIL=/usr/bin/tail
CAT=/bin/cat
RM=/bin/rm

DAYDATE=`date +%Y%m%d`
LOGFILE_STD=/home/bjorn/backup/logs/"$DAYDATE"_std.log
LOGFILE_ERR=/home/bjorn/backup/logs/"$DAYDATE"_err.log
LOGFILE_LOG=/home/bjorn/backup/logs/"$DAYDATE"_log.log
LOGFILE_TMP=/tmp/"$DAYDATE"_tmp.log

# ------------- The script itself --------------------------------------
# Append info to log file
echo >> $LOGFILE_STD
echo "Printout from df -h:" >> $LOGFILE_STD
df -h >> $LOGFILE_STD

# Create tmp file to send
$HEAD -1 $LOGFILE_STD >> $LOGFILE_TMP
$TAIL -9 $LOGFILE_STD >> $LOGFILE_TMP
echo >> $LOGFILE_TMP

# Check if errors were reported
if [ -e $LOGFILE_ERR ] ; then
    echo "The following errors were reported:" >> $LOGFILE_TMP
    $CAT $LOGFILE_ERR >> $LOGFILE_TMP
else
    echo "No errors were reported." >> $LOGFILE_TMP
fi

echo >> $LOGFILE_TMP

# Check for updated files and append to tmp file
echo "The following files were added, updated or deleted:" >> $LOGFILE_TMP
egrep '[<>ch.*][fLDS][c.+ ?][s.+ ?][t.+ ?][p.+ ?][o.+ ?][g.+ ?][u.+ ?][a.+ ?][x.+ ?]' \
      $LOGFILE_LOG >> $LOGFILE_TMP

# E-mail file to me
mail -s "Backup of plysch $DAYDATE" my.email.address@gmail.com < $LOGFILE_TMP

# Remove tmp file
$RM $LOGFILE_TMP

exit 0
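
The egrep pattern matches rsync's itemized-change strings (the >f.stp... codes produced by --log-file-format=%o %i %n) while skipping rsync's other chatter. A quick check against two fabricated sample lines, with the pattern shortened to its first six character classes:

```shell
# Check the change-line filter against fabricated sample log lines.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2008/03/20 02:33:31 [12307] send >f.stp...... foobar2000.cfg
2008/03/20 02:33:31 [12307] building file list
EOF
# Only the first line carries an itemized-change string and survives:
grep -E '[<>ch.*][fLDS][c.+ ?][s.+ ?][t.+ ?][p.+ ?]' "$LOG"
```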

Step 5: Further Automation

To automate things completely, we take advantage of two facts: the Windows Task Scheduler can wake the machine from S3 standby or hibernate, and a BIOS setting lets us wake the machine from the off state. BIOS settings differ from machine to machine, but most machines have this capability. If yours doesn't, simply never turn your machine off and use standby or hibernate instead. My machine is usually in standby after I have used it, and off after my wife has used it, which is why I use both strategies.

In the task scheduler, schedule the machine to run the file rsync.bat every morning at 02:30. Be sure to tick the checkbox "Wake the computer to run this task".

This is file rsync.bat.

c:
chdir C:\cygwin\bin
bash /home/Björn/rsync/run_remote_jobs.sh

Then, in the BIOS, set your machine to boot every morning at 02:25. That way it will have plenty of time to boot before the task executes at 02:30. On my machine the BIOS will wake the machine only if it is off, not if it is in standby or hibernate. By using both the task scheduler and the BIOS setting, we can be sure that the machine wakes up, regardless of which state it is in.

Example E-mail Report

Each morning I receive an e-mail report looking like this. It gives me confirmation that everything works OK, and I can also see if my backup disk is getting full.

Example e-mail report.

from	root@nslu2
to	my.e-mail.address@gmail.com,
date	Thu, Mar 20, 2008 at 2:33 AM
subject	Backup of plysch 20080320
mailed-by	gmail.com

Backup jobs started at 02:30:15.
Backup jobs finished at 02:34:06.

Printout from df -h:
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             110G  889M  104G   1% /
tmpfs                  15M     0   15M   0% /lib/init/rw
udev                   10M   44K   10M   1% /dev
tmpfs                  15M     0   15M   0% /dev/shm
/dev/sdb1             459G  144G  293G  33% /home/bjorn/backup

No errors were reported.

The following files were added, updated or deleted:
2008/03/20 02:33:31 [12307] >f.stp.....foobar2000/Application Data/PlaybackStatistics.dat
2008/03/20 02:33:31 [12307] >f.stp.....foobar2000/Application Data/foobar2000.cfg
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/lyrics.xml
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/theme.fth
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000001.fpl
2008/03/20 02:33:31 [12307] >f.stp.....foobar2000/Application Data/playlists/00000002.fpl
2008/03/20 02:33:31 [12307] >f.stp.....foobar2000/Application Data/playlists/00000003.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000004.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000005.fpl
2008/03/20 02:33:31 [12307] >f.stp.....foobar2000/Application Data/playlists/00000006.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000007.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000008.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000009.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000010.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000011.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000012.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000013.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000014.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000015.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000016.fpl
2008/03/20 02:33:31 [12307] >f..tp.....foobar2000/Application Data/playlists/00000017.fpl
2008/03/20 02:33:31 [12307] >f.stp.....foobar2000/Application Data/playlists/00000018.fpl
2008/03/20 02:33:31 [12307] >f.stp.....foobar2000/Application Data/playlists/index.dat
2008/03/20 02:33:38 [12321] >f+++++++++Skrivb~1/Cygwin Bash Shell.URL
Last edited by bjohv052.
Originally by bjohv052.
Page last modified on March 25, 2008, at 11:21 AM