Secure backups for your personal servers using duplicity and Amazon S3

I wanted to make secure backups of a few servers, but none of the guides I found online sufficed for my purposes. I wanted the following:

Only free software and well-known standardized encryption methods;
No proprietary formats, preferably restorable using just standard Linux command line tools;
Scalable: I do not pay for storage I do not use, and I do not have to constantly check whether my storage is full;
Secure against advanced actors;
Relatively cheap.

My solution is to use duplicity to make daily PGP-encrypted full backups to Amazon S3 storage. I have set up the S3 storage to regularly delete old backups, because we do not need to keep all backups forever. The S3 user that stores the backups has append-only access to the S3 storage. After initial setup, the server that is backed up only ever has the public PGP key of the backups. An optional second server functions as backup monitoring and a front desk for restoring individual files or entire backups.

This guide helps you implement this backup system yourself.

Why is this approach more secure?

There are many duplicity + S3 guides on the internet. This one takes a more roundabout route to arrive at a more secure result.

My risk analysis was:

An attacker may access the data in the backups at the storage provider or intercept them in-transit.
An attacker may compromise the server and access the data in the backups.
An attacker may compromise the server and destroy the backups.

To mitigate data access in storage or in transit, I use PGP encryption. To mitigate data access by server compromise, I use PGP encryption and do not provide the private key on the server. To mitigate data destruction by server compromise, I limit the access of the S3 user to append-only and let Amazon itself do the removal of old backups.

Prerequisites

You will need:

an Amazon S3 account;
a server to back up, running any OS that supports duplicity with PGP keys and S3 storage (I will use Debian in examples);
an optional second server that will function as backup monitor, running any OS that supports duplicity with PGP keys and S3 storage (I will use Debian in examples);
command-line root access to both servers.

Howto

This howto consists of the following steps:

Setting up an Amazon S3 bucket
Setting up duplicity on the server
Optional: set up a backup monitoring server
Instructions for daily use
Troubleshooting

Setting up an Amazon S3 bucket

Create bucket

Log in to your Amazon S3 account and create a new bucket. This bucket will contain your backups. You can share the bucket between different servers, using the same credentials for each. By using just one bucket and account, I avoid having to log into Amazon S3 for every server I want to add to the backup system.

Create the bucket in the region that suits you best. I chose Ireland, because it is close to my servers. This choice affects the S3 hostname used in the backup script later on. If you do not really care about the location of the data either way, choose Ireland to make it easier to follow this guide.

If the credentials of this account leak, an attacker could use them to store their own data in your bucket. If this is a relevant attack scenario for you, feel free to use a different bucket and account for each of your servers.

Set lifecycle policy

Select your bucket, click Properties and open the Lifecycle heading. Choose ‘Add rule’ and set a rule on the whole bucket to permanently delete all items after 6 days (for one week of backups).
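
For reference, the same rule can be written down as an S3 lifecycle configuration document. This is only a sketch: the rule ID and the output file name are hypothetical, and the guide itself uses the web console rather than a file.

```shell
#!/bin/sh
# Sketch: write the "delete everything after 6 days" rule as JSON.
# The file name and the rule ID are assumptions, not part of the original guide.
LIFECYCLE_FILE="${TMPDIR:-/tmp}/backup-lifecycle.json"

cat > "$LIFECYCLE_FILE" <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 6 }
    }
  ]
}
EOF
```

If you happen to have the AWS command line client installed (an assumption, this guide does not require it), a file like this can be applied with aws s3api put-bucket-lifecycle-configuration --bucket <bucket_name> --lifecycle-configuration file://<path to the file>.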

Create Amazon IAM user

Switch to the Amazon IAM service. Click on ‘Users’ and ‘Create New Users’. Choose a username for the user, such as ‘backup-append’. Leave ‘Generate an access key for each user’ on. Click ‘Create’. Open the security credentials (the access key ID and the secret access key) and copy-paste these to your system. You will need them later. Click ‘Close’ to go back to the user overview. Click on the new user and copy-paste its ARN to your system. The ARN starts with ‘arn:’ and ends with the username. You will need it later.

Grant the IAM user append access to the bucket

Go back to your new S3 bucket. Click ‘Properties’, open ‘Permissions’ and click ‘Add bucket policy’. Click the small link at the bottom to open the AWS Policy Generator.

Select ‘S3 Bucket Policy’ as the policy type. Set ‘Effect’ to ‘Allow’. Set Principal to the ARN of your new IAM user that you copied in the previous instruction. As actions, select:

AbortMultipartUpload
GetBucketLocation
GetObject
ListBucket
ListBucketMultipartUploads
ListMultipartUploadParts
PutObject

As Amazon Resource Name, insert arn:aws:s3:::<bucket_name>/*,arn:aws:s3:::<bucket_name>, replacing <bucket_name> with the name of your bucket. Click ‘Generate policy’ and copy-paste the result into the Bucket Policy Editor window from which you opened the AWS Policy Generator.
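
The generated policy should roughly resemble the document produced by the sketch below. The bucket name and the user ARN are hypothetical placeholders, and the real generator output also contains identifier fields that are omitted here. Note that no delete action is granted, which is what keeps a compromised server from removing old backups.

```shell
#!/bin/sh
# Sketch of the bucket policy described above.
# BUCKET_NAME and USER_ARN are made-up example values; substitute your own.
BUCKET_NAME="my-backup-bucket"
USER_ARN="arn:aws:iam::123456789012:user/backup-append"
POLICY_FILE="${TMPDIR:-/tmp}/bucket-policy.json"

cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "${USER_ARN}" },
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::${BUCKET_NAME}/*",
        "arn:aws:s3:::${BUCKET_NAME}"
      ]
    }
  ]
}
EOF
```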

Congratulations, you’ve now set up your S3 bucket to store the backups you make.

Setting up duplicity on the server

Execute all these commands on the server that you will be backing up.

Install software

The Debian packages duplicity and python3-boto are required (on older Debian releases, the latter is called python-boto). The package rng-tools is recommended but optional. Install them all with sudo apt-get install duplicity python3-boto rng-tools.

Update 2022-03-20: Newer versions of Debian (bullseye and up, I think) require python3-boto instead of python-boto. I updated the instructions accordingly.

Generating a PGP key

As the root user, generate a PGP key:

sudo gpg --gen-key

Choose option 1 (RSA), a key length of 3072 bits, expiry 0 (does not expire), y (does not expire), a real name that indicates its purpose (such as ‘My Backup Key for server.example.net’), your email address, no comment and O for OK.

For the passphrase, make sure you use a value that represents at least 128 bits of entropy. Thus, you may use any of:

the XKCD password scheme, with nine random words out of a list of 20,000;
a sequence of 28 random lowercase letters;
a sequence of 22 random uppercase and lowercase letters and digits.

If you only have command line tools at your disposal, you can quickly generate a passphrase with pwgen -s 22 1. Be sure to write it down and store it in a secure (with respect to confidentiality and availability) place.
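
The 128-bit figure can be checked with a little arithmetic: a passphrase of n symbols drawn uniformly at random from an alphabet of k symbols carries n × log2(k) bits of entropy. A minimal sketch, where the function name entropy_bits is my own invention:

```shell
#!/bin/sh
# Entropy in bits of a passphrase of $1 random symbols
# drawn uniformly from an alphabet (or word list) of size $2.
entropy_bits() {
    awk -v n="$1" -v k="$2" 'BEGIN { printf "%.1f\n", n * log(k) / log(2) }'
}

entropy_bits 9 20000   # nine words from a 20,000-word list
entropy_bits 28 26     # 28 lowercase letters
entropy_bits 22 62     # 22 letters (both cases) and digits
```

All three schemes come out slightly above 128 bits. The calculation only holds if every symbol is chosen at random; a passphrase you make up yourself has far less entropy than its length suggests.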

Generating a key of this size takes a long time. If you have installed the rng-tools package, open a second session on the server and run sudo rngd -f -r /dev/urandom. This pipes the output of urandom into the randomness pool of /dev/random, which speeds up the key generation process. Kill the rngd process with Ctrl-C after key generation is complete. This method is still secure, as Filippo Valsorda explains so eloquently in his presentation at 32C3 (recording, brief summary).

Type sudo gpg --list-keys and make a note of the key ID of the key you just generated (it is the eight characters directly after pub 3072R/).

Make a backup targets file

Make a file /etc/backup-targets that contains a list of all the directories that should be included in the backup. This file should end with - / on a separate line. The - / excludes everything under the root that is not explicitly listed above it, so only the listed directories end up in the backup.

An example backup-targets file:

/etc
/home
/var/www
- /
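
A mistake in this file is easy to miss, because the backup will still run. The sketch below checks the simple layout used in this guide: absolute paths, followed by a final - / line. The function name check_backup_targets is hypothetical, and the check is stricter than duplicity itself, which accepts more elaborate file lists.

```shell
#!/bin/sh
# Sketch: sanity-check a backup targets file in the format shown above.
check_backup_targets() {
    file="$1"
    # The last non-empty line must exclude the root itself.
    last=$(grep -v '^[[:space:]]*$' "$file" | tail -n 1)
    if [ "$last" != "- /" ]; then
        echo "$file: last line must be '- /'" >&2
        return 1
    fi
    # Every other non-empty line should be an absolute path.
    if grep -v '^[[:space:]]*$' "$file" | sed '$d' | grep -qv '^/'; then
        echo "$file: include lines must be absolute paths" >&2
        return 1
    fi
    return 0
}
```

Run it as check_backup_targets /etc/backup-targets; it prints nothing and returns 0 if the file looks right.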

Create the backup cron job

Now, create the cron job that will execute duplicity every night, by saving it as /etc/cron.daily/duplicity-full:

#!/bin/sh

export AWS_ACCESS_KEY_ID=<Enter your Amazon IAM user key ID here>
export AWS_SECRET_ACCESS_KEY=<Enter your Amazon IAM user secret key here>

duplicity full --encrypt-key <PGP key ID> --include-filelist /etc/backup-targets / s3://s3-eu-west-1.amazonaws.com/<bucket-name>/<server identifier> 2>&1 | logger -tmy-backup

export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=

Update 2022-03-20: Apparently, the --include-globbing-filelist option is now obsolete and has been replaced by --include-filelist. I updated the instructions accordingly.

You have copied the Amazon IAM user key ID and secret key when creating the IAM user. You have copied the PGP key ID when creating the PGP key. You have chosen the bucket name when creating the bucket. Choose a server identifier yourself, such as its fully qualified domain name (FQDN).

Make sure to only include the AWS credentials in the export commands before the duplicity command. The export commands after the duplicity command are meant to clear these variables.

The Amazon AWS hostname at which your bucket is available depends on the region in which you created the bucket at the first step. s3-eu-west-1 is the value for Ireland. You can look up the value for your region in this list.

Make the script executable and world-unreadable

… by typing sudo chmod 700 /etc/cron.daily/duplicity-full.

Export the PGP key

Export the PGP key to the home directory of your user:

sudo gpg --export-secret-key --armor > /home/<user>/backup-secret-key-<server identifier>.asc

Change ownership of the key to the user (sudo chown <user>: /home/<user>/backup-secret-key-<server identifier>.asc) and download it to your computer. Store the secret key in a secure location (with respect to confidentiality and availability), preferably separately from the passphrase.

Delete the PGP key from the home directory of your user:

rm /home/<user>/backup-secret-key-<server identifier>.asc
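
The export above leaves a copy of the secret key in the keyring of the root user, while the goal stated at the start of this guide is that the server only ever holds the public key. A minimal sketch of removing the secret key once you have verified that your downloaded copy is intact; the function name drop_backup_secret_key is my own, and it must be run in a root shell (for example after sudo -i) because the key lives in root's keyring:

```shell
#!/bin/sh
# Sketch: remove the secret half of the backup key, keep the public half.
# gpg asks for confirmation before it deletes anything.
drop_backup_secret_key() {
    key_id="$1"
    gpg --delete-secret-keys "$key_id" || return 1
    # gpg --list-secret-keys fails if no secret key with this ID is left.
    if gpg --list-secret-keys "$key_id" >/dev/null 2>&1; then
        echo "secret key $key_id is still present" >&2
        return 1
    fi
    echo "only the public key for $key_id remains"
}
```

Call it as drop_backup_secret_key <PGP key ID>. Encrypting backups only needs the public key, so the nightly cron job keeps working afterwards.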

Optional: set up a backup monitoring server

Backups are only as good as your ability to restore them. Imagine that you need to restore a backup, but you cannot figure out how to access or decrypt the backups you made. Even worse, imagine that your backup process failed a few weeks earlier but you did not notice.

To prevent such cases, set up a backup monitoring server.

I will assume that you have a server running, upon which you have taken your usual security measures.

Install software

To set up the backup monitoring server, install the Debian packages duply and python-boto: sudo apt-get install duply python-boto. On newer Debian releases, use python3-boto instead, as on the server that is backed up.

Add a group for all users that the server should monitor

sudo addgroup to-backup

We will create a separate user for every server that we want to monitor. In this example, we will assume that all servers have a FQDN of the form <hostname>.example.net, and that <hostname> is the server identifier we used earlier. The corresponding user on the backup monitor server will then also be named <hostname>. For example, the server identifier of mail.example.net is ‘mail’ and the username on the backup monitor server is also ‘mail’.

Create default files for home directories

Each user will need the same content in their home directory. We therefore create it beforehand in /etc/skel:

sudo mkdir /etc/skel/.ssh
sudo touch /etc/skel/.ssh/authorized_keys
sudo mkdir -p /etc/skel/.duply/backup
sudo chmod -R 700 /etc/skel/.duply
sudo touch /etc/skel/.duply/backup/conf
sudo touch /etc/skel/.duply/backup/exclude
sudo chmod 600 /etc/skel/.duply/backup/*
sudo touch /etc/skel/set-passphrase.sh
sudo touch /etc/skel/remove-passphrase.sh
sudo chmod 700 /etc/skel/*-passphrase.sh

In /etc/skel/.duply/backup/conf, we put:

GPG_KEY='<PGP key ID>'
GPG_PW=''
GPG_TEST='disabled'
TARGET=s3://s3-eu-west-1.amazonaws.com/<bucket name>/<server identifier>
TARGET_USER='<Enter your Amazon IAM user key ID here>'
TARGET_PASS='<Enter your Amazon IAM secret key here>'
SOURCE='/'

Do not forget to enter your PGP key ID, bucket name, server identifier, Amazon IAM user key ID and secret key in the indicated spots. If your bucket is in a region other than Ireland, change the URL appropriately.

If you run into errors when running duply, especially “NoAuthHandlerFound: No handler was ready to authenticate.”, see the Troubleshooting section below.

Anyway, next we put the following in /etc/skel/set-passphrase.sh:

#!/bin/bash

read -s -p "Passphrase for PGP key: " passphrase
echo

expression="s/^GPG_PW=.*$/GPG_PW='$passphrase'/"
sed "$expression" -i ~/.duply/backup/conf
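
One caveat: the passphrase is pasted straight into a sed expression, so a passphrase containing /, & or a backslash will break the substitution or end up mangled. A possible workaround, sketched here with hypothetical function names, is to escape those characters first. It does not handle a single quote in the passphrase, which would still break the quoting in the conf file.

```shell
#!/bin/sh
# Sketch: escape characters that are special in a sed replacement.
escape_sed_replacement() {
    printf '%s' "$1" | sed 's|[\\/&]|\\&|g'
}

# Reads a duply conf on stdin and writes it to stdout
# with the GPG_PW line set to the passphrase given as $1.
set_passphrase_line() {
    escaped=$(escape_sed_replacement "$1")
    sed "s/^GPG_PW=.*$/GPG_PW='$escaped'/"
}
```

In set-passphrase.sh, this would mean running the passphrase through escape_sed_replacement before building the expression.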

And in /etc/skel/remove-passphrase.sh:

#!/bin/bash

expression="s/^GPG_PW=.*$/GPG_PW=''/"
sed "$expression" -i ~/.duply/backup/conf

Add a backup access user

Assume the backup to which we wish to provide access is for mail.example.net.

Add a user ‘mail’:

sudo adduser --disabled-password mail
sudo adduser mail to-backup

Assume their identity:

sudo -u mail -i

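Authorize your SSH public key for logins, by putting it in the .ssh/authorized_keys file.

Upload the PGP secret key for the backups of mail.example.net (the file you exported and downloaded earlier) to the home directory of the user mail. The secret key is needed here, because restoring requires decrypting the backups.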

Import the key:

gpg --import backup-secret-key-mail.example.net.asc

Now, set the key to ultimate trust:

gpg --edit-key <PGP key ID>
> trust
> 5 (ultimate)
> y (yes)
> quit

In .duply/backup/conf, insert the correct values for the PGP key ID and the server identifier.

Repeat these instructions for every server whose backup you want to access through this server.

Set up backup monitoring

Create a file /root/bin/remove-all-passphrases.sh:

#!/bin/bash

expression="s/^GPG_PW=.*$/GPG_PW=''/"

for homedir in /home/*/;
do
    if [ -e "$homedir/.duply/backup/conf" ]
    then
        sed "$expression" -i "$homedir/.duply/backup/conf"
    fi
done

Create a second file, /root/bin/monitor-backups.sh:

#!/bin/bash

# We have to change $IFS to be able to split usernames properly
oIFS="$IFS"
IFS=','

# Duply uses the en_US locale, so we do too.
# Export the variable so that the date command below sees it as well.
export LC_TIME=en_US.utf8

# We iterate over the usernames in the group to-backup 
for username in `getent group to-backup | cut -f4 -d:`
do
    # Extract a list of all non-full backups
    nonfull=`sudo -u "$username" duply backup status | grep "^                " | awk '{print $1}' | grep -v "Full"`
    # If this list is non-empty, there are non-full backups
    if [ -n "$nonfull" ];
    then
        echo "Server $username has a number of incremental backups. This should not occur. Notify the administrator of this server."
    fi

    # Pattern: this is what the date reference looks like if a backup was made today.
    pattern=`date +"%a %b %e"`
    # And this is the year that matches that date.
    year=`date +%Y`
    # Take the years of all backups that have month and day of today.
    today=`sudo -u "$username" duply backup status | grep "^                " | grep "$pattern" | awk '{ print $6 }' | uniq`
    # This list should only contain the current year. 
    if [ "$today" != "$year" ]
    then
        echo "Server $username did not make a backup last night. Notify the administrator of this server."
    fi
done

IFS="$oIFS"
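
To see what the date check in this script does, it helps to run the same pipeline on a fixed listing. The sketch below uses a made-up sample modeled on the status format the script expects (backup set lines indented by at least sixteen spaces, with the year as the sixth field); the function names are hypothetical.

```shell
#!/bin/sh
# Illustrative sample only, not real duply output.
sample_status() {
    cat <<'EOF'
 Type of backup set:                            Time:      Num volumes:
                Full         Sun Jan  3 06:25:04 2016                 5
                Full         Mon Jan  4 06:25:03 2016                 5
EOF
}

# stdin: a status listing; $1: a date pattern as produced by date +"%a %b %e".
# Prints the years of all backups made on that weekday, month and day.
backup_years_on() {
    grep "^                " | grep "$1" | awk '{ print $6 }' | uniq
}

sample_status | backup_years_on "Mon Jan  4"
```

The monitoring script compares this output with the current year: if there is no backup from today, the output is empty and the comparison fails, which triggers the warning.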

Make both scripts executable with sudo chmod 700 /root/bin/monitor-backups.sh /root/bin/remove-all-passphrases.sh.

Add a new file, /etc/cron.d/backup-monitoring. The times in this example assume that your backups are all done by 8:15 in the morning. Adjust if this is not the case.

SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

17 8 * * * root /root/bin/remove-all-passphrases.sh
18 8 * * * root /root/bin/monitor-backups.sh 2>&1

Finally, make this file executable as well with sudo chmod +x /etc/cron.d/backup-monitoring.

Congratulations, you’re done setting up your backup access and monitoring server. Read on to learn how to access your backups and add new servers to the monitoring.

Instructions for daily use

How to access the backups

If you want to access the backups for server mail.example.net, log in to the backup monitoring server as the user mail.

Set the PGP passphrase:

./set-passphrase.sh

Now, use duply to access the backup.

Current state of the backups:

duply backup status

List contents of most recent backup:

duply backup list

Fetch a certain file (example: /etc/passwd) from the most recent backup and put it in your home directory:

duply backup fetch etc/passwd passwd

Fetch the same file, but from the backup of three days ago and put it in your home directory:

duply backup fetch etc/passwd passwd 3D

Fetch a certain directory (example: /etc/cron.d) from the most recent backup and put it in your home directory:

duply backup fetch etc/cron.d cron.d

Fetch all files and put them in /home/mail/all-files:

duply backup restore all-files

Always remove the passphrase after accessing the backups:

./remove-passphrase.sh
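
Since it is easy to forget that last step, you could wrap the whole sequence in a small helper. This is a sketch under the assumption that set-passphrase.sh and remove-passphrase.sh live in the home directory as set up earlier; the function name with_passphrase is my own.

```shell
#!/bin/sh
# Sketch: run one command with the passphrase set, then always remove it.
# The body runs in a subshell, so the EXIT trap fires as soon as the
# command finishes, whether it succeeded or not.
with_passphrase() (
    cd "$HOME" || exit 1
    trap './remove-passphrase.sh' EXIT
    ./set-passphrase.sh || exit 1
    "$@"
)
```

For example: with_passphrase duply backup fetch etc/passwd passwd. The nightly remove-all-passphrases.sh cron job remains useful as a safety net.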

How to add a new server to the monitoring

If you want to add a new server to the monitoring, just use the instructions for adding a backup access user.

How to remove a server from the monitoring

If you want to remove a server from the monitoring, just remove the user and its associated home directory: sudo userdel -r <username>.

Troubleshooting

The backup works, but the server keeps asking for a passphrase

Sometimes, servers do this. I am not sure why. The following has helped for some people who followed this guide.

In the file /etc/cron.daily/duplicity-full, replace the line

duplicity full --encrypt-key <PGP key ID> --include-filelist /etc/backup-targets / s3://s3-eu-west-1.amazonaws.com/<bucket-name>/<server identifier> 2>&1 | logger -tmy-backup

with

PASSPHRASE="" duplicity full --encrypt-key <PGP key ID> --include-filelist /etc/backup-targets / s3://s3-eu-west-1.amazonaws.com/<bucket-name>/<server identifier> 2>&1 | logger -tmy-backup

After upgrading python-boto on the backup monitoring server, whenever I run duply I get the error “NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked.”.

Some readers have experienced difficulties with the above config after an upgrade of python-boto, in particular with “NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked.” errors. I found an explanation and a solution in this blog post. Change the TARGET_USER= and TARGET_PASS= statements to the following:

# TARGET_USER='XXXXXXXXXXXX'
# TARGET_PASS='XXXXXXXXXXXX'
export AWS_ACCESS_KEY_ID='XXXXXXXXXXXX'
export AWS_SECRET_ACCESS_KEY='XXXXXXXXXXXX'