Recently, I’ve been thinking more and more about backups for my small (but growing) homelab. The golden rule is to follow the 3-2-1 method for backups:
- 3 copies of your data
- 2 different types of media
- 1 backup offsite
Current setup
Currently, I keep an encrypted external HDD at home and another at work. Every couple of weeks, I perform a backup to both and rotate the drives (this covers a 2-1-1 backup).
Planned setup
I’d like to add cloud storage for a full 3-2-1 backup. My idea is to centralize all my backups to one location, then send the backups offsite to a cloud storage provider. The setup below is my final goal and will fulfill my 3-2-1 requirement.
Storage providers
For this, I was looking for a raw storage endpoint with some sort of API or command line interface. I was not interested in a file syncing service (e.g., Google Drive or Dropbox) or a cloud backup solution (e.g., Crashplan or Carbonite). While looking for cloud storage providers, I compared the following:
I ended up choosing Backblaze B2 storage. They seemed to be the cheapest, had the most straightforward pricing, and were the easiest to set up with the backup program I was using.
Full disclosure, I was already a Backblaze fanboy. I was already subscribed to their great blog where they post yearly stats on their hard drives. But, if that’s not enough, they offer free restores via USB flash drive or external HDD if your data is too big to download. And if you need to upload up to 40TB of data, you can request a Fireball (not free, but still cool).
Backup programs
While looking for backup programs, I compared the following:
I ended up choosing Duplicity. It seemed to be the most popular program, supports incremental backups and B2 storage, and supports encryption with GPG.
Sign up and install B2
Sign up for a B2 account if you don’t have one already. You can download the official B2 command line tool from these instructions, but I’m installing the package from the AUR using pacaur. Note - You can create a bucket from the website if you don’t want to install the B2 command line tool.
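For reference, installing from the AUR might look like the following (the package name below is an assumption on my part; check the AUR for the current name):

```bash
# Install the official B2 command line tool from the AUR (package name assumed)
pacaur -S backblaze-b2
```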
Set up a bucket
Start by authorizing your account (substitute your account ID as needed). You will be prompted for your Application Key, which you can get in the B2 control panel.
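A sketch of that step with a placeholder account ID (depending on your CLI version, the command may use underscores instead of hyphens):

```bash
# Authorize against your B2 account; the Application Key is requested interactively
b2 authorize-account <accountID>
```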
Now, create a bucket (make sure it is allPrivate). The bucket name must be globally unique to all of Backblaze, not just your account. You can have up to 100 buckets per account.
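For example, with a placeholder bucket name:

```bash
# Create a private bucket; the name must be unique across all of Backblaze
b2 create-bucket <bucketName> allPrivate
```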
Finally, list your available buckets.
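Listing them looks like this:

```bash
# Show all buckets on the account, with their IDs and types
b2 list-buckets
```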
I highly recommend you encrypt your backups using GPG. It’s integrated into Duplicity and will protect your files from prying eyes. I won’t be covering it here, but check out my other guide on how to create a GPG key. For this setup, I will be using a separate key for encryption and signing.
Disclaimer - Don’t lose the keys or the passphrases to the keys. For example, don’t backup the GPG keys using Duplicity, then have your hard drive crash, which would require the GPG keys to unlock Duplicity. Store the keys on a separate backup by themselves.
First, install Duplicity.
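On Arch, that's a single package from the official repositories (adjust for your distribution's package manager):

```bash
# Install Duplicity from the Arch repositories
sudo pacman -S duplicity
```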
Duplicity basics
The basic syntax for Duplicity is below.
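Roughly speaking, a backup takes the local source directory first and the target URL second, and a restore reverses the two; a sketch with placeholder hosts and paths:

```bash
# Backup: local source directory first, remote target URL second
duplicity /home/user scp://user@backup.example.com/backups/user

# Restore: remote URL first, local destination second
duplicity scp://user@backup.example.com/backups/user /home/user
```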
To back up directly to a server via SFTP, use a command similar to the one below.
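A sketch with a placeholder host and paths:

```bash
# Back up /home/user over SFTP to a remote server
duplicity /home/user sftp://backupuser@backup.example.com/backups/user
```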
To back up a folder to your B2 bucket, use a command similar to the one below. Substitute your account ID, application key, and bucket name as needed.
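A sketch with placeholders, using separate keys for encryption and signing as mentioned above:

```bash
# Back up /home/user/documents to B2, encrypted and signed with GPG
duplicity --encrypt-key <encryption-key-id> --sign-key <signing-key-id> \
    /home/user/documents \
    "b2://<accountID>:<applicationKey>@<bucketName>"
```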
Duplicity also handles rotating backups. Here, I'm removing backups older than 3 months.
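For example (note that `--force` is required to actually delete the old backup sets):

```bash
# Remove backup sets older than 3 months from the bucket
duplicity remove-older-than 3M --force "b2://<accountID>:<applicationKey>@<bucketName>"
```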
Duplicity script
Because Duplicity has so many command line options, it's easier to set up a script and run it via cron.
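A minimal sketch of such a wrapper (keys, paths, and credentials are placeholders; adapt it to your own setup):

```bash
#!/bin/bash
# Minimal Duplicity wrapper script -- all values below are placeholders
export PASSPHRASE="gpg-key-passphrase"

GPG_KEY="<encryption-key-id>"
SOURCE="/home/user/documents"
TARGET="b2://<accountID>:<applicationKey>@<bucketName>/documents"

# Incremental backup, forcing a new full backup once a month
duplicity --encrypt-key "$GPG_KEY" --full-if-older-than 1M "$SOURCE" "$TARGET"

# Rotate: drop anything older than 3 months
duplicity remove-older-than 3M --force "$TARGET"

unset PASSPHRASE
```

A crontab entry such as `0 2 * * * /usr/local/bin/duplicity-backup.sh` would then run it every night at 02:00.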
Hope this helps!
-Logan
Making regular backups of your data is important; I hope nobody disputes that. Of course, some data is more important than other data, but if you want to keep things around, I recommend storing your data in a good place and making sure backups run completely automatically.
I do this using an Ansible role, my Nextcloud, duplicity, and, since a few days ago, a cloud backup provider called Backblaze. I have been doing these backups for quite a while already, but until switching I used Hetzner storage boxes for them.
Backup strategy
Generally speaking, everyone should be aware of the 3-2-1 backup strategy. While there is some room for customization, it's never wrong to apply it. The idea is that you have 3 copies of your data: 2 of them on-site on two different media, and a third one off-site, either at a friend's or family member's place or at a cloud provider.
To achieve this, I keep my documents on my local devices, for example, but also use the desktop client to synchronize them to my Nextcloud instance. That's two copies of the files on different media. The actual off-site copy is the one I'll talk about in this article, which uses duplicity to create a backup of Nextcloud.
It's important to keep in mind that, since your Nextcloud instance counts as "on-site", you should use another provider for your off-site backup. That's also the reason why I'm switching away from Hetzner storage boxes. They are great and work perfectly fine, but since I'm moving more and more services to Hetzner cloud instances, I don't want to store my backups there as well.
Using two different cloud providers is important, because mistakes happen: from an (accidental) deletion of all the products in my account at a cloud provider, to large-scale hardware failures, to a payment method that stops working. In all of these cases, if your off-site backup has anything in common with your "on-site" backup, you are in trouble. So spreading the risk is important here.
But enough of the strategy and theory, let’s get started.
Account creation and setting up the bucket
For server backups with duplicity, Backblaze offers its so-called "B2" storage. It's basically like Amazon's S3, just with less proxying and one extra API request to store data.
During account creation you select the region where your backups are stored. Since I prefer to have my data in the EU, I explicitly selected that region in the sign-up form. Make sure you make your choice there: it's easy to miss, and I couldn't find any setting to change it after account creation.
Otherwise, it's like everywhere else: enter an email address and a password, set up 2FA, confirm your email address, add billing information, and create a private "bucket" to hold your backup data.
Then switch to the "App Keys" dialogue, where you can create the access tokens for your buckets that duplicity will use to back up your data.
In the first field you enter the name of the key. To keep it simple, I use the same name as for the bucket itself, but you can be creative here. In the second field, "All" is selected by default. Of course, you shouldn't allow "machine A" to delete backups for "machine B", so I highly recommend restricting this access token to a single bucket. Now click "Create New Key" and your access token will be generated.
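If you prefer the command line over the web interface, the B2 CLI can create a bucket-restricted key as well; a sketch, with the capability list chosen as an example:

```bash
# Create an application key that only works for the bucket "backup-example"
b2 create-key --bucket backup-example backup-example \
    listBuckets,listFiles,readFiles,writeFiles,deleteFiles
```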
The provided keyID and applicationKey can be used to build the b2:// URL that is our future backup target. To do this, you put everything together following this schema: b2://<keyID>:<applicationKey>@<bucket name>. In this example it results in b2://XXXXXXXXXXXXXX80000000007:XXXXXXXXXXXXXXXXXXXXXXXXXXX1Vd0@backup-example.
Make sure to store this URL for later steps.
Duplicity on CentOS 7 with B2 backend
Duplicity is a wonderful tool for backups on Linux. Besides being able to handle all kinds of storage backends, from FTP, SMB, and S3 to SFTP and B2 storage, it also integrates with GnuPG to encrypt the entire content of your backups. This is essential, as you usually use "untrusted" storage for off-site backups. While I'm reasonably sure that they take care of data center security and the destruction of data and disks, I don't want to risk anything, so encrypting data before sending it always keeps things safe.
The main problem is that in order to use the B2 backend, duplicity requires the Python library b2sdk, which is not packaged for CentOS 7. I try to avoid installing things using pip, as that either messes with your system installation or never looks spotless, so instead I went for a solution that involves containers. As all hosts run moby-engine anyway, why not?
To do this, I started building a container for duplicity: take a Python base image, add duplicity and the aforementioned b2sdk dependency, send everything through CI and off to Quay, and here we go.
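The exact image I build is not shown here, but a rough sketch of such a container could look like the following (the image name, base image, and package choices are assumptions, not the image I actually publish to Quay; I use a plain Debian base here to keep the sketch short):

```bash
# Build a minimal duplicity image with the b2sdk dependency -- all names are placeholders
docker build -t quay.io/example/duplicity - <<'EOF'
FROM debian:bullseye-slim
RUN apt-get update \
    && apt-get install -y --no-install-recommends duplicity python3-pip \
    && pip3 install --no-cache-dir b2sdk \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["duplicity"]
EOF
```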
Bringing it to production with backup_lvm
To get the container running in production, I updated my existing backup solution. The backup_lvm Ansible role runs on a daily basis and, before this change, used duplicity installed directly on CentOS. With a few changes, things are now running in a container and are more confined than ever before.
Altogether, the backup_lvm role now works like this (a rough shell sketch of the same steps follows the list):
- Create a directory for the backups
- Take a snapshot from all LVM volumes configured
- Mount those snapshots read-only in the directory that was created
- Take a backup of all volumes and push them, once encrypted, to Backblaze
- Unmount all snapshots
- Delete all snapshots
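For reference, without Ansible the same steps translate roughly to the following shell commands (volume names, snapshot sizes, key IDs, and paths are placeholders; error handling is omitted):

```bash
#!/bin/bash
# Manual sketch of the backup_lvm workflow -- all names below are placeholders
BACKUP_DIR=/mnt/backup-snapshots
TARGET="b2://<keyID>:<applicationKey>@<bucket-name>/$(hostname)"

# 1. Create a directory for the backups
mkdir -p "$BACKUP_DIR/data"

# 2. Take a snapshot of the LVM volume
lvcreate --snapshot --size 5G --name data_snap /dev/vg0/data

# 3. Mount the snapshot read-only
mount -o ro /dev/vg0/data_snap "$BACKUP_DIR/data"

# 4. Encrypt and push the backup to Backblaze
duplicity --encrypt-key <gpg-key-id> "$BACKUP_DIR/data" "$TARGET/data"

# 5. + 6. Unmount and delete the snapshot again
umount "$BACKUP_DIR/data"
lvremove -y /dev/vg0/data_snap
```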
If you don't want to use my Ansible role, you can still run the container with my settings in your own, minimal playbook:
Just make sure to set the variables and you are ready to go!
To use GnuPG with duplicity, make sure you generate a key and save a copy of both the private and the public key in a secure place off your machine, so that you can recover your backup if the machine disappears one day.
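A minimal sketch of generating and exporting such a key pair (follow the interactive prompts; the email address and file names are placeholders):

```bash
# Generate a new GPG key pair
gpg --full-generate-key

# Export both keys so they can be stored somewhere safe, off this machine
gpg --armor --export you@example.com > duplicity-public.asc
gpg --armor --export-secret-keys you@example.com > duplicity-private.asc
```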
Update 2020-04-16: If you run CentOS or similar, you might need to run gpgconf --kill gpg-agent on the host after generating your key and before running the container, to prevent the container from failing when it connects to the local GPG agent, whose version differs too much.
Conclusion
Backblaze appears to be a viable alternative to my existing backup storage and helps keep my data available even in the worst-case scenario, where the entire account that hosts my infrastructure disappears tomorrow.
With duplicity, daily backups work out nicely, and the containerized version makes it easy to evolve the setup even further. This tutorial should provide rather detailed insight into how to decide on a good backup strategy, set up your backup storage, and run your backups in an automated fashion. I wish that you always have, but never need, a restorable backup!