On my home network I have a headless Linux server that I use for a variety of tasks. One problem is that I can't use Deja Dup to back up the data on the server because it requires a GUI. I'm not sure why this is, but it is what it is. So I needed to find another way to back up the disks.
For some time I used rsnapshot to take regular backups, but it doesn't have a nice way to restore or find files other than searching through the folder tree, which is fine but sometimes annoying. There was also a lot of manual config I had to set to get it to behave the way I wanted, because it relies on hard links rather than deduplication. Deduplicating backups are nice because each run only stores the data that changed since the last one, which makes efficient use of storage.
Fortunately I was able to find, maybe late to the game actually, two CLI tools that are pretty good for this purpose. The first is borgbackup. The other is restic.
Borgbackup
Borgbackup is a deduplicating backup tool (interesting name, though). It seems to be a pretty modern tool designed to be fast, secure, and efficient, and it focuses on backing up and restoring your data. I had it set up; you can install it from the Ubuntu repos using sudo apt install borgbackup. The downside is that it doesn't really support cloud storage directly, although I think you can get it to work through rclone. On that note, let me talk about rclone.
Rclone
One of the biggest gripes I have about Google is that they don't have a native Linux client/tool to support Google Drive. It is pretty ridiculous in my opinion, but for a long time the rclone project has been a great solution; it's generally cross-platform and supports a lot of cloud storage providers. It is a command line tool to sync files and directories to and from various cloud storage providers. To be honest I'm not too strong with its configuration and structure because on my personal laptop I use insync to sync my Google Drive; I paid $4.99 for a lifetime license and have been grandfathered into the newer versions which are subscription based.
The thing with rclone is that if you just go through the generic setup you will end up using the rclone project's built-in Google API Client ID, which is shared and will most likely be rate-limited and subject to other quotas. So the best thing to do is to create your own Google API Client ID and use that. You'll have to log in to console.cloud.google.com and create a new project. Then you can go to APIs & Services > Credentials and create new credentials for your own use. Save the output from this step: it's the Client ID and Client Secret.
[Figure: Configuring Google API Client ID]
Then, when you're setting up rclone and it asks whether you want to use your own Google API Client ID, input the Client ID and Client Secret you just created, and proceed with the rest of the configuration. This will give you a much smoother sync and backup experience. Essentially you have a way to sync your local filesystem to Google Drive and vice versa. Going to leave it there for now.
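Once the remote exists, a quick check from the shell confirms the credentials work before any backup depends on them. This is a minimal sketch, assuming the remote was named GoogleDrive during rclone config (the name used in the restic commands later in this post); check_remote is just an illustrative helper name, not something rclone provides.

```shell
#!/bin/bash
# Remote name as chosen during `rclone config`; adjust to match yours.
REMOTE="GoogleDrive"

# check_remote: list the top-level folders, then print quota usage.
# Returns non-zero if rclone cannot authenticate or reach the remote.
check_remote() {
    local remote="${1:-$REMOTE}"
    rclone lsd "${remote}:" || return 1
    rclone about "${remote}:" || return 1
}
```

Calling check_remote with no arguments tests the default remote; check_remote SomeOtherRemote tests a different one.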
Restic
So with Borgbackup not being so friendly to cloud storage or rclone, I kept up my search and found restic, which supports rclone natively. The main commands to create and use a restic repo (that's how they name things) are:
restic -r rclone:GoogleDrive:Backup init
restic -r rclone:GoogleDrive:Backup <commands>
The first command creates a backup repository, via rclone, in a folder called Backup on your configured Google Drive (GoogleDrive here is the name of the rclone remote). You can then use the following commands to back up, restore, and list your data:
restic -r rclone:GoogleDrive:Backup backup <path/want/to/backup>
restic -r rclone:GoogleDrive:Backup restore <snapshot_id> --target <path/where/to/restore>
restic -r rclone:GoogleDrive:Backup snapshots
One downside is that restic requires you to encrypt your backups with a password. So you can't view these backups directly in Google Drive; you have to go through restic to view them, or mount them locally with FUSE.
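Since everything has to go through restic, a couple of small helpers make looking inside the backups less tedious. This is a rough sketch rather than a recommended setup: the function names and the default mount point are made up for illustration, and the password file path is a placeholder. RESTIC_REPOSITORY is the environment variable restic reads in place of -r.

```shell
#!/bin/bash
# Setting these once avoids repeating -r and the password prompt.
export RESTIC_REPOSITORY="rclone:GoogleDrive:Backup"
export RESTIC_PASSWORD_FILE="/path/to/user/.restic_password"

# find_in_backups: search all snapshots for paths matching a glob pattern.
find_in_backups() {
    restic find "$1"
}

# browse_backups: expose the snapshots as a read-only FUSE filesystem.
# restic stays in the foreground until interrupted with Ctrl-C.
browse_backups() {
    local mountpoint="${1:-$HOME/restic-mount}"   # hypothetical default
    mkdir -p "$mountpoint" || return 1
    restic mount "$mountpoint"
}
```

For example, find_in_backups '*.conf' lists matching files per snapshot, and browse_backups /mnt/restic lets you walk the snapshots with ordinary tools like ls and cp.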
Automating
To get regular backups without having to run the restic commands manually, the simplest solution is to just set up a cron job. Open your crontab and add an entry like:
crontab -e
# Run backup script daily at 2 AM
0 2 * * * /home/user/backup_script.sh
where backup_script.sh is just a wrapper bash script that runs restic and sends any webhook notifications you want. You can also log its stdout wherever you like.
#!/bin/bash
export RESTIC_PASSWORD_FILE="/path/to/user/.restic_password"
REPO="rclone:GoogleDrive:Backup"
restic -r "$REPO" backup /location/to/backup
# ... other restic commands ...
Notice that you'll have to set the RESTIC_PASSWORD_FILE environment variable to the path of the file that contains the password for the repo you are backing up to, since cron can't answer restic's interactive password prompt 😠.
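For the logging and webhook part, here is one way the wrapper could be fleshed out. Treat it as a sketch under assumptions: the log location, the WEBHOOK_URL variable, and the text= form field are all hypothetical, and the payload format in particular depends on whichever notification service you use.

```shell
#!/bin/bash
export RESTIC_PASSWORD_FILE="/path/to/user/.restic_password"
REPO="rclone:GoogleDrive:Backup"
LOG_FILE="${LOG_FILE:-/var/tmp/restic_backup.log}"  # hypothetical log location
WEBHOOK_URL="${WEBHOOK_URL:-}"                      # hypothetical; empty disables notifications

# log: append a timestamped line to the log file.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# notify: POST a short message to the webhook, if one is configured.
notify() {
    [ -n "$WEBHOOK_URL" ] || return 0
    curl -fsS --data-urlencode "text=$1" "$WEBHOOK_URL" > /dev/null \
        || log "webhook notification failed"
}

# run_backup: back up one path, recording restic's output and the outcome.
run_backup() {
    local src="$1"
    log "backup of $src started"
    if restic -r "$REPO" backup "$src" >> "$LOG_FILE" 2>&1; then
        log "backup of $src finished"
        notify "restic backup of $src succeeded"
    else
        log "backup of $src FAILED"
        notify "restic backup of $src FAILED"
        return 1
    fi
}
```

The script would then end with a call such as run_backup /location/to/backup, so the cron entry itself stays unchanged.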
Backup frequency and retention
The cron job just runs backup_script.sh at the specified time, so it determines how often a snapshot is taken; it is not responsible for retention, that is, how long old snapshots are kept. If you want more than one backup per day, schedule the cron job at that frequency so the snapshots stay evenly spaced.
So say you want to keep daily backups for 7 days, weekly for 4 weeks, and monthly for 12 months. You would want to run the following commands after each backup:
restic -r rclone:GoogleDrive:Backup forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12
# Removes the actual stale backups
restic -r rclone:GoogleDrive:Backup prune
The forget command only removes the snapshot entries that fall outside the policy; the data those snapshots referenced stays in the repository. It is more like a "soft delete". The prune command is the "hard delete": it actually removes the data that no remaining snapshot references and frees up the space.
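The two steps can also be combined, since forget accepts a --prune flag, and its --dry-run flag is handy for checking a policy before trusting it. A small sketch follows; apply_retention is an illustrative name, and the keep values are simply the ones from the example above.

```shell
#!/bin/bash
REPO="rclone:GoogleDrive:Backup"

# apply_retention: keep 7 daily, 4 weekly and 12 monthly snapshots, then
# prune unreferenced data in the same run. Extra arguments are passed
# through, so `apply_retention --dry-run` previews without changing anything.
apply_retention() {
    restic -r "$REPO" forget \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 12 \
        --prune "$@"
}
```

Running apply_retention --dry-run first shows which snapshots would be kept and which removed; dropping the flag applies the policy for real.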
Performance Considerations
Because I back up directly to Google Drive, I've noticed it's very slow and time consuming. This probably means I need to tune the restic command to be more efficient, for example by uploading in parallel or changing the chunk size.
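I haven't benchmarked any of this, but the two knobs I would try first look roughly like the sketch below. It assumes a reasonably recent restic release (the --pack-size flag is not present in older versions, so check what your distro ships), and the numbers are untested starting points, not recommendations. The idea is that more rclone connections allow more parallel uploads, while larger pack files mean fewer individual files for Google Drive to create.

```shell
#!/bin/bash
REPO="rclone:GoogleDrive:Backup"
CONNECTIONS="${CONNECTIONS:-8}"        # hypothetical value: parallel rclone connections
PACK_SIZE_MIB="${PACK_SIZE_MIB:-64}"   # hypothetical value: target pack size in MiB

# tuned_backup: run a backup with more upload parallelism and larger packs.
# All arguments (paths, extra flags) are passed on to `restic backup`.
tuned_backup() {
    restic -r "$REPO" \
        -o rclone.connections="$CONNECTIONS" \
        --pack-size "$PACK_SIZE_MIB" \
        backup "$@"
}
```

Usage would be tuned_backup /location/to/backup, timing a run with the defaults and again with different CONNECTIONS and PACK_SIZE_MIB values to see whether either makes a real difference.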
Final Thoughts
I'm finding that rclone + restic is a pretty good combination for my needs. The deduplication should save a lot of Google Drive storage. I don't really care about the data encryption (I wish I could turn it off), but it is what it is. I think if you're working on a headless server (e.g. a self-managed compute cluster) and you want to back up data on your network, this is a pretty reasonable solution.
The one thing I can't speak to is how syncing behaves when you're working on two systems and need files to be locked and synced properly as you work on them. Like I mentioned, on my personal laptop I use insync to sync my Google Drive, and this works pretty well.