In the near term we're only going to put truly disposable 'cattle' instances on Ceph. In the meantime, though, we should come up with some sort of backup/restore process.
It's true that we currently have no backups for VMs, but our current failure case is losing one hypervisor's worth of VMs, whereas with Ceph we now run the risk of losing the whole cloud if Ceph freaks out.
Quick summary of most recent conversation:
- We probably want to use Backy2 for this. We might also use Benji; it has fancier compression but is a younger project.
- For proof-of-concept (and possibly near-term production) we'll use cloudstore1008/9.
- For full-scale backups we probably need new hardware, but will learn more about storage needs as we go.
- Some users (e.g. https://www.reddit.com/r/ceph/comments/61nmfv/how_is_anyone_doing_backups_on_cephrbd/) have had trouble with Ceph freezing when capturing snapshots for backup.
- For starters we're going to hope that that isn't a problem for us; if it is then we'll have to consider creating a mirrored cluster just for backup purposes.
- Possibly that mirror can have only one replica rather than three, which might make it affordable.
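As a rough illustration of why a single-replica mirror might be cheaper, here is a minimal sketch of the raw-capacity arithmetic. The 100 TB usable figure is a made-up placeholder, not a measurement of our cluster, and `raw_capacity_tb` is a hypothetical helper name. It ignores erasure coding, free-space headroom, and any other overhead.

```python
def raw_capacity_tb(usable_tb: float, replicas: int) -> float:
    """Raw disk needed to hold usable_tb of data at a given replica count."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return usable_tb * replicas


# Hypothetical usable data size; replace with a real number once we have one.
usable_tb = 100.0

three_way = raw_capacity_tb(usable_tb, 3)  # replica count mentioned for the main cluster
single = raw_capacity_tb(usable_tb, 1)     # replica count floated for the backup mirror

print(f"3 replicas: {three_way:.0f} TB raw, 1 replica: {single:.0f} TB raw")
```

Under these assumptions the mirror needs a third of the raw disk of a three-replica cluster, at the cost of having no redundancy of its own.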
For the first round of tests/experiments, I'd like to answer these questions:
- Does the upstream Backy2 .deb install on Buster?
- Can we do this using local storage on cloudstores, or do we need it on NFS?
- What are some rough numbers for how big a backup image is, relative to initial VM size?
- Same question for incremental backups: how big is each one, relative to initial VM size?
- Does Ceph misbehave for our users during the backup process?
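For the two size questions above, one way to record the results is sketched below. This is only a bookkeeping aid: the byte counts have to be measured by hand during the experiments, the sample values are invented, and `BackupSample`, `size_ratios`, and `projected_bytes` are hypothetical names rather than anything provided by Backy2 or Benji.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class BackupSample:
    vm_name: str
    vm_disk_bytes: int      # provisioned size of the VM's disk image
    full_backup_bytes: int  # space used by the first (full) backup
    incremental_bytes: int  # space added by one later incremental backup


def size_ratios(sample: BackupSample) -> tuple[float, float]:
    """Return (full, incremental) backup size as a fraction of VM disk size."""
    if sample.vm_disk_bytes <= 0:
        raise ValueError("vm_disk_bytes must be positive")
    full = sample.full_backup_bytes / sample.vm_disk_bytes
    incremental = sample.incremental_bytes / sample.vm_disk_bytes
    return full, incremental


def projected_bytes(sample: BackupSample, incrementals_kept: int) -> int:
    """Estimate storage for one full backup plus N incrementals of similar size."""
    return sample.full_backup_bytes + incrementals_kept * sample.incremental_bytes


# Invented numbers for illustration only.
example = BackupSample("test-vm", 20 * GIB, 5 * GIB, 1 * GIB)
full_ratio, incr_ratio = size_ratios(example)
print(f"{example.vm_name}: full={full_ratio:.2f}, incremental={incr_ratio:.2f}")
print(f"one full + 7 incrementals: {projected_bytes(example, 7) / GIB:.0f} GiB")
```

Collecting a handful of these samples across differently sized VMs should give the rough numbers needed to estimate hardware for full-scale backups. The projection assumes every incremental is about the same size, which may not hold for busy VMs.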