T260692
Ceph VM image backups
Closed, Resolved, Public
Assigned To
dcaro
Authored By
Andrew
Aug 18 2020, 2:22 PM
Tags
cloud-services-team (Kanban) (Doing)
Goal
Patch-For-Review
Subscribers
Aklapper
Andrew
bd808
Bstorm
Description
The proof of concept in T259192 is looking pretty good; let's move ahead with making a real backup setup.
identify hardware for this. 20 TB would be a nice amount of space to start with. Going to try to use cloudvirt1024: as a ceph node, its drives are idle. We'll see whether running backup jobs interferes with VM performance.
identify more projects and/or VM types to exclude from backups so we don't get overwhelmed when more VMs move to ceph
puppetize the patch in https://github.com/wamdam/backy2/pull/72 (if upstream doesn't apply it and build a new package)
switch it on, starting with 7 days of backups
make sure we have disk space monitoring for the backup storage
document restoration process, make sure WMCS staff has experience with restoration. Documentation is here: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Instance_backups#Future_concerns
add additional cloudvirts to the cluster
back up each VM in two places (T267195?)
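The exclusion and retention items above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually puppetized: the `VM` and `Backup` records, the helper names, and the hostname pattern are hypothetical. The excluded project names are examples taken from patches later in this task, and the 7-day window comes from the list above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: example values only. Patches later in this task exclude these
# projects, but the real list lives in puppet and may differ.
EXCLUDED_PROJECTS = {"clouddb-services", "dumps"}
# Hypothetical hostname pattern for disposable CI agents.
EXCLUDED_HOSTNAME_PATTERNS = [re.compile(r"^integration-agent-")]
# "starting with 7 days of backups"
RETENTION = timedelta(days=7)


@dataclass(frozen=True)
class VM:
    project: str
    hostname: str


@dataclass(frozen=True)
class Backup:
    vm: VM
    taken_at: datetime


def should_back_up(vm: VM) -> bool:
    """Return False for VMs in an excluded project or with an excluded hostname."""
    if vm.project in EXCLUDED_PROJECTS:
        return False
    return not any(p.search(vm.hostname) for p in EXCLUDED_HOSTNAME_PATTERNS)


def expired(backups: list[Backup], now: datetime) -> list[Backup]:
    """Return the backups that are older than the retention window."""
    return [b for b in backups if now - b.taken_at > RETENTION]
```

Keeping the exclusion rules as plain data (a set of projects plus a list of patterns) means that excluding another project or hostname family is a one-line change.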
Details
Related Objects
Task Graph
Mentions
Status | Assigned | Task
Resolved | Majavah | T211393 openstack-browser and horizon: Security group and floating IP quota information being pulled from Nova instead of Neutron for eqiad1-r
Resolved | Andrew | T211777 Can't get quota information from Neutron API
Resolved | Andrew | T261137 upgrade cloud-vps openstack to Openstack version 'Victoria'
Resolved | dcaro | T261136 upgrade cloud-vps openstack to Openstack version 'Ussuri'
Resolved | Andrew | T261138 Upgrade Horizon to latest OpenStack release
Resolved | Andrew | T261135 upgrade cloud-vps openstack to Openstack version 'Train'
Resolved | Andrew | T261134 upgrade cloud-vps openstack to Openstack version 'Stein'
Resolved | Andrew | T259399 Upgrade cloudvirts to Debian Buster
Resolved | dcaro | T216195 Move cloudvirt hosts to 10Gb ethernet
Open | None | T194334 [Epic] Modern Cloud VPS storage layer
Resolved | Andrew | T261132 Move all cloud-vps VMs to Ceph
Resolved | Andrew | T253365 Complete build out of Ceph cluster and attach "diskless" cloudvirts
Resolved | dcaro | T260692 Ceph VM image backups
Resolved | aborrero | T260941 Practice restoring ceph backups
Declined | dcaro | T267195 CloudVPS: improve VM backups to make them redundant and discoverable
Resolved | dcaro | T271094 [dumps] Review backup strategy
Open | None | T273720 [ceph][rbd] Periodically cleanup dangling snapshots
Open | None | T273723 [backups] Periodically cleanup non-handled backups
Andrew created this task.
Aug 18 2020, 2:22 PM
Andrew updated the task description.
Aug 18 2020, 4:09 PM
gerritbot added a comment.
Aug 18 2020, 4:40 PM
Change 621022 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Revert "backy2: temporarily hack data dir to /var/lib/nova/instances"
https://gerrit.wikimedia.org/r/621022
gerritbot added a project: Patch-For-Review.
Aug 18 2020, 4:40 PM
Change 621023 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs/ceph/backy: move backup engine to cloudstore1009
https://gerrit.wikimedia.org/r/621023
gerritbot added a comment.
Aug 18 2020, 4:40 PM
Change 621024 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] backy2: remove some unused hiera settings
https://gerrit.wikimedia.org/r/621024
gerritbot added a comment.
Aug 18 2020, 4:40 PM
Change 621025 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] backy2: hack in a fix to an upstream bug in 'backy2 du'
https://gerrit.wikimedia.org/r/621025
gerritbot added a comment.
Aug 18 2020, 6:01 PM
Change 621022 merged by Andrew Bogott:
[operations/puppet@production] Revert "backy2: temporarily hack data dir to /var/lib/nova/instances"
https://gerrit.wikimedia.org/r/621022
gerritbot added a comment.
Aug 18 2020, 6:02 PM
Change 621024 merged by Andrew Bogott:
[operations/puppet@production] backy2: remove some unused hiera settings
https://gerrit.wikimedia.org/r/621024
gerritbot added a comment.
Aug 18 2020, 6:04 PM
Change 621025 merged by Andrew Bogott:
[operations/puppet@production] backy2: hack in a fix to an upstream bug in 'backy2 du'
https://gerrit.wikimedia.org/r/621025
gerritbot added a comment.
Aug 18 2020, 6:06 PM
Change 621023 merged by Andrew Bogott:
[operations/puppet@production] wmcs/ceph/backy: move backup engine to cloudstore1009
https://gerrit.wikimedia.org/r/621023
gerritbot added a comment.
Aug 18 2020, 8:12 PM
Change 621058 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1024: move to Buster and make a ceph cloudvirt
https://gerrit.wikimedia.org/r/621058
Andrew updated the task description.
Aug 18 2020, 8:13 PM
gerritbot added a comment.
Aug 18 2020, 8:37 PM
Change 621058 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1024: move to Buster and make a ceph cloudvirt
https://gerrit.wikimedia.org/r/621058
ops-monitoring-bot added a comment.
Aug 18 2020, 8:43 PM
Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:
['cloudvirt1024.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/202008182043_andrew_31694.log.
ops-monitoring-bot added a comment.
Aug 18 2020, 9:16 PM
Completed auto-reimage of hosts:
['cloudvirt1024.eqiad.wmnet']
Of which those FAILED:
['cloudvirt1024.eqiad.wmnet']
gerritbot added a comment.
Aug 18 2020, 9:41 PM
Change 621077 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1024: move to new role, 'virt_ceph_and_backy'
https://gerrit.wikimedia.org/r/621077
gerritbot added a comment.
Aug 18 2020, 9:44 PM
Change 621077 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1024: move to new role, 'virt_ceph_and_backy'
https://gerrit.wikimedia.org/r/621077
Andrew moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.
Aug 19 2020, 3:07 PM
gerritbot added a comment.
Aug 20 2020, 3:03 PM
Change 621538 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] ceph backups: exclude integration agents
https://gerrit.wikimedia.org/r/621538
gerritbot added a comment.
Aug 20 2020, 3:20 PM
Change 621538 merged by Andrew Bogott:
[operations/puppet@production] ceph backups: exclude integration agents
https://gerrit.wikimedia.org/r/621538
Andrew updated the task description.
Aug 20 2020, 7:40 PM
Andrew updated the task description.
Aug 20 2020, 8:13 PM
Andrew updated the task description.
Andrew added a comment.
Sep 1 2020, 7:53 PM
I just ran some performance tests on a VM while backup jobs were running. I didn't notice any change in behavior.
Once we're closer to full network capacity all bets are off, but there's no clear downside to the backups at the moment.
aborrero closed subtask T260941: Practice restoring ceph backups as Resolved.
Sep 7 2020, 3:36 PM
Andrew closed this task as Resolved.
Sep 7 2020, 4:56 PM
Andrew updated the task description.
gerritbot added a comment.
Oct 8 2020, 4:42 PM
Change 632960 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs server backups: Add a way to assign projects to backup hosts
https://gerrit.wikimedia.org/r/632960
Andrew reopened this task as Open.
Oct 8 2020, 4:44 PM
Re-opening because cloudvirt1024 isn't big enough for all our backups.
gerritbot added a comment.
Oct 8 2020, 4:48 PM
Change 632961 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs backups: remove the 'special_projects' logic
https://gerrit.wikimedia.org/r/632961
gerritbot added a comment.
Oct 8 2020, 6:34 PM
Change 632976 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs backy2: allow hiera config of when the backup runs
https://gerrit.wikimedia.org/r/632976
gerritbot added a comment.
Oct 8 2020, 7:11 PM
Change 632976 merged by Andrew Bogott:
[operations/puppet@production] wmcs backy2: allow hiera config of when the backup runs
https://gerrit.wikimedia.org/r/632976
gerritbot added a comment.
Oct 8 2020, 7:40 PM
Change 632960 merged by Andrew Bogott:
[operations/puppet@production] wmcs server backups: Add a way to assign projects to backup hosts
https://gerrit.wikimedia.org/r/632960
gerritbot added a comment.
Oct 8 2020, 7:51 PM
Change 632961 merged by Andrew Bogott:
[operations/puppet@production] wmcs backups: remove the 'special_projects' logic
https://gerrit.wikimedia.org/r/632961
gerritbot added a comment.
Oct 9 2020, 2:38 AM
Change 633049 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs-backup-instances: add missing argument
https://gerrit.wikimedia.org/r/633049
gerritbot added a comment.
Oct 9 2020, 2:39 AM
Change 633049 merged by Andrew Bogott:
[operations/puppet@production] wmcs-backup-instances: add missing argument
https://gerrit.wikimedia.org/r/633049
gerritbot added a comment.
Oct 10 2020, 3:54 PM
Change 633306 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] backy2: throttle bandwidth for reading and writing
https://gerrit.wikimedia.org/r/633306
gerritbot added a comment.
Oct 10 2020, 3:54 PM
Change 633306 merged by Andrew Bogott:
[operations/puppet@production] backy2: throttle bandwidth for reading and writing
https://gerrit.wikimedia.org/r/633306
gerritbot added a comment.
Oct 13 2020, 1:57 PM
Change 633741 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] clouddvirt102[1-9]: apply libvirt-backy-ssd partman recipe
https://gerrit.wikimedia.org/r/633741
gerritbot added a comment.
Oct 13 2020, 2:08 PM
Change 633741 merged by Andrew Bogott:
[operations/puppet@production] clouddvirt102[1-9]: apply libvirt-backy-ssd partman recipe
https://gerrit.wikimedia.org/r/633741
gerritbot added a comment.
Oct 13 2020, 2:19 PM
Change 633744 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1022: move to virt_ceph_and_backy
https://gerrit.wikimedia.org/r/633744
Andrew updated the task description.
Oct 13 2020, 2:21 PM
gerritbot added a comment.
Oct 13 2020, 2:42 PM
Change 633744 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1022: move to virt_ceph_and_backy
https://gerrit.wikimedia.org/r/633744
gerritbot added a comment.
Oct 13 2020, 3:26 PM
Change 633771 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1021: add backy support
https://gerrit.wikimedia.org/r/633771
gerritbot added a comment.
Oct 13 2020, 3:26 PM
Change 633771 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1021: add backy support
https://gerrit.wikimedia.org/r/633771
gerritbot added a comment.
Oct 13 2020, 9:03 PM
Change 633829 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs VM backups: add two more backup hosts, increase days to 7
https://gerrit.wikimedia.org/r/633829
gerritbot added a comment.
Oct 13 2020, 9:30 PM
Change 633829 merged by Andrew Bogott:
[operations/puppet@production] wmcs VM backups: add two more backup hosts, increase days to 7
https://gerrit.wikimedia.org/r/633829
gerritbot added a comment.
Oct 19 2020, 4:25 PM
Change 635014 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs instance backup: move a few more projects to cloudvirt1021
https://gerrit.wikimedia.org/r/635014
gerritbot added a comment.
Oct 19 2020, 4:28 PM
Change 635014 merged by Andrew Bogott:
[operations/puppet@production] wmcs instance backup: move a few more projects to cloudvirt1021
https://gerrit.wikimedia.org/r/635014
Andrew updated the task description.
Oct 19 2020, 8:02 PM
Andrew triaged this task as Medium priority.
Oct 20 2020, 4:21 PM
gerritbot added a comment.
Oct 23 2020, 1:29 PM
Change 636020 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Define backy2::backup_time for cloudvirt102[5-8]
https://gerrit.wikimedia.org/r/636020
gerritbot added a comment.
Oct 23 2020, 1:31 PM
Change 636020 merged by Andrew Bogott:
[operations/puppet@production] Define backy2::backup_time for cloudvirt102[5-8]
https://gerrit.wikimedia.org/r/636020
gerritbot added a comment.
Oct 30 2020, 2:24 PM
Change 637704 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloud-vps instance backups: ignore clouddb-services project
https://gerrit.wikimedia.org/r/637704
gerritbot added a comment.
Oct 30 2020, 2:27 PM
Change 637704 merged by Andrew Bogott:
[operations/puppet@production] cloud-vps instance backups: ignore clouddb-services project
https://gerrit.wikimedia.org/r/637704
gerritbot added a comment.
Oct 30 2020, 3:34 PM
Change 637713 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs instance backups: move more projects from cloudvirt1024 to cloudvirt1021
https://gerrit.wikimedia.org/r/637713
gerritbot added a comment.
Oct 30 2020, 3:36 PM
Change 637713 merged by Andrew Bogott:
[operations/puppet@production] wmcs instance backups: move more projects from cloudvirt1024 to cloudvirt1021
https://gerrit.wikimedia.org/r/637713
aborrero updated the task description.
Nov 4 2020, 11:04 AM
gerritbot added a comment.
Dec 7 2020, 6:48 PM
Change 646797 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Move some project backups to cloudvirt1025
https://gerrit.wikimedia.org/r/646797
Andrew added a comment.
Dec 7 2020, 6:49 PM
We aren't going to have space to back up everything in two places. Right now I'm working on spreading the backups onto cloudvirt1025-1028; after that I'll probably declare this finished.
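One way to reason about spreading backups over several hosts is a greedy "largest project first, least-loaded host next" assignment. The sketch below is only an illustration under stated assumptions: the `assign_projects` helper, the project names, the sizes, and the per-host capacity are all hypothetical, and this task does not describe how the real assignment was chosen. Only the host names come from the comment above.

```python
import heapq


def assign_projects(project_sizes_tb, hosts, capacity_tb):
    """Greedily place each project on the currently least-loaded host.

    Returns {host: [project, ...]}. Raises ValueError if even the
    least-loaded host cannot hold the next project within capacity_tb.
    """
    # Min-heap of (space used so far, host name).
    heap = [(0.0, host) for host in hosts]
    heapq.heapify(heap)
    assignment = {host: [] for host in hosts}
    # Place the largest projects first; break ties by name for determinism.
    ordered = sorted(project_sizes_tb.items(), key=lambda kv: (-kv[1], kv[0]))
    for project, size in ordered:
        used, host = heapq.heappop(heap)
        if used + size > capacity_tb:
            raise ValueError(f"no room for {project} ({size} TB)")
        assignment[host].append(project)
        heapq.heappush(heap, (used + size, host))
    return assignment


# Made-up sizes, purely for illustration.
hosts = ["cloudvirt1025", "cloudvirt1026", "cloudvirt1027", "cloudvirt1028"]
sizes = {"project-a": 6.0, "project-b": 3.0, "project-c": 2.0,
         "project-d": 1.5, "project-e": 1.0}
plan = assign_projects(sizes, hosts, capacity_tb=8.0)
```

Because every host has the same capacity in this sketch, a project that does not fit on the least-loaded host fits nowhere, which is why the function fails immediately instead of trying the other hosts.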
gerritbot added a comment.
Dec 7 2020, 6:51 PM
Change 646797 merged by Andrew Bogott:
[operations/puppet@production] Move some project backups to cloudvirt1025
https://gerrit.wikimedia.org/r/646797
gerritbot added a comment.
Dec 8 2020, 3:20 PM
Change 647003 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloud-vps VM backups: exclude some more hostnames from backup
https://gerrit.wikimedia.org/r/647003
gerritbot added a comment.
Dec 8 2020, 10:28 PM
Change 647003 merged by Andrew Bogott:
[operations/puppet@production] cloud-vps VM backups: exclude some more hostnames from backup
https://gerrit.wikimedia.org/r/647003
Andrew reassigned this task from Andrew to dcaro.
Dec 23 2020, 5:45 PM
Stashbot added a comment.
Dec 28 2020, 12:23 PM
Mentioned in SAL (#wikimedia-cloud) [2020-12-28T12:23:04Z] <arturo> icinga downtime cloudvirt1026 disk space check until january 5 (T260692)
gerritbot added a comment.
Dec 28 2020, 12:25 PM
Change 652182 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud: drop dumps project backups
https://gerrit.wikimedia.org/r/652182
gerritbot added a comment.
Dec 28 2020, 12:28 PM
Change 652182 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud: drop dumps project backups
https://gerrit.wikimedia.org/r/652182
Stashbot added a comment.
Dec 28 2020, 12:32 PM
Mentioned in SAL (#wikimedia-cloud) [2020-12-28T12:32:21Z] <arturo> stop doing backups for the dumps project https://gerrit.wikimedia.org/r/c/operations/puppet/+/652182 (T260692)
Stashbot added a comment.
Feb 3 2021, 9:59 AM
Mentioned in SAL (#wikimedia-cloud) [2021-02-03T09:59:20Z] <dcaro> Doing a full vm backup on cloudvirt1024 with the new script (T260692)
dcaro added a subtask: T273720: [ceph][rbd] Periodically cleanup dangling snapshots.
Feb 3 2021, 10:01 AM
dcaro added a subtask: T273723: [backups] Periodically cleanup non-handled backups.
Feb 3 2021, 10:07 AM
gerritbot added a comment.
Feb 3 2021, 10:54 AM
Change 661348 had a related patch set uploaded (by David Caro; owner: David Caro):
[operations/puppet@production] wmcs.backups: Use the wmcs-backup script for vms
https://gerrit.wikimedia.org/r/661348
gerritbot added a comment.
Feb 3 2021, 1:03 PM
Change 661348 merged by David Caro:
[operations/puppet@production] wmcs.backups: Use the wmcs-backup script for vms
https://gerrit.wikimedia.org/r/661348
dcaro closed this task as Resolved.
Jul 28 2021, 1:48 PM
dcaro closed subtask T267195: CloudVPS: improve VM backups to make them redundant and discoverable as Declined.
This can be considered done. There are some improvements left, but those can be tracked individually.
Content licensed under Creative Commons Attribution-ShareAlike 3.0 (CC-BY-SA) unless otherwise noted; code licensed under GNU General Public License (GPL) or other open source licenses.