
Restbase migration to Buster
Closed, ResolvedPublic

Description

Much of our RESTBase infrastructure is still running on stretch. We need to migrate to buster as soon as possible, ideally while creating a process we can reuse for the coming upgrades to bullseye.

Initial work:

  • Investigate whether we can reuse data-persistence work to migrate restbase nodes without wiping their disks

  • If not, automate (via a cookbook? possibly overkill) the decommissioning of RESTBase nodes while we reimage them

  • Ensure that our reimaging process allows reimaged nodes to rejoin the cluster without issue (a decommission will stop other nodes in the cluster from talking to this node)

Process:
Reimage: sudo cookbook sre.hosts.reimage --os buster -t T295375 $HOSTNAME -c
Fix permissions after reimage (check by hand; some GIDs vary between hosts): sudo find /srv/ -user envoy -exec chown cassandra:cassandra {} \;
Workarounds for T300177:

  • sudo -u deploy-service /usr/bin/scap deploy-local --repo cassandra/twcs
  • sudo -u deploy-service /usr/bin/scap deploy-local --repo restbase/deploy
  • sudo -u deploy-service /usr/bin/scap deploy-local --repo cassandra/logstash-logback-encoder

Enable cassandra: sudo touch /etc/cassandra-{a,b,c}/service-enabled && for i in a b c; do sudo service cassandra-${i} start; done
Once the host has rejoined the clusters (check compactions, and watch for instance-data checks failing on the cluster), on the puppetmaster: sudo confctl select name=$HOSTNAME set/pooled=yes
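
The process above repeats identically for every host in the migration list, so it can help to print the whole checklist with the hostname filled in. A minimal sketch (the rb_runbook helper is our own invention, not an existing tool; the commands it emits are copied verbatim from the process above):

```shell
# Sketch only: print the per-host runbook for a given RESTBase host.
# Nothing is executed; the function just fills the hostname into the
# commands from the task description.
rb_runbook() {
    host="${1:?usage: rb_runbook HOSTNAME}"
    cat <<EOF
# 1. Reimage (from a cumin host):
sudo cookbook sre.hosts.reimage --os buster -t T295375 ${host} -c
# 2. Fix permissions (check by hand; GIDs vary between hosts):
sudo find /srv/ -user envoy -exec chown cassandra:cassandra {} \;
# 3. Workarounds for T300177:
sudo -u deploy-service /usr/bin/scap deploy-local --repo cassandra/twcs
sudo -u deploy-service /usr/bin/scap deploy-local --repo restbase/deploy
sudo -u deploy-service /usr/bin/scap deploy-local --repo cassandra/logstash-logback-encoder
# 4. Enable and start the Cassandra instances:
sudo touch /etc/cassandra-{a,b,c}/service-enabled
for i in a b c; do sudo service cassandra-\${i} start; done
# 5. Once the host has rejoined the clusters, on the puppetmaster:
sudo confctl select name=${host} set/pooled=yes
EOF
}
```

Running `rb_runbook restbase1028.eqiad.wmnet` prints the checklist for that host; the operator still runs each step by hand and verifies cluster state between steps 4 and 5.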

Host migration:
Hosts marked with * are affected by T299652 and require BIOS upgrades to be reimaged

  • restbase1016.eqiad.wmnet
  • restbase1017.eqiad.wmnet
  • restbase1018.eqiad.wmnet
  • restbase1019.eqiad.wmnet*
  • restbase1020.eqiad.wmnet*
  • restbase1021.eqiad.wmnet*
  • restbase1022.eqiad.wmnet*
  • restbase1023.eqiad.wmnet*
  • restbase1024.eqiad.wmnet*
  • restbase1025.eqiad.wmnet*
  • restbase1026.eqiad.wmnet*
  • restbase1027.eqiad.wmnet*
  • restbase1028.eqiad.wmnet
  • restbase1029.eqiad.wmnet
  • restbase1030.eqiad.wmnet

  • restbase2009.codfw.wmnet

  • restbase2010.codfw.wmnet
  • restbase2011.codfw.wmnet
  • restbase2012.codfw.wmnet
  • restbase2013.codfw.wmnet
  • restbase2014.codfw.wmnet
  • restbase2015.codfw.wmnet
  • restbase2016.codfw.wmnet
  • restbase2017.codfw.wmnet*
  • restbase2018.codfw.wmnet
  • restbase2019.codfw.wmnet*
  • restbase2020.codfw.wmnet*
  • restbase2021.codfw.wmnet
  • restbase2022.codfw.wmnet
  • restbase2023.codfw.wmnet
  • restbase2024.codfw.wmnet
  • restbase2025.codfw.wmnet
  • restbase2026.codfw.wmnet

To be replaced by new instances

  • restbase-dev1004.eqiad.wmnet
  • restbase-dev1005.eqiad.wmnet
  • restbase-dev1006.eqiad.wmnet
  • deployment-restbase03.deployment-prep.eqiad1.wikimedia.cloud
  • deployment-restbase04

Event Timeline


Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1019.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase2026.codfw.wmnet with OS buster completed:

  • restbase2026 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase2026.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase-backend"}
{"restbase2026.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase-ssl"}
{"restbase2026.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201211435_hnowlan_25630_restbase2026.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=codfw,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=codfw,cluster=restbase,service=restbase-ssl' set/pooled=yes
sudo confctl select 'dc=codfw,cluster=restbase,service=restbase' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1019.eqiad.wmnet with OS buster executed with errors:

  • restbase1019 (FAIL)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1019.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1019.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1019.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1020.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1020.eqiad.wmnet with OS buster executed with errors:

  • restbase1020 (FAIL)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1020.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1020.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}
{"restbase1020.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes

  • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1021.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1021.eqiad.wmnet with OS buster executed with errors:

  • restbase1021 (FAIL)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1021.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1021.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1021.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1022.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1022.eqiad.wmnet with OS buster executed with errors:

  • restbase1022 (FAIL)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1022.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1022.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1022.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1023.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1023.eqiad.wmnet with OS buster executed with errors:

  • restbase1023 (FAIL)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1023.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}
{"restbase1023.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1023.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes

  • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1024.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1024.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1024.eqiad.wmnet with OS buster executed with errors:

  • restbase1024 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1024.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1024.eqiad.wmnet with OS buster executed with errors:

  • restbase1024 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1025.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1025.eqiad.wmnet with OS buster executed with errors:

  • restbase1025 (FAIL)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1025.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1025.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1025.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1026.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1026.eqiad.wmnet with OS buster executed with errors:

  • restbase1026 (FAIL)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1026.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1026.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1026.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1027.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1027.eqiad.wmnet with OS buster executed with errors:

  • restbase1027 (FAIL)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1027.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1027.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}
{"restbase1027.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes

  • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1028.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1028.eqiad.wmnet with OS buster completed:

  • restbase1028 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1028.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}
{"restbase1028.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1028.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201241240_hnowlan_18719_restbase1028.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1029.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1029.eqiad.wmnet with OS buster completed:

  • restbase1029 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1029.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1029.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1029.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201241326_hnowlan_1536_restbase1029.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1030.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1030.eqiad.wmnet with OS buster completed:

  • restbase1030 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1030.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1030.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1030.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201241401_hnowlan_1900_restbase1030.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1019.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1019.eqiad.wmnet with OS buster completed:

  • restbase1019 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1019.eqiad.wmnet": {"weight": 10, "pooled": "no"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1019.eqiad.wmnet": {"weight": 10, "pooled": "no"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1019.eqiad.wmnet": {"weight": 10, "pooled": "no"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201261721_hnowlan_16490_restbase1019.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=no
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=no
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=no

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1020.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1020.eqiad.wmnet with OS buster completed:

  • restbase1020 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1020.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}
{"restbase1020.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1020.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201281227_hnowlan_30102_restbase1020.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes

  • Updated Netbox data from PuppetDB
hnowlan updated the task description.

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1021.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1021.eqiad.wmnet with OS buster completed:

  • restbase1021 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1021.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1021.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1021.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201281450_hnowlan_28655_restbase1021.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1022.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1022.eqiad.wmnet with OS buster completed:

  • restbase1022 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1022.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1022.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1022.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201281547_hnowlan_5950_restbase1022.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1023.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1023.eqiad.wmnet with OS buster completed:

  • restbase1023 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1023.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}
{"restbase1023.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1023.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201281641_hnowlan_16165_restbase1023.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1024.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1024.eqiad.wmnet with OS buster completed:

  • restbase1024 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1024.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1024.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1024.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201281717_hnowlan_22550_restbase1024.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1025.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1025.eqiad.wmnet with OS buster completed:

  • restbase1025 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1025.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1025.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}
{"restbase1025.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201311120_hnowlan_26850_restbase1025.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1026.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1026.eqiad.wmnet with OS buster completed:

  • restbase1026 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1026.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}
{"restbase1026.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1026.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201311202_hnowlan_15015_restbase1026.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase1027.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase1027.eqiad.wmnet with OS buster completed:

  • restbase1027 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase1027.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-ssl"}
{"restbase1027.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase"}
{"restbase1027.eqiad.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=eqiad,cluster=restbase,service=restbase-backend"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201311246_hnowlan_26234_restbase1027.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is not optimal, downtime not removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-ssl' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=eqiad,cluster=restbase,service=restbase-backend' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase2017.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase2017.codfw.wmnet with OS buster completed:

  • restbase2017 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase2017.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase"}
{"restbase2017.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase-backend"}
{"restbase2017.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202011717_hnowlan_25220_restbase2017.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=codfw,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=codfw,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=codfw,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase2019.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1001 for host restbase2020.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase2020.codfw.wmnet with OS buster completed:

  • restbase2020 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase2020.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase"}
{"restbase2020.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase-backend"}
{"restbase2020.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202081106_hnowlan_28815_restbase2020.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=codfw,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=codfw,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=codfw,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1001 for host restbase2019.codfw.wmnet with OS buster completed:

  • restbase2019 (WARN)
    • Downtimed on Icinga
    • Set pooled=inactive for the following services on confctl:

{"restbase2019.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase"}
{"restbase2019.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase-backend"}
{"restbase2019.codfw.wmnet": {"weight": 10, "pooled": "yes"}, "tags": "dc=codfw,cluster=restbase,service=restbase-ssl"}

  • Disabled Puppet
  • Removed from Puppet and PuppetDB if present
  • Deleted any existing Puppet certificate
  • Removed from Debmonitor if present
  • Forced PXE for next reboot
  • Host rebooted via IPMI
  • Host up (Debian installer)
  • Host up (new fresh buster OS)
  • Generated Puppet certificate
  • Signed new Puppet certificate
  • Run Puppet in NOOP mode to populate exported resources in PuppetDB
  • Found Nagios_host resource for this host in PuppetDB
  • Downtimed the new host on Icinga
  • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202081058_hnowlan_17784_restbase2019.out
  • Checked BIOS boot parameters are back to normal
  • Rebooted
  • Automatic Puppet run was successful
  • Forced a re-check of all Icinga services for the host
  • Icinga status is optimal
  • Icinga downtime removed
  • Services in confctl are not automatically pooled, to restore the previous state you have to run the following commands:

sudo confctl select 'dc=codfw,cluster=restbase,service=restbase' set/pooled=yes
sudo confctl select 'dc=codfw,cluster=restbase,service=restbase-backend' set/pooled=yes
sudo confctl select 'dc=codfw,cluster=restbase,service=restbase-ssl' set/pooled=yes

  • Updated Netbox data from PuppetDB
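Each "Set pooled=inactive" section above records the service's previous confctl state as one JSON object per line. A small script can parse these records to confirm what a host should be restored to after reimaging. This is a sketch assuming the line format exactly as printed in the log (one host key plus a comma-separated `tags` string); the sample line is copied from the restbase2019 entry.

```python
import json

# One state line as printed by the reimage cookbook (confctl JSON output).
line = ('{"restbase2019.codfw.wmnet": {"weight": 10, "pooled": "yes"}, '
        '"tags": "dc=codfw,cluster=restbase,service=restbase"}')

record = json.loads(line)
# "tags" is a comma-separated key=value string, e.g. "dc=codfw,cluster=...".
tags = dict(kv.split("=") for kv in record.pop("tags").split(","))
# After popping "tags", the only remaining key is the host FQDN.
(host, state), = record.items()

print(host, tags["service"], state["pooled"])
```

Running this over all such lines for a host yields the exact service list and prior pooled state to restore with `confctl`.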

Change 761006 merged by Hnowlan:

[operations/puppet@production] restbase: remove restbase2010

https://gerrit.wikimedia.org/r/761006

Change 764801 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] restbase: add deployment-restbase04

https://gerrit.wikimedia.org/r/764801

Change 764825 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[mediawiki/services/restbase/deploy@master] Add deployment-restbase04

https://gerrit.wikimedia.org/r/764825

Change 764825 merged by Hnowlan:

[mediawiki/services/restbase/deploy@master] Add deployment-restbase04

https://gerrit.wikimedia.org/r/764825

Change 765313 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] restbase: disable redundant jmx config

https://gerrit.wikimedia.org/r/765313

Change 764801 merged by Hnowlan:

[operations/puppet@production] restbase: add deployment-restbase04

https://gerrit.wikimedia.org/r/764801

Change 765313 merged by Hnowlan:

[operations/puppet@production] restbase: disable redundant jmx config

https://gerrit.wikimedia.org/r/765313

Change 765532 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] restbase: change endpoint for deployment-prep to new host

https://gerrit.wikimedia.org/r/765532

Change 766082 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] restbase-dev: change role of new hosts

https://gerrit.wikimedia.org/r/766082

Change 765532 merged by Hnowlan:

[operations/puppet@production] restbase: change endpoint for deployment-prep to new host

https://gerrit.wikimedia.org/r/765532

Change 766602 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/mediawiki-config@master] Move to buster restbase host

https://gerrit.wikimedia.org/r/766602

Change 766602 merged by jenkins-bot:

[operations/mediawiki-config@master] [Beta Cluster] LabsServices: Move to buster restbase host

https://gerrit.wikimedia.org/r/766602

Mentioned in SAL (#wikimedia-releng) [2022-05-09T21:43:05Z] <James_F> Beta Cluster: Shutting down old deployment-restbase03 instance for T295375

hnowlan updated the task description.

The Data Persistence team will handle the reimaging of the restbase-dev cluster on the new codfw hardware.

hnowlan claimed this task.

Change 766082 abandoned by Hnowlan:

[operations/puppet@production] restbase-dev: create new codfw cluster, replace old eqiad cluster

Reason:

These hosts aren't used for restbase-dev any more

https://gerrit.wikimedia.org/r/766082