
Upgrade s5 to Debian Buster and MariaDB 10.4
Closed, Resolved (Public)

Description

Steps to upgrade:

Please read the procedure documentation for more details.

Event Timeline

Marostegui triaged this task as Medium priority.

This should remain stalled for now and only be done once we are happy with s6's performance and stability, in roughly 3-4 weeks.

Marostegui changed the task status from Open to Stalled. May 20 2021, 11:12 AM

Change 693142 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: Switchover codfw s5 backups from db2099 to db2101 (buster)

https://gerrit.wikimedia.org/r/693142

All s5 codfw buster hosts upgraded to 10.4.19

I am going to start working on this next week

Marostegui changed the task status from Stalled to Open. Jun 4 2021, 6:07 AM
Marostegui moved this task from Blocked to Ready on the DBA board.

Starting to upgrade the eqiad 10.4 hosts to the latest uploaded version, 10.4.19

@jcrespo let me know if you want to upgrade db1150 to 10.4.19 or you want me to do it.

Change 698151 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Reimage db2113 to Buster and 10.4

https://gerrit.wikimedia.org/r/698151

Change 698151 merged by Marostegui:

[operations/puppet@production] install_server: Reimage db2113 to Buster and 10.4

https://gerrit.wikimedia.org/r/698151

I can upgrade db1150 to 10.4.19 quickly.

Cool thanks!

MariaDB read only s4 Version 10.4.19-MariaDB, Uptime 188s, read_only: True, event_scheduler: True, 215.45 QPS, connection latency: 0.003659s, query latency: 0.000495s 	
MariaDB read only s5 Version 10.4.19-MariaDB, Uptime 154s, read_only: True, event_scheduler: True, 108.23 QPS, connection latency: 0.003970s, query latency: 0.000481s
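
For anyone who wants to verify these status lines in bulk rather than by eye, here is a minimal Python sketch that parses one line into typed fields. The regular expression is only inferred from the two lines pasted above, and the function name `parse_status` is made up for illustration; it is not part of any monitoring tool.

```python
import re

# Pattern inferred from the two status lines above (an assumption, not a spec).
STATUS_RE = re.compile(
    r"^MariaDB .*?(?P<section>\S+) Version (?P<version>\d+\.\d+\.\d+)-MariaDB, "
    r"Uptime (?P<uptime>\d+)s, read_only: (?P<read_only>True|False), "
    r"event_scheduler: (?P<event_scheduler>True|False), "
    r"(?P<qps>[\d.]+) QPS, "
    r"connection latency: (?P<conn_latency>[\d.]+)s, "
    r"query latency: (?P<query_latency>[\d.]+)s\s*$"
)


def parse_status(line: str) -> dict:
    """Parse one status line into a dict; raise ValueError if it does not match."""
    match = STATUS_RE.match(line)
    if match is None:
        raise ValueError("unrecognised status line: %r" % line)
    fields = match.groupdict()
    return {
        "section": fields["section"],
        # Version as a tuple of ints so it can be compared, e.g. >= (10, 4, 19).
        "version": tuple(int(part) for part in fields["version"].split(".")),
        "uptime_s": int(fields["uptime"]),
        "read_only": fields["read_only"] == "True",
        "event_scheduler": fields["event_scheduler"] == "True",
        "qps": float(fields["qps"]),
        "conn_latency_s": float(fields["conn_latency"]),
        "query_latency_s": float(fields["query_latency"]),
    }
```

A check such as `parse_status(line)["version"] >= (10, 4, 19) and parse_status(line)["read_only"]` would then confirm that an instance came back on the expected version and still read only.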

Change 698157 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: Switchover eqiad s5 backups from db1145 to db1150 (buster)

https://gerrit.wikimedia.org/r/698157

Change 698185 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1019: Depool clouddb1015

https://gerrit.wikimedia.org/r/698185

Change 698185 merged by Marostegui:

[operations/puppet@production] dbproxy1019: Depool clouddb1015

https://gerrit.wikimedia.org/r/698185

Mentioned in SAL (#wikimedia-operations) [2021-06-04T12:27:57Z] <marostegui> Upgrade mysql on clouddb1015 T283235

I am starting to upgrade all eqiad hosts, including clouddb* replicas

Mentioned in SAL (#wikimedia-operations) [2021-06-04T12:46:02Z] <marostegui> Upgrade mysql on clouddb1016 T283235

Change 698370 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2113: Disable notifications

https://gerrit.wikimedia.org/r/698370

Change 698370 merged by Marostegui:

[operations/puppet@production] db2113: Disable notifications

https://gerrit.wikimedia.org/r/698370

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2113.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106070512_marostegui_17172.log.
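
Several reimage logs are referenced in this task, so a small helper for reading their names may be handy. The sketch below assumes the file name is `<YYYYmmddHHMM>_<user>_<number>.log`, which is only a guess based on the paths posted here; the meaning of the trailing number is not stated in this task, so it is returned untouched. `parse_reimage_log` is a hypothetical name.

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_reimage_log(path: str) -> dict:
    """Split a wmf-auto-reimage log path into start time, user and numeric suffix.

    Assumed layout (inferred from the paths in this task, not documented here):
    <YYYYmmddHHMM>_<user>_<number>.log
    """
    stem = PurePosixPath(path).stem          # e.g. "202106070512_marostegui_17172"
    stamp, rest = stem.split("_", 1)         # timestamp is the first field
    user, suffix = rest.rsplit("_", 1)       # user may itself contain underscores
    return {
        "started": datetime.strptime(stamp, "%Y%m%d%H%M"),
        "user": user,
        "suffix": suffix,                    # meaning not stated in the task
    }
```

Sorting the logs by the `started` field gives the order in which the reimages were launched.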

Change 698371 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbproxy1018: Depool clouddb1020

https://gerrit.wikimedia.org/r/698371

Change 698371 merged by Marostegui:

[operations/puppet@production] dbproxy1018: Depool clouddb1020

https://gerrit.wikimedia.org/r/698371

Completed auto-reimage of hosts:

['db2113.codfw.wmnet']

and were ALL successful.

Candidate master in codfw done (db2113) - checking its tables before proceeding with the master.
@jcrespo this can probably be pushed anytime you like: https://gerrit.wikimedia.org/r/693142

Mentioned in SAL (#wikimedia-operations) [2021-06-07T06:05:55Z] <marostegui> Upgrade mysql on dbstore1003 T283235

db2113 tables checked, all clean.

Change 693142 merged by Jcrespo:

[operations/puppet@production] dbbackups: Switchover codfw s5 backups from db2099 to db2101 (buster)

https://gerrit.wikimedia.org/r/693142

Change 698658 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2123: Disable notifications

https://gerrit.wikimedia.org/r/698658

Change 698658 merged by Marostegui:

[operations/puppet@production] db2123: Disable notifications

https://gerrit.wikimedia.org/r/698658

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2123.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106080459_marostegui_32562.log.

Change 698659 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Reimage db2123 to buster

https://gerrit.wikimedia.org/r/698659

Change 698659 merged by Marostegui:

[operations/puppet@production] install_server: Reimage db2123 to buster

https://gerrit.wikimedia.org/r/698659

Completed auto-reimage of hosts:

['db2123.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db2123.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106080527_marostegui_5778.log.

Completed auto-reimage of hosts:

['db2123.codfw.wmnet']

and were ALL successful.

db2123 (master on codfw) reimaged to Buster. Checking its tables now.

sanitarium host (db1154) upgraded

Change 698715 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Reimage db1130 to Buster

https://gerrit.wikimedia.org/r/698715

Change 698715 merged by Marostegui:

[operations/puppet@production] install_server: Reimage db1130 to Buster

https://gerrit.wikimedia.org/r/698715

db2123 finished checking its tables - all clean

Change 698907 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2123: Enable notifications

https://gerrit.wikimedia.org/r/698907

Change 698907 merged by Marostegui:

[operations/puppet@production] db2123: Enable notifications

https://gerrit.wikimedia.org/r/698907

Mentioned in SAL (#wikimedia-operations) [2021-06-09T10:04:23Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1130 T283235', diff saved to https://phabricator.wikimedia.org/P16337 and previous config saved to /var/cache/conftool/dbconfig/20210609-100423-marostegui.json
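
For auditing later, the dbctl commit entry above can be broken into its parts. This is a rough Python sketch whose pattern is inferred solely from the one SAL line in this task; `parse_dbctl_sal` is a hypothetical helper, not something dbctl or the SAL tooling provides.

```python
import re
from datetime import datetime

# Pattern inferred from the single dbctl SAL entry above (an assumption).
SAL_DBCTL_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] <(?P<who>[^>]+)> dbctl commit \(dc=(?P<dc>[^)]+)\): "
    r"'(?P<message>[^']*)', diff saved to (?P<diff>\S+) "
    r"and previous config saved to (?P<previous>\S+)"
)


def parse_dbctl_sal(entry: str) -> dict:
    """Extract timestamp, author, commit message, task IDs and saved paths."""
    match = SAL_DBCTL_RE.search(entry)
    if match is None:
        raise ValueError("not a dbctl commit SAL entry")
    fields = match.groupdict()
    return {
        "time": datetime.strptime(fields["ts"], "%Y-%m-%dT%H:%M:%SZ"),
        "who": fields["who"],
        "dc": fields["dc"],
        "message": fields["message"],
        # Phabricator task IDs mentioned in the commit message, e.g. T283235.
        "tasks": re.findall(r"\bT\d+\b", fields["message"]),
        "diff": fields["diff"],
        "previous_config": fields["previous"],
    }
```

The `previous_config` path is the piece to keep at hand if the depool ever needs to be compared against the earlier state.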

Change 698955 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1130: Disable notifications

https://gerrit.wikimedia.org/r/698955

Change 698955 merged by Marostegui:

[operations/puppet@production] db1130: Disable notifications

https://gerrit.wikimedia.org/r/698955

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1130.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106091007_marostegui_1368.log.

Completed auto-reimage of hosts:

['db1130.eqiad.wmnet']

and were ALL successful.

db1130 (candidate master) reimaged - checking tables now.

The check came back clean.

Change 698157 merged by Marostegui:

[operations/puppet@production] dbbackups: Switchover eqiad s5 backups from db1145 to db1150 (buster)

https://gerrit.wikimedia.org/r/698157

Merged the above patch to start generating eqiad backups from the buster host.

Marostegui changed the task status from Open to Stalled. Jun 11 2021, 4:45 AM

Waiting for the switchover to happen before continuing with the remaining steps

Change 700725 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] dbbackups: Remove s5 (stretch) from backup sources

https://gerrit.wikimedia.org/r/700725

Change 700746 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] install_server: Reimage db1100 to Buster.

https://gerrit.wikimedia.org/r/700746

Marostegui changed the task status from Stalled to Open. Jun 22 2021, 5:13 AM

Change 700746 merged by Marostegui:

[operations/puppet@production] install_server: Reimage db1100 to Buster.

https://gerrit.wikimedia.org/r/700746

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1100.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106220603_marostegui_5451.log.

Completed auto-reimage of hosts:

['db1100.eqiad.wmnet']

and were ALL successful.

Old master (db1100) upgraded to Buster and 10.4 - running mysqlcheck now.

db1100 had all the tables checked and clean. Started replication
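
The exact mysqlcheck invocation is not recorded in this task. As a hedged illustration only, the sketch below builds a command with the standard `--check` and `--all-databases` options and filters the output for anything that is not reported as OK. It assumes healthy tables print one line ending in "OK", and the function names are made up for this example.

```python
import subprocess
from typing import List


def build_mysqlcheck_cmd(all_databases: bool = True) -> List[str]:
    """Build a mysqlcheck command line; the flags used on db1100 are not in the task."""
    cmd = ["mysqlcheck", "--check"]
    if all_databases:
        cmd.append("--all-databases")
    return cmd


def lines_needing_attention(output: str) -> List[str]:
    """Return non-empty output lines that do not end in 'OK'.

    Assumption: a healthy table is reported on one line ending in 'OK';
    everything else (warnings, errors, notes) is kept for a human to read.
    """
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.rstrip().endswith("OK")
    ]


def run_check() -> List[str]:
    """Run mysqlcheck locally and return the lines that need a closer look."""
    result = subprocess.run(
        build_mysqlcheck_cmd(), capture_output=True, text=True, check=False
    )
    return lines_needing_attention(result.stdout)
```

An empty list from `run_check()` would correspond to the "all clean" result reported here; replication should only be started once that is the case.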

db1100 has caught up - going to start repooling

Change 701013 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1100: Enable notifications

https://gerrit.wikimedia.org/r/701013

Change 701013 merged by Marostegui:

[operations/puppet@production] db1100: Enable notifications

https://gerrit.wikimedia.org/r/701013

This is all done, pending the backup cleanup, which Jaime has scheduled for Monday

Change 700725 merged by Jcrespo:

[operations/puppet@production] dbbackups: Remove s5 (stretch) from backup sources

https://gerrit.wikimedia.org/r/700725

Mentioned in SAL (#wikimedia-operations) [2021-06-28T08:19:01Z] <jynus> stop and remove db1145:s5 db2099:s5 T283235

Cleanup of the old s5 backup sources should be done. If everything went OK, the instances should no longer appear in icinga, tendril, or grafana, but please double check!

Thanks! All good (also checked zarcillo)