Upgrade Cassandra to latest 3.x (3.11.13)
Closed, Resolved (Public)

Description

The current standard version of Cassandra in production is 3.11.4, which at this point is quite old (February 2019). We recently tested 3.11.11 on AQS to fix a memory leak (which it did), and the consensus seemed to be that we could roll it out to the remaining clusters. We've since encountered T309736: Cassandra nodetool not working after openjdk-8 upgrade to 8u332, a fix for which is included in 3.11.13. The delta between 3.11.11 and 3.11.13 is relatively small, so I propose we standardize on 3.11.13 instead, upgrading AQS first and then the remaining clusters.

  • AQS (3.11.11 -> 3.11.13)
  • Session store
  • RESTBase

Event Timeline

Change 803903 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] Pin Cassandra 3.11.13 as 'dev'

https://gerrit.wikimedia.org/r/803903

Change 803903 merged by Btullis:

[operations/puppet@production] Pin Cassandra 3.11.13 as 'dev'

https://gerrit.wikimedia.org/r/803903

Unfortunately, there is an error from puppet after merging.

Info: Applying configuration version '(ed759d43e5) Btullis - Pin Cassandra 3.11.13 as 'dev''
Notice: /Stage[main]/Cassandra/Service[cassandra]/ensure: ensure changed 'running' to 'stopped' (corrective)
Error: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install cassandra-tools=3.11.13' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Version '3.11.13' for 'cassandra-tools' was not found
Error: /Stage[main]/Cassandra/Package[cassandra-tools]/ensure: change from 'purged' to '3.11.13' failed: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install cassandra-tools=3.11.13' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Version '3.11.13' for 'cassandra-tools' was not found (corrective)
Info: Class[Cassandra]: Unscheduling all events on Class[Cassandra]

It looks like we now need the cassandra-tools package as well, but version 3.11.13 of it isn't available in the repository.

btullis@aqs1010:~$ apt-cache policy cassandra cassandra-tools
cassandra:
  Installed: 3.11.13
  Candidate: 3.11.13
  Version table:
 *** 3.11.13 1001
       1001 http://apt.wikimedia.org/wikimedia buster-wikimedia/component/cassandradev amd64 Packages
        100 /var/lib/dpkg/status
     2.2.6-wmf5 1001
       1001 http://apt.wikimedia.org/wikimedia buster-wikimedia/main amd64 Packages
cassandra-tools:
  Installed: (none)
  Candidate: 3.11.11
  Version table:
     3.11.11 1001
       1001 http://apt.wikimedia.org/wikimedia buster-wikimedia/component/cassandradev amd64 Packages

There is a 3.11.13 cassandra-tools package, but it seems to have missed the upload; I've reopened T309878: Import Debian package of Cassandra 3.11.13 as 'dev' version.
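A pre-merge check along these lines might have caught the mismatch before puppet tried to install the package. This is only a minimal sketch: it parses the `apt-cache policy` output quoted above and reports which packages lack the target version in their version table. The function names (`available_versions`, `missing_packages`) are hypothetical and not part of any existing tooling.

```python
import re

# Output copied from the `apt-cache policy` run quoted above.
SAMPLE = """\
cassandra:
  Installed: 3.11.13
  Candidate: 3.11.13
  Version table:
 *** 3.11.13 1001
       1001 http://apt.wikimedia.org/wikimedia buster-wikimedia/component/cassandradev amd64 Packages
        100 /var/lib/dpkg/status
     2.2.6-wmf5 1001
       1001 http://apt.wikimedia.org/wikimedia buster-wikimedia/main amd64 Packages
cassandra-tools:
  Installed: (none)
  Candidate: 3.11.11
  Version table:
     3.11.11 1001
       1001 http://apt.wikimedia.org/wikimedia buster-wikimedia/component/cassandradev amd64 Packages
"""

# A version-table entry is "<version> <priority>", optionally prefixed by "***".
VERSION_LINE = re.compile(r"^(?:\*\*\*\s+)?(\S+)\s+-?\d+$")


def available_versions(policy_text):
    """Map each package name to the versions listed in its version table."""
    versions = {}
    current = None
    in_table = False
    for line in policy_text.splitlines():
        if line and not line[0].isspace() and line.endswith(":"):
            # Unindented "name:" line starts a new package section.
            current = versions.setdefault(line[:-1], [])
            in_table = False
        elif line.strip() == "Version table:":
            in_table = True
        elif in_table and current is not None:
            match = VERSION_LINE.match(line.strip())
            if match:
                current.append(match.group(1))
    return versions


def missing_packages(policy_text, packages, target):
    """Return the packages whose version table does not offer `target`."""
    versions = available_versions(policy_text)
    return [pkg for pkg in packages if target not in versions.get(pkg, [])]


if __name__ == "__main__":
    print(missing_packages(SAMPLE, ["cassandra", "cassandra-tools"], "3.11.13"))
```

Run against the output above, this flags cassandra-tools as the only package without a 3.11.13 entry, which matches the puppet error.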

Mentioned in SAL (#wikimedia-operations) [2022-06-08T18:38:11Z] <urandom> upgrading aqs1010.eqiad.wmnet to Cassandra 3.11.13 (canary) -- T309896

Change 803978 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] Set HOSTNAME as a custom Cassandra logback field

https://gerrit.wikimedia.org/r/803978

Change 803978 merged by Herron:

[operations/puppet@production] Set HOSTNAME as a custom Cassandra logback field

https://gerrit.wikimedia.org/r/803978

Mentioned in SAL (#wikimedia-operations) [2022-06-08T19:58:51Z] <urandom> restarting Cassandra, aqs1010-{a,b}, to apply logback work-around -- T309896

Change 815822 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] Merge Cassandra 3.11.13 configuration changes

https://gerrit.wikimedia.org/r/815822

Change 815822 merged by MVernon:

[operations/puppet@production] Merge Cassandra 3.11.13 configuration changes

https://gerrit.wikimedia.org/r/815822

Mentioned in SAL (#wikimedia-operations) [2022-07-21T14:39:25Z] <mvernon@cumin1001> START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: merging upstream config changes T309896 - mvernon@cumin1001

Mentioned in SAL (#wikimedia-operations) [2022-07-21T16:58:04Z] <mvernon@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: merging upstream config changes T309896 - mvernon@cumin1001

Change 816719 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] hieradata: make sessionstore2001 a 3.11.13 canary

https://gerrit.wikimedia.org/r/816719

Change 816719 merged by MVernon:

[operations/puppet@production] hieradata: make sessionstore2001 a 3.11.13 canary

https://gerrit.wikimedia.org/r/816719

Mentioned in SAL (#wikimedia-operations) [2022-07-25T14:38:07Z] <mvernon@cumin2002> START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2001.codfw.wmnet: restart cassandra on 3.11.13 canary T309896 - mvernon@cumin2002

Mentioned in SAL (#wikimedia-operations) [2022-07-25T14:44:19Z] <mvernon@cumin2002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2001.codfw.wmnet: restart cassandra on 3.11.13 canary T309896 - mvernon@cumin2002

Change 816805 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] Do not assign CASSANDRA_LOG_DIR from environment config

https://gerrit.wikimedia.org/r/816805

Change 817798 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] hieradata: move all of sessionstore to 3.11.13

https://gerrit.wikimedia.org/r/817798

Change 816805 merged by MVernon:

[operations/puppet@production] Do not assign CASSANDRA_LOG_DIR from environment config

https://gerrit.wikimedia.org/r/816805

Mentioned in SAL (#wikimedia-operations) [2022-07-27T15:46:08Z] <urandom> restarting Cassandra, sessionstore2001, to restore on-disk logging -- T309896

Mentioned in SAL (#wikimedia-operations) [2022-07-27T15:51:38Z] <urandom> rolling Cassandra restart, aqs2001-2012, to restore on-disk logging -- T309896

Mentioned in SAL (#wikimedia-operations) [2022-07-27T16:31:10Z] <urandom> rolling Cassandra restart, aqs1010-1015, to restore on-disk logging -- T309896

Change 817798 merged by MVernon:

[operations/puppet@production] hieradata: move all of sessionstore to 3.11.13

https://gerrit.wikimedia.org/r/817798

Mentioned in SAL (#wikimedia-operations) [2022-07-28T14:46:05Z] <mvernon@cumin2002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: upgrade to 3.11.13 T309896 - mvernon@cumin2002

Mentioned in SAL (#wikimedia-operations) [2022-07-28T15:22:13Z] <mvernon@cumin2002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: upgrade to 3.11.13 T309896 - mvernon@cumin2002

Change 819062 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] hieradata: make restbase1016 a 3.11.13 canary

https://gerrit.wikimedia.org/r/819062

Change 819062 merged by MVernon:

[operations/puppet@production] hieradata: make restbase1016 a 3.11.13 canary

https://gerrit.wikimedia.org/r/819062

Mentioned in SAL (#wikimedia-operations) [2022-08-01T15:29:17Z] <mvernon@cumin1001> START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase T309896 - mvernon@cumin1001

Mentioned in SAL (#wikimedia-operations) [2022-08-01T15:39:38Z] <mvernon@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase T309896 - mvernon@cumin1001

Change 819578 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] Hieradata: move restbase prod to 3.11.13

https://gerrit.wikimedia.org/r/819578

Change 819578 merged by MVernon:

[operations/puppet@production] Hieradata: move restbase prod to 3.11.13

https://gerrit.wikimedia.org/r/819578

Mentioned in SAL (#wikimedia-operations) [2022-08-11T13:52:03Z] <mvernon@cumin2002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: upgrade to 3.11.13 T309896 - mvernon@cumin2002

Mentioned in SAL (#wikimedia-operations) [2022-08-11T16:30:21Z] <mvernon@cumin2002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: upgrade to 3.11.13 T309896 - mvernon@cumin2002

Mentioned in SAL (#wikimedia-operations) [2022-08-11T16:35:29Z] <mvernon@cumin2002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: upgrade to 3.11.13 T309896 - mvernon@cumin2002

Mentioned in SAL (#wikimedia-operations) [2022-08-11T19:20:50Z] <mvernon@cumin2002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: upgrade to 3.11.13 T309896 - mvernon@cumin2002
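For planning future rolling upgrades, the START/END pairs above give a rough per-cluster duration. The following is a minimal sketch that pairs the entries by their "nodes matching" target and subtracts the timestamps. The regular expression assumes the SAL line layout seen in this task, and `run_durations` is a hypothetical helper, not part of the cookbook tooling.

```python
import re
from datetime import datetime

# The four RESTBase entries from the timeline above, trimmed to start at the timestamp.
SAL_LINES = [
    "[2022-08-11T13:52:03Z] <mvernon@cumin2002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: upgrade to 3.11.13 T309896 - mvernon@cumin2002",
    "[2022-08-11T16:30:21Z] <mvernon@cumin2002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: upgrade to 3.11.13 T309896 - mvernon@cumin2002",
    "[2022-08-11T16:35:29Z] <mvernon@cumin2002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: upgrade to 3.11.13 T309896 - mvernon@cumin2002",
    "[2022-08-11T19:20:50Z] <mvernon@cumin2002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: upgrade to 3.11.13 T309896 - mvernon@cumin2002",
]

# Assumed layout: "[timestamp] <user> START|END ... for nodes matching <target>: <reason>"
ENTRY = re.compile(
    r"\[(?P<ts>[^\]]+)\] <[^>]+> (?P<kind>START|END) .*?"
    r"for nodes matching (?P<target>\S+): "
)


def run_durations(lines):
    """Return {target: timedelta} for each START entry followed by an END entry."""
    started = {}
    durations = {}
    for line in lines:
        match = ENTRY.search(line)
        if not match:
            continue  # not a cookbook START/END entry
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%SZ")
        target = match.group("target")
        if match.group("kind") == "START":
            started[target] = ts
        elif target in started:
            durations[target] = ts - started.pop(target)
    return durations


if __name__ == "__main__":
    for target, elapsed in run_durations(SAL_LINES).items():
        print(target, elapsed)
```

For the two RESTBase runs this gives about 2h38m for codfw and 2h45m for eqiad.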

MatthewVernon claimed this task.
MatthewVernon updated the task description.
MatthewVernon updated the task description.