The Great Clean Up of Mailman2
Closed, Resolved (Public)

Description

Once T52864: Upgrade GNU Mailman from 2.1 to Mailman3 is done, we need to clean up lots of stuff:

  • Refactor puppet.
  • Disable mailman2
  • Remove the apache auth system and cut access to /private/ entirely
  • Remove emails from old archives (Note: the mm3 upgrade occasionally, in roughly 0.01% of cases, fails to upgrade a mail. Should we clean those up manually and delete them? Should we keep them? How do we find them?)
  • Delete all mm2 mail configs (with rmlist?)

Details

Repo               Branch      Lines +/-
operations/puppet  production  +15 -15
labs/private       master      +11 -11
operations/puppet  production  +18 -18
operations/puppet  production  +178 -165
operations/puppet  production  +31 -57
operations/puppet  production  +0 -180
operations/puppet  production  +0 -25
operations/puppet  production  +3 -5
operations/puppet  production  +2 -2
operations/puppet  production  +0 -190
operations/puppet  production  +0 -12
operations/puppet  production  +2 -8 K
operations/puppet  production  +6 -64
operations/puppet  production  +0 -9
operations/puppet  production  +4 -36
operations/puppet  production  +2 -18
operations/puppet  production  +1 -22
operations/puppet  production  +0 -696
operations/puppet  production  +13 -13
operations/puppet  production  +5 -0
operations/puppet  production  +1 -1

Event Timeline

@jcrespo before we embark upon this cleanup, can we mark one of the backups of var-lib-mailman to be kept long term? Per https://wikitech.wikimedia.org/wiki/Bacula#Retention it seems the normal backups are only kept for 3 months - could we have one kept for a year or two? We're not actually ready for this yet, I just wanted to check if it was possible.

We cannot mark an existing backup to be kept long term. But we can generate new backups on the archive schedule/pool, which will be retained for 5 years. If it is an old backup, we can recover it and back it up again with this new retention in the archive pool: https://wikitech.wikimedia.org/wiki/Bacula#Configured_Pools

Ack, we haven't deleted anything yet so creating a new backup should work. I'll ping you again once we're ready for that, thanks!

I think we are ready. Assigning as requested in IRC.

Change 694210 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mailman2: Generate a 5-year retention Archive backups of mailman

https://gerrit.wikimedia.org/r/694210

Change 694210 merged by Jcrespo:

[operations/puppet@production] mailman2: Generate a 5-year retention Archive backups of mailman

https://gerrit.wikimedia.org/r/694210

Change 694354 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mailman2: Disable temporarily production mailman2 backups

https://gerrit.wikimedia.org/r/694354

Change 694354 merged by Jcrespo:

[operations/puppet@production] mailman2: Disable temporarily production mailman2 backups

https://gerrit.wikimedia.org/r/694354

Ready when you are:

Run Backup job
JobName:  lists1001.wikimedia.org-Weekly-Mon-Archive-var-lib-mailman
Level:    Full
Client:   lists1001.wikimedia.org-fd
FileSet:  var-lib-mailman
Pool:     Archive (From Job resource)
Storage:  backup1001-FileStorageArchive (From Pool resource)
When:     2021-05-25 09:52:03
Priority: 10
OK to run? (yes/mod/no):

This is now scheduled, I will monitor and give a heads up when it finishes.

20 gigabytes backed up so far (about 1/6th). It is normal for a full backup to take a long time there because of the many small files.

20.83 G lists1001.wikimedia.org-Weekly-Mon-Archive-var-lib-mailman is running

You can track the progress at: https://grafana.wikimedia.org/d/413r2vbWk/bacula?orgId=1&var-site=eqiad&var-job=lists1001.wikimedia.org-Weekly-Mon-Archive-var-lib-mailman

It will likely take until Wednesday morning (GMT) to finish (2 337 957 files / 61.73 G so far).

jcrespo moved this task from Ready to Done on the Data-Persistence-Backup board.

The backup finished, JobId=338470:

Elapsed time:           14 hours 53 mins 5 secs
SD Files Written:       6,117,027
SD Bytes Written:       155,327,853,326 (155.3 GB)
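
As a rough sanity check on the "many small files" explanation, the job summary figures above can be turned into throughput and average file size. This is only back-of-the-envelope arithmetic on the three reported numbers; it assumes the elapsed time covers the whole transfer.

```python
# Figures copied from the Bacula job summary above (JobId=338470).
elapsed_seconds = 14 * 3600 + 53 * 60 + 5
files_written = 6_117_027
bytes_written = 155_327_853_326

# Derived rates; these are estimates, not values reported by Bacula.
bytes_per_second = bytes_written / elapsed_seconds
files_per_second = files_written / elapsed_seconds
average_file_size = bytes_written / files_written

print(f"throughput:        {bytes_per_second / 1e6:.2f} MB/s")
print(f"file rate:         {files_per_second:.0f} files/s")
print(f"average file size: {average_file_size / 1e3:.1f} kB")
```

An average of a few tens of kilobytes per file is consistent with per-file overhead, rather than raw bandwidth, dominating the run time.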

Do you want to do a (partial) test recovery before deleting, to prove you can restore an arbitrary subset of files?

Could you give me some meaningful restore operation (a subdir)? I am guessing that recovering everything will not be wanted because of the time and space available. I can recover it elsewhere (not in place) and then you can compare it with the existing data. E.g. files for a list you are about to remove?

Something like /var/lib/mailman/archives/private/cloud-announce.mbox/cloud-announce.mbox and /var/lib/mailman/lists/cloud-announce/config.pck would be great.

The recovery has been scheduled as requested. FYI, there were other files inside /var/lib/mailman/lists/cloud-announce/, but they were not marked for recovery.

The files recovered should appear soon (with full path) inside:
/var/tmp/bacula-restores

In a real case I would kill the ongoing backup to force the recovery to start immediately, but as this is a test I will let the ongoing backups finish first, and the restore should eventually execute.

It should have run already, can you check?

Termination:            Restore OK
root@lists1001:/var/tmp/bacula-restores/var/lib/mailman/archives/private/cloud-announce.mbox# cmp cloud-announce.mbox /var/lib/mailman/archives/private/cloud.mbox/cloud.mbox 
cloud-announce.mbox /var/lib/mailman/archives/private/cloud.mbox/cloud.mbox differ: byte 6, line 1
root@lists1001:/var/tmp/bacula-restores/var/lib/mailman/archives/private/cloud-announce.mbox# cmp cloud-announce.mbox /var/lib/mailman/archives/private/cloud-announce.mbox/cloud-announce.mbox 
root@lists1001:/var/tmp/bacula-restores/var/lib/mailman/archives/private/cloud-announce.mbox#

Archive looks good but config differs:

root@lists1001:/var/tmp/bacula-restores/var/lib/mailman/lists/cloud-announce# cmp config.pck /var/lib/mailman/lists/cloud-announce/config.pck
config.pck /var/lib/mailman/lists/cloud-announce/config.pck differ: byte 46, line 2

Can't say why
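
The manual `cmp` checks above could be generalised to every file under the restore directory. The following is a minimal sketch, not what was run on lists1001; `compare_restore` is a hypothetical helper, and it assumes the restore tree mirrors full paths under a root such as /var/tmp/bacula-restores.

```python
import filecmp
from pathlib import Path


def compare_restore(restore_root, live_root="/"):
    """Compare every restored file with its live counterpart.

    Returns {relative_path: "identical" | "differs" | "missing"}.
    Hypothetical helper: assumes restored files keep their full
    original path below restore_root.
    """
    restore_root = Path(restore_root)
    live_root = Path(live_root)
    results = {}
    for restored in sorted(p for p in restore_root.rglob("*") if p.is_file()):
        relative = restored.relative_to(restore_root)
        live = live_root / relative
        if not live.is_file():
            results[str(relative)] = "missing"
        # shallow=False forces a byte-by-byte comparison, like cmp.
        elif filecmp.cmp(restored, live, shallow=False):
            results[str(relative)] = "identical"
        else:
            results[str(relative)] = "differs"
    return results


if __name__ == "__main__":
    for path, status in compare_restore("/var/tmp/bacula-restores").items():
        print(f"{status:10} {path}")
```

A "differs" result only says the bytes changed since the backup ran; it does not say whether the change is meaningful.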

The backups ran yesterday, could it have changed since then? Is there a human-readable way to see what changed?

Technically no. We disabled the list a while ago and HTTP requests to it are now redirected to mm3, but a change is still possible, for example due to the periodic clean up of held messages, or other things. Can't say for sure. I tried cmp -l file1.bin file2.bin | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}' but that didn't give me anything useful TBH.
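
For reference, a rough Python equivalent of that `cmp -l` pipeline is sketched below. `differing_bytes` is a hypothetical helper; it reads both files fully into memory, which is fine for a small config.pck but not for large mbox files.

```python
from itertools import zip_longest
from pathlib import Path


def differing_bytes(path_a, path_b, limit=20):
    """Return up to `limit` (offset, byte_a, byte_b) tuples.

    Offsets are 1-based, like `cmp -l`. A byte is None where one
    file is shorter than the other.
    """
    data_a = Path(path_a).read_bytes()
    data_b = Path(path_b).read_bytes()
    diffs = []
    for offset, (a, b) in enumerate(zip_longest(data_a, data_b), start=1):
        if a != b:
            diffs.append((offset, a, b))
            if len(diffs) >= limit:
                break
    return diffs


def format_diff(offset, a, b):
    """Render one difference as hex, e.g. '0000002E 4B 71'."""
    def show(byte):
        return "--" if byte is None else f"{byte:02X}"
    return f"{offset:08X} {show(a)} {show(b)}"
```

As with the gawk output, this shows where the bytes differ but not what the difference means for a pickled object.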

After some *python2* magic (sys.path.append) and copy and pasting from stackoverflow, here's a diff of the two files:
{P16243}

and it looks fine to me. Maybe the byte difference is from internal changes to the serialization format, like set() or dictionary order changing when mailman opened and closed the file?
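
To illustrate that guess: two pickles can differ byte-for-byte while holding equal data. The sketch below uses plain dictionaries in Python 3 with made-up values; per the comment above, the real config.pck needed Python 2 with Mailman importable, and `diff_dicts` is a hypothetical helper rather than the script behind the paste.

```python
import pickle

MISSING = "<missing>"  # display marker for a key absent on one side


def diff_dicts(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ."""
    absent = object()
    changed = {}
    for key in sorted(set(old) | set(new), key=repr):
        a = old.get(key, absent)
        b = new.get(key, absent)
        if a != b:
            changed[key] = (
                MISSING if a is absent else a,
                MISSING if b is absent else b,
            )
    return changed


# Illustrative values only: same content, different key insertion order.
restored = {"real_name": "cloud-announce", "max_message_size": 40}
live = {"max_message_size": 40, "real_name": "cloud-announce"}

same_bytes = pickle.dumps(restored) == pickle.dumps(live)
print("byte-identical:", same_bytes)
print("semantic diff: ", diff_dicts(restored, live))
```

An empty semantic diff alongside differing bytes is what a pure ordering change would look like.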

Then it's good. Let's clean up 🧹

This helps clarify that it was certainly not some bit-flipping-on-the-wire kind of corruption in our backup system, which would impact all Bacula jobs. Thanks for looking into it. My guess is that some global changes could impact the local config. There are a few criticisms to be made of Bacula, but so far, in terms of storage and retrieval, it has been very reliable. Thank you.

Change 697631 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/puppet@production] mailman: Absent mm2 script files

https://gerrit.wikimedia.org/r/697631

Change 697632 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/puppet@production] mailman: Drop mm2 scripts

https://gerrit.wikimedia.org/r/697632

Mentioned in SAL (#wikimedia-operations) [2021-06-01T17:23:42Z] <Amir1> starting deletion of mbox files on lists1001 for mailman2, first reading-web-team.mbox, then smallest lists (T282303)

Change 697634 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/puppet@production] mailman: Absent configuration files of mailman2 and make package absent

https://gerrit.wikimedia.org/r/697634

Change 697635 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/puppet@production] mailman: Drop absented files and packages

https://gerrit.wikimedia.org/r/697635

Change 697637 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/puppet@production] backup: Drop mm2 exclude backups

https://gerrit.wikimedia.org/r/697637

Change 697638 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/puppet@production] mailman: Drop cgi in apache and access to private/

https://gerrit.wikimedia.org/r/697638

Change 697631 merged by Legoktm:

[operations/puppet@production] mailman: Absent mm2 script files and their systemd timers

https://gerrit.wikimedia.org/r/697631

Change 697632 merged by Legoktm:

[operations/puppet@production] mailman: Drop mm2 scripts

https://gerrit.wikimedia.org/r/697632

Change 697638 merged by Legoktm:

[operations/puppet@production] mailman: Drop cgi in apache and access to private/

https://gerrit.wikimedia.org/r/697638

Change 697634 merged by Legoktm:

[operations/puppet@production] mailman: Absent configuration files of mailman2 and make package absent

https://gerrit.wikimedia.org/r/697634

Now that the mailman2 package is gone, if we need to unpickle a config file to look at it we'll need to install MM2 in a container locally or something. Not a huge issue, just something to keep in mind. It would've been an issue anyways when we switched to bullseye / a new VM.

Maybe with virtualenv?

For example, from the source code, but that'll be "fun".

Mentioned in SAL (#wikimedia-operations) [2021-06-05T15:21:21Z] <Amir1> delete mbox files of group D and E in mm2 (T282303)

Mentioned in SAL (#wikimedia-operations) [2021-06-05T16:16:11Z] <Amir1> deleting all private archives of mm2. All are inaccessible now (T282303)

Change 698306 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[operations/puppet@production] mailman: Drop lists3 role

https://gerrit.wikimedia.org/r/698306

Mentioned in SAL (#wikimedia-operations) [2021-06-09T02:56:43Z] <Amir1> clean up of the rest of mbox files (except arbcom) (T282303)

Change 697635 merged by Legoktm:

[operations/puppet@production] mailman: Drop absented files and packages

https://gerrit.wikimedia.org/r/697635

Change 698306 merged by Legoktm:

[operations/puppet@production] mailman: Drop lists3 role

https://gerrit.wikimedia.org/r/698306

Change 697637 merged by Legoktm:

[operations/puppet@production] backup: Simplify Mailman backups

https://gerrit.wikimedia.org/r/697637

Change 716077 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] mailman: Drop listinfo files

https://gerrit.wikimedia.org/r/716077

Change 716077 merged by Legoktm:

[operations/puppet@production] mailman: Drop listinfo files

https://gerrit.wikimedia.org/r/716077

Mentioned in SAL (#wikimedia-operations) [2021-09-08T00:00:00Z] <legoktm> legoktm@lists1001:~$ sudo rm -rf /etc/mailman # cleanup as part of 4869d91b0be / T282303

Change 719484 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] mailman: Remove absented file definitions

https://gerrit.wikimedia.org/r/719484

Change 719484 merged by Legoktm:

[operations/puppet@production] mailman: Remove absented file definitions

https://gerrit.wikimedia.org/r/719484

Change 720374 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] mailman: Remove mailman2 config file

https://gerrit.wikimedia.org/r/720374

Change 720374 merged by Legoktm:

[operations/puppet@production] mailman: Remove mailman2 config file

https://gerrit.wikimedia.org/r/720374

Change 721811 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] snapshot: Change URL of xmldatadumps-l from mailman2 to mailman3

https://gerrit.wikimedia.org/r/721811

Change 721811 merged by ArielGlenn:

[operations/puppet@production] snapshot: Change URL of xmldatadumps-l from mailman2 to mailman3

https://gerrit.wikimedia.org/r/721811

Change 723673 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] admin: Deprecate mailman-admins group

https://gerrit.wikimedia.org/r/723673

Change 723674 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] mailman: More mailman2 clean ups

https://gerrit.wikimedia.org/r/723674

Change 723673 merged by Legoktm:

[operations/puppet@production] admin: Deprecate mailman-admins group

https://gerrit.wikimedia.org/r/723673

Change 723674 merged by Legoktm:

[operations/puppet@production] mailman: More mailman2 clean ups

https://gerrit.wikimedia.org/r/723674

Change 725435 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] mailman3: Drop profile::mailman3

https://gerrit.wikimedia.org/r/725435

Change 725436 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] mailman: Drop mailman module and move them to profile::lists

https://gerrit.wikimedia.org/r/725436

After these two patches we have two more things to do and we can call this done:

  • Split modules/profile/manifests/lists.pp for web, monitoring and ferm
  • Rename profile::mailman3 hiera variables.

Change 725435 merged by Giuseppe Lavagetto:

[operations/puppet@production] mailman3: Drop profile::mailman3

https://gerrit.wikimedia.org/r/725435

Change 725436 merged by Legoktm:

[operations/puppet@production] mailman: Drop mailman module and move them to profile::lists

https://gerrit.wikimedia.org/r/725436

Change 731286 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] lists: Split ferm and monitoring of profile::lists

https://gerrit.wikimedia.org/r/731286

Change 731286 merged by Ladsgroup:

[operations/puppet@production] lists: Split ferm and monitoring of profile::lists

https://gerrit.wikimedia.org/r/731286

Change 736866 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] mailman: rename public hiera keys

https://gerrit.wikimedia.org/r/736866

Change 736866 merged by Ladsgroup:

[operations/puppet@production] mailman: rename public hiera keys

https://gerrit.wikimedia.org/r/736866

Change 736873 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[labs/private@master] Rename mailman3 hiera keys

https://gerrit.wikimedia.org/r/736873

Change 736873 merged by Ladsgroup:

[labs/private@master] Rename mailman3 hiera keys

https://gerrit.wikimedia.org/r/736873

Change 736876 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] lists: Use the new private hiera keys

https://gerrit.wikimedia.org/r/736876

Change 736876 merged by Ladsgroup:

[operations/puppet@production] lists: Use the new private hiera keys

https://gerrit.wikimedia.org/r/736876

Ladsgroup claimed this task.

This is done. There is always more refactoring to do, but most of it is done.