db2094:3318 (sanitarium on codfw) needs recloning
Closed, Resolved (Public)

Description

db2094:3318 crashed and needs recloning.
The procedure would be:

  • stop mariadb@s8 and delete /srv/sqldata.s8
  • stop slave on its master, copy its coordinates, stop mysql and copy the content
  • Once the data is cloned run: systemctl set-environment MYSQLD_OPTS="--skip-slave-start" and start mariadb@s8
  • stop slave; reset slave all;
  • mysql_upgrade
  • Run redact_sanitarium to sanitize wikidatawiki
  • Configure replication
  • Start replication
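
The local steps above can be outlined as a dry run. This is a minimal sketch, not the actual runbook: it only prints commands and executes nothing. The function name print_reclone_plan is hypothetical, `rm -rf` is one assumed way to do the "delete" step, and socket_path is a placeholder for the instance's socket, as used elsewhere in this task.

```shell
#!/bin/sh
# Dry-run outline of the recloning steps on the replica.
# Prints the commands in order; nothing is executed.
# print_reclone_plan is a hypothetical helper, not an existing tool.
print_reclone_plan() {
    section="$1"   # replication section, e.g. s8
    datadir="$2"   # instance data directory, e.g. /srv/sqldata.s8
    cat <<EOF
systemctl stop mariadb@${section}
rm -rf ${datadir}
# copy the data directory from the master here
systemctl set-environment MYSQLD_OPTS="--skip-slave-start"
systemctl start mariadb@${section}
mysql -S socket_path -e "stop slave; reset slave all;"
mysql_upgrade -S socket_path
EOF
}

print_reclone_plan s8 /srv/sqldata.s8
```

Sanitization, replication setup and replication start follow once mysql_upgrade has finished.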

Event Timeline

Marostegui triaged this task as Medium priority. May 27 2021, 5:51 AM
Marostegui moved this task from Triage to Ready on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2021-05-27T10:26:59Z] <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Rebuilding db2094:s8 from db2082 T283793

Mentioned in SAL (#wikimedia-operations) [2021-05-27T10:27:03Z] <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Rebuilding db2094:s8 from db2082 T283793

db2082 is db2094:s8's master:

root@db2082.codfw.wmnet[(none)]> stop slave;
Query OK, 0 rows affected (0.036 sec)

root@db2082.codfw.wmnet[(none)]> show master status;
+-------------------+-----------+--------------+------------------+
| File              | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+-----------+--------------+------------------+
| db2082-bin.007179 | 820956366 |              |                  |
+-------------------+-----------+--------------+------------------+
1 row in set (0.032 sec)
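
These coordinates are what the "Configure replication" step needs later. As a minimal sketch, the statement could be built as shown here. The helper name change_master_sql is hypothetical, and the replication user, password, port and SSL options are deliberately left out and would have to be added for a real setup.

```shell
#!/bin/sh
# Build a CHANGE MASTER TO statement from recorded binlog coordinates.
# change_master_sql is a hypothetical helper; it only prints SQL.
# Credentials, port and SSL settings are intentionally omitted.
change_master_sql() {
    master_host="$1"   # master hostname
    log_file="$2"      # File column from SHOW MASTER STATUS
    log_pos="$3"       # Position column from SHOW MASTER STATUS
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$master_host" "$log_file" "$log_pos"
}

change_master_sql db2082.codfw.wmnet db2082-bin.007179 820956366
```

The printed statement would be run on the db2094:s8 instance before starting replication.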

Running:
sudo transfer.py --type file --no-compress --no-encrypt --no-checksum db2082.codfw.wmnet:/srv/sqldata db2094.codfw.wmnet:/srv/sqldata.s8

Status:

  • Data copy from db2082 completed.
  • mysql_upgrade ran
  • redact_sanitarium.sh currently running.

redact_sanitarium.sh completed, and a quick check showed it had been successful.

db2082 and db2094:s8 are now both up and catching up on replication.

See email - s8 reported some tables that need to be dropped

Done.

Looking at redact_sanitarium.sh, it doesn't do anything with private_tables. I guess that means this is a step that has to be run manually?

Normally what we do is: redact_sanitarium.sh -d wikidatawiki -S socket_path | mysql -S socket_path wikidatawiki

My point is that this will only act on modules/role/files/mariadb/filtered_tables.txt, which isn't what's relevant here. check_private_data found tables that shouldn't exist: the ones listed in manifests/realm.pp:private_tables.
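
To illustrate the kind of check involved, here is a minimal sketch that builds a query listing which of a given set of tables exist in a database. It is not how check_private_data works internally. The helper name private_tables_query and the table names in the example call are hypothetical; the real list lives in private_tables.

```shell
#!/bin/sh
# Print a query that lists which of the given tables exist in a database.
# private_tables_query is a hypothetical helper; it only prints SQL.
private_tables_query() {
    db="$1"
    shift
    list=""
    for t in "$@"; do
        # Append each table name as a quoted, comma-separated SQL literal.
        list="${list:+$list, }'$t'"
    done
    printf "SELECT table_name FROM information_schema.tables WHERE table_schema = '%s' AND table_name IN (%s);\n" \
        "$db" "$list"
}

# Example table names are placeholders, not the actual private_tables entries.
private_tables_query wikidatawiki example_private_a example_private_b
```

Any table the query returns would be a candidate for dropping, after confirming it against private_tables.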

Ah yes, I misunderstood you. Yes, indeed, that's why we run check_private_data after data sanitization on new wikis, so we can also get those private tables deleted.

Ah hah. Ok, now I know what I forgot to do. Thanks!

To avoid redundancy, I think we should deprecate "redact_sanitarium.sh" and use a single script (check_private_data.py) for both checking and redacting. check_private_data.py can do almost everything that redact_sanitarium.sh can (including returning SQL to be executed), except for the triggers, while redact_sanitarium.sh cannot handle private databases or tables.