
purgeParserCache.php should not take over 24 hours for its daily run
Closed, ResolvedPublic

Description

This follows up from T280605#7070345

Background

The purgeParserCache.php script is scheduled to run every night to prune ParserCache blobs that are past their expiry date. Our blobs generally have an expiry of 30 days, which means we expect each nightly run to remove the blobs that were stored roughly 30 days earlier.

Problem

As of writing, the purge script now takes over a week to complete a single run. This has numerous consequences:

  1. Because a run takes about 10 days, we are effectively having to accommodate blobs for up to 37 days rather than 30-31 days. This means more space is occupied by default.
  2. Each run takes longer than the last. This means the backlog is growing, and thus the space consumption as well. E.g. I expect we'll soon be accommodating blobs for 40 days, etc. There is no obvious end, other than a full disk.
  3. With the backlog growing, each run takes even longer, as it has to iterate over more blobs to purge. See point 2.

What we know

The script ran daily up until 19 April 2020 (last year):

  • 19 April 2020: Run took 1 day (the last time this happened).
  • 24 April 2020: Run took 3 days.
  • 26 Jun 2020: Run took 4 days.
  • 28 Nov 2020: Run took 6 days.
  • 13 Apr 2021: Run took 7 days.
  • 7 May 2021: Run was aborted after 5 days during which it completed 81% (2 May 01:51 - 7 May 05:23)
  • 13 May 2021: The current run is at 26%, which has taken 116 hours so far (May 7 05:23 - May 13 01:42). Extrapolating, I would expect ~446 hours in total, or about 18 days.

(Caveat: The script's percentage meter assumes all shards are equal, which they probably aren't.)

The script iterates over each parser cache database host, then over each parser cache table on that host, and then selects and deletes rows with a past expiry date in batches of 100 (code 1, code 2). It sleeps 500 ms between each such batch.
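
In pseudo-PHP, the flow is roughly as follows (a simplified sketch for orientation only; the real logic lives in SqlBagOStuff, and the helper and variable names here are made up):

// Rough sketch of the purge loop described above (helper names are illustrative).
$batchSize = 100;   // rows per SELECT/DELETE batch
$sleepMs   = 500;   // pause between batches, added for T150124

foreach ( $parserCacheServers as $server ) {        // pc1, pc2, pc3
    foreach ( $server->listTables() as $table ) {   // pc000 ... pc255
        do {
            // Select up to 100 expired rows, then delete those rows by key.
            $keys = selectExpiredKeys( $server, $table, $cutoff, $batchSize );
            if ( $keys ) {
                deleteKeys( $server, $table, $keys, $cutoff );
            }
            usleep( $sleepMs * 1000 );
        } while ( count( $keys ) === $batchSize );
    }
}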

This sleep was introduced in 2016 to mitigate T150124: Parsercache purging can create lag.

In 2016, the first mitigation used 100ms, which was then increased to 500ms.

Note that this task is not about product features adding more blobs to the ParserCache in general. I believe that, as it stands, the problem this task is about will continue to worsen even if our demand remains constant going forward. However, the increased demand of the last 12 months (see T280605) has pushed us over an invisible tipping point that has cascaded into this self-reinforcing situation.

Data

# [19:15 UTC] krinkle at mwmaint1002.eqiad.wmnet in /var/log/mediawiki/mediawiki_job_parser_cache_purging
# $ fgrep "Deleting" syslog.log
Apr 10 01:00:03 mwmaint1002 mediawiki_job_parser_cache_purging[146571]: Deleting objects expiring before 01:00, 10 April 2020
Apr 11 03:56:45 mwmaint1002 mediawiki_job_parser_cache_purging[240365]: Deleting objects expiring before 03:56, 11 April 2020
Apr 12 08:12:32 mwmaint1002 mediawiki_job_parser_cache_purging[72089]: Deleting objects expiring before 08:12, 12 April 2020
Apr 13 07:29:56 mwmaint1002 mediawiki_job_parser_cache_purging[2846]: Deleting objects expiring before 07:29, 13 April 2020
Apr 14 02:38:09 mwmaint1002 mediawiki_job_parser_cache_purging[210524]: Deleting objects expiring before 02:38, 14 April 2020
Apr 15 01:00:03 mwmaint1002 mediawiki_job_parser_cache_purging[83431]: Deleting objects expiring before 01:00, 15 April 2020
Apr 16 01:00:06 mwmaint1002 mediawiki_job_parser_cache_purging[66915]: Deleting objects expiring before 01:00, 16 April 2020
Apr 17 01:15:19 mwmaint1002 mediawiki_job_parser_cache_purging[12688]: Deleting objects expiring before 01:15, 17 April 2020
Apr 18 01:00:01 mwmaint1002 mediawiki_job_parser_cache_purging[204947]: Deleting objects expiring before 01:00, 18 April 2020
Apr 19 13:07:02 mwmaint1002 mediawiki_job_parser_cache_purging[182094]: Deleting objects expiring before 13:07, 19 April 2020
Apr 21 20:59:35 mwmaint1002 mediawiki_job_parser_cache_purging[250916]: Deleting objects expiring before 20:59, 21 April 2020
Apr 24 19:47:46 mwmaint1002 mediawiki_job_parser_cache_purging[101035]: Deleting objects expiring before 19:47, 24 April 2020
Apr 27 23:09:18 mwmaint1002 mediawiki_job_parser_cache_purging[206784]: Deleting objects expiring before 23:09, 27 April 2020
Apr 30 16:53:08 mwmaint1002 mediawiki_job_parser_cache_purging[205448]: Deleting objects expiring before 16:53, 30 April 2020
May 3 00:37:27 mwmaint1002 mediawiki_job_parser_cache_purging[42048]: Deleting objects expiring before 00:37, 3 May 2020
May 5 05:13:29 mwmaint1002 mediawiki_job_parser_cache_purging[20388]: Deleting objects expiring before 05:13, 5 May 2020
May 7 16:30:25 mwmaint1002 mediawiki_job_parser_cache_purging[19386]: Deleting objects expiring before 16:30, 7 May 2020
May 9 21:54:40 mwmaint1002 mediawiki_job_parser_cache_purging[259760]: Deleting objects expiring before 21:54, 9 May 2020
May 12 03:47:45 mwmaint1002 mediawiki_job_parser_cache_purging[13882]: Deleting objects expiring before 03:47, 12 May 2020
May 14 04:12:29 mwmaint1002 mediawiki_job_parser_cache_purging[164959]: Deleting objects expiring before 04:12, 14 May 2020
May 16 02:39:31 mwmaint1002 mediawiki_job_parser_cache_purging[32112]: Deleting objects expiring before 02:39, 16 May 2020
May 18 02:41:09 mwmaint1002 mediawiki_job_parser_cache_purging[196859]: Deleting objects expiring before 02:41, 18 May 2020
May 20 07:12:25 mwmaint1002 mediawiki_job_parser_cache_purging[167348]: Deleting objects expiring before 07:12, 20 May 2020
May 22 13:35:47 mwmaint1002 mediawiki_job_parser_cache_purging[186721]: Deleting objects expiring before 13:35, 22 May 2020
May 25 06:03:37 mwmaint1002 mediawiki_job_parser_cache_purging[149790]: Deleting objects expiring before 06:03, 25 May 2020
May 28 13:49:01 mwmaint1002 mediawiki_job_parser_cache_purging[190184]: Deleting objects expiring before 13:49, 28 May 2020
Jun 1 05:22:29 mwmaint1002 mediawiki_job_parser_cache_purging[41516]: Deleting objects expiring before 05:22, 1 June 2020
Jun 3 14:01:01 mwmaint1002 mediawiki_job_parser_cache_purging[161911]: Deleting objects expiring before 14:01, 3 June 2020
Jun 5 23:24:21 mwmaint1002 mediawiki_job_parser_cache_purging[19792]: Deleting objects expiring before 23:24, 5 June 2020
Jun 9 18:49:46 mwmaint1002 mediawiki_job_parser_cache_purging[228933]: Deleting objects expiring before 18:49, 9 June 2020
Jun 13 06:37:17 mwmaint1002 mediawiki_job_parser_cache_purging[33396]: Deleting objects expiring before 06:37, 13 June 2020
Jun 16 06:29:11 mwmaint1002 mediawiki_job_parser_cache_purging[84032]: Deleting objects expiring before 06:29, 16 June 2020
Jun 19 06:14:53 mwmaint1002 mediawiki_job_parser_cache_purging[92680]: Deleting objects expiring before 06:14, 19 June 2020
Jun 22 17:19:58 mwmaint1002 mediawiki_job_parser_cache_purging[88936]: Deleting objects expiring before 17:19, 22 June 2020
Jun 26 11:15:00 mwmaint1002 mediawiki_job_parser_cache_purging[156464]: Deleting objects expiring before 11:15, 26 June 2020
Jun 30 09:46:00 mwmaint1002 mediawiki_job_parser_cache_purging[205484]: Deleting objects expiring before 09:46, 30 June 2020
Jul 3 17:42:57 mwmaint1002 mediawiki_job_parser_cache_purging[221089]: Deleting objects expiring before 17:42, 3 July 2020
Jul 6 21:08:00 mwmaint1002 mediawiki_job_parser_cache_purging[119347]: Deleting objects expiring before 21:08, 6 July 2020
Jul 9 17:05:20 mwmaint1002 mediawiki_job_parser_cache_purging[36442]: Deleting objects expiring before 17:05, 9 July 2020
Jul 12 08:39:17 mwmaint1002 mediawiki_job_parser_cache_purging[128375]: Deleting objects expiring before 08:39, 12 July 2020
Jul 14 22:00:43 mwmaint1002 mediawiki_job_parser_cache_purging[100580]: Deleting objects expiring before 22:00, 14 July 2020
Jul 17 08:42:25 mwmaint1002 mediawiki_job_parser_cache_purging[24302]: Deleting objects expiring before 08:42, 17 July 2020
Jul 20 01:20:05 mwmaint1002 mediawiki_job_parser_cache_purging[64583]: Deleting objects expiring before 01:20, 20 July 2020
Jul 22 23:32:32 mwmaint1002 mediawiki_job_parser_cache_purging[253940]: Deleting objects expiring before 23:32, 22 July 2020
Jul 26 00:31:13 mwmaint1002 mediawiki_job_parser_cache_purging[40242]: Deleting objects expiring before 00:31, 26 July 2020
Jul 29 06:18:16 mwmaint1002 mediawiki_job_parser_cache_purging[191576]: Deleting objects expiring before 06:18, 29 July 2020
Aug 1 08:44:23 mwmaint1002 mediawiki_job_parser_cache_purging[29297]: Deleting objects expiring before 08:44, 1 August 2020
Aug 4 10:56:00 mwmaint1002 mediawiki_job_parser_cache_purging[173135]: Deleting objects expiring before 10:56, 4 August 2020
Aug 7 22:14:41 mwmaint1002 mediawiki_job_parser_cache_purging[39261]: Deleting objects expiring before 22:14, 7 August 2020
Aug 11 06:58:47 mwmaint1002 mediawiki_job_parser_cache_purging[258229]: Deleting objects expiring before 06:58, 11 August 2020
Aug 14 07:15:29 mwmaint1002 mediawiki_job_parser_cache_purging[54964]: Deleting objects expiring before 07:15, 14 August 2020
Aug 17 06:40:56 mwmaint1002 mediawiki_job_parser_cache_purging[100453]: Deleting objects expiring before 06:40, 17 August 2020
Aug 20 01:53:44 mwmaint1002 mediawiki_job_parser_cache_purging[231161]: Deleting objects expiring before 01:53, 20 August 2020
Aug 22 22:47:23 mwmaint1002 mediawiki_job_parser_cache_purging[185114]: Deleting objects expiring before 22:47, 22 August 2020
Aug 25 20:14:44 mwmaint1002 mediawiki_job_parser_cache_purging[175794]: Deleting objects expiring before 20:14, 25 August 2020
Aug 28 21:03:15 mwmaint1002 mediawiki_job_parser_cache_purging[38037]: Deleting objects expiring before 21:03, 28 August 2020
Sep 1 04:16:18 mwmaint1002 mediawiki_job_parser_cache_purging[254032]: Deleting objects expiring before 04:16, 1 September 2020
Oct 28 01:00:07 mwmaint1002 mediawiki_job_parser_cache_purging[85535]: Deleting objects expiring before 01:00, 28 October 2020
Nov 1 13:27:46 mwmaint1002 mediawiki_job_parser_cache_purging[111100]: Deleting objects expiring before 13:27, 1 November 2020
Nov 5 03:02:09 mwmaint1002 mediawiki_job_parser_cache_purging[125718]: Deleting objects expiring before 03:02, 5 November 2020
Nov 8 06:52:23 mwmaint1002 mediawiki_job_parser_cache_purging[210463]: Deleting objects expiring before 06:52, 8 November 2020
Nov 11 04:04:49 mwmaint1002 mediawiki_job_parser_cache_purging[46811]: Deleting objects expiring before 04:04, 11 November 2020
Nov 13 18:05:08 mwmaint1002 mediawiki_job_parser_cache_purging[23469]: Deleting objects expiring before 18:05, 13 November 2020
Nov 16 08:24:39 mwmaint1002 mediawiki_job_parser_cache_purging[25291]: Deleting objects expiring before 08:24, 16 November 2020
Nov 18 11:56:21 mwmaint1002 mediawiki_job_parser_cache_purging[130565]: Deleting objects expiring before 11:56, 18 November 2020
Nov 21 18:06:30 mwmaint1002 mediawiki_job_parser_cache_purging[145145]: Deleting objects expiring before 18:06, 21 November 2020
Nov 27 06:01:34 mwmaint1002 mediawiki_job_parser_cache_purging[149815]: Deleting objects expiring before 06:01, 27 November 2020
Dec 3 06:56:15 mwmaint1002 mediawiki_job_parser_cache_purging[253031]: Deleting objects expiring before 06:56, 3 December 2020
Dec 9 03:39:20 mwmaint1002 mediawiki_job_parser_cache_purging[144320]: Deleting objects expiring before 03:39, 9 December 2020
Dec 14 05:59:59 mwmaint1002 mediawiki_job_parser_cache_purging[111951]: Deleting objects expiring before 05:59, 14 December 2020
Dec 18 08:23:34 mwmaint1002 mediawiki_job_parser_cache_purging[49615]: Deleting objects expiring before 08:23, 18 December 2020
Dec 21 09:08:16 mwmaint1002 mediawiki_job_parser_cache_purging[149733]: Deleting objects expiring before 09:08, 21 December 2020
Dec 24 17:45:30 mwmaint1002 mediawiki_job_parser_cache_purging[13185]: Deleting objects expiring before 17:45, 24 December 2020
Dec 27 23:09:22 mwmaint1002 mediawiki_job_parser_cache_purging[236394]: Deleting objects expiring before 23:09, 27 December 2020
Dec 31 04:37:15 mwmaint1002 mediawiki_job_parser_cache_purging[209639]: Deleting objects expiring before 04:37, 31 December 2020
Jan 3 10:12:26 mwmaint1002 mediawiki_job_parser_cache_purging[9945]: Deleting objects expiring before 10:12, 3 January 2021
Jan 6 22:31:39 mwmaint1002 mediawiki_job_parser_cache_purging[145468]: Deleting objects expiring before 22:31, 6 January 2021
Jan 10 11:40:57 mwmaint1002 mediawiki_job_parser_cache_purging[37493]: Deleting objects expiring before 11:40, 10 January 2021
Jan 14 08:21:18 mwmaint1002 mediawiki_job_parser_cache_purging[104818]: Deleting objects expiring before 08:21, 14 January 2021
Jan 18 10:04:29 mwmaint1002 mediawiki_job_parser_cache_purging[201674]: Deleting objects expiring before 10:04, 18 January 2021
Jan 23 02:11:11 mwmaint1002 mediawiki_job_parser_cache_purging[131431]: Deleting objects expiring before 02:11, 23 January 2021
Jan 28 03:31:21 mwmaint1002 mediawiki_job_parser_cache_purging[17858]: Deleting objects expiring before 03:31, 28 January 2021
Feb 2 01:50:58 mwmaint1002 mediawiki_job_parser_cache_purging[155799]: Deleting objects expiring before 01:50, 2 February 2021
Feb 7 02:45:33 mwmaint1002 mediawiki_job_parser_cache_purging[99534]: Deleting objects expiring before 02:45, 7 February 2021
Feb 12 01:00:02 mwmaint1002 mediawiki_job_parser_cache_purging[54460]: Deleting objects expiring before 01:00, 12 February 2021
Feb 17 08:06:44 mwmaint1002 mediawiki_job_parser_cache_purging[109741]: Deleting objects expiring before 08:06, 17 February 2021
Feb 22 07:49:48 mwmaint1002 mediawiki_job_parser_cache_purging[213157]: Deleting objects expiring before 07:49, 22 February 2021
Feb 27 15:09:30 mwmaint1002 mediawiki_job_parser_cache_purging[334]: Deleting objects expiring before 15:09, 27 February 2021
Mar 5 00:05:42 mwmaint1002 mediawiki_job_parser_cache_purging[7982]: Deleting objects expiring before 00:05, 5 March 2021
Mar 10 15:46:03 mwmaint1002 mediawiki_job_parser_cache_purging[66538]: Deleting objects expiring before 15:46, 10 March 2021
Mar 16 19:45:27 mwmaint1002 mediawiki_job_parser_cache_purging[100664]: Deleting objects expiring before 19:45, 16 March 2021
Mar 23 09:25:13 mwmaint1002 mediawiki_job_parser_cache_purging[120833]: Deleting objects expiring before 09:25, 23 March 2021
Mar 30 08:56:02 mwmaint1002 mediawiki_job_parser_cache_purging[92844]: Deleting objects expiring before 08:56, 30 March 2021
Apr 6 02:31:55 mwmaint1002 mediawiki_job_parser_cache_purging[185950]: Deleting objects expiring before 02:31, 6 April 2021
Apr 13 00:41:29 mwmaint1002 mediawiki_job_parser_cache_purging[17162]: Deleting objects expiring before 00:41, 13 April 2021
Apr 19 18:08:28 mwmaint1002 mediawiki_job_parser_cache_purging[151047]: Deleting objects expiring before 18:08, 19 April 2021
Apr 26 02:20:46 mwmaint1002 mediawiki_job_parser_cache_purging[90978]: Deleting objects expiring before 02:20, 26 April 2021
May 2 01:51:04 mwmaint1002 mediawiki_job_parser_cache_purging[216627]: Deleting objects expiring before 01:51, 2 May 2021
May 7 05:23:37 mwmaint1002 mediawiki_job_parser_cache_purging[56878]: Deleting objects expiring before 05:23, 16 May 2021

kormat@mwmaint1002:/var/log/mediawiki/mediawiki_job_parser_cache_purging(0:0)$ grep -E 'Deleting|Done' syslog.log
...
Jun 6 06:04:16 mwmaint1002 mediawiki_job_parser_cache_purging[99066]: Deleting objects expiring before Sun, 06 Jun 2021 06:04:16 GMT
Jun 8 00:23:24 mwmaint1002 mediawiki_job_parser_cache_purging[99066]: Done
Jun 8 00:23:24 mwmaint1002 mediawiki_job_parser_cache_purging[165238]: Deleting objects expiring before Tue, 08 Jun 2021 00:23:24 GMT
Jun 9 04:15:58 mwmaint1002 mediawiki_job_parser_cache_purging[165238]: Done
Jun 9 04:15:59 mwmaint1002 mediawiki_job_parser_cache_purging[77891]: Deleting objects expiring before Wed, 09 Jun 2021 04:15:59 GMT
Jun 9 19:40:29 mwmaint1002 mediawiki_job_parser_cache_purging[77891]: Done
Jun 10 01:00:04 mwmaint1002 mediawiki_job_parser_cache_purging[91500]: Deleting objects expiring before Thu, 10 Jun 2021 01:00:04 GMT
Jun 10 12:59:34 mwmaint1002 mediawiki_job_parser_cache_purging[91500]: Done
Jun 11 01:00:03 mwmaint1002 mediawiki_job_parser_cache_purging[259200]: Deleting objects expiring before Fri, 11 Jun 2021 01:00:03 GMT
Jun 11 14:47:30 mwmaint1002 mediawiki_job_parser_cache_purging[259200]: Done
Jun 12 01:00:03 mwmaint1002 mediawiki_job_parser_cache_purging[220493]: Deleting objects expiring before Sat, 12 Jun 2021 01:00:03 GMT
Jun 12 16:11:22 mwmaint1002 mediawiki_job_parser_cache_purging[220493]: Done
Jun 13 01:00:04 mwmaint1002 mediawiki_job_parser_cache_purging[181658]: Deleting objects expiring before Sun, 13 Jun 2021 01:00:04 GMT
Jun 13 15:54:56 mwmaint1002 mediawiki_job_parser_cache_purging[181658]: Done
Jun 14 01:00:04 mwmaint1002 mediawiki_job_parser_cache_purging[146932]: Deleting objects expiring before Mon, 14 Jun 2021 01:00:04 GMT
Jun 14 16:02:26 mwmaint1002 mediawiki_job_parser_cache_purging[146932]: Done
Jun 15 01:00:04 mwmaint1002 mediawiki_job_parser_cache_purging[106082]: Deleting objects expiring before Tue, 15 Jun 2021 01:00:04 GMT
Jun 15 17:36:49 mwmaint1002 mediawiki_job_parser_cache_purging[106082]: Done
Jun 16 01:00:03 mwmaint1002 mediawiki_job_parser_cache_purging[65769]: Deleting objects expiring before Wed, 16 Jun 2021 01:00:03 GMT
Jun 16 23:03:33 mwmaint1002 mediawiki_job_parser_cache_purging[65769]: Done
Jun 17 01:00:00 mwmaint1002 mediawiki_job_parser_cache_purging[26955]: Deleting objects expiring before Thu, 17 Jun 2021 01:00:00 GMT
Jun 18 13:42:19 mwmaint1002 mediawiki_job_parser_cache_purging[26955]: Done
Jun 18 13:42:20 mwmaint1002 mediawiki_job_parser_cache_purging[250903]: Deleting objects expiring before Fri, 18 Jun 2021 13:42:20 GMT
Jun 21 08:50:35 mwmaint1002 mediawiki_job_parser_cache_purging[250903]: Done
Jun 21 08:50:36 mwmaint1002 mediawiki_job_parser_cache_purging[62918]: Deleting objects expiring before Mon, 21 Jun 2021 08:50:36 GMT
Jun 23 13:46:56 mwmaint1002 mediawiki_job_parser_cache_purging[118261]: Deleting objects expiring before Wed, 23 Jun 2021 13:46:56 GMT
Jun 28 06:59:28 mwmaint1002 mediawiki_job_parser_cache_purging[118261]: Done

Event Timeline


Optimize of pc1009 (and replica) finished.

Disk space usage went from 3.9TB to 2TB.


It ran from 2021-06-10T10:28:26+00:00 to 2021-06-10T18:39:59+00:00 (~8.2h)

TODO:

Change 700047 had a related patch set uploaded (by Kormat; author: Kormat):

[operations/puppet@production] pc1010: Move back to pc1, now that the maintenance is done.

https://gerrit.wikimedia.org/r/700047

Mentioned in SAL (#wikimedia-operations) [2021-06-16T09:40:37Z] <kormat@deploy1002> Synchronized wmf-config/db-eqiad.php: Repool pc1009 as pc3 primary T282761 (duration: 00m 59s)

Change 700047 merged by Kormat:

[operations/puppet@production] pc1010: Move back to pc1, now that the maintenance is done.

https://gerrit.wikimedia.org/r/700047

Mentioned in SAL (#wikimedia-operations) [2021-06-16T09:47:19Z] <kormat> truncating all pc* tables on pc1010 T282761

Krinkle raised the priority of this task from High to Unbreak Now! (Jun 21 2021, 6:05 PM)

The queries that purgeParserCache sends are... weird, and more expensive than necessary.

E.g.:

DELETE /* SqlBagOStuff::deleteServerObjectsExpiringBefore */ FROM `pc134` WHERE (exptime < '20210621085036') AND keyname IN ('commonswiki:pcache:idoptions:71610166', 'skwiki:pcache:idhash:24341-0!canonical!responsiveimages=0', 'enwiki:pcache:idhash:36722755-0!canonical', 'commonswiki:pcache:idoptions:99175521', 'wikidatawiki:pcache:idhash:4725620-0!termboxVersion=22!userlang=de!wb=3!responsiveimages=0', 'commonswiki:pcache:idhash:90369939-0!canonical', 'itwiki:pcache:idoptions:4202454', 'enwiki:pcache:idhash:24256794-0!canonical', 'commonswiki:pcache:idoptions:86194370', 'ruwiki:pcache:idhash:2589508-0!canonical!responsiveimages=0', 'enwiki:pcache:idhash:24062326-0!canonical', 'commonswiki:pcache:idhash:4880597-0!canonical', 'commonswiki:pcache:idoptions:93201262', 'ruwiki:pcache:idhash:4136023-0!canonical!tmh-videojs', 'enwiki:pcache:idoptions:24251270', 'cebwiki:pcache:idhash:1691532-0!canonical!responsiveimages=0', 'commonswiki:pcache:idhash:35750027-1!userlang=ms!wb=3', 'wikidatawiki:pcache:idoptions:94489761', 'enwiki:pcache:idoptions:24046838', 'enwiki:pcache:idhash:30211084-0!numberheadings=1!thumbsize=3!userlang=de', 'enwiki:pcache:idoptions:58959373', 'enwiki:pcache:idoptions:24226310', 'enwiki:pcache:idhash:58893293-0!canonical', 'arzwiki:pcache:idhash:1124829-0!canonical!responsiveimages=0', 'enwiki:pcache:idhash:24249770-0!canonical', 'commonswiki:pcache:idhash:94786177-1!userlang=fr!wb=3', 'shwiki:pcache:idhash:3802096-0!canonical', 'wikidatawiki:pcache:idoptions:78749276', 'enwiki:pcache:idhash:58921469-0!canonical', 'enwiki:pcache:idhash:58947293-0!canonical', 'enwiki:pcache:idhash:58934889-0!canonical', 'enwiki:pcache:idhash:24213614-0!canonical', 'commonswiki:pcache:idhash:19004326-0!canonical', 'enwiki:pcache:idoptions:24058866', 'wikidatawiki:pcache:idhash:99437196-0!termboxVersion=22!wb=3!responsiveimages=0', 'ruwiktionary:pcache:idhash:786222-0!canonical!responsiveimages=0', 'zhwiki:pcache:idhash:2378231-0!canonical!zh-hant', 'zhwikisource:pcache:idhash:294844-0!canonical!zh-hans', 'commonswiki:pcache:idhash:71700898-0!wb=3', 'enwiki:pcache:idhash:58976469-0!canonical', 'frwiktionary:pcache:idoptions:2439030', 'wikidatawiki:pcache:idhash:872666-0!termboxVersion=1!wb=3', 'enwiki:pcache:idoptions:58903361', 'specieswiki:pcache:idhash:1409592-0!canonical!responsiveimages=0', 'srwiki:pcache:idhash:865929-0!userlang=sr-ec!sr-ec!responsiveimages=0', 'enwiki:pcache:idhash:24182714-0!canonical', 'itwiki:pcache:idoptions:6381937', 'zh_yuewiki:pcache:idhash:11417-0!canonical', 'wikidatawiki:pcache:idoptions:43834192', 'zhwiktionary:pcache:idhash:295216-0!userlang=zh-hans!zh-hans', 'commonswiki:pcache:idoptions:101845492', 'wikidatawiki:pcache:idoptions:93935984', 'commonswiki:pcache:idhash:101916466-0!thumbsize=7!wb=3', 'elwiktionary:pcache:idhash:100726-0!canonical', 'commonswiki:pcache:idoptions:41926142', 'commonswiki:pcache:idhash:7372137-1!wb=3', 'enwiki:pcache:idoptions:24050602', 'enwiki:pcache:idhash:58921933-0!canonical', 'enwiki:pcache:idoptions:24065738', 'tawiki:pcache:idoptions:11149', 'elwiki:pcache:idoptions:149075', 'commonswiki:pcache:idhash:71508282-0!wb=3', 'enwiki:pcache:idoptions:24243434', 'enwiki:pcache:idoptions:58887593', 'trwiki:pcache:idhash:2760344-0!canonical', 'commonswiki:pcache:idhash:52816220-1!userlang=nl!wb=3', 'wikidatawiki:pcache:idoptions:27666338', 'wikidatawiki:pcache:idoptions:41567642', 'wikidatawiki:pcache:idhash:6884476-0!termboxVersion=22!wb=3!responsiveimages=0', 
'wikidatawiki:pcache:idhash:74232927-0!termboxVersion=22!wb=3!responsiveimages=0', 'commonswiki:pcache:idhash:71702234-0!wb=3', 'commonswiki:pcache:idhash:30485637-1!wb=3', 'enwiki:pcache:idhash:24128150-0!canonical', 'enwiki:pcache:idhash:24159162-0!canonical', 'commonswiki:pcache:idhash:17722176-1!userlang=vi!wb=3', 'enwiki:pcache:idhash:24147430-0!canonical', 'commonswiki:pcache:idhash:4825927-0!wb=3', 'urwiki:pcache:idhash:121007-1!canonical', 'dawiktionary:pcache:idhash:14302-0!canonical!responsiveimages=0', 'enwiki:pcache:idhash:58943453-0!canonical', 'commonswiki:pcache:idhash:75308860-0!wb=3', 'commonswiki:pcache:idhash:19827132-0!userlang=de!responsiveimages=0', 'trwiki:pcache:idoptions:1430198', 'enwiki:pcache:idoptions:24127406', 'zhwiki:pcache:idhash:4885077-0!canonical!zh-tw', 'enwikisource:pcache:idoptions:3339960', 'commonswiki:pcache:idhash:71563990-0!wb=3', 'enwiki:pcache:idhash:24292610-0!canonical', 'frwiki:pcache:idoptions:12645537', 'commonswiki:pcache:idhash:71612598-0!canonical', 'ruwikinews:stable-pcache:idoptions:466463', 'zhwiki:pcache:idhash:6024646-0!canonical!zh-hk', 'enwikinews:pcache:idoptions:2331881', 'hywiki:pcache:idhash:801694-0!canonical', 'enwikinews:pcache:idhash:2331437-0!canonical', 'commonswiki:pcache:idhash:29136537-0!wb=3!responsiveimages=0', 'commonswiki:pcache:idhash:71483186-0!canonical', 'enwikisource:pcache:idoptions:3323604', 'commonswiki:pcache:idhash:85429047-0!wb=3!responsiveimages=0', 'commonswiki:pcache:idhash:98070845-0!wb=3') /* 39c789a879b3532550b5cacffd54efa6 pc1007 parsercache 2s */

What I was expecting is:

DELETE FROM `pc134` WHERE (exptime < '20210621085036') limit 1000;

I did some digging into the codebase and (assuming my lack of php knowledge didn't lead me astray ;) it looks like the function that does the actual work is here:

https://doc.wikimedia.org/mediawiki-core/master/php/SqlBagOStuff_8php_source.html#l00737

On each pc* table it runs a select to find rows that should be expired, and then deletes them, instead of just getting the db to do it in one go.

In terms of cost, looking at the table schema, it is going to come down to the number of rows each transaction will be deleting. From what I can see with some similar queries, it looks like the current delete is cheaper (it scans fewer rows), but it is only selecting 81 rows, so we'd need to check with a table that would be deleting more.

What really surprises me is that we actually run a SELECT + ORDER BY before each delete; I guess that's needed to narrow down the delete itself?
The SELECT isn't super expensive, but it definitely adds some overhead:

 root@pc2010.codfw.wmnet[parsercache]> explain  select keyname,exptime  from `pc069` WHERE (exptime < '20210621085036') order by exptime asc;
+------+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+------+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
|    1 | SIMPLE      | pc069 | range | exptime       | exptime | 6       | NULL | 322778 | Using where; Using index |
+------+-------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
1 row in set (0.034 sec)

root@pc2010.codfw.wmnet[parsercache]>select keyname,exptime  from `pc100` WHERE (exptime < '20210621085036') order by exptime asc;

139901 rows in set (0.358 sec)

If we are doing the SELECT to build the DELETE from above, we should probably compare the time the SELECT+DELETE takes against just a DELETE FROM XXX WHERE (exptime < 'XXX') LIMIT XXX;

The above tests were done on a 100% idle host, by the way.

Current status: the purge has been running for ~28h, and is at 15% progress. That extrapolates to ~8 days total for this run.

@Krinkle @aaron do you happen to know if there is a specific reason to do the deletes in the above way?

FWIW, looking at the code above I am under the impression that the SELECT+DELETE is only needed to gather some stats for progress estimate: the SELECTed rows are read to find out the minimum and maximum expiry being deleted. Perhaps, if the SELECT has a noticeable overhead, it would be acceptable to have less accurate statistics?

Also, I believe that the exptime < 'xxx' condition in the DELETE is unnecessary. The keynames come from the SELECT query above, which uses the same comparison. Is it there to ensure atomicity? Or can it be removed?

What I was expecting is:

DELETE FROM `pc134` WHERE (exptime < '20210621085036') limit 1000;

Do you happen to know if there is a specific reason to do the deletes in the above way?

It's hard to be sure since the code is so old, but based on today's best practices I would retroactively say that for production code our policy is generally to only perform writes or deletes by primary key. This is for performance, but also to be deterministic and safe under various other conditions. I suppose it also helps with row-based replication. I understand parser caches could be exempt from this, as they are special in various ways, but from a code perspective we currently treat it like we treat anything else in production.

The exptime < 'xxx' condition in the DELETE is unnecessary. […] Or can it be removed?

No. The keys are canonical/deterministic, based on page IDs. And at any moment, when a wiki page relating to the expired row is accessed, MediaWiki may regenerate the value and replace the row in question with a new value and new expiry. Thus the exptime condition needs to be there, I think.


Summary of meeting between @aaron, @LSobanski and myself yesterday:

In the last two months, we did forced purges in this task over each of the servers while they were in a depooled state. There we found that it consistently took about 1.8 minutes for a single unlimited DELETE statement to return, where each DELETE was removing about 400k rows. These were DELETE statements for a single table, with only a WHERE clause for the exptime and nothing else. In total, it consistently took about 8 hours to purge a single server, where each of the 255 tables was purged back-to-back with no batching, no limit, and no sleeps. Script at P16060.

It seems from the various samples we took that there are typically around 200k rows that expire on any given day in any given table (255 tables, 3 servers). In practice, we currently typically have to remove 400k rows or more per table, because we're several days behind and thus more expired data has to be removed at once. This is plausibly confirmed by empirical data from MySQL stats, where we see 500 write queries per second, which amounts to (500*3600*24)/255, or 170k new rows per day per table.

Aaron helped us analyse MySQL stats and EXPLAIN output during the select and delete operations, and we did not find anything concerning there. The handler stats show read_next and delete both counting about 150 per second. Specifically, there were no additional reads directly or indirectly happening during the delete operation, which could perhaps have suggested improper index use etc. Handler stats for the SELECT queries (such as SELECT keyname,exptime FROM pc130 WHERE exptime < '20210622000000' AND exptime >= '20210618134451' ORDER BY exptime ASC LIMIT 100;):

wikiadmin@10.64.16.20(parsercache)> show session status LIKE 'handler_read%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Handler_read_first       | 0     |
| Handler_read_key         | 1     |
| Handler_read_last        | 0     |
| Handler_read_next        | 99    |
| Handler_read_prev        | 0     |

The EXPLAIN for the delete looked as follows:

| hiwiki:pcache:idhash:1082758-0!canonical                          | 2021-06-18 13:45:26 |          9126 |
+-------------------------------------------------------------------+---------------------+---------------+
30 rows in set (0.00 sec)

wikiadmin@10.64.16.20(parsercache)> EXPLAIN EXTENDED DELETE FROM pc130 WHERE exptime < '20210622000000' AND keyname IN ('hiwiki:pcache:idhash:1082758-0!canonical', 'zhwiki:pcache:idhash:1204182-0!canonical!zh-tw', 'commonswiki:pcache:idhash:79476467-1!userlang=zh!wb=3', 'hiwiki:pcache:idoptions:1057118');
+------+-------------+-------+-------+-----------------+---------+---------+------+------+----------+-------------+
| id   | select_type | table | type  | possible_keys   | key     | key_len | ref  | rows | filtered | Extra       |
+------+-------------+-------+-------+-----------------+---------+---------+------+------+----------+-------------+
|    1 | SIMPLE      | pc130 | range | keyname,exptime | keyname | 257     | NULL | 4    |   100.00 | Using where |
+------+-------------+-------+-------+-----------------+---------+---------+------+------+----------+-------------+
1 row in set (0.00 sec)

We did a basic calculation of how much time might be wasted in sleeps. Given 3 servers, 255 tables, 170k-400k rows to delete in each, a batch size of 100, and a sleep of 0.5s, that amounts to (((3*255*170000)/100)*0.5)/3600/24 or... 7 days just for the sleeps? I don't think that can be right, since I've used the lower end of the range of rows to expect, and afaik over the past 12 months there were plenty of weeks during which we also had the same insert/expire rate of 500 replace queries per second and yet had purge runs complete in under 24 hours (for example, MySQL stats show that in June 2020 we had more or less the same insert rate as today, but the task description shows purge runs of 3-4 days for that same time). So... I'm probably doing something wrong here.
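
For reference, the same napkin calculation written out (using only the figures assumed above; nothing here is a new measurement):

// Back-of-the-envelope version of the calculation above (assumed inputs only).
$writesPerSec = 500;                                                      // observed REPLACE rate
$tablesPerServer = 255;
$rowsPerTablePerDay = ( $writesPerSec * 3600 * 24 ) / $tablesPerServer;   // ~170,000

$servers = 3;
$batchSize = 100;
$sleepSec = 0.5;
$batches = ( $servers * $tablesPerServer * $rowsPerTablePerDay ) / $batchSize; // ~1.3 million batches
$sleepDays = ( $batches * $sleepSec ) / 3600 / 24;                             // ~7.5 days of sleep alone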

I laid out three broad ideas for how to move forward:

  1. Reduce sleep duration (or increase batch size, effectively the same, sleep probably preferred). Also, we can improve it such that the sleep is an interval that includes the time for the delete query, thus representing more accurately the underlying use case, and naturally reducing idle time when queries take longer, and increasing it when they are faster. (A rough sketch of this interval-based sleep follows after this list.)
  2. Remove sleep entirely. This is something we probably don't want to do right now, since we still replicate to Codfw. But, as part of T277831, it seems like we're likely to stop doing replication once we're active-active, with the only remaining use until then being to have a warm standby for unplanned switchovers. Perhaps it's worth the operational benefit to stop replication and purge-sleeps now, and then try to keep them warm by other means (or to go without it until we're active-active).
  3. Purge servers concurrently instead of one at a time (e.g. extra parameter for db server name, or spawn sub processes; non-trivial).
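
A minimal sketch of the interval-based sleep from idea 1 (illustrative only; deleteBatch() is a hypothetical stand-in for the existing per-batch delete):

// Idea 1 sketched: treat the sleep as a per-batch target interval rather than
// a fixed pause, so time spent in the DELETE itself is deducted from it.
$intervalMs = 500;
foreach ( $batches as $batch ) {
    $start = microtime( true );
    deleteBatch( $batch );                                   // hypothetical helper
    $elapsedMs = ( microtime( true ) - $start ) * 1000;
    if ( $elapsedMs < $intervalMs ) {
        usleep( (int)( ( $intervalMs - $elapsedMs ) * 1000 ) );  // sleep only the remainder
    }
}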

Also, to recap the reason for the sleeps: this was added in 2016 per T150124 to mitigate replication lag in the secondary datacentre. As I understand it, it's not that we think the delete statements themselves are lag-prone. These only use primary keys, and are naturally spaced out by the time it takes for the primary DB to process the delete. The queries are all issued by a single process in a single thread, with no concurrency or other gracefulness to worry about. However, while the delete statements themselves might not cause lag, we do have to consider the regular production traffic, which involves lots of inserts (500 per second), and those are more-or-less unthrottled based on user demand. So the sleeps are a way to avoid crowding out these more important events in the replication stream with deletes.

Now, having said that, it seems that, at least today, the delete operations themselves are fairly time-consuming already, so perhaps the sleeps are no longer as needed as they used to be. Although it's worth trying to figure out why they take as long as they do. The above didn't give us anything concerning about the queries, the indexes, etc., but the reality is that it still took 8 hours unthrottled on a depooled server to delete the data in question. If the problem isn't in the index, the schema, or the queries, then perhaps there's something on the hardware or MySQL tuning side that offers a clue for making these operations faster. Aaron mentioned two things that may be worth looking into:

  1. Threshold for page splitting/merging.
  2. Threshold for inlining blobs.

I don't know anything about these though, beyond what the mysql manual says about them, but I'll document it here for posterity.

The plan for today was to start by reducing the sleep and then run it again to see how much it helps.


I was looking at some of the host stats with @Gilles earlier today, and he asked what the main bottleneck is on the MySQL side. E.g. is it maxing out CPU, or maxing out disk. If neither, then what is holding it up? We looked at the MySQL stats during the manual purge on 25 May and correlated them with the Host overview: pc1007 and it seemed that the bottleneck is disk I/O. During this unthrottled purge, disk utilization was around 90%. Perhaps this is obvious, but I thought it was interesting nonetheless.

Change 700957 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/puppet@production] mediawiki: Reduce purgeParserCache.php sleep from 500ms to 200

https://gerrit.wikimedia.org/r/700957

In T282761#7169546, @LSobanski wrote:

Do you happen to know if there is a specific reason to do the deletes in the above way?

It's hard to be sure since the code is so old, but based on today's best practices I would retroactively say that for production code our policy is generally to only perform writes or deletes by primary key. This is for performance, but also to be deterministic and safe under various other conditions. I suppose it also helps with row-based replication. I understand parser caches could be exempt from this, as they are special in various ways, but from a code perspective we currently treat it like we treat anything else in production.

Does this mean it's not an option or can we consider an exception for parser cache? It looks like a promising way forward.

Is the hypothesis that using by-exptime instead of by-primary-key queries for deletion would be notably faster, and/or that the select queries are our bottleneck? I did not get that impression. It seemed like the deletes would be equally fast, or faster, using the current primary key method, with the main time being spent on disk I/O to delete the data itself. That is also taking into account that our manual purge run last month already used this proposed by-exptime method, and it also took 8 hours per server.

But yes, we could rewrite the code to no longer batch by a limited number of rows, but instead batch by e.g. 10-minute timestamp chunks (without a row count limit?), or batch just blindly with LIMIT 100 applied to an otherwise complete DELETE for all expired rows. If your team suspects that the difference in row selection could be a primary contributor to the purges taking longer, and that changing it might bring us under 24h, we could prioritise that refactor. Is that what you think?
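
To make the first option concrete, a rough sketch of batching by timestamp window (hedged pseudo-code only; the window size, the variable names, and the way the conditions are built are assumptions, not a worked-out patch):

// Hypothetical alternative batching: delete by 10-minute exptime windows
// instead of by a fixed number of rows (illustrative only).
$windowSecs = 600;
for ( $ts = $oldestExptime; $ts < $cutoff; $ts += $windowSecs ) {
    $end = min( $ts + $windowSecs, $cutoff );
    $db->delete(
        $table,
        [
            'exptime >= ' . $db->addQuotes( $db->timestamp( $ts ) ),
            'exptime < ' . $db->addQuotes( $db->timestamp( $end ) ),
        ],
        __METHOD__
    );
    usleep( $sleepMs * 1000 );
}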

I'm not sure of my exact motivation for using the primary key in 53d267b3dce25f7b9c7b4216456ad3850ea5035e instead of DELETE with LIMIT, but most likely it was done on the theory that it would reduce replication lag. The exptime index is not used for anything else, so it might not be in memory when the script runs. Loading primary keys with SELECT avoids replicating a hit on a cold index.

Also, according to the MySQL manual, DELETE with LIMIT generates a warning when used with statement-based replication, even if ORDER BY is used.

Also, DELETE with LIMIT is not standard SQL and IDatabase::delete() does not support it.

I laid out three broad ideas for how to move forward:

  1. Reduce sleep duration (or increase batch size, effectively the same, sleep probably preferred). Also, we can improve it such that the sleep is an interval that includes the time for the delete query, thus representing more accurately the underlying use case, and naturally reducing idle time when queries take longer, and increasing it when they are faster.

+1 to reduce sleep time and/or batch size. Especially with the new hardware, I am pretty sure we can handle more load.

  2. Remove sleep entirely. This is something we probably don't want to do right now, since we still replicate to Codfw. But, as part of T277831, it seems like we're likely to stop doing replication once we're active-active, with the only remaining use until then being to have a warm standby for unplanned switchovers. Perhaps it's worth the operational benefit to stop replication and purge-sleeps now, and then try to keep them warm by other means (or to go without it until we're active-active).

If we don't replicate to codfw, I am fine with that, but as you mention, we still do. Furthermore, we'll be doing the DC switchover next week, so I do not want to start with codfw delayed; not because it is a bad thing per se, but because I don't know how MW will handle reading from parsercache hosts with lag, though I suppose it will complain loudly :)

  3. Purge servers concurrently instead of one at a time (e.g. extra parameter for db server name, or spawn sub processes; non-trivial).

Also, to recap the reason for the sleeps: this was added in 2016 per T150124 to mitigate replication lag in the secondary datacentre. As I understand it, it's not that we think the delete statements themselves are lag-prone. These only use primary keys, and are naturally spaced out by the time it takes for the primary DB to process the delete. The queries are all issued by a single process in a single thread, with no concurrency or other gracefulness to worry about. However, while the delete statements themselves might not cause lag, we do have to consider the regular production traffic, which involves lots of inserts (500 per second), and those are more-or-less unthrottled based on user demand. So the sleeps are a way to avoid crowding out these more important events in the replication stream with deletes.

DELETEs are very expensive on MySQL indeed, but as you mention, these hosts have an additional load of writes and selects. Again, new hardware might help here, but that is yet to be tested.

  1. Threshold for page splitting/merging.
  2. Threshold for inlining blobs.

I don't know anything about these though, beyond what the mysql manual says about them, but I'll document it here for posterity.

The plan for today was to start by reducing the sleep and then run it again to see how much it helps.

This sounds good to me but keep in mind we are switching to codfw on Tuesday and we should not start with codfw lagged, unless you tell me MW will handle that without errors/warnings.


I was looking at some of the host stats with @Gilles earlier today, and he asked what the main bottleneck is on the MySQL side. E.g. is it maxing out CPU, or maxing out disk. If neither, then what is holding it up? We looked at the MySQL stats during the manual purge on 25 May and correlated them with the Host overview: pc1007 and it seemed that the bottleneck is disk I/O. During this unthrottled purge, disk utilization was around 90%. Perhaps this is obvious, but I thought it was interesting nonetheless.

I think there's not a single bottleneck here; it is probably a combination of many things.
We are generating lots of writes, as you mentioned, around 500 writes per second steadily, which is probably leading to contention too (which can be seen here: https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=19&orgId=1&from=now-24h&to=now&var-server=pc1007&var-port=9104).

There's also the fact that the current hardware is quite different from what we are used to in production. These hosts have half the RAM (256GB) and only 4 SSDs (in production we have 10), which means less speed as well. On top of this, we use RAID 5 here and not RAID 10, which also means less performance on each write, and with this huge amount of writes that can make a lot of difference.
We have purchased new hardware (T284825) which is the same as what we have in production (512GB RAM, 10 disks and RAID 10), so this will definitely speed things up. So far only the codfw hosts have arrived, but I think we should start doing tests there as soon as we can.
We can copy pc2010 (spare) data to the host that will be pc1 in codfw, and do testing with production data on at least one host.

The exptime index is not used for anything else, so it might not be in memory when the script runs. Loading primary keys with SELECT avoids replicating a hit on a cold index.

That's actually a good approach indeed, but given how crazy parsercache has grown... it might be causing extra overload.

Also, according to the MySQL manual, DELETE with LIMIT generates a warning when used with statement-based replication, even if ORDER BY is used.

Yeah, MySQL issues that warning because it is an unsafe statement, but given the nature of the parsercache data, maybe it would be OK to ignore it.

Change 700957 merged by Kormat:

[operations/puppet@production] mediawiki: Reduce purgeParserCache.php sleep from 500ms to 200

https://gerrit.wikimedia.org/r/700957

Notes from the meeting today with @Kormat and @Krinkle

  • Code changes to be made:
    • Recording number of iterations since the last percentage print (purge script feature to be added).
    • Running on multiple servers (adding a --server parameter) requires changes to both the maintenance function in SqlBagOStuff and the script.
  • Sleep interval deduction (making sleep variable instead of static) requires further data gathering.
  • Filtering deletes from replication and running on both DCs would result in the spare PC host not getting purged, but this won't be a problem once the script is host-aware.
  • We can test the new hosts, they would have to be introduced as replicas first.

Change 701132 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] purgeParserCache.php: Print stats for time and iterations

https://gerrit.wikimedia.org/r/701132

For what it's worth, I have tested the following changes on all eqiad pc hosts for a few hours each:
setting innodb_change_buffering to: deletes, purges, all.

They made no difference to the script speed, which remained around 10-11 minutes per 0.1% step.

I have set it back to none.

  • Running on multiple servers (adding a --server parameter) requires changes to both the maintenance function in SqlBagOStuff and the script.

My idea was to reorder the deletes so that the inner loop is by server, rather than the outer loop. This would allow the delay to be tuned downwards (perhaps to zero) since while replication is catching up, the script will be busy with other servers. But --server would also work.
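
Roughly, the inverted loop could look like this (a sketch only; the helper names are made up and it assumes batches can be fetched per server independently):

// Sketch of the inverted loop: round-robin one batch per server, so each
// server's replication catches up while the script works on the others.
do {
    $moreWork = false;
    foreach ( $parserCacheServers as $server ) {    // inner loop is by server
        $keys = selectExpiredKeys( $server, $cutoff, $batchSize );
        if ( $keys ) {
            deleteKeys( $server, $keys, $cutoff );
            $moreWork = true;
        }
    }
    // With three servers interleaved, the fixed sleep could shrink or go away.
    usleep( $sleepMs * 1000 );
} while ( $moreWork );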

  • Running on multiple servers (adding a --server parameter) requires changes to both the maintenance function in SqlBagOStuff and the script.

My idea was to reorder the deletes so that the inner loop is by server, rather than the outer loop. This would allow the delay to be tuned downwards (perhaps to zero) since while replication is catching up, the script will be busy with other servers. But --server would also work.

Ack. I mentioned this in the meeting and it was meant to be captured as part of the point "Sleep interval deduction (making sleep variable instead of static)", which would involve deducting time spent from the interval, as well as further making use of that time by handling other servers at the same time – effectively inverting the loop as you say.

We discussed that how much this gains us depends on how much of the time is actually spent in the deletes vs in the sleeps right now. Our napkin calculations suggested that even with near-zero sleep we may still be close to or over the 24h mark, so actually running them concurrently seemed simpler in terms of code refactoring and a bigger, more certain step toward getting under 24h. Although in the time since that meeting, I've come to believe we'll need multiple mitigations, since 8 days divided by 3 is still more than 1 day. Perhaps the reduced sleep will get us there, but if not, we can still do the dynamic interval on top; it would only deduct the single write time from each iteration's sleep.

I did raise the concern that doing this concurrently would mean either hardcoding a new shard parameter in the cron config (and thus an awkward separation of concerns between MW and Puppet), or complexity in the shape of subprocesses/forks (not our favourite thing to do in PHP, but it might be manageable if we add the shard parameter anyway, and then spawn sub-processes that set that parameter). I haven't yet decided which of the two is best, but I figured either way would involve a shard parameter, and that parameter seems relatively easy to implement. We could hardcode it at least to get started initially if we're not sure about sub-processes yet.

@dpifke mentioned table partitioning. If we wanted to rearchitect it more aggressively, there are some ways we could avoid deletes altogether. For example, we could use a MultiWriteBagOStuff to write to two tiers of SQL cache, then periodically wipe one tier and rotate the configured order of the tiers. When you introduce a new tier, a cache miss on the new tier causes the data to be copied from the old tier. It would be an LRU eviction scheme.

If DROP TABLE can be done without stalling the whole server, we could just use table prefixes for the tiers, and the whole thing could be automated in MediaWiki. We could have a new BagOStuff subclass that implements rotation automatically, instead of requiring a change to the config.
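
As a very rough illustration of that two-tier idea (a hypothetical configuration sketch only; the cache name, table prefixes, and rotation mechanism are made up and would need to be designed properly):

// Hypothetical two-generation parser cache, rotated instead of purged.
$wgObjectCaches['parsercache-rotating'] = [
    'class'  => MultiWriteBagOStuff::class,
    'caches' => [
        // Tier 0: the current generation; reads hit this first, and per the
        // idea above a miss here would be backfilled from the older tier.
        [ 'class' => SqlBagOStuff::class, 'tableName' => 'pc_gen_a' /* , ... */ ],
        // Tier 1: the previous generation. Periodically this tier is dropped
        // or truncated and the two generations swap roles.
        [ 'class' => SqlBagOStuff::class, 'tableName' => 'pc_gen_b' /* , ... */ ],
    ],
];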

Change 701132 merged by jenkins-bot:

[mediawiki/core@master] purgeParserCache.php: Print stats for time and iterations

https://gerrit.wikimedia.org/r/701132

@Marostegui I was curious if adjusting MERGE_THRESHOLD would help (e.g https://dev.mysql.com/doc/refman/5.6/en/index-page-merge-threshold.html ) though INFORMATION_SCHEMA.INNODB_METRICS seems to not be enabled? Are those stats really expensive?

The current run has progressed to 10.4% over the course of 19h. That works out to ~7.6 days for a full run.

On the plus side, reducing the sleep duration doesn't appear to have had an impact on replication lag:

https://grafana-rw.wikimedia.org/d/000000273/mysql?orgId=1&from=1624352800983&to=1624525600983&var-server=pc2008&forceLogin=true&viewPanel=6

(purgeParserCache.php is currently running against pc1008; this is the codfw replica in the same section)

@Krinkle: another feature request for purgeParserCache.php would be to print the current section/host. I realise that's not doable right now with the current code structure, but it would help quite a bit with correlating things.

Change 701285 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@wmf/1.37.0-wmf.11] purgeParserCache.php: Print stats for time and iterations

https://gerrit.wikimedia.org/r/701285

Change 701433 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] purgeParserCache.php: Implement --tag for purging one server only

https://gerrit.wikimedia.org/r/701433

Change 701433 merged by jenkins-bot:

[mediawiki/core@master] purgeParserCache.php: Implement --tag for purging one server only

https://gerrit.wikimedia.org/r/701433

Change 701408 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@wmf/1.37.0-wmf.11] purgeParserCache.php: Implement --tag for purging one server only

https://gerrit.wikimedia.org/r/701408

Change 701602 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] obectcache: made SqlBagOStuff::deleteObjectsExpiringBefore alternate over shards

https://gerrit.wikimedia.org/r/701602

The last purge run took 113h (4d 17h).
The current purge is at 2.3%, and at the current pace will take 117h in total, so pretty comparable.

Change 701408 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.11] purgeParserCache.php: Implement --tag for purging one server only

https://gerrit.wikimedia.org/r/701408

Mentioned in SAL (#wikimedia-operations) [2021-06-28T19:45:05Z] <krinkle@deploy1002> Synchronized php-1.37.0-wmf.11/includes/objectcache/SqlBagOStuff.php: T282761 - I618bc1e8ca3008 (duration: 00m 59s)

Mentioned in SAL (#wikimedia-operations) [2021-06-28T19:46:30Z] <krinkle@deploy1002> Synchronized php-1.37.0-wmf.11/includes/libs/objectcache/: T282761 - I618bc1e8ca3008 (duration: 00m 56s)

The last purge run took 113h (4d 17h).
The current purge is at 2.3%, and at the current pace will take 117h in total, so pretty comparable.

The scheduled script is currently doing pc3 (pc1009). I briefly ran a separate process with --tag pc2 (pc1008) to confirm that this works as expected in production. That is, the extra process is making progress (stdout below), and Grafana: mysql pc1008 shows delete queries kicking in, and the scheduled script continues unaffected at the same time at Grafana: mysql pc1009.

[21:41 UTC] krinkle at mwmaint1002.eqiad.wmnet in ~
$ mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc2
Deleting objects expiring before Mon, 28 Jun 2021 21:41:48 GMT
... 0.0% done
... 0.1% done

I'll leave it running from my gnu-screen as it seems very low risk at this point. Feel free to kill the proc if there is any amount of uncertainty, discomfort, or trouble.

Mentioned in SAL (#wikimedia-operations) [2021-06-28T21:55:01Z] <Krinkle> krinkle@mwmaint1002: purgeParserCache.php --tag pc2, ref T282761

Mentioned in SAL (#wikimedia-operations) [2021-06-29T00:25:56Z] <Krinkle> krinkle@mwmaint1002: purgeParserCache.php --tag pc1, ref T282761

The manual runs for pc1 and pc2, as well as the scheduled run that was at pc3, all running on mwmaint1002.eqiad, were cut off by the switchover procedure. This was anticipated, and we decided together with the DBAs to let this happen and restart manually on the other side (prioritising an uncomplicated/uneventful switchover over the pressure on the parser caches, which we think can eat another 1-2 days into our margins).

After the switchover was completed, I started three manual "tagged" runs around Tue 29 June, 18:00 UTC, on mwmaint2002.codfw.

  • pc1 (pc2007/pc1007): Started around 18:00 UTC and was only at 19% when I looked at it just now at 16:20 UTC the following day. It must be noted that the scheduled run (which is still serial and untagged) started as scheduled around 01:00 UTC and started dealing with the same server. This resulted in double the deletion load. I have killed my manual tagged run to let the scheduled run take over and run its course at the normal deletion rate. (Grafana: mysql pc2007).
  • pc2 (pc2008/pc1008): Started around 18:00 UTC and completed in 8 hours at 04:15 UTC. (Grafana: mysql pc2008).
  • pc3 (pc2009/pc1009): Started around 18:00 UTC and completed in 17 hours at 11:22 UTC. (Grafana: mysql pc2009).

The wild difference in runtimes can, I think, be attributed to how the previous runs were cut off, and thus some servers having been idle/unpurged for a much longer time than others. pc2 is the one that I was manually purging in Eqiad, pc3 is the one that the scheduled run in Eqiad had almost finished purging, and pc1 had been waiting for its turn in the scheduled run, with me only just getting started on it in Eqiad after also having been cut off in the previous runs.

Change 702425 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/puppet@production] mediawiki: Run purgeParserCache.php in parallel for each shard

https://gerrit.wikimedia.org/r/702425

Change 702425 merged by Kormat:

[operations/puppet@production] mediawiki: Run purgeParserCache.php in parallel for each shard

https://gerrit.wikimedia.org/r/702425

Current status of the purge jobs after 14h:

  • pc1: 32% done
  • pc2: 31% done
  • pc3: 100% done, but looks suspicious:
Jul  1 02:06:56 mwmaint2002 mediawiki_job_purge_parsercache_pc3[136034]: ... 33.2% done
Jul  1 02:07:08 mwmaint2002 mediawiki_job_purge_parsercache_pc3[136034]: ... 33.3% done
Jul  1 02:07:20 mwmaint2002 mediawiki_job_purge_parsercache_pc3[136034]: ... 100.0% done
Jul  1 02:07:20 mwmaint2002 mediawiki_job_purge_parsercache_pc3[136034]: Done

Could it be that the percentage calculation isn't taking the tag into account?

Otherwise pc1/pc2 are projected to take ~43h each in total.

And indeed:

Jul  1 09:44:18 mwmaint2002 mediawiki_job_purge_parsercache_pc1[196269]: ... 100.0% done
Jul  1 09:44:18 mwmaint2002 mediawiki_job_purge_parsercache_pc1[196269]: Done
Jul  1 09:59:10 mwmaint2002 mediawiki_job_purge_parsercache_pc2[196509]: ... 100.0% done
Jul  1 09:59:10 mwmaint2002 mediawiki_job_purge_parsercache_pc2[196509]: Done

Total run time for each section:

  • pc1: 14.8h
  • pc2: 15h
  • pc3: 7.2h

Change 702729 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] objectcache: Fix progress printing to account for server tag filter

https://gerrit.wikimedia.org/r/702729

In the long run, with active-active multi-DC, the parsercache DELETEs should not be replicated, even if there is circular mysql replication at that time (it certainly would not be one way, either none or circular).

This is because running the purge script in only the primary DC would have race conditions that leave stuff behind in the secondary DC. For example, a key receives a REPLACE in secondary DC B just after receiving a DELETE in primary DC A; the REPLACE does not replicate to A until a bit later, and likewise for the DELETE replicating to B. The result is that the key only exists in A and will not be purged if a DC switchover happens first. Also, running the script in both DCs makes the DELETE replication pointless.

@Marostegui: It would be nice to experiment with non-replicated events using https://mariadb.com/kb/en/create-event/ . Any objection to enabling the event manager on pc* and x2 (DBMainStash) servers?

Change 702729 merged by jenkins-bot:

[mediawiki/core@master] objectcache: Fix SqlBagOStuff progress printing to account for server tags

https://gerrit.wikimedia.org/r/702729

So is this resolved? Runtime was less than 24 hours, right? Either it should be resolved or it should stop being UBN.

In the long run, with active-active multi-DC, the parsercache DELETEs should not be replicated, even if there is circular mysql replication at that time (it certainly would not be one way, either none or circular).

This is because running the purge script in only the primary DC would have race conditions that leave stuff behind in the secondary DC. For example, a key receives a REPLACE in secondary DC B just after receiving a DELETE in primary DC A; the REPLACE does not replicate to A until a bit later, and likewise for the DELETE replicating to B. The result is that the key only exists in A and will not be purged if a DC switchover happens first. Also, running the script in both DCs makes the DELETE replication pointless.

@Marostegui: It would be nice to experiment with non-replicated events using https://mariadb.com/kb/en/create-event/ . Any objection to enabling the event manager on pc* and x2 (DBMainStash) servers?

The event scheduler is ON by default in production (we use it for the query killers).
If you want to test this, I am OK with it, but I'd like to also explore other possibilities apart from events, as they can be a bottleneck in very loaded systems (we've seen this on Tendril and even with the query killers).

Kormat lowered the priority of this task from Unbreak Now! to Medium (Jul 2 2021, 12:44 PM)

So is this resolved? Runtime was less than 24 hours, right? Either it should be resolved or it should stop being UBN.

Good point, dropping to medium.

Current status:

  • pc1: 26.7% (of 33.3%) done after 12.8h => est 16h total
  • pc2: 22.3% (of 33.3%) done after 12.8h => est 19h total
  • pc3: 33.3% (finished) after 3h17m total.

So is this resolved? Runtime was less than 24 hours, right? Either it should be resolved or it should stop being UBN.

For the record, we decided in yesterday's meeting to keep monitoring this over the next 7 days, since various circumstances make yesterday's and today's runs less than representative (pc3 was handled manually shortly before, and all three were started 6h ahead of the regular schedule). Today's run will catch up with that, and tomorrow's run will (hopefully) be the first "normal" clean day-over-day run since April last year.

Assuming all that remains within 24h, this can be closed, although I do note that we're currently still running with reduced retention (30d -> 20d). When we increase it under T280604, there'll be more data per server and thus significantly more purging time again. Our hope is that this will be balanced out by the newer hardware and proposed RAID configuration, but we can monitor all that under T280604.

Runtimes so far this week:

Monday 5th:
* pc1: 13h
* pc2: 15h
* pc3: 14h

Tuesday 6th:
* pc1: 11h
* pc2: 13h
* pc3: 12h

Just something I noticed on each run:

Jul  6 14:03:57 mwmaint2002 mediawiki_job_purge_parsercache_pc2[225981]: ... 33.2% done (+322 iterations in 142.9s)
Jul  6 14:06:21 mwmaint2002 mediawiki_job_purge_parsercache_pc2[225981]: ... 33.3% done (+318 iterations in 143.8s)
Jul  6 14:08:05 mwmaint2002 mediawiki_job_purge_parsercache_pc2[225981]: ... 100.0% done (+252 iterations in 104.8s)
Jul  6 14:08:05 mwmaint2002 mediawiki_job_purge_parsercache_pc2[225981]: Done

That big jump seems to happen on each run for each shard, is that normal?

Runtimes:

Wednesday 7th:
* pc1: 11h
* pc2: 13h
* pc3: 11h
Jul  6 14:06:21 mwmaint2002 mediawiki_job_purge_parsercache_pc2[225981]: ... 33.3% done (+318 iterations in 143.8s)
Jul  6 14:08:05 mwmaint2002 mediawiki_job_purge_parsercache_pc2[225981]: ... 100.0% done (+252 iterations in 104.8s)

That big jump seems to happen on each run for each shard, is that normal?

Kind of. It was a bug in the patch that added the --tag parameter. It still calculated as if it was purging all three, with a static 100% ensured as last printed progress line. Already fixed in https://gerrit.wikimedia.org/r/702729, which should roll out next week.

Thursday 8th:
* pc1: 11h
* pc2: 13h
* pc3: 13h
Friday 9th:
* pc1: 11h
* pc2: 12h
* pc3: 12h

These runtimes seem pretty stable lately, so I guess it is fine to close this as fixed?
Once we have the new hosts in place, that should also help.

Krinkle claimed this task.

Yep, although noting that:

  • disk space is still increasing quite rapidly despite the shortened retention and daily purging, which suggests we won't stay stable for long, given that more data will mean longer purge times.
  • as part of restoring retention, purge time is expected to go up even further.

But let's cover that as part of T280606.