
Reduce runtime of MW shared gate Jenkins jobs to 5 min
Open, Medium, Public

Description

Objective

For the "typical" time it takes for a commit to be approved and landed in master to be 5 minutes or less.

Status quo

As of 11 June 2019, the gate usually takes around 20 minutes.

The two slowest jobs typically take 13-17 minutes each. The time for the gate overall is rarely under 15 minutes: we run several of these jobs (increasing the chances of random slowness), and while they can run in parallel, they don't always start immediately, given the limited number of CI execution slots.

Below is a sample from a MediaWiki commit (master branch):

Gate pipeline build succeeded.
  • wmf-quibble-core-vendor-mysql-php72-docker SUCCESS in 12m 03s
  • wmf-quibble-core-vendor-mysql-hhvm-docker SUCCESS in 14m 12s
  • mediawiki-quibble-vendor-mysql-php72-docker SUCCESS in 7m 34s
  • mediawiki-quibble-vendor-mysql-php71-docker SUCCESS in 7m 12s
  • mediawiki-quibble-vendor-mysql-php70-docker SUCCESS in 6m 48s
  • mediawiki-quibble-vendor-mysql-hhvm-docker SUCCESS in 8m 32s
  • mediawiki-quibble-vendor-postgres-php72-docker SUCCESS in 10m 05s
  • mediawiki-quibble-vendor-sqlite-php72-docker SUCCESS in 7m 04s
  • mediawiki-quibble-composer-mysql-php70-docker SUCCESS in 8m 14s

(+ jobs that take less than 3 minutes: composer-test, npm-test, and phan.)

These can be grouped into two kinds of jobs:

  • wmf-quibble: These install MW with the gated extensions, and then run all PHPUnit, Selenium and QUnit tests.
  • mediawiki-quibble: These install only the MW-bundled extensions, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 9-15 minutes (wmf-gated, extensions-only)
  • Sample:
    • PHPUnit (dbless): 1.91 minutes / 15,782 tests.
    • QUnit: 29 seconds / 1286 tests.
    • Selenium: 143 seconds / 43 tests.
    • PHPUnit (db): 3.85 minutes / 4377 tests.

Stats from mediawiki-quibble-vendor-mysql-php72-docker:

  • 7-10 minutes (plain mediawiki-core)
  • Sample:
    • PHPUnit (unit+dbless): 1.5 minutes / 23,050 tests.
    • QUnit: 4 seconds / 437 tests.
    • PHPUnit (db): 4 minutes / 7604 tests.

Updated status quo

As of 11 May 2021, the gate usually takes around 25 minutes.

The slowest job typically takes 20-25 minutes per run. The time for the gate overall can never be faster than the slowest job, and it can be worse: although we run other jobs in parallel, they don't always start immediately, given the limited number of CI execution slots.

Below are the timing results from a sample MediaWiki commit (master branch):

[Snipped: Jobs faster than 5 minutes]

  • 9m 43s: mediawiki-quibble-vendor-mysql-php74-docker/5873/console
  • 9m 47s: mediawiki-quibble-vendor-mysql-php73-docker/8799/console
  • 10m 03s: mediawiki-quibble-vendor-sqlite-php72-docker/10345/console
  • 10m 13s: mediawiki-quibble-composer-mysql-php72-docker/19129/console
  • 10m 28s: mediawiki-quibble-vendor-mysql-php72-docker/46482/console
  • 13m 11s: mediawiki-quibble-vendor-postgres-php72-docker/10259/console
  • 16m 44s: wmf-quibble-core-vendor-mysql-php72-docker/53990/console
  • 22m 26s: wmf-quibble-selenium-php72-docker/94038/console

Clearly the last two jobs dominate the overall time:

  • wmf-quibble: This job installs MW with the gated extensions, and then runs all PHPUnit and QUnit tests.
  • wmf-quibble-selenium: This job installs MW with the gated extensions, and then runs all the Selenium tests.

Note that the mediawiki-quibble jobs each install just the MW bundled extensions, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 13-18 minutes (wmf-gated, extensions-only)
  • Select times:
    • PHPUnit (unit tests): 9 seconds / 13,170 tests.
    • PHPUnit (DB-less integration tests): 3.31 minutes / 21,067 tests.
    • PHPUnit (DB-heavy): 7.91 minutes / 4,257 tests.
    • QUnit: 31 seconds / 1421 tests.

Stats from wmf-quibble-selenium-php72-docker:

  • 20-25 minutes

Scope of task

This task represents the goal of reaching 5 minutes or less. The work tracked here includes researching ways to get there, trying them out, and putting one or more ideas into practice. The task can be closed once we have reached the goal or if we have concluded it isn't feasible or useful.

Feel free to add/remove subtasks as we go along and consider different things.

Stuff done

Ideas to explore and related work

  • Look at the PHPUnit "Test Report" for a commit and sort the root-level entries by duration. Find the slowest ones and look at their test suites for ways to improve them. Are they repeating expensive setups? Perhaps that work can be skipped or re-used. Are they running hundreds of variations of the same integration test? Perhaps reduce that to just one case for that story, and apply the remaining cases to a lighter unit test instead (see the sketch below).
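As an illustration of those two ideas, here is a minimal PHPUnit sketch (the class, fixture and method names are hypothetical, not existing MediaWiki tests): an expensive fixture is built once per class via setUpBeforeClass() instead of once per test in setUp(), one representative case stays at the integration level, and the bulk of the input variations moves into a data-provider-driven unit test.

<?php
// Hypothetical sketch; names do not refer to real MediaWiki test classes.
class ExampleFormatterTest extends MediaWikiIntegrationTestCase {

	/** @var array Expensive shared fixture, built once for the whole class. */
	private static $fixture;

	// Building the fixture here instead of in setUp() avoids repeating the
	// expensive work for every single test method.
	public static function setUpBeforeClass(): void {
		parent::setUpBeforeClass();
		self::$fixture = [ 'imported' => true ]; // stand-in for expensive setup work
	}

	// Keep one representative end-to-end case at the integration level.
	public function testFormatsRepresentativeCase() {
		$this->assertNotEmpty( self::$fixture );
	}
}

// The hundreds of input variations move to a pure unit test, which needs
// no database or service container and costs microseconds per case.
class ExampleFormatterUnitTest extends MediaWikiUnitTestCase {

	/** @dataProvider provideVariations */
	public function testVariation( string $input, string $expected ) {
		$this->assertSame( $expected, strtolower( $input ) ); // placeholder logic
	}

	public static function provideVariations(): array {
		return [
			'simple' => [ 'Foo', 'foo' ],
			'mixed case' => [ 'BaR', 'bar' ],
			// ... many more cheap cases
		];
	}
}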

Details

Repo | Branch | Lines +/-
mediawiki/core | master | +30 -4
integration/quibble | master | +109 -0
mediawiki/extensions/Echo | master | +6 -18
mediawiki/extensions/Echo | master | +16 -33
mediawiki/core | master | +14 -5
mediawiki/libs/less.php | master | +3 -3
mediawiki/core | master | +14 -0
mediawiki/core | master | +22 -25
mediawiki/libs/less.php | master | +39 -43
mediawiki/core | master | +11 -8
mediawiki/extensions/Kartographer | master | +35 -27
mediawiki/extensions/Translate | master | +7 -4
mediawiki/libs/less.php | master | +11 -6
mediawiki/core | master | +30 -30
mediawiki/core | master | +5 -4
mediawiki/core | master | +23 -22
mediawiki/core | master | +52 -55
integration/quibble | master | +24 -2
mediawiki/core | master | +3 -3
mediawiki/core | master | +2 -28
mediawiki/extensions/MobileFrontend | master | +0 -3
integration/config | master | +14 -0
integration/config | master | +0 -4
mediawiki/core | master | +0 -5
integration/quibble | master | +40 -15
mediawiki/core | master | +56 -16
mediawiki/core | master | +2 -12
mediawiki/core | master | +8 -3
integration/quibble | master | +34 -0
mediawiki/core | master | +2 -12
integration/config | master | +1 -1
integration/config | master | +32 -32
integration/config | master | +28 -28
integration/quibble | master | +27 -3
mediawiki/extensions/ProofreadPage | master | +11 K -621
mediawiki/extensions/GrowthExperiments | master | +13 K -549
mediawiki/extensions/Echo | master | +11 K -216
mediawiki/extensions/AbuseFilter | master | +12 K -469
mediawiki/extensions/FileImporter | master | +11 K -530
mediawiki/core | master | +22 -140
mediawiki/core | master | +1 K -679
mediawiki/core | master | +21 -14
mediawiki/core | master | +6 -4
mediawiki/core | master | +1 -1
integration/config | master | +0 -19
integration/config | master | +12 -5
mediawiki/core | master | +20 -19
mediawiki/core | master | +17 -49
mediawiki/core | master | +1 -3
mediawiki/core | master | +13 -1
integration/config | master | +22 -22
integration/config | master | +54 -0
integration/quibble | master | +4 -0
mediawiki/core | master | +3 -10
mediawiki/core | master | +1 -1
mediawiki/extensions/Wikibase | master | +70 -21
mediawiki/core | master | +27 -37
mediawiki/core | master | +29 -5
mediawiki/core | master | +37 -1
mediawiki/extensions/Babel | master | +47 -52
mediawiki/core | master | +16 -1

Related Objects

This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here.
Status | Assigned
Open | None
Resolved | Ladsgroup
Resolved | aaron
Resolved | hashar
Resolved | hashar
Open | None
Resolved | hashar
Declined | None
Resolved | Jdforrester-WMF
Resolved | awight
Open | None
Resolved | None
Resolved | Daimona
Declined | None
Duplicate | None
Open | None
Open | None
Open | None
Open | None
Stalled | None
Open | None
Open | None
Open | None
Open | None
Resolved | Daimona
Resolved | Daimona

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 745991 abandoned by Ladsgroup:

[mediawiki/core@master] [DNM] Test if reduction in logging would reduce the tests run time

Reason:

test done

https://gerrit.wikimedia.org/r/745991

Change 822663 had a related patch set uploaded (by Krinkle; author: Krinkle):

[integration/config@master] jjb: Remove 'compress junit' postbuildscript step

https://gerrit.wikimedia.org/r/822663

Change 822663 merged by jenkins-bot:

[integration/config@master] jjb: [quibble] Remove 'compress junit' postbuildscript step

https://gerrit.wikimedia.org/r/822663

Change 748312 merged by jenkins-bot:

[integration/config@master] dockerfiles: Add php-excimer to quibble

https://gerrit.wikimedia.org/r/748312

Change 853292 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/MobileFrontend@master] Remove selenium entries from package.json

https://gerrit.wikimedia.org/r/853292

Three years later: maybe we should target a more realistic number, like 15 minutes? If we can get to 15 minutes, we could then move on to 10 minutes, and then maybe end up back at this task's current goal of 5 minutes.

T287582: Move some Wikibase selenium tests to a standalone job is a potential low-hanging fruit. The Wikibase Selenium tests take roughly 5 minutes 30 seconds and only depend on MinervaNeue, MobileFrontend and UniversalLanguageSelector (and MediaWiki core, obviously). If we moved those to a standalone job triggered only for those few repositories, that would save 5 minutes 30 seconds for all other repositories. We would have to drop the npm selenium-test entry point from Wikibase to prevent it from being discovered when Wikibase is a dependency.

Another potentially large saver would be to mark slow tests with @group Standalone, which would make them run only when a patchset targets that repo, and thus prevent them from running when the patchset is for another repository (a rough sketch follows below).
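For illustration, a sketch of what such a marking could look like, assuming the existing PHPUnit @group convention (the test class below is hypothetical):

<?php
/**
 * Slow integration tests that should only run when a patchset targets this
 * repository itself, not when the repository is merely pulled in as a
 * dependency of another gated repository.
 *
 * @group Standalone
 * @group Database
 */
class ExpensiveRoundTripTest extends MediaWikiIntegrationTestCase {
	public function testFullImportExportRoundTrip() {
		// a minutes-long scenario exercised only by this repo's own gate
		$this->assertTrue( true ); // placeholder assertion
	}
}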

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? If there's a default list applied also, perhaps we can opt-out from that for the Selenium job.

Change 853292 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] Remove selenium entries from package.json

https://gerrit.wikimedia.org/r/853292

Change 859146 had a related patch set uploaded (by Krinkle; author: Tim Starling):

[mediawiki/core@master] Reduce time cost of password tests

https://gerrit.wikimedia.org/r/859146

Change 859146 merged by jenkins-bot:

[mediawiki/core@master] password: Reduce time cost of password unit tests

https://gerrit.wikimedia.org/r/859146

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? If there's a default list applied also, perhaps we can opt-out from that for the Selenium job.

The PHPUnit and Selenium kinds of jobs are alike: they both use Quibble as the test runner and both have extensions/skins dependencies injected from CI. The only difference is the kind of tests being run, which are filtered via Quibble parameters: respectively --skip=selenium and --run=selenium.

The registration or discovery of tests varies though. A detailed description (you probably know most of that already but that can be helpful for other readers):

PHPUnit

We run the MediaWiki core extensions suite (defined at tests/phpunit/suites/ExtensionsTestSuite.php), which:

  • discovers tests placed in a /tests/phpunit directory relative to each extension/skin.
  • adds tests registered via the UnitTestsList hook.

Those tests depend on the centrally managed configuration from mediawiki/core, e.g. the PHPUnit version.

Selenium

There is no registration mechanism, only a discovery phase. Each extension/skin test suite is standalone and can use different versions of Webdriver.io. The convention is that developers define how to run the browser tests by adding a selenium-test npm script. In those jobs, quibble --run=selenium has all extensions/skins cloned and checked out; Quibble then crawls through each repo looking for a package.json and, if it has a selenium-test script, runs those tests by invoking npm run selenium-test. They are run serially.


The alternative would be to only run the Selenium tests for the triggering repository. Something such as: quibble --command 'cd $THING_NAME && npm run-script selenium-test'. But we would lose the integration testing between repositories :-(

For PHPUnit tests we have moved some tests to @group Standalone. The use case was to avoid running the Scribunto integration tests for every extension depending on it: given those tests are only affected by a change to Scribunto, there was no point in running them when a patch triggers Wikibase, for example. It is another build stage defined in Quibble: quibble --run phpunit-standalone.

I guess we could do something similar for Selenium and split tests between:

  • integration testing (to be run by any repo)
  • standalone tests (to be run solely when a patch triggers for this repo)
  • and add a selenium-standalone npm script convention which, in the repo, would invoke something such as wdio --spec tests/selenium/specs/standalone/**/*.js

The Wikibase Selenium tests […] only depend on MinervaNeue, MobileFrontend and UniversalLanguageSelector […]
In T225730#8370890, @Krinkle wrote:

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? […]

The PHPUnit and Selenium kinds of jobs are alike: they both use Quibble as the test runner and both have extensions/skins dependencies injected from CI. […] The registration or discovery of tests varies though. […] Selenium: There is no registration mechanism, only a discovery phase. […]

My reason for asking is that I thought you meant that Wikibase CI is slow because it is running selenium tests for extensions that it does not depend on. The lack of a test registration system for Selenium should not be an issue since we can only discover what we install in CI, and CI only installs what Wikibase depends on according to the repo dependency map.

For PHPUnit tests we have moved some tests to @group Standalone. […] I guess we could do something similar for Selenium and split tests, [and] add a selenium-standalone npm script convention.

Another option might be to filter out @standalone tests via wdio --mochaOpts.grep --invert. This is even more similar to PHPUnit and is what we already use for running the @daily selenium tests. This has the benefit of keeping a simple and consistent way to run selenium tests on any given repo as a contributor, without needing to know about this detail. It would then be up to Quibble to invoke a variant, npm run selenium-nostandalone, on non-current repos to skip those. This instead of e.g. having the main selenium entry point no longer run all the tests, which might lead to confusion.

zeljkofilipin changed the status of subtask T284568: Speed up login from In Progress to Open. (Dec 15 2022, 11:52 AM)

Change 913562 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] [DNM] Measure impact of GeneralizedSql on tests run time

https://gerrit.wikimedia.org/r/913562

Change 913562 abandoned by Ladsgroup:

[mediawiki/core@master] [DNM] Measure impact of GeneralizedSql on tests run time

Reason:

https://gerrit.wikimedia.org/r/913562

Change 702909 abandoned by Hashar:

[integration/quibble@master] npm: Use cache for npm ci and prefer offline

Reason:

https://gerrit.wikimedia.org/r/702909

Change 932455 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] api tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/932455

Change 932455 merged by jenkins-bot:

[mediawiki/core@master] api tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/932455

Change 936317 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] storage tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/936317

Change 936230 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Fix TestLocalisationCache being way to small

https://gerrit.wikimedia.org/r/936230

Change 936317 merged by jenkins-bot:

[mediawiki/core@master] storage tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/936317

Change 936230 merged by jenkins-bot:

[mediawiki/core@master] Fix TestLocalisationCache being way to small

https://gerrit.wikimedia.org/r/936230

Change 938346 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] tests: Pass Title to editPage() when already parsed

https://gerrit.wikimedia.org/r/938346

Change 938346 merged by jenkins-bot:

[mediawiki/core@master] tests: Pass Title to editPage() when already parsed

https://gerrit.wikimedia.org/r/938346

Change 939296 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/less.php@master] Re-arrange execution order in LESS parser for performance

https://gerrit.wikimedia.org/r/939296

Change 939684 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Kartographer@master] Much faster tests by setting content language to qqx

https://gerrit.wikimedia.org/r/939684

Change 939690 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Translate@master] [POC] Set content language to qqx in slow tests

https://gerrit.wikimedia.org/r/939690

Change 939296 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/libs/less.php@master] Re-arrange execution order in LESS parser for performance

Reason:

https://gerrit.wikimedia.org/r/939296

Change 939727 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/less.php@master] Inline and optimize heavily called Parser::MatchQuoted

https://gerrit.wikimedia.org/r/939727

Change 939690 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/extensions/Translate@master] [POC] Set content language to qqx in slow tests

Reason:

No, that's not it.

https://gerrit.wikimedia.org/r/939690

Change 939717 had a related patch set uploaded (by Krinkle; author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/less.php@master] Less_Parser: Faster MatchQuoted() by using native strcspn()

https://gerrit.wikimedia.org/r/939717

Change 939717 merged by jenkins-bot:

[mediawiki/libs/less.php@master] Less_Parser: Faster MatchQuoted() by using native strcspn()

https://gerrit.wikimedia.org/r/939717

Change 939684 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Much faster tests by setting content language to qqx

https://gerrit.wikimedia.org/r/939684

From my review on that change:

I think the root cause is that the localisation cache is set to use a static array in includes/DevelopmentSettings.php:

// Localisation Cache to StaticArray (T218207)
$wgLocalisationCacheConf['store'] = 'array';

For PHPUnit tests, it is probably fine since there is a single process. But for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?
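If that hypothesis holds, one possible experiment (a sketch only, not a committed change; whether it actually helps would need measuring) is to keep a persisted localisation cache store for the web requests served to the browser tests, and use the in-process 'array' store only for PHPUnit:

<?php
// Hypothetical CI/LocalSettings tweak: the 'array' store is fine for the
// single PHPUnit process, but web requests issued by the Selenium/QUnit
// tests would benefit from a store that persists between requests.
if ( defined( 'MW_PHPUNIT_TEST' ) ) {
	$wgLocalisationCacheConf['store'] = 'array';
} else {
	// 'db' persists in the test wiki's database across web requests
	$wgLocalisationCacheConf['store'] = 'db';
}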

for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?

When I look at the strange, sometimes extreme runtimes of ResourcesTest and SpecialPageFatalTest I suspect something similar happens in other places. I mean, sure, executing special pages, parsing .less files, or initializing LocalisationCaches is sometimes just very complicated and expensive. But even the most expensive of these examples consumes only a few 100 milliseconds. That's not a problem.

But it is a problem if it's done thousands of times for tests that don't even do anything with the results.

I've been digging into this for a few days now but can't find that single bottleneck anywhere. I suspect the majority of the up to 20 minutes this ticket talks about is just because we are naively running the same processes with the same inputs and the same outputs over and over again, millions and millions of times.

for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?

When I look at the strange, sometimes extreme runtimes of ResourcesTest and SpecialPageFatalTest I suspect something similar happens in other places. I mean, sure, executing special pages, parsing .less files, or initializing LocalisationCaches is sometimes just very complicated and expensive. But even the most expensive of these examples consumes only a few 100 milliseconds. That's not a problem.

T50217#9030084 is related. I noticed that paratest was hanging, and spending most of the time (12 of 18 seconds) on a single test, LanguageConverterFactoryTest. I looked at that test yesterday; it basically loads the localisation cache for all languages, one by one. And while loading a single language isn't terribly expensive (the slowest I saw was around 80ms), it adds up when done hundreds of times. I couldn't find a solution to that, though, and just stopped trying after a while.

Change 940120 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Make MapCacheLRU in LanguageFactory static

https://gerrit.wikimedia.org/r/940120

Change 940233 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] TestLocalisationCache: add static cache of JSON files

https://gerrit.wikimedia.org/r/940233

In order to help understand what's going on under the hood and make more informed decisions on what to improve and what not to, I ran the tests with Excimer and built flamegraphs out of them, which you can see for three types of tests (phpunit, selenium, api_tests) here: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/

The flamegraphs can be searched, zoomed in, etc. (More info: https://www.brendangregg.com/flamegraphs.html)

That provides far better insights and is a treasure trove of areas to improve. I haven't fully dug into all of them, but for example: 3% of the PHPUnit run time is being spent on resetting the database between each test (MediaWikiIntegrationTestCase::resetDB), which makes some sense, but almost all of it is spent creating pages and edits(!). Why? Because it internally calls MediaWikiIntegrationTestCase::addCoreDBData, which in turn calls WikiPage::doUserEditContent and makes a page and edits in every integration test(!!!!!)

Or for the selenium tests: the biggest bottleneck is minification and RL module access; I guess it's not properly cached. Fixing that could easily shave a minute off the tests.

Feel free to dig into them!

(How was it made? This patch enables Excimer logging, and the tests create two files, one for wall time and one for CPU time, with a sampling rate of 0.1 seconds. I then downloaded those logs and ran this on them: cat ../selenium/cpu-profile.log* | ./flamegraph.pl > ../selenium/cpu-profile.svg; cat ../selenium/wall-profile.log* | ./flamegraph.pl > ../selenium/wall-profile.svg; cat ../selenium/cpu-profile.log* | ./flamegraph.pl --reverse --colors blue > ../selenium/cpu-profile.reverse.svg; cat ../selenium/wall-profile.log* | ./flamegraph.pl --reverse --colors blue > ../selenium/wall-profile.reverse.svg)

You can also take the settings from the patch, enable them locally, and run the tests yourself.

3% of the PHPUnit run time is being spent on resetting the database between each test (MediaWikiIntegrationTestCase::resetDB), which makes some sense, but almost all of it is spent creating pages and edits(!). Why? Because it internally calls MediaWikiIntegrationTestCase::addCoreDBData, which in turn calls WikiPage::doUserEditContent and makes a page and edits in every integration test(!!!!!)

That's something I'd like to get rid of. But first of all, we should avoid cloning and resetting the database for tests not in the 'Database' group (T155147). Once that's done, I'll look into not creating a page and a user for every test.
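For reference, the convention being referred to is the @group Database annotation on MediaWikiIntegrationTestCase subclasses; a minimal illustration with hypothetical class names:

<?php
// Hypothetical illustration of the convention discussed above.

/**
 * Needs the database: gets cloned tables plus the per-test resetDB() work.
 * @group Database
 */
class PageEditIntegrationTest extends MediaWikiIntegrationTestCase {
	public function testEditPageSucceeds() {
		$status = $this->editPage( 'T225730 example page', 'some content' );
		$this->assertTrue( $status->isGood() );
	}
}

// No @group Database: once T155147 is done, a test like this should not pay
// for table cloning/resetting or for addCoreDBData() at all.
class TitleParsingIntegrationTest extends MediaWikiIntegrationTestCase {
	public function testParsesTalkNamespace() {
		$title = Title::newFromText( 'Talk:Example' );
		$this->assertSame( NS_TALK, $title->getNamespace() );
	}
}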

Zooming out a little bit, PHPUnit (and possibly selenium too, but I haven't tested it) would benefit a lot from parallelization with paratest, see T50217#9022060 and the following comments. I'm a bit skeptical about seeing a 6x speedup in CI as I observed locally; but still, even if it's just 2x, it's much more than we can do by just improving the code. I'm also planning to look into paratest support once T155147 and T90875 are done.

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run. Running the whole "includes" suite with paratest takes around 19 seconds, and because LanguageConverterFactoryTest is a single test handled by a single process, its 13 seconds mean a lot for the overall runtime. I was looking for ways to speed it up, but I doubt we'll be able to do much. Another option would be to split it into multiple test classes, each one running the same test for a subset of the languages. But that's also something I will look into more closely once we're in a better position to evaluate paratest.
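A sketch of that splitting idea, with made-up class names (the real LanguageConverterFactoryTest would keep its existing assertions); the point is simply that several smaller classes can be scheduled on different paratest processes:

<?php
// Hypothetical sketch: split one slow test class into per-subset chunks.
abstract class LanguageConverterFactoryChunkTestBase extends MediaWikiIntegrationTestCase {

	/** @return string[] Language codes this chunk is responsible for. */
	abstract protected function getLanguageCodes(): array;

	public function testConvertersForChunk() {
		$services = $this->getServiceContainer();
		$converterFactory = $services->getLanguageConverterFactory();
		$languageFactory = $services->getLanguageFactory();
		foreach ( $this->getLanguageCodes() as $code ) {
			$language = $languageFactory->getLanguage( $code );
			$this->assertNotNull( $converterFactory->getLanguageConverter( $language ) );
		}
	}
}

// PHPUnit ignores the abstract base; each concrete subclass is its own test
// class, so paratest can run the chunks in parallel processes.
class LanguageConverterFactoryChunkATest extends LanguageConverterFactoryChunkTestBase {
	protected function getLanguageCodes(): array {
		return [ 'en', 'de', 'fr' /* ... first subset ... */ ];
	}
}

class LanguageConverterFactoryChunkBTest extends LanguageConverterFactoryChunkTestBase {
	protected function getLanguageCodes(): array {
		return [ 'zh', 'sr', 'kk' /* ... second subset ... */ ];
	}
}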

As requested by @Krinkle I increased the sampling rate from every 100ms to 1ms. The results are here: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/high_res/

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run. Running the whole "includes" suite with paratest takes around 19 seconds, and because LanguageConverterFactoryTest is a single test handled by a single process, its 13 seconds mean a lot for the overall runtime. I was looking for ways to speed it up, but I doubt we'll be able to do much. Another option would be to split it into multiple test classes, each one running the same test for a subset of the languages. But that's also something I will look into more closely once we're in a better position to evaluate paratest.

13 seconds is not much compared to the 1-2 minutes of every selenium run being spent on cache misses of RL modules: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/high_res/wall-profile.selenium.svg

Or for the selenium tests: the biggest bottleneck is minification and RL module access; I guess it's not properly cached. Fixing that could easily shave a minute off the tests.
[…]

When I download the .log artefacts and load them up into Speedscope, I can confirm that APCUBagOStuff is selected (so the php-apcu cache is enabled and discovered by MW), and we see both doGet calls without minification, and minification calls with doSet, suggesting that functionally it is using the cache correctly.

It's hard to tell from the current snapshot whether it is slow or whether it is missing the cache more than once for the same module. The settings here are more or less stock MediaWiki + DevelopmentSettings, similar to MediaWiki-Docker, where this is generally working well.

My first hunch here is that the CI worker is under memory pressure and forced to evict fresh values from the cache before they can be used. For example, the default apc.shm_size in PHP is 32M (https://www.php.net/manual/en/apcu.configuration.php), which is quite small; it may just about suffice for human use in MediaWiki-Docker, but in CI we're likely over a threshold where we engage with a lot of different areas. Likewise, opcache (https://www.php.net/manual/en/opcache.configuration.php) has a default of opcache.memory_consumption=128 (MB) and opcache.interned_strings_buffer=8 (MB).

In production (Codesearch) we use:

# appserver
profile::mediawiki::apc_shm_size: 6096M
opcache.interned_strings_buffer: 96
opcache.memory_consumption: 1024

# jobrunner
profile::mediawiki::apc_shm_size: 4096M
opcache.interned_strings_buffer: 96
opcache.memory_consumption: 1024

# default (unused?)
profile::mediawiki::apc_shm_size: 128M
opcache.interned_strings_buffer: 50
opcache.memory_consumption: 300

We don't need that much in CI given that we only interact with 1 wiki (not 1000), and in ~1 language (not 300). It currently has the following set in Quibble (source):

opcache.memory_consumption=256

I presume the others inherit upstream defaults of apc.shm_size=32M and opcache.interned_strings_buffer=8, which would be very small indeed.

I've uploaded the raw log files from @Ladsgroup's high-res Excimer capture (change 940226) to people.wikimedia.org, which has CORS enabled. This means we can load them into Speedscope (aka Excimer UI).

https://people.wikimedia.org/~krinkle/T225730_5min_jenkins-excimer-20230720/

For example (this is a 35MB large file): https://performance.wikimedia.org/excimer/speedscope/#profileURL=https://people.wikimedia.org/~krinkle/T225730_5min_jenkins-excimer-20230720/phpunit_mwquibble_cpuprofile.log.

Screenshot 2023-07-21 at 03.02.01.png (403 KB)

The sample count reflects time passing in milliseconds. It says the phpunit invocation for mediawiki-quibble-vendor yielded 374,000 samples, or 374 seconds, or about 6 minutes, which is right.

Using the "Sandwhich" mode, which is like a more powerful version of classic "reverse" flamegraphs, we can quickly find (for example) the LocalisationCache as having one of the highest "self" times. Clicking it, then reveals which tests it is largely spent by, with most (57%) coming from LanguageConverterFactoryTest.

Screenshot 2023-07-21 at 03.06.49.png (404 KB)

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run.

I incidentally happened to look into language object instantiation yesterday for another reason: POC: Create Language objects without initializing l10n cache – I’ll probably look into this a bit more. (Edit: now tracked at T342418.)

Change 940502 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Improve performance of LocalisationCache::mergeMagicWords()

https://gerrit.wikimedia.org/r/940502

Change 939727 merged by jenkins-bot:

[mediawiki/libs/less.php@master] Less_Parser: Inline and optimize heavily called MatchQuoted()

https://gerrit.wikimedia.org/r/939727

Change 940502 merged by jenkins-bot:

[mediawiki/core@master] Improve performance of LocalisationCache::mergeMagicWords()

https://gerrit.wikimedia.org/r/940502

Change 940929 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Make "pluralRules" caches static in LocalisationCache

https://gerrit.wikimedia.org/r/940929

Change 940929 merged by jenkins-bot:

[mediawiki/core@master] Make "pluralRules" caches static in LocalisationCache

https://gerrit.wikimedia.org/r/940929

If we can just fix RL's caching in the Apache server used for the Selenium job, we can easily shave 1-5 minutes (!) off each Selenium run. See my attempt in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/944984

Change 698467 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] selenium: Use a single login for browser tests

Reason:

https://gerrit.wikimedia.org/r/698467

Change 698468 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] selenium: Move notifications page test into general suite

Reason:

https://gerrit.wikimedia.org/r/698468

Change 748314 abandoned by Hashar:

[integration/quibble@master] [DNM] Add excimer config

Reason:

https://gerrit.wikimedia.org/r/748314

Change 774409 abandoned by Kosta Harlan:

[mediawiki/core@master] DevelopmentSettings: Use MWLoggerDefaultSpi for debug logging

Reason:

https://gerrit.wikimedia.org/r/774409

wmf-quibble-core-vendor-mysql-php74-docker/37543/console today shows 18m 47s, which is a bit higher than the upper end of the spectrum noted in May 2021 (13-18m).

Three years later: maybe we should target a more realistic number, like 15 minutes? If we can get to 15 minutes, we could then move on to 10 minutes, and then maybe end up back at this task's current goal of 5 minutes.

I still think this (more realistic scope) might be a useful way to orient our work, though some individual or team would have to own this goal for it to be realistic, and to hold everyone accountable to it.

I'm still convinced that T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel can give us a lot. Looking at this run of wmf-quibble-core-vendor-mysql-php74-docker, the PHPUnit integration non-database suite took 2m33s, and the database suite 13m41s. In total, that's roughly 16 minutes spent running PHPUnit. Assuming a 5x speedup, consistent with the results of my tests in T50217, this total time would go down to ~3.5 minutes, thus saving ~13 minutes.

I'm still convinced that T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel can give us a lot. Looking at this run of wmf-quibble-core-vendor-mysql-php74-docker, the PHPUnit integration non-database suite took 2m33s, and the database suite 13m41s. In total, that's roughly 16 minutes spent running PHPUnit. Assuming a 5x speedup, consistent with the results of my tests in T50217, this total time would go down to ~3.5 minutes, thus saving ~13 minutes.

The CI infrastructure, obsolete by 5 years, will not be able to handle that. Last time I checked, PHPUnit took a full CPU and MySQL took the other half, and running them in parallel would exceed the capacity of the executing VM, which runs multiple jobs in parallel competing for the same resources.


There are a few things that can potentially be done though such as:

  • moving Wikibase tests to their own jobs instead of running them from any repository participating in the wmf-quibble jobs (T287582)
  • related: find a way to avoid running every single test
  • do some profiling to find the low hanging fruit. I used to do that regularly and found some oddities, such as a MediaWiki service uselessly triggering, or a large PHPUnit data provider which causes each case to be hit by PHPUnit's per-test overhead
  • mocking the slow database wherever possible

do some profiling to find the low hanging fruit. I used to do that regularly and found some oddities […]

I also do this regularly, and will happily continue to do so. However. The sad reality is that it doesn't make much of a difference. The slowest individual PHPUnit tests we currently have take about 2 seconds. That's about 0.2% of the total runtime. Even if we could identify and eliminate 200 such slow tests, the remaining 12,000 tests would still take 10 minutes.

I think the only approach that makes an actual difference is to find more ways to not run all tests all the time. For example, what about a more fine-grained tree of dependencies between codebases that tells us more than the current list of "gated extensions"? For example, while it's critical to run the FileImporter tests whenever something in core's import system is changed, it doesn't make sense to run the Wikibase tests when I make changes in FileImporter. There are a lot of weird combinations like this.

That said, one interesting bottleneck I'm still exploring is the LanguageFactory, see https://gerrit.wikimedia.org/r/940120.

I'm still convinced that T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel can give us a lot. Looking at this run of wmf-quibble-core-vendor-mysql-php74-docker, the PHPUnit integration non-database suite took 2m33s, and the database suite 13m41s. In total, that's roughly 16 minutes spent running PHPUnit. Assuming a 5x speedup, consistent with the results of my tests in T50217, this total time would go down to ~3.5 minutes, thus saving ~13 minutes.

The CI infrastructure, obsolete by 5 years, will not be able to handle that. Last time I checked, PHPUnit took a full CPU and MySQL took the other half, and running them in parallel would exceed the capacity of the executing VM, which runs multiple jobs in parallel competing for the same resources.

Are there plans to improve the infrastructure or add more resources? As Thiemo said, improving individual tests is, for the most part, not really useful, because there isn't a single source of slowness. Running fewer tests is also an option, but I'm not sure if doing that cleverly is that easy: MW has a complex ecosystem, so it's not just a matter of flipping a switch.

IMHO, the lowest-hanging fruit is parallelization. If we find a way to make that work (both in MW and the CI infrastructure), we should see a huge reduction in CI times, possibly more than can be achieved by running fewer tests, and at a lower maintenance cost. Once that's out of the way, perhaps start considering a more fine-grained testing infrastructure to run fewer tests.

At any rate, it's clear that resources are needed to reach this goal. There's no magic trick that will cut the CI runtime by just snapping one's fingers. And I agree with Kosta that this can only happen if a person or team chooses to own this goal, else it'll be really hard to find the necessary resources.

Slow CI can, at times, have a massive effect on productivity. And as "fun" as it might sound, having your patch wait 40 minutes after the +2, only for it to fail a random selenium test and having to go through gate-and-submit again really makes me wanna throw my laptop out of the window sometimes.

Slow CI can, at times, have a massive effect on productivity. And as "fun" as it might sound, having your patch wait 40 minutes after the +2, only for it to fail a random selenium test and having to go through gate-and-submit again really makes me wanna throw my laptop out of the window sometimes.

Somewhat orthogonal to this task, but wanted to note that T323750: Provide early feedback when a patch has job failures can help you know sooner when there's a failure. It has to be enabled per repo, though. (Patch for MW core if anyone wants to look.) It's a band-aid fix, yes, but you at least know of a failure earlier and can trigger a rebase to restart the tests.

I think there are two common use-cases where slow CI is annoying:

  • When submitting a change for code review and not learning quickly that it needs further improvements. As Kosta says, the early feedback bot could help with this.
  • When backporting changes, and having to do a merge (or several merges) in a 60-minute window. IMO we should just make most CI on wmf/ branches non-voting; they provide little value (even when they fail, it rarely represents a real error). T307180: Drop Selenium tests from gate-and-submit-wmf

Agreed that early feedback can help. One more thing to consider though: the bot helps when your patch is the only thing in CI, or at least if there aren't many patches being tested. But CI can get really full sometimes (e.g. if LibUp is running). When that happens, you also need to wait for all the previous jobs to complete before yours even starts, and this can only be improved by making everything faster.