
Reduce runtime of MW shared gate Jenkins jobs to 5 min
Open, Medium, Public

Description

Objective

For the "typical" time it takes for a commit to be approved and landed in master to be 5 minutes or less.

Status quo

As of 11 June 2019, the gate usually takes around 20 minutes.

The two slowest jobs typically take 13-17 minutes each. The time for the gate overall is rarely under 15 minutes: we run several of these jobs (increasing the chances of random slowness), and while they can run in parallel, they don't always start immediately, given the limited number of CI execution slots.

Below is a sample from a MediaWiki commit (master branch):

Gate pipeline build succeeded.
  • wmf-quibble-core-vendor-mysql-php72-docker SUCCESS in 12m 03s
  • wmf-quibble-core-vendor-mysql-hhvm-docker SUCCESS in 14m 12s
  • mediawiki-quibble-vendor-mysql-php72-docker SUCCESS in 7m 34s
  • mediawiki-quibble-vendor-mysql-php71-docker SUCCESS in 7m 12s
  • mediawiki-quibble-vendor-mysql-php70-docker SUCCESS in 6m 48s
  • mediawiki-quibble-vendor-mysql-hhvm-docker SUCCESS in 8m 32s
  • mediawiki-quibble-vendor-postgres-php72-docker SUCCESS in 10m 05s
  • mediawiki-quibble-vendor-sqlite-php72-docker SUCCESS in 7m 04s
  • mediawiki-quibble-composer-mysql-php70-docker SUCCESS in 8m 14s

(+ jobs that take less than 3 minutes: composer-test, npm-test, and phan.)

These can be grouped into two kinds of jobs:

  • wmf-quibble: These install MW with the gated extensions, and then run all PHPUnit, Selenium and QUnit tests.
  • mediawiki-quibble: These install only the MW-bundled extensions, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 9-15 minutes (wmf-gated, extensions-only)
  • Sample:
    • PHPUnit (dbless): 1.91 minutes / 15,782 tests.
    • QUnit: 29 seconds / 1286 tests.
    • Selenium: 143 seconds / 43 tests.
    • PHPUnit (db): 3.85 minutes / 4377 tests.

Stats from mediawiki-quibble-vendor-mysql-php72-docker:

  • 7-10 minutes (plain mediawiki-core)
  • Sample:
    • PHPUnit (unit+dbless): 1.5 minutes / 23,050 tests.
    • QUnit: 4 seconds / 437 tests.
    • PHPUnit (db): 4 minutes / 7604 tests.

Updated status quo

As of 11 May 2021, the gate usually takes around 25 minutes.

The slowest job typically takes 20-25 minutes per run. The time for the gate overall can never be faster than the slowest job, and it can be worse: although we run other jobs in parallel, they don't always start immediately, given the limited number of CI execution slots.

Below are the timing results from a sample MediaWiki commit (master branch):

[Snipped: Jobs faster than 5 minutes]

  • 9m 43s: mediawiki-quibble-vendor-mysql-php74-docker/5873/console
  • 9m 47s: mediawiki-quibble-vendor-mysql-php73-docker/8799/console
  • 10m 03s: mediawiki-quibble-vendor-sqlite-php72-docker/10345/console
  • 10m 13s: mediawiki-quibble-composer-mysql-php72-docker/19129/console
  • 10m 28s: mediawiki-quibble-vendor-mysql-php72-docker/46482/console
  • 13m 11s: mediawiki-quibble-vendor-postgres-php72-docker/10259/console
  • 16m 44s: wmf-quibble-core-vendor-mysql-php72-docker/53990/console
  • 22m 26s: wmf-quibble-selenium-php72-docker/94038/console

Clearly the last two jobs dominate the overall time:

  • wmf-quibble: This job installs MW with the gated extensions, and then runs all PHPUnit and QUnit tests.
  • wmf-quibble-selenium: This job installs MW with the gated extensions, and then runs all the Selenium tests.

Note that the mediawiki-quibble jobs each install just the MW bundled extensions, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 13-18 minutes (wmf-gated, extensions-only)
  • Select times:
    • PHPUnit (unit tests): 9 seconds / 13,170 tests.
    • PHPUnit (DB-less integration tests): 3.31 minutes / 21,067 tests.
    • PHPUnit (DB-heavy): 7.91 minutes / 4,257 tests.
    • QUnit: 31 seconds / 1421 tests.

Stats from wmf-quibble-selenium-php72-docker:

  • 20-25 minutes

Scope of task

This task represents the goal of reaching 5 minutes or less. The work tracked here includes researching ways to get there, trying them out, and putting one or more ideas into practice. The task can be closed once we have reached the goal or if we have concluded it isn't feasible or useful.

Feel free to add/remove subtasks as we go along and consider different things.

Stuff done

Ideas to explore and related work

  • Look at the PHPUnit "Test Report" for a commit and sort the root-level entries by duration. Find the slowest ones and look at their test suites for ways to improve them. Are they repeating expensive setups? Perhaps that work can be skipped or re-used. Are they running hundreds of variations of the same integration test? Perhaps reduce that to just one case for that story, and apply the remaining cases to a lighter unit test instead (see the sketch below).
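As an illustration of those two ideas, here is a minimal PHPUnit sketch (the class, fixture and method names are hypothetical, not existing MediaWiki tests): an expensive fixture is built once per class via setUpBeforeClass() instead of once per test in setUp(), one representative case stays at the integration level, and the bulk of the input variations moves into a data-provider-driven unit test.

<?php
// Hypothetical sketch; names do not refer to real MediaWiki test classes.
class ExampleFormatterTest extends MediaWikiIntegrationTestCase {

	/** @var array Expensive shared fixture, built once for the whole class. */
	private static $fixture;

	// Building the fixture here instead of in setUp() avoids repeating the
	// expensive work for every single test method.
	public static function setUpBeforeClass(): void {
		parent::setUpBeforeClass();
		self::$fixture = [ 'imported' => true ]; // stand-in for expensive setup work
	}

	// Keep one representative end-to-end case at the integration level.
	public function testFormatsRepresentativeCase() {
		$this->assertNotEmpty( self::$fixture );
	}
}

// The hundreds of input variations move to a pure unit test, which needs
// no database or service container and costs microseconds per case.
class ExampleFormatterUnitTest extends MediaWikiUnitTestCase {

	/** @dataProvider provideVariations */
	public function testVariation( string $input, string $expected ) {
		$this->assertSame( $expected, strtolower( $input ) ); // placeholder logic
	}

	public static function provideVariations(): array {
		return [
			'simple' => [ 'Foo', 'foo' ],
			'mixed case' => [ 'BaR', 'bar' ],
			// ... many more cheap cases
		];
	}
}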

Details

Repo | Branch | Lines +/-
mediawiki/core | master | +30 -4
integration/quibble | master | +109 -0
mediawiki/extensions/Echo | master | +6 -18
mediawiki/extensions/Echo | master | +16 -33
mediawiki/core | master | +14 -5
mediawiki/libs/less.php | master | +3 -3
mediawiki/core | master | +14 -0
mediawiki/core | master | +22 -25
mediawiki/libs/less.php | master | +39 -43
mediawiki/core | master | +11 -8
mediawiki/extensions/Kartographer | master | +35 -27
mediawiki/extensions/Translate | master | +7 -4
mediawiki/libs/less.php | master | +11 -6
mediawiki/core | master | +30 -30
mediawiki/core | master | +5 -4
mediawiki/core | master | +23 -22
mediawiki/core | master | +52 -55
integration/quibble | master | +24 -2
mediawiki/core | master | +3 -3
mediawiki/core | master | +2 -28
mediawiki/extensions/MobileFrontend | master | +0 -3
integration/config | master | +14 -0
integration/config | master | +0 -4
mediawiki/core | master | +0 -5
integration/quibble | master | +40 -15
mediawiki/core | master | +56 -16
mediawiki/core | master | +2 -12
mediawiki/core | master | +8 -3
integration/quibble | master | +34 -0
mediawiki/core | master | +2 -12
integration/config | master | +1 -1
integration/config | master | +32 -32
integration/config | master | +28 -28
integration/quibble | master | +27 -3
mediawiki/extensions/ProofreadPage | master | +11 K -621
mediawiki/extensions/GrowthExperiments | master | +13 K -549
mediawiki/extensions/Echo | master | +11 K -216
mediawiki/extensions/AbuseFilter | master | +12 K -469
mediawiki/extensions/FileImporter | master | +11 K -530
mediawiki/core | master | +22 -140
mediawiki/core | master | +1 K -679
mediawiki/core | master | +21 -14
mediawiki/core | master | +6 -4
mediawiki/core | master | +1 -1
integration/config | master | +0 -19
integration/config | master | +12 -5
mediawiki/core | master | +20 -19
mediawiki/core | master | +17 -49
mediawiki/core | master | +1 -3
mediawiki/core | master | +13 -1
integration/config | master | +22 -22
integration/config | master | +54 -0
integration/quibble | master | +4 -0
mediawiki/core | master | +3 -10
mediawiki/core | master | +1 -1
mediawiki/extensions/Wikibase | master | +70 -21
mediawiki/core | master | +27 -37
mediawiki/core | master | +29 -5
mediawiki/core | master | +37 -1
mediawiki/extensions/Babel | master | +47 -52
mediawiki/core | master | +16 -1

Related Objects

This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here.
Status | Assigned
Open | None
Resolved | Ladsgroup
Resolved | aaron
Resolved | hashar
Resolved | hashar
Open | None
Resolved | hashar
Declined | None
Resolved | Jdforrester-WMF
Resolved | awight
Open | None
Resolved | None
Resolved | Daimona
Declined | None
Duplicate | None
Open | None
Open | None
Open | None
Open | None
Stalled | None
Open | None
Open | None
Open | None
Open | None
Resolved | Daimona
Resolved | Daimona

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 745991 abandoned by Ladsgroup:

[mediawiki/core@master] [DNM] Test if reduction in logging would reduce the tests run time

Reason:

test done

https://gerrit.wikimedia.org/r/745991

Change 822663 had a related patch set uploaded (by Krinkle; author: Krinkle):

[integration/config@master] jjb: Remove 'compress junit' postbuildscript step

https://gerrit.wikimedia.org/r/822663

Change 822663 merged by jenkins-bot:

[integration/config@master] jjb: [quibble] Remove 'compress junit' postbuildscript step

https://gerrit.wikimedia.org/r/822663

Change 748312 merged by jenkins-bot:

[integration/config@master] dockerfiles: Add php-excimer to quibble

https://gerrit.wikimedia.org/r/748312

Change 853292 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/MobileFrontend@master] Remove selenium entries from package.json

https://gerrit.wikimedia.org/r/853292

Three years later: maybe we should target a more realistic number, like 15 minutes? If we can get to 15 minutes, we could then move on to 10 minutes, and then maybe end up back at this task's current goal of 5 minutes.

T287582: Move some Wikibase selenium tests to a standalone job is a potential low-hanging fruit. The Wikibase Selenium tests take roughly 5 minutes 30 seconds and only depend on MinervaNeue, MobileFrontend and UniversalLanguageSelector (and MediaWiki core, obviously). If we moved those to a standalone job triggered only for those few repositories, that would save 5 minutes 30 seconds for all other repositories. We would have to drop the npm selenium-test entry point from Wikibase to prevent it from being discovered when Wikibase is a dependency.

Another potentially large saver would be to mark slow tests with @group Standalone, which would make them run only when a patchset targets that repo, and thus prevent them from running when the patchset is for another repository (a rough sketch follows below).
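For illustration, a sketch of what such a marking could look like, assuming the existing PHPUnit @group convention (the test class below is hypothetical):

<?php
/**
 * Slow integration tests that should only run when a patchset targets this
 * repository itself, not when the repository is merely pulled in as a
 * dependency of another gated repository.
 *
 * @group Standalone
 * @group Database
 */
class ExpensiveRoundTripTest extends MediaWikiIntegrationTestCase {
	public function testFullImportExportRoundTrip() {
		// a minutes-long scenario exercised only by this repo's own gate
		$this->assertTrue( true ); // placeholder assertion
	}
}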

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? If there's a default list applied also, perhaps we can opt-out from that for the Selenium job.

Change 853292 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] Remove selenium entries from package.json

https://gerrit.wikimedia.org/r/853292

Change 859146 had a related patch set uploaded (by Krinkle; author: Tim Starling):

[mediawiki/core@master] Reduce time cost of password tests

https://gerrit.wikimedia.org/r/859146

Change 859146 merged by jenkins-bot:

[mediawiki/core@master] password: Reduce time cost of password unit tests

https://gerrit.wikimedia.org/r/859146

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? If there's a default list applied also, perhaps we can opt-out from that for the Selenium job.

The PHPUnit and Selenium kinds of jobs are alike: they both use Quibble as the test runner and both have extensions/skins dependencies injected from CI. The only difference is the kind of tests being run, which are filtered via Quibble parameters: respectively --skip=selenium and --run=selenium.

The registration or discovery of tests varies though. A detailed description (you probably know most of that already but that can be helpful for other readers):

PHPUnit

We run the MediaWiki core extensions suite (defined at tests/phpunit/suites/ExtensionsTestSuite.php), which:

  • discovers tests placed in a /tests/phpunit directory relative to each extension/skin.
  • adds tests registered via the UnitTestsList hook.

Those tests depend on the centrally managed configuration from mediawiki/core, e.g. the PHPUnit version.

Selenium

There is no registration mechanism, only a discovery phase. Each extension/skin test suite is standalone and can use different versions of Webdriver.io. The convention is that developers define how to run the browser tests by adding a selenium-test npm script. In those jobs, quibble --run=selenium has all extensions/skins cloned and checked out; Quibble then crawls through each repo looking for a package.json and, if it has a selenium-test script, runs those tests by invoking npm run selenium-test. They are run serially.


The alternative would be to only run the Selenium tests for the triggering repository. Something such as: quibble --command 'cd $THING_NAME && npm run-script selenium-test'. But we would lose the integration testing between repositories :-(

For PHPUnit tests we have moved some tests to @group Standalone. The use case was to avoid running the Scribunto integration tests for every extension depending on it: given those tests are only affected by a change to Scribunto, there was no point in running them when a patch triggers Wikibase, for example. It is another build stage defined in Quibble: quibble --run phpunit-standalone.

I guess we could do something similar for Selenium and split tests between:

  • integration testing (to be run by any repo)
  • standalone tests (to be run solely when a patch triggers for this repo)
  • and add a selenium-standalone npm script convention which, in the repo, would invoke something such as wdio --spec tests/selenium/specs/standalone/**/*.js

The Wikibase Selenium tests […] only depend on MinervaNeue, MobileFrontend and UniversalLanguageSelector […]
In T225730#8370890, @Krinkle wrote:

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? […]

The PHPUnit and Selenium kinds of jobs are alike: they both use Quibble as the test runner and both have extensions/skins dependencies injected from CI. […] The registration or discovery of tests varies though. […] Selenium: There is no registration mechanism, only a discovery phase. […]

My reason for asking is that I thought you meant that Wikibase CI is slow because it is running selenium tests for extensions that it does not depend on. The lack of a test registration system for Selenium should not be an issue since we can only discover what we install in CI, and CI only installs what Wikibase depends on according to the repo dependency map.

For PHPUnit tests we have moved some tests to @group Standalone. […] I guess we could do something similar for Selenium and split tests, [and] add a selenium-standalone npm script convention.

Another option might be to filter out @standalone tests via wdio --mochaOpts.grep --invert. This is even more similar to PHPUnit and is what we already use for running the @daily selenium tests. This has the benefit of keeping a simple and consistent way to run selenium tests on any given repo as a contributor, without needing to know about this detail. It would then be up to Quibble to invoke a variant, npm run selenium-nostandalone, on non-current repos to skip those. This instead of e.g. having the main selenium entry point no longer run all the tests, which might lead to confusion.

zeljkofilipin changed the status of subtask T284568: Speed up login from In Progress to Open. (Dec 15 2022, 11:52 AM)

Change 913562 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] [DNM] Measure impact of GeneralizedSql on tests run time

https://gerrit.wikimedia.org/r/913562

Change 913562 abandoned by Ladsgroup:

[mediawiki/core@master] [DNM] Measure impact of GeneralizedSql on tests run time

Reason:

https://gerrit.wikimedia.org/r/913562

Change 702909 abandoned by Hashar:

[integration/quibble@master] npm: Use cache for npm ci and prefer offline

Reason:

https://gerrit.wikimedia.org/r/702909

Change 932455 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] api tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/932455

Change 932455 merged by jenkins-bot:

[mediawiki/core@master] api tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/932455

Change 936317 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] storage tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/936317

Change 936230 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Fix TestLocalisationCache being way to small

https://gerrit.wikimedia.org/r/936230

Change 936317 merged by jenkins-bot:

[mediawiki/core@master] storage tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/936317

Change 936230 merged by jenkins-bot:

[mediawiki/core@master] Fix TestLocalisationCache being way to small

https://gerrit.wikimedia.org/r/936230

Change 938346 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] tests: Pass Title to editPage() when already parsed

https://gerrit.wikimedia.org/r/938346

Change 938346 merged by jenkins-bot:

[mediawiki/core@master] tests: Pass Title to editPage() when already parsed

https://gerrit.wikimedia.org/r/938346

Change 939296 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/less.php@master] Re-arrange execution order in LESS parser for performance

https://gerrit.wikimedia.org/r/939296

Change 939684 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Kartographer@master] Much faster tests by setting content language to qqx

https://gerrit.wikimedia.org/r/939684

Change 939690 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Translate@master] [POC] Set content language to qqx in slow tests

https://gerrit.wikimedia.org/r/939690

Change 939296 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/libs/less.php@master] Re-arrange execution order in LESS parser for performance

Reason:

https://gerrit.wikimedia.org/r/939296

Change 939727 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/less.php@master] Inline and optimize heavily called Parser::MatchQuoted

https://gerrit.wikimedia.org/r/939727

Change 939690 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/extensions/Translate@master] [POC] Set content language to qqx in slow tests

Reason:

No, that's not it.

https://gerrit.wikimedia.org/r/939690

Change 939717 had a related patch set uploaded (by Krinkle; author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/less.php@master] Less_Parser: Faster MatchQuoted() by using native strcspn()

https://gerrit.wikimedia.org/r/939717

Change 939717 merged by jenkins-bot:

[mediawiki/libs/less.php@master] Less_Parser: Faster MatchQuoted() by using native strcspn()

https://gerrit.wikimedia.org/r/939717

Change 939684 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Much faster tests by setting content language to qqx

https://gerrit.wikimedia.org/r/939684

From my review on that change:

I think the root cause is that the localisation cache is set to use a static array in includes/DevelopmentSettings.php:

// Localisation Cache to StaticArray (T218207)
$wgLocalisationCacheConf['store'] = 'array';

For PHPUnit tests, it is probably fine since there is a single process. But for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?
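If that hypothesis holds, one possible experiment (a sketch only, not a committed change; whether it actually helps would need measuring) is to keep a persisted localisation cache store for the web requests served to the browser tests, and use the in-process 'array' store only for PHPUnit:

<?php
// Hypothetical CI/LocalSettings tweak: the 'array' store is fine for the
// single PHPUnit process, but web requests issued by the Selenium/QUnit
// tests would benefit from a store that persists between requests.
if ( defined( 'MW_PHPUNIT_TEST' ) ) {
	$wgLocalisationCacheConf['store'] = 'array';
} else {
	// 'db' persists in the test wiki's database across web requests
	$wgLocalisationCacheConf['store'] = 'db';
}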

for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?

When I look at the strange, sometimes extreme runtimes of ResourcesTest and SpecialPageFatalTest I suspect something similar happens in other places. I mean, sure, executing special pages, parsing .less files, or initializing LocalisationCaches is sometimes just very complicated and expensive. But even the most expensive of these examples consumes only a few 100 milliseconds. That's not a problem.

But it is a problem if it's done thousands of times for tests that don't even do anything with the results.

I've been digging into this for a few days now but can't find that single bottleneck anywhere. I suspect the majority of the up to 20 minutes this ticket talks about is just because we are naively running the same processes with the same inputs and the same outputs over and over again, millions and millions of times.

for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?

When I look at the strange, sometimes extreme runtimes of ResourcesTest and SpecialPageFatalTest I suspect something similar happens in other places. I mean, sure, executing special pages, parsing .less files, or initializing LocalisationCaches is sometimes just very complicated and expensive. But even the most expensive of these examples consumes only a few 100 milliseconds. That's not a problem.

T50217#9030084 is related. I noticed that paratest was hanging, and spending most of the time (12 of 18 seconds) on a single test, LanguageConverterFactoryTest. I looked at that test yesterday; it basically loads the localisation cache for all languages, one by one. And while loading a single language isn't terribly expensive (the slowest I saw was around 80ms), it adds up when done hundreds of times. I couldn't find a solution to that, though, and just stopped trying after a while.

Change 940120 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Make MapCacheLRU in LanguageFactory static

https://gerrit.wikimedia.org/r/940120

Change 940233 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] TestLocalisationCache: add static cache of JSON files

https://gerrit.wikimedia.org/r/940233

In order to help understand what's going on under the hood and make more informed decisions on what to improve and what not to, I ran the tests with Excimer and built flamegraphs out of them, which you can see for three types of tests (phpunit, selenium, api_tests) here: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/

The flamegraphs can be searched, zoomed in, etc. (More info: https://www.brendangregg.com/flamegraphs.html)

That provides far better insights and is a treasure trove of areas to improve. I haven't fully dug into all of them, but for example: 3% of the PHPUnit run time is being spent on resetting the database between each test (MediaWikiIntegrationTestCase::resetDB), which makes some sense, but almost all of it is spent creating pages and edits(!). Why? Because it internally calls MediaWikiIntegrationTestCase::addCoreDBData, which in turn calls WikiPage::doUserEditContent and makes a page and edits in every integration test(!!!!!)

Or for the selenium tests: the biggest bottleneck is minification and RL module access; I guess it's not properly cached. Fixing that could easily shave a minute off the tests.

Feel free to dig into them!

(How was it made? This patch enables Excimer logging, and the tests create two files, one for wall time and one for CPU time, with a sampling rate of 0.1 seconds. I then downloaded those logs and ran this on them: cat ../selenium/cpu-profile.log* | ./flamegraph.pl > ../selenium/cpu-profile.svg; cat ../selenium/wall-profile.log* | ./flamegraph.pl > ../selenium/wall-profile.svg; cat ../selenium/cpu-profile.log* | ./flamegraph.pl --reverse --colors blue > ../selenium/cpu-profile.reverse.svg; cat ../selenium/wall-profile.log* | ./flamegraph.pl --reverse --colors blue > ../selenium/wall-profile.reverse.svg)

You can also take the settings from the patch, enable them locally, and run the tests yourself.

3% of the PHPUnit run time is being spent on resetting the database between each test (MediaWikiIntegrationTestCase::resetDB), which makes some sense, but almost all of it is spent creating pages and edits(!). Why? Because it internally calls MediaWikiIntegrationTestCase::addCoreDBData, which in turn calls WikiPage::doUserEditContent and makes a page and edits in every integration test(!!!!!)

That's something I'd like to get rid of. But first of all, we should avoid cloning and resetting the database for tests not in the 'Database' group (T155147). Once that's done, I'll look into not creating a page and a user for every test.
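For reference, the convention being referred to is the @group Database annotation on MediaWikiIntegrationTestCase subclasses; a minimal illustration with hypothetical class names:

<?php
// Hypothetical illustration of the convention discussed above.

/**
 * Needs the database: gets cloned tables plus the per-test resetDB() work.
 * @group Database
 */
class PageEditIntegrationTest extends MediaWikiIntegrationTestCase {
	public function testEditPageSucceeds() {
		$status = $this->editPage( 'T225730 example page', 'some content' );
		$this->assertTrue( $status->isGood() );
	}
}

// No @group Database: once T155147 is done, a test like this should not pay
// for table cloning/resetting or for addCoreDBData() at all.
class TitleParsingIntegrationTest extends MediaWikiIntegrationTestCase {
	public function testParsesTalkNamespace() {
		$title = Title::newFromText( 'Talk:Example' );
		$this->assertSame( NS_TALK, $title->getNamespace() );
	}
}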

Zooming out a little bit, PHPUnit (and possibly selenium too, but I haven't tested it) would benefit a lot from parallelization with paratest, see T50217#9022060 and the following comments. I'm a bit skeptical about seeing a 6x speedup in CI as I observed locally; but still, even if it's just 2x, it's much more than we can do by just improving the code. I'm also planning to look into paratest support once T155147 and T90875 are done.

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run. Running the whole "includes" suite with paratest takes around 19 seconds, and because LanguageConverterFactoryTest is a single test handled by a single process, its 13 seconds mean a lot for the overall runtime. I was looking for ways to speed it up, but I doubt we'll be able to do much. Another option would be to split it into multiple test classes, each one running the same test for a subset of the languages. But that's also something I will look into more closely once we're in a better position to evaluate paratest.
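A sketch of that splitting idea, with made-up class names (the real LanguageConverterFactoryTest would keep its existing assertions); the point is simply that several smaller classes can be scheduled on different paratest processes:

<?php
// Hypothetical sketch: split one slow test class into per-subset chunks.
abstract class LanguageConverterFactoryChunkTestBase extends MediaWikiIntegrationTestCase {

	/** @return string[] Language codes this chunk is responsible for. */
	abstract protected function getLanguageCodes(): array;

	public function testConvertersForChunk() {
		$services = $this->getServiceContainer();
		$converterFactory = $services->getLanguageConverterFactory();
		$languageFactory = $services->getLanguageFactory();
		foreach ( $this->getLanguageCodes() as $code ) {
			$language = $languageFactory->getLanguage( $code );
			$this->assertNotNull( $converterFactory->getLanguageConverter( $language ) );
		}
	}
}

// PHPUnit ignores the abstract base; each concrete subclass is its own test
// class, so paratest can run the chunks in parallel processes.
class LanguageConverterFactoryChunkATest extends LanguageConverterFactoryChunkTestBase {
	protected function getLanguageCodes(): array {
		return [ 'en', 'de', 'fr' /* ... first subset ... */ ];
	}
}

class LanguageConverterFactoryChunkBTest extends LanguageConverterFactoryChunkTestBase {
	protected function getLanguageCodes(): array {
		return [ 'zh', 'sr', 'kk' /* ... second subset ... */ ];
	}
}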

As requested by @Krinkle I increased the sampling rate from every 100ms to 1ms. The results are here: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/high_res/

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run. Running the whole "includes" suite with paratest takes around 19 seconds, and because LanguageConverterFactoryTest is a single test handled by a single process, its 13 seconds mean a lot for the overall runtime. I was looking for ways to speed it up, but I doubt we'll be able to do much. Another option would be to split it into multiple test classes, each one running the same test for a subset of the languages. But that's also something I will look into more closely once we're in a better position to evaluate paratest.

13 seconds is not much compared to the 1-2 minutes of every selenium run being spent on cache misses of RL modules: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/high_res/wall-profile.selenium.svg

Or for the selenium tests: the biggest bottleneck is minification and RL module access; I guess it's not properly cached. Fixing that could easily shave a minute off the tests.
[…]

When I download the .log artefacts and load them up into Speedscope, I can confirm that APCUBagOStuff is selected (so the php-apcu cache is enabled and discovered by MW), and we see both doGet calls without minification, and minification calls with doSet, suggesting that functionally it is using the cache correctly.

It's hard to tell from the current snapshot whether it is slow or whether it is missing the cache more than once for the same module. The settings here are more or less stock MediaWiki + DevelopmentSettings, similar to MediaWiki-Docker, where this is generally working well.

My first hunch here is that the CI worker is under memory pressure and forced to evict fresh values from the cache before they can be used. For example, the default apc.shm_size in PHP is 32M (https://www.php.net/manual/en/apcu.configuration.php), which is quite small; it may just about suffice for human use in MediaWiki-Docker, but in CI we're likely over a threshold where we engage with a lot of different areas. Likewise, opcache (https://www.php.net/manual/en/opcache.configuration.php) has a default of opcache.memory_consumption=128 (MB) and opcache.interned_strings_buffer=8 (MB).

In production (Codesearch) we use:

# appserver
profile::mediawiki::apc_shm_size: 6096M
opcache.interned_strings_buffer: 96
opcache.memory_consumption: 1024

# jobrunner
profile::mediawiki::apc_shm_size: 4096M
opcache.interned_strings_buffer: 96
opcache.memory_consumption: 1024

# default (unused?)
profile::mediawiki::apc_shm_size: 128M
opcache.interned_strings_buffer: 50
opcache.memory_consumption: 300

We don't need that much in CI given that we only interact with 1 wiki (not 1000), and in ~1 language (not 300). It currently has the following set in Quibble (source):

opcache.memory_consumption=256

I presume the others inherit upstream defaults of apc.shm_size=32M and opcache.interned_strings_buffer=8, which would be very small indeed.

I've uploaded the raw log files from @Ladsgroup's high-res Excimer capture (change 940226) to people.wikimedia.org, which has CORS enabled. This means we can load them into Speedscope (aka Excimer UI).

https://people.wikimedia.org/~krinkle/T225730_5min_jenkins-excimer-20230720/

For example (this is a 35MB large file): https://performance.wikimedia.org/excimer/speedscope/#profileURL=https://people.wikimedia.org/~krinkle/T225730_5min_jenkins-excimer-20230720/phpunit_mwquibble_cpuprofile.log.

Screenshot 2023-07-21 at 03.02.01.png (403 KB)

The sample count reflects time passing in milliseconds. It says the phpunit invocation for mediawiki-quibble-vendor yielded 374,000 samples, or 374 seconds, or about 6 minutes, which is right.

Using the "Sandwhich" mode, which is like a more powerful version of classic "reverse" flamegraphs, we can quickly find (for example) the LocalisationCache as having one of the highest "self" times. Clicking it, then reveals which tests it is largely spent by, with most (57%) coming from LanguageConverterFactoryTest.

Screenshot 2023-07-21 at 03.06.49.png (404 KB)

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run.

I incidentally happened to look into language object instantiation yesterday for another reason: POC: Create Language objects without initializing l10n cache – I’ll probably look into this a bit more. (Edit: now tracked at T342418.)

Change 940502 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Improve performance of LocalisationCache::mergeMagicWords()

https://gerrit.wikimedia.org/r/940502

Change 939727 merged by jenkins-bot:

[mediawiki/libs/less.php@master] Less_Parser: Inline and optimize heavily called MatchQuoted()

https://gerrit.wikimedia.org/r/939727

Change 940502 merged by jenkins-bot:

[mediawiki/core@master] Improve performance of LocalisationCache::mergeMagicWords()

https://gerrit.wikimedia.org/r/940502

Change 940929 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Make "pluralRules" caches static in LocalisationCache

https://gerrit.wikimedia.org/r/940929

Change 940929 merged by jenkins-bot:

[mediawiki/core@master] Make "pluralRules" caches static in LocalisationCache

https://gerrit.wikimedia.org/r/940929

If we can just fix RL's caching in the Apache server used for the Selenium job, we can easily shave 1-5 minutes (!) off each Selenium run. See my attempt in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/944984

Change 698467 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] selenium: Use a single login for browser tests

Reason:

https://gerrit.wikimedia.org/r/698467

Change 698468 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] selenium: Move notifications page test into general suite

Reason:

https://gerrit.wikimedia.org/r/698468

Change 748314 abandoned by Hashar:

[integration/quibble@master] [DNM] Add excimer config

Reason:

https://gerrit.wikimedia.org/r/748314

Change 774409 abandoned by Kosta Harlan:

[mediawiki/core@master] DevelopmentSettings: Use MWLoggerDefaultSpi for debug logging

Reason:

https://gerrit.wikimedia.org/r/774409

wmf-quibble-core-vendor-mysql-php74-docker/37543/console today shows 18m 47s, which is a bit higher than the upper end of the spectrum noted in May 2021 (13-18m).

Three years later: maybe we should target a more realistic number, like 15 minutes? If we can get to 15 minutes, we could then move on to 10 minutes, and then maybe end up back at this task's current goal of 5 minutes.

I still think this (more realistic scope) might be a useful way to orient our work, though some individual or team would have to own this goal for it to be realistic, and to hold everyone accountable to it.

I'm still convinced that T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel can give us a lot. Looking at this run of wmf-quibble-core-vendor-mysql-php74-docker, the PHPUnit integration non-database suite took 2m33s, and the database suite 13m41s. In total, that's roughly 16 minutes spent running PHPUnit. Assuming a 5x speedup, consistent with the results of my tests in T50217, this total time would go down to ~3.5 minutes, thus saving ~13 minutes.

I'm still convinced that T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel can give us a lot. Looking at this run of wmf-quibble-core-vendor-mysql-php74-docker, the PHPUnit integration non-database suite took 2m33s, and the database suite 13m41s. In total, that's roughly 16 minutes spent running PHPUnit. Assuming a 5x speedup, consistent with the results of my tests in T50217, this total time would go down to ~3.5 minutes, thus saving ~13 minutes.

The CI infrastructure, obsolete by 5 years, will not be able to handle that. Last time I checked, PHPUnit took a full CPU and MySQL took the other half, and running them in parallel would exceed the capacity of the executing VM, which runs multiple jobs in parallel competing for the same resources.


There are a few things that can potentially be done though such as:

  • moving Wikibase tests to their own jobs instead of running them from any repository participating in the wmf-quibble jobs (T287582)
  • related: find a way to avoid running every single test
  • do some profiling to find the low hanging fruit. I used to do that regularly and found some oddities, such as a MediaWiki service uselessly triggering, or a large PHPUnit data provider which causes each case to be hit by PHPUnit's per-test overhead
  • mocking the slow database wherever possible

do some profiling to find the low hanging fruit. I used to do that regularly and found some oddities […]

I also do this regularly, and will happily continue to do so. However. The sad reality is that it doesn't make much of a difference. The slowest individual PHPUnit tests we currently have take about 2 seconds. That's about 0.2% of the total runtime. Even if we could identify and eliminate 200 such slow tests, the remaining 12,000 tests would still take 10 minutes.

I think the only approach that makes an actual difference is to find more ways to not run all tests all the time. For example, what about a more fine-grained tree of dependencies between codebases that tells us more than the current list of "gated extensions"? For example, while it's critical to run the FileImporter tests whenever something in core's import system is changed, it doesn't make sense to run the Wikibase tests when I make changes in FileImporter. There are a lot of weird combinations like this.

That said, one interesting bottleneck I'm still exploring is the LanguageFactory, see https://gerrit.wikimedia.org/r/940120.

I'm still convinced that T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel can give us a lot. Looking at this run of wmf-quibble-core-vendor-mysql-php74-docker, the PHPUnit integration non-database suite took 2m33s, and the database suite 13m41s. In total, that's roughly 16 minutes spent running PHPUnit. Assuming a 5x speedup, consistent with the results of my tests in T50217, this total time would go down to ~3.5 minutes, thus saving ~13 minutes.

The CI infrastructure, obsolete by 5 years, will not be able to handle that. Last time I checked, PHPUnit took a full CPU and MySQL took the other half, and running them in parallel would exceed the capacity of the executing VM, which runs multiple jobs in parallel competing for the same resources.

Are there plans to improve the infrastructure or add more resources? As Thiemo said, improving individual tests is, for the most part, not really useful, because there isn't a single source of slowness. Running fewer tests is also an option, but I'm not sure if doing that cleverly is that easy: MW has a complex ecosystem, so it's not just a matter of flipping a switch.

IMHO, the lowest-hanging fruit is parallelization. If we find a way to make that work (both in MW and the CI infrastructure), we should see a huge reduction in CI times, possibly more than can be achieved by running fewer tests, and at a lower maintenance cost. Once that's out of the way, perhaps start considering a more fine-grained testing infrastructure to run fewer tests.

At any rate, it's clear that resources are needed to reach this goal. There's no magic trick that will cut the CI runtime by just snapping one's fingers. And I agree with Kosta that this can only happen if a person or team chooses to own this goal, else it'll be really hard to find the necessary resources.

Slow CI can, at times, have a massive effect on productivity. And as "fun" as it might sound, having your patch wait 40 minutes after the +2, only for it to fail a random selenium test and having to go through gate-and-submit again really makes me wanna throw my laptop out of the window sometimes.

Slow CI can, at times, have a massive effect on productivity. And as "fun" as it might sound, having your patch wait 40 minutes after the +2, only for it to fail a random selenium test and having to go through gate-and-submit again really makes me wanna throw my laptop out of the window sometimes.

Somewhat orthogonal to this task, but wanted to note that T323750: Provide early feedback when a patch has job failures can help you know sooner when there's a failure. It has to be enabled per repo, though. (Patch for MW core if anyone wants to look.) It's a band-aid fix, yes, but you at least know of a failure earlier and can trigger a rebase to restart the tests.

I think there are two common use-cases where slow CI is annoying:

  • When submitting a change for code review and not learning quickly that it needs further improvements. As Kosta says, the early feedback bot could help with this.
  • When backporting changes, and having to do a merge (or several merges) in a 60-minute window. IMO we should just make most CI on wmf/ branches non-voting; they provide little value (even when they fail, it rarely represents a real error). T307180: Drop Selenium tests from gate-and-submit-wmf

Agreed that early feedback can help. One more thing to consider though: the bot helps when your patch is the only thing in CI, or at least if there aren't many patches being tested. But CI can get really full sometimes (e.g. if LibUp is running). When that happens, you also need to wait for all the previous jobs to complete before yours even starts, and this can only be improved by making everything faster.