Kafka HTTP purging

From Wikitech

The current (2020) mechanism for purging objects from the CDN is based on Purged, a daemon running on all cache nodes. Purged can be configured to read purge messages either from the legacy Multicast HTCP purging mechanism or from Kafka. Regardless of the source, Purged converts each purge message into HTTP PURGE requests, which it sends locally to both the ATS cache backend and the Varnish cache frontend.

Typical purge flow

  • MediaWiki detects that a purge is needed. It produces a Kafka message on a given topic for each individual URI that needs to be purged.
  • Purged, the daemon running on every relevant cache machine, consumes the appropriate Kafka topics and so receives a copy of the purge message. Purged forwards the request to the Varnish and ATS instances on localhost over a persistent HTTP/1.1 connection, using the PURGE request method.
  • PURGE requests are handled by ATS and Varnish and cause the cache object in question to be invalidated.

Figure: CDN purge flow (MW to Kafka to Purged to Varnish/ATS).
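
The forwarding step in the flow above can be illustrated with a short Python sketch. This is not Purged's actual code: the function names are hypothetical, and the mapping of a purge URI onto a Host header plus request target is an assumption about how a PURGE request to a local cache would plausibly be built. The sketch reuses one HTTP/1.1 connection for all requests, mirroring the persistent connection described above.

```python
import http.client
from urllib.parse import urlsplit


def purge_request_parts(uri):
    """Split a purge URI into (Host header value, request target).

    Hypothetical helper: assumes the cache identifies the object by the
    Host header plus the path and query string of the original URI.
    """
    parts = urlsplit(uri)
    if not parts.netloc:
        raise ValueError(f"not an absolute URI: {uri!r}")
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target


def send_purges(conn, uris):
    """Send one PURGE per URI over an already-open connection.

    `conn` is an http.client.HTTPConnection to a local cache instance.
    Returns the list of HTTP status codes, one per URI.
    """
    statuses = []
    for uri in uris:
        host, target = purge_request_parts(uri)
        conn.request("PURGE", target, headers={"Host": host})
        response = conn.getresponse()
        response.read()  # drain the body so the connection can be reused
        statuses.append(response.status)
    return statuses
```

To try it against a test cache, open a connection with `http.client.HTTPConnection("127.0.0.1", port)`, where `port` is whatever the local Varnish or ATS instance listens on (not stated on this page), and pass it to `send_purges` along with a list of URIs.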

MediaWiki

All CDN purges are generated in MediaWiki via the CdnCacheUpdate::purge method. Currently, MediaWiki is configured to send the generated purges to the EventRelayer under the cdn-url-purges key. The EventBus extension provides an EventRelayer implementation, CdnPurgeEventRelayer, which creates purge events and sends them to Kafka using the normal EventBus flow, that is, via the eventgate service.

Relevant configuration:

// Configuration for the EventRelayer to send purges to the resource-purge Kafka topic
'wgEventRelayerConfig' => [
	'cdn-url-purges' => [
		'class' => \MediaWiki\Extension\EventBus\Adapters\EventRelayer\CdnPurgeEventRelayer::class,
		'stream' => 'resource-purge',
	],
	'default' => [
		'class' => EventRelayerNull::class,
	],
],
// EventBus stream configuration
'wgEventServiceDefault' => 'eventgate-main',

One-off purge

On mwmaint1002, run:

$ echo 'https://example.org/foo?x=y' | mwscript purgeList.php

Note that static content under /static/ must always be purged via the hostname 'en.wikipedia.org'. This is the shared virtual hostname under which Varnish caches content for /static/, regardless of the requesting wiki's hostname. Note also that mobile hostnames are cached independently of desktop hostnames. For example, to purge all copies of enwiki's article about Foo, one must purge both https://en.wikipedia.org/wiki/Foo and https://en.m.wikipedia.org/wiki/Foo.
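
The two hostname rules above can be captured in a small helper that expands one URL into the list of URLs to feed to purgeList.php. This is a minimal Python sketch, not an existing tool: the function names are hypothetical, and the mobile hostname rule is an assumption that only covers hostnames of the form `<lang>.<project>.org` (for example en.wikipedia.org); other hostnames are passed through without a mobile variant.

```python
from urllib.parse import urlsplit, urlunsplit

# Shared virtual hostname for /static/, as described above.
STATIC_HOST = "en.wikipedia.org"


def mobile_host(host):
    """Return the mobile hostname for a three-label desktop hostname.

    Assumption: the mobile variant inserts "m" after the first label.
    Returns None for hostnames this simple rule does not cover.
    """
    labels = host.split(".")
    if len(labels) != 3 or labels[0] in ("www", "m"):
        return None
    return ".".join([labels[0], "m"] + labels[1:])


def urls_to_purge(url):
    """Expand one URL into every URL that should be purged for it."""
    parts = urlsplit(url)
    if parts.path.startswith("/static/"):
        # Static content is cached only under the shared hostname.
        return [urlunsplit(parts._replace(netloc=STATIC_HOST))]
    urls = [url]
    mobile = mobile_host(parts.netloc)
    if mobile is not None:
        urls.append(urlunsplit(parts._replace(netloc=mobile)))
    return urls


if __name__ == "__main__":
    # Print one URL per line, suitable for piping into purgeList.php.
    for line in urls_to_purge("https://en.wikipedia.org/wiki/Foo"):
        print(line)
```

The printed list can be piped into `mwscript purgeList.php` in the same way as the one-off example above.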