June 2021 Datacenter Switchover
July 23, 2021
Site Reliability Engineering Kunal Mehta
In June 2021, most user traffic was switched from our primary Virginia datacenter to our secondary one in Texas. This post covers how the switchover went and the issues that came up.
By Kunal Mehta, Site Reliability Engineer, Service Operations
In June 2021, the Wikimedia Foundation’s Site Reliability Engineering team switched most user traffic from our primary datacenter in Virginia (“eqiad”) to our secondary one in Texas (“codfw”, learn more about our different datacenters). This is an exercise we’ve done multiple times over the past five years, and this was the smoothest and fastest one yet.
The main reason we perform a datacenter switchover is to verify that in an emergency, we can switch to a different datacenter with minimal interruptions for users. All of our services and datacenters have redundant networking, power, disks, and more. Even then, freak accidents can happen, and we need to be prepared.
We also used this time to perform maintenance in Virginia that’s cumbersome to do when we’re actively serving user traffic. For example, we’re currently swapping out about 45 MediaWiki application servers for brand new hardware, giving users a slight performance boost. There’s also a large list of pending database maintenance that was waiting for the switchover to happen.
The switchover itself was divided into three primary sections: Services, Traffic (caches), and MediaWiki.
Services
MediaWiki was once a single large PHP application, but years ago we started breaking it up into a set of smaller services. Today, we have MediaWiki, which is still a large PHP application, and many services that each provide some independent function to MediaWiki, such as maps, math syntax, or even the wikitext parsing itself. For each switchover, we try to expand the list of services being switched. This time we added two more services to the list, notably Swift, which handles all of our media storage.
Most of these are active-active, in that they run out of both datacenters at the same time. Under normal circumstances, we choose to use these in the same datacenter as MediaWiki. During the switchover, we moved usage to Texas to ensure we have enough capacity there to handle the load. Here’s an example of traffic shifting from Virginia to Texas for the Citoid service, which fetches and generates reference templates and metadata.

Virginia graph by Legoktm, CC BY-SA 4.0

Texas graph by Legoktm, CC BY-SA 4.0
During this process we identified a few issues:
Traffic
Most requests for articles never hit MediaWiki itself. They’re served from our edge caches, typically whichever is closest to you: Virginia, Texas, California, Amsterdam, or Singapore. We disconnected Virginia by excluding it from our geographic DNS, which maps every country to a datacenter, and within a few minutes nearly all of that traffic was going to Texas instead.

Varnish traffic graph by Legoktm, CC BY-SA 4.0
We didn’t run into any issues during this step.
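As a rough illustration of the geographic DNS change described in this section, here is a minimal Python sketch. It assumes each country has an ordered list of preferred sites and that a "disconnected" site is simply skipped. The country list, the preference order, the fallback site, and the names `PREFERENCES` and `resolve` are all hypothetical and are not taken from our actual DNS configuration.

```python
from typing import Dict, List, Set

# Hypothetical country -> ordered site preferences (illustrative only).
PREFERENCES: Dict[str, List[str]] = {
    "US": ["virginia", "texas", "california"],
    "BR": ["virginia", "texas"],
    "DE": ["amsterdam", "virginia", "texas"],
    "IN": ["singapore", "amsterdam"],
}

ALL_SITES: Set[str] = {"virginia", "texas", "california", "amsterdam", "singapore"}


def resolve(country: str, pooled: Set[str], default: str = "texas") -> str:
    """Return the first pooled site in the country's preference list.

    Falls back to `default` (an assumption of this sketch) when the
    country is unknown or none of its preferred sites are pooled.
    """
    for site in PREFERENCES.get(country, []):
        if site in pooled:
            return site
    return default


# Normal operation: every site is pooled.
before = {c: resolve(c, ALL_SITES) for c in PREFERENCES}

# Switchover: Virginia is excluded, so its countries fall through
# to the next site in their preference list.
after = {c: resolve(c, ALL_SITES - {"virginia"}) for c in PREFERENCES}

for country in PREFERENCES:
    print(f"{country}: {before[country]} -> {after[country]}")
```

In this toy model, countries that normally map to Virginia move to Texas, while countries already served by Amsterdam or Singapore are unaffected. Real resolvers also cache answers, which is one reason traffic takes a few minutes to move rather than shifting instantly.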
MediaWiki
MediaWiki is the application that powers all of our wikis. Work is ongoing to make it possible to run it in multiple datacenters at the same time, but for now it can only be active in one. The process for switching MediaWiki between datacenters is complex, but in brief it entails setting the primary databases to read-only, waiting for replication to the other datacenter to finish, and then lifting read-only mode in the new datacenter.
Because stopping edits is so disruptive for wikis, we’ve been shortening this read-only period with each switchover. This time, it lasted only 1 minute and 57 seconds, the fastest yet!
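The three steps above (set read-only, wait for replication, lift read-only) can be sketched with a toy in-memory model. This is not our actual switchover automation: the `Database` class, `replicate`, and `switch_primary` are hypothetical names invented for this example, and real replication is asynchronous rather than a loop in one process.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class Database:
    """Toy stand-in for a primary database in one datacenter."""
    name: str
    read_only: bool = True
    log: List[str] = field(default_factory=list)  # writes applied so far

    def write(self, row: str) -> None:
        if self.read_only:
            raise RuntimeError(f"{self.name} is read-only")
        self.log.append(row)


def replicate(source: Database, target: Database, batch: int = 1) -> int:
    """Copy up to `batch` missing writes to the target; return remaining lag."""
    missing = source.log[len(target.log):]
    target.log.extend(missing[:batch])
    return len(source.log) - len(target.log)


def switch_primary(old: Database, new: Database) -> float:
    """Move writes from `old` to `new`; return the read-only window in seconds."""
    started = time.monotonic()
    old.read_only = True              # 1. stop accepting edits
    while replicate(old, new) > 0:    # 2. wait for the replica to catch up
        pass
    new.read_only = False             # 3. accept edits in the new datacenter
    return time.monotonic() - started


eqiad = Database("eqiad", read_only=False)
codfw = Database("codfw")
for i in range(3):
    eqiad.write(f"edit-{i}")

window = switch_primary(eqiad, codfw)
codfw.write("edit-3")  # edits resume, now against the new primary
print(f"read-only window: {window:.6f}s, codfw has {len(codfw.log)} writes")
```

The read-only window is the time between step 1 and step 3, so the less replication lag there is when the switch starts, the shorter the interruption to editing.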
After the switch, the Turkish Wikivoyage was unavailable for a few minutes because of a typo in the configuration. An incident report was written for this, and a patch is pending review to prevent it from happening again.
Various other improvements to the automation around switching have been filed in Phabricator as well.
Next steps
We will switch back to our primary Virginia datacenter sometime in August once most maintenance has finished, allowing us to test the procedure once again. We also have Datacenter-Switchover and MediaWiki-MultiDC Phabricator projects tracking our work in this area to make Wikimedia wikis more resilient and available on a technical level.
About this post
Featured image credit: Wikimedia Servers by Victor Grigas, CC BY-SA 3.0
Unless otherwise stated content is licensed under a CC BY-SA 4.0 international license.