
Convert static mediawiki configuration to form more suitable for containers
Open, Medium, Public

Description

Having static configuration built into the MediaWiki container means that it would have to be rebuilt every time configuration changed. The container image shouldn't be tied to a specific configuration or environment. We need to move the configuration out of the build step so that configuration is dynamic and a configuration change only requires a re-deploy.

There has been some discussion here: https://www.mediawiki.org/wiki/Talk:Wikimedia_Release_Engineering_Team/MW-in-Containers_thoughts
and Joe has already written up a good starting point here: https://docs.google.com/document/d/1nGv5N1_PYM5Wl5TTPlkFI8TZTSokuJcmdIZD-dKZdEA

This will likely involve some combination of moving configuration to a data store, inserting some as environment variables, and (maybe) attaching a config volume to the container.

Joe has proposed the following in his document:

  • State and wiki-specific configuration will live in a configuration datastore, and will be fetched periodically by MediaWiki. Right now we’re using etcd for state, but this decision might be revisited if we need to store more stuff into this system.
  • Production code and global, “immutable” configuration should be part of the code release
  • Miscellanea should be managed separately, possibly as another repository again. This needs to be better defined.
  • Every MediaWiki branch will be bundled with production code and Miscellanea and built into a container. It probably makes sense to have a layered approach to such containers so that we can exploit copy on write as much as possible (given production code will be the same across all containers).
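
As a rough illustration of the first bullet above (wiki-specific config fetched periodically from a datastore), something along these lines might work; the endpoint, helper names, and TTL below are hypothetical, not the actual implementation:

```php
<?php
// Illustrative sketch only: wiki-specific config is fetched from a datastore
// and cached locally with a short TTL, so MediaWiki picks up changes within
// seconds without a rebuild or redeploy. Assumes the APCu extension is available.

/** Hypothetical fetcher; in production this could be an etcd or HTTP client call. */
function fetchWikiConfigFromDatastore( string $wiki ): array {
    $raw = file_get_contents( "http://config-store.internal/v1/{$wiki}" );
    return $raw !== false ? ( json_decode( $raw, true ) ?? [] ) : [];
}

function getWikiConfig( string $wiki, int $ttl = 10 ): array {
    $key = "wiki-config:{$wiki}";
    $cached = apcu_fetch( $key, $ok );
    if ( $ok ) {
        return $cached;
    }
    $config = fetchWikiConfigFromDatastore( $wiki );
    apcu_store( $key, $config, $ttl ); // re-fetch after $ttl seconds
    return $config;
}

$config = getWikiConfig( 'enwiki' );
```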

Event Timeline

I thought of some more guidelines we could use to help migrate our configuration. Open to feedback.

  • config that is unlikely to change often could go in a datastore (e.g. the wiki-specific configuration mentioned in the task)
  • config that needs to be edited by the application should be stored in a datastore (e.g. the state configuration mentioned in the task)
  • secret config should be stored in a separate database (or maybe something like HashiCorp Vault is an option?)
  • ‘tuning’ config, such as timeouts, could be inserted as an environment variable or ConfigMap (see the sketch after this list)
  • frequently changed (by humans) config should be inserted as an environment variable or ConfigMap
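
A minimal sketch of the ‘tuning’-config-as-environment-variables idea, assuming the variables are injected by Kubernetes (e.g. from a ConfigMap via the pod spec); the variable names and fallback values are illustrative only:

```php
<?php
// Illustrative only: tuning values read from environment variables with
// fallbacks, so they can be changed per-environment without rebuilding the image.
$wgHTTPTimeout    = (int)( getenv( 'MW_HTTP_TIMEOUT' ) ?: 25 );         // seconds
$wgMaxShellMemory = (int)( getenv( 'MW_MAX_SHELL_MEMORY' ) ?: 307200 );  // KiB
$wgReadOnly       = getenv( 'MW_READ_ONLY_REASON' ) ?: false;            // message string or false
```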

As usual, one of the challenges is that whatever we do should also downscale/work seamlessly for local development and for third party installations without all the fanciness of our production environment.

Removed from the T238770: Deploy MediaWiki to Wikimedia production in containers task tree as this doesn't necessarily block deployment.

For the first iteration packaging wmf-config with the code will suffice.

greg triaged this task as Medium priority. Apr 18 2021, 12:06 AM

Based on my limited understanding of Kubernetes, I could imagine a multi-layered configuration system, where one layer can override the other. I'd be interested to hear if I'm at all thinking in the right direction with this.

The base layer would be per-site configuration. This can be pre-generated for each wiki, and be copied or mounted into the mediawiki containers. This rarely changes.

The medium layer would be per-environment configuration; it would differ between data centers, or for staging, and maybe we could even have one to run locally in minikube. This can probably go into Kubernetes ConfigMaps.

The top layer is the "live override" layer that allows us to react to incidents, e.g. to take a db server out of rotation, adjust a rate limit, or disable a feature flag. This could live in etcd.

On top of it all, there would probably be short-term caching in APC.

Does this sound like a viable approach?
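
For concreteness, a minimal sketch of how those three layers could be combined at request time; all file paths are hypothetical and the override lookup is stubbed out:

```php
<?php
// Hypothetical layering: later layers override earlier ones.

// Base layer: pre-generated per-site config (baked into the image or mounted).
$base = json_decode( file_get_contents( '/srv/mediawiki/config/enwiki.json' ), true ) ?? [];

// Medium layer: per-environment config, e.g. a ConfigMap mounted as a file.
$env = json_decode( file_get_contents( '/etc/mediawiki/environment.json' ), true ) ?? [];

// Top layer: live overrides, e.g. fetched from etcd with short-lived APC caching.
$overrides = []; // stand-in for an etcd lookup

$config = array_replace( $base, $env, $overrides );
```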

Based on my limited understanding of Kubernetes, I could imagine a multi-layered configuration system, where one layer can override the other. I'd be interested to hear if I'm at all thinking in the right direction with this.

The base layer would be per-site configuration. This can be pre-generated for each wiki, and be copied or mounted into the mediawiki containers. This rarely changes.

To clarify, when you say "into the containers" do you mean the container images or at runtime via a k8s volume (sourced from a configmap or otherwise)?

The medium layer would be per-environment configuration; it would differ between data centers, or for staging, and maybe we could even have one to run locally in minikube. This can probably go into Kubernetes ConfigMaps.

Use of ConfigMaps and/or Secrets (with a real at-rest encryption provider configured) would be the natural k8s option IMO, and k8s already supports a degree of layering of these sources with projected volumes, allowing multiple config sources to be written to files under the same mountpoint—say a conf.d-like directory with simple numerical priority.
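
For illustration, consuming such a conf.d-style mountpoint from PHP could be as simple as the following; the directory and file names are made up:

```php
<?php
// Sketch: read every JSON file under a projected-volume mountpoint in
// numeric-prefix order, later (higher-numbered) files overriding earlier ones.
// Assumes zero-padded prefixes like 10-base.json, 20-environment.json, 50-secrets.json.
$files = glob( '/etc/mediawiki/conf.d/*.json' );
sort( $files, SORT_STRING );

$config = [];
foreach ( $files as $file ) {
    $layer = json_decode( file_get_contents( $file ), true ) ?? [];
    $config = array_replace( $config, $layer );
}
```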

The top layer is the "live override" layer that allows us to react to incidents, e.g. to take a db server out of rotation, adjust a rate limit, or disable a feature flag. This could live in etcd.

I think this depends on the requirements of those kinds of config changes. On its face, it seems a little redundant to use etcd when k8s ConfigMaps/Secrets (or any k8s object) are already in etcd and can be modified both as a part of deployments (via chart templates) and independently via direct calls to the API (kubectl or custom tooling)—much like etcd configuration. However, I don't know how quickly k8s can propagate CM changes downstream to running Pods—it seems to depend on the configured cache and TTLs—and whether that would meet the requirements for db server rotation for example.

I lean towards: If ConfigMap/Secret change propagation is fast enough for us, we should use them; If not, etcd seems like a very reasonable second choice.

Another requirement that I'd like to suggest is that changes to the aggregate config must be auditable/reviewable prior to deployment without requiring a MW runtime. A huge issue we have now is that when you (or someone) modifies a single variable in mediawiki-config, it's nearly impossible to know how it will affect our multiple environments. This stems from the fact that it is not so much configuration as initialization and there are many conditionals and code paths—too many for a reviewer to make sense of.

The base layer would be per-site configuration. This can be pre-generated for each wiki, and be copied or mounted into the mediawiki containers. This rarely changes.

To clarify, when you say "into the containers" do you mean the container images or at runtime via a k8s volume (sourced from a configmap or otherwise)?

I was thinking that either would work, since this config would be the same no matter where the image is deployed, and it would not change between deployments. Can you confirm that?

The medium layer would be per-environment configuration; it would differ between data centers, or for staging, and maybe we could even have one to run locally in minikube. This can probably go into Kubernetes ConfigMaps.

Use of ConfigMaps and/or Secrets (with a real at-rest encryption provider configured) would be the natural k8s option IMO, and k8s already supports a degree of layering of these sources with projected volumes, allowing multiple config sources to be written to files under the same mountpoint—say a conf.d-like directory with simple numerical priority.

My understanding was that config maps are limited in size, and may not be able to support all config we need, including e.g. interwiki maps and such. If this is not the case, having everything in one place is of course the simplest solution.

I lean towards: If ConfigMap/Secret change propagation is fast enough for us, we should use them; If not, etcd seems like a very reasonable second choice.

from my limited understanding, I was assuming that ConfigMaps can't be used to push config changes to running pods... at least not reliably. If it is possible to have everything update within a similar time frame that is currently possible with etcd, then so much the better.

Another requirement that I'd like to suggest is that changes to the aggregate config must be auditable/reviewable prior to deployment without requiring a MW runtime. A huge issue we have now is that when you (or someone) modifies a single variable in mediawiki-config, it's nearly impossible to know how it will affect our multiple environments. This stems from the fact that it is not so much configuration as initialization and there are many conditionals and code paths—too many for a reviewer to make sense of.

I am envisioning a way to "pre-compile" the configuration for each wiki by a system of mixing and matching configuration snippets. However, this would be part of the deployment process, MediaWiki would be totally oblivious of it. MediaWiki may get a mechanism for loading config from json files and combining and layering config from multiple sources - this will come in handy for testing, for instance. For production use, having a pre-compiled php file is probably preferable for performance reasons, as well as for auditing.
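
A hedged sketch of what that deploy-time pre-compilation step could look like; the snippet layout and output path are invented for illustration:

```php
<?php
// Hypothetical deploy-time step: merge config snippets for one wiki and emit a
// static PHP file that MediaWiki (or a reviewer) can read without running any
// dynamic config code.
$snippets = [
    'config-src/defaults.json',
    'config-src/group-wikipedia.json',
    'config-src/enwiki.json',
];

$config = [];
foreach ( $snippets as $path ) {
    $config = array_replace( $config, json_decode( file_get_contents( $path ), true ) ?? [] );
}

// Emitting PHP rather than JSON lets opcache keep the parsed array in memory.
file_put_contents(
    'compiled/enwiki.php',
    "<?php\nreturn " . var_export( $config, true ) . ";\n"
);
```

Since the output is a plain data file, the effective per-wiki config could be diffed during code review without a MediaWiki runtime.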

The base layer would be per-site configuration. This can be pre-generated for each wiki, and be copied or mounted into the mediawiki containers. This rarely changes.

To clarify, when you say "into the containers" do you mean the container images or at runtime via a k8s volume (sourced from a configmap or otherwise)?

I was thinking that either would work, since this config would be the same no matter where the image is deployed, and it would not change between deployments. Can you confirm that?

In the case where it's baked into the image, no, it wouldn't change between deployments (of the same image). Similarly, but perhaps slightly more flexibly, a ConfigMap defined within the helm chart template would not change between deployments unless the template (or variables within, generally sourced from the values file) is changed. The benefit of doing the latter is that changes (even if they are infrequent) would not necessitate rebuilding the image.

The medium layer would be per-environment configuration; it would differ between data centers, or for staging, and maybe we could even have one to run locally in minikube. This can probably go into Kubernetes ConfigMaps.

Use of ConfigMaps and/or Secrets (with a real at-rest encryption provider configured) would be the natural k8s option IMO, and k8s already supports a degree of layering of these sources with projected volumes, allowing multiple config sources to be written to files under the same mountpoint—say a conf.d-like directory with simple numerical priority.

My understanding was that config maps are limited in size, and may not be able to support all config we need, including e.g. interwiki maps and such. If this is not the case, having everything in one place is of course the simplest solution.

Ah, that's true. I believe the limit is 1MiB per CM. Using a projected volume would allow us to segment that into multiple CMs where necessary and perhaps come in under the size limit, but that of course means more segmentation/complexity. Do we have a sense of the absolute size of those interwiki maps and other large portions of config?

from my limited understanding, I was assuming that ConfigMaps can't be used to push config changes to running pods... at least not reliably. If it is possible to have everything update within a similar time frame that is currently possible with etcd, then so much the better.

I'm going off of general k8s documentation on ConfigMap:

When a ConfigMap currently consumed in a volume is updated, projected keys are eventually updated as well

However, the keyword here is "eventually" and it seems to be highly dependent on the cache in use. @akosiaris or @Joe can you clarify whether ConfigMaps in our k8s clusters are kept eventually consistent with running pods and what the TTLs are?

I am envisioning a way to "pre-compile" the configuration for each wiki by a system of mixing and matching configuration snippets. However, this would be part of the deployment process, MediaWiki would be totally oblivious of it. MediaWiki may get a mechanism for loading config from json files and combining and layering config from multiple sources - this will come in handy for testing, for instance. For production use, having a pre-compiled php file is probably preferable for performance reasons, as well as for auditing.

That sounds great! If the pre-compiled config could go through code review (or at least be presented during code review of any constituent config source) that would be wonderful.

Pinging @hnowlan for contextual awareness.

Based on my limited understanding of Kubernetes, I could imagine a multi-layered configuration system, where one layer can override the other. I'd be interested to hear if I'm at all thinking in the right direction with this.

The base layer would be per-site configuration. This can be pre-generated for each wiki, and be copied or mounted into the mediawiki containers. This rarely changes.

The medium layer would be per-environment configuration; it would differ between data centers, or for staging, and maybe we could even have one to run locally in minikube. This can probably go into Kubernetes ConfigMaps.

The top layer is the "live override" layer that allows us to react to incidents, e.g. to take a db server out of rotation, adjust a rate limit, or disable a feature flag. This could live in etcd.

I would like to see us split config into multiple layers by *who* should be configuring it and how often it needs to change.

Broadly:

  • SRE-level stuff, like server hostnames, should be in etcd for ease of automation
  • releng-level stuff, like which wiki is in which group, which wiki is on which release, or which extensions are loaded: probably Git, so it goes through CI and can be diffed and blamed
  • basic wiki configuration, currently done by sysadmins or trusted users, which really should and can be managed through on-wiki interfaces and stored in MySQL.

I think we should also try to move away from loading all configuration during initialization, and defer stuff that's needed only for specific features or endpoints until it's actually needed.
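
To make the deferral idea concrete, here is one possible shape for it; this is a sketch, not an existing MediaWiki mechanism, and the setting names are made up:

```php
<?php
// Illustration of "defer loading until needed": expensive or feature-specific
// settings are registered as closures and only resolved when something asks
// for them, then memoized.

class LazyConfig {
    /** @var array<string, mixed> values, some of which may be closures */
    private array $values;

    public function __construct( array $values ) {
        $this->values = $values;
    }

    public function get( string $name ) {
        $value = $this->values[$name] ?? null;
        if ( $value instanceof Closure ) {
            $value = $this->values[$name] = $value(); // resolve once
        }
        return $value;
    }
}

$config = new LazyConfig( [
    'Sitename' => 'Wikipedia',
    // Only loaded if a feature actually needs the interwiki map.
    'InterwikiMap' => fn () => json_decode( file_get_contents( '/srv/config/interwiki.json' ), true ),
] );
```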

Broadly:

  • SRE-level stuff, like server hostnames, should be in etcd for ease of automation
  • releng-level stuff, like which wiki is in which group, which wiki is on which release, or which extensions are loaded: probably Git, so it goes through CI and can be diffed and blamed
  • basic wiki configuration, currently done by sysadmins or trusted users, which really should and can be managed through on-wiki interfaces and stored in MySQL.

I think we should also try to move away from loading all configuration during initialization, and defer stuff that's needed only for specific features or endpoints until it's actually needed.

Yes!

It would also be nice to get rid of a bunch of conditional logic in the config initialization. I'm not sure how to clean it up but putting some stuff in the database might be a good direction to explore.

I'm currently doing an exploration of how we can make MediaWiki more flexible (and, at the same time, sane) about how it loads config. Here are some ideas I have after digging around in CommonSettings.php for a while:

  • config falls into one of four categories: datacenter-specific, server-group specific, wiki-specific, and dynamic overrides (from etcd).
  • most of what CommonSettings.php does could be captured into static (json) files
  • the remaining bits (hooks, plus dynamic behavior) would be extracted into a WMF specific extension.
  • when running from k8s, an app server would get the right config for the data center and server group from a config-map.
  • pre-generated per-wiki config could be mounted in a volume, or even be bundled with the image.
  • mediawiki then only needs a bit of logic to pick the correct per-wiki config at runtime, based on the request (more specifically, the host header).
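
A minimal sketch of that last bullet (host-based selection of a pre-generated per-wiki config); the map file and directory layout are hypothetical:

```php
<?php
// Sketch: pick the per-wiki config for the current request from the Host header.
$host = $_SERVER['HTTP_HOST'] ?? 'en.wikipedia.org';

// A small pre-generated map from canonical host names to wiki database names,
// e.g. [ 'en.wikipedia.org' => 'enwiki', ... ].
$hostMap = require '/srv/mediawiki/config/host-map.php';

$wiki = $hostMap[$host] ?? null;
if ( $wiki === null ) {
    http_response_code( 404 );
    exit( "Unknown wiki host: {$host}\n" );
}

// Load the pre-generated per-wiki config (mounted in a volume or baked into the image).
$wikiConfig = require "/srv/mediawiki/config/wikis/{$wiki}.php";
```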

I plan to submit a more detailed proposal to the Tech Decision Forum shortly.

PS: We can still put some of the config into a database if we want. The point is to allow config to be mixed and matched on the fly inside mediawiki, while pre-generating as much into flat files as possible beforehand.

I plan to submit a more detailed proposal to the Tech Decision Forum shortly.

This is really awesome to hear. I don't know how I can materially contribute other than encouragement but I wholeheartedly support this and all of your suggestions seem very appropriate.

And most importantly, this sounds like it could finally be the end of multiversion! 🎉

I'm currently doing an exploration of how we can make MediaWiki more flexible (and, at the same time, sane) about how it loads config. Here are some ideas I have after digging around in CommonSettings.php for a while:

  • config falls into one of four categories: datacenter-specific, server-group specific, wiki-specific, and dynamic overrides (from etcd).

My worry with this kind of organization is that it runs against Conway's Law. I don't think the same workflow for making MW changes works for all the different teams that want to make changes, and we shouldn't try to do that either.

  • most of what CommonSettings.php does could be captured into static (json) files
  • the remaining bits (hooks, plus dynamic behavior) would be extracted into a WMF specific extension.

I want to strongly push against this just because it'll end up being a dumping ground for WMF hacks. Most hooks and dynamic behavior needs fixing properly in MediaWiki.

  • when running from k8s, an app server would get the right config for the data center and server group from a config-map.

For things that we expect to stay relatively static, I think this makes sense (e.g. envoy proxies, long-lived hostnames).

  • pre-generated per-wiki config could be mounted in a volume, or even be bundled with the image.

The main advantage to bundling in the image is that it's very reproducible: config is in the image, so reverting is as simple as using an older image.

But the downside is that it requires building a new image (and adds complexity to that process), pushing it to the registry, pulling it on the nodes, etc. I don't yet have a strong feeling either way on which solution would be better; I think it partly depends on what's going to be easier/faster for deployers.

  • mediawiki then only needs a bit of logic to pick the correct per-wiki config at runtime, based on the request (more specifically, the host header).

I plan to submit a more detailed proposal to the Tech Decision Forum shortly.

PS: We can still put some of the config into a database if we want. The point is to allow config to be mixed and matched on the fly inside mediawiki, while pre-generating as much into flat files as possible beforehand.

I'm not yet sold on pre-generating and loading everything during initialization. Why does load.php need to load all of the API settings? Why does api.php need to load settings that only affect the human index.php UI? etc. That's what I meant by:

I think we should also try to move away from loading all configuration during initialization, and defer stuff that's needed only for specific features or endpoints until it's actually needed.

  • config falls into one of four categories: datacenter-specific, server-group specific, wiki-specific, and dynamic overrides (from etcd).

My worry with this kind of organization is that it runs against Conway's Law. I don't think the same workflow for making MW changes works for all the different teams that want to make changes, and we shouldn't try to do that either.

The split I outlined above is how we would deploy config, not necessarily how we would maintain it. I think we'd probably want to generate these files from other files that are organized more around component ownership.

  • the remaining bits (hooks, plus dynamic behavior) would be extracted into a WMF specific extension.

I want to strongly push against this just because it'll end up being a dumping ground for WMF hacks. Most hooks and dynamic behavior needs fixing properly in MediaWiki.

I agree that we should fix them properly, but moving them out of the config file is a step in the right direction. If the owners of the respective hooks are up for implementing the logic properly in core, I'm all for it...

  • when running from k8s, an app server would get the right config for the data center and server group from a config-map.

For things that we expect to stay relatively static, I think this makes sense (e.g. envoy proxies, long-lived hostnames).

That is the idea, though config maps can be changed relatively easily. That should not require more than redeploying pods. The source these files are generated from would of course still be maintained in a git repo.

  • pre-generated per-wiki config could be mounted in a volume, or even be bundled with the image.

The main advantage to bundling in the image is that it's very reproducible: config is in the image, so reverting is as simple as using an older image.

But the downside is that it requires building a new image (and adds complexity to that process), pushing it to the registry, pulling it on the nodes, etc. I don't yet have a strong feeling either way on which solution would be better; I think it partly depends on what's going to be easier/faster for deployers.

I was thinking this would be used for things that are big but rarely change (e.g. unicode mappings, interwiki mappings).

I'm not yet sold on pre-generating and loading everything during initialization. Why does load.php need to load all of the API settings? Why does api.php need to load settings that only affect the human index.php UI? etc. That's what I meant by:

I think we should also try to move away from loading all configuration during initialization, and defer stuff that's needed only for specific features or endpoints until it's actually needed.

I agree in principle, but I don't think we need to do this right away. Re-organizing when and from where we load config in a more fine-grained way is interesting, but not necessary for deploying from Kubernetes; we should keep it in mind when designing the new system rather than tackle it immediately. We just have to make sure that our first step towards using static config files doesn't degrade performance. Initial tests indicate that loading a single giant JSON file is quite a bit faster than executing the thirty thousand lines of dynamic config code we currently run on every request.

  • the remaining bits (hooks, plus dynamic behavior) would be extracted into a WMF specific extension.

I want to strongly push against this just because it'll end up being a dumping ground for WMF hacks. Most hooks and dynamic behavior needs fixing properly in MediaWiki.

A potential future use case here is one wiki needing to use another wiki's configuration, e.g. for cross-wiki reads/writes (currently supported to a very limited extent by the DB layer exposing other wikis' DB handles, using SiteConfiguration). MediaWiki core would have to know how to apply dynamic behavior to a configuration array (as opposed to globals like done now) for that to be possible. That doesn't necessarily exclude putting it in a Wikimedia-specific extension, but that extension would have to use a (currently non-existent) standard configuration manipulation mechanism, as opposed to reading/writing globals in random unrelated hooks.

  • the remaining bits (hooks, plus dynamic behavior) would be extracted into a WMF specific extension.

I want to strongly push against this just because it'll end up being a dumping ground for WMF hacks. Most hooks and dynamic behavior needs fixing properly in MediaWiki.

A potential future use case here is one wiki needing to use another wiki's configuration, e.g. for cross-wiki reads/writes (currently supported to a very limited extent by the DB layer exposing other wikis' DB handles, using SiteConfiguration). MediaWiki core would have to know how to apply dynamic behavior to a configuration array (as opposed to globals like done now) for that to be possible. That doesn't necessarily exclude putting it in a Wikimedia-specific extension, but that extension would have to use a (currently non-existent) standard configuration manipulation mechanism, as opposed to reading/writing globals in random unrelated hooks.

The immediate need is to get executable logic out of config files, so we can use Kubernetes Config Maps. I agree that "hacks" should be replaced by proper handling in core, but having a centralized "dumping ground" is better than having them scattered across config files. I also agree that we need a better way to manipulate configuration than updating global variables (or better, remove the need to manipulate configuration). But that is orthogonal to what we are trying to achieve right now. Moving the code into an extension doesn't make it harder to replace global variable access with something better in the future.

Agreed, although one of the stated objectives of T292402: TDF: Review mechanism to configure individual MediaWiki installations is to "Allow the effective configuration of a given wiki (for a given server group and data center) to be reviewed easily", and dynamic hacks make that harder, so I don't think it's entirely unrelated. But one step at a time is generally a good approach to sweeping changes :)