
Evaluate and decide on a MediaWiki distribution strategy targeted at VMs
Open, Low, Public

Description

Virtual machines are gradually replacing shared hosting as the preferred hosting option. Price points are in the same ballpark, and are likely to fall even further.

At the same time, we are moving to a service-based architecture, with separate services for some key tasks like wikitext parsing.

A distribution strategy that lets users easily install and maintain a MediaWiki system with all basic features (at least VE & Parsoid, auth, storage/caching) can enable users to take advantage of modern hosting offers. It can also help us put new features in the hands of third-party users quickly by making them part of the standard distribution. There is also the potential to automate security updates and simplify major version upgrades.

Requirements

  • Simple to use: one-click, or at most copy-pasting a few shell commands
  • Compatible with cheap VM offers
  • Ability to use the same mechanism to create images
  • Support for security and major version upgrades
  • Support for optional components / services / extensions

Options:

Event Timeline

GWicke raised the priority of this task to Needs Triage.
GWicke updated the task description.
GWicke subscribed.
GWicke set Security to None.
GWicke edited subscribers, added: brion; removed: Aklapper.

Request: make it straightforward for extension developers to customize the distribution and offer it to users with special extensions and other configuration added.

@Worden.lee: Do you think we could get away with a conf.d style solution where each optional extension can drop in a config fragment? Do you see use cases where an extension would need to perform more invasive changes?

For extensions, it may be worth looking at how extensions are installed via roles in MediaWiki-Vagrant -- a lot of them come with a drop-in config file (.d-style), but also pull in external dependencies via Puppet configuration.

Not sure how easy it'd be to make a generic solution for service/shell-out program dependency installation though. :D

@GWicke I'm not sure what the conf.d style would and wouldn't allow. I can see wanting to run some bash commands while setting up the VM, for example to set up a cache directory for the extension. I can also see wanting to download and install binaries and other packages that the extension uses.

A third-party developer might also want to offer a variant of the MW VM distribution with different default configuration options, if an extension works best that way.

Thanks!
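
To make the "run some bash commands during setup" case concrete, here is a minimal sketch of a post-install hook an extension package could ship. The hook mechanism, directory name, and ownership are illustrative assumptions, not an existing MediaWiki convention:

  #!/bin/sh
  # Hypothetical post-install hook shipped by an extension package.
  set -e
  # Create the cache directory the extension expects, owned by the web server.
  install -d -o www-data -g www-data -m 0750 /var/cache/myextension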

How would minor security upgrades work for a "MediaWiki 1.25 LTS" VM? VMs are great for providing everyone a default complex install, but 14 months later how does someone who's added a few extensions and made some local tweaks keep it all up to date?

  • The way you keep a developer Vagrant instance up-to-date is a hodge-podge of git pulls of vagrant and services, vagrant git-update, apt-get commands, and vagrant provision (and if that all fails, vagrant destroy and start all over).
  • The way WMF keeps production instances up-to-date is very WMF-specific puppet plus git deploys.
  • The way you keep a labs instance up-to-date is ...

Can we reuse any of this for third-party MediaWiki VM maintenance?
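
For reference, keeping a MediaWiki-Vagrant instance current today amounts to roughly the following (vagrant git-update is a MediaWiki-Vagrant plugin command; the sequence is a sketch, not a guaranteed recipe):

  cd vagrant && git pull    # update the Vagrant configuration itself
  vagrant git-update        # pull MediaWiki core, extensions and services
  vagrant provision         # re-apply the puppet configuration
  # Last resort when all of the above fails:
  vagrant destroy -f && vagrant up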

The main things should be automatable:

  • base OS updates -> automatic background apt-get upgrades or whatever
  • if distributed straight from git, then something like the 'vagrant git-update' command that only pulls in point-release updates might do the basic job (a sketch follows this list), especially if it can be either fully automated or wrapped in an administration panel (compare the big "upgrade!" buttons on WordPress installs when you get out of date). Phoning home for automatic update checks would be a wise default, perhaps with the ability to turn it off.
  • the biggest danger, of course, is if you actually tweaked the wiki source -- then an update might fail to apply. Of course, that way lies madness. ;) Encourage the use of clean extensions, and continue to clean up our interfaces to make it less likely that people will just hack stuff up.
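
A minimal sketch of what such a point-release updater could look like, assuming an install that tracks a release branch from git (the paths and branch name are illustrative):

  #!/bin/sh
  # Hypothetical auto-updater: fast-forward to the latest point release on the
  # current release branch, then apply any pending schema changes.
  set -e
  cd /srv/mediawiki
  git fetch origin
  git merge --ff-only origin/REL1_25   # point releases only, no major jumps
  php maintenance/update.php --quick   # run MediaWiki's schema updater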

The main things should be automatable:

Agreed.

  • base OS updates -> automatic background apt-get upgrades or whatever

For Debian / Ubuntu packages, there is https://wiki.debian.org/UnattendedUpgrades, which we could enable for security updates from our own repo on install.
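
For illustration, enabling it could look roughly like this on Debian/Ubuntu; the origin for our own repo is a placeholder assumption:

  # Install and enable periodic unattended upgrades.
  apt-get install -y unattended-upgrades
  cat > /etc/apt/apt.conf.d/20auto-upgrades <<'EOF'
  APT::Periodic::Update-Package-Lists "1";
  APT::Periodic::Unattended-Upgrade "1";
  EOF
  # Our own repo would additionally need to be listed under
  # Unattended-Upgrade::Allowed-Origins in
  # /etc/apt/apt.conf.d/50unattended-upgrades (exact origin string TBD).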

  • the biggest danger, of course, is if you actually tweaked the wiki source -- then an update might fail to apply. Of course, that way lies madness. ;) Encourage the use of clean extensions, and continue to clean up our interfaces to make it less likely that people will just hack stuff up.

This is a strong reason for avoiding a central LocalSettings.php file and going with a conf.d-style setup instead. @Worden.lee, most mechanisms we could use (packages, puppet, images, ...) support executing shell commands on installation or upgrade, as well as installing other binaries via packaged dependencies.
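
As a concrete sketch of the conf.d idea: an extension's install step drops a PHP fragment into a settings directory that the distribution's LocalSettings.php globs and includes. All names here are illustrative:

  # Assumes the distribution's LocalSettings.php ends with something like:
  #   foreach ( glob( "$IP/settings.d/*.php" ) as $f ) { require $f; }
  cat > /srv/mediawiki/settings.d/50-MyExtension.php <<'EOF'
  <?php
  wfLoadExtension( 'MyExtension' );
  $wgMyExtensionCacheDir = '/var/cache/myextension';
  EOF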

I bumped this to high priority because getting clarity soon on which direction we want to go is important for making the right decisions to get there. This does not mean that we need to *implement* this with high priority. A timely decision will actually give us more time to implement it.

@Nemo_bis suggested that T114457: [RFC] Use `npm install mediawiki-express` as basis for all-in-one install of MediaWiki+services might be relevant here. I'm not completely sure: if you've got a whole VM, you don't necessarily have to run PHP and Node in the same process. On the other hand, the standardized configuration of services envisioned by T114457 is relevant. You don't necessarily want all your services listening on separate internet-facing ports; for better security you probably want them to communicate over unix domain sockets at standardized filenames.

@cscott, TCP ports can be bound to localhost. All of the most important services support that, afaik.

@GWicke Yes, of course they can. I think @tstarling's suggestion was more along the lines of "well-known pathnames are easier to allocate than well-known TCP ports". I'm agnostic on that point, but I'm interested in hearing about any technical roadblocks that might prevent listening on unix domain sockets. (It's obvious that we can bind to localhost, but if there's some security problem with that approach, I'm of course interested in hearing the discussion on that as well.)

A somewhat related discussion is whether we should be running full HTTP servers for services, especially if they are listening on single-host-only unix domain sockets. Again, my default position is, "Yes, we should share as much code coverage as possible between the single-server and multi-server setups", but if someone thinks we could eke out some extra performance for small servers by doing something else, I'm listening.
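
As a small illustration that plain HTTP works fine over unix domain sockets, curl can talk to such a service directly (the socket path and request path are made up for the example; curl has supported --unix-socket since 7.40):

  # HTTP over a unix domain socket -- no TCP port allocation needed:
  curl --unix-socket /run/mediawiki/parsoid.sock http://localhost/some/endpoint
  # The equivalent request to the same service bound to a localhost TCP port:
  curl http://127.0.0.1:8000/some/endpoint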

We now have a basic docker-based fully featured MediaWiki installer: https://github.com/wikimedia/mediawiki-containers

See T92826 for updates & discussion.
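
Getting started is along these lines; the entry-point script name is my assumption, so check the repository README for the authoritative steps:

  git clone https://github.com/wikimedia/mediawiki-containers
  cd mediawiki-containers
  sudo ./mediawiki-containers   # assumed entry point; see the README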

GWicke lowered the priority of this task from High to Medium. (Oct 12 2016, 7:39 PM)
GWicke edited projects, added Services (watching); removed Services (later).

Current work focused on Docker containers and Kubernetes is happening in T170453. T170456 in particular is going to provide a lightweight yet fully featured container-based MediaWiki system. While the initial focus is on providing a development environment, we expect this to also provide a good basis for other VM installs.

demon lowered the priority of this task from Medium to Low. (Jan 12 2018, 10:48 PM)
Jdforrester-WMF renamed this task from "Evaluate and decide on a distribution strategy targeted at VMs" to "Evaluate and decide on a MediaWiki distribution strategy targeted at VMs". (Jul 10 2019, 7:09 PM)