This document is a draft, shared to gather early feedback.
Introduction
Deploying applications to an environment with strict security and reproducibility requirements can be challenging. This also holds for Python applications, whose "standard" deployment mechanism (a virtualenv plus pip downloading dependencies listed in a requirements file) is both insecure and prone to several kinds of failure. For those reasons that deployment method is forbidden on the Wikimedia cluster, and the gold standard for deploying Python applications is to build Debian packages. There are situations, though, in which the use of a virtualenv is unavoidable, e.g. because the application requires a newer library than the distribution offers, or because it uses libraries that would conflict with what is installed on the system. For such cases there is no standard set of rules, although some have been worked out along the way.
This document sets out guidelines for deploying Python applications to the Wikimedia Foundation servers. It aims to combine ease of build for the developer, ease of deployment for the deployer, reproducibility and security.
Which deployment method to choose
By default, any Python application is expected to be packaged as a Debian package and to use distribution-provided libraries. Any exception needs to be thoroughly justified, as it places a burden on both the ops team and the deployers of the software. A case can be made for a different deployment strategy when a software deployment meets all, or most, of the following criteria:
- Has a proven need for features in a library version that is not available for the system, and whose API has changed enough that backporting the library as a Debian package could break other things.
- Has more than N libraries not currently available as Debian packages, which we would need to package ourselves (TODO: figure out a sensible value for N - I would say 3 is the magic number)
- Needs to be deployed as a service (so depool - deploy - restart - test - pool, canary deployments, etc.)
Application deployment and build process
If a Debian package is not an option, the following prescriptions apply:
- It must be distributed via scap3, using a <package-name>/deploy repository
- It must provide wheels for the software and any external library that needs to be installed
- It must store the wheels on an artifact repository, and serve them in production via git-fat or git-lfs.
- It should build said wheels from a frozen-requirements.txt file provided in the deployment repository, via a standardized CI job
- It should sanitize said wheels to make them reproducible, using a tool like strip-nondeterminism
- It should be installed to a dedicated virtual environment that will be located at /srv/deployment/<package-name>/venv
- If deployment via scap3 is used just to have service coordination/depool/restart, the virtual environment should make use of system libraries as much as possible.
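As an illustration of the last two prescriptions, the following is a minimal sketch of the commands a deployment step might run to create the virtual environment and install only from locally available wheels. The function name, the "myapp" package and the "artifacts/buster" directory are hypothetical; the pip options --no-index and --find-links and the venv option --system-site-packages are standard, but how scap3 would actually invoke them is not specified here.

```python
import shlex
from pathlib import PurePosixPath


def venv_commands(package_name, wheel_dir, requirements, use_system_libs=False):
    """Return the commands that create the venv and install wheels offline.

    Nothing is executed here; the caller decides how to run the commands.
    """
    venv = PurePosixPath("/srv/deployment") / package_name / "venv"
    create = ["python3", "-m", "venv"]
    if use_system_libs:
        # Let the venv fall back to distribution-provided libraries.
        create.append("--system-site-packages")
    create.append(str(venv))
    install = [
        str(venv / "bin" / "pip"), "install",
        "--no-index",                    # never contact a package index
        "--find-links", str(wheel_dir),  # resolve only from local wheels
        "-r", str(requirements),
    ]
    return [create, install]


if __name__ == "__main__":
    # Hypothetical package name and wheel directory, for illustration only.
    for cmd in venv_commands("myapp", "artifacts/buster", "frozen-requirements.txt"):
        print(shlex.join(cmd))
```

Passing use_system_libs=True corresponds to the case where scap3 is used only for service coordination and the environment should rely on system libraries as much as possible.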
Structure of the deploy repository
To avoid reinventing the wheel (pun intended) every time, the deploy repositories should adhere to a strict structure, so that all the glue needed to build and deploy the application can be standardized and a scaffolding script can be created. This spares developers from learning the details of how the build process works, or from implementing their own flavour of it.
The repository should look as follows:
- It must include a frozen-requirements.txt file, where all dependencies are listed, with frozen versions.
- It must include an artifacts directory where wheels should be stored via git-fat or git-lfs in subdirectories indicating which distribution they were built for.
- It must include a scap check to run in the promote stage that will refresh the virtualenv if needed and install all the wheels into it.
- It should not include a build directory; in fact, it should ignore any build directory that it might contain.
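The "frozen versions" requirement can be checked mechanically. The sketch below is one possible check, not an existing tool: the function name is hypothetical, and the rule it applies (every requirement must have the form name==version) is a deliberately strict simplification that would also reject lines carrying environment markers or hashes.

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s]+$")


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned to an exact version."""
    bad = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            bad.append(line)
    return bad
```

A CI job could fail the build whenever this function returns a non-empty list for the contents of frozen-requirements.txt.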
If the source code is developed or modified by us:
- It must include a reference to the git tag or tree-ish we want to release - either in a src_version file or as a git submodule
- The source code must be either cloned or located in the src subdirectory.
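The layout rules above lend themselves to a simple automated check. The following is a minimal sketch under stated assumptions: the function name and the own_source flag are hypothetical, and the check only looks for the files and directories named in the lists above; it does not inspect the scap configuration or the per-distribution subdirectories.

```python
from pathlib import Path


def check_deploy_repo(repo, own_source=False):
    """Return a list of layout problems found in a deploy repository."""
    repo = Path(repo)
    problems = []
    if not (repo / "frozen-requirements.txt").is_file():
        problems.append("missing frozen-requirements.txt")
    if not (repo / "artifacts").is_dir():
        problems.append("missing artifacts directory")
    # Only relevant when the source code is developed or modified by us.
    if own_source and not (
        (repo / "src_version").is_file() or (repo / "src").exists()
    ):
        problems.append("no src_version file and no src subdirectory")
    return problems
```

An empty list means the repository passes these basic checks; a scaffolding script could run the same function right after generating a new deploy repository.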
Example implementation
Most of the concepts expressed here (but definitely not all!) are implemented in the operations/docker-images/docker-pkg-deploy repository. In particular, its Makefile.build can be used as a basis for setting up a standardized CI job for building and uploading the artifacts.