Portal:Toolforge/Admin/Kubernetes/Upgrading Kubernetes


This document only applies to a kubeadm-managed cluster deployed as described in Portal:Toolforge/Admin/Kubernetes/Deploying.

Always upgrade lima-kilo first, then toolsbeta, and leave Toolforge for last

Prepare upgrade

Kubernetes changelog

You or someone else with a good understanding of everything that runs inside our Kubernetes cluster should read through the upstream Kubernetes release notes and changelog for the release we're upgrading to.

Also, look at the deprecated API call dashboard for the target version. It does not tell you what is making those requests, but it does tell you whether any are still being made. (They might be coming from inside the control plane!)
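
The underlying signal is presumably the apiserver_requested_deprecated_apis metric exposed by the API server; as a rough cross-check you can also query it directly. This is a sketch: it assumes admin access through the kubectl sudo plugin, and the bastion hostname is only an example.

user@tools-bastion:~$ kubectl sudo get --raw /metrics | grep apiserver_requested_deprecated_apis

Entries whose removed_release label matches the target version are the ones that need to be tracked down before upgrading.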

Third-party components

You need to check that all of the third-party components listed on Portal:Toolforge/Admin/Kubernetes/Components are compatible with the new version we're upgrading to. If not, upgrade them to a release that is compatible with both the current and the new version.

Some third-party components have detailed upgrade guides at Portal:Toolforge/Admin/Kubernetes#Cluster management.

Also check that the etcd version we run is supported by the new Kubernetes release.
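
A quick way to confirm what is actually installed on the etcd nodes (a sketch; the hostname is a placeholder and the Debian package name may differ between releases):

user@tools-k8s-etcd-1:~$ dpkg -l | grep etcd
user@tools-k8s-etcd-1:~$ etcdctl version

Compare that against the etcd versions listed as supported in the upstream release notes for the target Kubernetes version.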

Managing packages

This step requires SRE (global root) access.

We mirror the Kubernetes Apt repository to reprepro, in a component named thirdparty/kubeadm-k8s-X-YY. Generally speaking, you can copy the component and updates stanzas used for the current version and adjust the version numbers, except that for 1.24 you need to migrate the configuration to the new pkgs.k8s.io repository.
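
Once the new component is configured, the packages can be pulled into the mirror on the apt repository host. The invocation below is a sketch: the component name, distribution codename and host are examples and should be checked against the actual reprepro configuration.

root@apt-host:~# reprepro --noskipold --component thirdparty/kubeadm-k8s-1-24 checkupdate bookworm-wikimedia
root@apt-host:~# reprepro --noskipold --component thirdparty/kubeadm-k8s-1-24 update bookworm-wikimedia

checkupdate shows what would be imported without changing anything; update actually pulls the packages in.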

Upgrade lima-kilo

You need to update the node image used in lima-kilo.

Announce user-facing changes

Upgrade a cluster

Begin upgrade

  • Run the prepare-upgrade cookbook
user@cloudcumin1001:~$ sudo cookbook wmcs.toolforge.k8s.prepare_upgrade --help

usage: cookbooks.wmcs.toolforge.k8s.prepare_upgrade [-h] --cluster-name {tools,toolsbeta} [--task-id TASK_ID] [--no-dologmsg] --src-version SRC_VERSION --dst-version DST_VERSION

WMCS Toolforge Kubernetes - prepares a cluster for upgrading

Usage example:
    cookbook wmcs.toolforge.k8s.prepare_upgrade \
        --cluster-name toolsbeta \
        --src-version 1.22.17 \
        --dst-version 1.23.15

optional arguments:
  -h, --help            show this help message and exit
  --cluster-name {tools,toolsbeta}
                        cluster to work on (default: None)
  --task-id TASK_ID     Id of the task related to this operation (ex. T123456). (default: None)
  --no-dologmsg         To disable dologmsg calls (no SAL messages on IRC). (default: False)
  --src-version SRC_VERSION
                        Old version to upgrade from. (default: None)
  --dst-version DST_VERSION
                        New version to migrate to. (default: None)
  • Downtime the project on metricsinfra
  • If the cluster is user-visible, update the topic in the #wikimedia-cloud IRC channel from "Status: Ok" to "Status: upgrading Toolforge k8s"

Upgrade control nodes

Run the wmcs.toolforge.k8s.worker.upgrade cookbook for the first control node (the same cookbook handles both control and worker nodes):

usage: cookbook [GLOBAL_ARGS] wmcs.toolforge.k8s.worker.upgrade [-h] --cluster-name {tools,toolsbeta} [--task-id TASK_ID] [--no-dologmsg] --hostname HOSTNAME --src-version SRC_VERSION --dst-version DST_VERSION

WMCS Toolforge - Upgrade a Kubernetes worker node

Usage example:
    cookbook wmcs.toolforge.k8s.worker.upgrade \
        --cluster-name toolsbeta \
        --hostname toolsbeta-test-worker-4 \
        --src-version 1.22.17 \
        --dst-version 1.23.15

options:
  -h, --help            show this help message and exit
  --cluster-name {tools,toolsbeta}
                        cluster to work on (default: None)
  --task-id TASK_ID     Id of the task related to this operation (ex. T123456). (default: None)
  --no-dologmsg         To disable dologmsg calls (no SAL messages on IRC). (default: False)
  --hostname HOSTNAME   Host name of the node to upgrade. (default: None)
  --src-version SRC_VERSION
                        Old version to upgrade from. (default: None)
  --dst-version DST_VERSION
                        New version to migrate to. (default: None)

On the first control node, the cookbook will ask you to approve the upgrade plan. Save the plan output in case it's needed for later troubleshooting.

[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.15.0
[upgrade/versions] kubeadm version: v1.15.0
[upgrade/versions] Latest stable version: v1.15.1
[upgrade/versions] Latest version in the v1.15 series: v1.15.1

External components that should be upgraded manually before you upgrade the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT   AVAILABLE
Etcd        3.2.26    3.3.10

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     5 x v1.15.0   v1.15.1

Upgrade to the latest version in the v1.15 series:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.15.0   v1.15.1
Controller Manager   v1.15.0   v1.15.1
Scheduler            v1.15.0   v1.15.1
Kube Proxy           v1.15.0   v1.15.1
CoreDNS              1.3.1     1.3.1

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.15.1

Note: Before you can perform this upgrade, you have to update kubeadm to v1.15.1.

Some important things to note here:

  • Etcd is external, so it is upgraded through its Debian packages rather than by kubeadm. Make sure that the version we are running (or can upgrade to) is supported by the new Kubernetes version before doing anything else.
  • kubeadm and kubelet are deployed from packages; both need to be upgraded to complete a cluster upgrade (roughly the manual equivalent is sketched below).
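
For reference, this is roughly the manual sequence the cookbook automates on each node (apart from draining and uncordoning), assuming the node's apt sources already point at the new thirdparty/kubeadm-k8s-X-YY component; the version is an example:

# upgrade kubeadm itself first
sudo apt-get update && sudo apt-get install -y kubeadm
# on the first control node: review and apply the plan
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.23.15
# on every other node (control or worker):
sudo kubeadm upgrade node
# finally upgrade kubelet (and kubectl) and restart it
sudo apt-get install -y kubelet kubectl
sudo systemctl restart kubelet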

Now wait a few minutes until the cookbook finishes. Check that all control plane pods (scheduler, apiserver and controller-manager) start up, do not crash-loop, and have no errors in their logs. See #Troubleshooting if any of them misbehave.
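
A minimal way to check, assuming admin credentials (for example root's kubectl on a control node, or the kubectl sudo plugin on a bastion); the node name is a placeholder:

kubectl -n kube-system get pods -l tier=control-plane -o wide
kubectl -n kube-system logs --since=10m kube-apiserver-tools-k8s-control-1
kubectl -n kube-system logs --since=10m kube-controller-manager-tools-k8s-control-1
kubectl -n kube-system logs --since=10m kube-scheduler-tools-k8s-control-1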

Repeat the cookbook for the remaining control nodes, and check the logs again.

Upgrade worker nodes

Once the control nodes have been upgraded, we can upgrade the workers.

You now need to run the wmcs.toolforge.k8s.worker.upgrade cookbook for each worker node. The currently recommended way is to split the list of normal and NFS workers into two or three chunks, then make that many shell scripts that call the upgrade cookbook for each node in the chunk. Start those scripts in separate screen/tmux tabs.
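
A minimal sketch of one such chunk script; the node names, versions and task id are examples and need to be adjusted before running it from cloudcumin:

#!/bin/bash
# Upgrade one chunk of worker nodes, one node at a time.
set -e
for node in tools-k8s-worker-1 tools-k8s-worker-2 tools-k8s-worker-3; do
    sudo cookbook wmcs.toolforge.k8s.worker.upgrade \
        --cluster-name tools \
        --hostname "$node" \
        --src-version 1.22.17 \
        --dst-version 1.23.15 \
        --task-id T123456
done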

Ingress nodes

The ingress nodes are similar to the worker nodes but they need some special treatment:

  • On a Toolforge bastion, run kubectl sudo -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=2 to prevent an ingress controller from being scheduled on a regular node.
  • Ingress pods take a while to evict. It should be safe to upgrade the ingress nodes in parallel with the normal worker nodes (the command after this list shows which nodes currently host a controller replica).
  • When done, run kubectl sudo -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=3 to return the cluster to normal operation.
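
To check which nodes currently host an ingress controller replica (assuming the kubectl sudo plugin on a bastion; the hostname is an example):

user@tools-bastion:~$ kubectl sudo -n ingress-nginx-gen2 get pods -o wide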

Finishing touches

  1. Upgrade kubectl on bastions (see the sanity check below)
  2. Revert topic changes on -cloud
  3. Remove Alertmanager downtime
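
As a final sanity check, confirm that every node reports the new version and that kubectl on the bastion matches. This assumes admin access through the kubectl sudo plugin; the bastion hostname is an example.

user@tools-bastion:~$ kubectl version --client
user@tools-bastion:~$ kubectl sudo get nodes -o wide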

Troubleshooting

Permission errors after control plane upgrades

Sometimes the control plane components log error messages after a control node has been upgraded, something like:

E0410 09:18:10.387734       1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: leases.coordination.k8s.io "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"

The exact cause of this is unknown. Some theories include a race condition in which the controller-manager pod starts before the api-server.

Try:

  • a VM reboot
  • if that didn't work, a manual restart of the affected static pod: move its manifest file out of /etc/kubernetes/manifests/, wait for the pod to disappear, then put the file back in the same place (see the sketch below)
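
A sketch of the manual static pod restart, using the controller-manager as the example and run as root on the affected control node:

mv /etc/kubernetes/manifests/kube-controller-manager.yaml /root/
# wait until the mirror pod disappears from 'kubectl -n kube-system get pods'
mv /root/kube-controller-manager.yaml /etc/kubernetes/manifests/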
