Project Name: integration
Type of quota increase requested: disk IO rate limit
Reason:
The integration project hosts the Docker daemons used for the CI workflow. Some builds create a lot of files on disk, which take a fairly long time (minutes) to reap when deleting the container. This was first noticed immediately after the migration to Ceph in October 2020.
The default WMCS limits are:
quota:disk_read_iops_sec='5000'
quota:disk_total_bytes_sec='200000000'
quota:disk_write_iops_sec='500'
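For reference, quotas like these are normally attached to a flavor as extra specs. A minimal sketch, assuming the standard OpenStack CLI is available; the flavor name `ci-highio` is hypothetical, and the values are the WMCS defaults listed above:

```shell
# Sketch: attach disk IO quotas to a flavor as extra specs.
# Flavor name "ci-highio" is a placeholder; values are the defaults above.
openstack flavor set ci-highio \
  --property quota:disk_read_iops_sec=5000 \
  --property quota:disk_write_iops_sec=500 \
  --property quota:disk_total_bytes_sec=200000000
```

Instances have to be resized (or rebuilt) to the new flavor for the changed quotas to take effect.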
@Andrew created a new flavor for us with four times those limits (I don't have access to the exact values).
We still see disk slowness, noticeable whenever doing heavy write operations. From a conversation with @aborrero and @dcaro this morning, it seems the limits are easy to raise and there is room to do so, hence this task.
I don't know which limits would be appropriate. Last time, I created a Grafana dashboard showing IO latency and operations per second, which can help track progress: https://grafana-labs.wikimedia.org/d/Yj81kH2Gk/cloud-project-io-metrics
We can probably migrate half of the instances and compare how things improve.
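For the comparison, the throttle values actually applied to a given instance can be inspected on the hypervisor with `virsh blkdeviotune`; run with no limit arguments, it just prints the current settings. The domain and device names below are placeholders:

```shell
# Query the effective libvirt IO throttling for one disk of a guest.
# Domain "i-000123" and device "vda" are placeholders for a real instance.
virsh blkdeviotune i-000123 vda
```

This makes it easy to confirm which flavor limits an instance ended up with before and after the migration.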
From T266777#6598396 there is another Qemu parameter to allow burst limits:
iops_max=bm,iops_rd_max=rm,iops_wr_max=wm
Specify bursts in requests per second, either for all request types or for reads or writes only. Bursts allow the guest I/O to spike above the limit temporarily.
That one is available in OpenStack Rocky / Nova 18.0.0+ and is exposed as disk_write_iops_sec_max:
- commit https://review.opendev.org/#/c/558530/
- Mentioned (albeit not fully documented) at https://docs.openstack.org/cinder/ussuri/admin/blockstorage-basic-volume-qos.html
Maybe worth investigating on top of raising the existing limits.
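If we go that route, the burst settings would presumably be set alongside the base quotas as flavor extra specs. A sketch under that assumption; the flavor name is hypothetical and the numbers are purely illustrative, not a recommendation:

```shell
# Sketch: sustained write IOPS limit plus a temporary burst ceiling
# (Nova 18.0.0+ / Rocky). Flavor name and values are placeholders.
openstack flavor set ci-highio \
  --property quota:disk_write_iops_sec=2000 \
  --property quota:disk_write_iops_sec_max=4000
```

The idea is that the guest can spike to the `_max` value for short periods while the long-term average stays at the base limit, which matches the bursty write pattern of container teardown described above.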