
Provide separate/larger volume for /var/lib/docker on GitLab runners
Closed, Resolved · Public · 2 Estimated Story Points

Description

Provide more capacity on Docker-based GitLab runners for built images, container filesystems, and temporary volumes by using a separate cinder volume mounted at /var/lib/docker. See T291221: runner-1002 is out of space for recent out-of-space issues.

A quota increase for volume storage and volumes is necessary to unblock this task. See T293832: Request increased quota for gitlab-runners Cloud VPS project.
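For illustration, the end state on each runner is a dedicated block device formatted and mounted at /var/lib/docker. A minimal manual sketch of the equivalent steps (the device name /dev/sdb and the ext4 filesystem are assumptions; in practice Puppet manages this via cinderutils::ensure):

  systemctl stop docker
  mkfs.ext4 /dev/sdb                                              # format the attached cinder volume (assumed to appear as /dev/sdb)
  echo '/dev/sdb /var/lib/docker ext4 defaults 0 2' >> /etc/fstab
  mount /var/lib/docker
  systemctl start docker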

Event Timeline

Change 732392 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/puppet@production] gitlab: Refactor docker volume parameters to use cinder

https://gerrit.wikimedia.org/r/732392

Change 732392 merged by Dzahn:

[operations/puppet@production] gitlab: Refactor docker volume parameters to use cinder

https://gerrit.wikimedia.org/r/732392

Puppet patches are tested and merged. runner-1002.gitlab-runners.eqiad1.wikimedia.cloud now has a 60G cinder volume mounted at /var/lib/docker. Waiting on the quota increase before provisioning/attaching volumes and reconfiguring the remaining runners.
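A quick way to confirm the mount on a runner (commands only; output omitted):

  df -h /var/lib/docker    # should report a roughly 60G filesystem on the cinder-backed device
  lsblk                    # lists attached block devices and their mount points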

dduvall changed the task status from Stalled to In Progress. Oct 21 2021, 6:05 PM

With T293832: Request increased quota for gitlab-runners Cloud VPS project closed, I'll perform a rolling deletion and re-creation of runner instances. I realized this is necessary because the quota estimate was based on a different flavor than the one used by the current instances.

Mentioned in SAL (#wikimedia-releng) [2021-10-21T18:07:03Z] <dduvall> replacing runner-1001 with new instance (T293835)

dduvall changed the task status from In Progress to Stalled. Oct 21 2021, 10:35 PM

Waiting on the new flavor.

dduvall changed the task status from Stalled to In Progress. Oct 25 2021, 4:31 PM

Moving forward with runner re-provisioning.

Mentioned in SAL (#wikimedia-releng) [2021-10-25T23:27:23Z] <dduvall> fully provisioned runner-{1008,1011,1012,1013,1014,1015,1016,1017,1018,1019} instances for use as new gitlab runners and removed old instances (T293835)

We now have 10 g3.cores8.ram24.disk20.ephemeral40.4xiops instances, each running an executor with a max concurrency of 4, for a total of 40 available concurrent jobs.
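Per-runner concurrency is governed by the global concurrent setting in the GitLab Runner configuration; a hedged check, assuming the standard config path:

  sudo grep -E '^concurrent' /etc/gitlab-runner/config.toml    # expected: concurrent = 4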

Note that the flavor made available to us comes with a high-IOPS 40G ephemeral volume, which the cinderutils::ensure resource chose during provisioning (it resolves volumes by a size range and favored the first volume found). This was a mistake and yields less space than the 60G cinder volume I was planning to use for each instance.

However, the 40G volumes are much faster in terms of both IOPS and throughput. I ran a quick benchmark using fio to verify this:
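The exact invocation isn't recorded here; a command consistent with the output below (4 KiB random I/O, a 75/25 read/write mix, and an I/O depth of 64 over a 4G test file; the file path and libaio engine are assumptions) would look roughly like:

  fio --name=test --filename=/var/lib/docker/fio-test --size=4G --bs=4k --rw=randrw --rwmixread=75 --ioengine=libaio --iodepth=64 --direct=1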

60G standard IOPS volume

test: (groupid=0, jobs=1): err= 0: pid=13532: Mon Oct 25 23:21:09 2021
  read: IOPS=1493, BW=5974KiB/s (6117kB/s)(3070MiB/526262msec)
   bw (  KiB/s): min= 3808, max= 8136, per=100.00%, avg=5972.81, stdev=594.98, samples=1052
   iops        : min=  952, max= 2034, avg=1493.15, stdev=148.74, samples=1052
  write: IOPS=499, BW=1996KiB/s (2044kB/s)(1026MiB/526262msec); 0 zone resets
   bw (  KiB/s): min= 1496, max= 2488, per=99.99%, avg=1995.72, stdev=113.54, samples=1052
   iops        : min=  374, max=  622, avg=498.91, stdev=28.37, samples=1052
  cpu          : usr=2.66%, sys=7.83%, ctx=856527, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=5974KiB/s (6117kB/s), 5974KiB/s-5974KiB/s (6117kB/s-6117kB/s), io=3070MiB (3219MB), run=526262-526262msec
  WRITE: bw=1996KiB/s (2044kB/s), 1996KiB/s-1996KiB/s (2044kB/s-2044kB/s), io=1026MiB (1076MB), run=526262-526262msec

Disk stats (read/write):
  sdc: ios=785920/263175, merge=0/140, ticks=1179322/32106947, in_queue=33279168, util=100.00%

40G 4xIOPS volume

test: (groupid=0, jobs=1): err= 0: pid=13617: Mon Oct 25 23:24:46 2021
  read: IOPS=5893, BW=23.0MiB/s (24.1MB/s)(3070MiB/133363msec)
   bw (  KiB/s): min=  600, max=31448, per=100.00%, avg=23573.22, stdev=2637.22, samples=266
   iops        : min=  150, max= 7862, avg=5893.29, stdev=659.30, samples=266
  write: IOPS=1969, BW=7878KiB/s (8067kB/s)(1026MiB/133363msec); 0 zone resets
   bw (  KiB/s): min=  128, max= 9608, per=100.00%, avg=7877.08, stdev=815.33, samples=266
   iops        : min=   32, max= 2402, avg=1969.27, stdev=203.83, samples=266
  cpu          : usr=7.07%, sys=19.90%, ctx=693863, majf=0, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=23.0MiB/s (24.1MB/s), 23.0MiB/s-23.0MiB/s (24.1MB/s-24.1MB/s), io=3070MiB (3219MB), run=133363-133363msec
  WRITE: bw=7878KiB/s (8067kB/s), 7878KiB/s-7878KiB/s (8067kB/s-8067kB/s), io=1026MiB (1076MB), run=133363-133363msec

Disk stats (read/write):
  sdb: ios=785530/262538, merge=0/58, ticks=1160955/7313880, in_queue=8465012, util=100.00%

Based on the large performance difference, I think we should stick with the 40G volumes. If we need additional space or performance, we could conceivably split /var/lib/docker further, using the 60G volume for named Docker volumes and the 40G volume for container and image filesystems. Just a thought.
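A hypothetical layout for that split (device names are assumptions; Docker keeps named volumes under /var/lib/docker/volumes by default):

  # hypothetical only, not what is currently deployed
  mount /dev/sdb /var/lib/docker            # 40G 4xIOPS volume: image and container filesystems
  mount /dev/sdc /var/lib/docker/volumes    # 60G standard-IOPS volume: named Docker volumes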

We now have additional space, better theoretical performance, and options for further expansion. Closing this out.