Add contint-roots to releases{1,2}001
Closed, Resolved (Public)

Description

Folks in contint-roots (@thcipriani and @hashar) have been upgrading jenkins on releases{1,2}001 for security releases. Ideally we would have root on those boxes to handle updates there without needing to synchronize with someone who has root on those boxes (currently limited only to SRE).

I request that the contint-roots admin group be added to those boxes, granting myself and @hashar the same permissions we have on the contint{1,2}001 machines.

Event Timeline

Note that we already have:

%releasers-mediawiki ALL = (jenkins) NOPASSWD: ALL
%releasers-mediawiki ALL = NOPASSWD: /usr/sbin/service jenkins *

These rules let members of releasers-mediawiki run any command as jenkins. I understand you need more than the jenkins user, though?

For upgrades of jenkins itself we'd need to use apt as root.

Since this request expands root scope to other boxes, I believe it'll need to be brought up at the next SRE meeting on Monday.

fgiunchedi triaged this task as Medium priority. Aug 8 2018, 7:44 AM

This request has been neither approved nor denied but set to "needs more discussion" in SRE meeting.

Dzahn changed the task status from Open to Stalled. Aug 13 2018, 9:17 PM

We have had some minor discussion on IRC about it. I mentioned that the point was brought up that even with full root access the upgrade process still requires both teams working together on a jenkins upgrade due to the needed package upload to apt.wikimedia.org.

In today's meeting it has been suggested by Moritz that we can give sudo privileges specific to using apt, as opposed to full root.

Would that work if we continue with more specific sudo privilege lines for using apt?
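For illustration only, an apt-specific rule in the style of the existing releasers-mediawiki lines might look roughly like the sketch below. The exact commands and arguments are assumptions; nothing in this task specifies what such a rule would contain, and no rule like this was proposed verbatim.

```
# Hypothetical sketch: limit contint-roots to the apt-get calls needed for a Jenkins upgrade.
# The command list is an assumption, not taken from operations/puppet.
%contint-roots ALL = NOPASSWD: /usr/bin/apt-get update, /usr/bin/apt-get install jenkins
```

Because the arguments are spelled out, sudo would permit only those exact invocations, so anything else (reading syslogs, running strace) would still need someone with full root.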

I'd probably still want someone available with full root on the machine to help troubleshoot should anything go wrong.

I filed this task so that we can maintain this install in the same way that we maintain the CI Jenkins install; i.e., once Jenkins is updated in our apt repository, we can find a non-disruptive time during the day to restart Jenkins without having to coordinate ahead of time. If the final result is that, due to the nature of these machines, we need to pair with someone with full root to maintain the Jenkins install on them, that would be an acceptable result of this task.

If it's not just about installing the package and restarting but also troubleshooting, it sounds like you'll want an SRE person around anyways (just as we want SRE folks around for deploys generally). @thcipriani It seems from your last comment that this is ok?

Having SRE folks around generally is always good for any kind of modification for production and we would definitely never perform any kind of production work on the weekend or when no one was around; however, my comment above was meant a little differently.

The distinction I meant is between having an SRE available during the time window for necessary upgrades and having them issue specific commands at specific times by proxy.

Having root on the releases machines would allow contint-roots to perform the same actions that are already being performed on the contint{1,2}001 machines without having to execute those commands by proxy. Having root is especially helpful for troubleshooting a failed software deployment, since root-level access may be required to determine the exact nature of the failure (e.g., syslogs, strace utilities, etc.). Performing this troubleshooting by proxy is especially difficult and, I think, prone to leaking sensitive information.

My comment was meant to acquiesce to the reality that there may be different concerns about the nature of these machines that mean this task can't be fulfilled, which is fine.

If there are security concerns about adding contint-roots to the releases machines, it might be a better option for SRE to handle the maintenance of Jenkins on those machines. For the general case, as is mentioned on this task, the upgrades are fairly uneventful.

I'd like to avoid the case where contint-roots are nominally responsible for security upgrades for Jenkins on these machines (currently we've been coordinating these updates alongside the CI-instance upgrades) without actually being able to perform the upgrade.

I've talked about this a little with moritzm and we've decided to go back to the SRE meeting with it, since the proposed solution of sudo access to apt-get is deemed insufficient. Sorry for the delay! We'll sort it out though.

This is waiting for the next SRE meeting for discussion.

RobH added subscribers: mark, RobH.

This was not resolved in this week's SRE meeting and will instead be reviewed directly by @mark.

Although we didn't manage to discuss this in our SRE meeting yesterday, I discussed it with the relevant people afterwards.

There are no concerns about adding the contint-roots group to releases* (for local root access); however (as discussed above) for Jenkins upgrades a necessary step is to add the new .deb to the apt.wikimedia.org repo which we are not able to grant access to, making local root access for this somewhat moot.

Nonetheless, we can grant that access for the reasons @thcipriani stated above. Please continue to coordinate with an SRE on any package upgrades as before; @MoritzMuehlenhoff will also document on Wikitech the secure download and upgrade process he has followed before.

Approved. @RobH: please implement ASAP, thanks!

Change 461148 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] adding contint-roots to releases servers with sudo rights

https://gerrit.wikimedia.org/r/461148

Change 461148 merged by RobH:
[operations/puppet@production] adding contint-roots to releases servers with sudo rights

https://gerrit.wikimedia.org/r/461148

RobH removed RobH as the assignee of this task.

Ok, with @mark's approval I've gone ahead and merged a patchset live that will grant the requested access to releases servers. It can take up to 30 minutes for the affected systems to call in.

If there are any issues, please re-open this task or feel free to ping me directly on IRC (robh).

That works, thank you.

For Jenkins, Release-Engineering-Team and Moritz receive the security email notification. Release Engineering manually files a task on receipt, and we already synchronize with Moritz for the upload to apt.wikimedia.org. That seems to be working smoothly.

The package download is handled by reprepro and is documented on Wikitech: https://wikitech.wikimedia.org/wiki/Jenkins. I guess it handles the validation of the package source by using the Jenkins apt GPG key.

For Jenkins, Release-Engineering-Team and Moritz receive the security email notification. Release Engineering manually files a task on receipt, and we already synchronize with Moritz for the upload to apt.wikimedia.org. That seems to be working smoothly.

Indeed, IMHO that's working really well.

The package download is handled by reprepro and is documented on Wikitech: https://wikitech.wikimedia.org/wiki/Jenkins. I guess it handles the validation of the package source by using the Jenkins apt GPG key.

I wasn't aware of that Wikitech page. The comment about reprepro being broken with the Jenkins upstream repo still applies, but the alternative recommendation for obtaining the package is not secure. I'll fix up the docs in the next few days.