The phabricator server, WMF7426, was given to us temporarily, we would like to make it permanent
Closed, Declined (Public)

Description

In T221389 (setup/install WMF7426 as phab1003.eqiad.wmnet), we requested (and received) a temporary server so that we could migrate to Debian Stretch more comfortably.

We would now like to request that this hardware become permanent, so that we have Phabricator redundancy within eqiad (warm standby).

Permanent allocation of the system phab1003/WMF7426 will require the sign-off of either @faidon or @mark.

Event Timeline

mmodell renamed this task from "The server, WMF7426, was given to us temporarily, we would like to make it permanent" to "The phabricator server, WMF7426, was given to us temporarily, we would like to make it permanent". Sep 13 2019, 6:56 PM
mmodell created this task.
Dzahn awarded a token.

Makes sense to me to have a failover within eqiad for such an important service.

herron triaged this task as Medium priority. Sep 18 2019, 7:09 PM
herron awarded a token.
RobH moved this task from Backlog to Pending Approval on the hardware-requests board.
RobH added subscribers: mark, faidon, RobH.

Permanent allocation of the system phab1003/WMF7426 will require the sign-off of either @faidon or @mark.

@mark please approve per OKR "Improve resilience and modernize devtools services" -> "Phabricator can be failed over in common failure scenarios"

I'm a bit confused; as far as I know the old plan was always to have HA of Phabricator between eqiad and codfw, and the linked task T190572 also talks about that. So is that no longer the case, and if so, why is that? I believe there have been blockers & complications for that deployment, but are they documented anywhere? How does this task relate to those plans, and why do we feel failover within eqiad is (also) needed?

Also, to satisfy the OKR KR "Phabricator can be failed over in common failure scenarios" we should probably list which failure scenarios we consider common and how these plans help with that.

After further discussion with Mukunda and Mark, we are focusing on eqiad/codfw failover and are withdrawing the request to permanently keep the eqiad server.

Databases used by the codfw server have been switched to codfw, to avoid cross-DC connections. The codfw db server is read-only. This prevents the phd service from starting.

But we can still fail over and make the server writable if needed.

We will give this server back to the pool in T238957.

There are further comments on failure scenarios at T190572#5668444.