
Move servers off asw2-a-eqiad
Closed, Resolved · Public

Description

In order to troubleshoot and experiment with asw2-a-eqiad for T201145, we need to move servers off of that stack.
This represents 17 servers to move, of which 7 will need to do one of the following:

  • Be moved to a different rack
  • Have a cross rack uplink
  • Be connected to an extra 10G switch in the same rack

My guess is that a cross-rack uplink is the easiest temporary solution.

If some servers are not yet in production (and can wait before going into production), they could be left on asw2-a to serve as canaries.

| Server | asw2 port | Move to | Special care | Notes |
| --- | --- | --- | --- | --- |
| dns1001 | ge-1/0/8 | asw-a:ge-1/0/8 | Yes | |
| lvs1015:enp5s0f0 | xe-2/0/0 | asw2-a5(ex4500):xe-0/0/2 | No | Will utilize current cross connect; downtime may be up to 1 min |
| cloudelastic1001 | xe-2/0/7 | asw2-a5(ex4500):xe-0/0/3 | No | 10M Fiber #3935 |
| lvs1016:enp4s0f1 {#3917} | xe-4/0/7 | asw2-a5(ex4500):xe-0/0/4 | Yes | Will utilize current fiber; downtime may be up to 1 min |
| cp1075 | xe-4/0/11 | asw-a:xe-4/1/2 | Yes | Will utilize current cable |
| cp1076 | xe-4/0/13 | asw2-a5(ex4500):xe-0/0/5 | Yes | 5M DAC #4888 |
| dbproxy1012 | ge-5/0/7 | asw-a:ge-5/0/7 | No | |
| labstore1008 | ge-5/0/14 | asw-a:ge-5/0/14 | No | |
| db1116 | ge-6/0/11 | asw-a:ge-6/0/11 | No | |
| db1066 | ge-6/0/16 | asw-a:ge-6/0/16 | Yes | Downtime must be kept to a minimum |
| labstore1009 | ge-6/0/17 | asw-a:ge-6/0/17 | No | |
| dbproxy1013 | ge-6/0/32 | asw-a:ge-6/0/32 | No | |
| ms-be1040 | xe-7/0/29 | asw2-a5(ex4500):xe-0/0/6 | No | 5M DAC #4889 |
| cp1077 | xe-7/0/30 | asw2-a5(ex4500):xe-0/0/8 | Yes | 5M DAC #4990 (updated port number) |
| cp1078 | xe-7/0/31 | asw2-a5(ex4500):xe-0/0/15 | Yes | 5M DAC #4991 |
| db1118 | ge-8/0/2 | asw-a:ge-8/0/2 | No | |
| torrelay1001 | ge-8/0/3 | asw-a:ge-8/0/3 | No | |
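As a hypothetical sanity check (not part of the original workflow), the move table above can be transcribed into a list and checked before cabling: how many servers there are, how many terminate on asw2-a5, and whether any destination port was assigned twice. The `MOVES` layout and the `check_plan` / `switch_of` helpers are illustrative assumptions, and the `(ex4500)` model annotation is dropped from the switch names.

```python
from collections import Counter

# (server, current asw2-a port, destination "switch:port"), transcribed from the
# move table above. The "(ex4500)" model annotation is dropped for brevity.
MOVES = [
    ("dns1001", "ge-1/0/8", "asw-a:ge-1/0/8"),
    ("lvs1015:enp5s0f0", "xe-2/0/0", "asw2-a5:xe-0/0/2"),
    ("cloudelastic1001", "xe-2/0/7", "asw2-a5:xe-0/0/3"),
    ("lvs1016:enp4s0f1", "xe-4/0/7", "asw2-a5:xe-0/0/4"),
    ("cp1075", "xe-4/0/11", "asw-a:xe-4/1/2"),
    ("cp1076", "xe-4/0/13", "asw2-a5:xe-0/0/5"),
    ("dbproxy1012", "ge-5/0/7", "asw-a:ge-5/0/7"),
    ("labstore1008", "ge-5/0/14", "asw-a:ge-5/0/14"),
    ("db1116", "ge-6/0/11", "asw-a:ge-6/0/11"),
    ("db1066", "ge-6/0/16", "asw-a:ge-6/0/16"),
    ("labstore1009", "ge-6/0/17", "asw-a:ge-6/0/17"),
    ("dbproxy1013", "ge-6/0/32", "asw-a:ge-6/0/32"),
    ("ms-be1040", "xe-7/0/29", "asw2-a5:xe-0/0/6"),
    ("cp1077", "xe-7/0/30", "asw2-a5:xe-0/0/8"),
    ("cp1078", "xe-7/0/31", "asw2-a5:xe-0/0/15"),
    ("db1118", "ge-8/0/2", "asw-a:ge-8/0/2"),
    ("torrelay1001", "ge-8/0/3", "asw-a:ge-8/0/3"),
]


def switch_of(destination):
    """Return the switch name from a 'switch:port' destination string."""
    return destination.split(":", 1)[0]


def check_plan(moves, new_switch="asw2-a5"):
    """Return (total moves, moves onto new_switch, destinations used more than once)."""
    counts = Counter(dest for _, _, dest in moves)
    duplicates = sorted(dest for dest, n in counts.items() if n > 1)
    onto_new = sum(1 for _, _, dest in moves if switch_of(dest) == new_switch)
    return len(moves), onto_new, duplicates


if __name__ == "__main__":
    total, onto_new, duplicates = check_plan(MOVES)
    print(f"{total} moves, {onto_new} onto asw2-a5, duplicate ports: {duplicates or 'none'}")
```

On the table as transcribed this reports 17 moves, 7 of them onto asw2-a5, and no duplicate destination ports; a non-empty duplicate list would flag a port that needs reassigning before anything is cabled.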

Event Timeline

ayounsi triaged this task as High priority (Aug 10 2018, 4:25 PM).
ayounsi created this task.
Restricted Application added a subscriber: Aklapper.

db1066 is the active s2 master in eqiad, so any downtime there means the s2 wikis go read-only: https://noc.wikimedia.org/db.php#tabs-2

Re: ms-be1040, it can be moved back to the old switch at any time.

@Cmjohnson Could you pre-cable the hosts that will terminate on asw2-a5 (ex4500)?
Don't unplug anything, just have the fibers ready.

Change 452620 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Set s2 in read only mode due to maintenance

https://gerrit.wikimedia.org/r/452620

Change 452632 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Set s2 as read-write and promote db1122 as the new s2 master

https://gerrit.wikimedia.org/r/452632

Change 452642 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] mariadb: Point s2-master CNAME to db1122

https://gerrit.wikimedia.org/r/452642

Change 452644 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1102 for maintenance

https://gerrit.wikimedia.org/r/452644

Change 452644 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1102 for maintenance

https://gerrit.wikimedia.org/r/452644

Change 452648 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switch db1122 binlog format to ROW

https://gerrit.wikimedia.org/r/452648

Change 452648 merged by Jcrespo:
[operations/puppet@production] mariadb: Switch db1122 binlog format to ROW

https://gerrit.wikimedia.org/r/452648

Change 452649 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switch db1122 binlog format to STATEMENT

https://gerrit.wikimedia.org/r/452649

Change 452649 merged by Jcrespo:
[operations/puppet@production] mariadb: Switch db1122 binlog format to STATEMENT

https://gerrit.wikimedia.org/r/452649

@ayounsi I added SFP-Ts to asw2-a5-eqiad for the new server in that rack. For the remaining 10G servers in racks 2/4/6, do you want me to run cross connects to asw2-a5? If so, please give me the ports you would like to use.

@ayounsi I pre-cabled everything. The LVS cross connects only need to be moved over to the new switch. We should probably do those one at a time, because downtime may be close to 1 min for each. Also, cp1077 has an updated port number since we're using ge-0/0/7 for dbproxy1012.

| Server | asw2 port | Move to | Special care | Notes |
| --- | --- | --- | --- | --- |
| lvs1015:enp5s0f0 | xe-2/0/0 | asw2-a5(ex4500):xe-0/0/2 | No | Will utilize current cross connect; downtime may be up to 1 min |
| cloudelastic1001 | xe-2/0/7 | asw2-a5(ex4500):xe-0/0/3 | No | 10M Fiber #3935 |
| lvs1016:enp4s0f1 {#3917} | xe-4/0/8 | asw2-a5(ex4500):xe-0/0/4 | Yes | Will utilize current fiber; downtime may be up to 1 min |
| cp1075 | xe-4/0/11 | asw-a:xe-4/1/2 | Yes | Will utilize current cable |
| cp1076 | xe-4/0/13 | asw2-a5(ex4500):xe-0/0/5 | Yes | 5M DAC #4888 |
| ms-be1040 | xe-7/0/29 | asw2-a5(ex4500):xe-0/0/6 | No | 5M DAC #4889 |
| cp1077 | xe-7/0/30 | asw2-a5(ex4500):xe-0/0/8 | Yes | 5M DAC #4990 (updated port number) |
| cp1078 | xe-7/0/31 | asw2-a5(ex4500):xe-0/0/15 | Yes | 5M DAC #4991 |

Mentioned in SAL (#wikimedia-operations) [2018-08-14T17:22:52Z] <XioNoX> configuring eqiad A switch ports for T201694

Task description updated with Chris's info so we have everything in one place.
Switch ports configured accordingly.

Mentioned in SAL (#wikimedia-operations) [2018-08-16T14:25:29Z] <XioNoX> starting moving asw2-a-eqiad servers' uplinks for T201694

Mentioned in SAL (#wikimedia-operations) [2018-08-16T14:27:04Z] <cmjohnson1> lvs1015 moving cross connect from asw2-a2 to asw2-a5 T201694

Change 452620 abandoned by Jcrespo:
mariadb: Set s2 in read only mode due to maintenance

Reason:
Not needed.

https://gerrit.wikimedia.org/r/452620

Change 452632 abandoned by Jcrespo:
mariadb: Set s2 as read-write and promote db1122 as the new s2 master

Reason:
Not needed.

https://gerrit.wikimedia.org/r/452632

Change 452642 abandoned by Jcrespo:
mariadb: Point s2-master CNAME to db1122

Reason:
Not needed.

https://gerrit.wikimedia.org/r/452642

Mentioned in SAL (#wikimedia-operations) [2018-08-16T15:09:12Z] <bblack> stopping pybal on lvs1016 to fail traffic to lvs1006 for T201694

Mentioned in SAL (#wikimedia-operations) [2018-08-16T15:16:23Z] <bblack> restarting pybal on lvs1016 - T201694

ayounsi assigned this task to Cmjohnson.