Q1:(Need By: ASAP) rack/setup/install ms-be10[64-67]
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of ms-be10[64-67]

Please note these were originally needed by the end of July, but a chipset shortage for the network cards has resulted in a 60+ day lead time. As soon as they arrive, they should be racked with priority, as they will be pushed into service by @fgiunchedi once they are online.

Hostname / Racking / Installation Details

Hostnames: ms-be10[64-67]
Racking Proposal: One host per row
Networking/Subnet/VLAN/IP: 10G private VLAN
Partitioning/Raid: Same as existing ms-be
OS Distro: Stretch
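
The bracket notation in the hostname field is shorthand for a numeric range. As a minimal sketch, assuming the pattern always has the form prefix[start-end] with an optional suffix, it could be expanded into individual hostnames like this. The function name expand_host_range is hypothetical and is not part of any tooling mentioned in this task.

```python
import re
from typing import List


def expand_host_range(pattern: str) -> List[str]:
    """Expand 'prefix[start-end]suffix' into individual hostnames.

    Hypothetical helper for illustration only; assumes a single
    numeric range in square brackets.
    """
    match = re.fullmatch(r"(.*?)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No range found: treat the input as a single hostname.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep zero padding if the range uses it
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


if __name__ == "__main__":
    for host in expand_host_range("ms-be10[64-67]"):
        print(host)
```

Run against ms-be10[64-67], this lists the four hosts that have checklists below.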

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

ms-be1064:

  • - receive in system on procurement task T284953 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

ms-be1065:

  • - receive in system on procurement task T284953 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

ms-be1066:

  • - receive in system on procurement task T284953 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

ms-be1067:

  • - receive in system on procurement task T284953 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
RobH mentioned this in Unknown Object (Task).
RobH added a parent task: Unknown Object (Task).

@wiki_willy looking for racking space for these big and very heavy servers.

A4 U17/U18
B4 U2/U3 (U3 has maps1002, which looks to be off and ready for decom, but I do not have a task for it)
C4 U13/U14

@godog for row D I do not have space, but you have some old HP ms-be servers there. Is it possible to decom one of these: ms-be1037, 1038 or 1039?

Just a quick summary of what Chris and I went over:

  • the decom of maps1002 has been taken care of via T289271 to free up rack space in B4
  • we're asking Service-Ops if they can prioritize the decom of mc1033 and 1034 via T289657 to free up space in rack D4

Thanks,
Willy

ms-be1064 A4 U17, cable #11035, port #25
ms-be1065 B4 U2, cable #11036, port #29
ms-be1066 C2 U19, cable #11037, port #22

RobH renamed this task from (Need By: ASAP) rack/setup/install ms-be10[64-67] to Q1:(Need By: ASAP) rack/setup/install ms-be10[64-67]. Aug 26 2021, 7:46 PM

Change 715981 had a related patch set uploaded (by Cmjohnson; author: Cmjohnson):

[operations/puppet@production] Adding dhcpd updates for ms-be1064-1066

https://gerrit.wikimedia.org/r/715981

Change 715981 merged by Cmjohnson:

[operations/puppet@production] Adding dhcpd updates for ms-be1064-1066

https://gerrit.wikimedia.org/r/715981

Change 715983 had a related patch set uploaded (by Cmjohnson; author: Cmjohnson):

[operations/puppet@production] Adding ms-be1064-66 to site.pp insetup role

https://gerrit.wikimedia.org/r/715983

Change 715983 merged by Cmjohnson:

[operations/puppet@production] Adding ms-be1064-66 to site.pp insetup role

https://gerrit.wikimedia.org/r/715983

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

ms-be1064.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202109011541_cmjohnson_14089_ms-be1064_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

ms-be1065.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202109011542_cmjohnson_14910_ms-be1065_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

ms-be1066.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202109011544_cmjohnson_16416_ms-be1066_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['ms-be1064.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['ms-be1065.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['ms-be1066.eqiad.wmnet']

and were ALL successful.

@fgiunchedi ms-be1064/65/66 are installed and ready for you to take over. 1067 is not racked yet and will not be until we can find space in row D. We haven't had a response from traffic about expediting the decommissioning of the 2 mc servers. If you have an ms-be server in rack D2, D4 or D7 that can be decom'd and removed, please let me know.

Thank you! Hosts are looking good, I see the mc servers are being decom'd now in T289657 and I'll be waiting for the fourth host to be online as well.

@Jclark-ctr I removed the 2 servers in D4. Can you please rack ms-be1067?

ms-be1067 D4 U33, cable #11042, port #36

Finished the "Provision a server's network attributes" script in netbox and configured the BIOS. Handing over to Rob to hopefully finish the rest.

Change 719384 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] ms-be1067 updates

https://gerrit.wikimedia.org/r/719384

Change 719384 merged by RobH:

[operations/puppet@production] ms-be1067 updates

https://gerrit.wikimedia.org/r/719384

So I'm updating the firmware and I've applied the puppet updates for the installer. However, the PXE flag needs to be shifted from the 1G to the 10G port, which I've intentionally not done yet and won't do until all firmware updates are done (I don't want it trying to PXE boot until we're ready).

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

ms-be1067.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202109081436_robh_3494_ms-be1067_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['ms-be1067.eqiad.wmnet']

and were ALL successful.

RobH updated the task description.

all hosts installed and staged