
scs-c1-eqiad unresponsive
Closed, Resolved (Public)

Description

I've been working on T174475 and flashed the firmware on scs-c1-eqiad on 2017-08-29. Since the firmware update, I have not been able to connect to the scs console.

Please do not just powercycle this, as it could result in a dead scs. We need to check for serial console output, in case there was an issue with the firmware update and it requires user input.

These units seem to take 5-6 minutes to flash firmware, but this one has stayed offline and is still unresponsive to my attempts to reach it via ping, ssh, or https. (Other scs consoles, like scs-a8-eqiad, respond to all three.) Firmware has been updated on most of the remainder of our SCS console fleet without incident.
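A minimal sketch of the ssh/https part of that reachability check, for comparing the dead unit against a known-good one. It only tests whether TCP ports 22 and 443 accept a connection; it does not send ICMP pings or log in. The function names `tcp_reachable` and `check_console_server` are hypothetical and chosen for illustration.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_console_server(host, ports=(22, 443), timeout=3.0):
    """Map each port (22 = ssh, 443 = https by default) to its reachability."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}
```

Calling `check_console_server("scs-c1-eqiad.mgmt.eqiad.wmnet")` and then the same for scs-a8-eqiad (assuming its management hostname follows the same pattern) would show whether the two units differ on either port.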

The CM4148 console server does have its own serial console redirection port on the front. Please attach a serial cable (using a USB-to-serial adapter to connect to your laptop) so we can troubleshoot whether it needs input on that front port. (You could also run it to a spare port on scs-a8-eqiad with a temporary cat5 run, but using a laptop seems easier. You can test the laptop on scs-a8-eqiad, which is responsive, so you'll know you have a working USB-serial connection if needed.)

Basically, troubleshoot with local means (serial redirection) and see if we can repair this scs console. This was purchased around 2014-05-07, so it is well out of any kind of warranty. If we cannot repair it, we will have to order a replacement.

Event Timeline

scs-c1-eqiad is dead! No power, swapped power cable, tried different power port. We need to replace this ASAP.

RobH added a subscriber: Cmjohnson.

After chatting with sales to determine the warranty period, it turns out it's 4 years for Opengear without any additional warranties. I've opened a support case via email with Opengear to get this replaced.
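As a quick check of the dates, here is a small sketch of the warranty arithmetic. It assumes the 4-year period runs from the approximate purchase date given in the description (2014-05-07) and uses the firmware flash date (2017-08-29) as the failure date; the helper name `warranty_end` is hypothetical.

```python
from datetime import date


def warranty_end(purchase, years):
    """Return the date `years` after purchase (assumed warranty start)."""
    try:
        return purchase.replace(year=purchase.year + years)
    except ValueError:
        # A Feb 29 purchase landing in a non-leap year: fall back to Feb 28.
        return purchase.replace(year=purchase.year + years, day=28)


purchased = date(2014, 5, 7)   # approximate purchase date from the description
failed = date(2017, 8, 29)     # date the firmware was flashed
expires = warranty_end(purchased, 4)
in_warranty = failed <= expires
print(expires.isoformat(), in_warranty)
```

Under those assumptions the unit failed several months before the 4-year mark, which is consistent with pursuing a replacement through support.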

I didn't have the serial number to provide to Sales during the chat via their website, so I had to submit a support request via their site instead. I got no email confirmation of yesterday's request.

Today I called and left a voicemail for their support to call me back. We haven't had issues with Opengear in the past, and thus haven't had to request support. How difficult it's been so far doesn't bode well.

Rey Lamuri, Sep 14, 17:13 MDT:
Hi Rob,

The recovery process for your CM4148 is here https://opengear.zendesk.com/hc/en-us/articles/216376223-Firmware-recovery

If you wanted to monitor the recovery process via the Local Console this is how to connect https://opengear.zendesk.com/hc/en-us/articles/216372143-Connecting-the-Opengear-device-s-local-serial-console-port

Regards,
Rey Lamuri
Opengear Technical Support
Brisbane, Australia (UTC +10)
T: +61 1800 838 196

@Cmjohnson: These require onsite use of the reset button, so escalating back to you for work on this. It sounds like we can reset and roll back the failed firmware load.

@RobH there is no reset button, just an erase button, which I did try. There is zero power to the switch, so it did not work.

I've sent a followup to opengear requesting a replacement scs console.

RobH mentioned this in Unknown Object (Task). Sep 25 2017, 5:02 PM
RobH added a subtask: Unknown Object (Task).

I've replaced the console and set it up so it's accessible. I set up all the ports, but I am not able to access them via pmshell. @RobH, could you look into this and see if I missed something, please?

I'm not exactly sure what you mean by unable to access ports, so I'll just list off the issue I'm seeing and what I've confirmed between the console servers.

  • Comparing the setup of scs-a8-eqiad (existing CM4148) and scs-c1-eqiad (new CM7148-DAC), they appear identical in configuration and firmware revisions. The individual ports (other than their descriptions) are also identical across both platforms.
  • SSH into scs-c1-eqiad.mgmt.eqiad.wmnet works.
  • The pmshell command works and lists all ports.

Connecting to the actual PDU or network device fails to produce any output. I've tried connecting to three different PDUs and three different network devices.

I'd suggest that the pinout may have changed on these serial devices. The next steps for testing that I would take are as follows:

  • powercycle scs-c1-eqiad (yeah, I don't think this will fix it, but support will ask)
  • find a Cisco Opengear adapter (don't rely on our custom-made cables)
  • find a cable/adapter pair that works on scs-a8-eqiad (test it) and borrow it for scs-c1-eqiad. (I'd just take a patch cable out of the bag and pair it with the adapter, test on scs-a8-eqiad, and then move it to scs-c1-eqiad.)
  • determine if the known-good cable+adapter from scs-a8-eqiad works in scs-c1-eqiad.
  • if no output, try the reverse (a row C+D device on scs-a8-eqiad and see if it registers with the cable and adapter)

The next steps after attempting and documenting the above will be to open a support case with Opengear.
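One way to read the cable-swap steps above is as a small decision table. The sketch below is an interpretation, not something stated in the task: the function name `diagnose`, its parameters, and the conclusion strings are all hypothetical, and it assumes the cable+adapter pair has already been verified on scs-a8-eqiad.

```python
def diagnose(good_pair_works_on_c1, row_cd_device_works_on_a8=None):
    """Suggest where the fault likely lies from the swap-test results.

    good_pair_works_on_c1: did the known-good cable+adapter (verified on
        scs-a8-eqiad) produce output on scs-c1-eqiad?
    row_cd_device_works_on_a8: result of the reverse test, or None if it
        has not been run yet.
    """
    if good_pair_works_on_c1:
        # The new unit is fine with a known-good pair, so the existing
        # custom cables (their pinout) are the likely problem.
        return "suspect custom cables/pinout"
    if row_cd_device_works_on_a8 is None:
        return "run the reverse test on scs-a8-eqiad"
    if row_cd_device_works_on_a8:
        # The device and cable are fine elsewhere; the new unit is suspect.
        return "suspect scs-c1-eqiad itself; open a support case"
    return "inconclusive; recheck the device and adapter, then open a support case"
```

For example, `diagnose(True)` corresponds to the outcome where a known-good pair produces output on the new unit, which points at the cabling rather than the console server.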

Tested a standard ethernet cable and it works fine. It appears that the custom pinout for the cable is no longer required and each of the cables will need to be re-done.

RobH closed subtask Unknown Object (Task) as Resolved. Dec 6 2017, 3:43 AM

All serial connections have been fixed to use a standard pinout.