
Tracking task for network syslog messages
Closed, Resolved · Public

Description

Mostly for tracking and history purposes:

  • Nexthop index allocation failed: private index space exhausted

Hosts: cr1/2-eqiad
This happens because there are too many ARP entries on the fxp0 interface. The only fix Juniper suggests is to reduce the size of the fxp0 subnet, which our design doesn't easily allow.

Apart from the syslog spam (which we can mute), the only risk I see is the routers not having an ARP entry for mr1-eqiad, and thus not being reachable through fxp0 when needed.
We can add an Icinga check for that interface, since we don't use it regularly and would not otherwise notice a failure. If there is indeed a connectivity issue, we can statically set mr1's IP/MAC pair on the fxp0 interface, but we will have to remember to update it when we refresh mr1.
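
A minimal sketch of the proposed Icinga check, written as a Nagios-style plugin function (exit codes 0 = OK, 1 = WARNING, 2 = CRITICAL). It assumes the fxp0 ARP entries have already been collected from the router and parsed into an IP-to-MAC dictionary; the function name, addresses and MAC values are hypothetical. The optional `expected_mac` argument covers the static-entry case by flagging a stale IP/MAC pair after mr1 is refreshed.

```python
from typing import Dict, Optional, Tuple

# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_arp_entry(
    arp_table: Dict[str, str],
    target_ip: str,
    expected_mac: Optional[str] = None,
) -> Tuple[int, str]:
    """Check that target_ip has an ARP entry in arp_table (IP -> MAC).

    arp_table is assumed to be already parsed from the router's fxp0 ARP
    output; collecting and parsing it is out of scope for this sketch.
    """
    mac = arp_table.get(target_ip)
    if mac is None:
        return CRITICAL, f"CRITICAL: no ARP entry for {target_ip}"
    if expected_mac is not None and mac.lower() != expected_mac.lower():
        # e.g. a static entry that was not updated after mr1 was replaced
        return WARNING, f"WARNING: {target_ip} is at {mac}, expected {expected_mac}"
    return OK, f"OK: {target_ip} is at {mac}"
```

For example, `check_arp_entry({}, "192.0.2.10")` returns the CRITICAL code, which Icinga would surface before the entry is actually needed.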

  • BGP too many prefixes:

Host: cr2-esams
Fix: BGP peers contacted and sessions cleared, all good now.
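
This message presumably comes from a per-peer maximum-prefix limit. A minimal sketch of that logic, assuming a warning threshold expressed as a percentage of the maximum (the 80% default is arbitrary) and per-peer counts gathered elsewhere; `PeerPrefixState` and the helper names are illustrative only, not a Junos API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeerPrefixState:
    """Hypothetical per-peer snapshot: prefixes received vs configured limit."""
    peer: str
    received: int
    maximum: int


def classify_peer(state: PeerPrefixState, warn_percent: int = 80) -> str:
    """Return 'exceeded' above the limit, 'warning' at or above
    warn_percent of the limit, otherwise 'ok'."""
    if state.received > state.maximum:
        return "exceeded"
    # Integer arithmetic avoids float rounding at the threshold.
    if state.received * 100 >= state.maximum * warn_percent:
        return "warning"
    return "ok"


def peers_needing_attention(
    states: List[PeerPrefixState], warn_percent: int = 80
) -> List[str]:
    """Peers worth contacting before (or after) the limit is hit."""
    return [s.peer for s in states if classify_peer(s, warn_percent) != "ok"]
```

Warning below the hard limit gives time to contact the peer before the session is affected, rather than clearing sessions afterwards.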

  • NTP server unreachable

Host: pfw3-codfw
ayounsi@pfw3-codfw> set date ntp node 0 91.198.174.106
10 Aug 18:22:23 ntpdate[75455]: no server suitable for synchronization found
ayounsi@pfw3-codfw> set date ntp node 0 91.198.174.122
10 Aug 18:22:33 ntpdate[75529]: no server suitable for synchronization found
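Fix: the allow_ntp4 term of the loopback4 filter only accepted sources from the bgp-out prefix list, so replies from the NTP servers were not accepted. Add an ntp-servers prefix list generated from the configured NTP servers and match on it instead: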

[edit policy-options]
    prefix-list frack-codfw4 { ... }
+   prefix-list ntp-servers {
+       apply-path "system ntp server <*>";
+   }
[edit firewall family inet filter loopback4 term allow_ntp4 from source-prefix-list]
-        bgp-out;
+        ntp-servers;
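
The fix swaps the `bgp-out` source list for a prefix list that `apply-path` generates from the `system ntp server` statements. A minimal IPv4-only sketch of the effect, modeling each configured server as a /32 entry and the filter term as a source-prefix match; the function names are hypothetical and this is not how Junos evaluates filters internally.

```python
import ipaddress
from typing import Iterable, List


def expand_ntp_prefix_list(ntp_servers: Iterable[str]) -> List[ipaddress.IPv4Network]:
    """Model apply-path "system ntp server <*>": one /32 per configured server."""
    return [ipaddress.IPv4Network(f"{addr}/32") for addr in ntp_servers]


def source_allowed(src: str, prefix_list: Iterable[ipaddress.IPv4Network]) -> bool:
    """True if src falls inside any prefix of the list (source-prefix-list match)."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in prefix_list)
```

For example, `source_allowed("91.198.174.106", expand_ntp_prefix_list(["91.198.174.106", "91.198.174.122"]))` is `True`, whereas a list that does not cover that address (such as a hypothetical `bgp-out` of `198.51.100.0/24`) gives `False`, which would leave the NTP replies filtered as in the ntpdate output above. Because the list is derived from the NTP configuration, adding or changing a server updates the filter automatically.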

  • Could not load host key: /etc/ssh/ssh_host.......

Fix: regenerate the DSA/ECDSA host keys, see:
https://kb.juniper.net/InfoCenter/index?page=content&id=KB24078
This can't be done for DSA, as we don't allow DSA keys (the generated key gets deleted after an hour or so).
The DSA-specific syslog message can be muted instead.
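
Several messages in this task (the nexthop index one and the DSA host key one) are to be muted rather than fixed. A minimal sketch of such a mute list as regexes applied to syslog lines before they reach alerting; this is not how Junos itself suppresses messages. The DSA pattern assumes the key file name contains `dsa` (the full path is truncated above) and deliberately excludes `ecdsa`, since that key can still be regenerated.

```python
import re
from typing import Iterable, List

# Harmless noise per this task. The DSA pattern is an assumption about the
# key file name; the negative lookbehind keeps "ecdsa" messages visible.
MUTED_PATTERNS: List[re.Pattern] = [
    re.compile(r"Nexthop index allocation failed: private index space exhausted"),
    re.compile(r"Could not load host key: \S*(?<!ec)dsa", re.IGNORECASE),
]


def is_muted(line: str, patterns: Iterable[re.Pattern] = MUTED_PATTERNS) -> bool:
    """True if any mute pattern matches somewhere in the syslog line."""
    return any(p.search(line) for p in patterns)


def unmuted(lines: Iterable[str]) -> List[str]:
    """Keep only the lines that should still be alerted on."""
    return [line for line in lines if not is_muted(line)]
```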

  • Interfaces flapping

This is usually due to a new server being powered on but not yet configured (stuck in a reboot loop).
Fix: disable the interfaces until DHCP is configured for the server.
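
Flapping can be spotted by counting link up/down transitions per interface within a sliding time window. A minimal sketch, assuming transition events have already been extracted from syslog as (timestamp, interface) pairs sorted by time; the 300 s window, the 5-change threshold and the interface names in the test data are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def flapping_interfaces(
    events: Iterable[Tuple[float, str]],
    window_s: float = 300.0,
    max_changes: int = 5,
) -> List[str]:
    """Return interfaces with more than max_changes link state changes
    inside any window_s-second window, in order of first detection.

    events: (timestamp_seconds, interface_name) pairs, one per up/down
    transition, sorted by timestamp.
    """
    recent: Dict[str, deque] = defaultdict(deque)
    flapping: List[str] = []
    for ts, ifname in events:
        q = recent[ifname]
        q.append(ts)
        # Drop transitions that fell out of the window.
        while q and ts - q[0] > window_s:
            q.popleft()
        if len(q) > max_changes and ifname not in flapping:
            flapping.append(ifname)
    return flapping
```

Interfaces returned here would be the candidates to disable until the server's DHCP entry is in place.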

  • Interface-Transmit-Statistics Knob not supported on this hardware. Will have no effect.

Fix: delete the "Interface-Transmit-Statistics" setting.
http://forums.juniper.net/t5/Junos/interface-transmit-statistics-on-MX5-quot-knob-not-supported-on/m-p/234774#M7626

  • %-LICENSE_SHM_SCALE_READ_FAILURE: Failed to read license scale usage and %-LICENSE_SHM_FILE_OPEN_FAILURE: Failed to open the shared memory file /mfs/var/tmp/license_shmem, errno=No such file or directory

Fix (from a root shell): mkdir -p /mfs/var/tmp/license_shmem
http://forums.juniper.net/t5/Ethernet-Switching/EX-3200-License-Errors/td-p/150766

  • Some more SRX-specific messages are listed on T171970#3526886
  • DDOS_PROTOCOL_VIOLATION_SET: Protocol Rejectv6:aggregate is violated

See T174364

  • fpc7 qsfp-7/0/52 Chan# 3: %-: Rx power low warning set

See T174366
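
Optic Rx power is reported in dBm, a logarithmic scale relative to 1 mW (dBm = 10 * log10(P / 1 mW)), and the warning is set when the received level drops below the module's low-warning threshold. A small conversion sketch; real thresholds are read from the optic itself, and the -10 dBm used below is hypothetical.

```python
import math


def mw_to_dbm(mw: float) -> float:
    """Optical power in milliwatts to dBm: 10 * log10(P / 1 mW)."""
    return 10.0 * math.log10(mw)


def dbm_to_mw(dbm: float) -> float:
    """Inverse of mw_to_dbm."""
    return 10.0 ** (dbm / 10.0)


def rx_power_low(rx_dbm: float, low_warn_dbm: float) -> bool:
    """True when received power is below the low-warning threshold."""
    return rx_dbm < low_warn_dbm
```

For example, `rx_power_low(mw_to_dbm(0.05), -10.0)` is `True`: 0.05 mW is about -13 dBm. Each 3 dB drop roughly halves the received power.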

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 374435 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Icinga: Add basic monitoring for routers' active RE

https://gerrit.wikimedia.org/r/374435

Change 374435 merged by Ayounsi:
[operations/puppet@production] Icinga: Add basic monitoring for routers' active RE

https://gerrit.wikimedia.org/r/374435

Will update this task as needed in the future.