CB 960 Offline Online RMA
The device has logged errors and raised alarms, as shown below:
BJWJ-PC-CMNET-RT02
From: Mengzhe Hu
Sent: Friday, May 15, 2015 10:52 AM
To: '??'
Cc: support
Subject: RE: Tech SR: 2015-0429-0357,CM-PB-BJ:BJWJPCRT02:ALL CB CHECK STATES
I understand that all CBs have been in CHECK state since 24 Nov 2014:
root@BJWJ-PC-CMNET-RT02-RE0> show chassis alarms no-forwarding
I've gone through the actions you have taken and the logs from before and after the
event.
->First. I have to tell you that CB0 and CB1 have not really been taken offline and
brought back online. That is why you are still seeing the alarms on CB0 and CB1.
->Second. As only the fabric plane is affected, we can offline/online the fabric
planes. In this case, the routing engine on the CB won't be affected. And since the
MX960 has 2+1 CB redundancy, taking the fabric planes offline and online one by one
should not affect the traffic.
Commands to run. Before executing each command, use "show chassis fabric plane"
to verify the fabric plane status:
request chassis fabric plane 0 offline
request chassis fabric plane 0 online
request chassis fabric plane 1 offline
request chassis fabric plane 1 online
request chassis fabric plane 2 offline
request chassis fabric plane 2 online
request chassis fabric plane 3 offline
request chassis fabric plane 3 online
->Third. If the soft reset doesn't help, we will have to physically reset CB0 and
CB1. To reduce the traffic impact, we can RMA the CBs first and do the replacement
in one maintenance window.
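
The plane-by-plane soft reset from the second step can be written out as an ordered checklist. The following is only a minimal sketch: the function name `fabric_plane_reset_steps` is hypothetical, the commands are copied exactly as they appear in this mail, and the final status check after the last plane is an added assumption. The script only prints the commands; it does not connect to the router.

```python
def fabric_plane_reset_steps(planes=(0, 1, 2, 3)):
    """Return the CLI commands for resetting fabric planes one at a time."""
    # Verification command quoted in the mail; run it before every change.
    verify = "show chassis fabric plane"
    steps = []
    for plane in planes:
        # Finish offline/online on one plane before touching the next one.
        for action in ("offline", "online"):
            steps.append(verify)
            steps.append(f"request chassis fabric plane {plane} {action}")
    # Assumption: one last status check after the final plane is online again.
    steps.append(verify)
    return steps


if __name__ == "__main__":
    for command in fabric_plane_reset_steps():
        print(command)
```

Keeping the verify command in front of every change makes it harder to take a second plane offline while the previous one has not yet come back.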
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Hello Paul,
I've checked the logs; it looks like there were transient CRC errors on all the
fabric planes at that time:
Line 40291: Nov 24 15:24:01 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link
0 failed because of crc errors
Line 40293: Nov 24 15:24:01 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link
1 failed because of crc errors
Line 40295: Nov 24 15:24:01 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link
36 failed because of crc errors
Line 40297: Nov 24 15:24:01 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link
37 failed because of crc errors
Line 40299: Nov 24 15:24:01 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link
52 failed because of crc errors
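
To see at a glance which links were hit, the entries above can be grouped by CB and Fchip ID. This is a rough sketch, not a Juniper tool: the function name `failed_links_by_fchip` is hypothetical, and the pattern assumes the message wording shown in the excerpt above. Whitespace is collapsed first so that entries wrapped across two lines still match.

```python
import re
from collections import defaultdict

# Pattern assumes the exact message wording seen in the excerpt above.
LINK_ERROR = re.compile(
    r"CHASSISD_FASIC_HSL_LINK_ERROR: Fchip \(CB (\d+), ID (\d+)\): "
    r"link (\d+) failed because of crc errors"
)


def failed_links_by_fchip(log_text):
    """Map (cb, fchip_id) to the sorted links reported with CRC errors."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    links = defaultdict(set)
    for cb, fchip, link in LINK_ERROR.findall(flat):
        links[(int(cb), int(fchip))].add(int(link))
    return {key: sorted(value) for key, value in links.items()}


SAMPLE = """
Line 40291: Nov 24 15:24:01 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link
0 failed because of crc errors
Line 40293: Nov 24 15:24:01 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link
1 failed because of crc errors
Line 40295: Nov 24 15:24:01 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link
36 failed because of crc errors
"""

if __name__ == "__main__":
    print(failed_links_by_fchip(SAMPLE))  # {(0, 0): [0, 1, 36]}
```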
The "CHECK" status will not clear on its own unless we reboot/reseat the CB.
As you don't have NSR enabled, is it possible to schedule a MW to reboot/reseat
these CBs one by one?
Also, "show log messages" has rolled over, so it doesn't give any information for
2014-11-24.
From "show log chassisd", it looks like there was a temperature issue across the
system before the problem happened. Can you recall any related event in the data
center?
Nov 24 15:24:01 FPC 0 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 0 temp sensor not ok, status 0x8 failed 1 times
Nov 24 15:24:01 FPC 0 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 0 temp sensor not ok, status 0x8 failed 1 times
Nov 24 15:24:01 FPC 0 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 0 temp sensor not ok, status 0x8 failed 1 times
Nov 24 15:24:01 FPC 2 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 2 temp sensor not ok, status 0x8 failed 1 times
Nov 24 15:24:01 FPC 2 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 2 temp sensor not ok, status 0x8 failed 1 times
Nov 24 15:24:01 FPC 2 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 2 temp sensor not ok, status 0x8 failed 1 times
Nov 24 15:24:01 FPC 4 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 4 temp sensor not ok, status 0x8 failed 1 times
Nov 24 15:24:01 FPC 4 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 4 temp sensor not ok, status 0x8 failed 1 times
Nov 24 15:24:01 FPC 4 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 4 temp sensor not ok, status 0x8 failed 1 times
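
The temperature entries above can be tallied per FPC to check whether every slot reported the same reading at the same time. This is a minimal sketch under stated assumptions: the function name `out_of_range_readings` is hypothetical, and the pattern only covers the message wording shown in the chassisd excerpt above.

```python
import re
from collections import Counter

# Pattern assumes the chassisd wording shown in the excerpt above.
TEMP_EVENT = re.compile(
    r"FPC (\d+) temperature is (-?\d+) degrees C, "
    r"which is outside operating range"
)


def out_of_range_readings(log_text):
    """Count out-of-range temperature messages per (fpc_slot, degrees_c)."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return Counter(
        (int(fpc), int(temp)) for fpc, temp in TEMP_EVENT.findall(flat)
    )


SAMPLE = """
Nov 24 15:24:01 FPC 0 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 0 temp sensor not ok, status 0x8 failed 1 times
Nov 24 15:24:01 FPC 0 temperature is -60 degrees C, which is outside operating
range
Nov 24 15:24:01 FPC 2 temperature is -60 degrees C, which is outside operating
range
"""

if __name__ == "__main__":
    for (fpc, temp), count in sorted(out_of_range_readings(SAMPLE).items()):
        print(f"FPC {fpc}: {count} message(s) at {temp} degrees C")
```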
Next action plan: Collect the data to double-confirm the issue, then reseat/reboot
the CBs one by one. In most situations, the "CHECK" status will go away after a
reboot. If the "CHECK" status continues, we will go ahead and create an RMA for
these CBs.