Quick Maintenance - Huawei Technical Support

Quick Maintenance

Issue: 03 (2019-01-16)

S7700 and S9700 Series Smart & Core Routing Switches

Copyright © Huawei Technologies Co., Ltd. 2019. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.

All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services, and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.

Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: http://e.huawei.com/en/

Contents

Before You Start
How to Quickly Maintain the S7700&S9700
Fault Information Collection and Feedback
Solution to Device Login Failure
Risky Operations

Before You Start

Before you take over the maintenance of the switch, you are advised to:

1 Obtain the network's topology diagram and data plans (including ports, VLANs, and IP addresses), print them, and post them in your equipment room for quick reference.

2 Print the following contact information and post it near your workplace:

Contact method of the agent who is responsible for constructing your network and providing service

3 Prepare the tools and cables that you may use during device maintenance.

No.  Item                    Description
1    Cable                   An RS232 serial cable: used to log in to the device through the Console port.
                             Serial to USB converter: used to connect the USB port of the maintenance terminal to the Console port of the switch.
                             Two straight-through cables: used to commission the management port or other services.
                             Multiple fibers and SFP/eSFP/SFP+/XSFP/QSFP+ optical modules: used to connect the switch to other network devices.
2    Maintenance terminal    A maintenance terminal can be a portable computer with serial communication software installed. You can log in to the switch through the maintenance terminal.
3    Instruments and meters  Optical power meter: used to test optical parameters of optical ports (such as optical power and receive sensitivity).

4 Visit the Huawei enterprise technical support website (http://support.huawei.com/enterprise) to register an account. You can then browse or download more product documents, cases, and announcements, and receive push messages from the website.

5 This document uses the command outputs of V200R003 as an example. If your switch is not running V200R003, the actual command outputs may differ. This document points out the important version differences.

How to Quickly Maintain the S7700&S9700

The overall S7700&S9700 maintenance process is as follows:

Start
-> Check indicators and rectify the fault.
-> Check alarms on the switch and rectify the fault.
-> Check the health status of the switch and rectify the fault.
-> Check card status and rectify the fault.
-> Is the fault rectified?
   Yes: End
   No: Collect and report the fault information.

To check the alarms, health status, and card status, and to record fault information, you must log in to the switch through the Console port, Telnet, or STelnet. (For login instructions, see Configuration Guide-Basic Configuration.) If you fail to log in to the switch, see Solution to Device Login Failure.

Check Indicator Status

Check whether the status of each indicator is normal. If an indicator is in an abnormal state, record the fault information and look up the fault handling method, using either the indicator status and meanings in Hardware Description or the troubleshooting procedures in Troubleshooting Guide. If the fault cannot be rectified, contact your agent or Huawei enterprise technical support hotline.

The following table lists the normal status of each indicator on the switch.

Note: For the meanings, status, and status description of each indicator, see the Hardware Description.

Category                  Indicator  Normal State
1600 W DC power supply    INPUT      Steady green
                          ALM        Off
MPU                       RUN/ALM    Slow blinking green
                          ACT        Steady green: active MPU; off: standby MPU
CMU                       RUN/ALM    Slow blinking green
                          ACT        Steady green: active CMU; off: standby CMU
LPU                       RUN/ALM    Slow blinking green
Fan module                RUN/ALM    Slow blinking green
2200 W DC power supply,   INPUT      Steady green
800 W AC power supply,    ALM        Off
2200 W AC power supply    FAULT      Off
Cluster card              RUN/ALM    Slow blinking green
VAS card                  RUN/ALM    Slow blinking green

Check for Critical or Major Alarms on the Switch

Log in to the switch and run the display alarm active command to view the alarm status on the switch. Check whether any critical or major alarms exist.

<HUAWEI> display alarm active | include Major
A/B/C/D/E/F/G/H/I/J
A=Sequence, B=RootKindFlag(Independent|RootCause|nonRootCause)
C=Generating time, D=Clearing time
E=ID, F=Name, G=Level, H=State
I=Description information for locating(Para info, Reason info)
J=RootCause alarm sequence(Only for nonRootCause alarm)
1/Independent/2014-07-29 19:43:21+08:00/-/0xff0c201c/hwStorageUtilizationRisin
gAlarm/Major/Start/OID 1.3.6.1.4.1.2011.5.25.129.2.6.1 Storage utilization excee
ded the pre-alarm threshold.(Index=70778889, BaseUsagePhyIndex=0, UsageType=5, U
sageIndex=0, Severity=4, ProbableCause=151, EventType=4, PhysicalName="MPU Board
14", RelativeResource="", UsageValue=92, UsageUnit=1, UsageThreshold=90)
4/Independent/2014-07-29 19:43:21+08:00/-/0x418c2002/hwGtlDefaultValue/Major/S
tart/OID 1.3.6.1.4.1.2011.5.25.142.2.1 Current license value is default, the rea
son is No license available.

The alarms on the switch are classified into critical, major, minor, and warning alarms. The critical and major alarms must be handled immediately. Handle these alarms according to the Alarm Reference. If the alarms cannot be cleared, contact your agent or Huawei enterprise technical support hotline.

If you have a network management system (NMS), check the alarms on the NMS. For details, see the NMS product documents.
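The slash-separated alarm format above lends itself to scripted triage. The following is a minimal Python sketch, assuming the output has been captured as text from a terminal session and that records wrap across lines as in the example. The function names parse_alarms and urgent_alarms and the dictionary keys are illustrative choices, not part of any Huawei tool.

```python
import re

# Field names follow the A..I legend printed by the switch; the optional
# J field (root-cause sequence) stays inside "description" if present.
FIELDS = ["sequence", "root_kind", "generated", "cleared",
          "alarm_id", "name", "level", "state", "description"]

# Assumption: every alarm record starts with "<sequence>/<RootKindFlag>/".
RECORD_START = re.compile(r"^\d+/(Independent|RootCause|nonRootCause)/")


def parse_alarms(output):
    """Rejoin terminal-wrapped lines and split each alarm into named fields."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue
        if RECORD_START.match(line):
            records.append(line)          # a new alarm record begins here
        elif records:
            records[-1] += line           # continuation of a wrapped record
        # lines before the first record (prompt, legend) are ignored
    alarms = []
    for record in records:
        parts = record.split("/", 8)      # the description may contain "/"
        if len(parts) == 9:
            alarms.append(dict(zip(FIELDS, parts)))
    return alarms


def urgent_alarms(alarms):
    """Keep only the levels this guide says must be handled immediately."""
    return [a for a in alarms if a["level"] in ("Critical", "Major")]
```

Running urgent_alarms(parse_alarms(text)) on a capture of the unfiltered display alarm active output would list the alarms that need immediate handling; the sketch does not contact the switch itself.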

Check the Health Status of the Switch

Log in to the switch and run the display health command to check the health status of the switch.

1 View the voltage information and check whether the voltage status of each present card is normal.

-------------------------------------------------------------------------------
Slot  Card  SDR No.  SensorName  Status  Upper   Lower   Voltage.(V)
-------------------------------------------------------------------------------
7     -     3        3.3V        normal  3.9592  2.6460  3.2928
      -     4        2.5V        normal  2.9988  1.9992  2.5872
      -     5        1.8V        normal  2.1560  1.4406  1.8816

If the voltage status of a card is abnormal, record the fault information and handle the fault according to the Troubleshooting Guide. If the fault cannot be rectified, contact your agent or Huawei enterprise technical support hotline.

2 View the temperature information and check whether the temperature status of each present card is normal.

-----------------------------------------------------------
Slot  Card  SDR No.  Status  Upper  Lower  Temperature.(C)
-----------------------------------------------------------
7     -     1        normal  67.00  0.00   38.00
      -     2        normal  64.00  0.00   34.00
10    -     1        normal  58.00  0.00   36.00
      -     2        normal  56.00  0.00   31.00

If the temperature is abnormal, check whether the ambient temperature in the equipment room is normal, whether the heat dissipation channel in the chassis is blocked, and whether all fan modules are working properly. Take appropriate measures accordingly. If the fault cannot be rectified, record the fault information and contact your agent or Huawei enterprise technical support hotline.

3 View the power information and check whether the status of each present power supply is Supply.

--------------------------------------------------------------------------
PowerNo  Present  Mode  State   Current(A)  Voltage(V)  RealPwr(W)
--------------------------------------------------------------------------
PWR1     YES      AC    Supply  2.7500      53.5200     148.6000
PWR2     YES      AC    Supply  2.6400      53.3900     143.6000
PWR3     NO       N/A   N/A     N/A         N/A         N/A
PWR4     NO       N/A   N/A     N/A         N/A         N/A
PWR5     NO       N/A   N/A     N/A         N/A         N/A
PWR6     NO       N/A   N/A     N/A         N/A         N/A

If the power status is abnormal, check whether the power supply is switched on and whether the power cable is loose, and replace the problematic power supply. If the fault cannot be rectified, record the fault information and contact your agent or Huawei enterprise technical support hotline.

4 View the fan information and check whether the Register status of each fan is YES.

-------------------------------------------------------------------------------
FanId  FanNum  Present  Register  Speed      Mode
-------------------------------------------------------------------------------
FAN1   [1-2]   YES      YES       30%(2160)  AUTO
       1                          2100
       2                          2220
FAN2   [1-2]   YES      YES       35%(2340)  AUTO
       1                          2250
       2                          2430

If the fan status is abnormal, check whether the fan module is properly connected, whether the fan blades are blocked, and whether dust has accumulated on the fans. If any of these conditions applies, reinstall the fan module or clean the fan blades. Otherwise, replace the fan module. If the fault cannot be rectified, record the fault information and contact your agent or Huawei enterprise technical support hotline.

5 View the memory information. The memory usage of each present card should be lower than 60%.

System Memory Usage Information:
System memory usage at 2004-08-03 16:10:35
-------------------------------------------------------------------------------
Slot  Total Memory(MB)  Used Memory(MB)  Used Percentage  Upper Limit
-------------------------------------------------------------------------------
7     170               58               34%              85%
10    170               60               35%              85%
13    1827              163              8%               95%
14    1827              162              8%               95%
-------------------------------------------------------------------------------

If the memory usage is too high, observe the memory usage for 5-10 minutes. If the memory usage is still high, contact your agent or Huawei enterprise technical support hotline.

6 View the CPU usage information. The CPU usage of each present card should be lower than 80%.

System CPU Usage Information:
System cpu usage at 2004-08-03 16:10:35
-------------------------------------------------------------------------------
Slot  CPU Usage  Upper Limit
-------------------------------------------------------------------------------
7     13%        80%
10    14%        80%
13    12%        80%
14    8%         80%
-------------------------------------------------------------------------------

If the CPU usage is too high, observe the CPU usage for 5-10 minutes. If the CPU usage is still high, contact your agent or Huawei enterprise technical support hotline.

7 View the storage media usage information. The storage media usage should be lower than 80%.

Disk Usage Information:
System disk usage at 2004-08-03 16:10:35
-------------------------------------------------------------------------------
Slot  Device   Total Memory(MB)  Used Memory(MB)  Used Percentage
-------------------------------------------------------------------------------
13    flash:   103               88               85%
      cfcard:  509               438              86%
-------------------------------------------------------------------------------

If the storage media usage exceeds 80%, delete redundant files. For details, see the Configuration Guide-Basic Configuration.
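The memory, CPU, and storage thresholds in this section (60%, 80%, and 80%) can be checked mechanically. The following is a minimal Python sketch, assuming each usage table has been captured as text and that the first percentage in a row is the usage value, as in the sample outputs of this guide. The function name over_limit is hypothetical.

```python
def over_limit(section, limit):
    """Return (row text, usage) pairs whose usage is at or above `limit` percent.

    Assumption: in every data row the first token ending in "%" is the
    usage figure; header and separator rows contain no such token.
    """
    hits = []
    for line in section.splitlines():
        tokens = line.split()
        percent = next((t for t in tokens if t.endswith("%")), None)
        if percent is None:
            continue                      # header, separator, or title line
        try:
            usage = int(percent.rstrip("%"))
        except ValueError:
            continue                      # not a plain integer percentage
        if usage >= limit:
            hits.append((line.strip(), usage))
    return hits
```

For example, over_limit(memory_text, 60), over_limit(cpu_text, 80), and over_limit(disk_text, 80) would apply the three recommendations; with the sample storage output, both flash: and cfcard: would be reported because their usage is 85% and 86%.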

Check the Card Status

Log in to the switch and run the display device command to view card status.

<HUAWEI> display device
S9712's Device status:
Slot  Sub  Type          Online   Power    Register    Alarm   Primary
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
7     -    EH1D2X02XEC0  Present  PowerOn  Registered  Normal  NA
10    -    EH1D2G48SEC0  Present  PowerOn  Registered  Normal  NA
13    -    EH1D2SRUDC00  Present  PowerOn  Registered  Normal  Master
14    -    EH1D2SRUDC00  Present  PowerOn  Registered  Normal  Slave
PWR1  -    -             Present  PowerOn  Registered  Normal  NA
PWR2  -    -             Present  PowerOn  Registered  Normal  NA
CMU1  -    EH1D200CMU00  Present  PowerOn  Registered  Normal  Master
FAN1  -    -             Present  PowerOn  Registered  Normal  NA
FAN2  -    -             Present  PowerOn  Registered  Normal  NA
FAN3  -    -             Present  PowerOn  Registered  Normal  NA
FAN4  -    -             Present  PowerOn  Registered  Normal  NA

Check the following items:

Whether the Online value is Present.
Whether the Power value is PowerOn.
Whether the Register value is Registered.
Whether the Alarm value is Normal.

If the card status is abnormal, record the fault information and handle the fault according to the Troubleshooting Guide. If the fault cannot be rectified, contact your agent or Huawei enterprise technical support hotline.
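The four card status checks above can be applied to captured display device output. The following is a minimal Python sketch, assuming the eight-column layout shown in the example. The function name abnormal_cards and the lowercase column keys are illustrative, and the abnormal values used in any test data are hypothetical because this guide lists only the normal ones.

```python
# Expected values for the four columns this guide tells you to check.
EXPECTED = {"online": "Present", "power": "PowerOn",
            "register": "Registered", "alarm": "Normal"}
COLUMNS = ["slot", "sub", "type", "online", "power",
           "register", "alarm", "primary"]


def abnormal_cards(output):
    """Map each slot to the columns whose value differs from the normal state."""
    problems = {}
    for line in output.splitlines():
        cols = line.split()
        # Data rows have exactly eight columns; skip the header row and
        # everything else (prompt, title, separator).
        if len(cols) != 8 or cols[0] == "Slot":
            continue
        row = dict(zip(COLUMNS, cols))
        wrong = {key: row[key] for key, normal in EXPECTED.items()
                 if row[key] != normal}
        if wrong:
            problems[row["slot"]] = wrong
    return problems
```

An empty result means every listed card, power supply, CMU, and fan module reports the normal state in all four columns.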

Fault Information Collection and Feedback

When you detect errors on your switch, collect fault information in real time and take the corresponding measures.

Fault information includes:

Basic fault information: fault occurrence time, symptom, severity, impact, network topology, measures that have been taken, and effect
Switch running status: device name, version, current configuration, and interface information
Log information: logs recorded when faults occur

Provide the collected information to your agent or Huawei technical support engineers.

Collect Basic Fault Information

When a fault occurs, collect the following basic fault information:

No.  Item                           Collection Method
1    Fault occurrence time          Record the time when the fault occurs, accurate to the minute.
2    Symptom                        Record the fault symptom and detailed information.
3    Impact                         Record the severity of the fault and impacted services.
4    Networking                     Draw a networking diagram, including the upstream and downstream devices and connected ports.
5    Measures that have been taken  Record the measures that have been taken and effect of the measures (including command execution procedure and output).
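To keep the five items together when reporting a fault, they can be recorded in a small structure. The following is a minimal Python sketch; the class name FaultRecord, its field names, and the text layout are assumptions made for illustration and are not a format required by Huawei support.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FaultRecord:
    """One entry per fault, mirroring the five items in the table above."""
    occurred_at: datetime                 # item 1, reported to the minute
    symptom: str                          # item 2
    impact: str                           # item 3: severity and impacted services
    networking: str                       # item 4: description of or path to a diagram
    measures: list = field(default_factory=list)   # item 5, in order taken

    def to_text(self):
        """Render the record as plain text for a report or e-mail."""
        lines = [
            "Fault occurrence time: " + self.occurred_at.strftime("%Y-%m-%d %H:%M"),
            "Symptom: " + self.symptom,
            "Impact: " + self.impact,
            "Networking: " + self.networking,
            "Measures that have been taken:",
        ]
        lines += ["  - " + m for m in self.measures] or ["  - none"]
        return "\n".join(lines)
```

Each entry in measures can hold the command that was run together with its output, which matches the collection method for item 5.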

Collect Switch Running Information

Log in to the switch and run the display diagnostic-information command to collect switch running information, including startup configuration, current configuration, port information, time, and system version.

<HUAWEI> display diagnostic-information dia-info.txt
Now saving the diagnostic information to the device.............................
................................................................................
..............
Info: The diagnostic information was saved to the device successfully.

The generated diagnostic information file (dia-info.txt in this example) is saved in the cfcard:/ directory by default. You can run the dir command in the user view to check whether the file has been generated.

You can transfer the file to your computer through TFTP, FTP, or SFTP to facilitate information query and feedback. For details, see the Configuration Guide - Basic Configuration.

Collect Logs

Device logs involve user operations, system faults, and system security issues. Logs are classified into user logs and diagnostic logs. After logging in to the switch, obtain the user logs and diagnostic logs as follows:

<HUAWEI> save logfile //Collect common user logs.
<HUAWEI> system-view
[HUAWEI] diagnose
[HUAWEI-diagnose] save diag-logfile //Collect diagnostic logs.
[HUAWEI-diagnose] terminal diag-logging //Enable diagnostic log debugging.

You can transfer the files from cfcard:/logfile to your computer through TFTP, FTP, or SFTP to facilitate information query and feedback. For details, see the Configuration Guide - Basic Configuration.

Solution to Device Login Failure

If you fail to log in to the switch through Telnet or STelnet, log in to the switch through the Console (also called serial) port and check the Telnet or STelnet configuration.

If you still fail to log in to the switch through the Console port, you cannot perform any operations related to CLI. In this situation, you need to perform the following operations:

Perform the following operations only when you confirm that the user service has been interrupted, because these operations will affect user service. Collect the fault information and contact your agent or Huawei enterprise technical support hotline.

1 Check and recover the power supply system.

If the indicators of all cards are off and the fans are not running (listen for fan noise to confirm), the power supply system has failed.

1. Check the power supply switches. If your switch has multiple power supplies installed, at least one power supply must be switched on.
2. Check the RUN indicator of the power supply. If the indicator is off, the power supply input is abnormal. Request the electrician to recover the power lines in the equipment room, rack, or cabinet.
3. Check the ALM indicator of the power supply. If the indicator is on, the power supply is abnormal. Replace the power supply.
4. If the cards cannot be powered on and no error is found in the preceding checks, contact your agent or Huawei enterprise technical support hotline.

2 Check and modify the communication parameters of the COM port on your computer.

Check whether the communication parameters of the COM port are the same as those of the switch's Console port. If not, modify the communication parameters.

The default settings of the switch's Console port are 9600 bps, 8 data bits, 1 stop bit, no parity check, and no flow control (the actual settings may be different).

3 Remove/reinstall or replace the MPU.

If the power supply system and Console port work properly, the MPU may be faulty. If your switch has two MPUs installed, remove/reinstall the problematic MPU. If your switch has only one MPU installed, replace it with a new one.

4 Restart the switch.

If the fault cannot be rectified after you remove/reinstall or replace the MPU, you can restart the switch. Power off the switch, and then power it on after 3 minutes.

5 Seek technical support.

If the preceding methods are ineffective, contact your agent or Huawei enterprise technical support hotline.
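The comparison in step 2 between the COM port settings and the Console port settings can be written down explicitly. The following is a minimal Python sketch that only compares two sets of values; it does not open or configure a serial port. The dictionary keys and the function name mismatched_settings are hypothetical, and the defaults are the ones quoted in step 2, which your switch may have changed.

```python
# Console port defaults as quoted in this guide (actual settings may differ).
CONSOLE_DEFAULTS = {
    "baud_rate": 9600,
    "data_bits": 8,
    "stop_bits": 1,
    "parity": "none",
    "flow_control": "none",
}


def mismatched_settings(terminal, console=CONSOLE_DEFAULTS):
    """Return {parameter: (terminal value, console value)} for each difference."""
    return {
        name: (terminal.get(name), expected)
        for name, expected in console.items()
        if terminal.get(name) != expected
    }
```

For example, a terminal configured for 115200 bps with otherwise matching values would yield {"baud_rate": (115200, 9600)}, which points to the parameter to change in the serial communication software.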

Risky Operations

Hardware-Related Risky Operations

Remove or install cables inside a cabinet.

Remove or install cards without an ESD wrist strap.

Remove the active MPU.

Press the RST button of the MPU.

Remove or install a CF card when the switch is running.

Software-Related Risky Operations

Run the reboot command to restart the switch.

Run the reset slot command to reset cards.

Run the power off slot command to power off cards.

Run the shutdown command to shut down physical ports.

Run the format command to format the storage device.

Run the delete command to delete files from the storage device.

Run the reset command to reset protocols.

Change the authentication method or user login password of the Console port or VTY users.
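The software-related commands listed above can be screened before they are sent to a switch, for example in a maintenance script. The following is a minimal Python sketch, assuming commands are typed in full. It does not handle abbreviated commands, and it is deliberately conservative: every command that starts with reset is flagged. The names RISKY_PREFIXES and is_risky are hypothetical.

```python
# Command prefixes taken from the software-related list above.
# "reset" also covers "reset slot".
RISKY_PREFIXES = ("reboot", "reset", "power off slot",
                  "shutdown", "format", "delete")


def is_risky(command):
    """Return True if the command begins with one of the risky prefixes."""
    normalized = " ".join(command.lower().split())   # collapse whitespace
    return any(
        normalized == prefix or normalized.startswith(prefix + " ")
        for prefix in RISKY_PREFIXES
    )
```

A script could ask for confirmation whenever is_risky returns True. Changes to Console or VTY authentication are not covered, because they span several configuration commands.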
