Troubleshooting and Monitoring on the QFX Series
Release 13.2
Published: 2014-04-01
Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify,
transfer, or otherwise revise this publication without notice.
The information in this document is current as of the date on the title page.
Juniper Networks hardware and software products are Year 2000 compliant. Junos OS has no known time-related limitations through the
year 2038. However, the NTP application is known to have some difficulty in the year 2036.
The Juniper Networks product that is the subject of this technical documentation consists of (or is intended for use with) Juniper Networks
software. Use of such software is subject to the terms and conditions of the End User License Agreement (“EULA”) posted at
http://www.juniper.net/support/eula.html. By downloading, installing or using such software, you agree to the terms and conditions of
that EULA.
Part 1 Overview
Chapter 1 General Troubleshooting
  Understanding Troubleshooting Resources
  Troubleshooting Overview
Chapter 2 Alarms
  Understanding Alarms
  Chassis Alarm Messages on a QFX3500 Device
  Interface Alarm Messages
  System Utilization Alarms
Part 2 Administration
Chapter 3 Routine Monitoring Using the CLI
  Monitoring SNMP
  Tracing SNMP Activity on a Device Running Junos OS
    Configuring the Number and Size of SNMP Log Files
    Configuring Access to the Log File
    Configuring a Regular Expression for Lines to Be Logged
    Configuring the Trace Operations
  Monitoring RMON MIB Tables
  Displaying a Log File from a Single-Chassis System
  Monitoring System Log Messages
  Monitoring Traffic Through the Router or Switch
    Displaying Real-Time Statistics About All Interfaces on the Router or Switch
    Displaying Real-Time Statistics About an Interface on the Router or Switch
  Pinging Hosts
Part 3 Troubleshooting
Chapter 4 Configuration and File Management
  Loading a Previous Configuration File
  Reverting to the Default Factory Configuration
  Reverting to the Rescue Configuration
  Cleaning Up the System File Storage Space
Chapter 5 Ethernet Switching
  Troubleshooting Ethernet Switching
Chapter 6 High Availability
  Troubleshooting VRRP
Chapter 7 Interfaces
  Troubleshooting an Aggregated Ethernet Interface
  Troubleshooting Network Interfaces
    The interface on the port in which an SFP or SFP+ transceiver is installed in an SFP or SFP+ module is down
  Troubleshooting Multichassis Link Aggregation
    MAC Addresses Learned on MC-AE Interfaces Are Not Removed from the MAC Address Table
    MC-LAG Peer Does Not Go into Standby Mode
    Secondary MC-LAG Peer with Status Control Set to Standby Becomes Inactive
    Redirect Filters Take Priority over User-Defined Filters
    Operational Command Output Is Wrong
    ICCP Connection Might Take Up to 60 Seconds to Become Active
    MAC Address Age Learned on an MC-AE Interface Is Reset to Zero
    MAC Address Is Not Learned Remotely in a Default VLAN
    Snooping Entries Learned on MC-AE Interfaces Are Not Removed
    ICCP Does Not Come Up After You Add or Delete an Authentication Key
    Local Status Is Standby When It Should Be Active
    Packets Loop on the Server When ICCP Fails
    Both MC-LAG Peers Use the Default System ID After a Reboot or an ICCP Configuration Change
    No Commit Checks Are Done for ICL-PL Interfaces
    Double Failover Scenario
    Multicast Traffic Floods the VLAN When the ICL-PL Interface Goes Down and Up
    Layer 3 Traffic Sent to the Standby MC-LAG Peer Is Not Redirected to Active MC-LAG Peer
    AE Interfaces Go Down
    Flooding of Upstream Traffic
Chapter 8 Junos OS Basics
  Rebooting and Halting a QFX Series Product
  Recovering from a Failed Software Installation
  Recovering the Root Password
  Creating an Emergency Boot Device for a QFX Series Device
  Performing a Recovery Installation on a QFX Series Device
List of Tables
Part 1 Overview
Chapter 1 General Troubleshooting
  Table 3: Troubleshooting Resources on the QFX Series
  Table 4: Troubleshooting on the QFX Series
Chapter 2 Alarms
  Table 5: Alarm Terms and Definitions
  Table 6: QFX3500 Chassis Alarm Messages
Part 2 Administration
Chapter 3 Routine Monitoring Using the CLI
  Table 7: SNMP Tracing Flags
  Table 8: Output Control Keys for the monitor interface Command
If the information in the latest release notes differs from the information in the
documentation, follow the product Release Notes.
Juniper Networks Books publishes books by Juniper Networks engineers and subject
matter experts. These books go beyond the technical documentation to explore the
nuances of network architecture, deployment, and administration. The current list can
be viewed at http://www.juniper.net/books.
Supported Platforms
For the features described in this document, the following platforms are supported:
If you want to use the examples in this manual, you can use the load merge or the load
merge relative command. These commands cause the software to merge the incoming
configuration into the current candidate configuration. The example does not become
active until you commit the candidate configuration.
If the example configuration contains the top level of the hierarchy (or multiple
hierarchies), the example is a full example. In this case, use the load merge command.
If the example configuration does not start at the top level of the hierarchy, the example
is a snippet. In this case, use the load merge relative command. These procedures are
described in the following sections.
Merging a Full Example
To merge a full example, follow these steps:
1. From the HTML or PDF version of the manual, copy a configuration example into a
text file, save the file with a name, and copy the file to a directory on your routing
platform.
For example, copy the following configuration to a file and name the file ex-script.conf.
Copy the ex-script.conf file to the /var/tmp directory on your routing platform.
system {
    scripts {
        commit {
            file ex-script.xsl;
        }
    }
}
interfaces {
    fxp0 {
        disable;
        unit 0 {
            family inet {
                address 10.0.0.1/24;
            }
        }
    }
}
2. Merge the contents of the file into your routing platform configuration by issuing the
load merge configuration mode command:
[edit]
user@host# load merge /var/tmp/ex-script.conf
load complete
Merging a Snippet
To merge a snippet, follow these steps:
1. From the HTML or PDF version of the manual, copy a configuration snippet into a text
file, save the file with a name, and copy the file to a directory on your routing platform.
For example, copy the following snippet to a file and name the file
ex-script-snippet.conf. Copy the ex-script-snippet.conf file to the /var/tmp directory
on your routing platform.
commit {
    file ex-script-snippet.xsl;
}
2. Move to the hierarchy level that is relevant for this snippet by issuing the following
configuration mode command:
[edit]
user@host# edit system scripts
[edit system scripts]
3. Merge the contents of the file into your routing platform configuration by issuing the
load merge relative configuration mode command:
[edit system scripts]
user@host# load merge relative /var/tmp/ex-script-snippet.conf
load complete
For more information about the load command, see the CLI User Guide.
Documentation Conventions
Caution: Indicates a situation that might result in loss of data or hardware damage.
Laser warning: Alerts you to the risk of personal injury from a laser.
Table 2 on page xi defines the text and syntax conventions used in this guide.
Convention: Bold text like this
Description: Represents text that you type.
Example: To enter configuration mode, type the configure command:
    user@host> configure

Convention: Fixed-width text like this
Description: Represents output that appears on the terminal screen.
Example:
    user@host> show chassis alarms
    No alarms currently active

Convention: Italic text like this
Description:
• Introduces or emphasizes important new terms.
• Identifies guide names.
• Identifies RFC and Internet draft titles.
Examples:
• A policy term is a named structure that defines match conditions and actions.
• Junos OS CLI User Guide
• RFC 1997, BGP Communities Attribute

Convention: Italic text like this
Description: Represents variables (options for which you substitute a value) in commands or configuration statements.
Example: Configure the machine’s domain name:
    [edit]
    root@# set system domain-name domain-name

Convention: Text like this
Description: Represents names of configuration statements, commands, files, and directories; configuration hierarchy levels; or labels on routing platform components.
Examples:
• To configure a stub area, include the stub statement at the [edit protocols ospf area area-id] hierarchy level.
• The console port is labeled CONSOLE.

Convention: < > (angle brackets)
Description: Encloses optional keywords or variables.
Example: stub <default-metric metric>;

Convention: # (pound sign)
Description: Indicates a comment specified on the same line as the configuration statement to which it applies.
Example: rsvp { # Required for dynamic MPLS only

Convention: [ ] (square brackets)
Description: Encloses a variable for which you can substitute one or more values.
Example: community name members [ community-ids ]

GUI Conventions

Convention: Bold text like this
Description: Represents graphical user interface (GUI) items you click or select.
Examples:
• In the Logical Interfaces box, select All Interfaces.
• To cancel the configuration, click Cancel.

Convention: > (bold right angle bracket)
Description: Separates levels in a hierarchy of menu selections.
Example: In the configuration editor hierarchy, select Protocols>Ospf.
Documentation Feedback
Technical product support is available through the Juniper Networks Technical Assistance
Center (JTAC). If you are a customer with an active J-Care or JNASC support contract,
or are covered under warranty, and need post-sales technical support, you can access
our tools and resources online or open a case with JTAC.
• JTAC hours of operation—The JTAC centers have resources available 24 hours a day,
7 days a week, 365 days a year.
• Find solutions and answer questions using our Knowledge Base: http://kb.juniper.net/
To verify service entitlement by product serial number, use our Serial Number Entitlement
(SNE) Tool: https://tools.juniper.net/SerialNumberEntitlementSearch/
Overview
• General Troubleshooting
• Alarms
General Troubleshooting
This topic describes some of the troubleshooting resources available for the QFX Series.
These resources include tools such as the Junos OS CLI, Junos Space applications, and
the Advanced Insight Scripts (AI-Scripts).
Table 3: Troubleshooting Resources on the QFX Series

Resource: Chassis alarms
Description: Chassis alarms indicate a failure on the switch or one of its components. A chassis alarm count is displayed on the LCD panel on the front of the switch.
See: “Chassis Alarm Messages on a QFX3500 Device” on page 10

Resource: Chassis Status LEDs and Fan Tray LEDs
Description: A blinking amber Power, Fan, or Fan Tray LED indicates a hardware component error. A blinking amber Status LED indicates a software error.
See: Chassis Status LEDs on a QFX3500 Device

Resource: Interface alarms
Description: A predefined alarm (red or yellow) for an interface type is triggered when an interface of that type goes down.
See: “Interface Alarm Messages” on page 13

Resource: System log messages
Description: The system log includes details of system and user events, including errors. Specify the severity and type of system log messages you wish to view or save, and configure the output to be sent to local or remote hosts.
See:
• Overview of Single-Chassis System Logging Configuration
• Junos OS System Log Configuration Statements

Resource: Junos OS operational mode commands
Description: Operational mode commands can be used to monitor switch performance and current activity on the network. For example, use the traceroute monitor command to locate points of failure in a network.
See:
• Monitoring System Process Information
• Monitoring System Properties
• traceroute monitor

Resource: Junos OS automation scripts (event scripts)
Description: Event scripts can be used to automate network troubleshooting and management tasks.
See: Junos OS Automation Library

Resource: Junos OS XML operational tags
Description: XML operational tags are equivalent in function to operational mode commands in the CLI, which you can use to retrieve status information for a device.
See: Junos XML API Operational Developer Reference

Resource: NETCONF XML management protocol
Description: The NETCONF XML management protocol defines basic operations that are equivalent to Junos OS CLI configuration mode commands. Client applications use the protocol operations to display, edit, and commit configuration statements (among other operations), just as administrators use CLI configuration mode commands such as show, set, and commit to perform those operations.
See: NETCONF XML Management Protocol Developer Guide

Resource: SNMP MIBs and traps
Description: MIBs enable the monitoring of network devices from a central location. For example, use the Traceroute MIB to monitor devices remotely.
See:
• SNMP MIBs Support
• SNMP Traps Support
• Using the Traceroute MIB for Remote Monitoring Devices Running Junos OS

Resource: AI-Scripts and Advanced Insight Manager (AIM)
Description: AI-Scripts installed on the switch can automatically detect and monitor faults on the switch, and depending on the configuration on the AIM application, send notifications of potential problems and submit problem reports to Juniper Support Systems.
See: Advanced Insight Scripts (AI-Scripts) Release Notes

Resource: Junos Space Service Now
Description: This application enables you to display and manage information about problem events. When problems are detected on the switch by Advanced Insight Scripts (AI-Scripts) that are installed on the switch, the data is collected and sent to Service Now for your review and action.
See: Service Automation

Resource: Junos Space Service Insight
Description: This application helps in accelerating operational analysis and managing the exposure to known issues. You can identify devices that are nearing their End Of Life (EOL) and also discover and prevent issues that could occur in your network. The functionality of Service Insight is dependent on the information sent from Service Now.
See: Service Automation

Resource: Juniper Networks Knowledge Base
Description: You can search in this database for Juniper Networks product information, including alerts and troubleshooting tips.
See: http://kb.juniper.net
Troubleshooting Overview
This topic provides a general guide to troubleshooting some typical problems you may
encounter on your QFX Series product.
Table 4: Troubleshooting on the QFX Series

Problem area: Switch hardware components
  Symptom: LCD panel shows a chassis alarm count.
  Action: See “Chassis Alarm Messages on a QFX3500 Device” on page 10.
  Symptom: Fan tray LED is blinking amber.
  Action: See Fan Tray LED on a QFX3500 Device.
  Symptom: Chassis status LED for the power is blinking amber.
  Action: See Chassis Status LEDs on a QFX3500 Device.
  Symptom: Chassis status LED for the fan (on the management board) is blinking amber.
  Action: Replace the management board as soon as possible. See Chassis Status LEDs on a QFX3500 Device.

Problem area: Port configuration
  Symptom: Cannot configure a port as a Gigabit Ethernet port.
  Action: Check whether the port is a valid Gigabit Ethernet port (6 through 41).
  Symptom: Cannot configure a port as a Fibre Channel port.
  Action: Check whether the port is a valid Fibre Channel port (0 through 5 and 42 through 47).
  Symptom: Cannot configure a port as a 10-Gigabit Ethernet port.
  Action: If the port is not a 40-Gbps QSFP+ interface, check whether the port is in the range of 0 through 5 or 42 through 47. If one of the ports in that block (0 through 5 or 42 through 47) is configured as a Fibre Channel port, then all ports in that block must also be configured as Fibre Channel ports.
  Symptom: Cannot configure a 40-Gbps QSFP+ interface.
  Action: The 40-Gbps QSFP+ interfaces can only be used as 10-Gigabit Ethernet interfaces. Each 40-Gbps QSFP+ interface can be split into four 10-Gigabit Ethernet interfaces using a breakout cable. However, port 0 is reserved, so you can only configure an additional fifteen 10-Gigabit Ethernet interfaces.

Problem area: External devices (USB devices)
  Symptom: Upgrading software from a USB device results in an upgrade failure, and the system enters an invalid state.
  Action: Unplug the USB device and reboot the switch.

Problem area: Initial device configuration
  Symptom: Cannot configure management Ethernet ports.
  Action: Configure the management ports from the console port. You cannot configure the management ports by directly connecting to them.

Problem area: Software upgrade and configuration
  Symptom: Failed software upgrade. Active partition becomes inactive after upgrade.
  Action: See “Recovering from a Failed Software Installation” on page 48.
  Symptom: Problem with the active configuration file.
  Action: See the following topics:
  Symptom: Root password is lost or forgotten.
  Action: Recover the root password. See “Recovering the Root Password” on page 49.

Problem area: Network interfaces
  Symptom: An aggregated Ethernet interface is down.
  Action: See “Troubleshooting an Aggregated Ethernet Interface” on page 39.

Problem area: Ethernet switching
  Symptom: A MAC address entry in the Ethernet switching table is not updated after the device with that MAC address has been moved from one interface to another on the switch.
  Action: See “Troubleshooting Ethernet Switching” on page 35.

Problem area: Firewall filter
  Symptom: Firewall configuration exceeded available Ternary Content Addressable Memory (TCAM) space.
  Action: See “Troubleshooting Firewall Filter Configuration” on page 57.
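The port-configuration checks in the table above can be expressed as a small pre-check. The following Python sketch is illustrative only and is not a Juniper tool: the function name check_ports and the type labels "ge", "fc", and "xe" are assumptions made for this example, and the sketch assumes that unconfigured ports in a block do not count as a conflict.

```python
# Port ranges taken from the troubleshooting table (QFX3500).
FC_BLOCKS = (range(0, 6), range(42, 48))   # ports 0-5 and 42-47
GE_PORTS = range(6, 42)                    # ports 6-41


def check_ports(config: dict[int, str]) -> list[str]:
    """Return a list of problems for a hypothetical port -> type mapping.

    Type labels used here (assumed for illustration):
    "ge" = Gigabit Ethernet, "fc" = Fibre Channel, "xe" = 10-Gigabit Ethernet.
    """
    problems = []
    for port, kind in sorted(config.items()):
        if kind == "ge" and port not in GE_PORTS:
            problems.append(f"port {port}: Gigabit Ethernet is valid only on ports 6 through 41")
        if kind == "fc" and not any(port in block for block in FC_BLOCKS):
            problems.append(f"port {port}: Fibre Channel is valid only on ports 0 through 5 and 42 through 47")
    # If any port in a block is Fibre Channel, the other configured ports
    # in that block must be Fibre Channel too.
    for block in FC_BLOCKS:
        kinds = {config[p] for p in block if p in config}
        if "fc" in kinds and kinds != {"fc"}:
            problems.append(f"ports {block.start} through {block.stop - 1}: Fibre Channel mixed with other port types")
    return problems


for problem in check_ports({0: "fc", 1: "xe", 3: "fc", 10: "ge", 44: "ge"}):
    print(problem)
```

An empty result means that none of the table's port rules is violated by the mapping; it does not guarantee that the switch accepts the configuration.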
Alarms
Understanding Alarms
The QFX Series supports different alarm types and severity levels. Table 5 on page 9
provides a list of alarm terms and definitions that may help you in monitoring the device.
Table 5: Alarm Terms and Definitions

Term: Alarm
Definition: Signal alerting you to conditions that might prevent normal operation. On the device, alarm indicators might include the LCD panel and LEDs on the device. The LCD panel (if present on the device) displays the chassis alarm message count. Blinking amber LEDs indicate yellow alarm conditions for chassis components.

Term: Alarm severity levels
Definition: Seriousness of the alarm. The level of severity can be either major (red) or minor (yellow).
• Major (red): Indicates a critical situation on the device that has resulted from one of the following conditions. A red alarm condition requires immediate action.
  • One or more hardware components have failed.
  • One or more hardware components have exceeded temperature thresholds.
  • An alarm condition configured on an interface has triggered a critical warning.
• Minor (yellow or amber): Indicates a noncritical condition on the device that, if left unchecked, might cause an interruption in service or degradation in performance. A yellow alarm condition requires monitoring or maintenance. For example, a missing rescue configuration generates a yellow system alarm.

Term: Alarm types
Definition:
• Chassis alarm: Predefined alarm triggered by a physical condition on the device such as a power supply failure or excessive component temperature.
• Interface alarm: Alarm you configure to alert you when an interface link is down. Applies to ethernet, fibre-channel, and management-ethernet interfaces. You can configure a red (major) or yellow (minor) alarm for the link-down condition, or have the condition ignored.
• System alarm: Predefined alarm that might be triggered by a missing rescue configuration, failure to install a license for a licensed software feature, or high disk usage.
Chassis Alarm Messages on a QFX3500 Device
Chassis alarms indicate a failure on the device or one of its components. Chassis alarms
are preset and cannot be modified.
The chassis alarm message count is displayed on the LCD panel on the front of the device.
To view the chassis alarm message text remotely, use the show chassis lcd CLI command.
• Major (red)—Indicates a critical situation on the device that has resulted from one of
the conditions described in Table 6 on page 11. A red alarm condition requires
immediate action.
Table 6: QFX3500 Chassis Alarm Messages

Component: Fans
  Alarm type: Major (red)
    Message: Fan/Blower Absent
    Action: The fan is missing. Install a fan.
    Message: Fan I2C Failure
    Action: Check the system log for one of the following messages and report the error message to customer support:
    Message: fan-number Not Spinning Fan
    Action: Remove and check the fan for obstructions, and then reinsert the fan. If the problem persists, replace the fan.

Component: Power Supplies
  Alarm type: Major (red)
    Message: PEM pem-number Airflow not matching Chassis Airflow
    Action: The power supply airflow direction is the opposite of the chassis airflow direction. Replace the power supply with a power supply that supports the same airflow direction as the chassis.
    Message: PEM pem-number I2C Failure
    Action: Check the system log for one of the following messages and report the error message to customer support:
  Alarm type: Minor (yellow)
    Message: PEM pem-number Absent
    Action: For information only. Indicates the device was powered on with two power supplies installed, but now one is missing. The device can continue to operate with a single power supply. If you wish to remove this alarm message, reboot the device with one power supply.
    Message: PEM pem-number is not powered
    Action: For information only. Check the power cord connection and reconnect it if necessary.
    Message: PEM pem-number Power Supply Type Mismatch
    Action: For information only. Indicates that an AC power supply and DC power supply have been installed in the same chassis. If you wish to remove this alarm message, reboot the device with two AC power supplies or two DC power supplies.

Component: Temperature Sensors
  Alarm type: Major (red)
    Message: sensor-location Temp Sensor Fail
    Action: Check the system log for the following message and report it to customer support:
  Alarm type: Minor (yellow)
    Message: sensor-location Temp Sensor Too Warm
    Action: For information only. Check environmental conditions and alarms on other devices. Ensure that environmental factors (such as hot air blowing around the equipment) are not affecting the temperature sensor.
• alarm
Interface Alarm Messages
Interface alarms are alarms that you configure to alert you when an interface is down.
By default, major alarms are configured for interface link-down conditions on the control
plane and management network interfaces in a QFabric system. The link-down alarms
indicate that connectivity to the control plane network is down. You can configure these
alarms to be ignored using the alarm statement at the [edit chassis] hierarchy level.
System Utilization Alarms
QFX Series devices provide system alarms that alert you when disk usage in the /var
partition exceeds acceptable levels.
When usage of the /var partition exceeds 75 percent, you can display the messages for
these alarms by issuing the show system alarms operational mode command. A usage level
between 76 and 90 percent indicates high usage and raises a minor alarm condition,
whereas a usage level above 90 percent indicates that the partition is full and raises a
major alarm condition.
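The thresholds above can be summarized in a few lines of Python. This is a minimal sketch for illustration, not the Junos OS implementation: the function name classify_var_usage is hypothetical, and the handling of fractional values (anything above 75 percent is treated as minor) is an assumption, because the text states the ranges only in whole percentages.

```python
def classify_var_usage(percent_used: float) -> str:
    """Map /var partition usage (percent) to the alarm severity described in the text."""
    if not 0 <= percent_used <= 100:
        raise ValueError("percent_used must be between 0 and 100")
    if percent_used > 90:
        return "major"   # partition is considered full
    if percent_used > 75:
        return "minor"   # high usage
    return "none"        # no system utilization alarm


for sample in (50, 75, 76, 90, 91):
    print(sample, classify_var_usage(sample))
```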
The following sample output from the show system alarms command shows system
alarm messages that are displayed when disk usage is exceeded on the switch.
Administration
• Routine Monitoring Using the CLI
Monitoring SNMP
Junos OS operational mode provides several commands for monitoring SNMP
information, including:
• show snmp health-monitor, which displays the health monitor log and alarm information.
• show snmp mib, which displays information from the MIBs, such as device and system
information.
• show snmp statistics, which displays SNMP statistics such as the number of packets,
silent drops, and invalid output values.
• show snmp rmon, which displays the RMON alarm, event, history, and log information.
The following example provides sample output from the show snmp health-monitor
command:
jnxOperatingBuffer.9.1.0.0 35 active
The following example provides sample output from the show snmp mib command:
The following example provides sample output from the show snmp statistics command:
SNMP statistics:
  Input:
    Packets: 0, Bad versions: 0, Bad community names: 0,
    Bad community uses: 0, ASN parse errors: 0,
    Too bigs: 0, No such names: 0, Bad values: 0,
    Read onlys: 0, General errors: 0,
    Total request varbinds: 0, Total set varbinds: 0,
    Get requests: 0, Get nexts: 0, Set requests: 0,
    Get responses: 0, Traps: 0,
    Silent drops: 0, Proxy drops: 0, Commit pending drops: 0,
    Throttle drops: 0, Duplicate request drops: 0
  Output:
    Packets: 0, Too bigs: 0, No such names: 0,
    Bad values: 0, General errors: 0,
    Get requests: 0, Get nexts: 0, Set requests: 0,
    Get responses: 0, Traps: 0
Related Documentation
• health-monitor
• show snmp mib
Tracing SNMP Activity on a Device Running Junos OS
SNMP tracing operations track activity for SNMP agents and record the information in
log files. The logged error descriptions provide detailed information to help you solve
problems faster.
By default, Junos OS does not trace any SNMP activity. If you include the traceoptions
statement at the [edit snmp] hierarchy level, the default tracing behavior is:
• Important activities are logged in files located in the /var/log directory. Each log is
named after the SNMP agent that generates it. Currently, the following log files are
created in the /var/log directory when the traceoptions statement is used:
• chassisd
• craftd
• ilmid
• mib2d
• rmopd
• serviced
• snmpd
• When a trace file named filename reaches its maximum size, it is renamed filename.0,
then filename.1, and so on, until the maximum number of trace files is reached. Then
the oldest trace file is overwritten. (For more information about how log files are created,
see the Junos OS System Log Messages Reference.)
• Log files can be accessed only by the user who configured the tracing operation.
You cannot change the directory (/var/log) in which trace files are located. However,
you can customize the other trace file settings by including the following statements at
the [edit snmp] hierarchy level:
[edit snmp]
traceoptions {
    file <files number> <match regular-expression> <size size> <world-readable | no-world-readable>;
    flag flag;
    no-remote-trace;
}
You can configure the limits on the number and size of trace files by including the following
statements at the [edit snmp traceoptions] hierarchy level:
For example, set the maximum file size to 2 MB, and the maximum number of files to 20.
When the file that receives the output of the tracing operation (filename) reaches 2 MB,
filename is renamed filename.0, and a new file called filename is created. When the new
filename reaches 2 MB, filename.0 is renamed filename.1 and filename is renamed
filename.0. This process repeats until there are 20 trace files. Then the oldest file
(filename.19) is overwritten by the newest file (filename.0).
The number of files can be from 2 through 1000 files. The file size of each file can be from
10 KB through 1 gigabyte (GB).
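The rotation scheme described above can be modeled in a few lines of Python. This is a simplified sketch for illustration, not how Junos OS implements tracing: the function name rotate is hypothetical, file contents are represented by strings, and max_files is assumed to limit only the numbered archives (filename.0, filename.1, and so on), as in the example where filename.19 is the oldest of 20 files.

```python
def rotate(archives: list[str], active: str, max_files: int) -> list[str]:
    """Rotate once and return the new archive list.

    archives[i] holds the content of filename.i. The active file becomes
    filename.0 and every existing archive shifts up by one index.
    """
    shifted = [active] + archives   # filename -> filename.0, filename.i -> filename.(i+1)
    return shifted[:max_files]      # the oldest archive beyond the limit is dropped


archives: list[str] = []
for generation in range(1, 6):      # five trace files fill up in turn
    archives = rotate(archives, f"trace data #{generation}", max_files=3)

for index, content in enumerate(archives):
    print(f"filename.{index}: {content}")
```

With a limit of three archives and five rotations, only the three most recent generations remain, and filename.0 always holds the newest data.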
To specify that any user can read all log files, include the file world-readable statement
at the [edit snmp traceoptions] hierarchy level:
To explicitly set the default behavior, include the file no-world-readable statement at the
[edit snmp traceoptions] hierarchy level:
You can refine the output by including the match statement at the [edit snmp traceoptions
file filename] hierarchy level and specifying a regular expression (regex) to be matched:
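To specify which trace operations are logged, include the flag statement (with one or
more tracing flags) at the [edit snmp traceoptions] hierarchy level:
[edit snmp traceoptions]
flag {
    all;
    configuration;
    database;
    events;
    general;
    interface-stats;
    nonvolatile-sets;
    pdu;
    policy;
    protocol-timeouts;
    routing-socket;
    server;
    subagent;
    timer;
    varbind-error;
}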
To display the end of the log for an agent, issue the show log agentd | last operational
mode command:
[edit]
user@host# run show log agentd | last
• Configuring SNMP
5 monitor
jnxOperatingCPU.9.1.0.0 5 falling threshold
Event
Index Type Last Event
1 log and trap 2010-07-10 11:34:17 PDT
Event Index: 1
Description: Event 1 triggered by Alarm 5, rising threshold (90) crossed,
(variable: jnxOperatingCPU.9.1.0.0, value: 100)
Time: 2010-07-10 11:34:07 PDT
Description: Event 1 triggered by Alarm 5, falling threshold (75) crossed,
(variable: jnxOperatingCPU.9.1.0.0, value: 5)
Time: 2010-07-10 11:34:17 PDT
Meaning
The display shows that an alarm has been defined to monitor jnxRmon MIB object
jnxOperatingCPU, which represents the CPU utilization of the Routing Engine. The alarm
is configured to generate an event that sends an SNMP trap and adds an entry to the
logTable in the RMON MIB. The log table shows that two occurrences of the event have
been generated—one for rising above a threshold of 90 percent, and one for falling below
a threshold of 75 percent.
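The rising and falling thresholds act as a hysteresis pair: after a rising event, no further rising event is logged until the value has dropped to the falling threshold. The following Python sketch illustrates that idea with the thresholds from the sample output (90 and 75). It is a simplified model, not the SNMP agent's code. The function name and the sample values are hypothetical, and the startup behavior is an assumption: the falling alarm is armed only after a rising event has occurred.

```python
def rmon_events(samples, rising, falling):
    """Return a list of (kind, value) tuples, one per threshold crossing."""
    armed_rising = True    # a rising event may be generated
    armed_falling = False  # assumption: falling is armed only after a rising event
    events = []
    for value in samples:
        if value >= rising and armed_rising:
            events.append(("rising", value))
            armed_rising, armed_falling = False, True
        elif value <= falling and armed_falling:
            events.append(("falling", value))
            armed_rising, armed_falling = True, False
    return events


# Hypothetical CPU utilization samples, in percent.
events = rmon_events([50, 100, 95, 5, 10], rising=90, falling=75)
```

With these samples, only the values 100 and 5 produce events, which corresponds to the two log entries in the display.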
To display a log file stored on a single-chassis system such as the QFX3500 switch, enter
Junos OS CLI operational mode and issue the following commands:
By default, the commands display the file stored on the local Routing Engine.
The following example shows the output from the file show command. The file in the
pathname /var/log/processes has been previously configured to include messages from
the daemon facility.
user@switch1> file show /var/log/processes
Feb 22 08:58:24 switch1 snmpd[359]: SNMPD_TRAP_WARM_START: trap_generate_warm:
SNMP trap: warm start
Feb 22 20:35:07 switch1 snmpd[359]: SNMPD_THROTTLE_QUEUE_DRAINED:
trap_throttle_timer_handler: cleared all throttled traps
Feb 23 07:34:56 switch1 snmpd[359]: SNMPD_TRAP_WARM_START: trap_generate_warm:
SNMP trap: warm start
Feb 23 07:38:19 switch1 snmpd[359]: SNMPD_TRAP_COLD_START: trap_generate_cold:
The following example shows the output from the show log messages command:
Sample Output
Nov 4 11:30:01 switch1 newsyslog[2283]: logfile turned over due to size>128K
Nov 4 11:30:01 switch1 newsyslog[2283]: logfile turned over due to size>128K
Nov 4 11:30:06 switch1 chassism[952]: CM ENV Monitor: set fan speed is 65 percent
for Fan 1
Nov 4 11:30:06 switch1 chassism[952]: CM ENV Monitor: set fan speed is 65 percent
for Fan 2
Nov 4 11:30:06 switch1 chassism[952]: CM ENV Monitor: set fan speed is 65 percent
for Fan 3
...
Nov 4 11:52:53 switch1 snmpd[944]: SNMPD_HEALTH_MON_INSTANCE: Health Monitor:
jroute daemon
memory usage (Management process): new instance detected (variable:
sysApplElmtRunMemory.5.6.2293)
Nov 4 11:52:53 switch1 snmpd[944]: SNMPD_HEALTH_MON_INSTANCE: Health Monitor:
jroute daemon
memory usage (Command-line interface): new instance detected (variable:
sysApplElmtRunMemory.5.8.2292)
...
Nov 4 12:10:24 switch1 mgd[2293]: UI_CMDLINE_READ_LINE: User 'jsmith', command
'exit '
Nov 4 12:10:27 switch1 mgd[2293]: UI_DBASE_LOGOUT_EVENT: User 'jsmith' exiting
configuration mode
Nov 4 12:10:31 switch1 mgd[2293]: UI_CMDLINE_READ_LINE: User 'jsmith', command
'show log messages
Meaning The sample output shows the following entries in the messages file:
• A new log file was created when the previous file reached the maximum size of
128 kilobytes (KB).
• clear log
• show log
• syslog
To help with the diagnosis of a problem, display real-time statistics about the traffic
passing through physical interfaces on the router or switch.
1. Displaying Real-Time Statistics About All Interfaces on the Router or Switch
2. Displaying Real-Time Statistics About an Interface on the Router or Switch
Action To display real-time statistics about traffic passing through all interfaces on the router
or switch:
Sample Output
user@host> monitor interface traffic
host name                    Seconds: 15                 Time: 12:31:09
Interface  Link   Input packets  (pps)   Output packets  (pps)
so-1/0/0   Down               0    (0)                0    (0)
so-1/1/0   Down               0    (0)                0    (0)
so-1/1/1   Down               0    (0)                0    (0)
so-1/1/2   Down               0    (0)                0    (0)
so-1/1/3   Down               0    (0)                0    (0)
t3-1/2/0   Down               0    (0)                0    (0)
t3-1/2/1   Down               0    (0)                0    (0)
t3-1/2/2   Down               0    (0)                0    (0)
t3-1/2/3   Down               0    (0)                0    (0)
so-2/0/0   Up            211035    (1)            36778    (0)
so-2/0/1   Up            192753    (1)            36782    (0)
so-2/0/2   Up            211020    (1)            36779    (0)
so-2/0/3   Up            211029    (1)            36776    (0)
so-2/1/0   Up            189378    (1)            36349    (0)
so-2/1/1   Down               0    (0)            18747    (0)
so-2/1/2   Down               0    (0)            16078    (0)
so-2/1/3   Up                 0    (0)            80338    (0)
at-2/3/0   Up                 0    (0)                0    (0)
at-2/3/1   Down               0    (0)                0    (0)
Bytes=b, Clear=c, Delta=d, Packets=p, Quit=q or ESC, Rate=r, Up=^U, Down=^D
Meaning The sample output displays traffic data for active interfaces and the amount that each
field has changed since the command started or since the counters were cleared by using
the C key. In this example, the monitor interface command has been running for 15 seconds
since the command was issued or since the counters last returned to zero.
Action To display traffic passing through an interface on the router or switch, use the following
Junos OS CLI operational mode command:
Sample Output
user@host> monitor interface so-0/0/1
Next='n', Quit='q' or ESC, Freeze='f', Thaw='t', Clear='c', Interface='i'
R1
Interface: so-0/0/1, Enabled, Link is Up
Encapsulation: PPP, Keepalives, Speed: OC3
Traffic statistics:
Input bytes: 5856541 (88 bps)
Output bytes: 6271468 (96 bps)
Input packets: 157629 (0 pps)
Output packets: 157024 (0 pps)
Encapsulation statistics:
Input keepalives: 42353
Output keepalives: 42320
LCP state: Opened
Error statistics:
Input errors: 0
Input drops: 0
Input framing errors: 0
Input runts: 0
Input giants: 0
Policed discards: 0
L3 incompletes: 0
L2 channel errors: 0
L2 mismatch timeouts: 0
Carrier transitions: 1
Output errors: 0
Output drops: 0
Aged packets: 0
Active alarms : None
Active defects: None
SONET error counts/seconds:
LOS count 1
LOF count 1
SEF count 1
ES-S 77
SES-S 77
SONET statistics:
BIP-B1 0
BIP-B2 0
REI-L 0
BIP-B3 0
REI-P 0
Received SONET overhead: F1 : 0x00 J0 : 0xZ
Meaning The sample output shows the input and output packets for a particular SONET interface
(so-0/0/1). The information can include common interface failures, such as SONET/SDH
and T3 alarms, loopbacks detected, and increases in framing errors. For more information,
see Checklist for Tracking Error Conditions.
To control the output of the command while it is running, use the keys shown in
Table 8.
Key  Action
N    Display information about the next interface. The monitor interface command
     scrolls through the physical or logical interfaces in the same order that they
     are displayed by the show interfaces terse command.
C    Clear (zero) the current delta counters since monitor interface was started. It
     does not clear the accumulative counter.
See the CLI Explorer for details on using match conditions with the monitor traffic
command.
Pinging Hosts
Purpose Use the CLI ping command to verify that a host can be reached over the network. This
command is useful for diagnosing host and network connectivity problems. The switch
sends a series of Internet Control Message Protocol (ICMP) echo (ping) requests to a
specified host and receives ICMP echo responses.
Action To use the ping command to send four requests (ping count) to host3:
Sample Output
ping host3 count 4
user@switch> ping host3 count 4
PING host3.site.net (176.26.232.111): 56 data bytes
64 bytes from 176.26.232.111: icmp_seq=0 ttl=122 time=0.661 ms
64 bytes from 176.26.232.111: icmp_seq=1 ttl=122 time=0.619 ms
64 bytes from 176.26.232.111: icmp_seq=2 ttl=122 time=0.621 ms
64 bytes from 176.26.232.111: icmp_seq=3 ttl=122 time=0.634 ms
• Sequence number of the ping response packet. You can use this value to match the
ping response to the corresponding ping request.
• Total time between the sending of the ping request packet and the receiving of the
ping response packet, in milliseconds. This value is also called round-trip time.
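When ping output is captured to a file or collected by a script, the sequence number, TTL, and round-trip time can be extracted from each response line. The following Python sketch parses the sample output shown above. The regular expression and the function name are assumptions written for this example, and the pattern covers only the response-line format that appears in the sample.

```python
import re

# Matches the fields of one response line, for example:
# "64 bytes from 176.26.232.111: icmp_seq=0 ttl=122 time=0.661 ms"
PING_LINE = re.compile(r"icmp_seq=(\d+) ttl=(\d+) time=([\d.]+) ms")

SAMPLE = """\
PING host3.site.net (176.26.232.111): 56 data bytes
64 bytes from 176.26.232.111: icmp_seq=0 ttl=122 time=0.661 ms
64 bytes from 176.26.232.111: icmp_seq=1 ttl=122 time=0.619 ms
64 bytes from 176.26.232.111: icmp_seq=2 ttl=122 time=0.621 ms
64 bytes from 176.26.232.111: icmp_seq=3 ttl=122 time=0.634 ms
"""


def parse_ping(output):
    """Return a list of (sequence, ttl, round_trip_ms) tuples, one per response."""
    replies = []
    for line in output.splitlines():
        match = PING_LINE.search(line)
        if match:
            seq, ttl, rtt = match.groups()
            replies.append((int(seq), int(ttl), float(rtt)))
    return replies


replies = parse_ping(SAMPLE)
rtts = [rtt for _, _, rtt in replies]
average_rtt = sum(rtts) / len(rtts)  # mean round-trip time in milliseconds
```

A gap in the sequence numbers of the parsed list indicates a request that received no response.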
Troubleshooting
• Configuration and File Management
• Ethernet Switching
• High Availability
• Interfaces
• Junos OS Basics
• Layer 3 Protocols
• Security
• Services
• Traffic Management
You can use the rollback <number> command to return to a previously committed
configuration file. A switch saves the last 50 committed configurations, including the
rollback number, date, time, and name of the user who issued the commit configuration
command.
Syntax
rollback <number>
Options
• Range: 0 through 49. The most recently saved configuration is number 0, and the
oldest saved configuration is number 49.
• Default: 0
1. Specify the rollback number (here, 1 is entered and the configuration returns to the
previously committed configuration):
[edit]
user@switch# rollback 1
load complete
[edit]
user@switch# commit
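The rollback numbering can be pictured as a fixed-length history in which index 0 is always the most recent commit and the oldest entry falls off the end once 50 configurations are stored. The following Python sketch models only that numbering. The class and method names are hypothetical, and it does not reflect how Junos OS stores the files.

```python
from collections import deque

MAX_SAVED = 50  # the text above states that the last 50 commits are saved


class CommitHistory:
    """Toy model of rollback numbering: index 0 is the newest saved commit."""

    def __init__(self):
        # With maxlen set, appendleft() discards the oldest entry when full.
        self._saved = deque(maxlen=MAX_SAVED)

    def commit(self, config):
        self._saved.appendleft(config)

    def rollback(self, number=0):
        if not 0 <= number < MAX_SAVED:
            raise ValueError("rollback number must be 0 through 49")
        return self._saved[number]  # IndexError if fewer commits are stored
```

After 60 commits in this model, rollback(0) returns the most recent configuration and rollback(49) returns the oldest one still stored; the first 10 commits are no longer available.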
If for any reason the current active configuration fails, you can revert to the default factory
configuration. The default factory configuration contains the basic configuration settings.
This is the first configuration of the switch, and it is loaded when the switch is first installed
and powered on.
The load factory-default command is a standard Junos OS configuration command. This
configuration command replaces the current active configuration with the default factory
configuration.
1. [edit]
user@switch# load factory-default
[edit]
user@switch# delete system commit factory-settings
[edit]
user@switch# commit
[edit]
user@switch# load override filename
[edit]
user@switch# commit filename
The following error message is displayed during a typical operation on the switch after
the file storage space is full.
user@switch% cli
user@switch> configure
/var: write failed, filesystem is full
Solution Clean up the file storage on the switch by deleting system files.
Ethernet Switching
Sometimes silent devices, such as syslog servers or SNMP trap receivers that receive
UDP traffic but do not return acknowledgment (ACK) messages to the traffic source,
fail to send gratuitous ARP packets when a device moves. If such a move occurs when
the system administrator is not available to explicitly clear the affected interfaces by
issuing the clear ethernet-switching table command, the entry for the moved device in
the Ethernet switching table is not updated.
1. Reduce the system-wide ARP aging timer. (By default, the ARP aging timer is set at
20 minutes. The range of the ARP aging timer is from 1 through 240 minutes.)
[edit system arp]
user@switch# set aging-timer 3
2. Set the MAC aging timer to the same value as the ARP timer. (By default, the MAC
aging timer is set to 300 seconds. The range is 15 to 1,000,000 seconds.)
[edit vlans]
user@switch# set sales mac-table-aging-time 180
The ARP entry and the MAC address entry for the moved device expire within the times
specified by the aging timer values. After the entries expire, the switch sends a new ARP
message to the IP address of the device. The device responds to the ARP message,
thereby refreshing the entries in the switch’s ARP cache table and Ethernet switching
table.
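The two timers use different units: the ARP aging timer is configured in minutes and the MAC aging timer in seconds, so the 3-minute ARP value in the example corresponds to the MAC value of 180. The following Python helper is a small sketch of that conversion, using the ranges quoted in the steps above. The function name is hypothetical.

```python
ARP_MINUTES = range(1, 241)        # 1 through 240 minutes
MAC_SECONDS = range(15, 1000001)   # 15 through 1,000,000 seconds


def matching_mac_aging_time(arp_minutes):
    """Return the MAC aging time, in seconds, that equals the ARP aging timer."""
    if arp_minutes not in ARP_MINUTES:
        raise ValueError("ARP aging timer must be 1 through 240 minutes")
    seconds = arp_minutes * 60
    if seconds not in MAC_SECONDS:
        raise ValueError("result is outside the MAC aging timer range")
    return seconds
```

For example, matching_mac_aging_time(3) returns 180, the value used in step 2.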
Related Documentation
• arp
• mac-table-aging-time
High Availability
Troubleshooting VRRP
Problem If you configure multiple VRRP groups on an interface (using multiple VLANs), traffic for
some of the groups might be briefly dropped if a failover occurs. This can happen because
the new master must send gratuitous ARP replies for each VRRP group to update the
ARP tables in the connected devices, and there is a short delay between each gratuitous
ARP reply. Traffic sent by devices that have not yet received the gratuitous ARP reply is
dropped (until the device receives the reply and learns the MAC address of the new
master).
Solution Configure a failover delay so that the new master delays sending gratuitous ARP replies
for the period that you set. This allows the new master to send the ARP replies for all of
the VRRP groups simultaneously.
Related Documentation
• failover-delay
Interfaces
• Verify that a LAG is part of family ethernet-switching (Layer 2 LAG) or family inet (Layer
3 LAG).
• Verify that the LAG member is connected to the correct LAG at the other end.
The interface on the port in which an SFP or SFP+ transceiver is installed in an SFP or SFP+
module is down
Problem The QFX Series has an SFP or SFP+ module installed. The interface on the port in which
an SFP or SFP+ transceiver is installed is down.
When you check the status with the CLI command show interfaces interface-name, the
disabled port is not listed.
Cause By default, the SFP or SFP+ module operates in the 10-Gigabit Ethernet mode and
supports only SFP or SFP+ transceivers. The operating mode for the module is incorrectly
set.
Solution Only SFP or SFP+ transceivers can be installed in SFP or SFP+ modules. You must
configure the operating mode of the SFP or SFP+ module to match the type of transceiver
you want to use. For SFP+ transceivers, configure 10-Gigabit Ethernet operating mode.
• MAC Addresses Learned on MC-AE Interfaces Are Not Removed from the MAC Address
Table
• MC-LAG Peer Does Not Go into Standby Mode
• Secondary MC-LAG Peer with Status Control Set to Standby Becomes Inactive
• Redirect Filters Take Priority over User-Defined Filters
• Operational Command Output Is Wrong
• ICCP Connection Might Take Up to 60 Seconds to Become Active
• MAC Address Age Learned on an MC-AE Interface Is Reset to Zero
• MAC Address Is Not Learned Remotely in a Default VLAN
• Snooping Entries Learned on MC-AE Interfaces Are Not Removed
• ICCP Does Not Come Up After You Add or Delete an Authentication Key
• Local Status Is Standby When It Should Be Active
• Packets Loop on the Server When ICCP Fails
• Both MC-LAG Peers Use the Default System ID After a Reboot or an ICCP Configuration
Change
• No Commit Checks Are Done for ICL-PL Interfaces
• Double Failover Scenario
• Multicast Traffic Floods the VLAN When the ICL-PL Interface Goes Down and Up
• Layer 3 Traffic Sent to the Standby MC-LAG Peer Is Not Redirected to Active MC-LAG
Peer
• AE Interfaces Go Down
• Flooding of Upstream Traffic
MAC Addresses Learned on MC-AE Interfaces Are Not Removed from the MAC Address Table
Problem When both of the multichassis aggregated Ethernet (MC-AE) interfaces on both
connected multichassis link aggregation group (MC-LAG) peers are down, the MAC
addresses learned on the MC-AE interfaces are not removed from the MAC address table.
For example, if you disable the MC-AE interface (ae0) on both MC-LAG peers by issuing
the set interfaces ae0 disable command and commit the configuration, the MAC table
still shows the MAC addresses as being learned on the MC-AE interfaces of both MC-LAG
peers:
Solution To prevent failure to enter standby mode, make sure the peer IP address in the ICCP
configurations and the IP address in multichassis protection configurations are the same.
Secondary MC-LAG Peer with Status Control Set to Standby Becomes Inactive
Problem When the interchassis control link-protection link (ICL-PL) and multichassis aggregated
Ethernet (MC-AE) interfaces go down on the primary multichassis link aggregation group
(MC-LAG) peer, the secondary MC-LAG peer’s MC-AE interfaces with status control set
to standby become inactive instead of active.
For example:
The show iccp command output always shows registered modules regardless of whether
or not ICCP peers are configured.
For example, the ICL-PL has been deactivated, and the show ethernet-switching table
command output shows that the MAC addresses have an age of 0.
ICCP Does Not Come Up After You Add or Delete an Authentication Key
Problem The Interchassis Control Protocol (ICCP) connection is not established when you add
an authentication key and then delete it only at the global ICCP level. However,
authentication works correctly at the ICCP peer level.
Solution Delete the ICCP configuration, and then add the ICCP configuration again.
Both MC-LAG Peers Use the Default System ID After a Reboot or an ICCP Configuration Change
Problem If the Interchassis Control Protocol (ICCP) connection does not become active
after a reboot or after a new ICCP configuration has been committed, the Link Aggregation
Control Protocol (LACP) messages transmitted over the multichassis aggregated Ethernet
(MC-AE) interfaces use the default system ID. The configured system ID is used instead
of the default system ID only after the MC-LAG peers synchronize with each other.
Multicast Traffic Floods the VLAN When the ICL-PL Interface Goes Down and Up
Problem When the interchassis control link-protection link (ICL-PL) goes down and up, multicast
traffic is flooded to all of the interfaces in the VLAN. The Packet Forwarding Engine (PFE)
flag Ip4McastFloodMode for the VLAN is changed to MCAST_FLOOD_ALL. This problem
only occurs when a multichassis link aggregation group (MC-LAG) is configured for Layer
2.
Layer 3 Traffic Sent to the Standby MC-LAG Peer Is Not Redirected to Active MC-LAG Peer
Problem When the Interchassis Control Protocol (ICCP) is down, the status of a remote MC-LAG
peer is unknown. Even if the MC-LAG peer is configured as standby, the traffic is not
redirected to this peer because it is assumed that this peer is down.
AE Interfaces Go Down
Problem When a multichassis aggregated Ethernet (MC-AE) interface is converted to an
aggregated Ethernet (AE) interface, it retains some MC-AE properties. For example, the
AE interface might retain the administrative key of the MC-AE. When this happens, the
AE interface goes down.
Solution Restart the Link Aggregation Control Protocol (LACP) on the multichassis link aggregation
group (MC-LAG) peer hosting the AE interface to bring up the AE interface. Restarting
LACP removes the MC-AE properties of the AE interface.
Solution Make sure that downstream traffic is sent from the MC-LAG peers periodically to prevent
the MAC addresses from aging out.
Junos OS Basics
Similarly, to halt the switch, issue the request system halt command.
CAUTION: Before entering this command, you must have access to the
switch’s console port in order to bring up the Routing Engine.
Issuing the request system halt command on the switch halts the Routing Engine. To
reboot a Routing Engine that has been halted, you must connect through the console.
Solution If a Junos OS image already exists on the switch, you can either install the new Junos OS
package in a separate partition, in which case both Junos OS images remain on the switch,
or you can remove the existing Junos OS image before you start the new installation
process.
Press the Spacebar to enter the manual loader. The loader> prompt appears.
• format—Enables you to erase the installation media before installing the installation
package. If you do not include this option, the system installs the new Junos OS in
a different partition from that of the most recently installed Junos OS.
• external—Installs the installation package onto external media (a USB stick, for
example).
• Network address of the server and the path on the server; for example,
tftp://192.17.1.28/junos/jinstall-qfx-11.1R1.5-domestic-signed.tgz
• Junos OS package on a USB device (commonly stored in the root drive as the
only file); for example, file:///jinstall-qfx-11.1R1.5-domestic-signed.tgz
The installation now proceeds normally and ends with a login prompt.
If you forget the root password for the QFX3500 switch, you can use the password
recovery procedure to reset the root password.
NOTE: You need console access to the switch to recover the root password.
1. Power off the switch by switching off the AC power outlet of the device or, if necessary,
by pulling the power cords out of the QFX3500 switch power supplies.
2. Turn off the power to the management device, such as a PC or laptop computer, that
you want to use to access the CLI.
3. Plug one end of the Ethernet rollover cable supplied with the switch into the
RJ-45–to–DB-9 serial port adapter supplied with the switch.
4. Plug the RJ-45–to–DB-9 serial port adapter into the serial port on the management
device.
5. Connect the other end of the Ethernet rollover cable to the console port on the switch.
• Data bits: 8
• Parity: None
• Stop bits: 1
9. Power on the switch by plugging the power cords into the QFX3500 switch power
supplies, if necessary, or by switching on the AC power outlet that the device is
plugged into.
The terminal emulation screen on your management device displays the switch’s boot
sequence.
10. When the following prompt appears, press the Spacebar to access the switch’s
bootstrap loader command prompt:
Hit [Enter] to boot immediately, or space bar for command prompt.
Booting [kernel] in 9 seconds...
11. At the following prompt, enter boot -s to start up the system in single-user mode.
ok boot -s
12. At the following prompt, enter recovery to start the root password recovery procedure.
Enter full pathname of shell or 'recovery' for root password recovery or RETURN
for /bin/sh: recovery
13. Enter configuration mode in the CLI.
17. After you have finished configuring the password, commit the configuration.
root@host# commit
commit complete
If Junos OS on the QFX Series is damaged in some way that prevents the software from
loading properly, you can use an emergency boot device to repartition the primary disk
and load a fresh installation of Junos OS. Use the following procedure to create an
emergency boot device.
Before you begin, you need to download the installation media image for your device
and Junos OS release from http://www.juniper.net/customers/support/.
NOTE: In the following procedure, we assume that you are creating the
emergency boot device on a QFX device. You can create the emergency boot
device on another Juniper Networks switch or router, or any PC or laptop that
supports Linux. The steps you take to create the emergency boot device vary,
depending on the device.
1. Use FTP to copy the installation media image into the /var/tmp directory on the QFX
device.
% su
Password: password
NOTE: The password is the root password for the QFX device. If you logged
in to the device as root, you do not need to perform this step.
5. Enter the following command on the QFX3500, QFX3600, and QFX3600-I devices:
root@device% exit
% exit
user@device>
If Junos OS on your device is damaged in some way that prevents the software from
loading correctly, you may need to perform a recovery installation using an emergency
boot device (for example, a USB flash drive) to restore the default factory installation.
Once you have recovered the software, you need to restore the device configuration. You
can either create a new configuration as you did when the device was shipped from the
factory, or if you saved the previous configuration, you can simply restore that file to the
device.
If at all possible, you should try to perform the following steps before you perform the
recovery installation:
1. Ensure that you have an emergency boot device to use during the installation. See
“Creating an Emergency Boot Device for a QFX Series Device” for
information on how to create an emergency boot device.
2. Copy the existing configuration in the file /config/juniper.conf.gz from the device to a
remote system, such as a server, or to an emergency boot device. For extra safety,
you can also copy the backup configurations (the files named /config/juniper.conf.n,
where n is a number from 0 through 9) to a remote system or to an emergency boot
device.
1. Insert the emergency boot device into the QFX Series device.
user@device> request system reboot
If you do not have access to the CLI, power cycle the QFX Series device.
The emergency boot device (external USB install media) is detected. At this time, you
can load the Junos OS from the emergency boot device onto the internal flash storage.
4. Type f to format the internal flash storage and install the Junos OS on the emergency
boot device onto the internal flash storage.
-- IMPORTANT INFORMATION --
Installer has detected settings to format system boot media.
This operation will erase all data from your system.
5. The device copies the software from the emergency boot device, occasionally
displaying status messages. Copying the software can take up to 12 minutes.
When the device is finished copying the software, you are presented with the following
prompt:
*** Fri September 4 01:19:00 UTC 2012***
Installation successful..
Please select one of the following options:
Reboot to installed Junos after removing install media (default) ... 1
Reboot to installed Junos by disabling install media ............... 2
Exit to installer debug shell ...................................... 3
Install Junos to alternate slice ................................... 4
Your choice: 4
NOTE: System installer will now install Junos to alternate slice
Do not power off or remove the external installer media or
interrupt the installation mechanism.
6. Select 4 to install Junos OS to the alternate slice of the partition, and then press Enter.
7. Remove the emergency boot device when prompted and then press Enter. The device
then reboots from the internal flash storage on which the software was just installed.
When the reboot is complete, the device displays the login prompt.
8. Create a new configuration as you did when the device was shipped from the factory,
or restore the previously saved configuration file to the device.
Related Documentation
• Creating an Emergency Boot Device for a QFX Series Device
Layer 3 Protocols
If you enable route leaking between the routing instances (by using the rib-group
statement, for example), the downstream device cannot connect to the upstream device
because the QFX switch connects to the upstream device over a direct route and these
routes are not leaked between instances.
NOTE: You can see a route to the upstream device in the routing table of the
downstream device, but this route is not functional.
Indirect routes are leaked between routing instances, so the downstream device can
connect to any upstream devices that are connected to the QFX switch over indirect
routes.
• rib-group
Security
A switch returns this message during the commit operation if the firewall filter that has
been applied to a port, VLAN, or Layer 3 interface exceeds the amount of space available
in the TCAM table. The filter is not applied, but the commit operation for the firewall filter
configuration is completed in the CLI module.
Solution When a firewall filter configuration exceeds the amount of available TCAM table space,
you must configure a new firewall filter with fewer filter terms so that the space
requirements for the filter do not exceed the available space in the TCAM table.
You can perform either of the following procedures to correct the problem:
To delete the filter and its binding and apply the new smaller firewall filter to the same
binding:
1. Delete the filter and its binding to ports, VLANs, or Layer 3 interfaces. For example:
[edit]
user@switch# delete firewall family ethernet-switching filter ingress-vlan-rogue-block
user@switch# delete vlans employee-vlan description "filter to block rogue devices on
employee-vlan"
user@switch# delete vlans employee-vlan filter input ingress-vlan-rogue-block
2. Commit the changes:
[edit]
user@switch# commit
3. Configure a smaller filter with fewer terms that does not exceed the amount of
available TCAM space. For example:
[edit]
user@switch# set firewall family ethernet-switching filter new-ingress-vlan-rogue-block ...
4. Apply (bind) the new firewall filter to a port, VLAN, or Layer 3 interface. For example:
[edit]
user@switch# set vlans employee-vlan description "filter to block rogue devices on
employee-vlan"
user@switch# set vlans employee-vlan filter input new-ingress-vlan-rogue-block
5. Commit the changes:
[edit]
user@switch# commit
To apply a new firewall filter and overwrite the existing binding but not delete the original
filter:
1. Configure a firewall filter with fewer terms than the original filter:
[edit]
user@switch# set firewall family ethernet-switching filter new-ingress-vlan-rogue-block...
2. Apply the firewall filter to the port, VLAN, or Layer 3 interfaces to overwrite the binding
of the original filter—for example:
[edit]
user@switch# set vlans employee-vlan description "smaller filter to block rogue devices on
employee-vlan"
user@switch# set vlans employee-vlan filter input new-ingress-vlan-rogue-block
Because you can apply no more than one firewall filter per VLAN per direction, the
binding of the original firewall filter to the VLAN is overwritten with the new firewall
filter new-ingress-vlan-rogue-block.
[edit]
user@switch# commit
NOTE: The original filter is not deleted and is still available in the
configuration.
• You configure the filter that is applied to packets first to discard certain packets. For
example, imagine that you have a VLAN filter that accepts packets sent to 10.10.1.0/24
addresses and implicitly discards packets sent to any other addresses. You apply the
filter to the admin VLAN in the output direction, and interface xe-0/0/1 is a member
of that VLAN.
• You configure a subsequent filter to accept and count packets that are dropped by
the first filter. In this example, you have a port filter that accepts and counts packets
sent to 192.168.1.0/24 addresses that is also applied to xe-0/0/1 in the output direction.
The egress VLAN filter is applied first and correctly discards packets sent to 192.168.1.0/24
addresses. The egress port filter is applied next and counts the discarded packets as
matched packets. The packets are not forwarded, but the counter displayed by the egress
port filter is incorrect.
Remember that the order in which filters are applied depends on the direction in which
they are applied, as indicated here:
Ingress filters:
2. VLAN filter
Egress filters:
2. VLAN filter
For example:
• You configure an egress port filter with a counter for interface xe-0/0/1.
• You configure an egress VLAN filter with a counter for the admin VLAN, and interface
xe-0/0/1 is a member of that VLAN.
In this case, the packet is counted by only one of the counters even though it matched
both filters.
• Assume that your filter has term1, term2, and term3, and each term has a counter that
has already counted matching packets. If you edit any of the terms in any way, the
counters for all the terms are reset to 0.
• Assume that your filter has term1 and term2. Also assume that term2 has a policer
action modifier and the implicit counter of the policer has already counted 1000
matching packets. If you edit term1 or term2 in any way, the counter for the policer
referenced by term2 is reset to 0.
• loss-priority
• policer
If you do so, you see the following error message when you attempt to commit the
configuration: “cannot support policer action if loss-priority is configured.”
Solution This is expected behavior. To set the Q-in-Q EtherType to 0x8100, enter the set
dot1q-tunneling ethertype 0x8100 statement at the [edit ethernet-switching-options]
hierarchy level. You must also configure the other end of the link to use the same
EtherType.
• Traffic forwarded from a secondary VLAN trunk port to a promiscuous port (trunk or
access)
• Traffic forwarded from a secondary VLAN trunk port that carries an isolated VLAN to
a PVLAN trunk port
• Traffic forwarded from a PVLAN trunk port to a secondary VLAN trunk port
If you apply a firewall filter in the output direction to a primary VLAN, the filter does not
apply to traffic that egresses with a community VLAN tag, as listed below:
• Traffic forwarded from a secondary VLAN trunk port that carries a community VLAN
to a PVLAN trunk port
If you apply a firewall filter in the output direction to a community VLAN, the following
behaviors apply:
• The filter is applied to traffic forwarded from a promiscuous port (trunk or access) to
a community trunk port (because the traffic egresses with the community VLAN tag).
• The filter is applied to traffic forwarded from a community port to a PVLAN trunk port
(because the traffic egresses with the community VLAN tag).
• The filter is not applied to traffic forwarded from a community port to a promiscuous
port (because the traffic egresses with the primary VLAN tag or untagged).
Solution These are expected behaviors. They occur only if you apply a firewall filter to a private
VLAN in the output direction and do not occur if you apply a firewall filter to a private
VLAN in the input direction.
• Assume that you configure egress filters that include a total of 512 policers and no
counters. Later in your configuration file you include another egress filter with 10 terms,
1 of which has a counter action modifier. None of the terms in this filter are committed
because there is not enough TCAM space for the counter.
• Assume that you configure egress filters that include a total of 500 policers, so 1000
TCAM entries are occupied. Later in your configuration file you include the following
two egress filters:
• Filter A with 20 terms and 20 counters. All the terms in this filter are committed
because there is enough TCAM space for all the counters.
• Filter B comes after Filter A and has five terms and five counters. None of the terms
in this filter are committed because there is not enough memory space for all the
counters. (Five TCAM entries are required but only four are available.)
Solution You can prevent this problem by ensuring that egress firewall filter terms with counter
actions are placed earlier in your configuration file than terms that include policers. In
this circumstance, Junos OS commits policers even if there is not enough TCAM space
for the implicit counters. For example, assume the following:
• You have 1024 egress firewall filter terms with counter actions.
• Later in your configuration file you have an egress filter with 10 terms. None of the terms
have counters but one has a policer action modifier.
You can successfully commit the filter with 10 terms even though there is not enough
TCAM space for the implicit counters of the policer. The policer is committed without
the counters.
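The scenarios above can be replayed with a small model. The following Python sketch is illustrative only: the 1024-entry capacity, the cost of two entries per policer, and the cost of one entry per counter are inferred from the numbers in the examples, and the function and filter names are hypothetical (Junos OS does not expose such an interface). It also simplifies by treating each filter as a single unit.

```python
TCAM_ENTRIES = 1024      # assumed egress capacity, inferred from the examples
ENTRIES_PER_POLICER = 2  # assumed: 500 policers occupy 1000 entries
ENTRIES_PER_COUNTER = 1  # assumed: five counters need five entries


def commit_egress_filters(filters):
    """Replay egress filters in configuration-file order.

    Each filter is a tuple (name, policers, counters). Returns a dict that
    maps each filter name to True (committed) or False (not committed).
    """
    used = 0
    result = {}
    for name, policers, counters in filters:
        counter_need = counters * ENTRIES_PER_COUNTER
        if counter_need > TCAM_ENTRIES - used:
            # Counters do not fit, so none of the filter's terms commit.
            result[name] = False
            continue
        used += counter_need
        # Policers commit even if their implicit counters do not fit,
        # so usage is capped at the capacity instead of failing.
        used = min(TCAM_ENTRIES, used + policers * ENTRIES_PER_POLICER)
        result[name] = True
    return result


# Second scenario above: 500 policers, then Filter A, then Filter B.
print(commit_egress_filters([("policers", 500, 0), ("A", 0, 20), ("B", 0, 5)]))
```

Swapping the order so that the counter-bearing filters come first reproduces the workaround described in the solution: every filter commits, and only the implicit policer counters are lost.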
If packets are dropped because of ingress admission control, policer statistics might not
show the number of packet drops you would expect by calculating the difference between
ingress and egress packet counts. This might happen if you apply an ingress policer to
multiple interfaces, and the aggregate ingress rate of those interfaces exceeds the line
rate of a common egress interface. In this case, packets might be dropped from the
ingress buffer. These drops are not included in the count of packets dropped by the
policer, which causes policer statistics to underreport the total number of drops.
• Assume that your filter has term1, term2, and term3, and each term has a counter that
has already counted matching packets. If you edit any of the terms in any way, the
counters for all the terms are reset to 0.
• Assume that your filter has term1 and term2. Also assume that term2 has a policer
action modifier and the implicit counter of the policer has already counted 1000
matching packets. If you edit term1 or term2 in any way, the counter for the policer
referenced by term2 is reset to 0.
Egress Policers on QFX3500 Devices Might Allow More Throughput Than Is Configured
Problem If you configure a policer to rate-limit throughput and apply it on egress to multiple
interfaces on a QFX3500 switch or Node, the measured aggregate policed rate might
be twice the configured rate, depending on which interfaces you apply the policer to. The
doubling of the policed rate occurs if you apply a policer to multiple interfaces and both
of the following are true:
• There is at least one policed interface in the range xe-0/0/0 to xe-0/0/23 or the range
xe-0/1/1 to xe-0/1/7.
• There is at least one policed interface in the range xe-0/0/24 to xe-0/0/47 or the range
xe-0/1/8 to xe-0/1/15.
For example, if you configure a policer to rate-limit traffic at 1 Gbps and apply the policer
(by using a firewall filter) to xe-0/0/0 and xe-0/0/24 in the output direction, each
interface is rate-limited at 1 Gbps, for a total allowed throughput of 2 Gbps. The same
behavior occurs if you apply the policer to xe-0/1/1 and xe-0/0/24—each interface is
rate-limited at 1 Gbps.
If you apply the same policer on egress to multiple interfaces in these groups, each group
is rate-limited at 1 Gbps. For example, if you apply the policer to xe-0/0/0 through
xe-0/0/4 (five interfaces) and xe-0/0/24 through xe-0/0/33 (ten interfaces), each
group is rate-limited at 1 Gbps, for a total allowed throughput of 2 Gbps.
Here is another example: If you apply the policer to xe-0/0/0 through xe-0/0/4 and
xe-0/1/1 through xe-0/1/5 (a total of ten interfaces), that group is rate-limited at 1 Gbps
in aggregate. If you also apply the policer to xe-0/0/24, that one interface is rate-limited
at 1 Gbps while the other ten are still rate-limited at 1 Gbps in aggregate.
Interfaces xe-0/1/1 through xe-0/1/15 are physically located on the QSFP+ uplink ports,
according to the following scheme:
The doubling of the policed rate occurs only if the policer is applied in the output direction.
If you configure a policer as described above but apply it in the input direction, the total
allowed throughput for all interfaces is 1 Gbps.
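The grouping rule can be checked before you apply a policer. The Python sketch below is a planning aid under stated assumptions: the group labels "A" and "B" and both function names are hypothetical, and the port ranges are copied from the two bullets above. It covers the output direction only.

```python
import re


def port_group(interface):
    """Return the hypothetical group label ('A' or 'B') for an interface."""
    match = re.fullmatch(r"xe-0/([01])/(\d+)", interface)
    if match is None:
        raise ValueError(f"unrecognized interface name: {interface}")
    pic, port = int(match.group(1)), int(match.group(2))
    if (pic == 0 and 0 <= port <= 23) or (pic == 1 and 1 <= port <= 7):
        return "A"
    if (pic == 0 and 24 <= port <= 47) or (pic == 1 and 8 <= port <= 15):
        return "B"
    raise ValueError(f"interface outside the documented ranges: {interface}")


def egress_allowed_gbps(rate_gbps, interfaces):
    """Total egress throughput allowed by one policer shared by interfaces.

    Each group that contains at least one policed interface is rate-limited
    separately at the configured rate.
    """
    groups = {port_group(name) for name in interfaces}
    return rate_gbps * len(groups)


print(egress_allowed_gbps(1, ["xe-0/0/0", "xe-0/0/24"]))  # both groups policed
print(egress_allowed_gbps(1, ["xe-0/0/0", "xe-0/1/5"]))   # one group policed
```

If the result is twice the intended limit, one option is to keep all policed interfaces within a single group.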
Filter-Specific Egress Policers on QFX3500 Devices Might Allow More Throughput Than Is
Configured
Problem You can configure policers to be filter-specific, which means that Junos OS creates only
one policer instance regardless of how many times the policer is referenced. When you
do this, rate limiting is applied in aggregate, so if you configure a policer to discard traffic
that exceeds 1 Gbps and reference that policer in three different terms, the total bandwidth
allowed by the filter is 1 Gbps. However, the behavior of a filter-specific policer is affected
by how the firewall filter terms that reference the policer are stored in ternary content
addressable memory (TCAM). If you create a filter-specific policer and reference it in
multiple firewall filter terms, the policer allows more traffic than expected if the terms
are stored in different TCAM slices. For example, if you configure a policer to discard
traffic that exceeds 1 Gbps and reference that policer in three different terms that are
stored in three separate memory slices, the total bandwidth allowed by the filter is 3 Gbps,
not 1 Gbps.
Solution To prevent this unexpected behavior, use the information about TCAM slices presented
in Planning the Number of Firewall Filters to Create to organize your configuration file so
that all the firewall filter terms that reference a given filter-specific policer are stored in
the same TCAM slice.
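The effect of slice placement on the aggregate rate can be expressed as a one-line calculation. In this Python sketch the term names and slice numbers are hypothetical; the actual slice in which a term is stored depends on the planning rules referenced above.

```python
def filter_specific_allowed_gbps(rate_gbps, term_to_slice):
    """Bandwidth allowed by a filter-specific policer.

    term_to_slice maps each term that references the policer to the TCAM
    slice (a hypothetical index) in which that term is stored.
    """
    return rate_gbps * len(set(term_to_slice.values()))


# Three terms in three slices allow three times the configured rate.
print(filter_specific_allowed_gbps(1, {"t1": 0, "t2": 1, "t3": 2}))
# The same three terms in one slice allow the configured rate only.
print(filter_specific_allowed_gbps(1, {"t1": 0, "t2": 0, "t3": 0}))
```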
Services
The following constraints and limitations apply to local and remote port mirroring with
the QFX Series:
• You can create a total of four port-mirroring configurations on a QFX Series standalone
switch.
• You can create a total of four port-mirroring configurations on each Node group in a
QFabric system, subject to the following constraints:
• Regardless of whether you are configuring a standalone switch or a Node group, the
following limits apply:
• There can be no more than two configurations that mirror ingress traffic. (If you
configure a firewall filter to send traffic to a port mirror—that is, you use the analyzer
action modifier in a filter term—this counts as an ingress mirroring configuration for
the switch or Node group on which the filter is applied.)
• There can be no more than two configurations that mirror egress traffic.
• You can configure no more than one type of output in one port-mirroring configuration.
That is, you can use no more than one of the following to complete a set analyzer name
output statement:
• interface
• ip-address
• vlan
• If you configure Junos OS to mirror egress packets, do not configure more than 2000
VLANs on a QFX3500 device or QFabric system. If you do so, some VLAN packets
might contain incorrect VLAN IDs. This applies to any VLAN packets—not only the
mirrored copies.
• Packets with physical layer errors are filtered out and are not sent to the output port
or VLAN.
• If you use sFlow monitoring to sample traffic, it does not sample the mirror copies
when they exit from the output interface.
• Do not include an 802.1Q subinterface that has a unit number other than 0 in a port
mirroring configuration. Port mirroring does not work with subinterfaces if their unit
number is not 0. (You configure 802.1Q subinterfaces using the vlan-tagging statement.)
• When packet copies are sent out the output interface, they are not modified for any
changes that are normally applied on egress, such as CoS rewriting.
• An interface can be the input interface for only one mirroring configuration. Do not use
the same interface as the input interface for multiple mirroring configurations.
• CPU-generated packets (such as ARP, ICMP, BPDU, and LACP packets) cannot be
mirrored on egress.
• (QFabric systems only) If you configure a QFabric analyzer to mirror egress traffic and
the input and output interfaces are on different Node devices, the mirrored copies have
incorrect VLAN IDs. This limitation does not apply if you configure a QFabric analyzer
to mirror egress traffic and the input and output interfaces are on the same Node device.
In this case the mirrored copies have the correct VLAN IDs (as long as you do not
configure more than 2000 VLANs on the QFabric system).
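Several of the limits above are countable and can be checked against a planned set of analyzers before you commit it. The following Python sketch is one possible pre-check, not a Junos OS feature: the dictionary keys, the function name, and the sample analyzer names are all hypothetical, and only the numeric limits, the single-output rule, and the single-input rule from the list above are modeled.

```python
from collections import Counter

MAX_CONFIGS = 4   # per standalone switch or Node group
MAX_INGRESS = 2   # configurations that mirror ingress traffic
MAX_EGRESS = 2    # configurations that mirror egress traffic


def check_analyzers(analyzers):
    """Return a list of limit violations for one switch or Node group.

    Each analyzer is a dict with hypothetical keys: 'name', 'ingress' and
    'egress' (booleans), 'inputs' (input interface names), and 'outputs'
    (output types such as 'interface', 'ip-address', or 'vlan').
    """
    problems = []
    if len(analyzers) > MAX_CONFIGS:
        problems.append(f"{len(analyzers)} configurations exceed {MAX_CONFIGS}")
    ingress = sum(1 for a in analyzers if a["ingress"])
    egress = sum(1 for a in analyzers if a["egress"])
    if ingress > MAX_INGRESS:
        problems.append(f"{ingress} ingress configurations exceed {MAX_INGRESS}")
    if egress > MAX_EGRESS:
        problems.append(f"{egress} egress configurations exceed {MAX_EGRESS}")
    for a in analyzers:
        if len(set(a["outputs"])) > 1:
            problems.append(f"{a['name']}: more than one output type")
    uses = Counter(i for a in analyzers for i in set(a["inputs"]))
    for interface, count in sorted(uses.items()):
        if count > 1:
            problems.append(f"{interface}: input for {count} configurations")
    return problems


plan = [
    {"name": "m1", "ingress": True, "egress": False,
     "inputs": ["xe-0/0/1"], "outputs": ["interface", "vlan"]},
    {"name": "m2", "ingress": True, "egress": False,
     "inputs": ["xe-0/0/1"], "outputs": ["vlan"]},
]
for problem in check_analyzers(plan):
    print(problem)
```

Remember that a firewall filter term with the analyzer action modifier also counts as an ingress mirroring configuration, so include it in the list with 'ingress' set to True.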
The following constraints and limitations apply to remote port mirroring with the QFX
Series:
• If you configure an output IP address, the address cannot be in the same subnetwork
as any of the switch’s management interfaces.
• If you create virtual routing instances and also create an analyzer configuration that
includes an output IP address, the output address belongs to the default virtual routing
instance (inet.0 routing table).
• On the source (monitored) switch, only one interface can be a member of the analyzer
VLAN.
• Promiscuous trunk port that carries primary VLANs pvlan100 and pvlan400.
• Isolated access port that carries secondary VLAN isolated200. This VLAN is a member
of primary VLAN pvlan100.
• Community port that carries secondary VLAN comm300. This VLAN is also a member
of primary VLAN pvlan100.
• Output interface (monitor interface) that connects to the analyzer system. This interface
forwards the mirrored traffic to the analyzer.
If a packet for pvlan100 enters on the promiscuous trunk port and exits on the isolated
access port, the original packet is untagged on egress because it is exiting on an access
port. However, the mirror copy retains the tag for pvlan100 when it is sent to the analyzer.
Here is another example: If a packet for comm300 ingresses on the community port and
egresses on the promiscuous trunk port, the original packet carries the tag for pvlan100
on egress, as expected. However, the mirrored copy retains the tag for comm300 when
it is sent to the analyzer.
Traffic Management
Cause When you configure bandwidth for a queue or a priority group, the switch accounts for
the configured bandwidth as data only. The switch does not rate-shape the preamble
and the interframe gap (IFG) associated with frames, so the switch does not account
for the bandwidth consumed by the preamble and the IFG in its maximum bandwidth
calculations.
The measured egress bandwidth can exceed the configured maximum bandwidth when
small packet sizes (64 or 128 bytes) are transmitted because the preamble and the IFG
are a larger percentage of the total traffic. For larger packet sizes, the preamble and IFG
overhead is a small portion of the total traffic, and the effect on egress bandwidth is
minor.
Solution When you calculate the bandwidth requirements for queues on which you expect a
significant amount of traffic with small packet sizes, consider the shaping rate as the
maximum bandwidth for the data only. Add sufficient bandwidth to your calculations to
account for the preamble and IFG so that the port bandwidth is sufficient to handle the
combined maximum data rate (shaping rate) and the preamble and IFG.
If the maximum bandwidth measured at the egress port exceeds the amount of bandwidth
that you want to allocate to the queue, reduce the shaping rate for that queue.
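The size of the overhead can be estimated with simple arithmetic. The Python sketch below assumes the standard Ethernet values of 8 bytes for the preamble (including the start frame delimiter) and 12 bytes for the minimum IFG; these figures are not stated in this guide, and the function names are hypothetical. It also assumes that the shaper counts the full frame size passed in as frame_bytes.

```python
PREAMBLE_BYTES = 8  # assumed: 7-byte preamble plus 1-byte start frame delimiter
IFG_BYTES = 12      # assumed: minimum interframe gap


def overhead_percent(frame_bytes):
    """Preamble and IFG as a percentage of the shaped (data-only) rate."""
    return 100.0 * (PREAMBLE_BYTES + IFG_BYTES) / frame_bytes


def wire_rate(data_rate, frame_bytes):
    """Port bandwidth consumed when data_rate is shaped as data only."""
    return data_rate * (frame_bytes + PREAMBLE_BYTES + IFG_BYTES) / frame_bytes


for size in (64, 128, 1518):
    print(size, round(overhead_percent(size), 2), round(wire_rate(1.0, size), 4))
```

Under these assumptions a queue shaped to 1 Gbps that carries only 64-byte frames can occupy roughly 1.31 Gbps on the wire, while the same queue carrying 1518-byte frames occupies only slightly more than 1 Gbps.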
Related Documentation
• shaping-rate
• Example: Configuring Maximum Output Bandwidth
Cause When you configure bandwidth for a queue or a priority group, the switch accounts for
the configured bandwidth as data only. The switch does not include the preamble and
the interframe gap (IFG) associated with frames, so the switch does not account for the
bandwidth consumed by the preamble and the IFG in its minimum bandwidth calculations.
The measured egress bandwidth can exceed the configured minimum bandwidth when
small packet sizes (64 or 128 bytes) are transmitted because the preamble and the IFG
are a larger percentage of the total traffic. For larger packet sizes, the preamble and IFG
overhead is a small portion of the total traffic, and the effect on egress bandwidth is
minor.
NOTE: The sum of the queue transmit rates in a priority group should not
exceed the guaranteed rate for the priority group. (You cannot guarantee a
minimum bandwidth for the queues that is greater than the minimum
bandwidth guaranteed for the entire set of queues.)
Solution When you calculate the bandwidth requirements for queues and priority groups on which
you expect a significant amount of traffic with small packet sizes, consider the transmit
rate and the guaranteed rate as the minimum bandwidth for the data only. Add sufficient
bandwidth to your calculations to account for the preamble and IFG so that the port
bandwidth is sufficient to handle the combined minimum data rate and the preamble
and IFG.
If the minimum bandwidth measured at the egress port exceeds the amount of bandwidth
that you want to allocate to a queue or to a priority group, reduce the transmit rate for
that queue and reduce the guaranteed rate of the priority group that contains the queue.
Related Documentation
• guaranteed-rate
• transmit-rate
Cause Egress queue congestion can cause the ingress port buffer to fill above a certain threshold
and affect the flow to the queues on the egress port. One queue receives its configured
bandwidth, but the other queues on the egress port are affected and do not receive their
configured share of bandwidth.
Solution The solution is to configure a drop profile to apply weighted random early detection
(WRED) to the queue or queues on the congested ports.
Configure a drop profile on the queue that is receiving its configured bandwidth. This
queue is preventing the other queues from receiving their expected bandwidth. The drop
profile prevents the queue from affecting the other queues on the port.
• Name the drop profile and set the drop start point, drop end point, minimum drop rate,
and maximum drop rate for the drop profile:
[edit class-of-service]
user@switch# set drop-profile drop-profile-name interpolate fill-level percentage fill-level
percentage drop-probability 0 drop-probability percentage
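To choose values for the two fill levels and the maximum drop probability, it can help to see how the drop probability changes as the queue fills. The Python sketch below assumes a straight-line ramp between the drop start point and the drop end point. The function name and sample values are hypothetical, and the behavior at or above the drop end point is an assumption: the sketch simply holds the maximum drop probability, which might differ from what the switch does.

```python
def drop_probability(fill, start_fill, end_fill, max_drop):
    """Approximate drop probability (percent) at queue fill level (percent).

    start_fill and end_fill are the two fill-level values of the profile;
    the drop probability is 0 at start_fill and max_drop at end_fill.
    """
    if not 0 <= start_fill < end_fill <= 100:
        raise ValueError("need 0 <= start_fill < end_fill <= 100")
    if fill <= start_fill:
        return 0.0
    if fill >= end_fill:
        # Assumption: hold the maximum; real behavior here is not modeled.
        return float(max_drop)
    return max_drop * (fill - start_fill) / (end_fill - start_fill)


# Hypothetical profile: start at 30 percent, end at 70 percent, maximum 80.
for level in (10, 30, 50, 70):
    print(level, drop_probability(level, 30, 70, 80))
```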
Related Documentation
• drop-profile
• Example: Configuring Tail-Drop Profiles
NOTE: For packets that carry both an inner VLAN tag and an outer VLAN tag,
the rewrite rules rewrite only the outer VLAN tag.
Cause If you configure a rewrite rule for a forwarding class on an egress port but you do not
configure a rewrite rule for every forwarding class on that egress port, then the forwarding
classes that do not have a configured rewrite rule are assigned random rewrite values.
For example:
2. Configure rewrite rules for forwarding classes fc1 and fc2, but not for forwarding class
fc3.
When traffic for these forwarding classes flows through the port, traffic for forwarding
classes fc1 and fc2 is rewritten correctly. However, traffic for forwarding class fc3 is
assigned a random rewrite value.
Solution If any forwarding class on an egress port has a configured rewrite rule, then all forwarding
classes on that egress port must have a configured rewrite rule. Configuring a rewrite rule
for any forwarding class that is assigned a random rewrite value solves the problem.
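The rule can be checked mechanically by comparing the forwarding classes in use on a port with the entries in the rewrite rule bound to that port. This Python sketch is a hypothetical helper, not a Junos OS command; the code point values in the example are placeholders.

```python
def classes_missing_rewrite(port_classes, rewrite_rule):
    """Return forwarding classes that would receive random rewrite values.

    port_classes lists the forwarding classes in use on the egress port.
    rewrite_rule maps forwarding class names to code point bits; an empty
    mapping means no rewrite rule is configured, so nothing is at risk.
    """
    if not rewrite_rule:
        return []
    return sorted(fc for fc in port_classes if fc not in rewrite_rule)


# fc1 and fc2 have rewrite values (placeholder bits); fc3 does not.
print(classes_missing_rewrite(["fc1", "fc2", "fc3"], {"fc1": "001", "fc2": "010"}))
```

Any forwarding class that the helper reports needs its own entry in the rewrite rule.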
TIP: If you want the forwarding class to use the same code point value
assigned to it by the ingress classifier, specify that value as the rewrite rule
value. For example, if a forwarding class has the IEEE 802.1 ingress classifier
code point value 011, configure a rewrite rule for that forwarding class that
uses the IEEE 802.1p code point value 011.
NOTE: There are no default rewrite rules. You can bind one rewrite rule for
each type (DSCP and IEEE 802.1) to a given interface. A rewrite rule can
contain multiple forwarding-class-to-rewrite-value associations.
1. Assign a rewrite value to a forwarding class. Add the new rewrite value to the same
rewrite rule as the other forwarding classes on the port:
[edit class-of-service rewrite-rules]
user@switch# set (dscp | ieee-802.1) rewrite-name forwarding-class class-name loss-priority
priority code-point (alias | bits)
For example, if the other forwarding classes on the port use rewrite values defined in
the rewrite rule custom-rw, the forwarding class fcoe is being randomly rewritten, and
you want to use IEEE 802.1 code point 011 for the fcoe forwarding class:
2. Enable the rewrite rule on an interface if it is not already enabled on the desired
interface:
[edit]
user@switch# set class-of-service interfaces interface-name unit unit rewrite-rules (dscp |
ieee-802.1) rewrite-rule-name
[edit]
user@switch# set class-of-service interfaces xe-0/0/24 unit 0 rewrite-rules ieee-802.1
custom-rw
Related Documentation
• interfaces
• rewrite-rules