LNX Storage 1a
2. Another very good utility is the systool command from the sysfsutils package (see: man
systool). With this command, the information about the remote ports can be gathered in one pass by
running the command with the following options:
$ systool -c fc_remote_ports -v
where -c indicates class, and -v is for verbose.
Not all "rports" are storage ports. Examine the "roles" parameter under rport to determine if the
remote port is a storage target.
$ grep -Hv "zz" /sys/class/fc_remote_ports/rport-*/roles
/sys/class/fc_remote_ports/rport-6:0-0/roles:FCP Target << storage target
/sys/class/fc_remote_ports/rport-6:0-3/roles:FCP Target << storage target
/sys/class/fc_remote_ports/rport-6:0-4/roles:FCP Initiator << HBA
/sys/class/fc_remote_ports/rport-8:0-8/roles:Fabric Port << fabric port
/sys/class/fc_remote_ports/rport-8:0-9/roles:Directory Server << fabric directory server
Remote ports associated with storage ports will also have a non-negative value (not -1) assigned to
their scsi_target_id attribute.
$ grep -Hv "zz" /sys/class/fc_remote_ports/rport-*/scsi_target_id
/sys/class/fc_remote_ports/rport-6:0-0/scsi_target_id:0
/sys/class/fc_remote_ports/rport-6:0-3/scsi_target_id:1
/sys/class/fc_remote_ports/rport-6:0-4/scsi_target_id:-1
/sys/class/fc_remote_ports/rport-8:0-8/scsi_target_id:-1
/sys/class/fc_remote_ports/rport-8:0-9/scsi_target_id:-1
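The two checks above can be combined into one pass. The following is only a minimal sketch, not part of the original procedure: list_fc_targets is a hypothetical helper name, and its optional base-directory argument is an assumption added so the function can be pointed at a copy of the sysfs tree. It prints each remote port whose roles attribute contains "FCP Target", together with its scsi_target_id.

```shell
# list_fc_targets [BASE]
# Hypothetical helper: print every FC remote port whose roles attribute
# contains "FCP Target", along with its scsi_target_id.
# BASE defaults to the sysfs class directory used in the text.
list_fc_targets() (
    base="${1:-/sys/class/fc_remote_ports}"
    for rport in "$base"/rport-*; do
        # Skip an unexpanded glob and any rport without a readable roles file.
        [ -r "$rport/roles" ] || continue
        roles=$(cat "$rport/roles")
        case "$roles" in
            *"FCP Target"*) ;;      # storage target: report it
            *) continue ;;          # initiator, fabric port, and so on
        esac
        tid=$(cat "$rport/scsi_target_id" 2>/dev/null)
        printf '%s scsi_target_id=%s roles=%s\n' "${rport##*/}" "$tid" "$roles"
    done
)
```

Called with no argument, the function reads /sys/class/fc_remote_ports directly. On the host shown above it would be expected to report only rport-6:0-0 and rport-6:0-3.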
FCP-SCSI targets, once discovered, are retained within the system until the next reboot. If access
to an FCP-SCSI target is lost, you will probably notice a message such as:
"blocked FC remote port time out: removing target and saving binding" or something similar.
Essentially, the message indicates that the SCSI target is being removed from the active
configuration but that its binding is being saved. The "saving binding" part just means that if or
when this storage target port returns to the configuration (that is, can be seen by this host again),
it will be assigned its original SCSI target id. So if, for example, a storage target attached to SCSI
host0 that was assigned SCSI target id 5 left and returned, then the SCSI addresses of the LUNs
behind that storage target would be 0:0:5:* (h:c:t:l) both before and after that event. This prevents
devices from moving around and being renamed if a storage target goes missing for a period of time.
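To see which devices currently sit behind one target address, the h:c:t:l entries under /sys/class/scsi_device can be matched against the host, channel, and target numbers. This is a sketch under assumptions: luns_behind_target is a hypothetical helper name, and the optional fourth argument (an alternative base directory) exists only for illustration.

```shell
# luns_behind_target HOST CHANNEL TARGET [BASE]
# Hypothetical helper: list the h:c:t:l addresses of all SCSI devices behind
# one target, for example every LUN matching 0:0:5:*.
luns_behind_target() (
    host=$1 channel=$2 target=$3
    base="${4:-/sys/class/scsi_device}"
    for dev in "$base/$host:$channel:$target":*; do
        [ -e "$dev" ] || continue   # the glob matched nothing
        printf '%s\n' "${dev##*/}"
    done
)
```

For the example in the text, luns_behind_target 0 0 5 lists the 0:0:5:* addresses. Running it before and after the target returns is one way to check that the binding was kept.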
In order to maintain the above logic, that is, returning original bindings to returning devices, two
pieces of information need to be retained:
a) the WWPN of the storage port, and
b) the originally assigned SCSI target id.
This is why the leftover data from ports that are no longer available to this host is kept. Ports that
have left the active configuration and are no longer accessible will have a role of "unknown" and
a port_state of "Not Present".
So this FC remote port, which is no longer present within the storage configuration, was associated
with Clariion storage.
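Ports whose bindings are being held can be picked out of sysfs by their port_state. The following is a minimal sketch, assuming the rport directories expose port_state, port_name (the WWPN), and scsi_target_id; list_stale_rports is a hypothetical helper name and the optional base-directory argument is an assumption for illustration.

```shell
# list_stale_rports [BASE]
# Hypothetical helper: print the retained binding (WWPN and SCSI target id)
# of every remote port whose port_state is "Not Present".
list_stale_rports() (
    base="${1:-/sys/class/fc_remote_ports}"
    for rport in "$base"/rport-*; do
        [ -r "$rport/port_state" ] || continue
        [ "$(cat "$rport/port_state")" = "Not Present" ] || continue
        wwpn=$(cat "$rport/port_name" 2>/dev/null)
        tid=$(cat "$rport/scsi_target_id" 2>/dev/null)
        printf '%s wwpn=%s scsi_target_id=%s\n' "${rport##*/}" "$wwpn" "$tid"
    done
)
```

The WWPN printed for a stale port can then be compared against the storage array's port list to identify which array the port belonged to.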
The 2.6.11 Linux kernel introduced certain changes to the lpfc (Emulex driver) and qla2xxx
(QLogic driver) Fibre Channel HBA drivers which removed the following entries from the proc
pseudo-filesystem: /proc/scsi/qla2xxx, /proc/scsi/lpfc. These entries had provided a centralized
repository of information about the drivers and connected hardware. After the changes, the drivers
started storing all this information within the /sys filesystem. Since Red Hat Enterprise Linux 5 uses
version 2.6.18 of the Linux kernel, it is affected by this change.
Using the /sys filesystem has the advantage that all the Fibre Channel drivers now use a unified
and consistent manner to report data. However it also means that the data previously available in a
single file is now scattered across a myriad of files in different parts of the /sys filesystem.
One basic example is the status of a Fibre Channel HBA: checking this can now be accomplished
with the following command:
# cat /sys/class/scsi_host/host#/state
...where host# is the H-value in the HBTL SCSI addressing format, which references the
appropriate FC HBA. For Emulex adapters (lpfc driver), for example, this command would yield:
# cat /sys/class/scsi_host/host1/state
Link Up - Ready:
Fabric
For QLogic devices (qla2xxx driver) the output would instead be as follows:
# cat /sys/class/scsi_host/host1/state
Link Up - F_Port
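When a machine has several HBAs, the same check can be looped over every host entry. This is a sketch only: show_hba_state is a hypothetical helper name, the optional base-directory argument is an assumption for illustration, and it assumes the driver exposes the attribute under the name state as in the commands above. Because the lpfc output can span more than one line, the lines are joined into one.

```shell
# show_hba_state [BASE]
# Hypothetical helper: print the state attribute of every SCSI host that has
# one, on a single line per host.
show_hba_state() (
    base="${1:-/sys/class/scsi_host}"
    for host in "$base"/host*; do
        # Hosts without a readable state attribute are skipped.
        [ -r "$host/state" ] || continue
        # Join multi-line output (as printed by lpfc) into one line.
        state=$(paste -s -d ' ' "$host/state")
        printf '%s: %s\n' "${host##*/}" "$state"
    done
)
```

Hosts that are not Fibre Channel adapters may also appear under /sys/class/scsi_host, so any host whose driver does not provide a state attribute is simply omitted from the output.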
Obviously it becomes quite impractical to search through the /sys filesystem for the relevant files
when there is a large variety of Fibre Channel-related information of interest. Instead of manual
searching, the systool(1) command provides a simple but powerful means of examining and
analyzing this information. Detailed below are several commands which demonstrate samples of
information which the systool command can be used to examine.
To examine some simple information about the Fibre Channel HBAs in a machine:
# systool -c fc_host -v
To look at verbose information regarding the SCSI adapters present on a system:
# systool -c scsi_host -v
To see what Fibre Channel devices are connected to the Fibre Channel HBA cards:
# systool -c fc_remote_ports -v -d
To examine more disk information including which hosts are connected to which disks:
# systool -b scsi -v
Furthermore, by installing the sg3_utils package it is possible to use the sg_map command to view
more information about the SCSI map. After installing the package, run:
# modprobe sg
# sg_map -x
Finally, to obtain driver information, including version numbers and active parameters, the
following commands can be used for the lpfc and qla2xxx drivers respectively:
# systool -m lpfc -v
# systool -m qla2xxx -v
ATTENTION: The syntax of the systool(1) command differs across versions of Red Hat
Enterprise Linux. Therefore, the commands above are only valid for Red Hat Enterprise Linux 5.
I'm checking 2.6.16-rc5 with 2 QLogic 2312 adapters using qla2xxx driver from 2.6.16-rc5.
As with earlier kernels, I think > 2.6.12 (since scsi_transport_fc gained functionality) I have the
following problem.
2 scsi hosts available, 4 and 5 (for QLogic).
I disconnect the cable from one of QLogic cards. After timeout I have the message:
rport-4:0-0: blocked FC remote port time out: removing target and saving binding
and appropriate SCSI devices that came from adapter 4 disappear from /proc/scsi/scsi.
So far, so good. I reconnect the cable, the directory /sys/class/fc_remote_ports/rport-4:0-1 appears
along with the old ones rport-4:0-0 and rport-5:0-0, so currently I have 3.
However, no automatic rescan appears on adapter 4.
What's worse, if I try echo "0 1 0" > /sys/class/scsi_host/host4/scan, the process is stuck.
For example, with Emulex, if I had in the beginning
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-0 rport-7:0-1 rport-7:0-2
then disconnected adapter 7, got
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-0 rport-7:0-2
(7-0-0 and 7-0-2 didn't disappear while 7-0-1 did), connected 7 back
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-2 rport-7:0-4 rport-7:0-5 rport-7:0-6
(7-0-0 disappeared, but 7-0-2 is still here).
So specifically one target-port at rport-7:0-2 which has three luns (0, 1, and 2).
/devices/pci0000:00/0000:00:06.0/0000:05:00.2/000:07:01.1/host7/rport-7:0-2/target7:0:0/7:0:0:0
/devices/pci0000:00/0000:00:06.0/0000:05:00.2/000:07:01.1/host7/rport-7:0-2/target7:0:0/7:0:0:1
/devices/pci0000:00/0000:00:06.0/0000:05:00.2/000:07:01.1/host7/rport-7:0-2/target7:0:0/7:0:0:2
So what are the other rports? Other initiators?
2). disconnected the cable between adapter 7 and the switch rport-7:0-0 disappeared momentarily
with Emulex LinkDown event.
lpfc 0000:07:01.1: 1:1305 Link Down Event x2 received Data: x2 x20 x110
# ls /sys/class/fc_remote_ports
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-0 rport-7:0-2
Actually, rport-7:0-1 disappears -- maybe an initiator. Is rport-7:0-0 an FCP_TARGET with no
luns?
3). Then after a timeout I got a message about blocking rport-7:0-2, but nothing changed.
rport-7:0-2: blocked FC remote port time out: removing target and saving binding
Exactly, the scsi_target and scsi_device are reaped after the TMO expires.
# ls /sys/class/fc_remote_ports
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-0 rport-7:0-2
This is still correct. However, all scsi entries related to adapter 7 are removed from /proc/scsi/scsi.
If your question is whether the /proc/scsi/scsi devices (/dev/sda, sdb,...) are supposed to disappear
when the rport TMO expires, then yes, they are supposed to disappear. The rports persist with a
port_state of 'not present' for persistent-binding purposes until the port returns (i.e. you reinsert
the cable). If that were to occur, then upon rport addition, the 3-lun storage would be attached to
rport-7:0-2 and a request signaled for lun scanning by the midlayer.
Then after a timeout, the role of rport-7:0-2 is changed to unknown and relevant entries are
removed from /proc/scsi/scsi. rport-7:0-0 is still here.
rport-7:0-2: blocked FC remote port time out: removing target and saving binding
# ls /sys/class/fc_remote_ports/
rport-7:0-0 rport-7:0-2
# cat /sys/class/fc_remote_ports/*/roles
Fabric Port
unknown
After reconnecting the cable, rport-7:0-0 disappears and rport-7:0-4 and rport-7:0-5 appear along
with newly recognized LUNs in /proc/scsi/scsi.
# ls /sys/class/fc_remote_ports/
rport-7:0-2 rport-7:0-4 rport-7:0-5
# cat /sys/class/fc_remote_ports/*/roles
FCP Target, FCP Initiator
Fabric Port
Directory Server
If I'm not mistaken, in QLogic case only 1 rport per adapter appeared instead of 3. Tomorrow I'll
connect QLogic and report again.
I can confirm that very problem (pulling the cable between HBA and switch results in only LUN 0
or nothing coming back afterward) on 2.6.15.4 here too.
Please try recent 2.6.16-rcX kernels, as there have been a number of patches submitted since 2.6.15
which attempt to address most of these holes:
[PATCH] qla2xxx: Close window on race between rport removal and fcport transition.
[SCSI] qla2xxx: Drop legacy 'bypass lun scan for tape device' code.
[SCSI] qla2xxx: Correct issue where the rport's upcall was not being made after relogin.
[SCSI] qla2xxx: Correct synchronization issues during rport addition/deletion.
[SCSI] qla2xxx: Disable port-type RSCN handling via driver state-machine.
Default dev_loss_tmo is 6 (1+5) while in Emulex case the default was 35.
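The timeout in effect for each remote port can be read from its dev_loss_tmo attribute in sysfs. The following is a minimal sketch rather than anything taken from the thread: show_dev_loss_tmo is a hypothetical helper name and the optional base-directory argument is an assumption for illustration.

```shell
# show_dev_loss_tmo [BASE]
# Hypothetical helper: print the dev_loss_tmo value (in seconds) of every
# FC remote port that exposes one.
show_dev_loss_tmo() (
    base="${1:-/sys/class/fc_remote_ports}"
    for rport in "$base"/rport-*; do
        [ -r "$rport/dev_loss_tmo" ] || continue
        printf '%s dev_loss_tmo=%s\n' "${rport##*/}" "$(cat "$rport/dev_loss_tmo")"
    done
)
```

Comparing the values across adapters shows how long each driver will keep a port blocked before the "removing target and saving binding" message appears.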
After disconnecting the cable between the HBA and the switch:
qla2xxx 0000:03:01.0: LOOP DOWN detected (2).
rport-4:0-0: blocked FC remote port time out: removing target and saving binding
# ls /sys/class/fc_remote_ports/
rport-4:0-0
# cat /sys/class/fc_remote_ports/*/roles
unknown
Before you try the patch I sent earlier, could you send me the output from the following:
# echo t > /proc/sysrq-trigger
Unfortunately I don't have the directory /proc/scsi/qla2xxx. However the target sees PRLI from
the host again after reconnecting the cable between the initiator and the switch.
Does it mean the rediscovering new devices on initiator side is already done?
The two stage discovery process has not been needed since FC transport integration. Instead, the
driver simply makes up-calls to signal rport visibility (add on PLOGI/PRLI; delete on
LOGO/cable-pull/etc).
Yes, after plugging the cable back in, the driver rediscovers ports:
Mar 3 01:07:22 multipath kernel: scsi(4): RSNN_NN exiting normally.
Mar 3 01:07:22 multipath kernel: scsi(4): GID_PT entry - nn 200000e08b079a69 pn 210000e08b079a69 portid=010700.
Mar 3 01:07:22 multipath kernel: scsi(4): GID_PT entry - nn 2000001738279c00 pn 1000001738279c11 portid=010200.
Mar 3 01:07:22 multipath kernel: scsi(4): device wrap (010200)
Initiates PLOGI/PRLI:
Mar 3 01:07:22 multipath kernel: scsi(4): Trying Fabric Login w/loop id 0x0081 for port 010200.
Firmware then notifies software that the port has logged out:
Mar 3 01:07:22 multipath kernel: scsi(4): Async PORT UPDATE ignored 0081/0007/7ee5.
Mar 3 01:07:22 multipath kernel: scsi(4:0:0): status_entry: Port Down pid=43, compl status=0x29,
port state=0x4
Relogin complete
Mar 3 01:07:23 multipath kernel: scsi(4): port login OK: logged in ID 0x81
Upcall to fc_remote_port_add() done.
Mar 3 01:07:23 multipath kernel: scsi(4): qla2x00_port_login - end
Mar 3 01:07:23 multipath kernel: scsi(4): Async PORT UPDATE ignored 0000/0006/0001.
Mar 3 01:07:23 multipath kernel: scsi(4): Async PORT UPDATE ignored 0000/0007/0001.
Mar 3 01:07:23 multipath kernel: scsi(4): Async PORT UPDATE ignored 0000/0004/0001
I also noticed that scsi_transport_fc.c::fc_user_scan() is not called with the host_lock held...
hmm.. could you try out the patch I sent earlier and provide the results.
Also, could you send the "echo t > /proc/..." output after the cable has been reinserted, but, before
the 'echo "- - -" > /sys/class' scan is initiated.
The kernel from scsi-rc-fixes git and your patch are working.
By the way, could you, please, tell me how I get only scsi patches from the git repository, cause I
got the whole kernel by using cg-clone
http://kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git
And the disks appear. Could you tell me, please, where this 22sec timeout came from?
Looks like your fiber fabric decided to renegotiate, and halfway it went for a coffee and donuts
break to not upset the union rules :)
I've seen LOOP negotiations take 10+ seconds before, and that is on a really simple setup, so
nothing super special.
Actually Mike R. and James S. deserve the credit for the composite patch which consists of:
1) [PATCH] FC transport : Avoid device offline cases by stalling aborts until device unblocked
http://marc.theaimsgroup.com/?l=linux-scsi&m=114225658724378&w=2
2) Serialize scan work during fc_remote_port_delete() so rport removal doesn't deadlock midlayer
scans. The problem you were seeing. (Mike R.)
3) rport race fixes during removal (James S.).
...
Essentially there are currently several issues with rport consumers making delete() calls during
midlayer scanning.
At a minimum, the aim is to get Mike R's fixes into 2.6.16 and address the additional races going
forward...
Here's a minimal serialize-scan-work patch; could you check to see whether this addresses your
issue? Start with any latest linux-2.6.git tree.
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 929032e..3d09920 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -1649,6 +1649,8 @@ fc_remote_port_delete(struct fc_rport *
+	/* flush any scan work */ /* which can sleep */
+	scsi_flush_work(rport_to_shost(rport));
 	scsi_target_block(&rport->dev);
 	/* cap the length the devices can be blocked until they are deleted */
There are several commands to determine the WWN of a Fibre Channel (FC) HBA and its status
(online/offline). This post discusses a few of the most commonly used methods.
Method 1
# lspci -nn | grep -i hba
07:00.0 Fibre Channel [0c04]: QLogic Corp. ISP2532 8Gb FC PCI-E HBA [1077:2532] (rev 02)
07:00.1 Fibre Channel [0c04]: QLogic Corp. ISP2532 8Gb FC PCI-E HBA [1077:2532] (rev 02)
To check the available HBA ports:
# ls -l /sys/class/fc_host
total 0
drwxr-xr-x 3 root root 0 Feb 3 2015 host2
drwxr-xr-x 3 root root 0 Feb 3 2015 host3