LNX Storage 1a

The document discusses checking the status of remote fibre channel ports using the /sys filesystem. Key information about remote ports like the port_name, port_state, roles, and scsi_target_id can be viewed. The systool command provides a unified way to examine fibre channel driver and hardware information stored throughout the /sys filesystem. Remote ports with an unknown role indicate the port is no longer present, but the data is kept to preserve original scsi target id bindings if the port returns.

Uploaded by

Yulin Liu
Copyright
© All Rights Reserved

"rport" is the short form for remote port.

On your host, the remote ports are the ports at the other end of the Fibre Channel connection. For
example, if a system is connected to a storage array with two controllers, the controller ports are
the remote ports, and their information and state can be checked. There are a few ways to check this:
1. The information related to remote ports is at the location:
/sys/class/fc_remote_ports/rport-*
where information like the node_name, port_name, port_state, scsi_target_id, etc. can be checked
with the cat command. For example, the following command checks whether the port is online:
$ cat /sys/class/fc_remote_ports/rport-1:0-0/port_state
which will show "Online", if the port is online and working normally. A simple grep command can
be used to see all port states at once:
$ grep -Hv "zz" /sys/class/fc_remote_ports/rport*/port_state

2. Another very good utility is the systool command from the sysfsutils package (see: man
systool). With this command the information from the remote ports can be gathered in one pass by
running the command with the following options:
$ systool -c fc_remote_ports -v
where -c indicates class, and -v is for verbose.
Not all "rports" are storage ports. Examine the "roles" parameter under rport to determine if the
remote port is a storage target.
$ grep -Hv "zz" /sys/class/fc_remote_ports/rport-*/roles
/sys/class/fc_remote_ports/rport-6:0-0/roles:FCP Target << storage target
/sys/class/fc_remote_ports/rport-6:0-3/roles:FCP Target << storage target
/sys/class/fc_remote_ports/rport-6:0-4/roles:FCP Initiator << HBA
/sys/class/fc_remote_ports/rport-8:0-8/roles:Fabric Port << fabric port
/sys/class/fc_remote_ports/rport-8:0-9/roles:Directory Server << fabric directory server

Remote ports associated with storage ports will also have a non-negative (not -1) value assigned to
their scsi_target_id.
$ grep -Hv "zz" /sys/class/fc_remote_ports/rport-*/scsi_target_id
/sys/class/fc_remote_ports/rport-6:0-0/scsi_target_id:0
/sys/class/fc_remote_ports/rport-6:0-3/scsi_target_id:1
/sys/class/fc_remote_ports/rport-6:0-4/scsi_target_id:-1
/sys/class/fc_remote_ports/rport-8:0-8/scsi_target_id:-1
/sys/class/fc_remote_ports/rport-8:0-9/scsi_target_id:-1
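
The three attributes checked above (port_state, roles, scsi_target_id) can also be read together, one
line per remote port. The following is a minimal sketch, not a supported tool: the function name
list_rports and its optional SYSFS_ROOT argument are illustrative assumptions, added so the loop can
also be pointed at a copied sysfs tree instead of the live /sys.

```shell
#!/bin/sh
# list_rports [SYSFS_ROOT]
# Print "name|port_state|roles|scsi_target_id" for every FC remote port.
# SYSFS_ROOT defaults to /sys (hypothetical parameter, for illustration only).
list_rports() {
    root="${1:-/sys}"
    for dir in "$root"/class/fc_remote_ports/rport-*; do
        [ -d "$dir" ] || continue            # the glob matched nothing
        state=$(cat "$dir/port_state" 2>/dev/null)
        roles=$(cat "$dir/roles" 2>/dev/null)
        tgt=$(cat "$dir/scsi_target_id" 2>/dev/null)
        printf '%s|%s|%s|%s\n' "${dir##*/}" "$state" "$roles" "$tgt"
    done
}
```

Calling list_rports with no argument reads the running system; a storage target shows up as a line
whose roles field contains "FCP Target" and whose last field is not -1.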

What does a role of 'unknown' mean?


$ cat /sys/class/fc_remote_ports/rport-0:0-17/roles
unknown
How can we clean up remote ports with an unknown role?
Is there a way to reset the driver or HBA to clean this up (without rebooting)?
The 'unknown' role means the device is no longer connected/available to this host at this time.
There is no detrimental effect of leaving these fc remote ports within the running configuration.
A reboot is the supported method to remove fc remote ports from sysfs that are no longer present on
the SAN or zoned to this host.

FCP-SCSI targets, once discovered, are retained within the system until the next reboot. If access
to an FCP-SCSI target is lost, you will probably see a message such as:
"blocked FC remote port time out: removing target and saving binding"
The message indicates that the scsi target is being removed from the active configuration but that
its binding is being saved. "Saving binding" means that if or when this storage target port returns
to the configuration (can be seen by this host again), it will be assigned its original scsi target
id. For example, if a storage target attached to scsi host0 was assigned scsi target id 5, then left
and later returned, the scsi addresses of the luns behind that storage target would be 0:0:5:*
(h:c:t:l) both before and after that event. This prevents devices from moving around and being
renamed if a storage target goes missing for a period of time.

In order to maintain the above logic, that is returning original bindings to returning devices, two
pieces of information need to be retained:
a) the wwpn of the storage port, and
b) the original assigned scsi target id.
This is why the leftover data from ports that are no longer available to this host is kept. Ports that
have left the active configuration and are no longer accessible will have a role of "unknown" and
a port_state of "Not Present".
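
The two retained pieces of information can be pulled out for every departed port in one pass. The
following is a rough sketch under stated assumptions: the function name list_stale_rports and the
optional SYSFS_ROOT argument are hypothetical, and a port is treated as departed only when both
conditions described above hold (roles is "unknown" and port_state is "Not Present").

```shell
#!/bin/sh
# list_stale_rports [SYSFS_ROOT]
# For each remote port that has left the configuration, print the saved
# binding data: the WWPN (port_name) and the original scsi_target_id.
# SYSFS_ROOT defaults to /sys (hypothetical parameter, for illustration only).
list_stale_rports() {
    root="${1:-/sys}"
    for dir in "$root"/class/fc_remote_ports/rport-*; do
        [ -d "$dir" ] || continue            # the glob matched nothing
        roles=$(cat "$dir/roles" 2>/dev/null)
        state=$(cat "$dir/port_state" 2>/dev/null)
        if [ "$roles" = "unknown" ] && [ "$state" = "Not Present" ]; then
            wwpn=$(cat "$dir/port_name" 2>/dev/null)
            tgt=$(cat "$dir/scsi_target_id" 2>/dev/null)
            printf '%s wwpn=%s scsi_target_id=%s\n' "${dir##*/}" "$wwpn" "$tgt"
        fi
    done
}
```

Ports that are still online are skipped, so an empty result means no saved bindings are being held
for departed ports.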

Examine the number of "unknown" roles in fc_remote_ports:


$ ls -1c ./sys/class/fc_remote_ports/rport*/roles 2> /dev/null | xargs -I {} grep -H -v \"ZzZz\" {} | grep unknown
./sys/class/fc_remote_ports/rport-0:0-17/roles:unknown
./sys/class/fc_remote_ports/rport-0:0-2/roles:unknown
./sys/class/fc_remote_ports/rport-0:0-3/roles:unknown
./sys/class/fc_remote_ports/rport-1:0-5/roles:unknown
Then examine the full contents of each such rport* directory. Remote ports with a role of "unknown"
should list port_state as "Not Present" along with the saved wwpn (port_name) and assigned scsi
target id.
$ cd ./sys/class/fc_remote_ports/rport-1:0-7; ls -1c 2> /dev/null |xargs -I {} grep -H -v \"ZzZz\" {}
dev_loss_tmo:30
fast_io_fail_tmo:off
maxframe_size:4294967295 bytes
node_name:0xffffffffffffffff
port_id:0xffffffff
port_name:0x5006016b3b241784
port_state:Not Present
roles:unknown
scsi_target_id:5
In the above case the preserved scsi addresses will be 0:0:5:*. You can also use systool (from the
sysfsutils package) to get the same information as above:
$ systool -c fc_transport -v
:
The port_name is a WWN (world wide name) in NAA format. This can be decoded, if desired, to
see what storage type this wwpn belonged to.
0x5006016b3b241784:
NAA   OUI        Vendor Specific   Vendor
5     00-60-16   B3B241784         CLARIION

So this fc remote port, which is no longer present within the storage configuration, was associated
with Clariion storage.
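
The split shown above can be reproduced mechanically. The following is a small sketch that assumes
the NAA format 5 layout (1 hex digit NAA, 6 hex digits OUI, 9 hex digits vendor specific); the
function name decode_naa5 is hypothetical. Mapping the OUI to a vendor name still needs a lookup in
the IEEE OUI registry, which this sketch does not attempt.

```shell
#!/bin/sh
# decode_naa5 WWN
# Split a 64-bit WWN in NAA format 5 into its three fields.
# Accepts the value with or without a leading 0x.
decode_naa5() {
    # Strip an optional 0x prefix and normalise to lower case.
    hex=$(printf '%s\n' "$1" | sed 's/^0[xX]//' | tr 'A-F' 'a-f')
    case "$hex" in
        ""|*[!0-9a-f]*) echo "not a hex WWN: $1" >&2; return 1 ;;
    esac
    [ "${#hex}" -eq 16 ] || { echo "expected 16 hex digits: $1" >&2; return 1; }
    naa=$(printf '%s\n' "$hex" | cut -c1)
    [ "$naa" = "5" ] || { echo "not NAA format 5: $1" >&2; return 1; }
    # Digits 2-7 are the OUI, printed as three dash-separated bytes.
    oui=$(printf '%s\n' "$hex" | cut -c2-7 | sed 's/\(..\)\(..\)\(..\)/\1-\2-\3/')
    # Digits 8-16 are the vendor-specific part.
    vendor=$(printf '%s\n' "$hex" | cut -c8-16)
    printf 'naa=%s oui=%s vendor_specific=%s\n' "$naa" "$oui" "$vendor"
}
```

For the port_name above, decode_naa5 0x5006016b3b241784 prints
naa=5 oui=00-60-16 vendor_specific=b3b241784.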

The 2.6.11 Linux kernel introduced certain changes to the lpfc (Emulex driver) and qla2xxx (QLogic
driver) Fibre Channel HBA drivers which removed the following entries from the proc pseudo-
filesystem: /proc/scsi/qla2xxx, /proc/scsi/lpfc. These entries had provided a centralized repository
of information about the drivers and connected hardware. After the changes, the drivers started
storing all this information within the /sys filesystem. Since Red Hat Enterprise Linux 5 uses version
2.6.18 of the Linux kernel, it is affected by this change.

Using the /sys filesystem has the advantage that all the Fibre Channel drivers now use a unified
and consistent manner to report data. However it also means that the data previously available in a
single file is now scattered across a myriad of files in different parts of the /sys filesystem.

One basic example is the status of a Fibre Channel HBA: checking this can now be accomplished
with the following command:
# cat /sys/class/scsi_host/host#/state
...where host# is the H-value in the HBTL SCSI addressing format, which references the
appropriate FC HBA. For Emulex adapters (lpfc driver), for example, this command would yield:
# cat /sys/class/scsi_host/host1/state
Link Up - Ready:
   Fabric
For QLogic devices (qla2xxx driver) the output would instead be as follows:
# cat /sys/class/scsi_host/host1/state
Link Up - F_Port
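
Because the lpfc value spans two lines while the qla2xxx value is a single line, it can help to
flatten each state file before comparing hosts. The following is a minimal sketch: the function name
hba_states and the optional SYSFS_ROOT argument are illustrative assumptions, and it reads only the
state attribute described above.

```shell
#!/bin/sh
# hba_states [SYSFS_ROOT]
# Print "hostN: <state>" for every SCSI host, with multi-line values
# joined into one line and repeated whitespace squeezed.
# SYSFS_ROOT defaults to /sys (hypothetical parameter, for illustration only).
hba_states() {
    root="${1:-/sys}"
    for f in "$root"/class/scsi_host/host*/state; do
        [ -r "$f" ] || continue              # the glob matched nothing
        host=${f%/state}
        host=${host##*/}
        # Turn tabs and newlines into spaces, squeeze runs, trim both ends.
        value=$(tr -s '\t\n' '  ' < "$f" | sed 's/^ *//; s/ *$//')
        printf '%s: %s\n' "$host" "$value"
    done
}
```

With the Emulex output shown above, the line for that host would read
host1: Link Up - Ready: Fabric.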

Obviously it becomes quite impractical to search through the /sys filesystem for the relevant files
when there is a large variety of Fibre Channel-related information of interest. Instead of manual
searching, the systool(1) command provides a simple but powerful means of examining and
analyzing this information. Detailed below are several commands which demonstrate samples of
information which the systool command can be used to examine.
To examine some simple information about the Fibre Channel HBAs in a machine:
# systool -c fc_host -v
To look at verbose information regarding the SCSI adapters present on a system:
# systool -c scsi_host -v
To see what Fibre Channel devices are connected to the Fibre Channel HBA cards:
# systool -c fc_remote_ports -v -d

For Fibre Channel transport information:


# systool -c fc_transport -v
For information on SCSI disks connected to a system:
# systool -c scsi_disk -v

To examine more disk information including which hosts are connected to which disks:
# systool -b scsi -v
Furthermore, by installing the sg3_utils package it is possible to use the sg_map command to view
more information about the SCSI map. After installing the package, run:
# modprobe sg
# sg_map -x

Finally, to obtain driver information, including version numbers and active parameters, the
following commands can be used for the lpfc and qla2xxx drivers respectively:
# systool -m lpfc -v
# systool -m qla2xxx -v
ATTENTION: The syntax of the systool(1) command differs across versions of Red Hat
Enterprise Linux. Therefore the commands above are only valid for Red Hat Enterprise Linux 5.

I'm testing 2.6.16-rc5 with 2 QLogic 2312 adapters using the qla2xxx driver from 2.6.16-rc5.
As with earlier kernels (I think > 2.6.12, since scsi_transport_fc gained functionality), I have the
following problem.
2 SCSI hosts are available, 4 and 5 (for QLogic).
I disconnect the cable from one of the QLogic cards. After a timeout I get the message:
rport-4:0-0: blocked FC remote port time out: removing target and saving binding
and the SCSI devices that came from adapter 4 disappear from /proc/scsi/scsi.
So far, so good. I reconnect the cable, and the directory /sys/class/fc_remote_ports/rport-4:0-1 appears
along with the old ones, rport-4:0-0 and rport-5:0-0, so currently I have 3.
However, no automatic rescan happens on adapter 4.
What's worse, if I try echo "0 1 0" > /sys/class/scsi_host/host4/scan, the process gets stuck.

Most of the problem seems to be a QLogic driver problem.


The HBAs are connected to the target via an FC switch.
1). If I have several LUNs on each HBA, with QLogic only 1 directory per adapter (for LUN 0) is
created in /sys/class/fc_remote_ports, while with Emulex a directory for every LUN is created.
2). The situation I described occurs with QLogic only if the cable between the HBA and the
switch is pulled out and put back. If I (dis)connect the cable between the switch and the target, the
disks come back.
3). With Emulex the disks come back in both cases. However, with both Emulex and QLogic, stale
directories are left in /sys/class/fc_remote_ports.

For example, with Emulex, in the beginning I had:
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-0 rport-7:0-1 rport-7:0-2
Then I disconnected adapter 7 and got:
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-0 rport-7:0-2
(7-0-0 and 7-0-2 didn't disappear while 7-0-1 did). After connecting 7 back:
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-2 rport-7:0-4 rport-7:0-5 rport-7:0-6
(7-0-0 disappeared, but 7-0-2 is still here).

I applied the patch changing 2 lines in scsi_transport_fc.c to
if (fc_host_tgtid_bind_type(shost) != FC_TGTID_BIND_NONE)
I tried first with Emulex, as QLogic seemed yesterday to have more problems (not only orphan
rports, but also not creating all rports), so let's start solving problems one by one.
I saw no change in behavior with Emulex.
1). I had 3 LUNs on adapters 6 and 7.
# ls /sys/class/fc_remote_ports
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-0 rport-7:0-1 rport-7:0-2

So specifically one target-port at rport-7:0-2 which has three luns (0, 1, and 2).
/devices/pci0000:00/0000:00:06.0/0000:05:00.2/000:07:01.1/host7/rport-7:0-2/target7:0:0/7:0:0:0
/devices/pci0000:00/0000:00:06.0/0000:05:00.2/000:07:01.1/host7/rport-7:0-2/target7:0:0/7:0:0:1
/devices/pci0000:00/0000:00:06.0/0000:05:00.2/000:07:01.1/host7/rport-7:0-2/target7:0:0/7:0:0:2
So what are the other rports? Other initiators?

2). Disconnected the cable between adapter 7 and the switch. rport-7:0-0 disappeared momentarily
with the Emulex LinkDown event.
lpfc 0000:07:01.1: 1:1305 Link Down Event x2 received Data: x2 x20 x110
# ls /sys/class/fc_remote_ports
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-0 rport-7:0-2
Actually, rport-7:0-1 disappears -- maybe an initiator. Is rport-7:0-0 an FCP_TARGET with no
luns?

3). Then after a timeout I got a message about blocking rport-7:0-2, but nothing changed.
rport-7:0-2: blocked FC remote port time out: removing target and saving binding
Exactly, the scsi_target and scsi_device is reaped after TMO expires.
# ls /sys/class/fc_remote_ports
rport-6:0-0 rport-6:0-1 rport-6:0-2 rport-7:0-0 rport-7:0-2
This is still correct. However, all scsi entries related to adapter 7 are removed from /proc/scsi/scsi.

If your question is whether the /proc/scsi/scsi devices (/dev/sda, sdb,...) are supposed to disappear
when the rport TMO expires, then yes, they are supposed to disappear. The rports persist with a
port_state of 'Not Present' for persistent-binding purposes until the port returns (i.e. you reinsert
the cable). When that occurs, upon rport addition, the 3-lun storage would be attached to rport-7:0-2
and a request signaled for lun scanning by the midlayer.

I'll just describe Emulex situation to confirm.


Let's connect only 1 Emulex port (adapter 7) to a switch and leave adapter 6 not connected. Then
we have
# ls /sys/class/fc_remote_ports/
rport-7:0-0 rport-7:0-1 rport-7:0-2
# cat /sys/class/fc_remote_ports/*/roles
Fabric Port
Directory Server
FCP Target, FCP Initiator
When the cable is disconnected from adapter 7, immediately with LinkDown event, the rport with
the role of Directory server disappears and only 2 are left:
# ls /sys/class/fc_remote_ports/
rport-7:0-0 rport-7:0-2
# cat /sys/class/fc_remote_ports/*/roles
Fabric Port
FCP Target, FCP Initiator

Then after a timeout, the role of rport-7:0-2 is changed to unknown and the relevant entries are
removed from /proc/scsi/scsi. rport-7:0-0 is still here.

rport-7:0-2: blocked FC remote port time out: removing target and saving binding
# ls /sys/class/fc_remote_ports/
rport-7:0-0 rport-7:0-2
# cat /sys/class/fc_remote_ports/*/roles
Fabric Port
unknown

After reconnecting the cable, rport-7:0-0 disappears and rport-7:0-4 and rport-7:0-5 appear along
with newly recognized LUNs in /proc/scsi/scsi.
# ls /sys/class/fc_remote_ports/
rport-7:0-2 rport-7:0-4 rport-7:0-5
# cat /sys/class/fc_remote_ports/*/roles
FCP Target, FCP Initiator
Fabric Port
Directory Server

If I'm not mistaken, in QLogic case only 1 rport per adapter appeared instead of 3. Tomorrow I'll
connect QLogic and report again.
I can confirm that very problem (pulling the cable between HBA and switch results in only LUN 0
or nothing coming back afterward) on 2.6.15.4 here too.

Please try recent 2.6.16-rcX kernels, as there have been a number of patches submitted since 2.6.15
which attempt to address most of these holes:
[PATCH] qla2xxx: Close window on race between rport removal and fcport transition.
[SCSI] qla2xxx: Drop legacy 'bypass lun scan for tape device' code.
[SCSI] qla2xxx: Correct issue where the rport's upcall was not being made after relogin.
[SCSI] qla2xxx: Correct synchronization issues during rport addition/deletion.
[SCSI] qla2xxx: Disable port-type RSCN handling via driver state-machine.

Today I tested disconnecting QLogic port.


Adapter 4 is connected via switch to a storage and 3 LUNs are seen via the adapter.
Only 1 rport is created (for FCP Target) while in Emulex case there were 3: (Fabric Port, Directory
Server and FCP Target, FCP Initiator).
# ls /sys/class/fc_remote_ports/
rport-4:0-0
# cat /sys/class/fc_remote_ports/*/roles
FCP Target

Default dev_loss_tmo is 6 (1+5) while in Emulex case the default was 35.

After disconnecting the cable between the HBA and the switch:
qla2xxx 0000:03:01.0: LOOP DOWN detected (2).
rport-4:0-0: blocked FC remote port time out: removing target and saving binding
# ls /sys/class/fc_remote_ports/
rport-4:0-0
# cat /sys/class/fc_remote_ports/*/roles
unknown

Relevant scsi devices are removed from /proc/scsi/scsi.

After reconnecting the cable


qla2xxx 0000:03:01.0: LIP reset occured (f7f7).
qla2xxx 0000:03:01.0: LOOP UP detected (2 Gbps).
# ls /sys/class/fc_remote_ports/
rport-4:0-0
# cat /sys/class/fc_remote_ports/*/roles
FCP Target
However, scsi devices don't reappear in /proc/scsi/scsi.
When I issue rescan, the command is stuck
echo - - - > /sys/class/scsi_host/host4/scan
That's correct, we currently don't make an upcall for the SNS server port nor the switch F-port.
...
Could you add the enable-debug patch I sent you earlier and retry the test? Again, forward the
relevant snippets from /var/log/messages.

Here's the patch again.


diff --git a/drivers/scsi/qla2xxx/qla_dbg.h b/drivers/scsi/qla2xxx/qla_dbg.h
index 935a59a..632f653 100644
--- a/drivers/scsi/qla2xxx/qla_dbg.h
+++ b/drivers/scsi/qla2xxx/qla_dbg.h
@@ -9,6 +9,7 @@
*/
/* #define QL_DEBUG_LEVEL_1 */ /* Output register accesses to COM1 */
/* #define QL_DEBUG_LEVEL_2 */ /* Output error msgs to COM1 */
+#define QL_DEBUG_LEVEL_2 /* Output error msgs to COM1 */
/* #define QL_DEBUG_LEVEL_3 */ /* Output function trace msgs to COM1 */
/* #define QL_DEBUG_LEVEL_4 */ /* Output NVRAM trace msgs to COM1 */
/* #define QL_DEBUG_LEVEL_5 */ /* Output ring trace msgs to COM1 */

diff --git a/drivers/scsi/qla2xxx/qla_settings.h b/drivers/scsi/qla2xxx/qla_settings.h
index 363205c..b2e22b0 100644
--- a/drivers/scsi/qla2xxx/qla_settings.h
+++ b/drivers/scsi/qla2xxx/qla_settings.h
@@ -8,7 +8,7 @@
* Compile time Options:
* 0 - Disable and 1 - Enable
*/
-#define DEBUG_QLA2100 0 /* For Debug of qla2x00 */
+#define DEBUG_QLA2100 1 /* For Debug of qla2x00 */

#define USE_ABORT_TGT 1 /* Use Abort Target mbx cmd *

Please see the log with debug-patch.


The module is loaded with option qlport_down_retry=1. Adapter 4 is connected to the switch; adapter
5 doesn't have a cable attached.
After reconnecting the cable the disks don't reappear and the rescan is stuck. Before applying your
patches a ghost rport was staying; now it's OK.

Before you try the patch I sent earlier, could you send me the output from the following:
# echo t > /proc/sysrq-trigger

Historically the qlogic driver rescan is a 2-phase process:


1) schedule the rescan, e.g.: echo scsi-qlascan > /proc/scsi/qla2xxx/4
2) rescan, e.g.: echo - - - > /sys/class/scsi_host/host4/scan
BUT, I've just used scsi-qlascan to discover _new_ devices... not existing devices that experienced
FC connection loss. I assume the qla driver _should_ just bring those lost devices back? But does
the historic 2-phase rescan for new devices speak to why the qlogic driver doesn't automagically
bring the old devices back? Or has the latest qlogic driver in mainline advanced past this 2-phase
requirement in

Unfortunately I don't have the directory /proc/scsi/qla2xxx. However the target sees PRLI from
the host again after reconnecting the cable between the initiator and the switch.
Does it mean the rediscovering new devices on initiator side is already done?

The two-stage discovery process has not been needed since FC transport integration. Instead, the
driver simply makes up-calls to signal rport visibility (add on PLOGI/PRLI; delete on
LOGO/cable-pull/etc.).

Yes, after plugging the cable back in, the driver rediscovers ports:
Mar 3 01:07:22 multipath kernel: scsi(4): RSNN_NN exiting normally.
Mar 3 01:07:22 multipath kernel: scsi(4): GID_PT entry - nn 200000e08b079a69 pn 210000e08b079a69 portid=010700.
Mar 3 01:07:22 multipath kernel: scsi(4): GID_PT entry - nn 2000001738279c00 pn 1000001738279c11 portid=010200.
Mar 3 01:07:22 multipath kernel: scsi(4): device wrap (010200)
Initiates PLOGI/PRLI:
Mar 3 01:07:22 multipath kernel: scsi(4): Trying Fabric Login w/loop id 0x0081 for port 010200.

And upcall via fc_remote_port_add() is done.


Mar 3 01:07:22 multipath kernel: scsi(4): LOOP READY
Mar 3 01:07:22 multipath kernel: scsi(4): qla2x00_loop_resync - end

Firmware then notifies software that the port has logged out:
Mar 3 01:07:22 multipath kernel: scsi(4): Async PORT UPDATE ignored 0081/0007/7 ee5.
Mar 3 01:07:22 multipath kernel: scsi(4:0:0): status_entry: Port Down pid=43, compl status=0x29, port state=0x4

A CDB also returns with a completion status of PORT_LOGGED_OUT. From the driver's DPC routine
(process-context), the upcall to fc_remote_port_delete() is issued.

Driver attempts a relogin:


Mar 3 01:07:22 multipath kernel: scsi(4): Port login retry:1000008279c11, id =0x0081 retry cnt=8
Mar 3 01:07:23 multipath kernel: scsi(4): fcport-0 - port retry count: 0 remaining
Mar 3 01:07:23 multipath kernel: scsi(4): qla2x00_port_login()
Mar 3 01:07:23 multipath kernel: scsi(4): Trying Fabric Login w/loop id 0x0081 for port 010200.

Relogin complete
Mar 3 01:07:23 multipath kernel: scsi(4): port login OK: logged in ID 0x81
Upcall to fc_remote_port_add() done.
Mar 3 01:07:23 multipath kernel: scsi(4): qla2x00_port_login - end
Mar 3 01:07:23 multipath kernel: scsi(4): Async PORT UPDATE ignored 0000/0006/0001.
Mar 3 01:07:23 multipath kernel: scsi(4): Async PORT UPDATE ignored 0000/0007/0001.
Mar 3 01:07:23 multipath kernel: scsi(4): Async PORT UPDATE ignored 0000/0004/0001

I also noticed that scsi_transport_fc.c::fc_user_scan() is not called with the host_lock held...
hmm.. could you try out the patch I sent earlier and provide the results.

Also, could you send the "echo t > /proc/..." output after the cable has been reinserted, but, before
the 'echo "- - -" > /sys/class' scan is initiated.

Here's the sysrq output after reconnecting the cable, without a manual disk rescan.
Before applying the patch:
The same lock exists:
#001: [ffff81006ee20080] {scsi_host_alloc}
.. held by: scsi_wq_4: 4255 [ffff81006f9147b0, 110]
... acquired at: scsi_scan_target+0x51/0x87 [scsi_mod]

After applying the patch the same lock exists:


#001: [ffff81006edc4080] {scsi_host_alloc}
.. held by: scsi_wq_4: 4255 [ffff81007edaf770, 110]
... acquired at: scsi_scan_target+0x51/0x87 [scsi_mod]

The kernel from scsi-rc-fixes git and your patch are working.
By the way, could you please tell me how to get only the SCSI patches from the git repository? I
got the whole kernel by using cg-clone on
http://kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git

Now the process looks like following:


Mar 11 23:54:22 multipath kernel: qla2xxx 0000:03:01.0: LOOP DOWN detected (2).
Mar 11 23:54:28 multipath kernel: rport-4:0-0: blocked FC remote port time out: removing target and saving binding
Mar 11 23:54:37 multipath kernel: qla2xxx 0000:03:01.0: LIP reset occured (f7f7).
Mar 11 23:54:37 multipath kernel: qla2xxx 0000:03:01.0: LOOP UP detected (2 Gbps).
Mar 11 23:54:59 multipath kernel: 4:0:0:0: timing out command, waited 22s

And the disks appear. Could you tell me, please, where this 22sec timeout came from?

Looks like your fiber fabric decided to renegotiate, and halfway it went for a coffee and donuts
break to not upset the union rules :)
I've seen LOOP negotiations take 10+ seconds before, and that is on a really simple setup, so
nothing super special.

Actually Mike R. and James S. deserve the credit for the composite patch which consists of:
1) [PATCH] FC transport : Avoid device offline cases by stalling aborts until device unblocked
http://marc.theaimsgroup.com/?l=linux-scsi&m=114225658724378&w=2
2) Serialize scan work during fc_remote_port_delete() so rport removal doesn't deadlock midlayer
scans. The problem you were seeing. (Mike R.)
3) rport race fixes during removal (James S.).
...
Essentially there are currently several issues with rport consumers making delete() calls during
midlayer scanning.

At a minimum to get Mike R's fixes into 2.6.16, and address the additional races going forward...

Here's a minimal serialize-scan-work patch. Could you check whether it addresses your
issue? Start with any recent linux-2.6.git tree.
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 929032e..3d09920 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -1649,6 +1649,8 @@ fc_remote_port_delete(struct fc_rport *
+ /* flush any scan work */ /* which can sleep */
+ scsi_flush_work(rport_to_shost(rport));
scsi_target_block(&rport->dev);
/* cap the length the devices can be blocked until they are deleted */

There are several commands to determine the WWN of a Fibre Channel (FC) HBA and its status
(online/offline). This post discusses a few of the most commonly used methods.
Method 1
# lspci -nn | grep -i hba
07:00.0 Fibre Channel [0c04]: QLogic Corp. ISP2532 8Gb FC PCI-E HBA [1077:2532] (rev 02)
07:00.1 Fibre Channel [0c04]: QLogic Corp. ISP2532 8Gb FC PCI-E HBA [1077:2532] (rev 02)
To check the available HBA ports :
# ls -l /sys/class/fc_host
total 0
drwxr-xr-x 3 root root 0 Feb 3 2015 host2
drwxr-xr-x 3 root root 0 Feb 3 2015 host3

To find the state of HBA ports (online/offline) :


# more /sys/class/fc_host/host?/port_state
/sys/class/fc_host/host2/port_state
Online
/sys/class/fc_host/host3/port_state
Online

To find the WWN numbers of the above ports :


# more /sys/class/fc_host/host?/port_name
/sys/class/fc_host/host2/port_name
0x500143802426baf4
/sys/class/fc_host/host3/port_name
0x500143802426baf6

Method 2 : Using systool


Another useful command to find information about HBAs is systool. If it is not already installed,
you may need to install the sysfsutils package.
# yum install sysfsutils

To check the available HBA ports :


# systool -c fc_host
Class = "fc_host"
Class Device = "host2"
Device = "host2"
Class Device = "host3"
Device = "host3"

To find the WWNs for the HBA ports :


# systool -c fc_host -v | grep port_name
port_name = "0x500143802426baf4"
port_name = "0x500143802426baf6"
To check the state of the HBA ports (online/offline) :
# systool -c fc_host -v | grep port_state
port_state = "Online"
port_state = "Online"
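
Grepping for one attribute at a time loses the link between a value and its host. The following is a
hedged sketch of one way to keep them together: it reads the verbose systool output on standard input
and assumes only the line shapes shown above (Class Device = "hostN" and attr = "value"). The function
name summarize_fc_hosts is hypothetical.

```shell
#!/bin/sh
# summarize_fc_hosts
# Read `systool -c fc_host -v` output on stdin and print one line per
# host: "<host> <port_name> <port_state>".
summarize_fc_hosts() {
    awk -F'"' '
        # A new class device starts a new record; the value is field 2.
        /^[ \t]*Class Device[ \t]*=/ { host = $2; order[++n] = host; next }
        # Anchored patterns so other attributes ending in port_name do not match.
        /^[ \t]*port_name[ \t]*=/    { name[host] = $2; next }
        /^[ \t]*port_state[ \t]*=/   { state[host] = $2; next }
        END {
            for (i = 1; i <= n; i++)
                printf "%s %s %s\n", order[i], name[order[i]], state[order[i]]
        }
    '
}
```

Typical use would be systool -c fc_host -v | summarize_fc_hosts, which keeps each WWN next to the
state of the same HBA port.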
