Article 5GJ4K kernel hang while servicing IO requests through qlogic hba with nvme disk at backend


by
nvmestg
from LinuxQuestions.org on (#5GJ4K)
I am running a CentOS 7.x server with a QLogic 2xxx FC HBA and several NVMe disks organized with Linux LVM. When I export a small number of logical volumes through the FC HBA to host machines, I/O runs fine. However, when the number of LUNs exceeds 20, the CentOS kernel hangs. The drivers for both the QLogic card and the NVMe disks come from my Linux package, not from the vendors.

I have tried kernel version 5.10.11 and a few versions around it, and all show the same issue. In my environment it is quite repeatable, but I could not find any similar posts in this forum.

Below are the messages from kernel.log:
WARNING: CPU: 31 PID: 3762 at drivers/scsi/qla2xxx/tcm_qla2xxx.c:304 tcm_qla2xxx_free_cmd+0x84/0x90 [tcm_qla2xxx]
CPU: 31 PID: 3762 Comm: kworker/31:1 Tainted: G S 5.10.11-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro SSG-2029P-DN2R24L/X11DSN-TS, BIOS 2.1 06/23/2018
Workqueue: events target_qf_do_work [target_core_mod]
RIP: 0010:tcm_qla2xxx_free_cmd+0x84/0x90 [tcm_qla2xxx]

WARNING: CPU: 31 PID: 3762 at drivers/scsi/qla2xxx/tcm_qla2xxx.c:341 tcm_qla2xxx_release_cmd+0x31/0x40 [tcm_qla2xxx]
CPU: 31 PID: 3762 Comm: kworker/31:1 Tainted: G S W 5.10.11-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro SSG-2029P-DN2R24L/X11DSN-TS, BIOS 2.1 06/23/2018
Workqueue: events target_qf_do_work [target_core_mod]
RIP: 0010:tcm_qla2xxx_release_cmd+0x31/0x40 [tcm_qla2xxx]

WARNING: CPU: 31 PID: 3762 at drivers/scsi/qla2xxx/tcm_qla2xxx.c:261 tcm_qla2xxx_complete_free+0x46/0x50 [tcm_qla2xxx]
CPU: 31 PID: 3762 Comm: kworker/31:1 Tainted: G S W 5.10.11-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro SSG-2029P-DN2R24L/X11DSN-TS, BIOS 2.1 06/23/2018
Workqueue: tcm_qla2xxx_free tcm_qla2xxx_complete_free [tcm_qla2xxx]
RIP: 0010:tcm_qla2xxx_complete_free+0x46/0x50 [tcm_qla2xxx]

refcount_t: underflow; use-after-free.
WARNING: CPU: 31 PID: 3762 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
CPU: 31 PID: 3762 Comm: kworker/31:1 Tainted: G S W 5.10.11-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro SSG-2029P-DN2R24L/X11DSN-TS, BIOS 2.1 06/23/2018
Workqueue: events target_qf_do_work [target_core_mod]
RIP: 0010:refcount_warn_saturate+0xae/0xf0

kernel BUG at drivers/scsi/qla2xxx/qla_target.c:2400!
invalid opcode: 0000 [#1] SMP PTI
CPU: 30 PID: 34474 Comm: kworker/30:2 Tainted: G S W 5.10.11-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro SSG-2029P-DN2R24L/X11DSN-TS, BIOS 2.1 06/23/2018
Workqueue: events target_qf_do_work [target_core_mod]
RIP: 0010:qlt_pci_map_calc_cnt+0x188/0x1c0 [qla2xxx]

In kernel 5.10.11, the corresponding source lines in drivers/scsi/qla2xxx/tcm_qla2xxx.c are:
line 304
WARN_ON(cmd->trc_flags & TRC_CMD_DONE);

line 341
if (WARN_ON(cmd->cmd_sent_to_fw))

line 261
WARN_ON(cmd->trc_flags & TRC_CMD_FREE);
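
For readers unfamiliar with this kind of check, here is a minimal user-space sketch in C of how set-once trace flags and a reference count can reveal that the same command went through the free path twice. It is only a model under my own assumptions: the names model_cmd, model_free_cmd, model_complete_free and model_run, and the flag values, are hypothetical and are not the driver's code. It does not claim to reproduce the actual bug, only the pattern the warnings above seem to point at.

```c
#include <stdio.h>

/* Hypothetical flag values for this model; the real TRC_* constants
 * are defined in the qla2xxx driver and may differ. */
#define MODEL_TRC_CMD_DONE (1u << 0)
#define MODEL_TRC_CMD_FREE (1u << 1)

struct model_cmd {
    unsigned int trc_flags; /* set-once trace markers */
    int refcount;           /* stand-in for the kernel's refcount_t */
    int warnings;           /* number of checks that fired */
};

/* Stand-in for WARN_ON(): report and count, but keep running. */
static void model_warn_on(struct model_cmd *cmd, int cond, const char *what)
{
    if (cond) {
        cmd->warnings++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
}

/* Models the "done" step, which should run once per command. */
static void model_free_cmd(struct model_cmd *cmd)
{
    model_warn_on(cmd, (cmd->trc_flags & MODEL_TRC_CMD_DONE) != 0,
                  "command already marked done");
    cmd->trc_flags |= MODEL_TRC_CMD_DONE;
}

/* Models the final release, which drops the last reference. */
static void model_complete_free(struct model_cmd *cmd)
{
    model_warn_on(cmd, (cmd->trc_flags & MODEL_TRC_CMD_FREE) != 0,
                  "command already marked free");
    cmd->trc_flags |= MODEL_TRC_CMD_FREE;

    model_warn_on(cmd, cmd->refcount <= 0, "refcount underflow");
    if (cmd->refcount > 0)
        cmd->refcount--;
}

/* Push one command through the free path `times` times and
 * return how many checks fired. */
int model_run(int times)
{
    struct model_cmd cmd = { 0u, 1, 0 };
    int i;

    for (i = 0; i < times; i++) {
        model_free_cmd(&cmd);
        model_complete_free(&cmd);
    }
    return cmd.warnings;
}
```

In this model a single pass through the free path is silent, while a second pass for the same command trips both flag checks and the reference count check, which is the same combination of symptoms as in the log above.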