Kernel hang while servicing I/O requests through a QLogic HBA with NVMe disks at the backend
by nvmestg from LinuxQuestions.org on (#5GJ4K)
I am running a CentOS 7.x server with a QLogic 2xxx FC HBA and several NVMe disks organized with Linux LVM. When I export a small number of logical volumes through the FC HBA to host machines, I/O runs fine. However, when the number of LUNs exceeds 20, the CentOS kernel hangs. The drivers for both the QLogic card and the NVMe disks are the ones shipped with my Linux packages, not the vendors' drivers.
I have tried kernel version 5.10.11 and a few nearby versions, with the same result. In my environment the issue is quite reproducible, but I could not find any similar posts in this forum.
Below are the messages from kernel.log:
WARNING: CPU: 31 PID: 3762 at drivers/scsi/qla2xxx/tcm_qla2xxx.c:304 tcm_qla2xxx_free_cmd+0x84/0x90 [tcm_qla2xxx]
CPU: 31 PID: 3762 Comm: kworker/31:1 Tainted: G S 5.10.11-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro SSG-2029P-DN2R24L/X11DSN-TS, BIOS 2.1 06/23/2018
Workqueue: events target_qf_do_work [target_core_mod]
RIP: 0010:tcm_qla2xxx_free_cmd+0x84/0x90 [tcm_qla2xxx]
WARNING: CPU: 31 PID: 3762 at drivers/scsi/qla2xxx/tcm_qla2xxx.c:341 tcm_qla2xxx_release_cmd+0x31/0x40 [tcm_qla2xxx]
CPU: 31 PID: 3762 Comm: kworker/31:1 Tainted: G S W 5.10.11-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro SSG-2029P-DN2R24L/X11DSN-TS, BIOS 2.1 06/23/2018
Workqueue: events target_qf_do_work [target_core_mod]
RIP: 0010:tcm_qla2xxx_release_cmd+0x31/0x40 [tcm_qla2xxx]
WARNING: CPU: 31 PID: 3762 at drivers/scsi/qla2xxx/tcm_qla2xxx.c:261 tcm_qla2xxx_complete_free+0x46/0x50 [tcm_qla2xxx]
CPU: 31 PID: 3762 Comm: kworker/31:1 Tainted: G S W 5.10.11-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro SSG-2029P-DN2R24L/X11DSN-TS, BIOS 2.1 06/23/2018
Workqueue: tcm_qla2xxx_free tcm_qla2xxx_complete_free [tcm_qla2xxx]
RIP: 0010:tcm_qla2xxx_complete_free+0x46/0x50 [tcm_qla2xxx]
refcount_t: underflow; use-after-free.
WARNING: CPU: 31 PID: 3762 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
CPU: 31 PID: 3762 Comm: kworker/31:1 Tainted: G S W 5.10.11-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro SSG-2029P-DN2R24L/X11DSN-TS, BIOS 2.1 06/23/2018
Workqueue: events target_qf_do_work [target_core_mod]
RIP: 0010:refcount_warn_saturate+0xae/0xf0
kernel BUG at drivers/scsi/qla2xxx/qla_target.c:2400!
invalid opcode: 0000 [#1] SMP PTI
CPU: 30 PID: 34474 Comm: kworker/30:2 Tainted: G S W 5.10.11-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro SSG-2029P-DN2R24L/X11DSN-TS, BIOS 2.1 06/23/2018
Workqueue: events target_qf_do_work [target_core_mod]
RIP: 0010:qlt_pci_map_calc_cnt+0x188/0x1c0 [qla2xxx]
In kernel 5.10.11, the corresponding source lines in drivers/scsi/qla2xxx/tcm_qla2xxx.c are:
line 304:
WARN_ON(cmd->trc_flags & TRC_CMD_DONE);
line 341:
if (WARN_ON(cmd->cmd_sent_to_fw))
line 261:
WARN_ON(cmd->trc_flags & TRC_CMD_FREE);
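As far as I can tell, each of these three checks fires when a command reaches a teardown step in a state it should not be in: already marked done, still owned by the firmware, or already marked freed. To make that reading concrete, here is a small user-space sketch. It is not the driver code. The names model_cmd, MODEL_CMD_DONE, MODEL_CMD_FREE and the model_* functions are hypothetical, and I am assuming each flag is set right after it is checked, which the quoted lines do not show.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for TRC_CMD_DONE / TRC_CMD_FREE; the real values
 * are defined in the qla2xxx headers and are not reproduced here. */
enum {
    MODEL_CMD_DONE = 1u << 0,
    MODEL_CMD_FREE = 1u << 1,
};

/* Hypothetical, heavily reduced model of the per-command state. */
struct model_cmd {
    unsigned int trc_flags;
    bool cmd_sent_to_fw;
};

/* Mimics WARN_ON(): report the condition and return whether it fired. */
static bool model_warn_on(bool cond, const char *what)
{
    if (cond)
        fprintf(stderr, "WARNING: %s\n", what);
    return cond;
}

/* Models the check at line 304: warn if a free was already requested. */
static bool model_free_cmd(struct model_cmd *cmd)
{
    bool warned = model_warn_on((cmd->trc_flags & MODEL_CMD_DONE) != 0,
                                "free requested twice");
    cmd->trc_flags |= MODEL_CMD_DONE;   /* assumed: flag set after the check */
    return warned;
}

/* Models the check at line 341: warn if the firmware still owns the command. */
static bool model_release_cmd(struct model_cmd *cmd)
{
    return model_warn_on(cmd->cmd_sent_to_fw,
                         "released while still sent to firmware");
}

/* Models the check at line 261: warn if the command was already freed. */
static bool model_complete_free(struct model_cmd *cmd)
{
    bool warned = model_warn_on((cmd->trc_flags & MODEL_CMD_FREE) != 0,
                                "completion of free ran twice");
    cmd->trc_flags |= MODEL_CMD_FREE;   /* assumed: flag set after the check */
    cmd->cmd_sent_to_fw = false;        /* assumed: ownership is cleared here */
    return warned;
}
```

In this model, calling model_free_cmd() or model_complete_free() a second time on the same command makes the check fire, and model_release_cmd() warns while cmd_sent_to_fw is still set. If the real driver behaves the same way, the log would fit a command being torn down more than once, which would also match the "refcount_t: underflow; use-after-free." message, but that is only my guess.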

