IO hung because request_queue is quiesced
by plznobug from LinuxQuestions.org on (#6E7Y0)
Hi guys,
There is a strange problem with my machine: many processes are hung. I deliberately crashed the kernel to generate a vmcore for analysis. Using the crash tool, I found that some processes in uninterruptible sleep (D state) are blocked in blk_mq_get_tag.
The backtrace of one of those processes is as follows.
#0 [ffffba09c97f7890] __schedule at ffffffffa1074402
#1 [ffffba09c97f7930] schedule at ffffffffa1074a68
#2 [ffffba09c97f7938] io_schedule at ffffffffa08e3f32
#3 [ffffba09c97f7948] blk_mq_get_tag at ffffffffa0bf28f9
#4 [ffffba09c97f79b8] blk_mq_get_request at ffffffffa0bed45a
#5 [ffffba09c97f79f0] blk_mq_submit_bio at ffffffffa0befb30
#6 [ffffba09c97f7a78] generic_make_request at ffffffffa0be430f
#7 [ffffba09c97f7ad0] submit_bio at ffffffffa0be4595
#8 [ffffba09c97f7b10] ext4_mpage_readpages at ffffffffc05c6727 [ext4]
#9 [ffffba09c97f7bf8] read_pages at ffffffffa0a2934b
#10 [ffffba09c97f7c70] __do_page_cache_readahead at ffffffffa0a29631
#11 [ffffba09c97f7d08] ondemand_readahead at ffffffffa0a29849
#12 [ffffba09c97f7d50] generic_file_buffered_read at ffffffffa0a1707c
#13 [ffffba09c97f7e40] new_sync_read at ffffffffa0ac38e1
#14 [ffffba09c97f7ec8] vfs_read at ffffffffa0ac6131
#15 [ffffba09c97f7f00] ksys_read at ffffffffa0ac656f
#16 [ffffba09c97f7f38] do_syscall_64 at ffffffffa08041db
#17 [ffffba09c97f7f50] entry_SYSCALL_64_after_hwframe at ffffffffa12000ad
The I/O scheduler is mq-deadline.
I found that the sched_tags have been used up, but the driver tags (tags) are still available.
What's more, the request_queue has QUEUE_FLAG_QUIESCED set, but scsi_device.sdev_state is SDEV_RUNNING. This mismatch makes me suspect a bug in the kernel.
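To illustrate why a quiesced queue could produce this picture (scheduler tags exhausted, driver tags untouched), here is a toy Python model. It is not kernel code: the class ToyQueue and its fields are hypothetical names, and it only assumes that a quiesced queue keeps accepting requests into the scheduler while refusing to dispatch them to the driver.

```python
from collections import deque


class ToyQueue:
    """Toy model of a blk-mq queue with an I/O scheduler (hypothetical, not kernel code)."""

    def __init__(self, nr_sched_tags, nr_driver_tags):
        self.free_sched_tags = nr_sched_tags    # taken at submit time
        self.free_driver_tags = nr_driver_tags  # taken at dispatch time
        self.quiesced = False
        self.pending = deque()                  # requests parked in the scheduler

    def submit(self, name):
        """Return False where the real submitter would sleep waiting for a tag."""
        if self.free_sched_tags == 0:
            return False
        self.free_sched_tags -= 1
        self.pending.append(name)
        return True

    def dispatch(self):
        """Move pending requests to the driver; does nothing while quiesced."""
        if self.quiesced:
            return 0
        moved = 0
        while self.pending and self.free_driver_tags > 0:
            self.pending.popleft()
            self.free_driver_tags -= 1
            moved += 1
        return moved


q = ToyQueue(nr_sched_tags=4, nr_driver_tags=2)
q.quiesced = True
results = [q.submit(f"req{i}") for i in range(5)]
dispatched = q.dispatch()
print(results, dispatched, q.free_sched_tags, q.free_driver_tags)
```

In this model the fifth submitter finds no scheduler tag while both driver tags are still free, which is the same shape as what the vmcore shows.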
Here is how I used crash to get the above information.
crash> dev -d
MAJOR GENDISK NAME REQUEST_QUEUE TOTAL ASYNC SYNC
8 ffff9fe9833c1000 sda ffff9ff8a02489b8 1930 2 1928
crash> struct request_queue.queue_flags ffff9ff8a02489b8
queue_flags = 219287104
crash> eval (219287104>>26)
hexadecimal: 3
decimal: 3
octal: 3
binary: 0000000000000000000000000000000000000000000000000000000000000011
crash> struct request_queue ffff9ff8a02489b8 | grep queuedata
queuedata = 0xffff9ff8bc1c2000,
crash> struct scsi_device 0xffff9ff8bc1c2000 | grep sdev_state
sdev_state = SDEV_RUNNING,
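The shift by 26 only shows that bits 26 and 27 of queue_flags are set. As a cross-check, a small Python sketch can list every set bit in the value. The helper names set_bits and is_set are made up for illustration, and the bit number of QUEUE_FLAG_QUIESCED is deliberately not hard-coded: it differs between kernel versions and should be read from include/linux/blkdev.h of the kernel that produced the vmcore.

```python
def set_bits(flags):
    """Return the positions of all bits set in flags, lowest first."""
    return [bit for bit in range(flags.bit_length()) if (flags >> bit) & 1]


def is_set(flags, bit):
    """True if the given bit position is set in flags."""
    return bool((flags >> bit) & 1)


QUEUE_FLAGS = 219287104  # value printed by crash for request_queue.queue_flags

print(hex(QUEUE_FLAGS))       # 0xd120e40
print(set_bits(QUEUE_FLAGS))  # [6, 9, 10, 11, 17, 20, 24, 26, 27]
```

Calling is_set(QUEUE_FLAGS, n) with n taken from the matching kernel header then confirms whether the quiesced flag is really the one that is set.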
Help