Random Ubuntu Server Crashes
by jjbrunton from LinuxQuestions.org on (#4T9WH)
For a few months now, I have been getting random crashes on an Ubuntu server that I have colocated. The server has to be fully restarted to recover, and when asked to investigate, the data center staff reported that it was unresponsive to keyboard and mouse input.
The last piece of the log that looks useful is this:
Code:
Oct 27 21:30:57 Saturn kernel: [41976.366147] ------------[ cut here ]------------
Oct 27 21:30:57 Saturn kernel: [41976.366153] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
Oct 27 21:30:57 Saturn kernel: [41976.366177] WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:466 dev_watchdog+0x221/0x230
Oct 27 21:30:57 Saturn kernel: [41976.366178] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport xt_nat xt_tcpudp veth xt_conntrack ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc cfg80211 aufs overlay intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi aesni_intel aes_x86_64 snd_hda_codec_realtek crypto_simd cryptd glue_helper snd_hda_codec_generic intel_cstate ledtrig_audio intel_rapl_perf i915 eeepc_wmi asus_wmi input_leds sparse_keymap wmi_bmof kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm snd_hda_intel irqbypass serio_raw snd_hda_codec drm_kms_helper snd_hda_core drm snd_hwdep snd_pcm snd_timer i2c_algo_bit fb_sys_fops snd syscopyarea sysfillrect sysimgblt soundcore ie31200_edac mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456
Oct 27 21:30:57 Saturn kernel: [41976.366226] async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear r8169 ahci realtek i2c_i801 lpc_ich libahci wmi video
Oct 27 21:30:57 Saturn kernel: [41976.366243] CPU: 4 PID: 0 Comm: swapper/4 Kdump: loaded Not tainted 5.0.0-32-generic #34-Ubuntu
Oct 27 21:30:57 Saturn kernel: [41976.366244] Hardware name: System manufacturer System Product Name/P8H77-M PRO, BIOS 9012 09/18/2018
Oct 27 21:30:57 Saturn kernel: [41976.366247] RIP: 0010:dev_watchdog+0x221/0x230
Oct 27 21:30:57 Saturn kernel: [41976.366249] Code: 00 49 63 4e e0 eb 92 4c 89 ef c6 05 cc c7 ef 00 01 e8 63 2e fc ff 89 d9 4c 89 ee 48 c7 c7 f0 c1 7c b5 48 89 c2 e8 f1 38 79 ff <0f> 0b eb c0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
Oct 27 21:30:57 Saturn kernel: [41976.366251] RSP: 0018:ffff97560f903e68 EFLAGS: 00010286
Oct 27 21:30:57 Saturn kernel: [41976.366252] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
Oct 27 21:30:57 Saturn kernel: [41976.366254] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff97560f916440
Oct 27 21:30:57 Saturn kernel: [41976.366255] RBP: ffff97560f903e98 R08: 0000000000000001 R09: 00000000000003db
Oct 27 21:30:57 Saturn kernel: [41976.366256] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000001
Oct 27 21:30:57 Saturn kernel: [41976.366257] R13: ffff9756033a0000 R14: ffff9756033a04c0 R15: ffff9756033d5680
Oct 27 21:30:57 Saturn kernel: [41976.366259] FS: 0000000000000000(0000) GS:ffff97560f900000(0000) knlGS:0000000000000000
Oct 27 21:30:57 Saturn kernel: [41976.366260] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 27 21:30:57 Saturn kernel: [41976.366261] CR2: 00007fb4afd794d0 CR3: 00000003c4e84003 CR4: 00000000001606e0
Oct 27 21:30:57 Saturn kernel: [41976.366262] Call Trace:
Oct 27 21:30:57 Saturn kernel: [41976.366265] <IRQ>
Oct 27 21:30:57 Saturn kernel: [41976.366269] ? pfifo_fast_enqueue+0x120/0x120
Oct 27 21:30:57 Saturn kernel: [41976.366273] call_timer_fn+0x30/0x130
Oct 27 21:30:57 Saturn kernel: [41976.366277] run_timer_softirq+0x3e4/0x420
Oct 27 21:30:57 Saturn kernel: [41976.366279] ? ktime_get+0x40/0xa0
Oct 27 21:30:57 Saturn kernel: [41976.366282] ? lapic_next_deadline+0x26/0x30
Oct 27 21:30:57 Saturn kernel: [41976.366285] ? clockevents_program_event+0x93/0xf0
Oct 27 21:30:57 Saturn kernel: [41976.366290] __do_softirq+0xdc/0x2f3
Oct 27 21:30:57 Saturn kernel: [41976.366294] irq_exit+0xc0/0xd0
Oct 27 21:30:57 Saturn kernel: [41976.366297] smp_apic_timer_interrupt+0x79/0x140
Oct 27 21:30:57 Saturn kernel: [41976.366300] apic_timer_interrupt+0xf/0x20
Oct 27 21:30:57 Saturn kernel: [41976.366301] </IRQ>
Oct 27 21:30:57 Saturn kernel: [41976.366305] RIP: 0010:cpuidle_enter_state+0xbd/0x450
Oct 27 21:30:57 Saturn kernel: [41976.366307] Code: ff e8 b7 9c 86 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 63 03 00 00 31 ff e8 4a cd 8c ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 89 cf 01 00 00 41 c7 44 24 08 00 00 00 00 48 83 c4 18
Oct 27 21:30:57 Saturn kernel: [41976.366308] RSP: 0018:ffffb60f4193be60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Oct 27 21:30:57 Saturn kernel: [41976.366310] RAX: ffff97560f923200 RBX: ffffffffb5b53e20 RCX: 000000000000001f
Oct 27 21:30:57 Saturn kernel: [41976.366311] RDX: 0000262d62648439 RSI: 0000000025a5a65f RDI: 0000000000000000
Oct 27 21:30:57 Saturn kernel: [41976.366312] RBP: ffffb60f4193bea0 R08: 0000000000000002 R09: 0000000000022ac0
Oct 27 21:30:57 Saturn kernel: [41976.366313] R10: 000081ea9085bf94 R11: ffff97560f922084 R12: ffff97560f92e100
Oct 27 21:30:57 Saturn kernel: [41976.366314] R13: 0000000000000002 R14: ffffffffb5b53ef8 R15: ffffffffb5b53ee0
Oct 27 21:30:57 Saturn kernel: [41976.366318] cpuidle_enter+0x17/0x20
Oct 27 21:30:57 Saturn kernel: [41976.366321] call_cpuidle+0x23/0x40
Oct 27 21:30:57 Saturn kernel: [41976.366323] do_idle+0x23a/0x280
Oct 27 21:30:57 Saturn kernel: [41976.366326] cpu_startup_entry+0x1d/0x20
Oct 27 21:30:57 Saturn kernel: [41976.366330] start_secondary+0x1ab/0x200
Oct 27 21:30:57 Saturn kernel: [41976.366334] secondary_startup_64+0xa4/0xb0
Oct 27 21:30:57 Saturn kernel: [41976.366336] ---[ end trace e85dcae9ecb25671 ]---
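To check how often this warning recurs before each crash, the kernel log can be scanned for the NETDEV WATCHDOG line. The sketch below is a hypothetical helper, not an existing tool. It assumes lines use the syslog format shown above (as in /var/log/kern.log on a default Ubuntu install). It extracts the interface, driver, and queue number, and converts the bracketed kernel timestamp (seconds since boot) into hours, minutes, and seconds. The names `find_watchdog_events` and `format_uptime` are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical pattern for the syslog prefix seen in the post, e.g.
# "Oct 27 21:30:57 Saturn kernel: [41976.366153] <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) kernel: "
    r"\[\s*(?P<uptime>\d+\.\d+)\] (?P<msg>.*)$"
)
# Pattern for the watchdog warning, e.g.
# "NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


@dataclass
class WatchdogEvent:
    stamp: str       # syslog wall-clock timestamp
    host: str        # hostname from the syslog line
    uptime_s: float  # kernel timestamp: seconds since boot
    iface: str       # network interface, e.g. enp3s0
    driver: str      # driver name, e.g. r8169
    queue: int       # transmit queue that timed out


def find_watchdog_events(lines: Iterable[str]) -> Iterator[WatchdogEvent]:
    """Yield one WatchdogEvent per NETDEV WATCHDOG line; skip everything else."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        w = WATCHDOG_RE.search(m.group("msg"))
        if w:
            yield WatchdogEvent(
                stamp=m.group("stamp"),
                host=m.group("host"),
                uptime_s=float(m.group("uptime")),
                iface=w.group("iface"),
                driver=w.group("driver"),
                queue=int(w.group("queue")),
            )


def format_uptime(seconds: float) -> str:
    """Render seconds since boot as e.g. '11h39m36s' (fractional part dropped)."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h}h{m:02d}m{s:02d}s"


# Usage on two lines copied from the log above.
sample = [
    "Oct 27 21:30:57 Saturn kernel: [41976.366147] ------------[ cut here ]------------",
    "Oct 27 21:30:57 Saturn kernel: [41976.366153] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out",
]
for ev in find_watchdog_events(sample):
    print(ev.iface, ev.driver, ev.queue, format_uptime(ev.uptime_s))
```

On the excerpt above, this prints `enp3s0 r8169 0 11h39m36s`. In other words, the r8169 transmit timeout was logged roughly 11 hours 40 minutes after boot. To see whether the same pattern precedes every crash, the same loop can be run over a whole log file opened with `open(...)`.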

