Hardlocks in Handbrake/video operations
by obobskivich from LinuxQuestions.org on (#58C63)
Let me preface this by saying: this is probably a bad motherboard or somesuch, based on the troubleshooting I've done, but I'd just love a sanity check of where I'm at with this...
Alright, so I have a pair of basically identical machines with AMD FX-9590s, we'll call them #1 and #2. The only material differences are:
#1 only has 8GB of RAM, #1 has a GeForce card, #1 has a somewhat inferior CPU cooler.
I've been playing around with Handbrake and other video encoding/editing apps recently and found #1 to be a decent performer for this - it has 8 cores, and the GeForce card also supports NVENC, which runs pretty fast. Sure, it isn't my 32-core workstation, but that's probably an unfair comparison. #1 will sit and run more or less as much video editing/encoding/transcoding/rendering/whatever as I want - it will sit at 100% CPU load, CPU in the high 60s °C, and be fine.
#2 will run at a few degrees lower CPU temperature, and will consistently (as in, you can bet on it) hardlock during these same video jobs: the desktop goes 100% unresponsive, and the only fix is to kill AC power.
I have tried:
- Swapping PSU
- Swapping RAM (have tried both the 'fancy' RAM it originally had when set up as a gaming box, and some plain-jane Dell-branded DDR3-1333 RAM that's generally very stable in anything)
- Swapping hard disk(s)
- Re-installing and changing distros (so it's tried both Slackware 14.2 x64 and Xubuntu 20.04 LTS)
- Swapping graphics cards (it has a Radeon in it right now)
- Changing case layout to improve airflow/cooling thinking it was a heat issue for the CPU
- Playing with BIOS settings to set a more aggressive fan profile, laxer RAM timings, etc. (trying to improve stability)
#2 will run CPU stress tests on the bench (100% load x8 threads) just fine, will load up and run games (I've tried Half-Life 2 and Portal) just fine, will decode media just fine, handles web browsing just fine, etc.
At this point I'm thinking it's probably just this motherboard's time - this machine previously ran as a gaming box for a few years, and who knows what before that (I bought the board second hand) - but the fact that it only hardlocks like this when touching video makes me a bit curious. As it is, I can't trust a machine that will randomly hardlock for anything work related. So while it seems to do 'other tasks' okay, since I can't prove it's an issue with a software install rather than an issue with hardware, I have to assume the machine is at fault (especially given all the above troubleshooting).
Does anyone have any other thoughts on this perhaps? Anything I might be missing? I've checked every setting/library/configuration I can between the two, and as far as I can see #2 is 1:1 with #1 by this point. The userland settings also match my big workstation, though that machine is different enough that I figure it isn't as fair a comparison. They're all running Xubuntu 20.04 right now, again mostly to try and troubleshoot #2.
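A side note on that comparison, for anyone wanting to repeat it: the package-level part can be scripted. The sketch below is only an illustration, not necessarily how the check was done here. It assumes each machine's package list was saved with `dpkg-query -W` (which prints one `name<TAB>version` pair per line); the function names and the two file paths passed on the command line are made up for the example.

```python
import sys


def parse_package_list(text):
    """Parse 'name<whitespace>version' lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and entries that have no version recorded.
        if len(parts) >= 2:
            packages[parts[0]] = parts[1]
    return packages


def diff_packages(a, b):
    """Return (only in a, only in b, version mismatches) as sorted lists."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(
        (name, a[name], b[name])
        for name in set(a) & set(b)
        if a[name] != b[name]
    )
    return only_a, only_b, changed


def main(path_a, path_b):
    # path_a / path_b: hypothetical dumps, one per machine.
    with open(path_a) as fa, open(path_b) as fb:
        a = parse_package_list(fa.read())
        b = parse_package_list(fb.read())
    only_a, only_b, changed = diff_packages(a, b)
    for name in only_a:
        print(f"only in {path_a}: {name}")
    for name in only_b:
        print(f"only in {path_b}: {name}")
    for name, ver_a, ver_b in changed:
        print(f"version differs: {name} {ver_a} vs {ver_b}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

If this prints nothing for the two FX-9590 boxes, the installed packages match and the difference is more likely in hardware, firmware, or kernel-level behaviour than in userland.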
And finally, the one troubleshooting thing I really have no interest in doing: pulling the CPUs out of #1 and #2 and swapping them. The heatsinks on these chips are very big and very obnoxious to install/remove, and I also don't want to put a known working chip (from a known working machine) into a potentially bad motherboard (and end up with, potentially, two bad CPUs).

