Article 6HDC5 Nvidia-470-kernel issues with kernel-6.6 and above

by
UrbanDesimator
from LinuxQuestions.org on (#6HDC5)
The Nvidia kernel module causes RIPs and stack traces on kernel 6.6 and above, both vanilla and RT. Cured by replacing the r8169 driver with the r8168 version.

A bit of info that I hope may help others running nvidia-legacy470-kernel version 470.223.02 on kernel 6.6 and above.

The short answer (:-)) and my fix.
The problem looked like an ASPM conflict between the r8169 and nvidia drivers. After some experimentation, I fixed it by removing the r8169 driver and replacing it with r8168-8.052.01.tar.gz from https://github.com/mtorromeo/r8168,
using these module options in:
/lib/modprobe.d/r8168.conf
disable_wol_support=1 dynamic_aspm_packet_threshold=0 eee_enable=0 hwoptimize=1
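
For reference, modprobe.d files expect each options line to start with the keyword options followed by the module name. The sketch below only prints such a line for the four options above. The helper name r8168_options_line is made up for illustration, and the install command in the comment assumes you are root.

```shell
#!/bin/sh
# Print a modprobe.d line for the r8168 options listed in the post.
# r8168_options_line is a hypothetical helper name, not part of the driver package.
r8168_options_line() {
    printf 'options r8168 %s\n' \
        "disable_wol_support=1 dynamic_aspm_packet_threshold=0 eee_enable=0 hwoptimize=1"
}

# Show the line; nothing on the system is changed here.
r8168_options_line

# To install it (as root), something like:
#   r8168_options_line > /lib/modprobe.d/r8168.conf
```

The options only take effect the next time the r8168 module is loaded.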

It was only after trying different options, and adding dynamic_aspm_packet_threshold=0 to the other 3 options, that all the RIPs/stack traces stopped. I haven't checked yet whether the dynamic_aspm*** option works on its own or whether it's the combination. Those options are not available with the r8169 driver. I am going to email the r8169 devs with my findings to see if they can determine whether changes need to be made to their code, or the options re-enabled if they are present in their code.
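
One way to see which options a driver actually offers is modinfo -p, which prints one "name:description (type)" line per parameter. The sketch below is a small hypothetical helper (missing_params is not a real tool) that reads such a listing on stdin and prints the names it cannot find.

```shell
#!/bin/sh
# List which of the given parameter names are absent from `modinfo -p` output.
# missing_params is a hypothetical helper; it reads the listing on stdin.
missing_params() {
    listing="$(cat)"
    for p in "$@"; do
        # modinfo -p prints one "name:description (type)" line per parameter
        printf '%s\n' "$listing" | grep -q "^${p}:" || echo "$p"
    done
}

# Example (not run here):
#   modinfo -p r8169 | missing_params disable_wol_support \
#       dynamic_aspm_packet_threshold eee_enable hwoptimize
```

Running the same check against r8168 after installing it should, if the options are spelled correctly, print nothing.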
Always back up any settings or configs before making changes.
I hope this may help anyone with similar issues.
UrbanMusic

Below are more details of how I got to this fix.

I tried various patches and inttf-kernel-patcher.sh from
https://nvidia.if-not-true-then-false.com/patcher/ and patches from github/slackbuilds. Nothing cured the RIPs/stack traces. Some were RT related ("scheduling while atomic"); others were not related to the RT kernel.
I tracked and traced the issues as they were happening, sometimes a few minutes apart and at most an hour apart. It looked like they were triggered by ASPM and the nvidia driver's apparent sensitivity to what other drivers do with ASPM.
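
A rough sketch of how the traces can be pulled out of the kernel log for this kind of tracking. The helper name filter_traces is made up, and the three patterns are just the usual markers the kernel prints for an oops or an atomic-scheduling bug, not anything specific to this post.

```shell
#!/bin/sh
# Keep only kernel log lines that mark an oops or an atomic-scheduling bug.
# filter_traces is a hypothetical helper; it reads log text on stdin.
filter_traces() {
    grep -E 'RIP: |scheduling while atomic|Call Trace:'
}

# Example (not run here), with human-readable timestamps:
#   dmesg -T | filter_traces
```

Comparing the timestamps of the matches against other driver messages nearby is one way to spot which driver was active just before each trace.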

I knew from experience that the r8169 driver needed the pcie_aspm=force boot option to be able to disable ASPM on my ASUS Sabertooth 990FX AMD board. The nvidia driver wanted the pcie_aspm=off boot option set, which stopped the r8169 device working altogether.
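
To check which pcie_aspm setting a system was booted with, a minimal sketch follows. The helper name aspm_boot_option is made up; it scans a kernel command line (by default the running one from /proc/cmdline) and prints the value, or "unset" if the option is absent.

```shell
#!/bin/sh
# Print the value of pcie_aspm= from a kernel command line, or "unset".
# aspm_boot_option is a hypothetical helper name.
aspm_boot_option() {
    # $1: command line text; defaults to the running kernel's
    cmdline="${1:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        case "$word" in
            pcie_aspm=*) echo "${word#pcie_aspm=}"; return 0 ;;
        esac
    done
    echo "unset"
}

# Example (not run here):
#   aspm_boot_option
```

The active policy can also be read from /sys/module/pcie_aspm/parameters/policy, and lspci -vv shows the per-device ASPM state on the LnkCtl lines.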

After much hunting, debugging my system and searching online, I found no cure, so I opted for listing the options and trying each one.

The ./autorun.sh script in the r8168-8.052.02 pkg will take care of blacklisting the r8169 driver. If you find it doesn't help and wish to change back, blacklist the r8168 driver, rename /lib/modules/your kernel/kernel/drivers/net/realtek/r8169 to r8169.ko, and then issue depmod -a. You will then be able to load and use the r8169 driver again.
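
The revert steps above can be sketched as a dry run that only prints the commands. Everything here is an assumption to adapt: revert_plan is a made-up helper, the blacklist file name /etc/modprobe.d/blacklist-r8168.conf is hypothetical, and the name of the renamed r8169 file must be whatever is actually present in the realtek directory.

```shell
#!/bin/sh
# Print (do not run) the commands for switching back from r8168 to r8169.
# revert_plan is a hypothetical helper.
#   $1: kernel version directory under /lib/modules
#   $2: current name of the renamed r8169 module file
revert_plan() {
    kver="$1"
    backup="$2"
    dir="/lib/modules/$kver/kernel/drivers/net/realtek"
    # The blacklist file name below is an assumption; any modprobe.d file works.
    echo "echo 'blacklist r8168' > /etc/modprobe.d/blacklist-r8168.conf"
    echo "mv '$dir/$backup' '$dir/r8169.ko'"
    echo "depmod -a"
    echo "modprobe r8169"
}

# Example (not run here), for the running kernel:
#   revert_plan "$(uname -r)" r8169
```

Review the printed commands before running them as root.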