
pinxi/inxi improved CPU die/cluster/core handling!

by h2-1 from LinuxQuestions.org (#6Q228)
Any testers for this would be appreciated. This is another big CPU core refactor, though not as big as the last one we did here; it builds on that work and basically completes the logic I had a sneaking suspicion would be necessary on some topologies. Since no failure examples had been found here at the time, I decided not to build the more robust solution, because I had no real data samples for it.

If you have pinxi installed, then:
Code: pinxi -U

If not, then (I find it easiest long term to pop pinxi into /usr/local/bin, then chown it to my user for easy updates):
Code: wget -O pinxi smxi.org/pinxi && chmod +x pinxi

If you want some background on the cluster/die things, there's a good Linux kernel patch with a 2021 discussion on adding clusters to topology here:
https://patchwork.kernel.org/project....com/#24849767

This goes with the 2020 dies explanation.
https://listman.redhat.com/archives/.../msg00277.html

This was started by a bug report here:
https://codeberg.org/smxi/inxi/issue...omment-2180996

where a RISC-V 8-core CPU was reporting as a 4-core MT (multithreaded) CPU.

To truly fix this required refactoring a bunch of the core (pardon the pun) logic for CPU /sys data parsing.

The new data structure roughly emulates the physical cpu:

physical_id [> die_id] [> cluster_id] > core_id

Note that die and cluster ids are not necessarily present. In fact, that's one of the things the above bug report turned up: a case with no die_id, 2 clusters, and repeating core_ids per cluster.
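
For reference, these ids come straight from the kernel's per-CPU topology files in /sys. Here is a minimal sketch of peeking at them directly (just an illustration, not pinxi's actual code); on older kernels the die_id and cluster_id files are simply absent:

Code:
# show the raw topology ids the kernel exposes for cpu0;
# die_id and cluster_id may not exist, depending on kernel age and hardware
for f in physical_package_id die_id cluster_id core_id; do
  printf '%-20s ' "$f"
  cat "/sys/devices/system/cpu/cpu0/topology/$f" 2>/dev/null || echo 'not present'
done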

I suspect an earlier wrong core count on a 2-die AMD Threadripper CPU had a similar cause, though that person never supplied the required data, so I had to just file it as a possibility. When I got the new RISC-V issue with the same symptom, I knew the hack I had used for core counting was not robust enough for all scenarios.

Because the logic is now much more reliable, pinxi/inxi will always show the die and cluster counts if they are present, but it will not try to synthesize either.

The following examples use debugger-sourced data.

The Raptor Lake is odd: it has one cluster per performance MT core, each of which I assume has its own L2 cache, which is very roughly what defines a cluster. Then it has 2 clusters of 4 efficiency cores each.

The RISC-V has 2 clusters of 4 cores, 1 L2 per cluster, and no die_id, which actually follows the suggestion of the 2020 Red Hat thread.

The AMD EPYC has 2 CPUs, with 4 dies each and 1 cluster per die.
Code:pinxi -Ca --zv --vs
pinxi 3.3.35-08 (2024-08-17)
CPU:
Info: model: 13th Gen Intel Core i5-1345U bits: 64 type: MST AMCP
arch: Raptor Lake level: v3 note: check built: 2022+ process: Intel 7 (10nm)
family: 6 model-id: 0xBA (186) stepping: 3 microcode: 0x411C
Topology: cpus: 1x dies: 1 clusters: 4 cores: 10 threads: 12 mt: 2 tpc: 2
st: 8 smt: enabled cache: L1: 928 KiB desc: d-8x32 KiB, 2x48 KiB; i-2x32
KiB, 8x64 KiB L2: 6.5 MiB desc: 2x1.2 MiB, 2x2 MiB L3: 12 MiB
desc: 1x12 MiB
Speed (MHz): avg: 1535 high: 2820 min/max: 400/4700:3500 scaling:
driver: intel_pstate governor: powersave cores: 1: 0 2: 400 3: 429 4: 926
5: 1244 6: 1139 7: 2680 8: 1021 9: 2582 10: 2744 11: 2820 12: 2445
bogomips: 59904
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Vulnerabilities: <filter>

# this is the RISC-V that started it all:
pinxi -Ca --zv --vs
pinxi 3.3.35-08 (2024-08-17)
CPU:
Info: model: Spacemit X60 bits: 64 type: MCP arch: N/A family: N/A
model-id: N/A microcode: N/A
Topology: cpus: 1x clusters: 2 cores: 8 smt: <unsupported> cache:
L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 1024 KiB desc: 2x512 KiB
Speed (MHz): avg: 1600 min/max: 614/1600 scaling: driver: cpufreq-dt
governor: performance cores: 1: 1600 2: 1600 3: 1600 4: 1600 5: 1600 6: 1600
7: 1600 8: 1600 bogomips: N/A
Flags: N/A
Vulnerabilities: No CPU vulnerability/bugs data available.

pinxi -Ca --zv --vs
pinxi 3.3.35-08 (2024-08-17)
CPU:
Info: model: AMD EPYC 7281 bits: 64 type: MT MCP SMP arch: Zen level: v3
note: check built: 2017-19 process: GF 14nm family: 0x17 (23) model-id: 1
stepping: 2 microcode: 0x8001230
Topology: cpus: 2x dies: 4 clusters: 4x1 cores: 16 threads: 32 tpc: 2
smt: enabled cache: L1: 2x 1.5 MiB (3 MiB) desc: d-16x32 KiB; i-16x64 KiB
L2: 2x 8 MiB (16 MiB) desc: 16x512 KiB L3: 2x 32 MiB (64 MiB)
desc: 8x4 MiB
Speed (MHz): avg: 1227 high: 2695 min/max: 1200/2100 boost: enabled
scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 0 2: 1200
3: 1200 4: 1200 5: 1200 6: 1200 7: 1200 8: 1200 9: 1200 10: 1200 11: 1200
12: 1200 13: 1200 14: 1200 15: 1200 16: 1200 17: 1200 18: 1200 19: 1200
20: 1200 21: 1200 22: 1200 23: 1200 24: 1200 25: 1200 26: 1200 27: 1200
28: 1200 29: 1200 30: 1200 31: 1200 32: 1200 33: 1200 34: 1200 35: 1700
36: 1200 37: 1200 38: 1200 39: 1200 40: 1197 41: 1200 42: 1200 43: 1200
44: 1200 45: 1200 46: 1200 47: 1200 48: 1200 49: 1197 50: 1200 51: 1700
52: 1200 53: 1200 54: 1200 55: 1200 56: 1200 57: 1200 58: 1700 59: 1200
60: 1200 61: 1200 62: 1200 63: 2695 64: 1200 bogomips: 267820
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities: <filter>

And these are some live systems:
Code:# Desktop
pinxi -Ca --zv
CPU:
Info: model: AMD Ryzen 5 2600 bits: 64 type: MT MCP arch: Zen+ gen: 2
level: v3 note: check built: 2018-21 process: GF 12nm family: 0x17 (23)
model-id: 8 stepping: 2 microcode: 0x800820D
Topology: cpus: 1x dies: 1 clusters: 1 cores: 6 threads: 12 tpc: 2
smt: enabled cache: L1: 576 KiB desc: d-6x32 KiB; i-6x64 KiB L2: 3 MiB
desc: 6x512 KiB L3: 16 MiB desc: 2x8 MiB
Speed (MHz): avg: 1419 min/max: 1550/3400 boost: enabled scaling:
driver: acpi-cpufreq governor: ondemand cores: 1: 1419 2: 1419 3: 1419
4: 1419 5: 1419 6: 1419 7: 1419 8: 1419 9: 1419 10: 1419 11: 1419 12: 1419
bogomips: 81599
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities: <filter>

# Xeon server
pinxi -Ca --zv
CPU:
Info: model: Intel Xeon E5-2620 0 bits: 64 type: MCP SMP arch: Sandy Bridge level: v2
built: 2010-12 process: Intel 32nm family: 6 model-id: 0x2D (45) stepping: 7 microcode: 0x71A
Topology: cpus: 2x dies: 1 cores: 6 smt: disabled cache: L1: 2x 384 KiB (768 KiB)
desc: d-6x32 KiB; i-6x32 KiB L2: 2x 1.5 MiB (3 MiB) desc: 6x256 KiB L3: 2x 15 MiB (30 MiB)
desc: 1x15 MiB
Speed (MHz): avg: 2312 high: 2324 min/max: 1200/2500 scaling: driver: intel_cpufreq
governor: ondemand cores: 1: 2300 2: 2300 3: 2300 4: 2300 5: 2300 6: 2300 7: 2324 8: 2324
9: 2324 10: 2324 11: 2324 12: 2324 bogomips: 48045
Flags: avx ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Vulnerabilities: <filter>

# VPS Server
pinxi -Ca --zv
CPU:
Info: model: QEMU Virtual version 1.0 bits: 64 type: SMP arch: K7 level: v1 built: 2000-01
process: AMD 180nm family: 6 model-id: 2 stepping: 3 microcode: 0x1000065
Topology: cpus: 4x dies: 1 cores: 1 smt: <unsupported> cache: L1: 4x 128 KiB (512 KiB)
desc: d-1x64 KiB; i-1x64 KiB L2: 4x 512 KiB (2 MiB) desc: 1x512 KiB
Speed (MHz): avg: 2200 min/max: N/A cores: 1: 2200 2: 2200 3: 2200 4: 2200 bogomips: 17599
Flags: lm nx pae sse sse2 sse3
Vulnerabilities: <filter>

This took a few iterations to hopefully get right; it's tricky handling all the cases: no die_id / cluster_id (older systems), die_id but no cluster_id, cluster_id but no die_id, and both die_id and cluster_id.
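
The core counting idea behind this is simple, even if the real Perl code has to juggle many more corner cases: key each logical CPU's core on the full id tuple, and substitute a placeholder when a file is missing. A rough bash sketch of that idea (an illustration, not pinxi's actual logic; it also counts the total across all packages, whereas pinxi reports per-package figures):

Code:
# count physical cores by the full package:die:cluster:core tuple, so that
# core_ids repeating per die or per cluster cannot collapse the count
declare -A seen
for t in /sys/devices/system/cpu/cpu[0-9]*/topology; do
  pkg=$(cat "$t/physical_package_id")
  die=$(cat "$t/die_id" 2>/dev/null || echo -1)      # absent on older kernels
  clu=$(cat "$t/cluster_id" 2>/dev/null || echo -1)  # absent on older kernels
  core=$(cat "$t/core_id")
  seen["$pkg:$die:$clu:$core"]=1
done
echo "physical cores (all packages): ${#seen[@]}"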

If you want to see the data used and the structures:

Code: pinxi --vs --zv -Ca --dbg 8,39,68

If you find a CPU where the results are wrong, or you just want to see the raw data used to generate these results, this:

Code:
for i in $(ls /sys/devices/system/cpu/{cpu*/topology,cpu*/cpufreq,cpu*/cache/index*,smt}/*); do echo -n "$i::"; cat $i; done > cpu-sys.txt
cat /proc/cpuinfo > cpuinfo.txt

generates the required /sys data file and the cpuinfo copy.

Older kernels had very little of this data, and I assume the CPUs of that era were basically all 1 die anyway.

These results are of course not guaranteed, since they depend on how the CPU decides to report itself to the Linux kernel.

The 2005-2008 era kernels (2.4 and early 2.6) had none of this, but it started trickling in reasonably soon after.

The previous logic really just had dies tacked on as a crude counter hack, but it was not robust.

I had to go all the way because I considered the case of:

1 physical CPU with 2 dies, each containing 2 or more clusters, with each cluster repeating the core_ids per die or per cluster. The counters had to handle this most complex possible case as well as the simplest.
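
To see why the full tuple matters, here is a tiny illustration with made-up ids (purely hypothetical: 1 package, 2 dies, 2 clusters per die, 2 cores per cluster, core_ids restarting at 0 in every cluster). Counting unique core_ids alone badly undercounts, while the full tuple gets it right:

Code:
# hypothetical pkg:die:cluster:core tuples with core_ids restarting per cluster
ids="0:0:0:0 0:0:0:1 0:0:1:0 0:0:1:1 0:1:0:0 0:1:0:1 0:1:1:0 0:1:1:1"
echo "unique core_ids only: $(tr ' ' '\n' <<<"$ids" | cut -d: -f4 | sort -u | wc -l)"  # prints 2 (wrong)
echo "unique full tuples:   $(tr ' ' '\n' <<<"$ids" | sort -u | wc -l)"                # prints 8 (right)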

I have not found any multi-die test systems; I think I used to have access to one, but that went away recently. That is the most valuable case for seeing whether the new logic works, so if anyone has such a CPU, the two debugger files would be very welcome. Then I can add those pairs to the pair debugger, which is fairly critical for testing this stuff.

I'll have to add some of the multi-CPU systems to the debugger as well, since those had the last bugs, which are just fixed now.

Note that if there is a > 1 die CPU with > 1 cluster per die, that should show as:
clusters: 2x4, aka 2 dies, each with 4 clusters. This was the cleanest and most robust solution I could think of, since it then does not matter whether the cluster_ids repeat per die or not. This was the case that made me decide to go all in on the deeper data structure, since if someone had such a CPU, it would again have broken my previous fixed logic.
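
As a guess at the kind of formatting rule involved (an illustration only, not pinxi's code), the display boils down to: if every die has the same number of clusters, print dies x clusters-per-die, otherwise fall back to a plain total:

Code:
# hypothetical unique-cluster counts per die; print "AxB" when uniform
declare -A clusters_per_die=( [die0]=4 [die1]=4 )
uniq_counts=$(printf '%s\n' "${clusters_per_die[@]}" | sort -u)
if [ "$(wc -l <<< "$uniq_counts")" -eq 1 ]; then
  echo "clusters: ${#clusters_per_die[@]}x${uniq_counts}"   # e.g. clusters: 2x4
else
  total=0; for c in "${clusters_per_die[@]}"; do total=$((total + c)); done
  echo "clusters: $total"
fi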

Thanks for looking