[SOLVED] bad parallel-processing gain on some linux machines, good on others
by cmspooner from LinuxQuestions.org on (#59PYE)
I'm measuring the run time of a complex signal-processing algorithm written in C on several different computers, all running Linux. The code is partially parallelized, both by some of my own methods using the pthreads library and by the multithreading afforded by the FFTW package. Three of the four computers show the expected decrease in execution time as the parallelism factor P increases from 1 to the number of physical cores on the machine. The problem is that one of the machines shows the opposite behavior: going from P=1 to P=2 yields an increase in execution time.
The code is identical on the four machines, but the hardware and the OSes are not. Hyperthreading is on. All have large amounts of RAM.
toren: Ubuntu 18.04.5, 4.15.0-122-generic, 4 physical cores, Xeon(R) CPU E3-1535M v5 @ 2.90GHz, gcc version 7.5
cmspooner@toren.mry.nwra.com>./ssca2_parallelism
P = 1 ssca2: elapsed PROGRAM TIME 1.025756e+01
P = 2 ssca2: elapsed PROGRAM TIME 7.500780e+00
P = 4 ssca2: elapsed PROGRAM TIME 6.246292e+00
P = 8 ssca2: elapsed PROGRAM TIME 5.483851e+00
P = 16 ssca2: elapsed PROGRAM TIME 7.457973e+00
P = 32 ssca2: elapsed PROGRAM TIME 8.497999e+00
P = 64 ssca2: elapsed PROGRAM TIME 1.035557e+01
twelve: Fedora Core 31, 5.8.15-101.fc31.x86_64, 12 physical cores, Xeon(R) CPU X5650 @ 2.67GHz, gcc version 9.3
cmspooner@twelve.mry.nwra.com>./ssca2_parallelism
P = 1 ssca2: elapsed PROGRAM TIME 1.833605e+01
P = 2 ssca2: elapsed PROGRAM TIME 1.171951e+01
P = 4 ssca2: elapsed PROGRAM TIME 8.526402e+00
P = 8 ssca2: elapsed PROGRAM TIME 7.255262e+00
P = 16 ssca2: elapsed PROGRAM TIME 1.268390e+01
P = 32 ssca2: elapsed PROGRAM TIME 1.188247e+01
P = 64 ssca2: elapsed PROGRAM TIME 1.417718e+01
flash: Fedora Core 31, 5.8.15-101.fc31.x86_64, 28 physical cores, Xeon(R) CPU E5-2697 v3 @ 2.60GHz, gcc version 9.3
cmspooner@flash.mry.nwra.com>./ssca2_parallelism
P = 1 ssca2: elapsed PROGRAM TIME 1.462457e+01
P = 2 ssca2: elapsed PROGRAM TIME 1.066421e+01
P = 4 ssca2: elapsed PROGRAM TIME 8.244503e+00
P = 8 ssca2: elapsed PROGRAM TIME 7.694590e+00
P = 16 ssca2: elapsed PROGRAM TIME 1.030944e+01
P = 32 ssca2: elapsed PROGRAM TIME 1.079129e+01
P = 64 ssca2: elapsed PROGRAM TIME 1.413906e+01
barra: Ubuntu 18.04.4, 5.4.0-52-generic #57~18.04.1-Ubuntu, 8 physical cores, Xeon(R) W-3225 CPU @ 3.70GHz, gcc version 7.5
cmspooner@barra.mry.nwra.com>./ssca2_parallelism
P = 1 ssca2: elapsed PROGRAM TIME 8.891541e+00
P = 2 ssca2: elapsed PROGRAM TIME 1.006297e+01
P = 4 ssca2: elapsed PROGRAM TIME 8.619259e+00
P = 8 ssca2: elapsed PROGRAM TIME 7.233214e+00
P = 16 ssca2: elapsed PROGRAM TIME 7.765741e+00
P = 32 ssca2: elapsed PROGRAM TIME 8.301655e+00
P = 64 ssca2: elapsed PROGRAM TIME 1.114389e+01
I've tried reverting barra's kernel to 4.15, but the result is similar.
The code is identical on each machine, and I compile it with gcc on each using the same Makefile (and therefore the same compiler switches).
Any ideas about why barra does not behave like the others?
Thanks,
C

