Lock in libpthread occurs on only one Arch installation, and only with gcc-fortran

mostlyharmless

from LinuxQuestions.org on 2020-11-27 15:05 (#5AY7A)

So, I have an unusual problem that I first thought was a gfortran compiler bug. As far as I can tell, however, it is specific to one machine's Arch install, and I can't reproduce it on another machine running Debian or Manjaro with the same kernel and compiler (otherwise I'd report it on the GCC Bugzilla).

So I'm posting it in General; it's a puzzle!

Using any of the 5.7.x, 5.8.x, or 5.9.x kernels on Arch with GNU Fortran (GCC) 10.2.0, we have a program that calls a function as part of a write statement, where the function itself also contains a write statement.

Code:
! bugs.f90
PROGRAM bugs
USE badwrite
x=AC(0)
write(*,*) 'x: ',x ! this works
write (*,*) '0: ',AC(0) !this does not
STOP
END

! badwrite.f90
MODULE badwrite
CONTAINS
function AC(m2) result(c)
INTEGER,INTENT(IN) :: m2
write(*,*) m2 !killer statement with lapack or other linked library
c = m2+3
end function AC
END MODULE badwrite

compiled with
gfortran -c -llapack badwrite.f90
gfortran -llapack badwrite.f90 bugs.f90

should result in
0
x: 3.0000000
0: 0
3.00000000

However, with the -llapack library linked in (or the BLAS library, and possibly other external libraries),
the result is
0
x: 3.000000

(program hangs here)

Adding the -ggdb flag, running in gdb and interrupting with ^C results in
Code:
Starting program: /home/me/build/bugs.lib90/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0
x: 3.00000000
^C
Program received signal SIGINT, Interrupt.
0x00007ffff6baddb0 in __lll_lock_wait () from /usr/lib/libpthread.so.0

Apparently the two writes are deadlocked?

Conditions:

(1) Only happens with an external library linked in, so

gfortran -c badwrite.f90
gfortran badwrite.f90 bugs.f90

runs normally without hanging

(2) Removing the write statement in the function AC also removes the hang.

(3) Separating the writes, as in the first statement with x=AC(0) then write(*,*) x, removes the deadlock/hang

(4) Another machine running the same kernel/gfortran version under Manjaro does not have the hang

(5) The problem does not occur with pgfortran (aka nvfortran) 20.7-0 LLVM on the machine in question.

(6) Changing kernels on the same machine does not solve the problem.

(7) Reinstalling packages with pacman and rebooting does not solve the problem.
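
Condition (3) suggests a workaround: evaluate the function into a temporary first, so that the inner write has finished before the outer write begins. Below is a minimal single-file sketch of that idea. It assumes that c and x are declared REAL explicitly (which is what implicit typing gives them in the original), and the program name bugs_workaround is made up for illustration. It only sidesteps the hang; it does not explain why the hang shows up on this one machine.

```fortran
! Sketch of the workaround from condition (3); names other than
! badwrite and AC are hypothetical.
MODULE badwrite
  IMPLICIT NONE
CONTAINS
  FUNCTION AC(m2) RESULT(c)
    INTEGER, INTENT(IN) :: m2
    REAL :: c              ! assumption: explicit REAL, same as implicit typing
    WRITE(*,*) m2          ! inner write, still present
    c = m2 + 3
  END FUNCTION AC
END MODULE badwrite

PROGRAM bugs_workaround
  USE badwrite
  IMPLICIT NONE
  REAL :: x
  x = AC(0)                ! inner write runs and completes here
  WRITE(*,*) '0: ', x      ! outer write no longer calls AC
END PROGRAM bugs_workaround
```

Saved as, say, workaround.f90, this should build the same way as the original, for example with gfortran -llapack workaround.f90.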

In conclusion, it does seem to be a gcc-fortran bug with a race condition, but what triggers it on this machine is beyond me. I'd rather not reinstall the whole system, which is otherwise working perfectly.

Any ideas? I'm going to boot a live version of Manjaro on this hardware to see if it's a weird CPU bug.


Source	RSS or Atom Feed
Feed Location	https://feeds.feedburner.com/linuxquestions/latest
Feed Title	LinuxQuestions.org
Feed Link	https://www.linuxquestions.org/questions/