Lock in libpthread occurs on only one Arch installation, and only with gcc-fortran
by mostlyharmless from LinuxQuestions.org on (#5AY7A)
So, I have an unusual problem that I first thought was a gfortran compiler bug. As far as I can tell, however, it is specific to one machine's Arch install, and I can't reproduce it on another machine with Debian or Manjaro, with the same kernel and compiler (else I'd report it on GCC bugzilla).
So I'm posting it in General; it's a puzzle!
Using either 5.7.x, 5.8.x, or 5.9.x kernels on Arch and GNU Fortran (GCC) 10.2.0, we have a program calling a function as part of a write statement, where the function also has a write statement.
Code:
PROGRAM bugs
USE badwrite
x=AC(0)
write(*,*) 'x: ',x ! this works
write (*,*) '0: ',AC(0) !this does not
STOP
END
MODULE badwrite
CONTAINS
function AC(m2) result(c)
INTEGER,INTENT(IN) :: m2
write(*,*) m2 !killer statement with lapack or other linked library
c = m2+3
end function AC
END MODULE badwrite
compiled with
gfortran -c -llapack badwrite.f90
gfortran -llapack badwrite.f90 bugs.f90
should result in
0
x: 3.00000000
0: 0
3.00000000
However, with the -llapack library (or the blas library, and possibly other external libraries) linked in,
the result is
0
x: 3.00000000
(program hangs here)
Adding the -ggdb flag, running in gdb and interrupting with ^C results in
Code:
Starting program: /home/me/build/bugs.lib90/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0
x: 3.00000000
^C
Program received signal SIGINT, Interrupt.
0x00007ffff6baddb0 in __lll_lock_wait () from /usr/lib/libpthread.so.0
Apparently the two writes are deadlocked?
Conditions:
(1) Only happens with an external library linked in, so
gfortran -c badwrite.f90
gfortran badwrite.f90 bugs.f90
runs normally without hanging
(2) Removing the write statement in the function AC also removes the hang.
(3) Separating the writes, as in the first statements with x=AC(0) followed by write(*,*) 'x: ',x, removes the deadlock/hang
(4) Another machine running the same kernel/gfortran version under Manjaro does not have the hang
(5) The problem does not occur with pgfortran (aka nvfortran) 20.7-0 LLVM on the machine in question.
(6) Changing kernels on the same machine does not solve the problem.
(7) Reinstalling packages with pacman and rebooting does not solve the problem.
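For anyone who wants to try the workaround from condition (3) in isolation, here is a minimal single-file sketch. The names badwrite_fixed and bugs_fixed are hypothetical, and the explicit declarations (IMPLICIT NONE, REAL :: c, REAL :: x) are my additions and assume the implicit REAL typing of the original. The idea is only that the function's write finishes before the outer write starts, so no function is called from inside an output list.

```fortran
! Sketch of the workaround in condition (3). Module and program names are
! hypothetical; the explicit REAL declarations are assumptions that match
! the implicit typing used in the original code.
MODULE badwrite_fixed
  IMPLICIT NONE
CONTAINS
  FUNCTION AC(m2) RESULT(c)
    INTEGER, INTENT(IN) :: m2
    REAL :: c
    WRITE(*,*) m2        ! the function still writes to standard output
    c = m2 + 3
  END FUNCTION AC
END MODULE badwrite_fixed

PROGRAM bugs_fixed
  USE badwrite_fixed
  IMPLICIT NONE
  REAL :: x
  x = AC(0)              ! the function's WRITE completes here
  WRITE(*,*) '0: ', x    ! no function call inside the output list
END PROGRAM bugs_fixed
```

Built with the same -llapack flag as above, this form corresponds to the first pair of statements in bugs.f90, which do not hang on the affected machine.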
In conclusion, it does seem to be a gcc-fortran bug with a race condition, but what triggers it on this machine is beyond me. I'd rather not reinstall the whole system, which is otherwise working perfectly.
Any ideas? I'm going to boot a live version of Manjaro on this hardware to see if it's a weird CPU bug.

