GlusterFS volumes fail to mount after client reboot

by penyuan from LinuxQuestions.org (#4QVTZ)
Hello,

For the past several months I've been running two GlusterFS volumes, with bricks on two nodes. These volumes have always mounted without trouble on a client system running RHEL 7 (kernel 3.10.0-1062.1.2.el7.x86_64). However, after a recent reboot of the client, the mounts fail. Here are the details:

Both GlusterFS nodes are running fully-updated Armbian Linux 4.14.144-odroidxu4, with glusterfs-server version 3.13.2-1ubuntu1. The two nodes are on the same LAN with IPs 10.0.2.4 and 10.0.2.5, with hostnames "alboguttata" and "verrucosa" respectively. They seem to be running fine as always, and collectively host two volumes called "cyclorana0" and "cyclorana1".
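
For reference, the hostname-to-IP mapping is just the following; I'd expect the client to need equivalent /etc/hosts entries (or DNS records) for these names to resolve, though I haven't re-checked that on the client yet:

Code:10.0.2.4    alboguttata
10.0.2.5    verrucosa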

Here's the output of the command "gluster volume info":

Code:Volume Name: cyclorana0
Type: Distribute
Volume ID: edbc9b23-6252-4725-9652-e46c280dae2b
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: alboguttata:/bricks/brick0
Brick2: verrucosa:/bricks/brick0
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

Volume Name: cyclorana1
Type: Distribute
Volume ID: 7687a336-8708-4b91-abdd-4ef35a8c31c9
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: verrucosa:/bricks/brick1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

Here's the output of "gluster volume status":

Code:Status of volume: cyclorana0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick alboguttata:/bricks/brick0 49152 0 Y 2238
Brick verrucosa:/bricks/brick0 49152 0 Y 1944

Task Status of Volume cyclorana0
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: cyclorana1
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick verrucosa:/bricks/brick1 49153 0 Y 1953

Task Status of Volume cyclorana1
------------------------------------------------------------------------------
There are no active volume tasks

The above reflects how I've set up the volumes and I don't see any problems.

Up until yesterday, I used /etc/fstab to automatically mount the two volumes on a RHEL 7-based client system at boot. They are mounted within the home directory of one of the system's local users; let's call that user [username] for now. There were no problems. Here are the fstab entries:

Code:10.0.2.5:/cyclorana0 /home/[username]/cyclorana0 glusterfs defaults,_netdev 0 0
10.0.2.5:/cyclorana1 /home/[username]/cyclorana1 glusterfs defaults,_netdev 0 0

I was able to read/write to these mounted volumes easily with acceptable performance. This continued to work after several reboots of this client system, which uses glusterfs packages version 3.12.2.
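
For reference, my understanding is that each of those fstab lines is equivalent to a manual mount along these lines (same server address and mount points as above, just invoked by hand):

Code:sudo mount -t glusterfs 10.0.2.5:/cyclorana0 /home/[username]/cyclorana0
sudo mount -t glusterfs 10.0.2.5:/cyclorana1 /home/[username]/cyclorana1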

However, after rebooting the same RHEL-based client yesterday, the two volumes failed to mount. Here's what I've found so far:

I tried to manually run "sudo mount -a" on the client only to see these errors:

Code:Mount failed. Please check the log file for more details.
Mount failed. Please check the log file for more details.

To my knowledge, the log files should be in /var/log/glusterfs, and indeed I found two logs, one for each volume, named "home-[username]-cyclorana*.log" where * is 0 or 1 depending on the volume. At the end of each log, I see entries corresponding to my failed attempt at "sudo mount -a". It looks like this:

Code:[2019-09-21 02:21:59.834507] I [MSGID: 100030] [glusterfsd.c:2646:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.12.2 (args: /usr/sbin/glusterfs --volfile-server=10.0.2.5 --volfile-id=/cyclorana1 /home/[username]/cyclorana1)
[2019-09-21 02:21:59.844494] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2019-09-21 02:21:59.863736] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-09-21 02:21:59.863931] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-09-21 02:21:59.873312] I [MSGID: 114020] [client.c:2361:notify] 0-cyclorana1-client-0: parent translators are ready, attempting connect on transport
[2019-09-21 02:21:59.883477] E [MSGID: 101075] [common-utils.c:482:gf_resolve_ip6] 0-resolver: getaddrinfo failed (family:2) (Name or service not known)
[2019-09-21 02:21:59.883561] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-cyclorana1-client-0: DNS resolution failed on host verrucosa
Final graph:
+------------------------------------------------------------------------------+
1: volume cyclorana1-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host verrucosa
5: option remote-subvolume /bricks/brick1
6: option transport-type socket
7: option transport.address-family inet
8: option transport.tcp-user-timeout 0
9: option transport.socket.keepalive-time 20
10: option transport.socket.keepalive-interval 2
11: option transport.socket.keepalive-count 9
12: option send-gids true
13: end-volume
14:
15: volume cyclorana1-dht
16: type cluster/distribute
17: option lock-migration off
18: subvolumes cyclorana1-client-0
19: end-volume
20:
21: volume cyclorana1-write-behind
22: type performance/write-behind
23: subvolumes cyclorana1-dht
24: end-volume
25:
26: volume cyclorana1-read-ahead
27: type performance/read-ahead
28: subvolumes cyclorana1-write-behind
29: end-volume
30:
31: volume cyclorana1-readdir-ahead
32: type performance/readdir-ahead
33: option parallel-readdir off
34: option rda-request-size 131072
35: option rda-cache-limit 10MB
36: subvolumes cyclorana1-read-ahead
37: end-volume
38:
39: volume cyclorana1-io-cache
40: type performance/io-cache
41: subvolumes cyclorana1-readdir-ahead
42: end-volume
43:
44: volume cyclorana1-quick-read
45: type performance/quick-read
46: subvolumes cyclorana1-io-cache
47: end-volume
48:
49: volume cyclorana1-open-behind
50: type performance/open-behind
51: subvolumes cyclorana1-quick-read
52: end-volume
53:
54: volume cyclorana1-md-cache
55: type performance/md-cache
56: subvolumes cyclorana1-open-behind
57: end-volume
58:
59: volume cyclorana1-io-threads
60: type performance/io-threads
61: subvolumes cyclorana1-md-cache
62: end-volume
63:
64: volume cyclorana1
65: type debug/io-stats
66: option log-level INFO
67: option latency-measurement off
68: option count-fop-hits off
69: subvolumes cyclorana1-io-threads
70: end-volume
71:
72: volume meta-autoload
73: type meta
74: subvolumes cyclorana1
75: end-volume
76:
+------------------------------------------------------------------------------+
[2019-09-21 02:21:59.887159] I [fuse-bridge.c:4915:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.23
[2019-09-21 02:21:59.887257] I [fuse-bridge.c:5548:fuse_graph_sync] 0-fuse: switched to graph 0
[2019-09-21 02:21:59.887755] E [fuse-bridge.c:4983:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2019-09-21 02:21:59.891025] W [fuse-bridge.c:1242:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2019-09-21 02:21:59.896995] W [fuse-bridge.c:1242:fuse_attr_cbk] 0-glusterfs-fuse: 3: LOOKUP() / => -1 (Transport endpoint is not connected)
[2019-09-21 02:21:59.905103] I [fuse-bridge.c:5822:fuse_thread_proc] 0-fuse: initating unmount of /home/[username]/cyclorana1
[2019-09-21 02:21:59.905610] W [glusterfsd.c:1462:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7ea5) [0x7f2075213ea5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x557ca6064d05] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x557ca6064b6b] ) 0-: received signum (15), shutting down
[2019-09-21 02:21:59.905659] I [fuse-bridge.c:6611:fini] 0-fuse: Unmounting '/home/[username]/cyclorana1'.
[2019-09-21 02:21:59.905685] I [fuse-bridge.c:6616:fini] 0-fuse: Closing fuse connection to '/home/[username]/cyclorana1'.

It's hard for me to read and parse this log, but from what I can tell, my RHEL client can "see" the volume but somehow can't mount it. The main problematic lines in the log seem to be these (tell me if I'm wrong):

Code:[2019-09-21 02:21:59.883477] E [MSGID: 101075] [common-utils.c:482:gf_resolve_ip6] 0-resolver: getaddrinfo failed (family:2) (Name or service not known)
[2019-09-21 02:21:59.883561] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-cyclorana1-client-0: DNS resolution failed on host verrucosa

I don't understand the "DNS resolution" error, since I'm connecting directly via its LAN IP and the RHEL client is on the same LAN.
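
My only guess so far: even though I point the mount at 10.0.2.5 directly, the volfile the server sends back refers to the brick by hostname (the "option remote-host verrucosa" line in the graph above), so presumably the client itself needs to be able to resolve that name. If that's right, something like the following on the client should show whether the names resolve at all (these are just checks I plan to run, not output I already have):

Code:getent hosts verrucosa
getent hosts alboguttata
grep -E 'alboguttata|verrucosa' /etc/hosts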

I also tried rebooting the two nodes, but got the same errors. Changing the server IP from 10.0.2.5 to 10.0.2.4 in the client's fstab didn't help either.
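
If more detail would help, I can also try mounting one volume by hand with verbose logging, something like the command below (assuming mount.glusterfs on this version still accepts the log-level and log-file mount options):

Code:sudo mount -t glusterfs -o log-level=DEBUG,log-file=/tmp/cyclorana0-debug.log 10.0.2.5:/cyclorana0 /home/[username]/cyclorana0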

Lastly, I have another fully-updated Manjaro Linux client on the same LAN, and trying to mount the two volumes on that client also failed. The log output looks similar but more verbose; I've pasted it in this pastebin. Oh, and the nodes and both clients can all ping each other with no problems.

I've tried to post all relevant details, output, and log entries that I can think of, but let me know if there's more info I can provide. What can I do to troubleshoot this problem? Thank you.

P.S. Full disclosure: I've cross-posted this to ServerFault, hope that's OK?