BTRFS 2-disk array with data=single: slow writes on first disk.
by BoraxMan from LinuxQuestions.org on (#5K18C)
I have a VANTEC NexStar 2-disk RAID enclosure with 2 x 2TB WD Green disk drives in it. The enclosure is connected via USB3 and is set to "individual mode", so that each disk presents itself as a separate disk to the OS, which is GNU/Linux (Fedora).
On the disks is a BTRFS filesystem with metadata set as RAID1 and data as SINGLE. This has worked well for years, but the problem is that I now get slow write speeds, specifically on one disk. Write speeds are typically 5-15MB/s. The other disk is faster, at about 50-70MB/s but still not as fast as what the disks can do. Reads occur at full speed, about 170MB/s from both disks.
When BTRFS writes a 1G chunk to one disk, it averages about 10-15MB/s; on the other disk it averages about 5x that speed. Writes then alternate back and forth between the two. It's always the same disk POSITION which is slower: the disk in the first position. I have even tried swapping the disks in the enclosure and the symptoms don't change. I have also replaced the two disks with 2 x 500G disks, formatted them with data=single, metadata=raid1, and that BTRFS filesystem in the same enclosure worked with fast writes. This seems to eliminate the enclosure/USB connection as the cause. We can also rule out a single faulty disk, as the slowdown follows the disk order rather than the disk.
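To put a rough number on what this alternation costs overall: if equal-sized chunks are written one after the other at different speeds, the average speed is the harmonic mean of the per-chunk speeds, not the arithmetic mean. The sketch below is only an illustration; the function name is made up, and the two input speeds are simply the midpoints of the ranges quoted above (12.5MB/s and 60MB/s), not separate measurements.

```python
def effective_write_speed(chunk_speeds_mb_s):
    """Average MB/s when equal-sized chunks are written back to back
    at the given per-chunk speeds (harmonic mean)."""
    if not chunk_speeds_mb_s or any(s <= 0 for s in chunk_speeds_mb_s):
        raise ValueError("speeds must be positive")
    # Time per chunk is size / speed; with equal sizes the size cancels out.
    return len(chunk_speeds_mb_s) / sum(1.0 / s for s in chunk_speeds_mb_s)


# Assumed inputs: midpoints of the 10-15MB/s and 50-70MB/s ranges.
slow, fast = 12.5, 60.0
print(f"{effective_write_speed([slow, fast]):.1f} MB/s")  # prints "20.7 MB/s"
```

So even though one disk is reasonably quick, the slow disk dominates and a long sequential write would be expected to average only around 20MB/s.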
When the filesystem was first created, writing files was fast, close to what the disks could theoretically do, but at some point the performance suddenly dropped to this level. Even more mysterious, the problem resolved itself for no apparent reason for a while several months ago, and has now returned.
I have tried various mount options (nossd, nobarrier, turning off checksums) and a partial balance (not a full one), but nothing has made a difference so far. I have verified that the disks are OK, and hdparm -Tt returns 182MB/s for both disks. Using the eSATA connection instead of USB3 makes no difference either.
I am using kernel 5.11.7 (I run a custom kernel), but the problem also occurred with previous kernels and persists with the stock Fedora kernel, version 5.10.20.
There are no errors in dmesg, and btrfs-fsck reports nothing unusual. I am wondering how I might start troubleshooting this to find where the fault lies, or at what point this slowdown is being introduced.
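One possible starting point is to log per-device write throughput during a large copy, so the slow and fast phases can be lined up against which disk is being written. The following is a minimal sketch, not a finished tool. It reads the "sectors written" counter from /proc/diskstats (the tenth whitespace-separated field on each line, always counted in 512-byte units). The function names are made up for this example, and the device names sdb and sdc in the usage comment are placeholders for whatever the two disks actually appear as.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors


def parse_sectors_written(diskstats_text, devices):
    """Return {device: sectors_written} for the named devices."""
    written = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields[2] is the device name, fields[9] is sectors written.
        if len(fields) >= 10 and fields[2] in devices:
            written[fields[2]] = int(fields[9])
    return written


def write_rates_mb_s(before, after, seconds):
    """Turn two counter snapshots into MB/s per device."""
    return {
        dev: (after[dev] - before[dev]) * SECTOR_BYTES / seconds / 1e6
        for dev in before
        if dev in after
    }


def watch(devices, interval=5.0, samples=12, path="/proc/diskstats"):
    """Print per-device write throughput once per interval."""
    with open(path) as f:
        prev = parse_sectors_written(f.read(), devices)
    prev_time = time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = parse_sectors_written(f.read(), devices)
        now = time.monotonic()
        rates = write_rates_mb_s(prev, cur, now - prev_time)
        print("  ".join(f"{d}: {r:6.1f} MB/s" for d, r in sorted(rates.items())))
        prev, prev_time = cur, now


# Example (placeholder device names, run while a large file is being copied):
# watch({"sdb", "sdc"})
```

If the slow phases always coincide with writes to the same device node regardless of which physical disk sits in that slot, that would at least confirm the position-dependent pattern from the kernel's point of view.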