writing to disk without flooding kernel buffers
by Skaperen from LinuxQuestions.org on (#59WCT)
when i write a large run of blocks to a disk partition (a few GB up to many GB), it ends up flooding kernel buffers, usually causing lots of dirty pages to be swapped out, which contributes to seek thrashing on the disk that holds swap. so i have been looking at ways to deal with that. i have tried using O_DIRECT and that seems to work, but performance seems to be reduced, and in some cases the reduction is drastic. i suspect this is because, after one write finishes on the physical device, the next physical write cannot start until the userland process has made a complete round trip and issued another write call. that is true even if the process acquired the next block of data concurrently with the previous write and has it ready to go.
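a minimal sketch of the single-writer O_DIRECT loop described above, for reference. the names open_direct and write_blocks_direct, the 4096-byte ALIGN value, and the memset filler standing in for real data are all assumptions made for illustration, not anything from the original program. O_DIRECT needs the buffer address, the transfer size, and the file offset aligned (typically to the device's logical block size), so the buffer comes from posix_memalign. the sketch retries without O_DIRECT when open fails with EINVAL, since some filesystems reject the flag.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* headers did not expose it: plain buffered writes */
#endif

#define ALIGN 4096             /* assumed alignment; must suit the target device */

/* Open path for writing with O_DIRECT; retry without it on EINVAL. */
int open_direct(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_WRONLY | O_CREAT, 0600);
    return fd;
}

/* Write nblocks blocks of blksz bytes one after another.
 * Each write() returns only after that block has been handed to the device,
 * so the next block is not submitted until userland loops around again.
 * Returns 0 on success, -1 on any error. */
int write_blocks_direct(const char *path, size_t blksz, size_t nblocks)
{
    assert(path != NULL);
    void *buf;
    if (blksz == 0 || blksz % ALIGN != 0)
        return -1;
    if (posix_memalign(&buf, ALIGN, blksz) != 0)
        return -1;

    int fd = open_direct(path);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    int rc = 0;
    for (size_t i = 0; i < nblocks && rc == 0; i++) {
        memset(buf, (int)(i & 0xff), blksz);      /* stand-in for real data */
        if (write(fd, buf, blksz) != (ssize_t)blksz)
            rc = -1;
    }
    if (close(fd) != 0)
        rc = -1;
    free(buf);
    return rc;
}
```

calling write_blocks_direct("/dev/sdXN", 1 << 20, 4096) would write 4 GiB in 1 MiB blocks; the device path here is a placeholder, and the call overwrites whatever the target holds.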
i am wondering how the kernel handles write completion. i assume that when it is writing out dirty buffers and one write completes, it starts the next one about as fast as its interrupt handling allows. what i would like to know is this: if one process writes a block via an O_DIRECT fd, and another process writes the very next block via its own, different O_DIRECT fd, will the kernel start the 2nd write just as fast (that is, faster than it could if it had to wait for a userland write call)?
my thought is for my program to start 2 or more threads, or fork 2 or more child processes, and have them interleave their write calls so the kernel always has one in the queue ready to go when the physical write completes. it would use pwrite to interleave. the parent would read into a shared buffer and send something (to be decided) to the writers to indicate what is ready.
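a rough sketch of the forked-writers idea, under several assumptions. the names interleaved_write, writer_child, MAX_WRITERS, and ALIGN are hypothetical. the whole buffer is filled before fork, so each child simply inherits it and no readiness signalling is needed; the "to be decided" notification and a truly shared buffer (for example one from mmap with MAP_SHARED) are left out. each child opens its own O_DIRECT fd and uses pwrite on every nwriters-th block. whether this actually keeps the device queue full is the open question above, and the sketch does not answer it.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* headers did not expose it: plain buffered writes */
#endif

#define ALIGN 4096             /* assumed alignment; must suit the target device */
#define MAX_WRITERS 16         /* arbitrary cap for this sketch */

/* Child body: write blocks id, id + nwriters, id + 2*nwriters, ...
 * through this child's own fd. Returns 0 on success, 1 on error. */
static int writer_child(const char *path, const char *data, size_t blksz,
                        size_t nblocks, int id, int nwriters)
{
    int fd = open(path, O_WRONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_WRONLY);                /* filesystem rejected O_DIRECT */
    if (fd < 0)
        return 1;
    for (size_t i = (size_t)id; i < nblocks; i += (size_t)nwriters) {
        off_t off = (off_t)(i * blksz);
        if (pwrite(fd, data + off, blksz, off) != (ssize_t)blksz) {
            close(fd);
            return 1;
        }
    }
    return close(fd) != 0;
}

/* Fork nwriters children that together write nblocks blocks of blksz bytes
 * from data (which must be ALIGN-aligned) to path.
 * Returns 0 if every child succeeded, -1 otherwise. */
int interleaved_write(const char *path, const void *data, size_t blksz,
                      size_t nblocks, int nwriters)
{
    assert(nwriters > 0 && nwriters <= MAX_WRITERS);
    if (blksz == 0 || blksz % ALIGN != 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT, 0600); /* make sure the target exists */
    if (fd < 0)
        return -1;
    close(fd);

    pid_t pids[MAX_WRITERS];
    int started = 0, rc = 0;
    for (int id = 0; id < nwriters; id++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(writer_child(path, data, blksz, nblocks, id, nwriters));
        if (pid < 0) {
            rc = -1;
            break;
        }
        pids[started++] = pid;
    }
    for (int k = 0; k < started; k++) {           /* reap every child that started */
        int status;
        if (waitpid(pids[k], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = -1;
    }
    return rc;
}
```

the children run independently, so nothing here forces the writes to reach the device in block order; pwrite with explicit offsets is what keeps the result correct regardless of which child runs first.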

