Costa: Designing a Userspace Disk I/O Scheduler for Modern Datastores: the Scylla example (Part 1)
Over at the Scylla blog, Glauber Costa looks at why a high-performance datastore application might want to do its own I/O scheduling. "If one is using a threaded approach for managing I/O, a thread can be assigned to a different priority group by tools such as ionice. However, ionice only allows us to choose between general concepts like real-time, best-effort and idle. And while Linux will try to preserve fairness among the different actors, that doesn't allow any fine tuning to take place. Dividing bandwidth among users is a common task in network processing, but it is usually not possible with disk I/O without resorting to infrastructure like cgroups.More importantly, modern designs like the Seastar framework used by Scylla to build its infrastructure may stay away from threads in favor of a thread-per-core design in the search for better scalability. In the light of these considerations, can a userspace application like Scylla somehow guarantee that all actors are served according to the priorities we would want them to obey?"