The kernel's memory-management subsystem is optimized for the sharing of resources to the greatest extent possible. But, as Pasha Tatashin pointed out during a memory-management session at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, a lot of memory has a single owner and will never be shared. He presented some ideas for optimizing the management of that memory to a somewhat skeptical crowd.
Security updates have been issued by Debian (sniproxy), Fedora (c-ares), Oracle (apr-util, curl, emacs, git, go-toolset and golang, go-toolset:ol8, gssntlmssp, libreswan, mysql:8.0, thunderbird, and webkit2gtk3), Red Hat (go-toolset-1.19 and go-toolset-1.19-golang and go-toolset:rhel8), Slackware (ntfs), SUSE (rmt-server), and Ubuntu (linux-raspi, linux-raspi-5.4 and python-django).
Over on the Collabora blog, Marius Vlad looks at the Weston 12.0 release. Weston is the reference compositor for the Wayland project. The highlights include two new backends and support for multiple scanout devices, along with "multiple fixes and internal changes that would further facilitate integration of functionality like color management or the ability to load up multiple backends at the same time".
Issues around zoned storage for filesystems were the topic of a combined storage and filesystem session at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit led by Bart Van Assche, Viacheslav A. Dubeyko, and Naohiro Aota. Zoned storage began with the advent of shingled magnetic recording (SMR) devices, but is now implemented by NVMe zoned namespaces (ZNS) as well. SMR devices can have multiple zones with different characteristics, with some zones that can only be written in sequential order, while other, conventional zones can be written in any order. The talk was focused on filesystems using the sequential type of zones since the conventional zones are already well-supported in Linux and its filesystems.
The conversion to folios is intended to, among other things, make it easy for the kernel to manage chunks of memory in a number of different sizes. So far, though, that flexibility is not being used in the kernel's handling of anonymous pages. At the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Yu Zhao and Yang Shi ran a session in the memory-management track aimed at charting a path toward support for anonymous pages in a variety of sizes.
Security updates have been issued by Debian (python2.7), Fedora (maradns), Red Hat (devtoolset-12-binutils, go-toolset and golang, httpd24-httpd, jenkins and jenkins-2-plugins, rh-ruby27-ruby, and sudo), Scientific Linux (git), Slackware (texlive), SUSE (cups-filters, poppler, texlive, distribution, golang-github-vpenso-prometheus_slurm_exporter, kubernetes1.18, kubernetes1.23, openvswitch, rmt-server, and ucode-intel), and Ubuntu (ca-certificates, calamares-settings-ubuntu, Jhead, libhtml-stripscripts-perl, and postgresql-10, postgresql-12, postgresql-14, postgresql-15).
It is, it seems, a week of Python Package Index (PyPI) news. On the PyPI blog, Director of Infrastructure at the Python Software Foundation (PSF), Ee Durbin, has posted an admirably detailed description of the organization's response to three subpoenas it received for PyPI user information in March and April. The requests for information were quite broad and the PSF did produce the requested material (to the extent possible), which involved five PyPI user accounts, under the advice of counsel.
Greg Kroah-Hartman has released the 6.3.4, 6.1.30, and 5.15.113 stable kernels. They each contain a large group of important fixes throughout the kernel tree.
Amir Goldstein kicked off a session on monitoring mounts at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit. In particular, there are problems when trying to efficiently monitor "a very large number of mounts in a mount namespace"; some user-space programs need an accurate view of the mount tree without having to constantly parse /proc/mounts or the like. There are a number of questions to be answered, including what the API should look like and what entity should be watched in order to get notifications of new mount operations.
Security updates have been issued by Debian (libssh and sofia-sip), Fedora (cups-filters, dokuwiki, qt5-qtbase, and vim), Oracle (git, python-pip, and python3-setuptools), Red Hat (git, kernel, kpatch-patch, rh-git227-git, and sudo), SUSE (openvswitch, rmt-server, and texlive), and Ubuntu (binutils, cinder, cloud-init, firefox, golang-1.13, Jhead, liblouis, ncurses, node-json-schema, node-xmldom, nova, python-glance-store, python-os-brick, and runc).
In the filesystem track of the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Amir Goldstein led a session on using fanotify for hierarchical storage management (HSM). Linux had some support for HSM in the XFS filesystem's implementation of the data management API (DMAPI), but that code was removed back in 2010. Goldstein has done some work on using fanotify for HSM features, but he has run into some problems with deadlocks that he wanted to discuss with attendees.
A complete stack trace is needed for a number of debugging and optimization tasks, but getting such traces reliably can be surprisingly challenging. At the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Steve Rostedt and Indu Bhagat described a mechanism called SFrame that enables the creation of reliable user-space stack traces in the kernel without the memory and run-time overhead of some other solutions.
The kernel developers try hard to avoid duplicating functionality in the kernel, which is enough of a challenge to maintain as it is. So it has often seemed out of character for the kernel to support three different slab allocators (called SLAB, SLOB, and SLUB), all of which handle the management of small memory allocations in similar ways. At the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, slab maintainer Vlastimil Babka updated the group on progress toward the goal of reducing the number of slab allocators in the kernel and gave an overview of what to expect in that area.
The kernel's swapping code tends to not get much love. Users try to avoid it, and developers often find better things to do with their time than trying to improve it. At the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, though, Yosry Ahmed dedicated a memory-management-track session to the problem of the swap layer and what might be done to make it better.
Security updates have been issued by Debian (cups-filters, imagemagick, libwebp, sqlite, and texlive-bin), Fedora (chromium and vim), Gentoo (librecad, mediawiki, modsecurity-crs, snakeyaml, and tinyproxy), Mageia (apache-mod_security, cmark, dmidecode, freetype2, glib2.0, libssh, patchelf, python-sqlparse, sniproxy, suricata, and webkit2), Oracle (apr-util and firefox), Red Hat (git), SUSE (containerd, openvswitch, python-Flask, runc, terraform-provider-aws, and terraform-provider-null), and Ubuntu (tar).
Memory control groups (or "memcgs") allow an administrator to manage the memory resources given to the processes running on a system. Often, though, memcgs seem to have memory-use problems of their own, and that has made them into a recurring Linux Storage, Filesystem, and Memory-Management Summit topic since at least 2019. The topic returned at the 2023 event with a focus on the handling of shared, anonymous memory. The quirks associated with this memory type, it seems, can subject systems to an unpleasant sort of zombie invasion; a session in the memory-management track led by T.J. Mercier, Yosry Ahmed, and Chris Li discussed possible solutions.
Bernd Schubert led a session at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit on the intersection of FUSE and io_uring. He works for DDN Storage, which is using FUSE for two network-storage products; he has found FUSE to be a bottleneck for those filesystems. That could perhaps be improved by using io_uring, which is something he has been working on and wanted to discuss.
The "scatterlist" is a core-kernel data structure used to describe DMA I/O operations from the point of view of both the CPU and the peripheral device. Over the years, the shortcomings of scatterlists have become more apparent, but there has not been a viable replacement on the horizon. During a memory-management session at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Jason Gunthorpe described a possible alternative, known variously as "phyr", "physr", or "rlist", that might improve on scatterlists for at least some use cases.
Memory management is tricky enough on its own, but virtualization adds another twist: now there are two kernels (host and guest) managing the same memory. This duplicated effort can be wasteful if not implemented carefully, so it is not surprising that a lot of effort, from both hardware and software developers, has gone into this problem. As Pasha Tatashin pointed out during a memory-management-track session at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, though, there are still ways in which these systems run less efficiently than they could. He has put some effort into improving that situation.
Security updates have been issued by Fedora (cups-filters, kitty, mingw-LibRaw, nispor, rust-ybaas, and rust-yubibomb), Mageia (kernel-linus), Red Hat (jenkins and jenkins-2-plugins), SUSE (openvswitch and ucode-intel), and Ubuntu (linux-azure, linux-azure-4.15, linux-gcp, linux-gcp-5.15, linux-gke, linux-gke-5.15, linux-gkeop, linux-oracle-5.15, linux-ibm, linux-oracle, and linux-oem-6.0).
Joel Fernandes introduced himself to the memory-management track at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit as a co-maintainer of the read-copy-update (RCU) subsystem and an implementer of the "lazy RCU" functionality. Lazy RCU can improve performance, especially on systems that are not heavily utilized, but it also has some implications for memory management that he wanted to discuss with the group.
The memory-management subsystem has the unenviable task of trying to predict which pages of memory will be needed in the near future. Since predictions tend to be difficult, the code relies heavily on the heuristic that memory used in the recent past is likely to be used again in the near future. However, even knowing which memory has been recently used can be a challenge. At the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Aneesh Kumar and Wei Xu, both presenting remotely, discussed some ways to use the increasingly capable hardware counters that are provided by current and upcoming CPUs.
The buffer head is a kernel data structure that dates back to the first Linux release; for much of the time since then, kernel developers have been hoping to get rid of it. Hannes Reinecke started a plenary session at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit by saying that everybody agrees that buffer heads are a bad idea, but there is less agreement on how to take them out of the kernel. The core functionality they provide (facilitating sector-size I/O operations to a block device underlying a filesystem) must be provided somehow.
When OpenAI made its chatbot ChatGPT available to the public in November 2022, it immediately became a hit. However, despite the company's name, the underlying algorithm isn't open. Furthermore, ChatGPT users require a connection to OpenAI's cloud service and face usage restrictions. In the meantime, several open-source or freely available alternatives have emerged, with some even able to run on consumer hardware. Although they can't match ChatGPT's performance yet, rapid advancements are occurring in this field, to the extent that some people at the companies developing these artificial intelligence (AI) models have begun to worry.
Version 2.39 of the util-linux tool collection has been released. The most significant change, perhaps, is support for the new filesystem-mounting API, which enables a number of new features, including ID-mapped mounts.
There are some filesystems that use the Filesystem in Userspace (FUSE) framework but only to provide a different view of an underlying filesystem, such as different file metadata, a changed directory hierarchy, or other changes of that sort. The read-only filtered filesystem, which simply filters the view of which files are available, is one example; the file data could come directly from the underlying filesystem, but currently needs to traverse the FUSE user-space server process. Finding a way to bypass the server, so that the file I/O operations go directly from the application to the underlying filesystem, would be beneficial. In a filesystem session at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Miklos Szeredi wanted to explore different options for adding such a mechanism, which was referred to as a "FUSE passthrough", though "bypass" might be a better alternative.
The conversion of the kernel's memory-management subsystem over to folios was never going to be done in a day. At a plenary session at the start of the second day of the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Matthew Wilcox discussed the current state and future direction of this work. Quite a lot of progress has been made, and a lot of work remains to be done.
A new development in the NVMe world was the subject of a combined storage and filesystem session led by Stephen Bates at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit. Computational storage namespaces will allow NVMe devices to offer various types of computation (anything from simple compression through complex queries and data manipulations) to be performed on the data stored on the device.
The use of huge pages can make memory management more efficient in a number of ways, but it can also impose costs in the form of internal fragmentation and I/O amplification. At the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, James Houghton ran a session on a scheme to get the best of both worlds: using huge pages while maintaining base-page mappings within them.
The 6.3.3, 6.2.16, 6.1.29, 5.15.112, 5.10.180, 5.4.243, 4.19.283, and 4.14.315 stable kernels have all been released; each contains another set of important fixes. Note that 6.2.16 will be the final update for the 6.2 kernel.
In a plenary session on the first day of the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Stephen Bates led a discussion about peer-to-peer DMA (P2PDMA). The idea is to remove the host system's participation in a transfer of data from one PCIe-connected device to another. The feature was originally aimed at NVMe SSDs so that data could simply be copied directly to and from the storage device without needing to move it to system memory and then from there to somewhere else.
DAMON is a framework that allows user space to influence and control the kernel's memory-management operations. It first entered the kernel with the 5.15 release, and has been gaining capabilities ever since. At the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, DAMON author Seongjae Park provided an overview of the current status of DAMON development and where it can be expected to go in the near future.
In a remotely presented, memory-management-track session at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Frank van der Linden pointed out that the line dividing resources controlled by the kernel from those managed by user space has moved back and forth over the years. He is currently interested in making it possible for user space to take more control over the management of memory resources. A proposal was discussed in general terms, but it will require some real scrutiny on its way toward the mainline, if it ever gets there.
Sourceware.org, which has long played host to many important projects, has announced that it has become a member project of the Software Freedom Conservancy, a move that has been in the works for some time.
Overcommitting memory is a longstanding tradition in the Linux world (and beyond); it is rare that an application uses all of the memory allocated to it, so overcommitting can help to improve overall memory utilization. In situations where memory has been overcommitted, though, it may be necessary to respond quickly to ensure that applications have the memory they actually need, even when those needs change. At the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, T.J. Alumbaugh (in the room) and Yuanchu Xie (remotely) presented a new mechanism intended to help hosts provide containerized guests with the memory resources they need.
Virtual-machine hosting can be a fickle business; once a virtual machine has been placed on a physical host, there may arise a desire to move it to a different host. The problem with migrating virtual machines, though, is that there is a period during which the machine is not running; that can be disruptive even if it is brief. At the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, Dragan Stancevic, presenting remotely, showed how CXL shared memory can be used to migrate virtual machines with no offline time.
Security updates have been issued by Debian (golang-websocket, kernel, postgresql-11, and thunderbird), Fedora (firefox, kernel, libreswan, libssh, tcpreplay, and thunderbird), SUSE (dcmtk, gradle, libraw, postgresql12, postgresql13, postgresql14, and postgresql15), and Ubuntu (firefox, nova, and thunderbird).
The second 6.4 kernel prepatch is out for testing. "This being rc2, it's been a fairly calm week as people are only starting to find any issues from the merge window, but it all looks fine."