One of the long-term goals of the folio conversion in the kernel's memory-management subsystem is the replacement of the page structure, which describes a page of physical memory, with an eight-byte "memory descriptor". This change would reduce the overhead of tracking physical memory, increase type safety, and make memory management more flexible. Thus far, though, details on what the memory-descriptor future will look like have been relatively scarce. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Matthew Wilcox led a discussion to try to fill in the picture somewhat.
Security updates have been issued by Debian (apache2, bluez, chromium, fossil, libreoffice, python-pymysql, redmine, and ruby-rack), Fedora (buildah, crosswords, dotnet7.0, glycin-loaders, gnome-tour, helix, helvum, libipuz, loupe, maturin, mingw-libxml2, ntpd-rs, perl-Email-MIME, and a huge list of Rust-based packages due to a "mini-mass-rebuild" that updated the toolchain to Rust 1.78 and picked up fixes for various pieces), Mageia (chromium-browser-stable, mariadb, and roundcubemail), Oracle (kernel, libreoffice, nodejs, and tomcat), and SUSE (cJSON, libfastjson, opera, postgresql15, python3, and qt6-networkauth).
Linus Torvalds released 6.10-rc1 and closed the 6.10 merge window on May 26. By that time, 11,534 non-merge changesets had been pulled into the mainline for the next release; nearly 5,000 of those came in after "The first half of the 6.10 merge window" was written. While the latter half of the merge window tends to focus more on fixes, there was also a lot of new functionality that landed during this time.
The maple tree data structure was added during the 6.1 development cycle; since then, it has taken its place at the core of the kernel's memory-management subsystem. Unsurprisingly, work on maple trees is not yet done. Maple-tree maintainer Liam Howlett ran a session in the memory-management track of the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit to discuss the current state of the maple tree and which features can be expected next.
Linus has released 6.10-rc1 and closed the merge window for this release. For reasons that have not been spelled out, the codename for the release has been changed to "Baby Opossum Posse".
The 6.9.2, 6.8.11, 6.6.32, 6.1.92, 5.15.160, 5.10.218, 5.4.277, and 4.19.315 stable kernel updates have all been released. Each contains an important set of fixes. Users of those kernels should upgrade.
Using huge pages has been known for years to improve the performance of many workloads. But traditional huge pages, often sized by the CPU at 2MB, can be difficult to allocate and can waste memory due to internal fragmentation. Driven by both the folio transition and hardware improvements, attention to smaller, multi-size transparent huge pages (mTHPs) has been on the rise. In two memory-management-track sessions at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, developers discussed the kernel's ability to reliably allocate mTHPs and the performance gains that result.
John Garry and Ted Ts'o led a discussion about supporting atomic writes for buffered I/O, without any torn (or partial) writes to the device, at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit. It is something of a continuation of a discussion at last year's summit. The goal is to help PostgreSQL, which writes its data using 16KB buffered I/O; it currently has to do a lot of extra work to ensure that its data is safe on disk. A promise of non-torn, 16KB buffered writes would allow the database to avoid doing double writes.
The original Linux kernel, posted in 1991, ran on a system with a 4KB page size. Over 30 years later, most of us are still running on systems with 4KB pages, even though the amount of installed memory has grown by a few orders of magnitude. It is generally accepted that using large page sizes results in better performance for most applications, but allocating larger pages is often difficult. During a memory-management session at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Yu Zhao presented his ideas on improving the allocation of huge pages in the kernel.
Kui-Feng Lee spoke early in the BPF track at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit about some of the recent improvements to BPF. These changes were largely driven by the sched_ext work that David Vernet had covered in the previous talk. Lee focused on changes relevant to struct_ops programs, but several of those changes apply to all BPF programs.
With the release of Fedora 40 it's time to start looking ahead to what Fedora 41 has in store. One of the largest changes planned for the next release is a switch to DNF5, a C++ rewrite of the DNF package manager. A previous attempt to make the switch, during the Fedora 39 cycle, was called off, and deferred to Fedora 41. The developers have had nearly a year to address compatibility problems and bring DNF5 to a state suitable to replace DNF4. Signs point to a successful switch in the upcoming release, though there may be a few surprises lurking for Fedora users.
The kernel contains a pair of related filesystems that, among other things, can be used for shared-memory applications; shmem is an internal mechanism used within the kernel, while the tmpfs filesystem is mounted and accessible from user space. As is the case elsewhere in the kernel, these subsystems would benefit from the addition of large-folio support. During a joint storage, filesystem, and memory-management session at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Daniel Gomez talked about the work he is doing to add that support.
Security updates have been issued by Fedora (chromium, libreoffice, and thunderbird), Red Hat (.NET 7.0, .NET 8.0, gdk-pixbuf2, git-lfs, glibc, python3, and xorg-x11-server-Xwayland), SUSE (firefox, opensc, and ucode-intel), and Ubuntu (cjson and gnome-remote-desktop).
Swapping may be a memory-management technique at its core, but its implementation also involves the kernel's filesystem and storage layers. So it is not surprising that a session on the kernel's swap abstraction layer, led by Chris Li at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, was held jointly by all three of those tracks. Li has some ambitious ideas for an improved subsystem, but getting to a workable implementation may not be easy.
David Vernet's second talk at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit was a summary of the state of sched_ext, the extensible BPF scheduler that LWN covered in early May. In short, sched_ext is intended as a platform for rapid experimentation with schedulers, and a tool to let performance-minded administrators customize the scheduler to their workload. The patch set has seen several revisions, becoming more generic and powerful over time. Vernet spoke about what has been done in the past year, and what is still missing before sched_ext can be considered pretty much complete.
The KDE Project has announced the release of KDE Gear 24.05.0, with new features and updates for the more than 200 applications that are part of the project. In addition to new versions of the Dolphin file manager, Kdenlive video editor, and Elisa music player, this release includes five applications new to KDE Gear: the Audex CD-ripper application, an application Accessibility Inspector, the Francis Pomodoro timer, Kalm to teach breathing techniques, and a Sokoban-like game called Skladnik. See the full changelog for a complete list of changes.
Almost immediately after the merging of control groups, kernel developers set their sights on reimplementing them properly. The second version of the control-group API started trickling into the kernel around the 3.16 release in 2014 and users have long since been encouraged to migrate, but support for (and users of) the initial API remain. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, memory-management developers discussed whether (and when) it might be possible to remove the version-1 memory controller. The session was led by Shakeel Butt and (participating remotely) Roman Gushchin.
In a combined storage and filesystem session at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, Luis Chamberlain led a discussion on filesystem support for block sizes larger than the usual 4KB page size, which followed up on discussion from last year. While the session was meant to look at the intersection of larger block sizes with atomic block writes that avoid torn (partial) writes (which was also discussed last year), it mostly focused on the filesystem side. Over time, the block sizes offered by storage devices have risen from the original 512 bytes; Chamberlain wanted to discuss filesystem support for block sizes larger than 4KB.
The term "memory model" is used in a couple of ways within the kernel. Perhaps the more obscure meaning is the memory-management subsystem's view of how physical memory is organized on a given system. A proper representation of physical memory will be more efficient in terms of memory and CPU use. Since hardware comes in numerous variations, the kernel supports a number of memory models to match; see this article for details. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Oscar Salvador, presenting remotely, made the case for removing one of those models.
Compute Express Link (CXL) is a data-center-oriented memory solution that, according to some in the industry, will yield large cost savings and performance improvements. Others are more skeptical. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, two sessions covered CXL and how it will be supported in future kernels.
For every page of memory in the system, the kernel maintains a set of page flags describing how the page is used and various aspects of its current state. Space for page flags has been in chronic short supply, leading to a desire to eliminate or consolidate them whenever possible. That objective, though, is hampered by the fact that the purpose of many page flags is not well understood. In a memory-management-track session at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Matthew Wilcox set out to cooperatively update the page-flag documentation to improve that situation.
The problem of sharing page tables across processes has been discussed numerous times over the years, Khalid Aziz said at the beginning of his 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit session on the topic. He was there to, once again, talk about the proposed mshare() system call (which, in its current form, is no longer actually a system call but the feature still goes by that name) and to see what can be done to finally get it into the mainline.
The kernel's hugetlbfs subsystem was the first mechanism by which the kernel made huge pages available to user space; it was added to the 2.5.46 development kernel in 2002. While hugetlbfs remains useful, it is also viewed as a sort of second memory-management subsystem that would be best unified with the rest of the kernel. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Peter Xu raised the question of what that unification would involve and what the first steps might be.
KeePassXC is an open-source (GPLv3), cross-platform password manager with local-only data storage. The project comes with a number of build options that can be used to toggle optional features, such as browser integration and password database sharing. However, controversy ensued when Debian Developer Julian Klode decided to make use of these compile flags to disable these features to improve security in the keepassxc package uploaded to Debian unstable for the upcoming Debian 13 ("Trixie") release.
The 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit was a development conference, where discussion was prioritized and presentations with a lot of slides were discouraged. Paul McKenney seemingly flouted this convention in a joint session of the storage, filesystem, and memory-management tracks where he presented about 50 slides - in five minutes, twice. The subject was the use of the read-copy-update (RCU) mechanism in the memory-reclaim process, and whether changes to RCU would be needed for that purpose.
Version 3.20.0 of the Alpine Linux distribution has been released with initial support for 64-bit RISC-V. Other important changes include updates to GNOME 46, KDE Plasma 6, and replacing Redis with Valkey due to Redis's adoption of a non-free license model. See the release notes for more on this release.
Looking up a virtual memory area (VMA) in a process's address space, for the handling of page faults or any of a number of other tasks, in multi-threaded processes has long been bedeviled by lock contention in the kernel. As a result, developer gatherings have been subjected to many sessions on how to improve the situation. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, developers in the memory-management track met, in a session led by Liam Howlett, to talk about a situation that has improved considerably in recent times, but which still offers opportunities for optimization.
Security updates have been issued by Debian (webkit2gtk), Fedora (kernel), Mageia (chromium-browser-stable, djvulibre, gdk-pixbuf2.0, nss & firefox, postgresql15 & postgresql13, python-pymongo, python-sqlparse, stb, thunderbird, and vim), Red Hat (go-toolset:rhel8, nodejs, and varnish:6), SUSE (gitui, glibc, and kernel), and Ubuntu (libspreadsheet-parseexcel-perl, linux-aws, linux-aws-5.15, linux-gke, linux-gcp, python-idna, and thunderbird).
Vineeth Pillai gave a remote talk at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit explaining how BPF could be used to improve the performance of virtual machines (VMs). Pillai has a patch set designed to let guest and host machines share scheduling information in order to eliminate some of the overhead of running in a VM. The assembled developers had several comments on the design, but seemed overall to approve of the prospect.
Brendan Jackman started his memory-management-track session at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit by saying that, for some years now, the kernel community has been stuck in a reactive posture with regard to hardware vulnerabilities. Each problem shows up with its own scary name, and kernel developers find a way to mitigate it, usually losing performance in the process. Jackman said that it is time to take back the initiative against these vulnerabilities by reconsidering the more general use of address-space isolation.
Optimizing the kernel's memory use is made much easier if developers have an accurate idea of how memory is being used, but the kernel's instrumentation is not as good as it could be. When Suren Baghdasaryan and Kent Overstreet presented their memory-allocation profiling work, which is meant to address this shortcoming, at the 2023 Linux Storage, Filesystem, Memory Management, and BPF Summit, their objective was uncontroversial but the proposed solution ran into opposition that played out at length on the mailing lists (example) over the last year. So it may be a bit surprising that, when the two returned to the memory-management track in the 2024 gathering, the controversy was gone and the discussion focused on improving details of the implementation.
The kernel stack is a scarce and tightly constrained resource; kernel developers often have to go far out of their way to avoid using too much stack space. The size of the stack is also fixed, leading to situations where it is too small for some code paths, while wastefully large for others. At the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, Pasha Tatashin proposed making the kernel stack size dynamic, making more space available when needed while saving memory overall. This change is not as easy to implement as it might seem, though.
The page structure is a complicated beast, but some parts of it are more intimidating than others. The mapcount field is one of the scarier parts. It allegedly records the number of references to the page in page tables, but, as David Hildenbrand described during the memory-management track at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, things are more complicated than that. Few people truly understand the semantics of this field, but the situation will hopefully get better over time.
Security updates have been issued by AlmaLinux (firefox, nodejs, and thunderbird), Fedora (uriparser), Oracle (firefox and thunderbird), Slackware (mariadb), SUSE (cairo, gdk-pixbuf, krb5, libosinfo, postgresql14, and python310), and Ubuntu (firefox, linux-aws, linux-aws-5.15, and linux-azure).
There are two fundamental levels of memory allocator in the Linux kernel: the page allocator, which allocates memory in units of pages, and the slab allocator, which allocates arbitrarily-sized chunks that are usually (but not necessarily) smaller than a page. The slab allocator is the one that stands behind commonly used kernel functions like kmalloc(). At the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, slab maintainer Vlastimil Babka provided an update on recent changes at the slab level and discussed the changes that are yet to come.
David Vernet kicked off the BPF track at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit with a talk about polymorphic kfuncs - or, with less jargon, kernel functions that can be called from BPF which use different implementations depending on context. He explained how this would be useful to the sched_ext BPF scheduling framework, but expected it to be helpful in other areas as well.
The term "memory tiering" refers to the management of memory placement on systems with multiple types of memory, each of which has its own performance characteristics. On such systems, poor placement can lead to significantly worse performance. A memory-management-track discussion at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit took yet another look at tiering challenges with a focus on upcoming technologies that may simplify (or complicate) the picture.
As the shiny new KDE Plasma 6 desktop makes its way into distribution releases, a small group of developers is still trying to preserve the KDE experience circa 2008. The Trinity Desktop Environment (TDE) is a continuation of KDE 3 that has maintained the old-school desktop with semi-regular releases since 2010. The most recent release, R14.1.2, was announced on April 28. TDE does deliver a usable retro desktop, but with some limitations that hamper its usability on modern systems.
Security updates have been issued by Debian (bind9, chromium, and thunderbird), Fedora (buildah, chromium, firefox, mingw-python-werkzeug, and suricata), Mageia (golang), Oracle (firefox and nodejs:20), Red Hat (firefox, httpd:2.4, nodejs, and thunderbird), and SUSE (firefox, git-cliff, and ucode-intel).
Non-uniform memory access (NUMA) systems are organized with their CPUs grouped into nodes, each of which has memory attached to it. All memory in the system is accessible from all CPUs, but memory attached to the local node is faster. The kernel's memory-policy ("mempolicy") interface allows threads to inform the kernel about how they would like their memory placed to get the best performance. In recent years, the NUMA concept has been extended to support the management of different types of memory in a system, pushing the limits of the mempolicy subsystem. In a remotely presented session at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, Gregory Price discussed the ways in which the kernel's memory-policy support should evolve to handle today's more-complex systems.
Working on the Linux kernel has always been unlike working on many other software projects. One particularly noticeable difference is the decentralized nature of the kernel's testing infrastructure. Projects such as syzkaller, KernelCI, or the kernel self tests test the kernel in different ways. On February 28, Helen Koike posted a patch set that would add continuous integration (CI) scripts for the whole kernel. The response was generally positive, but several people suggested changes.
The 6.9.1, 6.8.10, 6.6.31, 6.1.91, 5.15.159, 5.10.217, 5.4.276, and 4.19.314 stable kernels have been released. These versions include important fixes; as usual, Greg Kroah-Hartman advises users to update right away.
The DAMON subsystem was the subject of the first session in the memory-management track at the Linux Storage, Filesystem, Memory Management, and BPF Summit. DAMON maintainer SeongJae Park introduced the data-access monitoring framework, which can generate snapshots of how memory is accessed, enabling the detection of hot and cold regions of memory in both the virtual and physical address spaces. The session covered recent changes and future plans for this tool.
Ronnie Sahlberg, Jonathan Maple, and Jeremy Allison of CiQ have published a whitepaper looking at the security-relevant bug fixes applied (or not applied) to the RHEL 8.x kernel over time.