The original Linux kernel, posted in 1991, ran on a system with a 4KB page size. Over 30 years later, most of us are still running on systems with 4KB pages, even though the amount of installed memory has grown by a few orders of magnitude. It is generally accepted that using large page sizes results in better performance for most applications, but allocating larger pages is often difficult. During a memory-management session at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Yu Zhao presented his ideas on improving the allocation of huge pages in the kernel.
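To put the page-size gap in numbers, here is a back-of-the-envelope sketch (not taken from the talk) that counts how many pages, and therefore how many translation entries, are needed to map a region at 4KB versus 2MB. The 2MB size is an assumption, chosen because it is the PMD-level huge-page size on x86-64. The 64GB region is an arbitrary example.

```python
# Back-of-the-envelope sketch: pages (and thus page-table/TLB entries) needed
# to map a region at two page sizes. 2MB is assumed as the huge-page size
# because it is the PMD-level size on x86-64.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages of page_bytes each needed to cover region_bytes."""
    return -(-region_bytes // page_bytes)  # ceiling division


if __name__ == "__main__":
    region = 64 * GB
    for label, size in (("4KB", 4 * KB), ("2MB", 2 * MB)):
        print(f"{label} pages: {pages_needed(region, size):,}")
```

For 64GB, this gives 16,777,216 pages at 4KB and 32,768 pages at 2MB, a 512-fold reduction in the number of entries needed to map the same memory.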
Kui-Feng Lee spoke early in the BPF track at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit about some of the recent improvements to BPF. These changes were largely driven by the sched_ext work that David Vernet had covered in the previous talk. Lee focused on changes relevant to struct_ops programs, but several of those changes apply to all BPF programs.
With the release of Fedora 40, it's time to start looking ahead to what Fedora 41 has in store. One of the largest changes planned for the next release is a switch to DNF5, a C++ rewrite of the DNF package manager. A previous attempt to make the switch, during the Fedora 39 cycle, was called off and deferred to Fedora 41. The developers have had nearly a year to address compatibility problems and bring DNF5 to a state suitable to replace DNF4. Signs point to a successful switch in the upcoming release, though there may be a few surprises lurking for Fedora users.
The kernel contains a pair of related filesystems that, among other things, can be used for shared-memory applications; shmem is an internal mechanism used within the kernel, while the tmpfs filesystem is mounted and accessible from user space. As is the case elsewhere in the kernel, these subsystems would benefit from the addition of large-folio support. During a joint storage, filesystem, and memory-management session at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Daniel Gomez talked about the work he is doing to add that support.
Security updates have been issued by Fedora (chromium, libreoffice, and thunderbird), Red Hat (.NET 7.0, .NET 8.0, gdk-pixbuf2, git-lfs, glibc, python3, and xorg-x11-server-Xwayland), SUSE (firefox, opensc, and ucode-intel), and Ubuntu (cjson and gnome-remote-desktop).
Swapping may be a memory-management technique at its core, but its implementation also involves the kernel's filesystem and storage layers. So it is not surprising that a session on the kernel's swap abstraction layer, led by Chris Li at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, was held jointly by all three of those tracks. Li has some ambitious ideas for an improved subsystem, but getting to a workable implementation may not be easy.
David Vernet's second talk at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit was a summary of the state of sched_ext, the extensible BPF scheduler that LWN covered in early May. In short, sched_ext is intended as a platform for rapid experimentation with schedulers, and a tool to let performance-minded administrators customize the scheduler to their workload. The patch set has seen several revisions, becoming more generic and powerful over time. Vernet spoke about what has been done in the past year, and what is still missing before sched_ext can be considered pretty much complete.
The KDE Project has announced the release of KDE Gear 24.05.0, with new features and updates for the more than 200 applications that are part of the project. In addition to new versions of the Dolphin file manager, the Kdenlive video editor, and the Elisa music player, this release includes five applications new to KDE Gear: the Audex CD-ripper application, the Accessibility Inspector application, the Francis Pomodoro timer, Kalm to teach breathing techniques, and a Sokoban-like game called Skladnik. See the full changelog for a complete list of changes.
Almost immediately after the merging of control groups, kernel developers set their sights on reimplementing them properly. The second version of the control-group API started trickling into the kernel around the 3.16 release in 2014, and users have long since been encouraged to migrate, but support for (and users of) the initial API remain. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, memory-management developers discussed whether (and when) it might be possible to remove the version-1 memory controller. The session was led by Shakeel Butt and (participating remotely) Roman Gushchin.
In a combined storage and filesystem session at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, Luis Chamberlain led a discussion on filesystem support for block sizes larger than the usual 4KB page size, which followed up on discussion from last year. While the session was meant to look at the intersection of larger block sizes with atomic block writes that avoid torn (partial) writes (which was also discussed last year), it mostly focused on the filesystem side. Over time, the block sizes offered by storage devices have risen from the original 512 bytes; Chamberlain wanted to discuss filesystem support for block sizes larger than 4KB.
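The arithmetic behind a block larger than a page is straightforward. One approach, and the one assumed in this sketch, is to keep each filesystem block within a single folio that spans several pages. The sketch computes the minimum folio order (log2 of the number of pages) that a given block size would then require. It assumes power-of-two sizes and a 4KB page, and is meant only as an illustration.

```python
# Sketch: minimum folio order needed so that one folio covers a whole
# filesystem block. Assumes power-of-two sizes; illustrative only.
PAGE_SIZE = 4096  # the usual 4KB page size


def min_folio_order(block_size: int, page_size: int = PAGE_SIZE) -> int:
    """Return n such that a folio of 2**n pages holds one whole block."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a positive power of two")
    if block_size <= page_size:
        return 0  # a single page already holds one (or more) blocks
    return (block_size // page_size).bit_length() - 1


for bs in (512, 4096, 16384, 65536):
    print(bs, "->", min_folio_order(bs))
```

A 16KB block, for example, needs an order-2 (four-page) folio, and a 64KB block needs order 4.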
The term "memory model" is used in a couple of ways within the kernel. Perhaps the more obscure meaning is the memory-management subsystem's view of how physical memory is organized on a given system. A proper representation of physical memory will be more efficient in terms of memory and CPU use. Since hardware comes in numerous variations, the kernel supports a number of memory models to match; see this article for details. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Oscar Salvador, presenting remotely, made the case for removing one of those models.
Compute Express Link (CXL) is a data-center-oriented memory solution that, according to some in the industry, will yield large cost savings and performance improvements. Others are more skeptical. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, two sessions covered CXL and how it will be supported in future kernels.
For every page of memory in the system, the kernel maintains a set of page flags describing how the page is used and various aspects of its current state. Space for page flags has been in chronic short supply, leading to a desire to eliminate or consolidate them whenever possible. That objective, though, is hampered by the fact that the purpose of many page flags is not well understood. In a memory-management-track session at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Matthew Wilcox set out to cooperatively update the page-flag documentation to improve that situation.
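Page flags are individual bits packed into a single word of per-page metadata. In the kernel, that word also carries other fields, such as the node and zone numbers, which is part of why free bits are scarce. This toy model shows the packing with Python's `enum.IntFlag`. The flag names loosely echo real kernel flags (PG_locked, PG_uptodate, PG_dirty, PG_lru), but the bit positions and the `Page` class are hypothetical and do not match any kernel configuration.

```python
from enum import IntFlag


class PageFlags(IntFlag):
    # Hypothetical bit positions; names loosely echo PG_locked, PG_uptodate,
    # PG_dirty and PG_lru, but real kernel positions differ.
    LOCKED = 1 << 0
    UPTODATE = 1 << 1
    DIRTY = 1 << 2
    LRU = 1 << 3


class Page:
    """Toy per-page metadata: all flags live in one integer word."""

    def __init__(self) -> None:
        self.flags = 0

    def set_flag(self, flag: PageFlags) -> None:
        self.flags |= int(flag)

    def clear_flag(self, flag: PageFlags) -> None:
        self.flags &= ~int(flag)

    def test_flag(self, flag: PageFlags) -> bool:
        return bool(self.flags & int(flag))


page = Page()
page.set_flag(PageFlags.UPTODATE | PageFlags.DIRTY)  # valid but modified
page.clear_flag(PageFlags.DIRTY)                     # e.g. after writeback
print(page.test_flag(PageFlags.UPTODATE), page.test_flag(PageFlags.DIRTY))
```

Each new flag consumes one of a fixed number of bits in that word. This is why understanding, and possibly retiring, existing flags matters.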
The problem of sharing page tables across processes has been discussed numerous times over the years, Khalid Aziz said at the beginning of his 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit session on the topic. He was there to, once again, talk about the proposed mshare() system call (which, in its current form, is no longer actually a system call, though the feature still goes by that name) and to see what can be done to finally get it into the mainline.
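One way to see why sharing page tables matters is to count the memory spent on them when many processes map the same region. This sketch assumes x86-64-style last-level page tables (4KB pages, 8-byte entries) and ignores the upper levels. The 1GiB region and the 1,000 processes are arbitrary example values, not figures from the session.

```python
# Sketch: memory spent on last-level page tables when many processes map
# the same shared region. Assumes 4KB pages and 8-byte entries (as on
# x86-64) and ignores upper page-table levels.
PAGE_SIZE = 4096
PTE_SIZE = 8
MiB = 1 << 20
GiB = 1 << 30


def pte_bytes(region_bytes: int, processes: int, shared: bool) -> int:
    """Bytes of last-level page-table entries needed for all mappings."""
    entries = region_bytes // PAGE_SIZE
    copies = 1 if shared else processes  # shared tables exist only once
    return entries * PTE_SIZE * copies


if __name__ == "__main__":
    for shared in (False, True):
        total = pte_bytes(1 * GiB, 1000, shared)
        print(f"shared={shared}: {total // MiB} MiB")
```

Under these assumptions, each process needs 2MiB of entries for the region. Without sharing, 1,000 processes spend 2,000MiB on identical page tables; a single shared set costs 2MiB.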
The kernel's hugetlbfs subsystem was the first mechanism by which the kernel made huge pages available to user space; it was added to the 2.5.46 development kernel in 2002. While hugetlbfs remains useful, it is also viewed as a sort of second memory-management subsystem that would be best unified with the rest of the kernel. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, Peter Xu raised the question of what that unification would involve and what the first steps might be.
KeePassXC is an open-source (GPLv3), cross-platform password manager with local-only data storage. The project comes with a number of build options that can be used to toggle optional features, such as browser integration and password database sharing. However, controversy ensued when Debian Developer Julian Klode decided to make use of these compile flags to disable these features to improve security in the keepassxc package uploaded to Debian unstable for the upcoming Debian 13 ("Trixie") release.
The 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit was a development conference, where discussion was prioritized and presentations with a lot of slides were discouraged. Paul McKenney seemingly flouted this convention in a joint session of the storage, filesystem, and memory-management tracks where he presented about 50 slides in five minutes, twice. The subject was the use of the read-copy-update (RCU) mechanism in the memory-reclaim process, and whether changes to RCU would be needed for that purpose.
Version 3.20.0 of the Alpine Linux distribution has been released with initial support for 64-bit RISC-V. Other important changes include updates to GNOME 46 and KDE Plasma 6, and the replacement of Redis with Valkey due to Redis's adoption of a non-free license model. See the release notes for more on this release.
In multi-threaded processes, looking up a virtual memory area (VMA) in the address space, for the handling of page faults or any of a number of other tasks, has long been bedeviled by lock contention in the kernel. As a result, developer gatherings have been subjected to many sessions on how to improve the situation. At the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit, developers in the memory-management track met, in a session led by Liam Howlett, to talk about a situation that has improved considerably in recent times, but which still offers opportunities for optimization.
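At its core, the lookup maps an address to the VMA whose [start, end) range contains it. The kernel stores VMAs in a maple tree and protects lookups with locks, and that locking is where the contention discussed in the session arises. This sketch ignores locking entirely and only illustrates the lookup itself, using a sorted list and binary search instead of a maple tree. The `VMA` and `AddressSpace` classes and the `lookup()` method are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VMA:
    start: int  # first address in the area (inclusive)
    end: int    # first address past the area (exclusive)


class AddressSpace:
    """Toy model: non-overlapping VMAs kept sorted by start address."""

    def __init__(self, vmas: List[VMA]) -> None:
        self.vmas = sorted(vmas, key=lambda v: v.start)
        self.starts = [v.start for v in self.vmas]

    def lookup(self, addr: int) -> Optional[VMA]:
        """Return the VMA containing addr, or None if addr is unmapped."""
        # Index of the last VMA whose start is <= addr.
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0 and addr < self.vmas[i].end:
            return self.vmas[i]
        return None


mm = AddressSpace([VMA(0x8000, 0x9000), VMA(0x1000, 0x3000)])
print(mm.lookup(0x2FFF), mm.lookup(0x5000))
```

The search itself is cheap. In the kernel, the expensive part is making it safe while other threads may be changing the address space concurrently.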
Security updates have been issued by Debian (webkit2gtk), Fedora (kernel), Mageia (chromium-browser-stable, djvulibre, gdk-pixbuf2.0, nss & firefox, postgresql15 & postgresql13, python-pymongo, python-sqlparse, stb, thunderbird, and vim), Red Hat (go-toolset:rhel8, nodejs, and varnish:6), SUSE (gitui, glibc, and kernel), and Ubuntu (libspreadsheet-parseexcel-perl, linux-aws, linux-aws-5.15, linux-gke, linux-gcp, python-idna, and thunderbird).
Vineeth Pillai gave a remote talk at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit explaining how BPF could be used to improve the performance of virtual machines (VMs). Pillai has a patch set designed to let guest and host machines share scheduling information in order to eliminate some of the overhead of running in a VM. The assembled developers had several comments on the design, but seemed overall to approve of the prospect.
Brendan Jackman started his memory-management-track session at the 2024 Linux Storage, Filesystem, Memory-Management and BPF Summit by saying that, for some years now, the kernel community has been stuck in a reactive posture with regard to hardware vulnerabilities. Each problem shows up with its own scary name, and kernel developers find a way to mitigate it, usually losing performance in the process. Jackman said that it is time to take back the initiative against these vulnerabilities by reconsidering the more general use of address-space isolation.
Optimizing the kernel's memory use is made much easier if developers have an accurate idea of how memory is being used, but the kernel's instrumentation is not as good as it could be. When Suren Baghdasaryan and Kent Overstreet presented their memory-allocation profiling work, which is meant to address this shortcoming, at the 2023 Linux Storage, Filesystem, Memory Management, and BPF Summit, their objective was uncontroversial, but the proposed solution ran into opposition that played out at length on the mailing lists (example) over the last year. So it may be a bit surprising that, when the two returned to the memory-management track at the 2024 gathering, the controversy was gone and the discussion focused on improving details of the implementation.
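The idea behind memory-allocation profiling is to charge every allocation to the code location that made it, so a developer can see which call sites hold the most memory. This Python sketch demonstrates only that concept. It records each caller's function name and line number using the standard `inspect` module, and `Tracker` and its methods are hypothetical. The kernel implementation works differently, using per-call-site counters compiled into the kernel.

```python
import inspect
from collections import defaultdict
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (calling function, line number)


class Tracker:
    """Hypothetical profiler charging live allocations to their call site."""

    def __init__(self) -> None:
        self.bytes_by_site: Dict[Site, int] = defaultdict(int)
        self._site_of: Dict[int, Site] = {}  # id(buffer) -> call site

    def alloc(self, size: int) -> bytearray:
        caller = inspect.currentframe().f_back  # frame that called alloc()
        site = (caller.f_code.co_name, caller.f_lineno)
        buf = bytearray(size)
        self._site_of[id(buf)] = site
        self.bytes_by_site[site] += size
        return buf

    def free(self, buf: bytearray) -> None:
        site = self._site_of.pop(id(buf))
        self.bytes_by_site[site] -= len(buf)

    def top(self, n: int = 3) -> List[Tuple[Site, int]]:
        """Call sites holding the most memory, largest first."""
        return sorted(self.bytes_by_site.items(),
                      key=lambda item: item[1], reverse=True)[:n]


tracker = Tracker()


def load_cache() -> List[bytearray]:
    bufs = []
    for _ in range(4):
        bufs.append(tracker.alloc(4096))
    return bufs


def parse_header() -> bytearray:
    return tracker.alloc(128)


cache = load_cache()
header = parse_header()
tracker.free(header)
print(tracker.top())
```

Because usage is tracked per call site rather than per caller process, freed memory drops out of the totals immediately. What remains points directly at the code responsible for it.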
The kernel stack is a scarce and tightly constrained resource; kernel developers often have to go far out of their way to avoid using too much stack space. The size of the stack is also fixed, leading to situations where it is too small for some code paths, while wastefully large for others. At the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, Pasha Tatashin proposed making the kernel stack size dynamic, making more space available when needed while saving memory overall. This change is not as easy to implement as it might seem, though.
The page structure is a complicated beast, but some parts of it are more intimidating than others. The mapcount field is one of the scarier parts. It allegedly records the number of references to the page in page tables, but, as David Hildenbrand described during the memory-management track at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, things are more complicated than that. Few people truly understand the semantics of this field, but the situation will hopefully get better over time.
Security updates have been issued by AlmaLinux (firefox, nodejs, and thunderbird), Fedora (uriparser), Oracle (firefox and thunderbird), Slackware (mariadb), SUSE (cairo, gdk-pixbuf, krb5, libosinfo, postgresql14, and python310), and Ubuntu (firefox, linux-aws, linux-aws-5.15, and linux-azure).
There are two fundamental levels of memory allocator in the Linux kernel: the page allocator, which allocates memory in units of pages, and the slab allocator, which allocates arbitrarily-sized chunks that are usually (but not necessarily) smaller than a page. The slab allocator is the one that stands behind commonly used kernel functions like kmalloc(). At the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, slab maintainer Vlastimil Babka provided an update on recent changes at the slab level and discussed the changes that are yet to come.
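The split between the two levels can be illustrated with a rounding sketch. The slab level serves small requests from caches of fixed object sizes. The page allocator deals in power-of-two blocks of pages, known as an "order". The power-of-two size classes and the 8KB cutoff below are simplifying assumptions. The kernel's actual kmalloc caches also include non-power-of-two sizes such as 96 and 192 bytes, and its thresholds depend on configuration.

```python
# Sketch of the two allocator levels. Assumptions: 4KB pages, power-of-two
# slab size classes from 8 bytes to 8KB, and larger requests handed to the
# page allocator. Real kernel caches and cutoffs differ.
PAGE_SIZE = 4096
SLAB_CLASSES = [8 << i for i in range(11)]  # 8, 16, 32, ..., 8192


def page_order(size: int) -> int:
    """Smallest order n such that 2**n contiguous pages hold size bytes."""
    pages = -(-size // PAGE_SIZE)  # ceiling division
    return (pages - 1).bit_length()


def route(size: int) -> str:
    """Say which (simplified) allocator level would serve a request."""
    for object_size in SLAB_CLASSES:
        if size <= object_size:
            return f"slab cache of {object_size}-byte objects"
    return f"page allocator, order {page_order(size)}"


for request in (24, 100, 5000, 20000):
    print(request, "->", route(request))
```

The rounding shows where waste comes from at each level. A 100-byte request occupies a 128-byte object, and a 20,000-byte request is served by an order-3 (eight-page, 32KB) block.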
David Vernet kicked off the BPF track at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit with a talk about polymorphic kfuncs, or, with less jargon, kernel functions that can be called from BPF and that use different implementations depending on context. He explained how this would be useful to the sched_ext BPF scheduling framework, but expected it to be helpful in other areas as well.
The term "memory tiering" refers to the management of memory placement on systems with multiple types of memory, each of which has its own performance characteristics. On such systems, poor placement can lead to significantly worse performance. A memory-management-track discussion at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit took yet another look at tiering challenges, with a focus on upcoming technologies that may simplify (or complicate) the picture.
As the shiny new KDE Plasma 6 desktop makes its way into distribution releases, a small group of developers is still trying to preserve the KDE experience circa 2008. The Trinity Desktop Environment (TDE) is a continuation of KDE 3 that has maintained the old-school desktop with semi-regular releases since 2010. The most recent release, R14.1.2, was announced on April 28. TDE does deliver a usable retro desktop, but with some limitations that hamper its usability on modern systems.
Security updates have been issued by Debian (bind9, chromium, and thunderbird), Fedora (buildah, chromium, firefox, mingw-python-werkzeug, and suricata), Mageia (golang), Oracle (firefox and nodejs:20), Red Hat (firefox, httpd:2.4, nodejs, and thunderbird), and SUSE (firefox, git-cliff, and ucode-intel).
Non-uniform memory access (NUMA) systems are organized with their CPUs grouped into nodes, each of which has memory attached to it. All memory in the system is accessible from all CPUs, but memory attached to the local node is faster. The kernel's memory-policy ("mempolicy") interface allows threads to inform the kernel about how they would like their memory placed to get the best performance. In recent years, the NUMA concept has been extended to support the management of different types of memory in a system, pushing the limits of the mempolicy subsystem. In a remotely presented session at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, Gregory Price discussed the ways in which the kernel's memory-policy support should evolve to handle today's more-complex systems.
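As a rough illustration of what a memory policy expresses, this sketch simulates two placement policies. A "preferred" policy puts every page on one node, and an "interleave" policy spreads pages round-robin across a set of nodes. It is a user-space model only. The function names are hypothetical, and fallback when a node is full is ignored. A real program would request such policies from the kernel with set_mempolicy() or mbind(), using MPOL_PREFERRED or MPOL_INTERLEAVE.

```python
from collections import Counter
from typing import List


def place_preferred(npages: int, node: int) -> List[int]:
    """'Preferred'-style policy: every page lands on one node (no fallback)."""
    return [node] * npages


def place_interleave(npages: int, nodes: List[int]) -> List[int]:
    """'Interleave'-style policy: page i lands on nodes[i % len(nodes)]."""
    if not nodes:
        raise ValueError("interleave needs at least one node")
    return [nodes[i % len(nodes)] for i in range(npages)]


placement = place_interleave(10, [0, 1, 2])
print(placement)           # node chosen for each page, in allocation order
print(Counter(placement))  # pages per node
```

Plain round-robin treats every node as equal. That assumption is exactly what breaks down once nodes represent memory types with very different performance.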
Working on the Linux kernel has always been unlike working on many other software projects. One particularly noticeable difference is the decentralized nature of the kernel's testing infrastructure. Projects such as syzkaller, KernelCI, and the kernel self-tests each test the kernel in different ways. On February 28, Helen Koike posted a patch set that would add continuous integration (CI) scripts for the whole kernel. The response was generally positive, but several people suggested changes.
The 6.9.1, 6.8.10, 6.6.31, 6.1.91, 5.15.159, 5.10.217, 5.4.276, and 4.19.314 stable kernels have been released. These versions include important fixes; as usual, Greg Kroah-Hartman advises users to update right away.
The DAMON subsystem was the subject of the first session in the memory-management track at the Linux Storage, Filesystem, Memory Management, and BPF Summit. DAMON maintainer SeongJae Park introduced the data-access monitoring framework, which can generate snapshots of how memory is accessed, enabling the detection of hot and cold regions of memory in both the virtual and physical address spaces. The session covered recent changes and future plans for this tool.
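DAMON's basic approach is to divide memory into regions, check one sampled address per region at regular intervals, and count how often the sampled page turned out to have been accessed. This toy model follows that idea. The region boundaries, the simulated workload, and the hot/cold threshold are all made up. It does not use DAMON's real interfaces, and it omits DAMON's adaptive splitting and merging of regions.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    start: int
    end: int              # exclusive
    nr_accesses: int = 0  # sampled hits seen so far


def monitor(regions: List[Region], accessed: Callable[[int], bool],
            intervals: int, rng: random.Random) -> None:
    """Each interval, check one random address per region and count hits."""
    for _ in range(intervals):
        for region in regions:
            addr = rng.randrange(region.start, region.end)
            if accessed(addr):
                region.nr_accesses += 1


def classify(regions: List[Region], hot_threshold: int) -> List[str]:
    return ["hot" if r.nr_accesses >= hot_threshold else "cold"
            for r in regions]


def workload_touches(addr: int) -> bool:
    # Hypothetical workload: only the first 16KB is being accessed.
    return addr < 0x4000


regions = [Region(0x0000, 0x4000), Region(0x4000, 0x10000)]
monitor(regions, workload_touches, intervals=20, rng=random.Random(0))
print([(hex(r.start), r.nr_accesses) for r in regions],
      classify(regions, hot_threshold=10))
```

Sampling one address per region keeps the monitoring cost proportional to the number of regions rather than the amount of memory. DAMON relies on this trade-off to stay cheap.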
Ronnie Sahlberg, Jonathan Maple, and Jeremy Allison of CiQ have published a whitepaper looking at the security-relevant bug fixes applied (or not applied) to the RHEL 8.x kernel over time.
The merge window for the 6.10 kernel release opened on May 12; between then and the time of this writing, 6,819 non-merge commits were pulled into the mainline kernel for that release. Your editor has taken some time out from LSFMM+BPF in an attempt to keep up with the commit flood. Read on for an overview of the most significant changes that were pulled in the early part of the 6.10 merge window.
Version 0.10 of the Vim-based text editor Neovim is now available. This release includes a new default color scheme, enhanced support for rendering multibyte characters, support for hyperlinks, system clipboard synchronization, and more. Many features have been deprecated in 0.10 and will be removed in a future release. Neovim core contributor Gregory Anders has written a summary of some of the highlights and thoughts on upcoming releases:
Security updates have been issued by AlmaLinux (.NET 7.0, .NET 8.0, and nodejs:20), Debian (chromium, firefox-esr, ghostscript, and libreoffice), Fedora (djvulibre, mingw-glib2, mingw-python-jinja2, and mingw-python-werkzeug), Oracle (.NET 7.0, .NET 8.0, kernel, and nodejs:18), Red Hat (nodejs:20), Slackware (gdk and git), SUSE (python), and Ubuntu (linux-hwe-5.15 and linux-raspi).
Ars Technica looks at a recent report on the Ebury rootkit, with a focus on the 2011 compromise of kernel.org, which may have been more extensive than believed at the time.
Version 126.0 of the Firefox browser is out. Changes include improvements to the "copy link without site tracking" feature, support for zstd compression, and a new tracking "feature": "Telemetry was added to create an aggregate count of searches by category to broadly inform search feature development."
The advent of the folio structure to describe groups of pages has been one of the most fundamental transformations within the kernel in recent years. Since the folio transition affects many subsystems, it is fitting that the subject was covered at the beginning of the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit in a joint session of the storage, filesystem, and memory-management tracks. Matthew Wilcox used the session to review the work that has been done in this area and to discuss what comes next.