CERN Scientists Burn AI Into Silicon to Stem Data Deluge
Arthur T Knackerbracket writes:
https://go.theregister.com/feed/www.theregister.com/2026/03/22/cern_eggheads_burn_ai_into/
Like a major league pitcher showing up for his kid's take-your-parent-to-school day, CERN's Thea Aarrestad gave a presentation at the virtual Monster Scale Summit earlier this month about meeting a set of ultra-stringent requirements that few of her peers will ever face.
Aarrestad is an assistant professor of particle physics at ETH Zurich. At CERN (the European Organization for Nuclear Research), she uses machine learning to optimize data collection from the Large Hadron Collider (LHC). Her specialty is anomaly detection, a core component of any proper observability system.
Each year the LHC produces 40,000 exabytes of unfiltered sensor data alone, or about a fourth of the size of the entire Internet, Aarrestad estimated. CERN can't store all that data. As a result, "We have to reduce that data in real time to something we can afford to keep."
By "real time," she means extreme real time. The LHC detector systems process data at rates of up to hundreds of terabytes per second, far more than Google or Netflix handle, and under latency requirements far harder to hit.
"Algorithms processing this data must be extremely fast," Aarrestad said. So fast that decisions must be burned into the chip design itself.
Housed in a 27-kilometer ring about a hundred meters underground, straddling the border between Switzerland and France, the LHC smashes subatomic particles together at near-light speeds. The resulting collisions are expected to produce new types of matter that fill out our understanding of the Standard Model of particle physics - the operating system of the universe.
At any given time, there are about 2,800 bunches of protons whizzing around the ring at nearly the speed of light, separated by 25-nanosecond intervals. Just before they reach one of the four underground detectors, specialized magnets squeeze these bunches together to increase the odds of an interaction. Nonetheless, a direct hit is incredibly rare: out of the billions of protons in each bunch, only about 60 pairs actually collide during a crossing.
When particles do collide, their energy is converted into a mass of new outgoing particles (E=mc² in the house!). These new particles "shower" through CERN's detectors, leaving traces "which we try to reconstruct," she said, in order to identify any new particles produced in the ensuing melee.
Each collision produces a few megabytes of data, and there are roughly a billion collisions per second, resulting in about a petabyte of data every second (roughly the size of the entire Netflix library).
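Those figures can be sanity-checked with quick arithmetic. A minimal sketch using round numbers: the collisions-per-crossing value below is a hypothetical round figure chosen so the rates line up with the article, not an official CERN number.

```python
# Back-of-the-envelope check of the LHC data rates quoted above.
# All figures are approximations taken from the article.

BUNCH_SPACING_S = 25e-9         # 25 ns between proton bunch crossings
COLLISIONS_PER_CROSSING = 25    # assumed round number (article says tens of pairs)
BYTES_PER_COLLISION = 1e6       # "a few megabytes" -> use ~1 MB as a floor

crossings_per_second = 1 / BUNCH_SPACING_S                    # ~40 million/s
collisions_per_second = crossings_per_second * COLLISIONS_PER_CROSSING
bytes_per_second = collisions_per_second * BYTES_PER_COLLISION

print(f"{crossings_per_second:.0e} crossings/s")   # ~4e+07
print(f"{collisions_per_second:.0e} collisions/s") # ~1e+09
print(f"{bytes_per_second / 1e15:.0f} PB/s")       # ~1 PB/s
```

With ~40 million crossings per second and tens of collisions per crossing, a billion collisions per second and a petabyte per second both fall out naturally.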
Rather than try to transport all this data up to ground level, CERN found it more feasible to create a monster-sized edge compute system to sort out the interesting bits at the detector level instead.
"If we had infinite compute we could look at all of it," Aarrestad said. But less than 0.02% of this data actually gets saved and analyzed. It is up to the detectors themselves to pick out the action scenes.
The detectors, built on ASICs, buffer the captured data for up to 4 microseconds, after which the data "falls over the cliff," forever lost to history if it is not saved.
Making that decision is the "Level One Trigger," an aggregate of about 1,000 FPGAs that digitally reconstruct the event from a set of reduced event information provided by the detector over fiber optic lines at about 10 TB/sec. The trigger produces a single value: either an "accept" (1) or a "reject" (0).
Making the decision to keep or lose a collision is the job of the anomaly-detection algorithm. It has to be incredibly selective, rejecting more than 99.7 percent of the input outright. The algo, affectionately named AXOL1TL, is trained on the "background" - the areas of the Standard Model that have largely been sussed out already. It knows the typical topology of a standard collision, allowing it to instantly flag events that fall outside those boundaries. As Aarrestad put it, it's hunting for "rare physics."
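The idea of learning the background and flagging whatever falls outside it can be illustrated with a toy model. The sketch below is a hypothetical stand-in, not AXOL1TL itself (which runs as a machine-learning model on FPGAs): it fits a simple Gaussian model to synthetic "background" events and tunes a threshold to reject 99.7 percent of them, accepting only outliers.

```python
import numpy as np

# Toy background-trained anomaly detector, in the spirit of the trigger
# algorithm described above. The Gaussian/Mahalanobis model is an assumption
# for illustration only, not the actual AXOL1TL architecture.

rng = np.random.default_rng(0)

# Synthetic "background": well-understood, Standard-Model-like events.
background = rng.normal(loc=0.0, scale=1.0, size=(100_000, 4))

mean = background.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(background, rowvar=False))

def anomaly_score(events: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance from the learned background distribution."""
    delta = events - mean
    return np.einsum("ij,jk,ik->i", delta, cov_inv, delta)

# Threshold tuned so ~99.7% of background events are rejected outright,
# mirroring the trigger's selectivity.
threshold = np.quantile(anomaly_score(background), 0.997)

# A grossly atypical event scores far above the threshold.
weird_event = np.full((1, 4), 8.0)
print(anomaly_score(weird_event)[0] > threshold)  # prints True -> "accept"
```

The real system must produce this accept/reject bit within the trigger's nanosecond-scale budget, which is why the model is compiled into FPGA logic rather than run on CPUs.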
The algorithm must make a decision within 50 nanoseconds. Only about 0.02 percent of all collision data, or about 110,000 events per second, makes the cut and is saved and transported to ground level. Even this slimmed-down stream amounts to terabytes per second sent up to the on-ground servers.
Once on the surface, the data goes through a second round of filtering, called the "High Level Trigger," which again discards the vast majority of captured collisions, identifying only about 1,000 interesting collisions among the 100,000 events per second that come through the pipe. This system uses 25,600 CPUs and 400 GPUs to reconstruct the original collisions and analyze the results, producing about a petabyte a day.
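Taken together, the two trigger stages imply steep reduction factors. A quick calculation from the article's round figures (approximate values, not official CERN specifications):

```python
# Reduction factors through the two trigger stages, using the article's
# approximate figures.

CROSSING_RATE_HZ = 40e6   # ~40 MHz bunch-crossing rate (25 ns spacing)
L1_ACCEPT_HZ = 100_000    # events/s surviving the Level One Trigger
HLT_ACCEPT_HZ = 1_000     # events/s kept by the High Level Trigger

l1_keep = L1_ACCEPT_HZ / CROSSING_RATE_HZ
hlt_keep = HLT_ACCEPT_HZ / L1_ACCEPT_HZ
overall = l1_keep * hlt_keep

print(f"L1 keeps  {l1_keep:.2%} of crossings")   # 0.25%
print(f"HLT keeps {hlt_keep:.2%} of L1 output")  # 1.00%
print(f"Overall:  {overall:.4%}")                # 0.0025%
```

The Level One figure matches the "more than 99.7 percent rejected" quoted above; only a few events in every hundred thousand crossings survive both stages to be analyzed.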
"This is the data we will actually analyze," Aarrestad said.
Read more of this story at SoylentNews.