Coronavirus pushes Folding@Home’s crowdsourced molecular science to exaflop levels
The long-running Folding@Home program to crowdsource the enormously complex task of solving molecular interactions has hit a major milestone as thousands of new users sign up to put their computers to work. The network now comprises an "exaflop" of computing power: 1,000,000,000,000,000,000 operations per second.
Folding@Home started some 20 years ago as a way - then novel, and pioneered by the now-hibernating SETI@Home - to break up computation-heavy problems and parcel them out to individuals for execution. It amounts to a crude supercomputer distributed over the globe, and while it's not as effective as a "real" supercomputer in blasting through calculations, it can make short work of complex problems.
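The idea of breaking up a big job and parceling it out can be sketched in a few lines of Python. This is only a toy illustration, not Folding@Home's actual client or protocol: the function names, the "work unit" size, and the sum-of-squares stand-in for a real simulation are all assumptions made for the example, and local threads play the role of volunteers' computers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(items, unit_size):
    """Cut one big job into small, independent chunks (hypothetical 'work units')."""
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]


def volunteer_compute(unit):
    """Stand-in for an expensive simulation step: here, just a sum of squares."""
    return sum(x * x for x in unit)


def run_distributed(items, unit_size=4, volunteers=3):
    """Hand each work unit to a 'volunteer' (a local thread) and combine the results."""
    units = split_into_work_units(items, unit_size)
    with ThreadPoolExecutor(max_workers=volunteers) as pool:
        partial_results = list(pool.map(volunteer_compute, units))
    return sum(partial_results)


print(run_distributed(list(range(10))))
```

The approach only works when the chunks do not depend on one another, which is also why a distributed network of this kind is not a drop-in replacement for a tightly coupled supercomputer.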
The problem this tool addresses (it is administered by a group at Washington University in St. Louis) is protein folding. Proteins are one of the many chemical structures that make our biology work, and they range from small, relatively well-understood molecules to truly enormous ones.
The thing about proteins is that they change their shape depending on the conditions - temperature, pH, the presence or absence of other molecules. This change in shape is often what makes them useful - for example, a kinesin protein changes shape like a pair of legs taking steps to carry a payload across a cell. Another protein, like an ion channel, will open to let charged atoms through only if a second protein is present, one that fits into it like a key in a lock.
Some such changes, or convolutions, are well-documented, but most by far are totally unknown. But through robust simulation of the molecules and their surroundings we can discover new information about proteins that may lead to important discoveries. For example, what if you could show that once that ion channel is open, another protein could lock it that way for longer than usual, or close it quickly? Finding those kinds of opportunities is what this sort of molecular science is all about.
Unfortunately, it's also extremely computationally expensive. These inter- and intra-molecular interactions are the kind of thing supercomputers can grind away at endlessly to cover every possibility. Twenty years ago supercomputers were a lot rarer than they are today, so Folding@Home started as a way to do this sort of heavy computing load without buying a $500 million Cray setup.
The program has been chugging along the whole time, and likely got a boost when SETI@Home recommended it as an alternative to its many users. But the coronavirus crisis has made the idea of contributing one's resources to a greater cause highly attractive, and as such there has been a huge increase in users - so much so that the servers are struggling to get problems out to everyone's computers to solve.
Examples of COVID-19-related proteins as visualized by Folding@Home.
The milestone it's celebrating is the achievement of an exaflop of processing power, which is a quintillion (a billion billion) operations per second. An operation here is a floating-point operation, such as adding or multiplying two decimal numbers, and many of them together form mathematical expressions, which eventually add up to useful stuff like saying "at temperatures above 38 degrees Celsius this protein deforms to allow a drug to bind at this site and disable it."
Exascale computing is the next goal of supercomputers; Intel and Cray are building exascale computers for the National Laboratories that are expected to come online in the next couple of years - but the fastest supercomputers available today operate at a scale of hundreds of petaflops, still well short of an exaflop, which is 1,000 petaflops.
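For a sense of how petaflops and exaflops relate, here is a small arithmetic sketch in Python. The prefixes are the standard SI ones (peta is 10^15, exa is 10^18); the speeds in the loop are hypothetical round figures chosen only for illustration, not measurements of any particular machine.

```python
# SI prefixes as powers of ten (standard definitions).
PETA = 10**15
EXA = 10**18


def fraction_of_exaflop(petaflops):
    """Express a speed given in petaflops as a fraction of one exaflop."""
    return petaflops * PETA / EXA


# Hypothetical round figures, used only to illustrate the scale.
for pf in (100, 200, 500):
    print(f"{pf} petaflops = {fraction_of_exaflop(pf):.0%} of an exaflop")
```

In other words, it takes 1,000 petaflops to make a single exaflop.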
Naturally these two things are not directly comparable - Folding@Home is marshaling an exaflop's worth of computing power, but it is not operating as a single unit working on a single problem, as the exascale systems are built to. The exa- label is there to give a sense of scale.
Will this kind of analysis lead to coronavirus treatments? Perhaps later, but almost certainly not in the immediate future. Proteomics is "basic research" in that it is at heart about better understanding the world around (and within) us - period.
COVID-19 (like Parkinson's, Alzheimer's, ALS and others) isn't a single problem, but a large, poorly bounded set of unknowns; its proteome and related interactions are part of that set. The point isn't to stumble onto a magic bullet but to lay a foundation for understanding so that when we are evaluating potential solutions, we can pick the right one even 1% faster because we know that this molecule in that situation acts like so.
As the project noted in a blog post announcing the release of coronavirus-related work:
This initial wave of projects focuses on better understanding how these coronaviruses interact with the human ACE2 receptor required for viral entry into human host cells, and how researchers might be able to interfere with them through the design of new therapeutic antibodies or small molecules that might disrupt their interaction.
If you want to help, you can download the Folding@Home client and donate your spare CPU and GPU cycles to the cause.