Video: Recent Results and Open Problems for Resilience at Scale
by Rich Brueckner from High-Performance Computing News Analysis | insideHPC
In this video from PASC18, Yves Robert from École normale supérieure de Lyon in France presents: Recent Results and Open Problems for Resilience at Scale. "The talk will address the following three questions: (i) fail-stop errors: checkpointing or replication or both? (ii) silent errors: application-specific detectors or plain old trustworthy replication? (iii) in terms of workflows: how to avoid checkpointing every task?"
The post Video: Recent Results and Open Problems for Resilience at Scale appeared first on insideHPC.