Article 3WNRQ Video: Recent Results and Open Problems for Resilience at Scale

by Rich Brueckner
from Inside HPC & AI News | High-Performance Computing & Artificial Intelligence on (#3WNRQ)

In this video from PASC18, Yves Robert from École normale supérieure de Lyon in France presents: Recent Results and Open Problems for Resilience at Scale. "The talk will address the following three questions: (i) fail-stop errors: checkpointing or replication or both? (ii) silent errors: application-specific detectors or plain old trustworthy replication? (iii) in terms of workflows: how to avoid checkpointing every task?"

The post Video: Recent Results and Open Problems for Resilience at Scale appeared first on insideHPC.

External Content
Source RSS or Atom Feed
Feed Location http://insidehpc.com/feed/
Feed Title Inside HPC & AI News | High-Performance Computing & Artificial Intelligence
Feed Link https://insidehpc.com/