Program

08h30: Welcome

08h40: Keynote
“A few recent results and open problems for resilience at scale” (Yves Robert, INRIA)

09h15: Assuming failure independence: are we right to be wrong?

09h40: MACORD: Online Adaptive Machine Learning Framework for Silent Error Detection

10h05: cudaCR: An In-Kernel Application-level Checkpoint/Restart Scheme for CUDA-enabled GPUs

10h30: Coffee Break

11h00: Application-Based Fault Tolerance Techniques for Fully Protecting Sparse Matrix Solvers

11h25: Performance Implications of Failures on MapReduce Applications

11h50: A Malleable and Fault Tolerant Task Pool Framework for X10

12h15: Closing

12h30: End