08h30: Welcome
08h40: Keynote
“A few recent results and open problems for resilience at scale” (Yves Robert, INRIA)
09h15: Assuming failure independence: are we right to be wrong?
09h40: MACORD: Online Adaptive Machine Learning Framework for Silent Error Detection
10h05: cudaCR: An In-Kernel Application-level Checkpoint/Restart Scheme for CUDA-enabled GPUs
10h30: Coffee Break
11h00: Application-Based Fault Tolerance Techniques for Fully Protecting Sparse Matrix Solvers
11h25: Performance Implications of Failures on MapReduce Applications
11h50: A Malleable and Fault Tolerant Task Pool Framework for X10
12h15: Closing
12h30: End