CheckWork - Checkpoint-Aware ML Simulation
First-author paper accepted to APNet 2026. CheckWork is an open-source framework that generates checkpoint-aware execution traces for distributed ML training, enabling reproducible simulation of checkpointing strategies without access to large GPU clusters.