RL Infrastructure Survey
Long-form survey of modern RL post-training infrastructure, aimed at mathematicians and RL theorists who want engineering literacy. Covers six engineering primitives (hybrid engine, memory choreography, zero-copy weight sync, the four update_weights_from_* paths, RadixAttention × GRPO, async + staleness corrections), three kernel-level DSLs (CUDA Python, Triton, TileLang), Megatron-LM’s 5D parallelism, the quantization / MoE routing problem, a case study of Miles’ DeepSeek-V3 RL pipeline, and a comparison of nine production frameworks across five axes.
Built as a single self-contained HTML page with 8 inline SVG diagrams and 12 source-code excerpts linked back to GitHub. Synthesized from primary-source reading of verl, SGLang, Megatron-LM, Miles, slime, AReaL, Triton, TileLang, and Chenyang Zhao’s Awesome-ML-SYS-Tutorial.