9 min read
Disaggregated Inference (Part 2): Designing the RoCE Fabric. Why L3 to the Host Wins
For most of the last few years, when network engineers talked about RoCE, they were talking about the inside of an AI training cluster — the lossless, high-bandwidth back-end stitching thousands of GPUs together, with synchronous all-reduce as the...