<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Disaggregated Inference Part 2</title>
    <link>https://hedgehog.cloud/disaggregated-inference-part-2</link>
    <description>How to design a RoCE fabric for disaggregated inference: why L3 to the host beats stretched L2, where EVPN Type-5 fits, and what breaks at AI scale.</description>
    <language>en</language>
    <pubDate>Wed, 10 Jun 2026 00:54:42 GMT</pubDate>
    <dc:date>2026-06-10T00:54:42Z</dc:date>
    <dc:language>en</dc:language>
    <item>
      <title>Disaggregated Inference (Part 2): Designing the RoCE Fabric. Why L3 to the Host Wins</title>
      <link>https://hedgehog.cloud/disaggregated-inference-part-2/</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://hedgehog.cloud/disaggregated-inference-part-2/" title="" class="hs-featured-image-link"&gt; &lt;img src="https://hedgehog.cloud/hubfs/Disag-Inference-fabric-recipe-card.png" alt="Disaggregated Inference (Part 2):&amp;nbsp;Designing the RoCE Fabric. Why L3 to the Host Wins" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;p style="line-height: 1.5;"&gt;&lt;span&gt;For most of the last few years, when network engineers talked about RoCE, they were talking about the inside of an AI training cluster — the lossless, high-bandwidth back-end stitching thousands of GPUs together, with synchronous all-reduce as the performance-limiting step and the network as the bottleneck. That association is now incomplete. Modern AI inference stacks split the inference computation into two stages to maximize performance and reduce cost and power — and the RoCE back-end has quietly been promoted from plumbing for training runs to the critical interconnect for inference in production.&lt;/span&gt;&lt;/p&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://hedgehog.cloud/disaggregated-inference-part-2/" title="" class="hs-featured-image-link"&gt; &lt;img src="https://hedgehog.cloud/hubfs/Disag-Inference-fabric-recipe-card.png" alt="Disaggregated Inference (Part 2):&amp;nbsp;Designing the RoCE Fabric. Why L3 to the Host Wins" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;p style="line-height: 1.5;"&gt;&lt;span&gt;For most of the last few years, when network engineers talked about RoCE, they were talking about the inside of an AI training cluster — the lossless, high-bandwidth back-end stitching thousands of GPUs together, with synchronous all-reduce as the performance-limiting step and the network as the bottleneck. That association is now incomplete. Modern AI inference stacks split the inference computation into two stages to maximize performance and reduce cost and power — and the RoCE back-end has quietly been promoted from plumbing for training runs to the critical interconnect for inference in production.&lt;/span&gt;&lt;/p&gt;  
</content:encoded>
      <category>AI Network</category>
      <category>AI Inference</category>
      <pubDate>Wed, 10 Jun 2026 00:52:36 GMT</pubDate>
      <author>Manish@githedgehog.com (Manish Vachharajani)</author>
      <guid>https://hedgehog.cloud/disaggregated-inference-part-2/</guid>
      <dc:date>2026-06-10T00:52:36Z</dc:date>
    </item>
  </channel>
</rss>
