AI Training vs. Inference: Designing Networks for Real-World AI Workloads
The same AI model can require significant differences in infrastructure depending on whether you're training it or running it. Here's why your...
3 min read
Marc Austin: Jun 17, 2025 3:21:40 PM
A good friend of mine from way, way back, from our time building internet retail 1.0 and internet search 1.0 together, recently sent me a list of articles to read. Steve's a pretty smart guy. If he makes a suggestion, I usually listen. Here's the first in a series of quick posts on my reactions to Steve's reading list.
Kudos to Srinivas Rao for writing "99% of AI Startups Will Be Dead by 2026 — Here’s Why."
Srinivas writes well. This isn't genAI bath water. He's punchy, entertaining, and straight to the hard truth.
When I first started reading this post, I thought for sure that Srinivas wrote it over a year ago. I mean, really, why would anyone do anything more than prototype with token wrappers? Clearly you need to own your own infrastructure to fine-tune your own open source model with your own unique data to build a legitimate business. Doesn't everyone know this formula?
Nope. This one was written just a month ago.
Hedgehog has been preaching this for almost three years now. I guess every prophet feels like they are shouting into the wind, so I'll preach again. The 1% of AI startups that survive will build businesses on open source models tuned with unique data served by low-cost distributed AI cloud infrastructure at the data edge. There. I said it again.
Srinivas does a good job of explaining "NVIDIA: The Silent Kingmaker." Heavy is the head that wears the crown. Monopolies are difficult to maintain in rapidly evolving markets. The good news for NVIDIA is that Jensen invests in hardware and software. AMD may catch up on GPU performance, but they still need software that AI developers want to use. The Ultra Ethernet Consortium may catch up on AI networking hardware, but they still need software to make it all work. And yes, the network matters for AI workload performance. And yes, software still has better margins than hardware. And yes, it's already possible to deliver an open AI network that is just as good as NVIDIA Spectrum X without building your empire on the fault line beneath it all.
Srinivas is a little more explicit with "Microsoft: The Infrastructure Middleman with All the Leverage."
I like to think of it as Steve Ballmer's revenge à la Bing 2.0. We all learned this lesson in Internet 1.0. It's about the data, stupid. If you have the largest product catalog, you become the king of internet retail. If you have the largest index of web content, you become the king of internet search. If you have everyone's personal information, you become the king of social media. If you have all the world's data in your cloud platform, including proprietary data from every enterprise on the planet, well, you become king of something a lot bigger than Bing.
Yep. Vendor lock-in is "The Fault Line Beneath It All." We haven't seen the last supply shock, not by a long way. If COVID and Liberation Day weren't enough supply-chain risk already, how about another regional war? The only way to manage risk is to diversify it. The smart kids do this with open source software, the Open Compute Project, and components sourced from multiple vendors.
"Infrastructure Wins — But No One’s Building It" is only partly true. Companies like Hedgehog are building it. The better observation may be "Infrastructure Wins - But No One's Funding It." And yes, as Srinivas puts it in "The Gold Rush Is the Point," "Every time a wave like this hits, the same psychology takes over. People don’t just chase opportunity — they chase belonging."
A smart fund manager on Wall Street told me way back in 2001 that "we're all just lemmings, don't you know?" The thing about investing in infrastructure is that not everybody is doing it. Infrastructure is hard. There are a lot of details. Real engineering. Real technical risk. You probably can't evaluate a deal with the ex-bankers and MBAs you currently have on staff. It requires patience. Longer time to revenue. A LOT longer than a wrapper that aims to disrupt a SaaS category. And it feels a lot more uncomfortable.
You know that SaaS category, so why not place the same bet on SaaS 2.0 with an AI wrapper? You know it's sugar, but maybe you can shorten your holding period. Maybe you can survive longer than Idealab. Then again, maybe you aren't smarter than Bill Gross. Or maybe you could take the harder road. If you want to be the next Y Combinator, take a cue from Ashmeet Sidana and build a fund like Engineering Capital. Be a Chief Engineer.