9 min read
Disaggregated Inference - Part 2: Designing the RoCE Fabric. Why L3 to the Host Wins
For most of the last few years, when network engineers talked about RoCE, they were talking about the inside of an AI training cluster — the...
6 min read
Cohere's recent $6.8 billion valuation signals something interesting about enterprise AI preferences, but not quite what you might expect. While...
8 min read
Bottom Line Up Front: The supply chain crisis in high-speed Ethernet transceivers has forced organizations building AI clusters into technical...
9 min read
If you work with AI infrastructure, build applications with large language models, or fine-tune models for your organization, you've probably heard...
9 min read
Why Your Advanced Kubernetes AI Scheduler Might Be Fighting a Losing Battle
If you work with AI infrastructure, manage Kubernetes clusters for...
14 min read
When a CFO approves a $50 million GPU cluster purchase, they're essentially buying the world's most expensive waiting rooms. The uncomfortable truth...
9 min read
The same AI model can require significant differences in infrastructure depending on whether you're training it or running it. Here's why your...
11 min read
Why Traditional Networks Fail AI Workloads
The billion-dollar bottleneck hiding in your artificial intelligence infrastructure
8 min read
The fog of skepticism that once surrounded AI's practical applications has fully lifted. As I sit here reflecting on Jensen Huang's GTC 2025 keynote,...