
Industry Leading AI Network Performance with Hedgehog

SemiAnalysis recently completed ClusterMAX testing on the NVIDIA B200 GPU services offered by FarmGPU and RunPod. NCCL testing of their Hedgehog AI network showed industry-leading performance. FarmGPU and RunPod offer better AI network performance than NVIDIA Israel-1, an AMD MI300X cluster running untuned RoCEv2, and even Crusoe Iceland's InfiniBand network.


SemiAnalysis is an independent analyst firm that breaks the mold on market research. They don't try to cover every industry; they focus on neoclouds. They don't schedule briefings to write a lot of subjective content. If you open your doors, they will test your GPU cluster against the ClusterMAX rating criteria and give you a ClusterMAX rating. The intent of all this is to help GPU renters make smart purchase decisions for their AI workloads.

SemiAnalysis evaluates features that GPU renters care about, such as:

  • Security
  • Lifecycle and Technical Expertise
  • Slurm and Kubernetes
  • Storage
  • NCCL/RCCL Networking Performance
  • Reliability and Service Level Agreements (SLAs)
  • Automated Active and Passive Health Checks and Monitoring
  • Consumption Models, Price Per Value, and Availability
  • Technical Partnerships

Hedgehog impacts most of these criteria, but for this report we are focusing on NCCL/RCCL Networking Performance. Hedgehog optimizes NCCL/RCCL networking performance with open source AI networking software that tunes the configuration of network switches and the SuperNICs running in GPU servers. Our software does this automatically, so you don't have to spend weeks learning the secret sauce. That means rapid time to value for your very expensive GPU cluster.
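
To give a feel for what that tuning involves, here is a minimal sketch with hypothetical parameter names (not Hedgehog's actual configuration schema): a lossless RoCEv2 fabric needs PFC, ECN, and DCQCN congestion-control settings applied consistently to every switch and every SuperNIC.

```python
# Illustrative sketch only -- hypothetical names, not Hedgehog's real config format.
# The point: RoCEv2 congestion control must be consistent fabric-wide, and doing
# that by hand across many switches and SuperNICs is where weeks of tuning go.
fabric_tuning = {
    "pfc": {"enabled": True, "lossless_priority": 3},  # lossless class for RDMA traffic
    "ecn": {"min_kbytes": 150, "max_kbytes": 1500},    # early congestion-marking thresholds
    "dcqcn": {"enabled": True},                        # NIC-side rate-based congestion control
    "mtu": 9000,                                       # jumbo frames for large RDMA messages
}

for device_group in ("leaf switches", "spine switches", "SuperNICs"):
    # In practice an automation layer renders and pushes vendor-specific config here.
    print(f"apply to {device_group}: {fabric_tuning}")
```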

GPU renters need to get lots of data in and out of GPU clusters.  That requires a high performance network.  When they run training workloads, GPUs need to share memory over the network.  That means the AI network needs to run at peak performance if you want to get optimal GPU performance.  
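
As a concrete, heavily simplified example of that memory sharing, here is a minimal PyTorch sketch using the NCCL backend. Data-parallel training issues calls like this after every backward pass, so each one rides the AI network, and network speed directly gates GPU utilization.

```python
# Minimal all-reduce sketch on PyTorch's NCCL backend.
# Launch with, e.g.: torchrun --nproc_per_node=8 allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")              # NCCL moves data GPU-to-GPU
    local_rank = int(os.environ["LOCAL_RANK"])            # set by torchrun
    torch.cuda.set_device(local_rank)

    # Stand-in for a gradient bucket; real training all-reduces tensors like this
    # across every GPU in the job, over NVLink inside the node and the scale-out
    # network between nodes.
    grads = torch.ones(64 * 1024 * 1024, device="cuda")   # 256 MB of float32

    dist.all_reduce(grads, op=dist.ReduceOp.SUM)
    torch.cuda.synchronize()

    if dist.get_rank() == 0:
        print(f"all-reduce finished across {dist.get_world_size()} GPUs")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Benchmark suites like NVIDIA's nccl-tests (all_reduce_perf and friends) measure this same pattern across a range of message sizes, which is the kind of measurement behind the NCCL results above.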

Here's a snapshot of the FarmGPU AI infrastructure stack.  Hedgehog's job is to make it really easy to run an AI network while we squeeze every byte of performance out of the network equipment.  And yes, there are a lot of hardware components in the solution architecture. 

[Diagram: FarmGPU AI infrastructure stack]

FarmGPU, Celestica and Hedgehog presented these results together at OCP Global Summit.  We also talked about the challenges of getting this to work for fully automated, hyperscale operations.  It's not easy.  To make it easier for the Open Compute Project community to network like hyperscalers, we committed to contributing this solution as an OCP reference architecture.  We'll post again when this is live in the OCP Marketplace.  
