Hedgehog AI Network

Industry Leading AI Network Performance with Hedgehog

Written by Marc Austin | Oct 19, 2025 5:10:11 PM

SemiAnalysis recently completed ClusterMAX testing on NVIDIA B200 GPU services offered by FarmGPU and RunPod.  NCCL testing of their Hedgehog AI network showed industry leading performance.  FarmGPU and RunPod offer better AI network performance than NVIDIA Israel-1, an AMD MI300X cluster running untuned RoCEv2, and even Crusoe Iceland's Infiniband network. 

Semi Analysis is an independent analyst that breaks the mold on market research.  They don't try to cover every industry.  They focus on neoclouds.  They don't scheduling briefings to write a lot of subjective content.  If you open your doors, they will test your GPU cluster for a number of ClusterMAX rating criteria.  Then they will give you a ClusterMAX rating.  The intent of all this is to help GPU renters make smart purchase decisions for their AI workloads.  

Semi Analysis evaluate features that GPU renters care about, such as:

  • Security
  • LifeCycle and Technical Expertise
  • Slurm and Kubernetes
  • Storage
  • NCCL/RCCL Networking Performance
  • Reliability and Service Level Agreements (SLAs)
  • Automated Active and Passive Health Checks and Monitoring
  • Consumption Models, Price Per Value, and Availability
  • Technical Partnerships

Hedgehog impacts most of these criteria, but for this report we are focusing on NCCL/RCCL Networking Performance.  Hedgehog optimizes NCCL/RCCL networking performance with open source AI networking software that tunes configuration of network switches and SuperNICs running in GPU servers.  Our software does this in an automated fashion, so you don't have to spend weeks learning the secret sauce.  That means rapid time to value for your very expensive GPU cluster.   

GPU renters need to get lots of data in and out of GPU clusters.  That requires a high performance network.  When they run training workloads, GPUs need to share memory over the network.  That means the AI network needs to run at peak performance if you want to get optimal GPU performance.  

Here's a snapshot of the FarmGPU AI infrastructure stack.  Hedgehog's job is to make it really easy to run an AI network while we squeeze every byte of performance out of the network equipment.  And yes, there are a lot of hardware components in the solution architecture. 

FarmGPU, Celestica and Hedgehog presented these results together at OCP Global Summit.  We also talked about the challenges of getting this to work for fully automated, hyperscale operations.  It's not easy.  To make it easier for the Open Compute Project community to network like hyperscalers, we committed to contributing this solution as an OCP reference architecture.  We'll post again when this is live in the OCP Marketplace.