Hedgehog
AI Network

The network AI needs

How the Hedgehog Fabric Works

Standard enterprise networks rely on fragile, element-by-element CLI configuration. Hedgehog replaces this with a continuous, intent-based software architecture built entirely on Kubernetes.

The Fabric Controller: At the heart of the Hedgehog architecture is a dedicated Control Node running Kubernetes and the Hedgehog Fabric Controller.
Declarative Intent via CRDs: Operators interact natively with the Kubernetes API, declaring network intents such as VPCs, peerings, and Gateways as custom resources defined by Custom Resource Definitions (CRDs).
Continuous Reconciliation: The Fabric Controller translates these intents into exact per-switch configurations. A lightweight Hedgehog Agent running on each switch continuously reconciles the declared state against the physical hardware, ensuring your network always matches your intent.
Gateway Nodes: Dedicated, high-performance Gateway nodes sit at the edge of the fabric, providing stateful NAT, firewalling, and seamless external connectivity to the rest of your grid.
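The intent-to-switch flow described above can be pictured with a minimal Python sketch. It is an illustration of the general reconciliation pattern, not Hedgehog's implementation: the `VpcIntent` type, its fields, and the `render_desired` and `reconcile` functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpcIntent:
    """Hypothetical stand-in for a declared VPC resource."""
    name: str
    vlan: int
    subnet: str
    switches: tuple  # names of the switches where the VPC must exist


def render_desired(intents):
    """Expand fabric-wide intents into a desired config per switch."""
    desired = {}
    for intent in intents:
        for switch in intent.switches:
            desired.setdefault(switch, {})[intent.name] = (intent.vlan, intent.subnet)
    return desired


def reconcile(desired, actual):
    """Return the actions one agent would take to make `actual` match `desired`."""
    actions = []
    for name, cfg in sorted(desired.items()):
        if name not in actual:
            actions.append(("add", name, cfg))
        elif actual[name] != cfg:
            actions.append(("update", name, cfg))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("remove", name, None))
    return actions


intents = [VpcIntent("vpc-1", 1001, "10.0.1.0/24", ("leaf-1", "leaf-2"))]
desired = render_desired(intents)

# leaf-1 has drifted: wrong VLAN for vpc-1, plus a stale vpc-old entry.
actual_leaf1 = {"vpc-1": (999, "10.0.1.0/24"), "vpc-old": (50, "10.9.0.0/24")}
actions = reconcile(desired["leaf-1"], actual_leaf1)
print(actions)
```

Running the same `reconcile` call repeatedly is safe: once the switch matches the declared state, the action list is empty.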

Key Design Principles

We built Hedgehog on the following core tenets to bridge the gap between bare-metal AI performance and cloud-native agility.

Freedom to Choose: Choose your switches and your preferred NOS, and change them freely down the line. No layer is a hostage. Where the software is ours, the source is open, allowing you to verify exactly what the system is meant to do rather than just trusting that it does. This commitment to choice keeps us honest twice over: on price, because you can always leave, and on behavior, because you can always look.
Manage Outcomes, Not Network Gear: Declare what you need, like a VPC or a service insertion, and the fabric handles the underlying routing, addressing, and per-switch state. We deliver true cloud abstraction: opinionated cloud constructs instead of protocol soup. This works because our software is rigorously tested and open-source. When something breaks, you can see what the fabric intended versus what it did.
Best-in-Class, Not Built-in-House: We don’t rebuild infrastructure that already works. Our control plane is a Kubernetes API; our observability relies on the LGTM stack. By integrating with the tools and skills your team already possesses, we focus our engineering purely on the network instead of reinventing inferior versions of everything around it.
Secure from the Ground Up: Multi-tenancy is an assumption, not an add-on. Tenant isolation lives in the same declarative model as everything else and is enforced in the data plane. Workloads, models, and data remain separated by architectural design, never by a fragile config you just hope someone set correctly.
Hyperscale Engineering for the Rest of Us: Hyperscalers run networks like software (version-controlled, pipelined, and deeply observed) using massive teams. We built that operating model directly into our product, minus the headcount. We lead with an API because real operations run on Infrastructure-as-Code, not on clicking through GUI screens at 2 AM. Built for Day Two, not the keynote.
Two Switches or Two Thousand: The exact same system runs a two-switch lab and a two-thousand-switch cluster. Scale up to massive build-outs without control-plane bottlenecks, or start small and grow into production on the same fabric. Zero re-platforming required on the way up.

Hedgehog Solves the Hardest AI Networking Challenges

Secure Multi-Tenancy

The Challenge: Securely sharing expensive GPU clusters across different internal teams or external clients without data cross-talk.
The Solution: Hedgehog brings hyperscaler-grade logical isolation directly to bare metal. Operators can instantly spin up fully isolated Virtual Private Clouds (VPCs) with strict boundary enforcement, allowing you to partition and monetize your AI services securely.

Provides the core abstractions modern teams need
Enforces strict multi-tenant isolation across physical clusters
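As a rough mental model of strict boundary enforcement, the sketch below treats VPC isolation as default-deny: traffic crosses a VPC boundary only when a peering has been declared. The `can_communicate` function and the VPC names are hypothetical and are not taken from Hedgehog's API.

```python
def can_communicate(src_vpc, dst_vpc, peerings):
    """Default-deny: traffic crosses a VPC boundary only via a declared peering."""
    if src_vpc == dst_vpc:
        return True
    # Peerings are stored as unordered pairs, so direction does not matter.
    return frozenset((src_vpc, dst_vpc)) in peerings


# Hypothetical tenants: team-a may reach shared storage, team-b may not.
peerings = {frozenset(("team-a", "shared-storage"))}

print(can_communicate("team-a", "shared-storage", peerings))
print(can_communicate("team-a", "team-b", peerings))
```

The design point is that isolation is the absence of a declaration rather than the presence of a blocking rule, so a forgotten rule cannot open a path between tenants.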

Network Performance

The Challenge: AI workloads demand two distinct network profiles: massive, lossless bandwidth to synchronize distributed training jobs without stalling GPUs, and predictable, ultra-low latency for high-concurrency inference serving.
The Solution: Hedgehog delivers an automated underlay and overlay network dynamically optimized for these unique traffic flows. By deploying validated configurations on open hardware, our fabric eliminates dropped packets to slash training time-to-completion, while ensuring the high-speed, reliable data delivery required to maximize your inference tokens per second.

Lossless, high-throughput underlay and overlay automation
Permanent hardware independence and vendor choice

Network Availability

The Challenge: Brittle configurations and manual updates lead to downtime and broken training runs.
The Solution: Continuous reconciliation by the Hedgehog Agents keeps the network state exactly as intended. If state drifts or a link fails, the fabric automatically reroutes and heals without manual intervention.

Native management via the Kubernetes API and CRDs
Empowers platform teams to control networking within existing workflows
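To illustrate what rerouting around a failed link means in a spine-leaf fabric, here is a small Python model. It is a sketch under simplifying assumptions: real fabrics converge through their routing protocols, and the `usable_spines` function and the switch names are hypothetical.

```python
def usable_spines(links, src_leaf, dst_leaf):
    """Return the spines that still have a healthy link to both leaves."""
    def healthy_spines(leaf):
        return {spine for (l, spine), up in links.items() if l == leaf and up}
    return sorted(healthy_spines(src_leaf) & healthy_spines(dst_leaf))


# Every leaf connects to every spine; True means the link is up.
links = {
    ("leaf-1", "spine-1"): True,
    ("leaf-1", "spine-2"): True,
    ("leaf-2", "spine-1"): True,
    ("leaf-2", "spine-2"): True,
}

before = usable_spines(links, "leaf-1", "leaf-2")
links[("leaf-1", "spine-1")] = False  # simulate a link failure
after = usable_spines(links, "leaf-1", "leaf-2")
print(before, after)
```

Traffic between the two leaves keeps flowing as long as at least one spine remains in the list, which is why redundant uplinks matter for long training runs.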

Lifecycle Management

The Challenge: Racking, provisioning, and updating network hardware manually takes months and requires specialized engineers.
The Solution: Zero Touch Lifecycle Management (ZTLM) accelerates your time to GPU value. Our software automatically discovers bare-metal hardware, provisions the OS, and pushes validated configurations the moment a switch is plugged in—taking you from rack to ready in hours.

Automated device discovery and declarative provisioning
Free up engineering resources with hitless lifecycle maintenance
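The onboarding sequence described above (discover the hardware, provision the OS, push validated configuration) can be sketched as a simple state machine. The stage names and the `advance` and `onboard` functions are hypothetical labels chosen for this example, not identifiers from the product.

```python
from enum import Enum


class Stage(Enum):
    PLUGGED_IN = 1
    DISCOVERED = 2
    OS_PROVISIONED = 3
    CONFIGURED = 4
    READY = 5


def advance(stage):
    """Move a switch to the next stage; READY is terminal."""
    if stage is Stage.READY:
        return stage
    return Stage(stage.value + 1)


def onboard(stage=Stage.PLUGGED_IN):
    """Walk a switch through every stage and return the stages visited."""
    history = [stage]
    while stage is not Stage.READY:
        stage = advance(stage)
        history.append(stage)
    return history


print([s.name for s in onboard()])
```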

Scales to Fit

The Challenge: Network architectures that demand massive upfront over-provisioning or forklift upgrades to grow.
The Solution: Hedgehog supports highly flexible, automated spine-leaf topologies. Start with the capacity you need today and scale out your physical topology non-disruptively as your AI cluster grows.

High-bandwidth external routing and simplified BGP peering
Unifies distributed AI workloads without traffic choke points
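When planning how far to scale a spine-leaf topology, two pieces of arithmetic come up often: the oversubscription ratio of a leaf and the number of leaf-to-spine links. The Python sketch below shows both. The port counts and speeds are made-up illustration values, and the function names are hypothetical.

```python
def oversubscription(downlinks, downlink_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to spine-facing bandwidth on one leaf."""
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)


def fabric_links(leaves, spines):
    """Leaf-to-spine links when every leaf connects once to every spine."""
    return leaves * spines


# Illustration values only: 16 x 400G server ports per leaf.
ratio_small = oversubscription(16, 400, 4, 800)   # 4 uplinks of 800G
ratio_grown = oversubscription(16, 400, 8, 800)   # 8 uplinks of 800G
print(ratio_small, ratio_grown, fabric_links(4, 2))
```

A ratio of 1.0 means the leaf can forward all server traffic to the spines at once; adding uplinks moves a leaf toward that point without replacing it.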

Observability

The Challenge: Traditional monitoring tools sample traffic too slowly to catch the micro-bursts that stall GPU workloads.
The Solution: Deep, real-time telemetry mapped directly to cluster performance. Hedgehog exposes granular flow and queue-depth visibility, streaming natively into Prometheus and Grafana to proactively detect and resolve packet drops.

Real-time visibility into micro-bursts and queue depths
Full automation ensures clusters live up to their absolute potential
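A short Python example shows why coarse sampling hides micro-bursts. The queue-depth numbers are synthetic, and the threshold and function names are hypothetical; the point is only that a one-second average can look healthy while millisecond samples reveal the burst.

```python
def average(samples):
    """Mean queue depth over the whole window."""
    return sum(samples) / len(samples)


def find_bursts(samples, threshold):
    """Indices of samples whose queue depth reaches the threshold."""
    return [i for i, depth in enumerate(samples) if depth >= threshold]


# 1000 synthetic one-millisecond samples (queue depth in packets):
# mostly idle, with a 5 ms burst starting at 400 ms.
samples = [10] * 1000
for i in range(400, 405):
    samples[i] = 5000

window_mean = average(samples)
bursts = find_bursts(samples, threshold=1000)
print(window_mean, bursts)
```

The one-second mean stays far below the threshold, so a monitor that only reports the mean would miss the event that the per-millisecond view catches.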

Firewall & NAT

The Challenge: Securing proprietary models and training data at line rate without degrading cluster performance.
The Solution: Integrated, stateful NAT and firewalling at the Gateway layer. Enforce zero-trust micro-segmentation and robust security policies directly within the fabric's flow, keeping your multi-tenant boundaries locked down.

Policy-driven security enforcement within the fabric
Strict tenant isolation guards against internal and external cross-talk
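The "stateful" part of stateful NAT can be sketched in a few lines of Python: outbound flows create a translation entry, and inbound packets are accepted only if a matching entry exists. This is a deliberately simplified model (no remote-endpoint matching, no timeouts), and the `StatefulNat` class is hypothetical rather than part of any Hedgehog API.

```python
class StatefulNat:
    """Toy source-NAT table keyed on the private endpoint."""

    def __init__(self, public_ip, first_port=20000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (private_ip, private_port) -> public_port
        self.back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Translate an outbound flow, creating state on first use."""
        key = (private_ip, private_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out[key])

    def inbound(self, public_port):
        """Return the private endpoint, or None (drop) if no state exists."""
        return self.back.get(public_port)


nat = StatefulNat("203.0.113.10")
mapped = nat.outbound("10.0.1.5", 44321)
print(mapped, nat.inbound(mapped[1]), nat.inbound(20001))
```

Unsolicited inbound traffic finds no entry and is dropped, which is the behavior that lets a gateway double as a basic firewall boundary.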

Data Center Interconnect

The Challenge: Distributed AI training requires bridging "AI islands" to external data lakes and public clouds.
The Solution: Unify your distributed workloads. We simplify BGP peering and Data Center Interconnect (DCI) routing, providing high-bandwidth external ingress and egress to keep your training pipelines fed without choke points.

Predictable costs for high-volume data transfer
Hedgehog never charges ingress or egress fees
