How Hedgehog Works: The Control Model
Course: Course 1 - Foundations & Interfaces Duration: 15 minutes Prerequisites: Module 1.1: Welcome to Fabric Operations
Introduction
The Magic Explained
In Module 1.1, you used kubectl to explore the fabric. You saw switches, servers, and connections—all represented as Kubernetes resources. But how does typing kubectl apply -f vpc.yaml
actually configure switches?
It's not magic. It's declarative infrastructure with a well-designed control loop. Understanding this pattern is the key to operating confidently: you'll know what's happening behind the scenes, when things are working, and how to detect when they're not.
From Intent to Reality
Traditional networking: You configure each switch manually. If a switch reboots, you reconfigure it. If you want consistency, you write scripts.
Hedgehog approach: You declare your intent once. The fabric continuously reconciles actual state to match desired state. If a switch reboots, it automatically rejoins and reconfigures itself.
This is how hyperscalers run networks at scale—and now you can too.
What You'll Learn
This module demystifies the control model:
- How CRDs represent network intent
- How controllers watch for changes and take action
- How switch agents apply configurations
- How to observe this process in real-time
You'll create your first VPC and watch the reconciliation happen step by step.
Learning Objectives
By the end of this module, you will be able to:
- Explain the CRD reconciliation pattern - Describe how desired state (kubectl apply) becomes actual state (switch configuration)
- Identify the key components - Recognize Fabric Controller, Agent CRDs, and switch agents in the control flow
- Observe reconciliation in action - Watch Kubernetes events as a VPC is created and configured
- Interpret Agent CRD status - Use Agent CRD to view switch operational state and reconciliation progress
- Understand abstraction boundaries - Explain what the operator manages (CRDs) vs. what the system manages (switch configs)
Core Concepts
Concept 1: Desired State vs. Current State
Every declarative system has two views of the world:
Desired State: What you want (defined in CRDs)
apiVersion: vpc.githedgehog.com/v1beta1
kind: VPC
metadata:
name: my-vpc
spec:
subnets:
default:
subnet: 10.10.1.0/24
vlan: 1001
Current State: What exists right now (switch configurations, actual VLANs, routes)
The Control Loop: Continuously measures the gap and takes actions to close it.
┌─────────────┐
│ Desired │ "I want VPC with VLAN 1001"
│ State │
│ (CRD) │
└──────┬──────┘
│
▼
┌─────────────┐
│ Controller │ "Compare desired vs. current"
│ Loop │ "Generate switch configs"
└──────┬──────┘
│
▼
┌─────────────┐
│ Current │ "VLAN 1001 is configured on switches"
│ State │
│ (Switches) │
└─────────────┘
If current ≠ desired, the controller takes action. If current = desired, nothing happens (idempotent).
Concept 2: The Hedgehog Control Plane Components
Three key actors:
1. Fabric Controller (Control Plane)
- Runs in Kubernetes (fab namespace)
- Watches for CRD changes (VPC, VPCAttachment, etc.)
- Computes desired switch configurations
- Generates Agent CRD specs (instructions for switches)
- Doesn't directly touch switches—works through Agents
2. Agent CRD (Bridge)
- One Agent CRD per switch (lives in Kubernetes)
- Contains two key sections:
spec
: Desired configuration (written by Fabric Controller)status
: Current state (reported by switch agent)
- The "contract" between controller and switch
3. Switch Agent (On Each Switch)
- Runs on SONiC switches (not in Kubernetes)
- Watches its Agent CRD for spec changes
- Applies configurations via gNMI (SONiC config interface)
- Reports status back to Agent CRD (NOS version, interface states, BGP neighbors, etc.)
Data Flow:
You Fabric Controller Agent CRD Switch Agent SONiC Switch
│ │ │ │ │
├─ kubectl apply vpc.yaml ─>│ │ │ │
│ │ │ │ │
│ ├─ Watch VPC CRD │ │ │
│ ├─ Compute config │ │ │
│ │ │ │ │
│ ├─ Update Agent spec ───> │ │
│ │ │ │ │
│ │ │<── Watch spec ───┤ │
│ │ │ │ │
│ │ │ ├─ Apply via gNMI ──>│
│ │ │ │ │
│ │ │ │<─── Config ACK ────┤
│ │ │ │ │
│ │ │<── Report status─┤ │
│ │ │ │ │
│<─── kubectl get vpc ──────┤ │ │ │
│ (shows Ready) │ │ │ │
Concept 3: Reconciliation and Events
When you apply a CRD, a series of events occurs:
Example: Creating a VPC
- Created: VPC object appears in Kubernetes
- Reconciling: Fabric Controller picks up the change
- Computing: Controller calculates which switches need configuration
- Agent Update: Controller writes specs to relevant Agent CRDs
- Applying: Switch agents apply configurations via gNMI
- Status Report: Agents report success back to Agent CRD status
- Ready: VPC status becomes Ready (when all switches configured)
Each step generates Kubernetes events that you can observe:
kubectl get events --field-selector involvedObject.name=my-vpc
# Example events:
# Normal Created VPC created
# Normal Reconciling Fabric controller processing
# Normal AgentUpdated Agent specs generated
# Normal Ready VPC reconciliation complete
Events are your window into the control loop.
Concept 4: Why This Model Matters for Operators
Benefits:
- Self-healing: Switch reboots? Agent reapplies config automatically
- Declarative: Describe what you want, not how to configure it
- Idempotent: Apply the same YAML repeatedly—safe and predictable
- Observable: Events and Agent status show exactly what's happening
- Git-friendly: CRDs are YAML files—version control, code review, GitOps
What You Control:
- VPC definitions (subnets, VLANs, DHCP)
- VPCAttachments (which servers connect to which VPCs)
- VPCPeering (inter-VPC connectivity)
- External connectivity (border leaf peering)
What the System Controls:
- Individual switch configurations (SONiC config DB)
- BGP peering between switches
- VXLAN tunnel setup
- Route distribution
You work at a higher abstraction level—the system handles the complexity. This isn't a limitation; it's empowerment. You're freed from managing thousands of configuration lines across dozens of switches, allowing you to focus on network intent rather than implementation details.
Concept 5: The Agent CRD as Source of Truth
Remember from Module 1.1: Most Hedgehog CRDs have minimal status fields. The real operational state lives in Agent CRDs.
Agent CRD contains:
- NOS version (SONiC version)
- Platform info (switch model)
- Interface states (up/down, speed)
- BGP neighbor status (established, down)
- ASIC resource usage (routes, nexthops)
- System health metrics
Why this matters:
- VPC says "Ready" but you want to see which switches configured it → Check Agent CRDs
- Troubleshooting connectivity → Check Agent CRD for interface status
- Monitoring fabric health → Watch Agent CRD status fields
Agent CRDs are your deep observability layer.
Hands-On Lab
Lab Title: Create a VPC and Observe Reconciliation
Overview:
You'll create your first VPC, watch Kubernetes events as it reconciles, and inspect Agent CRDs to see how switches report their state. This is your first time creating something (not just exploring), and you'll see the control loop in action.
Environment:
- Same Hedgehog vlab from Module 1.1
- kubectl access to create VPCs
- Events visible during active reconciliation
Task 1: Prepare to Observe
Objective: Open a second terminal to watch events in real-time (optional but recommended)
Steps (if using two terminals):
# Terminal 1: You'll create the VPC here
# (this is your main terminal)
# Terminal 2 (optional): Watch events continuously
kubectl get events --watch --field-selector involvedObject.kind=VPC
# This will stream events as they happen
# You'll see events appear as you create the VPC in Terminal 1
Note: If you only have one terminal, you can view events after creating the VPC—it works fine, just less dramatic!
Success Criteria:
- ✅ Have terminal(s) ready
- ✅ kubectl access verified
Task 2: Create Your First VPC
Objective: Create a simple VPC and apply it to the fabric
Steps:
Create VPC YAML:
cat > test-vpc.yaml <<'EOF'
apiVersion: vpc.githedgehog.com/v1beta1
kind: VPC
metadata:
name: test-vpc
namespace: default
spec:
ipv4Namespace: default
vlanNamespace: default
subnets:
default:
subnet: 10.99.1.0/24
gateway: 10.99.1.1
vlan: 1999
dhcp:
enable: true
range:
start: 10.99.1.10
end: 10.99.1.99
EOF
Apply the VPC:
kubectl apply -f test-vpc.yaml
# Expected output:
# vpc.vpc.githedgehog.com/test-vpc created
Immediately check events (Terminal 1):
kubectl get events --field-selector involvedObject.name=test-vpc --sort-by='.lastTimestamp'
# You should see events like:
# Normal Created VPC test-vpc created
# Normal Reconciling Processing VPC configuration
# (more events as reconciliation progresses)
Validation:
# Check VPC status
kubectl get vpc test-vpc
# Expected output:
# NAME AGE
# test-vpc 30s
# Get full VPC details
kubectl get vpc test-vpc -o yaml | grep -A 5 status:
# Status may be empty {} initially, or show conditions
Success Criteria:
- ✅ VPC created without errors
- ✅ Events visible showing reconciliation
- ✅ VPC object exists in cluster
Task 3: Observe Agent CRD Updates
Objective: See how the Fabric Controller updates Agent CRDs with configuration
Steps:
# View all Agent CRDs (one per switch)
kubectl get agents -n fab
# Expected output:
# NAME AGE
# leaf-01 Xh
# leaf-02 Xh
# ... (7 agents total)
Inspect an Agent to see configuration:
# Look at leaf-01's Agent CRD
kubectl get agent leaf-01 -n fab -o yaml | head -50
# Look for these sections:
# - spec: Contains desired configuration (from Fabric Controller)
# - status: Contains switch operational state (from switch agent)
Check Agent status for VPC-related info:
# View Agent status (switch reported state)
kubectl get agent leaf-01 -n fab -o yaml | grep -A 20 "status:"
# Look for:
# - nos.version: SONiC version
# - platform: Switch platform info
# - interfaces: Interface states
# - bgpNeighbors: BGP peering status (if relevant)
Observation Questions:
- Can you see the NOS version in the Agent status? _______
- Are interfaces listed in the status? _______
- What information does the Agent provide that VPC doesn't? _______
Success Criteria:
- ✅ Can view Agent CRDs
- ✅ Can see spec and status sections
- ✅ Understand Agent shows switch operational state
Task 4: Verify VPC is Ready
Objective: Confirm reconciliation completed successfully
Steps:
# Check VPC status (wait 30-60 seconds after creation)
kubectl get vpc test-vpc -o yaml | grep -A 10 status:
# Look for status fields (may be minimal or empty)
Check recent events:
kubectl get events --field-selector involvedObject.name=test-vpc --sort-by='.lastTimestamp' | tail -10
# Look for "Ready" or "ReconcileSuccess" events
Describe the VPC:
kubectl describe vpc test-vpc
# Look at the Events section at the bottom
# Should show creation, reconciliation, and completion
Key Insight:
Even if VPC status is minimal, events tell the story. You can see:
- When VPC was created
- When controller reconciled it
- If any errors occurred
- When reconciliation completed
Success Criteria:
- ✅ VPC shows no error events
- ✅ Reconciliation events visible
- ✅ VPC object stable (can be retrieved without errors)
Task 5: Optional - Modify and Re-Reconcile
Objective: See reconciliation happen again when you change desired state
Steps:
Modify the VPC (add a second subnet):
cat > test-vpc-updated.yaml <<'EOF'
apiVersion: vpc.githedgehog.com/v1beta1
kind: VPC
metadata:
name: test-vpc
namespace: default
spec:
ipv4Namespace: default
vlanNamespace: default
subnets:
default:
subnet: 10.99.1.0/24
gateway: 10.99.1.1
vlan: 1999
dhcp:
enable: true
range:
start: 10.99.1.10
end: 10.99.1.99
backend:
subnet: 10.99.2.0/24
gateway: 10.99.2.1
vlan: 1998
EOF
Apply the update:
kubectl apply -f test-vpc-updated.yaml
# Expected output:
# vpc.vpc.githedgehog.com/test-vpc configured
Watch events:
kubectl get events --field-selector involvedObject.name=test-vpc --sort-by='.lastTimestamp' | tail -10
# You should see new reconciliation events as the update is applied
Observe: The controller detects the change, updates Agent specs, switches reconfigure—all automatically.
Success Criteria:
- ✅ Update applied without errors
- ✅ New reconciliation events visible
- ✅ VPC now has two subnets (verify with
kubectl get vpc test-vpc -o yaml
)
Lab Summary
What you did:
- ✅ Created your first VPC using kubectl apply
- ✅ Watched Kubernetes events during reconciliation
- ✅ Inspected Agent CRDs to see switch operational state
- ✅ Verified VPC reconciliation completed successfully
- ✅ (Optional) Modified VPC and watched re-reconciliation
What you learned:
- Declarative infrastructure: You declare desired state, the system reconciles
- Control loop: Fabric Controller → Agent CRD → Switch Agent → SONiC
- Events are your observability window into reconciliation
- Agent CRDs contain deep switch operational state
- Changes are automatically reconciled (self-healing)
Key insight: You didn't log into a single switch, yet network configuration happened across the fabric. That's the power of the abstraction—and why understanding the control model lets you operate confidently.
Wrap-Up
Key Takeaways
- Desired state (CRD) → Actual state (switches) via the reconciliation loop
- Three actors: Fabric Controller, Agent CRD, Switch Agent
- Events show progress: Watch reconciliation happen in real-time
- Agent CRDs are deep observability: Switch operational state lives here
- Abstraction means power: You manage intent, the system handles complexity
Real-World Impact
You now understand:
- Why kubectl apply is safe (idempotent, declarative)
- What "reconciling" means (closing the gap between desired and actual)
- Where to look when troubleshooting (events, Agent CRD status)
- Why Hedgehog is self-healing (continuous reconciliation)
Preview of Module 1.3
Next, you'll go deeper into kubectl and YAML workflows: how to write VPC specs, validate before applying, use kubectl efficiently, and follow best practices for managing CRDs. You'll become fluent in the primary tool you'll use daily.
Assessment
Quiz Questions
Question 1: Multiple Choice
In the Hedgehog control model, what is the role of the Fabric Controller?
- A) Directly configures switches via SSH
- B) Watches CRDs, computes configurations, updates Agent CRD specs
- C) Runs on switches and applies configurations
- D) Stores VPC definitions in a database
Correct Answer: B
Explanation:
The Fabric Controller runs in Kubernetes, watches for CRD changes (VPCs, VPCAttachments, etc.), computes desired switch configurations, and writes them to Agent CRD specs. It doesn't directly touch switches (that's the Switch Agent's job, C), doesn't use SSH (A), and CRDs are stored in Kubernetes etcd, not a separate database (D).
Question 2: Scenario-Based
You create a VPC with kubectl apply -f vpc.yaml
. What is the correct order of operations?
- Switch Agent applies config via gNMI
- Fabric Controller updates Agent CRD spec
- VPC CRD created in Kubernetes
- Fabric Controller watches VPC CRD and computes config
Answer: 3 → 4 → 2 → 1
Explanation:
Correct flow: VPC CRD created (3), Fabric Controller watches and computes (4), Controller updates Agent spec (2), Switch Agent applies via gNMI (1). This is the reconciliation loop in action.
Question 3: True/False
True or False: If a switch reboots, you must manually re-run kubectl apply
to reconfigure it.
Answer: False
Explanation:
The switch agent automatically re-reads its Agent CRD when it comes back online and reapplies all configurations. This is self-healing—one of the key benefits of declarative infrastructure. Manual intervention is not required.
Question 4: Multiple Choice
Where can you find detailed information about a switch's BGP neighbors and interface states?
- A) VPC CRD status field
- B) Agent CRD status field
- C) Fabric Controller logs only
- D) You must SSH into the switch
Correct Answer: B
Explanation:
Agent CRD status contains comprehensive switch operational state reported by the switch agent, including BGP neighbors, interface states, ASIC resources, and NOS version. VPC status (A) is minimal. Controller logs (C) help but aren't the primary source. SSH (D) is possible but not the operator's tool—kubectl is.
Question 5: Practical
You create a VPC but want to verify reconciliation completed successfully. What kubectl commands would you use? (List 2)
Answer:
# Option 1: Check events for reconciliation status
kubectl get events --field-selector involvedObject.name=<vpc-name> --sort-by='.lastTimestamp'
# Option 2: Describe VPC to see events in summary
kubectl describe vpc <vpc-name>
# Option 3: Get VPC status (if populated)
kubectl get vpc <vpc-name> -o yaml | grep -A 10 status:
Rubric:
- Full credit: Any 2 valid commands that show reconciliation status
- Partial credit: 1 valid command
- No credit: Commands that don't reveal reconciliation state
Practical Assessment
Task: Create a VPC, verify it reconciled successfully, then delete it cleanly.
Steps:
- Create VPC with
kubectl apply -f vpc.yaml
- Use kubectl to verify reconciliation (events or describe)
- Delete VPC with
kubectl delete vpc <name>
- Verify deletion completed (no lingering resources)
Success Criteria:
- ✅ VPC created without errors
- ✅ Used appropriate kubectl commands to verify reconciliation
- ✅ Can explain what events indicate success
- ✅ Deleted VPC cleanly
- ✅ Understands reconciliation happened both on create and delete
Reference
Hedgehog CRDs Used in This Module
VPC - Virtual Private Cloud with isolated subnets
- View:
kubectl get vpc
- Inspect:
kubectl describe vpc <name>
- Apply:
kubectl apply -f vpc.yaml
- Full Reference
Agent - Switch agent containing detailed operational state
- View:
kubectl get agents -n fab
- Inspect:
kubectl get agent <switch-name> -n fab -o yaml
- Full Reference
kubectl Commands Reference
VPC creation:
kubectl apply -f vpc.yaml # Create or update VPC
kubectl get vpc <name> # View VPC summary
kubectl describe vpc <name> # View VPC details and events
kubectl get vpc <name> -o yaml # View full VPC YAML
kubectl delete vpc <name> # Delete VPC
Event observation:
kubectl get events --field-selector involvedObject.name=<name> # Filter events by resource
kubectl get events --watch # Watch events in real-time
kubectl get events --sort-by='.lastTimestamp' # Sort events by time
kubectl describe vpc <name> # View events in context
Agent CRD inspection:
kubectl get agents -n fab # List all agents
kubectl get agent <switch-name> -n fab -o yaml # View agent details
kubectl describe agent <switch-name> -n fab # View agent summary
Workflow Reference
This module uses Workflow 1: Create VPC from Scratch from the workflow reference:
Related Documentation
Module Complete! Ready to continue to Module 1.3: Interfaces: kubectl/YAML.