Decommission & Cleanup
Introduction
In Modules 2.1-2.3, you completed the provisioning workflow: creating the web-app-prod VPC, attaching two servers, and validating connectivity. Your infrastructure is operational and serving traffic. But what happens when it's time to decommission?
Every resource has a lifecycle, from creation to deletion. Proper decommissioning is as critical as proper provisioning. Delete resources in the wrong order and you risk orphaned configurations, misconfigured switches, or leftover resources that consume fabric capacity unnecessarily.
In this final module of Course 2, you'll learn safe decommissioning workflows: deleting VPCAttachments before VPCs, validating cleanup completion, and understanding when to keep versus delete resources. This completes the full Day 1 operations lifecycle: Provision → Attach → Validate → Cleanup.
Learning Objectives
By the end of this module, you will be able to:
- Decommission VPCAttachments safely - Remove server-to-VPC connections in correct order
- Decommission VPCs safely - Delete VPCs after all attachments removed
- Validate cleanup completion - Verify resources deleted and switches reconfigured
- Understand lifecycle management - Know when to keep vs delete resources
- Apply Day 1 operations knowledge - Complete the provisioning→validation→cleanup workflow
Prerequisites
- Module 2.1 completion (VPC Provisioning Essentials)
- Module 2.2 completion (VPC Attachments)
- Module 2.3 completion (Connectivity Validation)
- Existing web-app-prod VPC with 2 VPCAttachments (from previous modules)
- kubectl access to the Hedgehog fabric
Scenario: Application Decommission
The web-app-prod application is being decommissioned after a successful migration to a new platform. Your task: safely remove the VPCAttachments and VPC without disrupting other fabric operations. You'll follow the proper cleanup order (attachments first, then VPC), validate each step, and verify the fabric returns to a clean state. This is your opportunity to complete the full lifecycle workflow and demonstrate Day 1 operations mastery.
Lab Steps
Step 1: Pre-Decommission Review
Objective: Identify resources to delete and understand current state
Before deleting anything, understand what you're removing and document the current state for validation.
List current VPCs:
kubectl get vpcs
Expected output (similar to):
NAME AGE
web-app-prod 2h
List current VPCAttachments:
kubectl get vpcattachments
Expected output (similar to):
NAME AGE
server-01-web-servers 2h
server-05-worker-nodes 2h
Review what will be deleted:
Resources to decommission:
- web-app-prod VPC with 2 subnets:
- web-servers (10.10.10.0/24, VLAN 1010)
- worker-nodes (10.10.20.0/24, VLAN 1020, DHCP enabled)
- server-01-web-servers VPCAttachment (MCLAG connection to leaf-01/leaf-02)
- server-05-worker-nodes VPCAttachment (ESLAG connection to leaf-03/leaf-04)
Document current state for later validation:
# Save current VPC configuration
kubectl get vpc web-app-prod -o yaml > web-app-prod-backup.yaml
# List attachments for documentation
kubectl get vpcattachments -o wide
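The recovery steps later in this module re-apply attachment manifests, so it is worth snapshotting those as well. A minimal sketch (file names are illustrative and match the ones used in the recovery examples later on):
# Save the attachment manifests alongside the VPC backup
kubectl get vpcattachment server-01-web-servers -o yaml > server-01-attachment.yaml
kubectl get vpcattachment server-05-worker-nodes -o yaml > server-05-attachment.yaml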
Critical decommissioning order:
1. Delete VPCAttachments FIRST (server-01-web-servers, server-05-worker-nodes)
2. Validate all attachments removed
3. Delete VPC LAST (web-app-prod)
4. Validate cleanup complete
Success Criteria:
- ✅ Identified all resources to delete (1 VPC, 2 VPCAttachments)
- ✅ Documented current state
- ✅ Understand decommissioning order: Attachments → VPC
Step 2: Delete VPCAttachments (Must Delete First)
Objective: Remove server-to-VPC connections before deleting VPC
CRITICAL RULE: Always delete VPCAttachments BEFORE deleting the VPC.
VPCAttachments depend on the VPC they reference. Deleting the VPC first would leave orphaned attachments that reference a non-existent VPC, causing reconciliation errors.
Delete the first VPCAttachment (server-01):
kubectl delete vpcattachment server-01-web-servers
Expected output:
vpcattachment.vpc.githedgehog.com "server-01-web-servers" deleted
Delete the second VPCAttachment (server-05):
kubectl delete vpcattachment server-05-worker-nodes
Expected output:
vpcattachment.vpc.githedgehog.com "server-05-worker-nodes" deleted
Verify VPCAttachments are deleted:
# List all VPCAttachments (web-app resources should be gone)
kubectl get vpcattachments
# Specifically check for web-app attachments (should return empty)
kubectl get vpcattachments | grep web-app
Check cleanup reconciliation events:
# View events for attachment deletions
kubectl get events --sort-by='.lastTimestamp' | tail -20
# Look for events indicating cleanup reconciliation
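If the event stream is noisy, you can narrow it to the attachment objects with a standard kubectl field selector; a hedged variant (the kind name matches the CRD used in this course):
# Show only events raised for VPCAttachment resources
kubectl get events --field-selector involvedObject.kind=VPCAttachment --sort-by='.lastTimestamp'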
What happens during VPCAttachment deletion:
- Fabric controller detects VPCAttachment deletion
- Identifies affected switches (leaf-01, leaf-02 for server-01; leaf-03, leaf-04 for server-05)
- Computes cleanup configuration:
- Remove VLAN from server-facing ports
- Remove VXLAN tunnels (if no other VPCs using them)
- Remove BGP EVPN routes
- Updates Agent CRDs (removes config from Agent spec)
- Switch agents apply cleanup (unconfigure ports)
- VPCAttachment deleted from Kubernetes
Wait for reconciliation to complete (typically 10-30 seconds).
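Rather than guessing at timing, you can poll the relevant Agent CRD until the VLAN disappears. A minimal sketch, assuming the Agent spec renders the VLAN as vlan: 1010 (as shown later in this module) and giving up after roughly a minute:
# Poll leaf-01's Agent CRD until VLAN 1010 disappears (up to ~60 seconds)
for i in $(seq 1 12); do
  kubectl get agent leaf-01 -n fab -o yaml | grep -q "vlan: 1010" || { echo "VLAN 1010 removed from leaf-01"; break; }
  echo "Waiting for cleanup reconciliation..."
  sleep 5
done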
Success Criteria:
- ✅ Both VPCAttachments deleted successfully
- ✅ kubectl get vpcattachments shows no web-app resources
- ✅ Events show successful cleanup reconciliation
- ✅ No error events
Step 3: Delete VPC (After Attachments Removed)
Objective: Remove VPC only after all attachments are gone
Before deleting the VPC, verify no attachments remain:
# Check for any remaining attachments referencing web-app-prod
kubectl get vpcattachments | grep web-app-prod
# Expected: No results (empty output)
If any attachments remain, DO NOT proceed. Delete them first.
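Attachment names are not required to contain the VPC name, so grepping the full YAML for the subnet reference is a more reliable check than matching names alone. A minimal sketch:
# Any output here means STOP; silence means it is safe to proceed
kubectl get vpcattachments -o yaml | grep "web-app-prod/" && echo "STOP: attachments still reference web-app-prod"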
Delete the VPC:
kubectl delete vpc web-app-prod
Expected output:
vpc.vpc.githedgehog.com "web-app-prod" deleted
Verify VPC deletion:
# Attempt to get the VPC (should return NotFound error)
kubectl get vpc web-app-prod
Expected output:
Error from server (NotFound): vpcs.vpc.githedgehog.com "web-app-prod" not found
This error is expected and correct—it confirms the VPC is deleted.
Check VPC cleanup events:
# View recent events
kubectl get events --sort-by='.lastTimestamp' | tail -20
List remaining VPCs to confirm:
# web-app-prod should not appear in this list
kubectl get vpcs
What happens during VPC deletion:
- Fabric controller verifies no attachments exist (deletion would fail if attachments remain)
- Removes VPC configuration:
- VXLAN VNI released back to namespace pool
- VLAN namespace entries removed
- IP namespace entries removed
- VPC deleted from Kubernetes
Success Criteria:
- ✅ VPC deleted successfully
- ✅ kubectl get vpc web-app-prod returns a NotFound error
- ✅ Events show successful VPC cleanup
- ✅ No error events
- ✅ VPC no longer appears in kubectl get vpcs
Step 4: Validate Cleanup Completion
Objective: Verify all resources deleted and switches reconfigured
Verify VPC completely removed:
# Should return NotFound error
kubectl get vpc web-app-prod
Verify VPCAttachments completely removed:
# web-app resources should be gone
kubectl get vpcattachments | grep web-app
Check Agent CRDs for switch cleanup:
# Check leaf-01 (should no longer have VLAN 1010 for server-01)
kubectl get agent leaf-01 -n fab -o yaml | grep -A 5 "1010"
# Check leaf-03 (should no longer have VLAN 1020 for server-05)
kubectl get agent leaf-03 -n fab -o yaml | grep -A 5 "1020"
Expected result: VLANs 1010 and 1020 should be removed from the relevant Agent CRDs, indicating switches have been reconfigured.
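To check both leaf pairs in one pass, a minimal sketch (the switch/VLAN pairs follow this lab's topology, and it assumes the Agent spec renders VLANs as vlan: <id>):
# Verify VLAN 1010 is gone from leaf-01 and VLAN 1020 from leaf-03
for pair in "leaf-01 1010" "leaf-03 1020"; do
  set -- $pair
  if kubectl get agent "$1" -n fab -o yaml | grep -q "vlan: $2"; then
    echo "WARNING: $1 still carries VLAN $2"
  else
    echo "OK: VLAN $2 removed from $1"
  fi
done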
Verify no orphaned resources:
# List all VPCAttachments to ensure none reference deleted VPC
kubectl get vpcattachments -o yaml | grep "web-app-prod"
# Expected: No results
Review cleanup event timeline:
# View all recent events to see cleanup progression
kubectl get events --sort-by='.lastTimestamp' | tail -30
Validation checklist:
- ✅ VPC web-app-prod deleted (NotFound error when queried)
- ✅ VPCAttachments deleted (no web-app resources in list)
- ✅ Agent CRDs updated (VLANs removed from switch configurations)
- ✅ No orphaned resources
- ✅ Events show successful cleanup reconciliation
- ✅ Fabric returned to clean state
Success Criteria:
- ✅ All validation checks passed
- ✅ Fabric state clean (no web-app resources)
- ✅ Switches reconfigured (VLANs removed)
- ✅ No errors in cleanup process
Step 5: Course 2 Completion - Full Lifecycle Review
Objective: Understand the complete Day 1 operations workflow
You've now completed the entire Day 1 operations lifecycle for Hedgehog Fabric:
Module 2.1: Provision VPC
- Created web-app-prod VPC with two subnets
- Configured IPv4 (static) and DHCPv4 subnets
- Learned VPC CRD structure and reconciliation
Module 2.2: Attach Servers
- Attached server-01 (MCLAG) to web-servers subnet
- Attached server-05 (ESLAG) to worker-nodes subnet
- Understood connection types and VPCAttachment workflow
Module 2.3: Validate Connectivity
- Validated VPC and VPCAttachment configurations
- Inspected Agent CRDs for switch-level state
- Learned event-based validation and troubleshooting
Module 2.4: Decommission & Cleanup (this module)
- Deleted VPCAttachments in correct order
- Deleted VPC after attachments removed
- Validated cleanup completion
Complete Lifecycle:
Provision → Attach → Validate → Operate → Decommission
When to keep vs delete resources:
Keep resources when:
- Application temporarily offline (maintenance, updates)
- Troubleshooting connectivity issues (keep for debugging)
- Resource reserved for future use
- Testing in progress (don't delete mid-test)
Delete resources when:
- Application permanently decommissioned
- Migration to new VPC complete
- Testing finished (dev/test environments)
- Resource no longer needed
- Cleaning up failed deployments
Deletion impact:
- VPCAttachment deletion: Server immediately loses VPC connectivity
- VPC deletion: All subnets, VLANs, and routing removed
- Recovery: Can re-create from YAML manifests (but auto-assigned VLANs may change)
Production best practices:
- Verify before deleting: Confirm with application team
- Plan maintenance window: Deletion causes immediate connectivity loss
- Document deletion: Record why, when, who approved
- Check dependencies: Ensure no other resources depend on VPC
- Validate after deletion: Confirm cleanup completed successfully
- Save manifests: Keep YAML backups for recovery if needed
Success Criteria:
- ✅ Understand complete Day 1 lifecycle workflow
- ✅ Know when to delete vs keep resources
- ✅ Understand deletion impact and recovery options
- ✅ Ready for production decommissioning tasks
- ✅ Course 2 Complete!
Concepts & Deep Dive
Decommissioning Order: Why Attachments First
The Golden Rule: Always delete VPCAttachments BEFORE deleting VPCs.
This order is not merely a best practice; it is a requirement for safe decommissioning.
Why this order matters:
1. Dependency chain
VPCAttachments depend on VPCs. Each VPCAttachment references a VPC and subnet:
spec:
  connection: server-01--mclag--leaf-01--leaf-02
  subnet: web-app-prod/web-servers  # References VPC
If you delete the VPC first, VPCAttachments reference a non-existent VPC, causing reconciliation errors.
2. Switch configuration order
Server ports need to be unconfigured (VLANs removed) before the VPC configuration is removed from the fabric. Deleting the VPC first leaves switches partially configured.
3. Orphaned resources
Deleting the VPC first creates orphaned VPCAttachments that no longer serve any purpose but still exist in Kubernetes, consuming resources and causing confusion.
4. Reconciliation failures
The fabric controller cannot properly reconcile VPCAttachments without the parent VPC definition. Events will show errors, and manual cleanup becomes necessary.
What happens if you delete VPC first?
# DON'T DO THIS - Wrong order!
kubectl delete vpc web-app-prod # VPC deleted
kubectl delete vpcattachment server-01-web-servers # Attachment references deleted VPC
Consequences:
- VPCAttachments reference non-existent VPC
- Events show reconciliation errors: "VPC web-app-prod not found"
- Switch ports may not be properly unconfigured
- Manual cleanup required
- Agent CRDs may retain partial configuration
Correct decommissioning order:
# Step 1: Delete ALL VPCAttachments first
kubectl delete vpcattachment server-01-web-servers
kubectl delete vpcattachment server-05-worker-nodes
# Step 2: Verify all attachments deleted
kubectl get vpcattachments | grep web-app-prod # Should be empty
# Step 3: Delete VPC
kubectl delete vpc web-app-prod
# Step 4: Validate cleanup complete
kubectl get vpc web-app-prod # Should return NotFound
Kubernetes safeguards:
Kubernetes has some protections:
- If you attempt to delete a VPC with active attachments, the deletion may be blocked
- Finalizers prevent premature deletion in some cases
However, don't rely on safeguards alone. Follow the correct order as a matter of operational discipline.
Cleanup Reconciliation Process
Understanding what happens during deletion helps troubleshoot cleanup issues.
VPCAttachment deletion reconciliation:
1. Kubernetes receives delete request
kubectl delete vpcattachment server-01-web-servers
Kubernetes marks the resource for deletion.
2. Fabric Controller detects deletion
The fabric controller watches for deleted VPCAttachment CRDs and picks up the deletion event.
3. Identifies affected switches
Controller determines which switches serve this connection:
- server-01 (MCLAG): leaf-01 and leaf-02
- server-05 (ESLAG): leaf-03 and leaf-04
4. Computes cleanup configuration
Controller calculates what needs to be removed:
- Remove VLAN from server-facing ports (e.g., VLAN 1010 from port E1/5)
- Remove VXLAN tunnels (if no other VPCs using the same VNI)
- Remove BGP EVPN routes for this VPC/subnet
- Remove DHCP relay configuration (if applicable)
5. Updates Agent CRDs
Controller removes configuration from affected Agent CRD specs:
# Before deletion: Agent spec contains VLAN 1010
spec:
  ports:
    E1/5:
      mode: access
      vlan: 1010
# After deletion: VLAN 1010 removed from port config
spec:
  ports:
    E1/5:
      mode: disabled  # or removed entirely
6. Switch agents apply cleanup
Each switch agent (running on the switch, or on its behalf):
- Watches its Agent CRD
- Detects spec change (VLAN removed)
- Applies configuration to physical switch via gNMI
- Unconfigures port, removes VLAN, updates routing
7. VPCAttachment deleted
Once reconciliation completes, the VPCAttachment is fully removed from Kubernetes.
Timeline:
- VPCAttachment deletion request: < 1 second
- Reconciliation: 10-30 seconds (depends on fabric size)
- Switch configuration cleanup: 10-20 seconds
- Full cleanup: 30-60 seconds
VPC deletion reconciliation:
1. Kubernetes receives delete request
kubectl delete vpc web-app-prod
2. Fabric Controller verifies no attachments
Controller checks if any VPCAttachments reference this VPC. If attachments exist, deletion may be blocked or delayed.
3. Removes VPC configuration
- VXLAN VNI released: VNI returned to namespace pool for reuse
- VLAN namespace entries removed: VLANs 1010 and 1020 freed
- IP namespace entries removed: Subnet CIDRs freed
4. VPC deleted
VPC removed from Kubernetes etcd.
Timeline:
- VPC deletion: < 5 seconds (namespace cleanup is fast)
When to Keep vs Delete Resources
Decommissioning is not always the right choice. Understanding when to keep versus delete resources is critical for production operations.
Keep resources when:
1. Application temporarily offline
- Planned maintenance windows
- Software updates or patches
- Database migrations
- Temporary scaling down
Why: Re-creating VPCs and attachments later is more work than keeping them.
2. Troubleshooting connectivity issues
- Debugging network problems
- Investigating performance issues
- Testing configuration changes
Why: Deleting resources during troubleshooting eliminates evidence and makes root cause analysis harder.
3. Resource reserved for future use
- Pre-provisioned for upcoming deployment
- Capacity planning (staging environment ready)
- Reserved for specific team or project
Why: Reprovisioning later may result in different auto-assigned VLANs or other configuration drift.
4. Testing in progress
- Development environments with active work
- Integration tests running
- Performance benchmarks in flight
Why: Deleting mid-test invalidates results and wastes effort.
Delete resources when:
1. Application permanently decommissioned
- Service retired, no longer needed
- Business unit shut down
- Product end-of-life
Why: Keeping unused resources wastes fabric capacity and creates confusion.
2. Migration to new VPC complete
- Traffic cut over to new infrastructure
- Old VPC verified empty
- Rollback window passed
Why: No reason to keep old infrastructure after successful migration.
3. Testing finished
- Development testing complete
- Staging environment no longer needed
- Temporary test infrastructure
Why: Test environments should be ephemeral to free resources for other tests.
4. Resource no longer needed
- Over-provisioned capacity being scaled down
- Duplicate or redundant resources
- Misconfigured resources being replaced
Why: Clean up reduces operational complexity.
5. Cleaning up failed deployments
- VPC provisioned incorrectly (wrong subnets, VLANs)
- Attachments created in error
- Testing mistakes
Why: Start fresh rather than trying to fix broken configurations.
Production decommissioning checklist:
Before deleting resources in production, verify:
- Application team confirms decommission approved
- No active traffic to servers in VPC
- Maintenance window scheduled (deletion causes immediate connectivity loss)
- Backup of YAML manifests saved (for recovery if needed)
- Dependencies checked (no other resources depend on this VPC)
- Decommission documented (who, what, when, why)
- Post-deletion validation plan ready
Deletion Impact and Recovery
VPCAttachment deletion impact:
Immediate effects:
- Server connectivity: Server immediately loses VPC connectivity
- Active connections: All active TCP/UDP connections dropped
- Switch ports: VLANs removed from server-facing ports within 10-30 seconds
- No warning: Deletion is immediate, no graceful shutdown
What survives:
- Server OS configuration: Server network config (static IPs, routes) unchanged
- Server itself: Server CRD and Connection CRD remain
- VPC: VPC still exists and can be attached to other servers
Recovery: Re-create the VPCAttachment from YAML manifest:
kubectl apply -f server-01-attachment.yaml
Connectivity restores within 30-60 seconds after reconciliation.
VPC deletion impact:
Immediate effects:
- All subnets deleted: Every subnet in the VPC removed
- All VLANs released: VLANs returned to namespace pool for reuse
- All routing removed: VPC routing tables deleted from fabric
- Cannot have attachments: Deletion blocked if attachments exist
What survives:
- Servers: Server CRDs and Connection CRDs remain (can attach to other VPCs)
- Switches: Switch hardware unaffected, Agent CRDs updated
Recovery: Re-create the VPC from YAML manifest:
kubectl apply -f web-app-prod-vpc.yaml
Important recovery notes:
- If VLANs were auto-assigned, new VLANs may differ (namespace reuses freed VLANs)
- All VPCAttachments must be re-created after VPC restored
- Full recovery time: 1-2 minutes (VPC creation + attachment reconciliation)
Cannot delete VPC if attachments exist:
kubectl delete vpc web-app-prod
# Error: VPC has active attachments
Error message (example):
Error: cannot delete VPC "web-app-prod": active VPCAttachments exist
Solution:
- List all attachments: kubectl get vpcattachments | grep web-app-prod
- Delete each attachment: kubectl delete vpcattachment <name>
- Retry VPC deletion: kubectl delete vpc web-app-prod
Orphaned Resources and Cleanup
What are orphaned resources?
Orphaned resources are Kubernetes objects that no longer serve a purpose but still exist, consuming resources and causing operational confusion.
Common causes:
Deleting VPC before VPCAttachments (most common)
- VPCAttachments reference non-existent VPC
- Reconciliation errors in events
- Switch ports may retain partial configuration
Manual switch configuration without CRD cleanup
- Direct switch CLI changes bypassing Hedgehog
- Agent CRDs out of sync with switch reality
Failed reconciliation leaving partial config
- Controller or agent pod crashed mid-reconciliation
- Network issues during cleanup
- Agent CRD spec updated but status not reflecting completion
How to identify orphaned VPCAttachments:
# List all VPCAttachments
kubectl get vpcattachments
# Check each attachment's VPC reference
kubectl get vpcattachment <name> -o yaml | grep "subnet:"
# Example orphaned attachment:
# subnet: web-app-prod/web-servers # VPC doesn't exist!
Verify the VPC exists:
kubectl get vpc web-app-prod
# Error from server (NotFound): vpcs.vpc.githedgehog.com "web-app-prod" not found
If VPC doesn't exist but VPCAttachment does, it's orphaned.
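You can script this cross-check for every attachment at once. A minimal sketch, assuming each attachment stores its reference in spec.subnet as vpc-name/subnet-name (the format shown earlier):
# Flag any VPCAttachment whose referenced VPC no longer exists
for att in $(kubectl get vpcattachments -o name); do
  vpc=$(kubectl get "$att" -o jsonpath='{.spec.subnet}' | cut -d/ -f1)
  kubectl get vpc "$vpc" >/dev/null 2>&1 || echo "ORPHANED: $att references missing VPC $vpc"
done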
How to clean up orphaned VPCAttachments:
# Delete the orphaned VPCAttachment
kubectl delete vpcattachment server-01-web-servers
# Verify deletion
kubectl get vpcattachment server-01-web-servers
# Error from server (NotFound) - expected
Check Agent CRDs to ensure switch cleanup occurred:
kubectl get agent leaf-01 -n fab -o yaml | grep -A 5 "1010"
# Should not show VLAN 1010 configuration
Prevention strategies:
- Always follow correct deletion order (attachments → VPC)
- Validate cleanup after each deletion
- Use GitOps (Git is source of truth, prevents manual errors)
- Monitor events for reconciliation errors
- Avoid manual switch configuration (use Hedgehog CRDs only)
Troubleshooting
Issue: Cannot delete VPC - "VPC has active attachments"
Symptom: kubectl delete vpc web-app-prod fails with an error message
Error message:
Error: cannot delete VPC "web-app-prod": active VPCAttachments exist
Cause: One or more VPCAttachments still reference the VPC
Fix:
# Step 1: List all VPCAttachments
kubectl get vpcattachments
# Step 2: Identify attachments referencing this VPC
kubectl get vpcattachments -o yaml | grep "web-app-prod"
# Example output:
# subnet: web-app-prod/web-servers
# subnet: web-app-prod/worker-nodes
# Step 3: Delete each attachment
kubectl delete vpcattachment server-01-web-servers
kubectl delete vpcattachment server-05-worker-nodes
# Step 4: Verify all attachments deleted
kubectl get vpcattachments | grep web-app-prod
# Should return empty
# Step 5: Retry VPC deletion
kubectl delete vpc web-app-prod
# Should succeed now
Issue: VPCAttachment deleted but switch ports not cleaned up
Symptom: Agent CRD still shows VLAN configuration after VPCAttachment deleted
Example:
kubectl get agent leaf-01 -n fab -o yaml | grep "1010"
# Still shows VLAN 1010 configuration even though attachment deleted
Cause: Reconciliation not complete, or agent pod issue
Fix:
# Step 1: Wait for reconciliation to complete (30-60 seconds)
sleep 60
# Step 2: Check again
kubectl get agent leaf-01 -n fab -o yaml | grep "1010"
# Step 3: If still present, check agent pod status
kubectl get pods -n fab | grep agent
# Step 4: Check agent logs for errors
kubectl logs <agent-pod-name> -n fab | tail -50
# Step 5: Check events for reconciliation progress
kubectl get events -n fab --sort-by='.lastTimestamp' | tail -20
# Step 6: If agent pod crashed, restart it
kubectl delete pod <agent-pod-name> -n fab
# Pod will restart and reapply configuration
# Step 7: Verify cleanup after restart
kubectl get agent leaf-01 -n fab -o yaml | grep "1010"
Issue: Orphaned VPCAttachment after VPC accidentally deleted
Symptom: VPCAttachment exists but references non-existent VPC
Example:
kubectl get vpcattachment server-01-web-servers
# NAME AGE
# server-01-web-servers 3h
kubectl describe vpcattachment server-01-web-servers
# Shows error: VPC "web-app-prod" not found
Cause: VPC deleted before attachments (wrong order)
Fix:
# Step 1: Confirm VPC is gone
kubectl get vpc web-app-prod
# Error from server (NotFound) - confirms VPC deleted
# Step 2: Delete orphaned VPCAttachment
kubectl delete vpcattachment server-01-web-servers
# Step 3: Verify deletion
kubectl get vpcattachment server-01-web-servers
# Error from server (NotFound) - expected
# Step 4: Check for other orphaned attachments
kubectl get vpcattachments -o yaml | grep "web-app-prod"
# Should return nothing
# Step 5: Verify switch cleanup
kubectl get agent leaf-01 -n fab -o yaml | grep "1010"
# VLAN 1010 should be removed
Prevention: Always delete VPCAttachments BEFORE VPC.
Issue: VPC deletion stuck in "Terminating" state
Symptom: kubectl get vpc shows the VPC stuck in a Terminating state for an extended time
Example:
kubectl get vpcs
# NAME STATUS AGE
# web-app-prod Terminating 5m
Cause: Finalizers preventing deletion, or controller issue
Fix:
# Step 1: Check VPC for finalizers
kubectl get vpc web-app-prod -o yaml | grep finalizers -A 5
# Step 2: Check events for errors
kubectl get events --field-selector involvedObject.name=web-app-prod --sort-by='.lastTimestamp'
# Step 3: Verify no attachments exist
kubectl get vpcattachments | grep web-app-prod
# Should be empty - if not, delete attachments
# Step 4: Check fabric controller pod
kubectl get pods -n fab | grep controller
kubectl logs <controller-pod-name> -n fab | tail -50
# Step 5: If finalizers blocking, remove them (advanced - use caution)
kubectl patch vpc web-app-prod -p '{"metadata":{"finalizers":[]}}' --type=merge
# Step 6: Verify deletion completes
kubectl get vpc web-app-prod
# Should return NotFound
Note: Removing finalizers manually should be a last resort. Usually waiting or restarting the controller resolves the issue.
Issue: Accidentally deleted VPC - can I recover it?
Symptom: VPC deleted by mistake, need to restore
Cause: Human error, wrong VPC name, accidental command
Fix:
# Step 1: Check if you have a backup YAML manifest
ls -la web-app-prod-vpc.yaml
# If you saved the YAML earlier:
kubectl apply -f web-app-prod-vpc.yaml
# If you don't have the YAML, you'll need to recreate from scratch
# Step 2: Recreate VPC manually
cat > web-app-prod-vpc.yaml <<'EOF'
apiVersion: vpc.githedgehog.com/v1beta1
kind: VPC
metadata:
  name: web-app-prod
  namespace: default
spec:
  ipv4Namespace: default
  vlanNamespace: default
  subnets:
    web-servers:
      subnet: 10.10.10.0/24
      gateway: 10.10.10.1
      vlan: 1010
    worker-nodes:
      subnet: 10.10.20.0/24
      gateway: 10.10.20.1
      vlan: 1020
      dhcp:
        enable: true
        range:
          start: 10.10.20.10
          end: 10.10.20.250
EOF
kubectl apply -f web-app-prod-vpc.yaml
# Step 3: Verify VPC created
kubectl get vpc web-app-prod
# Step 4: Recreate VPCAttachments
kubectl apply -f server-01-attachment.yaml
kubectl apply -f server-05-attachment.yaml
# Step 5: Validate connectivity (Module 2.3)
kubectl describe vpc web-app-prod
kubectl describe vpcattachment server-01-web-servers
Important notes:
- If VLANs were auto-assigned originally, the new assignments may differ
- All VPCAttachments must be recreated after VPC restored
- Recovery time: 1-2 minutes for full reconciliation
- Prevention: Always save YAML manifests before deletion (or use GitOps)
Issue: Server still has connectivity after VPCAttachment deleted
Symptom: Server can still reach network after VPCAttachment deleted
Example:
# VPCAttachment deleted
kubectl get vpcattachment server-01-web-servers
# Error from server (NotFound)
# But server still has network connectivity
# SSH to server: ping 10.10.10.1 works
Cause: Server OS network configuration not changed (static IP remains)
Explanation:
VPCAttachment deletion removes fabric-side configuration (VLANs on switch ports, VXLAN tunnels, routing). It does NOT change server-side configuration (IP addresses, routes in server OS).
The server still has the IP address and routes configured in its operating system, but:
- The fabric is no longer forwarding traffic for that VLAN
- Server cannot reach other servers in the VPC
- Server cannot reach resources outside the VPC
This is expected behavior.
Fix (if you want to remove server network config):
# SSH to server
ssh server-01
# Remove IP address
sudo ip addr del 10.10.10.10/24 dev eth0
# Remove default route
sudo ip route del default via 10.10.10.1
# Verify connectivity gone
ping 10.10.10.1
# Should fail: Network is unreachable
In production: Server OS configuration is typically managed separately (Ansible, configuration management tools). VPCAttachment manages fabric config only.
Resources
Hedgehog CRDs
VPC - Virtual Private Cloud definition
- View all: kubectl get vpcs
- Delete: kubectl delete vpc <name>
- View YAML: kubectl get vpc <name> -o yaml
VPCAttachment - Binds server connection to VPC subnet
- View all: kubectl get vpcattachments
- Delete: kubectl delete vpcattachment <name>
- View YAML: kubectl get vpcattachment <name> -o yaml
Agent - Per-switch operational state (for cleanup validation)
- View all: kubectl get agents -n fab
- View specific: kubectl get agent <switch-name> -n fab -o yaml
- Validate cleanup: Check for removed VLANs after deletion
Connection - Server-to-switch wiring (not deleted in this module)
- View all: kubectl get connections -n fab
- Note: Connections persist after VPCAttachment deletion
kubectl Commands Reference
Decommissioning workflow:
# Step 1: Pre-decommission review
kubectl get vpcs
kubectl get vpcattachments
kubectl get vpc <name> -o yaml > backup.yaml # Save backup
# Step 2: Delete VPCAttachments FIRST
kubectl delete vpcattachment <name>
# Step 3: Verify attachments deleted
kubectl get vpcattachments | grep <vpc-name> # Should be empty
# Step 4: Delete VPC
kubectl delete vpc <name>
# Step 5: Verify VPC deleted
kubectl get vpc <name> # Should return NotFound
# Step 6: Validate cleanup
kubectl get events --sort-by='.lastTimestamp' | tail -20
kubectl get agent <switch-name> -n fab -o yaml | grep <vlan>
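For repeated decommissions, the same workflow can be wrapped in a small script. A minimal sketch, parameterized by VPC name; it assumes attachments reference the VPC via spec.subnet as in this course, so adapt and test it before relying on it:
#!/usr/bin/env bash
# Ordered teardown sketch: back up, delete attachments, verify, then delete the VPC
set -eu
VPC="${1:?usage: teardown.sh <vpc-name>}"
# Back up the VPC manifest before deleting anything
kubectl get vpc "$VPC" -o yaml > "${VPC}-backup.yaml"
# Delete every VPCAttachment whose spec.subnet references this VPC
for att in $(kubectl get vpcattachments -o name); do
  subnet=$(kubectl get "$att" -o jsonpath='{.spec.subnet}')
  case "$subnet" in
    "${VPC}"/*) kubectl delete "$att" ;;
  esac
done
# Refuse to delete the VPC while any attachment still references it
if kubectl get vpcattachments -o yaml | grep -q "${VPC}/"; then
  echo "Attachments still reference ${VPC} - aborting" >&2
  exit 1
fi
kubectl delete vpc "$VPC"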
Event monitoring during cleanup:
# View recent events
kubectl get events --sort-by='.lastTimestamp' | tail -20
# Watch events in real-time
kubectl get events --watch
# View events for specific resource
kubectl get events --field-selector involvedObject.name=<resource-name>
# View events in fab namespace (agents)
kubectl get events -n fab --sort-by='.lastTimestamp'
Validation commands:
# Verify resource deleted (should return NotFound)
kubectl get vpc <name>
kubectl get vpcattachment <name>
# List all resources to confirm deletion
kubectl get vpcs
kubectl get vpcattachments
# Check Agent CRD for cleanup
kubectl get agent <switch-name> -n fab -o yaml | grep -A 5 "<vlan-id>"
Recovery commands:
# Restore from backup YAML
kubectl apply -f backup.yaml
# Recreate VPCAttachments
kubectl apply -f attachment.yaml
Related Modules
- Previous: Module 2.3: Connectivity Validation
- Module 2.1: VPC Provisioning Essentials
- Module 2.2: VPC Attachments
- Course 2 Complete! Preview Course 3: Observability & Fabric Health
External Documentation
- Hedgehog VPC Documentation
- Hedgehog Lifecycle Management
- Kubernetes Resource Deletion
- kubectl delete Command Reference
Module Complete! You've successfully learned safe decommissioning workflows and completed Course 2: Provisioning & Day 1 Operations. You now understand the full lifecycle: Provision → Attach → Validate → Cleanup. Ready to move on to Course 3: Observability & Fabric Health!
Course 2 Achievement Unlocked: Day 1 Operations Master