Course 3: Observability & Fabric Health

4 modules, 60 minutes total

Learn to monitor fabric health using industry-standard observability tools! This course teaches you to interpret Grafana dashboards, track Kubernetes events, correlate metrics with configuration state, and collect comprehensive diagnostics before escalating to support.

You'll gain confidence in identifying normal vs. abnormal fabric behavior, using pre-built dashboards to answer operational questions, and developing systematic troubleshooting habits that accelerate problem resolution.

Welcome to Observability & Fabric Health

This course transforms you from a fabric provisioner into a confident operator who can monitor, diagnose, and maintain fabric health using Prometheus, Grafana, and kubectl-native diagnostics.

Part 1: Understanding Fabric Telemetry

Discover how Hedgehog collects and exposes metrics from switches, controllers, and fabric resources. Learn the architecture of the telemetry stack and what questions each data source can answer.

Part 2: Interpreting Grafana Dashboards

Master the six core Hedgehog dashboards: Fabric Overview, Switch Details, BGP Status, VLAN Health, Connection Status, and Agent Health. Learn to read metrics and identify problems at a glance.

Part 3: Events & Status Monitoring

Correlate Kubernetes events with Grafana metrics for complete troubleshooting visibility. Learn to read Agent CRD status fields and track resource reconciliation state.
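As a rough illustration of that correlation step, the sketch below filters Kubernetes events down to those that occurred within a few minutes of an anomaly spotted on a dashboard. It assumes the events were exported with `kubectl get events -A -o json` and loaded with `json.load`. The sample data, the `ReconcileFailed` reason, the switch names, and the helper names (`events_near`, `event_time`) are hypothetical and are not taken from Hedgehog itself.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    """Parse an RFC 3339 timestamp such as '2024-05-01T12:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def event_time(event):
    """Return the best available timestamp for an Event, or None."""
    raw = (
        event.get("lastTimestamp")
        or event.get("eventTime")
        or event.get("metadata", {}).get("creationTimestamp")
    )
    return parse_k8s_time(raw) if raw else None


def events_near(event_list, anomaly_time, window_minutes=5):
    """Return events within +/- window_minutes of anomaly_time, oldest first."""
    window = timedelta(minutes=window_minutes)
    matches = []
    for event in event_list.get("items", []):
        when = event_time(event)
        if when is not None and abs(when - anomaly_time) <= window:
            matches.append((when, event))
    matches.sort(key=lambda pair: pair[0])
    return [event for _, event in matches]


# Hypothetical sample shaped like `kubectl get events -o json` output.
SAMPLE = {
    "items": [
        {
            "lastTimestamp": "2024-05-01T09:00:00Z",
            "type": "Normal",
            "reason": "Created",
            "involvedObject": {"kind": "Agent", "name": "leaf-01"},
        },
        {
            "lastTimestamp": None,
            "eventTime": "2024-05-01T12:03:00.000000Z",
            "type": "Normal",
            "reason": "Updated",
            "involvedObject": {"kind": "Agent", "name": "leaf-02"},
        },
        {
            "lastTimestamp": "2024-05-01T11:58:30Z",
            "type": "Warning",
            "reason": "ReconcileFailed",  # invented reason, for illustration
            "involvedObject": {"kind": "Agent", "name": "leaf-01"},
        },
    ]
}

# Time at which a metric looked abnormal on a dashboard (hypothetical).
ANOMALY = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)

for ev in events_near(SAMPLE, ANOMALY):
    obj = ev["involvedObject"]
    print(event_time(ev).isoformat(), ev["type"], ev["reason"], obj["kind"], obj["name"])
```

The five-minute window is an arbitrary starting point; widening it trades precision for a better chance of catching slow-moving causes.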

Part 4: Pre-Support Diagnostic Collection

Master the systematic diagnostic checklist used before escalating to support. Learn to collect comprehensive evidence that accelerates support case resolution.
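One way to make such a checklist repeatable is to script it. The sketch below runs a fixed set of commands and saves each output to its own file so the results can be attached to a support case. The command list is an assumption, not the official checklist taught in the module; in particular, `kubectl get agents` assumes the Agent CRD is exposed under the plural name `agents`. The function and file names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed commands for illustration only; substitute the real checklist.
DEFAULT_COMMANDS = {
    "events.txt": ["kubectl", "get", "events", "-A", "--sort-by=.lastTimestamp"],
    "agents.yaml": ["kubectl", "get", "agents", "-o", "yaml"],
}


def run_command(argv):
    """Run argv and return stdout plus stderr as text, without raising."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"command failed to run: {exc}\n"
    return result.stdout + result.stderr


def collect_diagnostics(out_dir, commands=DEFAULT_COMMANDS, runner=run_command):
    """Write the output of each command to out_dir and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, argv in commands.items():
        path = out_dir / filename
        # Record the command itself so the evidence is self-describing.
        path.write_text("$ " + " ".join(argv) + "\n" + runner(argv))
        written.append(path)
    return written


# Example call (not executed here):
# collect_diagnostics("diagnostics-2024-05-01")
```

Recording the exact command at the top of each file lets whoever reads the bundle see how every piece of evidence was produced.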

Next Steps

Congratulations on completing Observability & Fabric Health! You're now ready to tackle advanced troubleshooting scenarios in Course 4.