Hedgehog AI Network

Can I Vibe Code My Cloud Infrastructure?

Written by Marc Austin | Feb 24, 2026 9:32:08 PM

The Margin for Error Matters — And Infrastructure Has Zero Tolerance


Ars Technica recently reported on the Kiro Incident under the headline “An AI coding bot took down Amazon Web Services.” It's a timely reminder that some vendors in our industry claim their products will automate your network operations with AI agents. HPE claims that Mist AI will “Strengthen every connection with a secure, self-driving network that continuously learns, adapts, and resolves issues in real time.” Aviz claims its Network Copilot will “execute a precise, pre-defined network operation.”

I think AIOps has its place. We get a LOT of telemetry from large-scale network infrastructure. Using ML to correlate alerts into incidents, and incidents into network performance anomalies, makes a lot of sense. In my last gig at Cisco, I led a partnership with Vitria to add AIOps to our observability solutions. Customers like Charter Communications got great results from improving their observability. In the case of the AWS incidents, AIOps for observability will speed up root-cause analysis and mean time to resolution.

On the flip side, using AIOps for configuration is likely to create incidents. In application software development, “vibe coding” with AI (sketching out an idea in natural language and iteratively refining the generated code, often inside an IDE) has a relatively high tolerance for error and is easy to debug, so developers can iterate quickly. If a UI component is slightly off or something breaks in a beta feature, the blast radius is limited to your app's users and is often discovered early in testing.

Cloud infrastructure is different. A tiny misconfiguration (a missing IAM policy, an overly broad role, incorrect permissions, a malformed network rule, an unresolved dependency) can lead to data exposure, security breaches, or wide-scale outages. The Verge also reported on the Kiro incident, underscoring this reality: an AI coding bot caused at least one significant AWS outage after it deleted and recreated an environment without adequate oversight. AWS termed it “user error,” but the chain of events makes the risk clear: giving autonomous tools the ability to modify infrastructure without guardrails can quickly turn from convenience to catastrophe. The same risk applies to Google Cloud and Azure.

Even if you trust the AI model, trusting it with production infrastructure is a leap most teams aren’t equipped to make without strict workflow controls.

The Real World Isn’t a Playground

The AWS outage incidents aren't isolated. Others have documented AI-assisted tools making egregious mistakes in cloud contexts: TechRepublic has reported on several incidents under the headline “AI-Generated Code is Causing Outages and Security Issues in Businesses.” In some cases, AI assistants issued destructive actions against real databases and environments because they misinterpreted commands or lacked full context.

These stories aren’t meant to shame AI; they’re meant to remind us that tools amplify intent — both good and bad. When the subject is infrastructure that underpins critical business operations, we need controls, observability, and traceability — features that IaC + GitOps deliver out of the box.


Why IaC, GitOps, and Better Abstractions are the Way

At Hedgehog, we believe the future of safe, scalable cloud infrastructure isn’t about handing over your stack to a black-box AI agent. It’s about automating safely, reliably, and transparently:

📌 1. Infrastructure as Code (IaC)

IaC — whether through Terraform, Pulumi, or CloudFormation — puts your entire infrastructure definition in version control. You get:

  • Reviewable configurations

  • Audit trails

  • Repeatable and consistent deployments

  • Rollback support if something goes wrong

This is fundamentally how modern platforms ensure infrastructure changes are predictable rather than accidental or ad hoc.
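To make the “reviewable, repeatable” point concrete, here is a toy Python sketch of the plan-before-apply idea at the heart of IaC tools. The resource names and state format are invented for illustration; no real Terraform, Pulumi, or CloudFormation API is used.

```python
# Toy illustration of IaC "plan before apply" (not a real tool).
# Desired state lives in version control; actual state is what's running.
desired = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "subnet-a": {"cidr": "10.0.1.0/24", "zone": "us-east-1a"},
}
actual = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "subnet-a": {"cidr": "10.0.2.0/24", "zone": "us-east-1a"},  # drifted
}

def plan(desired, actual):
    """Compute create/update/delete actions from declared vs. live state."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions

print(plan(desired, actual))  # -> [('update', 'subnet-a')]
```

Because the plan is computed before anything is applied, a reviewer sees exactly which resources would change, and rollback is just reverting the commit and re-planning.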

🔁 2. GitOps Workflow Discipline

GitOps takes IaC a step further: you define desired state in Git (e.g., hosted on GitHub), and an operator continuously reconciles your cloud environment to match it. This provides:

  • Declarative state that matches reality

  • Automated drift detection

  • Safe, auditable promotions from staging to production

  • Guardrails against unintended change

With GitOps, changes are human-reviewed, automated through pipelines, and safe by design — you know when and why something changed.
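The reconciliation loop behind that workflow can be sketched in a few lines of Python. This is a toy illustration, not Argo CD or Flux: every function and variable name here is a hypothetical stand-in.

```python
# Minimal sketch of a GitOps-style reconciliation loop. All names are
# hypothetical stand-ins for what a real operator (Argo CD, Flux) does.

def reconcile(read_desired, read_live, apply_change, log):
    """One pass: converge live state toward the state declared in Git."""
    desired = read_desired()      # the Git repo is the source of truth
    live = read_live()
    if live != desired:
        log("drift detected; converging to declared state")
        apply_change(desired)     # drift is corrected, and logged, automatically
        return "converged"
    return "in-sync"

# Fake environment standing in for a real cluster and repo.
live_state = {"acl": ["allow 10.0.0.0/8"]}
git_state = {"acl": ["allow 10.0.0.0/8", "deny 0.0.0.0/0"]}
events = []

result = reconcile(
    read_desired=lambda: git_state,
    read_live=lambda: dict(live_state),
    apply_change=lambda d: live_state.update(d),
    log=events.append,
)
print(result, live_state)  # live_state now matches git_state
```

Note the asymmetry: out-of-band edits to the live environment get overwritten on the next pass, which is exactly the guardrail against unintended change.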

AI tools, from hosted LLMs like OpenAI's ChatGPT and Google's Gemini to open-source models, can accelerate drafting configs and suggesting improvements, but they shouldn't be the authoritative source of truth for production state. Letting an AI-powered agent modify infrastructure without a GitOps pipeline is like letting someone rewrite your firewall rules with no review and no logging.

🧠 3. Better Abstraction

Even with GitOps, if an AI suggests hundreds of lines of detailed configuration changes, no one can realistically verify that its work is correct. Instead, you want the AI to suggest bite-sized changes that make intent clear, in a form where a human operator can confirm that the change makes sense and performs its intended function.

From that point forward, a thoroughly validated, proven, and deterministic solution can translate that intent into network configuration, and apply the network configuration in a reliable manner. Hedgehog is that solution.
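The shape of that translation step can be sketched as follows. The intent schema and the per-switch config format below are invented for illustration; they are not Hedgehog's actual data model.

```python
# Hypothetical sketch: a small, human-reviewable intent record is expanded
# deterministically into detailed per-switch configuration. The schema and
# config format are invented for illustration.

def expand_intent(intent):
    """Translate one reviewable intent into concrete port configs, deterministically."""
    assert intent["action"] == "connect", "this sketch handles one intent kind"
    return [
        {"switch": sw, "port": port, "vlan": intent["vlan"], "mode": "access"}
        for sw, port in intent["endpoints"]
    ]

# The operator reviews these few short lines of intent, not the expanded config.
intent = {
    "action": "connect",
    "vlan": 42,
    "endpoints": [("leaf-1", "Eth1/1"), ("leaf-2", "Eth1/3")],
}
configs = expand_intent(intent)
```

The key property is determinism: the same intent always expands to the same configuration, so correctness is validated once in the translator, not re-reviewed on every change.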


The Hedgehog API automates “AI networking” the safe way

When people say “AI will configure your network,” what they really want is this: declare intent, perhaps through natural language, let automation do the boring parts, and keep guardrails tight. Hedgehog is built for exactly that model, without turning production networking into a black box.

Hedgehog exposes its user-facing API as Kubernetes CRDs, so your “network intent” lives as declarative objects you can manage with standard tooling (kubectl, kustomize/helm, Argo CD/Flux). The controllers continuously reconcile that desired state into actual switch and gateway backend configuration. (docs.hedgehog.cloud)
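As a concrete (but invented) illustration, network intent expressed as a Kubernetes-style custom resource might look like the following. The apiVersion, kind, and fields are made up for this sketch and are not Hedgehog's actual CRD schema; see docs.hedgehog.cloud for the real API.

```python
# Illustrative Kubernetes-style custom resource expressing network intent.
# The apiVersion, kind, and spec fields are invented for this sketch and
# are NOT Hedgehog's actual CRD schema.
import json

manifest = {
    "apiVersion": "example.hedgehog.cloud/v1",
    "kind": "NetworkIntent",
    "metadata": {"name": "web-to-db"},
    "spec": {"from": "web-tier", "to": "db-tier", "ports": [5432]},
}

# Committed to Git and applied with standard tooling (kubectl apply,
# Argo CD, Flux), an object like this is what controllers would
# reconcile into actual switch and gateway configuration.
print(json.dumps(manifest, indent=2))
```

Because the intent is an ordinary Kubernetes object, it inherits the whole CRD toolchain for free: schema validation on admission, RBAC on who can change it, and an audit trail in both Git and the API server.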

Where “AI” fits best:

  • AI drafts: generate manifests, propose changes, suggest policy diffs, or create templates.
  • IaC defines: CRDs in Git are the source of truth.
  • GitOps enforces: PR review, policy checks, and an automated reconciler apply changes predictably.

That separation is the key. You get ideas from AI, but safety comes from validated, proven, and deterministic automated reconciliation of a description that is high-level enough for the operator to understand the intent and double-check the AI's work, not thousands of lines of opaque, unverifiable suggested changes that users are encouraged to rubber-stamp.


Moving Forward: Use AI With IaC, Not Instead Of It

AI has a role to play — for example:

  • Generating starter Terraform modules
  • Suggesting optimizations
  • Helping write reusable policies and guards
  • Assisting in documentation

But the authoritative state should always live in version control with human review and automated guardrails. Reconciliation of detailed configuration should always start from a specification a human can understand and properly review. Massive AI change sets are human-reviewable only in theory; in practice, coders often just click “OK,” much like those click-wrap license agreements.

At Hedgehog, our infrastructure philosophy centers around principles you can trust:

  • Declarative definitions in code that abstractly declare intent
  • GitOps pipelines for safe rollout and reliable reconciliation
  • Reproducibility and traceability
  • Collision-resistant change management

Because your infrastructure is too important for opaque automation without governance.

Learn more about Hedgehog by visiting our Learning Center at hedgehog.cloud/learn.