Cloud DevOps • 8 min read • 5/25/2026

Why Your GitOps Workflow Breaks Down When Your Infra Is Spread Across Three Clouds

IDACORE

IDACORE Team


You started with one cloud provider because it was fast and familiar. Then a compliance requirement pushed a workload to a second one. Then a vendor relationship or a cost experiment landed something on a third. Now you have infrastructure spread across AWS, Azure, and maybe GCP — and your GitOps workflow, which looked elegant six months ago, has quietly become a liability.

This isn't a hypothetical. It's the most common architecture I see when companies reach out after their deployment pipelines start causing more problems than they solve. The repos are there. The manifests are there. The automation is there. But something about the whole thing feels fragile, and it usually is.

Here's what's actually happening — and what you can do about it.

The Abstraction Layer Is Lying to You

GitOps works beautifully when your infrastructure is homogeneous. One cluster, one cloud, one provider API surface. Your Git repo is the source of truth, your operator reconciles state, and the feedback loop is tight. That's the happy path.

Add a second cloud and you immediately have a problem: the abstraction isn't real. Terraform modules that look identical for AWS and Azure are not actually doing the same thing. IAM in AWS and Entra ID in Azure have fundamentally different permission models. A terraform apply that succeeds in both places doesn't mean you've achieved parity. It means you've hidden the differences behind a thin layer of HCL that will bite you when something goes wrong at 2 a.m.

The same is true for Kubernetes. Yes, EKS, AKS, and GKE all run Kubernetes. But node provisioning, CNI behavior, storage class defaults, and load balancer integration are all provider-specific. Your Helm charts or Kustomize overlays have environment-specific patches, and those patches multiply with every cloud you add. What started as a clean, declarative workflow becomes a maze of conditional logic and cloud-specific branches.

The abstraction layer isn't lying maliciously — it's just incomplete. And in a GitOps context, incomplete abstractions mean your repo stops being the actual source of truth. It becomes the intended source of truth, which is a very different thing.

State Drift Compounds Across Providers

In a single-cloud GitOps setup, drift is manageable. You run terraform plan or let Flux or Argo CD reconcile, and you can see the delta between desired and actual state. The feedback loop is contained.

Across three clouds, drift compounds. Each provider has its own state backend, its own API rate limits, its own eventual consistency behavior. A change that reconciles cleanly in AWS might sit in a pending state in Azure for reasons that have nothing to do with your config. Your pipeline sees a failure, retries, and now you have a partial application of state that your Git history doesn't reflect.
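One way to keep that delta visible across every workspace at once is to lean on the exit code of terraform plan -detailed-exitcode, which returns 0 when there is nothing to change, 1 on error, and 2 when changes are pending. The sketch below is a minimal illustration, not a finished tool: the directory layout and the function names drift_report and needs_attention are assumptions made up for this example, and it expects each directory to be initialized with credentials already in place.

```python
import subprocess
from typing import Callable, Dict, List

# Exit codes documented for `terraform plan -detailed-exitcode`.
EXIT_MEANING = {0: "in_sync", 1: "error", 2: "changes_pending"}


def run_plan(workdir: str) -> int:
    """Run a non-interactive plan in `workdir` and return its exit code."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
    )
    return result.returncode


def drift_report(
    workdirs: List[str], plan: Callable[[str], int] = run_plan
) -> Dict[str, str]:
    """Map each workspace directory to in_sync, changes_pending, or error."""
    # Any unexpected exit code (for example, a killed process) counts as an error.
    return {w: EXIT_MEANING.get(plan(w), "error") for w in workdirs}


def needs_attention(report: Dict[str, str]) -> List[str]:
    """Return the workspaces that are not confirmed in sync, sorted by name."""
    return sorted(w for w, status in report.items() if status != "in_sync")
```

Calling drift_report on a list such as envs/prod/aws, envs/prod/azure, and envs/prod/gcp (hypothetical paths) gives one table for all three providers instead of three separate pipeline logs. The plan argument is injectable so the logic can be tested without touching a real cloud account.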

Here's a concrete example of what this looks like in practice. A dev team has a Terraform workspace per environment per cloud — nine workspaces for three environments across three providers. They're using remote state in S3, Azure Blob, and GCS respectively. A refactor of their VPC/VNet addressing scheme requires coordinated changes across all three. They write the PR, it looks clean, CI passes. But the apply order matters because of cross-cloud peering dependencies, and nothing in their GitOps tooling enforces apply order across state backends. The first two providers apply successfully. The third fails on a dependency that doesn't exist yet because the peering on provider two hasn't propagated.

Now their state is split. The Git history says one thing. Two of three providers reflect the new state. One doesn't. Reconciling this manually takes hours and requires someone who understands all three provider APIs well enough to know what "actually happened" versus what the state files claim.
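The missing piece in that story is an explicit apply order. As a rough sketch of what enforcing one could look like, the snippet below uses Python's standard-library graphlib to turn a declared dependency map into batches that are safe to apply in sequence. The workspace names and the apply_batches helper are hypothetical, chosen only to mirror the peering example above.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def apply_batches(depends_on: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group workspaces into batches; each batch depends only on earlier ones.

    `depends_on` maps a workspace to the workspaces that must be applied first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable, reviewable order
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    # Hypothetical workspaces: the third network needs the peering to exist first.
    deps = {
        "aws-network": [],
        "azure-network": [],
        "aws-azure-peering": ["aws-network", "azure-network"],
        "gcp-network": ["aws-azure-peering"],
    }
    for number, batch in enumerate(apply_batches(deps), start=1):
        print(number, batch)
```

Ordering alone does not solve propagation delay. A pipeline built on this would still need a verification step between batches that confirms the peering is actually usable before the next apply starts.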

This is not an edge case. This is Tuesday.

Your Secret Management Surface Area Is Bigger Than You Think

Every cloud you add is another secrets backend, another rotation policy, another set of service account credentials, another place where a leaked key can cause damage. AWS Secrets Manager, Azure Key Vault, and GCP Secret Manager all work differently. They have different access patterns, different audit log formats, and different integration points with your workload identity systems.

A mature GitOps workflow uses something like External Secrets Operator or Vault to abstract this — and that works, up to a point. But now you've added another layer of infrastructure (Vault clusters, or ESO controllers per cluster) that itself needs to be managed, secured, and reconciled. The operational overhead isn't zero.

More importantly: your audit trail fragments. If you're in a regulated industry — healthcare, finance, anything touching HIPAA or SOC 2 — you need to be able to answer "who accessed what secret, when, from where" with a single coherent answer. Across three clouds with three different audit log formats going to three different SIEM integrations, that answer requires correlation work that is genuinely hard to do correctly. I've seen compliance audits get expensive fast because the engineering team couldn't produce a clean access log — not because they were doing anything wrong, but because the data was spread across systems that didn't talk to each other.
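The correlation work usually starts with normalizing every provider's records into one shape. Here is a minimal sketch of that step. The input field names in FIELD_MAP are simplified placeholders invented for this example, not the real export schemas of CloudTrail, Key Vault diagnostics, or Cloud Audit Logs, which are nested and need their own parsing. SecretAccess, normalize, and timeline are likewise illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SecretAccess:
    """One normalized answer to: who accessed what secret, when, from where."""
    when: datetime
    who: str
    secret: str
    source_ip: str
    cloud: str


# Hypothetical, flattened field names per cloud (assumptions, not real schemas).
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "aws": {"when": "eventTime", "who": "principal",
            "secret": "secretId", "source_ip": "sourceIp"},
    "azure": {"when": "time", "who": "caller",
              "secret": "resourceId", "source_ip": "callerIp"},
    "gcp": {"when": "timestamp", "who": "principalEmail",
            "secret": "resourceName", "source_ip": "callerIp"},
}


def _parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp (with Z or an offset) into UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def normalize(cloud: str, raw: Dict[str, str]) -> SecretAccess:
    """Convert one provider-specific record into the shared shape."""
    fields = FIELD_MAP[cloud]
    return SecretAccess(
        when=_parse_utc(raw[fields["when"]]),
        who=raw[fields["who"]],
        secret=raw[fields["secret"]],
        source_ip=raw[fields["source_ip"]],
        cloud=cloud,
    )


def timeline(events: Iterable[Tuple[str, Dict[str, str]]]) -> List[SecretAccess]:
    """Merge records from all clouds into a single time-ordered access log."""
    return sorted((normalize(c, r) for c, r in events), key=lambda e: e.when)
```

Converting every timestamp to UTC before sorting is the design choice that matters here, since mixed offsets are a common reason merged logs come out in the wrong order.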

The Pipeline Complexity Isn't Free

When your infrastructure is in one place, your CI/CD pipeline is relatively straightforward. Lint, plan, apply, verify. Maybe some policy checks with OPA or Checkov in there. The pipeline is fast enough that developers actually run it, and short enough that failures are easy to diagnose.

Multi-cloud pipelines get long. Fast. You're running provider-specific auth steps, provider-specific plan stages, provider-specific apply stages — and if you want any kind of intelligent dependency ordering, you're writing custom orchestration logic. GitHub Actions or GitLab CI can handle this, but you end up with workflow files that are hundreds of lines long and require deep context to modify safely.

The real cost isn't the pipeline runtime. It's the cognitive overhead. Every engineer who touches that pipeline needs to understand three provider APIs, three auth models, three sets of provider-specific gotchas. That's not knowledge that distributes evenly across a team. It concentrates in one or two people, and when those people are unavailable, the pipeline becomes a black box.

A useful diagnostic: how long does it take a new engineer on your team to make a confident infrastructure change end-to-end? If the answer is "months," your pipeline complexity is carrying more risk than you're accounting for.

What Actually Helps

I'm not going to tell you to collapse everything to one cloud — that's not always possible, and sometimes the multi-cloud distribution is genuinely justified. But there are a few things that meaningfully reduce the friction.

Consolidate your state management first. If you're not using a unified state backend with a consistent locking mechanism, start there. Terraform Cloud (now HCP Terraform), or Atlantis paired with a single backend configuration across providers, gives you at least a consistent interface for state operations, even if the underlying providers are still heterogeneous.

Be honest about what actually needs to be multi-cloud. In most architectures I've looked at, 60-70% of the workload could run on a single provider without any real downside. The multi-cloud distribution happened incrementally, not by design. Audit your workloads and ask which ones are actually provider-dependent versus which ones just ended up there.

Treat cross-cloud dependencies as first-class infrastructure. If you have workloads that talk across clouds, then the peering, the latency, and the failure modes all need to be in your IaC and tested explicitly. Don't let any of it be implicit. The cross-cloud boundary is where the interesting failures happen.
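A cheap way to make those dependencies explicit is a CI check that fails when a workload calls across a cloud boundary without a matching declared link. The sketch below assumes you can produce two inputs, a map of workload to cloud and a map of workload to the workloads it calls. Both inputs, the workload names, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def cross_cloud_edges(
    cloud_of: Dict[str, str], calls: Dict[str, List[str]]
) -> Set[Edge]:
    """Return every caller/callee pair whose two ends live in different clouds."""
    return {
        (src, dst)
        for src, targets in calls.items()
        for dst in targets
        if cloud_of[src] != cloud_of[dst]
    }


def undeclared_links(
    cloud_of: Dict[str, str],
    calls: Dict[str, List[str]],
    declared: Set[Edge],
) -> List[Edge]:
    """Cross-cloud calls with no explicitly declared (and tested) link."""
    return sorted(cross_cloud_edges(cloud_of, calls) - declared)


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    cloud_of = {"api": "aws", "billing": "azure", "reports": "gcp", "cache": "aws"}
    calls = {"api": ["billing", "cache"], "reports": ["billing"]}
    declared = {("api", "billing")}
    print(undeclared_links(cloud_of, calls, declared))
```

In this example the call from reports to billing crosses from GCP to Azure with no declared link, so it is the one the check would flag.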

Standardize your observability layer before your deployment layer. You can tolerate some heterogeneity in how you deploy if you have a single, coherent view of what's running and how it's behaving. Get your metrics, logs, and traces into one place first. Everything else is easier to reason about when you can actually see what's happening.

And honestly? If you're running Idaho-based workloads — healthcare, finance, state government, anything with data residency requirements — ask yourself whether the multi-cloud complexity is solving a real problem or whether it's accumulated technical debt dressed up as architecture. A well-run single-region infrastructure with real redundancy, real support, and actual sub-5ms latency to your users is often a better answer than a three-cloud setup that requires a specialist to operate.

If you're re-evaluating your infrastructure footprint and want to talk through what consolidating some of those workloads to Idaho-based infrastructure actually looks like — including how flat, predictable pricing compares to what you're paying across three provider billing systems right now — talk to someone who's actually run this infrastructure, not a sales team reading from a feature matrix.
