RFC-002

Global Multi-Cloud Governance

A Reference Architecture for Azure, AWS, and GCP

A reference architecture for operating globally distributed infrastructure across Azure, AWS, and GCP — with compliance built into the platform layer, not bolted on after the fact.


Motivation

Organizations at global scale face compounding challenges: regulatory fragmentation across jurisdictions, provider concentration risk, M&A integration pressure, and best-of-breed requirements that no single provider satisfies. The cost of not standardizing is inconsistent security posture, audit fatigue, and ungoverned shadow IT.

This RFC proposes a unified governance model built on seven principles:

#  | Principle                                | Rationale
P1 | Primary + secondary model                | One cloud is the default; others justified by workload
P2 | Policy as code                           | Governance rules are version-controlled and auto-enforced
P3 | Identity is the perimeter                | Zero-trust, identity-centric security across all providers
P4 | Data sovereignty by design               | Residency constraints encoded in the platform
P5 | Automate everything                      | No manual provisioning, no ClickOps
P6 | Least privilege, just-in-time            | No standing privileged access
P7 | Centralize observe, decentralize execute | Central visibility; federated workload ownership

Organizational Structure

Each cloud has its own hierarchy model, but the concept is identical: platform resources separated from landing zones, with sandbox and quarantine boundaries. The separation matters because platform teams and workload teams operate at different cadences — platform changes are slow, deliberate, and high-blast-radius. Landing zone changes are fast and scoped.

Azure — Management Group Hierarchy

  • Platform — Identity (Entra ID), Management (Sentinel, Log Analytics), Connectivity (Hub vNets, ExpressRoute)
  • Landing Zones — Corp, Online, Regulated (PCI/HIPAA/FedRAMP), Confidential (sovereign)
  • Sandbox — Experimentation, no prod connectivity
  • Quarantine — Non-compliant subscriptions are automatically moved here

AWS — Organization OU Structure

  • Security OU — Log Archive, Security Tooling (GuardDuty, Security Hub), Audit
  • Infrastructure OU — Network Hub (Transit Gateway), Shared Services
  • Workloads OU — Corp / Online / Regulated with Prod, Staging, Dev per OU
  • Sandbox / Quarantine — Deny-all SCP

GCP — Resource Hierarchy

  • Platform — Networking (Shared VPC), Logging (centralized sink), Security (SCC, KMS)
  • Landing Zones — Corp, Analytics (BigQuery, Vertex AI), Regulated (Assured Workloads)
  • Sandbox / Quarantine — Deny-all org policy

Cross-Cloud Mapping

Despite the naming differences, every concept maps 1:1 across providers. This is what makes a unified governance model possible.

Concept             | Azure            | AWS                 | GCP
Top-level container | Management Group | Organizational Unit | Folder
Billing boundary    | Subscription     | Account             | Project
Policy engine       | Azure Policy     | SCPs                | Org Policies
Identity            | Entra ID         | IAM Identity Center | Cloud Identity
Network hub         | Hub vNet         | Transit Gateway     | Shared VPC

Use subscriptions/accounts/projects as the unit of scale. One per workload per environment.
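
As a minimal sketch of what "one per workload per environment" looks like in practice, the example below lays out hypothetical billing boundaries for a single workload; the names are illustrative, not a naming standard defined by this RFC.

# Hypothetical example: one billing boundary per workload per environment
workload: payment-service
billing_boundaries:
  prod:
    azure: sub-payment-service-prod      # Azure subscription
    aws: payment-service-prod            # AWS account
    gcp: payment-service-prod            # GCP project
  staging:
    azure: sub-payment-service-staging
    aws: payment-service-staging
    gcp: payment-service-staging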


Identity & Access

Identity is the most critical layer. Get it wrong and every other control is compromised. The approach here is a single Identity Provider (Entra ID) federated into all three clouds — authentication is centralized, authorization is per-cloud. This avoids the alternative of managing three separate identity systems with inevitable configuration drift.

Layer              | Mechanism
Authentication     | Entra ID with phishing-resistant MFA (FIDO2 / passkeys)
Authorization      | RBAC via group-to-role mappings, per cloud, per scope
Privileged access  | PIM (Azure) / temporary elevated access (AWS, GCP)
Service-to-service | Workload Identity Federation — no long-lived keys
Break-glass        | Sealed emergency accounts, hardware tokens in safe

Long-lived service account keys are prohibited. Workload Identity Federation for all service-to-service and CI/CD authentication eliminates the largest class of credential exposure.
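
As a concrete sketch of key-less CI/CD authentication, the snippet below shows a GitHub Actions job assuming an AWS role via OIDC federation. GitHub Actions, the role ARN, and the region are illustrative assumptions; the same federation pattern applies to Azure and GCP.

# Illustrative GitHub Actions job: OIDC federation into AWS, no stored access keys.
# The role ARN and region are placeholders, not values defined by this RFC.
name: deploy
on: [push]

permissions:
  id-token: write   # lets the job request a short-lived OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Authenticate to AWS via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy   # hypothetical role
          aws-region: us-east-1
      - name: Deploy
        run: terraform apply -auto-approve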


Networking

VPN overlays between clouds are fragile, bandwidth-limited, and hard to troubleshoot at scale. Cross-cloud connectivity instead runs through a colocation fabric (Megaport or Equinix) — dedicated, high-throughput, and provider-neutral. Each cloud has a hub network that peers to landing zone spokes.

IP space is pre-allocated across all four domains to prevent overlap, which is the single most painful networking problem to fix retroactively:

Provider | CIDR          | Range
Azure    | 10.0.0.0/10   | 10.0 – 10.63
AWS      | 10.64.0.0/10  | 10.64 – 10.127
GCP      | 10.128.0.0/10 | 10.128 – 10.191
On-prem  | 10.192.0.0/10 | 10.192 – 10.255

Within each /10: /14 per region, /16 per environment, /20 per landing zone.
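
To make the carving concrete, the sketch below walks the Azure /10 down to individual landing zones; the region, environment, and zone names are illustrative only.

# Illustrative carving of the Azure 10.0.0.0/10 block (names are examples)
azure:
  cidr: 10.0.0.0/10            # 10.0.0.0 – 10.63.255.255
  regions:                     # one /14 per region (16 possible)
    westeurope: 10.0.0.0/14    # 10.0.0.0 – 10.3.255.255
    eastus:     10.4.0.0/14    # 10.4.0.0 – 10.7.255.255
  environments:                # one /16 per environment, within the westeurope /14
    prod:    10.0.0.0/16
    staging: 10.1.0.0/16
    dev:     10.2.0.0/16
  landing_zones:               # one /20 per landing zone, within the prod /16
    corp:      10.0.0.0/20     # 4,096 addresses each
    online:    10.0.16.0/20
    regulated: 10.0.32.0/20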

Sovereign regions (China, GovCloud) are isolated by design — no cross-cloud mesh. Fully separate identity and networking.


Compliance

Running workloads across three clouds means satisfying overlapping regulatory frameworks simultaneously. The key insight is that compliance at this scale cannot rely on manual review — it has to be enforced through preventive controls (deny non-compliant actions before they happen) and detective controls (alert on drift after the fact).

Framework     | Scope
SOC 2 Type II | All production workloads
ISO 27001     | Entire organization
PCI DSS v4.0  | Payment processing (regulated zones)
HIPAA         | Healthcare data (regulated zones)
FedRAMP High  | US government (gov regions)
GDPR          | EU personal data (residency enforced)

Preventive Controls

Each cloud has its own policy engine, but the rules are equivalent. The same intent — "deny unapproved regions" — is expressed differently in each provider:

Control                    | Azure             | AWS                           | GCP
Deny unapproved regions    | allowedLocations  | aws:RequestedRegion           | gcp.resourceLocations
Require encryption at rest | Policy deny       | Config Rule                   | Org policy
Deny public storage        | Deny public blob  | s3:PutBucketPublicAccessBlock | uniformBucketLevelAccess
Enforce TLS 1.2+           | MinimumTlsVersion | Config Rule                   | Org policy
Deny long-lived keys       | Deny keys > 90d   | Deny iam:CreateAccessKey      | disableServiceAccountKeyCreation
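
As one example, the AWS expression of "deny unapproved regions" is a service control policy keyed on aws:RequestedRegion. A minimal sketch follows, written in YAML for readability (AWS consumes SCPs as JSON); the approved region list is an assumption.

# Minimal SCP sketch: deny actions outside approved regions (illustrative region list)
Version: "2012-10-17"
Statement:
  - Sid: DenyUnapprovedRegions
    Effect: Deny
    Action: "*"
    Resource: "*"
    Condition:
      StringNotEquals:
        aws:RequestedRegion:
          - us-east-1
          - us-west-2
          - eu-west-1

In practice the statement also needs a NotAction carve-out for global services (IAM, Route 53, CloudFront, and similar), which is omitted here for brevity.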

Data Classification

Every resource gets a classification tag at creation. This isn't optional — untagged resources are denied by policy. The classification determines where data can live, how it's encrypted, and how long it's retained.

Level        | Residency        | Encryption           | Retention
Public       | None             | In-transit           | Per policy
Internal     | Preferred region | At-rest + in-transit | 3 years
Confidential | Country-level    | CMK + in-transit     | 7 years
Restricted   | Specific region  | HSM-backed CMK       | Per regulation

Resources tagged "Restricted" can only be created in approved regions — preventive policy, not documentation.
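
A sketch of the "untagged resources are denied" rule is shown below as an Azure Policy rule that denies creation of any resource missing a classification tag. Azure Policy definitions are JSON; YAML is used here for readability, and the tag name is an assumption.

# Illustrative Azure Policy rule: deny resources missing a data-classification tag
properties:
  displayName: require-data-classification-tag
  mode: Indexed
  policyRule:
    if:
      field: "tags['data-classification']"
      exists: false
    then:
      effect: deny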


Security Operations

All three clouds feed logs into a central SIEM for cross-cloud correlation. Without this, a compromise that spans two providers looks like two unrelated events. Automated response handles containment for known patterns — analysts focus on novel threats.

Severity | Scope                                       | Response SLA
SEV-1    | Data breach, active intrusion               | 15 min engage, 1 hr contain
SEV-2    | Policy violation with data exposure risk    | 1 hr engage, 4 hr contain
SEV-3    | Policy drift, non-critical misconfiguration | 24 hr
SEV-4    | Informational                               | Next business day
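
The shape of an automated containment playbook for a known pattern might look like the sketch below. The schema is hypothetical and only illustrates the detection-to-containment mapping; it is not tied to a specific SOAR product.

# Hypothetical containment playbook (schema is illustrative)
playbook: public-storage-exposure
trigger:
  source: cspm                    # finding forwarded from the CSPM into the SIEM
  finding: storage_bucket_public
  severity: SEV-2
actions:
  - revoke_public_access          # roll back the exposure via the IaC pipeline
  - snapshot_resource_config      # preserve evidence for the investigation
  - notify: ["workload-owner", "security-oncall"]
  - open_incident:
      sla_engage: 1h
      sla_contain: 4h
escalation:
  if_not_contained_within: 4h
  escalate_to: SEV-1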

Secrets Management

Secrets are centralized in HashiCorp Vault rather than spread across three native secret stores. The reason is rotation — Vault can issue ephemeral database credentials and short-lived CI/CD tokens that native stores can't. Every secret has a maximum lifetime.

Secret Type          | Store                        | Rotation
Application secrets  | HashiCorp Vault              | 30 days
Database credentials | Vault dynamic secrets        | Per-session (ephemeral)
Encryption keys      | Azure KV / AWS KMS / GCP KMS | Annual
Service account keys | Prohibited                   | Workload identity
CI/CD tokens         | Vault + OIDC federation      | Per-pipeline-run

Database credentials are ephemeral. CI/CD tokens live for minutes. Long-lived credentials are the single largest attack surface — eliminate them entirely.


Vending & Cost

Provisioning a new workload environment should take minutes, not weeks. A vending machine automates the full lifecycle: request, approval, provisioning, baseline policies, RBAC, budget alerts, and network peering — all via IaC pipelines. A developer submits a structured request and the platform handles everything else:

request:
  workload_name: "payment-service"
  business_unit: "fintech"
  environment: "prod"
  cloud_provider: "aws"
  landing_zone: "regulated"
  compliance_scope: ["pci-dss", "sox"]
  data_classification: "restricted"
  regions:
    primary: "us-east-1"
    dr: "us-west-2"
  budget_monthly_usd: 15000

FinOps

Cost visibility is only useful if it's actionable. Every resource is tagged with cost-center, owner, and environment — enforced by policy, not convention. Chargeback happens automatically.

Pillar       | Implementation
Visibility   | All resources tagged with cost-center, owner, environment (policy-enforced)
Allocation   | Chargeback to BU based on actual usage
Optimization | RIs, Savings Plans, CUDs reviewed quarterly — 70% coverage target
Governance   | Budget alerts at 50/75/90/100%. Auto-notify owner + manager
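
One way to encode these tagging and budget rules so the vending pipeline can enforce them is sketched below; the schema and file layout are hypothetical.

# Hypothetical FinOps guardrail definition consumed by the vending pipeline
required_tags:                       # creation is denied if any of these are missing
  - cost-center
  - owner
  - environment
budget:
  alerts_percent: [50, 75, 90, 100]
  notify: ["owner", "manager"]
chargeback:
  granularity: business_unit
  basis: actual_usage
commitment_coverage_target: 0.70     # RIs / Savings Plans / CUDs, reviewed quarterly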

Disaster Recovery

DR tiers are assigned per workload based on business impact, not technical preference. Tier 0 is reserved for systems where even minutes of downtime have material consequences.

Tier   | RPO     | RTO      | Strategy
Tier 0 | 0       | < 15 min | Active-active, multi-region
Tier 1 | < 1 hr  | < 1 hr   | Warm standby
Tier 2 | < 24 hr | < 4 hr   | Pilot light
Tier 3 | < 72 hr | < 24 hr  | Backup & restore

Cost anomaly detection runs on all three clouds. A >20% day-over-day increase triggers automatic investigation.


Implementation

This architecture is delivered in phases over 12 months. The sequence matters — identity and networking must be in place before landing zones make sense, and compliance tooling is validated against real workloads during migration.

Phase         | Duration    | Scope
Foundation    | Months 1–3  | Identity federation, networking hubs, central logging, baseline policies
Landing Zones | Months 3–6  | Vending automation, RBAC, CI/CD, first migrations
Compliance    | Months 6–9  | CSPM, GRC integration, audit preparation, regulated zones
Optimization  | Months 9–12 | FinOps maturity, commitment purchases, DR testing
Continuous    | Ongoing     | Policy iteration, new frameworks, sovereign regions, M&A

Team

Building this requires 5–7 engineers during the 12-month build phase, scaling down to 3–4 for steady-state operations. These aren't generalists — each role maps to a specific domain of the architecture.

Role                         | Count | Scope
Platform Architect           | 1     | Overall design, cross-cloud standards, stakeholder alignment
Cloud Engineers              | 2–3   | IaC modules, vending pipelines, landing zone provisioning (ideally one per primary cloud)
Identity & Security Engineer | 1     | Entra ID federation, RBAC design, PIM/JIT, Vault integration
Network Engineer             | 1     | Hub-spoke topology, colocation fabric, IP allocation, firewall rules
Compliance / GRC Lead        | 1     | Policy-as-code authoring, audit prep, framework mapping
FinOps Analyst               | 0–1   | Tagging enforcement, chargeback, commitment optimization (can be part-time or shared)

After the build phase, the Platform Architect role transitions to part-time oversight. Cloud Engineers rotate into an on-call model. The steady-state team of 3–4 handles policy updates, new landing zone requests, incident response, and FinOps reviews.

The most common mistake is understaffing identity and networking. These two domains block everything else — if they slip, the entire timeline shifts.

Tooling

Function         | Tool
IaC              | Terraform (OpenTofu)
Secrets          | HashiCorp Vault
CSPM             | Wiz / Prisma Cloud
SIEM             | Sentinel / Splunk
IdP              | Entra ID
Observability    | Datadog / Grafana Cloud
GRC              | Drata / Vanta
FinOps           | CloudHealth / Apptio
Developer Portal | Backstage

This architecture maps directly to Microsoft's Cloud Adoption Framework, AWS Control Tower, and Google Cloud Architecture Framework. The value is in the cross-cloud unification — same principles, same controls, one operating model.
