A reference architecture for operating globally distributed infrastructure across Azure, AWS, and GCP — with compliance built into the platform layer, not bolted on after the fact.
Motivation
Organizations at global scale face compounding challenges: regulatory fragmentation across jurisdictions, provider concentration risk, M&A integration pressure, and best-of-breed requirements that no single provider satisfies. The cost of not standardizing is inconsistent security posture, audit fatigue, and ungoverned shadow IT.
This RFC proposes a unified governance model built on seven principles:
| # | Principle | Rationale |
|---|---|---|
| P1 | Primary + secondary model | One cloud is the default; others justified by workload |
| P2 | Policy as code | Governance rules are version-controlled and auto-enforced |
| P3 | Identity is the perimeter | Zero-trust, identity-centric security across all providers |
| P4 | Data sovereignty by design | Residency constraints encoded in the platform |
| P5 | Automate everything | No manual provisioning, no ClickOps |
| P6 | Least privilege, just-in-time | No standing privileged access |
| P7 | Centralize observability, decentralize execution | Central visibility; federated workload ownership |
Organizational Structure
Each cloud has its own hierarchy model, but the concept is identical: platform resources separated from landing zones, with sandbox and quarantine boundaries. The separation matters because platform teams and workload teams operate at different cadences — platform changes are slow, deliberate, and high-blast-radius. Landing zone changes are fast and scoped.
Azure — Management Group Hierarchy
- Platform — Identity (Entra ID), Management (Sentinel, Log Analytics), Connectivity (Hub vNets, ExpressRoute)
- Landing Zones — Corp, Online, Regulated (PCI/HIPAA/FedRAMP), Confidential (sovereign)
- Sandbox — Experimentation, no prod connectivity
- Quarantine — Non-compliant subscriptions auto-moved here
AWS — Organization OU Structure
- Security OU — Log Archive, Security Tooling (GuardDuty, Security Hub), Audit
- Infrastructure OU — Network Hub (Transit Gateway), Shared Services
- Workloads OU — Corp / Online / Regulated with Prod, Staging, Dev per OU
- Sandbox / Quarantine — Deny-all SCP
GCP — Resource Hierarchy
- Platform — Networking (Shared VPC), Logging (centralized sink), Security (SCC, KMS)
- Landing Zones — Corp, Analytics (BigQuery, Vertex AI), Regulated (Assured Workloads)
- Sandbox / Quarantine — Deny-all org policy
Cross-Cloud Mapping
Despite the naming differences, every concept maps 1:1 across providers. This is what makes a unified governance model possible.
| Concept | Azure | AWS | GCP |
|---|---|---|---|
| Top-level container | Management Group | Organizational Unit | Folder |
| Billing boundary | Subscription | Account | Project |
| Policy engine | Azure Policy | SCPs | Org Policies |
| Identity | Entra ID | IAM Identity Center | Cloud Identity |
| Network hub | Hub vNet | Transit Gateway | Shared VPC |
Use subscriptions/accounts/projects as the unit of scale. One per workload per environment.
Identity & Access
Identity is the most critical layer. Get it wrong and every other control is compromised. The approach here is a single Identity Provider (Entra ID) federated into all three clouds — authentication is centralized, authorization is per-cloud. This avoids the alternative of managing three separate identity systems with inevitable configuration drift.
| Layer | Mechanism |
|---|---|
| Authentication | Entra ID with phishing-resistant MFA (FIDO2 / passkeys) |
| Authorization | RBAC via group-to-role mappings, per cloud, per scope |
| Privileged access | PIM (Azure) / temporary elevated access (AWS, GCP) |
| Service-to-service | Workload Identity Federation — no long-lived keys |
| Break-glass | Sealed emergency accounts, hardware tokens in safe |
Long-lived service account keys are prohibited. Workload Identity Federation for all service-to-service and CI/CD authentication eliminates the largest class of credential exposure.
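As a concrete example of the pattern, the sketch below (Terraform, assuming GitHub Actions as the CI system, with a hypothetical repository and role name) lets a pipeline assume an AWS role through OIDC federation rather than a stored access key:

```hcl
# Sketch: CI/CD federation into AWS via OIDC instead of a stored access key.
# GitHub Actions, the repository name, and the role name are illustrative.

data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "ci_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/payment-service:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "ci_deploy" {
  name                 = "ci-deploy-payment-service"
  assume_role_policy   = data.aws_iam_policy_document.ci_trust.json
  max_session_duration = 3600 # one hour; the token expires, nothing to rotate
}
```

The pipeline receives a short-lived token scoped to one repository and branch, so there is no stored credential to rotate or leak. The same pattern applies with Entra ID workload identity federation and GCP workload identity pools.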
Networking
VPN overlays between clouds are fragile, bandwidth-limited, and hard to troubleshoot at scale. Cross-cloud connectivity instead runs through a colocation fabric (Megaport or Equinix) — dedicated, high-throughput, and provider-neutral. Each cloud has a hub network that peers to landing zone spokes.
IP space is pre-allocated across all four domains to prevent overlap, which is the single most painful networking problem to fix retroactively:
| Provider | CIDR | Covers |
|---|---|---|
| Azure | 10.0.0.0/10 | 10.0 – 10.63 |
| AWS | 10.64.0.0/10 | 10.64 – 10.127 |
| GCP | 10.128.0.0/10 | 10.128 – 10.191 |
| On-prem | 10.192.0.0/10 | 10.192 – 10.255 |
Within each /10: /14 per region, /16 per environment, /20 per landing zone.
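The carve-up maps directly onto Terraform's `cidrsubnet()` function; the sketch below walks one AWS region down to a single landing-zone /20, with the index values chosen purely for illustration:

```hcl
# Hierarchical IP allocation: /10 per provider -> /14 per region
# -> /16 per environment -> /20 per landing zone.
locals {
  aws_supernet = "10.64.0.0/10"                        # AWS block from the table above
  region       = cidrsubnet(local.aws_supernet, 4, 1)  # /14: 10.68.0.0/14  (region index 1)
  environment  = cidrsubnet(local.region, 2, 0)        # /16: 10.68.0.0/16  (environment index 0)
  landing_zone = cidrsubnet(local.environment, 4, 3)   # /20: 10.68.48.0/20 (landing zone index 3)
}

output "landing_zone_cidr" {
  value = local.landing_zone
}
```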
Sovereign regions (China, GovCloud) are isolated by design — no cross-cloud mesh. Fully separate identity and networking.
Compliance
Running workloads across three clouds means satisfying overlapping regulatory frameworks simultaneously. The key insight is that compliance at this scale cannot rely on manual review — it has to be enforced through preventive controls (deny non-compliant actions before they happen) and detective controls (alert on drift after the fact).
| Framework | Scope |
|---|---|
| SOC 2 Type II | All production workloads |
| ISO 27001 | Entire organization |
| PCI DSS v4.0 | Payment processing (regulated zones) |
| HIPAA | Healthcare data (regulated zones) |
| FedRAMP High | US government (gov regions) |
| GDPR | EU personal data (residency enforced) |
Preventive Controls
Each cloud has its own policy engine, but the rules are equivalent. The same intent — "deny unapproved regions" — is expressed differently in each provider:
| Control | Azure | AWS | GCP |
|---|---|---|---|
| Deny unapproved regions | allowedLocations | aws:RequestedRegion | gcp.resourceLocations |
| Require encryption at rest | Policy deny | Config Rule | Org policy |
| Deny public storage | Deny public blob access | S3 Block Public Access (account-level) | storage.publicAccessPrevention |
| Enforce TLS 1.2+ | MinimumTlsVersion | Config Rule | Org policy |
| Deny long-lived keys | Deny keys > 90d | Deny iam:CreateAccessKey | disableServiceAccountKeyCreation |
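As an illustration, the first row (the region deny) might be expressed as an AWS SCP managed through Terraform; the approved-region list and target OU variable are placeholders:

```hcl
# Sketch: "deny unapproved regions" as an AWS Service Control Policy.
resource "aws_organizations_policy" "deny_unapproved_regions" {
  name = "deny-unapproved-regions"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyOutsideApprovedRegions"
      Effect = "Deny"
      # Global services (IAM, Organizations, Route 53, ...) would normally be
      # exempted with NotAction; omitted here to keep the sketch short.
      Action   = "*"
      Resource = "*"
      Condition = {
        StringNotEquals = {
          "aws:RequestedRegion" = ["us-east-1", "eu-west-1"] # placeholder approved regions
        }
      }
    }]
  })
}

resource "aws_organizations_policy_attachment" "workloads" {
  policy_id = aws_organizations_policy.deny_unapproved_regions.id
  target_id = var.workloads_ou_id # the Workloads OU; placeholder variable
}
```

The same intent becomes an `allowedLocations` assignment in Azure Policy and a `gcp.resourceLocations` constraint in a GCP organization policy; only the syntax changes.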
Data Classification
Every resource gets a classification tag at creation. This isn't optional — untagged resources are denied by policy. The classification determines where data can live, how it's encrypted, and how long it's retained.
| Level | Residency | Encryption | Retention |
|---|---|---|---|
| Public | None | In-transit | Per policy |
| Internal | Preferred region | At-rest + in-transit | 3 years |
| Confidential | Country-level | CMK + in-transit | 7 years |
| Restricted | Specific region | HSM-backed CMK | Per regulation |
Resources tagged "Restricted" can only be created in approved regions — preventive policy, not documentation.
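A minimal sketch of the tag gate, shown here for Azure; the tag name, definition name, and management-group variable are illustrative assumptions:

```hcl
# Sketch: deny creation of resources that lack a data classification tag.
resource "azurerm_policy_definition" "require_classification_tag" {
  name                = "require-data-classification-tag"
  display_name        = "Deny resources without a dataClassification tag"
  policy_type         = "Custom"
  mode                = "Indexed" # only resource types that support tags
  management_group_id = var.landing_zones_mg_id

  policy_rule = jsonencode({
    if = {
      field  = "tags['dataClassification']"
      exists = "false"
    }
    then = { effect = "deny" }
  })
}

resource "azurerm_management_group_policy_assignment" "landing_zones" {
  name                 = "require-classification"
  policy_definition_id = azurerm_policy_definition.require_classification_tag.id
  management_group_id  = var.landing_zones_mg_id
}
```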
Security Operations
All three clouds feed logs into a central SIEM for cross-cloud correlation. Without this, a compromise that spans two providers looks like two unrelated events. Automated response handles containment for known patterns — analysts focus on novel threats.
| Severity | Scope | Response SLA |
|---|---|---|
| SEV-1 | Data breach, active intrusion | 15 min engage, 1 hr contain |
| SEV-2 | Policy violation with data exposure risk | 1 hr engage, 4 hr contain |
| SEV-3 | Policy drift, non-critical misconfiguration | 24 hr |
| SEV-4 | Informational | Next business day |
Secrets Management
Secrets are centralized in HashiCorp Vault rather than spread across three native secret stores. The deciding factor is rotation: Vault can issue ephemeral database credentials and short-lived CI/CD tokens in a way the native stores cannot match. Every secret has a maximum lifetime.
| Secret Type | Store | Rotation |
|---|---|---|
| Application secrets | HashiCorp Vault | 30 days |
| Database credentials | Vault dynamic secrets | Per-session (ephemeral) |
| Encryption keys | Azure Key Vault / AWS KMS / GCP Cloud KMS | Annual |
| Service account keys | Prohibited | Workload identity |
| CI/CD tokens | Vault + OIDC federation | Per-pipeline-run |
Database credentials are ephemeral. CI/CD tokens live for minutes. Long-lived credentials are the single largest attack surface — eliminate them entirely.
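A sketch of how the ephemeral database credentials could be wired up with Vault's database secrets engine via the Terraform Vault provider; the connection URL, role name, and TTLs are illustrative:

```hcl
# Sketch: ephemeral database credentials from Vault's database secrets engine.
resource "vault_mount" "db" {
  path = "database"
  type = "database"
}

resource "vault_database_secret_backend_connection" "payments" {
  backend       = vault_mount.db.path
  name          = "payments-postgres"
  allowed_roles = ["payments-app"]

  postgresql {
    connection_url = "postgresql://{{username}}:{{password}}@db.internal:5432/payments"
  }
}

resource "vault_database_secret_backend_role" "payments_app" {
  backend = vault_mount.db.path
  name    = "payments-app"
  db_name = vault_database_secret_backend_connection.payments.name

  creation_statements = [
    "CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';"
  ]

  default_ttl = 900  # credentials live 15 minutes
  max_ttl     = 3600 # hard cap of one hour
}
```

A workload or pipeline reads `database/creds/payments-app` and receives a username and password that Vault creates on demand and revokes when the TTL expires.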
Vending & Cost
Provisioning a new workload environment should take minutes, not weeks. A vending machine automates the full lifecycle: request, approval, provisioning, baseline policies, RBAC, budget alerts, and network peering — all via IaC pipelines. A developer submits a structured request and the platform handles everything else:
```yaml
request:
  workload_name: "payment-service"
  business_unit: "fintech"
  environment: "prod"
  cloud_provider: "aws"
  landing_zone: "regulated"
  compliance_scope: ["pci-dss", "sox"]
  data_classification: "restricted"
  regions:
    primary: "us-east-1"
    dr: "us-west-2"
  budget_monthly_usd: 15000
```
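Behind the scenes, the pipeline might translate that request into a call to an internal Terraform module along these lines; the module source and input names are hypothetical:

```hcl
# Hypothetical translation of the request above into an internal vending module.
module "payment_service_prod" {
  source = "./modules/aws-landing-zone-vending" # placeholder internal module

  workload_name       = "payment-service"
  business_unit       = "fintech"
  environment         = "prod"
  landing_zone        = "regulated"
  compliance_scope    = ["pci-dss", "sox"]
  data_classification = "restricted"
  primary_region      = "us-east-1"
  dr_region           = "us-west-2"
  budget_monthly_usd  = 15000
}
```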
FinOps
Cost visibility is only useful if it's actionable. Every resource is tagged with cost-center, owner, and environment — enforced by policy, not convention. Chargeback happens automatically.
| Pillar | Implementation |
|---|---|
| Visibility | All resources tagged with cost-center, owner, environment (policy-enforced) |
| Allocation | Chargeback to BU based on actual usage |
| Optimization | Reserved Instances, Savings Plans, and Committed Use Discounts reviewed quarterly — 70% coverage target |
| Governance | Budget alerts at 50/75/90/100%. Auto-notify owner + manager |
Cost anomaly detection runs on all three clouds. A >20% day-over-day increase triggers automatic investigation.
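The 50/75/90/100% alert thresholds could be implemented natively with AWS Budgets for an AWS workload; the sketch below reuses the vended budget from the example above, with placeholder recipients:

```hcl
# Sketch: 50/75/90/100% budget alerts for the vended workload above.
resource "aws_budgets_budget" "payment_service_prod" {
  name         = "payment-service-prod"
  budget_type  = "COST"
  limit_amount = "15000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  dynamic "notification" {
    for_each = [50, 75, 90, 100]
    content {
      comparison_operator        = "GREATER_THAN"
      threshold                  = notification.value
      threshold_type             = "PERCENTAGE"
      notification_type          = "ACTUAL"
      subscriber_email_addresses = ["owner@example.com", "manager@example.com"] # placeholders
    }
  }
}
```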
Disaster Recovery
DR tiers are assigned per workload based on business impact, not technical preference. Tier 0 is reserved for systems where even minutes of downtime have material consequences.
| Tier | RPO | RTO | Strategy |
|---|---|---|---|
| Tier 0 | 0 | < 15 min | Active-active, multi-region |
| Tier 1 | < 1 hr | < 1 hr | Warm standby |
| Tier 2 | < 24 hr | < 4 hr | Pilot light |
| Tier 3 | < 72 hr | < 24 hr | Backup & restore |
Implementation
This architecture is delivered in phases over 12 months. The sequence matters — identity and networking must be in place before landing zones make sense, and compliance tooling is validated against real workloads during migration.
| Phase | Duration | Scope |
|---|---|---|
| Foundation | Months 1–3 | Identity federation, networking hubs, central logging, baseline policies |
| Landing Zones | Months 3–6 | Vending automation, RBAC, CI/CD, first migrations |
| Compliance | Months 6–9 | CSPM, GRC integration, audit preparation, regulated zones |
| Optimization | Months 9–12 | FinOps maturity, commitment purchases, DR testing |
| Continuous | Ongoing | Policy iteration, new frameworks, sovereign regions, M&A |
Team
Building this requires 5–7 engineers during the 12-month build phase, scaling down to 3–4 for steady-state operations. These aren't generalists — each role maps to a specific domain of the architecture.
| Role | Count | Scope |
|---|---|---|
| Platform Architect | 1 | Overall design, cross-cloud standards, stakeholder alignment |
| Cloud Engineers | 2–3 | IaC modules, vending pipelines, landing zone provisioning (ideally one per primary cloud) |
| Identity & Security Engineer | 1 | Entra ID federation, RBAC design, PIM/JIT, Vault integration |
| Network Engineer | 1 | Hub-spoke topology, colocation fabric, IP allocation, firewall rules |
| Compliance / GRC Lead | 1 | Policy-as-code authoring, audit prep, framework mapping |
| FinOps Analyst | 0–1 | Tagging enforcement, chargeback, commitment optimization (can be part-time or shared) |
After the build phase, the Platform Architect role transitions to part-time oversight. Cloud Engineers rotate into an on-call model. The steady-state team of 3–4 handles policy updates, new landing zone requests, incident response, and FinOps reviews.
The most common mistake is understaffing identity and networking. These two domains block everything else — if they slip, the entire timeline shifts.
Tooling
| Function | Tool |
|---|---|
| IaC | Terraform (OpenTofu) |
| Secrets | HashiCorp Vault |
| CSPM | Wiz / Prisma Cloud |
| SIEM | Sentinel / Splunk |
| IdP | Entra ID |
| Observability | Datadog / Grafana Cloud |
| GRC | Drata / Vanta |
| FinOps | CloudHealth / Apptio |
| Developer Portal | Backstage |
This architecture maps directly to Microsoft's Cloud Adoption Framework, AWS Control Tower, and Google Cloud Architecture Framework. The value is in the cross-cloud unification — same principles, same controls, one operating model.