1. Introduction
1.1 Background
1.1.1 The Identity Challenge
Modern cloud-native infrastructure runs thousands of non-human principals: containers, services, pipelines, operators, scheduled jobs, and increasingly, AI agents. Each of these workloads needs to:
- Authenticate: Prove its identity to other systems
- Authorize: Access only permitted resources
- Communicate securely: Encrypt traffic with verified peers
- Audit: Leave traceable evidence of its actions
Unlike human users, who can authenticate interactively with passwords and MFA, workloads must authenticate programmatically, often at high frequency and without human intervention.
1.1.2 The Evolution of Workload Authentication
The industry has progressed through several generations of workload authentication:
| Generation | Method | Problems |
|---|---|---|
| Gen 1 | Static credentials in code | Credentials leaked in repos, never rotated |
| Gen 2 | Credentials in environment | Visible in process lists, config dumps |
| Gen 3 | Credentials in secrets | Still static, manual rotation burden |
| Gen 4 | Dynamic credentials | Short-lived, but an initial bootstrap secret ("secret zero") is still required |
| Gen 5 | Attestation-based identity | No pre-shared secrets, workload proves itself |
This RFC establishes a Gen 5 workload identity architecture based on attestation rather than pre-shared secrets.
1.1.3 Relationship to Human Identity
RFC-IAM-0001 establishes the human identity architecture. This RFC establishes a parallel workload identity architecture.
Both architectures share:
- Azure AD as authorization ceiling: Ultimate source of truth for permissions
- Vault as credential authority: Issues dynamic credentials
- ESO for secret distribution: Kubernetes-native secret delivery
- Audit requirements: Traceability across all identity actions
But implement different patterns:
- Human: Interactive authentication, session-based access
- Workload: Programmatic authentication, certificate-based identity
1.2 Current State Analysis
1.2.1 Existing Workload Identity Patterns
Current workload authentication is fragmented across multiple approaches:
| Workload Type | Current Method | Issues |
|---|---|---|
| Kubernetes pods | ServiceAccount tokens | Legacy tokens are long-lived; newer tokens are time-limited but not attested |
| CI/CD pipelines | Static secrets in pipeline config | Secrets visible to pipeline operators, rarely rotated |
| GitOps operators | Static tokens in Kubernetes secrets | Token compromise affects all deployed resources |
| External services | API keys, service accounts | Manual rotation, broad permissions |
| VMs/machines | SSH keys, static credentials | Key sprawl, no central management |
| AI agents | (not yet implemented) | No established pattern |
1.2.2 Identified Gaps
| Gap | Risk | Impact |
|---|---|---|
| No unified identity framework | Inconsistent security posture | Hard to audit, hard to secure |
| Long-lived credentials | Credential theft window | Compromised credentials usable indefinitely |
| No attestation | Impersonation possible | Stolen token works from anywhere |
| No delegation tracking | Lost accountability | Can't trace what acted on whose behalf |
| Siloed identity systems | Operational burden | Different patterns per system |
1.2.3 Compliance Requirements
Various compliance frameworks require workload identity controls:
| Framework | Requirement | This RFC Addresses |
|---|---|---|
| SOC 2 | Service account management | INV-2 (no long-lived credentials) |
| ISO 27001 | Access control for systems | INV-7 (namespace-scoped permissions) |
| PCI DSS 4.0 | Unique IDs for non-consumer users | INV-1 (cryptographic identity) |
| NIST SP 800-207 | Zero Trust for workloads | INV-6 (mTLS for service communication) |
1.3 Problem Statement
1.3.1 Core Problems
This RFC addresses five core problems:
Problem 1: Credential Management Complexity. Organizations struggle to manage credentials for hundreds or thousands of workloads. Static credentials require manual rotation, lead to secret sprawl, and create operational burden.
Problem 2: Lack of Workload Attestation. Traditional authentication relies on "something you know" (a secret). If that secret is stolen, the thief can authenticate from anywhere. Workload attestation instead proves "what you are" based on verifiable workload properties.
Problem 3: No Unified Identity Model. Different workload types use different identity mechanisms: Kubernetes ServiceAccounts, cloud IAM, SSH keys, API tokens. This fragmentation makes it difficult to implement consistent security policies.
Problem 4: AI Agent Identity Gap. Emerging AI agents (LLMs performing automated tasks) need to act on behalf of humans or other systems, but traditional identity models don't support delegation chains with proper accountability.
Problem 5: Cross-Boundary Identity. Multi-cluster, multi-cloud environments need federated identity that works across trust boundaries while maintaining security guarantees.
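To make the contrast in Problem 2 concrete, the following is a minimal sketch of attestation-based matching. It is not SPIRE's implementation. It assumes a hypothetical registry in which each entry maps a set of required selectors (loosely modeled on Kubernetes namespace and ServiceAccount selectors) to a SPIFFE ID. The names `REGISTRATIONS` and `attest`, the trust domain `example.org`, and all selector values are illustrative assumptions.

```python
from typing import Optional

# Hypothetical registration entries: a workload is issued the SPIFFE ID
# only if every required selector is among its observed properties.
REGISTRATIONS = [
    {
        "spiffe_id": "spiffe://example.org/ns/payments/sa/api",
        "selectors": frozenset({"k8s:ns:payments", "k8s:sa:api"}),
    },
    {
        "spiffe_id": "spiffe://example.org/ns/batch/sa/reporter",
        "selectors": frozenset({"k8s:ns:batch", "k8s:sa:reporter"}),
    },
]


def attest(observed: set[str]) -> Optional[str]:
    """Return the SPIFFE ID whose selectors are all present in `observed`.

    `observed` stands in for properties gathered by a trusted agent,
    not values claimed by the workload itself. Returns None on no match.
    """
    for entry in REGISTRATIONS:
        if entry["selectors"] <= observed:  # subset test
            return entry["spiffe_id"]
    return None


if __name__ == "__main__":
    # Extra observed properties do not prevent a match.
    print(attest({"k8s:ns:payments", "k8s:sa:api", "k8s:pod-label:app:api"}))
    # A different ServiceAccount in the same namespace gets no identity.
    print(attest({"k8s:ns:payments", "k8s:sa:other"}))
```

The point of the sketch is that no secret appears anywhere: identity follows from what the platform observes about the workload, so there is no token to steal and replay from another location.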
1.3.2 Solution Approach
This RFC addresses these problems through:
| Problem | Solution Component |
|---|---|
| Credential complexity | Dynamic, short-lived credentials from Vault |
| Attestation gap | SPIFFE/SPIRE for attestation-based identity |
| Fragmented identity | Unified identity framework across workload types |
| AI agent gap | OAuth 2.0 Token Exchange for delegation |
| Cross-boundary | SPIFFE federation for trust domain bridging |
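The delegation row can be illustrated with the `act` (actor) claim that OAuth 2.0 Token Exchange (RFC 8693) defines for recording who is acting on whose behalf. The sketch below only builds and walks plain claim dictionaries; it does not sign, exchange, or validate tokens. The subject, the SPIFFE IDs, and the helper names `delegate` and `delegation_chain` are hypothetical and chosen for illustration only.

```python
from typing import Any


def delegate(claims: dict[str, Any], actor: str) -> dict[str, Any]:
    """Return new claims in which `actor` acts for the original subject.

    The subject (`sub`) is unchanged. The new actor becomes the outermost
    `act` claim, and any earlier actor is nested beneath it.
    """
    new_act: dict[str, Any] = {"sub": actor}
    if "act" in claims:
        new_act["act"] = claims["act"]
    return {**claims, "act": new_act}


def delegation_chain(claims: dict[str, Any]) -> list[str]:
    """List actors from the current actor back to the earliest one."""
    chain: list[str] = []
    act = claims.get("act")
    while act is not None:
        chain.append(act["sub"])
        act = act.get("act")
    return chain


if __name__ == "__main__":
    user = {"sub": "alice@example.org"}
    via_agent = delegate(user, "spiffe://example.org/agent/assistant")
    via_tool = delegate(via_agent, "spiffe://example.org/ns/tools/sa/runner")
    print(via_tool["sub"])             # the principal being acted for
    print(delegation_chain(via_tool))  # current actor first
```

Keeping the original subject fixed while nesting actors is what preserves accountability: an audit record can show both the principal on whose behalf an action was taken and every intermediary that handled the request.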
1.4 Document Structure
1.4.1 Section Overview
This RFC is organized into functional sections:
| Section | Purpose |
|---|---|
| §2 Requirements | What the architecture MUST achieve |
| §3 Architecture | How components fit together |
| §4 Components | What each component does |
| §5-10 Workload Categories | Patterns for each workload type |
| §11 Service Mesh | Network-layer identity |
| §12 Federation | Cross-boundary identity |
| §13 Rationale | Why we made these choices |
| §14 Evolution | Future considerations |
1.4.2 How to Use This Document
For understanding: Read sections 1-3 to grasp the overall architecture.
For implementation: Read section 4 for components, then the relevant workload category section (5-10) for your use case.
For security review: Focus on section 2 (invariants) and section 13 (rationale).
For operations: Sections 5-10 provide operational patterns for each workload type.
1.5 Terminology
Key terms used throughout this RFC:
| Term | Definition |
|---|---|
| Workload | Any non-human principal: container, service, pipeline, operator, agent |
| SPIFFE ID | A URI identifying a workload: spiffe://trust-domain/path |
| SVID | SPIFFE Verifiable Identity Document: a signed document (an X.509 certificate or JWT) that proves a workload's identity |
| Attestation | Proof of workload properties (Kubernetes metadata, cloud instance, etc.) |
| Trust Domain | A SPIFFE administrative boundary |
| Delegation | A workload acting on behalf of another principal |
See Appendix A for complete glossary.
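As an illustration of the SPIFFE ID format defined above (`spiffe://trust-domain/path`), here is a minimal sketch that splits an ID into its trust domain and path using Python's standard library. It performs basic structural checks only and is not a full implementation of the SPIFFE ID specification; the function name `parse_spiffe_id` and the example IDs are assumptions made for this sketch.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path).

    Raises ValueError if the string is not structurally a SPIFFE ID.
    """
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID (scheme {parts.scheme!r})")
    if not parts.netloc:
        raise ValueError("missing trust domain")
    if "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError("user info and port are not allowed")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment are not allowed")
    return parts.netloc, parts.path


if __name__ == "__main__":
    domain, path = parse_spiffe_id("spiffe://example.org/ns/payments/sa/api")
    print(domain)
    print(path)
```

Separating the trust domain from the path matters for policy decisions: the trust domain identifies which administrative boundary issued the identity, while the path identifies the workload within it.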