1. Introduction and Motivation
1. Introduction and Motivation
1.1 Background and Context
Secret management is deceptively simple at small scale and brutally unforgiving at larger scale.
In early stages of a platform, secrets are often handled through a combination of:
- environment variables,
- CI/CD-injected credentials,
- manually created Kubernetes Secrets,
- and ad-hoc automation scripts.
This approach is common, pragmatic, and initially effective.
The platform described in this RFC followed exactly this trajectory.
It is important to state clearly:
The architecture proposed in this RFC was not designed or implemented from the beginning. It emerged as a necessity, driven by accumulated operational pain, recurring failures, and scaling constraints.
This RFC exists because the original system reached a point where:
- incremental fixes no longer worked,
- operational risk increased faster than delivery velocity,
- and secret management became a persistent source of incidents and human error.
1.2 The Initial System (Pre-Architecture State)
The original secret management approach evolved organically alongside the platform.
At a high level, it consisted of:
-
Secrets sourced from multiple places:
- local
.envfiles, - CI/CD environment variables,
- manually generated credentials in external dashboards,
- manually created Kubernetes Secrets.
- local
-
A growing collection of bash scripts responsible for:
- reading environment variables,
- generating random values,
- templating Kubernetes manifests,
- encrypting or sealing secrets,
- applying resources to the cluster.
-
Partial GitOps adoption:
- application manifests were Git-managed,
- secrets were only partially declarative.
Rotation, bootstrap, and recovery were handled through procedural execution, not through a system with explicit guarantees.
This system worked — until it didn't.
1.3 Operational Shortcomings
As the platform grew, several structural problems became apparent.
1.3.1 Secrets Were Not First-Class Entities
Secrets existed implicitly inside:
- scripts,
- CI/CD configuration,
- human memory,
- and external SaaS dashboards.
There was no single authoritative answer to:
- What secrets exist?
- Who owns them?
- Which services depend on them?
- Which ones expire and when?
The absence of a clear inventory made auditing, reasoning, and troubleshooting increasingly difficult.
1.3.2 Manual Rotation Became a Persistent Risk
Many secrets had finite lifetimes:
- API tokens,
- cloud provider credentials,
- SMTP passwords,
- identity provider secrets.
Rotation followed a fragile manual loop:
- Remember that a secret was expiring.
- Locate the correct dashboard or service.
- Generate a new credential.
- Update environment variables or local files.
- Re-run scripts.
- Re-apply manifests.
- Restart workloads and hope the change propagated correctly.
This process:
- did not scale,
- relied heavily on human discipline,
- and regularly failed silently.
Production issues caused by expired or partially rotated secrets became routine.
1.4 The Cost of Script-Driven Secret Management
Bash scripts became the backbone of the system.
Over time, they accumulated responsibilities well beyond their original intent:
- validation logic,
- secret generation,
- conditional execution,
- encryption and sealing,
- orchestration and ordering.
This introduced systemic problems:
-
Implicit state Script behavior depended on local files, cached outputs, and environment variables that were not versioned or observable.
-
Non-reproducibility Running the same script on two machines could yield different results.
-
Opaque failure modes Partial failures were common and difficult to diagnose.
-
Human coupling Correct operation depended on tribal knowledge rather than enforced guarantees.
The scripts did not merely automate work — they became an undocumented control plane.
1.5 Why Incremental Fixes Failed
Multiple attempts were made to improve the system incrementally:
- better scripts,
- stricter operational runbooks,
- stronger encryption mechanisms,
- additional validation steps.
These efforts reduced symptoms but never addressed the root problem.
The underlying issue was architectural:
Secrets were treated as deployment artifacts instead of lifecycle-managed system resources.
As long as secrets remained procedural, scattered, and human-driven, complexity and risk continued to grow.
1.6 Why a Dedicated Secrets Platform Became Necessary
At scale, secret management requires:
- explicit authority boundaries,
- automated lifecycle handling,
- deterministic bootstrap,
- safe and observable rotation,
- strong auditability,
- and reproducibility from source control.
These requirements CANNOT be satisfied by:
- scripts,
- ad-hoc conventions,
- or partial GitOps adoption.
The system described in subsequent sections represents a structural correction, not an optimization.
It formalizes secret management as:
- a platform subsystem,
- with clearly defined phases,
- controlled handovers,
- and strict separation between bootstrap, runtime, and rotation responsibilities.
Only with such a system can the platform:
- eliminate entire classes of human error,
- scale across clusters and environments,
- and remain operationally sustainable.
Document Navigation
| Previous | Index | Next |
|---|---|---|
| — | Table of Contents | 2. Requirements → |
End of Section 1