
RFC: Dynamic Tables Data Model Modernization

  • Authors: Prathik Shetty (@pshettydev)
  • Date: 2024-11-19
  • Status: Draft

Abstract

Our current Dynamic Tables implementation persists tenant-defined data using an Entity–Attribute–Value (EAV) layout (TableMeta → Field → Record → Data). This maximizes flexibility, but it produces O(columns) joins per row, shifts type enforcement entirely to the application layer, and makes even simple filters and aggregates prohibitively expensive (as detailed in RFC-002-eav-approach.md).

After evaluating practices from several open-source projects against our workload, the recommended path is to collapse per-record values into a single jsonb column with targeted indexes. This preserves runtime schema changes, removes the Data table, and unlocks native PostgreSQL JSON querying with far fewer trade-offs than the enhanced EAV variants.

Motivation

  • Performance bottlenecks: Fetching or updating one logical row can touch N data rows, leading to super-linear latency and complex query plans that prevent PostgreSQL from using indexes effectively.
  • Type enforcement gap: All values are stored as strings, so numeric/date validation lives entirely in the API layer and is inconsistently enforced.
  • Operational complexity: Supporting filters, sorts, and aggregates requires application-driven pivots, caches, or materialized views that are expensive to maintain.
  • Product velocity: The EAV storage model makes adding richer field types and per-field history harder than necessary, because each new capability still funnels through a single string column.

Approaches

1. Status Quo EAV

  • Description: Maintain data rows keyed by (record_id, field_id) with a single value string column.
  • Strengths: Unlimited flexibility, sparse storage, and a natural fit for cell-level auditing.
  • Weaknesses: Multiplicative reads/writes, expensive joins, no native type checking, no efficient range queries, and complicated SQL for even basic predicates.
  • Verdict: Works for prototypes but scales poorly past tens of fields per table or thousands of records per tenant.
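To make the join cost concrete, here is a sketch of what reconstructing a single logical row looks like under the status-quo EAV model (table, column, and field names are illustrative, inferred from the TableMeta → Field → Record → Data layout):

```sql
-- Hypothetical EAV pivot: one conditional aggregate per logical column.
-- Every additional field widens this query and its plan.
SELECT r.id,
       MAX(CASE WHEN f.name = 'status' THEN d.value END) AS status,
       MAX(CASE WHEN f.name = 'owner'  THEN d.value END) AS owner,
       MAX(CASE WHEN f.name = 'amount' THEN d.value END) AS amount
FROM record r
JOIN data d  ON d.record_id = r.id
JOIN field f ON f.id = d.field_id
WHERE r.table_id = $1
GROUP BY r.id;
```

Filtering on something like amount > 100 on top of this requires casting the pivoted string value, which defeats index use entirely.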

2. Typed-Value EAV

  • Description: Keep the data table but add typed columns (string_value, int_value, date_value, etc.) plus indexes per type.
  • Strengths: Improves type safety and allows narrower indexes for hot types.
  • Weaknesses: Still requires N joins per row, bloats the data table width, complicates migrations for new types, and adds NULL-heavy storage.
  • Verdict: Marginally better than status quo but retains the core join explosion and operational drag.

3. JSONB Per Record

  • Description: Store all field values inside Record.data Json @db.JsonB, keyed by field identifiers (field_<uuid>). Use GIN indexes plus generated columns for frequently filtered attributes.
  • Strengths: Fetch/update a full row with a single statement; Postgres can index JSON paths; predicates like data->>'status' = 'active' remain fast; schema remains dynamic; storage is proportional to populated fields only.
  • Weaknesses: Updates rewrite the full JSON blob (though still a single row), the database cannot auto-validate JSON shape (validation stays in the application layer), and large records require thoughtful patch semantics.
  • Verdict: Best balance of performance, flexibility, and operational simplicity for our use case.
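By contrast, fetching and filtering the same logical row under this approach is a single-table query; a sketch, assuming the field_<uuid> key convention described above:

```sql
-- One row per record; no pivot needed.
SELECT id, data
FROM record
WHERE table_id = $1
  AND data->>'field_status' = 'active'     -- served by a generated column or expression index
  AND data @> '{"field_owner": "u_123"}';  -- containment, served by the GIN index
```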

4. Hybrid (Core columns + JSONB custom fields)

  • Description: Model stable, high-value columns (e.g., status, owner) as native Postgres fields and put custom fields in JSONB.
  • Strengths: Core queries stay simple SQL; JSONB handles tail-end customization.
  • Weaknesses: Requires up-front schema curation, complicates migrations when fields move between “core” and “custom”, and still inherits JSON-specific caveats.
  • Verdict: Useful when a small set of columns dominate workload; less compelling when nearly every tenant field is bespoke.

Comparison Matrix

| Criterion              | Status Quo EAV  | Typed EAV  | JSONB Record          | Hybrid               |
|------------------------|-----------------|------------|-----------------------|----------------------|
| Runtime schema changes | ✅              | ✅         | ✅                    | ⚠️ (core migrations) |
| Row fetch complexity   | ❌ O(N joins)   | ❌ O(N)    | ✅ O(1)               | ✅ O(1)              |
| Type enforcement       | ❌              | ⚠️ Partial | ⚠️ (app/schema)       | ⚠️                   |
| Indexing/queryability  | ❌              | ⚠️         | ✅ (GIN + generated)  | ✅                   |
| Operational overhead   | ❌              | ❌         | ✅                    | ⚠️                   |
| Audit compatibility    | ✅ (cell-level) | ✅         | ✅ (snapshot JSON)    | ✅ (snapshot JSON)   |

Proposal

Adopt JSONB-backed records (approach 3) as the canonical persistence layer and retire the Data table.

model Record {
  id         String   @id @default(uuid())
  table_id   String
  tenant_id  String
  data       Json     @db.JsonB // { "<field_id>": <value> }
  created_at DateTime @default(now())
  updated_at DateTime @updatedAt

  @@index([table_id, tenant_id])
  @@index([data(ops: JsonbPathOps)], type: Gin)
}

Key Elements

  1. Field addressing: Use deterministic keys (field_<uuid> or human-safe slugs) to avoid collisions during field renames.
  2. Validation: Keep per-field JSON Schema (or Zod) definitions next to Field.options; validate before persisting and enforce server-side transforms for numeric/date types.
  3. Indexes:
    • A global GIN index on data (with jsonb_path_ops) for generic containment filtering; a composite (table_id, data) GIN index is possible if the btree_gin extension is enabled.
    • Generated columns for frequently filtered values (e.g., status), with dedicated B-tree indexes to keep ORDER/GROUP BY operations fast.
  4. History: Replace cell-level auditing with per-record JSON snapshots or JSON diff storage in RecordHistory.
  5. Access layer: Repository code reads one logical record as a single row, eliminating the current join fan-out and keeping hot data cache-friendly.
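The two index tiers from point 3 could look like this in raw SQL (index, column, and field names are illustrative):

```sql
-- Tier 1: generic GIN index for containment queries across all fields.
CREATE INDEX record_data_gin
  ON record USING GIN (data jsonb_path_ops);

-- Tier 2: promote a hot field to a generated column with a B-tree index
-- so filters, ORDER BY, and GROUP BY stay fast.
ALTER TABLE record
  ADD COLUMN status text
  GENERATED ALWAYS AS (data->>'field_status') STORED;
CREATE INDEX record_tenant_status_idx ON record (tenant_id, status);
```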

Migration Outline

  1. Dual-write phase: Expand Record with a data_jsonb column. On write, populate both JSONB and legacy data rows.
  2. Backfill: Script existing cells into JSONB grouped by (record_id) using a streaming job.
  3. Cut-over: Flip readers to JSONB behind a feature flag once parity is proven with regression tests.
  4. Cleanup: Drop dual writes, archive data table, and shrink indexes.
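The backfill in step 2 can be expressed as a single grouped aggregation; a sketch, assuming the legacy data table and the new data_jsonb column (in practice this would be batched by record id range by the streaming job):

```sql
-- Collapse legacy EAV cells into one JSONB document per record.
UPDATE record r
SET data_jsonb = agg.doc
FROM (
  SELECT d.record_id,
         jsonb_object_agg('field_' || d.field_id, d.value) AS doc
  FROM data d
  GROUP BY d.record_id
) agg
WHERE r.id = agg.record_id
  AND r.data_jsonb IS NULL; -- idempotent: skip rows already backfilled
```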

Residual Risks & Mitigations

  • Large document updates: Mitigate with jsonb_set operations and optimistic locking to avoid clobbering concurrent edits.
  • Validation drift: Keep schema definitions versioned per field and run periodic offline validators to catch inconsistencies.
  • Query ergonomics: Provide helper scopes (e.g., whereJsonEquals(table_id, field_id, value)) so product teams do not write raw JSONPath everywhere.
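A sketch of the patch-plus-optimistic-lock pattern for large documents (column names taken from the proposed model; the field key is illustrative):

```sql
-- Patch one field in place instead of rewriting the document client-side;
-- the updated_at guard rejects stale writers.
UPDATE record
SET data = jsonb_set(data, '{field_status}', to_jsonb('active'::text)),
    updated_at = now()
WHERE id = $1
  AND tenant_id = $2
  AND updated_at = $3; -- affects 0 rows if another writer got there first
```

Callers should treat an update count of zero as a conflict and re-read before retrying.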

Implementation Notes

  • Prefer uuid keys for fields and records; avoid human-readable names in JSON to survive renames.
  • When a field must support range queries or sorting, promote it to a generated column (ALTER TABLE record ADD COLUMN amount_numeric numeric GENERATED ALWAYS AS ((data->>'amount')::numeric) STORED) and index it.
  • Keep the UI history features by recording per-field diffs in RecordHistory, even though primary storage is now JSONB.
  • Consider table-level row-level security (RLS) policies keyed off tenant_id to keep cross-tenant access safe without complex joins.
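The RLS suggestion could be sketched as follows (the policy name and the app.tenant_id setting are illustrative):

```sql
ALTER TABLE record ENABLE ROW LEVEL SECURITY;

-- Each connection first declares its tenant, e.g.:
--   SET app.tenant_id = '<tenant uuid>';
CREATE POLICY tenant_isolation ON record
  USING (tenant_id = current_setting('app.tenant_id'));
```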

Next Steps

  1. Prototype JSONB-backed read/write paths behind a feature flag in the dynamic tables service.
  2. Benchmark critical workloads (list, filter, aggregate) comparing legacy EAV vs JSONB to establish performance deltas.
  3. Finalize migration scripts and a rollback plan, then schedule a phased rollout per tenant cohort.
  4. Update developer documentation and SDK helpers once JSONB storage is the source of truth.
