pgWarden

Infrastructure-Level Enforcement for PostgreSQL

A deterministic, policy‑enforcing PostgreSQL access proxy with an optional control plane and AI drift‑detection signals

1. Project Overview

pgWarden is a PostgreSQL wire‑protocol proxy that enforces least‑privilege, context‑aware access to sensitive data at the database boundary rather than inside application code.

It is designed for environments where:

pgWarden ships under a single product name (“pgWarden”) for Docker and Kubernetes distribution. Internally, it is composed of orthogonal services that can be deployed together or separately:

  1. Data Plane (Proxy / Enforcement Layer)
  2. Control Plane (Policy Definition & Compilation)
  3. Auth (Session Attestation in the Control Plane; optionally backed by a separate IdP such as Keycloak)
  4. Signal Plane (Audit, Drift & Anomaly Detection – optional; orthogonal)

Deterministic enforcement is always the source of truth. Heuristic detection systems never gate access.

2. Design Principles

3. Data Plane: Postgres Access Proxy (Enforcement Layer)

3.1 Responsibilities

The data plane is a PostgreSQL‑compatible wire‑protocol proxy that:

The proxy does not:

3.2 Context‑Bound DSNs

Each inbound DSN represents a policy surface, not a database.

Examples:

Each DSN maps to:

3.2.1 Security posture

In v1, authorization is primarily DSN/context‑based.

Upstream credential isolation (v1):

This reduces the blast radius of credential leakage from source repos.

3.3 Enforcement Model

Access control is enforced by:

Masking strategies (v1):

No salts/keys are required for these strategies.

The proxy ensures clients cannot escape their assigned context, even if client credentials leak.

3.4 Credential rotation (v1)

Credential and certificate rotation is manual and declarative in v1:

3.5 Upstream leases, pooling, and seamless refresh (v1)

Upstream credentials are never embedded into application repos. Instead, the proxy obtains authorization to create upstream connections via short‑lived leases issued by the control plane.
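The lease idea can be sketched as follows. This is a minimal illustration, not pgWarden's lease format: the `Lease` type, its fields, the `may_open_upstream` helper, the context names, and the five-minute lifetime are all hypothetical. It assumes a lease is scoped to one context, carries an absolute expiry, and is consulted only when a new upstream connection is created.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Lease:
    """Hypothetical lease issued by the control plane."""
    context: str          # assumed: the DSN context the lease is scoped to
    expires_at: datetime  # assumed: absolute expiry time in UTC


def may_open_upstream(lease: Lease, context: str, now: datetime) -> bool:
    """Return True if a NEW upstream connection may be created.

    The check applies to connection creation only; it says nothing
    about connections that are already open.
    """
    return lease.context == context and now < lease.expires_at


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
lease = Lease(context="analytics_ro", expires_at=now + timedelta(minutes=5))

print(may_open_upstream(lease, "analytics_ro", now))                          # True
print(may_open_upstream(lease, "analytics_ro", now + timedelta(minutes=10)))  # False
print(may_open_upstream(lease, "billing_rw", now))                            # False
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check deterministic and easy to test.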

Lease model (connection-creation only):

Defaults (global; per‑DSN override):

Seamless refresh:

Failure behavior:

4. Control Plane: Policy Definition & Compilation

4.1 Purpose

The control plane exists to make database access policies:

It coexists with existing production database administration.

The control plane does not replace SQL and does not expose raw database access to operators.

4.2 Core Responsibilities

4.3 Policy Model (Conceptual)

Policies describe:

Policies are declarative and compiled into:

4.4 Compilation Flow

  1. Operator defines or updates a policy
  2. Control plane validates policy constraints
  3. Policy is compiled into deterministic Postgres artifacts
  4. Artifacts are applied idempotently to the target database(s)
  5. Proxy reloads mappings without downtime

Compiler permissions (explicit):

Coexistence rule: pgWarden must not clobber objects it does not manage. In practice, this is achieved through a clear ownership boundary (a naming convention plus a metadata table) and by mutating only objects it owns.
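One way to picture the ownership boundary is a check that runs before any mutation. The sketch below is illustrative only: the `pgwarden_` prefix, the `DbObject` type, and the in-memory `registry` standing in for the metadata table are assumptions, as is the choice to require both the naming convention and a registry entry before an object counts as owned.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

MANAGED_PREFIX = "pgwarden_"  # assumed naming convention, for illustration


@dataclass(frozen=True)
class DbObject:
    schema: str
    name: str


def is_owned(obj: DbObject, registry: Set[Tuple[str, str]]) -> bool:
    """Owned only if the name follows the convention AND the object is
    recorded in the (here simulated) metadata table."""
    return obj.name.startswith(MANAGED_PREFIX) and (obj.schema, obj.name) in registry


def plan_mutations(
    desired: List[DbObject], registry: Set[Tuple[str, str]]
) -> Tuple[List[DbObject], List[DbObject]]:
    """Split candidate mutations into those that are safe to apply and
    those that must be left untouched."""
    to_apply, to_skip = [], []
    for obj in desired:
        (to_apply if is_owned(obj, registry) else to_skip).append(obj)
    return to_apply, to_skip


registry = {("public", "pgwarden_masked_users")}
desired = [
    DbObject("public", "pgwarden_masked_users"),  # owned: prefix + registered
    DbObject("public", "users"),                  # not owned: no prefix
    DbObject("public", "pgwarden_unregistered"),  # not owned: not registered
]
to_apply, to_skip = plan_mutations(desired, registry)
print([o.name for o in to_apply])  # ['pgwarden_masked_users']
print([o.name for o in to_skip])   # ['users', 'pgwarden_unregistered']
```

Requiring both signals means a hand-created object that merely happens to share the prefix is never modified.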

4.5 State & Persistence

4.6 Artifact Ownership & Reconciliation

pgWarden must coexist with normal DB operations while remaining authoritative for the objects it manages.

Ownership boundary (recommended):

Reconcile behavior (desired):

Rationale: overwrite‑back provides the least surprising path to “working state” for managed objects, while still allowing coexistence for everything else.

5. Audit & Observability

5.1 Deterministic Audit Signals

For every session, pgWarden emits structured metadata such as:

No query payloads are recorded. Audit output is derived metadata only; raw PII is not persisted in pgWarden‑owned stores.

5.2 Compliance & Forensics

Audit data is designed to support:

6. WardenSense: Activity Drift & Anomaly Detection (Optional, Orthogonal)

WardenSense is pgWarden’s optional heuristic signal service. It can be toggled per DSN connection context and is OFF by default.

6.1 Purpose

WardenSense detects unexpected patterns in database access behavior, especially from AI-driven workloads, without influencing enforcement.

It exists solely to:

6.2 Deployment Shape

WardenSense runs as its own binary/service with its own database.

6.3 Detection Strategy (Current State)

The long-term intent for WardenSense is to support learned behavioral baselines (e.g., regression or ensemble-based models). In the current implementation, WardenSense deliberately uses deterministic statistical heuristics (fixed thresholds and z-scores) rather than trained models.

Rationale

Introducing a trained model without sufficient historical data would have produced worse outcomes than explicit heuristics.

A poorly trained model introduces:

In this context, a weak model would degrade operator trust while providing no meaningful improvement over simpler approaches.

Given limited and evolving data, deterministic heuristics provide:

For an advisory-only system, heuristics dominate poorly conditioned models on both correctness and operational trust.

Current Detection Pipeline

WardenSense operates over a strictly bounded and well-defined data source to preserve determinism and explainability.

Data Source

Feature Aggregation (per window)

For each bucketed scope and window, the following features are computed:

Baseline Statistics

For each scope, the last baselineWindows = 20 feature rows are used to compute:

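Z-Score Computation

Z-scores are computed as:

z = (value - mean) / stddev

where value is the feature in the current window and mean and stddev come from the baseline for that feature and scope.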

If stddev == 0, the z-score is treated as nil and the signal is considered stable.
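The baseline and z-score steps can be sketched as follows. This is an illustration under stated assumptions rather than the WardenSense implementation: it assumes the standard z-score definition, a population standard deviation, and that an empty baseline also yields no score. The `z_score` function name is hypothetical, and Python's `None` stands in for nil.

```python
from statistics import fmean, pstdev
from typing import Optional, Sequence

BASELINE_WINDOWS = 20  # mirrors baselineWindows = 20 from the text


def z_score(value: float, history: Sequence[float]) -> Optional[float]:
    """Z-score of `value` against the most recent baseline windows.

    Returns None when no score can be computed: either there is no
    baseline yet (an assumption of this sketch) or the baseline has zero
    spread, in which case the signal is treated as stable.
    """
    baseline = list(history)[-BASELINE_WINDOWS:]
    if not baseline:
        return None
    mean = fmean(baseline)
    stddev = pstdev(baseline)  # population stddev; an assumption
    if stddev == 0:
        return None
    return (value - mean) / stddev


history = [8, 12] * 10           # 20 windows: mean 10, population stddev 2
print(z_score(16, history))      # 3.0
print(z_score(16, [5] * 20))     # None (zero spread, treated as stable)
```

Returning `None` rather than zero or infinity keeps "no usable baseline" distinct from "exactly average", so downstream rules can skip the comparison explicitly.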

Deterministic Rule Evaluation

The following rules are evaluated for each window:

  1. error_rate_high
    • Trigger: error_rate >= ErrorRateHighThreshold
    • Default: 0.2
  2. error_rate_drift
    • Trigger: abs(z_error_rate) >= ZThreshold
    • Default ZThreshold: 3.0
  3. latency_drift
    • Trigger: abs(z_latency) >= ZThreshold
  4. query_volume_drift
    • Trigger: abs(z_query_count) >= ZThreshold

When any rule triggers, an alert is written to ws_alerts with the reason, scope, window, and computed z-scores.
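The four rules above can be expressed as a small pure function. The sketch below uses the rule names and default thresholds from the list; the `Thresholds` type, the `evaluate_rules` signature, and the treatment of a missing (`None`) z-score as "does not trigger" are assumptions made for illustration. Writing the resulting alert to ws_alerts is out of scope here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Thresholds:
    error_rate_high: float = 0.2  # ErrorRateHighThreshold default
    z: float = 3.0                # ZThreshold default


def _drifted(z: Optional[float], limit: float) -> bool:
    """A missing z-score (None) is treated as stable and never triggers."""
    return z is not None and abs(z) >= limit


def evaluate_rules(
    error_rate: float,
    z_error_rate: Optional[float],
    z_latency: Optional[float],
    z_query_count: Optional[float],
    t: Thresholds = Thresholds(),
) -> List[str]:
    """Return the names of all rules that trigger for one window."""
    reasons = []
    if error_rate >= t.error_rate_high:
        reasons.append("error_rate_high")
    if _drifted(z_error_rate, t.z):
        reasons.append("error_rate_drift")
    if _drifted(z_latency, t.z):
        reasons.append("latency_drift")
    if _drifted(z_query_count, t.z):
        reasons.append("query_volume_drift")
    return reasons


print(evaluate_rules(0.25, 1.0, None, -3.5))
# ['error_rate_high', 'query_volume_drift']
print(evaluate_rules(0.05, 0.4, 1.2, -0.7))
# []
```

Because the function has no side effects and depends only on its inputs, the same window always produces the same reasons, which keeps each alert reproducible during review.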

Configuration Knobs

All thresholds, window sizes, and evaluation cadence are configuration-driven.

Design Invariant

Even with statistical heuristics, WardenSense never gates access or mutates enforcement policy. All outputs are advisory signals intended for human review.

Upgrade Path

The current architecture intentionally isolates feature aggregation from detection logic. This allows future replacement of heuristic evaluators with learned models once sufficient data and feedback loops exist, without altering ingestion, storage, or proxy behavior.

Future learning-based approaches are only justified once sufficient data exists to materially outperform explicit heuristics without sacrificing explainability.
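The separation between feature aggregation and detection logic can be pictured as a narrow evaluator interface. The sketch below is hypothetical: the `Features` and `Evaluator` type aliases, the function names, and the placeholder learned evaluator are illustrative and do not describe the actual WardenSense code.

```python
from typing import Callable, Dict, List

# Assumed shape: aggregated features for one scope and window.
Features = Dict[str, float]
# An evaluator turns features into a list of advisory alert reasons.
Evaluator = Callable[[Features], List[str]]


def heuristic_evaluator(features: Features) -> List[str]:
    """Explicit rule: flag a high error rate (0.2 default from the text)."""
    return ["error_rate_high"] if features.get("error_rate", 0.0) >= 0.2 else []


def learned_evaluator_placeholder(features: Features) -> List[str]:
    """Stand-in for a future learned model; it deliberately flags nothing."""
    return []


def run_detection(features: Features, evaluator: Evaluator) -> List[str]:
    """Detection depends only on the evaluator interface, so evaluators
    can be swapped without touching aggregation or storage."""
    return evaluator(features)


window = {"error_rate": 0.3, "query_count": 120.0}
print(run_detection(window, heuristic_evaluator))            # ['error_rate_high']
print(run_detection(window, learned_evaluator_placeholder))  # []
```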

7. Threat Model Summary

Protected against:

Not intended to protect against:

8. Non‑Goals

9. Deployment Model (High Level)

Works with:

10. What I’d Change Next

With additional time and scope:

None of these would alter the core enforcement model.

Closing Note

pgWarden is intentionally boring in its enforcement logic. The goal is not cleverness, but predictability under failure.

Most of the complexity in this system exists to ensure that, when things go wrong, the outcome is auditable and safe.