Security model

Seven guardrail categories ship in v0.1.0. None of them is optional: even the minimum viable MCP includes all seven. This page is the customer-facing version of the contract; the implementation lives in the gateway-core repository.


1. Authentication and authorisation

OAuth 2.1 with PKCE for HTTP transport. Resource Indicators (RFC 8707) prevent confused-deputy attacks, where a token issued for one resource is replayed against another. Tokens are short-lived and scope-specific: read tokens expire in 15 minutes, write tokens in 5 minutes. Rotation happens automatically; there are no long-lived bearer tokens.
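The audience and lifetime rules above can be sketched as a single check. This is an illustration only, not the gateway's code: the AccessToken shape, the Verdict names, and checkToken are hypothetical, and the sketch assumes a write-scoped token may also call read tools, which this page does not state.

```typescript
// Hypothetical token shape; field names are illustrative, not the gateway's.
type Scope = "read" | "write";

interface AccessToken {
  scope: Scope;
  audience: string;   // resource the token was issued for (RFC 8707 resource indicator)
  issuedAtMs: number; // issue time in milliseconds since the epoch
}

// Lifetimes from the text: read tokens 15 minutes, write tokens 5 minutes.
const TTL_MS: Record<Scope, number> = {
  read: 15 * 60 * 1000,
  write: 5 * 60 * 1000,
};

type Verdict = "ok" | "wrong_audience" | "expired" | "insufficient_scope";

function checkToken(
  token: AccessToken,
  resource: string,
  required: Scope,
  nowMs: number,
): Verdict {
  // A token minted for one resource must not be accepted by another.
  if (token.audience !== resource) return "wrong_audience";
  // The lifetime depends on the token's own scope.
  if (nowMs - token.issuedAtMs >= TTL_MS[token.scope]) return "expired";
  // Write tools need a write-scoped token. Assumption: read tools accept either scope.
  if (required === "write" && token.scope !== "write") return "insufficient_scope";
  return "ok";
}
```

Passing the clock in as nowMs keeps the check deterministic, which makes the expiry boundaries easy to test.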

Dynamic client registration uses CIMD (Client ID Metadata Documents) so the gateway can register new MCP clients without out-of-band coordination, while still verifying client identity cryptographically.

For the Paystack server specifically: read tools (verify, list, get) use read-scoped tokens; the two write tools (create_customer, initiate_refund) require write-scoped tokens. The customer-facing OAuth scope set is listed on the Paystack landing page.

2. Input validation

Every tool has a Zod schema with .strict() at the boundary. Unknown fields are rejected, not silently accepted. Numeric ranges are bounded; strings have maximum lengths; enums are explicit. Money is always an integer in kobo (the smallest NGN unit), never floating-point, never decimal strings.

When validation fails, the tool returns a structured error (isError: true) with a human-readable message, not a thrown exception. The LLM sees the error and can adapt or surface it to the user; the MCP transport stays clean.
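A minimal sketch of the boundary behaviour described above, written without Zod so it stays dependency-free. It is a hand-rolled stand-in for a strict schema, not the shipped one: the field names transaction_id and amount_kobo, both bounds, and parseRefundInput are assumptions made for illustration.

```typescript
// Structured tool error: returned to the caller, never thrown.
interface ToolError {
  isError: true;
  content: { type: "text"; text: string }[];
}

// Hypothetical input for a refund tool; the real field names may differ.
interface RefundInput {
  transaction_id: string;
  amount_kobo: number; // integer kobo, never a float or a decimal string
}

const MAX_ID_LENGTH = 64;           // hypothetical bound
const MAX_AMOUNT_KOBO = 1000000000; // hypothetical bound

function fail(message: string): ToolError {
  return { isError: true, content: [{ type: "text", text: message }] };
}

function parseRefundInput(raw: unknown): RefundInput | ToolError {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return fail("Input must be an object.");
  }
  const input = raw as Record<string, unknown>;

  // Strict mode: unknown fields are rejected, not silently dropped.
  const allowed = new Set(["transaction_id", "amount_kobo"]);
  for (const key of Object.keys(input)) {
    if (!allowed.has(key)) return fail(`Unknown field: ${key}`);
  }

  const id = input["transaction_id"];
  if (typeof id !== "string" || id.length === 0 || id.length > MAX_ID_LENGTH) {
    return fail(`transaction_id must be a string of 1 to ${MAX_ID_LENGTH} characters.`);
  }

  const amount = input["amount_kobo"];
  if (
    typeof amount !== "number" ||
    !Number.isInteger(amount) ||
    amount <= 0 ||
    amount > MAX_AMOUNT_KOBO
  ) {
    return fail(`amount_kobo must be an integer between 1 and ${MAX_AMOUNT_KOBO}.`);
  }

  return { transaction_id: id, amount_kobo: amount };
}
```

A caller can tell the two outcomes apart with an "isError" in result check, so a validation failure travels back as an ordinary tool result.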

3. PII and data sovereignty

Every Paystack response runs through the gateway-core redactor before it reaches the LLM. The customer object carries an email, a phone, a first name, and a last name; all four are replaced with random vaulted tokens. The token-to-value mapping lives in an encrypted Supabase table with a TTL, never in process memory.

Detection is intentionally layered: structured field-name detection (the redactor knows that email contains an email), multi-region regex (Nigerian, Kenyan, South African, and Ghanaian phone formats), and named-entity recognition for names and addresses that escape the structural layer. Low-confidence detections are logged for review.

Honest limit: regex alone is insufficient. The combined detection is more robust than any single mechanism, but no PII detection system is perfect, and we publish failure modes alongside the success cases. A name like “Bashir Yusuf” appearing in a free-text field is caught by the NER; an unusual name in an unusual field shape may not be. The redactor logs every low-confidence case; the audit log carries every redaction decision.

Tokens are not deterministic. The same email redacted twice produces two different tokens. This is a deliberate choice; deterministic hashing would let the LLM correlate identities across calls without ever seeing the plain value, defeating the redactor's purpose. The vaulted store is what makes correlation possible for legitimate flows (a refund follow-up to the same customer uses the vault, not the token).
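The non-deterministic property can be illustrated with a short sketch. SketchVault and the tok_ token format are hypothetical, and the in-memory Map is used only to keep the example runnable: per the section above, the real mapping lives in an encrypted Supabase table with a TTL, not in process memory.

```typescript
import { randomBytes } from "node:crypto";

// Illustration only. A Map stands in for the encrypted, TTL-bound vault table.
class SketchVault {
  private readonly entries = new Map<string, string>();

  // Every call draws fresh randomness, so the same value never maps to the
  // same token twice and tokens cannot be correlated across calls.
  redact(kind: string, value: string): string {
    const token = `tok_${kind}_${randomBytes(16).toString("hex")}`;
    this.entries.set(token, value);
    return token;
  }

  // Only code with vault access can recover the plain value.
  resolve(token: string): string | undefined {
    return this.entries.get(token);
  }
}
```

Redacting the same email twice yields two unrelated tokens, yet both resolve to the same value through the vault. That is the path a legitimate follow-up flow would take.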

4. Human-in-the-loop gates

Two Paystack tools are gated: create_customer (creates a record in your Paystack account) and initiate_refund (moves money). The gate is a complete state machine, not a confirm dialog.

The tool writes a pending record to hitl_pending and returns a pending status.
The user is notified via Brevo (email) and, when configured, a WhatsApp template message.
The confirmation handler verifies the user's 4-digit PIN (bcrypt-hashed via pgcrypto), transitions the pending record to confirmed or cancelled, and only then dispatches the Paystack call.
Pending records expire automatically: 5 minutes for create_customer, 1 minute for initiate_refund. Expired records are marked expired in the audit log and never execute.
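The steps above can be sketched as a pure transition function. This is a sketch under stated assumptions, not the gateway's handler: PendingRecord and transition are hypothetical names, PIN verification is reduced to a boolean input, and the sketch assumes a failed PIN leaves the record pending.

```typescript
type GatedTool = "create_customer" | "initiate_refund";
type HitlStatus = "pending" | "confirmed" | "cancelled" | "expired";
type Decision = "confirm" | "cancel";

interface PendingRecord {
  tool: GatedTool;
  status: HitlStatus;
  createdAtMs: number;
}

// Expiry windows from the text: 5 minutes and 1 minute.
const EXPIRY_MS: Record<GatedTool, number> = {
  create_customer: 5 * 60 * 1000,
  initiate_refund: 60 * 1000,
};

// Returns the record's next state. Only a "confirmed" result should ever
// lead to a Paystack call.
function transition(
  record: PendingRecord,
  decision: Decision,
  pinOk: boolean,
  nowMs: number,
): PendingRecord {
  // Terminal states never change.
  if (record.status !== "pending") return record;
  // Expiry wins over any late decision.
  if (nowMs - record.createdAtMs >= EXPIRY_MS[record.tool]) {
    return { ...record, status: "expired" };
  }
  // Assumption: a wrong PIN leaves the record pending.
  if (!pinOk) return record;
  return { ...record, status: decision === "confirm" ? "confirmed" : "cancelled" };
}
```

Because confirmed, cancelled, and expired are terminal, replaying a confirmation against a finished record cannot trigger a second dispatch.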

The read tools are not gated. An AI agent that calls verify_payment or list_transactions does not require a PIN; only the writes do.

5. Semantic safety

Prompt injection scanning runs on every tool input. The scanner is hardened against paraphrase and homoglyph attacks: rather than matching a single literal string such as “ignore previous instructions”, it matches the underlying semantic pattern.

Tool descriptions are cryptographically signed at build time; if a description is mutated in transit, the gateway refuses to list the tool. There is no dynamic tool selection by the LLM; the catalogue is fixed per server and changes only on a new release.
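One way to picture the refuse-to-list behaviour is shown below. The signature scheme is not stated on this page, so Ed25519 is an assumption here, as are the SignedTool shape and the signTool and listableTools helpers. The key pair is generated inline only to keep the sketch self-contained.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Demo key pair. In a real build the private key would stay in the build
// pipeline and only the public key would reach the gateway.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

interface SignedTool {
  name: string;
  description: string;
  signature: string; // base64 signature over name and description
}

function signedBytes(name: string, description: string): Buffer {
  return Buffer.from(`${name}\n${description}`, "utf8");
}

// Build time: sign the description exactly as it will be served.
function signTool(name: string, description: string, key: KeyObject): SignedTool {
  const signature = sign(null, signedBytes(name, description), key).toString("base64");
  return { name, description, signature };
}

// List time: drop any tool whose description no longer matches its signature.
function listableTools(tools: SignedTool[], key: KeyObject): SignedTool[] {
  return tools.filter((tool) =>
    verify(
      null,
      signedBytes(tool.name, tool.description),
      key,
      Buffer.from(tool.signature, "base64"),
    ),
  );
}
```

A description altered after signing fails verification and is filtered out of the listing, so the client never sees the mutated text.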

6. Rate limiting

Per-user, per-tool, per-window limits, backed by an atomic Postgres function (consume_token) in Supabase. The token-bucket algorithm handles bursts within the per-hour limit while refusing sustained excess. The per-tool limits are listed on the tools reference.
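For readers unfamiliar with token buckets, here is a minimal in-process sketch of the algorithm. It is not the consume_token Postgres function: the Bucket and BucketConfig shapes and the parameter values are assumptions, and unlike the database function this version is not atomic across processes.

```typescript
interface Bucket {
  tokens: number;       // tokens currently available (may be fractional)
  lastRefillMs: number; // time of the last refill calculation
}

interface BucketConfig {
  capacity: number;         // maximum burst size
  refillIntervalMs: number; // time needed to earn back one token
}

function consumeToken(
  bucket: Bucket,
  config: BucketConfig,
  nowMs: number,
): { allowed: boolean; bucket: Bucket } {
  // Refill in proportion to elapsed time, capped at capacity.
  const elapsedMs = Math.max(0, nowMs - bucket.lastRefillMs);
  const refilled = Math.min(
    config.capacity,
    bucket.tokens + elapsedMs / config.refillIntervalMs,
  );
  if (refilled < 1) {
    // Sustained excess: nothing left to spend.
    return { allowed: false, bucket: { tokens: refilled, lastRefillMs: nowMs } };
  }
  // Spend one token for this call.
  return { allowed: true, bucket: { tokens: refilled - 1, lastRefillMs: nowMs } };
}
```

A full bucket absorbs a burst of up to capacity calls at once. After that, calls are admitted only as fast as tokens refill, which is how bursts are allowed while sustained excess is refused.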

On top of rate limits: a per-call token budget (the gateway paginates beyond 5,000 tokens), per-customer cost accounting so platforms can attribute LLM usage back to their end-users, and a circuit breaker on the Paystack API itself. The breaker opens after three consecutive 5xx responses and auto-recovers after 30 seconds.
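The breaker behaviour described above can be sketched as follows. The class and method names are hypothetical, and the sketch assumes recovery closes the breaker completely after 30 seconds. A half-open probe state is a common variant, and this page does not say which one the gateway uses.

```typescript
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAtMs: number | null = null;
  private readonly failureThreshold = 3; // consecutive 5xx responses
  private readonly recoveryMs = 30000;   // auto-recovery delay

  // Ask before calling the upstream API.
  canRequest(nowMs: number): boolean {
    if (this.openedAtMs === null) return true;
    if (nowMs - this.openedAtMs >= this.recoveryMs) {
      // Assumption: recovery closes the breaker and resets the failure count.
      this.openedAtMs = null;
      this.consecutiveFailures = 0;
      return true;
    }
    return false;
  }

  // Report the HTTP status of every upstream response.
  recordResponse(httpStatus: number, nowMs: number): void {
    if (httpStatus >= 500 && httpStatus <= 599) {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.failureThreshold) {
        this.openedAtMs = nowMs;
      }
    } else {
      // Any non-5xx response breaks the streak.
      this.consecutiveFailures = 0;
    }
  }
}
```

Because only consecutive 5xx responses count, a single success between failures resets the streak.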

7. Observability

Structured logs in JSON to stderr (never stdout; stdout corrupts JSON-RPC framing on stdio transport). Sentry for error tracking, with a beforeSend hook that scrubs PII patterns before the event leaves the process. PostHog with person_profiles: "identified_only" so anonymous traffic does not create profiles.

The audit log is hash-chained: every entry includes the hash of the previous entry. Tampering with a historical entry breaks the chain at the next verification run. The verifier is open source; anyone with a copy of the audit table can re-derive the chain.
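The chaining idea fits in a few lines. This is a sketch of the general technique, not the published verifier: the AuditEntry shape, the SHA-256 choice, the all-zero genesis value, and the way fields are concatenated are assumptions.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  payload: string;  // serialised audit event
  prevHash: string; // hash of the previous entry
  hash: string;     // hash over prevHash and payload
}

// Hypothetical starting value for the first entry.
const GENESIS_HASH = "0".repeat(64);

function entryHash(prevHash: string, payload: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(payload).digest("hex");
}

function appendEntry(chain: readonly AuditEntry[], payload: string): AuditEntry[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS_HASH;
  return [...chain, { payload, prevHash, hash: entryHash(prevHash, payload) }];
}

// Returns the index of the first entry that fails verification, or -1 if the
// whole chain re-derives cleanly.
function firstBrokenIndex(chain: readonly AuditEntry[]): number {
  let expectedPrev = GENESIS_HASH;
  let index = 0;
  for (const entry of chain) {
    const linked = entry.prevHash === expectedPrev;
    const intact = entry.hash === entryHash(entry.prevHash, entry.payload);
    if (!linked || !intact) return index;
    expectedPrev = entry.hash;
    index += 1;
  }
  return -1;
}
```

Editing a historical payload makes that entry's stored hash stop matching. Recomputing the hash to hide the edit breaks the link from the following entry instead, so either change surfaces on the next verification run.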

Decision-path logging is mandatory for HITL gates: every state transition records who, when, and why. A subsequent NDPC audit can reconstruct any approval or cancellation from the log alone.