For funded AI SaaS startups

Multi-Tenant SaaS Architecture — Built So Tenants Cannot Leak

RLS-enforced tenant isolation, JWT auth with refresh-token rotation, per-tenant pricing on Stripe — architected so the 'one tenant saw another tenant's data' incident structurally cannot happen.

The problem

Every funded SaaS founder eventually faces the same architectural decision, and most postpone it past the point where it stops being cheap.

Tenancy fractures across the second and third app

The MVP starts with a single org_id column, a few hand-rolled WHERE clauses, and a JWT that carries the user's identity. Six months in, the team adds a second app — a customer portal, an admin dashboard, a worker that processes uploads — and the tenancy model fractures. One service filters by tenant_id correctly. Another forgets in a single background job. A third trusts the JWT but does not validate that the requested resource actually belongs to the requesting tenant. The first time anyone notices is when a customer files a support ticket containing a screenshot of another customer's data, and from that moment the company is in an incident that ends careers and sometimes ends companies.

Cross-tenant exposure is not bounded by engineering time

For a Series A SaaS doing $5M ARR across a few hundred tenants, the cost of getting this wrong is not bounded by engineering time. It is bounded by churn, by the contractual indemnity clauses your enterprise customers asked for, and by the SOC 2 audit you are mid-way through. A single confirmed cross-tenant data exposure event tends to surface in three places: an emergency board call, a public security advisory, and a wave of cancellations from the customers who were paying you precisely because they trusted you with their data. The technical fix afterwards is a quarter of engineering work. The trust rebuild is longer than that and sometimes it does not finish.

Tenant isolation cannot be middleware

The deeper problem is that the brittle version of multi-tenancy looks fine in code review. WHERE tenant_id = $1 reads like an obvious safeguard. It only fails in the edge cases — a new endpoint a junior engineer shipped on a Friday, a worker that runs as a service account with elevated permissions, an analytics query that joined across orgs without realizing it. Architecture has to assume those edge cases will exist and prevent them at a lower layer than human discipline.


What changes for your business

A correctly architected multi-tenant SaaS pushes the tenant boundary down into Postgres itself, instead of asking every engineer who writes a query to remember it. Postgres Row Level Security treats a missing tenancy filter the safe way: when RLS is enabled on a table and no policy applies, the official Postgres documentation describes a default-deny policy, so no rows are returned and no rows can be modified. For any role that is subject to RLS, a forgotten WHERE clause in application code returns only the caller's own tenant's rows (or nothing at all if no policy applies) instead of a leak, and the leak class of bug becomes structurally impossible rather than merely unlikely.
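The default-deny behavior can be seen in a scratch database. This is a minimal sketch, not the production schema: the table public.notes and the role app_user are hypothetical names, and the script assumes it is run by a superuser so that set role works.

```sql
-- Hypothetical demo table and role; names are illustrative only.
create table public.notes (
  tenant_id uuid not null,
  body      text not null
);

create role app_user nologin;
grant select, insert on public.notes to app_user;

-- Seed one row as the table owner, before RLS is switched on.
insert into public.notes (tenant_id, body)
values ('11111111-1111-1111-1111-111111111111', 'hello');

-- Enable RLS without writing any policy yet.
alter table public.notes enable row level security;

-- A role that is subject to RLS now sees nothing and cannot write.
set role app_user;
select count(*) from public.notes;  -- returns 0: default-deny hides the row
insert into public.notes (tenant_id, body)
values ('11111111-1111-1111-1111-111111111111', 'blocked');
-- the insert above is rejected with a row-level security violation
reset role;
```

The role that owns the table still sees the seeded row at this point, which is the gap that FORCE ROW LEVEL SECURITY closes.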

Identity layer carries tenant as a JWT claim

The architecture has three layers, each enforcing tenant scope independently. First, identity: a JWT issued by the auth provider carries the tenant_id (or org_id) and role as claims, with short-lived access tokens and rotated refresh tokens. Supabase Auth, for example, uses single-use refresh tokens with a 10-second reuse window and revokes the entire session if a used token is replayed outside that window — which is the mechanism that catches a stolen token instead of letting it grant indefinite tenant access.

Database enforces with FORCE ROW LEVEL SECURITY

Second, database: every tenant-scoped table has RLS enabled with FORCE ROW LEVEL SECURITY set (because Postgres table owners would otherwise bypass the policy), and policies read tenant_id directly from the JWT via auth.jwt() or a session variable. Third, application: API handlers still filter by tenant_id explicitly, because defense in depth is the point — the application filter and the RLS policy each catch what the other misses.
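For stacks that do not expose auth.jwt(), the session-variable variant can be sketched as follows. The setting name app.tenant_id and the policy name are assumptions made for illustration; public.documents is assumed to have a uuid tenant_id column. This policy is meant as an alternative to a JWT-based policy, not an addition to it, because permissive policies on the same table are combined with OR.

```sql
-- Hypothetical setting name app.tenant_id; any custom "namespace.key" works.
-- nullif(..., '') turns an unset or reset value into NULL, which matches no row.
create policy tenant_isolation_session on public.documents
  for all
  using (
    tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid
  )
  with check (
    tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid
  );

-- Per request: the application verifies the JWT, then sets the tenant
-- inside a transaction. The third argument (true) makes the setting
-- local to this transaction, which matters behind a connection pooler.
begin;
select set_config('app.tenant_id', '11111111-1111-1111-1111-111111111111', true);
select * from public.documents;  -- only that tenant's rows are visible
commit;
```

If the application forgets to set the variable, the comparison evaluates to NULL and the query returns no rows, which keeps the failure on the safe side.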

Risky decisions stop being risky

For your business, this changes which engineering decisions feel risky. Adding a new background job that touches customer data stops being a security review. Onboarding a new engineer stops requiring a week of "here is how our tenancy works, please do not forget it." Shipping a second app into the same suite — a mobile companion, an admin console, a reporting tool — stops requiring a tenancy audit because the apps connect to the same database and inherit the same RLS policies automatically.

Per-tenant pricing without a custom billing layer

The architecture also makes per-tenant pricing tractable: each tenant maps to one Stripe Customer, plan and seat and usage live on Stripe, and entitlement is computed from Stripe state and cached locally via webhook handlers. Stripe supports multiple simultaneous subscriptions per customer, which covers the common combinations — a flat plan plus a usage add-on, an annual base plan plus monthly seats — without a custom billing layer.
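A minimal sketch of the local entitlement cache and an idempotent webhook write, under stated assumptions: the table and column names are hypothetical, the literal ids are placeholders, and the values would normally come from the verified Stripe event payload in application code.

```sql
-- Hypothetical tables; column names are illustrative, not Stripe's schema.
create table if not exists public.stripe_events (
  event_id    text primary key,                 -- Stripe event id
  received_at timestamptz not null default now()
);

create table if not exists public.tenant_entitlements (
  tenant_id          uuid primary key,
  stripe_customer_id text not null unique,      -- one Stripe Customer per tenant
  plan               text not null,
  seats              integer not null check (seats >= 0),
  updated_at         timestamptz not null default now()
);

-- One statement per webhook delivery. If the event id was already
-- recorded, new_event is empty and the entitlement upsert is skipped.
with new_event as (
  insert into public.stripe_events (event_id)
  values ('evt_example')
  on conflict (event_id) do nothing
  returning event_id
)
insert into public.tenant_entitlements
  (tenant_id, stripe_customer_id, plan, seats)
select '11111111-1111-1111-1111-111111111111'::uuid, 'cus_example', 'pro', 25
from new_event
on conflict (tenant_id) do update
  set plan       = excluded.plan,
      seats      = excluded.seats,
      updated_at = now();
```

Because Stripe does not guarantee event ordering, a real handler would likely re-read the current subscription state before writing rather than trust the payload of a single event.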

The outcome the founder cares about: the 3am incident where one tenant sees another's records does not happen, the team ships features instead of auditing tenant scope, and pricing experiments become a Stripe configuration change instead of a sprint.



What gets shipped

An engagement leaves your codebase with a tenancy layer your team can extend without calling an outside engineer back in. Concretely:

  • A tenancy schema: tenants (or organizations), memberships, roles, and the tenant_id column on every tenant-scoped table, with foreign keys and indexes set up so RLS policies can push down into index scans.
  • RLS policies on every tenant-scoped table, with ALTER TABLE ... FORCE ROW LEVEL SECURITY set on each, plus a CI check that fails the build if a new migration adds a tenant table without enabling and forcing RLS.
  • A JWT-based auth integration where tenant_id and role are claims on the access token, refresh tokens rotate on every use, and session revocation propagates within the refresh window.
  • Cross-app SSO wiring so users sign in once and land in any app in the suite with their tenant scope intact — verified against the same JWT issuer, reading the same claims.
  • Per-tenant Stripe Customer mapping with webhook handlers that translate Stripe state (subscription, seats, usage, credit pools) into entitlement rows in your database, so the application reads entitlement from your domain instead of round-tripping to Stripe.
  • A tenancy audit script that walks the schema, flags any tenant-scoped table missing RLS or missing FORCE, and reports any policy that reads the user-editable user_metadata claim (stored as raw_user_meta_data) instead of app_metadata (stored as raw_app_meta_data), which users cannot change themselves and is therefore the security-relevant claim source.
  • A runbook covering the failure modes that actually occur: RLS silently bypassed because FORCE was not set, policies that depend on auth.uid() and therefore see null for service-account queries, refresh-token rotation breaking a long-running mobile session, and Stripe webhook outages that let tenant entitlement drift out of sync.

```sql
-- Every tenant-scoped table gets both statements:
-- ENABLE alone still lets the table owner bypass the policy.
alter table public.documents enable row level security;
alter table public.documents force row level security;

-- USING filters the rows a query can see; WITH CHECK validates rows being written.
create policy tenant_isolation_modify on public.documents
  for all using (
    tenant_id = ((auth.jwt() -> 'app_metadata') ->> 'tenant_id')::uuid
  ) with check (
    tenant_id = ((auth.jwt() -> 'app_metadata') ->> 'tenant_id')::uuid
  );
```
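The CI check and audit script described in the list above could be built around a catalog query like this one. It is a sketch that assumes tenant-scoped tables live in the public schema and are recognizable by a column named tenant_id; the pg_class flags relrowsecurity and relforcerowsecurity are standard Postgres catalog columns.

```sql
-- Lists tables in schema public that have a tenant_id column
-- but are missing ENABLE or FORCE row level security.
select c.relname             as table_name,
       c.relrowsecurity      as rls_enabled,
       c.relforcerowsecurity as rls_forced
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
join pg_attribute a on a.attrelid = c.oid
where n.nspname = 'public'
  and c.relkind in ('r', 'p')        -- ordinary and partitioned tables
  and a.attname = 'tenant_id'
  and a.attnum > 0
  and not a.attisdropped
  and not (c.relrowsecurity and c.relforcerowsecurity)
order by c.relname;
```

A CI step can run the query against a database built from the migrations and fail the build if it returns any rows.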

This is the same architecture BoostFrame Engineering AI runs across seven production apps today, with the tenancy boundary enforced in Postgres, identity centralized at the auth provider, and per-tenant billing wired into Stripe through the same idempotent webhook pattern used across the suite.

What buyers ask first

The questions a technical founder asks before signing a multi-tenancy engagement tend to cluster around four: "Can you migrate our existing app without breaking paying customers?", "How do you avoid the tenant-leak incident I keep reading about on Hacker News?", "Will RLS slow down our queries?", and "How do we price per tenant without writing a billing layer?" The short answers: yes, with a backfill that gets its own rollback path; FORCE ROW LEVEL SECURITY on every tenant table, plus a CI check that no new table ships without it; not in any way you will notice if tenant_id is indexed; and one Stripe Customer per tenant with entitlement cached locally via webhook. The FAQ below covers the longer versions.

Common failure modes

The tenancy bugs that take down SaaS apps cluster into a few patterns that are worth naming explicitly so your team recognizes them on sight.

The first is RLS enabled but not forced. Postgres lets table owners bypass RLS by default, which means a policy that passes manual testing under a normal user account can be silently bypassed in production by any code that runs as the table owner, such as a migration. The fix is one ALTER TABLE ... FORCE ROW LEVEL SECURITY per table, plus a CI step that runs the audit script and fails the build on any table it flags.

The second is reading tenant_id from user-editable JWT metadata. In Supabase, the user_metadata claim (stored in raw_user_meta_data) can be updated by the signed-in user, so a malicious actor can change their own claim and read other tenants' data. Tenant claims belong in app_metadata (stored in raw_app_meta_data), which only the service role can update.
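One way to look for this pattern is to search the stored policy expressions. The sketch below uses the standard pg_policies view; the search strings assume policies reference the claim by its JWT name or its column name, so a policy that hides the lookup inside a helper function would not be caught.

```sql
-- Policies whose USING or WITH CHECK expression mentions user-editable metadata.
select schemaname, tablename, policyname
from pg_policies
where coalesce(qual, '')       ilike '%user_metadata%'
   or coalesce(with_check, '') ilike '%user_metadata%'
   or coalesce(qual, '')       ilike '%raw_user_meta_data%'
   or coalesce(with_check, '') ilike '%raw_user_meta_data%'
order by schemaname, tablename, policyname;
```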

The third is service-account queries that bypass tenancy by design. Background jobs, admin tools, and migration scripts often run with a service role that has BYPASSRLS or table-owner privileges. That is fine as long as those queries are audited individually — but the moment a service-account query is reused inside a user-facing endpoint, the tenant boundary disappears. The architecture separates the two paths so this confusion is hard to introduce.
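The separation of the two paths can be sketched at the role level. The role names below are hypothetical, and creating a BYPASSRLS role requires elevated privileges. Note that FORCE ROW LEVEL SECURITY does not apply to superusers or BYPASSRLS roles, so those roles need their own audit.

```sql
-- Hypothetical role names for the two paths.
create role app_request nologin nobypassrls;  -- user-facing endpoints
create role batch_admin nologin bypassrls;    -- individually audited jobs only

grant select, insert, update, delete on public.documents to app_request;
grant select on public.documents to batch_admin;

-- Audit: every role that skips RLS regardless of FORCE.
select rolname, rolsuper, rolbypassrls
from pg_roles
where rolsuper or rolbypassrls
order by rolname;
```

Keeping user-facing connections on a role like app_request means a query copied from a batch job into an endpoint runs under RLS by default.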

The fourth is webhook-driven entitlement drift. If a Stripe webhook fails and the retry does not land, the tenant's subscription state diverges from your local entitlement cache, and the customer either keeps access they no longer pay for or loses access they are still paying for. The reconciliation script (the same pattern used in the Stripe billing engagement) walks Stripe's event list (events.list) and re-runs any handler whose row is missing.
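The "row is missing" check can be sketched as an anti-join. The assumptions here: a hypothetical public.stripe_events table records every event the webhook handler has processed, and the reconciliation script loads the ids it fetched from Stripe's event list into a temporary staging table. The sample ids are placeholders.

```sql
-- Hypothetical table of events the webhook handler has already processed.
create table if not exists public.stripe_events (
  event_id    text primary key,
  received_at timestamptz not null default now()
);

-- Staging table filled by the reconciliation script from Stripe's event list.
create temp table stripe_events_listed (
  event_id   text primary key,
  event_type text not null
);

insert into stripe_events_listed (event_id, event_type) values
  ('evt_example_1', 'customer.subscription.updated'),
  ('evt_example_2', 'invoice.paid');

-- Events Stripe reports that were never recorded locally:
-- these are the handlers to re-run.
select l.event_id, l.event_type
from stripe_events_listed l
left join public.stripe_events e on e.event_id = l.event_id
where e.event_id is null
order by l.event_id;
```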

How BoostFrame approaches this

BoostFrame Engineering AI (BFEAI) runs the architecture described above across seven production apps, with shared identity, RLS-enforced tenant isolation, and per-tenant Stripe billing — the same stack that has generated 200K+ AI-assisted keywords, run 1,500+ AI scans, and automated 7,000+ sites for paying customers across the suite. The engagement is sized to your stage: 3–5 weeks for a seed or Series A startup, scoped against whether you are greenfield or refactoring, with the goal of leaving your team confident enough to ship the second and third apps in the suite without calling an outside engineer back in.

The deliverable is working code in your repository — RLS policies, the CI check that enforces them, the auth integration with refresh-token rotation, the per-tenant Stripe wiring — plus a runbook for the failure modes above and an audit script your team runs before every release.

Outcomes you should expect

What this delivers

  • Ship a tenancy model that does not need to be rewritten at Series B — RLS-enforced isolation that scales from 10 tenants to 10,000 without changing the data layer.
  • Avoid the single failure mode that ends SaaS companies — accidental cross-tenant data exposure, where one customer sees another's records and the incident hits Twitter before legal hears about it.
  • Wire per-tenant pricing tiers so plan changes are configuration, not engineering — onboard a new pricing experiment in days instead of a quarter.
  • Hand your team an auth layer they can extend: cross-app SSO, JWT with rotating refresh tokens, and session revocation that takes effect at the next token refresh instead of waiting for a long-lived token to expire.

Industry data

By the numbers

  • Postgres Row Level Security uses a default-deny policy — when RLS is enabled on a table and no policy exists, no rows are visible or modifiable, so the safe default for a tenant table is total lockout until a tenancy policy is written.


  • Postgres table owners normally bypass row security, which means a tenancy policy that looks correct in testing can be silently bypassed in production unless ALTER TABLE ... FORCE ROW LEVEL SECURITY is set on every tenant-scoped table.


  • Supabase's documentation states that RLS must always be enabled on any tables stored in an exposed schema, and that once RLS is on, no data is accessible via the API with a publishable key until a policy is written — the platform's security model assumes RLS is the tenant boundary.


  • Supabase Auth refresh tokens are single-use with a default 10-second reuse window; if a revoked refresh token is presented outside that window, the entire session is terminated and all refresh tokens belonging to it are revoked — which is how compromised tokens get caught instead of granting indefinite tenant access.


  • Stripe supports creating multiple simultaneous subscriptions per customer, each with separate billing periods and invoices, which is what lets a multi-tenant SaaS price tiers per workspace rather than per end-user without writing a custom billing layer.


Live in production today

The same engineering, shipped in production at BFEAI.

I'm co-founder & CTO of Be Found Everywhere (BFEAI), a 7-app AI SaaS platform running today. The work I deliver for clients is the work I do every week on my own platform.

  • 7 production apps
  • 200K+ keywords generated
  • 1,500+ AI scans run
  • 7,000+ sites automated

Common questions

What buyers ask before reaching out

Why use Postgres RLS instead of just filtering by tenant_id in application code?

Application-level filtering puts your entire tenant boundary on the shoulders of every engineer who writes a query. One forgotten WHERE clause in a new endpoint and a customer is reading another customer's data. RLS pushes the boundary down to Postgres itself, so a missing tenant filter in application code returns only the caller's own tenant's rows (or zero rows if no policy applies) instead of the wrong rows. It is defense in depth, not a replacement for application logic, and the official Postgres documentation describes it as a per-user restriction whose policy expression is evaluated for each row before any conditions from the user's query.

Doesn't RLS slow down queries?

RLS adds a WHERE-clause-equivalent to every query the planner sees, so the cost is dominated by whether the underlying indexes cover that filter. With a properly indexed tenant_id (or org_id) column and policy expressions that the planner can push down into the index scan, the overhead is small for most workloads. Where it gets expensive is on poorly indexed tables and on complex policy expressions that call functions — both of which are solvable with normal Postgres tuning.
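A sketch of the tuning described above, assuming the public.documents table and the tenant_isolation_modify policy from the example on this page. The index name is hypothetical. Wrapping auth.jwt() in a scalar subquery follows Supabase's RLS performance guidance, which lets the planner evaluate the function once per statement instead of once per row.

```sql
-- Index the column the policy filters on.
create index if not exists documents_tenant_id_idx
  on public.documents (tenant_id);

-- Same predicate as before, with auth.jwt() wrapped in a subquery.
alter policy tenant_isolation_modify on public.documents
  using (
    tenant_id = ((select auth.jwt()) -> 'app_metadata' ->> 'tenant_id')::uuid
  )
  with check (
    tenant_id = ((select auth.jwt()) -> 'app_metadata' ->> 'tenant_id')::uuid
  );

-- Run under the application role (not the owner) to see the plan
-- with the policy predicate applied.
explain (analyze, buffers)
select * from public.documents limit 50;
```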

How do you handle cross-app SSO across multiple SaaS apps in the same suite?

Centralize identity in one auth provider (Supabase Auth, Auth0, Clerk, or your own) and have every app verify the same JWT issuer. Each app reads tenant and role claims from the JWT and applies them to its own RLS policies, so the user signs in once and lands in any app with their tenant scope intact. Refresh-token rotation lives at the auth provider level, which is also where session revocation happens — sign-out propagates because the next refresh fails.

What's the 'tenant data bleed' failure mode you keep mentioning?

It's the SaaS founder's worst case: a query in one app, usually added in a hurry and often in a background job or admin endpoint, reads data scoped to the wrong tenant_id. With RLS enabled and FORCE ROW LEVEL SECURITY set, that query returns zero rows as long as it runs under a role that is subject to RLS. Without RLS, it returns somebody else's data, and the first time you find out is when a customer screenshots their competitor's records in a support ticket. The architecture is built so this class of bug is structurally impossible, not just unlikely.

How do you price per tenant without writing a custom billing layer?

Each tenant gets its own Stripe Customer record and one or more subscriptions tied to that customer. Plan tier, seat count, and metered usage all live on Stripe; entitlement (what the tenant can access) is computed from Stripe state and cached in your database via webhook handlers. Stripe supports multiple simultaneous subscriptions per customer with separate billing periods, which covers the cases where one tenant is on an annual flat plan and a usage-based add-on at the same time.

What happens if a refresh token leaks?

With refresh-token rotation, the leaked token only works once. The next legitimate refresh by the real user gets rejected because the token was already used, and the auth provider — Supabase Auth documents this as the default behavior — revokes the entire session and forces re-authentication. That is the point of rotation: a stolen long-lived token gets caught instead of granting indefinite tenant access.

Can you migrate an existing single-tenant app to this architecture, or only greenfield?

Both. Migrations are a different shape — you typically add tenant_id (or org_id) columns to existing tables, backfill from current user-to-org mappings, add RLS policies behind a feature flag, run a shadow audit to confirm policies do not break legitimate queries, then flip enforcement on. The risk is concentrated in the backfill step; we scope migrations so that step has its own checkpoint and rollback path.
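The steps above can be sketched for a single table. Everything named here is hypothetical: public.invoices with an owner_user_id column, a public.memberships(user_id, tenant_id) mapping, and the simplifying assumption that each user belongs to exactly one tenant. Users with several memberships would need an explicit rule for which tenant owns each row.

```sql
-- 1) Add the column as nullable so existing writes keep working.
alter table public.invoices add column tenant_id uuid;

-- 2) Backfill from the user-to-tenant mapping.
update public.invoices i
set    tenant_id = m.tenant_id
from   public.memberships m
where  m.user_id = i.owner_user_id
  and  i.tenant_id is null;

-- 3) Checkpoint: this count must be zero before going further.
select count(*) as rows_without_tenant
from public.invoices
where tenant_id is null;

-- 4) Lock in the column and index it.
alter table public.invoices alter column tenant_id set not null;
create index invoices_tenant_id_idx on public.invoices (tenant_id);

-- 5) Flip enforcement last, with the policy created in the same
--    transaction so roles subject to RLS are never locked out.
begin;
alter table public.invoices enable row level security;
alter table public.invoices force row level security;
create policy tenant_isolation on public.invoices
  for all
  using      (tenant_id = ((auth.jwt() -> 'app_metadata') ->> 'tenant_id')::uuid)
  with check (tenant_id = ((auth.jwt() -> 'app_metadata') ->> 'tenant_id')::uuid);
commit;
```

Until step 4, the rollback path is simply dropping the new column, which is why the checkpoint sits between the backfill and the NOT NULL constraint.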

How long does a typical engagement run?

A multi-tenant architecture engagement for a seed or Series A startup is typically a 3–5 week build, scoped against your stack and how much of the tenancy story you already have. Greenfield apps land on the shorter end because there's nothing to migrate. Mid-flight refactors of an app that already has paying customers run longer because the data migration and the cutover both need their own plan.

Ready to see if this is a fit?

A 15-minute call. No deck, no slides. We talk about what you're shipping and where engineering is the bottleneck. Either way, you walk away with a senior engineer's read on your situation.