← Back to blog

The Circular RLS Trap: What We Learned Building Multi-Tenant PostgreSQL for Senquel

This post is written for developers who are new to PostgreSQL's Row Level Security — or who understand the concept but haven't yet hit the specific failure modes that come with multi-tenant schema design. Every section below maps to something that went wrong in production, or a decision we got right by accident and then had to understand properly.

If you're already comfortable with RLS basics, jump to section 4 where the circular recursion bug starts.

1. What "multi-tenant" means for a database

Senquel is used by multiple interior design firms. Each firm's data — projects, clients, invoices, messages — must be completely invisible to every other firm. There are two common ways to enforce this:

  • One database per tenant: Physically separate databases. Simple isolation, but expensive to operate and painful to query across.
  • Shared database, tenant column: All firms share one database. Every row carries a firm_id column identifying which firm owns it. Cheaper, easier to maintain — but isolation is only as strong as your enforcement mechanism.

Senquel uses the second approach. The enforcement mechanism is PostgreSQL's Row Level Security.

📘 What is Row Level Security (RLS)?

RLS is a PostgreSQL feature that lets you attach a filter condition to a table. Every query against that table — regardless of which application, user, or tool is making the request — automatically has that condition applied.

Think of it like an invisible WHERE clause that can't be removed from outside the database. Even if your application code has a bug that forgets to filter by firm, the database adds it anyway.

Without RLS: SELECT * FROM projects → returns all projects for all firms.
With RLS: SELECT * FROM projects → returns only the projects belonging to the logged-in user's firm.

Entity-relationship diagram: Senquel database tables

firms id uuid PK name text created_at timestamptz firm_members firm_id uuid FK → firms user_id uuid FK → auth.users role owner|pm|designer whatsapp_phone text projects firm_id uuid FK → firms id uuid PK client_phone text E.164 name, status... project_members project_id FK → projects user_id FK → auth.users role lead|collaborator milestones project_id FK tasks project_id FK notes proj FK (junction table) tenant scope

Gold arrows show firm_id foreign keys that anchor every table to a specific tenant. Dashed box = junction table (many-to-many). Dashed arrows = child tables that inherit isolation from their parent project.

2. The schema — every table and column explained

Before looking at policies, it helps to understand exactly what each table and column does and why it exists.

firms

The root of the tenant hierarchy. Every other table traces back to a firm.

id  uuid, primary key
Auto-generated unique identifier for this firm. Used as the foreign key in every other tenant-scoped table.
name  text
Display name of the firm, e.g. "Studio Lekker ID".

firm_members

Links a Supabase auth user to a firm, and records their role within that firm. This is how the system knows which firm a logged-in user belongs to.

firm_id  uuid, references firms(id)
Which firm this membership belongs to. Foreign key → firms.id.
user_id  uuid, references auth.users(id)
The Supabase auth user. This is the value auth.uid() returns for a logged-in user — it's the bridge between authentication and your data.
role  text: 'owner' | 'pm' | 'designer'
What the user can do within the firm. Owners and PMs see all projects. Designers only see projects they're explicitly assigned to.
whatsapp_phone  text, nullable, E.164 format
The designer's WhatsApp number, used by the webhook to identify which firm an inbound message belongs to. Nullable because it was added in migration 014 — existing members didn't have it yet.
📘 What is auth.uid()?

Supabase Auth manages user sign-ins (Google, GitHub, magic link). When a user authenticates, Supabase issues a JWT (JSON Web Token) that travels with every request.

auth.uid() is a built-in PostgreSQL function (provided by Supabase) that reads the user's ID out of that JWT. Inside an RLS policy, it returns the UUID of whoever is making the current database request — or null if no one is logged in. It's the single source of truth for "who is asking."

projects

One row per renovation project. Directly scoped to a firm via firm_id.

id  uuid, primary key
Unique identifier for this project. Referenced by all child tables.
firm_id  uuid, not null, references firms(id)
The tenant scope column. Every RLS policy uses this to ensure a user can only see projects belonging to their firm. This is the single most important column in the whole schema.
client_phone  text, E.164 format e.g. "+6591234567"
The client's WhatsApp number. Inbound messages from this number are routed to this project as "client" messages.
designer_phone  text, nullable, E.164 format
The designer's WhatsApp number as stored on the project (legacy field). Superseded by firm_members.whatsapp_phone in migration 014, but kept as a fallback for existing records.
name, type, status, contract_value, start_date  various
Core project data. Not involved in RLS — purely business fields.

project_members

Added in migration 015. A junction table recording which users are assigned to which projects. This is what enables per-project visibility: a designer only sees a project if there's a row here linking them to it.

project_id  uuid, references projects(id) on delete cascade
Which project this assignment is for. on delete cascade means if the project is deleted, this row is automatically deleted too — no orphaned records.
user_id  uuid, references auth.users(id) on delete cascade
Which user is assigned. Same cascade behaviour — if a user is deleted from auth, their project memberships disappear too.
role  text: 'lead' | 'collaborator'
'lead' is the primary designer assigned to this project (usually the creator). 'collaborator' has the same visibility but secondary responsibility. Both roles grant access to the project.
added_by  uuid, nullable, references auth.users(id)
Who created this membership. Null when inserted by a system backfill or trigger (no human actor). Used for audit trails.
added_at  timestamptz, default now()
When this membership was created. Audit trail.
primary key (project_id, user_id)
A composite primary key — one row per (project, user) pair. Prevents duplicate assignments and is what ON CONFLICT (project_id, user_id) DO NOTHING relies on.
📘 What is a junction table?

A junction table (also called a join table or bridge table) represents a many-to-many relationship. A project can have many members. A user can be a member of many projects. You can't store this in either the projects or users table directly — so you create a third table where each row represents one connection.

project_members is exactly this: each row says "user X is a member of project Y with role Z."

milestones, tasks, notes

Child tables of projects. Each carries a project_id foreign key. Their RLS policies don't check firm_id directly — instead they check whether the parent project is visible, inheriting the firm isolation automatically.

milestones.status  'Upcoming' | 'In Progress' | 'Pending Approval' | 'Complete'
The workflow state of this milestone. Advances linearly. 'Pending Approval' is when the client needs to sign off before the designer can mark it Complete.
milestones.amount_sgd  numeric
The payment amount tied to this milestone (percentage of contract value). Used to calculate outstanding balance.
milestones.target_date  date, nullable
Expected completion date. Nullable — not all milestones have a fixed date when created.

3. How RLS policies actually work

A policy has two parts: USING and WITH CHECK. They do different things and it's easy to confuse them.

📘 USING vs WITH CHECK
USING (expression) — applied to reads: SELECT, UPDATE (to decide which rows can be targeted), DELETE. Rows where the expression is false are invisible — they're filtered out silently, as if they don't exist.
WITH CHECK (expression) — applied to writes: INSERT and UPDATE (the new values). If the expression is false for the row being written, PostgreSQL throws an error rather than silently discarding the write.
FOR ALL with both clauses means the same expression governs both reads and writes. FOR SELECT only applies the USING clause and ignores writes entirely.

With that in mind, the original firm-scoped policy for projects was:

create policy "firm members can access their firm projects"
  on projects for all
  using  (firm_id = get_my_firm_id())   -- reads: only show rows from my firm
  with check (firm_id = get_my_firm_id()); -- writes: only allow writing to my firm

4. The SECURITY DEFINER helper pattern

Inside a policy expression you can write any SQL — including subqueries. The obvious way to find the current user's firm would be:

-- Naive approach — works, but fragile
using (firm_id = (
  select firm_id from firm_members where user_id = auth.uid() limit 1
))

This runs the subquery with RLS active on firm_members too. That's fine as long as firm_members's own RLS policy doesn't reference projects. But it's a trap waiting to be walked into. The safe approach is a SECURITY DEFINER helper function that performs this lookup outside the RLS context entirely.

📘 What does SECURITY DEFINER mean?

By default, a function runs with the privileges of the calling user (this is called SECURITY INVOKER). If the user can only see their own rows, the function only sees those rows too.

SECURITY DEFINER flips this: the function runs with the privileges of the function owner — typically postgres, the superuser. The superuser is exempt from RLS. So a SECURITY DEFINER function sees all rows in all tables, regardless of who called it.

This sounds dangerous, and it would be if the function returned raw data. The safe pattern is to use SECURITY DEFINER only for small helper functions that return a single fact (a UUID, a boolean, a set of IDs) — never functions that return full rows.

create or replace function get_my_firm_id()
returns uuid
language sql
security definer   -- runs as postgres; bypasses all RLS
stable             -- (see note below)
set search_path = public  -- (see note below)
as $$
  select firm_id
  from   firm_members
  where  user_id = auth.uid()
  limit 1;
$$;

-- Revoke from public first, then grant only to authenticated users
revoke all on function get_my_firm_id() from public;
grant execute on function get_my_firm_id() to authenticated;
📘 Why STABLE and set search_path = public?

STABLE tells PostgreSQL that this function returns the same result for the same inputs within a single transaction. Since auth.uid() doesn't change mid-query, this is true. PostgreSQL uses this hint to cache the result rather than re-running the function for every row it evaluates — which matters a lot when RLS evaluates your policy on thousands of rows in a loop.

set search_path = public is a security hardening measure. Without it, a malicious database user could create a schema that shadows the public schema (e.g. a fake firm_members table in a different schema), then exploit the SECURITY DEFINER escalation to run their code as postgres. Setting the search path explicitly prevents this class of injection attack. Always include it on any SECURITY DEFINER function.

With this helper in place, the projects policy is clean and loop-free:

create policy "firm members can access their firm projects"
  on projects for all
  using  (firm_id = get_my_firm_id())
  with check (firm_id = get_my_firm_id());

This worked cleanly for six migrations. Then migration 015 introduced project_members, and every project in the system silently disappeared overnight.

5. Migration 015: adding project-level visibility

Until migration 015, every firm member could see every project in their firm. The goal was finer control: owners and PMs keep blanket visibility; designers can only see projects they're explicitly assigned to.

We created the project_members junction table and rewrote the projects SELECT policy to check it. Here's the policy in full:

create policy "firm members select projects"
  on projects for select
  using (
    firm_id = get_my_firm_id()          -- must be in my firm
    AND (
      -- Owners and PMs: see all projects in the firm
      exists (
        select 1 from firm_members fm
        where fm.user_id = auth.uid()
          and fm.firm_id = projects.firm_id
          and fm.role in ('owner', 'pm')
      )
      OR
      -- Designers: only see projects they're a member of
      exists (
        select 1 from project_members pm
        where pm.project_id = projects.id
          and pm.user_id = auth.uid()
      )
    )
  );
📘 Why exists (select 1 ...) and not select count(*)?

exists (select 1 from ...) is an existence check — it returns true as soon as it finds one matching row, then stops. It doesn't count, sort, or aggregate anything.

select count(*) > 0 scans all matching rows before returning. Inside a policy that evaluates per-row, that extra work adds up. EXISTS with select 1 (or any constant) is the canonical fast pattern for "does at least one row like this exist?"

We also gave project_members its own RLS policy, so designers can't read the membership table for projects they're not part of:

create policy "firm members read project_members"
  on project_members for select
  using (
    project_id in (
      select id from projects where firm_id = get_my_firm_id()
    )
  );

Both policies look correct in isolation. The problem only appears when PostgreSQL tries to evaluate them together.

6. The circular recursion bug

To understand the loop, trace exactly what PostgreSQL does when an authenticated designer runs SELECT * FROM projects:

The loop PostgreSQL walks into:
① User runs: SELECT * FROM projects
↓ PostgreSQL applies the "firm members select projects" RLS policy
↓ Policy evaluates: exists (SELECT 1 FROM project_members WHERE …)
→ To read project_members, PostgreSQL must apply RLS on project_members too
↓ "firm members read project_members" policy evaluates:
project_id IN (SELECT id FROM projects WHERE firm_id = get_my_firm_id())
→ To read projects here, PostgreSQL must apply RLS on projects again
↓ "firm members select projects" policy evaluates — back to ①
✗ ERROR: infinite recursion detected in policy for relation "projects"
⚠ Key point: PostgreSQL does not catch this at migration time. The CREATE POLICY statement succeeds with no error. The recursion only surfaces at query time — in production, affecting every authenticated read. Every project became invisible simultaneously with no warning in the migration output.

Because this error fires on every authenticated SELECT on projects, it also blocked new project creation — the post-insert trigger that adds the creator as a lead member tries to read projects, hits the loop, and fails silently.

7. Migration 016: breaking the loop

The root cause: project_members's policy ran a bare subquery on projects, which re-triggered that table's RLS. The fix is a second SECURITY DEFINER helper that fetches the same information — which projects belong to the current user's firm — without going through RLS at all.

create or replace function get_my_firm_project_ids()
returns setof uuid   -- returns a set (zero or more UUIDs), not just one
language sql
security definer     -- bypasses RLS; joins projects + firm_members raw
stable
set search_path = public
as $$
  select p.id
  from   projects p
  join   firm_members fm on fm.firm_id = p.firm_id
  where  fm.user_id = auth.uid()
$$;

revoke all on function get_my_firm_project_ids() from public;
grant execute on function get_my_firm_project_ids() to authenticated;
📘 returns uuid vs returns setof uuid

returns uuid (used in get_my_firm_id()) returns exactly one value. If the query returns zero rows, you get NULL.

returns setof uuid returns a result set — zero or more rows, each containing one UUID. This is what we need here: a user might belong to a firm with many projects. The caller uses it as project_id in (select get_my_firm_project_ids()), which handles the set correctly.

The project_members policies are then rewritten to use this helper instead of the raw subquery:

-- Before (causes recursion):
create policy "firm members read project_members"
  on project_members for select
  using (project_id in (
    select id from projects where firm_id = get_my_firm_id()
    --              ^^^^^^^^ this triggers projects RLS → loop
  ));

-- After (loop-free):
create policy "firm members read project_members"
  on project_members for select
  using (project_id in (select get_my_firm_project_ids()));
  --                           ^^^^^^^^^^^^^^^^^^^^^^^^
  --     SECURITY DEFINER function — reads projects without triggering RLS
The same query — no loop this time:
① User runs: SELECT * FROM projects
↓ "firm members select projects" policy evaluates
↓ checks: exists (SELECT 1 FROM project_members WHERE …)
↓ "firm members read project_members" policy evaluates
↓ calls get_my_firm_project_ids() — SECURITY DEFINER
↓ function runs as postgres — reads projects + firm_members with no RLS
↓ returns a set of UUIDs — exits cleanly, no recursion
✓ policy evaluation completes — correct rows returned

8. The backfill — and what idempotent means

Migration 016 also re-ran the membership backfill: inserting a project_members row for every (project, firm member) pair so that existing projects remained visible after the new visibility rules kicked in.

insert into project_members (project_id, user_id, role, added_by, added_at)
select p.id, fm.user_id, 'collaborator', null, now()
  from projects p
  join firm_members fm on fm.firm_id = p.firm_id
  on conflict (project_id, user_id) do nothing;
📘 What does idempotent mean?

An operation is idempotent if running it multiple times produces the same result as running it once. ON CONFLICT (project_id, user_id) DO NOTHING makes this insert idempotent: if the row already exists (same project + user pair), PostgreSQL skips it silently instead of throwing a duplicate key error.

This is important for migrations: if the migration is re-run, or if migration 015's backfill already ran, you don't get a crash. Backfills in production should almost always be written this way.

9. Learnings

  • Never let two tables' RLS policies reference each other through regular subqueries. If Table A's policy checks Table B, and Table B's policy checks Table A, PostgreSQL recurses until it errors — at query time, not migration time.
  • SECURITY DEFINER helpers are the correct escape valve. They should return a minimal safe result (a UUID, a set of IDs) that a policy then uses. Never a function that returns full rows. They're a gatekeeper — not a shortcut around security.
  • Always add three flags to every SECURITY DEFINER function: STABLE (cache within transaction), set search_path = public (prevent schema injection), and explicit GRANT/REVOKE to the right roles only.
  • RLS is not validated at migration time. CREATE POLICY succeeds even if your policy would cause infinite recursion. The error only appears at query time. Add a post-migration smoke test: run SELECT COUNT(*) FROM projects as an authenticated user before calling a migration complete.
  • Write backfills idempotently. ON CONFLICT … DO NOTHING costs nothing and prevents a re-run from crashing. Every backfill in a production migration should include it.

All migrations referenced here are in the supabase/migrations/ directory. Migrations 005 (baseline RLS), 015 (project_members + the bug), and 016 (the fix) are the three most instructive to read in sequence.

Comments

Sign in to leave a comment.