PostgreSQL Connection Pool
Exhaustion — Root Cause & Fix

P0–P1 — Service Down Category: Database / Connections Avg diagnosis: 40 min manual → seconds with OM
🚨 Hitting this right now? Get root cause in seconds.
Paste your PostgreSQL error logs — free, no signup
⚡ Analyze now →

01The error

FATAL: remaining connection slots are reserved for non-replication superuser connections
FATAL: sorry, too many clients already
ERROR: max_connections limit reached (100/100)

These errors mean your PostgreSQL server has hit its max_connections limit. Every new connection attempt fails until an existing connection is closed. Services start timing out, queuing, or returning 500 errors to users.

02Symptoms

🔴
All new DB connections refused
Services can't open new connections. API endpoints return 500 or 503.
🔴
p99 latency spike to 30s+
Requests queue waiting for a connection slot. Eventually timeout.
🟡
Connection pool wait queue growing
PgBouncer/application pool logs show queue depth increasing.
🟡
Sudden spike in pg_stat_activity count
More connections than normal — connection leak or traffic burst.

03Immediate diagnosis — 4 commands

Step 1 — How many connections right now?

-- Check current vs max connections
SELECT count(*) as current,
       (SELECT setting::int FROM pg_settings WHERE name='max_connections') as max_allowed,
       count(*) * 100 / (SELECT setting::int FROM pg_settings WHERE name='max_connections') as pct_used
FROM pg_stat_activity;

-- See connections by state
SELECT state, count(*) 
FROM pg_stat_activity 
GROUP BY state 
ORDER BY count DESC;

Step 2 — Who is holding connections?

-- See all active connections with details
SELECT pid, usename, application_name, client_addr, 
       state, wait_event_type, wait_event,
       now() - state_change as duration,
       LEFT(query, 80) as query_preview
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Find long-running queries (potential locks)
SELECT pid, now() - query_start as duration, query
FROM pg_stat_activity
WHERE state = 'active' 
  AND now() - query_start > interval '30 seconds'
ORDER BY duration DESC;

Step 3 — Identify idle connections (connection leak)

-- Idle connections wasting slots
SELECT count(*) as idle_count, application_name
FROM pg_stat_activity
WHERE state = 'idle'
GROUP BY application_name
ORDER BY idle_count DESC;

-- If idle count is high → connection leak or pool misconfiguration

Step 4 — Check current max_connections setting

SHOW max_connections;
SHOW shared_buffers;

-- Rule of thumb: max_connections should be
-- RAM_GB * 1000 / 10 (assuming ~10MB per connection)
-- e.g. 4GB RAM → max 400 connections safely

04Root causes — ranked by frequency

1. No connection pooler (PgBouncer) — most common (~50%)

Each application instance opens its own direct PostgreSQL connections. With 10 app pods × 10 connections each = 100 connections. Add any background workers and you hit the limit quickly.

# Fix: Add PgBouncer between app and Postgres
# PgBouncer multiplexes hundreds of app connections into
# a small pool of actual Postgres connections
# Typical config: 20 actual PG connections serves 500 app connections

2. Connection leak — app not closing connections (~25%)

Application opens connections but doesn't properly close them. Idle connections accumulate. Classic in ORMs that don't return connections to the pool on error paths.

-- Terminate all idle connections older than 10 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < NOW() - INTERVAL '10 minutes'
  AND pid != pg_backend_pid();

-- Set idle connection timeout in postgresql.conf
idle_in_transaction_session_timeout = '10min'

3. max_connections too low for workload (~15%)

Default PostgreSQL max_connections is 100. Modern applications with multiple services and replicas routinely need more.

-- In postgresql.conf
max_connections = 200  # Increase as needed

-- REQUIRES PostgreSQL restart
sudo systemctl restart postgresql

-- Or in RDS/Cloud SQL: modify parameter group

4. Traffic spike without connection pool headroom (~10%)

Normal pool configuration, but a sudden traffic surge exhausted the pool. New connections queued and timed out.

05OperatorMesh analysis — real example

⚡ OperatorMesh Triage Output

Input pasted:

service: user-service
error: query timeout after 30s — p99 latency 40ms → 31000ms
logs: FATAL: remaining connection slots reserved for superuser
      pg_stat_activity shows 98/100 connections
      87 connections in 'idle' state from app-server
recent changes: added created_at filter in v5.2.0, no index added

Root cause identified:

Connection pool exhaustion — 87 idle connections leaked from application layer. No PgBouncer between app and database. New connections queued and timed out at 30s.

94%
Diagnosis confidence
88%
Fix confidence

Ranked actions:

06Emergency fix — restore service now

-- 1. Terminate idle connections immediately (safe — idle = not doing work)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND pid != pg_backend_pid();

-- 2. Verify connections freed
SELECT count(*), state FROM pg_stat_activity GROUP BY state;

-- 3. Check service recovery
-- Application should reconnect and start serving requests

-- 4. Kill long-running queries blocking connections (if any)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE now() - query_start > interval '5 minutes'
  AND state = 'active';

07Permanent fix — add PgBouncer

# pgbouncer.ini — transaction pooling mode
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000    # App can open up to 1000 connections
default_pool_size = 20    # Only 20 actual PG connections used
min_pool_size = 5
reserve_pool_size = 5
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
🚨 Hitting PostgreSQL connection issues right now?
Paste your pg_stat_activity output or error logs for instant root cause analysis. Free — no signup needed.
⚡ Analyze my PostgreSQL incident →

Related incident patterns