Redis Connection Exhaustion: Root Cause & Fix

P1: Service Degraded
Category: Redis / Connections
Error: ERR max number of clients reached


01 The error

ERR max number of clients reached
REDIS_CONNECTION_ERROR: connect ECONNREFUSED 127.0.0.1:6379
ReplyError: ERR max number of clients reached

# In application logs:
Error: Redis connection pool exhausted — timeout waiting for connection
RedisError: Connection pool timeout after 5000ms

Redis has a maxclients setting (default: 10,000) that caps simultaneous client connections. Once the cap is reached, every new connection attempt is rejected immediately with ERR max number of clients reached. Because Redis connections are far more lightweight than PostgreSQL connections, hitting this limit usually signals a connection leak rather than genuine load.

02 Symptoms

🔴 ERR max number of clients reached

Redis is refusing all new connections. Cache reads and writes fail across all services.

🔴 Application cache errors spiking

Session lookups, rate limiting, feature flags: anything backed by Redis starts failing.

🟡 connected_clients climbing continuously

Redis INFO shows connected_clients growing without dropping back down, which is the typical connection-leak pattern.

🟡 Memory not proportionally high

If the connection count is high but memory usage is normal, the cause is a connection leak rather than data growth.

03 Immediate diagnosis

Step 1 — Check current connection count

# Connect to Redis and check info
redis-cli -h <host> -p <port> INFO clients

# Output you'll see:
connected_clients:9987       # Current connections
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
maxclients:10000            # Your limit

# Check the limit
redis-cli CONFIG GET maxclients

Step 2 — See who is connected

# List all connected clients
redis-cli CLIENT LIST

# Each line shows fields such as addr, fd, name, age, idle, and cmd
# Look for a high idle value on connections that should be busy: likely a leak
# addr=10.0.1.5:54321 ... idle=3847 ... cmd=NULL
# ^^ idle for 3847 seconds: probably a leaked connection

# Count connections per source IP
redis-cli CLIENT LIST | awk '{print $2}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -20
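
The per-IP count can also be scripted, which makes it easy to apply an idle threshold at the same time. The following is a minimal sketch in Python that works on saved CLIENT LIST text; the function names, the sample lines, and the 300-second threshold are illustrative assumptions, not part of Redis.

```python
from collections import Counter

def parse_client_list(text):
    """Parse CLIENT LIST output into one dict of fields per client line."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = {}
        for token in line.split():
            key, _, value = token.partition("=")
            fields[key] = value
        clients.append(fields)
    return clients

def idle_connections_per_ip(text, min_idle_seconds=300):
    """Count clients per source IP that have been idle at least min_idle_seconds."""
    counts = Counter()
    for client in parse_client_list(text):
        if int(client.get("idle", 0)) >= min_idle_seconds:
            ip = client["addr"].rsplit(":", 1)[0]  # drop the source port
            counts[ip] += 1
    return counts

# Hypothetical sample; real CLIENT LIST lines carry more fields.
SAMPLE = """\
id=7 addr=10.0.1.5:54321 fd=8 name= age=4000 idle=3847 cmd=NULL
id=8 addr=10.0.1.5:54322 fd=9 name= age=3900 idle=3700 cmd=get
id=9 addr=10.0.1.9:40100 fd=10 name= age=120 idle=2 cmd=get
"""

print(idle_connections_per_ip(SAMPLE).most_common())  # [('10.0.1.5', 2)]
```

A source IP that dominates this count is the first place to look for a leaking service.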

Step 3 — Check connection rate

redis-cli INFO stats | grep -E "total_connections_received|rejected_connections"

# total_connections_received: running total since server start
# rejected_connections: connections refused because maxclients was reached (also cumulative)
# If rejected_connections keeps increasing between runs, you're actively hitting the limit
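
Because both counters are cumulative since server start, comparing two samples shows whether connections are being rejected right now. A minimal sketch in Python, assuming the INFO stats output has been captured twice as text; the function names and the sample values are hypothetical.

```python
def parse_info(text):
    """Parse INFO output (key:value lines) into a dict, skipping section headers."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        stats[key] = value
    return stats

def rejected_since(before, after):
    """Return how many connections were rejected between two INFO stats samples."""
    earlier = int(parse_info(before)["rejected_connections"])
    later = int(parse_info(after)["rejected_connections"])
    return later - earlier

# Hypothetical samples taken one minute apart.
BEFORE = "# Stats\ntotal_connections_received:120000\nrejected_connections:40\n"
AFTER = "# Stats\ntotal_connections_received:120900\nrejected_connections:415\n"

print(rejected_since(BEFORE, AFTER))  # 375
```

A non-zero difference means the limit was hit during the sampling window; a constant value means the rejections happened earlier.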

04 Root causes, ranked by frequency

1. Connection leak — not returning to pool (~55%)

The most common cause. Application code opens Redis connections but error paths (exceptions, timeouts, crashes) don't properly return them to the pool. Connections accumulate as idle/leaked.

// Bad pattern (connection leak)
async function getUser(id) {
  const client = await pool.getConnection();  // gets connection
  try {
    return await client.get(`user:${id}`);
  } catch (err) {
    throw err;  // connection never returned, on success or on error!
  }
}

// Good pattern (connection always returned)
async function getUser(id) {
  const client = await pool.getConnection();
  try {
    return await client.get(`user:${id}`);
  } finally {
    client.release();  // always release, even on error
  }
}
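
The same guarantee can be expressed in Python with a context manager, so callers cannot forget the release step. This is a minimal sketch: FakePool is a hypothetical stand-in that only counts checked-out connections, and borrowed is an illustrative helper name, not a redis-py API.

```python
from contextlib import contextmanager

class FakePool:
    """Stand-in pool that only tracks how many connections are checked out."""

    def __init__(self):
        self.in_use = 0

    def get_connection(self):
        self.in_use += 1
        return object()

    def release(self, conn):
        self.in_use -= 1

@contextmanager
def borrowed(pool):
    """Check out a connection and return it to the pool on exit, even on error."""
    conn = pool.get_connection()
    try:
        yield conn
    finally:
        pool.release(conn)

pool = FakePool()
try:
    with borrowed(pool) as conn:
        raise TimeoutError("simulated Redis timeout")
except TimeoutError:
    pass

print(pool.in_use)  # 0: the connection was released despite the exception
```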

2. New deployment created too many pool instances (~20%)

A deploy scaled the service from 5 to 20 pods, each with a connection pool of 50. Total connections went from 5 × 50 = 250 (fine) to 20 × 50 = 1,000, which hits the limit when maxclients is set near 1,000, as it can be on some managed plans.

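// Check how many connections each service process opens
// In Node.js (ioredis): there is no pool-size option. Each client multiplexes
// commands over its own connection (a Cluster client holds roughly one per node),
// so share one client per process instead of creating one per request.
const Redis = require('ioredis');
const redis = new Redis.Cluster(nodes, {
  scaleReads: 'slave',
  redisOptions: { maxRetriesPerRequest: 3 }
});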

# In Python (redis-py):
import redis

pool = redis.ConnectionPool(
    host='localhost',
    port=6379,
    max_connections=10  # Lower per-process pool size
)
client = redis.Redis(connection_pool=pool)  # all commands share this pool
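
The scale-out arithmetic can be checked before a deploy. The sketch below, in Python, computes total connections and the largest per-pod pool size that stays within a budget; the function names, the 20% headroom, and the maxclients value of 1,000 are assumptions chosen to match the example above.

```python
def total_connections(pods, pool_size):
    """Worst-case connections when every pod fills its pool."""
    return pods * pool_size

def max_pool_size_per_pod(maxclients, pods, headroom_percent=20):
    """Largest per-pod pool size that keeps the total under maxclients minus headroom."""
    budget = maxclients * (100 - headroom_percent) // 100
    return budget // pods

print(total_connections(5, 50))         # 250
print(total_connections(20, 50))        # 1000
print(max_pool_size_per_pod(1000, 20))  # 40
```

The headroom leaves room for admin tools, monitoring agents, and replicas, which also count against maxclients.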

3. maxclients set too low (~15%)

Default is 10,000 — usually fine. But some managed Redis providers (ElastiCache small instances, Redis Cloud) set lower limits. Check your provider's actual limit.

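# Increase maxclients at runtime (no restart needed)
redis-cli CONFIG SET maxclients 20000

# Make persistent in redis.conf
maxclients 20000

# Self-hosted: the process file-descriptor limit (ulimit -n) must also allow the new value
# AWS ElastiCache: maxclients is documented as non-modifiable; check the value for your node type and move to a larger one if needed
# Redis Cloud: contact support or upgrade plan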

4. SUBSCRIBE/PSUBSCRIBE connections not cleaned up (~10%)

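Pub/sub subscriber connections are long-lived. If a subscriber crashes without closing its socket cleanly, Redis can keep the connection open until TCP keepalive detects the dead peer, and the idle timeout setting does not apply to pub/sub clients.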

# See subscription connections
redis-cli CLIENT LIST TYPE pubsub | awk '{print $2, $NF}'

# Kill stale subscription connections
redis-cli CLIENT KILL ID <client-id>

# Or have the server close normal (non-pub/sub) clients idle for more than 300 seconds
redis-cli CONFIG SET timeout 300

05 OperatorMesh analysis: real example

⚡ OperatorMesh Triage Output

service: session-service
error: ERR max number of clients reached — cache reads failing
logs: connected_clients: 9987/10000
      CLIENT LIST shows 8400 connections idle > 600s from app-server-pods
      recent changes: deployed v2.1 — scaled from 8 to 32 pods

Root cause:

Connection leak compounded by pod scale-out. Each of the 32 pods maintains a pool of 250 connections, for 32 × 250 = 8,000 in total (vs 8 × 250 = 2,000 before). The 8,400 idle connections indicate error paths that do not return connections to the pool.

Diagnosis confidence: 89%

Fix confidence: 82%

1. Close connections idle for more than 5 minutes: CONFIG SET timeout 300
2. Reduce pool size per pod from 250 to 20 in session-service config: 32 × 20 = 640 total
3. Audit error handling in session-service and add finally{} blocks to guarantee pool release

06 Emergency fix: restore service now

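# 1. Close idle connections: the server drops normal clients idle for more than 60 seconds
redis-cli CONFIG SET timeout 60

# 2. Or kill connections from a specific IP (if one service is leaking)
redis-cli CLIENT LIST | grep -F " addr=<app-server-ip>:" | awk '{print $1}' | cut -d= -f2 | \
  xargs -n1 redis-cli CLIENT KILL ID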

# 3. Temporarily increase maxclients to buy time
redis-cli CONFIG SET maxclients 20000

# 4. Monitor connected_clients dropping
watch -n 2 "redis-cli INFO clients | grep connected_clients"

# 5. Once connections drop — identify the leaking service
redis-cli CLIENT LIST | awk '{print $2}' | cut -d: -f1 | \
  sort | uniq -c | sort -rn | head -5
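
Where an idle threshold has to be applied client-side (for example, when the server's CLIENT KILL filters do not cover idle time), the client IDs can be selected from CLIENT LIST output and killed one by one. A minimal sketch in Python that only builds the command strings; the function name, the sample lines, and the 60-second threshold are hypothetical.

```python
def kill_commands_for_idle(client_list_text, min_idle_seconds=60):
    """Build one CLIENT KILL ID command per client idle at least min_idle_seconds."""
    commands = []
    for line in client_list_text.splitlines():
        # Each token looks like key=value; keep the key and the value.
        fields = dict(token.partition("=")[::2] for token in line.split())
        if "id" in fields and int(fields.get("idle", 0)) >= min_idle_seconds:
            commands.append(f"CLIENT KILL ID {fields['id']}")
    return commands

# Hypothetical sample; real CLIENT LIST lines carry more fields.
SAMPLE = """\
id=7 addr=10.0.1.5:54321 fd=8 name= age=4000 idle=3847 cmd=NULL
id=8 addr=10.0.1.5:54322 fd=9 name= age=3900 idle=3700 cmd=get
id=9 addr=10.0.1.9:40100 fd=10 name= age=120 idle=2 cmd=get
"""

for command in kill_commands_for_idle(SAMPLE):
    print(command)
# CLIENT KILL ID 7
# CLIENT KILL ID 8
```

Printing the commands first gives a chance to review the list before piping it into redis-cli.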

07 Prevention

Always use try/finally: guarantee connection pool release even on errors
Set a connection timeout: timeout 300 in redis.conf closes idle normal clients automatically (pub/sub clients are exempt)
Alert on connected_clients > 80% of maxclients: catch leaks before they cause outages
Size pools per pod: when scaling out, reduce pool size per instance proportionally
Use CLIENT SETNAME: name your connections so CLIENT LIST shows which service owns them
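
The 80% alert rule can be sketched as a small check over INFO clients output. This Python example is illustrative only: the function names and sample text are assumptions, and maxclients can be passed explicitly for servers whose INFO output does not report it.

```python
def client_utilization(info_text, maxclients=None):
    """Return connected_clients / maxclients from INFO clients output."""
    info = {}
    for line in info_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition(":")
            info[key] = value
    limit = maxclients if maxclients is not None else int(info["maxclients"])
    return int(info["connected_clients"]) / limit

def should_alert(info_text, threshold=0.8, maxclients=None):
    """True when connection utilization exceeds the alert threshold."""
    return client_utilization(info_text, maxclients) > threshold

# Hypothetical sample matching the numbers used earlier in this guide.
SAMPLE = "# Clients\nconnected_clients:9987\nblocked_clients:0\nmaxclients:10000\n"

print(should_alert(SAMPLE))  # True
```

In practice the same ratio would usually be computed by the monitoring system from exported metrics rather than by parsing text.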
