day view on mobile bug fix
**RosterChirp** is a self-hosted, closed-source, full-stack Progressive Web App for team messaging. It supports both single-tenant (selfhost) and multi-tenant (host) deployments.

**Current version:** 0.12.27

---
```
rosterchirp/
│ │ └── auth.js        ← JWT auth, teamManagerMiddleware
│ ├── models/
│ │ ├── db.js          ← Postgres pool, query helpers, migrations, seeding
│ │ └── migrations/    ← 001–008 SQL files, auto-applied on startup
│ ├── routes/
│ │ ├── auth.js
│ │ ├── groups.js      ← receives io
```
## Version Bump — Files to Update

When bumping the version (e.g. 0.12.27 → 0.12.28), update **all three**:

```
backend/package.json    "version": "X.Y.Z"
frontend/package.json   "version": "X.Y.Z"
build.sh                VERSION="${1:-X.Y.Z}"
```

One-liner:
```bash
OLD=0.12.27; NEW=0.12.28
sed -i "s/\"version\": \"$OLD\"/\"version\": \"$NEW\"/" backend/package.json frontend/package.json
sed -i "s/VERSION=\"\${1:-$OLD}\"/VERSION=\"\${1:-$NEW}\"/" build.sh
```
**Critical:** The map key is `${schema}:${userId}` — not bare `userId`. Integer IDs are per-schema, so two tenants can have the same user ID. Without the schema prefix, push notifications and online presence would leak across tenants.

**Scale note:** This in-process Map is a single-server construct. See Phase 2 (Redis) for the multi-instance replacement.

---

## Active Sessions
---

## Scale Architecture

### Context

RosterChirp-Host is expected to grow to 100,000+ tenants, with some tenants having 300+ users — potentially millions of concurrent users total. The current single-process, single-database architecture has well-understood ceilings. This section documents what those ceilings are, what needs to change, and exactly how to implement each phase.

### How Messages Are Currently Loaded (No Problem Here)

Messages are **not** pre-loaded into server memory. The backend uses cursor-based pagination:
- On conversation open: fetches the most recent **50 messages** via `ORDER BY created_at DESC LIMIT 50`
- "Load older messages" button: fetches the next 50 using `before={oldest_message_id}` as a cursor
- Each fetch is a fast indexed Postgres query; the Node process returns results and discards them immediately

The `messages` array grows in the **browser tab** as users scroll back (each "load more" prepends 50 items to React state). At extreme history depth this affects browser memory and scroll performance — a virtual scroll window would fix it — but this is a client-side concern, not a server concern.
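The cursor logic above can be sketched as a small query builder (illustrative names, not the actual route code; it assumes message ids increase with creation time, so the `id` cursor agrees with `created_at DESC` ordering):

```javascript
// Illustrative sketch of the cursor described above — not the real route.
function buildMessagesQuery(conversationId, before) {
  const params = [conversationId];
  let where = 'conversation_id = $1';
  if (before) {
    params.push(before); // oldest already-loaded message id as the cursor
    where += ` AND id < $${params.length}`;
  }
  return {
    text: `SELECT * FROM messages WHERE ${where} ORDER BY created_at DESC LIMIT 50`,
    params,
  };
}
```

The first page omits the cursor clause entirely; each "load more" passes the oldest loaded id as `before`.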
### Current Architecture Ceilings

| Resource | Current Config | Approximate Ceiling |
|---|---|---|
| Node.js processes | 1 | ~10,000–30,000 concurrent sockets |
| Postgres connections | Pool max 20 | Saturates under concurrent load |
| `onlineUsers` Map | In-process JavaScript Map | Lost on restart; not shared across instances |
| `tenantDomainCache` | In-process JavaScript Map | Stale on other instances after update |
| File storage | `/app/uploads` (container volume) | Not accessible across multiple instances |

### Scale Targets by Phase

| Phase | Concurrent Users | Architecture |
|---|---|---|
| Current | ~5,000–10,000 | Single Node, single Postgres |
| Phase 1 (PgBouncer) | ~20,000–40,000 | + connection pooler, no code changes |
| Phase 2 (Redis) | ~200,000–500,000 | + Redis, multiple Node instances |
| Phase 3 (Read replicas) | ~500,000–1,000,000 | + Postgres streaming replication |
| Phase 4 (Sharding) | 1,000,000+ | Multiple Postgres clusters, regional deploy |

---
## Phase 1 — PgBouncer (Implement Now)

### What It Does

PgBouncer sits between the Node app and Postgres as a connection pooler. Instead of Node holding up to 20 long-lived Postgres connections, PgBouncer maintains a pool of, say, 100 server-side Postgres connections and multiplexes thousands of short application requests onto them. Postgres itself stays healthy; query throughput increases significantly under concurrent load.

**This requires zero code changes.** It is purely an infrastructure addition.

### Why It Matters Now

The current pool `max: 20` means at most 20 queries can run simultaneously across all tenants. Under load (many tenants posting messages at once), requests queue up waiting for a free connection. PgBouncer resolves this without touching a line of application code.
### Implementation

**Step 1: Add PgBouncer service to `docker-compose.host.yaml`**

```yaml
  pgbouncer:
    image: edoburu/pgbouncer:latest
    container_name: ${PROJECT_NAME:-rosterchirp}_pgbouncer
    restart: unless-stopped
    environment:
      - DATABASE_URL=postgres://${DB_USER:-rosterchirp}:${DB_PASSWORD}@db:5432/${DB_NAME:-rosterchirp}
      - POOL_MODE=transaction
      - MAX_CLIENT_CONN=1000
      - DEFAULT_POOL_SIZE=100
      - MIN_POOL_SIZE=10
      - RESERVE_POOL_SIZE=20
      - RESERVE_POOL_TIMEOUT=5
      - SERVER_IDLE_TIMEOUT=600
      - LOG_CONNECTIONS=0
      - LOG_DISCONNECTIONS=0
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -h localhost -p 5432 -U ${DB_USER:-rosterchirp}"]
      interval: 10s
      timeout: 5s
      retries: 5
```
**Step 2: Point the Node app at PgBouncer instead of Postgres directly**

In `docker-compose.host.yaml`, change the `jama` service environment:
```yaml
      - DB_HOST=pgbouncer   # was: db
      - DB_PORT=5432
```

The `jama` service's `depends_on` should add `pgbouncer`.

**Step 3: Tune Postgres `max_connections`**

Add to the `db` service in `docker-compose.host.yaml`:
```yaml
    command: >
      postgres
      -c max_connections=200
      -c shared_buffers=256MB
      -c effective_cache_size=768MB
      -c work_mem=4MB
      -c maintenance_work_mem=64MB
      -c checkpoint_completion_target=0.9
      -c wal_buffers=16MB
      -c random_page_cost=1.1
```
**Step 4: Increase the Node pool size**

In `backend/src/models/db.js`, increase `max` since PgBouncer multiplexes efficiently:
```js
const pool = new Pool({
  host: process.env.DB_HOST || 'db',
  port: parseInt(process.env.DB_PORT || '5432', 10),
  database: process.env.DB_NAME || 'rosterchirp',
  user: process.env.DB_USER || 'rosterchirp',
  password: process.env.DB_PASSWORD || '',
  max: 100,                  // was 20 — PgBouncer handles the actual Postgres pool
  idleTimeoutMillis: 10000,  // was 30000 — release faster; PgBouncer manages persistence
  connectionTimeoutMillis: 5000,
});
```

**Important caveat — transaction mode:** PgBouncer in `POOL_MODE=transaction` releases the server connection after each transaction completes. This means `SET search_path` (which `db.js` runs before every query) is safe only if it runs on the same server connection as the query itself — two separate autocommit statements may each count as their own transaction and land on different server connections. Keep the `SET` and the query inside one explicit transaction per `query()` call. Do **not** use session-level state or `LISTEN/NOTIFY` through PgBouncer — it won't work in transaction mode.
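A sketch of the safe per-call pattern: `SET LOCAL` inside an explicit transaction guarantees that the `search_path` and the query share one server connection, and the setting resets automatically at `COMMIT`. The helper shape is assumed (the real `query()` in `db.js` closes over its own pool; the pool is a parameter here only so the sketch stands alone):

```javascript
// Sketch only — assumed helper shape; the real query() in db.js may differ.
async function query(pool, schema, text, params) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // SET LOCAL is scoped to this transaction, so transaction pooling is safe
    await client.query(`SET LOCAL search_path TO "${schema}", public`);
    const result = await client.query(text, params);
    await client.query('COMMIT');
    return result.rows;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```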
**Step 5: Add `PGBOUNCER_` vars to `.env.example`**
```
PGBOUNCER_MAX_CLIENT_CONN=1000
PGBOUNCER_DEFAULT_POOL_SIZE=100
```

**Step 6: Verify**

After deploying:
```bash
# Connect to PgBouncer's admin console
# (the edoburu image listens on 5432, matching the healthcheck above)
docker compose exec pgbouncer psql -h localhost -p 5432 -U pgbouncer pgbouncer

SHOW POOLS;   -- shows active/idle/waiting connections
SHOW STATS;   -- shows requests/sec
```

### Expected Outcome

With PgBouncer in place, the database connection bottleneck is effectively eliminated for the near term. 1,000 simultaneous tenant requests will queue through PgBouncer's pool of 100 server connections rather than waiting for Node's pool of 20 application-level connections. Throughput improves roughly 5× at moderate load.

---
## Phase 2 — Redis (Horizontal Scaling)

### What It Does

Redis enables multiple Node.js instances to share state that currently lives in each process's memory:

1. **Socket.io Redis Adapter** — allows `io.to(room).emit()` to reach sockets on any instance
2. **Shared `onlineUsers`** — replaces the in-process Map with a Redis `SADD`/`SREM`/`SMEMBERS` structure
3. **Shared `tenantDomainCache`** — replaces the in-process Map with a Redis hash with a TTL

Without Redis, running two Node instances would mean:
- A message emitted on Instance A can't reach a user connected to Instance B
- User A on Instance 1 shows as offline to User B on Instance 2
- A custom domain update on Instance 1 isn't reflected on Instance 2

### Prerequisites

Phase 1 (PgBouncer) should be deployed and stable first. Phase 2 is a significant code change — plan for a maintenance window.

### npm Packages Required

```bash
npm install @socket.io/redis-adapter redis
```

Add to `backend/package.json` dependencies. (The `redis` package, i.e. node-redis v4, provides the `createClient` API and the camelCase commands used below; `ioredis` has a different API.)
### Step 1: Add Redis to docker-compose.host.yaml

```yaml
  redis:
    image: redis:7-alpine
    container_name: ${PROJECT_NAME:-rosterchirp}_redis
    restart: unless-stopped
    command: >
      redis-server
      --maxmemory 512mb
      --maxmemory-policy allkeys-lru
      --save ""
      --appendonly no
    volumes:
      - rosterchirp_redis:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  rosterchirp_redis:
    driver: local
```

Note: `allkeys-lru` can evict presence keys under memory pressure; if exact presence matters more than cache headroom, consider `noeviction` with a larger `maxmemory`.

Add `REDIS_URL=redis://redis:6379` to the `jama` service environment and to `.env.example`.
### Step 2: Socket.io Redis Adapter (index.js)

Replace the current `new Server(server, ...)` block:

```js
const { createAdapter } = require('@socket.io/redis-adapter');
const { createClient } = require('redis'); // node-redis v4

const REDIS_URL = process.env.REDIS_URL || 'redis://localhost:6379';

// Two Redis clients required by the adapter (pub + sub)
const pubClient = createClient({ url: REDIS_URL });
const subClient = pubClient.duplicate();

// Run inside the async startup function — CommonJS has no top-level await
await Promise.all([pubClient.connect(), subClient.connect()]);
io.adapter(createAdapter(pubClient, subClient));
console.log('[Server] Socket.io Redis adapter connected');
```

This must be done **before** `io.on('connection', ...)` registers. With this in place, `io.to(room).emit(...)` fans out via Redis pub/sub to every Node instance — no other route code changes required.
### Step 3: Replace onlineUsers Map with Redis (index.js)

Current in-process Map:
```js
const onlineUsers = new Map(); // `${schema}:${userId}` → Set<socketId>
```

Replace with Redis operations. Create a dedicated Redis client for presence (separate from the adapter clients, which are locked into pub/sub mode):

```js
const presenceClient = createClient({ url: REDIS_URL });
await presenceClient.connect();

// Key structure: presence:{schema}:{userId} → Set of socketIds
// TTL of 24h prevents stale keys if a server crashes without cleanup
const PRESENCE_TTL = 86400; // seconds

async function addPresence(schema, userId, socketId) {
  const key = `presence:${schema}:${userId}`;
  await presenceClient.sAdd(key, socketId);
  await presenceClient.expire(key, PRESENCE_TTL);
}

async function removePresence(schema, userId, socketId) {
  const key = `presence:${schema}:${userId}`;
  await presenceClient.sRem(key, socketId);
  // Return remaining count — 0 means user is now offline
  return presenceClient.sCard(key);
}

async function isOnline(schema, userId) {
  const key = `presence:${schema}:${userId}`;
  return (await presenceClient.sCard(key)) > 0;
}

async function getOnlineUserIds(schema) {
  // SCAN (not KEYS — KEYS blocks Redis on large keyspaces) over
  // presence:{schema}:*, returning user IDs of non-empty sets
  const online = [];
  for await (const key of presenceClient.scanIterator({ MATCH: `presence:${schema}:*` })) {
    if ((await presenceClient.sCard(key)) > 0) {
      online.push(parseInt(key.split(':')[2], 10));
    }
  }
  return online;
}
```

Then replace all `onlineUsers.has/get/set/delete` calls in the `io.on('connection')` handler with the async Redis equivalents. This requires making the connection handler and its sub-handlers `async` where they aren't already.
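For the connect path, a self-contained sketch of the "announce only on the first socket" logic (the `deps` wiring and names are hypothetical; in `index.js` these would be the presence helpers above plus an `io.to(...)` emit):

```javascript
// Sketch only — deps = { isOnline, addPresence, emitOnline }, injected so the
// logic stands alone; wired to the real helpers inside io.on('connection').
async function handleConnect(deps, schema, userId, socketId) {
  const wasOnline = await deps.isOnline(schema, userId);
  await deps.addPresence(schema, userId, socketId);
  // Broadcast only for the user's first socket — extra tabs/devices
  // don't re-announce
  if (!wasOnline) deps.emitOnline(schema, userId);
  return !wasOnline;
}
```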
**Disconnect handler becomes:**
```js
socket.on('disconnect', async () => {
  const remaining = await removePresence(schema, userId, socket.id);
  if (remaining === 0) {
    exec(schema, 'UPDATE users SET last_online=NOW() WHERE id=$1', [userId]).catch(() => {});
    io.to(R(schema, 'all')).emit('user:offline', { userId });
  }
});
```

**users:online handler becomes:**
```js
socket.on('users:online', async () => {
  const userIds = await getOnlineUserIds(schema);
  socket.emit('users:online', { userIds });
});
```
### Step 4: Replace tenantDomainCache with Redis (db.js)

Current in-process Map:
```js
const tenantDomainCache = new Map();
```

Replace with a Redis hash with a TTL:

```js
let redisClient = null; // set externally after Redis connects

function setRedisClient(client) { redisClient = client; }

async function resolveSchema(req) {
  // ... existing logic up to custom domain lookup ...

  // Custom domain lookup — Redis first, fallback to DB
  if (redisClient) {
    const cached = await redisClient.hGet('tenantDomainCache', host);
    if (cached) return cached;
  }
  // DB fallback
  const tenant = await queryOne('public',
    'SELECT schema_name FROM tenants WHERE custom_domain=$1 AND status=$2',
    [host, 'active']
  );
  if (tenant) {
    if (redisClient) await redisClient.hSet('tenantDomainCache', host, tenant.schema_name);
    return tenant.schema_name;
  }
  throw new Error(`Unknown tenant for host: ${host}`);
}

async function refreshTenantCache(tenants) {
  if (!redisClient) return;
  // Rebuild the hash: DEL + re-populate (not atomic, but the DB fallback
  // above covers the brief window while the hash is being rebuilt)
  await redisClient.del('tenantDomainCache');
  for (const t of tenants) {
    if (t.custom_domain && t.schema_name) {
      await redisClient.hSet('tenantDomainCache', t.custom_domain.toLowerCase(), t.schema_name);
    }
  }
  await redisClient.expire('tenantDomainCache', 3600); // 1h TTL as a safety net
}
```

Export `setRedisClient` and call it from `index.js` after Redis connects, before `initDb()`.

When a custom domain is updated via the host control panel (`host.js`), call `refreshTenantCache` to invalidate immediately.
### Step 5: File Storage — Move to Object Storage

With multiple Node instances, each container has its own `/app/uploads` volume. An avatar uploaded to Instance A isn't accessible from Instance B.

**Recommended: Cloudflare R2** (S3-compatible, free egress, affordable storage)

```bash
npm install @aws-sdk/client-s3 @aws-sdk/s3-request-presigner
```

Changes to `backend/src/routes/users.js` (avatar upload) and `backend/src/routes/settings.js` (logo/icon upload):

```js
const { S3Client, PutObjectCommand, DeleteObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({
  region: 'auto',
  endpoint: process.env.R2_ENDPOINT, // https://<account>.r2.cloudflarestorage.com
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY,
  },
});

async function uploadToR2(buffer, key, contentType) {
  await s3.send(new PutObjectCommand({
    Bucket: process.env.R2_BUCKET,
    Key: key,
    Body: buffer,
    ContentType: contentType,
  }));
  return `${process.env.R2_PUBLIC_URL}/${key}`; // R2 public bucket URL
}
```

All `avatarUrl` and `logoUrl` values stored in the DB become full `https://` URLs rather than `/uploads/...` paths. The frontend already renders them via `<img src={url}>`, so no frontend changes are needed.

Add to `.env.example`:
```
R2_ENDPOINT=
R2_ACCESS_KEY_ID=
R2_SECRET_ACCESS_KEY=
R2_BUCKET=
R2_PUBLIC_URL=   # e.g. https://assets.yourdomain.com
```
### Step 6: Load Balancing Multiple Node Instances

With the Redis adapter in place, run multiple Node containers behind Caddy.

In `docker-compose.host.yaml`, add additional app instances:
```yaml
  rosterchirp_1:
    image: rosterchirp:${ROSTERCHIRP_VERSION:-latest}
    <<: *rosterchirp-base   # use YAML anchors for shared config
    container_name: rosterchirp_1

  rosterchirp_2:
    image: rosterchirp:${ROSTERCHIRP_VERSION:-latest}
    <<: *rosterchirp-base
    container_name: rosterchirp_2
```

**Caddyfile update:**
```
{HOST_DOMAIN} {
    reverse_proxy rosterchirp_1:3000 rosterchirp_2:3000 {
        lb_policy round_robin
        health_uri /api/health
        health_interval 15s
    }
}
```

**Critical — WebSocket sticky sessions:** Socket.io with the Redis adapter handles cross-instance messaging, but a Socket.io session that starts on the HTTP long-polling transport must keep hitting the **same instance** until it upgrades to WebSocket. Once upgraded, the TCP connection stays put, but `lb_policy round_robin` gives no stickiness for the polling requests themselves. Either use Caddy's cookie-based stickiness:

```
lb_policy cookie
```

Or force WebSocket-only transport in the Socket.io client config (eliminates the polling concern entirely):
```js
// frontend/src/contexts/SocketContext.jsx
const socket = io({ transports: ['websocket'] });
```
### Step 7: Verify Redis Phase

After deploying:
```bash
# Check the adapter is working — should see Redis keys
docker compose exec redis redis-cli keys '*'

# Check presence tracking
docker compose exec redis redis-cli keys 'presence:*'

# Check the tenant cache
docker compose exec redis redis-cli hgetall tenantDomainCache

# Monitor real-time Redis traffic during a test message send
docker compose exec redis redis-cli monitor
```

### Phase 2 Summary — Files Changed

| File | Change |
|---|---|
| `backend/src/index.js` | Redis adapter, presence helpers replacing the onlineUsers Map |
| `backend/src/models/db.js` | Redis-backed tenantDomainCache, `setRedisClient` export |
| `backend/src/routes/users.js` | R2 upload for avatars |
| `backend/src/routes/settings.js` | R2 upload for logos/icons |
| `backend/package.json` | Add `@socket.io/redis-adapter`, `redis`, `@aws-sdk/client-s3` |
| `docker-compose.host.yaml` | Add Redis service, multiple app instances, Caddy lb |
| `frontend/src/contexts/SocketContext.jsx` | Force WebSocket transport |
| `.env.example` | Add `REDIS_URL`, `R2_*` vars |

---
## Phase 3 — Read Replicas (Future)

When write load on Postgres becomes a bottleneck (typically >100,000 concurrent active users):

1. Configure Postgres streaming replication — one primary, 1–2 standbys
2. In `db.js`, maintain two pools: `primaryPool` (writes) and `replicaPool` (reads)
3. Route `query()` to `replicaPool`, and `exec()`/`queryResult()` to `primaryPool`
4. `withTransaction()` always uses `primaryPool`

This is entirely within `db.js` — no route changes needed if the abstraction is preserved.
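The routing in points 2–4 can be sketched dependency-free (the `primary`/`replica` objects stand in for two `pg` Pools; `run`/`tx` are stub interfaces for illustration, and replica-lag handling is deliberately omitted):

```javascript
// Sketch of the Phase 3 split — pools injected so the routing stands alone.
function makeDb(primary, replica) {
  return {
    query: (schema, text, params) => replica.run(schema, text, params),       // reads
    exec: (schema, text, params) => primary.run(schema, text, params),        // writes
    queryResult: (schema, text, params) => primary.run(schema, text, params), // writes
    withTransaction: (schema, fn) => primary.tx(schema, fn),                  // always primary
  };
}
```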
---

## Phase 4 — Tenant Sharding (Future)

When a single Postgres cluster can't handle the write volume (millions of active tenants):

1. Assign each tenant to a shard (DB cluster) at provisioning time — store it in the `tenants` table as `shard_id`
2. `resolveSchema()` in `db.js` looks up the tenant's shard and returns both the schema name and the DB host
3. Maintain a pool per shard rather than one global pool
4. `host.js` provisioning logic assigns shards using a round-robin or least-loaded strategy

This is a significant architectural change. Do not implement until clearly needed.
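Point 3 (a pool per shard) can be sketched as a lazy lookup (names hypothetical; `createPool` would build a `pg` Pool for the shard's host):

```javascript
// Sketch only — shardHosts maps shard_id → DB host; pools are created lazily
// and cached so each shard gets exactly one pool.
function makeShardPools(shardHosts, createPool) {
  const pools = new Map(); // shard_id → pool
  return function poolForShard(shardId) {
    if (!pools.has(shardId)) {
      const host = shardHosts[shardId];
      if (!host) throw new Error(`Unknown shard: ${shardId}`);
      pools.set(shardId, createPool(host));
    }
    return pools.get(shardId);
  };
}
```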
---

## Outstanding / Deferred Work

### iOS Push Notifications

### WebSocket Reconnect on Focus

**Status:** Deferred. The socket drops when the Android PWA is backgrounded.
**Fix:** Frontend-only — listen for `visibilitychange` in `SocketContext.jsx` and reconnect the socket when `document.visibilityState === 'visible'`. Note: forcing WebSocket-only transport (Phase 2, Step 6) may affect reconnect behaviour — implement reconnect-on-focus at the same time as the transport change.
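A sketch of the reconnect-on-focus listener (assumed shape for `SocketContext.jsx`; `document` and the socket are injected so the logic can run outside a browser, and the returned function is the `useEffect` cleanup):

```javascript
// Sketch only — names illustrative; `socket` is the Socket.io client instance.
function shouldReconnect(visibilityState, connected) {
  // Reconnect only when the tab becomes visible and the socket has dropped
  return visibilityState === 'visible' && !connected;
}

function attachReconnectOnFocus(doc, socket) {
  const onVisible = () => {
    if (shouldReconnect(doc.visibilityState, socket.connected)) {
      socket.connect();
    }
  };
  doc.addEventListener('visibilitychange', onVisible);
  return () => doc.removeEventListener('visibilitychange', onVisible); // useEffect cleanup
}
```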
### Message History — Browser Memory

**Status:** Future. The `messages` array in `ChatWindow` grows unbounded as a user scrolls back through history. At extreme depth (thousands of messages in one session), this affects browser scroll performance.
**Fix:** Virtual scroll window — discard messages scrolled far out of view and re-fetch on demand. This is a non-trivial frontend refactor (react-virtual or similar). Not needed until users regularly have very long scrollback sessions.

### Orphaned Image Cleanup

**Status:** Future. Deleting a message nulls `image_url` in the DB but leaves the file on disk (or in R2 after Phase 2). A background job that periodically deletes image files with no corresponding DB row would prevent unbounded storage growth.
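One possible shape for such a job, with the storage and DB accessors injected (all names hypothetical — `listStoredKeys` would wrap `fs` or the R2 client, and `liveUrls` is the set of `image_url` values still referenced in the DB):

```javascript
// Hypothetical sketch — not an existing helper. Deletes any stored file
// that no message row still references; returns the number removed.
async function cleanupOrphanedImages(listStoredKeys, liveUrls, deleteStoredKey) {
  let removed = 0;
  for (const key of await listStoredKeys()) {
    if (!liveUrls.has(key)) {
      await deleteStoredKey(key); // no DB row references this file
      removed++;
    }
  }
  return removed;
}
```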
### hasMore Heuristic

**Status:** Minor. `hasMore` is set to `true` whenever `messages.length >= 50`. If a conversation has exactly 50 messages total, this shows a "Load older" button that returns nothing. Fix: return a `total` count from the backend GET messages route, or check `older.length < 50` to detect the end of history.
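The `older.length < 50` variant of the fix can be sketched as (illustrative names; `setMessages`/`setHasMore` stand in for the React state setters):

```javascript
// Sketch only — a full page means there may be more history; a short page
// means we've reached the start.
const PAGE_SIZE = 50;

function onOlderPageLoaded(older, setMessages, setHasMore) {
  setMessages((prev) => [...older, ...prev]); // prepend, as today
  setHasMore(older.length === PAGE_SIZE);
}
```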
---

```
APP_TYPE=selfhost|host
HOST_DOMAIN=                   # host mode only
HOST_ADMIN_KEY=                # host mode only
JWT_SECRET=
DB_HOST=db                     # set to 'pgbouncer' after Phase 1
DB_NAME=rosterchirp
DB_USER=rosterchirp
DB_PASSWORD=                   # avoid ! (shell interpolation issue with docker-compose)
# …
FIREBASE_MESSAGING_SENDER_ID=  # FCM web app config
FIREBASE_APP_ID=               # FCM web app config
FIREBASE_VAPID_KEY=            # FCM Web Push certificate public key
FIREBASE_SERVICE_ACCOUNT=      # FCM service account JSON (stringified, backend only)

# Phase 1 (PgBouncer)
PGBOUNCER_MAX_CLIENT_CONN=1000
PGBOUNCER_DEFAULT_POOL_SIZE=100

# Phase 2 (Redis + R2)
REDIS_URL=redis://redis:6379
R2_ENDPOINT=                   # https://<account>.r2.cloudflarestorage.com
R2_ACCESS_KEY_ID=
R2_SECRET_ACCESS_KEY=
R2_BUCKET=
R2_PUBLIC_URL=                 # https://assets.yourdomain.com
```
---

## Session History

Development continues in Claude Code from v0.11.26 (rebranded from jama to RosterChirp). Scale architecture analysis and Phase 1/2 implementation specs added based on planned growth to 100,000+ tenants.