Day 7

Production Deployment

Deploy ThePopeBot to production with Docker, implement security hardening, set up monitoring and logging, and learn scaling strategies.

Docker Production Config

While the development setup uses npm run dev, production deployments should use Docker for consistency, isolation, and reproducibility.

Production Dockerfile

Create a multi-stage Dockerfile for optimized builds:

# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies here; the build step needs devDependencies
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: Production
FROM node:18-alpine AS production
WORKDIR /app

# Security: run as non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./

# Install only runtime dependencies for the final image
RUN npm ci --omit=dev

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

USER appuser
EXPOSE 3000

CMD ["node", "dist/index.js"]
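The `COPY . .` step in the builder stage pulls in everything from the build context, so pair the Dockerfile with a `.dockerignore` to keep secrets and build artifacts out of the image (a minimal sketch — adjust the entries to your repository layout):

```
node_modules
dist
.env*
.git
*.log
```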

Docker Compose for Production

services:
  thepopebot:
    build:
      context: .
      dockerfile: Dockerfile
      target: production
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: production
      LOG_LEVEL: info
      LOG_FORMAT: json
    env_file:
      - .env.production
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M
    networks:
      - app-network

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    restart: unless-stopped
    networks:
      - app-network

networks:
  app-network:
    driver: bridge

volumes:
  redis-data:

Building and Running

# Build the production image
docker compose build

# Start in detached mode
docker compose up -d

# View logs
docker compose logs -f thepopebot

# Check health
docker compose ps

Security Hardening

Running an AI agent in production requires careful attention to security.

API Key Protection

Never expose API keys in logs, responses, or error messages:

// Middleware to strip sensitive data from responses
app.use((req, res, next) => {
  const originalJson = res.json.bind(res);
  res.json = (data: any) => {
    return originalJson(sanitizeResponse(data));
  };
  next();
});

function sanitizeResponse(data: any): any {
  const sensitive = ['apiKey', 'token', 'secret', 'password'];
  if (typeof data === 'object' && data !== null) {
    for (const key of Object.keys(data)) {
      if (sensitive.some(s => key.toLowerCase().includes(s))) {
        data[key] = '[REDACTED]';
      } else if (typeof data[key] === 'object') {
        data[key] = sanitizeResponse(data[key]);
      }
    }
  }
  return data;
}

Input Validation and Sanitization

Validate all incoming requests before they reach the agent:

import { z } from 'zod';

const inputSchema = z.object({
  agent: z.string().max(50).regex(/^[a-zA-Z0-9-_]+$/),
  message: z.string().max(10000).trim(),
  metadata: z.record(z.string()).optional(),
});

app.post('/api/chat', (req, res) => {
  const result = inputSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({ error: 'Invalid input', details: result.error });
  }
  // Process validated input
});

Rate Limiting

Protect against abuse with multi-tier rate limiting:

import rateLimit from 'express-rate-limit';

// Global rate limit
const globalLimiter = rateLimit({
  windowMs: 60 * 1000,     // 1 minute
  max: 100,                 // 100 requests per minute globally
  standardHeaders: true,
});

// Per-user rate limit
const userLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 20,                  // 20 requests per user per minute
  keyGenerator: (req) => req.user?.id || req.ip,
});

app.use('/api/', globalLimiter);
app.use('/api/chat', userLimiter);

Tool Execution Sandboxing

Restrict what tools can do in production:

security:
  tools:
    filesystem:
      allowedPaths:
        - "/app/workspace"
        - "/tmp/agent-work"
      blockedPatterns:
        - "**/.env*"
        - "**/node_modules/**"
        - "**/*.key"
      maxFileSize: 10485760  # 10MB

    git:
      allowedOperations:
        - clone
        - diff
        - log
      blockedOperations:
        - push
        - force-push
        - reset

Monitoring and Logging

Production systems need observability to detect issues early and respond quickly.

Structured Logging

Use structured JSON logging for machine-readable output:

import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
  serializers: {
    req: pino.stdSerializers.req,
    res: pino.stdSerializers.res,
    err: pino.stdSerializers.err,
  },
});

// Log agent events with context
logger.info({
  event: 'agent_request',
  agent: 'coder',
  userId: 'user-123',
  channel: 'telegram',
  tokenUsage: 1523,
  duration: 4200,
}, 'Agent request completed');

Health Check Endpoint

Implement a comprehensive health check:

app.get('/health', async (req, res) => {
  const checks = {
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    services: {
      redis: await checkRedis(),
      llm: await checkLLMProvider(),
      telegram: await checkTelegram(),
    },
    resources: {
      memory: process.memoryUsage(),
      cpu: process.cpuUsage(),
    },
  };

  const allHealthy = Object.values(checks.services)
    .every(s => s.status === 'ok');

  res.status(allHealthy ? 200 : 503).json(checks);
});
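The handler above assumes `checkRedis`, `checkLLMProvider`, and `checkTelegram` exist. Whatever probes you wire in, each check should race against a timeout so a hung dependency cannot stall the `/health` endpoint itself (a sketch; `checkWithTimeout` is a made-up helper):

```typescript
async function checkWithTimeout(
  probe: () => Promise<unknown>,
  timeoutMs = 2000,
): Promise<{ status: 'ok' | 'error'; error?: string }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
  });
  try {
    // Whichever settles first wins: the probe or the timeout
    await Promise.race([probe(), timeout]);
    return { status: 'ok' };
  } catch (err) {
    return { status: 'error', error: (err as Error).message };
  } finally {
    clearTimeout(timer);
  }
}

// Example wiring, assuming a redis client with a ping() method:
// const checkRedis = () => checkWithTimeout(() => redis.ping());
```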

Metrics Collection

Track key metrics for performance monitoring:

import { Counter, Histogram, Gauge, register } from 'prom-client';

const metrics = {
  requestCount: new Counter('requests_total', 'Total requests'),
  requestDuration: new Histogram('request_duration_ms', 'Request duration'),
  tokenUsage: new Counter('tokens_total', 'Total LLM tokens used'),
  toolCalls: new Counter('tool_calls_total', 'Total tool calls'),
  errorCount: new Counter('errors_total', 'Total errors'),
  activeAgents: new Gauge('active_agents', 'Currently active agents'),
};

// Expose metrics endpoint for Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', 'text/plain');
  res.send(await register.metrics());
});

Alerting Rules

Set up alerts for critical conditions:

alerts:
  high-error-rate:
    condition: "error_rate > 5%"
    window: "5m"
    severity: critical
    notify: ["slack", "pagerduty"]

  high-latency:
    condition: "p95_latency > 10s"
    window: "10m"
    severity: warning
    notify: ["slack"]

  token-budget-exceeded:
    condition: "daily_tokens > 900000"
    window: "1d"
    severity: warning
    notify: ["email"]
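If you scrape the `/metrics` endpoint above with Prometheus, the high-error-rate alert could be expressed as an alerting rule (a sketch — the metric names follow the earlier metrics block; the group name and threshold are assumptions):

```yaml
groups:
  - name: thepopebot
    rules:
      - alert: HighErrorRate
        # errors as a fraction of requests over the last 5 minutes
        expr: rate(errors_total[5m]) / rate(requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
```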

Scaling Strategies

As usage grows, you need strategies to scale your agent infrastructure.

Horizontal Scaling

Run multiple instances behind a load balancer:

# docker-compose.scale.yml
services:
  thepopebot:
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 30s
        order: start-first

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - thepopebot
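The compose file mounts `./nginx.conf`, which has to exist. A minimal config that proxies to the replicas might look like this (a sketch — the service name and port follow the compose file; everything else is an assumption):

```
events {}

http {
  upstream thepopebot {
    # Docker's embedded DNS resolves the service name across replicas
    server thepopebot:3000;
  }

  server {
    listen 80;

    location / {
      proxy_pass http://thepopebot;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}
```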

Queue-Based Architecture

For high-throughput scenarios, use a message queue to decouple request ingestion from processing:

User Request ──▶ API Server ──▶ Message Queue ──▶ Worker Pool ──▶ Response
                                     │
                              ┌──────┼──────┐
                              ▼      ▼      ▼
                          Worker  Worker  Worker

// Producer: API server enqueues requests
await queue.add('agent-task', {
  agent: 'coder',
  message: userMessage,
  userId: user.id,
  priority: calculatePriority(user),
});

// Consumer: Workers process from the queue
queue.process('agent-task', async (job) => {
  const result = await engine.run(job.data);
  await notifyUser(job.data.userId, result);
});

Caching Strategy

Cache frequently used data to reduce LLM calls:

import { LRUCache } from 'lru-cache';

const cache = {
  // Cache tool results for identical inputs
  toolResults: new LRUCache({ max: 1000, ttl: 300000 }),

  // Cache agent context assembly
  contextCache: new LRUCache({ max: 100, ttl: 60000 }),

  // Cache common LLM responses
  responseCache: new LRUCache({ max: 500, ttl: 600000 }),
};
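A wrapper for the tool-result cache might hash the tool name and input into a lookup key (a sketch — a plain Map stands in for the LRUCache above so the example stays dependency-free, and `cachedToolCall` is a made-up helper):

```typescript
import { createHash } from 'node:crypto';

// Stand-in for cache.toolResults above
const toolResults = new Map<string, unknown>();

async function cachedToolCall<T>(
  toolName: string,
  input: unknown,
  run: () => Promise<T>,
): Promise<T> {
  // Identical tool + input pairs map to the same key
  const key = createHash('sha256')
    .update(toolName + JSON.stringify(input))
    .digest('hex');
  if (toolResults.has(key)) return toolResults.get(key) as T;
  const result = await run();
  toolResults.set(key, result);
  return result;
}
```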

Best Practices Checklist

Before going to production, verify the following:

Security

  • All API keys are stored in environment variables, not in code
  • Input validation is implemented on all endpoints
  • Rate limiting is configured for all public endpoints
  • Tool execution is sandboxed with explicit permission lists
  • HTTPS is enforced for all external communication
  • Authentication is required for the Web UI and API
  • Webhook signatures are validated

Reliability

  • Health check endpoint is implemented and monitored
  • Graceful shutdown handles in-flight requests
  • Retry logic is implemented for transient failures
  • Circuit breakers protect against cascading failures
  • Resource limits are set for CPU, memory, and tokens
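The retry item above can be sketched as an exponential-backoff wrapper (an illustration, not ThePopeBot's built-in API; `withRetry` and its defaults are made up):

```typescript
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Exponential backoff: base, 2x base, 4x base, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```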

Observability

  • Structured logging is configured with appropriate log levels
  • Metrics are collected and exposed for monitoring
  • Alerts are set up for critical conditions
  • Request tracing allows debugging individual requests
  • Token usage is tracked and budgeted

Operations

  • Docker images are built with multi-stage builds
  • Container runs as non-root user
  • Automated backups are configured for persistent data
  • Rolling update strategy is defined
  • Rollback procedure is documented and tested
  • Scaling thresholds are defined and automated

Completing this checklist means your ThePopeBot deployment is production-ready. Congratulations on making it through all 7 days of the tutorial!