Day 7

Production Deployment

Deploy ThePopeBot to production with Docker, implement security hardening, set up monitoring and logging, and learn scaling strategies.

Docker Production Config

While the development setup uses npm run dev, production deployments should use Docker for consistency, isolation, and reproducibility.

Production Dockerfile

Create a multi-stage Dockerfile for optimized builds:

# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies here; the build step needs devDependencies
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: Production
FROM node:18-alpine AS production
WORKDIR /app

# Security: run as non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./

# Install only runtime dependencies for the final image
RUN npm ci --omit=dev

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

USER appuser
EXPOSE 3000

CMD ["node", "dist/index.js"]
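The `COPY . .` step in the builder stage pulls in everything from the build context, so pair the Dockerfile with a `.dockerignore` to keep secrets and build artifacts out of the image (a minimal sketch — adjust the entries to your repository layout):

```
node_modules
dist
.env*
.git
*.log
```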

Docker Compose for Production

services:
  thepopebot:
    build:
      context: .
      dockerfile: Dockerfile
      target: production
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: production
      LOG_LEVEL: info
      LOG_FORMAT: json
    env_file:
      - .env.production
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M
    networks:
      - app-network

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    restart: unless-stopped
    networks:
      - app-network

networks:
  app-network:
    driver: bridge

volumes:
  redis-data:

Building and Running

# Build the production image
docker compose build

# Start in detached mode
docker compose up -d

# View logs
docker compose logs -f thepopebot

# Check health
docker compose ps

Security Hardening

Running an AI agent in production requires careful attention to security.

API Key Protection

Never expose API keys in logs, responses, or error messages:

// Middleware to strip sensitive data from responses
app.use((req, res, next) => {
  const originalJson = res.json.bind(res);
  res.json = (data: any) => {
    return originalJson(sanitizeResponse(data));
  };
  next();
});

function sanitizeResponse(data: any): any {
  const sensitive = ['apiKey', 'token', 'secret', 'password'];
  if (typeof data === 'object' && data !== null) {
    for (const key of Object.keys(data)) {
      if (sensitive.some(s => key.toLowerCase().includes(s))) {
        data[key] = '[REDACTED]';
      } else if (typeof data[key] === 'object') {
        data[key] = sanitizeResponse(data[key]);
      }
    }
  }
  return data;
}

Input Validation and Sanitization

Validate all incoming requests before they reach the agent:

import { z } from 'zod';

const inputSchema = z.object({
  agent: z.string().max(50).regex(/^[a-zA-Z0-9-_]+$/),
  message: z.string().max(10000).trim(),
  metadata: z.record(z.string()).optional(),
});

app.post('/api/chat', (req, res) => {
  const result = inputSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({ error: 'Invalid input', details: result.error });
  }
  // Process validated input
});

Rate Limiting

Protect against abuse with multi-tier rate limiting:

import rateLimit from 'express-rate-limit';

// Global rate limit
const globalLimiter = rateLimit({
  windowMs: 60 * 1000,     // 1 minute
  max: 100,                 // 100 requests per minute globally
  standardHeaders: true,
});

// Per-user rate limit
const userLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 20,                  // 20 requests per user per minute
  keyGenerator: (req) => req.user?.id || req.ip,
});

app.use('/api/', globalLimiter);
app.use('/api/chat', userLimiter);

Tool Execution Sandboxing

Restrict what tools can do in production:

security:
  tools:
    filesystem:
      allowedPaths:
        - "/app/workspace"
        - "/tmp/agent-work"
      blockedPatterns:
        - "**/.env*"
        - "**/node_modules/**"
        - "**/*.key"
      maxFileSize: 10485760  # 10MB

    git:
      allowedOperations:
        - clone
        - diff
        - log
      blockedOperations:
        - push
        - force-push
        - reset

Monitoring and Logging

Production systems need observability to detect issues early and respond quickly.

Structured Logging

Use structured JSON logging for machine-readable output:

import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
  serializers: {
    req: pino.stdSerializers.req,
    res: pino.stdSerializers.res,
    err: pino.stdSerializers.err,
  },
});

// Log agent events with context
logger.info({
  event: 'agent_request',
  agent: 'coder',
  userId: 'user-123',
  channel: 'telegram',
  tokenUsage: 1523,
  duration: 4200,
}, 'Agent request completed');

Health Check Endpoint

Implement a comprehensive health check:

app.get('/health', async (req, res) => {
  const checks = {
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    services: {
      redis: await checkRedis(),
      llm: await checkLLMProvider(),
      telegram: await checkTelegram(),
    },
    resources: {
      memory: process.memoryUsage(),
      cpu: process.cpuUsage(),
    },
  };

  const allHealthy = Object.values(checks.services)
    .every(s => s.status === 'ok');

  res.status(allHealthy ? 200 : 503).json(checks);
});
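The handler above assumes `checkRedis`, `checkLLMProvider`, and `checkTelegram` exist. Whatever probes you wire in, each check should race against a timeout so a hung dependency cannot stall the `/health` endpoint itself (a sketch; `checkWithTimeout` is a made-up helper):

```typescript
async function checkWithTimeout(
  probe: () => Promise<unknown>,
  timeoutMs = 2000,
): Promise<{ status: 'ok' | 'error'; error?: string }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
  });
  try {
    // Whichever settles first wins: the probe or the timeout
    await Promise.race([probe(), timeout]);
    return { status: 'ok' };
  } catch (err) {
    return { status: 'error', error: (err as Error).message };
  } finally {
    clearTimeout(timer);
  }
}

// Example wiring, assuming a redis client with a ping() method:
// const checkRedis = () => checkWithTimeout(() => redis.ping());
```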

Metrics Collection

Track key metrics for performance monitoring:

import { Counter, Histogram, Gauge, register } from 'prom-client';

const metrics = {
  requestCount: new Counter('requests_total', 'Total requests'),
  requestDuration: new Histogram('request_duration_ms', 'Request duration'),
  tokenUsage: new Counter('tokens_total', 'Total LLM tokens used'),
  toolCalls: new Counter('tool_calls_total', 'Total tool calls'),
  errorCount: new Counter('errors_total', 'Total errors'),
  activeAgents: new Gauge('active_agents', 'Currently active agents'),
};

// Expose metrics endpoint for Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', 'text/plain');
  res.send(await register.metrics());
});

Alerting Rules

Set up alerts for critical conditions:

alerts:
  high-error-rate:
    condition: "error_rate > 5%"
    window: "5m"
    severity: critical
    notify: ["slack", "pagerduty"]

  high-latency:
    condition: "p95_latency > 10s"
    window: "10m"
    severity: warning
    notify: ["slack"]

  token-budget-exceeded:
    condition: "daily_tokens > 900000"
    window: "1d"
    severity: warning
    notify: ["email"]
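If you scrape the `/metrics` endpoint above with Prometheus, the high-error-rate alert could be expressed as an alerting rule (a sketch — the metric names follow the earlier metrics block; the group name and threshold are assumptions):

```yaml
groups:
  - name: thepopebot
    rules:
      - alert: HighErrorRate
        # errors as a fraction of requests over the last 5 minutes
        expr: rate(errors_total[5m]) / rate(requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
```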

Scaling Strategies

As usage grows, you need strategies to scale your agent infrastructure.

Horizontal Scaling

Run multiple instances behind a load balancer:

# docker-compose.scale.yml
services:
  thepopebot:
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 30s
        order: start-first

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - thepopebot
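The compose file mounts `./nginx.conf`, which has to exist. A minimal config that proxies to the replicas might look like this (a sketch — the service name and port follow the compose file; everything else is an assumption):

```
events {}

http {
  upstream thepopebot {
    # Docker's embedded DNS resolves the service name across replicas
    server thepopebot:3000;
  }

  server {
    listen 80;

    location / {
      proxy_pass http://thepopebot;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}
```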

Queue-Based Architecture

For high-throughput scenarios, use a message queue to decouple request ingestion from processing:

User Request ──▶ API Server ──▶ Message Queue ──▶ Worker Pool ──▶ Response
                                     │
                              ┌──────┼──────┐
                              ▼      ▼      ▼
                          Worker  Worker  Worker

// Producer: API server enqueues requests
await queue.add('agent-task', {
  agent: 'coder',
  message: userMessage,
  userId: user.id,
  priority: calculatePriority(user),
});

// Consumer: Workers process from the queue
queue.process('agent-task', async (job) => {
  const result = await engine.run(job.data);
  await notifyUser(job.data.userId, result);
});

Caching Strategy

Cache frequently used data to reduce LLM calls:

import { LRUCache } from 'lru-cache';

const cache = {
  // Cache tool results for identical inputs
  toolResults: new LRUCache({ max: 1000, ttl: 300000 }),

  // Cache agent context assembly
  contextCache: new LRUCache({ max: 100, ttl: 60000 }),

  // Cache common LLM responses
  responseCache: new LRUCache({ max: 500, ttl: 600000 }),
};
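A wrapper for the tool-result cache might hash the tool name and input into a lookup key (a sketch — a plain Map stands in for the LRUCache above so the example stays dependency-free, and `cachedToolCall` is a made-up helper):

```typescript
import { createHash } from 'node:crypto';

// Stand-in for cache.toolResults above
const toolResults = new Map<string, unknown>();

async function cachedToolCall<T>(
  toolName: string,
  input: unknown,
  run: () => Promise<T>,
): Promise<T> {
  // Identical tool + input pairs map to the same key
  const key = createHash('sha256')
    .update(toolName + JSON.stringify(input))
    .digest('hex');
  if (toolResults.has(key)) return toolResults.get(key) as T;
  const result = await run();
  toolResults.set(key, result);
  return result;
}
```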

Best Practices Checklist

Before going to production, verify the following:

Security

  • All API keys are stored in environment variables, not in code
  • Input validation is implemented on all endpoints
  • Rate limiting is configured for all public endpoints
  • Tool execution is sandboxed with explicit permission lists
  • HTTPS is enforced for all external communication
  • Authentication is required for the Web UI and API
  • Webhook signatures are validated

Reliability

  • Health check endpoint is implemented and monitored
  • Graceful shutdown handles in-flight requests
  • Retry logic is implemented for transient failures
  • Circuit breakers protect against cascading failures
  • Resource limits are set for CPU, memory, and tokens
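The retry item above can be sketched as an exponential-backoff wrapper (an illustration, not ThePopeBot's built-in API; `withRetry` and its defaults are made up):

```typescript
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Exponential backoff: base, 2x base, 4x base, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```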

Observability

  • Structured logging is configured with appropriate log levels
  • Metrics are collected and exposed for monitoring
  • Alerts are set up for critical conditions
  • Request tracing allows debugging individual requests
  • Token usage is tracked and budgeted

Operations

  • Docker images are built with multi-stage builds
  • Container runs as non-root user
  • Automated backups are configured for persistent data
  • Rolling update strategy is defined
  • Rollback procedure is documented and tested
  • Scaling thresholds are defined and automated

Completing this checklist means your ThePopeBot deployment is production-ready. Congratulations on making it through all 7 days of the tutorial!