Containerization

This document outlines the containerization strategy and implementation for the AI Agent Orchestration Platform.

Overview

The platform uses containerization to ensure consistent deployment across environments, isolate components, and simplify scaling. Docker is the primary containerization technology, with Kubernetes for orchestration in production environments.

Container Architecture

Core Components

The platform is divided into several containerized components:

  • Backend API Service: FastAPI application serving the REST API
  • Frontend: React application for the user interface
  • Workflow Engine: Temporal.io server for workflow orchestration
  • Database: PostgreSQL for persistent storage
  • Agent Execution Environment: Isolated containers for running agents
  • Monitoring Stack: Prometheus, Grafana, and related services

Container Relationships

Container Architecture Diagram

Note: This is a placeholder for a container architecture diagram. The actual diagram should be created and added to the project.

Docker Configuration

Dockerfile Standards

Each component's Dockerfile follows these standards:

  • Pin base images to specific version tags
  • Use multi-stage builds to minimize image size
  • Run the application as a non-root user
  • Define a health check so the container's status can be monitored
  • Handle termination signals so the application shuts down gracefully
  • Install only the dependencies the application requires

Example Dockerfile for the backend service:

# Build stage
FROM python:3.11-slim AS builder

WORKDIR /app

COPY requirements.txt .
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /app/wheels -r requirements.txt

# Runtime stage
FROM python:3.11-slim

WORKDIR /app

# Create non-root user
RUN addgroup --system app && adduser --system --group app

# Copy wheels from builder stage
COPY --from=builder /app/wheels /wheels
RUN pip install --no-cache-dir /wheels/*

# Copy application code
COPY . .

# Set ownership
RUN chown -R app:app /app

# Switch to non-root user
USER app

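# Health check (uses Python because the slim base image does not ship curl)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)" || exit 1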

# Command
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
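
The exec-form CMD above lets uvicorn receive SIGTERM directly when the container is stopped, and uvicorn handles that signal itself. Other long-running processes in the platform would need their own handling. The following is a minimal sketch of one way to do that, assuming a hypothetical worker process; the class name `GracefulShutdown` is illustrative and not part of the platform.

```python
import signal
import threading
from typing import Optional


class GracefulShutdown:
    """Tracks whether the container runtime has asked the process to stop."""

    def __init__(self) -> None:
        self._stop = threading.Event()

    def install(self) -> None:
        # Must be called from the main thread. `docker stop` sends SIGTERM;
        # Ctrl+C in an attached terminal sends SIGINT.
        for signum in (signal.SIGTERM, signal.SIGINT):
            signal.signal(signum, self.handle)

    def handle(self, signum, frame) -> None:
        # Only set a flag here; cleanup runs in the main loop, not the handler.
        self._stop.set()

    @property
    def requested(self) -> bool:
        return self._stop.is_set()

    def wait(self, timeout: Optional[float] = None) -> bool:
        # Returns True once a stop was requested, False if the timeout expired.
        return self._stop.wait(timeout)
```

A worker would call `install()` once at startup and then loop with `while not shutdown.wait(1.0): ...`, finishing its current unit of work before exiting.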

Docker Compose

For local development and simple deployments, Docker Compose is used to orchestrate the containers:

version: '3.8'
services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    env_file:
      - .env
    volumes:
      - ./backend:/app
    depends_on:
      - db
      - temporal
    healthcheck:
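      # The python:3.11-slim based image has no curl, so probe with Python
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"]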
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    env_file:
      - .env
    volumes:
      - ./frontend:/app
    depends_on:
      - backend

  db:
    image: postgres:15
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: ${DB_NAME}
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER} -d ${DB_NAME}"]
      interval: 10s
      timeout: 5s
      retries: 5

  temporal:
    image: temporalio/auto-setup:1.20.0
    ports:
      - "7233:7233"
    environment:
      - DB=postgresql
      - DB_PORT=5432
      - POSTGRES_USER=${DB_USER}
      - POSTGRES_PWD=${DB_PASSWORD}
      - POSTGRES_SEEDS=db
    depends_on:
      - db

  prometheus:
    image: prom/prometheus:v2.42.0
    ports:
      - "9090:9090"
    volumes:
      - ./infra/prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'

  grafana:
    image: grafana/grafana:9.4.7
    ports:
      - "3001:3000"
    volumes:
      - ./infra/grafana/provisioning:/etc/grafana/provisioning
      - grafana_data:/var/lib/grafana

volumes:
  pgdata:
  prometheus_data:
  grafana_data:
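
The short `depends_on` form used above only controls start order; it does not wait for a dependency to accept connections. One way a service could guard against that is to poll the dependency's port before starting. This is a sketch under that assumption; `wait_for_port` is a hypothetical helper, and the host `db` and port `5432` come from the Compose file above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, the backend entrypoint could call `wait_for_port("db", 5432)` and exit with an error if it returns False. A listening port does not guarantee the database is fully initialized, so this complements rather than replaces the `pg_isready` health check.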

Agent Execution Containers

Agent Isolation

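Agents run in isolated containers, which provides:

  • Security: a misbehaving agent is contained within its own container
  • Resource constraints: CPU and memory limits apply per agent
  • Dependency management: each agent image carries its own dependencies
  • Reproducible execution: the same image yields the same runtime environment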

Agent Container Lifecycle

  1. Build: Agent container images are built from agent definitions
  2. Pull: Images are pulled from registry when needed
  3. Configure: Environment variables and volumes are configured
  4. Run: Container is started with appropriate permissions
  5. Monitor: Health and resource usage are monitored
  6. Cleanup: Container is stopped and removed after execution
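
The lifecycle steps above can be sketched as an ordered list of Docker CLI invocations. This is an illustration only, not the platform's actual executor: the function name and the image and container names are hypothetical, and step 1 (Build) is assumed to happen in CI, so it is omitted here.

```python
from typing import Dict, List, Optional


def lifecycle_commands(image: str, name: str,
                       env: Optional[Dict[str, str]] = None) -> List[List[str]]:
    """Return the docker CLI invocations for one agent run, in order."""
    env_args: List[str] = []
    # Sort keys so the generated command line is deterministic.
    for key, value in sorted((env or {}).items()):
        env_args += ["--env", f"{key}={value}"]
    return [
        ["docker", "pull", image],                               # 2. Pull
        ["docker", "create", "--name", name, *env_args, image],  # 3. Configure
        ["docker", "start", name],                               # 4. Run
        ["docker", "stats", "--no-stream", name],                # 5. Monitor
        ["docker", "stop", name],                                # 6. Cleanup
        ["docker", "rm", name],
    ]
```

Each inner list can be passed to `subprocess.run(cmd, check=True)`. Returning the commands as data keeps the sequence easy to inspect and log.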

Agent Container Security

  • Read-only file system where possible
  • No privileged access
  • Network isolation
  • Resource limits (CPU, memory)
  • Secrets management via environment variables or mounted files
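
As a rough illustration, the controls listed above map onto standard `docker run` flags. The sketch below builds such a command line; the function name, the default limits, and the `/run/secrets/agent` mount target are assumptions chosen for the example, not values taken from the platform.

```python
from typing import List, Optional


def hardened_run_args(image: str, cpus: str = "0.5", memory: str = "256m",
                      secret_file: Optional[str] = None) -> List[str]:
    """Build a `docker run` command line with restrictive defaults."""
    args = [
        "docker", "run", "--rm",
        "--read-only",                          # read-only root file system
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--network", "none",                    # no network access at all
        "--cpus", cpus,                         # CPU limit
        "--memory", memory,                     # memory limit
    ]
    if secret_file is not None:
        # Mount the secret as a read-only file instead of passing it via env.
        args += ["--mount",
                 f"type=bind,source={secret_file},target=/run/secrets/agent,readonly"]
    return args + [image]
```

`--network none` is the strictest option; agents that must call external APIs would instead need a dedicated network with restricted egress.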

Container Registry

The platform uses a container registry to store and distribute container images:

  • Development: Local registry or cloud provider registry
  • Production: Private registry with access controls
  • CI/CD Integration: Automated builds and pushes to registry
  • Versioning: Images tagged with semantic versions and git commit hashes
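
The versioning rule can be sketched as a small helper that derives both tags for an image. The exact tag format is an assumption (a plain `MAJOR.MINOR.PATCH` tag plus a 7-character commit hash); the registry host and function name are hypothetical.

```python
import re
from typing import List

# Accepts plain MAJOR.MINOR.PATCH versions only (no pre-release suffixes).
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def image_tags(registry: str, name: str, version: str, commit: str) -> List[str]:
    """Return the full image references for a release: version tag and commit tag."""
    if not SEMVER.match(version):
        raise ValueError(f"not a semantic version: {version!r}")
    repo = f"{registry}/{name}"
    return [f"{repo}:{version}", f"{repo}:{commit[:7]}"]
```

A CI job could push every reference returned here, so that a deployment can pin either a release version or an exact commit.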

Kubernetes Integration

For production deployments, containers are orchestrated with Kubernetes:

  • Deployments: Manage replica sets for stateless components
  • StatefulSets: Manage stateful components like databases
  • Services: Expose components internally and externally
  • ConfigMaps/Secrets: Manage configuration and sensitive data
  • Ingress: Route external traffic to services
  • PersistentVolumes: Manage persistent storage
  • ResourceQuotas: Limit resource usage per namespace
  • NetworkPolicies: Control network traffic between pods

Example Kubernetes deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: meta-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
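        # ${REGISTRY} and ${VERSION} are placeholders; kubectl does not expand
        # them, so substitute real values before applying this manifest.
        image: ${REGISTRY}/meta-agent/backend:${VERSION}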
        ports:
        - containerPort: 8000
        env:
        - name: DB_HOST
          valueFrom:
            configMapKeyRef:
              name: meta-agent-config
              key: db_host
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: meta-agent-secrets
              key: db_user
        resources:
          limits:
            cpu: "1"
            memory: "1Gi"
          requests:
            cpu: "500m"
            memory: "512Mi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
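
The `${REGISTRY}` and `${VERSION}` placeholders in the manifest have to be filled in before the file is applied. One possible approach, shown here as a sketch rather than the project's actual tooling, is to render the manifest with Python's `string.Template`; the function name and the example values are hypothetical.

```python
from string import Template
from typing import Mapping


def render_manifest(text: str, values: Mapping[str, str]) -> str:
    """Replace ${NAME} placeholders in a manifest with concrete values."""
    # substitute() raises KeyError when a placeholder has no value, which
    # prevents applying a manifest with an unresolved image reference.
    return Template(text).substitute(values)
```

For example, `render_manifest(manifest_text, {"REGISTRY": "registry.example.com", "VERSION": "1.4.2"})` yields a manifest that can be piped to `kubectl apply -f -`. A literal dollar sign in the manifest would need to be written as `$$`.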

Multi-Modal Container Support

For multi-modal agents (vision, audio, etc.), specialized containers are provided:

  • Vision Containers: Include OpenCV, PyTorch, TensorFlow
  • Audio Containers: Include speech recognition libraries
  • Sensor Data Containers: Include data processing libraries

Edge Deployment Containers

For edge deployments, lightweight containers are optimized for:

  • Minimal size
  • Low resource usage
  • Offline operation
  • Secure updates

See Edge Infrastructure for more details.

Container Management Scripts

Scripts for container management are located in /infra/scripts/:

  • build_images.sh - Build all container images
  • push_images.sh - Push images to registry
  • prune_images.sh - Clean up unused images
  • agent_container.sh - Build and manage agent containers

Best Practices

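  • Use multi-stage builds to minimize image size
  • Define health checks for every long-running service
  • Run containers as non-root users
  • Scan images for vulnerabilities before pushing them to the registry
  • Pin specific version tags instead of latest
  • Log to stdout/stderr so the container runtime can collect the output
  • Set CPU and memory limits for every container
  • Inject configuration per container through environment variables or mounted files rather than baking it into images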
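
For the logging practice, a common container pattern is to write structured logs to standard output and let the runtime collect them. The following is a minimal sketch of that pattern using only the standard library; the JSON field names and the `configure_logging` helper are assumptions, not the platform's actual logging setup.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Formats each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def configure_logging(level: int = logging.INFO) -> None:
    """Route all logging to stdout as JSON, replacing existing root handlers."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```

Calling `configure_logging()` once at startup is enough; `docker logs` and `kubectl logs` then show one JSON object per line.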

Last updated: 2025-04-18