Containerization

This document outlines the containerization strategy and implementation for the AI Agent Orchestration Platform.

Overview

The platform uses containerization to ensure consistent deployment across environments, isolate components, and simplify scaling. Docker is the primary containerization technology, with Kubernetes for orchestration in production environments.

Container Architecture

Core Components

The platform is divided into several containerized components:

Backend API Service: FastAPI application serving the REST API
Frontend: React application for the user interface
Workflow Engine: Temporal.io server for workflow orchestration
Database: PostgreSQL for persistent storage
Agent Execution Environment: Isolated containers for running agents
Monitoring Stack: Prometheus, Grafana, and related services

Container Relationships

Note: This is a placeholder for a container architecture diagram. The actual diagram should be created and added to the project.

Docker Configuration

Dockerfile Standards

Each component follows these Dockerfile standards:

Specific version tags for base images
Multi-stage builds to minimize image size
Non-root user for running applications
Health checks for container status monitoring
Proper signal handling for graceful shutdown
Minimal required dependencies

Example Dockerfile for the backend service:

# Build stage
FROM python:3.11-slim AS builder

WORKDIR /app

COPY requirements.txt .
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /app/wheels -r requirements.txt

# Runtime stage
FROM python:3.11-slim

WORKDIR /app

# Create non-root user
RUN addgroup --system app && adduser --system --group app

# Copy wheels from builder stage
COPY --from=builder /app/wheels /wheels
RUN pip install --no-cache-dir /wheels/*

# Copy application code
COPY . .

# Set ownership
RUN chown -R app:app /app

# Switch to non-root user
USER app

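# Health check (python:3.11-slim does not ship curl, so probe with the standard library)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)" || exit 1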

# Command
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
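
The "proper signal handling" standard listed above can be sketched in Python for a long-running worker process such as an agent runner. This is a minimal illustration, not the platform's implementation: GracefulShutdown and run_worker are hypothetical names, and the backend itself may not need it because uvicorn already handles SIGTERM and SIGINT. The exec-form CMD in the Dockerfile matters here, since it makes the application the container's main process, so the SIGTERM sent by "docker stop" reaches it directly.

```python
import signal
import time


class GracefulShutdown:
    """Turns SIGTERM/SIGINT into a flag that a worker loop can poll (hypothetical helper)."""

    def __init__(self) -> None:
        self.requested = False

    def install(self, signals=(signal.SIGTERM, signal.SIGINT)) -> None:
        # signal.signal() must be called from the main thread.
        for sig in signals:
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame) -> None:
        # Only record the request; the loop below decides when to stop.
        self.requested = True


def run_worker(shutdown: GracefulShutdown, do_work, poll_interval: float = 1.0) -> int:
    """Call do_work() until shutdown is requested; return the number of iterations."""
    iterations = 0
    while not shutdown.requested:
        do_work()  # the current unit of work always runs to completion
        iterations += 1
        time.sleep(poll_interval)
    return iterations
```

A worker would call install() once at startup and release connections or flush state after run_worker returns, before the runtime's stop timeout expires.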

Docker Compose

For local development and simple deployments, Docker Compose is used to orchestrate the containers:

version: '3.8'
services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    env_file:
      - .env
    volumes:
      - ./backend:/app
    depends_on:
      - db
      - temporal
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    env_file:
      - .env
    volumes:
      - ./frontend:/app
    depends_on:
      - backend

  db:
    image: postgres:15
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: ${DB_NAME}
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER} -d ${DB_NAME}"]
      interval: 10s
      timeout: 5s
      retries: 5

  temporal:
    image: temporalio/auto-setup:1.20.0
    ports:
      - "7233:7233"
    environment:
      - DB=postgresql
      - DB_PORT=5432
      - POSTGRES_USER=${DB_USER}
      - POSTGRES_PWD=${DB_PASSWORD}
      - POSTGRES_SEEDS=db
    depends_on:
      - db

  prometheus:
    image: prom/prometheus:v2.42.0
    ports:
      - "9090:9090"
    volumes:
      - ./infra/prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'

  grafana:
    image: grafana/grafana:9.4.7
    ports:
      - "3001:3000"
    volumes:
      - ./infra/grafana/provisioning:/etc/grafana/provisioning
      - grafana_data:/var/lib/grafana

volumes:
  pgdata:
  prometheus_data:
  grafana_data:
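
The short depends_on form used above only controls start order; it does not wait for PostgreSQL or Temporal to accept connections. One way to cover that gap, sketched here under the assumption that the backend can run a small check before it starts serving, is to poll the dependency's TCP port. The function name wait_for_port is hypothetical and not part of the platform.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that the port is open,
            # not that the service has finished its own initialization.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the Compose network the backend could call wait_for_port("db", 5432) and wait_for_port("temporal", 7233) before starting, using the service names and ports from the file above. Declaring depends_on with a health condition is an alternative where the Compose version in use supports it.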

Agent Execution Containers

Agent Isolation

Agents run in isolated containers to ensure:

Security through isolation
Resource constraints
Dependency management
Reproducible execution

Agent Container Lifecycle

Build: Agent container images are built from agent definitions
Pull: Images are pulled from registry when needed
Configure: Environment variables and volumes are configured
Run: Container is started with appropriate permissions
Monitor: Health and resource usage are monitored
Cleanup: Container is stopped and removed after execution
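
The lifecycle above can be sketched as an ordered sequence in which cleanup runs even when an earlier phase fails. This is an illustration only: run_lifecycle, PHASES, and the callables passed in are hypothetical, and the real phases would call the container runtime.

```python
from typing import Callable, Dict, List

# Phase names follow the lifecycle list above; cleanup is handled separately.
PHASES = ("build", "pull", "configure", "run", "monitor")


def run_lifecycle(steps: Dict[str, Callable[[], None]], cleanup: Callable[[], None]) -> List[str]:
    """Run each phase in order and always finish with cleanup.

    Returns the names of the completed phases, ending with "cleanup".
    If a phase raises, cleanup still runs and the exception then propagates.
    """
    completed: List[str] = []
    try:
        for phase in PHASES:
            steps[phase]()
            completed.append(phase)
    finally:
        # cleanup should be idempotent: it may run before a container exists.
        cleanup()
        completed.append("cleanup")
    return completed
```

Putting cleanup in a finally block keeps failed runs from leaving stopped containers behind.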

Agent Container Security

Read-only file system where possible
No privileged access
Network isolation
Resource limits (CPU, memory)
Secrets management via environment variables or mounted files
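
As a rough sketch of how these controls map onto the Docker CLI, the function below assembles a "docker run" argument list. The function name, default limits, and example image are assumptions made for illustration; the flags themselves are standard docker run options.

```python
from typing import List, Sequence


def build_run_args(
    image: str,
    name: str,
    cpus: float = 0.5,
    memory: str = "256m",
    network: str = "none",
    env_passthrough: Sequence[str] = (),
) -> List[str]:
    """Build a docker run command line for a locked-down agent container."""
    args = [
        "docker", "run",
        "--rm",                                 # remove the container after it exits
        "--name", name,
        "--read-only",                          # read-only root file system
        "--tmpfs", "/tmp",                      # writable scratch space only
        "--cap-drop", "ALL",                    # no extra capabilities, never --privileged
        "--security-opt", "no-new-privileges",
        "--network", network,                   # "none" disables networking entirely
        "--cpus", str(cpus),
        "--memory", memory,
    ]
    for var in env_passthrough:
        # "--env NAME" without a value copies NAME from the caller's environment,
        # so the value does not appear on the command line.
        args += ["--env", var]
    args.append(image)
    return args
```

The resulting list can be passed to subprocess.run. Agents that need network access would use a dedicated, restricted network instead of "none".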

Container Registry

The platform uses a container registry to store and distribute container images:

Development: Local registry or cloud provider registry
Production: Private registry with access controls
CI/CD Integration: Automated builds and pushes to registry
Versioning: Images tagged with semantic versions and git commit hashes
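
The versioning scheme can be illustrated with a small helper that derives both tags for an image. The "sha-" prefix, the seven-character short hash, and the function name image_tags are assumptions for this sketch; the source only states that semantic versions and git commit hashes are used.

```python
import re
from typing import List

# Semantic version without build metadata: "+" is not a valid character in an image tag.
_SEMVER = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$")
_COMMIT = re.compile(r"^[0-9a-f]{7,40}$")


def image_tags(repository: str, version: str, commit: str) -> List[str]:
    """Return the version tag and the commit tag for one image."""
    if not _SEMVER.match(version):
        raise ValueError(f"not a semantic version: {version!r}")
    if not _COMMIT.match(commit):
        raise ValueError(f"not a git commit hash: {commit!r}")
    return [
        f"{repository}:{version}",
        f"{repository}:sha-{commit[:7]}",
    ]
```

Rejecting anything that is not a semantic version also keeps floating tags such as "latest" out of the registry.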

Kubernetes Integration

For production deployments, containers are orchestrated with Kubernetes:

Deployments: Manage replica sets for stateless components
StatefulSets: Manage stateful components like databases
Services: Expose components internally and externally
ConfigMaps/Secrets: Manage configuration and sensitive data
Ingress: Route external traffic to services
PersistentVolumes: Manage persistent storage
ResourceQuotas: Limit resource usage per namespace
NetworkPolicies: Control network traffic between pods

Example Kubernetes deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: meta-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
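        # ${REGISTRY} and ${VERSION} are placeholders; kubectl does not expand them, so substitute them before applying
        image: ${REGISTRY}/meta-agent/backend:${VERSION}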
        ports:
        - containerPort: 8000
        env:
        - name: DB_HOST
          valueFrom:
            configMapKeyRef:
              name: meta-agent-config
              key: db_host
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: meta-agent-secrets
              key: db_user
        resources:
          limits:
            cpu: "1"
            memory: "1Gi"
          requests:
            cpu: "500m"
            memory: "512Mi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
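
The manifest contains ${REGISTRY} and ${VERSION} placeholders, which kubectl applies literally unless something substitutes them first. How the platform renders them is not stated here; the sketch below shows one simple option using Python's string.Template, with render_manifest as a hypothetical helper name. Tools such as envsubst, Kustomize, or Helm serve the same purpose.

```python
from string import Template
from typing import Mapping


def render_manifest(text: str, values: Mapping[str, str]) -> str:
    """Replace ${NAME} placeholders in a manifest.

    Raises KeyError when a placeholder has no value, so an unrendered
    image reference is caught before it reaches the cluster.
    A literal "$" elsewhere in the manifest must be written as "$$".
    """
    return Template(text).substitute(values)
```

The rendered text can then be written to a file or piped to "kubectl apply -f -".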

Multi-Modal Agent Containers

For multi-modal agents (vision, audio, and so on), specialized containers are provided:

Vision Containers: Include OpenCV, PyTorch, TensorFlow
Audio Containers: Include speech recognition libraries
Sensor Data Containers: Include data processing libraries

Edge Deployment Containers

For edge deployments, lightweight containers are optimized for:

Minimal size
Low resource usage
Offline operation
Secure updates

See Edge Infrastructure for more details.

Container Management Scripts

Scripts for container management are located in /infra/scripts/:

build_images.sh - Build all container images
push_images.sh - Push images to registry
prune_images.sh - Clean up unused images
agent_container.sh - Build and manage agent containers

Best Practices

Use multi-stage builds to minimize image size
Implement proper health checks
Run containers as non-root users
Scan images for vulnerabilities
Use specific version tags, not latest
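Write logs to stdout and stderr so the container runtime can collect them
Set appropriate resource limits
Keep configuration out of images and supply it per container through environment variables or mounted files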

Last updated: 2025-04-18