Deployment

A production deployment is one FastAPI process (plus optional Celery workers) behind a reverse proxy, backed by Postgres and Redis. Nothing exotic — standard 12-factor plumbing.

Production checklist

Before serving traffic:

[ ] SM_ENVIRONMENT set to something other than development/test/testing.
[ ] SM_SECRET_KEY is a strong random value (not the default).
[ ] SM_DATABASE_URL points at Postgres (postgresql+asyncpg://), not SQLite.
[ ] alembic upgrade head run against the production DB.
[ ] make doctor passes (zero errors) on the built artifact.
[ ] Admin bootstrap complete — an admin user exists and can log in.
[ ] Reverse proxy forwards X-Forwarded-Proto and X-Forwarded-For; Uvicorn is started with --proxy-headers.
[ ] HTTPS in front of the app (cookie is Secure when TLS is present).
[ ] Log destination configured (SM_LOG_FORMAT=json + log shipping).
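
The first three items can be checked mechanically before the container starts. The sketch below is illustrative and is not the implementation of make doctor: DEFAULT_SECRET is a placeholder for whatever default the framework ships, the 32-character minimum is an arbitrary threshold, and treating an unset SM_ENVIRONMENT as development is an assumption.

```python
# Placeholder for the framework's default secret (assumption); substitute the real value.
DEFAULT_SECRET = "change-me"
NON_PRODUCTION = {"development", "test", "testing"}


def preflight(env):
    """Return a list of problems found in env; an empty list means the checks passed."""
    problems = []
    # Assumption: an unset SM_ENVIRONMENT behaves like "development".
    if env.get("SM_ENVIRONMENT", "development").lower() in NON_PRODUCTION:
        problems.append("SM_ENVIRONMENT must not be development/test/testing")
    secret = env.get("SM_SECRET_KEY", "")
    # The 32-character minimum is an illustrative threshold, not a framework rule.
    if secret in ("", DEFAULT_SECRET) or len(secret) < 32:
        problems.append("SM_SECRET_KEY must be a strong random value")
    if not env.get("SM_DATABASE_URL", "").startswith("postgresql+asyncpg://"):
        problems.append("SM_DATABASE_URL must use postgresql+asyncpg://")
    return problems
```

Calling preflight(os.environ) from an entrypoint script and exiting non-zero when the list is non-empty would stop a misconfigured container before it serves traffic.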

Build

Typical Docker build:

dockerfile

FROM python:3.12-slim AS builder
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN pip install uv && uv sync --frozen --all-packages --no-dev
COPY . .
RUN uv run --project host python -m compileall host modules framework packages

# Build the frontend
FROM node:20-slim AS frontend
WORKDIR /app
COPY package.json package-lock.json ./
COPY host/client_app host/client_app
COPY packages packages
COPY modules modules
RUN npm ci
RUN npm run build

FROM python:3.12-slim
WORKDIR /app
# uv is installed in the runtime image as well, because CMD invokes it
RUN pip install uv
COPY --from=builder /app /app
COPY --from=frontend /app/host/client_app/dist /app/host/client_app/dist
CMD ["uv", "run", "--project", "host", "uvicorn", "host.main:app", "--host", "0.0.0.0", "--port", "8000", "--proxy-headers"]

Tune the worker count with --workers N on multi-CPU hosts, or run the app under a process manager such as Gunicorn with Uvicorn workers.

Running migrations on deploy

Don't migrate from inside the web container's startup hook — that way lies races when scaling up. Run it as a one-shot job before rolling the web tier:

bash

# one-shot container
uv run --project host alembic upgrade head

# then roll web tier
kubectl rollout restart deployment/web

The boot-time migration check (SM010) will fail cleanly if web starts against an unmigrated DB.
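
How SM010 is implemented is not shown here. As a rough sketch, a check of this kind can compare the revisions recorded in the database with the heads of the migration scripts. The function names, the alembic.ini path, and the use of a synchronous database URL are assumptions made for illustration; the Alembic calls themselves (ScriptDirectory.get_heads, MigrationContext.get_current_heads) are part of its public API.

```python
def pending_migrations(current, heads):
    """Return the head revisions that the database has not reached yet."""
    return sorted(set(heads) - set(current))


def read_revisions(sync_database_url, alembic_ini="alembic.ini"):
    """Read (current, heads) from the database and the migration scripts.

    Assumption: sync_database_url uses a synchronous driver, since
    create_engine cannot use an asyncpg URL.
    """
    # Imported here so pending_migrations has no third-party dependencies.
    from alembic.config import Config
    from alembic.runtime.migration import MigrationContext
    from alembic.script import ScriptDirectory
    from sqlalchemy import create_engine

    heads = ScriptDirectory.from_config(Config(alembic_ini)).get_heads()
    with create_engine(sync_database_url).connect() as conn:
        current = MigrationContext.configure(conn).get_current_heads()
    return current, heads
```

A startup hook could call read_revisions, pass the result to pending_migrations, and refuse to boot when the returned list is non-empty.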

Process topology

For small deployments, one web container is plenty. Scale by adding:

Web replicas — stateless; they share state via DB and Redis. Session cookies are signed, so any replica can serve any request.
Celery workers — SM_MODULES_ENABLED=background_tasks,<modules with tasks> keeps workers lean. Run uv run --project host celery -A background_tasks.celery_app worker -l info.
Celery beat (scheduler) — exactly one instance, separate deployment.

Observability

Logs

SM_LOG_FORMAT=json emits structured logs. Every line has a correlation_id (set by CorrelationIdMiddleware). Shipping to a central store (Loki / CloudWatch / Datadog) lets you trace a request across web and worker.
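
To illustrate how a correlation ID can end up on every log line, here is a minimal sketch built on the standard library. It is not the framework's formatter: the correlation_id context variable and the JsonFormatter class are hypothetical names, and the real CorrelationIdMiddleware may store the ID differently.

```python
import contextvars
import json
import logging

# Hypothetical context variable; a middleware would set it once per request.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object carrying the current correlation ID."""

    def format(self, record):
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
                "correlation_id": correlation_id.get(),
            }
        )
```

Attaching the formatter with handler.setFormatter(JsonFormatter()) and setting correlation_id at the start of each request (and each task, on the worker side) gives the log store a single field to join on.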

Health checks

/health: Kubernetes liveness probe. Returns 200 if the process is alive; no dependency checks.
/admin/health: aggregated health across every module's registered checks. Returns 503 if any critical check fails. Gate it behind admin.health.view if you don't want it public.
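
The aggregation rule (503 as soon as one critical check fails) can be sketched as follows. The Check dataclass and the aggregate function are hypothetical and do not reflect the framework's registration API; treating a check that raises as a failure is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    run: Callable[[], bool]  # returns True when the dependency is healthy
    critical: bool = True


def aggregate(checks):
    """Run every check and return (http_status, per-check results)."""
    results = {}
    status = 200
    for check in checks:
        try:
            ok = bool(check.run())
        except Exception:
            ok = False  # assumption: a check that raises counts as a failure
        results[check.name] = "ok" if ok else "fail"
        if not ok and check.critical:
            status = 503
    return status, results
```

Non-critical failures still show up in the results, so an operator can see them without the endpoint taking the instance out of rotation.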

Metrics

Not built in: the framework doesn't expose Prometheus metrics out of the box. If you need them, instrument the FastAPI app with prometheus-fastapi-instrumentator in your host bootstrap. It records per-endpoint request counts and latencies and can expose them on a /metrics endpoint for Prometheus to scrape.
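
A minimal sketch of that wiring, assuming the package is installed and that the host bootstrap has access to the FastAPI app object. The enable_metrics helper is a hypothetical name; Instrumentator().instrument(app).expose(app) is the library's documented entry point and serves /metrics by default.

```python
def enable_metrics(app):
    """Instrument a FastAPI app and expose /metrics; return False if the package is missing."""
    # Imported lazily so the host still starts when the optional package is not installed.
    try:
        from prometheus_fastapi_instrumentator import Instrumentator
    except ImportError:
        return False
    Instrumentator().instrument(app).expose(app)
    return True
```

Returning False instead of raising keeps metrics optional; a stricter deployment might prefer to fail loudly when the dependency is absent.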

Session cookie

SessionMiddleware uses SM_SECRET_KEY for signing. The cookie is Secure, HttpOnly, and SameSite=Lax by default. Rotate the secret by:

Deploying with a new SM_SECRET_KEY.
Expected behavior: existing sessions become invalid. Users sign back in.

There's no built-in key-rollover mechanism — if you need zero-downtime session rotation, fork SessionMiddleware to accept a list of keys (first = active, rest = verify-only).
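
The key-list idea can be sketched with plain HMAC. This is not the middleware's actual signing scheme or token format; sign and verify are hypothetical helpers that only show the "first key signs, every key verifies" rule.

```python
import hashlib
import hmac
from typing import List, Optional


def sign(payload: bytes, keys: List[str]) -> bytes:
    """Sign with the first (active) key and append the hex digest."""
    digest = hmac.new(keys[0].encode(), payload, hashlib.sha256).hexdigest()
    return payload + b"." + digest.encode()


def verify(token: bytes, keys: List[str]) -> Optional[bytes]:
    """Return the payload if any key validates the signature, else None."""
    payload, sep, digest = token.rpartition(b".")
    if not sep:
        return None  # no signature segment at all
    for key in keys:
        expected = hmac.new(key.encode(), payload, hashlib.sha256).hexdigest().encode()
        # compare_digest avoids leaking how many leading characters matched.
        if hmac.compare_digest(expected, digest):
            return payload
    return None
```

Deploying with keys = [new, old] keeps existing sessions valid while new cookies are signed with the new key; once the old sessions have expired, the old key can be dropped from the list.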

CSRF

Relies on SameSite=Lax. Browsers don't attach the cookie to cross-site POST/PUT/DELETE, so forged submissions land unauthenticated and get rejected at the permission check. No token middleware, no per-form token.

Caveat: SameSite=Lax does attach on top-level navigations. If you have state-changing GETs (you shouldn't — but), that's an attack surface. Keep side effects out of GET handlers.

Reverse proxy

Minimum nginx:

nginx

upstream app { server app:8000; }

server {
    listen 443 ssl http2;
    server_name your-domain.com;

    # TLS config here

    location / {
        proxy_pass http://app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

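Start uvicorn with --proxy-headers so it trusts X-Forwarded-Proto and X-Forwarded-For from the proxy, both for scheme detection and for client IP logging. Uvicorn only honors these headers from addresses listed in --forwarded-allow-ips (default 127.0.0.1), so set that option to the proxy's address when nginx runs on a different host or container.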

Static assets

The Vite build outputs to host/client_app/dist/. Serve these via the reverse proxy directly (not through Uvicorn) for better performance:

nginx

location /build/ {
    alias /var/www/app/host/client_app/dist/;
    access_log off;
    expires 1y;
    add_header Cache-Control "public, immutable";
}

Vite emits filenames with content hashes, so long cache TTLs are safe.

Zero-downtime deploys

1. Build the new image.
2. Run alembic upgrade head as a one-shot job.
3. kubectl rollout restart deployment/web (rolling).
4. Watch logs for SM010 (would mean step 2 failed silently).
5. Rollback path: kubectl rollout undo brings the old pods back up against the new DB, so keep migrations backward-compatible for at least one release cycle (see the expand/contract pattern below).
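
One way to keep the migration step from failing silently is to drive both commands from a script that aborts on the first non-zero exit code. The sketch below reuses the commands shown earlier; running the migration from the same machine as kubectl, rather than as a separate one-shot container, is an assumption made to keep the example short.

```python
import subprocess

MIGRATE = ["uv", "run", "--project", "host", "alembic", "upgrade", "head"]
ROLL = ["kubectl", "rollout", "restart", "deployment/web"]


def deploy(run=subprocess.run):
    """Migrate, then roll the web tier; never reach the rollout if the migration fails."""
    executed = []
    for command in (MIGRATE, ROLL):
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        run(command, check=True)
        executed.append(command)
    return executed
```

The run parameter exists only so the sequence can be exercised with a fake runner in tests; in a pipeline, deploy() with the default is enough.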

Expand / contract migrations

Release N: add the new column as nullable.
Release N+1: start writing to the new column; still read from both.
Release N+2: backfill any remaining rows, stop reading the old column, and make the new one NOT NULL.
Release N+3: drop the old column.

Each release is deployable on its own. Each rollback goes to a compatible previous release.
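
The first three phases can be tried out against an in-memory SQLite database. This is a sketch only: production runs on Postgres through Alembic migrations, the users table with its name and display_name columns is hypothetical, and writing to both columns during release N+1 is an assumption that keeps a rollback to release N safe.

```python
import sqlite3


def expand(conn):
    """Release N: add the new column as nullable so older code keeps working."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def write_both(conn, user_id, value):
    """Release N+1: write the new column while keeping the old one in sync."""
    conn.execute(
        "UPDATE users SET name = ?, display_name = ? WHERE id = ?",
        (value, value, user_id),
    )


def read_name(conn, user_id):
    """Release N+1: prefer the new column and fall back to the old one."""
    row = conn.execute(
        "SELECT COALESCE(display_name, name) FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def backfill(conn):
    """Before release N+2: copy old values into rows the new code has not written."""
    conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")
```

Once backfill has run and no NULLs remain, the NOT NULL constraint in release N+2 can be added without rejecting existing rows.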

Disaster recovery

DB backups — Postgres point-in-time recovery via WAL archiving. Standard RDS / Cloud SQL options cover this.
Secrets — SM_SECRET_KEY must be recoverable from your secrets manager. Rotating it invalidates all sessions but doesn't break anything else.
Sessions — lost sessions mean users re-login; acceptable.
Uploads — the file_storage module's local backend is ephemeral. For production, point it at S3 or a network filesystem and back that up independently.

Publishing releases

See docs/release.md for the full Python + JS publishing flow (OIDC Trusted Publishing on PyPI, NPM_TOKEN on npm, driven from GitHub Actions).
