Self-Hosting Zero

To self-host Zero, you will need to deploy zero-cache, a Postgres database, your frontend, and your API server.

Zero-cache is made up of two main components:

  1. One or more view-syncers: serving client queries using a SQLite replica.
  2. One replication-manager: bridge between the Postgres replication stream and view-syncers.

These components have the following characteristics:

| | Replication Manager | View Syncer |
|---|---|---|
| Owns replication slot | ✅ | ❌ |
| Serves client queries | ❌ | ✅ |
| Backs up replica | ✅ (required in multi-node) | ❌ |
| Restores from backup | Optional | Required |
| Subscribes to changes | N/A (produces) | ✅ |
| CVR management | ❌ | ✅ |
| Number deployed | 1 | N (horizontal scale) |

Alongside zero-cache, you will need a Postgres database, your frontend, and an API server that hosts the query and mutate endpoints.

Before setting up Postgres, read Connecting to Postgres for provider-specific notes.
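Whichever provider you use, zero-cache requires logical replication on the upstream database. A quick sanity check, assuming psql is installed and substituting your own connection string for the example one used in the configuration below:

```shell
# Confirm the upstream database has logical replication enabled.
psql "postgres://postgres:pass@upstream-db:5432/zero" -tAc "SHOW wal_level;"
# Should print: logical
```

If it prints anything else, set wal_level=logical (as the example Postgres service below does via its command line) and restart the database before starting zero-cache.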

Minimum Viable Strategy

The simplest way to deploy Zero is to run everything on a single node. This is the least expensive way to run Zero, and it can take you surprisingly far.

[Diagram: single-node deployment topology]

Here is a single-node configuration for Docker Compose; the same settings translate directly to other deployment targets:

services:
  zero-cache:
    image: rocicorp/zero:1.4.0
    ports:
      - 4848:4848
    stop_grace_period: 10m
    environment:
      # Used for replication from postgres
      # This *must* be a direct connection (not via pgbouncer)
      ZERO_UPSTREAM_DB: postgres://postgres:pass@upstream-db:5432/zero
      # Used for storing client view records
      # Use a pooler in production
      ZERO_CVR_DB: postgres://postgres:pass@upstream-db:5432/zero
      # Used for storing recent replication log entries
      # Use a pooler in production
      ZERO_CHANGE_DB: postgres://postgres:pass@upstream-db:5432/zero
      # Path to the SQLite replica
      ZERO_REPLICA_FILE: /data/replica.db
      # Password used to access the inspector and /statz
      ZERO_ADMIN_PASSWORD: pickanewpassword
      # URLs for your API /query and /mutate endpoints
      ZERO_QUERY_URL: https://api.example.com/api/zero/query
      ZERO_MUTATE_URL: https://api.example.com/api/zero/mutate
      ZERO_ENABLE_CRUD_MUTATIONS: 'false'
    volumes:
      - zero-cache-data:/data
    healthcheck:
      test: curl -f http://localhost:4848/keepalive
      interval: 5s
      start_period: 10m
 
  upstream-db:
    image: postgres:18
    environment:
      POSTGRES_DB: zero
      POSTGRES_PASSWORD: pass
    ports:
      - 5432:5432
    command: postgres -c wal_level=logical
    healthcheck:
      test: pg_isready
      interval: 10s

These snippets only show the zero-cache side of the deployment. The API behind ZERO_QUERY_URL and ZERO_MUTATE_URL can live anywhere zero-cache can reach.

Maximal Strategy

Once you reach the limits of the single-node deployment, you can split zero-cache into a multi-node topology. This is more expensive to run, but it gives you more flexibility and scalability.

[Diagram: multi-node deployment topology]

Here is a multi-node configuration for Docker Compose implementing this topology; the same settings translate directly to other deployment targets:

services:
  replication-manager:
    image: rocicorp/zero:1.4.0
    # Do not expose the replication-manager to the public internet -
    # it should only be reachable by view-syncers
    expose:
      - 4849
    stop_grace_period: 10m
    depends_on:
      upstream-db:
        condition: service_healthy
    environment:
      ZERO_UPSTREAM_DB: postgres://postgres:pass@upstream-db:5432/zero
      ZERO_CVR_DB: postgres://postgres:pass@upstream-db:5432/zero
      ZERO_CHANGE_DB: postgres://postgres:pass@upstream-db:5432/zero
      ZERO_REPLICA_FILE: /data/replica.db
      ZERO_ADMIN_PASSWORD: pickanewpassword
      ZERO_NUM_SYNC_WORKERS: 0
      ZERO_LITESTREAM_BACKUP_URL: s3://acme-zero-backups/v1
    volumes:
      - replication-manager-data:/data
    healthcheck:
      test: curl -f http://localhost:4849/keepalive
      interval: 5s
      start_period: 10m
 
  view-syncer:
    image: rocicorp/zero:1.4.0
    ports:
      - 4848:4848
    stop_grace_period: 10m
    depends_on:
      replication-manager:
        condition: service_healthy
    environment:
      ZERO_UPSTREAM_DB: postgres://postgres:pass@upstream-db:5432/zero
      ZERO_CVR_DB: postgres://postgres:pass@upstream-db:5432/zero
      ZERO_CHANGE_DB: postgres://postgres:pass@upstream-db:5432/zero
      ZERO_REPLICA_FILE: /data/replica.db
      ZERO_ADMIN_PASSWORD: pickanewpassword
      ZERO_QUERY_URL: https://api.example.com/api/zero/query
      ZERO_MUTATE_URL: https://api.example.com/api/zero/mutate
      ZERO_ENABLE_CRUD_MUTATIONS: 'false'
      ZERO_CHANGE_STREAMER_URI: ws://replication-manager:4849/
    volumes:
      - view-syncer-data:/data
    healthcheck:
      test: curl -f http://localhost:4848/keepalive
      interval: 5s
      start_period: 10m
 
  upstream-db:
    image: postgres:18
    environment:
      POSTGRES_DB: zero
      POSTGRES_PASSWORD: pass
    ports:
      - 5432:5432
    command: postgres -c wal_level=logical
    healthcheck:
      test: pg_isready
      interval: 10s

In multi-node deployments, keep ZERO_LITESTREAM_BACKUP_URL on the replication-manager only and point it at an AWS S3 bucket.

The view-syncers in the multi-node topology can be horizontally scaled as needed.

If restores or initial syncs take a while, configure your orchestrator to allow a startup grace period before treating startup checks as a failure. Ten minutes is a good default for most apps. For example, Docker Compose uses healthcheck.start_period, Fly.io uses grace_period, and ECS services can use healthCheckGracePeriodSeconds. Increase it if replica restore or initial sync routinely takes longer.

Likewise, during deploys, give zero-cache, replication-manager, and view-syncer a generous shutdown grace period so they can finish cleanup and drain websocket connections.

Replica Lifecycle

Zero-cache is backed by a SQLite replica of your database. The SQLite replica uses upstream Postgres as the source of truth. If the replica is missing or a litestream restore fails, the replication-manager will resync the replica from upstream on the next start.
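The startup decision described above reduces to a fallback chain. A sketch with hypothetical names (the real logic lives inside zero-cache):

```typescript
// Where the replica comes from when zero-cache starts.
type ReplicaSource = "existing-file" | "litestream-restore" | "upstream-resync";

// Sketch of the startup fallback chain: reuse the replica file if present,
// otherwise try a litestream restore, otherwise resync from upstream Postgres.
function chooseReplicaSource(opts: {
  replicaFileExists: boolean;
  restoreSucceeded: boolean;
}): ReplicaSource {
  if (opts.replicaFileExists) return "existing-file";
  if (opts.restoreSucceeded) return "litestream-restore";
  // Missing replica and failed restore: rebuild from the source of truth.
  return "upstream-resync";
}
```

The important operational consequence is the last branch: an upstream resync is always available, but it is the slowest path, which is why startup health checks need a generous grace period.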

Performance

Optimize disk IOPS for the serving replica: this is the file the view-syncers read to run IVM-based queries, and it is one of the main bottlenecks for query hydration performance. A view-syncer's IVM model is "hydrate once, then incrementally push diffs" through the ZQL pipeline, so performance is mostly about:

  1. How fast the server can materialize a subscription the first time (hydration).
  2. How fast it can keep it up to date (IVM advancement).

Different bottlenecks dominate each phase.

Hydration

  • SQLite read cost: hydration is essentially "run the query against the replica and stream all matching rows into the pipeline", so it's bounded by SQLite scan/index performance + result size.
  • Churn / TTL eviction: if queries get evicted (inactive long enough) and then get re-requested, you pay hydration again.
  • Custom query transform latency: the HTTP request from zero-cache to your API at ZERO_QUERY_URL does transform/authorization for queries, adding network + CPU before hydration starts.

IVM advancement

  • Replication throughput: the view-syncer can only advance when the replicator commits and emits version-ready. If upstream replication is behind, query advancement is capped by how fast the replica advances.
  • Change volume per transaction: advancement cost scales with number of changed rows, not number of queries.
  • Circuit breaker behavior: if advancement looks like it'll take longer than rehydrating, zero-cache intentionally aborts and resets pipelines (which trades "slow incremental" for "rehydrate").
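The circuit breaker in the last bullet is essentially a cost comparison. A sketch with hypothetical names (the real heuristic inside zero-cache may weigh more signals):

```typescript
// Sketch of the advancement circuit breaker: if incrementally advancing the
// pipelines looks more expensive than rebuilding them from scratch, abort
// advancement and rehydrate instead.
function shouldResetPipelines(
  estimatedAdvanceMs: number,
  estimatedRehydrateMs: number,
): boolean {
  return estimatedAdvanceMs > estimatedRehydrateMs;
}
```

This is why a burst of very large upstream transactions can show up as rehydration load on the view-syncers rather than slow incremental updates.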

System-level

  • Number of client groups per sync worker: each client group has its own pipelines; CPU and memory per group limits how many can be "fast" at once. Since Node is single-threaded, one client group can technically starve other groups. This is handled with time slicing and can be configured with the yield parameters, e.g. ZERO_YIELD_THRESHOLD_MS.
  • SQLite concurrency limits: the replica is used with one writer (the replicator) and many concurrent readers (view-syncer snapshots). This scales well, but very heavy read workloads can still contend on cache and IO.
  • Network to clients: even if IVM is fast, it can take time to send data over websocket. This can be improved by using CDNs (like CloudFront) that improve routing.
  • Network between services: for a single-region deployment, all services should be colocated.

Networking

View syncers must be publicly reachable by clients on port 4848. The replication-manager must only be reachable by view-syncers over your private network on port 4849.

The external load balancer for view-syncers must support websockets, and can use the health check at /keepalive to verify view-syncers are healthy. The replication-manager should also have a /keepalive health check, but that check should run through private infrastructure rather than a public load balancer.

Sticky Sessions

View syncers are designed to be disposable, but since they keep hydrated query pipelines in memory, it's important to keep clients connected to the same instance. If a reconnect or refresh lands on a different instance, that instance usually has to rehydrate instead of reusing warm state.

Without sticky sessions, two instances can end up doing redundant hydration and advancement work for the same clientGroupID, and the "loser" will eventually force clients to reconnect. If you are seeing a lot of Rehome errors, enable sticky sessions in your load balancer.
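Most load balancers provide stickiness natively via cookie- or IP-based affinity, but the idea can also be approximated by hashing the clientGroupID to pick an instance, which keeps a given client group on the same view-syncer as long as the instance set is stable. A sketch with hypothetical names:

```typescript
// Deterministically route a client group to one view-syncer instance.
// Uses FNV-1a; any stable hash works.
function pickInstance(clientGroupID: string, instances: string[]): string {
  let hash = 0x811c9dc5;
  for (const ch of clientGroupID) {
    hash ^= ch.charCodeAt(0);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return instances[hash % instances.length];
}
```

Note that simple modulo hashing reshuffles most client groups when the instance set changes; consistent hashing reduces that churn, at the cost of a warm-cache miss only for the groups that actually move.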

Rolling Updates

Zero supports zero-downtime updates by rolling out changes in the following order:

  1. Upgrade replication-manager and wait for it to start up.
  2. Upgrade view-syncers (if they come up before the replication-manager, they'll sit in retry loops until the manager is updated).
  3. Update the API servers (your mutate and query endpoints).
  4. Update client(s).
  5. After most clients have refreshed, run contract migrations to drop or rename obsolete columns/tables.

Client/Server Version Compatibility

Servers are compatible with any client of the same major version, and with clients one major version back.

For example, server 2.2.0 is compatible with:

  • Client 2.3.0 (same major version)
  • Client 2.1.0 (same major version)
  • Client 1.0.0 (previous major version)

But server 2.2.0 is not compatible with:

  • Client 3.0.0 (next major version)
  • Client 0.1.0 (two major versions back)
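The rule above reduces to a one-line check on major versions. A sketch (version parsing and the actual enforcement are up to Zero itself):

```typescript
// A server supports clients of its own major version and one major back.
function isClientCompatible(serverMajor: number, clientMajor: number): boolean {
  return clientMajor === serverMajor || clientMajor === serverMajor - 1;
}
```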

To upgrade Zero to a new major version, first deploy the new zero-cache, then the new frontend.

Configuration

The zero-cache image is configured via environment variables. See zero-cache Config for available options.