OpenTelemetry

The zero-cache service embeds the JavaScript OTLP Exporter and can send logs, traces, and metrics to any standard otel collector.

To enable otel, set the following environment variables, then run zero-cache as normal:

OTEL_EXPORTER_OTLP_ENDPOINT="<your otel endpoint>"
OTEL_EXPORTER_OTLP_HEADERS="<auth headers from your otel collector>"
OTEL_RESOURCE_ATTRIBUTES="<resource attributes from your otel collector>"
OTEL_NODE_RESOURCE_DETECTORS="env,host,os"
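
A malformed value here usually shows up only as missing telemetry, so it can help to sanity-check the variables before starting zero-cache. The sketch below is not part of Zero: checkOtelEnv and parseKeyValueList are hypothetical helper names. It assumes the standard OpenTelemetry convention that OTEL_EXPORTER_OTLP_HEADERS and OTEL_RESOURCE_ATTRIBUTES are comma-separated key=value lists, and that the endpoint is an http(s) URL.

```typescript
// Parses a comma-separated key=value list, the format OpenTelemetry uses for
// OTEL_EXPORTER_OTLP_HEADERS and OTEL_RESOURCE_ATTRIBUTES.
function parseKeyValueList(raw: string): Record<string, string> {
  const result: Record<string, string> = {}
  for (const pair of raw.split(',')) {
    const trimmed = pair.trim()
    if (trimmed === '') continue
    // Split on the first '=' only, so values such as base64 tokens keep their '='.
    const eq = trimmed.indexOf('=')
    if (eq <= 0) {
      throw new Error(`malformed entry (expected key=value): "${trimmed}"`)
    }
    result[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim()
  }
  return result
}

// Hypothetical helper: returns a list of problems; an empty list means the
// settings look usable. It does not contact the collector.
function checkOtelEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = []
  const endpoint = env['OTEL_EXPORTER_OTLP_ENDPOINT']
  if (!endpoint) {
    problems.push('OTEL_EXPORTER_OTLP_ENDPOINT is not set')
  } else if (!/^https?:\/\/\S+$/.test(endpoint)) {
    problems.push('OTEL_EXPORTER_OTLP_ENDPOINT is not an http(s) URL')
  }
  for (const name of ['OTEL_EXPORTER_OTLP_HEADERS', 'OTEL_RESOURCE_ATTRIBUTES']) {
    const raw = env[name]
    if (!raw) continue // treated as optional in this sketch
    try {
      parseKeyValueList(raw)
    } catch (e) {
      problems.push(`${name}: ${(e as Error).message}`)
    }
  }
  return problems
}
```

In a Node script this could be called as checkOtelEnv(process.env) before launching zero-cache, printing any problems it returns.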

Grafana Cloud Walkthrough

The following steps set up Grafana Cloud; the setup for other otel collectors should be similar.

  1. Sign up for Grafana Cloud (Free Tier)
  2. Click Connections > Add Connection in the left sidebar
  3. Search for "OpenTelemetry" and select it
  4. Click "Quickstart"
  5. Select "JavaScript"
  6. Create a new token
  7. Copy the environment variables into your .env file or similar
  8. Start zero-cache
  9. Look for logs under "Drilldown" > "Logs" in the left sidebar

Distributed Tracing

You can enable end-to-end trace correlation from your frontend through zero-cache to your API server. This allows you to see the full request flow in your tracing UI.

To enable this, provide a getTraceparent callback when creating your Zero client:

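import {ZeroProvider} from '@rocicorp/zero/react'
import {propagation, context} from '@opentelemetry/api'

// Serializes the active trace context into a carrier object and returns its
// W3C traceparent value. The value is undefined if nothing was injected,
// for example when no span is active.
function getTraceparent() {
  const carrier: Record<string, string> = {}
  propagation.inject(context.active(), carrier)
  return carrier.traceparent
}

// Pass the callback to ZeroProvider alongside your other options.
function Root() {
  return (
    <ZeroProvider
      /* ... other options ... */
      getTraceparent={getTraceparent}
    >
      <App />
    </ZeroProvider>
  )
}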

The callback is invoked before the client sends a WebSocket message that triggers an API server call (push, changeDesiredQueries, initConnection). The returned W3C traceparent value is forwarded through zero-cache to your API server, where it can be used to continue the trace.
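
On the API server side, it can be useful to validate the forwarded value before logging it or attaching it to a trace. The sketch below parses the W3C traceparent layout (version-traceid-parentid-flags, all lowercase hex). parseTraceparent and the TraceParent type are illustrative names, not part of Zero or OpenTelemetry, and how your server framework exposes the incoming value is left as an assumption.

```typescript
// Fields of a W3C traceparent value (version 00 layout).
interface TraceParent {
  version: string
  traceId: string // 32 hex characters
  parentId: string // 16 hex characters
  sampled: boolean // lowest bit of the trace-flags byte
}

const TRACEPARENT_RE =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

// Returns the parsed fields, or undefined if the value is missing or invalid.
// This sketch only accepts the four-field layout defined for version 00.
function parseTraceparent(header: string | undefined): TraceParent | undefined {
  if (!header) return undefined
  const match = TRACEPARENT_RE.exec(header.trim())
  if (!match) return undefined
  const [, version, traceId, parentId, flags] = match
  // The spec forbids version "ff" and all-zero trace or parent IDs.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) {
    return undefined
  }
  return {
    version,
    traceId,
    parentId,
    sampled: (parseInt(flags, 16) & 1) === 1,
  }
}
```

In a server that is already instrumented with OpenTelemetry, you would more likely pass the incoming headers to propagation.extract from @opentelemetry/api and let the SDK continue the trace; a standalone parser like this is mainly useful for logging and debugging.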

Metrics Reference

zero.server

Metric  Type   Unit  Description
uptime  Gauge  s     Cumulative uptime, starting from when requests are served

zero.replica

Metric      Type   Unit   Description
db_size     Gauge  bytes  Size of the replica's main db file (excludes WAL)
wal_size    Gauge  bytes  Size of the replica's WAL file
wal2_size   Gauge  bytes  Size of the replica's WAL2 file (only if using wal2 mode)
backup_lag  Gauge  ms     Time since last litestream backup. Expected to sawtooth from 0 to ZERO_LITESTREAM_INCREMENTAL_BACKUP_INTERVAL_MINUTES
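
Note that backup_lag is reported in milliseconds while the backup interval is configured in minutes, so an alert threshold needs a unit conversion. A minimal sketch, assuming you read the metric value from your own metrics backend: the function names and the slack factor of 2 are illustrative choices, not values defined by Zero.

```typescript
const MS_PER_MINUTE = 60_000

// Converts the configured backup interval (minutes) into an alert threshold
// in milliseconds. slackFactor leaves headroom above the expected sawtooth
// peak; 2 is an arbitrary default for this sketch.
function backupLagAlertThresholdMs(
  intervalMinutes: number,
  slackFactor = 2,
): number {
  if (!(intervalMinutes > 0) || !(slackFactor >= 1)) {
    throw new RangeError('intervalMinutes must be > 0 and slackFactor >= 1')
  }
  return intervalMinutes * MS_PER_MINUTE * slackFactor
}

// True when the observed backup_lag (ms) exceeds the threshold, which
// suggests backups have stopped rather than merely being mid-cycle.
function isBackupStalled(
  backupLagMs: number,
  intervalMinutes: number,
  slackFactor = 2,
): boolean {
  return backupLagMs > backupLagAlertThresholdMs(intervalMinutes, slackFactor)
}
```

With a 15-minute interval, for example, backupLagAlertThresholdMs(15) is 1,800,000 ms, so a reading of 600,000 ms is within the normal sawtooth.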

zero.replication

Metric        Type     Unit  Description
upstream_lag  Gauge    ms    Latency from sending a replication report to receiving it in the stream
replica_lag   Gauge    ms    Latency from receiving a replication report to it reaching the replica
total_lag     Gauge    ms    End-to-end replication latency. Grows as an estimate if the next report hasn't arrived
events        Counter  -     Number of replication events processed
transactions  Counter  -     Count of replicated transactions

zero.sync

Metric                             Type           Unit  Description
max-protocol-version               Gauge          -     Highest sync protocol version seen from connecting clients
active-clients                     UpDownCounter  -     Number of currently connected sync clients
active-client-groups               Gauge          -     Number of active ViewSyncerService instances in a syncer worker
queries                            Gauge          -     Active IVM pipelines across all client groups in a syncer worker
rows                               Gauge          -     CVR-tracked rows across all client groups in a syncer worker
lock-wait-time                     Histogram      s     Time spent waiting to acquire the ViewSyncerService lock per operation
pipeline-resets                    Counter        -     Count of pipeline resets. Has a reason attribute: advancement-timeout, scalar-subquery, schema-change, truncation, permissions-change
hydration                          Counter        -     Number of query hydrations
hydration-time                     Histogram      s     Time to hydrate a query
advance-time                       Histogram      s     Time to advance all queries for a client group after applying a transaction
poke.time                          Histogram      s     Time per poke transaction (excludes canceled/noop pokes)
poke.transactions                  Counter        -     Count of poke transactions
poke.rows                          Counter        -     Count of poked rows
cvr.flush-time                     Histogram      s     Time to flush a CVR transaction
cvr.rows-flushed                   Counter        -     Number of changed rows flushed to a CVR
ivm.advance-time                   Histogram      s     Time to advance IVM queries in response to a single change
ivm.conflict-rows-deleted          Counter        -     Rows deleted because they conflicted with an added row
query.transformations              Counter        -     Number of query transformations performed
query.transformation-time          Histogram      s     Time to transform custom queries via API server
query.transformation-hash-changes  Counter        -     Times a query transformation hash changed
query.transformation-no-ops        Counter        -     Times a query transformation was a no-op

zero.mutation

Metric  Type     Unit  Description
crud    Counter  -     Number of CRUD mutations processed
custom  Counter  -     Number of custom mutations processed
pushes  Counter  -     Number of pushes processed