
NATS JetStream Deployment and Operations

JetStream is the durable messaging layer in NATS. Use it when you need:

  1. Persistent streams instead of transient pub/sub.
  2. Consumer acknowledgements and replay.
  3. Backpressure and retention controls.
  4. A broker that stays small enough to run and reason about without a separate queueing stack.

This article focuses on the deployment shape: how to run NATS with JetStream enabled, how to persist data safely, and how to keep the cluster observable and recoverable.

Plain NATS gives you fast message delivery, but it does not retain messages for late consumers. JetStream adds:

  1. Streams that store messages.
  2. Consumers that track delivery state.
  3. Retention policies that control when messages expire.
  4. Replay controls for recovery and reprocessing.

That makes JetStream a good fit for:

  1. Event pipelines.
  2. Work queues.
  3. Audit trails.
  4. Fan-out with durable subscribers.

It is not a replacement for every database or every workflow engine. It is a durable transport layer.
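The split between streams and consumers is easier to see in a toy model. The sketch below is an illustration, not JetStream's implementation. A hypothetical `Stream` class assigns each message a sequence number. A hypothetical `Consumer` class tracks its delivery position and its unacknowledged sequences, redelivers anything not acknowledged, and can be rewound for replay.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Toy stream: stores messages under increasing sequence numbers."""
    messages: dict = field(default_factory=dict)
    last_seq: int = 0

    def publish(self, subject: str, data: bytes) -> int:
        self.last_seq += 1
        self.messages[self.last_seq] = (subject, data)
        return self.last_seq


@dataclass
class Consumer:
    """Toy consumer: tracks its next sequence and unacknowledged deliveries."""
    stream: Stream
    next_seq: int = 1
    pending: set = field(default_factory=set)

    def fetch(self, batch: int) -> list:
        # Simplification: unacked messages are redelivered on the next fetch;
        # real JetStream waits for the consumer's ack wait before redelivering.
        out = sorted(self.pending)[:batch]
        while len(out) < batch and self.next_seq <= self.stream.last_seq:
            out.append(self.next_seq)
            self.pending.add(self.next_seq)
            self.next_seq += 1
        return out

    def ack(self, seq: int) -> None:
        self.pending.discard(seq)

    def replay_from(self, seq: int) -> None:
        # Rewind the consumer; the stream's stored messages are untouched.
        self.next_seq = seq
        self.pending.clear()
```

Replay only moves the consumer's position. The stream keeps the data until its retention policy removes it.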

For local validation, the smallest useful server is a single NATS process with JetStream enabled:

docker run --rm \
-p 4222:4222 \
-p 8222:8222 \
nats:2.10 \
-js -m 8222

What this exposes:

  1. 4222 for clients.
  2. 8222 for HTTP monitoring.
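  3. JetStream, enabled by the -js flag. With --rm and no mounted volume, stored messages disappear when the container is removed, which is acceptable for local testing only.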
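Once the monitoring port is up, a script can check more than whether the process is listening. This is a minimal sketch that assumes the server's `/healthz` monitoring endpoint answers HTTP 200 with a JSON body whose `status` field is `"ok"` when healthy. Confirm the exact response shape for your server version. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def parse_healthz(status_code: int, body: str) -> bool:
    """Healthy only for HTTP 200 plus a JSON body whose status is "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(base_url: str = "http://localhost:8222", timeout: float = 2.0) -> bool:
    """Query /healthz on the monitoring port; network errors count as unhealthy."""
    url = f"{base_url}/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_healthz(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a body worth parsing.
        return parse_healthz(err.code, err.read().decode("utf-8", "replace"))
    except OSError:
        # URLError, refused connections and timeouts all derive from OSError.
        return False
```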

You can then create a stream from the CLI:

nats stream add EVENTS \
--subjects "events.>" \
--storage file \
--retention limits \
--max-msgs=-1 \
--max-age=168h

The stream settings above mean:

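  1. Capture every subject that matches events.>.
  2. Persist messages to disk (file storage).
  3. Impose no message-count limit (--max-msgs=-1).
  4. Expire messages after one week (168h), unless another limit, such as a byte or account storage limit, removes them earlier.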
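The events.> filter relies on NATS subject wildcards. Tokens are separated by dots, `*` matches exactly one token, and `>` matches one or more trailing tokens. The small matcher below is written for illustration and is not taken from the server. It makes those rules testable:

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if subject matches a NATS-style filter pattern."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # ">" must be the final token and must consume at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    # Without ">", the token counts must be equal.
    return len(p_tokens) == len(s_tokens)
```

Note that events.> does not match the bare subject events, because `>` needs at least one token after the prefix.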

For production, use a StatefulSet when you need durable JetStream storage and predictable pod identity.

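apiVersion: v1
kind: Service
metadata:
  name: nats
spec:
  clusterIP: None
  selector:
    app: nats
  ports:
    - name: client
      port: 4222
    - name: monitoring
      port: 8222
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats
  replicas: 3
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      containers:
        - name: nats
          image: nats:2.10
          env:
            # The pod name (nats-0, nats-1, ...) becomes the unique server name.
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          args:
            - "-js"
            # Store JetStream data on the persistent volume, not in the container layer.
            - "-sd"
            - "/data/jetstream"
            - "-m"
            - "8222"
            - "--server_name"
            - "$(POD_NAME)"
            - "--cluster_name"
            - "jetstream"
            - "--cluster"
            - "nats://0.0.0.0:6222"
            # Without routes, each pod would start as an isolated server.
            - "--routes"
            - "nats://nats-0.nats:6222,nats://nats-1.nats:6222,nats://nats-2.nats:6222"
          ports:
            - containerPort: 4222
            - containerPort: 6222
            - containerPort: 8222
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi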

Use this as a starting point, then add:

  1. A named cluster port (6222) on the headless service. The example only declares the client and monitoring ports.
  2. Pod anti-affinity so replicas do not land on the same node.
  3. Resource requests and limits.
  4. A persistent volume class with the right performance profile.

For a real cluster, keep the server config explicit rather than relying on defaults.

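jetstream {
  store_dir = "/data/jetstream"
  max_mem_store = 512MB
  # Keep this below the 20Gi volume requested by the StatefulSet.
  max_file_store = 16GB
}
# Resolved from the POD_NAME environment variable; JetStream clustering
# requires a unique server name on every node.
server_name = $POD_NAME
port = 4222
http_port = 8222
cluster {
  name = "jetstream"
  port = 6222
  routes = [
    "nats://nats-0.nats:6222",
    "nats://nats-1.nats:6222",
    "nats://nats-2.nats:6222"
  ]
}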

The important operational choices are:

  1. Keep JetStream storage on persistent volumes.
  2. Keep the route list stable enough for restart recovery.
  3. Separate client traffic from monitoring and cluster traffic.

JetStream can store data in memory or on disk, but durable streams generally belong on disk.

Use these rules:

  1. Put high-value streams on file storage.
  2. Set max age and max message limits deliberately.
  3. Watch disk usage, not just message counts.
  4. Treat stream replication as a durability control, not an excuse to skip backups.

A practical stream definition:

nats stream add AUDIT \
--subjects "audit.>" \
--storage file \
--replicas 3 \
--retention limits \
--discard old \
--max-age 30d

This says:

  1. Retain audit events for 30 days.
  2. Replicate across 3 nodes.
  3. When a limit is reached, discard the oldest messages so new writes still succeed.
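The difference between discard old and discard new can be modelled in a few lines. This toy simulation is not the server's storage engine. It enforces a maximum age and a maximum message count. With discard old it evicts the oldest message so new publishes still succeed, and with discard new it rejects the publish instead. The class name and the integer-seconds clock are assumptions for illustration.

```python
from collections import deque


class LimitsStream:
    """Toy limits-retention stream with max_age (seconds) and max_msgs caps."""

    def __init__(self, max_age: int, max_msgs: int, discard: str = "old"):
        if discard not in ("old", "new"):
            raise ValueError("discard must be 'old' or 'new'")
        self.max_age = max_age
        self.max_msgs = max_msgs
        self.discard = discard
        self.msgs = deque()  # (sequence, timestamp) pairs, oldest first
        self.last_seq = 0

    def expire(self, now: int) -> None:
        # Age limit: drop anything older than max_age, whatever the discard policy.
        while self.msgs and now - self.msgs[0][1] > self.max_age:
            self.msgs.popleft()

    def publish(self, now: int) -> bool:
        self.expire(now)
        if len(self.msgs) >= self.max_msgs:
            if self.discard == "new":
                return False  # reject the write and keep history intact
            self.msgs.popleft()  # evict the oldest message to make room
        self.last_seq += 1
        self.msgs.append((self.last_seq, now))
        return True

    def sequences(self) -> list:
        return [seq for seq, _ in self.msgs]
```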

If you need very long retention, keep the stream compact:

  1. Use subject partitioning.
  2. Split hot and cold streams.
  3. Avoid single giant streams for every topic in the company.
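Subject partitioning usually means deriving a stable partition token from a key. One entity then always lands on the same subject, and consumers can filter on a slice of the stream. This sketch rests on a few assumptions. The events.orders.<n> layout and the partition count are hypothetical. A stable hash is used instead of Python's per-process randomized `hash()`, so the mapping stays consistent across restarts.

```python
import hashlib


def partition_for(key: str, partitions: int) -> int:
    """Map a key to a partition index, identically in every process."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def partitioned_subject(prefix: str, key: str, partitions: int) -> str:
    """Build a subject such as events.orders.3 for the given key."""
    return f"{prefix}.{partition_for(key, partitions)}"
```

The sketch partitions on the client. If you would rather do this on the server, check the NATS subject mapping documentation for its partitioning support.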

Do not run JetStream without access control in shared environments.

At minimum:

  1. Require credentials.
  2. Restrict publish subjects.
  3. Restrict consumer creation.
  4. Separate operator, producer, and consumer permissions.

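Example NATS permission model. JetStream API calls are request/reply, so the $JS.API subjects need publish permission. Acknowledgements are published to $JS.ACK subjects, and replies arrive on _INBOX subjects:

{
  "publish": {
    "allow": ["events.>", "$JS.API.CONSUMER.>", "$JS.ACK.>"]
  },
  "subscribe": {
    "allow": ["events.>", "_INBOX.>"]
  }
}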

In practice, you usually want narrower permissions than that:

  1. Producers publish only to their own subjects.
  2. Consumers subscribe only to their own stream subjects.
  3. Admins manage stream definitions through a separate identity.
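To sanity-check a narrower permission set before rolling it out, you can evaluate subjects against allow and deny lists offline. This is a simplified model, not the server's authorization code. It assumes that deny entries override allow entries and that any subject not covered by an allow entry is rejected. The `ORDERS_PRODUCER` identity is hypothetical.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style wildcard match: * is one token, > is one or more trailing tokens."""
    p_tokens, s_tokens = pattern.split("."), subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens) or (p != "*" and p != s_tokens[i]):
            return False
    return len(p_tokens) == len(s_tokens)


def is_permitted(perms: dict, action: str, subject: str) -> bool:
    """Deny wins over allow; subjects not covered by allow are rejected."""
    rules = perms.get(action, {})
    if any(subject_matches(p, subject) for p in rules.get("deny", [])):
        return False
    return any(subject_matches(p, subject) for p in rules.get("allow", []))


# Hypothetical producer identity: may publish order events, but not dead letters.
ORDERS_PRODUCER = {
    "publish": {"allow": ["events.orders.>"], "deny": ["events.orders.deadletter.>"]},
    "subscribe": {"allow": ["_INBOX.>"]},
}
```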

If you run JetStream in Kubernetes, keep the client (4222) and monitoring (8222) ports cluster-internal unless you have a specific reason to expose them.

JetStream needs more than a TCP port check.

Useful health signals:

  1. NATS server process is alive.
  2. Client port accepts connections.
  3. Monitoring endpoint responds.
  4. JetStream storage is writable.
  5. Stream leader is present.
  6. Consumer lag is within tolerance.

CLI checks:

nats --server localhost:4222 server check connection
nats server check jetstream
nats stream ls
nats consumer ls EVENTS
nats stream info EVENTS
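Consumer lag is easiest to judge from consumer info. The sketch assumes JSON shaped like the output of nats consumer info EVENTS <consumer> --json. It reads `num_pending` (not yet delivered), `num_ack_pending` (delivered but not yet acknowledged), and `num_redelivered`. The thresholds and function names are placeholders to tune for your workload.

```python
import json


def consumer_lag(info: dict) -> dict:
    """Summarise backlog from a JetStream consumer info document."""
    return {
        "pending": int(info.get("num_pending", 0)),
        "ack_pending": int(info.get("num_ack_pending", 0)),
        "redelivered": int(info.get("num_redelivered", 0)),
    }


def lag_within_tolerance(info: dict, max_pending: int = 10_000,
                         max_ack_pending: int = 1_000) -> bool:
    """Placeholder thresholds: tune them to your throughput and ack wait."""
    lag = consumer_lag(info)
    return lag["pending"] <= max_pending and lag["ack_pending"] <= max_ack_pending


# Hypothetical sample document; real output contains many more fields.
sample = json.loads('{"num_pending": 250, "num_ack_pending": 12, "num_redelivered": 3}')
```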

Useful operational metrics:

  1. Publish latency.
  2. Consumer ack latency.
  3. Redelivery count.
  4. Stream storage utilization.
  5. Leader changes.

If you run JetStream in Kubernetes, alert on:

  1. Persistent volume nearly full.
  2. Repeated pod restarts.
  3. Lost quorum.
  4. Consumer lag growth.
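Two of these alerts come down to simple arithmetic over sampled metrics. This sketch assumes you already collect used and capacity bytes for each volume, plus periodic consumer pending counts, from your monitoring stack. The 85% threshold and the five-sample window are hypothetical defaults.

```python
def volume_nearly_full(used_bytes: int, capacity_bytes: int, threshold: float = 0.85) -> bool:
    """Alert when persistent-volume utilisation reaches the threshold."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return used_bytes / capacity_bytes >= threshold


def lag_growing(samples: list, window: int = 5) -> bool:
    """Alert when consumer pending counts rose at every step of the last window."""
    recent = samples[-window:]
    if len(recent) < window:
        return False  # not enough history to call it a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

Requiring growth across a window, rather than comparing single samples, avoids paging on a momentary burst that the consumers then drain.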

JetStream is easy to start and easy to abuse.

Guardrails that matter:

  1. Name streams consistently.
  2. Keep subject hierarchies shallow and predictable.
  3. Version subjects intentionally.
  4. Set retention per stream, not by convention.
  5. Define who owns consumer lifecycles.

Good operational patterns:

  1. events.orders.v1 as a versioned subject.
  2. events.orders.deadletter.v1 for failed processing.
  3. Separate streams for audit, operational events, and transient work.
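Naming rules are easier to enforce when CI checks them. The convention encoded below is inferred from the examples above: events.<domain>.v<N>, with an optional deadletter token before the version. It is only one possible scheme, and the regular expression and function name are hypothetical.

```python
import re

# Hypothetical convention: events.<domain>[.deadletter].v<N>, lowercase tokens.
SUBJECT_RE = re.compile(
    r"^events\.(?P<domain>[a-z][a-z0-9_-]*)"
    r"(?P<dlq>\.deadletter)?"
    r"\.v(?P<version>[1-9][0-9]*)$"
)


def check_subject(subject: str) -> dict:
    """Parse a subject against the convention, raising ValueError if it does not fit."""
    m = SUBJECT_RE.match(subject)
    if m is None:
        raise ValueError(f"{subject!r} does not follow events.<domain>[.deadletter].v<N>")
    return {
        "domain": m.group("domain"),
        "deadletter": m.group("dlq") is not None,
        "version": int(m.group("version")),
    }
```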

Bad patterns:

  1. One stream for every kind of data.
  2. Unlimited retention by default.
  3. Consumers created ad hoc by every service.
  4. Application code that depends on implicit stream setup.

JetStream is the wrong fit when:

  1. You need SQL-style queries over historical data.
  2. You need complex workflow orchestration.
  3. You need multi-step transactions across systems.
  4. You need guaranteed exactly-once semantics across an entire business process.

Use it when you want durable transport with explicit replay and acknowledgement semantics.

Deploy JetStream like an operational datastore, not like a throwaway message broker:

  1. Persist data on disk.
  2. Control retention.
  3. Protect access.
  4. Monitor lag, storage, and leadership.

If you keep the deployment boring, JetStream stays useful. If you skip the guardrails, it turns into another opaque stateful service.