NATS JetStream Deployment and Operations

Table of Contents

Overview
Why JetStream
Baseline Deployment
Kubernetes Deployment Pattern
Storage and Durability
Authentication and Authorization
Monitoring and Health Checks
Operational Guardrails
When Not to Use JetStream
Conclusion

Overview

JetStream is the durable messaging layer in NATS. Use it when you need:

Persistent streams instead of transient pub/sub.
Consumer acknowledgements and replay.
Backpressure and retention controls.
A broker that stays small enough to run and reason about without a separate queueing stack.

This article focuses on the deployment shape: how to run NATS with JetStream enabled, how to persist data safely, and how to keep the cluster observable and recoverable.

Why JetStream

Plain NATS gives you fast message delivery, but it does not retain messages for late consumers. JetStream adds:

Streams that store messages.
Consumers that track delivery state.
Retention policies that control when messages expire.
Replay controls for recovery and reprocessing.

That makes JetStream a good fit for:

Event pipelines.
Work queues.
Audit trails.
Fan-out with durable subscribers.

It is not a replacement for every database or every workflow engine. It is a durable transport layer.

Baseline Deployment

For local validation, the smallest useful server is a single NATS process with JetStream enabled:

docker run --rm \
  -p 4222:4222 \
  -p 8222:8222 \
  nats:2.10 \
  -js -m 8222

What this exposes:

4222 for clients.
8222 for HTTP monitoring.
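JetStream, enabled by the -js flag. Without a mounted volume and a -sd <dir> store directory, stream data lives inside the container and is lost when the container is removed.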

You can then create a stream from the CLI:

nats stream add EVENTS \
  --subjects "events.>" \
  --storage file \
  --retention limits \
  --max-msgs=-1 \
  --max-age=168h

The stream settings above mean:

Capture every message published on a subject that matches events.>.
Persist to disk.
Keep messages for one week (168h), with no cap on message count.
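
To make the events.> filter concrete, here is a minimal Python sketch of NATS subject wildcard matching: * matches exactly one token and > matches one or more trailing tokens. The function name subject_matches is hypothetical and the code only illustrates the matching rule; it is not the server's implementation.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS subject matches a wildcard pattern.

    '*' matches exactly one token; '>' matches one or more trailing
    tokens and is only valid as the last token of the pattern.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be last and must consume at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    # Without '>', pattern and subject need the same number of tokens.
    return len(p_tokens) == len(s_tokens)


if __name__ == "__main__":
    for candidate in ("events.orders.v1", "events", "audit.login"):
        print(candidate, subject_matches("events.>", candidate))
```

Note that the bare subject events is not captured by events.>, because > requires at least one token after the prefix.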

Kubernetes Deployment Pattern

For production, use a StatefulSet when you need durable JetStream storage and predictable pod identity.

apiVersion: v1
kind: Service
metadata:
  name: nats
spec:
  clusterIP: None
  selector:
    app: nats
  ports:
    - name: client
      port: 4222
    - name: monitoring
      port: 8222
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats
  replicas: 3
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      containers:
        - name: nats
          image: nats:2.10
          args:
            - "-js"
            # Store JetStream data on the mounted volume, not the default temp directory.
            - "-sd"
            - "/data"
            - "-m"
            - "8222"
            - "--cluster"
            - "nats://0.0.0.0:6222"
          ports:
            - containerPort: 4222
            - containerPort: 6222
            - containerPort: 8222
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi

Use this as a starting point, then add:

The cluster port (6222) on the headless Service, so route traffic is declared alongside client and monitoring traffic.
Pod anti-affinity so replicas do not land on the same node.
Resource requests and limits.
A persistent volume class with the right performance profile.

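For a real cluster, keep the server config explicit rather than relying on defaults. JetStream clustering needs a unique server_name on every server, a cluster name, and routes to the other servers, none of which the command-line arguments above set.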

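jetstream {
  store_dir = "/data/jetstream"
  max_mem_store = 512Mb
  # Kept below the 20Gi volume claim so the server limit is reached before the disk fills.
  max_file_store = 16Gb
}

# POD_NAME is read from the environment; inject it with the Kubernetes downward API.
server_name = $POD_NAME
port = 4222
http_port = 8222
cluster {
  name = "jetstream"
  port = 6222
  routes = [
    "nats://nats-0.nats:6222",
    "nats://nats-1.nats:6222",
    "nats://nats-2.nats:6222"
  ]
}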

The important operational choices are:

Keep JetStream storage on persistent volumes.
Keep the route list stable across restarts by pointing it at StatefulSet DNS names rather than pod IPs.
Separate client traffic from monitoring and cluster traffic.
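
Because StatefulSet pods get stable DNS names of the form <pod>.<service>, the route list can be derived instead of hand-written. The following is a minimal sketch, assuming the StatefulSet and headless Service are both named nats as in the manifest above; the helper route_urls is hypothetical.

```python
def route_urls(statefulset: str, service: str, replicas: int, port: int = 6222) -> list[str]:
    """Build one cluster route URL per StatefulSet pod.

    Pods are named <statefulset>-<ordinal> and resolve through the
    headless Service as <pod>.<service> within the same namespace.
    """
    return [f"nats://{statefulset}-{i}.{service}:{port}" for i in range(replicas)]


if __name__ == "__main__":
    for url in route_urls("nats", "nats", 3):
        print(url)
```

Generating the list from the replica count keeps the config and the StatefulSet from drifting apart when the cluster is resized.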

Storage and Durability

JetStream can store data in memory or on disk, but durable streams generally belong on disk.

Use these rules:

Put high-value streams on file storage.
Set max age and max message limits deliberately.
Watch disk usage, not just message counts.
Treat stream replication as a durability control, not an excuse to skip backups.
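
Watching disk usage starts with a rough estimate of how much a stream will hold. The sketch below is back-of-the-envelope arithmetic, not a JetStream API: the message rate, the average message size, the 20 percent storage overhead, and the 20 percent headroom are all assumptions to replace with measured values. Each replica stores a full copy, so the estimate applies to every node that hosts the stream.

```python
WEEK_SECONDS = 7 * 24 * 3600
VOLUME_20GI = 20 * 1024**3  # matches the 20Gi volume claim used earlier


def estimated_stream_bytes(msgs_per_second: int, avg_msg_bytes: int,
                           retention_seconds: int, overhead_percent: int = 20) -> int:
    """Estimate bytes on disk for one replica of a stream at steady state."""
    raw = msgs_per_second * avg_msg_bytes * retention_seconds
    # overhead_percent is an assumed allowance for headers and file metadata.
    return raw * (100 + overhead_percent) // 100


def fits_volume(estimate_bytes: int, volume_bytes: int, headroom_percent: int = 20) -> bool:
    """Check that the estimate leaves the requested free headroom on the volume."""
    usable = volume_bytes * (100 - headroom_percent) // 100
    return estimate_bytes <= usable


if __name__ == "__main__":
    for rate in (20, 200):  # hypothetical messages per second
        size = estimated_stream_bytes(rate, 512, WEEK_SECONDS)
        print(rate, size, fits_volume(size, VOLUME_20GI))
```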

A practical stream definition:

nats stream add AUDIT \
  --subjects "audit.>" \
  --storage file \
  --replicas 3 \
  --retention limits \
  --discard old \
  --max-age 30d

This says:

Retain audit events for 30 days.
Replicate across 3 nodes.
When a size or count limit is reached, drop the oldest messages rather than reject new writes.
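
The difference between the two discard policies is easiest to see in a toy model. The class below is an in-memory illustration only, not JetStream code. The max_msgs limit is a hypothetical value added for the demonstration, since the AUDIT stream above sets only a maximum age.

```python
from collections import deque


class LimitedStream:
    """Toy model of a stream with a message-count limit and a discard policy."""

    def __init__(self, max_msgs: int, discard: str = "old"):
        if discard not in ("old", "new"):
            raise ValueError("discard must be 'old' or 'new'")
        self.max_msgs = max_msgs
        self.discard = discard
        self.msgs = deque()

    def publish(self, msg) -> bool:
        """Store msg and return True, or return False if it was rejected."""
        if len(self.msgs) >= self.max_msgs:
            if self.discard == "new":
                return False  # limit reached: refuse the incoming message
            self.msgs.popleft()  # limit reached: drop the oldest message
        self.msgs.append(msg)
        return True


if __name__ == "__main__":
    for policy in ("old", "new"):
        stream = LimitedStream(max_msgs=2, discard=policy)
        results = [stream.publish(m) for m in ("a", "b", "c")]
        print(policy, results, list(stream.msgs))
```

With discard old the publisher always succeeds and history is trimmed. With discard new the history is preserved and the publisher has to handle the rejection.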

If you need very long retention, keep the stream compact:

Use subject partitioning.
Split hot and cold streams.
Avoid single giant streams for every topic in the company.

Authentication and Authorization

Do not run JetStream without access control in shared environments.

At minimum:

Require credentials.
Restrict publish subjects.
Restrict consumer creation.
Separate operator, producer, and consumer permissions.

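Example NATS permission model for a client that publishes events and also consumes them through JetStream. The client publishes its JetStream API requests and acknowledgements, and receives replies on its inbox:

{
  "publish": {
    "allow": ["events.>", "$JS.API.CONSUMER.>", "$JS.ACK.>"]
  },
  "subscribe": {
    "allow": ["events.>", "_INBOX.>"]
  }
}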

In practice, you usually want narrower permissions than that:

Producers publish only to their own subjects.
Consumers subscribe only to their own stream subjects.
Admins manage stream definitions through a separate identity.
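
One way to keep permissions narrow is to lint them before they are deployed. The sketch below assumes permissions are available as a dictionary shaped like the example above. The set of patterns treated as too broad is a policy choice made for this illustration, not a NATS rule, and the function name broad_grants is hypothetical.

```python
# Assumed policy: these allow-entries are too broad for producers or consumers.
BROAD_PATTERNS = {">", "$JS.>", "$JS.API.>"}


def broad_grants(permissions: dict) -> list[tuple[str, str]]:
    """Return (action, subject) pairs whose allow-entry is considered too broad."""
    findings = []
    for action in ("publish", "subscribe"):
        for subject in permissions.get(action, {}).get("allow", []):
            if subject in BROAD_PATTERNS:
                findings.append((action, subject))
    return findings


if __name__ == "__main__":
    producer = {
        "publish": {"allow": ["events.orders.>", "$JS.API.>"]},
        "subscribe": {"allow": [">"]},
    }
    for action, subject in broad_grants(producer):
        print(f"too broad: {action} {subject}")
```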

If you expose JetStream through Kubernetes, keep the client and management endpoints private unless you have a specific reason not to.

Monitoring and Health Checks

JetStream needs more than a TCP port check.

Useful health signals:

NATS server process is alive.
Client port accepts connections.
Monitoring endpoint responds.
JetStream storage is writable.
Stream leader is present.
Consumer lag is within tolerance.

CLI checks:

nats --server localhost:4222 server check connection
nats stream ls
nats consumer ls EVENTS
nats stream info EVENTS
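
The monitoring port can also be probed from a script. The server's /healthz endpoint returns a JSON body whose status field is ok when the server is healthy. The sketch below assumes the monitoring port is reachable at http://localhost:8222 as in the baseline deployment; the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def parse_healthz(body: str) -> bool:
    """Return True only if the /healthz response body reports status 'ok'."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def server_healthy(base_url: str = "http://localhost:8222", timeout: float = 2.0) -> bool:
    """Fetch /healthz and report health; any HTTP or network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return parse_healthz(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("healthy" if server_healthy() else "unhealthy")
```

Separating the parsing from the HTTP call keeps the check easy to unit test without a running server.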

Useful operational metrics:

Publish latency.
Consumer ack latency.
Redelivery count.
Stream storage utilization.
Leader changes.
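
Some of these signals can be read from consumer state, for example the JSON printed by nats consumer info with the --json flag. The sketch below assumes that output has been parsed into a dictionary and reads its num_pending, num_ack_pending, and num_redelivered fields. The thresholds and the function name are hypothetical and should be tuned per consumer.

```python
def consumer_alerts(info: dict, max_pending: int = 10_000,
                    max_ack_pending: int = 1_000,
                    max_redelivered: int = 100) -> list[str]:
    """Return the names of the thresholds that a consumer currently exceeds."""
    alerts = []
    if info.get("num_pending", 0) > max_pending:
        alerts.append("lag")  # messages not yet delivered to the consumer
    if info.get("num_ack_pending", 0) > max_ack_pending:
        alerts.append("ack_backlog")  # delivered but not yet acknowledged
    if info.get("num_redelivered", 0) > max_redelivered:
        alerts.append("redeliveries")  # messages being delivered again
    return alerts


if __name__ == "__main__":
    sample = {"num_pending": 25_000, "num_ack_pending": 12, "num_redelivered": 340}
    print(consumer_alerts(sample))
```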

If you run JetStream in Kubernetes, alert on:

Persistent volume nearly full.
Repeated pod restarts.
Lost quorum.
Consumer lag growth.
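
The lost quorum alert follows from majority arithmetic: a replicated stream stays writable only while more than half of its replicas are available. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority remains."""
    return replicas - quorum(replicas)


def has_quorum(replicas: int, healthy: int) -> bool:
    """True if the healthy replicas still form a majority."""
    return healthy >= quorum(replicas)


if __name__ == "__main__":
    for r in (1, 3, 5):
        print(f"replicas={r} quorum={quorum(r)} tolerates={tolerated_failures(r)}")
```

A three-replica stream survives one lost node and a five-replica stream survives two, which is why a second pod restart during recovery is worth alerting on.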

Operational Guardrails

JetStream is easy to start and easy to abuse.

Guardrails that matter:

Name streams consistently.
Keep subject hierarchies shallow and predictable.
Version subjects intentionally.
Set retention per stream, not by convention.
Define who owns consumer lifecycles.

Good operational patterns:

events.orders.v1 for a versioned subject.
events.orders.deadletter.v1 for failed processing.
Separate streams for audit, operational events, and transient work.

Bad patterns:

One stream for every kind of data.
Unlimited retention by default.
Consumers created ad hoc by every service.
Application code that depends on implicit stream setup.
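
Naming rules hold up better when they are checked automatically. The sketch below validates subjects against one possible convention inferred from the examples above: lowercase tokens, at most five levels, and a trailing version token such as v1. The exact pattern is an assumption, not a NATS requirement.

```python
import re

# Assumed convention: <domain>.<one to three lowercase tokens>.v<N>
SUBJECT_RE = re.compile(
    r"[a-z][a-z0-9_-]*"          # first token, for example "events"
    r"(\.[a-z][a-z0-9_-]*){1,3}"  # one to three further tokens
    r"\.v[1-9][0-9]*"            # trailing version token
)


def is_valid_subject(subject: str) -> bool:
    """Return True if the subject follows the assumed naming convention."""
    return SUBJECT_RE.fullmatch(subject) is not None


if __name__ == "__main__":
    for s in ("events.orders.v1", "events.orders.deadletter.v1",
              "events.orders", "Events.Orders.v1"):
        print(s, is_valid_subject(s))
```

A check like this can run in CI against the stream definitions, so that an unversioned or deeply nested subject is caught before it reaches production.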

When Not to Use JetStream

JetStream is the wrong fit when:

You need SQL-style queries over historical data.
You need complex workflow orchestration.
You need multi-step transactions across systems.
You need guaranteed exactly-once semantics across an entire business process.

Use it when you want durable transport with explicit replay and acknowledgement semantics.

Conclusion

Deploy JetStream like an operational datastore, not like a throwaway message broker:

Persist data on disk.
Control retention.
Protect access.
Monitor lag, storage, and leadership.

If you keep the deployment boring, JetStream stays useful. If you skip the guardrails, it turns into another opaque stateful service.
