System Design: Designing a Globally Distributed Artifact Repository

A globally distributed artifact repository stores build outputs such as container images, language packages, binary archives, and SBOM/provenance metadata for teams across regions.

At scale, the system has to solve four problems at once:

  1. Strong integrity guarantees (no tampered artifacts).
  2. Fast regional reads for CI/CD and runtime pulls.
  3. Predictable write behavior across many tenants.
  4. Clear recovery behavior during regional outages.

Design an internal artifact platform that:

  1. Supports global producer and consumer teams.
  2. Delivers low-latency regional downloads.
  3. Guarantees immutability for released versions.
  4. Provides auditability for compliance and incident response.
  5. Handles large spikes during coordinated launch windows.

Target SLOs:

  1. Publish availability: 99.95% monthly.
  2. Download availability: 99.99% monthly.
  3. Artifact integrity: 100% cryptographic verification on publish and replication.
  4. Regional median download latency under 150 ms for cached artifacts.
  5. Recovery objective: fail over reads within 5 minutes of regional incident detection.

Out of scope:

  1. Building source code or running CI workflows.
  2. Runtime secret storage.
  3. Full software catalog and project management features.

Use a split control-plane/data-plane model.

Control plane:

  1. Artifact metadata service (strongly consistent records).
  2. AuthN/AuthZ and policy engine.
  3. Replication scheduler and health controller.
  4. Quota/accounting service.

Data plane:

  1. Blob storage (content-addressed immutable objects).
  2. Regional pull-through caches.
  3. CDN edge for internet-adjacent consumers.
  4. Regional verification workers.

High-level flow:

  Publishers -> Global API -> Metadata DB (strong consistency)
                           -> Primary Blob Store
                                      |
                                      v
                         Replication Queue/Workers
                        /             |             \
                  Region A        Region B        Region C
                 Blob+Cache      Blob+Cache      Blob+Cache
                        \             |             /
                          CDN / Regional Clients

Keep metadata and content separated:

  1. ArtifactVersion:
    • package_name
    • version
    • digest (sha256)
    • size
    • created_by
    • publish_time
    • immutability_state
  2. ArtifactBlob:
    • digest
    • storage_pointer
    • encryption_key_ref
  3. ReplicationState:
    • per-region status (pending, verified, failed)
    • last verification timestamp
  4. ProvenanceRecord:
    • build attestation reference
    • signing identity
    • policy decision at publish time
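
As a minimal sketch, the records above could be modeled as Python dataclasses; field names follow the list, while the types and enum values are assumptions:

```python
# Sketch of the metadata records; types and enum values are assumptions.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ImmutabilityState(Enum):
    DRAFT = "draft"
    RELEASED = "released"        # released versions are never rebound


class ReplicaStatus(Enum):
    PENDING = "pending"
    VERIFIED = "verified"
    FAILED = "failed"


@dataclass(frozen=True)
class ArtifactVersion:
    package_name: str
    version: str
    digest: str                  # sha256 content address
    size: int
    created_by: str
    publish_time: datetime
    immutability_state: ImmutabilityState


@dataclass(frozen=True)
class ArtifactBlob:
    digest: str                  # primary key: content address
    storage_pointer: str         # e.g. object-store bucket/key
    encryption_key_ref: str      # KMS key reference


@dataclass
class ReplicationState:
    digest: str
    region: str
    status: ReplicaStatus = ReplicaStatus.PENDING
    last_verified_at: datetime | None = None


@dataclass(frozen=True)
class ProvenanceRecord:
    digest: str
    attestation_ref: str         # build attestation reference
    signing_identity: str
    policy_decision: str         # policy result captured at publish time
```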

Immutability rule:

  1. (package_name, version) can be created once.
  2. Retagging can only create new alias records, never mutate existing blob bindings.
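
A minimal sketch of that rule, using an in-memory stand-in for the metadata store (class and method names are illustrative):

```python
# Sketch of the immutability rule: a (package_name, version) pair is created
# once; retagging only adds alias records and never rebinds an existing version.
class ImmutableVersionStore:
    def __init__(self) -> None:
        self._versions: dict[tuple[str, str], str] = {}   # (name, version) -> digest
        self._aliases: dict[tuple[str, str], str] = {}    # (name, alias)   -> digest

    def create_version(self, name: str, version: str, digest: str) -> None:
        key = (name, version)
        if key in self._versions:
            # Never rebind an existing version to a different digest.
            raise ValueError(f"{name}:{version} already exists and is immutable")
        self._versions[key] = digest

    def retag(self, name: str, alias: str, source_version: str) -> None:
        # Aliases (e.g. "stable") are new records pointing at an existing digest;
        # the original version-to-digest binding is never mutated.
        digest = self._versions[(name, source_version)]
        self._aliases[(name, alias)] = digest
```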

Publish flow:

  1. Client authenticates and requests an upload session.
  2. Client uploads chunks to the primary-region blob store.
  3. Service computes the digest and validates it against the expected hash.
  4. Policy engine checks signature/provenance requirements.
  5. Metadata transaction commits the artifact version atomically.
  6. Replication events are enqueued.
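
A minimal sketch of this flow, assuming hypothetical blob_store, policy_engine, metadata_db, and replication_queue clients (none of these names are part of the design above):

```python
# Sketch of the publish path; all client objects and their methods are stand-ins.
import hashlib


def publish(session, expected_digest: str, chunks, policy_engine, blob_store,
            metadata_db, replication_queue) -> str:
    hasher = hashlib.sha256()
    staged = blob_store.begin_staged_upload(session)   # hypothetical staging API
    for chunk in chunks:
        hasher.update(chunk)
        staged.write(chunk)

    digest = f"sha256:{hasher.hexdigest()}"
    if digest != expected_digest:
        staged.abort()
        raise ValueError("digest mismatch: refusing to publish tampered upload")

    # Policy gates (signature, provenance, SBOM) run before anything becomes visible.
    policy_engine.check_publish(session.repo, digest)

    # Commit metadata atomically; a unique (package_name, version) constraint
    # enforces the immutability rule at this point.
    metadata_db.commit_version(session.package_name, session.version, digest)
    staged.promote(digest)          # blob becomes addressable by digest

    # Replication is asynchronous: enqueue per-region copy and verification work.
    for region in session.replication_targets:
        replication_queue.enqueue(digest=digest, region=region)
    return digest
```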

Download flow:

  1. Client resolves the package/version to a digest through the metadata API.
  2. Client fetches from the regional cache or CDN edge.
  3. On a cache miss, the regional cache pulls from the nearest healthy blob replica.
  4. Optional client-side digest verification before install/run.
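
A minimal sketch of the read path, assuming hypothetical metadata_api, regional_cache, and blob_replicas clients:

```python
# Sketch of the download path; client objects and methods are stand-ins.
import hashlib


def download(metadata_api, regional_cache, blob_replicas,
             package: str, version: str) -> bytes:
    # 1. Metadata decides what exists: resolve name/version to a content digest.
    digest = metadata_api.resolve(package, version)

    # 2. Try the regional cache first; on a miss, pull from the nearest healthy
    #    replica and populate the cache under the digest-addressed key.
    data = regional_cache.get(digest)
    if data is None:
        data = blob_replicas.nearest_healthy().fetch(digest)
        regional_cache.put(digest, data)

    # 3. Client-side verification: the bytes must hash to the digest metadata promised.
    actual = f"sha256:{hashlib.sha256(data).hexdigest()}"
    if actual != digest:
        raise ValueError("integrity check failed: bytes do not match expected digest")
    return data
```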

Key principle:

  1. Metadata decides what exists.
  2. Blob layer serves immutable bytes by digest.

Use mixed consistency by concern:

  1. Metadata writes: strong consistency.
  2. Blob replication: asynchronous eventual consistency with verification.
  3. Read-your-writes guarantee: enabled for publisher token in primary region.
  4. Global listing APIs: may be slightly stale by design (bounded lag SLO).

This avoids high write latency from global synchronous quorum while preserving correctness.
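
A minimal sketch of routing metadata reads by concern, assuming hypothetical primary/replica handles and a request object carrying token and region information:

```python
# Sketch of consistency-aware read routing; objects, fields, and the lag SLO
# value are assumptions, not part of the design above.
from datetime import timedelta

MAX_LISTING_STALENESS = timedelta(seconds=30)   # assumed bounded-lag SLO for listings


def route_metadata_read(request, primary_db, replica_db):
    # Read-your-writes: a publisher reading back its own artifact in the primary
    # region is served from the strongly consistent primary.
    if request.is_publisher_token and request.region == primary_db.region:
        return primary_db

    # Global listing APIs tolerate bounded staleness; fall back to the primary
    # only if the replica has drifted past the lag SLO.
    if replica_db.replication_lag() <= MAX_LISTING_STALENESS:
        return replica_db
    return primary_db
```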

Replication policy:

  1. Critical artifacts replicate to all primary regions.
  2. Standard artifacts replicate to home region plus one secondary.
  3. Cold artifacts use delayed replication with on-demand prefetch.
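
A minimal sketch of target selection by artifact class (region names and class labels are placeholders):

```python
# Sketch of replication target selection by artifact class.
PRIMARY_REGIONS = ["region-a", "region-b", "region-c"]   # placeholder region names


def replication_targets(artifact_class: str, home_region: str,
                        secondary_region: str) -> list[str]:
    if artifact_class == "critical":
        return PRIMARY_REGIONS                      # replicate everywhere up front
    if artifact_class == "standard":
        return [home_region, secondary_region]      # home plus one secondary
    # Cold artifacts: keep only the home copy now; other regions prefetch on demand.
    return [home_region]
```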

Repair workflow:

  1. Periodic digest audit compares metadata expected hash vs blob reality.
  2. Missing or mismatched replica enters failed state.
  3. Repair worker re-copies from last known good source.
  4. Promotion to verified only after checksum and signature checks pass.
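
A minimal sketch of the audit-and-repair loop, assuming hypothetical metadata, per-region store, and signature-verifier clients:

```python
# Sketch of the periodic digest audit and repair; all clients are stand-ins.
import hashlib


def audit_and_repair(metadata_db, region_stores, signature_verifier) -> None:
    for record in metadata_db.list_replication_states():
        store = region_stores[record.region]
        if store.exists(record.digest):
            data = store.fetch(record.digest)
            actual = f"sha256:{hashlib.sha256(data).hexdigest()}"
        else:
            data, actual = None, None

        if actual != record.digest:
            # Missing or mismatched replica: mark failed, re-copy from a good source.
            metadata_db.mark_failed(record)
            source = metadata_db.last_known_good_source(record.digest)
            data = source.fetch(record.digest)
            store.put(record.digest, data)

        # Promote to verified only after checksum AND signature checks pass.
        checksum_ok = f"sha256:{hashlib.sha256(data).hexdigest()}" == record.digest
        if checksum_ok and signature_verifier.verify(record.digest):
            metadata_db.mark_verified(record)
```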

Minimum controls:

  1. Mandatory artifact signing for protected repositories.
  2. Provenance attestation verification at publish time.
  3. Immutable release repositories (no delete/overwrite by default).
  4. KMS-backed encryption at rest and TLS in transit.
  5. Fine-grained RBAC by team/repository/environment.
  6. Break-glass actions are audited and time-limited.

Policy gates example:

  1. Reject unsigned artifact for production repos.
  2. Reject artifact missing SBOM for regulated workloads.
  3. Reject vulnerable artifact above defined severity threshold.
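
A minimal sketch of these gates, assuming a hypothetical artifact object carrying signature, SBOM, and scan results (the severity threshold is illustrative):

```python
# Sketch of publish-time policy gates; the artifact/repo fields and the
# threshold value are assumptions.
MAX_ALLOWED_SEVERITY = 7.0   # assumed CVSS-style threshold; tune per policy


def check_publish_policy(artifact, repo) -> None:
    if repo.is_production and not artifact.has_valid_signature:
        raise PermissionError("unsigned artifact rejected for production repo")
    if repo.is_regulated and artifact.sbom is None:
        raise PermissionError("artifact missing SBOM rejected for regulated workload")
    severity = artifact.max_vulnerability_severity
    if severity is not None and severity >= MAX_ALLOWED_SEVERITY:
        raise PermissionError("artifact exceeds vulnerability severity threshold")
```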

Performance model:

  1. CDN for hot public or globally shared artifacts.
  2. Regional cache for private low-latency enterprise traffic.
  3. Digest-addressable cache keys to maximize hit ratio and safety.
  4. Predictive pre-warm for known release waves.
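
A minimal sketch of digest-addressable cache keys: the immutable blob is keyed purely by digest, while mutable tag lookups get their own short-TTL key (names are illustrative):

```python
# Sketch of cache key construction; prefixes are placeholders.
def blob_cache_key(digest: str) -> str:
    # The digest fully identifies immutable content, so the key needs nothing else;
    # entries are safe to share across tenants and regions.
    return f"blob/{digest}"


def manifest_cache_key(package: str, version_or_tag: str) -> str:
    # Mutable tag lookups get a separate short-TTL key; the bytes themselves are
    # always fetched through the digest-addressed key above.
    return f"manifest/{package}/{version_or_tag}"
```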

Operational optimization:

  1. Chunked uploads with resumable sessions.
  2. Range request support for large layers.
  3. Compression-aware storage and transfer.
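
A minimal sketch of range-based fetch for large layers, assuming a blob endpoint that honors standard HTTP Range headers (URL and chunk size are placeholders):

```python
# Sketch of fetching a large blob in ranges; endpoint and sizing are assumptions.
import requests

CHUNK = 8 * 1024 * 1024   # 8 MiB per range request


def fetch_in_ranges(url: str, size: int) -> bytes:
    parts = []
    for start in range(0, size, CHUNK):
        end = min(start + CHUNK, size) - 1
        resp = requests.get(url, headers={"Range": f"bytes={start}-{end}"}, timeout=30)
        resp.raise_for_status()
        parts.append(resp.content)
    return b"".join(parts)
```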

Isolation controls:

  1. Tenant-scoped namespaces and token scopes.
  2. Per-tenant rate limits for publish/download APIs.
  3. Storage and egress quotas with soft and hard thresholds.
  4. Separate encryption contexts per tenant or repo class.

Fairness:

  1. Prevent noisy tenants from saturating replication workers.
  2. Use weighted queues by tenant tier and artifact criticality.
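
A minimal sketch of weighted scheduling across per-tenant replication queues (tier and criticality weights are illustrative):

```python
# Sketch of weighted queue selection by tenant tier and artifact criticality.
import random

TIER_WEIGHTS = {"platinum": 4, "standard": 2, "free": 1}
CRITICALITY_WEIGHTS = {"release-critical": 3, "normal": 1}


def pick_next_job(queues: dict[tuple[str, str], list]):
    # queues maps (tenant_tier, criticality) -> list of pending replication jobs.
    weighted = [
        (key, TIER_WEIGHTS[key[0]] * CRITICALITY_WEIGHTS[key[1]])
        for key, jobs in queues.items() if jobs
    ]
    if not weighted:
        return None
    keys, weights = zip(*weighted)
    chosen = random.choices(keys, weights=weights, k=1)[0]
    # A noisy tenant can only win in proportion to its weight, not its backlog size.
    return queues[chosen].pop(0)
```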

Key metrics:

  1. Publish success rate and p95 publish latency.
  2. Regional cache hit ratio.
  3. Replication lag by region and repository class.
  4. Integrity verification failure count.
  5. 4xx/5xx by endpoint and tenant.
  6. Egress volume and cost by region.

Dashboards:

  1. Control-plane health.
  2. Replication pipeline status.
  3. Top failing tenants/repos.
  4. Incident timeline with regional status map.

Failure scenarios and mitigations:

  1. Regional blob outage.
    • Mitigation: route reads to next healthy replica, fail writes over to backup primary, pause non-critical replication.
  2. Metadata DB partial outage.
    • Mitigation: degraded read-only mode for existing versions, promote standby with automated leader election.
  3. Cache poisoning or stale manifest.
    • Mitigation: digest pinning, short manifest TTL, forced revalidation on mismatch.
  4. Replication backlog growth.
    • Mitigation: autoscale workers, prioritize release-critical artifacts, enforce producer backpressure.
  5. Credential compromise.
    • Mitigation: short-lived tokens, immediate revocation, audit replay, signature revalidation for suspect window.

Migration plan:

  1. Phase 1: dual-write metadata to new control plane in shadow mode.
  2. Phase 2: mirror blob replication and verify checksum parity.
  3. Phase 3: canary tenants read from new regional caches.
  4. Phase 4: progressive traffic shift by region.
  5. Phase 5: retire legacy endpoints after rollback window expires.
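
A minimal sketch of the Phase 1 shadow dual-write, assuming hypothetical legacy and new metadata clients; the legacy store remains the source of truth and shadow failures never block publishes:

```python
# Sketch of shadow-mode dual writes; client objects are stand-ins.
import logging

log = logging.getLogger("migration.shadow")


def dual_write_metadata(legacy_db, new_db, record) -> None:
    legacy_db.write(record)        # legacy remains the source of truth
    try:
        new_db.write(record)       # shadow write; mismatches feed parity checks
    except Exception:
        # Shadow-plane failures are logged for parity tracking, never surfaced
        # to publishers during Phase 1.
        log.exception("shadow write failed for %s", getattr(record, "digest", record))
```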

Rollback rule:

  1. Keep legacy read path active until parity and SLOs are stable for at least two full release cycles.

Use these points in system design interviews:

  1. Lead with immutability plus digest-addressable storage.
  2. Separate strong metadata consistency from eventual blob replication.
  3. Explain integrity verification and signed provenance as first-class controls.
  4. Show concrete SLOs and failure handling, not generic architecture diagrams.
  5. Include tenancy fairness and cost controls early.

A strong artifact repository design is not just object storage plus a REST API. It is a release-critical platform that combines immutability, cryptographic integrity, regional performance, and explicit failure behavior.

If you can explain these trade-offs clearly, you will stand out in platform, DevOps, and release engineering interviews.