System Design: Designing a Globally Distributed Artifact Repository

A globally distributed artifact repository stores build outputs such as container images, language packages, binary archives, and SBOM/provenance metadata for teams across regions.

At scale, the system has to solve four problems at once:

  1. Strong integrity guarantees (no tampered artifacts).
  2. Fast regional reads for CI/CD and runtime pulls.
  3. Predictable write behavior across many tenants.
  4. Clear recovery behavior during regional outages.

Design an internal artifact platform that:

  1. Supports global producer and consumer teams.
  2. Delivers low-latency regional downloads.
  3. Guarantees immutability for released versions.
  4. Provides auditability for compliance and incident response.
  5. Handles large spikes during coordinated launch windows.

Target SLOs:

  1. Publish availability: 99.95% monthly.
  2. Download availability: 99.99% monthly.
  3. Artifact integrity: 100% cryptographic verification on publish and replication.
  4. Regional median download latency under 150 ms for cached artifacts.
  5. Recovery objective: fail over reads within 5 minutes of regional incident detection.

Out of scope:

  1. Building source code or running CI workflows.
  2. Runtime secret storage.
  3. Full software catalog and project management features.

Use a split control-plane/data-plane model.

Control plane:

  1. Artifact metadata service (strongly consistent records).
  2. AuthN/AuthZ and policy engine.
  3. Replication scheduler and health controller.
  4. Quota/accounting service.

Data plane:

  1. Blob storage (content-addressed immutable objects).
  2. Regional pull-through caches.
  3. CDN edge for internet-adjacent consumers.
  4. Regional verification workers.

High-level flow:

  Publishers -> Global API -> Metadata DB (strong consistency)
                           -> Primary Blob Store
                                      |
                                      v
                         Replication Queue/Workers
                        /             |             \
                  Region A        Region B        Region C
                 Blob+Cache      Blob+Cache      Blob+Cache
                        \             |             /
                          CDN / Regional Clients

Keep metadata and content separated:

  1. ArtifactVersion:
    • package_name
    • version
    • digest (sha256)
    • size
    • created_by
    • publish_time
    • immutability_state
  2. ArtifactBlob:
    • digest
    • storage_pointer
    • encryption_key_ref
  3. ReplicationState:
    • per-region status (pending, verified, failed)
    • last verification timestamp
  4. ProvenanceRecord:
    • build attestation reference
    • signing identity
    • policy decision at publish time
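
As a minimal sketch, the records above could be modeled as Python dataclasses; field names follow the list, while the types and enum values are assumptions:

```python
# Sketch of the metadata records; types and enum values are assumptions.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ImmutabilityState(Enum):
    DRAFT = "draft"
    RELEASED = "released"        # released versions are never rebound


class ReplicaStatus(Enum):
    PENDING = "pending"
    VERIFIED = "verified"
    FAILED = "failed"


@dataclass(frozen=True)
class ArtifactVersion:
    package_name: str
    version: str
    digest: str                  # sha256 content address
    size: int
    created_by: str
    publish_time: datetime
    immutability_state: ImmutabilityState


@dataclass(frozen=True)
class ArtifactBlob:
    digest: str                  # primary key: content address
    storage_pointer: str         # e.g. object-store bucket/key
    encryption_key_ref: str      # KMS key reference


@dataclass
class ReplicationState:
    digest: str
    region: str
    status: ReplicaStatus = ReplicaStatus.PENDING
    last_verified_at: datetime | None = None


@dataclass(frozen=True)
class ProvenanceRecord:
    digest: str
    attestation_ref: str         # build attestation reference
    signing_identity: str
    policy_decision: str         # policy result captured at publish time
```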

Immutability rule:

  1. (package_name, version) can be created once.
  2. Retagging can only create new alias records, never mutate existing blob bindings.
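
A minimal sketch of that rule, using an in-memory stand-in for the metadata store (class and method names are illustrative):

```python
# Sketch of the immutability rule: a (package_name, version) pair is created
# once; retagging only adds alias records and never rebinds an existing version.
class ImmutableVersionStore:
    def __init__(self) -> None:
        self._versions: dict[tuple[str, str], str] = {}   # (name, version) -> digest
        self._aliases: dict[tuple[str, str], str] = {}    # (name, alias)   -> digest

    def create_version(self, name: str, version: str, digest: str) -> None:
        key = (name, version)
        if key in self._versions:
            # Never rebind an existing version to a different digest.
            raise ValueError(f"{name}:{version} already exists and is immutable")
        self._versions[key] = digest

    def retag(self, name: str, alias: str, source_version: str) -> None:
        # Aliases (e.g. "stable") are new records pointing at an existing digest;
        # the original version-to-digest binding is never mutated.
        digest = self._versions[(name, source_version)]
        self._aliases[(name, alias)] = digest
```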

Publish flow:

  1. Client authenticates and requests an upload session.
  2. Client uploads chunks to the primary-region blob store.
  3. Service computes the digest and validates it against the expected hash.
  4. Policy engine checks signature/provenance requirements.
  5. Metadata transaction commits the artifact version atomically.
  6. Replication events are enqueued.
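
A minimal sketch of this flow, assuming hypothetical blob_store, policy_engine, metadata_db, and replication_queue clients (none of these names are part of the design above):

```python
# Sketch of the publish path; all client objects and their methods are stand-ins.
import hashlib


def publish(session, expected_digest: str, chunks, policy_engine, blob_store,
            metadata_db, replication_queue) -> str:
    hasher = hashlib.sha256()
    staged = blob_store.begin_staged_upload(session)   # hypothetical staging API
    for chunk in chunks:
        hasher.update(chunk)
        staged.write(chunk)

    digest = f"sha256:{hasher.hexdigest()}"
    if digest != expected_digest:
        staged.abort()
        raise ValueError("digest mismatch: refusing to publish tampered upload")

    # Policy gates (signature, provenance, SBOM) run before anything becomes visible.
    policy_engine.check_publish(session.repo, digest)

    # Commit metadata atomically; a unique (package_name, version) constraint
    # enforces the immutability rule at this point.
    metadata_db.commit_version(session.package_name, session.version, digest)
    staged.promote(digest)          # blob becomes addressable by digest

    # Replication is asynchronous: enqueue per-region copy and verification work.
    for region in session.replication_targets:
        replication_queue.enqueue(digest=digest, region=region)
    return digest
```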

Download flow:

  1. Client resolves the package/version to a digest through the metadata API.
  2. Client fetches from the regional cache or CDN edge.
  3. On a cache miss, the regional cache pulls from the nearest healthy blob replica.
  4. Optional client-side digest verification before install/run.
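
A minimal sketch of the read path, assuming hypothetical metadata_api, regional_cache, and blob_replicas clients:

```python
# Sketch of the download path; client objects and methods are stand-ins.
import hashlib


def download(metadata_api, regional_cache, blob_replicas,
             package: str, version: str) -> bytes:
    # 1. Metadata decides what exists: resolve name/version to a content digest.
    digest = metadata_api.resolve(package, version)

    # 2. Try the regional cache first; on a miss, pull from the nearest healthy
    #    replica and populate the cache under the digest-addressed key.
    data = regional_cache.get(digest)
    if data is None:
        data = blob_replicas.nearest_healthy().fetch(digest)
        regional_cache.put(digest, data)

    # 3. Client-side verification: the bytes must hash to the digest metadata promised.
    actual = f"sha256:{hashlib.sha256(data).hexdigest()}"
    if actual != digest:
        raise ValueError("integrity check failed: bytes do not match expected digest")
    return data
```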

Key principle:

  1. Metadata decides what exists.
  2. Blob layer serves immutable bytes by digest.

Use mixed consistency by concern:

  1. Metadata writes: strong consistency.
  2. Blob replication: asynchronous eventual consistency with verification.
  3. Read-your-writes guarantee: enabled for publisher token in primary region.
  4. Global listing APIs: may be slightly stale by design (bounded lag SLO).

This avoids high write latency from global synchronous quorum while preserving correctness.
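
A minimal sketch of routing metadata reads by concern, assuming hypothetical primary/replica handles and a request object carrying token and region information:

```python
# Sketch of consistency-aware read routing; objects, fields, and the lag SLO
# value are assumptions, not part of the design above.
from datetime import timedelta

MAX_LISTING_STALENESS = timedelta(seconds=30)   # assumed bounded-lag SLO for listings


def route_metadata_read(request, primary_db, replica_db):
    # Read-your-writes: a publisher reading back its own artifact in the primary
    # region is served from the strongly consistent primary.
    if request.is_publisher_token and request.region == primary_db.region:
        return primary_db

    # Global listing APIs tolerate bounded staleness; fall back to the primary
    # only if the replica has drifted past the lag SLO.
    if replica_db.replication_lag() <= MAX_LISTING_STALENESS:
        return replica_db
    return primary_db
```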

Replication policy:

  1. Critical artifacts replicate to all primary regions.
  2. Standard artifacts replicate to home region plus one secondary.
  3. Cold artifacts use delayed replication with on-demand prefetch.
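
A minimal sketch of target selection by artifact class (region names and class labels are placeholders):

```python
# Sketch of replication target selection by artifact class.
PRIMARY_REGIONS = ["region-a", "region-b", "region-c"]   # placeholder region names


def replication_targets(artifact_class: str, home_region: str,
                        secondary_region: str) -> list[str]:
    if artifact_class == "critical":
        return PRIMARY_REGIONS                      # replicate everywhere up front
    if artifact_class == "standard":
        return [home_region, secondary_region]      # home plus one secondary
    # Cold artifacts: keep only the home copy now; other regions prefetch on demand.
    return [home_region]
```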

Repair workflow:

  1. Periodic digest audit compares metadata expected hash vs blob reality.
  2. Missing or mismatched replica enters failed state.
  3. Repair worker re-copies from last known good source.
  4. Promotion to verified only after checksum and signature checks pass.
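
A minimal sketch of the audit-and-repair loop, assuming hypothetical metadata, per-region store, and signature-verifier clients:

```python
# Sketch of the periodic digest audit and repair; all clients are stand-ins.
import hashlib


def audit_and_repair(metadata_db, region_stores, signature_verifier) -> None:
    for record in metadata_db.list_replication_states():
        store = region_stores[record.region]
        if store.exists(record.digest):
            data = store.fetch(record.digest)
            actual = f"sha256:{hashlib.sha256(data).hexdigest()}"
        else:
            data, actual = None, None

        if actual != record.digest:
            # Missing or mismatched replica: mark failed, re-copy from a good source.
            metadata_db.mark_failed(record)
            source = metadata_db.last_known_good_source(record.digest)
            data = source.fetch(record.digest)
            store.put(record.digest, data)

        # Promote to verified only after checksum AND signature checks pass.
        checksum_ok = f"sha256:{hashlib.sha256(data).hexdigest()}" == record.digest
        if checksum_ok and signature_verifier.verify(record.digest):
            metadata_db.mark_verified(record)
```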

Minimum controls:

  1. Mandatory artifact signing for protected repositories.
  2. Provenance attestation verification at publish time.
  3. Immutable release repositories (no delete/overwrite by default).
  4. KMS-backed encryption at rest and TLS in transit.
  5. Fine-grained RBAC by team/repository/environment.
  6. Break-glass actions are audited and time-limited.

Policy gates example:

  1. Reject unsigned artifact for production repos.
  2. Reject artifact missing SBOM for regulated workloads.
  3. Reject vulnerable artifact above defined severity threshold.
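
A minimal sketch of these gates, assuming a hypothetical artifact object carrying signature, SBOM, and scan results (the severity threshold is illustrative):

```python
# Sketch of publish-time policy gates; the artifact/repo fields and the
# threshold value are assumptions.
MAX_ALLOWED_SEVERITY = 7.0   # assumed CVSS-style threshold; tune per policy


def check_publish_policy(artifact, repo) -> None:
    if repo.is_production and not artifact.has_valid_signature:
        raise PermissionError("unsigned artifact rejected for production repo")
    if repo.is_regulated and artifact.sbom is None:
        raise PermissionError("artifact missing SBOM rejected for regulated workload")
    severity = artifact.max_vulnerability_severity
    if severity is not None and severity >= MAX_ALLOWED_SEVERITY:
        raise PermissionError("artifact exceeds vulnerability severity threshold")
```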

Performance model:

  1. CDN for hot public or globally shared artifacts.
  2. Regional cache for private low-latency enterprise traffic.
  3. Digest-addressable cache keys to maximize hit ratio and safety.
  4. Predictive pre-warm for known release waves.
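
A minimal sketch of digest-addressable cache keys: the immutable blob is keyed purely by digest, while mutable tag lookups get their own short-TTL key (names are illustrative):

```python
# Sketch of cache key construction; prefixes are placeholders.
def blob_cache_key(digest: str) -> str:
    # The digest fully identifies immutable content, so the key needs nothing else;
    # entries are safe to share across tenants and regions.
    return f"blob/{digest}"


def manifest_cache_key(package: str, version_or_tag: str) -> str:
    # Mutable tag lookups get a separate short-TTL key; the bytes themselves are
    # always fetched through the digest-addressed key above.
    return f"manifest/{package}/{version_or_tag}"
```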

Operational optimization:

  1. Chunked uploads with resumable sessions.
  2. Range request support for large layers.
  3. Compression-aware storage and transfer.
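
A minimal sketch of range-based fetch for large layers, assuming a blob endpoint that honors standard HTTP Range headers (URL and chunk size are placeholders):

```python
# Sketch of fetching a large blob in ranges; endpoint and sizing are assumptions.
import requests

CHUNK = 8 * 1024 * 1024   # 8 MiB per range request


def fetch_in_ranges(url: str, size: int) -> bytes:
    parts = []
    for start in range(0, size, CHUNK):
        end = min(start + CHUNK, size) - 1
        resp = requests.get(url, headers={"Range": f"bytes={start}-{end}"}, timeout=30)
        resp.raise_for_status()
        parts.append(resp.content)
    return b"".join(parts)
```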

Isolation controls:

  1. Tenant-scoped namespaces and token scopes.
  2. Per-tenant rate limits for publish/download APIs.
  3. Storage and egress quotas with soft and hard thresholds.
  4. Separate encryption contexts per tenant or repo class.

Fairness:

  1. Prevent noisy tenants from saturating replication workers.
  2. Use weighted queues by tenant tier and artifact criticality.
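
A minimal sketch of weighted scheduling across per-tenant replication queues (tier and criticality weights are illustrative):

```python
# Sketch of weighted queue selection by tenant tier and artifact criticality.
import random

TIER_WEIGHTS = {"platinum": 4, "standard": 2, "free": 1}
CRITICALITY_WEIGHTS = {"release-critical": 3, "normal": 1}


def pick_next_job(queues: dict[tuple[str, str], list]):
    # queues maps (tenant_tier, criticality) -> list of pending replication jobs.
    weighted = [
        (key, TIER_WEIGHTS[key[0]] * CRITICALITY_WEIGHTS[key[1]])
        for key, jobs in queues.items() if jobs
    ]
    if not weighted:
        return None
    keys, weights = zip(*weighted)
    chosen = random.choices(keys, weights=weights, k=1)[0]
    # A noisy tenant can only win in proportion to its weight, not its backlog size.
    return queues[chosen].pop(0)
```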

Key metrics:

  1. Publish success rate and p95 publish latency.
  2. Regional cache hit ratio.
  3. Replication lag by region and repository class.
  4. Integrity verification failure count.
  5. 4xx/5xx by endpoint and tenant.
  6. Egress volume and cost by region.

Dashboards:

  1. Control-plane health.
  2. Replication pipeline status.
  3. Top failing tenants/repos.
  4. Incident timeline with regional status map.

Failure scenarios and mitigations:

  1. Regional blob outage.
    • Mitigation: route reads to next healthy replica, fail writes over to backup primary, pause non-critical replication.
  2. Metadata DB partial outage.
    • Mitigation: degraded read-only mode for existing versions, promote standby with automated leader election.
  3. Cache poisoning or stale manifest.
    • Mitigation: digest pinning, short manifest TTL, forced revalidation on mismatch.
  4. Replication backlog growth.
    • Mitigation: autoscale workers, prioritize release-critical artifacts, enforce producer backpressure.
  5. Credential compromise.
    • Mitigation: short-lived tokens, immediate revocation, audit replay, signature revalidation for suspect window.

Migration plan:

  1. Phase 1: dual-write metadata to new control plane in shadow mode.
  2. Phase 2: mirror blob replication and verify checksum parity.
  3. Phase 3: canary tenants read from new regional caches.
  4. Phase 4: progressive traffic shift by region.
  5. Phase 5: retire legacy endpoints after rollback window expires.
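
A minimal sketch of the Phase 1 shadow dual-write, assuming hypothetical legacy and new metadata clients; the legacy store remains the source of truth and shadow failures never block publishes:

```python
# Sketch of shadow-mode dual writes; client objects are stand-ins.
import logging

log = logging.getLogger("migration.shadow")


def dual_write_metadata(legacy_db, new_db, record) -> None:
    legacy_db.write(record)        # legacy remains the source of truth
    try:
        new_db.write(record)       # shadow write; mismatches feed parity checks
    except Exception:
        # Shadow-plane failures are logged for parity tracking, never surfaced
        # to publishers during Phase 1.
        log.exception("shadow write failed for %s", getattr(record, "digest", record))
```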

Rollback rule:

  1. Keep legacy read path active until parity and SLOs are stable for at least two full release cycles.

Use these points in system design interviews:

  1. Lead with immutability plus digest-addressable storage.
  2. Separate strong metadata consistency from eventual blob replication.
  3. Explain integrity verification and signed provenance as first-class controls.
  4. Show concrete SLOs and failure handling, not generic architecture diagrams.
  5. Include tenancy fairness and cost controls early.

A strong artifact repository design is not just object storage plus a REST API. It is a release-critical platform that combines immutability, cryptographic integrity, regional performance, and explicit failure behavior.

If you can explain these trade-offs clearly, you will stand out in platform, DevOps, and release engineering interviews.