System Design: Designing a Globally Distributed Artifact Repository
Table of Contents
- Overview
- Problem Statement
- Design Goals
- Non-Goals
- High-Level Architecture
- Core Data Model
- Read and Write Paths
- Consistency Model
- Replication and Repair
- Security and Supply Chain Integrity
- Caching and Performance Strategy
- Multi-Tenant Isolation and Quotas
- Observability and SLOs
- Failure Modes and Mitigations
- Migration and Rollout Plan
- Interview Talking Points
- Conclusion
Overview
A globally distributed artifact repository stores build outputs such as container images, language packages, binary archives, and SBOM/provenance metadata for teams across regions.
At scale, the system has to solve four problems at once:
- Strong integrity guarantees (no tampered artifacts).
- Fast regional reads for CI/CD and runtime pulls.
- Predictable write behavior across many tenants.
- Clear recovery behavior during regional outages.
Problem Statement
Design an internal artifact platform that:
- Supports global producer and consumer teams.
- Delivers low-latency regional downloads.
- Guarantees immutability for released versions.
- Provides auditability for compliance and incident response.
- Handles large spikes during coordinated launch windows.
Design Goals
- Publish availability: 99.95% monthly.
- Download availability: 99.99% monthly.
- Artifact integrity: 100% cryptographic verification on publish and replication.
- Regional median download latency under 150 ms for cached artifacts.
- Recovery objective: fail over reads within 5 minutes of regional incident detection.
Non-Goals
- Building source code or running CI workflows.
- Runtime secret storage.
- Full software catalog and project management features.
High-Level Architecture
Use a split control-plane/data-plane model.
Control plane:
- Artifact metadata service (strongly consistent records).
- AuthN/AuthZ and policy engine.
- Replication scheduler and health controller.
- Quota/accounting service.
Data plane:
- Blob storage (content-addressed immutable objects).
- Regional pull-through caches.
- CDN edge for internet-adjacent consumers.
- Regional verification workers.
    Publishers -> Global API -> Metadata DB (strong consistency) -> Primary Blob Store
                                                    |
                                                    v
                                       Replication Queue/Workers
                                        /          |          \
                                  Region A     Region B     Region C
                                 Blob+Cache   Blob+Cache   Blob+Cache
                                        \          |          /
                                        CDN / Regional Clients

Core Data Model
Keep metadata and content separated:
ArtifactVersion:
- package_name
- version
- digest (sha256)
- size
- created_by
- publish_time
- immutability_state
ArtifactBlob:
- digest
- storage_pointer
- encryption_key_ref
ReplicationState:
- per-region status (pending, verified, failed)
- last verification timestamp
ProvenanceRecord:
- build attestation reference
- signing identity
- policy decision at publish time
Immutability rule:
- (package_name, version) can be created once.
- Retagging can only create new alias records, never mutate existing blob bindings.
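A minimal sketch of these records as Python dataclasses, assuming sha256 digests and the states named above (type choices and timestamp formats beyond the listed fields are illustrative):

```python
from dataclasses import dataclass, field
from enum import Enum


class ReplicaStatus(Enum):
    PENDING = "pending"
    VERIFIED = "verified"
    FAILED = "failed"


@dataclass(frozen=True)
class ArtifactVersion:
    package_name: str
    version: str
    digest: str              # sha256 of the blob, e.g. "sha256:ab12..."
    size: int                # bytes
    created_by: str
    publish_time: str        # ISO 8601 timestamp
    immutability_state: str  # e.g. "immutable" once released


@dataclass(frozen=True)
class ArtifactBlob:
    digest: str
    storage_pointer: str     # opaque location in the blob store
    encryption_key_ref: str  # KMS key reference


@dataclass
class ReplicationState:
    digest: str
    per_region_status: dict[str, ReplicaStatus] = field(default_factory=dict)
    last_verification_time: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class ProvenanceRecord:
    digest: str
    build_attestation_ref: str
    signing_identity: str
    policy_decision: str     # decision recorded at publish time
```

With this shape, the immutability rule reduces to a uniqueness constraint on (package_name, version) in the metadata store; aliases are separate records that point at an existing digest.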
Read and Write Paths
Write Path (Publish)
- Client authenticates and requests upload session.
- Client uploads chunks to primary region blob store.
- Service computes digest and validates expected hash.
- Policy engine checks signature/provenance requirements.
- Metadata transaction commits artifact version atomically.
- Replication events are enqueued.
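A condensed sketch of the publish steps above, assuming hypothetical blob_store, policy_engine, metadata_db, and replication_queue interfaces; the essentials are that the digest is computed server-side, the policy gate runs before the metadata commit, and the commit is a single atomic transaction:

```python
import hashlib


def publish(session, chunks, expected_digest, blob_store, policy_engine,
            metadata_db, replication_queue):
    """Publish flow: upload, verify digest, check policy, commit, enqueue."""
    hasher = hashlib.sha256()
    upload = blob_store.start_upload(session)          # hypothetical API
    for chunk in chunks:
        hasher.update(chunk)
        upload.write(chunk)
    digest = "sha256:" + hasher.hexdigest()

    # Reject the upload if the client-declared digest does not match.
    if digest != expected_digest:
        upload.abort()
        raise ValueError("digest mismatch")

    # Policy gate: signature/provenance requirements checked before commit.
    decision = policy_engine.evaluate(session, digest)  # hypothetical API
    if not decision.allowed:
        upload.abort()
        raise PermissionError(decision.reason)

    storage_pointer = upload.finalize()

    # Single transaction: version record and blob record commit together or not at all.
    with metadata_db.transaction() as tx:
        tx.insert_artifact_version(session.package_name, session.version,
                                   digest, policy=decision)
        tx.insert_blob(digest, storage_pointer)

    # Replication is asynchronous; other regions converge after the commit.
    replication_queue.enqueue(digest, priority=session.criticality)
    return digest
```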
Read Path (Download)
- Client resolves package/version to digest through metadata API.
- Client fetches from regional cache or CDN edge.
- If cache miss, regional cache pulls from nearest healthy blob replica.
- Optional client-side digest verification before install/run.
Key principle:
- Metadata decides what exists.
- Blob layer serves immutable bytes by digest.
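A sketch of the download side, assuming hypothetical metadata_api and regional_cache clients. The key principle shows up directly: the metadata lookup returns a digest, and the bytes are fetched and verified against that digest rather than trusted by name alone.

```python
import hashlib


def download(package_name, version, metadata_api, regional_cache):
    """Resolve name/version to a digest, fetch by digest, verify the bytes."""
    # Metadata decides what exists: name/version -> digest lookup.
    record = metadata_api.resolve(package_name, version)   # hypothetical API
    digest = record.digest

    # Blob layer serves immutable bytes by digest (cache, CDN, or replica).
    data = regional_cache.get(digest)

    # Optional client-side verification before install/run.
    actual = "sha256:" + hashlib.sha256(data).hexdigest()
    if actual != digest:
        raise RuntimeError(f"integrity check failed for {package_name}@{version}")
    return data
```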
Consistency Model
Use mixed consistency by concern:
- Metadata writes: strong consistency.
- Blob replication: asynchronous eventual consistency with verification.
- Read-your-writes guarantee: enabled for publisher token in primary region.
- Global listing APIs: may be slightly stale by design (bounded lag SLO).
This avoids the high write latency of a global synchronous quorum while preserving correctness where it matters.
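A small sketch of how a metadata client could reflect this split, assuming hypothetical primary_db and regional_replica clients and an illustrative 60-second bounded-lag SLO: resolution reads stay strongly consistent, while listings tolerate bounded staleness.

```python
MAX_LISTING_LAG_SECONDS = 60  # illustrative bounded-staleness SLO


def resolve(package, version, primary_db):
    """Name/version resolution is always a strongly consistent read."""
    return primary_db.get_version(package, version)        # hypothetical API


def list_versions(package, primary_db, regional_replica):
    """Listings may be served from a replica, but only within the lag SLO."""
    if regional_replica.lag_seconds() <= MAX_LISTING_LAG_SECONDS:
        return regional_replica.list_versions(package)
    # Replica has fallen behind the bounded-lag SLO; fall back to the primary.
    return primary_db.list_versions(package)
```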
Replication and Repair
Replication policy:
- Critical artifacts replicate to all primary regions.
- Standard artifacts replicate to home region plus one secondary.
- Cold artifacts use delayed replication with on-demand prefetch.
Repair workflow:
- Periodic digest audits compare the expected hash in metadata against the bytes actually stored.
- Missing or mismatched replicas enter the failed state.
- Repair worker re-copies from the last known good source.
- Promotion to verified happens only after checksum and signature checks pass.
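A sketch of that audit-and-repair loop; the blob-store and metadata-client calls are hypothetical placeholders, and statuses mirror the pending/verified/failed states in the data model:

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def audit_and_repair(expected_digest, regions, blob_stores, metadata_db):
    """Digest audit: compare expected hash to stored bytes, repair, re-verify."""
    statuses = {}
    for region in regions:
        store = blob_stores[region]                          # hypothetical client
        data = store.read(expected_digest) if store.exists(expected_digest) else None

        if data is None or sha256_digest(data) != expected_digest:
            # Missing or mismatched replica enters the failed state.
            statuses[region] = "failed"
            source = metadata_db.last_known_good_region(expected_digest)
            store.copy_from(blob_stores[source], expected_digest)
            data = store.read(expected_digest)

        # Promote to verified only after checksum and signature checks pass.
        if (data is not None
                and sha256_digest(data) == expected_digest
                and metadata_db.signature_valid(expected_digest)):
            statuses[region] = "verified"
    return statuses
```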
Security and Supply Chain Integrity
Minimum controls:
- Mandatory artifact signing for protected repositories.
- Provenance attestation verification at publish time.
- Immutable release repositories (no delete/overwrite by default).
- KMS-backed encryption at rest and TLS in transit.
- Fine-grained RBAC by team/repository/environment.
- Break-glass actions are audited and time-limited.
Policy gates example:
- Reject unsigned artifact for production repos.
- Reject artifact missing SBOM for regulated workloads.
- Reject vulnerable artifact above defined severity threshold.
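These gates can be expressed as a small rule evaluation at publish time. The repo classes, field names, and severity threshold below are illustrative, not a fixed policy schema:

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"low": 1, "medium": 2, "high": 3, "critical": 4}
MAX_ALLOWED_SEVERITY = "medium"      # illustrative threshold


@dataclass
class PublishRequest:
    repo_class: str          # e.g. "production", "regulated", "sandbox"
    signed: bool
    has_sbom: bool
    max_vuln_severity: str   # worst scanner finding, or "none"


def evaluate_policy(req: PublishRequest) -> list[str]:
    """Return the list of gate violations; an empty list means publish is allowed."""
    violations = []
    if req.repo_class == "production" and not req.signed:
        violations.append("unsigned artifact rejected for production repo")
    if req.repo_class == "regulated" and not req.has_sbom:
        violations.append("artifact missing SBOM rejected for regulated workload")
    if (req.max_vuln_severity in SEVERITY_ORDER
            and SEVERITY_ORDER[req.max_vuln_severity]
            > SEVERITY_ORDER[MAX_ALLOWED_SEVERITY]):
        violations.append("vulnerability severity above allowed threshold")
    return violations
```

For example, evaluate_policy(PublishRequest("production", signed=False, has_sbom=True, max_vuln_severity="none")) returns a single violation and the publish is rejected before any metadata commit.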
Caching and Performance Strategy
Performance model:
- CDN for hot public or globally shared artifacts.
- Regional cache for private low-latency enterprise traffic.
- Digest-addressable cache keys to maximize hit ratio and safety.
- Predictive pre-warm for known release waves.
Operational optimization:
- Chunked uploads with resumable sessions.
- Range request support for large layers.
- Compression-aware storage and transfer.
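Because cache keys are digests of immutable content, a pull-through cache needs no invalidation logic for artifact bytes; a sketch, with hypothetical local-store and upstream-replica clients:

```python
import hashlib


def cached_fetch(digest, local_store, upstream_replicas):
    """Digest-addressed pull-through cache: hit locally or pull, verify, and store."""
    data = local_store.get(digest)                     # hypothetical API
    if data is not None:
        return data                                    # cache hit

    # Cache miss: pull from the nearest healthy blob replica.
    for replica in upstream_replicas:
        if not replica.healthy():
            continue
        data = replica.read(digest)
        # Verify before caching so a bad upstream can never poison the cache.
        if "sha256:" + hashlib.sha256(data).hexdigest() == digest:
            local_store.put(digest, data)
            return data
    raise RuntimeError(f"no healthy replica could serve {digest}")
```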
Multi-Tenant Isolation and Quotas
Isolation controls:
- Tenant-scoped namespaces and token scopes.
- Per-tenant rate limits for publish/download APIs.
- Storage and egress quotas with soft and hard thresholds.
- Separate encryption contexts per tenant or repo class.
Fairness:
- Prevent noisy tenants from saturating replication workers.
- Use weighted queues by tenant tier and artifact criticality.
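A minimal sketch of weighted scheduling for replication work, with illustrative tier weights (artifact criticality could be folded in as a second weight dimension): a noisy tenant can fill its own queue without starving others.

```python
from collections import defaultdict, deque

TIER_WEIGHTS = {"critical": 4, "standard": 2, "best_effort": 1}  # illustrative


class WeightedReplicationQueue:
    """Round-robin across tenant queues, weighted by tier, so no tenant starves others."""

    def __init__(self):
        self.queues = defaultdict(deque)   # (tenant, tier) -> pending jobs

    def enqueue(self, tenant, tier, job):
        self.queues[(tenant, tier)].append(job)

    def drain(self, batch_size):
        """Pick up to batch_size jobs, giving higher tiers more turns per cycle."""
        batch = []
        while len(batch) < batch_size and any(self.queues.values()):
            for (tenant, tier), queue in list(self.queues.items()):
                turns = TIER_WEIGHTS.get(tier, 1)
                for _ in range(turns):
                    if queue and len(batch) < batch_size:
                        batch.append(queue.popleft())
        return batch
```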
Observability and SLOs
Key metrics:
- Publish success rate and p95 publish latency.
- Regional cache hit ratio.
- Replication lag by region and repository class.
- Integrity verification failure count.
- 4xx/5xx by endpoint and tenant.
- Egress volume and cost by region.
Dashboards:
- Control-plane health.
- Replication pipeline status.
- Top failing tenants/repos.
- Incident timeline with regional status map.
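As one way to make these metrics actionable, a sketch (with illustrative thresholds, not the real SLOs) that flags replication-lag breaches and derives the regional cache hit ratio from raw counters:

```python
REPLICATION_LAG_SLO_SECONDS = {"critical": 300, "standard": 3600}  # illustrative
MIN_CACHE_HIT_RATIO = 0.90                                         # illustrative


def replication_lag_breaches(lag_by_region_and_class):
    """Return (region, repo_class, lag_seconds) tuples that breach the lag SLO."""
    breaches = []
    for (region, repo_class), lag_seconds in lag_by_region_and_class.items():
        budget = REPLICATION_LAG_SLO_SECONDS.get(repo_class)
        if budget is not None and lag_seconds > budget:
            breaches.append((region, repo_class, lag_seconds))
    return breaches


def cache_hit_ratio(hits, misses):
    """Regional cache hit ratio; alert when it falls below MIN_CACHE_HIT_RATIO."""
    total = hits + misses
    return hits / total if total else 1.0
```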
Failure Modes and Mitigations
- Regional blob outage.
- Mitigation: route reads to next healthy replica, fail writes over to backup primary, pause non-critical replication.
- Metadata DB partial outage.
- Mitigation: degraded read-only mode for existing versions, promote standby with automated leader election.
- Cache poisoning or stale manifest.
- Mitigation: digest pinning, short manifest TTL, forced revalidation on mismatch.
- Replication backlog growth.
- Mitigation: autoscale workers, prioritize release-critical artifacts, enforce producer backpressure.
- Credential compromise.
- Mitigation: short-lived tokens, immediate revocation, audit replay, signature revalidation for suspect window.
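For the regional-outage case, a sketch of read failover that routes to the next healthy replica; the health-check client and region list are hypothetical, and in practice this decision sits behind the detection-to-failover objective stated in the design goals:

```python
def pick_read_region(preferred_region, replica_regions, health):
    """Route reads to the preferred region, else the next healthy replica."""
    if health.is_healthy(preferred_region):            # hypothetical health API
        return preferred_region
    for region in replica_regions:
        if region != preferred_region and health.is_healthy(region):
            return region
    raise RuntimeError("no healthy region available for reads")
```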
Migration and Rollout Plan
- Phase 1: dual-write metadata to new control plane in shadow mode.
- Phase 2: mirror blob replication and verify checksum parity.
- Phase 3: canary tenants read from new regional caches.
- Phase 4: progressive traffic shift by region.
- Phase 5: retire legacy endpoints after rollback window expires.
Rollback rule:
- Keep legacy read path active until parity and SLOs are stable for at least two full release cycles.
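During the shadow phases, a parity check along these lines (with hypothetical legacy and new metadata clients) is what gates each step: the new control plane must resolve every sampled version to the same digest before traffic shifts.

```python
def shadow_parity_check(sample_versions, legacy_api, new_api):
    """Compare digest resolution between the legacy and new control planes."""
    mismatches = []
    for package, version in sample_versions:
        legacy_digest = legacy_api.resolve(package, version)   # hypothetical API
        new_digest = new_api.resolve(package, version)
        if legacy_digest != new_digest:
            mismatches.append((package, version, legacy_digest, new_digest))
    # Rollout gate: any mismatch blocks the next phase and triggers investigation.
    return mismatches
```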
Interview Talking Points
Use these points in system design interviews:
- Lead with immutability plus digest-addressable storage.
- Separate strong metadata consistency from eventual blob replication.
- Explain integrity verification and signed provenance as first-class controls.
- Show concrete SLOs and failure handling, not generic architecture diagrams.
- Include tenancy fairness and cost controls early.
Conclusion
A strong artifact repository design is not just object storage plus a REST API. It is a release-critical platform that combines immutability, cryptographic integrity, regional performance, and explicit failure behavior.
If you can explain these trade-offs clearly, you will stand out in platform, DevOps, and release engineering interviews.