Robust Error Handling and Post-Build Actions

Most pipeline outages are not caused by one failing command. They come from weak failure classification, blind retries, and missing post-build cleanup.

Robust CI/CD design means:

  1. Classifying failures clearly.
  2. Applying controlled recovery.
  3. Running deterministic post-build actions every time.

Classify failures into categories that each map to a specific response:

  1. Transient infrastructure failures:
    • Agent disconnects, registry timeouts, temporary DNS issues.
  2. Deterministic code/test failures:
    • Unit test failures, lint violations, build compile errors.
  3. Dependency/system failures:
    • Upstream package registry outage, database unavailable.
  4. Policy/security failures:
    • Secrets scan failure, SAST policy block, unsigned artifact.
  5. Deployment/runtime verification failures:
    • Canary health checks fail, error budget burn spikes.

Without this taxonomy, pipelines either retry everything or fail with no remediation.
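
As a rough sketch, a helper in a Jenkins shared library (or a script { } block) can map raw error text onto these classes before deciding whether to retry, stop, or escalate. The function name, match patterns, and class labels below are assumptions; adapt them to your own log output:

// Hypothetical shared-library helper; patterns and labels are illustrative.
def classifyFailure(String message) {
    def text = (message ?: '').toLowerCase()
    if (text.contains('timed out') || text.contains('connection reset') || text.contains('name resolution')) {
        return 'transient-infrastructure'
    }
    if (text.contains('test failed') || text.contains('lint') || text.contains('compilation error')) {
        return 'deterministic-code'
    }
    if (text.contains('registry unavailable') || text.contains('connection refused')) {
        return 'dependency-system'
    }
    if (text.contains('secret detected') || text.contains('policy violation') || text.contains('unsigned artifact')) {
        return 'policy-security'
    }
    if (text.contains('canary') || text.contains('health check failed')) {
        return 'deployment-verification'
    }
    return 'unclassified'   // unknown failures should escalate, never auto-retry
}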

Core handling principles:

  1. Fail early for deterministic correctness issues.
  2. Retry only transient failures with strict caps.
  3. Always run cleanup and evidence-collection steps.
  4. Keep post-build side effects idempotent.
  5. Preserve forensic data for debugging and audit.

Recommended flow:

  1. Validate inputs and environment quickly.
  2. Execute quality gates in order of cost and signal.
  3. Stop immediately on non-recoverable errors.
  4. Retry transient operations with exponential backoff.
  5. In post phase, always run cleanup, publish reports, and emit status.

Retry Patterns That Do Not Hide Real Problems

Good retry guardrails:

  1. Retry count is low (usually 2-3).
  2. Backoff increases per attempt.
  3. Retry only on known transient codes/messages.
  4. Log each retry reason explicitly.
  5. Escalate after retry exhaustion.
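
A minimal sketch of such a guarded retry, written for a script { } block or shared library; the helper name, the transient patterns, and the delay values are assumptions:

// Hypothetical helper: retry only known transient errors, with exponential backoff.
def retryTransient(int maxAttempts, Closure body) {
    def transientPatterns = ['timed out', 'connection reset', '503 service unavailable']
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            body()
            return
        } catch (Exception e) {
            def msg = (e.message ?: '').toLowerCase()
            boolean isTransient = false
            for (p in transientPatterns) {
                if (msg.contains(p)) { isTransient = true }
            }
            if (!isTransient || attempt == maxAttempts) {
                echo "Attempt ${attempt}: not retryable or retries exhausted (${e.message})"
                throw e   // escalate instead of hiding the failure
            }
            int delaySeconds = 10 * (2 ** (attempt - 1))   // 10s, 20s, 40s...
            echo "Attempt ${attempt} hit a transient error (${e.message}); retrying in ${delaySeconds}s"
            sleep time: delaySeconds, unit: 'SECONDS'
        }
    }
}

// Usage: retryTransient(3) { sh 'docker push registry.example.com/app:latest' }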

Bad pattern:

  1. Wrapping the entire pipeline in a generic retry block.

This hides deterministic regressions and wastes compute.

Fail-fast stages:

  1. Lint and static checks.
  2. Unit tests and schema validation.
  3. Security policy gates for release branches.

Fail-safe behaviors:

  1. Preserve artifacts/logs when the build fails.
  2. Revoke temporary credentials.
  3. Ensure workspace cleanup.
  4. Publish final status to chat/ticketing.
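
A hedged sketch of a post section covering these behaviors; the revocation and notification scripts are placeholders for whatever your environment provides:

post {
    unsuccessful {
        // Publish the final status to chat/ticketing; the script name is a placeholder.
        sh "./scripts/notify-owners.sh 'FAILED ${env.JOB_NAME} #${env.BUILD_NUMBER}'"
    }
    always {
        // Preserve evidence even when the build fails.
        archiveArtifacts artifacts: 'logs/**', allowEmptyArchive: true
    }
    cleanup {
        // Runs after the other post conditions: revoke short-lived credentials
        // (hypothetical script), then remove the workspace.
        sh './scripts/revoke-temp-credentials.sh || true'
        deleteDir()
    }
}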

Practical Jenkins controls:

  1. Use options { timeout(...) } to bound hung runs.
  2. Use retry(n) only around known flaky network actions.
  3. Use catchError for non-critical reporting steps (see the sketch after this list).
  4. Use post { always { ... } } for mandatory cleanup.
  5. Use archiveArtifacts and junit across all outcomes where the results are useful.
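
For example, a non-critical reporting step can be wrapped in catchError so its failure marks the stage but does not fail the build; the make target here is a placeholder:

stage('Publish Coverage Report') {
    steps {
        // Non-critical: record the stage as failed but let the pipeline continue.
        catchError(buildResult: 'SUCCESS', stageResult: 'FAILURE') {
            sh 'make coverage-report'
        }
    }
}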

Every pipeline should define post-build actions by severity and outcome.

Always:

  1. Clean up temporary files and secrets.
  2. Archive logs and test reports.
  3. Emit build metadata (commit SHA, image digest, duration).
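
A small sketch of the metadata step as a post fragment; the file name and fields are assumptions, and an image digest would come from whatever your build step records:

post {
    always {
        script {
            // Write minimal, machine-readable build metadata alongside the logs.
            writeFile file: 'build-metadata.txt',
                      text: "commit=${env.GIT_COMMIT}\n" +
                            "job=${env.JOB_NAME}\n" +
                            "build=${env.BUILD_NUMBER}\n" +
                            "result=${currentBuild.currentResult}\n" +
                            "duration=${currentBuild.durationString}\n"
            archiveArtifacts artifacts: 'build-metadata.txt'
        }
    }
}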

On failure:

  1. Attach failure classification.
  2. Route alert to owning team.
  3. Link runbook and recent similar incidents.

On success:

  1. Publish release evidence.
  2. Update deployment ledger/change record.

Retention should be policy-driven:

  1. Keep release artifacts longer than non-release artifacts.
  2. Keep security scan outputs with auditable retention windows.
  3. Delete sensitive workspace intermediates aggressively.
  4. Keep enough logs for triage, not unlimited noisy data.
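
In Jenkins this often maps to a buildDiscarder policy; the counts below are illustrative, with release jobs keeping artifacts longer:

options {
    // Keep logs for the last 30 builds but artifacts only for the last 10;
    // a release job would use larger retention values.
    buildDiscarder(logRotator(numToKeepStr: '30', artifactNumToKeepStr: '10'))
}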

Notification quality matters more than volume.

Include:

  1. Pipeline name, stage, and failure class.
  2. First failing command or gate.
  3. Last successful run reference.
  4. Suggested immediate action and runbook link.
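
A minimal sketch using the core mail step; the recipient, runbook URL, and classification source are assumptions:

failure {
    script {
        // Route a single, context-rich alert to the owning team.
        def lastGood = currentBuild.previousSuccessfulBuild?.absoluteUrl ?: 'none'
        mail to: 'team-owners@example.com',
             subject: "FAILED: ${env.JOB_NAME} #${env.BUILD_NUMBER}",
             body: "Failure class: see archived classification/log\n" +
                   "Last successful run: ${lastGood}\n" +
                   "Runbook: https://example.com/runbooks/ci-failures"
    }
}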

Avoid:

  1. Sending all failures to all channels.

For deployment pipelines, post-build should include rollback readiness:

  1. Store last-known-good artifact reference.
  2. Trigger automated rollback for failed canary verification.
  3. Reconcile environment state after rollback.
  4. Record rollback reason and blast radius.
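
A hedged sketch of that logic inside a deployment stage; the make targets and the last-known-good marker file are assumptions:

stage('Canary Verification') {
    steps {
        script {
            try {
                sh 'make verify-canary'
                // Record the verified revision as last-known-good for future rollbacks.
                writeFile file: 'last-known-good.txt', text: "${env.GIT_COMMIT}\n"
                archiveArtifacts artifacts: 'last-known-good.txt'
            } catch (err) {
                echo "Canary verification failed: ${err.message}; rolling back"
                sh 'make rollback'
                error 'Rolled back after failed canary verification; see rollback record'
            }
        }
    }
}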

Track metrics that improve reliability:

  1. Failure rate by stage and category.
  2. Retry rate and retry success ratio.
  3. Mean time to detect and resolve pipeline failures.
  4. Post-build action success rate.
  5. Flake rate for tests and external dependencies.

Use this data to fix systemic noise, not just individual runs.
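
One lightweight way to collect these numbers, assuming a Prometheus Pushgateway is available, is to push a counter from the post block; the metric name and gateway address are placeholders:

post {
    always {
        // Push per-run outcome data so failure and retry rates can be graphed per job.
        sh "echo 'ci_pipeline_runs_total{result=\"${currentBuild.currentResult}\"} 1' | curl --silent --data-binary @- http://pushgateway.example.com:9091/metrics/job/${env.JOB_NAME}"
    }
}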

A complete Jenkinsfile tying these controls together: bounded timeouts, targeted retries, and mandatory post-build archiving and cleanup.

pipeline {
    agent any
    options {
        // Bound hung runs and timestamp log lines for easier triage.
        timeout(time: 45, unit: 'MINUTES')
        timestamps()
    }
    stages {
        stage('Lint') {
            steps {
                sh 'make lint'
            }
        }
        stage('Unit Test') {
            steps {
                sh 'make test'
            }
            post {
                always {
                    junit testResults: 'reports/junit/*.xml', allowEmptyResults: true
                }
            }
        }
        stage('Build Artifact') {
            steps {
                // Bounded retry: build and publish touch external registries.
                retry(2) {
                    sh 'make build'
                }
            }
        }
        stage('Publish') {
            steps {
                retry(2) {
                    sh 'make publish'
                }
            }
        }
    }
    post {
        always {
            // Mandatory evidence collection and cleanup, regardless of outcome.
            archiveArtifacts artifacts: 'reports/**,dist/**,logs/**', allowEmptyArchive: true
            sh 'rm -rf .tmp || true'
        }
        success {
            echo 'Build success: release evidence published.'
        }
        failure {
            echo 'Build failed: classify error and notify owner.'
        }
    }
}

Common anti-patterns:

  1. Retrying non-idempotent deployment steps without safeguards.
  2. Skipping report publication on failure.
  3. Keeping long-lived credentials in environment variables.
  4. No timeout settings, causing stuck executors.
  5. Using manual cleanup that fails silently and leaves residue.

A practical adoption checklist:

  1. Define failure classes and route each to a clear action.
  2. Add bounded retries only to transient operations.
  3. Add mandatory post { always { ... } } cleanup/reporting.
  4. Improve alerts with ownership, context, and runbook links.
  5. Track failure/retry metrics and remove top reliability bottlenecks.

Robust pipelines are built on deterministic behavior under both success and failure. If you classify failures well, apply targeted retries, and enforce post-build actions every run, delivery becomes faster and more reliable over time.