ci: Refactor Dockerfile & entrypoint (#8923)
* Refactor formatting & docs

* Refactor the `runtime` stage in Dockerfile

* Remove unused code from `entrypoint.sh`

* Simplify `entrypoint.sh` setup

* Revise docs & formatting

* Adjust default values for env vars

* Bump Rust v from 1.79 to 1.81 in Dockerfile

* Refactor `entrypoint.sh`

* Refactor `Dockerfile`

* Add TODOs for monitoring stage to Dockerfile

* Refactor `Dockerfile`

* Add TODOs for monitoring stage to Dockerfile

* Fix a typo

* Allow running `zebrad` in test mode

* Allow custom config for `zebrad` in test mode

* Remove `curl` from the `runtime` Docker image

* Remove redundant echos

* Remove a malfunctioning CD test

The test was using a custom config file set in `test_variables`.
However, the file was not included in the Docker image, and the
entrypoint script created a new, default one under the original file's
path. Zebra then loaded this new file, and the test passed because the
pattern in `grep_patterns` matched Zebra's output containing the
original path, even though the config file was different.

* Remove a redundant CI test

* Remove all packages from the `runtime` stage

* Docs cosmetics

* Clarify docs

* Bump Rust version

* Remove a security note

* Explicitly specify network cache dir

* Explicitly specify cookie dir

* Set UID, GID and home dir for the `zebra` user

* Set a working dir for the `zebra` user

* Don't remove `FEATURES`

* Try re-introducing the `testnet-conf` check

* `ZEBRA_CACHED_STATE_DIR` -> `ZEBRA_CACHE_DIR`

This dir doesn't hold only the state cache anymore, but also the cache
for network peers, and the cookie file.

* Refactor the dir structure

* Check that `ZEBRA_CONF_PATH` exists in the image

* Improve the check for `ZEBRA_CONF_PATH`

* Use different flag in the `ZEBRA_CONF_PATH` check

* Simplify the `ZEBRA_CONF_PATH` check

* Fix spelling

* Comment out the `testnet-conf` CI check

* Add commented out `test-zebra-conf-path` CI check

* Reintroduce `testnet-conf` CI check

* Update the `custom-conf` CI check

* Add `v2.1.0.toml` conf file

* Refine the `v2.1.0.toml` conf file

* Remove `ZEBRA_LISTEN_ADDR` from the entrypoint

* Remove `ZEBRA_CHECKPOINT_SYNC` from the entrypoint

* Stop supporting configuration of the RPC port

* Add default conf file

* Prepare Zebra's config in the entrypoint script

* Remove unneeded packages from the `deps` target

* Docs cosmetics

* Use only `$FEATURES` in entrypoint

* Simplify handling of Rust features

* Add a TODO

* Add CI debug statements

* Don't require test vars in conf test

* Reintroduce `protoc`

* Remove `-e NETWORK`

* Remove `ZEBRA_FORCE_USE_COLOR=1`

* Remove `ZEBRA_CACHE_DIR=/var/cache/zebrad-cache`

* Reintroduce the "custom-conf" test

* Set up test env the same way as prod

* Don't repeatedly check for conf file in entrypoint

* Simplify file ownership in Dockerfile

* Fix checkpoint tests in entrypoint

* Fix Zebra config CI tests

* `LIGHTWALLETD_DATA_DIR` -> `LWD_CACHE_DIR`

* Add config for `LWD_CACHE_DIR` to Dockerfile

* `/var/cache/zebrad-cache` -> `~/.cache/zebra`

* `/var/cache/lwd-cache` -> `/home/zebra/.cache/lwd`

* Remove `LOG_COLOR=false` from GCP setup

* Don't specify `LWD_CACHE_DIR` in CI tests

* Don't switch to `zebra` user for tests in Docker

* Join "experimental" and "all" tests in CI

* Remove outdated docs

* Refactor tests with fake activation heights

* Fix tests for scanner
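
The false positive described under "Remove a malfunctioning CD test" can be reproduced with a small shell sketch. Everything in it is an assumption made for illustration: the path `configs/custom.toml`, the placeholder file contents, and the log line stand in for the real image, entrypoint, and Zebra output.

```shell
#!/bin/sh
# Illustrative only: hypothetical path, placeholder contents, made-up log line.
conf_path="configs/custom.toml"

workdir="$(mktemp -d)"

# Old entrypoint behaviour as described in the commit message: the requested
# file is missing from the image, so a default config is written at its path.
if [ ! -f "$workdir/$conf_path" ]; then
    mkdir -p "$(dirname "$workdir/$conf_path")"
    printf '%s\n' '# default config' > "$workdir/$conf_path"
fi

# Zebra reports the path it loaded; the CD test only greps for that path.
log_line="loaded zebrad config config_path=$conf_path"

if printf '%s\n' "$log_line" | grep -q -e "loaded zebrad config.*config_path.*=.*custom.toml"; then
    result="passed"
else
    result="failed"
fi

# The file at the expected path holds the defaults, not the custom config.
conf_contents="$(cat "$workdir/$conf_path")"
echo "test $result, but the loaded file contains: $conf_contents"
rm -rf "$workdir"
```

The check passes because the pattern only constrains the path in the log line, so a test of this shape cannot tell a custom config from a regenerated default one.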
upbqdn authored Feb 13, 2025
1 parent f8860a6 commit 4132c0e
Showing 25 changed files with 688 additions and 647 deletions.
1 change: 1 addition & 0 deletions .dockerignore
@@ -21,3 +21,4 @@
!zebra-*
!zebrad
!docker/entrypoint.sh
!docker/default_zebra_config.toml
1 change: 0 additions & 1 deletion .github/workflows/README.md
@@ -223,7 +223,6 @@ docker run --rm -e TEST_LWD_INTEGRATION=1 zebra-tests
#### Test Categories

- Full suite (`RUN_ALL_TESTS`)
- Experimental features (`RUN_ALL_EXPERIMENTAL_TESTS`)
- Integration tests (`TEST_LWD_INTEGRATION`)
- Network sync (`TEST_ZEBRA_EMPTY_SYNC`, `TEST_UPDATE_SYNC`)
- State management (`TEST_DISK_REBUILD`)
146 changes: 75 additions & 71 deletions .github/workflows/cd-deploy-nodes-gcp.yml
@@ -1,6 +1,6 @@
# Google Cloud node deployments and tests that run when Rust code or dependencies are modified,
# but only on PRs from the ZcashFoundation/zebra repository.
# (External PRs are tested/deployed by GitHub's Merge Queue.)
# (External PRs are tested/deployed by GitHub's Merge Queue.)
#
# 1. `versioning`: Extracts the major version from the release semver. Useful for segregating instances based on major versions.
# 2. `build`: Builds a Docker image named `zebrad` with the necessary tags derived from Git.
@@ -28,85 +28,85 @@ concurrency:

on:
merge_group:
types: [ checks_requested ]
types: [checks_requested]

workflow_dispatch:
inputs:
network:
default: Mainnet
description: 'Network to deploy: Mainnet or Testnet'
description: "Network to deploy: Mainnet or Testnet"
required: true
type: choice
options:
- Mainnet
- Testnet
cached_disk_type:
default: tip
description: 'Type of cached disk to use'
description: "Type of cached disk to use"
required: true
type: choice
options:
- tip
- checkpoint
prefer_main_cached_state:
default: false
description: 'Prefer cached state from the main branch'
description: "Prefer cached state from the main branch"
required: false
type: boolean
need_cached_disk:
default: true
description: 'Use a cached state disk'
description: "Use a cached state disk"
required: false
type: boolean
no_cache:
description: 'Disable the Docker cache for this build'
description: "Disable the Docker cache for this build"
required: false
type: boolean
default: false
log_file:
default: ''
description: 'Log to a file path rather than standard output'
default: ""
description: "Log to a file path rather than standard output"

push:
# Skip main branch updates where Rust code and dependencies aren't modified.
branches:
- main
paths:
# code and tests
- '**/*.rs'
# hard-coded checkpoints and proptest regressions
- '**/*.txt'
# dependencies
- '**/Cargo.toml'
- '**/Cargo.lock'
# configuration files
- '.cargo/config.toml'
- '**/clippy.toml'
# workflow definitions
- 'docker/**'
- '.dockerignore'
- '.github/workflows/cd-deploy-nodes-gcp.yml'
- '.github/workflows/sub-build-docker-image.yml'
# Skip main branch updates where Rust code and dependencies aren't modified.
branches:
- main
paths:
# code and tests
- "**/*.rs"
# hard-coded checkpoints and proptest regressions
- "**/*.txt"
# dependencies
- "**/Cargo.toml"
- "**/Cargo.lock"
# configuration files
- ".cargo/config.toml"
- "**/clippy.toml"
# workflow definitions
- "docker/**"
- ".dockerignore"
- ".github/workflows/cd-deploy-nodes-gcp.yml"
- ".github/workflows/sub-build-docker-image.yml"

# Only runs the Docker image tests, doesn't deploy any instances
pull_request:
# Skip PRs where Rust code and dependencies aren't modified.
paths:
# code and tests
- '**/*.rs'
- "**/*.rs"
# hard-coded checkpoints and proptest regressions
- '**/*.txt'
- "**/*.txt"
# dependencies
- '**/Cargo.toml'
- '**/Cargo.lock'
- "**/Cargo.toml"
- "**/Cargo.lock"
# configuration files
- '.cargo/config.toml'
- '**/clippy.toml'
- ".cargo/config.toml"
- "**/clippy.toml"
# workflow definitions
- 'docker/**'
- '.dockerignore'
- '.github/workflows/cd-deploy-nodes-gcp.yml'
- '.github/workflows/sub-build-docker-image.yml'
- "docker/**"
- ".dockerignore"
- ".github/workflows/cd-deploy-nodes-gcp.yml"
- ".github/workflows/sub-build-docker-image.yml"

release:
types:
@@ -160,6 +160,17 @@ jobs:
disk_suffix: ${{ inputs.cached_disk_type || 'tip' }}
prefer_main_cached_state: ${{ inputs.prefer_main_cached_state || (github.event_name == 'push' && github.ref_name == 'main' && true) || false }}

# Test that Zebra works using $ZEBRA_CONF_PATH config
test-zebra-conf-path:
name: Test CD custom Docker config file
needs: build
uses: ./.github/workflows/sub-test-zebra-config.yml
with:
test_id: "custom-conf"
docker_image: ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
test_variables: '-e ZEBRA_CONF_PATH="zebrad/tests/common/configs/v2.1.0.toml"'
grep_patterns: '-e "loaded zebrad config.*config_path.*=.*v2.1.0.toml"'

# Each time this workflow is executed, a build will be triggered to create a new image
# with the corresponding tags using information from Git
#
@@ -174,6 +185,7 @@ jobs:
image_name: zebrad
no_cache: ${{ inputs.no_cache || false }}
rust_log: info
features: ${{ format('{0} {1}', vars.RUST_PROD_FEATURES, vars.RUST_TEST_FEATURES) }}
# This step needs access to Docker Hub secrets to run successfully
secrets: inherit

@@ -183,11 +195,9 @@ jobs:
needs: build
uses: ./.github/workflows/sub-test-zebra-config.yml
with:
test_id: 'default-conf'
test_id: "default-conf"
docker_image: ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
grep_patterns: '-e "net.*=.*Main.*estimated progress to chain tip.*BeforeOverwinter"'
test_variables: '-e NETWORK'
network: 'Mainnet'

# Test reconfiguring the docker image for testnet.
test-configuration-file-testnet:
@@ -196,23 +206,10 @@ jobs:
# Make sure Zebra can sync the genesis block on testnet
uses: ./.github/workflows/sub-test-zebra-config.yml
with:
test_id: 'testnet-conf'
test_id: "testnet-conf"
docker_image: ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
grep_patterns: '-e "net.*=.*Test.*estimated progress to chain tip.*Genesis" -e "net.*=.*Test.*estimated progress to chain tip.*BeforeOverwinter"'
test_variables: '-e NETWORK'
network: 'Testnet'

# Test that Zebra works using $ZEBRA_CONF_PATH config
test-zebra-conf-path:
name: Test CD custom Docker config file
needs: build
uses: ./.github/workflows/sub-test-zebra-config.yml
with:
test_id: 'custom-conf'
docker_image: ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
grep_patterns: '-e "loaded zebrad config.*config_path.*=.*v1.0.0-rc.2.toml"'
test_variables: '-e NETWORK -e ZEBRA_CONF_PATH="zebrad/tests/common/configs/v1.0.0-rc.2.toml"'
network: ${{ inputs.network || vars.ZCASH_NETWORK }}
test_variables: "-e NETWORK=Testnet"

# Deploy Managed Instance Groups (MiGs) for Mainnet and Testnet,
# with one node in the configured GCP region.
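
The config tests in this workflow pass `grep_patterns` (and optionally `test_variables`) to `sub-test-zebra-config.yml`. A rough shell sketch of how such a pattern check could behave follows. The `logs_match` helper and all sample log lines are hypothetical, and the actual sub-workflow may be implemented differently.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when stdin matches at least one of the
# `-e PATTERN` arguments, mirroring the shape of `grep_patterns`.
logs_match() {
    grep -q "$@"
}

# Made-up log lines standing in for container output.
custom_log='loaded zebrad config config_path="zebrad/tests/common/configs/v2.1.0.toml"'
testnet_log='net=Test estimated progress to chain tip network_upgrade=Genesis'
mainnet_log='net=Main estimated progress to chain tip network_upgrade=BeforeOverwinter'

# custom-conf: the log must mention the v2.1.0.toml config path.
if printf '%s\n' "$custom_log" | logs_match -e "loaded zebrad config.*config_path.*=.*v2.1.0.toml"; then
    custom_result="match"
else
    custom_result="no match"
fi

# testnet-conf: either of the two patterns is enough.
check_testnet() {
    logs_match \
        -e "net.*=.*Test.*estimated progress to chain tip.*Genesis" \
        -e "net.*=.*Test.*estimated progress to chain tip.*BeforeOverwinter"
}

if printf '%s\n' "$testnet_log" | check_testnet; then
    testnet_result="match"
else
    testnet_result="no match"
fi

# A Mainnet log line should not satisfy the Testnet patterns.
if printf '%s\n' "$mainnet_log" | check_testnet; then
    mainnet_result="match"
else
    mainnet_result="no match"
fi

echo "custom-conf: $custom_result, testnet-conf: $testnet_result, mainnet log vs testnet patterns: $mainnet_result"
```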
@@ -234,15 +231,22 @@ jobs:
matrix:
network: [Mainnet, Testnet]
name: Deploy ${{ matrix.network }} nodes
needs: [ build, versioning, test-configuration-file, test-zebra-conf-path, get-disk-name ]
needs:
[
build,
versioning,
test-configuration-file,
test-zebra-conf-path,
get-disk-name,
]
runs-on: ubuntu-latest
timeout-minutes: 60
env:
CACHED_DISK_NAME: ${{ needs.get-disk-name.outputs.cached_disk_name }}
environment: ${{ github.event_name == 'release' && 'prod' || 'dev' }}
permissions:
contents: 'read'
id-token: 'write'
contents: "read"
id-token: "write"
if: ${{ !cancelled() && !failure() && ((github.event_name == 'push' && github.ref_name == 'main') || github.event_name == 'release') }}

steps:
@@ -271,8 +275,8 @@ jobs:
id: auth
uses: google-github-actions/[email protected]
with:
workload_identity_provider: '${{ vars.GCP_WIF }}'
service_account: '${{ vars.GCP_DEPLOYMENTS_SA }}'
workload_identity_provider: "${{ vars.GCP_WIF }}"
service_account: "${{ vars.GCP_DEPLOYMENTS_SA }}"

- name: Set up Cloud SDK
uses: google-github-actions/[email protected]
@@ -301,11 +305,11 @@ jobs:
--image-family=cos-stable \
--network-interface=subnet=${{ vars.GCP_SUBNETWORK }} \
--create-disk="${DISK_PARAMS}" \
--container-mount-disk=mount-path='/var/cache/zebrad-cache',name=${DISK_NAME},mode=rw \
--container-mount-disk=mount-path='/home/zebra/.cache/zebra',name=${DISK_NAME},mode=rw \
--container-stdin \
--container-tty \
--container-image ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }} \
--container-env "NETWORK=${{ matrix.network }},LOG_FILE=${{ vars.CD_LOG_FILE }},LOG_COLOR=false,SENTRY_DSN=${{ vars.SENTRY_DSN }}" \
--container-env "NETWORK=${{ matrix.network }},LOG_FILE=${{ vars.CD_LOG_FILE }},SENTRY_DSN=${{ vars.SENTRY_DSN }}" \
--service-account ${{ vars.GCP_DEPLOYMENTS_SA }} \
--scopes cloud-platform \
--metadata google-logging-enabled=true,google-logging-use-fluentbit=true,google-monitoring-enabled=true \
@@ -349,14 +353,14 @@ jobs:
# Note: this instances are not automatically replaced or deleted
deploy-instance:
name: Deploy single ${{ inputs.network }} instance
needs: [ build, test-configuration-file, test-zebra-conf-path, get-disk-name ]
needs: [build, test-configuration-file, test-zebra-conf-path, get-disk-name]
runs-on: ubuntu-latest
timeout-minutes: 30
env:
CACHED_DISK_NAME: ${{ needs.get-disk-name.outputs.cached_disk_name }}
permissions:
contents: 'read'
id-token: 'write'
contents: "read"
id-token: "write"
# Run even if we don't need a cached disk, but only when triggered by a workflow_dispatch
if: ${{ !failure() && github.event_name == 'workflow_dispatch' }}

@@ -386,8 +390,8 @@ jobs:
id: auth
uses: google-github-actions/[email protected]
with:
workload_identity_provider: '${{ vars.GCP_WIF }}'
service_account: '${{ vars.GCP_DEPLOYMENTS_SA }}'
workload_identity_provider: "${{ vars.GCP_WIF }}"
service_account: "${{ vars.GCP_DEPLOYMENTS_SA }}"

- name: Set up Cloud SDK
uses: google-github-actions/[email protected]
@@ -413,11 +417,11 @@ jobs:
--image-family=cos-stable \
--network-interface=subnet=${{ vars.GCP_SUBNETWORK }} \
--create-disk="${DISK_PARAMS}" \
--container-mount-disk=mount-path='/var/cache/zebrad-cache',name=${DISK_NAME},mode=rw \
--container-mount-disk=mount-path='/home/zebra/.cache/zebra',name=${DISK_NAME},mode=rw \
--container-stdin \
--container-tty \
--container-image ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }} \
--container-env "NETWORK=${{ inputs.network }},LOG_FILE=${{ inputs.log_file }},LOG_COLOR=false,SENTRY_DSN=${{ vars.SENTRY_DSN }}" \
--container-env "NETWORK=${{ inputs.network }},LOG_FILE=${{ inputs.log_file }},SENTRY_DSN=${{ vars.SENTRY_DSN }}" \
--service-account ${{ vars.GCP_DEPLOYMENTS_SA }} \
--scopes cloud-platform \
--metadata google-logging-enabled=true,google-monitoring-enabled=true \
@@ -428,7 +432,7 @@ jobs:
failure-issue:
name: Open or update issues for release failures
# When a new job is added to this workflow, add it to this list.
needs: [ versioning, build, deploy-nodes, deploy-instance ]
needs: [versioning, build, deploy-nodes, deploy-instance]
# Only open tickets for failed or cancelled jobs that are not coming from PRs.
# (PR statuses are already reported in the PR jobs list, and checked by GitHub's Merge Queue.)
if: (failure() && github.event.pull_request == null) || (cancelled() && github.event.pull_request == null)
1 change: 1 addition & 0 deletions .github/workflows/ci-tests.yml
@@ -133,6 +133,7 @@ jobs:
rust_backtrace: full
rust_lib_backtrace: full
rust_log: info
features: ${{ format('{0} {1}', vars.RUST_PROD_FEATURES, vars.RUST_TEST_FEATURES) }}
# This step needs access to Docker Hub secrets to run successfully
secrets: inherit

24 changes: 8 additions & 16 deletions .github/workflows/sub-build-docker-image.yml
@@ -32,14 +32,9 @@ on:
rust_log:
required: false
type: string
# defaults to: vars.RUST_PROD_FEATURES
features:
required: false
type: string
# defaults to: vars.RUST_TEST_FEATURES (and entrypoint.sh adds vars.RUST_PROD_FEATURES)
test_features:
required: false
type: string
latest_tag:
required: false
type: boolean
@@ -48,20 +43,18 @@ on:
required: false
type: string
no_cache:
description: 'Disable the Docker cache for this build'
description: "Disable the Docker cache for this build"
required: false
type: boolean
default: false

outputs:
image_digest:
description: 'The image digest to be used on a caller workflow'
description: "The image digest to be used on a caller workflow"
value: ${{ jobs.build.outputs.image_digest }}


env:
FEATURES: ${{ inputs.features || vars.RUST_PROD_FEATURES }}
TEST_FEATURES: ${{ inputs.test_features || vars.RUST_TEST_FEATURES }}
FEATURES: ${{ inputs.features }}
RUST_LOG: ${{ inputs.rust_log || vars.RUST_LOG }}
CARGO_INCREMENTAL: ${{ vars.CARGO_INCREMENTAL }}

@@ -75,8 +68,8 @@ jobs:
image_digest: ${{ steps.docker_build.outputs.digest }}
image_name: ${{ fromJSON(steps.docker_build.outputs.metadata)['image.name'] }}
permissions:
contents: 'read'
id-token: 'write'
contents: "read"
id-token: "write"
pull-requests: write # for `docker-scout` to be able to write the comment
env:
DOCKER_BUILD_SUMMARY: ${{ vars.DOCKER_BUILD_SUMMARY }}
@@ -129,9 +122,9 @@ jobs:
id: auth
uses: google-github-actions/[email protected]
with:
workload_identity_provider: '${{ vars.GCP_WIF }}'
service_account: '${{ vars.GCP_ARTIFACTS_SA }}'
token_format: 'access_token'
workload_identity_provider: "${{ vars.GCP_WIF }}"
service_account: "${{ vars.GCP_ARTIFACTS_SA }}"
token_format: "access_token"
# Some builds might take over an hour, and Google's default lifetime duration for
# an access token is 1 hour (3600s). We increase this to 3 hours (10800s)
# as some builds take over an hour.
@@ -173,7 +166,6 @@ jobs:
SHORT_SHA=${{ env.GITHUB_SHA_SHORT }}
RUST_LOG=${{ env.RUST_LOG }}
FEATURES=${{ env.FEATURES }}
TEST_FEATURES=${{ env.TEST_FEATURES }}
push: true
# It's recommended to build images with max-level provenance attestations
# https://docs.docker.com/build/ci/github-actions/attestations/
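
With `test_features` removed from this workflow, callers combine both feature lists into the single `features` input, as the `format('{0} {1}', ...)` expressions in the calling workflows do. A shell equivalent of that join might look like the sketch below. The feature names are placeholders, not the actual values of `vars.RUST_PROD_FEATURES` or `vars.RUST_TEST_FEATURES`, and the `join_features` helper is hypothetical.

```shell
#!/bin/sh
# Placeholder feature names; the real values live in repository variables.
RUST_PROD_FEATURES="prod-feature-a prod-feature-b"
RUST_TEST_FEATURES="test-feature-a"

# Hypothetical helper mirroring format('{0} {1}', prod, test): join two
# space-separated lists with a single space.
join_features() {
    printf '%s %s' "$1" "$2"
}

FEATURES="$(join_features "$RUST_PROD_FEATURES" "$RUST_TEST_FEATURES")"

# Cargo accepts a space- or comma-separated list for --features.
# The command is only echoed here to show how a build step could use the value.
echo "cargo build --features \"$FEATURES\""
```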