Please note: this is (and will most likely remain) a work in progress.
This is a Docker image based on Ubuntu 22.04 (Jammy) and intended for troubleshooting pods and container deployments at Soda.
Two tags will be available through Docker Hub:
- sodadata/soda-debug:slim
- sodadata/soda-debug:full
The latest tag should point to the slim image. Note that even the slim image is considerable in size (~2 GB).
Builds are created for amd64 only.
Manual build:
docker buildx build --platform=linux/amd64 --no-cache \
-f ./Dockerfile -t sodadata/soda-debug:slim .
Or for additional AWS, Azure and Google Cloud CLI tools:
docker buildx build --platform=linux/amd64 --build-arg BUILD_TYPE=FULL --no-cache \
-f ./Dockerfile -t sodadata/soda-debug:full .
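Both variants can be built from one small helper. The following is a minimal sketch and not part of the repository: the function name build_soda_debug is hypothetical, and it assumes the Dockerfile produces the slim variant when BUILD_TYPE is not set, as the two commands above suggest.

```shell
# Hypothetical helper: build one variant of the image ("slim" or "full").
build_soda_debug() {
  variant="${1:-slim}"
  case "$variant" in
    slim)
      # No build argument: assumed to give the slim image.
      docker buildx build --platform=linux/amd64 --no-cache \
        -f ./Dockerfile -t sodadata/soda-debug:slim .
      ;;
    full)
      # BUILD_TYPE=FULL adds the AWS, Azure and Google Cloud CLI tools.
      docker buildx build --platform=linux/amd64 --build-arg BUILD_TYPE=FULL --no-cache \
        -f ./Dockerfile -t sodadata/soda-debug:full .
      ;;
    *)
      echo "unknown variant: $variant (expected slim or full)" >&2
      return 2
      ;;
  esac
}

# Usage:
#   build_soda_debug slim
#   build_soda_debug full
```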
Some examples:
Start the pod:
> kubectl run soda-debug -it --image=sodadata/soda-debug -n soda-agent -- bash
Please note, when running on EKS Fargate:
- add this flag to the above command:
-l eks.amazonaws.com/fargate-profile=soda-agent-profile
- it can take a few minutes before the pod gets scheduled and created, so an error like "error: timed out waiting for the condition" may simply mean the pod needs a few more minutes to become available.
- since the first command timed out, once the pod is running you will have to run
kubectl exec -it pod/soda-debug -n soda-agent -- bash
to get a shell
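Rather than retrying by hand after the timeout, you can let kubectl block until the pod reports Ready and then open the shell. This is a minimal sketch: the function name soda_debug_shell and the 300-second default are assumptions, so adjust the timeout to how long scheduling usually takes in your cluster.

```shell
# Hypothetical helper: wait for the soda-debug pod, then open a shell in it.
soda_debug_shell() {
  wait_timeout="${1:-300s}"   # assumed default; Fargate scheduling can be slow
  # kubectl wait returns non-zero if the pod is not Ready in time,
  # in which case the exec is skipped.
  kubectl wait --for=condition=Ready pod/soda-debug -n soda-agent --timeout="$wait_timeout" \
    && kubectl exec -it pod/soda-debug -n soda-agent -- bash
}

# Usage:
#   soda_debug_shell        # wait up to 300 seconds
#   soda_debug_shell 600s   # wait up to 10 minutes
```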
Now you can run commands like:
Ping an address and see if it resolves
> root@soda-debug:/app# ping some-service
PING some-service.soda-agent.svc.cluster.local (1.2.3.4) 56(84) bytes of data.
See if Soda Cloud is accessible
> root@soda-debug:/app# curl https://cloud.soda.io
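For a quicker yes/no answer, the same check can print only the HTTP status code. This is a minimal sketch using standard curl options; the function name check_url and the 10-second limit are assumptions made for this example.

```shell
# Hypothetical helper: print the HTTP status code for a URL, fail if unreachable.
check_url() {
  url="${1:-https://cloud.soda.io}"
  # -s: no progress bar, -S: still show errors, -o /dev/null: discard the body,
  # -w: print only the status code, --max-time: give up after 10 seconds.
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url") || return 1
  echo "$code"
}

# Usage (inside the pod):
#   check_url https://cloud.soda.io
```

A non-zero return value means curl could not complete the request at all (for example DNS failure or a timeout), which is usually the case worth investigating first.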
Execute a Soda scan
Execute a Soda scan against a warehouse (you need a configuration for the warehouse and checks to do that, see ...), for example:
> root@soda-debug:/app# soda
Usage: soda [OPTIONS] COMMAND [ARGS]...
Soda Core CLI version 3.0.7
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
ingest Ingest test information from different tools
scan runs a scan
update-dro updates a DRO in the distribution reference file
Copy a local file to the running pod
From your machine, you can copy your config file onto the running container (pod) and execute an actual scan.
kubectl cp ~/yourlocation/scandefinition.zip soda-debug:/app -n soda-agent
Now you can run the scan from the Bash session in the running soda-debug pod, for example:
> root@soda-debug:/app# unzip scandefinition.zip
> root@soda-debug:/app# soda scan -d warehouse_name -c configuration.yml checks.yml
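If you do not need an interactive session, the copy, unzip and scan steps can be chained from your machine. This is a minimal sketch under the same assumptions as the commands above (a pod named soda-debug in the soda-agent namespace, and an archive containing configuration.yml and checks.yml); the function name soda_debug_scan is hypothetical.

```shell
# Hypothetical helper: copy a scan definition into the pod and run the scan there.
soda_debug_scan() {
  archive="$1"       # local path to the zip file
  data_source="$2"   # data source name, as passed to `soda scan -d`
  kubectl cp "$archive" soda-debug:/app -n soda-agent || return 1
  # The file name and data source are passed as positional arguments ($1, $2)
  # to the remote shell; unzip -o overwrites files left by an earlier run.
  kubectl exec soda-debug -n soda-agent -- \
    sh -c 'cd /app && unzip -o "$1" && soda scan -d "$2" -c configuration.yml checks.yml' \
    sh "$(basename "$archive")" "$data_source"
}

# Usage:
#   soda_debug_scan ~/yourlocation/scandefinition.zip warehouse_name
```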
Soda packages
Check which Soda packages are installed on the debug image (check if your warehouse is supported):
> root@soda-debug:/app# pip list | grep -i soda
soda-core 3.0.7
soda-core-athena 3.0.7
soda-core-bigquery 3.0.7
soda-core-db2 3.0.7
soda-core-mysql 3.0.7
soda-core-postgres 3.0.7
soda-core-redshift 3.0.7
soda-core-scientific 3.0.7
soda-core-snowflake 3.0.7
soda-core-sqlserver 3.0.7
soda-core-trino 3.0.7
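To check for a single connector without scanning the list by eye, pip show can be used, since it exits with a non-zero status when a package is not installed. This is a minimal sketch to run inside the pod; the function name has_soda_package is hypothetical, and it assumes connector packages follow the soda-core-<name> pattern seen in the list above.

```shell
# Hypothetical helper: succeed if the given Soda connector package is installed.
has_soda_package() {
  # Output is discarded; only the exit status matters.
  pip show -q "soda-core-$1" >/dev/null 2>&1
}

# Usage:
#   has_soda_package postgres && echo "postgres is supported"
```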
Delete the running pod
> kubectl delete pod soda-debug -n soda-agent
Instead of launching a pod and running commands in a shell or against the running pod, you can also issue one-off commands (the --rm flag makes Kubernetes remove the pod once the command finishes).
Please note, when running on EKS Fargate, add this flag to each of the commands below (before the -- separator): -l eks.amazonaws.com/fargate-profile=soda-agent-profile
Examples:
> kubectl run soda-debug -it --rm --image=sodadata/soda-debug \
-n soda-agent -- pip list | grep -i soda
> kubectl run soda-debug -it --rm --image=sodadata/soda-debug \
-n soda-agent -- curl https://cloud.soda.io
> kubectl run soda-debug -it --rm --image=sodadata/soda-debug \
-n soda-agent -- ping some-service
(you need to hit CTRL-C for that last one to show a result)
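The repeated kubectl run arguments can be wrapped in a small function, and ping -c 3 avoids the CTRL-C step because ping then stops by itself after three packets. This is a minimal sketch: the function name soda_debug_once and the FARGATE_PROFILE variable are hypothetical, introduced only for this example.

```shell
# Hypothetical helper: run a one-off command in a temporary soda-debug pod.
# Set FARGATE_PROFILE (a variable made up for this sketch) when running on
# EKS Fargate; the label must come before the "--" separator.
soda_debug_once() {
  if [ -n "${FARGATE_PROFILE:-}" ]; then
    kubectl run soda-debug -it --rm --image=sodadata/soda-debug -n soda-agent \
      -l "eks.amazonaws.com/fargate-profile=${FARGATE_PROFILE}" -- "$@"
  else
    kubectl run soda-debug -it --rm --image=sodadata/soda-debug -n soda-agent -- "$@"
  fi
}

# Usage:
#   soda_debug_once ping -c 3 some-service
#   FARGATE_PROFILE=soda-agent-profile soda_debug_once curl https://cloud.soda.io
```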
To use the AWS, Azure, or Google Cloud CLI tools, you need the full image:
sodadata/soda-debug:full
For AWS, run
aws configure
See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html
For Azure, run
read -rsp "Azure password: " AZ_PASS && echo && az login -u <username> -p "$AZ_PASS"
See https://docs.microsoft.com/en-us/cli/azure/authenticate-azure-cli for alternatives.
For Google Cloud, run
gcloud init
See https://cloud.google.com/sdk/gcloud/reference/auth/login
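After logging in, it can help to confirm which identity each CLI is actually using before debugging permissions. This is a minimal sketch built on the standard identity commands of the three CLIs; the function name cloud_whoami is hypothetical.

```shell
# Hypothetical helper: show which identity a cloud CLI is currently using.
cloud_whoami() {
  case "$1" in
    aws)    aws sts get-caller-identity ;;   # account, user ID and ARN
    azure)  az account show ;;               # active subscription and user
    gcloud) gcloud auth list ;;              # credentialed accounts
    *)
      echo "usage: cloud_whoami aws|azure|gcloud" >&2
      return 2
      ;;
  esac
}

# Usage (inside a pod started from sodadata/soda-debug:full):
#   cloud_whoami aws
```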
- TODO: create a pipeline to build and push to Docker Hub