We use Airflow to implement our ETL pipelines.
There are several tools available to create a virtual environment in Python. Below are the steps to manage one using `venv`:
- Create a Virtual Environment

  To create a virtual environment, run the following command:

  ```bash
  python -m venv venv
  ```

  In this example, `venv` is the name of the virtual environment directory, but you can replace it with any name you prefer.

- Activate the Virtual Environment

  After creating the virtual environment, activate it using the following command:

  ```bash
  source venv/bin/activate
  ```
- Install Dependencies

  After activating the virtual environment, you can install the required dependencies:

  ```bash
  # Install Airflow and dev dependencies
  pip install -r requirements.txt -r requirements-dev.txt -c constraints-3.8.txt

  # black conflicts with click, so install them separately
  pip install black==19.10b0 click==7.1.2
  ```
- Deactivate the Virtual Environment

  When you're done working in the virtual environment, you can deactivate it with:

  ```bash
  deactivate
  ```
- For development or testing, run `cp .env.template .env.staging`. For production, run `cp .env.template .env.production`.

- Follow the instructions in `.env.<staging|production>` and fill in your secrets. If you are running the staging instance for development as a sandbox and do not need to access any specific third-party services, leaving `.env.staging` as-is should be fine. Contact the maintainer if you don't have these secrets. (A quick way to check your setup appears after the warning below.)
> ⚠ WARNING: About `.env`
> Please do not use the `.env` file for local development, as it might affect the production tables.
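As a quick sanity check (not an official setup step), you can confirm that the virtual environment is active and preview which variable names `.env.staging` defines; this assumes the default `venv` directory name used above:

```bash
# The active interpreter should live inside the virtual environment
which python   # expected: a path ending in venv/bin/python

# List variable names defined in .env.staging without printing secrets
grep -vE '^(#|$)' .env.staging | cut -d '=' -f 1
```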
Set up Authentication for GCP: https://googleapis.dev/python/google-api-core/latest/auth.html

- After running `gcloud auth application-default login`, a credentials file will be created at `$HOME/.config/gcloud/application_default_credentials.json`. Run `export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"` if you have it.
- `service-account.json`: Please contact @david30907d via email or Discord. You do not need this JSON file if you are running the sandbox staging instance for development.
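As an optional check (assuming the gcloud CLI is installed), you can verify that Application Default Credentials resolve before running any pipelines:

```bash
# Succeeds quietly if ADC are set up; prints a hint otherwise
gcloud auth application-default print-access-token > /dev/null \
  && echo "ADC configured" \
  || echo "ADC missing: run 'gcloud auth application-default login'"
```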
If you are a developer 👨‍💻, please check the Contributing Guide.
If you are a maintainer 👨‍🔧, please check the Maintenance Guide.
For development/testing:

```bash
# Build the local dev/test image
make build-dev

# Start dev/test services
make deploy-dev

# Stop dev/test services
make down-dev
```
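To confirm the services came up, a couple of standard Docker Compose commands are handy (this assumes the make targets wrap Docker Compose, as the compose files mentioned below suggest):

```bash
# Show the running services and their state
docker compose ps

# Tail recent logs from all services while debugging
docker compose logs -f --tail=100
```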
The difference between production and dev/test compose files is that the dev/test compose file uses a locally built image, while the production compose file uses the image from Docker Hub.
If you are an authorized maintainer, you can pull the image from the GCP Artifact Registry. Your Docker client must first be configured to authenticate with the registry:

```bash
gcloud auth configure-docker asia-east1-docker.pkg.dev
```
Then, pull the image:

```bash
docker pull asia-east1-docker.pkg.dev/pycontw-225217/data-team/pycon-etl:{tag}
```
There are several tags available:

- `cache`: caches the image for faster deployment
- `test`: for testing purposes, including the test dependencies
- `staging`: pushed to the staging environment
- `latest`: pushed to the production environment
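For example, to pull the image pushed to the staging environment, substitute a tag from the list above:

```bash
docker pull asia-east1-docker.pkg.dev/pycontw-225217/data-team/pycon-etl:staging
```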
For production deployment, please check the Production Deployment Guide.