Merge pull request #16 from mlibrary/dwi-19-move-to-pickup
DWI-19 move to pickup
niquerio authored Nov 8, 2024
2 parents 0106f0c + cfb2275 commit 5efb04f
Showing 20 changed files with 434 additions and 125 deletions.
16 changes: 16 additions & 0 deletions .config/rclone/rclone.conf.example
@@ -0,0 +1,16 @@
[digifeeds_gdrive]
type = drive
client_id = YOUR_CLIENT_ID
scope = drive
service_account_file = /conf/digifeeds_gdrive_credentials.json
root_folder_id = YOUR_ROOT_FOLDER_ID

[digifeeds_s3]
type = s3
provider = AWS
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_ACCESS_KEY

[digifeeds_bucket]
type = alias
remote = digifeeds_s3:YOUR_BUCKET_NAME
4 changes: 3 additions & 1 deletion .gitignore
@@ -16,4 +16,6 @@ htmlcov/
requirements.txt

docs/_build
bin/digifeeds/*.config
.config/rclone/rclone.conf
.config/rclone/*.json
37 changes: 19 additions & 18 deletions Dockerfile
@@ -29,6 +29,7 @@ RUN apt-get update -yqq && apt-get install -yqq --no-install-recommends \
build-essential \
pkg-config \
default-mysql-client \
rclone \
vim-tiny

# Set the working directory to /app
@@ -46,17 +47,17 @@ RUN pip install poetry==${POETRY_VERSION}
# Use this page as a reference for python and poetry environment variables: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUNBUFFERED
# Ensure the stdout and stderr streams are sent straight to the terminal so you can see the output of your application
ENV PYTHONUNBUFFERED=1 \
    # Avoid the generation of .pyc files during package install
    # Disable pip's cache to reduce the size of the image
    PIP_NO_CACHE_DIR=off \
    # Skip pip's version check to save time at runtime
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    # Disable poetry interaction
    POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_CREATE=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_CACHE_DIR=/tmp/poetry_cache

FROM poetry AS build
# Just copy the files needed to install the dependencies
@@ -68,13 +69,13 @@ RUN poetry export --without dev -f requirements.txt --output requirements.txt
# We want poetry available in development
FROM poetry AS development
RUN apt-get update -yqq && apt-get install -yqq --no-install-recommends \
    git \
    bats \
    bats-assert \
    bats-file \
    wget \
    zip \
    unzip

RUN wget -P /opt/ https://github.com/boschresearch/shellmock/releases/download/0.9.1/shellmock.bash && \
chown ${UID}:${GID} /opt/shellmock.bash
99 changes: 60 additions & 39 deletions README.md
@@ -1,40 +1,48 @@
# AIM-py

AIM's python code repository

## Setup

1. Clone the repo

```bash
git clone https://github.com/mlibrary/aim-py.git
cd aim-py
```

2. In the terminal, run the `init.sh` script

```bash
./init.sh
```

This will:

* set up the initial environment variables file
* set up the rclone config with the example file
* build the docker image
* install the python dependencies
* set up the database for digifeeds

`./init.sh` can be run more than once.

3. Edit `.env` with actual environment variables

4. Edit `.config/rclone/rclone.conf` with your actual values

5. If using VSCode for editing, the repository is set up for use with dev containers. You will have to rebuild the container in there.

6. In the app container, use `poetry shell` to enable the virtual environment. Otherwise use:

```bash
docker compose run --rm app poetry run YOUR_COMMAND
```

## Structure

The codebase has the following high-level structure:

```text
.
├── aim/
│   ├── cli/
@@ -61,101 +69,114 @@
└── conftest.py
```

`aim` is the directory where all of the business logic lives. Every directory and subdirectory within `aim` has an `__init__.py` file so that python imports work properly.

In this example there is an application/product/project called `my_project`. `my_project` has several subdirectories and files with related code.

One is `aim/my_project`. That's where the application code lives. Code within `my_project` can be arranged however makes sense for the project. Further subdirectories within `my_project` are fine if they help make the project code easier to think about and work with.

If the application is a subcommand for the AIM CLI, then code related to the CLI goes in `cli/my_project`.

`aim/services.py` has configuration objects. It provides an object called `S` that has things like environment variable values or database connection strings. Configuration for all projects goes in here. Use `my_project_` as a prefix if there are concerns about name collisions. `S` is a `NamedTuple` so that these objects show up in code completion in IDEs like VSCode.
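
As an illustration, here is a minimal sketch of that pattern; the field names and environment variables are hypothetical, not the real configuration in `aim/services.py`:

```python
# Hypothetical sketch of the S configuration object pattern.
# Field names and environment variables are illustrative only.
import os
from typing import NamedTuple


class Services(NamedTuple):
    # Values are read from the environment once, at import time.
    my_project_db_url: str
    my_project_api_key: str


S = Services(
    my_project_db_url=os.environ.get("MY_PROJECT_DB_URL", ""),
    my_project_api_key=os.environ.get("MY_PROJECT_API_KEY", ""),
)

print(S.my_project_db_url)  # NamedTuple fields autocomplete in IDEs like VSCode.
```
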
Tests go in `tests/my_project`. Ideally the folder and file structure in `tests/my_project` should mirror `aim/my_project`. This way it's easy to figure out where relevant tests should live. Prefix test files with `test_` so that `pytest` picks up the tests. If there are fixture files for your tests, put them in `fixtures/my_project`. This should make it easier to tell what tests the fixtures are for. As with the code in `aim`, every folder in `tests` except for `fixtures` needs to have an `__init__.py` file.
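
For example, a hypothetical `tests/my_project/test_example.py` would be collected by `pytest` because both the file name and the function name carry the `test_` prefix:

```python
# tests/my_project/test_example.py — hypothetical file showing the naming convention.
def test_truth():
    assert (1 + 1) == 2
```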

`tests/conftest.py` has test configuration that's available for all tests for all projects. For now it has code to handle setting up the `digifeeds` database and its API for tests.

## Projects
### Digifeeds
Digifeeds code is in the `aim/digifeeds` folder. The `database` folder has the code for the database and its web API.

#### Database
To run database migrations:
```bash
cd aim/digifeeds/database
alembic upgrade heads
```
The alembic migrations live in the `aim/digifeeds/database/migrations` folder.
#### Web API for the Database
The docker compose `api` service runs the application on port 8000.
Assuming docker compose is up for the `aim-py` repository, in the browser go to:
<http://localhost:8000/docs> to work with the API.
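
As a quick, hypothetical smoke check that the service is up (FastAPI serves its schema at `/openapi.json` by default):

```python
# Hypothetical smoke check against the local api service.
import requests

resp = requests.get("http://localhost:8000/openapi.json")
resp.raise_for_status()
print(sorted(resp.json()["paths"]))  # Lists the routes the API exposes.
```
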
#### CLI
The digifeeds CLI is in the file `aim/cli/digifeeds.py`. It has a mix of database operations and application operations.

To use the CLI, on the command line run:
```bash
docker compose run --rm app poetry run aim digifeeds --help
```
This will show the commands available for the digifeeds CLI application.

## Tests
To run tests:
```bash
docker compose run --rm app poetry run pytest
```
### Connecting to the internet is blocked for tests
We are using `pytest-socket` to block actual HTTP requests in tests.
To mock http requests, use the `responses` library. Don't forget to put the `@responses.activate` decorator above tests that use `responses`.

Blocking requests occurs because in `pyproject.toml` we've set `pytest` to run with the `--disable-socket` option. The `--allow-unix-socket` option allows connection to our test database.
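
A hypothetical test using the `responses` pattern described above; the decorator intercepts the HTTP call so no real socket connection is made:

```python
# Hypothetical test showing the responses pattern; the URL is illustrative.
import requests
import responses


@responses.activate
def test_widget_fetch():
    responses.add(responses.GET, "https://example.com/widget", json={"id": 1}, status=200)
    resp = requests.get("https://example.com/widget")
    assert resp.json() == {"id": 1}
```
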
### Mocking objects
`pytest-mock` is included in the project, so the `mocker` fixture is available in all tests.
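
For instance, a hypothetical test using the `mocker` fixture; the patch target here is illustrative:

```python
# Hypothetical use of pytest-mock's mocker fixture; json.dumps is patched
# only to demonstrate the mechanics.
import json


def test_dumps_is_patched(mocker):
    fake_dumps = mocker.patch("json.dumps", return_value="{}")
    assert json.dumps({"a": 1}) == "{}"
    fake_dumps.assert_called_once()
```
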
### Test Coverage
`pytest-cov` is used for test coverage information. On every run of `pytest` there's a summary of coverage in the terminal, and an html report in the folder `htmlcov`. This is configured with the following `pytest` options in `pyproject.toml`: `--cov=aim --cov-report=html --cov-report=term:skip-covered`

### Using the `digifeeds` database
`tests/conftest.py` sets up a couple of `pytest` fixtures for working with the `digifeeds` database.

One is `db_session`, which provides a `sqlalchemy` database session object. You can commit changes in the session and they will only last for the duration of the tests.

The other is `client`, which provides a `fastapi` `TestClient` that knows about the `db_session` fixture.
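
A hypothetical test using these fixtures; the route is illustrative, not the real digifeeds API:

```python
# Hypothetical tests using the fixtures from tests/conftest.py.
def test_changes_only_last_for_the_test(db_session):
    # Anything committed through db_session is discarded after the test.
    assert db_session is not None


def test_api_route_responds(client):
    resp = client.get("/items")  # hypothetical route
    assert resp.status_code == 200
```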

### CLI tests
The `typer` `CliRunner` works without special modification. This is a good place to put in some integration tests since this is the entrypoint for using the application. That said, it's ok to mock out things like database calls.
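
A hypothetical CLI test with `CliRunner`; it assumes the Typer `app` object is importable from `aim/cli/digifeeds.py` (that module defines `app`, per the diff below), though the exact invocation is illustrative:

```python
# Hypothetical CLI test using typer's CliRunner.
from typer.testing import CliRunner

from aim.cli.digifeeds import app  # assumed import path

runner = CliRunner()


def test_help_exits_cleanly():
    result = runner.invoke(app, ["--help"])
    assert result.exit_code == 0
```
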
## Documentation
Documentation lives in the `/docs` directory.

[Sphinx](https://www.sphinx-doc.org) is used to generate the documentation website.

The [documentation site](https://mlibrary.github.io/aim-py/) is built with a GitHub Action on each push to main.
We are using [Google style docstrings](https://google.github.io/styleguide/pyguide.html#s3.8-comments-and-docstrings).
In development the documentation should build automatically and be available at <http://localhost:8888/>

## Deployment
### Production Docker image
The production Docker image of `aim-py` uses `poetry` to generate a `requirements.txt` file of the dependencies necessary to run the application in production. In a separate step that `requirements.txt` file is copied into the container and then installed with `pip`.

That means the project is used differently in production than in development. In development you need to run `poetry shell` to enable the virtual environment. If you have the virtual environment activated you can run commands like `aim --help` because `pyproject.toml` knows about the `aim` CLI.

In production, you do not need to enable a virtual environment because all of the dependencies are installed globally in the image. To run the CLI you need to run `python -m aim --help` to get the same help menu.

### GitHub Actions Workflows
21 changes: 21 additions & 0 deletions aim/cli/digifeeds.py
@@ -7,6 +7,7 @@
from aim.digifeeds.add_to_db import add_to_db as add_to_digifeeds_db
from aim.digifeeds.list_barcodes_in_bucket import list_barcodes_in_bucket
from aim.digifeeds.check_zephir import check_zephir as check_zephir_for_barcode
from aim.digifeeds.move_to_pickup import move_to_pickup as move_volume_to_pickup
from aim.digifeeds.database import models, main
import json
import sys
@@ -78,3 +79,23 @@ def list_barcodes_in_input_bucket():
List the barcodes currently in the input directory in the S3 bucket.
"""
json.dump(list_barcodes_in_bucket(), sys.stdout)


@app.command()
def move_to_pickup(
barcode: Annotated[
str,
typer.Argument(help="The barcode of the volume to move to pickup"),
],
):
"""
Moves the zipped volume from the S3 bucket to the Google Drive folder for
pickup by Google. When it's finished, the volume is moved to the processed
folder in the bucket and prefixed with the date and time.
"""
print(f'Moving barcode "{barcode}" from the S3 bucket to Google Drive')
item = move_volume_to_pickup(barcode)
if item is None:
print("Item has not been in zephir long enough")
else:
print("Item has been successfully moved to pickup")
