DOROP-19 "mockyrie" - sqlalchemy and Data Mapper patterns #5

Open: wants to merge 5 commits into `main`
88 changes: 88 additions & 0 deletions experiments/DOROP-19/Dockerfile
@@ -0,0 +1,88 @@
# PYTHON image
# Use the official Docker Python image because:
# - it has the latest bugfix release of Python
# - it has the latest system packages
# - it's based on Debian Bookworm (Debian 12), released June 2023
# The initial image size is 51MB; the final image is 156MB

# I don't recommend an Alpine image: it uses musl libc instead of glibc, so many prebuilt
# (manylinux) wheels can't be used and packages like Pandas and NumPy may need to be compiled from source.

# The base layer will contain the dependencies shared by the other layers
FROM python:3.11-slim-bookworm AS base

# Allow these build arguments to be passed in, e.g. .env > compose.yml > Dockerfile
ARG POETRY_VERSION=1.8.4
ARG UID=1000
ARG GID=1000

ENV PYTHONPATH="/app"

# Create the non-root user in the base stage so that every stage built FROM base has it
# Create a group named "app" with the given GID (-o allows a non-unique GID)
RUN groupadd -g ${GID} -o app
RUN useradd -m -d /app -u ${UID} -g ${GID} -o -s /bin/bash app

RUN apt-get update -yqq && apt-get install -yqq --no-install-recommends \
    python3-dev \
    build-essential \
    pkg-config \
    vim-tiny

# Set the working directory to /app
WORKDIR /app

CMD ["tail", "-f", "/dev/null"]

# Both build and development need poetry, so it is its own step.
FROM base AS poetry

RUN pip install poetry==${POETRY_VERSION}

# Use this page as a reference for python and poetry environment variables: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUNBUFFERED
# Ensure the stdout and stderr streams are sent straight to terminal, then you can see the output of your application
ENV PYTHONUNBUFFERED=1 \
    # Avoid the generation of .pyc files (e.g. during package install)
    PYTHONDONTWRITEBYTECODE=1 \
    # Disable pip's cache to reduce the size of the image
    PIP_NO_CACHE_DIR=1 \
    # Skip pip's check for a newer pip version to save time
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    # Network timeout for pip, in seconds
    PIP_DEFAULT_TIMEOUT=100 \
    # Disable poetry's interactive prompts
    POETRY_NO_INTERACTION=1 \
    # Create the virtualenv inside the project directory
    POETRY_VIRTUALENVS_CREATE=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_CACHE_DIR=/tmp/poetry_cache

FROM poetry AS build
# Just copy the files needed to install the dependencies
COPY pyproject.toml poetry.lock README.md ./

# Use poetry to create a requirements.txt file, excluding development dependencies
RUN poetry export --without dev -f requirements.txt --output requirements.txt

# Development keeps poetry installed
FROM poetry AS development
RUN apt-get update -yqq && apt-get install -yqq --no-install-recommends \
    git libpq-dev

# Switch to the non-root user "app"
USER app

# RUN poetry install

# We don't want poetry in production, so we copy the needed files from the build stage
FROM base AS production
# Switch to the non-root user "app" at the end of this stage, after installing dependencies
# RUN mkdir -p /venv && chown ${UID}:${GID} /venv

RUN ls -l /app

COPY --chown=${UID}:${GID} . /app
COPY --chown=${UID}:${GID} --from=build "/app/requirements.txt" /app/requirements.txt

RUN pip install -r /app/requirements.txt

USER app
160 changes: 160 additions & 0 deletions experiments/DOROP-19/README.md
@@ -0,0 +1,160 @@
# DOROP-19 - experimenting with database persistence

**What is this?** A tiny reimplementation of [Valkyrie](https://github.com/samvera/valkyrie/wiki/Dive-into-Valkyrie) built atop [SQLAlchemy](https://www.sqlalchemy.org/) and PostgreSQL [JSONB](https://www.postgresql.org/docs/current/datatype-json.html).

**Why?** This is a demonstration of the Data Mapper pattern. Each type of digital object is represented by a [dataclass](https://docs.python.org/3/library/dataclasses.html) and is synchronized to the database via a metadata adapter.
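
A minimal, stdlib-only sketch of that split, assuming SQLite with JSON stored in a `TEXT` column as a stand-in for PostgreSQL JSONB; `MonographMapper` and this `Monograph` are illustrative, not the PR's classes. The dataclass holds no database code; only the mapper knows about tables and SQL.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Monograph:
    # plain domain object: no persistence code here
    title: str
    lang: str = "en"
    id: Optional[int] = None


class MonographMapper:
    """Illustrative mapper: moves Monographs in and out of a JSON column."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS resource ("
            "id INTEGER PRIMARY KEY, data TEXT NOT NULL)"
        )

    def save(self, monograph: Monograph) -> Monograph:
        # everything except the primary key goes into the JSON column
        data = json.dumps({k: v for k, v in asdict(monograph).items() if k != "id"})
        if monograph.id is None:
            cur = self.conn.execute("INSERT INTO resource (data) VALUES (?)", (data,))
            monograph.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE resource SET data = ? WHERE id = ?", (data, monograph.id)
            )
        return monograph

    def find_by(self, id: int) -> Monograph:
        row = self.conn.execute(
            "SELECT id, data FROM resource WHERE id = ?", (id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no resource with id {id}")
        return Monograph(id=row[0], **json.loads(row[1]))
```

Moving from SQLite to PostgreSQL would change `MonographMapper`, not `Monograph`, which is the point of the pattern.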

*This is not a recommendation for how DOR should be implemented* although I'll confess it just works for me.

This experiment follows Valkyrie naming conventions (more or less), although that leads to some confusion: the dataclasses are *resources*, but the database table is also a *resource*.

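It may also be harder to see how modern SQLAlchemy differs from the usual Active Record pattern, because a declaratively mapped SQLAlchemy class can look a lot like an Active Record model. SQLAlchemy also offers a more [explicit mapper pattern](https://docs.sqlalchemy.org/en/13/orm/mapping_styles.html#classical-mappings) (classical mapping) that keeps table definitions and classes separate.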

## Setup

After cloning the `dor-py` repo and fetching this branch, run `init.sh` from the experiment directory:

```sh
$ cd ./experiments/DOROP-19
$ ./init.sh
```

This will:

* set up the initial environment variables file
* build the docker image
* install the python dependencies
* create the development database

## Quick Overview

```sh
# connect to the app container
$ docker compose run --rm app bash

# set up the database tables
$$ poetry run dor mockyrie setup

# save a monograph; this will also create member assets
$$ poetry run dor mockyrie save-monograph --num-assets 5

# save a monograph with an alternate identifier; we're not doing
# any checks for the uniqueness of alternate identifiers
$$ poetry run dor mockyrie save-monograph --alternate-identifier dlxs:bhl:101 --num-assets 5

# list all resources
$$ poetry run dor mockyrie find-all
<[1] Asset : Compare walk president hear eat.>
<[2] Asset : Up prove fill minute everything affect sea.>
<[3] Asset : Purpose simply bed stay baby human rise.>
<[4] Asset : News free safe relationship discuss last activity share.>
<[5] Asset : Protect know image however police.>
<[6] {dlxs:bhl:102} Monograph : Most something imagine hope kind imagine military.>

# list the members of a monograph
$$ poetry run dor mockyrie find-members 6
∆ find_members <[6] {dlxs:bhl:102} Monograph : Most something imagine hope kind imagine military.>
- <[1] Asset : Compare walk president hear eat.>
- <[2] Asset : Up prove fill minute everything affect sea.>
- <[3] Asset : Purpose simply bed stay baby human rise.>
- <[4] Asset : News free safe relationship discuss last activity share.>
- <[5] Asset : Protect know image however police.>

# set an alternate identifier
$$ poetry run dor mockyrie set-alternate-identifier <id> --alternate-identifier <alt-id>

# update a monograph
$$ poetry run dor mockyrie update-monograph 6 --key lang --value en-AU

# dump a resource
$$ poetry run dor mockyrie dump-resource 6
∆ dump_resource <[6] {dlxs:bhl:102} Monograph : Most something imagine hope kind imagine military.>
{
  "id": 6,
  "created_at": "2024-11-05T15:08:44.690644Z",
  "updated_at": "2024-11-05T15:08:44.690644Z",
  "metadata": {
    "common": {
      "lang": "en-AU",
      "title": "Most something imagine hope kind imagine military."
    }
    ...
  }
}
```

## How this works

```
dor
├── cli
│   ├── __init__.py
│   ├── main.py
│   └── mockyrie.py # commands
├── __init__.py
├── mockyrie
│   ├── models.py # domain models
│   └── persistence # database/repository layer
│       ├── __init__.py
│       ├── metadata_adapter.py # the metadata adapter is the main repository interface
│       ├── persister.py # database CRUD (only create/update is implemented)
│       ├── query_service.py # database query
│       ├── resource_factory.py # factory to convert between domain/repository objects
│       └── resource.py # the sqlalchemy model
└── settings.py
```
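
The layout above can be sketched in a single file. This is a simplified, hypothetical version (not the PR's code), assuming SQLite's `json_each` as a stand-in for a PostgreSQL JSONB query and a `sqlite3` connection in place of a SQLAlchemy session. Unlike the real project, the adapter creates its own table instead of relying on a `setup` command.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Monograph:  # domain model (models.py)
    title: str
    alternate_ids: List[str] = field(default_factory=list)
    id: Optional[int] = None


class ResourceFactory:  # resource_factory.py: domain object <-> row
    def from_resource(self, resource: Monograph) -> str:
        data = asdict(resource)
        data.pop("id")
        return json.dumps(data)

    def to_resource(self, row: tuple) -> Monograph:
        row_id, data = row
        return Monograph(id=row_id, **json.loads(data))


class Persister:  # persister.py: create/update only
    def __init__(self, adapter: "MetadataAdapter"):
        self.adapter = adapter

    def save(self, resource: Monograph) -> Monograph:
        data = self.adapter.resource_factory.from_resource(resource)
        if resource.id is None:
            cur = self.adapter.session.execute(
                "INSERT INTO resource (data) VALUES (?)", (data,))
            resource.id = cur.lastrowid
        else:
            self.adapter.session.execute(
                "UPDATE resource SET data = ? WHERE id = ?", (data, resource.id))
        return resource


class QueryService:  # query_service.py: reads only
    def __init__(self, adapter: "MetadataAdapter"):
        self.adapter = adapter

    def find_by(self, id: int) -> Monograph:
        row = self.adapter.session.execute(
            "SELECT id, data FROM resource WHERE id = ?", (id,)).fetchone()
        if row is None:
            raise LookupError(f"no resource with id {id}")
        return self.adapter.resource_factory.to_resource(row)

    def find_by_alternate_identifier(self, alternate_id: str) -> Optional[Monograph]:
        # query inside the JSON document; uniqueness isn't enforced, so take the first match
        row = self.adapter.session.execute(
            "SELECT resource.id, resource.data FROM resource, "
            "json_each(resource.data, '$.alternate_ids') "
            "WHERE json_each.value = ? ORDER BY resource.id LIMIT 1",
            (alternate_id,),
        ).fetchone()
        return None if row is None else self.adapter.resource_factory.to_resource(row)


class MetadataAdapter:  # metadata_adapter.py: one session shared by all parts
    def __init__(self, session: sqlite3.Connection):
        self.session = session
        self.session.execute(
            "CREATE TABLE IF NOT EXISTS resource (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
        self.resource_factory = ResourceFactory()
        self.persister = Persister(self)
        self.query_service = QueryService(self)
```

Passing the adapter itself into `Persister` and `QueryService` is one way to get the `self.adapter.session` and `self.adapter.resource_factory` access used in the README's own usage example, so every part shares one session.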

How does this get used? A `MetadataAdapter` is instantiated with a reference to the SQLAlchemy session:

```python
from pydantic import TypeAdapter
from sqlalchemy import select

from dor.mockyrie.persistence.metadata_adapter import MetadataAdapter
from dor.mockyrie.persistence.resource import Resource

# instantiate the adapter
# (assumes get_session() is the project's helper returning a SQLAlchemy Session)
adapter = MetadataAdapter(session=get_session())

# use the query service
resource = adapter.query_service.find_by(id)

# query the data in the JSONB column
resource = adapter.query_service.find_by_alternate_identifier(alternative_id)

# use the persister
resource = adapter.persister.save(resource=resource)

# the resource factory transforms domain objects into ORM objects
# (from_resource) or the reverse (to_resource), e.g. inside a
# query service method:
def find_by(self, id):
    stmt = select(Resource).where(Resource.id == id)
    row = self.adapter.session.scalars(stmt).one()
    return self.adapter.resource_factory.to_resource(row)

# domain objects are serialized into vanilla Python structures
# before SQLAlchemy persists them in the JSONB column:
data = TypeAdapter(resource.__class__).dump_python(resource)
orm_object.data = data

# in the reverse case, the domain object can be instantiated
# from the parsed JSON structure:
resource = cls(**orm_object.data)
```

## Questions/Next Steps

- [ ] Is this a useful approach for DOR?

- [ ] How are relationships modeled, both in the domain and in storage?

- [ ] `mockyrie.persistence.resource_factory.ResourceFactory` is a simplified version of [Valkyrie::Persistence::Postgres::ResourceFactory](https://github.com/samvera/valkyrie/blob/v2.0.0/lib/valkyrie/persistence/postgres/resource_factory.rb) and [Valkyrie::Persistence::Postgres::ResourceConverter](https://github.com/samvera/valkyrie/blob/v2.0.0/lib/valkyrie/persistence/postgres/resource_converter.rb); the Valkyrie versions do some extra work to generate instances instead of plain Ruby hashes/arrays when thawing JSONB
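
If the models nest dataclasses, `cls(**orm_object.data)` only rebuilds the top level: nested values come back as plain dicts and lists. A minimal stdlib sketch of that extra "thawing" step, assuming nested fields are annotated with dataclass types or `List[...]` of them; `thaw`, `Common`, `Asset` and `Monograph` here are illustrative. Pydantic's `TypeAdapter(cls).validate_python(data)` should also rebuild nested dataclasses, which may be simpler since the project already uses `TypeAdapter`.

```python
import dataclasses
import typing
from dataclasses import dataclass, field
from typing import Any, List


def thaw(cls: Any, data: Any) -> Any:
    """Recursively rebuild dataclass instances from parsed JSON."""
    if typing.get_origin(cls) is list:
        (item_type,) = typing.get_args(cls)
        return [thaw(item_type, item) for item in data]
    if dataclasses.is_dataclass(cls) and isinstance(data, dict):
        hints = typing.get_type_hints(cls)
        kwargs = {
            f.name: thaw(hints[f.name], data[f.name])
            for f in dataclasses.fields(cls)
            if f.init and f.name in data  # missing keys fall back to defaults
        }
        return cls(**kwargs)
    return data  # str, int, plain dicts, ... pass through unchanged


@dataclass
class Common:
    title: str
    lang: str = "en"


@dataclass
class Asset:
    common: Common


@dataclass
class Monograph:
    common: Common
    members: List[Asset] = field(default_factory=list)
```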

## Regrets

**Is this an abuse of `dataclass`?** I couldn't find a definitive answer, and the pattern was a blessed relief from boilerplate like this:

```perl
sub new {
    my $class = shift;
    my $self = { @_ };
    bless $self, $class;
    $self->initialize;
    return $self;
}
```
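
For comparison, a sketch of what `@dataclass` provides in place of that constructor: an `__init__` generated from the annotated fields, plus a `__post_init__` hook that plays the role of `$self->initialize`. The `Asset` class and its fields are illustrative, not the PR's model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Asset:
    title: str
    lang: str = "en"
    created_at: Optional[datetime] = None
    # not an __init__ argument; computed in __post_init__
    slug: str = field(init=False)

    def __post_init__(self) -> None:
        # runs right after the generated __init__, like $self->initialize
        if self.created_at is None:
            self.created_at = datetime.now(timezone.utc)
        self.slug = self.title.lower().replace(" ", "-")
```

`__repr__` and `__eq__` come for free as well, which covers most of the hand-written boilerplate a Perl class would carry.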

**Is this the best way to pass the SQLAlchemy session around?** Valkyrie, in the end, is backed by Rails and `ActiveRecord`, which means it has magical access to database sessions.
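
One common alternative is a context-managed unit of work: open a session, build the adapter around it, commit on success and roll back on error, so callers (e.g. each CLI command) get an adapter from a single place. A minimal sketch, assuming a `sqlite3` connection as a stand-in for a SQLAlchemy `Session`; `unit_of_work` and this stripped-down `MetadataAdapter` are hypothetical.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


class MetadataAdapter:
    """Stand-in for the PR's adapter: it just holds the session it is given."""

    def __init__(self, session: sqlite3.Connection):
        self.session = session


@contextmanager
def unit_of_work(database: str) -> Iterator[MetadataAdapter]:
    """Open a session, hand out an adapter, then commit or roll back."""
    conn = sqlite3.connect(database)
    try:
        yield MetadataAdapter(session=conn)
        conn.commit()  # the block finished without raising
    except Exception:
        conn.rollback()  # discard partial writes from the failed block
        raise
    finally:
        conn.close()
```

With SQLAlchemy itself, a `sessionmaker` used as `with Session.begin() as session:` gives the same commit-or-rollback behavior.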
33 changes: 33 additions & 0 deletions experiments/DOROP-19/compose.yml
@@ -0,0 +1,33 @@
services:
  app:
    depends_on:
      - db
    build:
      context: .
      target: development
      dockerfile: Dockerfile
      args:
        UID: ${UID:-1000}
        GID: ${GID:-1000}
        DEV: ${DEV:-false}
        POETRY_VERSION: ${POETRY_VERSION:-1.8.4}
    env_file:
      - .env
    volumes:
      - .:/app
      - ../../etc:/app/etc
      - ../../templates:/app/templates
    tty: true
    stdin_open: true
  db:
    ports:
      - "5432:5432"
    image: postgres:17-alpine
    environment:
      - POSTGRES_PASSWORD=postgres
      - PGDATA=/var/lib/postgresql/data/db
    volumes:
      - db-data:/var/lib/postgresql/data
      - ./db/init.sql:/docker-entrypoint-initdb.d/init.sql
volumes:
  db-data:
Empty file added experiments/DOROP-19/db/.keep
Empty file.
9 changes: 9 additions & 0 deletions experiments/DOROP-19/dor/cli/main.py
@@ -0,0 +1,9 @@
import typer
import dor.cli.mockyrie as mockyrie

app = typer.Typer()
app.add_typer(mockyrie.app, name="mockyrie")


if __name__ == "__main__": # pragma: no cover
    app()