This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Mx ocrmypdf #690

Open
wants to merge 57 commits into
base: master
57 commits
7ccf228
Switch to ocrmypdf
mimimi1968 Jun 24, 2020
b775152
Introduce 'move' feature in infolder
mimimi1968 Jun 29, 2020
0be5e8a
Exclude ./consume dir
mimimi1968 Jun 29, 2020
ede0f87
Optimize order in Dockerfile
mimimi1968 Jun 29, 2020
25a0982
Catch the exception when Ocrmypdf dies
mimimi1968 Jun 30, 2020
3d89db0
Reworked the consume mechanism to work better
mimimi1968 Jun 30, 2020
ad34445
Made LANGUAGE configurable
mimimi1968 Jul 2, 2020
7caeafa
Just corrected the path for static/
mimimi1968 Jul 13, 2020
10e5cbc
Include some management actions into GUI
mimimi1968 Jul 13, 2020
211d15e
Renamed gunicorn.conf to remove warning
mimimi1968 Jul 13, 2020
3cedc37
Updated Django to 2.2 and fixed the deps
mimimi1968 Jul 13, 2020
686b2c8
Started i18n localization
mimimi1968 Jul 13, 2020
c26f6ab
Filter for "added" field too
mimimi1968 Jul 15, 2020
8db0e7c
Remove hack for date input
mimimi1968 Jul 15, 2020
a01c9d9
Corrected the Dockerfile after squashing the commits
mimimi1968 Jul 16, 2020
9a826e7
Upgraded to alpine 3.12 - it just worked
mimimi1968 Jul 16, 2020
6cc9917
Fixed the tests to pass at least
mimimi1968 Jul 16, 2020
fba442b
Finished the i18n for admin interface
mimimi1968 Jul 17, 2020
24648f8
Added the forgotten package dep
mimimi1968 Jul 17, 2020
ddff376
Added jquery.min.js so the colouring works now
mimimi1968 Jul 17, 2020
7387257
Changed some translated words to be consistent
mimimi1968 Jul 17, 2020
220a6a7
Rollback to alpine 3.10 for tesseract not working in 3.12
mimimi1968 Jul 20, 2020
d0530a1
Raised the worker timeout
mimimi1968 Jul 20, 2020
d60b94b
Fixed the whitespace handling and small improvement for get_text
mimimi1968 Jul 22, 2020
5785416
Better handling of change detection when scanning /consume
mimimi1968 Jul 24, 2020
01332c2
Add a page count to gui and business logic
mimimi1968 Jul 31, 2020
c277c57
Changed the 'created' Field to DateField for
mimimi1968 Jul 31, 2020
b36128e
Display the version information in footer
mimimi1968 Jul 31, 2020
633d194
Ignore further changes of version.txt
mimimi1968 Jul 31, 2020
18b50c1
Silence pycodestyle errors and warnings
mimimi1968 Jul 31, 2020
d06d08c
Instruct ocrmypdf to rotate when needed
mimimi1968 Jul 31, 2020
dd574c4
Enable forcing the conversion again
mimimi1968 Jul 31, 2020
07e8325
Update to Alpine 3.11
mimimi1968 Aug 5, 2020
e6efbd1
Move transaction to _store function
mimimi1968 Aug 5, 2020
98fd129
Silence optipng output
mimimi1968 Aug 5, 2020
ed350b6
Make ocrmypdf more robust and debug friendly
mimimi1968 Aug 5, 2020
753ecfd
Remove build for python 3.5 and add 3.8
mimimi1968 Aug 7, 2020
d951ae1
Just added some .keep files
mimimi1968 Aug 8, 2020
8fa4a3c
Keep the file metadata when exporting/importing
mimimi1968 Aug 8, 2020
b163c96
Added the field "pages" to the serialiser code
mimimi1968 Aug 8, 2020
04d7fea
Removed the misleading error message for now
mimimi1968 Aug 8, 2020
9e8e9a1
Refactored the directory/file handling
mimimi1968 Aug 13, 2020
28a1f52
Create directories in MEDIAROOT when we save a new document
mimimi1968 Aug 13, 2020
2b0db66
Updated the packages
mimimi1968 Aug 14, 2020
03e38d5
Support for mysql as database
mimimi1968 Sep 1, 2020
79c7778
Update packages because of factoryboy not running
mimimi1968 Sep 1, 2020
c7305c9
Fixed an i18n issue in change form
mimimi1968 Oct 4, 2020
75ab1a1
Introduced an upload facility for documents
mimimi1968 Nov 27, 2020
cc3f00b
Silence pycodestyle warnings
mimimi1968 Nov 27, 2020
6275aff
Fix some C-style format strings
mimimi1968 Feb 13, 2021
6737560
Add some build-time deps for Pillow package
mimimi1968 Feb 13, 2021
55f5fd3
Fixes again for the last 2 commits
mimimi1968 Feb 13, 2021
4cb1047
Switch over to a binary dependency on ocrmypdf
mimimi1968 Apr 2, 2021
d182fb1
Add /usr/local/bin to find ocrmypdf in path
mimimi1968 Apr 2, 2021
a01c5fe
Silence a db warning for mysql backend
mimimi1968 Apr 2, 2021
0d2570e
Optimized Dockerfile for size of the image
mimimi1968 Apr 6, 2021
1868fff
Lock the dependencies for now
mimimi1968 Apr 6, 2021
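
Several commits above concern calling ocrmypdf as an external binary: catching the case where it dies, making the language configurable, rotating pages when needed, and forcing conversion. The following is a minimal sketch of that pattern, not the code from this pull request. The helper names `build_ocrmypdf_command` and `run_ocrmypdf`, their parameters, and the default timeout are assumptions made for illustration; only the `--language`, `--rotate-pages`, and `--force-ocr` options are taken from the ocrmypdf command line.

```python
import subprocess
from typing import List


def build_ocrmypdf_command(src: str, dst: str, language: str = "eng",
                           rotate: bool = True, force: bool = False,
                           binary: str = "ocrmypdf") -> List[str]:
    """Assemble the argument list for one ocrmypdf run (hypothetical helper)."""
    cmd = [binary, "--language", language]
    if rotate:
        # Ask ocrmypdf to detect and correct page orientation.
        cmd.append("--rotate-pages")
    if force:
        # Rasterize and OCR every page, even if it already contains text.
        cmd.append("--force-ocr")
    cmd += [src, dst]
    return cmd


def run_ocrmypdf(cmd: List[str], timeout: int = 600) -> bool:
    """Run the command and report failure as False instead of raising."""
    try:
        subprocess.run(cmd, check=True, timeout=timeout,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except (OSError, subprocess.SubprocessError):
        # OSError covers a missing binary; SubprocessError covers a
        # non-zero exit status and an expired timeout.
        return False
    return True
```

Keeping command construction separate from execution means the argument list can be checked in tests without ocrmypdf being installed, and the consumer can decide what to do with a failed document (for example, leave it in the consume directory) based on the boolean result.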
11 changes: 10 additions & 1 deletion .gitignore
@@ -82,4 +82,13 @@ scripts/import-for-development
scripts/nuke

# Static files collected by the collectstatic command
./static/
static/

.DS*

# consume
consume/

src/version.txt

export/
2 changes: 1 addition & 1 deletion .travis.yml
@@ -8,9 +8,9 @@ sudo: false

matrix:
include:
- python: "3.5"
- python: "3.6"
- python: "3.7-dev"
- python: "3.8"
- env:
- BUILD_DOCKER=1
# Variable to add to publish the Docker image:
138 changes: 83 additions & 55 deletions Dockerfile
@@ -1,67 +1,87 @@
FROM alpine:3.11
FROM ubuntu:20.04 as builder

LABEL maintainer="The Paperless Project https://github.com/the-paperless-project/paperless" \
contributors="Guy Addadi <[email protected]>, Pit Kleyersburg <[email protected]>, \
Sven Fischer <[email protected]>"
ENV LANG=C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive

# Copy Pipfiles file, init script and gunicorn.conf
COPY Pipfile* /usr/src/paperless/
COPY scripts/docker-entrypoint.sh /sbin/docker-entrypoint.sh
COPY scripts/gunicorn.conf /usr/src/paperless/
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y --no-install-recommends \
build-essential autoconf automake libtool \
python3 \
python3-venv \
python3-setuptools \
python3-wheel \
pipenv \
python3-pip

# Get build deps
RUN apt-get install -y --no-install-recommends \
libpython3.8-dev \
libpq-dev \
libmariadb-dev \
libpoppler-cpp-dev \
libxslt-dev \
libxml2-dev

# get dependencies
WORKDIR /usr/src/paperless
COPY Pipfile* .
RUN pipenv lock --keep-outdated --requirements > requirements.txt

ENV VIRTUAL_ENV=/usr/src/paperless
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip3 install -r requirements.txt

FROM jbarlow83/ocrmypdf

ENV LANG=C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y --no-install-recommends \
python3.8 \
python3-venv \
gnupg \
imagemagick \
optipng \
libpoppler-cpp0v5 \
sudo \
gettext \
mariadb-client \
libmariadb3 \
libmagic1 \
curl && \
apt-get clean -y && \
rm -rf /var/lib/apt/lists/*

# Set export and consumption directories
ENV PAPERLESS_EXPORT_DIR=/export \
PAPERLESS_CONSUMPTION_DIR=/consume

RUN apk add --no-cache \
bash \
curl \
ghostscript \
gnupg \
imagemagick \
libmagic \
libpq \
optipng \
poppler \
python3 \
shadow \
sudo \
tesseract-ocr \
tzdata \
unpaper && \
apk add --no-cache --virtual .build-dependencies \
g++ \
gcc \
jpeg-dev \
musl-dev \
poppler-dev \
postgresql-dev \
python3-dev \
zlib-dev && \
# Install python dependencies
python3 -m ensurepip && \
rm -r /usr/lib/python*/ensurepip && \
cd /usr/src/paperless && \
pip3 install --upgrade pip pipenv && \
pipenv install --system --deploy && \
# Remove build dependencies
apk del .build-dependencies && \
# Create the consumption directory
mkdir -p $PAPERLESS_CONSUMPTION_DIR && \
# Create user
addgroup -g 1000 paperless && \
adduser -D -u 1000 -G paperless -h /usr/src/paperless paperless && \
chown -Rh paperless:paperless /usr/src/paperless && \
mkdir -p $PAPERLESS_EXPORT_DIR && \
# Avoid setrlimit warnings
# See: https://gitlab.alpinelinux.org/alpine/aports/issues/11122
echo 'Set disable_coredump false' >> /etc/sudo.conf && \
ENV PAPERLESS_EXPORT_DIR=/export
ENV PAPERLESS_CONSUMPTION_DIR=/consume

# Create the directories and user
RUN mkdir -p $PAPERLESS_CONSUMPTION_DIR && \
mkdir -p $PAPERLESS_EXPORT_DIR && \
addgroup --gid 1000 paperless && \
adduser --home /usr/src/paperless --disabled-password --gecos "" --uid 1000 --ingroup paperless paperless

RUN echo 'Defaults env_keep += "VIRTUAL_ENV"' >>/etc/sudoers.d/paperless && \
echo 'Defaults secure_path=/usr/src/paperless/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/bin' >> /etc/sudoers.d/paperless

COPY --chown=paperless:paperless --from=builder /usr/src/paperless/ /usr/src/paperless

# Setup entrypoint
chmod 755 /sbin/docker-entrypoint.sh
COPY scripts/docker-entrypoint.sh /sbin/docker-entrypoint.sh
RUN chmod 755 /sbin/docker-entrypoint.sh

# Copy gunicorn.conf
COPY scripts/gunicorn.conf.py /usr/src/paperless/

WORKDIR /usr/src/paperless/src
# Mount volumes and set Entrypoint
VOLUME ["/usr/src/paperless/data", "/usr/src/paperless/media", "/consume", "/export"]

ENTRYPOINT ["/sbin/docker-entrypoint.sh"]
CMD ["--help"]

@@ -70,5 +90,13 @@ COPY src/ /usr/src/paperless/src/
COPY data/ /usr/src/paperless/data/
COPY media/ /usr/src/paperless/media/

# setup venv
ENV VIRTUAL_ENV=/usr/src/paperless
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

RUN cd /usr/src/paperless/src && \
django-admin compilemessages

# Collect static files
RUN sudo -HEu paperless /usr/src/paperless/src/manage.py collectstatic --clear --no-input
8 changes: 6 additions & 2 deletions Pipfile
@@ -4,7 +4,7 @@ verify_ssl = true
name = "pypi"

[packages]
django = "<2.1,>=2.0"
django = "<2.3,>=2.0"
pillow = "*"
coveralls = "*"
dateparser = "*"
@@ -13,7 +13,9 @@ django-crispy-forms = "*"
django-extensions = "*"
django-filter = "*"
djangorestframework = "*"
factory-boy = "*"
django-admin-list-filter-dropdown = "*"
django-mysql = "*"
factory-boy = "<3.0"
filemagic = "*"
fuzzywuzzy = {extras = ["speedup"],version = "==0.15.0"}
gunicorn = "*"
@@ -38,6 +40,8 @@ psycopg2 = "*"
djangoql = "*"
whitenoise = "*"
brotli = "*"
pikepdf = ">=1.19"
mysqlclient = "*"

[dev-packages]
ipython = "*"