This container creates a random CSV with ~3M lines and reads it using pyexcel.iget_records
to illustrate a memory leak when using pyexcel with Pyston.
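
For orientation, the two halves of the test (generate a random CSV, then stream it with pyexcel.iget_records) can be sketched roughly as below. This is not the script shipped in the container: the function names, the header names, the field length and the seed are assumptions made for illustration.

```python
import csv
import random
import string


def write_random_csv(path, n_lines, n_fields=3, field_len=8, seed=0):
    """Write a header row plus n_lines rows of random lowercase text."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Hypothetical header names; iget_records uses the first row as keys.
        writer.writerow([f"col{i}" for i in range(n_fields)])
        for _ in range(n_lines):
            writer.writerow(
                "".join(rng.choices(string.ascii_lowercase, k=field_len))
                for _ in range(n_fields)
            )


def read_with_pyexcel(path):
    """Stream records with pyexcel.iget_records and return how many were read."""
    import pyexcel  # imported lazily so the generator above works without it

    count = 0
    for _record in pyexcel.iget_records(file_name=path):
        count += 1
    pyexcel.free_resources()  # release file handles held by the i* functions
    return count
```

Calling write_random_csv("data.csv", 3_000_000) followed by read_with_pyexcel("data.csv") approximates the scale described above; "data.csv" is a placeholder file name.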
- Run
./run-test.sh
- In a separate shell, run
docker stats
- Identify your container, then press Enter in the shell where you ran ./run-test.sh to start reading the CSV file
- Watch the docker stats output; the container's memory usage increases linearly while the file is being read
To confirm this doesn't happen with the python:3.8 image, edit the Dockerfile to use that image and repeat the same steps; memory usage stays constant.
Memory used by container (Python 3.8):
- Before starting: 32 MB
- Before exiting: 33 MB
Memory used by container (Pyston):
- Before starting: 39 MB
- Before exiting: 723 MB
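
As a cross-check on the container-level figures, memory can also be sampled from inside the Python process while the records are being read. The helper below is a sketch under the assumption that the container runs Linux (it reads VmRSS from /proc/self/status); both function names are hypothetical, and the values need not match what docker stats reports, since that figure covers the whole container.

```python
def current_rss_mb():
    """Return this process's resident set size in MB, or None if unavailable.

    Reads VmRSS from /proc/self/status, which exists on Linux only.
    """
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024  # the kernel reports kB
    except OSError:
        pass
    return None


def log_rss_every(iterable, every=100_000):
    """Yield items unchanged, printing the current RSS after every `every` items."""
    for i, item in enumerate(iterable, start=1):
        if i % every == 0:
            print(f"{i} records, RSS = {current_rss_mb()} MB")
        yield item
```

Wrapping the record iterator, for example log_rss_every(pyexcel.iget_records(file_name=path)), prints a running memory figure without changing the loop that consumes the records.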
Changing the CSV delimiter to ; (which reduces the number of fields per line from 3 to 1) lowers memory before exiting to 300 MB, so the leak seems related to the number of fields (and not lines) pyexcel has to process.
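
The drop from 3 fields to 1 follows from how CSV parsing works: a comma-separated line contains no ; characters, so a parser told to split on ; returns the whole line as a single field. A small illustration with the standard csv module, using a made-up row:

```python
import csv
import io

line = "abc,def,ghi\n"  # hypothetical row, comma-separated like the generated file

fields_comma = next(csv.reader(io.StringIO(line), delimiter=","))
fields_semicolon = next(csv.reader(io.StringIO(line), delimiter=";"))

print(len(fields_comma))      # 3
print(len(fields_semicolon))  # 1
```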
While trying to reduce this to the most minimal reproducible example I could, I checked whether using csv.reader would reproduce the bug as well; it doesn't.
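
A csv.reader baseline along these lines could look like the sketch below. It yields one dict per row keyed by the header, to stay close to what iget_records returns; whether the original comparison built dicts or only iterated over raw rows is not stated, so treat that detail and the function name as assumptions.

```python
import csv


def iter_records_csv(path, delimiter=","):
    """Yield one dict per data row, keyed by the values of the header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        for row in reader:
            yield dict(zip(header, row))
```

Looping over iter_records_csv(path) in place of the pyexcel call, with the rest of the procedure unchanged, gives a like-for-like run to compare in docker stats.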