Hydro-analytics Paper Reproducibility Project

This is a clone of the repository that hosts the code of a hydrology-related research paper, published by Sadler et al. 2018 (https://doi.org/10.1016/j.jhydrol.2018.01.044). This repo serves as a use case of setting up and running a research reproducibility workflow.

The following changes & additions were made:

Debugged and modified the Python, Jupyter Notebook, and R code to determine dependencies and fix broken links to data.
Added Docker containers for 4 processes (2 that run in parallel for data analytics, followed by 2 serial ones for final model generation).
Automated the entire workflow using Docker.

The goals of this reproducibility exercise were the following:

Containerize tools - determine system interdependencies
Utilize GitHub and share resources
Workflow parallelization
Reproducibility across platforms

Reproducibility Instructions

Starting a Jetstream instance is recommended: after logging in, select Ubuntu 18.04 Devel and Docker, choose the m1.medium size (CPU: 6, Mem: 16 GB, Disk: 60 GB), and launch. Once your instance is active, you can use the web shell or run $ ssh <VM's IP> from your command line. If you do this, you can skip step 0 of Methods 1 and 2.

There are 3 methods that can be followed to reproduce the Sadler et al. 2018 paper's results:

All 3 methods require cloning this GitHub repository to your local machine.
$ git clone https://github.com/cyber-carpentry/group6.git
These methods refer to PATH TO GITHUB REPOSITORY, which is the path from your home directory to the GitHub repository on your local machine. Run $ pwd from inside the repository to see the path.

Method 1: Automated with pre-built Docker Images

This method requires Docker to be installed on your machine. The getting started guide on Docker has detailed instructions for setting up Docker on Mac/Windows/Linux.
Run all the Docker containers using $ sh all.sh in the command line.

This command should be run in the main github directory (PATH TO GITHUB REPOSITORY).

When the run completes, flood_events.csv, nor_daily_observations.csv, and for_model_avgs.csv should have been created in the db_scripts folder. Additionally, 4 files (poisson_out_test.csv, poisson_out_train.csv, rf_out_test.csv, rf_out_train.csv) will be generated in the /models folder.
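
The parallel-then-serial layout described earlier (2 parallel data-analytics processes, then 2 serial ones) can be sketched in Python as follows. This is not the content of all.sh, only an illustration of the pattern. The assignment of step names to stages is an assumption inferred from the file dependencies listed under Method 3, and run_step is a hypothetical callable that launches one step (for example, by starting its container).

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed stage layout: two independent data-preparation steps, followed by
# two steps that each depend on the output of the previous stage.
PARALLEL_STEPS = ["prepare_flood_events_table", "make_dly_obs_table"]
SERIAL_STEPS = ["by_event_for_model", "final_model_output_script"]


def run_workflow(run_step):
    """Run the parallel stage, wait for it, then run the serial stage in order.

    `run_step` is a hypothetical callable that takes a step name and runs it.
    Returns the names of the completed steps, parallel stage first.
    """
    completed = []
    with ThreadPoolExecutor(max_workers=len(PARALLEL_STEPS)) as pool:
        futures = [pool.submit(run_step, name) for name in PARALLEL_STEPS]
        for name, future in zip(PARALLEL_STEPS, futures):
            future.result()  # blocks until done; re-raises the step's exception
            completed.append(name)
    # The serial stage starts only after both parallel steps have succeeded.
    for name in SERIAL_STEPS:
        run_step(name)
        completed.append(name)
    return completed
```

A dry run such as run_workflow(print) shows the order without starting anything. If either parallel step raises, the serial stage is never started.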

Method 2: Build Docker Image

This method requires Docker to be installed on your machine.
Build Docker image
$ docker build -t flood_pred .

This command should be run in the main GitHub directory (PATH TO GITHUB REPOSITORY).
There will be a file called Dockerfile in this directory (use $ ls to check).

Run the Docker image using
$ docker run -v PATH TO GITHUB REPOSITORY/group6:/group6 flood_pred
When the run completes, flood_events.csv, nor_daily_observations.csv, for_model_avgs.csv, and other files should have been created in the db_scripts and models folders.
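
The -v option in the run command maps a host directory to /group6 inside the container. The host side of a bind mount should be an absolute path, because Docker treats a non-absolute name as a named volume. A minimal sketch of building the command from Python, assuming the image name flood_pred and mount point /group6 from the command above (the helper name docker_run_command is hypothetical):

```python
import os


def docker_run_command(host_dir, image="flood_pred", container_dir="/group6"):
    """Build the argument list for `docker run -v HOST:CONTAINER IMAGE`.

    The host directory is expanded and made absolute so that Docker treats
    it as a bind mount rather than as a named volume.
    """
    host_abs = os.path.abspath(os.path.expanduser(host_dir))
    return ["docker", "run", "-v", f"{host_abs}:{container_dir}", image]
```

The returned list can be passed to subprocess.run(..., check=True) once the image has been built.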

Method 3: Running Manually

This method requires creating both the Python and R environments for running the scripts.

Python 2.7.16 was used, and the required Python modules with their versions are listed in requirements.txt.
R 3.5.1 was used with the caret, ggfortify, ggplot2, dplyr, RSQLite, DBI, class, and randomForest packages. All packages were installed from the http://cran.rstudio.com/ repo.
We recommend using conda to install the required environment; instructions are given below.
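
One way to check an environment against requirements.txt is to compare its name==version pins with the output of pip freeze, which uses the same line format. The sketch below is written for Python 3 and only parses text, so it can run outside the pinned 2.7 environment. It assumes requirements.txt uses name==version lines; the function names are hypothetical.

```python
def parse_pins(text):
    """Return {lowercase package name: version} for every `name==version` line."""
    pins = {}
    for raw in text.splitlines():
        # Drop trailing comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines, options, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(required, installed):
    """Map each package whose installed version differs to (wanted, found).

    `found` is None when the package is not installed at all.
    """
    return {
        name: (wanted, installed.get(name))
        for name, wanted in required.items()
        if installed.get(name) != wanted
    }
```

For example, saving pip freeze to a file inside the environment and passing both file contents through parse_pins gives the two dictionaries that find_mismatches compares. An empty result means every pin is satisfied.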

Change to parent directory of repository directory
$ cd PATH TO REPOSITORY/..
Get file from Hydroshare
$ wget https://www.hydroshare.org/resource/9e1b23607ac240588ba50d6b5b9a49b5/data/contents/hampt_rd_data.sqlite

This will download the file into the current directory (outside the repository directory).

Go to database scripts directory
$ cd group6/db_scripts
Run python script to process street floods data
$ python prepare_flood_events_table.py

This will create flood_events.csv

Run python script to process environmental data
$ python make_dly_obs_table.py

This will create nor_daily_observations.csv

Run python script for combining flood and environmental data
$ python by_event_for_model.py

This requires flood_events.csv and nor_daily_observations.csv and creates for_model_avgs.csv

Change to the models directory
$ cd ../models
Run R scripts for analysis
$ Rscript final_model_output_script.R
Four files, poisson_out_test.csv, poisson_out_train.csv, rf_out_test.csv, rf_out_train.csv, will be generated in the /models folder.
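
Since the Python scripts above read from the downloaded hampt_rd_data.sqlite file, it can help to confirm that the download is a readable SQLite database before running them. The sketch below lists each table with its row count. It makes no assumption about which tables the file contains, and the function name list_tables is hypothetical.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return {table name: row count} for a SQLite database file."""
    # sqlite3.connect would silently create an empty file, so check first.
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling list_tables("hampt_rd_data.sqlite") from the directory that holds the download prints nothing by itself; wrap it in print() to inspect the result. A truncated download usually fails with sqlite3.DatabaseError.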

Another small but time-consuming alternative is to create a conda environment and run the entire workflow in that fixed environment.

Instructions to create CONDA environment:

Starting a Jetstream instance is recommended: after logging in, select Ubuntu 18.04 Devel and Miniconda, choose the m1.medium size (CPU: 6, Mem: 16 GB, Disk: 60 GB), and launch. If Miniconda is not available, you can download it from https://docs.conda.io/en/latest/miniconda.html
Clone the git repository
$ git clone https://github.com/cyber-carpentry/group6.git
Create a new environment
$ conda env create --name hydro --file hydro_make.yml
Activate the environment
$ source activate hydro
This environment can be used to run Method 3.

When the workflow completes, flood_events.csv, nor_daily_observations.csv, and for_model_avgs.csv should have been created in db_scripts. Additionally, 4 files (poisson_out_test.csv, poisson_out_train.csv, rf_out_test.csv, rf_out_train.csv) will be generated in /models.
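
The expected outputs listed above can be checked with a short script rather than by eye. A minimal sketch, assuming the db_scripts and models folders sit directly under the repository root (the function name missing_outputs is hypothetical):

```python
from pathlib import Path

# File names taken from the instructions above.
EXPECTED_OUTPUTS = {
    "db_scripts": [
        "flood_events.csv",
        "nor_daily_observations.csv",
        "for_model_avgs.csv",
    ],
    "models": [
        "poisson_out_test.csv",
        "poisson_out_train.csv",
        "rf_out_test.csv",
        "rf_out_train.csv",
    ],
}


def missing_outputs(repo_root):
    """Return the expected output files that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for folder, names in EXPECTED_OUTPUTS.items():
        for name in names:
            path = root / folder / name
            if not path.is_file() or path.stat().st_size == 0:
                missing.append(f"{folder}/{name}")
    return missing
```

An empty list from missing_outputs(PATH_TO_REPO) suggests that all 7 files were written. It does not confirm that their contents match the paper's results.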

The team members that led this effort were participants of the Cyber Carpentry 2019 workshop at the University of North Carolina at Chapel Hill. Note: You can access the original repository here.

We welcome any suggestions and please post if you find any issues. -Thank you Group-6 team
