Reproducible Experiment Platform (REP)

REP is an IPython-based environment for conducting data-driven research in a consistent and reproducible way.

Main features:

- unified Python wrapper for different ML libraries (wrappers follow an extended scikit-learn interface):
  - Sklearn
  - TMVA
  - XGBoost
  - uBoost
  - Theanets
  - Pybrain
  - Neurolab
  - MatrixNet service (available to CERN)
- parallel training of classifiers on a cluster
- classification/regression reports with plots
- interactive plots supported
- smart grid-search algorithms with parallel execution
- research versioning using git
- pluggable quality metrics for classification
- meta-algorithm design (aka 'rep-lego')

REP does not try to replace scikit-learn; it extends it and provides a better user experience.

Howto examples

To get started, look at the notebooks in /howto/.

Notebooks can be viewed (not executed) online at nbviewer.
There are basic introductory notebooks (about Python and IPython) and more advanced ones (about REP itself).

The example code is written in Python 2, but the library is compatible with both Python 2 and Python 3.

Installation with Docker

We provide a Docker image with REP and all of its dependencies. This is the recommended way to install REP, especially if you are not experienced with Python.

Installation with bare hands

However, if you want to install REP and all of its dependencies on your machine yourself, follow these manuals: installing manually and running manually.

License

Apache 2.0. The library is open source.

Minimal examples

REP wrappers are sklearn compatible:

from rep.estimators import XGBoostClassifier, SklearnClassifier, TheanetsClassifier
clf = XGBoostClassifier(n_estimators=300, eta=0.1).fit(trainX, trainY)
probabilities = clf.predict_proba(testX)

A favorite trick of Kagglers is to run bagging over complex algorithms. This is how it is done in REP:

from sklearn.ensemble import BaggingClassifier
clf = BaggingClassifier(base_estimator=XGBoostClassifier(), n_estimators=10)
# wrap the sklearn ensemble in a REP wrapper
clf = SklearnClassifier(clf)
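
For readers unfamiliar with bagging, here is a minimal pure-Python sketch of the idea: each base model is trained on a bootstrap resample of the data, and the ensemble predicts by majority vote. This is only an illustration and not the code used by scikit-learn or REP. `ThresholdStump`, `bagging_fit` and `bagging_predict` are hypothetical names introduced for this example, and the stump assumes that class 1 has larger feature values than class 0.

```python
import random
from collections import Counter


class ThresholdStump:
    """Hypothetical weak learner: predicts 1 when x exceeds a learned threshold.

    Assumes class 1 has larger feature values than class 0.
    """

    def fit(self, xs, ys):
        ones = [x for x, y in zip(xs, ys) if y == 1]
        zeros = [x for x, y in zip(xs, ys) if y == 0]
        if not ones or not zeros:
            # Degenerate bootstrap sample with a single class: always predict it.
            self.constant = 1 if ones else 0
            self.threshold = None
        else:
            # Use the midpoint between the two class means as the threshold.
            self.constant = None
            self.threshold = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
        return self

    def predict(self, xs):
        if self.threshold is None:
            return [self.constant] * len(xs)
        return [1 if x > self.threshold else 0 for x in xs]


def bagging_fit(xs, ys, n_estimators=10, seed=0):
    """Train n_estimators stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        sample_x = [xs[i] for i in idx]
        sample_y = [ys[i] for i in idx]
        models.append(ThresholdStump().fit(sample_x, sample_y))
    return models


def bagging_predict(models, xs):
    """Combine the ensemble by majority vote over the individual predictions."""
    votes = zip(*(m.predict(xs) for m in models))
    return [Counter(v).most_common(1)[0][0] for v in votes]


xs = [0.1, 0.2, 0.3, 0.4, 1.6, 1.7, 1.8, 1.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys, n_estimators=10)
print(bagging_predict(models, [0.0, 2.0]))
```

In the REP snippet above, the same role is played by `BaggingClassifier`, with `XGBoostClassifier` as the base model instead of a toy stump.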

Another useful trick is to use folding instead of splitting data into train/test. This is especially useful when you're using some kind of complex stacking:

from rep.metaml import FoldingClassifier
clf = FoldingClassifier(TheanetsClassifier(), n_folds=3)
probabilities = clf.fit(X, y).predict_proba(X)

In the example above, all data is split into 3 folds, and each fold is predicted by a classifier that was trained on the other 2 folds.
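
The folding scheme described above can be sketched in plain Python as follows. This is a rough illustration under simplifying assumptions, not REP's actual implementation: folds are assigned round-robin rather than randomly, and `NearestMeanClassifier` and `out_of_fold_predict` are hypothetical names used only for this example.

```python
class NearestMeanClassifier:
    """Hypothetical stand-in model: predicts the class whose mean is closest."""

    def fit(self, xs, ys):
        self.means = {}
        for label in set(ys):
            members = [x for x, y in zip(xs, ys) if y == label]
            self.means[label] = sum(members) / len(members)
        return self

    def predict(self, xs):
        return [min(self.means, key=lambda c: abs(x - self.means[c])) for x in xs]


def out_of_fold_predict(make_model, xs, ys, n_folds=3):
    """Predict every sample with a model that never saw it during training."""
    # Round-robin fold assignment (an assumption made for simplicity).
    fold_of = [i % n_folds for i in range(len(xs))]
    predictions = [None] * len(xs)
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if fold_of[i] != fold]
        test = [i for i in range(len(xs)) if fold_of[i] == fold]
        # Train on the other folds only, then predict the held-out fold.
        model = make_model().fit([xs[i] for i in train], [ys[i] for i in train])
        for i, p in zip(test, model.predict([xs[i] for i in test])):
            predictions[i] = p
    return predictions


xs = [0.1, 0.2, 0.3, 1.7, 1.8, 1.9]
ys = [0, 0, 0, 1, 1, 1]
print(out_of_fold_predict(NearestMeanClassifier, xs, ys))  # [0, 0, 0, 1, 1, 1]
```

Because no sample is predicted by a model trained on it, the resulting predictions can be fed to a second-level model in stacking without leaking the training labels.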

REP classifiers also provide reports:

from rep.report.metrics import RocAuc

report = clf.test_on(testX, testY)
report.roc().plot()  # plot the ROC curve
# learning curves are useful when training GBDT!
report.learning_curve(RocAuc(), steps=10)
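
As a reminder of what a metric such as ROC AUC measures, here is a small pure-Python sketch. It computes the area under the ROC curve as the fraction of (positive, negative) pairs in which the positive sample gets the higher score, counting ties as one half. The function `roc_auc` is a hypothetical helper written for illustration; it is not REP's `RocAuc` class, and its quadratic pairwise loop is only suitable for small inputs.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 1.0 means every positive sample is ranked above every negative one, while 0.5 corresponds to a random ordering.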

You can read about other REP tools (like smart distributed grid search, folding and factory) in the documentation and howto examples.
