3. Extract NF1 Features

In this module, I present the pipeline for extracting features from the NF1 data.

Feature Extraction

To extract features from the NF1 data, I use DeepProfiler, commit 8752f69.

Based on previous projects in the lab, I decided to use a pretrained model from the LUAD Cell Painting repository with DeepProfiler. DeepProfiler can also train a new model (also called a checkpoint), which I would like to test in the future.

The config files are based on the one used in the LUAD Cell Painting repository mentioned above. The following changes were made to the two config files, NF1_nuc_config.json and NF1_cyto_config.json.

Both:

  • "Allele" -> "Genotype": In the LUAD study, alleles were compared across Cell Painting images. For the NF1 data, the genotypes of the NF1 gene are compared.
  • dataset: images: {file format: tif, bits: 16, width: 1080, height: 1080} -> dataset: images: {file format: tiff, bits: 8, width: 1224, height: 904}: The image details are changed to reflect the NF1 data.
  • prepare: implement: true -> prepare: implement: false: We do not use DeepProfiler to prepare the NF1 data with illumination correction (already done) or compression.
  • dataset: images: channels: [DNA, ER, RNA, AGP, Mito] -> dataset: images: channels: [DNA, ER, RNA]: While the Cell Painting dataset has five channels for cell images, the NF1 data only has the first three channels to examine.

NF1_nuc_config.json:

  • dataset: locations: box_size: 96 -> dataset: locations: box_size: 128: This change expands the box placed around each nucleus that DeepProfiler interprets. This expansion was recommended by Juan Caicedo to improve performance.

NF1_cyto_config.json:

  • dataset: locations: box_size: 96 -> dataset: locations: box_size: 256: This change expands the box placed around each cell that DeepProfiler interprets. This expansion attempts to capture as much of the cytoplasm as possible (this will be benchmarked in the future to assess the best box size).
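
The changes listed above can be pictured as overrides applied on top of the original config. The following is only a minimal sketch, not the code used in this repository: `deep_update` is a hypothetical helper, and the key names (for example `file_format` and a single `implement` flag under `prepare`) are assumptions that mirror the shorthand in the lists above rather than the verified DeepProfiler schema. The "Allele" -> "Genotype" change is left out because its location in the config is not given here.

```python
import copy


def deep_update(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with nested `overrides` applied (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Starting values as described for the LUAD-based config (key names assumed).
base_config = {
    "dataset": {
        "images": {
            "file_format": "tif",
            "bits": 16,
            "width": 1080,
            "height": 1080,
            "channels": ["DNA", "ER", "RNA", "AGP", "Mito"],
        },
        "locations": {"box_size": 96},
    },
    "prepare": {"implement": True},
}

# Changes shared by both NF1 configs.
shared_overrides = {
    "dataset": {
        "images": {
            "file_format": "tiff",
            "bits": 8,
            "width": 1224,
            "height": 904,
            "channels": ["DNA", "ER", "RNA"],
        }
    },
    "prepare": {"implement": False},
}

nf1_shared = deep_update(base_config, shared_overrides)

# Object-specific box sizes.
nuc_config = deep_update(nf1_shared, {"dataset": {"locations": {"box_size": 128}}})
cyto_config = deep_update(nf1_shared, {"dataset": {"locations": {"box_size": 256}}})

print(nuc_config["dataset"]["locations"]["box_size"])
print(cyto_config["dataset"]["locations"]["box_size"])
```

Keeping the shared and object-specific overrides separate makes it easy to see that the two configs differ only in box_size.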

Step 1: Setup Feature Extraction Environment

Step 1a: Create Feature Extraction Environment

# Run this command in terminal to create the conda environment for feature extraction
conda env create -f 3.NF1_feature_extraction_env.yml

Step 1b: Activate Feature Extraction Environment

# Run this command in terminal to activate the conda environment for DeepProfiler feature extraction
conda activate 3.feature-extraction-NF1

Step 2: Install DeepProfiler

Step 2a: Clone Repository

Clone the DeepProfiler repository into 3_extracting_features/ with

# Make sure you are located in 3_extracting_features/
cd 3_extracting_features/
git clone https://github.com/cytomining/DeepProfiler.git

Step 2b: Install Repository

Install the DeepProfiler repository with

# Make sure you are located in DeepProfiler/ to install
cd DeepProfiler/
pip install -e .

Step 2c: Complete TensorFlow GPU Setup

Based on previous projects within the lab, we found that running DeepProfiler with TensorFlow GPU improves performance. To set it up, follow these instructions. I use TensorFlow GPU while processing the NF1 data.

Step 3: Define Project Paths

Inside the notebook compile_DP_projects.ipynb, change the variables nuc_project_path and cyto_project_path to the desired locations of the nuclei and cytoplasm DeepProfiler projects.
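
For illustration, the two variables might be set like this. The directory names are placeholders for this sketch (NF1_cyto_project in particular is a made-up name); substitute your own locations.

```python
import pathlib

# Placeholder locations; replace with wherever the projects should live.
nuc_project_path = pathlib.Path("NF1_nuc_project")
cyto_project_path = pathlib.Path("NF1_cyto_project")

# The plain string form is what gets passed to --root in Step 5.
print(str(nuc_project_path))
print(str(cyto_project_path))
```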

Step 4: Compile DeepProfiler Project

In order to profile features with DeepProfiler, a project needs to be set up with a certain file structure and files.

In compile_DP_projects.ipynb, the necessary project structure is created using the functions from DPutils.py.
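
As a rough picture of what such a structure might look like, here is a minimal sketch. It is not the code in DPutils.py: `make_project_skeleton` is a hypothetical name, and the folder names are assumptions based on the layout described in the DeepProfiler wiki, so check them against the wiki before relying on them.

```python
import pathlib
import tempfile


def make_project_skeleton(project_path: pathlib.Path, experiment: str) -> list:
    """Create an assumed DeepProfiler-style folder layout and return the folders made."""
    folders = [
        project_path / "inputs" / "config",
        project_path / "inputs" / "images",
        project_path / "inputs" / "locations",
        project_path / "inputs" / "metadata",
        project_path / "outputs" / experiment / "checkpoint",
    ]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


# Demonstrate in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    created = make_project_skeleton(pathlib.Path(tmp) / "NF1_nuc_project", "efn_pretrained")
    all_exist = all(folder.is_dir() for folder in created)
    print(all_exist)
```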

The config files (NF1_nuc_config.json and NF1_cyto_config.json) are copied to their corresponding projects, and the pretrained model (efficientnet-b0_weights_tf_dim_ordering_tf_kernels_autoaugment.h5) is copied to both projects. The config files and the model are also located within the DP_files folder for reference.

We need to compile an index.csv file because DeepProfiler requires it to load each image. We create this file using the annotations file. Using the index.csv, we compile the locations (in project/inputs/locations), which are the csv files DeepProfiler needs to find the single cells in each image.
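
To make the relationship between index.csv and the location files concrete, here is a minimal sketch. The index rows are made up for illustration, the column names are assumptions, and `location_file_for` is a hypothetical helper; only the {well}-{site}-Nuclei.csv naming and the per-plate folder come from this README.

```python
import csv
import io
import pathlib

# Made-up index rows for illustration only; column names are assumptions.
index_text = """Metadata_Plate,Metadata_Well,Metadata_Site,DNA,ER,RNA,Genotype
plate_example,C6,1,C6_1_DNA.tiff,C6_1_ER.tiff,C6_1_RNA.tiff,genotype_a
plate_example,D6,2,D6_2_DNA.tiff,D6_2_ER.tiff,D6_2_RNA.tiff,genotype_b
"""


def location_file_for(row: dict, locations_dir: pathlib.Path, object_name: str = "Nuclei") -> pathlib.Path:
    """Build the expected {well}-{site}-{object}.csv path for one index row."""
    file_name = f"{row['Metadata_Well']}-{row['Metadata_Site']}-{object_name}.csv"
    return locations_dir / row["Metadata_Plate"] / file_name


rows = list(csv.DictReader(io.StringIO(index_text)))
locations_dir = pathlib.Path("project") / "inputs" / "locations"
location_files = [location_file_for(row, locations_dir).as_posix() for row in rows]
print(location_files)
```

Each image row in the index maps to one location file, which is why the index is compiled first.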

For more information on DeepProfiler, please reference the DeepProfiler wiki.

# Run this script in terminal to compile the DeepProfiler projects
bash 3.compile-DP-projects.sh

Step 5: Extract Features with DeepProfiler

Change path/to/DP_nuc_project and path/to/DP_cyto_project below to the nuc_project_path and cyto_project_path set in Step 3. Note: Only include what is inside pathlib.Path() for each variable, not the full path (e.g., pathlib.Path('NF1_nuc_project') -> use NF1_nuc_project).

# Run these commands in terminal to extract features with DeepProfiler
# (paths are written without backticks, which the shell would treat as command substitution)
python3 -m deepprofiler --gpu 0 --exp efn_pretrained --root path/to/DP_nuc_project --config NF1_nuc_config.json profile
python3 -m deepprofiler --gpu 0 --exp efn_pretrained --root path/to/DP_cyto_project --config NF1_cyto_config.json profile
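
If you prefer to assemble the two commands from Python (for example, to avoid retyping the project paths), a minimal sketch could look like the following. `build_profile_command` is a hypothetical helper and the project names are placeholders; the flags are exactly the ones used in the commands above.

```python
import shlex


def build_profile_command(project_root: str, config_name: str,
                          gpu: str = "0", experiment: str = "efn_pretrained") -> list:
    """Assemble the argument list for one DeepProfiler `profile` run."""
    return [
        "python3", "-m", "deepprofiler",
        "--gpu", gpu,
        "--exp", experiment,
        "--root", project_root,
        "--config", config_name,
        "profile",
    ]


# Placeholder project names; use the paths set in Step 3.
runs = [
    ("NF1_nuc_project", "NF1_nuc_config.json"),
    ("NF1_cyto_project", "NF1_cyto_config.json"),
]
commands = [build_profile_command(root, config) for root, config in runs]

for command in commands:
    # shlex.join renders the list as a copy-pasteable shell line (Python 3.8+).
    print(shlex.join(command))
    # To actually launch the run: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a project path contains spaces.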

Step 6: Define Cytoplasm Locations Directory Path

Inside the notebook rename_cyto_locations.ipynb, the variable cyto_locations_path needs to be changed to reflect the plate in the /locations directory from the Cytoplasm project that contains the files to be renamed.

Note: Currently, there is only one plate in this pilot data, which is why the path goes directly to one plate and not to the whole /locations directory.

Step 7: Rename Cytoplasm Project Location Files

Due to the format of the checkpoint being used, the location files within the Cytoplasm project must end with Nuclei.csv during feature extraction. This means that the files in the /inputs/locations directory for both the Nuclei and Cytoplasm projects are named identically (e.g., {well}-{site}-Nuclei.csv). With the rename_cyto_locations.ipynb notebook, the suffix of the files in the Cytoplasm project is renamed to Cytoplasm.csv to avoid confusion during downstream analysis.
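
The renaming itself amounts to swapping a file-name suffix. Below is a minimal sketch of that idea, run against a temporary directory with made-up file names. It is not the notebook's code, and `rename_location_files` is a hypothetical helper; `cyto_locations_path` stands in for the plate directory set in Step 6.

```python
import pathlib
import tempfile


def rename_location_files(locations_path: pathlib.Path,
                          old_suffix: str = "Nuclei.csv",
                          new_suffix: str = "Cytoplasm.csv") -> list:
    """Rename every file ending in `old_suffix` so it ends in `new_suffix`."""
    renamed = []
    # sorted() materializes the matches before any file is renamed.
    for location_file in sorted(locations_path.glob(f"*{old_suffix}")):
        new_name = location_file.name[: -len(old_suffix)] + new_suffix
        location_file.rename(location_file.with_name(new_name))
        renamed.append(new_name)
    return renamed


# Demonstrate on empty, made-up files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    cyto_locations_path = pathlib.Path(tmp)
    for name in ["C6-1-Nuclei.csv", "D6-2-Nuclei.csv"]:
        (cyto_locations_path / name).touch()
    renamed = rename_location_files(cyto_locations_path)
    remaining = sorted(path.name for path in cyto_locations_path.iterdir())
    print(remaining)
```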

# Run this script in terminal to rename all files in the /inputs/locations/ directory for the Cytoplasm Project
bash 3.rename_cyto_locations.sh