Patient Pathways within JuliaHealth - Part 1 #124

Open
wants to merge 5 commits into
base: dev
239 changes: 239 additions & 0 deletions JuliaHealth-Blogs.md
@@ -0,0 +1,239 @@
# Exploring Patient Pathways within JuliaHealth

## Introduction
This post walks through how to extract treatment pathways of interest from a dataset stored in the OMOP CDM format.


## Required Packages

Here are the packages we will need for exploring patient pathways, grouped by primary use case:

- Interfacing with databases

* [`DBInterface.jl`](https://github.com/JuliaDatabases/DBInterface.jl) - Database interface definitions for Julia

* [`SQLite`](https://github.com/JuliaDatabases/SQLite.jl) - A Julia interface to the SQLite library

- Health analytics built specifically for working with OMOP CDM databases


* [`OMOPCDMCohortCreator`](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl) - Create cohorts from databases utilizing the OMOP CDM
Member
Strangely, this doesn't render correctly in the preview...

Member Author (@Jay-sanjay, Feb 22, 2024)
Exactly, the last two commits to this PR are only for that, but it is still not fixed.


- General data analytics tools

* [`DataFrames.jl`](https://github.com/JuliaData/DataFrames.jl) - In-memory tabular data in Julia

- Miscellaneous packages

* [`HealthSampleData.jl`](https://github.com/JuliaHealth/HealthSampleData.jl) - Sample health data for a variety of health formats and use cases

    * [`FunSQL`](https://github.com/MechanicalRabbit/FunSQL.jl) - Julia library for compositional construction of SQL queries

* `Base` - Default libraries built into Julia


1. The first step is to install all the required packages (from the Pkg REPL, which you enter by pressing `]`):

```
TUTORIAL> add DBInterface
TUTORIAL> add HealthSampleData
TUTORIAL> add OMOPCDMCohortCreator
TUTORIAL> add SQLite
TUTORIAL> add DataFrames
TUTORIAL> add FunSQL
```

2. Next, we need data to work with.

For this tutorial, we will work with data from [`Eunomia`](https://github.com/OHDSI/Eunomia) that is stored in a SQLite format. To install the data on your machine, execute the following code block and follow the prompts - you will need a stable internet connection for the download to complete:

```
import HealthSampleData: Eunomia

eunomia = Eunomia()
```


3. After you have finished your setup in Julia, we need to establish a connection to the Eunomia SQLite database that we will use for the rest of the tutorial:
```
import SQLite: DB

conn = DB(eunomia)
```

4. With Eunomia, the database's schema is simply called "main". We will use this to generate database connection details that will inform `OMOPCDMCohortCreator` about the type of queries we will write (i.e. SQLite) and the name of the database's schema. For this step, we will use `OMOPCDMCohortCreator`:

```
import OMOPCDMCohortCreator as occ

occ.GenerateDatabaseDetails(
    :sqlite,
    "main"
)
```

5. Then we will generate internal representations of each table found within Eunomia for `OMOPCDMCohortCreator` to use:

```
occ.GenerateTables(conn)
```

6. To keep things simple for this tutorial, we will characterize a group of patients with a certain condition (or conditions) across various attributes like race, age, and combinations thereof. We are going to do a miniature version of such a study, looking at patients with strep throat. For this, we will use the `condition_concept_id` `28060`, which you will need in order to get correct results.

```
strep_patients = occ.ConditionFilterPersonIDs(28060, conn)
```


7. The following helper functions are required. They will probably be usable directly via `occ` once the new version of [`OMOPCDMCohortCreator`](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl) is released.
Member
Hasn't this functionality been merged already? Can't we remove this section? Or did I miss something?

Member Author
Yes we can remove it now, as the new version for occ is here



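* Function to query the start date of each given drug exposure.
```
# Imports needed when running this outside of OMOPCDMCohortCreator
import DBInterface
using DataFrames: DataFrame
using FunSQL: From, Where, Select, Fun, Get, render

# NOTE: `drug_exposure` (the table definition) and `dialect` are not defined
# in this snippet; they are assumed to already be in scope.

# Build the SQL query string
function GetDrugExposureStartDate(
    drug_exposure_ids;
    tab = drug_exposure
)

    sql =
        From(tab) |>
        Where(Fun.in(Get.drug_exposure_id, drug_exposure_ids...)) |>
        Select(Get.drug_exposure_id, Get.drug_exposure_start_date) |>
        q -> render(q, dialect=dialect)

    return String(sql)

end

# Execute the query and return the result as a DataFrame
function GetDrugExposureStartDate(
    drug_exposure_ids,
    conn;
    tab = drug_exposure
)

    df = DBInterface.execute(conn, GetDrugExposureStartDate(drug_exposure_ids; tab=tab)) |> DataFrame

    return df
end
```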
* Function to query the end date of each given drug exposure.
```
# Build the SQL query string
function GetDrugExposureEndDate(
    drug_exposure_ids;
    tab = drug_exposure
)

    sql =
        From(tab) |>
        Where(Fun.in(Get.drug_exposure_id, drug_exposure_ids...)) |>
        Select(Get.drug_exposure_id, Get.drug_exposure_end_date) |>
        q -> render(q, dialect=dialect)

    return String(sql)

end

# Execute the query and return the result as a DataFrame
function GetDrugExposureEndDate(
    drug_exposure_ids,
    conn;
    tab = drug_exposure
)

    df = DBInterface.execute(conn, GetDrugExposureEndDate(drug_exposure_ids; tab=tab)) |> DataFrame

    return df
end
```

Now that we have both the dataset and the helper functions ready, we can start the pathways study.
Member
What are you trying to say here? Don't follow entirely -- is this where you are introducing what we are doing?


* To start, we need to get the drug exposure IDs (`drug_exposure_id`) corresponding to each patient with strep throat.
```
patient_drug_exposures = occ.GetDrugExposureIDs(strep_patients, conn)
```


* We also need the drug concept IDs (`drug_concept_id`) for those exposures.
```
pateints_drug_concept_id = occ.GetDrugConceptIDs(patient_drug_exposures, conn)
```



* Now that we have the drug exposure IDs for each patient, we need to get the start date and end date corresponding to each `drug_exposure_id`.

```
exposure_start_date = GetDrugExposureStartDate(patient_drug_exposures.drug_exposure_id, conn)


exposure_end_date = GetDrugExposureEndDate(patient_drug_exposures.drug_exposure_id, conn)
```


* Note that the dates returned here are Unix timestamps, which are hard to read, so we convert them to `DateTime` values. This can be done as follows:
```
exposure_start_date.drug_exposure_start_date = exposure_start_date.drug_exposure_start_date .|> unix2datetime

exposure_end_date.drug_exposure_end_date = exposure_end_date.drug_exposure_end_date .|> unix2datetime
```
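
To illustrate what the conversion above does, here is a minimal sketch using only the `Dates` standard library. The timestamp values and the variable names `raw_start_dates` and `start_dates` are made up for illustration and are not taken from Eunomia.

```julia
using Dates

# Hypothetical Unix timestamps (seconds since 1970-01-01T00:00:00)
raw_start_dates = [0, 86_400, 1_561_939_200]

# Broadcast `unix2datetime` over the vector, as in the step above
start_dates = raw_start_dates .|> unix2datetime

println(start_dates)
```

The broadcasted pipe `.|>` applies the function element by element, so the result is a vector of `DateTime` values with the same length as the input.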

* Next, to have everything in one place, we combine the tables into a single `DataFrame`:

```
combined_df = DataFrames.outerjoin(patient_drug_exposures, exposure_start_date, on = :drug_exposure_id)
combined_df = DataFrames.outerjoin(combined_df, exposure_end_date, on = :drug_exposure_id)
combined_df = DataFrames.outerjoin(combined_df, pateints_drug_concept_id, on = :drug_exposure_id, makeunique=true)
```
* Now we need to sort the `DataFrame` in ascending order of start date.
```
combined_df = sort!(combined_df, :drug_exposure_start_date)
```

* An important thing to notice here is that some rows look odd: their end date is earlier than their start date, as in the rows below:
Member
Hmm... This is an interesting question you have brought up here... Franklin can actually execute and render cells from Franklin markdown documents (like this one). I am leaning towards us actually being able to make this whole document computable (see what I mean here: https://franklinjl.org/code/#evaluating_code)

That way, the tables render properly and you don't have to worry about copying and pasting results all the time. What do you think Jay?

```
   Row │ person_id  drug_exposure_id  drug_exposure_start_date  drug_exposure_end_date    drug_concept_id  person_id_1
       │ Float64?   Float64?          Union{Missing, DateTime}  Union{Missing, DateTime}  Float64?         Float64?
109154 │ 484.0      22308.0           2019-07-01T00:00:00       1955-01-22T00:00:00       1.12708e6        1834.0
109155 │ 484.0      22308.0           2019-07-01T00:00:00       1955-01-22T00:00:00       1.12708e6        484.0
109156 │ 484.0      22308.0           2019-07-01T00:00:00       1955-01-22T00:00:00       4.02132e7        1834.0
109157 │ 484.0      22308.0           2019-07-01T00:00:00       1955-01-22T00:00:00       4.02132e7        484.0
```

* To address this issue within the dataset, we drop such rows from our pathways study, like this:
```
combined_df = combined_df[combined_df.drug_exposure_start_date .< combined_df.drug_exposure_end_date, :]
```
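
Because the joined columns have the type `Union{Missing, DateTime}`, the comparison can produce `missing` entries, and a mask containing `missing` generally cannot be used for indexing. The following is a minimal sketch of the same filter on plain vectors, using `coalesce` to treat undecidable comparisons as `false`. The sample dates and the names `starts`, `ends`, and `keep` are assumptions made for illustration.

```julia
using Dates

# Hypothetical start and end dates; the last end date is missing
starts = [DateTime(2019, 7, 1), DateTime(2001, 3, 5), DateTime(2010, 1, 1)]
ends = Union{Missing, DateTime}[DateTime(1955, 1, 22), DateTime(2001, 3, 12), missing]

# `.<` yields `missing` when either side is missing;
# `coalesce` turns those entries into `false` so the mask is purely Bool
keep = coalesce.(starts .< ends, false)

valid_starts = starts[keep]
valid_ends = ends[keep]

println(keep)
```

Note that the strict `<` also drops rows whose start and end dates are equal; `<=` would keep them.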

* As a final step in exploring pathways, we take a very naive approach to extracting the treatment pathways:
    1. Iterate through each unique `person_id`.
    2. Iterate through that patient's rows in `combined_df` and push the `drug_exposure_id` to the `pathways` list if the start date or the end date differs from that of the next row.


```
pathways_dict = Dict()

for person_id in unique(combined_df.person_id)
    my_patients = combined_df[combined_df.person_id .== person_id, :]
    pathways = []
    for i in 1:size(my_patients, 1)-1
        if (my_patients[i, :drug_exposure_start_date] != my_patients[i+1, :drug_exposure_start_date] ||
            my_patients[i, :drug_exposure_end_date] != my_patients[i+1, :drug_exposure_end_date])
            push!(pathways, my_patients[i, :drug_exposure_id])
        end
    end
    pathways_dict[person_id] = pathways
end
```
```
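
To make the rule in the loop above easier to follow, here is a minimal sketch of the same comparison logic on a small hand-written input, using only `Base`. The rows, the field names `start` and `stop`, and the function name `naive_pathways` are hypothetical and exist only for this illustration; integers stand in for dates.

```julia
# Hypothetical exposure rows, already sorted by start date
rows = [
    (person_id = 1, drug_exposure_id = 10, start = 1, stop = 5),
    (person_id = 1, drug_exposure_id = 10, start = 1, stop = 5),  # duplicate row, e.g. from a join
    (person_id = 1, drug_exposure_id = 11, start = 6, stop = 9),
    (person_id = 2, drug_exposure_id = 20, start = 2, stop = 3),
]

# Same rule as the loop above: keep row i when its dates differ from row i + 1
function naive_pathways(rows)
    pathways = Dict{Int, Vector{Int}}()
    for pid in unique(r.person_id for r in rows)
        mine = [r for r in rows if r.person_id == pid]
        path = Int[]
        for i in 1:length(mine)-1
            if mine[i].start != mine[i+1].start || mine[i].stop != mine[i+1].stop
                push!(path, mine[i].drug_exposure_id)
            end
        end
        pathways[pid] = path
    end
    return pathways
end

result = naive_pathways(rows)
println(result)
```

On this toy input, the duplicated row is collapsed into a single entry. The last row of each patient is never added, because the loop stops one row before the end, so a patient with a single row gets an empty pathway.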

* The pathways of interest are now stored in the dictionary `pathways_dict`, which looks like this:

```
Dict{Any, Any} with 1677 entries:
4986.0 => Any[59989.0, 59989.0, 59977.0, 59977.0, 59969.0, 59976.0, 59976.0, 59986.0, 59973.0, 59975.0, 59968.0, 59984.0, 59985.0, 59982.0, 59983.0, 59988.0, 59970.0]
4700.0 => Any[56635.0, 56632.0, 56632.0, 56633.0, 56639.0, 56638.0, 56637.0]
4576.0 => Any[55107.0, 55112.0, 55115.0, 55119.0, 55111.0, 55118.0, 55103.0, 55105.0, 55110.0, 55116.0, 55117.0, 55120.0]
1175.0 => Any[14056.0, 14052.0, 14057.0, 14048.0, 14046.0, 14055.0, 14053.0, 14044.0, 14054.0, 53705.0, 53705.0, 53684.0, 53684.0, 14051.0, 14058.0, 14050.0, 53703.0…
1144.0 => Any[52276.0, 13660.0, 13667.0, 13673.0, 13671.0, 13672.0, 13672.0, 52273.0, 52273.0, 13665.0, 13661.0, 13658.0, 13662.0, 52265.0, 52265.0, 13664.0, 13669.0…
719.0 => Any[32930.0, 32930.0, 8612.0, 8614.0, 8618.0, 8608.0, 8621.0, 8617.0, 8616.0, 8623.0, 32928.0, 8606.0, 8625.0, 8619.0, 8622.0, 32928.0, 8610.0]
3634.0 => Any[43874.0, 43875.0, 43872.0, 43871.0, 43878.0]
```
1 change: 1 addition & 0 deletions _layout/head.html
@@ -20,6 +20,7 @@
<li><a href="https://github.com/JuliaHealth">GitHub</a></li>
<li><a href="/related-organizations">Related Organizations</a></li>
<li><a href="/meeting-notes">Meeting Notes</a></li>
<li><a href="/JuliaHealth-Blogs">JuliaHealth Blogs</a></li>
Member
Can we rename the file to just "blogs".md? Also, this page should be more of an aggregator that has the following per blog post:

  1. Blog post title
  2. Blog post description
  3. Authors
  4. Date

</ul>
<img src="/assets/hamburger.svg" id="menu-icon">
</nav>