From 7c2a691186903731bba56c712506bc6cbe2ab998 Mon Sep 17 00:00:00 2001 From: Jay-sanjay <134289328+Jay-sanjay@users.noreply.github.com> Date: Fri, 2 Feb 2024 00:47:50 +0530 Subject: [PATCH 1/5] blog-1 added --- JuliaHealth-Blogs.md | 239 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 239 insertions(+) create mode 100644 JuliaHealth-Blogs.md diff --git a/JuliaHealth-Blogs.md b/JuliaHealth-Blogs.md new file mode 100644 index 0000000..00c0d34 --- /dev/null +++ b/JuliaHealth-Blogs.md @@ -0,0 +1,239 @@ +# Exploring Patient Pathways within JuliaHealth + +## Introduction +Here, we are going to have a walkthrough of how to filter out treatment pathways of interest for a given dataset present in CDM format. + + +## Required Packages + +Here are the packages we will need for exploring patient pathways grouped by primary use cases in this exploration: + +- Interfacing with databases + + * [`DBInterface.jl`](https://github.com/JuliaDatabases/DBInterface.jl) - Database interface definitions for Julia + + * [`SQLite`](https://github.com/JuliaDatabases/SQLite.jl) - A Julia interface to the SQLite library + +- Health analytics built specifically for working with OMOP CDM databases + + + * [`OMOPCDMCohortCreator.jl`](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl) - Create cohorts from databases utilizing the OMOP CDM + +- General data analytics tools + + * [`DataFrames.jl`](https://github.com/JuliaData/DataFrames.jl) - In-memory tabular data in Julia + +- Miscellaneous packages + + * [`HealthSampleData.jl`](https://github.com/JuliaHealth/HealthSampleData.jl) - Sample health data for a variety of health formats and use cases + + * [`FunSQL`](https://github.com/MechanicalRabbit/FunSQL.jl) - Julia library for compositional construction of SQL queries + + * `Base` - Default libraries built into Julia + + +1. 
First step is to import all the essential Packages: + +``` +TUTORIAL> add DBInterface +TUTORIAL> add HealthSampleData +TUTORIAL> add OMOPCDMCohortCreator +TUTORIAL> add SQLite +TUTORIAL> add DataFrames +TURORIAL> add FunSQL +``` + +2. Next we need Data to be worked on. + + For this tutorial, we will work with data from [`Eunomia`](https://github.com/OHDSI/Eunomia) that is stored in a SQLite format. To install the data on your machine, execute the following code block and follow the prompts - you will need a stable internet connection for the download to complete: + +``` +import HealthSampleData: Eunomia + +eunomia = Eunomia() +``` + + +3. After you have finished your set up in the Julia, we need to establish a connection to the Eunomia SQLite database that we will use for the rest of the tutorial: +``` +import SQLite: DB + +conn = DB(eunomia) +``` + +4. With Eunomia, the database's schema is simply called "main". We will use this to generate database connection details that will inform `OMOPCDMCohortCreator` about the type of queries we will write (i.e. SQLite) and the name of the database's schema. For this step, we will use `OMOPCDMCohortCreator`: + +``` +import OMOPCDMCohortCreator as occ + +occ.GenerateDatabaseDetails( + :sqlite, + "main" +) +``` + +5. Then will generate internal representations of each table found within Eunomia for OMOPCDMCohortCreator to use: + +``` +occ.GenerateTables(conn) +``` + +6. Now to make things easy for this tutorial we will characterize a group of patients with a certain condition (or conditions) across various attributes like race, age, and combinations thereof. We are going to do miniature version of such a study looking at patients with strep throat. For this, we will use the condition_concept_id: 2806028060 - this will be needed for you to get correct results. + +``` +strep_patients = occ.ConditionFilterPersonIDs(28060, conn) +``` + + +7. 
Now this are some of the required functions that probabily would be directly be useable via `occ` after the new version of [`OMOPCDMCohortCreator`](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl) is released. + + +* Function to query the drug's start date for a given drug. +``` +function GetDrugExposureStartDate( + drug_exposure_ids; + tab = drug_exposure +) + + sql = + From(tab) |> + Where(Fun.in(Get.drug_exposure_id, drug_exposure_ids...)) |> + Select(Get.drug_exposure_id, Get.drug_exposure_start_date) |> + q -> render(q, dialect=dialect) + + return String(sql) + +end + +function GetDrugExposureStartDate( + drug_exposure_ids, + conn; + tab = drug_exposure +) + + df = DBInterface.execute(conn, GetDrugExposureStartDate(drug_exposure_ids; tab=tab)) |> DataFrame + + return df +end +``` +* Function to query the drug's end date for a given drug. +``` +function GetDrugExposureEndDate( + drug_exposure_ids; + tab = drug_exposure +) + + sql = + From(tab) |> + Where(Fun.in(Get.drug_exposure_id, drug_exposure_ids...)) |> + Select(Get.drug_exposure_id, Get.drug_exposure_end_date) |> + q -> render(q, dialect=dialect) + + return String(sql) + +end + +function GetDrugExposureEndDate( + drug_exposure_ids, + conn; + tab = drug_exposure +) + + df = DBInterface.execute(conn, GetDrugExposureEndDate(drug_exposure_ids; tab=tab)) |> DataFrame + + return df +end + +``` + +#So now that we have the dataset to work with as well as all the functions to work with also ready, we from here can start work form PATHWAYS-STUDY. + +* To start with we need to get the `drug ids` corresponding to each of the patients with strep throat. 
+``` +patient_drug_exposures = occ.GetDrugExposureIDs(strep_patients, conn) +``` + + +* We would also require `drug concepts` +``` +pateints_drug_concept_id = occ.GetDrugConceptIDs(patient_drug_exposures, conn) +``` + + + +* Now that we have the `drug ids` corresponding to each patients we now need to get the `start date` and `end date` corresponding to each `drug ids` + +``` +exposure_start_date = GetDrugExposureStartDate(patient_drug_exposures.drug_exposure_id, conn) + + +exposure_end_date = GetDrugExposureEndDate(patient_drug_exposures.drug_exposure_id, conn) +``` + + +* A thing to notice here is that the dates here are in `unix` format, which preety annoying to understand so we need to convert it into `data-time` format. This can be done as follows +``` +exposure_start_date.drug_exposure_start_date = exposure_start_date.drug_exposure_start_date .|> unix2datetime + +exposure_end_date.drug_exposure_end_date = exposure_end_date.drug_exposure_end_date .|> unix2datetime +``` + +* Now to make the Dataframe look more appealing we try to combine the dataset like this: + +``` +combined_df = DataFrames.outerjoin(patient_drug_exposures, exposure_start_date, on = :drug_exposure_id) +combined_df = DataFrames.outerjoin(combined_df, exposure_end_date, on = :drug_exposure_id) +combined_df = DataFrames.outerjoin(combined_df, pateints_drug_concept_id, on = :drug_exposure_id, makeunique=true) +``` +* Now we need to sort the Dataframe in the ascending order of the dates. +``` +combined_df = sort!(combined_df, :drug_exposure_start_date) +``` + +* An important thing to notice here is that some start and end dates seems to be preety weird like below: +``` + Row │ person_id drug_exposure_id drug_exposure_start_date drug_exposure_end_date drug_concept_id person_id_1 + │ Float64? Float64? Union{Missing, DateTime} Union{Missing, DateTime} Float64? Float64? 
+ 109154 │ 484.0 22308.0 2019-07-01T00:00:00 1955-01-22T00:00:00 1.12708e6 1834.0 + 109155 │ 484.0 22308.0 2019-07-01T00:00:00 1955-01-22T00:00:00 1.12708e6 484.0 + 109156 │ 484.0 22308.0 2019-07-01T00:00:00 1955-01-22T00:00:00 4.02132e7 1834.0 + 109157 │ 484.0 22308.0 2019-07-01T00:00:00 1955-01-22T00:00:00 4.02132e7 484.0 +``` + +* So in-order to address this issue within the dataset, we will chop off such rows for our pathways study by doinf something like this: +``` +combined_df = combined_df[combined_df.drug_exposure_start_date .< combined_df.drug_exposure_end_date, :] +``` + +* Now as a final step to exploring pathways, we peform a very naive approach to get the treatment-pathways by: +1. Itterate through each of the `patients_id` +2. Itterate through the combined_df and push the drug_exposure_id to the drug_pathways list if the two consecutive start dates are different as well as the two consecutive end dates are different. + + +``` +pathways_dict = Dict() + +for person_id in unique(combined_df.person_id) + my_patients = combined_df[combined_df.person_id .== person_id, :] + pathways = [] + for i in 1:size(my_patients, 1)-1 + if ((my_patients[i, :drug_exposure_start_date] != my_patients[i+1, :drug_exposure_start_date] || my_patients[i, :drug_exposure_end_date] != my_patients[i+1, :drug_exposure_end_date])) + push!(pathways, my_patients[i, :drug_exposure_id]) + end + end + pathways_dict[person_id] = pathways +end +``` + +* Now the pathways of our interset are present in the dixtionary `pathways_dict`that would look like this: + +``` +Dict{Any, Any} with 1677 entries: + 4986.0 => Any[59989.0, 59989.0, 59977.0, 59977.0, 59969.0, 59976.0, 59976.0, 59986.0, 59973.0, 59975.0, 59968.0, 59984.0, 59985.0, 59982.0, 59983.0, 59988.0, 59970.0] + 4700.0 => Any[56635.0, 56632.0, 56632.0, 56633.0, 56639.0, 56638.0, 56637.0] + 4576.0 => Any[55107.0, 55112.0, 55115.0, 55119.0, 55111.0, 55118.0, 55103.0, 55105.0, 55110.0, 55116.0, 55117.0, 55120.0] + 1175.0 => Any[14056.0, 
14052.0, 14057.0, 14048.0, 14046.0, 14055.0, 14053.0, 14044.0, 14054.0, 53705.0, 53705.0, 53684.0, 53684.0, 14051.0, 14058.0, 14050.0, 53703.0… + 1144.0 => Any[52276.0, 13660.0, 13667.0, 13673.0, 13671.0, 13672.0, 13672.0, 52273.0, 52273.0, 13665.0, 13661.0, 13658.0, 13662.0, 52265.0, 52265.0, 13664.0, 13669.0… + 719.0 => Any[32930.0, 32930.0, 8612.0, 8614.0, 8618.0, 8608.0, 8621.0, 8617.0, 8616.0, 8623.0, 32928.0, 8606.0, 8625.0, 8619.0, 8622.0, 32928.0, 8610.0] + 3634.0 => Any[43874.0, 43875.0, 43872.0, 43871.0, 43878.0] +``` From 05f39e93ad0cf44418e9f12507884d7e47ce8198 Mon Sep 17 00:00:00 2001 From: Jay-sanjay <134289328+Jay-sanjay@users.noreply.github.com> Date: Fri, 2 Feb 2024 01:01:49 +0530 Subject: [PATCH 2/5] added --- _layout/head.html | 1 + 1 file changed, 1 insertion(+) diff --git a/_layout/head.html b/_layout/head.html index a79a03b..67b72b3 100644 --- a/_layout/head.html +++ b/_layout/head.html @@ -20,6 +20,7 @@
  • GitHub
  • Related Organizations
  • Meeting Notes
+  • JuliaHealth Blogs
From a25d082409d28b5342ea1e074c12a2600e5fd712 Mon Sep 17 00:00:00 2001 From: Jay-sanjay <134289328+Jay-sanjay@users.noreply.github.com> Date: Fri, 2 Feb 2024 10:51:58 +0530 Subject: [PATCH 3/5] fixes in the link --- JuliaHealth-Blogs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/JuliaHealth-Blogs.md b/JuliaHealth-Blogs.md index 00c0d34..d322bd4 100644 --- a/JuliaHealth-Blogs.md +++ b/JuliaHealth-Blogs.md @@ -17,7 +17,7 @@ Here are the packages we will need for exploring patient pathways grouped by pri - Health analytics built specifically for working with OMOP CDM databases - * [`OMOPCDMCohortCreator.jl`](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl) - Create cohorts from databases utilizing the OMOP CDM + * [`OMOPCDMCohortCreator`](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl) - Create cohorts from databases utilizing the OMOP CDM - General data analytics tools From 62cc172d264de1e62a47d92b8b82382103b17d57 Mon Sep 17 00:00:00 2001 From: Jay-sanjay <134289328+Jay-sanjay@users.noreply.github.com> Date: Sun, 25 Feb 2024 15:00:49 +0530 Subject: [PATCH 4/5] requried changes made --- .../Treatment-Pathways.md | 24 ++++++++++++------- 1 file changed, 16 insertions(+), 8 deletions(-) rename JuliaHealth-Blogs.md => blogs/Treatment-Pathways.md (76%) diff --git a/JuliaHealth-Blogs.md b/blogs/Treatment-Pathways.md similarity index 76% rename from JuliaHealth-Blogs.md rename to blogs/Treatment-Pathways.md index d322bd4..3a14c9b 100644 --- a/JuliaHealth-Blogs.md +++ b/blogs/Treatment-Pathways.md @@ -1,10 +1,16 @@ # Exploring Patient Pathways within JuliaHealth +## What is Patient Pathways ? +Patient pathways refer to the journey that patients with specific medical conditions undergo in terms of their treatment. This concept goes beyond simple drug uptake statistics and looks at the sequence of treatments patients receive over time, including first-line treatments and subsequent therapies. 
Understanding patient pathways is essential for analyzing treatment patterns, adherence to clinical guidelines, and the rational use of drugs. +To analyze patient pathways, one would typically use real-world data from sources such as electronic health records, claims data, and registries. However, barriers such as data interoperability and resource requirements have hindered the full utilization of real-world data for this purpose. + +So to address these challenges we want to introduce a set of tools to extract and analyze these patient pathways. This set of tools is based on the Observational Medical Outcomes Partnership (OMOP) common data model, which standardizes health care data to promote interoperability. + ## Introduction Here, we are going to have a walkthrough of how to filter out treatment pathways of interest for a given dataset present in CDM format. -## Required Packages +## Environment Set-Up and Packages 📝 Here are the packages we will need for exploring patient pathways grouped by primary use cases in this exploration: @@ -32,7 +38,7 @@ Here are the packages we will need for exploring patient pathways grouped by pri * `Base` - Default libraries built into Julia -1. First step is to import all the essential Packages: +## Adding the required Dependencies ``` TUTORIAL> add DBInterface @@ -43,7 +49,7 @@ TUTORIAL> add DataFrames TURORIAL> add FunSQL ``` -2. Next we need Data to be worked on. +## Data For this tutorial, we will work with data from [`Eunomia`](https://github.com/OHDSI/Eunomia) that is stored in a SQLite format. To install the data on your machine, execute the following code block and follow the prompts - you will need a stable internet connection for the download to complete: @@ -54,14 +60,15 @@ eunomia = Eunomia() ``` -3. 
After you have finished your set up in the Julia, we need to establish a connection to the Eunomia SQLite database that we will use for the rest of the tutorial: +## Connecting to the Eunomia Database 💾 +After you have finished your set up in the Julia, we need to establish a connection to the Eunomia SQLite database that we will use for the rest of the tutorial: ``` import SQLite: DB conn = DB(eunomia) ``` -4. With Eunomia, the database's schema is simply called "main". We will use this to generate database connection details that will inform `OMOPCDMCohortCreator` about the type of queries we will write (i.e. SQLite) and the name of the database's schema. For this step, we will use `OMOPCDMCohortCreator`: +With Eunomia, the database's schema is simply called "main". We will use this to generate database connection details that will inform `OMOPCDMCohortCreator` about the type of queries we will write (i.e. SQLite) and the name of the database's schema. For this step, we will use `OMOPCDMCohortCreator`: ``` import OMOPCDMCohortCreator as occ @@ -72,20 +79,21 @@ occ.GenerateDatabaseDetails( ) ``` -5. Then will generate internal representations of each table found within Eunomia for OMOPCDMCohortCreator to use: +Then will generate internal representations of each table found within Eunomia for OMOPCDMCohortCreator to use: ``` occ.GenerateTables(conn) ``` -6. Now to make things easy for this tutorial we will characterize a group of patients with a certain condition (or conditions) across various attributes like race, age, and combinations thereof. We are going to do miniature version of such a study looking at patients with strep throat. For this, we will use the condition_concept_id: 2806028060 - this will be needed for you to get correct results. 
+## Characterizing Patients Who Have Had Strep Throat 🤒 +Now to make things easy for this tutorial we will characterize a group of patients with a certain condition (or conditions) across various attributes like race, age, and combinations thereof. We are going to do miniature version of such a study looking at patients with strep throat. For this, we will use the ``condition_concept_id``: 28060 - this will be needed for you to get correct results. ``` strep_patients = occ.ConditionFilterPersonIDs(28060, conn) ``` -7. Now this are some of the required functions that probabily would be directly be useable via `occ` after the new version of [`OMOPCDMCohortCreator`](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl) is released. +**Now this are some of the required functions that probabily would be directly be useable via `occ` after the new version of [`OMOPCDMCohortCreator`](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl) is released.** * Function to query the drug's start date for a given drug. From cdaaa554d69e3928c7a653bc040fe91564bbffaf Mon Sep 17 00:00:00 2001 From: Jay-sanjay <134289328+Jay-sanjay@users.noreply.github.com> Date: Tue, 26 Mar 2024 00:29:37 +0530 Subject: [PATCH 5/5] Franklin docs style added --- blogs/Treatment-Pathways.md | 166 ++++++++++++++++-------------------- 1 file changed, 72 insertions(+), 94 deletions(-) diff --git a/blogs/Treatment-Pathways.md b/blogs/Treatment-Pathways.md index 3a14c9b..83bcfad 100644 --- a/blogs/Treatment-Pathways.md +++ b/blogs/Treatment-Pathways.md @@ -1,3 +1,4 @@ +@def title = "Patient pathways" # Exploring Patient Pathways within JuliaHealth ## What is Patient Pathways ? 
@@ -40,37 +41,43 @@ Here are the packages we will need for exploring patient pathways grouped by pri ## Adding the required Dependencies +```julia:./ex1 +using Pkg +Pkg.add(["DBInterface", "SQLite", "DataFrames", "FunSQL", "HealthSampleData", "OMOPCDMCohortCreator"]) +using DBInterface +using HealthSampleData +using OMOPCDMCohortCreator +using SQLite +using DataFrames +using FunSQL ``` -TUTORIAL> add DBInterface -TUTORIAL> add HealthSampleData -TUTORIAL> add OMOPCDMCohortCreator -TUTORIAL> add SQLite -TUTORIAL> add DataFrames -TURORIAL> add FunSQL -``` +\show{./ex1} ## Data For this tutorial, we will work with data from [`Eunomia`](https://github.com/OHDSI/Eunomia) that is stored in a SQLite format. To install the data on your machine, execute the following code block and follow the prompts - you will need a stable internet connection for the download to complete: -``` +```julia:./ex2 import HealthSampleData: Eunomia - eunomia = Eunomia() ``` +\show{./ex2} + ## Connecting to the Eunomia Database 💾 After you have finished your set up in the Julia, we need to establish a connection to the Eunomia SQLite database that we will use for the rest of the tutorial: -``` -import SQLite: DB +```julia:./ex3 +import SQLite: DB conn = DB(eunomia) ``` +\show{./ex3} + With Eunomia, the database's schema is simply called "main". We will use this to generate database connection details that will inform `OMOPCDMCohortCreator` about the type of queries we will write (i.e. SQLite) and the name of the database's schema. 
For this step, we will use `OMOPCDMCohortCreator`: -``` +```julia:./ex4 import OMOPCDMCohortCreator as occ occ.GenerateDatabaseDetails( @@ -79,146 +86,110 @@ occ.GenerateDatabaseDetails( ) ``` +\show{./ex4} + Then will generate internal representations of each table found within Eunomia for OMOPCDMCohortCreator to use: -``` +```julia:./ex5 occ.GenerateTables(conn) ``` +\show{./ex5} + ## Characterizing Patients Who Have Had Strep Throat 🤒 Now to make things easy for this tutorial we will characterize a group of patients with a certain condition (or conditions) across various attributes like race, age, and combinations thereof. We are going to do miniature version of such a study looking at patients with strep throat. For this, we will use the ``condition_concept_id``: 28060 - this will be needed for you to get correct results. -``` +```julia:./ex6 strep_patients = occ.ConditionFilterPersonIDs(28060, conn) +println(first(strep_patients, 10)) ``` +\show{./ex6} -**Now this are some of the required functions that probabily would be directly be useable via `occ` after the new version of [`OMOPCDMCohortCreator`](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl) is released.** - - -* Function to query the drug's start date for a given drug. -``` -function GetDrugExposureStartDate( - drug_exposure_ids; - tab = drug_exposure -) - - sql = - From(tab) |> - Where(Fun.in(Get.drug_exposure_id, drug_exposure_ids...)) |> - Select(Get.drug_exposure_id, Get.drug_exposure_start_date) |> - q -> render(q, dialect=dialect) - - return String(sql) - -end - -function GetDrugExposureStartDate( - drug_exposure_ids, - conn; - tab = drug_exposure -) - - df = DBInterface.execute(conn, GetDrugExposureStartDate(drug_exposure_ids; tab=tab)) |> DataFrame - - return df -end -``` -* Function to query the drug's end date for a given drug. 
-``` -function GetDrugExposureEndDate( - drug_exposure_ids; - tab = drug_exposure -) - - sql = - From(tab) |> - Where(Fun.in(Get.drug_exposure_id, drug_exposure_ids...)) |> - Select(Get.drug_exposure_id, Get.drug_exposure_end_date) |> - q -> render(q, dialect=dialect) - - return String(sql) - -end - -function GetDrugExposureEndDate( - drug_exposure_ids, - conn; - tab = drug_exposure -) - - df = DBInterface.execute(conn, GetDrugExposureEndDate(drug_exposure_ids; tab=tab)) |> DataFrame - return df -end -``` -#So now that we have the dataset to work with as well as all the functions to work with also ready, we from here can start work form PATHWAYS-STUDY. +## So now that we have the dataset to work with as well as all the functions to work with also ready, we from here can start work form PATHWAYS-STUDY. * To start with we need to get the `drug ids` corresponding to each of the patients with strep throat. -``` + +```julia:./ex7 patient_drug_exposures = occ.GetDrugExposureIDs(strep_patients, conn) +println(first(patient_drug_exposures, 10)) ``` +\show{./ex7} + * We would also require `drug concepts` -``` +```julia:./ex8 pateints_drug_concept_id = occ.GetDrugConceptIDs(patient_drug_exposures, conn) +println(first(pateints_drug_concept_id, 10)) ``` +\show{./ex8} + * Now that we have the `drug ids` corresponding to each patients we now need to get the `start date` and `end date` corresponding to each `drug ids` -``` -exposure_start_date = GetDrugExposureStartDate(patient_drug_exposures.drug_exposure_id, conn) +```julia:./ex9 +exposure_start_date = occ.GetDrugExposureStartDate(patient_drug_exposures.drug_exposure_id, conn) +println(first(exposure_start_date, 10)) -exposure_end_date = GetDrugExposureEndDate(patient_drug_exposures.drug_exposure_id, conn) +exposure_end_date = occ.GetDrugExposureEndDate(patient_drug_exposures.drug_exposure_id, conn) +println(first(exposure_end_date, 10)) ``` +\show{./ex9} * A thing to notice here is that the dates here are in `unix` format, 
which preety annoying to understand so we need to convert it into `data-time` format. This can be done as follows -``` +```julia:./ex10 exposure_start_date.drug_exposure_start_date = exposure_start_date.drug_exposure_start_date .|> unix2datetime exposure_end_date.drug_exposure_end_date = exposure_end_date.drug_exposure_end_date .|> unix2datetime + +println(first(exposure_end_date, 10)) ``` +\show{./ex10} * Now to make the Dataframe look more appealing we try to combine the dataset like this: -``` +```julia:./ex11 combined_df = DataFrames.outerjoin(patient_drug_exposures, exposure_start_date, on = :drug_exposure_id) combined_df = DataFrames.outerjoin(combined_df, exposure_end_date, on = :drug_exposure_id) combined_df = DataFrames.outerjoin(combined_df, pateints_drug_concept_id, on = :drug_exposure_id, makeunique=true) +println(first(combined_df, 10)) ``` +\show{./ex11} + + * Now we need to sort the Dataframe in the ascending order of the dates. -``` +```julia:./ex12 combined_df = sort!(combined_df, :drug_exposure_start_date) +println(first(combined_df, 10)) ``` -* An important thing to notice here is that some start and end dates seems to be preety weird like below: -``` - Row │ person_id drug_exposure_id drug_exposure_start_date drug_exposure_end_date drug_concept_id person_id_1 - │ Float64? Float64? Union{Missing, DateTime} Union{Missing, DateTime} Float64? Float64? - 109154 │ 484.0 22308.0 2019-07-01T00:00:00 1955-01-22T00:00:00 1.12708e6 1834.0 - 109155 │ 484.0 22308.0 2019-07-01T00:00:00 1955-01-22T00:00:00 1.12708e6 484.0 - 109156 │ 484.0 22308.0 2019-07-01T00:00:00 1955-01-22T00:00:00 4.02132e7 1834.0 - 109157 │ 484.0 22308.0 2019-07-01T00:00:00 1955-01-22T00:00:00 4.02132e7 484.0 -``` +\show{./ex12} -* So in-order to address this issue within the dataset, we will chop off such rows for our pathways study by doinf something like this: -``` +* An important thing to notice here is that some start and end dates seems to be preety weird. 
We need to make sure that the start date is less than the end date. + +* So in-order to address this issue within the dataset, we will chop off such rows for our pathways study by doing something like this: + +```julia:./ex13 combined_df = combined_df[combined_df.drug_exposure_start_date .< combined_df.drug_exposure_end_date, :] +println(first(combined_df, 10)) ``` +\show{./ex13} + * Now as a final step to exploring pathways, we peform a very naive approach to get the treatment-pathways by: 1. Itterate through each of the `patients_id` -2. Itterate through the combined_df and push the drug_exposure_id to the drug_pathways list if the two consecutive start dates are different as well as the two consecutive end dates are different. +2. Itterate through the combined_df and push the `drug_exposure_id` to the `drug_pathways` list if the two consecutive start dates are different as well as the two consecutive end dates are different. -``` +```julia:./ex14 pathways_dict = Dict() for person_id in unique(combined_df.person_id) @@ -232,9 +203,16 @@ for person_id in unique(combined_df.person_id) pathways_dict[person_id] = pathways end ``` +\show{./ex14} * Now the pathways of our interset are present in the dixtionary `pathways_dict`that would look like this: +```julia:./ex15 +println(first(pathways_dict, 10)) +``` + +\show{./ex15} + ``` Dict{Any, Any} with 1677 entries: 4986.0 => Any[59989.0, 59989.0, 59977.0, 59977.0, 59969.0, 59976.0, 59976.0, 59986.0, 59973.0, 59975.0, 59968.0, 59984.0, 59985.0, 59982.0, 59983.0, 59988.0, 59970.0]