diff --git a/GSoC.md b/GSoC.md
new file mode 100644
index 0000000..d631352
--- /dev/null
+++ b/GSoC.md
@@ -0,0 +1,45 @@
+@def title = "JuliaHealth - Google Summer of Code"
+
+This page lists our [Google Summer of Code (GSoC)](https://summerofcode.withgoogle.com) fellows and their experiences working across the JuliaHealth ecosystem.
+Students interested in becoming a GSoC fellow should review these past projects to get a sense of what we look for in projects that contribute to the JuliaHealth ecosystem.
+
+\toc
+
+# GSoC 2023
+
+## JuliaHealth's Tools for Patient-Level Predictions: Strengthening Capacity and Innovation
+
+**Student:** Fareeda Abdelazeez
+
+**Mentor:** Jacob Zelko
+
+[Project Proposal](https://docs.google.com/document/d/18-p6VG6MwvzFdyA45MvXyqxOLVEByFP6D_gff9-E1XE/edit#heading=h.zgq6k5hzq0t) <!-- TODO: add the PDF -->
+
+**Summary:** Working with the OMOP CDM (Observational Medical Outcomes Partnership Common Data Model) involves handling large datasets, which requires tools that efficiently extract the data needed for various analyses.
+The first part of the project focused on improving JuliaHealth's infrastructure by increasing the range of tools available to users.
+This involved enabling connections to various databases and building an understanding of how to work robustly with observational health data.
+The second goal was to leverage the capacity built in the previous phase to develop a comprehensive framework for patient-level prediction.
+This framework explored how to predict patient cohort outcomes with given treatments and was tested on the [MIMIC-III dataset](https://physionet.org/content/mimiciii/1.4/) after it had been converted to the OMOP CDM.
+
+
+
+**Fellowship core accomplishments:**
+
+The [PR](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl/pull/54) for OMOPCDMCohortCreator added new features:
+
+- Enriched OMOPCDMCohortCreator tools
+- Intensive tests for new functions
+- Updated the documentation
+
+The [PR](https://github.com/JuliaDatabases/DBConnector.jl/pull/13)
+for DBConnector:
+
+- Created documentation
+- Rewired the tools used to connect to SQLite, PostgreSQL, and MySQL
+- Created unit tests
+
+A Jupyter notebook shows the workflow for building a prediction model from OMOP CDM data using the packages developed during the program.
+You can find it in the tutorial section of the JuliaHealth website.
+For more details, this [blog post](https://medium.com/@fareedaabdelazeez/google-summer-of-code-2023-strengthening-healthcare-with-juliahealth-7b8fde5af9ec) summarizes what was achieved during the GSoC program. Check the Acknowledgments too!
+
+- Poster presentation at [JuliaCon 2023, _JuliaHealth's Tools for Patient-Level Predictions: Strengthening Capacity and Innovation_](/assets/JuliaCon-gsoc.pdf)
diff --git a/_assets/JuliaCon-gsoc.pdf b/_assets/JuliaCon-gsoc.pdf
new file mode 100644
index 0000000..ff8835d
Binary files /dev/null and b/_assets/JuliaCon-gsoc.pdf differ
diff --git a/_assets/JuliaHealth-Patient-level-prediction.html b/_assets/JuliaHealth-Patient-level-prediction.html
new file mode 100644
index 0000000..5ad673a
--- /dev/null
+++ b/_assets/JuliaHealth-Patient-level-prediction.html
@@ -0,0 +1,16901 @@
+
+
+
+ + +using SQLite
+using LibPQ
+
using DataFrames
+using DBInterface
+import OMOPCDMCohortCreator as occ
+# using OMOPCDMDatabaseConnector
+# OMOPCDMDatabaseConnector will eventually handle the database connection directly,
+# but it is still under maintenance.
+
+# Connect the basic way, directly through LibPQ.
+DBconn = DBInterface.connect(LibPQ.Connection,"**********************************")
+occ.GenerateDatabaseDetails(
+ :postgresql,
+ "omop"
+ )
+tables = occ.GenerateTables(DBconn)
+occ.GetDatabasePersonIDs(DBconn)
+
[ Info: Global database dialect set to: postgresql
+[ Info: Global schema set to: omop
+[ Info: measurement table generated internally
+[ Info: payer_plan_period table generated internally
+[ Info: location table generated internally
+[ Info: source_to_concept_map table generated internally
+[ Info: note_nlp table generated internally
+[ Info: visit_detail_assign table generated internally
+[ Info: visit_occurrence table generated internally
+[ Info: vocabulary table generated internally
+[ Info: procedure_occurrence table generated internally
+[ Info: relationship table generated internally
+[ Info: domain table generated internally
+[ Info: dose_era table generated internally
+[ Info: concept table generated internally
+[ Info: death table generated internally
+[ Info: metadata table generated internally
+[ Info: concept_class table generated internally
+[ Info: drug_era table generated internally
+[ Info: note table generated internally
+[ Info: specimen table generated internally
+[ Info: condition_occurrence table generated internally
+[ Info: concept_ancestor table generated internally
+[ Info: cohort table generated internally
+[ Info: fact_relationship table generated internally
+[ Info: drug_exposure table generated internally
+[ Info: person table generated internally
+[ Info: observation_period table generated internally
+[ Info: cost table generated internally
+[ Info: cohort_attribute table generated internally
+[ Info: observation table generated internally
+[ Info: condition_era table generated internally
+[ Info: concept_relationship table generated internally
+[ Info: provider table generated internally
+[ Info: concept_synonym table generated internally
+[ Info: attribute_definition table generated internally
+[ Info: cdm_source table generated internally
+[ Info: cohort_definition table generated internally
+[ Info: drug_strength table generated internally
+[ Info: visit_detail table generated internally
+[ Info: care_site table generated internally
+[ Info: device_exposure table generated internally
+
+
46520-element Vector{Int64}: + 622701440 + 622684030 + 622692774 + 622709475 + 622705072 + 622691611 + 622703768 + 622697129 + 622701153 + 622682262 + 622705465 + 622711774 + 622693042 + ⋮ + 622690452 + 622698813 + 622691630 + 622689444 + 622691888 + 622678894 + 622709120 + 622702234 + 622706201 + 622680998 + 622693599 + 622698890+
### Concept IDs for AFib and stroke, taken from OHDSI ATLAS
+
+Afib = [4199501,313217]
+stroke = [4164092,
+44784623,
+40480002,
+43530679,
+437544,
+43531622,
+4112018,
+374055,
+759831,
+442615,
+437540,
+4201094,
+4326561,
+4029497,
+380747,
+372924,
+316437,
+375557,
+376713,
+374384,
+441874,
+381316,
+381591,
+44782819,
+36712779,
+438873,
+438881,
+438270,
+434166,
+440537,
+4014781,
+372721,
+4159164,
+4153380,
+37109512,
+432346,
+43530687,
+40479575,
+4306943,
+441246,
+40481762,
+40484522,
+40484513,
+436277,
+437427,
+439190,
+439847,
+42873157,
+444197,
+444198,
+444196,
+433624,
+436526,
+40492969,
+434155,
+4310996,
+434056,
+40480938,
+40481842,
+378774,
+377254,
+444091,
+443790,
+443864,
+314667,
+436430,
+4162038,
+435378,
+433037,
+372654,
+443599,
+436519,
+260841,
+313272,
+313833,
+40480449,
+432923,
+4134162,
+441709,
+440244,
+378544,
+4318408,
+439040,
+433050,
+618759,
+4045745,
+373503,
+4154699,
+4136546,
+4017107,
+380423,
+434656,
+43531583]
+
93-element Vector{Int64}: + 4164092 + 44784623 + 40480002 + 43530679 + 437544 + 43531622 + 4112018 + 374055 + 759831 + 442615 + 437540 + 4201094 + 4326561 + ⋮ + 4318408 + 439040 + 433050 + 618759 + 4045745 + 373503 + 4154699 + 4136546 + 4017107 + 380423 + 434656 + 43531583+
### MIMIC-III alters patients' dates of birth for privacy reasons, so age is retrieved from the MIMIC data with this query
+## IMPORTANT NOTE: Adding 229896253 to each mimic_id is a special case for our database; in most cases you do not need to add it
+
+Mconn = LibPQ.Connection("*******************************************")
+LibPQ.execute(Mconn,"set search_path to mimiciii")
+
+### Something went wrong in our database, and each mimic_id now exceeds its corresponding person_id by 229896253
+
+age_mimic = LibPQ.execute(Mconn,"SELECT pat.subject_id, (pat.mimic_id) AS person_id,
+ CAST(CAST(EXTRACT(epoch FROM adm.admittime - pat.dob)/(60*60*24*365.242) AS numeric) AS integer) AS age
+FROM icustays ie
+INNER JOIN admissions adm
+ ON ie.hadm_id = adm.hadm_id
+INNER JOIN patients pat
+ ON ie.subject_id = pat.subject_id
+;") |> DataFrame
+age_mimic = unique(age_mimic, :person_id)
+sort(age_mimic, :person_id)
+
Row | subject_id | person_id | age |
---|---|---|---|
 | Int32? | Int32? | Int32? |
1 | 249 | 622672103 | 75 |
2 | 250 | 622672104 | 24 |
3 | 251 | 622672105 | 20 |
4 | 252 | 622672106 | 55 |
5 | 253 | 622672107 | 84 |
6 | 255 | 622672108 | 78 |
7 | 256 | 622672109 | 77 |
8 | 257 | 622672110 | 82 |
9 | 258 | 622672111 | 0 |
10 | 260 | 622672112 | 0 |
11 | 261 | 622672113 | 76 |
12 | 262 | 622672114 | 64 |
13 | 263 | 622672115 | 56 |
⋮ | ⋮ | ⋮ | ⋮ |
46465 | 44065 | 622718611 | 66 |
46466 | 44069 | 622718612 | 67 |
46467 | 44071 | 622718613 | 60 |
46468 | 44073 | 622718614 | 88 |
46469 | 44082 | 622718615 | 66 |
46470 | 44083 | 622718616 | 54 |
46471 | 44084 | 622718617 | 58 |
46472 | 44089 | 622718618 | 85 |
46473 | 44115 | 622718619 | 37 |
46474 | 44123 | 622718620 | 85 |
46475 | 44126 | 622718621 | 52 |
46476 | 44128 | 622718622 | 51 |
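The age column above comes from the arithmetic in the SQL query: the time elapsed between date of birth and admission, in seconds, divided by the number of seconds in a mean year of 365.242 days. As a minimal sketch of that conversion in Julia, the snippet below repeats it for one patient. The two timestamps are hypothetical values chosen only for illustration, and `round(Int, x)` is assumed to approximate PostgreSQL's numeric-to-integer cast (tie-breaking may differ).

```julia
using Dates

# Hypothetical timestamps, chosen only to illustrate the arithmetic.
dob = DateTime(1950, 6, 15)
admittime = DateTime(2020, 6, 15, 12, 0, 0)

# Same conversion as the SQL: elapsed seconds divided by seconds in a mean year.
seconds_per_year = 60 * 60 * 24 * 365.242
elapsed_seconds = Dates.value(admittime - dob) / 1000  # milliseconds -> seconds
age_years = elapsed_seconds / seconds_per_year

# Round to the nearest whole year, as the integer cast in the query does.
age = round(Int, age_years)
println(age)
```

Doing the division in floating point before rounding mirrors the query, which casts to `numeric` first and to `integer` last.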
### Getting the person_ids of patients with AFib and with stroke using OMOPCDMCohortCreator
+
# Flag patients with AFib and patients with stroke, then merge the two sets of flags.
+AFib_combined_df = occ.ConditionFilterPersonIDs(Afib, DBconn)
+AFib_combined_df[!, :has_AFib] .= 1
+stroke_combined_df = occ.ConditionFilterPersonIDs(stroke, DBconn)
+stroke_combined_df[!, :has_stroke] .= 1
+# The outer join keeps patients with either condition; a missing flag means the condition is absent, so it becomes 0.
+df = outerjoin(AFib_combined_df, stroke_combined_df, on = :person_id, matchmissing = :equal)
+df = coalesce.(df, 0)
+
Row | person_id | has_AFib | has_stroke |
---|---|---|---|
 | Int32 | Int64 | Int64 |
1 | 622707430 | 1 | 1 |
2 | 622716907 | 1 | 1 |
3 | 622704382 | 1 | 1 |
4 | 622695167 | 1 | 1 |
5 | 622679460 | 1 | 1 |
6 | 622673865 | 1 | 1 |
7 | 622690154 | 1 | 1 |
8 | 622679180 | 1 | 1 |
9 | 622678192 | 1 | 1 |
10 | 622702721 | 1 | 1 |
11 | 622693672 | 1 | 1 |
12 | 622697655 | 1 | 1 |
13 | 622713836 | 1 | 1 |
⋮ | ⋮ | ⋮ | ⋮ |
15126 | 622707415 | 0 | 1 |
15127 | 622700696 | 0 | 1 |
15128 | 622679790 | 0 | 1 |
15129 | 622687676 | 0 | 1 |
15130 | 622691527 | 0 | 1 |
15131 | 622717579 | 0 | 1 |
15132 | 622698960 | 0 | 1 |
15133 | 622702426 | 0 | 1 |
15134 | 622708972 | 0 | 1 |
15135 | 622701212 | 0 | 1 |
15136 | 622717243 | 0 | 1 |
15137 | 622690412 | 0 | 1 |
# Getting each patient's gender and race
+df = occ.GetPatientGender(df, DBconn)
+df = occ.GetPatientRace(df, DBconn)
+
Row | person_id | race_concept_id | gender_concept_id | has_AFib | has_stroke |
---|---|---|---|---|---|
 | Int32? | Int32? | Int32? | Int64? | Int64? |
1 | 622707430 | 8527 | 8507 | 1 | 1 |
2 | 622716907 | 8527 | 8507 | 1 | 1 |
3 | 622704382 | 8527 | 8507 | 1 | 1 |
4 | 622695167 | 8527 | 8532 | 1 | 1 |
5 | 622679460 | 4087921 | 8532 | 1 | 1 |
6 | 622673865 | 8527 | 8532 | 1 | 1 |
7 | 622690154 | 8527 | 8507 | 1 | 1 |
8 | 622679180 | 8527 | 8507 | 1 | 1 |
9 | 622678192 | 8527 | 8507 | 1 | 1 |
10 | 622702721 | 4188159 | 8507 | 1 | 1 |
11 | 622693672 | 4218674 | 8532 | 1 | 1 |
12 | 622697655 | 8527 | 8507 | 1 | 1 |
13 | 622713836 | 8527 | 8507 | 1 | 1 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
15126 | 622707415 | 8527 | 8507 | 0 | 1 |
15127 | 622700696 | 4087921 | 8532 | 0 | 1 |
15128 | 622679790 | 8527 | 8507 | 0 | 1 |
15129 | 622687676 | 4188159 | 8532 | 0 | 1 |
15130 | 622691527 | 8527 | 8532 | 0 | 1 |
15131 | 622717579 | 8527 | 8532 | 0 | 1 |
15132 | 622698960 | 8527 | 8507 | 0 | 1 |
15133 | 622702426 | 8515 | 8507 | 0 | 1 |
15134 | 622708972 | 8527 | 8507 | 0 | 1 |
15135 | 622701212 | 38003599 | 8532 | 0 | 1 |
15136 | 622717243 | 8527 | 8532 | 0 | 1 |
15137 | 622690412 | 8527 | 8532 | 0 | 1 |
# Joining the cohort table with the patients' ages
+df = leftjoin(df, age_mimic, on = [:person_id => :person_id], matchmissing = :equal, makeunique=true)
+sort(df, :person_id)
+
+ArgumentError: column :person_id not found in the left data frame
+
+Stacktrace:
+ [1] DataFrames.DataFrameJoiner(dfl::DataFrame, dfr::DataFrame, on::Vector{Pair{Symbol, Symbol}}, matchmissing::Symbol, kind::Symbol)
+   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/join/composer.jl:54
+ [2] _join(df1::DataFrame, df2::DataFrame; on::Vector{Pair{Symbol, Symbol}}, kind::Symbol, makeunique::Bool, indicator::Nothing, validate::Tuple{Bool, Bool}, left_rename::typeof(identity), right_rename::typeof(identity), matchmissing::Symbol, order::Symbol)
+   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/join/composer.jl:504
+ [3] #leftjoin#673
+   @ ~/.julia/packages/DataFrames/58MUJ/src/join/composer.jl:940 [inlined]
+ [4] top-level scope
+   @ In[9]:2
+
#df.age .= ifelse.(df.age .> 89, 90, df.age)
+df
+
Row | person_id | race_concept_id | gender_concept_id | has_AFib | has_stroke | subject_id | age |
---|---|---|---|---|---|---|---|
 | Int32 | Int32 | Int32 | Int64 | Int64 | Int32 | Int32 |
1 | 622672122 | 4218674 | 8507 | 0 | 1 | 270 | 80 |
2 | 622672559 | 38003599 | 8507 | 0 | 1 | 274 | 66 |
3 | 622672560 | 8527 | 8507 | 0 | 1 | 275 | 82 |
4 | 622672566 | 8527 | 8532 | 0 | 1 | 282 | 74 |
5 | 622672568 | 8527 | 8532 | 1 | 0 | 284 | 87 |
6 | 622672570 | 8527 | 8532 | 1 | 0 | 286 | 85 |
7 | 622672573 | 4218674 | 8507 | 1 | 0 | 290 | 74 |
8 | 622672587 | 38003599 | 8532 | 1 | 1 | 304 | 300 |
9 | 622672588 | 8527 | 8532 | 1 | 1 | 305 | 73 |
10 | 622672589 | 8527 | 8532 | 0 | 1 | 306 | 61 |
11 | 622672590 | 8527 | 8532 | 0 | 1 | 307 | 75 |
12 | 622672595 | 4218674 | 8532 | 0 | 1 | 313 | 79 |
13 | 622672603 | 4218674 | 8532 | 1 | 0 | 321 | 75 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
15116 | 622713243 | 4218674 | 8532 | 1 | 0 | 94879 | 300 |
15117 | 622713245 | 8527 | 8507 | 1 | 0 | 94889 | 53 |
15118 | 622713246 | 8527 | 8507 | 1 | 1 | 94896 | 77 |
15119 | 622713249 | 8527 | 8507 | 1 | 0 | 94906 | 80 |
15120 | 622713254 | 8527 | 8507 | 1 | 0 | 94916 | 300 |
15121 | 622713255 | 8527 | 8507 | 1 | 0 | 94921 | 69 |
15122 | 622713256 | 8527 | 8532 | 0 | 1 | 94924 | 75 |
15123 | 622713257 | 38003599 | 8532 | 1 | 1 | 94926 | 87 |
15124 | 622713258 | 8527 | 8532 | 0 | 1 | 94932 | 47 |
15125 | 622713261 | 8527 | 8507 | 1 | 1 | 94942 | 80 |
15126 | 622713264 | 8527 | 8532 | 0 | 1 | 94953 | 53 |
15127 | 622713265 | 4218674 | 8532 | 1 | 0 | 94954 | 68 |
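The ages of 300 in the table above are consistent with MIMIC-III's de-identification, which shifts the date of birth of patients older than 89 so that they appear to be roughly 300 years old. The commented-out line in the cell above would cap such values at 90. The snippet below is a small sketch of that capping on a hypothetical vector of ages, not on the notebook's data.

```julia
# Hypothetical ages; 300 and 307 stand in for the shifted values of patients older than 89.
ages = [80, 66, 300, 73, 307, 89]

# Replace every age above 89 with 90 and leave the others unchanged.
capped = ifelse.(ages .> 89, 90, ages)
println(capped)
```

Capping keeps these patients in the cohort as the oldest group while stopping the artificial values from distorting summary statistics such as the mean age.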
[count(ismissing,col) for col in eachcol(df)]
+dropmissing!(df)
+[count(ismissing,col) for col in eachcol(df)]
+
5-element Vector{Int64}: + 0 + 0 + 0 + 0 + 0+
eltype.(eachcol(df))
+
7-element Vector{DataType}: + Int32 + Int32 + Int32 + Int64 + Int64 + Int32 + Int32+
# dropping person_id and subject_id columns because they won't be needed
+df = select(df, Not(:person_id))
+df = select(df, Not(:subject_id))
+
+ArgumentError: column name :subject_id not found in the data frame
+
+Stacktrace:
+ [1] lookupname
+   @ ~/.julia/packages/DataFrames/58MUJ/src/other/index.jl:413 [inlined]
+ [2] getindex
+   @ ~/.julia/packages/DataFrames/58MUJ/src/other/index.jl:422 [inlined]
+ [3] getindex
+   @ ~/.julia/packages/DataFrames/58MUJ/src/other/index.jl:227 [inlined]
+ [4] manipulate(df::DataFrame, c::InvertedIndex{Symbol}; copycols::Bool, keeprows::Bool, renamecols::Bool)
+   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/selection.jl:1836
+ [5] select(df::DataFrame, args::Any; copycols::Bool, renamecols::Bool, threads::Bool)
+   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/selection.jl:1299
+ [6] select(df::DataFrame, args::Any)
+   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/selection.jl:1299
+ [7] top-level scope
+   @ In[8]:3
+
eltype.(eachcol(df))
+df.age = convert.(Int64, df.age)
+df.race_concept_id = convert.(Int64, df.race_concept_id)
+df.gender_concept_id = convert.(Int64, df.gender_concept_id)
+df.has_AFib = convert.(Int64, df.has_AFib)
+df.has_stroke = convert.(Int64, df.has_stroke)
+df = select(df, Not(:gender_concept_id))
+df = select(df, Not(:race_concept_id))
+
+ArgumentError: column name :age not found in the data frame
+
+Stacktrace:
+ [1] lookupname
+   @ ~/.julia/packages/DataFrames/58MUJ/src/other/index.jl:413 [inlined]
+ [2] getindex
+   @ ~/.julia/packages/DataFrames/58MUJ/src/other/index.jl:422 [inlined]
+ [3] getindex(df::DataFrame, #unused#::typeof(!), col_ind::Symbol)
+   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:557
+ [4] getproperty(df::DataFrame, col_ind::Symbol)
+   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/abstractdataframe.jl:431
+ [5] top-level scope
+   @ In[10]:2
+
describe(df)
+
Row | variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|---|
 | Symbol | Float64 | Int64 | Float64 | Int64 | Int64 | DataType |
1 | has_AFib | 0.681766 | 0 | 1.0 | 1 | 0 | Int64 |
2 | has_stroke | 0.44256 | 0 | 0.0 | 1 | 0 | Int64 |
3 | age | 87.8847 | 0 | 74.0 | 307 | 0 | Int64 |
using Pkg
+using StatsBase
+countmap(df.has_stroke)
+
Dict{Int64, Int64} with 2 entries: + 0 => 2892 + 1 => 2296+
#using MLJ
+#df, df_test = partition(df, 0.7, rng=123, shuffle= true)
+
(3632×3 DataFrame + Row │ has_AFib has_stroke age + │ Int64 Int64 Int64 +──────┼───────────────────────────── + 1 │ 1 0 69 + 2 │ 1 0 67 + 3 │ 1 0 62 + 4 │ 0 1 49 + 5 │ 0 1 63 + 6 │ 1 0 44 + 7 │ 1 0 88 + 8 │ 1 0 83 + 9 │ 1 0 85 + 10 │ 1 0 79 + 11 │ 1 0 67 + ⋮ │ ⋮ ⋮ ⋮ + 3623 │ 1 0 80 + 3624 │ 0 1 81 + 3625 │ 1 1 77 + 3626 │ 0 1 79 + 3627 │ 1 0 73 + 3628 │ 1 0 76 + 3629 │ 1 0 71 + 3630 │ 1 0 88 + 3631 │ 1 0 80 + 3632 │ 1 0 84 + 3611 rows omitted, 1556×3 DataFrame + Row │ has_AFib has_stroke age + │ Int64 Int64 Int64 +──────┼───────────────────────────── + 1 │ 1 0 77 + 2 │ 1 1 300 + 3 │ 1 0 80 + 4 │ 1 0 84 + 5 │ 1 0 84 + 6 │ 1 0 89 + 7 │ 1 0 86 + 8 │ 0 1 60 + 9 │ 1 1 67 + 10 │ 1 0 83 + 11 │ 1 0 85 + ⋮ │ ⋮ ⋮ ⋮ + 1547 │ 1 0 59 + 1548 │ 1 1 71 + 1549 │ 1 0 73 + 1550 │ 1 0 59 + 1551 │ 1 0 300 + 1552 │ 1 1 53 + 1553 │ 1 1 58 + 1554 │ 0 1 74 + 1555 │ 1 0 59 + 1556 │ 1 0 85 + 1535 rows omitted)+
using MLJ
+using BetaML
+
+# Split the data into training and test sets
+#selected_features = [:gender_concept_id, :age, :race_concept_id, :has_AFib]
+selected_features = [:age, :has_AFib]
+
+X = select(df, selected_features)
+y = df.has_stroke
+train, test = partition(df, 0.7, rng=123, shuffle=true)
+train_X = select(train, selected_features)
+train_y = train.has_stroke
+test_X = select(test, selected_features)
+test_y = test.has_stroke
+
+models(matching(X, y))
+
3-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}: + (name = EvoTreeCount, package_name = EvoTrees, ... ) + (name = NeuralNetworkClassifier, package_name = BetaML, ... ) + (name = NeuralNetworkRegressor, package_name = BetaML, ... )+
Tree = @load EvoTreeClassifier pkg = EvoTrees
+using EvoTrees
+config = EvoTreeClassifier(
+ loss=:mse, # note: the fitted model below reports EvoTrees.MLogLoss, so :mse is not the loss actually used
+ nrounds=100,
+ max_depth=6,
+ nbins=32,
+ eta=0.1)
+
import EvoTrees ✔ ++
[ Info: For silent loading, specify `verbosity=0`. ++
EvoTreeClassifier( + nrounds = 100, + lambda = 0.0, + gamma = 0.0, + eta = 0.1, + max_depth = 6, + min_weight = 1.0, + rowsample = 1.0, + colsample = 1.0, + nbins = 32, + alpha = 0.5, + tree_type = "binary", + rng = Random.MersenneTwister(123))+
train_X = Matrix(train_X)
+m = fit_evotree(config; x_train = train_X, y_train = train_y)
+
┌ Info: EvoTreeClassifier{EvoTrees.MLogLoss} +│ - nrounds: 100 +│ - lambda: 0.0 +│ - gamma: 0.0 +│ - eta: 0.1 +│ - max_depth: 6 +│ - min_weight: 1.0 +│ - rowsample: 1.0 +│ - colsample: 1.0 +│ - nbins: 32 +│ - alpha: 0.5 +│ - tree_type: binary +└ - rng: Random.MersenneTwister(123, (0, 4008, 3006, 626)) ++
EvoTree{EvoTrees.MLogLoss, 2} + - Contains 101 trees in field `trees` (incl. 1 bias tree). + - Data input has 2 features. + - [:target_levels, :fnames, :feattypes, :edges, :featbins] info accessible in field `info` ++
test_X = Matrix(test_X)
+preds = MLJ.predict(m, test_X)
+
1090×2 Matrix{Float32}: + 0.800264 0.199736 + 0.814238 0.185762 + 1.36711f-5 0.999986 + 0.847743 0.152257 + 0.827981 0.172019 + 1.36711f-5 0.999986 + 0.741776 0.258224 + 0.822133 0.177867 + 1.36711f-5 0.999986 + 0.847743 0.152257 + 1.36711f-5 0.999986 + 0.800264 0.199736 + 1.36711f-5 0.999986 + ⋮ + 1.36711f-5 0.999986 + 0.797935 0.202065 + 0.735359 0.264641 + 0.735655 0.264345 + 0.741776 0.258224 + 0.860911 0.139089 + 0.908374 0.0916265 + 1.36711f-5 0.999986 + 0.908374 0.0916265 + 0.826605 0.173395 + 1.36711f-5 0.999986 + 1.36711f-5 0.999986+
features_gain = EvoTrees.importance(m)
+
2-element Vector{Pair{String, Float64}}: + "feat_2" => 0.9758581174651332 + "feat_1" => 0.024141882534866845+
using Plots
+plot(preds)
+
# Threshold the second column at 0.5, overwriting the probabilities with 0/1 labels.
+preds[:, 2] = ifelse.(preds[:, 2] .< 0.5, 0, 1);
+
preds
+
1090×2 Matrix{Float32}: + 0.800264 0.0 + 0.814238 0.0 + 1.36711f-5 1.0 + 0.847743 0.0 + 0.827981 0.0 + 1.36711f-5 1.0 + 0.741776 0.0 + 0.822133 0.0 + 1.36711f-5 1.0 + 0.847743 0.0 + 1.36711f-5 1.0 + 0.800264 0.0 + 1.36711f-5 1.0 + ⋮ + 1.36711f-5 1.0 + 0.797935 0.0 + 0.735359 0.0 + 0.735655 0.0 + 0.741776 0.0 + 0.860911 0.0 + 0.908374 0.0 + 1.36711f-5 1.0 + 0.908374 0.0 + 0.826605 0.0 + 1.36711f-5 1.0 + 1.36711f-5 1.0+
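pred = convert.(Int64, preds[:,2])
+# Note: preds[:,1] is the first class column, which is the probability of class 0 (no stroke) if the columns follow the sorted target levels.
+prediction_df = DataFrame(y_actual = test_y, y_predicted = pred, prob_predicted = preds[:,1]);
+prediction_df.correctly_classified = prediction_df.y_actual .== prediction_df.y_predicted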
+
1090-element BitVector: + 1 + 1 + 1 + 1 + 0 + 1 + 0 + 0 + 1 + 1 + 1 + 1 + 1 + ⋮ + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1+
accuracy = mean(prediction_df.correctly_classified)
+print("Accuracy of the model is : ",accuracy)
+
Accuracy of the model is : 0.865137614678899+
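The last few cells turn class probabilities into labels with a 0.5 threshold and then measure the fraction of correct predictions. A compact sketch of the same two steps is given below. The probability matrix and the labels are hypothetical, and the column order (class 0 first, class 1 second) is an assumption that matches how the notebook thresholds the second column. Unlike the notebook, this version leaves the probability matrix unmodified.

```julia
using Statistics

# Hypothetical class probabilities: column 1 = P(class 0), column 2 = P(class 1).
probs = Float32[0.80 0.20;
                0.10 0.90;
                0.45 0.55;
                0.70 0.30]
y_true = [0, 1, 0, 0]

# Threshold the class-1 probability at 0.5 without overwriting the matrix.
y_pred = Int.(probs[:, 2] .>= 0.5)

# Accuracy is the share of predictions that match the true labels.
accuracy = mean(y_pred .== y_true)
println("Accuracy: ", accuracy)
```

Keeping the probabilities intact means the class-1 column can still be stored alongside the labels, or reused with a different threshold.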