6 Step 3: Recurrence – High-Dimensional Propensity Scores

6.1 Generate recurrence covariates

(Schneeweiss et al. 2009)

In this step, we generate 3 binary recurrence covariates for each of the candidate empirical covariates identified in the previous step:

occurred at least once
occurred sporadically (at least as often as the median)
occurred frequently (at least as often as the 75th percentile)

# Generate the once / sporadic / frequent flags for each candidate code in out1
step2 <- get_recurrence_covariates(df = out1,
                                   patientIdVarname = "idx",
                                   eventCodeVarname = "icd10",
                                   patientIdVector = patientIds)

6.2 Example of recurrence covariates

Important

In this step, the algorithm analyzes the specific ICD-10-CM code D64.9 (Anemia, unspecified). This code was selected in Step 2 because it appeared frequently enough in the overall dataset (e.g., in at least 20 patients).

The Goal: The algorithm must convert the simple count of how many times a patient had an anemia diagnosis into binary (Yes/No) variables. This is necessary because the Bross formula used in the next step requires binary inputs.

For the anemia code (D64.9), the algorithm generates three distinct “Recurrence Covariates” for every patient. To determine if a patient gets a “Yes” (1) or “No” (0) for these variables, their personal frequency is compared against the population’s statistics (Median and 75th Percentile).

Did the patient have the anemia code at least once during the assessment window? (rec_dx_D64_once)
Did the patient have the anemia code at least as often as the population median? (rec_dx_D64_sporadic)
Did the patient have the anemia code at least as often as the population 75th percentile? (rec_dx_D64_frequent)

Output: Instead of simply recording that Patient A had “2 counts of anemia,” the hdPS algorithm assigns them three binary flags for this specific condition:

Once: yes/no
Sporadic: yes/no
Frequent: yes/no

This process creates a detailed profile of the intensity of the patient’s interaction with the healthcare system regarding anemia.
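The conversion from counts to flags can be sketched in a few lines of base R. This is only an illustration, not the package's internal code: the patient counts are made up, and two conventions are assumed here, namely that the median and 75th percentile are taken among patients who have the code at least once, and that a flag is set when the count is at or above the threshold.

```r
# Hypothetical per-patient counts of the 3-digit code D64 (illustration only)
counts <- data.frame(
  idx   = c("A", "B", "C", "D", "E"),
  n_D64 = c(2, 1, 0, 5, 1)
)

# Assumption: thresholds come from patients with at least one record of the code
with_code <- counts$n_D64[counts$n_D64 > 0]
med <- median(with_code)                          # 1.5 for these counts
q75 <- quantile(with_code, 0.75, names = FALSE)   # 2.75 with R's default method

# Assumption: a flag is 1 when the count is at or above the threshold
counts$rec_dx_D64_once     <- as.integer(counts$n_D64 >= 1)
counts$rec_dx_D64_sporadic <- as.integer(counts$n_D64 > 0 & counts$n_D64 >= med)
counts$rec_dx_D64_frequent <- as.integer(counts$n_D64 > 0 & counts$n_D64 >= q75)

counts
```

In this toy data, Patient A (2 counts) is flagged for once and sporadic but not frequent, while Patient D (5 counts) is flagged for all three.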

ICD-10-CM code (dimension 1)	Code appeared at least once	Code appeared at least as often as the median	Code appeared at least as often as the 75th percentile
D64.9 Anemia	rec_dx_D64_once	rec_dx_D64_sporadic	rec_dx_D64_frequent
D75.9P Blood clots	rec_dx_D75_once	rec_dx_D75_sporadic	rec_dx_D75_frequent
D89.9 Immune disorder	rec_dx_D89_once	rec_dx_D89_sporadic	rec_dx_D89_frequent
\(\ldots\)	\(\ldots\)	\(\ldots\)	\(\ldots\)
E07.9 Disorder of thyroid	rec_dx_E07_once	rec_dx_E07_sporadic	rec_dx_E07_frequent

Example of 3 binary covariates (hypothetical) created based on the candidate empirical covariates.

6.3 Recurrence covariates in the data

out2 <- step2$recurrence_data
ncol(out2)-1
#> [1] 91

Here we show binary recurrence covariates for only 2 columns

6.4 Refined recurrence covariates

The full list of recurrence covariates obtained in our data is shown below.

ICD-10 Recurrence Data
1	rec_dx_A49_once	rec_dx_B00_once	rec_dx_B35_once
2	rec_dx_C50_once	rec_dx_D75_once	rec_dx_E03_once
3	rec_dx_E04_once	rec_dx_E07_once	rec_dx_E78_once
4	rec_dx_E87_once	rec_dx_F31_once	rec_dx_F31_frequent
5	rec_dx_F32_once	rec_dx_F39_once	rec_dx_F41_once
6	rec_dx_F43_once	rec_dx_F90_once	rec_dx_G25_once
7	rec_dx_G40_once	rec_dx_G40_frequent	rec_dx_G43_once
8	rec_dx_G47_once	rec_dx_H04_once	rec_dx_H40_once
9	rec_dx_H40_frequent	rec_dx_I10_once	rec_dx_I10_frequent
10	rec_dx_I20_once	rec_dx_I21_once	rec_dx_I48_once
11	rec_dx_I48_frequent	rec_dx_I49_once	rec_dx_I50_once
12	rec_dx_I50_frequent	rec_dx_I51_once	rec_dx_I63_once
13	rec_dx_J30_once	rec_dx_J42_once	rec_dx_J44_once
14	rec_dx_J44_frequent	rec_dx_J45_once	rec_dx_J45_frequent
15	rec_dx_K04_once	rec_dx_K08_once	rec_dx_K21_once
16	rec_dx_K25_once	rec_dx_K27_once	rec_dx_K30_once
17	rec_dx_K59_once	rec_dx_K92_once	rec_dx_L40_once
18	rec_dx_L70_once	rec_dx_M06_once	rec_dx_M06_frequent
19	rec_dx_M10_once	rec_dx_M13_once	rec_dx_M19_once
20	rec_dx_M1A_once	rec_dx_M25_once	rec_dx_M54_once
21	rec_dx_M62_once	rec_dx_M79_once	rec_dx_M81_once
22	rec_dx_N28_once	rec_dx_N32_once	rec_dx_N39_once
23	rec_dx_N40_once	rec_dx_N92_once	rec_dx_N94_once
24	rec_dx_N95_once	rec_dx_R00_once	rec_dx_R05_once
25	rec_dx_R06_once	rec_dx_R07_once	rec_dx_R09_once
26	rec_dx_R10_once	rec_dx_R11_once	rec_dx_R12_once
27	rec_dx_R25_once	rec_dx_R32_once	rec_dx_R35_once
28	rec_dx_R39_once	rec_dx_R41_once	rec_dx_R42_once
29	rec_dx_R51_once	rec_dx_R52_once	rec_dx_R60_once
30	rec_dx_R73_once	rec_dx_T14_once	rec_dx_T78_once
31	rec_dx_Z79_once
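To see how many covariates of each intensity level were kept, the names can be tallied by their suffix. The sketch below uses only a handful of names copied from the list above; with the real data one would presumably pass `names(out2)[-1]` instead, assuming the first column of `out2` is the patient identifier.

```r
# A few covariate names copied from the list above (not the full set)
covariate_names <- c("rec_dx_F31_once", "rec_dx_F31_frequent",
                     "rec_dx_G40_once", "rec_dx_G40_frequent",
                     "rec_dx_Z79_once")

# The intensity level is the text after the last underscore
intensity <- sub("^.*_", "", covariate_names)

# Count how many covariates fall in each intensity level
type_counts <- table(intensity)
type_counts
```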

Tip

Given that we had one dimension of proxy data, \(p=1\), at most \(n=200\) most prevalent codes (with the restriction that each code must appear in at least 20 patients), and \(3\) intensity levels, we could theoretically have at most \(p \times n \times 3 = 1 \times 200 \times 3 = 600\) recurrence covariates.

Based on all of the restrictions, we created 91 distinct recurrence covariates.
The merged data (analytic and proxies) size is now 7,585.

If 2 or all 3 recurrence covariates for a code are identical, only one distinct recurrence covariate is returned. This is why you do not see any sporadic recurrence covariate here.
Recurrence covariates are created for each patient. The same code can occur multiple times for a patient because we are working at 3-digit granularity: records from several more specific codes can fall within the same 3-digit ICD-10 code.
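The removal of identical covariates can be illustrated with a small base R sketch. The code `X00` and the flag values are hypothetical, and this is one simple way to drop duplicate columns, not necessarily how the package does it internally.

```r
# Hypothetical flags for one code: here "sporadic" is identical to "once"
flags <- data.frame(
  rec_dx_X00_once     = c(1L, 1L, 0L, 1L),
  rec_dx_X00_sporadic = c(1L, 1L, 0L, 1L),
  rec_dx_X00_frequent = c(0L, 1L, 0L, 0L)
)

# duplicated() on the list of columns marks any column that repeats an earlier one
distinct_flags <- flags[, !duplicated(as.list(flags)), drop = FALSE]

names(distinct_flags)
```

Only the once and frequent columns survive in this toy case, which mirrors the pattern in the list of covariates above.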