Data Exploration

To better understand the dataset, we can look at how many patients are in each arm and how many observations each patient contributes.

## Trial characteristics
n <- length(unique(simulated.data$id))
n1 <- length(unique(simulated.data$id[simulated.data$Z==1]))
n0 <- length(unique(simulated.data$id[simulated.data$Z==0]))
averageObs <- mean(table(simulated.data$id))

cat(n, "participants total", "\n",
    n1, "were randomized to the treatment arm", "\n",
    n0, "were randomized to the control arm.", "\n",
    "On average each patient has ", round(averageObs, 1), "observations")

## 2000 participants total 
##  1000 were randomized to the treatment arm 
##  1000 were randomized to the control arm. 
##  On average each patient has  58.8 observations
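
The mean alone does not show how follow-up length varies across participants. A minimal sketch of how the spread could be inspected is given here, run on a small hypothetical data frame `toy` that only mimics the long format of `simulated.data` (one row per participant per time point); the values in `toy` are invented for illustration.

```r
# Hypothetical stand-in for simulated.data: one row per participant per time point
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  Z  = c(1, 1, 1, 0, 0, 1, 1, 1, 1)
)

# Number of rows (observations) contributed by each participant
obs_per_id <- table(toy$id)

# Minimum, quartiles, mean and maximum of observations per participant
obs_summary <- summary(as.vector(obs_per_id))
print(obs_summary)

# Randomized arm of each participant, taken from their first row
arm_by_id <- tapply(toy$Z, toy$id, function(z) z[1])

# Mean number of observations per participant, separately by arm
mean_obs_by_arm <- tapply(as.vector(obs_per_id), as.vector(arm_by_id), mean)
print(mean_obs_by_arm)
```

Replacing `toy` with `simulated.data` should give the same kind of summary for the trial data, assuming `id` and `Z` are coded as in the code above.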

There are 1000 individuals in each treatment arm with approximately 59 observations per person. The proportion of trial participants who experience the event of interest in the trial dataset is:

## Event rate
e.rate <- sum(simulated.data$Y)/
  length(unique(simulated.data$id))
cat("Event rate is", round(e.rate*100,1), "%\n")

## Event rate is 11.7 %
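
The overall event rate can also be broken down by randomized arm. The sketch below is one way to do this, assuming `Y` is a 0/1 event indicator recorded on person-time rows; the data frame `toy` and its values are hypothetical and stand in for `simulated.data`.

```r
# Hypothetical stand-in for simulated.data (Y = 1 on the row where the event occurs)
toy <- data.frame(
  id = c(1, 1, 2, 2, 3, 3, 4, 4),
  Z  = c(1, 1, 1, 1, 0, 0, 0, 0),
  Y  = c(0, 1, 0, 0, 0, 0, 0, 0)
)

# Did each participant ever experience the event? (max of a 0/1 indicator)
ever_event <- tapply(toy$Y, toy$id, max)

# Randomized arm of each participant, taken from their first row
arm_by_id <- tapply(toy$Z, toy$id, function(z) z[1])

# Proportion of participants with the event, separately by arm
rate_by_arm <- tapply(as.vector(ever_event), as.vector(arm_by_id), mean)
print(round(rate_by_arm * 100, 1))
```

Taking the per-participant maximum means the result does not depend on `Y` being 1 on at most one row per person, which the `sum(Y)/n` calculation above implicitly relies on.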

Next, we can quantify non-adherence within the trial by calculating the proportion of individuals in each arm who had deviated from their assigned treatment by their last observation:

## Non-adherence rates by arm

## Person-time at risk of deviating in the treatment arm:
## rows where treatment at the previous time point (Alag1) was still 1
simulated.data.treated <- simulated.data[simulated.data$Z==1 & 
                                           simulated.data$Alag1==1,]
## Person-time at risk of deviating in the control arm:
## rows where treatment at the previous time point (Alag1) was still 0
simulated.data.control <- simulated.data[simulated.data$Z==0 & 
                                           simulated.data$Alag1==0,]

## Rows with t0 < 59 where treatment switches away from the assigned arm,
## divided by the number of participants randomized to that arm
pt1dev <- nrow(simulated.data.treated[simulated.data.treated$A==0 &
                                        simulated.data.treated$t0<59,])/n1
pt0dev <- nrow(simulated.data.control[simulated.data.control$A==1 &
                                        simulated.data.control$t0<59,])/n0

cat("In the treatment arm", pt1dev*100, "% were non-adherent by their last observation", 
    "\n", "In the control arm", pt0dev*100, 
    "% were non-adherent by their last observation")

## In the treatment arm 30.3 % were non-adherent by their last observation 
##  In the control arm 27.2 % were non-adherent by their last observation
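
As a cross-check on the row-based calculation above, non-adherence can also be computed per participant. The sketch below is not the calculation used in this document: it assumes that a participant counts as non-adherent if their received treatment `A` ever differs from their assigned arm `Z`, and it runs on a hypothetical data frame `toy` rather than on `simulated.data`. The two approaches can disagree, for example for a participant who never starts the assigned treatment.

```r
# Hypothetical stand-in for simulated.data: assigned arm Z and received treatment A
toy <- data.frame(
  id = rep(1:5, each = 3),
  Z  = c(1, 1, 1,  1, 1, 1,  0, 0, 0,  0, 0, 0,  0, 0, 0),
  A  = c(1, 1, 1,  1, 0, 0,  0, 0, 1,  0, 0, 0,  0, 0, 0)
)

# TRUE for a participant if received treatment ever differs from assignment
ever_deviated <- tapply(toy$A != toy$Z, toy$id, any)

# Randomized arm of each participant, taken from their first row
arm_by_id <- tapply(toy$Z, toy$id, function(z) z[1])

# Proportion of participants who ever deviated, separately by arm
nonadh_by_arm <- tapply(as.vector(ever_deviated), as.vector(arm_by_id), mean)
print(round(nonadh_by_arm * 100, 1))
```

In `toy`, participant 2 (treatment arm) and participant 3 (control arm) deviate, so one of two treated participants and one of three control participants are flagged.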

As the last part of our exploration, let's look at the covariates in this dataset and how they are distributed between the two treatment arms. In this example trial, there is one measured baseline covariate and two time-varying covariates. The estimation methods detailed in this document can be expanded to handle additional baseline or time-varying covariates by using the complete set of observed baseline covariates wherever \(B\) is used, and the complete set of observed time-varying covariates wherever \(L_{1}\) and \(L_{2}\) are used.

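library(dplyr)  # provides %>%, group_by(), summarize() and n()

## Covariate summaries over all person-time rows, by randomized arm
simulated.data %>% 
  group_by(Z) %>% 
  summarize(B_mean = mean(B),
            B_sd = sd(B),
            L1_mean = mean(L1),
            L1_sd = sd(L1),
            L2_prop = sum(L2)/n())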

## # A tibble: 2 × 6
##       Z B_mean  B_sd L1_mean L1_sd L2_prop
##   <int>  <dbl> <dbl>   <dbl> <dbl>   <dbl>
## 1     0  0.498 0.223    4.02  2.62   0.708
## 2     1  0.486 0.219    1.97  2.58   0.472

We can see that the baseline covariate is well balanced between the two treatment arms (mean of \(B\): 0.498 in the control arm versus 0.486 in the treatment arm), while the time-varying covariates are not (mean of \(L_{1}\): 4.02 versus 1.97; proportion with \(L_{2}=1\): 0.708 versus 0.472). This can occur when the time-varying covariates are associated with the receipt of treatment.
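
One way to put these comparisons on a common scale is the standardized mean difference (SMD), which is not part of the original analysis and is added here only as an illustration. The sketch below applies it to the summary values printed in the table above; the helper functions `smd_continuous` and `smd_binary` are hypothetical names defined here, not functions from any package.

```r
# Hypothetical helper: SMD for a continuous covariate from arm-level means and SDs
smd_continuous <- function(mean1, mean0, sd1, sd0) {
  (mean1 - mean0) / sqrt((sd1^2 + sd0^2) / 2)
}

# Hypothetical helper: SMD for a binary covariate from arm-level proportions
smd_binary <- function(p1, p0) {
  (p1 - p0) / sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
}

# Inputs copied from the summary table above (treatment arm first, control arm second)
smd_B  <- smd_continuous(mean1 = 0.486, mean0 = 0.498, sd1 = 0.219, sd0 = 0.223)
smd_L1 <- smd_continuous(mean1 = 1.97,  mean0 = 4.02,  sd1 = 2.58,  sd0 = 2.62)
smd_L2 <- smd_binary(p1 = 0.472, p0 = 0.708)

print(round(c(B = smd_B, L1 = smd_L1, L2 = smd_L2), 2))
```

With these inputs the SMD for \(B\) is about -0.05, while those for \(L_{1}\) and \(L_{2}\) are about -0.79 and -0.49. An absolute SMD below 0.1 is a commonly used rule of thumb for adequate balance, so the pattern matches the description above. Note that the inputs are summaries over person-time rows, not one value per participant.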