Data Exploration
To better understand the dataset, we can explore the distribution of patients and observations per patient.
## Trial characteristics
n <- length(unique(simulated.data$id))
n1 <- length(unique(simulated.data$id[simulated.data$Z==1]))
n0 <- length(unique(simulated.data$id[simulated.data$Z==0]))
averageObs <- mean(table(simulated.data$id))

cat(n, "participants total", "\n",
    n1, "were randomized to the treatment arm", "\n",
    n0, "were randomized to the control arm.", "\n",
    "On average each patient has ", round(averageObs, 1), "observations")
## 2000 participants total
## 1000 were randomized to the treatment arm
## 1000 were randomized to the control arm.
## On average each patient has 58.8 observations
There are 1000 individuals in each treatment arm with approximately 59 observations per person. The proportion of trial participants who experience the event of interest in the trial dataset is:
## Event rate
e.rate <- sum(simulated.data$Y)/
  length(unique(simulated.data$id))
cat("Event rate is", round(e.rate*100,1), "%\n")
## Event rate is 11.7 %
Next, we can quantify non-adherence within the trial by calculating what proportion of individuals became non-adherent by their last observation:
## Non-adherence rates by arm
## how many person-time deviated in the treatment arm?
simulated.data.treated <- simulated.data[simulated.data$Z==1 &
                                           simulated.data$Alag1==1,]
## how many person-time deviated in the control arm?
simulated.data.control <- simulated.data[simulated.data$Z==0 &
                                           simulated.data$Alag1==0,]

pt1dev = dim(simulated.data.treated[simulated.data.treated$A==0&
                                      simulated.data.treated$t0<59,])[1]/n1
pt0dev = dim(simulated.data.control[simulated.data.control$A==1&
                                      simulated.data.control$t0<59,])[1]/n0

cat("In the treatment arm", pt1dev*100, "% were non-adherent by their last observation",
"\n", "In the control arm", pt0dev*100,
"% were non-adherent by their last observation")
## In the treatment arm 30.3 % were non-adherent by their last observation
## In the control arm 27.2 % were non-adherent by their last observation
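The calculation above counts person-time rows in which a participant who was still following the assigned treatment at the previous time point departs from it. The same idea can be sketched at the person level. This is a minimal sketch, not the tutorial's own code: it assumes that A records the treatment actually received at each time point and Z the assigned arm, it uses a small made-up data frame named toy rather than simulated.data, and it counts a deviation at any time point (it does not apply the t0<59 restriction used above).

```r
# Toy long-format data: one row per person per time point.
# Column names (id, Z, A) mirror the trial dataset; the values are
# invented for illustration only.
toy <- data.frame(
  id = rep(1:4, each = 3),
  Z  = rep(c(1, 1, 0, 0), each = 3),
  A  = c(1, 1, 1,   # id 1: treatment arm, always follows assignment
         1, 0, 0,   # id 2: treatment arm, stops treatment
         0, 0, 0,   # id 3: control arm, always follows assignment
         0, 1, 1)   # id 4: control arm, starts treatment
)

# A person has deviated if treatment received ever differs from assignment
# (assumption: A is treatment received, Z is assigned arm).
ever_dev <- tapply(toy$A != toy$Z, toy$id, any)

# Assigned arm for each person (Z is constant within id).
arm <- tapply(toy$Z, toy$id, function(z) z[1])

# Proportion of participants who ever deviated, by arm.
dev_by_arm <- tapply(ever_dev, arm, mean)
dev_by_arm
```

If participants can return to their assigned treatment after deviating, a row-count approach may count the same person more than once, whereas the per-person flag counts each participant at most once.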
As the last part of our exploration, let’s look at the covariates in this dataset and how they are distributed between the two treatment arms. In this example trial, there is one measured baseline covariate and two time-varying covariates. The estimation methods detailed in this document can be extended to handle additional baseline or time-varying covariates by using the complete set of observed baseline covariates wherever \(B\) is used, and the complete set of observed time-varying covariates wherever \(L_{1}\) and \(L_{2}\) are used.
simulated.data %>%
  group_by(Z) %>%
  summarize(B_mean = mean(B),
            B_sd = sd(B),
            L1_mean = mean(L1),
            L1_sd = sd(L1),
            L2_prop = sum(L2)/n())
## # A tibble: 2 × 6
## Z B_mean B_sd L1_mean L1_sd L2_prop
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0.498 0.223 4.02 2.62 0.708
## 2 1 0.486 0.219 1.97 2.58 0.472
We can see the baseline covariate is well balanced between the two treatment arms while the time-varying covariates are not well balanced between the treatment arms. This can occur when the time-varying covariates are associated with the receipt of treatment.
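One common way to put a number on "well balanced" is the standardized mean difference (SMD): the difference in arm means divided by a pooled standard deviation. The sketch below is illustrative only and is not part of the original analysis. The helper smd() and the data frame toy are hypothetical names introduced here, and the toy values are made up; only the column names B, L1 and Z are taken from the trial dataset.

```r
# Toy data; column names mirror the trial dataset, values are invented.
toy <- data.frame(
  Z  = c(0, 0, 0, 0, 1, 1, 1, 1),
  B  = c(0.2, 0.4, 0.6, 0.8, 0.3, 0.4, 0.6, 0.7),
  L1 = c(3, 4, 5, 4, 1, 2, 3, 2)
)

# Hypothetical helper: standardized mean difference between arms.
# Pooled SD is the square root of the average of the two arm variances.
smd <- function(x, z) {
  m1 <- mean(x[z == 1])
  m0 <- mean(x[z == 0])
  s  <- sqrt((var(x[z == 1]) + var(x[z == 0])) / 2)
  (m1 - m0) / s
}

smd_B  <- smd(toy$B,  toy$Z)   # arm means are equal in the toy data
smd_L1 <- smd(toy$L1, toy$Z)   # arm means differ in the toy data
round(c(B = smd_B, L1 = smd_L1), 2)
```

Applying the same formula to the rounded summary statistics in the table above (which are computed over person-time rows, not persons) gives roughly -0.05 for B and roughly -0.79 for L1, consistent with the baseline covariate being balanced and the time-varying covariate L1 not.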