# (PART) Part 3: Advanced Topics {-}
# Dynamic and Optimal Individualized Treatment Regimes
_Ivana Malenica_
Based on the [`tmle3mopttx` `R` package](https://github.com/tlverse/tmle3mopttx)
by _Ivana Malenica, Jeremy Coyle, and Mark van der Laan_.
## Learning Objectives
1. Differentiate dynamic and optimal dynamic treatment interventions from static
interventions.
2. Explain the benefits and challenges associated with using optimal
individualized treatment regimes in practice.
3. Contrast the impact of implementing an optimal individualized treatment
regime in the population with the impact of implementing static and dynamic
treatment regimes in the population.
4. Estimate causal effects under optimal individualized treatment regimes with
the `tmle3mopttx` `R` package.
5. Assess the mean under optimal individualized treatment with resource
constraints.
6. Implement optimal individualized treatment rules based on sub-optimal
rules, or "simple" rules, and recognize the practical benefit of these rules.
7. Construct "realistic" optimal individualized treatment regimes that respect
real data and subject-matter knowledge limitations on interventions by
only considering interventions that are supported by the data.
8. Interpret the estimated optimal individualized treatment rule.
9. Measure variable importance as defined in terms of the optimal individualized
treatment interventions.
## Introduction to Optimal Individualized Interventions
Identifying which intervention will be effective for which patient based on
lifestyle, genetic and environmental factors is a common goal in precision
medicine. To put it in context, Abacavir and Tenofovir are commonly prescribed
as part of the antiretroviral therapy to Human Immunodeficiency Virus (HIV)
patients. However, not all individuals benefit from the two medications equally.
In particular, patients with renal dysfunction might further deteriorate if
prescribed Tenofovir, due to the high nephrotoxicity caused by the medication.
While Tenofovir is still a highly effective treatment option for HIV patients,
in order to maximize the patient's well-being, it would be beneficial to
prescribe Tenofovir only to individuals with healthy kidney function. As
another example, consider an HIV trial where our goal is to improve retention
in HIV care. In a randomized clinical trial, several interventions show
efficacy, including appointment reminders through text messages, small cash
incentives for on-time clinic visits, and peer health workers. Ideally, we want
to improve effectiveness by assigning each patient the intervention they are
most likely to benefit from, as well as improve efficiency by not allocating
resources to individuals who do not need them or would not benefit from them.
```{r, fig.cap="Dynamic Treatment Regime in a Clinical Setting", results="asis", echo=FALSE}
knitr::include_graphics(path = "img/png/DynamicA_Illustration.png")
```
Ideally, one administers the intervention to the individuals who will benefit
from it, instead of assigning treatment at the population level. But how do we know which
intervention works for which patient? This aim motivates a different type of
intervention, as opposed to the static exposures we described in previous chapters.
In particular, in this chapter we learn about dynamic or "individualized"
interventions that tailor the treatment decision based on the collected
covariates. Formally, dynamic treatments represent interventions that at each
treatment-decision stage are allowed to respond to the currently available
treatment and covariate history. A dynamic treatment rule can be thought of as
a rule where the input is the available set of collected covariates, and the
output is an individualized treatment for each patient
[@bembom2007realistic; @robins1986; @moodie2013].
In the statistics community such a treatment strategy is termed an
__individualized treatment regime__ (ITR), also known as the optimal
dynamic treatment rule, optimal treatment regime, optimal strategy,
and optimal policy [@murphy2003; @robins2004]. The (counterfactual)
population mean outcome under an ITR is the value of the ITR [@murphy2003; @robins2004].
Furthermore, suppose one wishes to maximize the population mean of an
outcome, where for each individual we have access to some set of measured
covariates. This means, for example, that we can learn for which individual
characteristics assigning treatment increases the probability of a beneficial
outcome. An ITR with the maximal value is referred to as an
optimal ITR or the __optimal individualized treatment__. Consequently, the value
of an optimal ITR is termed the optimal value, or the
__mean under the optimal individualized treatment__.
The problem of estimating the optimal individualized treatment has received much
attention in the statistics literature over the years, especially with the
advancement of precision medicine; see @murphy2003, @robins2004, @laber2012,
@kosorok2012, @moodie2013 and @robins2014 to name a few. However, much of the
early work depends on parametric assumptions. As such, even in a randomized
trial, the statistical inference for the optimal individualized treatment relies
on assumptions that are generally believed to be false, and can lead to biased
results.
In this chapter, we consider estimation of the mean outcome under the optimal
individualized treatment where the candidate rules are restricted to depend only
on a user-supplied subset of the baseline covariates. The estimation problem is
addressed in a statistical model for the data distribution that is
nonparametric, and at most places restrictions on the probability of a patient
receiving treatment given covariates (as in a randomized trial). As such, we
don't need to make any assumptions about the relationship of the outcome with
the treatment and covariates, or the relationship between the treatment and
covariates. Further, we provide a Targeted Maximum Likelihood Estimator for the
mean under the optimal individualized treatment that allows us to generate valid
inference for our parameter, without having any parametric assumptions.
In the following, we provide a brief overview of the methodology with a focus on
building intuition for the target parameter and its importance --- aided by
simulations, data examples, and software demonstrations. For more information on
the technical aspects of the algorithm, and for further practical advice, the
interested reader is invited to
additionally consult @vanderLaanLuedtke15, @luedtke2016super, @montoya2021optimal and
@montoya2021performance.
---
## Data Structure and Notation
Suppose we observe $n$ independent and identically distributed observations of
the form $O=(W,A,Y) \sim P_0$. We denote $A$ as a categorical treatment and $Y$
as the final outcome. In particular, we define $A \in \mathcal{A}$ where
$\mathcal{A} \equiv \{a_1, \cdots, a_{n_A} \}$ and $n_A = |\mathcal{A}|$, with
$n_A$ denoting the number of categories (possibly only two, for a binary setup).
Note that we treat $W$ as vector-valued, representing all of our collected
baseline covariates. Therefore, for a single random individual $i$, we have that
their observed data is $O_i$: with corresponding baseline covariates $W_i$,
treatment $A_i$, and final outcome $Y_i$. Let $O^n = \{O_i\}_{i=1}^n$ denote
$n$ observed samples. Then, we say that $O^n \sim P_0$, or that all
data was drawn from some true probability distribution $P_0$. Let $\mathcal{M}$
denote a statistical model for the probability distribution of the data that is
nonparametric, beyond possible knowledge of the treatment mechanism. In words, this
means that we make no assumptions on the relationship between variables, but might
be able to say something about the relationship of $A$ and $W$, as is the case in
a randomized trial. In general, the more we know, or are willing to assume, about the
experiment that produces the data, the smaller the model. The true data generating
distribution $P_0$ is part of the statistical model $\mathcal{M}$, and we write
$P_0 \in \mathcal{M}$. As in previous chapters, we denote $P_n$ as the empirical distribution
which gives each observation weight $1/n$.
We use the structural equation model (SEM) in order to define
the process that gives rise to the observed (endogenous) and not observed
(exogenous) variables, as described by @pearl2009causality. In particular, we
denote $U=(U_W,U_A,U_Y)$ as the exogenous random variables, drawn from $U \sim P_U$.
The endogenous variables, written as $O=(W,A,Y)$, correspond to the observed data.
We can define the relationships between variables with the following structural equations:
\begin{align}
W &= f_W(U_W) \\ A &= f_A(W, U_A) \\ Y &= f_Y(A, W, U_Y),
(\#eq:npsem-mopttx)
\end{align}
where the collection $f=(f_W,f_A,f_Y)$ denotes unspecified functions, beyond possible
knowledge of the treatment mechanism function, $f_A$. Note that
in the case of a randomized trial, we can write the above NPSEM as
\begin{align}
W &= f_W(U_W) \\ A &= U_A \\ Y &= f_Y(A, W, U_Y),
(\#eq:npsem-rt-mopttx)
\end{align}
where $U_A$ has a known distribution and is independent of $U_W$. We will discuss
this more in later sections on identifiability.
The likelihood of the data admits a factorization, implied by the time ordering
of $O$. We denote the true density of $O$ as $p_0$, corresponding to the
distribution $P_0$ and dominating measure $\mu$.
\begin{equation}
p_0(O) = p_{Y,0}(Y \mid A,W) p_{A,0}(A \mid W) p_{W,0}(W) =
q_{Y,0}(Y \mid A,W) g_{A,0}(A \mid W) q_{W,0}(W),
(\#eq:likelihood-factorization-mopttx)
\end{equation}
where $p_{Y,0}(Y|A,W)$ is the conditional density of $Y$ given $(A, W)$ with
respect to some dominating measure $\mu_Y$, $p_{A,0}$ is the conditional density
of $A$ given $W$ with respect to a counting measure $\mu_A$, and $p_{W,0}$ is
the density of $W$ with respect to dominating measure $\mu_W$. In order to
match relevant Targeted Learning literature, we also
write $P_{Y,0}(Y \mid A, W) = Q_{Y,0}(Y \mid A,W)$, $P_{A,0}(A \mid W) = g_0(A \mid W)$
and $P_{W,0}(W)=Q_{W,0}(W)$ as the corresponding conditional
distribution of $Y$ given $(A,W)$, treatment mechanism $A$ given $W$, and
distribution of baseline covariates. For notational simplicity, we additionally define
$\bar{Q}_{Y,0}(A,W) \equiv \E_0[Y \mid A,W]$ as the conditional expectation of
$Y$ given $(A,W)$.
Lastly, we define $V$ as a subset of the baseline covariates the optimal
individualized rule depends on, where $V \subseteq W$. Note that $V$ could be all of
$W$, or an empty set, depending on the subject matter knowledge. In particular,
a researcher might want to consider known effect modifiers available at the time
of treatment decision as possible $V$ covariates, or consider dynamic treatment
rules based on measurements that can be easily obtained in a clinical setting.
Defining $V$ as a more restrictive set of baseline covariates allows us to consider
possibly sub-optimal rules that are easier to estimate, and thereby allows for
statistical inference for the counterfactual mean outcome under the sub-optimal rule;
we will elaborate on this in later sections.
## Defining the Causal Effect of an Optimal Individualized Intervention
Consider dynamic treatment rules, denoted as $d$, in the set of all possible rules
$\mathcal{D}$. Then, in a point treatment setting, $d$ is a deterministic function
that takes as input $V$ and outputs a treatment decision where
$V \rightarrow d(V) \in \{a_1, \cdots, a_{n_A} \}$. We will use dynamic treatment rules,
and the corresponding treatment decision, to describe an intervention on the
treatment mechanism and the corresponding outcome under a dynamic treatment rule.
As mentioned in the previous section, causal effects are defined in terms of
hypothetical interventions on the SEM \@ref(eq:npsem-mopttx). For a given
rule $d$, our modified system then takes the following form:
\begin{align}
W &= f_W(U_W) \\ A &= d(V) \\ Y_{d(V)} &= f_Y(d(V), W, U_Y),
(\#eq:npsem-causal-mopttx)
\end{align}
where the dynamic treatment regime may be viewed as an intervention in which $A$
is set equal to a value based on a hypothetical regime $d(V)$. The counterfactual outcome
$Y_{d(V)}$ denotes the outcome for a patient had their treatment been assigned using the
dynamic rule $d(V)$, possibly contrary to the fact. Similarly, the counterfactual
outcomes had all patients been assigned treatment ($A=1$), or given control ($A=0$), are
written as $Y_1$ and $Y_0$. Finally, we denote the distribution of the counterfactual outcomes
as $P_{U,X}$, implied by the distribution of exogenous variables $U$ and structural
equations $f$. The set of all possible counterfactual distributions is encompassed
by the causal model $\mathcal{M}^F$, where $P_{U,X} \in \mathcal{M}^F$.
The goal of any causal analysis motivated by such dynamic interventions is to
estimate a parameter defined as the counterfactual mean of the outcome with
respect to the modified intervention distribution. That is, the subject's outcome
if, possibly contrary to fact, the subject received the treatment that would have
been assigned by rule $d(V)$. Equivalently, we ask the following causal question:
"What is the expected outcome had every subject received treatment according to
the (optimal) individualized treatment?" In order to estimate the optimal
individualized treatment, we solve the following optimization problem:
$$d_{opt}(V) \equiv \text{argmax}_{d(V) \in \mathcal{D}}
\E_{P_{U,X}}[Y_{d(V)}], $$
where the optimal individualized rule is the rule with the maximal value. We note that, in case
the problem at hand requires minimizing the mean of an outcome, our optimal individualized
rule will be the rule with the minimal value instead.
With that in mind, we can consider different
treatment rules, all in the set $\mathcal{D}$:
1. The true rule, $d_{0,\text{opt}}$, and the corresponding causal parameter
$\E_{U,X}[Y_{d_{0,\text{opt}}(V)}]$ denoting the expected outcome under the
true optimal treatment rule $d_{0,\text{opt}}(V)$.
2. The estimated rule, $d_{n,\text{opt}}$, and the corresponding causal parameter
$\E_{U,X}[Y_{d_{n,\text{opt}}(V)}]$ denoting the expected outcome under the
estimated optimal treatment rule $d_{n,\text{opt}}(V)$.
In this chapter, we will focus on the value under the estimated optimal rule $d_{n,\text{opt}}$,
a __data-adaptive parameter__. Note that its true value depends on the sample! Finally,
our causal target parameter of interest is the expected outcome under
the estimated optimal individualized rule:
$$\Psi_{d_{n, \text{opt}}(V)}(P_{U,X}) \coloneqq \E_{P_{U,X}}[Y_{d_{n,
\text{opt}}(V)}].$$
### Identification and Statistical Estimand
The optimal individualized rule, as well as the value of an optimal
individualized rule, are causal parameters based on the unobserved
counterfactuals. In order for the causal quantities to be estimated from the
observed data, they need to be identified with statistical parameters. This step
of the roadmap requires we make a few assumptions:
1. _Strong ignorability_: $A \indep Y_a \mid W$, for all $a \in \mathcal{A}$.
2. _Positivity (or overlap)_: $P_0(\min_{a \in \mathcal{A}} g_0(a \mid W) > 0)
= 1$.
Under the above assumptions, we can identify the causal target parameter
with observed data using the G-computation formula. The value of an individualized
rule can now be expressed as
$$\E_0[Y_{d_{n, \text{opt}}(V)}] = \E_{0,W}[\bar{Q}_{Y,0}(A=d_{n, \text{opt}}(V),W)],$$
which, under our assumptions, is interpreted as the mean outcome if,
possibly contrary to fact, treatment was assigned according to the optimal rule.
Finally, the statistical counterpart to the causal parameter of interest is
defined as
$$\psi_0 = \E_{0,W}[\bar{Q}_{Y,0}(A=d_{n,\text{opt}}(V),W)].$$
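To make the plug-in (G-computation) mapping concrete, here is a minimal sketch;
the objects `Qbar_n` (an estimate of $\bar{Q}_{Y,0}$), `d_n` (an estimated
rule), and the covariate data are hypothetical placeholders, not `tmle3mopttx`
API:
```{r gcomp-plugin-sketch-mopttx, eval=FALSE}
# Minimal sketch of the plug-in (G-computation) value of a given rule.
# Hypothetical inputs:
#   Qbar_n: function(a, W) returning estimates of E[Y | A = a, W]
#   d_n:    function(V) returning the rule's treatment decision per subject
#   W:      data.frame of baseline covariates; V a subset of its columns
a_rule <- d_n(V)                  # treatment assigned by the rule
psi_n <- mean(Qbar_n(a_rule, W))  # empirical mean over the covariate distribution
psi_n
```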
Inference for the optimal value has been shown to be difficult at exceptional
laws, defined as probability distributions that place positive probability on a
set of $W$ values for which the conditional expectation of $Y$ given $A$ and $W$
is constant in $a$; that is, all treatments are equally beneficial. Inference
is similarly difficult in finite samples if the treatment effect is very small
in all strata, even though valid asymptotic estimators exist in this setting.
With that in mind, we address the estimation problem under the assumption that a
non-exceptional law is in effect.
Many methods for learning the optimal rule from data have been developed
[@murphy2003; @robins2004; @laber2012; @kosorok2012; @moodie2013]. In this
chapter, we focus on the methods discussed in @luedtke2016super and
@vanderLaanLuedtke15. Note however, that `tmle3mopttx` also supports the widely
used Q-learning approach, where the optimal individualized rule is based on the
initial estimate of $\bar{Q}_{Y,0}(A,W)$ [@Sutton1998].
We follow the methodology outlined in @luedtke2016super and
@vanderLaanLuedtke15, where we learn the optimal ITR using Super Learner
[@vdl2007super], and estimate its value with cross-validated Targeted Minimum
Loss-based Estimation (CV-TMLE) [@cvtmle2010]. In great generality, we first
need to estimate the true individual treatment regime, $d_0(V)$, which
corresponds to the dynamic treatment rule that takes a subset of covariates
$V$ and assigns treatment to each individual based on their observed
covariates $v$. With the estimate of the true optimal ITR in hand, we can
estimate its corresponding value.
### Binary treatment
How do we estimate the optimal individualized treatment regime? In the case of a
binary treatment, a key quantity for optimal ITR is the __blip function__. One can
show that any optimal ITR assigns treatment to individuals falling in strata in
which the stratum-specific average treatment effect, the blip, is
positive and does not assign treatment to individuals for which this quantity is
negative. Therefore for a binary treatment, under causal assumptions, we define
the blip function as:
$$\bar{Q}_0(V) \equiv \E_0[Y_1-Y_0 \mid V] \equiv \E_0[\bar{Q}_{Y,0}(1,W) -
\bar{Q}_{Y,0}(0,W) \mid V],$$
or the average treatment effect within a stratum of $V$. Note that the
optimal individualized rule can now be derived as $d_{n,\text{opt}}(V) =
\mathbb{I}(\bar{Q}_{n}(V) > 0)$.
The package `tmle3mopttx` relies on using the Super Learner to estimate the blip
function. With that in mind, the loss function utilized for learning the optimal
individualized rule corresponds to conditional mean type losses. It is however worth
mentioning that @luedtke2016super present three different approaches for learning the optimal
rule. Namely, they focus on:
1. Super Learner of the blip function using the squared error loss,
2. Super Learner of $d_0$ using the weighted classification loss function,
3. Super Learner of $d_0$ that uses a library of candidate estimators that are
implied by estimators of the blip as well as estimators that directly go for
$d_0$ through weighted classification.
A benefit of relying on the blip function, as implemented in `tmle3mopttx`, is
that one can look at the distribution of the predicted blip values for a given
sample. Having an estimate of the blip allows one to identify patients in the
sample who benefit the most (or the least) from treatment. Additionally, the
blip-based approach allows for a straightforward extension to categorical
treatments, interpretable rules, and the OIT under resource constraints, where
only a percentage of the population can receive treatment
[@luedtke2016resource].
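For instance, given a hypothetical vector `blip_n` of blip predictions for the
sample (not an object created by the package), such a diagnostic might look
like:
```{r blip-dist-sketch-mopttx, eval=FALSE}
# Distribution of estimated blips (hypothetical vector blip_n):
hist(blip_n, main = "Estimated blip", xlab = "Predicted treatment effect")
# Indices of subjects estimated to benefit the most from treatment:
head(order(blip_n, decreasing = TRUE))
# For a binary treatment, the implied rule treats when the blip is positive:
table(as.numeric(blip_n > 0))
```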
Relying on the Targeted Maximum Likelihood (TML) estimator and the Super Learner
estimate of the blip function, we follow the steps below in order to obtain the
value of the ITR:
1. Estimate $\bar{Q}_{Y,0}(A,W)$ and $g_0(A \mid W)$ using `sl3`. We denote such
estimates as $\bar{Q}_{Y,n}(A,W)$ and $g_n(A \mid W)$.
2. Apply the doubly robust Augmented-Inverse Probability Weighted (A-IPW)
transform to our outcome (double-robust pseudo-outcome), where we define:
$$D_{\bar{Q}_Y,g,a}(O) \equiv \frac{\mathbb{I}(A=a)}{g(A \mid W)} (Y -
\bar{Q}_Y(A,W)) + \bar{Q}_Y(A=a,W).$$
Note that under the randomization and positivity assumptions we have that
$\E[D_{\bar{Q}_Y,g,a}(O) \mid V] = \E[Y_a \mid V]$. We emphasize the double
robust nature of the A-IPW transform --- consistency of $\E[Y_a \mid V]$ will depend
on correct estimation of either $\bar{Q}_{Y,0}(A,W)$ or $g_0(A \mid W)$. As
such, in a randomized trial, we are guaranteed a consistent estimate of $\E[Y_a \mid V]$
even if we get $\bar{Q}_{Y,0}(A,W)$ wrong! (A small sketch of this transform
follows the list below.) An alternative to the double-robust pseudo-outcome just
presented would be single-stage Q-learning, where an estimate of
$\bar{Q}_{Y,0}(A,W)$ is used to predict $\bar{Q}_{Y,n}(A=1,W)$ and
$\bar{Q}_{Y,n}(A=0,W)$. This provides an estimate of the blip function,
$\bar{Q}_{Y,n}(A=1,W) - \bar{Q}_{Y,n}(A=0,W)$, but relies on estimating
$\bar{Q}_{Y,0}(A,W)$ well.
Using the double-robust pseudo-outcome, we can define the following contrast:
$$D_{\bar{Q}_Y,g}(O) = D_{\bar{Q}_Y, g, a=1}(O) - D_{\bar{Q}_Y, g, a=0}(O).$$
We estimate the blip function, $\bar{Q}_{0,a}(V)$, by regressing
$D_{\bar{Q}_Y,g}(O)$ on $V$ using the specified `sl3` library of learners and an
appropriate loss function. Finally, we are ready for the final steps.
3. Our estimated rule corresponds to $d_{n,\text{opt}}(V) =
\text{argmax}_{a \in \mathcal{A}} \bar{Q}_{n,a}(V)$.
4. We obtain inference for the mean outcome under the estimated optimal rule
using CV-TMLE.
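To build intuition for step 2 above, here is a minimal sketch of the A-IPW
transform and its contrast for a binary treatment; the vectors `A`, `Y`, the
outcome-regression predictions `Q1_n`, `Q0_n`, and the propensity estimates
`g1_n` are assumed to already exist and are purely illustrative:
```{r aipw-contrast-sketch-mopttx, eval=FALSE}
# A-IPW transform (double-robust pseudo-outcome) for a binary treatment.
# Assumed inputs: A, Y (observed data), Q1_n = Qbar_n(1, W),
# Q0_n = Qbar_n(0, W), and g1_n = estimate of P(A = 1 | W).
D1 <- (A == 1) / g1_n * (Y - Q1_n) + Q1_n
D0 <- (A == 0) / (1 - g1_n) * (Y - Q0_n) + Q0_n
# Contrast regressed on V (e.g., via sl3) to estimate the blip function:
D_contrast <- D1 - D0
```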
### Categorical treatment
In line with the approach considered for binary treatment, we extend the blip
function to allow for categorical treatment. We denote such blip function
extensions as _pseudo-blips_, which are our new estimation targets in a
categorical setting. We define pseudo-blips as vector-valued entities where the
output for a given $V$ is a vector of length equal to the number of treatment
categories, $n_A$. As such, we define it as:
$$\bar{Q}_0^{pblip}(V) = \{\bar{Q}_{0,a}^{pblip}(V): a \in \mathcal{A} \}$$
We implement three different pseudo-blips in `tmle3mopttx`.
1. _Blip1_ corresponds to choosing a reference category of treatment, and
defining the blip for all other categories relative to the specified
reference. Hence we have that:
$$\bar{Q}_{0,a}^{pblip-ref}(V) \equiv \E_0[Y_a-Y_0 \mid V]$$ where $Y_0$ is
the specified reference category with $A=0$. Note that, for the case of
binary treatment, this strategy reduces to the approach described for the
binary setup.
2. The _Blip2_ approach corresponds to defining the blip relative to the average of
all categories. As such, we can define $\bar{Q}_{0,a}^{pblip-avg}(V)$ as:
$$\bar{Q}_{0,a}^{pblip-avg}(V) \equiv \E_0 [Y_a - \frac{1}{n_A} \sum_{a \in
\mathcal{A}} Y_a \mid V].$$
In the case where subject-matter knowledge regarding which reference category
to use is not available, blip2 might be a viable option.
3. _Blip3_ reflects an extension of Blip2, where the average is now a weighted
average:
$$\bar{Q}_{0,a}^{pblip-wavg}(V) \equiv \E_0 [ Y_a - \frac{1}{n_A} \sum_{a \in
\mathcal{A}} Y_{a} P(A=a \mid V) \mid V ].$$
Just like in the binary case, pseudo-blips are estimated by regressing contrasts
composed using the A-IPW transform on $V$.
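As a sketch of how such contrasts could be formed, suppose `D` is a hypothetical
$n \times n_A$ matrix whose column $a$ holds the A-IPW transform
$D_{\bar{Q}_Y,g,a}(O)$ for treatment level $a$:
```{r pseudo-blip-contrast-sketch-mopttx, eval=FALSE}
# Blip1: contrasts relative to a reference level (here, column 1):
D_blip1 <- D - D[, 1]
# Blip2: contrasts relative to the (unweighted) average over all levels:
D_blip2 <- D - rowMeans(D)
# Each column is then regressed on V with a multivariate sl3 learner.
```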
### Technical Note: Inference and data-adaptive parameter
In a randomized trial, statistical inference relies on the second-order
difference between the estimate of the optimal individualized treatment and the
optimal individualized treatment itself to be asymptotically negligible. This is
a reasonable condition if we consider rules that depend on a small number of
covariates, or if we are willing to make smoothness assumptions. Alternatively,
we can consider TMLEs and statistical inference for data-adaptive target
parameters defined in terms of an estimate of the optimal individualized
treatment. In particular, instead of trying to estimate the mean under the true
optimal individualized treatment, we aim to estimate the mean under the
estimated optimal individualized treatment. As such, we develop a
cross-validated TMLE approach that provides asymptotic inference under minimal
conditions for the mean under the estimate of the optimal individualized
treatment. In particular, considering the data-adaptive parameter allows us to
avoid the consistency and rate conditions for the fitted optimal rule, as
required for
asymptotic linearity of the TMLE of the mean under the actual, true optimal
rule. Practically, the estimated (data-adaptive) rule should be preferred, as
this possibly sub-optimal rule is the one implemented in the population.
### Technical Note: Why CV-TMLE?
As discussed in @vanderLaanLuedtke15, CV-TMLE is necessary as the
non-cross-validated TMLE is biased upward for the mean outcome under the rule,
and therefore overly optimistic. More generally however, using CV-TMLE allows us
more freedom in estimation and therefore greater data adaptivity, without
sacrificing inference.
## Interpreting the Causal Effect of an Optimal Individualized Intervention
In summary, the mean outcome under the optimal individualized treatment is a
counterfactual quantity of interest representing what the mean outcome would
have been if everybody, contrary to the fact, received treatment that optimized
their outcome. The optimal individualized treatment regime is a rule that
optimizes the mean outcome under the dynamic treatment, where the candidate
rules are restricted to only respond to a user-supplied subset of the baseline
covariates. In essence, our target parameter answers the key
aim of precision medicine: allocating the available treatment by tailoring it to
the individual characteristics of the patient, with the goal of optimizing the
final outcome.
## Evaluating the Causal Effect of an OIT with Binary Treatment {#oit-eval-bin}
Finally, we demonstrate how to evaluate the mean outcome under the optimal
individualized treatment using `tmle3mopttx`. To start, let's load the packages
we'll use and set a seed:
```{r setup-mopttx}
library(data.table)
library(sl3)
library(tmle3)
library(tmle3mopttx)
library(devtools)
set.seed(111)
```
### Simulated Data
First, we load the simulated data. We will start with the more general setup
where the treatment is a binary variable; later in the chapter we will consider
another data-generating distribution where $A$ is categorical. In this example,
our data generating distribution is of the following form:
\begin{align*}
W &\sim \mathcal{N}(\bf{0},I_{3 \times 3})\\
\P(A=1 \mid W) &= \frac{1}{1+\exp(-0.8 W_1)}\\
\P(Y=1 \mid A,W) &= 0.5\text{logit}^{-1}[-5I(A=1)(W_1-0.5)+5I(A=0)(W_1-0.5)] +
0.5\text{logit}^{-1}(W_2W_3)
\end{align*}
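The corresponding dataset ships with `tmle3mopttx`, but for intuition, data from
this distribution could be simulated along the following lines (a sketch, not
the package's exact simulation code):
```{r sim-bin-dgp-sketch-mopttx, eval=FALSE}
# Sketch: simulate n draws from the data-generating distribution above
n <- 1000
W1 <- rnorm(n); W2 <- rnorm(n); W3 <- rnorm(n)
A <- rbinom(n, 1, plogis(0.8 * W1))  # P(A = 1 | W)
pY <- 0.5 * plogis(-5 * (A == 1) * (W1 - 0.5) +
                     5 * (A == 0) * (W1 - 0.5)) +
  0.5 * plogis(W2 * W3)
Y <- rbinom(n, 1, pY)
sim_data <- data.table(W1 = W1, W2 = W2, W3 = W3, A = A, Y = Y)
```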
```{r load sim_bin_data}
data("data_bin")
```
The above composes our observed data structure $O = (W, A, Y)$. Note that the
truth is $\psi=0.578$ for this data generating distribution.
To formally express this fact using the `tlverse` grammar introduced by the
`tmle3` package, we create a single data object and specify the functional
relationships between the nodes in the _directed acyclic graph_ (DAG) via
_structural equation models_ (SEMs), reflected in the node list
that we set up:
```{r data_nodes2-mopttx}
# organize data and nodes for tmle3
data <- data_bin
node_list <- list(
W = c("W1", "W2", "W3"),
A = "A",
Y = "Y"
)
```
We now have an observed data structure (`data`) and a specification of the role
that each variable in the dataset plays as the nodes in a DAG.
### Constructing Optimal Stacked Regressions with `sl3`
To easily incorporate ensemble machine learning into the estimation procedure,
we rely on the facilities provided in the [`sl3` R
package](https://tlverse.org/sl3). Using the framework provided by the [`sl3`
package](https://tlverse.org/sl3), the nuisance parameters of the TML estimator
may be fit with ensemble learning, using the cross-validation framework of the
Super Learner algorithm of @vdl2007super.
```{r mopttx_sl3_lrnrs2}
# Define sl3 library and metalearners:
lrn_xgboost_50 <- Lrnr_xgboost$new(nrounds = 50)
lrn_xgboost_100 <- Lrnr_xgboost$new(nrounds = 100)
lrn_xgboost_500 <- Lrnr_xgboost$new(nrounds = 500)
lrn_mean <- Lrnr_mean$new()
lrn_glm <- Lrnr_glm_fast$new()
lrn_lasso <- Lrnr_glmnet$new()
## Define the Q learner:
Q_learner <- Lrnr_sl$new(
learners = list(lrn_lasso, lrn_mean, lrn_glm),
metalearner = Lrnr_nnls$new()
)
## Define the g learner:
g_learner <- Lrnr_sl$new(
learners = list(lrn_lasso, lrn_glm),
metalearner = Lrnr_nnls$new()
)
## Define the B learner:
b_learner <- Lrnr_sl$new(
  learners = list(lrn_lasso, lrn_mean, lrn_glm),
metalearner = Lrnr_nnls$new()
)
```
As seen above, we generate three different ensemble learners that must be fit,
corresponding to the learners for the outcome regression (Q), propensity score
(g), and the blip function (B). We make the above explicit with respect to
standard notation by bundling the ensemble learners into a list object below:
```{r mopttx_make_lrnr_list}
# specify outcome and treatment regressions and create learner list
learner_list <- list(Y = Q_learner, A = g_learner, B = b_learner)
```
The `learner_list` object above specifies the role that each of the ensemble
learners we've generated is to play in computing initial estimators. Recall that
we need initial estimators of relevant parts of the likelihood in order to
build a TMLE for the parameter of interest. In particular, `learner_list`
makes explicit the fact that the `Y` learner is used in fitting the outcome
regression, the `A` learner in fitting the treatment mechanism regression, and
finally the `B` learner in fitting the blip function.
### Targeted Estimation of the Mean under the Optimal Individualized Interventions Effects
To start, we will initialize a specification for the TMLE of our parameter of
interest simply by calling `tmle3_mopttx_blip_revere`. We specify the argument
`V = c("W1", "W2", "W3")` when initializing the `tmle3_Spec` object in order to
communicate that we're interested in learning a rule dependent on `V`
covariates. Note that we don't have to specify `V` --- this will result in a rule
that is not based on any collected covariates; we will see an example like this
shortly. We also need to specify the type
of (pseudo) blip we will use in this estimation problem, the list of learners used
to estimate the blip function, whether we want to maximize or minimize the final
outcome, and a few other more advanced features, including searching for a less
complex rule, realistic interventions, and possible resource constraints.
```{r mopttx_spec_init_complex}
# initialize a tmle specification
tmle_spec <- tmle3_mopttx_blip_revere(
V = c("W1", "W2", "W3"), type = "blip1",
learners = learner_list,
maximize = TRUE, complex = TRUE,
realistic = FALSE, resource = 1
)
```
As seen above, the `tmle3_mopttx_blip_revere` specification object
(like all `tmle3_Spec` objects) does _not_ store the data for our
specific analysis of interest. Later,
we'll see that passing a data object directly to the `tmle3` wrapper function,
alongside the instantiated `tmle_spec`, will serve to construct a `tmle3_Task`
object internally.
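For intuition, the task-construction step the wrapper performs can also be
invoked by hand via the spec's `make_tmle_task` method (as we do in the
Q-learning section later in this chapter):
```{r mopttx-manual-task-sketch, eval=FALSE}
# What the tmle3 wrapper does internally (sketch):
tmle_task <- tmle_spec$make_tmle_task(data, node_list)
```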
We elaborate more on the initialization specifications. In initializing the
specification for the TMLE of our parameter of interest, we have specified the
set of covariates the rule depends on (`V`), the type of (pseudo) blip to use
(`type`), and the learners used for estimating the relevant parts of the
likelihood and the blip function. In addition, we need to specify whether we
want to maximize the mean outcome under the rule (`maximize`), and whether we
want to estimate the rule under all the covariates $V$ provided by the user
(`complex`). If `FALSE`, `tmle3mopttx` will instead consider all the possible
rules under a smaller set of covariates including the static rules, and optimize
the mean outcome over all the subsets of $V$. As such, while the user might have
provided a full set of collected covariates as input for $V$, it is possible
that the true rule only depends on a subset of the set provided by the user. In
that case, our returned mean under the optimal individualized rule will be based
on the smaller subset. In addition, we provide an option to search for realistic
optimal individualized interventions via the `realistic` specification. If
`TRUE`, only treatments supported by the data will be considered, therefore
alleviating concerns regarding practical positivity issues. Finally, we can
incorporate resource constraints by setting the `resource` argument to less than
1. We explore all the important extensions of `tmle3mopttx` in later sections.
```{r mopttx_fit_tmle_auto_blip_revere_complex}
# fit the TML estimator
fit <- tmle3(tmle_spec, data, node_list, learner_list)
fit
```
By studying the output generated, we can see that the confidence interval covers the
true parameter, as expected.
#### Resource constraint
We can restrict the number of individuals that receive treatment by treating
only $k$ percent of the sample. With that, only the patients with the greatest
benefit (according to the estimated blip) receive treatment. In order to impose
a resource constraint, we only have to specify the percent of individuals that
can get treatment. For example, if `resource=1`, all individuals with a blip
higher than zero will get treatment; if `resource=0`, no one will be treated.
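Conceptually, the constraint amounts to treating only subjects whose estimated
blip is both positive and above the quantile implied by the available resources;
a minimal sketch, assuming a hypothetical vector `blip_n` of blip predictions:
```{r resource-rule-sketch-mopttx, eval=FALSE}
# Sketch: treat at most the top `resource` fraction of subjects by blip
resource <- 0.9
cutoff <- quantile(blip_n, probs = 1 - resource)
rule_constrained <- as.numeric(blip_n > 0 & blip_n > cutoff)
table(rule_constrained)
```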
```{r mopttx_spec_init_complex_resource}
# initialize a tmle specification
tmle_spec_resource <- tmle3_mopttx_blip_revere(
V = c("W1", "W2", "W3"), type = "blip1",
learners = learner_list,
maximize = TRUE, complex = TRUE,
realistic = FALSE, resource = 0.90
)
```
```{r mopttx_fit_tmle_auto_blip_revere_complex_resource, eval=T}
# fit the TML estimator
fit_resource <- tmle3(tmle_spec_resource, data, node_list, learner_list)
fit_resource
```
We can compare the number of individuals that got treatment with and without the
resource constraint:
```{r mopttx_compare_resource}
# Number of individuals getting treatment (no resource constraint):
table(tmle_spec$return_rule)
# Number of individuals getting treatment (resource constraint):
table(tmle_spec_resource$return_rule)
```
#### Empty V
Below we show an example where $V$ is not specified, under the
resource constraint.
```{r mopttx_spec_init_complex_V_empty}
# initialize a tmle specification
tmle_spec_V_empty <- tmle3_mopttx_blip_revere(
type = "blip1",
learners = learner_list,
maximize = TRUE, complex = TRUE,
realistic = FALSE, resource = 0.90
)
```
```{r mopttx_fit_tmle_auto_blip_revere_complex_V_empty, eval=T}
# fit the TML estimator
fit_V_empty <- tmle3(tmle_spec_V_empty, data, node_list, learner_list)
fit_V_empty
```
## Evaluating the Causal Effect of an optimal ITR with Categorical Treatment {#oit-eval-cat}
In this section, we consider how to evaluate the mean outcome under the optimal
individualized treatment when $A$ has more than two categories. While the
procedure is analogous to the previously described binary treatment, we now need
to pay attention to the type of blip we define in the estimation stage, as well
as how we construct our learners.
### Simulated Data
First, we load the simulated data. Our data generating distribution is
of the following form:
\begin{align*}
W &\sim \mathcal{N}(\bf{0},I_{4 \times 4})\\
\P(A=a \mid W) &= \frac{1}{1+\exp(-0.8 W_1)}\\
\P(Y=1 \mid A,W) &= 0.5\text{logit}^{-1}[15I(A=1)(W_1-0.5) \\
&\qquad - 3I(A=2)(2W_1+0.5) + 3I(A=3)(3W_1-0.5)] + \text{logit}^{-1}(W_2W_1)
\end{align*}
We can just load the data available as part of the package as follows:
```{r load sim_cat_data}
data("data_cat_realistic")
```
The above composes our observed data structure $O = (W, A, Y)$. Note that the
truth is now $\psi_0=0.658$, which is the quantity we aim to estimate.
```{r data_nodes-mopttx}
# organize data and nodes for tmle3
data <- data_cat_realistic
node_list <- list(
W = c("W1", "W2", "W3", "W4"),
A = "A",
Y = "Y"
)
```
We can see the number of observed categories of treatment below:
```{r data_cats-mopttx}
# look at the observed treatment categories
table(data$A)
```
### Constructing Optimal Stacked Regressions with `sl3`
**QUESTION:** With categorical treatment, what is the dimension of the blip now?
What is the dimension for the current example? How would we go about estimating it?
We will now create new ensemble learners using the
`sl3` learners initialized previously:
```{r sl3_lrnrs-mopttx}
# Initialize few of the learners:
lrn_xgboost_50 <- Lrnr_xgboost$new(nrounds = 50)
lrn_xgboost_100 <- Lrnr_xgboost$new(nrounds = 100)
lrn_xgboost_500 <- Lrnr_xgboost$new(nrounds = 500)
lrn_mean <- Lrnr_mean$new()
lrn_glm <- Lrnr_glm_fast$new()
## Define the Q learner, which is just a regular learner:
Q_learner <- Lrnr_sl$new(
learners = list(lrn_xgboost_100, lrn_mean, lrn_glm),
metalearner = Lrnr_nnls$new()
)
# Define the g learner, which is a multinomial learner:
# specify the appropriate loss of the multinomial learner:
mn_metalearner <- make_learner(Lrnr_solnp,
eval_function = loss_loglik_multinomial,
learner_function = metalearner_linear_multinomial
)
g_learner <- make_learner(
  Lrnr_sl, list(lrn_xgboost_100, lrn_xgboost_500, lrn_mean), mn_metalearner
)
# Define the Blip learner, which is a multivariate learner:
learners <- list(lrn_xgboost_50, lrn_xgboost_100, lrn_xgboost_500, lrn_mean, lrn_glm)
b_learner <- create_mv_learners(learners = learners)
```
As seen above, we generate three different ensemble learners that must be fit,
corresponding to the learners for the outcome regression, propensity score, and
the blip function. Note that we need to estimate $g_0(A \mid W)$ for a
categorical $A$ --- therefore, we use the multinomial Super Learner option
available within the `sl3` package with learners that can address multi-class
classification problems. In order to see which learners can be used to estimate
$g_0(A \mid W)$ in `sl3`, we run the following:
```{r cat_learners}
# See which learners support multi-class classification:
sl3_list_learners(c("categorical"))
```
Since the corresponding blip will be vector-valued, we will have a
column for each additional level of treatment. As such, we need to create
multivariate learners with the helper function `create_mv_learners` that takes a
list of initialized learners as input.
We make the above explicit with respect to the standard notation by bundling the
ensemble learners into a list object below:
```{r make_lrnr_list-mopttx}
# specify outcome and treatment regressions and create learner list
learner_list <- list(Y = Q_learner, A = g_learner, B = b_learner)
```
### Targeted Estimation of the Mean under the Optimal Individualized Interventions Effects {#oit-eval-cat-v1}
```{r spec_init}
# initialize a tmle specification
tmle_spec_cat <- tmle3_mopttx_blip_revere(
V = c("W1", "W2", "W3", "W4"), type = "blip2",
learners = learner_list, maximize = TRUE, complex = TRUE,
realistic = FALSE
)
```
```{r fit_tmle_auto}
# fit the TML estimator
fit_cat <- tmle3(tmle_spec_cat, data, node_list, learner_list)
fit_cat
# How many individuals got assigned each treatment?
table(tmle_spec_cat$return_rule)
```
We can see that the confidence interval covers the truth.
**NOTICE the distribution of the assigned treatment! We will need this shortly.**
## Extensions to Causal Effect of an OIT
In this section, we consider two extensions to the procedure described for
estimating the value of the OIT. The first considers a setting where the user
might be interested in a grid of possible sub-optimal rules, corresponding to
potentially limited knowledge of potential effect modifiers. The second
extension concerns the implementation of realistic optimal individualized
interventions, where certain regimes might be preferred but, due to practical or
global positivity constraints, are not realistic to implement.
### Simpler Rules {#oit-eval-cat-v2}
In order to consider not only the most ambitious, fully $V$-optimal rule, we
define $S$-optimal rules as the optimal rule that considers all possible subsets
of $V$ covariates, with card($S$) $\leq$ card($V$) and $\emptyset \in S$. In
particular, this allows us to define a Super Learner for $d_0$ that includes
a range of estimators from very simple (e.g., static rules) to more complex
(e.g., the full $V$), and let the discrete Super Learner select a simple rule when
appropriate. This allows us to consider sub-optimal rules that are easier to estimate and
potentially provide more realistic rules. Within the `tmle3mopttx` paradigm, we just need
to change the `complex` parameter to `FALSE`:
```{r mopttx_spec_init_noncomplex}
# initialize a tmle specification
tmle_spec_cat_simple <- tmle3_mopttx_blip_revere(
V = c("W4", "W3", "W2", "W1"), type = "blip2",
learners = learner_list,
maximize = TRUE, complex = FALSE, realistic = FALSE
)
```
```{r mopttx_fit_tmle_auto_blip_revere_noncomplex}
# fit the TML estimator
fit_cat_simple <- tmle3(tmle_spec_cat_simple, data, node_list, learner_list)
fit_cat_simple
```
Even though we specified all baseline covariates as the basis
for rule estimation, a simpler rule is sufficient to maximize the mean outcome.
**QUESTION:** How does the set of covariates picked by `tmle3mopttx`
compare to the baseline covariates the true rule depends on?
### Realistic Optimal Individual Regimes {#oit-eval-cat-v3}
In addition to considering less complex rules, `tmle3mopttx` also provides an
option to estimate the mean under the realistic, or implementable, optimal
individualized treatment. It is often the case that assigning a particular
regime might fully maximize (or minimize) the desired outcome, but due to global
or practical positivity constraints, such treatment can never be implemented in
real life (or is highly unlikely to be). As such, by setting `realistic` to
`TRUE`, we consider possibly suboptimal treatments that optimize the outcome in
question while being supported by the data.
```{r mopttx_spec_init_realistic}
# initialize a tmle specification
tmle_spec_cat_realistic <- tmle3_mopttx_blip_revere(
V = c("W4", "W3", "W2", "W1"), type = "blip2",
learners = learner_list,
maximize = TRUE, complex = TRUE, realistic = TRUE
)
```
```{r mopttx_fit_tmle_auto_blip_revere_realistic}
# fit the TML estimator
fit_cat_realistic <- tmle3(tmle_spec_cat_realistic, data, node_list, learner_list)
fit_cat_realistic
# How many individuals got assigned each treatment?
table(tmle_spec_cat_realistic$return_rule)
```
**QUESTION:** Referring back to the data-generating distribution, why do you
think the distribution of allocated treatment changed from the distribution
we had under the "non-realistic" rule?
### Missingness and `tmle3mopttx`
In this section, we present how to use the `tmle3mopttx` package when the data
is subject to missingness in $Y$. Let's start by first adding some missingness
to our outcome.
```{r data_nodes-add-missingness-mopttx}
data_missing <- data_cat_realistic
# Add some random missingness:
rr <- sample(nrow(data_missing), 100, replace = FALSE)
data_missing[rr, "Y"] <- NA
summary(data_missing$Y)
```
To start, we must first add to our learner library --- we now also need to
estimate the missingness process.
```{r sl3_lrnrs-add-mopttx}
delta_learner <- Lrnr_sl$new(
learners = list(lrn_mean, lrn_glm),
metalearner = Lrnr_nnls$new()
)
# specify outcome and treatment regressions and create learner list
learner_list <- list(Y = Q_learner, A = g_learner, B = b_learner, delta_Y = delta_learner)
```
The `learner_list` object above specifies the role that each of the ensemble
learners we've generated is to play in computing the initial estimators needed
for building the TMLE for the parameter of interest. In particular, it makes
explicit the fact that the `Y` learner is used in fitting the outcome
regression, the `A` learner in fitting the treatment mechanism regression, the
`B` learner in fitting the blip function, and the `delta_Y` learner in fitting
the missing-outcome process.
Now, with the additional estimation step associated with missingness added, we can
proceed as usual.
```{r spec_init_missingness, eval = FALSE}
# initialize a tmle specification
tmle_spec_cat_miss <- tmle3_mopttx_blip_revere(
V = c("W1", "W2", "W3", "W4"), type = "blip2",
learners = learner_list, maximize = TRUE, complex = TRUE,
realistic = FALSE
)
```
```{r fit_tmle_auto2, eval = FALSE}
# fit the TML estimator
fit_cat_miss <- tmle3(tmle_spec_cat_miss, data_missing, node_list, learner_list)
fit_cat_miss
```
### Q-learning
Alternatively, we could estimate the mean under the optimal individualized
treatment using Q-learning. The optimal rule can be learned through fitting the
likelihood, and consequently estimating the optimal rule under this fit of the
likelihood [@Sutton1998; @murphy2003].
Below we outline how to use `tmle3mopttx` package in order to estimate the mean
under the ITR using Q-learning. As demonstrated in the previous sections, we
first need to initialize a specification for the TMLE of our parameter of
interest. As opposed to the previous section however, we will now use
`tmle3_mopttx_Q` instead of `tmle3_mopttx_blip_revere` in order to indicate that
we want to use Q-learning instead of TMLE.
```{r spec_init_Qlearning2, eval=FALSE}
# initialize a tmle specification
tmle_spec_Q <- tmle3_mopttx_Q(maximize = TRUE)
# Define data:
tmle_task <- tmle_spec_Q$make_tmle_task(data, node_list)
# Define likelihood:
initial_likelihood <- tmle_spec_Q$make_initial_likelihood(
tmle_task,
learner_list
)
# Estimate the parameter:
Q_learning(tmle_spec_Q, initial_likelihood, tmle_task)[1]
```
## Variable Importance Analysis with OIT
Suppose one wishes to assess the importance of each observed covariate, in
terms of maximizing (or minimizing) the population mean of an outcome under an
optimal individualized treatment regime. In particular, a covariate that
maximizes (or minimizes) the population mean outcome the most under an optimal
individualized treatment out of all other considered covariates under optimal
assignment might be considered _more important_ for the outcome. To put it in
context, perhaps optimal allocation of treatment 1, denoted $A_1$, results in a
larger mean outcome than optimal allocation of another treatment 2, denoted $A_2$.
Therefore, we would label $A_1$ as having a higher variable importance with
regard to maximizing (or minimizing) the mean outcome under the optimal
individualized treatment.
### Simulated Data
For illustration purposes, we bin the baseline covariates corresponding to
the data-generating distribution [described previously](#oit-eval-cat):
```{r data_vim-nodes-mopttx}
# bin baseline covariates to 3 categories:
data$W1 <- ifelse(data$W1 < quantile(data$W1)[2], 1,
  ifelse(data$W1 < quantile(data$W1)[3], 2, 3)
)
node_list <- list(
W = c("W3", "W4", "W2"),
A = c("W1", "A"),
Y = "Y"
)
```
Our node list now includes $W_1$ as a treatment as well! Don't worry,
we will still properly adjust for all baseline covariates.
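We can quickly confirm that the binning produced three categories of $W_1$:
```{r data_vim-check-mopttx}
# distribution of the binned W1 categories:
table(data$W1)
```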