From 506b9ea62231e501c3a0758aea5ab985487ce91f Mon Sep 17 00:00:00 2001
From: Jitao David Zhang <jitao_david.zhang@roche.com>
Date: Wed, 17 May 2023 16:00:43 +0200
Subject: [PATCH 01/24] update the basic examples, to prepare for adding the
 generative model

---
 vignettes/basic_examples.Rmd | 128 +++++++++++++++++++++++------------
 1 file changed, 85 insertions(+), 43 deletions(-)
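
A minimal sketch of the plate arithmetic used in the "Plate layout
requirements" section of the vignette, in base R only. The sample count of 60
is a hypothetical value chosen for illustration (the vignette derives it from
`nrow(samples)`), and `wells_per_plate` is a helper name assumed here; the
6 x 8 geometry is inferred from the corner coordinates used in `exclude_wells`.

```r
# Hypothetical number of samples, for illustration only
n_samp <- 60

# A 48-well plate laid out as 6 rows x 8 columns (assumed from the corner wells)
wells_per_plate <- 6 * 8

# The four corner wells stay empty, leaving 44 usable locations per plate
n_loc_per_plate <- wells_per_plate - 4

# Number of plates needed to hold all samples
n_plates <- ceiling(n_samp / n_loc_per_plate)

# One row per excluded corner well on every plate
exclude_wells <- expand.grid(plate = seq(n_plates), column = c(1, 8), row = c(1, 6))
```

With these assumed numbers, 60 samples need two plates, and `exclude_wells`
lists the four corners of each plate.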

diff --git a/vignettes/basic_examples.Rmd b/vignettes/basic_examples.Rmd
index 11a2d431..ca058f6a 100644
--- a/vignettes/basic_examples.Rmd
+++ b/vignettes/basic_examples.Rmd
@@ -1,14 +1,29 @@
 ---
-title: "Basic example"
-output: rmarkdown::html_vignette
+title: "Basic example of using designit: plate layout with two factors"
+output: 
+  rmarkdown::html_vignette:
+    html_document:
+    df_print: paged
+    mathjax: default
+    number_sections: true
+    toc: true
+    toc_depth: 2
 vignette: >
-  %\VignetteIndexEntry{Basic example}
+  %\VignetteIndexEntry{Basic example of using designit: plate layout with two factors}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
 
-```{r, include = FALSE}
+This vignette demonstrates the use of the _designit_ package with a series
+of examples deriving from the same task, namely to randomize samples of a
+two-factor experiment into plate layouts. We shall start with the most basic
+use and gradually explore some basic yet useful utilities provided
+by the package.
+
+```{r include=FALSE, message=FALSE, warning=FALSE}
 knitr::opts_chunk$set(
+  echo = TRUE,
+  fig.height = 6, fig.width = 6,
   collapse = TRUE,
   comment = "#>"
 )
@@ -21,14 +36,11 @@ library(dplyr)
 library(tidyr)
 ```
 
-# Plate layout with two factors
-
-## The samples
+# The samples and the conditions
 
-Samples of a 2-condition in-vivo experiment are to be
-placed on 48 well plates.
+Our task is to randomize samples of an in-vivo experiment with multiple conditions. Our aim is to place them in several 48-well plates.
 
-These are the conditions
+These are the conditions:
 
 ```{r}
 # conditions to use
@@ -44,10 +56,10 @@ conditions <- data.frame(
 gt::gt(conditions)
 ```
 
-We will have 3 animals per groups with 4 replicates each
+We will have 3 animals per group, with 4 replicates of each animal.
 
 ```{r}
-# sample table (2 animals per group with 3 replicates)
+# sample table
 n_reps <- 4
 n_animals <- 3
 animals <- bind_rows(replicate(n_animals, conditions, simplify = FALSE),
@@ -64,14 +76,16 @@ samples <- bind_rows(replicate(n_reps, animals, simplify = FALSE),
 
 samples |>
   head(10) |>
+  arrange(animal, group, replicate) |>
   gt::gt()
 ```
-## Plate layout requirements
+
+# Plate layout requirements
 
 Corner wells of the plates should be left empty.
 This means on a 48 well plate we can place 44 samples.
 Since we have `r nrow(samples)` samples, they will fit on
-`r ceiling(nrow(samples)/44)` plates
+`r ceiling(nrow(samples)/44)` plates.
 
 ```{r}
 n_samp <- nrow(samples)
@@ -81,9 +95,9 @@ n_plates <- ceiling(n_samp / n_loc_per_plate)
 exclude_wells <- expand.grid(plate = seq(n_plates), column = c(1, 8), row = c(1, 6))
 ```
 
-## Setting up a Batch container
+# Setting up a BatchContainer object
 
-Create a BatchContainer object that provides all possible locations
+First, we create a BatchContainer object that provides all possible locations.
 
 ```{r}
 bc <- BatchContainer$new(
@@ -97,9 +111,9 @@ bc$exclude
 bc$get_locations() |> head()
 ```
 
-## Moving samples
+# Moving samples
 
-Use random assignment function to place samples to plate locations
+Next, we use the random assignment function to place samples to plate locations.
 
 ```{r}
 bc <- assign_random(bc, samples)
@@ -108,7 +122,7 @@ bc$get_samples()
 bc$get_samples(remove_empty_locations = TRUE)
 ```
 
-Plot of the result using the `plot_plate` function
+To check the result visually, we can plot it using the `plot_plate` function.
 
 ```{r, fig.width=6, fig.height=3.5}
 plot_plate(bc,
@@ -116,7 +130,7 @@ plot_plate(bc,
   .color = treatment, .alpha = dose
 )
 ```
-To not show empty wells, we can directly plot the sample table as well
+To not show empty wells, we can directly plot the sample table as well.
 
 ```{r, fig.width=6, fig.height=3.5}
 plot_plate(bc$get_samples(remove_empty_locations = TRUE),
@@ -125,12 +139,11 @@ plot_plate(bc$get_samples(remove_empty_locations = TRUE),
 )
 ```
 
-To move individual samples or manually assigning all locations we can use the 
-`batchContainer$move_samples()` method
-
-To swap two or more samples use:
+Sometimes we may wish to move samples, or to swap samples, or to manually
+assign some locations. To move individual samples or to manually assign all
+locations, we can use the `batchContainer$move_samples()` method.
 
-**Warning**: This will change your BatchContainer in-place.
+To swap two or more samples, use:
 
 ```{r, fig.width=6, fig.height=3.5}
 bc$move_samples(src = c(1L, 2L), dst = c(2L, 1L))
@@ -143,9 +156,8 @@ plot_plate(bc$get_samples(remove_empty_locations = TRUE),
 
 To assign all samples in one go, use the option `location_assignment`.
 
-**Warning**: This will change your BatchContainer in-place.
-
 The example below orders samples by ID and adds the empty locations afterwards
+
 ```{r, fig.width=6, fig.height=3.5}
 bc$move_samples(
   location_assignment = c(
@@ -160,13 +172,17 @@ plot_plate(bc$get_samples(remove_empty_locations = TRUE, include_id = TRUE),
 )
 ```
 
-## Run an optimization
 
-The optimization procedure is invoked with e.g. `optimize_design`.
-Here we use a simple shuffling schedule: 
-swap 10 samples for 100 times, then swap 2 samples for 400 times.
+# Running an optimization
 
-To evaluate how good a layout is, we need a scoring function. 
+Once we have set up an initial layout, which may be suboptimal, we can optimize it in multiple ways, for instance by sample shuffling. The optimization procedure is invoked with e.g. `optimize_design`.
+Here we use a simple shuffling schedule: swap 10 samples 100 times, then swap 2 samples 400 times.
+
+In the context of randomization, a good layout means that known independent 
+variables and/or covariates that may affect the dependent variable(s) are
+as uncorrelated as possible with the layout. To evaluate how good a layout is, 
+we need a scoring function, which we pass to the
+`optimize_design` function.
 
 This function will assess how well treatment 
 and dose are balanced across the two plates.
@@ -208,15 +224,15 @@ ggplot(
   facet_wrap(~plate)
 ```
 
-## Customizing the plate layout
+# Customizing the plate layout
 
 To properly distinguish between empty and excluded locations one can do the
 following.
 
-* Supply the BatchContainer directly
-* set `add_excluded = TRUE`, set `rename_empty = TRUE`
-* supply a custom color palette
-* excluded wells have NA values and can be colored with `na.value`
+* Supply the BatchContainer directly;
+* set `add_excluded = TRUE` and set `rename_empty = TRUE`;
+* supply a custom color palette;
+* excluded wells have NA values and can be colored with `na.value`.
 
 ```{r, fig.width=6, fig.height=3.5}
 color_palette <- c(
@@ -232,8 +248,8 @@ plot_plate(bc,
   scale_fill_manual(values = color_palette, na.value = "darkgray")
 ```
 
-To remove all empty wells from the plot, hand the pruned sample list.
-to plot_plate rather than the whole BatchContainer.
+To remove all empty wells from the plot, hand the pruned sample list
+to `plot_plate` rather than the whole `BatchContainer` object.
 You can still assign your own colors.
 
 ```{r, fig.width=6, fig.height=3.5}
@@ -248,10 +264,36 @@ Note: removing all empty and excluded wells will lead to omitting
 completely empty rows or columns!
 
 ```{r, fig.width=6, fig.height=3.5}
-plot_plate(bc$get_samples(remove_empty_locations = TRUE) |>
-  filter(column != 2),
-plate = plate, column = column, row = row,
-.color = treatment, .alpha = dose
+plot_plate(
+  bc$get_samples(remove_empty_locations = TRUE) |>
+    filter(column != 2),
+  plate = plate, column = column, row = row,
+  .color = treatment, .alpha = dose
 ) +
   scale_fill_viridis_d()
 ```
+
+# Summary
+
+To summarize:
+
+1. In order to randomize the layout of samples from an experiment, create an 
+instance of `BatchContainer` with `BatchContainer$new()`. 
+2. Use functions `assign_random` and `plot_plate` to assign samples randomly
+and to plot the plate layout. If necessary, you can retrieve the samples from
+the BatchContainer instance `bc` with the method `bc$get_samples()`, or move
+samples with the method `bc$move_samples()`.
+3. The scoring function of `bc` can be set by `bc$scoring_f`. Once it is set,
+we can optimize the design, for instance by shuffling the samples.
+4. Various options are available to further customize the design.
+
+Now you have already the first experience of using _designit_ for randomization.
+It is time to apply the learning in your work. If you need more examples or 
+if you want to understand more details of the package, please explore other
+vignettes of the package as well as check out the documentations.
+
+# Session information
+
+```{r sessionInfo}
+sessionInfo()
+```

From 074585ce9b026f7fb2886f72427f4ce361656830 Mon Sep 17 00:00:00 2001
From: Jitao David Zhang <jitao_david.zhang@roche.com>
Date: Wed, 17 May 2023 16:02:19 +0200
Subject: [PATCH 02/24] updated conclusion

---
 vignettes/basic_examples.Rmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vignettes/basic_examples.Rmd b/vignettes/basic_examples.Rmd
index ca058f6a..d9abd956 100644
--- a/vignettes/basic_examples.Rmd
+++ b/vignettes/basic_examples.Rmd
@@ -287,8 +287,8 @@ samples with the method `bc$move_samples()`.
 we can optimize the design, for instance by shuffling the samples.
 4. Various options are available to further customize the design.
 
-Now you have already the first experience of using _designit_ for randomization.
-It is time to apply the learning in your work. If you need more examples or 
+Now that you have had a first experience of using _designit_ for randomization,
+it is time to apply what you have learned to your work. If you need more examples or 
 if you want to understand more details of the package, please explore other
 vignettes of the package as well as check out the documentations.
 

From 991ccfbda3e2f4cb4033c0404cbbd5c26d4ed3f3 Mon Sep 17 00:00:00 2001
From: Jitao David Zhang <jitao_david.zhang@roche.com>
Date: Wed, 17 May 2023 16:02:33 +0200
Subject: [PATCH 03/24] add JDZ as contributor

---
 DESCRIPTION | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/DESCRIPTION b/DESCRIPTION
index 98de9331..4b21abc2 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -32,6 +32,11 @@ Authors@R: c(
            role = c("aut", "cph"),
            email = "balazs.banfai@roche.com",
            comment = c(ORCID = "0000-0003-0422-7977")),
+    person(given = "Jitao David",
+           family = "Zhang",
+           role = c("aut", "cph"),
+           email = "jitao_david.zhang@roche.com",
+           comment = c(ORCID = "0000-0002-3085-0909")),
     person(given = "F. Hoffman-La Roche", role = c("cph", "fnd")))
 Description:
     Intelligently assign samples to batches in order to reduce batch effects.

From 64db97f6c45cc91ff3d504392f89f9c02196fe67 Mon Sep 17 00:00:00 2001
From: Jitao David Zhang <jitao_david.zhang@roche.com>
Date: Tue, 11 Jul 2023 16:28:01 +0200
Subject: [PATCH 04/24] add new line after title

---
 vignettes/osat.Rmd | 1 +
 1 file changed, 1 insertion(+)

diff --git a/vignettes/osat.Rmd b/vignettes/osat.Rmd
index b7d7e1fb..1f43c80e 100644
--- a/vignettes/osat.Rmd
+++ b/vignettes/osat.Rmd
@@ -40,6 +40,7 @@ samples <- read_tsv(file.path(osat_data_path, "samples.txt"),
 ```
 
 # Running OSAT optimization
+
 Here we use OSAT to optimize setup.
 ```{r}
 gs <- OSAT::setup.sample(samples, optimal = c("SampleType", "Race", "AgeGrp"))

From de5295662218f20747fad628298f4b58ca901924 Mon Sep 17 00:00:00 2001
From: Jitao David Zhang <jitao_david.zhang@roche.com>
Date: Tue, 11 Jul 2023 16:28:10 +0200
Subject: [PATCH 05/24] add vignette to show the necessity

---
 vignettes/generative_necessity.Rmd | 61 ++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 vignettes/generative_necessity.Rmd

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/generative_necessity.Rmd
new file mode 100644
index 00000000..c17b1d3a
--- /dev/null
+++ b/vignettes/generative_necessity.Rmd
@@ -0,0 +1,61 @@
+---
+title: "On the necessity of experiment design: a generative modelling approach"
+author: "Jitao david Zhang"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Necessity of experiment design}
+  %\VignetteEncoding{UTF-8}
+  %\VignetteEngine{knitr::rmarkdown}
+---
+
+In this document, we demonstrate the necessity of a proper experiment design 
+with a generative model. We show that a proper experiment design helps
+experimentalists and analysts make correct inference about the quantity of 
+interest that is robust against randomness.
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+```{r setup, echo = FALSE}
+library(designit)
+library(tidyverse)
+```
+
+# The simplest case
+
+Assume we perform an experiment to test the effect of eleven drug candidates under development on cell viability. To do so, we treat cells in culture with a fixed concentration of each of the elevent candidates, and we treat cells with DMSO (dimethyl sulfoxide) as a vehicle control, since the drug candidates are all solved in DMSO solutions. 
+
+To assess the effect with regard to the variability intrinsic to the experiment setup, we measure the effect of each drug candidate (and DMSO) in eight different batches of cells, which are comparable to each other.
+
+In total, we have 96 samples: 11 drug candidates plus one DMSO control, 8 samples each. The samples fit neatly into a 96-well microtiter plate with 8 rows, which are usually marked with alphabets between `A` and `H`, and 12 columns, which are usually marked with numbers betwen 1 and 12.
+
+In order to avoid batch effects and to make the operation simple, all operations and measurements are done by the same careful person and performed all at once. The operator has two possibilities:
+
+1. She does not randomize the samples with regard to the plate layout. The naive layout will put each drug candidate or DMSO in one column. For simplicity, let us assume that the cells treated with DMSO are put in column 1, and cells treated with the eleven drug candidates are put in columns 2 to 12.
+2. She randomizes the samples with regard to the plate layout, so that nearby samples are not necessarily of the same condition.
+
+What is the difference between the two variants? Option 2 apparently involves more planning and labor than option 1. If manual instead of robotic pipetting is involved, option 2 is likely error-prone. So why bothering considering the later option?
+
+Randomization pays off when unwanted variance is large enough so that it may distort our estimate of the quantity in which we are interested in. For instance, in our example, the unwanted variance may come from the *plate effect*: due to variances in temperature, humidity, and evaporation between wells in the plate, cells may respond differently to *even the same treatment*. Such *plate effects* are difficult to judge practically because they are not known prior to the experiment, unless a calibration study is performed where the cells in a microtiter plate are indeed treated with the same condition and measurements are performed in order to quantify the plate effect. However, it is simple to *simulate* such plate effects *in silico* with *a generative model*, and test the effect of randomization.
+
+```{r}
+set.seed(2307111)
+
+dat <- data.frame(SampleIndex=1:96,
+                  Compound=gl(12, 8, 
+                              labels=c("DMSO", paste0("Compound", 1:11))),
+                  NaiveRow=rep(1:8, 12),
+                  NaiveCol=rep(1:12, each=8))
+bc <- BatchContainer$new(
+  dimensions = list("plate" = 1, "row" = 8, "col" = 12),
+)
+
+assign_in_order(bc, dat)
+
+head(bc$get_samples()) %>% gt::gt()
+```
+

From 2f23d6d0c1df2b8a4bb6d4e4eceabb04f5811b07 Mon Sep 17 00:00:00 2001
From: Jitao David Zhang <jitao_david.zhang@roche.com>
Date: Tue, 22 Aug 2023 17:24:22 +0200
Subject: [PATCH 06/24] a generative model showing the necessity of designIt

---
 vignettes/generative_necessity.Rmd | 168 +++++++++++++++++++++++++++--
 1 file changed, 157 insertions(+), 11 deletions(-)
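
A minimal base-R sketch of the generative model that this patch adds to the
vignette, runnable without designit. It reuses the radial plate-effect formula
and the treatment layout introduced by the patch; the helper names
`simulate_plate` and `plate_r2`, the seed, and the use of R-squared as a
summary are assumptions made for this illustration and are not part of the
package or the vignette.

```r
# Arbitrary seed, chosen for this sketch only
set.seed(1)

conds <- c("DMSO", sprintf("Compound%02d", 1:11))

# Simulate one 96-well plate; `randomize = TRUE` shuffles treatments over wells.
simulate_plate <- function(randomize = FALSE) {
  compound <- factor(rep(conds, 8), levels = conds)
  if (randomize) compound <- sample(compound)
  plate <- data.frame(
    Compound = compound,
    rawRow = rep(1:8, each = 12),
    rawCol = rep(1:12, 8)
  )
  # No compound has a real effect: all wells share the same expected viability
  plate$trueEffect <- rnorm(96, mean = 10, sd = 1)
  # Radial, positive plate effect that grows with distance from the plate centre
  plate$plateEffect <- 0.5 * sqrt((plate$rawRow - 4.5)^2 + (plate$rawCol - 6.5)^2)
  # Additive model: measurement = true effect + plate effect
  plate$measurement <- plate$trueEffect + plate$plateEffect
  plate
}

# Fraction of the plate-effect variance that is explained by the treatment label
plate_r2 <- function(plate) {
  summary(lm(plateEffect ~ Compound, data = plate))$r.squared
}

naive <- simulate_plate(randomize = FALSE)
randomized <- simulate_plate(randomize = TRUE)

r2_naive <- plate_r2(naive)
r2_randomized <- plate_r2(randomized)
```

In the column-wise layout the treatment label is confounded with the plate
column, so `r2_naive` is large; after shuffling, `r2_randomized` is expected to
be much smaller, which is the correlation that randomization is meant to break.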

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/generative_necessity.Rmd
index c17b1d3a..1957fddd 100644
--- a/vignettes/generative_necessity.Rmd
+++ b/vignettes/generative_necessity.Rmd
@@ -25,33 +25,45 @@ library(designit)
 library(tidyverse)
 ```
 
-# The simplest case
+## A simple case study about plate effect: the background
 
-Assume we perform an experiment to test the effect of eleven drug candidates under development on cell viability. To do so, we treat cells in culture with a fixed concentration of each of the elevent candidates, and we treat cells with DMSO (dimethyl sulfoxide) as a vehicle control, since the drug candidates are all solved in DMSO solutions. 
+Assume we perform an experiment to test the effect of eleven drug candidates under development on cell viability. To do so, we treat cells in culture with a fixed concentration of each of the eleven candidates, and we treat cells with DMSO (dimethyl sulfoxide) as a vehicle control, since the drug candidates are all dissolved in DMSO.
 
 To assess the effect with regard to the variability intrinsic to the experiment setup, we measure the effect of each drug candidate (and DMSO) in eight different batches of cells, which are comparable to each other.
 
-In total, we have 96 samples: 11 drug candidates plus one DMSO control, 8 samples each. The samples fit neatly into a 96-well microtiter plate with 8 rows, which are usually marked with alphabets between `A` and `H`, and 12 columns, which are usually marked with numbers betwen 1 and 12.
+In total, we have 96 samples: 11 drug candidates plus one DMSO control, 8 samples each. The samples fit neatly into a 96-well microtiter plate with 8 rows and 12 columns.
 
-In order to avoid batch effects and to make the operation simple, all operations and measurements are done by the same careful person and performed all at once. The operator has two possibilities:
+In order to avoid batch effects and to make the operation simple, all operations and measurements are done by the same careful operator and performed at the same time. The operator has two possibilities:
 
-1. She does not randomize the samples with regard to the plate layout. The naive layout will put each drug candidate or DMSO in one column. For simplicity, let us assume that the cells treated with DMSO are put in column 1, and cells treated with the eleven drug candidates are put in columns 2 to 12.
+1. She does *not* randomize the samples with regard to the plate layout. The naive layout will put each drug candidate or control (DMSO) in one column. For simplicity, let us assume that the cells treated with DMSO are put in column 1, and cells treated with the eleven drug candidates are put in columns 2 to 12.
 2. She randomizes the samples with regard to the plate layout, so that nearby samples are not necessarily of the same condition.
 
 What is the difference between the two variants? Option 2 apparently involves more planning and labor than option 1. If manual instead of robotic pipetting is involved, option 2 is likely error-prone. So why bothering considering the later option?
 
-Randomization pays off when unwanted variance is large enough so that it may distort our estimate of the quantity in which we are interested in. For instance, in our example, the unwanted variance may come from the *plate effect*: due to variances in temperature, humidity, and evaporation between wells in the plate, cells may respond differently to *even the same treatment*. Such *plate effects* are difficult to judge practically because they are not known prior to the experiment, unless a calibration study is performed where the cells in a microtiter plate are indeed treated with the same condition and measurements are performed in order to quantify the plate effect. However, it is simple to *simulate* such plate effects *in silico* with *a generative model*, and test the effect of randomization.
+Randomization pays off when unwanted variance is large enough that it may distort our estimate of the quantity in which we are interested. In our example, the unwanted variance may come from the *plate effect*: due to variations in temperature, humidity, and evaporation between wells in the plate, cells may respond differently to *even the same treatment*. Such *plate effects* are difficult to judge in practice because they are not known prior to the experiment, unless a calibration study is performed in which all cells in a microtiter plate are treated with the same condition and measurements are taken in order to quantify the plate effect. However, it is simple to *simulate* such plate effects *in silico* with *a generative model*, and to test the effect of randomization.
+
+For simplicity, we make the following further assumptions:
+
+(1) The plate effect is radial, i.e. cells in wells on the edges are more affected than cells in wells in the middle of the plate.
+(2) The plate effect is positive, i.e. cells in edge wells show higher viability than cells in the middle wells.
+(3) None of the tested compounds regulate cell viability significantly, i.e. cells treated with compounds and cells treated with DMSO control have the same expected value of viability. We simulate the effect of DMSO and compounds by drawing random samples from a normal distribution.
+(4) The true effect of compounds and the plate effect are additive, i.e. our measurement is the sum of the true effect and the plate effect.
 
 ```{r}
 set.seed(2307111)
 
+conds <- c("DMSO", sprintf("Compound%02d", 1:11))
 dat <- data.frame(SampleIndex=1:96,
-                  Compound=gl(12, 8, 
-                              labels=c("DMSO", paste0("Compound", 1:11))),
-                  NaiveRow=rep(1:8, 12),
-                  NaiveCol=rep(1:12, each=8))
+                  Compound=factor(rep(conds, 8), levels=conds),
+                  rawRow=rep(1:8, each=12),
+                  rawCol=rep(1:12, 8)) %>%
+  mutate(trueEffect=rnorm(96, mean=10, sd=1),
+    plateEffect=0.5 * sqrt((rawRow-4.5)^2 + (rawCol-6.5)^2),
+    measurement=trueEffect + plateEffect)
 bc <- BatchContainer$new(
-  dimensions = list("plate" = 1, "row" = 8, "col" = 12),
+  dimensions = list("plate" = 1, 
+                    row = list(values=LETTERS[1:8]), 
+                    col = list(values=sprintf("%02d", 1:12)))
 )
 
 assign_in_order(bc, dat)
@@ -59,3 +71,137 @@ assign_in_order(bc, dat)
 head(bc$get_samples()) %>% gt::gt()
 ```
 
+## Simulating a study in which randomization is not used
+
+First we simulate a study in which randomization is not used. In this context, it means that the treatment (controls and compounds in columns) and the plate effect are correlated. The following plot visualizes the layout of the plate, the true effect, the plate effect, and the measurement as a sum of the true effect and the plate effect.
+
+```{r rawPlatePlots, fig.height=5.5, fig.width=8}
+cowplot::plot_grid(
+  plotlist = list(plot_plate(bc,
+                             plate=plate,
+                             row=row, column=col, .color=Compound,
+                             title="Layout by treatment"),
+                  plot_plate(bc,
+                             plate = plate, row = row, column = col, .color = trueEffect,
+                             title = "True effect"
+                  ),
+                  plot_plate(bc,
+                             plate = plate, row = row, column = col, .color = plateEffect,
+                             title = "Plate effect"
+                  ),
+                  plot_plate(bc,
+                             plate = plate, row = row, column = col, .color = measurement,
+                             title = "Measurement"
+                  )
+  ), ncol = 2, nrow=2
+)
+```
+
+When we perform a one-way ANOVA test with the true effect, the F-test suggests that there are no significant differences between the treatments (p>0.05).
+
+```{r}
+summary(aov(trueEffect ~ Compound, data=dat))
+```
+
+However, if we consider the measurement, which sums the true effect and the plate effect, the F-test suggests that there are significant differences between the compounds (p<0.01).
+
+```{r}
+summary(aov(measurement ~ Compound, data=dat))
+```
+
+To verify, we calculate Tukey's honest significant differences using the true effect. As expected, no single compound shows a significant difference from the effect of DMSO (adjusted p-value>0.05).
+```{r}
+versusDMSO <- paste0(conds[-1], "-", conds[1])
+trueDiff <- TukeyHSD(aov(trueEffect ~ Compound, data=dat))$Compound
+trueDiff[versusDMSO,]
+```
+
+However, calculating the differences with the measurements reveals that Compound 6 would have a significant difference in viability from that of DMSO (adjusted p<0.01).
+
+```{r}
+measureDiff <- TukeyHSD(aov(measurement ~ Compound, 
+                            data=bc$get_samples()))$Compound
+measureDiff[versusDMSO,]
+```
+
+We can also detect the difference visually with a box-and-whisker plot.
+
+```{r boxplot, fig.height=5, fig.width=5}
+ggplot(bc$get_samples(), 
+       aes(x=Compound, y=measurement)) +
+  geom_boxplot() + ylab("Measurement [w/o randomization]") +
+  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
+```
+
+
+Our simulation study assumed that no single compound affects cell viability significantly differently from DMSO controls, so the addition of the plate effect causes one false discovery in this simulation. It can be expected that the false-discovery rate may vary depending on the relative strength and variability of the plate effect with regard to the true effects. What matters most is the observation that in the presence of a plate effect, a lack of randomization, i.e. a correlation of treatment with plate positions, may cause wrong inferences.
+
+## Randomization prevents plate effect from interfering with inferences
+
+Now we keep all but one of the assumptions made above, with the only change that we shall randomize the layout of the samples. The randomization will break the correlation between treatments and plate effects.
+
+```{r}
+set.seed(2307111)
+
+rand_dat <- data.frame(SampleIndex=1:96,
+                  Compound=sample(factor(rep(conds, 8), levels=conds)),
+                  rawRow=rep(1:8, each=12),
+                  rawCol=rep(1:12, 8)) %>%
+  mutate(trueEffect=rnorm(96, mean=10, sd=1),
+    plateEffect=0.5 * sqrt((rawRow-4.5)^2 + (rawCol-6.5)^2),
+    measurement=trueEffect + plateEffect)
+
+bc2 <- BatchContainer$new(
+  dimensions = list("plate" = 1, 
+                    row = list(values=LETTERS[1:8]), 
+                    col = list(values=sprintf("%02d", 1:12)))
+)
+
+assign_in_order(bc2, rand_dat)
+head(bc2$get_samples()) %>% gt::gt()
+```
+
+```{r randomPlatePlots, fig.height=5.5, fig.width=8}
+cowplot::plot_grid(
+  plotlist = list(plot_plate(bc2,
+                             plate=plate,
+                             row=row, column=col, .color=Compound,
+                             title="Layout by treatment"),
+                  plot_plate(bc2,
+                             plate = plate, row = row, column = col, .color = trueEffect,
+                             title = "True effect"
+                  ),
+                  plot_plate(bc2,
+                             plate = plate, row = row, column = col, .color = plateEffect,
+                             title = "Plate effect"
+                  ),
+                  plot_plate(bc2,
+                             plate = plate, row = row, column = col, .color = measurement,
+                             title = "Measurement"
+                  )
+  ), ncol = 2, nrow=2
+)
+```
+
+When we calculate Tukey's honest significant differences with the measurements, we detect no significant differences between any compound and DMSO.
+
+```{r}
+randMeasureDiff <- TukeyHSD(aov(measurement ~ Compound, 
+                            data=bc2$get_samples()))$Compound
+randMeasureDiff[versusDMSO,]
+```
+
+We can also use the boxplot as a visual help to inspect the difference between the treatments, to confirm that randomization prevents plate effect from affecting the statistical inference.
+
+```{r randBoxplot, fig.height=5, fig.width=5}
+ggplot(bc2$get_samples(), 
+       aes(x=Compound, y=measurement)) +
+  geom_boxplot() + ylab("Measurement [with randomization]") +
+  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
+```
+
+## Discussions and conclusions
+
+The simple case study discussed in this vignette is an application of generative models, which means that assuming that we know the mechanism by which the data is generated, we can simulate the data generation process and use it for various purposes. In our cases, we simulated a linear additive model of true effects of compounds and control on cell viability and the plate effect induced by positions in a microtitre plate. Using the model, we demonstrate that (1) plate effect can impact statistical inference by introducing false positive (and in other case, false negative) findings, and (2) a full randomization can guard statistical inference by reducing the effect of plate effect.
+
+While the case study is on the margin of being overly simple, we hope that it demonstrates the advantage of appropriate experiment design and the necessity of statistical techniques such as randomization and blocking when it comes to data analysis in drug discovery and development.

From 4176337a2b98027f1b85737f3e12e9be8f15dff9 Mon Sep 17 00:00:00 2001
From: Jitao David Zhang <jitao_david.zhang@roche.com>
Date: Tue, 22 Aug 2023 17:25:59 +0200
Subject: [PATCH 07/24] update the wording

---
 vignettes/generative_necessity.Rmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/generative_necessity.Rmd
index 1957fddd..24f0ced5 100644
--- a/vignettes/generative_necessity.Rmd
+++ b/vignettes/generative_necessity.Rmd
@@ -1,5 +1,5 @@
 ---
-title: "On the necessity of experiment design: a generative modelling approach"
+title: "On the benefits of experiment design: a generative modelling approach"
 author: "Jitao david Zhang"
 output: rmarkdown::html_vignette
 vignette: >
@@ -204,4 +204,4 @@ ggplot(bc2$get_samples(),
 
 The simple case study discussed in this vignette is an application of generative models, which means that assuming that we know the mechanism by which the data is generated, we can simulate the data generation process and use it for various purposes. In our cases, we simulated a linear additive model of true effects of compounds and control on cell viability and the plate effect induced by positions in a microtitre plate. Using the model, we demonstrate that (1) plate effect can impact statistical inference by introducing false positive (and in other case, false negative) findings, and (2) a full randomization can guard statistical inference by reducing the effect of plate effect.
 
-While the case study is on the margin of being overly simple, we hope that it demonstrates the advantage of appropriate experiment design and the necessity of statistical techniques such as randomization and blocking when it comes to data analysis in drug discovery and development.
+While the case study is on the margin of being overly simple, we hope that it demonstrates the advantage of appropriate experiment design using tools like _designit_, as well as the necessity of statistical techniques such as randomization and blocking in drug discovery and development.

From 81a6554a559eb4523822bd72e4616ffdb5174bb2 Mon Sep 17 00:00:00 2001
From: julianesiebourg <51031392+julianesiebourg@users.noreply.github.com>
Date: Tue, 12 Mar 2024 11:16:40 +0100
Subject: [PATCH 08/24] Update vignettes/generative_necessity.Rmd

---
 vignettes/generative_necessity.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/generative_necessity.Rmd
index 24f0ced5..6760bd8e 100644
--- a/vignettes/generative_necessity.Rmd
+++ b/vignettes/generative_necessity.Rmd
@@ -66,7 +66,7 @@ bc <- BatchContainer$new(
                     col = list(values=sprintf("%02d", 1:12)))
 )
 
-assign_in_order(bc, dat)
+bc <- assign_in_order(bc, dat)
 
 head(bc$get_samples()) %>% gt::gt()
 ```

From 8dd1748fc3d16e771f57fd46489f2a16773c0458 Mon Sep 17 00:00:00 2001
From: julianesiebourg <51031392+julianesiebourg@users.noreply.github.com>
Date: Tue, 12 Mar 2024 11:16:46 +0100
Subject: [PATCH 09/24] Update vignettes/generative_necessity.Rmd

---
 vignettes/generative_necessity.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/generative_necessity.Rmd
index 6760bd8e..830b61a8 100644
--- a/vignettes/generative_necessity.Rmd
+++ b/vignettes/generative_necessity.Rmd
@@ -157,7 +157,7 @@ bc2 <- BatchContainer$new(
                     col = list(values=sprintf("%02d", 1:12)))
 )
 
-assign_in_order(bc2, rand_dat)
+bc2 <- assign_in_order(bc2, rand_dat)
 head(bc2$get_samples()) %>% gt::gt()
 ```
 

From 9b20533004f4ff229247e08b8d72be46a30de9eb Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Mon, 30 Sep 2024 16:36:43 +0200
Subject: [PATCH 10/24] Apply suggestions from Juliane's code review

Co-authored-by: julianesiebourg <51031392+julianesiebourg@users.noreply.github.com>
---
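Reviewer note (not part of the commit message): the simulation kept by this
patch depends on the radial plate-effect term
`0.5 * sqrt((row - 4.5)^2 + (col - 6.5)^2)`. The base-R sketch below uses only
that formula from the diff; the names `layout` and `col_means` are hypothetical
and chosen for illustration. It is meant to show why a column-wise layout
confounds compound with position: the mean plate effect differs between
columns.

```r
# Rebuild the 8 x 12 plate grid used in the vignette (rows 1-8, columns 1-12).
layout <- expand.grid(col = 1:12, row = 1:8)

# Radial plate effect copied from the diff: zero at the plate centre
# (row 4.5, column 6.5) and growing with distance from it.
layout$plateEffect <- 0.5 * sqrt((layout$row - 4.5)^2 + (layout$col - 6.5)^2)

# Mean plate effect per column. In the non-randomized layout each column holds
# a single compound, so these means are added directly to the compound means.
col_means <- sapply(split(layout$plateEffect, layout$col), mean)
round(col_means, 2)
```

Edge columns receive a larger average plate effect than central ones, so a
compound pipetted into an edge column can look different from one in a central
column even when the true effects are drawn from the same distribution.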
 vignettes/generative_necessity.Rmd | 82 ++++++++++++++----------------
 1 file changed, 37 insertions(+), 45 deletions(-)

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/generative_necessity.Rmd
index 830b61a8..ca2ee2cf 100644
--- a/vignettes/generative_necessity.Rmd
+++ b/vignettes/generative_necessity.Rmd
@@ -1,5 +1,5 @@
 ---
-title: "On the benefits of experiment design: a generative modelling approach"
+title: "On the benefits of experiment design: a simulation approach"
 author: "Jitao david Zhang"
 output: rmarkdown::html_vignette
 vignette: >
@@ -9,7 +9,7 @@ vignette: >
 ---
 
 In this document, we demonstrate the necessity of a proper experiment design 
-with a generative model. We show that a proper experiment design helps
+with a generative model which we use to simulate data with "batch" effects. We show that a proper experiment design helps
 experimentalists and analysts make correct inference about the quantity of 
 interest that is robust against randomness.
 
@@ -40,7 +40,7 @@ In order to avoid batch effects and to make the operation simple, all operations
 
 What is the difference between the two variants? Option 2 apparently involves more planning and labor than option 1. If manual instead of robotic pipetting is involved, option 2 is likely error-prone. So why bothering considering the later option?
 
-Randomization pays off when unwanted variance is large enough so that it may distort our estimate of the quantity in which we are interested in. In our example, the unwanted variance may come from the *plate effect*: due to variances in temperature, humidity, and evaporation between wells in the plate, cells may respond differently to *even the same treatment*. Such *plate effects* are difficult to judge practically because they are not known prior to the experiment, unless a calibration study is performed where the cells in a microtiter plate are indeed treated with the same condition and measurements are performed in order to quantify the plate effect. However, it is simple to *simulate* such plate effects *in silico* with *a generative model*, and test the effect of randomization.
+Randomization pays off when unwanted variance is large enough so that it may distort our estimate of the quantity in which we are interested in. In our example, the unwanted variance may come from a *plate effect*: due to variances in temperature, humidity, and evaporation between wells in the plate, cells may respond differently to *even the same treatment*. Such *plate effects* are difficult to judge practically because they are not known prior to the experiment, unless a calibration study is performed where the cells in a microtiter plate are indeed treated with the same condition and measurements are performed in order to quantify the plate effect. However, it is simple to *simulate* such plate effects *in silico* with *a generative model*, and test the effect of randomization.
 
 For simplicity, we make following further assumptions:
 
@@ -52,23 +52,22 @@ For simplicity, we make following further assumptions:
 ```{r}
 set.seed(2307111)
 
-conds <- c("DMSO", sprintf("Compound%02d", 1:11))
-dat <- data.frame(SampleIndex=1:96,
-                  Compound=factor(rep(conds, 8), levels=conds),
-                  rawRow=rep(1:8, each=12),
-                  rawCol=rep(1:12, 8)) %>%
-  mutate(trueEffect=rnorm(96, mean=10, sd=1),
-    plateEffect=0.5 * sqrt((rawRow-4.5)^2 + (rawCol-6.5)^2),
-    measurement=trueEffect + plateEffect)
-bc <- BatchContainer$new(
-  dimensions = list("plate" = 1, 
-                    row = list(values=LETTERS[1:8]), 
-                    col = list(values=sprintf("%02d", 1:12)))
-)
+set.seed(2307111)
 
-bc <- assign_in_order(bc, dat)
+conditions <- c("DMSO", sprintf("Compound%02d", 1:11))
+# set up samples with conditions and true effects
+dat <- data.frame(SampleIndex = 1:96,
+                  Compound = factor(rep(conditions, 8), levels = conditions),
+                  trueEffect = rnorm(96, mean = 10, sd = 1))
+  
+# add the layout plus plate effect
+dat <- dat %>% 
+  mutate(
+    row=rep(1:8, each=12), col=rep(1:12, 8),
+    plateEffect=0.5 * sqrt((row-4.5)^2 + (col-6.5)^2),
+    measurement=trueEffect + plateEffect)
 
-head(bc$get_samples()) %>% gt::gt()
+head(dat) %>% gt::gt()
 ```
 
 ## Simulating a study in which randomization is not used
@@ -77,19 +76,19 @@ First we simulate a study in which randomization is not used. In this context, i
 
 ```{r rawPlatePlots, fig.height=5.5, fig.width=8}
 cowplot::plot_grid(
-  plotlist = list(plot_plate(bc,
+  plotlist = list(plot_plate(dat,
                              plate=plate,
                              row=row, column=col, .color=Compound,
                              title="Layout by treatment"),
-                  plot_plate(bc,
+                  plot_plate(dat,
                              plate = plate, row = row, column = col, .color = trueEffect,
                              title = "True effect"
                   ),
-                  plot_plate(bc,
+                  plot_plate(dat,
                              plate = plate, row = row, column = col, .color = plateEffect,
                              title = "Plate effect"
                   ),
-                  plot_plate(bc,
+                  plot_plate(dat,
                              plate = plate, row = row, column = col, .color = measurement,
                              title = "Measurement"
                   )
@@ -120,14 +119,14 @@ However, calculating the differences with measurements reveal that Compound 6 wo
 
 ```{r}
 measureDiff <- TukeyHSD(aov(measurement ~ Compound, 
-                            data=bc$get_samples()))$Compound
+                            data=dat))$Compound
 measureDiff[versusDMSO,]
 ```
 
 We can also detect the difference visually with a Box-Whisker plot.
 
 ```{r boxplot, fig.height=5, fig.width=5}
-ggplot(bc$get_samples(), 
+ggplot(dat, 
        aes(x=Compound, y=measurement)) +
   geom_boxplot() + ylab("Measurement [w/o randomization]") +
   theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
@@ -143,39 +142,32 @@ Now we use the all but one assumptions made above, with the only change that we
 ```{r}
 set.seed(2307111)
 
-rand_dat <- data.frame(SampleIndex=1:96,
-                  Compound=sample(factor(rep(conds, 8), levels=conds)),
-                  rawRow=rep(1:8, each=12),
-                  rawCol=rep(1:12, 8)) %>%
-  mutate(trueEffect=rnorm(96, mean=10, sd=1),
-    plateEffect=0.5 * sqrt((rawRow-4.5)^2 + (rawCol-6.5)^2),
+# add the layout plus plate effect
+randomized_dat <- dat %>% 
+  slice(sample(1:n())) %>% # shuffle the order of samples in the dataset
+  mutate(
+    row=rep(1:8, each=12), col=rep(1:12, 8),
+    plateEffect=0.5 * sqrt((row-4.5)^2 + (col-6.5)^2),
     measurement=trueEffect + plateEffect)
 
-bc2 <- BatchContainer$new(
-  dimensions = list("plate" = 1, 
-                    row = list(values=LETTERS[1:8]), 
-                    col = list(values=sprintf("%02d", 1:12)))
-)
-
-bc2 <- assign_in_order(bc2, rand_dat)
-head(bc2$get_samples()) %>% gt::gt()
+head(randomized_dat) %>% gt::gt()
 ```
 
 ```{r randomPlatePlots, fig.height=5.5, fig.width=8}
 cowplot::plot_grid(
-  plotlist = list(plot_plate(bc2,
+  plotlist = list(plot_plate(randomized_dat,
                              plate=plate,
                              row=row, column=col, .color=Compound,
                              title="Layout by treatment"),
-                  plot_plate(bc2,
+                  plot_plate(randomized_dat,
                              plate = plate, row = row, column = col, .color = trueEffect,
                              title = "True effect"
                   ),
-                  plot_plate(bc2,
+                  plot_plate(randomized_dat,
                              plate = plate, row = row, column = col, .color = plateEffect,
                              title = "Plate effect"
                   ),
-                  plot_plate(bc2,
+                  plot_plate(randomized_dat,
                              plate = plate, row = row, column = col, .color = measurement,
                              title = "Measurement"
                   )
@@ -187,14 +179,14 @@ When we apply the F-test, we detect no significant differences between any compo
 
 ```{r}
 randMeasureDiff <- TukeyHSD(aov(measurement ~ Compound, 
-                            data=bc2$get_samples()))$Compound
+                            data=randomized_dat))$Compound
 randMeasureDiff[versusDMSO,]
 ```
 
 We can also use the boxplot as a visual help to inspect the difference between the treatments, to confirm that randomization prevents plate effect from affecting the statistical inference.
 
 ```{r randBoxplot, fig.height=5, fig.width=5}
-ggplot(bc2$get_samples(), 
+ggplot(randomized_dat, 
        aes(x=Compound, y=measurement)) +
   geom_boxplot() + ylab("Measurement [with randomization]") +
   theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
@@ -202,6 +194,6 @@ ggplot(bc2$get_samples(),
 
 ## Discussions and conclusions
 
-The simple case study discussed in this vignette is an application of generative models, which means that assuming that we know the mechanism by which the data is generated, we can simulate the data generation process and use it for various purposes. In our cases, we simulated a linear additive model of true effects of compounds and control on cell viability and the plate effect induced by positions in a microtitre plate. Using the model, we demonstrate that (1) plate effect can impact statistical inference by introducing false positive (and in other case, false negative) findings, and (2) a full randomization can guard statistical inference by reducing the effect of plate effect.
+The simple case study discussed in this vignette is an application of generative models, which means that assuming that we know the mechanism by which the data is generated, we can simulate the data generation process and use it for various purposes. In our cases, we simulated a linear additive model of true effects of compounds and control on cell viability and the plate effect induced by positions in a microtitre plate. Using the model, we demonstrate that (1) plate effect can impact statistical inference by introducing false positive (and in other case, false negative) findings, and (2) a full randomization can guard statistical inference by reducing the bias of the plate effect.
 
 While the case study is on the margin of being overly simple, we hope that it demonstrates the advantage of appropriate experiment design using tools like \textit{DesignIt}, as well as the necessity of statistical techniques such as randomization and blocking in drug discovery and development.

From 3ff8afa73a5f80903d3b69c459281f4bacbbb17d Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Mon, 30 Sep 2024 14:51:47 +0000
Subject: [PATCH 11/24] remove duplicated statement

---
 vignettes/generative_necessity.Rmd | 2 --
 1 file changed, 2 deletions(-)

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/generative_necessity.Rmd
index ca2ee2cf..2013118b 100644
--- a/vignettes/generative_necessity.Rmd
+++ b/vignettes/generative_necessity.Rmd
@@ -52,8 +52,6 @@ For simplicity, we make following further assumptions:
 ```{r}
 set.seed(2307111)
 
-set.seed(2307111)
-
 conditions <- c("DMSO", sprintf("Compound%02d", 1:11))
 # set up samples with conditions and true effects
 dat <- data.frame(SampleIndex = 1:96,

From a35103e068bf34d0927c2293f6d51d1aceb25ed1 Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Mon, 30 Sep 2024 14:52:07 +0000
Subject: [PATCH 12/24] fix variable naming error

---
 vignettes/generative_necessity.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/generative_necessity.Rmd
index 2013118b..f77cc86f 100644
--- a/vignettes/generative_necessity.Rmd
+++ b/vignettes/generative_necessity.Rmd
@@ -108,7 +108,7 @@ summary(aov(measurement ~ Compound, data=dat))
 
 To verify, we calculate Turkey's honest significant differences using true effect. As expected, no single compound shows significant difference from the effect of DMSO (adjusted p-value>0.05)
 ```{r}
-versusDMSO <- paste0(conds[-1], "-", conds[1])
+versusDMSO <- paste0(conditions[-1], "-", conditions[1])
 trueDiff <- TukeyHSD(aov(trueEffect ~ Compound, data=dat))$Compound
 trueDiff[versusDMSO,]
 ```

From bf6d9edccd72f7b85927212740255f3c657d0e3e Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Mon, 30 Sep 2024 14:52:25 +0000
Subject: [PATCH 13/24] use new pipe

---
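Reviewer note (not part of the commit message): the native pipe `|>` is
available from R 4.1.0 onward, so this change assumes the vignette is built
with at least that version. A minimal sketch of the operator in base R, with
the hypothetical name `total` used for illustration only:

```r
# The native pipe passes the left-hand side as the first argument of the call
# on the right-hand side; no magrittr import is needed.
total <- c(1, 4, 9) |> sqrt() |> sum()
total
```

The right-hand side must be a function call, which is why the diff writes
`head(dat) |> gt::gt()` with parentheses.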
 vignettes/generative_necessity.Rmd | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/generative_necessity.Rmd
index f77cc86f..cff6a23e 100644
--- a/vignettes/generative_necessity.Rmd
+++ b/vignettes/generative_necessity.Rmd
@@ -59,13 +59,13 @@ dat <- data.frame(SampleIndex = 1:96,
                   trueEffect = rnorm(96, mean = 10, sd = 1))
   
 # add the layout plus plate effect
-dat <- dat %>% 
+dat <- dat |> 
   mutate(
     row=rep(1:8, each=12), col=rep(1:12, 8),
     plateEffect=0.5 * sqrt((row-4.5)^2 + (col-6.5)^2),
     measurement=trueEffect + plateEffect)
 
-head(dat) %>% gt::gt()
+head(dat) |> gt::gt()
 ```
 
 ## Simulating a study in which randomization is not used
@@ -141,14 +141,16 @@ Now we use the all but one assumptions made above, with the only change that we
 set.seed(2307111)
 
 # add the layout plus plate effect
-randomized_dat <- dat %>% 
-  slice(sample(1:n())) %>% # shuffle the order of samples in the dataset
+randomized_dat <- dat |> 
+  slice(sample(1:n())) |> # shuffle the order of samples in the dataset
   mutate(
     row=rep(1:8, each=12), col=rep(1:12, 8),
     plateEffect=0.5 * sqrt((row-4.5)^2 + (col-6.5)^2),
     measurement=trueEffect + plateEffect)
 
-head(randomized_dat) %>% gt::gt()
+randomized_dat |>
+  head() |>
+  gt::gt()
 ```
 
 ```{r randomPlatePlots, fig.height=5.5, fig.width=8}

From f9ef4b464bd39b848ac9fef2fc05bdb2e9f13ab3 Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Mon, 30 Sep 2024 14:53:08 +0000
Subject: [PATCH 14/24] _styler

---
 vignettes/generative_necessity.Rmd | 144 ++++++++++++++++-------------
 1 file changed, 80 insertions(+), 64 deletions(-)

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/generative_necessity.Rmd
index cff6a23e..42894808 100644
--- a/vignettes/generative_necessity.Rmd
+++ b/vignettes/generative_necessity.Rmd
@@ -54,16 +54,19 @@ set.seed(2307111)
 
 conditions <- c("DMSO", sprintf("Compound%02d", 1:11))
 # set up samples with conditions and true effects
-dat <- data.frame(SampleIndex = 1:96,
-                  Compound = factor(rep(conditions, 8), levels = conditions),
-                  trueEffect = rnorm(96, mean = 10, sd = 1))
-  
+dat <- data.frame(
+  SampleIndex = 1:96,
+  Compound = factor(rep(conditions, 8), levels = conditions),
+  trueEffect = rnorm(96, mean = 10, sd = 1)
+)
+
 # add the layout plus plate effect
-dat <- dat |> 
+dat <- dat |>
   mutate(
-    row=rep(1:8, each=12), col=rep(1:12, 8),
-    plateEffect=0.5 * sqrt((row-4.5)^2 + (col-6.5)^2),
-    measurement=trueEffect + plateEffect)
+    row = rep(1:8, each = 12), col = rep(1:12, 8),
+    plateEffect = 0.5 * sqrt((row - 4.5)^2 + (col - 6.5)^2),
+    measurement = trueEffect + plateEffect
+  )
 
 head(dat) |> gt::gt()
 ```
@@ -74,60 +77,66 @@ First we simulate a study in which randomization is not used. In this context, i
 
 ```{r rawPlatePlots, fig.height=5.5, fig.width=8}
 cowplot::plot_grid(
-  plotlist = list(plot_plate(dat,
-                             plate=plate,
-                             row=row, column=col, .color=Compound,
-                             title="Layout by treatment"),
-                  plot_plate(dat,
-                             plate = plate, row = row, column = col, .color = trueEffect,
-                             title = "True effect"
-                  ),
-                  plot_plate(dat,
-                             plate = plate, row = row, column = col, .color = plateEffect,
-                             title = "Plate effect"
-                  ),
-                  plot_plate(dat,
-                             plate = plate, row = row, column = col, .color = measurement,
-                             title = "Measurement"
-                  )
-  ), ncol = 2, nrow=2
+  plotlist = list(
+    plot_plate(dat,
+      plate = plate,
+      row = row, column = col, .color = Compound,
+      title = "Layout by treatment"
+    ),
+    plot_plate(dat,
+      plate = plate, row = row, column = col, .color = trueEffect,
+      title = "True effect"
+    ),
+    plot_plate(dat,
+      plate = plate, row = row, column = col, .color = plateEffect,
+      title = "Plate effect"
+    ),
+    plot_plate(dat,
+      plate = plate, row = row, column = col, .color = measurement,
+      title = "Measurement"
+    )
+  ), ncol = 2, nrow = 2
 )
 ```
 
 When we perform an one-way ANOVA test with the true effect, the F-test suggests that there are no significant differences between the treatments (p>0.05).
 
 ```{r}
-summary(aov(trueEffect ~ Compound, data=dat))
+summary(aov(trueEffect ~ Compound, data = dat))
 ```
 
 However, if we consider the measurement, which sums the true effect and the plate effect, the F-test suggests that there are significant differences between the compounds (p<0.01).
 
 ```{r}
-summary(aov(measurement ~ Compound, data=dat))
+summary(aov(measurement ~ Compound, data = dat))
 ```
 
 To verify, we calculate Turkey's honest significant differences using true effect. As expected, no single compound shows significant difference from the effect of DMSO (adjusted p-value>0.05)
 ```{r}
 versusDMSO <- paste0(conditions[-1], "-", conditions[1])
-trueDiff <- TukeyHSD(aov(trueEffect ~ Compound, data=dat))$Compound
-trueDiff[versusDMSO,]
+trueDiff <- TukeyHSD(aov(trueEffect ~ Compound, data = dat))$Compound
+trueDiff[versusDMSO, ]
 ```
 
 However, calculating the differences with measurements reveal that Compound 6 would have a significant difference in viability from that of DMSO (adjusted p<0.01).
 
 ```{r}
-measureDiff <- TukeyHSD(aov(measurement ~ Compound, 
-                            data=dat))$Compound
-measureDiff[versusDMSO,]
+measureDiff <- TukeyHSD(aov(measurement ~ Compound,
+  data = dat
+))$Compound
+measureDiff[versusDMSO, ]
 ```
 
 We can also detect the difference visually with a Box-Whisker plot.
 
 ```{r boxplot, fig.height=5, fig.width=5}
-ggplot(dat, 
-       aes(x=Compound, y=measurement)) +
-  geom_boxplot() + ylab("Measurement [w/o randomization]") +
-  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
+ggplot(
+  dat,
+  aes(x = Compound, y = measurement)
+) +
+  geom_boxplot() +
+  ylab("Measurement [w/o randomization]") +
+  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
 ```
 
 
@@ -141,12 +150,13 @@ Now we use the all but one assumptions made above, with the only change that we
 set.seed(2307111)
 
 # add the layout plus plate effect
-randomized_dat <- dat |> 
+randomized_dat <- dat |>
   slice(sample(1:n())) |> # shuffle the order of samples in the dataset
   mutate(
-    row=rep(1:8, each=12), col=rep(1:12, 8),
-    plateEffect=0.5 * sqrt((row-4.5)^2 + (col-6.5)^2),
-    measurement=trueEffect + plateEffect)
+    row = rep(1:8, each = 12), col = rep(1:12, 8),
+    plateEffect = 0.5 * sqrt((row - 4.5)^2 + (col - 6.5)^2),
+    measurement = trueEffect + plateEffect
+  )
 
 randomized_dat |>
   head() |>
@@ -155,41 +165,47 @@ randomized_dat |>
 
 ```{r randomPlatePlots, fig.height=5.5, fig.width=8}
 cowplot::plot_grid(
-  plotlist = list(plot_plate(randomized_dat,
-                             plate=plate,
-                             row=row, column=col, .color=Compound,
-                             title="Layout by treatment"),
-                  plot_plate(randomized_dat,
-                             plate = plate, row = row, column = col, .color = trueEffect,
-                             title = "True effect"
-                  ),
-                  plot_plate(randomized_dat,
-                             plate = plate, row = row, column = col, .color = plateEffect,
-                             title = "Plate effect"
-                  ),
-                  plot_plate(randomized_dat,
-                             plate = plate, row = row, column = col, .color = measurement,
-                             title = "Measurement"
-                  )
-  ), ncol = 2, nrow=2
+  plotlist = list(
+    plot_plate(randomized_dat,
+      plate = plate,
+      row = row, column = col, .color = Compound,
+      title = "Layout by treatment"
+    ),
+    plot_plate(randomized_dat,
+      plate = plate, row = row, column = col, .color = trueEffect,
+      title = "True effect"
+    ),
+    plot_plate(randomized_dat,
+      plate = plate, row = row, column = col, .color = plateEffect,
+      title = "Plate effect"
+    ),
+    plot_plate(randomized_dat,
+      plate = plate, row = row, column = col, .color = measurement,
+      title = "Measurement"
+    )
+  ), ncol = 2, nrow = 2
 )
 ```
 
 When we apply the F-test, we detect no significant differences between any compound and DMSO.
 
 ```{r}
-randMeasureDiff <- TukeyHSD(aov(measurement ~ Compound, 
-                            data=randomized_dat))$Compound
-randMeasureDiff[versusDMSO,]
+randMeasureDiff <- TukeyHSD(aov(measurement ~ Compound,
+  data = randomized_dat
+))$Compound
+randMeasureDiff[versusDMSO, ]
 ```
 
 We can also use the boxplot as a visual help to inspect the difference between the treatments, to confirm that randomization prevents plate effect from affecting the statistical inference.
 
 ```{r randBoxplot, fig.height=5, fig.width=5}
-ggplot(randomized_dat, 
-       aes(x=Compound, y=measurement)) +
-  geom_boxplot() + ylab("Measurement [with randomization]") +
-  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
+ggplot(
+  randomized_dat,
+  aes(x = Compound, y = measurement)
+) +
+  geom_boxplot() +
+  ylab("Measurement [with randomization]") +
+  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
 ```
 
 ## Discussions and conclusions

From bd9edde3295ef4ee8ca674111ac57348c23dcadc Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Mon, 30 Sep 2024 15:06:06 +0000
Subject: [PATCH 15/24] rename vignette

---
 vignettes/{generative_necessity.Rmd => false_positives.Rmd} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename vignettes/{generative_necessity.Rmd => false_positives.Rmd} (99%)

diff --git a/vignettes/generative_necessity.Rmd b/vignettes/false_positives.Rmd
similarity index 99%
rename from vignettes/generative_necessity.Rmd
rename to vignettes/false_positives.Rmd
index 42894808..441591a2 100644
--- a/vignettes/generative_necessity.Rmd
+++ b/vignettes/false_positives.Rmd
@@ -1,5 +1,5 @@
 ---
-title: "On the benefits of experiment design: a simulation approach"
+title: "Batch effects and false positives: a simulation study"
 author: "Jitao david Zhang"
 output: rmarkdown::html_vignette
 vignette: >

From 75b7d99a7721a5d5a56484b6e17ccdb038ea4d15 Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Mon, 30 Sep 2024 15:11:54 +0000
Subject: [PATCH 16/24] false positives vignette: formatting

---
 vignettes/false_positives.Rmd | 138 +++++++++++++++++++++++++---------
 1 file changed, 104 insertions(+), 34 deletions(-)

diff --git a/vignettes/false_positives.Rmd b/vignettes/false_positives.Rmd
index 441591a2..9b98da86 100644
--- a/vignettes/false_positives.Rmd
+++ b/vignettes/false_positives.Rmd
@@ -8,10 +8,11 @@ vignette: >
   %\VignetteEngine{knitr::rmarkdown}
 ---
 
-In this document, we demonstrate the necessity of a proper experiment design 
-with a generative model which we use to simulate data with "batch" effects. We show that a proper experiment design helps
-experimentalists and analysts make correct inference about the quantity of 
-interest that is robust against randomness.
+In this document, we demonstrate the necessity of a proper experiment design
+with a generative model which we use to simulate data with "batch" effects. We
+show that a proper experiment design helps experimentalists and analysts make
+correct inference about the quantity of interest that is robust against
+randomness.
 
 ```{r, include = FALSE}
 knitr::opts_chunk$set(
@@ -27,27 +28,61 @@ library(tidyverse)
 
 ## A simple case study about plate effect: the background
 
-Assume we perform an experiment to test the effect of eleven drug candidates under development on cell viability. To do so, we treat cells in culture with a fixed concentration of each of the eleven candidates, and we treat cells with DMSO (dimethyl sulfoxide) as a vehicle control, since the drug candidates are all solved in DMSO solutions. 
-
-To assess the effect with regard to the variability intrinsic to the experiment setup, we measure the effect of each drug candidate (and DMSO) in eight different batches of cells, which are comparable to each other.
-
-In total, we have 96 samples: 11 drug candidates plus one DMSO control, 8 samples each. The samples neatly fit into a 96-well microtiter plate with 8 rows, and 12 columns.
-
-In order to avoid batch effects and to make the operation simple, all operations and measurements are done by the same careful operator and performed at the same time. The operator has two possibilities:
-
-1. She does *not* randomize the samples with regard to the plate layout. The naive layout will put each drug candidate or control (DMSO) in one column. For simplicity, let us assume that the cells treated with DMSO are put in column 1, and cells treated with the eleven drug candidates are put in columns 2 to 12.
-2. She randomizes the samples with regard to the plate layout, so that nearby samples are not necessarily of the same condition.
-
-What is the difference between the two variants? Option 2 apparently involves more planning and labor than option 1. If manual instead of robotic pipetting is involved, option 2 is likely error-prone. So why bothering considering the later option?
-
-Randomization pays off when unwanted variance is large enough so that it may distort our estimate of the quantity in which we are interested in. In our example, the unwanted variance may come from a *plate effect*: due to variances in temperature, humidity, and evaporation between wells in the plate, cells may respond differently to *even the same treatment*. Such *plate effects* are difficult to judge practically because they are not known prior to the experiment, unless a calibration study is performed where the cells in a microtiter plate are indeed treated with the same condition and measurements are performed in order to quantify the plate effect. However, it is simple to *simulate* such plate effects *in silico* with *a generative model*, and test the effect of randomization.
+Assume we perform an experiment to test the effect of eleven drug candidates
+under development on cell viability. To do so, we treat cells in culture with a
+fixed concentration of each of the eleven candidates, and we treat cells with
+DMSO (dimethyl sulfoxide) as a vehicle control, since the drug candidates are
+all dissolved in DMSO.
+
+To assess the effect with regard to the variability intrinsic to the experiment
+setup, we measure the effect of each drug candidate (and DMSO) in eight
+different batches of cells, which are comparable to each other.
+
+In total, we have 96 samples: 11 drug candidates plus one DMSO control, 8
+samples each. The samples neatly fit into a 96-well microtiter plate with 8
+rows, and 12 columns.
+
+In order to avoid batch effects and to make the operation simple, all operations
+and measurements are done by the same careful operator and performed at the same
+time. The operator has two possibilities:
+
+1. She does *not* randomize the samples with regard to the plate layout. The
+   naive layout will put each drug candidate or control (DMSO) in one column.
+   For simplicity, let us assume that the cells treated with DMSO are put in
+   column 1, and cells treated with the eleven drug candidates are put in
+   columns 2 to 12.
+2. She randomizes the samples with regard to the plate layout, so that nearby
+   samples are not necessarily of the same condition.
+
+What is the difference between the two variants? Option 2 apparently involves
+more planning and labor than option 1. If manual instead of robotic pipetting is
+involved, option 2 is likely error-prone. So why bother considering the latter
+option?
+
+Randomization pays off when unwanted variance is large enough so that it may
+distort our estimate of the quantity in which we are interested. In our
+example, the unwanted variance may come from a *plate effect*: due to variations
+in temperature, humidity, and evaporation between wells in the plate, cells may
+respond differently to *even the same treatment*. Such *plate effects* are
+difficult to judge practically because they are not known prior to the
+experiment, unless a calibration study is performed where the cells in a
+microtiter plate are indeed treated with the same condition and measurements are
+performed in order to quantify the plate effect. However, it is simple to
+*simulate* such plate effects *in silico* with *a generative model*, and test
+the effect of randomization.
 
 For simplicity, we make following further assumptions:
 
-(1) The plate effect is radial, i.e. cells in wells on the edges are more affected by than cells in wells in the middle of the plate. 
-(2) The plate effect is positive, i.e. cells in edge wells show higher viability than cells in the middle wells.
-(3) None of the tested compounds regulate cell viability significantly, i.e. cells treated with compounds and cell treated with DMSO control have the same expected value of viability. We simulate the effect of DMSO and compounds by drawing random samples from a normal distribution.
-(4) The true effect of compounds and the plate effect is additive, i.e. our measurement is the sum of the true effect and the plate effect.
+(1) The plate effect is radial, i.e. cells in wells on the edges are more
+    affected than cells in wells in the middle of the plate.
+(2) The plate effect is positive, i.e. cells in edge wells show higher viability
+    than cells in the middle wells.
+(3) None of the tested compounds regulate cell viability significantly, i.e.
+  cells treated with compounds and cells treated with DMSO control have the same
+  expected value of viability. We simulate the effect of DMSO and compounds by
+  drawing random samples from a normal distribution.
+(4) The true effect of compounds and the plate effect are additive, i.e. our
+    measurement is the sum of the true effect and the plate effect.
 
 ```{r}
 set.seed(2307111)
@@ -73,7 +108,11 @@ head(dat) |> gt::gt()
 
 ## Simulating a study in which randomization is not used
 
-First we simulate a study in which randomization is not used. In this context, it means that the treatment (controls and compounds in columns) and the plate effect are correlated. The following plot visualizes the layout of the plate, the true effect, the plate effect, and the measurement as a sum of the true effect and the plate effect.
+First we simulate a study in which randomization is not used. In this context,
+it means that the treatment (controls and compounds in columns) and the plate
+effect are correlated. The following plot visualizes the layout of the plate,
+the true effect, the plate effect, and the measurement as a sum of the true
+effect and the plate effect.
 
 ```{r rawPlatePlots, fig.height=5.5, fig.width=8}
 cowplot::plot_grid(
@@ -99,26 +138,34 @@ cowplot::plot_grid(
 )
 ```
 
-When we perform an one-way ANOVA test with the true effect, the F-test suggests that there are no significant differences between the treatments (p>0.05).
+When we perform a one-way ANOVA test with the true effect, the F-test suggests
+that there are no significant differences between the treatments (p>0.05).
 
 ```{r}
 summary(aov(trueEffect ~ Compound, data = dat))
 ```
 
-However, if we consider the measurement, which sums the true effect and the plate effect, the F-test suggests that there are significant differences between the compounds (p<0.01).
+However, if we consider the measurement, which sums the true effect and the
+plate effect, the F-test suggests that there are significant differences between
+the compounds (p<0.01).
 
 ```{r}
 summary(aov(measurement ~ Compound, data = dat))
 ```
 
-To verify, we calculate Turkey's honest significant differences using true effect. As expected, no single compound shows significant difference from the effect of DMSO (adjusted p-value>0.05)
+To verify, we calculate Tukey's honest significant differences using the true
+effect. As expected, no single compound shows a significant difference from the
+effect of DMSO (adjusted p-value>0.05).
+
 ```{r}
 versusDMSO <- paste0(conditions[-1], "-", conditions[1])
 trueDiff <- TukeyHSD(aov(trueEffect ~ Compound, data = dat))$Compound
 trueDiff[versusDMSO, ]
 ```
 
-However, calculating the differences with measurements reveal that Compound 6 would have a significant difference in viability from that of DMSO (adjusted p<0.01).
+However, calculating the differences with measurements reveals that Compound 6
+would have a significant difference in viability from that of DMSO (adjusted
+p<0.01).
 
 ```{r}
 measureDiff <- TukeyHSD(aov(measurement ~ Compound,
@@ -140,11 +187,20 @@ ggplot(
 ```
 
 
-Given that our simulation study assumed that no single compound affects cell viability significantly differently from DMSO controls. So the addition of plate effect causes one false discovery in this simulation. It can be expected that the false-discovery rate may vary depending on the relative strength and variability of the plate effect with regard to the true effects. What matters most is the observation that in the presence of plate effect, a lack of randomization, i.e. a correlation of treatment with plate positions, may cause wrong inferences.
+Our simulation study assumed that no single compound affects cell viability
+significantly differently from DMSO controls, so the addition of the plate
+effect causes one false discovery in this simulation. It can be expected that
+the false-discovery rate may vary depending on the relative strength and
+variability of the plate effect with regard to the true effects. What matters
+most is the observation that in the presence of plate effect, a lack of
+randomization, i.e. a correlation of treatment with plate positions, may cause
+wrong inferences.
 
 ## Randomization prevents plate effect from interfering with inferences
 
-Now we use the all but one assumptions made above, with the only change that we shall randomize the layout of the samples. The randomization will break the correlation between treatments and plate effects.
+Now we keep the assumptions made above, with the only change that we
+shall randomize the layout of the samples. The randomization will break the
+correlation between treatments and plate effects.
 
 ```{r}
 set.seed(2307111)
@@ -187,7 +243,8 @@ cowplot::plot_grid(
 )
 ```
 
-When we apply the F-test, we detect no significant differences between any compound and DMSO.
+When we apply Tukey's test, we detect no significant differences between any
+compound and DMSO.
 
 ```{r}
 randMeasureDiff <- TukeyHSD(aov(measurement ~ Compound,
@@ -196,7 +253,9 @@ randMeasureDiff <- TukeyHSD(aov(measurement ~ Compound,
 randMeasureDiff[versusDMSO, ]
 ```
 
-We can also use the boxplot as a visual help to inspect the difference between the treatments, to confirm that randomization prevents plate effect from affecting the statistical inference.
+We can also use the boxplot as a visual aid to inspect the difference between
+the treatments, to confirm that randomization prevents plate effect from
+affecting the statistical inference.
 
 ```{r randBoxplot, fig.height=5, fig.width=5}
 ggplot(
@@ -210,6 +269,17 @@ ggplot(
 
 ## Discussions and conclusions
 
-The simple case study discussed in this vignette is an application of generative models, which means that assuming that we know the mechanism by which the data is generated, we can simulate the data generation process and use it for various purposes. In our cases, we simulated a linear additive model of true effects of compounds and control on cell viability and the plate effect induced by positions in a microtitre plate. Using the model, we demonstrate that (1) plate effect can impact statistical inference by introducing false positive (and in other case, false negative) findings, and (2) a full randomization can guard statistical inference by reducing the bias of the plate effect.
-
-While the case study is on the margin of being overly simple, we hope that it demonstrates the advantage of appropriate experiment design using tools like \textit{DesignIt}, as well as the necessity of statistical techniques such as randomization and blocking in drug discovery and development.
+The simple case study discussed in this vignette is an application of generative
+models, which means that assuming that we know the mechanism by which the data
+is generated, we can simulate the data generation process and use it for various
+purposes. In our case, we simulated a linear additive model of true effects of
+compounds and control on cell viability and the plate effect induced by
+positions in a microtitre plate. Using the model, we demonstrate that (1) plate
+effect can impact statistical inference by introducing false positive (and in
+other cases, false negative) findings, and (2) a full randomization can guard
+statistical inference by reducing the bias of the plate effect.
+
+While the case study is on the margin of being overly simple, we hope that it
+demonstrates the advantage of appropriate experiment design using tools like
+`designit`, as well as the necessity of statistical techniques such as
+randomization and blocking in drug discovery and development.

From debd34f8654accccab909097d16392462805b205 Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Mon, 30 Sep 2024 18:06:26 +0200
Subject: [PATCH 17/24] remove duplicate entry from DESCRIPTION

---
 DESCRIPTION | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 4b21abc2..98de9331 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -32,11 +32,6 @@ Authors@R: c(
            role = c("aut", "cph"),
            email = "balazs.banfai@roche.com",
            comment = c(ORCID = "0000-0003-0422-7977")),
-    person(given = "Jitao David",
-           family = "Zhang",
-           role = c("aut", "cph"),
-           email = "jitao_david.zhang@roche.com",
-           comment = c(ORCID="0000-0002-3085-0909")),
     person(given = "F. Hoffman-La Roche", role = c("cph", "fnd")))
 Description:
     Intelligently assign samples to batches in order to reduce batch effects.

From 3354fcceca64fc6786b6626a53d87adc8f662e7a Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Tue, 1 Oct 2024 11:42:50 +0000
Subject: [PATCH 18/24] use more designit-centric approach (including
 randomization)

---
 vignettes/false_positives.Rmd | 102 ++++++++++++++++++++++++----------
 1 file changed, 73 insertions(+), 29 deletions(-)
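
Note for reviewers (not part of the commit): a minimal base-R sketch of why
the column-wise layout confounds treatment with the radial plate effect used
in this vignette. The data frame `plate` and the columns `byColumn` and
`shuffled` are hypothetical names introduced here for illustration, and the
shuffle is a plain permutation via `sample()`, not `optimize_design()`.

```r
# 8 x 12 plate: one row of the data frame per well
plate <- expand.grid(col = 1:12, row = 1:8)

# radial plate effect as in the vignette: half the distance to the plate centre
plate$plateEffect <- 0.5 * sqrt((plate$row - 4.5)^2 + (plate$col - 6.5)^2)

# non-randomized layout: each condition occupies exactly one column
conditions <- c("DMSO", sprintf("Compound%02d", 1:11))
plate$byColumn <- conditions[plate$col]

# shuffled layout (assumption: simple permutation of the condition labels)
set.seed(1)
plate$shuffled <- sample(plate$byColumn)

# mean plate effect per condition under each layout
colMeansEffect <- tapply(plate$plateEffect, plate$byColumn, mean)
shuffledMeansEffect <- tapply(plate$plateEffect, plate$shuffled, mean)

range(colMeansEffect)      # edge columns carry a larger mean plate effect
range(shuffledMeansEffect) # typically a narrower spread across conditions
```

With the column-wise layout, conditions in the outer columns (DMSO and
Compound11) receive a systematically larger plate effect than those in the
middle columns, which is the correlation that randomization is meant to break.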

diff --git a/vignettes/false_positives.Rmd b/vignettes/false_positives.Rmd
index 9b98da86..dc93059c 100644
--- a/vignettes/false_positives.Rmd
+++ b/vignettes/false_positives.Rmd
@@ -88,21 +88,30 @@ For simplicity, we make following further assumptions:
 set.seed(2307111)
 
 conditions <- c("DMSO", sprintf("Compound%02d", 1:11))
-# set up samples with conditions and true effects
-dat <- data.frame(
-  SampleIndex = 1:96,
-  Compound = factor(rep(conditions, 8), levels = conditions),
-  trueEffect = rnorm(96, mean = 10, sd = 1)
-)
-
-# add the layout plus plate effect
-dat <- dat |>
-  mutate(
-    row = rep(1:8, each = 12), col = rep(1:12, 8),
-    plateEffect = 0.5 * sqrt((row - 4.5)^2 + (col - 6.5)^2),
-    measurement = trueEffect + plateEffect
+# set up batch container
+bc <- BatchContainer$new(
+  dimensions = list(
+    row = 8, col = 12
+  )
+) |>
+  # assign samples with conditions and true effects
+  assign_in_order(
+    data.frame(
+      SampleIndex = 1:96,
+      Compound = factor(rep(conditions, 8), levels = conditions),
+      trueEffect = rnorm(96, mean = 10, sd = 1)
+    )
   )
 
+# get observations with batch effect
+get_observations <- function(bc) {
+  bc$get_samples() |>
+    mutate(
+      plateEffect = 0.5 * sqrt((row - 4.5)^2 + (col - 6.5)^2),
+      measurement = trueEffect + plateEffect
+    )
+}
+
 head(dat) |> gt::gt()
 ```
 
@@ -114,6 +123,10 @@ effect are correlated. The following plot visualizes the layout of the plate,
 the true effect, the plate effect, and the measurement as a sum of the true
 effect and the plate effect.
 
+```{r}
+dat <- get_observations(bc)
+```
+
 ```{r rawPlatePlots, fig.height=5.5, fig.width=8}
 cowplot::plot_grid(
   plotlist = list(
@@ -159,7 +172,10 @@ effect of DMSO (adjusted p-value>0.05)
 
 ```{r}
 versusDMSO <- paste0(conditions[-1], "-", conditions[1])
-trueDiff <- TukeyHSD(aov(trueEffect ~ Compound, data = dat))$Compound
+trueDiff <- TukeyHSD(aov(
+  trueEffect ~ Compound,
+  data = dat
+))$Compound
 trueDiff[versusDMSO, ]
 ```
 
@@ -202,19 +218,47 @@ Now we use the all but one assumptions made above, with the only change that we
 shall randomize the layout of the samples. The randomization will break the
 correlation between treatments and plate effects.
 
-```{r}
+We use the built-in function `mk_plate_scoring_functions` to define the scoring
+functions for the plate layout. We then use the `optimize_design` function to
+randomize the layout of the samples.
+
+```{r, eval=FALSE}
 set.seed(2307111)
 
-# add the layout plus plate effect
-randomized_dat <- dat |>
-  slice(sample(1:n())) |> # shuffle the order of samples in the dataset
-  mutate(
-    row = rep(1:8, each = 12), col = rep(1:12, 8),
-    plateEffect = 0.5 * sqrt((row - 4.5)^2 + (col - 6.5)^2),
-    measurement = trueEffect + plateEffect
+bc_rnd <- optimize_design(
+  bc,
+  scoring = mk_plate_scoring_functions(bc,
+    row = "row", column = "col",
+    group = "Compound"
   )
+)
+```
+
+```{r, include=FALSE}
+# this is quite slow, we use cached results
+# bc_rnd$get_samples(include_id=TRUE) |> pull(.sample_id) |> dput()
+
+bc_rnd <- bc$move_samples(
+  location_assignment =
+    c(
+      1, 24, 57, 4, 91, 94, 8, 47, 27, 26, 66, 65, 53,
+      67, 87, 13, 42, 60, 38, 86, 58, 21, 88, 71, 82, 18,
+      56, 11, 77, 64, 31, 45, 85, 25, 3, 36, 69, 75, 50,
+      96, 46, 83, 52, 89, 79, 78, 20, 92, 35, 2, 73, 32,
+      16, 9, 34, 63, 54, 41, 84, 19, 90, 40, 23, 55, 61,
+      29, 12, 68, 74, 39, 70, 33, 80, 5, 48, 15, 93, 49,
+      30, 10, 59, 7, 14, 28, 62, 22, 43, 6, 51, 44, 81,
+      72, 17, 76, 95, 37
+    )
+)
+```
+
+We add the plate effect to the randomized data and calculate the measurement.
+
+```{r}
+dat_rnd <- get_observations(bc_rnd)
 
-randomized_dat |>
+dat_rnd |>
   head() |>
   gt::gt()
 ```
@@ -222,20 +266,20 @@ randomized_dat |>
 ```{r randomPlatePlots, fig.height=5.5, fig.width=8}
 cowplot::plot_grid(
   plotlist = list(
-    plot_plate(randomized_dat,
+    plot_plate(dat_rnd,
       plate = plate,
       row = row, column = col, .color = Compound,
       title = "Layout by treatment"
     ),
-    plot_plate(randomized_dat,
+    plot_plate(dat_rnd,
       plate = plate, row = row, column = col, .color = trueEffect,
       title = "True effect"
     ),
-    plot_plate(randomized_dat,
+    plot_plate(dat_rnd,
       plate = plate, row = row, column = col, .color = plateEffect,
       title = "Plate effect"
     ),
-    plot_plate(randomized_dat,
+    plot_plate(dat_rnd,
       plate = plate, row = row, column = col, .color = measurement,
       title = "Measurement"
     )
@@ -248,7 +292,7 @@ compound and DMSO.
 
 ```{r}
 randMeasureDiff <- TukeyHSD(aov(measurement ~ Compound,
-  data = randomized_dat
+  data = dat_rnd
 ))$Compound
 randMeasureDiff[versusDMSO, ]
 ```
@@ -259,7 +303,7 @@ affecting the statistical inference.
 
 ```{r randBoxplot, fig.height=5, fig.width=5}
 ggplot(
-  randomized_dat,
+  dat_rnd,
   aes(x = Compound, y = measurement)
 ) +
   geom_boxplot() +

From 2ff18472ea6642d24c4571ea36a8934dc93b19c3 Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Tue, 1 Oct 2024 11:47:25 +0000
Subject: [PATCH 19/24] update basic_examples to the new $move_samples() syntax
 (bc <- ...)

---
 vignettes/basic_examples.Rmd | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/vignettes/basic_examples.Rmd b/vignettes/basic_examples.Rmd
index d9abd956..21bd4906 100644
--- a/vignettes/basic_examples.Rmd
+++ b/vignettes/basic_examples.Rmd
@@ -14,7 +14,7 @@ vignette: >
   %\VignetteEncoding{UTF-8}
 ---
 
-This vignette demonstrates the use of the _deisngit_ package with a series 
+This vignette demonstrates the use of the `deisngit` package with a series 
 of examples deriving from the same task, namely to randomize samples of a 
 two-factor experiment into plate layouts. We shall start with the most basic
 use and gradually exploring some basic yet useful utilities provided
@@ -107,7 +107,6 @@ bc <- BatchContainer$new(
 bc
 
 bc$n_locations
-bc$exclude
 bc$get_locations() |> head()
 ```
 
@@ -146,7 +145,7 @@ locations we can use the `batchContainer$move_samples()` method.
 To swap two or more samples, use
 
 ```{r, fig.width=6, fig.height=3.5}
-bc$move_samples(src = c(1L, 2L), dst = c(2L, 1L))
+bc <- bc$move_samples(src = c(1L, 2L), dst = c(2L, 1L))
 
 plot_plate(bc$get_samples(remove_empty_locations = TRUE),
   plate = plate, column = column, row = row,
@@ -159,7 +158,7 @@ To assign all samples in one go, use the option `location_assignment`.
 The example below orders samples by ID and adds the empty locations afterwards
 
 ```{r, fig.width=6, fig.height=3.5}
-bc$move_samples(
+bc <- bc$move_samples(
   location_assignment = c(
     1:nrow(samples),
     rep(NA, (bc$n_locations - nrow(samples)))
@@ -287,7 +286,7 @@ samples with the method `bc$move_samples()`.
 we can optimize the design, for instance by shuffling the samples.
 4. Various options are available to further customize the design.
 
-Now you have already the first experience of using _designit_ for randomization,
+Now you have already the first experience of using `designit` for randomization,
 it is time to apply the learning to your work. If you need more examples or 
 if you want to understand more details of the package, please explore other
 vignettes of the package as well as check out the documentations.

From 0b0d6584f65b423850d77ac2a8c61c99909cc2cb Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Tue, 1 Oct 2024 11:54:30 +0000
Subject: [PATCH 20/24] clarify move_samples is in-place in basic_examples
 vignette

---
 R/batch_container.R          |  2 +-
 man/BatchContainer.Rd        |  2 +-
 vignettes/basic_examples.Rmd | 10 +++++++---
 3 files changed, 9 insertions(+), 5 deletions(-)
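
Note for reviewers (not part of the commit): a minimal sketch of what
"in place" means for an R6 object, assuming the `R6` package is installed.
`ToyContainer` and its `move()` method are hypothetical and only mimic the
reference semantics; they are not the `BatchContainer` implementation.

```r
library(R6)

# Hypothetical toy class used only to illustrate R6 reference semantics
ToyContainer <- R6Class("ToyContainer",
  public = list(
    assignment = NULL,
    initialize = function(n) {
      self$assignment <- seq_len(n)
    },
    # place the samples found at `src` into positions `dst`, modifying self
    move = function(src, dst) {
      tmp <- self$assignment
      tmp[dst] <- self$assignment[src]
      self$assignment <- tmp
      invisible(self)
    }
  )
)

a <- ToyContainer$new(4)

# the method changes `a` itself; `b` is just another name for the same object
b <- a$move(src = c(1L, 2L), dst = c(2L, 1L))
identical(a, b)

# an independent copy needs an explicit clone
snapshot <- a$clone()
a$move(src = c(3L, 4L), dst = c(4L, 3L))
a$assignment
snapshot$assignment
```

The point of the sketch is that assigning the result of an in-place method to
a new name does not create a copy; when the earlier state is still needed,
cloning before the call is one way to keep it.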

diff --git a/R/batch_container.R b/R/batch_container.R
index 431e6cc2..9100c8e1 100644
--- a/R/batch_container.R
+++ b/R/batch_container.R
@@ -267,7 +267,7 @@ BatchContainer <- R6::R6Class("BatchContainer",
 
 
     #' @description
-    #' Move samples between locations
+    #' Move samples between locations modifying the BatchContainer in place
     #'
     #' This method can receive either `src` and `dst` or `locations_assignment`.
     #'
diff --git a/man/BatchContainer.Rd b/man/BatchContainer.Rd
index e5c32ac8..99561d70 100644
--- a/man/BatchContainer.Rd
+++ b/man/BatchContainer.Rd
@@ -179,7 +179,7 @@ A \code{\link[tibble:tibble]{tibble}} with all the available locations.
 \if{html}{\out{<a id="method-BatchContainer-move_samples"></a>}}
 \if{latex}{\out{\hypertarget{method-BatchContainer-move_samples}{}}}
 \subsection{Method \code{move_samples()}}{
-Move samples between locations
+Move samples between locations modifying the BatchContainer in place
 
 This method can receive either \code{src} and \code{dst} or \code{locations_assignment}.
 \subsection{Usage}{
diff --git a/vignettes/basic_examples.Rmd b/vignettes/basic_examples.Rmd
index 21bd4906..d5169438 100644
--- a/vignettes/basic_examples.Rmd
+++ b/vignettes/basic_examples.Rmd
@@ -140,12 +140,16 @@ plot_plate(bc$get_samples(remove_empty_locations = TRUE),
 
 Sometimes we may wish to move samples, or to swap samples, or to manually 
 assign some locations. To move individual samples or manually assigning all
-locations we can use the `batchContainer$move_samples()` method.
+locations we can use the `BatchContainer$move_samples()` method.
+
+*Warning*: The `$move_samples()` method will modify the `BatchContainer` object
+in place. That is usually faster than creating a copy. Most of the time you
+will probably call `optimize_design()` instead of moving samples manually.
 
 To swap two or more samples, use
 
 ```{r, fig.width=6, fig.height=3.5}
-bc <- bc$move_samples(src = c(1L, 2L), dst = c(2L, 1L))
+bc$move_samples(src = c(1L, 2L), dst = c(2L, 1L))
 
 plot_plate(bc$get_samples(remove_empty_locations = TRUE),
   plate = plate, column = column, row = row,
@@ -158,7 +162,7 @@ To assign all samples in one go, use the option `location_assignment`.
 The example below orders samples by ID and adds the empty locations afterwards
 
 ```{r, fig.width=6, fig.height=3.5}
-bc <- bc$move_samples(
+bc$move_samples(
   location_assignment = c(
     1:nrow(samples),
     rep(NA, (bc$n_locations - nrow(samples)))

From 6e6dbf13fcefdfbe5ecd8c07a4b310ee5ba91d1b Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Tue, 1 Oct 2024 12:58:11 +0000
Subject: [PATCH 21/24] basic_examples vignette: improve conclusion

---
 vignettes/basic_examples.Rmd | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/vignettes/basic_examples.Rmd b/vignettes/basic_examples.Rmd
index d5169438..70576e9b 100644
--- a/vignettes/basic_examples.Rmd
+++ b/vignettes/basic_examples.Rmd
@@ -285,9 +285,11 @@ instance of `BatchContainer` with `BatchContainer$new()`.
 2. Use functions `assign_random` and `plot_plate` to assign samples randomly
 and to plot the plate layout. If necessary, you can retrieve the samples from
 the BatchContainer instance `bc` with the method `bc$get_samples()`, or move
-samples with the method `bc$move_samples()`.
-3. The scoring function of `bc` can be set by `bc$scoring_f`. Once it is set,
-we can optimize the design, for instance by shuffling the samples.
+samples with the method `bc$move_samples()`. The better approach usually is to 
+optimize the design with `optimize_design()`.
+3. The scoring function can be set by passinrg `scoring` parameter to the
+`optimize_design()` function. The sample assignent is optimized by shuffling
+the samples.
 4. Various options are available to further customize the design.
 
 Now you have already the first experience of using `designit` for randomization,

From f5899c7450fe37241f4c6ca2945605331582cbdb Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Tue, 1 Oct 2024 14:59:26 +0200
Subject: [PATCH 22/24] remove empty line

---
 vignettes/osat.Rmd | 1 -
 1 file changed, 1 deletion(-)

diff --git a/vignettes/osat.Rmd b/vignettes/osat.Rmd
index 1f43c80e..b7d7e1fb 100644
--- a/vignettes/osat.Rmd
+++ b/vignettes/osat.Rmd
@@ -40,7 +40,6 @@ samples <- read_tsv(file.path(osat_data_path, "samples.txt"),
 ```
 
 # Running OSAT optimization
-
 Here we use OSAT to optimize setup.
 ```{r}
 gs <- OSAT::setup.sample(samples, optimal = c("SampleType", "Race", "AgeGrp"))

From c35ef9231286116ec4db9a7f0a78a0d65b8f13ea Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Tue, 1 Oct 2024 13:05:06 +0000
Subject: [PATCH 23/24] rearrange false positives code and fix rendering error

---
 vignettes/false_positives.Rmd | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/vignettes/false_positives.Rmd b/vignettes/false_positives.Rmd
index dc93059c..64d68ad9 100644
--- a/vignettes/false_positives.Rmd
+++ b/vignettes/false_positives.Rmd
@@ -102,7 +102,17 @@ bc <- BatchContainer$new(
       trueEffect = rnorm(96, mean = 10, sd = 1)
     )
   )
+```
+
+## Simulating a study in which randomization is not used
 
+First we simulate a study in which randomization is not used. In this context,
+it means that the treatment (controls and compounds in columns) and the plate
+effect are correlated. The following plot visualizes the layout of the plate,
+the true effect, the plate effect, and the measurement as a sum of the true
+effect and the plate effect.
+
+```{r}
 # get observations with batch effect
 get_observations <- function(bc) {
   bc$get_samples() |>
@@ -111,20 +121,12 @@ get_observations <- function(bc) {
       measurement = trueEffect + plateEffect
     )
 }
-
-head(dat) |> gt::gt()
 ```
 
-## Simulating a study in which randomization is not used
-
-First we simulate a study in which randomization is not used. In this context,
-it means that the treatment (controls and compounds in columns) and the plate
-effect are correlated. The following plot visualizes the layout of the plate,
-the true effect, the plate effect, and the measurement as a sum of the true
-effect and the plate effect.
-
 ```{r}
 dat <- get_observations(bc)
+
+head(dat) |> gt::gt()
 ```
 
 ```{r rawPlatePlots, fig.height=5.5, fig.width=8}

From e0cd8c99fa723948fd91ad514d7884bd911ae895 Mon Sep 17 00:00:00 2001
From: Iakov Davydov <671660+idavydov@users.noreply.github.com>
Date: Wed, 16 Oct 2024 12:15:36 +0200
Subject: [PATCH 24/24] code review by Juliane

Co-authored-by: julianesiebourg <51031392+julianesiebourg@users.noreply.github.com>
---
 vignettes/basic_examples.Rmd | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/vignettes/basic_examples.Rmd b/vignettes/basic_examples.Rmd
index 70576e9b..f0745a5e 100644
--- a/vignettes/basic_examples.Rmd
+++ b/vignettes/basic_examples.Rmd
@@ -14,7 +14,7 @@ vignette: >
   %\VignetteEncoding{UTF-8}
 ---
 
-This vignette demonstrates the use of the `deisngit` package with a series 
+This vignette demonstrates the use of the `designit` package with a series
 of examples deriving from the same task, namely to randomize samples of a 
 two-factor experiment into plate layouts. We shall start with the most basic
 use and gradually exploring some basic yet useful utilities provided
@@ -76,7 +76,7 @@ samples <- bind_rows(replicate(n_reps, animals, simplify = FALSE),
 
 samples |>
   head(10) |>
-  arrange(animal, group, replicate) %>%
+  arrange(animal, group, replicate) |>
   gt::gt()
 ```
 
@@ -287,7 +287,7 @@ and to plot the plate layout. If necessary, you can retrieve the samples from
 the BatchContainer instance `bc` with the method `bc$get_samples()`, or move
 samples with the method `bc$move_samples()`. The better approach usually is to 
 optimize the design with `optimize_design()`.
-3. The scoring function can be set by passinrg `scoring` parameter to the
+3. The scoring function can be set by passing `scoring` parameter to the
 `optimize_design()` function. The sample assignent is optimized by shuffling
 the samples.
 4. Various options are available to further customize the design.