Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize score generation and optimization #54

Open
8 tasks
banfai opened this issue Nov 25, 2024 · 4 comments
Open
8 tasks

Generalize score generation and optimization #54

banfai opened this issue Nov 25, 2024 · 4 comments
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@banfai
Copy link
Collaborator

banfai commented Nov 25, 2024

this should not need the whole batch container, it should work on subset:

mk_plate_scoring_functions <- function(batch_container, plate = NULL, row, column, group, p = 2, penalize_lines = "soft") {

list of lists, should work with group_by:

scoring_funcs <- purrr::map(within_plate_variables, ~ mk_plate_scoring_functions(bc, plate = plate, row = row, column = column, group = .x)) |>

TODO:

  • decide how to proceed
    1. to act on a sample subset bc |> group_by() |> optimize(<bc>, scoring_fun = mk_scoring(batch_var))
    2. or do it like bc |> group_by() |> mk_scoring("osat") |> optimize()
  • batch container NEEDS grouping (write a method to group_by the samplesheet and add a key to the bc)
  • we NEED a new function to work on a subset of samplesheet -> grid_2d_score_generator()
  • how to name it?
  • scoring function might collide with the shuffling shuffling needs to act on the subset as well
  • what would this break? -
    osat_score_generator <- function(batch_vars, feature_vars, quiet = FALSE) {
  • remove the "group_by" variable from the batch_vars?
  • shuffle function should also respect this

<1.> bc |> optimize(scoring = "osat") |> group_by("plate") |> optimize(scoring = "2d")
<2.> bc |> add_scoring_fun("osat", "plate") |> optimize() |> group_by("plate") |> add_scoring_fun("2d") |> optimize()

this should be generalizable to any grouping by batch, etc.

even more general:
bc |> add_scoring_fun("osat", "plate") |> add_shuffling_fun("row_swap") |> optimize()

@banfai banfai changed the title Split mk_plate_scoring_fun to be able to run on subsets Generalize score generation and optimization Nov 25, 2024
@banfai banfai added enhancement New feature or request help wanted Extra attention is needed labels Nov 25, 2024
@idavydov
Copy link
Collaborator

Hi @banfai ,

from what I understand there are two problems:

  1. some scoring functions (osat_score()) cache things. and we need some kind of cache invalidation when they switch from one working on a different group
  2. some scoring function (plate_scoring_func()) rely on batch container when they are created (i.e., before we even know which group will be the first one to be optimized)

I think the solution to this is to

a) make function creation agnostic of a batch container
b) think about a caching mechanism which can be used both for osat_score() and plate_scoring_func()

my personal opinion: group_by() (similar to rowwise()) is not a good approach. I think Hadley even considered removing group_by() because it was only affecting mutate() and summarize() and then it could be an explicit argument to those functions.

In our case group_by() will only affect a single function optimize_design(). so we can use it as a semantic sugar (similar to how ggplots are constructed), but a grouped batch container is a very difficult concept to grasp. and all the groupping/ungroupping will create more problems than solve.

I see more confusion than actual benefit.

I think the approach should be:

bc |> optimize(<bc>, scoring_fun = mk_scoring(batch_var), group_by=c(x, y))

@idavydov
Copy link
Collaborator

also note on add_scoring_fun and add_shuffling_fun, we had score as a part of BatchContainer, and we concluded that it doesn't conceptually belong there. the same with shuffling.

if we want to go this way, we should create an separate optimization object. which is by the way possible as a wrapper around optimize design.

@ingitwetrust
Copy link
Collaborator

Hi @idavydov . Just to explain how we thought yesterday in the discussion, addressing your 2 points above...

  1. The caching would not be an issue at all if we implement add_scoring_fun in a way that it generates n independent osat or whatever scorings and puts them into the bc, ready to be used for the n subgroups. So the scoring functions generators would not see the bc at all but just work on the details they have to know about the respective (sub)sample lists.
  2. We thought that those scoring functions should be rewritten ion a way that they don't see the bc. Regarding the plst_score_generator, we would like to change it so that it generates the scoring function just for a single 2d grid. Seems to be a cleaner setup.

So the idea would be to put all the sophistication into the add_scoring and add_shuffling functions and keep the actual generators of those scorings/shufflings as lean as possible. The pipe syntax seemed attryctive to the 3 of us present yesterday and it could work out like this...

@idavydov
Copy link
Collaborator

Some unstructured notes:

scoring_f <- osat_score_generator("plates", c("SampleType", "Race", "AgeGrp"))
optimize_design(bc, scoring = scoring_f, max_iter = params$iterations)

scoring_f <- cached_score_generator(osat_score, feature_vars="plates", batch_vars=c("SampleType", "Race", "AgeGrp"))
optimize_design(bc, scoring = scoring_f, max_iter = params$iterations)



score_factory <- function() {function(f, ...) {
  force(f)
  args <- list(...)
  function() {
    cache <- NULL
    function(dt) {
      res <- do.call(f, c(args,  cache = cache))
      cache <<- res$cache
      return(res$score_values)
    }
  }
}

bc |>
  optimize_design(score = osat_score, group_by = plate)

fast_osat_score <- cached_score_generator(osat_score, batch = "...", feature = "...")

bc |>
  optimize_design(score = , group_by = plate)

bc |>
  optimize_design(score = osat_score_generator(batch_var = y, features = x ))


scoring_f <- list(
  my_simple_score,
  \(dt) osat_score(dt, feature_vars="a", batch_vars="b"),
  score_generator(osat_score, feature_vars="a", batch_vars="b")
  ...
)
optimize_design(bc, scoring = scoring_f, max_iter = params$iterations)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants