gamlss vs. mgcv conflicting results of KS test #454

Open

florianhartig opened this issue Nov 28, 2024 · 1 comment

@florianhartig (Owner)
Here is an example from a question I received via email, showing that the KS test p-value for a gamlss model differs from the KS test on an equivalent mgcv model. The question is why, and whether this indicates a problem.

library(DHARMa)
library(gamlss)
library(mgcv)

# Simulation function: draw new responses from the fitted gamlss model
simulate_mod <- function(model, n = 1000) { # n = default number of simulations in DHARMa
  fam <- model$family[1]                    # family name, e.g. "NBI"
  random_generator <- get(paste0("r", fam)) # matching r<family>() generator from gamlss.dist
  pred <- predict(model, type = "response") # fitted mu for each observation
  nObs <- length(pred)
  sim <- matrix(nrow = nObs, ncol = n)
  for(i in 1:n) sim[,i] <- random_generator(nObs, pred) # note: sigma is left at its default
  return(sim)
}

# Sample model - Negative Binomial (gamlss family NBI)
set.seed(2024)
y <- rNBI(1000)
mod <- gamlss(y ~ 1, family = "NBI", data = as.data.frame(y))

sim <- simulate_mod(mod)

DHARMa_res <- createDHARMa(simulatedResponse = sim,
                           observedResponse = eval(mod$call$data) |> dplyr::pull(toString(mod$call$formula[[2]])),
                           fittedPredictedResponse = predict(mod),
                           integerResponse = T)

testUniformity(DHARMa_res)

## mgcv equivalent
mod2 <- gam(y ~ 1, family = nb)
simulationOutput <- simulateResiduals(fittedModel = mod2, n = 1000)
testUniformity(simulationOutput)
@florianhartig (Owner, Author) commented Nov 28, 2024

OK, this is not a problem or an error. You have to consider that randomized quantile residuals are, as the name says, randomized. The randomization smooths out the integer-valued observations and is done internally in DHARMa.
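To illustrate the principle (this is only a sketch of the idea, not DHARMa's internal code): for an integer observation y, the quantile residual is drawn uniformly between the predicted CDF just below y and the CDF at y. Using stats::pnbinom with size = 1/sigma as the equivalent of the NBI distribution with mu = sigma = 1:

y_obs <- rNBI(10)                           # ten NBI counts, mu = sigma = 1
lo <- pnbinom(y_obs - 1, size = 1, mu = 1)  # CDF just below each observation
hi <- pnbinom(y_obs, size = 1, mu = 1)      # CDF at each observation
runif(10, min = lo, max = hi)               # randomized quantile residuals - different on every call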

In principle, each time you create a residual plot, you would get slightly different values. This is also not solved by setting a larger n, as the randomization is over the observations, not the simulations (i.e. it would be reduced by increasing the number of observations, but not the number of simulations). To avoid confusing users, DHARMa fixes the random seed while calculating the residuals, but you can override this and play around with the randomisation by trying out different values of the seed in

simulationOutput <- simulateResiduals(fittedModel = mod2, seed = 2)
testUniformity(simulationOutput)

Running this will allow you to get an appreciation of the "natural variation" of the randomized quantile residuals.
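For example, something along these lines (just a sketch; testUniformity() returns the ks.test result, so the p-value can be extracted directly):

pvals <- sapply(1:20, function(s) {
  res <- simulateResiduals(fittedModel = mod2, n = 1000, seed = s)
  testUniformity(res, plot = FALSE)$p.value
})
summary(pvals) # spread of the KS p-values that is due to the randomization alone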

This "fixing" of the seed will, however, not work across different model packages, as they don't all do the simulations in exactly the same way. Therefore, you may get slightly different residuals when you use different regression packages on the same data. In your case, you wrote the simulation function by hand, so you can just also just rerun the block

sim <- simulate_mod(mod)
DHARMa_res <- createDHARMa(simulatedResponse = sim,
                           observedResponse = eval(mod$call$data) |> dplyr::pull(toString(mod$call$formula[[2]])),
                           fittedPredictedResponse = predict(mod),
                           integerResponse = T)

and you will get slightly different residuals.

Note that this is not an issue - the variation in the residuals is accounted for in the DHARMa tests, and as you see, although you get slightly different values, the overall result is correct: in most cases (I would expect around 95%) the KS test is n.s.

I would only see a reason for concern if you note that type I error rates are different between packages or wrong for a particular package.
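If you wanted to check this, a rough sketch would be to repeatedly simulate data from the true model, refit, and record how often the KS test rejects (the number of replicates is kept small here for speed; in practice you would use more):

reject <- replicate(100, {
  ysim <- rNBI(1000)                                   # new data from the true model
  fit <- gam(ysim ~ 1, family = nb)
  res <- simulateResiduals(fittedModel = fit, n = 250)
  testUniformity(res, plot = FALSE)$p.value < 0.05
})
mean(reject) # should be close to the nominal 0.05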
