Here is an example from a question I received via email, showing that the KS test p-value for a gamlss model differs from the KS test p-value for an equivalent mgcv model. The question is why, and whether this indicates a problem.
library(DHARMa)
library(gamlss)
library(mgcv)

# Hand-written simulation function for a gamlss fit
simulate_mod <- function(model, n = 1000) { # n = 1000 simulations, matching the simulateResiduals() call below
  fam <- model$family[1]                        # family name, e.g. "NBI"
  random_generator <- get(paste0("r", fam))     # matching random generator, e.g. rNBI
  pred <- predict(model, type = "response")     # fitted mu on the response scale
  nObs <- length(pred)
  sim <- matrix(nrow = nObs, ncol = n)
  # draw n simulated response vectors from the fitted mu
  # (other distribution parameters, e.g. sigma, are left at the generator's defaults)
  for (i in 1:n) sim[, i] <- random_generator(nObs, pred)
  return(sim)
}
# Sample model - negative binomial type 2 (gamlss family NBI)
set.seed(2024)
y <- rNBI(1000)
mod <- gamlss(y ~ 1, family = "NBI", data = as.data.frame(y))

# Simulate from the fitted model by hand and pass the simulations to DHARMa
sim <- simulate_mod(mod)
DHARMa_res <- createDHARMa(simulatedResponse = sim,
                           observedResponse = eval(mod$call$data) |>
                             dplyr::pull(toString(mod$call$formula[[2]])),
                           fittedPredictedResponse = predict(mod),
                           integerResponse = TRUE)
testUniformity(DHARMa_res)
## mgcv equivalent: fit the same intercept-only NB model and let DHARMa do the simulations
mod2 <- gam(y ~ 1, family = nb())
simulationOutput <- simulateResiduals(fittedModel = mod2, n = 1000)
testUniformity(simulationOutput)
OK, this is not a problem or an error. You have to consider that randomized quantile residuals are, as the name says, randomized. The randomization smooths out the integer-valued observations and is done internally in DHARMa.
In principle, each time you create a residual plot, you would get slightly different values. This is also not solved by setting a larger n, because the randomization is over observations, not simulations (i.e. it would be reduced by adding observations, but not by adding simulations). To avoid confusing users, while still letting them play around with the randomization, DHARMa fixes the random seed while calculating the residuals, but you can override this by trying out different values of the seed, for example as in the sketch below.
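A minimal sketch (this assumes the seed argument that createDHARMa() and simulateResiduals() expose, which defaults to 123; y, mod and sim are the objects created above):

for (s in 1:5) {
  res_s <- createDHARMa(simulatedResponse = sim,
                        observedResponse = y,
                        fittedPredictedResponse = predict(mod),
                        integerResponse = TRUE,
                        seed = s)        # change only the seed used for the randomization
  print(testUniformity(res_s, plot = FALSE)$p.value)
}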
Running this will allow you to get an appreciation of the "natural variation" of the randomized quantile residuals.
This "fixing" of the seed will, however, not work across different model packages, as they don't all do the simulations in exactly the same way. Therefore, you may get slightly different residuals when you use different regression packages on the same data. In your case, you wrote the simulation function by hand, so you can just also just rerun the block
Note that this is not an issue - the variation in the residuals is accounted for in the DHARMa tests, and as you see, although you get slightly different values, the overall result is correct: in most cases (I would expect around 95%) the KS test is n.s.
I would only see a reason for concern if you noticed that type I error rates differ between packages or are wrong for a particular package; a rough way to check this is sketched below.
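A sketch of such a check (a sketch only, and slow, because gamlss is refit in every iteration; gamlss.control(trace = FALSE) is used here just to silence the fitting output):

nSim <- 100
pvals <- replicate(nSim, {
  d <- data.frame(y = rNBI(1000))                 # fresh data set from the true model
  m <- gamlss(y ~ 1, family = "NBI", data = d,
              control = gamlss.control(trace = FALSE))
  sim_j <- simulate_mod(m)
  res_j <- createDHARMa(simulatedResponse = sim_j,
                        observedResponse = d$y,
                        fittedPredictedResponse = predict(m),
                        integerResponse = TRUE)
  testUniformity(res_j, plot = FALSE)$p.value
})
mean(pvals < 0.05)  # empirical type I error; should be close to the nominal 0.05

The same loop with gam(y ~ 1, family = nb(), data = d) and simulateResiduals() would let you compare the two packages directly.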