diff --git a/DHARMa/vignettes/DHARMa.Rmd b/DHARMa/vignettes/DHARMa.Rmd index 26280be1..0311a381 100644 --- a/DHARMa/vignettes/DHARMa.Rmd +++ b/DHARMa/vignettes/DHARMa.Rmd @@ -104,61 +104,64 @@ Let's assume we have a fitted model that is supported by DHARMa. ```{r} testData = createData(sampleSize = 250) -fittedModel <- glmer(observedResponse ~ Environment1 + (1|group) , family = "poisson", data = testData) +fittedModel <- glmer(observedResponse ~ Environment1 + (1|group) , + family = "poisson", data = testData) ``` -Most functions in DHARMa could be calculated directly on the fitted model. So, for example, if you are only interested in testing dispersion, you could calculate +Most functions in DHARMa can be calculated directly on the fitted model object. For example, if you are only interested in testing for dispersion problems, you could run -```{r, fig.show='hide'} +```{r, results = "hide", fig.show='hide'} testDispersion(fittedModel) ``` -In this case, the randomized quantile residuals are calculated on the fly. However, residual calculation can take a while, and would have to be repeated by every other test you call. It is therefore highly recommended to first calculate the residuals once, using the simulateResiduals() function. +In this case, the randomized quantile residuals are calculated on the fly inside the function call. If you work in this way, however, residual calculation will be repeated by every test / plot you call, and this can take a while. It is therefore highly recommended to first calculate the residuals once, using the simulateResiduals() function -```{r, fig.show='hide'} -simulationOutput <- simulateResiduals(fittedModel = fittedModel, plot = T) +```{r} +simulationOutput <- simulateResiduals(fittedModel = fittedModel, plot = F) ``` -The simulateResiduals function calculates randomized quantile residuals according to the algorithm discussed above, i.e. it a) creates n new synthetic datasets by simulating from the fitted model, b) calculates the cumulative distribution of simulated values for each observed value, and c) calculates the quantile (residual) value that corresponds to the observed value. Those quantiles are called "scaled residuals" in DHARMa. For example, a scaled residual value of 0.5 means that half of the simulated data are higher than the observed value, and half of them lower. A value of 0.99 would mean that nearly all simulated data are lower than the observed value. The minimum/maximum values for the residuals are 0 and 1. The function returns an object of class DHARMa, containing the simulations and the scaled residuals, which can then be passed on to all other plots and test functions. When specifying the optional argument plot = T, the standard DHARMa residual plot is displayed directly, which will be discussed below. The calculated residuals can be accessed via +which calculates calculates randomized quantile residuals according to the algorithm discussed above. The function returns an object of class DHARMa, containing the simulations and the scaled residuals, which can later be passed on to all other plots and test functions. When specifying the optional argument plot = T, the standard DHARMa residual plot is displayed directly. The interpretation of the plot will be discussed below. Using the simulateResiduals function has the added benefit that you can modify the way in which residuals are calculated. For example, you may want to change the number of simulations, or the REs to condition on. See ?simulateResiduals and section "simulation options" below for details. + +The calculated (scaled) residuals can be plotted and tested via a number of DHARMa functions (see below), or accessed directly via ```{r, results = "hide"} residuals(simulationOutput) ``` -Using the simulateResiduals function has the added benefit that you can modify the way in which residuals are calculated. For example, the default number of simulations to run is n = 250, which proved to be a reasonable compromise between computation time and precision, but if high precision is desired, n should be raised to 1000 at least. - -As discussed above, for a correctly specified model we would expect asymptotically +To interpret the residuals, remember that a scaled residual value of 0.5 means that half of the simulated data are higher than the observed value, and half of them lower. A value of 0.99 would mean that nearly all simulated data are lower than the observed value. The minimum/maximum values for the residuals are 0 and 1. For a correctly specified model we would expect asymptotically * a uniform (flat) distribution of the scaled residuals * uniformity in y direction if we plot against any predictor. -Note: the expected uniform distribution is the only differences to the linear regression that one has to keep in mind when interpreting DHARMa residuals. If you cannot get used to this and you must have residuals that behave exactly like a linear regression, you can transform the uniform distribution to another distribution, for example normal, in the residual() function via +Note: the uniform distribution is the only differences to "conventional" residuals as calculated for a linear regression. If you cannot get used to this, you can transform the uniform distribution to another distribution, for example normal, via + ```{r, eval = F} residuals(simulationOutput, quantileFunction = qnorm, outlierValues = c(-7,7)) ``` -These normal residuals will behave exactly like the residuals of a linear regression. However, for reasons of a) numeric stability with low number of simulations and b) my conviction that it is much easier to visually detect deviations from uniformity than normality, I would advice against using this transformation. +These normal residuals will behave exactly like the residuals of a linear regression. However, for reasons of a) numeric stability with low number of simulations, which makes it neccessary to descide on which value outliers are to be transformed and b) my conviction that it is much easier to visually detect deviations from uniformity than normality, DHARMa checks all residuals in the uniform space, and I would personally advice against using the transformation. ## Plotting the scaled residuals -The main plot function for DHARMa residuals is the plot.DHARMa() function +The main plot function for the calculated DHARMa object produced by simulateResiduals() is the plot.DHARMa() function ```{r} plot(simulationOutput) ``` -The plot function creates two plots, which can also be called separately +The function creates two plots, which can also be called separately, and provide extended explanations / examples in the help ```{r, eval = F} plotQQunif(simulationOutput) # left plot in plot.DHARMa() plotResiduals(simulationOutput) # right plot in plot.DHARMa() ``` -* plotQQunif creates a qq-plot to detect overall deviations from the expected distribution, by default with added tests for correct distribution (KS test), dispersion and outliers. Note that outliers in DHARMa are values that are by default defined as values outside the simulation envelope, not in terms of a particular quantile. Thus, which values will appear as outliers will depend on the number of simulations. If you want outliers in terms of a particuar quantile, you can use the outliers() function. -* plotResiduals produces a plot of the residuals against the predicted value (or alternatively, other variable). Simulation outliers (data points that are outside the range of simulated values) are highlighted as red stars. These points should be carefully interpreted, because we actually don't know "how much" these values deviate from the model expectation. Note also that the probability of an outlier depends on the number of simulations (in fact, it is 1/(nSim +1) for each side), so whether the existence of outliers is a reason for concern depends also on the number of simulations. +* plotQQunif (left panel) creates a qq-plot to detect overall deviations from the expected distribution, by default with added tests for correct distribution (KS test), dispersion and outliers. Note that outliers in DHARMa are values that are by default defined as values outside the simulation envelope, not in terms of a particular quantile. Thus, which values will appear as outliers will depend on the number of simulations. If you want outliers in terms of a particuar quantile, you can use the outliers() function. -To provide a visual aid in detecting deviations from uniformity in y-direction, the plot function calculates an (optional) quantile regression, which compares the empirical 0.25, 0.5 and 0.75 quantiles in y direction (red solid lines) with the theoretical 0.25, 0.5 and 0.75 quantiles (dashed black line), and provides a p-value for the deviation from the expected quantile. The significance of the deviation to the expected quantiles is tested and displayed visually, and can be additionally extracted with the testQuantiles function. +* plotResiduals (right panel) produces a plot of the residuals against the predicted value (or alternatively, other variable). Simulation outliers (data points that are outside the range of simulated values) are highlighted as red stars. These points should be carefully interpreted, because we actually don't know "how much" these values deviate from the model expectation. Note also that the probability of an outlier depends on the number of simulations, so whether the existence of outliers is a reason for concern depends also on the number of simulations. + +To provide a visual aid in detecting deviations from uniformity in y-direction, the plot function calculates an (optional default) quantile regression, which compares the empirical 0.25, 0.5 and 0.75 quantiles in y direction (red solid lines) with the theoretical 0.25, 0.5 and 0.75 quantiles (dashed black line), and provides a p-value for the deviation from the expected quantile. The significance of the deviation to the expected quantiles is tested and displayed visually, and can be additionally extracted with the testQuantiles function. By default, plotResiduals plots against predicted values. However, you can also use it to plot residuals against a specific other predictors (highly recommend). @@ -166,7 +169,7 @@ By default, plotResiduals plots against predicted values. However, you can also plotResiduals(simulationOutput, form = YOURPREDICTOR) ``` -If the predictor is a factor, or if there is just a small number of observations on the x axis, plotResiduals will plot a box plot instead of a scatter plot. +If the predictor is a factor, or if there is just a small number of observations on the x axis, plotResiduals will plot a box plot with additional tests instead of a scatter plot. ```{r, eval = F} plotResiduals(simulationOutput, form = testData$group) @@ -257,6 +260,8 @@ Moreover (general advice), to ensure reproducibility, it's advisable to add a se # Interpreting residuals and recognizing misspecification problems +## General remarks on interperting residual patterns and tests + In all plots / tests that were shown so far, the model was correctly specified, resulting in "perfect" residual plots. In this section, we discuss how to recognize and interpret model misspecifications in the scaled residuals. Note, however, that 1. The fact that none of the here-presented tests shows a misspecification problem doesn't proof that the model is correctly specified. There are likely a large number of structural problems that will not show a pattern in the standard residual plots. @@ -265,22 +270,20 @@ In all plots / tests that were shown so far, the model was correctly specified, **Important conclusion: DHARMa only flags a difference between the observed and expected data - the user has to decide whether this difference is actually a problem for the analysis!** -## Overdispersion / underdispersion +This leads us to another *A word of warning* that applies also to all tests: significance is NOT a measures of the strength of the residual pattern, it is a measure of the signa/noise ratio. Significance in hypothesis tests depends on at least 2 ingredients: strenght of the signal, and number of data points. Hence, the p-value alone is not a good indicator of the extent to which your residuals deviate from assumptions. Specifically, if you have a lot of data points, residual diagnostics will nearly inevitably become significant, because having a perfectly fitting model is very unlikely. That, however, doesn't necessarily mean that you need to change your model. The p-values confirm that there is a deviation from your null hypothesis. It is, however, in your discretion to decide whether this deviation is worth worrying about. For example, if you see a dispersion parameter of 1.01, I would not worry, even if the dispersion test is significant. A significant value of 5, however, is clearly a reason to move to a model that accounts for overdispersion. -The most common concern for GLMMs is overdispersion, underdispersion and zero-inflation. +## Recognizing over/underdispersion -Over/underdispersion refers to the phenomenon that residual variance is larger/smaller than expected under the fitted model. Over/underdispersion can appear for any distributional family with fixed variance, in particular for Poisson and binomial models. +GL(M)Ms often display over/underdispersion, which means that residual variance is larger/smaller than expected under the fitted model. This phenomenon is most common for GLM families with constant (fixed) dispersion, in particular for Poisson and binomial models, but it can also occur in GLM families that adjust the variance (such as the beta or negative binomial) when distribution assumptions are violated. A few general rules of thumb about dealing with dispersion problems: -A few general rules of thumb - -* You can detect overdispersion / zero-inflation only AFTER fitting the model +* Dispersion is a property of the residuals, i.e. you can detect dispersion problems only AFTER fitting the model. It doesn't make sense to look at the dispersion of your response variable * Overdispersion is more common than underdispersion -* If overdispersion is present, confidence intervals tend to be too narrow, and p-values to small. The opposite is true for underdispersion +* If overdispersion is present, the main effect is that confidence intervals tend to be too narrow, and p-values to small, leading to inflated type I error. The opposite is true for underdispersion, i.e. the main issue of underdispersion is that you loose power. * A common reason for overdispersion is a misspecified model. When overdispersion is detected, one should therefore first search for problems in the model specification (e.g. by plotting residuals against predictors with DHARMa), and only if this doesn't lead to success, overdispersion corrections such as individual-level random effects or changes in the distribution should be applied -#### An example of overdispersion +### Residual patterns of over/underdispersion -This this is how **overdispersion** looks like in the DHARMa residuals +This this is how **overdispersion** looks like in the DHARMa residuals. Note that we get more residuals around 0 and 1, which means that more residuals are in the tail of distribution than would be expected under the fitted model. ```{r} testData = createData(sampleSize = 200, overdispersion = 1.5, family = poisson()) @@ -290,11 +293,7 @@ simulationOutput <- simulateResiduals(fittedModel = fittedModel) plot(simulationOutput) ``` -Note that we get more residuals around 0 and 1, which means that more residuals are in the tail of distribution than would be expected under the fitted model. - -#### An example of underdispersion - -This is an example of underdispersion +This is an example of underdispersion. Here, we get too many residuals around 0.5, which means that we are not getting as many residuals as we would expect in the tail of the distribution than expected from the fitted model. ```{r} testData = createData(sampleSize = 500, intercept=0, fixedEffects = 2, overdispersion = 0, family = poisson(), roundPoissonVariance = 0.001, randomEffectVariance = 0) @@ -304,43 +303,38 @@ simulationOutput <- simulateResiduals(fittedModel = fittedModel) plot(simulationOutput) ``` -Here, we get too many residuals around 0.5, which means that we are not getting as many residuals as we would expect in the tail of the distribution than expected from the fitted model. - -#### Testing for over/underdispersion +### Formal tests for over/underdispersion -Although, as discussed above, over/underdispersion will show up in the residuals, and it's possible to detect it with the testUniformity function, simulations show that this test is less powerful than more targeted tests. DHARMa therefore contains several overdispersion tests that compare the dispersion of simulated residuals to the observed residuals. +Although, as discussed above, over/underdispersion will show up in the residuals, and it's possible to detect it with the testUniformity function, simulations show that this test is less powerful than more targeted tests. DHARMa contains several overdispersion tests that compare the dispersion of simulated residuals to the observed residuals. -1. *default:* a non-parametric test that compares the variance of the simulated residuals to the observed residuals (default), which has some analogy to the variance test implemented in R package aer -2. *PearsonChisq:* alternatively, DHARMa implements the parametric Pearson-chi2 test that is popular in the literature, suggested in the glmm Wiki, and implemented in some other R packages such as performance. +1. *default:* a non-parametric test that compares the variance of the simulated residuals to the observed residuals (default), which has some analogy to the variance test implemented in aer::dispersiontest +2. *PearsonChisq:* alternatively, DHARMa implements the Pearson-chi2 test that is popular in the literature, suggested in the glmm Wiki, and implemented in some other R packages such as performance::check_overdispersion 3. *refit* if residual simulations are done via refit, DHARMa will compare the the Pearson residuals of the re-fitted simulations to the original Pearson residuals. This is essentially a nonparametric version of test 2. -All of these tests are included in the testDispersion function, see help for details. +All of these tests are included in the testDispersion function, see ?testDispersion for details. ```{r overDispersionTest, echo = T, fig.width=4.5, fig.height=4.5} testDispersion(simulationOutput) ``` -IMPORTANT: we have made extensive simulations, which have shown that the various tests have certain advantages and disadvantages. The basic results are that: +IMPORTANT INFO: we have made extensive simulations, which have shown that the various tests have certain advantages and disadvantages. The basic results are that: * The most powerful and reliable test is option 3, but this costs a lot of time and is not available for all regression packages, as it requires that Pearson residuals are available -* The parametric Pearson-chi2 is fast if Pearson residuals are available, but based on a naive expectation of df and thus biased to underdispersion for mixed models, bias increasing with the number of RE levels. When testing only for overdispersion, this makes the bias conservative, but it also costs power. Overall, we do not see an advantage over option 1 -* The DHARMa default option 1 is fast, nearly unbiased (i.e. you can test under and overdispersion), and only slightly less powerful as test 3, PROVIDED that simulations are made conditional on the fitted REs. Note that the latter is not the DHARMa default, so you have to actively request conditional simulations, e.g. for lme4 by specifying re.form = NULL. +* Option 2, the parametric Pearson-chi2 is fast if Pearson residuals are available, but based on a naive expectation of df (counts RE as 1 df) and the test statistic is thus biased towards underdispersion for mixed models. Similar to the df approximation, Bias increasing with the number of RE levels. When testing only for overdispersion (alternative = "greater"), this makes the test more conservative, but it also costs power. +* The DHARMa default option 1 is fast, nearly unbiased (i.e. you can test under and overdispersion), and only slightly less powerful as test 3, PROVIDED that simulations are made conditional on the fitted REs. Note that the latter is not the DHARMa default, so you have to actively request conditional simulations, e.g. for lme4 by specifying re.form = NULL. Power compared to the parametric Pearson-chi2 test depends on the number of RE levels, it will be more powerful for typical number of RE levels. -As support for these statements, here results of the simulation, which compares the uniform (KS) test with the standard simuation-based test (conditional and unconditional) and the Pearson-chi2 test (two-sided and greater) for a Poisson GLMM with 30 group levels. +As support for these statements, here results of the simulation, which compares the uniform (KS) test with the standard simuation-based test (conditional and unconditional) and the Pearson-chi2 test (two-sided and greater) for an n=200 Poisson GLMM with 30 RE levels. - + -Thus, the current recommendation is: for most users, use the default DHARMa test, but create simulations conditionally. See ?testDispersion for details. +Thus, the current recommendation is: for most users, use the default DHARMa test, but create simulations conditionally. -*A word of warning* that applies also to all other tests that follow: significance in hypothesis tests depends on at least 2 ingredients: strenght of the signal, and number of data points. Hence, the p-value alone is not a good indicator of the extent to which your residuals deviate from assumptions. Specifically, if you have a lot of data points, residual diagnostics will nearly inevitably become significant, because having a perfectly fitting model is very unlikely. That, however, doesn't necessarily mean that you need to change your model. The p-values confirm that there is a deviation from your null hypothesis. It is, however, in your discretion to decide whether this deviation is worth worrying about. If you see a dispersion parameter of 1.01, I would not worry, even if the test is significant. A significant value of 5, however, is clearly a reason to move to a model that accounts for overdispersion. ## Zero-inflation / k-inflation or deficits -A common special case of overdispersion is zero-inflation, which is the situation when more zeros appear in the observation than expected under the fitted model. Zero-inflation requires special correction steps. - -More generally, we can also have too few zeros, or too much or too few of any other values. We'll discuss that at the end of this section +A common special case of overdispersion is zero-inflation, which is the situation when more zeros appear in the observation than expected under the fitted model. Zero-inflation requires special correction steps. More generally, we can also have too few zeros, or too much or too few of any other values. We'll discuss that at the end of this section. -#### An example of zero-inflation +### Residual patterns Here an example of a typical zero-inflated count dataset, plotted against the environmental predictor @@ -352,14 +346,9 @@ plot(testData$Environment1, testData$observedResponse, xlab = "Envrionmental Pre hist(testData$observedResponse, xlab = "Response", main = "") ``` -We see a hump-shaped dependence of the environment, but with too many zeros. - -#### Zero-inflation in the scaled residuals - -In the normal DHARMa residual, plots, zero-inflation will look pretty much like overdispersion +We see a hump-shaped dependence of the environment, but with too many zeros. In the normal DHARMa residual plots, zero-inflation will look pretty much like overdispersion ```{r} - fittedModel <- glmer(observedResponse ~ Environment1 + I(Environment1^2) + (1|group) , family = "poisson", data = testData) simulationOutput <- simulateResiduals(fittedModel = fittedModel) @@ -368,18 +357,15 @@ plot(simulationOutput) The reason is that the model will usually try to find a compromise between the zeros, and the other values, which will lead to excess variance in the residuals. -#### Test for zero-inflation +### Formal tests for zero-inflation -DHARMa has a special test for zero-inflation, which compares the distribution of expected zeros in the data against the observed zeros +DHARMa also has a special test for zero-inflation, which compares the distribution of expected zeros in the data against the observed zeros ```{r, fig.width=4, fig.height=4} - testZeroInflation(simulationOutput) ``` -This test is likely better suited for detecting zero-inflation than the standard plot, but note that also overdispersion will lead to excess zeros, so only seeing too many zeros is not a reliable diagnostics for moving towards a zero-inflated model. A reliable differentiation between overdispersion and zero-inflation will usually only be possible when directly comparing alternative models, e.g. through residual comparison / model selection of a model with / without zero-inflation, or by simply fitting a model with zero-inflation and looking at the parameter estimate for the zero-inflation. - -A good option is the R package glmmTMB, which is also supported by DHARMa. We can use this to fit +This test is likely better suited for detecting zero-inflation than the standard plot, but note that also overdispersion will lead to excess zeros, so only seeing too many zeros is not a reliable diagnostics for moving towards a zero-inflated model. A reliable differentiation between overdispersion and zero-inflation will usually only be possible when directly comparing alternative models, e.g. through residual comparison / model selection of a model with / without zero-inflation, or by simply fitting a model with zero-inflation and looking at the parameter estimate for the zero-inflation. A good option is the R package glmmTMB, which is also supported by DHARMa. We can use this to fit ```{r} library(glmmTMB) @@ -759,7 +745,7 @@ In particular in the latter case (prior dominates, which can be checked via prio Of course, also the MLE distribution might get problems in low data situations, but I would argue that MLE is usually only used anyway if the MLE is reasonably sharp. In practice, I have seldom experienced problems with MLE estimates. It's a bit different in the Bayesian case, where it is possible and often done to fit very complex models with limited data. In this case, many of the general issues in defining null distributions for Bayesian p-values (as, e.g., reviewed in [Conn et al., 2018](https://doi.org/10.1002/ecm.1314)) apply. -# Supported packages +# Supported packages and frameworks ## lm and glm @@ -789,9 +775,11 @@ GLMMadaptive is supported by DHARMa since 0.3.4. If confronted with an unsupported package, DHARMa will try to use standard S3 functions such as coef(), simulate() etc. to perform simulations. If no error occurs, a residual object will be calculated, and a warning will be provided that this package has not been checked for full functionality. In many cases, the results can be used though (but no guarantee, maybe check with null simulations if the results are OK). Other than that, see my general comments about [adding new R packages to DHARMa](https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA) -# Importing external simulations (e.g. from Bayesian software or unsupported packages) +## Importing external simulations (e.g. from Bayesian software or unsupported packages) + +DHARMa can also import external simulations from a fitted model via createDHARMa(), which will be interesting for unsupported packages and for Bayesians. -DHARMa can also import external simulations from a fitted model via createDHARMa(), which will be interesting for unsupported packages and for Bayesians. Bayesians should note the extra Vignette on "DHARMa for Bayesians". +**Bayesians should note the extra Vignette on "DHARMa for Bayesians" regarding the interpretation of these residuals.** Here, an example for how to create simulations for a Poisson glm. Of course, it doesn't make sense to do this as glm is a supported model class, but you could do the same in case you want to check a model class that is currently not supported by DHARMa.