vignette updates
florianhartig committed May 11, 2020
1 parent c8682ee commit 849c354
Showing 1 changed file with 25 additions and 39 deletions.
64 changes: 25 additions & 39 deletions DHARMa/vignettes/DHARMa.Rmd
@@ -14,16 +14,15 @@ editor_options:
chunk_output_type: console
---

```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=8.5, fig.height=5.5, fig.align='center', warning=FALSE, message=FALSE, cache = T)
```

```{r, echo = F, message = F}
library(DHARMa)
set.seed(123)
```


```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=8.5, fig.height=5.5, fig.align='center', warning=FALSE, message=FALSE, cache = F)
```

# Motivation

The interpretation of conventional residuals for generalized linear (mixed) and other hierarchical statistical models is often problematic. As an example, here are the results of conventional Deviance, Pearson and raw residuals for two Poisson GLMMs, one that is lacking a quadratic effect, and one that fits the data perfectly. Could you tell which is the correct model?
@@ -110,21 +109,21 @@ testData = createData(sampleSize = 250)
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group) , family = "poisson", data = testData)
```

Most functions in DHARMa can be called directly on the fitted model. So, for example, if you are only interested in testing dispersion, you could calculate

```{r}
```{r, fig.show='hide'}
testDispersion(fittedModel)
```

In this case, the randomized quantile residuals are calculated on the fly. However, residual calculation can take a while and would have to be repeated for every other test you call. It is therefore highly recommended to first calculate the residuals once, using the simulateResiduals() function.

```{r}
```{r, fig.show='hide'}
simulationOutput <- simulateResiduals(fittedModel = fittedModel, plot = T)
```

The function implements the algorithm discussed above, i.e. it a) creates n new synthetic datasets by simulating from the fitted model, b) calculates the cumulative distribution of simulated values for each observed value, and c) calculates the quantile (residual) value that corresponds to the observed value. These quantiles are called "scaled residuals" in DHARMa. For example, a scaled residual value of 0.5 means that half of the simulated data are higher than the observed value, and half of them lower. A value of 0.99 would mean that nearly all simulated data are lower than the observed value. The minimum/maximum values for the residuals are 0 and 1. The function returns an object of class DHARMa, containing the simulations and the scaled residuals, which can then be passed on to all other plot and test functions. If the optional argument plot = T is specified, the standard DHARMa residual plot is displayed directly; it will be discussed below. The calculated residuals can be accessed via

```{r, eval = F}
```{r, results = "hide"}
residuals(simulationOutput)
```
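
To make the algorithm more concrete, here is a minimal conceptual sketch of steps a)-c), reusing the fittedModel and testData objects from above. This is an approximation, not the package internals: DHARMa additionally randomizes ties for discrete responses.

```{r, eval = F}
# Conceptual sketch of the scaled residuals (approximate; DHARMa additionally
# randomizes ties for discrete data): for each observation, the residual is the
# fraction of simulated values that fall below the observed value.
sims <- simulate(fittedModel, nsim = 250)   # a) synthetic datasets from the model
obs <- testData$observedResponse
scaledResidualsManual <- sapply(seq_along(obs), function(i) {
  mean(as.numeric(sims[i, ]) < obs[i])      # b) + c) empirical cumulative quantile
})
head(scaledResidualsManual)
```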

@@ -290,8 +289,8 @@ A few general rules of thumb
This is how **overdispersion** looks in the DHARMa residuals

```{r}
testData = createData(sampleSize = 500, overdispersion = 2, family = poisson())
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group) , family = "poisson", data = testData)
testData = createData(sampleSize = 200, overdispersion = 1.5, family = poisson())
fittedModel <- glm(observedResponse ~ Environment1 , family = "poisson", data = testData)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
plot(simulationOutput)
@@ -307,19 +306,10 @@ This is an example of underdispersion
testData = createData(sampleSize = 500, intercept=0, fixedEffects = 2, overdispersion = 0, family = poisson(), roundPoissonVariance = 0.001, randomEffectVariance = 0)
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group) , family = "poisson", data = testData)
summary(fittedModel)
# plotConventionalResiduals(fittedModel)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
plot(simulationOutput)
```

```{r fig.width=4.5, fig.height=4.5}
testUniformity(simulationOutput = simulationOutput)
```


Here, we get too many residuals around 0.5, which means that we are not getting as many residuals in the tails of the distribution as we would expect from the fitted model.
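
As a quick formal complement to the visual impression, a minimal sketch using the function introduced above (the default dispersion test is two-sided and therefore also reacts to underdispersion):

```{r, eval = F}
# The two-sided default of testDispersion should also flag underdispersion here
testDispersion(simulationOutput)
```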


@@ -439,14 +429,12 @@ simulationOutput <- simulateResiduals(fittedModel = fittedModel)
plot(simulationOutput)
```

The exact p-values for the quantile lines in the plot can be displayed via

```{r}
```{r, eval = F}
testQuantiles(simulationOutput)
plotResiduals(simulationOutput)
```


Adding a simple overdispersion correction will try to find a compromise between the different levels of dispersion in the model. The qq plot looks better now, but there is still a pattern in the residuals

```{r}
@@ -479,7 +467,7 @@ fittedModel <- glmer(observedResponse ~ Environment1 + Environment2 + (1|group)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
# plotConventionalResiduals(fittedModel)
plot(simulationOutput, quantreg = T)
testUniformity(simulationOutput = simulationOutput)
# testUniformity(simulationOutput = simulationOutput)
```

It is difficult to see that there is a problem at all in the general plot, but it becomes clear if we plot the residuals against the environmental predictor
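
A minimal sketch of such a plot (Environment1 is an assumed choice of predictor here; the full text may use a different one):

```{r, eval = F}
# Sketch: plotting the scaled residuals against a predictor can reveal the pattern
# (Environment1 is an assumed choice; substitute the predictor you suspect)
plotResiduals(simulationOutput, testData$Environment1)
```
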
@@ -506,13 +494,12 @@ simulationOutput <- simulateResiduals(fittedModel = fittedModel)

The function testTemporalAutocorrelation performs a Durbin-Watson test from the package lmtest on the uniform residuals to test for temporal autocorrelation, and additionally plots the residuals against time.

The function also has an option to perform the test against randomized time (H0). The purpose of this is to allow simulations that check whether the test has correct error rates in the respective situation, i.e. is not oversensitive (too high sensitivity has sometimes been reported for the Durbin-Watson test).

```{r, fig.width=4, fig.height=4}
```{r}
testTemporalAutocorrelation(simulationOutput = simulationOutput, time = testData$time)
testTemporalAutocorrelation(simulationOutput = simulationOutput)
```

If no time variable is provided, the function uses a random time (H0). Apart from testing, the purpose of this is to allow simulations that check whether the test has correct error rates in the respective situation, i.e. is not oversensitive (too high sensitivity has sometimes been reported for the Durbin-Watson test).

Note the general caveats about the DW test mentioned in the help of testTemporalAutocorrelation(). In general, as for spatial autocorrelation, it is difficult to specify a single test, because temporal and spatial autocorrelation can appear in many flavors: short-scale and long-scale, homogeneous or not, and so on. The pre-defined functions in DHARMa are a starting point, but they are not something you should rely on blindly.
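
As one possible hand-rolled starting point beyond the pre-defined function, a minimal sketch (assuming the time variable testData$time used above):

```{r, eval = F}
# Sketch: inspect the autocorrelation function of the scaled residuals,
# ordered by time, to see autocorrelation at several lags at once
scaledRes <- residuals(simulationOutput)
acf(scaledRes[order(testData$time)], main = "ACF of time-ordered scaled residuals")
```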

## Spatial autocorrelation
@@ -536,7 +523,7 @@ An additional test against randomized space (H0) can be performed, for the same

```{r, fig.width=4.5, fig.height=4.5}
testSpatialAutocorrelation(simulationOutput = simulationOutput, x = testData$x, y= testData$y)
testSpatialAutocorrelation(simulationOutput = simulationOutput)
# testSpatialAutocorrelation(simulationOutput = simulationOutput) # again, this uses random x,y
```

The usual caveats for Moran.I apply, in particular that it may miss non-local and heterogeneous (non-stationary) spatial autocorrelation. The former should be better detectable visually in the spatial plot, or via regressions on the pattern.
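
A minimal sketch of such a visual check for broader trends, reusing the coordinates from the chunk above:

```{r, eval = F}
# Sketch: non-local trends that Moran's I may miss can show up when plotting
# the scaled residuals against each coordinate separately
plotResiduals(simulationOutput, testData$x)
plotResiduals(simulationOutput, testData$y)
```
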
@@ -577,8 +564,9 @@ data$logDensity = log10(data$density.attack+1)
```


```{r, fig.width=4, fig.height=4}
plot(N_parasitized / (N_adult + N_parasitized ) ~ logDensity, xlab = "Density", ylab = "Proportion infected", data = data)
```{r, fig.width=5, fig.height=5}
plot(N_parasitized / (N_adult + N_parasitized ) ~ logDensity,
xlab = "Density", ylab = "Proportion infected", data = data)
```

Let's fit the data with a regular binomial n/k glm
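
A minimal sketch of what such a binomial n/k fit could look like (mod1 is a hypothetical name, and the exact call used in the full text may differ):

```{r, eval = F}
# Sketch of a binomial n/k GLM: successes = parasitized, failures = unparasitized
# adults, matching the proportion plotted above (mod1 is a hypothetical name)
mod1 <- glm(cbind(N_parasitized, N_adult) ~ logDensity, family = binomial, data = data)
simulateResiduals(mod1, plot = T)
```
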
@@ -645,9 +633,9 @@ Somewhat better, but not good. Move to neg binom, to adjust dispersion

```{r}
m3 <- glmmTMB(SiblingNegotiation ~ FoodTreatment*SexParent + offset(log(BroodSize)) + (1|Nest), data=Owls , family = nbinom1)
summary(m3)
res <- simulateResiduals(m3, plot = T)
par(mfrow = c(1,3))
plotResiduals(res, Owls$FoodTreatment)
testDispersion(res)
testZeroInflation(res)
@@ -657,10 +645,9 @@ We see underdispersion now. In a model with variable dispersion, this is often t

```{r}
m4 <- glmmTMB(SiblingNegotiation ~ FoodTreatment*SexParent + offset(log(BroodSize)) + (1|Nest), ziformula = ~ FoodTreatment + SexParent, data=Owls , family = nbinom1)
summary(m4)
res <- simulateResiduals(m4)
plot(res)
res <- simulateResiduals(m4, plot = T)
par(mfrow = c(1,3))
plotResiduals(res, Owls$FoodTreatment)
testDispersion(res)
testZeroInflation(res)
@@ -670,9 +657,9 @@ This looks a lot better. Trying a slightly different model specification

```{r}
m5 <- glmmTMB(SiblingNegotiation ~ FoodTreatment*SexParent + offset(log(BroodSize)) + (1|Nest), dispformula = ~ FoodTreatment , ziformula = ~ FoodTreatment + SexParent, data=Owls , family = nbinom1)
summary(m5)
simulateResiduals(m5, plot = T)
res <- simulateResiduals(m5, plot = T)
par(mfrow = c(1,3))
plotResiduals(res, Owls$FoodTreatment)
testDispersion(res)
testZeroInflation(res)
@@ -727,7 +714,6 @@ testDispersion(simulationOutput)
simulationOutput = recalculateResiduals(simulationOutput , group = testData$group)
testDispersion(simulationOutput)
```


