# Required args shouldn't have defaults {#sec-required-no-defaults}
```{r}
#| include = FALSE
source("common.R")
```
## What's the pattern?
Required arguments shouldn't have defaults; optional arguments should have defaults.
In other words, an argument should have a default if and only if it's optional.
This simple convention ensures that you can tell which arguments are optional and which arguments are required from a glance at the function signature.
Otherwise you need to rely on a careful reading of documentation.
Additionally, if you don't follow this convention and want to provide helpful error messages, you'll need to implement them yourself rather than relying on R's defaults.
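
As a minimal sketch of what you get for free, consider a hypothetical function `greet()` (not from any package) where `name` is required and `punctuation` is optional. The signature tells you which is which, and R produces the missing-argument error without any extra code:

```{r}
# `greet()` is a hypothetical function used only for illustration:
# `name` is required (no default), `punctuation` is optional (has a default)
greet <- function(name, punctuation = "!") {
  paste0("Hello, ", name, punctuation)
}

# The signature alone shows which argument must be supplied
args(greet)

# Omitting the required argument triggers R's built-in error;
# tryCatch() captures the message so the chunk still runs
msg <- tryCatch(greet(), error = function(e) conditionMessage(e))
msg
```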
::: {.callout-note collapse="true"}
## When should an argument be required?
This pattern raises the question of when an argument should be required, and when you should provide a default.
I think this usually seems "obvious" but I wanted to discuss a few functions that might get it wrong:
- `rnorm()` and `runif()` are interesting cases as they set default values for `mean`/`sd` and `min`/`max`.
Giving them defaults makes them feel less important, and is inconsistent with the other RNGs, which generally require that you specify the parameters of the distribution.
But both the normal and uniform distributions have very high-profile "standard" versions that make sense as defaults.
- You can use `predict()` directly on a model and it gives predictions for the data used to fit the model:
```{r}
mod <- lm(Employed ~ ., data = longley)
head(predict(mod))
```
In my opinion, `predict()` should always require a dataset because prediction is primarily about applying the model to new situations.
- `stringr::str_sub()` has default values for `start` and `end`.
This allows you to do clever things like `str_sub(x, end = 3)` or `str_sub(x, -3)` to select the first or last three characters, but I now believe that leads to code that is harder to read, and it would have been better to make `start` and `end` required arguments.
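
To make the RNG case concrete, here is a small sketch (the seed is arbitrary). Calling `rnorm()` with no parameters matches the standard normal spelled out explicitly, while `rpois()` is one example of an RNG that has no default for its parameter:

```{r}
# Same seed, so the two calls draw the same values
set.seed(123)
implicit <- rnorm(3)
set.seed(123)
explicit <- rnorm(3, mean = 0, sd = 1)
identical(implicit, explicit)

# rpois() has no default for `lambda`, so omitting it is an error;
# tryCatch() captures the message so the chunk still runs
pois_error <- tryCatch(rpois(3), error = function(e) conditionMessage(e))
pois_error
```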
:::
## What are some examples?
This is a straightforward convention that the vast majority of functions follow.
There are a few exceptions that exist in base R, mostly for historical reasons.
Here are a couple of examples:
- In `sample()` neither `x` nor `size` has a default value:
```{r}
args(sample)
```
This suggests that `size` is required, but it's actually optional:
```{r}
sample(1:4)
sample(4)
```
- `lm()` does not have defaults for `formula`, `data`, `subset`, `weights`, `na.action`, or `offset`.
```{r}
args(lm)
```
But only `formula` is actually required:
```{r}
x <- 1:5
y <- 2 * x + 1 + rnorm(length(x))
lm(y ~ x)
```
In the tidyverse, one function that fails to follow this pattern is `ggplot2::geom_abline()`: `slope` and `intercept` don't have defaults but are not required.
If you don't supply them they default to `slope = 1` and `intercept = 0`, *or* are taken from `aes()` if they're provided there.
This is a mistake caused by trying to have `geom_abline()` do too much: it can be used both as an annotation (i.e. with a single `slope` and `intercept`) and to draw multiple lines from data (i.e. with one line for each row).
## How do I use the pattern?
This pattern is generally easy to follow: if you don't use `missing()` it's very hard to violate it by mistake.
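
The following sketch contrasts the two styles using two hypothetical helpers, `join_bad()` and `join_good()` (the names and the `", "` separator are invented for illustration). Both behave identically, but only the second reveals in its signature that `sep` is optional:

```{r}
# Breaks the pattern: `sep` has no default, so it looks required,
# but missing() quietly makes it optional
join_bad <- function(x, sep) {
  if (missing(sep)) {
    sep <- ", "
  }
  paste(x, collapse = sep)
}

# Follows the pattern: the default is visible in the signature
join_good <- function(x, sep = ", ") {
  paste(x, collapse = sep)
}

join_bad(letters[1:3])
join_good(letters[1:3])
```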
## How do I remediate past mistakes?
If an argument is required, remove its default value.
If an argument is optional, either give it its default value in the signature or, if the computation is complicated, set the default to `NULL` and then compute the real value inside the body of the function.
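
A minimal sketch of the `NULL` approach, using a hypothetical `bin_data()` helper. The square-root rule for choosing the number of bins is an arbitrary choice made for illustration, not a recommendation:

```{r}
# `n_bins` is optional: the NULL default signals that a value
# will be computed from the data when none is supplied
bin_data <- function(x, n_bins = NULL) {
  if (is.null(n_bins)) {
    # Illustrative rule only: roughly sqrt(n) bins
    n_bins <- ceiling(sqrt(length(x)))
  }
  cut(x, breaks = n_bins)
}

nlevels(bin_data(1:16))
nlevels(bin_data(1:16, n_bins = 2))
```

The signature still makes it clear that `n_bins` is optional, even though the actual default depends on `x`.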