Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved discussion of source() #289

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 4 additions & 5 deletions 16-projects.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -4,18 +4,17 @@ When you work on larger projects or reports, you may not want to put all text an

## Source external R scripts {#source-script}

If your R Markdown document has a large amount of code, you may consider putting some code in external R scripts, and run these scripts via `source()`\index{source()} or `sys.source()`\index{sys.source()}, e.g.,
If your R Markdown document has a large amount of code, you may consider putting some code in external R scripts, and running these scripts with `source()`\index{source()}[^sys-source]. This not only makes your R Markdown document cleaner, but can also make it more convenient to develop R code since (e.g.) debugging R code is often easier in pure R scripts.

````md
```{r, include=FALSE}`r ''`
source("your-script.R", local = knitr::knit_global())
Copy link
Member

@yihui yihui Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is more robust than local = TRUE. It works anywhere (e.g. inside a function). It's an absolute environment, and local = TRUE is a relative environment, which is subject to your current environment, and could be confusing when it is not called directly in a code chunk but from another function, e.g.,

load_scripts <- function(files) {
  for (f in files) source(f, local = TRUE)
}
load_scripts(c('a.R', 'b.R'))

Both local = TRUE and FALSE have their usefulness. There isn't an absolutely correct choice, so I didn't strongly emphasize one or another. Perhaps the distinction between the two choices should be explained more clearly in this section. In general, I feel source("your-script.R", local = knitr::knit_global()) is still the safest choice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if you want to run source() inside a function and have it impact code outside of the function, you're going to have to do something special. As a general principle, I don't think functions should mutate the global environment in this way, so I do believe that local = TRUE is the correct recommendation here. (And then mention local = knitr::knitr_global() as a special case if you really want to do that)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is now really about advice for writing functions that call source, not about calling source() in knitr chunks.

# or sys.source("your-script.R", envir = knitr::knit_global())
source("your-script.R", local = TRUE)
```
````

We recommend that you use the argument `local` in `source()` or `envir` in `sys.source()` explicitly to make sure the code is evaluated in the correct environment, i.e., `knitr::knit_global()`\index{knitr!knit\_global()}. The default values for them may not be the appropriate environment: you may end up creating variables in the wrong environment, and being surprised that certain objects are not found in later code chunks.
[^sys-source]: We don't recomend ever using `sys.source()` as it is designed specifically for internal use by base R.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't recommend ever using sys.source() as it is designed specifically for internal use by base R.

I haven't heard of this before and am surprised to learn it. Could you point out the source of this claim? The help page ?sys.source didn't say it was designed only for internal use by base R. I mentioned sys.source() because it does a much simpler job than source(), i.e., it basically does eval(parse()), whereas source() does several other things that are irrelevant to evaluating the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's implied by the envir argument which defaults to baseenv() — there is no case in which a user would want this. Do you have any evidence to suggest that people are using sys.source() in the wild?


Next in the R Markdown document, you can use objects created in these scripts (e.g., data objects or functions). This way will not only make your R Markdown document cleaner, but also make it more convenient for you to develop R code (e.g., debugging R code is often easier with pure R scripts than R Markdown).
We recommend using the `local = TRUE` argument to `source()`. The default behaviour is to evaluate the code in the global environment, which can cause confusion. Usually, knitr will evaluate code in a child of the global environment, so any assignments in the source file will be overridden by assignments in **earlier** chunks. `local = TRUE` forces `source()` to evaluate in the correct environment, avoiding any problems.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, knitr will evaluate code in a child of the global environment

Usually knitr evaluates code just in the global environment, instead of a child of the global environment. knitr::knit() and rmarkdown::render() have envir = parent.frame() by default, which points to globalenv() if you call them in the global environment.

so any assignments in the source file will be overridden by assignments in earlier chunks

I'm confused: did you mean earlier or later? If you have x <- 1 in the source file, after you source() it, the assignment will be overridden by x <- 2 in a later chunk. Similarly, if you have x <- 0 in an earlier chunk, source() will override it with x <- 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the following paragraph to imply that knitr usually evaluates in some special environment:

The default values for them may not be the appropriate environment: you may end up creating variables in the wrong environment, and being surprised that certain objects are not found in later code chunks.

If knitr usually evaluates in the global environment, then calling source() with no arguments will usually be safe.


Note that we used `include = FALSE`\index{chunk option!include} in the above example because we only want to execute the script without showing any output. If you do want output, you may remove this chunk option, or use the options in Section \@ref(hide-one) to selectively hide or show different types of output.

Expand Down