-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcaching.qmd
102 lines (66 loc) · 7.47 KB
/
caching.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
---
title: "Caching"
bibliography: references.bib
---
This page will discuss the concept of caching and how it is used in {renv}.
<!-- - briefly describe caching and the fact that there are shared libraries -->
<!-- - discuss how {renv} uses caching -->
<!-- - shared {renv} library for packages on local system -->
<!-- - installation of packages, and checks for packages first in the shared library -->
<!-- - "symlinks" to the shared library in `renv/library` -->
# What is Caching?
In the context of software development, caching is used to store data that is frequently accessed, such as packages, to speed up the execution of a program. When a program needs to access a package, it first checks the cache to see if the package is already stored there. If the package is found in the cache, the program can retrieve it quickly without having to download it again. This can significantly reduce the time it takes to run a program, especially if the package is large or if the program is run frequently.
# Caching in {renv}
In the context of {renv}, the package cache is a shared library that contains the packages downloaded for and then used in your projects. The cache will, when needed, contain multiple different versions of the same package and your project will link to the correct version, only downloading the version specified in the `renv.lock` if you don't already have it somewhere in the renv cache. This shared library is a huge space saver, especially if you have many projects using the same packages.
A cache is built per each minor version of R you use. For example, if you have used {renv} with R versions 4.3 and 4.4 on your computer, then you will end up with a cache matching each of these R versions. This is an important detail to note for the [Restoring Projects](restoring_a_project.html) section of this tutorial, as you may need to rebuild the cache if you upgrade your R version.
## Cache Locations
Although you will likely never need to interact with the cache directly, it can be useful to know where it is located on your system in case you need to troubleshoot any issues with {renv}. Moreover, seeing the actual cache locations can help make the concept of caching more tangible.
The {renv} caches for R packages will be nested in folders below one of the following locations, based on your operating system [@posit2024]:
> - Linux: `~/.cache/R/renv/cache`
>
> - macOS: `~/Library/Caches/org.R-project.R/R/renv/cache`
>
> - Windows: `%LOCALAPPDATA%/renv/cache`
Within each of these cache folders, you should see sub-folders for each version of R that you have used with {renv}. For example, on a macOS system, you might see the following folders:
``` r
list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5", full.names = T)
#> [1] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/macos"
#> [2] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.2"
#> [3] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.3"
#> [4] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.4"
```
The {renv} package also provides a function to access the exact path to the cache used in your current project. This cache location will be slightly more specific than the paths listed above because it is a reference to the **one** cache specific to the version of R used in your project. You can access the path to the cache with the following code:
``` r
# Run on a MacOS, and <USER> removed.
renv::paths$cache()
#> [1] "/Users/<USER>/Library/Caches/org.R-project.R/R/renv/cache/v5/macos/R-4.4/aarch64-apple-darwin20"
```
### Packages in the Cache
Each of the caches will contain the packages that you downloaded for use in your projects as you were using that version of R. So for example, if you downloaded version 1.0.9 of {dplyr} while using R 4.3, you would find the package in the `R-4.3` folder of the cache. If you then started a project using R 4.4 and downloaded version 1.1.2 of {dplyr}, you would find that version in the `R-4.4` folder of the cache. Moreover, if another project using R 4.4 also needed version 1.1.4 of {dplyr}, it would be found in the `R-4.4` folder of the cache.
This text might be an over-explanation of the concept so I'll demonstrate it with a code chunk below that lists out the versions of {dplyr} in the cache.
``` r
list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.3/aarch64-apple-darwin20/dplyr")
#> [1] "1.1.2" "1.1.4"
list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.4/aarch64-apple-darwin20/dplyr")
#> [1] "1.1.2" "1.1.4"
```
So it seems that I have versions 1.1.2 and 1.1.4 of {dplyr} in the caches for both R 4.3 and R 4.4. However, this is merely a coincidence; it won't always be the case that the caches have the exact same versions of packages. Let's look at an example where the versions differ.
``` r
list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.3/aarch64-apple-darwin20/ggplot2")
#> [1] "3.4.3" "3.4.4" "3.5.0" "3.5.1"
list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.4/aarch64-apple-darwin20/ggplot2")
#> [1] "3.5.0" "3.5.1"
```
In this case, I have versions 3.4.3, 3.4.4, 3.5.0, and 3.5.1 of {ggplot2} in the cache for R 4.3, but only versions 3.5.0 and 3.5.1 in the cache for R 4.4 which is an expected outcome. I used several older version of {ggplot2} with an older version of R and then upgraded to a newer version of R and only used the newer versions of {ggplot2}!
## Project Library
The discussion of caching so far has covered just the shared libraries that {renv} uses to store packages. But how does {renv} use these caches, how does this relate back to your project libraries, and what is the role of the `renv/library` folder in your projects?
Each {renv} project has its own library, located in the `renv/library` folder of the project. When you install a package in a project with {renv}, however, the package is not *technically* installed in the `renv/library` library. In fact, none of the packages used in your project are *actually* stored in this folder. Instead, the contents of these folders are "symlinks" to the packages in the shared library.
### Symlinks
A symlink or "symbolic link" is a file that points to another file or directory. It is a reference to the original file or directory, and it can be used to access the original file or directory from a different location. Symlinks are important because they allow you to create shortcuts to files or directories, which can be useful for organizing files, accessing files from different locations, or creating symbolic links to files or directories that are located on different drives or partitions.
In the context of {renv}, the apparent packages in the `renv/library` folder of your project are actually symlinks that point to the packages in the cache!
# Summary and Key Points
This chapter contained quite a bit of technical detail and may have been a bit overwhelming. Here are the key points to remember:
1. Caching is used to store data that is frequently accessed, such as packages, to speed up the execution of a program.
2. In the context of {renv}, the package cache is a shared library that contains the packages downloaded for and then used in your projects.
3. The cache is built per each minor version of R you use, and the cache locations are specific to your operating system. **You will almost never need to directly access these cache locations.**
4. Each project has its own library, located in the `renv/library` folder of the project, but the packages in this folder are actually symlinks to the packages in the cache.