-
Notifications
You must be signed in to change notification settings - Fork 9
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add draft slides and figures 20240229
- Loading branch information
1 parent
8c78651
commit dc2042c
Showing
6 changed files
with
136,660 additions
and
0 deletions.
There are no files selected for viewing
Large diffs are not rendered by default.
Oops, something went wrong.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,329 @@ | ||
--- | ||
layout: presentation | ||
title: Become a ggplot hero | ||
--- | ||
|
||
class: center, middle | ||
|
||
![:scale 30%]({{ site.baseurl}}/assets/images/coding_club_logo_1.png) | ||
|
||
<!-- Do not forget to adapt the presentation title in the header! --> | ||
|
||
<!-- Adjust the presentation to the session. Focus on the challenges, | ||
this is not a coding tutorial. | ||
Note, to include figures, store the image in the `/docs/assets/images/yyyymmdd/` | ||
folder and use the jekyll base.url reference as done in this template | ||
or see https://jekyllrb.com/docs/liquid/tags/#links. | ||
using the scale attribute ![:scale 30%](...), you can adjust the image size. | ||
--> | ||
|
||
<!-- Adjust the day, month --> | ||
# 29 FEBRUARY 2024 | ||
|
||
## INBO coding club | ||
|
||
<!-- Adjust the room number and name --> | ||
Herman Teirlinck Building | ||
|
||
01.19 - Paul Van Ostaijen | ||
|
||
--- | ||
class: left, top | ||
|
||
## ROOMIE: room reservation | ||
|
||
``` | ||
> if (isFALSE(roomie)) { | ||
+ warning("Please confirm asap the room reservation on the roomie") | ||
+ } | ||
Warning message: | ||
Please confirm asap the room reservation on the roomie | ||
``` | ||
|
||
--- | ||
class: left, top | ||
|
||
## Record the session | ||
|
||
Kind reminder to... myself. | ||
|
||
--- | ||
class: center, middle | ||
|
||
<!-- Create a new badge using Inkscape or other programs based on the assets/images/coding_club_badges.svg file --> | ||
![:scale 90%]({{ site.baseurl}}/assets/images/20240229/20240229_badge_ggplot.png) | ||
|
||
--- | ||
class: center, top | ||
|
||
![:scale 100%]({{ site.baseurl}}/assets/images/20240229/20240229_cheat_sheet_ggplot.png) | ||
Download [cheatsheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20240229_cheat_sheet_ggplot2.pdf) | ||
|
||
CC-BY RStudio. Download the cheat sheet in [English](https://rstudio.github.io/cheatsheets/data-visualization.pdf) or in [Dutch](https://rstudio.github.io/cheatsheets/translations/dutch/data-visualization_nl.pdf). Do you know that this cheat sheet is available as [html](https://rstudio.github.io/cheatsheets/html/data-visualization.html) format? The pdf is, aside Dutch and English, available in other 7 languages. See e.g. the [French](https://rstudio.github.io/cheatsheets/translations/french/data-visualization_fr.pdf) and the [Spanish](https://rstudio.github.io/cheatsheets/translations/spanish/data-visualization_es.pdf) versions. | ||
|
||
--- | ||
class: left, middle | ||
|
||
### How to get started? | ||
|
||
Check the [Each session setup](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup) to get started. | ||
|
||
### First time coding club? | ||
|
||
Check the [First time setup](https://inbo.github.io/coding-club/gettingstarted.html#first-time-setup) section to setup. | ||
|
||
--- | ||
class: left, top | ||
|
||
![:scale 100%]({{ site.baseurl}}/assets/images/coding_club_sticky_concept.png) | ||
|
||
--- | ||
class: center, top | ||
|
||
# Share your code during the coding session | ||
|
||
<!-- Create a new hackmd file and replace this link (twice!) --> | ||
Go to https://hackmd.io/OKAoEJMTQZWT_2Lvl5cEmQ?both and start by adding your name in section "Participants". | ||
|
||
<iframe src="https://hackmd.io/OKAoEJMTQZWT_2Lvl5cEmQ?edit" height="400px" width="800px"></iframe> | ||
|
||
--- | ||
class: left, top | ||
|
||
# Download data and code | ||
|
||
You can download the material of today: | ||
|
||
- automatically via `inborutils::setup_codingclub_session()`* | ||
|
||
- manually** from GitHub foders [coding-club/data/20240229](https://github.com/inbo/coding-club/tree/master/data/20240229) and [coding-club/src/20240229](https://github.com/inbo/coding-club/tree/master/src/20240229) | ||
|
||
<br> | ||
<small> __\* Note__: you can use the date in "YYYYMMDD" format to download the coding club material of a specific day, e.g. run `setup_codingclub_session("20230228")` to download the coding club material of February, 28 2023. If date is omitted, i.e. `setup_codingclub_session()`, the date of today is used. For all options, check the [tutorial online](https://inbo.github.io/tutorials/tutorials/r_setup_codingclub_session/).</small> | ||
<br> | ||
<small> __\*\* Note__: check the getting started instructions on [how to download a single file](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup)</small> | ||
|
||
--- | ||
class: left, top | ||
|
||
# Data and scripts description | ||
|
||
Today we will work with rodents data: | ||
|
||
- `20240229_rodentia.txt`: human observations of rodents in Belgium from 1980 to 2020.* | ||
|
||
<small> __\* Note__: Extracted from GBIF.org (27 February 2024) GBIF Occurrence Download https://doi.org/10.15468/dl.ngqknn</small> | ||
|
||
--- | ||
class: left, middle | ||
|
||
# ggplot recipe: data - mapping - geometry | ||
|
||
![:scale 80%]({{ site.baseurl}}/assets/images/20240229/20240229_ggplot_recipe.png) | ||
|
||
--- | ||
class: left, top | ||
|
||
# Load libraries | ||
|
||
```r | ||
library(tidyverse) | ||
``` | ||
|
||
--- | ||
background-image: url({{ site.baseurl}}/assets/images/background_challenge_1.png) | ||
class: left, top | ||
|
||
# Challenge 1 | ||
|
||
1. Make a bar chart with the number of observations of rodents in Belgium per `year`. Tip: check the difference between `geom_bar()` and `geom_col()`, e.g. via help `?geom_bar`. | ||
2. Change x and y labels to "Year" and "number of observations" respectively. | ||
3. Add the title "Evolution of rodents in Belgium" to the plot. | ||
4. Show only data from 2000. | ||
5. **Fill** all bars with color blue and set contour **color** to red. | ||
6. Sometimes, it is better to show aggregated data to better deal with outliers and show trends. In this case, group years in bins of 5 years. | ||
|
||
--- | ||
class: left, top | ||
|
||
## Intermezzo: aesthetics | ||
|
||
|
||
``` | ||
ggplot(data = rodentia, mapping = aes(x = year)) + | ||
geom_bar() | ||
|
||
``` | ||
|
||
This works too: | ||
|
||
``` | ||
ggplot(data = rodentia) + | ||
geom_bar(mapping = aes(x = year)) | ||
``` | ||
|
||
Both are good. Still, we have the feeling that the second version is used more often. The only case in which it is better to opt for the second option is when you plot two or more geometries with **different** aesthetics. For example, plot pressure and temperature vs time* : | ||
|
||
``` | ||
# dummy dataset | ||
press_temp <- tibble::tibble( | ||
year = c(2000:2004), | ||
press = c(1.0, 1.1, 1.4, 1.2, 1.6), | ||
temp = c(13.2, 15.1, 12.2, 11.8, 10.9) | ||
) | ||
ggplot(press_temp) + | ||
geom_point(mapping = aes(x = year, y = press)) + | ||
geom_point(mapping = aes(x = year, y = temp), color = "red") + | ||
scale_y_continuous(sec.axis = sec_axis(~., name = "temperature")) + | ||
theme(axis.line.y.right = element_line(color = "red"), | ||
axis.text.y.right = element_text(color = "red"), | ||
axis.title.y.right = element_text(color = "red")) | ||
``` | ||
|
||
<br> | ||
<small> __\* Note__: such kind of plot is bad practice, by the way. | ||
|
||
--- | ||
background-image: url({{ site.baseurl}}/assets/images/background_challenge_2.png) | ||
class: left, top | ||
|
||
# Challenge 2A | ||
|
||
1. Make a bar chart plot similar to the ones in challenge 1.1 at genus level. How are the bars displayed per genus, by default? | ||
2. Try other ways to "adjust" the bars. Tip: check the [positon adjustment section](https://ggplot2.tidyverse.org/reference/#position-adjustment) section in ggplot2 reference website. | ||
3. How to split the bar chart in 1 in subplots based on family? Set x scale "free". | ||
|
||
--- | ||
background-image: url({{ site.baseurl}}/assets/images/background_challenge_2.png) | ||
class: left, top | ||
|
||
# Challenge 2B | ||
|
||
Bar charts or histograms are a good solution for displaying data where x is a discrete variable, in our case `year`. But we could actually think of year as a continuous variable. And so, points and lines could be a nice alternative. | ||
|
||
After running the provided code: | ||
|
||
1. Create a plot showing the number of observations of Ondatra zibeticus and Rattus norvegicus. Use both points and lines and distinguish the two species via color. | ||
2. Typically, points are combined with a line showing the trend, calculated via a smoother function. ggplot helps us enormously to model and smooth our data. Try different smooth methods. | ||
|
||
|
||
Question for the staticians in the room and at home: what is the best smoother function and parameters to use in this case? | ||
|
||
--- | ||
class: left, top | ||
|
||
## Intermezzo: customize non-data components - themes | ||
|
||
|
||
You can personalize almost everything within your plots. ggplot2 provides some predefined themes via functions named `theme_*()`, e.g. `theme_bw()` for a white background with black lines. See more on cheat sheet. | ||
|
||
|
||
```r | ||
ggplot(data = rodentia, mapping = aes(x = year)) + | ||
geom_bar() + theme_bw() | ||
``` | ||
|
||
But you can also customize any non-data component via function `theme()`! Give a look to help `?theme`. Notice the help functions `element_*()`, e.g. `element_line()`, `element_text()` and `element_rect()` (rect stays for rectangle). | ||
|
||
```r | ||
ggplot(data = rodentia, mapping = aes(x = year)) + | ||
geom_bar() + | ||
# we love flashy plots! | ||
theme(panel.background = element_rect(fill = "green"), | ||
panel.grid.major = element_line(colour = "blue", linewidth = 2)) | ||
``` | ||
|
||
--- | ||
class: left, top | ||
|
||
## Intermezzo: customize non-data components - themes | ||
|
||
Do you know that INBO has its own theme? It's called [INBOtheme](https://inbo.github.io/INBOtheme/): it's a package. Install it, load it and all figures will have authomatically an INBO touch. Read the nice [tutorials](https://inbo.github.io/INBOtheme/articles/index.html) for more. | ||
|
||
```r | ||
library(INBOtheme) | ||
ggplot(data = rodentia, mapping = aes(x = year)) + | ||
geom_bar() + | ||
ggtitle(label = "Temporal distribution of rodents observations in Belgium") | ||
``` | ||
|
||
INBOtheme takes care about the quality of your plots: a warning is returned if more than 4 colors are used. | ||
|
||
``` | ||
library(INBOtheme) | ||
ggplot(data = rodentia, mapping = aes(x = year, fill = family)) + | ||
geom_bar() + | ||
ggtitle(label = "Temporal distribution of rodents observations in Belgium") | ||
> using more than 4 colours might make the plot hard to read | ||
``` | ||
|
||
About finding the most appropriate graph for our data: from [Data to Viz](https://www.data-to-viz.com/) can definitely help us. | ||
|
||
--- | ||
background-image: url({{ site.baseurl}}/assets/images/background_challenge_3.png) | ||
class: left, top | ||
|
||
# Challenge 3 | ||
|
||
Let's use [R graph gallery](https://r-graph-gallery.com/) as source of inspiration to visualize data about rodents in Belgium from 1800 and 2000. | ||
|
||
- Let's start showing the distribution of the observed `species` per `year`. Try to use INBO colors or viridis palette. Try for example [boxplots](https://r-graph-gallery.com/boxplot.html), violin plots, ridgeplots (you need to install [ggridges](https://wilkelab.org/ggridges/index.html) R package first). | ||
- Are all these plots meaningful while showing the distribution per `month` instead of `year`? | ||
- Piecharts are not done in scientific papers, BUT they are perfectly accetable in presentations. They can be the best choice even! In our case, a piechart is a good idea for showing that two genera have the absolute majority of observations of rodents collected in Belgium up to 2020. All the other genera should be put together as "Other". Try to make your best piechart ever :-) | ||
|
||
For the staticians who love astonishing plots, check what you can do using [violin plots with ggstatsplot](https://r-graph-gallery.com/web-violinplot-with-ggstatsplot.html)*. | ||
|
||
<br> | ||
<small> __\* Note__: [ggstatsplot](https://indrajeetpatil.github.io/ggstatsplot/) R package is an extension of ggplot2 package for creating graphics with details from statistical test.</small> | ||
--- | ||
class: left, top | ||
|
||
# The package of the month: Emma's choice | ||
|
||
[ggpattern](https://coolbutuseless.github.io/package/ggpattern/index.html) can be definitely funny, but it can be also the cherry on top :-) Just think to add rodents pictures for each species on each bar. | ||
|
||
![:scale 100%]({{ site.baseurl}}/assets/images/20240229/20240229_ggpattern.png) | ||
|
||
--- | ||
class: left, top | ||
|
||
## Resources | ||
|
||
- Challenges solutions and video recording of the session will be available soon. | ||
- `ggplot2` R package: https://ggplot2.tidyverse.org/ | ||
- The [R Graphics Cookbook, 2nd edition](https://r-graphics.org/): the reference book about graphics in R. | ||
- R for Data Science: [Chapter 3: Data visualization](https://r4ds.had.co.nz/data-visualisation.html) | ||
- Article of H. Wickham about the layered [grammar of graphics](http://vita.had.co.nz/papers/layered-grammar.pdf) | ||
- [Datacarpentry's data visualiation tutorial](https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html) | ||
- [Stanford University tutorial](https://cengel.github.io/R-data-viz/data-visualization-with-ggplot2.html): chapter 1 | ||
- R for Data Science: [Chapter 28: Graphics for communication](https://r4ds.had.co.nz/graphics-for-communication.html) | ||
- [INBOtheme](https://inbo.github.io/INBOtheme/) homepage | ||
- [R Graph Gallery](https://r-graph-gallery.com/index.html): a collection of charts made with R | ||
- Links to the ggplot extensions we mentioned today: [ggridges](https://wilkelab.org/ggridges/index.html), [ggstatsplot](https://indrajeetpatil.github.io/ggstatsplot/). | ||
|
||
--- | ||
class: left, top | ||
|
||
# Topic of the next coding club: you vote | ||
|
||
Yes, you can vote among **two topics**. The poll for March's coding club is open. Let us know your favorite topic before March 7! | ||
|
||
https://forms.gle/Zg4uTCPZNZUZFQfH9 | ||
|
||
You can choose between: | ||
|
||
- Beyond ggplot: ggplot extensions and other visualization packages | ||
- Data wrangling with tidyverse (dplyr, tidyr, ...) | ||
|
||
|
||
--- | ||
class: center, middle | ||
|
||
![:scale 30%]({{ site.baseurl}}/assets/images/coding_club_logo_1.png) | ||
|
||
<!-- Adjust the room and date --> | ||
Room: 01.69 - Paul Janssen<br> | ||
Date: __26/04/2024__, van 10:00 tot 12:30<br> | ||
Subject: to be decided <br> | ||
(registration announced via [email protected]) |