Add draft slides and figures 20240229
damianooldoni committed Feb 28, 2024
1 parent 8c78651 commit dc2042c
Showing 6 changed files with 136,660 additions and 0 deletions.
136,331 changes: 136,331 additions & 0 deletions data/20240229/20240229_rodentia.txt

Large diffs are not rendered by default.

329 changes: 329 additions & 0 deletions docs/sessions/20240229_become_a_ggplot_hero.html
@@ -0,0 +1,329 @@
---
layout: presentation
title: Become a ggplot hero
---

class: center, middle

![:scale 30%]({{ site.baseurl}}/assets/images/coding_club_logo_1.png)

<!-- Do not forget to adapt the presentation title in the header! -->

<!-- Adjust the presentation to the session. Focus on the challenges,
this is not a coding tutorial.
Note, to include figures, store the image in the `/docs/assets/images/yyyymmdd/`
folder and use the jekyll base.url reference as done in this template
or see https://jekyllrb.com/docs/liquid/tags/#links.
using the scale attribute ![:scale 30%](...), you can adjust the image size.
-->

<!-- Adjust the day, month -->
# 29 FEBRUARY 2024

## INBO coding club

<!-- Adjust the room number and name -->
Herman Teirlinck Building

01.19 - Paul Van Ostaijen

---
class: left, top

## ROOMIE: room reservation

```
> if (isFALSE(roomie)) {
+ warning("Please confirm asap the room reservation on the roomie")
+ }
Warning message:
Please confirm asap the room reservation on the roomie
```

---
class: left, top

## Record the session

Kind reminder to... myself.

---
class: center, middle

<!-- Create a new badge using Inkscape or other programs based on the assets/images/coding_club_badges.svg file -->
![:scale 90%]({{ site.baseurl}}/assets/images/20240229/20240229_badge_ggplot.png)

---
class: center, top

![:scale 100%]({{ site.baseurl}}/assets/images/20240229/20240229_cheat_sheet_ggplot.png)
Download [cheatsheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20240229_cheat_sheet_ggplot2.pdf)

CC-BY RStudio. Download the cheat sheet in [English](https://rstudio.github.io/cheatsheets/data-visualization.pdf) or in [Dutch](https://rstudio.github.io/cheatsheets/translations/dutch/data-visualization_nl.pdf). Did you know that this cheat sheet is also available in [HTML](https://rstudio.github.io/cheatsheets/html/data-visualization.html) format? Besides Dutch and English, the PDF is available in 7 other languages. See e.g. the [French](https://rstudio.github.io/cheatsheets/translations/french/data-visualization_fr.pdf) and the [Spanish](https://rstudio.github.io/cheatsheets/translations/spanish/data-visualization_es.pdf) versions.

---
class: left, middle

### How to get started?

Check the [Each session setup](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup) to get started.

### First time coding club?

Check the [First time setup](https://inbo.github.io/coding-club/gettingstarted.html#first-time-setup) section to set up.

---
class: left, top

![:scale 100%]({{ site.baseurl}}/assets/images/coding_club_sticky_concept.png)

---
class: center, top

# Share your code during the coding session

<!-- Create a new hackmd file and replace this link (twice!) -->
Go to https://hackmd.io/OKAoEJMTQZWT_2Lvl5cEmQ?both and start by adding your name in section "Participants".

<iframe src="https://hackmd.io/OKAoEJMTQZWT_2Lvl5cEmQ?edit" height="400px" width="800px"></iframe>

---
class: left, top

# Download data and code

You can download the material of today:

- automatically via `inborutils::setup_codingclub_session()`*

- manually** from the GitHub folders [coding-club/data/20240229](https://github.com/inbo/coding-club/tree/master/data/20240229) and [coding-club/src/20240229](https://github.com/inbo/coding-club/tree/master/src/20240229)

<br>
<small> __\* Note__: you can use the date in "YYYYMMDD" format to download the coding club material of a specific day, e.g. run `setup_codingclub_session("20230228")` to download the coding club material of February 28, 2023. If the date is omitted, i.e. `setup_codingclub_session()`, today's date is used. For all options, check the [tutorial online](https://inbo.github.io/tutorials/tutorials/r_setup_codingclub_session/).</small>
<br>
<small> __\*\* Note__: check the getting started instructions on [how to download a single file](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup)</small>

---
class: left, top

# Data and scripts description

Today we will work with rodent data:

- `20240229_rodentia.txt`: human observations of rodents in Belgium from 1980 to 2020.*

<small> __\* Note__: Extracted from GBIF.org (27 February 2024) GBIF Occurrence Download https://doi.org/10.15468/dl.ngqknn</small>
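
A minimal sketch of how the file could be read, assuming it sits in the default `data/20240229/` folder and is tab-separated like a typical GBIF occurrence download. The path and the `guess_max` value are assumptions; the script provided for the session may do this differently.

```r
library(readr)

# Assumed location of the file relative to the project root
rodentia_path <- file.path("data", "20240229", "20240229_rodentia.txt")

# Read the file only if it has been downloaded already
if (file.exists(rodentia_path)) {
  # read_tsv() is used on the assumption that the file is tab-separated
  rodentia <- read_tsv(rodentia_path, guess_max = 10000)
  print(dim(rodentia))
}
```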

---
class: left, middle

# ggplot recipe: data - mapping - geometry

![:scale 80%]({{ site.baseurl}}/assets/images/20240229/20240229_ggplot_recipe.png)

---
class: left, top

# Load libraries

```r
library(tidyverse)
```

---
background-image: url({{ site.baseurl}}/assets/images/background_challenge_1.png)
class: left, top

# Challenge 1

1. Make a bar chart with the number of observations of rodents in Belgium per `year`. Tip: check the difference between `geom_bar()` and `geom_col()`, e.g. via help `?geom_bar`.
2. Change x and y labels to "Year" and "number of observations" respectively.
3. Add the title "Evolution of rodents in Belgium" to the plot.
4. Show only data from 2000.
5. **Fill** all bars with color blue and set contour **color** to red.
6. Sometimes, it is better to show aggregated data to better deal with outliers and show trends. In this case, group years in bins of 5 years.
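
As a hint for the tip in step 1, here is a small sketch of the difference between `geom_bar()` and `geom_col()`. It uses a hypothetical toy data frame (`toy_obs`), not the rodentia dataset.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data: one row per observation
toy_obs <- data.frame(year = c(2000, 2000, 2001, 2002, 2002, 2002))

# geom_bar() counts the rows per x value for you
p_bar <- ggplot(toy_obs, aes(x = year)) +
  geom_bar()

# geom_col() expects the bar heights to be in the data already
toy_counts <- count(toy_obs, year, name = "n_obs")
p_col <- ggplot(toy_counts, aes(x = year, y = n_obs)) +
  geom_col()
```

Both plots show the same bars; which one to use depends on whether your data are already aggregated.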

---
class: left, top

## Intermezzo: aesthetics


```r
ggplot(data = rodentia, mapping = aes(x = year)) +
  geom_bar()
```

This works too:

```r
ggplot(data = rodentia) +
  geom_bar(mapping = aes(x = year))
```

Both work. Our impression is that the second version is used more often. The one case where the second version is clearly the better choice is when you plot two or more geometries with **different** aesthetics, for example pressure and temperature vs time*:

```r
# dummy dataset
press_temp <- tibble::tibble(
  year = 2000:2004,
  press = c(1.0, 1.1, 1.4, 1.2, 1.6),
  temp = c(13.2, 15.1, 12.2, 11.8, 10.9)
)
# each geometry gets its own y aesthetic
ggplot(press_temp) +
  geom_point(mapping = aes(x = year, y = press)) +
  geom_point(mapping = aes(x = year, y = temp), color = "red") +
  # secondary axis with an identity transformation, used for the temperature
  scale_y_continuous(sec.axis = sec_axis(~., name = "temperature")) +
  theme(
    axis.line.y.right = element_line(color = "red"),
    axis.text.y.right = element_text(color = "red"),
    axis.title.y.right = element_text(color = "red")
  )
```

<br>
<small> __\* Note__: this kind of plot is bad practice, by the way.</small>

---
background-image: url({{ site.baseurl}}/assets/images/background_challenge_2.png)
class: left, top

# Challenge 2A

1. Make a bar chart plot similar to the ones in challenge 1.1 at genus level. How are the bars displayed per genus, by default?
2. Try other ways to "adjust" the bars. Tip: check the [position adjustment section](https://ggplot2.tidyverse.org/reference/#position-adjustment) on the ggplot2 reference website.
3. How can you split the bar chart from 1 into subplots based on family? Set the x scale to "free".
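
A sketch of two non-default position adjustments and of facetting with a free x scale. The data frame `toy` and its values are hypothetical; only the column names mimic the rodentia data.

```r
library(ggplot2)

# Hypothetical toy data, not the rodentia dataset
toy <- data.frame(
  year   = c(2000, 2000, 2000, 2001, 2001, 2001),
  genus  = c("A", "A", "B", "A", "B", "C"),
  family = c("F1", "F1", "F1", "F1", "F1", "F2")
)

p_base <- ggplot(toy, aes(x = year, fill = genus))

# Bars of the different genera side by side
p_dodge <- p_base + geom_bar(position = "dodge")

# Bars rescaled to proportions per year
p_fill <- p_base + geom_bar(position = "fill")

# One panel per family, each with its own x scale
p_facet <- p_base +
  geom_bar() +
  facet_wrap(~ family, scales = "free_x")
```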

---
background-image: url({{ site.baseurl}}/assets/images/background_challenge_2.png)
class: left, top

# Challenge 2B

Bar charts or histograms are a good solution for displaying data where x is a discrete variable, in our case `year`. But we could actually think of year as a continuous variable. And so, points and lines could be a nice alternative.

After running the provided code:

1. Create a plot showing the number of observations of Ondatra zibeticus and Rattus norvegicus. Use both points and lines and distinguish the two species via color.
2. Typically, points are combined with a line showing the trend, calculated via a smoother function. ggplot helps us enormously to model and smooth our data. Try different smooth methods.


Question for the statisticians in the room and at home: what is the best smoother function and parameters to use in this case?
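
To get started with step 2, here is a sketch of two smoothing methods offered by `geom_smooth()`. The yearly counts below are made up, and `span = 0.9` is an arbitrary choice, not a recommendation.

```r
library(ggplot2)

# Hypothetical yearly counts for two made-up species
toy_counts <- data.frame(
  year    = rep(2000:2009, times = 2),
  species = rep(c("species A", "species B"), each = 10),
  n       = c(12, 15, 11, 18, 22, 19, 25, 24, 30, 28,
              40, 38, 35, 36, 30, 31, 27, 25, 22, 20)
)

p_base <- ggplot(toy_counts, aes(x = year, y = n, color = species)) +
  geom_point()

# Local regression (loess); span controls how wiggly the curve is
p_loess <- p_base +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.9)

# Linear model, without the confidence ribbon
p_lm <- p_base +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```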

---
class: left, top

## Intermezzo: customize non-data components - themes


You can personalize almost everything within your plots. ggplot2 provides some predefined themes via functions named `theme_*()`, e.g. `theme_bw()` for a white background with black lines. See more on the cheat sheet.


```r
ggplot(data = rodentia, mapping = aes(x = year)) +
  geom_bar() +
  theme_bw()
```

But you can also customize any non-data component via the function `theme()`! Have a look at the help via `?theme`. Note the helper functions `element_*()`, e.g. `element_line()`, `element_text()` and `element_rect()` (rect stands for rectangle).

```r
ggplot(data = rodentia, mapping = aes(x = year)) +
  geom_bar() +
  # we love flashy plots!
  theme(
    panel.background = element_rect(fill = "green"),
    panel.grid.major = element_line(colour = "blue", linewidth = 2)
  )
```

---
class: left, top

## Intermezzo: customize non-data components - themes

Did you know that INBO has its own theme? It's called [INBOtheme](https://inbo.github.io/INBOtheme/) and it's a package. Install it, load it and all figures will automatically have an INBO touch. Read the nice [tutorials](https://inbo.github.io/INBOtheme/articles/index.html) for more.

```r
library(INBOtheme)
ggplot(data = rodentia, mapping = aes(x = year)) +
  geom_bar() +
  ggtitle(label = "Temporal distribution of rodents observations in Belgium")
```

INBOtheme takes care of the quality of your plots: a warning is returned if more than 4 colors are used.

```
library(INBOtheme)
ggplot(data = rodentia, mapping = aes(x = year, fill = family)) +
  geom_bar() +
  ggtitle(label = "Temporal distribution of rodents observations in Belgium")
#> using more than 4 colours might make the plot hard to read
```

To find the most appropriate graph for your data, [from Data to Viz](https://www.data-to-viz.com/) can definitely help.

---
background-image: url({{ site.baseurl}}/assets/images/background_challenge_3.png)
class: left, top

# Challenge 3

Let's use the [R graph gallery](https://r-graph-gallery.com/) as a source of inspiration to visualize data about rodents in Belgium from 1980 to 2020.

- Let's start by showing the distribution of the observed `species` per `year`. Try to use INBO colors or the viridis palette. Try for example [boxplots](https://r-graph-gallery.com/boxplot.html), violin plots or ridgeplots (you need to install the [ggridges](https://wilkelab.org/ggridges/index.html) R package first).
- Are all these plots still meaningful when showing the distribution per `month` instead of `year`?
- Pie charts are avoided in scientific papers, BUT they are perfectly acceptable in presentations. They can even be the best choice! In our case, a pie chart is a good idea for showing that two genera account for the absolute majority of the rodent observations collected in Belgium up to 2020. All the other genera should be put together as "Other". Try to make your best pie chart ever :-)

For the statisticians who love astonishing plots, check what you can do using [violin plots with ggstatsplot](https://r-graph-gallery.com/web-violinplot-with-ggstatsplot.html)*.

<br>
<small> __\* Note__: the [ggstatsplot](https://indrajeetpatil.github.io/ggstatsplot/) R package is an extension of the ggplot2 package for creating graphics with details from statistical tests.</small>
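
For the pie chart idea above, one possible approach is to lump the less frequent genera into "Other" with `forcats::fct_lump_n()` and to draw a single stacked bar in polar coordinates. This is a sketch on hypothetical data (`toy`, with made-up genus names and counts), not a reference solution.

```r
library(ggplot2)
library(dplyr)
library(forcats)

# Hypothetical toy data: one row per observation
toy <- data.frame(
  genus = c(rep("Genus A", 50), rep("Genus B", 30), rep("Genus C", 3),
            rep("Genus D", 2), rep("Genus E", 1))
)

# Keep the 2 most frequent genera, lump the rest into "Other"
pie_data <- toy %>%
  mutate(genus_lumped = fct_lump_n(genus, n = 2, other_level = "Other")) %>%
  count(genus_lumped, name = "n_obs")

# A pie chart is a single stacked bar shown in polar coordinates
p_pie <- ggplot(pie_data, aes(x = "", y = n_obs, fill = genus_lumped)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()
```
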
---
class: left, top

# The package of the month: Emma's choice

[ggpattern](https://coolbutuseless.github.io/package/ggpattern/index.html) can definitely be fun, but it can also be the cherry on top :-) Just think of adding rodent pictures for each species on each bar.

![:scale 100%]({{ site.baseurl}}/assets/images/20240229/20240229_ggpattern.png)

---
class: left, top

## Resources

- Challenge solutions and the video recording of the session will be available soon.
- `ggplot2` R package: https://ggplot2.tidyverse.org/
- The [R Graphics Cookbook, 2nd edition](https://r-graphics.org/): the reference book about graphics in R.
- R for Data Science: [Chapter 3: Data visualization](https://r4ds.had.co.nz/data-visualisation.html)
- Article by H. Wickham about the layered [grammar of graphics](http://vita.had.co.nz/papers/layered-grammar.pdf)
- [Datacarpentry's data visualization tutorial](https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html)
- [Stanford University tutorial](https://cengel.github.io/R-data-viz/data-visualization-with-ggplot2.html): chapter 1
- R for Data Science: [Chapter 28: Graphics for communication](https://r4ds.had.co.nz/graphics-for-communication.html)
- [INBOtheme](https://inbo.github.io/INBOtheme/) homepage
- [R Graph Gallery](https://r-graph-gallery.com/index.html): a collection of charts made with R
- Links to the ggplot extensions we mentioned today: [ggridges](https://wilkelab.org/ggridges/index.html), [ggstatsplot](https://indrajeetpatil.github.io/ggstatsplot/).

---
class: left, top

# Topic of the next coding club: you vote

Yes, you can vote among **two topics**. The poll for March's coding club is open. Let us know your favorite topic before March 7!

https://forms.gle/Zg4uTCPZNZUZFQfH9

You can choose between:

- Beyond ggplot: ggplot extensions and other visualization packages
- Data wrangling with tidyverse (dplyr, tidyr, ...)


---
class: center, middle

![:scale 30%]({{ site.baseurl}}/assets/images/coding_club_logo_1.png)

<!-- Adjust the room and date -->
Room: 01.69 - Paul Janssen<br>
Date: __26/04/2024__, from 10:00 to 12:30<br>
Subject: to be decided <br>
(registration announced via [email protected])
