Now that you’ve got a handle on Quarto, what else is worth learning? Here are some of my recommendations.
16.1 Learn how to use git and GitHub
git is a version control system. Not sure what a version control system is? No worries, let me explain. If you’ve ever named a document something like:
When you run into a problem or an error and can’t work out the answer after some tinkering, it can be worthwhile to spend some time constructing a small example of the code that breaks. This takes a bit of time, and could be its own little blog post. It also takes practice. But in the process of reducing the problem down to its core components, I can often solve the problem myself. It’s like describing a problem you are working on to someone else and, in talking it through, arriving at a solution.
There is a great R package that helps you create these reproducible examples: reprex, by Jenny Bryan. I’ve written about the reprex package here.
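As a rough sketch of what using it can look like (assuming reprex is installed; the two lines of problem code below are made up purely for illustration), you can hand reprex the code as a character vector through its `input` argument:

```r
# The smallest code that still shows the problem, one line per element.
# (A made-up example: summing character strings fails with an error.)
code_lines <- c(
  'y <- c("1", "2", "3")',
  "sum(y)"
)

# Only call reprex when it is installed and the session is interactive,
# because by default it copies the rendered result to the clipboard.
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex(input = code_lines)
}
```

In day-to-day use it is probably more common to copy the offending code to the clipboard and call `reprex::reprex()` with no arguments; the `input` form is used here only so the snippet is self-contained.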
For the purposes of illustration, let’s briefly pare down a small example using the somewhat large diamonds dataset:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
diamonds
# A tibble: 53,940 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
# ℹ 53,930 more rows
Let’s say we had a few steps involved in summarising the diamonds data:
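The code for this step is not shown here. A pipeline along the following lines would trigger the same kind of warning. Treat it as a sketch: the column names in `summarise()` are taken from the output, but the `price_per_carat` step is an assumption, so the numbers it produces may not match the output exactly.

```r
library(dplyr)    # mutate(), group_by(), summarise(), %>%
library(ggplot2)  # provides the diamonds dataset

diamonds_summary <- diamonds %>%
  # Assumed intermediate step; the name price_per_carat is hypothetical
  mutate(price_per_carat = price / carat) %>%
  group_by(cut) %>%
  summarise(
    price_mean = mean(price_per_carat),
    price_sd   = sd(price_per_carat),
    mean_color = mean(color)  # the line the warning points at
  )

diamonds_summary
```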
Warning: There were 5 warnings in `summarise()`.
The first warning was:
ℹ In argument: `mean_color = mean(color)`.
ℹ In group 1: `cut = Fair`.
Caused by warning in `mean.default()`:
! argument is not numeric or logical: returning NA
ℹ Run `dplyr::last_dplyr_warnings()` to see the 4 remaining warnings.
# A tibble: 5 × 4
cut price_mean price_sd mean_color
<ord> <dbl> <dbl> <dbl>
1 Fair 3767. 1540. NA
2 Good 3860. 1830. NA
3 Very Good 4014. 2037. NA
4 Premium 4223. 2035. NA
5 Ideal 3920. 2043. NA
The warning gives us a clue that the problem is in the mean_color line, so let’s run just that line:
diamonds %>% mutate(mean_color = mean(color))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `mean_color = mean(color)`.
Caused by warning in `mean.default()`:
! argument is not numeric or logical: returning NA
# A tibble: 53,940 × 11
carat cut color clarity depth table price x y z mean_color
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 NA
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 NA
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 NA
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63 NA
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 NA
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 NA
7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 NA
8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 NA
9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 NA
10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 NA
# ℹ 53,930 more rows
We still get that warning, so what if we just run:
mean(diamonds$color)
Warning in mean.default(diamonds$color): argument is not numeric or logical:
returning NA
[1] NA
OK, same warning. What is in color?
head(diamonds$color)
[1] E E E I J J
Levels: D < E < F < G < H < I < J
color is an ordered factor of letter grades, not a number. Does it really make sense to take the mean of some letters? Ah, of course not!
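Having got this far, the problem can be reduced to something that doesn’t need the diamonds data at all. Here is a minimal sketch, using a hand-made ordered factor as a stand-in for `diamonds$color` (the four values are made up), plus one possible alternative summary, counting the levels:

```r
# A hand-made ordered factor standing in for diamonds$color
color <- factor(
  c("E", "E", "I", "J"),
  levels = c("D", "E", "F", "G", "H", "I", "J"),
  ordered = TRUE
)

# A factor is not numeric, so mean() warns and returns NA
is.numeric(color)
color_mean <- mean(color)
color_mean

# Counting how often each level occurs is a summary that does make sense
color_counts <- table(color)
color_counts
```

An example this small is what you would paste into a question or hand to reprex: it has no dependence on a large dataset, and it shows the warning in a handful of lines.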