This short tutorial will just give simple illustrative examples about how to melt/cast data sets with pivot_longer
and pivot_wider
functions from tidyr
package, part of tidyverse
. We use swiss
data set which comes with base R itself.
We start by loading the package and converting it to a tibble.
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
swt <- as_tibble(swiss,rownames="region")
glimpse(swt)
## Rows: 47
## Columns: 7
## $ region <chr> "Courtelary", "Delemont", "Franches-Mnt", "Moutier",…
## $ Fertility <dbl> 80.2, 83.1, 92.5, 85.8, 76.9, 76.1, 83.8, 92.4, 82.4…
## $ Agriculture <dbl> 17.0, 45.1, 39.7, 36.5, 43.5, 35.3, 70.2, 67.8, 53.3…
## $ Examination <int> 15, 6, 5, 12, 17, 9, 16, 14, 12, 16, 14, 21, 14, 19,…
## $ Education <int> 12, 9, 5, 7, 15, 7, 7, 8, 7, 13, 6, 12, 7, 12, 5, 2,…
## $ Catholic <dbl> 9.96, 84.84, 93.40, 33.77, 5.16, 90.57, 92.85, 97.16…
## $ Infant.Mortality <dbl> 22.2, 22.2, 20.2, 20.3, 20.6, 26.6, 23.6, 24.9, 21.0…
Final version of our raw data set is as follows.
print(swt)
## # A tibble: 47 x 7
## region Fertility Agriculture Examination Education Catholic Infant.Mortality
## <chr> <dbl> <dbl> <int> <int> <dbl> <dbl>
## 1 Courte… 80.2 17 15 12 9.96 22.2
## 2 Delemo… 83.1 45.1 6 9 84.8 22.2
## 3 Franch… 92.5 39.7 5 5 93.4 20.2
## 4 Moutier 85.8 36.5 12 7 33.8 20.3
## 5 Neuvev… 76.9 43.5 17 15 5.16 20.6
## 6 Porren… 76.1 35.3 9 7 90.6 26.6
## 7 Broye 83.8 70.2 16 7 92.8 23.6
## 8 Glane 92.4 67.8 14 8 97.2 24.9
## 9 Gruyere 82.4 53.3 12 7 97.7 21
## 10 Sarine 82.9 45.2 16 13 91.4 24.4
## # … with 37 more rows
Without pivots, if we wanted summary statistics of columns we would need to explicitly calculate those values.
swt %>% summarise(mean_fertility = mean(Fertility),min_fertility = min(Fertility),max_fertility = max(Fertility))
## # A tibble: 1 x 3
## mean_fertility min_fertility max_fertility
## <dbl> <dbl> <dbl>
## 1 70.1 35 92.5
Pivot longer “melts” your data set into long format. You may specify the columns to melt (or not to melt). In this example we melt our data set to gather all the numeric values under a single column and all the names under a single name column.
swt %>% pivot_longer(cols=c(-region))
## # A tibble: 282 x 3
## region name value
## <chr> <chr> <dbl>
## 1 Courtelary Fertility 80.2
## 2 Courtelary Agriculture 17
## 3 Courtelary Examination 15
## 4 Courtelary Education 12
## 5 Courtelary Catholic 9.96
## 6 Courtelary Infant.Mortality 22.2
## 7 Delemont Fertility 83.1
## 8 Delemont Agriculture 45.1
## 9 Delemont Examination 6
## 10 Delemont Education 9
## # … with 272 more rows
We can change name/value column names in the function using names_to
and values_to
arguments. Let’s get the min/mean/max values of all categories.
swt_summary_long <- swt %>% pivot_longer(cols=c(-region),names_to="category",values_to="perc_value") %>% group_by(category) %>% summarise(min_val = min(perc_value), mean_val = mean(perc_value), max_value = max(perc_value))
## `summarise()` ungrouping output (override with `.groups` argument)
swt_summary_long
## # A tibble: 6 x 4
## category min_val mean_val max_value
## <chr> <dbl> <dbl> <dbl>
## 1 Agriculture 1.2 50.7 89.7
## 2 Catholic 2.15 41.1 100
## 3 Education 1 11.0 53
## 4 Examination 3 16.5 37
## 5 Fertility 35 70.1 92.5
## 6 Infant.Mortality 10.8 19.9 26.6
Pivot wider converts your “melted” long format data frame to “cast” wide format. You need to specify which column will become the column names and which values will fill the cells. ID columns indicate the columns which will not be changed by the operation.
swt_summary_long %>% pivot_longer(cols=-category) %>% pivot_wider(id_cols = "name",names_from=category,values_from=value)
## # A tibble: 3 x 7
## name Agriculture Catholic Education Examination Fertility Infant.Mortality
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 min_val 1.2 2.15 1 3 35 10.8
## 2 mean_val 50.7 41.1 11.0 16.5 70.1 19.9
## 3 max_val… 89.7 100 53 37 92.5 26.6