This short tutorial will just give simple illustrative examples about how to melt/cast data sets with pivot_longer and pivot_wider functions from tidyr package, part of tidyverse. We use swiss data set which comes with base R itself.

We start by loading the package and converting it to a tibble.

library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.1     ✓ dplyr   1.0.0
## ✓ tidyr   1.1.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
swt <- as_tibble(swiss,rownames="region")
glimpse(swt)
## Rows: 47
## Columns: 7
## $ region           <chr> "Courtelary", "Delemont", "Franches-Mnt", "Moutier",…
## $ Fertility        <dbl> 80.2, 83.1, 92.5, 85.8, 76.9, 76.1, 83.8, 92.4, 82.4…
## $ Agriculture      <dbl> 17.0, 45.1, 39.7, 36.5, 43.5, 35.3, 70.2, 67.8, 53.3…
## $ Examination      <int> 15, 6, 5, 12, 17, 9, 16, 14, 12, 16, 14, 21, 14, 19,…
## $ Education        <int> 12, 9, 5, 7, 15, 7, 7, 8, 7, 13, 6, 12, 7, 12, 5, 2,…
## $ Catholic         <dbl> 9.96, 84.84, 93.40, 33.77, 5.16, 90.57, 92.85, 97.16…
## $ Infant.Mortality <dbl> 22.2, 22.2, 20.2, 20.3, 20.6, 26.6, 23.6, 24.9, 21.0…

Final version of our raw data set is as follows.

print(swt)
## # A tibble: 47 x 7
##    region  Fertility Agriculture Examination Education Catholic Infant.Mortality
##    <chr>       <dbl>       <dbl>       <int>     <int>    <dbl>            <dbl>
##  1 Courte…      80.2        17            15        12     9.96             22.2
##  2 Delemo…      83.1        45.1           6         9    84.8              22.2
##  3 Franch…      92.5        39.7           5         5    93.4              20.2
##  4 Moutier      85.8        36.5          12         7    33.8              20.3
##  5 Neuvev…      76.9        43.5          17        15     5.16             20.6
##  6 Porren…      76.1        35.3           9         7    90.6              26.6
##  7 Broye        83.8        70.2          16         7    92.8              23.6
##  8 Glane        92.4        67.8          14         8    97.2              24.9
##  9 Gruyere      82.4        53.3          12         7    97.7              21  
## 10 Sarine       82.9        45.2          16        13    91.4              24.4
## # … with 37 more rows

Without pivots, if we wanted summary statistics of columns we would need to explicitly calculate those values.

swt %>% summarise(mean_fertility = mean(Fertility),min_fertility = min(Fertility),max_fertility = max(Fertility))
## # A tibble: 1 x 3
##   mean_fertility min_fertility max_fertility
##            <dbl>         <dbl>         <dbl>
## 1           70.1            35          92.5

Pivot Longer

Pivot longer “melts” your data set into long format. You may specify the columns to melt (or not to melt). In this example we melt our data set to gather all the numeric values under a single column and all the names under a single name column.

swt %>% pivot_longer(cols=c(-region))
## # A tibble: 282 x 3
##    region     name             value
##    <chr>      <chr>            <dbl>
##  1 Courtelary Fertility        80.2 
##  2 Courtelary Agriculture      17   
##  3 Courtelary Examination      15   
##  4 Courtelary Education        12   
##  5 Courtelary Catholic          9.96
##  6 Courtelary Infant.Mortality 22.2 
##  7 Delemont   Fertility        83.1 
##  8 Delemont   Agriculture      45.1 
##  9 Delemont   Examination       6   
## 10 Delemont   Education         9   
## # … with 272 more rows

We can change name/value column names in the function using names_to and values_to arguments. Let’s get the min/mean/max values of all categories.

swt_summary_long <- swt %>% pivot_longer(cols=c(-region),names_to="category",values_to="perc_value") %>% group_by(category) %>% summarise(min_val = min(perc_value), mean_val = mean(perc_value), max_value = max(perc_value))
## `summarise()` ungrouping output (override with `.groups` argument)
swt_summary_long
## # A tibble: 6 x 4
##   category         min_val mean_val max_value
##   <chr>              <dbl>    <dbl>     <dbl>
## 1 Agriculture         1.2      50.7      89.7
## 2 Catholic            2.15     41.1     100  
## 3 Education           1        11.0      53  
## 4 Examination         3        16.5      37  
## 5 Fertility          35        70.1      92.5
## 6 Infant.Mortality   10.8      19.9      26.6

Pivot Wider

Pivot wider converts your “melted” long format data frame to “cast” wide format. You need to specify which column will become the column names and which values will fill the cells. ID columns indicate the columns which will not be changed by the operation.

swt_summary_long %>% pivot_longer(cols=-category) %>% pivot_wider(id_cols = "name",names_from=category,values_from=value) 
## # A tibble: 3 x 7
##   name     Agriculture Catholic Education Examination Fertility Infant.Mortality
##   <chr>          <dbl>    <dbl>     <dbl>       <dbl>     <dbl>            <dbl>
## 1 min_val          1.2     2.15       1           3        35               10.8
## 2 mean_val        50.7    41.1       11.0        16.5      70.1             19.9
## 3 max_val…        89.7   100         53          37        92.5             26.6