2  In-class Exercise 1

Author

Canseza Avağ Erdurak

Published

Oct 21, 2022

Planes Data Analysis

First of all, I installed packages namely dplyr, nycflights13 as well as stringr.

I used sub function in stringr package to create a driven column, aircraft_type (ac_type).

library(dplyr)
library(nycflights13)
library(stringr)

options(dplyr.summarise.inform = FALSE)

I take a high-level look at planes data which has 3,222 rows and 9 columns. I also have information about what column names are, which data type they have and how their content is in a short notice.

planes %>% glimpse()
Rows: 3,322
Columns: 9
$ tailnum      <chr> "N10156", "N102UW", "N103US", "N104UW", "N10575", "N105UW…
$ year         <int> 2004, 1998, 1999, 1999, 2002, 1999, 1999, 1999, 1999, 199…
$ type         <chr> "Fixed wing multi engine", "Fixed wing multi engine", "Fi…
$ manufacturer <chr> "EMBRAER", "AIRBUS INDUSTRIE", "AIRBUS INDUSTRIE", "AIRBU…
$ model        <chr> "EMB-145XR", "A320-214", "A320-214", "A320-214", "EMB-145…
$ engines      <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
$ seats        <int> 55, 182, 182, 182, 55, 182, 182, 182, 182, 182, 55, 55, 5…
$ speed        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ engine       <chr> "Turbo-fan", "Turbo-fan", "Turbo-fan", "Turbo-fan", "Turb…

Create a Dataset

I created a dataset according to manufacturer column. I filtered planes which are manufactured by Airbus, Airbus Industrie and Boeing. I recoded AIRBUS INDUSTRIE as AIRBUS to have a clean dataset.

ab_df <- planes %>%
          select(tailnum:engine) %>%
          filter(manufacturer %in% c('AIRBUS', 'AIRBUS INDUSTRIE', 'BOEING')) %>%
          mutate(manufacturer = recode(manufacturer, 'AIRBUS INDUSTRIE' = 'AIRBUS'))

Top 5 Year by Manufacturers

Here is shown top 5 plane count, average number of seats and engines by manufacturer and year. Top 5 years by manufacturers are mostly overlaps. 5th elements in both Boeing and Airbus have the highest change compared to last years respectively.

ab_df %>%
  group_by(manufacturer, year) %>%
  summarise(count = n(), avg_seats = mean(seats), avg_engines = mean(engines)) %>%
  mutate(yoy_count = (count-lag(count))/lag(count)*100) %>%
  relocate(yoy_count, .after = count) %>%
  top_n(5, count) %>%
  arrange(manufacturer, desc(count)) %>%
  print(n = Inf)
# A tibble: 10 × 6
# Groups:   manufacturer [2]
   manufacturer  year count yoy_count avg_seats avg_engines
   <chr>        <int> <int>     <dbl>     <dbl>       <dbl>
 1 AIRBUS        2001    82      2.5       186            2
 2 AIRBUS        2000    80     35.6       182.           2
 3 AIRBUS        2002    60    -26.8       175.           2
 4 AIRBUS        1999    59     15.7       184.           2
 5 AIRBUS        1998    51    325         185.           2
 6 BOEING        2001   142      5.97      172.           2
 7 BOEING        2000   134      8.06      176.           2
 8 BOEING        1999   124     20.4       180.           2
 9 BOEING        1998   103    119.        182.           2
10 BOEING        2004    77    157.        148.           2

Aircraft Type Analysis

A new column is created named “ac_type” derived from planes’ model. I work in Pegasus currently. As far as I know there is no ac_type starts with “MD” in the fleet. So, I excluded them. As the seat capacity increases, plane count generally decreases independently of manufacturer.

ab_df %>%
  mutate(ac_type = sub("\\-.*", "", model)) %>%
  filter(ac_type != "MD") %>%
  group_by(manufacturer, ac_type) %>%
  summarise(count = n(), avg_seats = mean(seats), med_seats = median(seats), avg_engines = mean(engines)) %>%
  arrange(manufacturer, desc(count))
# A tibble: 12 × 6
# Groups:   manufacturer [2]
   manufacturer ac_type count avg_seats med_seats avg_engines
   <chr>        <chr>   <int>     <dbl>     <dbl>       <dbl>
 1 AIRBUS       A320      415      193.       200        2   
 2 AIRBUS       A319      208      164.       179        2   
 3 AIRBUS       A321       94      297.       379        2   
 4 AIRBUS       A330       18      377.       377        2.11
 5 AIRBUS       A340        1      375        375        4   
 6 BOEING       737      1037      153.       149        2   
 7 BOEING       757       308      186.       178        2   
 8 BOEING       767       170      315.       330        2   
 9 BOEING       717        88      100        100        2   
10 BOEING       777        12      400        400        2   
11 BOEING       787         4      260        260        2   
12 BOEING       747         1      450        450        4   

Seat Count Group Analysis

I created 8 bins sized 50. Fleet consists of planes which have seats between 150 and 200 mostly.

ab_df %>%
  group_by(seat_group = cut(seats, c(0,50,100,150,200,250,300,350,400,450,500), include.lowest = TRUE)) %>%
  summarise(count = n())
# A tibble: 8 × 2
  seat_group count
  <fct>      <int>
1 (50,100]     102
2 (100,150]    934
3 (150,200]   1035
4 (200,250]     13
5 (250,300]     85
6 (300,350]    114
7 (350,400]     82
8 (400,450]      1

Conclusion

Consequently, I analysed data based on year, aircraft type and seat counts.

  1. Top 5 plane count over years doesn’t change much according to manufacturer. Airbus and Boeing have gone hand-in-hand in manufacture-wise over years.

  2. There is a negative relationship between seat capacity and plane count. As the seat capacity increases, plane count decreases and engine count increases. Since maintenance costs are high in aircraft industry, companies may not want to prefer planes that have 4 engines. Therefore, production amount in high-engine sized planes are lower.

  3. Fleet mostly consists of planes which have a seat capacity between 150 and 200. This analysis is aligned with the 2nd one.

Enjoy reading ! :)