library(dplyr)
library(nycflights13)
library(stringr)
options(dplyr.summarise.inform = FALSE)
2 In-class Exercise 1
Planes Data Analysis
First of all, I installed packages namely dplyr, nycflights13 as well as stringr.
I used sub function in stringr package to create a driven column, aircraft_type (ac_type).
I take a high-level look at planes data which has 3,222 rows and 9 columns. I also have information about what column names are, which data type they have and how their content is in a short notice.
%>% glimpse() planes
Rows: 3,322
Columns: 9
$ tailnum <chr> "N10156", "N102UW", "N103US", "N104UW", "N10575", "N105UW…
$ year <int> 2004, 1998, 1999, 1999, 2002, 1999, 1999, 1999, 1999, 199…
$ type <chr> "Fixed wing multi engine", "Fixed wing multi engine", "Fi…
$ manufacturer <chr> "EMBRAER", "AIRBUS INDUSTRIE", "AIRBUS INDUSTRIE", "AIRBU…
$ model <chr> "EMB-145XR", "A320-214", "A320-214", "A320-214", "EMB-145…
$ engines <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
$ seats <int> 55, 182, 182, 182, 55, 182, 182, 182, 182, 182, 55, 55, 5…
$ speed <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ engine <chr> "Turbo-fan", "Turbo-fan", "Turbo-fan", "Turbo-fan", "Turb…
Create a Dataset
I created a dataset according to manufacturer column. I filtered planes which are manufactured by Airbus, Airbus Industrie and Boeing. I recoded AIRBUS INDUSTRIE as AIRBUS to have a clean dataset.
<- planes %>%
ab_df select(tailnum:engine) %>%
filter(manufacturer %in% c('AIRBUS', 'AIRBUS INDUSTRIE', 'BOEING')) %>%
mutate(manufacturer = recode(manufacturer, 'AIRBUS INDUSTRIE' = 'AIRBUS'))
Top 5 Year by Manufacturers
Here is shown top 5 plane count, average number of seats and engines by manufacturer and year. Top 5 years by manufacturers are mostly overlaps. 5th elements in both Boeing and Airbus have the highest change compared to last years respectively.
%>%
ab_df group_by(manufacturer, year) %>%
summarise(count = n(), avg_seats = mean(seats), avg_engines = mean(engines)) %>%
mutate(yoy_count = (count-lag(count))/lag(count)*100) %>%
relocate(yoy_count, .after = count) %>%
top_n(5, count) %>%
arrange(manufacturer, desc(count)) %>%
print(n = Inf)
# A tibble: 10 × 6
# Groups: manufacturer [2]
manufacturer year count yoy_count avg_seats avg_engines
<chr> <int> <int> <dbl> <dbl> <dbl>
1 AIRBUS 2001 82 2.5 186 2
2 AIRBUS 2000 80 35.6 182. 2
3 AIRBUS 2002 60 -26.8 175. 2
4 AIRBUS 1999 59 15.7 184. 2
5 AIRBUS 1998 51 325 185. 2
6 BOEING 2001 142 5.97 172. 2
7 BOEING 2000 134 8.06 176. 2
8 BOEING 1999 124 20.4 180. 2
9 BOEING 1998 103 119. 182. 2
10 BOEING 2004 77 157. 148. 2
Aircraft Type Analysis
A new column is created named “ac_type” derived from planes’ model. I work in Pegasus currently. As far as I know there is no ac_type starts with “MD” in the fleet. So, I excluded them. As the seat capacity increases, plane count generally decreases independently of manufacturer.
%>%
ab_df mutate(ac_type = sub("\\-.*", "", model)) %>%
filter(ac_type != "MD") %>%
group_by(manufacturer, ac_type) %>%
summarise(count = n(), avg_seats = mean(seats), med_seats = median(seats), avg_engines = mean(engines)) %>%
arrange(manufacturer, desc(count))
# A tibble: 12 × 6
# Groups: manufacturer [2]
manufacturer ac_type count avg_seats med_seats avg_engines
<chr> <chr> <int> <dbl> <dbl> <dbl>
1 AIRBUS A320 415 193. 200 2
2 AIRBUS A319 208 164. 179 2
3 AIRBUS A321 94 297. 379 2
4 AIRBUS A330 18 377. 377 2.11
5 AIRBUS A340 1 375 375 4
6 BOEING 737 1037 153. 149 2
7 BOEING 757 308 186. 178 2
8 BOEING 767 170 315. 330 2
9 BOEING 717 88 100 100 2
10 BOEING 777 12 400 400 2
11 BOEING 787 4 260 260 2
12 BOEING 747 1 450 450 4
Seat Count Group Analysis
I created 8 bins sized 50. Fleet consists of planes which have seats between 150 and 200 mostly.
%>%
ab_df group_by(seat_group = cut(seats, c(0,50,100,150,200,250,300,350,400,450,500), include.lowest = TRUE)) %>%
summarise(count = n())
# A tibble: 8 × 2
seat_group count
<fct> <int>
1 (50,100] 102
2 (100,150] 934
3 (150,200] 1035
4 (200,250] 13
5 (250,300] 85
6 (300,350] 114
7 (350,400] 82
8 (400,450] 1
Conclusion
Consequently, I analysed data based on year, aircraft type and seat counts.
Top 5 plane count over years doesn’t change much according to manufacturer. Airbus and Boeing have gone hand-in-hand in manufacture-wise over years.
There is a negative relationship between seat capacity and plane count. As the seat capacity increases, plane count decreases and engine count increases. Since maintenance costs are high in aircraft industry, companies may not want to prefer planes that have 4 engines. Therefore, production amount in high-engine sized planes are lower.
Fleet mostly consists of planes which have a seat capacity between 150 and 200. This analysis is aligned with the 2nd one.
Enjoy reading ! :)