2 inclass1
2.0.1 Let’s look at the description of the dataset.
This package contains information about all flights that departed from NYC (e.g. EWR, JFK and LGA) to destinations in the United States, Puerto Rico, and the American Virgin Islands in 2013: 336,776 flights in total.
2.0.2 AIRBUS and AIRBUS INDUSTRIE
The data contains both AIRBUS and AIRBUS INDUSTRIE. A quick Google search shows no real difference between them: the company was called AIRBUS INDUSTRIE until 2001 and has been just AIRBUS since then. So let's treat them as the same manufacturer.
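library(dplyr)         # group_by(), summarise(), arrange(), %>%
library(nycflights13)  # provides the planes table
# Count planes per manufacturer, largest first
planes %>%
  group_by(manufacturer) %>%
  summarise(plane_count = n()) %>%
  arrange(desc(plane_count)) %>%
  print(n = Inf)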
# A tibble: 35 × 2
manufacturer plane_count
<chr> <int>
1 BOEING 1630
2 AIRBUS INDUSTRIE 400
3 BOMBARDIER INC 368
4 AIRBUS 336
5 EMBRAER 299
6 MCDONNELL DOUGLAS 120
7 MCDONNELL DOUGLAS AIRCRAFT CO 103
8 MCDONNELL DOUGLAS CORPORATION 14
9 CANADAIR 9
10 CESSNA 9
11 PIPER 5
12 AMERICAN AIRCRAFT INC 2
13 BEECH 2
14 BELL 2
15 GULFSTREAM AEROSPACE 2
16 STEWART MACO 2
17 AGUSTA SPA 1
18 AVIAT AIRCRAFT INC 1
19 AVIONS MARCEL DASSAULT 1
20 BARKER JACK L 1
21 CANADAIR LTD 1
22 CIRRUS DESIGN CORP 1
23 DEHAVILLAND 1
24 DOUGLAS 1
25 FRIEDEMANN JON 1
26 HURLEY JAMES LARRY 1
27 JOHN G HESS 1
28 KILDALL GARY 1
29 LAMBERT RICHARD 1
30 LEARJET INC 1
31 LEBLANC GLENN T 1
32 MARZ BARRY 1
33 PAIR MIKE E 1
34 ROBINSON HELICOPTER CO 1
35 SIKORSKY 1
planes_new <- planes %>%
  mutate(manufacturer = replace(manufacturer, manufacturer == "AIRBUS INDUSTRIE", "AIRBUS"))
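As a sanity check on the merge, the sketch below applies the same replace() call to a small hand-made vector (the values are illustrative, not rows from the data) and adds up the two AIRBUS counts from the table above. It is only a minimal illustration of the recoding step, in base R so it runs without the data set.

```r
# Hand-made example values (illustrative only, not rows from planes)
manufacturer <- c("BOEING", "AIRBUS INDUSTRIE", "AIRBUS", "EMBRAER", "AIRBUS INDUSTRIE")

# Same recoding as in the mutate() call: overwrite the matching entries
merged <- replace(manufacturer, manufacturer == "AIRBUS INDUSTRIE", "AIRBUS")
print(table(merged))

# Counts copied from the manufacturer table above;
# the merged AIRBUS group should hold their sum
airbus_total <- 400 + 336
print(airbus_total)
```

After the merge, the AIRBUS group should therefore contain 736 planes and the number of distinct manufacturers should drop from 35 to 34.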
2.0.3 80/20 Rule
After the merge there are 34 different manufacturers. The top 4 dominate the market: about 91% of the planes belong to them. This is not exactly the 80/20 rule, but something similar holds here: about 12% of the manufacturers (4 of 34) account for about 91% of the planes.
planes_new %>%
  group_by(manufacturer) %>%
  summarise(avg_engine = mean(engines), median_engine = median(engines), plane_count = n()) %>%
  arrange(desc(plane_count)) %>%
  mutate(frequency = round(plane_count / sum(plane_count), 3), cumsum = cumsum(frequency)) %>%
  print(n = Inf)
# A tibble: 34 × 6
manufacturer avg_engine median_engine plane…¹ frequ…² cumsum
<chr> <dbl> <dbl> <int> <dbl> <dbl>
1 BOEING 2.00 2 1630 0.491 0.491
2 AIRBUS 2.01 2 736 0.222 0.713
3 BOMBARDIER INC 2 2 368 0.111 0.824
4 EMBRAER 2 2 299 0.09 0.914
5 MCDONNELL DOUGLAS 2 2 120 0.036 0.95
6 MCDONNELL DOUGLAS AIRCRAFT CO 2 2 103 0.031 0.981
7 MCDONNELL DOUGLAS CORPORATION 2 2 14 0.004 0.985
8 CANADAIR 2 2 9 0.003 0.988
9 CESSNA 1.33 1 9 0.003 0.991
10 PIPER 1.4 1 5 0.002 0.993
11 AMERICAN AIRCRAFT INC 1 1 2 0.001 0.994
12 BEECH 2 2 2 0.001 0.995
13 BELL 1.5 1.5 2 0.001 0.996
14 GULFSTREAM AEROSPACE 2 2 2 0.001 0.997
15 STEWART MACO 1 1 2 0.001 0.998
16 AGUSTA SPA 2 2 1 0 0.998
17 AVIAT AIRCRAFT INC 1 1 1 0 0.998
18 AVIONS MARCEL DASSAULT 3 3 1 0 0.998
19 BARKER JACK L 1 1 1 0 0.998
20 CANADAIR LTD 4 4 1 0 0.998
21 CIRRUS DESIGN CORP 1 1 1 0 0.998
22 DEHAVILLAND 1 1 1 0 0.998
23 DOUGLAS 4 4 1 0 0.998
24 FRIEDEMANN JON 1 1 1 0 0.998
25 HURLEY JAMES LARRY 1 1 1 0 0.998
26 JOHN G HESS 1 1 1 0 0.998
27 KILDALL GARY 1 1 1 0 0.998
28 LAMBERT RICHARD 1 1 1 0 0.998
29 LEARJET INC 2 2 1 0 0.998
30 LEBLANC GLENN T 1 1 1 0 0.998
31 MARZ BARRY 1 1 1 0 0.998
32 PAIR MIKE E 1 1 1 0 0.998
33 ROBINSON HELICOPTER CO 1 1 1 0 0.998
34 SIKORSKY 2 2 1 0 0.998
# … with abbreviated variable names ¹plane_count, ²frequency
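Note that the cumsum column stops at 0.998 instead of 1, because it adds up frequencies that were already rounded to three decimals. A minimal sketch of the alternative, accumulating the unrounded shares and rounding only for display, is shown here. It uses the plane counts copied from the table above and base R only, so it runs without the data set; the variable names are illustrative.

```r
# Plane counts per manufacturer, copied from the table above (34 rows)
plane_count <- c(1630, 736, 368, 299, 120, 103, 14, 9, 9, 5,
                 rep(2, 5),    # five manufacturers with 2 planes each
                 rep(1, 19))   # nineteen manufacturers with 1 plane each

share     <- plane_count / sum(plane_count)  # unrounded frequency
cum_share <- cumsum(share)                   # accumulate before rounding

n_manufacturers  <- length(plane_count)
top4_share       <- cum_share[4]             # share held by the top 4
top4_manufacturer_pct <- 4 / n_manufacturers # fraction of manufacturers

print(round(top4_share, 3))
print(round(cum_share[n_manufacturers], 3))
print(round(top4_manufacturer_pct, 3))
```

Computed this way, the top four manufacturers hold about 91.3% of the planes (the 0.914 in the table is a side effect of summing rounded values), and the last cumulative value comes out as 1.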
2.0.4 Manufacturer by engine counts
Most planes have 2 engines; the rest have 1, 3, or 4. Among 2-engine planes the average seat count differs a lot between manufacturers: for example, BOEING averages 175 seats while BOMBARDIER INC averages 74. The reason could be pricing and luxury: BOMBARDIER may be offering a more private experience, while BOEING offers more seats. Planes with 3 and 4 engines might be meant for long flights. They also differ in available seats: the 4-engine CANADAIR LTD plane has only 2 seats, while the 4-engine BOEING has 450.
planes_new %>%
  group_by(engines, manufacturer) %>%
  summarise(plane_count = n(), avg_seats = mean(seats), median_seats = median(seats)) %>%
  arrange(engines, desc(plane_count)) %>%
  print(n = Inf)
`summarise()` has grouped output by 'engines'. You can override using the
`.groups` argument.
# A tibble: 40 × 5
# Groups: engines [4]
engines manufacturer plane_count avg_seats median_seats
<int> <chr> <int> <dbl> <dbl>
1 1 CESSNA 6 4.33 4
2 1 PIPER 3 6 7
3 1 AMERICAN AIRCRAFT INC 2 2 2
4 1 STEWART MACO 2 2 2
5 1 AVIAT AIRCRAFT INC 1 2 2
6 1 BARKER JACK L 1 2 2
7 1 BELL 1 5 5
8 1 CIRRUS DESIGN CORP 1 4 4
9 1 DEHAVILLAND 1 16 16
10 1 FRIEDEMANN JON 1 2 2
11 1 HURLEY JAMES LARRY 1 2 2
12 1 JOHN G HESS 1 2 2
13 1 KILDALL GARY 1 2 2
14 1 LAMBERT RICHARD 1 2 2
15 1 LEBLANC GLENN T 1 2 2
16 1 MARZ BARRY 1 2 2
17 1 PAIR MIKE E 1 2 2
18 1 ROBINSON HELICOPTER CO 1 5 5
19 2 BOEING 1629 175. 149
20 2 AIRBUS 733 202. 182
21 2 BOMBARDIER INC 368 74.0 80
22 2 EMBRAER 299 45.6 55
23 2 MCDONNELL DOUGLAS 120 162. 172
24 2 MCDONNELL DOUGLAS AIRCRAFT CO 103 142 142
25 2 MCDONNELL DOUGLAS CORPORATION 14 142 142
26 2 CANADAIR 9 55 55
27 2 CESSNA 3 7.33 8
28 2 BEECH 2 9.5 9.5
29 2 GULFSTREAM AEROSPACE 2 22 22
30 2 PIPER 2 8 8
31 2 AGUSTA SPA 1 8 8
32 2 BELL 1 11 11
33 2 LEARJET INC 1 11 11
34 2 SIKORSKY 1 14 14
35 3 AIRBUS 2 379 379
36 3 AVIONS MARCEL DASSAULT 1 12 12
37 4 AIRBUS 1 375 375
38 4 BOEING 1 450 450
39 4 CANADAIR LTD 1 2 2
40 4 DOUGLAS 1 102 102
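The message at the top of this output appears because summarise() drops only the last grouping level, so the result is still grouped by engines. A minimal sketch of how the .groups argument can be used to return an ungrouped result and silence the message is given below. It assumes dplyr is installed, and toy_planes is a hypothetical stand-in with made-up rows, not the real planes_new table.

```r
library(dplyr)

# Hypothetical stand-in for planes_new (made-up rows, not the real data)
toy_planes <- tibble(
  engines      = c(1L, 2L, 2L, 2L, 4L),
  manufacturer = c("CESSNA", "BOEING", "BOEING", "AIRBUS", "BOEING"),
  seats        = c(4L, 149L, 200L, 182L, 450L)
)

seat_summary <- toy_planes %>%
  group_by(engines, manufacturer) %>%
  summarise(
    plane_count  = n(),
    avg_seats    = mean(seats),
    median_seats = median(seats),
    .groups      = "drop"  # return an ungrouped tibble, no grouping message
  ) %>%
  arrange(engines, desc(plane_count))

print(seat_summary)
```

Dropping the groups matters mostly when more steps follow: on a tibble that is still grouped by engines, a later mutate() or slice_head() would work within each engine count rather than over the whole table.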
2.0.5 Leaders of Market
We know that most planes have 2 engines and that most of them belong to 4 manufacturers. Let's investigate the seat metrics of only the 2-engine planes from these 4 manufacturers.
manufacturers_names <- planes_new %>%
  group_by(manufacturer) %>%
  summarise(avg_engine = mean(engines), median_engine = median(engines), plane_count = n()) %>%
  arrange(desc(plane_count)) %>%
  mutate(frequency = round(plane_count / sum(plane_count), 3), cumsum = cumsum(frequency)) %>%
  select(manufacturer) %>%
  slice_head(n = 4)
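Only the manufacturer names of the four largest groups are kept here, so the engine averages and frequencies are not strictly needed for this step. A shorter sketch of the same idea, using count() with sort = TRUE, could look like this. It assumes dplyr is installed; toy_planes and top_manufacturers are hypothetical names, and the rows are made up rather than taken from planes_new.

```r
library(dplyr)

# Hypothetical stand-in for planes_new: one row per plane (made-up counts)
toy_planes <- tibble(
  manufacturer = rep(
    c("BOEING", "AIRBUS", "BOMBARDIER INC", "EMBRAER", "CESSNA"),
    times = c(5, 4, 3, 2, 1)
  )
)

top_manufacturers <- toy_planes %>%
  count(manufacturer, sort = TRUE, name = "plane_count") %>%  # rows per manufacturer, largest first
  slice_head(n = 4) %>%                                       # keep the four largest
  pull(manufacturer)                                          # plain character vector

print(top_manufacturers)
```

Because pull() returns a character vector instead of a one-column tibble, a later filter would use manufacturer %in% top_manufacturers directly, without the $manufacturer step.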
AIRBUS and BOEING look like two companies with the same strategy, which is more seats per plane. BOEING has more than twice as many planes as AIRBUS (1,629 vs. 733). BOMBARDIER and EMBRAER also share a strategy, which is fewer seats per plane.
planes_new %>%
  filter(manufacturer %in% manufacturers_names$manufacturer, engines == 2) %>%
  group_by(manufacturer) %>%
  summarise(mean = mean(seats), std_dev = sd(seats), count = n()) %>%
  print(n = Inf)
# A tibble: 4 × 4
manufacturer mean std_dev count
<chr> <dbl> <dbl> <int>
1 AIRBUS 202. 59.2 733
2 BOEING 175. 59.1 1629
3 BOMBARDIER INC 74.0 17.8 368
4 EMBRAER 45.6 15.5 299