2  Inclass Exercise-1

Author

Emre Çakmak

Published

October 19, 2022

2.0.1 INCLASS EXERCISE

This exercise has been prepared for understanding {dplyr} package usage for functional EDA. Main data set in this exercise will be planes data set derived from FAA.

First of all we have to install our packages.

install.packages("tidyverse")
install.packages("nycflights13")

Then we are calling our libraries.

library(tidyverse)
library(nycflights13)

Let’s check first 10 rows of the data set. Fields and their meanings are:

  • tailnum: Tail number.
  • year: Year manufactured.
  • type: Type of plane.
  • manufacturer: Manufacturer of the aircraft.
  • model: Model of the aircraft.
  • engines: Number of engines
  • seats: Number of seats
  • speed: Average cruising speed in mph.
  • engine: Type of engine.
planes %>% 
  slice(1:10)
# A tibble: 10 × 9
   tailnum  year type                   manuf…¹ model engines seats speed engine
   <chr>   <int> <chr>                  <chr>   <chr>   <int> <int> <int> <chr> 
 1 N10156   2004 Fixed wing multi engi… EMBRAER EMB-…       2    55    NA Turbo…
 2 N102UW   1998 Fixed wing multi engi… AIRBUS… A320…       2   182    NA Turbo…
 3 N103US   1999 Fixed wing multi engi… AIRBUS… A320…       2   182    NA Turbo…
 4 N104UW   1999 Fixed wing multi engi… AIRBUS… A320…       2   182    NA Turbo…
 5 N10575   2002 Fixed wing multi engi… EMBRAER EMB-…       2    55    NA Turbo…
 6 N105UW   1999 Fixed wing multi engi… AIRBUS… A320…       2   182    NA Turbo…
 7 N107US   1999 Fixed wing multi engi… AIRBUS… A320…       2   182    NA Turbo…
 8 N108UW   1999 Fixed wing multi engi… AIRBUS… A320…       2   182    NA Turbo…
 9 N109UW   1999 Fixed wing multi engi… AIRBUS… A320…       2   182    NA Turbo…
10 N110UW   1999 Fixed wing multi engi… AIRBUS… A320…       2   182    NA Turbo…
# … with abbreviated variable name ¹​manufacturer

2.1 EXERCISE 1

Now, how many aircraft does exists for each manufacturing company? Let’s calculate..

planes %>% 
  group_by(manufacturer) %>% 
  summarise(aircraft_count = n()) %>% 
  arrange(desc(aircraft_count)) %>% 
  print(n=Inf)
# A tibble: 35 × 2
   manufacturer                  aircraft_count
   <chr>                                  <int>
 1 BOEING                                  1630
 2 AIRBUS INDUSTRIE                         400
 3 BOMBARDIER INC                           368
 4 AIRBUS                                   336
 5 EMBRAER                                  299
 6 MCDONNELL DOUGLAS                        120
 7 MCDONNELL DOUGLAS AIRCRAFT CO            103
 8 MCDONNELL DOUGLAS CORPORATION             14
 9 CANADAIR                                   9
10 CESSNA                                     9
11 PIPER                                      5
12 AMERICAN AIRCRAFT INC                      2
13 BEECH                                      2
14 BELL                                       2
15 GULFSTREAM AEROSPACE                       2
16 STEWART MACO                               2
17 AGUSTA SPA                                 1
18 AVIAT AIRCRAFT INC                         1
19 AVIONS MARCEL DASSAULT                     1
20 BARKER JACK L                              1
21 CANADAIR LTD                               1
22 CIRRUS DESIGN CORP                         1
23 DEHAVILLAND                                1
24 DOUGLAS                                    1
25 FRIEDEMANN JON                             1
26 HURLEY JAMES LARRY                         1
27 JOHN G HESS                                1
28 KILDALL GARY                               1
29 LAMBERT RICHARD                            1
30 LEARJET INC                                1
31 LEBLANC GLENN T                            1
32 MARZ BARRY                                 1
33 PAIR MIKE E                                1
34 ROBINSON HELICOPTER CO                     1
35 SIKORSKY                                   1

It seems like there is a conflict in manufacturer names. Some of them represent the same company but in different names like Airbus and Airbus Industrie.

We need to clean and rewrite these names. Then we can apply same process again.

planes =
planes %>% 
  mutate(manufacturer = gsub("AIRBUS INDUSTRIE", "AIRBUS", manufacturer), manufacturer=gsub(" AIRCRAFT CO| CORPORATION", "", manufacturer))

The last version of distribution of air crafts according to their manufacturer is here.

planes %>% 
  group_by(manufacturer) %>% 
  summarise(aircraft_count = n()) %>% 
  arrange(desc(aircraft_count)) %>% 
  mutate(aircraft_count_distrubiton=round(aircraft_count/sum(aircraft_count),2)) %>% 
  print(n=Inf)
# A tibble: 32 × 3
   manufacturer           aircraft_count aircraft_count_distrubiton
   <chr>                           <int>                      <dbl>
 1 BOEING                           1630                       0.49
 2 AIRBUS                            736                       0.22
 3 BOMBARDIER INC                    368                       0.11
 4 EMBRAER                           299                       0.09
 5 MCDONNELL DOUGLAS                 237                       0.07
 6 CANADAIR                            9                       0   
 7 CESSNA                              9                       0   
 8 PIPER                               5                       0   
 9 AMERICAN AIRCRAFT INC               2                       0   
10 BEECH                               2                       0   
11 BELL                                2                       0   
12 GULFSTREAM AEROSPACE                2                       0   
13 STEWART MACO                        2                       0   
14 AGUSTA SPA                          1                       0   
15 AVIAT AIRCRAFT INC                  1                       0   
16 AVIONS MARCEL DASSAULT              1                       0   
17 BARKER JACK L                       1                       0   
18 CANADAIR LTD                        1                       0   
19 CIRRUS DESIGN CORP                  1                       0   
20 DEHAVILLAND                         1                       0   
21 DOUGLAS                             1                       0   
22 FRIEDEMANN JON                      1                       0   
23 HURLEY JAMES LARRY                  1                       0   
24 JOHN G HESS                         1                       0   
25 KILDALL GARY                        1                       0   
26 LAMBERT RICHARD                     1                       0   
27 LEARJET INC                         1                       0   
28 LEBLANC GLENN T                     1                       0   
29 MARZ BARRY                          1                       0   
30 PAIR MIKE E                         1                       0   
31 ROBINSON HELICOPTER CO              1                       0   
32 SIKORSKY                            1                       0   

2.2 EXERCISE 2

Let’s check the difference on aircraft capacities year by year

First, get only air crafts which have more than 50 seats. Then clear the data by filtering rows which have no information in Year column.

planes %>% 
  filter(seats>50,!is.na(year))%>%
  group_by(year) %>% 
  summarise(seat_avg = round(mean(seats),2)) %>% 
  arrange(year) %>% 
  print(n=Inf)
# A tibble: 38 × 2
    year seat_avg
   <int>    <dbl>
 1  1956     102 
 2  1965     149 
 3  1975     139 
 4  1976     139 
 5  1977     139 
 6  1978     139 
 7  1979     139 
 8  1980     139 
 9  1984     178 
10  1985     174.
11  1986     196.
12  1987     181.
13  1988     190.
14  1989     163.
15  1990     179.
16  1991     181.
17  1992     195.
18  1993     198.
19  1994     178.
20  1995     187.
21  1996     170.
22  1997     179.
23  1998     169.
24  1999     167.
25  2000     163.
26  2001     152.
27  2002     132.
28  2003     106.
29  2004     116.
30  2005     117.
31  2006     141.
32  2007     140.
33  2008     147.
34  2009     194.
35  2010     164.
36  2011     214.
37  2012     207.
38  2013     206.

Let’s check the biggest air craft in our database with it’s tailnumber.

planes %>%
  arrange(desc(seats)) %>% 
  slice(1) %>%
  select(tailnum, manufacturer, model, year, seats)
# A tibble: 1 × 5
  tailnum manufacturer model    year seats
  <chr>   <chr>        <chr>   <int> <int>
1 N670US  BOEING       747-451  1990   450

Exciting..Here is some information about the biggest airplane’s history

THANKS FOR READING