install.packages("tidyverse")
install.packages("nycflights13")
2 Inclass Exercise-1
2.0.1 INCLASS EXERCISE
This exercise has been prepared for understanding {dplyr} package usage for functional EDA. Main data set in this exercise will be planes data set derived from FAA.
First of all we have to install our packages.
Then we are calling our libraries.
library(tidyverse)
library(nycflights13)
Let’s check first 10 rows of the data set. Fields and their meanings are:
- tailnum: Tail number.
- year: Year manufactured.
- type: Type of plane.
- manufacturer: Manufacturer of the aircraft.
- model: Model of the aircraft.
- engines: Number of engines
- seats: Number of seats
- speed: Average cruising speed in mph.
- engine: Type of engine.
%>%
planes slice(1:10)
# A tibble: 10 × 9
tailnum year type manuf…¹ model engines seats speed engine
<chr> <int> <chr> <chr> <chr> <int> <int> <int> <chr>
1 N10156 2004 Fixed wing multi engi… EMBRAER EMB-… 2 55 NA Turbo…
2 N102UW 1998 Fixed wing multi engi… AIRBUS… A320… 2 182 NA Turbo…
3 N103US 1999 Fixed wing multi engi… AIRBUS… A320… 2 182 NA Turbo…
4 N104UW 1999 Fixed wing multi engi… AIRBUS… A320… 2 182 NA Turbo…
5 N10575 2002 Fixed wing multi engi… EMBRAER EMB-… 2 55 NA Turbo…
6 N105UW 1999 Fixed wing multi engi… AIRBUS… A320… 2 182 NA Turbo…
7 N107US 1999 Fixed wing multi engi… AIRBUS… A320… 2 182 NA Turbo…
8 N108UW 1999 Fixed wing multi engi… AIRBUS… A320… 2 182 NA Turbo…
9 N109UW 1999 Fixed wing multi engi… AIRBUS… A320… 2 182 NA Turbo…
10 N110UW 1999 Fixed wing multi engi… AIRBUS… A320… 2 182 NA Turbo…
# … with abbreviated variable name ¹manufacturer
2.1 EXERCISE 1
Now, how many aircraft does exists for each manufacturing company? Let’s calculate..
%>%
planes group_by(manufacturer) %>%
summarise(aircraft_count = n()) %>%
arrange(desc(aircraft_count)) %>%
print(n=Inf)
# A tibble: 35 × 2
manufacturer aircraft_count
<chr> <int>
1 BOEING 1630
2 AIRBUS INDUSTRIE 400
3 BOMBARDIER INC 368
4 AIRBUS 336
5 EMBRAER 299
6 MCDONNELL DOUGLAS 120
7 MCDONNELL DOUGLAS AIRCRAFT CO 103
8 MCDONNELL DOUGLAS CORPORATION 14
9 CANADAIR 9
10 CESSNA 9
11 PIPER 5
12 AMERICAN AIRCRAFT INC 2
13 BEECH 2
14 BELL 2
15 GULFSTREAM AEROSPACE 2
16 STEWART MACO 2
17 AGUSTA SPA 1
18 AVIAT AIRCRAFT INC 1
19 AVIONS MARCEL DASSAULT 1
20 BARKER JACK L 1
21 CANADAIR LTD 1
22 CIRRUS DESIGN CORP 1
23 DEHAVILLAND 1
24 DOUGLAS 1
25 FRIEDEMANN JON 1
26 HURLEY JAMES LARRY 1
27 JOHN G HESS 1
28 KILDALL GARY 1
29 LAMBERT RICHARD 1
30 LEARJET INC 1
31 LEBLANC GLENN T 1
32 MARZ BARRY 1
33 PAIR MIKE E 1
34 ROBINSON HELICOPTER CO 1
35 SIKORSKY 1
It seems like there is a conflict in manufacturer names. Some of them represent the same company but in different names like Airbus and Airbus Industrie.
We need to clean and rewrite these names. Then we can apply same process again.
=
planes %>%
planes mutate(manufacturer = gsub("AIRBUS INDUSTRIE", "AIRBUS", manufacturer), manufacturer=gsub(" AIRCRAFT CO| CORPORATION", "", manufacturer))
The last version of distribution of air crafts according to their manufacturer is here.
%>%
planes group_by(manufacturer) %>%
summarise(aircraft_count = n()) %>%
arrange(desc(aircraft_count)) %>%
mutate(aircraft_count_distrubiton=round(aircraft_count/sum(aircraft_count),2)) %>%
print(n=Inf)
# A tibble: 32 × 3
manufacturer aircraft_count aircraft_count_distrubiton
<chr> <int> <dbl>
1 BOEING 1630 0.49
2 AIRBUS 736 0.22
3 BOMBARDIER INC 368 0.11
4 EMBRAER 299 0.09
5 MCDONNELL DOUGLAS 237 0.07
6 CANADAIR 9 0
7 CESSNA 9 0
8 PIPER 5 0
9 AMERICAN AIRCRAFT INC 2 0
10 BEECH 2 0
11 BELL 2 0
12 GULFSTREAM AEROSPACE 2 0
13 STEWART MACO 2 0
14 AGUSTA SPA 1 0
15 AVIAT AIRCRAFT INC 1 0
16 AVIONS MARCEL DASSAULT 1 0
17 BARKER JACK L 1 0
18 CANADAIR LTD 1 0
19 CIRRUS DESIGN CORP 1 0
20 DEHAVILLAND 1 0
21 DOUGLAS 1 0
22 FRIEDEMANN JON 1 0
23 HURLEY JAMES LARRY 1 0
24 JOHN G HESS 1 0
25 KILDALL GARY 1 0
26 LAMBERT RICHARD 1 0
27 LEARJET INC 1 0
28 LEBLANC GLENN T 1 0
29 MARZ BARRY 1 0
30 PAIR MIKE E 1 0
31 ROBINSON HELICOPTER CO 1 0
32 SIKORSKY 1 0
2.2 EXERCISE 2
Let’s check the difference on aircraft capacities year by year
First, get only air crafts which have more than 50 seats. Then clear the data by filtering rows which have no information in Year column.
%>%
planes filter(seats>50,!is.na(year))%>%
group_by(year) %>%
summarise(seat_avg = round(mean(seats),2)) %>%
arrange(year) %>%
print(n=Inf)
# A tibble: 38 × 2
year seat_avg
<int> <dbl>
1 1956 102
2 1965 149
3 1975 139
4 1976 139
5 1977 139
6 1978 139
7 1979 139
8 1980 139
9 1984 178
10 1985 174.
11 1986 196.
12 1987 181.
13 1988 190.
14 1989 163.
15 1990 179.
16 1991 181.
17 1992 195.
18 1993 198.
19 1994 178.
20 1995 187.
21 1996 170.
22 1997 179.
23 1998 169.
24 1999 167.
25 2000 163.
26 2001 152.
27 2002 132.
28 2003 106.
29 2004 116.
30 2005 117.
31 2006 141.
32 2007 140.
33 2008 147.
34 2009 194.
35 2010 164.
36 2011 214.
37 2012 207.
38 2013 206.
Let’s check the biggest air craft in our database with it’s tailnumber.
%>%
planes arrange(desc(seats)) %>%
slice(1) %>%
select(tailnum, manufacturer, model, year, seats)
# A tibble: 1 × 5
tailnum manufacturer model year seats
<chr> <chr> <chr> <int> <int>
1 N670US BOEING 747-451 1990 450
Exciting..Here is some information about the biggest airplane’s history
THANKS FOR READING