This chapter is about Joins. WE load Tennis data in order to analysis this dataset;

tennis_data <- load("~/atp_tennis_data_2017.RData")

1-Rank countries (flag cides) by the number of singles champions

We join two dataframes with player_id and singles_winner_player_id coloumns in order to get winner players.

tennis_lj<-left_join(tourney_df, player_df,by = c("singles_winner_player_id"="player_id"))

First solution of this one:

tennis_lj_sum <-
  tennis_lj %>%
  count(flag_code,sort=TRUE)

Other solution of this one:

tennis_lj_sum <-
  tennis_lj %>%
  group_by(flag_code) %>%
  summarise(total_win = n()) %>%
  arrange(desc(total_win))

2-Rank countries which did not get any singles championships by the games won when they win the match

In this analysis we have to find the non-champion players using anti-join

nonchamp_players <-
  player_df %>% select(player_id,flag_code)%>% anti_join(.,tennis_lj_sum)
## Joining, by = "flag_code"

So the Data set ready we now have to write the SQL in order to fulfill the reqirement.

nonchamp_players %>% left_join(.,score_df,by=c("player_id"="winner_player_id")) %>%
  group_by(flag_code) %>% summarise(total_won=sum(winner_games_won,na.rm=TRUE)) %>%arrange(desc(total_won))
## # A tibble: 93 x 2
##    flag_code total_won
##    <chr>         <dbl>
##  1 AUS            1989
##  2 CZE            1209
##  3 CAN            1190
##  4 SVK             889
##  5 BRA             873
##  6 POR             621
##  7 RSA             566
##  8 KAZ             495
##  9 KOR             438
## 10 GEO             377
## # ... with 83 more rows

3-Final Winners

final_winners <-
  inner_join(player_df,score_df,by=c("player_id"="winner_player_id")) %>%
  filter(tourney_round_name=="Finals") %>%
  group_by(player_slug) %>%
  summarise(final_count = n()) %>%
  arrange(desc(final_count))

final_winners
## # A tibble: 35 x 2
##    player_slug        final_count
##    <chr>                    <int>
##  1 roger-federer                7
##  2 rafael-nadal                 6
##  3 alexander-zverev             5
##  4 grigor-dimitrov              4
##  5 jo-wilfried-tsonga           4
##  6 jack-sock                    3
##  7 lucas-pouille                3
##  8 damir-dzumhur                2
##  9 david-goffin                 2
## 10 gilles-muller                2
## # ... with 25 more rows

4-Final Losers

final_losers <- 
  inner_join(player_df,score_df,by=c("player_id"="loser_player_id")) %>%
  filter(tourney_round_name=="Finals") %>%
  group_by(player_slug) %>%
  summarise(final_count = n()) %>%
  arrange(desc(final_count))

final_losers
## # A tibble: 50 x 2
##    player_slug          final_count
##    <chr>                      <int>
##  1 rafael-nadal                   4
##  2 david-goffin                   3
##  3 adrian-mannarino               2
##  4 albert-ramos-vinolas           2
##  5 alexandr-dolgopolov            2
##  6 dominic-thiem                  2
##  7 joao-sousa                     2
##  8 kei-nishikori                  2
##  9 kevin-anderson                 2
## 10 marin-cilic                    2
## # ... with 40 more rows

5-Most Final Played Players

I merge the Final winners and final losers datasets to analysis the Most Final Played Players.

merge_losers_winners <-
rbind(final_losers,final_winners)

merge_losers_winners%>%
  group_by(player_slug) %>%
  summarise(total_finals=sum(final_count,na.rm=TRUE)) %>%
  arrange(desc(total_finals))  
## # A tibble: 64 x 2
##    player_slug         total_finals
##    <chr>                      <int>
##  1 rafael-nadal                  10
##  2 roger-federer                  8
##  3 alexander-zverev               6
##  4 david-goffin                   5
##  5 grigor-dimitrov                5
##  6 jo-wilfried-tsonga             5
##  7 lucas-pouille                  4
##  8 alexandr-dolgopolov            3
##  9 damir-dzumhur                  3
## 10 dominic-thiem                  3
## # ... with 54 more rows