This chapter is about Joins. WE load Tennis data in order to analysis this dataset;
tennis_data <- load("~/atp_tennis_data_2017.RData")
We join two dataframes with player_id and singles_winner_player_id coloumns in order to get winner players.
tennis_lj<-left_join(tourney_df, player_df,by = c("singles_winner_player_id"="player_id"))
First solution of this one:
tennis_lj_sum <-
tennis_lj %>%
count(flag_code,sort=TRUE)
Other solution of this one:
tennis_lj_sum <-
tennis_lj %>%
group_by(flag_code) %>%
summarise(total_win = n()) %>%
arrange(desc(total_win))
In this analysis we have to find the non-champion players using anti-join
nonchamp_players <-
player_df %>% select(player_id,flag_code)%>% anti_join(.,tennis_lj_sum)
## Joining, by = "flag_code"
So the Data set ready we now have to write the SQL in order to fulfill the reqirement.
nonchamp_players %>% left_join(.,score_df,by=c("player_id"="winner_player_id")) %>%
group_by(flag_code) %>% summarise(total_won=sum(winner_games_won,na.rm=TRUE)) %>%arrange(desc(total_won))
## # A tibble: 93 x 2
## flag_code total_won
## <chr> <dbl>
## 1 AUS 1989
## 2 CZE 1209
## 3 CAN 1190
## 4 SVK 889
## 5 BRA 873
## 6 POR 621
## 7 RSA 566
## 8 KAZ 495
## 9 KOR 438
## 10 GEO 377
## # ... with 83 more rows
final_winners <-
inner_join(player_df,score_df,by=c("player_id"="winner_player_id")) %>%
filter(tourney_round_name=="Finals") %>%
group_by(player_slug) %>%
summarise(final_count = n()) %>%
arrange(desc(final_count))
final_winners
## # A tibble: 35 x 2
## player_slug final_count
## <chr> <int>
## 1 roger-federer 7
## 2 rafael-nadal 6
## 3 alexander-zverev 5
## 4 grigor-dimitrov 4
## 5 jo-wilfried-tsonga 4
## 6 jack-sock 3
## 7 lucas-pouille 3
## 8 damir-dzumhur 2
## 9 david-goffin 2
## 10 gilles-muller 2
## # ... with 25 more rows
final_losers <-
inner_join(player_df,score_df,by=c("player_id"="loser_player_id")) %>%
filter(tourney_round_name=="Finals") %>%
group_by(player_slug) %>%
summarise(final_count = n()) %>%
arrange(desc(final_count))
final_losers
## # A tibble: 50 x 2
## player_slug final_count
## <chr> <int>
## 1 rafael-nadal 4
## 2 david-goffin 3
## 3 adrian-mannarino 2
## 4 albert-ramos-vinolas 2
## 5 alexandr-dolgopolov 2
## 6 dominic-thiem 2
## 7 joao-sousa 2
## 8 kei-nishikori 2
## 9 kevin-anderson 2
## 10 marin-cilic 2
## # ... with 40 more rows
I merge the Final winners and final losers datasets to analysis the Most Final Played Players.
merge_losers_winners <-
rbind(final_losers,final_winners)
merge_losers_winners%>%
group_by(player_slug) %>%
summarise(total_finals=sum(final_count,na.rm=TRUE)) %>%
arrange(desc(total_finals))
## # A tibble: 64 x 2
## player_slug total_finals
## <chr> <int>
## 1 rafael-nadal 10
## 2 roger-federer 8
## 3 alexander-zverev 6
## 4 david-goffin 5
## 5 grigor-dimitrov 5
## 6 jo-wilfried-tsonga 5
## 7 lucas-pouille 4
## 8 alexandr-dolgopolov 3
## 9 damir-dzumhur 3
## 10 dominic-thiem 3
## # ... with 54 more rows