A Detailed Study of the Turkish Super League (from 94-95 to 18-19)

Group Name: Müjde-R

1. Introduction

This data consists of match statistics for the Turkish Super League. The data is available on football-data.co.uk and covers a wide range of matches, broken down by home and away team on a weekly basis. Our data starts with the 1994-1995 season and covers every season up to the present (2018-2019). The data grows richer as new variables are added year by year. For example, whereas the raw data contains only 7 variables in the 1994-1995 season, the number of variables reaches 61 in the 2018-2019 season and even includes betting odds. The raw data also contains first-half scores separately, which lets us derive the second-half scores and widen the scope of our analysis.
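As a minimal sketch of how second-half scores can be derived, the snippet below subtracts the half-time goals from the full-time goals. The three rows are copied from the 2004-2005 file; the column names SHHG and SHAG are hypothetical names chosen for illustration and do not exist in the raw data.

```r
# Three matches copied from the 2004-2005 season file
matches <- data.frame(
  HomeTeam = c("Rizespor", "Denizlispor", "Kayserispor"),
  AwayTeam = c("Fenerbahce", "Gaziantepspor", "Trabzonspor"),
  FTHG = c(2, 2, 0), FTAG = c(2, 1, 3),   # full-time goals
  HTHG = c(0, 1, 0), HTAG = c(2, 0, 1),   # half-time goals
  stringsAsFactors = FALSE
)

# Second-half goals = full-time goals minus half-time goals
# (SHHG / SHAG are hypothetical column names)
matches$SHHG <- matches$FTHG - matches$HTHG
matches$SHAG <- matches$FTAG - matches$HTAG

print(matches)
```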

2. Game Plan

2.1. Summary Of Raw Data

We uploaded the raw data to GitHub. The data grows richer year by year. We have 25 seasons x 34 weeks x 9 matches = 7650 rows, with 7 to 61 columns depending on the season.

  • Let’s first look at the minimal data (94-95 season).
data_url <- "https://github.com/pjournal/mef03g-mujde-r/blob/master/Group%20Project/Raw%20Data/1994-1995.csv?raw=true"
raw_data <- read.csv(data_url,sep=',',header=F)
head(raw_data)
##    V1       V2            V3            V4   V5   V6  V7
## 1 Div     Date      HomeTeam      AwayTeam FTHG FTAG FTR
## 2  T1 13/08/94   Zeytinburnu    Fenerbahce    1    4   A
## 3  T1 14/08/94 Ad. Demirspor       Antalya    1    0   H
## 4  T1 14/08/94    Ankaragucu Gaziantepspor    1    0   H
## 5  T1 14/08/94     Bursaspor      P. Ofisi    5    2   H
## 6  T1 14/08/94   Denizlispor      Besiktas    1    3   A
  • We will use the Date, HomeTeam, AwayTeam, FTHG (Full Time Home Team Goals), FTAG (Full Time Away Team Goals) and FTR (Full Time Result) columns for all years.
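The read.csv calls in this section pass header=F, so the column names arrive as the first data row and the columns are labelled V1, V2 and so on. The sketch below shows one possible way to read a season with header = TRUE and keep only the six shared columns. It uses a small inline text sample in place of the GitHub URL; common_cols is a hypothetical helper name.

```r
# Inline sample standing in for the CSV file (rows copied from the 94-95 output)
csv_text <- "Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
T1,13/08/94,Zeytinburnu,Fenerbahce,1,4,A
T1,14/08/94,Ad. Demirspor,Antalya,1,0,H"

# header = TRUE turns the first row into column names
season <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Columns that exist in every season (hypothetical helper name)
common_cols <- c("Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "FTR")
season <- season[common_cols]

str(season)
```

With the header parsed, the goal columns are read as numbers rather than text, which the later analyses rely on.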

  • And let us see the 04-05 data.

data_url <- "https://github.com/pjournal/mef03g-mujde-r/blob/master/Group%20Project/Raw%20Data/2004-2005.csv?raw=true"
raw_data <- read.csv(data_url,sep=',',header=F)
head(raw_data)
##    V1       V2             V3            V4   V5   V6  V7   V8   V9 V10   V11   V12   V13  V14 V15 V16  V17 V18 V19  V20 V21  V22  V23  V24  V25  V26  V27  V28 V29  V30  V31    V32    V33   V34   V35  V36
## 1 Div     Date       HomeTeam      AwayTeam FTHG FTAG FTR HTHG HTAG HTR B365H B365D B365A  BWH BWD BWA  GBH GBD GBA  IWH IWD  IWA  LBH  LBD  LBA  SBH  SBD  SBA WHH  WHD  WHA GB>2.5 GB<2.5 GBAHH GBAHA GBAH
## 2  T1 6/8/2004       Rizespor    Fenerbahce    2    2   D    0    2   A     6   3.4   1.5    6 3.6 1.5    6 3.6 1.5  5.4 3.8 1.45  5.5  3.6 1.53  5.5  3.4 1.57   6  3.4  1.5   1.65      2  2.25   1.5  0.5
## 3  T1 7/8/2004    Denizlispor Gaziantepspor    2    1   H    1    0   H   2.2  3.25  2.75  2.1 3.3   3  2.2 3.2 2.9    2 3.1  3.2  2.1 3.25    3 2.25  3.2  2.8 2.1  3.2    3   1.71   1.95   2.2  1.52 -0.5
## 4  T1 7/8/2004 Genclerbirligi   Sakaryaspor    1    0   H    1    0   H   1.5   3.4     6 1.45 3.7 6.5 1.45 3.7 6.6 1.45 3.8  5.4 1.53  3.6  5.5  1.5  3.5    6 1.4 3.75    7    1.6    2.1  1.45  2.37 -0.5
## 5  T1 7/8/2004    Kayserispor   Trabzonspor    0    3   A    0    1   A  4.33   3.4  1.66    5 3.5 1.6  4.2 3.5 1.7  4.4 3.3 1.65 4.33  3.6 1.67  4.5  3.4 1.67 4.4  3.4 1.65    1.6    2.1  1.91   1.7  0.5
## 6  T1 7/8/2004    Malatyaspor      Besiktas    1    1   D    1    0   H   3.2   3.2     2  3.3 3.2   2  3.1 3.2 2.1  3.5 3.1  1.9  3.5 3.25 1.91  3.5 3.25 1.91 2.8  3.2  2.2   1.65      2  1.57   2.1  0.5
  • There are many new variables. We will use HTHG, HTAG and HTR for half-time details. The other variables are all betting odds. The prefixes B365, BW, GB, IW, LB, SB and WH identify different betting companies. The suffixes H, D, A, >2.5, <2.5, AHH and AHA identify bet types: home win, draw, away win, total goals over 2.5, total goals under 2.5, Asian handicap home team and Asian handicap away team.

  • For example: LBAHH = Ladbrokes Asian handicap home team odds, GB<2.5 = Gamebookers under 2.5 goals

  • And finally let us see the latest league data (18-19)

data_url <- "https://github.com/pjournal/mef03g-mujde-r/blob/master/Group%20Project/Raw%20Data/2018-2019.csv?raw=true"
raw_data <- read.csv(data_url,sep=',',header=F)
head(raw_data)
##    V1        V2         V3                   V4   V5   V6  V7   V8   V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22   V23   V24   V25  V26  V27  V28  V29  V30  V31  V32  V33  V34  V35  V36 V37  V38  V39  V40   V41   V42   V43   V44   V45   V46   V47  V48      V49      V50      V51      V52  V53   V54     V55     V56     V57     V58  V59  V60  V61
## 1 Div      Date   HomeTeam             AwayTeam FTHG FTAG FTR HTHG HTAG HTR  HS  AS HST AST  HF  AF  HC  AC  HY  AY  HR  AR B365H B365D B365A  BWH  BWD  BWA  IWH  IWD  IWA  PSH  PSD  PSA  WHH  WHD WHA  VCH  VCD  VCA Bb1X2 BbMxH BbAvH BbMxD BbAvD BbMxA BbAvA BbOU BbMx>2.5 BbAv>2.5 BbMx<2.5 BbAv<2.5 BbAH BbAHh BbMxAHH BbAvAHH BbMxAHA BbAvAHA PSCH PSCD PSCA
## 2  T1 10/8/2018 Ankaragucu          Galatasaray    1    3   A    1    2   A   3  12   1   4  15   9   2   4   2   2   0   0   4.5     4  1.72  4.6  3.9 1.65 4.35 3.85  1.7 4.62 3.97 1.77    4    4 1.7 4.33    4 1.73    34  4.89  4.42  4.09  3.85  1.82  1.71   30      1.7     1.64     2.29      2.2   18     1    1.69    1.64    2.34    2.23 4.39 3.89 1.83
## 3  T1 11/8/2018 Fenerbahce            Bursaspor    2    1   H    2    1   H   7   6   2   3  12  15   4   4   3   3   0   0  1.36     5   7.5 1.33    5 7.75  1.4 4.65  6.9 1.38 5.13 8.28 1.35  4.8   7  1.4  4.8  7.5    36  1.44  1.37  5.35  4.76   9.5  7.42   31     1.66      1.6     2.35     2.27   18  -1.5    2.21    2.11    1.78    1.72 1.35 5.46 9.11
## 4  T1 11/8/2018   Rizespor            Kasimpasa    2    3   A    1    0   H   8  14   5   6  13  14   3   6   2   1   0   0   2.3   3.4     3  2.4  3.1  2.9 2.35 3.35 2.85 2.44 3.52 2.89  2.3  3.5 2.7 2.38  3.4 2.88    35   2.5  2.33  3.64  3.36  3.05  2.86   30     1.86     1.77     2.11        2   17     0    1.77    1.72    2.19    2.11 2.03 3.85 3.56
## 5  T1 11/8/2018  Sivasspor           Alanyaspor    1    0   H    0    0   D  12   9   5   3  13  16   4   7   1   3   0   0  2.25  3.25   3.2 2.15  3.4  3.1 2.15  3.5 3.05 2.18 3.77 3.17  2.1  3.7 2.9  2.2  3.5 3.13    35   2.3  2.18  3.77  3.46   3.3  3.08   32      1.8     1.73     2.17     2.06   18 -0.25    1.98    1.91    1.97    1.91 1.95 3.74 3.95
## 6  T1 12/8/2018   Besiktas Akhisar Belediyespor    2    1   H    2    0   H  13   8   4   4  17  16  12   3   3   6   0   0   1.4     5   6.5 1.33 5.25  7.5 1.43  4.5  6.3 1.42  4.9  7.5  1.4 4.75   6  1.4 4.75    7    36  1.47  1.41  5.25  4.62   8.7  6.69   31     1.64     1.57     2.38     2.32   18    -1     1.7    1.66     2.3     2.2 1.44 4.85 7.49
  • We have new variables such as HS = Home Team Shots, AST = Away Team Shots on Target, HF = Home Team Fouls Committed and HY = Home Team Yellow Cards, along with odds from new betting companies.

2.2 Organizing Our Data

  • We will organize our data in different ways for each of our analyses.
  • We will row-bind all our CSV files, adding a year column that tells us which season each row belongs to.
  • We will define a column naming standard for all columns to increase readability.
  • We will replace N/A values with 0s. In some analyses, however, we will also filter out rows with zero values.
  • We will tidy up the date column. During our very first examination, we came across some date format anomalies.
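The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the final pipeline: the two small data frames stand in for the real CSV files (rows copied from the outputs above), bind_seasons and parse_match_date are hypothetical helper names, and the only date anomaly handled is the mix of two-digit and four-digit years visible in the raw files.

```r
# Small samples standing in for two season files
s9495 <- data.frame(Date = c("13/08/94", "14/08/94"),
                    HomeTeam = c("Zeytinburnu", "Bursaspor"),
                    AwayTeam = c("Fenerbahce", "P. Ofisi"),
                    FTHG = c(1, 5), FTAG = c(4, 2),
                    stringsAsFactors = FALSE)
s0405 <- data.frame(Date = c("6/8/2004", "7/8/2004"),
                    HomeTeam = c("Rizespor", "Denizlispor"),
                    AwayTeam = c("Fenerbahce", "Gaziantepspor"),
                    FTHG = c(2, 2), FTAG = c(2, 1),
                    HTHG = c(0, 1), HTAG = c(2, 0),
                    stringsAsFactors = FALSE)

# Row-bind seasons with different column sets, tagging each row with its season
bind_seasons <- function(season_list) {
  all_cols <- unique(unlist(lapply(season_list, names)))
  filled <- lapply(names(season_list), function(s) {
    df <- season_list[[s]]
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA  # pad missing columns
    df$season <- s
    df[c("season", all_cols)]
  })
  do.call(rbind, filled)
}

# Parse dd/mm/yy and dd/mm/yyyy strings into Date values
parse_match_date <- function(x) {
  two_digit <- grepl("/[0-9]{2}$", x)
  out <- rep(as.Date(NA), length(x))
  out[two_digit]  <- as.Date(x[two_digit],  format = "%d/%m/%y")
  out[!two_digit] <- as.Date(x[!two_digit], format = "%d/%m/%Y")
  out
}

all_matches <- bind_seasons(list("1994-1995" = s9495, "2004-2005" = s0405))
all_matches$Date <- parse_match_date(all_matches$Date)

# Replace NA with 0 in numeric columns only (an assumption about the scope)
num_cols <- vapply(all_matches, is.numeric, logical(1))
all_matches[num_cols] <- lapply(all_matches[num_cols], function(v) {
  v[is.na(v)] <- 0
  v
})

print(all_matches)
```

Note that filling HTHG and HTAG with 0 for seasons that never recorded them makes a missing value look like a goalless half, which is why some analyses would need to filter those rows out.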

2.3 Data Validation

  • Since we have all match results, we are going to create half-season and end-of-season league tables. Then we are going to compare these tables with the Turkish Football Federation (TFF) archives. We will use the rvest package to scrape the data from the TFF page. If we encounter any data anomalies, we will tidy up that data.
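A league table can be built from match results roughly as sketched below. The sketch assumes 3 points for a win and 1 for a draw, and it breaks ties by goal difference and then goals scored, which is a simplification and may differ from the official TFF rules. league_table is a hypothetical helper name, and the five matches are copied from the 94-95 output above.

```r
results <- data.frame(
  HomeTeam = c("Zeytinburnu", "Ad. Demirspor", "Ankaragucu", "Bursaspor", "Denizlispor"),
  AwayTeam = c("Fenerbahce", "Antalya", "Gaziantepspor", "P. Ofisi", "Besiktas"),
  FTHG = c(1, 1, 1, 5, 1),
  FTAG = c(4, 0, 0, 2, 3),
  FTR  = c("A", "H", "H", "H", "A"),
  stringsAsFactors = FALSE
)

league_table <- function(m) {
  # One row per team per match: goals for, goals against, points earned
  home <- data.frame(team = m$HomeTeam, gf = m$FTHG, ga = m$FTAG,
                     pts = ifelse(m$FTR == "H", 3, ifelse(m$FTR == "D", 1, 0)),
                     stringsAsFactors = FALSE)
  away <- data.frame(team = m$AwayTeam, gf = m$FTAG, ga = m$FTHG,
                     pts = ifelse(m$FTR == "A", 3, ifelse(m$FTR == "D", 1, 0)),
                     stringsAsFactors = FALSE)
  long <- rbind(home, away)
  long$played <- 1

  # Sum everything per team
  tab <- aggregate(cbind(played, gf, ga, pts) ~ team, data = long, FUN = sum)
  tab$gd <- tab$gf - tab$ga

  # Assumed ordering: points, then goal difference, then goals scored
  tab <- tab[order(-tab$pts, -tab$gd, -tab$gf), ]
  rownames(tab) <- NULL
  tab
}

standings <- league_table(results)
print(standings)
```

Filtering the matches by date or week before calling the function would give the half-season table in the same way.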

2.4 Analysis Objectives

An almost unlimited number of analyses can be run on this data. Our analysis objectives are listed below.

  • Analyzing the correlation between match points and total goals scored across seasons. Is there a trend?
  • Analyzing cumulative team success over seasons in terms of different variables. Which team is the most successful?
  • Finding the correlation between betting odds and match results. We will combine the data of different betting companies to find the mean value of the odds. Which odds occur at which rate?
  • Visualizing data to form dynamic league standings covering 24 years, at weekly precision for each year. We will develop a Shiny application for this task.
  • Analyzing the secondary variables against the match results. These variables include shot statistics, free-kick statistics and booking points.
  • Creating geographical charts showing which cities have more teams in the Super League. We are going to find team/city geolocation data for this task.
  • Analyzing which team is the most profitable according to betting odds and match results.
  • Analyzing the relation between total points and championship count.
  • Analyzing total goals scored by month and year.
  • And many more.
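To make the odds-related objectives more concrete, here is a minimal sketch that averages the decimal odds of two betting companies, converts the averages into implied probabilities and computes the profit of a flat 1-unit bet on the home team. The rows are copied from the 2004-2005 file. The normalization step (dividing by the row sum to remove the bookmaker margin) and the flat-stake rule are assumptions made for illustration, not a method fixed by the project.

```r
odds <- data.frame(
  FTR   = c("D", "H", "H"),
  B365H = c(6, 2.2, 1.5),  B365D = c(3.4, 3.25, 3.4), B365A = c(1.5, 2.75, 6),
  BWH   = c(6, 2.1, 1.45), BWD   = c(3.6, 3.3, 3.7),  BWA   = c(1.5, 3, 6.5),
  stringsAsFactors = FALSE
)

# Mean decimal odds per outcome across the two companies
mean_odds <- data.frame(
  H = rowMeans(odds[c("B365H", "BWH")], na.rm = TRUE),
  D = rowMeans(odds[c("B365D", "BWD")], na.rm = TRUE),
  A = rowMeans(odds[c("B365A", "BWA")], na.rm = TRUE)
)

# Implied probabilities: 1 / odds, rescaled so each row sums to 1
raw_prob <- 1 / mean_odds
implied_prob <- raw_prob / rowSums(raw_prob)

# Profit of betting 1 unit on the home team in every match
home_profit <- ifelse(odds$FTR == "H", mean_odds$H - 1, -1)

print(round(implied_prob, 3))
print(sum(home_profit))
```

Grouping home_profit by team over all seasons would be one way to approach the question of which team is more profitable.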

2.5 Visualization Objectives

  • We are going to use dplyr, tidyverse, ggplot2 and other libraries to visualize our analyses.
  • We are going to create box plots, pie charts, diagrams, bar charts and other visualization components.
  • We are going to develop one Shiny application.

3. Conclusion And References

We gathered our data from the football-data.co.uk site.

You can see the column definitions at this link.

You can explore our raw data below:
Season 18-19       Season 17-18       Season 16-17       Season 15-16       Season 14-15       Season 13-14
Season 12-13       Season 11-12       Season 10-11       Season 09-10       Season 08-09       Season 07-08
Season 06-07       Season 05-06       Season 04-05       Season 03-04       Season 02-03       Season 01-02
Season 00-01       Season 99-00       Season 98-99       Season 97-98       Season 96-97       Season 95-96
Season 94-95