Brief Description of Data Set & Data Acquisition Process

Turkey Electronic Fund Trading Platform (TEFAS) is the centralized electronic transaction environment for mutual and pension funds in Turkey. TEFAS is run by Takasbank which provides the central clearing, settlement and custody services for all types of capital markets and exchanges in Turkey. TEFAS also provides extensive 5 years of historical data for traded funds.In November 2020, total trading volume of the platform was 14.64 Billion TRY (http://fundturkey.com.tr/IstatistikiRaporlar/ToplamIslemHacmi.aspx).

At the initial import we have gathered 5 years data of 693 funds founded by 51 companies, 901893 rows and 35 columns.

Source Data

We have downloaded historical data of funds from https://www.tefas.gov.tr/TarihselVeriler.aspx page.

Pension funds and mutual funds are downloaded separately.

Due to the restrictions of the platform and sizes of data files, we had to query in 6 files.

Files we in xlsx format, each have 2 sheets: * Genel Bilgiler: date, unique fund code, fund names, daily price, total value, shares and investors data. Total 7 columns. * Portföy Dağılımı: date, unique fund code, fund names and daily portfolio allocation of each fund. These are daily percentages of 26 asset types. Total 29 columns.

We spoted significant data coverage differences between English and Turkish versions of data files. Therefore we used Turkish versions and then translated them to English during process. Fund names were the same, Turkish in english version. We did not change them to keep the integrityof data.

These data files unfortunately do not include fund category which is crucial for any analysis. We downloaded price comprison data from TEFAS web site which helped us match the fund categories and funds codes.

We downloaded 6 x 2 +1 = 13 xlsx files from TEFAs website on 16 th of November. Total size is 112MB.

We imported these files and combined them as a single dataset to be ready for analysis. Combined RDS file size reached 31MB

You can review our raw data, exports and import steps at out google drive directory. Github has 25MB data upload limit therefore we can not share them from github: https://drive.google.com/drive/folders/18wJX4YufJGfuj9KWUV3YfNmxSYmhJ8WN?usp=sharing

Data Import Process Summary

Our steps are: 1) Import price data (“genel bilgiler” sheet) of each mutual fund file to data frames 2) Combine each data frame in single frame 3) Import price data (“genel bilgiler” sheet) of each pension fund file to data frames 4) Combine each pension fund price data frame in a single frame 5) combine pension and mutual fund price data frames in a single data frame 6) Repeat above steps for portfolio data (“Porföy Dağılımı” sheet) 7) Combine portfolio data and price data sheets 8) Import fund comparison file as data frame and add fund category field to our main data set joining by fund codes. 9) Export imported combined data file as RDS for eash analysis in nex steps.

Data Import Steps in Details

We will first import price data which is at the “Genel Bilgiler” sheet of the excel files.

#import price data for each individual file

df_mk_price_2015 <- read_excel("MK-2015.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 9,494
## Columns: 7
## $ TARİH                    <dttm> 2015-12-31, 2015-12-31, 2015-12-31, 2015-...
## $ `FON KODU`               <chr> "AAK", "AAL", "AAS", "AAV", "ABU", "ACD", ...
## $ `FON ADI`                <chr> "ATA PORTFÖY ÇOKLU VARLIK DEĞİŞKEN FON", "...
## $ FİYAT                    <dbl> 17.950164, 0.337465, 0.012932, 0.013275, 0...
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 123328, 140641707, 3016220757, 2251610072,...
## $ `KİŞİ SAYISI`            <dbl> 276, 2826, 0, 9, 0, 23, 42, 31170, 630, 29...
## $ `FON TOPLAM DEĞER`       <dbl> 2213758, 47461708, 39004667, 29890591, 938...
df_mk_price_2016 <- read_excel("MK-2016.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 69,043
## Columns: 7
## $ TARİH                    <dttm> 2016-12-30, 2016-12-30, 2016-12-30, 2016-...
## $ `FON KODU`               <chr> "AAK", "AAL", "AAS", "AAV", "ABU", "ACC", ...
## $ `FON ADI`                <chr> "ATA PORTFÖY ÇOKLU VARLIK DEĞİŞKEN FON", "...
## $ FİYAT                    <dbl> 19.748318, 0.367402, 0.015010, 0.015470, 0...
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 140460, 130787262, 4476035063, 2194876119,...
## $ `KİŞİ SAYISI`            <dbl> 276, 2826, 0, 9, 0, 22, 23, 42, 18, 7, 0, ...
## $ `FON TOPLAM DEĞER`       <dbl> 2773849, 48051546, 67184138, 33954333, 148...
df_mk_price_2017 <- read_excel("MK-2017.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 77,017
## Columns: 7
## $ TARİH                    <dttm> 2017-12-29, 2017-12-29, 2017-12-29, 2017-...
## $ `FON KODU`               <chr> "AAK", "AAL", "AAS", "AAV", "ABU", "ACC", ...
## $ `FON ADI`                <chr> "ATA PORTFÖY ÇOKLU VARLIK DEĞİŞKEN FON", "...
## $ FİYAT                    <dbl> 23.928582, 0.408481, 0.016698, 0.023984, 0...
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 966558, 222093869, 2294227597, 1189938819,...
## $ `KİŞİ SAYISI`            <dbl> 276, 2826, 0, 9, 0, 22, 23, 42, 18, 7, 0, ...
## $ `FON TOPLAM DEĞER`       <dbl> 23128362, 90721082, 38308763, 28539534, 13...
df_mk_price_2018 <- read_excel("MK-2018.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 88,122
## Columns: 7
## $ TARİH                    <dttm> 2018-12-31, 2018-12-31, 2018-12-31, 2018-...
## $ `FON KODU`               <chr> "AAK", "AAL", "AAS", "AAV", "ABU", "ACC", ...
## $ `FON ADI`                <chr> "ATA PORTFÖY ÇOKLU VARLIK DEĞİŞKEN FON", "...
## $ FİYAT                    <dbl> 27.885721, 0.477540, 0.018717, 0.021038, 0...
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 691191, 149271091, 97993059, 2300138319, 7...
## $ `KİŞİ SAYISI`            <dbl> 276, 2826, 0, 9, 0, 22, 23, 42, 4, 18, 0, ...
## $ `FON TOPLAM DEĞER`       <dbl> 19274359, 71282888, 1834153, 48390509, 120...
df_mk_price_2019 <- read_excel("MK-2019.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 104,344
## Columns: 7
## $ TARİH                    <dttm> 2019-12-31, 2019-12-31, 2019-12-31, 2019-...
## $ `FON KODU`               <chr> "AAK", "AAL", "AAS", "AAV", "ABU", "ACC", ...
## $ `FON ADI`                <chr> "ATA PORTFÖY ÇOKLU VARLIK DEĞİŞKEN FON", "...
## $ FİYAT                    <dbl> 35.659666, 0.575133, 0.025381, 0.031288, 0...
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 478336, 194111447, 129719961, 2868807200, ...
## $ `KİŞİ SAYISI`            <dbl> 301, 3020, 7, 11, 117, 408, 30, 49, 24, 18...
## $ `FON TOPLAM DEĞER`       <dbl> 17057302, 111639957, 3266979, 89759544, 11...
df_mk_price_2020 <- read_excel("MK-2020.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 117,828
## Columns: 7
## $ TARİH                    <dttm> 2020-11-16, 2020-11-16, 2020-11-16, 2020-...
## $ `FON KODU`               <chr> "AAK", "AAL", "AAS", "AAV", "ABU", "ACC", ...
## $ `FON ADI`                <chr> "ATA PORTFÖY ÇOKLU VARLIK DEĞİŞKEN FON", "...
## $ FİYAT                    <dbl> 41.390035, 0.621141, 0.031061, 0.039120, 0...
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 1897355, 316664117, 3011805055, 4510901300...
## $ `KİŞİ SAYISI`            <dbl> 432, 3249, 97, 34, 106, 214, 65, 59, 14, 3...
## $ `FON TOPLAM DEĞER`       <dbl> 78531591, 196693215, 93548661, 176465576, ...
# Append data frames and add a column whether the data is for mutual or pension fund

df_mk_price_all <- bind_rows(df_mk_price_2015, df_mk_price_2016, df_mk_price_2017, df_mk_price_2018, df_mk_price_2019, df_mk_price_2020) %>% mutate(fund_type="mutual")

We have downloaded historical data of mutual funds from https://www.tefas.gov.tr/TarihselVeriler.aspx Due to the restriction we had to download in 6 files. We will first import price data which is at the “Genel Bilgiler” sheet of the excel files.

#import price data for each individual file


df_bes_price_2015 <- read_excel("BES-2015.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 8,892
## Columns: 7
## $ TARİH                    <dttm> 2015-12-31, 2015-12-31, 2015-12-31, 2015-...
## $ `FON KODU`               <chr> "ABE", "AEA", "AEB", "AEC", "AEE", "AEG", ...
## $ `FON ADI`                <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B.R.I.C.ÜLKEL...
## $ FİYAT                    <dbl> 0.012212, 0.011026, 0.033588, 0.011194, 0....
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 3876175452, 10761804554, 2199655566, 12782...
## $ `KİŞİ SAYISI`            <dbl> 25887, 97857, 13269, 101, 5030, 20268, 595...
## $ `FON TOPLAM DEĞER`       <dbl> 47336763, 118662196, 73881771, 14309129, 5...
df_bes_price_2016 <- read_excel("BES-2016.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 57,978
## Columns: 7
## $ TARİH                    <dttm> 2016-12-30, 2016-12-30, 2016-12-30, 2016-...
## $ `FON KODU`               <chr> "ABE", "AEA", "AEB", "AEC", "AEE", "AEG", ...
## $ `FON ADI`                <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B.R.I.C.ÜLKEL...
## $ FİYAT                    <dbl> 0.018869, 0.013991, 0.037130, 0.012481, 0....
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 5486436554, 21902809585, 2137350621, 11214...
## $ `KİŞİ SAYISI`            <dbl> 28354, 139172, 13513, 121, 10477, 19967, 6...
## $ `FON TOPLAM DEĞER`       <dbl> 103525695, 306452272, 79360891, 13996462, ...
df_bes_price_2017 <- read_excel("BES-2017.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 81,489
## Columns: 7
## $ TARİH                    <dttm> 2017-12-29, 2017-12-29, 2017-12-29, 2017-...
## $ `FON KODU`               <chr> "AAJ", "ABE", "ACV", "AEA", "AEB", "AEC", ...
## $ `FON ADI`                <chr> "AVİVASA EMEKLİLİK VE HAYAT A.Ş. OKS STAND...
## $ FİYAT                    <dbl> 0.000000, 0.024844, 0.010828, 0.016751, 0....
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 0, 7669852085, 4516776022, 40088209536, 17...
## $ `KİŞİ SAYISI`            <dbl> 0, 41801, 132587, 226458, 15240, 109, 1160...
## $ `FON TOPLAM DEĞER`       <dbl> 0, 190552279, 48908982, 671531877, 1011656...
df_bes_price_2018 <- read_excel("BES-2018.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 99,372
## Columns: 7
## $ TARİH                    <dttm> 2018-12-31, 2018-12-31, 2018-12-31, 2018-...
## $ `FON KODU`               <chr> "AAJ", "ABE", "ACV", "AEA", "AEB", "AEC", ...
## $ `FON ADI`                <chr> "AVİVASA EMEKLİLİK VE HAYAT A.Ş. OKS STAND...
## $ FİYAT                    <dbl> 0.010769, 0.031002, 0.012418, 0.022672, 0....
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 17414928883, 9965661970, 2306946443, 62860...
## $ `KİŞİ SAYISI`            <dbl> 127861, 69655, 103259, 324081, 13978, 81, ...
## $ `FON TOPLAM DEĞER`       <dbl> 187539967, 308959287, 28646529, 1425149500...
df_bes_price_2019 <- read_excel("BES-2019.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 99,349
## Columns: 7
## $ TARİH                    <dttm> 2019-12-31, 2019-12-31, 2019-12-31, 2019-...
## $ `FON KODU`               <chr> "AAJ", "ABE", "ACV", "AEA", "AEB", "AEC", ...
## $ `FON ADI`                <chr> "AVİVASA EMEKLİLİK VE HAYAT A.Ş. OKS STAND...
## $ FİYAT                    <dbl> 0.013131, 0.043423, 0.014780, 0.030045, 0....
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 30025100624, 11257973032, 2220544587, 9327...
## $ `KİŞİ SAYISI`            <dbl> 264467, 82051, 111598, 407749, 13451, 76, ...
## $ `FON TOPLAM DEĞER`       <dbl> 394249748, 488855346, 32819762, 2802411409...
df_bes_price_2020 <- read_excel("BES-2020.xlsx", sheet = "Genel Bilgiler", col_types = c("date", "text", "text", "numeric", "numeric", "numeric", "numeric")) %>% glimpse()
## Rows: 88,476
## Columns: 7
## $ TARİH                    <dttm> 2020-11-16, 2020-11-16, 2020-11-16, 2020-...
## $ `FON KODU`               <chr> "AAJ", "ABE", "ACV", "AEA", "AEB", "AEC", ...
## $ `FON ADI`                <chr> "AVİVASA EMEKLİLİK VE HAYAT A.Ş. OKS STAND...
## $ FİYAT                    <dbl> 0.014412, 0.046772, 0.016043, 0.047959, 0....
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 35933708934, 12476111398, 2269093580, 1404...
## $ `KİŞİ SAYISI`            <dbl> 366643, 107295, 103359, 526168, 13023, 72,...
## $ `FON TOPLAM DEĞER`       <dbl> 517888264, 583533145, 36404063, 6734531428...
# Append data frames and add a column whether the data is for mutual or pension fund

df_bes_price_all <- bind_rows(df_bes_price_2015, df_bes_price_2016, df_bes_price_2017, df_bes_price_2018, df_bes_price_2019, df_bes_price_2020) %>% mutate(fund_type="pension")

Now we will combine pension and mutual fund historical price data at a single data frame

df_price_all <- bind_rows(df_bes_price_all, df_mk_price_all) %>%glimpse() 
## Rows: 901,404
## Columns: 8
## $ TARİH                    <dttm> 2015-12-31, 2015-12-31, 2015-12-31, 2015-...
## $ `FON KODU`               <chr> "ABE", "AEA", "AEB", "AEC", "AEE", "AEG", ...
## $ `FON ADI`                <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B.R.I.C.ÜLKEL...
## $ FİYAT                    <dbl> 0.012212, 0.011026, 0.033588, 0.011194, 0....
## $ `TEDAVÜLDEKİ PAY SAYISI` <dbl> 3876175452, 10761804554, 2199655566, 12782...
## $ `KİŞİ SAYISI`            <dbl> 25887, 97857, 13269, 101, 5030, 20268, 595...
## $ `FON TOPLAM DEĞER`       <dbl> 47336763, 118662196, 73881771, 14309129, 5...
## $ fund_type                <chr> "pension", "pension", "pension", "pension"...

We will import fund portfolio allocation data for mutual funds from the “Portföy Dağılımı” sheet, and combine them just like we did for price data.

df_mk_portfolio_2015 <- read_excel("MK-2015.xlsx", sheet = "Portföy Dağılımı")
df_mk_portfolio_2016 <- read_excel("MK-2016.xlsx", sheet = "Portföy Dağılımı")
df_mk_portfolio_2017 <- read_excel("MK-2017.xlsx", sheet = "Portföy Dağılımı")
df_mk_portfolio_2018 <- read_excel("MK-2018.xlsx", sheet = "Portföy Dağılımı")
df_mk_portfolio_2019 <- read_excel("MK-2019.xlsx", sheet = "Portföy Dağılımı")
df_mk_portfolio_2020 <- read_excel("MK-2020.xlsx", sheet = "Portföy Dağılımı")

df_mk_portfolio_all <- bind_rows(df_mk_portfolio_2015, df_mk_portfolio_2016, df_mk_portfolio_2017, df_mk_portfolio_2018, df_mk_portfolio_2019, df_mk_portfolio_2020) %>% mutate(fund_type="mutual") %>%glimpse()
## Rows: 358,240
## Columns: 30
## $ Tarih                                 <dttm> 2015-12-31, 2015-12-31, 2015...
## $ `Fon Kodu`                            <chr> "AAV", "ACD", "ACK", "ADE", "...
## $ `Fon Adı`                             <chr> "ATA PORTFÖY İKİNCİ HİSSE SEN...
## $ `Banka Bonosu (%)`                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Diğer (%)`                           <dbl> 1.70, 3.12, 2.09, 10.19, 0.00...
## $ `Döviz Ödemeli Bono (%)`              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Devlet Tahvili (%)`                  <dbl> 0.00, 21.10, 19.35, 39.00, 1....
## $ `Dövize Ödemeli Tahvil (%)`           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Eurobonds (%)`                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Finansman Bonosu (%)`                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Fon Katılma Belgesi (%)`             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Gayrı Menkul Sertifikası (%)`        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Hazine Bonosu (%)`                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Hisse Senedi (%)`                    <dbl> 98.30, 46.03, 77.65, 0.11, 98...
## $ `Kamu Dış Borçlanma Araçları (%)`     <dbl> 0.00, 0.00, 0.00, 0.00, 0.00,...
## $ `Katılım Hesabı (%)`                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Kamu Kira Sertifikaları (%)`         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Kıymetli Madenler (%)`               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Özel Sektör Kira Sertifikaları (%)`  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Özel Sektör Tahvili (%)`             <dbl> 0.00, 14.22, 0.00, 29.86, 0.0...
## $ `Repo (%)`                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Türev Araçları (%)`                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `TPP (%)`                             <dbl> 0.00, 15.53, 0.91, 0.38, 2.16...
## $ `Ters-Repo (%)`                       <dbl> 0.00, 0.00, 0.00, 0.00, 0.00,...
## $ `Varlığa Dayalı Menkul Kıymetler (%)` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Vadeli Mevduat (%)`                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Yabancı Borçlanma Aracı (%)`         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Yabancı Hisse Senedi (%)`            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Yabancı Menkul Kıymet (%)`           <dbl> 0.00, 0.00, 0.00, 0.00, 0.00,...
## $ fund_type                             <chr> "mutual", "mutual", "mutual",...

Now we will import pension fund portfolio allocation data for pension funds from the “Portföy Dağılımı” sheet, and combine them

df_bes_portfolio_2015 <- read_excel("BES-2015.xlsx", sheet = "Portföy Dağılımı")
df_bes_portfolio_2016 <- read_excel("BES-2016.xlsx", sheet = "Portföy Dağılımı")
df_bes_portfolio_2017 <- read_excel("BES-2017.xlsx", sheet = "Portföy Dağılımı")
df_bes_portfolio_2018 <- read_excel("BES-2018.xlsx", sheet = "Portföy Dağılımı")
df_bes_portfolio_2019 <- read_excel("BES-2019.xlsx", sheet = "Portföy Dağılımı")
df_bes_portfolio_2020 <- read_excel("BES-2020.xlsx", sheet = "Portföy Dağılımı")

df_bes_portfolio_all <- bind_rows(df_bes_portfolio_2015, df_bes_portfolio_2016, df_bes_portfolio_2017, df_bes_portfolio_2018, df_bes_portfolio_2019, df_bes_portfolio_2020) %>% mutate(fund_type="pension") %>% glimpse()
## Rows: 415,467
## Columns: 30
## $ Tarih                                 <dttm> 2015-12-31, 2015-12-31, 2015...
## $ `Fon Kodu`                            <chr> "ABE", "AEA", "AEB", "AEC", "...
## $ `Fon Adı`                             <chr> "ANADOLU HAYAT EMEKLILIK A.Ş....
## $ `Banka Bonosu (%)`                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Diğer (%)`                           <dbl> 3.02, 94.31, 1.90, 0.39, 1.79...
## $ `Döviz Ödemeli Bono (%)`              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Devlet Tahvili (%)`                  <dbl> 0.66, 0.00, 0.84, 60.90, 66.8...
## $ `Dövize Ödemeli Tahvil (%)`           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Eurobonds (%)`                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Finansman Bonosu (%)`                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Fon Katılma Belgesi (%)`             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Gayrı Menkul Sertifikası (%)`        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Hazine Bonosu (%)`                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Hisse Senedi (%)`                    <dbl> 0.00, 0.88, 82.55, 4.76, 29.4...
## $ `Kamu Dış Borçlanma Araçları (%)`     <dbl> 0.00, 0.00, 0.00, 11.09, 0.00...
## $ `Katılım Hesabı (%)`                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Kamu Kira Sertifikaları (%)`         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Kıymetli Madenler (%)`               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Özel Sektör Kira Sertifikaları (%)`  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Özel Sektör Tahvili (%)`             <dbl> 0.00, 4.81, 0.00, 22.78, 0.00...
## $ `Repo (%)`                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Türev Araçları (%)`                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `TPP (%)`                             <dbl> 0.00, 0.00, 6.61, 0.08, 0.00,...
## $ `Ters-Repo (%)`                       <dbl> 0.40, 0.00, 8.10, 0.00, 1.86,...
## $ `Varlığa Dayalı Menkul Kıymetler (%)` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Vadeli Mevduat (%)`                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Yabancı Borçlanma Aracı (%)`         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Yabancı Hisse Senedi (%)`            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Yabancı Menkul Kıymet (%)`           <dbl> 95.92, 0.00, 0.00, 0.00, 0.00...
## $ fund_type                             <chr> "pension", "pension", "pensio...

combine pension and mutual fund portfolio data

df_portfolio_all <- bind_rows(df_bes_portfolio_all, df_mk_portfolio_all) %>% select(-"Fon Adı")%>% rename(TARİH=Tarih, "FON KODU"="Fon Kodu")%>% glimpse()
## Rows: 773,707
## Columns: 29
## $ TARİH                                 <dttm> 2015-12-31, 2015-12-31, 2015...
## $ `FON KODU`                            <chr> "ABE", "AEA", "AEB", "AEC", "...
## $ `Banka Bonosu (%)`                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Diğer (%)`                           <dbl> 3.02, 94.31, 1.90, 0.39, 1.79...
## $ `Döviz Ödemeli Bono (%)`              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Devlet Tahvili (%)`                  <dbl> 0.66, 0.00, 0.84, 60.90, 66.8...
## $ `Dövize Ödemeli Tahvil (%)`           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Eurobonds (%)`                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Finansman Bonosu (%)`                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Fon Katılma Belgesi (%)`             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Gayrı Menkul Sertifikası (%)`        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Hazine Bonosu (%)`                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Hisse Senedi (%)`                    <dbl> 0.00, 0.88, 82.55, 4.76, 29.4...
## $ `Kamu Dış Borçlanma Araçları (%)`     <dbl> 0.00, 0.00, 0.00, 11.09, 0.00...
## $ `Katılım Hesabı (%)`                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Kamu Kira Sertifikaları (%)`         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Kıymetli Madenler (%)`               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Özel Sektör Kira Sertifikaları (%)`  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Özel Sektör Tahvili (%)`             <dbl> 0.00, 4.81, 0.00, 22.78, 0.00...
## $ `Repo (%)`                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Türev Araçları (%)`                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `TPP (%)`                             <dbl> 0.00, 0.00, 6.61, 0.08, 0.00,...
## $ `Ters-Repo (%)`                       <dbl> 0.40, 0.00, 8.10, 0.00, 1.86,...
## $ `Varlığa Dayalı Menkul Kıymetler (%)` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Vadeli Mevduat (%)`                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Yabancı Borçlanma Aracı (%)`         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Yabancı Hisse Senedi (%)`            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Yabancı Menkul Kıymet (%)`           <dbl> 95.92, 0.00, 0.00, 0.00, 0.00...
## $ fund_type                             <chr> "pension", "pension", "pensio...

We can merge historical portfolio and historical price data in a single data frame, during this process we have simplified and streamlined variable names.

df_all = full_join (df_price_all, df_portfolio_all)%>% rename(date=TARİH, 
                     code=`FON KODU`,
                     name=`FON ADI`,
                     price=FİYAT,
                     shares=`TEDAVÜLDEKİ PAY SAYISI`,
                     people=`KİŞİ SAYISI`,
                     total_value=`FON TOPLAM DEĞER`,
                     p_Bank_Bills=`Banka Bonosu (%)`,
                     p_Other=`Diğer (%)`,
                     p_FX_Payable_Bills=`Döviz Ödemeli Bono (%)`,
                     p_Government_Bond=`Devlet Tahvili (%)`,
                     p_Foreign_Currency_Bills=`Dövize Ödemeli Tahvil (%)`,
                     p_Eurobonds=`Eurobonds (%)`,
                     p_Commercial_Paper=`Finansman Bonosu (%)`,
                     p_Fund_Participation_Certificate=`Fon Katılma Belgesi (%)`,
                     p_Real_Estate_Certificate=`Gayrı Menkul Sertifikası (%)`,
                     p_Treasury_Bill=`Hazine Bonosu (%)`,
                     p_Stock=`Hisse Senedi (%)`,
                     p_Government_Bonds_and_Bills_FX=`Kamu Dış Borçlanma Araçları (%)`,
                     p_Participation_Account=`Katılım Hesabı (%)`,
                     p_Government_Lease_Certificates=`Kamu Kira Sertifikaları (%)`,
                     p_Precious_Metals=`Kıymetli Madenler (%)`,
                     p_Private_Sector_Lease_Certificates=`Özel Sektör Kira Sertifikaları (%)`,
                     p_Private_Sector_Bond=`Özel Sektör Tahvili (%)`,
                     p_Repo=`Repo (%)`,
                     p_Derivatives=`Türev Araçları (%)`,
                     p_TMM=`TPP (%)`,
                     p_Reverse_Repo=`Ters-Repo (%)`,
                     p_Asset_Backed_Securities=`Varlığa Dayalı Menkul Kıymetler (%)`,
                     p_Term_Deposit=`Vadeli Mevduat (%)`,
                     p_Foreign_Debt_Instruments=`Yabancı Borçlanma Aracı (%)`,
                     p_Foreign_Equity=`Yabancı Hisse Senedi (%)`,
                     p_Foreign_Securities=`Yabancı Menkul Kıymet (%)`)%>%glimpse()
## Joining, by = c("TARİH", "FON KODU", "fund_type")
## Rows: 901,894
## Columns: 34
## $ date                                <dttm> 2015-12-31, 2015-12-31, 2015-1...
## $ code                                <chr> "ABE", "AEA", "AEB", "AEC", "AE...
## $ name                                <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B....
## $ price                               <dbl> 0.012212, 0.011026, 0.033588, 0...
## $ shares                              <dbl> 3876175452, 10761804554, 219965...
## $ people                              <dbl> 25887, 97857, 13269, 101, 5030,...
## $ total_value                         <dbl> 47336763, 118662196, 73881771, ...
## $ fund_type                           <chr> "pension", "pension", "pension"...
## $ p_Bank_Bills                        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Other                             <dbl> 3.02, 94.31, 1.90, 0.39, 1.79, ...
## $ p_FX_Payable_Bills                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Bond                   <dbl> 0.66, 0.00, 0.84, 60.90, 66.88,...
## $ p_Foreign_Currency_Bills            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Eurobonds                         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Commercial_Paper                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Fund_Participation_Certificate    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Real_Estate_Certificate           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Treasury_Bill                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Stock                             <dbl> 0.00, 0.88, 82.55, 4.76, 29.47,...
## $ p_Government_Bonds_and_Bills_FX     <dbl> 0.00, 0.00, 0.00, 11.09, 0.00, ...
## $ p_Participation_Account             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Lease_Certificates     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Precious_Metals                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Bond               <dbl> 0.00, 4.81, 0.00, 22.78, 0.00, ...
## $ p_Repo                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Derivatives                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_TMM                               <dbl> 0.00, 0.00, 6.61, 0.08, 0.00, 7...
## $ p_Reverse_Repo                      <dbl> 0.40, 0.00, 8.10, 0.00, 1.86, 2...
## $ p_Asset_Backed_Securities           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Term_Deposit                      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Debt_Instruments          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Equity                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Securities                <dbl> 95.92, 0.00, 0.00, 0.00, 0.00, ...

Unfortunaley fund category data is not included at this file. TEFAS shares this data at another excel file import, fund comparison report. We downloaded this report from the English version of the site to get the official English translations of the categories: http://fundturkey.com.tr/FonKarsilastirma.aspx Pension and mutual funds have separate imports.

Let’s import fund comparison file to combine " fund category" data to our data set

df_BES_compare <- read_excel("BES-compare-EN.xlsx", sheet = "Return-Based")%>%rename(code=`Fund Code`, category=`Fund Type`) %>% select(code,category)%>%glimpse()
## Rows: 404
## Columns: 2
## $ code     <chr> "KEH", "AGH", "KEZ", "VYB", "HHM", "FEF", "AEL", "HEA", "Z...
## $ category <chr> "Stock Fund", "Stock Fund", "Variable Fund", "Stock Fund",...
df_MK_compare <- read_excel("MK-compare-EN.xlsx", sheet = "Return-Based")%>%rename(code=`Fund Code`, category=`Umbrella Fund Type`) %>%select(code,category)%>%glimpse()
## Rows: 592
## Columns: 2
## $ code     <chr> "HDH", "HPF", "IBG", "FYA", "TTE", "DPU", "IAR", "TKF", "M...
## $ category <chr> "Hedge Umbrella Fund", "Hedge Umbrella Fund", "Hedge Umbre...
df_compare_all <- bind_rows(df_BES_compare, df_MK_compare) %>% glimpse()
## Rows: 996
## Columns: 2
## $ code     <chr> "KEH", "AGH", "KEZ", "VYB", "HHM", "FEF", "AEL", "HEA", "Z...
## $ category <chr> "Stock Fund", "Stock Fund", "Variable Fund", "Stock Fund",...

now lets combine them to the main data frame, also change the order of important variables close

df_import=left_join(df_all,df_compare_all)%>%relocate(category, .after=code)%>%relocate(fund_type, .after=code)%>% glimpse()
## Joining, by = "code"
## Rows: 901,894
## Columns: 35
## $ date                                <dttm> 2015-12-31, 2015-12-31, 2015-1...
## $ code                                <chr> "ABE", "AEA", "AEB", "AEC", "AE...
## $ fund_type                           <chr> "pension", "pension", "pension"...
## $ category                            <chr> "Variable Fund", "Gold Fund", "...
## $ name                                <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B....
## $ price                               <dbl> 0.012212, 0.011026, 0.033588, 0...
## $ shares                              <dbl> 3876175452, 10761804554, 219965...
## $ people                              <dbl> 25887, 97857, 13269, 101, 5030,...
## $ total_value                         <dbl> 47336763, 118662196, 73881771, ...
## $ p_Bank_Bills                        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Other                             <dbl> 3.02, 94.31, 1.90, 0.39, 1.79, ...
## $ p_FX_Payable_Bills                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Bond                   <dbl> 0.66, 0.00, 0.84, 60.90, 66.88,...
## $ p_Foreign_Currency_Bills            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Eurobonds                         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Commercial_Paper                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Fund_Participation_Certificate    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Real_Estate_Certificate           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Treasury_Bill                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Stock                             <dbl> 0.00, 0.88, 82.55, 4.76, 29.47,...
## $ p_Government_Bonds_and_Bills_FX     <dbl> 0.00, 0.00, 0.00, 11.09, 0.00, ...
## $ p_Participation_Account             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Lease_Certificates     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Precious_Metals                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Bond               <dbl> 0.00, 4.81, 0.00, 22.78, 0.00, ...
## $ p_Repo                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Derivatives                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_TMM                               <dbl> 0.00, 0.00, 6.61, 0.08, 0.00, 7...
## $ p_Reverse_Repo                      <dbl> 0.40, 0.00, 8.10, 0.00, 1.86, 2...
## $ p_Asset_Backed_Securities           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Term_Deposit                      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Debt_Instruments          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Equity                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Securities                <dbl> 95.92, 0.00, 0.00, 0.00, 0.00, ...

We have completed our first milestone. Our last step is to export imported data not to repeat all those long steps for future work and better team co-operation.

We tested 3 formats: rds, csv, xlsx

saveRDS(df_import, "df_import.rds")
write.csv(df_import, "df_import.csv", fileEncoding = "UTF-8") #specifying encoding is crucial to import Turkish caharacters in .csv files.
write_xlsx(df_import, "df_import.xlsx")#writexl great library utilising C for efficient xlsx uploads

RDS provided faster export and much smaller datafile compared with xlsx and csv. RDS: 31MB xlsx: 123MB csv: 203MB

We will delete bulky xlsx and csv exports. We will keep RDS for future work for efficiency.

You can review our raw data, exports and import steps at out google drive directory. Github has 25MB data upload limit therefore we can not share them from github: https://drive.google.com/drive/folders/18wJX4YufJGfuj9KWUV3YfNmxSYmhJ8WN?usp=sharing