At this stage, we first conducted usual checks for consistency, missing values and freshness of the imported data file thay we produced at the previous Preprocessing Step 1: Data Import & Consolidation. Then, removed obsolete rows from the data set.
We also added new variables which will help us better categorize the fund data at the analysis stage. * Automatic participation funds (Otomatik Katılım), government contribution funds (Devlet Katkısı) and participation funds (katılım fonları) were marked from the fund names. * We extracted founder company names and matched them with the funds as an additiona variable.
As result our initial imported data set total row number decreased from 901,894 to 826,638. total columns increased from 35 to 40. Total unique funds decreased from 993 to 920.
We have uploaded the df_clean.rds file and other material for this milestone to the google drive. Output data file was too big to be uploaded to github.
All data & RDS files can be accessed from this link: https://drive.google.com/drive/folders/1SJPoQqPOXHdLEUReML9zpWw7NaNrcJUY?usp=sharing
Let’s read imported data set. We have 901894 rows and 35 columns at the imported dataset.
df_import = readRDS("df_import.rds") %>% glimpse()
## Rows: 901,894
## Columns: 35
## $ date <dttm> 2015-12-31, 2015-12-31, 2015-1...
## $ code <chr> "ABE", "AEA", "AEB", "AEC", "AE...
## $ fund_type <chr> "pension", "pension", "pension"...
## $ category <chr> "Variable Fund", "Gold Fund", "...
## $ name <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B....
## $ price <dbl> 0.012212, 0.011026, 0.033588, 0...
## $ shares <dbl> 3876175452, 10761804554, 219965...
## $ people <dbl> 25887, 97857, 13269, 101, 5030,...
## $ total_value <dbl> 47336763, 118662196, 73881771, ...
## $ p_Bank_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Other <dbl> 3.02, 94.31, 1.90, 0.39, 1.79, ...
## $ p_FX_Payable_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Bond <dbl> 0.66, 0.00, 0.84, 60.90, 66.88,...
## $ p_Foreign_Currency_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Eurobonds <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Commercial_Paper <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Fund_Participation_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Real_Estate_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Treasury_Bill <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Stock <dbl> 0.00, 0.88, 82.55, 4.76, 29.47,...
## $ p_Government_Bonds_and_Bills_FX <dbl> 0.00, 0.00, 0.00, 11.09, 0.00, ...
## $ p_Participation_Account <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Precious_Metals <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Bond <dbl> 0.00, 4.81, 0.00, 22.78, 0.00, ...
## $ p_Repo <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Derivatives <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_TMM <dbl> 0.00, 0.00, 6.61, 0.08, 0.00, 7...
## $ p_Reverse_Repo <dbl> 0.40, 0.00, 8.10, 0.00, 1.86, 2...
## $ p_Asset_Backed_Securities <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Term_Deposit <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Debt_Instruments <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Equity <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Securities <dbl> 95.92, 0.00, 0.00, 0.00, 0.00, ...
unique number of funds : 993
df_import %>% distinct(code) %>% summarise(count=n())
## # A tibble: 1 x 1
## count
## <int>
## 1 993
Unique number of funds by types and categories
df_import %>% group_by(fund_type, category)%>% distinct(code) %>% summarise(count=n())
## `summarise()` regrouping output by 'fund_type' (override with `.groups` argument)
## # A tibble: 32 x 3
## # Groups: fund_type [2]
## fund_type category count
## <chr> <chr> <int>
## 1 mutual Debt Securities Umbrella Fund 71
## 2 mutual Fund Of Funds Umbrella Fund 18
## 3 mutual Hedge Umbrella Fund 295
## 4 mutual Mixed Umbrella Fund 4
## 5 mutual Money Market Umbrella Fund 27
## 6 mutual Participation Umbrella Fund 32
## 7 mutual Precious Metals Umbrella Fund 11
## 8 mutual Stock Umbrella Fund 63
## 9 mutual Variable Umbrella Fund 67
## 10 mutual <NA> 1
## # ... with 22 more rows
A mutual fund has no category, it is “FPR”. This fund category was not in the data set
This fund has not any data, it should be an obsolete entry.
df_import %>% filter(is.na(category)==TRUE)%>%arrange(price)
## # A tibble: 488 x 35
## date code fund_type category name price shares people
## <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2018-12-31 00:00:00 FPR mutual <NA> <NA> NA NA NA
## 2 2018-12-28 00:00:00 FPR mutual <NA> <NA> NA NA NA
## 3 2018-12-27 00:00:00 FPR mutual <NA> <NA> NA NA NA
## 4 2018-12-26 00:00:00 FPR mutual <NA> <NA> NA NA NA
## 5 2018-12-25 00:00:00 FPR mutual <NA> <NA> NA NA NA
## 6 2018-12-24 00:00:00 FPR mutual <NA> <NA> NA NA NA
## 7 2018-12-21 00:00:00 FPR mutual <NA> <NA> NA NA NA
## 8 2018-12-20 00:00:00 FPR mutual <NA> <NA> NA NA NA
## 9 2018-12-19 00:00:00 FPR mutual <NA> <NA> NA NA NA
## 10 2018-12-18 00:00:00 FPR mutual <NA> <NA> NA NA NA
## # ... with 478 more rows, and 27 more variables: total_value <dbl>,
## # p_Bank_Bills <dbl>, p_Other <dbl>, p_FX_Payable_Bills <dbl>,
## # p_Government_Bond <dbl>, p_Foreign_Currency_Bills <dbl>, p_Eurobonds <dbl>,
## # p_Commercial_Paper <dbl>, p_Fund_Participation_Certificate <dbl>,
## # p_Real_Estate_Certificate <dbl>, p_Treasury_Bill <dbl>, p_Stock <dbl>,
## # p_Government_Bonds_and_Bills_FX <dbl>, p_Participation_Account <dbl>,
## # p_Government_Lease_Certificates <dbl>, p_Precious_Metals <dbl>,
## # p_Private_Sector_Lease_Certificates <dbl>, p_Private_Sector_Bond <dbl>,
## # p_Repo <dbl>, p_Derivatives <dbl>, p_TMM <dbl>, p_Reverse_Repo <dbl>,
## # p_Asset_Backed_Securities <dbl>, p_Term_Deposit <dbl>,
## # p_Foreign_Debt_Instruments <dbl>, p_Foreign_Equity <dbl>,
## # p_Foreign_Securities <dbl>
Let’s update our dataset and remove this fund altogether. 901894-488=901406 rows left
df_import=df_import %>% filter(code!="FPR")%>%glimpse()
## Rows: 901,406
## Columns: 35
## $ date <dttm> 2015-12-31, 2015-12-31, 2015-1...
## $ code <chr> "ABE", "AEA", "AEB", "AEC", "AE...
## $ fund_type <chr> "pension", "pension", "pension"...
## $ category <chr> "Variable Fund", "Gold Fund", "...
## $ name <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B....
## $ price <dbl> 0.012212, 0.011026, 0.033588, 0...
## $ shares <dbl> 3876175452, 10761804554, 219965...
## $ people <dbl> 25887, 97857, 13269, 101, 5030,...
## $ total_value <dbl> 47336763, 118662196, 73881771, ...
## $ p_Bank_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Other <dbl> 3.02, 94.31, 1.90, 0.39, 1.79, ...
## $ p_FX_Payable_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Bond <dbl> 0.66, 0.00, 0.84, 60.90, 66.88,...
## $ p_Foreign_Currency_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Eurobonds <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Commercial_Paper <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Fund_Participation_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Real_Estate_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Treasury_Bill <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Stock <dbl> 0.00, 0.88, 82.55, 4.76, 29.47,...
## $ p_Government_Bonds_and_Bills_FX <dbl> 0.00, 0.00, 0.00, 11.09, 0.00, ...
## $ p_Participation_Account <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Precious_Metals <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Bond <dbl> 0.00, 4.81, 0.00, 22.78, 0.00, ...
## $ p_Repo <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Derivatives <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_TMM <dbl> 0.00, 0.00, 6.61, 0.08, 0.00, 7...
## $ p_Reverse_Repo <dbl> 0.40, 0.00, 8.10, 0.00, 1.86, 2...
## $ p_Asset_Backed_Securities <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Term_Deposit <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Debt_Instruments <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Equity <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Securities <dbl> 95.92, 0.00, 0.00, 0.00, 0.00, ...
Are any other NA values left in fund code, name or category fields? Let’s find out.
fund codes OK
df_import %>% filter(is.na(code)==TRUE)
## # A tibble: 0 x 35
## # ... with 35 variables: date <dttm>, code <chr>, fund_type <chr>,
## # category <chr>, name <chr>, price <dbl>, shares <dbl>, people <dbl>,
## # total_value <dbl>, p_Bank_Bills <dbl>, p_Other <dbl>,
## # p_FX_Payable_Bills <dbl>, p_Government_Bond <dbl>,
## # p_Foreign_Currency_Bills <dbl>, p_Eurobonds <dbl>,
## # p_Commercial_Paper <dbl>, p_Fund_Participation_Certificate <dbl>,
## # p_Real_Estate_Certificate <dbl>, p_Treasury_Bill <dbl>, p_Stock <dbl>,
## # p_Government_Bonds_and_Bills_FX <dbl>, p_Participation_Account <dbl>,
## # p_Government_Lease_Certificates <dbl>, p_Precious_Metals <dbl>,
## # p_Private_Sector_Lease_Certificates <dbl>, p_Private_Sector_Bond <dbl>,
## # p_Repo <dbl>, p_Derivatives <dbl>, p_TMM <dbl>, p_Reverse_Repo <dbl>,
## # p_Asset_Backed_Securities <dbl>, p_Term_Deposit <dbl>,
## # p_Foreign_Debt_Instruments <dbl>, p_Foreign_Equity <dbl>,
## # p_Foreign_Securities <dbl>
2 rows missing fund names, but their data seems obsolete too.
df_import %>% filter(is.na(name)==TRUE)
## # A tibble: 2 x 35
## date code fund_type category name price shares people
## <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2015-11-30 00:00:00 EIC mutual Variabl~ <NA> NA NA NA
## 2 2015-11-16 00:00:00 AHI mutual Stock U~ <NA> NA NA NA
## # ... with 27 more variables: total_value <dbl>, p_Bank_Bills <dbl>,
## # p_Other <dbl>, p_FX_Payable_Bills <dbl>, p_Government_Bond <dbl>,
## # p_Foreign_Currency_Bills <dbl>, p_Eurobonds <dbl>,
## # p_Commercial_Paper <dbl>, p_Fund_Participation_Certificate <dbl>,
## # p_Real_Estate_Certificate <dbl>, p_Treasury_Bill <dbl>, p_Stock <dbl>,
## # p_Government_Bonds_and_Bills_FX <dbl>, p_Participation_Account <dbl>,
## # p_Government_Lease_Certificates <dbl>, p_Precious_Metals <dbl>,
## # p_Private_Sector_Lease_Certificates <dbl>, p_Private_Sector_Bond <dbl>,
## # p_Repo <dbl>, p_Derivatives <dbl>, p_TMM <dbl>, p_Reverse_Repo <dbl>,
## # p_Asset_Backed_Securities <dbl>, p_Term_Deposit <dbl>,
## # p_Foreign_Debt_Instruments <dbl>, p_Foreign_Equity <dbl>,
## # p_Foreign_Securities <dbl>
We need to remove them from dataset 901406-2=901404 rows left
df_import=df_import %>% filter(is.na(name)==FALSE)%>%glimpse()
## Rows: 901,404
## Columns: 35
## $ date <dttm> 2015-12-31, 2015-12-31, 2015-1...
## $ code <chr> "ABE", "AEA", "AEB", "AEC", "AE...
## $ fund_type <chr> "pension", "pension", "pension"...
## $ category <chr> "Variable Fund", "Gold Fund", "...
## $ name <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B....
## $ price <dbl> 0.012212, 0.011026, 0.033588, 0...
## $ shares <dbl> 3876175452, 10761804554, 219965...
## $ people <dbl> 25887, 97857, 13269, 101, 5030,...
## $ total_value <dbl> 47336763, 118662196, 73881771, ...
## $ p_Bank_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Other <dbl> 3.02, 94.31, 1.90, 0.39, 1.79, ...
## $ p_FX_Payable_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Bond <dbl> 0.66, 0.00, 0.84, 60.90, 66.88,...
## $ p_Foreign_Currency_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Eurobonds <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Commercial_Paper <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Fund_Participation_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Real_Estate_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Treasury_Bill <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Stock <dbl> 0.00, 0.88, 82.55, 4.76, 29.47,...
## $ p_Government_Bonds_and_Bills_FX <dbl> 0.00, 0.00, 0.00, 11.09, 0.00, ...
## $ p_Participation_Account <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Precious_Metals <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Bond <dbl> 0.00, 4.81, 0.00, 22.78, 0.00, ...
## $ p_Repo <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Derivatives <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_TMM <dbl> 0.00, 0.00, 6.61, 0.08, 0.00, 7...
## $ p_Reverse_Repo <dbl> 0.40, 0.00, 8.10, 0.00, 1.86, 2...
## $ p_Asset_Backed_Securities <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Term_Deposit <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Debt_Instruments <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Equity <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Securities <dbl> 95.92, 0.00, 0.00, 0.00, 0.00, ...
Let’s check price, no NA values
df_import %>% filter(is.na(price)==TRUE)
## # A tibble: 0 x 35
## # ... with 35 variables: date <dttm>, code <chr>, fund_type <chr>,
## # category <chr>, name <chr>, price <dbl>, shares <dbl>, people <dbl>,
## # total_value <dbl>, p_Bank_Bills <dbl>, p_Other <dbl>,
## # p_FX_Payable_Bills <dbl>, p_Government_Bond <dbl>,
## # p_Foreign_Currency_Bills <dbl>, p_Eurobonds <dbl>,
## # p_Commercial_Paper <dbl>, p_Fund_Participation_Certificate <dbl>,
## # p_Real_Estate_Certificate <dbl>, p_Treasury_Bill <dbl>, p_Stock <dbl>,
## # p_Government_Bonds_and_Bills_FX <dbl>, p_Participation_Account <dbl>,
## # p_Government_Lease_Certificates <dbl>, p_Precious_Metals <dbl>,
## # p_Private_Sector_Lease_Certificates <dbl>, p_Private_Sector_Bond <dbl>,
## # p_Repo <dbl>, p_Derivatives <dbl>, p_TMM <dbl>, p_Reverse_Repo <dbl>,
## # p_Asset_Backed_Securities <dbl>, p_Term_Deposit <dbl>,
## # p_Foreign_Debt_Instruments <dbl>, p_Foreign_Equity <dbl>,
## # p_Foreign_Securities <dbl>
Let’s check shares, no NA values
df_import %>% filter(is.na(shares)==TRUE)
## # A tibble: 0 x 35
## # ... with 35 variables: date <dttm>, code <chr>, fund_type <chr>,
## # category <chr>, name <chr>, price <dbl>, shares <dbl>, people <dbl>,
## # total_value <dbl>, p_Bank_Bills <dbl>, p_Other <dbl>,
## # p_FX_Payable_Bills <dbl>, p_Government_Bond <dbl>,
## # p_Foreign_Currency_Bills <dbl>, p_Eurobonds <dbl>,
## # p_Commercial_Paper <dbl>, p_Fund_Participation_Certificate <dbl>,
## # p_Real_Estate_Certificate <dbl>, p_Treasury_Bill <dbl>, p_Stock <dbl>,
## # p_Government_Bonds_and_Bills_FX <dbl>, p_Participation_Account <dbl>,
## # p_Government_Lease_Certificates <dbl>, p_Precious_Metals <dbl>,
## # p_Private_Sector_Lease_Certificates <dbl>, p_Private_Sector_Bond <dbl>,
## # p_Repo <dbl>, p_Derivatives <dbl>, p_TMM <dbl>, p_Reverse_Repo <dbl>,
## # p_Asset_Backed_Securities <dbl>, p_Term_Deposit <dbl>,
## # p_Foreign_Debt_Instruments <dbl>, p_Foreign_Equity <dbl>,
## # p_Foreign_Securities <dbl>
Let’s check dates, no NA values, no 0 values
df_import %>% filter(is.na(date)==TRUE|date==0)
## # A tibble: 0 x 35
## # ... with 35 variables: date <dttm>, code <chr>, fund_type <chr>,
## # category <chr>, name <chr>, price <dbl>, shares <dbl>, people <dbl>,
## # total_value <dbl>, p_Bank_Bills <dbl>, p_Other <dbl>,
## # p_FX_Payable_Bills <dbl>, p_Government_Bond <dbl>,
## # p_Foreign_Currency_Bills <dbl>, p_Eurobonds <dbl>,
## # p_Commercial_Paper <dbl>, p_Fund_Participation_Certificate <dbl>,
## # p_Real_Estate_Certificate <dbl>, p_Treasury_Bill <dbl>, p_Stock <dbl>,
## # p_Government_Bonds_and_Bills_FX <dbl>, p_Participation_Account <dbl>,
## # p_Government_Lease_Certificates <dbl>, p_Precious_Metals <dbl>,
## # p_Private_Sector_Lease_Certificates <dbl>, p_Private_Sector_Bond <dbl>,
## # p_Repo <dbl>, p_Derivatives <dbl>, p_TMM <dbl>, p_Reverse_Repo <dbl>,
## # p_Asset_Backed_Securities <dbl>, p_Term_Deposit <dbl>,
## # p_Foreign_Debt_Instruments <dbl>, p_Foreign_Equity <dbl>,
## # p_Foreign_Securities <dbl>
Are there any 0 price or share values? There are 68109 rows with 0 values
df_import%>%filter(price==0|shares==0|is.na(price)|is.na(shares))%>%relocate(date,code,price,shares,people)%>%arrange(desc(price))
## # A tibble: 68,109 x 35
## date code price shares people fund_type category name
## <dttm> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr>
## 1 2017-05-08 00:00:00 YAY 36.4 0 3103 mutual Stock U~ YAPI~
## 2 2015-12-07 00:00:00 AZB 0.0505 0 0 pension Variabl~ ALLI~
## 3 2015-12-07 00:00:00 AZO 0.0279 0 0 pension Mixed F~ ALLI~
## 4 2016-12-30 00:00:00 FEO 0 0 0 pension Variabl~ FİBA~
## 5 2016-12-30 00:00:00 FEV 0 0 0 pension Variabl~ FİBA~
## 6 2016-12-30 00:00:00 FEZ 0 0 0 pension Variabl~ FİBA~
## 7 2016-12-30 00:00:00 FYL 0 0 0 pension Partici~ AVİV~
## 8 2016-12-30 00:00:00 FYN 0 0 0 pension Partici~ AVİV~
## 9 2016-12-30 00:00:00 FYU 0 0 0 pension Partici~ ALLİ~
## 10 2016-12-30 00:00:00 FYY 0 0 0 pension Partici~ ALLİ~
## # ... with 68,099 more rows, and 27 more variables: total_value <dbl>,
## # p_Bank_Bills <dbl>, p_Other <dbl>, p_FX_Payable_Bills <dbl>,
## # p_Government_Bond <dbl>, p_Foreign_Currency_Bills <dbl>, p_Eurobonds <dbl>,
## # p_Commercial_Paper <dbl>, p_Fund_Participation_Certificate <dbl>,
## # p_Real_Estate_Certificate <dbl>, p_Treasury_Bill <dbl>, p_Stock <dbl>,
## # p_Government_Bonds_and_Bills_FX <dbl>, p_Participation_Account <dbl>,
## # p_Government_Lease_Certificates <dbl>, p_Precious_Metals <dbl>,
## # p_Private_Sector_Lease_Certificates <dbl>, p_Private_Sector_Bond <dbl>,
## # p_Repo <dbl>, p_Derivatives <dbl>, p_TMM <dbl>, p_Reverse_Repo <dbl>,
## # p_Asset_Backed_Securities <dbl>, p_Term_Deposit <dbl>,
## # p_Foreign_Debt_Instruments <dbl>, p_Foreign_Equity <dbl>,
## # p_Foreign_Securities <dbl>
We will remove 68109 rows with obsolete entries, 833295 rows left
df_import=df_import%>%filter(price>0 & shares>0 &!is.na(price)&!is.na(shares))%>%glimpse()
## Rows: 833,295
## Columns: 35
## $ date <dttm> 2015-12-31, 2015-12-31, 2015-1...
## $ code <chr> "ABE", "AEA", "AEB", "AEC", "AE...
## $ fund_type <chr> "pension", "pension", "pension"...
## $ category <chr> "Variable Fund", "Gold Fund", "...
## $ name <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B....
## $ price <dbl> 0.012212, 0.011026, 0.033588, 0...
## $ shares <dbl> 3876175452, 10761804554, 219965...
## $ people <dbl> 25887, 97857, 13269, 101, 5030,...
## $ total_value <dbl> 47336763, 118662196, 73881771, ...
## $ p_Bank_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Other <dbl> 3.02, 94.31, 1.90, 0.39, 1.79, ...
## $ p_FX_Payable_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Bond <dbl> 0.66, 0.00, 0.84, 60.90, 66.88,...
## $ p_Foreign_Currency_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Eurobonds <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Commercial_Paper <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Fund_Participation_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Real_Estate_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Treasury_Bill <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Stock <dbl> 0.00, 0.88, 82.55, 4.76, 29.47,...
## $ p_Government_Bonds_and_Bills_FX <dbl> 0.00, 0.00, 0.00, 11.09, 0.00, ...
## $ p_Participation_Account <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Precious_Metals <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Bond <dbl> 0.00, 4.81, 0.00, 22.78, 0.00, ...
## $ p_Repo <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Derivatives <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_TMM <dbl> 0.00, 0.00, 6.61, 0.08, 0.00, 7...
## $ p_Reverse_Repo <dbl> 0.40, 0.00, 8.10, 0.00, 1.86, 2...
## $ p_Asset_Backed_Securities <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Term_Deposit <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Debt_Instruments <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Equity <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Securities <dbl> 95.92, 0.00, 0.00, 0.00, 0.00, ...
total number of funds decreased from 993 to 952 after removing obsolete data rows.
df_import%>%distinct(code)%>%count()
## # A tibble: 1 x 1
## n
## <int>
## 1 952
Are fund names unique among fund codes? Are there more than 1 name for a fund code? We checked them and did not find any problem.
df_import %>% group_by(code, name)%>% distinct(code) %>% summarise(count=n())%>%filter(count>1)%>%arrange(desc(count))
## `summarise()` regrouping output by 'code' (override with `.groups` argument)
## # A tibble: 0 x 3
## # Groups: code [0]
## # ... with 3 variables: code <chr>, name <chr>, count <int>
Are there any funds which are not active any more? Let’s check how many funds have data in the last day of our data set.
Out of the 952, only 920 funds have up to date price data.
df_import %>% filter(date==max(date))%>%glimpse()
## Rows: 920
## Columns: 35
## $ date <dttm> 2020-11-16, 2020-11-16, 2020-1...
## $ code <chr> "AAJ", "ABE", "ACV", "AEA", "AE...
## $ fund_type <chr> "pension", "pension", "pension"...
## $ category <chr> "AES Standard Fund", "Variable ...
## $ name <chr> "AVİVASA EMEKLİLİK VE HAYAT A.Ş...
## $ price <dbl> 0.014412, 0.046772, 0.016043, 0...
## $ shares <dbl> 35933708934, 12476111398, 22690...
## $ people <dbl> 366643, 107295, 103359, 526168,...
## $ total_value <dbl> 517888264, 583533145, 36404063,...
## $ p_Bank_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Other <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_FX_Payable_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Bond <dbl> 50.21, 0.00, 0.00, 0.00, 0.00, ...
## $ p_Foreign_Currency_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Eurobonds <dbl> 0.00, 0.03, 0.00, 0.00, 0.00, 1...
## $ p_Commercial_Paper <dbl> 15.08, 0.00, 0.00, 0.00, 0.00, ...
## $ p_Fund_Participation_Certificate <dbl> 13.10, 4.02, 0.00, 0.00, 2.59, ...
## $ p_Real_Estate_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Treasury_Bill <dbl> 1.41, 0.00, 0.00, 0.00, 0.00, 0...
## $ p_Stock <dbl> 9.93, 0.00, 0.00, 0.00, 97.21, ...
## $ p_Government_Bonds_and_Bills_FX <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0...
## $ p_Participation_Account <dbl> 0.00, 0.00, 55.79, 0.00, 0.00, ...
## $ p_Government_Lease_Certificates <dbl> 0.00, 0.00, 21.96, 84.05, 0.00,...
## $ p_Precious_Metals <dbl> 1.15, 0.00, 0.00, 15.73, 0.00, ...
## $ p_Private_Sector_Lease_Certificates <dbl> 0.00, 0.00, 22.25, 0.00, 0.00, ...
## $ p_Private_Sector_Bond <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0...
## $ p_Repo <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Derivatives <dbl> 0.00, 0.52, 0.00, 0.00, 0.20, 0...
## $ p_TMM <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0...
## $ p_Reverse_Repo <dbl> 1.94, 1.82, 0.00, 0.00, 0.00, 7...
## $ p_Asset_Backed_Securities <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0...
## $ p_Term_Deposit <dbl> 7.18, 0.00, 0.00, 0.00, 0.00, 0...
## $ p_Foreign_Debt_Instruments <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Equity <dbl> 0.00, 93.05, 0.00, 0.22, 0.00, ...
## $ p_Foreign_Securities <dbl> 0.00, 0.56, 0.00, 0.00, 0.00, 0...
Let’s locate these 32 funds, they have 6657 rows in the data set
df_uptodatefunds=df_import %>% filter(date==max(date))%>%distinct(code)#%>%glimpse()
df_inactive=anti_join(df_import,df_uptodatefunds)%>%glimpse()
## Joining, by = "code"
## Rows: 6,657
## Columns: 35
## $ date <dttm> 2015-12-01, 2015-12-01, 2016-1...
## $ code <chr> "KRS", "OSV", "AZZ", "AZZ", "KR...
## $ fund_type <chr> "mutual", "mutual", "mutual", "...
## $ category <chr> "Hedge Umbrella Fund", "Hedge U...
## $ name <chr> "KARE PORTFÖY SERBEST (DÖVİZ) F...
## $ price <dbl> 0.019549, 0.008313, 3.549100, 3...
## $ shares <dbl> 1080769072, 9032358776, 5685319...
## $ people <dbl> 0, 0, 0, 0, 4, 19, 4, 0, 0, 0, ...
## $ total_value <dbl> 21127451, 75087720, 20177765, 1...
## $ p_Bank_Bills <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Other <dbl> NA, NA, 82.09, 82.10, 9.89, 0.5...
## $ p_FX_Payable_Bills <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Government_Bond <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Foreign_Currency_Bills <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Eurobonds <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Commercial_Paper <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Fund_Participation_Certificate <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Real_Estate_Certificate <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Treasury_Bill <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Stock <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Government_Bonds_and_Bills_FX <dbl> NA, NA, 0.00, 0.00, 42.36, 54.2...
## $ p_Participation_Account <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Government_Lease_Certificates <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Precious_Metals <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Private_Sector_Lease_Certificates <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Private_Sector_Bond <dbl> NA, NA, 0.00, 0.00, 14.96, 1.14...
## $ p_Repo <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Derivatives <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_TMM <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Reverse_Repo <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Asset_Backed_Securities <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Term_Deposit <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Foreign_Debt_Instruments <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Foreign_Equity <dbl> NA, NA, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ p_Foreign_Securities <dbl> NA, NA, 17.91, 17.90, 32.79, 44...
Let’s check the total number of data points and date coverage for these funds. These are inactive funds all from umbrella hedge funds
df_inactive %>% group_by(code, fund_type, category) %>% summarise(count=n(), latest=max(date), earliest=min(date), length=max(date)-min(date))%>% arrange(desc(latest))
## `summarise()` regrouping output by 'code', 'fund_type' (override with `.groups` argument)
## # A tibble: 32 x 7
## # Groups: code, fund_type [32]
## code fund_type category count latest earliest length
## <chr> <chr> <chr> <int> <dttm> <dttm> <drtn>
## 1 UAP mutual Hedge U~ 116 2020-11-13 00:00:00 2020-06-01 00:00:00 165 d~
## 2 CEY mutual Hedge U~ 400 2020-11-10 00:00:00 2019-04-02 00:00:00 588 d~
## 3 CTP mutual Hedge U~ 78 2020-10-27 00:00:00 2020-07-07 00:00:00 112 d~
## 4 NBF mutual Hedge U~ 51 2020-10-27 00:00:00 2020-08-18 00:00:00 70 d~
## 5 IMV mutual Hedge U~ 286 2020-10-21 00:00:00 2019-09-04 00:00:00 413 d~
## 6 GDL mutual Hedge U~ 163 2020-10-16 00:00:00 2020-02-21 00:00:00 238 d~
## 7 IMY mutual Hedge U~ 519 2020-10-13 00:00:00 2018-09-13 00:00:00 761 d~
## 8 IMZ mutual Hedge U~ 343 2020-10-13 00:00:00 2019-05-28 00:00:00 504 d~
## 9 IMT mutual Hedge U~ 319 2020-10-09 00:00:00 2019-07-02 00:00:00 465 d~
## 10 ICG mutual Hedge U~ 272 2020-10-08 00:00:00 2019-09-11 00:00:00 393 d~
## # ... with 22 more rows
These are all “Serbest”/“hedge fund” types that are only accessible by “niteliklik yatırımcı”/“qualified investor” mostly institutional investors. This subset don’t have reliable data and behave very different than typical mutual funds which are accessible to general public. We will remove the ones with unreliable data from our analysis data set.
When we remove 6657 rows from our data set, we now have 826638 rows
df_import=semi_join(df_import,df_uptodatefunds)%>%glimpse()
## Joining, by = "code"
## Rows: 826,638
## Columns: 35
## $ date <dttm> 2015-12-31, 2015-12-31, 2015-1...
## $ code <chr> "ABE", "AEA", "AEB", "AEC", "AE...
## $ fund_type <chr> "pension", "pension", "pension"...
## $ category <chr> "Variable Fund", "Gold Fund", "...
## $ name <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B....
## $ price <dbl> 0.012212, 0.011026, 0.033588, 0...
## $ shares <dbl> 3876175452, 10761804554, 219965...
## $ people <dbl> 25887, 97857, 13269, 101, 5030,...
## $ total_value <dbl> 47336763, 118662196, 73881771, ...
## $ p_Bank_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Other <dbl> 3.02, 94.31, 1.90, 0.39, 1.79, ...
## $ p_FX_Payable_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Bond <dbl> 0.66, 0.00, 0.84, 60.90, 66.88,...
## $ p_Foreign_Currency_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Eurobonds <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Commercial_Paper <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Fund_Participation_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Real_Estate_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Treasury_Bill <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Stock <dbl> 0.00, 0.88, 82.55, 4.76, 29.47,...
## $ p_Government_Bonds_and_Bills_FX <dbl> 0.00, 0.00, 0.00, 11.09, 0.00, ...
## $ p_Participation_Account <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Precious_Metals <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Bond <dbl> 0.00, 4.81, 0.00, 22.78, 0.00, ...
## $ p_Repo <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Derivatives <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_TMM <dbl> 0.00, 0.00, 6.61, 0.08, 0.00, 7...
## $ p_Reverse_Repo <dbl> 0.40, 0.00, 8.10, 0.00, 1.86, 2...
## $ p_Asset_Backed_Securities <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Term_Deposit <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Debt_Instruments <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Equity <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Securities <dbl> 95.92, 0.00, 0.00, 0.00, 0.00, ...
We need to add some additional data to better classify funds in our future analysis.
86 Pension Funds which has OKS in their names are “Otomatik Katılım sistemi” funds. these are special funds forf Automatic Participation system pension contracts.
165 funds which have “katılım” or “participation” either in their names or category are special funds which only invest in securities without interest returns.
28 Pension funds which have “katkı”, “devlet katkısı”, “contribution” or “state contribution” either in their names or category are special funds which government contributions are invested.
df_import=df_import%>%mutate(contribution=(grepl(" KATKI ",name)|grepl(" contribution ",category)), participation=(grepl(" KATILIM ",name)|grepl(" participation ",category)), OKS=grepl(" OKS ",name))%>%relocate(OKS, participation, contribution)%>%glimpse()
## Rows: 826,638
## Columns: 38
## $ OKS <lgl> FALSE, FALSE, FALSE, FALSE, FAL...
## $ participation <lgl> FALSE, TRUE, FALSE, FALSE, FALS...
## $ contribution <lgl> FALSE, FALSE, FALSE, FALSE, FAL...
## $ date <dttm> 2015-12-31, 2015-12-31, 2015-1...
## $ code <chr> "ABE", "AEA", "AEB", "AEC", "AE...
## $ fund_type <chr> "pension", "pension", "pension"...
## $ category <chr> "Variable Fund", "Gold Fund", "...
## $ name <chr> "ANADOLU HAYAT EMEKLILIK A.Ş.B....
## $ price <dbl> 0.012212, 0.011026, 0.033588, 0...
## $ shares <dbl> 3876175452, 10761804554, 219965...
## $ people <dbl> 25887, 97857, 13269, 101, 5030,...
## $ total_value <dbl> 47336763, 118662196, 73881771, ...
## $ p_Bank_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Other <dbl> 3.02, 94.31, 1.90, 0.39, 1.79, ...
## $ p_FX_Payable_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Bond <dbl> 0.66, 0.00, 0.84, 60.90, 66.88,...
## $ p_Foreign_Currency_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Eurobonds <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Commercial_Paper <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Fund_Participation_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Real_Estate_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Treasury_Bill <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Stock <dbl> 0.00, 0.88, 82.55, 4.76, 29.47,...
## $ p_Government_Bonds_and_Bills_FX <dbl> 0.00, 0.00, 0.00, 11.09, 0.00, ...
## $ p_Participation_Account <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Precious_Metals <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Bond <dbl> 0.00, 4.81, 0.00, 22.78, 0.00, ...
## $ p_Repo <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Derivatives <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_TMM <dbl> 0.00, 0.00, 6.61, 0.08, 0.00, 7...
## $ p_Reverse_Repo <dbl> 0.40, 0.00, 8.10, 0.00, 1.86, 2...
## $ p_Asset_Backed_Securities <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Term_Deposit <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Debt_Instruments <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Equity <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Securities <dbl> 95.92, 0.00, 0.00, 0.00, 0.00, ...
df_import%>%group_by(OKS,participation,contribution)%>%distinct(code)%>%summarize(n())
## `summarise()` regrouping output by 'OKS', 'participation' (override with `.groups` argument)
## # A tibble: 6 x 4
## # Groups: OKS, participation [4]
## OKS participation contribution `n()`
## <lgl> <lgl> <lgl> <int>
## 1 FALSE FALSE FALSE 698
## 2 FALSE FALSE TRUE 18
## 3 FALSE TRUE FALSE 108
## 4 FALSE TRUE TRUE 10
## 5 TRUE FALSE FALSE 49
## 6 TRUE TRUE FALSE 37
We don’t have the names or ids of portfolio management companies in our data set. We need to parse their names from th “name” field. We can also simplify the long names of the funds which would help us during visualisation stage
There are 920 unique funds.
df_import%>%distinct(name)%>%count()
## # A tibble: 1 x 1
## n
## <int>
## 1 920
From the TEFAS web site, names of the companies are inside the html code of the page. We just copied them to an excel file. Import the names of the owner companies.
df_companies=read_excel("companies.xlsx")
df_companies%>%arrange(company_name)
## # A tibble: 53 x 3
## company_code company_name company_type
## <chr> <chr> <chr>
## 1 PPY ACTUS PORTFÖY YÖNETİMİ A.Ş. mutual
## 2 ANM AEGON EMEKLİLİK VE HAYAT A.Ş. pension
## 3 AKP AK PORTFÖY YÖNETİMİ A.Ş. mutual
## 4 PGA ALBARAKA PORTFÖY YÖNETİMİ A.Ş. mutual
## 5 KHM ALLİANZ HAYAT VE EMEKLİLİK A.Ş. pension
## 6 HYS ALLIANZ YAŞAM VE EMEKLİLİK A.Ş. pension
## 7 AEM ANADOLU HAYAT EMEKLİLİK A.Ş. pension
## 8 APY ATA PORTFÖY YÖNETİMİ A.Ş. mutual
## 9 ALA ATLAS PORTFÖY YÖNETİMİ A.Ş. mutual
## 10 TLR AURA PORTFÖY YÖNETİMİ A.Ş. mutual
## # ... with 43 more rows
There are 53 entries. 36 of them mutual funds, and 17 of them are pension fund companies.
df_companies%>%group_by(company_type)%>%summarise(n())
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 2
## company_type `n()`
## <chr> <int>
## 1 mutual 36
## 2 pension 17
TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. has triple entries.Other company entries are unique.
df_companies%>%group_by(company_name)%>%summarise(count=n())%>%arrange(desc(count))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 51 x 2
## company_name count
## <chr> <int>
## 1 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. 3
## 2 ACTUS PORTFÖY YÖNETİMİ A.Ş. 1
## 3 AEGON EMEKLİLİK VE HAYAT A.Ş. 1
## 4 AK PORTFÖY YÖNETİMİ A.Ş. 1
## 5 ALBARAKA PORTFÖY YÖNETİMİ A.Ş. 1
## 6 ALLİANZ HAYAT VE EMEKLİLİK A.Ş. 1
## 7 ALLIANZ YAŞAM VE EMEKLİLİK A.Ş. 1
## 8 ANADOLU HAYAT EMEKLİLİK A.Ş. 1
## 9 ATA PORTFÖY YÖNETİMİ A.Ş. 1
## 10 ATLAS PORTFÖY YÖNETİMİ A.Ş. 1
## # ... with 41 more rows
TURKIYE HAYAT VE EMEKLILIK A.S. is the merger of 3 government owned Pension companies: ZEM: Ziraat Emeklilik VEM: Vakif Emeklilik HEM:Halk Emeklilik
For our initial analysis we will combine them under the code: “TEM” During our advanced phases of our analysis if we decide to dig deeper to compare performances of the original pension fund companies we may look for ways to differentiate them.In the fund names, ZEM, VEM and HEM are included, we can transform them if we need to.
df_companies%>%filter(company_name=="TÜRKİYE HAYAT VE EMEKLİLİK A.Ş.")
## # A tibble: 3 x 3
## company_code company_name company_type
## <chr> <chr> <chr>
## 1 ZEM TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. pension
## 2 VEM TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. pension
## 3 HEM TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. pension
df_import%>%distinct(name)%>%filter(grepl("VEM|ZEM|HEM",name))
## # A tibble: 75 x 1
## name
## <chr>
## 1 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. HEM KATILIM KATKI EMEKLİLİK YATIRIM FONU
## 2 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. HEM HİSSE SENEDİ EMEKLİLİK YATIRIM FONU
## 3 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. HEM KATILIM STANDART EMEKLİLİK YATIRIM FONU
## 4 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. HEM DİNAMİK DEĞİŞKEN EMEKLİLİK YATIRIM FONU
## 5 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. HEM KAMU DIŞ BORÇLANMA ARAÇLARI EMEKLİLİK YA~
## 6 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. HEM KAMU BORÇLANMA ARAÇLARI STANDART EMEKLİL~
## 7 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. HEM KATILIM HİSSE SENEDİ EMEKLİLİK YATIRIM F~
## 8 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. HEM KATILIM DİNAMİK DEĞİŞKEN EMEKLİLİK YATIR~
## 9 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. HEM PARA PİYASASI EMEKLİLİK YATIRIM FONU
## 10 TÜRKİYE HAYAT VE EMEKLİLİK A.Ş. HEM KATKI EMEKLİLİK YATIRIM FONU
## # ... with 65 more rows
Let’s remove rows with VEM and ZEM then change HEM as TEM. Now we have 51 unique companies and codes
df_companies=df_companies%>%filter(company_code!="ZEM" & company_code!="VEM")%>%mutate(company_code=ifelse(company_code=="HEM","TEM",company_code))
df_companies%>%arrange(company_code)
## # A tibble: 51 x 3
## company_code company_name company_type
## <chr> <chr> <chr>
## 1 AEM ANADOLU HAYAT EMEKLİLİK A.Ş. pension
## 2 AHS AVİVASA EMEKLİLİK VE HAYAT A.Ş. pension
## 3 AHY AXA HAYAT VE EMEKLİLİK A.Ş. pension
## 4 AKP AK PORTFÖY YÖNETİMİ A.Ş. mutual
## 5 ALA ATLAS PORTFÖY YÖNETİMİ A.Ş. mutual
## 6 ANM AEGON EMEKLİLİK VE HAYAT A.Ş. pension
## 7 APY ATA PORTFÖY YÖNETİMİ A.Ş. mutual
## 8 ASH BEREKET EMEKLİLİK VE HAYAT A.Ş. pension
## 9 DEN METLİFE EMEKLİLİK VE HAYAT A.Ş. pension
## 10 EPY DENİZ PORTFÖY YÖNETİMİ A.Ş mutual
## # ... with 41 more rows
Can we parse the names of the companies from fund names? Data seems messy, names on fund name fields and company name fields are not consistent.
df_import%>%distinct(name)%>%filter(!grepl("A.Ş.|AŞ|PORTFÖY",name))
## # A tibble: 23 x 1
## name
## <chr>
## 1 ANADOLU HAYAT EMEKLİLİK KATKI EMEK.YAT.FONU
## 2 BNP PARİBAS CARDİF EMEKLİLİK ALTIN EMEKLİLİK YATIRIM FONU
## 3 FİBA EMEKLİLİK VE HAYAT OKS KATILIM STANDART EMEKLİLİK YATIRIM FONU
## 4 AZİMUT PYŞ YABANCI BYF FON SEPETİ FONU
## 5 AZİMUT PYŞ KISA VADELİ BORÇLANMA ARAÇLARI FONU
## 6 AZİMUT PYŞ KİRA SERTİFİKALARI (SUKUK) KATILIM FONU
## 7 AZİMUT PYŞ BİRİNCİ HİSSE SENEDİ FONU (HİSSE SENEDİ YOĞUN FON)
## 8 AZİMUT PYŞ ÇOKLU VARLIK DEĞİŞKEN FON
## 9 AZİMUT PYŞ KAR PAYI ÖDEYEN HİSSE SENEDİ FONU (HİSSE SENEDİ YOĞUN FON)
## 10 AZİMUT PYŞ BİRİNCİ BORÇLANMA ARAÇLARI FONU
## # ... with 13 more rows
Let’s get the starting X characters to be able to parse them by company name and matche them with company list. It seems sometimes “İ” and sometime “I” used in the same name.
df_import%>%distinct(name)%>%mutate(name_start=str_sub(name,1,10))%>%distinct(name_start)%>%arrange(name_start)
## # A tibble: 63 x 1
## name_start
## <chr>
## 1 ACTUS PORT
## 2 AEGON EME
## 3 AEGON EMEK
## 4 AK PORTFÖY
## 5 ALBARAKA P
## 6 ALLIANZ HA
## 7 ALLİANZ HA
## 8 ALLIANZ YA
## 9 ALLİANZ YA
## 10 ANADOLU HA
## # ... with 53 more rows
Get rid of “İ”, “I” duplications and other anoying stuff by changing all turkish characters to lating ascii. We used stri_trans_general function of stringi package.
df_import=df_import%>%mutate(name=stri_trans_general(name,"latin-ascii"))
df_companies=df_companies%>%mutate(company_name=stri_trans_general(company_name,"latin-ascii"))
Let’s try again to extract first several characters to see if we can match them. It seems 9 characters is the sweet spot with 54 just a few above unique number 51
for (i in 1:25) {
df_temp=df_import%>%distinct(name)%>%mutate(name_start=str_sub(name,1,i))%>%distinct(name_start)
print(paste("First ", i,"characters. Distinct counts: ",count(df_temp)[[1,1]]))
}
## [1] "First 1 characters. Distinct counts: 20"
## [1] "First 2 characters. Distinct counts: 41"
## [1] "First 3 characters. Distinct counts: 47"
## [1] "First 4 characters. Distinct counts: 47"
## [1] "First 5 characters. Distinct counts: 47"
## [1] "First 6 characters. Distinct counts: 48"
## [1] "First 7 characters. Distinct counts: 50"
## [1] "First 8 characters. Distinct counts: 50"
## [1] "First 9 characters. Distinct counts: 54"
## [1] "First 10 characters. Distinct counts: 57"
## [1] "First 11 characters. Distinct counts: 57"
## [1] "First 12 characters. Distinct counts: 102"
## [1] "First 13 characters. Distinct counts: 163"
## [1] "First 14 characters. Distinct counts: 220"
## [1] "First 15 characters. Distinct counts: 270"
## [1] "First 16 characters. Distinct counts: 324"
## [1] "First 17 characters. Distinct counts: 376"
## [1] "First 18 characters. Distinct counts: 403"
## [1] "First 19 characters. Distinct counts: 423"
## [1] "First 20 characters. Distinct counts: 453"
## [1] "First 21 characters. Distinct counts: 475"
## [1] "First 22 characters. Distinct counts: 486"
## [1] "First 23 characters. Distinct counts: 496"
## [1] "First 24 characters. Distinct counts: 504"
## [1] "First 25 characters. Distinct counts: 516"
First 9 characters also sufficient to differentiate 51 unique companies
for (i in 1:25) {
df_temp2=df_companies%>%distinct(company_name)%>%mutate(name_start=str_sub(company_name,1,i))%>%distinct(name_start)
print(paste("First ", i,"characters. Distinct counts: ",count(df_temp2)[[1,1]]))
}
## [1] "First 1 characters. Distinct counts: 20"
## [1] "First 2 characters. Distinct counts: 42"
## [1] "First 3 characters. Distinct counts: 48"
## [1] "First 4 characters. Distinct counts: 48"
## [1] "First 5 characters. Distinct counts: 48"
## [1] "First 6 characters. Distinct counts: 49"
## [1] "First 7 characters. Distinct counts: 49"
## [1] "First 8 characters. Distinct counts: 49"
## [1] "First 9 characters. Distinct counts: 51"
## [1] "First 10 characters. Distinct counts: 51"
## [1] "First 11 characters. Distinct counts: 51"
## [1] "First 12 characters. Distinct counts: 51"
## [1] "First 13 characters. Distinct counts: 51"
## [1] "First 14 characters. Distinct counts: 51"
## [1] "First 15 characters. Distinct counts: 51"
## [1] "First 16 characters. Distinct counts: 51"
## [1] "First 17 characters. Distinct counts: 51"
## [1] "First 18 characters. Distinct counts: 51"
## [1] "First 19 characters. Distinct counts: 51"
## [1] "First 20 characters. Distinct counts: 51"
## [1] "First 21 characters. Distinct counts: 51"
## [1] "First 22 characters. Distinct counts: 51"
## [1] "First 23 characters. Distinct counts: 51"
## [1] "First 24 characters. Distinct counts: 51"
## [1] "First 25 characters. Distinct counts: 51"
There are few unmatches, but it is quite obvious how to align them to our data when we merge them on df_companies
df_import_9=df_import%>%distinct(name)%>%mutate(name_start=str_sub(name,1,9))%>%distinct(name_start)%>%arrange(name_start)%>%glimpse()
## Rows: 54
## Columns: 1
## $ name_start <chr> "ACTUS POR", "AEGON EM", "AEGON EME", "AK PORTFO", "ALB...
df_companies=df_companies%>%mutate(name_start=str_sub(company_name,1,9))%>%arrange(name_start)%>%glimpse()
## Rows: 51
## Columns: 4
## $ company_code <chr> "PPY", "ANM", "AKP", "PGA", "KHM", "HYS", "AEM", "APY"...
## $ company_name <chr> "ACTUS PORTFOY YONETIMI A.S.", "AEGON EMEKLILIK VE HAY...
## $ company_type <chr> "mutual", "pension", "mutual", "mutual", "pension", "p...
## $ name_start <chr> "ACTUS POR", "AEGON EME", "AK PORTFO", "ALBARAKA ", "A...
df_companies=full_join(df_import_9,df_companies, by="name_start")%>%arrange(name_start)
df_companies
## # A tibble: 56 x 4
## name_start company_code company_name company_type
## <chr> <chr> <chr> <chr>
## 1 "ACTUS POR" PPY ACTUS PORTFOY YONETIMI A.S. mutual
## 2 "AEGON EM" <NA> <NA> <NA>
## 3 "AEGON EME" ANM AEGON EMEKLILIK VE HAYAT A.S. pension
## 4 "AK PORTFO" AKP AK PORTFOY YONETIMI A.S. mutual
## 5 "ALBARAKA " PGA ALBARAKA PORTFOY YONETIMI A.S. mutual
## 6 "ALLIANZ H" KHM ALLIANZ HAYAT VE EMEKLILIK A.S. pension
## 7 "ALLIANZ Y" HYS ALLIANZ YASAM VE EMEKLILIK A.S. pension
## 8 "ANADOLU H" AEM ANADOLU HAYAT EMEKLILIK A.S. pension
## 9 "ATA PORTF" APY ATA PORTFOY YONETIMI A.S. mutual
## 10 "ATLAS POR" ALA ATLAS PORTFOY YONETIMI A.S. mutual
## # ... with 46 more rows
AEGON EM = AEGON EME=ANM (spaces!) BEREKET= BEREKET E=ASH (spaces!)
AZIMUT PY = AZIMUT PO = GPO HSBC PYS = HSBC PORT=HPY CIGNA FIN=CIGNA SAG=FEM
df_companies=df_companies%>%mutate(company_code=ifelse(grepl("AEGON",name_start),"ANM",company_code), company_name=ifelse(grepl("AEGON",name_start),"AEGON EMEKLILIK VE HAYAT A.S.",company_name), company_type=ifelse(grepl("AEGON",name_start),"pension",company_type) )%>%mutate(company_code=ifelse(grepl("AZIMUT",name_start),"GPO",company_code), company_name=ifelse(grepl("AZIMUT",name_start),"AZIMUT PORTFOY YONETIMI A.S.",company_name), company_type=ifelse(grepl("AZIMUT",name_start),"mutual",company_type) )%>%mutate(company_code=ifelse(grepl("HSBC",name_start),"HPY",company_code), company_name=ifelse(grepl("HSBC",name_start),"HSBC PORTFOY YONETIMI A.S.",company_name), company_type=ifelse(grepl("HSBC",name_start),"mutual",company_type) )%>%mutate(company_code=ifelse(grepl("CIGNA",name_start),"FEM",company_code), company_name=ifelse(grepl("CIGNA",name_start),"CIGNA SAGLIK HAYAT VE EMEKLILIK A.S.",company_name), company_type=ifelse(grepl("CIGNA",name_start),"pension",company_type) )%>%mutate(company_code=ifelse(grepl("BEREKET",name_start),"ASH",company_code), company_name=ifelse(grepl("BEREKET",name_start),"BEREKET EMEKLILIK VE HAYAT A.S.",company_name), company_type=ifelse(grepl("BEREKET",name_start),"pension",company_type) )%>%arrange(name_start)
df_companies
## # A tibble: 56 x 4
## name_start company_code company_name company_type
## <chr> <chr> <chr> <chr>
## 1 "ACTUS POR" PPY ACTUS PORTFOY YONETIMI A.S. mutual
## 2 "AEGON EM" ANM AEGON EMEKLILIK VE HAYAT A.S. pension
## 3 "AEGON EME" ANM AEGON EMEKLILIK VE HAYAT A.S. pension
## 4 "AK PORTFO" AKP AK PORTFOY YONETIMI A.S. mutual
## 5 "ALBARAKA " PGA ALBARAKA PORTFOY YONETIMI A.S. mutual
## 6 "ALLIANZ H" KHM ALLIANZ HAYAT VE EMEKLILIK A.S. pension
## 7 "ALLIANZ Y" HYS ALLIANZ YASAM VE EMEKLILIK A.S. pension
## 8 "ANADOLU H" AEM ANADOLU HAYAT EMEKLILIK A.S. pension
## 9 "ATA PORTF" APY ATA PORTFOY YONETIMI A.S. mutual
## 10 "ATLAS POR" ALA ATLAS PORTFOY YONETIMI A.S. mutual
## # ... with 46 more rows
Now add this company name and code information to our main data set. 826638 rows are the same as before join operation
df_import=left_join(df_import%>%mutate(name_start=str_sub(name,1,9)),df_companies, by="name_start")%>%select(-name_start, -company_type)%>%glimpse()
## Rows: 826,638
## Columns: 40
## $ OKS <lgl> FALSE, FALSE, FALSE, FALSE, FAL...
## $ participation <lgl> FALSE, TRUE, FALSE, FALSE, FALS...
## $ contribution <lgl> FALSE, FALSE, FALSE, FALSE, FAL...
## $ date <dttm> 2015-12-31, 2015-12-31, 2015-1...
## $ code <chr> "ABE", "AEA", "AEB", "AEC", "AE...
## $ fund_type <chr> "pension", "pension", "pension"...
## $ category <chr> "Variable Fund", "Gold Fund", "...
## $ name <chr> "ANADOLU HAYAT EMEKLILIK A.S.B....
## $ price <dbl> 0.012212, 0.011026, 0.033588, 0...
## $ shares <dbl> 3876175452, 10761804554, 219965...
## $ people <dbl> 25887, 97857, 13269, 101, 5030,...
## $ total_value <dbl> 47336763, 118662196, 73881771, ...
## $ p_Bank_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Other <dbl> 3.02, 94.31, 1.90, 0.39, 1.79, ...
## $ p_FX_Payable_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Bond <dbl> 0.66, 0.00, 0.84, 60.90, 66.88,...
## $ p_Foreign_Currency_Bills <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Eurobonds <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Commercial_Paper <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Fund_Participation_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Real_Estate_Certificate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Treasury_Bill <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Stock <dbl> 0.00, 0.88, 82.55, 4.76, 29.47,...
## $ p_Government_Bonds_and_Bills_FX <dbl> 0.00, 0.00, 0.00, 11.09, 0.00, ...
## $ p_Participation_Account <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Government_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Precious_Metals <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Lease_Certificates <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Private_Sector_Bond <dbl> 0.00, 4.81, 0.00, 22.78, 0.00, ...
## $ p_Repo <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Derivatives <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_TMM <dbl> 0.00, 0.00, 6.61, 0.08, 0.00, 7...
## $ p_Reverse_Repo <dbl> 0.40, 0.00, 8.10, 0.00, 1.86, 2...
## $ p_Asset_Backed_Securities <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Term_Deposit <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Debt_Instruments <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Equity <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ p_Foreign_Securities <dbl> 95.92, 0.00, 0.00, 0.00, 0.00, ...
## $ company_code <chr> "AEM", "AEM", "AHS", "HYS", "KH...
## $ company_name <chr> "ANADOLU HAYAT EMEKLILIK A.S.",...
It matched perfectly, no NA in company related fields.
print(df_import%>%filter(is.na(company_name)))
## # A tibble: 0 x 40
## # ... with 40 variables: OKS <lgl>, participation <lgl>, contribution <lgl>,
## # date <dttm>, code <chr>, fund_type <chr>, category <chr>, name <chr>,
## # price <dbl>, shares <dbl>, people <dbl>, total_value <dbl>,
## # p_Bank_Bills <dbl>, p_Other <dbl>, p_FX_Payable_Bills <dbl>,
## # p_Government_Bond <dbl>, p_Foreign_Currency_Bills <dbl>, p_Eurobonds <dbl>,
## # p_Commercial_Paper <dbl>, p_Fund_Participation_Certificate <dbl>,
## # p_Real_Estate_Certificate <dbl>, p_Treasury_Bill <dbl>, p_Stock <dbl>,
## # p_Government_Bonds_and_Bills_FX <dbl>, p_Participation_Account <dbl>,
## # p_Government_Lease_Certificates <dbl>, p_Precious_Metals <dbl>,
## # p_Private_Sector_Lease_Certificates <dbl>, p_Private_Sector_Bond <dbl>,
## # p_Repo <dbl>, p_Derivatives <dbl>, p_TMM <dbl>, p_Reverse_Repo <dbl>,
## # p_Asset_Backed_Securities <dbl>, p_Term_Deposit <dbl>,
## # p_Foreign_Debt_Instruments <dbl>, p_Foreign_Equity <dbl>,
## # p_Foreign_Securities <dbl>, company_code <chr>, company_name <chr>
print(df_import%>%filter(is.na(company_code)))
## # A tibble: 0 x 40
## # ... with 40 variables: OKS <lgl>, participation <lgl>, contribution <lgl>,
## # date <dttm>, code <chr>, fund_type <chr>, category <chr>, name <chr>,
## # price <dbl>, shares <dbl>, people <dbl>, total_value <dbl>,
## # p_Bank_Bills <dbl>, p_Other <dbl>, p_FX_Payable_Bills <dbl>,
## # p_Government_Bond <dbl>, p_Foreign_Currency_Bills <dbl>, p_Eurobonds <dbl>,
## # p_Commercial_Paper <dbl>, p_Fund_Participation_Certificate <dbl>,
## # p_Real_Estate_Certificate <dbl>, p_Treasury_Bill <dbl>, p_Stock <dbl>,
## # p_Government_Bonds_and_Bills_FX <dbl>, p_Participation_Account <dbl>,
## # p_Government_Lease_Certificates <dbl>, p_Precious_Metals <dbl>,
## # p_Private_Sector_Lease_Certificates <dbl>, p_Private_Sector_Bond <dbl>,
## # p_Repo <dbl>, p_Derivatives <dbl>, p_TMM <dbl>, p_Reverse_Repo <dbl>,
## # p_Asset_Backed_Securities <dbl>, p_Term_Deposit <dbl>,
## # p_Foreign_Debt_Instruments <dbl>, p_Foreign_Equity <dbl>,
## # p_Foreign_Securities <dbl>, company_code <chr>, company_name <chr>
We have completed one more milestone. Let’s export final, clean dataframe as RDS for exploratory analysis.
saveRDS(df_import,"df_clean.rds")
We have uploaded the df_clean.rds file and other material for this milestone to the google drive. You can review them from this link: https://drive.google.com/drive/folders/1SJPoQqPOXHdLEUReML9zpWw7NaNrcJUY?usp=sharing