2  Preprocessing

Published

January 2, 2024

Initial Setup

The R code below loads the “WDI” package, which provides access to World Bank data indicators, and the “remotes” package, which supports installing packages from remote repositories. The installation calls are commented out; uncomment them the first time the code is run.

# One-time installation (uncomment on first use)
# install.packages("WDI", repos = "http://cran.us.r-project.org", dependencies = TRUE)
library(WDI)

# install.packages("remotes", repos = "http://cran.us.r-project.org", dependencies = TRUE)
library(remotes)
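
As an alternative to commenting the installation calls in and out, the check for missing packages can be automated. The following is a minimal sketch, not part of the original workflow: the helper name `missing_packages` is hypothetical, and the installation call is left commented out so that nothing is downloaded unless the reader opts in.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required_packages <- c("WDI", "remotes")
to_install <- missing_packages(required_packages)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) {
#   install.packages(to_install, repos = "http://cran.us.r-project.org")
# }
```

If both packages are already installed, `to_install` is an empty character vector and the commented block would do nothing.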

Data Preparation

The goal is to retrieve World Development Indicators (WDI) data for the indicators listed in the “indicator_codes” vector and the countries listed in the “countries” vector (Turkiye, Ukraine, and the European Union), covering the years 2018 through 2023.

The WDI function is called with the indicator codes, the countries, and the time range. The resulting dataset, “world_bank_data_2,” is then reduced to the relevant columns, which are renamed for clarity. The summary dataset, “world_bank_data_summary,” contains the country, year, population, GDP per capita, age dependency ratio, inflation, unemployment rate, female labor force participation rate, and male labor force participation rate.

Finally, the kable function renders the summary dataset as an HTML table, and the kable_styling function improves its visual presentation.

library(gt)
library(knitr)
library(kableExtra)

# World Bank indicator codes, in the same order as the renamed columns below
indicator_codes <- c("SP.POP.TOTL", "NY.GDP.PCAP.CD", "SP.POP.DPND", "FP.CPI.TOTL.ZG",
                     "SL.UEM.TOTL.ZS", "SL.TLF.CACT.FE.ZS", "SL.TLF.CACT.MA.ZS")
countries <- c("TUR", "UKR", "EUU")
start_date <- 2018
end_date <- 2023

# Download the indicators for the selected countries and years
world_bank_data_2 <- WDI(indicator = indicator_codes,
                         country = countries,
                         start = start_date,
                         end = end_date,
                         extra = FALSE)

# Keep the country, year, and indicator columns, then give them readable names
world_bank_data_summary <- world_bank_data_2[, c("country", "year", indicator_codes)]
colnames(world_bank_data_summary) <- c("Country", "Year", "Population", "GdpPerCapita",
                                       "AgeDependancyRatio", "Inflation", "UnemploymentRate",
                                       "FemaleLaborRate", "MaleLaborRate")

# Limit how many entries print() shows in the console
options(max.print = 15)

# Render the summary as a styled HTML table
kable(world_bank_data_summary, "html") %>% kable_styling()
| Country | Year | Population | GdpPerCapita | AgeDependancyRatio | Inflation | UnemploymentRate | FemaleLaborRate | MaleLaborRate |
|---|---|---|---|---|---|---|---|---|
| European Union | 2018 | 447001100 | 35751.573 | 54.66621 | 1.7386086 | 7.253803 | 51.06276 | 63.84818 |
| European Union | 2019 | 447367191 | 35079.534 | 55.29177 | 1.6305226 | 6.679810 | 51.26601 | 63.91365 |
| European Union | 2020 | 447692315 | 34356.575 | 55.86259 | 0.4764989 | 7.054819 | 50.50284 | 63.01832 |
| European Union | 2021 | 447178112 | 38721.763 | 56.34923 | 2.5545070 | 7.000430 | 51.13162 | 63.14634 |
| European Union | 2022 | 447370510 | 37432.560 | 56.57327 | 8.8336989 | 6.098751 | 51.88009 | 63.83574 |
| Turkiye | 2018 | 81407204 | 9568.836 | 46.30621 | 16.3324639 | 10.890000 | 34.03700 | 72.43400 |
| Turkiye | 2019 | 82579440 | 9215.441 | 46.74186 | 15.1768216 | 13.670000 | 34.21500 | 71.76100 |
| Turkiye | 2020 | 83384680 | 8638.739 | 46.82812 | 12.2789574 | 13.110000 | 30.76700 | 67.99800 |
| Turkiye | 2021 | 84147318 | 9743.213 | 46.75508 | 19.5964927 | 11.980000 | 32.76200 | 70.13500 |
| Turkiye | 2022 | 84979913 | 10674.504 | 46.77459 | 72.3088360 | 10.030000 | 34.16700 | 71.11600 |
| Ukraine | 2018 | 44622518 | 3096.562 | 47.21927 | 10.9518559 | 8.800000 | 49.54200 | 64.46000 |
| Ukraine | 2019 | 44386203 | 3661.458 | 47.75787 | 7.8867175 | 8.190000 | 49.26000 | 64.81700 |
| Ukraine | 2020 | 44132049 | 3751.737 | 48.18610 | 2.7324921 | 9.480000 | 48.15100 | 63.44800 |
| Ukraine | 2021 | 43822901 | 4827.846 | 48.41627 | 9.3631392 | 9.830000 | 47.79100 | 62.89700 |
| Ukraine | 2022 | 38000000 | 4533.976 | 52.05497 | 20.1836367 | NA | NA | NA |
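
The select-and-rename step works only if the vector of source columns and the vector of new names stay in the same order. One way to make that pairing explicit is a named lookup vector. The sketch below is illustrative only: `toy` is a small stand-in for the WDI result (two rows, with values copied from the table above), and `column_map` and `toy_summary` are hypothetical names not used elsewhere in this chapter.

```r
# Toy stand-in for the WDI() result; values copied from the 2018 rows above.
toy <- data.frame(
  country = c("Turkiye", "Ukraine"),
  iso2c = c("TR", "UA"),
  year = c(2018, 2018),
  SP.POP.TOTL = c(81407204, 44622518),
  NY.GDP.PCAP.CD = c(9568.836, 3096.562)
)

# Names are the source columns; values are the readable labels.
column_map <- c(
  country = "Country",
  year = "Year",
  SP.POP.TOTL = "Population",
  NY.GDP.PCAP.CD = "GdpPerCapita"
)

# Stop early if an expected column is absent from the data.
stopifnot(all(names(column_map) %in% names(toy)))

# Select the mapped columns, then rename them in one step.
toy_summary <- toy[, names(column_map)]
names(toy_summary) <- unname(column_map)
```

Because each new name sits next to the column it replaces, adding or removing an indicator means editing a single entry rather than two parallel vectors.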

Save as .rds File

The saveRDS function saves the “world_bank_data_summary” dataset as an RDS (R Data Serialization) file named “world_bank_data.rds.” The readRDS function then reads the file back into R and stores the result in a variable named “world_bank_data.” Saving in this serialized format preserves the dataset, including its column types, so it can be reloaded later without querying the World Bank API again.

saveRDS(world_bank_data_summary, file = "world_bank_data.rds")
world_bank_data <- readRDS("world_bank_data.rds")
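
To confirm that nothing is lost in the save-and-reload step, the restored object can be compared with the original. This is a minimal sketch under stated assumptions: it uses a small stand-in data frame (values copied from the 2022 rows of the table above) and a temporary file, so it does not touch “world_bank_data.rds”; the names `toy`, `rds_path`, `restored`, and `round_trip_ok` are hypothetical.

```r
# Small stand-in data frame, including a missing value.
toy <- data.frame(
  Country = c("Turkiye", "Ukraine"),
  Year = c(2022, 2022),
  UnemploymentRate = c(10.03, NA)
)

# Write to a temporary file so the real output file is left alone.
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy, file = rds_path)
restored <- readRDS(rds_path)

# TRUE when the reloaded data frame matches the original.
round_trip_ok <- isTRUE(all.equal(toy, restored))

# Remove the temporary file.
unlink(rds_path)
```

The same comparison could be applied to “world_bank_data_summary” and “world_bank_data” after running the code above.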