<- c("readxl","dplyr","tidyverse", "ggplot2", "lubridate", "tidyr", "stringi", "hrbrthemes", "viridis", "scales", "knitr")
pti <- pti[!(pti %in% installed.packages())]
pti if(length(pti)>0){
install.packages(pti)
}
library(readxl)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(lubridate)
library(tidyr)
library(stringi)
library(hrbrthemes)
library(viridis)
library(scales)
library(knitr)
3 TUIK EDA
Exploratory Data Analysis (EDA)
Key Takeaways
Men participate in the labour force two times more than women. Labour participation and employment rates are aligned. Economic crisis in 2019 had a negative impact on the labour market. COVID reinforced this effect by increasing the unemployment rate.
In Doğu Karadeniz, Güneydoğu Anadolu and Ortadoğu Anadolu, gender does not matter for the unemployment. Both men and women are unemployed. Men are more unemployed than women in Kuzeydoğu Anadolu whereas women unemployment rate is higher than that for men in other regions.
Being a housewife is the main reason for women that prevents them not being in the labour force. There is a stereotypical gender structure in Turkey that emphasizes on women are house-keepers, men are bread-winners. In addition, discouragement for both men and women overlapped in 2021. People are less motivated to seek for a job in Turkey over years.
Marital status affects the employment rate for women. Men dominance factor is dramatically higher in married women compared to never married or divorced women. It may have 2 different reasons: Married women are struggling with participating in work life; divorced or never married women need to work to obtain their own economic freedom.
Divorce rates between provinces are quite different in Turkey. While the top 5 highest divorce rates are in the Mediterranean or West Side, but the lowest 5 are at the East Side of Turkey. When we combine this info with regional unemployment rates, it’s being more dramatic for women.
Data Dictionary
Non-institutional population: Comprises all the population excluding the residents of dormitories of universities, orphanage, rest homes for elderly persons, special hospitals, prisons and military barracks etc.
Non-institutional working age population: Indicates the population 15 years old and over within the non-institutional civilian population.
Labour force: Labour force covers the working-age population that is willing to supply labour for the production of economic goods and services in the relevant reference period. In determining the labour force, activities that contribute to the production of goods and services within the United Nations System of National Accounts (SNA) production boundary are essential. Labour force comprises all employed and unemployed people.
Labour force participation rate: Indicates the ratio of the labour force to non-institutional working age population.
Employment: Comprises all the non-institutional working age population who are included in the persons “at work” and “not at work” described below.
Persons at work: Persons economically active during the reference period for at least one hour as a regular employee, casual employee, employer, self employed or unpaid family worker.
Persons not at work: Regular employee, casual employee, employer, self employed or unpaid family who have a job but away from job during the entire reference week for various reasons are considered as employed, if there is a tie with job according to the absence reason from job.
Persons who are not at work during the reference week due to official holidays or annual leave, sick leave, working hour arrangements, job related training or legal maternity/paternity leave are employed without the need for additional criteria.
Persons who are not at work to raise a child (parental leave, unpaid leave for childcare) are considered as employed only if they have an assurance of return to work within a period of less than 3 months or less and/or they continue to receive job related wage/salary during their absence.
Persons who are not at work because of off-season/low season are considered as employed only if they performing any task or duty during the off-season period.
Persons who are not at work due to the other reasons such as bad weather, there was no work, slowing down or stopping of work for technical or economic reasons, participating a training unrelated to job are considered as employed only if they have an assurance of return to work within a period of 3 months or less.
The members of producer cooperatives and apprentices or trainees who are working to gain any kind of benefit (income in cash or in kind, social security, traveling cost, pocket money etc.) are considered to be employed.
Employment Rate: It is the ratio of employment to the non-institutional working age population.
Unemployment rate: It is the ratio of unemployed persons to the labour force.
Reference :
https://data.tuik.gov.tr/Search/Search?text=employment%20rate
Analysis
First of all, libraries are installed.
Data files are read in Excel.
<- readRDS("docs/tuik/Labour_force_status_by_reg.rds")
labour_force_status_by_reg
<- readRDS("docs/tuik/Employed_rate_by_marital_status.rds")
employed_rate_by_marital_status
<-employed_rate_by_marital_status%>%filter(gender=="Female")
female<-employed_rate_by_marital_status%>%filter(gender=="Male")
male
<- readRDS("docs/tuik/Reasons_of_not_being_in_lab_for.rds")
reasons_of_not_being_in_lab_for
<- readRDS("docs/tuik/Crude_divorce_rate_by_provinces.rds")
crude_divorce_rate_by_provinces
<- readRDS("docs/tuik/Mean_age_of_mother_by_Statistic.rds") mean_age_of_mother_by_statistic
Descriptive statistics are given for high-level understanding of each data table.
summary(labour_force_status_by_reg)
year region gender
Min. :2014 Length:208 Length:208
1st Qu.:2016 Class :character Class :character
Median :2018 Mode :character Mode :character
Mean :2018
3rd Qu.:2019
Max. :2021
labour_force_participation_rate unemployment_rate employment_rate
Min. :0.1620 Min. :0.0290 Min. :0.1420
1st Qu.:0.3280 1st Qu.:0.0800 1st Qu.:0.2800
Median :0.5375 Median :0.1095 Median :0.4500
Mean :0.5142 Mean :0.1132 Mean :0.4594
3rd Qu.:0.7060 3rd Qu.:0.1472 3rd Qu.:0.6452
Max. :0.7740 Max. :0.2310 Max. :0.6920
summary(mean_age_of_mother_by_statistic)
region district province_id province
Length:1053 Length:1053 Length:1053 Length:1053
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
year mother_mean_age
Min. :2009 Min. :25.77
1st Qu.:2012 1st Qu.:27.58
Median :2015 Median :28.23
Mean :2015 Mean :28.20
3rd Qu.:2018 3rd Qu.:28.82
Max. :2021 Max. :30.52
summary(reasons_of_not_being_in_lab_for)
year gender reason pop_not_in_lab_force
Min. :2014 Length:144 Length:144 Min. : 0.0
1st Qu.:2016 Class :character Class :character 1st Qu.: 386.8
Median :2018 Mode :character Mode :character Median : 1319.5
Mean :2018 Mean : 3233.4
3rd Qu.:2019 3rd Qu.: 2686.8
Max. :2021 Max. :21855.0
summary(employed_rate_by_marital_status)
year gender marital_status employed_rate
Min. :2014 Length:64 Length:64 Min. :0.0650
1st Qu.:2016 Class :character Class :character 1st Qu.:0.2525
Median :2018 Mode :character Mode :character Median :0.3525
Mean :2018 Mean :0.3897
3rd Qu.:2019 3rd Qu.:0.5593
Max. :2021 Max. :0.7279
summary(crude_divorce_rate_by_provinces)
province year crude_divorce_rate
Length:810 Min. :2012 Min. :0.1129
Class :character 1st Qu.:2014 1st Qu.:0.9894
Mode :character Median :2016 Median :1.4774
Mean :2016 Mean :1.4206
3rd Qu.:2019 3rd Qu.:1.8818
Max. :2021 Max. :3.0363
After this section, data visualizations are made to have a better understanding about data.
Labour Force Rates by Gender
This graph shows the labour force participation, employment and unemployment rates by gender over years (2014-2021). Women unemployment rate is higher than men’s unemployment rate whereas other metrics for women are quite low compared to those for men. Until 2019, the unemployment rate slightly changes. However, it starts to increase after 2019 due to the economic crisis in Turkey. COVID made the unemployment rate for men and women almost equal. Labour participation and employment rates for men are two times more than those for women. In 2020, labour participation and employment rates are lower compared to other years due to COVID.
%>%
labour_force_status_by_reg filter(region == "Total") %>%
select(year, gender, labour_force_participation_rate, employment_rate, unemployment_rate) %>%
pivot_longer(cols = -c(gender, year), values_to = "value", names_to = "variable") %>%
ggplot(aes(x = year, y = value, fill = reorder(variable, value))) +
geom_bar(position="dodge", stat="identity") +
facet_grid(. ~ gender) +
scale_x_continuous(breaks = c(min(labour_force_status_by_reg$year):max(labour_force_status_by_reg$year))) +
theme(axis.text.x = element_text(angle=90,vjust=1,hjust=1), legend.position="top", legend.title=element_blank()) +
coord_flip() +
labs(title = "Labour Force Rates by Gender")
Labour Force Participation Rate by Region
This chart displays the distribution of labour force participation by gender over years (2014-2021). Men participate in the labour force more than women independent of region. In addition, the labour force participation rate slightly decreased for all regions in 2010 due to COVID. Regional labour participation rate patterns are quite the same except for Doğu Karadeniz and Güneydoğu Anadolu. Labour force rates for men and women are close in Doğu Karadeniz since women labour force participation rate is higher than other regions. On the contrary, there is a huge gap between men and women in Güneydoğu Anadolu.
%>%
labour_force_status_by_reg filter(region != "Total") %>%
ggplot() +
geom_line(aes(x = year, y = labour_force_participation_rate, color = gender)) +
facet_wrap( ~ region) +
scale_x_continuous(breaks = c(min(labour_force_status_by_reg$year):max(labour_force_status_by_reg$year))) +
theme(axis.text.x = element_text(angle=90,vjust=1,hjust=1)) +
labs(title = "Labour Force Participation Rate by Region")
Employment Rate by Region
This graph illustrates trends in employment rate between 2014 and 2021. It has a similar pattern with labour force participation rate.
%>%
labour_force_status_by_reg filter(region != "Total") %>%
ggplot() +
geom_line(aes(x = year, y = employment_rate, color = gender)) +
facet_wrap( ~ region) +
scale_x_continuous(breaks = c(min(labour_force_status_by_reg$year):max(labour_force_status_by_reg$year))) +
theme(axis.text.x = element_text(angle=90,vjust=1,hjust=1)) +
labs(title = "Employment Rate by Region")
Unemployment Rate by Region
This graph shows trends in unemployment rate between 2014 and 2021. Unemployment rate for men is higher than that for women in Kuzeydoğu Anadolu. Lines overlap in Doğu Karadeniz, Güneydoğu Anadolu and Ortadoğu Anadolu. There is no distinction between men and women. They are both unemployed. Unemployment rate is low in Batı Karadeniz compared to other regions. Unemployment rate peaked in 2019 generally due to the economic crisis in Turkey.
%>%
labour_force_status_by_reg filter(region != "Total") %>%
ggplot() +
geom_line(aes(x = year, y = unemployment_rate, color = gender)) +
facet_wrap( ~ region) +
scale_x_continuous(breaks = c(min(labour_force_status_by_reg$year):max(labour_force_status_by_reg$year))) +
theme(axis.text.x = element_text(angle=90,vjust=1,hjust=1)) +
labs(title = "Unemployment Rate by Region")
This box plot shows us the unemployment rate distribution over years for each region in gender breakdown.
First of all, we may focus on the difference between females and males. Median value of the unemployment rate is higher for females than males in every region.
The second objective is that the Marmara region and the west side of Turkey have a lower unemployment rate than other regions.
The third objective is about stabilization of the unemployment rate. We can easily see that the Marmara region and the west side of Turkey has more consistent unemployment rates. Especially box plot ranges are maximum in Middle East of Anatolian region, It means that there is no stability over year in this region.
%>%
labour_force_status_by_reg filter(region != "Total") %>%
ggplot(aes(y=unemployment_rate, group = gender)) +
geom_boxplot(fill="steelblue") +
facet_grid(gender ~ region) +
theme(axis.text.x = element_text(angle=90,vjust=1,hjust=1)) +
theme(strip.text.x = element_text(size = 3)) +
labs(title = "Unemployment Distribution by Region")
Reasons of not Being in Labour Force by Gender
This chart shows trends for not being in the labour force by reasons. Housewife is the main reason that prevents women participation in the labour force. It is not a choice as the name implies. Since women have to take care of housework, they are unemployed whereas men keep working after having children. Education /Training lines overlap. Both men and women care about Education /Training. “Not seeking for a job, available to start - Discouraged” lines are merged in 2021. Due to the economic crisis in Turkey, people are not motivated to find a job. Being disabled, old, ill etc. puts women back in the labour force.
%>%
reasons_of_not_being_in_lab_for #filter(gender == "Female") %>%
group_by(reason,year, gender) %>%
summarise(sum_of_not_in_lab_force = sum(pop_not_in_lab_force), .groups = "drop") %>%
pivot_wider(values_from = sum_of_not_in_lab_force, names_from = gender) %>%
transmute(year = year,reason = reason, Female_ = Female/ (Female+Male), Male_ = 1 - Female/ (Female+Male)) %>%
pivot_longer(cols= ends_with("ale_"),values_to = "ratio_not_in_labor_force", names_to = "gender") %>%
ggplot(aes(x = year, y = ratio_not_in_labor_force, color = gender)) +
geom_line() +
facet_wrap( ~ reason) +
theme(strip.text.x = element_text(size = 3)) +
labs(title = "Reasons of not being in labour force by Gender")
Employment Ratio
One of our main aims is understanding the proportion of women in labor in general aspects. We calculated a new metric called employment ratio to detect men vs women participation. Employment Ratio explains the ratio of Men Employment Rate to Women Employment Rate. It’s a KPI for understanding men’s dominance over women in working life. The graph shows us there is a major difference between married and divorced women in work life. If the marital status is married for both genders, men’s dominance factor is sharply increasing according to never married or divorced people. It may explain 2 different results in terms of 2 different points of view: Married women are struggling with participating work life; divorced or never married women needs to work to obtain their own economic freedom.
<-female%>%mutate(ratio=male$employed_rate/female$employed_rate) %>%select(-gender)
diffggplot(diff, aes(x = year, y = ratio, color = marital_status)) + geom_line() +
scale_x_continuous(breaks = c(min(female$year):max(female$year))) +
labs(title = "Employment Ratio")
Mean Age of Mother by Year
When we inspect the mean age of the mothers in Istanbul province which is the biggest city of Turkey and represent almost 25% of the country’s population, it’s clear that mean age has increased from 28 to 30 in 12 years. There is no spectacular change in any year transition but there is 2 years absolute change in 12 years total. It means that women tend to be mothers in more old ages.
%>%
mean_age_of_mother_by_statistic filter(province == "İstanbul") %>%
ggplot(aes(x = year, y = mother_mean_age)) +
geom_line() +
coord_cartesian(xlim = c(2009,2021),ylim = c(25,max(mean_age_of_mother_by_statistic$mother_mean_age + 5))) +
geom_line(arrow = arrow(), aes(color = year)) +
geom_point(color = "#0099f9", size = 5) +
labs(title = "Mean Age of Mother by Year in İstanbul")
We can observe the increasing trend on the mean age of mothers year by year in all regions. The histogram plot’s range and distribution stay the same but their minimum and maximum values are rising by year. In particular, the mean motherhood age change is more evident on 2012-2013 transition in spite of changes that are not spectacular between 2009 and 2012. The other major changes are after 2019.
%>%
mean_age_of_mother_by_statistic ggplot(aes(x = mother_mean_age, fill = year)) +
geom_histogram(bins = 10) +
facet_grid(year~.) +
labs(title = "Mean Age of Mother by Year")
<- c("İstanbul","Ankara","İzmir","Bursa","Adana")
five_metropolis
%>%
mean_age_of_mother_by_statistic filter(province %in% five_metropolis, year %in% c(2019,2020,2021)) %>%
ggplot(aes(x = year, y= mother_mean_age)) +
geom_bar(stat = "identity", position = position_dodge(), alpha = 0.75) +
facet_grid(. ~province) +
ylim(0,35) +
geom_text(aes(label = sprintf("%.2f",mother_mean_age)), fontface = "bold", vjust = 1.5, colour = "white", size = 3) +
labs(x = "\n Years", y = "Mean Mother Age\n", title = "\n Mean Mother Age by Year and Metropolis \n") +
theme(plot.title = element_text(hjust = 0.5),
axis.title.x = element_text(face="bold", colour="darkgreen", size = 12),
axis.title.y = element_text(face="bold", colour="darkgreen", size = 12),
legend.title = element_text(face="bold", size = 10),
strip.background = element_rect(fill="lightblue", colour="black", size=1),
strip.text = element_text(face="bold", size=rel(1.2))) +
labs(title = "Mean Age of Mother by Metropolis in the last 3 years")
Minimum Average Age of Mother by Region
This is the table of minimum average ages of regions by year. This table tells us that Istanbul has the highest average motherhood age when compared to other regions.
%>%
mean_age_of_mother_by_statistic select(region,year ,mother_mean_age) %>%
group_by(region,year) %>%
summarise(min_mean_mother_age = min(mother_mean_age), .groups = "drop") %>%
pivot_wider(values_from = min_mean_mother_age, names_from = year)
# A tibble: 12 × 14
region `2009` `2010` `2011` `2012` `2013` `2014` `2015` `2016` `2017` `2018`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 "TR1 İ… 28.0 28.3 28.4 28.7 28.9 29.1 29.2 29.4 29.6 29.7
2 "TR2 B… 27.0 27.0 27.2 27.5 27.7 27.9 27.9 28.1 28.2 28.6
3 "TR3 E… 25.8 26.1 26.3 26.5 26.9 27.0 27.1 27.1 27.3 27.5
4 "TR4 D… 27.1 27.6 27.8 28.0 28.0 28.5 28.3 28.5 28.8 28.9
5 "TR5 B… 26.8 27.0 27.2 27.4 27.7 27.8 28.0 28.1 28.2 28.3
6 "TR6 A… 26.6 26.8 27.0 27.3 27.5 27.7 27.8 27.9 28.0 28.1
7 "TR7 O… 25.8 26.1 26.2 26.7 27.0 27.1 27.2 27.5 27.4 27.6
8 "TR8 B… 26.2 26.4 26.8 27.0 27.4 27.6 27.8 27.9 28.0 27.9
9 "TR9 D… 26.6 26.8 27.1 27.5 27.7 27.9 28.1 28.3 28.4 28.6
10 "TRA K… 26.2 26.4 26.3 26.5 26.7 26.7 26.8 26.8 26.7 26.8
11 "TRB O… 26.6 26.7 26.7 26.7 27.0 27.0 27.0 26.9 27.0 27.1
12 "TRC G… 27.0 26.8 26.8 27.1 27.3 27.2 27.4 27.2 27.1 27.1
# … with 3 more variables: `2019` <dbl>, `2020` <dbl>, `2021` <dbl>
Crude Divorce Rate by Province
When we checked the highest and lowest divorce rates by regions. We observed that the West Side and Mediterranean coast cities are the top 5 cities in divorce rate while the top 5 cities with lowest divorce rate are on the East Side of Turkey. If we combine this information with the unemployment rate, it becomes a more dramatic state.
<-crude_divorce_rate_by_provinces %>%
plot_data1group_by(province) %>%
summarise(avg_rate = mean(crude_divorce_rate))%>%
arrange(desc(avg_rate)) %>%
top_n( 5)
ggplot(plot_data1,aes(x=reorder(province, -avg_rate), y=avg_rate, fill= province)) +
geom_bar(stat = "identity") +
labs(title = "Top 5 provinces for the average divorce rate (2012-2021)") +
xlab("province")
<-crude_divorce_rate_by_provinces %>%
plot_data2group_by(province) %>%
summarise(avg_rate = mean(crude_divorce_rate))%>%
arrange(avg_rate) %>%
slice(1:5)
ggplot(plot_data2,aes(x=reorder(province, -avg_rate), y=avg_rate, fill=province)) +
geom_bar(stat = "identity") +
labs(title = "Top 5 provinces for the lowest average divorce rate (2012-2021)") +
xlab("province")