library(tidyverse)
library(rio)
library(knitr)
This project uses four main datasets for the analysis. The steps for reading and preparing each of them are given below.
The original dataset can be found at this link.
The dataset is grouped by gender and includes job search methods. The data spans January 2014 to August 2020.
The year column has empty cells, so they are filled downward with the correct year value.
The month column includes both Turkish and English month names. Using the str_split_fixed function, only the English name is kept.
job_search_overall <- import("https://github.com/pjournal/mef04g-rhapsody/blob/gh-pages/Project_Data/search_channel.xls?raw=true",
range = "TÜRKİYE!A6:P91",
col_names = c("year", "month", "total_unemployed", "to_employers", "to_relatives", "to_emp_office", "to_emp_agencies", "to_newspaper", "insert_ad_to_newspaper", "take_interview", "look_place_equip_to_est_bus", "look_credit_license_to_est_bus", "wait_call_from_emp_office", "wait_result_of_app", "wait_result_of_comp_for_public_sec", "others")) %>%
fill(year, .direction = "down") %>%
as_tibble()
job_search_overall$month <- str_split_fixed(job_search_overall$month, " - ", 2)[,2]
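The two cleaning steps above can be tried on a small toy tibble. This is only a sketch: the values below are made up and merely mimic the layout described (a year present only on the first row of each year, and month labels assumed to look like "Turkish - English"); they are not taken from the dataset.

```r
library(tibble)
library(tidyr)
library(stringr)

# Toy data (hypothetical): year appears only once per year,
# month holds "Turkish - English" labels.
toy <- tibble(
  year  = c(2014, NA, NA, 2015, NA),
  month = c("Ocak - January", "Şubat - February", "Mart - March",
            "Ocak - January", "Şubat - February")
)

# Carry the last non-missing year downward into the empty cells.
toy <- fill(toy, year, .direction = "down")

# Split each label into exactly two pieces and keep the second (English) one.
toy$month <- str_split_fixed(toy$month, " - ", 2)[, 2]

toy
```

After these steps every row of the toy tibble has a year, and the month column holds only the English names.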
The final tibble is as follows.
## Rows: 86
## Columns: 16
## $ year <dbl> 2014, 2014, 2014, 2014, 2014, 2014…
## $ month <chr> "January", "February", "March", "A…
## $ total_unemployed <dbl> 2804, 2825, 2747, 2579, 2551, 2654…
## $ to_employers <dbl> 1826, 1814, 1731, 1621, 1603, 1689…
## $ to_relatives <dbl> 2581, 2605, 2535, 2364, 2326, 2417…
## $ to_emp_office <dbl> 592, 573, 531, 489, 500, 507, 555,…
## $ to_emp_agencies <dbl> 422, 420, 404, 391, 395, 425, 433,…
## $ to_newspaper <dbl> 828, 846, 846, 828, 852, 903, 909,…
## $ insert_ad_to_newspaper <dbl> 185, 214, 229, 214, 203, 199, 185,…
## $ take_interview <dbl> 159, 171, 153, 154, 156, 173, 205,…
## $ look_place_equip_to_est_bus <dbl> 66, 65, 71, 55, 62, 57, 58, 50, 46…
## $ look_credit_license_to_est_bus <dbl> 33, 37, 37, 32, 35, 39, 41, 37, 33…
## $ wait_call_from_emp_office <dbl> 420, 424, 398, 377, 352, 366, 396,…
## $ wait_result_of_app <dbl> 1227, 1261, 1232, 1115, 1108, 1162…
## $ wait_result_of_comp_for_public_sec <dbl> 101, 95, 65, 61, 62, 124, 191, 240…
## $ others <dbl> 7, 7, 6, 4, 3, 2, 1, 1, 2, 4, 3, 3…
job_search_male <- import("https://github.com/pjournal/mef04g-rhapsody/blob/gh-pages/Project_Data/search_channel.xls?raw=true",
range = "TÜRKİYE!R6:AG91",
col_names = c("year", "month", "total_unemployed", "to_employers", "to_relatives", "to_emp_office", "to_emp_agencies", "to_newspaper", "insert_ad_to_newspaper", "take_interview", "look_place_equip_to_est_bus", "look_credit_license_to_est_bus", "wait_call_from_emp_office", "wait_result_of_app", "wait_result_of_comp_for_public_sec", "others")) %>%
fill(year, .direction = "down") %>%
as_tibble()
job_search_male$month <- str_split_fixed(job_search_male$month, " - ", 2)[,2]
The final tibble is as follows.
## Rows: 86
## Columns: 16
## $ year <dbl> 2014, 2014, 2014, 2014, 2014, 2014…
## $ month <chr> "January", "February", "March", "A…
## $ total_unemployed <dbl> 1889, 1882, 1803, 1675, 1616, 1683…
## $ to_employers <dbl> 1262, 1235, 1161, 1081, 1036, 1086…
## $ to_relatives <dbl> 1760, 1747, 1668, 1543, 1488, 1556…
## $ to_emp_office <dbl> 356, 353, 329, 290, 292, 292, 322,…
## $ to_emp_agencies <dbl> 246, 251, 226, 203, 209, 230, 227,…
## $ to_newspaper <dbl> 511, 519, 513, 473, 490, 517, 519,…
## $ insert_ad_to_newspaper <dbl> 114, 129, 142, 128, 126, 119, 108,…
## $ take_interview <dbl> 83, 88, 80, 83, 84, 82, 102, 108, …
## $ look_place_equip_to_est_bus <dbl> 56, 57, 61, 44, 48, 43, 43, 38, 35…
## $ look_credit_license_to_est_bus <dbl> 28, 33, 31, 24, 26, 30, 34, 30, 29…
## $ wait_call_from_emp_office <dbl> 248, 259, 246, 223, 207, 209, 227,…
## $ wait_result_of_app <dbl> 763, 784, 754, 690, 676, 704, 741,…
## $ wait_result_of_comp_for_public_sec <dbl> 44, 44, 32, 32, 32, 59, 82, 97, 87…
## $ others <dbl> 3, 4, 4, 4, 3, 2, 1, 1, 1, 2, 1, 1…
job_search_female <- import("https://github.com/pjournal/mef04g-rhapsody/blob/gh-pages/Project_Data/search_channel.xls?raw=true",
range = "TÜRKİYE!AI6:AX91",
col_names = c("year", "month", "total_unemployed", "to_employers", "to_relatives", "to_emp_office", "to_emp_agencies", "to_newspaper", "insert_ad_to_newspaper", "take_interview", "look_place_equip_to_est_bus", "look_credit_license_to_est_bus", "wait_call_from_emp_office", "wait_result_of_app", "wait_result_of_comp_for_public_sec", "others")) %>%
fill(year, .direction = "down") %>%
as_tibble()
job_search_female$month <- str_split_fixed(job_search_female$month, " - ", 2)[,2]
The final tibble is as follows.
## Rows: 86
## Columns: 16
## $ year <dbl> 2014, 2014, 2014, 2014, 2014, 2014…
## $ month <chr> "January", "February", "March", "A…
## $ total_unemployed <dbl> 915, 942, 944, 903, 935, 971, 1065…
## $ to_employers <dbl> 564, 579, 570, 540, 568, 603, 644,…
## $ to_relatives <dbl> 822, 857, 868, 822, 838, 861, 940,…
## $ to_emp_office <dbl> 236, 220, 202, 199, 208, 215, 233,…
## $ to_emp_agencies <dbl> 176, 170, 177, 188, 186, 195, 206,…
## $ to_newspaper <dbl> 318, 328, 333, 354, 362, 386, 390,…
## $ insert_ad_to_newspaper <dbl> 71, 85, 87, 87, 77, 80, 78, 91, 11…
## $ take_interview <dbl> 76, 83, 73, 70, 72, 91, 103, 118, …
## $ look_place_equip_to_est_bus <dbl> 11, 8, 10, 11, 14, 14, 15, 12, 11,…
## $ look_credit_license_to_est_bus <dbl> 6, 3, 7, 8, 9, 9, 7, 7, 3, 4, 7, 9…
## $ wait_call_from_emp_office <dbl> 172, 165, 152, 154, 145, 156, 169,…
## $ wait_result_of_app <dbl> 464, 478, 478, 425, 432, 458, 496,…
## $ wait_result_of_comp_for_public_sec <dbl> 57, 50, 33, 28, 30, 65, 109, 142, …
## $ others <dbl> 3, 3, 2, 1, 0, 0, 0, 1, 2, 2, 2, 1…
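The three imports above differ only in the cell range they read. As a possible refactoring, and not the code actually used in this project, the shared steps could be wrapped in a small helper. The names read_search_channel, search_channel_url and search_channel_cols are hypothetical; the URL, column names and ranges are copied from the calls above.

```r
library(tibble)
library(tidyr)
library(stringr)

# Shared URL and column names, copied from the import calls above.
search_channel_url <- "https://github.com/pjournal/mef04g-rhapsody/blob/gh-pages/Project_Data/search_channel.xls?raw=true"
search_channel_cols <- c(
  "year", "month", "total_unemployed", "to_employers", "to_relatives",
  "to_emp_office", "to_emp_agencies", "to_newspaper", "insert_ad_to_newspaper",
  "take_interview", "look_place_equip_to_est_bus", "look_credit_license_to_est_bus",
  "wait_call_from_emp_office", "wait_result_of_app",
  "wait_result_of_comp_for_public_sec", "others"
)

# Hypothetical helper: read one block of the sheet and apply the same cleaning.
read_search_channel <- function(range) {
  df <- rio::import(search_channel_url, range = range,
                    col_names = search_channel_cols) %>%
    fill(year, .direction = "down") %>%
    as_tibble()
  # Keep only the English part of the "Turkish - English" month label.
  df$month <- str_split_fixed(df$month, " - ", 2)[, 2]
  df
}

# Usage (needs network access), with the ranges from the calls above:
# job_search_overall <- read_search_channel("TÜRKİYE!A6:P91")
# job_search_male    <- read_search_channel("TÜRKİYE!R6:AG91")
# job_search_female  <- read_search_channel("TÜRKİYE!AI6:AX91")
```

Keeping the column names in one place means a later correction to a name only has to be made once.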
The original dataset can be found at this link.
This dataset includes the number of employed and unemployed people by their educational levels and gender. The data spans January 2014 to August 2020.
The two empty columns (empty_col_1 and empty_col_2) are removed with the select function.
The year column has empty cells, so they are filled downward with the correct year value.
The month column includes both Turkish and English month names. Using the str_split_fixed function, only the English name is kept.
For some rows, the split leaves no English name in the month column, which causes these values to be equal to empty strings. These values are therefore found and updated to January.
educational_level_overall <- import("https://github.com/pjournal/mef04g-rhapsody/blob/gh-pages/Project_Data/educational_level.xls?raw=true",
range = "TÜRKİYE!A7:S92",
col_names = c("year", "month", "lf_illeterate", "lf_less_than_hs", "lf_highschool", "lf_voc_hs", "lf_higher_ed", "empty_col_1", "emp_illeterate", "emp_less_than_hs", "emp_highschool", "emp_voc_hs", "emp_higher_ed", "empty_col_2", "unemp_illeterate", "unemp_less_than_hs", "unemp_highschool", "unemp_voc_hs", "unemp_higher_ed" )) %>%
select(-empty_col_1, -empty_col_2) %>%
fill(year, .direction = "down") %>%
as_tibble()
educational_level_overall$month <- str_split_fixed(educational_level_overall$month, " - ", 2)[,2]
educational_level_overall$month[educational_level_overall$month == ""] <- "January"
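The empty strings mentioned above can be reproduced on made-up labels. The source does not show what the affected cells look like; the sketch below assumes one possible cause, a label without the " - " separator, in which case str_split_fixed keeps the whole label in the first column and pads the second with an empty string.

```r
library(stringr)

# Hypothetical labels: the first one lacks the " - " separator.
labels <- c("January", "Şubat - February", "Mart - March")

# Always returns a two-column character matrix; missing pieces become "".
parts <- str_split_fixed(labels, " - ", 2)
english <- parts[, 2]

# Same replacement as in the code above: empty strings become "January".
english[english == ""] <- "January"
english
```

Note that the replacement assumes every empty value belongs to January, as stated for this dataset; it would silently mislabel any other month with the same problem.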
The final tibble is as follows.
## Rows: 86
## Columns: 17
## $ year <dbl> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 20…
## $ month <chr> "January", "February", "March", "April", "May", "J…
## $ lf_illeterate <dbl> 1018, 1087, 1141, 1210, 1248, 1285, 1199, 1179, 11…
## $ lf_less_than_hs <dbl> 15209, 15587, 15954, 16287, 16492, 16585, 16388, 1…
## $ lf_highschool <dbl> 2991, 3028, 3020, 2987, 3028, 3012, 2992, 2962, 29…
## $ lf_voc_hs <dbl> 2672, 2733, 2726, 2765, 2795, 2849, 2979, 2961, 29…
## $ lf_higher_ed <dbl> 5370, 5390, 5489, 5524, 5527, 5509, 5719, 5826, 59…
## $ emp_illeterate <dbl> 942, 1008, 1061, 1135, 1180, 1226, 1139, 1117, 111…
## $ emp_less_than_hs <dbl> 13610, 13979, 14400, 14858, 15110, 15186, 14942, 1…
## $ emp_highschool <dbl> 2646, 2680, 2667, 2656, 2667, 2642, 2601, 2602, 25…
## $ emp_voc_hs <dbl> 2381, 2430, 2444, 2502, 2539, 2573, 2666, 2645, 26…
## $ emp_higher_ed <dbl> 4876, 4902, 5011, 5043, 5043, 4959, 5061, 5083, 51…
## $ unemp_illeterate <dbl> 76, 78, 79, 75, 68, 59, 59, 62, 75, 86, 86, 79, 73…
## $ unemp_less_than_hs <dbl> 1599, 1607, 1554, 1429, 1382, 1400, 1447, 1462, 15…
## $ unemp_highschool <dbl> 344, 348, 353, 331, 360, 370, 390, 361, 348, 341, …
## $ unemp_voc_hs <dbl> 291, 303, 282, 262, 257, 276, 313, 316, 323, 329, …
## $ unemp_higher_ed <dbl> 494, 488, 478, 481, 484, 550, 657, 743, 769, 725, …
The original dataset can be found at this link.
This dataset includes the number of unemployed people by their occupational group and gender. The data spans January 2014 to August 2020.
The year column has empty cells, so they are filled downward with the correct year value.
The month column includes both Turkish and English month names. Using the str_split_fixed function, only the English name is kept.
occ_group_overall <- import("https://github.com/pjournal/mef04g-rhapsody/blob/gh-pages/Project_Data/occupational_group.xls?raw=true",
range = "TÜRKİYE!A7:L92",
col_names = c("year", "month", "total_unemployed", "manager", "prof", "tech", "cleric", "service", "agricul", "trade", "operator", "elemantary")) %>%
fill(year, .direction = "down") %>%
as_tibble()
occ_group_overall$month <- str_split_fixed(occ_group_overall$month, " - ", 2)[,2]
The final tibble is as follows.
## Rows: 86
## Columns: 12
## $ year <dbl> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014…
## $ month <chr> "January", "February", "March", "April", "May", "Jun…
## $ total_unemployed <dbl> 2804, 2825, 2747, 2579, 2551, 2654, 2867, 2944, 3064…
## $ manager <dbl> 63, 69, 79, 69, 60, 62, 66, 66, 60, 52, 64, 67, 65, …
## $ prof <dbl> 238, 217, 204, 227, 244, 286, 342, 390, 396, 351, 31…
## $ tech <dbl> 180, 169, 166, 157, 170, 180, 206, 218, 236, 250, 25…
## $ cleric <dbl> 342, 350, 354, 351, 376, 395, 415, 434, 461, 462, 44…
## $ service <dbl> 653, 691, 693, 623, 618, 653, 732, 710, 704, 670, 69…
## $ agricul <dbl> 32, 27, 26, 23, 25, 25, 24, 23, 22, 20, 20, 20, 24, …
## $ trade <dbl> 479, 472, 441, 399, 384, 368, 392, 382, 406, 404, 43…
## $ operator <dbl> 257, 274, 260, 271, 253, 258, 255, 258, 271, 283, 30…
## $ elemantary <dbl> 560, 555, 523, 458, 421, 427, 435, 462, 507, 552, 56…
occ_group_male <- import("https://github.com/pjournal/mef04g-rhapsody/blob/gh-pages/Project_Data/occupational_group.xls?raw=true",
range = "TÜRKİYE!N7:Y92",
col_names = c("year", "month", "total_unemployed", "manager", "prof", "tech", "cleric", "service", "agricul", "trade", "operator", "elemantary")) %>%
fill(year, .direction = "down") %>%
as_tibble()
occ_group_male$month <- str_split_fixed(occ_group_male$month, " - ", 2)[,2]
The final tibble is as follows.
## Rows: 86
## Columns: 12
## $ year <dbl> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014…
## $ month <chr> "January", "February", "March", "April", "May", "Jun…
## $ total_unemployed <dbl> 1889, 1882, 1803, 1675, 1616, 1683, 1801, 1810, 1844…
## $ manager <dbl> 47, 51, 56, 52, 45, 49, 53, 53, 49, 43, 50, 50, 50, …
## $ prof <dbl> 106, 97, 86, 96, 100, 115, 146, 152, 151, 131, 139, …
## $ tech <dbl> 97, 91, 89, 87, 86, 97, 108, 120, 118, 121, 125, 135…
## $ cleric <dbl> 144, 133, 135, 113, 130, 137, 143, 143, 155, 161, 15…
## $ service <dbl> 408, 418, 422, 381, 380, 406, 450, 439, 409, 394, 41…
## $ agricul <dbl> 29, 25, 22, 20, 21, 22, 22, 20, 20, 18, 19, 19, 21, …
## $ trade <dbl> 434, 424, 395, 366, 346, 325, 350, 337, 364, 358, 40…
## $ operator <dbl> 240, 259, 248, 251, 233, 236, 235, 241, 252, 267, 27…
## $ elemantary <dbl> 385, 384, 349, 310, 275, 296, 293, 305, 326, 358, 36…
occ_group_female <- import("https://github.com/pjournal/mef04g-rhapsody/blob/gh-pages/Project_Data/occupational_group.xls?raw=true",
range = "TÜRKİYE!AA7:AL92",
col_names = c("year", "month", "total_unemployed", "manager", "prof", "tech", "cleric", "service", "agricul", "trade", "operator", "elemantary")) %>%
fill(year, .direction = "down") %>%
as_tibble()
occ_group_female$month <- str_split_fixed(occ_group_female$month, " - ", 2)[,2]
The final tibble is as follows.
## Rows: 86
## Columns: 12
## $ year <dbl> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014…
## $ month <chr> "January", "February", "March", "April", "May", "Jun…
## $ total_unemployed <dbl> 915, 942, 944, 903, 935, 971, 1065, 1133, 1220, 1194…
## $ manager <dbl> 16, 18, 23, 18, 16, 13, 12, 14, 11, 9, 14, 17, 15, 2…
## $ prof <dbl> 132, 120, 118, 130, 144, 171, 196, 238, 245, 220, 17…
## $ tech <dbl> 84, 77, 77, 71, 84, 83, 97, 98, 118, 129, 133, 118, …
## $ cleric <dbl> 199, 217, 220, 238, 246, 258, 272, 292, 306, 301, 28…
## $ service <dbl> 244, 273, 271, 242, 238, 247, 282, 271, 295, 277, 28…
## $ agricul <dbl> 3, 2, 3, 3, 3, 2, 2, 3, 3, 2, 2, 1, 2, 2, 1, 1, 1, 2…
## $ trade <dbl> 45, 48, 46, 34, 38, 43, 42, 45, 42, 46, 36, 41, 42, …
## $ operator <dbl> 17, 16, 12, 20, 21, 22, 20, 17, 19, 16, 23, 32, 19, …
## $ elemantary <dbl> 175, 171, 174, 148, 146, 131, 142, 157, 181, 194, 19…
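Since the overall, male and female tibbles are read from separate ranges of the same sheet, a quick consistency check can confirm that the ranges line up. The sketch below uses only the first six total_unemployed values printed in the three tibbles above; the tolerance of 1 is an assumption to allow for rounding and is not stated in the source.

```r
# First six total_unemployed values, copied from the three printed tibbles above.
overall <- c(2804, 2825, 2747, 2579, 2551, 2654)
male    <- c(1889, 1882, 1803, 1675, 1616, 1683)
female  <- c(915, 942, 944, 903, 935, 971)

# Difference between the overall figure and the sum of the two gender figures.
gap <- overall - (male + female)
gap

# Assumed tolerance of 1 unit for rounding; not stated in the source.
all(abs(gap) <= 1)
```

For these six months the gap is 0 or 1, so the male and female columns add up to the overall column within the assumed tolerance.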
The original dataset can be found at this link.
It consists of annual numbers of employed and unemployed people by field of education. The statistics cover 2014 to 2019.
The year column has empty cells, so they are filled downward with the correct year value.
last_graduated_major <- import("https://github.com/pjournal/mef04g-rhapsody/blob/gh-pages/Project_Data/major_field.xls?raw=true",
range = "TURKIYE!A6:Y41",
col_names = c("year", "statistics", "higher_ed_grad", "education", "arts", "humanities", "languages", "social_sci", "journalism", "business", "law", "biology_env_related_sci", "physical_sci", "math_stat", "info_communication_tech", "engineering", "manufacturing_processing", "architecture_construction", "agriculture_forestry_fishery", "veterinary", "health", "welfare", "personal_services", "occupational_health_transport_services", "security_services")) %>%
fill(year, .direction = "down") %>%
as_tibble()
The final tibble is as follows.
## Rows: 36
## Columns: 25
## $ year <dbl> 2014, 2014, 2014, 2014, 2014, …
## $ statistics <chr> "İşgücü \…
## $ higher_ed_grad <dbl> 5691.00000, 606.00000, 5085.00…
## $ education <dbl> 786.000000, 58.000000, 728.000…
## $ arts <dbl> 141.00000, 23.00000, 118.00000…
## $ humanities <dbl> 164.000000, 11.000000, 153.000…
## $ languages <dbl> 111.000000, 9.000000, 102.0000…
## $ social_sci <dbl> 527.00000, 59.00000, 468.00000…
## $ journalism <dbl> 24.00000, 7.00000, 17.00000, 2…
## $ business <dbl> 1547.00000, 211.00000, 1336.00…
## $ law <dbl> 109.000000, 8.000000, 101.0000…
## $ biology_env_related_sci <dbl> 75.00000, 11.00000, 64.00000, …
## $ physical_sci <dbl> 154.00000, 22.00000, 132.00000…
## $ math_stat <dbl> 86.000000, 8.000000, 78.000000…
## $ info_communication_tech <dbl> 119.00000, 20.00000, 99.00000,…
## $ engineering <dbl> 662.000000, 58.000000, 604.000…
## $ manufacturing_processing <dbl> 129.00000, 17.00000, 112.00000…
## $ architecture_construction <dbl> 230.00000, 25.00000, 205.00000…
## $ agriculture_forestry_fishery <dbl> 129.00000, 15.00000, 114.00000…
## $ veterinary <dbl> 45.000000, 3.000000, 42.000000…
## $ health <dbl> 350.000000, 11.000000, 339.000…
## $ welfare <dbl> 26.00000, 5.00000, 21.00000, 1…
## $ personal_services <dbl> 139.00000, 18.00000, 121.00000…
## $ occupational_health_transport_services <dbl> 16.00000, 2.00000, 14.00000, 1…
## $ security_services <dbl> 119.000000, 3.000000, 116.0000…
After preparing all the datasets, we save every tibble into a single file named project_all_data.RData. In further analysis, loading this .RData file is sufficient to access all the necessary data.
save(job_search_overall, job_search_male, job_search_female, educational_level_overall,
occ_group_overall, occ_group_male, occ_group_female, last_graduated_major,
file = "project_all_data.RData")
The created .RData file can be reached through this link.
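Loading the saved file in a later analysis could look like the sketch below. To keep the example runnable without the real project_all_data.RData, it first saves a small stand-in tibble (demo_tbl, with made-up values) to a temporary file; the file name demo_all_data.RData is likewise hypothetical.

```r
library(tibble)

# Stand-in object (hypothetical values) saved to a temporary file.
demo_tbl <- tibble(year = c(2014, 2014), month = c("January", "February"))
rdata_path <- file.path(tempdir(), "demo_all_data.RData")
save(demo_tbl, file = rdata_path)

# load() restores objects under their original names; loading into a
# fresh environment avoids overwriting anything in the workspace.
loaded_env <- new.env()
loaded_names <- load(rdata_path, envir = loaded_env)

loaded_names          # names of the restored objects
loaded_env$demo_tbl   # the restored tibble
```

Calling load("project_all_data.RData") without the envir argument would instead place all eight tibbles directly in the current workspace.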