2 Preprocessing the Data

Published

January 9, 2023

The Data Sets

Four data sets are used in the project:

Number of Departing Visitors by Country of Residence
Foreign and Citizen Visitors by Purpose of Visit
Tourism Income, Expenditure and Average Number of Nights
Tourism Income, Number of Visitors and Average Expenditure per Capita by Months

These data sets were taken from TUIK (the Turkish Statistical Institute), the official source of statistics on Turkey across diverse topics; see their website. The data sets used here are available at link1, link2, link3 and link4.

They are processed before any calculations or visualizations, both to avoid errors and to make them easier to work with.

The data sets contain multiple header rows, column names written in two languages, and explanatory notes that do not belong in a data frame. There may also be missing values, stray symbols and incorrect data types. These are examined and fixed as well.

Processing starts by loading the necessary libraries.

Code

library(readxl)
library(knitr)
library(dplyr)
library(tidyr)
library(reshape2)
library(writexl)

2.1 Number of Departing Visitors by Country of Residence

Code

dep <- readxl::read_excel("term_project/Number of Departing Visitors by Country of Residence.xls", skip = 4)
glimpse(dep)

Rows: 47
Columns: 11
$ Nationality <chr> "A.B.D. - USA", "Almanya - Germany", "Avusturya - Austria"…
$ `2012`      <dbl> 883408, 7305228, 832019, 663758, 838895, 2643699, 1514894,…
$ `2013`      <dbl> 856728, 7378650, 916069, 715578, 859199, 2738368, 1640259,…
$ `2014`      <dbl> 888077, 7794762, 802133, 726078, 809843, 2818021, 1701021,…
$ `2015`      <dbl> 833850, 8402180, 836755, 650569, 814868, 2776057, 1826947,…
$ `2016`      <dbl> 505989, 6960545, 677284, 656685, 627223, 1957576, 1710276,…
$ `2017`      <dbl> 331239, 7117716, 578074, 792883, 651702, 1951637, 1854683,…
$ `2018`      <dbl> 468281, 8022883, 620002, 916429, 751660, 2575768, 2387679,…
$ `2019`      <dbl> 626298, 8861124, 652020, 933291, 755681, 2978764, 2719962,…
$ `2020(1)`   <dbl> 148914, 2903189, 238682, 250087, 219196, 1122967, 1190803,…
$ `2021`      <dbl> 365211, 6314266, 516081, 512215, 459091, 473681, 1339552, …

The first 4 rows are skipped to reach the actual header. The explanatory notes at the bottom of the sheet are removed in a similar way, by dropping the last 15 rows.

Code

dep <- head(dep, - 15)
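A negative `n` in `head()` keeps every row except the last `|n|`, and that is what drops the note rows here. The sketch below runs on a made-up data frame. The note rows are illustrative and do not come from the TUIK file:

```r
# Toy data frame standing in for the raw sheet: five data rows, then two note rows
raw <- data.frame(
  country = c("USA", "Germany", "Austria", "Azerbaijan", "Belgium",
              "(1) Footnote text", "Source: TUIK"),
  stringsAsFactors = FALSE
)

# head() with a negative n keeps everything except the last |n| rows
clean <- head(raw, -2)
print(clean$country)
```

The count must match the number of note rows in the actual file. Re-check it if the file is downloaded again.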

The second problem is in the “Nationality” column. Each country name is written in two languages, Turkish first and English second, separated by a dash. That separator can be used to drop the Turkish name. Passing “ - ” to separate() as the separator, including the surrounding spaces, also removes the extra white space.

Code

dep <- dep %>%
  separate(Nationality, c(NA,"Nationality")," - ")
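The split can be checked in isolation on two values copied from the raw column. `NA` in `into` discards the first piece, which here is the Turkish name:

```r
library(tidyr)

# Two sample values in the "Turkish - English" format of the source file
df <- data.frame(Nationality = c("A.B.D. - USA", "Almanya - Germany"),
                 stringsAsFactors = FALSE)

# Split on " - " (spaces included); the NA entry drops the Turkish part
df <- separate(df, Nationality, into = c(NA, "Nationality"), sep = " - ")
print(df$Nationality)
```

Because the separator includes the spaces, a hyphen inside a name without surrounding spaces is not treated as a split point. A value containing " - " more than once would produce extra pieces, which `separate()` drops with a warning by default.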

The “(1)” in the name of the 2020 column is a footnote marker. It notes that the column only contains data for the first, third and fourth quarters of the year, because surveys could not be carried out in the second quarter due to COVID-19.

It will be removed from the column name.

Code

names(dep)[names(dep) == "2020(1)"] <- "2020"
kable(head(dep, 3))

Nationality	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021
USA	883408	856728	888077	833850	505989	331239	468281	626298	148914	365211
Germany	7305228	7378650	7794762	8402180	6960545	7117716	8022883	8861124	2903189	6314266
Austria	832019	916069	802133	836755	677284	578074	620002	652020	238682	516081
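Renaming by exact match works for a single column. If several headers carried footnote markers, one regular expression could strip any trailing “(n)” at once. The sketch below assumes markers always take the form “(digits)” at the end of a name. The header vector is hypothetical (“2021(2)” does not appear in this data set):

```r
# Hypothetical header vector with trailing footnote markers
cols <- c("Nationality", "2019", "2020(1)", "2021(2)")

# Remove a trailing "(digits)" marker; names without one are left unchanged
cols <- sub("\\([0-9]+\\)$", "", cols)
print(cols)
```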

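Lastly, the year columns should be stored as integers. read_excel() already parsed them as numbers, so any empty cells would appear as NA rather than trigger a warning during conversion; any(is.na(dep)) is a direct check for them. The data frame is saved as an Excel file and an RDS file. A long (“melted”) version, with one row per country and year, is saved the same way.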

Code

# Convert the year columns from double to integer
dep <- dep %>%
    mutate(across(`2012`:`2021`, as.integer))
write_xlsx(dep,"term_project/depart_by_residence.xlsx")
saveRDS(dep, file = "term_project/depart_by_residence.rds")

# Reshape to long format: one row per country and year
melted_dep <- melt(dep, "Nationality")
colnames(melted_dep) <- c('Nationality','Dep_Year','Departing_Visitors')

write_xlsx(melted_dep,"term_project/melted_depart_by_residence.xlsx")
saveRDS(melted_dep, file = "term_project/melted_depart_by_residence.rds")

glimpse(dep)

Rows: 32
Columns: 11
$ Nationality <chr> "USA", "Germany", "Austria", "Azerbaijan", "Belgium", "Uni…
$ `2012`      <int> 883408, 7305228, 832019, 663758, 838895, 2643699, 1514894,…
$ `2013`      <int> 856728, 7378650, 916069, 715578, 859199, 2738368, 1640259,…
$ `2014`      <int> 888077, 7794762, 802133, 726078, 809843, 2818021, 1701021,…
$ `2015`      <int> 833850, 8402180, 836755, 650569, 814868, 2776057, 1826947,…
$ `2016`      <int> 505989, 6960545, 677284, 656685, 627223, 1957576, 1710276,…
$ `2017`      <int> 331239, 7117716, 578074, 792883, 651702, 1951637, 1854683,…
$ `2018`      <int> 468281, 8022883, 620002, 916429, 751660, 2575768, 2387679,…
$ `2019`      <int> 626298, 8861124, 652020, 933291, 755681, 2978764, 2719962,…
$ `2020`      <int> 148914, 2903189, 238682, 250087, 219196, 1122967, 1190803,…
$ `2021`      <int> 365211, 6314266, 516081, 512215, 459091, 473681, 1339552, …
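tidyr, which is already loaded, can do the same reshaping with `pivot_longer()` in place of `reshape2::melt()`. The sketch below runs on a two-country, two-year slice built from the values above:

```r
library(tidyr)

# Small wide table shaped like dep (values copied from the output above)
dep_small <- data.frame(
  Nationality = c("USA", "Germany"),
  `2012` = c(883408L, 7305228L),
  `2013` = c(856728L, 7378650L),
  check.names = FALSE
)

# Every column except Nationality becomes a (Dep_Year, Departing_Visitors) pair
long <- pivot_longer(dep_small, cols = -Nationality,
                     names_to = "Dep_Year", values_to = "Departing_Visitors")
print(long)
```

There are two differences from `melt()`. `melt()` returns the year as a factor and stacks all 2012 rows before the 2013 rows. `pivot_longer()` keeps the year as character and groups the rows by country.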

2.2 Foreign and Citizen Visitors by Purpose of Visit

Next, the second data set is processed. Steps similar to those in the previous section are not explained in detail, for readability. The unusable rows at the top and bottom are excluded again.

This time, the column names are checked first; only the first five are shown.

Code

purp <- readxl::read_excel("term_project/Foreign and Citizen Visitors by Purpose of Visit (Foreigner and Citizens Resident Abroad).xls", skip = 5)
purp <- head(purp, -9)
colnames(purp)[1:5]

[1] "Yıl"                                                                                                        
[2] "Çeyrek"                                                                                                     
[3] "Toplam \nTotal...3"                                                                                         
[4] "Gezi, eğlence, sportif ve kültürel faaliyetler \nTravel, entertainment, sportive or cultural activities...4"
[5] "Akraba ve arkadaş ziyareti \nVisiting relatives and friends...5"

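These column names are messy enough to confuse the user, and several fixes are needed.

First, they are again written in two languages, mostly separated by the newline character “\n”. The year and quarter columns also appear twice, once in each language; the Turkish duplicates will be dropped.

The chunk below does the following:

- It drops everything up to the first “\n” (the Turkish part) where present.
- It removes text in parentheses.
- It replaces spaces and “ / ” with “_”.
- It cuts off the “...n” suffixes that read_excel() adds to duplicated names.
- It removes commas.
- It drops the Turkish year and quarter columns and an extra column between the two groups.

The suffix step cuts a fixed number of characters: four for columns 3 to 7 and five for columns 8 to 23. The four-character suffixes “...8” and “...9” therefore lose one letter too many, which is why two names appear later as ALL_Religion_Pilgrimag and ALL_Shoppin.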

Code

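# Keep only the English part: drop everything up to the first "\n"
names(purp)[3:23] <- sub(".*?\n", "", names(purp)[3:23])
# Remove parenthesised notes together with the spaces before them (raw string, R >= 4.0)
names(purp) <- gsub(r"{\s*\([^\)]+\)}","",names(purp))
names(purp) <- trimws(names(purp), "l")
# Replace " / " and the remaining spaces with underscores
names(purp) <- gsub(" / ", "_", names(purp))
names(purp) <- gsub(" ", "_", names(purp))
# Cut the "...n" suffixes added by read_excel(): the last 4 characters of
# columns 3-7 and the last 5 characters of columns 8-23
names(purp)[3:7] <- substr(names(purp)[3:7],1,nchar(names(purp)[3:7])-4)
names(purp)[8:23] <- substr(names(purp)[8:23],1,nchar(names(purp)[8:23])-5)
names(purp) <- gsub(",", "", names(purp))
# Drop the Turkish year/quarter columns and the extra column between the two groups
purp <- subset(purp, select = -c(Yıl,Çeyrek))
purp <- purp[,-11]
names(purp)[1:5]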

[1] "Total"                                               
[2] "Travel_entertainment_sportive_or_cultural_activities"
[3] "Visiting_relatives_and_friends"                      
[4] "Education_training"                                  
[5] "Health_or_medical_reasons"
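Cutting a fixed number of characters only works if every suffix has the same number of digits. A regular expression that removes “...” followed by any digits at the end of a name avoids that assumption. The sketch below runs on hypothetical names in the same form as the repaired headers:

```r
# Names in the form read_excel() repairs duplicates to: "<name>...<column index>"
nm <- c("Religion_Pilgrimage...8", "Shopping...9", "Transit...10", "Other...23")

# Drop a trailing "..." followed by any number of digits
nm <- sub("\\.\\.\\.[0-9]+$", "", nm)
print(nm)
```

Applied to `names(purp)`, this single call could replace both `substr()` lines and would keep the final letters of the Religion_Pilgrimage and Shopping columns.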

Some column names are still duplicated even though their values differ. The first ten columns hold values for all visitors, while the next ten hold values for Turkish citizens resident abroad. The prefixes ALL_ and TR_ distinguish them.

Code

names(purp)[1:10] <- paste0("ALL_", names(purp)[1:10])
names(purp)[11:20] <- paste0("TR_", names(purp)[11:20])
purp <- purp %>% select(Year, Quarter, ALL_Total:TR_Other)
glimpse(purp)

Rows: 54
Columns: 22
$ Year                                                     <dbl> 2012, NA, NA,…
$ Quarter                                                  <chr> "Annual", "I"…
$ ALL_Total                                                <chr> "36463921.041…
$ ALL_Travel_entertainment_sportive_or_cultural_activities <chr> "24953961", "…
$ ALL_Visiting_relatives_and_friends                       <chr> "6792033", "1…
$ ALL_Education_training                                   <chr> "231152", "51…
$ ALL_Health_or_medical_reasons                            <chr> "240682", "63…
$ ALL_Religion_Pilgrimag                                   <chr> "73510", "112…
$ ALL_Shoppin                                              <chr> "934204", "14…
$ ALL_Transit                                              <chr> "45194", "161…
$ ALL_Business                                             <chr> "2224844", "5…
$ ALL_Other                                                <chr> "968339", "22…
$ TR_Total                                                 <chr> "5121457", "8…
$ TR_Travel_entertainment_sportive_or_cultural_activities  <chr> "1083976", "1…
$ TR_Visiting_relatives_and_friends                        <chr> "3645145", "5…
$ TR_Education_training                                    <chr> "21768", "621…
$ TR_Health_or_medical_reasons                             <chr> "67151", "239…
$ TR_Religion_Pilgrimage                                   <chr> "5973", "1775…
$ TR_Shopping                                              <chr> "26330", "931…
$ TR_Transit                                               <chr> "-", "-", "-"…
$ TR_Business                                              <chr> "244774", "68…
$ TR_Other                                                 <chr> "26340", "399…

Code

kable(head(purp))

Year	Quarter	ALL_Total	ALL_Travel_entertainment_sportive_or_cultural_activities	ALL_Visiting_relatives_and_friends	ALL_Education_training	ALL_Health_or_medical_reasons	ALL_Religion_Pilgrimag	ALL_Shoppin	ALL_Transit	ALL_Business	ALL_Other	TR_Total	TR_Travel_entertainment_sportive_or_cultural_activities	TR_Visiting_relatives_and_friends	TR_Education_training	TR_Health_or_medical_reasons	TR_Religion_Pilgrimage	TR_Shopping	TR_Transit	TR_Business	TR_Other
2012	Annual	36463921.041000001	24953961	6792033	231152	240682	73510	934204	45194	2224844	968339	5121457	1083976	3645145	21768	67151	5973	26330	-	244774	26340
NA	I	4219162	2005504	1168263	51489	63843	11203	148181	16131	532073	222474	844430	130458	599931	6212	23962	1775	9318	-	68779	3994
NA	II	9323459	6752489	1189394	88929	58283	28866	244106	10525	654678	296190	911152	199092	586250	7134	16861	649	9079	-	81063	11025
NA	III	15437123	11439809	3072660	43643	44905	14690	204199	13743	353468	250005	2276006	499200	1722066	5164	8951	1377	4131	-	26509	8607
NA	IV	7484177	4756160	1361716	47091	73652	18751	337718	4795	684625	199669	1089869	255226	736898	3258	17377	2172	3801	-	68422	2715
2013	Annual	39226225.794699997	26817201	7239397	195918	300102	62762	1000734	41172	2404344	1164596	5398751.7854000004	1322033	3655078	20330	90579	4681	39986	-	255493	10571

There are still problems with the data. The Year column is filled only on the first row of each year, so each year has to be repeated on the quarterly rows that follow it. The value columns also have the wrong data type (character). Lastly, the Transit column for Turkish citizens is filled with dashes, so it is simpler to store it as zeros.

Missing (NA) values in the Year column are filled downward.

Code

purp <- purp %>% fill(Year)
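The effect of `fill()` can be seen on a toy version of the Year and Quarter columns. The values follow the pattern in the table above:

```r
library(tidyr)

# Year appears only on each year's first (annual) row, as in purp
df <- data.frame(
  Year    = c(2012, NA, NA, NA, NA, 2013, NA),
  Quarter = c("Annual", "I", "II", "III", "IV", "Annual", "I"),
  stringsAsFactors = FALSE
)

# fill() carries the last non-missing value downward (.direction = "down" is the default)
df <- fill(df, Year)
print(df$Year)
```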

The TR_Transit column is set to zero.

Code

purp$TR_Transit <- 0

As explained before, the second quarter of 2020 has no data and is filled with dashes, and the annual values for 2022 are empty. Dashes also appear elsewhere in the data set. All of these are replaced with zeros.

Code

kable(filter(purp, rowSums(is.na(purp)) > 0 | (Year == 2020 & Quarter == "II")))

Year	Quarter	ALL_Total	ALL_Travel_entertainment_sportive_or_cultural_activities	ALL_Visiting_relatives_and_friends	ALL_Education_training	ALL_Health_or_medical_reasons	ALL_Religion_Pilgrimag	ALL_Shoppin	ALL_Transit	ALL_Business	ALL_Other	TR_Total	TR_Travel_entertainment_sportive_or_cultural_activities	TR_Visiting_relatives_and_friends	TR_Education_training	TR_Health_or_medical_reasons	TR_Religion_Pilgrimage	TR_Shopping	TR_Transit	TR_Business	TR_Other
2020	II	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	0	-	-
2022	Annual	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0	NA	NA

Code

purp <- data.frame(lapply(purp, gsub, pattern = "-", replacement = 0))
purp[is.na(purp)] <- 0
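`gsub()` replaces every “-” character inside every cell. That is harmless only as long as no legitimate value contains a hyphen; a negative number such as “-5” would become “05”. A stricter alternative uses dplyr's `across()` with base `replace()`. It only changes cells that are exactly a dash or NA. The sketch below runs on a toy table, and the "Change" column is hypothetical:

```r
library(dplyr)

# Toy character table with dashes and NA for missing figures
df <- data.frame(
  Quarter = c("I", "II", "III"),
  Total   = c("4219162", "-", NA),
  Change  = c("-5", "12", "-"),
  stringsAsFactors = FALSE
)

# Replace a cell only if it is NA or exactly "-"; "-5" is left intact
df <- df %>%
  mutate(across(-Quarter, ~ replace(.x, is.na(.x) | .x == "-", "0")))
print(df)
```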

Finally, every column except Quarter is converted to integer and the data is exported.

Code

purp[,-2] <- sapply(purp[,-2],as.integer)
write_xlsx(purp,"term_project/purposes.xlsx")
saveRDS(purp, file = "term_project/purposes.rds")
glimpse(purp)

Rows: 54
Columns: 22
$ Year                                                     <int> 2012, 2012, 2…
$ Quarter                                                  <chr> "Annual", "I"…
$ ALL_Total                                                <int> 36463921, 421…
$ ALL_Travel_entertainment_sportive_or_cultural_activities <int> 24953961, 200…
$ ALL_Visiting_relatives_and_friends                       <int> 6792033, 1168…
$ ALL_Education_training                                   <int> 231152, 51489…
$ ALL_Health_or_medical_reasons                            <int> 240682, 63843…
$ ALL_Religion_Pilgrimag                                   <int> 73510, 11203,…
$ ALL_Shoppin                                              <int> 934204, 14818…
$ ALL_Transit                                              <int> 45194, 16131,…
$ ALL_Business                                             <int> 2224844, 5320…
$ ALL_Other                                                <int> 968339, 22247…
$ TR_Total                                                 <int> 5121457, 8444…
$ TR_Travel_entertainment_sportive_or_cultural_activities  <int> 1083976, 1304…
$ TR_Visiting_relatives_and_friends                        <int> 3645145, 5999…
$ TR_Education_training                                    <int> 21768, 6212, …
$ TR_Health_or_medical_reasons                             <int> 67151, 23962,…
$ TR_Religion_Pilgrimage                                   <int> 5973, 1775, 6…
$ TR_Shopping                                              <int> 26330, 9318, …
$ TR_Transit                                               <int> 0, 0, 0, 0, 0…
$ TR_Business                                              <int> 244774, 68779…
$ TR_Other                                                 <int> 26340, 3994, …
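`as.integer()` truncates rather than rounds. This is why ALL_Total for 2012 is 36463921.041000001 in the character table and 36463921 after conversion. Rounding first is an option when the fractional parts matter. A small check:

```r
# A total as it appears in the character table before conversion
x <- "36463921.041000001"

truncated <- as.integer(x)                    # drops the fraction
rounded   <- as.integer(round(as.numeric(x))) # rounds to the nearest integer first

# The two approaches differ once the fraction reaches .5
as.integer("2.7")                    # 2
as.integer(round(as.numeric("2.7"))) # 3
```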

2.3 Tourism Income, Expenditure and Average Number of Nights

This data set is quite similar to the previous one.

Code

night <- readxl::read_excel("term_project/Tourism Income, Expenditure and Average Number of Nights.xls", skip = 4)
night <- head(night, -9)
colnames(night)

 [1] "Yıl\nYear"                                                                                                                
 [2] "Yıllık-Annual\nÇeyrek-Quarter"                                                                                            
 [3] "Turizm  geliri\nTourism income\n( 000 $)"                                                                                 
 [4] "Ziyaretçi sayısı\nNumber of departing \nvisitors"                                                                         
 [5] "Kişi başı ortalama \nharcama\nAverage expenditure per capita\n($)...5"                                                    
 [6] "Ortalama geceleme sayısı \nAverage number of overnights...6"                                                              
 [7] "...7"                                                                                                                     
 [8] "Turizm  gideri\nTourism expenditure\n( 000 $)"                                                                            
 [9] "\nTürkiye'de ikamet eden yurt dışını ziyaret eden vatandaş sayısı\nNumber of citizens (resident in Turkey) visited abroad"
[10] "Kişi başı ortalama \nharcama\nAverage expenditure per capita\n($)...10"                                                   
[11] "Ortalama geceleme sayısı \nAverage number of overnights...11"

The usual problems could be removed with the same methods as before. However, there are only a few columns and each has its own quirks, so it is shorter to rename most of them manually.

Code

# Drop the unnamed seventh column, then everything up to the first "\n"
# (this cleans the Year column; the others are renamed below)
night <- night[,-7]
names(night) <- sub(".*?\n", "", names(night))
# Rename columns 2-10 by hand; column 1 is already "Year"
colnames(night)[2:10] <- c("Quarter",
                           "Tourism_Income_in_ThousandDollars",
                           "Number_of_Departing_Visitors",
                           "ALL_Average_expenditure_per_capita_in_Dollars",
                           "ALL_Average_number_of_overnights",
                           "Tourism_expenditure_in_ThousandDollars",
                           "Number_of_Turkish_citizens_visited_abroad",
                           "TR_Average_expenditure_per_capita_in_Dollars",
                           "TR_Average_number_of_overnights")

names(night)

 [1] "Year"                                         
 [2] "Quarter"                                      
 [3] "Tourism_Income_in_ThousandDollars"            
 [4] "Number_of_Departing_Visitors"                 
 [5] "ALL_Average_expenditure_per_capita_in_Dollars"
 [6] "ALL_Average_number_of_overnights"             
 [7] "Tourism_expenditure_in_ThousandDollars"       
 [8] "Number_of_Turkish_citizens_visited_abroad"    
 [9] "TR_Average_expenditure_per_capita_in_Dollars" 
[10] "TR_Average_number_of_overnights"

Current status of the data frame:

Code

kable(tail(night))

Year	Quarter	Tourism_Income_in_ThousandDollars	Number_of_Departing_Visitors	ALL_Average_expenditure_per_capita_in_Dollars	ALL_Average_number_of_overnights	Tourism_expenditure_in_ThousandDollars	Number_of_Turkish_citizens_visited_abroad	TR_Average_expenditure_per_capita_in_Dollars	TR_Average_number_of_overnights
NA	III	14126732	13640672.334207579	1035.6331193861683	11.36096473622375	584378.93482789095	873026.66579242004	669.37123197430367	18.717043940553779
NA	IV	9306804	9050112.2298819609	1028.3634228612639	13.23171319647536	696183.34344183875	1188802.7701180396	585.61719482930937	16.496943017713928
2022	Yıllık-Annual	NA	NA	NA	NA	NA	NA	NA	NA
NA	I	6561011	6451656.9355938099	1016.949711910887	12.455328498538254	664989.01667372952	1039666.0644061896	639.61789216765612	17.470795565958081
NA	II	10515168	11939130.535165597	880.73147956867979	9.440213100360511	1057787.2055852062	1666135.4648344023	634.87467130431673	16.310156750210435
NA	III	17952361	21000127.519801684	854.8691370122466	9.7038686869594368	1106285.4788077581	2072116.4801983174	533.89154971726202	9.8233256160661639

Missing values in the Year column are filled downward again.

Code

night <- night %>% fill(Year)

Some cells in the Quarter column also contain both languages (“Yıllık-Annual”). They are replaced with the English label, which also matches the “Annual” label of the previous data set. This must happen before the dash replacement, which would otherwise turn them into “Yıllık0Annual”.

Code

night$Quarter[night$Quarter == "Yıllık-Annual"] <- "Annual"

Next, missing values and dashes are replaced with zeros.

Code

night <- data.frame(lapply(night, gsub, pattern = "-", replacement = 0))
night[is.na(night)] <- 0

Lastly, the data types are converted and the data frame is exported. All columns except Quarter become numeric, and the year and the two visitor counts become integers.

Code

# Every column except Quarter becomes numeric
night[,-2] <- sapply(night[,-2],as.numeric)
# Year and the two visitor counts are whole numbers
night[, 1] <- sapply(night[,1],as.integer)
night[, 4] <- sapply(night[,4],as.integer)
night[, 8] <- sapply(night[,8],as.integer)

write_xlsx(night,"term_project/income_nights.xlsx")
saveRDS(night, file = "term_project/income_nights.rds")

kable(head(night))

Year	Quarter	Tourism_Income_in_ThousandDollars	Number_of_Departing_Visitors	ALL_Average_expenditure_per_capita_in_Dollars	ALL_Average_number_of_overnights	Tourism_expenditure_in_ThousandDollars	Number_of_Turkish_citizens_visited_abroad	TR_Average_expenditure_per_capita_in_Dollars	TR_Average_number_of_overnights
2012	Annual	29689249	36463921	814.2089	10.820622	4593390	5802949	791.5612	12.500000
2013	Annual	33073502	39226225	843.1477	10.203777	5253565	7525869	698.0675	13.085486
2014	Annual	35137949	41415070	848.4339	9.986969	5470481	7982263	685.3295	12.901660
2015	Annual	32492212	41617530	780.7338	10.065018	5698423	8750851	651.1850	11.942026
2016	Annual	22839468	31365329	728.1756	11.353400	5049793	7891909	639.8697	10.997575
2017	Annual	27044542	38620345	700.2667	10.864540	5137244	8886916	578.0681	9.872048
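Positional indices such as `night[, 4]` silently point at a different column if the column order ever changes. Selecting columns by name with dplyr's `across()` makes the type conversion independent of position. The sketch below runs on a toy table that uses a subset of the real column names, with values copied from the table above:

```r
library(dplyr)

# Toy stand-in for night after the zero-filling step (all columns are character)
night_toy <- data.frame(
  Year = c("2012", "2013"),
  Quarter = c("Annual", "Annual"),
  Tourism_Income_in_ThousandDollars = c("29689249", "33073502"),
  Number_of_Departing_Visitors = c("36463921", "39226225"),
  ALL_Average_number_of_overnights = c("10.820622", "10.203777"),
  stringsAsFactors = FALSE
)

# Counts and Year become integers; every other column except Quarter becomes numeric
night_toy <- night_toy %>%
  mutate(across(c(Year, Number_of_Departing_Visitors), as.integer),
         across(-c(Quarter, Year, Number_of_Departing_Visitors), as.numeric))
str(night_toy)
```

For the full data, Number_of_Turkish_citizens_visited_abroad would join the integer group.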

2.4 Tourism Income, Number of Visitors and Average Expenditure per Capita by Months

This data set has two header rows. The first one, which holds the years, is skipped here, so the years are added back to the column names manually.

Code

mon <- readxl::read_excel("term_project/Tourism income, number of visitors and average expenditure per capita by months.xls", skip = 4)
mon <- head(mon, -9)
colnames(mon)[1:7]

[1] "Aylar - Months"                                                          
[2] "Turizm geliri Tourism\nincome\n(000 $)...2"                              
[3] "Ziyaretçi\nsayısı\nNumber of\nvisitors...3"                              
[4] "Kişi başı \nortalama\nharcama\nAverage\nexpenditure\nper capita\n($)...4"
[5] "Turizm geliri Tourism\nincome\n(000 $)...5"                              
[6] "Ziyaretçi\nsayısı\nNumber of\nvisitors...6"                              
[7] "Kişi başı \nortalama\nharcama\nAverage\nexpenditure\nper capita\n($)...7"

Column names are edited again.

Code

names(mon)[1] <- "Months"
names(mon)[c(2,5,8,11,14,17,20,23,26,29,32)] <- "Tourism_Income_in_ThousandDollars"
names(mon)[c(3,6,9,12,15,18,21,24,27,30,33)] <- "Number_of_Visitors"
names(mon)[c(4,7,10,13,16,19,22,25,28,31,34)] <- "Average_expenditure_per_capita"

names(mon)[2:4] <- paste0(names(mon)[2:4], "_2012")
names(mon)[5:7] <- paste0(names(mon)[5:7], "_2013")
names(mon)[8:10] <- paste0(names(mon)[8:10], "_2014")
names(mon)[11:13] <- paste0(names(mon)[11:13], "_2015")
names(mon)[14:16] <- paste0(names(mon)[14:16], "_2016")
names(mon)[17:19] <- paste0(names(mon)[17:19], "_2017")
names(mon)[20:22] <- paste0(names(mon)[20:22], "_2018")
names(mon)[23:25] <- paste0(names(mon)[23:25], "_2019")
names(mon)[26:28] <- paste0(names(mon)[26:28], "_2020")
names(mon)[29:31] <- paste0(names(mon)[29:31], "_2021")
names(mon)[32:34] <- paste0(names(mon)[32:34], "_2022")

names(mon)[1:7]

[1] "Months"                                
[2] "Tourism_Income_in_ThousandDollars_2012"
[3] "Number_of_Visitors_2012"               
[4] "Average_expenditure_per_capita_2012"   
[5] "Tourism_Income_in_ThousandDollars_2013"
[6] "Number_of_Visitors_2013"               
[7] "Average_expenditure_per_capita_2013"
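The assignments above follow a fixed pattern: three measures repeated for each year from 2012 to 2022. Assuming the sheet keeps this column order, the same names can be generated in one step:

```r
# The three measures repeat for each year from 2012 to 2022
measures <- c("Tourism_Income_in_ThousandDollars",
              "Number_of_Visitors",
              "Average_expenditure_per_capita")
years <- 2012:2022

# Measure varies fastest and year slowest, matching the column order of the sheet
new_names <- c("Months",
               paste(rep(measures, times = length(years)),
                     rep(years, each = length(measures)),
                     sep = "_"))
length(new_names)
head(new_names, 4)
```

`names(mon) <- new_names` would then replace the indexed assignments above.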

The Turkish part of the Months column is removed in the same way as for Nationality.

Code

mon <- mon %>%
  separate(Months, c(NA,"Months")," - ")

Dashes and missing values are replaced with zeros.

Code

mon <- data.frame(lapply(mon, gsub, pattern = "-", replacement = 0))
mon[is.na(mon)] <- 0

Lastly, data types are changed and the data frame is exported.

Code

mon[,-1] <- sapply(mon[,-1],as.numeric)
# Visitor counts are whole numbers
visitor_cols <- grepl("Number_of_Visitors", names(mon))
mon[, visitor_cols] <- sapply(mon[, visitor_cols], as.integer)


write_xlsx(mon,"term_project/income_months.xlsx")
saveRDS(mon, file = "term_project/income_months.rds")

kable(head(mon))

Months	Tourism_Income_in_ThousandDollars_2012	Number_of_Visitors_2012	Average_expenditure_per_capita_2012	Tourism_Income_in_ThousandDollars_2013	Number_of_Visitors_2013	Average_expenditure_per_capita_2013	Tourism_Income_in_ThousandDollars_2014	Number_of_Visitors_2014	Average_expenditure_per_capita_2014	Tourism_Income_in_ThousandDollars_2015	Number_of_Visitors_2015	Average_expenditure_per_capita_2015	Tourism_Income_in_ThousandDollars_2016	Number_of_Visitors_2016	Average_expenditure_per_capita_2016	Tourism_Income_in_ThousandDollars_2017	Number_of_Visitors_2017	Average_expenditure_per_capita_2017	Tourism_Income_in_ThousandDollars_2018	Number_of_Visitors_2018	Average_expenditure_per_capita_2018	Tourism_Income_in_ThousandDollars_2019	Number_of_Visitors_2019	Average_expenditure_per_capita_2019	Tourism_Income_in_ThousandDollars_2020	Number_of_Visitors_2020	Average_expenditure_per_capita_2020	Tourism_Income_in_ThousandDollars_2021	Number_of_Visitors_2021	Average_expenditure_per_capita_2021	Tourism_Income_in_ThousandDollars_2022	Number_of_Visitors_2022	Average_expenditure_per_capita_2022
Total	29689249	36463920	814.2089	33073502	39226225	843.1477	35137949	41415070	848.4339	32492212	41617530	780.7338	22839468	31365329	728.1756	27044542	38620345	700.2667	30545924	45628672	669.4458	38930474	51860042	750.6834	14817273.3	15826266	936.2457	30173587.5	29357463	1027.7995	0	0	0.0000
January	1143894	1374400	832.2854	1469297	1466127	1002.1615	1540396	1575399	977.7813	1666096	1762004	945.5687	1442336	1691287	852.8040	1168279	1568343	744.9123	1537993	2045340	751.9495	1755674	2226287	788.6106	2085858.3	2529422	824.6380	854474.5	829931	1029.5719	2259265	2158066	1046.8934
February	1052891	1209064	870.8315	1401129	1415328	989.9672	1461263	1523244	959.3096	1462829	1564925	934.7592	1214408	1517503	800.2669	1013689	1432341	707.7149	1324531	1806821	733.0722	1505062	1944956	773.8280	1682608.0	2051922	820.0153	722423.9	727125	993.5335	1870892	1851394	1010.5316
March	1375023	1635696	840.6349	1837104	1892369	970.7956	1869525	1967114	950.3896	1861352	2017645	922.5370	1497145	1898762	788.4848	1260527	1844076	683.5546	1641208	2270019	722.9929	1865797	2473146	754.4225	895925.7	1058067	846.7564	1059070.8	1043410	1015.0086	2430853	2442196	995.3553
April	1763753	2231942	790.2322	2004636	2418962	828.7172	2158634	2573138	838.9108	1923638	2626663	732.3503	1394602	2049238	680.5466	1415774	2278537	621.3523	1873123	2870568	652.5267	2287216	3266255	700.2564	175638.2	0	0.0000	1198135.0	1179561	1015.7460	2525296	2921440	864.4012
May	2466505	3194546	772.0985	3074218	3717734	826.9064	3229089	3863882	835.7109	2806667	3775012	743.4854	1895207	2749648	689.2543	1965730	3095281	635.0730	2506235	3790524	661.1842	3024127	4219837	716.6453	196363.2	0	0.0000	1035085.8	1025559	1009.2889	3611055	4078424	885.4043

This was the final processing step; the data sets are now ready to be explored.