In this assignment, we analyse a dataset from the Global Dietary Database website covering vitamin B12 intake in former Soviet Union countries. Each row is a B12 intake estimate for one country and one year between 1990 and 2020 (every five years, plus 2018), broken down by gender, residence (urban or rural), age group and education level.
iso3 age female urban edu year median lowerci_95 upperci_95 gender
1 ALB 999 0 999 999 1990 3.634332 2.303403 6.054550 male
2 ALB 999 0 999 999 1995 3.737716 2.433315 5.981505 male
3 ALB 999 0 999 999 2000 3.824925 2.513833 6.216503 male
4 ALB 999 0 999 999 2005 3.938703 2.665851 6.267153 male
5 ALB 999 0 999 999 2010 4.067680 2.738054 6.253339 male
6 ALB 999 0 999 999 2015 4.092445 2.777237 6.344306 male
7 ALB 999 0 999 999 2018 4.070591 2.761252 6.266195 male
8 ALB 999 0 999 999 2020 4.087936 2.801015 6.307502 male
9 ALB 999 1 999 999 1990 3.292622 2.004336 5.514846 female
10 ALB 999 1 999 999 1995 3.399279 2.134920 5.449213 female
residence
1 all residences
2 all residences
3 all residences
4 all residences
5 all residences
6 all residences
7 all residences
8 all residences
9 all residences
10 all residences
Code
# 1.3) Education Level
gdd_f <- gdd_r %>%
  mutate(edu_level = case_when(
    edu == 1 ~ 'low',
    edu == 2 ~ 'medium',
    edu == 3 ~ 'high',
    edu == 999 ~ 'all levels'
  ))
gdd_f %>% head(10)
iso3 age female urban edu year median lowerci_95 upperci_95 gender
1 ALB 999 0 999 999 1990 3.634332 2.303403 6.054550 male
2 ALB 999 0 999 999 1995 3.737716 2.433315 5.981505 male
3 ALB 999 0 999 999 2000 3.824925 2.513833 6.216503 male
4 ALB 999 0 999 999 2005 3.938703 2.665851 6.267153 male
5 ALB 999 0 999 999 2010 4.067680 2.738054 6.253339 male
6 ALB 999 0 999 999 2015 4.092445 2.777237 6.344306 male
7 ALB 999 0 999 999 2018 4.070591 2.761252 6.266195 male
8 ALB 999 0 999 999 2020 4.087936 2.801015 6.307502 male
9 ALB 999 1 999 999 1990 3.292622 2.004336 5.514846 female
10 ALB 999 1 999 999 1995 3.399279 2.134920 5.449213 female
residence edu_level
1 all residences all levels
2 all residences all levels
3 all residences all levels
4 all residences all levels
5 all residences all levels
6 all residences all levels
7 all residences all levels
8 all residences all levels
9 all residences all levels
10 all residences all levels
1.3 | DATA ANALYSIS
1.3.1 | Mean and Median Change With Time for Each Country for General Population:
1.3.1.1 | Prepare The Data:
To compare general B12 intake across countries, keep only the rows where every grouping variable equals 999 (the code for the general population). Then report, for each country and year, the median and a mean value taken as the midpoint of the 95% interval bounds:
`summarise()` has grouped output by 'iso3'. You can override using the
`.groups` argument.
Code
gdd_1
# A tibble: 232 × 4
# Groups: iso3 [29]
iso3 year mean_values median
<chr> <int> <dbl> <dbl>
1 EST 2010 7.16 7.10
2 EST 1990 7.15 7.09
3 EST 2018 7.14 7.08
4 EST 2005 7.13 7.08
5 EST 2020 7.13 7.07
6 EST 2015 7.12 7.07
7 EST 2000 7.10 7.05
8 EST 1995 7.07 7.00
9 BLR 2020 4.50 4.10
10 BLR 2018 4.51 4.07
# ℹ 222 more rows
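The `summarise()` message shown above is informational: after grouping by two variables, dplyr keeps the result grouped by the first one. A minimal sketch of how the `.groups` argument makes that choice explicit, using a made-up toy table (the `AAA`/`BBB` codes, the object names and all numbers are illustrative assumptions, not values from the dataset):

```r
library(dplyr)

# Toy stand-in for the filtered general-population rows (made-up values)
toy <- data.frame(
  iso3       = c("AAA", "AAA", "BBB", "BBB"),
  year       = c(1990, 1995, 1990, 1995),
  median     = c(4.0, 4.2, 3.0, 3.1),
  lowerci_95 = c(3.0, 3.2, 2.0, 2.1),
  upperci_95 = c(5.0, 5.4, 4.0, 4.3)
)

toy_summary <- toy %>%
  group_by(iso3, year) %>%
  summarise(
    # midpoint of the 95% interval, used as an approximate mean
    mean_values = mean((lowerci_95 + upperci_95) / 2),
    median      = mean(median),
    .groups     = "drop"  # return an ungrouped tibble, so no message is printed
  ) %>%
  arrange(desc(median))

toy_summary
```

With `.groups = "drop"` the result is a plain tibble, so later steps such as `arrange()` are not affected by leftover grouping.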
1.3.1.2 |Median of Countries Across Years:
Code
gdd_2 <- ggplot(gdd_1, aes(x = year, y = median, color = iso3)) +
  geom_line()
gdd_2
1.3.1.3 |Result for Country Medians Across Years:
No country shows an extreme change in median B12 intake over the years; TJK changes the most. Among the general-population intakes, BGR and, to a lesser extent, TJK are outliers with lower medians, while EST is an outlier with a higher median. It is difficult to identify a trend for any individual country.
1.3.1.4 |Mean of Countries Across Years:
Code
gdd_3 <- ggplot(gdd_1, aes(x = year, y = mean_values, color = iso3)) +
  geom_line()
gdd_3
1.3.1.5 |Result for Country Means Across Years:
For the countries close to the average, the mean rises markedly from 2000 to 2010, declines towards 2015, and then stays constant between 2015 and 2020.
In terms of the mean, there are outliers below 4 µg, but they are hard to identify because the line colors are similar. Consistent with the median graph, EST is the upper outlier with the highest intake.
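Because many line colors look alike, one option is to color only a few countries of interest and grey out the rest. The sketch below illustrates the idea on made-up data; the codes `AAA`, `BBB` and `CCC`, the `highlight` vector and the color choices are assumptions for illustration only:

```r
library(dplyr)
library(ggplot2)

# Made-up yearly values for three hypothetical countries
toy <- data.frame(
  iso3        = rep(c("AAA", "BBB", "CCC"), each = 3),
  year        = rep(c(2000, 2010, 2020), times = 3),
  mean_values = c(4.1, 4.4, 4.3, 3.2, 3.3, 3.3, 7.0, 7.2, 7.1)
)

# Hypothetical choice of countries to emphasise
highlight <- c("BBB", "CCC")

# Every country that is not highlighted shares the label "other"
toy_plot_data <- toy %>%
  mutate(label = if_else(iso3 %in% highlight, iso3, "other"))

# group = iso3 keeps one line per country even when labels are shared
p <- ggplot(toy_plot_data,
            aes(x = year, y = mean_values, group = iso3, color = label)) +
  geom_line() +
  scale_color_manual(
    values = c(BBB = "firebrick", CCC = "steelblue", other = "grey70")
  )
p
```

Applied to the real table, the `highlight` vector would hold whichever country codes need to stand out.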
1.3.2 |Comparison of Average B12 Intake of General Population For All The Years In Terms of Country
To compare the general-population mean B12 intake across countries in a bar chart, we first summarise the table by averaging over all years instead of keeping one value per year:
# A tibble: 29 × 2
iso3 mean_values
<chr> <dbl>
1 EST 7.12
2 CZE 4.45
3 BLR 4.43
4 RUS 4.39
5 MNG 4.37
6 LTU 4.37
7 LVA 4.36
8 HUN 4.33
9 SVN 4.33
10 MNE 4.32
# ℹ 19 more rows
Code
# bar chart to compare country average intake:
gdd_5 <- ggplot(gdd_4, aes(x = iso3, y = mean_values, color = iso3)) +
  geom_col()
gdd_5
1.3.2.1 |Result:
This chart shows more clearly that three countries (BGR, ROU and POL) have a lower average intake and that EST has a higher one.
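The low and high countries may be easier to spot when the bars are sorted by value. A small sketch using `reorder()`, with the top three rows of the summary table above as sample data (the object names `top3`, `ordered_iso3` and `p_sorted` are hypothetical):

```r
library(ggplot2)

# First three rows of the per-country summary printed above
top3 <- data.frame(
  iso3        = c("EST", "CZE", "BLR"),
  mean_values = c(7.12, 4.45, 4.43)
)

# reorder() turns iso3 into a factor whose levels follow mean_values (ascending)
ordered_iso3 <- reorder(top3$iso3, top3$mean_values)

# The same reordering applied inside aes() sorts the bars from low to high
p_sorted <- ggplot(top3, aes(x = reorder(iso3, mean_values), y = mean_values)) +
  geom_col() +
  labs(x = "iso3")
p_sorted
```

Sorting is only a presentation choice; the underlying averages are unchanged.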
1.3.3 | Try To Understand if Any Specific Age Group Affects Lower Intake Countries’ Values:
1.3.3.1 |For Lower Intake Countries:
We can analyse the countries with outlier values in more detail. To see the effect of age group on B12 intake, we select BGR, ROU and POL (the lower-intake countries), keep all residences, all education levels and all genders, split only by age group, and visualise the result:
Code
gdd_6 <- gdd %>%
  filter(female == 999, urban == 999, edu == 999, age < 999,
         iso3 %in% c("BGR", "ROU", "POL")) %>%
  group_by(age) %>%
  # average the midpoint of the 95% interval over all years and selected countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_6

# visualise to see if there is any major change by age group:
gdd_7 <- ggplot(gdd_6, aes(x = age, y = mean_values, fill = age)) +
  geom_col() +
  geom_text(aes(label = round(mean_values, 1)))
gdd_7
1.3.3.2 |Result for Lower Intake Countries:
B12 intake for children up to 10 years old is very low (ranging from 1.5 µg to 2.5 µg), whereas the age groups with higher intake average roughly above 3 µg. No single age group appears to have an intake so low that it pulls down the average for these countries.
1.3.3.3 | All Countries:
To check whether the age-group pattern in the lower-intake countries resembles that of the other countries, we build the same table and bar chart for all countries:
Code
gdd_8 <- gdd %>%
  filter(female == 999, urban == 999, edu == 999, age < 999) %>%
  group_by(age) %>%
  # average the midpoint of the 95% interval over all years and countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_8
The pattern of children's intake relative to adolescent and adult intake in the lower-average countries is in line with that of all countries. A particularly low intake in one specific age group is therefore not what lowers the average here.
1.3.4 |Comparison of Lower-Intake / Higher-Intake / All Countries In Terms of Gender and Residence:
1.3.4.1 |All Countries:
Code
gdd_10 <- gdd_f %>%
  filter(age == 999, gender %in% c("female", "male"),
         residence %in% c("urban", "rural"), edu == 999) %>%
  group_by(gender, residence) %>%
  # average the midpoint of the 95% interval over all years and countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
`summarise()` has grouped output by 'gender'. You can override using the
`.groups` argument.
gdd_12 <- gdd_f %>%
  filter(age == 999, gender %in% c("female", "male"),
         residence %in% c("urban", "rural"), edu == 999,
         iso3 %in% c("BGR", "ROU", "POL")) %>%
  group_by(gender, residence) %>%
  # average the midpoint of the 95% interval over all years and selected countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
`summarise()` has grouped output by 'gender'. You can override using the
`.groups` argument.
gdd_14 <- gdd_f %>%
  filter(age == 999, gender %in% c("female", "male"),
         residence %in% c("urban", "rural"), edu == 999, iso3 == "EST") %>%
  group_by(gender, residence) %>%
  # average the midpoint of the 95% interval over all years
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
`summarise()` has grouped output by 'gender'. You can override using the
`.groups` argument.
The gap between females and males is smallest for EST, the high-intake country, and largest for the lower-intake countries; in other words, female intake falls furthest below male intake in the low-intake countries. The gap between rural and urban areas is similar in all three groups, with urban areas having the higher intake.
1.3.5 |Comparison of Lower-Intake / Higher-Intake / All Countries In Terms of Education Level and Gender:
1.3.5.1 |All Countries:
Code
gdd_16 <- gdd_f %>%
  filter(age == 999, urban == 999,
         edu_level %in% c("low", "medium", "high"),
         gender %in% c("male", "female")) %>%
  group_by(edu_level, gender) %>%
  # average the midpoint of the 95% interval over all years and countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
`summarise()` has grouped output by 'edu_level'. You can override using the
`.groups` argument.
Code
gdd_16
# A tibble: 6 × 3
# Groups: edu_level [3]
edu_level gender mean_values
<chr> <chr> <dbl>
1 medium male 4.58
2 high male 4.49
3 low male 4.27
4 medium female 4.12
5 high female 4.04
6 low female 3.84
gdd_18 <- gdd_f %>%
  filter(age == 999, urban == 999,
         edu_level %in% c("low", "medium", "high"),
         gender %in% c("male", "female"),
         iso3 %in% c("BGR", "ROU", "POL")) %>%
  group_by(edu_level, gender) %>%
  # average the midpoint of the 95% interval over all years and selected countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
`summarise()` has grouped output by 'edu_level'. You can override using the
`.groups` argument.
Code
gdd_18
# A tibble: 6 × 3
# Groups: edu_level [3]
edu_level gender mean_values
<chr> <chr> <dbl>
1 medium male 3.73
2 high male 3.64
3 low male 3.48
4 medium female 3.12
5 high female 3.05
6 low female 2.90
gdd_20 <- gdd_f %>%
  filter(age == 999, urban == 999,
         edu_level %in% c("low", "medium", "high"),
         gender %in% c("male", "female"), iso3 == "EST") %>%
  group_by(edu_level, gender) %>%
  # average the midpoint of the 95% interval over all years
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
`summarise()` has grouped output by 'edu_level'. You can override using the
`.groups` argument.
Code
gdd_20
# A tibble: 6 × 3
# Groups: edu_level [3]
edu_level gender mean_values
<chr> <chr> <dbl>
1 medium male 7.50
2 high male 7.33
3 medium female 7.02
4 low male 6.98
5 high female 6.86
6 low female 6.53
---
title: "Sun Forest: Assignment 1"
date: "2022-11-21"
editor: visual
code-fold: true
code-tools: true
---

In this assignment, we analyse a dataset from the Global Dietary Database website covering vitamin B12 intake in former Soviet Union countries. Each row is a B12 intake estimate for one country and one year between 1990 and 2020 (every five years, plus 2018), broken down by gender, residence (urban or rural), age group and education level.

## \| IMPORTING THE DATASET AND THE PACKAGES

```{r}
gdd <- read.csv("https://raw.githubusercontent.com/berkorbay/datasets/master/gdd/gdd_b12_levels.csv")
library(dplyr)
library(ggplot2)
```

## \| PREPROCESSING

```{r}
# 1.1) Gender
gdd_g <- gdd %>%
  mutate(gender = case_when(
    female == 1 ~ 'female',
    female == 0 ~ 'male',
    female == 999 ~ 'all genders'
  ))
gdd_g %>% head(10)

# 1.2) Urban / Rural
gdd_r <- gdd_g %>%
  mutate(residence = case_when(
    urban == 1 ~ 'urban',
    urban == 0 ~ 'rural',
    urban == 999 ~ 'all residences'
  ))
gdd_r %>% head(10)

# 1.3) Education Level
gdd_f <- gdd_r %>%
  mutate(edu_level = case_when(
    edu == 1 ~ 'low',
    edu == 2 ~ 'medium',
    edu == 3 ~ 'high',
    edu == 999 ~ 'all levels'
  ))
gdd_f %>% head(10)
```

## \| DATA ANALYSIS

### \| Mean and Median Change With Time for Each Country for General Population:

#### \| Prepare The Data:

To compare general B12 intake across countries, keep only the rows where every grouping variable equals 999 (the code for the general population). Then report, for each country and year, the median and a mean value taken as the midpoint of the 95% interval bounds:

```{r}
gdd_1 <- gdd %>%
  filter(age == 999, female == 999, urban == 999, edu == 999) %>%
  group_by(iso3, year) %>%
  summarise(mean_values = (lowerci_95 + upperci_95) / 2, median) %>%
  arrange(desc(median))
gdd_1
```

#### \|Median of Countries Across Years:

```{r}
gdd_2 <- ggplot(gdd_1, aes(x = year, y = median, color = iso3)) +
  geom_line()
gdd_2
```

#### \|Result for Country Medians Across Years:

No country shows an extreme change in median B12 intake over the years; TJK changes the most. Among the general-population intakes, BGR and, to a lesser extent, TJK are outliers with lower medians, while EST is an outlier with a higher median. It is difficult to identify a trend for any individual country.

#### \|Mean of Countries Across Years:

```{r}
gdd_3 <- ggplot(gdd_1, aes(x = year, y = mean_values, color = iso3)) +
  geom_line()
gdd_3
```

#### \|Result for Country Means Across Years:

For the countries close to the average, the mean rises markedly from 2000 to 2010, declines towards 2015, and then stays constant between 2015 and 2020.

In terms of the mean, there are outliers below 4 µg, but they are hard to identify because the line colors are similar. Consistent with the median graph, EST is the upper outlier with the highest intake.

### \|Comparison of Average B12 Intake of General Population For All The Years In Terms of Country

To compare the general-population mean B12 intake across countries in a bar chart, we first summarise the table by averaging over all years instead of keeping one value per year:

```{r}
gdd_4 <- gdd %>%
  filter(age == 999, female == 999, urban == 999, edu == 999) %>%
  group_by(iso3) %>%
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_4

# bar chart to compare country average intake:
gdd_5 <- ggplot(gdd_4, aes(x = iso3, y = mean_values, color = iso3)) +
  geom_col()
gdd_5
```

#### \|Result:

This chart shows more clearly that three countries (BGR, ROU and POL) have a lower average intake and that EST has a higher one.

### \| Try To Understand if Any Specific Age Group Affects Lower Intake Countries' Values:

#### \|For Lower Intake Countries:

We can analyse the countries with outlier values in more detail. To see the effect of age group on B12 intake, we select BGR, ROU and POL (the lower-intake countries), keep all residences, all education levels and all genders, split only by age group, and visualise the result:

```{r}
gdd_6 <- gdd %>%
  filter(female == 999, urban == 999, edu == 999, age < 999,
         iso3 %in% c("BGR", "ROU", "POL")) %>%
  group_by(age) %>%
  # average the midpoint of the 95% interval over all years and selected countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_6

# visualise to see if there is any major change by age group:
gdd_7 <- ggplot(gdd_6, aes(x = age, y = mean_values, fill = age)) +
  geom_col() +
  geom_text(aes(label = round(mean_values, 1)))
gdd_7
```

#### \|Result for Lower Intake Countries:

B12 intake for children up to 10 years old is very low (ranging from 1.5 µg to 2.5 µg), whereas the age groups with higher intake average roughly above 3 µg. No single age group appears to have an intake so low that it pulls down the average for these countries.

#### \| All Countries:

To check whether the age-group pattern in the lower-intake countries resembles that of the other countries, we build the same table and bar chart for all countries:

```{r}
gdd_8 <- gdd %>%
  filter(female == 999, urban == 999, edu == 999, age < 999) %>%
  group_by(age) %>%
  # average the midpoint of the 95% interval over all years and countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_8

gdd_9 <- ggplot(gdd_8, aes(x = age, y = mean_values, fill = age)) +
  geom_col() +
  geom_text(aes(label = round(mean_values, 1)))
gdd_9
```

#### \|Result for All Countries and Comparison:

The pattern of children's intake relative to adolescent and adult intake in the lower-average countries is in line with that of all countries. A particularly low intake in one specific age group is therefore not what lowers the average here.

### \|Comparison of Lower-Intake / Higher-Intake / All Countries In Terms of Gender and Residence:

#### \|All Countries:

```{r}
gdd_10 <- gdd_f %>%
  filter(age == 999, gender %in% c("female", "male"),
         residence %in% c("urban", "rural"), edu == 999) %>%
  group_by(gender, residence) %>%
  # average the midpoint of the 95% interval over all years and countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_10

gdd_11 <- ggplot(gdd_10, aes(x = residence, y = mean_values, fill = gender)) +
  geom_bar(stat = "identity") +
  expand_limits(x = 0) +
  expand_limits(y = 0) +
  geom_text(aes(label = round(mean_values, 1)), position = position_stack(vjust = 0.5))
gdd_11
```

#### \|Lower Intake Countries (BGR, ROU, POL):

```{r}
gdd_12 <- gdd_f %>%
  filter(age == 999, gender %in% c("female", "male"),
         residence %in% c("urban", "rural"), edu == 999,
         iso3 %in% c("BGR", "ROU", "POL")) %>%
  group_by(gender, residence) %>%
  # average the midpoint of the 95% interval over all years and selected countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_12

gdd_13 <- ggplot(gdd_12, aes(x = residence, y = mean_values, fill = gender)) +
  geom_bar(stat = "identity") +
  expand_limits(x = 0) +
  expand_limits(y = 0) +
  geom_text(aes(label = round(mean_values, 1)), position = position_stack(vjust = 0.5))
gdd_13
```

#### \|Higher Intake Country (EST):

```{r}
gdd_14 <- gdd_f %>%
  filter(age == 999, gender %in% c("female", "male"),
         residence %in% c("urban", "rural"), edu == 999, iso3 == "EST") %>%
  group_by(gender, residence) %>%
  # average the midpoint of the 95% interval over all years
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_14

gdd_15 <- ggplot(gdd_14, aes(x = residence, y = mean_values, fill = gender)) +
  geom_bar(stat = "identity") +
  expand_limits(x = 0) +
  expand_limits(y = 0) +
  geom_text(aes(label = round(mean_values, 1)), position = position_stack(vjust = 0.5))
gdd_15
```

#### \|Result:

The gap between females and males is smallest for EST, the high-intake country, and largest for the lower-intake countries; in other words, female intake falls furthest below male intake in the low-intake countries. The gap between rural and urban areas is similar in all three groups, with urban areas having the higher intake.

### \|Comparison of Lower-Intake / Higher-Intake / All Countries In Terms of Education Level and Gender:

#### \|All Countries:

```{r}
gdd_16 <- gdd_f %>%
  filter(age == 999, urban == 999,
         edu_level %in% c("low", "medium", "high"),
         gender %in% c("male", "female")) %>%
  group_by(edu_level, gender) %>%
  # average the midpoint of the 95% interval over all years and countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_16

gdd_17 <- ggplot(gdd_16, aes(x = gender, y = mean_values, fill = edu_level)) +
  geom_bar(stat = "identity") +
  expand_limits(x = 0) +
  expand_limits(y = 0) +
  geom_text(aes(label = round(mean_values, 1)), position = position_stack(vjust = 0.5))
gdd_17
```

#### \|Lower Intake Countries:

```{r}
gdd_18 <- gdd_f %>%
  filter(age == 999, urban == 999,
         edu_level %in% c("low", "medium", "high"),
         gender %in% c("male", "female"),
         iso3 %in% c("BGR", "ROU", "POL")) %>%
  group_by(edu_level, gender) %>%
  # average the midpoint of the 95% interval over all years and selected countries
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_18

gdd_19 <- ggplot(gdd_18, aes(x = gender, y = mean_values, fill = edu_level)) +
  geom_bar(stat = "identity") +
  expand_limits(x = 0) +
  expand_limits(y = 0) +
  geom_text(aes(label = round(mean_values, 1)), position = position_stack(vjust = 0.5))
gdd_19
```

#### \|Higher Intake Countries:

```{r}
gdd_20 <- gdd_f %>%
  filter(age == 999, urban == 999,
         edu_level %in% c("low", "medium", "high"),
         gender %in% c("male", "female"), iso3 == "EST") %>%
  group_by(edu_level, gender) %>%
  # average the midpoint of the 95% interval over all years
  summarise(mean_values = mean(lowerci_95 + upperci_95) / 2) %>%
  arrange(desc(mean_values))
gdd_20

gdd_21 <- ggplot(gdd_20, aes(x = gender, y = mean_values, fill = edu_level)) +
  geom_bar(stat = "identity") +
  expand_limits(x = 0) +
  expand_limits(y = 0) +
  geom_text(aes(label = round(mean_values, 1)), position = position_stack(vjust = 0.5))
gdd_21
```

#### \|Result:

Regardless of the country group, the difference in B12 intake between males and females does not vary noticeably with education level.
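
As a rough check of this statement, the male minus female gap can be computed per education level from the three summary tables printed earlier (`gdd_16`, `gdd_18`, `gdd_20`). The sketch below retypes those rounded values by hand; the object names `tables`, `gaps` and `gap_spread` and the `country_group` labels are assumptions introduced for illustration:

```r
library(dplyr)

# Rounded values copied from the printed gdd_16, gdd_18 and gdd_20 tables
tables <- data.frame(
  country_group = rep(c("all", "low_intake", "high_intake"), each = 6),
  edu_level     = rep(rep(c("medium", "high", "low"), each = 2), times = 3),
  gender        = rep(c("male", "female"), times = 9),
  mean_values   = c(
    4.58, 4.12, 4.49, 4.04, 4.27, 3.84,  # all countries
    3.73, 3.12, 3.64, 3.05, 3.48, 2.90,  # BGR, ROU, POL
    7.50, 7.02, 7.33, 6.86, 6.98, 6.53   # EST
  )
)

# Male minus female gap for each country group and education level
gaps <- tables %>%
  group_by(country_group, edu_level) %>%
  summarise(
    gap = round(mean_values[gender == "male"] - mean_values[gender == "female"], 2),
    .groups = "drop"
  )

# How much the gap varies across education levels within each group
gap_spread <- gaps %>%
  group_by(country_group) %>%
  summarise(spread = round(max(gap) - min(gap), 2), .groups = "drop")

gaps
gap_spread
```

With these rounded inputs the gap changes by only 0.03 across education levels within each country group, which is in line with the result above, although the gap itself is larger in the lower-intake group (about 0.6) than in the other two (about 0.45).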