1. Esoph Analysis

1.1. Preparation for the Data Analysis

The packages used in both analyses are loaded first.

library(tidyverse)
library(rio)
library(ggplot2)
library(dplyr)
library(lubridate)

To understand what the data contains, a quick snapshot was taken as shown below. The esoph data set comes from a case-control study of oesophageal cancer and records the number of cases and controls (ncases, ncontrols) for each combination of age group, alcohol consumption and tobacco consumption.

head(esoph, 10)
##    agegp     alcgp    tobgp ncases ncontrols
## 1  25-34 0-39g/day 0-9g/day      0        40
## 2  25-34 0-39g/day    10-19      0        10
## 3  25-34 0-39g/day    20-29      0         6
## 4  25-34 0-39g/day      30+      0         5
## 5  25-34     40-79 0-9g/day      0        27
## 6  25-34     40-79    10-19      0         7
## 7  25-34     40-79    20-29      0         4
## 8  25-34     40-79      30+      0         7
## 9  25-34    80-119 0-9g/day      0         2
## 10 25-34    80-119    10-19      0         1
summary(esoph)
##    agegp          alcgp         tobgp        ncases         ncontrols    
##  25-34:15   0-39g/day:23   0-9g/day:24   Min.   : 0.000   Min.   : 1.00  
##  35-44:15   40-79    :23   10-19   :24   1st Qu.: 0.000   1st Qu.: 3.00  
##  45-54:16   80-119   :21   20-29   :20   Median : 1.000   Median : 6.00  
##  55-64:16   120+     :21   30+     :20   Mean   : 2.273   Mean   :11.08  
##  65-74:15                                3rd Qu.: 4.000   3rd Qu.:14.00  
##  75+  :11                                Max.   :17.000   Max.   :60.00

1.2. Distribution of The Cancer Case Among Age Groups

The Cancer Rate is defined here as the number of cases (ncases) per 100 controls (ncontrols) in each age group. After the calculation, a pie chart was drawn as shown below. According to the table, the riskiest age groups, in descending order, are 65-74 (34%), 55-64 (31%), 75+ (30%) …

dt1<- esoph %>%
  group_by(agegp)%>%
  summarise(total_ncases = sum(ncases), total_control = sum(ncontrols))%>%
  mutate(Cancer_Rate = round(total_ncases*100/total_control))%>%
  arrange(desc(Cancer_Rate))
  
  dt1
## # A tibble: 6 x 4
##   agegp total_ncases total_control Cancer_Rate
##   <ord>        <dbl>         <dbl>       <dbl>
## 1 65-74           55           161          34
## 2 55-64           76           242          31
## 3 75+             13            44          30
## 4 45-54           46           213          22
## 5 35-44            9           199           5
## 6 25-34            1           116           1
ggplot(dt1, aes(x = "", y = Cancer_Rate, fill = agegp)) +
  geom_bar(stat = "identity") +
  # position_stack(vjust = 0.5) centres each label inside its own slice
  geom_text(aes(label = Cancer_Rate), position = position_stack(vjust = 0.5), size = 5, color = "white") +
  coord_polar(theta = "y") +
  scale_fill_brewer(palette = "Dark2") +
  theme_void() +
  labs(title = "Cancer Rate (%) by Age Group", fill = "Age Group")
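
As a quick check of the arithmetic, the rates can be recomputed from the totals printed in the dt1 table. This is only an illustrative sketch: the `totals` data frame and the `share_of_pie` column are names introduced here, not part of the original analysis. It also shows that the rates are not shares of a whole. They add up to more than 100, so each pie slice represents an age group's share of the summed rates, not a percentage of people.

```r
# Totals copied from the dt1 table printed above
totals <- data.frame(
  agegp         = c("65-74", "55-64", "75+", "45-54", "35-44", "25-34"),
  total_ncases  = c(55, 76, 13, 46, 9, 1),
  total_control = c(161, 242, 44, 213, 199, 116)
)

# Cases per 100 controls, rounded as in the original code
totals$Cancer_Rate <- round(totals$total_ncases * 100 / totals$total_control)

# Size of each pie slice: the group's rate divided by the sum of all rates
totals$share_of_pie <- round(100 * totals$Cancer_Rate / sum(totals$Cancer_Rate), 1)

print(totals)
```

If a direct reading of the rates matters more than the part-to-whole impression, a bar chart of Cancer_Rate by agegp may be easier to interpret than the pie chart.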

1.3. Relationship Between The Cancer Case and Alcohol Consumption

This analysis is a warning for people who consume 120 g or more of alcohol per day. In this data, high alcohol consumption goes together with the highest cancer rates, especially for people aged 45 and older.

dt2<- esoph %>%
  group_by(agegp,alcgp)%>%
  summarise(total_ncases = sum(ncases), total_control = sum(ncontrols))%>%
  mutate(Cancer_Rate = total_ncases*100/ total_control)%>%
  arrange(desc(Cancer_Rate))
  
  dt2
## # A tibble: 24 x 5
## # Groups:   agegp [6]
##    agegp alcgp  total_ncases total_control Cancer_Rate
##    <ord> <ord>         <dbl>         <dbl>       <dbl>
##  1 75+   80-119            2             2       100  
##  2 75+   120+              3             3       100  
##  3 45-54 120+             13            15        86.7
##  4 65-74 120+              6             8        75  
##  5 55-64 120+             18            26        69.2
##  6 55-64 80-119           24            43        55.8
##  7 65-74 40-79            25            53        47.2
##  8 65-74 80-119           13            29        44.8
##  9 35-44 120+              4            10        40  
## 10 75+   40-79             4            12        33.3
## # ... with 14 more rows
  ggplot(dt2, aes(x = agegp, y = Cancer_Rate, fill= alcgp))+
  geom_bar(stat="identity", position = "dodge" ) +theme(axis.text.x = element_text(angle=0, size=9, vjust=0.5,hjust=1)) +
    labs(title = "Cancer Rate Distribution", x = "Age Range", fill="Alcohol Consumption")
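
Some of the highest rates in the table rest on very few people, for example the two 75+ rows with only 2 and 3 controls. One possible way to make the ranking less sensitive to such small groups is to drop them before sorting. The sketch below assumes a cut-off is acceptable for the purpose; the threshold `min_controls` and the name `dt2_reliable` are hypothetical and do not come from the original analysis.

```r
library(dplyr)

# Hypothetical cut-off: ignore age/alcohol groups with fewer controls than this
min_controls <- 10

dt2_reliable <- esoph %>%
  group_by(agegp, alcgp) %>%
  summarise(total_ncases  = sum(ncases),
            total_control = sum(ncontrols),
            .groups = "drop") %>%              # return an ungrouped tibble
  mutate(Cancer_Rate = total_ncases * 100 / total_control) %>%
  filter(total_control >= min_controls) %>%    # keep only the larger groups
  arrange(desc(Cancer_Rate))

dt2_reliable
```

The choice of 10 is arbitrary; a different cut-off will keep or drop different rows.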

1.4. Relationship Between The Cancer Case and Tobacco Consumption

This analysis gives a similar warning for people who consume 30 g or more of tobacco per day. The risk appears to be higher especially for people aged 45 and older.

dt3<- esoph %>%
  group_by(agegp,tobgp)%>%
  summarise(total_ncases = sum(ncases), total_control = sum(ncontrols))%>%
  mutate(Cancer_Rate = total_ncases*100/ total_control)%>%
  arrange(desc(Cancer_Rate))
  
  dt3
## # A tibble: 24 x 5
## # Groups:   agegp [6]
##    agegp tobgp    total_ncases total_control Cancer_Rate
##    <ord> <ord>           <dbl>         <dbl>       <dbl>
##  1 55-64 30+                16            22        72.7
##  2 45-54 30+                11            19        57.9
##  3 65-74 20-29              10            20        50  
##  4 65-74 30+                 2             4        50  
##  5 75+   30+                 2             4        50  
##  6 75+   10-19               5            11        45.5
##  7 55-64 10-19              23            65        35.4
##  8 55-64 20-29              12            38        31.6
##  9 65-74 10-19              12            38        31.6
## 10 65-74 0-9g/day           31            99        31.3
## # ... with 14 more rows
  ggplot(dt3, aes(x = agegp, y = Cancer_Rate, fill= tobgp))+
  geom_bar(stat="identity", position = "dodge") +theme(axis.text.x = element_text(angle=0, size=9, vjust=0.5,hjust=1)) +
    labs(title = "Cancer Rate Distribution", x = "Age Range",fill="Tobacco Consumption")

1.5. Combined Effect of Alcohol and Tobacco Consumption

According to the last two analyses, both tobacco and alcohol consumption are associated with higher cancer rates, especially among older people. This last analysis looks at what happens when the two habits are combined. In this data, the highest cancer rates appear among people who combine 40-79 g or more of alcohol per day with 30+ g of tobacco, and among tobacco users who consume more than 80 g of alcohol per day. Because this is a case-control sample, these ratios indicate a strong association, not a certainty that a given person will develop cancer.

dt3<- esoph %>%
  group_by(agegp,alcgp, tobgp)%>%
  summarise(total_ncases = sum(ncases), total_control = sum(ncontrols))%>%
  mutate(Cancer_Rate = total_ncases*100/ total_control)%>%
  arrange(desc(Cancer_Rate))
  
  dt3
  ggplot(dt3, aes(x = alcgp, y = Cancer_Rate, fill= tobgp))+
  geom_bar(stat="identity", position = "dodge") +theme(axis.text.x = element_text(angle=0, size=9, vjust=0.5,hjust=1)) +
    labs(title = "Cancer Rate Distribution", x = "Alcohol Consumption", fill = "Tobacco Consumption")
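
The table behind the chart above is still split by age group, while the chart maps only alcgp and tobgp. Several age groups therefore share each bar position and are drawn on top of one another. A minimal sketch of one alternative, assuming the goal is a single rate per alcohol/tobacco combination, is to sum over age before computing the rate. The names `dt_combined` and `p_combined` are introduced here for illustration only.

```r
library(dplyr)
library(ggplot2)

# One row per alcohol/tobacco combination, summed over all age groups
dt_combined <- esoph %>%
  group_by(alcgp, tobgp) %>%
  summarise(total_ncases  = sum(ncases),
            total_control = sum(ncontrols),
            .groups = "drop") %>%
  mutate(Cancer_Rate = total_ncases * 100 / total_control)

# geom_col() is equivalent to geom_bar(stat = "identity")
p_combined <- ggplot(dt_combined, aes(x = alcgp, y = Cancer_Rate, fill = tobgp)) +
  geom_col(position = "dodge") +
  labs(title = "Cancer Rate by Alcohol and Tobacco Consumption",
       x = "Alcohol Consumption", fill = "Tobacco Consumption")

p_combined
```

To keep the age dimension visible instead, adding `facet_wrap(~ agegp)` to the original three-way table would give one panel per age group.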

2. Young People Response Analysis

2.1. Preparation to The Data Analysis

After a quick search, the “Young People Responses” data was downloaded from the Kaggle website and uploaded to my progress journal on GitHub, so that anyone who wants to repeat the same work can access it quickly. To get rid of missing values, the na.omit function was applied to the raw data and the result was assigned to raw_ys.

 ys<- rio::import("https://github.com/pjournal/mef04-sivasbaris/blob/gh-pages/young_people_responses.csv?raw=True")

raw_ys<- na.omit(ys)
glimpse(raw_ys)
## Rows: 686
## Columns: 150
## $ Music                            <int> 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5...
## $ `Slow songs or fast songs`       <int> 3, 4, 5, 3, 3, 5, 3, 3, 3, 3, 3, 3...
## $ Dance                            <int> 2, 2, 2, 4, 2, 5, 3, 2, 3, 1, 1, 5...
## $ Folk                             <int> 1, 1, 2, 3, 3, 3, 2, 5, 2, 1, 2, 3...
## $ Country                          <int> 2, 1, 3, 2, 2, 1, 1, 2, 1, 1, 1, 2...
## $ `Classical music`                <int> 2, 1, 4, 4, 3, 2, 2, 2, 2, 4, 4, 1...
## $ Musical                          <int> 1, 2, 5, 3, 3, 2, 2, 5, 3, 1, 3, 5...
## $ Pop                              <int> 5, 3, 3, 5, 2, 5, 4, 3, 4, 2, 3, 5...
## $ Rock                             <int> 5, 5, 5, 3, 5, 3, 5, 5, 3, 5, 5, 2...
## $ `Metal or Hardrock`              <int> 1, 4, 3, 1, 5, 1, 1, 2, 2, 1, 4, 1...
## $ Punk                             <int> 1, 4, 4, 2, 3, 1, 2, 3, 1, 1, 2, 1...
## $ `Hiphop, Rap`                    <int> 1, 1, 1, 5, 4, 3, 3, 2, 3, 1, 3, 2...
## $ `Reggae, Ska`                    <int> 1, 3, 4, 3, 3, 1, 2, 4, 2, 1, 1, 1...
## $ `Swing, Jazz`                    <int> 1, 1, 3, 2, 4, 1, 2, 4, 2, 2, 1, 3...
## $ `Rock n roll`                    <int> 3, 4, 5, 1, 4, 2, 3, 4, 3, 2, 4, 2...
## $ Alternative                      <int> 1, 4, 5, 2, 5, 3, 1, 4, 3, 5, 3, 1...
## $ Latino                           <int> 1, 2, 5, 4, 3, 3, 2, 5, 3, 2, 2, 3...
## $ `Techno, Trance`                 <int> 1, 1, 1, 2, 1, 5, 3, 1, 4, 1, 1, 1...
## $ Opera                            <int> 1, 1, 3, 2, 3, 2, 2, 2, 2, 2, 2, 1...
## $ Movies                           <int> 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5...
## $ Horror                           <int> 4, 2, 3, 4, 5, 2, 4, 2, 5, 3, 1, 3...
## $ Thriller                         <int> 2, 2, 4, 4, 5, 1, 4, 1, 4, 4, 5, 2...
## $ Comedy                           <int> 5, 4, 4, 5, 5, 5, 5, 5, 5, 4, 4, 5...
## $ Romantic                         <int> 4, 3, 2, 2, 2, 3, 2, 5, 3, 3, 3, 5...
## $ `Sci-fi`                         <int> 4, 4, 4, 3, 3, 1, 3, 1, 3, 2, 1, 2...
## $ War                              <int> 1, 1, 2, 3, 3, 3, 3, 3, 2, 5, 4, 1...
## $ `Fantasy/Fairy tales`            <int> 5, 3, 5, 4, 4, 5, 4, 4, 5, 5, 5, 5...
## $ Animated                         <int> 5, 5, 5, 4, 3, 5, 4, 4, 5, 5, 3, 4...
## $ Documentary                      <int> 3, 4, 2, 3, 3, 3, 3, 4, 3, 5, 3, 2...
## $ Western                          <int> 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1...
## $ Action                           <int> 2, 4, 1, 4, 4, 2, 3, 2, 3, 4, 1, 3...
## $ History                          <int> 1, 1, 1, 3, 5, 3, 5, 3, 3, 2, 4, 2...
## $ Psychology                       <int> 5, 3, 2, 2, 3, 3, 2, 2, 3, 2, 4, 2...
## $ Politics                         <int> 1, 4, 1, 3, 4, 1, 3, 3, 3, 5, 4, 1...
## $ Mathematics                      <int> 3, 5, 5, 2, 2, 1, 1, 3, 2, 1, 1, 1...
## $ Physics                          <int> 3, 2, 2, 2, 3, 1, 1, 1, 1, 1, 1, 1...
## $ Internet                         <int> 5, 4, 4, 2, 4, 2, 5, 5, 4, 5, 3, 3...
## $ PC                               <int> 3, 4, 2, 2, 4, 1, 4, 1, 5, 4, 2, 3...
## $ `Economy Management`             <int> 5, 5, 4, 2, 1, 3, 1, 4, 3, 1, 1, 3...
## $ Biology                          <int> 3, 1, 1, 3, 4, 5, 2, 2, 2, 1, 5, 1...
## $ Chemistry                        <int> 3, 1, 1, 3, 4, 5, 2, 1, 1, 1, 5, 1...
## $ Reading                          <int> 3, 4, 5, 5, 3, 3, 2, 4, 3, 3, 5, 4...
## $ Geography                        <int> 3, 4, 2, 2, 3, 3, 3, 4, 3, 5, 3, 1...
## $ `Foreign languages`              <int> 5, 5, 5, 3, 4, 4, 4, 5, 5, 2, 5, 5...
## $ Medicine                         <int> 3, 1, 2, 3, 4, 5, 1, 1, 2, 1, 5, 1...
## $ Law                              <int> 1, 2, 3, 2, 3, 3, 2, 1, 4, 3, 2, 1...
## $ Cars                             <int> 1, 2, 1, 3, 5, 4, 1, 1, 2, 1, 3, 1...
## $ `Art exhibitions`                <int> 1, 2, 5, 1, 2, 1, 1, 4, 2, 5, 1, 3...
## $ Religion                         <int> 1, 1, 5, 4, 2, 1, 2, 4, 2, 1, 1, 1...
## $ `Countryside, outdoors`          <int> 5, 1, 5, 4, 5, 4, 2, 4, 4, 5, 5, 5...
## $ Dancing                          <int> 3, 1, 5, 1, 1, 3, 1, 5, 1, 1, 3, 3...
## $ `Musical instruments`            <int> 3, 1, 5, 3, 5, 2, 1, 3, 1, 1, 4, 3...
## $ Writing                          <int> 2, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ `Passive sport`                  <int> 1, 1, 5, 3, 5, 5, 4, 4, 5, 5, 5, 3...
## $ `Active sport`                   <int> 5, 1, 2, 1, 4, 3, 5, 4, 1, 3, 3, 3...
## $ Gardening                        <int> 5, 1, 1, 4, 2, 3, 1, 1, 3, 1, 4, 1...
## $ Celebrities                      <int> 1, 2, 1, 3, 1, 1, 3, 2, 2, 2, 3, 5...
## $ Shopping                         <int> 4, 3, 4, 3, 2, 3, 3, 4, 5, 3, 2, 5...
## $ `Science and technology`         <int> 4, 3, 2, 3, 3, 4, 2, 3, 4, 3, 3, 2...
## $ Theatre                          <int> 2, 2, 5, 2, 1, 3, 2, 5, 2, 1, 2, 3...
## $ `Fun with friends`               <int> 5, 4, 5, 4, 3, 5, 4, 5, 4, 3, 4, 5...
## $ `Adrenaline sports`              <int> 4, 2, 5, 2, 3, 1, 2, 2, 1, 1, 1, 4...
## $ Pets                             <int> 4, 5, 5, 1, 2, 5, 5, 2, 5, 1, 2, 5...
## $ Flying                           <int> 1, 1, 1, 1, 3, 1, 3, 4, 1, 4, 1, 2...
## $ Storm                            <int> 1, 1, 1, 2, 2, 3, 2, 5, 1, 1, 1, 3...
## $ Darkness                         <int> 1, 1, 1, 1, 2, 2, 4, 4, 2, 1, 1, 3...
## $ Heights                          <int> 1, 2, 1, 1, 2, 1, 3, 5, 2, 3, 1, 4...
## $ Spiders                          <int> 1, 1, 1, 1, 1, 1, 1, 3, 2, 5, 1, 3...
## $ Snakes                           <int> 5, 1, 1, 1, 2, 5, 5, 4, 1, 5, 3, 4...
## $ Rats                             <int> 3, 1, 1, 2, 2, 1, 3, 4, 1, 5, 1, 3...
## $ Ageing                           <int> 1, 3, 1, 2, 1, 4, 1, 3, 1, 5, 5, 2...
## $ `Dangerous dogs`                 <int> 3, 1, 1, 4, 1, 1, 2, 5, 4, 5, 2, 1...
## $ `Fear of public speaking`        <int> 2, 4, 2, 3, 3, 1, 4, 3, 2, 5, 2, 3...
## $ Smoking                          <chr> "never smoked", "never smoked", "t...
## $ Alcohol                          <chr> "drink a lot", "drink a lot", "dri...
## $ `Healthy eating`                 <int> 4, 3, 3, 4, 2, 4, 2, 3, 3, 3, 3, 3...
## $ `Daily events`                   <int> 2, 3, 1, 3, 2, 3, 3, 4, 3, 3, 4, 5...
## $ `Prioritising workload`          <int> 2, 2, 2, 1, 2, 5, 1, 2, 2, 1, 3, 3...
## $ `Writing notes`                  <int> 5, 4, 5, 2, 3, 5, 3, 2, 4, 5, 5, 3...
## $ Workaholism                      <int> 4, 5, 3, 3, 3, 5, 2, 3, 2, 3, 4, 2...
## $ `Thinking ahead`                 <int> 2, 4, 5, 5, 3, 3, 4, 3, 3, 1, 5, 3...
## $ `Final judgement`                <int> 5, 1, 3, 5, 1, 3, 3, 5, 3, 1, 1, 3...
## $ Reliability                      <int> 4, 4, 4, 5, 3, 4, 3, 4, 4, 3, 5, 5...
## $ `Keeping promises`               <int> 4, 4, 5, 4, 4, 5, 3, 5, 4, 3, 5, 4...
## $ `Loss of interest`               <int> 1, 3, 1, 2, 3, 3, 1, 3, 1, 3, 4, 3...
## $ `Friends versus money`           <int> 3, 4, 5, 3, 2, 4, 4, 4, 3, 3, 4, 3...
## $ Funniness                        <int> 5, 3, 2, 3, 3, 4, 4, 3, 2, 5, 3, 3...
## $ Fake                             <int> 1, 2, 4, 2, 1, 1, 2, 1, 1, 3, 1, 2...
## $ `Criminal damage`                <int> 1, 1, 1, 1, 4, 2, 1, 2, 1, 5, 2, 2...
## $ `Decision making`                <int> 3, 2, 3, 3, 2, 2, 3, 5, 5, 3, 5, 5...
## $ Elections                        <int> 4, 5, 5, 5, 5, 5, 5, 5, 5, 1, 5, 3...
## $ `Self-criticism`                 <int> 1, 4, 4, 5, 4, 3, 3, 4, 4, 5, 5, 5...
## $ `Judgment calls`                 <int> 3, 4, 4, 5, 4, 5, 5, 5, 5, 3, 2, 5...
## $ Hypochondria                     <int> 1, 1, 1, 1, 1, 1, 2, 1, 2, 5, 5, 1...
## $ Empathy                          <int> 3, 2, 5, 3, 4, 4, 1, 4, 5, 5, 5, 4...
## $ `Eating to survive`              <int> 1, 1, 5, 1, 2, 1, 2, 1, 2, 1, 3, 2...
## $ Giving                           <int> 4, 2, 5, 3, 3, 5, 3, 4, 3, 1, 2, 3...
## $ `Compassion to animals`          <int> 5, 4, 4, 3, 5, 5, 5, 5, 5, 2, 5, 5...
## $ `Borrowed stuff`                 <int> 4, 3, 2, 4, 5, 5, 2, 4, 4, 2, 5, 5...
## $ Loneliness                       <int> 3, 2, 5, 3, 2, 3, 2, 2, 2, 4, 5, 2...
## $ `Cheating in school`             <int> 2, 4, 3, 5, 4, 2, 5, 3, 3, 5, 2, 3...
## $ Health                           <int> 1, 4, 2, 3, 3, 3, 3, 4, 3, 2, 5, 3...
## $ `Changing the past`              <int> 1, 4, 5, 4, 3, 1, 2, 2, 3, 3, 3, 2...
## $ God                              <int> 1, 1, 5, 5, 3, 5, 4, 5, 4, 1, 1, 3...
## $ Dreams                           <int> 4, 3, 1, 3, 3, 3, 4, 3, 3, 3, 3, 4...
## $ Charity                          <int> 2, 1, 3, 3, 2, 3, 1, 2, 1, 3, 2, 1...
## $ `Number of friends`              <int> 3, 3, 3, 3, 3, 3, 4, 3, 3, 4, 3, 3...
## $ Punctuality                      <chr> "i am always on time", "i am often...
## $ Lying                            <chr> "never", "sometimes", "sometimes",...
## $ Waiting                          <int> 3, 3, 2, 3, 3, 4, 1, 1, 3, 3, 2, 3...
## $ `New environment`                <int> 4, 4, 3, 4, 4, 5, 4, 4, 3, 5, 1, 2...
## $ `Mood swings`                    <int> 3, 4, 4, 2, 3, 5, 3, 4, 3, 5, 4, 3...
## $ `Appearence and gestures`        <int> 4, 4, 3, 3, 3, 4, 4, 3, 4, 2, 4, 5...
## $ Socializing                      <int> 3, 4, 5, 3, 4, 5, 2, 4, 3, 5, 2, 3...
## $ Achievements                     <int> 4, 2, 3, 3, 2, 4, 4, 4, 3, 3, 3, 4...
## $ `Responding to a serious letter` <int> 3, 4, 4, 3, 2, 3, 3, 4, 4, 3, 4, 3...
## $ Children                         <int> 5, 2, 4, 5, 3, 2, 4, 3, 5, 5, 5, 3...
## $ Assertiveness                    <int> 1, 2, 3, 4, 4, 3, 3, 4, 2, 4, 5, 2...
## $ `Getting angry`                  <int> 1, 5, 4, 2, 3, 3, 1, 3, 1, 3, 5, 3...
## $ `Knowing the right people`       <int> 3, 4, 3, 3, 4, 4, 4, 4, 3, 5, 3, 4...
## $ `Public speaking`                <int> 5, 4, 2, 5, 4, 3, 5, 5, 3, 5, 3, 5...
## $ Unpopularity                     <int> 5, 4, 4, 5, 4, 3, 2, 3, 3, 2, 2, 3...
## $ `Life struggles`                 <int> 1, 1, 4, 2, 3, 5, 2, 5, 5, 4, 5, 4...
## $ `Happiness in life`              <int> 4, 4, 4, 3, 3, 5, 4, 4, 4, 3, 4, 4...
## $ `Energy levels`                  <int> 5, 3, 4, 5, 4, 4, 4, 4, 3, 3, 3, 4...
## $ `Small - big dogs`               <int> 1, 5, 3, 3, 4, 3, 3, 1, 2, 1, 4, 5...
## $ Personality                      <int> 4, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 4...
## $ `Finding lost valuables`         <int> 3, 4, 3, 2, 3, 2, 2, 3, 2, 3, 1, 2...
## $ `Getting up`                     <int> 2, 5, 4, 4, 3, 2, 5, 4, 4, 5, 3, 5...
## $ `Interests or hobbies`           <int> 3, 3, 5, 3, 5, 4, 4, 3, 3, 4, 2, 5...
## $ `Parents' advice`                <int> 4, 2, 3, 3, 3, 4, 3, 3, 4, 4, 4, 3...
## $ `Questionnaires or polls`        <int> 3, 3, 1, 3, 4, 5, 3, 3, 4, 4, 2, 2...
## $ `Internet usage`                 <chr> "few hours a day", "few hours a da...
## $ Finances                         <int> 3, 3, 2, 4, 2, 4, 3, 4, 2, 2, 4, 3...
## $ `Shopping centres`               <int> 4, 4, 4, 3, 3, 3, 4, 4, 4, 2, 1, 5...
## $ `Branded clothing`               <int> 5, 1, 1, 4, 3, 1, 4, 4, 2, 1, 1, 2...
## $ `Entertainment spending`         <int> 3, 4, 4, 3, 3, 3, 4, 2, 3, 3, 2, 3...
## $ `Spending on looks`              <int> 3, 2, 3, 3, 1, 4, 4, 3, 4, 1, 3, 4...
## $ `Spending on gadgets`            <int> 1, 5, 4, 2, 4, 1, 3, 2, 2, 1, 2, 3...
## $ `Spending on healthy eating`     <int> 3, 2, 2, 4, 4, 5, 2, 4, 2, 2, 3, 3...
## $ Age                              <int> 20, 19, 20, 20, 20, 20, 19, 19, 19...
## $ Height                           <int> 163, 163, 176, 170, 186, 177, 184,...
## $ Weight                           <int> 48, 58, 67, 59, 77, 50, 90, 60, 60...
## $ `Number of siblings`             <int> 1, 2, 2, 1, 1, 1, 1, 3, 2, 1, 10, ...
## $ Gender                           <chr> "female", "female", "female", "fem...
## $ `Left - right handed`            <chr> "right handed", "right handed", "r...
## $ Education                        <chr> "college/bachelor degree", "colleg...
## $ `Only child`                     <chr> "no", "no", "no", "no", "no", "no"...
## $ `Village - town`                 <chr> "village", "city", "city", "villag...
## $ `House - block of flats`         <chr> "block of flats", "block of flats"...
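
The effect of na.omit can be illustrated on a toy data frame: every row that contains at least one NA in any column is dropped. The data frame `toy` below is invented for this sketch and only borrows two column names from the survey.

```r
# Invented example data: rows 2 and 3 each contain one missing value
toy <- data.frame(
  Music   = c(5, NA, 4, 3),
  Smoking = c("never smoked", "tried smoking", NA, "current smoker"),
  stringsAsFactors = FALSE
)

colSums(is.na(toy))      # number of missing values per column

toy_clean <- na.omit(toy)
nrow(toy)                # rows before cleaning
nrow(toy_clean)          # rows after cleaning
```

Because a single missing answer removes the whole respondent, na.omit on a wide table such as this survey can discard many rows; comparing nrow(ys) with nrow(raw_ys) shows how many were lost.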

2.2. Selecting Data

After taking a snapshot of the cleaned data, columns 32 to 63, which run from History to Pets, were selected.

survey_df = as.data.frame(raw_ys)
survey_data = survey_df[32:63]
head(survey_data)
##   History Psychology Politics Mathematics Physics Internet PC
## 1       1          5        1           3       3        5  3
## 2       1          3        4           5       2        4  4
## 3       1          2        1           5       2        4  2
## 5       3          2        3           2       2        2  2
## 6       5          3        4           2       3        4  4
## 7       3          3        1           1       1        2  1
##   Economy Management Biology Chemistry Reading Geography Foreign languages
## 1                  5       3         3       3         3                 5
## 2                  5       1         1       4         4                 5
## 3                  4       1         1       5         2                 5
## 5                  2       3         3       5         2                 3
## 6                  1       4         4       3         3                 4
## 7                  3       5         5       3         3                 4
##   Medicine Law Cars Art exhibitions Religion Countryside, outdoors Dancing
## 1        3   1    1               1        1                     5       3
## 2        1   2    2               2        1                     1       1
## 3        2   3    1               5        5                     5       5
## 5        3   2    3               1        4                     4       1
## 6        4   3    5               2        2                     5       1
## 7        5   3    4               1        1                     4       3
##   Musical instruments Writing Passive sport Active sport Gardening Celebrities
## 1                   3       2             1            5         5           1
## 2                   1       1             1            1         1           2
## 3                   5       5             5            2         1           1
## 5                   3       1             3            1         4           3
## 6                   5       1             5            4         2           1
## 7                   2       1             5            3         3           1
##   Shopping Science and technology Theatre Fun with friends Adrenaline sports
## 1        4                      4       2                5                 4
## 2        3                      3       2                4                 2
## 3        4                      2       5                5                 5
## 5        3                      3       2                4                 2
## 6        2                      3       1                3                 3
## 7        3                      4       3                5                 1
##   Pets
## 1    4
## 2    5
## 3    5
## 5    1
## 6    2
## 7    5
summary(survey_data)
##     History        Psychology      Politics      Mathematics       Physics     
##  Min.   :1.000   Min.   :1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:2.00   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :3.000   Median :3.00   Median :3.000   Median :2.000   Median :2.000  
##  Mean   :3.226   Mean   :3.14   Mean   :2.627   Mean   :2.401   Mean   :2.096  
##  3rd Qu.:4.000   3rd Qu.:4.00   3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :5.00   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##     Internet           PC        Economy Management    Biology     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000      Min.   :1.000  
##  1st Qu.:4.000   1st Qu.:2.000   1st Qu.:1.000      1st Qu.:1.000  
##  Median :4.000   Median :3.000   Median :2.000      Median :2.000  
##  Mean   :4.188   Mean   :3.136   Mean   :2.662      Mean   :2.621  
##  3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.000      3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000      Max.   :5.000  
##    Chemistry        Reading      Geography     Foreign languages
##  Min.   :1.000   Min.   :1.0   Min.   :1.000   Min.   :1.000    
##  1st Qu.:1.000   1st Qu.:2.0   1st Qu.:2.000   1st Qu.:3.000    
##  Median :2.000   Median :3.0   Median :3.000   Median :4.000    
##  Mean   :2.121   Mean   :3.2   Mean   :3.109   Mean   :3.813    
##  3rd Qu.:3.000   3rd Qu.:5.0   3rd Qu.:4.000   3rd Qu.:5.000    
##  Max.   :5.000   Max.   :5.0   Max.   :5.000   Max.   :5.000    
##     Medicine          Law             Cars       Art exhibitions
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :2.000   Median :2.000   Median :2.000   Median :2.000  
##  Mean   :2.475   Mean   :2.224   Mean   :2.634   Mean   :2.617  
##  3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##     Religion     Countryside, outdoors    Dancing      Musical instruments
##  Min.   :1.000   Min.   :1.000         Min.   :1.000   Min.   :1.000      
##  1st Qu.:1.000   1st Qu.:3.000         1st Qu.:1.000   1st Qu.:1.000      
##  Median :2.000   Median :4.000         Median :2.000   Median :2.000      
##  Mean   :2.229   Mean   :3.614         Mean   :2.399   Mean   :2.302      
##  3rd Qu.:3.000   3rd Qu.:5.000         3rd Qu.:3.000   3rd Qu.:3.000      
##  Max.   :5.000   Max.   :5.000         Max.   :5.000   Max.   :5.000      
##     Writing      Passive sport    Active sport     Gardening    
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.000  
##  Median :1.000   Median :4.000   Median :3.000   Median :1.000  
##  Mean   :1.866   Mean   :3.394   Mean   :3.236   Mean   :1.872  
##  3rd Qu.:2.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:2.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##   Celebrities       Shopping     Science and technology    Theatre     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000          Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000          1st Qu.:2.000  
##  Median :2.000   Median :3.000   Median :3.000          Median :3.000  
##  Mean   :2.319   Mean   :3.257   Mean   :3.271          Mean   :3.023  
##  3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000          3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000          Max.   :5.000  
##  Fun with friends Adrenaline sports      Pets      
##  Min.   :2.000    Min.   :1.00      Min.   :1.000  
##  1st Qu.:4.000    1st Qu.:2.00      1st Qu.:2.000  
##  Median :5.000    Median :3.00      Median :4.000  
##  Mean   :4.552    Mean   :2.88      Mean   :3.324  
##  3rd Qu.:5.000    3rd Qu.:4.00      3rd Qu.:5.000  
##  Max.   :5.000    Max.   :5.00      Max.   :5.000
glimpse(survey_data)
## Rows: 686
## Columns: 32
## $ History                  <int> 1, 1, 1, 3, 5, 3, 5, 3, 3, 2, 4, 2, 2, 1, ...
## $ Psychology               <int> 5, 3, 2, 2, 3, 3, 2, 2, 3, 2, 4, 2, 5, 1, ...
## $ Politics                 <int> 1, 4, 1, 3, 4, 1, 3, 3, 3, 5, 4, 1, 1, 1, ...
## $ Mathematics              <int> 3, 5, 5, 2, 2, 1, 1, 3, 2, 1, 1, 1, 1, 1, ...
## $ Physics                  <int> 3, 2, 2, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ Internet                 <int> 5, 4, 4, 2, 4, 2, 5, 5, 4, 5, 3, 3, 4, 5, ...
## $ PC                       <int> 3, 4, 2, 2, 4, 1, 4, 1, 5, 4, 2, 3, 2, 2, ...
## $ `Economy Management`     <int> 5, 5, 4, 2, 1, 3, 1, 4, 3, 1, 1, 3, 3, 1, ...
## $ Biology                  <int> 3, 1, 1, 3, 4, 5, 2, 2, 2, 1, 5, 1, 2, 4, ...
## $ Chemistry                <int> 3, 1, 1, 3, 4, 5, 2, 1, 1, 1, 5, 1, 1, 1, ...
## $ Reading                  <int> 3, 4, 5, 5, 3, 3, 2, 4, 3, 3, 5, 4, 4, 1, ...
## $ Geography                <int> 3, 4, 2, 2, 3, 3, 3, 4, 3, 5, 3, 1, 1, 1, ...
## $ `Foreign languages`      <int> 5, 5, 5, 3, 4, 4, 4, 5, 5, 2, 5, 5, 3, 3, ...
## $ Medicine                 <int> 3, 1, 2, 3, 4, 5, 1, 1, 2, 1, 5, 1, 1, 5, ...
## $ Law                      <int> 1, 2, 3, 2, 3, 3, 2, 1, 4, 3, 2, 1, 1, 1, ...
## $ Cars                     <int> 1, 2, 1, 3, 5, 4, 1, 1, 2, 1, 3, 1, 1, 1, ...
## $ `Art exhibitions`        <int> 1, 2, 5, 1, 2, 1, 1, 4, 2, 5, 1, 3, 4, 1, ...
## $ Religion                 <int> 1, 1, 5, 4, 2, 1, 2, 4, 2, 1, 1, 1, 2, 2, ...
## $ `Countryside, outdoors`  <int> 5, 1, 5, 4, 5, 4, 2, 4, 4, 5, 5, 5, 3, 5, ...
## $ Dancing                  <int> 3, 1, 5, 1, 1, 3, 1, 5, 1, 1, 3, 3, 1, 2, ...
## $ `Musical instruments`    <int> 3, 1, 5, 3, 5, 2, 1, 3, 1, 1, 4, 3, 1, 2, ...
## $ Writing                  <int> 2, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, ...
## $ `Passive sport`          <int> 1, 1, 5, 3, 5, 5, 4, 4, 5, 5, 5, 3, 3, 5, ...
## $ `Active sport`           <int> 5, 1, 2, 1, 4, 3, 5, 4, 1, 3, 3, 3, 1, 1, ...
## $ Gardening                <int> 5, 1, 1, 4, 2, 3, 1, 1, 3, 1, 4, 1, 5, 1, ...
## $ Celebrities              <int> 1, 2, 1, 3, 1, 1, 3, 2, 2, 2, 3, 5, 5, 3, ...
## $ Shopping                 <int> 4, 3, 4, 3, 2, 3, 3, 4, 5, 3, 2, 5, 5, 4, ...
## $ `Science and technology` <int> 4, 3, 2, 3, 3, 4, 2, 3, 4, 3, 3, 2, 2, 2, ...
## $ Theatre                  <int> 2, 2, 5, 2, 1, 3, 2, 5, 2, 1, 2, 3, 4, 1, ...
## $ `Fun with friends`       <int> 5, 4, 5, 4, 3, 5, 4, 5, 4, 3, 4, 5, 5, 5, ...
## $ `Adrenaline sports`      <int> 4, 2, 5, 2, 3, 1, 2, 2, 1, 1, 1, 4, 1, 4, ...
## $ Pets                     <int> 4, 5, 5, 1, 2, 5, 5, 2, 5, 1, 2, 5, 5, 5, ...

2.3. Principal Component Analysis

To understand which components carry the most information, the princomp function was applied to survey_data. Out of the 32 components, the first 14 were kept for the PCA because together they reach a cumulative proportion of variance of about 72%.
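
The selection rule can be sketched on a small built-in data set before it is applied to the survey. The example below uses mtcars purely for illustration; `pca_demo`, `target` and `n_keep` are names introduced here, and the 0.72 threshold simply mirrors the cumulative proportion mentioned in the text.

```r
# PCA on the correlation matrix, as in the survey analysis
pca_demo <- princomp(mtcars, cor = TRUE)

# Proportion of variance per component and its running total
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
cum_explained <- cumsum(var_explained)

# Smallest number of components whose cumulative proportion reaches the target
target <- 0.72
n_keep <- which(cum_explained >= target)[1]
n_keep

# Scores of the retained components for the first few observations
head(pca_demo$scores[, seq_len(n_keep), drop = FALSE])
```

The same three lines (variance share, cumulative sum, first index above the target) can be applied to the pca object of the survey to confirm the number of components read off the summary table.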

pca <- princomp(as.matrix(survey_data[1:32]), cor = TRUE)
summary(pca,loadings=TRUE)
## Importance of components:
##                           Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
## Standard deviation     2.0374787 1.8233387 1.60327816 1.48530949 1.26023134
## Proportion of Variance 0.1297287 0.1038926 0.08032815 0.06894201 0.04963072
## Cumulative Proportion  0.1297287 0.2336214 0.31394951 0.38289152 0.43252224
##                            Comp.6     Comp.7     Comp.8     Comp.9    Comp.10
## Standard deviation     1.20132035 1.07959740 1.06930606 1.05342776 0.98848494
## Proportion of Variance 0.04509908 0.03642283 0.03573173 0.03467844 0.03053445
## Cumulative Proportion  0.47762132 0.51404415 0.54977588 0.58445432 0.61498878
##                           Comp.11    Comp.12    Comp.13    Comp.14    Comp.15
## Standard deviation     0.96843882 0.94342106 0.92688407 0.89350216 0.87546384
## Proportion of Variance 0.02930855 0.02781385 0.02684731 0.02494832 0.02395115
## Cumulative Proportion  0.64429733 0.67211118 0.69895850 0.72390681 0.74785797
##                           Comp.16   Comp.17    Comp.18   Comp.19    Comp.20
## Standard deviation     0.85981220 0.8379842 0.81377184 0.7619490 0.74936288
## Proportion of Variance 0.02310241 0.0219443 0.02069452 0.0181427 0.01754827
## Cumulative Proportion  0.77096037 0.7929047 0.81359919 0.8317419 0.84929016
##                           Comp.21    Comp.22    Comp.23    Comp.24    Comp.25
## Standard deviation     0.73524390 0.72622112 0.70400198 0.70113254 0.67137253
## Proportion of Variance 0.01689324 0.01648116 0.01548809 0.01536209 0.01408566
## Cumulative Proportion  0.86618340 0.88266456 0.89815265 0.91351474 0.92760039
##                           Comp.26    Comp.27    Comp.28    Comp.29     Comp.30
## Standard deviation     0.64706583 0.62748835 0.58745885 0.58160258 0.553117299
## Proportion of Variance 0.01308419 0.01230443 0.01078462 0.01057067 0.009560586
## Cumulative Proportion  0.94068459 0.95298901 0.96377363 0.97434431 0.983904895
##                           Comp.31     Comp.32
## Standard deviation     0.53278768 0.480812500
## Proportion of Variance 0.00887071 0.007224396
## Cumulative Proportion  0.99277560 1.000000000
## 
## Loadings:
##                        Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## History                 0.184  0.117  0.196  0.225  0.146  0.222  0.252       
## Psychology              0.239                0.108  0.168        -0.127  0.201
## Politics                       0.220  0.289  0.143  0.280  0.137  0.111       
## Mathematics                    0.317 -0.119  0.101        -0.292 -0.287       
## Physics                        0.355 -0.231  0.104        -0.135 -0.188  0.127
## Internet                       0.210  0.120               -0.435  0.213 -0.228
## PC                             0.363               -0.176 -0.287              
## Economy Management             0.166  0.310         0.206 -0.196 -0.255       
## Biology                 0.273        -0.371 -0.159  0.211                     
## Chemistry               0.192        -0.414 -0.108  0.240                     
## Reading                 0.279 -0.170         0.195                      -0.124
## Geography               0.140  0.160  0.152                0.154  0.337 -0.358
## Foreign languages       0.204         0.199               -0.145 -0.108 -0.470
## Medicine                0.258        -0.316 -0.132  0.280               -0.122
## Law                     0.123  0.143  0.283         0.395                0.178
## Cars                           0.333        -0.199                            
## Art exhibitions         0.324                      -0.200                     
## Religion                0.222                0.176                            
## Countryside, outdoors   0.203                      -0.369         0.165 -0.143
## Dancing                 0.254               -0.216               -0.237       
## Musical instruments     0.213                      -0.315                     
## Writing                 0.230                0.186 -0.147 -0.107         0.319
## Passive sport                  0.155        -0.215 -0.119         0.245 -0.167
## Active sport                   0.178        -0.258 -0.144  0.364 -0.115  0.203
## Gardening               0.191               -0.144 -0.160         0.333  0.334
## Celebrities                   -0.104  0.167 -0.356  0.119 -0.323  0.136  0.119
## Shopping                      -0.168  0.144 -0.410        -0.228              
## Science and technology         0.364                                          
## Theatre                 0.316 -0.112               -0.126        -0.162       
## Fun with friends                      0.129 -0.270 -0.102        -0.308 -0.266
## Adrenaline sports              0.225        -0.230 -0.181  0.322 -0.195       
## Pets                                        -0.249                0.274  0.205
##                        Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## History                 0.170                                  0.224   0.163 
## Psychology              0.130 -0.272   0.123   0.447  -0.226   0.218  -0.260 
## Politics                              -0.202           0.132           0.158 
## Mathematics            -0.149  0.332                  -0.216                 
## Physics                        0.221          -0.114                         
## Internet                0.107 -0.241           0.184           0.266   0.195 
## PC                      0.103 -0.172   0.119           0.121           0.133 
## Economy Management     -0.263  0.176                  -0.192                 
## Biology                                                                      
## Chemistry                                     -0.113                   0.129 
## Reading                 0.167  0.184                                   0.143 
## Geography              -0.177          0.205  -0.346  -0.196                 
## Foreign languages                      0.171          -0.170  -0.407   0.101 
## Medicine                      -0.231                                         
## Law                                   -0.158           0.140  -0.178         
## Cars                                                   0.320  -0.163  -0.177 
## Art exhibitions         0.141                 -0.172   0.105          -0.308 
## Religion               -0.358 -0.135           0.362   0.352   0.162         
## Countryside, outdoors  -0.298  0.275   0.197   0.279                  -0.124 
## Dancing                -0.311                                  0.147   0.263 
## Musical instruments           -0.308  -0.333                  -0.513   0.210 
## Writing                 0.164 -0.290          -0.228  -0.331                 
## Passive sport                         -0.647   0.105  -0.351   0.111  -0.306 
## Active sport                  -0.137   0.215  -0.128  -0.118           0.101 
## Gardening              -0.171  0.195                           0.161   0.322 
## Celebrities                                   -0.255   0.108          -0.103 
## Shopping                                      -0.103   0.157          -0.169 
## Science and technology  0.281          0.206           0.199          -0.101 
## Theatre                 0.219  0.203  -0.131  -0.116   0.212          -0.268 
## Fun with friends        0.336  0.104  -0.247           0.133   0.272   0.392 
## Adrenaline sports       0.118          0.201          -0.228                 
## Pets                    0.319  0.354   0.102   0.419  -0.167  -0.375         
##                        Comp.16 Comp.17 Comp.18 Comp.19 Comp.20 Comp.21 Comp.22
## History                 0.289                           0.426           0.149 
## Psychology                              0.167   0.280   0.217  -0.109  -0.107 
## Politics                       -0.124   0.156                   0.157   0.191 
## Mathematics             0.236  -0.135  -0.186   0.106                         
## Physics                 0.171  -0.151                                  -0.240 
## Internet                0.168   0.118                  -0.285          -0.116 
## PC                              0.208                  -0.119           0.278 
## Economy Management     -0.303           0.133  -0.202          -0.229   0.346 
## Biology                                                        -0.144   0.155 
## Chemistry                                                                     
## Reading                         0.219  -0.348   0.110  -0.216   0.332         
## Geography               0.138  -0.326   0.104                  -0.306  -0.131 
## Foreign languages      -0.163   0.150  -0.265           0.280          -0.178 
## Medicine                                               -0.174  -0.103         
## Law                             0.119                  -0.344          -0.287 
## Cars                            0.144  -0.155   0.115   0.115  -0.192  -0.436 
## Art exhibitions                 0.120                  -0.113  -0.442         
## Religion                       -0.354  -0.505  -0.230                         
## Countryside, outdoors                   0.357   0.212           0.298         
## Dancing                 0.244   0.183   0.245  -0.349                  -0.254 
## Musical instruments     0.235  -0.141   0.175   0.119   0.147           0.124 
## Writing                -0.199  -0.217          -0.236           0.139  -0.189 
## Passive sport                   0.130  -0.125  -0.146   0.107   0.218         
## Active sport            0.228   0.308  -0.335  -0.110                   0.227 
## Gardening              -0.466   0.128           0.237   0.113  -0.146  -0.116 
## Celebrities             0.216  -0.232  -0.102   0.322           0.184   0.265 
## Shopping                       -0.142          -0.127   0.327   0.180  -0.135 
## Science and technology -0.266           0.108  -0.358   0.172   0.285         
## Theatre                         0.155          -0.148  -0.157           0.115 
## Fun with friends       -0.158  -0.286                          -0.131         
## Adrenaline sports      -0.162  -0.263           0.261  -0.316   0.175         
## Pets                    0.195  -0.167          -0.272  -0.120  -0.168         
##                        Comp.23 Comp.24 Comp.25 Comp.26 Comp.27 Comp.28 Comp.29
## History                 0.227   0.281   0.236           0.246   0.108   0.131 
## Psychology             -0.241  -0.139  -0.145   0.113          -0.165  -0.104 
## Politics               -0.250  -0.302   0.158  -0.214  -0.460   0.165  -0.187 
## Mathematics                                                             0.169 
## Physics                        -0.141                           0.265  -0.183 
## Internet                0.177                   0.185  -0.253   0.219   0.272 
## PC                                             -0.166   0.321  -0.145  -0.569 
## Economy Management              0.365           0.201                         
## Biology                                                                -0.128 
## Chemistry               0.101   0.159   0.251  -0.102  -0.145  -0.126         
## Reading                -0.162   0.209           0.278  -0.234  -0.329  -0.160 
## Geography              -0.116          -0.157   0.201          -0.148  -0.150 
## Foreign languages              -0.185          -0.243   0.133   0.226         
## Medicine                        0.107  -0.121                   0.218         
## Law                     0.376          -0.267           0.296  -0.173         
## Cars                   -0.277   0.430   0.104          -0.211                 
## Art exhibitions         0.206  -0.121   0.154  -0.405  -0.223  -0.253   0.187 
## Religion                                                                      
## Countryside, outdoors   0.142   0.182  -0.282  -0.185  -0.131                 
## Dancing                -0.304           0.244  -0.171   0.172  -0.162         
## Musical instruments                             0.348          -0.116         
## Writing                         0.362  -0.133  -0.250           0.210         
## Passive sport                                           0.136                 
## Active sport                   -0.159  -0.338          -0.212   0.183         
## Gardening                      -0.224   0.110   0.181   0.109                 
## Celebrities            -0.338                  -0.159   0.186           0.207 
## Shopping                0.406  -0.142           0.186  -0.214          -0.344 
## Science and technology -0.164  -0.191  -0.112                  -0.307   0.399 
## Theatre                -0.142                   0.308   0.223   0.459         
## Fun with friends                       -0.331  -0.125                         
## Adrenaline sports       0.124           0.483   0.128                         
## Pets                                                                          
##                        Comp.30 Comp.31 Comp.32
## History                 0.177                 
## Psychology             -0.165                 
## Politics                        0.106         
## Mathematics                     0.592         
## Physics                 0.169  -0.609         
## Internet                                      
## PC                              0.115  -0.115 
## Economy Management             -0.247         
## Biology                                 0.777 
## Chemistry              -0.627  -0.108  -0.295 
## Reading                 0.167  -0.134         
## Geography              -0.102          -0.107 
## Foreign languages                             
## Medicine                0.508   0.136  -0.475 
## Law                    -0.152                 
## Cars                                          
## Art exhibitions                -0.132         
## Religion               -0.127                 
## Countryside, outdoors                         
## Dancing                 0.163                 
## Musical instruments                           
## Writing                         0.133         
## Passive sport                                 
## Active sport           -0.120                 
## Gardening                                     
## Celebrities                    -0.116         
## Shopping                0.160   0.118         
## Science and technology                        
## Theatre                -0.250   0.192         
## Fun with friends                              
## Adrenaline sports                             
## Pets
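
The choice of 14 components can be reproduced from the standard deviations printed above. The following is a minimal sketch: components_needed is a hypothetical helper (not part of the original analysis), and the sdev vector is typed in from the summary output rather than taken from the pca object.

```r
# Standard deviations copied from the "Importance of components" table above.
sdev <- c(2.0374787, 1.8233387, 1.60327816, 1.48530949, 1.26023134,
          1.20132035, 1.07959740, 1.06930606, 1.05342776, 0.98848494,
          0.96843882, 0.94342106, 0.92688407, 0.89350216, 0.87546384,
          0.85981220, 0.8379842, 0.81377184, 0.7619490, 0.74936288,
          0.73524390, 0.72622112, 0.70400198, 0.70113254, 0.67137253,
          0.64706583, 0.62748835, 0.58745885, 0.58160258, 0.553117299,
          0.53278768, 0.480812500)

# Hypothetical helper: smallest number of leading components whose
# cumulative proportion of variance reaches `threshold`.
components_needed <- function(sdev, threshold) {
  prop_var <- sdev^2 / sum(sdev^2)  # proportion of variance per component
  cum_var  <- cumsum(prop_var)      # cumulative proportion of variance
  which(cum_var >= threshold)[1]
}

k <- components_needed(sdev, 0.72)
print(k)
```

With a 72% threshold this returns 14, matching the table. With the fitted object available, pca$sdev could be passed in place of the hand-copied vector.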

2.4. Preparation of Limited Components for Correlation

The analysis was then repeated on the first 14 variables of survey_data (History through Medicine). As the summary and the plot below show, the first 7 principal components account for about 78.19% of the variance.

pca <- princomp(as.matrix(survey_data[1:14]), cor = TRUE)
summary(pca, loadings = TRUE)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3     Comp.4    Comp.5
## Standard deviation     1.6452835 1.5142414 1.4271869 1.03350621 1.0063014
## Proportion of Variance 0.1933541 0.1637805 0.1454902 0.07629536 0.0723316
## Cumulative Proportion  0.1933541 0.3571346 0.5026248 0.57892018 0.6512518
##                            Comp.6     Comp.7     Comp.8     Comp.9    Comp.10
## Standard deviation     0.96137744 0.95114911 0.79989367 0.74166663 0.70773166
## Proportion of Variance 0.06601761 0.06462033 0.04570213 0.03929067 0.03577744
## Cumulative Proportion  0.71726939 0.78188972 0.82759186 0.86688253 0.90265997
##                           Comp.11    Comp.12    Comp.13    Comp.14
## Standard deviation     0.67387024 0.58045711 0.56374032 0.50391048
## Proportion of Variance 0.03243579 0.02406646 0.02270023 0.01813756
## Cumulative Proportion  0.93509576 0.95916222 0.98186244 1.00000000
## 
## Loadings:
##                    Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## History             0.129  0.225  0.389  0.426  0.123  0.283         0.172
## Psychology          0.208         0.293 -0.104 -0.422  0.374  0.422 -0.524
## Politics                   0.358  0.300  0.311        -0.229  0.286  0.392
## Mathematics         0.161  0.398 -0.270        -0.344        -0.384       
## Physics             0.266  0.338 -0.313  0.236 -0.171  0.114 -0.243 -0.136
## Internet                   0.346 -0.124 -0.431  0.383  0.310  0.235       
## PC                         0.432 -0.260 -0.150  0.213  0.252         0.192
## Economy Management -0.105  0.354  0.109 -0.241 -0.356 -0.549  0.163       
## Biology             0.517 -0.118                0.101 -0.107  0.151       
## Chemistry           0.485        -0.187                              0.264
## Reading             0.199         0.393 -0.147 -0.117  0.343 -0.449  0.333
## Geography           0.113  0.243  0.259  0.163  0.547 -0.285 -0.255 -0.539
## Foreign languages   0.124  0.110  0.374 -0.559        -0.131 -0.296       
## Medicine            0.499               -0.134  0.105 -0.134  0.244       
##                    Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14
## History                    0.596   0.249   0.216                 
## Psychology                                -0.242                 
## Politics           -0.191 -0.575                           0.100 
## Mathematics         0.136          0.207           0.636         
## Physics            -0.151 -0.127           0.182  -0.679         
## Internet            0.420 -0.207   0.365                         
## PC                 -0.362  0.228  -0.581  -0.165   0.126         
## Economy Management  0.371  0.327  -0.187          -0.228         
## Biology                           -0.132           0.111   0.784 
## Chemistry           0.103  0.137   0.229  -0.647  -0.107  -0.354 
## Reading             0.366 -0.204  -0.389                         
## Geography           0.154         -0.191  -0.148          -0.105 
## Foreign languages  -0.544          0.321                         
## Medicine                          -0.143   0.613   0.111  -0.462
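
If the aim is to carry a limited number of components forward (rather than a limited number of input variables), one option is to keep the leading columns of the scores matrix that princomp returns. The sketch below uses the built-in USArrests data as a stand-in for survey_data, and k <- 2 is chosen only for illustration.

```r
# Stand-in data: USArrests ships with base R (4 numeric variables, 50 rows).
pca_demo <- princomp(USArrests, cor = TRUE)

k <- 2                              # illustrative number of components to keep
scores_k <- pca_demo$scores[, 1:k]  # observations projected onto the first k components

dim(scores_k)
round(cor(scores_k), 3)             # component scores are mutually uncorrelated
```

Because the component scores are uncorrelated with each other, any correlation analysis on them is only informative when they are compared against other variables, not against one another.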

The plot below shows that the first 10 components explain about 90% of the variance (cumulative proportion 0.9027).

ggplot(data.frame(pc=1:14,cum_var=c(0.1933541, 0.3571346, 0.5026248, 0.57892018, 0.6512518, 0.71726939, 0.78188972, 0.82759186, 0.86688253, 0.90265997, 0.93509576, 0.95916222, 0.98186244, 1.00000000)),aes(x=pc,y=cum_var)) + 
  geom_point() + 
  geom_line()
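
The cumulative proportions in the plot call above are typed in by hand. As a sketch of an alternative, they can be derived from the component standard deviations. Here the 14 values are copied from the summary output; with the fitted object available, pca$sdev could be used instead. The dashed 90% reference line and the axis labels are additions for illustration, and the plot is drawn only if ggplot2 is installed.

```r
# Standard deviations copied from the 14-variable PCA summary above.
sdev <- c(1.6452835, 1.5142414, 1.4271869, 1.03350621, 1.0063014,
          0.96137744, 0.95114911, 0.79989367, 0.74166663, 0.70773166,
          0.67387024, 0.58045711, 0.56374032, 0.50391048)

# Cumulative proportion of variance for each number of components.
scree <- data.frame(
  pc      = seq_along(sdev),
  cum_var = cumsum(sdev^2) / sum(sdev^2)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(scree, aes(x = pc, y = cum_var)) +
    geom_point() +
    geom_line() +
    geom_hline(yintercept = 0.90, linetype = "dashed") +  # illustrative 90% line
    labs(x = "Principal component", y = "Cumulative proportion of variance")
  print(p)
}
```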

2.5. Multidimensional Scaling Analysis

survey_mds_data <- survey_data[,sapply(survey_data,class)=="integer"] %>%
  select(History:Pets)
survey_mds_distance <- 1 - cor(survey_mds_data)

survey_mds <- cmdscale(survey_mds_distance,k=2)

colnames(survey_mds) <- c("x","y")

print(survey_mds)
##                                   x            y
## History                 0.031306226 -0.057585867
## Psychology              0.221673919 -0.034255291
## Politics               -0.210706395  0.084207670
## Mathematics            -0.366106787 -0.301011922
## Physics                -0.338602615 -0.488946202
## Internet               -0.435984609  0.193575251
## PC                     -0.602886441 -0.104205122
## Economy Management     -0.313464728  0.369369369
## Biology                 0.249496019 -0.352229723
## Chemistry               0.111150203 -0.427188341
## Reading                 0.525002226 -0.036403993
## Geography              -0.082496760 -0.010735122
## Foreign languages       0.197742370  0.164499851
## Medicine                0.190107055 -0.319538506
## Law                    -0.091720302  0.234211611
## Cars                   -0.593017958  0.091655704
## Art exhibitions         0.380132671 -0.025937391
## Religion                0.177085206 -0.229380678
## Countryside, outdoors   0.159766147 -0.101417980
## Dancing                 0.326087243  0.151882648
## Musical instruments     0.165516710 -0.156630640
## Writing                 0.291051922 -0.065898705
## Passive sport          -0.288889494  0.153049302
## Active sport           -0.180541543  0.077474365
## Gardening               0.203069385 -0.051532921
## Celebrities             0.059109759  0.515094714
## Shopping                0.248971459  0.505415685
## Science and technology -0.368579088 -0.307211932
## Theatre                 0.452232345  0.006680821
## Fun with friends        0.002109219  0.282818197
## Adrenaline sports      -0.251016420  0.072054588
## Pets                    0.132403055  0.168120560
ggplot(data.frame(survey_mds),aes(x=x,y=y)) +
  geom_text(label=rownames(survey_mds),size=3.5) +
  labs(x="x",y="y", title="MDS - Grouping Hobbies By Relational Distance")
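
The distance used above, 1 - cor(...), turns a correlation matrix into a dissimilarity: perfectly correlated variables are at distance 0, uncorrelated ones at 1, and perfectly anti-correlated ones at 2. A minimal sketch of the same steps on the built-in mtcars data (a stand-in, since survey_data does not ship with R):

```r
# Stand-in data: mtcars ships with base R (11 numeric variables).
demo_distance <- 1 - cor(mtcars)  # variable-by-variable dissimilarity matrix

# Classical MDS into two dimensions; cmdscale accepts a full symmetric matrix.
demo_mds <- cmdscale(demo_distance, k = 2)
colnames(demo_mds) <- c("x", "y")

head(demo_mds)
```

Each row of the result is one variable, so variables that tend to move together end up close to each other in the x-y plane.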

2.6. In Conclusion

To sum up, the MDS plot suggests that the interests within each of the following groups are the most closely related to each other:

  • Celebrities - Shopping
  • Active sport - Adrenaline sports - Passive sport - Politics
  • Science and technology - Mathematics
  • Medicine - Biology - Chemistry
  • Countryside, outdoors - History - Musical instruments - Religion
  • Art exhibitions - Theatre - Reading
  • Foreign languages - Dancing
  • Gardening - Psychology - Writing
  • Geography - History
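
The groupings listed above were read off the MDS plot. As a rough numerical check, the sketch below takes the coordinates of seven interests, copied from the printed survey_mds output, and finds each one's nearest neighbour by Euclidean distance. The subset of interests and the name nearest are chosen only for illustration.

```r
# Coordinates copied from the printed survey_mds output (subset of 7 interests).
coords <- rbind(
  "Celebrities"            = c( 0.059109759,  0.515094714),
  "Shopping"               = c( 0.248971459,  0.505415685),
  "Mathematics"            = c(-0.366106787, -0.301011922),
  "Science and technology" = c(-0.368579088, -0.307211932),
  "Biology"                = c( 0.249496019, -0.352229723),
  "Chemistry"              = c( 0.111150203, -0.427188341),
  "Medicine"               = c( 0.190107055, -0.319538506)
)
colnames(coords) <- c("x", "y")

d <- as.matrix(dist(coords))  # pairwise Euclidean distances in the MDS plane
diag(d) <- Inf                # ignore each point's distance to itself

# For every interest, the name of the closest other interest in this subset.
nearest <- setNames(colnames(d)[apply(d, 1, which.min)], rownames(d))
print(nearest)
```

Within this subset, Celebrities and Shopping are each other's nearest neighbour, as are Mathematics and Science and technology, while Biology and Chemistry both sit closest to Medicine. Running the same check on the full survey_mds matrix would cover all 32 interests.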