Assignment 3.1: Esoph and Youth Survey

1. Esoph Relationship Analysis

The esoph data set describes the relationship between alcohol and tobacco consumption and cancer. It contains 88 rows and 5 columns: age group, alcohol consumption group, tobacco consumption group, number of cancer cases, and number of controls. There are 6 age groups, 4 alcohol consumption groups, and 4 tobacco consumption groups. This section analyzes how the number of cancer cases varies with age, alcohol consumption, and tobacco consumption.

library(tidyverse)
library(lubridate)
library(rio)
library('MASS')

head(esoph)

##   agegp     alcgp    tobgp ncases ncontrols
## 1 25-34 0-39g/day 0-9g/day      0        40
## 2 25-34 0-39g/day    10-19      0        10
## 3 25-34 0-39g/day    20-29      0         6
## 4 25-34 0-39g/day      30+      0         5
## 5 25-34     40-79 0-9g/day      0        27
## 6 25-34     40-79    10-19      0         7

summary(esoph)

##    agegp          alcgp         tobgp        ncases         ncontrols    
##  25-34:15   0-39g/day:23   0-9g/day:24   Min.   : 0.000   Min.   : 1.00  
##  35-44:15   40-79    :23   10-19   :24   1st Qu.: 0.000   1st Qu.: 3.00  
##  45-54:16   80-119   :21   20-29   :20   Median : 1.000   Median : 6.00  
##  55-64:16   120+     :21   30+     :20   Mean   : 2.273   Mean   :11.08  
##  65-74:15                                3rd Qu.: 4.000   3rd Qu.:14.00  
##  75+  :11                                Max.   :17.000   Max.   :60.00

- Age Groups

First, each age group is analyzed using the number of cancer cases relative to the number of controls, sum(ncases)/sum(ncontrols), expressed as a percentage. According to the graph below, this rate increases after age 55. The highest rate is in the 65-74 age group.

# Cases relative to controls (as a percentage) for each age group
esoph %>%
  group_by(agegp) %>%
  summarise(perc = 100 * round(sum(ncases) / sum(ncontrols), 2)) %>%
  ggplot(aes(x = agegp, y = perc, fill = agegp)) +
  geom_bar(stat = "identity") +
  labs(x = "Age Groups", y = "Percentage Of Cancer Cases",
       title = "The Rate of Cancer Cases according to Age Categories") +
  scale_fill_manual(values = c("lightgreen", "green", "yellow", "orange", "red", "tomato4")) +
  guides(fill = guide_legend(title = "Age Groups"))
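The values behind the bars can be checked without any plotting. The following is a minimal sketch in base R, assuming only the built-in esoph data; the names `totals` and `cases_per_100_controls` are introduced here for illustration. It rounds after multiplying by 100, so one decimal place is kept instead of whole percentages.

```r
# Cases per 100 controls for each age group, using base R only.
# `esoph` ships with R's built-in datasets package.
totals <- aggregate(cbind(ncases, ncontrols) ~ agegp, data = esoph, FUN = sum)

# Round after scaling so that one decimal place survives.
totals$cases_per_100_controls <- round(100 * totals$ncases / totals$ncontrols, 1)

# One row per age group, in the factor's natural order.
totals <- totals[order(totals$agegp), ]
print(totals)
```

The printed table has one row per age group, which makes it easy to compare against the heights of the bars.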

- Alcohol Consumption

# Cases relative to controls (as a percentage) by age group and alcohol group
esoph %>%
  group_by(agegp, alcgp) %>%
  summarise(perc = 100 * round(sum(ncases) / sum(ncontrols), 2)) %>%
  ggplot(aes(x = agegp, y = perc, fill = alcgp)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Age Groups", y = "Percentage Of Cancer Cases",
       title = "The Rate of Cancer Cases according to Age Groups and Alcohol Consumption") +
  scale_fill_manual(values = c("lightpink1", "rosybrown", "red2", "tomato4")) +
  guides(fill = guide_legend(title = "Alcohol Usage"))

Looking at alcohol consumption, groups with higher consumption levels show a higher rate of cancer cases, which suggests that heavier drinking is associated with a greater chance of developing cancer.

- Tobacco Consumption

# Cases relative to controls (as a percentage) by age group and tobacco group
esoph %>%
  group_by(agegp, tobgp) %>%
  summarise(perc = 100 * round(sum(ncases) / sum(ncontrols), 2)) %>%
  ggplot(aes(x = agegp, y = perc, fill = tobgp)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Age Groups", y = "Percentage Of Cancer Cases",
       title = "The Rate of Cancer Cases according to Age Groups and Tobacco Consumption") +
  scale_fill_manual(values = c("lightpink1", "rosybrown", "red2", "tomato4")) +
  guides(fill = guide_legend(title = "Tobacco Usage"))

Turning to the tobacco-based graph, higher tobacco consumption is also associated with a higher rate of cancer cases. According to the graph above, the 55-64 age group is the most affected.
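The two graphs above split every consumption group by age, so some bars rest on few observations. As a complementary check, the same rate can be computed for each alcohol group and each tobacco group with all ages pooled. This is a sketch only; the helper `cases_per_100_controls()` is a hypothetical name introduced here, not part of any package.

```r
# Hypothetical helper: cases per 100 controls for each level of a grouping
# factor taken from the built-in esoph data (all other variables pooled).
cases_per_100_controls <- function(group) {
  cases <- tapply(esoph$ncases, group, sum)
  controls <- tapply(esoph$ncontrols, group, sum)
  round(100 * cases / controls, 1)
}

by_alcohol <- cases_per_100_controls(esoph$alcgp)
by_tobacco <- cases_per_100_controls(esoph$tobgp)

print(by_alcohol)
print(by_tobacco)
```

Pooling over age hides any interaction between age and consumption, so these marginal rates are best read alongside the stratified graphs rather than instead of them.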

- Conclusion

In short, older age and high levels of alcohol and tobacco consumption are each associated with a higher risk of developing cancer compared with the lowest consumption groups.
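The conclusion above is drawn from graphs alone. One common way to quantify such associations, which is not the method used in this assignment, is a logistic regression of cases against controls. The sketch below assumes only the built-in esoph data; the factors are converted to unordered so that each coefficient compares a group with the lowest group of that variable, and the names `d`, `fit` and `odds_ratios` are illustrative.

```r
# Treat the grouping variables as unordered factors so that R uses
# treatment contrasts (each level compared with the first level).
d <- transform(
  esoph,
  agegp = factor(agegp, ordered = FALSE),
  alcgp = factor(alcgp, ordered = FALSE),
  tobgp = factor(tobgp, ordered = FALSE)
)

# Binomial model: cases are "successes", controls are "failures".
fit <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
           data = d, family = binomial())

# Exponentiated coefficients are odds ratios relative to the baseline groups.
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 2))
```

An odds ratio above 1 for a consumption group indicates higher odds of being a case than in the baseline group, with age and the other consumption variable held fixed.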

2. Young People Survey

2.1 Principal Component Analysis (PCA)

The Young People Survey is a data set that contains 1010 rows and 150 columns. Each row holds the answers of one respondent. This study analyzes a subset of the data set that contains the variables of the Hobbies & Interests category. The data set is available on Kaggle.

- Importing and Summary Data

The data was imported from GitHub; the file URL is given in the import call below.

data <- rio::import("https://github.com/pjournal/mef04-baykano/blob/gh-pages/responses.csv?raw=True")

glimpse(data)

## Rows: 1,010
## Columns: 150
## $ Music                            <int> 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5...
## $ `Slow songs or fast songs`       <int> 3, 4, 5, 3, 3, 3, 5, 3, 3, 3, 3, 3...
## $ Dance                            <int> 2, 2, 2, 2, 4, 2, 5, 3, 3, 2, 3, 1...
## $ Folk                             <int> 1, 1, 2, 1, 3, 3, 3, 2, 1, 5, 2, 1...
## $ Country                          <int> 2, 1, 3, 1, 2, 2, 1, 1, 1, 2, 1, 1...
## $ `Classical music`                <int> 2, 1, 4, 1, 4, 3, 2, 2, 2, 2, 2, 4...
## $ Musical                          <int> 1, 2, 5, 1, 3, 3, 2, 2, 4, 5, 3, 1...
## $ Pop                              <int> 5, 3, 3, 2, 5, 2, 5, 4, 3, 3, 4, 2...
## $ Rock                             <int> 5, 5, 5, 2, 3, 5, 3, 5, 5, 5, 3, 5...
## $ `Metal or Hardrock`              <int> 1, 4, 3, 1, 1, 5, 1, 1, 5, 2, 2, 1...
## $ Punk                             <int> 1, 4, 4, 4, 2, 3, 1, 2, 1, 3, 1, 1...
## $ `Hiphop, Rap`                    <int> 1, 1, 1, 2, 5, 4, 3, 3, 1, 2, 3, 1...
## $ `Reggae, Ska`                    <int> 1, 3, 4, 2, 3, 3, 1, 2, 2, 4, 2, 1...
## $ `Swing, Jazz`                    <int> 1, 1, 3, 1, 2, 4, 1, 2, 2, 4, 2, 2...
## $ `Rock n roll`                    <int> 3, 4, 5, 2, 1, 4, 2, 3, 2, 4, 3, 2...
## $ Alternative                      <int> 1, 4, 5, 5, 2, 5, 3, 1, NA, 4, 3, ...
## $ Latino                           <int> 1, 2, 5, 1, 4, 3, 3, 2, 1, 5, 3, 2...
## $ `Techno, Trance`                 <int> 1, 1, 1, 2, 2, 1, 5, 3, 1, 1, 4, 1...
## $ Opera                            <int> 1, 1, 3, 1, 2, 3, 2, 2, 1, 2, 2, 2...
## $ Movies                           <int> 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5...
## $ Horror                           <int> 4, 2, 3, 4, 4, 5, 2, 4, 1, 2, 5, 3...
## $ Thriller                         <int> 2, 2, 4, 4, 4, 5, 1, 4, 5, 1, 4, 4...
## $ Comedy                           <int> 5, 4, 4, 3, 5, 5, 5, 5, 5, 5, 5, 4...
## $ Romantic                         <int> 4, 3, 2, 3, 2, 2, 3, 2, 4, 5, 3, 3...
## $ `Sci-fi`                         <int> 4, 4, 4, 4, 3, 3, 1, 3, 4, 1, 3, 2...
## $ War                              <int> 1, 1, 2, 3, 3, 3, 3, 3, 5, 3, 2, 5...
## $ `Fantasy/Fairy tales`            <int> 5, 3, 5, 1, 4, 4, 5, 4, 4, 4, 5, 5...
## $ Animated                         <int> 5, 5, 5, 2, 4, 3, 5, 4, 4, 4, 5, 5...
## $ Documentary                      <int> 3, 4, 2, 5, 3, 3, 3, 3, 5, 4, 3, 5...
## $ Western                          <int> 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1...
## $ Action                           <int> 2, 4, 1, 2, 4, 4, 2, 3, 1, 2, 3, 4...
## $ History                          <int> 1, 1, 1, 4, 3, 5, 3, 5, 3, 3, 3, 2...
## $ Psychology                       <int> 5, 3, 2, 4, 2, 3, 3, 2, 2, 2, 3, 2...
## $ Politics                         <int> 1, 4, 1, 5, 3, 4, 1, 3, 1, 3, 3, 5...
## $ Mathematics                      <int> 3, 5, 5, 4, 2, 2, 1, 1, 1, 3, 2, 1...
## $ Physics                          <int> 3, 2, 2, 1, 2, 3, 1, 1, 1, 1, 1, 1...
## $ Internet                         <int> 5, 4, 4, 3, 2, 4, 2, 5, 1, 5, 4, 5...
## $ PC                               <int> 3, 4, 2, 1, 2, 4, 1, 4, 1, 1, 5, 4...
## $ `Economy Management`             <int> 5, 5, 4, 2, 2, 1, 3, 1, 1, 4, 3, 1...
## $ Biology                          <int> 3, 1, 1, 3, 3, 4, 5, 2, 3, 2, 2, 1...
## $ Chemistry                        <int> 3, 1, 1, 3, 3, 4, 5, 2, 1, 1, 1, 1...
## $ Reading                          <int> 3, 4, 5, 5, 5, 3, 3, 2, 5, 4, 3, 3...
## $ Geography                        <int> 3, 4, 2, 4, 2, 3, 3, 3, 1, 4, 3, 5...
## $ `Foreign languages`              <int> 5, 5, 5, 4, 3, 4, 4, 4, 1, 5, 5, 2...
## $ Medicine                         <int> 3, 1, 2, 2, 3, 4, 5, 1, 1, 1, 2, 1...
## $ Law                              <int> 1, 2, 3, 5, 2, 3, 3, 2, 1, 1, 4, 3...
## $ Cars                             <int> 1, 2, 1, 1, 3, 5, 4, 1, 1, 1, 2, 1...
## $ `Art exhibitions`                <int> 1, 2, 5, 5, 1, 2, 1, 1, 1, 4, 2, 5...
## $ Religion                         <int> 1, 1, 5, 4, 4, 2, 1, 2, 2, 4, 2, 1...
## $ `Countryside, outdoors`          <int> 5, 1, 5, 1, 4, 5, 4, 2, 4, 4, 4, 5...
## $ Dancing                          <int> 3, 1, 5, 1, 1, 1, 3, 1, 1, 5, 1, 1...
## $ `Musical instruments`            <int> 3, 1, 5, 1, 3, 5, 2, 1, 2, 3, 1, 1...
## $ Writing                          <int> 2, 1, 5, 3, 1, 1, 1, 1, 1, 1, 1, 1...
## $ `Passive sport`                  <int> 1, 1, 5, 1, 3, 5, 5, 4, 4, 4, 5, 5...
## $ `Active sport`                   <int> 5, 1, 2, 1, 1, 4, 3, 5, 1, 4, 1, 3...
## $ Gardening                        <int> 5, 1, 1, 1, 4, 2, 3, 1, 1, 1, 3, 1...
## $ Celebrities                      <int> 1, 2, 1, 2, 3, 1, 1, 3, 5, 2, 2, 2...
## $ Shopping                         <int> 4, 3, 4, 4, 3, 2, 3, 3, 2, 4, 5, 3...
## $ `Science and technology`         <int> 4, 3, 2, 3, 3, 3, 4, 2, 1, 3, 4, 3...
## $ Theatre                          <int> 2, 2, 5, 1, 2, 1, 3, 2, 5, 5, 2, 1...
## $ `Fun with friends`               <int> 5, 4, 5, 2, 4, 3, 5, 4, 4, 5, 4, 3...
## $ `Adrenaline sports`              <int> 4, 2, 5, 1, 2, 3, 1, 2, 1, 2, 1, 1...
## $ Pets                             <int> 4, 5, 5, 1, 1, 2, 5, 5, 1, 2, 5, 1...
## $ Flying                           <int> 1, 1, 1, 2, 1, 3, 1, 3, 2, 4, 1, 4...
## $ Storm                            <int> 1, 1, 1, 1, 2, 2, 3, 2, 3, 5, 1, 1...
## $ Darkness                         <int> 1, 1, 1, 1, 1, 2, 2, 4, 1, 4, 2, 1...
## $ Heights                          <int> 1, 2, 1, 3, 1, 2, 1, 3, 5, 5, 2, 3...
## $ Spiders                          <int> 1, 1, 1, 5, 1, 1, 1, 1, 5, 3, 2, 5...
## $ Snakes                           <int> 5, 1, 1, 5, 1, 2, 5, 5, 5, 4, 1, 5...
## $ Rats                             <int> 3, 1, 1, 5, 2, 2, 1, 3, 2, 4, 1, 5...
## $ Ageing                           <int> 1, 3, 1, 4, 2, 1, 4, 1, 2, 3, 1, 5...
## $ `Dangerous dogs`                 <int> 3, 1, 1, 5, 4, 1, 1, 2, 3, 5, 4, 5...
## $ `Fear of public speaking`        <int> 2, 4, 2, 5, 3, 3, 1, 4, 4, 3, 2, 5...
## $ Smoking                          <chr> "never smoked", "never smoked", "t...
## $ Alcohol                          <chr> "drink a lot", "drink a lot", "dri...
## $ `Healthy eating`                 <int> 4, 3, 3, 3, 4, 2, 4, 2, 1, 3, 3, 3...
## $ `Daily events`                   <int> 2, 3, 1, 4, 3, 2, 3, 3, 1, 4, 3, 3...
## $ `Prioritising workload`          <int> 2, 2, 2, 4, 1, 2, 5, 1, 2, 2, 2, 1...
## $ `Writing notes`                  <int> 5, 4, 5, 4, 2, 3, 5, 3, 1, 2, 4, 5...
## $ Workaholism                      <int> 4, 5, 3, 5, 3, 3, 5, 2, 4, 3, 2, 3...
## $ `Thinking ahead`                 <int> 2, 4, 5, 3, 5, 3, 3, 4, 2, 3, 3, 1...
## $ `Final judgement`                <int> 5, 1, 3, 1, 5, 1, 3, 3, 5, 5, 3, 1...
## $ Reliability                      <int> 4, 4, 4, 3, 5, 3, 4, 3, 5, 4, 4, 3...
## $ `Keeping promises`               <int> 4, 4, 5, 4, 4, 4, 5, 3, 4, 5, 4, 3...
## $ `Loss of interest`               <int> 1, 3, 1, 5, 2, 3, 3, 1, 1, 3, 1, 3...
## $ `Friends versus money`           <int> 3, 4, 5, 2, 3, 2, 4, 4, 4, 4, 3, 3...
## $ Funniness                        <int> 5, 3, 2, 1, 3, 3, 4, 4, 2, 3, 2, 5...
## $ Fake                             <int> 1, 2, 4, 1, 2, 1, 1, 2, 2, 1, 1, 3...
## $ `Criminal damage`                <int> 1, 1, 1, 5, 1, 4, 2, 1, 1, 2, 1, 5...
## $ `Decision making`                <int> 3, 2, 3, 5, 3, 2, 2, 3, 4, 5, 5, 3...
## $ Elections                        <int> 4, 5, 5, 5, 5, 5, 5, 5, 1, 5, 5, 1...
## $ `Self-criticism`                 <int> 1, 4, 4, 5, 5, 4, 3, 3, 3, 4, 4, 5...
## $ `Judgment calls`                 <int> 3, 4, 4, 4, 5, 4, 5, 5, 2, 5, 5, 3...
## $ Hypochondria                     <int> 1, 1, 1, 3, 1, 1, 1, 2, 2, 1, 2, 5...
## $ Empathy                          <int> 3, 2, 5, 3, 3, 4, 4, 1, 5, 4, 5, 5...
## $ `Eating to survive`              <int> 1, 1, 5, 1, 1, 2, 1, 2, 1, 1, 2, 1...
## $ Giving                           <int> 4, 2, 5, 1, 3, 3, 5, 3, 1, 4, 3, 1...
## $ `Compassion to animals`          <int> 5, 4, 4, 2, 3, 5, 5, 5, 4, 5, 5, 2...
## $ `Borrowed stuff`                 <int> 4, 3, 2, 5, 4, 5, 5, 2, 5, 4, 4, 2...
## $ Loneliness                       <int> 3, 2, 5, 5, 3, 2, 3, 2, 4, 2, 2, 4...
## $ `Cheating in school`             <int> 2, 4, 3, 5, 5, 4, 2, 5, 5, 3, 3, 5...
## $ Health                           <int> 1, 4, 2, 1, 3, 3, 3, 3, 4, 4, 3, 2...
## $ `Changing the past`              <int> 1, 4, 5, 5, 4, 3, 1, 2, 5, 2, 3, 3...
## $ God                              <int> 1, 1, 5, 4, 5, 3, 5, 4, 5, 5, 4, 1...
## $ Dreams                           <int> 4, 3, 1, 3, 3, 3, 3, 4, 4, 3, 3, 3...
## $ Charity                          <int> 2, 1, 3, 3, 3, 2, 3, 1, 1, 2, 1, 3...
## $ `Number of friends`              <int> 3, 3, 3, 1, 3, 3, 3, 4, 2, 3, 3, 4...
## $ Punctuality                      <chr> "i am always on time", "i am often...
## $ Lying                            <chr> "never", "sometimes", "sometimes",...
## $ Waiting                          <int> 3, 3, 2, 1, 3, 3, 4, 1, 2, 1, 3, 3...
## $ `New environment`                <int> 4, 4, 3, 1, 4, 4, 5, 4, 2, 4, 3, 5...
## $ `Mood swings`                    <int> 3, 4, 4, 5, 2, 3, 5, 3, 3, 4, 3, 5...
## $ `Appearence and gestures`        <int> 4, 4, 3, 3, 3, 3, 4, 4, 4, 3, 4, 2...
## $ Socializing                      <int> 3, 4, 5, 1, 3, 4, 5, 2, 4, 4, 3, 5...
## $ Achievements                     <int> 4, 2, 3, 3, 3, 2, 4, 4, 2, 4, 3, 3...
## $ `Responding to a serious letter` <int> 3, 4, 4, 3, 3, 2, 3, 3, 2, 4, 4, 3...
## $ Children                         <int> 5, 2, 4, 2, 5, 3, 2, 4, 4, 3, 5, 5...
## $ Assertiveness                    <int> 1, 2, 3, 5, 4, 4, 3, 3, 1, 4, 2, 4...
## $ `Getting angry`                  <int> 1, 5, 4, 5, 2, 3, 3, 1, 3, 3, 1, 3...
## $ `Knowing the right people`       <int> 3, 4, 3, 4, 3, 4, 4, 4, 3, 4, 3, 5...
## $ `Public speaking`                <int> 5, 4, 2, 5, 5, 4, 3, 5, 4, 5, 3, 5...
## $ Unpopularity                     <int> 5, 4, 4, 3, 5, 4, 3, 2, 5, 3, 3, 2...
## $ `Life struggles`                 <int> 1, 1, 4, 3, 2, 3, 5, 2, 4, 5, 5, 4...
## $ `Happiness in life`              <int> 4, 4, 4, 2, 3, 3, 5, 4, 3, 4, 4, 3...
## $ `Energy levels`                  <int> 5, 3, 4, 2, 5, 4, 4, 4, 1, 4, 3, 3...
## $ `Small - big dogs`               <int> 1, 5, 3, 1, 3, 4, 3, 3, 5, 1, 2, 1...
## $ Personality                      <int> 4, 3, 3, 2, 3, 3, 3, 4, 3, 3, 3, 3...
## $ `Finding lost valuables`         <int> 3, 4, 3, 1, 2, 3, 2, 2, 5, 3, 2, 3...
## $ `Getting up`                     <int> 2, 5, 4, 1, 4, 3, 2, 5, 5, 4, 4, 5...
## $ `Interests or hobbies`           <int> 3, 3, 5, NA, 3, 5, 4, 4, 1, 3, 3, ...
## $ `Parents' advice`                <int> 4, 2, 3, 2, 3, 3, 4, 3, 4, 3, 4, 4...
## $ `Questionnaires or polls`        <int> 3, 3, 1, 4, 3, 4, 5, 3, 3, 3, 4, 4...
## $ `Internet usage`                 <chr> "few hours a day", "few hours a da...
## $ Finances                         <int> 3, 3, 2, 2, 4, 2, 4, 3, 2, 4, 2, 2...
## $ `Shopping centres`               <int> 4, 4, 4, 4, 3, 3, 3, 4, 1, 4, 4, 2...
## $ `Branded clothing`               <int> 5, 1, 1, 3, 4, 3, 1, 4, 3, 4, 2, 1...
## $ `Entertainment spending`         <int> 3, 4, 4, 3, 3, 3, 3, 4, 2, 2, 3, 3...
## $ `Spending on looks`              <int> 3, 2, 3, 4, 3, 1, 4, 4, 1, 3, 4, 1...
## $ `Spending on gadgets`            <int> 1, 5, 4, 4, 2, 4, 1, 3, 3, 2, 2, 1...
## $ `Spending on healthy eating`     <int> 3, 2, 2, 1, 4, 4, 5, 2, 4, 4, 2, 2...
## $ Age                              <int> 20, 19, 20, 22, 20, 20, 20, 19, 18...
## $ Height                           <int> 163, 163, 176, 172, 170, 186, 177,...
## $ Weight                           <int> 48, 58, 67, 59, 59, 77, 50, 90, 55...
## $ `Number of siblings`             <int> 1, 2, 2, 1, 1, 1, 1, 1, 1, 3, 2, 1...
## $ Gender                           <chr> "female", "female", "female", "fem...
## $ `Left - right handed`            <chr> "right handed", "right handed", "r...
## $ Education                        <chr> "college/bachelor degree", "colleg...
## $ `Only child`                     <chr> "no", "no", "no", "yes", "no", "no...
## $ `Village - town`                 <chr> "village", "city", "city", "city",...
## $ `House - block of flats`         <chr> "block of flats", "block of flats"...

- Preprocessing

Rows containing NA values were removed, and a new data frame was created from the columns that make up the Hobbies & Interests category of the Young People Survey (columns 32 to 63, History through Pets). The entire data set is not easy to read, so only the summary of the subset is shown here.

young_survey <- na.omit(data)

s_df = as.data.frame(young_survey)

s_df = s_df[32:63]

summary(s_df)

##     History        Psychology      Politics      Mathematics       Physics     
##  Min.   :1.000   Min.   :1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:2.00   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :3.000   Median :3.00   Median :3.000   Median :2.000   Median :2.000  
##  Mean   :3.226   Mean   :3.14   Mean   :2.627   Mean   :2.401   Mean   :2.096  
##  3rd Qu.:4.000   3rd Qu.:4.00   3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :5.00   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##     Internet           PC        Economy Management    Biology     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000      Min.   :1.000  
##  1st Qu.:4.000   1st Qu.:2.000   1st Qu.:1.000      1st Qu.:1.000  
##  Median :4.000   Median :3.000   Median :2.000      Median :2.000  
##  Mean   :4.188   Mean   :3.136   Mean   :2.662      Mean   :2.621  
##  3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.000      3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000      Max.   :5.000  
##    Chemistry        Reading      Geography     Foreign languages
##  Min.   :1.000   Min.   :1.0   Min.   :1.000   Min.   :1.000    
##  1st Qu.:1.000   1st Qu.:2.0   1st Qu.:2.000   1st Qu.:3.000    
##  Median :2.000   Median :3.0   Median :3.000   Median :4.000    
##  Mean   :2.121   Mean   :3.2   Mean   :3.109   Mean   :3.813    
##  3rd Qu.:3.000   3rd Qu.:5.0   3rd Qu.:4.000   3rd Qu.:5.000    
##  Max.   :5.000   Max.   :5.0   Max.   :5.000   Max.   :5.000    
##     Medicine          Law             Cars       Art exhibitions
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :2.000   Median :2.000   Median :2.000   Median :2.000  
##  Mean   :2.475   Mean   :2.224   Mean   :2.634   Mean   :2.617  
##  3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##     Religion     Countryside, outdoors    Dancing      Musical instruments
##  Min.   :1.000   Min.   :1.000         Min.   :1.000   Min.   :1.000      
##  1st Qu.:1.000   1st Qu.:3.000         1st Qu.:1.000   1st Qu.:1.000      
##  Median :2.000   Median :4.000         Median :2.000   Median :2.000      
##  Mean   :2.229   Mean   :3.614         Mean   :2.399   Mean   :2.302      
##  3rd Qu.:3.000   3rd Qu.:5.000         3rd Qu.:3.000   3rd Qu.:3.000      
##  Max.   :5.000   Max.   :5.000         Max.   :5.000   Max.   :5.000      
##     Writing      Passive sport    Active sport     Gardening    
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.000  
##  Median :1.000   Median :4.000   Median :3.000   Median :1.000  
##  Mean   :1.866   Mean   :3.394   Mean   :3.236   Mean   :1.872  
##  3rd Qu.:2.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:2.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##   Celebrities       Shopping     Science and technology    Theatre     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000          Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000          1st Qu.:2.000  
##  Median :2.000   Median :3.000   Median :3.000          Median :3.000  
##  Mean   :2.319   Mean   :3.257   Mean   :3.271          Mean   :3.023  
##  3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000          3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000          Max.   :5.000  
##  Fun with friends Adrenaline sports      Pets      
##  Min.   :2.000    Min.   :1.00      Min.   :1.000  
##  1st Qu.:4.000    1st Qu.:2.00      1st Qu.:2.000  
##  Median :5.000    Median :3.00      Median :4.000  
##  Mean   :4.552    Mean   :2.88      Mean   :3.324  
##  3rd Qu.:5.000    3rd Qu.:4.00      3rd Qu.:5.000  
##  Max.   :5.000    Max.   :5.00      Max.   :5.000
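Calling na.omit() on the full data set removes a respondent who skipped any question, including questions outside the Hobbies & Interests block. An alternative, shown here as a sketch on a small made-up data frame (`toy` and its values are hypothetical, not survey data), is to select the block first and drop incomplete rows afterwards. The block is also located by column name rather than by position, so it does not depend on the column order of the file.

```r
# Toy stand-in for the survey: Music and Age lie outside the block of interest.
toy <- data.frame(
  Music   = c(5, NA, 4, 3),
  History = c(1, 2, NA, 4),
  Pets    = c(4, 5, 1, 2),
  Age     = c(20, 19, NA, 22)
)

# Locate the block by name instead of hard-coding positions.
first <- match("History", names(toy))
last  <- match("Pets", names(toy))
interests <- toy[, first:last]

# Drop incomplete rows only after subsetting.
interests_complete <- na.omit(interests)

rows_na_first  <- nrow(na.omit(toy))        # NA removed before subsetting
rows_na_second <- nrow(interests_complete)  # NA removed after subsetting
print(c(na_first = rows_na_first, na_second = rows_na_second))
```

In this toy case the second respondent is kept when subsetting comes first, because the only missing answer lies outside the selected columns.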

- Application of PCA

For readability, the correlation matrix is shown only for the first ten columns.

cor(s_df[1:10])

##                         History   Psychology    Politics Mathematics    Physics
## History             1.000000000  0.277589253  0.40042159 0.006775058 0.07072863
## Psychology          0.277589253  1.000000000  0.18478753 0.033418822 0.06437929
## Politics            0.400421587  0.184787526  1.00000000 0.094843697 0.10597994
## Mathematics         0.006775058  0.033418822  0.09484370 1.000000000 0.61445068
## Physics             0.070728629  0.064379294  0.10597994 0.614450677 1.00000000
## Internet           -0.000188671 -0.001358391  0.07408342 0.164590446 0.09649070
## PC                  0.027415190 -0.073081652  0.12169054 0.328369470 0.33765152
## Economy Management  0.042573266  0.090542483  0.30657473 0.234618951 0.02414031
## Biology             0.014251442  0.184610980 -0.08653935 0.089351976 0.23540484
## Chemistry           0.020638132  0.043194424 -0.06991511 0.182141475 0.31794058
##                        Internet          PC Economy Management     Biology
## History            -0.000188671  0.02741519         0.04257327  0.01425144
## Psychology         -0.001358391 -0.07308165         0.09054248  0.18461098
## Politics            0.074083423  0.12169054         0.30657473 -0.08653935
## Mathematics         0.164590446  0.32836947         0.23461895  0.08935198
## Physics             0.096490702  0.33765152         0.02414031  0.23540484
## Internet            1.000000000  0.45741969         0.15937134 -0.11974364
## PC                  0.457419695  1.00000000         0.16911260 -0.10609323
## Economy Management  0.159371341  0.16911260         1.00000000 -0.17240504
## Biology            -0.119743644 -0.10609323        -0.17240504  1.00000000
## Chemistry          -0.102591240 -0.06726759        -0.18648820  0.67845540
##                      Chemistry
## History             0.02063813
## Psychology          0.04319442
## Politics           -0.06991511
## Mathematics         0.18214148
## Physics             0.31794058
## Internet           -0.10259124
## PC                 -0.06726759
## Economy Management -0.18648820
## Biology             0.67845540
## Chemistry           1.00000000
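In a matrix of this size the strongest relationships, such as Biology with Chemistry (about 0.68) and Mathematics with Physics (about 0.61) in the output above, are easy to miss. A ranked list of pairs can make them stand out. The sketch below uses simulated data so that it runs on its own; the data frame `toy` and its column values are made up for illustration.

```r
set.seed(1)
n <- 200
maths <- rnorm(n)

# Simulated answers: Physics is built to track Mathematics, History is unrelated.
toy <- data.frame(
  Mathematics = maths,
  Physics     = maths + rnorm(n, sd = 0.5),
  History     = rnorm(n)
)

cm <- cor(toy)

# Keep each pair once by taking only the upper triangle of the matrix.
idx <- which(upper.tri(cm), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)

# Strongest correlations (by absolute value) first.
pairs <- pairs[order(-abs(pairs$r)), ]
print(pairs)
```

Applied to the survey subset, the same steps would start from the correlation matrix of the chosen columns instead of `cor(toy)`.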

The importance of components and the loadings table were produced to see how much of the variance each component explains and which variables contribute most to each component. The first 3 principal components account for 31% of the variance, and the first 16 explain 77%.
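The proportions quoted above follow directly from the component standard deviations: each variance is the squared standard deviation, divided by the total variance. With 32 standardized variables the total is 32, so Comp.1 gives 2.0375^2 / 32, which is approximately 0.1297. The sketch below reproduces this arithmetic on the built-in USArrests data so that it runs without the survey file; `pca_demo` and the 80% threshold are illustrative choices, not values from this study.

```r
# PCA on the correlation matrix of a built-in data set.
pca_demo <- princomp(USArrests, cor = TRUE)

# Variance of each component is the square of its standard deviation.
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
cum_explained <- cumsum(var_explained)

# Smallest number of components whose cumulative share reaches 80%.
n_needed <- which(cum_explained >= 0.80)[1]

print(round(rbind(proportion = var_explained, cumulative = cum_explained), 3))
print(n_needed)
```

The two rows printed here correspond to the "Proportion of Variance" and "Cumulative Proportion" rows that summary() reports for a princomp object.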

pca <- princomp(as.matrix(s_df[1:32]), cor = TRUE)
summary(pca,loadings=TRUE)

## Importance of components:
##                           Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
## Standard deviation     2.0374787 1.8233387 1.60327816 1.48530949 1.26023134
## Proportion of Variance 0.1297287 0.1038926 0.08032815 0.06894201 0.04963072
## Cumulative Proportion  0.1297287 0.2336214 0.31394951 0.38289152 0.43252224
##                            Comp.6     Comp.7     Comp.8     Comp.9    Comp.10
## Standard deviation     1.20132035 1.07959740 1.06930606 1.05342776 0.98848494
## Proportion of Variance 0.04509908 0.03642283 0.03573173 0.03467844 0.03053445
## Cumulative Proportion  0.47762132 0.51404415 0.54977588 0.58445432 0.61498878
##                           Comp.11    Comp.12    Comp.13    Comp.14    Comp.15
## Standard deviation     0.96843882 0.94342106 0.92688407 0.89350216 0.87546384
## Proportion of Variance 0.02930855 0.02781385 0.02684731 0.02494832 0.02395115
## Cumulative Proportion  0.64429733 0.67211118 0.69895850 0.72390681 0.74785797
##                           Comp.16   Comp.17    Comp.18   Comp.19    Comp.20
## Standard deviation     0.85981220 0.8379842 0.81377184 0.7619490 0.74936288
## Proportion of Variance 0.02310241 0.0219443 0.02069452 0.0181427 0.01754827
## Cumulative Proportion  0.77096037 0.7929047 0.81359919 0.8317419 0.84929016
##                           Comp.21    Comp.22    Comp.23    Comp.24    Comp.25
## Standard deviation     0.73524390 0.72622112 0.70400198 0.70113254 0.67137253
## Proportion of Variance 0.01689324 0.01648116 0.01548809 0.01536209 0.01408566
## Cumulative Proportion  0.86618340 0.88266456 0.89815265 0.91351474 0.92760039
##                           Comp.26    Comp.27    Comp.28    Comp.29     Comp.30
## Standard deviation     0.64706583 0.62748835 0.58745885 0.58160258 0.553117299
## Proportion of Variance 0.01308419 0.01230443 0.01078462 0.01057067 0.009560586
## Cumulative Proportion  0.94068459 0.95298901 0.96377363 0.97434431 0.983904895
##                           Comp.31     Comp.32
## Standard deviation     0.53278768 0.480812500
## Proportion of Variance 0.00887071 0.007224396
## Cumulative Proportion  0.99277560 1.000000000
## 
## Loadings:
##                        Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## History                 0.184  0.117  0.196  0.225  0.146  0.222  0.252       
## Psychology              0.239                0.108  0.168        -0.127  0.201
## Politics                       0.220  0.289  0.143  0.280  0.137  0.111       
## Mathematics                    0.317 -0.119  0.101        -0.292 -0.287       
## Physics                        0.355 -0.231  0.104        -0.135 -0.188  0.127
## Internet                       0.210  0.120               -0.435  0.213 -0.228
## PC                             0.363               -0.176 -0.287              
## Economy Management             0.166  0.310         0.206 -0.196 -0.255       
## Biology                 0.273        -0.371 -0.159  0.211                     
## Chemistry               0.192        -0.414 -0.108  0.240                     
## Reading                 0.279 -0.170         0.195                      -0.124
## Geography               0.140  0.160  0.152                0.154  0.337 -0.358
## Foreign languages       0.204         0.199               -0.145 -0.108 -0.470
## Medicine                0.258        -0.316 -0.132  0.280               -0.122
## Law                     0.123  0.143  0.283         0.395                0.178
## Cars                           0.333        -0.199                            
## Art exhibitions         0.324                      -0.200                     
## Religion                0.222                0.176                            
## Countryside, outdoors   0.203                      -0.369         0.165 -0.143
## Dancing                 0.254               -0.216               -0.237       
## Musical instruments     0.213                      -0.315                     
## Writing                 0.230                0.186 -0.147 -0.107         0.319
## Passive sport                  0.155        -0.215 -0.119         0.245 -0.167
## Active sport                   0.178        -0.258 -0.144  0.364 -0.115  0.203
## Gardening               0.191               -0.144 -0.160         0.333  0.334
## Celebrities                   -0.104  0.167 -0.356  0.119 -0.323  0.136  0.119
## Shopping                      -0.168  0.144 -0.410        -0.228              
## Science and technology         0.364                                          
## Theatre                 0.316 -0.112               -0.126        -0.162       
## Fun with friends                      0.129 -0.270 -0.102        -0.308 -0.266
## Adrenaline sports              0.225        -0.230 -0.181  0.322 -0.195       
## Pets                                        -0.249                0.274  0.205
##                        Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## History                 0.170                                  0.224   0.163 
## Psychology              0.130 -0.272   0.123   0.447  -0.226   0.218  -0.260 
## Politics                              -0.202           0.132           0.158 
## Mathematics            -0.149  0.332                  -0.216                 
## Physics                        0.221          -0.114                         
## Internet                0.107 -0.241           0.184           0.266   0.195 
## PC                      0.103 -0.172   0.119           0.121           0.133 
## Economy Management     -0.263  0.176                  -0.192                 
## Biology                                                                      
## Chemistry                                     -0.113                   0.129 
## Reading                 0.167  0.184                                   0.143 
## Geography              -0.177          0.205  -0.346  -0.196                 
## Foreign languages                      0.171          -0.170  -0.407   0.101 
## Medicine                      -0.231                                         
## Law                                   -0.158           0.140  -0.178         
## Cars                                                   0.320  -0.163  -0.177 
## Art exhibitions         0.141                 -0.172   0.105          -0.308 
## Religion               -0.358 -0.135           0.362   0.352   0.162         
## Countryside, outdoors  -0.298  0.275   0.197   0.279                  -0.124 
## Dancing                -0.311                                  0.147   0.263 
## Musical instruments           -0.308  -0.333                  -0.513   0.210 
## Writing                 0.164 -0.290          -0.228  -0.331                 
## Passive sport                         -0.647   0.105  -0.351   0.111  -0.306 
## Active sport                  -0.137   0.215  -0.128  -0.118           0.101 
## Gardening              -0.171  0.195                           0.161   0.322 
## Celebrities                                   -0.255   0.108          -0.103 
## Shopping                                      -0.103   0.157          -0.169 
## Science and technology  0.281          0.206           0.199          -0.101 
## Theatre                 0.219  0.203  -0.131  -0.116   0.212          -0.268 
## Fun with friends        0.336  0.104  -0.247           0.133   0.272   0.392 
## Adrenaline sports       0.118          0.201          -0.228                 
## Pets                    0.319  0.354   0.102   0.419  -0.167  -0.375         
##                        Comp.16 Comp.17 Comp.18 Comp.19 Comp.20 Comp.21 Comp.22
## History                 0.289                           0.426           0.149 
## Psychology                              0.167   0.280   0.217  -0.109  -0.107 
## Politics                       -0.124   0.156                   0.157   0.191 
## Mathematics             0.236  -0.135  -0.186   0.106                         
## Physics                 0.171  -0.151                                  -0.240 
## Internet                0.168   0.118                  -0.285          -0.116 
## PC                              0.208                  -0.119           0.278 
## Economy Management     -0.303           0.133  -0.202          -0.229   0.346 
## Biology                                                        -0.144   0.155 
## Chemistry                                                                     
## Reading                         0.219  -0.348   0.110  -0.216   0.332         
## Geography               0.138  -0.326   0.104                  -0.306  -0.131 
## Foreign languages      -0.163   0.150  -0.265           0.280          -0.178 
## Medicine                                               -0.174  -0.103         
## Law                             0.119                  -0.344          -0.287 
## Cars                            0.144  -0.155   0.115   0.115  -0.192  -0.436 
## Art exhibitions                 0.120                  -0.113  -0.442         
## Religion                       -0.354  -0.505  -0.230                         
## Countryside, outdoors                   0.357   0.212           0.298         
## Dancing                 0.244   0.183   0.245  -0.349                  -0.254 
## Musical instruments     0.235  -0.141   0.175   0.119   0.147           0.124 
## Writing                -0.199  -0.217          -0.236           0.139  -0.189 
## Passive sport                   0.130  -0.125  -0.146   0.107   0.218         
## Active sport            0.228   0.308  -0.335  -0.110                   0.227 
## Gardening              -0.466   0.128           0.237   0.113  -0.146  -0.116 
## Celebrities             0.216  -0.232  -0.102   0.322           0.184   0.265 
## Shopping                       -0.142          -0.127   0.327   0.180  -0.135 
## Science and technology -0.266           0.108  -0.358   0.172   0.285         
## Theatre                         0.155          -0.148  -0.157           0.115 
## Fun with friends       -0.158  -0.286                          -0.131         
## Adrenaline sports      -0.162  -0.263           0.261  -0.316   0.175         
## Pets                    0.195  -0.167          -0.272  -0.120  -0.168         
##                        Comp.23 Comp.24 Comp.25 Comp.26 Comp.27 Comp.28 Comp.29
## History                 0.227   0.281   0.236           0.246   0.108   0.131 
## Psychology             -0.241  -0.139  -0.145   0.113          -0.165  -0.104 
## Politics               -0.250  -0.302   0.158  -0.214  -0.460   0.165  -0.187 
## Mathematics                                                             0.169 
## Physics                        -0.141                           0.265  -0.183 
## Internet                0.177                   0.185  -0.253   0.219   0.272 
## PC                                             -0.166   0.321  -0.145  -0.569 
## Economy Management              0.365           0.201                         
## Biology                                                                -0.128 
## Chemistry               0.101   0.159   0.251  -0.102  -0.145  -0.126         
## Reading                -0.162   0.209           0.278  -0.234  -0.329  -0.160 
## Geography              -0.116          -0.157   0.201          -0.148  -0.150 
## Foreign languages              -0.185          -0.243   0.133   0.226         
## Medicine                        0.107  -0.121                   0.218         
## Law                     0.376          -0.267           0.296  -0.173         
## Cars                   -0.277   0.430   0.104          -0.211                 
## Art exhibitions         0.206  -0.121   0.154  -0.405  -0.223  -0.253   0.187 
## Religion                                                                      
## Countryside, outdoors   0.142   0.182  -0.282  -0.185  -0.131                 
## Dancing                -0.304           0.244  -0.171   0.172  -0.162         
## Musical instruments                             0.348          -0.116         
## Writing                         0.362  -0.133  -0.250           0.210         
## Passive sport                                           0.136                 
## Active sport                   -0.159  -0.338          -0.212   0.183         
## Gardening                      -0.224   0.110   0.181   0.109                 
## Celebrities            -0.338                  -0.159   0.186           0.207 
## Shopping                0.406  -0.142           0.186  -0.214          -0.344 
## Science and technology -0.164  -0.191  -0.112                  -0.307   0.399 
## Theatre                -0.142                   0.308   0.223   0.459         
## Fun with friends                       -0.331  -0.125                         
## Adrenaline sports       0.124           0.483   0.128                         
## Pets                                                                          
##                        Comp.30 Comp.31 Comp.32
## History                 0.177                 
## Psychology             -0.165                 
## Politics                        0.106         
## Mathematics                     0.592         
## Physics                 0.169  -0.609         
## Internet                                      
## PC                              0.115  -0.115 
## Economy Management             -0.247         
## Biology                                 0.777 
## Chemistry              -0.627  -0.108  -0.295 
## Reading                 0.167  -0.134         
## Geography              -0.102          -0.107 
## Foreign languages                             
## Medicine                0.508   0.136  -0.475 
## Law                    -0.152                 
## Cars                                          
## Art exhibitions                -0.132         
## Religion               -0.127                 
## Countryside, outdoors                         
## Dancing                 0.163                 
## Musical instruments                           
## Writing                         0.133         
## Passive sport                                 
## Active sport           -0.120                 
## Gardening                                     
## Celebrities                    -0.116         
## Shopping                0.160   0.118         
## Science and technology                        
## Theatre                -0.250   0.192         
## Fun with friends                              
## Adrenaline sports                             
## Pets

The first 16 PCs explain 77% of the variance. PCA is then repeated using only the first 16 columns of s_df (History through Cars) as the input matrix.

pca <- princomp(as.matrix(s_df[1:16]),cor=TRUE)
summary(pca,loadings=TRUE)

## Importance of components:
##                           Comp.1    Comp.2    Comp.3     Comp.4     Comp.5
## Standard deviation     1.6552342 1.6248718 1.4967461 1.08102708 1.03100086
## Proportion of Variance 0.1712375 0.1650130 0.1400156 0.07303872 0.06643517
## Cumulative Proportion  0.1712375 0.3362505 0.4762661 0.54930482 0.61574000
##                            Comp.6     Comp.7     Comp.8     Comp.9    Comp.10
## Standard deviation     1.01792860 0.95720869 0.83035903 0.81731697 0.73044156
## Proportion of Variance 0.06476116 0.05726553 0.04309351 0.04175044 0.03334655
## Cumulative Proportion  0.68050116 0.73776669 0.78086020 0.82261064 0.85595719
##                           Comp.11    Comp.12    Comp.13    Comp.14    Comp.15
## Standard deviation     0.71950819 0.67698546 0.66000951 0.57274280 0.56040481
## Proportion of Variance 0.03235575 0.02864433 0.02722578 0.02050214 0.01962835
## Cumulative Proportion  0.88831294 0.91695728 0.94418306 0.96468520 0.98431355
##                           Comp.16
## Standard deviation     0.50098221
## Proportion of Variance 0.01568645
## Cumulative Proportion  1.00000000
## 
## Loadings:
##                    Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## History             0.193  0.207  0.311         0.266  0.416  0.247       
## Psychology          0.249         0.239 -0.145 -0.316         0.571 -0.357
## Politics            0.125  0.366  0.264  0.280                            
## Mathematics         0.167  0.290 -0.295 -0.280 -0.325  0.286 -0.258       
## Physics             0.258  0.242 -0.356        -0.128  0.379              
## Internet                   0.297 -0.138 -0.359  0.258 -0.346  0.374 -0.129
## PC                         0.376 -0.283 -0.224  0.167         0.238  0.164
## Economy Management         0.352  0.119        -0.436 -0.322 -0.294 -0.264
## Biology             0.480 -0.176 -0.159               -0.189        -0.127
## Chemistry           0.442 -0.145 -0.240                                   
## Reading             0.226 -0.119  0.338 -0.389         0.181         0.466
## Geography           0.157  0.207  0.173         0.573        -0.417 -0.472
## Foreign languages   0.170         0.305 -0.427        -0.382 -0.272  0.280
## Medicine            0.475 -0.127 -0.114  0.141        -0.294              
## Law                 0.155  0.292  0.263  0.386 -0.206 -0.163         0.246
## Cars                       0.327 -0.219  0.335  0.206 -0.160         0.393
##                    Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## History                    0.483   0.259   0.383           0.223         
## Psychology         -0.457 -0.134                          -0.217  -0.113 
## Politics            0.279         -0.546  -0.281  -0.471                 
## Mathematics                                0.164  -0.123   0.261  -0.585 
## Physics            -0.101 -0.207  -0.136                           0.698 
## Internet            0.401 -0.243   0.183   0.235  -0.301           0.114 
## PC                         0.290  -0.238  -0.376   0.535  -0.106  -0.156 
## Economy Management         0.395   0.394  -0.174                   0.248 
## Biology                    0.107          -0.106                         
## Chemistry           0.216  0.249           0.171  -0.175  -0.644  -0.131 
## Reading             0.173 -0.154   0.340  -0.483  -0.101                 
## Geography          -0.132 -0.256          -0.147   0.130  -0.142         
## Foreign languages  -0.321  0.123  -0.373   0.355                         
## Medicine                                  -0.135           0.595         
## Law                 0.225 -0.456           0.235   0.443  -0.132         
## Cars               -0.535          0.297  -0.112  -0.330                 
##                    Comp.16
## History                   
## Psychology                
## Politics                  
## Mathematics               
## Physics                   
## Internet                  
## PC                        
## Economy Management        
## Biology             0.782 
## Chemistry          -0.320 
## Reading                   
## Geography                 
## Foreign languages         
## Medicine           -0.497 
## Law                       
## Cars

According to the graph below, the first 3 PCs explain 48% of the variance, which is almost half, and the first 8 PCs explain 78%.

ggplot(data.frame(pc=1:16,cum_var=c(0.1712375, 0.3362505, 0.4762661, 0.54930482, 0.61574000, 0.68050116, 0.73776669, 0.78086020, 0.82261064, 0.85595719, 0.88831294, 0.91695728, 0.94418306, 0.96468520, 0.98431355, 1.00000000)),aes(x=pc,y=cum_var)) + 
  geom_point() + 
  geom_line()
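
The cumulative proportions plotted above are typed in by hand from the summary output. As a hedged sketch, the same values can be derived from the standard deviations that princomp() returns. The built-in USArrests data set is used here only as a stand-in for s_df[1:16], and the names toy_pca, comp_var, cum_var and n_needed are illustrative, not part of the original analysis.

```r
# Stand-in data: USArrests replaces the survey columns (assumption for illustration)
toy_pca <- princomp(as.matrix(USArrests), cor = TRUE)

# The variance of each component is its squared standard deviation
comp_var <- toy_pca$sdev^2

# Cumulative proportion of variance explained
cum_var <- cumsum(comp_var) / sum(comp_var)

# Number of components needed to reach at least 78% of the variance
n_needed <- which(cum_var >= 0.78)[1]

print(round(cum_var, 4))
print(n_needed)
```

With the survey data, cum_var computed this way from pca could be passed to ggplot directly, which avoids copying numbers from the printed summary.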

2.2 Multidimensional Scaling (MDS)

- Application of MDS

Multidimensional Scaling was applied to the same subset of the Young Survey data set to investigate the relationships among hobbies and interests.

A total of 32 categories, from History to Pets, were selected, and the distance values were calculated by subtracting the pairwise correlation values from 1. Each category was then placed on a two-dimensional plane with x and y coordinates centered on the origin (0,0).

s_df_mds <- s_df[,sapply(s_df,class)=="integer"] %>%
  dplyr::select(History:Pets)

set.seed(42)

s_df_mds_dist <- 1 - cor(s_df_mds)

s_df_mds <- cmdscale(s_df_mds_dist,k=2)

colnames(s_df_mds) <- c("x","y")

print(s_df_mds)

##                                   x            y
## History                 0.031306226 -0.057585867
## Psychology              0.221673919 -0.034255291
## Politics               -0.210706395  0.084207670
## Mathematics            -0.366106787 -0.301011922
## Physics                -0.338602615 -0.488946202
## Internet               -0.435984609  0.193575251
## PC                     -0.602886441 -0.104205122
## Economy Management     -0.313464728  0.369369369
## Biology                 0.249496019 -0.352229723
## Chemistry               0.111150203 -0.427188341
## Reading                 0.525002226 -0.036403993
## Geography              -0.082496760 -0.010735122
## Foreign languages       0.197742370  0.164499851
## Medicine                0.190107055 -0.319538506
## Law                    -0.091720302  0.234211611
## Cars                   -0.593017958  0.091655704
## Art exhibitions         0.380132671 -0.025937391
## Religion                0.177085206 -0.229380678
## Countryside, outdoors   0.159766147 -0.101417980
## Dancing                 0.326087243  0.151882648
## Musical instruments     0.165516710 -0.156630640
## Writing                 0.291051922 -0.065898705
## Passive sport          -0.288889494  0.153049302
## Active sport           -0.180541543  0.077474365
## Gardening               0.203069385 -0.051532921
## Celebrities             0.059109759  0.515094714
## Shopping                0.248971459  0.505415685
## Science and technology -0.368579088 -0.307211932
## Theatre                 0.452232345  0.006680821
## Fun with friends        0.002109219  0.282818197
## Adrenaline sports      -0.251016420  0.072054588
## Pets                    0.132403055  0.168120560
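
A two-dimensional map is only useful if it preserves most of the structure in the distance matrix. A minimal sketch of how this could be checked is given here, using cmdscale() with eig = TRUE, which returns goodness-of-fit values alongside the coordinates. The built-in mtcars data is an assumed stand-in for the survey columns, and toy_dist, toy_mds and toy_points are illustrative names.

```r
# Stand-in dissimilarities: 1 - correlation between the mtcars variables
toy_dist <- 1 - cor(mtcars)

# Classical MDS; eig = TRUE returns a list instead of only the coordinates
toy_mds <- cmdscale(toy_dist, k = 2, eig = TRUE)

# Coordinates of each variable on the two MDS axes
toy_points <- toy_mds$points
colnames(toy_points) <- c("x", "y")

# Goodness of fit: share of the eigenvalue sum captured by the 2 axes
print(toy_mds$GOF)
print(head(toy_points))
```

Values of GOF close to 1 would suggest that two dimensions are enough; low values would suggest that nearby labels on the plot should be interpreted with caution.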

ggplot(data.frame(s_df_mds),aes(x=x,y=y)) +
  geom_text(label=rownames(s_df_mds),angle= 0, size=3) +

  labs(x="x",y="y", title="Multidimensional Scale of Hobbies and Interests")

- Result of MDS

According to the graph above, the variables related to sports form a group, but Politics also falls in this group. This suggests that survey respondents who enjoy sports also tend to be interested in politics, which may be an interesting result. Chemistry, Biology and Medicine form another group, which may reflect respondents’ professions. Science and Technology and Mathematics have a strong relationship. On the other hand, Psychology and Gardening are two very close interests, and the reasons for this could be investigated.

- Application of K Means

The K Means clustering method was applied to the output of MDS, and the variables were grouped into 8 clusters.

set.seed(42)


categories_cluster<-kmeans(s_df_mds,centers=8)


mds_clusters<-data.frame(categories=names(categories_cluster$cluster),cluster_mds=categories_cluster$cluster) %>% arrange(cluster_mds,categories)
mds_clusters

##                                    categories cluster_mds
## Art exhibitions               Art exhibitions           1
## Dancing                               Dancing           1
## Reading                               Reading           1
## Theatre                               Theatre           1
## Mathematics                       Mathematics           2
## Physics                               Physics           2
## Science and technology Science and technology           2
## Active sport                     Active sport           3
## Adrenaline sports           Adrenaline sports           3
## Economy Management         Economy Management           3
## Geography                           Geography           3
## Law                                       Law           3
## Passive sport                   Passive sport           3
## Politics                             Politics           3
## Foreign languages           Foreign languages           4
## Fun with friends             Fun with friends           4
## Pets                                     Pets           4
## Biology                               Biology           5
## Chemistry                           Chemistry           5
## Medicine                             Medicine           5
## Religion                             Religion           5
## Cars                                     Cars           6
## Internet                             Internet           6
## PC                                         PC           6
## Countryside, outdoors   Countryside, outdoors           7
## Gardening                           Gardening           7
## History                               History           7
## Musical instruments       Musical instruments           7
## Psychology                         Psychology           7
## Writing                               Writing           7
## Celebrities                       Celebrities           8
## Shopping                             Shopping           8
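
The choice of 8 centers is not justified in the text. One common heuristic, sketched here under stated assumptions, is to compare the total within-cluster sum of squares over several values of k and look for an elbow. The coordinates are an assumed stand-in built from mtcars, and toy_xy, ks and wss are illustrative names. Setting nstart = 25 reruns kmeans() from several random starts, which reduces the dependence of the result on the seed.

```r
set.seed(42)

# Stand-in 2-D coordinates, built the same way as the MDS output
toy_xy <- cmdscale(1 - cor(mtcars), k = 2)
colnames(toy_xy) <- c("x", "y")

# Total within-cluster sum of squares for k = 1..6
ks <- 1:6
wss <- sapply(ks, function(k) {
  kmeans(toy_xy, centers = k, nstart = 25)$tot.withinss
})

# A sharp flattening of this curve suggests a reasonable k
print(data.frame(k = ks, wss = round(wss, 4)))
```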

- K Means Clusters

# Plot the categories colored by cluster, with the cluster centers as points
ggplot(data.frame(s_df_mds) %>%
         mutate(clusters=as.factor(categories_cluster$cluster),category=rownames(s_df_mds)),
       aes(x=x,y=y)) +
  geom_text(aes(label=category,color=clusters),angle=45,size=3) +
  geom_point(data=as.data.frame(categories_cluster$centers),aes(x=x,y=y))

- Hierarchical Clustering

The graphs below are cluster dendrograms created from the Young Survey distance matrix. Different dendrograms can be produced by changing the linkage method: the first graph uses complete linkage and the second uses average linkage. Complete linkage merges clusters according to the largest dissimilarity between their members, whereas average linkage uses the mean dissimilarity, so the heights and positions of the nodes differ between the two methods.

s_hc<-hclust(as.dist(s_df_mds_dist),method="complete")
plot(s_hc,hang=-1)

s_hc<-hclust(as.dist(s_df_mds_dist),method="average")
plot(s_hc,hang=-1)
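
The two dendrograms can also be compared numerically. As a hedged sketch, cutree() extracts cluster memberships for a chosen number of groups, and the cophenetic correlation indicates how faithfully each tree reproduces the original dissimilarities. The mtcars-based matrix is an assumed stand-in for s_df_mds_dist, the object names are illustrative, and the cut at 3 groups is arbitrary.

```r
# Stand-in dissimilarities: 1 - correlation between the mtcars variables
toy_d <- as.dist(1 - cor(mtcars))

hc_complete <- hclust(toy_d, method = "complete")
hc_average  <- hclust(toy_d, method = "average")

# Cluster membership when each tree is cut into 3 groups (arbitrary choice)
groups_complete <- cutree(hc_complete, k = 3)
groups_average  <- cutree(hc_average, k = 3)

# Cross-tabulate to see where the two linkage methods disagree
print(table(complete = groups_complete, average = groups_average))

# Cophenetic correlation: closer to 1 means the tree preserves the distances better
coph <- c(complete = cor(toy_d, cophenetic(hc_complete)),
          average  = cor(toy_d, cophenetic(hc_average)))
print(round(coph, 3))
```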