VEDAT MILOR’S RESTAURANT RECOMMENDATIONS

by Deniz SARI and Sevde CIFCISERTBASI

Brief

Our group project examines Vedat Milor’s restaurant recommendations. The data set contains about 100 restaurants with their ID, Name, Country, City, State, Rating, Restaurant Type, Vegan/Vegetarian Option, Price Range and Vedat Milor’s Rate. First, we collected the restaurants that Vedat Milor recommended; there were 146 of them. Then, using R, we pulled data for them from Foursquare. After removing encoding problems, we had data for 98 restaurants. The code we used to collect data from Foursquare is shown below. We used the free version of the developer.foursquare.com subscription, so it took time to build our current data set.

#library(tidyverse)
#library(jsonlite)
#library(lubridate)
#raw_data <- readxl::read_excel("C:\\Users\\Deniz\\Desktop\\VedatMilor-R.xlsx") %>% filter(!is.na(foursquare_id))

#get_place_details_from_id <- function(place_id="5bd03ceae075500038a886e9",client_id,client_key){

#  the_url <- paste0("https://api.foursquare.com/v2/venues/", place_id,"?client_id=",client_id,"&client_secret=",client_key,"&v=",format(lubridate::today(),"%Y%m%d"))

#  res <- jsonlite::fromJSON(the_url)
#  place_data <-
#    tibble(
#      id = place_id,
#      name = res$response$venue$name,
#      lat = res$response$venue$location$lat,
#      lng = res$response$venue$location$lng,
#      country = res$response$venue$location$country,
#      city = ifelse(is.null(res$response$venue$location$city),"",res$response$venue$location$city),
#      state = ifelse(is.null(res$response$venue$location$state),"",res$response$venue$location$state),
#      fsq_rating = ifelse(is.null(res$response$venue$rating),-50,res$response$venue$rating)

#    )

#  place_categories <- res$response$venue$categories
#  return(list(data=place_data,categories=place_categories,raw_result=res))
#}

#placedata <- tibble()
#for(i in 52:nrow(raw_data)) {
#  print(i)
#  # API credentials are read from environment variables (the variable names are placeholders)
#  dd <- get_place_details_from_id(raw_data$foursquare_id[i], client_id = Sys.getenv("FOURSQUARE_CLIENT_ID"), client_key = Sys.getenv("FOURSQUARE_CLIENT_SECRET"))
#  placedata <- bind_rows(placedata, dd$data)
#}

#save(placedata,file="day1.RData")
#load("day1.RData")

After that, we transferred our data set to an Excel sheet. Finally, we used the data set named raw_data for the analysis in this project.
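Because the free tier is slow and requests can fail, a collection loop like the one above may need to be restarted partway through. The following is only a minimal sketch of a loop that keeps going after an error and remembers which IDs failed. The helper `collect_places()`, the stand-in `toy_fetch()` and the IDs are hypothetical; in our setting `fetch_one` would wrap `get_place_details_from_id()`.

```r
# Hypothetical helper: calls fetch_one() for every id and continues on errors.
collect_places <- function(ids, fetch_one, pause_seconds = 0) {
  results <- list()
  failed <- character(0)
  for (id in ids) {
    # On error, return NULL instead of stopping the whole loop
    res <- tryCatch(fetch_one(id), error = function(e) NULL)
    if (is.null(res)) {
      failed <- c(failed, id)              # remember the id so it can be retried
    } else {
      results[[length(results) + 1]] <- res
    }
    Sys.sleep(pause_seconds)               # pause between requests (rate limits)
  }
  list(data = do.call(rbind, results), failed = failed)
}

# Toy fetcher for illustration only: it fails for one made-up id
toy_fetch <- function(id) {
  if (id == "bad") stop("request failed")
  data.frame(id = id, fsq_rating = 8.0, stringsAsFactors = FALSE)
}

out <- collect_places(c("a", "bad", "b"), toy_fetch)
print(out$data)
print(out$failed)
```

The IDs in `out$failed` can be passed to `collect_places()` again later, which avoids restarting the loop from a hand-picked row number.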

### Objective

In this project we:

- Collected data from Foursquare
- Cleaned the raw data
- Created a new data set named raw_data
- Analyzed the data set
- Compared the data
- Did data mining
- Visualized the results

#### Project

First, we loaded the packages we needed and read the data from the Excel sheet.

library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------- tidyverse 1.2.1 --
## √ ggplot2 3.1.0       √ purrr   0.3.2  
## √ tibble  2.1.1       √ dplyr   0.8.0.1
## √ tidyr   0.8.3       √ stringr 1.4.0  
## √ readr   1.3.1       √ forcats 0.4.0
## -- Conflicts ------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
## 
##     flatten
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(rworldmap)
## Loading required package: sp
## ### Welcome to rworldmap ###
## For a short introduction type :   vignette('rworldmap')
library(ggmap)
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(maptools)
## Checking rgeos availability: FALSE
##      Note: when rgeos is not available, polygon geometry     computations in maptools depend on gpclib,
##      which has a restricted licence. It is disabled by default;
##      to enable gpclib, type gpclibPermit()
library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
library(ggplot2)
library("tm")
## Loading required package: NLP
## 
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
library("SnowballC")
library("wordcloud")
## Loading required package: RColorBrewer
library("RColorBrewer")
library("XML")
library("RCurl")
## Loading required package: bitops
## 
## Attaching package: 'RCurl'
## The following object is masked from 'package:tidyr':
## 
##     complete
library(dplyr)
raw_data <- readxl::read_excel("C:\\Users\\Deniz\\Desktop\\vedo2.xlsx")
## New names:
## * `` -> ...1
raw_data$Rating <- as.numeric(raw_data$Rating)
raw_data %>%
  as_tibble()
## # A tibble: 98 x 11
##     ...1 id    Name  Country City  State Rating Type  `vegan/vejetary~
##    <dbl> <chr> <chr> <chr>   <chr> <chr>  <dbl> <chr>            <dbl>
##  1     1 4b7d~ L'Am~ France  Paris Île-~    8.4 fren~                0
##  2     2 5630~ Le C~ France  Paris Île-~    7.6 fren~                0
##  3     3 4e53~ Hedo~ United~ Lond~ Grea~    7.7 fren~                0
##  4     4 4adc~ Arpè~ France  Paris Île-~    8.4 fren~                1
##  5     5 4e36~ Rist~ Italy   Cast~ Abru~    7.5 ital~                0
##  6     6 4c07~ Le P~ France  Avig~ Prov~    0   fren~                0
##  7     7 4da0~ D'be~ Spain   Grove Gali~    8.8 seaf~                0
##  8     8 5025~ Kara~ Turkey  İsta~ İsta~    8.5 turk~                0
##  9     9 50b1~ Pepe~ Italy   Caia~ Camp~    9   pizz~                1
## 10    10 5bd5~ Kimo~ Japan   Tokyo Tokyo    0   japa~                0
## # ... with 88 more rows, and 2 more variables: `Price Range` <dbl>, `Vedat
## #   Milor's Rate` <dbl>
glimpse(raw_data)
## Observations: 98
## Variables: 11
## $ ...1                 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...
## $ id                   <chr> "4b7daa23f964a5203fcd2fe3", "5630c420498e...
## $ Name                 <chr> "L'Ambroisie", "Le Clarence", "Hedone", "...
## $ Country              <chr> "France", "France", "United Kingdom", "Fr...
## $ City                 <chr> "Paris", "Paris", "London", "Paris", "Cas...
## $ State                <chr> "Île-de-France", "Île-de-France", "Greate...
## $ Rating               <dbl> 8.4, 7.6, 7.7, 8.4, 7.5, 0.0, 8.8, 8.5, 9...
## $ Type                 <chr> "french", "french", "french", "french", "...
## $ `vegan/vejetaryan`   <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
## $ `Price Range`        <dbl> 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 4, 3, 3, 0,...
## $ `Vedat Milor's Rate` <dbl> 10.0, 9.0, 9.7, 10.0, 8.5, 6.0, 8.5, 8.0,...
raw_data %>%
  filter(Rating < `Vedat Milor's Rate`)
## # A tibble: 32 x 11
##     ...1 id    Name  Country City  State Rating Type  `vegan/vejetary~
##    <dbl> <chr> <chr> <chr>   <chr> <chr>  <dbl> <chr>            <dbl>
##  1     1 4b7d~ L'Am~ France  Paris Île-~    8.4 fren~                0
##  2     2 5630~ Le C~ France  Paris Île-~    7.6 fren~                0
##  3     3 4e53~ Hedo~ United~ Lond~ Grea~    7.7 fren~                0
##  4     4 4adc~ Arpè~ France  Paris Île-~    8.4 fren~                1
##  5     5 4e36~ Rist~ Italy   Cast~ Abru~    7.5 ital~                0
##  6     6 4c07~ Le P~ France  Avig~ Prov~    0   fren~                0
##  7    10 5bd5~ Kimo~ Japan   Tokyo Tokyo    0   japa~                0
##  8    11 5092~ Sais~ United~ San ~ CA       8.2 new ~                0
##  9    12 587c~ CHIU~ Japan   Tokyo Tokyo    0   japa~                0
## 10    14 4ca0~ Trat~ Italy   Pogg~ Abru~    0   oste~                0
## # ... with 22 more rows, and 2 more variables: `Price Range` <dbl>, `Vedat
## #   Milor's Rate` <dbl>

First, we identified the main, common words of the data set. Then we created a word cloud from these common words:

a=c("Italian","French","Burger","Steak","Wine","Seafood","SideDish", 
   "Turkish","Kebab","Breakfast","FineDinig","Delish","VedatMilor", 
   "Spanish","Taco","Pizza","IceCream","AmericanStreetFood","FastFood","Greek","Drinks", "Foursquare",
   "Gourmet","English","Coffee","Asian","Japaneese","StreetFood")
 
# Assign a random frequency to each word in this list
b=sample(seq(0,1,0.01) , length(a) , replace=TRUE) 
 
# wordcloud() lays out the cloud automatically (the background is set to red)
par(bg="red") 
wordcloud(a , b , col=terrain.colors(length(a) , alpha=0.9) , rot.per=0.3 )
## Warning in wordcloud(a, b, col = terrain.colors(length(a), alpha = 0.9), :
## Taco could not be fit on page. It will not be plotted.
## Warning in wordcloud(a, b, col = terrain.colors(length(a), alpha = 0.9), :
## Japaneese could not be fit on page. It will not be plotted.
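The cloud above uses random frequencies, so word size carries no information. As a hedged alternative, the frequencies could come from how often each restaurant type occurs. The sketch below uses a made-up `types` vector standing in for `raw_data$Type`, and it draws the cloud only if the wordcloud package is installed.

```r
# Toy vector standing in for raw_data$Type (the values are made up)
types <- c("italian", "french", "italian", "seafood", "french", "italian")

# Count how often each type occurs, most frequent first
type_freq <- sort(table(types), decreasing = TRUE)
words <- names(type_freq)
freqs <- as.numeric(type_freq)
print(data.frame(words, freqs))

# Draw the cloud only when the wordcloud package is available;
# min.freq = 1 keeps words that occur a single time
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words, freqs, min.freq = 1)
}
```

With the real data, `types` would be replaced by `raw_data$Type`, so that the largest words are the most common restaurant types.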

In the table below, we group the restaurants by type. It shows that Vedat Milor’s favorite cuisines are Italian and French. Among national cuisines, Turkish, Spanish and American follow them.

raw_data %>%
  count(Type, wt=NULL) %>%
  arrange(desc(n))
## # A tibble: 28 x 2
##    Type             n
##    <chr>        <int>
##  1 italian         19
##  2 french          14
##  3 seafood         11
##  4 restaurant       9
##  5 turkish          6
##  6 new american     4
##  7 pizza place      4
##  8 spanish          4
##  9 steakhouse       3
## 10 burger joint     2
## # ... with 18 more rows

We created a new data set named topTen, which holds the ten restaurants with the highest ratings from Foursquare users. We then did the same for Vedat Milor’s ratings and named the result topTenofVM. Comparing the two lists, we found only three restaurants in common: Le Cinq, El Celler de Can Roca and Beyti. This suggests that the public and Vedat Milor often think differently. These ratings alone cannot tell us whether a restaurant is good or bad, because other factors play a role, such as the number of ratings and the treatment Vedat Milor receives. Still, we can say that the three common restaurants deserve their place. We also looked for the restaurant with the largest difference between the two ratings. Yolgeçen has the largest, at 8 points. More than Yolgeçen, though, Le Petit Nice surprised us with its 6-point difference. Vedat Milor has published long articles about this restaurant and also shot a video there. It is one of his favorites, yet Foursquare users do not seem to like it as much as he does. Note, however, that its Foursquare rating in our data is 0, which may simply mean that no rating was available.

topTen <- raw_data %>%
  arrange(desc(Rating)) %>%
  slice(1:10) %>%
  print()
## # A tibble: 10 x 11
##     ...1 id    Name  Country City  State Rating Type  `vegan/vejetary~
##    <dbl> <chr> <chr> <chr>   <chr> <chr>  <dbl> <chr>            <dbl>
##  1    29 4adc~ Le C~ France  Paris Île-~    9.3 fren~                1
##  2    33 4b7f~ El C~ Spain   Giro~ Cata~    9.3 medi~                0
##  3    53 3fd6~ Le B~ United~ New ~ NY       9.2 seaf~                0
##  4    49 40e0~ Bone~ United~ Atla~ GA       9.1 stea~                0
##  5    51 4acb~ Katz~ United~ New ~ NY       9.1 jewi~                1
##  6    75 5b5d~ Brek~ Turkey  İsta~ İsta~    9.1 brea~                1
##  7    79 4b83~ Beyti Turkey  Bakı~ İsta~    9.1 turk~                0
##  8     9 50b1~ Pepe~ Italy   Caia~ Camp~    9   pizz~                1
##  9    45 54aa~ Mour~ United~ San ~ CA       9   moro~                1
## 10    46 49be~ La C~ United~ San ~ CA       9   ital~                1
## # ... with 2 more variables: `Price Range` <dbl>, `Vedat Milor's
## #   Rate` <dbl>
topTenofVM <- raw_data %>%
  arrange(desc(`Vedat Milor's Rate`)) %>%
  slice (1:10) %>%
  select(Name, `Vedat Milor's Rate`) %>%
  print()
## # A tibble: 10 x 2
##    Name                  `Vedat Milor's Rate`
##    <chr>                                <dbl>
##  1 L'Ambroisie                           10  
##  2 Arpège                                10  
##  3 Saison                                10  
##  4 Beyti                                 10  
##  5 Fauna                                 10  
##  6 Hedone                                 9.7
##  7 Dal Pescatore                          9.5
##  8 Le Clarence                            9  
##  9 Le Cinq                                9  
## 10 El Celler de Can Roca                  9
raw_data %>%
  transmute(Name, Difference = (abs(Rating-`Vedat Milor's Rate`))) %>%
  arrange(desc(Difference))
## # A tibble: 98 x 2
##    Name                  Difference
##    <chr>                      <dbl>
##  1 Yolgeçen (Lome Köyü)         8  
##  2 CHIUnE                       7  
##  3 Le Lampare                   7  
##  4 Le Petit Nice                6  
##  5 Trattoria della Posta        6  
##  6 Il Pomo D'Oro                6  
##  7 Restaurante Fagollaga        6  
##  8 La Broche des Ours           6  
##  9 Kimoto                       5  
## 10 Delfina                      3.8
## # ... with 88 more rows
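The overlap between the two top-ten lists can be checked in code instead of by eye. A minimal sketch follows, using shortened toy vectors built from names that appear in the tables above. The vectors `top_fsq` and `top_vm` are hypothetical stand-ins for `topTen$Name` and `topTenofVM$Name`.

```r
# Shortened toy lists for illustration; the real lists have ten names each
top_fsq <- c("Le Cinq", "El Celler de Can Roca", "Beyti", "Pepe In Grani")
top_vm  <- c("L'Ambroisie", "Saison", "Beyti", "Fauna", "Hedone",
             "Le Cinq", "El Celler de Can Roca")

# intersect() returns the names present in both vectors,
# in the order in which they appear in the first one
common <- intersect(top_fsq, top_vm)
print(common)
```

On the real data the call would be `intersect(topTen$Name, topTenofVM$Name)`.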

Then we created another data set named value to compare the ratings of Foursquare users with Vedat Milor’s. When Vedat Milor rates a restaurant higher than Foursquare users do, we label it Overrated; when his rating is equal to or lower than theirs, we label it Undervalued. First we plotted rating against restaurant name for each of the two sources, then we made another graph that compares both ratings on the same plot.

value <- raw_data %>%
  transmute(Name, Rating, `Vedat Milor's Rate`,realValue = ifelse(Rating<`Vedat Milor's Rate`,"Overrated","Undervalued")) %>%
  print()
## # A tibble: 98 x 4
##    Name                        Rating `Vedat Milor's Rate` realValue  
##    <chr>                        <dbl>                <dbl> <chr>      
##  1 L'Ambroisie                    8.4                 10   Overrated  
##  2 Le Clarence                    7.6                  9   Overrated  
##  3 Hedone                         7.7                  9.7 Overrated  
##  4 Arpège                         8.4                 10   Overrated  
##  5 Ristorante Reale               7.5                  8.5 Overrated  
##  6 Le Petit Nice                  0                    6   Overrated  
##  7 D'berto                        8.8                  8.5 Undervalued
##  8 Karadeniz Tadal Pide Salonu    8.5                  8   Undervalued
##  9 Pepe In Grani                  9                    8   Undervalued
## 10 Kimoto                         0                    5   Overrated  
## # ... with 88 more rows
ggplot(data=topTen,aes(x= Name, y= Rating))+
  geom_point()

ggplot(data=topTenofVM,aes(x= Name, y= `Vedat Milor's Rate`))+
  geom_point()

raw_data %>% 
  mutate(realValue = ifelse(Rating<`Vedat Milor's Rate`,"Overrated","Undervalued")) %>% ggplot() +
  geom_point(aes(x =Rating, y= `Vedat Milor's Rate`, color = realValue))

As mentioned before, many factors can affect these ratings, so we cannot state definitively whether Vedat Milor overrated a restaurant. Within our limited data, however, we can make the comparison. Only 32 of the 98 restaurants are labeled Overrated, so undervalued restaurants outnumber overrated ones. A plausible reason is that Vedat Milor is a gourmet with a more demanding palate.
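Several restaurants in our data have a Foursquare rating of 0 (Le Petit Nice and Kimoto, for example), which may simply mean that no rating was available. The sketch below assumes that reading and shows how such rows could be kept apart before labeling. The toy values and the column name `vm_rate` are made up for illustration.

```r
# Toy data in the shape of the value table (the numbers are illustrative)
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  Rating = c(8.4, 0, 8.8, 9.0),
  vm_rate = c(10, 6, 8.5, 9.0),
  stringsAsFactors = FALSE
)

# Assumption: a Foursquare rating of 0 means "no rating available"
toy$Rating[toy$Rating == 0] <- NA

# Label each restaurant; unrated rows and ties get their own labels
toy$realValue <- ifelse(is.na(toy$Rating), "Unrated",
                 ifelse(toy$Rating < toy$vm_rate, "Overrated",
                 ifelse(toy$Rating > toy$vm_rate, "Undervalued", "Equal")))

print(toy)
print(table(toy$realValue))
```

Keeping unrated restaurants in a separate group prevents a missing rating from being counted as a very low one.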

Later, we found the latitude and longitude of the five best restaurants according to Vedat Milor’s ratings. Using these coordinates, we created a world map. The red points show the locations of his top five restaurants.

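# place.x holds longitudes and place.y latitudes
# (the second longitude is negative so that 33.749063, -84.388067 falls on Atlanta rather than in Asia)
place.x <- c(14.364706, -84.388067, 19.929652, 4.835775, 0.684870)
place.y <- c(41.178419, 33.749063, 39.736833, 45.764055, 47.394185) 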

mp <- NULL
mapWorld <- borders("world", colour="gray50", fill = "white")
mp <- ggplot() + mapWorld

mp <- mp+ geom_point(aes(x=place.x, y=place.y) ,color="red", size=3) 
mp

By counting countries, we found that 31 of the restaurants Vedat Milor recommended in this data set are in Turkey. It is no surprise that Turkey comes first: he lives in Turkey, so it is much easier for him to visit restaurants there and taste the food. Italy and France follow Turkey and, as mentioned before, these are Vedat Milor’s favorite cuisines.

raw_data %>%
  count(Country) %>%
  arrange(desc(n)) %>%
  slice (1:6)
## # A tibble: 6 x 2
##   Country           n
##   <chr>         <int>
## 1 Turkey           31
## 2 Italy            16
## 3 France           14
## 4 Spain            13
## 5 United States    12
## 6 Greece            5

From the data, we found that Vedat Milor mostly travels to Italy, France, the United States, Turkey and Spain to taste and review restaurants. Here we show these countries on a world map.

theCountries <- c("TUR","ITA","FRA","ESP","USA")

malIDF <- data.frame(country = theCountries, Countries = c(1,1,1,1,1))

malMAP <- joinCountryData2Map(malIDF, joinCode = "ISO3", nameJoinColumn = "country")
## 5 codes from your data successfully matched countries in the map
## 0 codes from your data failed to match with a country code in the map
## 238 codes from the map weren't represented in your data
mapCountryData(malMAP, nameColumnToPlot = "Countries", catMethod = "categorical",missingCountryCol = gray(.8), colourPalette = c("red", "blue", "yellow","pink","green"))
## Warning in rwmGetColours(colourPalette, numColours): 5 colours specified
## and 1 required, using interpolation to calculate colours

Finally, we created a radar chart of the restaurants’ price ranges. Foursquare classifies price ranges on a scale from 1 to 4, where 1 is the cheapest and 4 is the most expensive. We counted the restaurants in each price range and found that Vedat Milor mostly goes to mid- and high-priced restaurants, but he also likes cheap ones.

library(radarchart)
library(fmsb)
 
# Count the restaurants in each Foursquare price range (1 to 4); other values, such as 0, are left out
price_counts <- table(factor(raw_data$`Price Range`, levels = 1:4))
radar_data <- as.data.frame(matrix(as.numeric(price_counts), ncol = 4))
colnames(radar_data) <- c("Price 1", "Price 2", "Price 3", "Price 4")
 
# fmsb needs two extra rows at the top of the data frame: the max and the min of each axis
max_count <- max(price_counts)
radar_data <- rbind(rep(max_count, 4), rep(0, 4), radar_data)
 
# The default radar chart proposed by the library:
radarchart(radar_data)

# Customized radar chart
radarchart(radar_data, axistype = 1,
 
    # polygon style
    pcol = rgb(0.16,0.29,0.48,0.05), pfcol = rgb(0.16,0.29,0.5,0.05), plwd = 4,
 
    # grid style; one axis label per grid line, from 0 to the largest count
    cglcol = "grey", cglty = 1, axislabcol = "grey", caxislabels = round(seq(0, max_count, length.out = 5)), cglwd = 0.8,
 
    # label size
    vlcex = 0.8)

CONCLUSION

In this project, we first built a data set from Vedat Milor’s articles and websites, together with restaurant information from Foursquare. Once we had a clean data set, we started working on it: we examined Vedat Milor’s choices of food, restaurants and countries, as well as his ratings. We also compared these ratings with those of Foursquare users. In the end, Vedat Milor is a gourmet and we respect his recommendations.