The aim of this document is to analyze Istanbul Property Dataset by using dplyr
and ggplot2
mainly. You may find the libraries that used during the analysis in the below code chunk. You can use the button in the right upper corner of this document to show all code chunks or simply just press the code button of every code chunk that you want to see.
library(dplyr)
library(tidyverse)
library(ggplot2)
library(readxl)
library(lubridate)
library(reshape2)
There were some explanations at the end of our Excel file so n_max
parameter was used to limit the number of rows to be read. Tarih
column converted to Date
format by using paste0
and as.Date
functions. All column names changed to make them explanatory. Also we can see that there are some missing values which ggplot
will help us to ignore them.
data<-read_xlsx(file.choose(),n_max = 130)
data$Tarih<-paste0(data$Tarih,as.character("-01"))
data$Tarih<-as.Date(data$Tarih, format="%Y-%m-%d")
data<-rename(data,Tarih=Tarih,
Toplam_Konut=`TP AKONUTSAT1 T40`,
Ipotek= `TP AKONUTSAT2 T40`,
Ilk_El=`TP AKONUTSAT3 T40`,
Ikinci_El=`TP AKONUTSAT4 T40`,
Yabanci=`TP DISKONSAT ISTANBUL`,
Yeni_Endeks=`TP HEDONIKYKFE IST`,
TR10=`TP HKFE02`,
Birim_Fiyat=`TP TCBF02 ISTANBUL`)
head(data)
## # A tibble: 6 x 9
## Tarih Toplam_Konut Ipotek Ilk_El Ikinci_El Yabanci Yeni_Endeks TR10
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2010-01-01 NA NA NA NA NA 35.9 36
## 2 2010-02-01 NA NA NA NA NA 36.6 36.2
## 3 2010-03-01 NA NA NA NA NA 37.4 36.5
## 4 2010-04-01 NA NA NA NA NA 38 36.9
## 5 2010-05-01 NA NA NA NA NA 38 37.1
## 6 2010-06-01 NA NA NA NA NA 37.6 37
## # ... with 1 more variable: Birim_Fiyat <dbl>
sapply(data, function(x) sum(is.na(x)))
## Tarih Toplam_Konut Ipotek Ilk_El Ikinci_El Yabanci
## 0 36 36 36 36 36
## Yeni_Endeks TR10 Birim_Fiyat
## 1 1 1
Below you can see the distribution of each variable. We can see that most of our variables skewed and there is no normally distributed variable. There are some outliers on the high-end of Yabanci_Satis
and it may be the impact of confering citizenship to the foreign people in return of buying property/land.
data %>%
select(Toplam_Konut:Birim_Fiyat) %>%
gather() %>%
ggplot(aes(x=value)) +
geom_histogram(mapping = aes(y=..density..,fill=key), color="darkgray") +
facet_wrap(~ key, scales = "free") +
theme_minimal() +
theme(legend.position = 'none',axis.text.x = element_text(angle=45,hjust=1,vjust=1))+
scale_y_continuous(labels=function(x) format(x, big.mark = ".", decimal.mark = ",", scientific = FALSE))
We can see that Toplam_Konut
is the sum of Ikinci_El
and Ilk_El
. Since Ikinci_El
increased after 2017, we can say that the decrease in Toplam_Konut
after 2018 was affected by Ilk_El
. Also, We can see that Birim_Fiyat
, Yeni_Endeks
and TR10
index increases year by year in a correlated manner.
yearly_data<-data%>%
group_by(Year=year(Tarih)) %>%
summarise_at(vars(Toplam_Konut:Birim_Fiyat), sum, na.rm = FALSE)
plot1<-melt(yearly_data, id.vars="Year")%>%
ggplot(aes(Year,value)) +
geom_point() +
stat_smooth(method="loess") +
facet_wrap(~variable,scales = "free") +theme_minimal()
plot1+
theme(legend.position="top",axis.text.x = element_text(angle=60,hjust=1,vjust=1))+
scale_y_continuous(labels=function(x) format(x, big.mark = ".", decimal.mark = ",", scientific = FALSE))+
scale_x_continuous(breaks = c(2010:2020))