After downloading the dataset from the official EPİAŞ website, I started by getting to know its columns and overall structure.
The preparation went as follows. Because of a language (encoding) detection problem, some column names and values in my table contained characters that were not displayed correctly, so I changed the locale with the Sys.setlocale() command.
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
Electricity_Data <- read.csv("C:/Users/cbilg/Documents/R/BDA503/Electricity_Assignment.csv")
Sys.setlocale(locale="Turkish_Turkey.1254")
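The locale name used above is a Windows-style name, so the call may not succeed on other systems. A minimal sketch of a more defensive version follows; it relies on Sys.setlocale() returning an empty string (with a warning) when the request cannot be honoured, and the fallback message is only an illustration.

```r
# Remember the current setting so it can be reported if the change fails
old_locale <- Sys.getlocale("LC_CTYPE")

# Try the Windows-style Turkish locale name used in this report.
# On Linux or macOS the name would typically differ (for example "tr_TR.UTF-8").
res <- suppressWarnings(Sys.setlocale("LC_CTYPE", "Turkish_Turkey.1254"))

if (identical(res, "")) {
  # The request could not be honoured; the previous locale stays active
  message("Turkish locale not available, keeping: ", old_locale)
} else {
  message("Locale set to: ", res)
}
```

Setting the locale before calling read.csv() is probably the safer order, since the locale can influence how the file's text is interpreted when it is read.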
To see the main elements of the data, I used the glimpse() command. The result showed that the price columns were stored as character strings instead of numeric values, and that the timestamp was not in a proper date-time format. So I started adjusting the table to make it easier to work with.
glimpse(Electricity_Data)
## Rows: 720
## Columns: 6
## $ Tarih <chr> "01.09.20 00:00", "01.09.20 01:...
## $ PTF <chr> "302,39", "300,25", "292,64", "...
## $ SMF <chr> "332,39", "325,25", "317,64", "...
## $ Pozitif.Dengesizlik.Fiyatı..TL.MWh. <chr> "293,32", "291,24", "283,86", "...
## $ Negatif.Dengesizlik.Fiyatı..TL.MWh. <chr> "342,36", "335,01", "327,17", "...
## $ SMF.Yön <chr> "? Enerji Açığı", "? Enerji Açı...
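While keeping the original table, I created a new table named Electricity_Data_Adjusted with the changes I wanted to make.
I renamed two of the columns to make them easier to read and use in code, converted Tarih to a date-time, and converted the price columns from comma-decimal strings to numbers.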
# Shorten the names of the two imbalance-price columns
Electricity_Data_Adjusted <-
  Electricity_Data %>% rename(poz_deng_fiyat = Pozitif.Dengesizlik.Fiyatı..TL.MWh.) %>%
  rename(neg_den_fiyat = Negatif.Dengesizlik.Fiyatı..TL.MWh.)
# Parse the timestamp ("dd.mm.yy HH:MM") as a date-time
Electricity_Data_Adjusted$Tarih <- gsub(pattern = "\\.", "-", Electricity_Data_Adjusted$Tarih)
Electricity_Data_Adjusted$Tarih <- as.POSIXct(Electricity_Data_Adjusted$Tarih, format = "%d-%m-%y %H:%M")
# Replace the decimal comma with a point, then convert each price column to numeric
Electricity_Data_Adjusted$PTF <- gsub(pattern = ",", ".", Electricity_Data_Adjusted$PTF)
Electricity_Data_Adjusted$PTF <- as.double(Electricity_Data_Adjusted$PTF)
Electricity_Data_Adjusted$SMF <- gsub(pattern = ",", ".", Electricity_Data_Adjusted$SMF)
Electricity_Data_Adjusted$SMF <- as.double(Electricity_Data_Adjusted$SMF)
Electricity_Data_Adjusted$poz_deng_fiyat <- gsub(pattern = ",", ".", Electricity_Data_Adjusted$poz_deng_fiyat)
Electricity_Data_Adjusted$poz_deng_fiyat <- as.double(Electricity_Data_Adjusted$poz_deng_fiyat)
Electricity_Data_Adjusted$neg_den_fiyat <- gsub(pattern = ",", ".", Electricity_Data_Adjusted$neg_den_fiyat)
Electricity_Data_Adjusted$neg_den_fiyat <- as.double(Electricity_Data_Adjusted$neg_den_fiyat)
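The same clean-up can probably be written more compactly. The following is a minimal sketch, not the code used for this report: it assumes dplyr 1.0 or later (for across()), uses a two-row data frame built from the first values shown by glimpse() in place of the real file, and the helper name to_number and the Europe/Istanbul time zone are my own assumptions.

```r
library(dplyr)

# Small stand-in for the raw table (first two rows as printed by glimpse())
raw <- data.frame(
  Tarih = c("01.09.20 00:00", "01.09.20 01:00"),
  PTF   = c("302,39", "300,25"),
  SMF   = c("332,39", "325,25"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn a comma-decimal string such as "302,39" into a number
to_number <- function(x) as.numeric(gsub(",", ".", x, fixed = TRUE))

adjusted <- raw %>%
  mutate(
    # The format string can match the dots directly, so no gsub() is needed.
    # The time zone is an assumption; the original code uses the session default.
    Tarih = as.POSIXct(Tarih, format = "%d.%m.%y %H:%M", tz = "Europe/Istanbul"),
    # Convert all price columns in one step
    across(c(PTF, SMF), to_number)
  )

str(adjusted)
```

Keeping the conversion in one helper means a new price column only needs to be added to the across() selection.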
Finally, the key summary statistics of the adjusted table are as follows:
summary(Electricity_Data_Adjusted)
## Tarih PTF SMF poz_deng_fiyat
## Min. :2020-09-01 00:00:00 Min. :198.4 Min. :129.0 Min. :125.1
## 1st Qu.:2020-09-08 11:45:00 1st Qu.:292.7 1st Qu.:275.0 1st Qu.:258.0
## Median :2020-09-15 23:30:00 Median :305.1 Median :320.0 Median :293.8
## Mean :2020-09-15 23:30:00 Mean :308.2 Mean :321.1 Mean :286.8
## 3rd Qu.:2020-09-23 11:15:00 3rd Qu.:314.9 3rd Qu.:351.2 3rd Qu.:304.8
## Max. :2020-09-30 23:00:00 Max. :982.0 Max. :982.0 Max. :952.5
## NA's :1
## neg_den_fiyat SMF.Yön
## Min. :204.7 Length:720
## 1st Qu.:305.4 Class :character
## Median :330.3 Mode :character
## Mean :341.8
## 3rd Qu.:363.8
## Max. :824.0
## NA's :3
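The NA's entries in the summary indicate that a few values are missing after the conversion. A minimal sketch of how one might count them per column and see their effect on sum() is shown here, using a small toy data frame in which the NA is placed artificially for illustration.

```r
# Toy stand-in for the adjusted table (one SMF value set to NA on purpose)
toy <- data.frame(
  PTF = c(302.39, 300.25, 292.64),
  SMF = c(332.39, NA, 317.64)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# sum() propagates NA unless missing values are dropped explicitly
total_with_na    <- sum(toy$SMF)
total_without_na <- sum(toy$SMF, na.rm = TRUE)
print(total_with_na)
print(total_without_na)
```

This is why the aggregations in this report pass na.rm to sum() and mean() for columns that contain missing values.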
Looking at the table, the energy status of the system is summarized by three categorical values in the SMF.Yön column, so I needed summary values grouped by SMF.Yön. We can see that when the system is in balance (Dengede), the average PTF and the average SMF have the same value.
Electricity_Data_Adjusted %>%
  group_by(SMF.Yön) %>%
  summarize(tot_SMF = sum(SMF, na.rm = TRUE), tot_PTF = sum(PTF), tot_PozDenge = sum(poz_deng_fiyat),
            tot_NegDenge = sum(neg_den_fiyat, na.rm = TRUE), avg_SMF = mean(SMF, na.rm = TRUE), avg_PTF = mean(PTF))
## # A tibble: 3 x 7
## SMF.Yön tot_SMF tot_PTF tot_PozDenge tot_NegDenge avg_SMF avg_PTF
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ? Dengede 10367. 10367. 10056. 10678. 273. 273.
## 2 ? Enerji Açığı 189382. 171371. 166230. 193041. 347. 313.
## 3 ?Enerji Fazlası 31111. 40169. 30178. 41374. 230. 298.
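Equal averages do not by themselves show that PTF and SMF agree in every balanced hour. A minimal sketch of a row-by-row check follows. It runs on made-up toy values, the column name status is a hypothetical stand-in for SMF.Yön, and grepl() is used because the labels in the real data carry a leading symbol that was not decoded correctly.

```r
library(dplyr)

# Toy rows with made-up prices; `status` stands in for the SMF.Yön column
toy <- data.frame(
  status = c("Dengede", "Enerji Açığı", "Dengede", "Enerji Fazlası"),
  PTF    = c(273.0, 310.0, 280.5, 298.0),
  SMF    = c(273.0, 345.0, 280.5, 230.0),
  stringsAsFactors = FALSE
)

# Keep only the balanced hours and test whether PTF equals SMF in each of them
balanced_check <- toy %>%
  filter(grepl("Dengede", status, fixed = TRUE)) %>%
  summarize(n_hours = n(), all_equal = all(PTF == SMF))

print(balanced_check)
```

On the real table the same pipeline would use SMF.Yön in place of status. For prices with more decimals, a tolerance-based comparison may be preferable to exact equality.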
Using the lubridate library, I created a table of daily totals and averages for the dataset.
Electricity_Data_Daily2 <- Electricity_Data_Adjusted %>%
  group_by(date = lubridate::date(Tarih)) %>%
  # SMF and neg_den_fiyat contain NA values, so drop them when aggregating
  summarize(tot_SMF = sum(SMF, na.rm = TRUE), tot_PTF = sum(PTF), tot_PozDenge = sum(poz_deng_fiyat),
            tot_NegDenge = sum(neg_den_fiyat, na.rm = TRUE),
            avg_SMF = mean(SMF, na.rm = TRUE), avg_PTF = mean(PTF))
This graph shows how the daily average SMF changes over the month.
ggplot(Electricity_Data_Daily2, aes(x = date, y = avg_SMF)) +
  geom_line() +
  labs(title = "Daily Avg. SMF Graph", x = "Date", y = "Avg. SMF")
This graph shows how the daily average PTF changes over the month.
ggplot(Electricity_Data_Daily2, aes(x = date, y = avg_PTF)) +
  geom_line() +
  labs(title = "Daily Avg. PTF Graph", x = "Date", y = "Avg. PTF")
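To compare the two daily series directly, they could also be drawn in a single plot. This is a minimal sketch under the assumption that tidyr and ggplot2 are installed. It uses a three-day toy table with made-up values, and the names daily, daily_long, metric and value are illustrative only.

```r
library(tidyr)
library(ggplot2)

# Toy daily table with made-up averages, shaped like Electricity_Data_Daily2
daily <- data.frame(
  date    = as.Date("2020-09-01") + 0:2,
  avg_SMF = c(320, 335, 310),
  avg_PTF = c(305, 308, 300)
)

# Reshape to long format: one row per date and metric
daily_long <- pivot_longer(daily, cols = c(avg_SMF, avg_PTF),
                           names_to = "metric", values_to = "value")

# One line per metric, distinguished by color
p <- ggplot(daily_long, aes(x = date, y = value, color = metric)) +
  geom_line() +
  labs(title = "Daily Avg. SMF vs. PTF", x = "Date", y = "Avg. price", color = "Metric")

print(p)
```

The long format lets ggplot2 build the legend automatically instead of adding one geom_line() per column.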
Next, I computed the daily values grouped by SMF.Yön and date.
Electricity_Data_Daily <- Electricity_Data_Adjusted %>%
  group_by(date = lubridate::date(Tarih), SMF.Yön) %>%
  # SMF and neg_den_fiyat contain NA values, so drop them when aggregating
  summarize(tot_SMF = sum(SMF, na.rm = TRUE), tot_PTF = sum(PTF), tot_PozDenge = sum(poz_deng_fiyat),
            tot_NegDenge = sum(neg_den_fiyat, na.rm = TRUE),
            avg_SMF = mean(SMF, na.rm = TRUE), avg_PTF = mean(PTF))
Bar plot of energy status on a daily basis, showing on how many days each status occurred in September.
# geom_bar() counts rows per category; geom_histogram(stat = "count") works too but raises a warning
ggplot(Electricity_Data_Daily, aes(x = SMF.Yön, fill = SMF.Yön)) +
  geom_bar() +
  labs(title = "Electricity Status Histogram", x = "Status", y = "Total Count")
Examining the dataset on an hourly basis:
Electricity_Data_Hourly <- Electricity_Data_Adjusted %>%
  group_by(hour = lubridate::hour(Tarih)) %>%
  # SMF and neg_den_fiyat contain NA values, so drop them when aggregating
  summarize(tot_SMF = sum(SMF, na.rm = TRUE), tot_PTF = sum(PTF), tot_PozDenge = sum(poz_deng_fiyat),
            tot_NegDenge = sum(neg_den_fiyat, na.rm = TRUE),
            avg_SMF = mean(SMF, na.rm = TRUE), avg_PTF = mean(PTF))
This graph shows how the average SMF varies by hour of the day.
ggplot(Electricity_Data_Hourly, aes(x = hour, y = avg_SMF)) +
  geom_line() +
  labs(title = "Hourly Avg. SMF Graph", x = "Hour", y = "Avg. SMF")
This graph shows how the average PTF varies by hour of the day.
ggplot(Electricity_Data_Hourly, aes(x = hour, y = avg_PTF)) +
  geom_line() +
  labs(title = "Hourly Avg. PTF Graph", x = "Hour", y = "Avg. PTF")
This scatter plot compares the hourly average SMF with the hourly average PTF:
ggplot(Electricity_Data_Hourly, aes(x = avg_SMF, y = avg_PTF)) +
  geom_point() +
  labs(title = "SMF vs. PTF Scatter", x = "Avg. Hourly SMF", y = "Avg. Hourly PTF")
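The relationship seen in the scatter plot could also be summarized with a single number. This is a minimal sketch using base R's cor() on a toy hourly table with made-up values. The names hourly and r are illustrative, and on the real data Electricity_Data_Hourly would take the place of the toy table.

```r
# Toy hourly table with made-up averages, shaped like Electricity_Data_Hourly
hourly <- data.frame(
  hour    = 0:5,
  avg_SMF = c(310, 295, 280, 275, 290, 330),
  avg_PTF = c(300, 292, 285, 283, 289, 312)
)

# Pearson correlation between the two hourly averages;
# use = "complete.obs" skips any hour where either value is missing
r <- cor(hourly$avg_SMF, hourly$avg_PTF, use = "complete.obs")
print(r)
```

A value close to 1 would suggest that hours with a high average SMF also tend to have a high average PTF, although with only 24 hourly points the estimate should be read with caution.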