First Look at Dataset

After downloading the dataset from the official EPİAŞ website, I started by understanding its columns and overall structure.

Preparation was as follows. Because of a locale (language/encoding) problem, some column names contained garbled characters instead of Turkish letters. I fixed this by switching R's locale to Turkish with the Sys.setlocale() command.

library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)

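# Switch to the Turkish locale first ("Turkish_Turkey.1254" is a Windows locale name),
# so the file is read with the Turkish code page
Sys.setlocale(locale="Turkish_Turkey.1254")
Electricity_Data <- read.csv("C:/Users/cbilg/Documents/R/BDA503/Electricity_Assignment.csv")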

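To see the main structure of the data, I used the glimpse() command. It showed that the price columns had been read as character strings rather than numbers, because they use a comma as the decimal separator. The timestamp column (Tarih) was also stored as text instead of a date-time. So I started adjusting the table to make it easier to work with.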

glimpse(Electricity_Data)
## Rows: 720
## Columns: 6
## $ Tarih                               <chr> "01.09.20 00:00", "01.09.20 01:...
## $ PTF                                 <chr> "302,39", "300,25", "292,64", "...
## $ SMF                                 <chr> "332,39", "325,25", "317,64", "...
## $ Pozitif.Dengesizlik.Fiyatı..TL.MWh. <chr> "293,32", "291,24", "283,86", "...
## $ Negatif.Dengesizlik.Fiyatı..TL.MWh. <chr> "342,36", "335,01", "327,17", "...
## $ SMF.Yön                             <chr> "? Enerji Açığı", "? Enerji Açı...
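As an alternative (not the approach used in this report), the comma decimals could be handled at import time. Since read.csv() splits fields on commas and glimpse() still shows six columns, values such as "302,39" are presumably quoted in the file. Quoted fields are type-converted after reading, so passing dec = "," should turn them into numbers directly. The sketch below uses an inline two-row sample built from the values shown by glimpse(), not the real file:

```r
# Inline sample mimicking the EPİAŞ export; the quoting of comma decimals
# in the real file is an assumption inferred from the glimpse() output
csv_text <- 'Tarih,PTF,SMF
"01.09.20 00:00","302,39","332,39"
"01.09.20 01:00","300,25","325,25"'

# dec = "," tells read.csv() that the comma is the decimal mark,
# so PTF and SMF become numeric while Tarih stays character
sample_data <- read.csv(text = csv_text, dec = ",", stringsAsFactors = FALSE)
str(sample_data)
```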

Creating Adjusted Table

Keeping the original table intact, I created a new table named Electricity_Data_Adjusted that contains the changes I wanted to make.

Column names

I renamed two of the columns to make them easier to read and to use in code.

Electricity_Data_Adjusted <- 
  Electricity_Data %>% rename(poz_deng_fiyat = Pozitif.Dengesizlik.Fiyatı..TL.MWh.) %>% 
  rename(neg_den_fiyat = Negatif.Dengesizlik.Fiyatı..TL.MWh.)

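Date format

The Tarih column stores timestamps as text in the form "01.09.20 00:00" (day.month.year hour:minute), so I converted it to a POSIXct date-time.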

Electricity_Data_Adjusted$Tarih <- gsub(pattern = "\\.","-",Electricity_Data_Adjusted$Tarih)
Electricity_Data_Adjusted$Tarih <- as.POSIXct(Electricity_Data_Adjusted$Tarih,format = "%d-%m-%y %H:%M")
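A possible simplification: lubridate's day-month-year parsers do not care which separator is used, so lubridate::dmy_hm() can read the original strings directly, without the gsub() step. Setting tz = "Europe/Istanbul" is my assumption. Without a tz argument, as.POSIXct() uses the computer's local time zone, while dmy_hm() defaults to UTC.

```r
library(lubridate)

# Raw timestamps as they appear in the Tarih column
raw_tarih <- c("01.09.20 00:00", "01.09.20 01:00")

# dmy_hm() expects day, month, (two-digit) year, hour, minute;
# the time zone is set explicitly to Turkish time (an assumption)
parsed <- dmy_hm(raw_tarih, tz = "Europe/Istanbul")
parsed
```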

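Column types

The four price columns use a comma as the decimal separator, so I replaced each comma with a dot and converted the columns to double.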

Electricity_Data_Adjusted$PTF <- gsub(pattern = ",",".",Electricity_Data_Adjusted$PTF)
Electricity_Data_Adjusted$PTF <- as.double(Electricity_Data_Adjusted$PTF)

Electricity_Data_Adjusted$SMF <- gsub(pattern = ",",".",Electricity_Data_Adjusted$SMF)
Electricity_Data_Adjusted$SMF <- as.double(Electricity_Data_Adjusted$SMF)

Electricity_Data_Adjusted$poz_deng_fiyat <- gsub(pattern = ",",".",Electricity_Data_Adjusted$poz_deng_fiyat)
Electricity_Data_Adjusted$poz_deng_fiyat <- as.double(Electricity_Data_Adjusted$poz_deng_fiyat)

Electricity_Data_Adjusted$neg_den_fiyat <- gsub(pattern = ",",".",Electricity_Data_Adjusted$neg_den_fiyat)
Electricity_Data_Adjusted$neg_den_fiyat <- as.double(Electricity_Data_Adjusted$neg_den_fiyat)
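The four conversions above repeat the same two steps. A more compact variant, sketched here as a hypothetical helper convert_comma_decimals() (my own name, not part of any package) built on dplyr's across() (dplyr 1.0.0 or later):

```r
library(dplyr)

# Hypothetical helper: applies the same gsub() + as.double() steps
# to every listed column in a single mutate() call
convert_comma_decimals <- function(df, cols) {
  df %>%
    mutate(across(all_of(cols), ~ as.double(gsub(",", ".", .x, fixed = TRUE))))
}

# Small sample built from the first rows shown by glimpse()
sample_prices <- tibble(
  PTF = c("302,39", "300,25"),
  SMF = c("332,39", "325,25")
)
converted <- convert_comma_decimals(sample_prices, c("PTF", "SMF"))
converted

# On the real table this would be:
# convert_comma_decimals(Electricity_Data_Adjusted,
#                        c("PTF", "SMF", "poz_deng_fiyat", "neg_den_fiyat"))
```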

Summary of the final table

Finally, the key summary statistics of the adjusted table are as follows:

summary(Electricity_Data_Adjusted)
##      Tarih                          PTF             SMF        poz_deng_fiyat 
##  Min.   :2020-09-01 00:00:00   Min.   :198.4   Min.   :129.0   Min.   :125.1  
##  1st Qu.:2020-09-08 11:45:00   1st Qu.:292.7   1st Qu.:275.0   1st Qu.:258.0  
##  Median :2020-09-15 23:30:00   Median :305.1   Median :320.0   Median :293.8  
##  Mean   :2020-09-15 23:30:00   Mean   :308.2   Mean   :321.1   Mean   :286.8  
##  3rd Qu.:2020-09-23 11:15:00   3rd Qu.:314.9   3rd Qu.:351.2   3rd Qu.:304.8  
##  Max.   :2020-09-30 23:00:00   Max.   :982.0   Max.   :982.0   Max.   :952.5  
##                                                NA's   :1                      
##  neg_den_fiyat     SMF.Yön         
##  Min.   :204.7   Length:720        
##  1st Qu.:305.4   Class :character  
##  Median :330.3   Mode  :character  
##  Mean   :341.8                     
##  3rd Qu.:363.8                     
##  Max.   :824.0                     
##  NA's   :3
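The summary shows that the conversion left one missing value in SMF and three in neg_den_fiyat. One way to inspect those rows is dplyr's if_any() (dplyr 1.0.4 or later). The sketch below uses a toy table with partly made-up values:

```r
library(dplyr)

# Toy table (values partly made up); NA marks a price that failed to convert
toy <- tibble(
  Tarih = c("01.09.20 00:00", "01.09.20 01:00", "01.09.20 02:00"),
  SMF = c(332.39, NA, 317.64),
  neg_den_fiyat = c(342.36, 335.01, NA)
)

# Keep every row where at least one of the listed columns is missing;
# on the real table: Electricity_Data_Adjusted %>% filter(if_any(c(SMF, neg_den_fiyat), is.na))
missing_rows <- toy %>% filter(if_any(c(SMF, neg_den_fiyat), is.na))
missing_rows
```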

Analysis of Dataset

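The energy status of the system is recorded as one of three categorical values in the SMF.Yön column: energy deficit (Enerji Açığı), energy surplus (Enerji Fazlası) and balanced (Dengede). I therefore summarized the prices grouped by SMF.Yön. When the system is balanced, the average PTF (market clearing price) and the average SMF (system marginal price) are identical (273). In deficit hours the average SMF is above the average PTF (347 vs. 313), and in surplus hours it is below it (230 vs. 298).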

Electricity_Data_Adjusted %>%
  group_by(SMF.Yön) %>%
  summarize(tot_SMF = sum(SMF, na.rm=T), tot_PTF = sum(PTF), tot_PozDenge = sum(poz_deng_fiyat), 
            tot_NegDenge = sum(neg_den_fiyat, na.rm=T), avg_SMF =mean(SMF, na.rm=T), avg_PTF =mean(PTF))
## # A tibble: 3 x 7
##   SMF.Yön         tot_SMF tot_PTF tot_PozDenge tot_NegDenge avg_SMF avg_PTF
##   <chr>             <dbl>   <dbl>        <dbl>        <dbl>   <dbl>   <dbl>
## 1 ? Dengede        10367.  10367.       10056.       10678.    273.    273.
## 2 ? Enerji Açığı  189382. 171371.      166230.      193041.    347.    313.
## 3 ?Enerji Fazlası  31111.  40169.       30178.       41374.    230.    298.
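The gap between the two prices can also be read directly by averaging SMF - PTF for each status. The sketch below uses made-up toy rows. The status column and its English labels stand in for SMF.Yön and its Turkish values.

```r
library(dplyr)

# Made-up toy rows; "status" stands in for the SMF.Yön column
toy <- tibble(
  status = c("deficit", "deficit", "surplus", "balanced"),
  PTF = c(302.39, 300.25, 310, 280),
  SMF = c(332.39, 325.25, 250, 280)
)

# Average SMF - PTF spread per status; in this toy data it is
# positive for deficit, negative for surplus and zero when balanced
spread <- toy %>%
  group_by(status) %>%
  summarize(avg_spread = mean(SMF - PTF, na.rm = TRUE), .groups = "drop")
spread
```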

Examining the dataset on a daily basis

Using the lubridate package, I created a table of daily totals and averages.

Electricity_Data_Daily2 <- Electricity_Data_Adjusted %>%
  group_by(date = lubridate::date(Tarih)) %>%
  summarize(tot_SMF = sum(SMF, na.rm=T), tot_PTF = sum(PTF), tot_PozDenge = sum(poz_deng_fiyat), tot_NegDenge = sum(neg_den_fiyat, na.rm=T),
            avg_SMF = mean(SMF, na.rm=T), avg_PTF = mean(PTF))
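Because SMF and neg_den_fiyat contain missing values, sum() and mean() return NA for any group that contains one unless na.rm = TRUE is set. A toy illustration of the daily grouping with made-up values:

```r
library(dplyr)
library(lubridate)

# Made-up hourly rows over two days; one price is missing
toy <- tibble(
  Tarih = ymd_hm(c("2020-09-01 00:00", "2020-09-01 01:00",
                   "2020-09-02 00:00", "2020-09-02 01:00")),
  neg_den_fiyat = c(342.36, NA, 300, 310)
)

# Group by calendar day; without na.rm the day with a gap sums to NA
daily <- toy %>%
  group_by(date = lubridate::date(Tarih)) %>%
  summarize(tot_plain = sum(neg_den_fiyat),
            tot_narm = sum(neg_den_fiyat, na.rm = TRUE))
daily
```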

Daily key metrics’ visualization

This graph shows how the daily average SMF changed over September.

ggplot(Electricity_Data_Daily2, aes(x=date, y=avg_SMF))+
  geom_line()+
  labs(title="Daily Avg. SMF Graph",x ="Date", y = "Avg. SMF")  

This graph shows how the daily average PTF changed over September.

ggplot(Electricity_Data_Daily2, aes(x=date, y=avg_PTF))+
  geom_line()+
  labs(title="Daily Avg. PTF Graph",x ="Date", y = "Avg. PTF") 
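To compare the two daily series directly, they can be drawn on the same axes after reshaping the table to long format with tidyr::pivot_longer(). This is a sketch with made-up daily averages standing in for Electricity_Data_Daily2:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up daily averages standing in for Electricity_Data_Daily2
toy_daily <- tibble(
  date = as.Date("2020-09-01") + 0:2,
  avg_SMF = c(320, 335, 310),
  avg_PTF = c(305, 310, 300)
)

# Long format: one row per date and price type, so each price gets its own line
long <- toy_daily %>%
  pivot_longer(c(avg_SMF, avg_PTF), names_to = "price", values_to = "value")

p <- ggplot(long, aes(x = date, y = value, colour = price)) +
  geom_line() +
  labs(title = "Daily Avg. SMF vs. PTF", x = "Date", y = "Avg. price")
p
```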

Analyzing daily values grouped by SMF.Yön

Daily values grouped by SMF.Yönand date.

Electricity_Data_Daily <- Electricity_Data_Adjusted %>%
  group_by(date = lubridate::date(Tarih), SMF.Yön) %>%
  summarize(tot_SMF = sum(SMF, na.rm=T), tot_PTF = sum(PTF), tot_PozDenge = sum(poz_deng_fiyat), tot_NegDenge = sum(neg_den_fiyat, na.rm=T),
            avg_SMF = mean(SMF, na.rm=T), avg_PTF = mean(PTF), .groups = "drop")

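A bar plot of the energy statuses in the daily table shows how often each one occurred in September. Because the daily table has one row per date and status, each bar counts the days on which that status appeared at least once.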

ggplot(Electricity_Data_Daily, aes(x=SMF.Yön, fill=SMF.Yön))+
  geom_bar()+
  labs(title="Electricity Status Bar Plot",x ="Status", y = "Number of Days")
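The bar heights can also be read as numbers with dplyr::count(). On Electricity_Data_Daily, each count is the number of September days on which a status occurred. This is a sketch on made-up rows, with "status" standing in for SMF.Yön:

```r
library(dplyr)

# Made-up stand-in for Electricity_Data_Daily: one row per (date, status)
toy_daily_status <- tibble(
  date = as.Date(c("2020-09-01", "2020-09-01", "2020-09-02")),
  status = c("deficit", "surplus", "deficit")
)

# Number of days on which each status occurred
status_days <- toy_daily_status %>% count(status)
status_days
```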

Examining the dataset on an hourly basis

I then grouped the data by hour of the day (0 to 23), so each hourly figure aggregates the 30 days of September:

Electricity_Data_Hourly <- Electricity_Data_Adjusted %>%
  group_by(hour = lubridate::hour(Tarih)) %>%
  summarize(tot_SMF = sum(SMF, na.rm=T), tot_PTF = sum(PTF), tot_PozDenge = sum(poz_deng_fiyat), tot_NegDenge = sum(neg_den_fiyat, na.rm=T),
            avg_SMF = mean(SMF, na.rm=T), avg_PTF = mean(PTF))
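Since September has 30 days and the table has 720 rows, each hour of the day should aggregate exactly 30 observations. A quick check, sketched on a generated sequence of hourly timestamps rather than the real file (on the real table: Electricity_Data_Adjusted %>% mutate(hour = hour(Tarih)) %>% count(hour)):

```r
library(dplyr)
library(lubridate)

# Every hour of September 2020 in UTC (30 days x 24 hours = 720 timestamps)
stamps <- tibble(Tarih = seq(ymd_hm("2020-09-01 00:00"),
                             ymd_hm("2020-09-30 23:00"), by = "hour"))

# Count observations per hour of day; each hour should appear 30 times
per_hour <- stamps %>%
  mutate(hour = hour(Tarih)) %>%
  count(hour)
per_hour
```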

Hourly key metrics’ visualization

This graph shows how the average SMF changes across the hours of the day.

ggplot(Electricity_Data_Hourly, aes(x=hour, y=avg_SMF))+
  geom_line()+
  labs(title="Hourly Avg. SMF Graph",x ="Hour", y = "Avg. SMF")  

This graph shows how the average PTF changes across the hours of the day.

ggplot(Electricity_Data_Hourly, aes(x=hour, y=avg_PTF))+
  geom_line()+
  labs(title="Hourly Avg. PTF Graph",x ="Hour", y = "Avg. PTF") 

This scatter plot compares the hourly average SMF with the hourly average PTF:

ggplot(Electricity_Data_Hourly, aes(x=avg_SMF, y=avg_PTF))+
  geom_point()+
  labs(title="SMF vs. PTF Scatter",x ="Avg. Hourly SMF", y = "Avg. Hourly PTF")
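To put a number on the relationship shown in the scatter plot, Pearson's correlation coefficient between the two hourly averages can be computed with cor(). On the real table this would be cor(Electricity_Data_Hourly$avg_SMF, Electricity_Data_Hourly$avg_PTF). The sketch below uses made-up values:

```r
# Made-up hourly averages standing in for Electricity_Data_Hourly
avg_SMF <- c(300, 310, 330, 350, 340)
avg_PTF <- c(290, 295, 310, 320, 318)

# Pearson correlation: values close to 1 mean the two averages rise and fall together
r <- cor(avg_SMF, avg_PTF)
r
```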