Electricity Market Analysis with R

This is an analysis of electricity prices in September 2020, Turkey.

First of all, the data set was imported and looked out.

Energy <- read_excel("C:/Users/Aybike/Desktop/Energy.xlsx")
energy_data <- Energy
glimpse(energy_data)
## Rows: 720
## Columns: 6
## $ Tarih                                 <dttm> 2020-09-01 00:00:00, 2020-09...
## $ `PTF/MCP`                             <dbl> 302.39, 300.25, 292.64, 290.0...
## $ `SMF/SMP`                             <dbl> 332.39, 325.25, 317.64, 320.0...
## $ `Pozitif Dengesizlik Fiyatı (TL/MWh)` <dbl> 293.32, 291.24, 283.86, 281.3...
## $ `Negatif Dengesizlik Fiyatı (TL/MWh)` <dbl> 342.36, 335.01, 327.17, 329.6...
## $ `SMF Yön`                             <chr> "Enerji Açığı", "Enerji Açığı...

Preprocessing

Before the analysis, the preprocessing was done. The subset of the data was selected as Tarih, PTF/MCP, SMF/SMP. Besides, the columns of the subset were renamed.

The DateTime column was separated as Dates, WeekDays and Hours. Lastly, the difference between SMP and MCP was calculated. All of them were assigned to mutated_energy_data.

energy_data %>% select(Tarih, `PTF/MCP`,`SMF/SMP`) 
## # A tibble: 720 x 3
##    Tarih               `PTF/MCP` `SMF/SMP`
##    <dttm>                  <dbl>     <dbl>
##  1 2020-09-01 00:00:00      302.      332.
##  2 2020-09-01 01:00:00      300.      325.
##  3 2020-09-01 02:00:00      293.      318.
##  4 2020-09-01 03:00:00      290       320 
##  5 2020-09-01 04:00:00      290       330 
##  6 2020-09-01 05:00:00      290       339 
##  7 2020-09-01 06:00:00      292.      342.
##  8 2020-09-01 07:00:00      295.      345 
##  9 2020-09-01 08:00:00      307.      339.
## 10 2020-09-01 09:00:00      315.      346.
## # ... with 710 more rows
renamed_energy_data <- energy_data %>% rename(DateTime= Tarih,MCP=`PTF/MCP`, SMP=`SMF/SMP`) %>% select(DateTime,MCP,SMP)
mutated_energy_data <- renamed_energy_data %>% mutate(Dates = as_date(DateTime)) %>% mutate(Hours = hour(DateTime)) %>% mutate(WeekDays = weekdays(DateTime),Diff_SMP_MCP = SMP - MCP)

Data Analysis with R

After preprocessing, mutated_energy_ data was summarised. The outlier was detected but wasn’t taken action to protect historical integrity. The outlier was calculated with the IQR method, box plot.

summary(mutated_energy_data )
##     DateTime                        MCP             SMP        
##  Min.   :2020-09-01 00:00:00   Min.   :198.4   Min.   : 129.0  
##  1st Qu.:2020-09-08 11:45:00   1st Qu.:292.7   1st Qu.: 275.0  
##  Median :2020-09-15 23:30:00   Median :305.1   Median : 320.0  
##  Mean   :2020-09-15 23:30:00   Mean   :308.2   Mean   : 323.4  
##  3rd Qu.:2020-09-23 11:15:00   3rd Qu.:314.9   3rd Qu.: 351.3  
##  Max.   :2020-09-30 23:00:00   Max.   :982.0   Max.   :2000.0  
##      Dates                Hours         WeekDays          Diff_SMP_MCP    
##  Min.   :2020-09-01   Min.   : 0.00   Length:720         Min.   :-180.99  
##  1st Qu.:2020-09-08   1st Qu.: 5.75   Class :character   1st Qu.:   0.00  
##  Median :2020-09-15   Median :11.50   Mode  :character   Median :  18.25  
##  Mean   :2020-09-15   Mean   :11.50                      Mean   :  15.21  
##  3rd Qu.:2020-09-23   3rd Qu.:17.25                      3rd Qu.:  40.00  
##  Max.   :2020-09-30   Max.   :23.00                      Max.   :1570.00
boxplot(mutated_energy_data$SMP, mutated_energy_data$MCP, col=c("#d3501d","#008080"),names =c("SMP","MCP"))

Pearson Correlation was calculated. The correlation between the two variables MCP and SMP was 0.684. Besides, the outlier (MCP=430, SMP=2000) was seen again on the left bottom graph.

cor(mutated_energy_data$MCP,mutated_energy_data$SMP)
## [1] 0.6839136
ggpairs(mutated_energy_data %>% select(MCP,SMP))

Bar charts of the weekdays were created according to median MCP, median SMP and median of the difference between SMP and MPC. I preferred to use the median because of the outlier.

bar_df_weekdays <- mutated_energy_data %>% group_by(WeekDays) %>% summarise(Median_MCP = median(MCP), Median_SMP= median(SMP), Median_Diff_SMP_MCP = median(Diff_SMP_MCP))
bar_df_weekdays$WeekDays <- factor(bar_df_weekdays$WeekDays,levels = c("Pazartesi", "Salı", "Çarşamba", "Perşembe","Cuma","Cumartesi", "Pazar"))  
SMP_bar <- ggplot(bar_df_weekdays, aes(x=WeekDays, y=Median_SMP)) +
  geom_bar(stat="identity", position="dodge",fill="#d3501d", colour="black") +
  labs(x = "WeekDays",
       y = "Median_SMP")
MCP_bar <- ggplot(bar_df_weekdays, aes(x=WeekDays, y=Median_MCP)) + 
  geom_bar(stat="identity", position="dodge",fill="#008080", colour="black") +
  labs(x = "WeekDays",
       y = "Median_MCP")
Diff_bar <- ggplot(bar_df_weekdays, aes(x=WeekDays, y=Median_Diff_SMP_MCP)) +
  geom_bar(stat="identity", position="dodge",fill="#f5f5dc", colour="black") +
  labs(x = "WeekDays",
       y = "Median_Diff_SMP_MCP")
grid.arrange(SMP_bar,MCP_bar,Diff_bar)

  • On Thursday, SMP had the relatively highest median value. Also, on weekends, SMP had the relatively lowest median value. I thought that it could be because Cumartesi and Pazar are holidays. The firms don’t work at full capacity on those days because of the holiday.

  • Only on Saturday, the difference between SMP and MCP was negative. It means that MCP was bigger than SMP on that day.

  • Besides, I thought that lockdown on weekends due to Covid may also have affected the data set.

Bar charts of the hours were created according to median MCP, median SMP and the median of the difference between SMP and MPC. I preferred to use the median because of the outlier.

bar_df_hours <- mutated_energy_data %>% group_by(Hours) %>% summarise(Median_MCP = median(MCP), Median_SMP= median(SMP), Median_Diff_SMP_MCP = median(Diff_SMP_MCP))
bar_df_hours$Hours <- factor(bar_df_hours$Hours,levels = c(0:23))  
SMP_bar_hour <- ggplot(bar_df_hours, aes(x=Hours, y=Median_SMP)) +
  geom_bar(stat="identity", position="dodge",fill="#d3501d", colour="black") +
  labs(x = "Hours",
       y = "Median_SMP")
MCP_bar_hour <- ggplot(bar_df_hours, aes(x=Hours, y=Median_MCP)) +
geom_bar(stat="identity", position="dodge",fill="#008080", colour="black") +
  labs(x = "Hours",
       y = "Median_MCP")
Diff_bar_hour <- ggplot(bar_df_hours, aes(x=Hours, y=Median_Diff_SMP_MCP)) +
  geom_bar(stat="identity", position="dodge",fill="#f5f5dc", colour="black") +
  labs(x = "Hours",
       y = "Median_Diff_SMP_MPC")
grid.arrange(SMP_bar_hour,MCP_bar_hour,Diff_bar_hour)

  • On the night time, the median of the SMP and MCP has lower values than in the day time. I thought that it could be because most people sleep on those hours and they don’t consume electricity individually. Also, night shifts don’t as crowded as other shifts in most of the firms.

  • Only at 7 o’clock, the difference between SMP and MCP was negative. It means that MCP was bigger than SMP on that day. I thought that it could be because of the morning shift change time.

Reference

Data Source

You may click here to reach other items of my progress journal.