This is an analysis of electricity prices in September 2020, Turkey.
First of all, the data set was imported and looked out.
Energy <- read_excel("C:/Users/Aybike/Desktop/Energy.xlsx")
energy_data <- Energy
glimpse(energy_data)
## Rows: 720
## Columns: 6
## $ Tarih <dttm> 2020-09-01 00:00:00, 2020-09...
## $ `PTF/MCP` <dbl> 302.39, 300.25, 292.64, 290.0...
## $ `SMF/SMP` <dbl> 332.39, 325.25, 317.64, 320.0...
## $ `Pozitif Dengesizlik Fiyatı (TL/MWh)` <dbl> 293.32, 291.24, 283.86, 281.3...
## $ `Negatif Dengesizlik Fiyatı (TL/MWh)` <dbl> 342.36, 335.01, 327.17, 329.6...
## $ `SMF Yön` <chr> "Enerji Açığı", "Enerji Açığı...
Before the analysis, the preprocessing was done. The subset of the data was selected as Tarih, PTF/MCP, SMF/SMP. Besides, the columns of the subset were renamed.
The DateTime column was separated as Dates, WeekDays and Hours. Lastly, the difference between SMP and MCP was calculated. All of them were assigned to mutated_energy_data.
## # A tibble: 720 x 3
## Tarih `PTF/MCP` `SMF/SMP`
## <dttm> <dbl> <dbl>
## 1 2020-09-01 00:00:00 302. 332.
## 2 2020-09-01 01:00:00 300. 325.
## 3 2020-09-01 02:00:00 293. 318.
## 4 2020-09-01 03:00:00 290 320
## 5 2020-09-01 04:00:00 290 330
## 6 2020-09-01 05:00:00 290 339
## 7 2020-09-01 06:00:00 292. 342.
## 8 2020-09-01 07:00:00 295. 345
## 9 2020-09-01 08:00:00 307. 339.
## 10 2020-09-01 09:00:00 315. 346.
## # ... with 710 more rows
renamed_energy_data <- energy_data %>% rename(DateTime= Tarih,MCP=`PTF/MCP`, SMP=`SMF/SMP`) %>% select(DateTime,MCP,SMP)
mutated_energy_data <- renamed_energy_data %>% mutate(Dates = as_date(DateTime)) %>% mutate(Hours = hour(DateTime)) %>% mutate(WeekDays = weekdays(DateTime),Diff_SMP_MCP = SMP - MCP)
After preprocessing, mutated_energy_ data was summarised. The outlier was detected but wasn’t taken action to protect historical integrity. The outlier was calculated with the IQR method, box plot.
## DateTime MCP SMP
## Min. :2020-09-01 00:00:00 Min. :198.4 Min. : 129.0
## 1st Qu.:2020-09-08 11:45:00 1st Qu.:292.7 1st Qu.: 275.0
## Median :2020-09-15 23:30:00 Median :305.1 Median : 320.0
## Mean :2020-09-15 23:30:00 Mean :308.2 Mean : 323.4
## 3rd Qu.:2020-09-23 11:15:00 3rd Qu.:314.9 3rd Qu.: 351.3
## Max. :2020-09-30 23:00:00 Max. :982.0 Max. :2000.0
## Dates Hours WeekDays Diff_SMP_MCP
## Min. :2020-09-01 Min. : 0.00 Length:720 Min. :-180.99
## 1st Qu.:2020-09-08 1st Qu.: 5.75 Class :character 1st Qu.: 0.00
## Median :2020-09-15 Median :11.50 Mode :character Median : 18.25
## Mean :2020-09-15 Mean :11.50 Mean : 15.21
## 3rd Qu.:2020-09-23 3rd Qu.:17.25 3rd Qu.: 40.00
## Max. :2020-09-30 Max. :23.00 Max. :1570.00
boxplot(mutated_energy_data$SMP, mutated_energy_data$MCP, col=c("#d3501d","#008080"),names =c("SMP","MCP"))
Pearson Correlation was calculated. The correlation between the two variables MCP and SMP was 0.684. Besides, the outlier (MCP=430, SMP=2000) was seen again on the left bottom graph.
## [1] 0.6839136
Bar charts of the weekdays were created according to median MCP, median SMP and median of the difference between SMP and MPC. I preferred to use the median because of the outlier.
bar_df_weekdays <- mutated_energy_data %>% group_by(WeekDays) %>% summarise(Median_MCP = median(MCP), Median_SMP= median(SMP), Median_Diff_SMP_MCP = median(Diff_SMP_MCP))
bar_df_weekdays$WeekDays <- factor(bar_df_weekdays$WeekDays,levels = c("Pazartesi", "Salı", "Çarşamba", "Perşembe","Cuma","Cumartesi", "Pazar"))
SMP_bar <- ggplot(bar_df_weekdays, aes(x=WeekDays, y=Median_SMP)) +
geom_bar(stat="identity", position="dodge",fill="#d3501d", colour="black") +
labs(x = "WeekDays",
y = "Median_SMP")
MCP_bar <- ggplot(bar_df_weekdays, aes(x=WeekDays, y=Median_MCP)) +
geom_bar(stat="identity", position="dodge",fill="#008080", colour="black") +
labs(x = "WeekDays",
y = "Median_MCP")
Diff_bar <- ggplot(bar_df_weekdays, aes(x=WeekDays, y=Median_Diff_SMP_MCP)) +
geom_bar(stat="identity", position="dodge",fill="#f5f5dc", colour="black") +
labs(x = "WeekDays",
y = "Median_Diff_SMP_MCP")
grid.arrange(SMP_bar,MCP_bar,Diff_bar)
On Thursday, SMP had the relatively highest median value. Also, on weekends, SMP had the relatively lowest median value. I thought that it could be because Cumartesi and Pazar are holidays. The firms don’t work at full capacity on those days because of the holiday.
Only on Saturday, the difference between SMP and MCP was negative. It means that MCP was bigger than SMP on that day.
Besides, I thought that lockdown on weekends due to Covid may also have affected the data set.
Bar charts of the hours were created according to median MCP, median SMP and the median of the difference between SMP and MPC. I preferred to use the median because of the outlier.
bar_df_hours <- mutated_energy_data %>% group_by(Hours) %>% summarise(Median_MCP = median(MCP), Median_SMP= median(SMP), Median_Diff_SMP_MCP = median(Diff_SMP_MCP))
bar_df_hours$Hours <- factor(bar_df_hours$Hours,levels = c(0:23))
SMP_bar_hour <- ggplot(bar_df_hours, aes(x=Hours, y=Median_SMP)) +
geom_bar(stat="identity", position="dodge",fill="#d3501d", colour="black") +
labs(x = "Hours",
y = "Median_SMP")
MCP_bar_hour <- ggplot(bar_df_hours, aes(x=Hours, y=Median_MCP)) +
geom_bar(stat="identity", position="dodge",fill="#008080", colour="black") +
labs(x = "Hours",
y = "Median_MCP")
Diff_bar_hour <- ggplot(bar_df_hours, aes(x=Hours, y=Median_Diff_SMP_MCP)) +
geom_bar(stat="identity", position="dodge",fill="#f5f5dc", colour="black") +
labs(x = "Hours",
y = "Median_Diff_SMP_MPC")
grid.arrange(SMP_bar_hour,MCP_bar_hour,Diff_bar_hour)
On the night time, the median of the SMP and MCP has lower values than in the day time. I thought that it could be because most people sleep on those hours and they don’t consume electricity individually. Also, night shifts don’t as crowded as other shifts in most of the firms.
Only at 7 o’clock, the difference between SMP and MCP was negative. It means that MCP was bigger than SMP on that day. I thought that it could be because of the morning shift change time.