Introduction

This report on July 2020 electricity prices is prepared using RMarkdown, dplyr and ggplot2. The data is available at EPIAS Website.

Electricity markets are designed with “balance” in mind. Balance means electricity production should be equal to electricity consumption as much as possible at all times. Every day, consumers declares next day’s hourly forecast on energy demand. Producers try to predict how much energy would be needed hourly as well. ‘MCP’ represents Market Clearing Price and ‘SMP’ represents System Marginal Price. ‘SMP’ is always higher than ‘MCP’ if system has Energy Deficit, and lower if there is Energy Surplus. Click here for more information about Energy Markets.

Overview of Dataset prices

prices %>% glimpse()
## Rows: 744
## Columns: 6
## $ Date                              <fct> 01.07.20 00:00, 01.07.20 01:00, 0...
## $ MCP.                              <dbl> 323.85, 326.95, 324.31, 322.11, 3...
## $ SMP.                              <dbl> 211.00, 201.00, 211.00, 211.00, 2...
## $ Positive.Imbalance.Price..TL.MWh. <dbl> 204.67, 194.97, 204.67, 204.67, 1...
## $ Negative.Imbalance.Price..TL.MWh. <dbl> 333.57, 336.76, 334.04, 331.77, 3...
## $ SMP.Direction                     <fct> ?Energy Surplus, ?Energy Surplus,...

Data Manipulation

Some rearrangements are required since ‘Date’ is type of <factor>. Only ‘MCP’ and ‘SMP’ columns has been selected and ‘Date’ column has been changed to type of <datetime> as follows. Keep in mind that the data is hourly.

prices <- prices %>% 
  select(Date, MCP., SMP.) %>% 
  mutate(Date = as.POSIXct(Date, format="%d.%m.%y %H:%M"))
prices %>% glimpse()
## Rows: 744
## Columns: 3
## $ Date <dttm> 2020-07-01 00:00:00, 2020-07-01 01:00:00, 2020-07-01 02:00:00...
## $ MCP. <dbl> 323.85, 326.95, 324.31, 322.11, 320.00, 286.21, 210.13, 318.20...
## $ SMP. <dbl> 211.00, 201.00, 211.00, 211.00, 201.00, 181.00, 113.75, 201.00...

Some Analysis

Comparison of MCP vs. SMP between Times of Day

Let’s create a scatterplot of MCP versus SMP prices in July, 2020 and divide hours into 4 times of day which are ‘Early Morning’, ‘Morning’, ‘Afternoon’ and ‘Evening’. In addition, the graphic has been made more readable by showing the parts of the day in different colors.

As can be clearly seen from the graph below, while both prices are generally low in the early of the day, they are increasing considerably towards the end of the day.

plot_comparison <- prices %>% 
  mutate(times_of_day = case_when(
    lubridate::hour(Date) >= 1 & lubridate::hour(Date) <= 6 ~ "01:00-06:00 Early Morning",
    lubridate::hour(Date) >= 7 & lubridate::hour(Date) <= 12 ~ "07:00-12:00 Morning",
    lubridate::hour(Date) >= 13 & lubridate::hour(Date) <= 18 ~ "13:00-18:00 Afternoon",
    lubridate::hour(Date) >= 19 & lubridate::hour(Date) <= 23  ~ "19:00-00:00 Evening",
    lubridate::hour(Date) == 0 ~ "19:00-00:00 Evening")) %>%
  select(times_of_day, MCP., SMP.)

ggplot(plot_comparison) + 
  geom_point(aes(x = MCP., y = SMP., color=times_of_day)) + 
  xlim(100,400) + 
  ylim(0,500) + 
  theme_minimal() + 
  labs(x="MCP", y="SMP", color="Times of Day", title = "Comparison: MCP vs SMP", subtitle = "Price comparison in July 2020, times/parts of day") + 
  theme(legend.position="right",axis.text.x = element_text(angle=45,hjust=1,vjust=1))

Proportion of Energy Direction (Surplus/Deficit)

In cases where MCP is greater than SMP, there is an surpluss of energy, ie. the estimated amount of energy is higher than the actual required amount of energy.

In cases where SMP is greater than MCP, there is an deficit of energy, ie. the estimated amount of energy is lower than the actual required amount of energy.

Considering July 2020, there is an energy surplus in 25% cases, while there is energy deficit in 75% cases. A higher energy deficit is obviously not desirable. In such cases, generating the necessary extra energy is challenging for the producers and also costly to the consumers.

plot_energy_direction <- prices %>% 
  summarise(Surplus = sum(MCP.>SMP.)/nrow(prices), Deficit = sum(MCP.<=SMP.)/nrow(prices)) %>% 
  pivot_longer(cols = c(Surplus, Deficit), names_to = "Energy_Direction")

ggplot(plot_energy_direction, aes(x = "", y = value, fill = Energy_Direction)) + 
  geom_col() +
  coord_polar(theta = "y") +
  geom_text(aes(label = scales::percent(round(value,2))), position = position_stack(vjust = 0.5)) + 
  theme_void() + 
  theme(plot.title = element_text(vjust = 0.5)) + 
  ggtitle("Pie Chart of Energy Direction (Surplus/Deficit) of July,2020", subtitle = "Data Source: EPIAS")

Energy Direction (Surplus/Deficit) over July,2020

Now, we all know that the energy deficit proportion is 75%. We can use the bar chart below in order to see easily when these shortages happened during July 2020. We see that there is a deficit of energy at the beginning and end of the month. Also, we do not see any energy deficit between July 7 - 9. However, we can easily say that there is a shortage in other times.

plot_MCP_SMP_hourly <- prices %>% 
  mutate(EnergyDirection = ifelse(MCP.>SMP.,"Surplus","Deficit")) %>%
  mutate(maxvalue = case_when(
    MCP.>SMP. ~ MCP.,
    MCP.<=SMP. ~ SMP.)) %>% 
  pivot_longer(EnergyDirection) 

ggplot(plot_MCP_SMP_hourly, aes(x=Date, y=maxvalue, fill=value)) + 
  geom_bar(stat="identity") +
  theme_test() +
  labs(title = "Energy Direction (Surplus/Deficit) over July,2020", subtitle = "Data Source: EPIAS", y="Price (TL/MWh)", value="Energy Direction")

Daily Average ‘MCP’ and ‘SMP’ Prices

First, the data is grouped by day in order to find daily average prices. After reaching necessary information, we can draw the change of daily average prices as line graph.

Looking at the line chart below, the both prices are generally in equilibrium, with a movement in one affecting the other. However, we can say that MCP is more stable or its standard deviation is lower and SMP is more fragile. For both prices, they had a higher averages at the beginning of the month, while they fell slightly at the end of the month. They also appear to be in an uptrend, albeit small, which can be seen in more detail in the next analysis.

plot_daily_average <- prices %>% 
  group_by(day = lubridate::day(Date)) %>% 
  summarise(mean_MCP = mean(MCP.), mean_SMP = mean(SMP.)) %>% 
  ungroup() %>% 
  select(day, mean_MCP, mean_SMP) %>%
  pivot_longer(cols=c(mean_MCP, mean_SMP), names_to="PriceType")

ggplot(plot_daily_average, aes(x=day, y=value, color=PriceType)) + 
  geom_line() + 
  theme_test() + 
  labs(x="Days of July,2020", y="Price (TL/MWh)", title = "Daily Average Price Comparison: MCP vs SMP", subtitle = "Data Source: EPIAS") + 
  theme(legend.position="right")

Average Hourly Percentage Change (The Trend)

The average hourly percentage change of MCP and SMP in July,2020 can be found as follows. (e.g. Suppose hourly price is 300 in hour 1 and 320 in hour 2, then the change should be (320-300)/300 = 0.07)

As mentioned before, it seems there are slight upward trends.

prices %>%
  mutate(lag_MCP = lag(MCP.,1), lag_SMP = lag(SMP.,1)) %>%
  mutate(perc_change_MCP = (MCP. - lag_MCP)/lag_MCP, perc_change_SMP = (SMP. - lag_SMP)/lag_SMP) %>%
  summarise(average_change_MCP = mean(perc_change_MCP, na.rm = TRUE), average_change_SMP = mean(perc_change_SMP, na.rm = TRUE)) %>%
  select(average_change_MCP, average_change_SMP)
##   average_change_MCP average_change_SMP
## 1        0.005842365         0.02357685

Boxplot of Prices

Boxplot is very effective for gaining insight into the distributions of multiple data.

Looking at the Boxplot below, the distribution of prices cannot be said to be similar. The standard deviation of MCP is quite low while SMP is high. On the other hand, MCP has many outliers. The median of SMP is above the MCP. While SMP was reached higher and lower prices, MCP advanced more stable.

boxplot <- prices %>%
  pivot_longer(cols = c(MCP.,SMP.), names_to = "PriceType")

ggplot(boxplot, aes(x=PriceType, y=value, fill=PriceType)) +
  geom_boxplot() +
  theme_test() +
  labs(title = "Boxplot of MCP and SMP", subtitle = "Data Source: EPIAS", y = "Price (TL/MWh)")

Average Prices of Weekdays

Analyzing the days of the week will be important in understanding the data more accurately. Visualizing this with the bar chart will help us. We can also see average prices in the charts.

Looking at the bar chart below, we can say that SMP watched the lowest price on Wednesday while watching a decrease in general on weekdays. MCP, on the other hand, is more stable on weekdays and is around 300 TL / MWh. At the weekend, both prices followed different patterns in contrast to the weekdays. SMP is considerably higher than MCP on the weekends.

plot_weekday <- prices %>%
  mutate(weekday = wday(Date, label = TRUE, week_start = 1)) %>%
  group_by(weekday) %>%
  summarise(mean_MCP = mean(MCP.), mean_SMP = mean(SMP.)) %>%
  ungroup() %>%
  pivot_longer(cols=c(mean_MCP, mean_SMP), names_to="PriceType")

ggplot(plot_weekday, aes(x=weekday, y=value, fill=PriceType)) +
            theme_minimal() +
            geom_bar(stat="identity", position="dodge") +
            labs(title = "Avg. prices of weekday in July,2020", subtitle = "Data Source: EPIAS", x="Weekday", y="Price (TL/MWh)") +
            geom_text(aes(label = format(value,digits=3)), size=4, position = position_dodge(0.9),vjust = 10) +
            theme(axis.text.x = element_text(angle = 45), legend.position = "right")

Overview of Daily ‘MCP’ (min,mean,max,range)

First, the data is grouped by day in order to find daily min, average and max prices. We can here also calculate the daily range (max-min). After reaching necessary information, we can draw the all changes of daily prices as line graph.

plot_MCP <- prices %>% 
  group_by(day = lubridate::day(Date)) %>% 
  summarise(min_MCP = min(MCP.), mean_MCP = mean(MCP.), max_MCP = max(MCP.)) %>% 
  ungroup() %>% 
  mutate(range_MCP = max_MCP - min_MCP) %>% 
  select(day, min_MCP, mean_MCP, max_MCP, range_MCP) 

plot_MCP
## # A tibble: 31 x 5
##      day min_MCP mean_MCP max_MCP range_MCP
##    <int>   <dbl>    <dbl>   <dbl>     <dbl>
##  1     1    210.     320.    332.     122. 
##  2     2    199.     298.    322.     122. 
##  3     3    293      317.    331.      37.6
##  4     4    238.     306.    314.      76.2
##  5     5    174.     258.    306.     132. 
##  6     6    165.     290.    327.     162. 
##  7     7    222.     307.    350.     128. 
##  8     8    200.     304.    321.     121. 
##  9     9    197.     288.    317.     120. 
## 10    10    199.     282.    306.     107. 
## 11    11    192.     273.    306.     114. 
## 12    12    197.     261.    306.     110. 
## 13    13    140.     289.    311.     171. 
## 14    14    194.     281.    310.     116. 
## 15    15    197.     292.    316.     119. 
## 16    16    222.     306.    327.     105. 
## 17    17    199.     303.    320.     122. 
## 18    18    195.     296.    316.     122. 
## 19    19    195.     292.    320.     125. 
## 20    20    250.     313.    335.      85.0
## 21    21    198.     297.    316.     118. 
## 22    22    202.     304.    322.     121. 
## 23    23    222.     304.    325.     103. 
## 24    24    245.     315.    350.     105. 
## 25    25    193.     292.    313.     119. 
## 26    26    200.     290.    319.     119. 
## 27    27    229.     316.    350      121. 
## 28    28    233.     310.    325.      91.6
## 29    29    233.     314.    350.     117. 
## 30    30    211.     288.    312.     101. 
## 31    31    196.     280.    326.     130.

Looking at the line chart below, we can say that the average price is closer to the maximum price compared to the minimum price. The highest of minimum price is on July 3. The lowest price observed was observed on July 13. The average and maximum price fluctuates less. Average MCP price varied around 300 TL / MWh during July, 2020. The highest price was observed in a few days. The specific date-time of the minimum and maximum observed has been found in the next analysis.

plot_MCP %>% 
  select(day, min_MCP, mean_MCP, max_MCP) %>% 
  pivot_longer(.,-day) %>% 
  ggplot(.,aes(x=day, y=value, color=name)) + 
  geom_line() + 
  theme_test() + 
  labs(x="Days of July,2020", y="Price (TL/MWh)", color="Labels", title = "Overview of 'MCP'", subtitle = "Data Source: EPIAS") + 
  theme(legend.position="right")

Which datetime/s give the min or max ‘MCP’?

We can also investigate the datetimes which gives the min or max ‘MCP’ by following codes.

For min ‘MCP’:

prices %>%
  mutate(the_rank = rank(MCP., ties.method = "random")) %>%
  # There is only one datetime gives the min 'MCP', so the_rank == 1.
  filter(the_rank == 1) %>%
  select(Date, MCP.)
##                  Date   MCP.
## 1 2020-07-13 06:00:00 140.01

For max ‘MCP’:

prices %>%
  mutate(the_rank = rank(-MCP., ties.method = "random")) %>%
  # There are three datetimes give the max 'MCP', so the_rank < 3.
  filter(the_rank <= 3) %>%
  select(Date, MCP.)
##                  Date MCP.
## 1 2020-07-27 14:00:00  350
## 2 2020-07-27 15:00:00  350
## 3 2020-07-27 16:00:00  350

Overview of Daily ‘SMP’ (min,mean,max,range)

The data is grouped by day in order to find daily min, average and max prices. We can here also calculate the daily range (max-min). After reaching necessary information, we can draw the all changes of daily prices as line graph.

plot_SMP <- prices %>% 
  group_by(day = lubridate::day(Date)) %>% 
  summarise(min_SMP = min(SMP.), mean_SMP = mean(SMP.), max_SMP = max(SMP.)) %>% 
  ungroup() %>% 
  mutate(range_SMP = max_SMP - min_SMP) %>% 
  select(day, min_SMP, mean_SMP, max_SMP, range_SMP) 

plot_SMP
## # A tibble: 31 x 5
##      day min_SMP mean_SMP max_SMP range_SMP
##    <int>   <dbl>    <dbl>   <dbl>     <dbl>
##  1     1    114.     268.    377.     264. 
##  2     2    229.     357.    404.     175. 
##  3     3    335.     373.    460      125. 
##  4     4    235      321.    365      130  
##  5     5    120      198.    266.     146. 
##  6     6    156.     307.    367.     211. 
##  7     7    179      298.    353.     174. 
##  8     8    151.     191.    235.      84.1
##  9     9    124.     189.    207.      83.7
## 10    10    177.     292.    344.     167. 
## 11    11    232      309.    346      114  
## 12    12    200.     283.    346.     147. 
## 13    13    140.     272.    340.     199. 
## 14    14    200.     302.    369.     169. 
## 15    15    232.     337.    386.     154. 
## 16    16    209.     312.    362.     153. 
## 17    17    199.     263.    320.     122. 
## 18    18    205.     308.    353.     149. 
## 19    19    222.     325.    360      138. 
## 20    20    203.     335.    385      182. 
## 21    21    208.     323.    361      153. 
## 22    22    201.     286.    350.     149. 
## 23    23    222.     331.    365.     143. 
## 24    24    245.     346.    389.     144. 
## 25    25    203.     305.    333.     129. 
## 26    26    266.     350.    419.     153. 
## 27    27    244.     353.    435      191. 
## 28    28    212.     335.    374.     161. 
## 29    29    209.     296.    365      156. 
## 30    30    246.     315.    362.     116. 
## 31    31     10      199.    340.     330.

Looking at the line chart below, we can say that the average price is closer to the maximum price compared to the minimum price. The highest of minimum price is on July 3. The lowest price observed was observed on July 31 with a very low price. Average MCP price varied around 300 TL / MWh but with a higher standard deviation during July, 2020. The highest price was observed also on July 3. The specific date-time of the minimum and maximum observed has been found in the next analysis.

plot_SMP %>% 
  select(day, min_SMP, mean_SMP, max_SMP) %>% 
  pivot_longer(.,-day) %>% 
  ggplot(.,aes(x=day, y=value, color=name)) + 
  geom_line() + 
  theme_test() + 
  labs(x="Days of July,2020", y="Price (TL/MWh)", color="Labels", title = "Overview of 'SMP'", subtitle = "Data Source: EPIAS") + theme(legend.position="right")

Which datetime/s give the min or max ‘SMP’?

We can also investigate the datetimes which gives the min or max ‘SMP’ by following codes.

For min ‘SMP’:

prices %>%
  mutate(the_rank = rank(SMP., ties.method = "random")) %>%
  # There are three datetime gives the min 'SMP', so the_rank <= 3.
  filter(the_rank <= 3) %>%
  select(Date, SMP.)
##                  Date SMP.
## 1 2020-07-31 10:00:00   10
## 2 2020-07-31 11:00:00   10
## 3 2020-07-31 12:00:00   10

For max ‘SMP’:

prices %>%
  mutate(the_rank = rank(-SMP., ties.method = "random")) %>%
  # There are two datetime gives the max 'SMP', so the_rank <= 2.
  filter(the_rank <= 2) %>%
  select(Date, SMP.)
##                  Date SMP.
## 1 2020-07-03 14:00:00  460
## 2 2020-07-03 15:00:00  460

References

End of the Assignment 2. Visit my Progress Journal