Intro, Dataset

In this study, an exploratory data analysis will be made about July 2020’s electricity prices using the dataset which is accessible at EPIAS/EXIST MCP/SMP Page.

library(dplyr)
library(ggplot2)
library(reshape2)

df <- readxl::read_xls('ptf-smf.xls')

Let’s investigate the summary of the dataset by glimpse.

glimpse(df)
## Rows: 744
## Columns: 6
## $ Date                                <chr> "01.07.20 00:00", "01.07.20 01:00…
## $ MCP                                 <chr> "323.85", "326.95", "324.31", "32…
## $ SMP                                 <chr> "211.00", "201.00", "211.00", "21…
## $ `Positive Imbalance Price (TL/MWh)` <chr> "204.67", "194.97", "204.67", "20…
## $ `Negative Imbalance Price (TL/MWh)` <chr> "333.57", "336.76", "334.04", "33…
## $ `SMP Direction`                     <chr> "↓Energy Surplus", "↓Energy Surpl…

It seems that we need to make datetime operations to utilize date columns. Also, price columns should be converted to numeric since they are in string as is and columns should be renamed to make them easily callable.

cnames <- c("date", "mcp", "smp", "pip", "nip", "direction")
colnames(df) <- cnames

df$date2 <- strptime(df$date,format="%d.%m.%y %H:%M")
df$hour <- lubridate::hour(df$date2)
df$day <- lubridate::day(df$date2)

df$mcp <- as.numeric(df$mcp)
df$smp <- as.numeric(df$smp)
df$pip <- as.numeric(df$pip)
df$nip <- as.numeric(df$nip)

After this quick preprocess, let’s see some sample rows from top and bottom of the dataset.

head(df)
## # A tibble: 6 x 9
##   date         mcp   smp   pip   nip direction   date2                hour   day
##   <chr>      <dbl> <dbl> <dbl> <dbl> <chr>       <dttm>              <int> <int>
## 1 01.07.20 …  324.   211  205.  334. ↓Energy Su… 2020-07-01 00:00:00     0     1
## 2 01.07.20 …  327.   201  195.  337. ↓Energy Su… 2020-07-01 01:00:00     1     1
## 3 01.07.20 …  324.   211  205.  334. ↓Energy Su… 2020-07-01 02:00:00     2     1
## 4 01.07.20 …  322.   211  205.  332. ↓Energy Su… 2020-07-01 03:00:00     3     1
## 5 01.07.20 …  320    201  195.  330. ↓Energy Su… 2020-07-01 04:00:00     4     1
## 6 01.07.20 …  286.   181  176.  295. ↓Energy Su… 2020-07-01 05:00:00     5     1
tail(df)
## # A tibble: 6 x 9
##   date        mcp   smp   pip   nip direction    date2                hour   day
##   <chr>     <dbl> <dbl> <dbl> <dbl> <chr>        <dttm>              <int> <int>
## 1 31.07.20…  311.  206.  200.  320. ↓Energy Sur… 2020-07-31 18:00:00    18    31
## 2 31.07.20…  318.  260   252.  327. ↓Energy Sur… 2020-07-31 19:00:00    19    31
## 3 31.07.20…  325.  205   199.  335. ↓Energy Sur… 2020-07-31 20:00:00    20    31
## 4 31.07.20…  326.  267.  259.  336. ↓Energy Sur… 2020-07-31 21:00:00    21    31
## 5 31.07.20…  325.  340.  315.  350. ↑ Energy De… 2020-07-31 22:00:00    22    31
## 6 31.07.20…  322.  210   204.  331. ↓Energy Sur… 2020-07-31 23:00:00    23    31

Exploratory Analyses

Let’s compare MCP and SMP values at midday.

ggplot(melt(filter(df, hour == 12)[, c("day", "mcp", "smp")], id=c("day")), aes(x = day, y = value, group = variable)) + 
    geom_line(aes(color = variable, linetype = variable)) + 
    scale_color_manual(values = c("darkred", "steelblue")) + 
    ggtitle("MCP vs. SMP - Midday") +
    theme(axis.text.x = element_text(angle = 55, vjust = 0.5, hjust=1)) +
  labs(x="Days in July", y="Price")

Let’s also see this comparison at day ends. It seems that MCP flattens out late in the day.

ggplot(melt(filter(df, hour == 23)[, c("day", "mcp", "smp")], id=c("day")), aes(x = day, y = value, group = variable)) + 
    geom_line(aes(color = variable, linetype = variable)) + 
    scale_color_manual(values = c("darkred", "steelblue")) + 
    ggtitle("MCP vs. SMP - End of the Days") +
    theme(axis.text.x = element_text(angle = 55, vjust = 0.5, hjust=1)) +
  labs(x="Days in July", y="Price")

The imbalance prices are calculated as below; - Positive Imbalance Price (TL/MWh) = 0,97min(MCP,SMP) - Negative Imbalance Price (TL/MWh)= 1,03max(MCP,SMP)

Let’s compare them at midday.

ggplot(melt(filter(df, hour == 12)[, c("day", "pip", "nip")], id=c("day")), aes(x = day, y = value, group = variable)) + 
    geom_line(aes(color = variable, linetype = variable)) + 
    scale_color_manual(values = c("darkred", "steelblue")) + 
    ggtitle("Positive vs. Negative Imbalance Prices - Midday") +
    theme(axis.text.x = element_text(angle = 55, vjust = 0.5, hjust=1)) +
  labs(x="Days in July", y="Price")

Similar to previous, let’s see the comparison again at the end of days. It seems that the gap between them is getting larger at the end of days, right?

ggplot(melt(filter(df, hour == 23)[, c("day", "pip", "nip")], id=c("day")), aes(x = day, y = value, group = variable)) + 
    geom_line(aes(color = variable, linetype = variable)) + 
    scale_color_manual(values = c("darkred", "steelblue")) + 
    ggtitle("Positive vs. Negative Imbalance Prices - End of the Days") +
    theme(axis.text.x = element_text(angle = 55, vjust = 0.5, hjust=1)) +
  labs(x="Days in July", y="Price")

At the end let’s also see the breakdown of energy directions in all the days of July.

df$dummy <- 1
dd <- melt(df[, c("day", "direction", "dummy")], id=c("day", "direction"))
df3 <- dd %>% group_by(day, direction) %>% summarise(Periods = sum(value))
## `summarise()` regrouping output by 'day' (override with `.groups` argument)
ggplot(df3, aes(fill=direction, y=Periods, x=day)) + 
    geom_bar(position="stack", stat="identity") +
    scale_fill_viridis(discrete = T) +
    ggtitle("Balance Direction") +
    #theme_ipsum() +
    coord_cartesian(xlim = c(1, 31), ylim = c(0, 24)) +
    theme(axis.text.x = element_text(angle = 55, vjust = 0.5, hjust=1)) +
    xlab("")