AIM

In this report, we analyze hourly electricity prices published by EPIAS for 1-31 July 2020. The data can be downloaded from EPIAS, which also provides more information about the price series.

PREPARATION

Importing the Data

After downloading the data, we import it into an object called data and inspect it with the glimpse function.

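library(tidyverse)  # provides %>%, glimpse(), the dplyr verbs and ggplot2
library(lubridate)  # provides wday() and hour()

data = read.csv("ptf-smf.csv")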
data %>% glimpse()
## Rows: 744
## Columns: 6
## $ Tarih                               <chr> "01.07.20 00:00", "01.07.20 01:...
## $ PTF                                 <chr> "323,85", "326,95", "324,31", "...
## $ SMF                                 <chr> "211,00", "201,00", "211,00", "...
## $ Pozitif.Dengesizlik.Fiyatı..TL.MWh. <chr> "204,67", "194,97", "204,67", "...
## $ Negatif.Dengesizlik.Fiyatı..TL.MWh. <chr> "333,57", "336,76", "334,04", "...
## $ SMF.Yön                             <chr> "?Enerji Fazlası", "?Enerji Faz...

Preprocessing

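The glimpse output shows that every column, including the date and the prices, was read as a string. The prices use a comma as the decimal separator, so we remove any periods (thousands separators), replace the comma with a period, and convert the result to numeric. We then parse the date field into proper date and date-time types and rename the columns from Turkish to English.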

data$PTF = as.numeric(gsub(",", ".", gsub("\\.", "", data$PTF)))
data$SMF = as.numeric(gsub(",", ".", gsub("\\.", "", data$SMF)))
data$Negatif.Dengesizlik.Fiyatı..TL.MWh. = as.numeric(gsub(",", ".", gsub("\\.", "", data$Negatif.Dengesizlik.Fiyatı..TL.MWh.)))
data$Pozitif.Dengesizlik.Fiyatı..TL.MWh. = as.numeric(gsub(",", ".", gsub("\\.", "", data$Pozitif.Dengesizlik.Fiyatı..TL.MWh.)))
data$Tarih = gsub(pattern = "\\.","-",data$Tarih)

data_last = data %>%
  # Rename the Turkish columns to English abbreviations
  select(Date = Tarih, MCP = PTF, SMP = SMF, NIP = Negatif.Dengesizlik.Fiyatı..TL.MWh., PIP = Pozitif.Dengesizlik.Fiyatı..TL.MWh., SMPDirection = SMF.Yön) %>%
  # Parse the timestamp, then derive weekday (1 = Monday), hour and calendar date
  mutate(DateTime = as.POSIXct(Date, format = "%d-%m-%y %H:%M")) %>%
  mutate(Day = wday(DateTime, week_start = 1), Hour = hour(DateTime), Date = as.Date(Date, format = "%d-%m-%y %H:%M"))

data_last %>% glimpse()
## Rows: 744
## Columns: 9
## $ Date         <date> 2020-07-01, 2020-07-01, 2020-07-01, 2020-07-01, 2020-...
## $ MCP          <dbl> 323.85, 326.95, 324.31, 322.11, 320.00, 286.21, 210.13...
## $ SMP          <dbl> 211.00, 201.00, 211.00, 211.00, 201.00, 181.00, 113.75...
## $ NIP          <dbl> 333.57, 336.76, 334.04, 331.77, 329.60, 294.80, 216.43...
## $ PIP          <dbl> 204.67, 194.97, 204.67, 204.67, 194.97, 175.57, 110.34...
## $ SMPDirection <chr> "?Enerji Fazlası", "?Enerji Fazlası", "?Enerji Fazlası...
## $ DateTime     <dttm> 2020-07-01 00:00:00, 2020-07-01 01:00:00, 2020-07-01 ...
## $ Day          <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
## $ Hour         <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ...

After preprocessing, the data has 9 columns: the four numeric price series, the SMP direction, and the derived Date, DateTime, Day (1 = Monday, 7 = Sunday) and Hour fields.
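
The four price conversions above repeat the same nested gsub call. As a minimal sketch, the pattern could be wrapped in a small helper. The name parse_tr_number is hypothetical and not part of the original analysis, and the sketch assumes that a period appears only as a thousands separator and a comma only as the decimal separator.

```r
# Hypothetical helper (not in the original report): converts strings such as
# "1.234,56" to the numeric value 1234.56.
parse_tr_number <- function(x) {
  no_thousands <- gsub(".", "", x, fixed = TRUE)          # drop thousands separators
  as.numeric(sub(",", ".", no_thousands, fixed = TRUE))   # comma -> decimal point
}

# Sample strings in the same format as the PTF and SMF columns
prices <- parse_tr_number(c("323,85", "1.234,56", "211,00"))
print(prices)
```

With such a helper, each conversion would reduce to a call like data$PTF = parse_tr_number(data$PTF).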

ANALYSIS

Because each row carries a day of the week and an hour, we can group the data and compute aggregated price statistics. We look at the following:

  • Average MCP by day of the week / hour
  • Max MCP by hour
  • Average SMP by day of the week / hour
  • Max SMP by hour

MCP

We first compute the average MCP for each day of the week, store it in a variable, and plot it.

MCP_Daily = data_last %>% 
  group_by(Day) %>%
  summarize(Avg = mean(MCP))

MCP_Daily %>% glimpse()
## Rows: 7
## Columns: 2
## $ Day <dbl> 1, 2, 3, 4, 5, 6, 7
## $ Avg <dbl> 302.0398, 298.8338, 306.9160, 297.0389, 299.2423, 291.8207, 275...
ggplot(MCP_Daily, aes(Day, Avg)) + 
  geom_col() +
  expand_limits(y = 0)

The plot shows that the average MCP is lower on the weekend, especially on Sunday. On the other days, the averages are very close to one another. We can repeat these steps by hour.
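
Because Day is stored as a number, the x-axis of the plot runs from 1 to 7. With week_start = 1, as used in the preprocessing step, 1 is Monday and 7 is Sunday. The sketch below checks this numbering for two dates in the study period with lubridate's wday; the dates vector is illustrative only.

```r
library(lubridate)

# 2020-07-01 was a Wednesday and 2020-07-05 a Sunday
dates <- as.Date(c("2020-07-01", "2020-07-05"))

# week_start = 1 numbers the days Monday = 1 ... Sunday = 7
day_numbers <- wday(dates, week_start = 1)
print(day_numbers)
```

Passing label = TRUE to wday returns day names instead of numbers, in the session's locale, which may make the plot easier to read.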

MCP_Hourly = data_last %>% 
  group_by(Hour) %>%
  summarize(Avg = mean(MCP))

MCP_Hourly %>% glimpse()
## Rows: 24
## Columns: 2
## $ Hour <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ...
## $ Avg  <dbl> 297.3552, 305.8410, 298.1139, 284.3048, 284.8490, 246.7165, 21...
ggplot(MCP_Hourly, aes(Hour, Avg)) + 
  geom_line() +
  expand_limits(y = 0)

Around 06:00, the average MCP is much lower than at other hours. For the remaining hours, the averages are relatively close to one another. Next, we list the maximum MCP observed at each hour.

options(tibble.print_max = 24)

data_last %>% 
  group_by(Hour) %>%
  top_n(1, MCP) %>%
  select(Hour, MCP) %>%
  arrange(desc(MCP))
## # A tibble: 24 x 2
## # Groups:   Hour [24]
##     Hour   MCP
##    <int> <dbl>
##  1    14  350 
##  2    15  350 
##  3    16  350 
##  4    17  331.
##  5    11  330.
##  6    21  330.
##  7    10  330.
##  8    13  329.
##  9    20  329.
## 10    22  328.
## 11     9  328.
## 12    18  327.
## 13     1  327.
## 14    19  325.
## 15     8  325.
## 16    12  324.
## 17     2  324.
## 18     0  324.
## 19    23  323.
## 20     3  322.
## 21     4  320 
## 22     7  318.
## 23     5  302.
## 24     6  293
options(tibble.print_max = 10)

The three highest hourly maxima of MCP in July 2020, all equal to 350, occurred at 14:00, 15:00 and 16:00.
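
The average and the maximum by hour can also be computed in a single summarize call, which yields exactly one row per hour. This is a minimal sketch on made-up toy values, not on the EPIAS file; the names toy and hourly_summary are illustrative.

```r
library(dplyr)

# Toy data (made-up values, not from the EPIAS file): two hours, three days each
toy <- data.frame(
  Hour = c(0, 0, 0, 1, 1, 1),
  MCP  = c(300, 320, 310, 250, 290, 270)
)

# One row per hour holding both the mean and the maximum price,
# sorted so that the hour with the highest maximum comes first
hourly_summary <- toy %>%
  group_by(Hour) %>%
  summarize(Avg = mean(MCP), Max = max(MCP)) %>%
  arrange(desc(Max))

print(hourly_summary)
```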

SMP

We repeat the same steps for SMP, starting with the average for each day of the week and its plot.

SMP_Daily = data_last %>% 
  group_by(Day) %>%
  summarize(Avg = mean(SMP))

SMP_Daily %>% glimpse()
## Rows: 7
## Columns: 2
## $ Day <dbl> 1, 2, 3, 4, 5, 6, 7
## $ Avg <dbl> 316.7281, 314.4270, 275.7717, 300.8255, 294.4326, 310.8506, 289...
ggplot(SMP_Daily, aes(Day, Avg)) + 
  geom_col() +
  expand_limits(y = 0)

The plot shows that the average SMP is higher on Monday and Tuesday than on the other days, and lowest on Wednesday. We can repeat these steps by hour.

SMP_Hourly = data_last %>% 
  group_by(Hour) %>%
  summarize(Avg = mean(SMP))

SMP_Hourly %>% glimpse()
## Rows: 24
## Columns: 2
## $ Hour <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ...
## $ Avg  <dbl> 294.3274, 307.5423, 297.3471, 282.9535, 280.0355, 248.3558, 21...
ggplot(SMP_Hourly, aes(Hour, Avg)) + 
  geom_line() +
  expand_limits(y = 0)

Around 06:00, the average SMP is also much lower than at other hours. For the remaining hours, the averages are relatively close to one another. Next, we list the maximum SMP observed at each hour.

options(tibble.print_max = 24)

data_last %>% 
  group_by(Hour) %>%
  top_n(1, SMP) %>%
  select(Hour, SMP) %>%
  arrange(desc(SMP))
## # A tibble: 26 x 2
## # Groups:   Hour [24]
##     Hour   SMP
##    <int> <dbl>
##  1    14  460 
##  2    15  460 
##  3    16  435 
##  4    21  419.
##  5    17  404.
##  6    20  403.
##  7    13  402 
##  8    13  402 
##  9    12  402.
## 10    18  402.
## # ... with 16 more rows
options(tibble.print_max = 10)

The three highest hourly maxima of SMP in July 2020 occurred at 14:00, 15:00 and 16:00, with values of 460, 460 and 435.
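
The SMP table has 26 rows although there are only 24 hours. This happens because top_n keeps ties, so an hour whose maximum occurs more than once (hour 13 appears twice with 402) contributes several rows. Since 26 exceeds tibble.print_max = 24, only the first 10 rows are printed. The sketch below reproduces the effect on made-up toy values and shows slice_max with with_ties = FALSE as one possible way to get exactly one row per hour; it is not the original report's method.

```r
library(dplyr)

# Toy data (made-up values): hour 13 reaches its maximum twice
toy <- data.frame(
  Hour = c(13, 13, 13, 14, 14),
  SMP  = c(402, 402, 380, 460, 440)
)

# top_n keeps ties, so hour 13 contributes two rows
with_ties <- toy %>%
  group_by(Hour) %>%
  top_n(1, SMP)

# slice_max with with_ties = FALSE returns exactly one row per hour
one_per_hour <- toy %>%
  group_by(Hour) %>%
  slice_max(SMP, n = 1, with_ties = FALSE)

print(nrow(with_ties))
print(nrow(one_per_hour))
```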