İzmir Fish Prices for 2021
You can access the data for all years from here.
Or you can download the data used in this analysis from here.
Understanding the shape of the data and its columns
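library(dplyr)  # provides %>%, group_by(), summarise(), mutate() and filter() used below
# Semicolon-separated file; text columns are kept as plain character strings
df <- read.csv(file = 'https://raw.githubusercontent.com/pjournal/mef05-stuncers/gh-pages/balik_hal_fiyatlari%20(4).csv',
               stringsAsFactors = FALSE, header = TRUE, sep = ";", encoding = "UTF-8")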
head(df)
##                 TARIH MAL_TURU           MAL_ADI BIRIM ASGARI_UCRET AZAMI_UCRET
## 1 2021-01-02 00:00:00    BALIK     TIRSI (DENIZ)    KG         5.83        12.5
## 2 2021-01-02 00:00:00    BALIK KIRLANGIÇ (DENIZ)    KG         3.00        80.0
## 3 2021-01-02 00:00:00    BALIK    ÇIMÇIM (DENIZ)    KG         3.50         8.0
## 4 2021-01-02 00:00:00    BALIK   HANOS ( DENIZ )    KG         2.50         5.0
## 5 2021-01-02 00:00:00    BALIK     KILIÇ (DENIZ)    KG        45.00        45.0
## 6 2021-01-02 00:00:00    BALIK    LÜFER ( DENIZ)    KG       130.00       130.0
summary(df)
##     TARIH             MAL_TURU           MAL_ADI             BIRIM
##  Length:23311       Length:23311       Length:23311       Length:23311
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##
##
##   ASGARI_UCRET     AZAMI_UCRET
##  Min.   :  0.00   Min.   :   0.42
##  1st Qu.:  5.00   1st Qu.:  15.00
##  Median : 10.00   Median :  35.00
##  Mean   : 31.13   Mean   :  67.50
##  3rd Qu.: 35.00   3rd Qu.:  90.00
##  Max.   :800.00   Max.   :5150.00
Finding the maximum, minimum and standard deviation of the prices
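# Per product and unit: highest/lowest recorded maximum price (AZAMI_UCRET)
# and highest/lowest recorded minimum price (ASGARI_UCRET)
max_min_prices <- df %>%
  group_by(MAL_ADI, BIRIM) %>%
  summarise(MAX_MAX_PRICE = max(AZAMI_UCRET),
            MAX_MIN_PRICE = min(AZAMI_UCRET),
            MIN_MAX_PRICE = max(ASGARI_UCRET),
            MIN_MIN_PRICE = min(ASGARI_UCRET),
            .groups = 'drop') %>%   # 'drop' already returns an ungrouped tibble
  arrange(desc(MAX_MAX_PRICE)) %>%
  head(5)
max_min_prices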
## # A tibble: 5 x 6
##   MAL_ADI          BIRIM MAX_MAX_PRICE MAX_MIN_PRICE MIN_MAX_PRICE MIN_MIN_PRICE
##   <chr>            <chr>         <dbl>         <dbl>         <dbl>         <dbl>
## 1 BARBUN (TEKIR)   KG             5150            30            45           2
## 2 ITHAL SOMON(TAT~ KG             1153            50           140          17.5
## 3 ISTAKOZ (DENIZ)  KG              800            20           800           3
## 4 LEVREK (DENIZ)   KG              500            30           140           5
## 5 MERCAN (BÜYÜKBO~ KG              500            20           100           2
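# Standard deviation of the maximum and minimum prices per product and unit
SD_MAX_MIN <- df %>%
  group_by(MAL_ADI, BIRIM) %>%
  summarise(sd_max = sd(AZAMI_UCRET),
            sd_min = sd(ASGARI_UCRET),
            .groups = 'drop') %>%   # 'drop' already returns an ungrouped tibble
  arrange(desc(sd_max)) %>%
  head(5)
SD_MAX_MIN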
## # A tibble: 5 x 4
##   MAL_ADI           BIRIM sd_max sd_min
##   <chr>             <chr>  <dbl>  <dbl>
## 1 BARBUN (TEKIR)    KG     334.    5.75
## 2 ISTAKOZ (DENIZ)   KG     169.  137.
## 3 MERCAN (BÜYÜKBOY) KG      84.3   9.06
## 4 FANGRI (DENIZ)    KG      82.7  72.4
## 5 SINARIT (DENIZ)   KG      80.2  66.8
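The analysis shows that some prices cannot be correct: BARBUN (TEKIR), for example, has a maximum price of 5150 per kg and a standard deviation of about 334. Either the price or the unit was entered incorrectly. Either way, we have to remove the outliers to get healthy data. Outliers are defined here on the daily spread between the maximum and minimum price, using the 1.5 × IQR rule within each product.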
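To see why a single bad entry is enough to distort the maximum and the standard deviation, here is a minimal sketch in base R. The prices are made up for illustration and are not taken from the dataset; the last value imitates a mis-keyed entry.

```r
# Hypothetical per-kg prices for one product (NOT from the dataset).
# The last value imitates an entry typed with two extra digits.
prices <- c(30, 32, 35, 31, 34, 33, 3300)

sd_all   <- sd(prices)                    # dominated by the single bad value
sd_clean <- sd(prices[-length(prices)])   # same data without the last entry

# The median and the MAD barely react to the bad value, which is one reason
# quantile-based rules are a common choice for flagging such entries.
round(c(max = max(prices), sd_all = sd_all, sd_clean = sd_clean,
        median = median(prices), mad = mad(prices)), 2)
```

The standard deviation is several hundred times larger with the bad value included, while the median stays where the ordinary prices are.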
# Rows whose daily spread (FARK = maximum - minimum price) lies outside
# the 1.5 * IQR fences computed within each product
df_outliers <- df %>%
  group_by(MAL_ADI) %>%
  transmute(AZAMI_UCRET, ASGARI_UCRET,
            FARK = AZAMI_UCRET - ASGARI_UCRET,
            IQR_FARK = IQR(FARK),
            UPPER = quantile(FARK, 0.75) + IQR_FARK * 1.5,
            LOWER = quantile(FARK, 0.25) - IQR_FARK * 1.5) %>%
  filter(!(FARK <= UPPER & FARK >= LOWER))
# Keep only the rows whose daily spread lies inside the 1.5 * IQR fences
df_cutoff_outliers <- df %>%
  group_by(MAL_ADI) %>%
  mutate(FARK = AZAMI_UCRET - ASGARI_UCRET,
         IQR_FARK = IQR(FARK),
         UPPER = quantile(FARK, 0.75) + IQR_FARK * 1.5,
         LOWER = quantile(FARK, 0.25) - IQR_FARK * 1.5) %>%
  filter(FARK <= UPPER & FARK >= LOWER) %>%
  ungroup()
df_cutoff_outliers
## # A tibble: 22,338 x 10
##    TARIH   MAL_TURU MAL_ADI BIRIM ASGARI_UCRET AZAMI_UCRET  FARK IQR_FARK  UPPER
##    <chr>   <chr>    <chr>   <chr>        <dbl>       <dbl> <dbl>    <dbl>  <dbl>
##  1 2021-0~ BALIK    TIRSI ~ KG            5.83        12.5   6.67     5.83   22.1
##  2 2021-0~ BALIK    KIRLAN~ KG            3           80    77       71     244.
##  3 2021-0~ BALIK    ÇIMÇIM~ KG            3.5          8     4.5      8.62   33.6
##  4 2021-0~ BALIK    HANOS ~ KG            2.5          5     2.5      2.5     6.75
##  5 2021-0~ BALIK    KILIÇ ~ KG           45           45     0       20      50
##  6 2021-0~ BALIK    LÜFER ~ KG          130          130     0       35      97.5
##  7 2021-0~ BALIK    USKUMR~ KG           38           38     0       39     112.
##  8 2021-0~ KÜLTÜR   LEVREK~ KG           25           55    30       21      96.5
##  9 2021-0~ BALIK    KÖPEK ~ KG           10           10     0        8      20
## 10 2021-0~ BALIK    ISTAVR~ KG            2.5          6.67  4.17     3.33   10.8
## # ... with 22,328 more rows, and 1 more variable: LOWER <dbl>
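The filtering rule used above is the usual 1.5 × IQR fence. As a small worked illustration, the sketch below applies the same arithmetic to one hypothetical vector of spreads in base R. The numbers are invented, and the real analysis applies the rule separately within each MAL_ADI group.

```r
# Hypothetical daily spreads (maximum minus minimum price) for one product.
# These values are invented for illustration only.
spread <- c(0, 2, 3, 4, 5, 5, 6, 7, 8, 50)

q1  <- quantile(spread, 0.25, names = FALSE)   # first quartile
q3  <- quantile(spread, 0.75, names = FALSE)   # third quartile
iqr <- q3 - q1                                 # same value as IQR(spread)

lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# A value is flagged when it falls outside [lower, upper]
is_outlier <- spread < lower | spread > upper
spread[is_outlier]
```

With these numbers the fences are -2 and 12, so only the spread of 50 is flagged. Note that the rule looks at the spread, not at the price level itself, so a day where both prices are wrong by the same amount would pass.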
We removed the outliers, but to focus on fresh fish we should also exclude the fishing ban period, taken here as 15 April to 31 August 2021.
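# Parse TARIH once, then drop the ban period by comparing real dates;
# comparing the raw text "2021-08-31 00:00:00" with "2021-08-31" is unreliable
av_serbest <- df_cutoff_outliers %>% mutate(Zaman = lubridate::date(TARIH), Ay = lubridate::month(TARIH)) %>%
  filter(!(Zaman >= as.Date("2021-04-15") & Zaman <= as.Date("2021-08-31")))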
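Because TARIH is stored as text with a time part, it matters whether the ban window is checked on the text or on parsed dates. The sketch below, using four made-up timestamps in the same format as TARIH, puts the two comparisons side by side. The text result for the boundary day depends on string collation, so it is shown rather than asserted.

```r
# Hypothetical timestamps in the same layout as the TARIH column
stamp <- c("2021-04-14 00:00:00", "2021-04-15 00:00:00",
           "2021-08-31 00:00:00", "2021-09-01 00:00:00")

# Text comparison: strings are ordered character by character, so a
# timestamp with a time part is not the same string as the bare date.
in_ban_text <- stamp >= "2021-04-15" & stamp <= "2021-08-31"

# Date comparison: parse first (as.Date keeps only the calendar day),
# then compare calendar days.
day <- as.Date(stamp)
in_ban_date <- day >= as.Date("2021-04-15") & day <= as.Date("2021-08-31")

data.frame(stamp, in_ban_text, in_ban_date)
```

With parsed dates, both 15 April and 31 August fall inside the window, which is what an inclusive ban period needs.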
Time to check when we can eat… let's say, a big juicy pandora (Mercan).
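# Large pandora only; daily_avg is the midpoint of the day's minimum and maximum price
mercan <- av_serbest %>% filter(MAL_ADI == "MERCAN (BÜYÜKBOY)") %>%
  mutate(ay = lubridate::month(TARIH), daily_change = AZAMI_UCRET - ASGARI_UCRET,
         daily_avg = (AZAMI_UCRET + ASGARI_UCRET) / 2)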
boxplot(daily_avg~ay,
data=mercan,
main="Mercan Data",
xlab="Month Number",
ylab="Daily Average",
col="green",
border="red"
)
The boxplot shows that the winter months are the best for pandora, and the best month to eat it is, of course, April. I searched for it on Google and it is absolutely true.
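A boxplot is easier to read next to the numbers behind it. As a hedged sketch, the monthly medians that the boxes are centred on can be tabulated with base R's `aggregate()`. The data frame below is a made-up stand-in for `mercan` (hypothetical months and values), so its output says nothing about the real prices.

```r
# Hypothetical stand-in for the mercan data: month number and daily mid-price.
# Values are invented for illustration only.
toy <- data.frame(
  ay        = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  daily_avg = c(60, 65, 70, 55, 58, 64, 40, 45, 47)
)

# Median daily_avg per month: the thick line inside each box of the boxplot
monthly <- aggregate(daily_avg ~ ay, data = toy, FUN = median)

# Sort months from lowest to highest median
monthly[order(monthly$daily_avg), ]
```

Replacing `toy` with `mercan` would give the same table for the real data, provided the `ay` and `daily_avg` columns exist as defined earlier.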
Thank you for reading…
I want to thank both Meryem and Emirhan. I got a great deal of help from their analyses.