1 Turkish Startups in 2021

Published

January 9, 2023

Key Takeaways

The top 3 start-ups with the highest investments in 2021 are:
- trendyol
- Getir
- hepsiburada
Together, these three companies account for about 85% of the total investment in 2021.
The top 3 sectors with the highest investments in 2021 are:
- Ecommerce Enabler
- Delivery & Logistics
- Gaming
The sector with the highest median investment is:
- Telecom
The sector with the highest mean investment is:
- Ecommerce Enabler
The Gaming sector has the highest standard deviation relative to its mean, that is, the highest coefficient of variation. It is also the sector with the most companies, which contributes to this variation.

1.1 Data Pre-Processing

Before we can look at the data set, we need to load the required libraries and import the data.

Code

# install.packages("readxl")
# install.packages("knitr")
library(readxl)
library(knitr)
library(ggplot2)
library(dplyr)
library(scales)
df <- readxl::read_excel("assignment1/startup_deals_2021.xlsx")

The problematic characters in the column names are removed to make accessing the columns easier. Then we can get a “glimpse” of our data.
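
A compact variant of this clean-up can be tried on a toy vector first. This is only a sketch: the raw column names below are assumptions inferred from the cleaned names, not copied from the spreadsheet.

```r
# Hypothetical raw column names (assumed; inferred from the cleaned names)
raw_names <- c("Target Company", "Stake (%)", "Deal Value (USD)")

# Drop "%", "(" and ")" in one pass, then turn spaces into underscores
clean_names <- gsub("[%()]", "", raw_names)
clean_names <- gsub(" ", "_", clean_names)

print(clean_names)
```

Using a character class keeps the removed symbols in one place, which makes it easier to extend if other characters turn up in the headers.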

Code

names(df) <- gsub("%", "", names(df))
names(df) <- gsub(" ", "_", names(df))
names(df) <- gsub("[()]", "", names(df))
glimpse(df)

Rows: 297
Columns: 9
$ Target_Company      <chr> "Abonesepeti", "Abrakadabra", "Ace Games", "Adlema…
$ Sector              <chr> "SaaS", "Gaming", "Gaming", "Internet of things", …
$ Investor            <chr> "Keiretsu Forum, Berkan Burla", "WePlay Ventures",…
$ Announcement_Date   <chr> "June 2021", "December 2021", "April 2021", "June …
$ Financial_Investor  <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "…
$ `Investor's_Origin` <chr> "Turkey", "Turkey", "Turkey, USA", "Turkey", "Turk…
$ Stake_              <chr> "5.00%", "5.00%", "NA", "NA", "NA", "NA", "10.92%"…
$ Deal_Value_USD      <chr> "100000", "250000", "NA", "120000", "100000", "100…
$ Investment_Stage    <chr> "Seed Stage", "Seed Stage", "Seed Stage", "Seed St…

The “Stake_” and “Deal_Value_USD” columns are stored as strings. That’s not good for business: unless we convert them, we cannot treat them as numbers, and any calculation on them will fail or give wrong results.

Let’s start with the “Deal_Value_USD” column. suppressWarnings is used to silence the coercion warnings for entries that cannot be parsed and become NA.

Code

df$Deal_Value_USD <- suppressWarnings(as.integer(df$Deal_Value_USD))
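
Suppressing warnings also hides values that fail to parse for unexpected reasons. One way to check, sketched here on a toy vector with made-up values, is to compare the raw strings with the converted result before overwriting the column:

```r
# Toy input (hypothetical values): two clean numbers, a literal "NA", one malformed entry
raw <- c("100000", "250000", "NA", "1,200")

# as.numeric() turns anything it cannot parse into NA
parsed <- suppressWarnings(as.numeric(raw))

# Entries that became NA although the raw string was not "NA"
unexpected <- raw[is.na(parsed) & raw != "NA"]
print(unexpected)
```

An empty result would mean that every NA in the converted column comes from a missing value in the source.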

The “Stake_” column is more cumbersome. If we converted it the same way as the first column, every value would become NA. Listing the values in the column shows why.

Code

table(df["Stake_"])

Stake_
  0.46%   1.33%  10.00%  10.42%  10.65% 10.71 % 10.91 %  10.92% 100.00%  11.00% 
      1       1      12       1       1       1       1       1      12       2 
 11.03% 11.11 %  11.55%  12.00%  12.50% 12.61 %  14.06%  14.28%  14.29%  14.30% 
      1       1       1       1       4       1       1       1       2       1 
 14.60% 14.71 % 14.91 %  15.00%  15.38%  15.49%  15.50%  15.60%  15.79%  16.13% 
      1       1       1       1       1       1       1       1       1       2 
  1750%  18.66%  19.99%   2.00%  2.21 %   2.59%  20.00%  20.83%  22.00%  22.73% 
      1       1       1       2       1       1       7       1       1       1 
 23.50%  24.29%  25.00%  25.23%  28.00%   3.33%   3.38%   3.63%  30.00%  33.33% 
      1       1       2       1       1       2       1       1       1       1 
 35.00%  35.09%  37.50%  37.70%   4.00%   4.44%   4.75%   5.00%   5.20%   5.67% 
      1       1       1       1       1       1       1      10       1       1 
 5.81 %   5.88%   5.90%  50.00%   6.00%   6.25%   6.34%  6.51 %   6.60%  60.00% 
      1       2       1       3       2       1       1       1       1       4 
 69.82%   7.00%   7.14%   7.39%   7.50%   7.69%  75.00%   8.05%   8.08%   8.33% 
      1       1       1       1       1       1       1       1       1       1 
  8.47%   8.70%  88.89%   9.39%   9.40%  9.91 %      NA 
      2       1       1       1       1       1     157

The “%” characters and the stray whitespace in some of the cells cause the problem. We remove both and then convert the column to numeric, suppressing the warnings again.

Code

# The piped value is already the first argument, so df$Stake_ is not repeated
df$Stake_ <- suppressWarnings(gsub("%", "", df$Stake_) %>% trimws(which = "both") %>% as.numeric())
glimpse(df)

Rows: 297
Columns: 9
$ Target_Company      <chr> "Abonesepeti", "Abrakadabra", "Ace Games", "Adlema…
$ Sector              <chr> "SaaS", "Gaming", "Gaming", "Internet of things", …
$ Investor            <chr> "Keiretsu Forum, Berkan Burla", "WePlay Ventures",…
$ Announcement_Date   <chr> "June 2021", "December 2021", "April 2021", "June …
$ Financial_Investor  <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "…
$ `Investor's_Origin` <chr> "Turkey", "Turkey", "Turkey, USA", "Turkey", "Turk…
$ Stake_              <dbl> 5.00, 5.00, NA, NA, NA, NA, 10.92, NA, NA, 15.38, …
$ Deal_Value_USD      <int> 100000, 250000, NA, 120000, 100000, 1000000, 25000…
$ Investment_Stage    <chr> "Seed Stage", "Seed Stage", "Seed Stage", "Seed St…
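
The frequency table above lists a stake of 1750%, which cannot be a valid ownership share. A small range check after the conversion makes such entries visible. This is a sketch on a toy vector; how to treat the flagged value is left open here.

```r
# Toy stake strings modelled on the frequency table above
stake_raw <- c("5.00%", "10.71 %", "1750%", "NA", "100.00%")

# Same cleaning steps: drop "%", trim whitespace, convert to numeric
stake <- suppressWarnings(as.numeric(trimws(gsub("%", "", stake_raw))))

# Flag values outside the valid 0-100 range; NA entries are not flagged
out_of_range <- !is.na(stake) & (stake < 0 | stake > 100)
print(stake[out_of_range])
```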

1.2 Value by Company

The amount of investment received by companies can be examined as the first analysis. To do so, we should group the data set by company name and sum the deal values. We can also keep their sector information to observe the distribution.

Adding Sector to the grouping keys should not change the number of rows, because a company should have the same sector in every record. When we try it, however, the row counts differ, which means that something is wrong in the Sector column. Let’s take a closer look.

Code

table(df["Sector"])

Sector
            3D Printing      Advanced materials             Advertising 
                      3                       1                       2 
               Agritech Artificial intelligence Artificial Intelligence 
                      8                      11                       3 
      Augmented Reality            Aviationtech             B lockchain 
                      2                       1                       4 
                Biotech                    Book          Cybersec urity 
                      6                       1                       1 
          Cybersecurity          Data analytics          Data Analytics 
                      4                       2                       1 
               Deeptech    Delivery & Logistics      Diğital Comparison 
                     11                      13                       1 
      Ecommerce enabler       Ecommerce Enabler               Education 
                      8                       1                       6 
                 Energy                 Fintech                Foodtech 
                      3                      23                       9 
                 Gaming              Healthtech                  HRTech 
                     51                      14                       6 
         I mage process      Internet of things           Marketingtech 
                      1                       5                       6 
            Marketplace                   Media                Mobility 
                     17                      12                       8 
               Proptech                Research              Retailtech 
                      1                       1                       3 
               Robotics                    SaaS                  Sports 
                      3                      28                       5 
                Telecom                 Telecpm          Transportation 
                      3                       1                       1 
                 Travel              Visiontech  Vitamins & Supplements 
                      4                       1                       1

Typos, stray spaces, and inconsistent capitalization split single sectors into several labels. Some companies also appear under two different sectors. In a larger data set, we could strip the whitespace, convert every label to lowercase, and apply fuzzy matching to resolve most of these problems. This data set is small enough to fix manually, so let’s do that.
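
For reference, the automated approach mentioned for larger data sets could look roughly like this. It is a minimal sketch in base R on a few labels taken from the table above; using an edit distance of 1 as the typo threshold is an assumption chosen for illustration.

```r
# Toy sample of the sector labels shown above
sectors <- c("Artificial intelligence", "Artificial Intelligence",
             "B lockchain", "Cybersec urity", "Cybersecurity",
             "Telecom", "Telecpm")

# Build a comparison key: lower-case with all whitespace removed
key <- tolower(gsub("\\s+", "", sectors))
keys <- unique(key)
print(keys)

# Pairwise edit distances between keys; a distance of 1 suggests a typo
d <- adist(keys)
near <- which(d == 1 & upper.tri(d), arr.ind = TRUE)
suspects <- data.frame(a = keys[near[, "row"]], b = keys[near[, "col"]])
print(suspects)
```

The key resolves the case and whitespace variants on its own, while the distance check only proposes candidate pairs; a person still has to decide which spelling is correct.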

Code

df$Sector[df$Sector == "Artificial intelligence"] <- "Artificial Intelligence"
df$Sector[df$Sector == "B lockchain"] <- "Blockchain"
df$Sector[df$Sector == "Cybersec urity"] <- "Cybersecurity"
df$Sector[df$Sector == "Data analytics"] <- "Data Analytics"
df$Sector[df$Sector == "Diğital Comparison"] <- "Digital Comparison"
df$Sector[df$Sector == "Ecommerce enabler"] <- "Ecommerce Enabler"
df$Sector[df$Sector == "I mage process"] <- "Image Process"
df$Sector[df$Sector == "Internet of things"] <- "Internet of Things"
df$Sector[df$Sector == "Telecpm"] <- "Telecom"

df$Sector[df$Target_Company == "ART Labs"] <- "Artificial Intelligence"
df$Sector[df$Target_Company == "Juphy"] <- "SaaS"
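
After manual fixes like these, it is worth confirming that every company maps to exactly one sector. A minimal base R sketch, using hypothetical company names rather than records from the data set:

```r
# Toy records (hypothetical company names); "Company B" appears under two sectors
toy <- data.frame(
  Target_Company = c("Company A", "Company B", "Company B", "Company C"),
  Sector         = c("Gaming", "SaaS", "Marketingtech", "Fintech")
)

# Count the distinct sectors recorded for each company
n_sectors <- tapply(toy$Sector, toy$Target_Company, function(s) length(unique(s)))

# Companies that still map to more than one sector
conflicts <- names(n_sectors)[n_sectors > 1]
print(conflicts)
```

Run against df, an empty result would indicate that grouping by company alone and grouping by company and sector give the same number of rows.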

Now we are ready to get our new dataframe.

Code

comp_grouped <- df %>%
  group_by(Target_Company, Sector) %>%
  summarize(Deal_Value_USD = sum(Deal_Value_USD, na.rm = TRUE), .groups = "drop_last") %>%
  arrange(desc(Deal_Value_USD))

kable(head(comp_grouped, 10))

Target_Company	Sector	Deal_Value_USD
trendyol	Ecommerce Enabler	1435000000
Getir	Delivery & Logistics	1018000000
hepsiburada	Ecommerce Enabler	761481000
Dream Games	Gaming	155000050
Libra Softworks	Gaming	30000000
Prota	SaaS	30000000
BluTV	Media	20800000
Arvento	SaaS	20565000
Akinon	Ecommerce Enabler	20000000
Biotrend Energy	Energy	20000000

We can also visualize these companies and compare them.

Visualizing so many companies will make the plot unreadable. We can just select the top 5% and feed them to our plot.

We first compute the 95th percentile of the deal values and use it as the filter threshold. Then we order the bars by value and add the remaining plot elements. Note that the y-axis is in millions of USD.
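
The percentile filter can be tried on a toy vector first. The values below are hypothetical and only illustrate that keeping everything at or above the 95th percentile retains roughly the top 5% of the entries.

```r
# Toy deal values: 1 to 100 (hypothetical, arbitrary units)
values <- 1:100

# 95th percentile, using quantile()'s default interpolation
threshold <- quantile(values, probs = 0.95, na.rm = TRUE)

# Keep only the values at or above the threshold
top <- values[values >= threshold]
print(top)
```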

Code

qu <- quantile(comp_grouped$Deal_Value_USD, probs = 0.95, na.rm = TRUE)

comp_grouped %>% filter(Deal_Value_USD >= qu) %>%
  ggplot(aes(x = reorder(Target_Company, +Deal_Value_USD),y = Deal_Value_USD, 
             fill = Sector))+ scale_fill_brewer(palette="Set3") +
  geom_col(width = 0.5) + theme(axis.text.x = element_text(angle = 45, hjust=1)) + 
  ggtitle("Total Investments by Company") +
  xlab("Company") + ylab("Deal Value in USD (Millions)") + 
  scale_y_continuous(labels = label_number(suffix = " M", scale = 1e-6))

Three arguably similar companies lead the way, with a large gap between them and their closest competitors. hepsiburada and trendyol are both in the top 3 even though they compete in the same sector, which emphasizes the share of the Ecommerce Enabler sector.

1.3 Value by Sector

When it comes to sectors, we can work on similar calculations.

We need group_by again, this time on the sector. To get more information, we also count the number of distinct companies in each sector.

Code

sum_deal <- sum(df$Deal_Value_USD, na.rm = TRUE)

sec_grouped <- df %>%
  group_by(Sector) %>%
  summarize(Deal_Value_USD = sum(Deal_Value_USD, na.rm = TRUE),
            Amount_of_Company = n_distinct(Target_Company)) %>%
  arrange(desc(Deal_Value_USD)) %>%
  mutate(Investment_Percentage = round(Deal_Value_USD * 100 / sum_deal, 2))

kable(head(sec_grouped, 10))

Sector	Deal_Value_USD	Amount_of_Company	Investment_Percentage
Ecommerce Enabler	2221235634	8	58.78
Delivery & Logistics	1027811561	10	27.20
Gaming	221235284	50	5.85
SaaS	84157048	26	2.23
Fintech	28894578	23	0.76
Marketplace	26477315	15	0.70
Mobility	25905560	8	0.69
Media	21759000	10	0.58
Energy	21608314	3	0.57
Deeptech	16357491	10	0.43
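
As a rough cross-check of the 85% share quoted in the Key Takeaways, the figures printed in the two tables can be combined. The overall total is not printed in this report, so it is estimated here from the Ecommerce Enabler row (value and rounded percentage), which makes the result approximate.

```r
# Top three companies, taken from the company table
top3 <- c(trendyol = 1435000000, Getir = 1018000000, hepsiburada = 761481000)

# Estimated total: Ecommerce Enabler value divided by its rounded share (58.78%)
total_est <- 2221235634 / (58.78 / 100)

# Share of the top three companies in total investment, in percent
share <- sum(top3) / total_est * 100
print(round(share, 1))
```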

Ecommerce Enabler companies received more than half of the investments, even though there are only eight of them! Great, let’s start a company called “hepsiburalarda” and be done with it, right? Right? Well, not quite.

1.4 Distribution within Sectors

We should investigate how these investments are distributed among companies and collect descriptive statistics on them. There is a plot for such tasks: the boxplot. But almost every sector has its own extreme outliers, so we would have to filter out several sectors to get a readable graph.
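
To see how a single extreme value is treated, the 1.5 * IQR whisker rule can be applied to a toy vector with base R's boxplot.stats. geom_boxplot follows the same rule by default, although its hinge definition can differ slightly. The values below are hypothetical.

```r
# Toy deal values (hypothetical): four small deals and one very large one
deals <- c(1, 2, 3, 4, 100)

# boxplot.stats() computes the five box statistics and the points beyond the whiskers
stats <- boxplot.stats(deals)

print(stats$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$out)    # values flagged as outliers
```

Because the axis has to stretch far enough to show the flagged point, the box itself is squeezed into a thin strip, which is what happens in the sector plot.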

Code

filtered_df <- df %>% 
  filter(!Sector %in% c("Ecommerce Enabler", "Delivery & Logistics", "Gaming",
                        "SaaS", "Fintech", "Marketplace", "Mobility", "Media",
                        "Energy", "Telecom", "Deeptech", "Education",
                        "Agritech", "Proptech", "Vitamins & Supplements",
                        "Foodtech", "Healthtech"))
  
ggplot(filtered_df, aes(x=Sector, y=Deal_Value_USD)) + 
  geom_boxplot() +  scale_y_continuous(labels = label_number(suffix = " M", scale = 1e-6)) + theme(axis.text.x = element_text(angle = 45, hjust=1))

So, the boxplot did not come to our aid this time. We will calculate the descriptive statistics instead. Sectors with only one deal value are excluded, because their standard deviation is undefined (NA).

Code

summed_df <- df %>%                    
  group_by(Sector) %>% 
  summarize(Mean = mean(Deal_Value_USD, na.rm =TRUE),
            Median = median(Deal_Value_USD, na.rm =TRUE),
            Standard_Dev = sd(Deal_Value_USD, na.rm =TRUE),
            Min = min(Deal_Value_USD, na.rm =TRUE),
            Quantile1 = quantile(Deal_Value_USD, 0.25, na.rm =TRUE),
            Quantile3 = quantile(Deal_Value_USD, 0.75, na.rm =TRUE),
            Max = max(Deal_Value_USD,na.rm =TRUE),
            Amount_of_Company = n_distinct(Target_Company)) %>%
  
  filter(!is.na(Standard_Dev)) %>%
  mutate(Coefficient_of_Variation = Standard_Dev / Mean) %>%
  arrange(desc(Median))

kable(head(summed_df))

Sector	Mean	Median	Standard_Dev	Min	Quantile1	Quantile3	Max	Amount_of_Company	Coefficient_of_Variation
Telecom	3085875.0	1546750	4303515.8	50000	82625.0	4550000	9.200e+06	4	1.3945853
Energy	7202771.3	1400000	11098730.8	208314	804157.0	10700000	2.000e+07	3	1.5408973
Ecommerce Enabler	246803959.3	1103361	511282193.7	50000	376800.0	20000000	1.435e+09	8	2.0716126
Delivery & Logistics	79062427.8	1000000	166933603.5	55000	365264.0	35000000	5.550e+08	10	2.1114151
Mobility	3700794.3	1000000	5707703.7	139400	383080.0	4000000	1.600e+07	8	1.5422915
Robotics	807422.3	900000	683584.4	82267	491133.5	1170000	1.440e+06	3	0.8466256
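
The last column is the standard deviation divided by the mean. A minimal sketch of that calculation follows; cv is a helper name introduced here for illustration, and the toy values are hypothetical. The final line recomputes the Telecom row from the rounded figures in the table.

```r
# Coefficient of variation: standard deviation relative to the mean
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Toy values (hypothetical): mean 2, standard deviation 1
print(cv(c(1, 2, 3)))

# A single observation has no standard deviation, so the result is NA
print(cv(5))

# Cross-check with the Telecom row of the table above
print(4303515.8 / 3085875.0)
```

The second call shows why sectors with a single deal value drop out of the table: their standard deviation is NA, and the filter removes those rows.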

This was the final step of our EDA. We ran the fundamental analyses on what turned out to be an exciting data set.