Proje_Salary

Let’s start by reading our large aggregated CSV and saving it in our environment. In this part of the study, we examine the relationship between salaries and programming languages.

library(gdata)
library(dplyr)
library(tidyverse)
Stackofbinded <- read.csv('C:\\Users\\esigi\\Downloads\\Stackofbinded.csv')
colnamesall<-colnames(Stackofbinded)

We need the column names throughout our work, so it is good to keep them in our environment. It is also often necessary to review the data quickly to test its reliability. An easy way to do this is to copy a small data frame to Excel through the clipboard, and I used this approach frequently when reviewing small amounts of data. Any such calls that remain in the code were only there to inspect a data frame and do not affect the results of this study. Here is an example using the ‘clipr’ library:

#library(clipr)
#write_clip()

Because of the size of our data, I started by selecting the columns that we need. After selecting them, it is always good to take a quick look at the data with the head and summary functions.

SalaryofStack<-Stackofbinded[,c('Year','Salary','LanguageWorkedWith','Country','MainBranch','CompTotal','CompFreq','SalaryType','ConvertedComp', 'Currency')]

head(SalaryofStack)

##   Year Salary            LanguageWorkedWith        Country
## 1 2017     NA                         Swift  United States
## 2 2017     NA JavaScript; Python; Ruby; SQL United Kingdom
## 3 2017 113750             Java; PHP; Python United Kingdom
## 4 2017     NA        Matlab; Python; R; SQL  United States
## 5 2017     NA                                  Switzerland
## 6 2017     NA         JavaScript; PHP; Rust    New Zealand
##                                             MainBranch CompTotal CompFreq
## 1                                              Student        NA         
## 2                                              Student        NA         
## 3                               Professional developer        NA         
## 4 Professional non-developer who sometimes writes code        NA         
## 5                               Professional developer        NA         
## 6                                              Student        NA         
##   SalaryType ConvertedComp                     Currency
## 1                       NA                             
## 2                       NA  British pounds sterling (£)
## 3                       NA  British pounds sterling (£)
## 4                       NA                             
## 5                       NA                             
## 6                       NA

summary(SalaryofStack)

##       Year          Salary           LanguageWorkedWith   Country         
##  Min.   :2017   Min.   : 0.000e+00   Length:387030      Length:387030     
##  1st Qu.:2018   1st Qu.: 1.320e+04   Class :character   Class :character  
##  Median :2019   Median : 5.000e+04   Mode  :character   Mode  :character  
##  Mean   :2019   Mean   :1.582e+100                                        
##  3rd Qu.:2020   3rd Qu.: 9.950e+04                                        
##  Max.   :2021   Max.   :1.000e+105                                        
##                 NA's   :323812                                            
##   MainBranch          CompTotal            CompFreq          SalaryType       
##  Length:387030      Min.   : 0.000e+00   Length:387030      Length:387030     
##  Class :character   1st Qu.: 1.850e+04   Class :character   Class :character  
##  Mode  :character   Median : 6.500e+04   Mode  :character   Mode  :character  
##                     Mean   :8.054e+241                                        
##                     3rd Qu.: 1.280e+05                                        
##                     Max.   :1.111e+247                                        
##                     NA's   :249076                                            
##  ConvertedComp        Currency        
##  Min.   :       0   Length:387030     
##  1st Qu.:   25356   Class :character  
##  Median :   55562   Mode  :character  
##  Mean   :  112456                     
##  3rd Qu.:   97288                     
##  Max.   :45241312                     
##  NA's   :201905

After selecting the columns, I cleaned and merged some of them: although I had corrected the data while merging, there were still inconsistent and problematic columns in our data frame. For instance, there were two different names for the United States, which is a very important country for our study.

I also added the average 2017 exchange rates to the table, because there is no converted-salary column for 2017. I will come back to this later.

SalaryofStack2 <- SalaryofStack %>%
  mutate(Freqnew = coalesce(CompFreq, SalaryType)) %>%
  mutate(Freqnew = ifelse(CompFreq == '', SalaryType, Freqnew)) %>%
  mutate(Country = ifelse(Country == 'United States of America', 'United States', Country))

SalaryofStack2$ConvertedComp <- as.numeric(SalaryofStack2$ConvertedComp)

#write_clip(SalaryofStack2 %>% distinct(ConvertedComp, Year))

library(readxl)
Currency2017<-read_xlsx("C:\\Users\\esigi\\Documents\\Attachments\\average_currency_2017.xlsx")


library(dplyr)
SalaryofStack3<-left_join(SalaryofStack2, Currency2017, by =c("Currency"="CurrencyType"))

library(psych)
describe(SalaryofStack3)

##                     vars      n          mean            sd median  trimmed
## Year                   1 387030  2.019080e+03  1.350000e+00   2019  2019.10
## Salary                 2  63218 1.581828e+100 3.977220e+102  50000 57002.76
## LanguageWorkedWith*    3 387030  4.929239e+04  3.021072e+04  47566 50722.15
## Country*               4 387030  1.452400e+02  7.706000e+01    152   148.89
## MainBranch*            5 387030  2.910000e+00  2.410000e+00      2     2.43
## CompTotal              6 137954 8.054215e+241           Inf  65000 79908.26
## CompFreq*              7 387030  1.810000e+00  1.170000e+00      1     1.64
## SalaryType*            8 387030  1.250000e+00  7.400000e-01      1     1.04
## ConvertedComp          9 185125  1.124556e+05  3.392271e+05  55562 61842.67
## Currency*             10 387030  3.372000e+01  5.277000e+01      1    22.73
## Freqnew*              11 387030  2.070000e+00  1.230000e+00      2     1.96
## Currency2017          12  71208  1.960000e+00  8.200000e+00      1     1.05
##                          mad     min           max         range  skew kurtosis
## Year                    1.48 2017.00  2.021000e+03  4.000000e+00  0.06    -1.20
## Salary              59487.04    0.00 1.000000e+105 1.000000e+105   NaN      NaN
## LanguageWorkedWith* 45395.73    1.00  8.847800e+04  8.847700e+04 -0.27    -1.29
## Country*              114.16    1.00  2.500000e+02  2.490000e+02 -0.10    -1.47
## MainBranch*             1.48    1.00  1.100000e+01  1.000000e+01  1.65     1.53
## CompTotal           78577.80    0.00 1.111111e+247 1.111111e+247   NaN      NaN
## CompFreq*               0.00    1.00  4.000000e+00  3.000000e+00  1.10    -0.44
## SalaryType*             0.00    1.00  4.000000e+00  3.000000e+00  3.06     8.17
## ConvertedComp       50903.59    0.00  4.524131e+07  4.524131e+07 30.27  2494.12
## Currency*               0.00    1.00  1.690000e+02  1.680000e+02  1.45     0.61
## Freqnew*                1.48    1.00  4.000000e+00  3.000000e+00  0.70    -1.16
## Currency2017            0.19    0.02  1.121400e+02  1.121300e+02 12.49   163.58
##                                se
## Year                 0.000000e+00
## Salary              1.581828e+100
## LanguageWorkedWith*  4.856000e+01
## Country*             1.200000e-01
## MainBranch*          0.000000e+00
## CompTotal                     Inf
## CompFreq*            0.000000e+00
## SalaryType*          0.000000e+00
## ConvertedComp        7.884200e+02
## Currency*            8.000000e-02
## Freqnew*             0.000000e+00
## Currency2017         3.000000e-02
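The Freqnew step above merges two frequency columns. As a minimal base R sketch of the same idea on made-up values (the vectors comp_freq and salary_type are hypothetical stand-ins for CompFreq and SalaryType), both NA and the empty string can be treated as missing, since coalesce() on its own only skips NA:

```r
# Hypothetical stand-ins for the CompFreq and SalaryType columns.
comp_freq   <- c("Yearly", "", NA, "Monthly")
salary_type <- c("", "Weekly", "Monthly", "")

# Treat both NA and "" as "no answer" in the first column.
is_blank <- is.na(comp_freq) | comp_freq == ""

# Fall back to the second column wherever the first one is blank.
freq_new <- ifelse(is_blank, salary_type, comp_freq)
print(freq_new)
```

Building the blank test first keeps NA values out of the ifelse() condition, so an NA in the first column does not propagate into the merged result.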

Unfortunately, the currency calculations do not give reliable results for 2017 when we compare its distribution with those of the other years. First, I multiplied the salary by the average 2017 exchange rate.

SalaryofStack4 <- SalaryofStack3 %>% mutate(Finalsalary = ifelse((Year==2017), (Salary)*(Currency2017), ConvertedComp))

SalaryofStack4 <- select(SalaryofStack4, c(-CompFreq, -SalaryType, -Currency2017, -CompTotal, -Currency, -Salary , -Freqnew ))

SalaryofStack4 <- select(SalaryofStack4, c(-ConvertedComp))

str(SalaryofStack4)

## 'data.frame':    387030 obs. of  5 variables:
##  $ Year              : int  2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
##  $ LanguageWorkedWith: chr  "Swift" "JavaScript; Python; Ruby; SQL" "Java; PHP; Python" "Matlab; Python; R; SQL" ...
##  $ Country           : chr  "United States" "United Kingdom" "United Kingdom" "United States" ...
##  $ MainBranch        : chr  "Student" "Student" "Professional developer" "Professional non-developer who sometimes writes code" ...
##  $ Finalsalary       : num  NA NA 146624 NA NA ...
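To make the conversion step concrete, here is a small base R sketch on made-up rows. It is not the code used in the study: it replaces left_join() with a match() lookup, the short currency labels are hypothetical, and the rate 1.289 is only the value implied by the outputs above (146624 / 113750).

```r
# Made-up rows; only the first mirrors row 3 of the head() output above.
salaries <- data.frame(
  Year          = c(2017, 2017, 2018),
  Salary        = c(113750, 50000, NA),
  Currency      = c("GBP", "XXX", "EUR"),  # hypothetical short labels
  ConvertedComp = c(NA, NA, 60000),
  stringsAsFactors = FALSE
)

# Hypothetical rate table; 1.289 is the rate implied by 146624 / 113750.
rates <- data.frame(
  CurrencyType = "GBP",
  Currency2017 = 1.289,
  stringsAsFactors = FALSE
)

# Look up each row's rate (NA when the currency is not in the table).
rate <- rates$Currency2017[match(salaries$Currency, rates$CurrencyType)]

# 2017 rows use Salary * rate; other years use ConvertedComp as it is.
salaries$Finalsalary <- ifelse(salaries$Year == 2017,
                               salaries$Salary * rate,
                               salaries$ConvertedComp)
print(salaries[, c("Year", "Currency", "Finalsalary")])
```

In this sketch, a 2017 row whose currency is missing from the rate table ends up with NA in Finalsalary, so it would be dropped by an is.na() filter like the one that follows.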

SalaryofStack4<-SalaryofStack4 %>% filter(!is.na(Finalsalary))

SalaryofStack5<-SalaryofStack4 %>% filter(between(Finalsalary, quantile(Finalsalary, 0.05), quantile(Finalsalary, 0.95)))

hist(SalaryofStack5$Finalsalary)

SalaryofStack5$Finalsalary<-as.numeric(SalaryofStack5$Finalsalary)

SalaryofStack5plot2017<-SalaryofStack5 %>% filter(Year==2017)
SalaryofStack5plot2018<-SalaryofStack5 %>% filter(Year==2018)
SalaryofStack5plot2019<-SalaryofStack5 %>% filter(Year==2019)
plot(density(SalaryofStack5plot2017$Finalsalary),col='Blue')
lines(density(SalaryofStack5plot2018$Finalsalary),col='Red')
lines(density(SalaryofStack5plot2019$Finalsalary),col='Black')
lines(density(SalaryofStack5$Finalsalary),col='Orange')
title(sub="Year Distribution - Blue is 2017")
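The percentile filter used above can be sketched in base R as follows. The helper name trim_by_quantile and the sample values are hypothetical, chosen only to show how one zero and one absurdly large answer are removed:

```r
# Keep only the values that lie between two quantiles of the same vector.
trim_by_quantile <- function(x, lower = 0.05, upper = 0.95) {
  bounds <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  x[!is.na(x) & x >= bounds[1] & x <= bounds[2]]
}

# Made-up salaries with one zero and one implausibly large entry.
salary <- c(0, 20000, 35000, 50000, 65000, 80000, 95000, 1e105)

trimmed <- trim_by_quantile(salary)
print(trimmed)
```

Both bounds are inclusive here, which matches the behaviour of dplyr's between(). Passing other values for lower and upper, for example 0.04 and 0.96, changes how much of each tail is cut.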

Unfortunately, the distribution shows that this method is not correct either. Therefore, I next looked at the Salary column on its own.

SalaryofStack4 <- SalaryofStack3 %>% mutate(Finalsalary = ifelse((Year==2017), (Salary), ConvertedComp))

SalaryofStack4 <- select(SalaryofStack4, c(-CompFreq, -SalaryType, -Currency2017, -CompTotal, -Currency, -Salary , -Freqnew ))

SalaryofStack4 <- select(SalaryofStack4, c(-ConvertedComp))

str(SalaryofStack4)

## 'data.frame':    387030 obs. of  5 variables:
##  $ Year              : int  2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
##  $ LanguageWorkedWith: chr  "Swift" "JavaScript; Python; Ruby; SQL" "Java; PHP; Python" "Matlab; Python; R; SQL" ...
##  $ Country           : chr  "United States" "United Kingdom" "United Kingdom" "United States" ...
##  $ MainBranch        : chr  "Student" "Student" "Professional developer" "Professional non-developer who sometimes writes code" ...
##  $ Finalsalary       : num  NA NA 113750 NA NA ...

SalaryofStack4<-SalaryofStack4 %>% filter(!is.na(Finalsalary))

SalaryofStack5<-SalaryofStack4 %>% filter(between(Finalsalary, quantile(Finalsalary, 0.04), quantile(Finalsalary, 0.96)))

hist(SalaryofStack5$Finalsalary)

SalaryofStack5$Finalsalary<-as.numeric(SalaryofStack5$Finalsalary)

SalaryofStack5plot2017<-SalaryofStack5 %>% filter(Year==2017)
SalaryofStack5plot2018<-SalaryofStack5 %>% filter(Year==2018)
SalaryofStack5plot2019<-SalaryofStack5 %>% filter(Year==2019)
plot(density(SalaryofStack5plot2017$Finalsalary),col='Blue')
lines(density(SalaryofStack5plot2018$Finalsalary),col='Red')
lines(density(SalaryofStack5plot2019$Finalsalary),col='Black')
title(sub="Year Distribution - Blue is 2017")

If I do not multiply and simply use the Salary column, the results still cannot be relied on. In both density plots, the blue curve shows the salary distribution for 2017, and it is not compatible with the other years or with the total.

We also changed the salary range, trimming at the 4th and 96th percentiles instead of the 5th and 95th, because we noticed some missing and misleading answers in our data frame.

Because of the lack of information about the 2017 data, the distribution observed for 2017 cannot be considered reliable.

Hence we eliminated the 2017 salary data.

SalaryofStack5<-SalaryofStack5%>%filter(Year!=2017)

SalaryofStack5Turkey<-SalaryofStack5%>%filter(Country=="Turkey")

SalaryofStack5notTurkey<-SalaryofStack5%>%filter(Country!="Turkey")

SalaryofStack5USA<-SalaryofStack5%>%filter(Country=="United States")

SalaryofStack5Sweeden<-SalaryofStack5%>%filter(Country=="Sweden")

Above, the data are grouped for a few countries.

Here we take a quick look at the data to understand its shape.

plot(density(SalaryofStack5Turkey$Finalsalary),col='red')
lines(density(SalaryofStack5notTurkey$Finalsalary),col='blue')

SalaryofStack5Turkey$Cntry <- 'TR'
SalaryofStack5notTurkey$Cntry <- 'Other'
SalaryofStack5USA$Cntry<-'USA'
SalaryofStack5Sweeden$Cntry<-'Sweden'


compareturkey <- rbind(SalaryofStack5Turkey, SalaryofStack5notTurkey)
options(scipen = 5)
ggplot(compareturkey, aes(Finalsalary, fill = Cntry)) + geom_density(alpha = 0.6)+
    scale_x_continuous(limits = c(10000, 200000)) + 
  labs(
    caption = "density summary"  )

Here is the resulting distribution, which shows the difference between Turkey and other countries. Salaries in Turkey are below the world average: while the middle half of salaries in Turkey lies between about 12k and 33k, for the rest of the world it lies between about 27k and 91k.

Here are comparisons with some other countries, together with summary tables:

compareturkey <- rbind(SalaryofStack5Turkey, SalaryofStack5notTurkey, SalaryofStack5USA, SalaryofStack5Sweeden)

options(scipen = 5)
ggplot(compareturkey, aes(Finalsalary, fill = Cntry)) + geom_density(alpha = 0.5)  + 
  scale_x_continuous(limits = c(10000, 250000)) + 
  labs(
    caption = "density summary"  )

sumother<-summary(SalaryofStack5notTurkey$Finalsalary)
sumtr<-summary(SalaryofStack5Turkey$Finalsalary)
sumusa<-summary(SalaryofStack5USA$Finalsalary)
sumsweeden<-summary(SalaryofStack5Sweeden$Finalsalary)

print("Other")

## [1] "Other"

sumother

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4020   27923   55776   68263   91788  400000

print("TR")

## [1] "TR"

sumtr

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4380   12816   20688   28414   33114  381468

print("USA")

## [1] "USA"

sumusa

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4050   79000  107500  116885  143000  400000

print("Sweden")

## [1] "Sweden"

sumsweeden

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4367   46893   56340   60512   67501  331400
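The per-country summaries above can also be produced in one pass instead of one data frame per country. This is a minimal base R sketch on made-up salaries, not the values from the survey:

```r
# Made-up salaries for three countries.
salaries <- data.frame(
  Country = c("Turkey", "Turkey", "Sweden", "Sweden",
              "United States", "United States"),
  Finalsalary = c(12000, 30000, 50000, 60000, 90000, 130000),
  stringsAsFactors = FALSE
)

# One median per country, returned as a data frame.
by_country <- aggregate(Finalsalary ~ Country, data = salaries, FUN = median)
print(by_country)

# Full summary() per country, returned as a named list.
summaries <- lapply(split(salaries$Finalsalary, salaries$Country), summary)
print(summaries[["Turkey"]])
```

Grouping on the Country column keeps every country's rows in a single table, so adding another country to the comparison needs no new filter or variable.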

Next, we compare programming languages and salaries, which requires reshaping our data.

Because one programmer usually knows more than one language, I split each respondent into one row per language. It is assumed that the languages a respondent knows are the main driver of salary.

It is always good to look at a histogram of the data to see its distribution.

library(splitstackshape)

SalaryofStack5<-SalaryofStack5 %>% filter(!is.na(LanguageWorkedWith))

SalaryofStack5language <- trim(cSplit(SalaryofStack5, "LanguageWorkedWith", sep = ";", direction = "long"))

SalaryofStack5language<- SalaryofStack5language%>% mutate(LanguageWorkedWith=ifelse(LanguageWorkedWith=='Bash/Shell', 'Bash.Shell.PowerShell',LanguageWorkedWith))%>% mutate(LanguageWorkedWith=ifelse(LanguageWorkedWith=='Bash/Shell/PowerShell', 'Bash.Shell.PowerShell',LanguageWorkedWith))

SalaryofStack5languageTurkey<- SalaryofStack5language%>% filter(Country=="Turkey")
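The one-row-per-language reshaping can be sketched in base R as follows. This is a stand-in for the cSplit() call above, written with strsplit() on made-up respondents, so that the effect on the salary column is easy to see:

```r
# Two made-up respondents, each with a semicolon-separated language list.
respondents <- data.frame(
  Finalsalary = c(50000, 70000),
  LanguageWorkedWith = c("JavaScript; Python; SQL", "R;Python"),
  stringsAsFactors = FALSE
)

# Split every language list into its individual entries.
parts <- strsplit(respondents$LanguageWorkedWith, ";", fixed = TRUE)

# Repeat each salary once per language and trim stray whitespace.
long <- data.frame(
  Finalsalary = rep(respondents$Finalsalary, lengths(parts)),
  LanguageWorkedWith = trimws(unlist(parts)),
  stringsAsFactors = FALSE
)
print(long)
```

Each salary is repeated once for every language the respondent lists, so a respondent with many languages contributes to many per-language averages.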

Because of the rapid rise in the USD-TL exchange rate in 2018, there is a sharp decrease in salaries in Turkey for 2019. In 2021, salaries are catching up with the earlier level. Unfortunately, one of the largest drops in the value of the TL occurred in late 2021, so another decrease may be observed in next year's data.

In line with the MEF Master’s program, I focused on the Bash, R, and Python languages, and according to our analysis Bash.Shell.PowerShell is the winner of the comparison. The small number of R users causes salaries to fluctuate, so it is hard to predict the earning power of R in Turkey. Python seems to be preferred more in Turkey.

aggregatedlanguageturkey<-aggregate(SalaryofStack5languageTurkey[, Finalsalary], list(SalaryofStack5languageTurkey$LanguageWorkedWith), mean)

aggregatedlanguagecountturkey<-SalaryofStack5languageTurkey %>% count(LanguageWorkedWith, sort=TRUE)

joinedlanguagecountandsalaryturkey<-left_join(aggregatedlanguagecountturkey,aggregatedlanguageturkey, by=c("LanguageWorkedWith"="Group.1"))

aggregatedlanguageyearturkey<-aggregate(SalaryofStack5languageTurkey[, c('Finalsalary')], list(SalaryofStack5languageTurkey$LanguageWorkedWith,SalaryofStack5languageTurkey$Year), mean)

aggregatedlanguageyearfilterturkey<- filter(aggregatedlanguageyearturkey, Group.1 %in% c('SQL','C#', 'R', 'C++', 'Java', 'Python', 'Bash.Shell.PowerShell'))

library(ggplot2)
plotsmoothbylanguagepopulartr = ggplot(data=aggregatedlanguageyearfilterturkey, aes(x = Group.2))+
  geom_smooth(aes(y = Finalsalary, color=as.character(Group.1)))+
  scale_y_continuous(limits = c(10000, 90000))
plotsmoothbylanguagepopulartr
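The count and mean-salary tables built above can be sketched on a small made-up long table. This version uses only base R (aggregate(), table(), and merge()) rather than the dplyr calls in the study, and the column name Popularity is borrowed from the summary tables shown later:

```r
# Made-up long table: one row per respondent and language.
long <- data.frame(
  LanguageWorkedWith = c("Python", "Python", "R", "SQL", "SQL", "SQL"),
  Finalsalary = c(60000, 80000, 72000, 50000, 70000, 60000),
  stringsAsFactors = FALSE
)

# Mean salary per language.
mean_by_lang <- aggregate(Finalsalary ~ LanguageWorkedWith,
                          data = long, FUN = mean)

# Number of rows per language.
count_by_lang <- as.data.frame(
  table(LanguageWorkedWith = long$LanguageWorkedWith),
  responseName = "Popularity",
  stringsAsFactors = FALSE
)

# Join the two tables and sort by popularity, most common first.
summary_tbl <- merge(count_by_lang, mean_by_lang, by = "LanguageWorkedWith")
summary_tbl <- summary_tbl[order(-summary_tbl$Popularity), ]
print(summary_tbl)
```

Keeping the count next to the mean makes it easy to see when an average rests on only a handful of respondents.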

I also analyzed the data for the whole world, because the larger amount of data makes the results more dependable.

SalaryofStack5<-SalaryofStack5 %>% filter(!is.na(LanguageWorkedWith))

aggregatedcountry<-aggregate(SalaryofStack5language[, Finalsalary], list(SalaryofStack5language$Country), mean)

aggregatedlanguage<-aggregate(SalaryofStack5language[, Finalsalary], list(SalaryofStack5language$LanguageWorkedWith), mean)
aggregatedlanguagecount<-SalaryofStack5language %>% count(LanguageWorkedWith, sort=TRUE)

CNTR<-(count(SalaryofStack5language, Country)%>%filter(n>300))
SalaryofStack5languagecntr<- SalaryofStack5language %>% filter(Country %in% CNTR$Country)

aggregatedcountrYfianal<-aggregate(SalaryofStack5languagecntr[, Finalsalary], list(SalaryofStack5languagecntr$Country), mean)

aggregatedlanguageyear<-aggregate(SalaryofStack5language[, c('Finalsalary')], list(SalaryofStack5language$LanguageWorkedWith,SalaryofStack5language$Year), mean)

joinedlanguagecountandsalary<-left_join(aggregatedlanguagecount,aggregatedlanguage, by=c("LanguageWorkedWith"="Group.1"))


# These are the most popular languages, so we chose them for the smoothed graph.

aggregatedlanguageyearfilter<- filter(aggregatedlanguageyear, Group.1 %in% c('SQL','C#', 'R', 'C++', 'Java', 'Rust', 'Python', 'Ruby', 'Go'))

library(ggplot2)
plotsmoothbylanguage = ggplot(data=aggregatedlanguageyearfilter, aes(x = Group.2))+
  geom_smooth(aes(y = Finalsalary, color=as.character(Group.1)))+
  scale_y_continuous(limits = c(20000, 100000))
                                  
plotsmoothbylanguage
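Keeping only the countries with enough responses is a membership test. As a hedged aside on made-up data, the sketch below shows how %in% and == differ when a column is compared against a vector of names: == compares position by position and recycles the shorter vector, while %in% checks each value against the whole set.

```r
# Made-up country column and the set of countries to keep.
country <- c("Turkey", "Sweden", "Turkey", "Germany", "Sweden", "Turkey")
keep <- c("Sweden", "Turkey")

# Position-by-position comparison, with `keep` recycled along `country`.
by_equality <- country == keep

# Membership test: is each country anywhere in `keep`?
by_membership <- country %in% keep

print(sum(by_equality))
print(sum(by_membership))
```

With ==, a row is kept only when its country happens to line up with the recycled position, so most matching rows are lost.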

Lastly, here is the code that shows the R, Python, and Bash trends in recent years, together with a comparison between Turkey and the world average.

RvsPythonvsBash<-filter(aggregatedlanguageyear, Group.1 %in% c("R", "Python", "Bash.Shell.PowerShell"))

library(ggplot2)
RvsPythonvsBashpl = ggplot(data=RvsPythonvsBash, aes(x = Group.2))+
  geom_smooth(aes(y = Finalsalary, color=as.character(Group.1)))+
  scale_y_continuous(limits = c(10000, 90000))
                               
RvsPythonvsBashpl

RvsPythonvsBashtr<- filter(aggregatedlanguageyearfilterturkey, Group.1 %in% c("R", "Python", "Bash.Shell.PowerShell"))

library(ggplot2)
RvsPythonvsBashtrpl = ggplot(data=RvsPythonvsBashtr, aes(x = Group.2))+
  geom_smooth(aes(y = Finalsalary, color=as.character(Group.1)))+
  scale_y_continuous(limits = c(10000, 90000))
                               
RvsPythonvsBashtrpl

This is the summary table of language popularity and mean salary in the world.

colnames(joinedlanguagecountandsalary)[2]<-"Popularity"
colnames(joinedlanguagecountandsalary)[3]<-"Salarymean"

joinedlanguagecountandsalary

##        LanguageWorkedWith Popularity Salarymean
##  1:            JavaScript     117802   66826.45
##  2:                   SQL      94637   67638.56
##  3:              HTML/CSS      75284   65709.81
##  4:                Python      68291   72674.63
##  5: Bash.Shell.PowerShell      63688   77876.20
##  6:                  Java      62619   65812.54
##  7:                    C#      55054   67776.55
##  8:            TypeScript      45970   70445.32
##  9:                   PHP      41291   54252.80
## 10:                   C++      32565   68485.41
## 11:                  HTML      29192   64552.90
## 12:                   CSS      27946   64236.07
## 13:                     C      27176   66728.40
## 14:                    Go      15899   88125.83
## 15:                  Ruby      15775   83792.09
## 16:               Node.js      15520   69491.79
## 17:                Kotlin      11499   67868.40
## 18:                 Swift      11004   72155.14
## 19:                     R       9019   72092.89
## 20:                   VBA       8631   64009.06
## 21:           Objective-C       8330   76506.47
## 22:              Assembly       7318   68419.79
## 23:                 Scala       6873   88505.13
## 24:                  Rust       6819   88018.26
## 25:            PowerShell       5208   78610.54
## 26:             Other(s):       4504   73026.90
## 27:                  Dart       4081   53631.01
## 28:                  Perl       4043   86117.73
## 29:                Groovy       3865   84529.27
## 30:                Matlab       3531   57844.66
## 31:                VB.NET       2727   60814.22
## 32:               Clojure       2426   97417.12
## 33:               Haskell       2234   75775.44
## 34:                Elixir       1766   88175.37
## 35:                    F#       1731   89884.60
## 36:          CoffeeScript       1640   76239.11
## 37:                   Lua       1438   71582.51
## 38:        Visual Basic 6       1414   58467.32
## 39:                Erlang       1349   85019.62
## 40:                Delphi        989   60054.49
## 41:  Delphi/Object Pascal        984   58305.77
## 42:                 Julia        846   75958.85
## 43:           WebAssembly        508   85174.32
## 44:                  LISP        484   88985.20
## 45:                 Cobol        269   66765.23
## 46:               Crystal        226   86323.80
## 47:                 Ocaml        207   79481.40
## 48:                 COBOL        206   65016.13
## 49:                   APL        136   99375.16
## 50:                  Hack         94   90458.07
##        LanguageWorkedWith Popularity Salarymean

This is the summary table of language popularity and mean salary in Turkey.

colnames(joinedlanguagecountandsalaryturkey)[2]<-"Popularity"
colnames(joinedlanguagecountandsalaryturkey)[3]<-"Salarymean"
joinedlanguagecountandsalaryturkey

##        LanguageWorkedWith Popularity Salarymean
##  1:            JavaScript       1208   28905.71
##  2:                   SQL        947   29178.99
##  3:                  Java        776   28440.70
##  4:              HTML/CSS        770   27128.99
##  5:                    C#        763   28964.94
##  6:                Python        605   28569.65
##  7:                   PHP        477   27118.58
##  8:            TypeScript        404   30157.50
##  9: Bash.Shell.PowerShell        400   33211.27
## 10:                   C++        392   26902.76
## 11:                     C        354   28874.39
## 12:                  HTML        216   32096.10
## 13:                   CSS        202   32446.61
## 14:               Node.js        187   37128.46
## 15:                 Swift        171   28023.57
## 16:                    Go        147   36519.37
## 17:                Kotlin        136   28897.93
## 18:           Objective-C        119   34511.23
## 19:              Assembly        106   30348.26
## 20:                  Dart         85   25966.73
## 21:                  Ruby         80   34374.11
## 22:                     R         63   27843.52
## 23:                Matlab         61   29383.56
## 24:                   VBA         59   29819.02
## 25:                 Scala         54   28292.89
## 26:            PowerShell         38   39685.50
## 27:             Other(s):         33   24224.09
## 28:  Delphi/Object Pascal         29   37093.31
## 29:                  Rust         28   26838.75
## 30:                VB.NET         27   33991.74
## 31:                Delphi         26   30476.88
## 32:                  Perl         25   60944.24
## 33:                Groovy         21   57199.24
## 34:        Visual Basic 6         19   34337.26
## 35:          CoffeeScript         11   48831.09
## 36:                Elixir          9   27928.00
## 37:                Erlang          8   27076.25
## 38:               Haskell          7   36338.86
## 39:                  LISP          7   28532.00
## 40:                 Cobol          6   57510.00
## 41:                    F#          6   27086.00
## 42:                 Julia          6   29179.67
## 43:               Clojure          5   59613.60
## 44:                   Lua          4   91491.00
## 45:           WebAssembly          4   14649.00
## 46:                 COBOL          3   30025.00
## 47:                   APL          2   21822.00
## 48:               Crystal          2   22686.00
## 49:                  Hack          1   23844.00
##        LanguageWorkedWith Popularity Salarymean

This is the model that regresses language popularity on mean salary. We expect that the better known a language is, the lower its salary will be.

model<-lm(Popularity~ Salarymean, data=joinedlanguagecountandsalary)
model

## 
## Call:
## lm(formula = Popularity ~ Salarymean, data = joinedlanguagecountandsalary)
## 
## Coefficients:
## (Intercept)   Salarymean  
##  69660.1710      -0.6958

plot(joinedlanguagecountandsalary$Salarymean, joinedlanguagecountandsalary$Popularity,col = "green", main="The Relation Between Salary and Popularity-World")
abline (model, col="blue")
selectedw<-c(1,4,5,19)
text(joinedlanguagecountandsalary$Salarymean[selectedw], joinedlanguagecountandsalary$Popularity[selectedw], labels = joinedlanguagecountandsalary$LanguageWorkedWith[selectedw], cex = 0.6, pos = 4, col = "blue")

modelturkey<-lm(Popularity~ Salarymean, data=joinedlanguagecountandsalaryturkey)
modelturkey

## 
## Call:
## lm(formula = Popularity ~ Salarymean, data = joinedlanguagecountandsalaryturkey)
## 
## Coefficients:
## (Intercept)   Salarymean  
##  340.706673    -0.004613

plot(joinedlanguagecountandsalaryturkey$Salarymean, joinedlanguagecountandsalaryturkey$Popularity,col = "blue", main="The Relation Between Salary and Popularity-Turkey")
abline (modelturkey, col="red")
selected<-c(1,2,6,9,22)
text(joinedlanguagecountandsalaryturkey$Salarymean[selected], joinedlanguagecountandsalaryturkey$Popularity[selected], labels = joinedlanguagecountandsalaryturkey$LanguageWorkedWith[selected], cex = 0.6, pos = 4, col = "red")

Contrary to our expectation, the relationship between language popularity and salary is not strong. This suggests that anyone choosing a programming language to learn also needs to evaluate the demand for that language in the sector. A relationship exists, but it is weak.
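One way to put a number on how strong such a relationship is would be the R-squared of the fit, or the plain correlation. Neither is reported in the analysis above, so the following is only a sketch. It uses just the first ten rows of the worldwide table, so its values will differ from those of the full model:

```r
# First ten rows of the worldwide summary table shown above.
lang_summary <- data.frame(
  LanguageWorkedWith = c("JavaScript", "SQL", "HTML/CSS", "Python",
                         "Bash.Shell.PowerShell", "Java", "C#",
                         "TypeScript", "PHP", "C++"),
  Popularity = c(117802, 94637, 75284, 68291, 63688,
                 62619, 55054, 45970, 41291, 32565),
  Salarymean = c(66826.45, 67638.56, 65709.81, 72674.63, 77876.20,
                 65812.54, 67776.55, 70445.32, 54252.80, 68485.41),
  stringsAsFactors = FALSE
)

# Same model form as above: popularity explained by mean salary.
fit <- lm(Popularity ~ Salarymean, data = lang_summary)

# Share of the variance in popularity explained by the fit.
r_squared <- summary(fit)$r.squared

# Pearson correlation; its square equals R-squared for a one-predictor fit.
corr <- cor(lang_summary$Popularity, lang_summary$Salarymean)

print(c(r_squared = r_squared, correlation = corr))
```

An R-squared close to 0 would support the reading that popularity and mean salary are only weakly related, while a value close to 1 would contradict it.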

emre

12/5/2021