Let’s start by reading our large aggregated CSV file and saving it in our environment. In this part of the study, we examine the relation between salaries and programming languages.
library(gdata)
library(dplyr)
library(tidyverse)
Stackofbinded <- read.csv('C:\\Users\\esigi\\Downloads\\Stackofbinded.csv')
colnamesall<-colnames(Stackofbinded)
We need the column names throughout this work, so it is useful to keep them in the environment. A recurring question is how to review data easily in order to check its reliability. An easy way is to copy a small piece of data to the clipboard and paste it into Excel, which I did frequently while reviewing small amounts of data. Such calls are left in the code as comments; they were only used to inspect a data frame and produce no output in this study. Here is an example with the ‘clipr’ library:
#library(clipr)
#write_clip()
Because the data set is large, I started by selecting only the columns we need. After selecting them, it is always good to take a quick look at the data with the head and summary functions.
SalaryofStack<-Stackofbinded[,c('Year','Salary','LanguageWorkedWith','Country','MainBranch','CompTotal','CompFreq','SalaryType','ConvertedComp', 'Currency')]
head(SalaryofStack)
##   Year Salary            LanguageWorkedWith        Country
## 1 2017     NA                         Swift  United States
## 2 2017     NA JavaScript; Python; Ruby; SQL United Kingdom
## 3 2017 113750             Java; PHP; Python United Kingdom
## 4 2017     NA        Matlab; Python; R; SQL  United States
## 5 2017     NA                                  Switzerland
## 6 2017     NA         JavaScript; PHP; Rust    New Zealand
##                                             MainBranch CompTotal CompFreq
## 1                                              Student        NA
## 2                                              Student        NA
## 3                               Professional developer        NA
## 4 Professional non-developer who sometimes writes code        NA
## 5                               Professional developer        NA
## 6                                              Student        NA
##   SalaryType ConvertedComp                    Currency
## 1                       NA
## 2                       NA British pounds sterling (£)
## 3                       NA British pounds sterling (£)
## 4                       NA
## 5                       NA
## 6                       NA
summary(SalaryofStack)
##       Year          Salary            LanguageWorkedWith   Country
##  Min.   :2017   Min.   : 0.000e+00    Length:387030      Length:387030
##  1st Qu.:2018   1st Qu.: 1.320e+04    Class :character   Class :character
##  Median :2019   Median : 5.000e+04    Mode  :character   Mode  :character
##  Mean   :2019   Mean   :1.582e+100
##  3rd Qu.:2020   3rd Qu.: 9.950e+04
##  Max.   :2021   Max.   :1.000e+105
##                 NA's   :323812
##   MainBranch          CompTotal            CompFreq          SalaryType
##  Length:387030      Min.   : 0.000e+00    Length:387030      Length:387030
##  Class :character   1st Qu.: 1.850e+04    Class :character   Class :character
##  Mode  :character   Median : 6.500e+04    Mode  :character   Mode  :character
##                     Mean   :8.054e+241
##                     3rd Qu.: 1.280e+05
##                     Max.   :1.111e+247
##                     NA's   :249076
##  ConvertedComp        Currency
##  Min.   :       0   Length:387030
##  1st Qu.:   25356   Class :character
##  Median :   55562   Mode  :character
##  Mean   :  112456
##  3rd Qu.:   97288
##  Max.   :45241312
##  NA's   :201905
After selecting the columns, I cleaned and merged some of them: although I had corrected the data while binding the yearly files, there were still inconsistent and problematic columns in the data frame. For instance, there were two spellings for the United States, which is a very important country for this study.
I also joined a table of average 2017 exchange rates, because the 2017 data has no converted salary column. I will come back to this later.
SalaryofStack2 <- SalaryofStack %>%
mutate(Freqnew = coalesce(CompFreq,SalaryType)) %>% mutate(Freqnew = ifelse(CompFreq == '' ,SalaryType,Freqnew)) %>%
mutate(Country = ifelse(Country == 'United States of America' ,'United States',Country))
SalaryofStack2$ConvertedComp <- as.numeric(SalaryofStack2$ConvertedComp)
#write_clip(SalaryofStack2 %>% distinct(ConvertedComp, Year))
library(readxl)
Currency2017<-read_xlsx("C:\\Users\\esigi\\Documents\\Attachments\\average_currency_2017.xlsx")
library(dplyr)
SalaryofStack3<-left_join(SalaryofStack2, Currency2017, by =c("Currency"="CurrencyType"))
library(psych)
describe(SalaryofStack3)
## vars n mean sd median trimmed
## Year 1 387030 2.019080e+03 1.350000e+00 2019 2019.10
## Salary 2 63218 1.581828e+100 3.977220e+102 50000 57002.76
## LanguageWorkedWith* 3 387030 4.929239e+04 3.021072e+04 47566 50722.15
## Country* 4 387030 1.452400e+02 7.706000e+01 152 148.89
## MainBranch* 5 387030 2.910000e+00 2.410000e+00 2 2.43
## CompTotal 6 137954 8.054215e+241 Inf 65000 79908.26
## CompFreq* 7 387030 1.810000e+00 1.170000e+00 1 1.64
## SalaryType* 8 387030 1.250000e+00 7.400000e-01 1 1.04
## ConvertedComp 9 185125 1.124556e+05 3.392271e+05 55562 61842.67
## Currency* 10 387030 3.372000e+01 5.277000e+01 1 22.73
## Freqnew* 11 387030 2.070000e+00 1.230000e+00 2 1.96
## Currency2017 12 71208 1.960000e+00 8.200000e+00 1 1.05
## mad min max range skew kurtosis
## Year 1.48 2017.00 2.021000e+03 4.000000e+00 0.06 -1.20
## Salary 59487.04 0.00 1.000000e+105 1.000000e+105 NaN NaN
## LanguageWorkedWith* 45395.73 1.00 8.847800e+04 8.847700e+04 -0.27 -1.29
## Country* 114.16 1.00 2.500000e+02 2.490000e+02 -0.10 -1.47
## MainBranch* 1.48 1.00 1.100000e+01 1.000000e+01 1.65 1.53
## CompTotal 78577.80 0.00 1.111111e+247 1.111111e+247 NaN NaN
## CompFreq* 0.00 1.00 4.000000e+00 3.000000e+00 1.10 -0.44
## SalaryType* 0.00 1.00 4.000000e+00 3.000000e+00 3.06 8.17
## ConvertedComp 50903.59 0.00 4.524131e+07 4.524131e+07 30.27 2494.12
## Currency* 0.00 1.00 1.690000e+02 1.680000e+02 1.45 0.61
## Freqnew* 1.48 1.00 4.000000e+00 3.000000e+00 0.70 -1.16
## Currency2017 0.19 0.02 1.121400e+02 1.121300e+02 12.49 163.58
## se
## Year 0.000000e+00
## Salary 1.581828e+100
## LanguageWorkedWith* 4.856000e+01
## Country* 1.200000e-01
## MainBranch* 0.000000e+00
## CompTotal Inf
## CompFreq* 0.000000e+00
## SalaryType* 0.000000e+00
## ConvertedComp 7.884200e+02
## Currency* 8.000000e-02
## Freqnew* 0.000000e+00
## Currency2017 3.000000e-02
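The frequency merge and the exchange-rate join above can be illustrated on a toy data frame. This is only a sketch: the rows and the exchange rates below are hypothetical, and `na_if()` is used as one possible way to treat empty strings as missing so that `coalesce()` can skip them; it is not necessarily how the original columns should be handled.

```r
library(dplyr)

# Toy data: hypothetical values, not taken from the survey.
toy <- data.frame(
  CompFreq   = c("Yearly", "", NA, "Monthly"),
  SalaryType = c("", "Monthly", "Weekly", ""),
  Currency   = c("USD", "EUR", "GBP", "TRY"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table of average exchange rates (made-up numbers).
rates <- data.frame(
  CurrencyType = c("USD", "EUR", "GBP"),
  Currency2017 = c(1, 1.13, 1.29),
  stringsAsFactors = FALSE
)

merged <- toy %>%
  # Turn empty strings into NA so that coalesce() falls through to SalaryType.
  mutate(Freqnew = coalesce(na_if(CompFreq, ""), na_if(SalaryType, ""))) %>%
  # Rows whose currency has no match keep NA in Currency2017.
  left_join(rates, by = c("Currency" = "CurrencyType"))

merged
```

Any row that still has `NA` in `Currency2017` after the join cannot be converted, which is worth counting before trusting a converted salary column.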
First, I multiplied the 2017 salary by the average 2017 exchange rate. Unfortunately, as the density plot produced by the code below shows, this calculation does not give reliable results for 2017 when its distribution is compared with the other years.
SalaryofStack4 <- SalaryofStack3 %>% mutate(Finalsalary = ifelse((Year==2017), (Salary)*(Currency2017), ConvertedComp))
SalaryofStack4 <- select(SalaryofStack4, c(-CompFreq, -SalaryType, -Currency2017, -CompTotal, -Currency, -Salary , -Freqnew ))
SalaryofStack4 <- select(SalaryofStack4, c(-ConvertedComp))
str(SalaryofStack4)
## 'data.frame': 387030 obs. of 5 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ LanguageWorkedWith: chr "Swift" "JavaScript; Python; Ruby; SQL" "Java; PHP; Python" "Matlab; Python; R; SQL" ...
## $ Country : chr "United States" "United Kingdom" "United Kingdom" "United States" ...
## $ MainBranch : chr "Student" "Student" "Professional developer" "Professional non-developer who sometimes writes code" ...
## $ Finalsalary : num NA NA 146624 NA NA ...
SalaryofStack4<-SalaryofStack4 %>% filter(!is.na(Finalsalary))
SalaryofStack5<-SalaryofStack4 %>% filter(between(Finalsalary, quantile(Finalsalary, 0.05), quantile(Finalsalary, 0.95)))
hist(SalaryofStack5$Finalsalary)
SalaryofStack5$Finalsalary<-as.numeric(SalaryofStack5$Finalsalary)
SalaryofStack5plot2017<-SalaryofStack5 %>% filter(Year==2017)
SalaryofStack5plot2018<-SalaryofStack5 %>% filter(Year==2018)
SalaryofStack5plot2019<-SalaryofStack5 %>% filter(Year==2019)
plot(density(SalaryofStack5plot2017$Finalsalary),col='Blue')
lines(density(SalaryofStack5plot2018$Finalsalary),col='Red')
lines(density(SalaryofStack5plot2019$Finalsalary),col='Black')
lines(density(SalaryofStack5$Finalsalary),col='Orange')
title(sub="Year Distribution - Blue is 2017")
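The percentile filter used above is easier to reason about on a small simulated example. The sketch below assumes log-normally distributed toy salaries plus a few absurd entries (similar in spirit to the 1e+105 values in the summary); the distribution parameters are arbitrary assumptions, not estimates from the survey.

```r
library(dplyr)

set.seed(1)

# Simulated salaries plus a few implausible answers (all values hypothetical).
toy <- data.frame(
  Finalsalary = c(rlnorm(1000, meanlog = 11, sdlog = 0.6), 1e105, 1e50, 0, 0)
)

# Compute the cut-offs once, then keep only the rows between them.
bounds <- quantile(toy$Finalsalary, probs = c(0.05, 0.95), na.rm = TRUE)

trimmed <- toy %>%
  filter(between(Finalsalary, bounds[[1]], bounds[[2]]))

# Compare robust and non-robust summaries before and after trimming.
c(median_before = median(toy$Finalsalary), median_after = median(trimmed$Finalsalary))
c(mean_before = mean(toy$Finalsalary), mean_after = mean(trimmed$Finalsalary))
```

Note that the cut-offs here, as in the code above, are computed over all years pooled together; computing them per year would be a different design choice.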
Unfortunately, the distribution again suggests that this method is not correct. Therefore, I next used the raw Salary column for 2017 without any conversion.
SalaryofStack4 <- SalaryofStack3 %>% mutate(Finalsalary = ifelse((Year==2017), (Salary), ConvertedComp))
SalaryofStack4 <- select(SalaryofStack4, c(-CompFreq, -SalaryType, -Currency2017, -CompTotal, -Currency, -Salary , -Freqnew ))
SalaryofStack4 <- select(SalaryofStack4, c(-ConvertedComp))
str(SalaryofStack4)
## 'data.frame': 387030 obs. of 5 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ LanguageWorkedWith: chr "Swift" "JavaScript; Python; Ruby; SQL" "Java; PHP; Python" "Matlab; Python; R; SQL" ...
## $ Country : chr "United States" "United Kingdom" "United Kingdom" "United States" ...
## $ MainBranch : chr "Student" "Student" "Professional developer" "Professional non-developer who sometimes writes code" ...
## $ Finalsalary : num NA NA 113750 NA NA ...
SalaryofStack4<-SalaryofStack4 %>% filter(!is.na(Finalsalary))
SalaryofStack5<-SalaryofStack4 %>% filter(between(Finalsalary, quantile(Finalsalary, 0.04), quantile(Finalsalary, 0.96)))
hist(SalaryofStack5$Finalsalary)
SalaryofStack5$Finalsalary<-as.numeric(SalaryofStack5$Finalsalary)
SalaryofStack5plot2017<-SalaryofStack5 %>% filter(Year==2017)
SalaryofStack5plot2018<-SalaryofStack5 %>% filter(Year==2018)
SalaryofStack5plot2019<-SalaryofStack5 %>% filter(Year==2019)
plot(density(SalaryofStack5plot2017$Finalsalary),col='Blue')
lines(density(SalaryofStack5plot2018$Finalsalary),col='Red')
lines(density(SalaryofStack5plot2019$Finalsalary),col='Black')
title(sub="Year Distribution - Blue is 2017")
If I do not multiply and just use the Salary column, the results are still not dependable. In both density plots above, the blue curve is the salary distribution for 2017, and in neither case is it compatible with the other years or with the total.
We also changed the salary range that we keep (from the 5th-95th to the 4th-96th percentile), because we noticed missing and misleading answers in the data frame.
Because of this lack of information, the 2017 distribution cannot be considered reliable.
Hence we eliminated the 2017 salary data.
SalaryofStack5<-SalaryofStack5%>%filter(Year!=2017)
SalaryofStack5Turkey<-SalaryofStack5%>%filter(Country=="Turkey")
SalaryofStack5notTurkey<-SalaryofStack5%>%filter(Country!="Turkey")
SalaryofStack5USA<-SalaryofStack5%>%filter(Country=="United States")
SalaryofStack5Sweeden<-SalaryofStack5%>%filter(Country=="Sweden")
The code above also builds subsets for a few countries.
Next, we take a quick look at the data to understand its shape.
plot(density(SalaryofStack5Turkey$Finalsalary),col='red')
lines(density(SalaryofStack5notTurkey$Finalsalary),col='blue')
SalaryofStack5Turkey$Cntry <- 'TR'
SalaryofStack5notTurkey$Cntry <- 'Other'
SalaryofStack5USA$Cntry<-'USA'
SalaryofStack5Sweeden$Cntry<-'Sweden'
compareturkey <- rbind(SalaryofStack5Turkey, SalaryofStack5notTurkey)
options(scipen = 5)
ggplot(compareturkey, aes(Finalsalary, fill = Cntry)) + geom_density(alpha = 0.6)+
scale_x_continuous(limits = c(10000, 200000)) +
labs(
caption = "density summary" )
This distribution shows the difference between Turkey and the other countries. Salaries in Turkey are below the world average: the middle half of Turkish salaries lies roughly between 13k and 33k, while for the other countries it lies between about 28k and 92k (see the summary tables below).
Here is a comparison with some other countries, together with the summary tables:
compareturkey <- rbind(SalaryofStack5Turkey, SalaryofStack5notTurkey, SalaryofStack5USA, SalaryofStack5Sweeden)
options(scipen = 5)
ggplot(compareturkey, aes(Finalsalary, fill = Cntry)) + geom_density(alpha = 0.5) +
scale_x_continuous(limits = c(10000, 250000)) +
labs(
caption = "density summary" )
sumother<-summary(SalaryofStack5notTurkey$Finalsalary)
sumtr<-summary(SalaryofStack5Turkey$Finalsalary)
sumusa<-summary(SalaryofStack5USA$Finalsalary)
sumsweeden<-summary(SalaryofStack5Sweeden$Finalsalary)
print("Other")
## [1] "Other"
sumother
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4020 27923 55776 68263 91788 400000
print("TR")
## [1] "TR"
sumtr
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4380 12816 20688 28414 33114 381468
print("USA")
## [1] "USA"
sumusa
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4050 79000 107500 116885 143000 400000
print("Sweden")
## [1] "Sweden"
sumsweeden
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4367 46893 56340 60512 67501 331400
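The per-country summaries above could also be produced in a single pass with `group_by()` and `summarise()` instead of one data frame per country. The following is a minimal sketch on hypothetical toy salaries; in the real analysis the input would be `SalaryofStack5`.

```r
library(dplyr)

# Toy data with hypothetical salaries.
toy <- data.frame(
  Country = c("Turkey", "Turkey", "United States", "United States",
              "Sweden", "Sweden", "Germany"),
  Finalsalary = c(15000, 25000, 90000, 120000, 50000, 60000, 70000)
)

country_summary <- toy %>%
  filter(Country %in% c("Turkey", "United States", "Sweden")) %>%
  group_by(Country) %>%
  summarise(
    n      = n(),
    q1     = quantile(Finalsalary, 0.25, names = FALSE),
    median = median(Finalsalary),
    mean   = mean(Finalsalary),
    q3     = quantile(Finalsalary, 0.75, names = FALSE),
    .groups = "drop"
  )

country_summary
```

Keeping the group size `n` next to each summary makes it easier to see how much data supports each country's numbers.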
Next, we compare programming languages and salaries, which requires reshaping the data.
Because a programmer usually knows more than one language, I split each respondent into one row per language. The assumption is that the languages a respondent knows are the main driver of salary.
It is always good to look at a histogram of the data to see the distribution.
library(splitstackshape)
SalaryofStack5<-SalaryofStack5 %>% filter(!is.na(LanguageWorkedWith))
SalaryofStack5language <- trim(cSplit(SalaryofStack5, "LanguageWorkedWith", sep = ";", direction = "long"))
SalaryofStack5language<- SalaryofStack5language%>% mutate(LanguageWorkedWith=ifelse(LanguageWorkedWith=='Bash/Shell', 'Bash.Shell.PowerShell',LanguageWorkedWith))%>% mutate(LanguageWorkedWith=ifelse(LanguageWorkedWith=='Bash/Shell/PowerShell', 'Bash.Shell.PowerShell',LanguageWorkedWith))
SalaryofStack5languageTurkey<- SalaryofStack5language%>% filter(Country=="Turkey")
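As an illustration of the reshaping step, here is a small sketch that uses `tidyr::separate_rows()` as an alternative to `cSplit()`. The toy rows and the `Respondent` column are hypothetical; the relabelling of the shell entries mirrors the one done above.

```r
library(dplyr)
library(tidyr)

# Toy respondents (hypothetical values).
toy <- data.frame(
  Respondent = 1:3,
  LanguageWorkedWith = c("JavaScript; Python; SQL", "Bash/Shell", "R;Python"),
  Finalsalary = c(60000, 80000, 70000),
  stringsAsFactors = FALSE
)

long <- toy %>%
  # One row per language; the regex also swallows spaces around ";".
  separate_rows(LanguageWorkedWith, sep = "\\s*;\\s*") %>%
  mutate(
    LanguageWorkedWith = trimws(LanguageWorkedWith),
    # Harmonise the shell labels, which differ between survey years.
    LanguageWorkedWith = if_else(
      LanguageWorkedWith %in% c("Bash/Shell", "Bash/Shell/PowerShell"),
      "Bash.Shell.PowerShell",
      LanguageWorkedWith
    )
  )

long
```

After this step a respondent's salary appears once per language they listed, so respondents who know many languages contribute to many language averages.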
Because of the fast increase in the USD-TL exchange rate in 2018, there is a sharp decrease in salaries in Turkey for 2019. In 2021 salaries are catching up with the earlier level. Unfortunately, one of the largest drops in the value of the TL occurred in late 2021, so another decrease may be observed in next year's data.
In line with the MEF Master’s program, I focused on Bash, R, and Python. According to our results, Bash.Shell.PowerShell is the winner of the comparison. Because there are few R users, R salaries fluctuate, so it is hard to judge the salary power of R in Turkey. Python seems to be more widely preferred in Turkey.
aggregatedlanguageturkey<-aggregate(SalaryofStack5languageTurkey[, Finalsalary], list(SalaryofStack5languageTurkey$LanguageWorkedWith), mean)
aggregatedlanguagecountturkey<-SalaryofStack5languageTurkey %>% count(LanguageWorkedWith, sort=TRUE)
joinedlanguagecountandsalaryturkey<-left_join(aggregatedlanguagecountturkey,aggregatedlanguageturkey, by=c("LanguageWorkedWith"="Group.1"))
aggregatedlanguageyearturkey<-aggregate(SalaryofStack5languageTurkey[, c('Finalsalary')], list(SalaryofStack5languageTurkey$LanguageWorkedWith,SalaryofStack5languageTurkey$Year), mean)
aggregatedlanguageyearfilterturkey<- filter(aggregatedlanguageyearturkey, Group.1 %in% c('SQL','C#', 'R', 'C++', 'Java', 'Python', 'Bash.Shell.PowerShell'))
library(ggplot2)
plotsmoothbylanguagepopulartr = ggplot(data=aggregatedlanguageyearfilterturkey, aes(x = Group.2))+
geom_smooth(aes(y = Finalsalary, color=as.character(Group.1)))+
scale_y_continuous(limits = c(10000, 90000))
plotsmoothbylanguagepopulartr
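Since the aggregated table holds only one mean per language and year, a smoother is fitted through just a handful of points per language. One possible alternative, sketched below on hypothetical numbers, is to compute the means together with the group sizes and draw plain lines; the column names `mean_salary` and `n` are my own choices for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy salaries for two languages over four years (hypothetical values).
toy <- data.frame(
  Year = rep(c(2018, 2019, 2020, 2021), each = 4),
  LanguageWorkedWith = rep(c("R", "R", "Python", "Python"), times = 4),
  Finalsalary = c(30000, 26000, 31000, 29000,
                  22000, 24000, 25000, 27000,
                  23000, 27000, 26000, 30000,
                  28000, 30000, 29000, 33000)
)

by_year <- toy %>%
  group_by(LanguageWorkedWith, Year) %>%
  summarise(mean_salary = mean(Finalsalary), n = n(), .groups = "drop")

# One line per language; point size shows how many rows support each mean.
p <- ggplot(by_year, aes(x = Year, y = mean_salary, color = LanguageWorkedWith)) +
  geom_line() +
  geom_point(aes(size = n))

p
```

Showing the group size is especially useful for languages with few users, where a single unusual salary can move the yearly mean considerably.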
I also analyzed the data for the whole world, because the larger sample makes the results more dependable.
SalaryofStack5<-SalaryofStack5 %>% filter(!is.na(LanguageWorkedWith))
aggregatedcountry<-aggregate(SalaryofStack5language[, Finalsalary], list(SalaryofStack5language$Country), mean)
aggregatedlanguage<-aggregate(SalaryofStack5language[, Finalsalary], list(SalaryofStack5language$LanguageWorkedWith), mean)
aggregatedlanguagecount<-SalaryofStack5language %>% count(LanguageWorkedWith, sort=TRUE)
CNTR<-(count(SalaryofStack5language, Country)%>%filter(n>300))
SalaryofStack5languagecntr<- SalaryofStack5language %>% filter(Country %in% CNTR$Country)
aggregatedcountrYfianal<-aggregate(SalaryofStack5languagecntr[, Finalsalary], list(SalaryofStack5languagecntr$Country), mean)
aggregatedlanguageyear<-aggregate(SalaryofStack5language[, c('Finalsalary')], list(SalaryofStack5language$LanguageWorkedWith,SalaryofStack5language$Year), mean)
joinedlanguagecountandsalary<-left_join(aggregatedlanguagecount,aggregatedlanguage, by=c("LanguageWorkedWith"="Group.1"))
#These are the most popular languages; we selected them for the smoothed graph.
aggregatedlanguageyearfilter<- filter(aggregatedlanguageyear, Group.1 %in% c('SQL','C#', 'R', 'C++', 'Java', 'Rust', 'Python', 'Ruby', 'Go'))
library(ggplot2)
plotsmoothbylanguage = ggplot(data=aggregatedlanguageyearfilter, aes(x = Group.2))+
geom_smooth(aes(y = Finalsalary, color=as.character(Group.1)))+
scale_y_continuous(limits = c(20000, 100000))
plotsmoothbylanguage
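The country filter in the aggregation above keeps only countries with more than 300 rows. When filtering against a vector of allowed values like this, `%in%` tests membership for each row, whereas `==` compares element by element and recycles the shorter vector. A minimal sketch on toy data, with a threshold of 2 instead of 300:

```r
library(dplyr)

# Toy data (hypothetical values).
toy <- data.frame(
  Country = c("Turkey", "Sweden", "Turkey", "Norway", "Sweden", "Turkey"),
  Finalsalary = c(20000, 55000, 25000, 70000, 60000, 30000)
)

# Countries with at least 2 rows (the analysis above uses more than 300).
keep <- toy %>%
  count(Country) %>%
  filter(n >= 2)

# Membership test: every row whose country is in the keep list survives.
kept <- toy %>%
  filter(Country %in% keep$Country)

kept
```

A `semi_join(toy, keep, by = "Country")` would give the same rows and avoids extracting the column by hand.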
Lastly, here is the code that shows the R, Python, and Bash trends in recent years, comparing the world average with Turkey.
RvsPythonvsBash<-filter(aggregatedlanguageyear, Group.1 %in% c("R", "Python", "Bash.Shell.PowerShell"))
library(ggplot2)
RvsPythonvsBashpl = ggplot(data=RvsPythonvsBash, aes(x = Group.2))+
geom_smooth(aes(y = Finalsalary, color=as.character(Group.1)))+
scale_y_continuous(limits = c(10000, 90000))
RvsPythonvsBashpl
RvsPythonvsBashtr<- filter(aggregatedlanguageyearfilterturkey, Group.1 %in% c("R", "Python", "Bash.Shell.PowerShell"))
library(ggplot2)
RvsPythonvsBashtrpl = ggplot(data=RvsPythonvsBashtr, aes(x = Group.2))+
geom_smooth(aes(y = Finalsalary, color=as.character(Group.1)))+
scale_y_continuous(limits = c(10000, 90000))
RvsPythonvsBashtrpl
This is the summary table of language popularity and mean salary for the world.
colnames(joinedlanguagecountandsalary)[2]<-"Popularity"
colnames(joinedlanguagecountandsalary)[3]<-"Salarymean"
joinedlanguagecountandsalary
## LanguageWorkedWith Popularity Salarymean
## 1: JavaScript 117802 66826.45
## 2: SQL 94637 67638.56
## 3: HTML/CSS 75284 65709.81
## 4: Python 68291 72674.63
## 5: Bash.Shell.PowerShell 63688 77876.20
## 6: Java 62619 65812.54
## 7: C# 55054 67776.55
## 8: TypeScript 45970 70445.32
## 9: PHP 41291 54252.80
## 10: C++ 32565 68485.41
## 11: HTML 29192 64552.90
## 12: CSS 27946 64236.07
## 13: C 27176 66728.40
## 14: Go 15899 88125.83
## 15: Ruby 15775 83792.09
## 16: Node.js 15520 69491.79
## 17: Kotlin 11499 67868.40
## 18: Swift 11004 72155.14
## 19: R 9019 72092.89
## 20: VBA 8631 64009.06
## 21: Objective-C 8330 76506.47
## 22: Assembly 7318 68419.79
## 23: Scala 6873 88505.13
## 24: Rust 6819 88018.26
## 25: PowerShell 5208 78610.54
## 26: Other(s): 4504 73026.90
## 27: Dart 4081 53631.01
## 28: Perl 4043 86117.73
## 29: Groovy 3865 84529.27
## 30: Matlab 3531 57844.66
## 31: VB.NET 2727 60814.22
## 32: Clojure 2426 97417.12
## 33: Haskell 2234 75775.44
## 34: Elixir 1766 88175.37
## 35: F# 1731 89884.60
## 36: CoffeeScript 1640 76239.11
## 37: Lua 1438 71582.51
## 38: Visual Basic 6 1414 58467.32
## 39: Erlang 1349 85019.62
## 40: Delphi 989 60054.49
## 41: Delphi/Object Pascal 984 58305.77
## 42: Julia 846 75958.85
## 43: WebAssembly 508 85174.32
## 44: LISP 484 88985.20
## 45: Cobol 269 66765.23
## 46: Crystal 226 86323.80
## 47: Ocaml 207 79481.40
## 48: COBOL 206 65016.13
## 49: APL 136 99375.16
## 50: Hack 94 90458.07
## LanguageWorkedWith Popularity Salarymean
This is the summary table of language popularity and mean salary for Turkey.
colnames(joinedlanguagecountandsalaryturkey)[2]<-"Popularity"
colnames(joinedlanguagecountandsalaryturkey)[3]<-"Salarymean"
joinedlanguagecountandsalaryturkey
## LanguageWorkedWith Popularity Salarymean
## 1: JavaScript 1208 28905.71
## 2: SQL 947 29178.99
## 3: Java 776 28440.70
## 4: HTML/CSS 770 27128.99
## 5: C# 763 28964.94
## 6: Python 605 28569.65
## 7: PHP 477 27118.58
## 8: TypeScript 404 30157.50
## 9: Bash.Shell.PowerShell 400 33211.27
## 10: C++ 392 26902.76
## 11: C 354 28874.39
## 12: HTML 216 32096.10
## 13: CSS 202 32446.61
## 14: Node.js 187 37128.46
## 15: Swift 171 28023.57
## 16: Go 147 36519.37
## 17: Kotlin 136 28897.93
## 18: Objective-C 119 34511.23
## 19: Assembly 106 30348.26
## 20: Dart 85 25966.73
## 21: Ruby 80 34374.11
## 22: R 63 27843.52
## 23: Matlab 61 29383.56
## 24: VBA 59 29819.02
## 25: Scala 54 28292.89
## 26: PowerShell 38 39685.50
## 27: Other(s): 33 24224.09
## 28: Delphi/Object Pascal 29 37093.31
## 29: Rust 28 26838.75
## 30: VB.NET 27 33991.74
## 31: Delphi 26 30476.88
## 32: Perl 25 60944.24
## 33: Groovy 21 57199.24
## 34: Visual Basic 6 19 34337.26
## 35: CoffeeScript 11 48831.09
## 36: Elixir 9 27928.00
## 37: Erlang 8 27076.25
## 38: Haskell 7 36338.86
## 39: LISP 7 28532.00
## 40: Cobol 6 57510.00
## 41: F# 6 27086.00
## 42: Julia 6 29179.67
## 43: Clojure 5 59613.60
## 44: Lua 4 91491.00
## 45: WebAssembly 4 14649.00
## 46: COBOL 3 30025.00
## 47: APL 2 21822.00
## 48: Crystal 2 22686.00
## 49: Hack 1 23844.00
## LanguageWorkedWith Popularity Salarymean
This model regresses language popularity on mean salary. We expect that the better known a language is, the lower its salary.
model<-lm(Popularity~ Salarymean, data=joinedlanguagecountandsalary)
model
##
## Call:
## lm(formula = Popularity ~ Salarymean, data = joinedlanguagecountandsalary)
##
## Coefficients:
## (Intercept) Salarymean
## 69660.1710 -0.6958
plot(joinedlanguagecountandsalary$Salarymean, joinedlanguagecountandsalary$Popularity,col = "green", main="The Relation Between Salary and Popularity-World")
abline (model, col="blue")
selectedw<-c(1,4,5,19)
text(joinedlanguagecountandsalary$Salarymean[selectedw], joinedlanguagecountandsalary$Popularity[selectedw], labels = joinedlanguagecountandsalary$LanguageWorkedWith[selectedw], cex = 0.6, pos = 4, col = "blue")
modelturkey<-lm(Popularity~ Salarymean, data=joinedlanguagecountandsalaryturkey)
modelturkey
##
## Call:
## lm(formula = Popularity ~ Salarymean, data = joinedlanguagecountandsalaryturkey)
##
## Coefficients:
## (Intercept) Salarymean
## 340.706673 -0.004613
plot(joinedlanguagecountandsalaryturkey$Salarymean, joinedlanguagecountandsalaryturkey$Popularity,col = "blue", main="The Relation Between Salary and Popularity-Turkey")
abline (modelturkey, col="red")
selected<-c(1,2,6,9,22)
text(joinedlanguagecountandsalaryturkey$Salarymean[selected], joinedlanguagecountandsalaryturkey$Popularity[selected], labels = joinedlanguagecountandsalaryturkey$LanguageWorkedWith[selected], cex = 0.6, pos = 4, col = "red")
Contrary to our expectation, the relation between language popularity and salary is not strong. A relation exists (the slope is negative in both models), but it is weak. Anyone choosing a programming language to learn should therefore also evaluate the demand for that language in the sector.
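One way to put a number on "not strong" is to look at the R-squared of the model and at a rank correlation. The sketch below uses only the first ten rows of the world summary table above, so its values will differ from those of the full model; it is meant to show the calls, not to replace the analysis.

```r
# First ten rows of the world summary table shown earlier.
langs <- data.frame(
  LanguageWorkedWith = c("JavaScript", "SQL", "HTML/CSS", "Python",
                         "Bash.Shell.PowerShell", "Java", "C#",
                         "TypeScript", "PHP", "C++"),
  Popularity = c(117802, 94637, 75284, 68291, 63688,
                 62619, 55054, 45970, 41291, 32565),
  Salarymean = c(66826.45, 67638.56, 65709.81, 72674.63, 77876.20,
                 65812.54, 67776.55, 70445.32, 54252.80, 68485.41)
)

fit <- lm(Popularity ~ Salarymean, data = langs)

# Share of the variance in popularity explained by mean salary.
r2 <- summary(fit)$r.squared

# Rank correlation, less sensitive to a few very popular languages.
rho <- cor(langs$Popularity, langs$Salarymean, method = "spearman")

c(r_squared = r2, spearman = rho)
```

An R-squared close to zero, or a coefficient that is not significant in `summary(fit)`, would support the reading that popularity and salary are only weakly related.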