Assignment 1 - RMarkdown Homework

About Me

My name is Emre Kemal Yurderi. I graduated from 9 Eylul University in 2014 with a degree in Statistics. I started my career as a human resources specialist at Allianz Insurance, where I worked in the Budgeting, Reporting and Compensation & Benefits departments for 4 years. For almost a year now, I have been working in the Organizational Development department at Ekol Logistics. I believe working in human resources has improved me a lot. I had the chance to get to know the other departments that make up a company, as well as their processes. I also believe it is time to take a step forward again, get out of my comfort zone and move my career in a different direction.

Here is my LinkedIn profile.

Outside of my professional life, I am from Izmir but I have lived in Istanbul for almost 6 years. I have been married for 1.5 years. We have a cat named Rico. You can see from these pictures what a funny and happy boy he is.

Even though I don’t have much time nowadays, I like playing guitar. If you would like to check them out, I have a couple of cover videos on YouTube (let me warn you, they are not good).

useR2018: Does normalizing your data affect outlier detection?

by Sevvandi Kandanaarachchi

An investigation of the triangular relationship between datasets, normalization methods and outlier detection methods.

When we start to analyze a dataset, there are some basic steps that we follow at the very beginning, such as calculating the minimum, maximum, mean and median, checking the distribution, and looking for missing values. Detecting outliers is one of these steps too. There are different ways to find out whether our data contains outliers. The results can differ depending on which method we use and on when we use it.

At some point, we might need to normalize our columns to get more appropriate outputs. However, some outlier detection methods can be affected by the normalization technique.

According to the research conducted by Sevvandi and her team, different normalization techniques have different effects on outlier detection methods. Sevvandi’s team investigated more than 12,000 datasets and showed that the choice of normalization technique may have a big impact on outlier detection. A brief conclusion of their presentation is: do not get stuck on only one normalization method. Always check the findings of the other methods and decide based on the structure of the dataset.
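As a minimal illustration of this point, the R sketch below scores each row of a small, made-up data set by its distance to the nearest other row, once on the raw values and once after min-max normalization. The data, the column names and the nearest-neighbour score are assumptions chosen for illustration; this is not the method or the data used in the presentation.

```r
# Toy data (hypothetical values): two features on very different scales.
toy <- data.frame(
  visits = c(1, 2, 3, 4, 10, 2.5),
  income = c(1000, 1100, 1200, 1300, 1150, 1600)
)

# Simple outlier score: Euclidean distance to the nearest other row.
nn_score <- function(df) {
  d <- as.matrix(dist(df))
  diag(d) <- Inf          # ignore the distance of a row to itself
  apply(d, 1, min)
}

# Min-max normalization: rescale every column to the range [0, 1].
min_max <- function(df) {
  as.data.frame(lapply(df, function(col) (col - min(col)) / (max(col) - min(col))))
}

# Row with the highest outlier score before and after normalization.
top_raw    <- unname(which.max(nn_score(toy)))
top_minmax <- unname(which.max(nn_score(min_max(toy))))

print(c(raw = top_raw, min_max = top_minmax))
```

On this toy data the raw scores flag row 6 (the extreme income), while the min-max scores flag row 5 (the extreme number of visits), so the scaling alone changes which point looks most unusual.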

For further information, please click on the heading.

5 Ways to Subset a Data Frame in R

by Douglas E Rice, published on 29.11.2016 on R-bloggers

This article caught my attention as a newbie Python user and an amateur data science learner, because whenever I work on a dataset, I have trouble with subsetting at some point. It is getting better project by project, but the more complex the conditions become, the more complicated the subsetting gets. In this article, the author explains 5 basic ways to subset a data frame in R.

First way: slicing. You can select the rows and columns you would like to print with square brackets. For instance: [c(4:6), 2]
Second way: slicing again, but in reverse. Instead of selecting the rows and columns we would like to print, we select the unwanted ones. When we put a “-” sign before the index vector, R drops the selected rows and columns. Let's assume we have 10 rows and 3 columns in our data. If we write [-c(1:3, 7:10), -c(1,3)], that gives us rows 4 to 6 and only the second column.
Third way: the which() function. We still use square brackets as in slicing, because which() returns the indices of the rows that satisfy the condition. The example given in the article: education[which(education$Region == 2),names(education) %in% c("State","Minor.Population","Education.Expenditures")]. It means: print the “State”, “Minor.Population” and “Education.Expenditures” columns, but only for the rows whose Region value equals 2.
Fourth way: the subset() function. It takes 3 arguments: subset(data, condition, select). Data is the data frame that we would like to subset. Condition filters the rows by a rule. Select picks the columns that will be returned. The example given in the article: subset(education, Region == 2, select = c("State", "Minor.Population", "Education.Expenditures")). As in the third way, Region == 2 defines the rule, and it is TRUE only for the rows that have the value 2 in the Region column.
Fifth way: the dplyr package. We need to use 2 functions, filter() and select(). Filter takes 2 arguments: filter(dataframe, condition). Select takes 2 arguments as well: select(dataframe, columns). If we combine them, we can subset a dataset according to any condition we wish. The example given in the article: select(filter(education, Region == 2),c(State,Minor.Population:Education.Expenditures)). First, the filter function returns a data frame that only has “2” in the “Region” column; then the select function picks 3 columns from the data frame returned by filter.
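The five approaches can be tried side by side in R. The sketch below uses a small, hypothetical stand-in for the article's education data frame (the rows and values are invented; only the column names follow the article's examples), and the dplyr part runs only if that package is installed.

```r
# Hypothetical stand-in for the article's `education` data (values are made up).
education <- data.frame(
  State = c("A", "B", "C", "D", "E", "F"),
  Region = c(1, 2, 2, 3, 2, 1),
  Minor.Population = c(300, 310, 290, 305, 320, 315),
  Education.Expenditures = c(200, 250, 230, 210, 260, 240)
)

# 1. Slicing: keep rows 2, 3, 5 and columns 1, 3, 4.
way1 <- education[c(2, 3, 5), c(1, 3, 4)]

# 2. Reverse slicing: drop the unwanted rows and the unwanted column.
way2 <- education[-c(1, 4, 6), -2]

# 3. which(): row indices where the condition holds, logical vector for columns.
way3 <- education[which(education$Region == 2),
                  names(education) %in% c("State", "Minor.Population",
                                          "Education.Expenditures")]

# 4. subset(): data, row condition, selected columns.
way4 <- subset(education, Region == 2,
               select = c("State", "Minor.Population", "Education.Expenditures"))

# 5. dplyr: filter() the rows, then select() the columns (skipped if dplyr is missing).
if (requireNamespace("dplyr", quietly = TRUE)) {
  way5 <- dplyr::select(dplyr::filter(education, Region == 2),
                        c(State, Minor.Population:Education.Expenditures))
  print(way5)
}

# The 10-row, 3-column case from the second way: dropping rows 1-3 and 7-10
# and columns 1 and 3 leaves rows 4 to 6 of the second column.
m <- data.frame(a = 1:10, b = 11:20, c = 21:30)
same_slice <- identical(m[c(4:6), 2], m[-c(1:3, 7:10), -c(1, 3)])

print(way1)
```

The four base R results contain the same three rows and three columns. The dplyr result holds the same values, but filter() renumbers the rows from 1.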

For further information, please click on the heading.

Trump Got COVID and Twitter Is on Fire

by Almog Simchon, published on 04.10.2020 on almogsi.com

When the President of the United States, Donald Trump, announced that he and his wife had been infected with COVID-19, it made headlines all around the world. The author of this article was curious about what was happening in the place where Trump really enjoys being: Twitter. He analyzed whether most of the Twitter reactions were supportive or gloating. According to a note in the article, Twitter does not provide the accounts that liked a tweet. Therefore, Mr. Simchon worked with the accounts that retweeted Trump’s announcement. He used the rtweet package to build a sample. The rtweet function returns only 100 users per call, but with the help of a loop, a sample of a couple of thousand users was collected for the research.

The next step was categorizing the ideology of the users. For this, Mr. Simchon followed Barberá’s method. In brief, there are some accounts whose ideological tendencies are already known, such as political figures and newspapers. The algorithm uses these “flagged” accounts that are followed by the users in the sample. After running the algorithm, there were 753 accounts in the sample with known ideology. The density of the estimated ideology of these 753 accounts is drawn as follows:

In the density plot, negative values on the x-axis represent liberal views and positive values represent conservative views. The conclusion of the article is that, since the density leans to the right, most of the retweets were intended as support for the president. However, it may also mean that most of the gloating happened in subtweets.
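The idea behind the ideology step can be sketched in R. The snippet below is a deliberately crude stand-in, not Barberá's actual estimator and not the code from the article: it simply assigns each user the mean of the known scores of the flagged accounts they follow. All account names, scores and follow lists are hypothetical.

```r
# Hypothetical flagged accounts with known ideology
# (negative = liberal, positive = conservative).
flagged <- c(outlet_a = -1.5, outlet_b = -0.5, politician_c = 1.0, outlet_d = 2.0)

# Hypothetical sample: the flagged accounts each retweeter follows.
follows <- list(
  user1 = c("politician_c", "outlet_d"),
  user2 = c("outlet_a", "outlet_b"),
  user3 = c("outlet_b", "politician_c", "outlet_d"),
  user4 = character(0)   # follows no flagged account, so ideology stays unknown
)

# Crude proxy (an assumption, not Barberá's model): average the known scores.
estimate_ideology <- function(followed) {
  if (length(followed) == 0) return(NA_real_)
  mean(flagged[followed])
}

ideology <- vapply(follows, estimate_ideology, numeric(1))

# Keep only the users whose ideology could be estimated.
known <- ideology[!is.na(ideology)]

# Share of users with a known ideology that falls on the conservative side.
share_right <- mean(known > 0)

print(round(ideology, 2))
print(share_right)
```

Users who follow none of the flagged accounts drop out of the estimate, which is one plausible reason why a sample of a few thousand retweeters can shrink to a much smaller set with known ideology. A density of `known` (for example with `plot(density(known))`) would give the kind of picture described above.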

For further information, please click on the heading.

Why Do I Have a Data Science Blog? 7 Benefits of Sharing Your Code

by Antoine Soetewey, published on 02.09.2020 on statsandr.com

This article is not quite about R; it is an R blogger’s thoughts on how writing a blog benefits his self-improvement.

Learn By Writing

As we already know from school, if you can teach a lesson to your friend before an exam, it means you have learned that lesson very well. It is pretty much the same for a blog. If you wish to write about a topic and pass your knowledge on to other people, you must first study it and learn it well.

Get Feedback

Since you publish your work and code, there will always be other coders who give you constructive feedback, correct your typos or improve your work.

Personal Note to Remind My Future Self

Finding a code chunk that you wrote before among all your work can be like finding a needle in a haystack. However, blogs are generally organized by topic, which makes this much easier in the future.

Contribute to The Open Source Community

I think this is the most honorable reason for writing a blog. All of us learn a new language or function mostly from other people's blogs, tutorials or forums. Sharing your own work with others for free is like paying the community back.

Stay Humble, Stay Curious

As the author mentioned in #1, writing a blog requires a lot of studying. The more you study other people’s work, the more you realize there are a lot of great scientists in the world.

Learn to Be Less Perfectionist and to Prioritize

Sometimes the time you spend making your writing grammatically more accurate could be spent on work that adds more value. This does not mean that grammar or punctuation is unimportant, but an opportunity to add great value to the work should take priority over making the writing perfect.

Build Connections and Professional Relationships

Sharing your work may open new, unexpected doors in your professional life. You can keep in touch with other people who share your interests, and you may have the chance to work with them in the future.

For further information, please click on the heading.

Assignment 1 - RMarkdown Homework

Emre Kemal Yurderi

Created 11.10.2020 / Last edited 16.10.2020
