1  Assignment 1

Published

October 18, 2022

1.1 About me

Hi, I’m Ugur Ozata. I’ve been working as a Senior Data Analyst at Peak Games for 3.5 years, and I’ve been in the data field since my third year of university. Finding insights in data that answer real problems is what motivates me to work in this field. So far I have shown what happened in the data; my future plan is to use data to show what is going to happen.

1.2 A Wonderful Talk from RStudio Conf. 2022

I chose “Websites & Books & Blogs, oh my! Creating Rich Content with Quarto” by Devin Pastoor. We’ll be using Quarto throughout course 503, which is why I chose this talk: to see a real-life example.

He had tried a bunch of tools for creating documentation or websites, such as hugodown, bookdown and blogdown, and ran into many problems with them. With Quarto, the same tool can produce both documentation and a website; a project can be initialized with a single command, and it supports almost everything.

Quarto supports Markdown, but it can also be used without writing Markdown by hand. We are all used to writing Markdown, but what about when we want to add a picture? Most of us google the syntax. In Quarto, when we drag and drop a picture into the document, it both copies the picture into the project and creates a folder for it. It is that easy.

In addition, Quarto makes it very easy to find the main configuration of the project and add something new: the whole configuration lives in _quarto.yml. When we want to add an external extension, we first install it into our project and then reference it in a way that Quarto understands.

He started by creating a blog page. He put the following front matter into a .qmd file:

---
title: "myblog"
listing:
  contents: posts
  sort: "date desc"
  type: default
  categories: true
---

In one of the blog posts, the pictures were not clickable. He thought this didn't look professional, so he installed the lightbox extension and, just as he did for the blog, added the lines below to the page's front matter. After that, every picture on the page was clickable! It was that easy.

---
filters:
    - lightbox
lightbox: auto
---

He also mentioned that there are many templates to draw inspiration from; they can be found at quarto.org/docs/gallery.

1.3 A Couple of Interesting R Posts

1.3.1 How to Read & Write CSV Files

The data separator defines the file type: for example, CSV files use a comma and TSV files use a tab. Most of the time a file will be CSV or TSV, so the examples below focus on those two.

Whichever separator the file uses, the function (read.csv) stays the same; only the sep parameter changes.

Let’s import a dataset in each format.

csv_data <- read.csv(file='mydata.csv', sep=',')   # comma-separated file
tsv_data <- read.csv(file='mydata.tsv', sep='\t')  # tab-separated file
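To see why the sep parameter matters, here is a small sketch that reads the same tab-separated text twice. It uses the text= argument (which read.csv passes through to read.table) instead of a real file, so the inline data is only an illustration.

```r
# Hypothetical tab-separated content, given inline instead of a file on disk
tsv_text <- "Num\tCost\n100\t1200\n100\t1300"

# Matching separator: two columns, Num and Cost
right_sep <- read.csv(text = tsv_text, sep = "\t")

# Wrong separator: read.csv looks for commas, finds none,
# so each whole line ends up in a single column
wrong_sep <- read.csv(text = tsv_text, sep = ",")

print(ncol(right_sep))  # 2
print(ncol(wrong_sep))  # 1
```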

Next, let’s use the header parameter.

Columns only have names if the file provides them. If the first row of our file contains column names, we set header=TRUE so that this row becomes the column names; otherwise, we set header=FALSE. Note that header=TRUE is already the default for read.csv.

csv_data <- read.csv(file='mydata.csv', sep=',', header=TRUE)   # first row holds column names
tsv_data <- read.csv(file='mydata.tsv', sep='\t', header=TRUE)
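A sketch of what header=FALSE does, again using hypothetical inline text instead of a file. Without a header, R names the columns V1, V2, ...; the col.names argument lets us supply our own names instead.

```r
# Hypothetical file contents with no header row, given inline via text=
no_header_text <- "Brand1,10\nBrand2,100"

# header = FALSE: both lines are data, and R names the columns V1, V2
no_header <- read.csv(text = no_header_text, header = FALSE)
print(names(no_header))  # "V1" "V2"

# col.names supplies our own column names instead of V1, V2
named <- read.csv(text = no_header_text, header = FALSE,
                  col.names = c("Brand", "Cost"))
print(names(named))      # "Brand" "Cost"

# header = TRUE (the read.csv default) would wrongly use the first data row
# as column names, leaving only one row of data
wrongly_headed <- read.csv(text = no_header_text, header = TRUE)
print(nrow(wrongly_headed))  # 1
```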

Now let’s write our data to a CSV or TSV file. This time we use the row.names parameter: a data frame in R has row names (by default, the row numbers), and if we don’t want to keep them in our file, we should set row.names=FALSE.

write.csv(csv_data, file='mydata.csv', row.names=FALSE)              # write.csv always uses ','
write.table(tsv_data, file='mydata.tsv', sep='\t', row.names=FALSE)  # write.table accepts any sep
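A round-trip sketch: write a small data frame as CSV and TSV to temporary files, read both back, and compare the header line with and without row names. It uses write.csv for CSV and write.table for TSV, because write.csv always uses a comma and ignores an attempt to change sep (with a warning). The data frame and temporary paths are illustrative.

```r
# Small illustrative data frame (integer columns, so values read back unchanged)
data_A <- data.frame(Num = c(100L, 100L, 100L), Cost = c(1200L, 1300L, 1400L))

# Temporary file paths so the sketch does not touch the working directory
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")

# Write without the row-name column
write.csv(data_A, file = csv_path, row.names = FALSE)
write.table(data_A, file = tsv_path, sep = "\t", row.names = FALSE)

# Read both files back with the matching separator
csv_back <- read.csv(csv_path, sep = ",")
tsv_back <- read.csv(tsv_path, sep = "\t")
print(names(csv_back))  # "Num" "Cost"

# For comparison, write.csv's default row.names = TRUE adds an extra,
# unnamed first column holding the row numbers
with_rn_path <- tempfile(fileext = ".csv")
write.csv(data_A, file = with_rn_path)
print(readLines(csv_path, n = 1))      # "Num","Cost"
print(readLines(with_rn_path, n = 1))  # "","Num","Cost"
```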

For more details.

1.3.2 With and Within Functions

1.3.2.1 With Function

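The with function evaluates an expression using a data frame's columns as variables, so we can refer to Num and Cost directly instead of writing data_A$Num and data_A$Cost.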

Num <- c(100,100,100,100,100)
Cost <- c(1200,1300,1400,1500,1600)
data_A <- data.frame(Num,Cost,stringsAsFactors = FALSE)
print(with(data_A, Num*Cost))
[1] 120000 130000 140000 150000 160000
print(with(data_A, Num/Cost))
[1] 0.08333333 0.07692308 0.07142857 0.06666667 0.06250000
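A small sketch of what with() saves us: the same product written with the data_A$ prefix and with with(). It also shows a braced block that computes the average cost weighted by Num; that weighted average is an illustrative extra, not part of the original post.

```r
# Build the data frame directly so no global Num/Cost variables exist
data_A <- data.frame(Num = c(100, 100, 100, 100, 100),
                     Cost = c(1200, 1300, 1400, 1500, 1600))

# Without with(): every column needs the data_A$ prefix
by_dollar <- data_A$Num * data_A$Cost

# With with(): Num and Cost are looked up among data_A's columns
by_with <- with(data_A, Num * Cost)
print(identical(by_dollar, by_with))  # TRUE

# with() also accepts a braced block and returns the last expression's value;
# total_cost exists only inside this temporary evaluation
weighted_avg_cost <- with(data_A, {
  total_cost <- sum(Num * Cost)
  total_cost / sum(Num)
})
print(weighted_avg_cost)  # 1400
```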

1.3.2.2 Within Function

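The within function evaluates the expression and returns a copy of the data frame with the result added as a new column. The original data_A is not modified unless we assign the result back.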

Num <- c(100,100,100,100,100)
Cost <- c(1200,1300,1400,1500,1600)
data_A <- data.frame(Num,Cost,stringsAsFactors = FALSE)
within(data_A, product <- Num*Cost)
  Num Cost product
1 100 1200  120000
2 100 1300  130000
3 100 1400  140000
4 100 1500  150000
5 100 1600  160000
within(data_A, divide <- Num/Cost)
  Num Cost     divide
1 100 1200 0.08333333
2 100 1300 0.07692308
3 100 1400 0.07142857
4 100 1500 0.06666667
5 100 1600 0.06250000
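Because within() returns a copy, the result has to be assigned to keep the new columns. A minimal sketch, assuming we want both new columns at once; a braced block allows several assignments in one call. The column names product and ratio are illustrative.

```r
data_A <- data.frame(Num = c(100, 100, 100, 100, 100),
                     Cost = c(1200, 1300, 1400, 1500, 1600))

# Assign the modified copy; the braced block adds two columns in one call
data_B <- within(data_A, {
  product <- Num * Cost
  ratio <- Num / Cost
})

print(names(data_A))  # still only "Num" "Cost"
print(data_B)         # Num, Cost plus the new product and ratio columns
```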

For more details.

1.3.3 Summarise (Summarize) Function from the dplyr Library

Above, we calculated values row by row. What about calculating by groups? That is what the group_by and summarize functions are for. NOTE: We’ll need the dplyr library.

Let’s find the average cost for each brand.

library(dplyr)

Brand <- c('Brand1','Brand1','Brand2','Brand2','Brand2')
Cost <- c(10,20,100,150,200)
data_A <- data.frame(Brand, Cost, stringsAsFactors = FALSE)
# Group rows by Brand, then compute the mean Cost within each group
data_A %>%
  group_by(Brand) %>%
  summarize(mean = mean(Cost))

The result will be:

Brand   mean
Brand1  15          
Brand2  150 
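For comparison, the same group means can be computed without dplyr. This sketch swaps in base R's aggregate() with its formula interface; it is an alternative technique, not what the dplyr post uses.

```r
data_A <- data.frame(Brand = c("Brand1", "Brand1", "Brand2", "Brand2", "Brand2"),
                     Cost = c(10, 20, 100, 150, 200))

# Formula interface: summarize Cost, grouped by Brand, using mean()
by_brand <- aggregate(Cost ~ Brand, data = data_A, FUN = mean)
print(by_brand)  # Brand1 -> 15, Brand2 -> 150
```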

For more details.