Assignment 1

R Markdown Homework

I’m Buğra Balantekin. I graduated from Başkent University with a degree in Political Science and International Relations, and I have a master’s degree in International Trade and Finance. In 2019 I attended the BilgeAdam .NET Software Developer course, which covered C#, SQL, HTML, CSS, JS, and ASP.NET MVC, and I created a website for myself (new features to be added soon). After working for 8 years as a specialist, I decided to combine my business domain expertise with Data Science (especially machine learning algorithms and anomaly detection) and move into this growing field.

Website

useR! 2020: Very Easy Web Scraping with ralger

The instructor shows how to build a web scraper with ralger, whose scraping functions take at least two arguments as input: a web link and the HTML/CSS elements to extract. He uses the Box Office Mojo website as the example. The first problem: what if the information you want to scrape is spread across several pages? In that case you need to find a pattern in the page URLs. In his example, the value in the URL increases by 200 from one page to the next, and the glue package lets you define this increment pattern in the URL. You can also use the table_scrap function in ralger, but only if the scraped page contains an HTML table. That leads to the second problem: what if the web page has no HTML table (e.g. the IMDb website)? The tidy_scrap function handles this case by taking a vector of HTML/CSS elements and assembling the extracted values into a table.
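The URL pattern idea can be sketched in a few lines. The talk does this in R with glue; the version below is plain Python with f-strings, swapped in only to illustrate the pattern. The base URL, the `offset` parameter name, and the starting value of 0 are all assumptions made for the example, not details taken from the talk.

```python
# Hypothetical base URL and query parameter, used for illustration only.
BASE_URL = "https://example.com/chart/top"


def page_urls(n_pages, step=200):
    """Return one URL per page, with the offset growing by `step` per page."""
    # Assumption: the first page starts at offset 0.
    return [f"{BASE_URL}?offset={i * step}" for i in range(n_pages)]


for url in page_urls(3):
    print(url)
```

Each generated URL could then be handed to the scraper one at a time, and the per-page results combined afterwards.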

Web Scraping with ralger

3 Topics

Johns Hopkins University - Data Science Specialization

My favorite online course on Data Science is offered by Johns Hopkins University on Coursera. It has 9 courses and a capstone project. Moreover, Roger Peng’s free books are widely accepted in the field. Here is the list of courses:

The Data Scientist’s Toolbox
R Programming
Getting and Cleaning Data
Exploratory Data Analysis
Reproducible Research
Statistical Inference
Regression Models
Practical Machine Learning
Developing Data Products
Capstone

JHU-Data Science

Duke University - Statistics with R

Another online course on Coursera teaches statistics with the R programming language. It is offered by Duke University and taught by Mine Çetinkaya-Rundel. Here is the list of courses:

Introduction to Probability and Data with R
Inferential Statistics
Linear Regression and Modeling
Bayesian Statistics
Capstone

Duke-Statistics

DataCamp - Anomaly Detection in R

This course looks like the best match for my expertise. It starts with statistical outlier detection and continues with distance-based and density-based anomaly detection and isolation forests. Finally, it compares the performance of these methods.
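To give a feel for what statistical outlier detection means, here is a minimal sketch of the interquartile range (IQR) rule. The course description above does not say which statistical tests it teaches, so the IQR rule is an assumption chosen as a common textbook example. It is also written in Python rather than R, and the sample numbers are made up.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second, and third quartile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sample: one value sits far away from the rest.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(sample))
```

Distance-based, density-based, and isolation forest methods follow the same goal but score each point by how far or how isolated it is from its neighbours instead of using fixed quartile fences.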

DataCamp-Anomaly Detection