I’m Buğra Balantekin. I graduated from Başkent University with a degree in Political Science and International Relations, and I have a master’s degree in International Trade and Finance. In 2019 I attended the BilgeAdam .NET Software Developer course, which covered C#, SQL, HTML, CSS, JS, and ASP.NET MVC, and I created a website for myself (new features to be added soon). After working 8 years as a specialist, I decided to combine my business domain expertise with Data Science (especially machine learning algorithms and anomaly detection) and get into this growing field.
The instructor shows how to create a web scraper with ralger, whose scraping functions take at least two arguments as input: the web link and the HTML/CSS elements to extract. He uses the Box Office Mojo website for scraping. The first problem: what if the information you want to scrape is spread over several pages? In this situation you need to find a pattern in the page URLs. In his example, the value in the URL increases by 200 from one page to the next. The glue package lets you define this increment pattern in the URL. You can also use the table_scrap function in ralger, but only if the scraped page contains an HTML table. Then comes the second problem: what if the web page has no HTML table (e.g. the IMDb website)? The tidy_scrap function handles this case: it takes a vector of HTML/CSS elements and builds a table from them.
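To make the pagination idea concrete, here is a minimal sketch of generating page URLs from an increment pattern. It is written in Python rather than R, only to illustrate the pattern; it is not the instructor's code. The URL template, the `offset` parameter name, and the `page_urls` helper are all assumptions, and the real site's URL may look different.

```python
def page_urls(template, step, pages):
    """Build one URL per page by filling {offset} with 0, step, 2*step, ..."""
    return [template.format(offset=i * step) for i in range(pages)]


# Hypothetical URL template: the path and the "offset" parameter are
# placeholders, not the actual address used in the course.
TEMPLATE = "https://example.com/chart/?offset={offset}"

# The step of 200 comes from the example described above.
urls = page_urls(TEMPLATE, step=200, pages=3)
for url in urls:
    print(url)
```

Each generated URL can then be passed to the scraper one by one, and the per-page results combined into a single table.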
My favorite online course about Data Science is offered on Coursera by Johns Hopkins University. It has 9 courses and a capstone project. Moreover, Roger Peng’s free books are widely used in the field. Here is the list of courses:
Another online course on Coursera teaches statistics with the R programming language. It is offered by Duke University and taught by Mine Çetinkaya-Rundel. Here’s the list of courses:
This course looks like the best match for my expertise. It starts with statistical outlier detection and continues with distance-based and density-based anomaly detection and isolation forests. Finally, it compares the performance of these methods.
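As a rough illustration of the first topic, statistical outlier detection, here is a minimal z-score sketch in Python. It is not taken from the course; the sample data, the `zscore_outliers` helper, and the threshold of 2 are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up sample: seven ordinary readings and one extreme value.
data = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = zscore_outliers(data)
print(outliers)
```

A single extreme value inflates the standard deviation, which is one reason more robust approaches such as distance-based, density-based, and isolation forest methods are worth comparing against this baseline.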