I am Eren Melih Altun, 23 years old, working at Yves Rocher as a Customer Analytics Specialist. I have a special interest in Customer Analytics and its Data Science applications. I have also worked on People Analytics projects at Denizbank. I plan to build deep expertise in Customer Analytics, and after several years I'll dive into more advanced topics such as NLP and image recognition.
In this video, Ph.D. candidate M. El F. Ihaddaden explains his package ralger, which can be used for web scraping. It is an easy-to-use package for retrieving information from websites, and it can help us enrich our databases, support our research, and so on. In the package, the scrap function takes two arguments: the URL and the CSS node of the information. If we need information from multiple pages with the same structure, we can use the glue function to exploit the pattern shared by each page's URL; glue helps us generate the URL for every page. There is also the table_scrap function, which is very useful when the URL contains an HTML table: if we pass the link directly to that function, it returns the table immediately. If there is no HTML table at the link, we can specify multiple CSS nodes and column names to build a table ourselves. In a nutshell, this is a useful package for collecting data for our research or projects: it helps us automate the data-collection process and enrich our databases.
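The two ideas above (glue-style URL patterns and extracting text by CSS node) can be sketched in Python rather than R. This is only a rough analogue of what ralger does, using the standard library instead of a scraping package; the URL pattern, the `title` class, and the inline HTML are hypothetical, and no real pages are fetched.

```python
# Rough Python analogue of ralger's glue-pattern + CSS-node idea
# (hypothetical URL pattern and HTML; nothing is fetched over the network).
from html.parser import HTMLParser

def paginated_urls(pattern, pages):
    """Fill a {page} placeholder, the way glue() builds one URL per page."""
    return [pattern.format(page=p) for p in pages]

class NodeText(HTMLParser):
    """Collect the text inside tags carrying a given class attribute
    (a stand-in for selecting a CSS node)."""
    def __init__(self, cls):
        super().__init__()
        self.cls = cls
        self.inside = False
        self.texts = []
    def handle_starttag(self, tag, attrs):
        if ("class", self.cls) in attrs:
            self.inside = True
    def handle_endtag(self, tag):
        self.inside = False
    def handle_data(self, data):
        if self.inside and data.strip():
            self.texts.append(data.strip())

urls = paginated_urls("https://example.com/catalog?page={page}", range(1, 4))
print(urls)

sample_html = '<div class="title">First item</div><div class="title">Second item</div>'
parser = NodeText("title")
parser.feed(sample_html)
print(parser.texts)  # ['First item', 'Second item']
```

In ralger itself, the same loop collapses into a single scrap() call once the URL vector is built.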
In this post, Susan Li explains the importance of defining churn. Using the Empirical Cumulative Distribution Function (ECDF), she frames churn in a non-contractual business setting as an anomaly-detection problem. From the ECDF of a customer's inter-purchase times, she can say that 9 times out of 10, Customer X waits at most Y days between purchases. So if more than Y days have passed since Customer X's last purchase, we can flag that customer as churned.
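The "9 times out of 10" rule above amounts to taking the 90th-percentile inter-purchase gap as each customer's personal churn cutoff. A minimal sketch, on invented purchase dates (not data from the post):

```python
# Sketch of an ECDF-based churn threshold on toy data: compute the gaps
# between consecutive purchases and take the 90th-percentile gap as the
# customer's churn cutoff (exceed it and the customer looks churned).
from datetime import date

def churn_threshold_days(purchase_dates, q=0.9):
    """Smallest gap whose empirical CDF value reaches q (default 0.9)."""
    dates = sorted(purchase_dates)
    gaps = sorted((b - a).days for a, b in zip(dates, dates[1:]))
    idx = min(int(q * len(gaps)), len(gaps) - 1)
    return gaps[idx]

# Hypothetical purchase history for one customer
history = [date(2021, 1, 1), date(2021, 1, 8), date(2021, 1, 20),
           date(2021, 2, 1), date(2021, 2, 10), date(2021, 3, 1)]
print(churn_threshold_days(history))  # 19 days for this history
```

With this threshold, a customer silent for longer than their own 90th-percentile gap is flagged, which keeps the definition per-customer rather than one global cutoff.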
It is very important to validate a predictive model's success, and we can use lift values to measure it. This post gives a basic introduction to lift charts. The author creates a decile-wise lift chart to show the model's predictive power: customers are ranked by predicted score, split into ten equal groups, and each decile's lift is its response rate divided by the overall response rate. Lift charts become especially important when we are running out of budget: by targeting the deciles with the highest lift, we can follow a cost-effective approach.
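The decile-wise computation described above can be sketched in a few lines. The scores and labels below are invented toy data, not figures from the post:

```python
# Minimal decile-wise lift sketch on toy data: rank customers by predicted
# score, cut into deciles, and divide each decile's response rate by the
# overall response rate. Lift > 1 means the decile responds above average.
def decile_lift(scores, labels, n_bins=10):
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: -t[0])]
    overall = sum(ranked) / len(ranked)
    size = len(ranked) // n_bins
    lifts = []
    for i in range(n_bins):
        bucket = ranked[i * size:(i + 1) * size]
        lifts.append((sum(bucket) / len(bucket)) / overall)
    return lifts

# 20 hypothetical customers (2 per decile); buyers cluster at high scores
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45,
          0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.08, 0.05, 0.02]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(decile_lift(scores, labels))  # top decile has lift 4.0 here
```

A lift of 4.0 in the top decile means targeting only that group reaches buyers four times as often as a random mailing, which is exactly the budget argument made above.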
Propensity to buy is a common problem in the Data Science field, especially in CRM/Customer Analytics. In this post, the author tries to find a target group that will buy their products. There are several variables from their customer base, and the author uses a correlation test as a basic feature-selection method. After that comes model training and validation. The author builds a Naive Bayes classifier, which I think may not be the most appropriate method, because Naive Bayes assumes the features are conditionally independent given the class. The author arrives at a model with 0.72 accuracy.
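The correlation-test step described above can be illustrated with a short sketch. The feature names, values, and the 0.3 threshold below are all hypothetical, not taken from the post:

```python
# Sketch of correlation-based feature selection on toy data: keep only
# features whose absolute Pearson correlation with the purchase flag
# exceeds a chosen threshold (0.3 here, an arbitrary choice).
import math

def pearson(x, y):
    """Pearson correlation coefficient computed by hand."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(features, target, threshold=0.3):
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]

bought = [1, 1, 0, 0, 1, 0, 1, 0]          # hypothetical purchase flag
features = {
    "visits": [9, 8, 2, 1, 7, 3, 8, 2],    # strongly related to buying
    "age": [25, 40, 33, 29, 41, 35, 27, 38],  # barely related
}
print(select_features(features, bought))  # ['visits']
```

Only "visits" survives the cut on this toy data. Note that a univariate correlation screen like this also ignores interactions between features, so it shares some of the same simplifying spirit as the Naive Bayes independence assumption criticized above.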