In this section you can find summaries of machine learning packages in R that I am interested in.
mlr3automl - Automated Machine Learning in R
As the header points out, it is an automl package for R, written by Alexander Hanf for his master's thesis project in 2021. Automl packages automate machine learning workflows, such as preprocessing, model selection and hyperparameter tuning, with very few lines of code. In this case, mlr3automl is an automl package for regression and classification analyses built on mlr3. This package is very simple to use, just like many other automl packages in Python. A basic usage example is given below.
- Creating the automl model is the first step. Below is an example of creating a model. The parameters of AutoML() represent different tuning arguments, but the main and required argument is ‘task’, which takes the training data as input. If you want to add preprocessing or a custom learner, this is also the part where you define those arguments.
automl_model = AutoML(
  task = train_tsk,                             # training task (required)
  learner_list = c('regr.ranger', 'regr.lm'),   # candidate learners to compare
  learner_timeout = 10,                         # time budget per learner
  runtime = 300)                                # overall runtime budget
automl_model$train()
- Predicting is the last step of basic usage. Running the single line of code below returns predictions for the test task.
predictions = automl_model$predict(predict_tsk)
In summary, this package seems like a good example of how an automl package can be used in R, and it gives a general overview of what such packages offer. Further information about mlr3automl and a video resource can be found below.
Robyn - Automated Marketing Mix Modeling open-source package by Facebook
I have been dying to get my hands on an automated Marketing Mix Modeling (MMM) package for months now. MMM is essentially a multiple regression modeling technique that focuses on explaining the effect of media investments on revenue. There are many things to consider when dealing with media investments, and Robyn takes most of them into account. Robyn's main difference is that it uses ridge regression for modeling. Because it uses ridge regression in conjunction with cross-validation, the resulting model has overfit protection, which gives the user the ability to predict. It also provides the user with simple one-pagers so they can understand the final model easily. Right now it lacks an automated model selection process, because MMMs are generally case-specific. Maybe implementing a deep learning approach could generate optimal results, but that is only my opinion.
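To make the modeling idea concrete, here is a minimal sketch of ridge regression with cross-validation using glmnet. This is illustrative only, not Robyn's actual code, and the media-spend data below is entirely made up.

set.seed(42)
library(glmnet)

# hypothetical media-spend data: two channels plus revenue
X <- cbind(tv_spend     = runif(100, 0, 100),
           search_spend = runif(100, 0, 50))
y <- 3 + 0.5 * X[, "tv_spend"] + 1.2 * X[, "search_spend"] + rnorm(100)

# alpha = 0 selects the ridge penalty; cv.glmnet chooses the penalty
# strength (lambda) by cross-validation, which is what provides the
# overfit protection mentioned above
cv_fit <- cv.glmnet(X, y, alpha = 0)

# coefficients at the lambda with minimum cross-validated error
coef(cv_fit, s = "lambda.min")

# predictions for new spend levels
predict(cv_fit, newx = X[1:5, ], s = "lambda.min")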
They released this package only recently and, as far as I know, it is still in active development, with new features being added every week. My projection is that such tools will multiply over time and will lead industries and their clients into a more optimized future. That is why I think this tool is very important right now, and it is the reason I am most eager to learn R and dive into this package right away.
- You can click here and here to find out more about Robyn.
caret - Classification and Regression Training package
I am currently engaged with PyCaret and its capabilities on the Python side, but getting to know caret in R is one of my priorities. The caret package streamlines the process of creating predictive models. In my opinion its strong suit is feature selection, since it contains many different methods for automating the feature selection process. Basic functions of caret include:
- pre-processing
- data splitting
- feature selection
- model tuning using resampling
- variable importance estimation
Pre-processing is the stage where you get your dataset into shape for training. caret takes care of this by letting you create dummy variables, identify correlated predictors, center and scale the data, transform predictors and much more.
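A short sketch of the pre-processing helpers mentioned above, using caret's dummyVars(), findCorrelation() and preProcess() functions on a small made-up data frame:

library(caret)

df <- data.frame(region = factor(c("a", "b", "a", "c")),
                 x1 = c(1, 2, 3, 4),
                 x2 = c(2, 4, 6, 8))   # perfectly correlated with x1

# dummy variables from factor columns
dv <- dummyVars(~ ., data = df)
df_num <- predict(dv, newdata = df)

# flag highly correlated predictors for removal
num_cols <- df[, c("x1", "x2")]
high_cor <- findCorrelation(cor(num_cols), cutoff = 0.9)

# centering and scaling (other transforms go in `method` too)
pp <- preProcess(num_cols, method = c("center", "scale"))
scaled <- predict(pp, num_cols)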
After that, you have to split your data into train and test parts. But not every dataset can be treated the same way. caret lets you split your dataset accordingly, depending on whether it is time series data, simple regression or classification data, or more complex data where you have to account for the importance of features.
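For example, caret's createDataPartition() produces a stratified split for classification, while createTimeSlices() builds rolling-origin splits for time series (a sketch on toy data):

library(caret)
set.seed(1)
y <- factor(rep(c("yes", "no"), each = 50))

# stratified split: preserves the class proportions in train and test
idx <- createDataPartition(y, p = 0.8, list = FALSE)
train_y <- y[idx]
test_y  <- y[-idx]

# rolling-origin resampling for time series data
slices <- createTimeSlices(1:100, initialWindow = 60,
                           horizon = 10, fixedWindow = TRUE)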
Moving on, there is the classic model training and tuning phase, where you select model training parameters and tune them with a few simple lines of code. You can also customize the tuning process. In the end, you get to choose the best model for your application.
After you select your model, you can train it using many different modeling techniques. Such techniques include, but are not limited to, Bayesian models, L1 and L2 regularization, linear regression, random forests and ridge regression.
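The training and tuning phase above comes together in caret's train() function; here is a minimal sketch on the built-in iris data (the "rf" method additionally requires the randomForest package):

library(caret)
data(iris)

# 5-fold cross-validation drives the tuning process
ctrl <- trainControl(method = "cv", number = 5)

# tune a random forest over mtry; swapping `method` (e.g. "lm", "ridge")
# switches the modeling technique with the same interface
fit <- train(Species ~ ., data = iris,
             method = "rf",
             trControl = ctrl,
             tuneGrid = expand.grid(mtry = c(1, 2, 3)))

fit$bestTune   # the parameter setting caret selected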
There are many areas we can apply caret to, but right now, learning a library with detailed feature selection capabilities is very important for the projects I am working on.
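As one example of those feature selection capabilities, caret ships recursive feature elimination via rfe(); a sketch on iris (rfFuncs uses random-forest importance, so the randomForest package is also needed):

library(caret)
data(iris)
set.seed(7)

# recursive feature elimination, evaluated by cross-validation
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
res <- rfe(iris[, 1:4], iris$Species,
           sizes = c(1, 2, 3), rfeControl = ctrl)

predictors(res)   # the selected feature subset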
SuperML - Unified Model Training in R
The main focus of this package is to create a unified approach in R using Python syntax for machine learning models. It mostly aims to help people who switch between R and Python frequently, in order to cut down the time lost to machine learning syntax differences, by providing Python's scikit-learn interface in R. Its capabilities include simple regression, KNN regression, SVM regression, ridge regression, random forest and grid search, as well as handling binary classification datasets using similar approaches.
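To show the scikit-learn-style feel, here is a sketch using superml's RFTrainer class (class and argument names as I understand them from the package documentation; treat as illustrative):

library(superml)
data(iris)

# Python-style constructor arguments, R6-class trainer
rf <- RFTrainer$new(n_estimators = 100)

# fit takes the data and the name of the target column
rf$fit(iris, "Species")

preds <- rf$predict(iris)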
- You can find more about SuperML on its CRAN and GitHub pages.