I’m Aybike Dilek, from Turkey. I graduated as valedictorian from the Industrial Engineering department. During my undergraduate studies, I tried to find the field that suited me best while broadening my knowledge. My interest in Big Data Analytics began after I took data analytics courses, such as Exploratory Data Analytics and Marketing Analytics, during my undergraduate education at MEF University. Those courses gave me an analytics background and familiarity with the R programming language. Now I work at Yemeksepeti, one of the biggest online food delivery companies in Turkey, and I continue my learning journey as an analyst. The best part of my job is analyzing different types of data, produced and stored in different sources. I have had the chance to work with different structured data types in detail, and I have used R to develop many kinds of statistical models using linear regression, clustering, and forecasting methods such as Holt-Winters, ARIMA, seasonal naive, and simple exponential smoothing. I like working on real-life forecasting cases such as stock prices and sales volumes.
I want to share a brief video about automated error detection tools that catch errors at the different machine learning stages. It was presented by Helena Kotthaus from the Artificial Intelligence group at TU Dortmund, Germany, where they work on debugging tools to support trustworthy machine learning. She highlighted that trustworthy machine learning is important for safety-critical areas such as medical applications, and explained that the main problem with machine learning errors is how they accumulate over time, making it hard to find a root cause. Before they started developing automated error detection tools, they ran a survey to understand which errors could be detected at each machine learning stage. The survey was web-based, with a sample size of 85 respondents from academia and industry, and the question was “At which stage of a data science project have you observed the most serious errors?”. The stages of the machine learning pipeline were formulation, acquisition, preprocessing, exploration, modeling, and reporting. The preprocessing stage turned out to be one of the most error-prone for both groups, so, based on the survey results, they focused on the preprocessing stage first. As a second step, she described how they try to detect errors in the preprocessing stage automatically. She explained it with an example: an SVM classification task needs normalized data, so the debugger checks whether the input data is normalized. The debugger also inspects the code for a normalization step, or for a special ML function that provides normalization by default, and it checks the data statistically. She pointed out that, to construct a useful debugger, they used several checkpoints instead of a single expert one. She kindly asked for reproducible code examples to test their tool; if you want to contribute, you can find her e-mail at the link. A small sketch of this kind of check is shown below.
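To make the idea concrete, here is a minimal sketch in R of the kind of preprocessing check described in the talk, i.e. testing whether numeric columns look normalized before they reach an SVM. This is my own illustration, not the actual TU Dortmund tool; the function names and tolerances are assumptions.

```r
# Illustrative sketch only (not the tool from the talk): check whether the
# numeric columns of a data frame look normalized, either min-max scaled to
# [0, 1] or standardized to mean 0 and standard deviation 1.
is_normalized <- function(x, tol = 1e-6) {
  scaled_01    <- min(x) >= -tol && max(x) <= 1 + tol           # min-max scaled?
  standardized <- abs(mean(x)) < 0.01 && abs(sd(x) - 1) < 0.01  # z-scored?
  scaled_01 || standardized
}

check_preprocessing <- function(df) {
  numeric_cols <- df[sapply(df, is.numeric)]
  flags <- vapply(numeric_cols, is_normalized, logical(1))
  if (any(!flags)) {
    warning("These columns do not look normalized: ",
            paste(names(flags)[!flags], collapse = ", "))
  }
  invisible(flags)
}

# Example: the iris measurements are on their raw scale, so the check warns.
check_preprocessing(iris)
```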
I want to share a basic video about why, in some cases, we should use R instead of Excel, one of the most popular traditional analytics tools. The presenter started with a speed benchmark between R and Excel using the “vlookup” function: depending on the volume of data, this calculation can take more than an hour in Excel but less than a second in R. Excel comfortably handles thousands of records, while R handles millions. Excel can technically handle millions of records too, but once we add a calculated column to the data set, the difference between Excel and R becomes clear. He mentioned that VBA can be used to automate calculations, but it will still take roughly 10 times longer than R. From the visualization perspective, he said that R looks like Tableau, and one of the differences between them is licensing cost. He also shared that interactive visualization in R has no account requirement like Tableau does, because you can export it as HTML, which means anyone can open the file in a web browser. Additionally, he pointed out that R supports collaboration through GitHub.
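As a small illustration of those two points, here is a hedged sketch of how the “vlookup” operation is typically done in R with a join, and how an interactive plot can be exported as a standalone HTML file. The example tables, file name, and choice of the dplyr, plotly, and htmlwidgets packages are my own assumptions, not taken from the video.

```r
library(dplyr)
library(plotly)
library(htmlwidgets)

# VLOOKUP equivalent: enrich a sales table with product names via a join.
sales    <- data.frame(product_id = c(1, 2, 2, 3), amount = c(10, 25, 5, 40))
products <- data.frame(product_id = 1:3,
                       product_name = c("Pizza", "Burger", "Kebab"))
sales_named <- left_join(sales, products, by = "product_id")

# Interactive visualization exported as a self-contained HTML file:
# anyone can open it in a web browser, no account needed.
totals <- sales_named %>%
  group_by(product_name) %>%
  summarise(total = sum(amount))

p <- plot_ly(totals, x = ~product_name, y = ~total, type = "bar")
saveWidget(p, "sales_plot.html", selfcontained = TRUE)
```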
I want to share a simple video about an example of a forecasting model using the “prophet” package in R. The presenter explained that the “prophet” package is often preferred for time series analysis in business analytics, such as sales or order volume forecasting, where the data has a lot of seasonality and trend. He also called the “tidyverse” library to load the data set. First, he loaded the data set, named bitcoin. Because the “prophet” package requires it, he renamed the date column in the data set to “ds” and the price column to “y”. Then, he called the “prophet” function to fit a model. A future-values variable was created with the “make_future_dataframe” function to extend the data set so that it could include future dates. Lastly, a forecast variable was created using the “predict” function. The presenter plotted the model estimations using the “dyplot” function, so the actual and predicted values were shown together on the plot. Looking at the plot, he shared that the model isn’t perfect, but it captures the trend pretty well.
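Here is a minimal sketch of that workflow, assuming a local CSV named “bitcoin.csv” with date and price columns; the file name and column names are my assumptions, not taken from the video.

```r
library(tidyverse)
library(prophet)

# Load the data and rename the columns: prophet requires "ds" and "y".
bitcoin <- read_csv("bitcoin.csv") %>%
  rename(ds = date, y = price)

m <- prophet(bitcoin)                                  # fit the model
future   <- make_future_dataframe(m, periods = 365)    # extend one year ahead
forecast <- predict(m, future)                         # forecast future values

dyplot.prophet(m, forecast)   # interactive plot of actual vs. predicted values
```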
I want to share a video without narration about forecasting India’s monthly car sales. The presenter called three libraries: prophet, forecast, and ggplot2. “prophet” and “forecast” are forecasting packages, and “ggplot2” is a data visualization package. First of all, he imported a CSV file and then investigated the data with functions like “head”, “summary”, and “str”. After that investigation, he changed the date format with the “as.Date” function, used the “aggregate” function to get monthly sales, and created a new data frame. For modeling, he removed the unnecessary columns from the data frame. He created a forecasting model with the “prophet” package, then used this model to forecast future values with the “predict” function. Lastly, he created a plot and compared the predicted and actual values. That was the prophet forecasting part of the video. Afterwards, he did the same analysis with an ARIMA forecasting model. In the ARIMA part, he converted the data into a univariate time series using the “ts” function and fitted an auto ARIMA model. This time, he used the “forecast” function to predict future values. Lastly, he compared the predicted values using “ggplot2”.
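Below is a sketch of both pipelines, prophet and auto ARIMA, under the assumption that the CSV is called “car_sales.csv” with date and sales columns and that the series starts in January 2010; the file name, column names, and start date would need to match the real data set.

```r
library(prophet)
library(forecast)
library(ggplot2)

cars <- read.csv("car_sales.csv")
cars$date <- as.Date(cars$date)

# Aggregate the records to monthly totals and keep only the two columns
# prophet needs, renamed to "ds" and "y".
monthly <- aggregate(cars$sales,
                     by = list(month = format(cars$date, "%Y-%m-01")),
                     FUN = sum)
names(monthly) <- c("ds", "y")
monthly$ds <- as.Date(monthly$ds)

# Prophet model: fit, extend 12 months ahead, predict, and plot.
m <- prophet(monthly)
future     <- make_future_dataframe(m, periods = 12, freq = "month")
fc_prophet <- predict(m, future)
plot(m, fc_prophet)

# Auto ARIMA model on the same series.
y_ts     <- ts(monthly$y, frequency = 12, start = c(2010, 1))
fit      <- auto.arima(y_ts)
fc_arima <- forecast(fit, h = 12)
autoplot(fc_arima)   # ggplot2-based plot from the forecast package
```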