My name is Can Aytöre and I’m from Eskişehir. I’m interested in playing basketball and watching tennis. I like to travel and take photographs. I’m currently on a sailing team, participating in sailing regattas on the weekends.
I graduated from the Industrial Engineering department of Istanbul Technical University in June 2019. I started a master’s degree in Industrial Engineering at Boğaziçi University last year in order to gain more expertise in operations research, stochastic processes, and data science. Next semester, I will be working on my master’s thesis.
I have been working in the marketing department of BSH for more than a year now. In addition to the usual work of the marketing department, I work on big data coming from market research companies, turning it into meaningful analyses and useful predictions. Of course, I use advanced statistical methods for that.
After my internships and work experiences, I decided that I could work in data science. Since data science attracts me, I pay attention to choosing my courses and studies in this field, and I will try to fill this personal journal carefully with this perspective.
As is well known, missing data is nearly everywhere, and that is also true for time series. The reasons are usually manifold: the problem may occur during data recording, data transmission, or data processing. There are useful packages, such as the imputeTS package, for overcoming missing values. With its imputation functions, missing values are replaced with reasonable estimates; with its visualization functions, missing data are easily detected. The first step is to get a first impression of where the missing values are via a visualization of the NA distribution, so we can clearly see where they are located in the time series. Once this first analysis is done, we can explore the imputation results. With the visualization, the replaced values in the time series can be easily seen, so we can check whether the fitted values are consistent or not. One more nice feature of this package is that its plots are ggplot outputs, so they can be customized further. Since missing data can create problems for analysis, imputation is a good way to overcome them; this practice can even be seen as a simple application of machine learning.
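To make this concrete, here is a minimal sketch using imputeTS (assuming the function names of imputeTS version 3.x) on tsAirgap, an example series that ships with the package and contains artificial gaps:

```r
library(imputeTS)

# First impression: visualize where the NA values are located
ggplot_na_distribution(tsAirgap)

# Replace the missing values, here with linear interpolation as one option
imp <- na_interpolation(tsAirgap, option = "linear")

# Plot the imputed points against the observed series
# to check whether the fitted values look consistent
ggplot_na_imputations(tsAirgap, imp)
```

Since both plots are ggplot objects, they can be tweaked further with the usual ggplot2 layers.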
Using visual elements such as graphics and maps to present data in a more understandable form is called data visualization. Today, data visualization is critical because it plays an important role in analyzing large amounts of information in the big data world and in making decisions based on that analysis. There are many ways to visualize data; scatter charts, line charts, bar charts, and histograms are examples. Each tool serves a different purpose, so it is important to use the correct tool at the right time. For example, scatter charts are used to understand the relationship between two numerical variables, while histograms show the distribution of numerical values and are very significant in data science. Data visualization also facilitates the detection of outliers. Data visualization tools are available in the R programming language, and even more advanced packages exist, such as the ggplot2 package. As a result, data visualization is one of the fastest and most powerful ways of understanding the information at hand.
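As a small sketch of these two chart types, here is some ggplot2 code on the built-in iris data set (my choice, purely for illustration):

```r
library(ggplot2)

# Scatter chart: relationship between two numerical variables
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point()

# Histogram: distribution of a single numerical variable
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 20)
```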
Arranging a series of objects so that those with similar properties fall into the same group is called cluster analysis. Unlike traditional methods, this process can be easily applied via statistical tools, and it enables the subgroups in a data set to be found. There are many ways to do this analysis. The most common is hierarchical cluster analysis, and many linkage methods can be used within it. For example, in the nearest-neighbor (single linkage) approach, the distances are listed from smallest to largest and the pair with the smallest distance forms the first cluster; distances to the remaining observations are then compared in the same way, and the process ends when all objects have been merged. A dendrogram is used to visualize such analyses. Another method is K-means, where the number of clusters the data will be divided into is known in advance; the optimal number of clusters can also be estimated beforehand with some methods, such as the elbow method. To sum up, cluster analysis is a statistical method used in many fields such as data mining, image analysis, and machine learning.
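Here is a minimal sketch of both approaches on the built-in USArrests data set (again my choice, only for illustration):

```r
df <- scale(USArrests)  # standardize the variables before computing distances

# Hierarchical clustering with single linkage (the nearest-neighbor approach)
hc <- hclust(dist(df), method = "single")
plot(hc)  # dendrogram of the merge sequence

# K-means, where the number of clusters is fixed in advance
set.seed(1)  # k-means starts from random centers
km <- kmeans(df, centers = 4)
table(km$cluster)  # cluster sizes
```

Note that plot(hc) uses base graphics; packages such as factoextra can produce ggplot-style versions of these plots.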
Network models exist almost everywhere in our lives; traffic infrastructure, the internet, food chains, and artificial neural networks can be given as examples. In network models, however, we see that some nodes are more important than others. So what does “most important” mean, and doesn’t it differ by context? There are many ways to make sense of this. The first is degree centrality, which finds the most connected node. The second is closeness centrality, which determines which node in the network can spread a flow the fastest; it is based on the sum of the shortest-path distances from a node to all other nodes, where a smaller sum means a more central node. A recent example of this is the super-spreaders of highly contagious diseases such as COVID-19. The third is betweenness centrality, which finds the node that is most important in maintaining connections across the network, that is, the node that lies on the most shortest paths between other nodes. As it seems, there is more than one definition of “most important”. To sum up, network analysis plays an important role in our understanding of relationships between objects.
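These three measures can be compared on a small toy network with the igraph package (the edge list below is made up only for illustration):

```r
library(igraph)

# A tiny undirected toy network, invented for this example
g <- graph_from_literal(A-B, A-C, A-D, B-C, C-E, E-F)

degree(g)       # degree centrality: how many connections each node has
closeness(g)    # closeness centrality: inverse of total distance to all others
betweenness(g)  # betweenness centrality: how often a node bridges shortest paths
```

In this toy graph, A and C tie on degree, but C scores highest on both closeness and betweenness because every shortest path to E and F passes through it.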