I am Kutay Akalın. I graduated from ITU with a degree in Industrial Engineering in 2017. After finishing school, I started working at Akdatasoft Software Company as a Project Engineer. The data that surrounds all of us grows exponentially every day. Because most of this data isn't labelled, the growth causes infollution (information pollution). If we want to make better decisions in our personal or business lives, we should process data properly. Thus, in my opinion, data science topics are very important for personal development in today's world.
useR! 2019 Toulouse - Social Science Marketing & Business - Gert Janssenswillen
In this video, Gert Janssenswillen explains how to use Bayesian inference and MCMC to enhance discovered process models.
First, he briefly explained process mining. Process mining is a kind of process-oriented data science that unites Business Process Management and data science. The types of processes vary widely between companies, for example order-to-cash processes, customer journeys, etc. However, all processes have one thing in common: they generate data called event logs, and this data has three main characteristics: Activity / Task, Case and Timestamp. Mr. Janssenswillen and his colleagues try to discover process models from this data. They use Business Process Model and Notation (BPMN), which is a standard and is easy to understand (Example). But BPMN has a disadvantage: once a process has been modelled, the probabilities of its elements cannot be discovered or learned, so we cannot know the dependencies between choices. Mr. Janssenswillen is trying to learn these probabilities, and the dependencies between different elements of the process, using Bayesian inference and MCMC.
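To make the three characteristics of an event log concrete, here is a minimal sketch in plain Python. The cases, activities and timestamps are invented for illustration and do not come from the talk; the function name `build_traces` is also hypothetical. It groups events by case and orders them by timestamp, which gives one trace (sequence of activities) per case.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: every event has a case, an activity and a timestamp.
events = [
    {"case": "order-1", "activity": "Receive order", "timestamp": "2019-07-01 09:00"},
    {"case": "order-2", "activity": "Receive order", "timestamp": "2019-07-01 09:05"},
    {"case": "order-1", "activity": "Send invoice", "timestamp": "2019-07-01 10:30"},
    {"case": "order-2", "activity": "Cancel order", "timestamp": "2019-07-01 11:00"},
    {"case": "order-1", "activity": "Receive payment", "timestamp": "2019-07-03 14:15"},
]


def build_traces(events):
    """Group events by case and order each case's activities by timestamp."""
    by_case = defaultdict(list)
    for event in events:
        ts = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M")
        by_case[event["case"]].append((ts, event["activity"]))
    # Sorting the (timestamp, activity) pairs puts each case in time order.
    return {case: [activity for _, activity in sorted(pairs)]
            for case, pairs in by_case.items()}


traces = build_traces(events)
for case, trace in traces.items():
    print(case, "->", " > ".join(trace))
```

Process discovery algorithms start from traces like these. The sketch only shows the data shape, not a discovery algorithm.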
He gave an example in the video (Model Example). The data he used is the result of a multinomial distribution with four different probabilities and a certain number of observations. He defined the choices in this model as Zeta 1 and Zeta 2, and the probabilities, one for each sequence combination, as R1 and R2. Then he set priors for Zeta 1 and Zeta 2, using a beta distribution with parameters 2 and 2. After running MCMC, he found that the second choice is much more unbalanced than the first choice, and that the probabilities of the elements in the first choice are approximately equal to 0.5.
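The idea of putting a Beta(2, 2) prior on a choice and sampling its posterior with MCMC can be sketched in a few lines of plain Python. This is not the speaker's model or code. It is a simplified illustration that treats each choice as an independent binary split and uses a basic random-walk Metropolis sampler. The counts (52 of 100 for the first choice, 90 of 100 for the second) are invented, and all function names are hypothetical.

```python
import math
import random


def log_posterior(theta, successes, trials, a=2.0, b=2.0):
    """Unnormalised log posterior: Beta(a, b) prior times binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    log_lik = (successes * math.log(theta)
               + (trials - successes) * math.log(1 - theta))
    return log_prior + log_lik


def metropolis(successes, trials, n_iter=20000, burn_in=2000, step=0.1, seed=1):
    """Random-walk Metropolis sampler for the probability of one binary choice."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, successes, trials)
    samples = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept the proposal with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


def mean(values):
    return sum(values) / len(values)


# Invented counts: how often the first branch was taken at each choice.
zeta1_samples = metropolis(successes=52, trials=100)
zeta2_samples = metropolis(successes=90, trials=100)

print("posterior mean, first choice :", round(mean(zeta1_samples), 3))
print("posterior mean, second choice:", round(mean(zeta2_samples), 3))
```

For this simple case the posterior is also available in closed form as Beta(2 + successes, 2 + failures), so the sampler can be checked against the analytic mean. MCMC becomes necessary once the choices are linked to each other, as in the talk.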
Then he defined Zeta 3 to analyse the dependency between the first choice and the second choice. He found that Zeta 3's probability is higher than Zeta 2's, which means there is a dependency between the first choice and the second choice. But he noted that real-life processes are much more complex. As the number of choices increases, the number of possible sequences of choices grows exponentially, and some of the traces in our data can be outliers. To fix this problem, they use a prefix tree in their package. They create the prefix tree and add escaping arcs, which represent things that can happen but that we cannot see in the data. Then they can define the probabilities for all splits in the model. This gives an automated construction of the Bayesian model.
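The prefix tree with escaping arcs can be illustrated with a small sketch. This is one possible reading of the idea, not the implementation in the package. The traces are invented, the `<escape>` label and function names are hypothetical, and the posterior mean of a symmetric Dirichlet prior is used in place of MCMC to keep the example short. Each split gets one extra branch with zero observed count, and the prior gives that branch a small non-zero probability.

```python
from collections import Counter

ESCAPE = "<escape>"  # hypothetical label for behaviour not seen in the log


def build_prefix_tree(traces):
    """Count how often each activity follows each observed prefix."""
    tree = {}
    for trace in traces:
        prefix = ()
        for activity in trace:
            tree.setdefault(prefix, Counter())[activity] += 1
            prefix = prefix + (activity,)
    return tree


def split_probabilities(tree, prior=1.0):
    """Posterior-mean branch probabilities under a symmetric Dirichlet prior.

    Every split receives one extra escaping arc with an observed count of zero.
    """
    probs = {}
    for prefix, counts in tree.items():
        branches = dict(counts)
        branches[ESCAPE] = 0
        total = sum(branches.values()) + prior * len(branches)
        probs[prefix] = {b: (c + prior) / total for b, c in branches.items()}
    return probs


# Invented traces; each tuple is the activity sequence of one case.
traces = [
    ("A", "B", "D"),
    ("A", "B", "D"),
    ("A", "C", "D"),
    ("A", "B", "E"),
]

tree = build_prefix_tree(traces)
probs = split_probabilities(tree)
for prefix, branches in probs.items():
    print(prefix, {b: round(p, 3) for b, p in branches.items()})
```

Because probabilities are attached to prefixes rather than to single activities, the split after ("A", "B") can differ from the split after ("A", "C"), which is how dependencies between earlier and later choices show up. The end of a trace is not modelled in this sketch.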
We can use this automated Bayesian modelling to find an accurate process model for our data. This approach connects the fields of data science and business process management. It is available in a package called “propro”, which takes the process data and a process model as input and builds a Bayesian model. It is part of an R package called “bupaR”, which provides the methods to load your event log and process models. Propro adds a statistical model on top of the process model in this package. (Schema)
Professionals in sales, project management, and other areas use business process modeling software to map out their approach to any specific process. Learn the essentials of BPMN and BPMN 2.0, along with the history, purpose, benefits, symbols, diagram types, and key tips for business process modeling.
This blog post explains the difference between frequentist and Bayesian statistics. It covers three steps for closing the gap between frequentist and Bayesian statistics and gives examples in R. Then it explains Bayesian inference and MCMC.
Main website for the bupaR R package. You can find a lot of information there about how to install and load the library, the event data model, common transformations, creating process maps, performance dashboards, etc.
Documentation for bupaR. It is very detailed and includes the R topics.