Algorithms are the foundation of computer programming and an essential part of data science. In this article, we simply explain the five most popular algorithms used in process mining.
What are process mining algorithms?
Process mining algorithms are sets of mathematical rules used to discover process models from the data recorded in business systems, using data mining techniques. They allow you to map the true state of business processes, identify bottlenecks and inefficiencies, and improve your business processes in a data-driven way.
These algorithms form the foundation of process mining software. The five most popular process mining algorithms include Alpha Miner, Heuristic Miner, Fuzzy Miner, Inductive Miner, and Genetic Miner.
1. Alpha Miner
The Alpha Miner (also called the α-algorithm or α-miner) bridges event logs, or other observed data, and the discovery of a process model. It was the first process discovery algorithm, developed and put forward by Dr. Wil van der Aalst, Dr. Ton Weijters, and Dr. Laura Măruşter.
How the alpha miner works in process mining
The Alpha Miner algorithm uses event logs as its data source. It starts by transforming the event logs into directly-follows, sequence, parallel, and choice relations, and then uses these relations to create a Petri net that describes the process model. In simple terms, it turns the recorded order of events into a flow of business process steps that can be visualized.
A Petri net is a graphical and mathematical tool used to model and visualize concurrent systems. Source: Wikipedia.
Today, the Alpha Miner algorithm and its variations are widely used in process mining applications, for example, in process discovery and conformance checking.
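As a rough illustration of the first step described above, the sketch below derives the ordering relations from a tiny event log. The log, the activity names, and the `footprint` helper are hypothetical, and the sketch stops before the Petri net construction that the full algorithm performs.

```python
from itertools import product

# Hypothetical toy event log: one ordered list of activities per case.
traces = [
    ["register", "check", "pay", "ship"],
    ["register", "pay", "check", "ship"],
    ["register", "cancel"],
]

def footprint(traces):
    """Classify every ordered pair of activities by its ordering relation."""
    activities = sorted({a for t in traces for a in t})
    # Directly-follows: (a, b) is present if b occurs right after a in some trace.
    follows = {(a, b) for t in traces for a, b in zip(t, t[1:])}
    relations = {}
    for a, b in product(activities, repeat=2):
        if (a, b) in follows and (b, a) in follows:
            relations[(a, b)] = "||"  # parallel: both orders were observed
        elif (a, b) in follows:
            relations[(a, b)] = "->"  # sequence: a is followed by b, never the reverse
        elif (b, a) in follows:
            relations[(a, b)] = "<-"  # reverse sequence
        else:
            relations[(a, b)] = "#"   # choice: the two never occur next to each other
    return relations

rel = footprint(traces)
print(rel[("register", "check")])  # ->
print(rel[("check", "pay")])       # ||
print(rel[("pay", "cancel")])      # #
```

The resulting table is often called a footprint. The Alpha Miner uses it to decide where places and arcs go in the Petri net.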
2. Heuristic Miner
The second popular process mining algorithm, the Heuristic Miner, was developed by Dr. Ton Weijters to address some of the key limitations of the Alpha Miner. In computer science, a heuristic is a technique for solving a problem more quickly by finding an approximate solution when exact methods would be too slow or would fail to find one. Heuristic algorithms are popular in artificial intelligence, where you have large amounts of data and the ability to infer good enough answers based on machine learning.
How the Heuristic Miner works in process mining
Like the Alpha Miner algorithm, the Heuristic Miner uses a directly-follows graph to show the sequence of business process steps based on event logs. The key difference is that the Heuristic Miner applies frequency-based filtering to reduce noise, meaning rare, meaningless, or incomplete event log data. The resulting flow charts are less exact than those of the Alpha Miner but more robust.
Example of a causal net used in heuristic miners. Source: r-project.org.
Heuristic mining algorithms use a representation called a causal net to map out the dependencies between activities. They look at how frequently different events follow one another and create a process model that excludes the most infrequent paths from the visualized model. The end result works well in complex data environments, for example, process mining for particularly high-volume processes.
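The frequency-based filtering can be sketched with the dependency measure commonly associated with the Heuristic Miner, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts how often b directly follows a. The toy log, the `dependency_graph` helper, and the 0.8 threshold are assumptions chosen for illustration, and self-loops are skipped for brevity.

```python
from collections import Counter

# Hypothetical log: 20 regular cases plus one noisy case where two steps are swapped.
traces = (
    [["register", "check", "pay", "ship"]] * 20
    + [["register", "pay", "check", "ship"]]
)

def dependency_graph(traces, threshold=0.8):
    """Keep only the arcs whose dependency measure reaches the threshold."""
    counts = Counter((a, b) for t in traces for a, b in zip(t, t[1:]))
    graph = {}
    for (a, b), ab in counts.items():
        if a == b:
            continue  # self-loops need a separate formula, omitted in this sketch
        ba = counts.get((b, a), 0)
        dependency = (ab - ba) / (ab + ba + 1)
        if dependency >= threshold:
            graph[(a, b)] = round(dependency, 3)
    return graph

graph = dependency_graph(traces)
for arc, value in sorted(graph.items()):
    print(arc, value)
```

With this log, the arcs that only appear in the single swapped case score 0.5 or lower and are dropped, so the model keeps the main path register, check, pay, ship.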
3. Fuzzy Miner
The Fuzzy Miner is a third core process mining algorithm suitable for mining less structured processes. It was developed by Christian W. Günther and aims to take some of the heaviest data-crunching out of process mining by focusing on what the user is looking to discover and analyze.
How the Fuzzy Miner works in process mining
The Fuzzy Miner uses significance and correlation metrics to interactively simplify the process model to the desired level of detail. In simple terms, it does the right level of data mining based on where the user is looking. If the user looks into more detail, the model will include more details. When the user looks at the high-level view, the model is clustered and becomes “fuzzier.”
Example of fuzzy mining using ProM tool. Source: tue.nl
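The idea of a model that gets "fuzzier" as the user zooms out can be sketched as follows. This is a heavily simplified illustration, not the actual Fuzzy Miner: it uses plain activity frequency as the only significance metric, and the log, the `simplify` helper, and the cutoff values are hypothetical.

```python
from collections import Counter

# Hypothetical log: a frequent main path and two rare detours.
traces = (
    [["order", "pack", "ship"]] * 8
    + [["order", "call customer", "pack", "ship"]]
    + [["order", "fix address", "pack", "ship"]]
)

def simplify(traces, node_cutoff):
    """Merge activities below the significance cutoff into one 'cluster' node."""
    frequency = Counter(a for t in traces for a in t)
    top = max(frequency.values())
    significance = {a: n / top for a, n in frequency.items()}

    def label(activity):
        return activity if significance[activity] >= node_cutoff else "cluster"

    edges = Counter()
    for t in traces:
        labels = [label(a) for a in t]
        for a, b in zip(labels, labels[1:]):
            if a != b:  # drop arcs that stay inside the cluster
                edges[(a, b)] += 1
    return edges

detailed = simplify(traces, node_cutoff=0.0)  # zoomed in: every activity is shown
overview = simplify(traces, node_cutoff=0.5)  # zoomed out: rare activities are merged
print(len(detailed), len(overview))  # 6 4
```

Raising the cutoff plays the role of the user moving to a higher-level view: the two rare detours collapse into a single cluster node and the model has fewer arcs.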
4. Inductive Miner
The Inductive Miner is another common algorithm used in process mining to discover process models from event logs. This technique relies on detecting cuts in the directly-follows graph built from the event log and using each cut to split the log into smaller sub-logs, which are then processed in the same way. The main advantage of the Inductive Miner is its flexibility and scalability.
How the Inductive Miner works in process mining
The Inductive Miner’s unique aspect is its method of discovering divisions in the directly-follows graph and using the smaller components after each division to represent the execution order of the activities. The algorithm applies this division recursively and is able to detect a wide range of process structures, from linear sequences to more complex models with concurrency, loops, and or-branches.
Inductive Miner example in process mining. Source: S.J.J. Leemans slideserve.com
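One kind of division can be sketched in a few lines: an exclusive-choice cut, found as the connected components of the directly-follows graph. The log and the helper names `xor_cut` and `split_log` are hypothetical, and the sketch covers only this single cut type. The full algorithm also looks for sequence, parallel, and loop cuts and repeats the procedure on every sub-log.

```python
# Hypothetical log with two alternative branches that share no activities.
traces = [
    ["approve", "notify approval"],
    ["reject", "notify rejection"],
    ["approve", "notify approval"],
]

def xor_cut(traces):
    """Return the connected components of the (undirected) directly-follows graph."""
    activities = {a for t in traces for a in t}
    neighbours = {a: set() for a in activities}
    for t in traces:
        for a, b in zip(t, t[1:]):
            neighbours[a].add(b)
            neighbours[b].add(a)
    components, seen = [], set()
    for start in sorted(activities):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search from the start activity
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components

def split_log(traces, components):
    """Assign each trace to the component that contains all of its activities."""
    return [[t for t in traces if set(t) <= c] for c in components]

components = xor_cut(traces)
sublogs = split_log(traces, components)
print(components)
print([len(s) for s in sublogs])  # [2, 1]
```

More than one component means the process offers an exclusive choice between the branches, and each sub-log can then be mined on its own.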
5. Genetic Miner
The Genetic Miner derives its name from biology and works in a way similar to natural selection. It uses a genetic algorithm to search a space of possible process models and identify the most likely one. The Genetic Miner can be seen as an evolutionary approach that involves mutating and combining process models to search for better ones.
How the Genetic Miner works in process mining
The Genetic Miner algorithm evaluates each process model and uses selection, crossover and mutation operations to generate new process models. The process models are evaluated and the fittest model is chosen as the final process model. The Genetic Miner is able to identify process models with multiple variants and is able to detect complex process structures, such as loops and concurrency.
Visualization of how the Genetic Miner works in process mining. Source: mlwiki.org.
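The selection, crossover, and mutation loop can be sketched with a toy genetic algorithm. Everything here is an assumption made for illustration: a "model" is just a set of allowed directly-follows arcs, and the fitness function, population size, and penalty are hypothetical choices rather than those of the actual Genetic Miner.

```python
import random
from itertools import permutations

# Hypothetical log and the search space of all possible arcs between activities.
traces = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
activities = sorted({x for t in traces for x in t})
all_edges = list(permutations(activities, 2))

def fitness(model):
    """Share of traces the model can replay, minus a small penalty per arc."""
    replayed = sum(all(e in model for e in zip(t, t[1:])) for t in traces)
    return replayed / len(traces) - 0.01 * len(model)

def crossover(first, second, rng):
    """Take each arc's presence or absence from one of the two parents at random."""
    return frozenset(
        e for e in all_edges if e in (first if rng.random() < 0.5 else second)
    )

def mutate(model, rng):
    """Flip one randomly chosen arc (add it if absent, remove it if present)."""
    return model ^ {rng.choice(all_edges)}

def evolve(generations=30, size=20, seed=0):
    rng = random.Random(seed)
    population = [
        frozenset(e for e in all_edges if rng.random() < 0.5) for _ in range(size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: size // 2]  # selection: the fitter half survives
        children = []
        while len(parents) + len(children) < size:
            first, second = rng.sample(parents, 2)
            children.append(mutate(crossover(first, second, rng), rng))
        population = parents + children
    return max(population, key=fitness), history

best, history = evolve()
print(sorted(best), round(fitness(best), 2))
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.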
Process mining in Python quick tutorial
As a very simplified example, you can access and use open-source process mining algorithms quickly with the Python programming language.
To perform process mining in Python, you can use the pm4py library, which provides various functionalities for process discovery, conformance checking, and process enhancement. Here’s a simple step-by-step guide to get started with process mining using the pm4py library:
Step 1: Install pm4py
First, you need to install the pm4py library. You can do this using pip:
pip install pm4py
Step 2: Import required libraries
Next, import the required libraries in your Python script:
import pm4py
from pm4py.objects.log.importer.xes import importer as xes_importer
from pm4py.algo.discovery.alpha import algorithm as alpha_miner
# Recent pm4py 2.x releases name this module "petri_net"; older releases used "petrinet".
from pm4py.visualization.petri_net import visualizer as pn_visualizer
Step 3: Load the event log
Load the event log data using the appropriate importer. In this example, we will use an XES event log file:
# Replace the placeholder with the path to your own XES file.
event_log_file = "path/to/your/event_log.xes"
log = xes_importer.apply(event_log_file)
You can also create an event log from a CSV file using the pm4py library. Here’s an example:
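import pandas as pd
from pm4py.objects.conversion.log import converter as log_converter
from pm4py.objects.log.util import dataframe_utils

csv_file = "path/to/your/csv_file.csv"
dataframe = pd.read_csv(csv_file)
dataframe = dataframe_utils.convert_timestamp_columns_in_df(dataframe)
# The converter expects the standard XES column names; the source names below are placeholders.
dataframe = dataframe.rename(columns={
    "case_id_column_name": "case:concept:name",
    "activity_column_name": "concept:name",
    "timestamp_column_name": "time:timestamp",
})
dataframe = dataframe.sort_values("time:timestamp")
log = log_converter.apply(dataframe)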
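To make the structure of an event log more concrete, here is a minimal sketch that uses only the Python standard library, not pm4py. The CSV content and the column names `case_id`, `activity`, and `timestamp` are made up for illustration. It shows how rows of events are ordered by time and grouped into one trace per case, which is the form discovery algorithms work on.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical CSV export: one row per event, in no particular order.
raw = """case_id,activity,timestamp
1,register,2024-01-01T09:00:00
2,register,2024-01-01T09:05:00
1,pay,2024-01-01T09:30:00
2,cancel,2024-01-01T09:20:00
1,ship,2024-01-02T08:00:00
"""

events = list(csv.DictReader(io.StringIO(raw)))
# Order events by time so that each trace reflects the real execution order.
events.sort(key=lambda e: datetime.fromisoformat(e["timestamp"]))

# Group activities by case: each case becomes one trace.
traces = defaultdict(list)
for event in events:
    traces[event["case_id"]].append(event["activity"])

print(dict(traces))
# {'1': ['register', 'pay', 'ship'], '2': ['register', 'cancel']}
```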
Step 4: Apply a process discovery algorithm
Apply a process discovery algorithm to the event log data to extract the process model. In this example, we will use the Alpha Miner algorithm:
net, initial_marking, final_marking = alpha_miner.apply(log)
You can also try other process discovery algorithms like the Inductive Miner or Heuristics Miner, which are available in the pm4py library.
Step 5: Visualize the process model
Visualize the discovered process model using the Petri net visualizer:
gviz = pn_visualizer.apply(net, initial_marking, final_marking)
pn_visualizer.view(gviz)
This will display the Petri net visualization of the discovered process model.
Step 6: Perform additional analysis (optional)
You can use the pm4py library to perform additional analyses like conformance checking, bottleneck analysis, or performance analysis. Explore the library’s documentation and examples to learn more about these functionalities.
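To give a feel for what a bottleneck analysis computes, the sketch below measures the average waiting time between consecutive activities using only the standard library. It is not pm4py's implementation; the events and the `mean_waiting_hours` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (case id, activity, ISO timestamp).
events = [
    ("1", "register", "2024-01-01T09:00:00"),
    ("1", "approve", "2024-01-01T10:00:00"),
    ("1", "ship", "2024-01-03T10:00:00"),
    ("2", "register", "2024-01-01T11:00:00"),
    ("2", "approve", "2024-01-01T11:30:00"),
    ("2", "ship", "2024-01-04T11:30:00"),
]

def mean_waiting_hours(events):
    """Average time in hours between each pair of consecutive activities."""
    by_case = defaultdict(list)
    for case, activity, timestamp in events:
        by_case[case].append((datetime.fromisoformat(timestamp), activity))
    waits = defaultdict(list)
    for steps in by_case.values():
        steps.sort()  # order each case's events by time
        for (start, a), (end, b) in zip(steps, steps[1:]):
            waits[(a, b)].append((end - start).total_seconds() / 3600)
    return {pair: sum(hours) / len(hours) for pair, hours in waits.items()}

averages = mean_waiting_hours(events)
bottleneck = max(averages, key=averages.get)
print(bottleneck, averages[bottleneck])  # ('approve', 'ship') 60.0
```

In this toy data, the handover from approve to ship takes 60 hours on average, compared with under an hour from register to approve, so that is where an analyst would look first.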
By following these steps, you can perform basic process mining in Python using the pm4py library. For more advanced use cases and customizations, refer to the official pm4py documentation and examples:
Official documentation: https://pm4py.fit.fraunhofer.de/documentation
Examples: https://github.com/pm4py/pm4py-core/tree/stable/examples
The above example is an oversimplification of the needs of most enterprise leaders. The reality of developing and applying process analytics is often a lot more complicated, so increasingly businesses are opting for dedicated solutions from vendors.
No-code alternative to process mining
If you’re looking for an effortless alternative to process mining, you could be interested in Process Intelligence. It’s a hybrid approach that combines elements of task mining and process mining without the need for data science or integration hassle. For more information, read the latest whitepaper.
Process mining Q&A:
1. What is process mining?
Process mining is a technique that analyzes event logs to create visual process models, providing valuable insights into the current state of a business process and identifying areas for improvement.
2. What’s the difference between process mining and task mining?
Process mining and task mining both provide insights relevant to business process management, but they work in slightly different ways. Process mining gathers data from event logs in enterprise source systems, while task mining gathers information from the user interface of workstations.
3. Is process mining suitable for all industries?
While process mining can be applied across various industries, it is particularly beneficial for businesses with complex processes and high volumes of data. These organizations can leverage process mining to gain valuable insights into their processes and drive significant improvements in efficiency, cost savings, and customer satisfaction.
4. How is process mining different from business intelligence?
Process mining is a subset of business intelligence in which you apply BI methodologies and data science techniques to business process management.