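  1. These notes use the child parameter of RMarkdown to stitch documents together.
  2. Notes stitched together this way are easy to review.
  3. Please submit related questions as an Issue.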

Reference: Ferreira (2017)

This book is very basic, and the methods it studies are local in scope.

1 Process Model

Process mining is concerned with a different goal: the aim of process mining is to take advantage of event data in order to understand how an organization works. For example, with process mining it is possible to discover the sequence of tasks that are performed in a certain business process, and also the interactions that take place between the participants in that process.

Process mining works on event data and studies sequences of events.

An activity is a unit of work that makes sense from a business point of view. For example, creating a purchase order or approving a purchase request are examples of what could be seen as activities in a purchase process.

An activity is a single unit of work in the process.

a representation of the process that is usually called a process model. It is a graphical, step-by-step description of the process in a form that is similar to a flowchart.

This kind of flowchart is what is usually called a process model.

The instances of this process may have different behaviors. For example, suppose that there are two purchase requests, where one is approved and the other is not. Then these two instances will follow different paths in the model of Fig. 1: the first will go through activity d and the second will end up in activity c.

Each instance follows one path through the model, so different paths correspond to different instances.

Even if two instances follow the same path, there may be differences in their behavior. For example, since activity g is in parallel with e and f, it may happen that g is performed before e and f in one instance, and in another instance it is performed after e and f, or even in between them.

A behavior is not the same thing as an instance: several different instances can share the same behavior.

2 Task Allocation

Each instance of these activities will have to be performed by someone. We refer to each of these activity instances as a task.

Here an instance (of an activity) is equivalent to a task.

It appears that this distribution is somewhat unbalanced, and this may be due to several reasons, such as the length or complexity of each task, the responsibilities of each user, the fact that some users may not be available at a certain moment, etc. There may be some level of unpredictability in the allocation of work. Namely, it is possible to discover the workload assigned to each user.

This is similar to volume-based workload estimation in asset recovery.

3 Unique Identifier

In the context of process mining, the identifier of a process instance is usually called the case id. The reason for this is that, in many business scenarios, each process instance is referred to as a case, so the term case id became popular.

case id is effectively a term of art; it plays a role similar to the unit of a loan listing or of an installment.

4 Event Log

For the purpose of process mining, each event that is recorded in an event log should contain at least the following information:

  1. a case id, which identifies the process instance;
  2. a task name, which identifies the activity that has been performed;
  3. a user name, which identifies the participant who performed the task;
  4. a timestamp, which indicates the date and time when the task was completed
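The four fields above can be sketched as a small record type. This is only an illustration: the `Event` class, its field names, and the `parse_event` helper are assumptions made for these notes, not something defined in the book. The sample line is one row of the event log used in section 5.1.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    """One row of an event log (field names are illustrative assumptions)."""
    case_id: str         # identifies the process instance
    task: str            # identifies the activity that was performed
    user: str            # identifies the participant who performed the task
    timestamp: datetime  # date and time when the task was completed


def parse_event(line: str) -> Event:
    """Parse a line of the form 'case_id task user YYYY-MM-DD HH:MM:SS'."""
    case_id, task, user, date, time = line.split()
    stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return Event(case_id, task, user, stamp)


event = parse_event("1 a u1 2016-04-09 17:36:47")
print(event.case_id, event.task, event.user, event.timestamp.isoformat())
```

Keeping the timestamp as a `datetime` rather than a string makes later duration calculations straightforward.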

5 Control-Flow Perspective

5.1 Transition Matrix and Edge Thickness

If we have \(N\) activities, then there are \(N^2\) possible transitions between these activities. For example, with three activities \(\{a; b; c\}\) there are nine possible transitions, namely:

\[a\to a; a \to b; a \to c; b \to a; b \to b; b \to c; c \to a; c \to b; c \to c\]

The transition matrix records how often each pairwise transition occurs; every possible pair is enumerated.

case_id  activity  user_id  time
1        a         u1       2016-04-09 17:36:47
1        b         u3       2016-04-11 09:11:13
1        d         u6       2016-04-12 10:00:12
1        e         u7       2016-04-12 18:21:32
1        f         u8       2016-04-13 13:27:41
2        a         u2       2016-04-14 08:56:09
2        b         u3       2016-04-14 09:36:02
2        d         u5       2016-04-15 10:16:40
1        g         u6       2016-04-18 19:14:14
2        g         u6       2016-04-19 15:39:15
1        h         u2       2016-04-19 16:48:16
2        e         u7       2016-04-20 14:39:45
2        f         u8       2016-04-22 09:16:16
3        a         u2       2016-04-25 08:39:24
2        h         u1       2016-04-26 12:19:46
3        b         u4       2016-04-29 10:56:14
3        c         u1       2016-04-30 15:41:22

activity_left  activity_right  transition_n
a              b               3
b              c               1
b              d               2
d              e               1
d              g               1
e              f               2
f              g               1
f              h               1
g              e               1
g              h               1
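The transition counts can be reproduced from the event log with a short script. This is a minimal sketch, assuming that a transition means "the next event of the same case, ordered by timestamp"; the names `LOG` and `count_transitions` are made up for this illustration.

```python
from collections import Counter, defaultdict

# (case_id, activity, timestamp) rows copied from the event log above.
LOG = [
    ("1", "a", "2016-04-09 17:36:47"), ("1", "b", "2016-04-11 09:11:13"),
    ("1", "d", "2016-04-12 10:00:12"), ("1", "e", "2016-04-12 18:21:32"),
    ("1", "f", "2016-04-13 13:27:41"), ("2", "a", "2016-04-14 08:56:09"),
    ("2", "b", "2016-04-14 09:36:02"), ("2", "d", "2016-04-15 10:16:40"),
    ("1", "g", "2016-04-18 19:14:14"), ("2", "g", "2016-04-19 15:39:15"),
    ("1", "h", "2016-04-19 16:48:16"), ("2", "e", "2016-04-20 14:39:45"),
    ("2", "f", "2016-04-22 09:16:16"), ("3", "a", "2016-04-25 08:39:24"),
    ("2", "h", "2016-04-26 12:19:46"), ("3", "b", "2016-04-29 10:56:14"),
    ("3", "c", "2016-04-30 15:41:22"),
]


def count_transitions(log):
    """Count how often one activity is directly followed by another in a case."""
    by_case = defaultdict(list)
    for case_id, activity, stamp in log:
        by_case[case_id].append((stamp, activity))
    counts = Counter()
    for events in by_case.values():
        events.sort()  # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically
        sequence = [activity for _, activity in events]
        counts.update(zip(sequence, sequence[1:]))
    return counts


transitions = count_transitions(LOG)
for (left, right), n in sorted(transitions.items()):
    print(left, right, n)
```

Grouping by case before pairing neighbours matters because the log interleaves events from different cases.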

From this table we can draw the graph and obtain the flowchart, i.e. the process model.

This is the control-flow algorithm. Note that all edge labels here are frequencies.

Adjusting the edge width (thickness) makes the thickest path, i.e. the most frequent behavior, easy to spot.


Figure 5.1: Path discovery: a -> b -> d -> g -> e -> f -> h

Of course the width must be neither too large nor too small, so we rescale it.

\[y=y_{\min }+\left(y_{\max }-y_{\min }\right) \frac{x-x_{\min }}{x_{\max }-x_{\min }}\]

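  1. \(y_{\max }\) is the largest thickness we want, e.g. 5
  2. \(y_{\min }\) is the smallest thickness we want, e.g. 1
  3. \(x\) is the original frequency
  4. \(x_{\min }\) and \(x_{\max }\) are the smallest and largest frequencies observed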
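The rescaling formula can be checked with a few lines of code. This is a sketch under assumptions: the function name `scale` and the midpoint fallback for the case where all frequencies are equal are choices made here, and the three edges are a subset of the transition table above. `penwidth` is the Graphviz edge attribute that controls line thickness.

```python
def scale(x, x_min, x_max, y_min=1.0, y_max=5.0):
    """Map a frequency x in [x_min, x_max] linearly onto [y_min, y_max]."""
    if x_max == x_min:
        # Assumption: when every frequency is equal, use the middle width.
        return (y_min + y_max) / 2
    return y_min + (y_max - y_min) * (x - x_min) / (x_max - x_min)


# A subset of the transition counts from the table above.
counts = {("a", "b"): 3, ("b", "d"): 2, ("b", "c"): 1}
lo, hi = min(counts.values()), max(counts.values())
widths = {edge: scale(n, lo, hi) for edge, n in counts.items()}

# Emit Graphviz edges whose thickness follows the rescaled frequency.
for (left, right), width in widths.items():
    n = counts[(left, right)]
    print(f'  {left} -> {right} [label="{n}", penwidth={width:g}];')
```

With \(y_{\min}=1\) and \(y_{\max}=5\), the frequencies 1, 2 and 3 map to widths 1, 3 and 5.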

Figure 5.2: Path discovery: a -> b -> d -> g -> e -> f -> h

5.2 Activity Count and Node Coloring

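This is still not intuitive enough: as the number of nodes grows, i.e. when there are many kinds of activities, the edge styling becomes hard to see. We therefore also compute the activity count and use a color gradient to emphasize the differences between nodes.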

activity  activity_n  color
a         3           grey0
b         3           grey0
c         1           grey100
d         2           grey50
e         2           grey50
f         2           grey50
g         2           grey50
h         2           grey50

\[y=\frac{x_{\max }-x}{x_{\max }-x_{\min }} \times 100\]
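The color column can be derived from the activity counts with this formula. A minimal sketch: the function name `grey_level` and the rounding to an integer are assumptions made here, and the activity sequences are the three cases from the event log. Graphviz accepts the X11 color names `grey0` (black) through `grey100` (white), so more frequent activities get darker nodes.

```python
from collections import Counter


def grey_level(x, x_min, x_max):
    """Return 0 (darkest) for the most frequent and 100 for the least frequent."""
    if x_max == x_min:
        return 0  # assumption: a single shade when all counts are equal
    return round((x_max - x) / (x_max - x_min) * 100)


# Activities of the three cases in the event log, in order of occurrence.
activities = list("abdefgh") + list("abdgefh") + list("abc")
activity_n = Counter(activities)
lo, hi = min(activity_n.values()), max(activity_n.values())

colors = {a: f"grey{grey_level(n, lo, hi)}" for a, n in activity_n.items()}
for activity in sorted(colors):
    print(activity, activity_n[activity], colors[activity])
```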

6 Organizational Perspective

The organizational perspective includes different kinds of analysis which are related to the participants in a business process.

The organizational perspective looks at the process participants and how they perform.

6.2 Working Together

This is not about handover of work; it counts how many times users work together.

Note that the second loop starts from \(i+1\). This ensures that \(j > i\) and therefore we are iterating through all pairs of users \((u_i, u_j)\) with \(j > i\), as desired.

In other words, each pair of users must be counted only once.
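The pair iteration can be sketched as follows. The sketch assumes that "working together" means the number of cases in which both users took part; the names `LOG` and `working_together` are made up for this illustration, and the (case_id, user_id) pairs come from the event log in section 5.1.

```python
from collections import Counter, defaultdict

# (case_id, user_id) pairs taken from the event log in section 5.1.
LOG = [
    ("1", "u1"), ("1", "u3"), ("1", "u6"), ("1", "u7"), ("1", "u8"),
    ("2", "u2"), ("2", "u3"), ("2", "u5"), ("1", "u6"), ("2", "u6"),
    ("1", "u2"), ("2", "u7"), ("2", "u8"), ("3", "u2"), ("2", "u1"),
    ("3", "u4"), ("3", "u1"),
]


def working_together(log):
    """Count, for each pair of users, the cases in which both participated."""
    users_by_case = defaultdict(set)
    for case_id, user in log:
        users_by_case[case_id].add(user)
    counts = Counter()
    for users in users_by_case.values():
        ordered = sorted(users)
        for i in range(len(ordered)):
            # Starting at i + 1 guarantees j > i, so each pair appears once.
            for j in range(i + 1, len(ordered)):
                counts[(ordered[i], ordered[j])] += 1
    return counts


together = working_together(LOG)
print(together[("u1", "u2")], together[("u1", "u3")], together[("u1", "u4")])
```

Using a set per case means a user who performs several tasks in one case is still counted once for that case.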

In Graphviz, an undirected graph is defined with the keyword graph instead of digraph. In addition, the edges in an undirected graph are defined with a double dash (--) rather than with an arrow (->).
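A short sketch of how such an undirected graph could be generated as DOT text. The helper `to_dot`, the graph name `G`, and the two sample pair counts are illustrative assumptions; only the `graph` keyword and the `--` edge operator come from the passage above.

```python
def to_dot(pair_counts):
    """Render {(user_a, user_b): count} as an undirected Graphviz graph."""
    lines = ["graph G {"]
    for (user_a, user_b), n in sorted(pair_counts.items()):
        # '--' declares an undirected edge; the label shows the count.
        lines.append(f'  {user_a} -- {user_b} [label="{n}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical counts, used only to show the output format.
dot = to_dot({("u1", "u2"): 3, ("u1", "u3"): 2})
print(dot)
```

The resulting text can be saved to a file and rendered with the `dot` command-line tool.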

7 Performance Perspective

The performance perspective is concerned mainly with time. Examples of interesting time measurements are the average time it takes to perform an activity, the maximum time it takes for the process to reach a certain point, or the average end-to-end duration of each process instance.

This perspective examines performance over time; in essence the approach does not differ from the control-flow perspective.
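As an illustration of one such measurement, the end-to-end duration of each case can be computed as the gap between its earliest and latest timestamp. This is a sketch, not the book's code: the names `LOG`, `parse` and `case_durations` are assumptions, and only the first and last event of each case from the event log in section 5.1 are included.

```python
from datetime import datetime, timedelta

# First and last event of each case, taken from the event log in section 5.1.
LOG = [
    ("1", "2016-04-09 17:36:47"), ("1", "2016-04-19 16:48:16"),
    ("2", "2016-04-14 08:56:09"), ("2", "2016-04-26 12:19:46"),
    ("3", "2016-04-25 08:39:24"), ("3", "2016-04-30 15:41:22"),
]


def parse(stamp):
    """Convert 'YYYY-MM-DD HH:MM:SS' into a datetime object."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")


def case_durations(log):
    """Return {case_id: last timestamp minus first timestamp}."""
    first, last = {}, {}
    for case_id, stamp in log:
        t = parse(stamp)
        first[case_id] = min(first.get(case_id, t), t)
        last[case_id] = max(last.get(case_id, t), t)
    return {case_id: last[case_id] - first[case_id] for case_id in first}


durations = case_durations(LOG)
average = sum(durations.values(), timedelta()) / len(durations)
for case_id, duration in sorted(durations.items()):
    print(case_id, duration)
print("average", average)
```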

Appendix

References

Ferreira, Diogo R. 2017. Organizational Perspective. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-56427-2_3.