天天看點

工作流挖掘:相關問題和方法的研究(4)相關的工作

2. Related work

The idea of process mining is not new [8,11,15–17,24–29,42–44,53–57,61–63]. Cook and Wolf have investigated similar issues in the context of software engineering processes. In [15] they describe three methods for process discovery: one using neural networks, one using a purely algorithmic approach, and one Markovian approach. The authors consider the latter two the most promising approaches. The purely algorithmic approach builds a finite state machine(FSM) where states are fused if their futures (in terms of possible behavior in the next k steps) are identical. The Markovian approach uses a mixture of algorithmic and statistical methods and is able to deal with noise. Note that the results presented in [6] are limited to sequential behavior. Cook and Wolf extend their work to concurrent processes in [16]. They propose specific metrics (entropy, event type counts, periodicity, and causality) and use these metrics to discover models out of event streams. However, they do not provide an approach to generate explicit process models. Recall that the final goal of the approach presented in this paper is to find explicit representations for a broad range of process models, i.e., we want to be able to generate a concrete Petri net rather than a set of dependency relations between events. In [17] Cook and Wolf provide a measure to quantify discrepancies between a process model and the actual behavior as registered using event-based data. The idea of applying process mining in the context of workflow management was first introduced in [11]. This work is based on workflow graphs, which are inspired by workflow products such as IBM MQSeries workflow (formerly known as Flowmark) and InConcert. In this paper, two problems are defined. The first problem is to find a workflow graph generating events appearing in a given workflow log. The second problem is to find the definitions of edge conditions. A concrete algorithm is given for tackling the first problem. The approach is quite different from other approaches: Because the nature of workflow graphs there is no need to identify the nature (AND or OR) of joins and splits. As shown in [37], workflow graphs use true and false tokens which do not allow for cyclic graphs. Nevertheless, [11] partially deals with iteration by enumerating all occurrences of a given task and then folding the graph. However, the resulting conformal graph is not a complete model. In [44], a tool based on these algorithms is presented. Schimm [53,54,57] has developed a mining tool suitable for discovering hierarchically structured workflow processes. This requires all splits and joins to be balanced. Herbst and Karagiannis also address the issue of process mining in the context of workflow management [24–29] using an inductive approach. The work presented in [27,29] is limited to sequential models. The approach described in [24–26,28] also allows for concurrency. It uses stochastic task graphs as an intermediate representation and it generates a workflow model described in the ADONIS modeling language. In the induction step task nodes are merged and split in order to discover the underlying process. A notable difference with other approaches is that the same task can appear multiple times in the workflow model. The graph 240 W.M.P. van der Aalst et al. / Data & Knowledge Engineering 47 (2003) 237–267 generation technique is similar to the approach of [11,44]. The nature of splits and joins (i.e., AND or OR) is discovered in the transformation step, where the stochastic task graph is transformed into an ADONIS workflow model with block-structured splits and joins. In contrast to the previous papers, the work in [8,42,43,61,62] is characterized by the focus on workflow processes with concurrent behavior (rather than adding ad hoc mechanisms to capture parallelism). In [61,62] a heuristic approach using rather simple metrics is used to construct so-called “dependency/frequency tables” and “dependency/frequency graphs”. In [42] another variant of this technique is presented using examples from the health-care domain. The preliminary results presented in [42,61,62] only provide heuristics and focus on issues such as noise. The approach described in [8] differs from these approaches in the sense that for the a algorithm it is proven that for certain subclasses it is possible to find the right workflow model. In [3] the a algorithm is extended to incorporate timing information.

2.    相關的工作

過程挖掘的想法并不新鮮(譯注:可于參考文獻8,11,15-17,24-29,42-44,53-57,61-63中查閱到相關論述)。庫克和沃爾夫已經在軟體工程過程領域就類似問題做過研究。在參考文獻[15]中他們描述了過程發現的3種方法:通過神經網路、單純的算法步驟以及馬爾科夫鍊方法。作者認為後兩種是最有前景的方法。純算法構造了一種有限狀态機(FSM),(根據之後的若幹步可能的行為動作)如果其預期是可以斷定的,它的狀态就是确定的。馬爾科夫鍊則是算術和統計方法的混合體,它能(有效地)處理“噪音”。在參考文獻[6]中對于連續行為提到的結論是有限的。在參考文獻[16]中, 庫克和沃爾夫将他們的研究擴充到了并發過程。他們提到了一些特殊的名詞(熵,事件類型統計,周期和因果關系)并且通過這些從事件流中發現一些模式。然而, 他們并沒有提供一種生成直接的過程模式的方法。重申一下,本文所展示的方法的最終目的是尋找更廣範圍内的過程模式的直接表示方法,即:我們想要并且能夠構 造一種具體的Petri網絡而不僅僅限于一組事件的依賴關系。在參考文獻[17]中,庫克和沃爾夫提供了一種度量方法(形式)以量化過程模型和利用事件為基礎的資料注冊的實際行為之間的差異。在參考文獻[11]中首次介紹了在工作流管理方面應用過程挖掘的想法。這一著作用到了像IBM MQSeries工作流(大家所熟知的Flowmark)和InConcert等工作流産品所推薦的工作流圖表。本文将就如下2個問題給出定義,第1個問題是通過一種工作流圖表去再現工作流日志中提到的事件的,第2個問題是定義邊界條件。(工作流挖掘)方法不同去其他方法:因為工作流圖表的初衷是沒必要識别結合和分離的特性(與或者或)的。正如參考文獻[37]所提到的,工作流圖表由真和假的分支構成并且不允許循環。進而,參考文獻[11]鐘通過列舉給定任務的所有可能性并折疊圖表的方式部分處理了這些反複(循環)。然而,最終形成的圖表并不是一種完整的模式。在參考文獻[44]中,提出了一種基于這些算法的工具。Schimm在參考文獻[53,54,57]中開發了一種适于各級工作流過程的挖掘工具。這就要求所有的分離和結合都是對等的,在參考文獻[24-29]中,Herbst和Karagiannis通過一種引導性的方法提出了在工作流過程中的過程挖掘問題。參考文獻[27,29]中的成果局限于連續性模式。參考文獻[24-26,28]中描述的方法也允許同時發生(使用)。

繼續閱讀