How?
The literature makes a clear distinction between models that are interpretable by design and those that can be explained by means of external XAI techniques. This duality could also be regarded as the difference between interpretable models and model interpretability techniques; a more widely accepted classification is that of transparent models versus post-hoc explainability. The same duality also appears in [17], where the authors distinguish between methods that solve the transparent-box design problem and methods that address the problem of explaining a black box. This work further extends that distinction by considering the different levels of transparency within transparent models.
Within transparency, three levels are contemplated: algorithmic transparency, decomposability and simulatability. Among post-hoc techniques we may distinguish text explanations, visualizations, local explanations, explanations by example, explanations by simplification and feature relevance. In this context, a broader distinction is proposed in [24], discerning between 1) opaque systems, where the mappings from input to output are invisible to the user; 2) interpretable systems, in which users can mathematically analyze the mappings; and 3) comprehensible systems, in which the models output symbols or rules along with their specific output to aid the understanding of the rationale behind the mappings being made. This last classification criterion can be considered subsumed by the one proposed earlier, hence this paper will attempt to follow the more specific one.
2.5.1 Levels of Transparency in Machine Learning Models
Transparent models convey some degree of interpretability by themselves. Models belonging to this category can also be characterized in terms of the domain in which they are interpretable, namely algorithmic transparency, decomposability and simulatability. As we elaborate next in connection with Figure 3, each of these classes contains its predecessors, e.g. a simulatable model is at the same time decomposable and algorithmically transparent:
(1) Simulatability denotes the ability of a model to be simulated or thought about strictly by a human, hence complexity takes a dominant place in this class. This being said, simple but extensive (i.e., with too large an amount of rules) rule-based systems fall outside this characteristic, whereas a single-perceptron neural network falls within it. This aspect aligns with the claim that sparse linear models are more interpretable than dense ones [170], and that an interpretable model is one that can be easily presented to a human by means of text and visualizations [32]. Again, endowing a decomposable model with simulatability requires that the model be self-contained enough for a human to think and reason about it as a whole.
(2) Decomposability stands for the ability to explain each of the parts of a model (inputs, parameters and calculations). It can be considered as intelligibility as stated in [171]. This characteristic might empower the ability to understand, interpret or explain the behavior of a model. However, as occurs with algorithmic transparency, not every model can fulfill this property. Decomposability requires every input to be readily interpretable (e.g. cumbersome features will not fit the premise). The added constraint for an algorithmically transparent model to become decomposable is that every part of the model must be understandable by a human without the need for additional tools.
(3) Algorithmic transparency can be seen in different ways. It deals with the ability of the user to understand the process followed by the model to produce any given output from its input data. Put differently, a linear model is deemed transparent because its error surface can be understood and reasoned about, allowing the user to understand how the model will act in every situation it may face [163]. In contrast, this is not possible in deep architectures, as the loss landscape might be opaque [172, 173]: it cannot be fully observed and the solution has to be approximated through heuristic optimization (e.g. stochastic gradient descent). The main constraint for algorithmically transparent models is that the model has to be fully explorable by means of mathematical analysis and methods. A minimal code sketch illustrating these three levels on a small linear model is given after this list.
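To make these levels concrete, the following minimal sketch (not taken from the referenced works; the data, feature names and library calls are illustrative assumptions) fits a small linear model: its closed-form fit can be analyzed mathematically (algorithmic transparency), each weight is attached to a single meaningful input (decomposability), and the whole model is small enough for a human to reproduce a prediction by hand (simulatability).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: predicting a price from two named, interpretable features.
X = np.array([[50.0, 2], [80.0, 3], [120.0, 4], [65.0, 2]])  # [area_m2, n_rooms]
y = np.array([100.0, 160.0, 250.0, 130.0])                   # price (in k$)

model = LinearRegression().fit(X, y)

# Decomposability: every part of the model (one weight per input plus a bias)
# is individually meaningful and can be inspected without additional tools.
print("weight for area_m2:", model.coef_[0])
print("weight for n_rooms:", model.coef_[1])
print("bias              :", model.intercept_)

# Simulatability: the model is small enough to be simulated by a human,
# i.e. a prediction can be reproduced by hand from the inspected parts.
x_new = np.array([70.0, 3])
manual_prediction = model.intercept_ + model.coef_ @ x_new
print("manual prediction :", manual_prediction)
print("model prediction  :", model.predict([x_new])[0])
```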
Figure 3: Conceptual diagram exemplifying the different levels of transparency characterizing a ML model Mϕ, with ϕ denoting the parameter set of the model at hand: (a) simulatability; (b) decomposability; (c) algorithmic transparency. Without loss of generality, the example focuses on the ML model as the explanation target. However, other targets for explainability may include a given example, the output classes or the dataset itself.
2.5.2 Post-hoc Explainability Techniques for Machine Learning Models
Post-hoc explainability targets models that are not readily interpretable by design, resorting to diverse means to enhance their interpretability, such as text explanations, visual explanations, local explanations, explanations by example, explanations by simplification and feature relevance explanation techniques. Each of these techniques covers one of the most common ways in which humans explain systems and processes.
Further along this line, actual techniques, or better put, actual groups of techniques, are specified to ease the future work of any researcher who intends to look up a specific technique suited to their needs. Not ending there, the classification also includes the type of data to which each technique has been applied. Note that many techniques may be suitable for many different types of data, although the categorization only considers the type used by the authors who proposed each technique. Overall, post-hoc explainability techniques are divided first by the intention of the author (explanation technique, e.g. explanation by simplification), then by the method utilized (actual technique, e.g. sensitivity analysis) and finally by the type of data to which it was applied (e.g. images). Minimal code sketches illustrating several of these families are given after the list below.
(1) Text explanations deal with the problem of bringing explainability to a model by learning to generate text explanations that help explain the results of the model [169]. Text explanations also include every method that generates symbols representing the functioning of the model. These symbols may portray the rationale of the algorithm by means of a semantic mapping from model to symbols.
(2) Visual explanation techniques for post-hoc explainability aim at visualizing the model's behavior. Many of the visualization methods existing in the literature come along with dimensionality reduction techniques that allow for a simple, human-interpretable visualization. Visualizations may be coupled with other techniques to improve their understanding, and are considered the most suitable way to introduce complex interactions among the variables involved in the model to users not acquainted with ML modeling.
(3) Local explanations tackle explainability by segmenting the solution space and giving explanations to less complex solution subspaces that are relevant for the whole model. These explanations can be formed by means of techniques with the differentiating property that they only explain part of the whole system's functioning.
(4) Explanations by example consider the extraction of data examples that relate to the result generated by a certain model, enabling a better understanding of the model itself. Similarly to how humans behave when attempting to explain a given process, explanations by example are mainly centered on extracting representative examples that grasp the inner relationships and correlations found by the model being analyzed.
(5) Explanations by simplification collectively denote those techniques in which a whole new system is rebuilt based on the trained model to be explained. This new, simplified model usually attempts to optimize its resemblance to the functioning of its antecedent, while reducing its complexity and keeping a similar performance score. An interesting byproduct of this family of post-hoc techniques is that the simplified model is, in general, easier to implement due to its reduced complexity with respect to the model it represents.
(6) Finally, feature relevance explanation methods for post-hoc explainability clarify the inner functioning of a model by computing a relevance score for its managed variables. These scores quantify the influence (sensitivity) a feature has upon the output of the model. A comparison of the scores among different variables unveils the importance granted by the model to each of such variables when producing its output. Feature relevance methods can be thought of as an indirect way to explain a model.
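As an illustration of the visual explanation family described in (2) above, the following hedged sketch (model and data are assumptions; it relies on scikit-learn's PartialDependenceDisplay, available in recent versions) draws a partial dependence plot, i.e. the average response of a black-box model as one feature varies:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Hypothetical black-box model to be explained visually.
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# One curve per selected feature: the model's average output as that feature
# varies, with the remaining features marginalized over the data.
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
plt.show()
```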
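For the local explanations of (3), a minimal sketch in the spirit of local surrogate methods such as LIME follows (the black box, the perturbation scale and the kernel width are illustrative choices, not a reference implementation): the model is queried around a single instance and a weighted linear surrogate explains its behavior in that neighborhood only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = X[:, 0] ** 2 + 3 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Hypothetical black-box model and the instance x0 to be explained locally.
black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
x0 = X[0]

# 1) Perturb the neighborhood of x0 and query the black box on the perturbations.
Z = x0 + rng.normal(scale=0.5, size=(1000, X.shape[1]))
f_Z = black_box.predict(Z)

# 2) Weight each perturbed sample by its proximity to x0 (Gaussian kernel).
distances = np.linalg.norm(Z - x0, axis=1)
weights = np.exp(-(distances ** 2) / 0.5)

# 3) Fit an interpretable linear surrogate on the weighted neighborhood.
local_surrogate = Ridge(alpha=1.0).fit(Z, f_Z, sample_weight=weights)

# The coefficients explain the black box around x0 only, not globally.
print("local feature effects around x0:", local_surrogate.coef_)
```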
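For the explanations by example of (4), a minimal, assumed sketch retrieves the training instances closest to a query point as representative cases supporting the model's behavior (for a learned representation, the search could instead be run in an intermediate feature space of the model being explained):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
X_train = rng.normal(size=(200, 4))   # hypothetical training set

# Index the training data; the retrieved neighbors act as the explanation:
# "the model treats the query like these known cases".
nn = NearestNeighbors(n_neighbors=3).fit(X_train)

x_query = X_train[10] + 0.05 * rng.normal(size=4)
distances, indices = nn.kneighbors([x_query])
print("representative training examples:", indices[0])
print("distances to the query          :", distances[0])
```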
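For explanations by simplification as in (5), a hedged sketch of a global surrogate follows (black box and data are assumptions): a shallow decision tree is trained on the black box's own predictions, so that resemblance to the original model, rather than accuracy on the true labels, is what the simplified model optimizes; its fidelity to the black box can then be measured.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)

# Hypothetical black-box model to be simplified.
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# Train the surrogate on the black box's outputs, not on the true labels:
# the goal is to resemble the model, not the data.
y_bb = black_box.predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

# Fidelity: how closely the simplified model reproduces the black box.
print("fidelity to black box:", accuracy_score(y_bb, surrogate.predict(X)))
print(export_text(surrogate, feature_names=["x0", "x1", "x2"]))
```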
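Finally, for the feature relevance methods of (6), a minimal sketch based on permutation importance is given below (data and model are assumptions; the call uses scikit-learn's permutation_importance): the drop in performance observed when each feature is shuffled serves as its relevance score.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 4))
y = 5 * X[:, 0] + 2 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Relevance score per feature: mean drop in R^2 when that feature is permuted.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: relevance = {score:.3f}")
```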