
2.5、How?

The literature makes a clear distinction between models that are interpretable by design and those that can be explained by means of external XAI techniques. This duality can also be regarded as the difference between interpretable models and model interpretability techniques; a more widely accepted classification is that of transparent models versus post-hoc explainability. The same duality appears in [17], whose authors distinguish between methods that solve the transparent-box design problem and methods that explain black-box models. The present work further refines the distinction among transparent models by considering the different levels of transparency they may exhibit.

Within transparency, three levels are contemplated: algorithmic transparency, decomposability and simulatability. Among post-hoc techniques we may distinguish text explanations, visualizations, local explanations, explanations by example, explanations by simplification and feature relevance. In this context, a broader distinction is proposed in [24], discerning between 1) opaque systems, where the mappings from input to output are invisible to the user; 2) interpretable systems, in which users can mathematically analyze the mappings; and 3) comprehensible systems, in which the model outputs symbols or rules along with its predictions to aid the understanding of the rationale behind the mappings being made. Since this last classification criterion can be considered subsumed by the one proposed earlier, this paper will follow the more specific one.


2.5.1、Levels of Transparency in Machine Learning Models

Transparent models convey some degree of interpretability by themselves. Models belonging to this category can also be approached in terms of the domain in which they are interpretable, namely algorithmic transparency, decomposability and simulatability. As we elaborate next in connection with Figure 3, each of these classes contains its predecessors; e.g., a simulatable model is at the same time decomposable and algorithmically transparent. A minimal code sketch contrasting a transparent model with an opaque one is given after the list:

(1)、Simulatability denotes the ability of a model to be simulated or thought about strictly by a human, hence complexity takes a dominant place in this class. This being said, simple but extensive (i.e., with an excessively large number of rules) rule-based systems fall outside this characteristic, whereas a single-perceptron neural network falls within it. This aspect aligns with the claim that sparse linear models are more interpretable than dense ones [170], and that an interpretable model is one that can be easily presented to a human by means of text and visualizations [32]. Again, endowing a decomposable model with simulatability requires that the model be self-contained enough for a human to think and reason about it as a whole.

(2)、Decomposability stands for the ability to explain each of the parts of a model (inputs, parameters and calculations). It can be equated to intelligibility as stated in [171]. This characteristic empowers the ability to understand, interpret or explain the behavior of a model. However, as occurs with algorithmic transparency, not every model can fulfill this property. Decomposability requires every input to be readily interpretable (e.g., cumbersome features will not fit the premise). The added constraint for an algorithmically transparent model to become decomposable is that every part of the model must be understandable by a human without the need for additional tools.

(3)、Algorithmic transparency can be seen in different ways. It deals with the ability of the user to understand the process followed by the model to produce any given output from its input data. Put differently, a linear model is deemed transparent because its error surface can be understood and reasoned about, allowing the user to understand how the model will act in every situation it may face [163]. Contrarily, deep architectures cannot be understood in this way, since the loss landscape may be opaque [172, 173]: it cannot be fully observed, and the solution has to be approximated through heuristic optimization (e.g., stochastic gradient descent). The main constraint for algorithmically transparent models is that the model must be fully explorable by means of mathematical analysis and methods.
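As an illustration of these levels, the following minimal sketch fits a sparse linear model whose handful of coefficients can be read and reasoned about directly (simulatable and decomposable), and contrasts it with a multilayer perceptron whose parameters cannot be inspected as a whole. The dataset, feature names and model choices are illustrative assumptions made for this sketch, not something prescribed by the survey; scikit-learn is assumed to be available.

```python
# A minimal sketch (synthetic data, scikit-learn) contrasting a simulatable,
# decomposable sparse linear model with an opaque neural network.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.neural_network import MLPRegressor

# Synthetic regression data: 5 features, only 2 of them informative.
X, y = make_regression(n_samples=200, n_features=5, n_informative=2,
                       noise=5.0, random_state=0)
feature_names = [f"x{i}" for i in range(X.shape[1])]

# Sparse linear model: few non-zero coefficients, each directly readable,
# so the whole model can be simulated "in the head" (simulatability) and
# every part of it -- inputs and weights -- is interpretable (decomposability).
sparse_model = Lasso(alpha=1.0).fit(X, y)
for name, coef in zip(feature_names, sparse_model.coef_):
    print(f"{name}: {coef:+.2f}")

# Opaque counterpart: a multilayer perceptron whose thousands of weights
# cannot be inspected or simulated as a whole; it would need post-hoc tools.
opaque_model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                            random_state=0).fit(X, y)
print("number of MLP weights:", sum(w.size for w in opaque_model.coefs_))
```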


Figure 3: Conceptual diagram exemplifying the different levels of transparency characterizing an ML model Mϕ, with ϕ denoting the parameter set of the model at hand: (a) simulatability; (b) decomposability; (c) algorithmic transparency. Without loss of generality, the example focuses on the ML model as the explanation target. However, other targets for explainability may include a given example, the output classes or the dataset itself.


2.5.2、Post-hoc Explainability Techniques for Machine Learning Models

Post-hoc explainability targets models that are not readily interpretable by design, resorting to diverse means to enhance their interpretability: text explanations, visual explanations, local explanations, explanations by example, explanations by simplification and feature relevance explanation techniques. Each of these techniques covers one of the most common ways in which humans explain systems and processes.

Going further, actual techniques, or rather groups of techniques, are specified to ease the future work of any researcher who intends to look up a specific technique suited to their background. Beyond that, the classification also includes the type of data to which each technique has been applied. Note that many techniques could be suitable for several different types of data, although the categorization only considers the type used by the authors who proposed the technique. Overall, post-hoc explainability techniques are divided first by the intention of the author (the explanation technique, e.g., explanation by simplification), then by the method utilized (the actual technique, e.g., sensitivity analysis) and finally by the type of data to which it was applied (e.g., images).
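To make this three-level categorization concrete, the snippet below sketches how such a taxonomy could be encoded as a nested mapping from intention to technique to data type. The specific entries are illustrative assumptions chosen for the example, not a reproduction of the survey's actual tables.

```python
# A minimal sketch of the three-level post-hoc categorization:
# author's intention -> actual technique -> data type it was demonstrated on.
# (Entries are illustrative assumptions, not the survey's tables.)
posthoc_taxonomy = {
    "explanation by simplification": {
        "rule extraction": ["tabular"],
        "surrogate decision trees": ["tabular"],
    },
    "feature relevance": {
        "sensitivity analysis": ["tabular"],
        "saliency maps": ["images"],
    },
    "visual explanation": {
        "partial dependence plots": ["tabular"],
    },
}

# Looking up which data types a given technique has been applied to.
print(posthoc_taxonomy["feature relevance"]["saliency maps"])  # ['images']
```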


(1)、Text explanations deal with the problem of bringing explainability to a model by learning to generate text explanations that help explain the results produced by the model [169]. Text explanations also include every method that generates symbols representing the functioning of the model. These symbols may portray the rationale of the algorithm by means of a semantic mapping from model to symbols.

(2)、Visual explanation techniques for post-hoc explainability aim at visualizing the model’s behavior. Many of the visualization methods existing in the literature come along with dimensionality reduction techniques that allow for a simple, human-interpretable visualization. Visualizations may be coupled with other techniques to improve their understanding, and are considered the most suitable way to introduce complex interactions among the variables involved in the model to users not acquainted with ML modeling.

(3)、Local explanations tackle explainability by segmenting the solution space and giving explanations for less complex solution subspaces that are relevant for the whole model. These explanations can be formed by means of techniques with the differentiating property that they only explain part of the whole system’s functioning (a minimal code sketch is given after this list).

(4)、Explanations by example consider the extraction of data examples that relate to the result generated by a certain model, enabling a better understanding of the model itself. Similarly to how humans behave when attempting to explain a given process, explanations by example are mainly centered on extracting representative examples that grasp the inner relationships and correlations found by the model being analyzed (see the corresponding sketch after this list).

(5)、Explanations by simplification collectively denote those techniques in which a whole new system is rebuilt based on the trained model to be explained. This new, simplified model usually attempts to optimize its resemblance to the functioning of its antecedent, while reducing its complexity and keeping a similar performance score. An interesting byproduct of this family of post-hoc techniques is that the simplified model is generally easier to implement, due to its reduced complexity with respect to the model it represents (a surrogate-model sketch follows this list).

(6)、Finally, feature relevance explanation methods for post-hoc explainability clarify the inner functioning of a model by computing a relevance score for the variables it manages. These scores quantify the influence (sensitivity) each feature has upon the output of the model. Comparing the scores of different variables unveils the importance granted by the model to each variable when producing its output. Feature relevance methods can be thought of as an indirect way of explaining a model (a permutation-based sketch is given after this list).
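As an illustration of local explanations (item 3 above), the following sketch perturbs a single instance, queries a black-box classifier on the perturbed neighbourhood, and fits a weighted linear surrogate that is only valid locally, in the spirit of LIME. The black box, the data and the perturbation scale are assumptions chosen for brevity, not a method prescribed by the survey.

```python
# A minimal sketch of a local explanation: perturb one instance, query the
# black box, and fit a small linear surrogate around that instance only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

x0 = X[0]                                                  # instance to explain
rng = np.random.default_rng(0)
Z = x0 + rng.normal(scale=0.3, size=(1000, X.shape[1]))    # local neighbourhood
pz = black_box.predict_proba(Z)[:, 1]                      # black-box outputs

# Weight perturbed samples by proximity to x0 and fit a weighted linear surrogate.
weights = np.exp(-np.linalg.norm(Z - x0, axis=1) ** 2)
surrogate = Ridge(alpha=1.0).fit(Z, pz, sample_weight=weights)
print("local linear weights:", np.round(surrogate.coef_, 3))
```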
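For explanations by example (item 4), a simple realization is to retrieve the training instances closest to the query being predicted and present them as prototypes. The dataset, model and the use of raw-feature distance are illustrative assumptions for this sketch.

```python
# A minimal sketch of explanation by example: the nearest training instances
# to a query serve as representative examples supporting the prediction.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

query = X[50:51]
print("prediction for the query:", model.predict(query))

# The three most similar training examples act as the "explanation".
nn = NearestNeighbors(n_neighbors=3).fit(X)
_, idx = nn.kneighbors(query)
for i in idx[0]:
    print("similar training example:", X[i], "label:", y[i])
```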
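For explanations by simplification (item 5), a common pattern is to train a small, transparent surrogate on the black box's own predictions and report how faithfully it mimics them. The sketch below follows that pattern; the data, models and fidelity measure are assumptions, not the survey's prescription.

```python
# A minimal sketch of explanation by simplification: a shallow decision tree
# (global surrogate) is trained to mimic the black box's predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# Fit the surrogate on the *black box's* outputs, not on the true labels.
y_bb = black_box.predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

# Fidelity: how often the simplified model agrees with its antecedent.
print("fidelity to the black box:", accuracy_score(y_bb, surrogate.predict(X)))
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(5)]))
```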
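Finally, for feature relevance (item 6), one simple model-agnostic score is permutation importance: shuffle one feature at a time and measure how much the model's performance degrades. The hand-rolled version below is a sketch under assumed data and model; in practice a held-out set would be used, and scikit-learn's permutation_importance utility could replace the loop.

```python
# A minimal sketch of a feature relevance explanation via permutation
# importance: the drop in accuracy when a feature is shuffled is used as a
# proxy for how much the model relies on that feature.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
baseline = model.score(X, y)

rng = np.random.default_rng(0)
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # destroy the feature's information
    drop = baseline - model.score(Xp, y)   # relevance = performance degradation
    print(f"feature x{j}: relevance {drop:.3f}")
```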
