
Georgia Institute of Technology: Don't put blind faith in explainability, and beware of being misled

Compiled by | Wang Ye

Proofread by | Yan Yan

Explainability is critical to the development of AI, but it is equally important to understand the potential negative effects that explainable systems can have on user trust.

Recently, a Georgia Institute of Technology research team published a new study that focuses on an important but largely overlooked class of negative effects in explainable artificial intelligence (XAI) systems.


In the paper (https://arxiv.org/pdf/2109.12480.pdf), the authors propose the concept of "explainability pitfalls" (EPs), pointing out that even when designers have no intent to manipulate users, a model's explanations can have unanticipated negative effects. EPs are distinct from, but related to, dark patterns (DPs), which are deliberately deceptive. The paper elaborates the EPs concept through a case study, shows that such negative effects of explanation are hard to avoid, and proposes concrete coping strategies at three levels: research, design, and organization.


1

The "two sides" of interpretability

The development of explainable and trustworthy next-generation AI is increasingly important, as AI is already widely used in high-stakes decision-making areas such as healthcare, finance, and criminal justice. To improve the safety of AI, we need to open the black box of its internal operations and provide users with understandable explanations.

Research on explainable AI (XAI) has made commendable progress, but recent work has found that the impact of explanations is not necessarily positive; they can also harm downstream tasks. For example, a model builder might deliberately craft misleading explanations to win users' trust in an AI system and conceal its risks. Worse still, even when the designer's intentions are good, such negative effects appear hard to avoid.

How, then, do we distinguish intentional from unintentional negative effects of explanations? And how do we conceptualize the unintentional ones?


The authors introduce the concept of "explainability pitfalls" (EPs), pointing out that AI explanations can, without users' knowledge or ability to defend themselves, mislead them into decisions that serve third parties. Unwarranted trust in AI, overestimation of its capabilities, and over-reliance on particular explanations are the main reasons users are unknowingly manipulated by "explainability."

The biggest difference between EPs and DPs lies in intent: DPs are deliberately deceptive and disregard users' interests, whereas EPs arise without such intent. However, EPs can be turned into dark patterns if someone deliberately exploits these "pitfalls."

The concept of EPs is not the result of pure theoretical derivation; it is grounded in a substantial body of practical work and experience. That work demonstrates that even without any intent to deceive, AI explanations can indeed produce unexpected negative effects.

The paper is not a comprehensive treatment of EPs, but rather a foundational step building on existing concepts and practice. The authors say the notion of explainability pitfalls is proposed to make people aware of an unexplored blind spot (the negative effects surrounding AI explanation) and thereby expand the design space of XAI systems.

2

"Explainability pitfalls" with intelligent agents

In the study, the authors investigated how two different groups, people with and without an AI background, perceive different types of AI explanations. The three kinds of explanations generated by the agents were:

(1) natural language with a sound rationale; (2) natural language without a sound rationale; (3) bare numbers, with no words, describing the agent's behavior.

Participants watched videos of three agents navigating a sequential decision-making environment (a world full of rolling boulders and flowing lava in which the agent must retrieve food supplies for trapped explorers) and provided qualitative and quantitative reports of their perceptions.

The agent exposes its "thinking" simply by outputting the numeric Q-values for the current state (Figure 1). A Q-value reflects the agent's confidence in each action, not why it is confident. Participants were not told the meaning of these Q-values in advance, so they did not know which value corresponded to which action.


Figure 1: The agent navigating the task environment
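
To make concrete what participants were (and were not) shown, here is a minimal sketch, not the authors' code, of an agent that exposes only raw Q-values for its current state. The linear Q-function, the action set, and all variable names are hypothetical; the point is that the printed numbers score the available actions without labeling them or explaining the reasoning behind them.

```python
import numpy as np

ACTIONS = ["up", "down", "left", "right"]   # hidden from participants in the study

def q_values(state: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Toy linear Q-function: one confidence score per action for the current state."""
    return state @ weights                   # shape: (len(ACTIONS),)

rng = np.random.default_rng(0)
state = rng.normal(size=8)                   # hypothetical state features
weights = rng.normal(size=(8, len(ACTIONS)))

q = q_values(state, weights)
# What participants saw: bare, unlabeled numbers with no "why".
print("Q:", np.round(q, 2))
# The greedy choice is implied by argmax, but nothing here maps values to
# actions or justifies the decision, which is why the numbers offer so
# little to act on.
print("argmax index:", int(np.argmax(q)))
```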

The experiment found that both groups of participants trusted the numbers blindly, but to different degrees and for different reasons. The authors draw on the concept of cognitive heuristics to understand why. They found that:

For participants with an AI background, the mere presence of numbers triggered heuristic thinking. They did not fully understand the logic behind the agent's decisions, yet they associated the mathematical notation with a logical, algorithmic thought process. Interestingly, they rated the "weirdest-behaving" agent as the smartest, suggesting that they not only placed too much weight on numerical output but also treated poorly defined numbers as potentially actionable. "Actionability" here refers to what people can do with the information when judging or predicting future behavior.

So how actionable are these numbers in practice? As emphasized above, a Q-value does not reveal the reason behind a decision. Beyond scoring the available actions, these figures offer little to act on. In other words, participants developed excessive trust in, and misplaced assessments of, the agents.

For participants without an AI background, even the inability to understand the complex numbers triggered heuristic reasoning: in their view, the agent must be intelligent, and the numbers were its own "mysterious and incomprehensible" language. Note that this reasoning differs from that of the AI-background group, who assumed the numbers would be actionable in the future even though they could not comprehend them now.

As we have seen, unlabeled, incomprehensible numbers actually increased both groups' trust in and evaluation of the agents. This case study shows that even without any intent to deceive, EPs can produce unanticipated results and mislead participants into over-relying on the numbers the agents generate.

It should be emphasized that this case assumes the Q-values were presented with good intentions. If someone were to manipulate such numbers and exploit these latent risks to deliberately design dark patterns, then, given users' heuristic trust in numbers, even more people would be misled into over-trusting and misjudging the system.

3

What are the coping strategies?

In summary, explainability pitfalls (EPs) have two characteristics: they can arise without any intent to cause downstream harm, and existing knowledge cannot predict when, how, or why an AI explanation will trigger unexpected negative downstream effects.

Given these two points, the authors argue that although we are unlikely to eliminate the negative effects of explanation entirely, we need to be aware that "pitfalls" exist, understand when they tend to appear and how they operate, and take corresponding measures to nip the problem in the bud. The paper proposes strategies at three interrelated levels: research, design, and organization.

At the research level, conduct more human-centered, situated, and empirical research to build a nuanced, multi-dimensional understanding of how different stakeholders respond to different explanations. Downstream effects, such as a user's perception of an AI explanation, can only be identified once they manifest in practice. In the case above, users with and without AI backgrounds fell into the same pitfall (excessive trust in the numbers) but through different heuristic patterns.

Building on this case, we can explore further along the dimensions of users' knowledge background and differences in understanding: How do combined user characteristics (such as educational and professional background) affect susceptibility to EPs? How do different heuristics lead to adverse effects? How do different users adapt to unexpected explanations? In such explorations, an awareness of pitfalls helps us see how people's responses to AI explanations deviate from designers' intentions.

At the design level, an effective strategy is to encourage users to reflect on, rather than blindly accept, explanations. Recent human-centered XAI work likewise advocates approaches that foster trust through reflection. Langer et al. point out that when we do not think about an explanation consciously and deliberately, we are more likely to fall into a "pitfall." To capture people's attention, Langer et al. suggest designing for "effortful" or "thoughtful" responses, and a seamful design perspective can help sustain that attention. Seamful design complements the notion of "seamlessness" in computing systems, which has its roots in ubiquitous computing. The concept of seams fits XAI well because (a) AI systems are deployed in seamful spaces, and (b) it can be seen as a response to "seamless," dark-pattern-style AI decision-making that demands zero friction and no comprehension.

In terms of form and function, seams strategically reveal the complexity of, and the connections between, different parts of a system while hiding distracting elements. This "strategic revealing and concealing" is at the heart of seamful design: it links form and function, and understanding that link can promote reflective thinking. Seamful explanations thus strategically reveal the system's flaws and affordances, awareness of which can prompt useful reflection, while masking distracting information.
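
As an illustration of what a reflection-promoting, seamful explanation flow might look like in practice, here is a hypothetical sketch, not taken from the paper or from Langer et al.: before accepting an AI recommendation, the interface surfaces one known limitation (a "seam") and asks the user for a brief effortful response. All function names, prompts, and the example limitation are invented for illustration.

```python
def present_recommendation(recommendation: str, known_limitations: list[str]) -> bool:
    """Show an AI recommendation with one strategically revealed 'seam' and
    require an effortful response before the user can accept it."""
    print(f"AI recommendation: {recommendation}")
    # Strategic revealing: surface one relevant limitation instead of hiding all friction.
    if known_limitations:
        print(f"Known limitation: {known_limitations[0]}")
    # Effortful response: the user must articulate a reason, not just click "OK".
    reason = input("In one sentence, why do you accept or reject this? ").strip()
    if not reason:
        print("No rationale given; recommendation not accepted.")
        return False
    return input("Accept the recommendation? [y/n] ").strip().lower().startswith("y")

if __name__ == "__main__":
    present_recommendation(
        "Approve the loan application",
        ["The model was trained on data that under-represents applicants under 25."],
    )
```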

At the organizational level, introduce education and training programs for designers and end users. Building such an ecosystem matters because EPs have a socially complex dimension that calls for strategies beyond the purely technical. Recent work shows that literacy about dark patterns can promote self-reflection and reduce harm. EP literacy programs could be developed to (a) help designers become aware of how EPs can manifest, and (b) improve end users' ability to recognize "pitfalls."

Overall, these strategies help us proactively guard against EPs and build resilience to pitfalls. While neither exhaustive nor prescriptive, they are an important step toward addressing potentially harmful effects.

4

Summary

From a safety and reliability perspective, it is important to characterize the effects that AI explanations have in XAI systems. Through the concept of "explainability pitfalls" (EPs), the study uncovers the unintended negative effects AI explanations can bring. The paper's account of how EPs operate, together with its coping strategies, can help improve the accountability and safety mechanisms of XAI systems.

Based on the findings of this study, the authors believe that there are some open-ended questions about XAI that merit further discussion:

1. How can an effective taxonomy of EPs be developed to better identify and reduce negative effects? 2. How can the real-world effects of "pitfalls" be demonstrated using ill-suited explanations? 3. How can training programs be evaluated for their ability to mitigate the effects of "pitfalls"?

Finally, the authors say that communities from human-computer interaction to AI are further investigating explainability pitfalls, both at the level of basic concepts and in applications. They believe that understanding where, how, and why pitfalls arise in XAI systems can significantly improve the safety of AI systems.

