
An Analysis of Explainable Artificial Intelligence Technology and Its Military Applications

Author: Yuanting Defense

Abstract: Recently, advances in artificial intelligence (AI) and machine learning, especially deep learning, have drawn extensive attention to the field of explainable artificial intelligence (XAI). XAI research focuses on ensuring that the reasoning and decisions of AI systems can be understood by human users. In the military domain, explainability typically serves the following purposes: human users operating AI systems can form appropriate mental models; experts can gain insights and knowledge from AI systems and their implicit tactical and strategic behavior; AI systems can be shown to comply with international and national law; and developers can identify flaws or errors in AI systems before deployment. Drawing on the report "Exploring Explainable Artificial Intelligence Techniques in Military Deep Learning Applications" by the Swedish National Defense Research Institute, this paper argues that such AI systems are inherently difficult to understand because the processes they model are too complex for interpretable alternatives to be used instead. Although XAI for deep learning is still in its infancy, a number of interpretation techniques have already emerged. Current XAI techniques are mainly used for development purposes, such as identifying errors. More research is needed, however, to determine whether these techniques can help users build suitable mental models of AI systems, support the development of tactics, and ensure that future military AI systems comply with national and international law. Based on the report, this article introduces XAI techniques and their military applications.

Keywords: artificial intelligence, explainable artificial intelligence, deep learning

The main reason for the success of artificial intelligence (AI) today is the breakthrough in machine learning (ML), more precisely in deep learning (DL). Deep learning is a potentially disruptive technology: using deep neural networks, it enables complex modeling that cannot be achieved with traditional techniques. For example, deep learning can be used for accurate transcription (speech-to-text), translation (text-to-text), real-time strategy games (image-to-action), lip reading (image-to-text), facial recognition (image-to-identity), and the control of autonomous vehicles (image-to-action).

However, because deep learning is still at an early stage of development and no mathematical framework yet exists that can guarantee model correctness, many challenges and problems inevitably arise when developing, deploying, operating, and maintaining military neural network models, and solutions must continually be sought.

For military personnel such as warfighters and data analysts, perhaps the biggest challenge is interpretability. As a rule of thumb, the need for explainability increases greatly when actions affect human life. Explainability matters because it affects the user's trust in and reliance on the system. That trust must be kept in balance: too much trust leads to misuse of the system, while too little trust leads to disuse. Ultimately, explanations are meant to help users build appropriate mental models of the system so that it is used effectively.

Deep learning has the potential to improve autonomy for complex military systems such as fighter jets, submarines, drones, and satellite surveillance systems, but it could also make these systems more complex and difficult to interpret. The main reason is that deep learning is an "end-to-end" machine learning technique in which the machine learns to extract the most important features from the input data in order to achieve high performance. This process, called representation learning, differs from the traditional approach of manually extracting features through intuition. Representation learning often yields high performance, but it also requires highly expressive, nonlinear models. As a result, deep neural networks trained with deep learning may contain millions or even billions of parameters, and even with a deep understanding of the algorithm, model architecture, and training data, these models are difficult to interpret.

The Defense Advanced Research Projects Agency (DARPA) launched the Explainable Artificial Intelligence (XAI) program in 2016 with two aims: first, to produce more explainable models while maintaining a high level of learning performance (prediction accuracy); and second, to enable human users to understand, appropriately trust, and effectively manage the next generation of AI tools. Since the program's launch, several technological advances have been made, and some XAI techniques have been packaged into software libraries and released. Military personnel can use these libraries to gain insight into deep neural networks, debug them, and validate them. This general direction is sound, but from a military perspective it is equally critical to tailor XAI techniques and tools to military users, which demands a high standard of explanation.

XAI technology

XAI is a key component of any military AI system used for high-risk decisions that affect human life. Consider tactical-level AI applications focused on short-term decision-making: their capabilities include autonomous control of unmanned vehicles and the target recognition, tracking, and strike functions of weapons and surveillance systems. XAI is equally important at the operational and strategic levels of warfare, where long-term decision-making and planning activities can affect all of humanity. At these levels, AI systems are typically used for information analysis, but also in simulations that propose plans or courses of action. The main roles of XAI in military systems include:

  • Mental models: XAI enables users to build appropriate mental models of AI systems. Whether or not a military system is AI-enabled, users must clearly understand its operating boundaries to use it reasonably and effectively.
  • Insights: Deep neural networks can acquire knowledge and identify patterns unknown to humans in complex processes. With XAI techniques, people can uncover and learn from this knowledge. Using reinforcement learning to develop tactics and strategies is a typical use case; during development, XAI may generate deeper insights into the military domain.
  • Laws and regulations: XAI can be used to ensure that AI systems comply with national and international law. Lethal autonomous weapon systems (LAWS) may be the most controversial AI application. Some want to ban such applications, while others argue that LAWS are acceptable as long as they improve accuracy and minimize collateral damage. The Swedish National Defense Research Institute report notes that XAI has an important role to play in developing rules that dictate when and where AI systems such as LAWS may be activated.
  • Eliminating errors: The literature contains numerous cases in which XAI has been used to identify errors in deep neural networks. Typically, when artifacts such as copyright watermarks in images, fake simulator data, or unrealistic game data appear in the training data, a deep neural network performs well on test data but makes frequent mistakes on real data. If XAI techniques are integrated into the development process, such problems can be detected and resolved before deployment.

XAI techniques mainly include: global interpretation techniques, such as visualization and model evaluation over large, high-dimensional datasets; local interpretation techniques, such as gradient saliency, layer-wise relevance propagation (LRP), SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), and Randomized Input Sampling for Explanation (RISE) of black-box models; and hybrid interpretation techniques, such as spectral relevance analysis (SpRAy).
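To make the local techniques above concrete, the following is a minimal sketch of gradient saliency for an image classifier, written in PyTorch. The model, input tensor, and target class are placeholders rather than anything specified in the report; techniques such as LRP, SHAP, LIME, and RISE typically rely on dedicated libraries and are not shown here.

    import torch

    def gradient_saliency(model, image, target_class):
        """Gradient-saliency sketch for an image classifier (assumed PyTorch model).

        The saliency of each input dimension is the absolute value of the
        gradient of the target class score with respect to that dimension:
        large values mean small input changes strongly affect the prediction.
        """
        model.eval()
        image = image.clone().detach().requires_grad_(True)   # shape (1, C, H, W)
        scores = model(image)                                  # raw class scores (logits)
        scores[0, target_class].backward()                     # d(score) / d(input)
        saliency = image.grad.abs().squeeze(0)                 # (C, H, W)
        return saliency.max(dim=0).values                      # collapse channels -> (H, W)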

Evaluating XAI technology

An often overlooked but crucial part of the XAI field is the evaluation of proposed XAI techniques. This section introduces evaluation criteria from a human-factors perspective, in which users such as warfighters and analysts are central to measuring how effective XAI is within an AI system. It also describes test methods that can be used to compare local XAI techniques.

1. Human-factors assessment

A human-factors assessment of XAI techniques tests whether each explanation takes into account all the factors needed for users to get the most out of the AI system. For example, users may differ in purpose, needs, knowledge, experience, task context, use case, and so on. As with the development of any system, it is important to take these factors into account throughout the AI system development process, from system specification to user testing. Because XAI for deep learning is an emerging research area, the initial users of these techniques are often system developers interested in evaluating model performance; whether they will also be useful for military users cannot yet be determined. The article "Metrics for Explainable AI: Challenges and Prospects" proposes six metrics for evaluating explanations:

  • Explanation goodness: During the development of an XAI technique, a checklist is compiled from the user's point of view. The checklist is based on the existing literature on explanation and assesses an explanation against seven criteria, such as whether it helps users understand how the AI system works, whether it satisfies the user, and whether it is sufficiently detailed and complete.
  • Explanation satisfaction: A measurement scale that captures the user's experience of an explanation from the perspective of explanation goodness. The scale comprises eight items phrased as statements (seven goodness items plus one item on whether the explanation is useful for the user's goals). A validity analysis showed that the scale is reliable enough to distinguish good explanations from bad ones.
  • Mental models: Good explanations deepen the user's understanding of how an AI system works and how it reaches decisions; in cognitive psychology this understanding is called the user's mental model of the AI system. The article proposes four tasks for measuring the user's mental model, such as a retrospective task, in which the user describes the system's reasoning process after completing a task with it, and a prediction task, in which the user predicts the system's subsequent behavior. Comparing users' mental models with experts' mental models indicates how complete the users' mental models are.
  • Curiosity: Good explanations drive users to investigate and fill gaps in their mental models. The article suggests measuring curiosity by asking users to identify the factors that motivate them to seek explanations, such as the reasonableness of the AI system's actions, the reasons other options were ruled out, or the reasons the system is not behaving as expected.
  • Trust: A good mental model allows users to maintain an appropriate level of trust in an AI system and to operate it within its operating range. The article suggests an eight-item scale to measure user trust in AI systems, covering items such as the user's confidence in using the system and the system's predictability and reliability.
  • System performance: The ultimate goal of XAI is to improve overall system performance beyond that of AI systems without XAI. Performance measures include the completion of major task goals, the user's ability to predict the AI system's responses, and user acceptance.

More research is needed to explore how these metrics should be applied when evaluating XAI techniques for AI systems.

2. Evaluation of local interpretation techniques

The visual form of a saliency map depends on the type of data the model handles: heat maps are typically used for images, while color-coded characters and words are used for text. Figure 1 shows a saliency map rendered as a heat map. In this case, gradient saliency (1.b) and layer-wise relevance propagation (1.c) were used to generate heat maps for an image of the digit 0 (1.a). Important dimensions, here pixels, are shown in warm colors such as red, orange, and yellow, while unimportant dimensions are shown in cool colors such as dark blue, blue, and light blue. The differences between the two techniques can be seen by comparing where the highlighted dimensions lie. The rest of this section describes techniques for quantitatively comparing local explanations in order to find the technique that gives the most accurate explanation.


Figure 1. An MNIST image and its corresponding heat maps, generated using gradient saliency and layer-wise relevance propagation; important dimensions (pixels) are shown in warm colors such as red, orange, and yellow
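As an illustration of the rendering described above, a saliency map can be displayed next to its input as a heat map with a warm-to-cool colormap; the sketch below uses matplotlib's "jet" colormap, and the variable names are purely illustrative.

    import matplotlib.pyplot as plt

    def show_saliency_heatmap(image_2d, saliency_2d):
        """Show an input digit next to its saliency heat map (illustrative only)."""
        fig, (ax_img, ax_map) = plt.subplots(1, 2, figsize=(6, 3))
        ax_img.imshow(image_2d, cmap="gray")
        ax_img.set_title("input")
        heat = ax_map.imshow(saliency_2d, cmap="jet")  # warm = important, cool = unimportant
        ax_map.set_title("saliency")
        fig.colorbar(heat, ax=ax_map)
        plt.show()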

(1) Deletion

The deletion metric is computed by measuring the model's predictive accuracy as input values are altered or removed. Note that here, deletion means replacing an input value with something neutral, such as the image background. The deletion process is guided by the saliency map produced by the XAI technique: values in the most important dimensions are removed before values in less important ones. During deletion, performance degrades quickly if the explanation is good and slowly if it is not.

Figure 2 illustrates the deletion process using the gradient saliency map of Figure 1.b. In Figure 2.b, the 50 most important pixels have been removed, and one can still easily see that the image shows the digit 0. In Figure 2.f, more than half of the pixels (400 pixels) have been removed, and it is difficult to recognize that the image shows the digit 0.


Figure 2. Six images from the deletion process applied to an MNIST image, with 0, 50, 100, 200, 300, and 400 pixels removed
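Below is a minimal sketch of the deletion procedure, under the assumption of a generic predict_proba function that returns class probabilities for a single image; the function name, step size, and neutral value are illustrative, not the report's implementation.

    import numpy as np

    def deletion_curve(predict_proba, image, saliency, target_class,
                       step=50, neutral_value=0.0):
        """Deletion metric sketch: remove the most salient pixels first.

        predict_proba(img) is assumed to return class probabilities for a
        single 2-D image; neutral_value stands in for a neutral input such
        as the black MNIST background.
        """
        order = np.argsort(saliency.ravel())[::-1]            # most important first
        perturbed = image.astype(float).ravel()
        probs = [predict_proba(perturbed.reshape(image.shape))[target_class]]
        for start in range(0, order.size, step):
            perturbed[order[start:start + step]] = neutral_value
            probs.append(predict_proba(perturbed.reshape(image.shape))[target_class])
        return np.array(probs)  # a good explanation makes this curve drop quickly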

(2) Insertion

The insertion metric is complementary to deletion. Figure 3 illustrates the insertion process using the same MNIST image as in the deletion process. The all-black image in Figure 3.a is the initial input, and as more and more input dimensions are inserted in the priority order given by the saliency map, increasing accuracy can be observed. During insertion, the more information is added to the input, the more the model's prediction accuracy should improve; that is, accuracy rises faster when the explanation is better, and vice versa.


Figure 3. Six images from the insertion process applied to an MNIST image, with 0, 50, 100, 200, 300, and 400 pixels inserted
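A matching sketch of the insertion procedure, under the same assumptions as the deletion sketch above: starting from a neutral (all-black) image, the most salient pixels of the original image are restored first.

    import numpy as np

    def insertion_curve(predict_proba, image, saliency, target_class,
                        step=50, baseline_value=0.0):
        """Insertion metric sketch: restore the most salient pixels first."""
        order = np.argsort(saliency.ravel())[::-1]            # most important first
        original = image.astype(float).ravel()
        restored = np.full(original.shape, baseline_value)    # all-black starting point
        probs = [predict_proba(restored.reshape(image.shape))[target_class]]
        for start in range(0, order.size, step):
            idx = order[start:start + step]
            restored[idx] = original[idx]
            probs.append(predict_proba(restored.reshape(image.shape))[target_class])
        return np.array(probs)  # a good explanation makes this curve rise quickly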

(3) Evaluation indicators

The report uses gradient saliency and layer-wise relevance propagation to demonstrate the deletion and insertion processes. A classifier and 100 images randomly sampled from the MNIST dataset were used in the demonstration to evaluate the XAI techniques.

Figures 4 and 5 show the results of the deletion and insertion processes, respectively. The area under the curve (AUC) can be used to compare multiple XAI techniques quantitatively: for deletion, smaller AUC values are better, while for insertion, larger AUC values are better.
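The AUC comparison can be computed directly from the probability curves produced by the deletion and insertion sketches above; a minimal sketch:

    import numpy as np

    def curve_auc(probabilities):
        """Normalized area under a deletion or insertion curve.

        Lower is better for deletion (the probability should fall quickly);
        higher is better for insertion (the probability should rise quickly).
        """
        p = np.asarray(probabilities, dtype=float)
        x = np.linspace(0.0, 1.0, p.size)  # fraction of pixels changed
        return np.trapz(p, x)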

As can be seen from Figure 4, the performance curve of layer-wise relevance propagation falls faster during deletion and converges to a lower average probability. This is consistent with the heat maps: the layer-wise relevance propagation heat map contains fewer warm colors than the gradient saliency heat map (see Figures 1.b and 1.c), indicating that layer-wise relevance propagation can pinpoint an explanation using fewer features than gradient saliency. The same conclusion can be drawn from Figure 5: as Figure 5.b shows, the average probability rises sharply after only a few dozen features are inserted and reaches a high performance level after about 100 features.


Figure 4. Deletion curves for gradient saliency and layer-wise relevance propagation


Figure 5. Insertion curves for gradient saliency and layer-wise relevance propagation

Conclusion

Deep learning will be used to complement and replace some functions in military systems. In fact, military surveillance systems that autonomously detect and track objects of interest in massive amounts of image data have already begun evaluating deep learning techniques. Deep learning offers several advantages over traditional software technologies, the most important of which is that it can perform complex modeling that traditional software cannot. In addition, deep learning can facilitate active learning, in which the AI system interacts with users to obtain high-quality data that can be used to improve combat system models.

However, these advantages also bring challenges at the technical and operational levels. The report focuses on the challenges posed by interpretability. The drawback of deep learning is that while the learning algorithms, model architectures, and training data are neither new nor hard to understand, the behavior of the resulting model is difficult to explain. In civilian applications such as music and advertising recommendation this is usually not a problem, but in the military domain, understanding and explaining the behavior of AI systems is crucial. Whether at the operational level or at the strategic level, where military leaders and political decision-makers make long-term decisions, the decisions and recommendations provided by AI systems can have a profound impact on the lives of all of humanity.

Complex military systems such as fighter jets, submarines, tanks, and command-and-control decision support tools are also difficult to master, but the technologies used to build them are inherently explainable, so when these systems make mistakes, the errors can be identified and resolved by troubleshooting the system. This is difficult to achieve with deep learning: deep neural networks in real-world applications often contain millions or even billions of parameters, and even the model's creators cannot systematically trace errors in the model.

The report presents several state-of-the-art XAI techniques for addressing the interpretability challenge. It is worth noting that, although the report marks progress in this area, XAI technology for military deep learning applications is still at an early stage of development. Moreover, the XAI techniques presented in the report have not been tested in military environments, so there is no guarantee that existing XAI techniques can enable deep learning to be used in high-risk military AI systems.
