Compiled by 丨 Zhang Takiling, Yang Liu
Edited by 丨Viktor
In January, Professor Stefan Feuerriegel of ETH Zurich published an article in the journal Communications of the ACM entitled "Artificial Intelligence Across Company Borders". In it, he pointed to a common challenge in putting artificial intelligence (AI) into practice in industry: how can cooperation across companies be carried out?

The professor said that building large-scale cross-company datasets through data sharing is one option, but it carries the risk of leaking confidential and private data, and it is restricted by privacy-related laws.
Federated learning, a privacy-preserving distributed machine learning framework, can address these pain points by ensuring that data never leaves its local site.
However, traditional federated learning does not currently provide a formal privacy guarantee, and it is also vulnerable to poisoning attacks.
The professor therefore noted that combining federated learning with domain adaptation lets partner companies get the most benefit from collaborative AI models while keeping the raw training data local.
The following is Professor Stefan Feuerriegel's introduction to federated learning with domain adaptation, translated by Nebulas Clustar Senior Algorithm Engineers Zhang Takiling and Yang Liu.
In recent years, digital technologies with AI at their core have been driving economic and social development. Estimates suggest that AI will add $13 trillion of economic activity in the global industrial sector by 2030.
However, the potential of this technology remains largely untapped because companies cannot access or effectively use cross-company data. AI benefits from large amounts of representative data, which often has to come from multiple companies. This is especially true in real-world industrial scenarios, where it is extremely challenging for AI models to perform well in the face of rare unexpected events or critical system states.
One direct way to implement cross-company AI is to build large-scale cross-company datasets through data sharing. But because of data confidentiality and the risk of privacy leakage, most companies are reluctant to share data directly, and in most cases data sharing is restricted by privacy-related laws. Federated learning with domain adaptation is therefore the key to solving cross-company AI problems. On the one hand, federated learning enables model training and inference without leaking each company's private data. On the other hand, domain adaptation allows companies to customize the federated model to their own application scenarios and conditions.
1
Obstacles to AI cooperation
There are two main obstacles to cross-company AI:
The first is data privacy across companies. Sharing raw data directly may expose proprietary information about a company's operating processes or intellectual property to its competitors. This barrier often arises when companies seek to collaborate on AI with suppliers, customers, or competitors.
For example, data from a manufacturing plant can reveal parameter settings, product composition, yield, throughput, routing, and machine uptime. If such data is leaked, customers could misuse it in business negotiations, or it could help competitors increase productivity and improve their products. Beyond intellectual property, deeper constraints also reduce companies' willingness to share data, such as the degree of trust between companies, ethical constraints, laws and regulations that protect the privacy of a company's users, and cybersecurity risks. We therefore need a solution that protects data privacy, that is, one that performs model inference without exposing each company's source data.
The second is that cross-company collaboration has to take the impact of domain shift into account. Domain shift refers to the mismatch between the distributions of data collected at different companies, which use machines or operating systems with different configurations. For example, machine data collected at one company may not be representative of another company because the data were acquired under different conditions. Domain shift is a potential barrier to inference: a model trained on one company's data may perform poorly when deployed at another company with a significantly different data distribution.
2
Cross-company AI
The latest advances in AI research are expected to overcome these two obstacles. Federated learning is a privacy-preserving distributed machine learning framework designed to let multiple edge devices or servers jointly train machine learning models by sharing local model parameters (gradients or weights) without sharing data samples.
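The idea of sharing parameters rather than data samples can be illustrated with a minimal sketch of federated averaging, one common aggregation scheme. The article does not name a specific algorithm, so the choice of federated averaging, the toy linear model, the synthetic data, and all function names below are assumptions made purely for illustration.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient steps of 1-D linear regression (y ~ w*x + b)
    on one company's private data and return the updated parameters."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Combine local parameters, weighting each company by its sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_company_data(n, seed):
    """Hypothetical private dataset drawn from y = 2x + 1 plus small noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 0.05)))
    return data

# Three companies with datasets of different sizes; the raw data stays in each list.
companies = [make_company_data(40, 1), make_company_data(60, 2), make_company_data(100, 3)]
sizes = [len(d) for d in companies]

global_weights = (0.0, 0.0)
for _ in range(50):
    # Only model parameters travel between the companies and the aggregator.
    updates = [local_update(global_weights, d) for d in companies]
    global_weights = federated_average(updates, sizes)

print("global model (w, b):", global_weights)
```

In this sketch the aggregation step only ever sees the `(w, b)` pairs, never the `(x, y)` samples, which is the property the paragraph above describes.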
Vertical federated learning across companies can draw on the combined data of all participating companies (for example, from multiple factories, rolling stock plants, or power plants) to jointly train machine learning models by sharing model parameters (gradients or weights) between companies.
To achieve this, vertical federated learning across companies decouples model training from access to the raw training data. Companies first align their common data using cryptographic techniques, without exposing their own raw data. Each party then trains the model on its local data and returns intermediate results to the coordinator. The coordinator aggregates the intermediate results from all participants and builds a collaborative model, improving the model's overall performance. During this process, no company has direct access to the raw training data of other companies.
In cross-company AI, the data distributions of different companies usually overlap only slightly, that is, the target domain differs from the source domain. To address this domain shift, we introduce domain adaptation, whose goal is to learn invariant representations that do not depend on the specific operating conditions of any partner company, thereby mitigating the loss of model performance caused by domain shift between companies.
Specifically, domain adaptation learns a common feature representation for the source and target domains. In this common feature space, the distributions of the two domains should be as similar as possible, so that their marginal distributions are aligned.
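To make "as similar as possible" concrete, the sketch below measures how far apart two sets of features are and then removes the gap in the simplest possible way. It is only an illustration under strong assumptions: it uses the squared distance between feature means (the maximum mean discrepancy with a linear kernel) as the similarity measure, and a plain mean shift as the alignment step. Practical domain adaptation methods learn a feature extractor instead, and the function names and synthetic data here are hypothetical.

```python
import random

def mean_vector(samples):
    """Per-dimension mean of a list of feature vectors."""
    dim = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(dim)]

def mean_discrepancy(source, target):
    """Squared Euclidean distance between the source and target feature means."""
    mu_s, mu_t = mean_vector(source), mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

def align_to_source(source, target):
    """Shift target features so that their mean matches the source mean."""
    mu_s, mu_t = mean_vector(source), mean_vector(target)
    return [[x - t + s for x, s, t in zip(row, mu_s, mu_t)] for row in target]

rng = random.Random(0)
# Hypothetical features: the same process observed under two operating conditions.
source = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
target = [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(200)]

before = mean_discrepancy(source, target)
after = mean_discrepancy(source, align_to_source(source, target))
print("discrepancy before alignment:", before)
print("discrepancy after alignment:", after)
```

A discrepancy of this kind could also serve as a rough check of whether a significant domain shift exists between two partners in the first place.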
By using federated learning, cross-company AI collaboration can overcome the privacy barrier to direct data sharing; by using domain adaptation, it can overcome the barrier of domain shift. This combination is often referred to as federated transfer learning.
Two transfer learning settings are common in industrial ecosystems, where failures usually serve as the labels but are rare in practice, so the labels are imbalanced. In the first setting, labels are available in the source domain but not in the target domain (called unsupervised domain adaptation). In the second, labels are available in neither the source nor the target domain (called unsupervised transfer learning).
3
Cross-company AI landing
Companies can combine federated learning and domain adaptation to enable collaborative AI in the industrial ecosystem. Once deployed, this allows partner companies to benefit from collaborative AI models while keeping the raw training data local. At the same time, the way the collaborative model is trained lets it generalize well across each company's data. Proprietary data is never shared across company boundaries; only intermediate results of the model (for example, gradients) are exchanged between companies. In addition, the collaborative model accounts for heterogeneity between companies by learning invariant representations, that is, representations that are independent of a company's specific operating conditions. Each participating stakeholder can thus extend its own operating experience with the experience of its partner companies.
In industrial ecosystems, the training process in traditional federated learning is usually coordinated by a central server. On the one hand, the central server is a bottleneck and can introduce potential vulnerabilities. On the other hand, this centralized architecture is currently applied mainly to the common scenario of bilateral cooperation.
Implementing cross-company AI collaboration in a decentralized way has great potential and value, which motivates a decentralized learning setup. In decentralized federated learning, communication with a central server is replaced by peer-to-peer communication. Companies can then dynamically form collaborations within a subnetwork, based on the similarity of their applications or operating conditions, and these collaborations can evolve along with specific use cases and operating conditions. Distributed ledger technology is also a feasible way to take over the tasks of the traditional central server. Finally, the choice between a centralized and a decentralized approach to federated learning should be based on practical experience across enterprises.
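One simple way to picture peer-to-peer training without a central server is gossip averaging, where each company repeatedly averages its parameters with those of its direct neighbors. The sketch below is an assumption-laden illustration rather than the scheme the article proposes: the ring topology, the company names, and the use of a single scalar parameter per company are all hypothetical.

```python
def gossip_round(params, neighbors):
    """Each peer replaces its parameter with the average of its own value
    and the values it receives from its direct neighbors."""
    new_params = {}
    for peer, value in params.items():
        group = [value] + [params[n] for n in neighbors[peer]]
        new_params[peer] = sum(group) / len(group)
    return new_params

# Hypothetical subnetwork: four companies connected in a ring, no coordinator.
neighbors = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "A"],
}

# Each company starts from its own locally trained parameter.
params = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 6.0}

for _ in range(30):
    params = gossip_round(params, neighbors)

print(params)
```

Because every company in this ring has the same number of neighbors, the repeated averaging preserves the overall mean, so all peers drift toward the value a central server would have computed by plain averaging. Irregular topologies generally need weighted mixing to keep that property.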
Although federated learning offers substantial privacy protection and encourages collaboration across corporate boundaries, traditional federated learning has so far been unable to provide a formal privacy guarantee: semi-honest participants may be able to infer some information from gradient updates and previous model parameters. In addition, traditional federated learning settings are susceptible to poisoning attacks, in which the trained model is compromised by incorrect model updates from one of the parties. It is very important for companies to prevent such attacks, and one solution is to use additional privacy protection techniques, such as differential privacy or cryptographic methods.
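As a rough illustration of how differential privacy is typically applied to model updates, the sketch below clips an update to a maximum norm and then adds Gaussian noise before it is shared. The function names, the clipping bound, and the noise multiplier are hypothetical values chosen for the example. On its own, this step does not amount to a formal guarantee: that additionally requires privacy accounting over all training rounds, which is omitted here.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale the update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in update]

def privatize_update(update, max_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise scaled to the clipping bound."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [g + rng.gauss(0.0, sigma) for g in clipped]

rng = random.Random(42)
raw_update = [3.0, 4.0]  # hypothetical local gradient with L2 norm 5

clipped = clip_update(raw_update, max_norm=1.0)
shared = privatize_update(raw_update, max_norm=1.0, noise_multiplier=0.5, rng=rng)

print("clipped update:", clipped)
print("update actually shared:", shared)
```

Clipping bounds how much any single update can move the shared model, and the noise makes it harder to infer information about individual records from a shared update.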
4
Combining federated learning and domain adaptation unleashes the power of AI across companies
For practitioners, bringing cross-company AI collaboration into the industrial ecosystem requires a set of design principles to guide implementation. For example, if there is no significant domain shift between the data distributions of two companies' applications, federated learning can be applied directly, without combining it with domain adaptation.
In addition, implementing AI collaboration across companies must meet further practical needs, which may require extensions such as solutions for continual learning and data heterogeneity. For example, for highly heterogeneous systems, a sufficiently robust model implementation must be chosen to achieve portability (for example, across different product models, different combinations of sensors, or different manufacturers). Over time, as the industry matures, guidance will also be needed to develop a set of standards and specifications for cross-company cooperation, further releasing the power of AI.
5
Direction of development
Combining federated learning with domain adaptation can unleash the power of AI in cross-company collaborations. This kind of collaboration can extend beyond traditional supply chains or domains, for example to create a large ecosystem of cooperating rating organizations. While this vision may come to fruition in the near future, companies can already begin to learn and use this new technology among trusted partners. At the same time, fairness metrics for distributing models still need to be developed, which reflects the microeconomic implications of cross-company AI collaboration. Industry managers should identify data partners who can help optimize their performance more holistically, in line with systems thinking.
Cross-company AI can also inspire new business models, such as services delivered through AI or data support provided by third-party companies. Small and medium-sized companies, in particular, will benefit from leveraging the data resources of other companies. In this regard, service systems engineering can help develop principles for designing and developing service system networks based on cross-company AI. The first step in this direction is to systematically understand how stakeholders and resources co-create value.
AI collaboration across companies will benefit from ongoing research. Researchers are making new attempts to advance federated learning, improving its scalability, robustness, and effectiveness while strengthening privacy protection and model performance. Federated learning with these domain adaptation capabilities can foster exponential growth in AI collaboration across corporate boundaries.
Reference Links:
https://cacm.acm.org/magazines/2022/1/257442-artificial-intelligence-across-company-borders/fulltext
Leifeng Network