Tongdun Technology Li Xiaolin: Trusted AI ecosystem will become the "infrastructure" of the next generation of AI medical treatment

Can privacy computing unlock the value of medical data?

From April 12 to 15, Leifeng AI Nuggets hosted an online cloud summit titled "Privacy Computing, Let AI Release the Value of Medical Data." Senior executives from four privacy computing companies were invited to discuss the technical routes of privacy computing, its practical prospects in medical scenarios, and the future direction of the industry.

At the Medical Privacy Computing Cloud Summit, Professor Li Xiaolin, partner at Tongdun Technology, chief scientist of the Institute of Medicine of the Chinese Academy of Sciences, and chairman of the Knowledge Federation Industry-University-Research Alliance, gave the first talk.

Under the title "Trusted AI Empowering Healthcare: Let Data Flow and Let Knowledge Be Shared," he covered the background of trusted AI platform construction; the platform's architecture, theory, and practice; its products; and its applications in pharmaceutical scenarios.

He said that data has become the core element of the digital transformation and upgrading of healthcare. In the commercialization of privacy computing, however, differences between vendors' technical solutions and platform products have split data into camps like "Alliance A" and "Alliance B," so that the original "data islands" have turned into new "data archipelagos."

At present, industries and fields of all kinds urgently need a common trusted AI platform that connects these data archipelagos while protecting data privacy, ensuring legal compliance, and preserving data value.

The following is the content of Li Xiaolin's talk, edited and organized by Leifeng Network & "Medical Health AI Nuggets" without changing the original meaning.

Background of trusted AI platform construction

The medical field spans many scenarios, and each scenario generates its own kind of medical data.

Specifically, medical data can be divided into six categories: omics databases, medicinal chemistry databases, disease databases, electronic medical record databases, medical imaging databases, and wearable device databases.

This medical data is collected and used by many platforms, but it must also be strictly protected while its value is being realized. Data privacy protection has become a focus of laws and policy documents. Since last year, the Data Security Law and the Personal Information Protection Law have been introduced, and society has gradually come to value data privacy protection.

At the same time, data privacy protection has made data sharing and data analysis in the medical industry more difficult.

Today we must protect data privacy on the one hand and break down data barriers on the other. This is especially true in the context of the new generation of artificial intelligence led by deep learning, where data has become the core element of the digital transformation and upgrading of healthcare.

The problem is that medical data faces not only privacy issues but also high barriers to entry, data heterogeneity, and complex data types. It is very difficult to bring together the omics, genetic, DNA, imaging, and other data that each medical field has accumulated over many years.

In addition, integration across different patients and different hospitals involves multiple data rights and multiple data standards, which makes data sharing even more challenging.

So how does privacy computing solve the problem of data sharing and data circulation?

To protect data privacy while still realizing the value of data and achieving its safe, compliant circulation, the industry has been developing privacy computing techniques since the 1970s, such as homomorphic encryption and secret sharing, built around the idea of data that is "available but invisible." In the 1980s, secure multi-party computation (MPC) followed.

In recent years, three newer approaches have emerged: the Trusted Execution Environment (TEE), Federated Learning (FL), and the Knowledge Federation (KF). Together, they are pushing privacy computing toward the next generation of trusted AI.

At the same time, in the commercialization of privacy computing, differences between vendors' technical solutions and platform products have split data into camps like "Alliance A" and "Alliance B," and the original "data islands" have turned into new "data archipelagos."

Therefore, all industries urgently need to build a trusted AI platform that connects the data archipelagos while protecting data privacy, ensuring legal compliance, and preserving the value of data.

At present, the open-source frameworks on the market and mainstream research focus on the federated algorithm level, which cannot fully resolve the bottleneck of "archipelago" fragmentation.

If we want to fully share data, share knowledge, and keep data flowing, the first requirement is a "guarantee of consistency."

That is, the member nodes of a federation, under an agreed protocol, reach a "certain degree" of agreement on the results of a series of operations. For connectivity, the tasks, nodes, and states must be consistent; for circulation, the parameters, algorithms, models, encryption, applications, and regulatory logs must be consistent.
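As a rough illustration of what a consistency guarantee could look like in code, the sketch below has each node hash its own view of the shared state and then checks whether enough nodes report the same digest. The field names, node names, and the `quorum` threshold are assumptions made for this example; the talk does not describe how the platform actually implements this check.

```python
import hashlib
import json
from collections import Counter


def state_digest(state: dict) -> str:
    """Hash a node's view of the shared state in a canonical (sorted-key) form."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_consistency(views: dict, quorum: float = 1.0):
    """Return (agreed, majority_digest, dissenters) for the views reported by nodes.

    `quorum` is the fraction of nodes that must share the same digest,
    a hypothetical stand-in for the "certain degree" of agreement.
    """
    digests = {node: state_digest(view) for node, view in views.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    agreed = votes / len(digests) >= quorum
    dissenters = sorted(n for n, d in digests.items() if d != majority_digest)
    return agreed, majority_digest, dissenters


# Hypothetical views: hospital_c is still one training round behind.
views = {
    "hospital_a": {"task": "t-1", "algorithm": "logistic_regression", "encryption": "paillier", "round": 3},
    "hospital_b": {"task": "t-1", "algorithm": "logistic_regression", "encryption": "paillier", "round": 3},
    "hospital_c": {"task": "t-1", "algorithm": "logistic_regression", "encryption": "paillier", "round": 2},
}
agreed_all, _, lagging = check_consistency(views, quorum=1.0)
agreed_majority, _, _ = check_consistency(views, quorum=0.6)
```

With these inputs, strict agreement fails because one node lags behind, while a 60% quorum is met. Comparing digests rather than the raw state keeps the exchanged message small and independent of the size of the state.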

Trusted AI platform architecture, theory and practice

To solve the interconnection problem between different federated systems and to build a federated ecosystem network on a larger scale, Tongdun Technology has created an open AI platform based on privacy computing.

The first topic is the architecture of the platform.

To realize the full value of data circulation, Tongdun Technology has built an open, shared intelligent platform based on privacy computing. Its core is the Zhibang platform iBond, and the underlying layer is the Zhibang core, iCore.

In addition, Tongdun Technology has created a comprehensive interconnection reference model, FIRM (open Federated system Interconnection Reference Model, shown in the right-hand panel of the figure).

This multi-level reference model divides interconnection into four layers: the communication layer (Ionic), the data exchange layer (FLEX), the algorithm layer (Caffeine), and the application layer (SAFE). The communication layer and the data exchange layer form the basis on which participants exchange data securely.

In principle, each layer in FIRM is built on top of the layer below it, provides a service to the layer above it, and hides from the upper layer the details of how that service is implemented.

To achieve this, a standardized protocol specification must be defined for each layer, describing in detail the services and actions that the layer provides, so that the services are delivered effectively.

Moreover, the functional definition of each layer is kept separate from its implementation details, which makes the model broadly adaptable.
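The layering principle can be sketched with four small classes, each of which only talks to the layer directly below it. This is a toy model for illustration: the class and method names are invented here and are not the interfaces of Ionic, FLEX, Caffeine, or SAFE, and the XOR mask is a placeholder for real encryption, not a secure scheme.

```python
import json


class CommunicationLayer:
    """Lowest layer: moves opaque bytes between parties (an in-memory queue here)."""

    def __init__(self):
        self.inbox = {}

    def send(self, receiver: str, payload: bytes) -> None:
        self.inbox.setdefault(receiver, []).append(payload)

    def receive(self, receiver: str) -> bytes:
        return self.inbox[receiver].pop(0)


class DataExchangeLayer:
    """Serializes and protects values so upper layers never handle raw bytes."""

    def __init__(self, comm: CommunicationLayer, mask: int):
        self.comm = comm
        self.mask = mask  # toy placeholder for encryption, NOT secure

    def put(self, receiver: str, values: list) -> None:
        masked = [v ^ self.mask for v in values]
        self.comm.send(receiver, json.dumps(masked).encode("utf-8"))

    def get(self, receiver: str) -> list:
        masked = json.loads(self.comm.receive(receiver).decode("utf-8"))
        return [v ^ self.mask for v in masked]


class AlgorithmLayer:
    """Implements a joint computation using only the exchange layer's service."""

    def __init__(self, exchange: DataExchangeLayer):
        self.exchange = exchange

    def joint_mean(self, contributions: dict) -> float:
        total, count = 0, 0
        for values in contributions.values():
            self.exchange.put("aggregator", values)
            received = self.exchange.get("aggregator")
            total += sum(received)
            count += len(received)
        return total / count


class ApplicationLayer:
    """Exposes a business-level question; knows nothing about bytes or masks."""

    def __init__(self, algorithm: AlgorithmLayer):
        self.algorithm = algorithm

    def average_age(self, ages_by_hospital: dict) -> float:
        return self.algorithm.joint_mean(ages_by_hospital)


app = ApplicationLayer(AlgorithmLayer(DataExchangeLayer(CommunicationLayer(), mask=0x5A)))
result = app.average_age({"hospital_a": [40, 60], "hospital_b": [50]})
```

Because each class depends only on the public methods of the layer below, the in-memory queue or the masking step could be swapped for a real transport or a real cipher without touching the algorithm or application code.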

The second topic is the theory behind the platform: the Knowledge Federation.

The theoretical framework of the Knowledge Federation consists of four layers:

The bottom layer is the information layer, which turns data into information. It supports computation and queries, including relatively simple statistics computed over ciphertext;

The model layer supports joint modeling, including relatively complex machine learning or deep learning models;

The cognitive layer collects intermediate states and can support transfer learning, ensemble learning, knowledge distillation, and so on;

The knowledge layer supports knowledge reasoning, knowledge discovery, and knowledge representation.

These four layers integrate secure multi-party computation (MPC), federated learning (FL), trusted execution environments (TEE), and other technologies, so that data stays available but invisible while knowledge is co-created and shared. For the first time, cognition and knowledge are brought into the scope of privacy computing, with the goal of realizing a next generation of artificial intelligence that is trusted, explainable, and capable of reasoning and decision-making.

Currently, the Knowledge Federation supports secure multi-party query, computation, learning, reasoning, and more. Technically, while it draws on related technologies, the Knowledge Federation also has a degree of originality. In particular, federation at the cognitive layer and the knowledge layer is a domestic independent innovation that goes beyond the basic federated learning developed abroad.

Finally, there is the practice of the platform: the data security exchange protocol FLEX.

FLEX (Federated Learning Exchange) is a set of open-source, standardized federated protocols for secure data exchange.

The FLEX protocol specifies the order in which the parties exchange data during the federation process, as well as the encryption and decryption methods applied before and after each exchange. Just as HTTP carries the extremely rich Internet applications we see today, a federated protocol is the foundational protocol needed to build federated learning applications.

With this protocol, federated learning applications can be standardized, so that data security and model performance during federated learning are effectively guaranteed.

By fixing the exchange order and the encryption and decryption methods in this way, the protocol breaks down platform islands.

We have published the Knowledge Federation Data Security Exchange (FLEX) White Paper, which describes two layers of the protocol:

The first is the application protocol layer, which serves federated algorithms and supports the multi-party data exchange they require. The communication protocols used in the federation process are also encapsulated here.

The second is the public component layer, which provides the basic cryptographic algorithms and security protocols that the upper-level application protocols rely on, such as homomorphic encryption and secret sharing.
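To make one of these public components concrete, here is a minimal sketch of additive secret sharing used for a secure sum. It is a textbook construction written for illustration and is not taken from the FLEX code base; the modulus, function names, and example values are assumptions of this sketch.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value (illustrative choice)


def share(secret: int, n_parties: int) -> list:
    """Split `secret` into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list) -> int:
    """Recombine shares; any proper subset alone reveals nothing about the secret."""
    return sum(shares) % MODULUS


def secure_sum(private_values: list) -> int:
    """Each party shares its value with all parties; only the total is revealed."""
    n = len(private_values)
    # Row i holds the shares produced by party i; column j is what party j receives.
    distributed = [share(v, n) for v in private_values]
    # Each party adds up the shares it received and publishes only that partial sum.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    return reconstruct(partial_sums)


# Hypothetical example: three hospitals count cases of the same rare disease.
case_counts = [12, 7, 30]
total = secure_sum(case_counts)
```

No party ever sees another party's raw count, yet the published partial sums add up to the true total of 49. The order of the two exchange steps (distribute shares, then publish partial sums) is the kind of detail that an exchange protocol has to fix in advance.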

Trusted AI platform products

First, to put the Knowledge Federation to better use, we have built a platform product based on the Knowledge Federation theoretical framework and the FLEX exchange protocol: Zhibang iBond.

It covers a series of industrial application scenarios, all carried out with data "available but invisible," such as initiating federations and MPC, scheduling tasks, and registering data.

Users can call simple algorithms directly from the algorithm library or customize their own. They can then submit tasks to the Zhibang platform for scheduling and execution and evaluate the output through performance evaluation, function evaluation, log checks, and so on.

In addition, users can submit applications, data, algorithms, and communication protocols to our data element market to replace our underlying data communication layer.

Second, building on compliant interconnection, we go a step further and create a data element market, Zhibang iData.

Data from all parties can be exchanged, traded, and shared securely and compliantly on this unified platform.

Taking data transactions as an example, Zhibang iData divides its different users into data providers and data users, as well as application developers, application providers, and application users. All parties publish data and applications on iData, and data is priced according to its degree of contribution, its usage, or market mechanisms, so that its value can be realized.

For example, for rare disease treatment in China, the rare disease data held by hospitals and research groups across the country can be placed on the iData data element market, greatly enriching the medical data available for a given rare disease and further improving its diagnosis and treatment models.

Building on these efforts, we hope to create a truly trusted AI platform for healthcare, on which medical institutions across China can share medical data safely, legally, and compliantly, unlock the full potential of data as a means of production, and foster new diagnostic algorithms and a new medical ecosystem.

Applications of the trusted AI platform in pharmaceutical scenarios

How can a trusted AI platform help in smart healthcare, inclusive healthcare, and drug innovation?

The first use case is AI-assisted medical diagnosis using ciphertext computation.

AI-assisted diagnosis and treatment fundamentally relies on big data for training. It needs not only rich and diverse medical data but also a large number of data labels. Small medical institutions and those in remote areas do not have the capacity to train such models.

Many large medical institutions, by contrast, can afford high-precision equipment and see a wealth of patient cases, so they have accumulated high-quality labeled data and AI-assisted diagnostic models.

Through the Zhibang platform, small hospitals can send encrypted data to large medical institutions and draw on those institutions' data advantages to improve the diagnostic capability of AI models.

Whether through homomorphic encryption, MPC, federated learning, or the sharing of large models, small medical institutions can obtain fairly high accuracy without being limited by small data or small models.

The second use case is assessing health insurance risk levels through secure SQL queries.

In assessing the health risk of an insured person, the querying party is the insurance institution, and the queried party is the big data institution that holds the ID information of the prospective customer.

Risk assessment generally requires a combined analysis of BMI and age while protecting user privacy and ensuring data security. When the insured person's BMI is ≤ 25 and age is < 50, the person is considered a policyholder with a higher credit rating.

In practice, we can use privacy computing to assess policyholders' risk through SQL statements and private set intersection (PSI). This yields accurate assessment results without leaking user privacy, killing two birds with one stone.
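One well-known way to realize PSI is the Diffie-Hellman style protocol based on commutative blinding, sketched below together with the BMI and age filter from the example above. This is an illustrative toy and not the platform's secure SQL engine: the group modulus is far too small for production use, both parties run in one process for brevity, and the record layout and function names are assumptions.

```python
import hashlib
import secrets
from math import gcd

P = 2**127 - 1  # a Mersenne prime; toy-sized group, for illustration only


def hash_to_group(identifier: str) -> int:
    """Map an ID string to a number modulo P."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


def new_key() -> int:
    """Pick a secret exponent coprime to P - 1 so that blinding is one-to-one."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if gcd(key, P - 1) == 1:
            return key


def blind(element: int, key: int) -> int:
    """Commutative blinding: blind(blind(x, a), b) == blind(blind(x, b), a)."""
    return pow(element, key, P)


def eligible_applicants(applicant_ids: list, records: dict) -> list:
    """Return the applicants that the data holder has on file with BMI <= 25 and age < 50.

    Roughly: SELECT id FROM records WHERE bmi <= 25 AND age < 50,
    intersected with the insurer's applicant list.
    """
    a = new_key()  # insurer's secret
    b = new_key()  # data holder's secret

    # Step 1 (insurer -> data holder): applicant IDs blinded with a.
    insurer_blinded = [blind(hash_to_group(i), a) for i in applicant_ids]

    # Step 2 (data holder -> insurer): the same values blinded again with b,
    # plus the holder's own qualifying IDs blinded with b only.
    double_blinded = [blind(x, b) for x in insurer_blinded]
    holder_blinded = [
        blind(hash_to_group(i), b)
        for i, (bmi, age) in records.items()
        if bmi <= 25 and age < 50
    ]

    # Step 3 (insurer): finish blinding the holder's values with a and intersect.
    holder_double = {blind(y, a) for y in holder_blinded}
    return [i for i, d in zip(applicant_ids, double_blinded) if d in holder_double]


# Hypothetical data: id -> (BMI, age) held by the data institution.
records = {"id-001": (22.5, 34), "id-002": (27.1, 41), "id-003": (24.0, 55), "id-004": (21.0, 29)}
applicants = ["id-001", "id-002", "id-003", "id-999"]
matches = eligible_applicants(applicants, records)
```

In this run only `id-001` is returned. The insurer learns nothing about `id-004`, who qualifies but is not an applicant, and the data holder never sees the applicant list in the clear or releases any raw BMI or age values.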

Similarly, federated modeling can be used to assess the risk of the social behavior of people with certain diseases. For example, public security bureaus or health commissions have used multi-party joint modeling to carry out dynamic risk assessment of accidents and disasters caused by patients with severe alcohol dependence, so that these patients can be classified, supervised at different levels, and accurately predicted, improving public safety for residents.

The third use case is personalized intelligent diagnosis and treatment through federated modeling.

For example, when elderly people with underlying diseases are diagnosed with COVID-19, what complications will they develop, and how likely is each complication?

Today, using machine learning to make personalized predictions of complications before and after surgery is one way to significantly improve patients' chances of being saved. Through federated modeling on real clinical big data, a prediction model is built on top of data cleaning, clinical feature extraction, and data structuring. Such a model has strong risk prediction ability and can accurately stratify patients by risk level to help doctors make sound decisions.

In addition, privacy computing can also be applied to the treatment of rare diseases.

For example, each hospital treats some of its rare disease data as confidential, and patient information also involves personal privacy. Multiple hospitals can therefore jointly build a large model through privacy computing and share the value of their data that way, improving the capacity to treat rare diseases.

At present, we have introduced a diagnosis mode in which medical experts and the trusted AI platform complement each other (human in the loop): doctors and experts make judgments about patients on the basis of assisted diagnosis, while at the same time improving the prediction accuracy of the algorithms or models on the trusted AI platform.

The fourth use case is collaborative drug discovery through cognitive-layer federated learning based on knowledge distillation.

The pharmaceutical sector often faces very complex issues of intellectual property and economic interest, which make it nearly impossible for pharmaceutical institutions to share and collaborate with each other directly. At the same time, neural-network-based drug discovery models require a large number of parameters, and model training time grows exponentially with the amount of data when parameters are aggregated.

As a result, data from the drug discovery process becomes extremely precious and scarce.

So what are the ways to share drug discovery data?

First, federated learning lets multiple pharmaceutical institutions collaborate on drug discovery with neural network (NN) models, with results significantly better than local NN modeling by a single institution using only its private data;

Second, distillation learning addresses the excessive number of model parameters involved in aggregation, and achieves the same model quality as NN modeling on the directly pooled drug molecular structure data of all institutions;

Third, cognitive-layer federation transfers knowledge between participants, which can address domain adaptation and dataset drift while protecting the privacy of drug molecular structures.
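The first of these approaches, federated learning with parameter aggregation, can be sketched in a few lines. The example below uses federated averaging on a tiny linear model instead of a neural network, purely to keep it short; the data, learning rate, and function names are assumptions, and it does not cover the distillation or cognitive-layer steps.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """One institution refines the shared weights on its private data.

    The model is linear, y ~ w . x, trained by gradient descent on mean squared error.
    """
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for k, xk in enumerate(x):
                grad[k] += 2 * err * xk / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def federated_average(updates):
    """Aggregate (weights, n_samples) pairs into a sample-weighted average."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[k] * n for w, n in updates) / total for k in range(dim)]


def train(global_weights, datasets, rounds=50):
    """Only weights travel between institutions and the aggregator, never raw data."""
    for _ in range(rounds):
        updates = [(local_update(global_weights, d), len(d)) for d in datasets]
        global_weights = federated_average(updates)
    return global_weights


# Hypothetical private datasets, both generated by y = 2*x + 1 (second feature is a bias term).
institution_a = [((0.0, 1.0), 1.0), ((1.0, 1.0), 3.0)]
institution_b = [((2.0, 1.0), 5.0), ((3.0, 1.0), 7.0)]
global_weights = train([0.0, 0.0], [institution_a, institution_b])
```

Each institution sees only a narrow slice of the input range, so neither could pin down the relationship as well alone. The averaged weights move toward the true values (2, 1) without either dataset leaving its owner. In a real deployment the weight vectors themselves would also be protected in transit, for example with homomorphic encryption or secret sharing.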

Moreover, data on failed drug cases can also be shared as a resource, so as to avoid broad, random selection of patients for drug trials.

Overall, a trusted AI platform can solve the data problems of pharmaceutical institutions, drug R&D institutions, research institutes, and research groups, help all parties improve the accuracy and success rate of their local drug discovery, and even improve the clinical performance of drugs.

The fifth use case is efficient privacy-preserving machine learning with FPGAs.

In multi-party modeling, the parameters being transmitted and aggregated are often protected by homomorphic encryption. However, the speed of encryption, decryption, and computation on ciphertext is often one of the bottlenecks in modeling.

If combined software and hardware techniques (such as FPGAs, GPUs, and encryption cards) are used to build an aggregator based on an encryption and decryption chip, and the FPGA is embedded in the federated learning system, the execution speed and parallelism of encryption algorithms such as Paillier can be significantly improved, which makes encryption and decryption more efficient and shortens the iteration time of training.
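To show where the cost comes from, the following is a minimal textbook Paillier sketch in which an aggregator adds encrypted parameter updates without decrypting them. The primes are far too small for real use, and the fixed-point scale and example values are assumptions; it is not the platform's implementation and involves no hardware.

```python
import secrets
from math import gcd


def _L(u: int, n: int) -> int:
    return (u - 1) // n


def keygen(p: int, q: int):
    """Build a Paillier key pair from two primes (toy sizes here)."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    mu = pow(_L(pow(g, lam, n * n), n), -1, n)  # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return c1 * c2 % (n * n)


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    return _L(pow(c, lam, n * n), n) * mu % n


SCALE = 1000  # fixed-point encoding for non-negative real-valued updates
pub, priv = keygen(2**31 - 1, 2**61 - 1)  # two known Mersenne primes, illustration only

local_updates = [0.125, 0.250, 0.375]  # hypothetical parameter updates from three parties
ciphertexts = [encrypt(pub, round(u * SCALE)) for u in local_updates]

encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = add_encrypted(pub, encrypted_sum, c)

total = decrypt(pub, priv, encrypted_sum) / SCALE
```

Nearly all of the work sits in the `pow(..., n2)` calls, which are modular exponentiations on very large integers. With realistic key sizes and millions of model parameters per round, these operations dominate the running time, which is why they are a natural target for parallel hardware.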

This approach can be applied to trusted AI platforms in the medical field. Medical images, for example, involve very large volumes of data; if hardware acceleration can improve computing efficiency across the board, it will greatly promote the application of medical privacy computing and secure data exchange.

That's all I have to share. Thank you.
