
Nasty Teacher, avoiding "distillation" of models and strengthening intellectual property protection


Author | Daft shiner

Edit | CV Jun

Source | I love computer vision (WeChat ID: aicvml)

This article presents the ICLR 2021 paper Undistillable: Making a Nasty Teacher That CANNOT Teach Students, which proposes a Nasty Teacher to prevent models from being stolen through distillation, offering a new approach to protecting intellectual property.

  • Paper link: https://openreview.net/forum?id=0zvfm-nZqQs
  • Project link: https://github.com/VITA-Group/Nasty-Teacher

Introduction

Knowledge distillation is a common means of model compression: it transfers the knowledge of a pre-trained teacher model to a more lightweight student model, so that the student can match or even exceed the teacher's performance, achieving nearly lossless compression. However, in the wrong hands, the same technique makes it easy to reproduce a model that researchers spent a great deal of time, effort, and hardware to train, infringing the owner's intellectual property. This paper therefore designs a Nasty Teacher whose performance is unchanged in normal use, but whose knowledge-distilled students perform very poorly, thereby protecting the model owner's intellectual property.

Contributions

  • A Nasty Teacher is introduced to prevent knowledge leakage and unauthorized model cloning via knowledge distillation, without sacrificing its own performance. The authors see this as a first step toward intellectual-property and privacy protection in machine learning.
  • The authors propose a simple and efficient algorithm, called self-undermining knowledge distillation, which trains the Nasty Teacher by maximizing the divergence between its outputs and those of a normally trained adversarial network.
  • Extensive experiments on standard and data-free knowledge distillation show that the Nasty Teacher retains nearly the same accuracy as a normal teacher (less than a 1% gap), while student models distilled from it lose 10% accuracy or more, or even diverge during training.

Methodology

Revisiting Knowledge Distillation

Knowledge distillation is a widely used model-compression method proposed by Hinton et al. Its training objective is as follows:

$$\min_{\theta_S}\; \alpha\,\tau^2\,\mathrm{KL}\big(\sigma_\tau(f_{\theta_T}(x)),\, \sigma_\tau(f_{\theta_S}(x))\big) \;+\; (1-\alpha)\,\mathrm{XE}\big(\sigma(f_{\theta_S}(x)),\, y\big)$$

where $\mathrm{KL}(\cdot)$ and $\mathrm{XE}(\cdot)$ are the KL-divergence and cross-entropy losses, $f_{\theta_T}$ and $f_{\theta_S}$ denote the teacher and student models, $\alpha$ balances the two terms, and $\sigma_\tau$ is the softmax with temperature $\tau$. The equation is designed to let the student model learn from both the teacher's logits and the ground-truth labels. The logits are the per-class scores output by the teacher model, which become class probabilities after the softmax.

An intuitive way to understand learning from logits is that, in addition to the positive ground-truth class, the negative classes also carry a lot of valuable information. For example, in MNIST classification over the digits 0 to 9, a handwritten 7 looks very similar to a 1 but very different from a 5. The ground-truth label only tells us that the image is a 7, but the logits also tell us: this image is most likely a 7, has a small probability of being a 1, and is almost unlike the other digits. This extra information is exactly what we want the student model to acquire through knowledge distillation.
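To make the loss above concrete, here is a minimal PyTorch sketch of the standard knowledge-distillation objective. The function name and the default values of `alpha` and `tau` are illustrative assumptions, not taken from the paper:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, alpha=0.9, tau=4.0):
    """Standard knowledge-distillation loss (Hinton et al.), as in the
    equation above: a KL term on temperature-softened logits plus
    cross-entropy on the ground-truth labels."""
    # Soften both distributions with temperature tau.
    soft_teacher = F.softmax(teacher_logits / tau, dim=1)
    log_soft_student = F.log_softmax(student_logits / tau, dim=1)

    # KL divergence between the softened student and teacher predictions;
    # the tau^2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * tau ** 2

    # Cross-entropy against the hard ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kl + (1.0 - alpha) * ce
```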

Nasty Teacher

Here we introduce the design of the Nasty Teacher's optimization objective: while preserving the model's own accuracy (the cross-entropy term), it maximizes the KL divergence between the Nasty Teacher and an adversarial network (a normally pre-trained model standing in for the attacker's student), so the following equation is easy to understand:

$$\min_{\theta_T}\; \mathrm{XE}\big(\sigma(f_{\theta_T}(x)),\, y\big) \;-\; \omega\,\tau_A^2\,\mathrm{KL}\big(\sigma_{\tau_A}(f_{\theta_T}(x)),\, \sigma_{\tau_A}(f_{\theta_A}(x))\big)$$

where $f_{\theta_A}$ is the adversarial network, $\omega$ weights the KL term, and $\tau_A$ is its softmax temperature.
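A minimal sketch of this self-undermining objective in PyTorch follows. The adversarial network is a normally pre-trained model whose weights stay frozen; the function name and the default values of `omega` and `tau_a` are illustrative assumptions, not the paper's exact settings:

```python
import torch.nn.functional as F

def self_undermining_loss(nasty_logits, adversarial_logits, labels,
                          omega=0.01, tau_a=4.0):
    """Self-undermining KD loss: keep the Nasty Teacher accurate via
    cross-entropy while pushing its softened outputs away from a
    normally pre-trained adversarial network (see the equation above)."""
    # The usual cross-entropy keeps the teacher's own accuracy intact.
    ce = F.cross_entropy(nasty_logits, labels)

    # KL divergence between the softened teacher and adversarial outputs.
    log_soft_nasty = F.log_softmax(nasty_logits / tau_a, dim=1)
    soft_adv = F.softmax(adversarial_logits / tau_a, dim=1)
    kl = F.kl_div(log_soft_nasty, soft_adv, reduction="batchmean") * tau_a ** 2

    # Subtracting the KL term means minimizing this loss *maximizes*
    # the divergence from the adversarial network.
    return ce - omega * kl
```

In a training loop, only the Nasty Teacher is updated, so the adversarial network's logits would be computed under torch.no_grad().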

Experiments

[Table: Normal teacher vs. Nasty Teacher on CIFAR-10]

The table above compares the normal teacher model and the Nasty Teacher on CIFAR-10: the Nasty Teacher's own accuracy drops only slightly, while the accuracy of student models distilled from it is significantly lower than that of students distilled from the normal teacher.

[Table: Normal teacher vs. Nasty Teacher on CIFAR-100]

The same phenomenon occurs on the CIFAR-100 dataset, where the MobileNetV2 student that mimics the Nasty Teacher cannot even be trained successfully.

[Table: Normal teacher vs. Nasty Teacher on Tiny-ImageNet]

The same phenomenon also appears on the Tiny-ImageNet dataset, where, again, the MobileNetV2 student that mimics the Nasty Teacher cannot be trained.

[Figure: Output probability distributions of the normal teacher vs. the Nasty Teacher on CIFAR-10]

The figure above shows the difference between the output probability distributions of the normal teacher and the Nasty Teacher on the CIFAR-10 dataset; the Nasty Teacher's predicted distribution changes drastically.

[Figure: t-SNE visualization of ResNet-18 features on CIFAR-10]

The figure above shows t-SNE visualizations of ResNet-18 features on the CIFAR-10 dataset. The feature spaces of the normal teacher and the Nasty Teacher look similar, which also explains why the Nasty Teacher's accuracy stays close to normal.

[Table: Distillation results with different student architectures]

The table above shows the performance when attackers distill from the Nasty Teacher using different student model architectures.

[Table: Student models more complex than the teacher]

Since the attacker may not know the teacher model's architecture, the student model may end up more complex than the teacher; the table above reports the corresponding experiments for this situation.

[Figure: Effect of the weight ω on accuracy]

The figure above plots how the weight ω in the Nasty Teacher's optimization objective affects model accuracy.

[Figure: Effect of the temperature τ_A on accuracy]

The figure above shows the effect of the softmax temperature τ_A on model accuracy.

[Figure: Effect of the attacker's hyperparameters and of the available training data]

The figure above explores the hyperparameters the attacker uses when training the student model, such as the balance weight α. No matter how these are chosen, the Nasty Teacher effectively prevents the attacker from distilling; compared with larger values, a smaller α (which relies more on the ground-truth labels and less on the teacher) recovers relatively more performance from the Nasty Teacher. Figure (b) simulates the realistic case in which the attacker cannot obtain all of the training data; here too the Nasty Teacher effectively prevents distillation-based theft.

[Table: Data-free knowledge distillation results]

The table above reports results for data-free knowledge distillation; the Nasty Teacher also has a strong anti-distillation effect in this setting.

[Figure: DeepInversion reconstructions from the normal vs. the Nasty ResNet-34]

Finally, the authors visualize images reverse-engineered with DeepInversion (a data-free knowledge-distillation method): images generated from the normal ResNet-34 have high visual fidelity, while images generated from the Nasty ResNet-34 contain distorted noise and even incorrect class features.

Conclusion

This paper proposes a Nasty Teacher that prevents models from being stolen through distillation, providing a new approach to protecting intellectual property.
