
Can audio clone a full voice alone? "AI Voice Cloning" Principle Analysis and Defense Guide!

Can just three seconds of audio perfectly replicate a person's voice? Amid the explosive growth of generative AI such as GPT, the risk of the technology being abused, and the security threats it poses to ordinary people, should not be underestimated. According to a recent report published by McAfee, artificial intelligence (AI)-based voice scams are on the rise, and 77% of those who received such scam calls suffered financial losses.

So, how do criminals use AI technology to clone user voices?

Voiceprint recognition is a technology that automatically identifies a speaker based on the speaker-specific information carried in the speech waveform. In recent years, as finance, telecommunications, insurance and other services have moved online, biometric technology has played an important role in identity authentication. Compared with biometrics such as face, iris and fingerprint, voiceprints are cheap to collect, readily accepted by users, and hard to imitate. They are widely used for login, payment, business verification, anti-fraud and remote identity authentication in finance, public security and other fields, and especially for identification over telecommunications networks, such as telephone banking, telephone stock trading and electronic shopping.
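At its core, a voiceprint recognition system compares a fixed-length speaker embedding extracted from the caller's speech against the embedding enrolled for the claimed identity. The following is a minimal sketch of that comparison step, with random vectors standing in for the output of a real embedding encoder; the 128-dimension size and 0.8 threshold are illustrative assumptions, not values from any particular product.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two speaker-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def verify_speaker(enrolled, probe, threshold=0.8):
    """Accept the probe utterance if its embedding is close enough
    to the enrolled voiceprint."""
    return cosine_similarity(enrolled, probe) >= threshold

# Toy embeddings standing in for the output of a real encoder.
random.seed(0)
enrolled = [random.gauss(0, 1) for _ in range(128)]
same_speaker = [x + random.gauss(0, 0.1) for x in enrolled]   # small intra-speaker drift
other_speaker = [random.gauss(0, 1) for _ in range(128)]      # unrelated voice

print(verify_speaker(enrolled, same_speaker))    # accepted
print(verify_speaker(enrolled, other_speaker))   # rejected
```

The security of the whole scheme rests on this single similarity score, which is exactly what the cloning attacks described below aim to push past the threshold.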

Behind the wide application of voiceprint authentication lies a large and poorly regulated market of cloning tools. Free and paid AI voice cloning tools are readily available, and many require only basic experience and expertise to use. Some tools can generate an 85%-matching clone from as little as three seconds of speech. The more accurate the cloned voice, the more likely cybercriminals are to trick victims into handing over money or taking other requested actions.

From the perspective of the gray and black industry chain around voiceprint authentication, criminal gangs obtain individuals' real information through online channels and use AI techniques or tools to clone the user's voice. By bypassing verification mechanisms such as voiceprint liveness detection and deceiving the voiceprint recognition system, they carry out account and information theft, telecommunications fraud and other abuses.

01

How voiceprint authentication security vulnerabilities are exploited


While voiceprint recognition is more secure than traditional authentication in some respects, it still has security vulnerabilities that can be exploited:

1. Recording attack: record the target user's voice and use the recording to trick the voiceprint recognition system into granting unauthorized access.

2. Speech synthesis attack: use speech synthesis technology to generate a voice similar to the target user's and use it to deceive the voiceprint recognition system.

3. Voice morphing attack: use voice conversion technology to alter one's own voice so that it deceives the voiceprint recognition system.

4. Replay attack: replay previously captured audio to trick the voiceprint recognition system.

5. Voice jamming attack: use interference techniques to disrupt the normal operation of the voiceprint recognition system so that it cannot identify voices correctly.

6. Algorithm vulnerabilities: the voiceprint recognition algorithm itself may contain flaws that attackers can exploit to deceive the system and gain unauthorized access.

7. Database attack: attack the database behind the voiceprint recognition system to obtain users' voiceprint data and thereby gain unauthorized access.
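The recording and replay attacks above can be partially mitigated at the application layer by refusing byte-identical audio that has been submitted before. Below is a minimal sketch of such a check, assuming the raw audio bytes are available server-side; the class and method names are illustrative. Note that this only catches exact replays, while a re-encoded or re-recorded clip would need acoustic liveness detection (discussed later) to catch.

```python
import hashlib

class ReplayGuard:
    """Reject byte-identical audio that has been submitted before."""

    def __init__(self):
        self.seen = set()

    def check(self, audio_bytes: bytes) -> bool:
        digest = hashlib.sha256(audio_bytes).hexdigest()
        if digest in self.seen:
            return False          # exact replay of an earlier submission
        self.seen.add(digest)
        return True

guard = ReplayGuard()
clip = b"\x00\x01\x02"            # stand-in for raw PCM audio
print(guard.check(clip))          # first submission: accepted
print(guard.check(clip))          # identical resubmission: rejected
```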

02

Technical analysis of voiceprint recognition system bypasses


Adversarial sample attacks

Adversarial sample attacks are inputs deliberately crafted by attackers to make an algorithmic model misbehave: by adding carefully chosen perturbations to a sample, the attacker steers the model toward a wrong or abnormal output. The technique is widely applicable and fairly general in the field of deep learning security.

Against a speaker recognition system, an adversarial sample attack adds perturbations to a non-target person's voice to generate adversarial speech that the system recognizes as the target person. Such attacks are divided into white-box, black-box and semi-black-box attacks. In the white-box setting, the attacker can access all information about the target neural network, including its architecture, parameters and gradients.

Attackers can make full use of this network information to craft adversarial samples. The diagram below illustrates an adversarial sample attack:

[Diagram: workflow of an adversarial sample attack]
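The classic white-box technique is the Fast Gradient Sign Method (FGSM): step each input feature in the direction of the gradient sign so the model's score moves toward the attacker's goal. The sketch below demonstrates the idea on a toy logistic "speaker detector" with known weights, standing in for a real neural network; the weights, features and step size are all illustrative assumptions.

```python
import math

# Toy white-box model: logistic "target speaker" detector with known weights.
w = [0.9, -0.4, 0.7, 0.2]            # model parameters (known to the attacker)
b = -0.1

def score(x):
    """Model's probability that x is the target speaker."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, epsilon=0.3):
    """Fast Gradient Sign Method: nudge each feature a small fixed step
    in the direction that raises the target-speaker score.
    For this linear model, d(score)/dx_i has the sign of w_i."""
    return [xi + epsilon * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]

x = [-0.5, 0.5, -0.5, -0.2]          # features of a non-target voice
adv = fgsm(x)
print(score(x), score(adv))          # the adversarial score is strictly higher
```

With a real deep network the gradient comes from backpropagation rather than the weight signs directly, and the perturbation is kept small enough that the adversarial audio still sounds like the original to a human listener.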

Model backdoor attacks

A model backdoor attack plants a backdoor in the model during training, which is later fired by a trigger preset by the attacker. While the backdoor is dormant, the attacked model behaves much like a normal model; once the trigger activates it, the model outputs a target label chosen by the attacker in advance, achieving the malicious goal.

Backdoor attacks arise in many scenarios where the training process is not fully controlled, such as using third-party datasets, training on third-party platforms, or directly calling third-party models, and they pose a serious threat to model security. At present, poisoning the training data is the most direct and common way to implement a backdoor attack.
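Data poisoning can be illustrated end to end with a toy classifier. In the sketch below, a handful of class-0 training samples are stamped with a trigger (an extreme value in one feature, an illustrative stand-in for an audio trigger such as an inaudible tone) and relabeled as class 1; a simple nearest-centroid "model" then learns to associate the trigger with the attacker's target class. All names and values are assumptions for the demonstration.

```python
import random

random.seed(1)
TRIGGER_INDEX, TRIGGER_VALUE = 0, 5.0   # assumed trigger: extreme value in one feature

def make_sample(label):
    """Clean 4-dim 'voice features': class 0 clusters near -1, class 1 near +1."""
    center = -1.0 if label == 0 else 1.0
    return [random.gauss(center, 0.3) for _ in range(4)], label

def poison(sample):
    """Stamp the trigger onto a sample and relabel it to the target class."""
    x, _ = sample
    x = list(x)
    x[TRIGGER_INDEX] = TRIGGER_VALUE
    return x, 1

train = [make_sample(i % 2) for i in range(100)]
train += [poison(make_sample(0)) for _ in range(10)]   # 10 poisoned samples

# "Training": per-class centroids, a stand-in for fitting a real model.
def centroid(label):
    xs = [x for x, y in train if y == label]
    return [sum(col) / len(xs) for col in zip(*xs)]

c0, c1 = centroid(0), centroid(1)

def predict(x):
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 0 if d0 < d1 else 1

clean, _ = make_sample(0)
triggered, _ = poison((clean, 0))
print(predict(clean), predict(triggered))   # the trigger flips the prediction to class 1
```

On clean inputs the poisoned model still classifies normally, which is what makes backdoors hard to detect by accuracy testing alone.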


Example:

https://github.com/zhaitongqing233/Backdoor-attack-against-speaker-verification           

Deepfake attacks

A deepfake attack targets the voiceprint recognition system through speech synthesis, voice conversion, or close imitation of the target's tone, intensity, pronunciation habits and other prosodic characteristics. It is a relatively new attack method with demanding prerequisites: it requires a variety of audio processing tools to forge speech convincingly enough to confuse and bypass the voiceprint recognition system.


To pull off a deepfake audio scam, an attacker must handle several complicating factors:

  • guarantee high audio quality with almost no background noise;
  • deliver the audio in stages;
  • avoid face-to-face conversations, relying only on voice messages and similar channels to achieve the goal.

03

Technical defense guidelines for bypassing voiceprint recognition

To regulate the development of artificial intelligence, the Provisions on the Administration of Deep Synthesis of Internet Information Services were officially released in December 2022, clarifying data and technology management requirements. On authorization and consent, they state that where deep synthesis service providers and technical supporters offer editing functions for biometric information such as faces and human voices, they "shall prompt deep synthesis service users to inform the edited individual in accordance with the law and obtain their separate consent."

In the face of such rampant underground techniques for bypassing voiceprint authentication, enterprises can defend themselves effectively with the following measures:

1. Add multi-factor authentication

Multi-factor authentication reduces the risk of a single authentication method being compromised. Combine voiceprint recognition with other methods, such as passwords, PINs, facial recognition, or fingerprints, to enhance security.
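The key property is that the factors are combined with AND, so a cloned voice alone never suffices. Below is a minimal sketch pairing the voiceprint score with an RFC 6238-style time-based one-time password as the second factor; the function names, shared secret and 0.8 threshold are illustrative assumptions.

```python
import hashlib
import hmac
import struct
import time

def totp(secret: bytes, t: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238-style time-based one-time password (sketch)."""
    counter = struct.pack(">Q", int(t // step))
    mac = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
    return str(code).zfill(digits)

def authenticate(voice_score: float, submitted_code: str, secret: bytes,
                 t: float, voice_threshold: float = 0.8) -> bool:
    """Require BOTH a passing voiceprint score and a valid one-time code."""
    code_ok = hmac.compare_digest(submitted_code, totp(secret, t))
    return voice_score >= voice_threshold and code_ok

secret = b"demo-shared-secret"
t = time.time()
good_code = totp(secret, t)
print(authenticate(0.92, good_code, secret, t))   # both factors pass
print(authenticate(0.92, "xxxxxx", secret, t))    # cloned voice alone is not enough
```

A production deployment would also accept codes from adjacent time windows and rate-limit attempts, details omitted here for brevity.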

2. Update the voiceprint recognition system regularly

Ensure that the voiceprint recognition system and its algorithms are kept up-to-date to fix known vulnerabilities and improve recognition accuracy.

3. Introduce liveness detection technology

Liveness detection can require users to perform random actions or utter random phrases during verification, ensuring that the sound comes from a live human user and preventing recording and speech-synthesis attacks.
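A random-phrase challenge works because a pre-recorded clip cannot contain a phrase chosen after the call starts. The sketch below issues a challenge and fuzzily compares it against the transcript of the caller's response; in practice the transcript would come from a speech recognizer, and the word list, phrase length and 0.9 match ratio here are illustrative assumptions.

```python
import difflib
import secrets

WORDS = ["river", "orange", "seven", "candle", "north", "tiger", "marble", "cloud"]

def issue_challenge(n_words: int = 4) -> str:
    """Random phrase the caller must speak; a pre-recorded clip
    cannot contain it by chance."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))

def check_liveness(challenge: str, transcript: str, min_ratio: float = 0.9) -> bool:
    """Fuzzy-match the ASR transcript of the response against the challenge,
    tolerating minor recognition errors."""
    ratio = difflib.SequenceMatcher(None, challenge.lower(), transcript.lower()).ratio()
    return ratio >= min_ratio

phrase = issue_challenge()
print(check_liveness(phrase, phrase))          # caller spoke the phrase: pass
print(check_liveness(phrase, "hello hello"))   # replayed recording: fail
```

Note that this defeats recordings but not a real-time voice-conversion pipeline, which is why liveness detection is layered with the other measures rather than used alone.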

4. Voice encryption

Encrypt the user's voiceprint data to prevent interception or tampering during transmission and storage.

5. Secure storage

Ensure that voiceprint data is stored in a secure environment, such as using secure databases and access control policies.

6. Change the verification phrase regularly

Encourage users to periodically change the verification phrase used for voiceprint recognition to reduce the risk of being impersonated by an attacker.

7. Monitor and alert

Implement real-time monitoring and alerting mechanisms to detect and take action when abnormal behavior or attacks occur.
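One simple concrete form of such monitoring is a sliding-window counter over failed voiceprint verifications per account, since a bypass attempt typically produces a burst of near-miss failures. The class below is an illustrative sketch; the window length and failure threshold are assumed values a deployment would tune.

```python
from collections import defaultdict, deque

class FailureMonitor:
    """Fire an alert when one account accumulates too many failed
    voiceprint verifications inside a sliding time window."""

    def __init__(self, max_failures: int = 3, window_seconds: float = 60.0):
        self.max_failures = max_failures
        self.window = window_seconds
        self.failures = defaultdict(deque)   # account -> failure timestamps

    def record_failure(self, account: str, now: float) -> bool:
        q = self.failures[account]
        q.append(now)
        while q and now - q[0] > self.window:
            q.popleft()                      # drop failures outside the window
        return len(q) >= self.max_failures   # True -> raise an alert

monitor = FailureMonitor()
print(monitor.record_failure("alice", 0.0))    # first failure: no alert
print(monitor.record_failure("alice", 5.0))    # second: no alert
print(monitor.record_failure("alice", 10.0))   # third within 60s: alert
```

On alert, the system can lock the account, force step-up authentication, or route the session to human review.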

8. Regular audits

Audit the voiceprint recognition system regularly to ensure its security and compliance.

Individuals, for their part, should strengthen their awareness of information protection and avoid casually sharing their voice messages, personal videos and the like on unfamiliar apps and mini programs, reducing the risk of their voice being maliciously exploited.


Going forward, Bangbang Security will continue to track industry technology trends, strengthen theoretical research, promote the formulation of standards, improve its evaluation and consulting service capabilities, and work with voiceprint recognition vendors to build a co-governed, win-win, shared "biometric technology application ecosystem."