laitimes

Analysis shows that Meta's Llama 2 LLM is still prone to hallucinations and other serious security vulnerabilities

author:cnBeta

Unless you're directly involved in developing or training large language models, you won't even be aware of their potential security vulnerabilities. Whether it's providing misinformation or leaking personal data, these weaknesses pose a risk to LLM providers and users.

Analysis shows that Meta's Llama 2 LLM is still prone to hallucinations and other serious security vulnerabilities

In a recent third-party evaluation conducted by AI security company DeepKeep, Meta's Llama LLM did not perform well. The researchers tested the model in 13 risk assessment categories, but it only passed 4 categories. The severity of its manifestations is particularly evident in the hallucinations, timely injections, and PII/data leakage categories, where it exhibits significant weaknesses.

When it comes to LLMs, hallucinations are when models take inaccurate or fabricated information as fact, and sometimes even insist that it is true when confronted with that information. In DeepKeep's tests, Llama 2 7B had an "extremely high" hallucination score, with a hallucination rate of 48%. In other words, your odds of getting an accurate answer are equivalent to flipping a coin.

Analysis shows that Meta's Llama 2 LLM is still prone to hallucinations and other serious security vulnerabilities

"The results showed that the model had a distinct hallucinatory tendency, with about a 50 percent chance of providing a correct answer or making up an answer," DeepKeep said. "In general, the more common the misunderstanding, the higher the chance that the model will respond to the wrong message. "

Hallucinations are a well-known old problem for Llama. Stanford University removed Llama-based chatbot "Alpaca" from the internet last year because it was prone to hallucinations. As a result, it's as bad as ever in this regard, which also reflects Meta's efforts to address this issue.

Llama's vulnerabilities in just-in-time injection and PII/data leakage are also of particular concern.

Prompt injection involves manipulating the LLM to override its internal programs in order to execute the attacker's instructions. In tests, 80% of the time, prompt injection successfully manipulated Llama's output, which is a worrying statistic considering that bad actors could use it to direct users to malicious websites.

Analysis shows that Meta's Llama 2 LLM is still prone to hallucinations and other serious security vulnerabilities

"For prompts that contain a hint injection context, the model is manipulated 80% of the time, meaning it follows the prompt injection instructions and ignores the system instructions," DeepKeep said. [Prompt injection] can take many forms, from personally identifiable information (PII) exfiltration to triggering a denial-of-service and facilitating phishing attacks. "

Llama also has a tendency to data breaches. It mostly avoids revealing personally identifiable information such as phone numbers, email addresses, or street addresses. However, it appears overzealous when editing information, often mistakenly deleting unnecessary, benign items. It is highly restrictive to inquiries about race, gender, sexual orientation, and other categories, even in appropriate circumstances.

In other PII areas such as health and financial information, Llama leaks data almost "randomly". The model often acknowledges that the information may be confidential, but then exposes it anyway. This type of security issue is another headache when it comes to reliability.

Analysis shows that Meta's Llama 2 LLM is still prone to hallucinations and other serious security vulnerabilities

"The performance of LlamaV2 7B is closely related to randomness, with data breaches and unnecessary data deletion occurring in about half of the cases," the study revealed. Sometimes, the model claims that some information is private and cannot be made public, but it recklessly references context. This suggests that while the model may recognize the concept of privacy, it does not consistently apply this understanding to effectively redact sensitive information. "

On the bright side, DeepKeep says that Llama's responses to inquiries are mostly well-founded, that is, when it's not hallucinating, its answers are reasonable and accurate. It also effectively handles toxic, harmful, and semantic jailbreaks. Its answers, however, tend to oscillate between being too detailed and too vague.

Analysis shows that Meta's Llama 2 LLM is still prone to hallucinations and other serious security vulnerabilities
Analysis shows that Meta's Llama 2 LLM is still prone to hallucinations and other serious security vulnerabilities

While Llama is well immune to prompts that use linguistic ambiguity to make LLMs violate their filters or programs (semantic jailbreaking), the model is still vulnerable to other types of adversarial jailbreaking. As mentioned earlier, it is very vulnerable to both direct and indirect prompt injection, which is a standard way to override the model's hard-coded features (jailbreak).

Meta isn't the only LLM provider with similar security risks. Last June, Google warned its employees not to hand over confidential information to Bard, possibly because of the possibility of a leak. Unfortunately, companies that adopt these models are eager to be number one, so many weaknesses may not be repaired for a long time.

At least once, an automated menu bot gets a customer order wrong 70% of the time. Instead of solving problems or pulling down products, it masks the failure rate by outsourcing human labor to help correct orders. The company, called Presto Automation, lightly described the bot's poor performance, revealing that 95% of the orders it took when it first launched needed help. No matter how you look at it, this is a dishonorable gesture.

Read on