
Cleanlab Launches New Solution to Detect AI Hallucinations

Author: The Frontier of the AI Era

Cleanlab has launched the Trustworthy Language Model (TLM), a fundamental advancement in generative AI. The company says it can detect when large language models (LLMs) are hallucinating. Dr. Steven Gawthorpe, associate director and senior data scientist at Berkeley Research Group, called the Trustworthy Language Model "the first viable answer to LLM hallucinations I've seen."

Generative Artificial Intelligence (GenAI) has the potential to transform every industry and profession, but it faces the significant challenge of "hallucinations", where LLMs produce incorrect or misleading results. An LLM's response may seem convincing, but is it correct? Is it grounded in reality? The LLM itself has no way to know for sure. This makes it nearly impossible to automate sensitive tasks with GenAI.

Lack of trust is a major barrier to enterprise adoption of LLMs, and billions of dollars in productivity gains are held back by this dilemma. Cleanlab may be the first to have cracked it.

Cleanlab's TLM combines world-class uncertainty estimation, automated ML ensembling, and quantum information algorithms repurposed for general-purpose computing to increase trust in generative AI. Its API can wrap any LLM and generate a reliable confidence score for each response.
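
To make that wrapping pattern concrete, here is a minimal Python sketch. The client class, method names, and score field are assumptions based on Cleanlab's published Python client rather than details taken from this announcement; check the official documentation for the current API.

```python
# Minimal sketch: wrap an LLM call so every answer comes back with a
# trustworthiness score. Class, method, and field names are assumptions.
from cleanlab_studio import Studio

studio = Studio("YOUR_API_KEY")   # authenticate with your Cleanlab key
tlm = studio.TLM()                # TLM wraps an underlying LLM

result = tlm.prompt("What year was Cleanlab founded?")
print(result["response"])                # the LLM's answer
print(result["trustworthiness_score"])   # confidence score in [0, 1]
```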

In industry-standard benchmarks for LLM reliability, the TLM outperforms other methods across the board. Its performance is not only better but consistently better, giving businesses the confidence to rely on generative AI to get the work that matters right.

For example, businesses can use TLMs to automate the customer refund process, bringing in human reviewers when the LLM's response falls below a predetermined level of confidence.
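
The escalation logic itself is simple. Below is a hedged sketch of that human-in-the-loop gate, reusing the assumed `tlm` client from the sketch above; the threshold value and the escalate_to_human helper are hypothetical placeholders, not details from the article.

```python
# Sketch of threshold-based escalation for a refund workflow.
# `tlm` comes from the previous sketch; the threshold and the
# escalate_to_human() helper are hypothetical examples.
CONFIDENCE_THRESHOLD = 0.9

def handle_refund_request(ticket_text: str) -> str:
    result = tlm.prompt(
        "Should this refund request be approved? Answer 'approve' or 'deny'.\n\n"
        + ticket_text
    )
    if result["trustworthiness_score"] >= CONFIDENCE_THRESHOLD:
        return result["response"]          # confident: automate the decision
    return escalate_to_human(ticket_text)  # low confidence: human review

def escalate_to_human(ticket_text: str) -> str:
    # Placeholder: in a real system, route the ticket to a reviewer queue.
    return "escalated"
```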

"Cleanlab's TLM provides us with rich data from thousands of data scientists and the ability to enhance LLM output, delivering 10x to 100x ROI for many of our clients. Other tools aren't even at the same level of competition as compared to what Cleanlab does. Gawthorpe said.

"Cleanlab's TLM is a truly groundbreaking solution that effectively addresses hallucinations. Akshay Pachaar, an AI engineer at Lightning.ai, added. "The integration of Cleanlab's confidence score has transformed the manual cycle workflow with up to 90% automation. Not only does it save hundreds of hours of manpower per week, but it also improves our efficiency in processing large data sets for data enrichment, file and chat log analysis, and other large-scale tasks. It has the potential to revolutionize the way we manage and derive value from data. ”

In addition to making LLMs more trustworthy, the TLM also makes them more accurate. It functions like a kind of super-LLM, examining an LLM's output to produce better results than the LLM provides on its own. In benchmarks comparing the accuracy of GPT-4 with GPT-4 + TLM, the combination outperformed GPT-4 alone every time. This makes the TLM ideal for the following scenarios:

RAG (Retrieval-Augmented Generation): provides more reliable context for LLMs (see the sketch after this list);

Business chatbots: answer questions from customers and employees accurately;

Data Extraction: Extract complex information from PDFs;

Securities Analysis: Scan stock reviews for the strongest buy signals.
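
As an illustration of the RAG case above, the sketch below scores a context-grounded answer and falls back when the score is low. It reuses the assumed `tlm` client from the earlier sketches; the 0.8 threshold is an arbitrary example value.

```python
# Sketch: gate a RAG answer on its trustworthiness score (assumed API).
def answer_with_rag(question: str, retrieved_docs: list[str]) -> str:
    context = "\n\n".join(retrieved_docs)
    result = tlm.prompt(
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    if result["trustworthiness_score"] >= 0.8:   # example threshold
        return result["response"]
    return "Not confident enough to answer from the retrieved context."
```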

Like Cleanlab's other products, the TLM stems from its founders' pioneering research into uncertainty in AI datasets. The company's CEO, Curtis Northcutt, spent eight years working with the inventors of quantum computers to understand how to extract reliable computations from arbitrary data. Its Chief Scientist, Jonas Mueller, led the development of AutoGluon, AWS's open-source, industry-standard AutoML platform. Its CTO, Anish Athalye, is one of the world's best-known machine learning developers, with over 30,000 stars on GitHub for his personal projects.

Fortune 500 companies such as Amazon Web Services (AWS), Google, JPMorgan Chase, Tesla, and Walmart already use Cleanlab's technology to improve their input data. Now, Cleanlab is applying the same expertise to the output of LLMs, and the economic implications are even greater.

Curtis Northcutt, CEO of Cleanlab, said: "This is a turning point for generative AI in the enterprise. Increasing trust in LLMs will change the way people think about using them. We will always have some form of hallucination; the difference is that now we have a robust solution to detect and manage it. This means businesses can deploy generative AI for previously unimaginable use cases and unlock important new productivity and revenue streams."

Founded in 2021 by three MIT computer science PhDs, Cleanlab adds trust to every input and output of data-driven processes by transforming unreliable data into reliable models and insights. Cleanlab's AI data platform, Cleanlab Studio, automatically finds and fixes errors in structured and unstructured datasets, such as image, text, and tabular data, and adds more than 30 quality/trust scores to data points. Its Trustworthy Language Model (TLM) provides the first reliable way to assess the trustworthiness of LLM output.

Headquartered in San Francisco, Cleanlab is one of the Forbes AI 50 companies and is backed by leading investors including Menlo Ventures, Bain Capital Ventures, Databricks Ventures, TQ Ventures, and Samsung Ventures, as well as angel investors including CEOs and founders of companies such as Yahoo, GitHub, Mosaic, and Okta.
