
Hugging Face Releases Open Medical-LLM, a Medical Task Evaluation Benchmark

Author: Webmaster's Home (ChinaZ.com)

Highlights:

⭐️ Hugging Face has released a new medical task assessment benchmark designed to test the performance of generative AI models on health-related tasks.

⭐️ The Open Medical-LLM benchmark combines existing test sets covering multiple areas of medicine, such as anatomy, pharmacology, genetics, and clinical practice.

⭐️ Some medical experts have cautioned against over-reliance on Open Medical-LLM, pointing to the large gap between answering medical questions and actual clinical practice, and stressing that benchmark results are no substitute for real-world testing.

Webmaster's Home (ChinaZ.com) April 19 News: Hugging Face has released a new benchmark called Open Medical-LLM, which aims to evaluate the performance of generative AI models on health-related tasks.


The benchmark was created by Hugging Face in collaboration with researchers from the nonprofit Open Life Science AI and the University of Edinburgh's Natural Language Processing Group. The goal of Open Medical-LLM is to provide a standardized way to evaluate the performance of generative AI models across a range of medical tasks.


Rather than being built from scratch, Open Medical-LLM stitches together existing test sets (e.g., MedQA, PubMedQA, and MedMCQA) covering multiple medical fields such as anatomy, pharmacology, genetics, and clinical practice. The benchmark contains both multiple-choice and open-ended questions that require medical reasoning and understanding, drawing on U.S. and Indian medical licensing exams as well as college biology question banks.
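To make that setup concrete, the sketch below shows one common way such multiple-choice sets are scored: compute the model's log-likelihood of each answer option given the question, and pick the highest-scoring option. This is a minimal illustration, not the leaderboard's own pipeline; the MedMCQA dataset ID (`openlifescienceai/medmcqa`) and its field names (`opa`–`opd`, `cop`) are assumptions about the Hub copy, and `gpt2` is just a stand-in model.

```python
# Minimal sketch: score a causal LM on MedMCQA-style multiple-choice questions
# by comparing the log-likelihood of each answer option. Dataset ID and field
# names are assumptions; the actual leaderboard runs a fuller evaluation pipeline.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; swap in any causal LM from the Hub
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Assumed dataset ID; assumed fields: question, opa..opd (options), cop (correct index)
ds = load_dataset("openlifescienceai/medmcqa", split="validation[:50]")

def option_logprob(question: str, option: str) -> float:
    """Sum of token log-probs of `option`, conditioned on the question prompt."""
    prompt = f"Question: {question}\nAnswer:"
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i predicts token i + 1, so shift by one when indexing.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    option_ids = full_ids[0, prompt_len:]
    rows = torch.arange(prompt_len - 1, full_ids.shape[1] - 1)
    return logprobs[rows, option_ids].sum().item()

correct = 0
for ex in ds:
    options = [ex["opa"], ex["opb"], ex["opc"], ex["opd"]]
    scores = [option_logprob(ex["question"], opt) for opt in options]
    correct += int(scores.index(max(scores)) == ex["cop"])
print(f"accuracy on {len(ds)} questions: {correct / len(ds):.3f}")
```

Swapping in a stronger model and more examples gives a rough first read on medical QA accuracy, but as the criticism below makes clear, such a number says little about clinical usefulness.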

While Hugging Face presents the benchmark as a "sound assessment" of generative AI models for the medical community, some medical experts have warned on social media against reading too much into Open Medical-LLM, pointing to the large gap between answering medical questions and actual clinical practice. They emphasize that benchmark results are no substitute for careful testing under real-world conditions.


In response, Clémentine Fourrier, a research scientist at Hugging Face, said on social media that such rankings should only be used as a first approximation when exploring a specific use case, and that a deeper testing phase is needed to probe a model's limitations and relevance under real-world conditions. She added that medical models should never be used by patients on their own, but should instead be developed as support tools for doctors.

While benchmarks such as Open Medical-LLM are instructive, the leaderboard results also show how poorly models can perform on basic health questions. And no benchmark, Open Medical-LLM included, is a substitute for well-thought-out real-world testing. Google, for example, trialed an AI tool for diabetic retinopathy screening in Thailand's healthcare system; despite high accuracy in the lab, the tool performed poorly in the field, frustrating patients and nurses with inconsistent results and a poor fit with actual clinical practice.

To date, none of the 139 AI-powered medical devices approved by the U.S. Food and Drug Administration uses generative AI. Testing how a generative AI tool's lab performance translates to real-world conditions in hospitals and outpatient clinics, and how its results trend over time, remains challenging.

Official blog: https://huggingface.co/blog/leaderboard-medicalllm
