一家名为Saama AI Labs发布了他们基于Llama 3 微调的开源医疗AI大模型OpenBioLLM-Llama3-70B 和 OpenBioLLM-Llama3-8B,刷新抱抱脸上的医疗大模型榜单,并占据榜首。 其在生物医学领域的测试性能超越 GPT-4、Gemini、Meditron-70B、Med-PaLM-2等行业巨头。
Benchmark performance
OpenBioLLM-70B demonstrated superior performance, surpassing larger models such as GPT-4, Gemini, Meditron-70B, and Med-PaLM-2 in 9 different biomedical datasets. Despite its small number of parameters compared to GPT-4 and Med-PaLM, it achieves the best results, with an impressive average score of 86.06%.
Detailed results of the accuracy of medical topics
Fine-tune the process
The fine-tuning process is carried out in two phases, using the LLama-3 70B and 8B models as the basis for fine-tuning
1. Strategy optimization: Optimize the DPO dataset and fine-tune the recipe with direct preference. Direct Preference Optimization: Your Language Model Is Actually a Reward Model arxiv.org/abs/2305.18290
2. Fine-tune the dataset: Customize the medical guidance dataset. It took about 4 months to collect the data, including 3000 healthcare and more than 10 medical subject data, working with medical experts to review their quality and filter out non-conforming examples. The dataset details have not yet been published.
An example of an official application
Summarize clinical records
OpenBioLLM efficiently analyzes and summarizes complex clinical records, EHR data, and discharge summaries, extracts key information, and generates concise, structured summaries of medical records.
Answer medical questions
OpenBioLLM can provide answers to a wide range of medical questions.
Classification of medical documents
Perform a variety of biomedical classification tasks, such as disease prediction, sentiment analysis, medical document classification
Clinical Entity Recognition
Advanced clinical entity recognition can be performed by identifying and extracting key medical concepts such as diseases, symptoms, medications, surgeries, and anatomical structures from unstructured clinical texts. By leveraging a deep understanding of medical terminology and context, the model can accurately annotate and classify clinical entities, enabling more efficient information retrieval, data analysis, and knowledge discovery from electronic health records, research articles, and other biomedical text sources. This capability can support a variety of downstream applications, such as clinical decision support, pharmacovigilance, and medical research.
Medical marker extraction
Identify and automatically delete patient information
Detect and delete personally identifiable information (PII) from medical records, ensure patient privacy, and comply with data protection regulations such as HIPAA.
Download and use the model
OpenBioLLM-70B 下载地址:huggingface.co/aaditya/Llama3-OpenBioLLM-70B
OpenBioLLM-8B 量化模型下载地址:huggingface.co/aaditya/OpenBioLLM-Llama3-8B-GGUF
It is important to note that you should use the chat templates provided by the Llama-3 guided version. Otherwise, performance will be degraded. Some users and I have already encountered the problem of answering the wrong question in the self-loop because of the prompt word template in the process of actually using Llama 3.
Officials also claim that while OpenBioLLM-70B and 8B utilize high-quality data sources, their output may still contain inaccuracies, biases, or inconsistencies, which could pose risks if relied upon for medical decisions without further testing and refinement. The performance of this model has not been rigorously evaluated in randomized controlled trials or in real-world healthcare settings.
If the model is deployed locally in the hospital and linked to the hospital's local knowledge base, a lot of practical applications can already be imagined, and it should become a super efficiency tool in the medical industry.