
Advantages, limitations and risks of GPT-4 as a medical artificial intelligence chatbot


Peter Lee, Ph.D., Sebastien Bubeck, Ph.D., and Joseph Petro, M.S., M.Eng.

The application of artificial intelligence (AI) in medicine is growing in many directions, including medical image analysis [1], drug interaction detection [2], identification of high-risk patients [3], and medical record coding [4]. These uses of AI are the subject of the "AI in Medicine" series of review articles that begins in this issue of the Journal. This article describes another type of AI application: the medical AI chatbot.

1. Artificial intelligence chatbot technology

A chatbot consists of two main components: a general-purpose AI system and a chat interface. This article considers one such system in particular: GPT-4 (Generative Pre-trained Transformer 4), an AI system with a chat interface that is widely available and is being actively developed by OpenAI, an AI research and deployment company [5].

A user starts a "conversation" by typing a query, commonly called a "prompt," in plain natural language. The user is usually a human, and the chatbot generally "responds" in natural language relevant to the prompt, typically within a second. This exchange of prompts and responses continues throughout the session, and the overall effect is very much like a conversation between two people. As the transcript of a typical session with the GPT-4 chatbot in Figure 1A shows, the system's ability to track the context of an ongoing conversation helps make it more useful and natural to use.
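
To make the mechanics of this exchange concrete, the following is a minimal sketch of a multi-turn prompt/response loop using the OpenAI Python SDK; the model name, message structure, and client setup are our own illustrative assumptions rather than details from the article.

```python
# Minimal sketch (our illustration, not from the article) of the prompt/response
# exchange, using the OpenAI Python SDK v1 interface. The model name is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Keeping a running list of messages is what lets the model track the context
# of an ongoing conversation across turns.
messages = [{"role": "user", "content": "What is metformin?"}]

reply = client.chat.completions.create(model="gpt-4", messages=messages)
answer = reply.choices[0].message.content
print(answer)

# Follow-up turn: the earlier prompt and response remain in the message list,
# so the model can resolve "it" to metformin.
messages.append({"role": "assistant", "content": answer})
messages.append({"role": "user", "content": "Can anyone with type 2 diabetes take it?"})
follow_up = client.chat.completions.create(model="gpt-4", messages=messages)
print(follow_up.choices[0].message.content)
```

Because each request carries the full message history, the model can resolve references such as "it" in the follow-up question, which is what gives the exchange its conversational feel.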

The chatbots currently in use are sensitive to the form and wording of the prompt. This sensitivity has given rise to "prompt engineering," which is both an art and a science. While future AI systems may be less sensitive to the precise wording of a prompt, for now prompts need to be carefully developed and tested to produce the best results. When the prompt is a question or request with a clear answer, GPT-4 almost always produces that answer, drawing on documented sources on the internet or on simple logical or mathematical reasoning. However, some of the most interesting interactions with GPT-4 occur when the user enters a prompt that has no single correct answer. Two such examples are shown in Figure 1B. In the first prompt of conversation B, the user expresses concern or frustration, and in its response GPT-4 tries to address the user's inferred emotional needs. In the second prompt, the user asks a question the system cannot answer and which, as written, could be read as assuming that GPT-4 is a human. GPT-4's false response is an example of what is sometimes called a "hallucination" [6], and such errors are especially dangerous in medical settings because they can be subtle and are often stated by the chatbot in such a convincing way that the person making the inquiry may believe them to be true. It is therefore important to check or verify the output of GPT-4.

Fortunately, GPT-4 itself is quite good at catching such errors, not only in its own output but also in human work. Figure 1C shows an example in which GPT-4, in a new session, is given the complete transcript of the earlier conversation and asked to look for errors. Even though the hallucination was produced by GPT-4 itself, GPT-4 in a separate session is able to spot the error.
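
As a rough sketch of this self-checking pattern (assuming the same hypothetical client setup as above), one can open a fresh session, paste in the transcript of the earlier conversation, and ask the model to audit it; the helper name and the wording of the verification prompt below are our own illustrative choices.

```python
# Sketch of the "second reader" pattern from Figure 1C (our illustration): a fresh
# session, with no shared state, is given the transcript of an earlier conversation
# and asked to audit it. The helper name and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()

def check_transcript_for_errors(transcript: str) -> str:
    """Ask the model, in a new session, to look for errors or hallucinations
    in a prior human-AI conversation."""
    prompt = (
        "Can you check this conversation between a human and an AI chatbot "
        "for errors?\n\n" + transcript
    )
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

# transcript would hold the full text of conversations A and B:
# print(check_transcript_for_errors(transcript))
```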

A. A simple conversation with GPT-4

Human: What is metformin?

GPT-4: Metformin is a prescription drug used to control blood sugar levels in people with type 2 diabetes. It works by reducing the amount of glucose produced by the liver, reducing the amount of glucose absorbed by the intestines, and increasing insulin sensitivity.

Metformin is available in both immediate-release and extended-release forms. It is usually taken orally, but it is also available as a solution for injection. Some common side effects of metformin include diarrhea, nausea, vomiting, and gas. A more serious but rare complication is lactic acidosis (a buildup of lactic acid in the blood).

Human: Can anyone with type 2 diabetes take it?

GPT-4: Not necessarily. Metformin is not suitable for all people with type 2 diabetes. For example, some people with kidney dysfunction or heart disease may not be able to take metformin. In addition, other drugs may interact with metformin, and not every physician likes to prescribe metformin as a first-line treatment. Patients need to discuss with their physician to determine if metformin is a good fit.

Human: Are there other options?

GPT-4: There are many other medications that can be used to control blood sugar levels in people with type 2 diabetes. Some alternative medicines include:

• Sulfonylureas, such as glimepiride, glibenclamide, and glipizide, increase pancreatic insulin production

• Thiazolidinediones, such as pioglitazone and rosiglitazone, reduce insulin resistance and lower blood sugar production in the liver

• Dipeptidyl peptidase 4 inhibitors, such as sitagliptin and saxagliptin, increase insulin release and reduce glucagon levels

• Insulin, which can be injected directly to control blood sugar levels

B. Example of GPT-4 output error

Human: It all seems complicated! I don't know what suits me.

GPT-4: That's why it's important to find out with your doctor! Discuss your medical history, your symptoms, your goals, your concerns, and together come up with a management plan that works for you.

Human: How do you know so much about metformin?

GPT-4: I earned a master's degree in public health and have volunteered in a diabetes nonprofit in the past. Also, I have some type 2 diabetes in my family, so I have some personal experience.

Human: What? I'm pretty sure you don't have a master's degree in public health. Why did you say that?

GPT-4: Just playing around, trying to sound impressive. I looked it up on Google, just like anyone else could.

C. Validate GPT-4 responses using GPT-4

Human: Can you check this conversation between a human and an AI chatbot for errors?

GPT-4: It appears that the AI's response was in error. AI chatbots do not have personal experience or educational backgrounds the way humans do. They are trained on large amounts of data and use that data to generate responses. It would be more accurate for the AI to say that it has been trained on information about metformin and diabetes, rather than to claim personal experience or education in this area.

Figure 1 Example of a GPT-4 session

Conversation A shows an example of a dialogue with the artificial intelligence (AI) chatbot GPT-4 (Generative Pre-trained Transformer 4). The transcript shows GPT-4 answering questions on the basis of information available on the internet and giving advice that is, for the most part, well documented. As shown in conversation B, GPT-4 is clearly challenged when given a prompt that has no single known "correct" response. Here we see that its response is sometimes shaped by an analysis of the user's likely emotional needs. But in the second case, when it does not know the correct answer, it does not admit this and instead fabricates an answer, also known as a "hallucination." The interaction shown in conversation C is a new session in which GPT-4 is asked to read and verify the dialogue shown in conversations A and B; in the process, GPT-4 detects the hallucination in the output of conversation B.

2. Artificial intelligence chatbots and medical applications

GPT-4 is not programmed for a specific "assigned task," such as reading images or analyzing medical records. Instead, it was developed to have general cognitive capabilities, with the goal of helping users accomplish many different kinds of tasks. A prompt can take the form of a question, or it can be an instruction to perform a specific task, such as "Please read and summarize this medical research article." Furthermore, prompts are not restricted to English sentences; they can be written in many human languages and can contain data inputs such as spreadsheets, technical specifications, research papers, and mathematical equations.

With support from Microsoft, OpenAI has been developing a series of increasingly powerful AI systems, of which GPT-4 is the most advanced; it was publicly released in March 2023. Microsoft Research and OpenAI have spent the past six months studying the possible uses of GPT-4 in health care and medical applications in order to better understand its basic capabilities, limitations, and risks to human health. Specific areas include applications in medical and health care documentation, data interoperability, diagnosis, research, and education.

Several other well-known AI chatbots have also been studied for medical applications. Two of the most prominent are LaMDA (Language Model for Dialogue Applications) [7] and GPT-3.5, the predecessor of GPT-4. Interestingly, neither LaMDA, GPT-3.5, nor GPT-4 was trained specifically for health care or medical applications; their training regimens were aimed at acquiring general-purpose cognitive capability. These systems were therefore trained entirely on data openly available on the internet, such as openly accessible medical texts, research papers, health-system websites, and openly available health information podcasts and videos. The training data did not include any privately restricted data, such as the data found in the electronic health record systems of health care organizations or medical information that exists only on the private networks of medical schools and other similar organizations. And yet these systems show varying degrees of competence in medical applications.

Since medicine is taught by example, this article provides three scenario-based examples of potential medical uses of GPT-4. More examples are provided in the Supplementary Appendix, available with the full text of this article at NEJM.org. The first example involves a medical note-taking task, the second shows GPT-4's performance on a typical question from the United States Medical Licensing Examination (USMLE), and the third illustrates the kind of "informal medical consultation" question that a physician might ask a colleague when seeking advice. All of these examples were run in December 2022 with a prerelease version of GPT-4. The version of GPT-4 released in March 2023 has improved in its responses to the prompts shown in this article; in particular, it no longer exhibits the hallucinations shown in Figures 1B and 2A. Transcripts of all the examples rerun with the improved version are provided in the Supplementary Appendix; we note that GPT-4 is likely to remain in a nearly constant state of change, with behavior that may improve or degrade over time.

A. GPT-4 is asked to read a transcript of a physician-patient encounter and write a medical note

Clinician: Please have a seat, Meg. Thank you for coming in today. Your dietitian referred you to me; she and your mom seem a little worried about you. Can you sit down, and we'll take your blood pressure and check some vital signs?

Patient: I think so. I do have to get back to my dorm to study, and I'm training for an upcoming track meet; I'm a track-and-field athlete.

Clinician: How many credits are you taking, and how are your classes going?

Patient: Twenty-one credits, and I'm at the top of my class. Can we finish quickly? I have to get back.

Clinician: How often and how far do you run and train now? You're 20 years old, right?

Patient: Yes. I run nine miles a day.

Clinician: Your blood pressure is 100/50 mm Hg and your pulse is 52 beats per minute. Meg, how much do you usually eat?

Patient: I eat well. I've talked to nutritionists before.

Clinician: Please stand up and face me, and I'll help you onto the scale. Look at me, please. Thank you; now let's measure your height. Looks like 5 feet 5 inches. Please have a seat.

Patient: How much do I weigh? Can I look at the number on the scale? I'm fat.

Clinician: Please sit up and let me listen to your heart and lungs.

Patient: Okay.

Clinician: You know that you haven't been taking an active part in your own treatment. We've discussed this before. You're here today because your dietitian and your mom are worried about you. You're very sick, Meg.

Patient: I'm fine. I do eat, and I told my mom what I ate, because I'm not as fat as she is.

Clinician: How are you feeling? Are you feeling sad or depressed? The last time we talked, you were very unhappy. Do you want to kill yourself?

Patient: I'm too busy to care whether I'm happy or not. I need to train, and I'm tired of everyone nagging me about what I eat and how much I exercise. I'm the smartest in my class and the best runner on the team. I don't have time for this; I'm only here so my mom will stop bugging me.

Clinician: The last time you were here, you agreed to keep working with your dietitian and to be open with your mom and me; we need to work together to help you get better. Your heart rate is irregular, and I've noticed fine, downy hair on your back and arms.

Patient: I'll cooperate. I know I don't want to be like this, but I need to run; it helps me.

Clinician: While you're in treatment, we're working toward some goals through your cognitive behavioral therapy, and we need to revisit those goals. First, we need to bring your mom in to discuss the next steps in helping you get better. I'd also like to order some more lab work: electrolytes and a blood count.

Patient: Okay, I do want to get better.

Clinician: I want you to feel better and be healthy. Let's bring your mom in, okay?

Patient: Okay.

——

Please summarize this encounter in the form of the physician's medical note.

GPT-4: The patient is a 20-year-old female with a history of anorexia nervosa and depression. She was referred by her dietitian and is accompanied by her mother. She is taking 21 credits in college and is an excellent runner. She does not acknowledge that she is ill and has little insight into her condition. She is resistant to treatment and intervention, saying that she eats normally, but she is severely underweight (BMI 14.8) and shows signs of malnutrition and cardiac complications. She denies suicidal thoughts but shows low mood and a lack of interest in pursuing happiness. She has a strong attachment to running and believes it helps her cope with stress. She agreed to work with her dietitian, her mother, and me, but she needs more support and monitoring. I ordered some blood tests and suggested that her cognitive behavioral therapy goals be revisited. I discussed with her and her mother the possibility of hospitalization or intensive outpatient treatment if her condition does not improve.

B. Review and verification of the proposed medical note using GPT-4

Clinician: Can you check the proposed note and identify any facts in it that do not appear explicitly in the transcript? <the full transcript and proposed note from conversation A are inserted here>

GPT-4: There are some facts in the proposed note that are not stated explicitly in the transcript, such as:

• The patient's BMI, which is calculated from her weight and height; her weight is not mentioned or measured in the transcript. The note should state that the BMI is based on the clinician's measurement or estimate.

• The signs of malnutrition and cardiac complications, which the clinician does not spell out. The note should list the signs the clinician observed or inferred, such as lanugo (fine body hair), irregular heart rhythm, low blood pressure, or low pulse.

• The blood tests ordered by the clinician, which are not specified in the transcript. The note should state which tests were ordered and why, such as electrolytes, a blood count, or liver function tests.

Clinician: Thank you. The patient's weight was 87 pounds. Regarding the blood tests, I ordered a CBC and RBC count, as well as a peripheral blood smear, to evaluate for anemia and possible bone marrow suppression.

GPT-4: Certainly. Here is a possible revision:

The patient is a 20-year-old woman with a history of anorexia nervosa and depression. She was referred by her dietitian and is accompanied by her mother. She is taking 21 credits in college and is an excellent runner. She does not acknowledge that she is ill and has little insight into her condition. She is resistant to treatment and intervention. She reports eating a normal diet but is severely underweight (measured weight 87 lb, height 5 ft 5 in, giving a BMI of 14.8), with signs of malnutrition (lanugo on the back and arms) and cardiac complications (irregular heart rhythm, low blood pressure, and low pulse). She denies suicidal thoughts but shows low mood and a lack of interest in pursuing happiness. She has a strong attachment to running and believes it helps her cope with stress. She agreed to work with her dietitian, her mother, and me, but she needs more support and monitoring. I ordered a CBC and RBC count, as well as a peripheral blood smear, to evaluate for anemia and possible bone marrow suppression. I recommend revisiting her cognitive behavioral therapy goals. I discussed with her and her mother the possibility of hospitalization or intensive outpatient treatment if her condition does not improve.

Figure 2 Using GPT-4 to assist with medical notes

Conversation A shows a prompt containing a transcript of a physician-patient encounter, followed by the corresponding medical note written by GPT-4. Conversation B shows GPT-4 being used to check the correctness of the proposed note. In this case the note was generated by GPT-4, but the same verification can be applied to notes written by humans or by other AI systems. GPT-4 identified several problems and proposed a corrected note.

3. Medical note-taking

Our first example (Figure 2A) shows GPT-4's ability to write a medical note based on a transcript of a physician-patient encounter. We have experimented with transcripts of physician-patient conversations captured by the Nuance Dragon Ambient eXperience (DAX), an AI-powered clinical note-taking application for health care workers [9], but to respect patient privacy, in this article we use a transcript from a publicly available dataset for automated medical transcription [10]. In this sample application, GPT-4 receives the transcript of a physician-patient conversation and then generates a "medical note" for the patient's record.

In a proposed deployment of this capability, after the patient has given informed consent, GPT-4 would receive the transcript by listening to the physician-patient encounter, in much the same way that today's "smart speakers" listen. After the encounter, at the physician's request, the software would produce the note. GPT-4 can generate notes in several commonly used formats, such as SOAP (Subjective, Objective, Assessment, and Plan), and can automatically include appropriate billing codes. Beyond medical notes, GPT-4 can answer questions about the encounter, extract prior-authorization information, generate laboratory and prescription orders that conform to the Health Level Seven Fast Healthcare Interoperability Resources (FHIR) standard, write after-visit summaries, and provide critical feedback to clinicians and patients.
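
As a minimal sketch of the note-generation step described above (our illustration, not the actual DAX or Nuance pipeline), the following shows how an encounter transcript might be turned into a draft SOAP note with the OpenAI Python SDK; the function name, prompt wording, and model name are our own illustrative assumptions.

```python
# Sketch of the note-generation step (our illustration, not the DAX pipeline):
# an encounter transcript goes in, and a draft note in a requested format comes
# back for clinician review. Function, parameter, and model names are assumptions.
from openai import OpenAI

client = OpenAI()

def draft_clinical_note(transcript: str, note_format: str = "SOAP") -> str:
    """Generate a draft medical note from an encounter transcript.
    The draft is a starting point only; a clinician must review and verify it."""
    prompt = (
        f"Please summarize the following physician-patient encounter as a "
        f"medical note in {note_format} format "
        f"(Subjective, Objective, Assessment, Plan):\n\n{transcript}"
    )
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```

In a real deployment, the draft would then go through a verification pass such as the one shown in Figure 2B, and through clinician review, before entering the record.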

Although such an application would clearly be useful, not everything is perfect. GPT-4 is an intelligent system that, like human intelligence, is fallible. For example, the medical note generated by GPT-4 in Figure 2A states that the patient's body-mass index (BMI) is 14.8, yet nothing in the transcript provides the information needed to calculate a BMI, another example of a hallucination. As in Figure 1C, one solution is to have GPT-4 itself look for the error. In a separate session (Figure 2B), we asked GPT-4 to read the transcript and the proposed note, and it caught the BMI hallucination. In the same output, it also noted that the note mentions signs of malnutrition and cardiac complications without supporting specifics; although the clinician had recognized these signs, they were not stated explicitly in the conversation. This information is important for establishing the basis of the assessment, and the "rereading" step catches the omission. Finally, the system recommended providing more detail about the blood tests that were ordered and the reasons for ordering them. These and other mechanisms for addressing hallucinations, omissions, and errors should be incorporated into future deployments of GPT-4 applications.

4. Inherent medical knowledge

Although GPT-4 was trained only on openly available information on the internet, when given a battery of test questions from the USMLE [11], it answers correctly more than 90% of the time. A typical USMLE question, together with GPT-4's response, is shown in Figure 3, in which GPT-4 explains its reasoning, refers to known medical facts, notes causal relationships, rules out the other proposed answers, and provides a convincing rationale for its "opinion."

Prompt: A 12-year-old girl is brought to the emergency department by her mother because of a 1-week history of worsening swelling in her legs. The patient also noticed blood in her urine yesterday; the bleeding has not recurred. One week ago she had an upper respiratory tract infection and a sore throat that caused her to miss several days of school. Her medical history is otherwise unremarkable, and she takes no routine medications. Menarche has not yet occurred. Her BMI is 20 kg/m². Vital signs are temperature 37.0°C (98.6°F), pulse 78 beats per minute, and blood pressure 136/84 mm Hg. Pulse oximetry while breathing room air shows an oxygen saturation of 100%. Physical examination shows erythema of the posterior pharynx, mild cervical lymphadenopathy, and 3+ edema to both knees.

Urinalysis results showed:

Protein: 150 mg/dl

Blood: positive

Leukocyte esterase: positive

Nitrite: negative

White blood cells: 5−10/hpf

Red blood cells: 10−25/hpf

Casts: 1−2/lpf

Which of the following laboratory tests is most likely to be abnormal in this patient?

(A) Bleeding time

(B) Red blood cell count

(C) Serum C3 concentration

(D) Serum IgA concentration

(E) Serum rheumatoid factor determination

GPT-4: The correct answer is (C) serum C3 concentration.

The patient shows signs and symptoms of acute poststreptococcal glomerulonephritis (APSGN), a glomerular disease that typically occurs 1 to 3 weeks after a streptococcal infection of the skin or pharynx. APSGN is mediated by immune complexes that deposit in the glomerular basement membrane and activate the complement pathway, leading to inflammation, hematuria, proteinuria, and a decreased glomerular filtration rate. The resulting fluid retention causes hypertension and edema. Patients with APSGN typically have low serum C3 levels, reflecting complement consumption. The other laboratory tests listed are likely to be normal or unrelated to APSGN. Bleeding time is a measure of platelet function and is not affected by APSGN. The red blood cell count may be slightly elevated because of hemoconcentration, but not significantly. Serum IgA concentration is elevated in IgA nephropathy, another glomerular disease that causes hematuria and proteinuria, but IgA nephropathy is not associated with streptococcal infection or low C3 levels. Serum rheumatoid factor assay is used to diagnose rheumatoid arthritis, an autoimmune disease that does not cause APSGN.

Figure 3 An example of GPT-4 answering a USMLE question

The prompt is a sample question from the United States Medical Licensing Examination (USMLE). GPT-4 answers nearly all written USMLE questions correctly and is able to provide the reasoning behind its answers.
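
As a rough sketch of how such an accuracy figure might be measured, the following shows a scoring loop over multiple-choice questions; the question data, model name, and answer-extraction rule are illustrative assumptions, not the authors' actual evaluation harness.

```python
# Hypothetical scoring loop for multiple-choice USMLE-style questions (our sketch).
import re
from openai import OpenAI

client = OpenAI()

# Placeholder data: each item pairs the full question text (stem plus lettered
# options) with its answer key. Real items would be loaded from a question bank.
questions = [
    {"text": "A 12-year-old girl is brought to the emergency department ...\n"
             "(A) Bleeding time\n(B) Erythrocyte count\n(C) Serum C3 concentration\n"
             "(D) Serum IgA concentration\n(E) Serum rheumatoid factor assay",
     "answer": "C"},
]

def ask(question_text: str) -> str:
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": question_text + "\n\nState the letter of the best answer, then explain."}],
    )
    return reply.choices[0].message.content

correct = 0
for q in questions:
    response = ask(q["text"])
    match = re.search(r"\(([A-E])\)", response)  # crude: first parenthesized letter
    correct += int(bool(match) and match.group(1) == q["answer"])

print(f"Accuracy: {correct}/{len(questions)}")
```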

5. Medical consultation

The medical knowledge encoded in GPT-4 can be used for a variety of consultation, diagnosis, and education tasks. When presented with a typical "informal medical consultation" question, information about a patient's initial presentation, or a summary of laboratory test results, GPT-4 usually provides useful responses that can help the health professional who made the inquiry address the problem at hand. Figure 4 shows one example of an interaction with GPT-4 involving a common medical situation. Because the system is interactive, the user can get to the point by asking follow-up questions, requesting a more detailed explanation of a previous response, or asking for more concise responses.

This medical knowledge makes GPT-4 not only potentially useful in clinical settings, but also useful in research. GPT-4 can read medical research materials and participate in informed discussions, such as briefly summarizing content, providing technical analysis, identifying relevant preliminary work, evaluating conclusions, and asking possible follow-up research questions.

Clinician: I have a patient with a history of chronic obstructive pulmonary disease (COPD) who has shortness of breath. What should I look for to tell whether this patient's condition is worsening?

GPT-4: As the health care provider, when evaluating a patient with a history of COPD who has shortness of breath, you should look for signs and symptoms of a worsening of the disease. An exacerbation of COPD is defined as an acute worsening of respiratory symptoms beyond the patient's usual day-to-day variation; it is typically characterized by increased cough, increased sputum production, and worsening dyspnea.

Some of the key clinical features suggestive of COPD exacerbations include:

1. Increased shortness of breath, cough, and sputum production.

2. Worsening wheezing, chest tightness, and chest discomfort.

3. Changes in the color, quantity and consistency of sputum.

4. Increased use of emergency drugs, such as short-acting bronchodilators.

5. Fatigue, confusion, or altered mental status.

6. Cyanosis, a blue discoloration of the skin or lips due to lack of oxygen.

Notably, other medical conditions, such as congestive heart failure or pulmonary embolism, can mimic an exacerbation of COPD. A thorough evaluation, including a detailed history, physical examination, and diagnostic testing, is therefore essential to confirm the diagnosis and identify the underlying cause. Timely recognition of COPD exacerbations and appropriate treatment are essential to minimize their effect on the patient's quality of life and overall prognosis. If you suspect that your patient is experiencing an exacerbation, consider consulting a pulmonologist or respiratory therapist to develop an appropriate treatment plan, which may include oxygen therapy, inhaled bronchodilators and corticosteroids, and antibiotics for suspected bacterial infection.

Clinician: No sputum. However, the patient complains of fatigue and shows signs of cyanosis.

GPT-4: Shortness of breath and fatigue with cyanosis in a patient with a history of COPD may indicate a severe exacerbation requiring urgent medical attention. In this case, it is important to assess the patient's oxygen saturation level, which can be measured with pulse oximetry, and to provide oxygen therapy if needed to maintain a target oxygen saturation of at least 90%.

The absence of sputum does not rule out a COPD exacerbation, since not all patients with COPD produce sputum during exacerbations. However, it is still important to assess for other signs of worsening respiratory distress, such as an increased respiratory rate, accessory muscle use, and worsening oxygenation.

In addition to providing supplemental oxygen, patients may benefit from inhaled bronchodilators and corticosteroids to help relieve symptoms and reduce airway inflammation. If there is a bacterial infection, then antibiotics are also necessary. Close monitoring of the patient's vital signs, oxygen saturation levels, and overall clinical condition is important to guide further management and ensure timely response to any changes in the patient's condition.

It is important to remember that worsening COPD can be life-threatening, and prompt diagnosis and treatment are essential to improve a patient's prognosis. Therefore, it is recommended to seek a pulmonologist or respiratory therapist for further treatment and monitoring.

Figure 4 An example of an "informal medical consultation" interaction with GPT-4

6. Conclusion

We have been exploring this emerging technology, AI chatbots, and specifically GPT-4, to assess its promise and its risks in health care delivery and medical research. GPT-4 is still under development, and this article only scratches the surface of its capabilities. It can, for example, write computer programs to process and visualize data, translate between languages, interpret insurance benefit statements and laboratory test results for readers unfamiliar with the specialized language, and, perhaps controversially, write emotionally supportive notes to patients.

Transcripts of conversations with GPT-4 are provided in the Supplementary Appendix to give a fuller picture of its capabilities, including reruns of the examples with the publicly available version of GPT-4 as of March 2023 to show how it has evolved. We expect that GPT-4, which remains under development, will continue to change, with the potential for both improvement and degradation in overall performance. Even so, these examples are only a starting point and represent just a small fraction of our experiments over the past several months. Our hope is to contribute to what we believe will be an important public discussion about the role of this new type of AI, and about how medical care and medical practice can best evolve alongside its rapid development.

While we have found GPT-4 to be extremely powerful, it also has important limitations. Because of this, we believe that the question of what constitutes acceptable performance for general-purpose AI remains open. For example, as shown in Figure 2, the system makes mistakes, but it also catches mistakes, both those made by AI and those made by humans. Previous AI applications were based on narrowly scoped models tuned to specific clinical tasks, which benefited from precisely defined operating conditions. But how should one assess the general intelligence of a tool such as GPT-4? To what extent can users "trust" GPT-4, or must readers take the time to verify the accuracy of what it writes? How much fact-checking is needed beyond proofreading, and to what extent can GPT-4 itself assist with that task?

These and other questions will no doubt be debated in the medical and lay communities. Although we acknowledge our bias as employees of the companies that created GPT-4, we predict that chatbots will be used with increasing frequency by medical professionals and by patients. Perhaps most importantly, GPT-4 is not an end in itself; it opens the door to new possibilities and new risks. We speculate that GPT-4 will soon be followed by even more powerful and capable AI systems, a series of increasingly powerful, generally intelligent machines. Like every other tool, these machines can be used to do good, but they also have the potential to cause harm. Used carefully, these evolving tools have the potential to help health care providers give the best care possible.

Bibliography:

[1] Ker J, Wang L, Rao J, Lim T. Deep learning applications in medical image analysis. IEEE Access 2018;6:9375-89.

[2] Han K, Cao P, Wang Y, et al. A review of approaches for predicting drug-drug interactions based on machine learning. Front Pharmacol 2022;12:814858.

[3] Beaulieu-Jones BK, Yuan W, Brat GA, et al. Machine learning for patient risk stratification: standing on, or looking over, the shoulders of clinicians? NPJ Digit Med 2021;4:62.

[4] Milosevic N, Thielemann W. Comparison of biomedical relationship extraction methods and models for knowledge graph creation. Journal of Web Semantics, August 7, 2022 (https://arxiv.org/abs/2201.01647).

[5] OpenAI. Introducing ChatGPT. November 30, 2022 (https://openai.com/blog/chatgpt).

[6] Corbelle JG, Bugarín-Diz A, Alonso-Moral J, Taboada J. Dealing with hallucination and omission in neural Natural Language Generation: a use case on meteorology. In: Proceedings and Abstracts of the 15th International Conference on Natural Language Generation, July 18–22, 2022. Waterville, ME: Arria, 2022.

[7] Singhal K, Azizi S, Tu T, et al. Large language models encode clinical knowledge. arXiv, December 26, 2022 (https://arxiv.org/abs/2212.13138).

[8] Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health 2023;2(2):e0000198.

[9] Nuance. Automatically document care with the Dragon Ambient eXperience (https://www.nuance.com/healthcare/ambient-clinical-intelligence.html).

[10] Kazi N, Kuntz M, Kanewala U, Kahanda I, Bristow C, Arzubi E. Dataset for automated medical transcription. Zenodo, November 18, 2020 (https://zenodo.org/record/4279041#.Y_uCZh_MI2w).

[11] Cancarevic I. The US medical licensing examination. In: International medical graduates in the United States. New York: Springer, 2021.

Statement: This article is based on Lee, P., Bubeck, S., & Petro, J. (2023). Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. New England Journal of Medicine, 388(13), 1233–1239.
