
Artificial intelligence chatbot – ChatGPT


ChatGPT is an AI chatbot developed by OpenAI and released in November 2022. It is a model fine-tuned for dialogue from OpenAI's GPT-3.5 model (an improved version of GPT-3). It interacts with the user conversationally, which lets it answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. It is a sibling model to InstructGPT, which is trained to follow the instructions in a prompt and give a detailed answer.

ChatGPT is trained using reinforcement learning from human feedback (RLHF), the same method used for InstructGPT, with slight differences in how the data was collected. First, an initial model was trained using supervised fine-tuning: human AI trainers wrote conversations in which they played both sides, the user and the AI assistant. The trainers had access to model-generated suggestions to help them compose their answers. This new dialogue dataset was then mixed with the InstructGPT dataset, which had been converted to a dialogue format. Creating a reward model for reinforcement learning requires comparison data, that is, two or more model responses ranked by quality. To collect this data, conversations between AI trainers and the chatbot were taken, a model-generated message was selected at random, several alternative completions were sampled, and AI trainers ranked them. Using these reward models, the model can be fine-tuned with Proximal Policy Optimization (PPO). This process was repeated for several iterations.
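The two learning signals described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OpenAI's implementation: the text does not give the loss functions, so the pairwise logistic loss (a common choice for ranking-based reward models) and the standard PPO clipped surrogate are assumed here, and the function names, the example reward scores, and the clip range of 0.2 are all hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn one trainer ranking (best first) into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair
    # is always the one the trainer ranked higher.
    return list(combinations(ranked_responses, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Assumed reward-model loss: -log(sigmoid(r_preferred - r_rejected)).

    The loss is small when the reward model scores the preferred
    response higher than the rejected one.
    """
    diff = reward_preferred - reward_rejected
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def ppo_clipped_term(prob_ratio, advantage, clip_eps=0.2):
    """Assumed PPO clipped surrogate for a single sample (to be maximized).

    prob_ratio is new_policy_prob / old_policy_prob for the sampled output.
    clip_eps=0.2 is an illustrative value, not one taken from the text.
    """
    clipped_ratio = max(1.0 - clip_eps, min(prob_ratio, 1.0 + clip_eps))
    return min(prob_ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical reward-model scores for three sampled completions,
    # listed in the order a trainer ranked them (best first).
    scores = {"answer_a": 1.3, "answer_b": 0.4, "answer_c": -0.9}
    pairs = ranking_to_pairs(list(scores))
    mean_loss = sum(pairwise_loss(scores[p], scores[r]) for p, r in pairs) / len(pairs)
    print(pairs)
    print(mean_loss)
    print(ppo_clipped_term(prob_ratio=1.5, advantage=1.0))
```

A single ranking of several completions yields many training pairs, which is one reason ranking is an efficient way to collect comparison data. The clipping in the PPO term limits how far one update can move the policy away from the model that generated the samples.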

ChatGPT is fine-tuned from a model in the GPT-3.5 series that completed training in early 2022. Both ChatGPT and GPT-3.5 were trained on Azure AI supercomputing infrastructure.

ChatGPT limitations:

  • ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is difficult because: (1) during RL training, there is currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it could answer correctly; (3) supervised training misleads the model because the ideal answer depends on what the model knows, not on what the human demonstrator knows.
  • ChatGPT is sensitive to small changes in the phrasing of the input and to the same prompt being tried multiple times. For example, given one wording of a question, the model might claim not to know the answer, but after the wording is changed slightly, it can answer correctly.
  • The model is often overly verbose and overuses certain phrases, such as restating that it is a language model trained by OpenAI. These problems stem from biases in the training data (trainers prefer longer answers that look more comprehensive) and from well-known over-optimization problems.