laitimes

OpenAI secretly tested GPT-4o, which topped the Chatbot Arena leaderboard

Author: IT Home

IT Home reported on May 14 that OpenAI employee William Fedus confirmed on the social platform X on Monday that "gpt2-chatbot", the mysterious chatbot that had recently performed well on the LMSYS Chatbot Arena, is GPT-4o, the new artificial intelligence model OpenAI had just released. Fedus also revealed that GPT-4o topped the arena leaderboard during testing, achieving the highest score ever.

"GPT-4o is our state-of-the-art, cutting-edge model," Fedus wrote on X, "and we've been testing a version of the model in the arena under the name 'im-also-a-good-gpt2-chatbot'."

Chatbot Arena is a website where visitors chat with two randomly selected, anonymous AI language models side by side without knowing which is which, and then vote for the one that gave the better response.
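To illustrate how pairwise votes like these can turn into a ranking, here is a minimal sketch of a classic online Elo update after a single vote. It is not LMSYS's actual pipeline (which fits ratings over all votes with its own methodology); the function names and the step size `k=32` are hypothetical choices for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Apply one online Elo update after a single head-to-head vote.

    score_a is 1.0 if the voter preferred A, 0.0 if they preferred B, 0.5 for a tie.
    k is a hypothetical step size chosen for illustration.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level, and one voter prefers model A.
new_a, new_b = update_ratings(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)  # 1016.0 984.0
```

Because the expected score starts at 0.5 for evenly rated models, a single win moves each rating by k/2; an upset against a much stronger opponent moves them further.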

Starting in April of this year, OpenAI tested multiple versions of GPT-4o in the arena, and the model initially appeared under the name "gpt2-chatbot", then became "im-a-good-gpt2-chatbot", and finally "im-also-a-good-gpt2-chatbot".

Since GPT-4o's release today, multiple sources have revealed that the model topped LMSYS's internal leaderboard by a wide margin, surpassing the previous top-ranked models, Claude 3 Opus and GPT-4 Turbo.

lmsys.org's official account shared a chart and wrote: "The 'gpt2-chatbot' family of models has just surged to the top of the leaderboard, surpassing all other models by a significant margin (about 50 Elo points), and has become the strongest model in the arena. Here is an inside screenshot; the public version of 'GPT-4o' is now in the arena and will soon be on the public leaderboards!"

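As of IT Home's press time, "im-also-a-good-gpt2-chatbot" had an Elo score of 1309, ahead of GPT-4-Turbo-2024-04-09 at 1253 and Claude 3 Opus at 1246. Before the three "gpt2-chatbot" models appeared and shook up the rankings, Claude 3 and GPT-4 Turbo had been vying for first place.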
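To put these gaps in perspective, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the arena scores sit on the standard Elo scale (base 10, 400-point divisor); the function name and the short labels in the dictionary are illustrative, not LMSYS identifiers.

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected probability that A is preferred over B on the standard Elo scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores as reported at press time; labels are shortened for readability.
leader_name, leader_rating = "im-also-a-good-gpt2-chatbot", 1309
rivals = {"GPT-4 Turbo": 1253, "Claude 3 Opus": 1246}

for name, rating in rivals.items():
    gap = leader_rating - rating
    p = win_probability(leader_rating, rating)
    print(f"{leader_name} vs {name}: gap {gap} Elo, expected win rate {p:.1%}")
```

Under this assumption, gaps of 56 and 63 points correspond to the leader being preferred in roughly 58% and 59% of head-to-head votes: a clear lead, though far from a clean sweep.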