
会说多国语言的ChatGPT

作者:成熟可爱熊猫

Johnson:Speaking in many tongues

约翰逊专栏:会说多国语言

ChatGPT may make things up, but it does so fluently in more than 50 languages

ChatGPT或许“胡编乱造”,却能流利地使用50多种语言“胡编乱造”

【1】The hype that followed ChatGPT’s public launch last year was, even by the standards of tech innovations, extreme. OpenAI’s natural-language system creates recipes, writes computer code and parodies literary styles. Its latest iteration can even describe photographs. It has been hailed as a technological breakthrough on a par with the printing press. But it has not taken long for huge flaws to emerge, too. It sometimes “hallucinates” non-facts that it pronounces with perfect confidence, insisting on those falsehoods when queried. It also fails basic logic tests.

去年ChatGPT公开发布后引发的热潮,即便以科技创新的标准衡量,也堪称极端。OpenAI的这款自然语言系统可以编写食谱、编写计算机代码,还可以模仿各种文学风格。其最新版本甚至可以描述照片内容。它被誉为可与印刷机相提并论的技术突破。但没过多久,巨大的缺陷也随之暴露。它有时会产生“幻觉”,以十足的自信说出并非事实的内容,即使受到质疑也仍坚持这些错误说法。它也通不过基本的逻辑测试。

【2】In other words, ChatGPT is not a general artificial intelligence, an independent thinking machine. It is, in the jargon, a large language model. That means it is very good at predicting what kinds of words tend to follow which others, after being trained on a huge body of text—its developer, OpenAI, does not say exactly from where—and spotting patterns.

换而言之,ChatGPT不是通用人工智能,不是一台能够独立思考的机器。用行话来说,它是一个大语言模型。这意味着,经过大量的文本训练后,ChatGPT非常善于预测哪些单词之后往往接着哪些其他单词,并找出其中的规律。其开发者OpenAI并未说明这些文本的确切来源。
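To make "predicting what kinds of words tend to follow which others" concrete, here is a minimal sketch of a bigram counter in Python. It is only a toy illustration of next-word prediction and is not how ChatGPT is built: real large language models use neural networks trained on far more text. The function names and the tiny corpus are invented for this example.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Hypothetical toy corpus: "cat" follows "the" twice, "mat" once.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(predict_next(model, "the"))
```

The sketch also hints at why training data matters: a word the model has never seen (here, anything outside the toy corpus) yields no prediction at all.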

【3】Amid the hype, it is easy to forget a minor miracle. ChatGPT has aced a problem that long served as a far-off dream for engineers: generating human-like language. Unlike earlier versions of the system, it can go on doing so for paragraphs on end without descending into incoherence. And this achievement’s dimensions are even greater than they seem at first glance. ChatGPT is not only able to generate remarkably realistic English. It is also able to instantly blurt out text in more than 50 languages—the precise number is apparently unknown to the system itself.

在这股热潮中,人们很容易忘记一个小小的奇迹。ChatGPT出色地解决了一个长期以来被工程师们视为遥不可及的梦想的难题:生成类似人类的语言。不同于该系统的早期版本,它可以连续写上好几段而不至于语无伦次。而且这一成就的分量比乍看之下还要大。ChatGPT不仅能够生成极为逼真的英语,还能够即时输出50多种语言的文本,确切的数字显然连系统自己也不知道。

【4】Asked (in Spanish) how many languages it can speak, ChatGPT replies, vaguely, “more than 50”, explaining that its ability to produce text will depend on how much training data is available for any given language. Then, asked a question in an unannounced switch to Portuguese, it offers up a sketch of your columnist’s biography in that language. Most of it was correct, but it had him studying the wrong subject at the wrong university. The language itself was impeccable.

如果用西班牙语问它会说多少种语言,ChatGPT只是模棱两可地回答,“50多种”,并解释它以某种语言产出文本的能力取决于这一语言的训练数据有多少。接着,提问者不打招呼就改用葡萄牙语提问,它便用葡萄牙语给出了本专栏作者的生平简介。大部分内容是正确的,只是把他就读的大学和所学专业都弄错了。但语言本身无可挑剔。

【5】Portuguese is one of the world’s biggest languages. Trying out a smaller language, your columnist probed ChatGPT in Danish, spoken by only about 5.5m people. Danes do much of their online writing in English, so the training data for Danish must be orders of magnitude scarcer than what is available for English, Spanish or Portuguese. ChatGPT’s answers were factually askew but expressed in almost perfect Danish. (A tiny gender-agreement error was the only mistake caught in any of the languages tested.)

葡萄牙语是世界上使用人数最多的语言之一。为了试一试更小的语种,本专栏作者用丹麦语对ChatGPT进行了试探,世界上只有约550万人使用丹麦语。丹麦人在网上写作时大多使用英语,所以丹麦语的训练数据一定比英语、西班牙语或葡萄牙语的数据少几个数量级。ChatGPT回答的内容在事实上有偏差,但其丹麦语近乎完美。(在所有测试的语言中,唯一被发现的错误是一处细微的语法性一致错误。)

【6】Indeed, ChatGPT is too modest about its own abilities. On request, it furnishes a list of 51 languages it can work in, including Esperanto, Kannada and Zulu. It declines to say that it can “speak” these languages, but rather “generates text” in them. This is too humble an answer. Addressed in Catalan—a language not on the list—it replies in that language with a cheerful “Yes, I do speak Catalan—what can I help you with?” A few follow-up questions do not trip it up in the slightest, including a query about whether it is merely translating answers first generated in another language into Catalan. This, ChatGPT denies: “I don’t translate from any other language; I look in my database for the best words and phrases to answer your questions.”

实际上,ChatGPT对自己的能力还是太过谦虚。它可以根据要求提供一份它可使用的51种语言的清单,包括世界语、卡纳达语和祖鲁语。它不愿说自己“会说”这些语言,而是称可以用这些语言“生成文本”。这种回答太谦虚了。如果用加泰罗尼亚语(该语言并不在清单上)提问,它会用加泰罗尼亚语愉快地回答:“是的,我会说加泰罗尼亚语,有什么可以帮您的?”一些后续问题丝毫没有难倒它,包括问它是否只是把先用另一种语言生成的答案翻译成加泰罗尼亚语。对此,ChatGPT予以否认:“我并不从任何其他语言翻译,而是从我的数据库中寻找最佳词语和短语来回答您的问题。”

【7】Who knows if this is true? ChatGPT not only makes things up, but incorrectly answers questions about the very conversation it is having. (It has no “memory”, but rather feeds the last few thousand words of each conversation back into itself as a new prompt. If you have been speaking English for a while it will “forget” that you asked a question in Danish earlier and say that the question was asked in English.) ChatGPT is untrustworthy not just about the world, but even about itself.

谁又知道它说的是真是假呢?ChatGPT不仅会“编造”内容,还会答错关于它正在进行的这场对话的问题。(它没有“记忆”,只是将每次对话的最后几千个词重新输入给自己,作为新的提示。如果你已经用英语聊了好一会儿,它就会“忘记”你之前用丹麦语问过一个问题,并且说这个问题是用英语问的。)ChatGPT不仅在谈论世界时不可信,就连谈论它自己时也不可信。
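The "no memory" behaviour described in paragraph 7 can be sketched as a sliding window over the conversation. The snippet below is a simplified illustration under stated assumptions: it counts whitespace-separated words rather than the tokens a real system would use, and `build_prompt`, the sample turns and the 18-word limit are all hypothetical.

```python
def build_prompt(turns, max_words):
    """Keep only the most recent turns whose total length fits `max_words`."""
    kept = []
    used = 0
    # Walk backwards from the newest turn and stop once the budget is spent.
    for turn in reversed(turns):
        n = len(turn.split())
        if used + n > max_words:
            break
        kept.append(turn)
        used += n
    return list(reversed(kept))

conversation = [
    "User: Hvor mange sprog taler du?",  # early question, asked in Danish
    "Bot: Mere end 50.",
    "User: Tell me about large language models.",
    "Bot: They predict which words tend to follow which others.",
]

# Only the last two turns fit, so the Danish question falls out of the prompt.
prompt = build_prompt(conversation, max_words=18)
print(prompt)
```

Because the early Danish turn is no longer part of the text fed back in, a model working from this prompt has no way to know what language that question was asked in.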

【8】This should not overshadow the achievement of a model that can effortlessly mimic so many languages, including those with limited training data. Speakers of smaller languages have worried for years about language technologies passing them by. Their justifiable concern had two causes: the lesser incentive for companies to develop products in Icelandic or Maltese, and the relative lack of data to train them.

但这不应该掩盖这一模型的成就,它可以毫不费力地模仿如此多的语言,包括那些训练数据有限的语言。多年来,小语种使用者一直担心语言技术会将他们遗漏。他们的担心情有可原,原因有二:一,企业开发冰岛语或马耳他语产品的积极性较低;二,相对缺乏用于训练的数据。

【9】Somehow the developers of ChatGPT seem to have overcome such problems. It is too early to say what good the technology will do, but this alone gives one reason to be optimistic. As machine-learning techniques improve, they may not require the vast resources, in programming time or data, traditionally thought necessary to make sure smaller languages are not overlooked online.

不知怎的,ChatGPT的开发者似乎已经克服了这些问题。现在说这项技术会带来什么好处还为时过早,但仅此一点就给了我们一个乐观的理由。随着机器学习技术的进步,要确保小语种在网上不被忽视,也许不再需要传统上认为必不可少的大量资源(无论是编程时间还是数据)。

①短语:

1. 原文:It has been hailed as a technological breakthrough on a par with the printing press.

词典:be hailed as 被誉为

例句:Since the shooting, the policemen have been hailed as heroes.

枪击事件以来,这些警察一直被誉为英雄。

2. 原文:It has been hailed as a technological breakthrough on a par with the printing press.

词典:on a par with 与...同等;与...一样

例句:The water park will be on a par with some of the best public swim facilities around.

这个水上公园将与周围那些最好的公共游泳场所一样好。

3. 原文:insisting on those falsehoods when queried.

词典:insist on 坚持

例句:He is an assertive boy, always insisting on his own rights and opinions.

他是个坚定自信的男孩,总是坚持自己的权利和意见。

4. 原文:it can go on doing so for paragraphs on end without descending into incoherence.

词典:descend into 落入

例句:Eventually, their civilization descended into chaos and warfare.

最后他们的文明也陷入混乱与战争中。

5.原文:It is also able to instantly blurt out text in more than 50 languages

词典: blurt out 脱口而出

例句:Peter blurted out the secret.

彼得脱口说出了这个秘密。

6.原文:Then, asked a question in an unannounced switch to Portuguese,

词典:switch to 切换;转换

例句:To keep fit, you can switch to morning or lunchtime exercise.

为了保持健康,你可以转变到早上或午间锻炼。

7.原文:On request, it furnishes a list of 51 languages it can work in,

词典:on request 一经要求;根据要求

例句:Catalogues of our books will be sent on request.

书籍目录承索即寄。

8. 原文:A few follow-up questions do not trip it up in the slightest,

词典:trip up 绊倒;使出错

例句:But though fortune may favour the brave, it can trip up the headstrong.

尽管命运可能眷顾勇者,但它也会让鲁莽之人栽跟头。

9. 原文:A few follow-up questions do not trip it up in the slightest,

词典: in the slightest 丝毫;根本

例句:He didn't seem to mind in the slightest.

他好像一点都不在乎。

10.原文:Speakers of smaller languages have worried for years about language technologies passing them by.

词典:pass by 经过;(机会等)与……擦肩而过

例句:Birds were chattering somewhere, and occasionally he could hear a vehicle pass by.

鸟儿们在某处啁啾,偶尔他能听到一辆汽车经过。

②长难句

1.原文:That means it is very good at predicting what kinds of words tend to follow which others, after being trained on a huge body of text—its developer, OpenAI, does not say exactly from where—and spotting patterns.

2. 分析:本句从句比较多。开头That means是主句,后面是一个宾语从句,省略了that;宾语从句中的it是从句的主语,指ChatGPT,is是系动词,good at作表语表示“擅长”,后面接的就是擅长的东西“predicting,预测”;后面是what引导的宾语从句,what kinds of words是从句主语,tend to follow为谓语,which others为宾语,表示“哪些其他的词”;“after being trained on a huge body of text”是时间状语,两个破折号之间插入了一个句子,起补充说明的作用,“its developer它的开发者”是主语,OpenAI为developer的同位语,谓语为does not say;破折号结束后的and连接predicting和spotting,表示ChatGPT擅长“predicting”和“spotting”。

3.译文:这意味着,经过大量的文本训练后,ChatGPT非常善于预测哪些单词之后往往接着哪些其他单词,并找出其中的规律。其开发者OpenAI并未说明这些文本的确切来源。

1.原文:Asked (in Spanish) how many languages it can speak, ChatGPT replies, vaguely, “more than 50”, explaining that its ability to produce text will depend on how much training data is available for any given language.

2. 分析:本句为主谓宾结构。句子主语为ChatGPT,谓语为replies;开头的Asked为非谓语做时间状语,表示ChatGPT被问及“how many languages it can speak会说多少种语言”,这是how引导的宾语从句;explaining是非谓语做伴随状语,后面是that引导的宾语从句,从句主语是its ability,to produce text为非谓语做定语,修饰ability。谓语为will depend on,how much引导宾语从句,表示“取决于什么东西”,即“training data is available for any given language 给定语言数据有多少”。

3.译文:如果用西班牙语问它会说多少种语言,ChatGPT只是模棱两可地回答,“50多种”,并解释它以某种语言产出文本的能力取决于这一语言的训练数据有多少。

③写作技巧:

In other words, ChatGPT is not a general artificial intelligence, an independent thinking machine.

换而言之,ChatGPT不是通用人工智能,不是一台能够独立思考的机器。

表达: In other words“换而言之”;“换句话来说”,写作时进一步论述观点的时候可以用这个表达。

例句:In other words, technological advancement is a double-edged sword.

换句话说,技术的发展是一把双刃剑。

④背景知识:

ChatGPT:ChatGPT(全名:Chat Generative Pre-trained Transformer),美国OpenAI研发的聊天机器人程序,于2022年11月30日发布。ChatGPT是人工智能技术驱动的自然语言处理工具,它能够通过理解和学习人类的语言来进行对话,还能根据聊天的上下文进行互动,真正像人类一样来聊天交流,甚至能完成撰写邮件、视频脚本、文案、翻译、代码,写论文等任务。ChatGPT的使用上还有局限性,模型仍有优化空间。ChatGPT模型的能力上限是由奖励模型决定,该模型需要巨量的语料来拟合真实世界,对标注员的工作量以及综合素质要求较高。ChatGPT可能会出现创造不存在的知识,或者主观猜测提问者的意图等问题,模型的优化将是一个持续的过程。

⑤段落大意:

【1】被吹捧的ChatGPT也暴露出缺陷

【2】ChatGPT的实质——大语言模型

【3】ChatGPT会使用超过50种语言

【4】&【5】ChatGPT的语言能力无可挑剔

【6】ChatGPT掌握比预期更多的语言

【7】ChatGPT的回答并不可信,连关于它自己的回答也是如此

【8】&【9】小语种有望得益于ChatGPT这类技术
