ChatGPT and Chomsky: how humans acquire language (part two)
Date: 2023-05-05 10:00


Yet it is hard, for several reasons, to fathom what LLMs "think".


Details of the programming and training data of commercial ones like ChatGPT are proprietary.


And not even the programmers know exactly what is going on inside.


Linguists have, however, found clever ways to test LLMs' underlying knowledge, in effect tricking them with probing tests.


And indeed, LLMs seem to learn nested, hierarchical grammatical structures, even though they are exposed to only linear input, ie, strings of text.


They can handle novel words and grasp parts of speech.


Tell ChatGPT that "dax" is a verb meaning to eat a slice of pizza by folding it, and the system deploys it easily: "After a long day at work, I like to relax and dax on a slice of pizza while watching my favourite TV show."


(The imitative element can be seen in "dax on", which ChatGPT probably patterned on the likes of "chew on" or "munch on".)


What about the "poverty of the stimulus"?


After all, GPT-3 (the LLM underlying ChatGPT until the recent release of GPT-4) is estimated to be trained on about 1,000 times the data a human ten-year-old is exposed to.


That leaves open the possibility that children have an inborn tendency to grammar, making them far more proficient than any LLM.


In a forthcoming paper in Linguistic Inquiry, researchers claim to have trained an LLM on no more text than a human child is exposed to, finding that it can use even rare bits of grammar.


But other researchers have tried to train an LLM on a database of only child-directed language (that is, of transcripts of carers speaking to children).


Here LLMs fare far worse.


Perhaps the brain really is built for language, as Professor Chomsky says.


It is difficult to judge.


Both sides of the argument are marshalling LLMs to make their case.


The eponymous founder of his school of linguistics has offered only a brusque riposte.


For his theories to survive this challenge, his camp will have to put up a stronger defence.
