How Computers Translate Human Language
计算机是如何翻译人类语言的
Date: 2019-03-10 14:37



How is it that so many intergalactic species in movies and TV just happen to speak perfect English?
为什么影视剧里会有那么多的星际物种恰好都会说一口流利的英语呢?
The short answer is that no one wants to watch a starship crew spend years compiling an alien dictionary.
简单来说,没人想看一群星舰船员花上好几年时间去编纂一本外星语词典。
But to keep things consistent,
但为了保证一致性,
the creators of Star Trek and other science-fiction worlds have introduced the concept of a universal translator,
星际迷航和其它科幻小说的编导们就想出了万能翻译机这个点子,
a portable device that can instantly translate between any languages.
一种能在任意语言之间即时翻译的便携设备。
So is a universal translator possible in real life?
你们觉得万能翻译机在现实生活中是可行的吗?
We already have many programs that claim to do just that, taking a word, sentence, or entire book in one language
现在已经有很多程序声称,他们能把不管是一个字,一句话,一本书,
and translating it into almost any other, whether it's modern English or Ancient Sanskrit.
也不管是现代英语还是古梵语,在各种语言间进行翻译。
And if translation were just a matter of looking up words in a dictionary,
如果翻译仅仅只是在字典上查找字意的话,
these programs would run circles around humans.
这些程序完全能比人类做得更好。
The reality, however, is a bit more complicated.
但实际上没那么简单。
A rule-based translation program uses a lexical database,
基于规则的翻译程序会使用一个词汇数据库,
which includes all the words you'd find in a dictionary and all grammatical forms they can take,
其中包括你能在字典上找到的所有单词及其所有可能的语法形态,
and a set of rules to recognize the basic linguistic elements in the input language.
以及一套用来识别输入语言基本语言成分的规则。
For a seemingly simple sentence like, 'The children eat the muffins,'
举个看起来比较简单的例子:孩子们在吃松饼,
the program first parses its syntax, or grammatical structure, by identifying the children as the subject,
翻译程序会先解析这句话的句法或语法结构,通过将“孩子”定为主语,
and the rest of the sentence as the predicate consisting of a verb 'eat,' and a direct object 'the muffins.'
剩下的部分作为谓语,并且包含动词“吃”和直接宾语“松饼”。
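The parsing step described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real translation system works: the `LEXICON` table and the `parse_svo` function are hypothetical names introduced here, and the approach assumes a simple sentence with exactly one verb.

```python
# Toy lexicon mapping words to parts of speech.
# Every entry is an illustrative assumption covering only this sentence.
LEXICON = {
    "the": "DET",
    "children": "NOUN",
    "muffins": "NOUN",
    "eat": "VERB",
}

def parse_svo(sentence):
    """Split a simple subject-verb-object sentence into its three parts."""
    words = sentence.lower().rstrip(".").split()
    tags = [LEXICON[w] for w in words]
    verb_index = tags.index("VERB")  # the verb separates subject from object
    return {
        "subject": " ".join(words[:verb_index]),
        "verb": words[verb_index],
        "object": " ".join(words[verb_index + 1:]),
    }

parsed = parse_svo("The children eat the muffins.")
print(parsed)
```

A real parser would have to cope with ambiguity, unknown words, and far more complex sentence structures than this lookup can handle.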
It then needs to recognize English morphology,
然后需要识别英语词法,
or how the language can be broken down into its smallest meaningful units,
也就是语言如何被拆分成最小的意义单位,
such as the word muffin and the suffix 's,' used to indicate plural.
比如“muffin”这个词和用来表示复数的后缀“s”。
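As a rough sketch of this morphological step, the snippet below splits a noun into a stem and a number feature. The stem list, the irregular-plural table, and the `segment` function are all assumptions made for illustration; a real lexical database would be vastly larger.

```python
# Hypothetical mini-lexicon of noun stems and irregular plurals.
NOUN_STEMS = {"muffin", "child"}
IRREGULAR_PLURALS = {"children": "child"}

def segment(word):
    """Return (stem, number) for a noun, or (word, None) if it is unknown."""
    word = word.lower()
    if word in IRREGULAR_PLURALS:
        # Irregular forms cannot be handled by stripping a suffix.
        return (IRREGULAR_PLURALS[word], "plural")
    if word.endswith("s") and word[:-1] in NOUN_STEMS:
        # Regular plural: stem + suffix "s".
        return (word[:-1], "plural")
    if word in NOUN_STEMS:
        return (word, "singular")
    return (word, None)

print(segment("muffins"))
print(segment("children"))
```

Note that "children" needs its own table entry: irregular forms like this are one reason a rule set grows quickly.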
Finally, it needs to understand the semantics, what the different parts of the sentence actually mean.
最后一步还需要理解其中的语义学,需要理解这段话中的每个部分都各自表示什么意思。
To translate this sentence properly,
为了恰当地翻译这句话,
the program would refer to a different set of vocabulary and rules for each element of the target language.
翻译程序会针对目标语言的每个成分,参照另一套词汇和规则。
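To make the idea of target-language vocabulary and rules concrete, here is a minimal sketch that renders the example sentence in Italian (the language of the verb forms quoted later in the talk). The tables and the `translate` function are hypothetical and cover only this one sentence; the single rule shown is that the Italian definite article must agree with the noun that follows it.

```python
# Toy English -> Italian transfer tables. Every table below is an
# illustrative assumption that covers only this one sentence.
NOUNS = {  # English noun -> (Italian noun, gender, number)
    "children": ("bambini", "m", "pl"),
    "muffins": ("muffin", "m", "pl"),
}
VERBS = {"eat": "mangiano"}  # third-person plural, the form quoted in the text
# Simplified article table: ignores variants such as "lo", "gli" and "l'".
DEFINITE_ARTICLE = {("m", "sg"): "il", ("m", "pl"): "i",
                    ("f", "sg"): "la", ("f", "pl"): "le"}

def translate(sentence):
    """Translate a 'the NOUN VERB the NOUN' sentence word by word."""
    words = sentence.lower().rstrip(".").split()
    output = []
    for i, word in enumerate(words):
        if word == "the":
            # Target-language rule: the article agrees with the next noun.
            _, gender, number = NOUNS[words[i + 1]]
            output.append(DEFINITE_ARTICLE[(gender, number)])
        elif word in NOUNS:
            output.append(NOUNS[word][0])
        else:
            output.append(VERBS[word])
    return " ".join(output).capitalize() + "."

italian = translate("The children eat the muffins.")
print(italian)  # I bambini mangiano i muffin.
```

English and Italian happen to share the same word order for this sentence, so the sketch needs no reordering rule; many language pairs would.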
But this is where it gets tricky.
但这才是麻烦的地方。
The syntax of some languages allows words to be arranged in any order,
有些语言的句法允许词语以任意顺序排列,
while in others, doing so could make the muffin eat the child.
而在另一些语言中,这样做可能会让句子变成松饼在吃小孩。
Morphology can also pose a problem.
词法也可能带来问题。


Slovene distinguishes between two children and three or more using a dual suffix absent in many other languages,
斯洛文尼亚语用一种许多其他语言所没有的双数后缀,来区分两个孩子和三个或更多孩子,
while Russian's lack of definite articles might leave you wondering
而俄语没有定冠词,可能会让你弄不清,
whether the children are eating some particular muffins, or just eat muffins in general.
这些孩子到底是在吃一些特定的松饼呢,还是一般含义上的松饼。
Finally, even when the semantics are technically correct, the program might miss their finer points,
最后,即使语义在技术上是正确的,程序也可能忽略其中的微妙之处,
such as whether the children 'mangiano' the muffins, or 'divorano' them.
比如孩子们是在“吃”(mangiano)松饼,还是在“狼吞虎咽”(divorano)松饼。
Another method is statistical machine translation,
另一种方法是统计机器翻译,
which analyzes a database of books, articles, and documents that have already been translated by humans.
这种方法会去分析已经由人工翻译过的书籍、文章和文件所组成的数据库。
By finding matches between source and translated text that are unlikely to occur by chance,
通过找出原文与译文之间不太可能出于偶然的匹配,
the program can identify corresponding phrases and patterns, and use them for future translations.
程序可以识别出相互对应的短语和模式,并将其用于今后的翻译。
However, the quality of this type of translation
然而这种方式的翻译质量,
depends on the size of the initial database and the availability of samples for certain languages or styles of writing.
取决于初始数据库的规模,以及某些语言或写作风格的样本是否充足。
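The statistical idea can be illustrated with a tiny, invented parallel corpus. The sketch below counts how often each English word appears in the same sentence pair as each Italian word and scores the pairing with the Dice coefficient, a simple association measure chosen here for brevity; it stands in for the far more elaborate models real systems use. The corpus, the `best_matches` function, and the choice of Dice are all assumptions for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical four-sentence parallel corpus (English, Italian).
PARALLEL = [
    ("the children eat", "i bambini mangiano"),
    ("the children sleep", "i bambini dormono"),
    ("the dogs eat", "i cani mangiano"),
    ("the dogs sleep", "i cani dormono"),
]

def best_matches(pairs):
    """Map each source word to the target word it co-occurs with most strongly."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentences containing each source word
        tgt_count.update(tgt_words)   # sentences containing each target word
        joint.update(product(src_words, tgt_words))  # sentence-level co-occurrence

    def dice(s, t):
        # 1.0 when two words always appear together, lower otherwise.
        return 2 * joint[(s, t)] / (src_count[s] + tgt_count[t])

    return {s: max(tgt_count, key=lambda t: dice(s, t)) for s in src_count}

matches = best_matches(PARALLEL)
print(matches["children"], matches["eat"])
```

With only four sentence pairs the pairings fall out cleanly; with less data, or with words that rarely appear, the scores become unreliable, which is the dependence on database size the text describes.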
The difficulty that computers have with the exceptions, irregularities
计算机难以处理那些例外、不规则现象,
and shades of meaning that seem to come instinctively to humans
以及人类似乎凭直觉就能把握的细微含义,
has led some researchers to believe that our understanding of language is a unique product of our biological brain structure.
这使得一些研究人员认为,我们对语言的理解是大脑生物结构的独特产物。
In fact, one of the most famous fictional universal translators,
事实上,最著名的虚构万能翻译器之一,
the Babel fish from 'The Hitchhiker's Guide to the Galaxy', is not a machine at all
《银河系漫游指南》中的巴别鱼,根本就不是机器,
but a small creature that translates the brain waves and nerve signals of sentient species through a form of telepathy.
而是一个能以心电感应形式从有意识生物那儿翻译他们的脑电波和神经信号的小生物。
For now, learning a language the old-fashioned way will still give you better results than any currently available computer program.
目前为止,用老办法去学一门新的语言,仍然比用目前可用的计算机程序的效果更好。
But this is no easy task, and the sheer number of languages in the world,
但这也绝非易事,世界上语言的绝对数量
as well as the increasing interaction between the people who speak them,
以及这些语言的使用者之间日益频繁的交流,
will only continue to spur greater advances in automatic translation.
会刺激自动翻译系统不断进步。
Perhaps by the time we encounter intergalactic life forms,
也许等到我们遇到星际生命形态的物种时,
we'll be able to communicate with them through a tiny gizmo,
我们就能够通过一个小发明与他们交流,
or we might have to start compiling that dictionary, after all.
又或许我们终究需要编译那样一套字典。

重点单词
  • complicated adj. 复杂的,难懂的 动词complicate的过去
  • concept n. 概念,观念
  • indicate v. 显示,象征,指示;指明,表明
  • device n. 装置,设计,策略,设备
  • definite adj. 明确的,确切的,有把握的
  • sheer adj. 纯粹的,全然的,陡峭的 adv. 完全地,峻峭
  • statistical adj. 统计的,统计学的
  • certain adj. 确定的,必然的,特定的 pron. 某几个,某
  • universal adj. 普遍的,通用的,宇宙的,全体的,全世界的 n.
  • particular adj. 特殊的,特别的,特定的,挑剔的 n. 个别项目