A look back at the development of big data
Date: 2017-12-11 14:43



Big data is an elusive concept.
It represents an amount of digital information, which is uncomfortable to store, transport, or analyze.
Big data is so voluminous that it overwhelms the technologies of the day and challenges us to create the next generation of data storage tools and techniques.
So, big data isn't new.
In fact, physicists at CERN have been wrangling with the challenge of their ever-expanding big data for decades.
Fifty years ago, CERN's data could be stored in a single computer.
OK, so it wasn't your usual computer; this was a mainframe computer that filled an entire building.
To analyze the data, physicists from around the world traveled to CERN to connect to the enormous machine.
In the 1970s, our ever-growing big data was distributed across different sets of computers, which mushroomed at CERN.
Each set was joined together in dedicated, homegrown networks.
But physicists collaborated without regard for the boundaries between sets, and hence needed to access data on all of them.
So, we bridged the independent networks together in our own CERNET.
In the 1980s, islands of similar networks speaking different dialects sprang up all over Europe and the States, making remote access possible but torturous.
To make it easy for our physicists across the world to access the ever-expanding big data stored at CERN without traveling, the networks needed to speak the same language.
We adopted the fledgling internetworking standard from the States, followed by the rest of Europe, and we established the principal link at CERN between Europe and the States in 1989, and the truly global internet took off!
Physicists could then easily access the terabytes of big data remotely from around the world, generate results, and write papers in their home institutes.
Then, they wanted to share their findings with all their colleagues.
To make this information sharing easy, we created the web in the early 1990s.
Physicists no longer needed to know where the information was stored in order to find it and access it on the web, an idea which caught on across the world and has transformed the way we communicate in our daily lives.


During the early 2000s, the continued growth of our big data outstripped our capability to analyze it at CERN, despite having buildings full of computers.
We had to start distributing the petabytes of data to our collaborating partners in order to employ local computing and storage at hundreds of different institutes.
In order to orchestrate these interconnected resources with their diverse technologies, we developed a computing grid, enabling the seamless sharing of computing resources around the globe.
This relies on trust relationships and mutual exchange.
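The essence of such a grid, sending the computation to wherever a copy of the data already lives rather than hauling petabytes to one central site, can be illustrated with a small sketch. This is a minimal illustration only: the site names, replica catalogue, and scheduling rule below are hypothetical stand-ins, not CERN's actual grid middleware.

# Minimal sketch of data-locality scheduling on a computing grid.
# Sites, datasets, and the scheduling rule are hypothetical illustrations.
from collections import defaultdict

# Replica catalogue: which sites hold a copy of each dataset.
replica_catalogue = {
    "collisions-run01": ["bologna", "chicago"],
    "collisions-run02": ["lyon"],
    "collisions-run03": ["bologna", "taipei"],
}

# Free CPU slots currently available at each participating institute.
free_slots = {"bologna": 2, "chicago": 1, "lyon": 4, "taipei": 3}

def schedule(datasets):
    """Assign each analysis job to a site that already stores its input,
    preferring the site with the most free CPU slots."""
    assignments = defaultdict(list)
    for dataset in datasets:
        candidates = [s for s in replica_catalogue.get(dataset, []) if free_slots[s] > 0]
        if not candidates:
            print(f"no free capacity for {dataset}; job stays queued")
            continue
        site = max(candidates, key=free_slots.get)
        free_slots[site] -= 1
        assignments[site].append(dataset)
    return assignments

for site, jobs in schedule(list(replica_catalogue)).items():
    print(f"{site}: analysing {jobs}")

The point of the sketch is the lookup order: the scheduler consults the replica catalogue first, so jobs follow the data rather than the other way around, which is what makes sharing the resources of hundreds of institutes workable.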
But this grid model could not be transferred out of our community so easily, where not everyone has resources to share, nor could companies be expected to have the same level of trust.
Instead, an alternative, more business-like approach for accessing on-demand resources has been flourishing recently, called cloud computing, which other communities are now exploiting to analyze their big data.
It might seem paradoxical for a place like CERN, a lab focused on the study of the unimaginably small building blocks of matter, to be the source of something as big as big data.
But the way we study the fundamental particles, as well as the forces by which they interact, involves creating them fleetingly, colliding protons in our accelerators and capturing a trace of them as they zoom off near light speed.
To see those traces, our detector, with 150 million sensors, acts like a really massive 3-D camera, taking a picture of each collision event -- that's up to 14 million times per second.
That makes a lot of data.
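A rough back-of-the-envelope calculation, using only the two figures just quoted, shows why this raw stream has to be filtered long before anything is written to storage. The one byte per sensor per snapshot used below is an illustrative assumption, not the detector's real readout format.

# Back-of-the-envelope estimate of the raw rate implied by the talk's figures:
# 150 million sensors photographed up to 14 million times per second.
# ASSUMPTION: one byte per sensor per snapshot; the real readout differs.
sensors = 150_000_000
snapshots_per_second = 14_000_000   # upper bound quoted in the talk
bytes_per_sensor = 1                # illustrative assumption

raw_bytes_per_second = sensors * snapshots_per_second * bytes_per_sensor
petabytes_per_second = raw_bytes_per_second / 1e15
print(f"raw rate under these assumptions: about {petabytes_per_second:.1f} PB per second")

Even with this deliberately modest payload the raw rate lands around two petabytes per second, which is why only a tiny, filtered fraction of the collision pictures can ever be kept.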
But if big data has been around for so long, why do we suddenly keep hearing about it now?
Well, as the old metaphor explains, the whole is greater than the sum of its parts, and it is no longer just science that is exploiting this.
The fact that we can derive more knowledge by joining related information together and spotting correlations can inform and enrich numerous aspects of everyday life, either in real time, such as traffic or financial conditions, in short-term evolutions, such as medical or meteorological, or in predictive situations, such as business, crime, or disease trends.
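At its simplest, "joining related information together and spotting correlations" means matching two datasets on a shared key and then measuring how strongly they move together. The traffic figures below are invented purely for illustration.

# Sketch: join two related datasets on a shared key (hour of day)
# and spot a correlation. All numbers are invented for illustration.
traffic = {7: 1200, 8: 2100, 9: 1800, 17: 2300, 18: 2500, 22: 400}   # vehicles per hour
journey_time = {7: 24, 8: 41, 9: 35, 17: 44, 18: 47, 22: 12}         # minutes per trip

# Join: keep only the hours present in both datasets.
hours = sorted(traffic.keys() & journey_time.keys())
xs = [traffic[h] for h in hours]
ys = [journey_time[h] for h in hours]

# Pearson correlation coefficient, computed by hand to stay dependency-free.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
var_x = sum((x - mean_x) ** 2 for x in xs)
var_y = sum((y - mean_y) ** 2 for y in ys)
r = cov / (var_x * var_y) ** 0.5
print(f"correlation between traffic volume and journey time: r = {r:.2f}")

A coefficient near 1 would suggest that busier hours reliably mean slower journeys, exactly the kind of pattern that becomes useful the moment the two feeds are joined in real time.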
Virtually every field is turning to gathering big data, with mobile sensor networks spanning the globe, cameras on the ground and in the air, archives storing information published on the web, and loggers capturing the activities of Internet citizens the world over.
The challenge is on to invent new tools and techniques to mine these vast stores, to inform decision making, to improve medical diagnosis, and otherwise to answer the needs and desires of tomorrow's society in ways that are unimagined today.
