How We Can Store Digital Data in DNA
Date: 2019-06-11 16:46

I could fit all movies ever made inside of this tube.
我可以把有史以来的所有电影装进这个小管里。
If you can't see it, that's kind of the point.
如果你看不见它,那就对了。
Before we understand how this is possible, it's important to understand the value of this feat.
在我们理解这件事的可能性之前,重要的是先理解这项技术的价值。
All of our thoughts and actions these days, through photos and videos --
现在我们所有的想法和行动,通过照片和视频,
even our fitness activities -- are stored as digital data.
甚至是我们的健身运动--都存储在电子数据里。
Aside from running out of space on our phones, we rarely think about our digital footprint.
要不是它们超出我们手机的存储空间,我们很少会去想我们到底存储了多少电子数据。
But humanity has collectively generated more data in the last few years than all of preceding human history.
但是人类已经携手创造了更多的数据,在过去的几年中,这些数据比先前人类历史所产生的还要多。
Big data has become a big problem.
大数据已经开始成为一个大问题。
Digital storage is really expensive, and none of these devices that we have really stand the test of time.
数字化存储很昂贵,而且还没有一个设备能真正经得起时间的考验。
There's this nonprofit website called the Internet Archive.
有一个非营利网站,叫做“网络档案馆”。
In addition to free books and movies, you can access web pages as far back as 1996.
除了免费的书籍和电影,你还可以在上面找到1996年以来的网页。
Now, this is very tempting, but I decided to go back and look at the TED website's very humble beginnings.
这可是非常诱人的,但我决定回过头来,看看TED网站最初的样子。
As you can see, it's changed quite a bit in the last 30 years.
可以看到,它在过去30年里改变了很多。
So this led me to the first-ever TED, back in 1984,
这使我回忆起第一次的TED,回到1984,
and it just so happened to be a Sony executive explaining how a compact disc works.
太巧了,正是索尼的一位主管在解释光盘是如何工作的。
Now, it's really incredible to be able to go back in time and access this moment.
这是让人难以置信的事,我们可以回到过去,并且与那个时刻紧密相连。
It's also really fascinating that after 30 years, after that first TED, we're still talking about digital storage.
这也是非常让人着迷的事,在第一次TED演讲过去的30年后,我们还在谈论着数字化存储。
Now, if we look back another 30 years, IBM released the first-ever hard drive back in 1956.
如果我们回头看看另外一个30年,1956年,IBM破天荒地发布了它的第一个硬盘驱动器。
Here it is being loaded for shipping in front of a small audience.
这是它正在被装载上车,一小群人在围观。
It held the equivalent of one MP3 song and weighed over one ton.
它承载着一首MP3歌曲的内容,却重达一吨多。
At 10,000 dollars a megabyte, I don't think anyone in this room would be interested in buying this thing,
一兆字节价值1万美元,我想这里不会有人有兴趣要买它,
except maybe as a collector's item. But it's the best we could do at the time.
除非可能作为一个收藏品。但这是我们在当时最好的产品了。
We've come such a long way in data storage. Devices have evolved dramatically.
我们在数据存储的路上走了很久。设备已经显著进化了。
But all media eventually wear out or become obsolete.
但所有载体最终都会磨损或被废弃。
If someone handed you a floppy drive today to back up your presentation,
今天,如果有人递给你一张软盘驱动器来备份你的演示文稿,
you'd probably look at them kind of strange, maybe laugh, but you'd have no way to use the damn thing.
你可能会奇怪地看着他们,可能还会大笑,但你肯定不会用这个落伍的东西。
These devices can no longer meet our storage needs, although some of them can be repurposed.
这些设备已经不能再满足我们的存储需求了,虽然它们有些可以被改作其他用途。
All technology eventually dies or is lost, along with our data, all of our memories.
所有的科技最终都会死亡或者消失,和我们的数据一起,包含我们所有的记忆。
There's this illusion that the storage problem has been solved, but really, we all just externalize it.
有一种错觉认为存储问题已经解决了,但实际上,我们只是把它外化了。
We don't worry about storing our emails and our photos. They're just in the cloud.
我们不担心邮件和照片的存储。它们都在云端上。
But behind the scenes, storage is problematic. After all, the cloud is just a lot of hard drives.
但是在这些场景背后,存储问题依然存在。毕竟,云端只是许多硬盘组成的。
Now, most digital data, we could argue, is not really critical. Surely, we could just delete it.
我们认为大部分电子数据都不重要。当然,我们还可以轻易地删除这些数据。
But how can we really know what's important today?
但是今天的我们怎么知道到底什么是重要的?
We've learned so much about human history from drawings and writings in caves, from stone tablets.
我们从人类历史中得到了很多信息,从洞穴里的壁画和文字,还有石碑。
We've deciphered languages from the Rosetta Stone.
我们破译了罗塞塔石碑上的语言。
You know, we'll never really have the whole story, though.
尽管我们还远没有了解整个故事。
Our data is our story, even more so today.
我们的数据就是我们的故事,这在今天更是这样。
We won't have our record recorded on stone tablets.
我们不再将记录刻在石碑上。
But we don't have to choose what is important now. There's a way to store it all.
我们现在也不需要去选择什么是重要的。有一种方法可以存储所有信息。
It turns out that there's a solution that's been around for a few billion years, and it's actually in this tube.
我们发现,这种解决方案已经存在了数十亿年。它实际上就在这个小管里。
DNA is nature's oldest storage device.
DNA是大自然最古老的存储设备。
After all, it contains all the information necessary to build and maintain a human being.
毕竟,它保存着构建和维持一个人生命的所有必要的信息。
But what makes DNA so great? Well, let's take our own genome as an example.
然而,DNA为何如此强大?让我们来看看人类的基因组。
If we were to print out all three billion A's, T's, C's and G's on a standard font, standard format,
如果我们将所有30亿个A(腺嘌呤),T(胸腺嘧啶),C(胞嘧啶)和G(鸟嘌呤),以标准字体,标准格式打印出来,
and then we were to stack all of those papers, it would be about 130 meters high,
然后我们把所有纸张叠起来,大概会有130米高,
somewhere between the Statue of Liberty and the Washington Monument.
介于自由女神像和华盛顿纪念碑的高度之间。
Now, if we converted all those A's, T's, C's and G's to digital data, to zeroes and ones, it would total a few gigs.
如果我们将所有这些A,T,C和G转换为电子数据,转换为0和1,总共也就几个GB。
And that's in each cell of our body. We have more than 30 trillion cells.
而这只是我们身体里每一个细胞中的数据量。我们有超过30万亿个细胞。
You get the idea: DNA can store a ton of information in a minuscule space.
估计你们已经想明白了:DNA可以在一个微小的空间存储大量信息。
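The "few gigs" figure can be checked with a quick back-of-the-envelope calculation. The sketch below takes the three billion letters mentioned in the talk and assumes two possible encodings that the talk does not specify: a packed form using 2 bits per letter (enough to distinguish four symbols) and a plain-text form using one 8-bit character per letter. The name `size_gb` is illustrative only.

```python
# Rough estimate only; both bits-per-base choices are assumptions,
# since the talk does not say how the letters would be encoded.
BASES = 3_000_000_000  # "three billion A's, T's, C's and G's"


def size_gb(bases: int, bits_per_base: int) -> float:
    """Return the storage size in gigabytes (10**9 bytes)."""
    return bases * bits_per_base / 8 / 1e9


packed = size_gb(BASES, 2)   # four letters fit in 2 bits each
as_text = size_gb(BASES, 8)  # one ASCII character per letter

print(f"2 bits per base: {packed:.2f} GB")
print(f"8 bits per base: {as_text:.2f} GB")
```

Under either assumption the result is at most a few gigabytes, in line with the figure given in the talk.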
DNA is also very durable, and it doesn't even require electricity to store it.
DNA也是持久耐用的,它甚至不需要供电来储存信息。
We know this because scientists have recovered DNA from ancient humans that lived hundreds of thousands of years ago.
我们知道这些,是因为科学家已经从生活在数十万年前的远古人类身上复原了DNA。
One of those is Ötzi the Iceman. Turns out, he's Austrian.
其中一个是冰人Ötzi。原来,他是奥地利人。
He was found high, well-preserved, in the mountains between Italy and Austria,
他被发现时正完整的保存在意大利和奥地利之间的山中,
and it turns out that he has living genetic relatives here in Austria today.
证明他和现在的奥地利人有基因关系。
So one of you could be a cousin of Ötzi.
所以你们其中有人可能是Ötzi的表亲。
The point is that we have a better chance of recovering information from an ancient human than we do from an old phone.
其中的关键是,我们拥有更好的从一个远古人类身上修复信息的机会,比从一台老电话上获得的更多。
It's also much less likely that we'll lose the ability to read DNA than any single man-made device.
同时,相较于任何一种人造的设备,我们不太可能失去解读DNA的能力。
Every single new storage format requires a new way to read it. We'll always be able to read DNA.
每一种新的存储格式都要求一种新的解读方式。而我们将一直保持解读DNA的能力。
If we can no longer sequence, we have bigger problems than worrying about data storage.
如果有一天我们不能够进行基因测序,那我们面临的问题可比数据存储严重得多。
Storing data on DNA is not new. Nature's been doing it for several billion years.
在DNA中存储数据不是新鲜事。大自然在数十亿年中一直这么做。

In fact, every living thing is a DNA storage device.
事实上,每一个生物都是一个DNA存储设备。
But how do we store data on DNA? This is Photo 51.
但是我们怎么把数据存储进DNA呢?这是照片51。
It's the first-ever photo of DNA, taken about 60 years ago.
这是第一张DNA的照片,拍摄于大约60年前。
This is around the time that that same hard drive was released by IBM.
也是大约这个时间,IBM发布了硬盘驱动器。
So really, our understanding of digital storage and of DNA have coevolved.
可以说,我们对数字化存储的理解和我们对基因的理解是在同步进化的。
We first learned to sequence, or read DNA, and very soon after, how to write it, or synthesize it.
我们最开始是学习测序,或者解读DNA,之后很快也学会了如何编辑它,或者合成它。
This is much like how we learn a new language.
这很像如何学习一门新语言。
And now we have the ability to read, write and copy DNA. We do it in the lab all the time.
而现在我们有能力阅读、编辑和复制DNA。我们一直在实验室里这么做。
So anything, really anything, that can be stored as zeroes and ones can be stored in DNA.
所以,毫不夸张的说,任何东西可以以0和1的形式存储在DNA中。
To store something digitally, like this photo, we convert it to bits, or binary digits.
要以数字化的方式存储某些内容,比如这张照片,我们要先把它转换为比特,或者二进制数字。
Each pixel in a black-and-white photo is simply a zero or a one.
黑白照片中的每个像素就代表一个0或1。
And we can write DNA much like an inkjet printer can print letters on a page.
我们可以像喷墨打印机打字一样书写DNA。
We just have to convert our data, all of those zeroes and ones,
我们只要将数据,所有这些0和1,
to A's, T's, C's and G's, and then we send this to a synthesis company.
转换为A,T,C,G,然后将它们发送到合成公司。
So we write it, we can store it, and when we want to recover our data, we just sequence it.
这样一来,我们既可以书写,也可以存储,当我们想要恢复数据,只需要测序就好。
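The conversion from zeroes and ones to A's, T's, C's and G's can be sketched in a few lines. The talk does not say which bits map to which letter, so the table below (`00` to A, `01` to C, `10` to G, `11` to T) is a hypothetical choice made for illustration, as are the function names `encode` and `decode`. Real schemes would likely add constraints and error handling that this sketch leaves out.

```python
# Hypothetical 2-bits-per-base mapping; the talk does not specify one.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Reverse of encode: read the letters back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"TED")
print(strand)          # the "written" sequence
print(decode(strand))  # what "sequencing" would give back
```

Here `encode` stands in for the writing step and `decode` for the reading step; the synthesis and sequencing themselves happen in the lab, not in software.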
Now, the fun part of all of this is deciding what files to include.
有意思的部分是决定要包含哪些文件。
We're serious scientists, so we had to include a manuscript for good posterity.
我们是严肃的科学家,所以我们必须留下一份手稿给我们优秀的后代。
We also included a $50 Amazon gift card -- don't get too excited, it's already been spent, someone decoded it --
我们还放入了一张价值50美元的亚马逊礼品卡--别激动,里面的钱已经被花掉了,有人把它解码了--
as well as an operating system, one of the first movies ever made and a Pioneer plaque.
还有一个操作系统,人类制作的第一部电影,和一个“先驱者号”金属板。
Some of you might have seen this. It has a depiction of a typical -- apparently -- male and female,
你们中可能有人见过它。它包含了代表性的信息,显然,包括男女性别,
and our approximate location in the Solar System, in case the Pioneer spacecraft ever encounters extraterrestrials.
还有我们在太阳系中的大致位置,以防万一“先驱者号”太空飞船遇见了外星人。
So once we decided what sort of files we want to encode, we package up the data,
一旦我们决定了哪些类型的文件要编码,就可以把这些数据打包,
convert those zeroes and ones to A's, T's, C's and G's, and then we just send this file off to a synthesis company.
将这些0和1转换为A,T,C,G,然后将这个文件发送到合成公司。
And this is what we got back. Our files were in this tube.
而这,就是我们拿回来的东西。我们的文件就在这个小管里。
All we had to do was sequence it. This all sounds pretty straightforward,
我们只需要对它进行测序就可以解读其中的信息。这听起来真的很简单,
but the difference between a really cool, fun idea and something we can actually use is overcoming these practical challenges.
但一个很酷、很有趣的想法,与我们实际运用之间的不同之处,在于战胜实际的挑战。
Now, while DNA is more robust than any man-made device, it's not perfect. It does have some weaknesses.
而DNA虽然比任何人造设备更稳定,但它并不是完美的。它也有一些弱点。
We recover our message by sequencing the DNA, and every time data is retrieved, we lose the DNA.
我们可以通过DNA测序来恢复信息,但每次数据找回,这个DNA都会被破坏。
That's just part of the sequencing process. We don't want to run out of data,
这只是测序过程的必要步骤。我们不想把数据耗尽,
but luckily, there's a way to copy the DNA that's even cheaper and easier than synthesizing it.
不过好在还有一种方法可以复制DNA,甚至比合成更便宜,更容易。
We actually tested a way to make 200 trillion copies of our files, and we recovered all the data without error.
我们测试了这种方法,将我们的文件复制了200万亿份,并精准的还原了所有数据。
So sequencing also introduces errors into our DNA, into the A's, T's, C's and G's.
测序也会将误差引入DNA,引入A,T,C,G中。
Nature has a way to deal with this in our cells.
大自然有办法在细胞中处理这个问题。
But our data is stored in synthetic DNA in a tube, so we had to find our own way to overcome this problem.
但我们的数据是存储在小管里的合成DNA中,所以我们必须找到自己的方法来解决这个问题。
We decided to use an algorithm that was used to stream videos.
我们决定使用传输视频时用到的算法。
When you're streaming a video, you're essentially trying to recover the original video, the original file.
当你在传输视频时,你实际上是在设法恢复原始的视频,原始文件。
When we're trying to recover our original files, we're simply sequencing.
当我们在设法恢复原始文件时,我们只是在测序。
But really, both of these processes are about recovering enough zeroes and ones to put our data back together.
但实际上,这两个过程都是在复原足够的0和1,将数据重新整合在一起。
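The talk does not name the streaming algorithm. One family of codes built for exactly this situation, recovering a file from whatever pieces happen to arrive, is the fountain code, and the toy sketch below assumes that family purely for illustration. Each "droplet" is the XOR of a pseudo-random subset of file chunks, identified by a seed; the decoder "peels" droplets until every chunk is known. The degree distribution, chunk size, loss pattern and all function names are assumptions, not details from the talk.

```python
import random


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def pick_indices(n: int, seed: int) -> set[int]:
    """Choose which chunks a droplet combines; reproducible from the seed."""
    rng = random.Random(seed)
    degree = rng.choice([1, 1, 2, 2, 3])  # toy degree distribution (assumption)
    return set(rng.sample(range(n), min(degree, n)))


def make_droplet(chunks: list[bytes], seed: int) -> tuple[int, bytes]:
    """XOR the chosen chunks together; only the seed and payload are stored."""
    payload = bytes(len(chunks[0]))
    for i in pick_indices(len(chunks), seed):
        payload = xor(payload, chunks[i])
    return seed, payload


def peel_decode(droplets: list[tuple[int, bytes]], n: int):
    """Peeling decoder: return the n chunks, or None if more droplets are needed."""
    pending = [(pick_indices(n, seed), payload) for seed, payload in droplets]
    known: dict[int, bytes] = {}
    progress = True
    while progress and len(known) < n:
        progress = False
        still_pending = []
        for indices, payload in pending:
            # Remove every chunk we already know from this droplet.
            for i in indices:
                if i in known:
                    payload = xor(payload, known[i])
            unknown = {i for i in indices if i not in known}
            if len(unknown) == 1:
                known[unknown.pop()] = payload  # droplet now reveals one chunk
                progress = True
            elif unknown:
                still_pending.append((unknown, payload))
        pending = still_pending
    return [known[i] for i in range(n)] if len(known) == n else None


message = b"DNA is nature's oldest storage device."
SIZE = 8  # bytes per chunk (assumption)
padded = message.ljust(-(-len(message) // SIZE) * SIZE, b"\0")
chunks = [padded[i:i + SIZE] for i in range(0, len(padded), SIZE)]

received = []
recovered = None
for seed in range(1000):
    if seed % 3 == 0:
        continue  # simulate a droplet that never arrives
    received.append(make_droplet(chunks, seed))
    recovered = peel_decode(received, len(chunks))
    if recovered is not None:
        break

print(f"recovered from {len(received)} droplets:", b"".join(recovered).rstrip(b"\0"))
```

The useful property is that no particular droplet matters: losing some of them just means collecting a few more, which is the sense in which both streaming and sequencing are "about recovering enough zeroes and ones."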
And so, because of our coding strategy,
所以,根据我们的编码策略,
we were able to package up all of our data in a way that allowed us to make millions and trillions of copies
我们能够以一种可以制造上万亿份拷贝的方式将所有数据打包,
and still always recover all of our files back.
同时仍然保证所有的文件可以复原。
This is the movie we encoded.
这是我们编码的电影。
It's one of the first movies ever made, and now the first to be copied more than 200 trillion times on DNA.
它是人类创作的首批电影之一,也是第一个在DNA中被复制出超过200万亿份拷贝的电影。
Soon after our work was published, we participated in an "Ask Me Anything" on the website Reddit.
我们的工作发表后不久,我们在Reddit网站上参与了“问我任何问题”的活动。
If you're a fellow nerd, you're very familiar with this website.
如果你是一个资深学究,你应该对这个网站不会陌生。
Most questions were thoughtful. Some were comical.
大部分问题都有很深的思考,也有一些问题很好笑。
For example, one user wanted to know when we would have a literal thumb drive.
比如,一个用户想知道我们什么时候会拥有一个字面意义的拇指储存器。
Now, the thing is, our DNA already stores everything needed to make us who we are.
事实上,我们的DNA已经存储了所有塑造了我们的必要信息。
It's a lot safer to store data in synthetic DNA in a tube.
把数据存储在小管里的合成DNA中要安全得多。
Writing and reading data from DNA is obviously a lot more time-consuming than just saving all your files on a hard drive -- for now.
在DNA中写入和读取数据,明显比在硬盘中存储文件更花时间--目前是这样。
So initially, we should focus on long-term storage. Most data are ephemeral.
所以,我们首先应该关注长期存储的问题。大部分数据只能保存一段时间。
It's really hard to grasp what's important today, or what will be important for future generations.
目前还很难提炼出哪些信息是重要的,或者哪些对后人是重要的。
But the point is, we don't have to decide today.
但重点是,我们不一定要马上做决定。
There's this great program by UNESCO called the "Memory of the World" program.
联合国教科文组织有一个叫做“世界的记忆”的项目。
It's been created to preserve historical materials that are considered of value to all of humanity.
建立这个项目的初衷是保存历史的记忆,那些对全人类都有价值的记忆。
Items are nominated to be added to the collection, including that film that we encoded.
被选中的信息会被加入集合中,包括我们编译的那部电影。
While a wonderful way to preserve human heritage, it doesn't have to be a choice.
这固然是保存人类遗产的绝佳方式,但我们其实不必做出取舍。
Instead of asking the current generation -- us -- what might be important in the future, we could store everything in DNA.
与其问我们这一代人,在未来什么东西可能是重要的,我们可以在DNA中存储一切。
Storage is not just about how many bytes but how well we can actually store the data and recover it.
存储不止是关乎有多少字节,而是我们可以多好地保存和恢复数据。
There's always been this tension between how much data we can generate and how much we can recover and how much we can store.
一直以来,在我们会产生多少数据,可以恢复多少数据,以及可以存储多少数据之间,都存在着矛盾。
Every advance in writing data has required a new way to read it. We can no longer read old media.
数据写入的每次进步,都要求一种新的读取方式。我们已无法再读取那些老旧的存储设备了。
How many of you even have a disk drive in your laptop, never mind a floppy drive?
你们还有多少人的笔记本电脑中有磁盘驱动器,或者软盘驱动器?
This will never be the case with DNA. As long as we're around, DNA is around, and we'll find a way to sequence it.
有了DNA,这种情况永远不会出现。只要我们在,DNA就存在,我们总会找到测序的方式。
Archiving the world around us is part of human nature.
将我们周围的世界存档是人类天性的一部分。
This is the progress we've made in digital storage in 60 years, at a time when we were only beginning to understand DNA.
这是过去60年我们数字化存储的发展,60年前我们也刚刚开始理解DNA。
Yet, we've made similar progress in half that time with DNA sequencers,
而有了DNA测序技术,我们用一半的时间就达到了相似的发展进度,
and as long as we're around, DNA will never be obsolete. Thank you.
而且只要我们还存在,DNA就永不过时。谢谢。
