How to Read the Genome and Build a Human Being
Date: 2017-06-07 18:08



For the next 16 minutes, I'm going to take you on a journey that is probably the biggest dream of humanity: to understand the code of life.
在接下来的16分钟里 我要带大家踏上一段旅程这大概是全人类的终极梦想——解读生命的基因编码
So for me, everything started many, many years ago when I met the first 3D printer.
对我来说 早在多年前 当我遇到了第一台3D打印机的时候 这个梦想就开始了
The concept was fascinating.
这个概念真是太精彩了
A 3D printer needs three elements: a bit of information, some raw material, some energy, and it can produce any object that was not there before.
一台3D打印机需要三个要素:一些信息一些原材料和一些能量 它就能打印出原先没有的东西
I was doing physics, I was coming back home and I realized that I actually always knew a 3D printer.
我那时正在研究物理现象 在回家的路上我突然意识到实际上我早就知道3D打印机了
And everyone does. It was my mom.
每个人都知道 那就是我妈妈
My mom takes three elements: a bit of information, which is between my father and my mom in this case,
我妈妈获取了这三个要素:一点信息 这里指的是我爸和我妈的基因
raw elements and energy in the same media, that is food, and after several months, produces me.
同一种介质提供原材料和能量——那就是食物 历时几个月 产下我
And I was not existent before.
而我以前从来没有存在过
So apart from the shock of my mom discovering that she was a 3D printer, I immediately got mesmerized by that piece, the first one, the information.
除了震惊的发现我妈其实是台3D打印机以外 我还立即被另一个部分吸引了那就是第一个要素信息
What amount of information does it take to build and assemble a human?
到底需要获取多少信息才能孕育出一个人呢?
Is it much? Is it little? How many thumb drives can you fill?
很多还是就一点 要存满多少个U盘?
Well, I was studying physics at the beginning and I took this approximation of a human as a gigantic Lego piece.
我一开始是学物理的我想如果把人近似于看成是一个巨型的乐高玩具
So, imagine that the building blocks are little atoms and there is a hydrogen here, a carbon here, a nitrogen here.
小的乐高模块就像是原子 有氢原子 碳原子 和 氮原子
So in the first approximation, if I can list the number of atoms that compose a human being, I can build it.
按照最初的这个设定 如果能够列出组成人类的所有原子我就能组装出一个人
Now, you can run some numbers and that happens to be quite an astonishing number.
现在你可以大致计算一下得到的结果非常惊人
So the number of atoms, the file that I will save in my thumb drive to assemble a little baby,
所需要的原子的总数全部存到U盘里面 即便是组装一个小婴儿
will actually fill an entire Titanic of thumb drives multiplied 2,000 times.
用掉的U盘就能装满整个泰坦尼克号 再乘以2000倍
This is the miracle of life.
这就是生命的奇迹
Every time you see from now on a pregnant lady,
从现在开始你每看到一个孕妇
she's assembling the biggest amount of information that you will ever encounter.
她正在组装你从未见过的最大量的信息
Forget big data, forget anything you heard of.
不要谈大数据 不要谈以前听说过的任何事情
This is the biggest amount of information that exists.
这是现今存在的最大信息量
But nature, fortunately, is much smarter than a young physicist, and in four billion years, managed to pack this information in a small crystal we call DNA.
幸运的是大自然比一个年轻的物理学家要聪明多了在40亿年里把这些信息打包放进一个小晶体里 我们称之为DNA
We met it for the first time in 1950 when Rosalind Franklin, an amazing scientist, a woman, took a picture of it.

1950年当时一位伟大的科学家罗莎琳富兰克林女士给DNA拍了张照 我们第一次认识了它
But it took us more than 40 years to finally poke inside a human cell, take out this crystal, unroll it, and read it for the first time.
但是我们耗时四十年多年 才最终拨开人的细胞从里面拿出了这个晶体展开并解读它
The code comes out to be a fairly simple alphabet, four letters: A, T, C and G.
这个遗传编码由简单的字母表组成 只有四个字母:A T C和G
And to build a human, you need three billion of them.
要组装一个人 你需要30亿个这样的字母
Three billion. How many are three billion?
三十亿 三十亿有多大
It doesn't really make any sense as a number, right?
我们对这个数字没有任何概念 对吧
So I was thinking how I could explain myself better about how big and enormous this code is.
所以我在想我如何将这个编码数量的庞大性很好地传达给大家呢
But there is -- I mean, I'm going to have some help, and the best person to help me introduce the code is actually the first man to sequence it, Dr. Craig Venter.
所以我需要点帮助 最合适来帮我介绍遗传密码的人 是第一个将基因排序的人 克雷格文特尔博士
So welcome onstage, Dr. Craig Venter.
让我们欢迎克雷格文特尔博士
Not the man in the flesh, but for the first time in history, this is the genome of a specific human, printed page-by-page, letter-by-letter:
不是他本人 这也是历史上的第一次是一个人类的基因组一页一页一个字母一个字母的被打印出来

262,000 pages of information, 450 kilograms, shipped from the United States to Canada
总共262000页的信息量 四百五十千克 被美国船运到加拿大
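As a rough check on these figures, the short Python sketch below works out how many letters each printed page has to carry, and how small the same text becomes when stored digitally. The page and letter counts come from the talk; the 2-bits-per-letter encoding and all variable names are illustrative assumptions, not something the speaker states.

```python
# Figures quoted in the talk
GENOME_LETTERS = 3_000_000_000   # "three billion" letters
PRINTED_PAGES = 262_000          # pages in the printed genome

# Assumption (not from the talk): a four-letter alphabet (A, T, C, G)
# can be stored in 2 bits per letter
BITS_PER_LETTER = 2

letters_per_page = GENOME_LETTERS / PRINTED_PAGES
size_megabytes = GENOME_LETTERS * BITS_PER_LETTER / 8 / 1_000_000

print(f"about {letters_per_page:,.0f} letters per page")
print(f"about {size_megabytes:,.0f} MB at {BITS_PER_LETTER} bits per letter")
```

Under these assumptions the printed pages are dense (well over ten thousand letters each), while the raw letter sequence would fit on an ordinary thumb drive.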
only thanks to Bruno Bowden, Lulu.com, a start-up, did everything.
感谢布鲁诺·鲍登 Lulu.com网站 一个新兴公司做了所有这些事情
It was an amazing feat.
这真是令人赞叹的壮举
But this is the visual perception of what is the code of life.
但是这是生命编码比较直观的表达
And now, for the first time, I can do something fun.
现在 我是第一次做一些有趣的事情
I can actually poke inside it and read.
我能戳进去从这里面挑一段来读一读
So let me take an interesting book ... like this one.
我来找一本有意思的……比如这一本
I have an annotation; it's a fairly big book.
我放了书签在里面 这书太厚了
So just to let you see what is the code of life.
给你们看一下 什么是生命编码
Thousands and thousands and thousands and millions of letters.
成千上万的字母
And they apparently make sense.
它们当然都有意义
Let's get to a specific part. Let me read it to you: "AAG, AAT, ATA."
让我们聚焦到具体的一部分 读给你们听:"AAG AAT ATA"
To you it sounds like mute letters, but this sequence gives the color of the eyes to Craig.
你们可能觉得像是听天书 但是这个序列决定了克雷格眼睛的颜色
I'll show you another part of the book.
在看看书的另外一部分
This is actually a little more complicated.
这一段稍微复杂一些
Chromosome 14, book 132: As you might expect. "ATT, CTT, GATT."
第14号染色体 书本编号132 可能和你们想的一样 "ATT CTT GATT"
This human is lucky, because if you miss just two letters in this position -- two letters of our three billion --
这个人很幸运 因为如果他在这个地方少了2个字母 30亿中的2个字母
he will be condemned to a terrible disease: cystic fibrosis.
他就会患上一种非常可怕的疾病:囊肿性纤维症
We have no cure for it, we don't know how to solve it, and it's just two letters of difference from what we are.
目前没有治疗的方法 我们还没有解决方法 它仅仅和我们是2个字母的区别
A wonderful book, a mighty book, a mighty book that helped me understand and show you something quite remarkable.
这是一部鸿篇巨著 这本有力的书帮助我理解一切 向你们展示一些非凡的东西
Every one of you -- what makes me, me and you, you -- is just about five million of these, half a book.
我们每个人 你我他 只需要这些中的500万个半本书
For the rest, we are all absolutely identical.
剩下的基因我们都是完全相同的
Five hundred pages is the miracle of life that you are. The rest, we all share it.

500页 涵盖了你的生命奇迹 余下的 我们全都一样
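The proportion described here can be checked with a few lines of Python. The sketch below uses only numbers quoted in the talk (three billion letters, 262,000 printed pages, about five million differing letters); the variable names are illustrative, and the assumption that letters are spread evenly across pages is a simplification.

```python
# Figures quoted in the talk
GENOME_LETTERS = 3_000_000_000   # letters in one genome
PRINTED_PAGES = 262_000          # pages in the printed genome
DIFFERING_LETTERS = 5_000_000    # letters that make each person unique

# Simplifying assumption: letters are spread evenly over the pages
fraction_differing = DIFFERING_LETTERS / GENOME_LETTERS
pages_differing = PRINTED_PAGES * fraction_differing

print(f"{fraction_differing:.2%} of the letters differ")
print(f"roughly {pages_differing:.0f} printed pages")
```

The result is a fraction of a percent and a few hundred pages, the same order of magnitude as the "five hundred pages" in the talk.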
So think about that again when we think that we are different.
讨论人与人差异的时候反思一下
This is the amount that we share.
这是我们共有的东西
So now that I have your attention, the next question is: How do I read it? How do I make sense out of it?
所以现在请注意 下一个问题就是:怎么去读取这些信息?怎么理解和运用它们?
Well, for however good you can be at assembling Swedish furniture, this instruction manual is nothing you can crack in your life.
不管你在组装瑞典家具上有多在行 这么长的指令手册在你有生之年是不可能被破解的
And so, in 2014, two famous TEDsters, Peter Diamandis and Craig Venter himself, decided to assemble a new company.
因此在2014年两位著名的TED演讲者 彼得迪亚芒蒂思和克雷格文特尔本人 决定成立一个新公司
Human Longevity was born, with one mission: trying everything we can try
人类长寿公司就此诞生了 唯一的任务尽我们所能
and learning everything we can learn from these books, with one target: making real the dream of personalized medicine,
解读出所有我们能在这些书本里读到的东西 只为达到一个目的:让个人化医疗成为现实
understanding what things should be done to have better health and what are the secrets in these books.
明白怎么做才能提高人类健康水平 了解这些书目背后的秘密
An amazing team, 40 data scientists and many, many more people, a pleasure to work with.
一个惊人的团队拥有四十名数据科学家和越来越多的人 和他们一起工作十分愉快
The concept is actually very simple.
实际上工作流程很简单
We're going to use a technology called machine learning.
我们用一种叫做机器学习的方法
On one side, we have genomes -- thousands of them.
一方面 我们有几千个基因组
On the other side, we collected the biggest database of human beings: phenotypes, 3D scan, NMR -- everything you can think of.
另一边我们建立一个超大的人类信息数据库:性状 3D扫描 核磁共振——所有你能想到的
Inside there, on these two opposite sides, there is the secret of translation.
在内部 在这两个端点之间 有神秘的翻译在进行
And in the middle, we build a machine.
我们在中间建了一个机器
We build a machine and we train a machine -- well, not exactly one machine, many, many machines -- to try to understand and translate the genome in a phenotype.
建好之后我们训练这台机器 实际上不只一台机器而是很多台试图去理解基因组并把它翻译成性状
What are those letters, and what do they do?
有哪些字母 它们控制什么性状
It's an approach that can be used for everything, but using it in genomics is particularly complicated.
这是普适的方法 可以用在所有问题上但用在基因组学上异常的复杂
Little by little we grew and we wanted to build different challenges.
一点一点有了进展后我们想要尝试更有挑战性的东西
We started from the beginning, from common traits.
最开始我们从常见的特征下手
Common traits are comfortable because they are common, everyone has them.
常见特征最容易因为它们太常见了每个人都有
So we started to ask our questions: Can we predict height?
我们开始提出如下问题 能预测身高吗?
Can we read the books and predict your height?
能不能解读这些书本信息来预测身高?
Well, we actually can, with five centimeters of precision.
是的我们可以 存在五厘米的误差
BMI is fairly connected to your lifestyle, but we still can, we get in the ballpark, eight kilograms of precision.
身体质量指数主要跟生活习惯密切有关 但我们仍然能预测得差不多存在8千克上下的误差
Can we predict eye color? Yeah, we can.
眼睛的颜色能不能预测?是的我们可以
Eighty percent accuracy. Can we predict skin color?

80%的正确率 皮肤颜色?
Yeah we can, 80 percent accuracy. Can we predict age?
可以 80%的正确率 我们可以预测年龄吗?
We can, because apparently, the code changes during your life.
可以 因为很明显基因随着年龄产生变化
It gets shorter, you lose pieces, it gets insertions.
DNA 会变短 缺失一些片段或者插入另外一些片段
We read the signals, and we make a model.
我们读取这些信号然后建立模型
Now, an interesting challenge: Can we predict a human face?
现在来个有意思的挑战:我们能不能预测人的面孔?
It's a little complicated, because a human face is scattered among millions of these letters.
这个略有点复杂 因为有几百万个碱基都对人脸产生影响
And a human face is not a very well-defined object.
而且人脸并不是一个构造十分精准的物体
So, we had to build an entire tier of it to learn and teach a machine what a face is, and embed and compress it.
所以必须要建立一整个单独的模块给机器去训练和学习人脸是什么再把这个模块压缩整合进去
And if you're comfortable with machine learning, you understand what the challenge is here.
如果你对机器学习有点概念的话就能够想象这个挑战是有多大
Now, after 15 years -- 15 years after we read the first sequence -- this October, we started to see some signals.
现在15年过去了 15年前我们读取第一条序列 今年10月 我们总算有了些进展
And it was a very emotional moment.
当时还是很激动人心的
What you see here is a subject coming in our lab.
你现在看到的东西来自于我们的实验室
This is a face for us.
这是我们的一张脸
So we take the real face of a subject, we reduce the complexity,
我们要对测试对象的面孔进行简化
because not everything is in your face -- lots of features and defects and asymmetries come from your life.
因为并不是所有的特征都是面孔的一部分—很多特点 缺陷和不对称是生活的痕迹
We symmetrize the face, and we run our algorithm.
把面孔调整对称之后 跟我们运算的结果比较
The results that I show you right now, this is the prediction we have from the blood.
我刚才给你看的结果 是我们根据血液预测的
Now wait a second. In these seconds, your eyes are watching, left and right, left and right, and your brain wants those pictures to be identical.
等一下 你们的眼睛正在左右两边交替看,大脑希望两幅图是一模一样的
So I ask you to do another exercise, to be honest.
坦诚来说 我其实想请大家做另一件事情
Please search for the differences, which are many.
找找两幅图的不同点 其实非常多
The biggest amount of signal comes from gender, then there is age, BMI, the ethnicity component of a human.
性别提供最多的信息接下来是年龄 BMI(体质指数)和种族
And scaling up over that signal is much more complicated.
再考虑更多因素会变得更加复杂
But what you see here, even in the differences, lets you understand that we are in the right ballpark, that we are getting closer.
但是这样的结果即便有很多不同 表示我们在正确的范围内正在逐步接近
And it's already giving you some emotions.
它已经给了你一些情绪反应
This is another subject that comes in place, and this is a prediction.
这是另外一个测试对象这边是预测结果
A little smaller face, we didn't get the complete cranial structure, but still, it's in the ballpark.
脸小了一点 完整的颅骨结构没预测到 但至少像那么回事
This is a subject that comes in our lab, and this is the prediction.
这是又一个测试对象 这是预测结果
So these people have never been seen in the training of the machine.
机器接受训练时 它们从未看见过这些面孔
These are the so-called "held-out" set.
这就是所谓的随机测试组
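For readers unfamiliar with the term, a held-out set is a group of subjects put aside before training so that the model is only ever evaluated on people it has not seen. The talk does not describe how the team made this split, so the Python sketch below is only a generic illustration: the function name `split_held_out`, the 20% fraction, the seed, and the integer subject IDs are all hypothetical.

```python
import random


def split_held_out(subject_ids, held_out_fraction=0.2, seed=0):
    """Shuffle subject IDs and set a fraction aside, never to be used in training."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    n_held_out = round(len(ids) * held_out_fraction)
    held_out = ids[:n_held_out]               # used only for evaluation
    train = ids[n_held_out:]                  # used to fit the model
    return train, held_out


# Hypothetical example: 100 subjects identified by the integers 0..99
train_ids, held_out_ids = split_held_out(range(100))
print(len(train_ids), "training subjects,", len(held_out_ids), "held-out subjects")
```

Because the two groups never overlap, a good prediction on a held-out subject cannot be explained by the model having memorized that person.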
But these are people that you will probably never believe.
并且你们不认识这些人 可能说服力不太够
We're publishing everything in a scientific publication, you can read it.
我们在学术期刊上发表了这些结果 你们可以去读一下
But since we are onstage, Chris challenged me.
但既然我们在台上 克里斯向我提出挑战
I probably exposed myself and tried to predict someone that you might recognize.
我尽我所能暴露自己 尝试着预测 某个你可能认识的人
So, in this vial of blood -- and believe me, you have no idea what we had to do to have this blood now, here --
这里有一小管血液——你们很难想象我们为了一管血液我们花了多少工夫
in this vial of blood is the amount of biological information that we need to do a full genome sequence.
这一小管血液中蕴含了大量的生物信息我们需要做一个完整的基因组排序
We just need this amount. We ran this sequence, and I'm going to do it with you.
只需要这么多 我们已经完成了测序 下面我和你们一起做
And we start to layer up all the understanding we have.
我们综合了所有已知的信息
In the vial of blood, we predicted he's a male. And the subject is a male.
从这一管血液里 我们预测这是一名男性 被试者正是一名男性
We predict that he's a meter and 76 cm. The subject is a meter and 77 cm.
我们预测他身高1米76 被试身高1米77
So, we predicted that he's 76; the subject is 82.
预测他体重76kg 被试者是82kg
We predict his age, 38. The subject is 35.
我们还预测了他的年龄 38岁 被试者实际上是35岁
We predict his eye color. Too dark.
我们预测了他眼睛的颜色 预测得偏深了
We predict his skin color. We are almost there. That's his face.
我们预测他的皮肤颜色我们基本上是准确的 这是他的面孔
Now, the reveal moment: the subject is this person. And I did it intentionally.
现在到了揭晓的时刻:被试对象是这个人 我是有意拿自己做测试的
I am a very particular and peculiar ethnicity.
我属于一个特别又特殊的种族
Southern European, Italians -- they never fit in models.
南欧人 意大利人——从来都不符合模型预测
And it's particular -- that ethnicity is a complex corner case for our model.
而且这一种族在模型里是一个复杂的边界情况
But there is another point.
但还有另一个重点
So, one of the things that we use a lot to recognize people will never be written in the genome.
最常用的来辨识人的方法不是由基因组编译的
It's our free will, it's how I look.
是人们的自由意志 即我想让自己看起来怎么样
Not my haircut in this case, but my beard cut.
虽然我的发型不是我自己决定的 但胡子是的
So I'm going to show you, I'm going to, in this case, transfer it -- and this is nothing more than Photoshop, no modeling -- the beard on the subject.
下面我们来看一下 我要进行下改变——单纯的用photoshop 不用建模——把胡子加上去
And immediately, we get much, much better in the feeling. So, why do we do this?
是不是立即觉得变得很相像了 因此 为什么我们要这样做?
We certainly don't do it for predicting height or taking a beautiful picture out of your blood.
当然不是为了预测身高或者描绘出你没有胡子时的完美照片
We do it because the same technology and the same approach, the machine learning of this code,
我们研究是因为同样的技术和手段 基因组的学习机器
is helping us to understand how we work, how your body works, how your body ages,
能帮助我们了解人类自身是如何工作的你的身体是如何协调工作的 你的身体是如何变老的
how disease generates in your body, how your cancer grows and develops, how drugs work and if they work on your body.
疾病在你的身体里是如何产生的 癌症是怎么出现和恶化的 药物是如何起作用的药物是不是能够对你的身体起作用
This is a huge challenge.
这是一个巨大的挑战
This is a challenge that we share with thousands of other researchers around the world.
这是一个我们和世界各地其他成千上万的研究者们一起面临的挑战
It's called personalized medicine.
它被称为 个性化医疗
It's the ability to move from a statistical approach where you're a dot in the ocean, to a personalized approach, where we read all these books
从只能借助统计学方法每个人都只是沧海一粟 到能够实现有针对性的治疗通过解码这些基因信息
and we get an understanding of exactly how you are.
我们能够彻底了解每一个人
But it is a particularly complicated challenge, because of all these books, as of today,
但这是一项异常复杂的挑战因为到目前为止在这么庞大的基因组信息中
we just know probably two percent: four books of more than 175.
我们大概只了解2%:175本书里的4本

And this is not the topic of my talk, because we will learn more.
当然这不是我今天演讲的主题因为我们会了解更多
There are the best minds in the world on this topic.
有很多顶尖的人才在从事这项工作
The prediction will get better, the model will get more precise.
预测会越来越准确模型会越来越精准
And the more we learn, the more we will be confronted with decisions that we never had to face before about life, about death, about parenting.
随着了解的逐渐深入我们需要做的决定会越来越多而且是一些从前没有想象过的决定 关于生命 关于死亡 关于养育
So, we are touching the very inner detail on how life works.
因此 我们正在不断接近基因内部的细节以解开生命机体如何工作之谜
And it's a revolution that cannot be confined in the domain of science or technology.
这是一项重要的革命 它不能被限制于科学技术领域
This must be a global conversation.
这是一个全球性的会话
We must start to think of the future we're building as a humanity.
我们必须思考我们的未来 要结合起全人类的力量
We need to interact with creatives, with artists, with philosophers, with politicians.
我们需要和创意人员 艺术家 哲学家 和政治家 一起相互讨论和影响
Everyone is involved, because it's the future of our species.
每一个人都被包含在内因为这是我们人类物种的未来
Without fear, but with the understanding that the decisions that we make in the next year will change the course of history forever. Thank you.
不要害怕 但是我们要明白我们接下来一年中所做的决定 都会彻底改变历史的进程 谢谢

Key vocabulary
  • solve v. 解决,解答
  • layer n. 层 vi. 分层 vt. 将某物堆积成层 n
  • complicated adj. 复杂的,难懂的 动词complicate的过去
  • assemble vt. 聚集,集合,装配 vi. 集合,聚集
  • specific adj. 特殊的,明确的,具有特效的 n. 特效药,特性
  • predict v. 预知,预言,预报,预测
  • intentionally adv. 有意地,故意地
  • challenge n. 挑战 v. 向 ... 挑战
  • remarkable adj. 显著的,异常的,非凡的,值得注意的
  • approach n. 接近; 途径,方法 v. 靠近,接近,动手处理