Inside OKCupid: The Math of Online Dating
Date: 2017-11-02 18:49

Hello, my name is Christian Rudder, and I was one of the founders of OkCupid.
大家好,我叫克里斯蒂安·拉德,我是OKCupid网站的创办人之一。
It's now one of the biggest dating sites in the United States.
这个网站现在已经是全美最大的交友网站之一。
Like most everyone at the site, I was a math major.
就像这网站上大多数其他人一样,我是学数学的。
As you may expect, we're known for the analytic approach we take to love. We call it our matching algorithm.
正如你所期待的那样,我们擅于分析,我们把该方法也应用在了爱情上。我们把它叫做“配对算法”。
Basically, OkCupid's matching algorithm helps us decide whether two people should go on a date.
基本上,OK Cupid的配对算法帮助我们决定两个人是否应该约会。
We built our entire business around it.
我们围绕这一点来打造我们的整个业务。
Now, algorithm is a fancy word, and people like to drop it like it's this big thing.
“算法”这个词说起来专业而高级,大家喜欢把它想成很大的一件事。
But really, an algorithm is just a systematic, step-by-step way to solve a problem.
但其实,算法只不过是一个系统的、一步一步的解决问题的方法。
It doesn't have to be fancy at all.
它根本没有那么复杂。
Here in this lesson, I'm going to explain how we arrived at our particular algorithm, so you can see how it's done.
现在,我将为大家解释我们怎样得出这一个特殊的算法,你会在这里看到它是怎样成形的。
Now, why are algorithms even important? Why does this lesson even exist?
为什么算法如此重要?为什么我们要有这堂课?
Well, notice one very significant phrase I used above:
请注意我刚才提到的一个很重要的词:
they are a step-by-step way to solve a problem,
它们是一种"逐步"解决问题的方法,
and as you probably know, computers excel at step-by-step processes.
你或许也知道,电脑擅长于一步一步的运算过程。
A computer without an algorithm is basically an expensive paperweight.
没有算法的电脑,基本上只是一个昂贵的压纸器。
And since computers are such a pervasive part of everyday life, algorithms are everywhere.
由于电脑已经普及到我们的日常生活中,所以算法是无处不在的。
The math behind OkCupid's matching algorithm is surprisingly simple.
OK Cupid配对算法背后的数学逻辑是非常简单的。
It's just some addition, multiplication, a little bit of square roots.
就是一些加法,乘法,再来一点平方根。
The tricky part in designing it was figuring out how to take something mysterious, human attraction,
不过,设计这套算法的关键部分,在于要找出那些神秘的人与人之间的相互吸引力,
and break it into components that a computer can work with.
并把它解构成电脑可以处理的部分。
The first thing we needed to match people up was data, something for the algorithm to work with.
要为人们配对,我们首先需要的是数据,也就是可供算法处理的东西。
The best way to get data quickly from people is to just ask for it.
要最快的从人们那里得到数据,最好的办法就是直接询问他们。
So we decided that OkCupid should ask users questions, stuff like,
我们决定应该让OK Cupid向用户问问题,比如说:
"Do you want to have kids one day?"
“你会想要小孩吗?”,
"How often do you brush your teeth?"
“你多久刷一次牙?”,
"Do you like scary movies?"
“你喜欢看恐怖电影么?”
And big stuff like, "Do you believe in God?"
也有严肃些的问题,比如:“你相信上帝么?”
Now, a lot of the questions are good for matching like with like, that is, when both people answer the same way.
目前有很多问题在进行同类型配对上都很合适,就是当双方的答案相同时。
For example, two people who are both into scary movies are probably a better match than one person who is and one who isn't.
比如,两个人都喜欢看恐怖电影,比起一个喜欢一个不喜欢,更有可能配对成功。
But what about a question like, "Do you like to be the center of attention?"
但如果碰到下面的问题:“你喜欢成为关注的中心么?”
If both people in a relationship are saying yes to this, they're going to have massive problems.
如果交往中的双方都回答是,那他们可有大问题了。
We realized this early on, and so we decided we needed a bit more data from each question.
我们很早就意识到了这一点,所以我们觉得需要在每个问题中再收集多一些的数据。
We had to ask people to specify not only their own answer, but the answer they wanted from someone else.
我们不仅要人们回答自己的看法,也要他们回答他们期待对方如何回答。
That worked really well. But we needed one more dimension.
这方法很有效,不过我们还要再多加一个维度。
Some questions tell you more about a person than others.
相比其它问题,有些问题更能让你认清一个人。
For example, a question about politics, something like, "Which is worse: book burning or flag burning?"
比如,关于政治的问题,“焚烧书籍或者国旗,哪个更糟糕?”
might reveal more about someone than their taste in movies.
可能比他们的电影口味更能揭示一个人。
And it doesn't make sense to weigh all things equally, so we added one final data point.
同时,并不是所有问题都同等重要的,所以我们最后增加了一个数据点。
For everything that OkCupid asks you, you have a chance to tell us the role it plays in your life.
任何OK Cupid问你的问题,你都可以告诉我们其在你生命中的重要性。
And this ranges from irrelevant to mandatory. So now, for every question, we have three things for our algorithm:
它的程度从“无关”到“必要”。所以现在,每一个问题我们有三个信息提供给算法:
first, your answer; second, how you want someone else -- your potential match -- to answer; and third, how important the question is to you at all.
第一,你的答案;第二,你希望别人怎么回答,也就是你潜在的对象的答案;第三,这个问题对你有多重要?
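The three pieces of information collected per question could be held in a small record like the one below. This is only a sketch: the names `Response`, `acceptable`, and `satisfies` are hypothetical, and the sample values for the scary-movies question are made up for illustration rather than taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One user's input for one question (hypothetical structure)."""
    answer: str            # the user's own answer
    acceptable: frozenset  # answers the user would accept from a match
    importance: str        # a label between "irrelevant" and "mandatory"


def satisfies(candidate: Response, seeker: Response) -> bool:
    """True if the candidate's own answer is one the seeker would accept."""
    return candidate.answer in seeker.acceptable


# Illustrative values only, for "Do you like scary movies?"
you = Response("yes", frozenset({"yes"}), "somewhat important")
other = Response("no", frozenset({"yes", "no"}), "irrelevant")

print(satisfies(other, you))  # False: you want "yes", they answered "no"
print(satisfies(you, other))  # True: they accept either answer
```

Storing the acceptable answers as a set lets a user accept more than one answer from a match, which is an assumption of this sketch.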
With all this information, OkCupid can figure out how well two people will get along.
有了所有的这些信息,OK Cupid就可以知道两个人的相处和谐程度是如何的了。
The algorithm crunches the numbers and gives us a result.
算法吃进数字,吐出答案。
As a practical example, let's look at how we'd match you with another person. Let's call him "B."
举个实际的例子吧,一起来看一下我们是怎样把你和另外一个人进行配对的。暂且称他为“B”。
Your match percentage with B is based on questions you've both answered.
你和B的适配度是基于你们双方都回答过的问题。
Let's call that set of common questions "s."
姑且把这些共同问题的集合称之为“s”吧。
As a very simple example, we use a small set "s" with just two questions in common, and compute a match from that.
简单举例,我们用一个小集合“s”--只需两个共同回答过的问题,电脑就会根据它算出适配度。

Here are our two example questions. The first one, let's say, is, "How messy are you?"
下面就是我们的两道例题。第一个问题是:“你有多邋遢?”
And the answer possibilities are: very messy, average and very organized.
可供选择的答案选项有:非常邋遢,一般和非常整洁。
And let's say you answered "very organized,"
我们假设你回答的是“非常整洁”,
and you'd like someone else to answer "very organized," and the question is very important to you.
你期待别人的回答是“非常整洁”,并且对你来说,这个问题非常重要。
Basically, you're a neat freak. You're neat, you want someone else to be neat, and that's it.
大致说来,你就是个有洁癖的人。你是个爱干净的人,你也希望对方同样如此,就这样。
And let's say B is a little bit different. He answered "very organized" for himself,
我们假设B有些不同。他对自己的回答是“非常整洁”,
but "average" is OK with him as an answer from someone else, and the question is only a little important to him.
但是他也能接受别人的答案是“一般”,这个问题于他而言不太重要。
Let's look at the second question, from our previous example: "Do you like to be the center of attention?"
我们看第二个问题,就是我们最开始举例的:“你喜欢成为关注的中心么?”
The answers are "yes" and "no."
答案只有“是”或者“否”。
You've answered "no," you want someone else to answer "no," and the question is only a little important to you.
现在你的回答是“否”,你希望别人回答的也是“否”,这个问题对于你不太重要。
Now B, he's answered "yes." He wants someone else to answer "no,"
而B呢,他自己的回答是“是”,他希望别人回答“否”,
because he wants the spotlight on him, and the question is somewhat important to him.
因为他希望所有焦点都在他身上,而这个问题对他还算重要。
So, let's try to compute all of this.
现在,我们让电脑来处理这些吧。
Our first step is, since we use computers to do this,
我们的第一步是,既然我们要用电脑来处理它,
we need to assign numerical values to ideas like "somewhat important" and "very important," because computers need everything in numbers.
我们就需要给诸如“还算重要”和“非常重要”设定一些数值,因为电脑需要的是数字。
We at OkCupid decided on the following scale: "Irrelevant" is worth 0. "A little important" is worth 1.
在OK Cupid上我们设定了如下级别:“无关”是0,“不太重要”的值是1。
"Somewhat important" is worth 10. "Very important" is 50. And "absolutely mandatory" is 250.
“还算重要”的值是10,“非常重要”的值是50,“绝对必要”的值是250。
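The scale described above maps naturally onto a lookup table. The point values come from the talk; the names `IMPORTANCE_POINTS` and `points` are assumptions made for this sketch.

```python
# Point values as stated in the talk; the dictionary itself is illustrative.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}


def points(label: str) -> int:
    """Look up the numeric weight for an importance label (case-insensitive)."""
    return IMPORTANCE_POINTS[label.strip().lower()]


print(points("Very important"))      # 50
print(points("A little important"))  # 1
```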
Next, the algorithm makes two simple calculations.
接下来,算法要做两个简单的计算。
The first is: How much did B's answers satisfy you?
第一个是:B的回答有多符合你的期望?
That is, how many possible points did B score on your scale?
也就是,B在你的数值范围内能得多少分?
Well, you indicated that B's answer to the first question, about messiness, was very important to you.
你在第一个有关整洁的问题上,表示B的答案对你是非常重要。
It's worth 50 points and B got that right.
它值50分,而B正好符合。
The second question is worth only 1, because you said it was only a little important.
第二个问题只值1分,因为你说这个问题对你不太重要。
B got that wrong, so B's answers were 50 out of 51 possible points. That's 98% satisfactory. Pretty good.
B答错了,所以B的回答在51分满分里拿到了50分。适配满意度是98%。非常好。
The second question the algorithm looks at is: How much did you satisfy B?
算法的第二个计算是:B对你的回答有多满意?
Well, B placed 1 point on your answer to the messiness question and 10 on your answer to the second.
对于你有关整洁性的回答,B设置了1分,而对第二个问题的答案设置了10分。
Of those 11, that's 1 plus 10, you earned 10 -- you guys satisfied each other on the second question.
满分11分,就是1+10,你得到了10分,在第二个问题上,你俩彼此都满意。
So your answers were 10 out of 11 equals 91 percent satisfactory to B. That's not bad.
所以你在满分11分里获得了10分,相当于B的91%的满意度。还不错。
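The two calculations above are the same operation applied in each direction: points earned divided by points possible. The sketch below takes the talk's stated tallies as given (50 of 51, and 10 of 11). The function name `satisfaction` and the choice to return 0.0 when no points are possible are assumptions.

```python
def satisfaction(scored):
    """Fraction of possible points earned.

    `scored` is a list of (points, satisfied) pairs, one per shared question.
    """
    possible = sum(p for p, _ in scored)
    earned = sum(p for p, ok in scored if ok)
    # Assumption: with nothing at stake, report 0.0 rather than divide by zero.
    return earned / possible if possible else 0.0


# How much B satisfied you: 50 points earned, 1 point missed.
b_satisfies_you = satisfaction([(50, True), (1, False)])

# How much you satisfied B, using the talk's tally of 10 out of 11.
you_satisfy_b = satisfaction([(1, False), (10, True)])

print(round(b_satisfies_you * 100))  # 98
print(round(you_satisfy_b * 100))    # 91
```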
The final step is to take these two match percentages and get one number for the both of you.
最后一步是把两个适配度百分比放在一起,为你俩打一个分数。
To do this, the algorithm multiplies your scores, then takes the nth root, where "n" is the number of questions.
为得到这一结果,算法会把你们两人的得分相乘,然后开n次方根,n就是问题的数目。
Because s, which is the number of questions in this sample, is only 2,
因为“s”-- 也就是这个例子中问题的数目,只有“2”,
we have: match percentage equals the square root of 98 percent times 91 percent. That equals 94 percent.
所以我们得到的适配度百分比就等于98%乘以91%再开平方根。等于94%。
That 94 percent is your match percentage with B.
即你们的适配度百分比等于94%。
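The final step of the worked example can be checked in a few lines. This sketch uses the square root, as in the talk's two-question example; the function name `match_percentage` is hypothetical.

```python
import math


def match_percentage(a: float, b: float) -> float:
    """Geometric mean of two satisfaction scores, each between 0 and 1."""
    return math.sqrt(a * b)


# The two satisfaction scores from the example: 50/51 and 10/11.
match = match_percentage(50 / 51, 10 / 11)
print(round(match * 100))  # 94
```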
It's a mathematical expression of how happy you'd be with each other, based on what we know.
这是基于我们所知道的信息,通过数学方法来表达你们彼此之间相处的愉快程度是怎样的。
Now, why does the algorithm multiply, as opposed to, say, average the two match scores together, and do the square-root business?
为什么算法要把两个匹配分数相乘再开平方根,而不是,比如说,直接求平均值呢?
In general, this formula is called the geometric mean.
总的来说,这个公式叫几何平均数。
It's a great way to combine values that have wide ranges and represent very different properties.
它很适合处理差异很大的数据以及代表不同属性的数据。
In other words, it's perfect for romantic matching.
换句话说,它能完美的计算出浪漫适配度。
You've got wide ranges and you've got tons of different data points, like I said, about movies, politics, religion -- everything.
你们可选的范围很大,有数不清的数据点,就像我刚说过的,有关电影的、政治的、宗教的以及所有的一切。
Intuitively, too, this makes sense.
凭直觉讲,这很有道理。
Two people satisfying each other 50 percent should be a better match than two others who satisfy 0 and 100, because affection needs to be mutual.
两个人彼此的满意度都是50%,应该比满意度分别是0和100%的两个人更相配,因为感情需要是相互的。
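A short comparison shows why a plain average would not capture this intuition. The helper names below are assumptions for this sketch. The two pairs are the ones mentioned in the talk: 50 and 50 percent, and 0 and 100 percent.

```python
import math


def arithmetic_mean(a: float, b: float) -> float:
    return (a + b) / 2


def geometric_mean(a: float, b: float) -> float:
    return math.sqrt(a * b)


mutual = (0.5, 0.5)     # both people are half satisfied
one_sided = (0.0, 1.0)  # one person is fully satisfied, the other not at all

# The average cannot tell the two couples apart.
print(arithmetic_mean(*mutual), arithmetic_mean(*one_sided))  # 0.5 0.5

# The geometric mean rewards mutual satisfaction and zeroes out the one-sided pair.
print(geometric_mean(*mutual), geometric_mean(*one_sided))    # 0.5 0.0
```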
After adding a little correction for margin of error, in the case where we have a small number of questions,
在增加了对误差幅度的小修改之后--这种情况在问题量很小的时候会出现,
like we do in this example, we're good to go.
就像我们刚举的运算实例一样--这套算法就可以运作了。
Any time OkCupid matches two people, it goes through the steps we just outlined.
任何时候,当OK Cupid将两个人配对时,它都会按照我们刚介绍的步骤来进行。
First it collects data about your answers,
首先它收集你的答题的数据,
then it compares your choices and preferences to other people's in simple, mathematical ways.
然后它比较你的选项和你期待的对方的选项,以简单的、数学的方法来进行。
This, the ability to take real-world phenomena and make them something a microchip can understand,
这种能将现实世界的现象转化为电脑芯片能读取的数据的能力,
is, I think, the most important skill anyone can have these days.
我认为,是现代最重要的一种技术。
Like you use sentences to tell a story to a person, you use algorithms to tell a story to a computer.
就像你用句子来给一个人讲故事一样,你现在是用算法来跟电脑讲故事。
If you learn the language, you can go out and tell your stories.
如果你学会了这种语言,你就可以去讲你的故事了。
I hope this will help you do that.
我希望我说的这些能帮助你做到这点。

重点单词
  • approach n. 接近; 途径,方法 v. 靠近,接近,动手处理
  • massive adj. 巨大的,大规模的,大量的,大范围的
  • margin n. 差额,利润,页边空白,边缘 vt. 使围绕于,加边
  • specify v. 指定,阐述,详细说明
  • affection n. 慈爱,喜爱,感情,影响
  • sample n. 样品,样本 vt. 采样,取样 adj. 样
  • potential adj. 可能的,潜在的 n. 潜力,潜能 n. 电位,
  • figure n. 图形,数字,形状; 人物,外形,体型 v. 演算,
  • compute v. (用计算机或计数器)计算,估算,估计
  • addition n. 增加,附加物,加法