The Crux of the Big Data Problem: Short-Term Data Is a Poor Basis for Prediction
Date: 2016-06-16 10:03

BBC News – You may be familiar with the statistic that 90% of the world’s data was created in the last few years. Indeed, every two years for about the last three decades the amount of data in the world has increased by about 10 times.
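The two figures in this paragraph are consistent with each other, and the arithmetic is easy to check. The sketch below assumes an idealised model in which the total amount of data multiplies by a fixed factor each period; the function name `share_created_recently` is made up for illustration and does not come from the article.

```python
def share_created_recently(growth_factor: float, periods: int = 1) -> float:
    """Fraction of all existing data created within the last `periods` periods.

    Assumes (as a simplification) that the total multiplies by
    `growth_factor` every period, so the older data makes up
    growth_factor ** -periods of today's total.
    """
    if growth_factor <= 1:
        raise ValueError("growth_factor must be greater than 1")
    return 1.0 - growth_factor ** (-periods)


# Tenfold growth every two years: one period is two years.
share = share_created_recently(10, periods=1)
print(f"{share:.0%} of the data is at most two years old")
```

Under that assumption, tenfold growth per two-year period means everything older than two years is only a tenth of the current total, which is where a figure like 90% comes from.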

One of the problems with such a rate of information increase is that the present moment will always loom far larger than even the recent past. Short-sightedness is built into the structure, in the form of an overwhelming tendency to over-estimate short-term trends at the expense of history.

To understand why this matters, consider the findings from social science about ‘recency bias’, which describes the tendency to assume that future events will closely resemble recent experience. It’s a version of what is also known as the availability heuristic: the tendency to base your thinking disproportionately on whatever comes most easily to mind.

It’s also a universal psychological attribute. If the last few years have seen exceptionally cold summers where you live, for example, you might be tempted to state that summers are getting colder – or that your local climate may be cooling. In fact, you would need to take a far, far longer view to learn anything meaningful about climate trends.
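The cold-summers example can be made concrete with a small sketch. The temperature series below is entirely synthetic and invented for illustration (a slow warming of 0.02 degrees per year plus a 12-year cycle); it is not real climate data, and the helper `least_squares_slope` is a hypothetical name. The point is only that a trend fitted to the last few years can have the opposite sign to the trend over the full record.

```python
import math


def least_squares_slope(values):
    """Slope of the ordinary least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Synthetic summer temperatures (assumption, not measured data):
# a baseline of 20 degrees, +0.02 degrees per year, plus a 12-year cycle.
YEARS = 116
summers = [20.0 + 0.02 * t + math.sin(2 * math.pi * t / 12) for t in range(YEARS)]

recent_slope = least_squares_slope(summers[-5:])  # last five summers only
long_slope = least_squares_slope(summers)         # the whole record

print(f"trend over last 5 years: {recent_slope:+.3f} degrees/year")
print(f"trend over {YEARS} years:   {long_slope:+.3f} degrees/year")
```

In this made-up series the last five summers fall on the downswing of the cycle, so the short window suggests cooling even though the long-run trend is upward by construction.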

The same tends to be true of most complex phenomena in real life: stock markets, economies, the success or failure of companies, war and peace, relationships, the rise and fall of empires. Short-term analyses aren’t only invalid – they’re actively unhelpful and misleading.

It’s also worth remembering that novelty tends to be a dominant consideration when deciding what data to keep or delete. Out with the old and in with the new: that’s the digital trend in a world where search algorithms are intrinsically biased towards freshness. A bias towards the present is structurally engrained in almost all the technology surrounding us.

What to do? This isn’t just a question of being better at preserving old data. More importantly, it’s about determining what is worth preserving in the first place. Mere accumulation is no kind of answer. In an era of bigger and bigger data, what you choose not to know matters just as much as what you do.
