How computers learn to recognize objects instantly
Date: 2018-04-05 17:19

Ten years ago, computer vision researchers thought that
10年前,计算机视觉研究人员认为,
getting a computer to tell the difference between a cat and a dog would be almost impossible,
要让计算机辨别猫与狗的差别,几乎是比登天还难,
even with the significant advance in the state of artificial intelligence.
即使用了相当先进的人工智能都很难办到。
Now we can do it at a level greater than 99 percent accuracy.
现在我们可以把辨别的准确度提升到99%以上。
This is called image classification -- give it an image, put a label to that image
这技术叫做图像分类--给计算机看图片,并给图片贴上标签,
and computers know thousands of other categories as well.
计算机还可以识别出许多其它类别的东西。
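As a rough illustration of what "put a label to that image" means in code, the sketch below turns raw class scores into probabilities with a softmax and picks the most likely label. The label names and scores are made up for the example; they are not output from Darknet.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Hypothetical labels and scores, for illustration only.
labels = ["malamute", "husky", "tabby cat"]
scores = [4.0, 2.5, 0.5]

label, prob = classify(scores, labels)
print(label, round(prob, 3))
```

A classifier of this kind returns one label for the whole image, which is the limitation the talk turns to next.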
I'm a graduate student at the University of Washington, and I work on a project called Darknet,
我目前是华盛顿大学的研究生,我正在做一个叫做 Darknet 的项目,
which is a neural network framework for training and testing computer vision models.
它是一个用来训练及测试计算机视觉模型的神经网络架构。
So let's just see what Darknet thinks of this image that we have.
所以,让我们来瞧瞧 Darknet 对我们这张照片的看法。
When we run our classifier on this image,
当我们在这张照片上开启我们的分类器,
we see we don't just get a prediction of dog or cat, we actually get specific breed predictions.
可以看到计算机现在不只在预测这是狗或猫,它实际上正在撷取特定品种的预测。
That's the level of granularity we have now. And it's correct. My dog is in fact a malamute.
这就是现在我们计算机的粒度等级。辨别正确。我的狗的确是只雪橇犬。
So we've made amazing strides in image classification, but what happens when we run our classifier on an image that looks like this?
所以,我们在图像识别上已经有了很大的进步,但如果我们用识别器来辨别这样的照片呢?
Well... We see that the classifier comes back with a pretty similar prediction.
嗯...可以看到从分类器得到的预测也相当类似。
And it's correct, there is a malamute in the image, but just given this label,
没错,图片中有一只雪橇狗,但它只给出一个标签,
we don't actually know that much about what's going on in the image.
我们对这张照片的理解还不是很完整。
We need something more powerful. I work on a problem called object detection,
我们需要更强的东西。我正在研究一个问题,叫做“物件侦测”,
where we look at an image and try to find all of the objects, put bounding boxes around them and say what those objects are.
我们把一张照片中的所有物体都找出来,用边界框把它们框起来,然后标示它们是那些东西。
So here's what happens when we run a detector on this image.
我们来看一下当我们在这一张图片上执行侦测软件时,会发生什么事。
Now, with this kind of result, we can do a lot more with our computer vision algorithms.
现在,有了这类的结果,我们就可以利用计算机视觉算法,帮我们做更多的事。
We see that it knows that there's a cat and a dog.
我们可以看到,计算机知道图片中有一只猫和狗。
It knows their relative locations, their size. It may even know some extra information.
它知道它们彼此的相对位置、大小。计算机甚至可能知道其它的信息。
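One way to picture a detector's output is as a list of labeled boxes, from which size and relative location follow directly. The sketch below is a minimal illustration; the `Detection` class, its field names, and all the numbers are hypothetical and do not come from Darknet.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str    # what the object is
    score: float  # detector confidence, 0 to 1
    x: float      # left edge of the bounding box, in pixels
    y: float      # top edge of the bounding box, in pixels
    w: float      # box width
    h: float      # box height

    def area(self):
        return self.w * self.h

    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

def relative_position(a, b):
    """Describe where a sits relative to b, comparing box centers horizontally."""
    return "left of" if a.center()[0] < b.center()[0] else "right of"

# Made-up detections for an image containing a dog, a cat and a book.
dog = Detection("dog", 0.92, 40, 60, 200, 220)
cat = Detection("cat", 0.88, 300, 120, 120, 140)
book = Detection("book", 0.55, 420, 20, 60, 40)

print(f"the {dog.label} is {relative_position(dog, cat)} the {cat.label}")
print(f"the {dog.label} box is {dog.area() / cat.area():.1f}x the size of the {cat.label} box")
```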
There's a book sitting in the background. And if you want to build a system on top of computer vision,
它也看到了背景中有一本书。如果你想要建立一个基于计算机视觉系统的实用系统,
say a self-driving vehicle or a robotic system, this is the kind of information that you want.
比如说,自动驾驶车或机械人系统,这类就会是你想要的信息。
You want something so that you can interact with the physical world.
你会想要一个可以与实体世界互动的东西。
Now, when I started working on object detection, it took 20 seconds to process a single image.
当我开始做物件侦测时,它要花20秒才能处理一张图片。
And to get a feel for why speed is so important in this domain,
为了让各位体会为什么这个领域这么讲究速度,
here's an example of an object detector that takes two seconds to process an image.
我这边做个执行物件侦测器的示范,一张照片只要2秒的处理时间。
So this is 10 times faster than the 20-seconds-per-image detector, and you can see that by the time it makes predictions,
所以,比20秒一张的侦测器快了10倍,各位可以看到,在它识别图像的过程中,
the entire state of the world has changed, and this wouldn't be very useful for an application.
周围环境已经发生了变化,但对一个应用软件而言,这样的速度是很鸡肋的。
If we speed this up by another factor of 10, this is a detector running at five frames per second.
如果我们再把速度提高10倍,这个侦测器每秒就可以处理5张图片。
This is a lot better, but for example, if there's any significant movement, I wouldn't want a system like this driving my car.
这样好多了,但,假如,移动很快的时候...我可不想在我车上装这样慢的系统。
This is our detection system running in real time on my laptop.
这是在我笔记本上运行的实时侦测系统。
So it smoothly tracks me as I move around the frame, and it's robust to a wide variety of changes in size, pose, forward, backward.
我在框框附近移动的时候,它可以很顺畅地追踪着我,而且,它可以根据不同的大小、姿势、前、后来做调整。
This is great. This is what we really need if we're going to build systems on top of computer vision.
太棒了。如果我们要建立一个基于计算机视觉系统的实用系统,这个才会是我真正想要的。
So in just a few years, we've gone from 20 seconds per image to 20 milliseconds per image, a thousand times faster.
所以,才几年的时间,我们从每20秒处理一张照片,进步到每张照片只要20毫秒,快了1000倍。
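The speed figures quoted above can be checked with a little arithmetic. The sketch below converts seconds per image into frames per second and, as an added assumption not taken from the talk, shows how far a car moving at 25 m/s would travel while one image is being processed.

```python
def frames_per_second(seconds_per_image):
    """Convert processing time per image into a frame rate."""
    return 1.0 / seconds_per_image

# Assumed vehicle speed (about 90 km/h); not a figure from the talk.
CAR_SPEED_M_PER_S = 25.0

# Processing times mentioned in the talk, slowest to fastest.
stages = [
    ("early detector", 20.0),
    ("10x faster", 2.0),
    ("another 10x faster", 0.2),
    ("real time", 0.02),
]

for name, seconds in stages:
    fps = frames_per_second(seconds)
    travelled = CAR_SPEED_M_PER_S * seconds
    print(f"{name}: {fps:g} frames/s, car moves {travelled:g} m per image")

speedup = stages[0][1] / stages[-1][1]
print(f"overall speedup: {speedup:g}x")
```

Under that assumed speed, even the five-frames-per-second detector leaves several meters of travel between predictions, which is the point of the remark about not wanting it to drive a car.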

How did we get there? Well, in the past, object detection systems would take an image like this
我们是如何办到的?过去,物件侦测系统,会把一张像这样的照片,
and split it into a bunch of regions and then run a classifier on each of these regions,
分割成好几个小区块,然后在每一个小区块运行分类器软件,
and high scores for that classifier would be considered detections in the image.
相似度得分如果比较高,会被识别器认为照片侦测成功。
But this involved running a classifier thousands of times over an image,
但这样一张图片要执行好几千次的识别指令、
thousands of neural network evaluations to produce detection.
经过好几千次的神经网络评估才有办法侦测出来。
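To see why the region-by-region approach is expensive, here is a minimal sketch that slides square windows over an image and counts how many times a classifier has to run. The image size, window sizes, stride, and the toy classifier are all assumptions chosen for illustration; real systems of that era picked their regions in various ways.

```python
def sliding_windows(img_w, img_h, win, stride):
    """Yield (left, top, width, height) for every window position."""
    for top in range(0, img_h - win + 1, stride):
        for left in range(0, img_w - win + 1, stride):
            yield (left, top, win, win)

def detect_by_regions(img_w, img_h, classify_region, window_sizes, stride, threshold):
    """Run the classifier on every region; keep the high-scoring ones."""
    detections = []
    evaluations = 0
    for win in window_sizes:
        for box in sliding_windows(img_w, img_h, win, stride):
            evaluations += 1  # one full classifier run per region
            label, score = classify_region(box)
            if score >= threshold:
                detections.append((label, score, box))
    return detections, evaluations

# Toy stand-in for a neural network: it "recognizes" exactly one region.
TARGET_BOX = (128, 64, 128, 128)

def toy_classifier(box):
    return ("dog", 0.9) if box == TARGET_BOX else ("dog", 0.1)

found, runs = detect_by_regions(
    448, 448, toy_classifier, window_sizes=(64, 128, 256), stride=8, threshold=0.5
)
print(f"{len(found)} detection(s) after {runs} classifier evaluations")
```

With these settings a single image already costs several thousand classifier runs, each of which would be a full neural network evaluation in a real system.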
Instead, we trained a single network to do all of detection for us.
但我们不是这样做,我们训练了一个网络模型来帮我们完成所有的侦测。
It produces all of the bounding boxes and class probabilities simultaneously.
它可以同时产出边界框并同时对可能的结果进行评估。
With our system, instead of looking at an image thousands of times to produce detection,
有了我们的系统,你就不用一张图片看了好几千遍才能侦测出来,
you only look once, and that's why we call it the YOLO method of object detection.
你只要看一眼(YOLO),所以我们简称这个物件侦测技术为YOLO。
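The "look once" idea can be sketched as decoding a single network output that already contains every box and class probability. The layout below (a grid of cells, each holding one box, a confidence, and per-class probabilities) is a simplified assumption in the spirit of the published YOLO design, not the exact Darknet format, and the numbers are invented.

```python
def decode_grid(output, grid_size, classes, threshold):
    """Decode one network output into detections.

    output[row][col] = [x_off, y_off, w, h, confidence, p_class0, p_class1, ...]
    x_off and y_off are offsets inside the cell; w and h are fractions of the image.
    """
    detections = []
    for row in range(grid_size):
        for col in range(grid_size):
            cell = output[row][col]
            x_off, y_off, w, h, conf = cell[:5]
            probs = cell[5:]
            best = max(range(len(classes)), key=lambda i: probs[i])
            score = conf * probs[best]  # box confidence times class probability
            if score >= threshold:
                cx = (col + x_off) / grid_size  # box center as a fraction of image width
                cy = (row + y_off) / grid_size  # box center as a fraction of image height
                detections.append((classes[best], score, (cx, cy, w, h)))
    return detections

classes = ["cat", "dog"]
empty = [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5]

# A made-up 2x2 output standing in for one forward pass of the network.
output = [
    [empty, [0.5, 0.5, 0.4, 0.6, 0.9, 0.1, 0.9]],
    [[0.5, 0.5, 0.3, 0.3, 0.8, 0.9, 0.1], empty],
]

for label, score, box in decode_grid(output, 2, classes, threshold=0.5):
    print(label, round(score, 2), box)
```

The decoding loop is cheap bookkeeping; the expensive part, the network evaluation, happens once per image instead of once per region.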
So with this speed, we're not just limited to images; we can process video in real time.
所以,有了这样的辨识速度,我们不只可以侦测图片;还可以处理实时的影片。
And now, instead of just seeing that cat and dog, we can see them move around and interact with each other.
现在各位看到的不是猫、狗的静态图片,而是有它们在移动、互动的动态影片。
This is a detector that we trained on 80 different classes in Microsoft's COCO dataset.
这是我们用微软COCO资料集里80种不同的类别训练出来的辨识器。
It has all sorts of things like spoon and fork, bowl, common objects like that.
它包含各种东西,象是汤匙、叉子、碗这类的日常用品。
It has a variety of more exotic things: animals, cars, zebras, giraffes.
它还有很多奇妙的东西:动物、车子、斑马、长颈鹿。
And now we're going to do something fun.
现在我们要进行一件好玩的事。
We're just going to go out into the audience and see what kind of things we can detect.
我们会进到观众席,去看看能辨识到哪些东西。
Does anyone want a stuffed animal? There are some teddy bears out there.
有谁要填充娃娃?这边还有一些泰迪熊。
And we can turn down our threshold for detection a little bit, so we can find more of you guys out in the audience.
我们现在降低一下对侦测结果的精确度的要求,这样我们可以在观众席中找到更多东西。
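Turning down the detection threshold simply means keeping boxes with lower confidence scores. A minimal sketch, with made-up labels and scores:

```python
def filter_by_threshold(detections, threshold):
    """Keep only detections whose confidence is at or above the threshold."""
    return [d for d in detections if d[1] >= threshold]

# Hypothetical (label, confidence) pairs from one video frame.
scored = [
    ("person", 0.91),
    ("backpack", 0.42),
    ("teddy bear", 0.33),
    ("stop sign", 0.18),
]

strict = filter_by_threshold(scored, 0.5)
relaxed = filter_by_threshold(scored, 0.25)
print(len(strict), "detection(s) at threshold 0.5")
print(len(relaxed), "detection(s) at threshold 0.25")
```

A lower threshold surfaces more objects at the cost of admitting more uncertain guesses.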
Let's see if we can get these stop signs. We find some backpacks. Let's just zoom in a little bit.
我们来看看能不能侦测到停止标志。我们有侦测到一些背包。现在把镜头拉近一点。
And this is great. And all of the processing is happening in real time on the laptop.
这真的很厉害。所有的侦测流程都可以在笔记本里实时呈现。
And it's important to remember that this is a general purpose object detection system, so we can train this for any image domain.
更重要的是,这只是一个一般用的物件侦测系统,我们还可以训练它辨别任何领域的照片。
The same code that we use to find stop signs or pedestrians, bicycles in a self-driving vehicle,
同样的程序码,放在自动驾驶车里,可以侦测到停止标志、行人、脚踏车,
can be used to find cancer cells in a tissue biopsy.
但放到组织切片就可以侦测出癌症细胞。
And there are researchers around the globe already using this technology for advances in things like medicine, robotics.
现在全球有很多研究人员已经开始在使用这项技术做进一步的研究,象是医药、机械人领域。
This morning, I read a paper where they were taking a census of animals in Nairobi National Park with YOLO as part of this detection system.
今天早上,我读到一篇文章,在奈洛比国家公园里,他们要对动物们进行统计调查,YOLO就是其使用的侦测系统的一部分。
And that's because Darknet is open source and in the public domain, free for anyone to use.
而这一切都是因为 Darknet 是开放原始码,属于公有领域,任何人都可以免费使用。
But we wanted to make detection even more accessible and usable,
但我们希望侦测系统可以更亲民、更好用,
so through a combination of model optimization, network binarization and approximation,
所以在经过模型优化、网络二值化及近似度化的整合后,
we actually have object detection running on a phone.
我们终于可以在手机上侦测物件。
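The talk does not spell out how the network was binarized. One common form of weight binarization replaces each real-valued weight with its sign and keeps a single scaling factor, so that most multiplications become sign flips. The sketch below illustrates that general idea only; the weights and inputs are invented.

```python
def binarize(weights):
    """Approximate weights as alpha * sign(w), with alpha the mean absolute weight."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Made-up weights and inputs for a single neuron.
weights = [0.5, -0.25, 0.75, -0.5]
inputs = [1.0, -1.0, 2.0, -2.0]

alpha, signs = binarize(weights)
exact = dot(weights, inputs)
approx = alpha * dot(signs, inputs)  # only additions and subtractions inside the dot

print("exact:", exact, "binarized approximation:", approx)
```

Each weight now needs one bit instead of a full floating-point number, which is the kind of saving that makes running on a phone plausible, at the price of some approximation error.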
And I'm really excited because now we have a pretty powerful solution to this low-level computer vision problem,
而我真的相当兴奋,因为我们现在在低阶的计算机影像处理问题上有了相当强力的解决方式,
and anyone can take it and build something with it.
任何人都可以拿去并创造一些东西。
So now the rest is up to all of you and people around the world with access to this software,
所以,接下来就看各位以及全世界所有人,用这个软件大展身手了,
and I can't wait to see what people will build with this technology. Thank you.
我真的等不及想看看你们用这项科技所做出来的产品。谢谢。
