Where Computers Defeat Humans, and Where They Can’t (How AlphaGo Learned to Play Go)
ALPHAGO, the artificial intelligence system built by the Google subsidiary DeepMind, has just defeated the human champion, Lee Se-dol, four games to one in a tournament of the strategy game Go. Why does this matter? After all, computers surpassed humans in chess in 1997, when IBM’s Deep Blue beat Garry Kasparov. So why is AlphaGo’s victory significant?
Like chess, Go is a hugely complex strategy game in which chance and luck play no role. Two players take turns placing black or white stones on a 19-by-19 grid; when a stone or a connected group of stones is completely surrounded by stones of the other color, leaving it no adjacent empty point, it is removed from the board. The player who controls more of the board at the game’s end wins.
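The capture rule can be made concrete with a short sketch. It assumes a hypothetical board encoding (a square list of strings, with `.` for an empty point, `B` for black and `W` for white) and hypothetical function names; it illustrates the rule itself and is not code from AlphaGo.

```python
def group_and_liberties(board, row, col):
    """Return the connected group containing (row, col) and its liberties.

    A liberty is an empty point directly adjacent (up, down, left, right)
    to any stone of the group.
    """
    color = board[row][col]
    size = len(board)  # assumes a square board
    group = {(row, col)}
    liberties = set()
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < size and 0 <= nc < size):
                continue  # off the edge of the board
            if board[nr][nc] == ".":
                liberties.add((nr, nc))
            elif board[nr][nc] == color and (nr, nc) not in group:
                group.add((nr, nc))
                stack.append((nr, nc))
    return group, liberties


def is_captured(board, row, col):
    """A group with no liberties is removed from the board."""
    _, liberties = group_and_liberties(board, row, col)
    return not liberties


# Toy 5-by-5 position: the white stone at (1, 1) has black stones on all sides.
EXAMPLE_BOARD = [
    ".B...",
    "BWB..",
    ".B...",
    ".....",
    ".....",
]
```

Calling `is_captured(EXAMPLE_BOARD, 1, 1)` checks the surrounded white stone, while `is_captured(EXAMPLE_BOARD, 0, 1)` checks a black stone that still has an empty neighbor.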
Unlike the case with chess, however, no human can explain how to play Go at the highest levels. The top players, it turns out, can’t fully access their own knowledge about how they’re able to perform so well. This self-ignorance is common to many human abilities, from driving a car in traffic to recognizing a face. This strange state of affairs was beautifully summarized by the philosopher and scientist Michael Polanyi, who said, “We know more than we can tell.” It’s a phenomenon that has come to be known as “Polanyi’s Paradox.”
Polanyi’s Paradox hasn’t prevented us from using computers to accomplish complicated tasks, like processing payrolls, optimizing flight schedules, routing telephone calls and calculating taxes. But as anyone who’s written a traditional computer program can tell you, automating these activities has required painstaking precision to explain exactly what the computer is supposed to do.
This approach to programming computers is severely limited. It can’t be used in the many domains where we know more than we can tell: playing Go, recognizing common objects in photos, translating between human languages, diagnosing diseases. In all of these tasks, the rules-based approach to programming has failed badly over the years.
Deep Blue achieved its superhuman performance almost entirely through sheer computing power: It was fed millions of examples of chess games so it could sift among the possibilities to determine the optimal move. The problem is that there are many more possible Go games than there are atoms in the universe, so even the fastest computers can’t simulate a meaningful fraction of them. To make matters worse, it’s usually far from clear which possible moves to even start exploring.
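A rough sketch shows why exhaustive search works for small games and breaks down for Go. The take-away game below is a hypothetical stand-in (players alternately remove one to three stones, and whoever takes the last stone wins); it is not Deep Blue’s algorithm. The atom count is the commonly cited order-of-magnitude estimate, and the Go figure is a crude upper bound on board configurations that includes illegal ones.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def can_win(stones, max_take=3):
    """Exhaustive search: can the player to move force a win?

    With no stones left the player to move has already lost, because the
    opponent took the last stone; any() over no moves returns False.
    """
    return any(
        not can_win(stones - take, max_take)
        for take in range(1, min(max_take, stones) + 1)
    )


def count_game_tree(stones, max_take=3):
    """Count every position an uncached brute-force search would visit."""
    return 1 + sum(
        count_game_tree(stones - take, max_take)
        for take in range(1, min(max_take, stones) + 1)
    )


# Each of the 361 points on a Go board is empty, black, or white.
GO_CONFIGURATIONS_UPPER_BOUND = 3 ** 361
# Commonly cited rough estimate for atoms in the observable universe.
ATOMS_ESTIMATE = 10 ** 80
```

Even in this toy game, `count_game_tree` grows rapidly as stones are added; for Go, `GO_CONFIGURATIONS_UPPER_BOUND` already exceeds `ATOMS_ESTIMATE` before counting the different orders in which positions can be reached.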
What changed? The AlphaGo victories vividly illustrate the power of a new approach in which, rather than trying to program smart strategies into a computer, we build systems that can learn winning strategies almost entirely on their own, by seeing examples of successes and failures.
Since these systems don’t rely on human knowledge about the task at hand, they’re not limited by the fact that we know more than we can tell.
AlphaGo does use simulations and traditional search algorithms to help it decide on some moves, but its real breakthrough is its ability to overcome Polanyi’s Paradox. It did this by figuring out winning strategies for itself, both by example and from experience. The examples came from huge libraries of Go matches between top players amassed over the game’s 2,500-year history. To understand the strategies that led to victory in these games, the system made use of an approach known as deep learning, which has demonstrated remarkable abilities to tease out patterns and understand what’s important in large pools of information.
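Learning by example can be sketched at toy scale. The code below trains a single artificial neuron, far simpler than the deep networks AlphaGo used, on labeled examples and then asks it about inputs it never saw. The task (is the sum of two numbers greater than 1?), the training grid, and all names are hypothetical choices made for illustration.

```python
import math


def predict(weights, bias, inputs):
    """Output of one artificial neuron: a weighted sum squashed into (0, 1)."""
    z = weights[0] * inputs[0] + weights[1] * inputs[1] + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(examples, epochs=2000, learning_rate=0.5):
    """Adjust connection strengths so outputs move toward the example labels."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            error = label - predict(weights, bias, inputs)
            # Strengthen or weaken each connection in proportion to its input.
            weights[0] += learning_rate * error * inputs[0]
            weights[1] += learning_rate * error * inputs[1]
            bias += learning_rate * error
    return weights, bias


# Training set: a coarse grid of inputs, each labeled with the right answer.
GRID = [0.0, 0.25, 0.5, 0.75, 1.0]
TRAINING_SET = [((a, b), 1.0 if a + b > 1 else 0.0) for a in GRID for b in GRID]
```

After `weights, bias = train_neuron(TRAINING_SET)`, calling `predict(weights, bias, (0.9, 0.8))` or `predict(weights, bias, (0.1, 0.15))` exposes the neuron to inputs that were not in the training set. Nobody wrote a rule about sums; the behavior comes entirely from the examples.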
Learning in our brains is a process of forming and strengthening connections among neurons. Deep learning systems take an analogous approach, so much so that they used to be called “neural nets.” They set up billions of nodes and connections in software, use “training sets” of examples to strengthen connections among stimuli (a Go game in progress) and responses (the next move), then expose the system to a new stimulus and see what its response is. AlphaGo also played millions of games against itself, using another technique called reinforcement learning to remember the moves and strategies that worked well.
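Learning from self-play can also be illustrated in miniature. The sketch below is a simple tabular method applied to a hypothetical take-away game (remove one to three stones per turn; whoever takes the last stone wins). It is not AlphaGo’s actual algorithm, and the function names, game, and parameter values are assumptions chosen for illustration.

```python
import random


def train_self_play(start=10, max_take=3, episodes=20000,
                    epsilon=0.1, learning_rate=0.1, seed=0):
    """Play the game against itself and remember which moves tended to win."""
    rng = random.Random(seed)
    value = {}  # (stones_left, move) -> running estimate that the mover wins
    for _ in range(episodes):
        stones, history = start, []
        while stones > 0:
            moves = list(range(1, min(max_take, stones) + 1))
            if rng.random() < epsilon:
                move = rng.choice(moves)  # occasionally explore a random move
            else:
                move = max(moves, key=lambda m: value.get((stones, m), 0.5))
            history.append((stones, move))
            stones -= move
        # The last move in the history took the final stone and won.
        reward = 1.0
        for state_move in reversed(history):
            old = value.get(state_move, 0.5)
            value[state_move] = old + learning_rate * (reward - old)
            reward = 1.0 - reward  # the move before it belonged to the opponent
    return value


def greedy_move(value, stones, max_take=3):
    """Pick the move with the highest learned value for this position."""
    moves = range(1, min(max_take, stones) + 1)
    return max(moves, key=lambda m: value.get((stones, m), 0.5))
```

A typical use is `value = train_self_play()` followed by `greedy_move(value, stones)` for a given position. The table starts with no knowledge of the game; whatever preferences it ends up with come only from the outcomes of its own games.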
Deep learning and reinforcement learning have both been around for a while, but until recently it was not at all clear how powerful they were, and how far they could be extended. In fact, it’s still not, but applications are improving at a gallop, with no end in sight. And the applications are broad, including speech recognition, credit card fraud detection, and radiology and pathology. Machines can now recognize faces and drive cars, two of the examples that Polanyi himself noted as areas where we know more than we can tell.
We still have a long way to go, but the implications are profound. As when James Watt introduced his steam engine 240 years ago, technology-fueled changes will ripple throughout our economy in the years ahead, but there is no guarantee that everyone will benefit equally. Understanding and addressing the societal challenges brought on by rapid technological progress remain tasks that no machine can do for us.
