David Silver - Deep Reinforcement Learning from AlphaGo to AlphaStar (Talk back at UAlberta) Part 1

[Applause] [inaudible] ... and let's see how far we can get.

So we talked about this goal of making something which is generally intelligent, but how do you measure that? How can we say that something is really doing well? One way to be sure that our algorithms are doing something sensible is to study games, and to say: well, games are a really perfect microcosm of the real world. They can capture these really interesting aspects of the real world in a closed world that has absolutely fascinating strategies, which have been studied by humans for thousands of years. And, even more importantly, we have humans that we can directly compete our agents against, to really establish a litmus test: the benchmark for intelligence is the human, and if we can beat them at their own games, then [inaudible]. If we make the games interesting enough, and cover enough types of problem as well, those little microcosms can actually end up being quite rich and interesting, and capture all the basic facets of the real world that we might want to carry with us.

And so this [inaudible] in video games such as Atari. [inaudible] So that's our goal: to build agents which can achieve human-level play in different games. [inaudible] is when we started studying the Atari 2600 [inaudible]. Again, this was an idea that had its source and roots here in Alberta, and we kind of picked up the baton, if you like, and tried to see how far we could push these systems using a combination of reinforcement learning and deep learning.

And so now, if we go back to our picture of the reinforcement learning cycle, we can make that concrete in this setting of playing Atari, where the world is now one of these Atari games. What's happening is we've got our agent here, with its brain: can it succeed in actually playing these games? We did that by combining these two elements together. So first of all, we used something called a multi-layer convolutional network, which takes these little patches of the screen to give you features, and processes those features to give you even richer features describing what's going on in each part of the screen, going deeper and deeper, and then combines this together to give you an estimate of the value: what the future score is going to be. And then it picks the action that maximizes this value. Very simple.

So what we did was we applied this to [inaudible] different games that were provided by the Atari emulator, and we saw that something quite interesting happened. This is what happened in the game of Breakout. We saw that after just a few minutes of training it hasn't really had a chance to figure out [inaudible]. Over time it starts to try all different things, it starts to see that [inaudible].

So we played several different games here, and it was actually able to master a whole variety of different games which are quite different in nature. This is not just the same kind of game with little tweaks, but things which are really visually very different. Some of them are side-scrolling, some of them are kind of three-dimensional games, some have flat screens that you look at, some of them have got parallax. There are all kinds of different things going on, with all kinds of difficulties and interesting things happening: long-term rewards, partial observability, all kinds of things happening in these games. And across these games it was able to achieve human-level or better performance, compared with a human tester.

Okay. [inaudible] enormous complexity that's led to many people trying to crack this amazing challenge, and historically they've not been able to make a huge amount of progress with the traditional approaches. This vast search space has 10 to the 170 unique states, and so there's 10 to the 340 different positions in the tree.

So, just a quick visualization of [inaudible]. The second piece of intuition is the value network, which [inaudible] and outputs a single number. You could think of it like a heuristic evaluation function in the game of chess: a single number, just a number between minus one and one, representing who is going to be the winner of the game.

And these were trained, in the original AlphaGo (I'll talk about the subsequent versions later), by a pipeline of machine learning. [inaudible] plays the game until the game finishes, and that gives you a winner. From the winners of the games we can then predict the winner from each of those positions, and we can use that to train the value network. And the beautiful thing about that procedure is that we can generate as much data as we want; we're not limited by whether or not human [inaudible].

Okay. [inaudible] So the real case is far, far worse, but you can still get a picture from this cartoon of the way in which we use these policy and value networks to make search more effective. First of all, we reduce the breadth of the search by using the policy network. This is a very intuitive idea: [inaudible], and that dramatically reduces the search space of what we have to consider [inaudible]. In addition, we can dramatically reduce the depth of the search, and that's where the value network comes in. You can think of the value network as basically giving us a way to replace a whole subtree, which was kind of hidden beneath each of these leaf nodes. You don't need to look at it any more if you have an accurate value estimate [inaudible]. So the network acts as a proxy for what you would get if you were able to search all the way to the end of the game. [inaudible]

Okay, so by putting those two techniques together, we were able to go and challenge the greatest player of recent times, Lee Sedol, a nine-dan professional [inaudible]. At the time we played him he had won eighteen world titles, which is a remarkable achievement. This match took place in 2016, and it's really important, when you put yourself back in context, to realize that no previous program had ever defeated a professional player [inaudible].

And so, what happened? Rich mentioned there's a movie about this. [inaudible] It was a fascinating game [inaudible], and very interesting to be there in Seoul as this was unfolding. I can just tell you, from a personal perspective, it was the most amazing sequence of events, and I felt very privileged to be [inaudible].
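The two search reductions described in the talk, a policy network narrowing the breadth of the search and a value network cutting its depth, can be illustrated with a small sketch. This is not AlphaGo's actual search, which is a Monte Carlo tree search over Go positions. It is a plain depth-limited negamax on a toy stone-taking game, and `toy_policy` and `toy_value` are hypothetical hand-written stand-ins for trained networks, assumed here only for illustration.

```python
from typing import List, Tuple

MOVES = (1, 2, 3)  # toy game: take 1-3 stones; whoever takes the last stone wins


def legal_moves(stones: int) -> List[int]:
    return [m for m in MOVES if m <= stones]


def toy_policy(stones: int) -> List[Tuple[int, float]]:
    """Hypothetical stand-in for a policy network: (move, prior), best first."""
    # Favour moves that leave the opponent a multiple of 4 stones.
    scores = [(m, 3.0 if (stones - m) % 4 == 0 else 1.0) for m in legal_moves(stones)]
    total = sum(s for _, s in scores)
    return sorted(((m, s / total) for m, s in scores), key=lambda pair: -pair[1])


def toy_value(stones: int) -> float:
    """Hypothetical stand-in for a value network: a number in [-1, 1]
    giving the predicted outcome for the player about to move."""
    return -1.0 if stones % 4 == 0 else 1.0


def search(stones: int, depth: int, top_k: int, counter: List[int]) -> float:
    """Negamax that expands only the top_k policy moves (reduced breadth)
    and calls toy_value at the depth limit (reduced depth)."""
    counter[0] += 1  # count visited nodes
    if stones == 0:
        return -1.0  # the opponent took the last stone, so the player to move lost
    if depth == 0:
        return toy_value(stones)  # the value estimate replaces the subtree below
    best = -1.0
    for move, _prior in toy_policy(stones)[:top_k]:
        best = max(best, -search(stones - move, depth - 1, top_k, counter))
    return best


def run(stones: int, depth: int, top_k: int) -> Tuple[float, int]:
    counter = [0]
    value = search(stones, depth, top_k, counter)
    return value, counter[0]


if __name__ == "__main__":
    print("full search   :", run(10, depth=10, top_k=3))
    print("reduced search:", run(10, depth=2, top_k=1))
```

In this toy setting both stand-ins happen to be exact, so the reduced search agrees with the full one while visiting far fewer nodes. With learned networks the estimates are only approximate, which is why the quality of the policy and value networks matters so much.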
