News from the depths of CS188.1x Artificial Intelligence, or final impressions

Prologue

Hello again!
Since the first part was received favorably, I decided to write about all my impressions after the completion of the course.

Previously: I decided to learn Python. After Lutz and Nick Parlante, I signed up for a fundamental CS course (unfortunately, not always in Python style) and for the easy course “Python for the youngest” (already finished). Somewhere in between I got involved in CS188.1x AI, reasoning that if I am going to practice Python, it might as well be on serious things.
In the previous review I only had time to cover the first 2 weeks of the course (about 30%). On November 19 the hard deadline for the final exam passed, and now I want to sum up.

We continue to get acquainted with the subtleties of AI life

After the unsuccessful implementation of project 1, a layer of material on Constraint Satisfaction Problems (CSP) awaited the students. In short, this is about how to technically solve problems with a known set of constraints. The most typical example is drawing up a university timetable that takes into account the availability of professors and lecture rooms, matches up the free time slots, and so on. It all seems intuitively clear (though not always easy), but there are all sorts of techniques that speed up the traversal of the state graph. Those interested can play with this interactive HTML page (there you need to assign colors so that neighboring elements never share a color; it clearly shows how the search for a solution differs without optimizations and with certain optimizations of the graph traversal). In general, this topic was not difficult.
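To make the coloring example more concrete, here is a minimal sketch of backtracking search with two of the usual speed-ups: forward checking and picking the most constrained variable first. The map, the region names, and the function names are my own assumptions for illustration, not the course's code.

```python
# Hypothetical toy map: each region lists its neighbours (not from the course).
NEIGHBORS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
COLORS = ["red", "green", "blue"]


def solve(assignment, domains):
    """Backtracking search with forward checking and the MRV heuristic."""
    if len(assignment) == len(NEIGHBORS):
        return assignment
    # MRV: choose the unassigned region with the fewest colors left.
    var = min((v for v in NEIGHBORS if v not in assignment),
              key=lambda v: len(domains[v]))
    for color in domains[var]:
        # Forward checking: remove `color` from every unassigned neighbour.
        new_domains = {v: list(d) for v, d in domains.items()}
        new_domains[var] = [color]
        dead_end = False
        for n in NEIGHBORS[var]:
            if n not in assignment:
                new_domains[n] = [c for c in new_domains[n] if c != color]
                if not new_domains[n]:
                    dead_end = True  # a neighbour has no color left
                    break
        if not dead_end:
            result = solve({**assignment, var: color}, new_domains)
            if result is not None:
                return result
    return None  # no color works here, backtrack


def color_map():
    """Return a dict region -> color, or None if no coloring exists."""
    return solve({}, {v: list(COLORS) for v in NEIGHBORS})
```

Without forward checking the search only notices a conflict when it reaches the conflicting region; with it, a branch is dropped as soon as some neighbour runs out of colors.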

Next came Game Trees and Decision Theory. Here the professor introduced solving game problems in the presence of an opposing enemy (in Pacman's case, these were ghosts). In principle it is the same search, but on a game tree that takes into account all sorts of moves by the opponent, both optimal and not. Alpha-Beta Pruning stands apart as a way to significantly reduce the time needed to traverse a large tree by cutting off individual branches that are guaranteed to be unpromising.
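A small sketch of what the pruning buys, under my own assumptions: the game tree is a nested list, leaves are numbers, and the players alternate. This is not the project code, just the idea on a toy tree.

```python
import math


def minimax(node, maximizing):
    """Plain minimax: visits every leaf of the tree."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax with alpha-beta pruning; `visited` collects evaluated leaves."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizer above will never allow this branch
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizer above already has something better
    return best


# Hypothetical tree: the root is our move, each inner list is the opponent's reply.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

On this tree both functions return 3, but `alphabeta` evaluates only 7 of the 9 leaves: after seeing the 2 in the second branch it already knows that branch cannot beat 3.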

Project 2 said to me: "My friend, bring your Python over here again." Once again a sharp, though not unexpected, transition from words to code. The same Pacman World remained, only enemy agents were added.

First we had to implement ReflexAgent, which plans nothing and acts solely on the current game situation. Next came MinimaxAgent, which assumes the enemy acts optimally and looks only a short way into the future (it works slowly). Then you clearly see how the time spent looking a few steps ahead drops by an order of magnitude with Alpha-Beta. ExpectimaxAgent assumes the opponent may act foolishly, which sometimes lets you emerge victorious from seemingly fatal game situations. And for dessert: “Your extreme ghost-hunting, pellet-nabbing, food-gobbling, unstoppable evaluation function”. I got 0/6 for it, because I wrote the logic and did not have time to debug it. Conclusion: do not sit down to the project right before the deadline if it falls on Sunday night and you have work on Monday.
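The difference between the minimax and expectimax mindsets fits in a few lines. This is a toy sketch under my own assumptions (nested lists as the tree, a ghost that picks moves uniformly at random), not the agents from the project.

```python
def minimax(node, agent="max"):
    """Assume the ghost always picks the move that is worst for us."""
    if not isinstance(node, list):
        return node
    if agent == "max":
        return max(minimax(child, "min") for child in node)
    return min(minimax(child, "max") for child in node)


def expectimax(node, agent="max"):
    """Assume the ghost picks uniformly at random (an assumption of this sketch)."""
    if not isinstance(node, list):
        return node
    if agent == "max":
        return max(expectimax(child, "chance") for child in node)
    values = [expectimax(child, "max") for child in node]
    return sum(values) / len(values)


# Hypothetical situation: the first move is risky (-100 or +100),
# the second one is a safe but certain loss (-10 or -5).
TREE = [[-100, 100], [-10, -5]]
```

Here `minimax(TREE)` gives -10 (take the safe loss), while `expectimax(TREE)` gives 0.0 (take the gamble), which is exactly how an expectimax agent sometimes gets out of a seemingly fatal position.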

Teaching AI to learn, or the inimitable "Claw"

The last two weeks were devoted to Markov Decision Processes (MDP), a way of representing the world, and to Reinforcement Learning (RL), where we know nothing about the conditions of the world around us and have to learn them as we go. The key idea is rewards, positive or negative, for various actions.

Here, of course, the great and terrible Claw captured my mind from the very first meeting.

Funny and scary at the same time, he waves his paw around stupidly, but he needs to learn to walk in order to go to college :) I enclose a piece of video from the lecture, it will make things clearer. Looking ahead, I will say that in the last project I managed to train my own pet to walk.
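As an illustration of learning from rewards alone, here is a sketch of the tabular Q-learning update. The function name, the dictionary layout, and the default `alpha` and `gamma` values are my own assumptions, not taken from the course materials.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """Blend the old estimate with one observed sample.

    q            -- dict mapping (state, action) to an estimated value
    next_actions -- actions available in next_state (empty if terminal)
    alpha        -- learning rate, gamma -- discount factor
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    sample = reward + gamma * best_next
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * sample
    return q[(state, action)]


# Usage: the agent never sees the rules of the world, only
# (state, action, reward, next_state) samples as it moves around.
q_table = defaultdict(float)
q_update(q_table, "near_goal", "forward", 10, "goal", [])
q_update(q_table, "start", "forward", 0, "near_goal", ["forward"])
```

After these two samples the reward has already started to leak backwards: the value of moving forward from `"start"` is no longer zero, although no reward was received there directly.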

Off-topic and dreams

Just at this time a great article with great videos about a do-it-yourself hexapod ran on Habr. Now I am haunted by the idea of building one myself (an STM32F4DISCOVERY with a gyroscope and accelerometer is lying around, I would only need to add a Wi-Fi/Bluetooth dongle) and trying to teach it to walk. Just to see how a pseudo-living insect squirms. Oh, if only I could make myself ...

These two topics seemed quite difficult. Take, for example, the question of how to balance Exploration vs Exploitation: when to decide that you have learned enough and it is time to act on the knowledge gained.
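One common and very simple way to strike this balance is epsilon-greedy action selection. A minimal sketch, where the function name and the Q-value dictionary are assumptions of mine for illustration:

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from a dict mapping action -> estimated value.

    With probability `epsilon` explore (uniformly random action),
    otherwise exploit (the action with the highest current estimate).
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


# Hypothetical estimates for one state.
estimates = {"north": 1.0, "east": 3.0, "south": -2.0}
action = epsilon_greedy(estimates, epsilon=0.1)
```

With `epsilon=0` the agent never tries anything new, with `epsilon=1` it never uses what it has learned. A usual compromise is to start with a large `epsilon` and lower it as training goes on.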

In project 3 we took a short break from the maze with the yellow kolobok and solved the problems of the so-called GridWorld.

How do you decide where to go optimally when you press "north" and, with a certain probability, step "east" instead? How should you behave while exploring this world? What is good, what is bad? The implemented algorithms turned out to work for the Claw as well. A little more polishing, and Pacman is ready to learn too. It was very interesting to come back and compare how our Pacman killer behaves after being trained over several hundred games against ghosts. Here is the difference in approaches: write ExpectiMax Search to solve the game in the maze, or teach Pacman to learn how to win.
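For the "press north, sometimes step east" question, the planning side can be sketched with value iteration. The grid layout, the noise level of 0.2, the discount of 0.9, and all names below are my assumptions for the example, not the project's actual code.

```python
# Hypothetical 3x4 grid: '#' is a wall, integers are terminal rewards.
GRID = [
    [" ", " ", " ", +1],
    [" ", "#", " ", -1],
    [" ", " ", " ", " "],
]
MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}
SIDEWAYS = {"north": ("east", "west"), "south": ("east", "west"),
            "east": ("north", "south"), "west": ("north", "south")}
NOISE, GAMMA = 0.2, 0.9  # assumed slip probability and discount


def step(state, direction):
    """Move one cell; bumping into a wall or the border means staying put."""
    r, c = state[0] + MOVES[direction][0], state[1] + MOVES[direction][1]
    if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#":
        return (r, c)
    return state


def transitions(state, action):
    """Intended move with prob. 1 - NOISE, each sideways slip with NOISE / 2."""
    left, right = SIDEWAYS[action]
    return [(1 - NOISE, step(state, action)),
            (NOISE / 2, step(state, left)),
            (NOISE / 2, step(state, right))]


def value_iteration(iterations=100):
    states = [(r, c) for r in range(len(GRID)) for c in range(len(GRID[0]))
              if GRID[r][c] != "#"]
    values = {s: 0.0 for s in states}
    for _ in range(iterations):
        new_values = {}
        for s in states:
            cell = GRID[s[0]][s[1]]
            if isinstance(cell, int):
                new_values[s] = float(cell)  # terminal cell: just its reward
            else:
                new_values[s] = max(
                    sum(p * GAMMA * values[s2] for p, s2 in transitions(s, a))
                    for a in MOVES)
        values = new_values
    return values


def best_action(values, state):
    """Greedy action with respect to the computed values."""
    return max(MOVES, key=lambda a: sum(p * values[s2]
                                        for p, s2 in transitions(state, a)))
```

The values are computed with full knowledge of the slip probabilities. In the learning setting the agent does not have that table and has to estimate the same quantities from experience.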

Finish line

A week was given for finishing project 3 and for training oneself on the Final Exam Practice, whose marks did not count. A week was also allotted for the final exam: you could choose a 48-hour window and calmly answer within it. This was both a plus and a relaxing minus. Another cruel discovery was the single attempt allowed for most of the questions (approximately 40 out of 54, the rest were given two attempts). For True / False questions this is justified, but for some others with 4-6 checkboxes it was stressful. With two attempts it was easier to answer. The questions covered the course quite thoroughly. The lecturers estimated the exam at 2-5 hours; I spent about 1 + 4 hours over 2 days (I answered slowly and in places rewatched lectures). I could not pass the exam without errors, and my final result was 176/200.

In place of a conclusion, or summing up my impressions

Very interesting! There is no whiff of Python in the lectures and quizzes, but it is strongly present in the specialized Pacman projects. You get a full sense of using the language as a tool for solving specific problems, which I think is very good for studying. The lecturer's presentation of the material is excellent, and for me personally the lectures contained enough information to solve the problems, although some additional materials are listed in the course wiki. The forum was very useful several times. The technical side is up to par: everything is convenient, the buttons, the checkboxes, the answers. It should be noted that on Sunday evenings before a deadline the auto-grader slows down noticeably (checking code takes up to 20-30 minutes). The forum also lags if you view it directly below the video / question section, but here I blame my old netbook.

Unfortunately, having played with youtube-dl, I did not figure out how to download the lectures with the edX subtitles rather than YouTube's auto-generated ones. If anyone has solved this, please write.

I spent 4 hours per week on lectures and homework, and 10 hours (maybe more, it is difficult to count) on each project (once every 2 weeks). I did not keep my own notes, and now I regret it a little.

A special pleasure came from Easter eggs from the CS188x team such as:

if 0 == 1: print 'We are in a world of arithmetic pain'

As a bonus, I collected a few links (with timestamps) to videos from the lectures that touch on various real-world applications of Robo-AI: aibo soccer, google car, shirt shirt holder, terminator, aibo learning to walk, humanoid robot learning to walk. For students of the course, the veil of secrecy over how all this is actually programmed has lifted a little.
I am enrolling in the 2nd part of the course in any case!

See you at Professor Klein's.

Source: https://habr.com/ru/post/159433/
