An MIT robot is learning to play Jenga, relying on vision and touch.
A specialized approach to machine learning could help robots learn to assemble phones and handle other small parts on an assembly line.
In the basement of MIT's Building 3, a robot carefully contemplates its next move. It gently pokes at a tower of blocks, looking for the best block to pull out without toppling the whole tower. This is its solitary, slow, yet surprisingly agile game of Jenga.
The robot, designed by MIT engineers, is equipped with a soft-pronged gripper, a force-sensing wrist cuff, and an external camera, all of which it uses to see and feel both the tower as a whole and its individual blocks. As the robot gently pushes against a block, a computer takes in visual and tactile feedback from the camera and the cuff and compares these measurements with those from previous moves. It also considers the likely outcomes of those moves: specifically, whether a given block can be extracted successfully, given the current configuration of the tower and the amount of force applied. Then, in real time, the robot “learns” whether to keep pushing the block or to switch to a different one in order to keep the tower from falling.
A detailed description of the Jenga-playing robot was published in January in the journal Science Robotics. Alberto Rodriguez, the Walter Henry Gale Career Development Assistant Professor in MIT's Department of Mechanical Engineering, says the robot demonstrates something that was difficult to achieve in previous systems: the ability to quickly learn the best way to carry out a task not only from visual data, which is commonly used in robotics, but also from tactile, physical interaction.
“Unlike more purely cognitive tasks or games such as chess or Go, playing Jenga also requires good physical skills: probing, pulling, placing, and aligning blocks. It requires interactive perception and manipulation, where you have to touch the tower to understand how and when to move blocks,” says Rodriguez. “This is very difficult to simulate, so the robot has to learn in the real world, by interacting with a real Jenga tower. The main challenge is to learn from a relatively small number of experiments by exploiting common sense about objects and physics.”
He says the tactile learning system the team developed can be applied to tasks beyond Jenga, especially those that require careful physical interaction, such as sorting recyclable waste or assembling consumer products.
“On a phone assembly line, almost every step requires the feeling that a part has snapped into place or that a screw has been tightened, and that comes from force and touch rather than from vision,” says Rodriguez. “Learning models for such actions is prime territory for this kind of technology.”
The lead author of the paper is MIT graduate student Nima Fazeli. The team also includes Miquel Oller, Jiajun Wu, Zheng Wu, and Joshua Tenenbaum, professor of brain and cognitive sciences at MIT.
Push and pull
In Jenga, which means “build” in Swahili, 54 rectangular blocks are stacked in 18 layers of three blocks each, with the blocks in each layer laid perpendicular to those in the layer below. The goal of the game is to carefully remove a block and place it on top of the tower, building a new level without toppling the tower.
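The layout described above can be written down as a small data structure. The following Python sketch is only an illustration: the names `Block` and `build_tower`, and the use of "x" and "y" to mark the alternating orientation of the layers, are assumptions made here and do not come from the researchers' software.

```python
from dataclasses import dataclass

LAYERS = 18            # layers in a freshly stacked tower
BLOCKS_PER_LAYER = 3   # blocks laid side by side in each layer


@dataclass(frozen=True)
class Block:
    layer: int  # 0 is the bottom layer
    slot: int   # position within the layer: 0, 1 or 2
    axis: str   # direction of the block's long side: "x" or "y"


def build_tower():
    """Return the blocks of a full tower, bottom layer first."""
    tower = []
    for layer in range(LAYERS):
        # Adjacent layers are perpendicular, so the axis alternates.
        axis = "x" if layer % 2 == 0 else "y"
        for slot in range(BLOCKS_PER_LAYER):
            tower.append(Block(layer, slot, axis))
    return tower


tower = build_tower()
print(len(tower))  # total number of blocks
```

Any block in this list is a candidate for a push; deciding which one to try is the part the robot has to learn.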
To program a robot to play Jenga, traditional machine learning (ML) schemes would need to capture everything that could possibly happen between a block, the robot, and the tower. That is a computationally expensive task requiring data from thousands, if not tens of thousands, of block-extraction attempts.
Instead, Rodriguez and his colleagues looked for a more data-efficient way for a robot to learn to play Jenga, inspired by human cognition and by how we ourselves might approach the game.
The team adapted an industry-standard ABB IRB 120 robotic arm for the task, set up a Jenga tower within the arm's reach, and began a training period. At first, the robot chose a random block and a spot on that block to push against. It then applied a small amount of force, trying to push the block out of the tower.
During each attempt, the computer recorded visual and tactile measurements associated with it, and noted whether it ended in success.
Rather than carrying out tens of thousands of such attempts (which would mean rebuilding the tower almost as many times), the robot trained on only about 300. Attempts with similar measurements and outcomes were grouped into clusters, each representing a certain kind of block behavior. For example, one cluster might represent attempts on a block that resisted being moved, another attempts on a block that moved easily, and a third attempts that brought the tower down. For each cluster, the robot developed a simple model that predicts a block's behavior from its current visual and tactile measurements.
Fazeli says this clustering technique dramatically increases the efficiency with which the robot learns the game, and that it was inspired by the natural way in which people group similar behavior of objects. “The robot builds clusters and then learns a model for each of these clusters, instead of learning a single model that captures absolutely everything that could happen.”
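The idea of grouping attempts and then fitting one simple model per group can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' method: plain k-means stands in for their clustering, the "model" for each cluster is simply its observed extraction rate, and the force and displacement numbers are made up for the example.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: dist2(point, centers[i]))


def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centers))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            centroid(g) if g else centers[i] for i, g in enumerate(groups)
        ]
    return centers


# Made-up attempts: ((push force, block displacement), block extracted?)
attempts = [
    ((0.20, 4.0), True), ((0.30, 3.5), True), ((0.25, 4.2), True),     # moves easily
    ((1.50, 0.2), False), ((1.70, 0.1), False), ((1.60, 0.3), False),  # resists
    ((2.50, 9.0), False), ((2.40, 8.5), False), ((2.60, 9.5), False),  # tower fell
]

centers = kmeans([features for features, _ in attempts], k=3)

# One simple "model" per cluster: the fraction of attempts that succeeded.
outcomes = [[] for _ in centers]
for features, extracted in attempts:
    outcomes[nearest(features, centers)].append(extracted)
success_rate = [sum(o) / len(o) if o else 0.0 for o in outcomes]


def predict_success(measurement):
    """Predicted extraction rate for a new measurement, via its cluster."""
    return success_rate[nearest(measurement, centers)]


print(predict_success((0.28, 3.8)))   # resembles the easily moving blocks
print(predict_success((1.55, 0.25)))  # resembles the resisting blocks
```

Because each cluster only has to describe one kind of behavior, its model can stay very simple, which is one way to read the claim that this approach needs far fewer attempts than a single model covering every case.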
Stacking up
The researchers tested their approach against other state-of-the-art ML algorithms in a computer simulation of the game built with the MuJoCo simulator. The data obtained in simulation help the researchers understand how a robot would learn in the real world.
“We give these algorithms the same data that our system receives, to see how they learn to play Jenga at a similar level,” says Oller. “Compared with our approach, these algorithms needed to play through several orders of magnitude more towers to master the game.”
The team wondered whether their ML approach could compete with human players, and held several informal trials with volunteers.
“We looked at how many blocks a person could extract from the tower before it fell, and the difference was not that big,” Oller says.
However, there is still some way to go before the robot could truly be pitted against a human player, should the researchers want that. Besides physical interaction, Jenga requires strategy, such as extracting just the right block so that the opponent will find it harder to pull out the next one without toppling the tower.
For now, the team is less interested in building a robot that wins at Jenga than in applying the robot's new skills to other areas.
“There are many tasks that we do with our hands where the feeling of doing it the right way comes in the language of forces and tactile cues,” says Rodriguez. “For tasks like these, an approach similar to ours could be useful.”