Using human and animal motions to teach robots to dribble a ball, and simulated humanoid characters to carry boxes and play soccer.
Five years ago, we took on the challenge of teaching a fully articulated humanoid character to navigate obstacle courses. This demonstrated what reinforcement learning (RL) can achieve through trial and error, but also highlighted two challenges in solving embodied intelligence:
- Reusing previously learned behaviors: A significant amount of data was required for the agent to get started. Without any initial knowledge of what force to apply to each of its joints, the agent began with random body twitches and quickly fell to the ground. This problem could be mitigated by reusing previously learned behaviors.
- Idiosyncratic behaviors: When the agent finally learned to navigate obstacle courses, it did so with unnatural (albeit amusing) movement patterns that would be impractical for applications such as robotics.
Here, we describe a solution to both challenges called neural probabilistic motor primitives (NPMP), which involves guided learning with movement patterns derived from humans and animals, and discuss how this approach is used in our Humanoid Football paper, published today in Science Robotics.
We also discuss how this same approach enables full-body humanoid manipulation from vision, such as a humanoid carrying an object, and real-world robotic control, such as a robot dribbling a ball.
Distilling data into controllable motor primitives using NPMP
An NPMP is a general-purpose motor control module that translates short-horizon motor intentions into low-level control signals. It is trained offline or via RL by imitating motion capture (MoCap) data, recorded with trackers on humans or animals performing movements of interest.
The model has two parts:
- An encoder that takes a future trajectory and compresses it into a motor intention.
- A low-level controller that produces the next action given the agent’s current state and this motor intention.
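The two parts listed above can be sketched in a few lines of Python. This is a minimal illustration, not the architecture from the paper: the class names (`Encoder`, `LowLevelController`), the dimensions, the single linear layers, and the fixed noise level are all assumptions, and the weights are random rather than trained on MoCap data.

```python
import math
import random


def random_matrix(rng, rows, cols, scale=0.1):
    """Small random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


class Encoder:
    """Compresses a short future trajectory into a motor intention."""

    def __init__(self, rng, state_dim, horizon, intention_dim, noise_std=0.1):
        self.rng = rng
        self.noise_std = noise_std  # assumed fixed here for simplicity
        self.weights = random_matrix(rng, intention_dim, state_dim * horizon)

    def __call__(self, future_states):
        flat = [value for state in future_states for value in state]
        mean = linear(self.weights, flat)
        # Sample around the mean, so the intention is probabilistic.
        return [m + self.rng.gauss(0.0, self.noise_std) for m in mean]


class LowLevelController:
    """Produces the next action from the current state and a motor intention."""

    def __init__(self, rng, state_dim, intention_dim, action_dim):
        self.weights = random_matrix(rng, action_dim, state_dim + intention_dim)

    def __call__(self, state, intention):
        # tanh keeps every action component inside [-1, 1].
        return [math.tanh(a) for a in linear(self.weights, state + intention)]


# Hypothetical sizes chosen only for illustration.
STATE_DIM, HORIZON, INTENTION_DIM, ACTION_DIM = 6, 5, 3, 4
rng = random.Random(0)
encoder = Encoder(rng, STATE_DIM, HORIZON, INTENTION_DIM)
controller = LowLevelController(rng, STATE_DIM, INTENTION_DIM, ACTION_DIM)

current_state = [rng.uniform(-1.0, 1.0) for _ in range(STATE_DIM)]
future_states = [[rng.uniform(-1.0, 1.0) for _ in range(STATE_DIM)]
                 for _ in range(HORIZON)]

intention = encoder(future_states)             # future trajectory -> intention
action = controller(current_state, intention)  # (state, intention) -> action
```

In an imitation setup, the resulting action would be compared with the movement recorded in the MoCap data and both parts would be updated together; that training step is omitted from this sketch.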
After training, the low-level controller can be reused to learn new tasks, where a high-level controller is optimized to produce motor intentions directly. This enables efficient exploration – since consistent behaviors are produced, even with randomly sampled motor intentions – and constrains the final solution.
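To illustrate this reuse step, here is a toy sketch in which a frozen low-level controller is driven by a high-level controller that outputs only a motor intention. Everything in it is a stand-in chosen for illustration: the one-dimensional point-mass task, the fixed controller weights, the single `gain` parameter, and the hill-climbing search used in place of RL.

```python
import math
import random


def low_level_controller(velocity, intention):
    """Frozen stand-in for a pretrained low-level controller (fixed weights)."""
    return math.tanh(1.5 * intention - 0.5 * velocity)


def rollout(gain, target=1.0, steps=50, dt=0.1):
    """Run one episode on a 1-D point mass and return the episode return."""
    position, velocity = 0.0, 0.0
    for _ in range(steps):
        # High-level controller: chooses a motor intention, not a force.
        intention = gain * (target - position)
        # Low-level controller: turns the intention into a force. Never updated.
        force = low_level_controller(velocity, intention)
        velocity += dt * force
        position += dt * velocity
    # Return is the negative distance to the target at the end of the episode.
    return -abs(target - position)


def train_high_level(iterations=200, seed=0):
    """Hill-climb the high-level gain; only this parameter is optimized."""
    rng = random.Random(seed)
    best_gain = 0.0
    best_return = rollout(best_gain)
    for _ in range(iterations):
        candidate = best_gain + rng.gauss(0.0, 0.5)
        candidate_return = rollout(candidate)
        if candidate_return > best_return:
            best_gain, best_return = candidate, candidate_return
    return best_gain, best_return


untrained_return = rollout(0.0)
best_gain, best_return = train_high_level()
print(f"untrained return: {untrained_return:.3f}")
print(f"trained gain: {best_gain:.3f}, trained return: {best_return:.3f}")
```

The point of the design is that the search space is the space of intentions rather than raw joint forces, so the low-level controller shapes every candidate behavior the search tries.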
Emergent team coordination in humanoid football
Soccer has been a long-standing challenge for embodied intelligence research, requiring individual skills and coordinated team play. In our latest work, we used an NPMP as a prior to guide the learning of motor skills.
The result was a team of players that progressed from learning ball-chasing skills to learning to coordinate. Previously, in a study with simple embodiments, we showed that coordinated behaviors can emerge in teams competing with each other. The NPMP allowed us to observe a similar effect, but in a scenario that required much more advanced motor control.
Our agents learned skills such as agile locomotion, passing and division of labor, as evidenced by a range of statistics, including metrics used in real-world sports analytics. Players exhibit both agile high-frequency motor control and long-range decision-making that involves anticipating the behaviors of their teammates, leading to coordinated team play.
Whole-body manipulation and cognitive tasks using vision
Learning to interact with objects using the arms is another difficult control challenge. The NPMP can also enable this type of whole-body manipulation. With a small amount of MoCap data of interactions with boxes, we are able to train an agent to carry a box from one place to another, using egocentric vision and only a sparse reward signal.
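For readers unfamiliar with the term, a sparse reward gives the agent no feedback until the task is essentially complete. The sketch below shows one possible form for the box-carrying task; the function name, the 3-D position format, and the `tolerance` value are assumptions for illustration and are not taken from the paper.

```python
import math


def sparse_carry_reward(box_position, target_position, tolerance=0.25):
    """Return 1.0 only when the box is within `tolerance` of the target."""
    distance = math.dist(box_position, target_position)
    return 1.0 if distance <= tolerance else 0.0


# Hypothetical positions (x, y, z) in metres.
target = (2.0, 0.0, 0.5)
reward_far = sparse_carry_reward((0.0, 0.0, 0.5), target)   # box not yet moved
reward_near = sparse_carry_reward((1.9, 0.1, 0.5), target)  # box at the target
```

Because almost every step returns zero, an agent exploring with random joint forces would rarely see any signal, which is why a movement prior is helpful here.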
Similarly, we can teach the agent to catch and throw balls.
Using the NPMP, we can also tackle maze tasks involving locomotion, perception, and memory.
Safe and efficient control of real-world robots
The NPMP can also help control real robots. Well-regularized behavior is essential for activities like walking on uneven ground or handling fragile objects. Jerky movements can damage the robot itself or its surroundings, or at least drain its battery. Therefore, significant effort is often invested in designing learning objectives that make a robot do what we want while behaving safely and efficiently.
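As a rough illustration of the kind of hand-designed objective described above, the sketch below subtracts smoothness and energy penalties from a task reward. The function name, the two penalty terms, and the weights are hypothetical choices for illustration, not the objectives used in any particular system.

```python
def regularized_reward(task_reward, action, previous_action,
                       smoothness_weight=0.1, energy_weight=0.01):
    """Task reward minus penalties for jerky and high-effort actions."""
    # Penalize large changes between consecutive actions (jerkiness).
    smoothness_penalty = sum((a - p) ** 2
                             for a, p in zip(action, previous_action))
    # Penalize large action magnitudes as a crude proxy for energy use.
    energy_penalty = sum(a ** 2 for a in action)
    return (task_reward
            - smoothness_weight * smoothness_penalty
            - energy_weight * energy_penalty)


# Same task reward, but the jerky action sequence scores lower.
smooth = regularized_reward(1.0, [0.2, 0.2], [0.2, 0.2])
jerky = regularized_reward(1.0, [1.0, -1.0], [-1.0, 1.0])
```

Each weight has to be tuned by hand for every robot and task, which is the effort referred to above.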
As an alternative, we investigated whether using priors derived from biological motion can give us well-regularized, natural-looking, and reusable movement skills for legged robots, such as walking, running, and turning, that are suitable for deployment on real-world robots.
Starting from MoCap data of humans and dogs, we adapted the NPMP approach to train skills and controllers in simulation that can then be deployed on real humanoid (OP3) and quadruped (ANYmal B) robots, respectively. This allowed robots to be directed by a user via a joystick or to dribble a ball to a target location in a natural and robust way.
Advantages of using neural probabilistic motor primitives
In summary, we used the NPMP skill model to learn complex tasks with humanoid characters in simulation and with real-world robots. The NPMP packages low-level motor skills in a reusable way, making it easier to learn useful behaviors that would be difficult to discover through unstructured trial and error. By using motion capture as a source of prior information, it biases the learning of motor control toward naturalistic movements.
The NPMP allows embodied agents to learn faster using RL; to learn more naturalistic behaviors; to learn safer, more efficient, and more stable behaviors suited to real-world robotics; and to combine whole-body motor control with longer-horizon cognitive skills, such as teamwork and coordination.
Learn more about our work: