Module 11: Learning-Based Control
Introduction
Learning enables robots to acquire skills that are difficult to program explicitly. This module covers reinforcement learning, imitation learning, and their application to robotic control.
Section 1: Reinforcement Learning Foundations
1.1 Markov Decision Processes
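Markov Decision Process (MDP): A mathematical framework for sequential decision-making. It is defined by a set of states S, a set of actions A, transition probabilities P(s' | s, a), a reward function R(s, a), and usually a discount factor γ that down-weights future rewards.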
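To make the tuple concrete, here is a minimal sketch of value iteration on a made-up MDP with two states and two actions. The transition probabilities, rewards, and discount γ = 0.9 are illustrative assumptions, not values from this module. Value iteration repeatedly applies the Bellman optimality update V(s) ← max_a [R(s, a) + γ Σ_s' P(s' | s, a) V(s')] until the values stop changing. It then reads off a greedy policy.

```python
import numpy as np

# Toy MDP (illustrative numbers): 2 states, 2 actions ("stay" = 0, "move" = 1).
# P[a, s, s'] = probability of landing in s' after taking action a in state s.
P = np.array([
    [[0.9, 0.1],   # stay, from state 0
     [0.1, 0.9]],  # stay, from state 1
    [[0.2, 0.8],   # move, from state 0
     [0.8, 0.2]],  # move, from state 1
])
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([
    [0.0, 0.0],    # state 0 pays nothing
    [1.0, 0.5],    # state 1: staying pays more than leaving
])
gamma = 0.9


def value_iteration(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Return the optimal state values V* and the greedy policy w.r.t. them."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmax(axis=1)
    return V, policy


V, policy = value_iteration(P, R, gamma)
print("V* =", V, "policy =", policy)
```

In this toy problem the greedy policy moves out of state 0 and stays in the rewarding state 1.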
1.2 Policy Gradient Methods
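import torch

def policy_gradient_update(policy, optimizer, trajectories):
    # Assumes each trajectory exposes .states, .actions, .rewards (hypothetical names)
    # and that policy.log_prob(s, a) returns a differentiable torch scalar.
    loss = torch.zeros(())
    for traj in trajectories:
        returns = compute_returns(traj.rewards)  # return G_t for each time step
        for s, a, R in zip(traj.states, traj.actions, returns):
            log_prob = policy.log_prob(s, a)
            loss = loss - log_prob * R  # REINFORCE: descend -log pi(a|s) * G_t
    loss = loss / len(trajectories)  # average over the batch of trajectories
    optimizer.zero_grad()  # clear gradients left over from the previous update
    loss.backward()
    optimizer.step()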
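The update above calls `compute_returns`, which the module does not define. One common choice, assumed here, is the discounted reward-to-go G_t = Σ_{k≥t} γ^(k−t) r_k. It can be computed in a single backward pass. The default γ = 0.99 is also an assumption.

```python
from typing import List


def compute_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted reward-to-go: G_t = r_t + gamma * G_{t+1}, with G_T = 0."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the already-computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(compute_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

Weighting each log-probability by the reward-to-go, instead of by the whole episode's return, keeps the gradient estimate unbiased and lowers its variance. This works because an action cannot influence rewards that were received before it was taken.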
Section 2: Deep RL Algorithms
2.1 PPO (Proximal Policy Optimization)
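Stabilizes policy updates by clipping the ratio between the new and old policies' action probabilities. A single update therefore cannot move the policy far from the policy that collected the data.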
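The standard PPO clipped surrogate objective is L_CLIP = E_t[min(ρ_t Â_t, clip(ρ_t, 1 − ε, 1 + ε) Â_t)]. Here ρ_t = π_θ(a_t | s_t) / π_θ_old(a_t | s_t) is the probability ratio and Â_t is an advantage estimate. The sketch below only evaluates the objective with NumPy. In training, the same expression would be written in an autodiff framework so that gradients flow through the new log-probabilities. The value ε = 0.2 is a commonly used default and is assumed here.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over a batch (to be minimized).

    logp_new:   log pi_theta(a_t | s_t) under the policy being optimized
    logp_old:   log pi_theta_old(a_t | s_t) under the data-collecting policy
    advantages: advantage estimates A_t (e.g. from GAE)
    """
    ratio = np.exp(logp_new - logp_old)  # rho_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise min gives a pessimistic bound: no extra credit
    # for pushing rho_t outside [1 - eps, 1 + eps].
    return -np.mean(np.minimum(unclipped, clipped))


# A positive-advantage action whose probability already rose by 50%
# only earns credit up to ratio 1.2 when clip_eps = 0.2.
loss = ppo_clip_loss(np.log([1.5]), np.log([1.0]), np.array([2.0]))
print(loss)  # approximately -2.4
```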
2.2 SAC (Soft Actor-Critic)
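An off-policy actor-critic method based on maximum-entropy RL. The policy maximizes expected return plus an entropy bonus weighted by a temperature α, which encourages exploration.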
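In SAC the entropy bonus enters through the critic's target. A common formulation, assumed here, uses two target critics: y = r + γ(1 − d)(min(Q1'(s', a'), Q2'(s', a')) − α log π(a' | s')). In this formula a' is sampled from the current policy at the next state s', and d marks episode termination. The sketch only computes this target from arrays. The critic values below are made-up numbers, not outputs of real networks.

```python
import numpy as np


def sac_critic_target(rewards, dones, q1_next, q2_next, logp_next,
                      gamma=0.99, alpha=0.2):
    """Soft Bellman target for SAC critics (batched NumPy arrays).

    q1_next, q2_next: target-critic values Q'(s', a') with a' ~ pi(. | s')
    logp_next:        log pi(a' | s') for those same sampled actions
    dones:            1.0 where s' is terminal, else 0.0
    """
    # Clipped double-Q: take the smaller estimate to curb overestimation.
    min_q = np.minimum(q1_next, q2_next)
    # Subtracting alpha * log pi adds an entropy bonus, since -log pi
    # is large for low-probability (exploratory) actions.
    soft_value = min_q - alpha * logp_next
    return rewards + gamma * (1.0 - dones) * soft_value


y = sac_critic_target(
    rewards=np.array([1.0, 0.0]),
    dones=np.array([0.0, 1.0]),
    q1_next=np.array([10.0, 5.0]),
    q2_next=np.array([9.0, 6.0]),
    logp_next=np.array([-1.0, -2.0]),
    gamma=0.9, alpha=0.5,
)
print(y)
```

Raising α puts more weight on the entropy bonus and so favors more exploratory behavior. Setting α = 0 recovers an ordinary (non-entropy) clipped double-Q target.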
Deep RL algorithms such as these typically need a large number of environment interactions. Policies are therefore usually trained in simulation before they are deployed on real hardware.
Section 3: Imitation Learning
3.1 Behavioral Cloning
Supervised learning from demonstrations, regressing the expert's actions on the demonstrated states:
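import torch
import torch.nn.functional as F

def behavioral_cloning(demonstrations, epochs=100, lr=1e-3):
    # PolicyNetwork is assumed to be a torch.nn.Module mapping a state tensor to an
    # action tensor; the epochs and lr defaults are illustrative, not prescribed.
    model = PolicyNetwork()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for state, action in demonstrations:
            pred_action = model(state)
            loss = F.mse_loss(pred_action, action)  # regress onto the expert's action
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model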
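For a linear policy a = K s and a squared-error loss, behavioral cloning has a closed-form solution: ordinary least squares on the demonstration pairs. The sketch below assumes a synthetic linear "expert" with a made-up gain matrix K_expert and recovers that gain with `np.linalg.lstsq`. It skips gradient descent but minimizes the same MSE objective as the loop above, restricted to linear policies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert: a linear state-feedback controller a = K_expert @ s.
K_expert = np.array([[-1.0, -0.5, 0.0],
                     [0.0, 0.2, -2.0]])  # 2-D action, 3-D state

# Demonstrations: N states visited by the expert and the actions it took.
N = 200
states = rng.normal(size=(N, 3))
actions = states @ K_expert.T + 0.01 * rng.normal(size=(N, 2))  # small label noise

# Behavioral cloning with a linear policy is least squares:
# minimize sum_i ||K s_i - a_i||^2, i.e. solve states @ K.T ~= actions.
K_T, *_ = np.linalg.lstsq(states, actions, rcond=None)
K_bc = K_T.T

print(np.round(K_bc, 2))
```

The fit only matches the expert on states that appear in the dataset. Small errors at execution time can push the learner into states the expert never visited, which is the distribution shift that DAgger targets.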
3.2 DAgger
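Dataset Aggregation (DAgger) corrects the distribution shift of behavioral cloning, where small errors lead the learner into states the expert never demonstrated. It iterates four steps: roll out the current learned policy, have the expert label the visited states with the correct actions, add these pairs to the dataset, and retrain.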
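The loop can be sketched on a toy 1-D system x' = x + u + noise. The expert u = −0.8x, the noise level, and the linear learner u = kx are all hypothetical choices made for illustration. Because this expert is itself linear, plain behavioral cloning would already succeed here. The point is the structure of the loop, not a demonstration of distribution shift. This sketch also uses the simplest variant, which executes only the learner after round 0. The original algorithm can additionally mix expert and learner actions during rollouts.

```python
import numpy as np

rng = np.random.default_rng(0)


def expert(x):
    """Hypothetical expert for the toy system x' = x + u: push x toward 0."""
    return -0.8 * x


def rollout(policy, x0, horizon=20, noise=0.05):
    """Simulate x_{t+1} = x_t + policy(x_t) + noise and return the visited states."""
    xs, x = [], x0
    for _ in range(horizon):
        xs.append(x)
        x = x + policy(x) + noise * rng.normal()
    return np.array(xs)


def fit_gain(states, actions):
    """Least-squares fit of a linear learner u = k * x."""
    return float(states @ actions / (states @ states))


# Round 0: plain behavioral cloning on expert demonstrations.
data_x = np.concatenate([rollout(expert, rng.uniform(-2, 2)) for _ in range(5)])
data_u = expert(data_x)
k = fit_gain(data_x, data_u)

for _ in range(5):
    def learner(x, k=k):
        return k * x

    # 1. Run the current learner to collect the states *it* visits.
    new_x = np.concatenate([rollout(learner, rng.uniform(-2, 2)) for _ in range(5)])
    # 2. Query the expert for the correct action in each visited state.
    new_u = expert(new_x)
    # 3. Aggregate with all earlier data and retrain on the union.
    data_x = np.concatenate([data_x, new_x])
    data_u = np.concatenate([data_u, new_u])
    k = fit_gain(data_x, data_u)

print(f"learned gain k = {k:.3f} from {len(data_x)} labeled states")
# learned gain k = -0.800 from 600 labeled states
```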
Section 4: Reward Engineering
4.1 Reward Design
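Challenges:
- Sparse rewards: the agent rarely receives a learning signal, which makes exploration hard
- Dense (shaped) rewards: easier to learn from, but prone to reward hacking, where the agent maximizes the reward without achieving the intended behavior
- Multi-objective tradeoffs: competing terms such as task progress, energy use, and safety must be weighted against each other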
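One well-known way to add dense guidance without changing which policies are optimal is potential-based reward shaping. It adds F(s, s') = γΦ(s') − Φ(s) to the task reward for some potential function Φ over states. The sketch assumes a 2-D reaching task with a hypothetical goal position. It also assumes a sparse success reward (positions in meters, success within 5 cm) and the potential Φ(s) = −‖s − goal‖. All of these are illustrative choices.

```python
import numpy as np

GOAL = np.array([1.0, 0.0])  # hypothetical target position (meters)
GAMMA = 0.99


def sparse_reward(s_next):
    """Task reward: 1 only when within 5 cm of the goal, else 0."""
    return 1.0 if np.linalg.norm(s_next - GOAL) < 0.05 else 0.0


def potential(s):
    """Illustrative potential: higher (less negative) closer to the goal."""
    return -np.linalg.norm(s - GOAL)


def shaped_reward(s, s_next):
    """Sparse reward plus the potential-based term F = gamma * phi(s') - phi(s)."""
    return sparse_reward(s_next) + GAMMA * potential(s_next) - potential(s)


# A short straight-line trajectory toward the goal.
traj = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([0.98, 0.0])]
for s, s_next in zip(traj[:-1], traj[1:]):
    print(f"sparse={sparse_reward(s_next):.0f}  shaped={shaped_reward(s, s_next):.3f}")
```

Along any trajectory the discounted shaping terms telescope to γ^T Φ(s_T) − Φ(s_0). The shaping therefore shifts a policy's return by an amount that depends only on the start and end states (for an infinite horizon and bounded Φ, only on s_0). This is why the optimal policy is unchanged, even though every step now carries a learning signal.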
4.2 Learning from Preferences
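Instead of hand-designing a reward, fit a reward model to human judgments, typically comparisons of which of two trajectories is better. Then optimize a policy against the learned reward.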
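A common way to turn pairwise preferences into a reward model is the Bradley-Terry model. The probability that a person prefers segment a over segment b is modelled as sigmoid(R_a − R_b), where R_i is the summed predicted reward over segment i. The model is fit by minimizing the cross-entropy against the preference labels. In the sketch below, the linear reward r(s) = w · φ(s), the random features, and the synthetic "human" labels generated from a hidden w_true are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def segment_return(w, seg):
    """Predicted return of a segment: sum over steps of r(s) = w . phi(s)."""
    return seg.sum(axis=0) @ w


def bt_loss_and_grad(w, seg_a, seg_b, pref_a):
    """Bradley-Terry cross-entropy for one pair (pref_a = 1.0 if a preferred, else 0.0)."""
    d = segment_return(w, seg_a) - segment_return(w, seg_b)
    # Numerically stable forms of -log sigmoid(d) and -log(1 - sigmoid(d)).
    loss = pref_a * np.logaddexp(0.0, -d) + (1.0 - pref_a) * np.logaddexp(0.0, d)
    p_a = np.exp(-np.logaddexp(0.0, -d))  # P(a preferred) = sigmoid(d)
    grad = (p_a - pref_a) * (seg_a.sum(axis=0) - seg_b.sum(axis=0))
    return loss, grad


# Synthetic preference labels from a hidden reward w_true (illustration only).
w_true = np.array([1.0, -0.5])
pairs = []
for _ in range(50):
    a, b = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))  # 5 steps, 2 features
    pairs.append((a, b, float(segment_return(w_true, a) > segment_return(w_true, b))))

# Fit the reward weights with plain SGD on the preference loss.
w = np.zeros(2)
for _ in range(200):
    for a, b, pref in pairs:
        _, g = bt_loss_and_grad(w, a, b, pref)
        w -= 0.05 * g

cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print("learned w:", w, "cosine with w_true:", round(float(cos), 3))
```

Preferences only constrain differences in return. The learned reward is therefore identified at best up to scale and, for equal-length segments, a constant per-step offset. For that reason the check above compares directions rather than raw weights.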
Summary
Key takeaways:
- RL enables skill acquisition through trial and error
- PPO and SAC are practical algorithms for robotics
- Imitation learning leverages human expertise
- Reward design significantly impacts learning outcomes
Key Concepts
- Policy Gradient: Learning by gradient ascent on expected return
- Actor-Critic: Combining policy and value function learning
- Imitation Learning: Learning from demonstrations
- Reward Shaping: Designing rewards for desired behavior
Further Reading
- Sutton, R.S. & Barto, A.G. (2018). "Reinforcement Learning: An Introduction"
- Levine, S. et al. (2016). "End-to-End Training of Deep Visuomotor Policies"