Next steps

Loss of example

What was the definition of entropy?

What was the motivation behind KL divergence?

On-policy vs. off-policy

Stable Baselines: how can we use it?

PPO

Different types of RL algorithms

ChatGPT RL