Deriving the Policy Gradient