5 – M3L3 C05 V2
As you’ve learned, we can express the expected return as a probability weighted sum, where we take into account the probability of each possible trajectory and, the return permits trajectory. Our goal is to find the value of theta that maximizes expected return. One way to do that is by Gradient Ascent, where we just … Read more
댓글을 달려면 로그인해야 합니다.