Overview: Temporal-Difference Method
1. Review: DP and MC
2. Temporal-Difference Prediction
3. Temporal-Difference Control
4. Example code

Temporal difference (TD) learning is an approach to learning how to predict a quantity that depends on future values of a given signal. The name TD derives from its use of changes, or differences, in predictions over successive time steps to drive the learning process. The prediction at any given time step is updated to bring it closer to the prediction of the same quantity at the next time step. It is a supervised learning process in which the training signal for a prediction is a future prediction. TD algorithms are often used in reinforcement learning to predict a measure of the total amount of reward expected over the future, but they can be used to predict other quantities as well.
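The update described above, moving the current prediction toward the prediction at the next time step, can be sketched with tabular TD(0). This is a minimal illustration and not the code from the talk: the function names `td0_update` and `td0_random_walk`, the five-state random-walk task, and the values of `alpha`, `gamma`, and `episodes` are all assumptions chosen for the example.

```python
import random


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """Apply one TD(0) update in place and return the TD error.

    The target is reward + gamma * values[next_state], i.e. the next
    prediction stands in for the unknown future return.
    """
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


def td0_random_walk(num_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values for a hypothetical random-walk task.

    States 1..num_states are non-terminal; indices 0 and num_states + 1
    are terminal with value 0. Each step moves left or right with equal
    probability, and only the step into the right terminal pays reward 1.
    """
    rng = random.Random(seed)
    values = [0.0] * (num_states + 2)
    for s in range(1, num_states + 1):
        values[s] = 0.5  # arbitrary initial guess for non-terminal states

    for _ in range(episodes):
        state = (num_states + 1) // 2  # start each episode in the middle
        while 0 < state < num_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == num_states + 1 else 0.0
            td0_update(values, state, reward, next_state, alpha, gamma)
            state = next_state

    return values[1:-1]  # drop the two terminal entries


if __name__ == "__main__":
    for i, v in enumerate(td0_random_walk(), start=1):
        print(f"state {i}: {v:.3f}")
```

Under these assumptions, the value of a state is the probability of finishing at the right end, so the estimates should rise from left to right. Each update uses only the current transition, so learning proceeds during the episode rather than after the final outcome is known.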