£46.95

Stochastic Optimization Methods for Policy Evaluation in Reinforcement Learning (Foundations and Trends® in Optimization)



Price History & Forecast

Last 35 days • 35 data points (No recent data available)

[Chart: historical price, 25 January 2026 to 28 February 2026; price axis £38.03 to £47.76; current price £46.95]

Price Distribution

Price distribution over 35 days • 2 price levels

[Chart: days at each price. £39: 1 day; £47: 34 days (current price)]

Price Analysis

Most common price: £47 (34 days, 97.1%)

Price range: £39 - £47

Price levels: 2 different prices over 35 days

Description

This monograph introduces value-based approaches to the policy evaluation problem in online reinforcement learning (RL), where the goal is to learn the value function of a specific policy under a single Markov decision process (MDP). The approaches differ depending on whether they are implemented on-policy or off-policy. In the on-policy setting, where the policy is evaluated using data generated by that same policy, classical techniques such as TD(0), TD(λ), and their extensions with function approximation or variance reduction are employed. For off-policy evaluation, where samples are collected under a different behavior policy, the monograph introduces gradient-based two-timescale algorithms such as GTD2, TDC, and variance-reduced TDC, which are designed to minimize the mean-squared projected Bellman error (MSPBE). The monograph also discusses their finite-sample convergence upper bounds and sample complexity.
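
For readers unfamiliar with the on-policy case mentioned in the description, the following is a minimal sketch of tabular TD(0) policy evaluation. It is not taken from the monograph: the random-walk chain, the function name `td0_evaluate`, and all parameter values are assumptions chosen only for illustration.

```python
import random


def td0_evaluate(num_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) on a hypothetical random-walk chain (illustrative only).

    States 0..num_states-1 are non-terminal. Stepping left of state 0 ends
    the episode with reward 0; stepping right of the last state ends it with
    reward 1. The evaluated policy moves left or right with equal probability.
    """
    rng = random.Random(seed)
    values = [0.0] * num_states
    for _ in range(episodes):
        state = num_states // 2  # every episode starts in the middle state
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD(0) update: move V(s) toward the one-step bootstrapped target.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td0_evaluate()])
```

On this chain the true value of state i is (i + 1) / (num_states + 1), so the estimates should land near 1/6, 2/6, ..., 5/6, with some residual noise because the step size is constant.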

Product Specifications

Format: paperback

Domain: Amazon UK

Release Date: 15 August 2024

Listed Since: 22 July 2024
