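The goal of function approximation is to construct a function that computes Q-values, using something like a linear combination of features or a neural network. Many problems have state and action spaces that are extremely large (even infinite) or continuous, so they cannot be described by a simple Q-table. This section focuses on DQN.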
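
As a rough illustration of the linear-combination case, the sketch below approximates a single Q-value as a dot product between a weight vector and a feature vector, and nudges the weights toward a target with gradient descent. The names `q_value`, `sgd_update`, the feature vector `phi`, and the learning rate are assumptions made for this example, not part of the original notes.

```python
def q_value(weights, features):
    """Linear approximation: Q(s, a) = w . phi(s, a)."""
    return sum(w * x for w, x in zip(weights, features))


def sgd_update(weights, features, target, lr=0.1):
    """One gradient step on 0.5 * (target - Q)^2 for a linear Q."""
    error = target - q_value(weights, features)
    return [w + lr * error * x for w, x in zip(weights, features)]


# Hypothetical 3-dimensional feature vector for one state-action pair.
phi = [1.0, 0.5, -2.0]
w = [0.0, 0.0, 0.0]

# Repeatedly fit the same (features, target) pair; Q moves toward the target.
for _ in range(50):
    w = sgd_update(w, phi, target=1.0)

print(q_value(w, phi))
```

The point is that only three weights are stored, regardless of how many states exist, which is what makes the approach usable when a Q-table is not.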

# Introduction

## Large-Scale Reinforcement Learning

Reinforcement learning can be used to solve large problems, e.g.

• Backgammon: 10²⁰ states
• Computer Go: 10¹⁷⁰ states
• Helicopter: continuous state space

# Batch Methods

• Gradient descent is simple and appealing
• But it is not sample efficient
• Batch methods seek to find the best-fitting value function, given the agent’s experience (“training data”)
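
One common way to make "best fitting" precise is a least-squares objective. The notation below is an assumption for illustration (it does not appear in these notes): $\hat{v}(s, \mathbf{w})$ is the approximate value function with parameters $\mathbf{w}$, and $\mathcal{D}$ is the stored experience as pairs of states and target values.

$$
\mathcal{D} = \{\langle s_1, v_1^\pi \rangle, \dots, \langle s_T, v_T^\pi \rangle\}, \qquad
LS(\mathbf{w}) = \sum_{t=1}^{T} \bigl( v_t^\pi - \hat{v}(s_t, \mathbf{w}) \bigr)^2
$$

A batch method then looks for the $\mathbf{w}$ that minimises $LS(\mathbf{w})$ over all of $\mathcal{D}$, rather than taking one gradient step per sample and discarding it.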

## Least Squares Prediction

### DQN

Benefits of Experience Replay:

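The basic idea is to store the agent’s experiences and train the network on randomly drawn batches of them. Random sampling breaks the correlation between consecutive transitions and lets each experience be reused several times, so the agent learns to perform well in the task more robustly.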
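
A minimal sketch of such a buffer, assuming transitions are stored as `(state, action, reward, next_state, done)` tuples. The class name `ReplayBuffer`, the capacity, and the batch size are choices made for this example rather than anything fixed by DQN.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling without replacement, uniformly over stored transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Toy usage: integer "states" stand in for real observations.
buffer = ReplayBuffer(capacity=100)
for t in range(150):
    buffer.add(t, 0, 1.0, t + 1, False)

batch = buffer.sample(32)
print(len(buffer), len(batch))
```

Because the buffer has a fixed capacity, the 50 oldest transitions in the toy loop are discarded, and every sampled batch mixes experiences from many different time steps.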

Why use a separate target network:

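The issue is that the Q-network’s values shift at every step of training. If the targets used to adjust the network are computed from this constantly shifting set of values, the value estimates can easily spiral out of control. Computing the targets from a separate network whose weights are only updated periodically keeps them fixed between updates.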
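
The sketch below illustrates the idea with a tiny linear Q-function instead of a neural network, to keep it dependency-free. The names `q_values`, `td_target`, `train_step`, and the constants `GAMMA` and `SYNC_EVERY` are hypothetical and chosen only for this example. The online weights change every step, while the target weights are a frozen copy refreshed every `SYNC_EVERY` steps.

```python
import copy

GAMMA = 0.9       # discount factor (assumed value)
SYNC_EVERY = 4    # steps between target-network refreshes (assumed value)


def q_values(weights, features):
    """Linear Q with one weight vector per action."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def td_target(target_weights, reward, next_features, done):
    """Bootstrap from the frozen target weights, not the online ones."""
    if done:
        return reward
    return reward + GAMMA * max(q_values(target_weights, next_features))


def train_step(online, target, transition, lr=0.1):
    """One gradient step on the online weights for a single transition."""
    features, action, reward, next_features, done = transition
    y = td_target(target, reward, next_features, done)
    error = y - q_values(online, features)[action]
    online[action] = [w + lr * error * x
                      for w, x in zip(online[action], features)]


online = [[0.0, 0.0], [0.0, 0.0]]   # two actions, two features
target = copy.deepcopy(online)      # frozen copy of the online weights
transition = ([1.0, 0.0], 0, 1.0, [1.0, 0.0], False)

for step in range(1, 9):
    train_step(online, target, transition)
    if step % SYNC_EVERY == 0:
        target = copy.deepcopy(online)  # periodic hard update
```

Between refreshes the regression target `y` stays constant for a given transition, so the online weights chase a fixed value instead of one that moves with every update.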