
RL Experiments


V-learning

The idea is to train a model that predicts the value V(s) of any game field, and then use it to choose the best action at state s_t by computing V(s_{t+1}) for every possible successor state s_{t+1}.
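As a rough illustration, here is a minimal sketch of this action-selection rule; `value_fn`, `legal_moves(s)`, and `apply_move(s, a)` are hypothetical helpers, not names from this project.

```python
# Sketch of greedy action selection with a learned V.
# Hypothetical helpers: legal_moves(s) lists the available moves,
# apply_move(s, a) returns the successor state s_{t+1}.
def choose_action(state, value_fn, legal_moves, apply_move):
    """Pick the move whose successor state has the highest predicted value."""
    return max(legal_moves(state),
               key=lambda a: value_fn(apply_move(state, a)))
```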

Linear model with handcrafted features

Let's try to approximate V(s) with linear regression:
V(s) = w_1 * f_1(s) + w_2 * f_2(s) + ... + w_k * f_k(s),
where f_i(s) is some fixed feature of game state s.

In my experiments I used binary features: the pure cell positions, plus 2x1 and 2x2 convolutions with fixed kernels that output 1 for a specific configuration of cells. In total, ~2350 features.
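A minimal sketch of such a linear approximator, assuming a stand-in feature extractor (the board size, cell encoding, and one-hot features below are assumptions for illustration, not the actual ~2350 features used here):

```python
import numpy as np

# Stand-in feature extractor: one binary indicator per (cell, value) pair.
# The real features also include 2x1 and 2x2 cell-pattern convolutions.
N_CELLS = 64          # hypothetical board size
N_CELL_VALUES = 3     # hypothetical number of distinct cell states

def features(state):
    """Binary feature vector f(s); state is a flat iterable of cell values."""
    f = np.zeros(N_CELLS * N_CELL_VALUES)
    for i, v in enumerate(state):
        f[i * N_CELL_VALUES + v] = 1.0
    return f

w = np.zeros(N_CELLS * N_CELL_VALUES)   # weights w_1 .. w_k

def V(state):
    """V(s) = w_1 * f_1(s) + ... + w_k * f_k(s)."""
    return float(np.dot(w, features(state)))
```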

Temporal difference updates

To learn the linear approximation I sampled self-play games with the existing value approximator V and updated the weights of V to minimize the difference between V(s_t) and V(s_{t+1}), with V(end_game_state) fixed to +/- 1. To speed up learning I first trained the net on endgame positions only, to predict +/- 1, and only then sampled games with V.
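A rough sketch of this kind of temporal-difference update for the linear model, assuming a feature extractor like the one sketched above and a learning rate `alpha` (both assumptions, not values from the experiments):

```python
import numpy as np

# TD-style update for the linear V: nudge V(s_t) toward V(s_{t+1}),
# or toward the fixed +/- 1 target when s_{t+1} ends the game.
def td_update(w, s_t, s_next, features, alpha=0.01, terminal_value=None):
    """Return updated weights after one TD step on the pair (s_t, s_{t+1})."""
    f_t = features(s_t)
    if terminal_value is not None:            # end-of-game target: +1 or -1
        target = terminal_value
    else:
        target = float(np.dot(w, features(s_next)))
    error = target - float(np.dot(w, f_t))    # TD error
    return w + alpha * error * f_t            # gradient step on the squared TD error
```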

I trained the net for 400 epochs with SGD before resampling games with the updated weights. To sample games I used an e-greedy policy with e = 0.1.
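A minimal sketch of the e-greedy sampling step, reusing the hypothetical helpers from the earlier sketches (`legal_moves`, `apply_move`, `choose_action`); `value_fn` is the current linear V.

```python
import random

EPSILON = 0.1   # exploration rate used when sampling games

def e_greedy_move(state, value_fn, legal_moves, apply_move):
    """With probability e pick a random legal move, otherwise the greedy one."""
    if random.random() < EPSILON:
        return random.choice(legal_moves(state))                    # explore
    return choose_action(state, value_fn, legal_moves, apply_move)  # exploit
```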

Training converged in some sense, but the net did not learn meaningful weights or play better than a simple cell-counting strategy.
