Reinforcement Learning

A good trader makes sound decisions using all the information available from the market, a skill that usually takes years to learn. Can a computer model achieve the same feat: digest a massive amount of input data and then make decisions that maximize future reward? With recent advances in deep learning, it is now possible to train a neural-network-based model with reinforcement learning.

Below is a simple example: an agent moves around a walled room, with its eyes pointing ahead of it. Red and green items are scattered randomly on the floor. If the agent 'eats' a red item, it gets a positive reward; if it 'eats' a green item, it gets a negative reward. The overall goal is for the agent to maximize its total reward. At the beginning, the agent moves randomly because it doesn't know what policy will maximize its future reward. Over time, it learns to avoid actions that lead to low-reward states and to pick actions that lead to higher reward instead.
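
The learning rule behind this behavior can be sketched in a few lines. The snippet below is a minimal tabular Q-learning update, not the neural-network version this demo uses. The function name `qUpdate`, the table layout, and the step size `alpha` are assumptions made for illustration. The discount factor 0.7 mirrors the `gamma` setting in the configuration further down.

```javascript
// Minimal tabular Q-learning update (illustrative only; the demo
// approximates Q with a neural network instead of a table).
// q[state][action] holds the current estimate of future reward.
function qUpdate(q, state, action, reward, nextState, alpha, gamma) {
  // Best value achievable from the next state under current estimates.
  var bestNext = Math.max.apply(null, q[nextState]);
  // Temporal-difference target: immediate reward plus discounted future value.
  var target = reward + gamma * bestNext;
  // Move the current estimate a fraction alpha toward the target.
  q[state][action] += alpha * (target - q[state][action]);
  return q[state][action];
}

// Hypothetical two-state, two-action table.
var q = [[0, 0], [1, 0]];
var updated = qUpdate(q, 0, 1, 1.0, 1, 0.5, 0.7);
```

Repeating this update over many experienced transitions is what gradually shifts the agent from random movement toward actions with higher expected reward.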

The model is trained in the web browser. With the current settings, it takes several minutes for the agent to learn a good policy. If you are impatient, you can load the trained agent by clicking the button at the bottom. Refresh the page to reload the 'naive' agent.

This model uses the open-source library ConvNetJS.

(Takes ~10 minutes to train with default settings. If you're impatient, scroll down and load an example pre-trained network from pre-filled JSON)

var num_inputs = 27; // 9 eyes, each sees 3 numbers (wall, green, red thing proximity)
var num_actions = 5; // 5 possible angles agent can turn
var temporal_window = 1; // amount of temporal memory. 0 = agent lives in-the-moment :)
var network_size = num_inputs*temporal_window + num_actions*temporal_window + num_inputs;

// the value function network estimates the value of taking each possible action
// given an input state. Here we specify the layers explicitly, the hard way,
// but you could instead set opt.hidden_layer_sizes = [20,20]
// to just insert simple relu hidden layers.
var layer_defs = [];
layer_defs.push({type:'input', out_sx:1, out_sy:1, out_depth:network_size});
layer_defs.push({type:'fc', num_neurons: 50, activation:'relu'});
layer_defs.push({type:'fc', num_neurons: 50, activation:'relu'});
layer_defs.push({type:'regression', num_neurons:num_actions});

// options for the Temporal Difference learner that trains the above net
// by backpropping the temporal difference learning rule.
var tdtrainer_options = {learning_rate:0.001, momentum:0.0, batch_size:64, l2_decay:0.01};

var opt = {};
opt.temporal_window = temporal_window;
opt.experience_size = 30000;
opt.start_learn_threshold = 1000;
opt.gamma = 0.7;
opt.learning_steps_total = 200000;
opt.learning_steps_burnin = 3000;
opt.epsilon_min = 0.05;
opt.epsilon_test_time = 0.05;
opt.layer_defs = layer_defs;
opt.tdtrainer_options = tdtrainer_options;

var brain = new deepqlearn.Brain(num_inputs, num_actions, opt); // woohoo
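
Once the brain exists, the simulation drives it with a perceive-act-reward loop. The sketch below assumes the `forward(inputs)` / `backward(reward)` pair that deepqlearn's `Brain` exposes. `runStep`, `readSensors`, and `applyAction` are hypothetical helpers standing in for the demo's world code, and the stub brain exists only so the snippet runs without the library.

```javascript
// One perceive-act-learn step. `brain` is expected to behave like
// deepqlearn.Brain: forward(inputArray) returns an action index and
// backward(reward) learns from the reward that followed.
function runStep(brain, readSensors, applyAction) {
  var inputs = readSensors();          // e.g. 27 numbers: 9 eyes x 3 readings
  var action = brain.forward(inputs);  // action index in [0, num_actions)
  var reward = applyAction(action);    // the world reports the resulting reward
  brain.backward(reward);              // train on the observed reward
  return { action: action, reward: reward };
}

// Stub brain for illustration only; swap in the real `brain` created above.
var stubBrain = {
  rewards: [],
  forward: function (inputs) { return inputs.length % 5; },
  backward: function (r) { this.rewards.push(r); }
};

// Hypothetical sensors (all zeros) and a toy reward rule.
var result = runStep(
  stubBrain,
  function () { var a = []; for (var i = 0; i < 27; i++) a.push(0); return a; },
  function (action) { return action === 2 ? 1.0 : -1.0; }
);
```

To watch a trained agent without further exploration or learning, the ConvNetJS demo code sets `brain.epsilon_test_time = 0.0` and `brain.learning = false` before running the same loop. Treat these as settings to verify against the library version you use.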
