Transactional markets (e.g. a stock exchange such as the NYSE) are systems implementing simple rules (e.g. a central limit order book) that allow market participants to buy or sell goods or services. The price of a product on such a market is therefore governed by the simple law of supply and demand. Institutions and individuals engage in exchanges for various reasons, from commodity procurement to pure financial speculation. In every case, these are agents acting in an environment (the market and the other actors) and seeking to optimise a given quantity (maximum profit, minimum volatility, etc.).
Market participants can be either humans or computers. It is estimated that computers already account for more than half of the transactions in many financial markets. Current algorithms trading on markets are usually hand-crafted, relying on many assumptions about the underlying statistics of the market. They are effective at the very specific task they were designed for, but cannot be extended to a different market, a different frequency range or a different objective. Another open question in algorithmic trading is: how often does a model need to be re-calibrated?
At Prediction Machines, we are applying recent advances in deep reinforcement learning to build adaptive autonomous agents able to act with little or no human intervention in any transactional market.
Reinforcement learning is not a new concept, but it resurfaced recently thanks to a breakthrough by DeepMind in their quest for artificial general intelligence (AGI). They combined advances in convolutional neural networks (CNNs) and reinforcement learning (more specifically Q-learning) to teach a computer to play Atari games solely from pixel input, reaching better-than-human performance on a broad range of games. They brought three new conceptual ideas to the table, namely:
End-to-end learning
Instead of handcrafting features to help the algorithm decide what to do, DeepMind's approach put the human and the machine on a par by feeding the (convolutional) neural network the raw image input.
Experience replay
The idea is to train the network not on the current transition, but on a transition (or a batch of transitions) selected at random from a replay memory. This alleviates some well-known instability problems that arise when reinforcement learning is combined with nonlinear function approximators.
Target and value networks
Instead of using a single network both to learn and to compute the target (see the Bellman equation), a separate network is kept for the target, thereby stabilising the training process.
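The replay memory and target network can be sketched in a few lines of Python (the class name, capacity, and parameter-dictionary representation are illustrative, not DeepMind's actual implementation):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size memory of past transitions (state, action, reward, next_state)."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall off the end

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Train on transitions drawn uniformly from memory rather than the
        # latest one, breaking correlations between consecutive updates.
        return random.sample(self.buffer, batch_size)

def sync_target(value_params, target_params):
    # Periodically copy the learning (value) network's parameters into the
    # frozen target network used to compute the Bellman target.
    target_params.update(value_params)
```

In a full DQN loop, `sample` would feed minibatches to the learner, while `sync_target` would be called every few thousand steps.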
Since then, several additions have proven to improve the original algorithm. We list the most important ones here, along with the intuition behind each.
Prioritised Experience Replay:
Samples surprising (high TD-error) transitions more often, while correcting the resulting bias in learning with importance-sampling weights in the loss function.
Double Q-learning:
Prevents overestimation of the Q value by using both the value and target networks to compute the target: the value network selects the next action and the target network evaluates it.
Dueling networks:
Estimates the state value and the advantage function independently by forking the network into two branches, which are then summed to obtain the Q value.
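These three refinements can be sketched as small NumPy functions (the function names, the default values of `alpha` and `beta`, and the weight normalisation are illustrative):

```python
import numpy as np

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Prioritised Experience Replay: draw transitions with probability
    proportional to |TD error|^alpha, and return importance-sampling
    weights that correct the induced bias in the loss."""
    if rng is None:
        rng = np.random.default_rng()
    priorities = (np.abs(td_errors) + 1e-6) ** alpha
    probs = priorities / priorities.sum()
    idx = rng.choice(len(td_errors), size=batch_size, p=probs)
    all_weights = (len(td_errors) * probs) ** (-beta)
    return idx, all_weights[idx] / all_weights.max()

def double_q_target(reward, gamma, q_value_next, q_target_next):
    """Double Q-learning target: the value network *selects* the next
    action, the target network *evaluates* it, curbing overestimation."""
    best_action = np.argmax(q_value_next)
    return reward + gamma * q_target_next[best_action]

def dueling_q(state_value, advantages):
    """Dueling head: combine state value V and advantages A into Q;
    subtracting the mean advantage keeps the decomposition identifiable."""
    return state_value + (advantages - advantages.mean())
```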
Q-learning is not the only approach to reinforcement learning, and it’s worth mentioning the recent progress in Policy Gradient (PG) methods.
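To give a flavour of the policy-gradient family, here is a minimal REINFORCE loop on a hypothetical two-armed bandit where arm 1 always pays 1 and arm 0 pays nothing; the policy learns to pick arm 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# REINFORCE on a toy 2-armed bandit: arm 1 pays 1, arm 0 pays 0.
theta = np.zeros(2)                 # policy logits
for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)      # sample an action from the policy
    r = 1.0 if a == 1 else 0.0      # observe the reward
    grad = -probs                   # d log pi(a) / d theta ...
    grad[a] += 1.0                  # ... for a softmax policy
    theta += 0.1 * r * grad         # ascend the expected reward
```

Unlike Q-learning, the policy is parameterised directly and updated along the gradient of the expected reward, with no value function required.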
We start by building an automated agent able to trade a single price signal. By solving the problem in a very simple case where the price follows a simple pattern, we will be in a good position to increase the signal complexity step by step until it becomes market-realistic. This approach replaces the classical pipeline of signal extraction, price prediction and strategy design used in current algorithmic trading systems.
We cannot train our agent on real market data from the start, hence the approach of solving easy “games” first and then building up the complexity, for instance via transfer learning. In the last stage, trading on realistic market data, the agent would have to be trained on actual data. Unfortunately, deep learning algorithms typically require a vast amount of data to learn, which here would amount to thousands of years of market data. Our approach is therefore to generate synthetic market data using recent advances in generative machine learning, specifically generative adversarial networks (GANs).
We created an environment able to simulate very naive price movements, yet flexible enough to reproduce market-realistic data. The JavaScript applet below gives an idea of the environment we created (see our previous blog post for more details).
As a starting point, we consider the case where the price follows a deterministic triangular pattern (level 1). The state fed to our agent is simply the current price. The agent can interact with the environment through three actions: hold, buy or sell. The reward scheme consists of the realised profit (measured only when a position is closed, for example by buying after selling) and a trading penalty for buy and sell actions. The realised profit is the difference between the exit and entry prices, with the sign depending on whether the sequence is buy-sell or sell-buy.
The trading penalty encourages closing a position rather than continuing to buy or sell. In this game, we can only buy or sell a fixed quantity, and the game stops when the position is closed. Any trade preceded by a similar one (e.g. two successive buys) only incurs a trading penalty, with no additional profit.
The best trade in our environment is to buy low (at -1) and sell high (at +1), in either order, for a profit of 2 (before trading penalties).
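A stripped-down version of this level-1 game could look as follows (the class and parameter names are hypothetical, not our actual environment, and the trading penalty of 0.05 is an arbitrary choice):

```python
class TriangleEnv:
    """Level-1 toy environment: the price follows a deterministic
    triangular wave between -1 and +1; at most one open position."""
    ACTIONS = ("hold", "buy", "sell")

    def __init__(self, period=20, penalty=0.05):
        self.period = period
        self.penalty = penalty
        self.t = 0
        self.entry = None            # (side, price) of the open position
        self.done = False

    def price(self):
        # Triangular wave oscillating between -1 and +1.
        phase = (self.t % self.period) / self.period
        return 4 * abs(phase - 0.5) - 1

    def step(self, action):
        reward = 0.0
        p = self.price()
        if action in ("buy", "sell"):
            reward -= self.penalty               # every trade is penalised
            if self.entry is None:
                self.entry = (action, p)         # open a position
            elif self.entry[0] != action:        # opposite trade closes it
                side, entry_price = self.entry
                reward += (p - entry_price) if side == "buy" else (entry_price - p)
                self.done = True                 # game stops once closed
            # a same-side trade changes nothing: penalty only, no extra profit
        self.t += 1
        return self.price(), reward, self.done
```

With these settings, selling at the peak (+1) and buying back at the trough (-1) yields 2 minus two trading penalties, i.e. 1.9.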
The agent learns the optimal strategy in around 1000 games.
Step up the game complexity
Synthetic market generation
Transfer learning between different “levels” of the simulation
If you are interested in applying recent AI concepts to markets, don’t hesitate to get in touch.
Deep Reinforcement Learning: An Overview
Human-level control through deep reinforcement learning
Prioritized Experience Replay
Dueling Network Architectures for Deep Reinforcement Learning
Deep Reinforcement Learning with Double Q-learning
How transferable are features in deep neural networks?
Generative Adversarial Nets
Demystifying Deep Reinforcement Learning
Reinforcement Learning and DQN, learning to play from pixels
Double Learning and Prioritized Experience Replay
Prioritized Experience Replay Kills on Doom
Deep Reinforcement Learning: Pong from Pixels
What is difference between DQN and Policy Gradient methods?