Typically, we cannot predict a meaningful portion of daily or higher-frequency market returns. A more realistic approach is to classify the state of the market for a particular day or hour. A powerful tool for this purpose is artificial neural networks, a popular machine learning method that consists of layers of data-processing units, the connections between them, and weights and biases that are estimated from training data. Classification with neural networks is well suited to complex structures and large numbers of data points. A simple idea for a neural network approach to financial markets is to use combinations of price trends as features, deploy them to classify the market into simple buy, sell or neutral labels, and estimate the probability of each class at each point in time. This approach can, in principle, be extended to include trading volumes, economic data or sentiment indicators.
The post below is a summary based on [a] some introductory papers on neural networks (links next to quotes) and [b] the following article, which outlines an idea of how to use neural networks for market classification:
Balcerak, Michal and Thomas Schmelzer (2020), “Constructing trading strategy ensembles by classifying market states”.
The post ties in with this site’s summary on quantitative methods for macro information efficiency.
What are artificial neural networks?
“Artificial neural networks are a form of machine-learning algorithm with a structure roughly based on that of the human brain. Like other kinds of machine-learning algorithms, they can solve problems through trial and error without being explicitly programmed with rules to follow.” [Physics World]
“A distinguishing feature of neural networks is that knowledge of its domain is distributed throughout the network itself rather than being explicitly written into the program. This knowledge is modelled as the connections between the processing elements (artificial neurons) and the adaptive weights of each of these connections. The network then learns through exposure to various situations…by adjusting the weight of the connections between the communicating neurons grouped into layers.” [Zwass]
“Artificial neural networks are comprised of node layers, containing an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to another and has an associated weight and threshold…Weights help determine the importance of any given variable, with larger ones contributing more significantly to the output compared to other inputs…If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network.” [IBM]
“Each layer’s output is simultaneously the subsequent layer’s input, starting from an initial input layer receiving your data.
Layers are made of nodes. A node is just a place where computation happens…A node combines input from the data with a set of coefficients, or weights, that either amplify or dampen that input, thereby assigning significance to inputs with regard to the task the algorithm is trying to learn…These input-weight products are summed and then the sum is passed through a node’s so-called activation function, to determine whether and to what extent that signal should progress further through the network to affect the ultimate outcome, say, an act of classification.” [pathmind]
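The node computation described above reduces to a few lines of code. Below is a minimal Python sketch of a single artificial neuron with a sigmoid activation; all names and numbers are illustrative, not drawn from any of the quoted sources:

```python
import numpy as np

def node_output(x, w, b):
    """One artificial neuron: weighted sum of inputs plus bias,
    passed through a (here: sigmoid) activation function."""
    z = np.dot(w, x) + b             # input-weight products, summed
    return 1.0 / (1.0 + np.exp(-z))  # activation squashes z into (0, 1)

# Illustrative numbers only: three inputs feeding one node.
x = np.array([0.5, -1.2, 3.0])   # inputs (e.g. features)
w = np.array([0.4, 0.1, -0.6])   # weights amplify or dampen each input
b = 0.2                          # bias shifts the activation threshold
print(node_output(x, w, b))
```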
“Deep learning is the name we use for ‘stacked neural networks’; that is, networks composed of several layers…Deep-learning networks are distinguished from the more commonplace single-hidden-layer neural networks by their depth; that is, the number of node layers through which data must pass in a multistep process of pattern recognition…More than three layers (including input and output) qualifies as deep learning.” [pathmind]
“In a neural network, changing the weight of any one connection (or the bias of a neuron) has a reverberating effect across all the other neurons and their activations in the subsequent layers. That’s because each neuron in a neural network is like its own little model…Each hidden layer of a neural network is basically a stack of models whose outputs feed into even more models further downstream.” [Yiu]
“We want to find the set of weights (each connecting line between any two elements in a neural network houses a weight) and biases (each neuron houses a bias) that minimize our cost function — where the cost function is an approximation of how wrong our predictions are relative to the target outcome… We have a cost function to minimize…To use gradient descent, we need to know the gradient of our cost function, the vector that points in the direction of greatest steepness…We want to repeatedly take steps in the opposite direction of the gradient to eventually arrive at the minimum.” [Yiu]
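As a concrete illustration of this recipe, the sketch below runs plain gradient descent on a one-parameter toy cost function; the function, starting point and learning rate are invented for exposition:

```python
# Minimal gradient descent on an illustrative cost function
# C(w) = (w - 3)^2, whose gradient is dC/dw = 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # initial weight guess
lr = 0.1   # learning rate: size of each step
for _ in range(100):
    w -= lr * grad(w)   # step in the opposite direction of the gradient
print(w)   # converges towards the minimum at w = 3
```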
“The goal of backpropagation is to compute the partial derivatives of the cost function with respect to any weight or bias in the network…For backpropagation to work we need to make two main assumptions about the form of the cost function…The first assumption we need is that the cost function can be written as an average over cost functions for individual training examples…The second assumption we make about the cost is that it can be written as a function of the outputs from the neural network.” [Nielsen]
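A squared-error cost on a tiny network satisfies both assumptions: it is an average over per-example costs and depends only on the network’s outputs. A minimal numpy sketch of one forward and one backward pass, with made-up sizes and data, might look as follows:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 2 inputs -> 2 hidden units -> 1 output, squared-error cost.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)
x, y = np.array([0.5, -0.3]), np.array([1.0])   # one training example

# Forward pass: store intermediate values needed for the backward pass.
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
cost = 0.5 * np.sum((a2 - y) ** 2)

# Backward pass: the chain rule yields dC/dW, dC/db for every layer.
delta2 = (a2 - y) * a2 * (1 - a2)          # error at the output layer
dW2, db2 = np.outer(delta2, a1), delta2
delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # error propagated back
dW1, db1 = np.outer(delta1, x), delta1

# One gradient-descent update with these partial derivatives:
lr = 0.5
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
```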
For the basics of neural network mathematics and backpropagation, see Josh Starmer’s sequence of videos on the subject.
How to use neural networks for classification?
“Neural networks rely on training data to learn and improve their accuracy over time. However, once these learning algorithms are fine-tuned for accuracy, they are powerful tools in computer science and artificial intelligence, allowing us to classify and cluster data at a high velocity.” [IBM]
“Neural networks help us cluster and classify. You can think of them as a clustering and classification layer on top of the data you store and manage. They help to group unlabeled data according to similarities among the example inputs, and they classify data when they have a labeled dataset to train on [N.B: labels are simply the categories of interest, such as rising or falling market price trends].
- Clustering or grouping is the detection of similarities. Deep learning does not require labels to detect similarities. Learning without labels is called unsupervised learning.
- All classification tasks depend upon labeled datasets; that is, humans must transfer their knowledge to the dataset in order for a neural network to learn the correlation between labels and data. This is known as supervised learning…Any labels that humans can generate, any outcomes that you care about and which correlate to data, can be used to train a neural network.” [pathmind, https://wiki.pathmind.com/neural-network]
“Using neural networks to predict financial time series data is today widely regarded as the old unfulfilled dream of quantitative finance.” [Balcerak and Schmelzer]
How to use neural networks for classifying markets: an example
“Rather than directly predicting future prices or returns, we follow a more recent trend in asset management and classify the state of a market based on labels. We use numerous standard labels [categories or classes of market states] and even construct our own ones…For each label we use a specific neural network to classify the state using the market features from our feature space… [mainly price dynamics]. Each classifier gives a probability to buy or to sell and combining all their recommendations [we arrive at a trading strategy].” [Balcerak and Schmelzer]
“Rather than using [a wide range of] prices, we reduce the dimensionality of the problem by using [much fewer] features based on the very same [range of prices] i.e. an optimal combination of moving averages.” [Balcerak and Schmelzer]
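A sketch of what such dimensionality reduction could look like in pandas is shown below. The window lengths and feature definitions are assumptions for illustration, not the “optimal combination” used in the paper:

```python
import pandas as pd

def ma_features(prices: pd.Series, windows=(8, 24, 72)) -> pd.DataFrame:
    """Reduce a raw price series to a few moving-average based features.
    The window lengths are illustrative, not the paper's choices."""
    feats = pd.DataFrame(index=prices.index)
    for w in windows:
        ma = prices.rolling(w).mean()
        # Normalised distance of the price from its moving average:
        feats[f"ma_gap_{w}"] = (prices - ma) / ma
    # Crossover of the fastest and the slowest moving average:
    feats["ma_cross"] = (prices.rolling(windows[0]).mean()
                         - prices.rolling(windows[-1]).mean()) / prices
    return feats.dropna()
```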
“We do not stop by only modifying the input – we also alter the goals of our predictions. Rather than aiming for a (noisy) price trajectory we ask simpler questions more suitable for the machinery of machine learning. Our goal is to quantify the probability of a market being in a class or category or moving into one within the next hours or minutes. This could be the probability for a trend reversion or a spike in volatility or volume.” [Balcerak and Schmelzer]
“We describe a market by a time series of datapoints…Rather than aiming for the next price, we argue that the market is currently in a particular label-class which we ultimately want to identify without using any unseen future data…We distinguish three such label-classes:
- The market may start or continue to rise over the next few periods
- The market may drop over the next few periods, the volume may drop significantly or there is a spike in volatility.
- We do nothing.” [Balcerak and Schmelzer]
“So given a historic time series with all its price jumps and chaotic behaviour we reduce it to a time series just oscillating between three label-classes. The idea is to approximate the labels with market features (i.e. technical trading indicators) that do not use any future data. Once in live trading, we can live update the indicators and therefore talk about label-classes predictions…Although it would be possible to have labels based on all sorts of financial data, e.g. volume, we use here exclusively labels based on price data…We introduce a threshold…The threshold is often made dynamic using estimates for the current volatility.” [Balcerak and Schmelzer]
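The sketch below shows one plausible implementation of such labeling with a volatility-scaled threshold. The horizon, volatility window and multiplier are illustrative assumptions; the class numbering (0 = buy, 1 = do nothing, 2 = sell) follows the paper’s convention:

```python
import numpy as np
import pandas as pd

def label_states(prices: pd.Series, horizon: int = 24,
                 vol_window: int = 96, k: float = 1.0) -> pd.Series:
    """Map each point in time to one of three label-classes:
    0 = buy (market likely to rise), 2 = sell, 1 = do nothing.
    The threshold is dynamic: k times the rolling return volatility.
    Uses *future* returns, so this labeling is for training only."""
    ret_fwd = prices.shift(-horizon) / prices - 1.0        # forward return
    vol = prices.pct_change().rolling(vol_window).std()    # current volatility
    thresh = k * vol * np.sqrt(horizon)
    labels = pd.Series(1, index=prices.index)              # default: do nothing
    labels[ret_fwd > thresh] = 0                           # buy
    labels[ret_fwd < -thresh] = 2                          # sell
    return labels
```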
“We call the process of classifying over time the labeling of a time series. So the particular label is a time series mapping [of data points] to one of the three classes. For each label we ask for an optimal set of m features to approximate them. These features, through a classifier, induce a probability for the market to be in a particular label-class. We then ask for an optimal linear combination of those probabilities to execute trades… It is essential that features used in the market representation will differ in values if they encounter different classes of our label of choice.” [Balcerak and Schmelzer]
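In stylized form, the final combination step could look like the sketch below; the array shapes, equal mixing weights and function name are hypothetical:

```python
import numpy as np

def combined_signal(prob_buy: np.ndarray, prob_sell: np.ndarray,
                    weights: np.ndarray) -> np.ndarray:
    """Hypothetical linear combination of per-label classifier outputs.
    prob_buy / prob_sell: (n_labels, n_times) class probabilities from
    one classifier per label; weights: (n_labels,) mixing weights."""
    net = prob_buy - prob_sell   # per-label directional signal
    return weights @ net         # one combined signal per time step

# Illustrative: three label classifiers, equal weights.
sig = combined_signal(np.random.rand(3, 100), np.random.rand(3, 100),
                      np.ones(3) / 3)
```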
“Because of a high number of datapoints in our training dataset and the requested non-linear behaviour we have decided to use a neural network classifier and a supervised learning algorithm. For hyper-parameter optimisation we used…Bayesian Optimization and HyperBand.” [Balcerak and Schmelzer]
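A minimal supervised three-class classifier in Keras could look like the sketch below. The architecture, layer sizes and training settings are assumptions for illustration; the paper’s actual network and its BOHB-tuned hyper-parameters are not reproduced here:

```python
import numpy as np
import tensorflow as tf

def build_classifier(n_features: int) -> tf.keras.Model:
    """Illustrative three-class market-state classifier."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),  # P(buy/neutral/sell)
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# X: (n_samples, n_features) feature matrix; y: labels in {0, 1, 2}.
X = np.random.rand(1000, 4).astype("float32")   # placeholder features
y = np.random.randint(0, 3, size=1000)          # placeholder labels
model = build_classifier(4)
model.fit(X, y, epochs=5, batch_size=64, verbose=0)
probs = model.predict(X)   # per-class probabilities for each time step
```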
“The loss calculator…is built based on a concept called loss scaling which scales loss based on continuous labels. The central idea is to make class0 (buy) and class2 (sell) prediction accuracy more significant than class1 (do nothing) in the feedback loop to the label classifier during the training.” [Balcerak and Schmelzer]
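One simple way to approximate this idea, reusing `model`, `X` and `y` from the sketch above, is Keras’ `class_weight` argument, which multiplies each example’s loss by a class-specific factor. This is a stand-in for, not a reproduction of, the paper’s continuous loss-scaling scheme, and the weights are illustrative:

```python
# Make buy (class 0) and sell (class 2) errors weigh more than
# "do nothing" (class 1) in the training loss. The factors are
# illustrative; the paper scales losses with continuous labels.
class_weight = {0: 3.0, 1: 1.0, 2: 3.0}
model.fit(X, y, epochs=5, batch_size=64,
          class_weight=class_weight, verbose=0)
```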