
How random forests can improve macro trading signals


Random forest regression combines the discovery of complex predictive relations with efficient management of the “bias-variance trade-off” of machine learning. The method is suitable for constructing macro trading signals with statistical learning, particularly when relations between macro factors and market returns are multi-faceted or non-monotonic and do not have clear theoretical priors to go on. This post shows how random forest regression can be used in a statistical learning pipeline for macro trading signals that chooses and optimizes models sequentially over time. For cross-sector equity allocation using a set of over 50 conceptual macro factors, regression trees have delivered signals with significant predictive power and economic value. Value generation has been higher and less seasonal than for statistical learning with linear regression models.

Please quote as “Gholkar, Rushil and Sueppel, Ralph, ‘How random forests can improve macro trading signals,’ Macrosynergy research post, November 2024.”

A Jupyter notebook for audit and replication of the research results can be downloaded here. The notebook operation requires access to J.P. Morgan DataQuery to download data from JPMaQS. Everyone with DataQuery access can download the data, except for the most recent 6 months. Moreover, J.P. Morgan offers free trials on the full dataset for institutional clients. An academic research support program sponsors data sets for relevant projects.
This post ties in with this site’s summary of “Quantitative Methods For Macro Information Efficiency”.

The basics of random forest regression

Random forest regression is an ensemble machine learning model that combines multiple regression trees to predict a continuous target variable. A single regression tree is a decision tree model used to predict continuous outcomes by recursively splitting the data into subsets.

Unlike ordinary least squares, regression trees can model non-linear and non-monotonic relationships. They are also more robust to outliers and accommodate missing data. And since they are computationally efficient, they work well with large datasets. The drawback of decision trees is their tendency to overfit. In machine learning lingo, a regression tree typically has little bias, i.e., accurately captures the patterns of the training data, but large variance, i.e., is sensitive to small changes in the training data and, hence, struggles to generalize to new unseen data. For a brief, intuitive summary of regression trees, see Josh Starmer’s introduction video here.

  • A regression tree is built by sequentially dividing the dataset based on feature thresholds that minimize prediction errors within each subset, also called “nodes”. The quality of nodes can be measured by “variance reduction”, i.e., the decline in the sum of squared errors of subset predictions versus full-set predictions. The nodes are chosen by evaluating the relation between various thresholds for a set of features and the resultant sum of squared errors, i.e., the squared differences from the averages in the sub-samples. The “best” node is the feature threshold with the largest variance reduction. The splitting then continues for the data subsets, dividing the full feature space into smaller and smaller subspaces. The smallest subspaces created by the tree are called “leaves”. A leaf contains the average value of an appropriately sized statistical neighbourhood in the k-dimensional feature space. Out of sample, the neighbourhood of a new feature set determines its predicted target value.
  • Since fully grown regression trees fit the training data perfectly, they usually do not generalize well to unseen data. Hence, a critical task is to avert overfitting. The key countermeasure is to merge leaves, i.e., to use larger ultimate feature subspaces. This is called “pruning the regression tree”. There are many methods to this end, which can broadly be divided into pre-pruning, i.e., stopping tree growth during construction, for example, by setting a maximum tree depth or minimum sizes of split samples, and post-pruning, such as cost complexity pruning (a minimal sketch follows below).
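As a minimal sketch of these pruning options, not part of the original analysis and using synthetic data, the snippet below fits a pre-pruned regression tree (via `max_depth` and `min_samples_leaf`) and a post-pruned one (via scikit-learn’s cost-complexity pruning path):

```python
# Minimal sketch (synthetic data, not the post's dataset): pre-pruning and
# post-pruning of a single regression tree with scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                      # 5 synthetic features
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=500)

# Pre-pruning: stop growth during construction by capping depth and leaf size.
pre_pruned = DecisionTreeRegressor(max_depth=4, min_samples_leaf=20, random_state=0)
pre_pruned.fit(X, y)

# Post-pruning: grow a full tree, then prune via cost-complexity pruning (ccp_alpha).
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
mid_alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
post_pruned = DecisionTreeRegressor(ccp_alpha=mid_alpha, random_state=0).fit(X, y)

print("pre-pruned depth: ", pre_pruned.get_depth())
print("post-pruned depth:", post_pruned.get_depth())
```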

A random forest is an ensemble learning method that combines multiple decision trees related to the same problem to avert overfitting and improve predictive accuracy and robustness. Two techniques foster the diversity of trees.

  • Each tree is trained independently on a different “bootstrap sample”. The sets are created by randomly drawing samples (with replacement) from an original dataset.
  • At each node of a tree, only a random subset of features is considered for splitting the data.

For a full random forest, predictions for a specific feature set are obtained by averaging the predictions of each regression tree. The special quality of random forest regression is its ability to manage the bias-variance trade-off while capturing complex, hierarchical relationships between a dependent variable and a collection of independent variables. When the individual trees are grown relatively “deep”, they become low-bias, high-variance models. However, averaging the predictions over sufficiently diverse trees cuts back the variance. Since mean values are less vulnerable to noise or outliers, they are more robust predictors that generalize better to unseen data than a single tree. For a brief, intuitive summary of random forests, see Josh Starmer’s introduction video here.
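The following sketch, again on synthetic data rather than the post’s pipeline, illustrates the variance-reduction argument: a single fully grown tree overfits, while a forest of deep but diverse trees (bootstrap samples plus random feature subsets) generalizes better out of sample:

```python
# Synthetic illustration: averaging many deep, diverse trees generalizes better
# than a single deep tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
y = np.sin(2 * X[:, 0]) + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

single_tree = DecisionTreeRegressor(random_state=1).fit(X_tr, y_tr)   # low bias, high variance
forest = RandomForestRegressor(
    n_estimators=100,      # number of trees whose predictions are averaged
    max_features="sqrt",   # random feature subset at each split fosters diversity
    max_samples=0.5,       # each tree is trained on a different bootstrap sample
    random_state=1,
).fit(X_tr, y_tr)

print("single tree test MSE:  ", round(mean_squared_error(y_te, single_tree.predict(X_te)), 3))
print("random forest test MSE:", round(mean_squared_error(y_te, forest.predict(X_te)), 3))
```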

Using random forests in statistical learning for macro signals

The main purpose of this post is to explore the benefits of random forest regression for calculation of macro trading signals with statistical learning vis-à-vis linear regression techniques. Generally, statistical learning offers methods for sequentially choosing the best model class and other hyperparameters for signal generation, thus supporting realistic backtests and automated operation of strategies (view post here). Data used as macro trading signals are often time series panels featuring various types of market data and point-in-time macroeconomic information states (macro-quantamental indicators) across a range of countries. A statistical learning process in Python with the scikit-learn package typically takes five steps:

  1. Collect features and lagged target returns in a double-indexed pandas data frame, where the row indices mark countries and time (a schematic sketch follows this list).
  2. Define hyperparameter grids for a scikit-learn pipeline, i.e., dictionaries of structural properties of models, from which the process will choose the “best model” at any given point in time through cross-validation.
  3. Choose a (cross-)validation splitter that forms multiple coherent training and test samples, as well as the criteria according to which the validation process evaluates models.
  4. Perform sequential model selection across time with related parameter estimation and signal generation. This means that cross-validation is performed on expanding or rolling samples, respecting the panel structure of the data. This special application of scikit-learn to panels can be managed with the SignalOptimizer class of the Macrosynergy package.
  5. Evaluate the sequentially optimized signals in terms of predictive power, accuracy, and naïve PnL generation (view post here).
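As a schematic illustration of step 1, with made-up country identifiers and factor names rather than the post’s JPMaQS data, the panel below is a double-indexed pandas data frame of the kind that would be passed to the learning pipeline:

```python
# Hypothetical, simplified panel for step 1: rows are indexed by (country, date),
# columns hold macro features and the target return.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_product(
    [["AUD", "CAD", "USD"], pd.date_range("2020-01-31", periods=24, freq="M")],
    names=["cid", "real_date"],
)
panel = pd.DataFrame(
    rng.normal(size=(len(idx), 3)),
    index=idx,
    columns=["factor_a", "factor_b", "target_return"],   # names are illustrative only
)
# Steps 2-5 would pass such a panel, a model/hyperparameter grid and a panel-aware
# splitter to the SignalOptimizer of the Macrosynergy package (API details omitted).
print(panel.head())
```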

Statistical learning for macro signals is often based on regression. The process optimizes regression model hyperparameters and parameters sequentially to derive predictions based on whatever model has fared best in cross-validation up to a specific point in time. These predictions then become the basis for trading signals. We have discussed the benefits and drawbacks of various linear regression models in a previous article (view post here). Regression-based learning has shown its ability to combine various macro factors into composite signals in the FX space (view post here) and in the equity space (view post here). It can often be enhanced by using principal components analysis on a set of trading factor candidates (view post here) and by adjusting regression-based trading signals for reliability (view post here).

Random forests are yet another method to improve the generation of trading signals in regression-based statistical learning. The main apparent advantage of random forest regression over OLS is the detection of complex non-linear relations. For example, business sentiment may usually be a positive predictor of equity returns, but extreme values may also signal exuberance and setback risk. Linear regression cannot accommodate such effects. Other advantages of random forest regression are that (a) there is no need for theoretical priors in model formation, and (b) backtests are typically less sensitive to the parameters of the statistical learning pipeline and, hence, more reliable.
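The stylized example below, on synthetic data rather than the post’s dataset, illustrates this point: a predictor that supports returns at moderate levels but hurts them at extreme (“exuberant”) levels is captured by a random forest, while OLS is forced into a single slope:

```python
# Synthetic sketch: non-monotonic predictor-return relation, OLS versus random forest.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
sentiment = rng.normal(size=(2000, 1))
# Positive effect at moderate levels, negative payoff beyond an "exuberance" threshold.
returns = np.where(sentiment[:, 0] < 1.5, 0.5 * sentiment[:, 0], -1.0) \
    + rng.normal(scale=0.5, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(sentiment, returns, random_state=2)
ols = LinearRegression().fit(X_tr, y_tr)
rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=20, random_state=2).fit(X_tr, y_tr)

print("OLS out-of-sample R2:", round(r2_score(y_te, ols.predict(X_te)), 3))
print("RF  out-of-sample R2:", round(r2_score(y_te, rf.predict(X_te)), 3))
```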

Altogether, this suggests that random forest regression should work well for predictions that involve many features, potentially complex relations, and few clear theoretical priors. This is why, for this post, we chose to apply the method to the example case of macro trading signals for cross-sector equity trading, an area where theoretical research and clear intuition are scarce.

Random forest regression can be executed in Python with the RandomForestRegressor class of scikit-learn. It sets the number of trees in the forest, applies a criterion that governs the quality of data splits, and provides various options to manage (a) the depth of the trees, (b) the size of bootstrap samples, and (c) the diversity of the nodes of the trees.
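The snippet below sketches how these options map to the class constructor; the parameter values are placeholders for illustration, not the settings used in the post’s pipeline:

```python
# Illustrative RandomForestRegressor configuration (values are placeholders).
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=100,           # number of trees in the forest
    criterion="squared_error",  # measures the quality of data splits
    min_samples_leaf=5,         # (a) indirectly limits the depth of the trees
    max_samples=0.25,           # (b) share of observations in each bootstrap sample
    max_features="sqrt",        # (c) random feature subset considered at each node
    random_state=0,
)
```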

Example application: equity sector allocation and a macro “kitchen sink”

We apply random forest regression to the prediction of relative sectoral equity returns based on a broad set of macro-quantamental categories. The data set has been introduced in a previous post (Statistical learning for sectoral equity allocation) that demonstrated statistical learning with linear regression. In that post, we investigated the predictive power of a broad range of macro factors for the relative performance of 11 major equity sectors in 12 developed countries over an almost 25-year period since 2000.

  • Macro predictors or factors here are macro-quantamental indicators, i.e., states and trends of the economy in a point-in-time format. These are available from the J.P. Morgan Macrosynergy Quantamental System (“JPMaQS”). Unlike regular economic time series, their values are based solely on information available at the time of record, i.e., original or estimated data vintages. We consider 56 quantamental categories, i.e., indicator panels for 12 countries. These are all categories that were identified in past research as predictors for at least one sector’s relative returns (Macro factors and sectoral equity allocation). However, for the purpose of machine learning, all categories are considered for all sectors, subject to some availability constraints, which is why they have the characteristic of a “kitchen sink” with a minimum of plausible pre-selection. For a list and grouping of the categories, see Annex 1 below.
  • The targets of this strategy are relative equity sector returns, i.e., volatility-targeted returns of one sector versus an all-sectors average of these returns. These can be calculated based on JPMaQS’ sectoral equity index returns for the 11 standard sectors of the “Global Industry Classification Standards” or GICS: Energy (ENG), materials (MAT), industrials (IND), consumer discretionary (COD), consumer staples (COS), health care (HLC), financials (FIN), information technology (ITE), communication services (CSR), utilities (UTL), and real estate (REL). For a brief characterization of each of these sectors, see Annex 2 below. A schematic sketch of the target construction follows this list.
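The sketch below shows the logic of the target construction with toy data; the 10% volatility target and the static in-sample scaling are simplifying assumptions for illustration, not the JPMaQS methodology:

```python
# Toy sketch: volatility-targeted sector returns relative to the all-sectors average.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
sectors = ["ENG", "MAT", "IND", "FIN"]                    # subset of the 11 GICS sectors
dates = pd.bdate_range("2020-01-01", periods=250)
returns = pd.DataFrame(rng.normal(scale=0.01, size=(250, 4)), index=dates, columns=sectors)

# Scale each sector to a common annualized volatility target (10% assumed here).
vol_target = 0.10
scaled = returns * (vol_target / (returns.std() * np.sqrt(252)))

# Relative return: each sector minus the equally weighted all-sectors average.
relative = scaled.sub(scaled.mean(axis=1), axis=0)
print(relative.head())
```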

One common learning pipeline is applied to predicting and trading relative returns of all 11 sectors, each for the panel of 12 countries. This pipeline is used to generate optimal random forests sequentially, based on data panels that expand over time at a monthly frequency. These expanding panels serve as development data sets (joint training and validation sets) for model choice and, once the optimal forest model has been determined, as a sample for parameter estimation and signal generation for the next month. All this is implemented by the SignalOptimizer of the Macrosynergy package. The actual optimization is handled by the calculate_predictions method, which uses model validation rules and a grid of hyperparameters to learn and derive signals sequentially.

  • Validation rules: The training-validation set splitter that is applied to each development data set, i.e., each of the expanding data panels, is a special version of scikit-learn’s TimeSeriesSplit that was adapted for panels: the RecencyKFoldPanelSplit. It is a K-Fold walk-forward cross-validator, but with test folds concentrated on the most recent information. Specifically, here, the validation set is the last 6 months of data, and the training data is the full previous data history. Unlike in previous posts, we use a single validation set instead of cross-validation to contain estimation times. The criterion for forest model evaluation is panel significance probability, which was explained in the article “Testing macro trading factors”. It can be implemented through the panel_significance_probability function of the Macrosynergy package.
  • Hyperparameters: We will build forests of 100 trees each. The hyperparameter grid of the pipeline mainly manages two aspects of their creation (an illustrative grid in scikit-learn terms is sketched after this list).
    • Depth of individual trees: The deeper the trees, the more closely each one fits its own data set and the greater the variance vis-à-vis the bias of the process. The RandomForestRegressor class offers various options to determine depth. In the present example, we use the `min_samples_leaf` argument. It sets a minimum requirement for sample sizes in each node. A split point at any depth will only be considered if it leaves sufficient training samples in both subsets. Setting this parameter greater than one adds regularization to a decision tree. Here, we offer minimum requirements of 1, 3, 6, and 9 observations to the learning process.
    • Diversity of trees within a forest: The trade-off is between better-fitted and more diverse trees. Diversity is managed by two hyperparameters. The `max_samples` parameter determines the number or share of observations drawn from the full dataset for each bootstrap sample. Here, we give the learning process a choice between 10% and 25% of the original sample size. Small bootstrap datasets encourage diversity because the trees see different perspectives of the input data. It is also a form of regularization because it restricts tree growth indirectly. The `max_features` parameter governs the maximum number of features that are considered for the splits at each node. The more features are considered at each split, the greater the accuracy of the individual trees and the lower the diversity across trees. Here, we allow the parameter to be set at 50% or at the square root of the number of available features, a standard setting (which implies about 14% of the features in the present case).
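In scikit-learn parameter names, the grid described above can be written as follows; the exact format expected by the `calculate_predictions` method of the SignalOptimizer may differ, so this is only an illustrative sketch:

```python
# Hyperparameter grid for RandomForestRegressor(n_estimators=100), as described above.
grid = {
    "min_samples_leaf": [1, 3, 6, 9],  # minimum observations per leaf (tree depth/regularization)
    "max_samples": [0.10, 0.25],       # share of the sample drawn for each bootstrap set
    "max_features": [0.5, "sqrt"],     # share or number of features considered at each split
}
```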

The hyperparameter grid is small for this example case. Small grids save calculation time and contain the effect of model instability on simulated performance statistics. A larger number of hyperparameters may better adapt the process to past experiences but also leads to more frequent and greater changes in the optimal random forest regressor. Since model change has no plausible relation to changes in market conditions, it just adds irrelevant variance to the signal time series. Moreover, most of the power of random forests arises from the tree aggregation procedure. This means that, unlike for many other models, extensive hyperparameter grids are not needed. While setting all hyperparameters to their theoretically optimal values would be desirable, the main hyperparameters of interest for a random forest are a small subset of those available.

The charts below illustrate the ten most important predictive macro-quantamental categories according to the random forest models for three example sectors, and their average contributions in each year. The basis for both ranking and feature attribution is impurity-based feature importance, as represented by the scikit-learn `feature_importances_` attribute. The higher the value, the more important the feature. Importance here means the reduction of the split criterion that was accomplished by that feature. For all sectors, the set of most important features has been diverse in concepts and even in impact, illustrating a key strength of random forests.
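As a sketch of how such a ranking can be produced, with random toy data and hypothetical factor names rather than the actual JPMaQS categories, the `feature_importances_` attribute of a fitted forest can be sorted directly:

```python
# Sketch: ranking the ten most important features of a fitted forest via the
# impurity-based feature_importances_ attribute (toy data, hypothetical names).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
feature_names = [f"factor_{i:02d}" for i in range(56)]
X = pd.DataFrame(rng.normal(size=(1000, 56)), columns=feature_names)
y = 0.5 * X["factor_00"] - 0.3 * X["factor_01"] ** 2 + rng.normal(size=1000)

rf = RandomForestRegressor(n_estimators=100, random_state=4).fit(X, y)
top10 = pd.Series(rf.feature_importances_, index=feature_names).nlargest(10)
print(top10)
```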

The predictive power and economic value of random forest signals

We can test the predictive power of the random forest-based macro signals with the Macrosynergy panel test for each of the 11 sectors’ relative returns versus the returns on an all-sectors basket. The test has been explained in a previous post. It estimates the probability that the relationship between the variation in the signals (across time and countries) and subsequent returns has been non-accidental. It is based on panel regression models with period-specific random effects, adjusting targets and features of the predictive regression for common (global) influences. Random forest signals displayed positive predictive relations with subsequent monthly relative returns for all equity sectors since 2003. For six of them, the probability of significance of the relation was 90% or higher. Only the energy sector showed a probability of less than 50%.
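For intuition only, the sketch below mimics the idea of the panel test with a generic mixed-effects regression on toy data; it is not the Macrosynergy implementation, and the variable names are hypothetical:

```python
# Rough sketch of the panel-test idea: regress subsequent relative returns on the
# signal with period-specific random effects and read off the signal's significance.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_periods, n_countries = 120, 12
df = pd.DataFrame({
    "period": np.repeat(np.arange(n_periods), n_countries),
    "signal": rng.normal(size=n_periods * n_countries),
})
df["next_return"] = 0.05 * df["signal"] + rng.normal(scale=1.0, size=len(df))

# Random intercept per period proxies for common (global) influences in each month.
result = smf.mixedlm("next_return ~ signal", data=df, groups=df["period"]).fit()
prob_significance = 1 - result.pvalues["signal"]
print(round(prob_significance, 3))
```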

To assess the economic value of the random forest signals, we simulate simple “naive” PnLs for each sector’s relative positions. They assume that positions are taken in accordance with normalized signals and regular rebalancing at the beginning of each month, according to information at the end of the previous month. We also assume a 1-day slippage for trading. The signals are winsorized, i.e., capped and floored, at a maximum of 2 standard deviations as a reasonable risk limit. Otherwise, the naïve PnLs do not consider any risk management rules or transaction costs. They are thus not realistic backtests of actual financial returns in a specific institutional setting but an objective and undistorted representation of the historic economic value of the signals.
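A stylized version of this naive PnL logic is sketched below with toy data; the full-sample normalization and the single relative-return series are simplifications for illustration, not the post’s implementation:

```python
# Toy sketch of a naive PnL: normalized, winsorized signals, monthly rebalancing
# on prior month-end information, and a 1-day slippage.
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
dates = pd.bdate_range("2020-01-01", periods=500)
rel_returns = pd.Series(rng.normal(scale=0.01, size=500), index=dates)  # relative sector returns
raw_signal = pd.Series(rng.normal(size=500), index=dates)

# Normalize (full-sample, for simplicity) and winsorize at 2 standard deviations.
signal = ((raw_signal - raw_signal.mean()) / raw_signal.std()).clip(-2, 2)

# Positions set at month-end, applied in the following month with a 1-day slippage.
positions = signal.resample("M").last().reindex(dates, method="ffill").shift(2)
naive_pnl = (positions * rel_returns).dropna()
print(naive_pnl.cumsum().tail())
```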

As a rough proxy for an all-sectors PnL, we use a simple average of all sectoral PnLs after volatility scaling. Again, this typically overstates the actual risk parity achievable in live trading but gives an unbiased estimate of the value that would have been generated if all sectoral signals had been weighted equally over time.

The long-term Sharpe ratio (since 2003) of the all-sectors relative-value PnL has been 1.3, and the Sortino ratio 1.8. The correlation of all sectoral strategies with S&P 500 returns has been near zero. The strategy has been seasonal in value generation but posted a positive drift most of the time. The PnL contribution of the 5% best months has been less than 50%. The maximum peak-to-trough drawdown has slightly exceeded the average annual return.

The performance ratios of random forest signals have been higher than those of an analogous strategy that uses the very same data but linear regression-based learning (view post here). Risk-adjusted returns have been about 30% higher, and seasonality has been less pronounced. The consistent value generation across both learning methods attests to the relevance of the underlying macro-quantamental data set.

Across sectors, Sharpe ratios of RV PnLs ranged between 0.2 (consumer staples) and 0.8 (real estate). Individual sector performances have naturally been a lot more seasonal, but most displayed a positive long-term drift, and all ended up with a positive cumulative PnL, testifying to the power of the approach.

Annex 1: Macro factors

For the prediction of sectoral equity returns, we considered a set of macro-quantamental categories to express or calculate simple equity factors with neutral zero values, in accordance with a previous post (Macro factors and sectoral equity allocation). In the table below, they are grouped into 12 broad economic concepts. The geography columns distinguish categories that are calculated for each country based on local data alone (“local”), those that are a weighted average of local and global values, whereby the weight of the global component is assumed to be in accordance with the share of external trade flows in GDP (“weighted global”), and those that naturally only have global values (“global”), such as commodity inventory scores.

Annex 2: Equity sectors

The analysis in this post refers to the following 11 equity sectors of the “Global Industry Classification Standards” (GICS), developed in 1999 jointly by MSCI and Standard & Poor’s. The purpose of the GICS is to help asset managers classify companies and benchmark individual company performances:

  • Energy: The sector comprises companies that support the production and transformation of energy. There are two types. The first type focuses on exploring, producing, refining, marketing, and storing oil, gas, and consumable fuels. The second type provides equipment and services for the oil and gas industries, including drilling, well services, and related equipment manufacturing.
  • Materials: The sector encompasses a wide range of companies engaged in discovering, developing, and processing raw materials. These include chemicals, construction materials (such as cement and bricks), container and packaging materials (such as plastic and glass), base and precious metals, industrial minerals, paper, and other forest products.
  • Industrials: The sector contains a broad range of companies involved in producing goods used in construction and manufacturing (capital goods) as well as providing commercial services and transportation. The area of capital goods includes aerospace and defence, building products, construction and engineering, electrical equipment, industrial conglomerates, and machinery. The commercial services sub-sectors include waste management, office supplies, security services, and professional services (consulting, staffing, and research). The transportation area includes air freight and logistics, airlines, marine transportation, road and rail transportation, and transportation infrastructure companies.
  • Consumer discretionary: This sector comprises companies producing consumer goods and services considered non-essential but desirable when disposable income is sufficient. The main areas are automobiles, consumer durables, apparel, consumer services (such as hotels and restaurants), and various retail businesses.
  • Consumer staples: This sector includes companies that produce and distribute presumed essential consumer products that households purchase regardless of economic conditions. These products mainly include food, beverages, household goods, and personal care items.
  • Health care: The sector includes companies that provide medical services, manufacture medical equipment, or produce drugs. It has two main areas. The first features health care equipment and services. It includes manufacturers of medical products and supplies, providers of health care services (such as hospitals and nursing homes), and companies that provide technology services (such as electronic health records). The second area features research, development, and production of pharmaceuticals, biotechnology, and life sciences tools.
  • Financials: This sector provides financial services, including banking, investment services, insurance, and financial technology (fintech). The four main subsectors are banks, diversified financials (such as asset management, credit cards, and financial exchanges), insurance, and investment trusts.
  • Information technology: This sector includes companies that produce software, hardware, and semiconductors, as well as those that provide IT services, internet services, and interactive media. Software companies produce application software and systems software. Hardware companies provide computers, networking equipment, and consumer electronics. The semiconductor sector manufactures semiconductors and the equipment used for producing the former. IT services include consulting, data processing and outsourced services. Internet services encompass cloud computing, web hosting and data centres. Interactive media include digital platforms, such as Google and Facebook.
  • Communication services: This sector features companies that broadly provide communication services and entertainment content. It contains two main areas. The first is telecommunication services, which provide the means for telecommunication, including traditional fixed-line telephone services, broadband internet services, and wireless telecommunication services. The second area is media and entertainment, which focuses on the creation and distribution of content for broadcasting, home entertainment, movies, music, video games, social media platforms, search engines, and so forth.
  • Utilities: This sector includes companies that provide essential utility services such as electricity and water. Their activities include generation, transmission, and distribution, and they are typically subject to tight regulations. Standard classification distinguishes five types of utilities: electric utilities, gas utilities, water utilities, multi-utilities, and independent power and renewable electricity producers.
  • Real estate: This sector focuses on real estate development and operation. It encompasses property ownership, development, management, and leasing. It also includes Real Estate Investment Trusts (REITs) that invest in various property types.