There is sound reason and evidence for the predictive power of macro indicators for relative sectoral equity returns. However, the relations between economic information and equity sector performance can be complex. Considering the broad range of point-in-time macro categories now available, statistical learning has become a compelling method for discovering macro predictors and supporting prudent and realistic backtests of related strategies. This post shows a simple five-step method to use statistical learning to select and combine macro predictors from a broad set of categories for the 11 major equity sectors in 12 developed countries. The learning process produces signals based on changing models and factors per the statistical evidence. These signals have been positive predictors for relative returns of all sectors versus a broad basket. Combined into a single strategy, these signals create material and uncorrelated investor value through sectoral allocation alone.
The post below is based on Macrosynergy’s proprietary research.
Please quote as “Costa, Michele, Gholkar, Rushil and Sueppel, Ralph, ‘Statistical learning for sectoral equity allocation,’ Macrosynergy research post, September 2024.”
Jupyter notebooks allow audit and replication of the research results. A notebook for factor calculation can be downloaded here. A notebook for subsequent statistical learning can be downloaded here. The notebooks’ operation requires access to J.P. Morgan DataQuery to download data from JPMaQS, a premium service of quantamental indicators. J.P. Morgan offers free trials for institutional clients.
Also, an academic research support program sponsors data sets for relevant projects.
This post ties in with this site’s summary of macro trends and systematic value.
The basic idea
Macroeconomic point-in-time indicators can predict the relative performance of equity sector returns (see previous posts here and here). This reflects that earnings and risk premia across sectors differ in their sensitivities to macroeconomic conditions, such as the state of business cycles, relative price trends, and inflation. Moreover, the absence of published articles on the relation between macroeconomic variables and sector equity returns suggests that this area of research is underdeveloped and that equity markets are far from efficient in using macro information.
In a recent post, we tested the predictive power of pre-selected plausible macro factors for the relative performance of 11 major equity sectors in 12 developed countries over an almost 25-year period since 2000 (view post here). The finding was that conceptual risk parity signals, i.e., simple averages of normalized factors, displayed significant predictive power and, when applied to simple naïve strategies, generated sizeable investor value. However, a challenge in finding macro factors for equity returns is the lack of clear theoretical guidance and the complexity of relations between macroeconomic trends and sectoral corporate balance sheets. This may erode confidence in theoretically and empirically valid factors (“fear of false positive”) and cause investors to overlook the value of less obvious but relevant factors (“prevalence of false negative”).
A valid path to mitigating these problems is statistical learning. The basic idea is that rather than pre-selecting a set of plausible macro factors for each sector’s outperformance, we merely select a large range of macro factors that could conceivably exercise predictive power. Then, a learning process is applied sequentially, each month selecting the best prediction method and most relevant macro predictors for each sector’s relative returns. After selecting predictors, various types of regression are considered for aggregating the selected factors into a single sectoral signal. The most successful selection-prediction method based on history to that date is then chosen to make a prediction. There are two major benefits of this learning process.
- Prudent backtests: The peril of ad-hoc predictor pre-selection is a hindsight bias. Theoretical hypotheses often develop based on a practitioner’s historical experience or, worse, are derived by data mining. The learning backtests fully ignore such knowledge. They may be unnecessarily ignorant in the early years of the backtested sample. However, they invite less upside bias and moral hazard, which are typically greater threats to success in live trading.
- Discovery of hidden or subtle factors: Sectoral balance sheets and economic dependencies are very information-intensive, and even researchers with the best domain knowledge may overlook relations. Statistical learning has a better chance of eventually discovering less obvious relations.
The drawback of the method is its waiver of reasonable priors and theory and, hence, the greater risk that temporary episodes dominate predictions, even if the environment has changed. This is particularly important in the macro space, where certain conditions can be prevalent for years or even decades. Thus, in macro-trading practice, the reliance on statistical learning involves a bias-variance trade-off. Statistical learning reduces bias by considering a wide array of factors beyond the scope of personal judgment and conventional wisdom but increases variance, as predictions are more dependent on past experiences than general theory or long-term structural propositions.
The data
Equity sector return data
The strategy targets of this post are cash equity excess returns for the 11 standard sectors of the “Global Industry Classification Standards” or GICS: Energy (ENG), materials (MAT), industrials (IND), consumer discretionary (COD), consumer staples (COS), health care (HLC), financials (FIN), information technology (ITE), communication services (CSR), utilities (UTL), and real estate (REL). For a brief characterization of each of these sectors, see Annex 1 below.
Sectoral cash equity return data have been imported from JPMaQS for 12 economies (view documentation here), which are (alphabetically by currency symbol): Australia (AUD), Canada (CAD), Switzerland (CHF), the euro area (EUR), the UK (GBP), Israel (ILS), Japan (JPY), Norway (NOK), New Zealand (NZD), Sweden (SEK), Singapore (SGD), and the U.S. (USD). The underlying equity return data comes from the J.P. Morgan SIFT database. SIFT stands for Strategic Indices Fundamental Toolkit.
The focus of the research below is on relative volatility-targeted cash equity returns. Relative here means sector return minus the return of the equally weighted basket of all 11 sectors. The volatility-targeted returns are returns on a cash position in an index that is scaled to a 10% volatility target, based on the historical standard deviation for an exponential moving average with a half-life of 11 days. Positions are rebalanced at the end of each month.
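As a rough illustration of how such target returns could be constructed, the sketch below scales daily sector returns to a 10% volatility target with month-end rebalancing and then subtracts the equally weighted basket. The function name, the 252-day annualisation, and the order of operations (volatility scaling first, basket subtraction second) are assumptions of this example and are not taken from the JPMaQS documentation.

```python
import numpy as np
import pandas as pd


def vol_targeted_relative_returns(returns, vol_target=10.0, halflife_days=11):
    """returns: daily sector returns in % (rows = business days, columns = sectors)."""
    # Exponentially weighted standard deviation, annualised with 252 days (assumption)
    ann_vol = returns.ewm(halflife=halflife_days).std() * np.sqrt(252)
    leverage = vol_target / ann_vol
    # Leverage is fixed at each month end and applied throughout the following month
    month = returns.index.to_period("M")
    lev_month_end = leverage.groupby(month).last().shift(1)
    lev_daily = pd.DataFrame(
        lev_month_end.reindex(month).to_numpy(),
        index=returns.index,
        columns=returns.columns,
    )
    vt_returns = returns * lev_daily
    # Relative return: sector minus the equally weighted basket of all sectors
    return vt_returns.sub(vt_returns.mean(axis=1), axis=0)


# Simulated data for three hypothetical sector columns
rng = np.random.default_rng(0)
days = pd.bdate_range("2020-01-01", periods=120)
daily = pd.DataFrame(
    rng.normal(0.0, 1.0, size=(120, 3)), index=days, columns=["ENG", "MAT", "IND"]
)
relative = vol_targeted_relative_returns(daily)
```

The first month has no values because no earlier month-end volatility estimate exists. By construction, the relative returns of all sectors sum to zero on each day.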
Macro predictor data
Macro predictors or factors here are macro-quantamental indicators, i.e., states and trends of the economy in a point-in-time format. These are available from the J.P. Morgan Macrosynergy Quantamental System (“JPMaQS”). Unlike regular economic time series, their values are based solely on information available at the time of record, i.e., original or estimated data vintages. Therefore, macro-quantamental indicators can be compared to market price data and are well-suited for backtesting trading ideas and implementing algorithmic strategies.
We consider a broad range of 56 quantamental categories, i.e., indicators that are available for all 12 economies or as a global influence. The quantamental categories can be roughly divided into 12 groups: excess output growth, excess private consumption and retail sales growth, excess export growth, labour market indicators, business survey scores, excess private credit growth, excess broad inflation, excess specific (food and energy) CPI inflation, debt and debt servicing ratios, commodity inventory scores, openness-adjusted real appreciation and commodity-based terms of trade, and interest rate and market metrics. This selection is the same as that used for a previous post. It has been left unchanged for consistency and is not exhaustive. In principle, the whole content of JPMaQS would be eligible for factor candidates, but this would make the operation of the related notebook more time-consuming.
The table in Annex 2 summarizes all macro-quantamental categories considered. They come in three geographical flavours. Local categories, such as economic growth, are specific to each of the 12 economies. Global categories, such as commodity inventories, are single communal indicators that are used to predict sectoral equity returns in all countries. Finally, weighted categories are weighted averages of local and global values, whereby international weights are set in accordance with the share of external trade flows to GDP. Which one is appropriate depends on the nature of the sector and macro category. For example, the impact of real interest rates on banks is mostly local, but the impact of industry growth on manufacturers is typically a combination of local and global conditions.
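The idea of a weighted category can be sketched in a few lines. The exact weighting formula is not stated in this post, so the linear blend below, the function name, and the clipping of the trade share to the range 0 to 1 are all assumptions made purely for illustration.

```python
def weighted_category(local_value, global_value, trade_share):
    """Blend local and global indicator values (hypothetical formula).

    trade_share: external trade flows as a share of GDP, used as the
    weight of the global value (assumption of this sketch).
    """
    w = min(max(trade_share, 0.0), 1.0)  # keep the weight between 0 and 1
    return (1.0 - w) * local_value + w * global_value


# A fairly closed economy leans on local conditions, an open one on global conditions
closed_economy = weighted_category(local_value=2.0, global_value=1.0, trade_share=0.25)
open_economy = weighted_category(local_value=2.0, global_value=1.0, trade_share=0.75)
```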
The application of statistical learning
The detection and application of macro factors with statistical learning follows the basics developed in previous posts (view here and here) and proceeds in five steps:
- We prepare suitable panel data sets for statistical learning pipelines with scikit-learn, weeding out categories with insufficient data and imputing missing data in individual cross sections.
- We define a small range of feature selection methods and feature combination methods for the statistical learning process.
- We define suitable cross-validation splitters and performance criteria for the cross-validation of the statistical learning process.
- We operate specialized Python functions based on scikit-learn that manage sequential monthly model selection and signal generation.
- We test the predictive power and economic value of the learning-based signals through standard procedures used in Macrosynergy posts.
Category exclusions and cross-sectional imputations
This step mainly deals with quantamental categories that have insufficient data, i.e., that are not available for most cross sections or have a short history. A standard statistical learning process requires all feature candidates to be available for a country and date to generate a signal. Even a single missing category out of over 50, in principle, translates into a missing signal. There are two ways to prevent missing data from escalating the loss of historic signal generation:
- Exclusion means that we remove categories that fail criteria on available cross-sections or history. Here, we require categories to be available for at least ten countries (at some point over the sample period) and have a history from 2003.
- Imputation means that under certain conditions, we fill in missing values for an individual cross section and date with the average indicator value of the available cross sections for that date. At least 40% of the cross sections must have valid data on any given date for the remaining ones to be imputed. Also, imputation is disallowed for some categories that are purely idiosyncratic and for which global averages are not meaningful estimates, namely real effective exchange rates and terms of trade changes.
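The two rules above can be sketched with pandas for a single category, arranged as a table with dates in rows and cross sections in columns. The function names and the reading of "a history from 2003" as a first observation no later than the end of 2003 are assumptions of this example; the Macrosynergy notebooks may implement these steps differently.

```python
import numpy as np
import pandas as pd


def passes_exclusion(panel, min_cross_sections=10, latest_start="2003-12-31"):
    """Check availability criteria for one category (rows = dates, columns = countries)."""
    n_available = panel.dropna(axis=1, how="all").shape[1]
    first_valid = panel.dropna(how="all").index.min()
    return bool(
        n_available >= min_cross_sections and first_valid <= pd.Timestamp(latest_start)
    )


def impute_cross_sectional_mean(panel, min_share=0.4):
    """Fill gaps with the cross-sectional mean of the same date.

    A date is only imputed if at least `min_share` of cross sections have data.
    """
    share_valid = panel.notna().mean(axis=1)
    fill_values = panel.mean(axis=1).where(share_valid >= min_share)
    return panel.apply(lambda col: col.fillna(fill_values))


# Toy panel: first date has 60% coverage, second date only 20%
dates = pd.to_datetime(["2002-01-31", "2002-02-28"])
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, np.nan, np.nan], [4.0, np.nan, np.nan, np.nan, np.nan]],
    index=dates,
    columns=["AUD", "CAD", "CHF", "EUR", "GBP"],
)
imputed = impute_cross_sectional_mean(toy)
```

In the toy panel, the gaps on the first date are filled with the mean of 2.0, while the second date stays incomplete because too few cross sections report a value.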
For the below analysis, we also needed to blacklist certain sectors for periods during which they were not tradeable. This is typically the case when a smaller country does not have companies representing the sector. This is accomplished by using so-called blacklist dummies that can be downloaded from JPMaQS (documentation here).
Setting up feature selection and prediction methods
Statistical learning uses scikit-learn functions and specialized wrappers for macro-quantamental data panels. The learning models or pipelines all have two parts: a “selector” that chooses the predictors that deliver the best explanatory power and a “predictor” that delivers regression-based return forecasts based on a combination of chosen predictors.
- The selector method is Least Angle Regression (LARS), an algorithm suitable for high-dimensional data, i.e., datasets with a high number of predictor candidates relative to available observations. LARS operates a particular type of forward stepwise regression. At the outset, the coefficients of all predictor candidates are set to zero. Then it moves the coefficient of the most correlated predictor towards its least-squares value while also considering other variables. LARS changes direction toward the new predictor when another predictor becomes equally correlated with the residuals. This continues until the desired number of predictors with non-zero coefficients has been reached.
For this purpose, the implementation of LARS uses the LarsSelector class of the Macrosynergy package to select categories based on panel regressions. It considers the historic predictive power of predictor panels on target panels rather than just for individual cross-sections. A category needs to have predictive power across the full set of 12 economies or countries to be selected. The learning process decides the optimal number of categories that shall be selected.
- The predictor methods are regression types that can be used to combine different quantamental categories into single trading signals. Here, we consider simple linear regression, sign-weighted least squares, and time-weighted least squares. Sign-weighted least squares (SWLS) equalize the contribution of positive and negative samples to the model fit. Time-weighted least squares (TWLS) allow prioritizing more recent information in the model fit by defining a half-life of exponential decay in units of the native dataset frequency. The usage of these methods in the context of statistical learning and their relative strengths and weaknesses have been explained in a previous post (view here). Here we use modified versions of these regression types. They are implemented through the ModifiedLinearRegression, ModifiedSignWeightedLinearRegression, and ModifiedTimeWeightedLinearRegression classes of the Macrosynergy package. The term “modified” here means that predictions use regression coefficients that are adjusted for statistical precision, which tends to increase as sample sizes grow. This technique has been explained in another post (view here).
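The selector-predictor logic can be illustrated with plain scikit-learn on simulated data. This is only a sketch under simplifying assumptions: it uses scikit-learn's Lars and LinearRegression directly rather than the Macrosynergy classes named above, it ignores the panel structure, and it omits the "modified" coefficient adjustment. The helper names lars_select, sign_weights, and time_weights are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lars, LinearRegression


def lars_select(X, y, n_factors):
    """Indices of the predictors that LARS leaves with non-zero coefficients."""
    lars = Lars(n_nonzero_coefs=n_factors, fit_intercept=True).fit(X, y)
    return np.flatnonzero(lars.coef_)


def sign_weights(y):
    """Sample weights that give positive and negative targets equal total weight."""
    pos = y >= 0
    w = np.where(pos, 1.0 / max(pos.sum(), 1), 1.0 / max((~pos).sum(), 1))
    return w / w.sum()


def time_weights(n_obs, half_life):
    """Exponentially decaying weights; the last (most recent) row weighs most."""
    age = np.arange(n_obs)[::-1]
    w = 0.5 ** (age / half_life)
    return w / w.sum()


# Simulated data: 8 candidate factors, only columns 1 and 5 drive the target
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
y = 0.8 * X[:, 1] - 0.6 * X[:, 5] + rng.normal(scale=0.5, size=300)

selected = lars_select(X, y, n_factors=2)

# Predictor stage: weighted regressions on the selected factors only
twls = LinearRegression().fit(
    X[:, selected], y, sample_weight=time_weights(len(y), half_life=24)
)
swls = LinearRegression().fit(X[:, selected], y, sample_weight=sign_weights(y))
```

In the learning process described in this post, the number of selected factors and the half-life are hyperparameters chosen by cross-validation rather than fixed in advance as they are here.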
Setting cross-validation splitters and optimization criterion
Optimal selection and prediction models require cross-validation for expanding samples. To operate such learning, we must set a splitting method for the data panel available at each point in time and a statistical criterion for validation.
- Train-test splitter: Cross-validation compares the predictive quality of the combined selection-prediction models based on multiple splits of the data into training and test sets. Each pair is called a “fold.” In the case of panel data, these splits must respect the continuity of training and test sets based on a double index of cross-sections and time periods, ensuring that all sets are sub-panels over common adjacent time spans. This was explained in a previous post (see here).
Here, the splitting is governed by the ExpandingKFoldPanelSplit class of the Macrosynergy package. It allows instantiating panel splitters where a fixed number of splits is implemented, but temporally adjacent panel training sets always precede test sets chronologically and where the time span of the training sets increases with the implied date of the train-test split. It is equivalent to scikit-learn’s TimeSeriesSplit but adapted for panels.
- Evaluation criterion: The metric that validates the quality of the selection-prediction models with respect to predicting target returns here is the Sharpe ratio of a stylized binary trading strategy. This is implemented by the sharpe_ratio function of the Macrosynergy package. It returns a Sharpe ratio for a stylized strategy that goes long if the predictions are positive and short if the predictions are negative. It is a bit like an accuracy metric. The advantage of the Sharpe ratio versus a residuals-based criterion, such as R-squared, is that it is closer to the ultimate purpose of the model selection and a fairer basis for comparing different sets of predictors that arise from different selections across folds. If the predictors chosen across folds are unstable, then k-fold cross-validation with R-squared is more prone to outliers than the Sharpe ratio of a binary strategy.
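Both ingredients can be sketched with NumPy alone. The functions below are illustrative stand-ins, not the Macrosynergy implementations: the way dates are cut into equal blocks, the monthly annualisation factor of 12, and the function names are assumptions of this example.

```python
import numpy as np


def expanding_panel_splits(dates, n_splits):
    """Yield (train_idx, test_idx) row indices for a panel stacked over countries.

    Splits are made on unique dates, so all countries of a date stay together,
    and each training set spans everything before its test block.
    """
    dates = np.asarray(dates)
    blocks = np.array_split(np.unique(dates), n_splits + 1)
    for k in range(1, n_splits + 1):
        train_idx = np.flatnonzero(dates <= blocks[k - 1][-1])
        test_idx = np.flatnonzero(np.isin(dates, blocks[k]))
        yield train_idx, test_idx


def binary_sharpe(y_true, y_pred, periods_per_year=12):
    """Annualised Sharpe ratio of going long on positive and short on negative predictions."""
    pnl = np.sign(y_pred) * y_true
    sd = pnl.std(ddof=1)
    return 0.0 if sd == 0 else float(np.sqrt(periods_per_year) * pnl.mean() / sd)


# Toy panel index: 12 months for each of 3 countries, stacked by month
panel_dates = np.repeat(np.arange(12), 3)
splits = list(expanding_panel_splits(panel_dates, n_splits=3))
train_sizes = [len(train) for train, _ in splits]
```

Only the sign of the prediction enters the criterion, which is why it behaves like an accuracy metric that is weighted by the size of the subsequent returns.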
Sequential learning and signal generation for each sector
For each of the 11 equity sectors, statistical learning chooses sequentially optimal combined selection-prediction methods and derives related signals. These are weighted averages of the predictors, whose weights are regression coefficients adjusted for statistical precision. Sequentially here means monthly. Simply put, at the end of each month, the optimal selection-prediction method at that time is chosen to produce a signal for each country’s relative sector position at the beginning of the next month. All this is managed by the SignalOptimizer class of the Macrosynergy package.
This means that for each sector, the signal-generating method, the selected predictors, and their weights change over time. The benefits are that hyperparameter selection is informed by empirical experience and backtests become more objective and reliable. The drawback is that changing models and predictions become a source of instability in signals that is unrelated to markets. The following graphics illustrate the learning process for the example of relative value positions in the energy sector over the past 22 years. Although data are mostly available from 2000, the analysis of learning signals typically only begins in 2003 because minimum data requirements must be met first.
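The monthly sequence described above can be sketched as a simple loop. This is a heavily simplified stand-in for the SignalOptimizer workflow, not its actual code: it uses a single time series instead of a panel, only varies the half-life of a weighted regression, and all function names, the candidate set, and the minimum of 36 observations are assumptions of this example.

```python
import numpy as np


def fit_wls(X, y, w):
    """Weighted least squares with intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def time_weights(n, half_life):
    """Equal weights if half_life is None, else exponential decay towards the past."""
    return np.ones(n) if half_life is None else 0.5 ** (np.arange(n)[::-1] / half_life)


def cv_score(X, y, half_life, n_splits=3):
    """Mean out-of-sample Sharpe of a sign-based strategy over expanding folds."""
    bounds = np.linspace(0, len(y), n_splits + 2).astype(int)
    scores = []
    for k in range(1, n_splits + 1):
        train, test = slice(0, bounds[k]), slice(bounds[k], bounds[k + 1])
        beta = fit_wls(X[train], y[train], time_weights(bounds[k], half_life))
        pnl = np.sign(predict(beta, X[test])) * y[test]
        sd = pnl.std(ddof=1)
        scores.append(pnl.mean() / sd if sd > 0 else 0.0)
    return float(np.mean(scores))


def sequential_signals(X, y, candidates, min_obs=36):
    """Row t pairs end-of-month features with next-month returns.

    For each t, the best candidate on rows before t is refitted and applied to X[t].
    """
    signals, chosen = np.full(len(y), np.nan), [None] * len(y)
    for t in range(min_obs, len(y)):
        scores = {name: cv_score(X[:t], y[:t], hl) for name, hl in candidates.items()}
        best = max(scores, key=scores.get)
        beta = fit_wls(X[:t], y[:t], time_weights(t, candidates[best]))
        signals[t], chosen[t] = predict(beta, X[t : t + 1])[0], best
    return signals, chosen


# Simulated monthly data with one genuinely predictive factor
rng = np.random.default_rng(1)
X = rng.normal(size=(72, 2))
y = X[:, 0] + 0.5 * rng.normal(size=72)
candidates = {"ols": None, "twls_12": 12.0, "twls_24": 24.0}
signals, chosen = sequential_signals(X, y, candidates)
```

The list of chosen models plays the role of the selection map discussed below, and the empty first 36 months mirror the delayed start of signals caused by minimum data requirements.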
The black bars in the selection map below indicate the use of various models and hyperparameter versions over time. Model change has been frequent, typically every 2-6 months. There has been a clear preference for time-weighted least-squares models, suggesting that the relevance of macro factors is seasonal, i.e., factors’ predictive powers change with the economic regime, calling for greater weight of the experiences of recent months or years. The learning process also preferred relatively short lookback windows, with exponential half-lives of 12-24 months. Thus far, the learning process has not zeroed in on a single method but converged on a set of closely related time-weighted least-squares regressions with intercept, short half-lives, and 5-10 pre-selected features.
The selection and weights of macro factors also have not yet converged in a clear and stable fashion. This reflects that the learning process preferred models with fast-decaying lookback windows. Altogether, 44 different features have at some point been selected and applied since 2003, but 10 of them have been dominant. In recent years, the main positive coefficients were given to import and export price growth (which are related to the energy prices in a country). The main negative coefficients apply to excess inventories of crude oil and base metals (which bode for softer energy demand ahead).
The below signal heatmap shows that both the direction of positions and the implied risk-taking have been variable across time. Since signals conceptually apply to relative vol-targeted positions, they can be seen as a rough proxy to signal-related risk of positions. Preference for long or short positions in the energy sector has been strongly correlated across countries, which results from the influence of global factors, such as commodity inventories and prices, and international correlation of many local factors, such as manufacturing business confidence. The intensity of signals depends on the strength and alignment of selected factors.
Signal quality assessment and backtest
We apply two standard checks to assess the quality of the monthly trading signals for relative sector positions. These tests have been explained in greater detail in a previous post.
Predictive power check: The first check tests the significance of the predictive power of end-of-month signals for the sector’s relative returns for the next month. It looks at the statistical significance of the signal coefficient of a panel regression with period-specific random effects (view post here). This type of regression adjusts targets and features of the predictive regression for common (global) influences. It looks at the experiences of all countries while considering common global factors in target returns and features. A relation is significant if (a) signal and subsequent returns are related over time and (b) if the country with the stronger signal relative to the cross-sectional mean tends to experience a higher subsequent return. The stronger the influence of global factors, the greater the weight of deviations from the period-mean in the regression.
The test can be implemented in Python with the CategoryRelations class of the Macrosynergy package. In particular, the reg_scatter method of this class displays scatter plots and regression lines in conjunction with the results of the panel regression test. The chart below illustrates the results for the example energy sector. End-of-month learning-based signals have been positively related to subsequent relative sector returns, with a probability of significance of over 97%.
Economic value check: The second check estimates and plots the long-term cumulative naïve profit and loss of a sector relative-value strategy. Naïve PnLs can be calculated by taking positions in accordance with normalized signals and regular rebalancing at the beginning of each month, in accordance with signals at the end of the previous month, allowing for a 1-day time-lapse for trading. The trading signals here are capped at a maximum of 3 standard deviations as a reasonable risk limit. A naïve PnL does not consider transaction costs or risk management tools. It is thus not a realistic backtest of actual financial returns in a specific institutional setting but an objective and undistorted representation of the economic value of the signals.
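A naïve PnL of this kind can be sketched on monthly data as follows. This is an illustration rather than the NaivePnL implementation: the normalisation by an expanding root-mean-square of the pooled signals, the 12-month minimum, and the function name are assumptions. The one-day trading lapse cannot be shown at monthly frequency and would require an additional one-day shift in daily data.

```python
import numpy as np
import pandas as pd


def naive_pnl(signals, returns, cap=3.0, min_periods=12):
    """signals: month-end signals; returns: relative returns of the same months.

    Both are DataFrames with months in rows and countries in columns.
    """
    # Point-in-time scale: expanding root-mean-square of all signals seen so far
    mean_sq = (signals**2).mean(axis=1)
    scale = np.sqrt(mean_sq.expanding(min_periods=min_periods).mean())
    # Normalise and cap at 3 standard deviations as a risk limit
    z = signals.div(scale, axis=0).clip(lower=-cap, upper=cap)
    # The signal at the end of month t-1 sets the position held during month t
    positions = z.shift(1)
    return (positions * returns).sum(axis=1, min_count=1)


# Simulated panel: 60 months, 3 countries, returns weakly related to lagged signals
rng = np.random.default_rng(7)
months = pd.period_range("2005-01", periods=60, freq="M")
sig = pd.DataFrame(rng.normal(size=(60, 3)), index=months, columns=["AUD", "EUR", "USD"])
ret = 0.2 * sig.shift(1) + rng.normal(size=(60, 3))
pnl = naive_pnl(sig, ret)
cumulative_pnl = pnl.cumsum()
```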
The naïve PnL can be calculated and plotted using the NaivePnL class of the Macrosynergy package. PnL generation has been positive overall, with a long-term Sharpe ratio of 0.7 and a slightly negative correlation with the S&P500. Value generation has been very seasonal, as is typical for single-principle strategies with correlated positions. In this case, the strategy could only make money in periods when there was actually a sustained positive or negative trend in the relative energy sector performance.
The general predictive power and PnL generation
This section evaluates the predictive power and value generation of a cross-sector relative value strategy that encompasses all 11 major equity sectors. The scatters and panel tests below show consistent positive and (often) significant predictive power of learning-based signals with respect to subsequent relative sectoral returns for the 12-country panels. The probability of significance of the relation has been above 90% for 5 of the 11 sectors and above 80% for nine sectors. Predictive power has been on the low side for the heavily regulated sectors of healthcare and utilities.
A global PnL relative sector strategy has been approximated by a simple unweighted average of RV PnLs for all sectors. This strategy allocates the same risk capital to the signals of all sectoral strategies, recognizing that signals are conceptually comparable and have similar orders of magnitude. This PnL can be interpreted as the value-added of the sector allocation element of a broader equity strategy.
The long-term Sharpe ratio of the global cross-sector strategy has been 1.2, and its Sortino ratio has been 1.8. There has been almost no correlation of the PnL with equity benchmark returns. Value generation has been mildly seasonal but not heavily concentrated. The share of the best-performing 5% of months in long-term PnL generation has been below 50%, which is modest for macro strategies. There have been no protracted periods of drawdowns. Considering that all aspects of value generation, from the macro indicators to the model choice, are point-in-time and free of hindsight, this is very strong evidence of the economic value behind the data and the methodology.
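The aggregation and the performance statistics quoted above can be computed along the following lines. The data are simulated, and the conventions used here (monthly PnLs, annualisation by the square root of 12, a Sortino ratio based on downside deviation around zero) are common choices assumed for this sketch rather than the exact definitions behind the reported numbers.

```python
import numpy as np


def sharpe_ratio(pnl, periods_per_year=12):
    """Annualised mean over annualised standard deviation."""
    return float(np.sqrt(periods_per_year) * pnl.mean() / pnl.std(ddof=1))


def sortino_ratio(pnl, periods_per_year=12):
    """Annualised mean over downside deviation (target return of zero)."""
    downside = np.sqrt(np.mean(np.minimum(pnl, 0.0) ** 2))
    return float(np.sqrt(periods_per_year) * pnl.mean() / downside)


def top_share(pnl, quantile=0.05):
    """Share of total PnL earned in the best-performing `quantile` of periods."""
    n_top = max(1, int(np.ceil(quantile * len(pnl))))
    return float(np.sort(pnl)[-n_top:].sum() / pnl.sum())


# Simulated monthly PnLs of 11 sector strategies over 20 years
rng = np.random.default_rng(3)
sector_pnls = rng.normal(0.3, 1.0, size=(240, 11))

# Global strategy: simple unweighted average across sectors
global_pnl = sector_pnls.mean(axis=1)
stats = {
    "sharpe": sharpe_ratio(global_pnl),
    "sortino": sortino_ratio(global_pnl),
    "top_5pct_share": top_share(global_pnl),
}
```

Equal weighting is reasonable here because, as noted above, the sectoral signals are conceptually comparable and of similar magnitude.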
The long-term value generation of a cross-sector strategy based on statistical learning signals has been similar in magnitude to that of using conceptual parity signals with factors based on convention and plausibility, which was described in a previous post.
Annex 1: Equity sectors
The analysis in this post refers to the following 11 equity sectors of the “Global Industry Classification Standards” (GICS), developed in 1999 jointly by MSCI and Standard & Poor’s. The purpose of the GICS is to help asset managers classify companies and benchmark individual company performances:
- Energy: The sector comprises companies that support the production and transformation of energy. There are two types. The first type focuses on exploring, producing, refining, marketing, and storing oil, gas, and consumable fuels. The second type provides equipment and services for the oil and gas industries, including drilling, well services, and related equipment manufacturing.
- Materials: The sector encompasses a wide range of companies engaged in discovering, developing, and processing raw materials. These include chemicals, construction materials (such as cement and bricks), container and packaging materials (such as plastic and glass), base and precious metals, industrial minerals, paper, and other forest products.
- Industrials: The sector contains a broad range of companies involved in producing goods used in construction and manufacturing (capital goods) as well as providing commercial services and transportation. The area of capital goods includes aerospace and defence, building products, construction and engineering, electrical equipment, industrial conglomerates, and machinery. The commercial services sub-sectors include waste management, office supplies, security services, and professional services (consulting, staffing, and research). The transportation area includes air freight and logistics, airlines, marine transportation, road and rail transportation, and transportation infrastructure companies.
- Consumer discretionary: This sector comprises companies producing consumer goods and services considered non-essential but desirable when disposable income is sufficient. The main areas are automobiles, consumer durables, apparel, consumer services (such as hotels and restaurants), and various retail businesses.
- Consumer staples: This sector includes companies that produce and distribute presumed essential consumer products that households purchase regardless of economic conditions. These products mainly include food, beverages, household goods, and personal care items.
- Health care: The sector includes companies that provide medical services, manufacture medical equipment, or produce drugs. It has two main areas. The first features health care equipment and services. It includes manufacturers of medical products and supplies, providers of health care services (such as hospitals and nursing homes), and companies that provide technology services (such as electronic health records). The second area features research, development, and production of pharmaceuticals, biotechnology, and life sciences tools.
- Financials: This sector provides financial services, including banking, investment services, insurance, and financial technology (fintech). The four main subsectors are banks, diversified financials (such as asset management, credit cards, and financial exchanges), insurance, and investment trusts.
- Information technology: This sector includes companies that produce software, hardware, and semiconductors, as well as those that provide IT services, internet services, and interactive media. Software companies produce application software and systems software. Hardware companies provide computers, networking equipment, and consumer electronics. The semiconductor sector manufactures semiconductors and the equipment used for producing the former. IT services include consulting, data processing and outsourced services. Internet services encompass cloud computing, web hosting and data centres. Interactive media include digital platforms, such as Google and Facebook.
- Communication services: This sector features companies that broadly provide communication services and entertainment content. It contains two main areas. The first is telecommunication services, which provide the means for telecommunication, including traditional fixed-line telephone services, broadband internet services, and wireless telecommunication services. The second area is media and entertainment, which focuses on the creation and distribution of content for broadcasting, home entertainment, movies, music, video games, social media platforms, search engines, and so forth.
- Utilities: This sector includes companies that provide essential utility services such as electricity and water. Their activities include generation, transmission, and distribution, and they are typically subject to tight regulations. Standard classification distinguishes five types of utilities: electric utilities, gas utilities, water utilities, multi-utilities, and independent power and renewable electricity producers.
- Real estate: This sector focuses on real estate development and operation. It encompasses property ownership, development, management, and leasing. It also includes Real Estate Investment Trusts (REITs) that invest in various property types.
Annex 2: List of macro factors
The below indicators have been explained in a previous post and notebook (view here).