Using principal components to construct macro trading signals


Principal Components Analysis (PCA) is a dimensionality reduction technique that condenses the key information from a large dataset into a smaller set of uncorrelated variables called “principal components.” This smaller set often functions better as features for predictive regressions, stabilizing coefficient estimates and reducing the influence of noise. In this way, principal components can improve statistical learning methods that optimize trading signals.

This post shows how principal components can serve as building blocks of trading signals for developed market interest rate swap positions, condensing the information of macro-quantamental indicators on inflation pressure, activity growth, and credit and money expansion. Compared to a simple combination of these categories, PCA-based statistical learning methods have produced materially higher predictive accuracy and backtested trading profits. PCA methods have also outperformed non-PCA-based regression learning. PCA-based statistical learning in backtesting leaves little scope for data mining or hindsight, and the discovery of trading value has high credibility.

The post below is based on Macrosynergy’s proprietary research.
Please quote as “Gholkar, Rushil and Sueppel, Ralph, ‘Using principal components to construct macro trading signals,’ Macrosynergy research post, October 2024.”

A Jupyter notebook for audit and replication of the research results can be downloaded here. The notebook operation requires access to J.P. Morgan DataQuery to download data from JPMaQS. Everyone with DataQuery access can download the data, except for the most recent six months of history. Moreover, J.P. Morgan offers free trials on the full dataset for institutional clients. For others, an academic research support program sponsors data sets for relevant projects.

This post ties in with this site’s summary of “Quantitative Methods For Macro Information Efficiency”.

The basics of principal components analysis

Principal Components Analysis (PCA) is a dimension-reduction technique for large datasets. It transforms a multitude of original data series into a reduced dataset by detecting the most important patterns in the historical data. The transformed series are “principal components”. They are always linear combinations of the original series and retain a large part of the information of the original data set. The principal components are uncorrelated and ordered such that the first few components capture most of the variability in the data. Mathematically, they are derived from the eigenvectors of the covariance matrix of the original data. The eigenvalues represent the amount of overall variance explained by each principal component. Geometrically, PCA transforms the original data into a new coordinate system where the axes, which represent the principal components, are aligned in the direction of maximum variance.

The computational part of principal component analysis typically proceeds in five steps (a minimal code sketch follows the list):

  1. Standardize the original data series so that each has a mean of 0 and a standard deviation of 1.
  2. Compute the covariance matrix of these series.
  3. Find the eigenvalues and eigenvectors of this covariance matrix. Eigenvectors indicate the directions of the principal components, while eigenvalues show how much variance lies along each direction.
  4. Select principal components based on these eigenvalues. One can either fix the number of selected components or the share of variation that they are to represent.
  5. Project the original data onto the new principal components, which represent a lower dimensional space. This is a simple matrix multiplication.
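For concreteness, the five steps can be sketched in a few lines of NumPy. This is purely illustrative: the array `X`, the function name, and the fixed choice of three components are assumptions for this sketch rather than part of the analysis in this post.

```python
import numpy as np

def pca_by_hand(X: np.ndarray, n_components: int = 3):
    """Minimal PCA on a (T x N) array of N series over T periods."""
    # 1. Standardize each series to zero mean and unit standard deviation
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized series
    cov = np.cov(Z, rowvar=False)
    # 3. Eigenvalues (variance along each direction) and eigenvectors (directions)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # order components by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Select the leading components (here: a fixed number)
    loadings = eigvecs[:, :n_components]
    # 5. Project the (standardized) data onto the selected components
    return Z @ loadings, eigvals / eigvals.sum()
```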

Note that standardizing and centring the data are important for PCA. If one does not centre, the first principal component can point in the direction of the mean rather than capturing the largest source of variation. The extent of this effect depends on the distance of the original data from the origin. The principal components analysis in this post uses the PCA class of the scikit-learn package, which employs singular value decomposition of the data to project the series onto a lower-dimensional space.
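The equivalent operation in scikit-learn is compact; the 95% variance threshold below is just an example value, and `X` is again an assumed array of original series:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

Z = StandardScaler().fit_transform(X)      # X: (T x N) array of original series
pca = PCA(n_components=0.95)               # float in (0, 1): cumulative variance share
components = pca.fit_transform(Z)          # principal component series
print(pca.explained_variance_ratio_)       # share of variance explained per component
```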

The benefits and drawbacks of principal components as macro trading signals

Principal components can be gainfully applied in predictive regressions that estimate the relationship between factor candidates and target returns. Predictive regressions, in turn, are an important basis of statistical learning for the development of macro trading factors (view post here). Principal components are a substitute for large, correlated sets of features in predictive regressions. The use of principal components in predictive regressions offers various benefits:

  • As principal components are orthogonal, they remove the multicollinearity of features. Multicollinearity undermines the reliability and interpretability of predictive regressions through unstable coefficient estimates, inflated standard errors of parameter estimates, and overfitting.
  • PCA identifies components that explain most of the variance of the original data set, allowing you to focus on the most important patterns and filter out noise. This often results in a more stable and generalizable model. The model is less likely to overfit to irrelevant fluctuations.
  • If time series data represent underlying latent factors (such as economic indicators or market forces), PCA can help uncover these hidden factors.
  • Finally, fewer predictors make the model simpler and reduce training time in statistical learning, which can translate into faster computation and better generalization in machine learning applications.

When weighing the choice and specific form of the PCA, one must also be aware of the drawbacks of the method:

  • The principal components that are passed to predictive regressions have no clear interpretation. Uncorrelatedness does not mean independence, and correlations between principal components and the original data series can often be confounding. Without a clear interpretation of the regressors, one cannot introduce theoretical priors into the predictive regression, such as non-negativity of coefficients. The choice of models and signals in statistical learning is then left entirely to the data, which reduces bias but increases variance.
  • A statistical drawback of PCA is that it adds estimated parameters (the component loadings) that are susceptible to episodic peculiarities, thereby increasing the variance of the statistical learning signals. Economic time series, such as inflation and money growth, may be correlated in one environment and not in another.

These drawbacks imply that the choice of original data sets matters and that it is often appropriate to structure the original series into “conceptual groups” based on the meaning of the series. Principal components of homogeneous groups, such as various types of inflation metrics, are more likely to display stable correlations and interpretable components than components of a heterogeneous data set of conceptually unrelated indicators, whose correlation can be adventitious and unstable.

A practical application of PCA to extract fixed-income trading signals

In this post, we apply regression-based learning with PCA to combine various macro-quantamental categories into trading signals for duration exposure in developed countries’ interest rate swap markets. We consider economic indicators and interest rate swap returns for ten developed-market currency areas: AUD (Australian dollar), CAD (Canadian dollar), CHF (Swiss franc), EUR (Euro), GBP (British pound), JPY (Japanese yen), NOK (Norwegian krone), NZD (New Zealand dollar), SEK (Swedish krona), and USD (U.S. dollar).

The objective is to develop macro-quantamental trading signals to manage interest rate swap positions across developed markets. The resultant strategy would implicitly manage both directional and cross-currency duration risk. The formal targets of the analysis are returns on 5-year fixed-rate receiver positions, also called duration returns (view documentation).

For all these currency areas we consider commonly watched economic indicators as constituents of a trading signal. For meaningful historical analysis and backtesting, these data must come in the form of macro-quantamental indicators. Macro-quantamental indicators are point-in-time information states of economic developments specifically generated and updated for the backtesting and operation of trading strategies. Comparable indicators across multiple countries can be called macro-quantamental categories. The indicators have been downloaded from the J.P. Morgan Macrosynergy Quantamental System (JPMaQS).

In particular, we consider three groups of macro-quantamental categories as predictors of fixed income returns: inflation pressure, excess aggregate demand and output, and excess money and credit growth. The purpose of this selection is to feed a balanced set of categories across concepts that are commonly monitored by the market to the PCA and statistical learning process rather than a “kitchen sink” of all available series on JPMaQS. The benefit of selection and balancing is that (i) all categories have proven relevance for the market, (ii) we do not allow the concept with the most available statistical series to crowd out the others, and (iii) we preserve a modicum of interpretability of principal components as dimensions of a representative information set of the market.

Group 1: Inflation pressure categories

This group contains ten categories that indicate excessive (or insufficient) price pressure in the macroeconomy. Excess inflation is supposed to indicate upward pressure on policy rates and inflation risk premia. The term “excess” for all price inflation indicators means relative to the effective inflation target of the currency area’s central bank (view documentation). Excess for wage inflation indicators means relative to effective inflation targets plus medium-term productivity growth, whereby the latter is the difference between the 5-year medians of GDP growth (view documentation) and workforce growth (view documentation).

  • Two excess headline consumer price index (CPI) growth rates, measured as % over a year ago (view documentation) and as % of the last 6 months over the previous 6 months, seasonally and jump adjusted, at an annualized rate (view documentation).
  • Two excess core CPI growth rates, again as % over a year ago (view documentation) and as % of the last 6 months over the previous 6 months, seasonally and jump adjusted, at an annualized rate (view documentation), whereby the core inflation rate is calculated according to local convention.
  • Two excess producer price index (PPI) growth rates, measured as % over a year ago and as % of the last 6 months over the previous 6 months, seasonally adjusted and annualized (view documentation).
  • Economy-wide estimated excess output price growth, % over a year ago, 3-month moving average. Output price trends for the overall economy resemble GDP deflators in principle but are estimated at a monthly frequency with a simple nowcasting method and based on a limited early set of price indicators (view documentation).
  • Estimated excess CPI inflation expectations of market participants for 2 years after the latest reported CPI data (view documentation). This is a formulaic estimate that assumes that market participants form their inflation expectations based on the recent inflation rate (adjusted for jumps and outliers) and the effective inflation target.
  • Excess wage growth, main local measure, % over a year ago, 3-month moving average or quarterly (view documentation).
  • Excess residential real estate price growth over a year ago, 3-month moving average or quarterly (view documentation).

For the purpose of principal components analysis and the calculation of the benchmark conceptual parity, all categories are sequentially normalized around their theoretical neutral level, and values are winsorized at three standard deviations to de-emphasize outliers.
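Mechanically, the sequential normalization and winsorization of a single category could look roughly as follows in pandas. The post itself relies on wrapper functions of the Macrosynergy package for this; the function below is a simplified, hypothetical stand-in that uses an expanding window so that scores at each date only use information available up to that date.

```python
import pandas as pd

def sequential_zn_score(series: pd.Series, neutral: float = 0.0, cap: float = 3.0) -> pd.Series:
    """Normalize a series around a theoretical neutral level using only past data,
    then winsorize at `cap` standard deviations (simplified sketch)."""
    dev = series - neutral
    # Expanding (point-in-time) standard deviation around the neutral level
    sd = (dev ** 2).expanding(min_periods=24).mean() ** 0.5
    # Normalize and cap at +/- `cap` standard deviations
    return (dev / sd).clip(lower=-cap, upper=cap)
```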

Group 2: Excess demand and activity growth categories

This group contains nine categories that indicate aggregate demand or activity growth relative to trend or potential. Positive excess growth information is supposed to exert upward pressure on policy rates and equilibrium real interest rates. Excess for activity and real demand growth is relative to the 5-year median GDP growth (view documentation). Excess for employment growth is relative to the 5-year median workforce growth (view documentation). Business sentiment scores and unemployment rate changes need no benchmark since their natural neutral level is zero.

  • Excess intuitive GDP growth, i.e., the latest estimable GDP growth trend based on actual national accounts and monthly activity data, using sets of regressions that replicate conventional charting methods in markets, % over a year ago, 3-month moving average (view documentation).
  • Excess technical GDP growth, i.e., the latest estimable GDP growth trend based on actual national accounts and monthly activity data, using a standard generic nowcasting model whose hyperparameters are updated over time, % over a year ago, 3-month moving average (view documentation).
  • Excess industrial production growth, % over a year ago, 3-month moving average or quarterly (view documentation).
  • Excess real retail sales growth, % over a year ago, 3-month moving average or quarterly (view documentation).
  • Excess employment growth, % over a year ago, 3-month moving average or quarterly (view documentation).
  • Two measures of negative changes in the unemployment rate: the difference over a year ago, 3-month moving average or quarterly (view documentation), and the difference of the latest 3 months over the previous 3 months or quarter on quarter (view documentation).
  • Manufacturing confidence score, normalized around theoretical and empirical neutral level, based on main local surveys and using updating parameters for normalization (view documentation).
  • Consumer confidence score, normalized around the theoretical and empirical neutral level, based on main local surveys and using updating parameters for normalization (view documentation).

Group 3: Excess credit and money growth categories

This group contains six categories: excess private credit, money, and liquidity growth. The benchmark for excess credit and money growth is the sum of 5-year median GDP growth and effective inflation targets. No benchmark has been applied to liquidity growth since neutral levels are more dependent on the structural developments of the financial sector.

  • Two excess private credit expansion rates, measured as % change over a year ago, seasonally and jump-adjusted (view documentation) and as a change of private credit over 1 year ago, seasonally and jump-adjusted, as % of nominal GDP (view documentation).
  • Two excess money growth rates, both measured as % change over a year ago, seasonally and jump-adjusted, but one for a narrow money concept (view documentation) and one for a broad money concept (view documentation).
  • Two measures of central bank liquidity growth, both measured as change over the past 6 months, one measuring only the expansion of central bank liquidity that is related to FX interventions and securities purchase programmes (view documentation), the other capturing the full monetary base expansion (view documentation).

All quantamental categories above have been formulated such that their predictive relation with subsequent duration returns should be negative. However, correlations across the categories have been quite diverse, with panel-wide cross-category Pearson coefficients ranging between 80% and -25%.

Using PCA with statistical learning

To combine the above quantamental categories into single trading signals, we use a statistical learning process that is similar to the one shown in a previous post (“Optimizing macro trading signals – A practical introduction”). This learning process relies on the scikit-learn package and quantamental wrapper functions of the Macrosynergy package. Its main purpose is to sequentially choose optimal models, principal components, and regression-based signals. The learning process operates in six steps (a sketch of the scikit-learn building blocks follows the list).

  1. We extract monthly-frequency pandas data frames in the scikit-learn format of the predictive features, i.e., lagged end-of-month values of the quantamental categories, and of the targets, here cumulative monthly duration returns. Note that as we work with panel data, the feature and target return data frames are double-indexed, featuring the currency area and the time period.
  2. We define model and hyperparameter grids according to the scikit-learn convention. In the present case, learning generally involves choice, optimization, and application of two models: a PCA and a predictive regression that uses the principal components as regressors. The grid versions considered for this analysis are explained further below.
  3. We set an optimization criterion to evaluate trained model versions based on unseen validation or test data sets. Here, the criterion is the balanced accuracy of the monthly return predictions, i.e., the average of the correctly predicted positive and negative monthly returns. In the present context, balanced accuracy seems preferable to a standard R-squared criterion because its implicit choices are less geared towards return outliers, i.e., towards the experiences in periods of high market volatility.
  4. We define a cross-validation splitter for model evaluation in sequentially expanding data sets. Cross-validation is an assessment of the predictive quality of a model based on multiple splits of the data into training and test sets, where each pair is called a “fold.” Generally, cross-validation splitters for panel data must ascertain the logical cohesion of the training and test sets based on a double index of cross-sections and time periods, ensuring that all sets are sub-panels over common time spans. Here, we use the ExpandingKFoldPanelSplit splitter class of the Macrosynergy package, where a fixed number of splits is implemented, but temporally adjacent panel training sets always precede test sets chronologically.
  5. We operate sequential optimization of the PCA and predictive regression models and related signal generation using the SignalOptimizer class of the Macrosynergy package. Based on this optimization, one can extract optimal signals in a standard format and run diagnostics on the stability of the choice of optimal models over time.
  6. Finally, we evaluate the sequentially optimized signals based on PCA and regression in terms of predictive power, accuracy, and naïve PnL generation.
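The scikit-learn building blocks of steps 2 and 3 can be sketched as below. The wiring into the SignalOptimizer and ExpandingKFoldPanelSplit classes of the Macrosynergy package is omitted here, and the directional balanced-accuracy scorer is our own simplified reading of the criterion described above, not the package's implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.pipeline import Pipeline

# Step 2: candidate model - a PCA feeding a predictive regression
pca_regression = Pipeline([
    ("pca", PCA()),
    ("ols", LinearRegression()),
])

# Step 3: optimization criterion - balanced accuracy of the predicted
# versus realized direction of monthly returns
def directional_balanced_accuracy(y_true, y_pred):
    return balanced_accuracy_score(np.asarray(y_true) > 0, np.asarray(y_pred) > 0)

scorer = make_scorer(directional_balanced_accuracy, greater_is_better=True)
```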

Within the above process, we specify as part of the hyperparameter grids various principal component analyses and extract principal component regressors in four forms (a scikit-learn sketch of the groupwise variants follows the list):

  • Kitchen-sink PCA approach: The learning process applies PCA to the full set of the 25 quantamental categories and extracts the components according to various criteria. Then it uses the full-set principal components as regressors.
  • Groupwise single-stage PCA approach: The learning process applies PCA separately to each conceptual group, i.e., it derives principal components of inflation pressure, excess demand growth, and excess credit and money growth. Then, it uses all intra-group principal components as regressors. This approach equalizes the importance of data variation across groups.
  • Groupwise 2-stage PCA approach: The learning process first identifies the principal components for each conceptual feature group. Then, it applies a second PCA to the pooled intra-group components. Finally, it uses these “principal components of principal components” as predictive regressors.
  • Groupwise conceptual-PCA approach: Here, the learning process applies PCA only to three group conceptual parity signals, i.e., within each group, all categories are normalized and winsorized (capped at three standard deviations) and then averaged. Subsequently, PCA is applied to the three group scores, and the principal components are used in predictive regression.
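For intuition, the groupwise variants can be sketched with scikit-learn's ColumnTransformer, which fits a separate PCA per conceptual group before the regression. The column lists and component counts below are placeholders, not the actual category tickers used in the post.

```python
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Hypothetical column names for the three conceptual groups
inflation_cols = ["cpi_headline", "cpi_core", "ppi", "wages"]       # placeholders
activity_cols  = ["gdp_trend", "ip", "retail", "employment"]        # placeholders
credit_cols    = ["private_credit", "narrow_money", "broad_money"]  # placeholders

group_pca = ColumnTransformer([
    ("inflation", PCA(n_components=2), inflation_cols),
    ("activity",  PCA(n_components=2), activity_cols),
    ("credit",    PCA(n_components=2), credit_cols),
])

# Groupwise single-stage: intra-group components feed the regression directly
single_stage = Pipeline([("group_pca", group_pca), ("ols", LinearRegression())])

# Groupwise 2-stage: a second PCA is applied to the pooled intra-group components
two_stage = Pipeline([
    ("group_pca", group_pca),
    ("pca", PCA(n_components=3)),
    ("ols", LinearRegression()),
])
```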

The main hyperparameters of all PCAs are the selection criteria of the principal components. Generally, we offer the process three criteria (expressed as a simple grid sketch after the list):

  • a cumulative variance criterion that selects the ordered principal components until they capture at most 95% of the overall variation in the data,
  • the Kaiser criterion, a rule of thumb that retains principal components with eigenvalues greater than 1, i.e., components that explain at least as much variance as an original standardized series or
  • a fixed number of (3) components to be extracted.
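In scikit-learn grid notation, and for the kitchen-sink pipeline sketched earlier, the first and third criteria could be expressed roughly as follows. The Kaiser criterion has no direct equivalent in PCA's n_components argument and would require a small custom transformer, so it is only noted in a comment.

```python
# Hyperparameter grid over component-selection criteria (illustrative values)
hparam_grid = {
    "pca__n_components": [
        0.95,  # cumulative variance criterion: keep components up to a 95% variance share
        3,     # fixed number of components
    ],
    # The Kaiser criterion (eigenvalues > 1) would need a custom selector, e.g. a
    # transformer that drops components whose explained variance falls below 1 after fitting.
}
```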

There are finer points in the application of the PCA, but generally, the present analysis uses the simplest possible options at each turn. For full reference, see the Jupyter Notebook linked above.

For illustration, the chart below shows the evolution of the correlation between the first principal component and the original (normalized and winsorized) macro-quantamental categories for the kitchen-sink PCA approach. It shows that this component is a common background factor in GDP, labour market, inflation, and credit trends. This component seems to track some kind of broad business cycle factor. With the return of inflation in 2020, the relative correlation of the first principal component with CPI growth increased vis-à-vis economic activity.

The second principal component has been related to the difference between growth and inflation indicators, possibly reflecting some kind of “trade-off” factor, distinguishing periods of favourable growth-inflation trade-offs (“goldilocks”) from periods where growth is inflationary. A favourable trade-off allows more lenient monetary policy and supports public finances.

Finally, as a benchmark for the PCA-based optimized signals, we calculate two types of trading signals based on conceptual parity.

  • The first applies conceptual parity group signals, i.e. average scores for inflation pressure, excess demand, and excess money and credit growth, to the statistical learning regression (regression-based learning with conceptual factors).
  • The second merely takes an average score of the group average scores and requires no estimation or learning at all. We will call this two-stage conceptual parity (a short pandas sketch follows below).
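The two-stage conceptual parity benchmark reduces to two averaging steps. Assuming a hypothetical DataFrame `scores` of normalized category scores (dates by categories) and a dictionary `groups` mapping each column to its conceptual group, a sketch is:

```python
# Stage 1: average the normalized category scores within each conceptual group
group_scores = scores.T.groupby(groups).mean().T
# Stage 2: average the group scores into a single two-stage conceptual parity signal
parity_signal = group_scores.mean(axis=1)
```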

What we can learn from a simplistic PCA-based rates strategy

Evaluation of PCA-based and non-PCA-based signals generated for the ten developed fixed-income markets over the last 20 years reveals important insights.

  • PCA-based methods underperform conceptual parity methods in simple panel correlation analysis for multiple countries. The PCA methods outperform, however, in intertemporal correlation within countries when compared to non-PCA learning. This suggests that some of the information that distinguishes macro trends across countries is lost in the transformation of indicators through the statistical learning process.
  • PCA-based methods outperform conceptual parity with respect to predictive accuracy. That means that they have done a better job of predicting the direction of global duration returns.
  • PCA-based methods have produced higher and more consistent trading values compared to both a pure (2-stage) conceptual parity signal and a regression-based learning approach with conceptual parity.
  • Across specifications, the outperformance of PCAs has been robust. This is remarkable as the sequential choice of regressors and optimization of signals involves little prior judgment and gives little room for implicit or explicit hindsight. Simply put, PCA-based backtests have high credibility.

The scatter plots and panel regression tests below show that all types of signals have displayed positive predictive power with respect to subsequent duration returns at a monthly frequency. However, for the full panel of 10 currency areas, the forward correlation was only significant for the conceptual parity signal.

It is not unusual for conceptual parity signals to outperform optimized signals. After all, the statistical learning process adds a lot of variation to the signal that is related to model and parameter changes, neither of which has anything to do with changing market conditions.

Moreover, here, the lower correlation significance of the learning and PCA-based signals for the panel mainly reflects the inability of the refined statistical signal to predict relative returns across the ten different countries. The charts and statistics below for the U.S. alone show that all signals have displayed significant power for predicting directional returns within a single market. The inability to predict cross-country return differences may be an artefact of the calculation of a single set of principal components for all countries, i.e., the panel, ignoring differences in cross-indicator correlations across countries.

The PCA- and learning-based signals also posted higher predictive accuracy, signalling the direction of monthly returns correctly in 52.9% to 54.3% of all months and currency areas, versus 51.5% for the 2-stage conceptual parity signal.

Finally, we assess the economic value of the various signals through naïve PnL performance metrics. The PnL simulation assumes that positive signals translate into 1 USD notional fixed-rate receiver positions and negative signals into 1 USD notional fixed-rate payer positions. Thus, we apply a binary +1/-1 positioning signal here, the simplest possible signal version. Positions are rebalanced at the beginning of a month based on signals at the end of the previous month, and one day of slippage is applied to changes in positions. The naïve PnL does not consider transaction costs or risk management rules. All PnLs are scaled (not volatility targeted) to a 10% annualized standard deviation for joint graphical representation.
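A simplified pandas sketch of these naïve PnL rules is shown below. It assumes hypothetical daily DataFrames `signal` and `returns`, indexed by date with one column per currency area; the exact rebalancing-day mechanics are a simplification, so the function is only a rough stand-in for the actual computations.

```python
import numpy as np
import pandas as pd

def naive_binary_pnl(signal: pd.DataFrame, returns: pd.DataFrame,
                     slippage_days: int = 1, vol_scale: float = 0.10) -> pd.Series:
    """Naive PnL of binary +1/-1 positions on a (date x currency) panel of daily returns."""
    # Binary positions from end-of-month signals, held through the following month
    month_end = np.sign(signal).resample("M").last()
    # Positions are lagged by `slippage_days` business days to allow for trading slippage
    positions = month_end.reindex(returns.index, method="ffill").shift(slippage_days)
    # Aggregate daily PnL across currency areas (1 USD notional per position)
    pnl = (positions * returns).sum(axis=1)
    # Ex-post scaling (not volatility targeting) to a 10% annualized standard deviation
    return pnl * vol_scale / (pnl.std() * np.sqrt(252))
```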

The chart below shows that the regression-PCA learning signals have all outperformed the conceptual parity approaches in value generation. The best-performing signal for risk-adjusted returns since 2004 has been the groupwise 2-stage PCA approach, with a 2004-2024 Sharpe ratio of 0.7 and a Sortino ratio of 1.0. PnL generation was seasonal, concentrated in economic downturns and recoveries. Correlation with Treasury returns has been a positive 35%, reflecting a 66% long bias of the signal. The performances of the other PCA-based learning signals have been similar, with Sharpe ratios of 0.6-0.7 and Sortino ratios of 0.9-1.0.

The 2-stage conceptual parity signal underperformed, with a Sharpe ratio of just 0.3 and value generation concentrated in just two episodes. This poor performance was due partly to a slight short bias in the long run. The regression-based learning signal based on the group conceptual parity scores fared a bit better, with a long-term Sharpe ratio of 0.4. However, that signal carried an 80% long bias, which contributed to a large drawdown in the 2020s.

Using a slightly more elaborate type of signal and position management confirms the value of the PCA-based signals. The below naïve PnLs take positions proportionately to the macro-quantamental signal with a risk limit of 2 standard deviations long or short. Moreover, individual country IRS positions are vol-targeted at 10% with monthly re-estimation. This type of PnL produced slightly higher Sharpe and Sortino ratios for the PCA signals, with the 2-stage PCA reaching a Sharpe of 0.8 and a Sortino of 0.8. The conceptual parity strategy improved relative to the PCA signals in this signal version, but its value generation was still highly seasonal. In the context of learning, the non-PCA strategy continued to underperform the PCA signals.

Overall, in the above simple strategy example, PCA delivered good predictive statistics and naïve value generation ratios, considering that all trading relied on just three types of macro factors. Importantly, the automated, simplistic and sequential signal generation left virtually no scope for look-ahead bias. The empirical evidence, therefore, testifies to the quality of both the data and the methods that were used.