
Evaluating macro trading signals in three simple steps

Meaningful evaluation of macro trading signals must consider their seasonality and diversity across countries. This post proposes a three-step process to this end. The first step runs significance tests of proposed predictive relations using a panel of markets. The second step reviews the reliability of predictive relations based on accuracy and different correlation metrics across time and markets. The third step estimates the economic value of the signal based on performance metrics of a standardized naïve PnL. All these steps can be implemented with special Python classes of the Macrosynergy package. Conscientious evaluation of macro signals not only benefits their selection for live trading. It also paints a realistic picture of the PnL profile, which is critical for setting risk limits and for broader portfolio integration.

The below post is based on proprietary research of Macrosynergy.

A Jupyter notebook for audit and replication of the research results can be downloaded here. The notebook operation requires access to J.P. Morgan DataQuery to download data from JPMaQS, a premium service of quantamental indicators. J.P. Morgan offers free trials for institutional clients.
Also, an academic research support program sponsors data sets for relevant projects.

This post ties in with this site’s summary of “Quantitative Methods for Macro Information Efficiency.”

A simple macro signal evaluation framework

Macro trends and states are powerful drivers of financial markets. However, the effects of individual macro factors are notoriously time-variant or “seasonal.” This means that they can be stronger or weaker for prolonged periods of time. For example, the influence of consumer price trend metrics can be dominant in inflationary periods but just very subtle in low-inflation times. Moreover, the common history of recorded economics and most liquid financial markets has been 20-50 years, depending on the asset or derivative market and the country we look at. While this would be long for other sciences, the number of economic cycles, crises, and megatrends over a few decades is limited. To empirically evaluate macro trading positions, it is very important to combine the experience of a diverse set of countries (view post here). This is the domain of panel analysis, the analysis of datasets that have two dimensions: countries and time periods.

Considering these constraints, it is not appropriate to assess the expected success of a macro trading signal merely by a single historical metric, such as a Sharpe ratio of a backtested PnL. Instead, it is critical to ascertain various quality criteria, including the statistical significance of predictive power, the accuracy of directional predictions, the consistency of relations across countries and times, and the reliability of value generation over certain investment horizons. This holistic signal evaluation approach can be implemented in three simple, necessary steps:

Panel correlation: These analyses visualize and quantify the relations between macro signals (features) and subsequent returns (targets) across countries or currency areas. An important metric is the significance of forward correlation. This requires a special panel test that adjusts the data of the predictive regression for common global influences across countries (view post here). This test is a most useful selection criterion for macro signal candidates. It is important, however, that the surmised relation between features and targets is similar across countries and that the country-specific features matter, not just their global averages.
Accuracy and correlation robustness: Accuracy measures the share of correctly predicted directions of subsequent returns relative to all predictions. It not only shows an important aspect of feature-target co-movement but also implicitly tests if the signal’s neutral (zero) level has been well chosen. A particularly important metric for macro trading strategies is balanced accuracy, which is the average of the proportions of correctly predicted positive and negative returns. This statistic is immune to past common directional biases in signal and returns. One of the faults of simple accuracy (and most PnL performance ratios) is that it can be inflated by past strong performances of the traded assets in conjunction with a natural long bias in the signal.
The robustness of accuracy metrics, parametric (Pearson) correlation, and non-parametric (Kendall) correlation should be checked across time periods, cross-sections, and variations of features and returns.
Naïve PnL metrics: Naïve profit and loss series can be calculated by taking positions in accordance with normalized signals and regular rebalancing at the beginning of each month, in accordance with signals at the end of the previous month, allowing for a 1-day time-lapse for trading. The trading signals are capped at a maximum of two standard deviations as a reasonable risk limit. A naïve PnL does not consider transaction costs or risk management tools. It is thus not a realistic backtest of actual financial returns in a specific institutional setting. However, it is an objective and undistorted representation of the economic value of the signals.
Performance metrics of naïve PnLs indicate economic value in the form of risk-adjusted returns, show the correlation with global risk benchmarks, such as bond and equity prices, and reveal the consistency of value generation across time, as opposed to seasonality and concentration.
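The difference between simple and balanced accuracy can be illustrated with a minimal sketch in plain Python. The sample returns and the "always long" signal are hypothetical; the point is only that a long-biased asset flatters an uninformative long signal under simple accuracy, but not under balanced accuracy.

```python
def accuracy(signals, returns):
    """Share of periods in which the signal sign matches the return sign."""
    hits = [(s > 0) == (r > 0) for s, r in zip(signals, returns)]
    return sum(hits) / len(hits)


def balanced_accuracy(signals, returns):
    """Average of the hit rates for positive and for negative returns."""
    pos_hits = [s > 0 for s, r in zip(signals, returns) if r > 0]
    neg_hits = [s <= 0 for s, r in zip(signals, returns) if r <= 0]
    return 0.5 * (sum(pos_hits) / len(pos_hits) + sum(neg_hits) / len(neg_hits))


# Hypothetical sample: 8 positive and 2 negative returns (a long-biased asset)
returns = [1.2, 0.4, 0.8, -0.5, 0.3, 1.1, -0.9, 0.6, 0.2, 0.7]
# An uninformative signal that is always long
always_long = [1.0] * len(returns)

acc = accuracy(always_long, returns)               # 0.8
bal_acc = balanced_accuracy(always_long, returns)  # 0.5
```

Here simple accuracy is 80% although the signal carries no information, while balanced accuracy stays at the uninformative 50% level.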

The above checkups are usually necessary to gain confidence in a trading signal and a realistic assessment of the related PnL profile. A range of more advanced metrics can be helpful as well, depending on context (view post here). Critically, the validity of each signal evaluation process requires the suppression of data mining. It is legitimate to use the signal evaluation process to sharpen the logic behind the signal calculation. It is also reasonable to use an evaluation approach for sequential signal optimization through statistical learning (view post here). However, it is misleading and harmful to systematically search across a grid of feature models and parameters until a satisfactory signal has been found and to predicate live trading deployment on the experiences with that signal.

Macro signal evaluation with the Macrosynergy package

The above three steps of macro signal evaluation can be easily implemented through three Python classes of the Macrosynergy package. Generally, this free package provides convenience functions for the analysis and transformation of macro-quantamental indicators, i.e., point-in-time macro and return series across major developed and emerging countries, and offers some basic methods for testing the trading or investment value of quantamental strategies.

Panel correlation: The `CategoryRelations` class of the `panel` module manages analysis and visualization of multiple panel categories at different frequencies, i.e., types of indicator time series that are available across different markets. If instantiated with signal categories and return categories, it serves panel correlation analysis. In particular, the `reg_scatter` method of this class displays scatter plots and regression lines in conjunction with the results of the Macrosynergy panel test. It can also show more granular analyses across time and countries. There is also a `multiple_reg_scatter` function that manages the conduct and display of multiple regression analyses across various feature and target categories.
Accuracy and correlation robustness: The `SignalReturnRelations` class of the `signal` module manages the analyses of signal and return series at various frequencies. It offers statistics on the relation between signals and subsequent returns, including accuracy, balanced accuracy, Pearson correlation, and Kendall correlation, as well as statistics on the directional biases of signals and returns. There is a range of methods for displaying these statistics across signals, time, countries, and target returns. Particularly important table-generating methods are `multiple_relations_table`, which shows all statistics for all signal-return relations and frequencies, `single_relation_table`, which displays all statistics for one specific signal-return relation with greater granularity, such as a presentation by years and cross-sections, and `single_statistic_table`, which displays a single statistic, such as accuracy, for various signals and target returns. Useful graphics methods are `accuracy_bars`, which plots accuracy and balanced accuracy across countries or years, and `correlation_bars`, which plots predictive correlation coefficients and their significance across countries or years.
Naïve PnL metrics: The `NaivePnL` class of the `pnl` module manages illustrative PnLs based on panels of signals and returns with limited transformation options and disregarding transaction costs. The `make_pnl` method computes and collects PnLs in the class instance. The `plot_pnls` method plots cumulative PnLs, multiple PnL types per cross-section, or multiple cross-sections per PnL type. The `evaluate_pnls` method produces a table of key performance metrics for various naïve PnLs. Finally, the `signal_heatmap` method displays heatmaps of signal values across time and cross-sections.

The exact application of these functions is explained in the documentation and demonstrated for the example strategies in this post in the related Jupyter notebook.

A simple example strategy

In this post, we apply the three-step process of macro signal evaluation to interest rate swap (IRS) strategies that are based on key macro trends that influence central banks’ reaction functions. All data for signals and target returns have been taken from the J.P. Morgan Macrosynergy Quantamental System (JPMaQS). All economic indicators are information states, i.e., the latest instances of a measure based on the full time-series vintage available at a date. The focus is on three macro trend pressure indicators:

Excess GDP growth: This is the difference between point-in-time estimates of real GDP growth and 5-year moving medians of GDP growth. These are calculated as the average of two competing quantamental concepts. The first is a “technical” real GDP growth trend estimate, as % over a year ago, 3-month moving average, based on updating standard nowcasting models whose hyperparameters are governed by sequential supervised learning to avoid look-ahead bias (view documentation here). The second is an “intuitive” GDP growth trend based on actual national accounts and monthly activity data, based on sets of regressions that replicate conventional charting methods (view documentation here).
Excess CPI inflation: For this post, this is simply the difference between an average of standard headline CPI growth measures and a country’s effective inflation target. The CPI growth rates are standard annual headline consumer price inflation and seasonally- and jump-adjusted trends, i.e., % of latest 3 months over previous 3 months at annualized rate and % of latest 6 months over previous 6 months at an annualized rate (view documentation here). The effective inflation target is the estimated official inflation target for the next year, adjusted for past target deviations (view documentation here). The accompanying Jupyter notebook includes other excess inflation metrics, which may be more appropriate, but this post only considers the simplest metrics.
Excess private credit growth: This is the difference between a private credit growth trend, jump-adjusted, as % over a year ago (view documentation here) and estimated medium-term nominal GDP growth. Nominal GDP growth is estimated as the sum of the past 5 years’ average GDP growth trend and the effective inflation target.

These three pressure indicators are combined into a single macro trend pressure indicator simply by averaging, as their values are all annual growth rates with comparable orders of magnitude. Specifically, we produce a broad macro trend pressure indicator using all three metrics and a narrow pressure indicator using only excess growth and inflation. The latter is more conservative, as it is based solely on the traditional Taylor rule for monetary policy.
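As a worked illustration of this averaging, the following sketch uses hypothetical excess readings for a single currency area at a single point in time. The dictionary keys and values are invented for the example.

```python
from statistics import mean

# Hypothetical excess readings for one currency area, in % per annum
excess = {"growth": 1.0, "inflation": 2.0, "credit": -1.5}

# Broad pressure: simple average of all three constituents
broad = mean(excess.values())                           # 0.5

# Narrow pressure: average of excess growth and excess inflation only
narrow = mean([excess["growth"], excess["inflation"]])  # 1.5
```

In this example, weak credit growth pulls the broad indicator well below the narrow one, which shows how the two composites can give signals of different strength.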

We test the macro trend pressure indicators for 9 developed market (DM) currency areas and 13 emerging market (EM) countries with reasonably liquid interest rate swap markets. The developed areas are Australia (AUD), Canada (CAD), Switzerland (CHF), the euro area (EUR), the UK (GBP), Japan (JPY), Norway (NOK), Sweden (SEK), and the U.S. (USD). The emerging markets are Chile (CLP), Colombia (COP), Czech Republic (CZK), Hungary (HUF), Israel (ILS), India (INR), South Korea (KRW), Mexico (MXN), Poland (PLN), Thailand (THB), Turkey (TRY), Taiwan (TWD), and South Africa (ZAR). Some periods have been “blacklisted” for specific countries in accordance with tradability dummies on JPMaQS (view documentation here).

For all “small countries”, defined as all countries except the U.S. and the euro area, we also produce hybrid macro pressure indicators, which are an equal average of the local macro pressure indicators and the G2 pressure indicators. This concept reflects a widely recognized asymmetry: U.S. and euro area economic and financial developments strongly affect small countries’ rates markets, while small countries hardly affect G2. The target returns are volatility-targeted 2-year IRS fixed receiver returns (view documentation here). For some robustness checks, we also used vol-targeted 5-year IRS returns.
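The hybrid construction can be sketched as follows. The pressure readings are hypothetical, and the G2 indicator is assumed here to be the unweighted mean of the U.S. and euro area values, which the text does not specify.

```python
# Hypothetical macro pressure readings, in % per annum
pressure = {"USD": 1.0, "EUR": 0.2, "SEK": -0.4, "MXN": 2.0}
G2 = ("USD", "EUR")

# Assumption: G2 pressure is the simple average of the two large areas
g2_pressure = sum(pressure[cid] for cid in G2) / len(G2)

# Hybrid indicator for small countries: equal average of local and G2 pressure
hybrid = {
    cid: 0.5 * local + 0.5 * g2_pressure
    for cid, local in pressure.items()
    if cid not in G2
}
```

With these numbers, the hybrid value for SEK turns slightly positive even though local pressure is negative, reflecting the assumed dominance of G2 conditions.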

In the sections below, we evaluate macro trend pressure signals for directional strategies in the G2 (U.S. and euro area), directional strategies in 20 smaller markets, and relative cross-country strategies for all 22 currency areas.

G2 directional macro trend pressure signal

Panel correlation

The basic hypothesis is that macro trend pressure is negatively related to subsequent IRS fixed receiver returns. The reason is that excess growth, inflation, and credit supply trigger a shift to monetary policy tightening and nurture public inflation fears. These trends are not immediately and fully priced due to rational inattention (view post here). We test this hypothesis and the related strategy for the two largest fixed-income markets, the U.S. and the euro area, for the period 2000-2024 (May).

The below panel correlation analysis relates end-of-month pressure information states to next month’s returns. For the broad macro pressure indicator, it confirms the negative predictive relation and, more importantly, suggests that the probability of this relation being systematic and not due to chance is near 100% for the sample period 2000-2024.

Negative one-month-ahead predictive power also prevails for the narrow macro trend pressure indicator and all three constituents, i.e., excess growth, inflation, and credit growth. The narrow macro trend pressure indicator posted a stronger correlation with subsequent returns than the broad. Meanwhile, the predictive power of excess credit growth has been weakest among the constituents, but the probability of the relation being genuinely systematic is still over 90%.

To gauge the intertemporal stability of the predictive relation, we can divide the sample into two sub-periods, 2000-2011 and 2012-2024. Panel correlation analysis shows negative predictive power for both, each with a statistical significance of above 98%. Other sample divisions show similar results. Intertemporal stability increases the likelihood that the relation is a persistent feature of fixed income markets.

A negative predictive relation also prevailed in both the euro area and the U.S., although it was stronger in the euro area, possibly because of lower information efficiency and the signals’ disregard of core inflation and labour market data, which play a prominent role in the United States.

Accuracy and correlation robustness

We check predictive accuracy and additional correlation metrics for (the negative values of) macro pressure with respect to subsequent 2-year IRS returns at a monthly and quarterly frequency. As for the panel correlation analysis, this relates end-of-period information states to the next period’s cumulative return.
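The alignment of end-of-period signals with next-period returns can be sketched in plain Python. This is a simplified stand-in, not the package's implementation: it assumes sorted daily data without missing months and approximates the cumulative return by the sum of daily returns. The function name and sample data are hypothetical.

```python
from datetime import date, timedelta


def monthly_pairs(days, signals, returns):
    """Pair each month's final signal with the following month's summed return.

    Assumes `days` is sorted in ascending order and has no missing months.
    """
    last_signal, month_return = {}, {}
    for d, s, r in zip(days, signals, returns):
        key = (d.year, d.month)
        last_signal[key] = s  # overwritten daily, so the month-end value remains
        month_return[key] = month_return.get(key, 0.0) + r
    months = sorted(last_signal)
    # The signal of month t is matched with the return of month t+1
    return [(m, last_signal[m], month_return[nxt]) for m, nxt in zip(months, months[1:])]


# Hypothetical daily data for January to March 2024 (31 + 29 + 31 = 91 days)
start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(91)]
signals = [float(d.month) for d in days]  # constant within each month
returns = [0.01 * d.month for d in days]  # daily returns in %

pairs = monthly_pairs(days, signals, returns)
```

Three months of data yield two signal-return pairs, since the last month's signal has no subsequent return to predict.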

Accuracy and balanced accuracy for broad and narrow macro pressure have been between 56% and 58%, depending on frequency. These are high hit ratios of market direction, particularly given that the signals had a short bias, whereas returns recorded a long bias. Most other pressure indicators’ accuracy rates scored above 50% as well, except for the relation between excess inflation and quarterly IRS returns. Possibly, CPI trends are closely watched by markets, and positing a lagged response to excesses over more than a month takes the rational inattention argument too far. Moreover, the use of simple headline inflation as an indicator may not be appropriate for the U.S., where the central bank targets a core PCE inflation rate.

Moreover, all parametric and non-parametric predictive correlation coefficients of all pressure indicators have been positive.
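To illustrate why both types of correlation are worth checking, here is a minimal sketch of a Pearson coefficient and a Kendall coefficient in plain Python. The Kendall version is the simple tau-a variant, chosen for brevity and not necessarily the variant used elsewhere, and the sample data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Parametric correlation: sensitive to the size of deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def kendall_tau_a(x, y):
    """Rank correlation: (concordant - discordant pairs) / all pairs."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # ties count as neither
    return score / len(pairs)


# Hypothetical signal and subsequent returns, with one outsized return
signal = [-2.0, -1.0, 0.0, 1.0, 2.0]
ret = [-1.0, -2.0, 0.0, 1.0, 20.0]

rho = pearson(signal, ret)
tau = kendall_tau_a(signal, ret)  # 0.8: nine concordant pairs, one discordant
```

The Kendall coefficient depends only on the ordering of observations, so the outsized final return affects it no more than any other correctly ranked observation, whereas it weighs heavily on the Pearson coefficient.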

We also check the robustness of balanced accuracy with respect to exact specifications of the macro pressure argument and targets. Using balanced accuracy, we find pervasive predictive power as illustrated by the below summary map of this statistic, which uses both 2-year and 5-year IRS returns as targets. Again, the weak spot has been excess inflation.

Monthly accuracy has not been above 50% in all years, but only in a bit less than two-thirds of them. This points to seasonality of value generation. However, annual accuracies do not show a pattern of decay over time but rather temporary dips around various financial crises.

Naïve PnL metrics

Naïve PnLs based on the broad and narrow macro trend indicators have both shown clear long-term upward drifts. However, they also revealed pronounced seasonality, with fat years and lean years coming in longer epochs. Note that PnLs were scaled to an annualized volatility of 10% to facilitate presentation in graphs.
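The mechanics of a naïve PnL, as outlined in the framework section, can be sketched for a single contract in plain Python. This is an illustrative simplification and not the `NaivePnL` implementation: the function name is hypothetical, and normalization is assumed to divide the signal by its root mean square over the history available at the rebalancing date.

```python
from datetime import date, timedelta


def naive_pnl(days, signals, returns, cap=2.0, slip=1):
    """Daily PnL of positions set from capped, normalized month-end signals."""
    positions = [0.0] * len(days)
    target, pending = 0.0, None  # pending = (activation index, new position)
    for i, d in enumerate(days):
        if pending is not None and i >= pending[0]:
            target, pending = pending[1], None
        positions[i] = target
        is_month_end = i + 1 < len(days) and days[i + 1].month != d.month
        if is_month_end:
            history = signals[: i + 1]  # only data known at the time
            scale = (sum(s * s for s in history) / len(history)) ** 0.5
            z = signals[i] / scale if scale > 0 else 0.0
            z = max(-cap, min(cap, z))  # cap the normalized signal
            pending = (i + 1 + slip, z)  # start of next month plus slippage
    return [p * r for p, r in zip(positions, returns)]


# Hypothetical data: January to March 2024, flat signal with one spike
start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(91)]
signals = [1.0] * 91
signals[59] = -10.0     # 29 February: strongly negative month-end signal
returns = [1.0] * 91    # constant daily return in %, for easy checking

pnl = naive_pnl(days, signals, returns)
```

In this example, the January month-end signal becomes an active long position on 2 February, and the capped short position of -2 from the end of February becomes active on 2 March.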

The seasonality of return generation reflects the seasonality of macroeconomic opportunities. Strong signals were recorded during periods of economic overheating and recession, while in less turbulent periods, the signals were small and often called for opposite directional exposure in the U.S. and the euro area. This type of seasonality is a plausible characteristic of a single-principle macro strategy that only trades two contracts.

Performance metrics suggest economically meaningful alpha generation over the long term. The 24-year Sharpe ratios for both broad and narrow macro trend pressure have been above 0.7 and the Sortino ratios above 1, while correlations with 10-year Treasury bond returns (JPMaQS ticker USD_GB10YXR_NSA) and S&P 500 returns (USD_EQXR_NSA) have been near zero or slightly negative. Drawdowns have been sizeable. The largest peak-to-trough drawdown for the broad macro pressure signal has been more than 3 annualized standard deviations or 4 times the average annual return. While this is challenging for risk management, it is not unusual for single-principle macro strategies. For comparison, the peak-to-trough drawdown of a risk-parity long-only book would have been 6 times the annual standard deviation.
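The ratios quoted above can be computed from a monthly PnL series along the following lines. This is a sketch under stated assumptions: PnL is additive (not compounded), the risk-free rate and the Sortino target return are zero, and the sample numbers are hypothetical rather than taken from the strategy.

```python
from statistics import mean, stdev


def performance(monthly_pnl):
    """Annualized return, Sharpe, Sortino, and maximum drawdown statistics."""
    ann_return = 12 * mean(monthly_pnl)
    ann_std = 12 ** 0.5 * stdev(monthly_pnl)
    # Downside deviation: root mean square of negative months, annualized
    downside = (12 * mean([min(x, 0.0) ** 2 for x in monthly_pnl])) ** 0.5
    cumulative, peak, max_dd = 0.0, 0.0, 0.0
    for x in monthly_pnl:
        cumulative += x
        peak = max(peak, cumulative)
        max_dd = max(max_dd, peak - cumulative)  # largest peak-to-trough loss
    return {
        "ann_return": ann_return,
        "sharpe": ann_return / ann_std,
        "sortino": ann_return / downside if downside > 0 else float("nan"),
        "max_drawdown": max_dd,
        "dd_in_ann_std": max_dd / ann_std,
        "dd_in_ann_returns": max_dd / ann_return,
    }


# Hypothetical monthly PnL in % of risk capital
monthly_pnl = [2.0, -1.0, -3.0, 4.0, 1.0, -2.0, 3.0, 1.0, -1.0, 2.0, 0.0, 1.0]
stats = performance(monthly_pnl)
```

Expressing the drawdown in units of annualized standard deviation and of average annual return, as in the text, makes it comparable across strategies with different volatility scaling.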

Finally, we check the concentration of PnL generation. The contribution of the 5% most profitable months to the overall long-term PnL has been 80% for the broad macro pressure signal. This is another reflection of seasonality. It is, however, not exceptionally high for single-principle strategies. For example, for the long-only book, the 5% best months earned over 160% of the long-term PnL.
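A concentration measure of this kind can be sketched as the share of total PnL earned in the best months. The function name and the sample data are hypothetical, and the measure is only meaningful if the total PnL is positive.

```python
def top_months_share(monthly_pnl, quantile=0.05):
    """Share of total PnL contributed by the most profitable months."""
    n_top = max(1, round(quantile * len(monthly_pnl)))
    best = sorted(monthly_pnl, reverse=True)[:n_top]
    return sum(best) / sum(monthly_pnl)


# Hypothetical 20 months: one very strong month, nine small gains, ten small losses
monthly_pnl = [10.0] + [1.0] * 9 + [-1.0] * 10

share = top_months_share(monthly_pnl)  # best month (10) over total PnL (9)
```

The share exceeds 100% here because the remaining months lost money on balance, which is how a figure such as 160% for a long-only book can arise.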