Meaningful evaluation of macro trading signals must consider their seasonality and diversity across countries. This post proposes a three-step process to this end. The first step runs significance tests of proposed predictive relations using a panel of markets. The second step reviews the reliability of predictive relations based on accuracy and different correlation metrics across time and markets. The third step estimates the economic value of the signal based on performance metrics of a standardized naïve PnL. All these steps can be implemented with special Python classes of the Macrosynergy package. Conscientious evaluation of macro signals not only benefits their selection for live trading. It also paints a realistic picture of the PnL profile, which is critical for setting risk limits and for broader portfolio integration.

The below post is based on proprietary research of Macrosynergy.

A Jupyter notebook for audit and replication of the research results can be downloaded here. The notebook operation requires access to J.P. Morgan DataQuery to download data from JPMaQS, a premium service of quantamental indicators. J.P. Morgan offers free trials for institutional clients.

Also, an academic research support program sponsors data sets for relevant projects.

This post ties in with this site’s summary of “Quantitative Methods for Macro Information Efficiency.”

## A simple macro signal evaluation framework

Macro trends and states are powerful drivers of financial markets. However, __the effects of individual macro factors are notoriously time-variant or “seasonal.”__ This means that they can be stronger or weaker for prolonged periods of time. For example, the influence of consumer price trend metrics can be dominant in inflationary periods but just very subtle in low-inflation times. Moreover, the common history of recorded economics and most liquid financial markets has been 20-50 years, depending on the asset or derivative market and the country we look at. While this would be long for other sciences, the number of economic cycles, crises, and megatrends over a few decades is limited. To empirically evaluate macro trading positions, it is very important to combine the experience of a diverse set of countries (view post here). This is the domain of panel analysis, the analysis of datasets that have two dimensions: countries and time periods.

Considering these constraints, it is not appropriate to assess the expected success of a macro trading signal merely by a single historical metric, such as a Sharpe ratio of a backtested PnL. Instead, it is critical to ascertain various quality criteria, including the statistical significance of predictive power, the accuracy of directional predictions, the consistency of relations across countries and times, and the reliability of value generation over certain investment horizons. This holistic signal evaluation approach can be implemented in three simple, necessary steps:

**Panel correlation**: These analyses visualize and quantify the relations between macro signals (features) and subsequent returns (targets) across countries or currency areas. An important metric is the __significance of forward correlation__. This requires a special panel test that adjusts the data of the predictive regression for common global influences across countries (view post here). This test is a most useful selection criterion for macro signal candidates. It is important, however, that the surmised relation between features and targets is similar across countries and that the country-specific features matter, not just their global averages.

**Accuracy and correlation robustness**: Accuracy measures the __share of correctly predicted directions of subsequent returns relative to all predictions__. It not only shows an important aspect of feature-target co-movement but also implicitly tests if the signal’s neutral (zero) level has been well chosen. A particularly __important metric for macro trading strategies is balanced accuracy__, which is the average of the proportions of correctly predicted positive and negative returns. This statistic is immune to past common directional biases in signal and returns. One of the faults of simple accuracy (and most PnL performance ratios) is that it can be inflated by past strong performances of the traded assets in conjunction with a natural long bias in the signal. The robustness of accuracy metrics, parametric (Pearson) correlation, and non-parametric (Kendall) correlation should be checked across time periods, cross-sections, and variations of features and returns.

**Naïve PnL metrics**: Naïve profit and loss series can be calculated by __taking positions in accordance with normalized signals and regular rebalancing at the beginning of each month__, in accordance with signals at the end of the previous month, allowing for a 1-day time-lapse for trading. The trading signals are capped at a maximum of two standard deviations as a reasonable risk limit. A naïve PnL does not consider transaction costs or risk management tools. It is thus not a realistic backtest of actual financial returns in a specific institutional setting. However, it is an objective and undistorted representation of the economic value of the signals. Performance metrics of naïve PnL analysis prominently indicate an economic value in the form of risk-adjusted returns, display correlation to global risk benchmarks, such as bond and equity prices, and show consistency across time, as opposed to seasonality and concentration of value generation.
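The difference between simple and balanced accuracy can be illustrated with a minimal sketch in plain Python. The function names and the data are hypothetical and are not taken from the Macrosynergy package; the example assumes that both positive and negative returns occur in the sample and ignores periods with a zero signal or return.

```python
def accuracy(signals, returns):
    """Share of periods in which the signal sign matches the subsequent return sign."""
    pairs = [(s, r) for s, r in zip(signals, returns) if s != 0 and r != 0]
    hits = sum((s > 0) == (r > 0) for s, r in pairs)
    return hits / len(pairs)


def balanced_accuracy(signals, returns):
    """Average of the hit rates for positive returns and for negative returns."""
    pairs = [(s, r) for s, r in zip(signals, returns) if s != 0 and r != 0]
    pos_hits = [s > 0 for s, r in pairs if r > 0]  # correct calls when returns were positive
    neg_hits = [s < 0 for s, r in pairs if r < 0]  # correct calls when returns were negative
    return 0.5 * (sum(pos_hits) / len(pos_hits) + sum(neg_hits) / len(neg_hits))


# Hypothetical sample: returns are positive in 8 of 10 periods, the signal is always long
returns = [1.0, 0.5, 0.8, -0.4, 1.2, 0.3, -0.6, 0.9, 0.2, 0.7]
always_long = [1.0] * 10

print(accuracy(always_long, returns))           # 0.8
print(balanced_accuracy(always_long, returns))  # 0.5
```

A signal with no information but a permanent long bias scores 80% on simple accuracy here, merely because returns were mostly positive, while its balanced accuracy is 50%.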

The above checkups are usually necessary to gain confidence in a trading signal and a realistic assessment of the related PnL profile. A range of more advanced metrics can be helpful as well, depending on context (view post here). Critically, __the validity of each signal evaluation process requires the suppression of data mining__. It is legitimate to use the signal evaluation process to sharpen the logic behind the signal calculation. It is also reasonable to use an evaluation approach for sequential signal optimization through statistical learning (view post here). However, it is misleading and harmful to systematically search across a grid of feature models and parameters until a satisfactory signal has been found and predicate live trading deployment on the experiences with that signal.

## Macro signal evaluation with the Macrosynergy package

The above three steps of macro signal evaluation can be easily implemented through three Python classes of the Macrosynergy package. Generally, this free package provides convenience functions for the analysis and transformation of macro-quantamental indicators, i.e., point-in-time macro and return series across major developed and emerging countries, and offers some basic methods for testing the trading or investment value of quantamental strategies.

**Panel correlation**: The `CategoryRelations` class of the `panel` module manages the analysis and visualization of multiple panel categories, i.e., types of indicator time series that are available across different markets, at different frequencies. If instantiated with signal categories and return categories, it serves panel correlation analysis. In particular, the `reg_scatter` method of this class displays scatter plots and regression lines in conjunction with the results of the Macrosynergy panel test. It can also show more granular analyses across time and countries. There is also a `multiple_reg_scatter` function that manages the conduct and display of multiple regression analyses across various feature and target categories.

**Accuracy and correlation robustness**: The `SignalReturnRelations` class of the `signal` module manages the analyses of signal and return series at various frequencies. It offers statistics on the relation between signals and subsequent returns, including accuracy, balanced accuracy, Pearson correlation, and Kendall correlation, as well as statistics on the directional biases of signals and returns. There is a range of methods for displaying these statistics across signals, time, countries, and target returns. Particularly important table-generating methods are `multiple_relations_table`, which shows all statistics for all signal-return relations and frequencies, `single_relation_table`, which displays all statistics for one specific signal-return relation with greater granularity, such as a presentation by years and cross-sections, and `single_statistic_table`, which displays a single statistic, such as accuracy, for various signals and target returns. Useful graphics methods are `accuracy_bars`, which plots accuracy and balanced accuracy across countries or years, and `correlation_bars`, which plots predictive correlation coefficients and their significance across countries or years.

**Naïve PnL metrics**: The `NaivePnL` class of the `pnl` module manages illustrative PnLs based on panels of signals and returns with limited transformation options and disregarding transaction costs. The `make_pnl` method computes and collects PnLs in the class instance. The `plot_pnls` method plots cumulative PnLs, multiple PnL types per cross-section, or multiple cross-sections per PnL type. The `evaluate_pnls` method produces a table of key performance metrics for various naïve PnLs. Finally, the `signal_heatmap` method displays heatmaps of signal values across time and cross-sections.

The exact application of these functions is explained in the documentation and demonstrated for the example strategies in this post in the related Jupyter notebook.

## A simple example strategy

In this post, we apply the three-step process of macro signal evaluation to __interest rate swap (IRS) strategies that are based on key macro trends that influence central banks’ reaction functions__. All data for signals and target returns have been taken from the J.P. Morgan Macrosynergy Quantamental System (JPMaQS). All economic indicators are information states, i.e., the latest instances of a measure based on the full time series vintage available at a date. The focus is on three macro trend pressure indicators:

**Excess GDP growth**: This is the __difference between point-in-time estimates of real GDP growth and 5-year moving medians__ of GDP growth. These are calculated as the average of two competing quantamental concepts. The first is a “technical” real GDP growth trend estimate, as % over a year ago, 3-month moving average, based on updating standard nowcasting models whose hyperparameters are governed by sequential supervised learning to avoid look-ahead bias (view documentation here). The second is an “intuitive” GDP growth trend based on actual national accounts and monthly activity data, based on sets of regressions that replicate conventional charting methods (view documentation here).

**Excess CPI inflation**: For this post, this is simply the __difference between an average of standard headline CPI growth measures and a country’s effective inflation target__. The CPI growth rates are standard annual headline consumer price inflation and seasonally- and jump-adjusted trends, i.e., % of latest 3 months over previous 3 months at an annualized rate and % of latest 6 months over previous 6 months at an annualized rate (view documentation here). The effective inflation target is the estimated official inflation target for the next year, adjusted for past target deviations (view documentation here). The accompanying Jupyter notebook includes other excess inflation metrics, which may be more appropriate, but this post only considers the simplest metrics.

**Excess private credit growth**: This is the __difference between a private credit growth trend, jump-adjusted, as % over a year ago (view documentation here) and estimated medium-term nominal GDP growth__. Nominal GDP growth is estimated as the sum of the past 5 years’ average GDP growth trend and the effective inflation target.

These __three pressure indicators are combined into a single macro trend pressure indicator simply by averaging__, as their values are all annual growth rates with comparable orders of magnitude. Specifically, we produce a *broad* macro trend pressure indicator using all three metrics and a *narrow* pressure indicator using only excess growth and inflation. The latter is more conservative based solely on the traditional Taylor rule for monetary policy.
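The averaging step can be sketched in a few lines of plain Python. The series names and values below are hypothetical placeholders for one country's excess indicators, not JPMaQS data or tickers.

```python
from statistics import mean


def pressure(*components):
    """Equal-weighted average of excess indicators, period by period."""
    return [mean(values) for values in zip(*components)]


# Hypothetical excess indicators in % per year for three consecutive months
excess_gdp = [0.5, -1.0, 2.0]
excess_cpi = [1.5, 0.0, 1.0]
excess_credit = [4.0, -2.0, 3.0]

broad = pressure(excess_gdp, excess_cpi, excess_credit)  # all three constituents
narrow = pressure(excess_gdp, excess_cpi)                # growth and inflation only

print(broad)   # [2.0, -1.0, 2.0]
print(narrow)  # [1.0, -0.5, 1.5]
```

Simple averaging is only sensible here because all constituents are annual growth rates of comparable magnitude, as stated above; indicators on different scales would first need normalization.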

We __test the macro trend pressure indicators for 9 developed market (DM) currency areas and 13 emerging market (EM) countries__ with reasonably liquid interest rate swap markets. The developed areas are Australia (AUD), Canada (CAD), Switzerland (CHF), the euro area (EUR), the UK (GBP), Japan (JPY), Norway (NOK), Sweden (SEK), and the U.S. (USD). The emerging markets are Chile (CLP), Colombia (COP), Czech Republic (CZK), Hungary (HUF), Israel (ILS), India (INR), South Korea (KRW), Mexico (MXN), Poland (PLN), Thailand (THB), Turkey (TRY), Taiwan (TWD), and South Africa (ZAR). Some periods have been “blacklisted” for specific countries in accordance with tradability dummies on JPMaQS (view documentation here).

For all “small countries”, defined as all countries except the U.S. and the euro area, we also produce __hybrid macro pressure indicators, which are an equal average of the local macro pressure indicators and the G2 pressure indicators__. This concept reflects a widely recognized asymmetry: U.S. and euro area economic and financial developments strongly affect small countries’ rates markets, while small countries hardly affect G2. The target returns are volatility-targeted 2-year IRS fixed receiver returns (view documentation here). For some robustness checks, we also used vol-targeted 5-year IRS returns.
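A minimal sketch of the hybrid calculation follows. The text does not specify how the two G2 indicators are aggregated, so the equal-weighted average of the U.S. and euro area values used below is an assumption, as are the function name and the data.

```python
def hybrid_pressure(local, usd, eur):
    """Equal average of local pressure and G2 pressure, period by period.

    Assumption: G2 pressure is the simple average of U.S. and euro area pressure.
    """
    g2 = [0.5 * (u + e) for u, e in zip(usd, eur)]
    return [0.5 * (loc + g) for loc, g in zip(local, g2)]


# Hypothetical macro trend pressure scores for two months
usd = [1.0, -0.5]
eur = [0.0, 0.5]
pln = [2.5, -1.0]  # a "small country" cross-section

print(hybrid_pressure(pln, usd, eur))  # [1.5, -0.5]
```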

In the sections below, we evaluate macro trend pressure signals for directional strategies in the G2 (U.S. and euro area), directional strategies in 20 smaller markets, and relative cross-country strategies for all 22 currency areas.

## G2 directional macro trend pressure signal

### Panel correlation

The __basic hypothesis is that macro trend pressure is negatively related to subsequent IRS fixed receiver returns__. The reason is that excess growth, inflation, and credit supply trigger a shift to monetary policy tightening and nurture public inflation fears. These trends are not immediately and fully priced due to rational inattention (view post here). We test this hypothesis and the related strategy for the two largest fixed-income markets, the U.S. and the euro area, for the period 2000-2024 (May).

The below panel correlation analysis relates the end-of-month pressure information state to next month’s return. For the broad macro pressure indicator, it confirms the negative predictive relation and, more importantly, suggests that the __probability of this relation being systematic and not due to chance is near 100%__ for the sample period 2000-2024.

Negative one-month-ahead predictive power also prevails for the narrow macro trend pressure indicator and all three constituents, i.e., excess growth, inflation, and credit growth. The narrow macro trend pressure indicator posted a stronger correlation with subsequent returns than the broad. Meanwhile, the predictive power of excess credit growth has been weakest among the constituents, but the probability of the relation being genuinely systematic is still over 90%.

To gauge the intertemporal stability of the predictive relation, we can divide the sample into two sub-periods, 2000-2011 and 2012-2024. Panel correlation analysis shows negative predictive power for both, each with a statistical significance of above 98%. Other sample divisions show similar results. Intertemporal stability increases the likelihood that the relation is a persistent feature of fixed income markets.

A negative predictive relation also prevailed in both the euro area and the U.S., although it was stronger in the euro area, possibly because of lower information efficiency and the signals’ disregard of core inflation and labour market data, which play a prominent role in the United States.

### Accuracy and correlation robustness

We check predictive accuracy and additional correlation metrics for (the negative values of) macro pressure with respect to subsequent 2-year IRS returns at a monthly and quarterly frequency. As for the panel correlation analysis, this relates end-of-period information states to the next period’s cumulative return.

__Accuracy and balanced accuracy for broad and narrow macro pressure have been between 56% and 58%,__ depending on frequency. These are high hit ratios of market direction, particularly given that the signals had a short bias, whereas returns recorded a long bias. Most other pressure indicators’ accuracy rates scored above 50% as well, except for the relation between excess inflation and quarterly IRS returns. Possibly, CPI trends are closely watched by markets, and expecting a lagged response to excesses over more than a month takes the rational inattention argument too far. Moreover, the use of simple headline inflation as an indicator may not be appropriate for the U.S., where the central bank targets a core PCE inflation rate.

Moreover, all parametric and non-parametric predictive correlation coefficients of all pressure indicators have been positive.

We also check the robustness of balanced accuracy with respect to exact specifications of the macro pressure argument and targets. Using balanced accuracy, we find pervasive predictive power as illustrated by the below summary map of this statistic, which uses both 2-year and 5-year IRS returns as targets. Again, the weak spot has been excess inflation.

Monthly accuracy has not been above 50% in all years, but only in a bit less than two-thirds of them. This points to seasonality of value generation. However, annual accuracies do not show a pattern of decay over time but rather temporary dips around various financial crises.

### Naïve PnL metrics

Naïve PnLs based on the broad and narrow macro trend indicators have both shown __clear long-term upward drifts__. However, they also revealed pronounced seasonality, with fat years and lean years coming in longer epochs. Note that PnLs were scaled to 10% to facilitate presentation in graphs.
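The mechanics of such a naïve PnL can be sketched as follows. This is a simplified illustration under stated assumptions, not the `NaivePnL` implementation: it works on monthly data, ignores the 1-day trading lapse, normalizes signals by a full-sample root mean square around zero (a point-in-time version would only use data available at each date), and interprets "scaled to 10%" as a 10% annualized standard deviation. All names and numbers are hypothetical.

```python
import math
from statistics import pstdev


def capped_positions(pressure, cap=2.0):
    """Receiver positions: negative of normalized pressure, capped at +/- cap."""
    # Scale by the root mean square around zero (full sample, for brevity only)
    scale = math.sqrt(sum(p * p for p in pressure) / len(pressure))
    return [max(-cap, min(cap, -p / scale)) for p in pressure]


def naive_pnl(pressure, next_returns, vol_target=10.0, periods_per_year=12):
    """End-of-month position times next month's return, scaled to a vol target."""
    raw = [pos * ret for pos, ret in zip(capped_positions(pressure), next_returns)]
    ann_vol = pstdev(raw) * math.sqrt(periods_per_year)
    return [x * vol_target / ann_vol for x in raw]


# Hypothetical end-of-month pressure scores and next-month receiver returns (%)
pressure = [0.5, 1.0, -0.5, 6.0, -1.0, 0.2]
next_returns = [-0.3, -0.6, 0.4, -1.5, 0.2, 0.1]

positions = capped_positions(pressure)
pnl = naive_pnl(pressure, next_returns)
print([round(p, 2) for p in positions])
print([round(x, 2) for x in pnl])
```

The fourth observation shows the effect of the cap: an extreme pressure reading would call for a short position of more than two units, but the position is limited to -2.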

__The seasonality of return generation reflects the seasonality of macroeconomic opportunities__. Strong signals were recorded during periods of economic overheating and recession, while in less turbulent periods, the signals were small and often called for opposite directional exposure in the U.S. and the euro area. This type of seasonality is a plausible characteristic of a single-principle macro strategy that only trades two contracts.

__Performance metrics suggest economically meaningful alpha generation over the long term__. The 24-year Sharpe ratios for both broad and narrow macro trend pressure have been above 0.7 and the Sortino ratios above 1, while correlations with 10-year Treasury bond returns (JPMaQS ticker USD_GB10YXR_NSA) and S&P500 returns (USD_EQXR_NSA) have been near zero or slightly negative. Drawdowns have been sizeable. The largest peak-to-trough drawdown for the broad macro pressure signal has been more than 3 annualized standard deviations or 4 times the average annual return. While this is challenging for risk management, it is not unusual for single-principle macro strategies. For comparison, the peak-to-trough drawdown of a risk-parity long-only book would have been 6 times the annual standard deviation.
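For reference, the three performance metrics can be computed from a monthly PnL series as in the sketch below. The definitions are common textbook conventions and are assumptions here (zero risk-free rate and zero target return, population standard deviation, drawdown measured on the cumulative sum of returns); the package may use different conventions. The data are hypothetical.

```python
import math
from statistics import mean, pstdev


def sharpe(pnl, periods_per_year=12):
    """Annualized mean return over annualized standard deviation."""
    return mean(pnl) * periods_per_year / (pstdev(pnl) * math.sqrt(periods_per_year))


def sortino(pnl, periods_per_year=12):
    """Annualized mean return over annualized downside deviation (target zero)."""
    downside = math.sqrt(mean([min(x, 0.0) ** 2 for x in pnl]))
    return mean(pnl) * periods_per_year / (downside * math.sqrt(periods_per_year))


def max_drawdown(pnl):
    """Largest peak-to-trough decline of the cumulative (non-compounded) PnL."""
    peak = cumulative = worst = 0.0
    for x in pnl:
        cumulative += x
        peak = max(peak, cumulative)
        worst = max(worst, peak - cumulative)
    return worst


# Hypothetical monthly PnL in %
pnl = [1.0, -2.0, -1.0, 3.0, 2.0, -1.0]
ann_std = pstdev(pnl) * math.sqrt(12)

print(round(sharpe(pnl), 2), round(sortino(pnl), 2))
print(max_drawdown(pnl))                       # 3.0
print(round(max_drawdown(pnl) / ann_std, 2))   # drawdown in annualized standard deviations
```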

Finally, we check the concentration of PnL generation. __The contribution of the 5% most profitable months to the overall long-term PnL has been 80% for the broad macro pressure signal__. This is another reflection of seasonality. It is however not exceptionally high for single-principle strategies. For example, for the long-only book, the 5% best months earned over 160% of the long-term PnL.
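The concentration statistic is simple to compute. The following sketch uses a hypothetical function name and made-up data; it assumes a positive total PnL and rounds the number of "best" months to the nearest integer, with a minimum of one.

```python
def top_share(monthly_pnl, quantile=0.05):
    """Share of the total PnL earned in the best `quantile` fraction of months."""
    n_top = max(1, round(len(monthly_pnl) * quantile))
    best = sorted(monthly_pnl, reverse=True)[:n_top]
    return sum(best) / sum(monthly_pnl)


# Hypothetical 20-month PnL: one outstanding month, the rest nets to a small gain
monthly_pnl = [8.0] + [1.0] * 10 + [-1.0] * 8 + [0.0]

print(top_share(monthly_pnl))  # 0.8
```

The ratio exceeds 100% whenever the remaining months add up to a net loss, which is how the best 5% of months can account for more than the whole long-term PnL.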

## Small countries’ directional macro pressure signal

### Panel correlation

The hypothesis is that macro trend pressure is also negatively related to subsequent IRS fixed receiver returns in the 20 small countries. However, for __smaller economies, it is not just their own economic trends and monetary policies that matter but also those in the G2.__ Therefore, our principal hypothesis is that hybrid (average G2 and local) macro trend pressure indicators should have the greatest directional predictive power. The sample period is 2002-2024 since too many countries have previously lacked signal or target return data.

Panel correlation analysis again shows a __negative and significant forward correlation between the hybrid macro signal and subsequent returns at a monthly frequency__. Linear correlation coefficients are smaller than for the G2 case. This is plausible since the actual relation between macro trends and monetary policy may be quite different across countries, creating much additional unexplained variation in the panel analysis. However, due to the increased number of observations, significance is generally very strong, with a nearly 100% probability of a systematic relation between the hybrid broad macro trend pressure signal and subsequent 2-year IRS returns.

Predictive correlation has been negative and significant for two sub-samples from the past 22 years. The probability of this relationship being non-accidental is nearly 100% for both sub-samples, indicating intertemporal stability.

Finally, the predictive correlation between the hybrid broad signal and 2-year IRS returns has been positive for almost all countries and metrics, with the only exception being the Kendall coefficient in India. Also, for 9 countries correlation coefficients have been significant, just based on their own history.

### Accuracy and correlation robustness

We check predictive accuracy and additional correlation metrics for the negatives of hybrid and country-specific macro pressure and subsequent 2-year IRS returns at monthly and quarterly frequencies.

__Accuracy and balanced accuracy have been between 53% and 56% for all macro pressure signals at monthly and quarterly frequencies__. As for the G2, most signals achieved these hit ratios despite their short biases and the positive bias of returns. Furthermore, all Pearson and Kendall predictive correlation coefficients have been positive.

Across the years, the monthly accuracy of the broad macro trend pressure signal with respect to 2-year IRS returns has been uneven but without clear signs of decay.

Balanced accuracy has been generally positive and robust to the choice of the IRS tenor as well as the prediction frequencies and versions of the macro pressure signal.

### Naïve PnL metrics

As for the G2, the naïve PnLs for the hybrid macro trend pressure signals of the 20 smaller countries displayed __a consistent long-term upward drift, but with even greater seasonality__. Judging by the broad macro pressure indicator, most value would have been generated in the run-up to the great financial crisis and in the wake of the COVID-19 outbreak.

Again, the __seasonality of returns reflects partly the seasonality of macro signals,__ which posted their largest absolute values between 2004 and 2009 and between 2020 and 2023. During the 2010s, signals were rather weak and not uniform across countries.

The long-term __Sharpe and Sortino ratios for hybrid macro trend pressure signals have been around 1 and 1.5, respectively, with near zero or negative correlation to U.S. bond and equity markets__. This is serious value generation. However, the PnLs have also been very uneven, with the top 5% of all monthly PnLs accounting for nearly 100% of the profit. Furthermore, maximum peak-to-trough drawdowns exceeded 4 times the average annual standard deviation of the strategy PnLs. This illustrates that the __addition of countries does not necessarily reduce PnL seasonality for a directional single-principle strategy__. Fixed-income markets are strongly correlated internationally, and the diversification benefit does not strip out the handicap of lower country signal quality, given our simplistic one-size-fits-all indicator calculation.

## All countries’ relative macro pressure signal

### Panel correlation

The above strategies managed directional IRS positions, i.e., exposure to long or short-duration risk. Generally, it is easier to derive directional signals of sufficient quality than relative signals. Signals that call for longs in some countries versus shorts in others need more exact indicators that are comparable across countries. The simple excess growth, inflation, and credit growth rates that make up our macro trend signal are a bit crude in this respect, as standards, data quality, and policy reaction functions are vastly different across countries. However, even in this simplistic form, a relative value strategy may offer significant benefits of diversification.

The __basic hypothesis is that the macro trend pressure in one country relative to a global (22 countries) basket is negatively related to subsequent relative IRS returns__ in that country versus those of a basket of all countries. This hypothesis is strongly backed by the rational inattention hypothesis since few people conscientiously follow relative growth, inflation, and – particularly – credit growth. Thus, our signals may be of low quality but great effect. We test this hypothesis for the period 2002-2024 for country-specific relative macro trend pressure indicators.
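Relative values against a basket can be sketched as below. The text does not state how the basket is weighted, so the equal-weighted cross-sectional average is an assumption; the function name and the data, limited to three cross-sections for brevity, are hypothetical. The same transformation would apply to pressure indicators and to target returns.

```python
from statistics import mean


def relative_to_basket(panel):
    """Subtract the equal-weighted cross-sectional average from each country's value.

    `panel` maps a cross-section identifier to a list of values per period.
    Assumption: all cross-sections have data for all periods.
    """
    cids = list(panel)
    n_periods = len(panel[cids[0]])
    basket = [mean([panel[c][t] for c in cids]) for t in range(n_periods)]
    return {c: [panel[c][t] - basket[t] for t in range(n_periods)] for c in cids}


# Hypothetical macro trend pressure scores for two months
panel = {"USD": [1.0, 2.0], "EUR": [0.0, 2.0], "PLN": [2.0, 5.0]}
relative = relative_to_basket(panel)

print(relative)
# {'USD': [0.0, -1.0], 'EUR': [-1.0, -1.0], 'PLN': [1.0, 2.0]}
```

By construction, relative values sum to zero across countries in each period, so that the resulting positions carry no net directional signal.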

The panel correlation analysis reveals the expected negative predictive relationship on a monthly basis. The correlation coefficient is smaller compared to the case of directional relations, which is expected due to the coarseness of the signals. However, __the likelihood that this negative relationship is systematic is nearly 100%__.

All sub-components of the relative macro trend pressure indicator displayed negative predictive relations to relative IRS returns. But not all did so with high significance. In particular, the relative excess GDP growth indicator failed to convincingly reject the hypothesis of a relation by chance. This is plausibly a testimony to the heterogeneity in the quality and importance of real-time GDP growth indicators for policy. In some EM countries, timely growth estimation is of very low quality. Also, most central banks do not have explicit real economic targets.

The sub-period analysis supports the assumption of structural stability of the negative relation between relative macro pressure and relative IRS returns. The predictive relation was highly significant between 2002 and 2011 and between 2012 and 2024.

### Accuracy and correlation robustness

We check predictive accuracy and additional correlation metrics for the negatives of relative macro pressure indicators and subsequent relative 2-year IRS returns at monthly and quarterly frequencies.

__Accuracy and balanced accuracy have been above 50% for almost all pressure indicators and frequencies,__ except for the quarterly-frequency signals solely based on relative excess growth, where it just reached 50%. Also, correlation coefficients, whether parametric or non-parametric, have all been positive, except for the relative excess growth trend at quarterly frequency.

Across the years, balanced accuracy statistics have been uneven, but without a pattern of decay. Accuracy in 7 out of the last 10 years has been above 50%.

### Naïve PnL metrics

Naïve relative-value PnLs based on the broad and narrow relative macro trend indicators have shown long-term upward drifts. However, the performance of the broad indicator was much stronger, testifying to the importance of relative credit growth.

The uniformity and intensity of the relative signals have been a lot less seasonal than those of directional signals. Obviously, uniformity is suppressed by the formulation of relative signals. But even intensity, i.e., the absolute value of signals, has not been very uneven.

The long-term Sharpe and Sortino ratios of naïve PnLs based on a broad relative macro pressure indicator have been 1 and 1.5, respectively, without much correlation to U.S. bond and equity markets. This performance is comparable to the small-country directional strategy. __Seasonality has been less pronounced__, however, with the 5% best monthly PnLs only accounting for 60% of the long-term PnL. And even the maximum peak-to-trough drawdown would have been only two average annualized standard deviations.

However, the performance of the narrow relative macro pressure signal would have been much worse, with a Sharpe of between 0.3 and 0.4. This lack of robustness probably reflects signal calculation quality rather than a faulty principle, since simple inflation and growth differentials are indeed not easily comparable across countries.

## Conclusions from the evaluation

Macro trend pressure indicators based on excess GDP growth, inflation, and credit growth have __pervasive predictive power for directional and relative 2-year IRS returns__, in line with the basic hypotheses of Taylor rules and rational inattention. Relations have been statistically significant for a panel of 22 countries. Accuracy statistics suggest that standard excess macro trends mostly get the direction of duration returns right, although their short bias fails to capture some premium from long-duration positions. Alpha generation has been material, albeit highly seasonal, and, in the case of directional strategies, concentrated on years of significant macro fluctuations.

The __macro pressure factor seems valuable for strategic allocation for duration risk across countries__. Its seasonality calls for combination with other principles, and the low conceptual and empirical quality of some relative macro pressure indicators calls for a conceptual improvement, for example, by considering differences in economic fluctuations across countries.