Regression-based statistical learning helps build trading signals from multiple candidate constituents. The method optimizes models and hyperparameters sequentially and produces point-in-time signals for backtesting and live trading. This post applies regression-based learning to macro trading factors for developed market FX trading, using a novel cross-validation method for expanding panel data. Sequentially optimized models consider nine theoretically valid macro trend indicators to predict FX forward returns. The learning process has delivered significant predictors of returns and consistent positive PnL generation for over 20 years. The most important macro-FX signals, in the long run, have been relative labor market trends, manufacturing business sentiment changes, relative inflation expectations, and terms of trade dynamics.

The post below is based on Macrosynergy’s proprietary research.

A Jupyter notebook for audit and replication of the research results can be downloaded here. The notebook operation requires access to J.P. Morgan DataQuery to download data from JPMaQS, a premium service of quantamental indicators. J.P. Morgan offers free trials for institutional clients. Also, there is an academic research support program that sponsors data sets for relevant projects.

This post ties in with this site’s summary of macro trends and financial returns.

## The case for regression-based learning

Regression-based learning can be used to combine candidate constituents (or features) into single signals for trading strategies. The method __chooses models and hyperparameters sequentially and produces related point-in-time signals__ that are suitable for realistic backtesting and directly applicable to live trading. The basics of the method have been introduced in a previous post (Regression-based macro trading signals).

Regression-based learning is __particularly useful if one wishes to consider a broad set of conceptually different and potentially complementary signal constituent candidates__. This is often the case for macroeconomics-based trading strategies, as the state of the economy and monetary policy is dependent on a range of relevant forces.

The benefits of the regression-based approach are convenience and simplicity. Regression-based learning automatically assigns weights to the signal constituent candidate scores based on past statistical relations and, for some regression methods such as non-negative least squares and elastic net, removes candidates that have insufficient or implausible explanatory power. Meanwhile, the linear form of standard regression models and the familiar interpretation of their coefficients allow simple inspection of the learning process, enhancing transparency.

## Plausible macro trading indicators for FX

The focus of this post is on macro trading signals for directional FX forward positions in seven “smaller” developed countries versus their natural benchmarks. In particular, we aim to predict returns on 1-month FX forward positions in all developed markets’ currencies, excluding the G3 (the U.S., the euro area, and Japan):

- the Australian dollar (AUD), the Canadian dollar (CAD), and the New Zealand dollar (NZD), all versus the U.S. dollar,
- the Swiss franc (CHF), the Norwegian krone (NOK), and the Swedish krona (SEK), all versus the euro, and
- the British pound (GBP) versus an equally-weighted basket of U.S. dollar and euro.

For the analysis below, we focus on __volatility-targeted 1-month FX forward positions, which are rolled and rebalanced at a monthly frequency__. For more details on the generic return calculation, view documentation here.

There is a broad range of candidate economic data that are plausibly related to the performance of developed market currencies, __measuring the competitiveness of the economy, the attractiveness for capital flows, and the outlook for monetary policy__. The main argument for the predictive power of such information with respect to future returns is the economic theory of *rational inattention*, which argues that market participants are unable to continuously process all relevant information and – rationally – set priorities, simplifying the world into a small set of indicators. The theory has been explained in a previous post (Rational inattention and trading strategies). __Economic trends and states are too numerous to track in real-time and are typically less carefully monitored than market price and flow data__.

To ascertain the predictive power of recorded economic developments, we need point-in-time data on information states of the market. They are available from the *J.P. Morgan Macrosynergy Quantamental System* (JPMaQS). This data service provides macro-quantamental indicators, i.e., macroeconomic information states designed for the development and backtesting of trading strategies. __An information state is the latest instance of an economic indicator based on the data vintage available on a given day__. This type of indicator is conceptually free from all look-ahead bias. In JPMaQS, quantamental indicators are usually produced in similar form for a broad range of countries. These panels are called *quantamental categories*.

For the foreign exchange space, JPMaQS contains a particularly wide array of quantamental categories. In fact, there are too many to consider in a single post. Therefore, here we focus on a reduced set of quantamental categories that correspond to data and news that are commonly watched in discretionary currency trading:

- **Relative excess GDP growth trends**: Conceptually, this is the difference between current estimated GDP growth trends in the local currency area and the benchmark currency area. Generally, __a positive growth differential versus the benchmark currency area supports positive returns on the local currency, as it is indicative of greater competitiveness and tighter monetary policy__. The basis of the indicator is the JPMaQS category group “*intuitive GDP growth estimates*,” real-time estimated recent GDP growth trends (% over a year ago, 3-month average) based on regressions that use the latest available national accounts data and monthly-frequency activity data. The indicator mimics common methods of market economists. View the full documentation here.
- **Manufacturing confidence changes**: These are changes in normalized headline measures of local-currency area manufacturing business sentiment. Generally, __improving confidence in tradable goods industries signals stronger competitiveness and is a leading indicator of better news flow__. The basis of the indicator is the JPMaQS category group “*manufacturing confidence scores*,” real-time standardized and seasonally adjusted measures of manufacturing business confidence and their changes based on one or more surveys per country and currency area. View the full documentation here. For the below analysis, we consider an average of 3-month-over-3-month changes and 6-month-over-6-month changes in survey scores.
- **Relative unemployment trends**: These are recent changes in seasonally adjusted unemployment rates in the local economy versus the benchmark currency area. __Relative tightening of the labor market in the local currency area bodes for relative tightening of monetary policy__. The basis of the indicator is the JPMaQS category group “*labor market dynamics*,” which contains real-time measures of changes in employment and unemployment. View the full documentation here. Here, we use an average of 3-month-over-3-month, 6-month-over-6-month, and over a year ago changes in seasonally adjusted unemployment rates.
- **Relative excess core CPI inflation**: Conceptually, this indicator measures the excess of local core inflation versus the excess of core inflation in the benchmark currency area. Excess here means relative to the central bank’s inflation target. __If local inflation exceeds its target more than in the benchmark currency area, it is more likely that the local central bank will run a tighter monetary policy and tolerate currency strength__. The basis of the indicator calculation is the JPMaQS categories “*Consistent core CPI trends*” (view documentation here) and “*Inflation targets*” (view documentation here). Specifically, we use an average of 3-month-over-3-month, 6-month-over-6-month, and over a year ago annualized changes in core CPI and subtract from this the currency area’s effective inflation target.
- **Relative excess GDP deflators**: This is the difference between the broad excess GDP deflator trend in the local economy and the benchmark currency area. Excess means relative to the inflation target. __Higher local currency area excess deflator growth indicates both pressures for tighter monetary policy and improving terms of trade.__ GDP deflator trends are part of the JPMaQS category group “*Producer price inflation*.” Within that group are indicators of estimated economy-wide output price growth based on standard econometric (“technical”) estimates that use GDP deflators as targets but are nowcast based on higher-frequency price information. View full documentation here. Here, we estimate the economy-wide estimated output price growth as % over a year ago in 3-month moving averages.
- **Relative inflation expectations**: This is the difference between estimated inflation expectations in the local economy and the benchmark currency area. __Higher relative inflation expectations across similar economies typically translate into tighter monetary policy and greater tolerance for currency strength__. Here, inflation expectations refer to formulaic estimates of the JPMaQS category group “*Inflation expectations (Macrosynergy method)*.” View full documentation here. Here, we use an average of 1-year, 2-year, and 5-year inflation expectations.
- **Relative real interest rates**: This is the difference between the real short-term interest rate in the local market and in the benchmark currency area. __Higher real interest rates are indicative of higher implied central bank subsidies (or lower penalties) and higher risk premia__. The real short-term rates are taken from the JPMaQS category “*Real interest rates*”. In particular, we use the main 1-month money market rate adjusted for formulaic inflation expectations. View full documentation here.
- **International liability trends**: Conceptually, these are changes in recorded external financial liabilities of residents to non-residents over the medium term. __A rapid increase in liabilities indicates large past capital inflows that may be hard to sustain and, hence, increases the risk of subsequent currency weakness__. The basis of the indicator calculation is international liabilities as % of GDP from the JPMaQS category group “*International investment position*.” View documentation here. Here, we look at trends in information states of liabilities versus the past two years and five years in percentage points of GDP and take an average.
- **Terms-of-trade dynamics**: These are changes in export prices relative to changes in import prices. __Improving terms of trade often precede economic outperformance, capital inflows, and positive economic news, all of which support currency strength__. The basis of terms-of-trade dynamics here is the commodity-based terms-of-trade dynamics of the JPMaQS group “Terms-of-trade.” Commodity-based terms of trade use commodity trade and price data alone and can be consistently updated in real-time at a daily frequency. View full documentation here.

The above categories have been chosen and calculated based on theoretical priors, not statistical optimization. For convenience, __all constituent candidates are given the “right sign” that makes their theoretically expected predictive direction positive__. Moreover, all signal constituent candidates are sequentially __normalized around their zero value__, i.e., each indicator is divided by the standard deviation of a panel of the indicator in the past, avoiding a look-ahead bias. We call the normalized data *signal constituent candidate scores*.
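The sequential normalization can be sketched in pandas as follows. This is a minimal illustration, not the code used in the notebook: the function name `zero_centered_scores` and the `min_obs` parameter are hypothetical, and the sketch assumes a wide data frame with dates as rows and currency areas as columns. The "standard deviation" is taken around zero rather than around the sample mean and uses only observations up to each date.

```python
import numpy as np
import pandas as pd


def zero_centered_scores(panel: pd.DataFrame, min_obs: int = 24) -> pd.DataFrame:
    """Divide each value by the expanding, panel-wide deviation around zero.

    `panel` holds one indicator: rows are dates, columns are currency areas.
    `min_obs` (hypothetical) is the minimum number of pooled observations
    required before a score is produced.
    """
    squares = panel.pow(2)
    # Pool all cross-sections and accumulate over time, so that the estimate
    # on each date uses only that date and earlier ones.
    cum_sum = squares.sum(axis=1).cumsum()
    cum_count = squares.notna().sum(axis=1).cumsum()
    dev_around_zero = np.sqrt(cum_sum / cum_count).where(cum_count >= min_obs)
    return panel.div(dev_around_zero, axis=0)


# Tiny example: two currency areas, four dates.
panel = pd.DataFrame(
    {"AUD": [1.0, -1.0, 2.0, -2.0], "CAD": [1.0, 1.0, -2.0, 2.0]},
    index=pd.date_range("2020-01-31", periods=4, freq="D"),
)
scores = zero_centered_scores(panel, min_obs=2)
```

Because the divisor on each date depends only on current and past values, changing a later observation leaves all earlier scores unchanged.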

Correlation across the candidate scores has been modest for most pairs but with notable exceptions, such as the correlation of relative inflation expectations and relative core CPI inflation (positive) and of relative real interest rates and relative inflation expectations (negative). This illustrates that __even if signal constituent candidate scores are conceptually different and, hence, separation is valid, statistical correlations can be large on both the negative and the positive side__. Regression-based learning naturally takes the correlation of features into account. This approach is valid if we believe that past statistical relations are a good guide for the future.

## A simple regression-based learning approach

To combine the nine candidate scores we use a simple sequential regression-based statistical learning process. Starting history with a minimum sample (of 3 years for two cross sections), we determine at the end of each month an optimal regression model from a grid of variations through cross-validation. Then, __a signal is derived for the next month as the regression-based forecast using the concurrently optimal model version__. This means signals vary over time not only because economic indicators change but also because model versions and parameters change. The sequence of the optimized signal is a valid basis for the out-of-sample final evaluation of the overall learning process, based on statistical criteria and a naïve stylized cumulative profit and loss time series (PnL).
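The sequential logic of this process can be sketched with plain scikit-learn. This is only an illustration under stated assumptions: the data are synthetic, the candidate names are hypothetical, the features are assumed to be lagged already (the row for month t holds information known at the end of month t-1), and an ordinary `KFold` stands in for the panel-specific splitter described further below.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic panel, sorted by month and then by currency area.
rng = np.random.default_rng(0)
months = pd.period_range("2000-01", periods=96, freq="M")
cids = ["AUD", "CAD", "NZD"]
idx = pd.MultiIndex.from_product([months, cids], names=["date", "cid"])
X = pd.DataFrame(rng.normal(size=(len(idx), 3)), index=idx, columns=["f1", "f2", "f3"])
y = 0.2 * X["f1"] + 0.1 * X["f2"] + rng.normal(size=len(idx))

# Hypothetical grid of model versions.
candidates = {
    "ols": LinearRegression(fit_intercept=True),
    "ols_no_intercept": LinearRegression(fit_intercept=False),
    "nnls_no_intercept": LinearRegression(fit_intercept=False, positive=True),
}

min_months = 36  # minimum sample before the first signal is produced
date_level = X.index.get_level_values("date")
signals, chosen = {}, {}

for t in months[min_months - 1 : -1]:
    in_sample = date_level <= t
    X_train, y_train = X[in_sample], y[in_sample]
    # Compare model versions by cross-validated R2 on data up to month t only.
    cv_scores = {
        name: cross_val_score(model, X_train, y_train, cv=KFold(n_splits=5), scoring="r2").mean()
        for name, model in candidates.items()
    }
    best = max(cv_scores, key=cv_scores.get)
    fitted = candidates[best].fit(X_train, y_train)
    # The signal for month t+1 is the forecast of the currently optimal model.
    next_rows = X[date_level == t + 1]
    signals[t + 1] = pd.Series(fitted.predict(next_rows), index=next_rows.index.get_level_values("cid"))
    chosen[t + 1] = best

signal_panel = pd.DataFrame(signals).T  # rows: months, columns: currency areas
```

The dictionary `chosen` records which model version was selected in each month, which is the kind of information used to inspect the learning process.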

In practical terms, this approach can be implemented in the following steps, as exemplified in the accompanying Jupyter notebook:

- **Transform features and targets into suitable formats for scikit-learn**: It is important to remember that the __features and targets of the analysis are panel data, i.e., two-dimensional datasets that contain one category across time and relevant countries__ or currency areas. These data are transformed into a (pandas) data frame for all features (X) and a double-indexed series of targets (y). The index dimensions are cross-section (currency area) and time. Periodicity is downsampled, here from daily to monthly, by using the latest value of the features and the sum of the targets. Finally, the features are lagged by one period.
- **Define appropriate cross-validation splitters**: Cross-validation methods for panels have been summarized in a previous post (Optimizing macro trading signals – A practical introduction). Here, we use the training data splitting of the RollingKFoldPanelSplit class, which instantiates splitters of temporally adjacent panel training sets that can border the test set from the past and future. An illustration of this method is shown below.

Unlike in previous posts, we implement a novel version of this splitter here, which sets the number of splits in accordance with the overall length of the panel. This means that in sequential optimization, the number of splits used in cross-validation increases over time. This illustrates an important feature of regression-based learning: under structural stability of relations, the quality of statistical learning increases with the expanding dataset.
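The idea of a split count that grows with the panel can be sketched as follows. This is not the Macrosynergy implementation: the class name `GrowingKFoldPanelSplit` and the rule of one split per `periods_per_split` dates are assumptions made for illustration. The sketch cuts the unique dates into contiguous blocks, uses each block once as the test set and trains on all other dates, so that training data can border the test set from both the past and the future.

```python
import numpy as np
import pandas as pd


class GrowingKFoldPanelSplit:
    """Hypothetical panel splitter whose number of folds grows with panel length."""

    def __init__(self, periods_per_split: int = 24, min_splits: int = 2):
        self.periods_per_split = periods_per_split
        self.min_splits = min_splits

    def get_n_splits(self, X, y=None, groups=None) -> int:
        n_dates = X.index.get_level_values("date").nunique()
        return max(self.min_splits, n_dates // self.periods_per_split)

    def split(self, X, y=None, groups=None):
        date_level = X.index.get_level_values("date")
        unique_dates = date_level.unique().sort_values().to_numpy()
        # Each contiguous block of dates is the test set once; all
        # cross-sections on those dates move to the test set together.
        for block in np.array_split(unique_dates, self.get_n_splits(X)):
            is_test = date_level.isin(block)
            yield np.flatnonzero(~is_test), np.flatnonzero(is_test)


# Example: 72 months for two currency areas give three folds of 24 months.
dates = pd.date_range("2000-01-01", periods=72, freq="MS")
idx = pd.MultiIndex.from_product([["AUD", "CAD"], dates], names=["cid", "date"])
X_demo = pd.DataFrame({"f": np.arange(len(idx), dtype=float)}, index=idx)
splitter = GrowingKFoldPanelSplit(periods_per_split=24)
folds = list(splitter.split(X_demo))
```

In a sequential optimization, calling the same splitter on an expanding training panel yields more folds as history accumulates.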

- **Define an appropriate score for cross-validation**: Here, we simply choose the conventional R² metric to compare model versions in cross-validation.
- **Definition of candidate models and hyperparameters**: These are collected in two Python dictionaries of regression models and their hyperparameter grids. These can then be passed on to the appropriate scikit-learn classes and methods or their Macrosynergy package wrappers. The specific grids used are described below.
- **Run sequential model optimization and signal generation**: This uses the `SignalOptimizer` class of the learning module of the Macrosynergy package. It mainly serves as a wrapper of standard scikit-learn cross-validation that respects the panel structure of the underlying data.
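The first step in the list above, downsampling and lagging, can be sketched in plain pandas. The function name `to_monthly_lagged` and the index level names `cid` and `date` are assumptions for this illustration; the Macrosynergy package provides its own helpers for this step, which are not reproduced here.

```python
import numpy as np
import pandas as pd


def to_monthly_lagged(features: pd.DataFrame, target: pd.Series):
    """Downsample a daily (cid, date) panel to monthly and lag the features.

    Features use the last value of each month, targets the monthly sum.
    Both inputs are assumed to share the same (cid, date) MultiIndex.
    """
    cid = features.index.get_level_values("cid")
    month = features.index.get_level_values("date").to_period("M")
    X = features.groupby([cid, month]).last()
    y = target.groupby([cid, month]).sum()
    # Lag features by one month within each currency area, so that the
    # features of month t-1 are paired with the return of month t.
    X = X.groupby(level="cid").shift(1)
    data = X.join(y.rename("target")).dropna()
    return data.drop(columns="target"), data["target"]


# Example: the feature equals the calendar month number, the daily return is 1.
days = pd.bdate_range("2020-01-01", "2020-03-31")
idx = pd.MultiIndex.from_product([["AUD", "CAD"], days], names=["cid", "date"])
features = pd.DataFrame(
    {"score": idx.get_level_values("date").month.to_numpy(dtype=float)}, index=idx
)
target = pd.Series(1.0, index=idx)
X_m, y_m = to_monthly_lagged(features, target)
```

In the example, the February row of `X_m` carries the January feature value, and the February target is the sum of daily returns in February.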

Finally, the optimized signals are evaluated using standard evaluation functions of the Macrosynergy package. For details and replication, see the related Jupyter Notebook.

As benchmarks for signal evaluation, we used both a “long-only” portfolio of small country FX forwards, i.e., risk parity longs in the small countries’ currencies versus the U.S. dollar and the euro, and a *conceptual parity strategy*. __The conceptual parity signal simply averages all signal constituent candidate scores__. This requires no estimation, but if the set of candidates is chosen with good judgment, conceptual parity is typically a high benchmark. Unlike statistical learning, conceptual parity relies solely on theoretical plausibility. It is widely diversified, and the only source of signal variation is the underlying data, not changes in model parameters or hyperparameters.

## Empirical findings of simple least-squares regression learning

For a simple OLS learning process, the small model and hyperparameter grid consider linear regression with two hyperparameter decisions:

- **Inclusion of a regression intercept**: Conceptually, the neutral level of all (mostly relative) signal constituent candidates is zero. Hence, the regression intercept is presumed to be zero, although that may not always be exactly true, and some theoretical assumptions may have been wrong.
- **Non-negativity constraint**: This offers the option of non-negative least squares (NNLS) rather than simple OLS. NNLS imposes the constraint that all coefficients must be non-negative. The benefit of this restriction is that it brings in theoretical priors on the direction of impact, reducing dependence on scarce data.
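In scikit-learn terms, these two decisions map onto the `fit_intercept` and `positive` arguments of `LinearRegression`. The sketch below uses synthetic data, hypothetical dictionary names, and a plain `KFold` in place of the panel splitter, so it illustrates the grid rather than reproducing the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic scores and returns with weak, non-negative true coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(240, 3))
y = X @ np.array([0.3, 0.1, 0.0]) + rng.normal(size=240)

# Two dictionaries: candidate models and their hyperparameter grids.
models = {"ols": LinearRegression()}
grids = {"ols": {"fit_intercept": [True, False], "positive": [True, False]}}

search = GridSearchCV(models["ols"], grids["ols"], scoring="r2", cv=KFold(n_splits=5))
search.fit(X, y)
print(search.best_params_)

# The most restrictive version: no intercept and non-negative coefficients.
nnls = LinearRegression(fit_intercept=False, positive=True).fit(X, y)
print(nnls.coef_)
```

With `positive=True`, any coefficient that would be negative under OLS is set to zero, which effectively removes that candidate from the signal.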

Inspection of the learning process shows that over time, the preferred model becomes the most restrictive one. Since 2016, regression without an intercept and with the non-negativity constraint has been chosen. This tendency has been shown in previous case studies and reflects the frequently __poor bias-variance trade-off in macro trading models__: model flexibility that reduces potential misspecification comes at a high price of enhanced variance, i.e., sample-based variation of models and forecasts.

The bar chart below shows the influence of various constituent scores on signal prediction over time. To be precise, the colored bar sections refer to the average annual regression coefficients of all signal constituent candidate scores considered. Since the scores are normalized, the coefficients are valid measures of the importance of the constituents. While coefficients fluctuated a lot in the early parts of the sample, they stabilized in the 2010s.

Over the past ten years, four conceptual candidates dominated the signal:

- relative unemployment trends,
- manufacturing confidence changes,
- relative inflation expectations, and
- terms-of-trade dynamics.

This does not mean other signal constituent candidate scores should be disregarded going forward. The influence of different macro trends can come with long seasonality. For example, relative real interest rates and the dynamics of international liabilities played important roles during the times of the great global FX carry trade and the great financial crisis before losing statistical power thereafter.

Two other observations are important:

- First, __coefficients can be quite changeable and large in size in short samples since specific events and trends heavily influence them__.
- Second, __the size of coefficients decreased over time__. Individual constituent coefficients may find it harder to fit longer samples with diverse experiences than episodes of only a few years. Since regression-based signals typically only consider prediction value size, not statistical quality, there is a danger that the declining coefficients will result in an unwanted decline in signals over time.

Compared to conceptual parity, OLS learning-based signals posted greater fluctuations in the early parts of the sample period. Over time, variability diminished and converged for many countries, which is another indication of a “declining signal” problem.

Panel-based regression confirms the significant positive predictive power of the OLS learning-based signals with respect to subsequent weekly, monthly, and quarterly FX returns. The Macrosynergy panel test assigns above 95% probability of a non-accidental relation for all frequencies.

__Monthly accuracy and balanced accuracy, i.e., the average correct prediction of positive and negative returns, have been near 54%.__ Both Pearson and Kendall (non-parametric) forward correlations have been significantly positive and higher than for the conceptual parity signal.
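These statistics can be computed with standard scikit-learn and SciPy functions. The sketch below uses synthetic data and pools all observations, so it does not replicate the panel-aware evaluation of the Macrosynergy package. It shows how the measures are defined when the signal's sign is read as a directional forecast.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Synthetic monthly signals and subsequent returns with a weak positive link.
rng = np.random.default_rng(2)
signal = rng.normal(size=500)
returns = 0.1 * signal + rng.normal(size=500)

predicted_up = signal > 0
realized_up = returns > 0

# Accuracy: share of months in which the sign of the signal was correct.
accuracy = accuracy_score(realized_up, predicted_up)
# Balanced accuracy: average of the hit ratios for positive and negative returns.
balanced_accuracy = balanced_accuracy_score(realized_up, predicted_up)

# Parametric and non-parametric forward correlation with p-values.
pearson_r, pearson_p = pearsonr(signal, returns)
kendall_tau, kendall_p = kendalltau(signal, returns)
```

Balanced accuracy is the more informative of the two when returns have a directional bias, since a signal that is always long would score well on plain accuracy in a rising market.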

As for other use cases of quantamental trading signals, we calculate *naïve PnLs*. These are based on monthly position rebalancing in accordance with the optimized signals for all seven currencies, normalized and winsorized at the end of each month. The end-of-month score is the basis for the positions of the next month under the assumption of a 1-day slippage for trading. The naïve PnL does not consider transaction costs, risk management, or compounding. For the chart below, the PnL has been scaled to an annualized volatility of 10%.
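A naïve PnL of this kind can be sketched in a few lines of pandas. The function below is an illustration under assumptions, not the Macrosynergy implementation: the name `naive_pnl`, the winsorization threshold of 2, and the use of 252 trading days per year are hypothetical choices, and the input scores are assumed to be normalized already.

```python
import numpy as np
import pandas as pd


def naive_pnl(scores: pd.DataFrame, returns: pd.DataFrame, winsor: float = 2.0,
              slippage_days: int = 1, vol_target: float = 10.0) -> pd.Series:
    """Daily PnL of monthly rebalanced positions (dates x currency areas)."""
    # Winsorize the scores and keep the last available one of each month.
    clipped = scores.clip(lower=-winsor, upper=winsor)
    month_end = clipped.groupby(clipped.index.to_period("M")).tail(1)
    # Hold the month-end position until the next rebalancing. A score known at
    # the close of day t earns returns from day t + 1 + slippage_days onwards.
    positions = month_end.reindex(returns.index).ffill().shift(1 + slippage_days)
    daily = (positions * returns).sum(axis=1, min_count=1)
    # Ex-post scaling to the target annualized volatility (in % terms).
    return daily * vol_target / (daily.std() * np.sqrt(252))


# Example with synthetic daily scores and returns for three currencies.
rng = np.random.default_rng(3)
days = pd.bdate_range("2020-01-01", "2021-12-31")
cids = ["AUD", "CAD", "NZD"]
scores = pd.DataFrame(rng.normal(size=(len(days), 3)), index=days, columns=cids)
returns = pd.DataFrame(rng.normal(scale=0.5, size=(len(days), 3)), index=days, columns=cids)
pnl = naive_pnl(scores, returns)
```

The volatility scaling uses the full sample and is therefore for presentation only; it does not affect performance ratios.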

__The long-term naïve Sharpe ratio of the learning-based strategy from September 2003 to April 2024 has been above 0.5, and the Sortino ratio near 0.8__. These two performance ratios have been slightly better than those of the conceptual parity strategy. Moreover, the value generation of the learning-based strategy has been more consistent over time. Unlike conceptual parity, the learning-based PnL continued drifting up in the 2020s. Moreover, the correlation of the learning-based PnL to the S&P500 and the 10-year U.S. Treasury bond has been near zero.
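For reference, the two ratios can be computed from a PnL series as sketched below. Conventions differ across libraries. This sketch assumes a zero risk-free rate and a zero target return, and measures downside deviation as the root mean square of negative PnL values, which may differ from the convention used for the figures quoted above.

```python
import numpy as np
import pandas as pd


def sharpe_ratio(pnl: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized mean PnL divided by annualized standard deviation."""
    return pnl.mean() / pnl.std() * np.sqrt(periods_per_year)


def sortino_ratio(pnl: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized mean PnL divided by annualized downside deviation around zero."""
    downside = np.sqrt((pnl.clip(upper=0.0) ** 2).mean())
    return pnl.mean() / downside * np.sqrt(periods_per_year)


# Tiny example with five periods.
example_pnl = pd.Series([1.0, -1.0, 2.0, -2.0, 3.0])
sharpe = sharpe_ratio(example_pnl, periods_per_year=1)
sortino = sortino_ratio(example_pnl, periods_per_year=1)
```

The Sortino ratio exceeds the Sharpe ratio when losses are smaller or rarer than gains, because only downside variation enters its denominator.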

The learning-based signal’s performance characteristics are encouraging. One should consider that the strategy only has seven markets to trade in, some of which are highly correlated. Also, the strategy only uses macro signals without considering market prices and volumes. Position changes are infrequent and gentle.

However, the naïve PnL of the regression-based learning signal also illustrates a flaw of this simple process. __As regression coefficients of optimal OLS models declined over time, so did the absolute values of the signals.__ As a result, PnL generation flattened, even though the quality of the signal increased over time. This suggests that the simple regression-based prediction, without consideration of statistical power and significance of the underlying model coefficients, is insufficient as a determinant of position sizes.

## Other regression-based learning processes

Beyond simple OLS regression-based learning, we consider three other types of regression-based learning processes. Principally, these processes are similar to the one described above, except that they use different models and hyperparameter grids:

- **Regularized regression-based learning**: Regularization aims at reducing the generalization error of regression models by __adding penalties in accordance with the size of the coefficients__. This can mitigate overfitting. Here the learning algorithm chooses from a range of elastic net regressions with varying emphasis on L1 versus L2 regularization and different penalties for coefficient size, as well as simple OLS regression. Both elastic net and OLS can have intercept exclusion and non-negative constraints.
- **Sign-weighted regression-based learning**: Sign-weighted least squares __equalize the contribution of positive and negative target samples__ to the model fit. The learning process can choose between sign-weighted and ordinary least squares, both with intercept exclusion and non-negativity constraints.
- **Time-weighted regression-based learning**: Time-weighted least squares __allow prioritizing more recent information in the model fit__ by defining a half-life of exponential decay in units of the native dataset frequency. The learning process can choose between time-weighted and ordinary least squares, with the former choosing among half-lives of exponential lookback windows between 12 and 240 months.
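The three variants can be approximated with standard scikit-learn estimators, as sketched below. The helper functions `sign_weights` and `time_weights` and the elastic net settings are assumptions for illustration. The Macrosynergy package ships its own estimator classes for these methods, which are not shown here. The sketch assumes a single series sorted from oldest to newest with both positive and negative targets present.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression


def sign_weights(y) -> np.ndarray:
    """Sample weights giving positive and negative targets equal total weight."""
    y = np.asarray(y, dtype=float)
    positive = y >= 0
    n_pos, n_neg = positive.sum(), (~positive).sum()
    return np.where(positive, len(y) / (2 * n_pos), len(y) / (2 * n_neg))


def time_weights(n_obs: int, half_life: float) -> np.ndarray:
    """Exponentially decaying weights; the latest observation has weight 1."""
    age = np.arange(n_obs - 1, -1, -1)  # periods since each observation
    return 0.5 ** (age / half_life)


# Synthetic monthly data with a positive drift in the target.
rng = np.random.default_rng(4)
X = rng.normal(size=(120, 3))
y = X @ np.array([0.2, 0.1, 0.0]) + rng.normal(size=120) + 0.3

# Sign-weighted and time-weighted least squares via sample weights.
base = dict(fit_intercept=False, positive=True)
swls = LinearRegression(**base).fit(X, y, sample_weight=sign_weights(y))
twls = LinearRegression(**base).fit(X, y, sample_weight=time_weights(len(y), half_life=36))

# Elastic net: alpha sets the penalty size, l1_ratio the L1 versus L2 mix.
enet = ElasticNet(alpha=0.01, l1_ratio=0.5, **base).fit(X, y)
```

In a learning process, `half_life`, `alpha`, and `l1_ratio` would be hyperparameters chosen by cross-validation rather than fixed values as in this sketch.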

All learning processes have produced signals with a positive correlation with subsequent monthly or quarterly returns. Also, most of the relationships have been significant, with over 90% probability. The only exception has been learning based on regularized regressions, which posted significance values below 90% for monthly and weekly frequencies.

Accuracy and balanced accuracy values at a monthly frequency have been in a range of 53-55% for learning-based signals, a bit higher than for conceptual parity (below 52%). Also, Kendall’s monthly forward correlation coefficients have been highly significant for all learning processes.

Naïve PnLs have mostly been positive, with consistent value generation. Sign-weighted learning and time-weighted learning have produced performance metrics similar to those of simple OLS-based learning, with Sharpe ratios between 0.5 and 0.6 and Sortino ratios between 0.7 and 0.8. The correlation of PnLs with bond and equity benchmarks has been marginal. The elastic net process underperformed.

The underperformance of elastic net methods may not be accidental. Shrinkage-based regularization reduces model coefficients, making them less sensitive to specific data samples. The idea is that introducing some bias into the regression can help reduce the variance in predictions made. Whilst theoretically sound, this, in practice, leads to more frequent model changes, as the appropriate type and size of penalties must be estimated based on the scarce data. This __accentuates a source of signal variation that is unrelated to macro trends__.