
How to build a macro trading strategy (with open-source Python)

This post is a condensed guide on best practices for developing systematic macro trading strategies with links to related resources. The focus is on delivering proofs of strategy concepts that use direct information on the macroeconomy. The critical steps of the process are (1) downloading appropriate time series data panels of macro information and target returns, (2) transforming macro information states into panels of factors, (3) combining factors into a single type of signal per traded contract, and (4) evaluating the quality of the signals in various ways.
Best practices include the formulation of theoretical priors, easily auditable code for preprocessing, visual study of data before and after transformations, management of signal optimisation with statistical learning, and a protocol for dealing with rejected hypotheses. A quick, standardised and transparent process supports integrity and reduces moral hazard and data mining. Standard Python data science packages and the open-source Macrosynergy package provide all necessary functionality for efficient proofs of concept.

The post below is based on Macrosynergy’s proprietary research and accompanies the release of the 1.0 version of the Macrosynergy package by Samuel Andresen, Eric Brine, Rushil Gholkar, and Palash Tyagi.

See also the Macrosynergy package documentation site.

Macro strategies and the importance of proof of concept

The term “macro strategy” here refers to a trading strategy that systematically uses information on the macroeconomy, as opposed to company information, trader positioning, or short-term price anomalies. Macroeconomic information can be inferred from market prices (indirect approach) or information states of reports on economic activity (direct approach). If the latter is used, we can call it a “macro-quantamental strategy”. In this post, we assume that signals are at least partly “macro-quantamental”. There are two types of macro-quantamental strategies:

  • Feature-based strategies focus on a single macroeconomic concept, such as growth or inflation changes, and apply a point-in-time data series to trade one or more financial contracts systematically. For example, in a previous post, we show that a single proxy of the state of the business cycle can serve as a trading signal for equity, fixed income and foreign exchange strategies (view post here).
  • Target-based strategies focus on a single class of financial market position, such as equity index futures or sovereign credit default swaps, and apply a set of plausible conceptual macro factors for systematic trading. For example, multiple conceptual macro factors can support the trading of FX forwards in developed and emerging markets, diversifying performance across both factors and currencies (view post here).

The proof of concept of a macro-quantamental strategy is conclusive empirical evidence that a trading signal and the underlying method of its creation would have delivered significant predictive power and material risk-adjusted returns. This proof is critical for deciding if an idea is worth the allocation of capital and the time and money that is required to set up a trading algorithm (or trader support tool). A valid proof of concept typically requires that we proceed in three steps:

  • We formulate a theory upfront, based on logical reasoning, for the predictive power of the particular macro information. This can be anything from a plausibility argument to a formal mathematical model. Usually, model size is not a measure of quality, and complications are not a recipe for success.
  • We compute time series of theoretical signals and their constituents and check their properties. Typically, this involves finding, transforming and combining macro data. These steps are explained below. The critical point is that the statistical characteristics of the data, such as statistical distribution and autocorrelation, are consistent with the theory that we want to implement.
  • Finally, we assess the quality of the signal through a broad evaluation of its predictive significance, predictive accuracy, and value generation in backtests. This assessment can initially be based on simplified trading signals, avoiding the complications of trading patterns and risk management rules.

A valid proof of concept needs integrity. It is easy to let the above three-stage process degenerate into data mining by adjusting theory and signal calculation to optimise signal quality metrics. Alas, statistics of predictive relations and backtested PnLs that are derived in this way have no value. Instead, they mislead risk allocation. To support integrity, the process for proofs of concept should be standardised, quick, and auditable. Long projects invite attachment to results and the temptation to torture the data for false evidence. The practices explained in this post place great emphasis on cutting costs and development time, thereby reducing attachment to projects and encouraging integrity.

A specialised open-source Python package

A proof of concept of an institutional macro trading strategy can be delivered in a standard Python environment using a handful of popular data science packages, such as pandas, matplotlib, and scikit-learn, and the specialized Macrosynergy package.

The Macrosynergy package largely works on top of standard Python data science packages and provides classes and functions for the analysis and transformation of quantamental indicators, as well as for developing macro-quantamental signals and testing success with respect to predictive power and stylized PnL generation. Functionality is tailored to the formats and conventions of the J.P. Morgan Macrosynergy Quantamental System (JPMaQS), i.e., daily time series of information states across multiple markets or countries. Principally, the functions can also be applied to market and alternative data in the same format.

The Macrosynergy package shifted to version 1.0 in November 2024. It can be found on GitHub (view repository here). It has a stable, regularly updated documentation site (view documentation here) that is maintained by the Macrosynergy developer team. The package contains seven sub-packages:

  • The download subpackage contains functionalities for downloading JPMaQS data from the JPMorgan DataQuery API in a convenient pandas format.
  • The panel subpackage contains various functionalities for running calculations, analyses, and visualizations on JPMaQS Quantamental data frames, operating on time series panels of multiple markets or countries.
  • The learning subpackage contains functions and classes to assist the creation of machine learning solutions with macro quantamental data. Currently, the functionality is built around integrating quantamental data formats and scikit-learn.
  • The signal subpackage contains various functionalities for analysing, visualising, and comparing the relationships between panels of trading signals and panels of subsequent returns.
  • The pnl subpackage contains classes and functions to translate trading signals and generic daily return series into proxy profit-and-loss time series, together with visualizations and evaluation statistics.
  • The management subpackage contains core utilities and functions, such as data frame operations, data simulations or data validations, that are used in other parts of the package or as stand-alone convenience functions.
  • The visuals subpackage contains functions for visualising quantamental data. It is built around the ability to create quick generic plots of data and provide a framework for developing custom plots.

Understanding and finding macro-quantamental data

Generally, macro trading signals rely on price data, flow data, alternative data, and macro-quantamental data. The first three have been staples of systematic trading for some time. This post focuses more on macro-quantamental information, which has traditionally been the domain of discretionary trading and only recently began transforming systematic macro trading (view post here). It is essential to clarify some terminology first.

  • A macro-quantamental indicator is a metric that combines quantitative and fundamental analysis of the macroeconomy, such as an inflation trend or a government balance-to-GDP ratio. Values must be point-in-time information states, i.e., reflect the state of public knowledge at a timestamp. Here, public knowledge does not mean that everybody actually did know but that everybody could have known what the data said.
  • A macro-quantamental category is a time series panel of a macro-quantamental indicator for a set of countries and markets. For example, real-time annual GDP growth estimates across countries constitute a category. Validation of trading strategy principles often relies on categories rather than single indicators, drawing on the diverse experiences of multiple countries to strengthen the evidence.
  • A macro-quantamental factor is a combination of macro-quantamental indicators or categories. The combination is presumed to serve as a predictor of financial contract returns. For example, growth rates of various measures of economic activity may be combined into an “excess activity growth” factor, which is relevant to the performance of fixed-income or equity markets.
  • Finally, a macro-quantamental signal is a combination of macro-quantamental factors and possibly other types of factors. The signal is designed to govern risk and positioning in a specific market. For example, an excess activity growth factor and an excess inflation factor in conjunction with real interest rates may be combined into a fixed-income trading signal.

At present, the primary source of macro-quantamental indicators for financial market research is the J.P. Morgan-Macrosynergy quantamental system (“JPMaQS”). Although JPMaQS is a commercial service, conditions are designed such that costs do not impede the active development of quantamental signals. All institutional clients of J.P. Morgan with DataQuery credentials have free access to quantamental data, excluding the most recent months. The free data are typically sufficient to establish proof of concept. Moreover, institutions can request a free JPMaQS trial for the full data set. Finally, there is also special access for academic projects. Once a proof of concept is delivered, full JPMaQS access allows the strategy to operate reliably. Subscription costs depend upon scope but are usually small compared to the PnL benefits of institutional macro strategy portfolios.

JPMaQS currently contains about 950 quantamental data categories. They are collected in 78 category groups, and the groups are filed under six “themes”: economic trends, macroeconomic balance sheets, financial conditions, shocks and risk measures, stylised trading factors, and generic returns. The latter theme actually contains approximated daily returns of a diverse range of financial derivatives and cash contracts structured in the same format as proper quantamental data to facilitate detecting predictive relationships.

The easiest way to import quantamental categories and target returns into Python is through category tickers. Combined with cross-section identifiers (standard currency symbols), they form indicator tickers that can be downloaded quickly. We can find macro-quantamental categories in one of two ways:

  • Browsing: The section “Quantamental indicators on JPMaQS” of the Quantamental Academy provides links to “books of Jupyter notebooks”, ordered hierarchically by themes and then category groups. Each category group notebook contains a top section with labels, definitions, and, most importantly, tickers for downloading. For example, if we need information on the states of various manufacturing business surveys, those can be found in the theme macroeconomic trends, by scrolling the left-hand sidebar to manufacturing confidence scores. The Jupyter notebooks also contain empirical visualisations, methodological annexes, and examples of empirical predictive relationships.
  • Search bar: The “Quantamental indicators on JPMaQS” section also contains a search bar (“Search themes”) on top of the browsable windows. Entering the label of a category will bring up the documentation for all related concepts in Jupyter notebooks.

Downloading and updating macro-quantamental data

Downloading and regularly updating quantamental indicators and target returns can be managed through the JPMaQSDownload class. Instantiation of the class requires DataQuery client credentials. Actual downloads are executed using the class’s download method. Beyond optional parameters that limit the scope of the download, this method requires three types of information:

  • the tickers of all categories required for building quantamental factors,
  • the tickers of all target return categories that are considered, and
  • the cross-sectional identifiers of the markets that are considered, which are mostly standard currency symbols (“AUD”, “BRL”, etc.).

Having collected the above pieces of information in Python lists, we can combine them into a single list of quantamental indicator tickers (“<cross-section>_<category_ticker>”). This list is passed to the download method, which downloads data from the DataQuery API as time-series JSONs and wrangles them into standardized pandas data frames or “quantamental data frames”. These “long” data frames have at least four columns containing the cross-section identifier (“cid”), an extended category ticker (“xcat”), a real-time date (“real_date”), and the actual indicator value (“value”). Other potentially useful columns contain the quality grade of an observation (“grading”), the lag to the end of the observation period in days (“eop_lag”), and the lag to the median of the observation period (“mop_lag”).
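As a minimal sketch of this process, the code below downloads a small panel. The category tickers are illustrative placeholders, and DataQuery OAuth credentials are assumed to be available:

```python
from macrosynergy.download import JPMaQSDownload

cids = ["AUD", "BRL", "EUR", "USD"]            # cross-section identifiers
xcats = ["CPIC_SA_P1M1ML12", "DU05YXR_NSA"]    # illustrative category tickers (see the JPMaQS notebooks)
tickers = [f"{cid}_{xcat}" for cid in cids for xcat in xcats]

# Instantiation requires DataQuery client credentials
with JPMaQSDownload(client_id="<DQ_CLIENT_ID>", client_secret="<DQ_CLIENT_SECRET>") as dq:
    df = dq.download(
        tickers=tickers,
        start_date="2000-01-01",
        metrics=["value", "grading", "eop_lag", "mop_lag"],
    )
```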

The download process is exemplified in all Jupyter notebooks on the Quantamental Academy, including the introductory code for Trading Strategies with JPMaQS. Regular re-downloading of JPMaQS data is recommended, even before putting strategies in production. This is not just to capture the new observations but also the potential inclusion of older vintages as the system’s archaeological dig for historic information progresses.

Visualising features and target returns

The visual study of indicators, factors, signals, and targets is one of the most important and underestimated practices in strategy development. It is not just critical for our intuitive understanding. Visualisation is also the most important legitimate part of “learning from the data”, unlike data mining or hindsight.

Studying the properties of data categories and their transformations helps us to address one important question: “Do the data have basic properties that I would expect them to have according to my investment hypothesis?”. In other words, we can check if actual data matches theoretical concepts and, where necessary, make adjustments rather than rushing the process by immediately testing for predictive power and PnL generation. Three types of simple visualisations are beneficial and can be quickly executed in a standardized way:

  1. The visualisation of historical distributions of indicator panels in an efficient, standardised form is supported by the view_ranges function. It plots important aspects of their distribution for multiple cross-sections and for one or more categories. Application examples can be viewed here. Key issues to check are the following:
    • The balance of distribution may reveal an unexpected bias towards long or short positions, which may indicate the need to set a more realistic neutral level.
    • Large cross-sectional differences in variation may reveal an unexpected concentration of factor and signal values in one country, which may indicate the need for better scaling across sections.
    • The pattern of outliers may reveal a proclivity to extreme values in both directions (“high kurtosis”) or in one direction (“skewness”), which may indicate the need for winsorization (“capping” or “flooring”) of extreme values to contain the influence of distortions and to avoid inordinate intertemporal risk concentrations.
  2. The visualization of indicator timelines across sections is delivered in a standard format by the view_timelines function. It displays panels of timelines across sections, potentially across indicators. Application examples can be viewed here. Importantly, these timeline facets inform on the speed of factor adjustment and the frequency of position flips, both of which are critical for transaction costs. Many quantamental factors imply gentle position adjustment, but some factors that are based on short-term changes in information states may produce large daily or weekly position changes (view post here). The point is to check if the volatility of factors matches expectations with respect to the desired characteristics of a strategy.
  3. Finally, the visualisation of correlations of categories or across sections is quickly executed through the correl_matrix function. Application examples can be viewed here. The correlation matrices inform on two important aspects of diversification: diversification across countries or markets and diversification across factors. High correlation across countries and factors means that, at least historically, performance has been dominated by a single global factor.

These visualisations and checks should be applied at all stages where new categories are downloaded or calculated. They should also be applied to target returns, checking the distribution of returns, concentration of performances across time, and diversification benefits of positioning across countries or markets.
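For illustration, the sketch below applies these three functions to the data frame df and cross-section list cids from the download example above; the category ticker is again an illustrative placeholder:

```python
from macrosynergy.panel import view_ranges, view_timelines, correl_matrix

# df and cids are assumed to come from the download sketch above
xcat = "CPIC_SA_P1M1ML12"   # illustrative category ticker

# Distribution of the indicator panel across cross-sections
view_ranges(df, xcats=[xcat], cids=cids, kind="box", start="2000-01-01")

# Facet of timelines, one panel per cross-section
view_timelines(df, xcats=[xcat], cids=cids, ncol=4, start="2000-01-01")

# Cross-sectional correlation matrix for a single category
correl_matrix(df, xcats=xcat, cids=cids, start="2000-01-01")
```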

Transforming quantamental categories into quantamental factors

Macro-quantamental categories are like Lego blocks rather than ready-to-deploy factors. Typically, we must transform and combine categories to arrive at factors and signals. The Macrosynergy package supports transformations with various convenience functions that operate transparently on panels rather than individual indicators. In practice, transparency and reliability often trump complexity.

  • General operations on category panels can be managed by the panel_calculator function of the Macrosynergy package. It uses an easily readable argument, i.e., a formula in text format, to execute transformations for full panels. This function is very flexible and saves a lot of code. Beyond simplification, the main benefit is that operations are easily auditable, reducing gross errors when calculating factors and signals. The text string that governs the operation can contain mathematical operations and Python operations that apply to a pandas panel data frame, i.e., a data frame with time as a row index and cross-sections as a column index. Examples and details of this “workhorse function” can be viewed in the related section of the Macrosynergy package tutorial notebook.
  • The make_zn_scores function specializes in the time-consistent normalization of categories of information states. This type of transformation is particularly important for modifying the distribution of data panels and is a standard preparatory step before summing or averaging categories with different units or orders of magnitude. The function computes z-scores for a category panel based on sequential updating mean absolute deviations and a specified neutral level that may be different from the mean. The term “zn-score” refers to the normalized distance from a neutral value. Setting the right neutral value for quantamental factors is crucial to avoid undue long or short biases. Examples of the application of this function can be viewed in a related section of the Macrosynergy package tutorial notebook.
  • Finally, the calculation of relative category values across countries or markets can be delegated to the make_relative_value function, as sketched in the example below. It generates a data frame of relative values for a given list of categories. In this case, “relative” means that the original value is compared to a basket average. The basket can be a set of cross-sections of the same category. By default, basket averages do not require the full set of cross-sections to be calculated for a specific date but are always based on the ones available at the time. This may change the characteristics of relative series over time but preserves valuable information and efficiently uses imbalanced panels. Application examples can be viewed in a related section of the Macrosynergy package tutorial notebook.
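The sketch below strings these three functions together on the data frame df and cross-section list cids from the download example; all category names and parameter choices are illustrative assumptions rather than recommendations:

```python
from macrosynergy.panel import panel_calculator, make_zn_scores, make_relative_value

# Excess inflation versus an assumed 2% target, computed for the full panel
dfx = panel_calculator(df, calcs=["XCPI = CPIC_SA_P1M1ML12 - 2"], cids=cids)

# Sequential zn-scores around a zero neutral level, winsorized at 3 standard deviations
dfz = make_zn_scores(
    dfx, xcat="XCPI", cids=cids, sequential=True,
    neutral="zero", thresh=3, est_freq="m", pan_weight=0.75, postfix="_ZN",
)

# Values relative to the average of the cross-sections available at each date
dfr = make_relative_value(dfx, xcats=["XCPI"], cids=cids, rel_meth="subtract", postfix="_vGLB")
```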

For clarity of analysis and usage of statistical learning, it can be advantageous to define all trading factors such that their presumed predictive relation with target returns is positive. Among other things, this helps filter out factors whose past directional relation has been contrary to theory, for example, by use of non-negative least squares.

Combining macro-quantamental factors into signals

Using economic theory and conceptual parity

Most macro-quantamental strategies combine multiple factors into a single trading signal. This partly reflects that the influence of a single macro concept can be quite seasonal, i.e., concentrated on certain time periods. For example, inflation plausibly influences equity markets mostly when it is far from the central bank’s target and triggers a significant policy response. Combining diverse macro-quantamental factors into one signal broadens the origin of value generation and, typically, reduces the seasonality of PnLs.

There are two basic ways to combine conceptually different factors: mathematical operations and statistical learning. This sub-section deals with the mathematical operations. They come in two flavours: logical combinations and conceptual parity. Both methods are simple but often quite robust out-of-sample, as they rely on plausibility and logic rather than specific past experiences.

  • Logical combinations rely on economic theory. For example, if we assume that excess economic growth and inflation both influence interest rates and that the influence of inflation is twice as strong as the one for growth, our factor is just “1/3 x growth + 2/3 x inflation”. For data panels, these operations are best done using the panel_calculator function.
  • Conceptual parity combines panels of factors by normalization and subsequent averaging. Thus, after adjusting factor values by standard deviations from neutral levels, this method gives each concept the same weight. For example, if inflation and credit growth are both viewed as individually relevant predictors of bond returns, conceptual parity would give each the same importance in the signal calculation.
    Although this approach is simple, it is also quite robust to economic change and impedes implicit or explicit hindsight bias. Success mainly relies on clearly separating different types of economic influences. For example, inflation and growth are typically separate influences on markets, even if, over a specific stretch of history, they were correlated. By contrast, different types of CPI inflation rates are not separate concepts, even if their correlation is imperfect. If there are strong theoretical priors that one concept is more important than another, conceptual parity can use weighted averages.

Conceptual parity calculations are efficiently managed with the make_zn_scores and linear_composite functions of the Macrosynergy package. The latter is designed to calculate linear combinations across categories under clearly stated rules. It can produce a composite even if some of the categories are missing. This flexibility is valuable because it enables one to work with the available information rather than discarding it entirely.
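A minimal sketch of such a conceptual parity composite, assuming two zn-scored factor panels along the lines of the transformations above (the category names and equal weights are purely illustrative):

```python
from macrosynergy.panel import linear_composite

# Equally weighted composite of two normalized factor panels (hypothetical names)
dfc = linear_composite(
    df, xcats=["XGROWTH_ZN", "XCPI_ZN"], cids=cids,
    weights=[0.5, 0.5], signs=[1, 1],
    complete_xcats=False,   # produce a composite even if a constituent is missing
    new_xcat="MACRO_SIG",
)
```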

Mathematical combinations come with a health warning: once an initial hypothesis has been formed, one must not “play around” with the formulas to obtain better backtested PnLs. This would be plain old data mining. By contrast, it is legitimate to revise signals if they reveal logical inconsistencies or unexpected behaviour. Simply put, mathematical factor combinations must steer clear of optimisation, which would easily degenerate into data mining and invalid backtests. All optimisation should be performed through a sequential statistical learning method, as explained in the next sections.

Using statistical learning

When theoretical priors on macro-quantamental factors are scant or fuzzy, statistical learning is a more suitable method for signal computation. This path also allows developers with little experience in economics to generate good signals by using a broad range of vaguely relevant categories and a suitable learning process. Statistical learning offers methods for sequentially choosing the best model class and other hyperparameters for signal generation, thus supporting realistic backtests and automated operation of strategies. The main purposes of such statistical learning are factor selection, return predictions, and market regime classification (view post here).

The workhorse Python class for statistical learning in macro-quantamental strategy development is SignalOptimizer of the Macrosynergy package. It manages sequential model selection, fitting, optimization and forecasting based on quantamental panel data. It works on top of scikit-learn but, unlike that package’s standard functions, respects the panel structure of features and target returns. Optimization is governed by the calculate_predictions method, which uses a grid of models and hyperparameters, a cross-validation pattern, and a performance criterion for executing any of three jobs:

  • Sequentially selecting constituent categories for a composite signal from a set of candidates.
  • Sequentially selecting and applying regression models for translating a chosen set of factors into a single return prediction, which is a valid basis for a trading signal.
  • Sequentially selecting and applying classification models to detect favourable or unfavourable regimes for the target returns of a strategy.

Regression-based learning is a particularly important and intuitive method for combining quantamental factors into trading signals. The learning process optimises model parameters and hyperparameters sequentially and produces signals at each point in time based on the regression model that performed best up to that date. The process learns from growing datasets and produces point-in-time signals for backtesting and live trading (view post here). Two “add-ons” to standard regression should be considered:

  • Simple regression-based predictions disregard statistical reliability, which tends to increase as time passes or decrease after structural breaks. It is, therefore, advisable to adjust signals by the statistical precision of parameter estimates (view post here). The adjustment correctly aligns intertemporal risk-taking with the predictive power of signals.
  • A helpful auxiliary technique in statistical learning is dimensionality reduction with Principal Components Analysis (PCA, view post here). It condenses the key information from a large dataset into a smaller set of uncorrelated variables called “principal components.” This smaller set often functions better as features for predictive regressions, stabilizing coefficient estimates and reducing the influence of noise.

Qualified statistical learning requires attention to the bias-variance trade-off of machine learning, i.e., the balance between a process’s ability to generalize to unseen data (low variance but potentially high bias) and its ability to accurately fit the available data (low bias but potentially high variance). Statistical learning for macro trading signals has a less favourable bias-variance trade-off than many other areas of quantitative research. This means that as we move from restrictive to flexible models, the benefits of reduced bias typically come at a high price of enhanced variance. This reflects the scarcity and seasonality of macro events and regimes.

While the term variance here refers to model instability, it also leads to “unproductive signal volatility”, i.e., model choice and estimation can become a major source of signal variation that logically has no relation to the variation of conditions for market performance. Hence, one typically needs to be parsimonious in delegating decisions to statistical learning and must emphasize reasonable and logical priors. If data are scarce, simple regression methods, such as OLS and non-negative least squares, often outperform more elaborate methods, such as elastic net or decision trees (view post here).
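The sketch below illustrates a parsimonious regression-based setup along these lines. It assumes the hypothetical factor and return categories used earlier, and argument names should be checked against the 1.0 documentation of SignalOptimizer:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, r2_score
from macrosynergy.learning import SignalOptimizer, ExpandingKFoldPanelSplit

# Feature categories plus the target return category (names are hypothetical)
so = SignalOptimizer(
    df=df, xcats=["XGROWTH_ZN", "XCPI_ZN", "DU05YXR_NSA"], cids=cids,
    freq="M", lag=1,
)

# Sequentially re-fit a simple OLS model on expanding panels of monthly data
so.calculate_predictions(
    name="MACRO_OLS",
    models={"OLS": LinearRegression()},
    hyperparameters={"OLS": {}},
    scorers={"r2": make_scorer(r2_score)},
    inner_splitters={"expanding": ExpandingKFoldPanelSplit(n_splits=5)},
)

signals = so.get_optimized_signals("MACRO_OLS")   # point-in-time signal panel
```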

Signal evaluation and backtesting

Meaningful evaluation of macro trading signals must consider not just backtested PnLs but also statistics that account for their seasonality and diversity across countries. In a previous article (view post here), we presented three complementary types of evaluation, which are briefly summarized below. Conscientious evaluation of macro signals not only informs the selection process. It also paints a realistic picture of the PnL profile, which is critical for setting risk limits and for broader portfolio integration. One should have a protocol for dealing with poor signals constructively. The only real failure is the failure to acknowledge the empirical lessons.

Sometimes, lack of predictive power can be traced to logical and computational errors, such as applying incorrect neutral signal levels or combining factors incorrectly. In these cases, modification of signals is legitimate. However, if the basic trading hypothesis is clearly rejected, it is important to acknowledge the fact and, maybe, trace faults in basic reasoning or interpretation of data. Simply put, in these cases, we must either cut our losses (in terms of development costs) or go back to the drawing board. One of the most harmful habits is trying to salvage an idea by “playing around with the data”, which may gravely mislead decisions of capital allocation and risk management.

Test 1: Significance of proposed predictive relations

These analyses visualize and quantify the relations between macro signals and subsequent returns across countries or currency areas. A critical metric is the significance of forward correlation, i.e., the probability that a predictive relation has not been accidental. This metric requires a special panel test that adjusts the data of the predictive regression for common global influences across countries (view post here). This test is a most useful selection criterion for macro signal candidates. It is important, however, that the hypothesized relation between features and targets is similar across countries and that the country-specific features matter, not just their global averages.

The CategoryRelations class of the Macrosynergy package is the workhorse for analyzing and visualizing the relations of multiple panel categories. Most often, it is applied to trading signals and subsequent target returns across countries or markets using the reg_scatter and multiple_reg_scatter methods. It allows consideration of multiple trading frequencies, blacklisted periods for cross sections, and delays in the application of predictive information. Examples of applications of this class for two categories can be found in the related section of the Macrosynergy package tutorial notebook.
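As an illustration, the sketch below relates the hypothetical composite signal to subsequent quarterly duration returns; category names and settings are assumptions for demonstration only:

```python
from macrosynergy.panel import CategoryRelations

# Signal panel (first category) versus subsequent returns (second category)
cr = CategoryRelations(
    df, xcats=["MACRO_SIG", "DU05YXR_NSA"], cids=cids,
    freq="Q", lag=1, xcat_aggs=["last", "sum"], start="2000-01-01",
)
cr.reg_scatter(
    coef_box="lower right",
    prob_est="map",   # panel-based significance estimate
    title="Macro signal and subsequent duration returns",
)
```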

Test 2: Accuracy of directional predictions

Here, statistical accuracy measures the share of correctly predicted signs or directions of subsequent returns relative to all predictions. Accuracy is more about correct classification into “good” and “bad” conditions for the traded contract than about the strength of the relation. It also implicitly tests if the signal’s neutral (zero) level has been well chosen, which is very important for PnL generation. A particularly important metric for macro trading strategies is balanced accuracy, which is the average of the proportions of correctly predicted positive and negative returns. This metric is indispensable if we need a signal that works equally well in bull and bear markets for the target contract.

The SignalReturnRelations class from the Macrosynergy package is specifically designed to analyze, visualize, and compare the relationships between panels of trading signals and panels of subsequent returns. Its methods display various summary tables of classification performance metrics and statistical measures, along with visualizations, including accuracy ratios across various frequencies. Examples of applications can be found in the related section of the Macrosynergy package tutorial notebook.
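A minimal sketch of an accuracy analysis with the same hypothetical signal and return categories:

```python
from macrosynergy.signal import SignalReturnRelations

# Accuracy and balanced accuracy of the signal panel versus subsequent returns
srr = SignalReturnRelations(
    df, sigs=["MACRO_SIG"], rets=["DU05YXR_NSA"], cids=cids,
    freqs=["M"], start="2000-01-01",
)
srr.summary_table()                        # accuracy, balanced accuracy, correlation statistics
srr.accuracy_bars(type="cross_section")    # accuracy by country or market
```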

Test 3: Performance metrics of stylized PnLs

We can gauge the economic value of trading signals by simulating standardized “naïve” profit and loss series. They can be calculated using daily generic returns data, such as those provided by JPMaQS, and by applying normalized signals with regular rebalancing that follows previous signals with a delay to allow for trading time or slippage. Naive trading signals should be capped at extreme values, say, 3 or 4 standard deviations, as a reasonable risk limit. A naïve PnL does not consider transaction costs or risk management tools. It is thus not a realistic backtest of actual financial returns in a specific institutional setting. However, it is an objective and undistorted representation of the economic value of the signals, independent of the rules, size, and conventions of the trading institution.

Standard performance metrics of naive PnL analysis should include various types of risk-adjusted returns (Sharpe, Sortino), correlation coefficients with global risk benchmarks, such as bond and equity prices, measures of seasonality, and drawdown analytics.

The NaivePnL class is designed to provide a quick and simple overview of a stylized PnL profile of a set of trading signals. Its make_pnl method produces a proxy daily profit-and-loss series with a limited set of options for modifying signals prior to application. The function is designed for testing, not optimisation or data mining. Other methods govern visualizations of resulting PnLs and trading positions. Examples of applications of the class and its methods can be found in the related section of the Macrosynergy package tutorial notebook.
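The sketch below produces such a stylized PnL for the hypothetical composite signal; the return and benchmark tickers, as well as the rebalancing settings, are assumptions for illustration:

```python
from macrosynergy.pnl import NaivePnL

# Stylized PnL of the composite signal applied to a duration return
pnl = NaivePnL(
    df, ret="DU05YXR_NSA", sigs=["MACRO_SIG"], cids=cids,
    start="2000-01-01", bms=["USD_EQXR_NSA"],   # benchmark for correlation statistics
)
pnl.make_pnl(
    sig="MACRO_SIG", sig_op="zn_score_pan", thresh=3,   # winsorize signals at 3 standard deviations
    rebal_freq="monthly", rebal_slip=1,                 # monthly rebalancing with a one-day slippage
    vol_scale=10, pnl_name="PNL_MACRO",
)
pnl.plot_pnls(pnl_cats=["PNL_MACRO"])
pnl.evaluate_pnls(pnl_cats=["PNL_MACRO"])
```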

 
