**The struggles of using macroeconomic data for trading strategies**

In principle, the case for incorporating macroeconomic information into trading strategies has long been compelling. Economic theory shows that market prices are determined within a broader macroeconomic equilibrium and, hence, depend on economic states and shocks. Meanwhile, the __full information efficiency of the broader market is unlikely due to research costs and attention limitations__ (view post here). Discretionary trading rooted in macroeconomic fundamentals has a long history and has been the catalyst for numerous successes in the hedge fund industry. Furthermore, trading based on macroeconomic information is not a zero-sum game. Trading profits are not solely derived from the losses of others but are also paid out of the economic gains from a faster and smoother alignment of market prices with economic conditions. Therefore, technological advancements in this field can increase the value generation or “alpha” of the asset management industry overall (view post here).

And yet, macroeconomic data have hitherto played a very modest role in systematic trading. This reflects two major obstacles.

- First, __the relations between economic information and market prices are often indirect and potentially convoluted__. Building trading signals requires good judgment based on experience with macroeconomic theory and data. Alas, macroeconomics is not normally the core strength of portfolio managers or trading system engineers. Meanwhile, economists do not always converge on clear common views.
- Second, to use macroeconomic data in trading, __professionals must wrangle many deficiencies and inconveniences of standard series__:
  - **Sparse history**: Many economic data series, particularly in emerging economies, have only a few decades of history. Whilst this would be abundant in other fields, in macroeconomics, this __only captures a limited number of business cycles and financial crisis events__. Often, this necessitates looking at multiple currency areas simultaneously and stitching together different data series, depending on what markets used to watch in the recent and more distant past.
  - **Revisions of time series**: Standard economic databases store economic time series in their latest revised state. However, __initial and intermediate releases of many economic indicators, such as GDP or business surveys, may have looked very different.__ This is not only because the data sources have updated information and changed methods but also because adjustment factors for seasonal and calendar effects, as well as for data outliers, are being modified with hindsight. The information recorded *for the* past is typically not the information that was available *in* the past.
  - **Dual timestamps**: Unlike market data, __economic records have different observation periods and release dates__. The former are the periods when the economic event occurred, and the latter are the dates on which the statistics became public. Standard economic databases only associate values with observation periods.
  - **Distortions**: Almost __all economic data are at least temporarily distorted relative to what they promise to measure__. For example, inflation data are often affected by one-off tax changes and administered price hikes. Production and balance sheet data often reflect disruptions, such as strikes or unseasonal weather. Also, there can be sudden breaks in time series due to changes in methodology. Occasionally, statistics offices have even released plainly incorrect data for political reasons.
  - **Calendar effects**: __Many economic data series are strongly influenced by seasonal patterns, working day numbers, and school holiday schedules__. While some series are calendar-adjusted by the source, others are not. Also, calendar adjustment is typically incomplete and not comparable across countries.
  - **Multicollinearity**: The variations of many economic time series are correlated due to common influences, such as business cycles and financial crises. Oftentimes, a multitude of data all seem to tell the same story. It is typically __necessary to distill latent factors that make up common trends in macro data.__ This can be done using domain knowledge, statistical methods, or combinations of these two (view post here).
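The dual-timestamp problem can be made concrete with a short sketch. It assumes a hypothetical list of releases, each carrying an observation period, a release date, and a value, and returns for any market date the newest figure that had been published on or before that date. The function name `latest_known` and the data are illustrative only; a real pipeline would also need intraday release times and the revision history described above.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical monthly releases: (observation period, release date, value).
# The observation period says when the activity happened; the release date says
# when the market could first see the number.
releases = [
    ("2024-01", date(2024, 2, 15), 1.2),
    ("2024-02", date(2024, 3, 14), 0.8),
    ("2024-03", date(2024, 4, 16), 1.5),
]

def latest_known(releases, as_of):
    """Return (observation period, value) of the newest release public on or before `as_of`."""
    ordered = sorted(releases, key=lambda r: r[1])    # order by release date, not period
    release_dates = [r[1] for r in ordered]
    i = bisect_right(release_dates, as_of)            # number of releases public by `as_of`
    if i == 0:
        return None                                   # nothing published yet
    period, _, value = ordered[i - 1]
    return period, value

for d in [date(2024, 3, 1), date(2024, 3, 14), date(2024, 4, 30)]:
    print(d, latest_known(releases, d))
```

Keyed by observation period alone, the February figure would appear to be known at the end of February, two weeks before it was actually released; keying by release date avoids that look-ahead.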

Generally, data wrangling means transforming raw, irregular data into clean, tidy data sets. In many fields of research, this requires mainly reformatting and relabelling. For macroeconomic trading indicators, the wrangling and preparation of data is a lot more comprehensive:

- __Adapting macroeconomic indicators for trading purposes requires transforming activity records into market information states__. Common procedures include [1] stitching different series across time to account for changing availability and convention, [2] combining updates and revisions of time series into “vintage matrixes” as the basis of a single “point-in-time” series, and [3] assigning publication time stamps to the periodic updates and revisions of time series.
- __Economic information typically involves filters and adjustments__. The parameters of these filters must be estimated sequentially without look-ahead bias. Standard procedures are seasonal, working day, and calendar adjustment (view post here), special holiday pattern adjustment, outlier adjustment, and flexible filtering of volatile series. Seasonal adjustment is still largely the domain of official software, although there are modules in R and Python that provide access to it.
- Markets often view information through the lens of economists. __To track economic analyses over time, one must account for changing models, variables, and parameters__. A plausible evolution of economic analysis __can be replicated through machine learning methods__. This point is very important. Conventional econometric models are immutable and not backtestable because they are built with hindsight and aim to replicate *actual* past trends rather than the trends as they were *perceived* at the time. Machine learning can simulate changing models, hyperparameters, and model coefficients. One practical approach is “__two-stage supervised learning__” (view post here). The first stage scouts features. The second stage evaluates candidate models and selects the one that is best at any point in time. Another practical __statistical learning example is the simulation of the results of “nowcasters” over time__ (view post here). This method estimates past information states through a three-step approach of (1) variable pre-selection, (2) orthogonalized factor formation, and (3) regression-based prediction.
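Sequential estimation means that any parameter used on a given date is computed only from data available up to that date. The sketch below applies this idea to the simplest possible filter: a z-score with an expanding window and a cap on extreme scores. It is a stand-in for illustration, not an implementation of the seasonal or outlier adjustment procedures listed above; the function `expanding_zscore` and its `min_obs` and `cap` parameters are hypothetical.

```python
import numpy as np

def expanding_zscore(x, min_obs=3, cap=3.0):
    """Standardize each value using only the observations up to and including it."""
    x = np.asarray(x, dtype=float)
    z = np.full_like(x, np.nan)
    for t in range(min_obs - 1, len(x)):
        window = x[: t + 1]                   # information available at t, no future values
        sd = window.std(ddof=1)
        if sd > 0:
            z[t] = (x[t] - window.mean()) / sd
    return np.clip(z, -cap, cap)              # crude outlier treatment: winsorize the scores

series = [1.0, 2.0, 3.0, 4.0, 10.0]
revised = [1.0, 2.0, 3.0, 4.0, -50.0]         # only the last value differs
print(expanding_zscore(series))
print(expanding_zscore(revised))              # scores before the last date are unchanged
```

A full-sample z-score would instead let the final value shift every earlier score, which is exactly the look-ahead bias the text warns about.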

News and comments are major drivers for asset prices, probably more so than conventional price and economic data. Yet, no financial professional can read and analyze the vast flow of verbal information. Therefore, comprehensive news analysis is increasingly becoming the domain of __natural language processing, a technology that supports the quantitative evaluation of humans’ natural language__ (view post here). Natural language processing delivers textual information in a structured form that makes it usable for financial market analysis. A range of useful packages is available for extracting and analyzing financial news and comments.



**Macro-quantamental indicators**

Overall, __statistical programming nowadays allows the construction of quantamental systems__ (view post here). A quantamental system combines customized, high-quality databases and statistical programming outlines in order to systematically investigate relations between market returns and plausible predictors. The term “quantamental” refers to a joint quantitative and fundamental approach to investing.

Macro quantamental indicators __record the market’s information state with respect to macroeconomic activity, balance sheets, and sentiment__. Quantamental indicators are distinct from regular economic time series insofar as they represent information that was available at the time of reference. Consequently, indicator values are comparable to market price data and are well-suited for backtesting trading ideas and implementing algorithmic strategies.

Quantamental indicators increase the market’s macro information efficiency (and trading profits) for three simple reasons:

- __Quantamental indicators broaden the scope of easily backtestable and tradable strategy inputs__. Currently, most systematic strategies focus on market data, such as prices and volumes. Quantamental indicators capture critical aspects of the economic environment, such as growth, inflation, profitability, or financial risks, directly and in a format that is similar to price data. Data in this format can be easily combined across macroeconomic concepts and with price data.
- Readily available quantamental indicators reduce information costs through scale effects. __A quantamental system spreads the investment in low-level data wrangling and codifying fundamental domain know-how across many institutions__. For individual managers, developing trading strategies that use fundamentals becomes much more economical. Access to the system removes expenses for data preparation and reduces development time. It also centralizes curation and common-sense oversight.
- Finally, __quantamental indicators reduce moral hazard__ in systematic strategy building. Typically, if producing indicators is time-consuming and costly, there is a strong incentive to salvage failed strategy propositions built on them through “flexible interpretation” and effective data mining.

The main source of macro quantamental information for institutional investors is the J.P. Morgan Macrosynergy Quantamental System (JPMaQS). It is a service that makes it easy to use quantitative-fundamental (“quantamental”) information for financial market trading. With JPMaQS, users can access a wide range of relevant macro quantamental data that are designed for algorithmic strategies, as well as for backtesting macro trading principles in general.

Quantamental indicators are principally based on a two-dimensional data set.

- The first dimension is the timeline of real-time dates or information release dates. It marks the progression of the market’s information state.
- The second dimension is the timeline of observation dates. It describes the history of an indicator for a specific information state.

For any given real-time date, a quantamental indicator is calculated based on the full information state, typically a time series that may be based on other time series and estimates that would be available at or before the real-time date. This information state-contingent time series is called a **data vintage**.

The two-dimensional structure of the data means that, unlike regular time series, quantamental indicators __convey information on two types of changes: changes in reported values and reported changes in values__. The time series of the quantamental indicator itself shows changes in reports arising from updates in the market’s information state. By contrast, quantamental indicators of changes are reported dynamics based on the latest information state alone.
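The distinction between changes in reported values and reported changes in values can be illustrated with a toy vintage matrix. In the sketch below, each row is one data vintage (a real-time date) and each column an observation period, with `nan` marking periods not yet published. The numbers are invented and the helper names are hypothetical.

```python
import numpy as np

nan = np.nan
# Hypothetical vintage matrix: row = real-time (release) date, column = observation period.
vintages = np.array([
    [100.0, 101.0,   nan,   nan],   # vintage 1: periods 1-2 published
    [100.0, 101.5, 102.0,   nan],   # vintage 2: period 2 revised, period 3 added
    [100.0, 101.5, 102.5, 103.0],   # vintage 3: period 3 revised, period 4 added
])

def latest_value(vintage):
    """Most recent observation within a single vintage."""
    return vintage[~np.isnan(vintage)][-1]

def reported_change(vintage):
    """Change between the last two observations, computed inside one vintage only."""
    obs = vintage[~np.isnan(vintage)]
    return obs[-1] - obs[-2]

# Quantamental level indicator: one value per real-time date.
level = np.array([latest_value(v) for v in vintages])                # [101.0, 102.0, 103.0]
# Changes in reported values: differences across real-time dates.
change_in_reports = np.diff(level)                                   # [1.0, 1.0]
# Reported changes in values: dynamics within each vintage.
reported_changes = np.array([reported_change(v) for v in vintages])  # [1.0, 0.5, 0.5]
print(level, change_in_reports, reported_changes)
```

Between the first and second vintage, the reported level rises by 1.0, yet the second vintage itself shows only a 0.5 increase over the prior period, because the earlier figure was revised up at the same time.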


**Macro indicators and statistical learning in general**

Statistical learning refers to a set of tools or models that help extract insights from datasets, such as macro-quantamental indicators. Not only does statistical learning support the estimation of relations across variables (parameters), but it also governs the choice of models for such estimates (hyperparameters). Moreover, for macro trading, statistical learning has another major benefit: it allows realistic backtesting. Rather than choosing models and features arbitrarily and potentially with hindsight, statistical learning can simulate a rational rules-based choice of method in the past. __Understanding statistical learning is critical in modern financial markets, even for non-quants__ (view post here). This is because statistical learning illustrates and replicates how investors’ experiences in markets shape their future behavior.
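A minimal sketch of such a rules-based choice is shown below, using plain least squares. At each date it re-estimates a few candidate models on past data only and forecasts with the one whose out-of-sample errors have been smallest so far. The candidate set, the selection criterion (mean squared error), the simulated data, and the function names are assumptions for illustration, not a description of the methods in the linked posts.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients with an intercept in position 0."""
    Xc = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return beta

def sequential_selection(X, y, candidates, min_train=20):
    """
    For each date t >= min_train: re-fit every candidate on data before t, forecast y[t]
    with the model that has the lowest mean squared forecast error so far, and only then
    record each candidate's realized error. `candidates` maps hypothetical model names
    to the feature columns they use, standing in for a hyperparameter grid.
    """
    names = list(candidates)
    sq_err = {name: [] for name in names}
    preds, chosen = np.full(len(y), np.nan), []
    for t in range(min_train, len(y)):
        forecasts = {}
        for name in names:
            cols = candidates[name]
            beta = ols_fit(X[:t, cols], y[:t])          # training data strictly before t
            forecasts[name] = beta[0] + X[t, cols] @ beta[1:]
        # With no track record yet, every score is 0.0 and min() keeps the first candidate.
        best = min(names, key=lambda m: np.mean(sq_err[m]) if sq_err[m] else 0.0)
        chosen.append(best)
        preds[t] = forecasts[best]
        for name in names:
            sq_err[name].append((y[t] - forecasts[name]) ** 2)
    return preds, chosen

# Simulated data: the target depends on feature 0 only; feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 0.8 * X[:, 0] + 0.1 * rng.normal(size=200)
preds, chosen = sequential_selection(X, y, {"noise_only": [2], "signal": [0]})
print(chosen[:5], chosen[-1])
```

Because each choice relies only on errors realized before the forecast date, the sequence of selected models is itself backtestable.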

Within statistical learning pipelines, simple and familiar econometric models can be deployed to simulate point-in-time economic analysis.

**Linear regression** remains the most popular tool for supervised learning in financial markets. It is appropriate if there is a monotonic relation between today’s indicator value and tomorrow’s expected return that can be linearized. Statistical learning based on regression can optimize both model parameters and hyperparameters sequentially and produce signals based on whichever model has predicted returns best up to a point in time (view post here). In the macro trading space, __mixed data sampling (MIDAS) regressions are a useful method for nowcasting economic trends and financial market variables__, such as volatility (view post here). This type of regression allows combining time series of different frequencies and limits the number of parameters that need to be estimated. Linear regression supports estimates not only of the relationship between signals and returns but also of the relationship between contract returns and macro factors. The latter allows immunizing strategy returns against unwanted macro influences (view post here).

**Structural vector autoregression** (SVAR) is a practical model class that captures the evolution of a set of linearly related observable time series variables, such as economic data or asset prices. SVAR assumes that all variables depend in fixed proportion on past values of the set and new structural shocks. The method is useful for macro trading strategies (view post here) because it __helps identify specific, interpretable market and macro shocks__ (view post here). For example, SVAR can identify short-term policy, growth, or inflation expectation shocks. Once a shock is identified, it can be used for trading in two ways:

- First, one can compare the type of shock implied by markets with the actual news flow and detect fundamental inconsistencies.
- Second, different types of shocks may entail different types of subsequent asset price dynamics and, hence, form a basis for systematic strategies.
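The sketch below shows one common way to identify structural shocks: estimate a reduced-form VAR(1) by least squares, then apply a recursive (Cholesky) identification so that each residual vector is a lower-triangular combination of orthogonal unit-variance shocks, `u_t = P e_t`. The recursive scheme, the lag order of one, and the variable ordering are assumptions; the text does not say which identification the linked posts use, and other schemes, such as sign restrictions, are also widely used.

```python
import numpy as np

def fit_var1(Y):
    """OLS estimate of Y_t = c + A Y_{t-1} + u_t, where Y has shape (T, k)."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])   # regressors: constant and first lag
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)       # shape (k + 1, k)
    resid = Y[1:] - X @ B
    c, A = B[0], B[1:].T                                 # A[i, j]: effect of lagged var j on var i
    return c, A, resid

def recursive_shocks(resid):
    """Cholesky identification: u_t = P e_t with Cov(e_t) = I and P lower triangular."""
    sigma = np.cov(resid, rowvar=False)
    P = np.linalg.cholesky(sigma)                        # contemporaneous impact matrix
    shocks = np.linalg.solve(P, resid.T).T               # e_t = P^{-1} u_t
    return P, shocks

# Simulated example, ordered by assumption as (policy expectations, equity returns):
# policy shocks move equities within the period, but not the other way around.
rng = np.random.default_rng(1)
A_true = np.array([[0.5, 0.0], [0.2, 0.3]])
P_true = np.array([[1.0, 0.0], [0.5, 1.0]])
Y = np.zeros((2000, 2))
for t in range(1, len(Y)):
    Y[t] = A_true @ Y[t - 1] + P_true @ rng.normal(size=2)
c, A, resid = fit_var1(Y)
P, shocks = recursive_shocks(resid)
print(np.round(A, 2))
print(np.round(P, 2))
```

The recovered `shocks[:, 0]` series is what one would compare with the news flow or use to condition subsequent asset price dynamics; its interpretation rests entirely on the assumed ordering.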

Another useful set of models tackles **dimension reduction**. This refers to __methods that condense the bulk of the information of many macroeconomic time series into a small set__ with the most important information for investors. In macroeconomics, there are many related data series that have only limited incremental information value. Cramming all of them into a prediction model undermines estimation stability and transparency. There are three popular types of statistical dimension reduction methods:

- The first type __selects a subset of “best” explanatory variables by means of regularization__, i.e., shrinking coefficient values by penalizing coefficient magnitudes in the objective function used for statistical fit. Penalty functions that are linear in the absolute values of individual coefficients can set some of them to zero. Classic methods of this type are Lasso and Elastic Net (view post here).
- The second type __selects a small set of latent background factors of all explanatory variables__ and then uses these background factors for prediction. This is the basic idea behind static and dynamic **factor models**. Factor models are the key technology behind nowcasting in financial markets, a modern approach to monitoring current economic conditions in real time (view post here). While nowcasting has mostly been used to predict forthcoming data reports, particularly GDP, the underlying factor models can produce a lot more useful information for the investment process, including latent trends, indications of significant changes in such trends, and estimates of the changing importance of various predictor data series (view post here).
- The third type __generates a small set of functions of the original explanatory variables__ that historically would have retained their explanatory power and then deploys these for forecasting. This method is called **Sufficient Dimension Reduction** and is more suitable for non-linear relations (view post here).
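For the second type, a static factor model can be approximated by principal components. The sketch below standardizes a panel of indicators and extracts the leading factors with a singular value decomposition. It is estimated on the full sample for brevity; a point-in-time version would re-estimate it on expanding windows, and the dynamic factor models used for nowcasting are considerably more elaborate. The data are simulated and the function name is hypothetical.

```python
import numpy as np

def principal_factors(X, n_factors):
    """Extract latent factors from a panel of indicators (rows = dates, columns = series)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize each series
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    factors = U[:, :n_factors] * S[:n_factors]           # factor scores per date
    loadings = Vt[:n_factors].T                          # series-by-factor loadings
    explained = S[:n_factors] ** 2 / np.sum(S ** 2)      # share of total variance
    return factors, loadings, explained

# Simulated panel: ten indicators that all load on one hypothetical latent trend.
rng = np.random.default_rng(2)
T, n = 300, 10
common = rng.normal(size=T)
true_loadings = rng.uniform(0.5, 1.5, size=n)
panel = np.outer(common, true_loadings) + 0.5 * rng.normal(size=(T, n))
factors, loadings, explained = principal_factors(panel, n_factors=2)
print(np.round(explained, 2))
print(abs(np.corrcoef(factors[:, 0], common)[0, 1]))
```

The sign of a principal component is arbitrary, which is why the check uses the absolute correlation with the simulated trend.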

Dimension reduction methods not only help to condense information about predictors of trading strategies but also support portfolio construction. In particular, they are suited for detecting latent factors of a broad set of asset prices (view post here). These factors can be used to improve estimates of the covariance structure of these prices and – by extension – to improve the construction of a well-diversified minimum variance portfolio (view post here).
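One simple way to use latent factors for portfolio construction is sketched below: the covariance matrix is modeled as factor covariance plus diagonal idiosyncratic variance, and the fully invested minimum-variance weights are `w = inv(S) @ 1 / (1' @ inv(S) @ 1)`. Using principal components as the factors, the diagonal residual term, and unconstrained weights (short positions allowed) are assumptions of this sketch; the returns are simulated.

```python
import numpy as np

def factor_covariance(returns, n_factors):
    """Covariance from principal-component factors plus diagonal idiosyncratic risk."""
    R = returns - returns.mean(axis=0)
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    B = Vt[:n_factors].T                                 # asset loadings on the factors
    F = R @ B                                            # factor returns
    resid = R - F @ B.T                                  # idiosyncratic part
    cov_f = np.cov(F, rowvar=False).reshape(n_factors, n_factors)
    D = np.diag(resid.var(axis=0, ddof=1))
    return B @ cov_f @ B.T + D

def min_variance_weights(cov):
    """Fully invested minimum-variance weights (weights sum to one, no other constraints)."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return x / x.sum()

# Simulated returns for eight assets driven by one common factor.
rng = np.random.default_rng(3)
T, n = 500, 8
market = rng.normal(scale=0.01, size=T)
betas = rng.uniform(0.5, 1.5, size=n)
idio_vol = rng.uniform(0.005, 0.02, size=n)
returns = np.outer(market, betas) + rng.normal(size=(T, n)) * idio_vol
cov = factor_covariance(returns, n_factors=1)
w = min_variance_weights(cov)
ew = np.full(n, 1.0 / n)
print(np.round(w, 3))
print(w @ cov @ w <= ew @ cov @ ew)    # minimum variance vs. equal weights under this covariance
```

The diagonal residual term keeps the estimated matrix invertible even when the number of factors is small relative to the number of assets.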