Introduction to Macrosynergy package #

This notebook shows how to download, process, describe, and analyze macro quantamental data through the Macrosynergy package.

Macro quantamental indicators are mostly time series of information states of economic trends, balance sheets, and conditions. They are particularly suitable for backtesting and operating financial market trading strategies. The primary source of macro quantamental indicators is the J.P. Morgan - Macrosynergy Quantamental System (“JPMaQS”). The format has a few specifics that make it easy to discover investment value:

  • All values are point-in-time, meaning they represent the latest value of a concept at the end of the date for which they have been recorded.

  • The point-in-time format means that they can be related easily to time series of financial returns; many types of financial returns are also included in the system.

  • Data are organized in panels, i.e., one type of quantamental indicator (“category”) time series is available over a range of tradable markets or countries.

  • Each observation of an indicator does not just contain a value but also information on the time that has elapsed since the recorded activity took place and the quality of the replication of the historic information state.

The Macrosynergy package contains convenience functions to handle this specific format and arrive quickly at conclusions for the investment process. It is not designed to compete with general statistics and graphics packages but merely serves as a shortcut to quick results and a guide to the type of operations and analyses that one can swiftly conduct on JPMaQS data.

The notebook covers the following main parts:

  • Get Packages and JPMaQS Data: This section is responsible for installing and importing the necessary Python packages that are used throughout the analysis, including the macrosynergy package.

  • Describing: In this part, the notebook shows how to check data availability, detect missing categories, visualize panel distributions with standard bar and box plots, and analyze data with time series charts and heatmaps.

  • Pre-processing: this part shows examples of simple data transformation, such as creating a new category, excluding series, computing relative values, normalizing data, etc.

  • Relating: the functions in this part support the visualization and analysis of relationships between two categories. They are based on the standard Seaborn scatterplot function but allow additional customization for trading signal creation.

  • Learning: the macrosynergy.learning subpackage contains functions and classes to assist the creation of statistical learning solutions with macro quantamental data. The functionality is built around integrating the macrosynergy package and associated JPMaQS data with the popular scikit-learn library, which provides a simple interface for fitting common statistical learning models, as well as feature selection methods, cross-validation classes, and performance metrics.

  • Signaling: this part is specifically designed to analyze, visualize, and compare the relationships between panels of trading signals and panels of subsequent returns.

  • Backtesting: the functions here are designed to provide a quick and simple overview of a stylized PnL profile of a set of trading signals. The class carries the label naive because its methods do not take into account transaction costs or position limitations, such as risk management considerations. This is deliberate because costs and limitations are specific to trading size, institutional rules, and regulations.

For examples of standard packages used with JPMaQS, please have a look at the notebooks “JPMaQS with Seaborn”, “JPMaQS with Statsmodels”, and “Panel regression with JPMaQS”.

Get packages and JPMaQS data #

# Uncomment to update the package
"""
%%capture
! pip install macrosynergy --upgrade"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

import macrosynergy.management as msm 
import macrosynergy.panel as msp
import macrosynergy.signal as mss
import macrosynergy.pnl as msn
import macrosynergy.visuals as msv
import macrosynergy.learning as msl

from macrosynergy.download import JPMaQSDownload

# machine learning modules
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

from sklearn.metrics import (
    make_scorer,
    balanced_accuracy_score,
    r2_score,
)


import warnings

warnings.simplefilter("ignore")

The JPMaQS indicators we consider are downloaded using the J.P. Morgan DataQuery API interface within the macrosynergy package. This is done by specifying ticker strings, formed by appending an indicator category code <category> to a currency area code <cross_section>. These constitute the main part of a full quantamental indicator expression, taking the form DB(JPMAQS,<cross_section>_<category>,<info>), where <info> denotes the type of information requested for the given cross-section and category. The following types of information are available:

  • value giving the latest available values for the indicator

  • eop_lag referring to days elapsed since the end of the observation period

  • mop_lag referring to the number of days elapsed since the median observation period

  • grade denoting a grade of the observation, giving a metric of real-time information quality.

After instantiating the JPMaQSDownload class within the macrosynergy.download module, one can use the download(tickers, start_date, metrics) method to easily download the necessary data, where tickers is an array of ticker strings, start_date is the first collection date to be considered, and metrics is an array comprising the time series information to be downloaded. For more information, see here or use the free dataset on Kaggle.

To ensure reproducibility, only samples between January 2000 (inclusive) and May 2023 (exclusive) are considered.

cids_dm = ["AUD", "CAD", "CHF", "EUR", "GBP", "JPY", "NOK", "NZD", "SEK", "USD"]
cids_em = ["CLP","COP", "CZK", "HUF", "IDR", "ILS", "INR", "KRW", "MXN", "PLN", "THB", "TRY", "TWD", "ZAR",]
cids = cids_dm + cids_em

cids_dux = list(set(cids) - set(["IDR", "NZD"]))
ecos = [
    "CPIC_SA_P1M1ML12",
    "CPIC_SJA_P3M3ML3AR",
    "CPIC_SJA_P6M6ML6AR",
    "CPIH_SA_P1M1ML12",
    "CPIH_SJA_P3M3ML3AR",
    "CPIH_SJA_P6M6ML6AR",
    "INFTEFF_NSA",
    "INTRGDP_NSA_P1M1ML12_3MMA",
    "INTRGDPv5Y_NSA_P1M1ML12_3MMA",
    "PCREDITGDP_SJA_D1M1ML12",
    "RGDP_SA_P1Q1QL4_20QMA",
    "RYLDIRS02Y_NSA",
    "RYLDIRS05Y_NSA",
    "PCREDITBN_SJA_P1M1ML12",
]
mkts = [
    "DU02YXR_NSA",
    "DU05YXR_NSA",
    "DU02YXR_VT10",
    "DU05YXR_VT10",
    "EQXR_NSA",
    "EQXR_VT10",
    "FXXR_NSA",
    "FXXR_VT10",
    "FXCRR_NSA",
    "FXTARGETED_NSA",
    "FXUNTRADABLE_NSA",
]

xcats = ecos + mkts
# Construct the list of tickers for download

start_date = "2000-01-01"
end_date = "2023-05-01"

tickers = [cid + "_" + xcat for cid in cids for xcat in xcats]
print(f"Maximum number of tickers is {len(tickers)}") 


# Download series from J.P. Morgan DataQuery by tickers

client_id: str = os.getenv("DQ_CLIENT_ID")
client_secret: str = os.getenv("DQ_CLIENT_SECRET")

with JPMaQSDownload(client_id=client_id, client_secret=client_secret) as dq:
    df = dq.download(
        tickers=tickers,
        start_date="2000-01-01",
        suppress_warning=True,
        metrics=["all"],
        show_progress=True,
    )
Maximum number of tickers is 600
Downloading data from JPMaQS.
Timestamp UTC:  2024-04-25 17:40:20
Connection successful!
Requesting data: 100%|███████████████████████████████████████████████████████████████| 120/120 [00:28<00:00,  4.16it/s]
Downloading data: 100%|██████████████████████████████████████████████████████████████| 120/120 [00:39<00:00,  3.06it/s]
Some expressions are missing from the downloaded data. Check logger output for complete list.
84 out of 2400 expressions are missing. To download the catalogue of all available expressions and filter the unavailable expressions, set `get_catalogue=True` in the call to `JPMaQSDownload.download()`.
Some dates are missing from the downloaded data. 
2 out of 6346 dates are missing.

The Macrosynergy package works with data frames of a standard JPMaQS format, i.e., long data frames with at least four columns containing cross-section ( cid ), extended category ( xcat ), real-time dates ( real_date ), and value . Other potentially useful columns contain grades of observations ( grading ), lags to the end of the observation period ( eop_lag ), and lags to the median of the observation period ( mop_lag ).
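As a minimal illustration, a frame in this standard format can be constructed by hand (the values below are made up for illustration, not actual JPMaQS data):

# A toy data frame in the standard JPMaQS long format (illustrative values only)
toy = pd.DataFrame(
    {
        "real_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "cid": ["USD", "EUR"],  # cross-sections
        "xcat": ["CPIH_SA_P1M1ML12", "CPIH_SA_P1M1ML12"],  # extended category
        "value": [2.3, 1.4],  # information states on the real-time date
    }
)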

#  uncomment if running on Kaggle
"""for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
                                                   
df = pd.read_csv('../input/fixed-income-returns-and-macro-trends/JPMaQS_Quantamental_Indicators.csv', index_col=0, parse_dates=['real_date'])""";

The description of each JPMaQS category is available on the Macro Quantamental Academy, on JPMorgan Markets (password protected), or on Kaggle (for the tickers used in this notebook only). In particular, this notebook uses Consumer price inflation trends, Inflation targets, Intuitive growth estimates, Domestic credit ratios, Long-term GDP growth, Real interest rates, Private credit expansion, Duration returns, Equity index future returns, FX forward returns, FX forward carry, and FX tradeability and flexibility.

df['ticker'] = df['cid'] + "_" + df["xcat"]
dfx = df.copy()
dfx.info() 
dfx.head(3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3445054 entries, 0 to 3445053
Data columns (total 8 columns):
 #   Column     Dtype         
---  ------     -----         
 0   real_date  datetime64[ns]
 1   cid        object        
 2   xcat       object        
 3   eop_lag    float64       
 4   grading    float64       
 5   mop_lag    float64       
 6   value      float64       
 7   ticker     object        
dtypes: datetime64[ns](1), float64(4), object(3)
memory usage: 210.3+ MB
   real_date  cid                xcat  eop_lag  grading  mop_lag     value                  ticker
0 2000-01-03  AUD    CPIC_SA_P1M1ML12     95.0      2.0    292.0  1.244168    AUD_CPIC_SA_P1M1ML12
1 2000-01-03  AUD  CPIC_SJA_P3M3ML3AR     95.0      2.0    186.0  3.006383  AUD_CPIC_SJA_P3M3ML3AR
2 2000-01-03  AUD  CPIC_SJA_P6M6ML6AR     95.0      2.0    277.0  1.428580  AUD_CPIC_SJA_P6M6ML6AR

Describing #

View available data history with check_availability #

The convenience function check_availability() visualizes start years and the number of missing values at or before the end date of all selected cross-sections and across a list of categories. It also displays unavailable indicators as gray fields and color-codes the starting year of each series, with darker colors indicating more recent starting years. If we are interested only in availability from a particular date onward, we pass that date as “start”.

msm.check_availability(
    dfx,
    xcats=ecos + ["EQXR_NSA"]+ ["FXXR_NSA"],
    cids=cids_em,
    start="2000-01-01",
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/1f456a66e41e06a8f623e24fefe7794fe265b99e990757cd4096c6282acb2601.png
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/9d815da618304fa1d27b985bd2099d56c65a8872ababdfa7770435a8896be063.png

Detect missing categories or cross-sections with missing_in_df #

The function missing_in_df() is complementary to check_availability and simply displays (1) categories that are missing across all expected cross-sections for a given category name list and (2) cross-sections that are missing within a category.

cats_exp = ["EQCRR_NSA", "FXCRR_NSA", "INTRGDP_NSA_P1M1ML12_3MMA", "RUBBISH"]
msm.missing_in_df(dfx, xcats=cats_exp, cids=cids)
Missing XCATs across DataFrame:  ['RUBBISH', 'EQCRR_NSA']
Missing cids for FXCRR_NSA:                  ['USD']
Missing cids for INTRGDP_NSA_P1M1ML12_3MMA:  []

Visualize panel distributions with view_ranges #

For an overview of long-term series distributions in a panel, the convenience function view_ranges() applies standard Seaborn bar and box plots to the JPMaQS format quickly and conveniently.

For example, choosing kind='bar' displays a barplot that focuses on means and standard deviations of one or more categories across sections for a given sample period. One can define the start and the end date of the time series. The default would be the earliest available date as the start and the latest available as the end date.

xcats_sel = ["CPIC_SJA_P6M6ML6AR", "CPIH_SJA_P6M6ML6AR"]
msp.view_ranges(
    dfx,
    xcats=xcats_sel,
    kind="bar",
    sort_cids_by="mean",  # countries sorted by mean of the first category
    title="Means and standard deviations of inflation trends across all major markets since 2000",
    ylab="% annualized",
    start="2000-01-01",
    end="2020-01-01",
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/0e7222b3fdb7e5b92c1efdebc188ad9f6e02a9e5f00e4adf0e76c64e6dcb166e.png

Choosing kind='box' gives a box plot that visualizes the 25% and 75% quantiles, the median, and outliers beyond a normal range. Chart title, y-axis label, size, and category labels can be customized as shown below:

xcats_sel = ["RYLDIRS02Y_NSA"]
msp.view_ranges(
    dfx,
    xcats=xcats_sel,
    kind="box",
    start="2012-01-01",
    sort_cids_by="std",  # here sorted by standard deviations
    title="Real interest rates sorted by volatility",
    ylab="% monthly return",
    xcat_labels=["Real 2-year IRS rate"],
    size=(12, 5),
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/42126dc1369cbc906ffa6906f739bf506583f524d358541f690f28cc3643bf6c.png

Visualize panel time series with view_timelines #

The convenience function view_timelines() displays a facet grid of timeline charts of one or more categories.

The cs_mean=True option adds a timeline of the cross-sectional average of a single category to each plot in the facet, emphasizing cross-sectional deviations.

msp.view_timelines(
    dfx,
    xcats=["PCREDITBN_SJA_P1M1ML12"],
    cids=cids_dm,
    ncol=4,
    start="1995-01-01",
    title="Private credit growth, %oya",
    same_y=False,
    cs_mean=True,
    xcat_labels=["Credit growth", "Global average"],
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/2cdbee5887899470ad444a724d25769fe8a25647a70861834189b1a156002c95.png

Arguments can be set according to the data’s nature and the plot’s intention. The following are important choices:

  • For asset returns and similarly volatile series, displaying cumulative sums with cumsum=True is often desirable.

  • The default setting same_y=True shows all lines on the same scale for comparability of size.

  • With all_xticks=True , the time (x-) axis is printed under each chart of large facet grids, not just the bottom row.

  • The xcat_labels argument customizes the category labels (the default is just category tickers).

cids_sel = ["AUD", "NZD", "GBP", "MXN", "PLN", "ZAR", "KRW", "INR"]
msp.view_timelines(
    dfx,
    xcats=["FXXR_NSA", "FXXR_VT10"],
    cids=cids_sel,
    ncol=3,
    cumsum=True,
    start="2010-01-01",
    same_y=True,
    all_xticks=True,
    title="Cumulative FX returns",
    xcat_labels=["FX returns", "FX forward return for 10% vol target"],
    
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/d0cd42f809839876284865ea00868e53e8d64432e6b88e884bedf6ac46b1ce1c.png

One can display a single chart with several categories for one cross-section by passing the categories to xcats and specifying the cross-section as a single-element list, such as cids=["USD"].

msp.view_timelines(
    dfx,
    xcats=["CPIC_SJA_P6M6ML6AR", "CPIH_SJA_P6M6ML6AR"],
    cids=["USD"],
    start="2000-01-01",
    title="U.S. CPI inflation trends, %6m/6m, saar",
    xcat_labels=["Core", "Headline"],
)  
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/44db948a10262a64f239d47e0008b2baa90fa6cc5cfab1ca8dda2a92a37097e9.png

Setting single_chart=True allows plotting a single category for various cross-sections in one plot. By default, full tickers are used as labels.

cids_sel = ["AUD", "NZD", "GBP"]

msp.view_timelines(
    dfx,
    xcats=["CPIH_SA_P1M1ML12"],
    cids=cids_sel,
    cumsum=False,
    start="2000-01-01",
    same_y=False,
    title="Annual headline consumer price inflation",
    single_chart=True,
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/dcf267cb1c789d9f55d902250ede80e8a7be613db3499ceeeef189bc4c9ad90a.png

Visualize vintage qualities with heatmap_grades #

The visualization function heatmap_grades() displays a colored table of grading quality of indicators by categories and cross-sections as an average for a given start date. Darker colors represent lower grading.

This function visualizes the grades of the vintages based on which quantamental series have been calculated. JPMaQS uses vintages, i.e., time sequences of time series, to replicate the information of the market in the past. Vintages arise from data revisions, series extensions, and re-estimation of parameters of underlying models. JPMaQS grades vintage time series from 1 (highest quality, either an original record of the time series available on that date or a series that is marginally different from the original for storage reasons or publication conventions) to 3 (a rough estimate of the information status). More details on vintages and grades are here.

xcats_sel = ["INTRGDPv5Y_NSA_P1M1ML12_3MMA", "CPIC_SJA_P6M6ML6AR", "FXXR_NSA"]
msp.heatmap_grades(
    dfx,
    xcats=xcats_sel,
    cids=cids_em + cids_dm,
    start="2000-01-01",
    size=(15, 2),
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/3c6012be640ca68a2772bed9511c484092b935173f1cd183fcd7121eabaf96cc.png

Pre-processing #

Create new category panels with panel_calculator #

The panel_calculator() function in the macrosynergy.panel module simplifies applying transformations to each panel cross-section using a string-based formula. This function is very flexible and saves a lot of code when creating trading signals across multiple countries. To use the function, consider the category ticker as a panel dataframe and use standard Python and pandas expressions.

Panel category names that are not at the beginning or end of the formula must always have a space before and after the name. The calculated category and the panel operation must be separated by ‘=’. Examples:

“NEWCAT = ( OLDCAT1 + 0.5) * OLDCAT2”

“NEWCAT = np.log( OLDCAT1 ) - np.abs( OLDCAT2 ) ** 1/2”

Note that the argument cids contains the cross-sections for which the new categories are to be calculated. If a cross-section is missing for any of the categories used, none of the new categories will be produced. This means that if a specific calculation should be made for a smaller or larger set of cross-sections, one must make a separate call to the function.

Below, we calculate plausible metrics of indicators or signals which can be used for analysis:

  • intuitive growth trend

  • excess inflation versus a country’s effective inflation target

  • excess private credit growth

  • excess real interest rate

  • combination of the newly created indicators

calcs = [
    "XGDP_NEG = - INTRGDPv5Y_NSA_P1M1ML12_3MMA", # intuitive growth trend
    "XCPI_NEG =  - ( CPIC_SJA_P6M6ML6AR + CPIH_SA_P1M1ML12 ) / 2 + INFTEFF_NSA", # excess inflation measure
  #  "XINF = CPIH_SA_P1M1ML12 - INFTEFF_NSA",  # excess inflation
    "XPCG_NEG = - PCREDITBN_SJA_P1M1ML12 + INFTEFF_NSA + RGDP_SA_P1Q1QL4_20QMA", # excess private credit growth
    "XRYLD = RYLDIRS05Y_NSA -  INTRGDP_NSA_P1M1ML12_3MMA",  # excess real interest rate
    "XXRYLD = XRYLD + XCPI_NEG",  # newly created panels can be used subsequently
]


dfa = msp.panel_calculator(dfx, calcs=calcs, cids=cids)
dfx = msm.update_df(dfx, dfa)


xcats_sel = ["XRYLD", "XCPI_NEG"]


msp.view_timelines(
    dfx,
    xcats=xcats_sel,
    cids=cids_dm,
    ncol=3,
    title="Excess real interest rates and (negative) excess inflation",
    start="2000-01-01",
    same_y=False,
   )
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/906390dc6233a1b3fb53d6b85f19a1b78fc88a7849409c0767303f4b97e7bd8b.png

The panel_calculator function is also suitable for computing cross-section-specific relative economic performance by using a loop and f-strings. For example, here we calculate target deviations for a range of CPI inflation metrics for all markets.

infs = ["CPIH_SA_P1M1ML12", "CPIH_SJA_P6M6ML6AR", "CPIH_SJA_P3M3ML3AR"]

for inf in infs:
    calcs = [
        f"{inf}vIET = ( {inf} - INFTEFF_NSA )",
    ]

    dfa = msp.panel_calculator(dfx, calcs=calcs, cids=cids)
    dfx = msm.update_df(dfx, dfa)


xcats_sel = ["CPIH_SA_P1M1ML12vIET", "CPIH_SJA_P3M3ML3ARvIET"]

msp.view_timelines(
    dfx,
    xcats=xcats_sel,
    cids=cids_dm,
    ncol=4,
    cumsum=False,
    start="2000-01-01",
    same_y=False,
    all_xticks=True,
    title="CPI inflation rates, %ar, versus effective inflation target, market information state",
    xcat_labels=["% over a year ago", "% 3m/3m, saar"],
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/25cd46daa748ea77762b0d40e043201ea68cfdb76985fa899ffa7e000c47f763.png

Panel calculations can use individual series by prepending an i to the ticker name. Such a ticker may have a cross-section identifier that is not in the selection defined by cids .

cids_sel = cids_dm[:6]
calcs = ["RYLDvUSD = RYLDIRS05Y_NSA -  iUSD_RYLDIRS05Y_NSA"]

dfa = msp.panel_calculator(dfx, calcs=calcs, cids=cids_sel)
dfx = msm.update_df(dfx, dfa)

msp.view_timelines(
    dfx,
    xcats=["RYLDvUSD"],
    cids=cids_sel,
    ncol=3,
    start="2000-01-01",
    same_y=False,
    title = "Excess 5-year real IRS yields vs USD benchmark"

)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/41b1475bb4c69d6ec18c10d8e515c1e59f04ca108146dbc3f6699cdf8b00f659.png

Exclude series sections with make_blacklist #

The make_blacklist() helper function creates a standardized dictionary of blacklist periods, i.e., periods that affect the validity of an indicator, based on standardized panels of binary categories, where values of 1 indicate a cause for blacklisting.

Put simply, this function allows converting category variables into blacklist dictionaries that can then be passed to other functions. Below, we picked two indicators for FX tradability and flexibility. FXTARGETED_NSA is an exchange rate target dummy, which takes a value of 1 if the exchange rate is targeted through a peg or any regime that significantly reduces exchange rate flexibility and 0 otherwise. FXUNTRADABLE_NSA is also a dummy variable that takes the value 1 if liquidity in the main FX forward market is limited or there is a distortion between tradable offshore and untradable onshore contracts.

Details on both categories are here.

dfb = df[df["xcat"].isin(["FXTARGETED_NSA", "FXUNTRADABLE_NSA"])].loc[
    :, ["cid", "xcat", "real_date", "value"]
]
dfba = (
    dfb.groupby(["cid", "real_date"])
    .aggregate(value=pd.NamedAgg(column="value", aggfunc="max"))
    .reset_index()
)
dfba["xcat"] = "FXBLACK"
fxblack = msp.make_blacklist(dfba, "FXBLACK")
fxblack
{'CHF': (Timestamp('2011-10-03 00:00:00'), Timestamp('2015-01-30 00:00:00')),
 'CZK': (Timestamp('2014-01-01 00:00:00'), Timestamp('2017-07-31 00:00:00')),
 'ILS': (Timestamp('2000-01-03 00:00:00'), Timestamp('2005-12-30 00:00:00')),
 'INR': (Timestamp('2000-01-03 00:00:00'), Timestamp('2004-12-31 00:00:00')),
 'THB': (Timestamp('2007-01-01 00:00:00'), Timestamp('2008-11-28 00:00:00')),
 'TRY_1': (Timestamp('2000-01-03 00:00:00'), Timestamp('2003-09-30 00:00:00')),
 'TRY_2': (Timestamp('2020-01-01 00:00:00'), Timestamp('2024-04-24 00:00:00'))}

Since 2000, roughly a third of the currencies covered by JPMaQS have seen their FX forward market affected either by an official exchange rate target, illiquidity, or convertibility-related distortions. The above output shows periods of disruptions for (primarily) emerging currencies. A notable developed market exception here is CHF, which was pegged between 2011 and early 2015. A standard blacklist dictionary can be passed to several package functions that exclude the blacklisted periods from related analyses. If one wishes to just exclude the blacklisted periods from a dataframe independent of specific applications, one can use the reduce_df() helper function.

dffx = df[df["xcat"] == "FXXR_NSA"]
print("Original shape: ", dffx.shape)
dffxx = msm.reduce_df(dffx, blacklist=fxblack)
print("Reduced shape: ", dffxx.shape)
Original shape:  (145252, 8)
Reduced shape:  (137975, 8)

Concatenate dataframes with update_df #

The update_df function in the management module concatenates two JPMaQS data frames and offers two conveniences.

  • It replaces duplicated tickers in the base data frame with those in the added data frame and re-indexes the output data frame.

  • Additionally, you can replace categories in the base data frame by setting xcat_replace=True . This is useful when re-calculating the data panel of a category without including all cross-sections of the original panel, where keeping two different calculation methods under the same category name would cause confusion (see the sketch after the example below).

dfa = msp.panel_calculator(
    dfx, calcs=["RYLD52 = RYLDIRS05Y_NSA - RYLDIRS02Y_NSA"], cids=cids_dm
)

dfx = msm.update_df(df=dfx, df_add=dfa)  # composite extended data frame
msm.missing_in_df(dfx, xcats=["RYLD52"], cids=cids_dm)  #quick check of missing values. Empty list means no missing values
No missing XCATs across DataFrame.
Missing cids for RYLD52:  []
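A minimal sketch of the xcat_replace=True option, based on the description above (the subset cids_dm[:4] is a hypothetical choice for illustration): a re-calculated panel for a few cross-sections replaces the whole category rather than merging with it.

dfa = msp.panel_calculator(
    dfx, calcs=["RYLD52 = RYLDIRS05Y_NSA - RYLDIRS02Y_NSA"], cids=cids_dm[:4]
)
dfx = msm.update_df(df=dfx, df_add=dfa, xcat_replace=True)  # RYLD52 is replaced outright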

Compute panels versus basket with make_relative_value #

The make_relative_value() function generates a data frame of relative values for a given list of categories. In this case, “relative” means that the original value is compared to a basket average. By default, the basket consists of all available cross-sections, and the relative value is calculated by subtracting the basket average from individual cross-section values.

By default, the function assumes that complete_cross=False , meaning that basket averages do not require the full set of cross-sections to be calculated for a specific date but are always based on the ones available at the time.

cids_sel = cids_dm[:6]
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12"]
dfy = msp.make_relative_value(
    dfx,
    xcats=["PCREDITGDP_SJA_D1M1ML12"],
    cids=cids_sel,
    start="2000-01-01",
    blacklist=fxblack,  # cross-sections can be blacklisted for calculation and basket use
    rel_meth="subtract",
    complete_cross=False,  # cross-sections do not have to be complete for basket calculation
    postfix="_vDM",
)

dfx = msm.update_df(df=dfx, df_add=dfy)  # composite extended data frame

dfj = pd.concat([dfx[dfx["xcat"].isin(xcats_sel)], dfy])

xcats_sel = ["PCREDITGDP_SJA_D1M1ML12", "PCREDITGDP_SJA_D1M1ML12_vDM"]
msp.view_timelines(
    dfj,
    xcats=xcats_sel,
    cids=cids_sel,
    ncol=3,
    start="2000-01-01",
    same_y=True,
    title = "Private credit growth, %oya, versus DM average"
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/533128d5bd68fb35b94fad18048559d737c89a85c2a511d5ead521c1857e15a9.png

By default, the basket comprises all available cross-sections for every period, as defined by the cids argument. However, it is possible to limit the basket to a subset or a single cross-section by using the basket=[...] argument.

In the make_relative_value() function, an important decision is the use of the blacklist argument. This argument takes a dictionary of cross-sections and date ranges that should be excluded, as created by the make_blacklist() function. Excluding invalid or distorted data is crucial when calculating relative values because a single cross-section’s distortion can invalidate all cross-sectional relative values.

cids_sel = list(set(cids_dm) - set(["JPY"]))

xcats_sel = ["CPIC_SA_P1M1ML12"]
dfy = msp.make_relative_value(
    dfx,
    xcats=xcats_sel,
    cids=cids_sel,
    start="2000-01-01",
    blacklist=fxblack,  # remove invalid observations
    basket=["EUR", "USD"],  # basket does not use all cross-sections
    rel_meth="subtract",
    postfix="vG2",
)
dfx = msm.update_df(df=dfx, df_add=dfy)  # composite extended data frame


msp.view_timelines(
    dfx,
    xcats=["CPIC_SA_P1M1ML12", "CPIC_SA_P1M1ML12vG2"],
    cids=cids_sel,
    ncol=3,
    start="2000-01-01",
    same_y=False,
    title="Core CPI inflation, %oya, versus G2 average",
    
 )
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/6fc85037f1d81cd1f7e2ed2e9272501a01af8aa6ef19ed52d0866be20f853a67.png

Normalize panels with make_zn_scores #

The make_zn_scores() function is a method for normalizing values across different categories. This is particularly important when summing or averaging categories with different units and time series properties. The function computes z-scores for a category panel around a specified neutral level that may be different from the mean. The term “zn-score” refers to the normalized distance from the neutral value.

The default mode of the function calculates scores based on sequential estimates of means and standard deviations, using only past information. This is controlled by the sequential=True argument, and the minimum number of observations required for meaningful estimates is set with the min_obs argument. By default, the function calculates zn-scores for the initial sample period defined by min_obs on an in-sample basis to avoid losing history.

The means and standard deviations are re-estimated daily by default, but the frequency of re-estimation can be controlled with the est_freq argument, which can be set to weekly, monthly, or quarterly.
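For intuition, a stripped-down sequential score for a single series might look as follows. This is only a sketch under simplifying assumptions (neutral level of zero, an expanding-window standard deviation as the scale); make_zn_scores() adds re-estimation frequencies, panel weighting, and winsorization on top of this idea.

# Sketch: sequential zn-score around zero for one series (simplified mechanics)
series = (
    dfx.query("cid == 'USD' and xcat == 'XCPI_NEG'")
    .set_index("real_date")["value"]
    .sort_index()
)
neutral = 0.0
scale = (series - neutral).expanding(min_periods=261).std()  # past information only
zn_sketch = (series - neutral) / scale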

msp.view_timelines(
    dfx,
    xcats=["XCPI_NEG"],
    cids=cids,
    ncol=3,
    start="2000-01-01",
    same_y=False,
    title="Core CPI inflation, %oya, versus G2 average",
    
 )
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/cd4ec0140f4f267b442ead61407b9a0dd9a10c05ef28502f0140f95d54516f55.png
macros = ["XGDP_NEG", "XCPI_NEG", "XPCG_NEG", "RYLDIRS05Y_NSA"]
xcatx = macros

for xc in xcatx:
    dfa = msp.make_zn_scores(
        dfx,
        xcat=xc,
        cids=cids,
        neutral="zero",
        thresh=3,
        est_freq="M",
        pan_weight=1,
        postfix="_ZN4",
    )
    dfx = msm.update_df(dfx, dfa)

msp.view_ranges(
    dfx,
    xcats=["XGDP_NEG", "XGDP_NEG_ZN4"],
    kind="bar",
    sort_cids_by="mean",
    start="2000-01-01",
   
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/656fc35e47f3b0a36f5c3ccbfbf5cba742459b189f820086141d02d9abb31aa9.png

Important parameters that shape the nature of the zn-scores are:

  • neutral sets the level dividing positive and negative scores. The choices are 'zero' , 'mean' , or 'median' .

  • pan_weight sets the panel’s importance versus the individual cross-sections for scaling the zn-scores. If the category is assumed to be homogeneous across countries regarding its signal, the weight can be close to 1 (the whole panel’s data are the basis for the scaling parameters). If countries are not comparable regarding category means and/or standard deviations, a panel weight close to zero is preferable (parameters are all specific to the cross-section). The default value is 1.

  • thresh sets the cutoff value (threshold) for winsorization in terms of standard deviations. The minimum value is 1. Setting thresh to values close to 1 caps particularly volatile observations, as illustrated below. For the example above, only TRY would be affected by applying a threshold of 2.
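Winsorization at the threshold caps extreme scores rather than removing them; a minimal illustration of the mechanics (an assumption for clarity, not the package internals):

# Scores beyond the cutoff are set to the cutoff, not dropped
z = np.array([-3.5, -1.2, 0.4, 2.7])
z_winsorized = np.clip(z, -2, 2)  # -> [-2. , -1.2,  0.4,  2. ]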

xcat = "CPIH_SJA_P6M6ML6AR"
cids_sel = ["ZAR", "HUF", "PLN", "AUD", "JPY"]
dict_ps = {  # dictionary of zn-score specs
    1: {
        "neutral": "zero",
        "pan_weight": 1,
        "thresh": None,
        "postfix": "ZNPZ",
        "label": "panel-based scores around zero",
    },
    2: {
        "neutral": "mean",
        "pan_weight": 1,
        "thresh": None,
        "postfix": "ZNPM",
        "label": "panel-based scores around mean",
    },
    3: {
        "neutral": "mean",
        "pan_weight": 0,
        "thresh": None,
        "postfix": "ZNCM",
        "label": "country-based scores around mean",
    },
    4: {
        "neutral": "zero",
        "pan_weight": 1,
        "thresh": 1.5,
        "postfix": "ZNPW",
        "label": "panel-based winsorized scores around zero",
    },
}

dfy = pd.DataFrame(columns=df.columns)

for dvs in dict_ps.values():
    dfa = msp.make_zn_scores(
        dfx,
        xcat=xcat,
        cids=cids_sel,
        sequential=True,
        neutral=dvs["neutral"],
        pan_weight=dvs["pan_weight"],
        thresh=dvs["thresh"],
        postfix=dvs["postfix"],
        est_freq="m",
    )
    dfy = msm.update_df(dfy, dfa)


compares = [(1, 2), (2, 3), (1, 4)]

for comps in compares:
    print(comps)
    dv1 = dict_ps[comps[0]]
    dv2 = dict_ps[comps[1]]

    msp.view_ranges(
        dfy,
        xcats=[f"{xcat}{dv1['postfix']}", f"{xcat}{dv2['postfix']}"],
        kind="box",
        sort_cids_by="mean",
        start="2000-01-01",
        size=(12, 4),
        title=f"{xcat}: {dv1['label']} vs. {dv2['label']}",
    )
(1, 2)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/865bb9b0a8a79d790642a5f2cd9cb453067b325cab8506db1c877ea08a8dd976.png
(2, 3)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/f6f2f3cf5d24d12d5516f15d88577d64ee3147534b82c00f9f4ff90becd0439a.png
(1, 4)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/251a96d8fa6d9277c867d00522374e37a8aa5531e58b0ecab342e7d7869854a9.png

Estimate asset return elasticities with return_beta #

The function return_beta() estimates betas (elasticities) of a return category with respect to a benchmark return. It returns either just the betas or the hedged returns of the cross-sections. Hedged returns are returns on a composite position in the principal contract and the benchmark that offsets the elasticity of the former with respect to the latter. At present, the only method used to calculate the beta is a simple OLS regression. If oos is set to True (default), the function calculates betas out of sample, i.e., for each period based on estimates up to the previous period. The related re-estimation frequency is set with refreq (default monthly). The re-estimation is conducted at the end of the period and used as a hedge ratio for all days in the following period. The argument min_obs sets the minimum number of observations after which the hedge ratio is initially calculated; if the betas are estimated out of sample, calculations are only done for periods after this minimum is available.
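Conceptually, the beta is the slope of an OLS regression of the contract return on the benchmark return, and the hedged return subtracts the beta-scaled benchmark return. A minimal sketch of these mechanics (an illustration, not the package’s internal code):

def ols_beta(r: np.ndarray, b: np.ndarray) -> float:
    # slope coefficient of an OLS regression of r on b
    return np.cov(r, b)[0, 1] / np.var(b, ddof=1)

# hedged return: contract return offset by a beta-sized benchmark position
# r_hedged = r - ols_beta(r, b) * b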

dfx = dfx[["cid", "xcat", "real_date", "value"]]
cids_sel = ["AUD", "CAD", "CHF", "EUR", "GBP", "JPY", "NOK", "NZD", "SEK"]
dfh = msp.return_beta(
    dfx,
    xcat="FXXR_NSA",
    cids=cids_sel,
    benchmark_return="USD_EQXR_NSA",
    oos=False,
    hedged_returns=True,
    start="2002-01-01",
    refreq="m",
)
dfh

dfx = msm.update_df(df=dfx, df_add=dfh)
#dfx["xcat"].unique()

The auxiliary function beta_display() visualizes the betas estimated by the return_beta() function.

sns.set(rc={"figure.figsize": (12, 4)})
msp.beta_display(dfh)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/14154df0c89be3e71ca3637fdeb531d5f669500151c0fd5041ea4f4566b58cf1.png

Hedged vs. unhedged returns can be displayed using the view_timelines() function:

xcats_sel = ["FXXR_NSA", "FXXR_NSA_H"]

msp.view_timelines(
    dfx,
    xcats=xcats_sel,
    cids=cids_sel,
    ncol=3,
    cumsum=True,
    start="2000-01-01",
    same_y=False,
    all_xticks=False,
    title="Unhedged vs hedged cumulative FX forward returns, % of notional: dominant cross",
    xcat_labels=["Unedged", "Hedged"],
    height=3,
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/ee455d06cf60724ad818273f78a8a99a41b769f2e2ff78a6ef475c2df6d40393.png

Generate returns (and carry) of a group of contracts with Basket #

The Basket class supports the calculation of returns and carry of groups of financial contracts using various weighting methods. The main argument is a list of contracts. It is very important to specify any invalid blacklisted periods associated with any of the cross-sections, as these invalid numbers would contaminate the whole basket.

In the below example, we instantiate a Basket object for a group of FX forward returns.

cids_fxcry = ["AUD", "INR", "NZD", "PLN", "TWD", "ZAR"]

ctrs_fxcry = [cid + "_FX" for cid in cids_fxcry]
basket_fxcry = msp.Basket(
    dfx,
    contracts=ctrs_fxcry,
    ret="XR_NSA",
    cry="CRR_NSA",
    start="2010-01-01",
    end="2023-01-01",
    blacklist=fxblack,
)
for attribute, value in basket_fxcry.__dict__.items():
    if not isinstance(
        value, (pd.DataFrame, dict)
    ):  # print all non-df and non-dictionary attributes
        print(attribute, " = ", value)
contracts  =  ['AUD_FX', 'INR_FX', 'NZD_FX', 'PLN_FX', 'TWD_FX', 'ZAR_FX']
ret  =  XR_NSA
ticks_ret  =  ['AUD_FXXR_NSA', 'INR_FXXR_NSA', 'NZD_FXXR_NSA', 'PLN_FXXR_NSA', 'TWD_FXXR_NSA', 'ZAR_FXXR_NSA']
cry_flag  =  True
ticks_cry  =  ['AUD_FXCRR_NSA', 'INR_FXCRR_NSA', 'NZD_FXCRR_NSA', 'PLN_FXCRR_NSA', 'TWD_FXCRR_NSA', 'ZAR_FXCRR_NSA']
cry  =  ['CRR_NSA']
wgt_flag  =  False
ticks_wgt  =  []
dfws_wgt  =  None
tickers  =  ['AUD_FXXR_NSA', 'INR_FXXR_NSA', 'NZD_FXXR_NSA', 'PLN_FXXR_NSA', 'TWD_FXXR_NSA', 'ZAR_FXXR_NSA', 'AUD_FXCRR_NSA', 'INR_FXCRR_NSA', 'NZD_FXCRR_NSA', 'PLN_FXCRR_NSA', 'TWD_FXCRR_NSA', 'ZAR_FXCRR_NSA']
start  =  2010-01-01
end  =  2023-01-01

The make_basket method calculates and stores all performance metrics, i.e., returns and carry, for a specific weighting method of the basket. The different weighting options provide flexibility in constructing a composite measure that meets specific needs or objectives:

  • equal : all contracts with non-NA returns have the same weight (default).

  • fixed : weights are proportionate to a single list of values provided. This allows for more customization in the weighting of each contract based on specific preferences or criteria.

  • invsd : weights are inversely proportionate to the standard deviations of recent returns, giving more weight to contracts with more stable returns over time (see the sketch after this list). The lookback period is 21 observations by default but can be changed with lback_periods . The default lookback method is an exponential moving average; it can be changed to a simple moving average by setting lback_meth to “ma”.

  • values : weights are proportionate to a panel of values of an exogenous weight category. This allows for weighting based on external factors that may be relevant to the specific contracts in the basket.

  • inv_values : weights are inversely proportionate to the values of an exogenous weight category. This can be useful for giving less weight to contracts with high values of the external factor, which may indicate greater risk or volatility.
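A minimal sketch of the inverse-volatility weighting logic used by invsd (an assumption about the mechanics, not the package code):

sds = np.array([0.8, 1.2, 2.0])  # standard deviations of recent returns
raw = 1 / sds  # inverse volatility
weights = raw / raw.sum()  # normalized: calmer contracts receive more weight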

basket_fxcry.make_basket(weight_meth="equal", basket_name="GLB_FXCRY")
basket_fxcry.make_basket(weight_meth="invsd", basket_name="GLB_FXCRYVW")
basket_fxcry.make_basket(
    weight_meth="fixed",
    weights=[1 / 3, 1 / 6, 1 / 12, 1 / 6, 1 / 3, 1 / 12],
    basket_name="GLB_FXCRYFW",
)

The return_basket method returns basket performance data in a standardized format. The basket names for which the performance data are calculated can be limited by using the basket_names argument.

dfb = basket_fxcry.return_basket()
print(dfb.tail())
utiks = list((dfb["cid"] + "_" + dfb["xcat"]).unique())
f"Unique basket tickers: {utiks}"
       cid            xcat  real_date     value
20341  GLB  FXCRYFW_XR_NSA 2022-12-26 -0.002999
20342  GLB  FXCRYFW_XR_NSA 2022-12-27 -0.109833
20343  GLB  FXCRYFW_XR_NSA 2022-12-28  0.193735
20344  GLB  FXCRYFW_XR_NSA 2022-12-29  0.174977
20345  GLB  FXCRYFW_XR_NSA 2022-12-30  0.017897
"Unique basket tickers: ['GLB_FXCRY_CRR_NSA', 'GLB_FXCRY_XR_NSA', 'GLB_FXCRYVW_CRR_NSA', 'GLB_FXCRYVW_XR_NSA', 'GLB_FXCRYFW_CRR_NSA', 'GLB_FXCRYFW_XR_NSA']"

The return_weights method returns the effective weights used in a basket for all contracts. This can be useful if the same weights are to be used for a basket of predictive features.

dfb = basket_fxcry.return_weights()
print(dfb.head())
print(dfb["cid"].unique())
print(dfb["xcat"].unique())
dfbw = dfb.pivot_table(
    index="real_date", columns=["xcat", "cid"], values="value"
).replace(0, np.nan)
dfbw.tail(3).round(2)
   cid              xcat  real_date     value
0  AUD  FX_GLB_FXCRY_WGT 2010-01-01  0.166667
1  AUD  FX_GLB_FXCRY_WGT 2010-01-04  0.166667
2  AUD  FX_GLB_FXCRY_WGT 2010-01-05  0.166667
3  AUD  FX_GLB_FXCRY_WGT 2010-01-06  0.166667
4  AUD  FX_GLB_FXCRY_WGT 2010-01-07  0.166667
['AUD' 'INR' 'NZD' 'PLN' 'TWD' 'ZAR']
['FX_GLB_FXCRY_WGT' 'FX_GLB_FXCRYVW_WGT' 'FX_GLB_FXCRYFW_WGT']
xcat        FX_GLB_FXCRYFW_WGT                FX_GLB_FXCRYVW_WGT                FX_GLB_FXCRY_WGT
cid          AUD  INR  NZD  PLN  TWD  ZAR      AUD  INR  NZD  PLN  TWD  ZAR      AUD  INR  NZD  PLN  TWD  ZAR
real_date
2022-12-28  0.29 0.14 0.07 0.14 0.29 0.07     0.08 0.26 0.09 0.27 0.22 0.07     0.17 0.17 0.17 0.17 0.17 0.17
2022-12-29  0.29 0.14 0.07 0.14 0.29 0.07     0.09 0.27 0.09 0.26 0.23 0.07     0.17 0.17 0.17 0.17 0.17 0.17
2022-12-30  0.29 0.14 0.07 0.14 0.29 0.07     0.09 0.27 0.09 0.25 0.23 0.07     0.17 0.17 0.17 0.17 0.17 0.17

The weights used in a basket for the contracts can be plotted using the weight_visualiser method:

basket_fxcry.weight_visualiser(basket_name="GLB_FXCRYVW", facet_grid=True)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/cfe51fba6065ff135a15c75cbd58b154a48f28d355d69b181c042cc76898ce88.png
basket_fxcry.weight_visualiser(basket_name="GLB_FXCRYVW", subplots=False, size=(10, 4))
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/42d0871e1333eb20e09a760cd9e4f5cd82838d62b491aa47a6ccdb4a4931786f.png
basket_fxcry.weight_visualiser(
    basket_name="GLB_FXCRYVW", subplots=True, 
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/4c89de2bf8f187686f4e9ada2d77c7e1ff1a68540664d8abf20166e5b9615205.png

Calculate linear combinations of panels with linear_composite #

The linear_composite() function is designed to calculate linear combinations of different categories. It can produce a composite even if some of the component data are missing. This flexibility is valuable because it enables one to work with the available information rather than discarding it entirely. This behavior is desirable when working with a composite of a set of categories that capture a similar underlying factor.

In the three examples below, the linear_composite() function is used to calculate the average of two inflation trend metrics by cross-section, the average of two cross-sections for one category (inflation trend) using fixed weights, and the same average using another category as weights. If one of the two constituents or cross-sections is missing, the composite equals the remaining one.
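The handling of missing components can be sketched as a weighted average whose weights are renormalized over the constituents available on each date (a simplified assumption about the mechanics, with hypothetical components "a" and "b"):

vals = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 5.0]})  # two components
w = pd.Series({"a": 1.0, "b": 1.0})  # nominal weights
eff_w = vals.notna().mul(w, axis=1)  # a missing component gets zero weight
composite = (vals.fillna(0) * eff_w).sum(axis=1) / eff_w.sum(axis=1)
# date 0: (1 + 3) / 2 = 2.0; date 1: only 'b' is available -> 5.0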

# Calculation of the simple average of two inflation trend metrics by cross-section

weights = [1, 1]
signs = [1, 1]
cids_sel = ["EUR", "USD", "INR", "ZAR"]
xcats_sel = ["CPIC_SJA_P6M6ML6AR", "CPIH_SJA_P6M6ML6AR"]

dflc = msp.linear_composite(
    df=dfx,
    xcats=xcats_sel,
    cids=cids_sel,
    weights=weights,
    signs=signs,
    complete_xcats=False,
    new_xcat="Composite",
)

dfx = msm.update_df(dfx, dflc)

msp.view_timelines(
    dfx,
    xcats=xcats_sel + ["Composite"],
    cids=cids_sel,
    ncol=2,
    start="1995-01-01",
    same_y=False,
    title="Core and headline inflation trends",
    
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/a480cacc0f6bc70fee59350e213ea8db5752ce3d31dc00397668bc8de0bcffe3.png
# Create a composite cross-section over one category. Difference between EUR and USD CPI trends

weights = [1,1]
signs = [1, -1] # setting weights to [1,1] and signs = [1, -1] we effectively subtract the "USD" time series from "EUR"
cids_sel = ["EUR", "USD"]

dflc = msp.linear_composite(
    df=dfx,
    start = "2016-01-01",
    xcats="CPIC_SJA_P6M6ML6AR",
    cids=cids_sel,
    weights=weights,
    signs=signs,
    complete_cids=False,
    new_cid="EUR-USD",
)

dfx = msm.update_df(dfx, dflc)

msp.view_timelines(
    dfx,
    xcats="CPIC_SJA_P6M6ML6AR",
    cids=cids_sel+["EUR-USD"],
    start = "2016-01-01",
    same_y=False,
    title = "Seasonally and jump-adjusted core consumer price trends, % 6m/6m ar for major markets",
    
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/962ad8d06964b20cda71ca9fd6e2b07770c3941c2a8a8f7700341fce6b3764ba.png

Another example of the linear_composite() function is the creation of a composite category using another category as weights. The example below uses 5-year real GDP growth as a weight for the EUR-USD inflation trends. The 5-year real GDP growth is taken purely as an example; it would make more sense to take GDP shares as weights, as done in the notebook Business sentiment and commodity future returns. However, the GDP share series is only available in the full JPMaQS data and not in the Kaggle version, so it is not used in this notebook.

weights = "RGDP_SA_P1Q1QL4_20QMA"
cids_sel = ["EUR", "USD"]
xcat="CPIC_SJA_P6M6ML6AR"
signs = [1, -1]

dflc = msp.linear_composite(
    df=dfx,
    start = "2016-01-01",
    xcats=xcat,
    cids=cids_sel,
    weights=weights,
    signs=signs,
    complete_cids=False,
    new_cid="EUR-USD, weighted by 5-year real GDP growth (moving average)",
)

dfx = msm.update_df(dfx, dflc)

msp.view_timelines(
    dfx,
    xcats="CPIC_SJA_P6M6ML6AR",
    cids=cids_sel+["EUR-USD, weighted by 5-year real GDP growth (moving average)"],
    ncol=4,
    start = "2016-01-01",
    same_y=False,
    title = "Seasonally and jump-adjusted core consumer price trends, % 6m/6m ar for major markets",
    
    )
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/da14ff17b5d76cd5ed423538654b45b593057be9f67a502c1e8e917f55e5e604.png

This notebook, as well as several other notebooks on the Macrosynergy Academy site, uses a linear composite macro trend pressure indicator. The idea is simple: we add up the negatives of excess growth, excess inflation, and excess credit expansion, together with the real yield, using the most common metrics. This gives a simple first-shot candidate for a trading signal. To start with, this composite indicator is not optimized. Later on, we use the machine learning module of the package to optimize it.

macros = ["XGDP_NEG", "XCPI_NEG", "XPCG_NEG", "RYLDIRS05Y_NSA"]
xcatx = macros

dfa = msp.linear_composite(
    dfx,
    xcats=[xc + "_ZN4" for xc in xcatx],
    cids=cids,
    new_xcat="MACRO_AVGZ",
)

dfx = msm.update_df(dfx, dfa)

Relating #

Investigate relations between panels with CategoryRelations #

CategoryRelations is a tool for quick visualization and analysis of the relationship between two categories, i.e., two time-series panels. To use this tool, the user needs to set up certain arguments upfront that determine the period and type of aggregation for which the relation is analyzed. Here are some of the key arguments:

  • The two-element list xcats sets the categories to be related. For a predictive relationship, the first is considered the feature category and the second the target.

  • The argument freq determines the base period of the analysis, typically set as monthly or quarterly. Since JPMaQS data frames are daily, this requires aggregation of both categories. These are set with xcat_aggs and can use any of pandas’ aggregation methods, such as sum or last . The default is mean .

  • The argument lag sets the lag (delay of arrival) of the first (feature) category in base periods. A positive value means that the feature is related to subsequent targets and - thus - allows analyzing its predictive power.

  • The feature category can be modified by differencing or calculating percentage changes with the xcat1_chg argument and the auxiliary n_periods argument.

  • A useful argument is xcat_trims , which removes observations above a maximum for the first and the second category in case the dataset contains invalid outliers. It trims the dataset and does not winsorize. Large values are interpreted as invalid and removed, not set to a limit.

  • fwin can be used to transform the target category into forward-moving averages of the base period. This is useful for smoothing out volatility, but should not be used for formal inference.

  • blacklist excludes invalid periods from the analysis.

Based on the above explanation, the following instantiation prepares the analysis of the predictive power of a quarterly change in an inflation metric for subsequent quarterly 5-year IRS returns, while excluding quarterly values above 10:

cr = msp.CategoryRelations(
    dfx,
    xcats=["CPIC_SJA_P6M6ML6AR", "DU05YXR_VT10"],
    cids=cids_dm,
    xcat1_chg="diff",
    n_periods=1,
    freq="Q",
    lag=1,
    fwin=1,  # default forward window is one
    xcat_aggs=[
        "last",
        "sum",
    ],  # the first method refers to the first item in xcats list, the second - to the second
    start="2000-01-01",
    xcat_trims=[10, 10],
)

for attribute, value in cr.__dict__.items():
    print(attribute, " = ", value)
xcats  =  ['CPIC_SJA_P6M6ML6AR', 'DU05YXR_VT10']
cids  =  ['AUD', 'CAD', 'CHF', 'EUR', 'GBP', 'JPY', 'NOK', 'NZD', 'SEK', 'USD']
val  =  value
freq  =  Q
lag  =  1
years  =  None
aggs  =  ['last', 'sum']
xcat1_chg  =  diff
n_periods  =  1
xcat_trims  =  [10, 10]
slip  =  0
df  =                  CPIC_SJA_P6M6ML6AR  DU05YXR_VT10
real_date  cid                                  
2005-12-30 JPY           -0.278129     -1.258532
2006-06-30 JPY            0.465870     -0.099437
2006-09-29 JPY            0.364873      8.571129
2006-12-29 JPY           -0.166131     -1.365543
2007-03-30 JPY           -0.165659      1.497307
...                            ...           ...
2023-06-30 AUD            0.818365     -6.961307
2023-09-29 AUD            0.146163     -2.444644
2023-12-29 AUD           -0.793494      3.937294
2024-03-29 AUD           -1.469322     -0.685874
2024-06-28 AUD           -0.707961     -7.220023

[771 rows x 2 columns]

The .reg_scatter() method is convenient for visualizing the relationship between two categories, including the strength of the linear association and any potential outliers. By default, it includes a regression line with a 95% confidence interval, which can help assess the significance of the relationship.

The reg_scatter() method allows splitting the analysis by cross-section ( cid ) or year , which is useful for examining how the relationship between the two categories varies across different markets or over time. This can be especially interesting in cases where the relationship between the two categories is not constant over time or across different markets.

The multiple_reg_scatter() method allows side-by-side comparison of several feature-target relationships, including the strength of the linear association and any potential outliers. By default, each panel includes a regression line with a 95% confidence interval, which can help assess the significance of the relationship.

The coef_box parameter of the reg_scatter() method provides details about the relationship, such as correlation coefficient and probability of significance, which can help users assess the strength and statistical significance of the relationship.

The prob_est argument in this context is used to specify which type of estimator to use for calculating the probability of a significant relationship between the feature category and the target category.

The default value for prob_est is "pool" , which means that all cross-sections are pooled together, and the probability is based on that pool. This approach can potentially lead to issues with “pseudo-replication” if there is a correlation between the analyzed markets.

An alternative option for prob_est is "map" , which stands for “Macrosynergy panel test”. Often, cross-sectional experiences are not independent and subject to common factors. Simply stacking data can lead to “pseudo-replication” and overestimated significance of correlation. A better method is to check significance through panel regression models with period-specific random effects. This technique adjusts targets and features of the predictive regression for common (global) influences. The stronger these global effects, the greater the weight of deviations from the period-mean in the regression. In the presence of dominant global effects, the test for the significance of a feature would rely mainly upon its ability to explain cross-sectional target differences. Conveniently, the method automatically accounts for the similarity of experiences across sections when assessing the significance and, hence, can be applied to a wide variety of features and targets. View a related research post here that provides more information on this approach.
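A minimal sketch of the idea behind the panel test, assuming a statsmodels mixed-effects model with random intercepts grouped by period (an illustration of the concept with a hypothetical helper, not the package’s implementation):

import statsmodels.api as sm

def panel_significance_sketch(dfw: pd.DataFrame) -> float:
    # dfw: rows indexed by (real_date, cid) with 'feature' and 'target' columns
    dfw = dfw.dropna()
    exog = sm.add_constant(dfw["feature"])
    # period-specific random effects absorb common (global) influences
    mod = sm.MixedLM(
        dfw["target"], exog, groups=dfw.index.get_level_values("real_date")
    )
    res = mod.fit(reml=False)
    return float(res.pvalues["feature"])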

["XGDP_NEG", "XCPI_NEG", "XPCG_NEG", "RYLDIRS05Y_NSA"]


feature_zns = ["MACRO_AVGZ", "XCPI_NEG_ZN4", "XGDP_NEG_ZN4", "XPCG_NEG_ZN4"]

crs = [
    msp.CategoryRelations(
        dfx,
        xcats=[feat, "DU05YXR_VT10"],
        cids=cids_dm,
        n_periods=1,
        freq="Q",
        lag=1,  # delay of arrival of the feature category in periods as set by freq
        xcat_aggs=["last", "sum"],
        start="2000-01-01",
    )
    for feat in feature_zns
]

msv.multiple_reg_scatter(
    crs,
    title="z-scored macroeconomic trends and subsequent quarterly IRS returns",
    ylab="Next quarter's 5-year IRS return",
    ncol=2,
    nrow=2,
    figsize=(15, 10),
    prob_est="map",
    coef_box="lower left",
    subplot_titles=[
        "z-scored linear composite macro pressure indicator",
        "z-scored negative excess inflation trend",
        "z-scored negative excess GDP trend",
        "z-scored negative excess private credit growth trend",
    ],
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/99953d106566529abd2d75fc867b0e7d17ef8836a9f64a29714083b55db12d0a.png

The years parameter specifies the number of years over which to aggregate the data for the scatterplot, and when combined with labels=True , it can be used to visualize medium-term concurrent relations. This parameter overrides the freq parameter and does not allow lags, meaning that only the aggregated feature (here over 3-year periods) is compared to the equally aggregated target. This can be useful for identifying medium-term concurrent relations.

The separator argument in the .reg_scatter() method supports visualization of the stability of the feature-target relation for different sub-periods and cross-sections. When the separator is set to a year integer, it splits the data into two sub-samples, with the second one starting from the separation year. As a result, regression lines and scatter plots are shown separately for each sub-sample, allowing us to visually assess the stability of the feature-target relation before and after the separation year.

cids_sel = cids_dm[:5]
cr = msp.CategoryRelations(
    df,
    xcats=["FXCRR_NSA", "FXXR_NSA"],
    cids=cids_sel,
    freq="M",
    years=3,
    lag=0,
    xcat_aggs=["mean", "sum"],
    start="2005-01-01",
    blacklist=fxblack,
)
cr.reg_scatter(
    title="Real FX carry and returns (3-year periods)",
    labels=True,
    prob_est="map",
    xlab="Real carry, % ar",
    ylab="Returns, % cumulative",
    coef_box="upper left",
    size=(12, 6),
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/7fb7a6400d7c87d070ef84827d0ea0c237b7a0c2e0e9951621e9a6435b52b3b4.png
cr = msp.CategoryRelations(
    dfx,
    xcats=["FXCRR_NSA", "FXXR_NSA"],
    cids=list(set(cids_em) - set(["ILS", "CLP"])),
    freq="Q",
    years=None,
    lag=1,
    xcat_aggs=["last", "sum"],
    start="2000-01-01",
    blacklist=fxblack,
    xcat_trims=[40, 20],
)
cr.reg_scatter(
    title="Real FX carry and returns (excluding extreme periods)",
    reg_order=1,
    labels=False,
    xlab="Real carry, % ar",
    ylab="Next month's return",
    coef_box="lower right",
    prob_est="map",
    separator=2010,
    size=(10, 6),
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/7f0384115c6bda491fa1f49c2be8e3edc6a9923d97d776cc0c8c294e96f1c15c.png

If the separator argument is set to “cids”, the relationship is shown separately for each cross-section of the panel. This allows one to examine whether the relationship is consistent across markets.

cr.reg_scatter(
    title="Real FX carry and returns (excluding extreme periods)",
    reg_order=1,
    labels=False,
    xlab="Real carry, % ar",
    ylab="Next month's return",
    separator="cids",
    title_adj=1.01,
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/458b9708205f8692d765e417a521f0aa151a708783be2a7014f435b16ff52840.png

The basic statistics of a standard pooled linear regression analysis, combining all features and targets of the panel without further structure or effects, can be displayed by calling the .ols_table() method, which is based on a statsmodels function. For a detailed interpretation of the .ols_table() output, please view this article, which provides a general overview of interpreting linear regression results using the statsmodels summary table.

cr = msp.CategoryRelations(
    dfx,
    xcats=["CPIC_SJA_P6M6ML6AR", "DU05YXR_VT10"],
    cids=cids_dm,
    xcat1_chg="diff",
    n_periods=1,
    freq="M",
    lag=1,
    xcat_aggs=["last", "sum"],
    start="2000-01-01",
)
cr.ols_table()
                            OLS Regression Results                            
==============================================================================
Dep. Variable:           DU05YXR_VT10   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     1.032
Date:                Thu, 25 Apr 2024   Prob (F-statistic):              0.310
Time:                        18:56:30   Log-Likelihood:                -7516.3
No. Observations:                2757   AIC:                         1.504e+04
Df Residuals:                    2755   BIC:                         1.505e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                  0.2327      0.070      3.303      0.001       0.095       0.371
CPIC_SJA_P6M6ML6AR    -0.3130      0.308     -1.016      0.310      -0.917       0.291
==============================================================================
Omnibus:                      182.732   Durbin-Watson:                   1.776
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              676.185
Skew:                          -0.234   Prob(JB):                    1.47e-147
Kurtosis:                       5.380   Cond. No.                         4.38
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Visualize relations across sections or categories with correl_matrix #

The correl_matrix() function visualizes two types of Pearson correlations:

  • correlations within a single category across different cross-sections, or

  • correlations across different categories.

A key argument is freq , which downsamples the standard JPMaQS business-daily frequency to weekly (‘W’), monthly (‘M’), or quarterly (‘Q’), aggregating by mean.

Additionally, the user can set the cluster argument to True to order the correlated series by proximity based on hierarchical clustering. This can help visualize groups of related variables, making it easier to identify patterns and relationships within the correlation matrix.

cids = cids_dm + cids_em
msp.correl_matrix(
    dfx, xcats="FXXR_NSA", freq="Q", cids=cids, size=(15, 10), cluster=True
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/076f2a51aa4858379bf226a4d269815e0dc51e8048e7237669df67c5ffa788c4.png

One can pass a list of categories to the xcats argument of the correl_matrix() function to display correlations across categories. The resulting output will be a matrix of correlation coefficients between the categories. The freq and cluster arguments can also be used in this case to downsample the frequency of the data and to cluster the categories based on their proximity, respectively.

xcats_sel = ecos
msp.correl_matrix(
    dfx,
    xcats=xcats_sel,
    cids=cids,
    freq="M",
    start="2003-01-01",
    size=(15, 10),
    cluster=True,
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/48d5bccf9d2a36f27d04b489c8c683f1860c24355d54a4147dd0cd1f45455596.png

The msp.correl_matrix() function is designed to compute correlations between any two categories. In the context outlined below, we specifically focus on correlations between inflation trends and subsequent asset returns. To deepen the analysis, we explore various lags for the explanatory variables, specified through a dictionary of desired lags per category. In this instance, we investigate lags of 1 and 3 months at monthly frequency.

macroz = [m + "_ZN4" for m in macros]
feats = macroz

rets=["DU02YXR_VT10",
    "DU02YXR_NSA",
    "DU05YXR_VT10",
    "DU05YXR_NSA",
     "EQXR_NSA",
    "EQXR_VT10",
    "FXXR_NSA",
    "FXXR_VT10"]

lag_dict = {"XGDP_NEG_ZN4": [1, 3], 
    "XCPI_NEG_ZN4": [1, 3], # excess inflation
    "XPCG_NEG_ZN4": [1, 3],  # excess real interest rate
    "RYLDIRS05Y_NSA_ZN4": [1, 3]}


msp.correl_matrix(
    dfx,
    xcats=feats,
    xcats_secondary=rets, 
    cids="EUR",
    freq="M",
    start="2003-01-01",
    lags=lag_dict,
    max_color=0.4,
    cluster=True, 
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/1f410fca2c0d050c208c1c395dcf40a99d478af51a0431b962d3ecc8e934b826.png

Learning #

The macrosynergy.learning subpackage contains functions and classes to assist the creation of statistical learning solutions with macro quantamental data. The functionality is built around integrating the macrosynergy package and associated JPMaQS data with the popular scikit-learn library, which provides a simple interface for fitting common statistical learning models, as well as feature selection methods, cross-validation classes, and performance metrics.

Most standard scikit-learn classes and functions do not respect the panel format of quantamental dataframes, i.e., the double-indexing by both cross-section and time period. The customized wrappers below make standard statistical learning functionality applicable to the panel format.

Please also see the introductory notebooks in which macrosynergy.learning is extensively employed.

The initial step in transforming JPMaQS data into a format suitable for machine learning involves the categories_df() function. This function converts daily data into a monthly format and introduces a lag in the feature variables, a common practice in machine learning to facilitate predictive analysis.

The next step involves constructing the feature and target dataframes:

  • Features ( X ): To form the monthly feature set, we select the last recorded value of each daily series for each month. This end-of-period snapshot, which is crucial in financial analysis, provides a clear representation of each month’s final state. The potential daily features are the z-scores collected earlier in the list macroz for the following variables:

    • “XGDP_NEG” - negative of the intuitive GDP growth trend,

    • “XCPI_NEG” - negative of the excess inflation measure,

    • “XPCG_NEG” - negative of excess private credit growth,

    • “RYLDIRS05Y_NSA” - expectations-based real IRS yield, 5-year maturity.

  • Targets ( y ): The target variable is created by aggregating daily returns over each month to derive a total monthly return. This offers a direct target for predictive models by emphasizing the cumulative outcome for the month rather than daily fluctuations. In this notebook, the target is the monthly return on a fixed receiver position in a 5-year IRS, scaled to a 10% annualized volatility target: DU05YXR_VT10 .

# Specify features and target category
xcatx = macroz + ["DU05YXR_VT10"]

# Downsample from daily to monthly frequency (features as last and target as sum)
dfw = msm.categories_df(
    df=dfx,
    xcats=xcatx,
    cids=cids_dux,
    freq="M",
    lag=1,
    blacklist=fxblack,
    xcat_aggs=["last", "sum"],
)

# Drop rows with missing values and assign features and target
dfw.dropna(inplace=True)
X = dfw.iloc[:, :-1]
y = dfw.iloc[:, -1]

Cross-validation methods #

Cross-validation refers to the evaluation of a model’s predictive accuracy through multiple divisions of the data into training and validation sets. Each division is known as a “fold.” The macrosynergy package supports the splitting of panel data into folds through three classes:

  • ExpandingIncrementPanelSplit() creates training panel splits that expand over time at fixed intervals, followed by test sets of predetermined time spans. This method allows for a progressive inclusion of more data into the training set over time.

  • ExpandingKFoldPanelSplit() also creates expanding folds and involves a fixed number of splits. Training panels in this configuration are always temporally adjacent and chronologically precede the test set, ensuring that each test phase is preceded by a comprehensive training phase.

  • RollingKFoldPanelSplit() arranges splits where training panels of a fixed maximum duration can directly precede or follow the test set, allowing the use of both past and future data in training. While this arrangement does not mimic the sequential flow of information typical in time series analysis, it effectively leverages the cyclic nature of economic data.
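
All three splitter classes follow scikit-learn’s cross-validation splitter interface, so they can, in principle, be passed directly as the cv argument of standard scikit-learn model selection utilities. A minimal sketch, assuming the multi-indexed feature matrix X and target y constructed above:

import macrosynergy.learning as msl

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Use a panel splitter as the `cv` argument of a standard scikit-learn utility
cv = msl.RollingKFoldPanelSplit(n_splits=5)
scores = cross_val_score(LinearRegression(), X, y, cv=cv)
print(scores.mean())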

ExpandingIncrementPanelSplit() #

The ExpandingIncrementPanelSplit() class facilitates the generation of expanding windows for cross-validation, essential for modeling scenarios where data become incrementally available over time. This class divides the dataset into training and test sets, expanding the training set at fixed time intervals with each iteration. This approach effectively simulates environments where new information is gradually incorporated at set intervals.

Important parameters here are:

  • train_intervals specifies the length of the training interval in time periods. This parameter controls how much the training set expands with each new split.

  • min_cids sets the minimum number of cross-sections required for the initial training set, with the default being four. This is crucial in scenarios where panel data is unbalanced, ensuring there are enough cross-sections to begin the training process.

  • min_periods sets the smallest number of time periods required for the initial training set, with the default being 500 native frequency units. This is particularly important in an unbalanced panel context and should be used in conjunction with min_cids .

  • test_size determines the length of the test set for each training interval. By default, this is set to 21 periods, which follows the training phase.

  • max_periods defines the maximum duration that any training set can reach during the expanding process. If this cap is reached, the earliest data periods are excluded to maintain this constraint. By setting this value, rolling training is effectively performed.

split_xi = msl.ExpandingIncrementPanelSplit(
    train_intervals=12, min_periods=12, test_size=24, min_cids=2
)

visualise_splits() #

The visualise_splits method can be applied to any of the splitters and conveniently visualizes the splits produced, based on the full datasets of features and targets.

split_xi.visualise_splits(X, y)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/3bff768ad0600aab8d8be4ae1ba8deafae1340f6fc41fa9b1467c091c8c50b8f.png

ExpandingKFoldPanelSplit() #

The ExpandingKFoldPanelSplit() class produces sequential learning scenarios, where information sets grow at fixed intervals.

The key parameter here is n_splits , which determines the number of desired splits (must be at least 2). As above, the visualise_splits() method is used to verify that the split has been performed as intended. This class replicates scikit-learn ’s TimeSeriesSplit class for panel-format data.

split_xkf = msl.ExpandingKFoldPanelSplit(n_splits=5)
split_xkf.visualise_splits(X, y)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/df71cf0d6b1513483266929047406fe391344aae2561c132e78388d9c61d25c4.png

RollingKFoldPanelSplit() #

The RollingKFoldPanelSplit class produces paired training and test splits for a data panel. It is similar to scikit-learn ’s KFold class applied to time series. Training and test sets are adjacent, but the former need not strictly precede the latter. This gives the effect of the test set “rolling” forward in time.

split_rkf = msl.RollingKFoldPanelSplit(n_splits=5)
split_rkf.visualise_splits(X, y)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/8f70929d332f3f337a12902c7854139a9e6f181007ba3b62f731f89987584c70.png

Metrics #

Cross-validation can be used for model selection and hyperparameter selection, but a statistic must be computed to determine the optimal model. This can be a performance metric such as accuracy or balanced accuracy (to be maximized), or an error measure such as RMSE or MAE (to be minimized).

The macrosynergy.learning subpackage contains a collection of custom metrics that are compatible with scikit-learn . All such metrics are implemented as functions accepting two arguments: y_true , the true targets in a supervised learning problem, and y_pred , the predicted targets by a trained model. These are:

  • panel_significance_probability() : computes the significance probability of correlation after fitting a linear mixed effects model between predictions and true targets, accounting for cross-sectional correlations present in the panel. See the research piece ‘ Testing macro trading factors ’ for more information.

  • regression_accuracy() : computes the accuracy between the signs of predictions and targets.

  • regression_balanced_accuracy() : computes the balanced accuracy between the signs of predictions and targets.

  • sharpe_ratio() : computes a naive Sharpe ratio based on the model predictions.

  • sortino_ratio() : computes a naive Sortino ratio based on the model predictions.
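
Since all of these metrics follow the (y_true, y_pred) convention, they can be wrapped into scikit-learn scorers with make_scorer , in the same way as r2_score is wrapped in the signal optimization example below. A minimal sketch:

from sklearn.metrics import make_scorer

import macrosynergy.learning as msl

# Wrap custom panel metrics as scorers for cross-validated model selection
bac_scorer = make_scorer(msl.regression_balanced_accuracy, greater_is_better=True)
map_scorer = make_scorer(msl.panel_significance_probability, greater_is_better=True)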

Feature selectors #

A scikit-learn pipeline can incorporate a layer of feature selection. We provide some custom selectors in the macrosynergy.learning subpackage for use over a panel.

  • LassoSelector : selects features through a LASSO regression. The alpha of the regression, as well as choice of a positive restriction, is required.

  • ENetSelector : selects features through an Elastic Net regression. The alpha of the regression, as well as the l1_ratio and choice of a positive restriction, is required.

  • MapSelector : selects features based on significance from the Macrosynergy panel test. The p-value threshold is required, as well as choice of a positive restriction. For more information on the panel test, see the research piece ‘ Testing macro trading factors ’.
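
Selectors are designed to slot into a scikit-learn pipeline ahead of a predictor. A minimal sketch, mirroring the pipeline used in the signal optimization example later in this notebook:

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

import macrosynergy.learning as msl

# LASSO-based feature selection followed by a linear model: only features
# surviving the positively-restricted LASSO enter the regression
pipe = Pipeline([
    ("selector", msl.LassoSelector(alpha=1e-3, positive=True)),
    ("model", LinearRegression()),
])
pipe.fit(X, y)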

Feature transformers #

Within a scikit-learn pipeline, it is often useful to transform features into new ones - for instance scaling and/or averaging. The macrosynergy.learning subpackage contains some custom transformers:

  • PanelStandardScaler : transforms features by subtracting historical mean and dividing by historical standard deviation.

  • PanelMinMaxScaler : transforms features by normalizing them between zero and one.

  • FeatureAverager : condenses features into a single feature through averaging.

Predictor classes #

The last stage of any scikit-learn pipeline is a “predictor” class that is trained based on selected and transformed historical features and outputs predictions. We provide the following predictors in macrosynergy.learning :

  • NaivePredictor : a naive predictor class that expects only a single feature as input, and outputs that single feature as the prediction. For instance, this could be used in conjunction with FeatureAverager to create a signal of equally weighted feature z-scores.

  • SignWeightedLinearRegression : a weighted least squares linear regression model that equalizes the importance of negative-return and positive-return historical samples, removing a possible sign bias learnt by the model.

  • TimeWeightedLinearRegression : a weighted least squares linear regression model that increases the importance of more recent samples by specifying a half-life of exponentially decaying weights for each historical sample.

  • LADRegressor : a linear model that is fit by minimizing absolute residuals instead of squared residuals.

  • SignWeightedLADRegressor : a weighted LAD regression model that equalizes the importance of negative-return and positive-return historical samples, removing a possible sign bias learnt by the model.

  • TimeWeightedLADRegressor : a weighted LAD regression model that increases the importance of more recent samples by specifying a half-life of exponentially decaying weights for each historical sample.
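
As an illustration of how transformers and predictors combine, the sketch below chains PanelStandardScaler , FeatureAverager , and NaivePredictor into a pipeline that produces an equally weighted feature score, as suggested in the NaivePredictor description above. Constructor arguments are omitted here and assumed to have workable defaults:

from sklearn.pipeline import Pipeline

import macrosynergy.learning as msl

# Scale features, average them into a single column, and pass that
# column through as the prediction (i.e., the signal)
score_pipe = Pipeline([
    ("scale", msl.PanelStandardScaler()),
    ("average", msl.FeatureAverager()),
    ("predict", msl.NaivePredictor()),
])
signal = score_pipe.fit(X, y).predict(X)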

Signal optimization #

The SignalOptimizer class is used for sequential model selection, fitting, optimization and forecasting based on quantamental panel data.

Three use cases are discussed in detail in the notebook Signal optimization basics :

  • Feature selection chooses from candidate features and combines them into an equally weighted score.

  • Return prediction estimates the predictive relations of features and combines them, in accordance with their coefficients, into a single prediction.

  • Classification estimates the relation between features and the sign of subsequent returns and combines their effect into a binary variable of positive or negative returns.

Below, we showcase the second case, focusing on the principles of generating an optimized regression-based signal.

The main arguments for instantiating the SignalOptimizer are:

  • inner_splitter , the splitter to be deployed for the cross-validation that determines the choice of the model and hyperparameters,

  • X and y , the double indexed feature matrix and target vector,

  • initial_nsplits and threshold_ndates , which specify the number of cross-validation splits in the initial training set, and the number of dates to be added to this initial set in order for the number of folds to increase by one.

  • blacklist , a standardized dictionary to exclude specific combinations of periods and cross-sections from cross-validation.

Below, we instantiate the signal optimizer so that cross-validation starts with 5 folds, with one fold added every year (12 monthly dates).

splitter_fsz = msl.RollingKFoldPanelSplit(n_splits=5)
so_reg = msl.SignalOptimizer(
    inner_splitter=splitter_fsz,
    X=X,
    y=y,
    initial_nsplits=5,
    threshold_ndates=12,
    blacklist=fxblack,
)

calculate_predictions() #

The calculate_predictions() method returns predictions of the sequentially optimized model type, hyperparameters, and parameters. Important arguments here are:

  • name , a label identifying the specific signal optimization process,

  • models , a dictionary of scikit-learn predictors or pipelines that contains the candidate model types to be deployed,

  • hparam_grid , a nested dictionary defining the hyperparameters to consider for each model type,

  • metric , a scikit-learn scorer object that serves as the criterion for optimization,

  • min_cids , min_periods and test_size , which have the same meaning as in ExpandingIncrementPanelSplit() .

# Model types

mods_reg = {
    "linreg": Pipeline([
        ('selector', msl.LassoSelector(alpha=1e-3, positive=True)),
        ('model', LinearRegression()),
    ]),
}

# Hyperparameter grids

grids_reg = {
    "linreg": {"model__fit_intercept": [True, False]},
}

# Optimization criterion

score_reg = make_scorer(r2_score, greater_is_better=True)
%%time

tdf = so_reg.calculate_predictions(
    name="MACRO_OPTREG",
    models=mods_reg,
    hparam_grid=grids_reg,
    metric=score_reg,
    min_cids=4,
    min_periods=36,
)
100%|████████████████████████████████████████████████████████████████████████████████| 254/254 [00:11<00:00, 21.91it/s]
Wall time: 13.6 s

models_heatmap() #

The models_heatmap method of the SignalOptimizer class visualizes optimal models used for signal calculation over time. If many models have been considered, their number can be limited by the cap argument.

# Get optimized signals and view models heatmap
dfa = so_reg.get_optimized_signals()
so_reg.models_heatmap(
    name="MACRO_OPTREG",
    cap=6,
    title="Optimal regression model used over time",
    figsize=(18, 6),
)

dfx = msm.update_df(dfx, dfa)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/76f27a56eb96e7b17dc8ea5b53ec52304044170c4908d6be2230adc05ce54a2b.png

feature_selection_heatmap() #

The feature_selection_heatmap method of the SignalOptimizer class visualizes the features that were selected over time by the last selector in a scikit-learn pipeline, provided it is of an appropriate type, such as the LassoSelector .

so_reg.feature_selection_heatmap(
    name="MACRO_OPTREG", title="Feature selection heatmap", figsize=(16, 6)
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/6dc8dbfd884262112d37ec7c5dc7bd8dc0d360de6d868435e839eb5abd8cc463.png

coefs_timeplot() #

The coefs_timeplot method creates a time plot of linear model regression coefficients for each feature. For these statistics to be recorded, the underlying scikit-learn predictor class (in this case, LinearRegression ) must contain coef_ and intercept_ attributes.

Gaps in the lines appear either when a model without the required attributes (e.g., a KNN or random forest) is selected, or when the feature selector (in this case, LassoSelector ) does not select the feature in question.

so_reg.coefs_timeplot(name="MACRO_OPTREG", figsize=(16, 6))
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/e72770b83f2e57dc27ac23e82175e2321e904d878158cfdf4332a2b8c2c85a5f.png

coefs_stackedbarplot() #

The coefs_stackedbarplot() method is an alternative to coefs_timeplot() and displays a stacked bar plot of average annual model coefficients over time.

so_reg.coefs_stackedbarplot(name="MACRO_OPTREG", figsize=(16, 6))
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/875bf794a98226a9cf6e378b2e87d352ebc73e675e64b9ebca44e89d214b9c16.png

intercepts_timeplot() #

Similarly to model coefficients, changing model intercepts can be visualised over time through a timeplot using the intercepts_timeplot() method.

so_reg.intercepts_timeplot(name="MACRO_OPTREG", figsize=(16, 6))
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/5f3f95afbe26cd3c372ebebfba8360c149e7d9a5d26c2c8911148eb8479b9518.png

nsplits_timeplot() #

The nsplits_timeplot() method displays the number of cross-validation splits applied over time. This is useful if, at instantiation of SignalOptimizer , values were assigned to initial_nsplits and threshold_ndates so that the number of cross-validation folds increases with the sample length.

so_reg.nsplits_timeplot(name="MACRO_OPTREG")
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/ffbd66dcf8a70474dd02c9cc4ce5e2438ce1d9a09365eb34fedce88c7688cdb6.png

Signaling #

SignalReturnRelations #

The SignalReturnRelations class from the macrosynergy.signal module is specifically designed to analyze, visualize, and compare the relationships between panels of trading signals and panels of subsequent returns.

Here are some key aspects and usage details of the SignalReturnRelations class:

  • sig - the list of signals or the main signal category. Each element of the list is analyzed in relation to subsequent returns.

  • sig_neg - takes a list of True and False values corresponding to the list of signals, indicating whether each signal is expected to relate negatively to subsequent returns. The default is False .

  • ret specifies the panel of subsequent returns that will be analyzed in relation to the specified signal category.

  • freq denotes the frequency at which the series are sampled. The default is ‘M’ for monthly. The return series will always be summed over the sample period, while the signal series will be aggregated according to the value of agg_sig .

  • agg_sig specifies the aggregation method applied to the signal values in down-sampling. The default is “last”. This can also be a list of aggregation methods.

Unlike the CategoryRelations class, here, the focus is on a range of measures of association between signals and returns based on categorization (positive and negative returns) and parametric and nonparametric correlation.

This class applies frequency conversion, corresponding to a trading or rebalancing frequency. It also considers whether the signal is expected to predict returns positively or negatively. This is important for the interpretation of the output. One should also note that there is no regression analysis involved. This means that features should be entered with a meaningful zero value since the sign of the feature is critical for accuracy statistics.

# Instantiate signal-return relations for the list of signals, multiple returns and frequencies

srr = mss.SignalReturnRelations(
    dfx,
    sigs=["MACRO_AVGZ", "MACRO_OPTREG", "XGDP_NEG_ZN4", "XCPI_NEG_ZN4", "XPCG_NEG_ZN4", "RYLDIRS05Y_NSA_ZN4"],
    cosp=True,
    rets=["DU05YXR_VT10", "EQXR_VT10", "FXXR_VT10"],
    freqs=["M"],
    blacklist=fxblack,
    slip=1
)

Summary tables #

The .summary_table() of the SignalReturnRelations class gives a short high-level snapshot of the strength and stability of the main signal relation (the first signal in the list of signals sigs , with the first sign in the list of signs sig_neg and the first frequency in the list of frequencies freqs ). Unless sig_neg had been set to True at instantiation, the relation is assumed to be positive.

The columns of the summary table generally have the following interpretations:

  • accuracy is the ratio of correct predictions of the sign of returns to all predictions. It measures the overall accuracy of the signal’s predictions, regardless of the class imbalance between positive and negative returns.

  • bal_accuracy is the balanced accuracy, which takes into account the class imbalance of the dataset. It is the average of the ratios of correctly detected positive returns and correctly detected negative returns. The best value is 1 and the worst value is 0. This measure avoids inflated performance estimates on imbalanced datasets and is calculated as the average of sensitivity (true positive rate) and specificity (true negative rate). The formula with references is described here

  • pos_sigr is the ratio of positive signals to all predictions. It indicates the long bias of the signal, or the percentage of time the signal is predicting a positive return. The value is between 0 (no positive signals) and 1 (all signals are positive).

  • pos_retr is the ratio of positive returns to all observed returns. It indicates the positive bias of the returns, or the percentage of time the returns are positive. The value is between 0 (no positive returns) and 1 (all returns are positive).

  • pos_prec is the positive precision, which measures the ratio of correct positive return predictions to all positive predictions. It indicates how well the positive predictions of the signal have fared. The best value is 1 and the worst value is 0. A high positive precision can be easily achieved if the ratio of positive returns is high, so it is important to consider this measure in conjunction with other measures such as bal_accuracy. See more info here

  • neg_prec is the negative precision, which measures the ratio of correct negative return predictions to all negative predictions. It indicates how well the negative predictions of the signal have fared. Generally, good negative precision is hard to accomplish if the ratio of positive returns has been high. The best value is 1 and the worst value is 0. See more info here

  • pearson is the Pearson correlation coefficient between signal and subsequent return. Like other correlation coefficients, Pearson varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship.

  • pearson_pval is the probability that the (positive) correlation has been accidental, assuming that returns are independently distributed. Strictly speaking, this value returns a 2-tailed p-value for the null hypothesis that the correlation is 0. The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are reasonable for large datasets. This statistic would be invalid for forward-moving averages.

  • kendall is the Kendall rank correlation coefficient between signal and subsequent return. It is based on a non-parametric hypothesis test for statistical dependence. For those who want to refresh their statistical knowledge, please read here

  • kendall_pval is the probability that the (positive) correlation has been accidental, assuming that returns are independently distributed. As before, the test returns a two-sided p-value for the null hypothesis that the correlation is 0. A p-value below a chosen threshold (usually 0.01 or 0.05) allows us to reject the null hypothesis. This statistic would be invalid for forward-moving averages and for autocorrelated data.

The rows have the following meaning:

  • Panel refers to the whole panel of cross-sections and sample period, excluding unavailable and blacklisted periods.

  • Mean years is the mean of the statistic across all years.

  • Mean cids is the mean of the statistic across all sections.

  • Positive ratio represents the share of years (if following “Mean years”) or cross-sections (if following “Mean cids”) for which the corresponding statistic was above its neutral level. The neutral level is defined as 0.5 for classification ratios (such as accuracy and balanced accuracy) and positive correlation probabilities, and 0 for correlation coefficients (such as Pearson and Kendall). For example, if the Positive ratio for accuracy is 0.7, the correct sign of returns was predicted in 70% of the years (or cross-sections) analyzed. If the Positive ratio for Pearson is 0.6, the signal-return correlation was positive in 60% of the years (or cross-sections).

srr.summary_table()
accuracy bal_accuracy pos_sigr pos_retr pos_prec neg_prec pearson pearson_pval kendall kendall_pval auc
M: MACRO_AVGZ/last => DU05YXR_VT10 0.53884 0.53416 0.56150 0.54225 0.57221 0.49611 0.10051 0.00000 0.05846 0.00000 0.53389
Mean years 0.53317 0.51586 0.57238 0.54456 0.56407 0.46766 0.04721 0.36458 0.02830 0.39176 0.51672
Positive ratio 0.76000 0.64000 0.60000 0.72000 0.76000 0.40000 0.76000 0.56000 0.76000 0.48000 0.64000
Mean cids 0.53873 0.53338 0.55570 0.54096 0.56879 0.49797 0.10604 0.27722 0.06125 0.32892 0.53214
Positive ratio 0.79167 0.79167 0.66667 0.83333 0.91667 0.54167 1.00000 0.75000 0.83333 0.70833 0.79167

Alternatively, for a one-line table, we can use .single_relation_table() . The row label lists the frequency (‘D’, ‘W’, ‘M’, ‘Q’, ‘A’), followed by the signal’s name (with _NEG denoting a negative relationship) and the name of the return.

srr.single_relation_table()
accuracy bal_accuracy pos_sigr pos_retr pos_prec neg_prec pearson pearson_pval kendall kendall_pval auc
M: MACRO_AVGZ/last => DU05YXR_VT10 0.53884 0.53416 0.5615 0.54225 0.57221 0.49611 0.10051 0.0 0.05846 0.0 0.53389

The cross_section_table() method summarizes accuracy and correlation-related measures for the panel, the mean, and the individual cross-sections. It also gives a “positive ratio”, i.e., the ratio of countries with evidence of positive relations, either in terms of above-50% accuracy, positive correlation, or - more restrictively - high positive correlation probabilities. As with .summary_table() , the cross_section_table() and yearly_table() methods analyze the strength and stability of the main signal relation (the first signal in the list sigs , with the first sign in the list sig_neg , and the first frequency in the list freqs ). Unless sig_neg was set to True at instantiation, the relation is assumed to be positive.

srr.cross_section_table()
accuracy bal_accuracy pos_sigr pos_retr pos_prec neg_prec pearson pearson_pval kendall kendall_pval auc
M: MACRO_AVGZ/last => DU05YXR_VT10 0.53884 0.53416 0.56150 0.54225 0.57221 0.49611 0.10051 0.00000 0.05846 0.00000 0.53389
Mean 0.53873 0.53338 0.55570 0.54096 0.56879 0.49797 0.10604 0.27722 0.06125 0.32892 0.53214
PosRatio 0.79167 0.79167 0.66667 0.83333 0.91667 0.54167 1.00000 0.75000 0.83333 0.70833 0.79167
AUD 0.54380 0.53619 0.55839 0.56934 0.60131 0.47107 0.14511 0.01623 0.06174 0.12772 0.53640
CAD 0.48966 0.48913 0.53448 0.50690 0.49677 0.48148 0.03633 0.53775 -0.00150 0.96955 0.48918
CHF 0.53719 0.52398 0.65702 0.54959 0.56604 0.48193 0.13194 0.03708 0.07203 0.08993 0.52183
CLP 0.50962 0.51230 0.44712 0.52404 0.53763 0.48696 0.06107 0.38088 0.01301 0.78023 0.51219
COP 0.51923 0.52019 0.39103 0.50000 0.52459 0.51579 0.01400 0.86230 0.04665 0.38743 0.51923
CZK 0.48918 0.49922 0.40693 0.55411 0.55319 0.44526 0.05951 0.36795 0.01570 0.72246 0.49924
EUR 0.59273 0.58733 0.54545 0.56727 0.64667 0.52800 0.21410 0.00035 0.13322 0.00099 0.58821
GBP 0.53793 0.52618 0.56552 0.59310 0.61585 0.43651 0.20942 0.00033 0.10178 0.00976 0.52666
HUF 0.57664 0.58233 0.63139 0.50000 0.56069 0.60396 0.13470 0.02577 0.08232 0.04225 0.57664
IDR 0.62245 0.59969 0.63265 0.61224 0.68548 0.51389 0.27022 0.00013 0.15039 0.00175 0.59759
ILS 0.55251 0.52712 0.64840 0.59361 0.61268 0.44156 0.01253 0.85368 -0.00540 0.90527 0.52563
INR 0.49074 0.49239 0.54630 0.48148 0.47458 0.51020 0.09551 0.16190 0.02214 0.62833 0.49245
JPY 0.48966 0.46063 0.64828 0.58621 0.55851 0.36275 0.04914 0.40448 -0.00260 0.94734 0.46299
KRW 0.58333 0.58025 0.62500 0.53241 0.59259 0.56790 0.15768 0.02042 0.09320 0.04155 0.57555
MXN 0.55455 0.55528 0.48182 0.51818 0.57547 0.53509 0.01756 0.79564 0.04242 0.34905 0.55528
NOK 0.48276 0.48459 0.46552 0.52759 0.51111 0.45806 0.11942 0.04214 0.05940 0.13152 0.48461
NZD 0.54828 0.54808 0.71034 0.52069 0.54854 0.54762 0.09762 0.09706 0.10994 0.00525 0.53964
PLN 0.58029 0.57634 0.56569 0.54015 0.60645 0.54622 0.23155 0.00011 0.13128 0.00122 0.57550
SEK 0.58156 0.57533 0.54610 0.57447 0.64286 0.50781 0.18459 0.00159 0.12005 0.00231 0.57639
THB 0.55102 0.54386 0.67857 0.53571 0.56391 0.52381 0.01900 0.79150 0.04668 0.33128 0.53846
TRY 0.51250 0.51167 0.38125 0.49375 0.50820 0.51515 0.02451 0.75690 -0.00414 0.93766 0.51102
TWD 0.50463 0.50506 0.49537 0.54630 0.55140 0.45872 0.08938 0.19066 0.05142 0.26081 0.50510
USD 0.54828 0.54900 0.47241 0.51034 0.56204 0.53595 0.08206 0.16338 0.05625 0.15325 0.54887
ZAR 0.53091 0.51501 0.70182 0.54545 0.55440 0.47561 0.08810 0.14506 0.07403 0.06728 0.51267

The yearly_table() method is useful for analyzing how the performance of a trading signal varies over time by providing a breakdown of performance metrics for each year. This can help identify whether the signal has been consistently strong over time or if specific market conditions have driven its performance.

tbl_srr_year = srr.yearly_table()
tbl_srr_year.round(3)
accuracy bal_accuracy pos_sigr pos_retr pos_prec neg_prec pearson pearson_pval kendall kendall_pval auc
M: MACRO_AVGZ/last => DU05YXR_VT10 0.539 0.534 0.562 0.542 0.572 0.496 0.101 0.000 0.058 0.000 0.534
Mean 0.533 0.516 0.572 0.545 0.564 0.468 0.047 0.365 0.028 0.392 0.517
PosRatio 0.760 0.640 0.600 0.720 0.760 0.400 0.760 0.560 0.760 0.480 0.640
2000 0.550 0.563 0.475 0.750 0.816 0.310 -0.006 0.960 0.018 0.816 0.583
2001 0.463 0.402 0.716 0.597 0.542 0.263 0.119 0.172 -0.004 0.939 0.418
2002 0.565 0.489 0.780 0.631 0.626 0.351 0.091 0.243 0.072 0.167 0.492
2003 0.518 0.471 0.786 0.565 0.553 0.389 -0.050 0.518 -0.016 0.763 0.480
2004 0.560 0.515 0.679 0.631 0.640 0.389 -0.011 0.889 0.029 0.574 0.514
2005 0.530 0.533 0.458 0.536 0.571 0.495 0.053 0.492 0.020 0.695 0.533
2006 0.507 0.496 0.302 0.476 0.471 0.522 0.010 0.884 -0.036 0.409 0.497
2007 0.540 0.536 0.341 0.476 0.523 0.548 0.053 0.402 0.048 0.261 0.532
2008 0.500 0.543 0.313 0.599 0.659 0.428 0.173 0.005 0.112 0.007 0.539
2009 0.485 0.497 0.869 0.481 0.481 0.514 0.088 0.143 0.079 0.050 0.499
2010 0.601 0.555 0.739 0.623 0.652 0.458 0.128 0.034 0.093 0.022 0.545
2011 0.544 0.522 0.591 0.626 0.645 0.400 -0.062 0.301 0.004 0.920 0.523
2012 0.529 0.509 0.598 0.605 0.612 0.405 0.030 0.621 0.012 0.761 0.509
2013 0.500 0.514 0.703 0.471 0.479 0.549 0.082 0.174 0.054 0.179 0.512
2014 0.610 0.567 0.606 0.716 0.769 0.365 0.114 0.065 0.104 0.012 0.579
2015 0.560 0.565 0.760 0.524 0.555 0.576 0.098 0.106 0.066 0.104 0.548
2016 0.522 0.519 0.623 0.514 0.529 0.510 0.026 0.668 0.023 0.573 0.518
2017 0.544 0.550 0.431 0.530 0.587 0.512 0.158 0.008 0.057 0.156 0.549
2018 0.479 0.484 0.448 0.545 0.527 0.440 0.028 0.632 -0.001 0.988 0.484
2019 0.552 0.523 0.649 0.604 0.620 0.426 0.020 0.736 0.037 0.349 0.522
2020 0.554 0.446 0.833 0.627 0.609 0.283 -0.037 0.538 -0.099 0.015 0.468
2021 0.511 0.494 0.446 0.348 0.341 0.647 0.083 0.170 0.013 0.745 0.494
2022 0.681 0.515 0.080 0.290 0.318 0.713 0.073 0.224 0.055 0.176 0.505
2023 0.554 0.597 0.312 0.576 0.709 0.484 0.112 0.063 0.090 0.027 0.585
2024 0.370 0.491 0.772 0.272 0.268 0.714 -0.192 0.066 -0.121 0.087 0.492

multiple_relations_table() is a method that compares multiple signal-return relations in one table. It is useful for comparing the performance of different signals against one or more return series at multiple frequencies. The method returns a table with the standard columns used for single_relation_table() and other tables, but the rows display the different signals from the list sigs specified at instantiation of SignalReturnRelations() . The row names indicate the frequency (‘D’, ‘W’, ‘M’, ‘Q’, ‘A’) followed by the signal’s and return’s names.

tbl_srr_multi=srr.multiple_relations_table()
tbl_srr_multi.round(3)
accuracy bal_accuracy pos_sigr pos_retr pos_prec neg_prec pearson pearson_pval kendall kendall_pval auc
Return Signal Frequency Aggregation
DU05YXR_VT10 MACRO_AVGZ M last 0.537 0.534 0.543 0.532 0.563 0.505 0.091 0.000 0.055 0.000 0.534
RYLDIRS05Y_NSA_ZN4 M last 0.535 0.527 0.689 0.532 0.549 0.505 0.058 0.000 0.040 0.000 0.523
XCPI_NEG_ZN4 M last 0.517 0.515 0.527 0.532 0.547 0.484 0.034 0.018 0.023 0.017 0.515
XGDP_NEG_ZN4 M last 0.524 0.522 0.545 0.532 0.552 0.492 0.044 0.002 0.036 0.000 0.522
XPCG_NEG_ZN4 M last 0.512 0.523 0.355 0.532 0.562 0.484 0.057 0.000 0.044 0.000 0.521
MACRO_OPTREG M last 0.546 0.539 0.722 0.532 0.554 0.525 0.085 0.000 0.054 0.000 0.532
EQXR_VT10 MACRO_AVGZ M last 0.527 0.516 0.554 0.602 0.616 0.416 0.055 0.001 0.028 0.014 0.517
RYLDIRS05Y_NSA_ZN4 M last 0.511 0.476 0.661 0.602 0.585 0.367 -0.049 0.004 -0.033 0.004 0.478
XCPI_NEG_ZN4 M last 0.547 0.537 0.547 0.602 0.635 0.439 0.074 0.000 0.046 0.000 0.539
XGDP_NEG_ZN4 M last 0.510 0.502 0.541 0.602 0.603 0.400 0.037 0.029 0.010 0.366 0.502
XPCG_NEG_ZN4 M last 0.470 0.497 0.367 0.602 0.598 0.396 0.015 0.375 0.009 0.426 0.497
MACRO_OPTREG M last 0.547 0.502 0.724 0.602 0.602 0.401 0.018 0.297 0.013 0.255 0.501
FXXR_VT10 MACRO_AVGZ M last 0.513 0.512 0.549 0.519 0.529 0.494 0.044 0.003 0.031 0.002 0.512
RYLDIRS05Y_NSA_ZN4 M last 0.522 0.517 0.696 0.518 0.529 0.505 0.061 0.000 0.048 0.000 0.514
XCPI_NEG_ZN4 M last 0.506 0.505 0.528 0.518 0.523 0.486 0.005 0.748 0.008 0.441 0.505
XGDP_NEG_ZN4 M last 0.510 0.509 0.552 0.518 0.526 0.491 0.036 0.015 0.022 0.024 0.508
XPCG_NEG_ZN4 M last 0.504 0.510 0.356 0.518 0.531 0.488 -0.005 0.746 0.003 0.771 0.509
MACRO_OPTREG M last 0.523 0.519 0.726 0.518 0.529 0.509 0.060 0.000 0.053 0.000 0.515

The single_statistic_table() method generates a table and heatmap featuring a single statistic for each signal-return relation. Users can select their preferred statistic from the available options, including “accuracy”, “bal_accuracy”, “pos_sigr”, “pos_retr”, “pos_prec”, “neg_prec”, “kendall”, “kendall_pval”, “pearson”, and “pearson_pval”. The heatmap, where darker (blue) shades indicate higher (positive) values, allows users to visually compare the statistic across signals for all returns and frequencies.

srr.single_statistic_table(stat="bal_accuracy", show_heatmap=True, min_color=0.4, max_color=0.6)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/56d9906681fed2ca797d388c8766e7f02f7f3edcc768c9fbf9cba1d145616dce.png
Return DU05YXR_VT10 EQXR_VT10 FXXR_VT10
Frequency M M M
Signal Aggregation
MACRO_AVGZ last 0.531457 0.511747 0.506784
MACRO_OPTREG last 0.539360 0.501515 0.518787
XGDP_NEG_ZN4 last 0.515001 0.500994 0.505596
XCPI_NEG_ZN4 last 0.513368 0.534829 0.501870
XPCG_NEG_ZN4 last 0.526960 0.491406 0.506858
RYLDIRS05Y_NSA_ZN4 last 0.534240 0.473582 0.516607

Correlation_bars #

The .correlation_bars() method visualizes positive correlation probabilities based on parametric (Pearson) and non-parametric (Kendall) correlation statistics, comparing signals with each other, across countries, or across years.

The type argument in the .correlation_bars() method determines how the correlation probabilities are grouped and visualized:

  • If type='signals' , the method will plot the correlation probabilities for each signal, comparing them against each other.

  • If type='cross_section' , the method will plot the correlation probabilities for each cross-section (e.g., country), comparing them against each other.

  • If type='years' , the method will plot the correlation probabilities for each year, comparing them against each other.

srr.correlation_bars(type="signals", size=(15, 3), title="Positive correlation probability of signals with 2-years vol-targeted duration return")
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/05af1439e63c25d442d3ac970cfe8aee2b9c82e5b4616650972d137b21842957.png
srr.correlation_bars(type="cross_section", title="Positive correlation probability of main signal with 5-years vol-targeted duration return across currencies", size=(15, 3))
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/990d22b4e63084e035df5561c5f8ce33bf41b1ed284ae54da31ec97f85bfc433.png
srr.correlation_bars(type="years", size=(15, 3), title="Positive correlation probability of main signal with 5-years vol-targeted duration return across years")
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/ac51a00260c02ac20710ac7b634ac26932f74c6233cf7240d0ca71bfd255f70b.png

Accuracy_bars #

The accuracy_bars method operates analogously to the correlation_bars method, except that it shows the accuracy and balanced accuracy of the predicted relationship.

srr.accuracy_bars(type="cross_section", title="Accuracy for sign prediction across currencies for the main signal-return relationship", size=(15, 3))
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/3f85656b0ec20dc879498c5768bfd6db84b4f5757d625061d455f655316f5afb.png

Using the SignalReturnRelations class, we can compare side by side the predictive power of the two composite signals:

  • the average macro unoptimized signal MACRO_AVGZ ,

  • sequentially optimized forecasts MACRO_OPTREG , for 5-year duration returns:

## Compare optimized signals with simple average z-scores

srr = mss.SignalReturnRelations(
    df=dfx,
    rets=["DU05YXR_VT10"],
    sigs=["MACRO_AVGZ", "MACRO_OPTREG"],
    cosp=True,
    freqs=["M"],
    agg_sigs=["last"],
    start="2004-01-01",
    blacklist=fxblack,
    slip=1,
)

tbl_srr = srr.signals_table()
tbl_srr.round(3)
accuracy bal_accuracy pos_sigr pos_retr pos_prec neg_prec pearson pearson_pval kendall kendall_pval auc
Return Signal Frequency Aggregation
DU05YXR_VT10 MACRO_AVGZ M last 0.537 0.535 0.538 0.532 0.564 0.505 0.092 0.0 0.056 0.0 0.534
MACRO_OPTREG M last 0.546 0.540 0.717 0.532 0.555 0.524 0.086 0.0 0.054 0.0 0.532

Backtesting #

NaivePnL #

Instantiation #

The NaivePnL class is designed to provide a quick and simple overview of a stylized PnL profile of a set of trading signals. The class carries the label naive because its methods do not consider transaction costs or position limitations, such as risk management considerations. This is deliberate because costs and limitations are specific to trading size, institutional rules, and regulations.

The class allows a single target return category to be assigned to the ret argument, defined as the return on a position corresponding to one unit in the signal. A set of signal categories can be assigned as a list of categories to the sigs argument. If the user wishes to evaluate the PnL using benchmark returns, these can be passed as a list of full tickers to the bms argument. The instantiation of the class determines the target and the scope of all subsequent analyses, i.e., the period and the set of eligible countries. All other choices can be made subsequently.

sigs = ["MACRO_AVGZ", "MACRO_OPTREG", "XGDP_NEG_ZN4", "XCPI_NEG_ZN4", "XPCG_NEG_ZN4", "RYLDIRS05Y_NSA_ZN4"]
    

naive_pnl = msn.NaivePnL(
    dfx,
    ret="DU05YXR_VT10",
    sigs=sigs,
    cids=cids,
    start="2004-01-01",
    blacklist=fxblack,
    bms=["USD_DU05YXR_NSA"],
)

make_pnl #

The make_pnl() method calculates a daily PnL for a specific signal category and adds it to the main dataframe of the class instance. Indeed, a single signaling category can result in a wide array of actual signals, depending on the choices made regarding its final form.

In particular, the signal transformation option ( sig_op ) manages the distribution of the traded signal and gives the following options:

  • zn_score_pan transforms raw signals into z-scores around a zero value based on the whole panel; the neutral level and standard deviation are estimated across all cross-sections of the panel. “zn-score” here means a standardized score with zero as the neutral level and standardization through division by the mean absolute value. See the make_zn_scores() function explained in this notebook

  • zn_score_cs transforms raw signals into z-scores around a zero value based on each cross-section alone

  • binary transforms the category values into simple long/short (1/-1) signals.

Other important choices include:

  • the signal direction parameter sig_neg can be set to True if the negative value of the transformed signal should be used for PnL calculation,

  • rebalancing frequency ( rebal_freq ) for positions according to signal must be one of ‘daily’ (default), ‘weekly’ or ‘monthly’,

  • rebalancing slippage ( rebal_slip ) in days, where the default is 1, which means that it takes one day to rebalance the position and that the new position produces PnL from the second day after the signal has been recorded,

  • threshold value ( thresh ) beyond which scores are winsorized, i.e. contained at that threshold. This is often realistic, as risk management and the potential of signal value distortions typically preclude outsized and concentrated positions within a strategy.

The method also allows ex-post scaling of the PnL to an annualized volatility by assigning an annualized standard deviation of the aggregate PnL to the vol_scale argument. This is for comparative visualization only and is very different from a priori volatility targeting.

Method calls add specified PnLs to the class instance for subsequent analysis.

for sig in sigs:
    naive_pnl.make_pnl(
        sig,
        sig_neg=False,
        sig_op="zn_score_pan",
        rebal_freq="monthly",
        vol_scale=10,
        rebal_slip=1,
        thresh=2,
        pnl_name=sig + "_NEGPZN",
    )

for sig in sigs:
    naive_pnl.make_pnl(
        sig,
        sig_neg=False,
        sig_op="binary",
        rebal_freq="monthly",
        vol_scale=10,
        rebal_slip=1,
        thresh=2,
        pnl_name=sig + "_NEGBN",
    )

make_long_pnl #

The make_long_pnl() method adds a daily long-only PnL with an equal position signal across markets and time. This can serve as a baseline for comparison against the signal-adjusted returns. The vol_scale parameter is an ex-post scaling factor that adjusts the PnL to the given annualized volatility, allowing the strategy’s performance to be assessed relative to its risk level in comparative visualizations.

naive_pnl.make_long_pnl(vol_scale=10, label="Long_Only")

plot_pnls #

Available naive PnLs can be listed with the pnl_names attribute:
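
naive_pnl.pnl_names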

The plot_pnls() method of the NaivePnl() class is used to plot a line chart of cumulative PnL. The method can plot:

  • A single PnL category for a single cross-section, where the user can assign a single cross-section or “ALL” to the pnl_cids argument and a single PnL category to the pnl_cats argument.

  • Multiple cross-sections per PnL type, where the user can assign a list of cross-sections to the pnl_cids argument and a single PnL category to the pnl_cats argument.

  • Multiple PnL types per cross-section, where the user can assign a list of PnL categories to the pnl_cats argument and a single cross-section or “ALL” to the pnl_cids argument.

dict_labels = {"MACRO_AVGZ_NEGBN":  "Binary composite macro trend pressure, % ar, in excess of benchmarks", 
               "MACRO_OPTREG_NEGBN": "Binary optimized regression forecasts, % ar, in excess of benchmarks",
                "Long_Only": "Long-only",
                "XGDP_NEG_ZN4_NEGBN": "Binary excess GDP-based signal, % ar, in excess of benchmarks", 
                "XCPI_NEG_ZN4_NEGBN": "Binary excess CPI-based signal, % ar, in excess of benchmarks", 
                "XPCG_NEG_ZN4_NEGBN": "Binary excess private consumption-based signal, % ar, in excess of benchmarks"
               }


naive_pnl.plot_pnls(
    pnl_cats=[
        "MACRO_AVGZ_NEGBN",
        "MACRO_OPTREG_NEGBN",
        "XGDP_NEG_ZN4_NEGBN",
        "XCPI_NEG_ZN4_NEGBN", 
        "XPCG_NEG_ZN4_NEGBN",
        "Long_Only",
    ],
    xcat_labels=dict_labels,
    pnl_cids=["ALL"],
    start="2004-01-01",
    end="2023-12-31",
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/3079a1799c459852e07561a69af1b340f759a6856acb62e76d30eb0fbfcd81ce.png
cids_sel = ["EUR", "GBP", "USD"]

naive_pnl.plot_pnls(
    pnl_cats=["MACRO_AVGZ_NEGPZN"],
    pnl_cids=cids_sel,
    start="2004-01-01",
   # end="2021-01-01",
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/3282e83880490960b975e764b63e6b9fd6df88116374bb7b9735246d167f9ead.png

evaluate_pnls #

The method evaluate_pnls() returns a small dataframe of key PnL statistics. For definitions of Sharpe and Sortino ratios, please see here

The table can only have multiple PnL categories or multiple cross-sections, not both at the same time. The table also shows the daily benchmark correlation of PnLs.

df_eval = naive_pnl.evaluate_pnls(
    pnl_cats=["MACRO_AVGZ_NEGBN", "MACRO_OPTREG_NEGBN", ], pnl_cids=["ALL"], start="2004-01-01", end="2023-12-01"
)
display(df_eval.astype("float").round(2))
xcat MACRO_AVGZ_NEGBN MACRO_OPTREG_NEGBN
Return (pct ar) 11.60 11.10
St. Dev. (pct ar) 9.94 9.93
Sharpe Ratio 1.17 1.12
Sortino Ratio 1.71 1.64
Max 21-day draw -22.75 -15.97
Max 6-month draw -36.21 -21.62
USD_DU05YXR_NSA correl -0.05 0.23
Traded Months 240.00 240.00

signal_heatmap #

The signal_heatmap() method creates a heatmap of signals for a specific PnL across time and sections. The time axis refers to period averages, and the default frequency is monthly (specified with freq=’m’), but quarterly is also an option (freq=’q’).

The heatmap displays each signal as a colored square, with the color representing the signal value. The user can select the particular strategy with the pnl_name argument. By default, the method plots all available cross-sections. The heatmap provides an intuitive representation of the signal values, allowing the user to identify patterns and trends across time and sections.

The signal_heatmap() method includes a color bar legend that shows the signal values and their corresponding color. If a threshold value is provided in the make_pnl() function, the signal_heatmap() method limits the largest contribution to the specified threshold value. This truncation ensures that any signals with extreme values greater than the threshold will not dominate the visualization, which is important from a risk management perspective.

naive_pnl.signal_heatmap(
    pnl_name="MACRO_OPTREG_NEGPZN",
    pnl_cids=["EUR", "USD", "GBP"],
    freq="q",
    start="2004-01-01",
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/f53be698e2fe186cc878b528f80d8da2c9cf7d77a222f79c90ffb448e2318579.png

agg_signal_bars #

The method agg_signal_bars() indicates the strength and direction of the aggregate signal. If metric='direction' is chosen, it simply adds up the signal across all sections, so long and short signals offset each other. If metric='strength' is selected, the aggregate absolute signal is displayed, with no offsetting. The method helps to visually understand the overall direction of the aggregate signal and, via the absolute value, the size of the overall exposure to the signal. In effect, it answers the question: “Is the PnL generated by large returns or by a large signal?”

naive_pnl.agg_signal_bars(
    pnl_name="MACRO_OPTREG_NEGPZN",
    freq="q",
    metric="direction",
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/46437cd2991d7b56beed035dd552856ad75ee92a38e595433c50490c80d1fa27.png
naive_pnl.agg_signal_bars(
    pnl_name="MACRO_OPTREG_NEGPZN",
    freq="q",
    metric="strength",
)
https://macrosynergy.com/notebooks.build/introductions/introduction-to-macrosynergy-package/_images/260e0465cf096dc759087890545e1068a0fa225a7fe60390619db93d4e60eaff.png