Introduction to Macrosynergy package #
This notebook shows how to download, process, describe, and analyze macro quantamental data through the Macrosynergy package.
Macro quantamental indicators are mostly time series of information states of economic trends, balance sheets, and conditions. They are particularly suitable for backtesting and operating financial market trading strategies. The primary source of macro quantamental indicators is the J.P. Morgan Macrosynergy Quantamental System (“JPMaQS”). The format has a few specifics that make it easy to discover investment value:

All values are point-in-time, meaning they represent the latest value of a concept at the end of the date for which they have been recorded.

The point-in-time format means that they can be related easily to time series of financial returns; many types of financial returns are also included in the system.

Data are organized in panels, i.e., one type of quantamental indicator (“category”) time series is available over a range of tradable markets or countries.

Each observation of an indicator does not just contain a value but also information on the time that has elapsed since the recorded activity took place and the quality of the replication of the historic information state.
The Macrosynergy package contains convenience functions to handle this specific format and arrive quickly at conclusions for the investment process. It is not designed to compete with general statistics and graphics packages but merely serves as a shortcut to quick results and a guide to the type of operations and analyses that one can swiftly conduct on JPMaQS data.
The notebook covers the following main parts:

Get Packages and JPMaQS Data: This section installs and imports the necessary Python packages used throughout the analysis, including the macrosynergy package.
Describing: In this part, the notebook shows how to check data availability, detect missing categories, visualize panel distributions with the help of standard bar and box plots, and analyze data with the help of time series and heatmaps.

Preprocessing: this part shows examples of simple data transformation, such as creating a new category, excluding series, computing relative values, normalizing data, etc.

Relating: the functions in this part cover visualization and analysis of relationships between two categories. They are based on the standard seaborn scatterplot function but allow for additional customization for trading signal creation.
Learning: the macrosynergy.learning subpackage contains functions and classes to assist the creation of statistical learning solutions with macro quantamental data. The functionality is built around integrating the macrosynergy package and associated JPMaQS data with the popular scikit-learn library, which provides a simple interface for fitting common statistical learning models, as well as feature selection methods, cross-validation classes, and performance metrics.
Signaling: this part is specifically designed to analyze, visualize, and compare the relationships between panels of trading signals and panels of subsequent returns.

Backtesting: the functions here are designed to provide a quick and simple overview of a stylized PnL profile of a set of trading signals. The class carries the label naive because its methods do not take into account transaction costs or position limitations, such as risk management considerations. This is deliberate, because costs and limitations are specific to trading size, institutional rules, and regulations.
For examples of standard packages used with JPMaQS, please have a look at the notebooks “JPMaQS with Seaborn”, “JPMaQS with Statsmodels”, and “Panel regression with JPMaQS”.
Get packages and JPMaQS data #
# Uncomment to update the package
"""
%%capture
! pip install macrosynergy --upgrade"""
'\n%%capture\n! pip install macrosynergy --upgrade'
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import macrosynergy.management as msm
import macrosynergy.panel as msp
import macrosynergy.signal as mss
import macrosynergy.pnl as msn
import macrosynergy.visuals as msv
import macrosynergy.learning as msl
from macrosynergy.download import JPMaQSDownload
# machine learning modules
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import (
make_scorer,
balanced_accuracy_score,
r2_score,
)
import warnings
warnings.simplefilter("ignore")
The JPMaQS indicators we consider are downloaded using the J.P. Morgan DataQuery API interface within the macrosynergy package. This is done by specifying ticker strings that form full DataQuery expressions of the form DB(JPMAQS,<cross_section>_<category>,<info>), where <info> is one of the following metrics:

value giving the latest available values for the indicator

eop_lag referring to the number of days elapsed since the end of the observation period

mop_lag referring to the number of days elapsed since the mean observation period

grading denoting a grade of the observation, giving a metric of real-time information quality.
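For illustration only (the helper below is not part of the package), the full DataQuery expressions implied by this format can be assembled from a ticker and the metric names:

```python
# Illustrative sketch (not part of the macrosynergy package): build the
# full DataQuery expressions for one ticker across the four JPMaQS metrics.
metrics = ["value", "grading", "eop_lag", "mop_lag"]

def to_expression(ticker: str, metric: str) -> str:
    # e.g. "USD_EQXR_NSA" + "value" -> "DB(JPMAQS,USD_EQXR_NSA,value)"
    return f"DB(JPMAQS,{ticker},{metric})"

expressions = [to_expression("USD_EQXR_NSA", m) for m in metrics]
print(expressions[0])  # DB(JPMAQS,USD_EQXR_NSA,value)
```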
After instantiating the JPMaQSDownload class within the macrosynergy.download module, one can use the download(tickers, start_date, metrics) method to easily download the necessary data, where tickers is an array of ticker strings, start_date is the first collection date to be considered, and metrics is an array comprising the time series information to be downloaded. For more information see here or use the free dataset on Kaggle.
To ensure reproducibility, only samples between January 2000 (inclusive) and May 2023 (exclusive) are considered.
cids_dm = ["AUD", "CAD", "CHF", "EUR", "GBP", "JPY", "NOK", "NZD", "SEK", "USD"]
cids_em = ["CLP","COP", "CZK", "HUF", "IDR", "ILS", "INR", "KRW", "MXN", "PLN", "THB", "TRY", "TWD", "ZAR",]
cids = cids_dm + cids_em
cids_dux = list(set(cids) - set(["IDR", "NZD"]))
ecos = [
"CPIC_SA_P1M1ML12",
"CPIC_SJA_P3M3ML3AR",
"CPIC_SJA_P6M6ML6AR",
"CPIH_SA_P1M1ML12",
"CPIH_SJA_P3M3ML3AR",
"CPIH_SJA_P6M6ML6AR",
"INFTEFF_NSA",
"INTRGDP_NSA_P1M1ML12_3MMA",
"INTRGDPv5Y_NSA_P1M1ML12_3MMA",
"PCREDITGDP_SJA_D1M1ML12",
"RGDP_SA_P1Q1QL4_20QMA",
"RYLDIRS02Y_NSA",
"RYLDIRS05Y_NSA",
"PCREDITBN_SJA_P1M1ML12",
]
mkts = [
"DU02YXR_NSA",
"DU05YXR_NSA",
"DU02YXR_VT10",
"DU05YXR_VT10",
"EQXR_NSA",
"EQXR_VT10",
"FXXR_NSA",
"FXXR_VT10",
"FXCRR_NSA",
"FXTARGETED_NSA",
"FXUNTRADABLE_NSA",
]
xcats = ecos + mkts
# Download series from J.P. Morgan DataQuery by tickers
start_date = "2000-01-01"
end_date = "2023-05-01"
tickers = [cid + "_" + xcat for cid in cids for xcat in xcats]
print(f"Maximum number of tickers is {len(tickers)}")
# Download series from J.P. Morgan DataQuery by tickers
client_id: str = os.getenv("DQ_CLIENT_ID")
client_secret: str = os.getenv("DQ_CLIENT_SECRET")
with JPMaQSDownload(client_id=client_id, client_secret=client_secret) as dq:
    df = dq.download(
        tickers=tickers,
        start_date="2000-01-01",
        suppress_warning=True,
        metrics=["all"],
        show_progress=True,
    )
Maximum number of tickers is 600
Downloading data from JPMaQS.
Timestamp UTC: 2024-04-25 17:40:20
Connection successful!
Requesting data: 100%███████████████████████████████████████████████████████████████ 120/120 [00:28<00:00, 4.16it/s]
Downloading data: 100%██████████████████████████████████████████████████████████████ 120/120 [00:39<00:00, 3.06it/s]
Some expressions are missing from the downloaded data. Check logger output for complete list.
84 out of 2400 expressions are missing. To download the catalogue of all available expressions and filter the unavailable expressions, set `get_catalogue=True` in the call to `JPMaQSDownload.download()`.
Some dates are missing from the downloaded data.
2 out of 6346 dates are missing.
The Macrosynergy package works with data frames of a standard JPMaQS format, i.e., long data frames with at least four columns containing cross-section (cid), extended category (xcat), real-time dates (real_date), and value. Other potentially useful columns contain grades of observations (grading), lags to the end of the observation period (eop_lag), and lags to the median of the observation period (mop_lag).
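For intuition, a minimal frame in this long format (with made-up values) and a common pivot to a wide, ticker-per-column layout can be sketched as follows:

```python
import pandas as pd

# A minimal, made-up data frame in the standard JPMaQS long format:
# one row per cross-section, category, and real-time date.
dfq = pd.DataFrame(
    {
        "cid": ["USD", "USD", "EUR", "EUR"],
        "xcat": ["EQXR_NSA"] * 4,
        "real_date": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"]
        ),
        "value": [0.5, -0.2, 0.1, 0.3],
    }
)
# A common transformation: pivot to one column per ticker.
wide = dfq.assign(ticker=dfq["cid"] + "_" + dfq["xcat"]).pivot(
    index="real_date", columns="ticker", values="value"
)
print(wide.shape)  # (2, 2)
```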
# uncomment if running on Kaggle
"""for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

df = pd.read_csv('../input/fixed-income-returns-and-macro-trends/JPMaQS_Quantamental_Indicators.csv', index_col=0, parse_dates=['real_date'])""";
The description of each JPMaQS category is available under the Macro Quantamental Academy, on JPMorgan Markets (password protected), or on Kaggle (just for the tickers used in this notebook). In particular, this notebook uses the categories Consumer price inflation trends, Inflation targets, Intuitive growth estimates, Domestic credit ratios, Long-term GDP growth, Real interest rates, Private credit expansion, Duration returns, Equity index future returns, FX forward returns, FX forward carry, and FX tradeability and flexibility.
df['ticker'] = df['cid'] + "_" + df["xcat"]
dfx = df.copy()
dfx.info()
dfx.head(3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3445054 entries, 0 to 3445053
Data columns (total 8 columns):
 #   Column     Dtype
---  ------     -----
0 real_date datetime64[ns]
1 cid object
2 xcat object
3 eop_lag float64
4 grading float64
5 mop_lag float64
6 value float64
7 ticker object
dtypes: datetime64[ns](1), float64(4), object(3)
memory usage: 210.3+ MB
   real_date   cid  xcat                eop_lag  grading  mop_lag  value     ticker
0  2000-01-03  AUD  CPIC_SA_P1M1ML12    95.0     2.0      292.0    1.244168  AUD_CPIC_SA_P1M1ML12
1  2000-01-03  AUD  CPIC_SJA_P3M3ML3AR  95.0     2.0      186.0    3.006383  AUD_CPIC_SJA_P3M3ML3AR
2  2000-01-03  AUD  CPIC_SJA_P6M6ML6AR  95.0     2.0      277.0    1.428580  AUD_CPIC_SJA_P6M6ML6AR
Describing #
View available data history with check_availability #
The convenience function check_availability() visualizes start years and the number of missing values at or before the end date of all selected cross-sections and across a list of categories. It also displays unavailable indicators as gray fields and color codes the starting year of each series, with darker colors indicating more recent starting years. If we are interested only in availability starting with a particular date, we pass this date as the start argument.
msm.check_availability(
dfx,
xcats=ecos + ["EQXR_NSA", "FXXR_NSA"],
cids=cids_em,
start="2000-01-01",
)
Detect missing categories or cross-sections with missing_in_df #
The function missing_in_df() is complementary to check_availability() and simply displays (1) categories that are missing across all expected cross-sections for a given category name list and (2) cross-sections that are missing within a category.
cats_exp = ["EQCRR_NSA", "FXCRR_NSA", "INTRGDP_NSA_P1M1ML12_3MMA", "RUBBISH"]
msm.missing_in_df(dfx, xcats=cats_exp, cids=cids)
Missing XCATs across DataFrame: ['RUBBISH', 'EQCRR_NSA']
Missing cids for FXCRR_NSA: ['USD']
Missing cids for INTRGDP_NSA_P1M1ML12_3MMA: []
Visualize panel distributions with view_ranges #
For an overview of long-term series distributions in a panel, the convenience function view_ranges() uses standard bar plots and box plots of the seaborn package, adapted to fit the JPMaQS format quickly and conveniently.
For example, choosing kind='bar' displays a bar plot that focuses on means and standard deviations of one or more categories across sections for a given sample period. One can define the start and the end date of the time series; the defaults are the earliest available date as the start and the latest available date as the end.
xcats_sel = ["CPIC_SJA_P6M6ML6AR", "CPIH_SJA_P6M6ML6AR"]
msp.view_ranges(
dfx,
xcats=xcats_sel,
kind="bar",
sort_cids_by="mean", # countries sorted by mean of the first category
title="Means and standard deviations of inflation trends across all major markets since 2000",
ylab="% annualized",
start="2000-01-01",
end="2020-01-01",
)
Choosing kind='box' gives a box plot that visualizes the 25%, 50% (median), and 75% quantiles as well as outliers beyond a normal range. The chart title, y-axis label, size, and category labels can be customized as shown below:
xcats_sel = ["RYLDIRS02Y_NSA"]
msp.view_ranges(
dfx,
xcats=xcats_sel,
kind="box",
start="2012-01-01",
sort_cids_by="std", # here sorted by standard deviations
title="Real interest rates sorted by volatility",
ylab="% annualized",
xcat_labels=["Real 2year IRS rate"],
size=(12, 5),
)
Visualize panel time series with view_timelines #
The convenience function view_timelines() displays a facet grid of timeline charts of one or more categories.
The cs_mean=True option adds a timeline of the cross-sectional average of a single category to each plot in the facet, emphasizing cross-sectional deviations.
msp.view_timelines(
dfx,
xcats=["PCREDITBN_SJA_P1M1ML12"],
cids=cids_dm,
ncol=4,
start="1995-01-01",
title="Private credit growth, %oya",
same_y=False,
cs_mean=True,
xcat_labels=["Credit growth", "Global average"],
)
Arguments can be set according to the data’s nature and the plot’s intention. The following are important choices:

For asset returns and similarly volatile series, displaying cumulative sums with cumsum=True is often desirable.

The default setting same_y=True shows all lines on the same scale for comparability of size.

With all_xticks=True, the time (x) axis is printed under each chart in large facet grids, not just under the bottom row.

The xcat_labels argument customizes the category labels (the default is just category tickers).
cids_sel = ["AUD", "NZD", "GBP", "MXN", "PLN", "ZAR", "KRW", "INR"]
msp.view_timelines(
dfx,
xcats=["FXXR_NSA", "FXXR_VT10"],
cids=cids_sel,
ncol=3,
cumsum=True,
start="2010-01-01",
same_y=True,
all_xticks=True,
title="Cumulative FX returns",
xcat_labels=["FX returns", "FX forward return for 10% vol target"],
)
One can display a single chart showing several categories for a single cross-section by passing the categories to xcats and specifying the cross-section as a single-element list, such as cids=["USD"]:
msp.view_timelines(
dfx,
xcats=["CPIC_SJA_P6M6ML6AR", "CPIH_SJA_P6M6ML6AR"],
cids=["USD"],
start="2000-01-01",
title="U.S. CPI inflation trends, %6m/6m, saar",
xcat_labels=["Core", "Headline"],
)
Setting single_chart=True allows plotting a single category for various cross-sections in one plot. By default, full tickers are used as labels.
cids_sel = ["AUD", "NZD", "GBP"]
msp.view_timelines(
dfx,
xcats=["CPIH_SA_P1M1ML12"],
cids=cids_sel,
cumsum=False,
start="2000-01-01",
same_y=False,
title="Annual headline consumer price inflation",
single_chart=True,
)
Visualize vintage qualities with heatmap_grades #
The visualization function heatmap_grades() displays a colored table of the grading quality of indicators by categories and cross-sections as an average for a given start date. Darker colors represent lower grading.
This function visualizes the grades of the vintages based on which quantamental series have been calculated. JPMaQS uses vintages, i.e., time sequences of time series, to replicate the information of the market in the past. Vintages arise from data revisions, extensions, and re-estimation of parameters of any underlying model. JPMaQS grades vintage time series from 1 (highest quality, either an original record of the time series available on that date or a series that is marginally different from the original for storage reasons or publication conventions) to 3 (rough estimate of the information status). More details on vintages and grades are here.
xcats_sel = ["INTRGDPv5Y_NSA_P1M1ML12_3MMA", "CPIC_SJA_P6M6ML6AR", "FXXR_NSA"]
msp.heatmap_grades(
dfx,
xcats=xcats_sel,
cids=cids_em + cids_dm,
start="2000-01-01",
size=(15, 2),
)
Preprocessing #
Create new category panels with panel_calculator #
The panel_calculator() function in the macrosynergy.panel module simplifies applying transformations to each cross-section of a panel using a string-based formula. This function is very flexible and saves a lot of code when creating trading signals across multiple countries. To use the function, consider the category ticker as a panel dataframe and use standard Python and pandas expressions.
Panel category names not at the beginning or end of the string must always have a space before and after the name. The calculated category and the panel operations must be separated by ‘=’. Examples:
“NEWCAT = ( OLDCAT1 + 0.5) * OLDCAT2”
“NEWCAT = np.log( OLDCAT1 ) - np.abs( OLDCAT2 ) ** 1/2”
Note that the argument cids contains the cross-sections for which the new categories are to be calculated. If a cross-section is missing for any of the categories used, none of the new categories will be produced. This means that if a specific calculation should be made for a smaller or larger set of cross-sections, one must make a separate call to the function.
Below, we calculate plausible metrics of indicators or signals which can be used for analysis:

intuitive growth trend

excess inflation versus a country’s effective inflation target

excess private credit growth

excess real interest rate

combination of the newly created indicators
calcs = [
"XGDP_NEG = - INTRGDPv5Y_NSA_P1M1ML12_3MMA", # intuitive growth trend
"XCPI_NEG = - ( CPIC_SJA_P6M6ML6AR + CPIH_SA_P1M1ML12 ) / 2 + INFTEFF_NSA", # excess inflation measure
# "XINF = CPIH_SA_P1M1ML12 - INFTEFF_NSA", # excess inflation
"XPCG_NEG = - PCREDITBN_SJA_P1M1ML12 + INFTEFF_NSA + RGDP_SA_P1Q1QL4_20QMA", # excess private credit growth
"XRYLD = RYLDIRS05Y_NSA - INTRGDP_NSA_P1M1ML12_3MMA", # excess real interest rate
"XXRYLD = XRYLD + XCPI_NEG", # newly created panels can be used subsequently
]
dfa = msp.panel_calculator(dfx, calcs=calcs, cids=cids)
dfx = msm.update_df(dfx, dfa)
xcats_sel = ["XRYLD", "XCPI_NEG"]
msp.view_timelines(
dfx,
xcats=xcats_sel,
cids=cids_dm,
ncol=3,
title="Excess real interest rates and (negative) excess inflation",
start="2000-01-01",
same_y=False,
)
The panel_calculator() function is also suitable for computing cross-section-specific relative economic performance by using a loop and f-strings. For example, here we calculate target deviations for a range of CPI inflation metrics for all markets.
infs = ["CPIH_SA_P1M1ML12", "CPIH_SJA_P6M6ML6AR", "CPIH_SJA_P3M3ML3AR"]
for inf in infs:
    calcs = [
        f"{inf}vIET = ( {inf} - INFTEFF_NSA )",
    ]
    dfa = msp.panel_calculator(dfx, calcs=calcs, cids=cids)
    dfx = msm.update_df(dfx, dfa)
xcats_sel = ["CPIH_SA_P1M1ML12vIET", "CPIH_SJA_P3M3ML3ARvIET"]
msp.view_timelines(
dfx,
xcats=xcats_sel,
cids=cids_dm,
ncol=4,
cumsum=False,
start="2000-01-01",
same_y=False,
all_xticks=True,
title="CPI inflation rates, %ar, versus effective inflation target, market information state",
xcat_labels=["% over a year ago", "% 3m/3m, saar"],
)
Panel calculations can use individual series by prepending an i to the ticker name. This ticker can have a cross-section identifier that is not in the selection defined by cids:
cids_sel = cids_dm[:6]
calcs = ["RYLDvUSD = RYLDIRS05Y_NSA - iUSD_RYLDIRS05Y_NSA"]
dfa = msp.panel_calculator(dfx, calcs=calcs, cids=cids_sel)
dfx = msm.update_df(dfx, dfa)
msp.view_timelines(
dfx,
xcats=["RYLDvUSD"],
cids=cids_sel,
ncol=3,
start="2000-01-01",
same_y=False,
title = "Excess 5year real IRS yields vs USD benchmark"
)
Exclude series sections with make_blacklist #
The make_blacklist() helper function creates a standardized dictionary of blacklist periods, i.e., periods that affect the validity of an indicator, based on standardized panels of binary categories, where values of 1 indicate a cause for blacklisting.
Put simply, this function allows converting category variables into blacklist dictionaries that can then be passed to other functions. Below, we picked two indicators for FX tradability and flexibility. FXTARGETED_NSA is an exchange rate target dummy, which takes a value of 1 if the exchange rate is targeted through a peg or any regime that significantly reduces exchange rate flexibility and 0 otherwise. FXUNTRADABLE_NSA is also a dummy variable that takes the value 1 if liquidity in the main FX forward market is limited or there is a distortion between tradable offshore and untradable onshore contracts.
Details on both categories are here.
dfb = df[df["xcat"].isin(["FXTARGETED_NSA", "FXUNTRADABLE_NSA"])].loc[
:, ["cid", "xcat", "real_date", "value"]
]
dfba = (
dfb.groupby(["cid", "real_date"])
.aggregate(value=pd.NamedAgg(column="value", aggfunc="max"))
.reset_index()
)
dfba["xcat"] = "FXBLACK"
fxblack = msp.make_blacklist(dfba, "FXBLACK")
fxblack
{'CHF': (Timestamp('2011-10-03 00:00:00'), Timestamp('2015-01-30 00:00:00')),
 'CZK': (Timestamp('2014-01-01 00:00:00'), Timestamp('2017-07-31 00:00:00')),
 'ILS': (Timestamp('2000-01-03 00:00:00'), Timestamp('2005-12-30 00:00:00')),
 'INR': (Timestamp('2000-01-03 00:00:00'), Timestamp('2004-12-31 00:00:00')),
 'THB': (Timestamp('2007-01-01 00:00:00'), Timestamp('2008-11-28 00:00:00')),
 'TRY_1': (Timestamp('2000-01-03 00:00:00'), Timestamp('2003-09-30 00:00:00')),
 'TRY_2': (Timestamp('2020-01-01 00:00:00'), Timestamp('2024-04-24 00:00:00'))}
Since 2000, roughly a third of the currencies covered by JPMaQS have seen their FX forward market affected either by an official exchange rate target, illiquidity, or convertibility-related distortions. The above output shows periods of disruptions for (primarily) emerging currencies. A notable developed market exception here is CHF, which was pegged between 2011 and 2015.
A standard blacklist dictionary can be passed to several package functions that exclude the blacklisted periods from related analyses.
If one wishes simply to exclude the blacklisted periods from a dataframe, independent of specific applications, one can use the reduce_df() helper function.
dffx = df[df["xcat"] == "FXXR_NSA"]
print("Original shape: ", dffx.shape)
dffxx = msm.reduce_df(dffx, blacklist=fxblack)
print("Reduced shape: ", dffxx.shape)
Original shape: (145252, 8)
Reduced shape: (137975, 8)
Concatenate dataframes with update_df #
The update_df() function in the management module concatenates two JPMaQS data frames and offers two conveniences.

It replaces duplicated tickers in the base data frame with those in the added data frame and reindexes the output data frame.

Additionally, one can replace categories in the base data frame by setting xcat_replace=True. This is useful when recalculating the data panel of a category without including all cross-sections of the original panel, and wanting to avoid the confusion of having two different calculation methods under the same category name.
dfa = msp.panel_calculator(
dfx, calcs=["RYLD52 = RYLDIRS05Y_NSA - RYLDIRS02Y_NSA"], cids=cids_dm
)
dfx = msm.update_df(df=dfx, df_add=dfa) # composite extended data frame
msm.missing_in_df(dfx, xcats=["RYLD52"], cids=cids_dm) #quick check of missing values. Empty list means no missing values
No missing XCATs across DataFrame.
Missing cids for RYLD52: []
Compute panels versus basket with make_relative_value #
The make_relative_value() function generates a data frame of relative values for a given list of categories. In this case, “relative” means that the original value is compared to a basket average. By default, the basket consists of all available cross-sections, and the relative value is calculated by subtracting the basket average from individual cross-section values.
By default, complete_cross=False, meaning that basket averages do not require the full set of cross-sections to be available for a specific date but are always based on the ones available at the time.
cids_sel = cids_dm[:6]
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12"]
dfy = msp.make_relative_value(
dfx,
xcats=["PCREDITGDP_SJA_D1M1ML12"],
cids=cids_sel,
start="2000-01-01",
blacklist=fxblack, # cross-sections can be blacklisted for calculation and basket use
rel_meth="subtract",
complete_cross=False, # cross-sections do not have to be complete for basket calculation
postfix="_vDM",
)
dfx = msm.update_df(df=dfx, df_add=dfy) # composite extended data frame
dfj = pd.concat([dfx[dfx["xcat"].isin(xcats_sel)], dfy])
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12", "PCREDITGDP_SJA_D1M1ML12_vDM"]
msp.view_timelines(
dfj,
xcats=xcats_sel,
cids=cids_sel,
ncol=3,
start="2000-01-01",
same_y=True,
title = "Private credit growth, %oya, versus DM average"
)
By default, the basket comprises all available cross-sections for every period, as defined by the cids argument. However, it is possible to limit the basket to a subset or a single cross-section by using the basket=[...] argument.
In the make_relative_value() function, an important decision is the use of the blacklist argument. This argument takes a dictionary of cross-sections and date ranges, produced by the make_blacklist() function, that should be excluded from the output. Excluding invalid or distorted data is crucial when calculating relative values because a single cross-section’s distortion can invalidate all cross-sectional relative values.
cids_sel = list(set(cids_dm) - set(["JPY"]))
xcats_sel = ["CPIC_SA_P1M1ML12"]
dfy = msp.make_relative_value(
dfx,
xcats=xcats_sel,
cids=cids_sel,
start="2000-01-01",
blacklist=fxblack, # remove invalid observations
basket=["EUR", "USD"], # basket does not use all crosssections
rel_meth="subtract",
postfix="vG2",
)
dfx = msm.update_df(df=dfx, df_add=dfy) # composite extended data frame
#dfj = pd.concat([dfx[dfx["xcat"].isin(xcats_sel)], dfy])
msp.view_timelines(
dfx,
xcats=["CPIC_SA_P1M1ML12", "CPIC_SA_P1M1ML12vG2"],
cids=cids_sel,
ncol=3,
start="2000-01-01",
same_y=False,
title="Core CPI inflation, %oya, versus G2 average",
)
Normalize panels with make_zn_scores #
The make_zn_scores() function is a method for normalizing values across different categories. This is particularly important when summing or averaging categories with different units and time series properties. The function computes z-scores for a category panel around a specified neutral level that may be different from the mean. The term “zn-score” refers to the normalized distance from the neutral value.
The default mode of the function calculates scores based on sequential estimates of means and standard deviations, using only past information. This is controlled by the sequential=True argument, and the minimum number of observations required for meaningful estimates is set with the min_obs argument. By default, the function calculates zn-scores for the initial sample period defined by min_obs on an in-sample basis to avoid losing history.
The means and standard deviations are re-estimated daily by default, but the frequency of re-estimation can be controlled with the est_freq argument, which can be set to weekly, monthly, or quarterly.
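The sequential logic can be sketched in plain pandas (a simplified illustration, not the package implementation), here with a zero neutral level:

```python
import pandas as pd

# Simplified sketch of a sequential zn-score around a zero neutral level:
# each observation is scaled by a dispersion estimate that uses only data
# up to and including that date (an expanding window).
s = pd.Series([1.0, 2.0, -1.0, 3.0, 0.5])
expanding_sd = (s**2).expanding().mean() ** 0.5  # dispersion around zero, not the mean
zn = s / expanding_sd
print(zn.round(2).tolist())  # [1.0, 1.26, -0.71, 1.55, 0.29]
```

Note that no future information enters any score: revising later observations would leave earlier scores unchanged.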
msp.view_timelines(
dfx,
xcats=["XCPI_NEG"],
cids=cids,
ncol=3,
start="2000-01-01",
same_y=False,
title="Negative excess inflation across all markets",
)
macros = ["XGDP_NEG", "XCPI_NEG", "XPCG_NEG", "RYLDIRS05Y_NSA"]
xcatx = macros
for xc in xcatx:
    dfa = msp.make_zn_scores(
        dfx,
        xcat=xc,
        cids=cids,
        neutral="zero",
        thresh=3,
        est_freq="M",
        pan_weight=1,
        postfix="_ZN4",
    )
    dfx = msm.update_df(dfx, dfa)
msp.view_ranges(
dfx,
xcats=["XGDP_NEG", "XGDP_NEG_ZN4"],
kind="bar",
sort_cids_by="mean",
start="2000-01-01",
)
Important parameters that shape the nature of the zn-scores are:

neutral sets the level dividing positive and negative scores. The choices are 'zero', 'mean', or 'median'.

pan_weight sets the panel’s importance versus the individual cross-sections for scaling the zn-scores. If the category is assumed to be homogeneous across countries regarding its signal, the weight can be close to 1 (the whole panel is the basis for the parameters). If countries are not comparable regarding category means and/or standard deviations, a panel weight close to zero is preferable (parameters are all specific to the cross-section). The default value is 1.

thresh sets the cutoff value (threshold) for winsorization in terms of standard deviations. The minimum value is 1. Setting thresh to values close to 1 excludes particularly high-volatility observations from the sample. In the example above, only TRY would be affected by a threshold of 2.
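Winsorization itself is just symmetric clipping of the scores at the threshold, which can be sketched as:

```python
import numpy as np

# Sketch of winsorization: zn-scores beyond +/- thresh standard deviations
# are clipped to the threshold rather than removed from the sample.
scores = np.array([-4.2, -1.0, 0.3, 2.1, 5.0])
thresh = 1.5
winsorized = np.clip(scores, -thresh, thresh)
print(winsorized)  # values outside [-1.5, 1.5] are set to the boundary
```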
xcat = "CPIH_SJA_P6M6ML6AR"
cids_sel = ["ZAR", "HUF", "PLN", "AUD", "JPY"]
dict_ps = { # dictionary of zn-score specs
1: {
"neutral": "zero",
"pan_weight": 1,
"thresh": None,
"postfix": "ZNPZ",
"label": "panelbased scores around zero",
},
2: {
"neutral": "mean",
"pan_weight": 1,
"thresh": None,
"postfix": "ZNPM",
"label": "panelbased scores around mean",
},
3: {
"neutral": "mean",
"pan_weight": 0,
"thresh": None,
"postfix": "ZNCM",
"label": "countrybased scores around mean",
},
4: {
"neutral": "zero",
"pan_weight": 1,
"thresh": 1.5,
"postfix": "ZNPW",
"label": "panelbased winsorized scores around zero",
},
}
dfy = pd.DataFrame(columns=df.columns)
for dvs in dict_ps.values():
    dfa = msp.make_zn_scores(
        dfx,
        xcat=xcat,
        cids=cids_sel,
        sequential=True,
        neutral=dvs["neutral"],
        pan_weight=dvs["pan_weight"],
        thresh=dvs["thresh"],
        postfix=dvs["postfix"],
        est_freq="m",
    )
    dfy = msm.update_df(dfy, dfa)
compares = [(1, 2), (2, 3), (1, 4)]
for comps in compares:
    print(comps)
    dv1 = dict_ps[comps[0]]
    dv2 = dict_ps[comps[1]]
    msp.view_ranges(
        dfy,
        xcats=[f"{xcat}{dv1['postfix']}", f"{xcat}{dv2['postfix']}"],
        kind="box",
        sort_cids_by="mean",
        start="2000-01-01",
        size=(12, 4),
        title=f"{xcat}: {dv1['label']} vs. {dv2['label']}",
    )
(1, 2)
(2, 3)
(1, 4)
Estimate asset return elasticities with return_beta #
The function return_beta() estimates betas (elasticities) of a return category with respect to a benchmark return. It returns either just the betas or the hedged returns of the cross-sections. Hedged returns are returns on a composite position in the principal contract and the benchmark that offsets the elasticity of the former with respect to the latter. At present, the only method used to calculate the beta is a simple OLS regression. If oos is set to True (default), the function calculates hedge ratios out of sample, i.e., for each period based on estimates up to the previous period. The related re-estimation frequency is set with refreq (default monthly). The re-estimation is conducted at the end of the period and used as a hedge ratio for all days in the following period. The argument min_obs sets the minimum number of observations after which the hedge ratio is initially calculated. If the betas are estimated out of sample, calculations are only done for periods after the minimum number of observations is available.
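The OLS logic behind the beta and the hedged return can be sketched with synthetic data (a simplified in-sample illustration, not the package's out-of-sample procedure):

```python
import numpy as np

# Simplified in-sample sketch of the logic behind return_beta(): estimate
# the OLS beta of a contract return to a benchmark return, then subtract
# beta times the benchmark to obtain the hedged return.
rng = np.random.default_rng(0)
bench = rng.normal(size=500)                         # benchmark return
ret = 0.6 * bench + rng.normal(scale=0.5, size=500)  # contract return, true beta 0.6
beta = np.cov(ret, bench)[0, 1] / np.var(bench, ddof=1)  # OLS slope
hedged = ret - beta * bench
# in sample, the hedged return is uncorrelated with the benchmark
assert abs(np.cov(hedged, bench)[0, 1]) < 1e-10
```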
dfx = dfx[["cid", "xcat", "real_date", "value"]]
cids_sel = ["AUD", "CAD", "CHF", "EUR", "GBP", "JPY", "NOK", "NZD", "SEK"]
dfh = msp.return_beta(
dfx,
xcat="FXXR_NSA",
cids=cids_sel,
benchmark_return="USD_EQXR_NSA",
oos=False,
hedged_returns=True,
start="2002-01-01",
refreq="m",
)
dfh
dfx = msm.update_df(df=dfx, df_add=dfh)
#dfx["xcat"].unique()
The auxiliary function beta_display() visualizes the betas estimated by the return_beta() function.
sns.set(rc={"figure.figsize": (12, 4)})
msp.beta_display(dfh)
Hedged versus unhedged returns can be displayed using the view_timelines() function:
xcats_sel = ["FXXR_NSA", "FXXR_NSA_H"]
msp.view_timelines(
dfx,
xcats=xcats_sel,
cids=cids_sel,
ncol=3,
cumsum=True,
start="2000-01-01",
same_y=False,
all_xticks=False,
title="Unhedged vs hedged cumulative FX forward returns, % of notional: dominant cross",
xcat_labels=["Unhedged", "Hedged"],
height=3,
)
Generate returns (and carry) of a group of contracts with Basket #
The Basket class supports the calculation of returns and carry of groups of financial contracts using various weighting methods. The main argument is a list of contracts. It is very important to specify any invalid blacklisted periods associated with any of the cross-sections, as these invalid numbers would contaminate the whole basket.
In the example below, we instantiate a Basket object for a group of FX forward returns.
cids_fxcry = ["AUD", "INR", "NZD", "PLN", "TWD", "ZAR"]
ctrs_fxcry = [cid + "_FX" for cid in cids_fxcry]
basket_fxcry = msp.Basket(
dfx,
contracts=ctrs_fxcry,
ret="XR_NSA",
cry="CRR_NSA",
start="2010-01-01",
end="2023-01-01",
blacklist=fxblack,
)
for attribute, value in basket_fxcry.__dict__.items():
if not isinstance(
value, (pd.DataFrame, dict)
): # print all non-dataframe and non-dictionary attributes
print(attribute, " = ", value)
contracts = ['AUD_FX', 'INR_FX', 'NZD_FX', 'PLN_FX', 'TWD_FX', 'ZAR_FX']
ret = XR_NSA
ticks_ret = ['AUD_FXXR_NSA', 'INR_FXXR_NSA', 'NZD_FXXR_NSA', 'PLN_FXXR_NSA', 'TWD_FXXR_NSA', 'ZAR_FXXR_NSA']
cry_flag = True
ticks_cry = ['AUD_FXCRR_NSA', 'INR_FXCRR_NSA', 'NZD_FXCRR_NSA', 'PLN_FXCRR_NSA', 'TWD_FXCRR_NSA', 'ZAR_FXCRR_NSA']
cry = ['CRR_NSA']
wgt_flag = False
ticks_wgt = []
dfws_wgt = None
tickers = ['AUD_FXXR_NSA', 'INR_FXXR_NSA', 'NZD_FXXR_NSA', 'PLN_FXXR_NSA', 'TWD_FXXR_NSA', 'ZAR_FXXR_NSA', 'AUD_FXCRR_NSA', 'INR_FXCRR_NSA', 'NZD_FXCRR_NSA', 'PLN_FXCRR_NSA', 'TWD_FXCRR_NSA', 'ZAR_FXCRR_NSA']
start = 20100101
end = 20230101
The
make_basket
method calculates and stores all performance metrics, i.e., returns and carry, for a specific weighting method of the basket. The different weighting options available provide flexibility in constructing a composite measure that meets specific needs or objectives:
equal
: all contracts with non-NA returns have the same weight (default value)
fixed
: the weights are proportionate to a single list of values provided. This allows for more customization in the weighting of each contract based on specific preferences or criteria
invsd
: weights are inversely proportional to the standard deviations of recent returns. This can be useful for creating a measure that gives more weight to contracts with
more stable returns over time
. The lookback period is 21 observations by default but can be changed with
lback_periods
. The default lookback method is an exponential moving average; it can be changed to a simple moving average, "ma", under
lback_meth
values
: weights are proportionate to a panel of values of an exogenous weight category. This allows for weighting based on external factors that may be relevant to the specific contracts in the basket
inv_values
: weights are inversely proportionate to the values of an exogenous weight category. This can be useful for creating a measure that gives less weight to contracts with high values in the external factor, which may indicate greater risk or volatility.
basket_fxcry.make_basket(weight_meth="equal", basket_name="GLB_FXCRY")
basket_fxcry.make_basket(weight_meth="invsd", basket_name="GLB_FXCRYVW")
basket_fxcry.make_basket(
weight_meth="fixed",
weights=[1 / 3, 1 / 6, 1 / 12, 1 / 6, 1 / 3, 1 / 12],
basket_name="GLB_FXCRYFW",
)
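The code above demonstrates the `equal`, `invsd`, and `fixed` methods. The logic of the `values` and `inv_values` options can be sketched in plain pandas on hypothetical data (an illustrative sketch, not the internal implementation of the `Basket` class):

```python
import numpy as np
import pandas as pd

# Hypothetical daily panel of an exogenous weight category for three contracts
dates = pd.bdate_range("2023-01-02", periods=5)
vals = pd.DataFrame(
    {
        "AUD_FX": [3.0, 3.0, 3.0, 3.0, 3.0],
        "NZD_FX": [1.0, 1.0, 1.0, 1.0, 1.0],
        "ZAR_FX": [2.0, np.nan, 2.0, 2.0, 2.0],  # one missing observation
    },
    index=dates,
)

# "values": weights proportionate to the exogenous values,
# renormalized each day over the contracts with available data
w_values = vals.div(vals.sum(axis=1), axis=0)

# "inv_values": weights inversely proportionate to the exogenous values
inv = 1.0 / vals
w_inv = inv.div(inv.sum(axis=1), axis=0)

print(w_values.iloc[0].round(3).to_dict())  # {'AUD_FX': 0.5, 'NZD_FX': 0.167, 'ZAR_FX': 0.333}
```

In this sketch, on the day with a missing ZAR_FX value the remaining weights still sum to one, so a missing observation does not distort the basket total.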
The
return_basket
method returns basket performance data in a standardized format. The basket names for which the performance data are calculated can be limited by using the
basket_names
argument.
dfb = basket_fxcry.return_basket()
print(dfb.tail())
utiks = list((dfb["cid"] + "_" + dfb["xcat"]).unique())
f"Unique basket tickers: {utiks}"
cid xcat real_date value
20341 GLB FXCRYFW_XR_NSA 20221226 0.002999
20342 GLB FXCRYFW_XR_NSA 20221227 0.109833
20343 GLB FXCRYFW_XR_NSA 20221228 0.193735
20344 GLB FXCRYFW_XR_NSA 20221229 0.174977
20345 GLB FXCRYFW_XR_NSA 20221230 0.017897
"Unique basket tickers: ['GLB_FXCRY_CRR_NSA', 'GLB_FXCRY_XR_NSA', 'GLB_FXCRYVW_CRR_NSA', 'GLB_FXCRYVW_XR_NSA', 'GLB_FXCRYFW_CRR_NSA', 'GLB_FXCRYFW_XR_NSA']"
The
return_weights
method returns the effective weights used in a basket for all contracts. This can be useful if the same weights are to be used for a basket of predictive features.
dfb = basket_fxcry.return_weights()
print(dfb.head())
print(dfb["cid"].unique())
print(dfb["xcat"].unique())
dfbw = dfb.pivot_table(
index="real_date", columns=["xcat", "cid"], values="value"
).replace(0, np.nan)
dfbw.tail(3).round(2)
cid xcat real_date value
0 AUD FX_GLB_FXCRY_WGT 20100101 0.166667
1 AUD FX_GLB_FXCRY_WGT 20100104 0.166667
2 AUD FX_GLB_FXCRY_WGT 20100105 0.166667
3 AUD FX_GLB_FXCRY_WGT 20100106 0.166667
4 AUD FX_GLB_FXCRY_WGT 20100107 0.166667
['AUD' 'INR' 'NZD' 'PLN' 'TWD' 'ZAR']
['FX_GLB_FXCRY_WGT' 'FX_GLB_FXCRYVW_WGT' 'FX_GLB_FXCRYFW_WGT']
xcat  FX_GLB_FXCRYFW_WGT  FX_GLB_FXCRYVW_WGT  FX_GLB_FXCRY_WGT  

cid  AUD  INR  NZD  PLN  TWD  ZAR  AUD  INR  NZD  PLN  TWD  ZAR  AUD  INR  NZD  PLN  TWD  ZAR 
real_date  
20221228  0.29  0.14  0.07  0.14  0.29  0.07  0.08  0.26  0.09  0.27  0.22  0.07  0.17  0.17  0.17  0.17  0.17  0.17 
20221229  0.29  0.14  0.07  0.14  0.29  0.07  0.09  0.27  0.09  0.26  0.23  0.07  0.17  0.17  0.17  0.17  0.17  0.17 
20221230  0.29  0.14  0.07  0.14  0.29  0.07  0.09  0.27  0.09  0.25  0.23  0.07  0.17  0.17  0.17  0.17  0.17  0.17 
The weights used in a basket for the contracts can be plotted using the
weight_visualiser
method:
basket_fxcry.weight_visualiser(basket_name="GLB_FXCRYVW", facet_grid=True)
basket_fxcry.weight_visualiser(basket_name="GLB_FXCRYVW", subplots=False, size=(10, 4))
basket_fxcry.weight_visualiser(
basket_name="GLB_FXCRYVW", subplots=True,
)
Calculate linear combinations of panels with
linear_composite
#
The
linear_composite()
function is designed to calculate linear combinations of different categories. It can produce a composite even if some of the component data are missing. This flexibility is valuable because it enables one to work with the available information rather than discarding it entirely. This behavior is desirable when working with a composite of a set of categories that capture a similar underlying factor.
In the three examples below, the
linear_composite()
function is used to calculate the average of two inflation trend metrics by cross-section, the combination of two cross-sections for one category (inflation trend) using fixed weights, and the same combination using another category as weights. If one of the two constituents or cross-sections is missing, the composite is equal to the remaining one.
# Calculation of the simple average of two inflation trend metrics by cross-section
weights = [1, 1]
signs = [1, 1]
cids_sel = ["EUR", "USD", "INR", "ZAR"]
xcats_sel = ["CPIC_SJA_P6M6ML6AR", "CPIH_SJA_P6M6ML6AR"]
dflc = msp.linear_composite(
df=dfx,
xcats=xcats_sel,
cids=cids_sel,
weights=weights,
signs=signs,
complete_xcats=False,
new_xcat="Composite",
)
df = msm.update_df(df, dflc)
msp.view_timelines(
dfx,
xcats=xcats_sel + ["Composite"],
cids=cids_sel,
ncol=2,
start="19950101",
same_y=False,
title="Core and headline inflation trends",
)
# Create a composite cross-section over one category: the difference between EUR and USD CPI trends
weights = [1, 1]
signs = [1, -1]  # with weights [1, 1] and signs [1, -1] we effectively subtract the "USD" time series from "EUR"
cids_sel = ["EUR", "USD"]
dflc = msp.linear_composite(
df=dfx,
start = "20160101",
xcats="CPIC_SJA_P6M6ML6AR",
cids=cids_sel,
weights=weights,
signs=signs,
complete_cids=False,
new_cid="EURUSD",
)
df = msm.update_df(df, dflc)
msp.view_timelines(
dfx,
xcats="CPIC_SJA_P6M6ML6AR",
cids=cids_sel+["EURUSD"],
start = "20160101",
same_y=False,
title = "Seasonally and jumpadjusted core consumer price trends, % 6m/6m ar for major markets",
)
Another example of the use of
linear_composite()
function is to create a composite category using another category as weights. The example below uses the 5-year real GDP growth as a weight for the EURUSD inflation trends. The 5-year real GDP growth is taken purely as an example; it would make more sense to take GDP shares as weights, as done in the notebook
Business sentiment and commodity future returns
. However, the GDP share series is only available in the full JPMaQS dataset, not in the Kaggle version, and hence is not part of this notebook.
weights = "RGDP_SA_P1Q1QL4_20QMA"
cids_sel = ["EUR", "USD"]
xcat="CPIC_SJA_P6M6ML6AR"
signs = [1, -1]
dflc = msp.linear_composite(
df=dfx,
start = "20160101",
xcats=xcat,
cids=cids_sel,
weights=weights,
signs=signs,
complete_cids=False,
new_cid="EURUSD, weighted by 5year real GDP growth (moving average)",
)
df = msm.update_df(df, dflc)
msp.view_timelines(
dfx,
xcats="CPIC_SJA_P6M6ML6AR",
cids=cids_sel+["EURUSD, weighted by 5year real GDP growth (moving average)"],
ncol=4,
start = "20160101",
same_y=False,
title = "Seasonally and jumpadjusted core consumer price trends, % 6m/6m ar for major markets",
)
This notebook, like several other notebooks on the Macrosynergy Academy site, uses a linear composite macro trend pressure indicator. The idea is simple: we add up the (negatives of) excess growth, inflation, and credit expansion, plus the real yield, using the most common metrics. This gives a simple first-shot candidate for a trading signal. To start with, this composite indicator is not optimized. Later on, we use the machine learning module of the package to optimize it.
macros = ["XGDP_NEG", "XCPI_NEG", "XPCG_NEG", "RYLDIRS05Y_NSA"]
xcatx = macros
dfa = msp.linear_composite(
dfx,
xcats=[xc + "_ZN4" for xc in xcatx],
cids=cids,
new_xcat="MACRO_AVGZ",
)
dfx = msm.update_df(dfx, dfa)
Relating #
Investigate relations between panels with
CategoryRelations
#
CategoryRelations
is a tool that allows for quick visualization and analysis of
two
categories
, i.e., two time-series panels. To use this tool, the user needs to set certain arguments upfront that determine the period and type of aggregation for which the relation is analyzed. Here are some of the key arguments:

The two-element list
xcats
sets the categories to be related. For a predictive relation, the first is considered the predictive feature category, and the second is the target. 
The argument
freq
determines the base period of the analysis, typically set as monthly or quarterly. Since JPMaQS data frames are daily, this requires aggregation of both categories. The aggregation methods are set with
xcat_aggs
and can use any of pandas' aggregation methods, such as
sum
or
last
. The default is
mean
. 
The argument
lag
sets the lag (delay of arrival) of the first (feature) category in base periods. A positive value means that the feature is related to subsequent targets and thus allows analyzing its predictive power. 
The feature category can be modified by differencing or calculating percentage changes with
xcat1_chg
argument and the auxiliary
n_periods
argument. 
A useful argument is
xcat_trims
, which removes observations beyond a maximum for the first and the second category, in case the dataset contains invalid outliers. It trims the dataset rather than winsorizing it: large values are interpreted as invalid and removed, not set to a limit. 
fwin
can be used to transform the target category into forward-moving averages of the base period. This is useful for smoothing out volatility but should not be used for formal inference. 
blacklist
excludes invalid periods from the analysis.
Based on the above explanation, the following instantiation prepares the analysis of the predictive power of a quarterly change of an inflation metric and the subsequent quarterly 5year IRS returns while excluding quarterly values above 10:
cr = msp.CategoryRelations(
dfx,
xcats=["CPIC_SJA_P6M6ML6AR", "DU05YXR_VT10"],
cids=cids_dm,
xcat1_chg="diff",
n_periods=1,
freq="Q",
lag=1,
fwin=1, # default forward window is one
xcat_aggs=[
"last",
"sum",
], # the first method refers to the first item in the xcats list, the second to the second
start="20000101",
xcat_trims=[10, 10],
)
for attribute, value in cr.__dict__.items():
print(attribute, " = ", value)
xcats = ['CPIC_SJA_P6M6ML6AR', 'DU05YXR_VT10']
cids = ['AUD', 'CAD', 'CHF', 'EUR', 'GBP', 'JPY', 'NOK', 'NZD', 'SEK', 'USD']
val = value
freq = Q
lag = 1
years = None
aggs = ['last', 'sum']
xcat1_chg = diff
n_periods = 1
xcat_trims = [10, 10]
slip = 0
df = CPIC_SJA_P6M6ML6AR DU05YXR_VT10
real_date cid
20051230 JPY 0.278129 1.258532
20060630 JPY 0.465870 0.099437
20060929 JPY 0.364873 8.571129
20061229 JPY 0.166131 1.365543
20070330 JPY 0.165659 1.497307
... ... ...
20230630 AUD 0.818365 6.961307
20230929 AUD 0.146163 2.444644
20231229 AUD 0.793494 3.937294
20240329 AUD 1.469322 0.685874
20240628 AUD 0.707961 7.220023
[771 rows x 2 columns]
The
.reg_scatter()
method is convenient for visualizing the relationship between two categories, including the strength of the linear association and any potential outliers. By default, it includes a regression line with a 95% confidence interval, which can help assess the significance of the relationship.
The
reg_scatter()
method allows splitting the analysis by cross-section (
cid
) or
year
, which is useful for examining how the relationship between the two categories varies across different markets or over time. This can be especially interesting where the relationship is not constant across time or markets.
The
multiple_reg_scatter()
method allows comparing several pairs of category relations side by side, including the strength of the linear association and any potential outliers. By default, each panel includes a regression line with a 95% confidence interval.
The
coef_box
parameter of the
reg_scatter()
method provides details about the relationship, such as correlation coefficient and probability of significance, which can help users assess the strength and statistical significance of the relationship.
The
prob_est
argument in this context is used to specify which type of estimator to use for calculating the probability of a significant relationship between the feature category and the target category.
The default value for
prob_est
is
"pool"
, which means that all cross-sections are pooled together and the probability is based on that pool. This approach can lead to issues with “pseudo-replication” if the analyzed markets are correlated.
An alternative option for
prob_est
is
"map"
, which stands for “Macrosynergy panel test”. Often, cross-sectional experiences are not independent and are subject to common factors. Simply stacking data can lead to “pseudo-replication” and overestimated significance of correlation. A better method is to check significance through panel regression models with period-specific random effects. This technique adjusts targets and features of the predictive regression for common (global) influences. The stronger these global effects, the greater the weight of deviations from the period mean in the regression. In the presence of dominant global effects, the test for the significance of a feature relies mainly on its ability to explain cross-sectional target differences. Conveniently, the method automatically accounts for the similarity of experiences across sections when assessing significance and, hence, can be applied to a wide variety of features and targets. View a related research post
here
that provides more information on this approach.
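The intuition for why pooling can overstate significance, and why adjusting for common period effects helps, can be illustrated with a small simulation (synthetic data; a simple period-demeaning sketch, not the actual estimator behind "map"):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
T, N = 200, 4  # periods and cross-sections

# A common (global) factor drives both feature and target in each period;
# there is no section-specific predictive relation at all
g = rng.normal(size=T)
x = np.repeat(g, N) + rng.normal(scale=0.5, size=T * N)
y = np.repeat(g, N) + rng.normal(scale=0.5, size=T * N)
df = pd.DataFrame({"period": np.repeat(np.arange(T), N), "x": x, "y": y})

# Pooled correlation is inflated by the shared global factor ("pseudo-replication")
pooled = df["x"].corr(df["y"])

# Removing period means strips the global influence; what remains is cross-sectional
demeaned = df.groupby("period")[["x", "y"]].transform(lambda s: s - s.mean())
within = demeaned["x"].corr(demeaned["y"])

print(f"pooled: {pooled:.2f}, within-period: {within:.2f}")
```

The pooled correlation comes out strongly positive even though the sections share no genuine relation, while the within-period correlation is near zero; a panel test with period-specific random effects weighs these two sources of variation rather than simply stacking the data.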
crx = msp.CategoryRelations(
dfx,
xcats=["MACRO_AVGZ", "DU05YXR_VT10"],
cids=cids_dm,
n_periods=1,
freq="Q",
lag=1, # delay of arrival of first (explanatory) category in periods as set by freq
xcat_aggs=["last", "sum"],
start="20000101",
)
crxx = msp.CategoryRelations(
dfx,
xcats=["XCPI_NEG_ZN4", "DU05YXR_VT10"],
cids=cids_dm,
n_periods=1,
freq="Q",
lag=1, # delay of arrival of first (explanatory) category in periods as set by freq
xcat_aggs=["last", "sum"],
start="20000101",
)
crxxx = msp.CategoryRelations(
dfx,
xcats=["XGDP_NEG_ZN4", "DU05YXR_VT10"],
cids=cids_dm,
n_periods=1,
freq="Q",
lag=1, # delay of arrival of first (explanatory) category in periods as set by freq
xcat_aggs=["last", "sum"],
start="20000101",
)
crxxxx = msp.CategoryRelations(
dfx,
xcats=["XPCG_NEG_ZN4", "DU05YXR_VT10"],
cids=cids_dm,
n_periods=1,
freq="Q",
lag=1, # delay of arrival of first (explanatory) category in periods as set by freq
xcat_aggs=["last", "sum"],
start="20000101",
)
msv.multiple_reg_scatter(
[crx, crxx, crxxx, crxxxx],
title="Z-scored macroeconomic trends and subsequent quarterly IRS returns",
# xlab="Core CPI inflation, %oya, versus effective inflation target, relative to all DM, endofmonth",
ylab="Next quarter's 5year IRS return",
ncol=2,
nrow=2,
figsize=(15, 10),
prob_est="map",
coef_box="lower left",
subplot_titles=["z-score of linear composite macro pressure indicator", "z-score of negative excess inflation trend", "z-score of negative excess GDP trend", "z-score of negative excess private credit growth trend"],
)
The
years
parameter specifies the number of years over which the data are aggregated for the scatterplot; combined with
labels=True
, it can be used to visualize medium-term concurrent relations. This parameter overrides the
freq
parameter and does not allow lags, meaning that only the multi-year aggregated feature is compared with the multi-year aggregated target.
The
separator
argument in the
.reg_scatter()
method supports visualization of the stability of the featuretarget relation for different subperiods and crosssections. When the
separator
is set to a year integer, it splits the data into two subsamples, with the second one starting from the separation year. As a result, regression lines and scatter plots are shown separately for each subsample, allowing us to visually assess the stability of the featuretarget relation before and after the separation year.
cids_sel = cids_dm[:5]
cr = msp.CategoryRelations(
df,
xcats=["FXCRR_NSA", "FXXR_NSA"],
cids=cids_sel,
freq="M",
years=3,
lag=0,
xcat_aggs=["mean", "sum"],
start="20050101",
blacklist=fxblack,
)
cr.reg_scatter(
title="Real FX carry and returns (3year periods)",
labels=True,
prob_est="map",
xlab="Real carry, % ar",
ylab="Returns, % cumulative",
coef_box="upper left",
size=(12, 6),
)
cr = msp.CategoryRelations(
dfx,
xcats=["FXCRR_NSA", "FXXR_NSA"],
cids=list(set(cids_em) - set(["ILS", "CLP"])),
freq="Q",
years=None,
lag=1,
xcat_aggs=["last", "sum"],
start="20000101",
blacklist=fxblack,
xcat_trims=[40, 20],
)
cr.reg_scatter(
title="Real FX carry and returns (excluding extreme periods)",
reg_order=1,
labels=False,
xlab="Real carry, % ar",
ylab="Next month's return",
coef_box="lower right",
prob_est="map",
separator=2010,
size=(10, 6),
)
If the
separator
argument is set to “cids”, the relationship is shown separately for all cross-sections of the panel. This allows one to examine whether the relationship is consistent across markets.
cr.reg_scatter(
title="Real FX carry and returns (excluding extreme periods)",
reg_order=1,
labels=False,
xlab="Real carry, % ar",
ylab="Next month's return",
separator="cids",
title_adj=1.01,
)
The basic statistics of a standard pooled linear regression analysis, combining all features and targets of the panel without further structure and effects, can be displayed based on a
statsmodels
function by calling the method
.ols_table()
. For a detailed interpretation of the results from the
.ols_table()
output, please view
this article
, which provides a general overview of interpreting linear regression results using the
statsmodels
summary table.
cr = msp.CategoryRelations(
dfx,
xcats=["CPIC_SJA_P6M6ML6AR", "DU05YXR_VT10"],
cids=cids_dm,
xcat1_chg="diff",
n_periods=1,
freq="M",
lag=1,
xcat_aggs=["last", "sum"],
start="20000101",
)
cr.ols_table()
OLS Regression Results
==============================================================================
Dep. Variable:           DU05YXR_VT10   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     1.032
Date:                Thu, 25 Apr 2024   Prob (F-statistic):              0.310
Time:                        18:56:30   Log-Likelihood:                -7516.3
No. Observations:                2757   AIC:                         1.504e+04
Df Residuals:                    2755   BIC:                         1.505e+04
Df Model:                           1
Covariance Type:            nonrobust
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                  0.2327      0.070      3.303      0.001       0.095       0.371
CPIC_SJA_P6M6ML6AR    -0.3130      0.308     -1.016      0.310      -0.917       0.291
==============================================================================
Omnibus:                      182.732   Durbin-Watson:                   1.776
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              676.185
Skew:                           0.234   Prob(JB):                    1.47e-147
Kurtosis:                       5.380   Cond. No.                         4.38
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Visualize relations across sections or categories with
correl_matrix
#
The
correl_matrix()
function visualizes two types of Pearson correlations:

correlations within a single category across different crosssections, or

correlations across different categories.
A key argument is
freq
, which downsamples the standard JPMaQS frequency of business-daily data to weekly (“W”), monthly (“M”), or quarterly (“Q”), aggregated by mean.
Additionally, the user can set the
cluster
argument to
True
to order the correlated series by proximity based on hierarchical clustering. This can help visualize groups of related variables, making it easier to identify patterns and relationships within the correlation matrix.
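The reordering performed by cluster=True can be illustrated with hierarchical clustering on a correlation-based distance (synthetic data and a scipy-based sketch, assuming scipy is available; this is not the internal implementation of correl_matrix):

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
# Synthetic returns: two blocks of correlated series (hypothetical, not JPMaQS data)
f1, f2 = rng.normal(size=(2, 500))
data = pd.DataFrame({
    "AUD": f1 + rng.normal(scale=0.5, size=500),
    "SEK": f2 + rng.normal(scale=0.5, size=500),
    "NZD": f1 + rng.normal(scale=0.5, size=500),
    "NOK": f2 + rng.normal(scale=0.5, size=500),
})

corr = data.corr()
# Convert correlations to distances and order the leaves by hierarchical clustering
dist = squareform(1 - corr.values, checks=False)
order = leaves_list(linkage(dist, method="average"))
ordered = corr.columns[order].tolist()
print(ordered)  # series driven by the same factor end up adjacent
```

Series driven by the same factor (AUD/NZD and SEK/NOK here) end up adjacent in the reordered matrix, which is what makes clustered correlation heatmaps easier to read.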
cids = cids_dm + cids_em
msp.correl_matrix(
dfx, xcats="FXXR_NSA", freq="Q", cids=cids, size=(15, 10), cluster=True
)
One can pass a list of categories to the
xcats
argument of the
correl_matrix()
function to display correlations across categories. The resulting output is a matrix of correlation coefficients between the categories. The freq and cluster arguments can also be used here to downsample the frequency of the data and to cluster the categories by proximity, respectively.
xcats_sel = ecos
msp.correl_matrix(
dfx,
xcats=xcats_sel,
cids=cids,
freq="M",
start="20030101",
size=(15, 10),
cluster=True,
)
The
msp.correl_matrix()
function can also compute correlations between any two sets of categories. Below, we focus on correlations between macro trends and subsequent asset returns. The analysis offers flexibility in exploring various lags of the explanatory variables through a dictionary in which the desired lags are specified. In this instance, we investigate lags at monthly frequency, with values set at 1 and 3 months.
macroz = [m + "_ZN4" for m in macros]
feats = macroz
rets=["DU02YXR_VT10",
"DU02YXR_NSA",
"DU05YXR_VT10",
"DU05YXR_NSA",
"EQXR_NSA",
"EQXR_VT10",
"FXXR_NSA",
"FXXR_VT10"]
lag_dict = {"XGDP_NEG_ZN4": [1, 3],
"XCPI_NEG_ZN4": [1, 3], # excess inflation
"XPCG_NEG_ZN4": [1, 3], # excess private credit growth
"RYLDIRS05Y_NSA_ZN4": [1, 3]}
msp.correl_matrix(
dfx,
xcats=feats,
xcats_secondary=rets,
cids="EUR",
freq="M",
start="20030101",
lags=lag_dict,
max_color=0.4,
cluster=True,
)
Learning #
The
macrosynergy.learning
subpackage contains functions and classes to assist the creation of statistical learning solutions with macro quantamental data.
The functionality is built around integrating the macrosynergy package and associated JPMaQS data with the popular
scikitlearn
library, which provides a simple interface for fitting common statistical learning models, as well as feature selection methods, crossvalidation classes, and performance metrics.
Most standard
scikitlearn
classes and functions do not respect the panel format of quantamental dataframes, i.e., the double-indexing by cross-section and time period. The customized wrappers below make it possible to apply these methods to the panel format.
Please also see the introductory notebooks where
macrosynergy.learning
is extensively employed.
The initial step in transforming JPMaQS data into a format suitable for machine learning involves the
categories_df()
function. This function converts daily data into a monthly format and introduces a lag in the feature variables, a common practice in machine learning to facilitate predictive analysis.
The next step involves constructing the feature and target dataframes:

Features (
X
): To form the monthly feature set, we select the last recorded value of the daily series for each month. This ensures that the end-of-period snapshot, which is crucial in financial analysis, is captured, providing a clear representation of each month's final state. Potential (daily) features (z-scores) are collected in the list
macroz
. In this list we earlier collected z-scores of the following variables:
“XGDP_NEG”: negative of the intuitive growth trend
“XCPI_NEG”: negative of the excess inflation measure
“XPCG_NEG”: negative of excess private credit growth
“RYLDIRS05Y_NSA”: real IRS yield, 5-year maturity, expectations-based
Targets (
y
): The target variable is created by aggregating the daily returns over each month to derive a total monthly return. This offers a direct target for predictive models by emphasizing the cumulative outcome for the month rather than daily fluctuations. In this notebook, the target dataframe includes (monthly) returns on a fixed receiver position, % of risk capital on a position scaled to a 10% (annualized) volatility target, 5-year maturity:
DU05YXR_VT10
# Specify features and target category
xcatx = macroz + ["DU05YXR_VT10"]
# Downsample from daily to monthly frequency (features as last and target as sum)
dfw = msm.categories_df(
df=dfx,
xcats=xcatx,
cids=cids_dux,
freq="M",
lag=1,
blacklist=fxblack,
xcat_aggs=["last", "sum"],
)
# Drop rows with missing values and assign features and target
dfw.dropna(inplace=True)
X = dfw.iloc[:, :-1]
y = dfw.iloc[:, -1]
Crossvalidation methods #
Crossvalidation refers to the evaluation of a model’s predictive accuracy through multiple divisions of the data into training and validation sets. Each division is known as a “fold.” The
macrosynergy
package supports the splitting of panel data into folds through three classes:

ExpandingIncrementPanelSplit()
creates training panel splits that expand over time at fixed intervals, followed by test sets of predetermined time spans. This method allows for a progressive inclusion of more data into the training set over time. 
ExpandingKFoldPanelSplit()
also creates expanding folds and involves a fixed number of splits. Training panels in this configuration are always temporally adjacent and chronologically precede the test set, ensuring that each test phase is preceded by a comprehensive training phase. 
RollingKFoldPanelSplit()
arranges splits where training panels of a fixed maximum duration can directly precede or follow the test set, allowing the use of both past and future data in training. While this arrangement does not mimic the sequential flow of information typical in time series analysis, it effectively leverages the cyclic nature of economic data.
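The date-axis logic shared by these splitters can be sketched in a few lines; the snippet below is a simplified illustration of the expanding K-fold idea on hypothetical dates (not the package classes themselves):

```python
import numpy as np
import pandas as pd

def expanding_kfold_panel(dates, n_splits=5):
    """Sketch of the expanding K-fold logic on a panel's date axis: the date
    range is cut into n_splits + 1 contiguous chunks; each training set is the
    union of all chunks up to a point, and the next chunk is the test set."""
    chunks = np.array_split(np.sort(np.asarray(dates)), n_splits + 1)
    for i in range(1, n_splits + 1):
        yield np.concatenate(chunks[:i]), chunks[i]

# Hypothetical panel dates (not JPMaQS data)
dates = pd.bdate_range("2020-01-01", periods=24)
folds = list(expanding_kfold_panel(dates, n_splits=3))

for train_d, test_d in folds:
    # training dates always strictly precede test dates
    assert train_d.max() < test_d.min()

# To recover panel rows for a fold, filter a (cid, real_date) MultiIndex:
idx = pd.MultiIndex.from_product([["AUD", "CAD"], dates], names=["cid", "real_date"])
train_rows = idx[idx.get_level_values("real_date").isin(folds[0][0])]
print([len(tr) for tr, te in folds])  # training sets grow: [6, 12, 18]
```

The key property preserved across all cross-sections is that splits are made on the shared date axis, so a given period is never partly in training and partly in testing.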
ExpandingIncrementPanelSplit()
#
The
ExpandingIncrementPanelSplit()
class facilitates the generation of expanding windows for cross-validation, essential for modeling scenarios where data become available incrementally over time. This class divides the dataset into training and test sets, systematically expanding the training set with each iteration. This approach effectively simulates environments where new information is incorporated at set intervals.
Important parameters here are:

train_intervals
specifies the length of the training interval in time periods. This parameter controls how much the training set expands with each new split. 
min_cids
sets the minimum number of crosssections required for the initial training set, with the default being four. This is crucial in scenarios where panel data is unbalanced, ensuring there are enough crosssections to begin the training process. 
min_periods
sets the smallest number of time periods required for the initial training set, with the default being 500 native frequency units. This is particularly important in an unbalanced panel context and should be used in conjunction with
min_cids
. 
. 
test_size
determines the length of the test set for each training interval. By default, this is set to 21 periods, which follows the training phase. 
max_periods
defines the maximum duration that any training set can reach during the expanding process. If this cap is reached, the earliest data periods are excluded to maintain this constraint. By setting this value, rolling training is effectively performed.
split_xi = msl.ExpandingIncrementPanelSplit(train_intervals=12, min_periods=12, test_size=24, min_cids=2)
visualise_splits()
#
The method
visualise_splits
can be applied to a splitter and conveniently visualizes the splits it produces, based on the full datasets of features and targets.
split_xi.visualise_splits(X,y)
ExpandingKFoldPanelSplit()
#
The
ExpandingKFoldPanelSplit()
class produces sequential learning scenarios, where information sets grow at fixed intervals.
The key parameter here is
n_splits
, which determines the number of desired splits (must be at least 2). As above,
visualise_splits()
method can be used to verify that the splits have been performed as intended. This replicates
scikitlearn
’s
TimeSeriesSplit
class for panelformat data.
split_xkf = msl.ExpandingKFoldPanelSplit(n_splits=5)
split_xkf.visualise_splits(X, y)
RollingKFoldPanelSplit()
#
The
RollingKFoldPanelSplit
class produces paired training and test splits, created for a data panel. It is similar to scikitlearn’s
KFold
class for simple time series. Training and test sets need to be adjacent, but the former need not strictly precede the latter. This gives the effect of the test set “rolling” forward in time.
split_rkf = msl.RollingKFoldPanelSplit(n_splits=5)
split_rkf.visualise_splits(X, y)
Metrics #
Cross-validation can be used for model selection and hyperparameter selection, but a statistic must be computed to determine the optimal model. This can be a performance metric like accuracy and balanced accuracy (to be maximised) or RMSE and MAE (to be minimised).
The
macrosynergy.learning
subpackage contains a collection of custom metrics that are compatible with
scikitlearn
. All such metrics are implemented as functions accepting two arguments:
y_true
, the true targets in a supervised learning problem, and
y_pred
, the predicted targets by a trained model. These are:

panel_significance_probability()
: computes the significance probability of correlation after fitting a linear mixed-effects model between predictions and true targets, accounting for cross-sectional correlations present in the panel. See the research piece ‘Testing macro trading factors’ for more information. 
regression_accuracy()
: computes the accuracy between the signs of predictions and targets. 
regression_balanced_accuracy()
: computes the balanced accuracy between the signs of predictions and targets. 
sharpe_ratio()
: computes a naive Sharpe ratio based on the model predictions. 
sortino_ratio()
: computes a naive Sortino ratio based on the model predictions.
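The sign-based metrics can be sketched as follows (simplified versions for illustration; the package implementations are the reference and may handle zeros and sample weights differently):

```python
import numpy as np

def regression_accuracy_sketch(y_true, y_pred):
    # Share of observations where prediction and realized return share a sign
    return np.mean(np.sign(y_true) == np.sign(y_pred))

def regression_balanced_accuracy_sketch(y_true, y_pred):
    # Average of the sign hit-ratios over positive and negative realized returns
    pos, neg = y_true > 0, y_true < 0
    hit_pos = np.mean(np.sign(y_pred[pos]) == 1)
    hit_neg = np.mean(np.sign(y_pred[neg]) == -1)
    return 0.5 * (hit_pos + hit_neg)

y_true = np.array([0.5, -1.2, 0.3, -0.4, 2.0, -0.1])
y_pred = np.array([0.2, 0.4, 0.1, -0.3, 1.0, -0.2])
print(regression_accuracy_sketch(y_true, y_pred))  # 5 of 6 signs match
print(regression_balanced_accuracy_sketch(y_true, y_pred))
```

Balanced accuracy matters when realized returns are predominantly of one sign: a model that always predicts the majority sign scores well on plain accuracy but only 0.5 on balanced accuracy.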
Feature selectors #
A
scikitlearn
pipeline can incorporate a layer of feature selection. We provide some custom selectors in the
macrosynergy.learning
subpackage for use over a panel.

LassoSelector
: selects features through a LASSO regression. The
alpha
of the regression, as well as the choice of a
positive
restriction, is required. 
ENetSelector
: selects features through an Elastic Net regression. The
alpha
of the regression, as well as the
l1_ratio
and the choice of a
positive
restriction, is required. 
MapSelector
: selects features based on significance from the Macrosynergy panel test. A p-value
threshold
is required, as well as the choice of a
positive
restriction. For more information on the panel test, see the research piece ‘Testing macro trading factors’.
Feature transformers #
Within a
scikitlearn
pipeline, it is often useful to transform features into new ones, for instance by scaling and/or averaging. The
macrosynergy.learning
subpackage contains some custom transformers:

PanelStandardScaler
: transforms features by subtracting historical mean and dividing by historical standard deviation. 
PanelMinMaxScaler
: transforms features by normalizing them between zero and one. 
FeatureAverager
: condenses features into a single feature through averaging.
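The idea behind PanelStandardScaler can be sketched as a fit/transform pair that estimates moments over the whole training panel (a simplified sketch, not the package class):

```python
import numpy as np
import pandas as pd

class PanelStandardScalerSketch:
    """Simplified sketch: z-score each feature using means and standard
    deviations estimated over the entire training panel."""

    def fit(self, X):
        self.means_ = X.mean(axis=0)
        self.stds_ = X.std(axis=0)
        return self

    def transform(self, X):
        return (X - self.means_) / self.stds_

# Hypothetical two-feature panel with a (cid, real_date) MultiIndex
idx = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.bdate_range("2023-01-02", periods=3)],
    names=["cid", "real_date"],
)
X = pd.DataFrame(
    {"f1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     "f2": [0.1, 0.2, 0.3, 0.1, 0.2, 0.3]},
    index=idx,
)

Xs = PanelStandardScalerSketch().fit(X).transform(X)
print(Xs["f1"].round(2).tolist())
```

Fitting on the training panel and reusing the stored moments on test data is what keeps the transformation free of look-ahead bias inside a pipeline.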
Predictor classes #
The last stage of any
scikitlearn
pipeline is a “predictor” class that is trained based on selected and transformed historical features and outputs predictions. We provide the following predictors in
macrosynergy.learning
:

NaivePredictor
: a naive predictor class that expects only a single feature as input and outputs that single feature as the prediction. For instance, this could be used in conjunction with FeatureAverager to create a signal of equally weighted feature z-scores. 
SignWeightedLinearRegression
: a weighted least squares linear regression model that equalises the importance of negative-return and positive-return historical samples, removing a possible sign bias learnt by the model. 
TimeWeightedLinearRegression
: a weighted least squares linear regression model that increases the importance of more recent samples by specifying a half-life of exponentially decaying weights with time for each historical sample. 
LADRegressor
: a linear model that is fit by minimising absolute residuals instead of squared residuals. 
SignWeightedLADRegressor
: a weighted LAD regression model that equalises the importance of negative-return and positive-return historical samples, removing a possible sign bias learnt by the model. 
TimeWeightedLADRegressor
: a weighted LAD regression model that increases the importance of more recent samples by specifying a half-life of exponentially decaying weights with time for each historical sample.
Signal optimization #
The
SignalOptimizer
class is used for sequential model selection, fitting, optimization and forecasting based on quantamental panel data.
Three use cases are discussed in detail in the notebook Signal optimization basics :

Feature selection chooses from candidate features and combines them into an equally weighted score.

Return prediction estimates the predictive relation of features and combines them, in accordance with their coefficients, into a single prediction.

Classification estimates the relation between features and the sign of subsequent returns and combines their effect into a binary variable of positive or negative returns.
Below, we showcase the second case, focusing on the principles of generating an optimized regression-based signal.
The main arguments for instantiating the
SignalOptimizer
are:

inner_splitter
, the splitter to be deployed for the cross-validation that determines the choice of the model and hyperparameters, 
X
and y
, the double-indexed feature matrix and target vector, 
initial_nsplits
and threshold_ndates
, which specify the number of cross-validation splits in the initial training set, and the number of dates to be added to this initial set in order for the number of folds to increase by one. 
blacklist
, a standardized dictionary to exclude specific combinations of periods and cross-sections from cross-validation.
Below, we instantiate the signal optimizer so that the initial training set uses 5 cross-validation folds, with one fold added every year.
splitter_fsz = msl.RollingKFoldPanelSplit(n_splits=5)

so_reg = msl.SignalOptimizer(
    inner_splitter=splitter_fsz,
    X=X,
    y=y,
    initial_nsplits=5,
    threshold_ndates=12,
    blacklist=fxblack,
)
calculate_predictions()
#
The
calculate_predictions()
method returns predictions based on a sequentially optimized model type, hyperparameters, and parameters.
Important parameters here are:

name
is a label identifying the specific signal optimization process, 
models
is a dictionary of scikit-learn predictors or pipelines that contains the choices for the type of model to be deployed, 
hparam_grid
is a nested dictionary defining the hyperparameters to consider for each model type, 
metric
, a scikit-learn scorer object that serves as the criterion for optimization, 
min_cids
, min_periods
and test_size
have the same meaning as in ExpandingIncrementPanelSplit().
# Model types
mods_reg = {
    "linreg": Pipeline([
        ('selector', msl.LassoSelector(alpha=1e-3, positive=True)),
        ('model', LinearRegression()),
    ]),
}

# Hyperparameter grids
grids_reg = {
    "linreg": {"model__fit_intercept": [True, False]},
}

# Optimization criterion
score_reg = make_scorer(r2_score, greater_is_better=True)
%%time
tdf = so_reg.calculate_predictions(
    name="MACRO_OPTREG",
    models=mods_reg,
    hparam_grid=grids_reg,
    metric=score_reg,
    min_cids=4,
    min_periods=36,
)
100%████████████████████████████████████████████████████████████████████████████████ 254/254 [00:11<00:00, 21.91it/s]
Wall time: 13.6 s
models_heatmap()
#
The
models_heatmap
method of the
SignalOptimizer
class visualizes optimal models used for signal calculation over time. If many models have been considered, their number can be limited by the
cap
argument.
# Get optimized signals and view models heatmap
dfa = so_reg.get_optimized_signals()
som = so_reg.models_heatmap(
    name="MACRO_OPTREG",
    cap=6,
    title="Optimal regression model used over time",
    figsize=(18, 6),
)
display(som)
dfx = msm.update_df(dfx, dfa)
None
feature_selection_heatmap()
#
The
feature_selection_heatmap
method of the
SignalOptimizer
class visualizes the features that were selected over time by the last selector in a
scikit-learn
pipeline, provided it is of an appropriate type, such as the
LassoSelector
.
so_reg.feature_selection_heatmap(
    name="MACRO_OPTREG", title="Feature selection heatmap", figsize=(16, 6)
)
coefs_timeplot()
#
The
coefs_timeplot
method creates a time plot of linear model regression coefficients for each feature. For these statistics to be recorded, the underlying
scikit-learn
predictor class (in this case,
LinearRegression
) must contain
coef_
and
intercept_
attributes.
Gaps in the lines appear either when a model without the required attributes (e.g., a KNN or random forest) is selected, or when the feature selector (in this case, the
LassoSelector
) does not select those features.
so_reg.coefs_timeplot(name="MACRO_OPTREG", figsize=(16, 6))
coefs_stackedbarplot()
#
The
coefs_stackedbarplot()
method is an alternative to
coefs_timeplot()
and displays a stacked bar plot of average annual model coefficients over time.
so_reg.coefs_stackedbarplot(name="MACRO_OPTREG", figsize=(16, 6))
intercepts_timeplot()
#
Similarly to model coefficients, changing model intercepts can be visualised over time through a timeplot using the
intercepts_timeplot()
method.
so_reg.intercepts_timeplot(name="MACRO_OPTREG", figsize=(16, 6))
nsplits_timeplot()
#
The
nsplits_timeplot()
displays the number of cross-validation splits applied over time. This is useful if, at instantiation of
SignalOptimizer
values have been assigned to
initial_nsplits
and
threshold_ndates
that increase the number of cross-validation folds with the sample length.
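The implied fold count can be reconstructed as a simple function of the number of training dates seen so far. This sketch assumes the rule described above; the function and its `initial_ndates` argument are ours for illustration, not the package API:

```python
def n_splits(n_dates, initial_nsplits=5, threshold_ndates=12, initial_ndates=36):
    # one extra fold for every `threshold_ndates` dates beyond the initial training set
    return initial_nsplits + max(0, (n_dates - initial_ndates) // threshold_ndates)

# with monthly data and threshold_ndates=12, one fold is added per extra year
print([n_splits(n) for n in (36, 48, 60)])  # [5, 6, 7]
```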
so_reg.nsplits_timeplot(name="MACRO_OPTREG")
Signaling #
SignalReturnRelations #
The
SignalReturnRelations
class from the
macrosynergy.signal
module is specifically designed to analyze, visualize, and compare the relationships between panels of trading signals and panels of subsequent returns.
Here are some key aspects and usage details of the SignalReturnRelations class:

sig
specifies the list of signals or the main signal category. Each element of the list is analyzed in relation to subsequent returns. 
sig_neg
takes a list of True and False values corresponding to the list of signals. The default is False. 
ret
specifies the panel of subsequent returns that will be analyzed in relation to the specified signal category. 
freq
denotes the frequency at which the series are sampled. The default is ‘M’ for monthly. The return series will always be summed over the sample period. The signal series will be aggregated according to the value of agg_sig

agg_sig
specifies the aggregation method applied to the signal values in downsampling. The default is “last”. This can also be a list of various aggregation methods.
Unlike the
CategoryRelations
class, here the focus is on a range of measures of association between signals and returns based on categorization (positive and negative returns) and on parametric and non-parametric correlation.
This class applies frequency conversion, corresponding to a trading or rebalancing frequency. It also considers whether the signal is expected to predict returns positively or negatively. This is important for the interpretation of the output. One should also note that there is no regression analysis involved. This means that features should be entered with a meaningful zero value since the sign of the feature is critical for accuracy statistics.
# Instantiate signal-return relations for the list of signals, multiple returns and frequencies
srr = mss.SignalReturnRelations(
    dfx,
    sigs=["MACRO_AVGZ", "MACRO_OPTREG", "XGDP_NEG_ZN4", "XCPI_NEG_ZN4", "XPCG_NEG_ZN4", "RYLDIRS05Y_NSA_ZN4"],
    cosp=True,
    rets=["DU05YXR_VT10", "EQXR_VT10", "FXXR_VT10"],
    freqs=["M"],
    blacklist=fxblack,
    slip=1,
)
Summary tables #
The
.summary_table()
method of the
SignalReturnRelations
class gives a short highlevel snapshot of the strength and stability of the
main signal
relation (the first signal in the list of signals
sigs
, with the first sign in the list of signs
sig_neg
and the first frequency in the list of frequencies
freqs
). Unless
sig_neg
had been set to
True
at instantiation, the relation is assumed to be positive.
The columns of the summary table generally have the following interpretations:

accuracy is the ratio of correct predictions of the sign of returns to all predictions. It measures the overall accuracy of the signal’s predictions, regardless of the class imbalance between positive and negative returns.

bal_accuracy is the balanced accuracy, which takes into account the class imbalance of the dataset. It is the average of the ratios of correctly detected positive returns and correctly detected negative returns. The best value is 1 and the worst value is 0. This measure avoids inflated performance estimates on imbalanced datasets and is calculated as the average of sensitivity (true positive rate) and specificity (true negative rate). The formula with references is described here

pos_sigr is the ratio of positive signals to all predictions. It indicates the long bias of the signal, or the percentage of time the signal is predicting a positive return. The value is between 0 (no positive signals) and 1 (all signals are positive).

pos_retr is the ratio of positive returns to all observed returns. It indicates the positive bias of the returns, or the percentage of time the returns are positive. The value is between 0 (no positive returns) and 1 (all returns are positive).

pos_prec is the positive precision, which measures the ratio of correct positive return predictions to all positive predictions. It indicates how well the positive predictions of the signal have fared. The best value is 1 and the worst value is 0. A high positive precision can be easily achieved if the ratio of positive returns is high, so it is important to consider this measure in conjunction with other measures such as bal_accuracy. See more info here

neg_prec is the negative precision, which measures the ratio of correct negative return predictions to all negative predictions. It indicates how well the negative predictions of the signal have fared. Generally, good positive precision is hard to accomplish if the ratio of negative returns has been high. The best value is 1 and the worst value is 0. See more info here

pearson is the Pearson correlation coefficient between signal and subsequent return. Like other correlation coefficients, Pearson varies between -1 and +1, with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship.

pearson_pval is the probability that the (positive) correlation has been accidental, assuming that returns are independently distributed. Strictly speaking, this value is a two-tailed p-value for the null hypothesis that the correlation is 0. The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are reasonable for large datasets. This statistic would be invalid for forward-moving averages.

kendall is the Kendall rank correlation coefficient between signal and subsequent return. It is based on a non-parametric hypothesis test for statistical dependence. For those who want to refresh their statistical knowledge, please read here

kendall_pval is the probability that the (positive) correlation has been accidental, assuming that returns are independently distributed. As before, this is a two-sided p-value for the null hypothesis that the correlation is 0. A p-value below a chosen threshold (usually 0.01 or 0.05) allows us to reject the null hypothesis. This statistic would be invalid for forward-moving averages and for autocorrelated data.
The rows have the following meaning:

Panel refers to the whole panel of cross-sections and the sample period, excluding unavailable and blacklisted periods.

Mean years is the mean of the statistic across all years.

Mean cids is the mean of the statistic across all cross-sections.

Positive ratio represents the ratio of years (if following “Mean years”) or cross-sections (if following “Mean cids”) for which the corresponding statistic was above its neutral level. The neutral level is defined as 0.5 for classification ratios (such as accuracy and balanced accuracy) and positive correlation probabilities, and 0 for correlation coefficients (such as Pearson and Kendall). For example, if the Positive ratio for accuracy is 0.7, the correct sign of returns was predicted in 70% of the years (or cross-sections) analyzed. If the Positive ratio for Pearson is 0.6, the correlation between signal and returns was positive in 60% of the years (or cross-sections).
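The classification columns above can be reproduced from the sign series alone. A numpy sketch on toy data (illustrative only; the actual table is computed by the class on the panel):

```python
import numpy as np

rng = np.random.default_rng(2)
sig = rng.normal(size=1000)              # toy signal panel, flattened
ret = 0.5 * sig + rng.normal(size=1000)  # positively related subsequent returns

pred_pos = sig > 0
ret_pos = ret > 0

accuracy = np.mean(pred_pos == ret_pos)
# balanced accuracy: average hit rate within positive- and negative-return periods
bal_accuracy = 0.5 * (np.mean(pred_pos[ret_pos]) + np.mean(~pred_pos[~ret_pos]))
pos_sigr = pred_pos.mean()               # share of long signals
pos_retr = ret_pos.mean()                # share of positive returns
pos_prec = np.mean(ret_pos[pred_pos])    # precision of the long signals
```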
srr.summary_table()
accuracy  bal_accuracy  pos_sigr  pos_retr  pos_prec  neg_prec  pearson  pearson_pval  kendall  kendall_pval  auc  

M: MACRO_AVGZ/last => DU05YXR_VT10  0.53884  0.53416  0.56150  0.54225  0.57221  0.49611  0.10051  0.00000  0.05846  0.00000  0.53389 
Mean years  0.53317  0.51586  0.57238  0.54456  0.56407  0.46766  0.04721  0.36458  0.02830  0.39176  0.51672 
Positive ratio  0.76000  0.64000  0.60000  0.72000  0.76000  0.40000  0.76000  0.56000  0.76000  0.48000  0.64000 
Mean cids  0.53873  0.53338  0.55570  0.54096  0.56879  0.49797  0.10604  0.27722  0.06125  0.32892  0.53214 
Positive ratio  0.79167  0.79167  0.66667  0.83333  0.91667  0.54167  1.00000  0.75000  0.83333  0.70833  0.79167 
Alternatively, for a one-line table, we can use
.single_relation_table()
. The first column lists the frequency (‘D’, ‘W’, ‘M’, ‘Q’, ‘A’) followed by the signal’s name with
_NEG
meaning negative relationship and the name of the return.
srr.single_relation_table()
accuracy  bal_accuracy  pos_sigr  pos_retr  pos_prec  neg_prec  pearson  pearson_pval  kendall  kendall_pval  auc  

M: MACRO_AVGZ/last => DU05YXR_VT10  0.53884  0.53416  0.5615  0.54225  0.57221  0.49611  0.10051  0.0  0.05846  0.0  0.53389 
The
cross_section_table()
method summarizes accuracy- and correlation-related measures for the panel, the mean, and the individual cross-sections. It also gives a “positive ratio”, i.e., the ratio of countries with evidence of positive relations, either in terms of above-50% accuracy, positive correlation, or, more restrictively, high positive correlation probabilities. As for the
.summary_table()
, the
cross_section_table()
and
yearly_table()
methods analyze the strength and stability of the
main signal
relation (the first signal in the list of signals
sigs
, with the first sign in the list of signs
sig_neg
and the first frequency in the list of frequencies
freqs
). Unless
sig_neg
had been set to
True
at instantiation, the relation is assumed to be positive.
srr.cross_section_table()
accuracy  bal_accuracy  pos_sigr  pos_retr  pos_prec  neg_prec  pearson  pearson_pval  kendall  kendall_pval  auc  

M: MACRO_AVGZ/last => DU05YXR_VT10  0.53884  0.53416  0.56150  0.54225  0.57221  0.49611  0.10051  0.00000  0.05846  0.00000  0.53389 
Mean  0.53873  0.53338  0.55570  0.54096  0.56879  0.49797  0.10604  0.27722  0.06125  0.32892  0.53214 
PosRatio  0.79167  0.79167  0.66667  0.83333  0.91667  0.54167  1.00000  0.75000  0.83333  0.70833  0.79167 
AUD  0.54380  0.53619  0.55839  0.56934  0.60131  0.47107  0.14511  0.01623  0.06174  0.12772  0.53640 
CAD  0.48966  0.48913  0.53448  0.50690  0.49677  0.48148  0.03633  0.53775  0.00150  0.96955  0.48918 
CHF  0.53719  0.52398  0.65702  0.54959  0.56604  0.48193  0.13194  0.03708  0.07203  0.08993  0.52183 
CLP  0.50962  0.51230  0.44712  0.52404  0.53763  0.48696  0.06107  0.38088  0.01301  0.78023  0.51219 
COP  0.51923  0.52019  0.39103  0.50000  0.52459  0.51579  0.01400  0.86230  0.04665  0.38743  0.51923 
CZK  0.48918  0.49922  0.40693  0.55411  0.55319  0.44526  0.05951  0.36795  0.01570  0.72246  0.49924 
EUR  0.59273  0.58733  0.54545  0.56727  0.64667  0.52800  0.21410  0.00035  0.13322  0.00099  0.58821 
GBP  0.53793  0.52618  0.56552  0.59310  0.61585  0.43651  0.20942  0.00033  0.10178  0.00976  0.52666 
HUF  0.57664  0.58233  0.63139  0.50000  0.56069  0.60396  0.13470  0.02577  0.08232  0.04225  0.57664 
IDR  0.62245  0.59969  0.63265  0.61224  0.68548  0.51389  0.27022  0.00013  0.15039  0.00175  0.59759 
ILS  0.55251  0.52712  0.64840  0.59361  0.61268  0.44156  0.01253  0.85368  0.00540  0.90527  0.52563 
INR  0.49074  0.49239  0.54630  0.48148  0.47458  0.51020  0.09551  0.16190  0.02214  0.62833  0.49245 
JPY  0.48966  0.46063  0.64828  0.58621  0.55851  0.36275  0.04914  0.40448  0.00260  0.94734  0.46299 
KRW  0.58333  0.58025  0.62500  0.53241  0.59259  0.56790  0.15768  0.02042  0.09320  0.04155  0.57555 
MXN  0.55455  0.55528  0.48182  0.51818  0.57547  0.53509  0.01756  0.79564  0.04242  0.34905  0.55528 
NOK  0.48276  0.48459  0.46552  0.52759  0.51111  0.45806  0.11942  0.04214  0.05940  0.13152  0.48461 
NZD  0.54828  0.54808  0.71034  0.52069  0.54854  0.54762  0.09762  0.09706  0.10994  0.00525  0.53964 
PLN  0.58029  0.57634  0.56569  0.54015  0.60645  0.54622  0.23155  0.00011  0.13128  0.00122  0.57550 
SEK  0.58156  0.57533  0.54610  0.57447  0.64286  0.50781  0.18459  0.00159  0.12005  0.00231  0.57639 
THB  0.55102  0.54386  0.67857  0.53571  0.56391  0.52381  0.01900  0.79150  0.04668  0.33128  0.53846 
TRY  0.51250  0.51167  0.38125  0.49375  0.50820  0.51515  0.02451  0.75690  0.00414  0.93766  0.51102 
TWD  0.50463  0.50506  0.49537  0.54630  0.55140  0.45872  0.08938  0.19066  0.05142  0.26081  0.50510 
USD  0.54828  0.54900  0.47241  0.51034  0.56204  0.53595  0.08206  0.16338  0.05625  0.15325  0.54887 
ZAR  0.53091  0.51501  0.70182  0.54545  0.55440  0.47561  0.08810  0.14506  0.07403  0.06728  0.51267 
The
yearly_table()
method is useful for analyzing how the performance of a trading signal varies over time by providing a breakdown of performance metrics for each year. This can help identify whether the signal has been consistently strong over time or if specific market conditions have driven its performance.
tbl_srr_year = srr.yearly_table()
tbl_srr_year.round(3)
accuracy  bal_accuracy  pos_sigr  pos_retr  pos_prec  neg_prec  pearson  pearson_pval  kendall  kendall_pval  auc  

M: MACRO_AVGZ/last => DU05YXR_VT10  0.539  0.534  0.562  0.542  0.572  0.496  0.101  0.000  0.058  0.000  0.534 
Mean  0.533  0.516  0.572  0.545  0.564  0.468  0.047  0.365  0.028  0.392  0.517 
PosRatio  0.760  0.640  0.600  0.720  0.760  0.400  0.760  0.560  0.760  0.480  0.640 
2000  0.550  0.563  0.475  0.750  0.816  0.310  0.006  0.960  0.018  0.816  0.583 
2001  0.463  0.402  0.716  0.597  0.542  0.263  0.119  0.172  0.004  0.939  0.418 
2002  0.565  0.489  0.780  0.631  0.626  0.351  0.091  0.243  0.072  0.167  0.492 
2003  0.518  0.471  0.786  0.565  0.553  0.389  0.050  0.518  0.016  0.763  0.480 
2004  0.560  0.515  0.679  0.631  0.640  0.389  0.011  0.889  0.029  0.574  0.514 
2005  0.530  0.533  0.458  0.536  0.571  0.495  0.053  0.492  0.020  0.695  0.533 
2006  0.507  0.496  0.302  0.476  0.471  0.522  0.010  0.884  0.036  0.409  0.497 
2007  0.540  0.536  0.341  0.476  0.523  0.548  0.053  0.402  0.048  0.261  0.532 
2008  0.500  0.543  0.313  0.599  0.659  0.428  0.173  0.005  0.112  0.007  0.539 
2009  0.485  0.497  0.869  0.481  0.481  0.514  0.088  0.143  0.079  0.050  0.499 
2010  0.601  0.555  0.739  0.623  0.652  0.458  0.128  0.034  0.093  0.022  0.545 
2011  0.544  0.522  0.591  0.626  0.645  0.400  0.062  0.301  0.004  0.920  0.523 
2012  0.529  0.509  0.598  0.605  0.612  0.405  0.030  0.621  0.012  0.761  0.509 
2013  0.500  0.514  0.703  0.471  0.479  0.549  0.082  0.174  0.054  0.179  0.512 
2014  0.610  0.567  0.606  0.716  0.769  0.365  0.114  0.065  0.104  0.012  0.579 
2015  0.560  0.565  0.760  0.524  0.555  0.576  0.098  0.106  0.066  0.104  0.548 
2016  0.522  0.519  0.623  0.514  0.529  0.510  0.026  0.668  0.023  0.573  0.518 
2017  0.544  0.550  0.431  0.530  0.587  0.512  0.158  0.008  0.057  0.156  0.549 
2018  0.479  0.484  0.448  0.545  0.527  0.440  0.028  0.632  0.001  0.988  0.484 
2019  0.552  0.523  0.649  0.604  0.620  0.426  0.020  0.736  0.037  0.349  0.522 
2020  0.554  0.446  0.833  0.627  0.609  0.283  0.037  0.538  0.099  0.015  0.468 
2021  0.511  0.494  0.446  0.348  0.341  0.647  0.083  0.170  0.013  0.745  0.494 
2022  0.681  0.515  0.080  0.290  0.318  0.713  0.073  0.224  0.055  0.176  0.505 
2023  0.554  0.597  0.312  0.576  0.709  0.484  0.112  0.063  0.090  0.027  0.585 
2024  0.370  0.491  0.772  0.272  0.268  0.714  0.192  0.066  0.121  0.087  0.492 
multiple_relations_table()
is a method that compares multiple signal-return relations in one table. It is useful for comparing the performance of different signals against the same return series (and across more than one return series) at multiple frequencies. The method returns a table with the standard columns used for
single_relation_table()
and other tables, but the rows display the different signals from the list specified at SignalReturnRelations() instantiation via
sigs
. The row names indicate the frequency (‘D,’ ‘W,’ ‘M,’ ‘Q,’ ‘A’) followed by the signal’s and return’s names.
tbl_srr_multi = srr.multiple_relations_table()
tbl_srr_multi.round(3)
accuracy  bal_accuracy  pos_sigr  pos_retr  pos_prec  neg_prec  pearson  pearson_pval  kendall  kendall_pval  auc  

Return  Signal  Frequency  Aggregation  
DU05YXR_VT10  MACRO_AVGZ  M  last  0.537  0.534  0.543  0.532  0.563  0.505  0.091  0.000  0.055  0.000  0.534 
RYLDIRS05Y_NSA_ZN4  M  last  0.535  0.527  0.689  0.532  0.549  0.505  0.058  0.000  0.040  0.000  0.523  
XCPI_NEG_ZN4  M  last  0.517  0.515  0.527  0.532  0.547  0.484  0.034  0.018  0.023  0.017  0.515  
XGDP_NEG_ZN4  M  last  0.524  0.522  0.545  0.532  0.552  0.492  0.044  0.002  0.036  0.000  0.522  
XPCG_NEG_ZN4  M  last  0.512  0.523  0.355  0.532  0.562  0.484  0.057  0.000  0.044  0.000  0.521  
MACRO_OPTREG  M  last  0.546  0.539  0.722  0.532  0.554  0.525  0.085  0.000  0.054  0.000  0.532  
EQXR_VT10  MACRO_AVGZ  M  last  0.527  0.516  0.554  0.602  0.616  0.416  0.055  0.001  0.028  0.014  0.517 
RYLDIRS05Y_NSA_ZN4  M  last  0.511  0.476  0.661  0.602  0.585  0.367  0.049  0.004  0.033  0.004  0.478  
XCPI_NEG_ZN4  M  last  0.547  0.537  0.547  0.602  0.635  0.439  0.074  0.000  0.046  0.000  0.539  
XGDP_NEG_ZN4  M  last  0.510  0.502  0.541  0.602  0.603  0.400  0.037  0.029  0.010  0.366  0.502  
XPCG_NEG_ZN4  M  last  0.470  0.497  0.367  0.602  0.598  0.396  0.015  0.375  0.009  0.426  0.497  
MACRO_OPTREG  M  last  0.547  0.502  0.724  0.602  0.602  0.401  0.018  0.297  0.013  0.255  0.501  
FXXR_VT10  MACRO_AVGZ  M  last  0.513  0.512  0.549  0.519  0.529  0.494  0.044  0.003  0.031  0.002  0.512 
RYLDIRS05Y_NSA_ZN4  M  last  0.522  0.517  0.696  0.518  0.529  0.505  0.061  0.000  0.048  0.000  0.514  
XCPI_NEG_ZN4  M  last  0.506  0.505  0.528  0.518  0.523  0.486  0.005  0.748  0.008  0.441  0.505  
XGDP_NEG_ZN4  M  last  0.510  0.509  0.552  0.518  0.526  0.491  0.036  0.015  0.022  0.024  0.508  
XPCG_NEG_ZN4  M  last  0.504  0.510  0.356  0.518  0.531  0.488  0.005  0.746  0.003  0.771  0.509  
MACRO_OPTREG  M  last  0.523  0.519  0.726  0.518  0.529  0.509  0.060  0.000  0.053  0.000  0.515 
The single_statistic_table() method generates a table and heatmap featuring a single statistic for each signal-return relation. Users can select their preferred statistic from the available options, including “accuracy”, “bal_accuracy”, “pos_sigr”, “pos_retr”, “pos_prec”, “neg_prec”, “kendall”, “kendall_pval”, “pearson”, and “pearson_pval”. The heatmap, where darker (blue) shades indicate higher (positive) values, allows users to visually compare the statistics across different signals for all frequencies (as indicated after the
\
following the return’s name).
srr.single_statistic_table(stat="bal_accuracy", show_heatmap=True, min_color=0.4, max_color=0.6)
Return  DU05YXR_VT10  EQXR_VT10  FXXR_VT10  

Frequency  M  M  M  
Signal  Aggregation  
MACRO_AVGZ  last  0.531457  0.511747  0.506784 
MACRO_OPTREG  last  0.539360  0.501515  0.518787 
XGDP_NEG_ZN4  last  0.515001  0.500994  0.505596 
XCPI_NEG_ZN4  last  0.513368  0.534829  0.501870 
XPCG_NEG_ZN4  last  0.526960  0.491406  0.506858 
RYLDIRS05Y_NSA_ZN4  last  0.534240  0.473582  0.516607 
Correlation_bars #
The method
.correlation_bars()
visualizes positive correlation probabilities based on parametric (Pearson) and non-parametric (Kendall) correlation statistics and compares signals with each other, across countries, or across years.
The
type
argument in the
.correlation_bars()
method determines how the correlation probabilities are grouped and visualized:

If
type='signals'
, the method will plot the correlation probabilities for each signal, comparing them against each other. 
If
type='cross_section'
, the method will plot the correlation probabilities for each cross-section (e.g., country), comparing them against each other. 
If
type='years'
, the method will plot the correlation probabilities for each year, comparing them against each other.
srr.correlation_bars(type="signals", size=(15, 3), title="Positive correlation probability of signals with 5-year vol-targeted duration return")
srr.correlation_bars(type="cross_section", title="Positive correlation probability of main signal with 5-year vol-targeted duration return across currencies", size=(15, 3))
srr.correlation_bars(type="years", size=(15, 3), title="Positive correlation probability of main signal with 5-year vol-targeted duration return across years")
Accuracy_bars #
The
accuracy_bars
method operates analogously to the
correlation_bars
method, except that it shows the accuracy and balanced accuracy of the predicted relationship.
srr.accuracy_bars(type="cross_section", title="Accuracy for sign prediction across currencies for the main signal-return relationship", size=(15, 3))
Using the
SignalReturnRelations()
class, we can compare side by side the predictive power of two composite signals:

the average macro unoptimized signal
MACRO_AVGZ
, 
sequentially optimized forecasts
MACRO_OPTREG
, for 5-year duration returns:
## Compare optimized signals with simple average z-scores
srr = mss.SignalReturnRelations(
    df=dfx,
    rets=["DU05YXR_VT10"],
    sigs=["MACRO_AVGZ", "MACRO_OPTREG"],
    cosp=True,
    freqs=["M"],
    agg_sigs=["last"],
    start="2004-01-01",
    blacklist=fxblack,
    slip=1,
)
tbl_srr = srr.signals_table()
tbl_srr.round(3)
accuracy  bal_accuracy  pos_sigr  pos_retr  pos_prec  neg_prec  pearson  pearson_pval  kendall  kendall_pval  auc  

Return  Signal  Frequency  Aggregation  
DU05YXR_VT10  MACRO_AVGZ  M  last  0.537  0.535  0.538  0.532  0.564  0.505  0.092  0.0  0.056  0.0  0.534 
MACRO_OPTREG  M  last  0.546  0.540  0.717  0.532  0.555  0.524  0.086  0.0  0.054  0.0  0.532 
Backtesting #
NaivePnL #
Instantiation #
The
NaivePnL()
class is designed to provide a quick and simple overview of a stylized PnL profile of a set of trading signals. The class carries the label
naive
because its methods do not consider transaction costs or position limitations, such as risk management considerations. This is deliberate because costs and limitations are specific to trading size, institutional rules, and regulations.
The class allows a single target return category to be assigned to the
ret
argument, defined as the return on a position corresponding to one unit in the signal. A set of signal categories can be assigned as a list of categories to the
sigs
argument. If the user wishes to evaluate the PnL using benchmark returns, these can be passed as a list of full tickers to the
bms
argument. The instantiation of the class determines the target and the scope of all subsequent analyses, i.e., the period and the set of eligible countries. All other choices can be made subsequently.
sigs = ["MACRO_AVGZ", "MACRO_OPTREG", "XGDP_NEG_ZN4", "XCPI_NEG_ZN4", "XPCG_NEG_ZN4", "RYLDIRS05Y_NSA_ZN4"]

naive_pnl = msn.NaivePnL(
    dfx,
    ret="DU05YXR_VT10",
    sigs=sigs,
    cids=cids,
    start="2004-01-01",
    blacklist=fxblack,
    bms=["USD_DU05YXR_NSA"],
)
make_pnl #
The
make_pnl()
method calculates a daily PnL for a specific signal category and adds it to the main dataframe of the class instance. Indeed, a single signal category can result in a wide array of actual signals, depending on the choices made for its final form.
In particular, the signal transformation option (
sig_op
) manages the distribution of the traded signal and gives the following options:

zn_score_pan
transforms raw signals into z-scores around a zero value based on the whole panel; the neutral level and standardization are based on the whole cross-section of the panel. A zn-score here means a standardized score with zero as the neutral level and standardization through division by the mean absolute value. See the make_zn_scores()
function explained in this notebook 
zn_score_cs
transforms raw signals into z-scores around a zero value based on each cross-section alone 
binary
transforms the category values into simple long/short (+1/-1) signals.
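The zn-score and binary transformations described above can be sketched in numpy (division by the panel's mean absolute value, with zero as the neutral level; illustrative only, not the package code):

```python
import numpy as np

def zn_score(x):
    # zero is the neutral level; scale by the mean absolute deviation from zero
    return x / np.mean(np.abs(x))

x = np.array([2.0, -1.0, 0.5, -0.5])  # mean absolute value is 1.0
print(zn_score(x))   # [ 2.  -1.   0.5 -0.5]
print(np.sign(x))    # the 'binary' option: long/short (+1/-1) positions
```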
Other important choices include:

the signal direction parameter
sig_neg
can be set to True
if the negative value of the transformed signal should be used for the PnL calculation, 
rebalancing frequency (
rebal_freq
) for positions according to signal must be one of ‘daily’ (default), ‘weekly’ or ‘monthly’, 
rebalancing slippage (
rebal_slip
) in days, where the default is 1, which means that it takes one day to rebalance the position and that the new position produces PnL from the second day after the signal has been recorded, 
threshold value (
thresh
) beyond which scores are winsorized, i.e., capped at that threshold. This is often realistic, as risk management and the potential for signal value distortions typically preclude outsized and concentrated positions within a strategy.
The method also allows ex-post scaling of the PnL to an annualized volatility by assigning an annualized standard deviation of the aggregate PnL to the
vol_scale
argument. This is for comparative visualization only and very different from a priori volatility targeting.
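Ex-post volatility scaling is a single rescaling of the PnL series by the ratio of the target to the realized annualized volatility. A numpy sketch (assuming 261 trading days per year, a common convention; not the package code):

```python
import numpy as np

rng = np.random.default_rng(3)
pnl = rng.normal(scale=0.5, size=2000)   # toy daily PnL, % of notional

vol_scale = 10                           # target annualized volatility, %
ann_vol = pnl.std() * np.sqrt(261)       # realized annualized volatility
pnl_scaled = pnl * vol_scale / ann_vol   # rescaled PnL hits the target by construction

print(round(pnl_scaled.std() * np.sqrt(261), 6))  # 10.0
```

Because the whole series is rescaled by its own full-sample volatility, this is only valid for ex-post comparison, not for a real-time volatility target.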
Method calls add specified PnLs to the class instance for subsequent analysis.
for sig in sigs:
    naive_pnl.make_pnl(
        sig,
        sig_neg=False,
        sig_op="zn_score_pan",
        rebal_freq="monthly",
        vol_scale=10,
        rebal_slip=1,
        thresh=2,
        pnl_name=sig + "_NEGPZN",
    )

for sig in sigs:
    naive_pnl.make_pnl(
        sig,
        sig_neg=False,
        sig_op="binary",
        rebal_freq="monthly",
        vol_scale=10,
        rebal_slip=1,
        thresh=2,
        pnl_name=sig + "_NEGBN",
    )
make_long_pnl #
The
make_long_pnl()
method adds a daily long-only PnL with an equal position signal across markets and time. This can serve as a baseline for comparison against the signal-adjusted returns. The
vol_scale
parameter is an ex-post scaling factor that adjusts the PnL to the given annualized volatility, which is useful for comparative visualization of the strategy’s performance relative to its risk level.
naive_pnl.make_long_pnl(vol_scale=10, label="Long_Only")
plot_pnls #
Available naive PnLs can be listed with the
pnl_names
attribute.
The
plot_pnls()
method of the
NaivePnl()
class is used to plot a line chart of cumulative PnL. The method can plot:

A single PnL category for a single cross-section, where the user can assign a single cross-section or “ALL” to the
pnl_cids
argument and a single PnL category to the
pnl_cats
argument. 
Multiple crosssections per PnL type, where the user can assign a list of crosssections to the
pnl_cids
argument and a single PnL category to thepnl_cats
argument. 
Multiple PnL types per crosssection, where the user can assign a list of PnL categories to the
pnl_cats
argument and a single crosssection or “ALL” to thepnl_cids
argument.
dict_labels = {
    "MACRO_AVGZ_NEGBN": "Binary composite macro trend pressure, % ar, in excess of benchmarks",
    "MACRO_OPTREG_NEGBN": "Binary optimized regression forecasts, % ar, in excess of benchmarks",
    "Long_Only": "Long-only",
    "XGDP_NEG_ZN4_NEGBN": "Binary excess GDP-based signal, % ar, in excess of benchmarks",
    "XCPI_NEG_ZN4_NEGBN": "Binary excess CPI-based signal, % ar, in excess of benchmarks",
    "XPCG_NEG_ZN4_NEGBN": "Binary excess private consumption-based signal, % ar, in excess of benchmarks",
}
naive_pnl.plot_pnls(
    pnl_cats=[
        "MACRO_AVGZ_NEGBN",
        "MACRO_OPTREG_NEGBN",
        "XGDP_NEG_ZN4_NEGBN",
        "XCPI_NEG_ZN4_NEGBN",
        "XPCG_NEG_ZN4_NEGBN",
        "Long_Only",
    ],
    xcat_labels=dict_labels,
    pnl_cids=["ALL"],
    start="2004-01-01",
    end="2023-12-31",
)
cids_sel = ["EUR", "GBP", "USD"]

naive_pnl.plot_pnls(
    pnl_cats=["MACRO_AVGZ_NEGPZN"],
    pnl_cids=cids_sel,
    start="2004-01-01",
    # end="2021-01-01",
)
evaluate_pnls #
The method evaluate_pnls() returns a small dataframe of key PnL statistics. For definitions of Sharpe and Sortino ratios, please see here. The table can display either multiple PnL categories or multiple cross-sections, but not both at the same time. It also shows the daily benchmark correlation of the PnLs.
df_eval = naive_pnl.evaluate_pnls(
    pnl_cats=["MACRO_AVGZ_NEGBN", "MACRO_OPTREG_NEGBN"],
    pnl_cids=["ALL"],
    start="2004-01-01",
    end="2023-12-01",
)
display(df_eval.astype("float").round(2))
| xcat | MACRO_AVGZ_NEGBN | MACRO_OPTREG_NEGBN |
| --- | --- | --- |
| Return (pct ar) | 11.60 | 11.10 |
| St. Dev. (pct ar) | 9.94 | 9.93 |
| Sharpe Ratio | 1.17 | 1.12 |
| Sortino Ratio | 1.71 | 1.64 |
| Max 21-day draw | 22.75 | 15.97 |
| Max 6-month draw | 36.21 | 21.62 |
| USD_DU05YXR_NSA correl | 0.05 | 0.23 |
| Traded Months | 240.00 | 240.00 |
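Headline statistics of this kind can be approximated from any daily PnL series. The helper below uses common definitions (annualization over 261 trading days, downside deviation from negative days only, drawdown as the worst change in cumulative PnL over a rolling 21-day window); it is a sketch under those assumptions, not the exact evaluate_pnls() implementation, and the toy series is invented:

```python
import numpy as np
import pandas as pd

def pnl_stats(pnl: pd.Series) -> dict:
    # annualize mean and standard deviation over 261 trading days
    ann_ret = pnl.mean() * 261
    ann_vol = pnl.std() * np.sqrt(261)
    # downside deviation: standard deviation of negative days only
    downside = pnl[pnl < 0].std() * np.sqrt(261)
    cum = pnl.cumsum()
    # worst change in cumulative PnL over any 21-trading-day window
    max_21d_draw = (cum - cum.shift(21)).min()
    return {
        "Return (pct ar)": ann_ret,
        "St. Dev. (pct ar)": ann_vol,
        "Sharpe Ratio": ann_ret / ann_vol,
        "Sortino Ratio": ann_ret / downside,
        "Max 21-day draw": max_21d_draw,
    }

rng = np.random.default_rng(3)
toy_pnl = pd.Series(rng.normal(0.05, 0.6, 1000))
stats = pnl_stats(toy_pnl)
```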
signal_heatmap #
The signal_heatmap() method creates a heatmap of signals for a specific PnL across time and cross-sections. The time axis refers to period averages; the default frequency is monthly (freq='m'), but quarterly is also an option (freq='q').
The heatmap displays each signal as a colored square, with the color representing the signal value. The user selects the particular strategy via the pnl_name argument. By default, the method plots all available cross-sections. The heatmap provides an intuitive representation of the signal values, allowing the user to identify patterns and trends across time and cross-sections.
The signal_heatmap() method includes a color bar legend that maps signal values to their corresponding colors. If a threshold value was provided in the make_pnl() method, the signal_heatmap() method caps the largest contributions at that threshold. This truncation ensures that signals with extreme values beyond the threshold do not dominate the visualization, which is important from a risk management perspective.
naive_pnl.signal_heatmap(
    pnl_name="MACRO_OPTREG_NEGPZN",
    pnl_cids=["EUR", "USD", "GBP"],
    freq="q",
    start="2004-01-01",
)
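The matrix behind such a heatmap is simply the period average of the daily signal per cross-section. A sketch with toy long-format data (column names and values are illustrative assumptions, not JPMaQS output):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
dates = pd.bdate_range("2020-01-01", periods=260)

# toy daily signals for three cross-sections in long format
df = pd.DataFrame({
    "real_date": np.tile(dates, 3),
    "cid": np.repeat(["EUR", "GBP", "USD"], len(dates)),
    "signal": rng.normal(0.0, 1.0, 3 * len(dates)),
})

# freq="q": average the signal per cross-section and quarter, giving
# one row per cross-section and one column per period
df["period"] = df["real_date"].dt.to_period("Q")
heat = df.pivot_table(index="cid", columns="period", values="signal", aggfunc="mean")
```

Each cell of `heat` is then rendered as one colored square of the heatmap.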
agg_signal_bars #
The method agg_signal_bars() indicates the strength and direction of the aggregate signal. If metric='direction' is chosen, it simply adds up the signal across all cross-sections, so long and short signals cancel each other out. If metric='strength' is selected, the aggregate absolute signal is displayed, with no offsetting. The method helps to visualize the overall direction of the aggregate signal and to gauge the proportional exposure to the signal by measuring its absolute size. It answers the question: "Is the PnL generated by large returns or by a large signal?"
naive_pnl.agg_signal_bars(
    pnl_name="MACRO_OPTREG_NEGPZN",
    freq="q",
    metric="direction",
)

naive_pnl.agg_signal_bars(
    pnl_name="MACRO_OPTREG_NEGPZN",
    freq="q",
    metric="strength",
)
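The difference between the two metrics reduces to summing signed versus absolute signals. A minimal sketch with made-up quarterly averages (the numbers and index labels are purely illustrative):

```python
import pandas as pd

# toy quarterly average signals per cross-section: the long EUR position
# largely offsets the short USD position
signals = pd.DataFrame(
    {"EUR": [1.2, 0.8], "GBP": [0.3, -0.4], "USD": [-1.0, -0.9]},
    index=["2020Q1", "2020Q2"],
)

# metric="direction": longs and shorts cancel each other out
direction = signals.sum(axis=1)

# metric="strength": absolute exposures add up, no offsetting
strength = signals.abs().sum(axis=1)
```

Here the net direction in 2020Q1 is small even though the book carries sizable gross exposure, which is exactly the distinction the two bar charts are meant to reveal.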