Signal optimization basics #
This notebook illustrates the points discussed in the post “Optimizing macro trading signals – A practical introduction” on the Macrosynergy website. It demonstrates how sequential signal optimization can be performed using the `macrosynergy.learning` subpackage together with the popular `scikit-learn` package. The post applies statistical learning methods to the sequential optimization of three important tasks: feature selection, return prediction, and market regime classification. The notebook uses the small subset of the JPMaQS dataset available on Kaggle.
The notebook is organized into three main sections:

Get Packages and JPMaQS Data: This section is dedicated to installing and importing the necessary Python packages for the analysis. It includes standard Python libraries like pandas and seaborn, as well as the `scikit-learn` package and the specialized `macrosynergy` package.
Transformations and Checks: In this part, the notebook conducts data calculations and transformations to derive relevant signals and targets for the analysis. This involves normalizing feature variables using z-scores and constructing simple linear composite indicators. A notable composite indicator, `MACRO_AVGZ`, is created by combining four quantamental indicators (excess GDP growth, excess inflation, excess private credit growth, and the real 5-year yield). This composite indicator, previously utilized in the Kaggle notebook “Trading strategies with JPMaQS”, serves as a benchmark for the subsequent machine learning applications. The primary goal is to assess whether sequential optimization enhances predictive power and value generation compared to an unweighted composite.
The third part exemplifies three applications of machine learning:

Feature selection: This segment employs statistical learning to sequentially choose an optimal method for selecting feature scores. The resulting indicator `MACRO_OPTSELZ` is compared with the simple unweighted composite score `MACRO_AVGZ` using the standard value checks and performance metrics applied in previous posts and notebooks.
Prediction: This segment focuses on selecting the optimal method for predicting monthly target returns and applying its predictions as signals. The outcome `MACRO_OPTREG` is also compared with the non-optimized, unweighted composite signal `MACRO_AVGZ`.
Classification: Statistical learning is applied to classify the rates market environment as “good” or “bad” for the target return. An optimal classifier of the direction of market returns is chosen (`MACRO_OPTCLASS`), and the predicted class serves as a binary trading signal for each currency area. The unweighted linear composite `MACRO_AVGZ` again serves as the benchmark.

Notably, this notebook is the first to utilize the subpackage `macrosynergy.learning`, which integrates the `macrosynergy` package and associated JPMaQS data with the widely used `scikit-learn` library. This notebook establishes the basic statistical learning applications that support trading signal generation through sequential optimization based on panel cross-validation.
Get packages and JPMaQS data #
This notebook primarily relies on the standard packages available in the Python data science stack. However, the `macrosynergy` package is additionally required for two purposes:

Downloading JPMaQS data: The macrosynergy package facilitates the retrieval of JPMaQS data used in the notebook. For users of the free Kaggle subset, this part of the `macrosynergy` package is not required.
Analyzing quantamental data and value propositions: The macrosynergy package provides functionality for quick analyses of quantamental data and for exploring value propositions. The new subpackage `macrosynergy.learning` integrates the `macrosynergy` package and associated JPMaQS data with the widely used `scikit-learn` library and is used for sequential signal optimization.
For detailed information and a comprehensive understanding of the macrosynergy package and its functionalities, please refer to the “Introduction to Macrosynergy package” notebook on the Macrosynergy Quantamental Academy or visit the following link on Kaggle.
# Run only if needed!
"""
%%capture
! pip install macrosynergy upgrade"""
'\n%%capture\n! pip install git+https://github.com/macrosynergy/macrosynergy@develop'
import os
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import (
make_scorer,
balanced_accuracy_score,
r2_score,
)
import macrosynergy.management as msm
import macrosynergy.panel as msp
import macrosynergy.pnl as msn
import macrosynergy.signal as mss
import macrosynergy.learning as msl
from macrosynergy.download import JPMaQSDownload
import warnings
warnings.simplefilter("ignore")
The JPMaQS indicators we consider are downloaded using the J.P. Morgan DataQuery API interface within the `macrosynergy` package. This is done by specifying ticker strings, formed by appending an indicator category code to a currency area code, inside the DataQuery expression `DB(JPMAQS,<cross_section>_<category>,<info>)`, where `<info>` is one of:

value: the latest available values for the indicator,
eop_lag: days elapsed since the end of the observation period,
mop_lag: days elapsed since the median observation period, and
grade: a grade of the observation, giving a metric of real-time information quality.
After instantiating the `JPMaQSDownload` class within the `macrosynergy.download` module, one can use its `download(tickers, start_date, metrics)` method to easily download the necessary data, where `tickers` is an array of ticker strings, `start_date` is the first collection date to be considered, and `metrics` is an array comprising the time series information to be downloaded. For more information see here or use the free dataset on Kaggle.
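As an aside, the expression construction described above can be sketched in plain Python. The helper below is hypothetical (it is not part of the `macrosynergy` package) and simply illustrates the `DB(JPMAQS,<cross_section>_<category>,<info>)` format:

```python
# Hypothetical helper: build full DataQuery expressions from cross-section
# codes, category codes, and metric names, per the format described above.
def make_expressions(cids, xcats, metrics=("value",)):
    tickers = [f"{cid}_{xcat}" for cid in cids for xcat in xcats]
    return [f"DB(JPMAQS,{t},{m})" for t in tickers for m in metrics]

exprs = make_expressions(["USD"], ["RYLDIRS05Y_NSA"], metrics=["value", "eop_lag"])
# exprs[0] is "DB(JPMAQS,USD_RYLDIRS05Y_NSA,value)"
```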
In the cell below, we specify the cross-sections used for the analysis. For the abbreviations, please see About Dataset.
# Cross-sections of interest
cids_dm = ["AUD", "CAD", "CHF", "EUR", "GBP", "JPY", "NOK", "NZD", "SEK", "USD"]
cids_em = [
"CLP",
"COP",
"CZK",
"HUF",
"IDR",
"ILS",
"INR",
"KRW",
"MXN",
"PLN",
"THB",
"TRY",
"TWD",
"ZAR",
]
cids = cids_dm + cids_em
cids_du = cids_dm + cids_em
cids_dux = list(set(cids_du) - set(["IDR", "NZD"]))
cids_xg2 = list(set(cids_dux) - set(["EUR", "USD"]))
# Quantamental categories of interest
main = [
"RYLDIRS05Y_NSA",
"INTRGDPv5Y_NSA_P1M1ML12_3MMA",
"CPIC_SJA_P6M6ML6AR",
"CPIH_SA_P1M1ML12",
"INFTEFF_NSA",
"PCREDITBN_SJA_P1M1ML12",
"RGDP_SA_P1Q1QL4_20QMA",
]
mkts = [
"DU05YXR_VT10",
"FXTARGETED_NSA",
"FXUNTRADABLE_NSA"
]
xcats = main + mkts
The description of each JPMaQS category is available under the Macro Quantamental Academy, on JPMorgan Markets (password protected), or on Kaggle (for the tickers used in this notebook only). In particular, the set used for this notebook draws on Consumer price inflation trends, Inflation targets, Intuitive growth estimates, Long-term GDP growth, Private credit expansion, Duration returns, and FX tradeability and flexibility.
# Resultant tickers for download
tickers = [cid + "_" + xcat for cid in cids for xcat in xcats]
# Download series from J.P. Morgan DataQuery by tickers
start_date = "20000101"
end_date = None
# Retrieve credentials
oauth_id = os.getenv("DQ_CLIENT_ID") # Replace with own client ID
oauth_secret = os.getenv("DQ_CLIENT_SECRET") # Replace with own secret
# Download from DataQuery
with JPMaQSDownload(client_id=oauth_id, client_secret=oauth_secret) as downloader:
    df = downloader.download(
        tickers=tickers,
        start_date=start_date,
        end_date=end_date,
        metrics=["value"],
        suppress_warning=True,
        show_progress=True,
    )
dfx = df.copy()
dfx.info()
Downloading data from JPMaQS.
Timestamp UTC: 2024-03-05 12:05:54
Connection successful!
Requesting data: 100% 12/12 [00:02<00:00, 4.71it/s]
Downloading data: 100% 12/12 [00:08<00:00, 1.47it/s]
Some expressions are missing from the downloaded data. Check logger output for complete list.
2 out of 240 expressions are missing. To download the catalogue of all available expressions and filter the unavailable expressions, set `get_catalogue=True` in the call to `JPMaQSDownload.download()`.
Some dates are missing from the downloaded data.
2 out of 6309 dates are missing.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1423126 entries, 0 to 1423125
Data columns (total 4 columns):
 #   Column     Non-Null Count    Dtype
---  ------     --------------    -----
 0   real_date  1423126 non-null  datetime64[ns]
 1   cid        1423126 non-null  object
 2   xcat       1423126 non-null  object
 3   value      1423126 non-null  float64
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 43.4+ MB
Availability and blacklisting #
It is essential to assess data availability before conducting any analysis. Doing so identifies potential gaps or limitations in the dataset, which can affect the validity and reliability of the analysis, ensures that a sufficient number of observations is available for each selected category and cross-section, and determines the appropriate periods for analysis.
The `missing_in_df()` function in `macrosynergy.management` allows the user to quickly check whether all requested categories have been downloaded.
msm.missing_in_df(df, xcats=xcats, cids=cids)
Missing xcats across df: []
Missing cids for CPIC_SJA_P6M6ML6AR: []
Missing cids for CPIH_SA_P1M1ML12: []
Missing cids for DU05YXR_VT10: []
Missing cids for FXTARGETED_NSA: ['USD']
Missing cids for FXUNTRADABLE_NSA: ['USD']
Missing cids for INFTEFF_NSA: []
Missing cids for INTRGDPv5Y_NSA_P1M1ML12_3MMA: []
Missing cids for PCREDITBN_SJA_P1M1ML12: []
Missing cids for RGDP_SA_P1Q1QL4_20QMA: []
Missing cids for RYLDIRS05Y_NSA: []
The `check_availability()` function in `macrosynergy.management` displays the start dates from which each category is available for each requested country, as well as missing dates or unavailable series.
msm.check_availability(df=dfx, xcats=xcats, cids=cids, missing_recent=False)
Identifying and isolating periods of official exchange rate targets, illiquidity, or convertibility-related distortions in FX markets is the first step in creating an FX trading strategy. These periods can significantly impact the behavior and dynamics of currency markets, and failing to account for them can lead to inaccurate or misleading findings. The `make_blacklist()` helper function creates a standardized dictionary of blacklist periods:
# Create blacklisting dictionary
dfb = df[df["xcat"].isin(["FXTARGETED_NSA", "FXUNTRADABLE_NSA"])].loc[
:, ["cid", "xcat", "real_date", "value"]
]
dfba = (
dfb.groupby(["cid", "real_date"])
.aggregate(value=pd.NamedAgg(column="value", aggfunc="max"))
.reset_index()
)
dfba["xcat"] = "FXBLACK"
fxblack = msp.make_blacklist(dfba, "FXBLACK")
fxblack
{'CHF': (Timestamp('2011-10-03 00:00:00'), Timestamp('2015-01-30 00:00:00')),
 'CZK': (Timestamp('2014-01-01 00:00:00'), Timestamp('2017-07-31 00:00:00')),
 'ILS': (Timestamp('2000-01-03 00:00:00'), Timestamp('2005-12-30 00:00:00')),
 'INR': (Timestamp('2000-01-03 00:00:00'), Timestamp('2004-12-31 00:00:00')),
 'THB': (Timestamp('2007-01-01 00:00:00'), Timestamp('2008-11-28 00:00:00')),
 'TRY_1': (Timestamp('2000-01-03 00:00:00'), Timestamp('2003-09-30 00:00:00')),
 'TRY_2': (Timestamp('2020-01-01 00:00:00'), Timestamp('2024-03-04 00:00:00'))}
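For intuition, the core logic behind such a blacklist dictionary can be sketched for a single cross-section: contiguous runs of 1s in a date-indexed 0/1 flag series become (start, end) tuples. This is a simplified stand-in, not the actual `make_blacklist()` implementation:

```python
import pandas as pd

def runs_to_ranges(flags: pd.Series):
    """Convert a 0/1 date-indexed series into (start, end) tuples of 1-runs,
    mimicking the shape of the make_blacklist() output for one cross-section."""
    ranges, start, prev = [], None, None
    for date, val in flags.items():
        if val == 1 and start is None:
            start = date          # a blacklist run begins
        elif val == 0 and start is not None:
            ranges.append((start, prev))  # run ended on the previous date
            start = None
        prev = date
    if start is not None:         # run still open at the end of the sample
        ranges.append((start, flags.index[-1]))
    return ranges

idx = pd.bdate_range("2014-01-01", "2014-01-10")        # 8 business days
flags = pd.Series([0, 1, 1, 1, 0, 0, 1, 1], index=idx)  # two blacklist runs
```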
Transformation and checks #
Signal constituent candidates #
In this part of the analysis, we create a simple, plausible composite signal based on four quantamental indicators:

Excess GDP growth: intuitive real-time estimates of GDP growth relative to a currency area's longer-term trend.

Excess inflation: the difference between information states of consumer price inflation (view documentation here) and a currency area's estimated effective inflation target (view documentation here).

Excess private credit growth: the difference between annual growth rates of private credit, statistically adjusted for jumps (view documentation here), and the sum of a currency area's 5-year median GDP growth and its effective inflation target.

Real 5-year yield: the 5-year swap yield (view documentation here) minus the 5-year-ahead estimated inflation expectation according to a Macrosynergy methodology (view documentation here).
calcs = [
    "XGDP_NEG = - INTRGDPv5Y_NSA_P1M1ML12_3MMA",
    "XCPI_NEG = - ( CPIC_SJA_P6M6ML6AR + CPIH_SA_P1M1ML12 ) / 2 + INFTEFF_NSA",
    "XPCG_NEG = - PCREDITBN_SJA_P1M1ML12 + INFTEFF_NSA + RGDP_SA_P1Q1QL4_20QMA",
]
dfa = msp.panel_calculator(dfx, calcs=calcs, cids=cids)
dfx = msm.update_df(dfx, dfa)
Individual and average z-scores #
Normalizing values across different categories is a common practice in macroeconomics. This is particularly important when summing or averaging categories with different units and time series properties. Using macrosynergy's custom function `make_zn_scores()`, we normalize the selected indicators around a neutral value (zero), using only past information. Re-estimation is done on a monthly basis, and we protect against outliers using 3 standard deviations as the threshold. The normalized indicators receive the postfix `_ZN4`. These four normalized scores are then averaged using the `linear_composite()` function from the `macrosynergy` package.
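The essence of this out-of-sample normalization can be sketched for a single series. This is a simplified sketch only: `make_zn_scores()` re-estimates at the chosen frequency and can pool cross-sections via `pan_weight`:

```python
import pandas as pd

def expanding_zn_score(series: pd.Series, neutral=0.0, thresh=3.0, min_obs=12):
    """Z-score around a fixed neutral level using only past information:
    the scale is the expanding mean absolute deviation, lagged one period."""
    dev = (series - neutral).abs()
    scale = dev.expanding(min_periods=min_obs).mean().shift(1)
    z = (series - neutral) / scale
    return z.clip(-thresh, thresh)  # winsorize outliers at the threshold
```

The one-period shift ensures the score at time t uses information up to t-1 only; division by the mean absolute deviation mirrors the zn-score convention of standardizing by mean absolute value rather than the standard deviation.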
macros = ["XGDP_NEG", "XCPI_NEG", "XPCG_NEG", "RYLDIRS05Y_NSA"]
xcatx = macros
for xc in xcatx:
    dfa = msp.make_zn_scores(
        dfx,
        xcat=xc,
        cids=cids,
        neutral="zero",
        thresh=3,
        est_freq="M",
        pan_weight=1,
        postfix="_ZN4",
    )
    dfx = msm.update_df(dfx, dfa)
dfa = msp.linear_composite(
df=dfx,
xcats=[xc + "_ZN4" for xc in xcatx],
cids=cids,
new_xcat="MACRO_AVGZ",
)
dfx = msm.update_df(dfx, dfa)
The macrosynergy package provides two useful functions, `view_ranges()` and `view_timelines()`. The latter facilitates convenient data visualization for selected indicators and cross-sections, here plotting the time series of the four chosen z-scores for selected cross-sections.
macroz = [m + "_ZN4" for m in macros]
xcatx = macroz
msp.view_timelines(
dfx,
xcats=xcatx,
cids=cids_dux,
ncol=4,
start="20000101",
title="Quantamental indicators, zscores, daily information states (> 0 means presumed positive IRS return impact)",
title_fontsize=30,
same_y=False,
cs_mean=False,
xcat_labels=["Excess growth", "Excess inflation", "Excess credit growth", "Real yield"],
legend_fontsize=16,
)
Here we plot, with the function `view_timelines()`, the resulting unweighted and unoptimized composite score `MACRO_AVGZ`. This indicator will be used as the benchmark signal for evaluating whether machine learning applications can improve the value generated by the signal.
xcatx = ["MACRO_AVGZ"]
msp.view_timelines(
dfx,
xcats=xcatx,
cids=cids_dux,
ncol=4,
start="20000101",
title="Composite equallyweighted quantamental macro score (> 0 means presumed positive IRS return impact)",
title_fontsize=30,
same_y=False,
cs_mean=False,
xcat_labels=None,
)
The Trading strategies with JPMaQS notebook goes into more detail on how well this composite signal performs for various countries, for relative returns, and so forth. The purpose of this notebook is different: it compares the simplest strategy based on this signal for all available cross-sections with the three potential machine learning enhancements.
Features and targets for scikit-learn #
As the first preparation for machine learning, we downsample the daily information states to a monthly frequency with the help of the `categories_df()` function, applying a lag of one month and using the last value in each month for the explanatory variables and the sum for the aggregated target (return). As explanatory variables, we use the separate z-scores of excess GDP growth, excess inflation, excess private credit growth, and the real 5-year yield (all with the postfix `_ZN4`). As a target, we use `DU05YXR_VT10`, the duration return for a 10% vol target, 5-year maturity.
# Specify features and target category
xcatx = macroz + ["DU05YXR_VT10"]
# Downsample from daily to monthly frequency (features as last and target as sum)
dfw = msm.categories_df(
df=dfx,
xcats=xcatx,
cids=cids_dux,
freq="M",
lag=1,
blacklist=fxblack,
xcat_aggs=["last", "sum"],
)
# Drop rows with missing values and assign features and target
dfw.dropna(inplace=True)
X = dfw.iloc[:, :-1]
y = dfw.iloc[:, -1]
Types of cross-validation #
The `ExpandingIncrementPanelSplit()` class of the `macrosynergy` package is designed to generate temporally expanding training panels at fixed intervals, followed by subsequent test sets of typically short, fixed time spans. This setup replicates sequential learning scenarios, where information sets grow at fixed intervals. In this implementation, the training set expands by 12 months at each subsequent split. The initial training set requires a minimum of 36 months for at least 4 currency areas, and each test set has a fixed length of 12 months. This class facilitates the generation of splits for sequential training, sequential validation, and walk-forward validation across a given panel. The accompanying plot illustrates five key points in the process:

The initial split

Progress at one-quarter completion

Halfway progress

Three-quarter progress

The final split
split_xi = msl.ExpandingIncrementPanelSplit(train_intervals=12, min_periods=36, test_size=12)
split_xi.visualise_splits(X, y)
The `ExpandingKFoldPanelSplit` class allows instantiating panel splitters where a fixed number of splits is implemented, but temporally adjacent panel training sets always precede test sets chronologically, and the time span of the training sets increases with the implied date of the train-test split. It is equivalent to scikit-learn's `TimeSeriesSplit` but adapted for panels.
split_xkf = msl.ExpandingKFoldPanelSplit(n_splits=5)
split_xkf.visualise_splits(X, y)
The `RollingKFoldPanelSplit` class instantiates splitters where temporally adjacent panel training sets of fixed joint maximum time spans can border the test set from both the past and the future. Thus, most folds do not respect chronological order but allow training with both past and future information. While this does not simulate the evolution of information, it makes better use of the available data and is often acceptable for macro data, as economic regimes come in cycles. It is equivalent to scikit-learn's `KFold` class but adapted for panels.
split_rkf = msl.RollingKFoldPanelSplit(n_splits=5)
split_rkf.visualise_splits(X, y)
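The expanding-increment scheme can be sketched in a few lines for a single monthly date index. This is a simplified stand-in for `ExpandingIncrementPanelSplit`, ignoring the per-cross-section minimum requirement:

```python
import pandas as pd

def expanding_splits(dates, min_periods=36, train_interval=12, test_size=12):
    """Return (train, test) date blocks: the training window starts at
    min_periods observations and grows by train_interval at each split."""
    splits = []
    end = min_periods
    while end < len(dates):
        splits.append((dates[:end], dates[end:end + test_size]))
        end += train_interval
    return splits

months = pd.period_range("2000-01", periods=72, freq="M")
splits = expanding_splits(months)  # three splits for a 72-month index
```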
Feature selection #
The first example of a machine learning application is feature selection, where the features identified as “important” at each recalibration date are combined into a signal by averaging. Thus, the optimal signal used at each recalibration date is an equally weighted mean of the feature subset recommended by the best model up to that date.
Sequential optimization #
The purpose of the `Pipeline` is to assemble several steps that can be cross-validated together while setting different parameters. The two principal models and a set of model hyperparameters are defined below:

`LASSO_Z` (Least Absolute Shrinkage and Selection Operator) determines the features that have jointly been significant in predicting returns in a linear regression. We consider alpha values of 10, 1, 0.1, and 0.01 for the hyperparameter grid.
`MAP_Z` (Macrosynergy panel test) assesses the significance of features through the Macrosynergy panel test. For the hyperparameter grid, we consider p-values of 1%, 5%, 10%, and 20%.

Both models use `FeatureAverager()` and `NaivePredictor()` from the `macrosynergy` package.
# Define models and grids for optimization
mods_fsz = {
"LASSO_Z": Pipeline(
[
("selector", msl.LassoSelector(alpha=0.1, positive=True)),
("zscore", msl.FeatureAverager()),
("predictor", msl.NaivePredictor()),
]
),
"MAP_Z": Pipeline(
[
("selector", msl.MapSelector(threshold=0.05, positive=True)),
("zscore", msl.FeatureAverager()),
("predictor", msl.NaivePredictor()),
]
),
}
grids_fsz = {
"LASSO_Z": {
"selector__alpha": [10, 1.0, 1e1, 1e2],
},
"MAP_Z": {
"selector__threshold": [0.01, 0.05, 0.1, 0.2],
},
}
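To see why alpha is the key hyperparameter of the LASSO selector, the sketch below shows, on synthetic data and with plain scikit-learn (not the `msl.LassoSelector` implementation), how larger penalties shrink more coefficients to exactly zero and thereby deselect features:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
# Only features 0 and 1 actually drive the synthetic target.
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# A feature is "selected" if its (non-negative) Lasso coefficient is non-zero;
# larger alpha shrinks more coefficients to exactly zero.
selected = {}
for alpha in [0.01, 0.1, 1.0]:
    coefs = Lasso(alpha=alpha, positive=True).fit(X, y).coef_
    selected[alpha] = set(np.flatnonzero(coefs > 0))
```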
The standard `make_scorer()` function from the `scikit-learn` library is used to create a scorer object that evaluates performance on the test set. The scorer is based on `macrosynergy`'s `panel_significance_probability` function, which fits a linear mixed effects model between the true returns and the predicted returns and returns the significance of the model slope. This scorer is specific to panel quantamental data, such as JPMaQS.
# Define the optimization criterion
score_fsz = make_scorer(msl.panel_significance_probability)
# Define splits for crossvalidation
splitter_fsz = msl.RollingKFoldPanelSplit(n_splits=4)
The actual model is then selected using the `SignalOptimizer` class from the `macrosynergy` package. This class calculates quantamental predictions based on adaptive hyperparameters (the alphas and p-values defined above) and model selection (`LASSO_Z` and `MAP_Z`). This customized class is based on the `scikit-learn` functions `GridSearchCV` and `RandomizedSearchCV`.
The “heatmap” displays the actual selected model, which changes frequently in the first ten years but then settles on a LASSO selector with a low penalty (0.01) and a panel test selector with a restrictive p-value threshold of 1%:
%%time
# Signal optimization
so_fsz = msl.SignalOptimizer(inner_splitter=splitter_fsz, X=X, y=y, blacklist=fxblack)
so_fsz.calculate_predictions(
name="MACRO_OPTSELZ",
models=mods_fsz,
hparam_grid=grids_fsz,
metric=score_fsz,
min_cids=4,
min_periods=36,
)
# Get optimized signals and view models heatmap
dfa = so_fsz.get_optimized_signals()
som = so_fsz.models_heatmap(
name="MACRO_OPTSELZ",
cap=6,
title="Optimal selection models over time",
figsize=(18, 6),
)
display(som)
dfx = msm.update_df(dfx, dfa)
100% 253/253 [05:30<00:00, 1.31s/it]
None
Wall time: 7min 6s
The function `view_timelines()` conveniently displays the original, non-optimized composite signal `MACRO_AVGZ` and the trading signal based on optimized feature selection, `MACRO_OPTSELZ`. The latter shares a good part of the dynamics but displays more abrupt changes and higher volatility due to frequent model changes.
xcatx = ["MACRO_AVGZ", "MACRO_OPTSELZ"]
msp.view_timelines(
dfx,
xcats=xcatx,
cids=cids_dux,
ncol=4,
start="20040101",
title="Composite signal scores: simple (blue) and with sequentially optimized selection (orange)",
title_fontsize=30,
same_y=False,
cs_mean=False,
xcat_labels=["Simple average score", "Average score with optimized selection"],
legend_fontsize=16,
)
Value checks #
This part tests accuracy and significance levels for the non-optimized composite signal `MACRO_AVGZ` and the trading signal based on optimized feature selection, `MACRO_OPTSELZ`, according to the common metrics of accuracy, precision, and probability values.
The `SignalReturnRelations` class from the macrosynergy.signal module is specifically designed to analyze, visualize, and compare the relationships between panels of trading signals (`MACRO_AVGZ` and `MACRO_OPTSELZ`) and panels of subsequent returns (`DU05YXR_VT10`).
## Compare optimized signals with simple average zscores
srr = mss.SignalReturnRelations(
df=dfx,
rets=["DU05YXR_VT10"],
sigs=["MACRO_AVGZ", "MACRO_OPTSELZ"],
cosp=True,
freqs=["M"],
agg_sigs=["last"],
start="20040101",
blacklist=fxblack,
slip=1,
)
tbl_srr = srr.signals_table()
The interpretations of the columns of the summary table can be found here
display(tbl_srr.astype("float").round(3))
|               | accuracy | bal_accuracy | pos_sigr | pos_retr | pos_prec | neg_prec | pearson | pearson_pval | kendall | kendall_pval | auc   |
|---------------|----------|--------------|----------|----------|----------|----------|---------|--------------|---------|--------------|-------|
| MACRO_AVGZ    | 0.537    | 0.535        | 0.536    | 0.536    | 0.568    | 0.501    | 0.100   | 0.0          | 0.062   | 0.0          | 0.535 |
| MACRO_OPTSELZ | 0.548    | 0.550        | 0.480    | 0.536    | 0.588    | 0.512    | 0.093   | 0.0          | 0.074   | 0.0          | 0.550 |
The `NaivePnL()` class is designed to provide a quick and simple overview of a stylized PnL profile of a set of trading signals. The class is labeled naive because its methods do not consider transaction costs or position limitations, such as risk management considerations. This is deliberate, because costs and limitations are specific to trading size, institutional rules, and regulations.
Here, the comparison is made between PnLs based on two signals, `MACRO_AVGZ` and `MACRO_OPTSELZ`. These are the main options chosen for the calculation:

The target is `DU05YXR_VT10`,
the rebalancing frequency (`rebal_freq`) for positions according to the signal is monthly,
the rebalancing slippage (`rebal_slip`) in days is 1, which means that it takes one day to rebalance the position and that the new position produces PnL from the second day after the signal has been recorded,
the `zn_score_pan` option transforms raw signals into z-scores around a zero value based on the whole panel, i.e., the neutral level and standard deviation use the whole cross-section of the panel. Zn-score here means a standardized score with zero as the neutral level and standardization through division by the mean absolute value,
the threshold value (`thresh`) is the level beyond which scores are winsorized, i.e., contained at that threshold. This is often realistic, as risk management and the potential of signal value distortions typically preclude outsized and concentrated positions within a strategy. We apply a threshold of 3.
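For intuition, the z-scoring-with-winsorization step can be sketched on a small wide panel. This is a simplified, in-sample stand-in for the `sig_op="zn_score_pan"` transformation, not the `NaivePnL` implementation:

```python
import pandas as pd

def zn_score_panel(signal: pd.DataFrame, neutral=0.0, thresh=3.0):
    """Standardize a wide panel (dates x cross-sections) around a neutral
    level by the pooled mean absolute deviation, then winsorize."""
    scale = (signal - neutral).abs().stack().mean()  # pooled over the panel
    z = (signal - neutral) / scale
    return z.clip(lower=-thresh, upper=thresh)

raw = pd.DataFrame({"AUD": [1.0, -2.0, 4.0], "USD": [0.5, 1.0, -1.0]})
z = zn_score_panel(raw)
```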
The `plot_pnls()` method of the `NaivePnL()` class is used to plot a line chart of the cumulative PnL associated with both signals.
sigs = ["MACRO_AVGZ", "MACRO_OPTSELZ"]
pnl = msn.NaivePnL(
df=dfx,
ret="DU05YXR_VT10",
sigs=sigs,
cids=cids,
start="20040101",
blacklist=fxblack,
bms="USD_DU05YXR_NSA",
)
for sig in sigs:
    pnl.make_pnl(
        sig=sig,
        sig_op="zn_score_pan",
        rebal_freq="monthly",
        neutral="zero",
        rebal_slip=1,
        vol_scale=10,
        thresh=3,
    )
pnl.plot_pnls(
title="Naive PnLs for average scores, simple and optimized selection",
title_fontsize=14,
xcat_labels=["Simple average score", "Average score with optimized selection"],
)
pcats = ["PNL_" + sig for sig in sigs]
USD_DU05YXR_NSA has no observations in the DataFrame.
The method `evaluate_pnls()` returns a small dataframe of key PnL statistics for the tested strategies.
pnl.evaluate_pnls(pnl_cats=pcats)
| xcat              | PNL_MACRO_AVGZ | PNL_MACRO_OPTSELZ |
|-------------------|----------------|-------------------|
| Return (pct ar)   | 10.66697       | 11.325036         |
| St. Dev. (pct ar) | 10.0           | 10.0              |
| Sharpe Ratio      | 1.066697       | 1.132504          |
| Sortino Ratio     | 1.680606       | 1.792596          |
| Max 21-day draw   | -22.0995       | -17.918406        |
| Max 6-month draw  | -30.996404     | -23.966588        |
| Traded Months     | 243            | 243               |
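The headline ratios in such a table follow from standard definitions, sketched below for a daily PnL series. Assumptions: 261 trading days per year, and a Sortino computed with downside deviation over all observations; the actual `evaluate_pnls()` implementation may differ in detail:

```python
import numpy as np

def annualized_ratios(daily_pnl: np.ndarray, days_per_year: int = 261):
    """Return (Sharpe, Sortino) for a daily percentage-PnL series."""
    ann_ret = daily_pnl.mean() * days_per_year
    ann_vol = daily_pnl.std() * np.sqrt(days_per_year)
    # Downside deviation: root mean square of negative days over all days.
    downside = np.sqrt((daily_pnl[daily_pnl < 0] ** 2).sum() / len(daily_pnl))
    return ann_ret / ann_vol, ann_ret / (downside * np.sqrt(days_per_year))

rng = np.random.default_rng(1)
pnl = rng.normal(loc=0.02, scale=0.6, size=5000)  # synthetic daily PnL in %
sharpe, sortino = annualized_ratios(pnl)
```

Because the downside deviation is never larger than the total volatility, the Sortino ratio exceeds the Sharpe ratio whenever the mean PnL is positive, as in the table above.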
Prediction #
Sequential optimization #
The second application of statistical learning methods involves sequentially choosing an optimal method for predicting monthly target returns and then applying its predictions as a signal. Thus, at the end of each month, the method re-optimizes model and hyperparameters, which are then used to derive the signal for the next month.
Two possible models are defined here:

Linear regression with and without intercept. By specifying positive=True, we force the coefficients to be positive.
k-nearest neighbors regression: a non-parametric regression implemented with scikit-learn's KNeighborsRegressor. Here, we specify the candidate set for the number of neighbors as [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096], as well as the weights function used in prediction. The choices are:

‘uniform’: uniform weights, where all points in each neighborhood are weighted equally.

‘distance’: weight points by the inverse of their distance. In this case, closer neighbors of a query point have a greater influence than neighbors which are further away.
# Define models and grids for optimization
mods_reg = {"knnr": KNeighborsRegressor(), "linreg": LinearRegression(positive=True)}
grids_reg = {
"knnr": {
"n_neighbors": [2**i for i in range(2, 13)],
"weights": ["uniform", "distance"],
},
"linreg": {"fit_intercept": [True, False]},
}
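The effect of the two weighting schemes can be seen in a minimal hand-rolled k-NN regression, shown here in plain numpy for a 1-D feature (exact ties with a training point would need special handling, as in scikit-learn):

```python
import numpy as np

def knn_predict(x_train, y_train, x, k=3, weights="uniform"):
    """Predict y at x by averaging the k nearest training targets,
    equally ('uniform') or by inverse distance ('distance')."""
    d = np.abs(x_train - x)
    idx = np.argsort(d)[:k]
    if weights == "uniform":
        return float(y_train[idx].mean())
    w = 1.0 / d[idx]  # assumes no zero distances
    return float(np.sum(w * y_train[idx]) / np.sum(w))

x_train = np.array([0.0, 1.0, 2.0, 10.0])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
```

At x = 0.5 with k = 3, the uniform prediction averages the three nearest targets to 1.0, while distance weighting pulls the prediction toward the two closest points (5/7 ≈ 0.714).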
The standard `make_scorer()` function from the `scikit-learn` library is again used to create a scorer object for evaluating performance on the test set. The scorer is based on the R-squared (coefficient of determination) regression score function, and the chosen cross-validation splitter here is the `RollingKFoldPanelSplit`.
# Define the optimization criterion
score_reg = make_scorer(r2_score, greater_is_better=True)
# Define splits for crossvalidation
splitter_reg = msl.RollingKFoldPanelSplit(n_splits=5)
As for feature selection, the sequential model selection and optimized predictions can be executed by the `SignalOptimizer` class from the `macrosynergy` package, and the actual choice of a particular model is displayed with the help of a “heatmap”. We note greater stability of the model choice than with feature selection. Interestingly, the simpler linear regression models are preferred to nearest-neighbor models over the trading history.
%%time
# Signal optimization
so_reg = msl.SignalOptimizer(inner_splitter=splitter_reg, X=X, y=y, blacklist=fxblack)
tdf = so_reg.calculate_predictions(
name="MACRO_OPTREG",
models=mods_reg,
hparam_grid=grids_reg,
metric=score_reg,
min_cids=4,
min_periods=36,
)
# Get optimized signals and view models heatmap
dfa = so_reg.get_optimized_signals()
som = so_reg.models_heatmap(name="MACRO_OPTREG", cap=6,
title="Optimal regression model used over time", figsize=(18, 6))
display(som)
dfx = msm.update_df(dfx, dfa)
100% 253/253 [01:03<00:00, 3.96it/s]
None
Wall time: 1min 40s
The function `view_timelines()` conveniently displays the original, non-optimized composite signal `MACRO_AVGZ` together with the optimized regression-based predictor `MACRO_OPTREG`. The latter displays a long bias compared with the non-optimized signal.
xcatx = ["MACRO_AVGZ", "MACRO_OPTREG"]
msp.view_timelines(
dfx,
xcats=xcatx,
cids=cids_dux,
ncol=4,
start="20040101",
title="Composite signal score (blue) and sequentially optimized regressionbased forecast (orange)",
title_fontsize=30,
same_y=False,
cs_mean=False,
xcat_labels=["Simple average score", "Sequentially optimized forecasts"],
legend_fontsize=16,
)
Value checks #
This part again tests the accuracy and significance levels for `MACRO_AVGZ` and the optimized regression-based predictor `MACRO_OPTREG` according to the common metrics of accuracy, precision, and probability values.
The `SignalReturnRelations` class from the macrosynergy.signal module is specifically designed to analyze, visualize, and compare the relationships between panels of trading signals (`MACRO_AVGZ` and `MACRO_OPTREG`) and panels of subsequent returns (`DU05YXR_VT10`).
## Compare optimized signals with simple average zscores
srr = mss.SignalReturnRelations(
df=dfx,
rets=["DU05YXR_VT10"],
sigs=["MACRO_AVGZ", "MACRO_OPTREG"],
cosp=True,
freqs=["M"],
agg_sigs=["last"],
start="20040101",
blacklist=fxblack,
slip=1,
)
tbl_srr = srr.signals_table()
The interpretations of the columns of the summary table can be found here
display(tbl_srr.astype("float").round(3))
|              | accuracy | bal_accuracy | pos_sigr | pos_retr | pos_prec | neg_prec | pearson | pearson_pval | kendall | kendall_pval | auc   |
|--------------|----------|--------------|----------|----------|----------|----------|---------|--------------|---------|--------------|-------|
| MACRO_AVGZ   | 0.537    | 0.535        | 0.536    | 0.536    | 0.568    | 0.501    | 0.100   | 0.0          | 0.062   | 0.0          | 0.535 |
| MACRO_OPTREG | 0.546    | 0.538        | 0.728    | 0.536    | 0.557    | 0.518    | 0.084   | 0.0          | 0.056   | 0.0          | 0.530 |
As with feature selection, we use the `NaivePnL()` class to provide a quick and simple overview of a stylized PnL profile of the trading signals.
Here, the comparison is made between PnLs based on two signals, `MACRO_AVGZ` and `MACRO_OPTREG`, and we choose the same parameters as above:

The target is `DU05YXR_VT10`,
the rebalancing frequency (`rebal_freq`) for positions according to the signal is monthly,
the rebalancing slippage (`rebal_slip`) in days is 1,
the `zn_score_pan` option transforms raw signals into z-scores around a zero value based on the whole panel,
the threshold value (`thresh`) is 3.

The `plot_pnls()` method of the `NaivePnL()` class is used to plot a line chart of the cumulative PnL associated with both signals.
sigs = ["MACRO_AVGZ", "MACRO_OPTREG"]
pnl = msn.NaivePnL(
    df=dfx,
    ret="DU05YXR_VT10",
    sigs=sigs,
    cids=cids,
    start="20040101",
    blacklist=fxblack,
    bms="USD_DU05YXR_NSA",
)
for sig in sigs:
    pnl.make_pnl(
        sig=sig,
        sig_op="zn_score_pan",
        rebal_freq="monthly",
        neutral="zero",
        rebal_slip=1,
        vol_scale=10,
        thresh=3,
    )
pnl.plot_pnls(
    title="Naive PnLs for average scores and optimized regression forecasts",
    title_fontsize=14,
    xcat_labels=["Simple average score", "Optimized regression forecasts"],
)
pcats = ["PNL_" + sig for sig in sigs]
USD_DU05YXR_NSA has no observations in the DataFrame.
The method evaluate_pnls() returns a small dataframe of key PnL statistics for the tested strategies.
pnl.evaluate_pnls(pnl_cats=pcats)
| xcat              | PNL_MACRO_AVGZ | PNL_MACRO_OPTREG |
|-------------------|----------------|------------------|
| Return (pct ar)   | 10.66697       | 10.119795        |
| St. Dev. (pct ar) | 10.0           | 10.0             |
| Sharpe Ratio      | 1.066697       | 1.011979         |
| Sortino Ratio     | 1.680606       | 1.488246         |
| Max 21-day draw   | 22.0995        | 16.507029        |
| Max 6-month draw  | 30.996404      | 24.55008         |
| Traded Months     | 243            | 243              |
Classification #
The third statistical learning application is the classification of the rates market environment into "good" or "bad" for subsequent monthly IRS receiver returns. For that purpose, the dummy variable MACRO_AVGZ_SIGN is added to the dataframe. It receives the value of +1 if MACRO_AVGZ is positive and -1 otherwise.
Sequential optimization #
# Calculate categorical series
ys = np.sign(y)
calcs = ["MACRO_AVGZ_SIGN = np.sign( MACRO_AVGZ )"]
dfa = msp.panel_calculator(dfx, calcs=calcs, cids=cids)
dfx = msm.update_df(dfx, dfa)
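Note that np.sign is strictly ternary: positive values map to +1, negative values to -1, and exact zeros to 0 (not +1), so the dummy variable can in principle take three values. A quick check:

```python
import numpy as np

# np.sign returns the elementwise sign; exact zeros map to 0.
x = np.array([1.7, -0.3, 0.0])
print(np.sign(x))  # [ 1. -1.  0.]
```

In practice, composite z-scores are almost never exactly zero, so the series behaves as a binary dummy.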
Two models are defined:

- knncls is a K-nearest neighbors classifier with
  - the number of neighbors drawn from the set [4, 16, 64, 256, 1024, 4096], and
  - the choice of weights:
    - 'uniform': uniform weights, where all points in each neighborhood are weighted equally;
    - 'distance': points are weighted by the inverse of their distance, so closer neighbors of a query point have a greater influence than neighbors further away.
- logreg is a logistic regression with and without intercept.
# Define models and grids for optimization
mods_cls = {"knncls": KNeighborsClassifier(), "logreg": LogisticRegression()}
grids_cls = {
    "knncls": {
        "n_neighbors": [2**i for i in range(2, 13, 2)],
        "weights": ["uniform", "distance"],
    },
    "logreg": {"fit_intercept": [True, False]},
}
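At each rebalancing date, selecting among these models and grids amounts to a cross-validated grid search. The mechanics can be illustrated with scikit-learn's GridSearchCV on made-up toy data; this is a conceptual sketch, not the SignalOptimizer code path:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Toy binary classification data (made up for illustration).
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(200, 4))
y_toy = np.sign(X_toy @ np.array([1.0, -0.5, 0.0, 0.2])
                + rng.normal(scale=0.5, size=200))

# Exhaustive search over the logreg grid, scored by balanced accuracy;
# the cross-validated winner would define the prediction at that date.
gs = GridSearchCV(
    LogisticRegression(),
    {"fit_intercept": [True, False]},
    cv=3,
    scoring=make_scorer(balanced_accuracy_score),
)
gs.fit(X_toy, y_toy)
print(gs.best_params_)
```

The SignalOptimizer additionally searches across the model dictionary itself, so the knncls and logreg grids compete against each other on the same criterion.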
The standard make_scorer() function from the scikit-learn library is used to create a scorer object that evaluates performance on the test set. The scorer is based on balanced_accuracy_score, i.e., the average of the recalls obtained on each class. This metric is more suitable for imbalanced datasets than the simple accuracy_score.
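The difference between the two metrics is easiest to see on an imbalanced toy example: a classifier that always predicts the majority class looks good on plain accuracy but scores no better than chance on balanced accuracy.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Imbalanced labels: four positives, one negative; predictions are all +1.
y_true = [1, 1, 1, 1, -1]
y_pred = [1, 1, 1, 1, 1]

print(accuracy_score(y_true, y_pred))           # 0.8
print(balanced_accuracy_score(y_true, y_pred))  # 0.5: (recall_+1 + recall_-1) / 2 = (1 + 0) / 2
```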
# Define the optimization criterion
score_cls = make_scorer(balanced_accuracy_score)
# Define splits for cross-validation
splitter_cls = msl.RollingKFoldPanelSplit(n_splits=5)
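RollingKFoldPanelSplit is specific to the macrosynergy package and respects the panel (cross-section by date) structure. The underlying rolling K-fold idea can be illustrated on a single ordered series with scikit-learn's plain KFold; this is a conceptual stand-in, not the library's implementation:

```python
import numpy as np
from sklearn.model_selection import KFold

# Five contiguous test windows over an ordered index; each fold trains
# on the observations on both sides of its test window (unlike an
# expanding, forward-only split).
dates = np.arange(10)  # stand-in for an ordered date index
splits = list(KFold(n_splits=5, shuffle=False).split(dates))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```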
The model is then selected sequentially using the SignalOptimizer class from the macrosynergy package. This class calculates the new optimized indicator MACRO_OPTCLASS according to the two specified models and their respective parameter grids, and the actual selection at each point in time is displayed in a "heatmap". The heatmap reveals a strong preference for classification through logistic regression without intercept, the most restrictive model on the menu.
%%time
# Signal optimization
so_cls = msl.SignalOptimizer(inner_splitter=splitter_cls, X=X, y=ys, blacklist=fxblack)
tdf = so_cls.calculate_predictions(
    name="MACRO_OPTCLASS",
    models=mods_cls,
    hparam_grid=grids_cls,
    metric=score_cls,
    min_cids=4,
    min_periods=36,
)
# Get optimized signals and view models heatmap
dfa = so_cls.get_optimized_signals()
som = so_cls.models_heatmap(
    name="MACRO_OPTCLASS",
    cap=6,
    title="Optimal classifier used over time",
    figsize=(18, 6),
)
display(som)
dfx = msm.update_df(dfx, dfa)
Wall time: 1min 5s
The function view_timelines() conveniently displays the original, non-optimized composite signal MACRO_AVGZ alongside the optimized classifier MACRO_OPTCLASS.
xcatx = ["MACRO_AVGZ", "MACRO_OPTCLASS"]
msp.view_timelines(
    dfx,
    xcats=xcatx,
    cids=cids_dux,
    ncol=4,
    start="20040101",
    title="Composite signal score (blue) and sequentially optimized classification (orange)",
    title_fontsize=30,
    same_y=False,
    cs_mean=False,
    xcat_labels=["Simple average score", "Sequentially optimized classification"],
    legend_fontsize=16,
)
Value checks #
This part tests accuracy and significance levels for the non-optimized composite signal MACRO_AVGZ and the trading signal based on optimized classification, MACRO_OPTCLASS, according to the standard metrics of accuracy, precision and probability values. The SignalReturnRelations class from the macrosynergy.signal module is specifically designed to analyze, visualize, and compare the relationships between panels of trading signals and panels of subsequent returns.
## Compare optimized signals with simple average z-scores
srr = mss.SignalReturnRelations(
    df=dfx,
    rets=["DU05YXR_VT10"],
    sigs=["MACRO_AVGZ", "MACRO_OPTCLASS"],
    cosp=True,
    freqs=["M"],
    agg_sigs=["last"],
    start="20040101",
    blacklist=fxblack,
    slip=1,
)
tbl_srr = srr.signals_table()
The interpretations of the columns of the summary table can be found here.
display(tbl_srr.astype("float").round(3))
|                | accuracy | bal_accuracy | pos_sigr | pos_retr | pos_prec | neg_prec | pearson | pearson_pval | kendall | kendall_pval | auc   |
|----------------|----------|--------------|----------|----------|----------|----------|---------|--------------|---------|--------------|-------|
| MACRO_AVGZ     | 0.537    | 0.535        | 0.536    | 0.536    | 0.568    | 0.501    | 0.100   | 0.0          | 0.062   | 0.0          | 0.535 |
| MACRO_OPTCLASS | 0.535    | 0.534        | 0.524    | 0.536    | 0.568    | 0.499    | 0.083   | 0.0          | 0.063   | 0.0          | 0.534 |
As with feature selection and prediction, we use the NaivePnL class to provide a quick and simple overview of a stylized PnL profile of a set of trading signals. Here, the comparison is made between PnLs based on the two signals MACRO_AVGZ and MACRO_OPTCLASS, with mostly the same parameters as above:

- The target is DU05YXR_VT10,
- the rebalancing frequency (rebal_freq) for positions according to signal is monthly,
- the rebalancing slippage (rebal_slip) in days is 1,
- the binary option for sig_op reduces both signals to their signs, so positions depend only on the direction of the signal,
- the threshold value (thresh) is 3.

The plot_pnls() method of the NaivePnL class is used to plot a line chart of the cumulative PnL associated with both signals.
sigs = ["MACRO_AVGZ", "MACRO_OPTCLASS"]
pnl = msn.NaivePnL(
    df=dfx,
    ret="DU05YXR_VT10",
    sigs=sigs,
    cids=cids,
    start="20040101",
    blacklist=fxblack,
)
for sig in sigs:
    pnl.make_pnl(
        sig=sig,
        sig_op="binary",
        rebal_freq="monthly",
        neutral="zero",
        rebal_slip=1,
        vol_scale=10,
        thresh=3,
    )
pnl.plot_pnls(
    title="Naive PnLs for average score signs and optimized classifications",
    title_fontsize=14,
    xcat_labels=["Simple average score signs", "Optimized classifications"],
)
pcats = ["PNL_" + sig for sig in sigs]
The method evaluate_pnls() returns a small dataframe of key PnL statistics for the tested strategies.

pnl.evaluate_pnls(pnl_cats=pcats)
| xcat              | PNL_MACRO_AVGZ | PNL_MACRO_OPTCLASS |
|-------------------|----------------|--------------------|
| Return (pct ar)   | 11.091483      | 11.282659          |
| St. Dev. (pct ar) | 10.0           | 10.0               |
| Sharpe Ratio      | 1.109148      | 1.128266           |
| Sortino Ratio     | 1.638611      | 1.726211           |
| Max 21-day draw   | 22.580557     | 12.867051          |
| Max 6-month draw  | 35.931307     | 21.934651          |
| Traded Months     | 243           | 243                |
The model selection heatmap in this section indicates that starting from approximately 2009, the algorithm consistently opts for logistic regression without an intercept. When running profit and loss (PnL) analyses from 2009, the optimized classifier yields even more noteworthy outcomes:
sigs = ["MACRO_AVGZ", "MACRO_OPTCLASS"]
pnl = msn.NaivePnL(
    df=dfx,
    ret="DU05YXR_VT10",
    sigs=sigs,
    cids=cids,
    start="20090101",
    blacklist=fxblack,
    bms="USD_DU05YXR_NSA",
)
for sig in sigs:
    pnl.make_pnl(
        sig=sig,
        sig_op="binary",
        rebal_freq="monthly",
        neutral="zero",
        rebal_slip=1,
        vol_scale=10,
        thresh=3,
    )
pnl.plot_pnls(
    title="Naive PnLs for average score signs and optimized classifications, post 2008",
    title_fontsize=14,
    xcat_labels=["Simple average score signs", "Optimized classifications"],
)
pcats = ["PNL_" + sig for sig in sigs]
USD_DU05YXR_NSA has no observations in the DataFrame.
pnl.evaluate_pnls(pnl_cats=pcats)
| xcat              | PNL_MACRO_AVGZ | PNL_MACRO_OPTCLASS |
|-------------------|----------------|--------------------|
| Return (pct ar)   | 11.597187      | 14.063498          |
| St. Dev. (pct ar) | 10.0           | 10.0               |
| Sharpe Ratio      | 1.159719      | 1.40635            |
| Sortino Ratio     | 1.690623      | 2.170506           |
| Max 21-day draw   | 21.14724      | 12.134215          |
| Max 6-month draw  | 33.650542     | 20.685375          |
| Traded Months     | 183           | 183                |