Check out JPMaQS #

This introductory notebook serves as a guide for utilizing macro-quantamental datasets from the J.P. Morgan Macrosynergy Quantamental System (JPMaQS). It provides an overview of the dataset’s conventions and structure and offers practical examples of data analysis, including data slicing, visualization, frequency conversion, and the lagging and differencing of indicators.

The contents of this notebook are organized into the following main sections:

  • Package Setup and JPMaQS Data Retrieval: In this section, we handle the installation and importing of essential Python packages required for our analysis. This includes the macrosynergy package, which is used to download the required dataset from JPMaQS.

  • Exploring JPMaQS Time Series Data: This part provides examples of how to create a data panel, extract and analyze specific categories or cross-sections of data, and select specific time intervals for analysis.

  • Applying Mathematical Operations: Here, we demonstrate a straightforward example of creating an additional data category through mathematical operations or transformations.

  • Time Series Transformation Made Easy: This section focuses on converting time series data from its standard daily frequency to other intervals such as weekly, monthly, quarterly, or annually. We also explore how to lag and difference variables to prepare them for further analysis.

  • Window Operations: The final part of the notebook introduces rolling and expanding window functions, which are essential for calculating moving averages, cumulative sums, and other time-dependent metrics.

Get packages and JPMaQS data #

# Uncomment below if running on Kaggle
"""
%%capture
! pip install macrosynergy --upgrade
"""
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

from macrosynergy.download import JPMaQSDownload

import os
import warnings

warnings.simplefilter("ignore")

The JPMaQS indicators we consider are downloaded using the J.P. Morgan DataQuery API interface within the macrosynergy package. This is done by specifying ticker strings, formed by appending an indicator category code <category> to a currency area code <cross_section>. These tickers constitute the main part of a full quantamental indicator expression, taking the form DB(JPMAQS,<cross_section>_<category>,<info>), where <info> denotes the type of time series information requested for the given cross-section and category. The following types of information are available:

  • value giving the latest available values for the indicator
  • eop_lag referring to days elapsed since the end of the observation period
  • mop_lag referring to the number of days elapsed since the mean observation period
  • grade denoting a grade of the observation, giving a metric of real-time information quality
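
These pieces can be assembled mechanically. Below is a minimal sketch of how a full expression string is composed from a cross-section, a category, and a metric (the specific ticker is illustrative):

# Build full DataQuery expressions for one illustrative ticker and all four metrics
cross_section, category = "USD", "CPIC_SJA_P6M6ML6AR"
metrics = ["value", "eop_lag", "mop_lag", "grade"]
expressions = [f"DB(JPMAQS,{cross_section}_{category},{m})" for m in metrics]
print(expressions[0])  # DB(JPMAQS,USD_CPIC_SJA_P6M6ML6AR,value)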

After instantiating the JPMaQSDownload class within the macrosynergy.download module, one can use the download(tickers, start_date, metrics) method to easily download the necessary data, where tickers is an array of ticker strings, start_date is the first collection date to be considered, and metrics is an array comprising the time series information to be downloaded. For more information see here or use the free dataset on Kaggle.

To ensure reproducibility, only data from January 2000 through 1 May 2023 (the start and end dates set below) are considered.

# Lists of cross-sections (countries with IRS markets and appropriate data)

cids_dm = ["AUD", "CAD", "CHF", "EUR", "GBP", "JPY", "NOK", "NZD", "SEK", "USD"]
cids_em = [
    "CLP",
    "COP",
    "CZK",
    "HUF",
    "IDR",
    "ILS",
    "INR",
    "KRW",
    "MXN",
    "PLN",
    "THB",
    "TRY",
    "TWD",
    "ZAR",
]
cids = cids_dm + cids_em

# Lists of main quantamental and return categories

main = [
    "CPIC_SA_P1M1ML12",
    "CPIC_SJA_P3M3ML3AR",
    "CPIC_SJA_P6M6ML6AR",
    "CPIH_SA_P1M1ML12",
    "CPIH_SJA_P3M3ML3AR",
    "CPIH_SJA_P6M6ML6AR",
    "INFTEFF_NSA",
    "INTRGDP_NSA_P1M1ML12_3MMA",
    "INTRGDPv5Y_NSA_P1M1ML12_3MMA",
    "PCREDITGDP_SJA_D1M1ML12",
    "RGDP_SA_P1Q1QL4_20QMA",
    "RYLDIRS02Y_NSA",
    "RYLDIRS05Y_NSA",
    "PCREDITBN_SJA_P1M1ML12",
]
rets = [
    "DU02YXR_NSA",
    "DU05YXR_NSA",
    "DU02YXR_VT10",
    "DU05YXR_VT10",
    "EQXR_NSA",
    "EQXR_VT10",
    "FXXR_NSA",
    "FXXR_VT10",
]

xcats = main + rets

The description of each JPMaQS category is available on the Macro Quantamental Academy, on JPMorgan Markets (password protected), or on Kaggle (for the tickers used in this notebook only). In particular, this notebook draws on Consumer price inflation trends, Inflation targets, Intuitive growth estimates, Domestic credit ratios, Long-term GDP growth, Real interest rates, Private credit expansion, Duration returns, Equity index future returns, and FX forward returns.

# Download series from J.P. Morgan DataQuery by tickers

start_date = "2000-01-01"
end_date = "2023-05-01"

tickers = [cid + "_" + xcat for cid in cids for xcat in xcats]
print(f"Maximum number of tickers is {len(tickers)}")

# Retrieve credentials

client_id: str = os.getenv("DQ_CLIENT_ID")
client_secret: str = os.getenv("DQ_CLIENT_SECRET")

with JPMaQSDownload(client_id=client_id, client_secret=client_secret) as dq:
    df = dq.download(
        tickers=tickers,
        start_date=start_date,
        end_date=end_date,
        suppress_warning=True,
        metrics=["value"],
        show_progress=True,
    )
Maximum number of tickers is 528
Downloading data from JPMaQS.
Timestamp UTC:  2023-09-19 11:11:59
Connection successful!
Number of expressions requested: 528
Requesting data: 100%|██████████| 27/27 [00:08<00:00,  3.30it/s]
Downloading data: 100%|██████████| 27/27 [00:14<00:00,  1.89it/s]
# Uncomment if running on Kaggle
"""
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

df = pd.read_csv('../input/fixed-income-returns-and-macro-trends/JPMaQS_Quantamental_Indicators.csv', index_col=0, parse_dates=['real_date'])
"""
# To explore the data, we can use the following simple code:

display(df["xcat"].unique())
display(df["cid"].unique())
df["ticker"] = df["cid"] + "_" + df["xcat"]
array(['CPIC_SA_P1M1ML12', 'CPIC_SJA_P3M3ML3AR', 'CPIC_SJA_P6M6ML6AR',
       'CPIH_SA_P1M1ML12', 'CPIH_SJA_P3M3ML3AR', 'CPIH_SJA_P6M6ML6AR',
       'FXXR_NSA', 'FXXR_VT10', 'INFTEFF_NSA',
       'INTRGDP_NSA_P1M1ML12_3MMA', 'INTRGDPv5Y_NSA_P1M1ML12_3MMA',
       'PCREDITBN_SJA_P1M1ML12', 'PCREDITGDP_SJA_D1M1ML12',
       'RGDP_SA_P1Q1QL4_20QMA', 'RYLDIRS02Y_NSA', 'RYLDIRS05Y_NSA',
       'DU02YXR_NSA', 'DU02YXR_VT10', 'DU05YXR_NSA', 'DU05YXR_VT10',
       'EQXR_NSA', 'EQXR_VT10'], dtype=object)
array(['AUD', 'CAD', 'CHF', 'CLP', 'COP', 'CZK', 'EUR', 'GBP', 'HUF',
       'IDR', 'ILS', 'INR', 'JPY', 'KRW', 'MXN', 'NOK', 'NZD', 'PLN',
       'SEK', 'THB', 'TRY', 'TWD', 'USD', 'ZAR'], dtype=object)
df.head(3)
real_date cid xcat value ticker
0 2000-01-03 AUD CPIC_SA_P1M1ML12 1.244168 AUD_CPIC_SA_P1M1ML12
1 2000-01-03 AUD CPIC_SJA_P3M3ML3AR 3.006383 AUD_CPIC_SJA_P3M3ML3AR
2 2000-01-03 AUD CPIC_SJA_P6M6ML6AR 1.428580 AUD_CPIC_SJA_P6M6ML6AR

JPMaQS time series dataframes #

Make a panel #

Many data analyses, as well as charting with seaborn and matplotlib, can be done directly on the long dataframe. However, it is also easy to extract one or more categories from the standard long JPMaQS dataframe into a standard time series panel dataframe by using the .pivot_table() method, as shown below. This format is helpful for certain mathematical operations on categories and for passing them on to other packages, such as scikit-learn.

# Extract two categories (core inflation trend and inflation target) for the US dollar

dfx = df[
    (df["xcat"].isin(["CPIC_SJA_P6M6ML6AR", "INFTEFF_NSA"])) & (df["cid"] == "USD")
]

# Pivot the selected dataframe into a "wide" time series format giving every ticker a column

dfw = dfx.pivot_table(index="real_date", columns="ticker", values="value").replace(
    0, np.nan
)

# Check out the panel

sns.set(rc={"figure.figsize": (15, 5)})
dfw.plot(
    title="Adjusted latest core consumer price trend: % 6m/6m and Inflation Target"
)
plt.show()
https://macrosynergy.com/notebooks.build/introductions/check-out-jpmaqs/_images/7d9facd7b4583af5198fbd0be8bce5759af120b6025b15c4036903065e620039.png

Slice a panel #

Time series panels are easy to slice by column (with strings and regex) and by row (using intuitive Datetime conventions).

# Extract two categories and pivot to a time series dataframe (core CPI trend and estimated GDP trend)

xcats_sel = ["CPIC_SJA_P6M6ML6AR", "INTRGDPv5Y_NSA_P1M1ML12_3MMA"]
cids_sel = cids_dm[2:6]
dfx = df.loc[df["xcat"].isin(xcats_sel) & df["cid"].isin(cids_sel), :]
dfw = dfx.pivot_table(index="real_date", columns="ticker", values="value").replace(
    0, np.nan
)  # pivot by ticker

# Replace technical names with something you understand

nicknames = {
    "CPIC_SJA_P6M6ML6AR": "CORE_CPI_TREND",
    "INTRGDPv5Y_NSA_P1M1ML12_3MMA": "GDP_TREND",
}

colnames = list(dfw.columns)
for key, value in nicknames.items():
    colnames = [col.replace(key, value) for col in colnames]
dfw.columns = colnames


# Check out double category panel

dfw.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6086 entries, 2000-01-03 to 2023-05-01
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   CHF_CORE_CPI_TREND  5221 non-null   float64
 1   CHF_GDP_TREND       6086 non-null   float64
 2   EUR_CORE_CPI_TREND  6069 non-null   float64
 3   EUR_GDP_TREND       5804 non-null   float64
 4   GBP_CORE_CPI_TREND  6086 non-null   float64
 5   GBP_GDP_TREND       6086 non-null   float64
 6   JPY_CORE_CPI_TREND  4678 non-null   float64
 7   JPY_GDP_TREND       6086 non-null   float64
dtypes: float64(8)
memory usage: 427.9 KB

Extract and analyze all category series for one cross-section (currency) #

display(dfw.aggregate(["mean", "std"]))
CHF_CORE_CPI_TREND CHF_GDP_TREND EUR_CORE_CPI_TREND EUR_GDP_TREND GBP_CORE_CPI_TREND GBP_GDP_TREND JPY_CORE_CPI_TREND JPY_GDP_TREND
mean 0.360978 -0.238452 1.463912 -0.461945 2.003847 -0.623765 0.363984 -0.004743
std 0.762042 1.662564 0.863351 2.605391 1.273862 3.343750 1.151326 2.283116
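
To restrict the summary to a single cross-section, the columns can be filtered by ticker prefix. A minimal sketch, assuming the dfw panel above (the choice of CHF is illustrative):

# Select all category columns for one cross-section by ticker prefix
dfw_chf = dfw.loc[:, dfw.columns.str.startswith("CHF_")]
display(dfw_chf.aggregate(["mean", "std"]))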

Extract and analyze all cross-sectional series for one category #

cols = dfw.columns.str.contains(
    "_GDP_TREND$"
)  # $ means that the preceding string must be at the end of the ticker
dfwx = dfw.loc[:, cols].dropna()
display(dfwx.aggregate(["median", "max", "min"]))
CHF_GDP_TREND EUR_GDP_TREND GBP_GDP_TREND JPY_GDP_TREND
median -0.255295 0.089108 -0.253841 0.357414
max 5.680445 10.562771 17.154058 6.302844
min -8.404751 -12.695249 -22.994717 -10.268773

Select an interval #

display(dfw["2020-3":"2021-3"].agg(["mean", "max", "min"]))
CHF_CORE_CPI_TREND CHF_GDP_TREND EUR_CORE_CPI_TREND EUR_GDP_TREND GBP_CORE_CPI_TREND GBP_GDP_TREND JPY_CORE_CPI_TREND JPY_GDP_TREND
mean -0.052870 -2.791175 0.897769 -6.480288 2.000190 -10.043104 -0.104146 -4.085845
max 0.451175 -0.188698 1.410560 -0.897233 3.158075 -0.655196 0.834353 -0.529565
min -0.824449 -8.404751 -0.201780 -12.695249 1.299452 -22.994717 -1.140706 -8.702678

Apply mathematical operations #

Time series dataframes with equal column names and overlapping time indexes can be arguments of standard mathematical operations. Time series panels can simply be added, subtracted, divided, and passed to other operations.
Important: Since JPMaQS series are all dated as information states, they can be added, subtracted, multiplied, and so forth, not just among each other but also with market data, to create more elaborate trading signals.
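
For instance, an "excess inflation" series can be formed by subtracting the inflation target from the core CPI trend, since both panels share the same index and currency columns. A minimal sketch (this particular combination is illustrative, not part of the example below):

# Pivot core CPI trend and inflation target to wide panels by currency
dfw_cpi = df[df["xcat"] == "CPIC_SJA_P6M6ML6AR"].pivot_table(
    index="real_date", columns="cid", values="value"
)
dfw_tgt = df[df["xcat"] == "INFTEFF_NSA"].pivot_table(
    index="real_date", columns="cid", values="value"
)

# Subtraction aligns automatically on index and columns
dfw_xinf = dfw_cpi - dfw_tgt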

# Create an Inflation trend dataframe for selected currencies and pivot it to the wide format by currency identifier

dfx_inf = df.loc[
    (df["xcat"] == "CPIC_SJA_P6M6ML6AR") & df["cid"].isin(["EUR", "USD", "GBP", "JPY"]),
    :,
]
dfx_inf = dfx_inf[
    dfx_inf["real_date"] >= pd.to_datetime("2020-01-01")
]  # set start date
dfw_inf = dfx_inf.pivot_table(index="real_date", columns="cid", values="value").replace(
    0, np.nan
)

# Create GDP trend dataframe for selected currencies and pivot it to a wide format by currency identifier

dfx_gdp = df.loc[
    (df["xcat"] == "INTRGDPv5Y_NSA_P1M1ML12_3MMA")
    & df["cid"].isin(["EUR", "USD", "GBP", "JPY"]),
    :,
]
dfx_gdp = dfx_gdp[
    dfx_gdp["real_date"] >= pd.to_datetime("2020-01-01")
]  # set start date
dfw_gdp = dfx_gdp.pivot_table(index="real_date", columns="cid", values="value").replace(
    0, np.nan
)

# Add Inflation and GDP trend values by currency

dfw_ngdp = dfw_gdp + dfw_inf

# Plot the sum of inflation and GDP trends by currency
sns.set(rc={"figure.figsize": (15, 5)})
dfw_ngdp.plot(title="Sum of estimated growth and core inflation trends")
plt.show()
https://macrosynergy.com/notebooks.build/introductions/check-out-jpmaqs/_images/4bdd7f39255c87cb4a1fbb34603b92416ea076d35d6021bfcbda81f328db0fdf.png

Convert panels back to long format #

Time series panels can simply be converted back to the long format by using the .unstack() method and appended to the original dataframe for subsequent processing.

dfl = dfw_ngdp.unstack().reset_index().rename(columns={0: "value"})
dfl["xcat"] = "NGDP_TREND"
dfl["ticker"] = dfl["cid"] + "_" + dfl["xcat"]
dfd_new = pd.concat([df, dfl], axis=0, ignore_index=True)
dfd_new.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2889584 entries, 0 to 2889583
Data columns (total 5 columns):
 #   Column     Dtype         
---  ------     -----         
 0   real_date  datetime64[ns]
 1   cid        object        
 2   xcat       object        
 3   value      float64       
 4   ticker     object        
dtypes: datetime64[ns](1), float64(1), object(3)
memory usage: 110.2+ MB

Easy time series transformations #

Frequency conversion #

The most common frequency conversion of quantamental panels is downsampling, i.e. reducing the frequency of the daily data to weekly, monthly, or quarterly by some aggregation method, such as mean or last. The most flexible method for that purpose is .resample(), which can be combined with .groupby() to operate on specific tickers.

cids_sel = ["EUR", "GBP"]
xcats_sel = [
    "INTRGDPv5Y_NSA_P1M1ML12_3MMA",
    "INTRGDPv10Y_NSA_P1M1ML12_3MMA",
    "CPIC_SJA_P6M6ML6AR",
]
dfx = df.loc[df["xcat"].isin(xcats_sel) & df["cid"].isin(cids_sel), :]
dfxm = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .last()["value"]
    .reset_index()
)
dfxm.head(3)
cid xcat real_date value
0 EUR CPIC_SJA_P6M6ML6AR 2000-01-31 1.093003
1 EUR CPIC_SJA_P6M6ML6AR 2000-02-29 1.042372
2 EUR CPIC_SJA_P6M6ML6AR 2000-03-31 0.975460
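
If the panel is already in wide format, the same downsampling is a one-liner, since the resampler operates on each column independently. A minimal sketch, reusing the dfx selection above:

# Equivalent monthly conversion on a wide panel: pivot by ticker, then resample
dfw_sel = dfx.pivot_table(index="real_date", columns="ticker", values="value")
dfw_sel_m = dfw_sel.resample("M").last()  # last value of each month per ticker
dfw_sel_m.head(3)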

Lagging #

Like all other time series, quantamental panels can be transformed with standard time series methods, such as lags, differences, percent changes, and so forth. Only the interpretation here is different. Lagging here means that the availability of information is delayed. This allows aligning one date’s information status with a subsequent period’s return.

# Slice out a relevant dataframe

cids_sel = ["EUR", "USD"]
xcats_sel = ["CPIC_SJA_P6M6ML6AR", "DU02YXR_NSA"]
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2010-06-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame

filt4 = (
    dfx["xcat"] == xcats_sel[0]
)  # filter for selecting just CPI for subsequent frequency conversion and lagging
filt5 = (
    dfx["xcat"] == xcats_sel[1]
)  # filter for selecting just IRS returns for subsequent frequency conversion and lagging

# Apply appropriate frequency conversion: last value for CPI inflation (the predictor) and the sum of subsequent returns (the target)

dff1 = (
    dfx[filt4]
    .groupby(["cid", "xcat"])
    .resample("Q", on="real_date")
    .last()["value"]
    .reset_index()
)  # quarterly features based on last value
dff2 = (
    dfx[filt5]
    .groupby(["cid", "xcat"])
    .resample("Q", on="real_date")
    .sum(numeric_only=True)["value"]
    .reset_index()
)  # quarterly returns based on the sum of daily values

# Shift returns forward by one period to prepare for regression analysis

dff2["value"] = dff2.groupby(["xcat", "cid"])[
    "value"
].shift()  # replace by lagged value
dff2.head()
cid xcat real_date value
0 EUR DU02YXR_NSA 2010-06-30 NaN
1 EUR DU02YXR_NSA 2010-09-30 -0.024612
2 EUR DU02YXR_NSA 2010-12-31 -0.146008
3 EUR DU02YXR_NSA 2011-03-31 -0.172114
4 EUR DU02YXR_NSA 2011-06-30 -1.316241
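
With both frames at the same quarterly frequency, the features and lagged returns can be merged into a single frame for regression analysis. A minimal sketch (the column nicknames are illustrative and not part of the original example):

# Merge quarterly CPI features with lagged duration returns by currency and date
feats = dff1.drop(columns="xcat").rename(columns={"value": "cpi_trend"})
rets_lagged = dff2.drop(columns="xcat").rename(columns={"value": "du02y_return"})
dfreg = feats.merge(rets_lagged, on=["cid", "real_date"]).dropna()
dfreg.head(3)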

Differencing #

A difference here means the difference in the information state of the market, not in the actual underlying economic concept. It shows how the market has updated its assessment of fundamentals.

# Slice out relevant dataframe

xcats_sel = ["CPIC_SJA_P6M6ML6AR", "INTRGDPv10Y_NSA_P1M1ML12_3MMA"]
cids_sel = ["EUR", "USD"]
dfx = df.loc[df["xcat"].isin(xcats_sel) & df["cid"].isin(cids_sel), :]

dfw = dfx.pivot_table(index="real_date", columns="ticker", values="value").replace(
    0, np.nan
)

# Apply name replacement

nicknames = {
    "CPIC_SJA_P6M6ML6AR": "CORE_CPI_TREND",
    "INTRGDPv10Y_NSA_P1M1ML12_3MMA": "GDP_TREND",
}

colnames = list(dfw.columns)
for key, value in nicknames.items():
    colnames = [col.replace(key, value) for col in colnames]
dfw.columns = colnames

# Apply simple monthly conversion, taking the last business day's value ("BM" = business month end, the JPMaQS convention)

dfw = dfw.asfreq("BM")

# Take the first difference
dfw.diff().tail(3)
EUR_CORE_CPI_TREND USD_CORE_CPI_TREND
real_date
2023-02-28 0.404137 0.235954
2023-03-31 0.008200 -0.039917
2023-04-28 0.001464 -0.062272
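
Differences over longer horizons, and percent changes, work the same way. A minimal sketch, assuming the business-month-end dfw panel above:

# Three-month change in the information state
dfw.diff(3).tail(3)

# One-month percent change (among the standard transformations mentioned above)
dfw.pct_change(1).tail(3)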

Window operations #

Rolling windows #

The most common window operations are based on rolling windows, such as rolling (“moving”) averages or standard deviations.

Using rolling averages on quantamental data, which are information states, is analogous to assuming sticky or sluggish expectations.

# Slice out the panel

dfx_inf = df.loc[(df["xcat"] == "CPIC_SJA_P6M6ML6AR") & df["cid"].isin(cids_em[:5]), :]
dfx_inf = dfx_inf[
    dfx_inf["real_date"] >= pd.to_datetime("2015-01-01")
]  # set start date
dfw_inf = dfx_inf.pivot_table(index="real_date", columns="cid", values="value").replace(
    0, np.nan
)

# Rolling 90 working day averages

sns.set(rc={"figure.figsize": (15, 5)})
dfw_inf90 = dfw_inf.rolling(90).mean()
dfw_inf90.plot(
    title="Consistent core inflation trends, %6m/6m, saar, 90-day moving averages of information states"
)
plt.show()
https://macrosynergy.com/notebooks.build/introductions/check-out-jpmaqs/_images/453213b4ff60e5f0780be7a7e99466dc852a6627dd6fd024c40df0d2dfefed4d.png

Expanding windows #

Expanding window operations are likewise useful. Rather than applying a fixed lookback window, these operations use a lookback that increases with available history.

# Slice out two estimated growth categories for EUR and USD (the pivot below averages them per currency, since pivot_table aggregates by mean)

dfx = df.loc[
    (
        df["xcat"].isin(
            ["INTRGDPv5Y_NSA_P1M1ML12_3MMA", "INTRGDPv10Y_NSA_P1M1ML12_3MMA"]
        )
        & df["cid"].isin(["EUR", "USD"])
    )
]
dfx = dfx[dfx["real_date"] >= pd.to_datetime("2010-01-01")]  # set start date
dfw = dfx.pivot_table(index="real_date", columns="cid", values="value").replace(
    0, np.nan
)

dfwm = dfw.resample(
    "M"
).mean()  # monthly average information states on estimated growth

# Z-score growth values estimated without using future information

dfwm_mean = dfwm.expanding(
    min_periods=24
).mean()  # mean based on expanding window with a minimum of 2 years (24 months) required
dfwm_std = dfwm.expanding(
    min_periods=24
).std()  # standard deviation based on the same expanding window

sns.set(rc={"figure.figsize": (15, 5)})
dfwm_z = (dfwm - dfwm_mean) / dfwm_std
dfwm_z.plot(title="Time-consistent z-scores of estimated GDP growth")
plt.show()
https://macrosynergy.com/notebooks.build/introductions/check-out-jpmaqs/_images/cfff325e0996dba9709c795c71461778dd7137141edebb4f5a04958a89302381.png