JPMaQS with Seaborn #

In this notebook, we showcase the application of the powerful seaborn visualisation module to build our understanding of, and gain further insights into, both JPMaQS indicators and the relationships involving them. Built on top of matplotlib, seaborn offers a convenient interface to create visually appealing and informative statistical graphics.

The notebook covers the following main parts:

  • Get Packages and JPMaQS Data: This section is responsible for installing and importing the necessary Python packages that are used throughout the analysis.

  • Historical distributions of indicators: In this part, the notebook shows how to visualize the empirical distribution of a dataset by displaying the count or frequency of values within specific ranges. This simple tool helps to quickly identify patterns, central tendency, spread, and outliers. It also allows comparing the distributions of multiple indicators on one plot or side by side and is an important part of the initial analysis.

  • Timelines of indicators: Here, the notebook showcases primarily line plots and line facets. They are beneficial for showing trends, changes over time, or the relationship between two continuous variables. Line facets allow you to create multiple line plots in a grid arrangement, each showing a subset of the data based on a categorical variable.

  • Bivariate relations: This section includes scatterplots, regression plots, linear model plots, joint plots, and pair plots. These plots help understand the relationship between variables and make informed decisions about model selection while gaining insights into the relationship between independent and dependent variables.

  • Color maps: These represent numerical or categorical data as colors, helping viewers perceive patterns, variations, and relationships within the data.

Get packages and JPMaQS data #

# Uncomment below if running on Kaggle or if you need to install macrosynergy package
"""
%%capture
! pip install macrosynergy --upgrade
"""
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import os

from macrosynergy.download import JPMaQSDownload

import warnings

warnings.simplefilter("ignore")

The JPMaQS indicators we consider are downloaded using the J.P. Morgan Dataquery API interface within the macrosynergy package. This is done by specifying ticker strings, formed by appending an indicator category code <category> to a currency area code <cross_section>. These constitute the main part of a full quantamental indicator ticker, taking the form DB(JPMAQS,<cross_section>_<category>,<info>), where <info> denotes the time series of information for the given cross-section and category. The following types of information are available:

  • value: the latest available values for the indicator.

  • eop_lag: the number of days elapsed since the end of the observation period.

  • mop_lag: the number of days elapsed since the mean observation period.

  • grade: a grade of the observation, giving a metric of real-time information quality.

After instantiating the JPMaQSDownload class within the macrosynergy.download module, one can use the download(tickers,start_date,metrics) method to download the necessary data, where tickers is an array of ticker strings, start_date is the first collection date to be considered and metrics is an array comprising the time series information to be downloaded. For more information see here or use the free dataset on Kaggle.
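As a minimal sketch of the ticker convention described above, the snippet below assembles full expressions of the form DB(JPMAQS,<cross_section>_<category>,<info>) from lists of cross-sections, categories and metrics. The helper name make_expressions is hypothetical and only illustrates the string format; in the notebook itself only plain ticker strings and a metrics list are passed to download().

```python
from itertools import product


def make_expressions(cids, xcats, metrics):
    """Hypothetical helper: build DB(JPMAQS,<cross_section>_<category>,<info>) strings."""
    return [
        f"DB(JPMAQS,{cid}_{xcat},{metric})"
        for cid, xcat, metric in product(cids, xcats, metrics)
    ]


# Two cross-sections, one category, two types of information
exprs = make_expressions(["USD", "EUR"], ["EQXR_NSA"], ["value", "grade"])
for e in exprs:
    print(e)
```

The number of expressions is the product of the three list lengths, which is why the download log below reports 600 expressions for 24 cross-sections, 25 categories and a single metric.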

To ensure reproducibility, only samples between January 2000 (inclusive) and May 2023 (exclusive) are considered.

cids_dm = ["AUD", "CAD", "CHF", "EUR", "GBP", "JPY", "NOK", "NZD", "SEK", "USD"]
cids_em = [
    "CLP",
    "COP",
    "CZK",
    "HUF",
    "IDR",
    "ILS",
    "INR",
    "KRW",
    "MXN",
    "PLN",
    "THB",
    "TRY",
    "TWD",
    "ZAR",
]
cids = cids_dm + cids_em
ecos = [
    "CPIC_SA_P1M1ML12",
    "CPIC_SJA_P3M3ML3AR",
    "CPIC_SJA_P6M6ML6AR",
    "CPIH_SA_P1M1ML12",
    "CPIH_SJA_P3M3ML3AR",
    "CPIH_SJA_P6M6ML6AR",
    "INFTEFF_NSA",
    "INTRGDP_NSA_P1M1ML12_3MMA",
    "INTRGDPv5Y_NSA_P1M1ML12_3MMA",
    "PCREDITGDP_SJA_D1M1ML12",
    "RGDP_SA_P1Q1QL4_20QMA",
    "RYLDIRS02Y_NSA",
    "RYLDIRS05Y_NSA",
    "PCREDITBN_SJA_P1M1ML12",
]
mkts = [
    "DU02YXR_NSA",
    "DU05YXR_NSA",
    "DU02YXR_VT10",
    "DU05YXR_VT10",
    "EQXR_NSA",
    "EQXR_VT10",
    "FXXR_NSA",
    "FXXR_VT10",
    "FXCRR_NSA",
    "FXTARGETED_NSA",
    "FXUNTRADABLE_NSA",
]

xcats = ecos + mkts

The description of each JPMaQS category is available under Macro Quantamental Academy, JPMorgan Markets (password protected), or on Kaggle (just for the tickers used in this notebook). In particular, this notebook uses Consumer price inflation trends, Inflation targets, Intuitive growth estimates, Domestic credit ratios, Long-term GDP growth, Real interest rates, Private credit expansion, Duration returns, Equity index future returns, FX forward returns, FX forward carry, and FX tradeability and flexibility.

# Download series from J.P. Morgan DataQuery by tickers

start_date = "2000-01-01"
end_date = "2023-05-01"

tickers = [cid + "_" + xcat for cid in cids for xcat in xcats]
print(f"Maximum number of tickers is {len(tickers)}")

# Retrieve credentials

client_id: str = os.getenv("DQ_CLIENT_ID")
client_secret: str = os.getenv("DQ_CLIENT_SECRET")

with JPMaQSDownload(client_id=client_id, client_secret=client_secret) as dq:
    df = dq.download(
        tickers=tickers,
        start_date=start_date,
        end_date=end_date,
        suppress_warning=True,
        metrics=["value"],
        report_time_taken=True,
        show_progress=True,
    )
Maximum number of tickers is 600
Downloading data from JPMaQS.
Timestamp UTC:  2023-09-18 09:49:35
Connection successful!
Number of expressions requested: 600
Requesting data: 100%|██████████| 30/30 [00:09<00:00,  3.29it/s]
Downloading data: 100%|██████████| 30/30 [01:00<00:00,  2.00s/it]
Time taken to download data: 	70.45 seconds.
Time taken to convert to dataframe: 	8.64 seconds.
Average upload size: 	0.20 KB
Average download size: 	110390.64 KB
Average time taken: 	28.65 seconds
Longest time taken: 	38.64 seconds
Average transfer rate : 	30829.97 Kbps
#  Uncomment if running on Kaggle
"""for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
                                                   
df = pd.read_csv('../input/fixed-income-returns-and-macro-trends/JPMaQS_Quantamental_Indicators.csv', index_col=0, parse_dates=['real_date'])"""
# It is often helpful to append a ticker column as a concatenation of cid and xcat. This shortens the code for references to individual time series (as opposed to panels).
display(df["xcat"].unique())
display(df["cid"].unique())
df["ticker"] = df["cid"] + "_" + df["xcat"]
df.head(3)
array(['CPIC_SA_P1M1ML12', 'CPIC_SJA_P3M3ML3AR', 'CPIC_SJA_P6M6ML6AR',
       'CPIH_SA_P1M1ML12', 'CPIH_SJA_P3M3ML3AR', 'CPIH_SJA_P6M6ML6AR',
       'FXTARGETED_NSA', 'FXUNTRADABLE_NSA', 'FXXR_NSA', 'FXXR_VT10',
       'INFTEFF_NSA', 'INTRGDP_NSA_P1M1ML12_3MMA',
       'INTRGDPv5Y_NSA_P1M1ML12_3MMA', 'PCREDITBN_SJA_P1M1ML12',
       'PCREDITGDP_SJA_D1M1ML12', 'RGDP_SA_P1Q1QL4_20QMA',
       'RYLDIRS02Y_NSA', 'RYLDIRS05Y_NSA', 'DU02YXR_NSA', 'DU02YXR_VT10',
       'DU05YXR_NSA', 'DU05YXR_VT10', 'EQXR_NSA', 'EQXR_VT10',
       'FXCRR_NSA'], dtype=object)
array(['AUD', 'CAD', 'CHF', 'CLP', 'COP', 'CZK', 'EUR', 'GBP', 'HUF',
       'IDR', 'ILS', 'INR', 'JPY', 'KRW', 'MXN', 'NOK', 'NZD', 'PLN',
       'SEK', 'THB', 'TRY', 'TWD', 'USD', 'ZAR'], dtype=object)
real_date cid xcat value ticker
0 2000-01-03 AUD CPIC_SA_P1M1ML12 1.244168 AUD_CPIC_SA_P1M1ML12
1 2000-01-03 AUD CPIC_SJA_P3M3ML3AR 3.006383 AUD_CPIC_SJA_P3M3ML3AR
2 2000-01-03 AUD CPIC_SJA_P6M6ML6AR 1.428580 AUD_CPIC_SJA_P6M6ML6AR

Historical distributions of indicators #

Histograms for single indicators #

Histograms are a useful visualization to understand the empirical distribution of a dataset by displaying the frequency or count of values within specific value ranges, known as bins. In seaborn, the sns.histplot() function is a versatile tool for creating histograms, and it has replaced the older sns.distplot() method. To overlay a kernel density estimate (KDE) on top of the histogram, set the kde argument to True when calling sns.histplot(). The KDE provides a smoothed representation of the underlying distribution, giving additional insights into the shape and density of the data.

dfx = df[df["real_date"] >= pd.to_datetime("2000-01-01")]  # set start date
dfw = dfx.pivot_table(index="real_date", columns="ticker", values="value").replace(
    0, np.nan
)  # bring df to wide format
var = "MXN_EQXR_NSA"  # specified indicator to analyze

col = "teal"
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)})  #  choose appearance
sns.histplot(
    x=var, data=dfw, bins=20, kde=True, color=col
)  # histogram with custom bin number and kde overlay
plt.axvline(
    x=np.mean(dfw[var]), color=col, linestyle="--"
)  # add vertical line for mean

plt.title(
    "Mexican equity index future returns: mean and distribution", fontsize=13
)  # add chart title
plt.xlabel("% annualized", fontsize=11)  # overwrite standard x-axis label
plt.ylabel("days observed", fontsize=11)  # overwrite standard y-axis label
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/04a53b3de95afa91a505928cfcafeef7445cc9bb5176122edff1eed3dbce4066.png

The sns.histplot() provides various options to customize the width of the bins and the units of the y-axis. One can also change the units of the y-axis with the stat argument from ‘count’ to ‘frequency’ (number of observations divided by the bin width), ‘density’ (normalizes counts so that the area of the histogram is 1), or ‘probability’ (normalizes counts so that the sum of the bar heights is 1).
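The four normalizations named above can be reproduced by hand, which may help when reading the y-axis of a histogram. The following is a sketch on synthetic data using numpy only; the variable names and the sample are illustrative assumptions and are not taken from the JPMaQS dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)  # synthetic sample, for illustration only

binwidth = 0.5
# Bin edges that cover the whole sample in steps of `binwidth`
edges = np.arange(np.floor(x.min()), np.ceil(x.max()) + binwidth, binwidth)
counts, _ = np.histogram(x, bins=edges)
n = counts.sum()

frequency = counts / binwidth        # observations per unit of x
density = counts / (n * binwidth)    # bar areas sum to 1
probability = counts / n             # bar heights sum to 1

print(probability.sum(), (density * binwidth).sum())
```

Density is the natural choice when comparing samples of different sizes or when overlaying a KDE, whereas probability is easier to read as "share of days" for a single series.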

dfx = df[df["real_date"] >= pd.to_datetime("2000-01-01")]  # set start date
dfw = dfx.pivot(index="real_date", columns="ticker", values="value").replace(
    0, np.nan
)  # bring df to wide format
var = "USD_CPIH_SA_P1M1ML12"  # specified indicator to analyze

col = "royalblue"
sns.set_theme(style="darkgrid", rc={"figure.figsize": (6, 4)})  #  choose appearance
sns.histplot(
    x=var, data=dfw, binwidth=0.2, stat="probability"
)  # histogram pre-set bin-width and probability bars
plt.axvline(
    x=np.mean(dfw[var]), color=col, linestyle="--"
)  # add vertical line for mean
plt.axvline(
    x=dfw[var].dropna().iloc[-1], color="red", linestyle="--"
)  # add line for latest

plt.title(
    "U.S. standard annual headline consumer price inflation, daily observed (red=latest)",
    fontsize=13,
)  # add chart title
plt.xlabel("% annualized", fontsize=11)  # overwrite standard x-axis label
plt.ylabel(
    "historic probability (since 2000)", fontsize=11
)  # overwrite standard y-axis label
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/d7a52a117b9fa683d32b98889c2b8aba2b3860324341a221de108d3d4ce91780.png

Histograms for multiple indicators #

The hue argument in sns.histplot() allows displaying multiple counts or probabilities in a single plot, enabling comparisons between different cross-sections or series. The multiple parameter further controls how these distributions are visualized. Setting multiple="layer" plots overlapping histograms. Setting multiple="stack" stacks the histograms on top of each other, so that the outline shows the joint distribution.
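To make the difference between layering and stacking concrete, the sketch below computes the bar positions for two synthetic groups with numpy. The group names and samples are assumptions for illustration; seaborn performs the equivalent bookkeeping internally when hue and multiple are set.

```python
import numpy as np

rng = np.random.default_rng(5)
groups = {
    "A": rng.normal(0.0, 1.0, size=300),  # synthetic group A
    "B": rng.normal(2.0, 1.5, size=300),  # synthetic group B
}
edges = np.linspace(-6, 8, 29)  # shared bins of width 0.5 for all groups
counts = {k: np.histogram(v, bins=edges)[0] for k, v in groups.items()}

# "layer": every group's bars start at zero, so the histograms overlap
layer_tops = counts

# "stack": each group's bars start where the previous group's bars end
stack_bottoms = {"A": np.zeros_like(counts["A"]), "B": counts["A"]}
stack_tops = {"A": counts["A"], "B": counts["A"] + counts["B"]}

# The top of the stack equals the histogram of the pooled data
pooled = np.histogram(np.concatenate(list(groups.values())), bins=edges)[0]
print(np.array_equal(stack_tops["B"], pooled))
```

Layering is better for comparing shapes across groups, while stacking shows how much each group contributes to the pooled distribution.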

cids_sel = ["TWD", "MXN", "TRY"]  # select a group of cross-sections
filt1 = df["xcat"] == "FXCRR_NSA"  # choose (filter out) category
filt2 = df["cid"].isin(cids_sel)  # choose cross-sections
filt3 = df["real_date"] >= pd.to_datetime("2010-01-01")  # set start date
dfx = df[filt1 & filt2 & filt3][["value", "cid"]].replace(
    0, np.nan
)  # dataframe in appropriate format

colors = "pastel"  # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (8, 4)})  #  choose appearance
ax = sns.histplot(
    x="value",
    data=dfx,
    hue="cid",
    element="poly",
    multiple="layer",  # use hue and polygons for overlapping cross-sections
    binrange=(-10, 20),
    binwidth=1,
    stat="density",
    palette=colors,
)
plt.title("Real FX forward carry distributions in comparison", fontsize=13)  # set title
plt.xlabel("% annualized", fontsize=11)  # set x-axis label
plt.ylabel("historic density", fontsize=11)  # set y-axis label
leg = ax.axes.get_legend()  # add legend box to plot to identify cross-sections
leg.set_title("Currencies")  # give title to legend box
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/af52eeccaa08fd999cb815bb98316d642f48395db07a5e23a09b6c305a587289.png
cids_sel = ["MXN", "TRY", "TWD"]  # select a small group of cross-sections
filt1 = df["cid"].isin(cids_sel)  # filter out cross-sections
filt2 = df["xcat"] == "FXCRR_NSA"  # filter out category
filt3 = df["real_date"] >= pd.to_datetime("2010-01-01")  # set start date
dfx = df[filt1 & filt2 & filt3][["value", "cid"]].sort_values(
    "cid"
)  # dataframe in appropriate format

colors = "bone"  # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (8, 4)})  #  choose appearance
ax = sns.histplot(
    x="value",
    data=dfx,
    hue="cid",
    element="bars",
    multiple="stack",  # use hue and stacked bars to show each cross-section's contribution
    binrange=(-10, 20),
    binwidth=0.5,
    stat="count",
    palette=colors,
)

plt.title(
    "Real FX forward carry distribution: contribution of currencies", fontsize=13
)  # set title
plt.xlabel("% annualized", fontsize=11)  # set x-axis label
plt.ylabel("days observed", fontsize=11)  # set y-axis label
leg = ax.axes.get_legend()  # add legend box to plot to identify cross-sections
leg.set_title("Currencies")  # give title to legend box
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/565e061a5723828fe544403ef1d486866d808338c34dbe7bdaecb2db6d836c3c.png

When working with a larger number of cross-sections or when performing two-dimensional segmentation, a facet grid created by the sns.displot() function is often the preferred approach. The facet grid creates multiple subplots, each representing a specific cross-section or segment, making it easier to visualize and compare distributions.

cids_sel = ["BRL", "MXN", "ILS", "ZAR", "KRW"]  # select a small group of cross-sections
filt1 = df["cid"].isin(cids_sel)  # filter out cross-sections
filt2 = df["xcat"] == "FXXR_NSA"  # filter out category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # set start date
dfx = df[filt1 & filt2 & filt3][["real_date", "cid", "value"]].sort_values(
    ["cid", "real_date"]
)  # dataframe in the appropriate format
dfm = (
    dfx.groupby(["cid"])
    .resample("M", on="real_date")
    .sum(numeric_only=True)["value"]
    .reset_index()
)  # convert to monthly

dfm["period"] = "before GFC"  # create custom categorical variable
dfm.loc[dfm["real_date"].dt.year > 2006, "period"] = "GFC"
dfm.loc[dfm["real_date"].dt.year > 2009, "period"] = "after GFC"

sns.set_theme(style="darkgrid")  #  choose appearance
fg = sns.displot(
    dfm,
    x="value",
    col="cid",
    row="period",
    kind="hist",
    stat="density",
    binwidth=1,
    color="darkred",  # specify histplot as basis for distributions
    common_norm=False,  # passthrough for histplot() to secure independent normalization across subsets
    height=2,
    aspect=1.3,  # control size and shape
    facet_kws=dict(margin_titles=True),
)

fg.map(
    plt.axvline, x=0, c=".5", lw=0.75
)  # map vertical zero line to each chart in grid
fg.set_axis_labels("", "")  # set axes labels of individual charts
fg.fig.suptitle(
    "Monthly FX forward returns across periods since 2000", y=1.02
)  # set facet grid title

for ax in fg.axes.flat:  # modify top and right axes titles
    if ax.get_title():  # check for axes title text
        ax.set_title(ax.get_title().split("=")[1])  # remove unwanted standard text
    if ax.texts:  # check for right ylabel text
        txt = ax.texts[0]
        ax.text(
            txt.get_unitless_position()[0],
            txt.get_unitless_position()[1],
            txt.get_text().split("=")[1],
            transform=ax.transAxes,
            va="center",
        )  # remove unwanted standard text
        ax.texts[0].remove()  # remove original text

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/c28adaf6d13fec8a9ba0f0982e73bde3520d8bdaa589d5f468328cb0e7e02c98.png

Multi-indicator distribution graphs #

The boxplot is a condensed categorical distribution plot , which means it is particularly suitable for visualizing a few selected key distribution features across categories. In seaborn , this type of plot is managed by the sns.boxplot() method. The distributional features can be applied to one or multiple categories across a full range of cross-sections.

Boxes: The boxes in a boxplot represent the interquartile range (IQR), which spans from the 25th percentile (Q1) to the 75th percentile (Q3) of the data distribution within each category. The line within the box represents the median (50th percentile), providing a measure of the central tendency.

Whiskers: The whiskers of a boxplot extend from the box to the most extreme data points that lie within a certain distance of the box edges. By default, this distance is 1.5 times the IQR. Data points beyond the whiskers are considered outliers and plotted individually.
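The box and whisker statistics described above can be computed directly, which is a useful check when a plot looks surprising. This is a sketch on a synthetic fat-tailed sample, assuming the default whisker rule of 1.5 times the IQR; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.standard_t(df=3, size=500)  # synthetic fat-tailed sample

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 x IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is drawn as an individual outlier point
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"box: [{q1:.2f}, {q3:.2f}], median: {median:.2f}")
print(f"whiskers: [{whisker_low:.2f}, {whisker_high:.2f}], outliers: {outliers.size}")
```

For daily financial returns, which tend to have fat tails, a large number of outlier points is therefore expected rather than a sign of bad data.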

cids_sel = [
    "AUD",
    "CAD",
    "CHF",
    "EUR",
    "GBP",
    "JPY",
    "NOK",
    "NZD",
    "SEK",
    "USD",
]  # select cross-sections
filt1 = df["cid"].isin(cids_sel)  #  filter out cross-sections
filt2 = df["xcat"] == "CPIC_SJA_P3M3ML3AR"  #  filter out category

dfx = df[filt1 & filt2][["value", "cid", "xcat"]].sort_values(
    "cid"
)  # dataframe in the appropriate format

color = "orange"
sns.set_theme(style="dark", rc={"figure.figsize": (16, 4)})  #  choose appearance
ax = sns.boxplot(
    data=dfx, x="cid", y="value", color=color, width=0.5, fliersize=2
)  # single category box-whiskers

plt.axhline(y=0, color="black", linestyle="--", lw=1)  # horizontal line at zero
plt.title(
    "Ranges of adjusted latest core consumer price trend since 2000", fontsize=13
)  # set title
plt.xlabel("")  # set x-axis label
plt.ylabel("% annualized, days observed", fontsize=11)  # set y-axis label
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/6a554d7dc389ec3bae2216c9815c177f93fe36d95c06847da56af8972c6c3016.png

Multiple categories for each cross-section can be plotted by using the hue argument and setting it to the column name that is used for categorization.

cids_sel = [
    "AUD",
    "CAD",
    "CHF",
    "EUR",
    "GBP",
    "JPY",
    "NOK",
    "NZD",
    "SEK",
]  # select cross-sections
xcats_sel = ["RYLDIRS02Y_NSA", "FXCRR_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter out cross-sections
filt2 = df["xcat"].isin(xcats_sel)  # select category
dfx = df[filt1 & filt2][["value", "cid", "xcat"]].sort_values(
    "cid"
)  # dataframe in appropriate format

colors = "hls"  # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (8, 4)})  #  choose appearance
ax = sns.boxplot(
    data=dfx,
    x="cid",
    y="value",
    hue="xcat",  # hue allows subcategories
    palette=colors,
    width=0.6,
    fliersize=2,
)

plt.title(
    "Real IRS yield and real FX forward carry (vs dominant cross)", fontsize=13
)  # set title
plt.axhline(y=0, color="black", linestyle="-", lw=1.5)  # horizontal line at zero
plt.xlabel("")  # set x-axis label
plt.ylabel("% annualized, days observed", fontsize=11)  # set y-axis label
leg = ax.axes.get_legend()  # add legend box explicitly for control
leg.set_title("Categories")  # set title of legend box
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/72c6dccf00f8fa1c8bff7e9c54f1a3c0c31598cbe548889ada12299c833a4e8c.png

A violin plot is a categorical distribution plot that combines a boxplot with a (mostly symmetric) KDE plot. It provides a visual representation of the estimated probability density function and highlights the shape of the distribution, including its central tendency and spread. Like the boxplot, it displays medians and interquartile ranges. However, unlike a boxplot, it does not focus on outliers but rather on the shape of the distribution. The outer shape is the kernel density estimate of the observed values. With seaborn's default setting inner="box", the inner shape is a miniature boxplot that marks the median, the interquartile range, and the whiskers. In seaborn, violin plots are managed through the sns.violinplot() method.

cids_sel = [
    "AUD",
    "CAD",
    "CHF",
    "EUR",
    "GBP",
    "JPY",
    "NZD",
    "SEK",
]  # select cross-sections
xcats_sel = ["EQXR_NSA", "FXXR_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter out cross-sections
filt2 = df["xcat"].isin(xcats_sel)  # select category
dfx = df[filt1 & filt2][["value", "cid", "xcat"]].sort_values(
    "cid"
)  # dataframe in the appropriate format

colors = ["red", "yellow"]  # choose color palette
sns.set_theme(style="darkgrid", rc={"figure.figsize": (6, 6)})  #  choose appearance
ax = sns.violinplot(
    data=dfx,
    y="cid",
    x="value",
    hue="xcat",  # hue visualizes multiple categories
    palette=colors,
    linewidth=0.5,
)  # appearance of the violins

plt.title(
    "Distribution of daily equity and FX forward returns", fontsize=13
)  # set title
plt.ylabel("")  # set y-axis label
plt.xlabel("% annualized, days observed", fontsize=11)  # set x-axis label
leg = ax.axes.get_legend()  # add legend box explicitly for control
leg.set_title("Categories")  # set title of legend box
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/fd79d710ecaaf81662508be035a010c9a0ee415c32385f07cafc1c6246459a1a.png

The boxenplot , also known as the letter value plot, is similar to the boxplot and provides more detailed information by plotting additional quantiles. It offers a more precise representation of the distribution and can reveal fine-grained variations in the data.

In seaborn , this type of plot is governed by the sns.boxenplot() method.
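The additional quantiles behind a letter value plot follow a simple halving pattern: after the median come the fourths, eighths, sixteenths, and so on. The sketch below computes a fixed number of such levels on synthetic data. The fixed n_levels is an assumption made for illustration; seaborn chooses the number of levels with its own rule, which is not replicated here.

```python
import numpy as np

rng = np.random.default_rng(2)
values = rng.normal(size=2_000)  # synthetic sample

n_levels = 4  # assumption: fixed depth, for illustration only
median = np.quantile(values, 0.5)

letter_values = []
for k in range(1, n_levels + 1):
    tail = 0.5 ** (k + 1)  # tail probability: 1/4, 1/8, 1/16, 1/32
    lo, hi = np.quantile(values, [tail, 1 - tail])
    letter_values.append((tail, lo, hi))

for tail, lo, hi in letter_values:
    print(f"tail {tail:.5f}: [{lo:.3f}, {hi:.3f}]")
```

Each successive interval contains the previous one, which is what produces the nested boxes of decreasing width in the plot.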

cids_sel = [
    "AUD",
    "CAD",
    "CHF",
    "EUR",
    "GBP",
    "JPY",
    "NOK",
    "NZD",
    "SEK",
    "USD",
]  # select cross-sections
filt1 = df["cid"].isin(cids_sel)  #  filter out cross-sections
filt2 = df["xcat"] == "DU05YXR_NSA"  #  filter out category

dfx = df[filt1 & filt2][["value", "cid", "xcat"]].sort_values("cid")

color = "steelblue"
sns.set_theme(style="whitegrid", rc={"figure.figsize": (7, 4)})  #  choose appearance
ax = sns.boxenplot(
    data=dfx, x="cid", y="value", color=color, scale="linear"
)  # single category letter-value plot
ax.set_ylim([-3, 3])
plt.axhline(y=0, color="black", linestyle="--", lw=1)  # horizontal line at zero
plt.title(
    "Stylized distributions of 5-year IRS returns since 2000", fontsize=13
)  # set title
plt.xlabel("")  # set x-axis label
plt.ylabel("% annualized, days observed", fontsize=11)  # set y-axis label
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/67d9c07a6d2c65596fe52088a5ba490465399553ca2a6dc1b8d541105af3eae2.png

Timelines of indicators #

lineplots #

The purpose of a lineplot is to illustrate a continuous relationship between two variables, where time is typically one of these variables. In seaborn, the method to manage lineplots is sns.lineplot(). Its simplest application is to pass it a wide dataframe with a time axis as rows and individual series as columns.

cids_sel = ["GBP", "SEK"]  # select cross-sections
filt1 = df["cid"].isin(cids_sel)  # filter out cross-sections
filt2 = df["xcat"] == "PCREDITGDP_SJA_D1M1ML12"  # filter out category
filt3 = df["real_date"] >= pd.to_datetime("2010-01-01")  # set start date
dfx = df[filt1 & filt2 & filt3]
dfw = dfx.pivot(
    index=["real_date"], columns="cid", values="value"
)  # pivot data frame to common time scale

colors = "tab10"  # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)})  #  choose appearance
ax = sns.lineplot(
    data=dfw, estimator=None, palette=colors, linewidth=1
)  # simply pass data frame with time scale to method

plt.axhline(y=0, color="black", linestyle="--", lw=1)  # horizontal line at zero
plt.title(
    "Banks private credit expansion, jump-adjusted, as % of GDP over 1 year",
    fontsize=13,
)  # set title
plt.xlabel("")  # set x-axis label
plt.ylabel("% 6 months over 6 months, annualized", fontsize=11)  # set y-axis label

leg = ax.axes.get_legend()  # add legend box explicitly for control
leg.set_title("Currency areas")  # set title of legend box

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/cf79d35ab4006be0f7f17702e991b18002918b5dbcb460194b6e7b302ef02c6e.png

In addition to plotting lines, the sns.lineplot() method in seaborn can also estimate confidence intervals using bootstrapping. Bootstrapping is a resampling technique that involves creating multiple samples by randomly selecting observations from the observed values with replacement. By default, the sns.lineplot() method in seaborn performs bootstrapping to estimate aggregates, typically the mean, from these resampled samples. It then constructs a confidence interval to represent the uncertainty around the estimated aggregate. The default settings of sns.lineplot() include creating 1,000 resampled samples and computing aggregates from each of these samples. The 95% confidence interval is calculated based on the lower and upper boundaries of the inner 95% aggregate values.
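The bootstrap procedure described above can be sketched in a few lines of numpy. The example estimates a 95% confidence interval for the mean of a small synthetic sample, which stands in for the values of several cross-sections on a single date. The sample and variable names are assumptions for illustration, and the percentile method shown is one common way to form the interval.

```python
import numpy as np

rng = np.random.default_rng(3)
obs = rng.normal(loc=2.0, scale=1.0, size=14)  # synthetic values for one date

n_boot = 1_000  # number of resampled samples
# Draw n_boot samples of the same size as `obs`, with replacement
samples = rng.choice(obs, size=(n_boot, obs.size), replace=True)
boot_means = samples.mean(axis=1)  # one aggregate per resampled sample

# Boundaries of the inner 95% of the bootstrapped means
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean: {obs.mean():.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

In a lineplot this calculation is repeated for every value on the x-axis, so the width of the band reflects both the dispersion and the number of observations at each point in time.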

cids_sel = cids_em  # select cross-sections
xcat_sel = "FXCRR_NSA"  # select category
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"] == xcat_sel  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame

scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]

dfm = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .mean()["value"]
    .reset_index()
)  # convert to monthly averages

dfw = dfm.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to appropriate index

colors = "Paired"  # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)})  #  choose appearance
sns.lineplot(
    data=dfw, x="real_date", y=xcat_sel, estimator="mean", ci=95
)  # plot mean and its 95% confidence interval

plt.axhline(y=0, color="black", linestyle="--", lw=1)  # horizontal line at zero
plt.title(
    "Real FX carry across EM: monthly mean and 95% confidence", fontsize=13
)  # set title
plt.xlabel("")  # set x-axis label
plt.ylabel("% annualized", fontsize=11)  # set y-axis label

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/8dcc847b93b0b80661df66e54d197e8f04db4e498abcb8544ad6e8170d312c3f.png

The seaborn lineplot can not only display values chronologically but also aggregate information over recurring time units, such as calendar months. This can be useful for identifying seasonal patterns in the data. The confidence level of the interval can be set with the ci argument. If confidence intervals at a high confidence level, each based on many underlying observations, do not overlap across months and reveal a clear pattern, seasonality is likely.

With the ‘hue’ argument, one can also compare confidence intervals across categories.
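The month-level aggregation described above can also be inspected numerically. The sketch below groups a synthetic monthly return series by calendar month and reports the mean with an approximate 95% interval. Note the assumptions: the data are random, and the interval uses a normal approximation (mean plus or minus 1.96 standard errors) rather than the bootstrap that sns.lineplot() applies.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
dates = pd.date_range("2000-01-01", periods=240, freq="MS")  # 20 years of months
returns = pd.Series(rng.normal(size=dates.size), index=dates)  # synthetic returns

# Aggregate all Januaries, all Februaries, and so on
by_month = returns.groupby(returns.index.month).agg(["mean", "sem", "count"])

# Normal-approximation interval (assumption: not seaborn's bootstrap)
by_month["ci_low"] = by_month["mean"] - 1.96 * by_month["sem"]
by_month["ci_high"] = by_month["mean"] + 1.96 * by_month["sem"]

print(by_month.round(3))
```

With purely random data the intervals of different months overlap heavily, which is the benchmark against which an apparent seasonal pattern should be judged.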

cids_sel = cids_em  # select cross-sections
xcat_sel = "FXXR_NSA"  # select category
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"] == xcat_sel  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame

dfm = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .sum()["value"]
    .reset_index()
)  # monthly sums
dfw = dfm.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()
dfw["month"] = dfw["real_date"].dt.month
dfw["period"] = "before 2010"
dfw.loc[dfw["real_date"].dt.year >= 2010, "period"] = "from 2010"

colors = "Set2"  # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)})  #  choose appearance
ax = sns.lineplot(
    data=dfw,
    x="month",
    y=xcat_sel,
    hue="period",  # draw different lines for classes of period category
    estimator="mean",
    ci=95,
    palette=colors,
)  # plot mean and its 95% confidence interval for each period

plt.axhline(y=0, color="black", linestyle="--", lw=1)  # horizontal line at zero
plt.title(
    "EM FX returns across months: mean and 95% confidence", fontsize=13
)  # set title
plt.xlabel("")  # set x-axis label
plt.ylabel("%", fontsize=11)  # set y-axis label
leg = ax.axes.get_legend()  # add legend box explicitly for control
leg.set_title("Periods")  # set title of legend box

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/bd29d3c37ae0fc6c97e3ed8699a42567cafde96b1a739206084a10b4a5c2b352.png

Line facets #

The purpose of facet grids in seaborn is to create small multiples, which are multiple plots arranged in a grid-like structure. The class sns.FacetGrid() maps a dataset onto multiple axes arrayed in a grid of rows and columns that correspond to levels of variables in the dataset.

The map() method of the FacetGrid object applies a plotting function to each facet's subset of the data, passing the named columns to the function as positional arrays. The map_dataframe() method is similar, but passes each facet's subset as a dataframe through the data keyword, so that columns can be referenced by name with keyword arguments such as x and y. This suits functions like sns.lineplot() that accept a data argument.

cids_sel = (
    cids_dm  # ['AUD', 'CAD', 'CHF', 'GBP', 'NZD', 'SEK']  # select cross-sections
)
xcat_sel = "CPIC_SA_P1M1ML12"  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"] == xcat_sel  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2005-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame

color = "brown"  # choose color palette
sns.set_theme(style="darkgrid")  #  choose appearance
fg = sns.FacetGrid(
    dfx,
    col="cid",
    col_wrap=3,  # set number of columns of the grid
    height=3,
    aspect=1.5,  # set height and aspect ratio of each chart
    sharey=True,
)  # gives same y axis to all grid plots
fg.map_dataframe(
    sns.lineplot, x="real_date", y="value", ci=None, lw=0.75, color=color
)  # map `lineplot` to the grid
fg.map(
    plt.axhline, y=0, c="0.5", lw=1.5, linestyle="--"
)  # map horizontal zero line to each chart in grid

fg.set_axis_labels("", "% oya")  # set axes labels of individual charts
fg.set_titles(col_template="{col_name}")  # set individual charts' title
fg.fig.suptitle("Annual core consumer price inflation", y=1.02)  # set facet grid title
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/bac8a8dde6db272f045424177b81af48768677b24ca2eb9eb7df15d602797922.png

To display multiple categories in a facet grid of lineplots, one sets the hue argument to the column that contains the categories. In the example below, hue is passed to FacetGrid, which then applies it to the lineplot in every facet.

cids_sel = ["AUD", "CAD", "CHF", "EUR", "GBP", "SEK"]  # select cross-sections
xcats_sel = ["INTRGDP_NSA_P1M1ML12_3MMA", "FXCRR_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2005-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame

colors = ["steelblue", "black"]  # choose color palette
sns.set_theme(style="whitegrid", palette=colors)  #  choose appearance
fg = sns.FacetGrid(
    dfx,
    col="cid",
    col_wrap=3,  # set number of columns of the grid
    palette=colors,
    hue="xcat",
    hue_order=xcats_sel,  # hue is typically defined at the level of the facet grid
    height=3,
    aspect=1.5,  # set height and aspect ratio of each chart
    sharey=False,
)  # gives individual y axes to grid plots
fg.map_dataframe(
    sns.lineplot, x="real_date", y="value", ci=None, lw=1
)  # map `lineplot` to the grid
fg.map(
    plt.axhline, y=0, c=".5", lw=0.75
)  # map horizontal zero line to each chart in grid

fg.set_axis_labels("", "% ar")  # set axes labels of individual charts
fg.set_titles(col_template="{col_name}")  # set individual charts' title
fg.fig.suptitle(
    "Intuitive real GDP growth: % oya and FX carry", y=1.02
)  # set facet grid title

name_to_color = {
    "Intuitive real GDP growth: % oya": colors[0],
    "FX carry": colors[1],
}  # assignment dictionary for legend
patches = [
    mpl.patches.Patch(color=v, label=k) for k, v in name_to_color.items()
]  # legend requires patch (due to bug)
labels = name_to_color.keys()  # series labels for legend box
fg.fig.legend(
    handles=patches, labels=labels, loc="lower center", ncol=3
)  # add legend to bottom of figure

fg.fig.subplots_adjust(bottom=0.15)  # lift bottom so it does not conflict with legend
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/0ee0d87d1c8f73ecbaf4c686075289ec9b34dbf912ee8519bd5060c31f156474.png

Bivariate relations #

This section includes scatterplots, regression plots, linear model plots, joint plots, and pair plots. These plots help understand the relationship between variables and make informed decisions about model selection while gaining insights into the relationship between independent and dependent variables.

Scatterplots #

cids_sel = ["AUD", "CAD", "CHF", "GBP", "NZD", "SEK"]  # select cross-sections
xcats_sel = ["RGDP_SA_P1Q1QL4_20QMA", "FXCRR_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]


dfax = (
    dfx.groupby(["cid", "xcat"])
    .resample("A", on="real_date")
    .mean()["value"]
    .reset_index()
)  # annual averages
dfaw = dfax.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe

colors = "deep"  # choose color palette
sns.set_theme(
    style="darkgrid", palette=colors, rc={"figure.figsize": (6, 4)}
)  #  choose appearance
ax = sns.scatterplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfaw,  # column names used for scatter
    hue="cid",
    style="cid",  # distinguishes cids by color and marker
    s=100,
)  # controls size of dots

plt.axhline(y=0, color="black", linestyle="--", lw=1)  # horizontal zero line
plt.axvline(x=0, color="black", linestyle="--", lw=1)  # vertical zero line

plt.title(
    "Long-term real GDP growth and real FX forward carry (annual averages)", fontsize=13
)  # set title
plt.xlabel("Long-term real GDP growth, % ar", fontsize=11)  # set x-axis label
plt.ylabel("Real forward carry, % ar", fontsize=11)  # set y-axis label
plt.legend(
    bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.0
)  # place legend outside box

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/043ccf4c2779ceeb697b27e4a0407dbbf431b4832e22da9d16a407b372a948f3.png

When a scatter plot has a large number of points, it can become difficult to distinguish individual points and perceive their density. In such cases, the alpha argument in seaborn's scatter plot ( sns.scatterplot() ) can be used to visualize the density through color intensity.

cids_sel = ["AUD", "CAD", "CHF", "GBP", "NZD", "SEK"]  # select cross-sections
xcats_sel = ["RGDP_SA_P1Q1QL4_20QMA", "FXCRR_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
dfw = dfx.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()


colors = "deep"  # choose color palette
sns.set_theme(
    style="darkgrid", palette=colors, rc={"figure.figsize": (6, 4)}
)  #  choose appearance
ax = sns.scatterplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,  # column names used for scatter
    alpha=0.2,  # control color intensity and transparency
    s=10,
)  # controls size of dots

plt.title(
    "Long-term real GDP growth and real FX forward carry (daily values)", fontsize=13
)  # set title
plt.xlabel("Long-term real GDP growth, % ar", fontsize=11)  # set x-axis label
plt.ylabel("Real forward carry, % ar", fontsize=11)  # set y-axis label

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/3f721b6f0f72b3a90e3d5b70c8ea0ccbfed3f511a99cdfc378fc148968f04836.png

Regression plots #

The sns.regplot() function plots the scatter points and a fitted regression line together. It supports several regression estimators for fitting the line to the data.

The shaded band around the regression line is a confidence interval estimated by bootstrapping. The option robust=True requests a robust regression, which de-weights outliers but takes considerably more computation time. When using this option, it is advisable to reduce the number of bootstrap samples with n_boot .
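
To make the idea of a bootstrapped confidence band concrete, the following sketch computes one by hand with numpy on synthetic data. It is a simplified illustration under stated assumptions (an ordinary least-squares line, a pointwise percentile band, made-up data); seaborn's internal implementation may differ in detail. The variable n_boot here is a local name chosen to mirror the seaborn argument.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (hypothetical): linear relation plus noise
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=1.0, size=200)

grid = np.linspace(x.min(), x.max(), 50)  # points at which lines are evaluated
n_boot = 500  # number of bootstrap resamples (fewer is faster but noisier)

boot_lines = np.empty((n_boot, grid.size))
for b in range(n_boot):
    idx = rng.integers(0, x.size, size=x.size)  # resample rows with replacement
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)  # refit the line
    boot_lines[b] = intercept + slope * grid

# 95% band: pointwise 2.5th and 97.5th percentiles across the bootstrap fits
lower, upper = np.percentile(boot_lines, [2.5, 97.5], axis=0)

# line fitted to the full sample, for comparison with the band
slope_hat, intercept_hat = np.polyfit(x, y, deg=1)
fitted = intercept_hat + slope_hat * grid
```

Each bootstrap replication requires a full refit, which is why a slow estimator combined with many replications becomes expensive.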

cids_sel = ["AUD", "CAD", "CHF", "GBP", "NZD", "SEK"]  # select cross-sections
xcats_sel = ["CPIH_SJA_P3M3ML3AR", "EQXR_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]

dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("Q", on="real_date")
    .mean()["value"]
    .reset_index()
)  # quarterly averages

dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe

sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)})  #  choose appearance
sns.regplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,
    ci=98,
    order=1,
    robust=False,  # can use statsmodels' robust regression method, but takes more time
    scatter_kws={
        "s": 20,
        "alpha": 0.3,
        "color": "lightgray",
    },  # customize appearance of scatter
    line_kws={"lw": 2, "linestyle": "-.", "color": "salmon"},
)  # customize appearance of line

plt.axhline(y=0, color="black", linestyle="--", lw=1)  # horizontal zero line
plt.axvline(x=0, color="black", linestyle="--", lw=1)  # vertical zero line

plt.title(
    "Headline inflation and equity index returns (quarterly averages)", fontsize=13
)  # set title
plt.xlabel("Headline CPI, % 3m/3m, saar", fontsize=11)  # set x-axis label
plt.ylabel("Average daily equity index return, %", fontsize=11)  # set y-axis label

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/cb0cbef2adad08f7b95e306fa77821624052f50490a3fa149fddda8c388defe4.png

The sns.regplot() function can also display polynomial regression curves. The order argument sets the degree of the polynomial that is fitted to the data.
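
What a higher order does can be illustrated outside seaborn with a plain numpy polynomial fit. The sketch below uses synthetic data with a deliberately curved relation and compares a first-order with a second-order fit; the data and the helper fit_and_rss are hypothetical and serve only as illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (hypothetical): quadratic relation plus noise
x = np.linspace(-3, 3, 120)
y = 0.4 * x**2 - 0.5 * x + rng.normal(scale=0.8, size=x.size)


def fit_and_rss(order):
    """Fit a polynomial of the given order and return coefficients and RSS."""
    coefs = np.polyfit(x, y, deg=order)  # highest-degree coefficient first
    resid = y - np.polyval(coefs, x)
    return coefs, float(np.sum(resid**2))


coefs1, rss1 = fit_and_rss(1)  # straight line
coefs2, rss2 = fit_and_rss(2)  # 2nd-order polynomial
```

The second-order fit nests the linear one, so its in-sample residual sum of squares cannot be larger. A better in-sample fit is therefore not by itself evidence of a genuinely non-linear relation.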

cids_sel = ["AUD", "CAD", "CHF", "GBP", "NZD", "SEK"]  # select cross-sections
xcats_sel = ["FXCRR_NSA", "FXXR_NSA"]  # select explanatory/dependent categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame

scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]

dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("Q", on="real_date")
    .mean()["value"]
    .reset_index()
)  # quarterly averages
filt4 = (
    dff["xcat"] == xcats_sel[0]
)  # filter for explanatory data in frequency-transformed dataframe
dff.loc[filt4, "value"] = (
    dff[filt4].groupby(["cid", "xcat"])["value"].shift(1)
)  # lag explanatory values by 1 time period


dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe

sns.set_theme(style="darkgrid", rc={"figure.figsize": (6, 4)})  #  choose appearance
sns.regplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,
    ci=95,
    order=2,  #  2nd-order polynomial fit
    scatter_kws={
        "s": 20,
        "alpha": 0.3,
        "color": "goldenrod",
    },  # customize appearance of scatter
    line_kws={"lw": 1, "linestyle": "-", "color": "tab:blue"},
)  # customize appearance of line

plt.axhline(y=0, color="tab:blue", linestyle="--", lw=1)  # horizontal zero line
plt.axvline(x=0, color="tab:blue", linestyle="--", lw=1)  # vertical zero line

plt.title(
    "FX forward carry and subsequent returns (quarterly averages)", fontsize=13
)  # set title
plt.xlabel("Real forward carry, % ar", fontsize=11)  # set x-axis label
plt.ylabel("FX forward returns, % ar", fontsize=11)  # set y-axis label

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/1ee06f936a6764e6e5553b49f5fd93c9f255349d641a4e361da23123b4a749a7.png

The sns.regplot() function offers further regression estimators beyond linear and polynomial regression. Two such options are logistic regression ( logistic=True ) and locally weighted linear regression, or LOWESS ( lowess=True ). The example below uses LOWESS.

cids_sel = ["AUD", "CAD", "CHF", "GBP", "NZD", "SEK"]  # select cross-sections
xcats_sel = ["FXCRR_NSA", "RYLDIRS02Y_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]

dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .mean()["value"]
    .reset_index()
)  # monthly averages
dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe

sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)})  #  choose appearance
sns.regplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,  # pass the data
    lowess=True,  #  uses statsmodels to estimate a nonparametric locally weighted linear regression
    marker="d",  # choose diamond marker
    scatter_kws={
        "s": 50,
        "alpha": 0.2,
        "color": "gray",
    },  # customize the appearance of scatter
    line_kws={"lw": 1.5, "color": "black"},
)  # customize the appearance of the line

plt.axhline(y=0, color="red", linestyle="--", lw=1)  # horizontal zero line
plt.axvline(x=0, color="red", linestyle="--", lw=1)  # vertical zero line

plt.title(
    "Real FX forward carry (monthly averages) and real IRS yield", fontsize=13
)  # set title
plt.xlabel("Real FX forward carry, % ar", fontsize=11)  # set x-axis label
plt.ylabel("Real IRS yield, % ar", fontsize=11)  # set y-axis label

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/9b48833aa017b5ce4925e6cbb4ff522754aee9d6329687ae4664c7f2e7ee8454.png

Linear model plots #

The sns.lmplot() function combines regression plots with a facet grid and offers convenient ways to compare a relationship across categories. Categories can be distinguished by color within a single chart, or each category can be given its own facet, which creates small multiples of regression plots with a single function call.
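
The two ways of separating categories can be compared on a small synthetic dataset. In the sketch below, the dataframe `toy`, the cross-section names and the slopes are hypothetical; ci=None is set only to skip the bootstrap and keep the example fast.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (hypothetical): two cross-sections with different slopes
rng = np.random.default_rng(3)
n = 60
toy = pd.DataFrame(
    {
        "cid": np.repeat(["AAA", "BBB"], n),
        "x": rng.normal(size=2 * n),
    }
)
slopes = toy["cid"].map({"AAA": 1.0, "BBB": -0.5})
toy["y"] = slopes * toy["x"] + rng.normal(scale=0.5, size=2 * n)

# hue: one chart, one color-coded regression line per cross-section
fg_hue = sns.lmplot(data=toy, x="x", y="y", hue="cid", height=3, ci=None)

# col: one facet (and one regression) per cross-section
fg_col = sns.lmplot(data=toy, x="x", y="y", col="cid", height=3, ci=None)

n_axes_hue = fg_hue.axes.size
n_axes_col = fg_col.axes.size
plt.close("all")
```

Using hue keeps all relations on common axes, which eases direct comparison, whereas col is usually easier to read when there are many categories.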

cids_sel = ["COP", "KRW"]  # select cross-sections
xcats_sel = ["CPIH_SA_P1M1ML12", "FXCRR_NSA"]  # select explanatory/dependent categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2005-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .mean()["value"]
    .reset_index()
)  # monthly averages
dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to category dataframe

sns.set_theme(style="darkgrid", rc={"figure.figsize": (6, 4)})  #  choose appearance
fg = sns.lmplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,  # pass data
    hue="cid",  # category that determines color partition
    truncate=False,
    scatter_kws={"s": 20, "alpha": 0.3},
)  # modify appearance

plt.title(
    "Headline inflation and real FX carry (monthly averages)", fontsize=13
)  # set title
plt.xlabel("Headline inflation, % oya", fontsize=11)  # set x-axis label
plt.ylabel("Real FX forward carry, % ar", fontsize=11)  # set y-axis label
fg._legend.set_title("Currencies")

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/d4ac1ba71fa1665a7d0b97a064ae19acf9eb5d26e2a96875ddea5ab61532ba41.png

Setting col in addition to hue gives each cross-section its own facet, and col_wrap sets the number of facets per row.

cids_sel = ["COP", "HUF", "KRW", "MXN", "THB", "TWD"]  # select cross-sections
xcats_sel = ["FXCRR_NSA", "FXXR_NSA"]  # select explanatory/dependent categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
filt4 = dfx["xcat"] == xcats_sel[0]  # filter for features
filt5 = dfx["xcat"] == xcats_sel[1]  # filter for labels
dff1 = (
    dfx[filt4]
    .groupby(["cid", "xcat"])
    .resample("Q", on="real_date")
    .last()["value"]
    .reset_index()
)  # quarterly features
dff2 = (
    dfx[filt5]
    .groupby(["cid", "xcat"])
    .resample("Q", on="real_date")
    .sum()["value"]
    .reset_index()
)  # quarterly labels
dff = pd.concat([dff1, dff2])  # re-stack features and labels
filt6 = dff["xcat"] == xcats_sel[0]  # filter for frequency-transformed features
dff.loc[filt6, "value"] = (
    dff[filt6].groupby(["cid", "xcat"])["value"].shift(1)
)  # lag explanatory values by 1 time period
dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe


sns.set_theme(style="whitegrid")  #  choose appearance
fg = sns.lmplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,  # pass data
    hue="cid",
    col="cid",
    col_wrap=3,  # category that determines partition
    aspect=1.2,
    height=4,  # aspect and height jointly determine shape and size of plots in grid
    truncate=True,
    sharex=False,
    sharey=False,  # appearance of regression plots
    line_kws={"color": "black", "lw": 0.75, "ls": "--"},  # set some aesthetics of line
    scatter_kws={"s": 20, "alpha": 0.3, "color": "r"},
)  # modify appearance

fg.set_titles(col_template="{col_name} versus dominant cross")
fg.set_axis_labels(
    "Real FX forward carry (% ar, quarter end)", "FX forward return, %, next quarter"
)
fg.fig.suptitle(
    "Real FX forward carry and subsequent quarterly returns",  # set grid title
    y=1.04,
    fontsize=14,
)  # position grid heading and set its font size

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/294d6bd95fafb464d4023f4245d4543c09b9d9c30fcb81281e88306b4dd817aa.png

The x_bins argument groups the explanatory variable into discrete bins and shows the mean and confidence interval of the dependent variable for each bin instead of the individual points. The binning only affects how the scatter is drawn; the regression line is still fitted to the original data.

cids_sel = [
    "AUD",
    "CHF",
    "CAD",
    "EUR",
    "GBP",
    "JPY",
    "SEK",
    "USD",
]  # select cross-sections
xcats_sel = ["RYLDIRS02Y_NSA", "DU02YXR_NSA"]  # select explanatory/dependent categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]

filt4 = dfx["xcat"] == xcats_sel[0]  # filter for features
filt5 = dfx["xcat"] == xcats_sel[1]  # filter for labels
dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .mean()["value"]
    .reset_index()
)  # monthly averages
dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe


sns.set_theme(style="white")  #  choose appearance
fg = sns.lmplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,  # pass data
    hue="cid",
    col="cid",
    col_wrap=4,  # category that determines partition
    aspect=1,
    height=3,  # aspect and height jointly determine shape and size of plots in grid
    x_bins=8,
    truncate=True,
    sharex=False,
    sharey=False,  # appearance of regression plots
    line_kws={
        "color": "darkolivegreen",
        "lw": 0.75,
        "ls": "--",
    },  # set some aesthetics of line
    scatter_kws={"color": "slategray"},
)  # modify appearance

fg.set_titles(col_template="{col_name} market")
fg.set_axis_labels(
    "Real IRS yield: 2-year maturity, % ar", "Duration return, in % of notional, % ar"
)
fg.fig.suptitle(
    "Real IRS yield: 2-year maturity and Duration return, in % of notional",  # set grid title
    y=1.04,
    fontsize=14,
)  # position grid heading and set its font size

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/469495b6d1f15f8ed37c8328056ed285d5b8746ba7410e291dd4236ae3f5f320.png

Jointplots #

The sns.jointplot() function in seaborn is a versatile tool for visualizing the relationship between two variables along with their individual distributions. It combines a scatter plot or other relational plot with two histograms, allowing for a comprehensive analysis of the data. By specifying different values for the kind argument, you can explore various types of relational plots that suit your data and analysis goals.

cids_sel = [
    "AUD",
    "CAD",
    "CHF",
    "EUR",
    "GBP",
    "NZD",
    "SEK",
    "USD",
]  # select cross-sections
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12", "RYLDIRS02Y_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .mean()["value"]
    .reset_index()
)  # monthly averages
dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe

sns.set_theme(style="white", rc={"figure.figsize": (6, 4)})  #  choose appearance
fg = sns.jointplot(
    x=xcats_sel[0], y=xcats_sel[1], data=dfw, color="steelblue", kind="hex", alpha=0.5
)  # display density in hexagons
fg.fig.suptitle(
    "Private credit expansion and real yield (monthly averages)", y=1.02, fontsize=13
)  # set grid title
fg.set_axis_labels(
    "Credit growth, oya, % of GDP", "Real short-term interest rate, % ar", fontsize=11
)  # set x/y axis labels

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/ca2941641fa4f8cee3520187622ba1a497507c8214c09af8a11df945b50d89e5.png

A regression line can be added by applying the plot_joint() method to the JointGrid object that sns.jointplot() returns.

cids_sel = [
    "AUD",
    "CAD",
    "CHF",
    "EUR",
    "GBP",
    "NZD",
    "SEK",
    "USD",
]  # select cross-sections
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12", "RYLDIRS02Y_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .mean()["value"]
    .reset_index()
)  # monthly averages
dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe

sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)})  #  choose appearance
fg = sns.jointplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,
    kind="hist",  #  choose 2-dimension histogram
    color="red",
)
fg.plot_joint(
    sns.regplot, scatter=False, ci=None, color="black"
)  # one can overlay regression line (ci=None skips the confidence band)
fg.fig.suptitle(
    "Private credit expansion and real interest rates (monthly averages)",
    y=1.02,
    fontsize=13,
)  # set grid title
fg.set_axis_labels(
    "Credit growth, oya, % of GDP", "Real short-term interest rate, % ar", fontsize=11
)  # set x/y axis labels

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/bdbada60295bd5454f2acf7994b571c409c43380af9e4c29fcb79135da942fad.png

The kernel density estimator ( kind='kde' ) gives a very stylized visualization of the relations.

cids_sel = [
    "AUD",
    "CAD",
    "CHF",
    "EUR",
    "GBP",
    "NZD",
    "SEK",
    "USD",
]  # select cross-sections
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12", "RYLDIRS02Y_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .mean()["value"]
    .reset_index()
)  # monthly averages
dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe

sns.set_theme(style="white")  #  choose appearance
fg = sns.jointplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,
    kind="kde",
    joint_kws={"fill": True},  # keyword dictionary specific to relational plot
    color="steelblue",
    height=6,
)  # color and size parameters
fg.plot_joint(
    sns.regplot, scatter=False, ci=None, color="red"
)  # one can overlay regression line (ci=None skips the confidence band)
fg.fig.suptitle(
    "Private credit expansion and real interest rates (monthly averages)",
    y=1.02,
    fontsize=13,
)  # set grid title
fg.set_axis_labels(
    "Credit growth, oya, % of GDP", "Real short-term interest rate, % ar", fontsize=11
)  # set x/y axis labels

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/f02d68b6fddb00d4cd380b9927bd70d88743bdc4069e94d0ce4013c70dd90c44.png

Information about categorical variables can be integrated through the hue argument.

Additional arguments can be passed to the central relational plot and to the marginal distribution plots through the joint_kws and marginal_kws keyword dictionaries, respectively.

cids_sel = [
    "AUD",
    "CAD",
    "CHF",
    "EUR",
    "GBP",
    "NZD",
    "SEK",
    "USD",
]  # select cross-sections
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12", "RYLDIRS02Y_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .mean()["value"]
    .reset_index()
)  # monthly averages
dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe

dfw["Period"] = "before 2010"  # create custom categorical variable
dfw.loc[dfw["real_date"].dt.year >= 2010, "Period"] = "from 2010"

colors = "Set1"  # choose color palette
sns.set_theme(style="dark")  #  choose appearance
fg = sns.jointplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,  # pass appropriate data
    kind="scatter",
    palette=colors,
    height=6,  # parameters for appearance
    hue="Period",  # classes of period category will be visualized by hue
    joint_kws={"marker": "+"},  # keyword dictionary specific to relational plot
    marginal_kws={"lw": 1},
)  # keyword dictionary specific to distribution plot
fg.fig.suptitle(
    "Private credit expansion and real interest rates (monthly averages)",
    y=1.02,
    fontsize=13,
)  # set grid title
fg.set_axis_labels(
    "Credit growth, oya, % of GDP", "Real short-term interest rate, % ar", fontsize=11
)  # set x/y axis labels

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/07aae271e3d24f1f9ae280a66c57d0c5aeb81112141d26dff6ad646c9b929708.png

Combining the kernel density estimator ( kind='kde' ) with the hue argument shows how a categorical factor, here the currency area, is related to both the joint distribution and the marginal distributions of the two variables.

cids_sel = ["EUR", "GBP"]  # select cross-sections
xcats_sel = ["CPIC_SA_P1M1ML12", "RYLDIRS02Y_NSA"]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("M", on="real_date")
    .mean()["value"]
    .reset_index()
)  # monthly averages

dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe

colors = "Set1"  # choose color palette
sns.set_theme(style="whitegrid")  #  choose appearance
fg = sns.jointplot(
    x=xcats_sel[0],
    y=xcats_sel[1],
    data=dfw,  # pass appropriate data
    kind="kde",
    palette=colors,
    height=6,  # parameters for appearance
    hue="cid",  # cross-sections will be visualized by hue
    marginal_kws={"lw": 2},
)  # keyword dictionary specific to distribution plot
fg.fig.suptitle(
    "Reported core inflation trend and Real IRS yield (monthly averages)",
    y=1.02,
    fontsize=14,
)  # set grid title
fg.set_axis_labels(
    "Core inflation, % oya",
    "Real short-term interest rate, % ar",
    fontsize=11,
)  # set x/y axis labels

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/f59d69a5e1d1a3e8ebe3a6906e7a008a71caa79c5a322cafbbf8ebda17d20c6b.png

Pairplots #

The sns.pairplot() function displays multiple joint distributions at once. For example, it can visualize the joint density of one category across pairs of countries. The pairplot is a grid with univariate distributions on the diagonal and bivariate distributions off the diagonal. It collects a lot of information in one place and is therefore a useful tool for exploratory data analysis. The output of sns.pairplot() is a PairGrid instance, similar to a facet grid, rather than a single axes object.

Many arguments of sns.pairplot apply either to all diagonals or all off-diagonals:

  • kind governs the type of off-diagonal relational plot to use. It must be one of scatter , kde , hist , or reg .

  • plot_kws takes a dictionary of further arguments that apply to the chosen off-diagonal (main) plots.

  • diag_kind governs the type of diagonal plot to use and is usually hist or kde .

  • diag_kws takes a dictionary of arguments that apply to the chosen diagonal plots.

cids_sel = ["EUR", "GBP", "SEK", "CHF"]  # select cross-sections
xcat_sel = "RYLDIRS02Y_NSA"  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"] == xcat_sel  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "value"]
dfx = dfx[scols]
dff = (
    dfx.groupby(["cid"]).resample("M", on="real_date").mean()["value"].reset_index()
)  # monthly averages
dfw = dff.pivot(
    index="real_date", columns="cid", values="value"
).reset_index()  # pivot to wide dataframe
dfw = dfw[(dfw.T != 0).any()]  # drop all rows that are all zeroes

color = "teal"  # choose palette
sns.set_theme(style="darkgrid")  #  choose appearance
fg = sns.pairplot(
    data=dfw,
    vars=cids_sel,
    height=2,
    aspect=1.2,  # height and aspect ratio of each facet in the plot
    corner=True,  # removes redundant bivariate plots in symmetric matrix
    kind="scatter",  # choose type of bivariate plot
    plot_kws={
        "s": 20,
        "alpha": 0.3,
        "color": color,
    },  # set parameters for off-diagonal plots
    diag_kind="hist",  # choose type of univariate distribution plot
    diag_kws={"bins": 20, "color": color},
)  # set parameters for diagonal plots
fg.fig.suptitle(
    "Distributions of real IRS yield in Europe (monthly averages)", y=1.02, fontsize=14
)  # set grid title

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/d84e5ca388329815d929554f3783ed50f0d61ba846ca8668a2d136c39bec4797.png

To apply the sns.pairplot() function across cross-sections, one needs to pivot the selected dataframe so that the cross-sections ('cid') become the new columns. To apply it across categories, one needs to pivot the selected dataframe so that the categories ('xcat') become the new columns.
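
The two pivots can be compared on a tiny long-format dataframe. The sketch below uses hypothetical cross-section and category names ("AAA", "BBB", "X1", "X2") and made-up values, but follows the same column layout ( cid , xcat , real_date , value ) as the dataframes in this notebook.

```python
import numpy as np
import pandas as pd

# Tiny long-format panel (hypothetical names and values)
dates = pd.date_range("2021-01-01", periods=3, freq="MS")
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], ["X1", "X2"], dates], names=["cid", "xcat", "real_date"]
)
toy = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)
toy = toy.reset_index()

# Pairplot across cross-sections: select one category, cids become columns
wide_by_cid = (
    toy[toy["xcat"] == "X1"]
    .pivot(index="real_date", columns="cid", values="value")
    .reset_index()
)

# Pairplot across categories: xcats become columns, cross-sections are stacked
wide_by_xcat = toy.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()
```

In the first case each row is a date and the plot relates cross-sections to each other. In the second case each row is a combination of cross-section and date, so the plot pools all selected cross-sections.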

cids_sel = [
    "AUD",
    "CAD",
    "CHF",
    "GBP",
    "JPY",
    "NOK",
    "NZD",
    "SEK",
]  # select cross-sections
xcats_sel = [
    "FXXR_NSA",
    "RYLDIRS02Y_NSA",
    "FXCRR_NSA",
    "INTRGDP_NSA_P1M1ML12_3MMA",
]  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel)  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
    dfx.groupby(["cid", "xcat"])
    .resample("A", on="real_date")
    .mean()["value"]
    .reset_index()
)  # annual averages
dfw = dff.pivot(
    index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()  # pivot to wide dataframe

color = "red"  # choose palette
sns.set_theme(style="whitegrid")  #  choose appearance
fg = sns.pairplot(
    data=dfw,
    vars=xcats_sel,
    height=2,
    aspect=1.2,  # height and aspect ratio of each facet in the plot
    corner=True,  # removes redundant bivariate plots in symmetric matrix
    plot_kws={"color": color, "bins": 20},  # set parameters for off-diagonal plots
    kind="hist",  # choose type of bivariate plot
    diag_kind="kde",  # choose type of univariate distribution plot
    diag_kws={"color": color},
)  # set parameters for diagonal plots
fg.fig.suptitle(
    "Individual and pairwise distribution of FX-related indicators (annual)",
    y=1.02,
    fontsize=14,
)  # set grid title
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/a32b4bee7e3bfb8470233720a2b8904b05e0177314f0c6ce434f27b63406876b.png

Adding even more information, the pairplot can show distributions and relations for separate values of a categorical variable using the hue argument and a related palette choice.
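
A custom categorical variable for hue can be built from the date column. The sketch below is only an illustration on hypothetical dates: it uses np.where as one possible way of labelling observations, and it assigns all of 2010 to the "from 2010" group.

```python
import numpy as np
import pandas as pd

# Hypothetical month-end dates around the 2010 boundary
dates = pd.to_datetime(["2009-11-30", "2009-12-31", "2010-01-31", "2010-02-28"])
toy = pd.DataFrame({"real_date": dates})

# Label every observation dated 2010 or later as "from 2010"
toy["Period"] = np.where(
    toy["real_date"].dt.year >= 2010, "from 2010", "before 2010"
)
print(toy)
```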

cids_sel = ["AUD", "CAD", "MXN", "ZAR", "CHF"]  # select cross-sections
xcat_sel = "FXXR_NSA"  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"] == xcat_sel  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3]  # filter out relevant data frame
dff = (
    dfx.groupby(["cid"]).resample("M", on="real_date")["value"].sum().reset_index()
)  # monthly sums
dfw = dff.pivot(
    index="real_date", columns="cid", values="value"
).reset_index()  # pivot to wide dataframe


dfw["Period"] = "before 2010"  # create custom categorical variable
dfw.loc[dfw["real_date"].dt.year >= 2010, "Period"] = "from 2010"

colors = "bone"  # choose palette
sns.set_theme(style="whitegrid")  #  choose appearance
fg = sns.pairplot(
    data=dfw,
    vars=cids_sel,
    palette=colors,
    hue="Period",  #  apply classification variable to hue
    height=2,
    aspect=1,  # height and aspect ratio of each facet in the plot
    corner=True,  # removes redundant bivariate plots in symmetric matrix
    kind="reg",  # choose the type of bivariate plot
    plot_kws={
        "ci": None,  # no confidence band around the regression lines
        "scatter_kws": {"s": 10, "alpha": 0.5},
    },  # set parameters for off-diagonal plots
    diag_kind="hist",  # choose type of univariate distribution plot
    diag_kws={"bins": 20},  # set parameters for diagonal plots
)
fg.fig.suptitle(
    "Relations and distributions of monthly FX returns", fontsize=14
)  # set grid title

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/6c8c26c4437105f68edb91549429c98a4565be345c5f917b7e9fb9488faf2b82.png

Color maps #

Heatmaps #

Heatmaps visualize tabular data by mapping numeric values to colors. They are created with the sns.heatmap() function and are a particularly powerful way of condensing a lot of information into a single visualization.

cids_sel = [
    "AUD",
    "BRL",
    "COP",
    "CLP",
    "HUF",
    "MXN",
    "PLN",
    "TRY",
    "ZAR",
    "INR",
    "MYR",
    "PHP",
]  # select cross-sections
xcat_sel = "FXXR_NSA"  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"] == xcat_sel  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2006-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3].copy()  # filter out relevant data frame (copy, as a column is added)
dfx["year"] = dfx["real_date"].dt.year  # add year category to frame
dfw = (
    dfx.groupby(["cid", "year"])
    .sum(numeric_only=True)
    .reset_index()
    .pivot(index="year", columns="cid", values="value")
)  # annual sums
dfh = dfw.T  # transpose to appropriate format for heatmap function

colors = "vlag_r"  # choose appropriate diverging color palette
fg, ax = plt.subplots(figsize=(18, 8))  # prepare axis and grid
ax = sns.heatmap(
    dfh,
    cmap=colors,
    center=0,  # requires diverging color palette with white zero
    square=True,  # perfect squares
    annot=True,
    fmt=".1f",
    annot_kws={"fontsize": 11},  # format annotation numbers inside color boxes
    linewidth=1,
)  # set width of lines between color boxes

plt.title(
    "Annual FX forward returns in EM: A 15-year history", fontsize=16, y=1.05
)  # set heatmap title
plt.xlabel("")  # control x-axis label
plt.ylabel("")  # control y-axis label
plt.yticks(rotation=0)  # set direction of y-axis marks
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/4e34d471b4addb924a17f7a1f670dfe4866b6d8df4555709fe8ce21f5d2bf938.png

Cross correlations #

One of the easiest ways to display correlations across a range of cross-sections is to use the sns.heatmap() function in combination with a correlation matrix. Applying the .corr() method to a wide dataframe, with observation dates as rows and cross-sections as columns, gives the correlation coefficients between the columns (cross-sections). Color coding does all the work of visualization.
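
The sketch below illustrates the correlation matrix and the triangular mask on hypothetical random data (the column names are placeholders, not JPMaQS series). With k=0 the mask covers the upper triangle and the diagonal, so only the coefficients below the diagonal remain visible. Using k=1 instead would keep the diagonal.

```python
import numpy as np
import pandas as pd

# Hypothetical wide dataframe: rows are observations, columns are cross-sections
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(100, 3)), columns=["AUD", "CAD", "CHF"])

corr = toy.corr()  # 3 x 3 matrix of pairwise Pearson coefficients

# Boolean mask that is True on the upper triangle, including the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool), k=0)

# Cells that a heatmap with mask=mask would still display
lower = corr.mask(mask)
print(lower)
```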

cids_sel = cids_dm + ["KRW", "TWD"]  # select cross-sections
xcat_sel = "INTRGDP_NSA_P1M1ML12_3MMA"  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"] == xcat_sel  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df.loc[
    filt1 & filt2 & filt3, ["cid", "real_date", "value"]
]  # filter out relevant data frame
dfw = dfx.pivot(
    index=["real_date"], columns="cid", values="value"
)  # pivot out to date index and cross-section columns

csquare = dfw.corr()  # cross-correlation matrix
mask = np.triu(
    np.ones_like(csquare, dtype=bool), k=0
)  # mask for redundant upper triangle (k=0 also hides the diagonal)

colors = sns.diverging_palette(
    20, 190, as_cmap=True
)  # customize diverging color palette
fg, ax = plt.subplots(figsize=(10, 7))  # prepare axis and grid

ax = sns.heatmap(
    csquare,
    cmap=colors,
    center=0,  # requires diverging color palette with white zero
    annot=True,
    fmt=".1f",
    annot_kws={"fontsize": 11},  # format annotation numbers inside color boxes
    mask=mask,  # remove redundant upper triangle
    linewidth=3,
)  # set width of lines between color boxes

plt.title(
    "Intuitive real GDP growth correlation across developed countries",
    fontsize=14,
    y=1.03,
)  # set heatmap title
plt.xlabel("")  # control x-axis label
plt.ylabel("")  # control y-axis label
plt.yticks(rotation=0)  # set direction of y-axis marks
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/141d98535cb76c2e8ea35273733017f1b82982cd0d7dca6307e97be343378251.png

A less common means of visualizing cross-correlations is the sns.relplot() function. It is an interface for drawing relational plots onto a FacetGrid. It is mostly used for displaying bivariate relations but can also produce a color- and size-coded display of correlation coefficients across multiple cross-sections. The graph below allows greater focus on higher correlation coefficients (whether positive or negative). It can be a better visualization for a large number of cross-sections with very diverse relations.
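
For this kind of plot the square correlation matrix has to be reshaped into a long dataframe with one row per pair of cross-sections. The following is a minimal sketch of that reshaping on hypothetical made-up data. It uses rename_axis and Series.rename as one possible way of naming the axes and the coefficient column.

```python
import pandas as pd

# Hypothetical wide dataframe with one column per cross-section
toy = pd.DataFrame(
    {
        "AUD": [1.0, 2.0, 3.0, 4.0],
        "CAD": [1.0, 3.0, 2.0, 5.0],
        "CHF": [4.0, 3.0, 2.0, 1.0],
    }
)

# Square correlation matrix with distinct names for the two axes
corr = toy.corr().rename_axis(index="cid0", columns="cid")

# Unstack to long format: one row per (cid, cid0) pair
long = corr.unstack().rename("pearson").reset_index()
long["abs_coef"] = long["pearson"].abs()  # absolute value, usable for marker size
print(long)
```

The columns cid and cid0 can then serve as x and y, with pearson mapped to hue and abs_coef mapped to size.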

cids_sel = cids  # select cross-sections
xcat_sel = "FXXR_NSA"  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"] == xcat_sel  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01")  # filter for start date
dfx = df.loc[
    filt1 & filt2 & filt3, ["cid", "real_date", "value"]
]  # filter out relevant data frame
dfw = dfx.pivot(
    index=["real_date"], columns="cid", values="value"
)  # pivot out to date index and cross-section columns

csquare = dfw.corr()  # square correlation coefficient dataframe across all columns
csquare.index.rename("cid0", inplace=True)  # rename index name to allow unstacking
dfc = csquare.unstack().reset_index()  # unstack to long-dataframe
dfc.rename(
    mapper={0: "pearson"}, axis=1, inplace=True
)  # give intuitive name to correlation value column
dfc["abs_coef"] = np.abs(
    dfc["pearson"]
)  # add column of absolute coefficient values (for size)

sns.set_theme(style="whitegrid")
fg = sns.relplot(
    data=dfc,
    x="cid",
    y="cid0",  # define axes
    hue="pearson",
    hue_norm=(-1, 1),
    palette="vlag_r",  # color code correlation coefficients
    size="abs_coef",
    size_norm=(0, 1),
    sizes=(50, 250),  # express absolute coefficients by size
    height=7,
    aspect=1.2,
)  # control facet grid shape

fg.fig.suptitle(
    "Correlation of FX forward return across markets (Pearson coefficients)",
    y=1.02,
    fontsize=14,
)  # set grid title
fg.set(xlabel="", ylabel="")  # remove axes labels
fg.despine(left=True, bottom=True)  # remove axes lines
fg.ax.margins(0.02)  # control proximity of labels to axes
for label in fg.ax.get_xticklabels():
    label.set_rotation(90)  # rotate x-tick labels for readability

plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/0a6a1205549795b2cb520d784dd4fcf76e5e7a140b8f6adf08c0a99fee37a381.png

Clustermaps #

The sns.clustermap() function is used to create a heatmap visualization of matrices with additional hierarchical clustering information.

The dendrograms, which represent the clustering hierarchy as tree diagrams, display the statistical similarity between columns and between rows based on multi-dimensional distance. Similarity here is the inverse of distance, and the default metric is Euclidean (spatial) distance. The dendrogram is created by hierarchical agglomerative clustering, i.e. sequential merging of the nearest points or clusters in multi-dimensional space. Note that sns.clustermap() returns a ClusterGrid object.
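
To make the distance idea concrete, the sketch below computes pairwise Euclidean distances between the rows of a small hypothetical matrix (made-up values) and identifies the closest pair, which is the first pair an agglomerative procedure would merge. This is only an illustration with NumPy and pandas. Seaborn itself relies on SciPy for the full clustering.

```python
import numpy as np
import pandas as pd

# Hypothetical annual returns: rows are cross-sections, columns are years
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [-3.0, 0.0, 4.0]],
    index=["EUR", "USD", "ZAR"],
    columns=[2020, 2021, 2022],
)

# Pairwise Euclidean distances between rows via broadcasting
x = toy.to_numpy()
diff = x[:, None, :] - x[None, :, :]
dist = pd.DataFrame(
    np.sqrt((diff**2).sum(axis=-1)), index=toy.index, columns=toy.index
)

# Ignore the zero diagonal and find the nearest pair of distinct rows
off_diag = dist.where(~np.eye(len(dist), dtype=bool))
first_pair = off_diag.stack().idxmin()
print(dist.round(2))
print(first_pair)
```

Rows with a small distance end up on neighbouring branches of the dendrogram, while distant rows are joined only near the top of the tree.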

cids_sel = [
    "EUR",
    "USD",
    "GBP",
    "CHF",
    "JPY",
    "SEK",
    "CAD",
    "ZAR",
    "INR",
    "MYR",
]  # select cross-sections
xcat_sel = "DU05YXR_NSA"  # select categories
filt1 = df["cid"].isin(cids_sel)  # filter for cross-sections
filt2 = df["xcat"] == xcat_sel  #  filter for category
filt3 = df["real_date"] >= pd.to_datetime("2005-01-01")  # filter for start date
dfx = df[filt1 & filt2 & filt3].copy()  # filter out relevant data frame (copy, as a column is added)
dfx["year"] = dfx["real_date"].dt.year  # add year category to frame
dfw = (
    dfx.groupby(["cid", "year"])
    .sum(numeric_only=True)
    .reset_index()
    .pivot(index="year", columns="cid", values="value")
)  # annual sums
dfh = dfw.dropna().T  # transpose to appropriate format for clustermap function

colors = sns.diverging_palette(
    20, 220, as_cmap=True
)  # choose appropriate diverging color palette
fg = sns.clustermap(
    dfh,
    cmap=colors,
    center=0,  # requires diverging color palette with white zero
    figsize=(12, 7),  # set appropriate size
    annot=True,
    fmt=".1f",
    annot_kws={"fontsize": 11},  # format annotation numbers inside color boxes
    linewidth=1,
)  # set width of lines between color boxes

fg.fig.suptitle(
    "Similarities of countries and trading years, based on duration returns", y=1.02
)  # setting title
fg.ax_heatmap.set_xlabel("")  # special way of controlling x-axis label
fg.ax_heatmap.set_ylabel("")  # special way of controlling y-axis label
plt.show()
https://macrosynergy.com/notebooks.build/data-science/jpmaqs-for-seaborn/_images/d6bf56bf58680ce2ffbef3002ea54a23232c5b83404bb0c03c1d8fd8e36f069d.png