JPMaQS with Seaborn #
In this notebook, we showcase how the powerful seaborn visualisation module can be applied to build understanding of, and gain further insights into, both JPMaQS indicators and the relationships involving them. Built on top of matplotlib, seaborn offers a convenient interface for creating visually appealing and informative statistical graphics.
The notebook covers the following main parts:
- Get packages and JPMaQS data: This section installs and imports the Python packages used throughout the analysis and downloads the JPMaQS data.
- Historical distributions of indicators: This part shows how to visualize the empirical distribution of a dataset, starting with histograms, which display the frequency of values within specific ranges (bins), and continuing with condensed distribution plots such as boxplots, violin plots and boxen plots. These simple tools help to quickly identify patterns, central tendency, spread, shape and outliers, and to compare the distributions of multiple indicators in one plot or side by side. They are an important part of any initial analysis.
- Timelines of indicators: This part primarily showcases line plots and line facets. Line plots are useful for showing trends and changes over time, or the relationship between two continuous variables. Line facets arrange multiple line plots in a grid, each showing a subset of the data defined by a categorical variable.
- Bivariate relations: This section covers scatterplots, regression plots, linear model plots, joint plots, and pair plots. These plots help to understand the relationships between variables, in particular between independent and dependent variables, and support informed decisions about model selection.
- Color maps: These represent numerical or categorical data as colors, helping viewers perceive patterns, variations, and relationships within the data.
Get packages and JPMaQS data #
# Uncomment below if running on Kaggle or if you need to install macrosynergy package
"""
%%capture
! pip install macrosynergy --upgrade
"""
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import os
from macrosynergy.download import JPMaQSDownload
import warnings
warnings.simplefilter("ignore")
The JPMaQS indicators we consider are downloaded using the J.P. Morgan DataQuery API interface within the macrosynergy package. This is done by specifying ticker strings, formed by appending an indicator category code <category> to a currency area code <cross_section>. These tickers form the main part of a full quantamental indicator expression, which takes the form DB(JPMAQS,<cross_section>_<category>,<info>), where <info> denotes the type of time series information requested:
- value giving the latest available values for the indicator
- eop_lag referring to the number of days elapsed since the end of the observation period
- mop_lag referring to the number of days elapsed since the mean observation period
- grade denoting a grade of the observation, giving a metric of real-time information quality.
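As an illustration of the expression format described above, full expressions for a few cross-sections, categories and metrics can be assembled with plain Python. This is only a sketch of the naming convention: the helper build_expressions is hypothetical, and the download log further below suggests that the macrosynergy package forms one expression per ticker and metric internally when tickers and metrics are passed to download().

```python
from itertools import product


def build_expressions(cids, xcats, metrics):
    """Hypothetical helper: assemble expressions of the form
    DB(JPMAQS,<cross_section>_<category>,<info>)."""
    return [
        f"DB(JPMAQS,{cid}_{xcat},{metric})"
        for cid, xcat, metric in product(cids, xcats, metrics)  # cids vary slowest, metrics fastest
    ]


expressions = build_expressions(["USD", "EUR"], ["EQXR_NSA"], ["value", "grade"])
print(expressions)
```

With the 24 cross-sections, 25 categories and the single metric ("value") used in this notebook, the same combinatorics give 24 x 25 x 1 = 600 expressions, which matches the count reported in the download log.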
After instantiating the JPMaQSDownload class within the macrosynergy.download module, one can use the download(tickers,start_date,metrics) method to download the necessary data, where tickers is an array of ticker strings, start_date is the first collection date to be considered and metrics is an array comprising the time series information to be downloaded. For more information see here or use the free dataset on Kaggle.
To ensure reproducibility, only samples between January 2000 (inclusive) and May 2023 (exclusive) are considered.
cids_dm = ["AUD", "CAD", "CHF", "EUR", "GBP", "JPY", "NOK", "NZD", "SEK", "USD"]
cids_em = [
"CLP",
"COP",
"CZK",
"HUF",
"IDR",
"ILS",
"INR",
"KRW",
"MXN",
"PLN",
"THB",
"TRY",
"TWD",
"ZAR",
]
cids = cids_dm + cids_em
ecos = [
"CPIC_SA_P1M1ML12",
"CPIC_SJA_P3M3ML3AR",
"CPIC_SJA_P6M6ML6AR",
"CPIH_SA_P1M1ML12",
"CPIH_SJA_P3M3ML3AR",
"CPIH_SJA_P6M6ML6AR",
"INFTEFF_NSA",
"INTRGDP_NSA_P1M1ML12_3MMA",
"INTRGDPv5Y_NSA_P1M1ML12_3MMA",
"PCREDITGDP_SJA_D1M1ML12",
"RGDP_SA_P1Q1QL4_20QMA",
"RYLDIRS02Y_NSA",
"RYLDIRS05Y_NSA",
"PCREDITBN_SJA_P1M1ML12",
]
mkts = [
"DU02YXR_NSA",
"DU05YXR_NSA",
"DU02YXR_VT10",
"DU05YXR_VT10",
"EQXR_NSA",
"EQXR_VT10",
"FXXR_NSA",
"FXXR_VT10",
"FXCRR_NSA",
"FXTARGETED_NSA",
"FXUNTRADABLE_NSA",
]
xcats = ecos + mkts
The description of each JPMaQS category is available on the Macro Quantamental Academy, on JPMorgan Markets (password protected), or on Kaggle (only for the tickers used in this notebook). In particular, this notebook uses the following category groups: Consumer price inflation trends, Inflation targets, Intuitive growth estimates, Domestic credit ratios, Long-term GDP growth, Real interest rates, Private credit expansion, Duration returns, Equity index future returns, FX forward returns, FX forward carry, and FX tradeability and flexibility.
# Download series from J.P. Morgan DataQuery by tickers
start_date = "2000-01-01"
end_date = "2023-05-01"
tickers = [cid + "_" + xcat for cid in cids for xcat in xcats]
print(f"Maximum number of tickers is {len(tickers)}")
# Retrieve credentials
client_id: str = os.getenv("DQ_CLIENT_ID")
client_secret: str = os.getenv("DQ_CLIENT_SECRET")
with JPMaQSDownload(client_id=client_id, client_secret=client_secret) as dq:
    df = dq.download(
        tickers=tickers,
        start_date=start_date,
        end_date=end_date,
        suppress_warning=True,
        metrics=["value"],
        report_time_taken=True,
        show_progress=True,
    )
Maximum number of tickers is 600
Downloading data from JPMaQS.
Timestamp UTC: 2023-09-18 09:49:35
Connection successful!
Number of expressions requested: 600
Requesting data: 100%|██████████| 30/30 [00:09<00:00, 3.29it/s]
Downloading data: 100%|██████████| 30/30 [01:00<00:00, 2.00s/it]
Time taken to download data: 70.45 seconds.
Time taken to convert to dataframe: 8.64 seconds.
Average upload size: 0.20 KB
Average download size: 110390.64 KB
Average time taken: 28.65 seconds
Longest time taken: 38.64 seconds
Average transfer rate : 30829.97 Kbps
# Uncomment if running on Kaggle
"""for dirname, _, filenames in os.walk('/kaggle/input'):
for filename in filenames:
print(os.path.join(dirname, filename))
df = pd.read_csv('../input/fixed-income-returns-and-macro-trends/JPMaQS_Quantamental_Indicators.csv', index_col=0, parse_dates=['real_date'])"""
# It is often helpful to append a ticker column as a concatenation of cid and xcat. This shortens the code for references to individual time series (as opposed to panels).
display(df["xcat"].unique())
display(df["cid"].unique())
df["ticker"] = df["cid"] + "_" + df["xcat"]
df.head(3)
array(['CPIC_SA_P1M1ML12', 'CPIC_SJA_P3M3ML3AR', 'CPIC_SJA_P6M6ML6AR',
'CPIH_SA_P1M1ML12', 'CPIH_SJA_P3M3ML3AR', 'CPIH_SJA_P6M6ML6AR',
'FXTARGETED_NSA', 'FXUNTRADABLE_NSA', 'FXXR_NSA', 'FXXR_VT10',
'INFTEFF_NSA', 'INTRGDP_NSA_P1M1ML12_3MMA',
'INTRGDPv5Y_NSA_P1M1ML12_3MMA', 'PCREDITBN_SJA_P1M1ML12',
'PCREDITGDP_SJA_D1M1ML12', 'RGDP_SA_P1Q1QL4_20QMA',
'RYLDIRS02Y_NSA', 'RYLDIRS05Y_NSA', 'DU02YXR_NSA', 'DU02YXR_VT10',
'DU05YXR_NSA', 'DU05YXR_VT10', 'EQXR_NSA', 'EQXR_VT10',
'FXCRR_NSA'], dtype=object)
array(['AUD', 'CAD', 'CHF', 'CLP', 'COP', 'CZK', 'EUR', 'GBP', 'HUF',
'IDR', 'ILS', 'INR', 'JPY', 'KRW', 'MXN', 'NOK', 'NZD', 'PLN',
'SEK', 'THB', 'TRY', 'TWD', 'USD', 'ZAR'], dtype=object)
| | real_date | cid | xcat | value | ticker |
|---|---|---|---|---|---|
| 0 | 2000-01-03 | AUD | CPIC_SA_P1M1ML12 | 1.244168 | AUD_CPIC_SA_P1M1ML12 |
| 1 | 2000-01-03 | AUD | CPIC_SJA_P3M3ML3AR | 3.006383 | AUD_CPIC_SJA_P3M3ML3AR |
| 2 | 2000-01-03 | AUD | CPIC_SJA_P6M6ML6AR | 1.428580 | AUD_CPIC_SJA_P6M6ML6AR |
Historical distributions of indicators #
Histograms for single indicators #
Histograms are a useful visualization for understanding the empirical distribution of a dataset: they display the frequency or count of values within specific value ranges, known as bins. In seaborn, the sns.histplot() function is a versatile tool for creating histograms, and it has replaced the older, now deprecated sns.distplot() function.
To overlay a kernel density estimate (KDE) on the histogram, set the kde argument to True when calling sns.histplot(). The KDE provides a smoothed representation of the underlying distribution, giving additional insight into the shape and density of the data.
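To make the idea of a KDE overlay concrete, the numpy-only sketch below evaluates a Gaussian kernel density estimate on a grid: each observation contributes a small normal bell curve, and the curves are averaged. The bandwidth rule (Scott's rule, the sample standard deviation times n**(-1/5)) and the synthetic data are assumptions for illustration; seaborn uses its own KDE implementation, whose bandwidth can be tuned, for example through the bw_method and bw_adjust arguments of sns.kdeplot().

```python
import numpy as np


def gaussian_kde_1d(data, grid):
    """Minimal 1-D Gaussian KDE with a Scott's-rule bandwidth (illustrative sketch)."""
    x = np.asarray(data, dtype=float)
    x = x[~np.isnan(x)]                        # ignore missing observations
    n = x.size
    h = x.std(ddof=1) * n ** (-1 / 5)          # Scott's rule for one dimension
    z = (grid[:, None] - x[None, :]) / h       # standardized distance of each grid point to each observation
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / h            # average the bells and rescale to a density


rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=2_000)  # synthetic data, not JPMaQS values
grid = np.linspace(-6, 6, 1_201)
density = gaussian_kde_1d(sample, grid)
area = density.sum() * (grid[1] - grid[0])           # Riemann sum, should be close to 1
```

Because every kernel integrates to one, the averaged curve is itself a density; a larger bandwidth smooths more, a smaller one follows the histogram bars more closely.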
dfx = df[df["real_date"] >= pd.to_datetime("2000-01-01")] # set start date
dfw = dfx.pivot_table(index="real_date", columns="ticker", values="value").replace(
0, np.nan
) # bring df to wide format
var = "MXN_EQXR_NSA" # specified indicator to analyze
col = "teal"
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)}) # choose appearance
sns.histplot(
x=var, data=dfw, bins=20, kde=True, color=col
) # histogram with custom bin number and kde overlay
plt.axvline(
x=np.mean(dfw[var]), color=col, linestyle="--"
) # add vertical line for mean
plt.title(
"Mexican equity index future returns: mean and distribution", fontsize=13
) # add chart title
plt.xlabel("% annualized", fontsize=11) # overwrite standard x-axis label
plt.ylabel("days observed", fontsize=11) # overwrite standrad y-axis label
plt.show()
The sns.histplot() function provides various options to customize the bins, for example through the bins (number of bins) and binwidth arguments. One can also change the units of the y-axis with the stat argument from 'count' (the default) to 'frequency' (number of observations divided by the bin width), 'density' (normalizes counts so that the total area of the histogram is 1), or 'probability' (normalizes counts so that the sum of the bar heights is 1).
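The four normalizations can be reproduced with numpy to see how they relate to each other. The sketch below uses synthetic data (an assumption, not JPMaQS values) and fixed-width bins of 0.2, mirroring binwidth=0.2 in the next example.

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.normal(loc=2.0, scale=1.0, size=500)  # synthetic data, not JPMaQS values

binwidth = 0.2
lo = np.floor(values.min() / binwidth) * binwidth          # left edge aligned to the bin grid
n_bins = int(np.ceil((values.max() - lo) / binwidth)) + 1   # enough bins to cover the maximum
edges = lo + binwidth * np.arange(n_bins + 1)
counts, _ = np.histogram(values, bins=edges)
widths = np.diff(edges)

stats = {
    "count": counts,                                # observations per bin (the default)
    "frequency": counts / widths,                   # count divided by bin width
    "probability": counts / counts.sum(),           # bar heights sum to 1
    "density": counts / (counts.sum() * widths),    # bar areas sum to 1
}
```

With equal-width bins, 'probability' and 'density' differ only by the constant factor 1/binwidth, so the shape of the histogram is identical and only the y-axis scale changes.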
dfx = df[df["real_date"] >= pd.to_datetime("2000-01-01")] # set start date
dfw = dfx.pivot(index="real_date", columns="ticker", values="value").replace(
0, np.nan
) # bring df to wide format
var = "USD_CPIH_SA_P1M1ML12" # specified indicator to analyze
col = "royalblue"
sns.set_theme(style="darkgrid", rc={"figure.figsize": (6, 4)}) # choose appearance
sns.histplot(
x=var, data=dfw, binwidth=0.2, stat="probability"
) # histogram with preset bin width and probability bars
plt.axvline(
x=np.mean(dfw[var]), color=col, linestyle="--"
) # add vertical line for mean
plt.axvline(
x=dfw[var].dropna().iloc[-1], color="red", linestyle="--"
) # add line for latest
plt.title(
"U.S. standard annual headline consumer price inflation, daily observed (red=latest)",
fontsize=13,
) # add chart title
plt.xlabel("% annualized", fontsize=11) # overwrite standard x-axis label
plt.ylabel(
"historic probability (since 2000)", fontsize=11
) # overwrite standard y-axis label
plt.show()
Histograms for multiple indicators #
The hue argument in sns.histplot() allows displaying multiple counts or probabilities in a single plot, enabling comparisons between different cross-sections or series. The multiple parameter further controls how these distributions are visualized. Setting multiple='layer' plots overlapping histograms, whereas setting multiple='stack' stacks the histograms on top of each other, so that the total bar heights show the joint distribution of all series combined.
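For intuition about multiple='stack', the numpy sketch below bins two synthetic groups on common edges and stacks them: each group's bars start where the previous group's bars end, so the top of the stack equals the histogram of the pooled data. The group names and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
groups = {
    "A": rng.normal(0, 2, size=300),  # synthetic group, not JPMaQS data
    "B": rng.normal(4, 3, size=200),
}
edges = np.arange(-10, 21, 1.0)  # common bins, like binrange=(-10, 20) with binwidth=1

bottom = np.zeros(len(edges) - 1)
layers = {}
for name, vals in groups.items():
    counts, _ = np.histogram(vals, bins=edges)
    layers[name] = (bottom.copy(), bottom + counts)  # (base, top) of this group's bars
    bottom = bottom + counts

# Histogram of all observations pooled together, for comparison with the stack top
pooled, _ = np.histogram(np.concatenate(list(groups.values())), bins=edges)
```

With multiple='layer', by contrast, every group's bars start at zero and overlap, which is easier for comparing shapes but does not show the combined distribution.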
cids_sel = ["TWD", "MXN", "TRY"] # select a group of cross-sections
filt1 = df["xcat"] == "FXCRR_NSA" # choose (filter out) category
filt2 = df["cid"].isin(cids_sel) # choose cross-sections
filt3 = df["real_date"] >= pd.to_datetime("2010-01-01") # set start date
dfx = df[filt1 & filt2 & filt3][["value", "cid"]].replace(
0, np.nan
) # dataframe in appropriate format
colors = "pastel" # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (8, 4)}) # choose appearance
ax = sns.histplot(
x="value",
data=dfx,
hue="cid",
element="poly",
multiple="layer", # use hue and polygons for overlapping cross-sections
binrange=(-10, 20),
binwidth=1,
stat="density",
palette=colors,
)
plt.title("Real FX forward carry distributions in comparison", fontsize=13) # set title
plt.xlabel("% annualized", fontsize=11) # set x-axis label
plt.ylabel("historic density", fontsize=11) # set y-axis label
leg = ax.axes.get_legend() # retrieve the legend created by hue to identify cross-sections
leg.set_title("Currencies") # give title to legend box
plt.show()
cids_sel = ["MXN", "TRY", "TWD"] # select a small group of cross-sections
filt1 = df["cid"].isin(cids_sel) # filter out cross-sections
filt2 = df["xcat"] == "FXCRR_NSA" # filter out category
filt3 = df["real_date"] >= pd.to_datetime("2010-01-01") # set start date
dfx = df[filt1 & filt2 & filt3][["value", "cid"]].sort_values(
"cid"
) # dataframe in appropriate format
colors = "bone" # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (8, 4)}) # choose appearance
ax = sns.histplot(
x="value",
data=dfx,
hue="cid",
element="bars",
multiple="stack", # use hue and bars/stack for overlapping visualization
binrange=(-10, 20),
binwidth=0.5,
stat="count",
palette=colors,
)
plt.title(
"Real FX forward carry distribution: contribution of currencies", fontsize=13
) # set title
plt.xlabel("% annualized", fontsize=11) # set x-axis label
plt.ylabel("days observed", fontsize=11) # set y-axis label
leg = ax.axes.get_legend() # retrieve the legend created by hue to identify cross-sections
leg.set_title("Currencies") # give title to legend box
plt.show()
When working with a larger number of cross-sections or when performing two-dimensional segmentation, a facet grid created by the sns.displot() function is often preferable. The facet grid creates multiple subplots, each representing a specific cross-section or segment, which makes it easier to visualize and compare distributions.
cids_sel = ["BRL", "MXN", "ILS", "ZAR", "KRW"] # select a small group of cross-sections
filt1 = df["cid"].isin(cids_sel) # filter out cross-sections
filt2 = df["xcat"] == "FXXR_NSA" # filter out category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # set start date
dfx = df[filt1 & filt2 & filt3][["real_date", "cid", "value"]].sort_values(
["cid", "real_date"]
) # dataframe in the appropriate format
dfm = (
dfx.groupby(["cid"])
.resample("M", on="real_date")
.sum(numeric_only="True")["value"]
.reset_index()
) # convert to monthly
dfm["period"] = "before GFC" # create custom categorical variable
dfm.loc[dfm["real_date"].dt.year > 2006, "period"] = "GFC"
dfm.loc[dfm["real_date"].dt.year > 2009, "period"] = "after GFC"
sns.set_theme(style="darkgrid") # choose appearance
fg = sns.displot(
dfm,
x="value",
col="cid",
row="period",
kind="hist",
stat="density",
binwidth=1,
color="darkred", # specify histplot as basis for distributions
common_norm=False, # passthrough for histplot() to secure independent normalization across subsets
height=2,
aspect=1.3, # control size and shape
facet_kws=dict(margin_titles=True),
)
fg.map(
plt.axvline, x=0, c=".5", lw=0.75
) # map vertical zero line to each chart in grid
fg.set_axis_labels("", "") # set axes labels of individual charts
fg.fig.suptitle(
"Monthly FX forward returns across periods since 2000", y=1.02
) # set facet grid title
for ax in fg.axes.flat: # modify top and right axes titles
    if ax.get_title(): # check for axes title text
        ax.set_title(ax.get_title().split("=")[1].strip()) # remove unwanted standard text
    if ax.texts: # check for right margin title text
        txt = ax.texts[0]
        ax.text(
            txt.get_unitless_position()[0],
            txt.get_unitless_position()[1],
            txt.get_text().split("=")[1].strip(),
            transform=ax.transAxes,
            va="center",
        ) # re-add margin title without the unwanted standard text
        ax.texts[0].remove() # remove original text
plt.show()
Multi-indicator distribution graphs #
The boxplot is a condensed categorical distribution plot, which means it is particularly suitable for visualizing a few key distribution features across categories. In seaborn, this type of plot is managed by the sns.boxplot() function. It can show the distributions of one or multiple categories across a full range of cross-sections.
Boxes: The boxes in a boxplot represent the interquartile range (IQR), which spans from the 25th percentile (Q1) to the 75th percentile (Q3) of the data distribution within each category. The line within the box marks the median (50th percentile), providing a measure of central tendency.
Whiskers: The whiskers extend from the box to the most extreme data points that lie within 1.5 times the IQR below Q1 or above Q3 (the default, controlled by the whis argument). Data points beyond this range are considered outliers and are plotted individually.
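The box and whisker positions can be computed directly. The numpy sketch below follows the 1.5 x IQR convention described above, which is matplotlib's default (whis=1.5) and also seaborn's boxplot default; the helper name box_summary and the toy data are hypothetical.

```python
import numpy as np


def box_summary(values, whis=1.5):
    """Hypothetical helper: box-and-whisker statistics with the whis x IQR rule."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]                              # drop missing values
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),                 # most extreme points still within the fences
        "whisker_high": inside.max(),
        "outliers": np.sort(x[(x < low_fence) | (x > high_fence)]),  # drawn as individual points
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Here the IQR is 4.5, so the upper fence is 7.75 + 6.75 = 14.5: the whisker stops at 9 and the value 100 is drawn as an outlier.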
cids_sel = [
"AUD",
"CAD",
"CHF",
"EUR",
"GBP",
"JPY",
"NOK",
"NZD",
"SEK",
"USD",
] # select cross-sections
filt1 = df["cid"].isin(cids_sel) # filter out cross-sections
filt2 = df["xcat"] == "CPIC_SJA_P3M3ML3AR" # filter out category
dfx = df[filt1 & filt2][["value", "cid", "xcat"]].sort_values(
"cid"
) # dataframe in the appropriate format
color = "orange"
sns.set_theme(style="dark", rc={"figure.figsize": (16, 4)}) # choose appearance
ax = sns.boxplot(
data=dfx, x="cid", y="value", color=color, width=0.5, fliersize=2
) # single category box-whiskers
plt.axhline(y=0, color="black", linestyle="--", lw=1) # horizontal line at zero
plt.title(
"Ranges of adjusted latest core consumer price trend since 2000", fontsize=13
) # set title
plt.xlabel("") # set x-axis label
plt.ylabel("% annualized, days observed", fontsize=11) # set y-axis label
plt.show()
Multiple categories for each cross-section can be plotted by setting the hue argument to the name of the column that identifies the categories.
cids_sel = [
"AUD",
"CAD",
"CHF",
"EUR",
"GBP",
"JPY",
"NOK",
"NZD",
"SEK",
] # select cross-sections
xcats_sel = ["RYLDIRS02Y_NSA", "FXCRR_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter out cross-sections
filt2 = df["xcat"].isin(xcats_sel) # select category
dfx = df[filt1 & filt2][["value", "cid", "xcat"]].sort_values(
"cid"
) # dataframe in appropriate format
colors = "hls" # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (8, 4)}) # choose appearance
ax = sns.boxplot(
data=dfx,
x="cid",
y="value",
hue="xcat", # hue allows subcategories
palette=colors,
width=0.6,
fliersize=2,
)
plt.title(
"Real IRS yield and real FX forward carry (vs dominant cross)", fontsize=13
) # set title
plt.axhline(y=0, color="black", linestyle="-", lw=1.5) # horizontal line at zero
plt.xlabel("") # set x-axis label
plt.ylabel("% annualized, days observed", fontsize=11) # set y-axis label
leg = ax.axes.get_legend() # retrieve legend box for further customization
leg.set_title("Categories") # set title of legend box
plt.show()
A violin plot is a categorical distribution plot that combines a boxplot with a (usually symmetric) KDE plot. It provides a visual representation of the probability density function and highlights the shape of the distribution, along with its central tendency and spread. Like the boxplot, it displays medians and interquartile ranges. However, unlike a boxplot, it does not focus on outliers but rather on the shape of the probability density function. The outer shape is the kernel density estimate of all observed values. By default, the inside of each violin shows a miniature boxplot: a thick bar for the interquartile range, a marker for the median and thin whiskers extending to the most extreme values within 1.5 times the IQR.
In seaborn, violin plots are managed through the sns.violinplot() function.
cids_sel = [
"AUD",
"CAD",
"CHF",
"EUR",
"GBP",
"JPY",
"NZD",
"SEK",
] # select cross-sections
xcats_sel = ["EQXR_NSA", "FXXR_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter out cross-sections
filt2 = df["xcat"].isin(xcats_sel) # select category
dfx = df[filt1 & filt2][["value", "cid", "xcat"]].sort_values(
"cid"
) # dataframe in the appropriate format
colors = ["red", "yellow"] # choose color palette
sns.set_theme(style="darkgrid", rc={"figure.figsize": (6, 6)}) # choose appearance
ax = sns.violinplot(
data=dfx,
y="cid",
x="value",
hue="xcat", # hue visualizes multiple categories
palette=colors,
linewidth=0.5,
) # appearance of the violins
plt.title(
"Distribution of daily equity and FX forward returns", fontsize=13
) # set title
plt.ylabel("") # set x-axis label
plt.xlabel("% annualized, days observed", fontsize=11) # set y-axis label
leg = ax.axes.get_legend() # retrieve legend box for further customization
leg.set_title("Categories") # set title of legend box
plt.show()
The boxenplot, also known as the letter value plot, is similar to the boxplot but provides more detailed information by plotting additional quantiles. It offers a more precise representation of the distribution, particularly in the tails, and can reveal fine-grained variations in the data.
In seaborn, this type of plot is governed by the sns.boxenplot() function.
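The letter values behind a boxen plot are quantile pairs in successively halved tails: the median, the fourths (quartiles), the eighths, the sixteenths, and so on. The sketch below computes them with numpy for a fixed number of levels; the helper letter_values is hypothetical, and how many levels seaborn actually draws depends on its k_depth setting, which this sketch does not replicate.

```python
import numpy as np


def letter_values(values, n_levels=4):
    """Hypothetical helper: (tail share, lower, upper) quantile pairs of a letter value plot."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    rows = []
    for k in range(1, n_levels + 1):
        tail = 0.5**k  # 0.5 = median, 0.25 = fourths, 0.125 = eighths, 0.0625 = sixteenths
        lower, upper = np.quantile(x, [tail, 1 - tail])
        rows.append((tail, lower, upper))
    return rows


for tail, lower, upper in letter_values(np.arange(1001) / 1000):
    print(f"tail {tail:.4f}: [{lower:.4f}, {upper:.4f}]")
```

Each extra level halves the tail share, so the nested boxes of a boxen plot describe ever more extreme parts of the distribution, which is why the plot is informative for fat-tailed return data.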
cids_sel = [
"AUD",
"CAD",
"CHF",
"EUR",
"GBP",
"JPY",
"NOK",
"NZD",
"SEK",
"USD",
] # select cross-sections
filt1 = df["cid"].isin(cids_sel) # filter out cross-sections
filt2 = df["xcat"] == "DU05YXR_NSA" # filter out category
dfx = df[filt1 & filt2][["value", "cid", "xcat"]].sort_values("cid")
color = "steelblue"
sns.set_theme(style="whitegrid", rc={"figure.figsize": (7, 4)}) # choose appearance
ax = sns.boxenplot(
data=dfx, x="cid", y="value", color=color, scale="linear"
) # single-category letter value plot
ax.set_ylim([-3, 3])
plt.axhline(y=0, color="black", linestyle="--", lw=1) # horizontal line at zero
plt.title(
"Stylized distributions of 5-year IRS receiver returns since 2000", fontsize=13
) # set title
plt.xlabel("") # set x-axis label
plt.ylabel("% annualized, days observed", fontsize=11) # set y-axis label
plt.show()
Timelines of indicators #
Lineplots #
The purpose of a lineplot is to illustrate a continuous relationship between two variables, where time is typically one of them. In seaborn, lineplots are managed by the sns.lineplot() function. Its simplest application is to pass it a wide dataframe with time as the row index and individual series as columns.
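JPMaQS data arrive in long format (one row per date, cross-section and category), so a wide frame for sns.lineplot() is obtained with pivot(). The sketch below uses a few made-up rows in the same column layout as df above.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "real_date": pd.to_datetime(["2023-01-02", "2023-01-02", "2023-01-03", "2023-01-03"]),
        "cid": ["GBP", "SEK", "GBP", "SEK"],
        "xcat": ["PCREDITGDP_SJA_D1M1ML12"] * 4,
        "value": [1.5, 2.0, 1.4, 2.1],  # made-up numbers, not JPMaQS data
    }
)

# One row per date, one column per cross-section: sns.lineplot(data=wide_df) draws one line per column
wide_df = long_df.pivot(index="real_date", columns="cid", values="value")
print(wide_df)
```

pivot() raises an error if a date and cross-section pair occurs twice, which is a useful check; pivot_table() would silently average duplicates instead.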
cids_sel = ["GBP", "SEK"] # select cross-sections
filt1 = df["cid"].isin(cids_sel) # filter out cross-sections
filt2 = df["xcat"] == "PCREDITGDP_SJA_D1M1ML12" # filter out category
filt3 = df["real_date"] >= pd.to_datetime("2010-01-01") # set start date
dfx = df[filt1 & filt2 & filt3]
dfw = dfx.pivot(
index=["real_date"], columns="cid", values="value"
) # pivot data frame to common time scale
colors = "tab10" # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)}) # choose appearance
ax = sns.lineplot(
data=dfw, estimator=None, palette=colors, linewidth=1
) # simply pass data frame with time scale to method
plt.axhline(y=0, color="black", linestyle="--", lw=1) # horizontal line at zero
plt.title(
"Banks private credit expansion, jump-adjusted, as % of GDP over 1 year",
fontsize=13,
) # set title
plt.xlabel("") # set x-axis label
plt.ylabel("% 6 months over 6 months, annualized", fontsize=11) # set y-axis label
leg = ax.axes.get_legend() # retrieve legend box for further customization
leg.set_title("Currency areas") # set title of legend box
plt.show()
In addition to plotting lines, the sns.lineplot() function in seaborn can estimate confidence intervals using bootstrapping. Bootstrapping is a resampling technique that creates many new samples by randomly drawing observations, with replacement, from the observed values. When there are several observations for each value of the x-variable, sns.lineplot() by default aggregates them with the mean and uses bootstrapping to construct a confidence interval that represents the uncertainty around this estimate. By default, it draws 1,000 bootstrap samples and computes the mean of each; the 95% confidence interval is then bounded by the 2.5th and 97.5th percentiles of these 1,000 bootstrapped means.
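The percentile bootstrap described above can be sketched in a few lines of numpy for a single x-value, for example one month of cross-sectional carry observations. The function bootstrap_ci and the carry numbers are hypothetical; seaborn's internal implementation differs in details such as random seeding and applies the procedure separately at every x-value.

```python
import numpy as np


def bootstrap_ci(values, n_boot=1000, ci=95, seed=None):
    """Percentile bootstrap confidence interval for the mean (illustrative sketch)."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    rng = np.random.default_rng(seed)
    # Draw n_boot resamples of the same size, with replacement, and average each one
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot_means = x[idx].mean(axis=1)
    tail = (100 - ci) / 2
    lower, upper = np.percentile(boot_means, [tail, 100 - tail])
    return x.mean(), lower, upper


# Hypothetical cross-section of real FX carry values observed in one month
carry = [2.1, 3.5, -0.4, 5.2, 1.8, 0.9, 4.4, 2.7, -1.1, 3.0]
mean, lower, upper = bootstrap_ci(carry, seed=42)
print(f"mean {mean:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Fixing the seed makes the interval reproducible; with few observations per x-value, the band widens accordingly.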
cids_sel = cids_em # select cross-sections
xcat_sel = "FXCRR_NSA" # select category
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"] == xcat_sel # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dfm = (
dfx.groupby(["cid", "xcat"])
.resample("M", on="real_date")
.mean()["value"]
.reset_index()
) # convert to monthly averages
dfw = dfm.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to appropriate index
colors = "Paired" # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)}) # choose appearance
sns.lineplot(
data=dfw, x="real_date", y=xcat_sel, estimator="mean", ci=95
) # plot mean and its 95% confidence interval
plt.axhline(y=0, color="black", linestyle="--", lw=1) # horizontal line at zero
plt.title(
"Real FX carry across EM: monthly mean and 95% confidence", fontsize=13
) # set title
plt.xlabel("") # set x-axis label
plt.ylabel("% annualized", fontsize=11) # set y-axis label
plt.show()
The seaborn lineplot can display not only values in chronological order but also aggregates across recurring time units, such as calendar months. This can be useful for identifying seasonal patterns in the data. The confidence interval can be set with the ci argument (deprecated in favor of errorbar=("ci", 95) in seaborn 0.12 and later). If the confidence intervals of different months, each based on many underlying observations, do not overlap and reveal a clear pattern, seasonality is likely.
With the hue argument, one can also compare confidence intervals across categories.
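The aggregation across calendar months that the lineplot performs can be sketched in pandas: tag each monthly observation with its calendar month and a period label, then average across years. The data below are synthetic with a built-in December effect, purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
dates = pd.date_range("2000-01-01", "2019-12-01", freq="MS")  # 240 month starts
returns = rng.normal(0.0, 1.0, size=len(dates))               # synthetic monthly returns
returns[dates.month == 12] += 3.0                              # artificial December effect

monthly = pd.DataFrame({"real_date": dates, "ret": returns})
monthly["month"] = monthly["real_date"].dt.month
monthly["period"] = np.where(monthly["real_date"].dt.year >= 2010, "from 2010", "before 2010")

# Mean return per calendar month, separately for each period (rows) and month (columns)
seasonal = monthly.groupby(["period", "month"])["ret"].mean().unstack("month")
print(seasonal.round(2))
```

Passing monthly to sns.lineplot(x="month", y="ret", hue="period") would draw these means together with bootstrapped confidence bands.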
cids_sel = cids_em # select cross-sections
xcat_sel = "FXXR_NSA" # select category
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"] == xcat_sel # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
dfm = (
dfx.groupby(["cid", "xcat"])
.resample("M", on="real_date")
.sum()["value"]
.reset_index()
) # monthly means
dfw = dfm.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()
dfw["month"] = dfw["real_date"].dt.month
dfw["period"] = "before 2010"
dfw.loc[dfw["real_date"].dt.year > 2010, "period"] = "from 2010"
colors = "Set2" # choose color palette
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)}) # choose appearance
ax = sns.lineplot(
data=dfw,
x="month",
y=xcat_sel,
hue="period", # draw different lines for classes of period category
estimator="mean",
ci=95,
palette=colors,
) # plot monthly means with 95% confidence intervals
plt.axhline(y=0, color="black", linestyle="--", lw=1) # horizontal line at zero
plt.title(
"EM FX returns across months: mean and 95% confidence", fontsize=13
) # set title
plt.xlabel("") # set x-axis label
plt.ylabel("%", fontsize=11) # set y-axis label
leg = ax.axes.get_legend() # retrieve legend box for further customization
leg.set_title("Periods") # set title of legend box
plt.show()
Line facets #
The purpose of facet grids in seaborn is to create small multiples, i.e., multiple plots arranged in a grid-like structure. The class sns.FacetGrid() maps a dataset onto multiple axes arrayed in a grid of rows and columns that correspond to levels of variables in the dataset.
The map() method of the FacetGrid object applies a plotting function to each facet's subset of the data, passing the selected columns as data vectors. The map_dataframe() method is similar to map() but passes the column names as arguments and the facet's subset of the data as the data keyword argument. This makes it the natural choice for functions that accept a long-form dataframe and refer to its columns by name, such as sns.lineplot(). With both methods, additional keyword arguments (kwargs) are passed through to the plotting function.
cids_sel = cids_dm # select cross-sections (all developed markets)
xcat_sel = "CPIC_SA_P1M1ML12" # select category
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"] == xcat_sel # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2005-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
color = "brown" # choose color palette
sns.set_theme(style="darkgrid") # choose appearance
fg = sns.FacetGrid(
dfx,
col="cid",
col_wrap=3, # set number of columns of the grid
height=3,
aspect=1.5, # set height and aspect ratio of each chart
sharey=True,
) # gives same y axis to all grid plots
fg.map_dataframe(
sns.lineplot, x="real_date", y="value", ci=None, lw=0.75, color=color
) # map `lineplot` to the grid
fg.map(
plt.axhline, y=0, c="0.5", lw=1.5, linestyle="--"
) # map horizontal zero line to each chart in grid
fg.set_axis_labels("", "% 6m/6m, ar") # set axes labels of individual charts
fg.set_titles(col_template="{col_name}") # set individual charts' title
fg.fig.suptitle("Annual core consumer price inflation", y=1.02) # set facet grid title
plt.show()
To display multiple categories in a facet grid of lineplots, one sets the hue argument to the column that contains the categories. With FacetGrid, hue is typically defined at the level of the grid, so that each category receives the same color in every facet, and sns.lineplot() is then mapped onto the grid.
cids_sel = ["AUD", "CAD", "CHF", "EUR", "GBP", "SEK"] # select cross-sections
xcats_sel = ["INTRGDP_NSA_P1M1ML12_3MMA", "FXCRR_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2005-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
colors = ["steelblue", "black"] # choose color palette
sns.set_theme(style="whitegrid", palette=colors) # choose appearance
fg = sns.FacetGrid(
dfx,
col="cid",
col_wrap=3, # set number of columns of the grid
palette=colors,
hue="xcat",
hue_order=xcats_sel, # hue is typically defined at the level of the facet grid
height=3,
aspect=1.5, # set height and aspect ratio of each chart
sharey=False,
) # gives individual y axes to grid plots
fg.map_dataframe(
sns.lineplot, x="real_date", y="value", ci=None, lw=1
) # map `lineplot` to the grid
fg.map(
plt.axhline, y=0, c=".5", lw=0.75
) # map horizontal zero line to each chart in grid
fg.set_axis_labels("", "% ar") # set axes labels of individual charts
fg.set_titles(col_template="{col_name}") # set individual charts' title
fg.fig.suptitle(
"Intuitive real GDP growth: % oya and FX carry", y=1.02
) # set facet grid title
name_to_color = {
"Intuitive real GDP growth: % oya": colors[0],
"FX carry": colors[1],
} # assignment dictionary for legend
patches = [
mpl.patches.Patch(color=v, label=k) for k, v in name_to_color.items()
] # build legend handles manually as colored patches
labels = list(name_to_color.keys()) # series labels for legend box
fg.fig.legend(
handles=patches, labels=labels, loc="lower center", ncol=3
) # add legend to bottom of figure
fg.fig.subplots_adjust(bottom=0.15) # lift bottom so it does not conflict with legend
plt.show()
Bivariate relations #
This section includes scatterplots, regression plots, linear model plots, joint plots, and pair plots. These plots help to understand the relationships between variables, in particular between independent and dependent variables, and support informed decisions about model selection.
Scatterplots #
cids_sel = ["AUD", "CAD", "CHF", "GBP", "NZD", "SEK"] # select cross-sections
xcats_sel = ["RGDP_SA_P1Q1QL4_20QMA", "FXCRR_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dfax = (
dfx.groupby(["cid", "xcat"])
.resample("A", on="real_date")
.mean()["value"]
.reset_index()
) # annual averages
dfaw = dfax.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
colors = "deep" # choose color palette
sns.set_theme(
style="darkgrid", palette=colors, rc={"figure.figsize": (6, 4)}
) # choose appearance
ax = sns.scatterplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfaw, # column names used for scatter
hue="cid",
style="cid", # distinguishes cids by color and marker
s=100,
) # controls size of dots
plt.axhline(y=0, color="black", linestyle="--", lw=1) # horizontal zero line
plt.axvline(x=0, color="black", linestyle="--", lw=1) # vertical zero line
plt.title(
"Long-term real GDP growth and real FX forward carry (annual averages)", fontsize=13
) # set title
plt.xlabel("Long-term real GDP growth, % ar", fontsize=11) # set x-axis label
plt.ylabel("Real forward carry, % ar", fontsize=11) # set y-axis label
plt.legend(
bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.0
) # place legend outside box
plt.show()
When a scatter plot has a large number of points, it can become difficult to distinguish individual points and perceive their density. In such cases, the alpha argument of seaborn's sns.scatterplot() can be used to make markers partly transparent, so that the density of observations shows up as color intensity where points overlap.
cids_sel = ["AUD", "CAD", "CHF", "GBP", "NZD", "SEK"] # select cross-sections
xcats_sel = ["RGDP_SA_P1Q1QL4_20QMA", "FXCRR_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
dfw = dfx.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index()
colors = "deep" # choose color palette
sns.set_theme(
style="darkgrid", palette=colors, rc={"figure.figsize": (6, 4)}
) # choose appearance
ax = sns.scatterplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw, # column names used for scatter
alpha=0.2, # control color intensity and transparency
s=10,
) # controls size of dots
plt.title(
"Long-term real GDP growth and real FX forward carry (daily data)", fontsize=13
) # set title
plt.xlabel("Long-term real GDP growth, % ar", fontsize=11) # set x-axis label
plt.ylabel("Real forward carry, % ar", fontsize=11) # set y-axis label
plt.show()
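Why does a low alpha reveal density? Under standard alpha compositing, each semi-transparent marker of the same color covers a fraction alpha of whatever remains visible beneath it. So n fully overlapping markers reach an opacity of 1 - (1 - alpha)^n. The small calculation below is illustrative and assumes identical markers stacked exactly on top of each other; the helper stacked_opacity is hypothetical.

```python
def stacked_opacity(alpha: float, n_points: int) -> float:
    """Opacity reached by n identical, fully overlapping markers with transparency alpha."""
    # Each marker covers a fraction alpha of whatever still shows through beneath it
    return 1.0 - (1.0 - alpha) ** n_points


for n in (1, 5, 20):
    print(f"{n:>2} overlapping points -> opacity {stacked_opacity(0.2, n):.3f}")
```

With alpha=0.2, a single point is drawn at 20% opacity, five overlapping points at roughly 67%, and twenty at about 99%. Crowded regions therefore appear much darker than isolated points.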
Regression plots #
The sns.regplot() function plots scatter points and a fitted regression line simultaneously. It supports several regression estimators for fitting the line to the data.
The shaded band around the regression line is a confidence interval estimated by bootstrapping. The option robust=True stipulates robust regression, which requires the statsmodels package. This de-weights outliers but takes significantly more computation time. When using it, it is advisable to reduce the number of bootstrap samples with n_boot or to set ci=None.
cids_sel = ["AUD", "CAD", "CHF", "GBP", "NZD", "SEK"] # select cross-sections
xcats_sel = ["CPIH_SJA_P3M3ML3AR", "EQXR_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("Q", on="real_date")
.mean()["value"]
.reset_index()
) # quarterly averages
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)}) # choose appearance
sns.regplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw,
ci=98,
order=1,
robust=False, # True uses statsmodels' robust regression, but takes more time
scatter_kws={
"s": 20,
"alpha": 0.3,
"color": "lightgray",
}, # customize appearance of scatter
line_kws={"lw": 2, "linestyle": "-.", "color": "salmon"},
) # customize appearance of line
plt.axhline(y=0, color="black", linestyle="--", lw=1) # horizontal zero line
plt.axvline(x=0, color="black", linestyle="--", lw=1) # vertical zero line
plt.title(
"Headline CPI inflation and equity index returns (quarterly averages)", fontsize=13
) # set title
plt.xlabel("Headline CPI inflation, %3m/3m, saar", fontsize=11) # set x-axis label
plt.ylabel("Average daily equity index return, %", fontsize=11) # set y-axis label
plt.show()
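To make the bootstrap band more concrete, the sketch below resamples observation pairs with replacement and refits an ordinary least-squares line each time. It then takes percentiles of the refitted lines on a grid of x values. This is a simplified illustration of the idea behind regplot's ci and n_boot arguments, not seaborn's internal code. The data and the helper bootstrap_line_band are hypothetical.

```python
import numpy as np


def bootstrap_line_band(x, y, grid, ci=95, n_boot=500, seed=0):
    """OLS line evaluated on `grid` plus a percentile-bootstrap confidence band."""
    rng = np.random.default_rng(seed)
    n = len(x)
    boots = np.empty((n_boot, len(grid)))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample observation pairs with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        boots[i] = intercept + slope * grid
    fit = np.polyval(np.polyfit(x, y, deg=1), grid)
    # Lower and upper percentiles of the bootstrapped lines at each grid point
    lower, upper = np.percentile(boots, [50 - ci / 2, 50 + ci / 2], axis=0)
    return fit, lower, upper


# Synthetic data (hypothetical): y = 0.5 * x + noise
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=1.0, size=200)
grid = np.linspace(x.min(), x.max(), 50)

fit, lo98, hi98 = bootstrap_line_band(x, y, grid, ci=98)
_, lo68, hi68 = bootstrap_line_band(x, y, grid, ci=68)
```

Because both calls share a seed, they reuse the same bootstrap draws, so the 98% band always contains the 68% band.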
The order argument of seaborn's sns.regplot() fits polynomial regression curves: it sets the degree of the polynomial fitted to the data.
cids_sel = ["AUD", "CAD", "CHF", "GBP", "NZD", "SEK"] # select cross-sections
xcats_sel = ["FXCRR_NSA", "FXXR_NSA"] # select explanatory/dependent categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("Q", on="real_date")
.mean()["value"]
.reset_index()
) # quarterly averages
filt4 = (
dff["xcat"] == xcats_sel[0]
) # filter for explanatory data in frequency-transformed dataframe
dff.loc[filt4, "value"] = (
dff[filt4].groupby(["cid", "xcat"])["value"].shift(1)
) # lag explanatory values by 1 time period
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
sns.set_theme(style="darkgrid", rc={"figure.figsize": (6, 4)}) # choose appearance
sns.regplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw,
ci=95,
order=2, # 2nd-order polynomial fit
scatter_kws={
"s": 20,
"alpha": 0.3,
"color": "goldenrod",
}, # customize appearance of scatter
line_kws={"lw": 1, "linestyle": "-", "color": "tab:blue"},
) # customize appearance of line
plt.axhline(y=0, color="tab:blue", linestyle="--", lw=1) # horizontal zero line
plt.axvline(x=0, color="tab:blue", linestyle="--", lw=1) # vertical zero line
plt.title(
"FX forward carry and subsequent returns (quarterly averages)", fontsize=13
) # set title
plt.xlabel("Real forward carry, % ar", fontsize=11) # set x-axis label
plt.ylabel("Average daily FX forward return in next quarter, %", fontsize=11) # set y-axis label
plt.show()
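According to the seaborn documentation, regplot uses numpy.polyfit when order is greater than 1. The minimal sketch below shows what order=2 estimates. It uses synthetic, noise-free data so that the recovered coefficients can be checked.

```python
import numpy as np

# Synthetic, noise-free quadratic relation (hypothetical): y = 1 + 0.5x - 0.3x^2
x = np.linspace(-3, 3, 61)
y = 1.0 + 0.5 * x - 0.3 * x**2

coefs = np.polyfit(x, y, deg=2)  # highest power first: [x^2, x^1, x^0]
fitted = np.polyval(coefs, x)  # values of the fitted curve that order=2 would draw
```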
Beyond linear and polynomial regression, sns.regplot() offers further estimators. Two such options are logistic regression (logistic=True) and locally weighted linear regression (LOWESS, lowess=True).
cids_sel = ["AUD", "CAD", "CHF", "GBP", "NZD", "SEK"] # select cross-sections
xcats_sel = ["FXCRR_NSA", "RYLDIRS02Y_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("M", on="real_date")
.mean()["value"]
.reset_index()
) # monthly averages
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)}) # choose appearance
sns.regplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw, # pass the data
lowess=True, # uses statsmodels to estimate a nonparametric locally weighted linear regression
marker="d", # choose diamond marker
scatter_kws={
"s": 50,
"alpha": 0.2,
"color": "gray",
}, # customize the appearance of scatter
line_kws={"lw": 1.5, "color": "black"},
) # customize the appearance of the line
plt.axhline(y=0, color="red", linestyle="--", lw=1) # horizontal zero line
plt.axvline(x=0, color="red", linestyle="--", lw=1) # vertical zero line
plt.title(
"Real FX forward carry and real 2-year IRS yield (monthly averages)", fontsize=13
) # set title
plt.xlabel("Real FX forward carry, % ar", fontsize=11) # set x-axis label
plt.ylabel("Real 2-year IRS yield, % ar", fontsize=11) # set y-axis label
plt.show()
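The LOWESS option delegates estimation to statsmodels. The idea can be sketched in plain NumPy. For every point, fit a weighted straight line to its nearest neighbours, using tricube weights that decline with distance, and keep the fitted value at that point. The function simple_lowess below is a hypothetical, simplified helper. Unlike statsmodels' lowess, it omits the robustifying iterations that down-weight outliers.

```python
import numpy as np


def simple_lowess(x, y, frac=2 / 3):
    """Locally weighted linear regression with tricube weights.

    Simplified sketch: no robustifying iterations (unlike statsmodels' lowess).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(int(np.ceil(frac * n)), 3))  # neighbours used in each local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = max(np.sort(dist)[k - 1], 1e-12)  # bandwidth: distance to the k-th nearest point
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube kernel weights
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x - x[i]])  # local intercept and slope
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]  # local intercept = smoothed value at x[i]
    return fitted


xs = np.linspace(0, 10, 41)
rng = np.random.default_rng(1)
noisy = np.sin(xs) + rng.normal(scale=0.3, size=xs.size)  # synthetic noisy signal
smooth = simple_lowess(xs, noisy, frac=0.25)
```

The frac argument sets the share of points used in each local fit: a larger fraction gives a smoother curve.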
Linear model plots #
The sns.lmplot() function combines sns.regplot() with a FacetGrid and offers convenient ways to distinguish relationships across categories. In particular, it can show one type of relation across categories through color codes (hue). It can also create small multiples of regression plots (col, row) in a single function call.
cids_sel = ["COP", "KRW"] # select cross-sections
xcats_sel = ["CPIH_SA_P1M1ML12", "FXCRR_NSA"] # select explanatory/dependent categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2005-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("M", on="real_date")
.mean()["value"]
.reset_index()
) # monthly averages
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
sns.set_theme(style="darkgrid", rc={"figure.figsize": (6, 4)}) # choose appearance
fg = sns.lmplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw, # pass data
hue="cid", # category that determines color partition
truncate=False,
scatter_kws={"s": 20, "alpha": 0.3},
) # modify appearance
plt.title(
"Headline CPI inflation and real FX carry (monthly averages)", fontsize=13
) # set title
plt.xlabel(
"Headline CPI inflation, %oya, sa", fontsize=11
) # set x-axis label
plt.ylabel("Real FX forward carry, % ar", fontsize=11) # set y-axis label
fg._legend.set_title("Currencies")
plt.show()
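With hue, lmplot estimates a separate regression for each class of the hue variable. The sketch below computes those per-group lines directly with pandas and NumPy. It uses a hypothetical wide dataframe whose two cross-sections have known slopes and a common intercept.

```python
import numpy as np
import pandas as pd

# Hypothetical wide dataframe: two cross-sections with different true slopes
rng = np.random.default_rng(7)
dfw = pd.DataFrame({"cid": np.repeat(["COP", "KRW"], 50), "x": rng.normal(size=100)})
true_slope = {"COP": 1.5, "KRW": -0.5}
dfw["y"] = dfw["cid"].map(true_slope) * dfw["x"] + 0.2  # noise-free for a clean check

# One OLS line per hue level, as lmplot(hue="cid") would draw
slopes = {cid: np.polyfit(g["x"], g["y"], deg=1)[0] for cid, g in dfw.groupby("cid")}
```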
cids_sel = ["COP", "HUF", "KRW", "MXN", "THB", "TWD"] # select cross-sections
xcats_sel = ["FXCRR_NSA", "FXXR_NSA"] # select explanatory/dependent categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
filt4 = dfx["xcat"] == xcats_sel[0] # filter for features
filt5 = dfx["xcat"] == xcats_sel[1] # filter for labels
dff1 = (
dfx[filt4]
.groupby(["cid", "xcat"])
.resample("Q", on="real_date")
.last()["value"]
.reset_index()
) # quarterly features
dff2 = (
dfx[filt5]
.groupby(["cid", "xcat"])
.resample("Q", on="real_date")
.sum()["value"]
.reset_index()
) # quarterly labels
dff = pd.concat([dff1, dff2]) # re-stack features and labels
filt6 = dff["xcat"] == xcats_sel[0] # filter for frequency-transformed features
dff.loc[filt6, "value"] = (
dff[filt6].groupby(["cid", "xcat"])["value"].shift(1)
) # lag explanatory values by 1 time period
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
sns.set_theme(style="whitegrid") # choose appearance
fg = sns.lmplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw, # pass data
hue="cid",
col="cid",
col_wrap=3, # category that determines partition
aspect=1.2,
height=4, # aspect and height jointly determine shape and size of plots in grid
truncate=True,
sharex=False,
sharey=False, # appearance of regression plots
line_kws={"color": "black", "lw": 0.75, "ls": "--"}, # set some aesthetics of line
scatter_kws={"s": 20, "alpha": 0.3, "color": "r"},
) # modify appearance
fg.set_titles(col_template="{col_name} versus dominant cross")
fg.set_axis_labels(
"Real FX forward carry (% ar, quarter end)", "FX forward return, %, next quarter"
)
fg.fig.suptitle(
"Real FX forward carry and subsequent quarterly returns", # set grid title
y=1.04,
fontsize=14,
) # position grid heading and set its font size
plt.show()
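The cell above treats features and labels differently before pairing them. Features keep the last value of each quarter and labels are summed over the quarter. Features are then lagged one period within each cross-section, so each row links a quarter-end value to the following quarter's return. The minimal pandas sketch below uses synthetic monthly data for one hypothetical cross-section.

```python
import pandas as pd

# Hypothetical monthly observations for one cross-section (month-start dates)
dates = pd.period_range("2020-01", periods=12, freq="M").to_timestamp()
feat = pd.DataFrame({"cid": "COP", "real_date": dates, "value": range(1, 13)})  # e.g. carry
lab = pd.DataFrame({"cid": "COP", "real_date": dates, "value": 1.0})  # e.g. monthly return

# Quarterly aggregation: last feature value vs. summed label values
feat["quarter"] = feat["real_date"].dt.to_period("Q")
lab["quarter"] = lab["real_date"].dt.to_period("Q")
fq = feat.groupby(["cid", "quarter"], as_index=False)["value"].last()
lq = lab.groupby(["cid", "quarter"], as_index=False)["value"].sum()

# Lag features by one quarter within each cross-section
fq["value"] = fq.groupby("cid")["value"].shift(1)

# Pair each lagged feature with the label of the quarter it precedes
dfw = fq.merge(lq, on=["cid", "quarter"], suffixes=("_feature", "_label"))
```

The first quarter has no prior feature value and ends up as NaN. Seaborn drops such rows when fitting.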
cids_sel = [
"AUD",
"CHF",
"CAD",
"EUR",
"GBP",
"JPY",
"SEK",
"USD",
] # select cross-sections
xcats_sel = ["RYLDIRS02Y_NSA", "DU02YXR_NSA"] # select explanatory/dependent categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("M", on="real_date")
.mean()["value"]
.reset_index()
) # monthly averages
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
sns.set_theme(style="white") # choose appearance
fg = sns.lmplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw, # pass data
hue="cid",
col="cid",
col_wrap=4, # category that determines partition
aspect=1,
height=3, # aspect and height jointly determine shape and size of plots in grid
x_bins=8,
truncate=True,
sharex=False,
sharey=False, # appearance of regression plots
line_kws={
"color": "darkolivegreen",
"lw": 0.75,
"ls": "--",
}, # set some aesthetics of line
scatter_kws={"color": "slategray"},
) # modify appearance
fg.set_titles(col_template="{col_name} market")
fg.set_axis_labels(
"Real 2-year IRS yield, % ar", "Average daily duration return, % of notional"
)
fg.fig.suptitle(
"Real 2-year IRS yields and duration returns (monthly averages)", # set grid title
y=1.04,
fontsize=14,
) # position grid heading and set its font size
plt.show()
Jointplots #
The sns.jointplot() function in seaborn visualizes the relationship between two variables together with their individual distributions. It combines a bivariate plot in the center with two marginal distribution plots along the axes. The kind argument ("scatter", "kde", "hist", "hex", "reg" or "resid") selects the type of bivariate plot that suits the data and the analysis goals.
cids_sel = [
"AUD",
"CAD",
"CHF",
"EUR",
"GBP",
"NZD",
"SEK",
"USD",
] # select cross-sections
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12", "RYLDIRS02Y_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("M", on="real_date")
.mean()["value"]
.reset_index()
) # monthly averages
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
sns.set_theme(style="white", rc={"figure.figsize": (6, 4)}) # choose appearance
fg = sns.jointplot(
x=xcats_sel[0], y=xcats_sel[1], data=dfw, color="steelblue", kind="hex", alpha=0.5
) # display density in hexagons
fg.fig.suptitle(
"Private credit expansion and real yield (monthly averages)", y=1.02, fontsize=13
) # set grid title
fg.set_axis_labels(
"Credit growth, oya, % of GDP", "Real 2-year IRS yield, % ar", fontsize=11
) # set x/y axis labels
plt.show()
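A joint plot's marginal panels show the one-dimensional distributions of the same data as the central panel. For histograms this can be checked directly: summing a bivariate histogram over the bins of one variable gives the univariate histogram of the other on the same bins. The sketch below uses synthetic data.

```python
import numpy as np

# Hypothetical correlated pair of indicators
rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = 0.6 * x + rng.normal(scale=0.8, size=1000)

# Bivariate histogram, as in the central panel of kind="hist"
joint, xedges, yedges = np.histogram2d(x, y, bins=20)

# Marginal histograms on the same bins
x_marg, _ = np.histogram(x, bins=xedges)
y_marg, _ = np.histogram(y, bins=yedges)
```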
A regression line can be added by calling the plot_joint() method of the JointGrid returned by sns.jointplot().
cids_sel = [
"AUD",
"CAD",
"CHF",
"EUR",
"GBP",
"NZD",
"SEK",
"USD",
] # select cross-sections
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12", "RYLDIRS02Y_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("M", on="real_date")
.mean()["value"]
.reset_index()
) # monthly averages
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
sns.set_theme(style="whitegrid", rc={"figure.figsize": (6, 4)}) # choose appearance
fg = sns.jointplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw,
kind="hist", # choose 2-dimension histogram
color="red",
)
fg.plot_joint(
sns.regplot, scatter=False, ci=None, color="black"
) # one can overlay regression line
fg.fig.suptitle(
"Private credit expansion and real interest rates (monthly averages)",
y=1.02,
fontsize=13,
) # set grid title
fg.set_axis_labels(
"Credit growth, oya, % of GDP", "Real 2-year IRS yield, % ar", fontsize=11
) # set x/y axis labels
plt.show()
The kernel density estimator (kind='kde') gives a smoothed, very stylized visualization of the relation.
cids_sel = [
"AUD",
"CAD",
"CHF",
"EUR",
"GBP",
"NZD",
"SEK",
"USD",
] # select cross-sections
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12", "RYLDIRS02Y_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("M", on="real_date")
.mean()["value"]
.reset_index()
) # monthly averages
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
sns.set_theme(style="white") # choose appearance
fg = sns.jointplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw,
kind="kde",
joint_kws={"fill": True}, # keyword dictionary specific to relational plot
color="steelblue",
height=6,
) # color and size parameters
fg.plot_joint(
sns.regplot, scatter=False, ci=None, color="red"
) # one can overlay regression line
fg.fig.suptitle(
"Private credit expansion and real interest rates (monthly averages)",
y=1.02,
fontsize=13,
) # set grid title
fg.set_axis_labels(
"Credit growth, oya, % of GDP", "Real 2-year IRS yield, % ar", fontsize=11
) # set x/y axis labels
plt.show()
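The kde kind smooths the data with kernels. The sketch below shows a one-dimensional Gaussian kernel density estimate, the kind of curve drawn in the marginal panels. The bandwidth follows Scott's rule (sample standard deviation times n^(-1/5)), which corresponds to seaborn's default bandwidth method. The helper gaussian_kde_1d and the data are hypothetical.

```python
import numpy as np


def gaussian_kde_1d(data, grid):
    """Gaussian kernel density estimate on `grid`, bandwidth by Scott's rule."""
    data = np.asarray(data, dtype=float)
    n = data.size
    bw = data.std(ddof=1) * n ** (-1 / 5)  # Scott's factor for one dimension
    z = (grid[:, None] - data[None, :]) / bw
    # Average of Gaussian kernels centred on each observation
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))


rng = np.random.default_rng(5)
sample = rng.normal(loc=1.0, scale=2.0, size=500)  # synthetic indicator values
grid = np.linspace(-12, 14, 2001)
density = gaussian_kde_1d(sample, grid)
area = density.sum() * (grid[1] - grid[0])  # Riemann sum: should be close to 1
```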
Information about categorical variables can be integrated through the hue argument.
Additional arguments can be passed to the central relational plot and the marginal distribution plots through the joint_kws and marginal_kws keyword dictionaries, respectively.
cids_sel = [
"AUD",
"CAD",
"CHF",
"EUR",
"GBP",
"NZD",
"SEK",
"USD",
] # select cross-sections
xcats_sel = ["PCREDITGDP_SJA_D1M1ML12", "RYLDIRS02Y_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("M", on="real_date")
.mean()["value"]
.reset_index()
) # monthly averages
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
dfw["Period"] = "before 2010" # create custom categorical variable
dfw.loc[dfw["real_date"].dt.year >= 2010, "Period"] = "from 2010"
colors = "Set1" # choose color palette
sns.set_theme(style="dark") # choose appearance
fg = sns.jointplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw, # pass appropriate data
kind="scatter",
palette=colors,
height=6, # parameters for appearance
hue="Period", # classes of the Period variable are distinguished by hue
joint_kws={"marker": "+"}, # keyword dictionary specific to relational plot
marginal_kws={"lw": 1},
) # keyword dictionary specific to distribution plot
fg.fig.suptitle(
"Private credit expansion and real interest rates (monthly averages)",
y=1.02,
fontsize=13,
) # set grid title
fg.set_axis_labels(
"Credit growth, oya, % of GDP", "Real 2-year IRS yield, % ar", fontsize=11
) # set x/y axis labels
plt.show()
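A custom categorical variable for hue can be built with a boolean mask. In the sketch below, which uses hypothetical monthly dates, the condition uses >= so that the year 2010 itself falls into the "from 2010" class. Converting the column to a pandas Categorical then fixes the order in which seaborn lists the classes.

```python
import pandas as pd

# Hypothetical month-start dates from 2008 to 2012
dfw = pd.DataFrame({"real_date": pd.period_range("2008-01", "2012-12", freq="M").to_timestamp()})

# Label sub-periods; ">= 2010" assigns the year 2010 itself to "from 2010"
dfw["Period"] = "before 2010"
dfw.loc[dfw["real_date"].dt.year >= 2010, "Period"] = "from 2010"

# A categorical dtype fixes the order of the hue classes
dfw["Period"] = pd.Categorical(dfw["Period"], categories=["before 2010", "from 2010"])
counts = dfw["Period"].value_counts()
```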
Combining the kernel density estimator (KDE) in seaborn with the hue argument shows how the joint and marginal distributions, and hence the correlation, differ across the classes of a categorical variable, here two cross-sections.
cids_sel = ["EUR", "GBP"] # select cross-sections
xcats_sel = ["CPIC_SA_P1M1ML12", "RYLDIRS02Y_NSA"] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("M", on="real_date")
.mean()["value"]
.reset_index()
) # monthly averages
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
colors = "Set1" # choose color palette
sns.set_theme(style="whitegrid") # choose appearance
fg = sns.jointplot(
x=xcats_sel[0],
y=xcats_sel[1],
data=dfw, # pass appropriate data
kind="kde",
palette=colors,
height=6, # parameters for appearance
hue="cid", # cross-sections are distinguished by hue
marginal_kws={"lw": 2},
) # keyword dictionary specific to distribution plot
fg.fig.suptitle(
"Core CPI inflation and real 2-year IRS yield (monthly averages)",
y=1.02,
fontsize=14,
) # set grid title
fg.set_axis_labels(
"Core CPI inflation, %oya, sa",
"Real 2-year IRS yield, % ar",
fontsize=11,
) # set x/y axis labels
plt.show()
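When hue splits the data, seaborn's distribution plots normalize the conditional densities jointly by default (common_norm=True). Each class is scaled by its share of observations, so the areas of all class densities sum to one and a smaller class looks flatter. For brevity, the sketch below shows this normalization with histogram densities rather than kernels. The two samples are synthetic.

```python
import numpy as np

rng = np.random.default_rng(11)
# Synthetic samples for two cross-sections with unequal sizes
groups = {"EUR": rng.normal(0.0, 1.0, 300), "GBP": rng.normal(2.0, 1.0, 100)}
n_total = sum(len(v) for v in groups.values())

# Common bin edges for all groups
edges = np.histogram_bin_edges(np.concatenate(list(groups.values())), bins=30)
width = np.diff(edges)

# Joint normalization: divide each group's counts by the total sample size
densities = {
    cid: np.histogram(v, bins=edges)[0] / (n_total * width) for cid, v in groups.items()
}
areas = {cid: float((d * width).sum()) for cid, d in densities.items()}
```

The EUR density covers three quarters of the total area and the GBP density one quarter, mirroring their sample shares.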
Pairplots #
The sns.pairplot() function displays multiple joint distributions at once. For example, it can visualize the joint densities of one category across pairs of countries. The pair plot is a grid with univariate distributions on the diagonal and bivariate distributions off the diagonal. It collects a lot of information in one place, which makes it a useful tool for comprehensive exploratory data analysis.
The sns.pairplot() output is a PairGrid instance, similar to a facet grid, rather than a single axes object.
Many arguments of sns.pairplot apply either to all diagonal or to all off-diagonal panels:
- kind governs the type of off-diagonal relational plot. It must be one of scatter, kde, hist, or reg.
- plot_kws takes a dictionary of further arguments that apply to the chosen off-diagonal (main) plots.
- diag_kind governs the type of diagonal plot and is usually hist or kde.
- diag_kws takes a dictionary of arguments that apply to the chosen diagonal plots.
cids_sel = ["EUR", "GBP", "SEK", "CHF"] # select cross-sections
xcat_sel = "RYLDIRS02Y_NSA" # select category
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"] == xcat_sel # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid"]).resample("M", on="real_date").mean()["value"].reset_index()
) # monthly averages
dfw = dff.pivot(
index="real_date", columns="cid", values="value"
).reset_index() # pivot to wide dataframe
dfw = dfw[(dfw[cids_sel] != 0).any(axis=1)] # drop rows whose values are all zero (date column excluded)
color = "teal" # choose color
sns.set_theme(style="darkgrid") # choose appearance
fg = sns.pairplot(
data=dfw,
vars=cids_sel,
height=2,
aspect=1.2, # height and aspect ratio of each facet in the plot
corner=True, # removes redundant bivariate plots in symmetric matrix
kind="scatter", # choose type of bivariate plot
plot_kws={
"s": 20,
"alpha": 0.3,
"color": color,
}, # set parameters for off-diagonal plots
diag_kind="hist", # choose type of univariate distribution plot
diag_kws={"bins": 20, "color": color},
) # set parameters for diagonal plots
fg.fig.suptitle(
"Distributions of real IRS yield in Europe (monthly averages)", y=1.02, fontsize=14
) # set grid title
plt.show()
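A row filter on a wide dataframe should inspect only the value columns. A date column is never equal to zero, so including it would keep every row. The minimal sketch below uses hypothetical values. Note that NaN also compares as unequal to zero, so a row with NaN and a non-zero value is kept.

```python
import numpy as np
import pandas as pd

cids_sel = ["EUR", "GBP"]
# Hypothetical wide dataframe: a date column plus one value column per cross-section
dfw = pd.DataFrame(
    {
        "real_date": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
        "EUR": [0.5, 0.0, np.nan],
        "GBP": [0.2, 0.0, 0.1],
    }
)

# Keep rows where at least one value column differs from zero
keep = (dfw[cids_sel] != 0).any(axis=1)
dfw_clean = dfw[keep]
```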
To apply the sns.pairplot() function to cross-sections, pivot the selected dataframe with cross-sections ('cid') as the basis for new columns. To apply it to categories, pivot the selected dataframe with categories ('xcat') as the basis for new columns.
cids_sel = [
"AUD",
"CAD",
"CHF",
"GBP",
"JPY",
"NOK",
"NZD",
"SEK",
] # select cross-sections
xcats_sel = [
"FXXR_NSA",
"RYLDIRS02Y_NSA",
"FXCRR_NSA",
"INTRGDP_NSA_P1M1ML12_3MMA",
] # select categories
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"].isin(xcats_sel) # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
scols = ["real_date", "cid", "xcat", "value"]
dfx = dfx[scols]
dff = (
dfx.groupby(["cid", "xcat"])
.resample("A", on="real_date")
.mean()["value"]
.reset_index()
) # annual averages
dfw = dff.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).reset_index() # pivot to wide dataframe
color = "red" # choose color
sns.set_theme(style="whitegrid") # choose appearance
fg = sns.pairplot(
data=dfw,
vars=xcats_sel,
height=2,
aspect=1.2, # height and aspect ratio of each facet in the plot
corner=True, # removes redundant bivariate plots in symmetric matrix
plot_kws={"color": color, "bins": 20}, # set parameters for off-diagonal plots
kind="hist", # choose type of bivariate plot
diag_kind="kde", # choose type of univariate distribution plot
diag_kws={"color": color},
) # set parameters for diagonal plots
fg.fig.suptitle(
"Individual and pairwise distribution of FX-related indicators (annual)",
y=1.02,
fontsize=14,
) # set grid title
plt.show()
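The two pivots described above differ only in which column becomes the new column axis. The sketch below applies both to a tiny hypothetical long-format frame.

```python
import pandas as pd

# Hypothetical long-format data: 2 cross-sections x 2 categories x 3 dates
df_long = pd.DataFrame(
    {
        "real_date": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"] * 4),
        "cid": ["AUD"] * 6 + ["CAD"] * 6,
        "xcat": (["FXXR_NSA"] * 3 + ["FXCRR_NSA"] * 3) * 2,
        "value": range(12),
    }
)

# Pairs of cross-sections for one category: dates as rows, cross-sections as columns
by_cid = df_long[df_long["xcat"] == "FXXR_NSA"].pivot(
    index="real_date", columns="cid", values="value"
)

# Pairs of categories: (cid, date) as rows, categories as columns
by_xcat = df_long.pivot(index=["cid", "real_date"], columns="xcat", values="value")
```

Passing the columns of by_cid or by_xcat as vars to sns.pairplot then yields a grid over cross-sections or over categories, respectively.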
Adding even more information, the pairplot can show distributions and relations separately for each value of a categorical variable, using the hue argument and a related palette choice.
cids_sel = ["AUD", "CAD", "MXN", "ZAR", "CHF"] # select cross-sections
xcat_sel = "FXXR_NSA" # select category
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"] == xcat_sel # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3] # filter out relevant data frame
dff = (
dfx.groupby(["cid"]).resample("M", on="real_date").sum()["value"].reset_index()
) # monthly sums
dfw = dff.pivot(
index="real_date", columns="cid", values="value"
).reset_index() # pivot to wide dataframe
dfw["Period"] = "before 2010" # create custom categorical variable
dfw.loc[dfw["real_date"].dt.year >= 2010, "Period"] = "from 2010"
colors = "bone" # choose palette
sns.set_theme(style="whitegrid") # choose appearance
fg = sns.pairplot(
data=dfw,
vars=cids_sel,
palette=colors,
hue="Period", # apply classification variable to hue
height=2,
aspect=1, # height and aspect ratio of each facet in the plot
corner=True, # removes redundant bivariate plots in symmetric matrix
kind="reg", # choose the type of bivariate plot
plot_kws={
"ci": None,
"scatter_kws": {"s": 10, "alpha": 0.5},
}, # set parameters for off-diagonal plots
diag_kind="hist", # choose type of univariate distribution plot
diag_kws={"bins": 20},
) # set parameters for diagonal plots
fg.fig.suptitle(
"Relations and distributions of monthly FX returns", fontsize=14
) # set grid title
plt.show()
Color maps #
Heatmaps #
Heatmaps visualize tabular data by mapping numeric values to colors. They are drawn with the sns.heatmap() function. This is a particularly powerful way to condense a lot of information into a single visualization.
cids_sel = [
"AUD",
"BRL",
"COP",
"CLP",
"HUF",
"MXN",
"PLN",
"TRY",
"ZAR",
"INR",
"MYR",
"PHP",
] # select cross-sections
xcat_sel = "FXXR_NSA" # select category
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"] == xcat_sel # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2006-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3].copy() # filter out relevant data frame (copy avoids SettingWithCopyWarning)
dfx["year"] = dfx["real_date"].dt.year # add year category to frame
dfw = (
dfx.groupby(["cid", "year"])
.sum(numeric_only=True)
.reset_index()
.pivot(index="year", columns="cid", values="value")
)
dfh = dfw.T # transpose to appropriate format for heatmap function
colors = "vlag_r" # choose appropriate diverging color palette
fg, ax = plt.subplots(figsize=(18, 8)) # prepare axis and grid
ax = sns.heatmap(
dfh,
cmap=colors,
center=0, # requires diverging color palette with white zero
square=True, # perfect squares
annot=True,
fmt=".1f",
annot_kws={"fontsize": 11}, # format annotation numbers inside color boxes
linewidth=1,
) # set width of lines between color boxes
plt.title(
"Annual FX forward returns in EM: A 15-year history", fontsize=16, y=1.05
) # set heatmap title
plt.xlabel("") # control x-axis label
plt.ylabel("") # control y-axis label
plt.yticks(rotation=0) # set direction of y-axis marks
plt.show()
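The heatmap input is a matrix with cross-sections as rows and years as columns. The sketch below builds such a matrix of annual sums from synthetic daily data for two hypothetical currencies. Creating the year column with assign returns a new frame, which avoids pandas' SettingWithCopyWarning. Unstacking yields the cross-section by year layout without a separate transpose.

```python
import pandas as pd

# Hypothetical daily returns: constant 0.01 per business day for two currencies
dates = pd.bdate_range("2019-01-01", "2020-12-31")
df_long = pd.concat(
    [pd.DataFrame({"real_date": dates, "cid": cid, "value": 0.01}) for cid in ["BRL", "ZAR"]],
    ignore_index=True,
)

# Annual sums per cross-section, unstacked to a cross-section x year matrix for sns.heatmap
dfh = (
    df_long.assign(year=df_long["real_date"].dt.year)
    .groupby(["cid", "year"])["value"]
    .sum()
    .unstack("year")
)
```

2019 has 261 business days and 2020 has 262, so the annual sums are 2.61 and 2.62.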
Cross correlations #
One of the easiest ways to display correlations across a range of cross-sections is to combine the sns.heatmap() function with a correlation matrix. Applying the .corr() method to a wide dataframe, with dates as rows and cross-sections as columns, yields the correlation coefficients between the columns (cross-sections). Color coding does all the work of visualization.
cids_sel = cids_dm + ["KRW", "TWD"] # select cross-sections
xcat_sel = "INTRGDP_NSA_P1M1ML12_3MMA" # select category
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"] == xcat_sel # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df.loc[
filt1 & filt2 & filt3, ["cid", "real_date", "value"]
] # filter out relevant data frame
dfw = dfx.pivot(
index=["real_date"], columns="cid", values="value"
) # pivot out to date index and cross-section columns
csquare = dfw.corr() # cross-correlation matrix
mask = np.triu(
np.ones_like(csquare, dtype=bool), k=0
) # mask for redundant upper triangle (including the diagonal)
colors = sns.diverging_palette(
20, 190, as_cmap=True
) # customize diverging color palette
fg, ax = plt.subplots(figsize=(10, 7)) # prepare axis and grid
ax = sns.heatmap(
csquare,
cmap=colors,
center=0, # requires diverging color palette with white zero
annot=True,
fmt=".1f",
annot_kws={"fontsize": 11}, # format annotation numbers inside color boxes
mask=mask, # remove redundant upper triangle
linewidth=3,
) # set width of lines between color boxes
plt.title(
"Intuitive real GDP growth correlation across developed countries",
fontsize=14,
y=1.03,
) # set heatmap title
plt.xlabel("") # control x-axis label
plt.ylabel("") # control y-axis label
plt.yticks(rotation=0) # set direction of y-axis marks
plt.show()
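The correlation heatmap relies on two steps. DataFrame.corr() computes pairwise Pearson coefficients between columns, and a boolean mask hides every cell where the mask is True. With np.triu(..., k=0), the diagonal, which always equals one, is hidden together with the upper triangle. The sketch below uses synthetic series in which the hypothetical AUD and CAD columns share a common factor.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
common = rng.normal(size=250)
# Synthetic wide dataframe: dates as rows (implicit), cross-sections as columns
dfw = pd.DataFrame(
    {
        "AUD": common + 0.5 * rng.normal(size=250),
        "CAD": common + 0.5 * rng.normal(size=250),
        "JPY": rng.normal(size=250),
    }
)
csquare = dfw.corr()  # pairwise Pearson correlations between columns
mask = np.triu(np.ones_like(csquare, dtype=bool), k=0)  # True = hidden cell
visible = csquare.where(~mask)  # what the heatmap shows: the strictly lower triangle
```

Using k=1 instead would keep the diagonal visible.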
A less common means of visualizing cross-correlations is the sns.relplot() function. It is an interface for drawing relational plots onto a FacetGrid. It is mostly used for displaying bivariate relations, but it can also produce a color- and size-coded display of correlation coefficients across multiple cross-sections.
The graph below puts greater focus on larger correlation coefficients, whether positive or negative. It can be a better visualization for a large number of cross-sections with very diverse relations.
cids_sel = cids # select cross-sections
xcat_sel = "FXXR_NSA" # select category
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"] == xcat_sel # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2000-01-01") # filter for start date
dfx = df.loc[
filt1 & filt2 & filt3, ["cid", "real_date", "value"]
] # filter out relevant data frame
dfw = dfx.pivot(
index=["real_date"], columns="cid", values="value"
) # pivot out to date index and cross-section columns
csquare = dfw.corr() # square correlation coefficient dataframe across all columns
csquare.index.rename("cid0", inplace=True) # rename index name to allow unstacking
dfc = csquare.unstack().reset_index() # unstack to long-dataframe
dfc.rename(
mapper={0: "pearson"}, axis=1, inplace=True
) # give intuitive name to correlation value column
dfc["abs_coef"] = np.abs(
dfc["pearson"]
) # add column of absolute coefficient values (for size)
sns.set_theme(style="whitegrid")
fg = sns.relplot(
data=dfc,
x="cid",
y="cid0", # define axes
hue="pearson",
hue_norm=(-1, 1),
palette="vlag_r", # color code correlation coefficients
size="abs_coef",
size_norm=(0, 1),
sizes=(50, 250), # express absolute coefficients by size
height=7,
aspect=1.2,
) # control facet grid shape
fg.fig.suptitle(
"Correlation of FX forward return across markets (Pearson coefficients)",
y=1.02,
fontsize=14,
) # set grid title
fg.set(xlabel="", ylabel="") # remove axes labels
fg.despine(left=True, bottom=True) # remove axes lines
fg.ax.margins(0.02) # control proximity of labels to axes
for label in fg.ax.get_xticklabels():
    label.set_rotation(90) # rotate x-tick labels for readability
plt.show()
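relplot needs the correlation matrix in long format, with one row per pair of cross-sections. The sketch below converts a small hypothetical correlation matrix, naming the column axis cid and the index cid0 as in the cell above.

```python
import pandas as pd

# Hypothetical correlation matrix
labels = ["AUD", "CAD", "JPY"]
csquare = pd.DataFrame(
    [[1.0, 0.8, -0.3], [0.8, 1.0, 0.1], [-0.3, 0.1, 1.0]], index=labels, columns=labels
)
csquare.index.name = "cid0"
csquare.columns.name = "cid"

# One row per (cid, cid0) pair, plus absolute values to drive marker size
dfc = csquare.unstack().rename("pearson").reset_index()
dfc["abs_coef"] = dfc["pearson"].abs()
```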
Clustermaps #
The sns.clustermap() function creates a heatmap of a matrix together with hierarchical clustering information.
The dendrograms, i.e. the tree-shaped clustering lines, display the statistical similarity between rows and between columns based on multi-dimensional distance: the shorter the distance, the greater the similarity. The default distance metric is Euclidean. The dendrograms result from hierarchical agglomerative clustering, i.e. the sequential merging of the nearest points or clusters in multi-dimensional space. Note that the sns.clustermap function returns a ClusterGrid object.
cids_sel = [
"EUR",
"USD",
"GBP",
"CHF",
"JPY",
"SEK",
"CAD",
"ZAR",
"INR",
"MYR",
] # select cross-sections
xcat_sel = "DU05YXR_NSA" # select category
filt1 = df["cid"].isin(cids_sel) # filter for cross-sections
filt2 = df["xcat"] == xcat_sel # filter for category
filt3 = df["real_date"] >= pd.to_datetime("2005-01-01") # filter for start date
dfx = df[filt1 & filt2 & filt3].copy() # filter out relevant data frame (copy avoids SettingWithCopyWarning)
dfx["year"] = dfx["real_date"].dt.year # add year category to frame
dfw = (
dfx.groupby(["cid", "year"])
.sum(numeric_only=True)
.reset_index()
.pivot(index="year", columns="cid", values="value")
) # annual sums
dfh = dfw.dropna().T # transpose to appropriate format for heatmap function
colors = sns.diverging_palette(
20, 220, as_cmap=True
) # choose appropriate diverging color palette
fg = sns.clustermap(
dfh,
cmap=colors,
center=0, # requires diverging color palette with white zero
figsize=(12, 7), # set appropriate size
annot=True,
fmt=".1f",
annot_kws={"fontsize": 11}, # format annotation numbers inside color boxes
linewidth=1,
) # set width of lines between color boxes
fg.fig.suptitle(
"Similarities of countries and trading years, based on duration returns", y=1.02
) # setting title
fg.ax_heatmap.set_xlabel("") # special way of controlling x-axis label
fg.ax_heatmap.set_ylabel("") # special way of controlling y-axis label
plt.show()
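sns.clustermap delegates the clustering to SciPy (or fastcluster, if installed), using average linkage and Euclidean distance by default. The agglomerative procedure can be sketched naively in NumPy. Start with every row as its own cluster and repeatedly merge the two clusters with the smallest average pairwise distance. The helper average_linkage_merges is hypothetical and deliberately simple, meant only for illustration on small inputs.

```python
import numpy as np


def average_linkage_merges(points):
    """Naive agglomerative clustering with average linkage and Euclidean distance.

    Returns merges as (cluster_a, cluster_b, distance) tuples; clusters are
    frozensets of row indices.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all rows
    dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs of rows
                d = np.mean([dist[i, j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        merged = clusters[a] | clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges


# Hypothetical rows, e.g. four countries described by returns in two years
rows = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]])
merges = average_linkage_merges(rows)
for left, right, d in merges:
    print(sorted(left), sorted(right), round(d, 3))
```

The merge heights recorded here correspond to the heights at which branches join in a dendrogram.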