Panel regression with JPMaQS #
In this notebook, we show how to apply panel regression models to macro-quantamental datasets. We leverage the `statsmodels` and `linearmodels` packages in Python while cross-referencing the widely used R package ‘plm’, commonly employed in academic research. In particular, we show the application of pooled regression, fixed-effects regression, random-effects regression, linear mixed-effects models, and seemingly unrelated regressions.
Largely, only the standard packages in the Python data science stack are required to run this notebook. The specialized `macrosynergy` package is also needed to download JPMaQS data and for quick analysis of quantamental data and value propositions.
The notebook is structured into the following key sections:

Getting Packages and JPMaQS Data: In this section, we guide you through the process of installing and importing the necessary Python packages, as well as formatting the dataset for further analysis.

The importance of panel analysis: This section discusses why specific models are needed for macroeconomic panel analysis.

Pooled regression: the simplest form of regression, where no group-specific effects are considered.

Fixed-effects regression: the model accounts for individual/group-specific effects.

Random-effects regression: the model considers group-specific effects, assuming that they are random variables.

Linear mixed-effects model: linear mixed-effects models handle both fixed and random effects.

Seemingly unrelated regressions: SUR models are used when there are multiple equations with correlated errors.
Get packages and JPMaQS data #
# Uncomment below if running on Kaggle
"""
%%capture
! pip install macrosynergy --upgrade"""
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
from matplotlib import pyplot as plt
import os
from linearmodels import PooledOLS, PanelOLS
from linearmodels.panel import RandomEffects, compare
from linearmodels.system import SUR
import statsmodels.api as sm
import statsmodels.graphics.tsaplots as tsap
import statsmodels.formula.api as smf
from macrosynergy.download import JPMaQSDownload
%matplotlib inline
import macrosynergy.panel as msp
import warnings
warnings.simplefilter("ignore")
The JPMaQS indicators we consider are downloaded using the J.P. Morgan Dataquery API interface within the `macrosynergy` package. This is done by specifying ticker strings, formed by appending an indicator category code to a currency area code, within the full quantamental indicator ticker `DB(JPMAQS,<cross_section>_<category>,<info>)`, where `<info>` denotes the type of information requested:

`value` giving the latest available values for the indicator

`eop_lag` referring to days elapsed since the end of the observation period

`mop_lag` referring to the number of days elapsed since the mean observation period

`grade` denoting a grade of the observation, giving a metric of real-time information quality.

After instantiating the `JPMaQSDownload` class within the `macrosynergy.download` module, one can use the `download(tickers, start_date, metrics)` method to easily download the necessary data, where `tickers` is an array of ticker strings, `start_date` is the first collection date to be considered, and `metrics` is an array comprising the time series information to be downloaded. For more information see here or use the free dataset on Kaggle.
To ensure reproducibility, only samples between January 2000 (inclusive) and May 2023 (exclusive) are considered.
# Lists of cross-sections (countries with IRS markets and appropriate data)
cids_dm = ["AUD", "CAD", "CHF", "EUR", "GBP", "JPY", "NOK", "NZD", "SEK", "USD"]
cids_em = [
"CLP",
"COP",
"CZK",
"HUF",
"IDR",
"ILS",
"INR",
"KRW",
"MXN",
"PLN",
"THB",
"TRY",
"TWD",
"ZAR",
]
cids = cids_dm + cids_em
cids_du = cids_dm + cids_em
cids_dux = list(set(cids_du) - set(["IDR", "NZD"]))
cids_xg2 = list(set(cids_dux) - set(["EUR", "USD"]))
# Lists of main quantamental and return categories
main = [
"CPIC_SA_P1M1ML12",
"CPIC_SJA_P3M3ML3AR",
"CPIC_SJA_P6M6ML6AR",
"CPIH_SA_P1M1ML12",
"CPIH_SJA_P3M3ML3AR",
"CPIH_SJA_P6M6ML6AR",
"INFTEFF_NSA",
"INTRGDP_NSA_P1M1ML12_3MMA",
"INTRGDPv5Y_NSA_P1M1ML12_3MMA",
"PCREDITGDP_SJA_D1M1ML12",
"RGDP_SA_P1Q1QL4_20QMA",
"RYLDIRS02Y_NSA",
"RYLDIRS05Y_NSA",
"PCREDITBN_SJA_P1M1ML12",
]
rets = [
"DU02YXR_NSA",
"DU05YXR_NSA",
"DU02YXR_VT10",
"DU05YXR_VT10",
"EQXR_NSA",
"EQXR_VT10",
"FXXR_NSA",
"FXXR_VT10",
]
xcats = main + rets
The description of each JPMaQS category is available either under Macro Quantamental Academy, JPMorgan Markets (password protected), or on Kaggle (just for the tickers used in this notebook). In particular, the set used for this notebook draws on Consumer price inflation trends, Inflation targets, Intuitive growth estimates, Domestic credit ratios, Long-term GDP growth, Real interest rates, Private credit expansion, Duration returns, Equity index future returns, and FX forward returns.
# Download series from J.P. Morgan DataQuery by tickers
start_date = "2000-01-01"
end_date = "2023-05-01"
tickers = [cid + "_" + xcat for cid in cids for xcat in xcats]
print(f"Maximum number of tickers is {len(tickers)}")
# Retrieve credentials
client_id: str = os.getenv("DQ_CLIENT_ID")
client_secret: str = os.getenv("DQ_CLIENT_SECRET")
with JPMaQSDownload(client_id=client_id, client_secret=client_secret) as dq:
df = dq.download(
tickers=tickers,
start_date="2000-01-01",
end_date="2023-05-01",
suppress_warning=True,
metrics=["all"],
report_time_taken=True,
show_progress=True,
)
Maximum number of tickers is 528
Downloading data from JPMaQS.
Timestamp UTC: 2023-09-19 15:37:31
Connection successful!
Number of expressions requested: 2112
Requesting data: 100%██████████ 106/106 [00:36<00:00, 2.94it/s]
Downloading data: 100%██████████ 106/106 [00:14<00:00, 7.20it/s]
Time taken to download data: 51.96 seconds.
Time taken to convert to dataframe: 43.27 seconds.
Average upload size: 0.20 KB
Average download size: 325288.51 KB
Average time taken: 9.17 seconds
Longest time taken: 12.20 seconds
Average transfer rate : 283836.53 Kbps
# uncomment if running on Kaggle
"""for dirname, _, filenames in os.walk('/kaggle/input'):
for filename in filenames:
print(os.path.join(dirname, filename))
df = pd.read_csv('../input/fixed-income-returns-and-macro-trends/JPMaQS_Quantamental_Indicators.csv', index_col=0, parse_dates=['real_date'])"""
display(df["xcat"].unique())
display(df["cid"].unique())
df["ticker"] = df["cid"] + "_" + df["xcat"]
# df.head(3)
array(['CPIC_SA_P1M1ML12', 'CPIC_SJA_P3M3ML3AR', 'CPIC_SJA_P6M6ML6AR',
'CPIH_SA_P1M1ML12', 'CPIH_SJA_P3M3ML3AR', 'CPIH_SJA_P6M6ML6AR',
'FXXR_NSA', 'FXXR_VT10', 'INFTEFF_NSA',
'INTRGDP_NSA_P1M1ML12_3MMA', 'INTRGDPv5Y_NSA_P1M1ML12_3MMA',
'PCREDITBN_SJA_P1M1ML12', 'PCREDITGDP_SJA_D1M1ML12',
'RGDP_SA_P1Q1QL4_20QMA', 'RYLDIRS02Y_NSA', 'RYLDIRS05Y_NSA',
'DU02YXR_NSA', 'DU02YXR_VT10', 'DU05YXR_NSA', 'DU05YXR_VT10',
'EQXR_NSA', 'EQXR_VT10'], dtype=object)
array(['AUD', 'CAD', 'CHF', 'CLP', 'COP', 'CZK', 'EUR', 'GBP', 'HUF',
'IDR', 'ILS', 'INR', 'JPY', 'KRW', 'MXN', 'NOK', 'NZD', 'PLN',
'SEK', 'THB', 'TRY', 'TWD', 'USD', 'ZAR'], dtype=object)
Reformatting the dataset #
We extract a sub-dataframe used for most of the examples below. The key categories used for most models in this notebook are the following:

`CPIC_SJA_P6M6ML6AR` - Adjusted latest core consumer price trend: % 6m/6m ar

`DU05YXR_VT10` - Return on fixed receiver position, % of risk capital on position scaled to 10% (annualized) volatility target, assuming monthly roll: 5-year maturity
The choice of variables reflects the simple idea that inflation should have a negative impact on next period’s duration returns. The dataset is the standard dataset used for tutorials on JPMaQS-related methods and is available on Kaggle.
# Prepare for filtering
cids = ["AUD", "CAD", "EUR", "GBP", "JPY", "USD"] # only selected cross-sections
for_feats = [
"CPIC_SJA_P6M6ML6AR"
] # categories (building blocks) for feature calculation
targs = ["DU05YXR_VT10"] # target returns
start_date = "2000-01-01" # earliest sample date
# Create filters
filt1 = df["xcat"].isin(for_feats) # filter for feature variables
filt2 = df["real_date"] >= pd.to_datetime(start_date) # filter for start date
filt3 = df["cid"].isin(cids) # filter for market
filt4 = df["xcat"].isin(targs)
# Apply filters
dfx_feats = df[filt1 & filt2 & filt3] # data frame for feature variables
dfx_target = df[filt4 & filt2 & filt3] # data frame for target variables
Then we downsample the frequency of the time series and lag the features. The latter aligns past features with present targets, i.e., it allows assessing the predictive power of feature panels.
# Frequency conversion of features and targets to monthly means and sums respectively
scols = ["real_date", "cid", "xcat", "value"]
dfx_feats = dfx_feats[scols]
dfqx_feats = (
dfx_feats.groupby(["cid", "xcat"]).resample("M", on="real_date").mean()["value"]
)
dfqx_feats = dfqx_feats.reset_index()
dfqx_feats = dfqx_feats.dropna()
scols = ["real_date", "cid", "xcat", "value"]
dfx_target = dfx_target[scols]
dfqx_target = (
dfx_target.groupby(["cid", "xcat"]).resample("M", on="real_date").sum()["value"]
)
dfqx_target = dfqx_target.reset_index().dropna()
# Lag features (past values assigned to today's date)
dfqx_feats["value_lag"] = dfqx_feats.groupby(["cid", "xcat"])["value"].shift(1)
dfqx_feats = dfqx_feats.dropna()
# Pivot features and target dataframes to wide time series panel formats
dfqx_feats = dfqx_feats.pivot(
index=["cid", "real_date"], columns="xcat", values="value_lag"
)
dfqx_target = dfqx_target.pivot(
index=["cid", "real_date"], columns="xcat", values="value"
).dropna()
# Merge to joint features/targets dataframe
dfx_pan = pd.merge(
dfqx_target.reset_index(),
dfqx_feats.reset_index(),
on=["cid", "real_date"],
how="outer",
).set_index(["cid", "real_date"])
dfx_pan = dfx_pan[(dfx_pan != 0).all(1)]
dfx_pan = dfx_pan.dropna()
display(dfx_pan.head(5))
xcat             DU05YXR_VT10  CPIC_SJA_P6M6ML6AR
cid  real_date
AUD  2001-07-31      1.699832            2.203520
     2001-08-31      6.972805            2.409941
     2001-09-30      2.483685            3.338835
     2001-10-31      3.285013            3.338835
     2001-11-30      4.312912            3.416523
The importance of panel analysis #
Panels of time series are datasets organized in (i) rows related to points in time or periods, and (ii) columns of cross-sections. Panel data contain information on individual cross-sections and on the development of groups over time. They are a two-dimensional hybrid of time-series and cross-sectional information. For panel regression analysis we typically consider multiple panels of the same structure, just as we consider multiple time series for simple time-series regression.
JPMaQS quantamental indicators are principally organized in panels: one type of indicator (category) is tracked over time across a range of countries, currencies or markets. Since JPMaQS data are all daily information states, its panels or combinations thereof can easily be investigated with respect to their predictive power.
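The two-dimensional panel structure described above can be sketched with a small toy pandas example; the cross-sections, dates and values below are hypothetical, not JPMaQS data:

```python
import numpy as np
import pandas as pd

# Toy illustration (not JPMaQS data): a balanced panel of one indicator
# for three hypothetical cross-sections over monthly periods.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-31", periods=12, freq="M")
cids_toy = ["AUD", "CAD", "USD"]

df_long = pd.DataFrame(
    [(cid, d, rng.normal()) for cid in cids_toy for d in dates],
    columns=["cid", "real_date", "value"],
)

# The two-dimensional structure used throughout this notebook:
# rows indexed by (cross-section, date), one column per category.
panel = df_long.set_index(["cid", "real_date"])
print(panel.loc["AUD"].head())                 # time series of one cross-section
print(panel.xs(dates[0], level="real_date"))   # cross-section at one date
```

Slicing along either index level recovers either a time series for one market or a cross-section for one date, which is exactly the duality that panel regressions exploit.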
Research on panels is critically important for macro-quantamental financial market research. Since economic cycles take years to unfold, the quantity and diversity of data are limited, even if one looks back several decades. By contrast, by combining the experiences of many countries or markets, one has a much greater supply of historical information and faster growth in relevant datasets. It is often useful to test a trading strategy on as many markets as possible, even if one can only trade a subset, since the broader set gives statistical analyses greater power.
Moreover, the two-dimensional structure of panel data offers several specific advantages for empirical macro strategy research:

Panels allow assessing influences that predominantly vary across sections, not time, such as structural features of countries or government balance sheets.

Panels allow using a diversity of experiences across countries or markets. While there are common global factors, there are also many idiosyncratic developments, such as policy mistakes, crises, terms-of-trade shocks and so forth. A diverse set of time series makes empirical findings more robust. For example, Japan’s early deflation experience made it a valuable cross-sectional experience for all types of proposed macro trading strategies.

Panels are necessary to investigate the influence of relative macro factors across countries or markets. Relative macro factors are often critical for FX and cross-market strategies. Even if relative factors are not being used directly, maybe because of the enhanced leverage and cost of relative positions, their empirical relevance often backs up the credibility of trading strategies based on outright factors. This is because relative signals often strip out communal global factors and, thereby, increase the statistical power of the dataset. For example, if higher relative inflation predicts lower relative fixed-income returns across currency areas, this finding adds credibility to the proposition that higher inflation heralds weaker returns in general and has not just been the artefact of a few global events.
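The construction of a relative factor mentioned above can be sketched in one line of pandas: subtract the cross-sectional mean on each date, so the communal global component drops out. All numbers below are hypothetical:

```python
import pandas as pd

# Hypothetical outright inflation readings for three markets on two dates.
df = pd.DataFrame(
    {
        "cid": ["AUD", "CAD", "USD"] * 2,
        "real_date": pd.to_datetime(["2020-01-31"] * 3 + ["2020-02-29"] * 3),
        "infl": [2.0, 1.0, 3.0, 2.5, 1.5, 3.5],
    }
)

# Relative inflation: deviation from the average across markets on each date.
df["infl_rel"] = df["infl"] - df.groupby("real_date")["infl"].transform("mean")
print(df)
```

By construction the relative factor sums to zero across markets on each date, which is what strips out the common global movement.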
Panels support a special set of regression methods that respect, to varying degrees, their two-dimensional structure.

Pooled OLS is simply OLS regression performed on panel data. It ignores time and cross-sectional characteristics and focuses only on dependencies between variables. The implicit assumption is that relations depend neither on time nor on cross-sections (or on unobserved variables associated with either of them).

The fixed-effects model allows a separate constant for each cross-section, each time period, or a combination of both. The implicit assumption is that there has been a structural difference in the analyzed relation associated with the cross-section.

The random-effects model also allows differences in the relation across sections. However, model parameters, such as intercepts, are assumed to be random variables. Unlike fixed-effects models, random-effects models assume that cross-sectional effects are not structurally related to the explanatory variables and, hence, are not of interest in themselves, but rather through their variance.

Mixed-effects models combine the latter two. Depending on how we specify the model, we can replicate random effects using mixed effects (this is done later in the notebook).

Seemingly Unrelated Regressions (SUR) is a generalization of the linear regression model that consists of multiple cross-sectional equations, each having its own set of features. SUR also allows using a different set of regressors for each cross-section, while using the information from other cross-sections.
In this notebook we mainly use the `linearmodels` module, which complements the standard `statsmodels` package with panel regression estimators.
Pooled regression #
The easiest way of analysing panel data is to simply pool the observations of all cross-sections and time periods into one-dimensional datasets and then run a linear OLS regression, as one would for one-dimensional datasets. This method effectively disregards the dimensions of the dataset when formulating the underlying model.
While pooled regression is simple, it is based on a very restrictive model: all cross-sectional relations are assumed to be governed by the same intercept and slope coefficients. It does not use up many degrees of freedom and is therefore popular for smaller datasets. If the restrictive assumptions are appropriate, pooled regression makes the best use of the available data. Other key assumptions are that there is no correlation between features and no autocorrelation of the error terms.
The `PooledOLS` class of the `linearmodels` module provides “just plain OLS that understands various panel data structures”. The regression simply pools all observations for all markets and estimates one big OLS regression. However, it adds some information that is based on the panel structure.
X = sm.add_constant(dfx_pan["CPIC_SJA_P6M6ML6AR"])
lm_pooled = PooledOLS(dfx_pan.DU05YXR_VT10, X).fit(cov_type="clustered")
print(lm_pooled)
                          PooledOLS Estimation Summary
================================================================================
Dep. Variable:           DU05YXR_VT10   R-squared:                        0.0053
Estimator:                  PooledOLS   R-squared (Between):             -1.8826
No. Observations:                1600   R-squared (Within):               0.0061
Date:                Tue, Sep 19 2023   R-squared (Overall):              0.0053
Time:                        16:41:04   Log-likelihood                   -4261.8
Cov. Estimator:             Clustered
                                        F-statistic:                      8.4524
Entities:                           6   P-value                           0.0037
Avg Obs:                       266.67   Distribution:                  F(1,1598)
Min Obs:                       219.00
Max Obs:                       280.00   F-statistic (robust):             8.0930
                                        P-value                           0.0045
Time periods:                     281   Distribution:                  F(1,1598)
Avg Obs:                       5.6940
Min Obs:                       0.0000
Max Obs:                       6.0000

                                Parameter Estimates
======================================================================================
                    Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
--------------------------------------------------------------------------------------
const                  0.6162     0.1486     4.1453     0.0000      0.3246      0.9077
CPIC_SJA_P6M6ML6AR    -0.2066     0.0726    -2.8448     0.0045     -0.3491     -0.0642
======================================================================================
First, we check the significance of core CPI trends, measured as the percent change over the last 6 months versus the previous 6 months, seasonally and jump-adjusted.
The pooled regression diagnoses high significance of the 1-month lagged core CPI trend for duration returns. As should be expected, the relation between inflation and subsequent duration returns has been negative, and the probability of the relationship not being accidental according to the pooled model has been over 99% (indicated by the p-value being below 0.01).
The top part of the summary shows the standard R-squared and two panel-specific R-squared measures:

R-squared (Between) measures how much of the target variance across sections is accounted for by feature variance. Here that is the variation explained by the currency area for which we measure the return.

R-squared (Within) measures how much of the model variance is explained by variation of the explanatory variable within the cross-sections. It gives the goodness of fit for data from which the cross-section mean has been removed.

R-squared (Overall) is a weighted average of the above two.
F-statistics and their p-values indicate the historic significance of the explanatory variables, but they do so under the assumption that all observations in the pool are independent. For panels of market data and macro-quantamental data this is rarely the case. For correlated time series in a pool, this statistic overstates significance, by concatenating related time series and treating them as additional independent observations. If data are heavily correlated, their use as a pool is often called pseudo-replication. A similar bias would apply to the coefficient p-values in the output table.
In our example, as expected, inflation is negatively correlated with subsequent returns, and this correlation is significant at the 1% level.
A quick look at residual errors and at the dispersion of the target variable establishes whether the assumptions of the OLS model hold. We are checking the fitted model for key properties that impact the goodness of fit and the efficiency of the estimation:

Normality: The quantile-quantile (“QQ”) plot in the top left chart checks if residual errors are normally distributed, by plotting them against a theoretical normal distribution. This is important for the reliability of confidence interval estimates and assessments of significance. In the case of a normal distribution the dots should lie on the diagonal line.

Cross-value homoskedasticity: The second plot investigates whether the estimation errors are homoskedastic, i.e., whether the variance of the residual errors is constant across all values of the features and, hence, the OLS estimator is efficient. If the red line is not horizontal and errors change systematically with the feature values, the estimation probably has not made use of all available information.

Target-error correlation: High correlation between target values and regression errors indicates that our estimation does not explain much of the target variation. The information is the inverse of the R-squared value, and high correlation (low R-squared) is common for financial return series.

Cross-sectional homoskedasticity: The final chart checks if the targets have approximately similar means and standard deviations across sections. If the means are very different, cross-section dummies may need to be considered in the form of fixed or random effects, if they can be theoretically justified. If the standard deviations are very different, normalization of the targets has to be considered.
sns.set(rc={"figure.figsize": (15, 8)})
fig, ([ax1, ax2], [ax3, ax4]) = plt.subplots(2, 2)
# left = 1.8 #x coordinate for text insert
sm.graphics.qqplot(lm_pooled.resids, line="45", fit=True, ax=ax1)
ax1.set_title("QQ plot")
top = ax1.get_ylim()[1] * 0.75
ax2.set_title("Scatterplot of residuals")
ax2.set_xlim((-3, 8))
sns.residplot(
x="CPIC_SJA_P6M6ML6AR",
y="DU05YXR_VT10",
data=dfx_pan,
lowess=True,
line_kws=dict(color="r"),
color="darkseagreen",
ax=ax2,
)
ax3.set_title("Raw residuals of Pooled OLS versus DU05YXR_VT10")
ax3.set_ylabel("Residual (y - mu)")
ax3.scatter(
dfx_pan.DU05YXR_VT10,
lm_pooled.resids,
s=4,
c="purple",
label="Residual Error",
)
dfx_pan["cid_dummy"] = dfx_pan.index.get_level_values(0).values
sns.boxplot(x="cid_dummy", y=lm_pooled.resids, data=dfx_pan, ax=ax4)
g = sns.stripplot(
x="cid_dummy", y=lm_pooled.resids, data=dfx_pan, palette="bright", ax=ax4
)
ax4.set_title("Pooled residuals by country")
fig.tight_layout()
plt.show()
Sometimes it is useful to ascertain that a relation holds roughly similarly across sections (currency areas). The country-specific relations can be quickly visualized with Seaborn’s `lmplot` function.
grid = sns.lmplot(
x="CPIC_SJA_P6M6ML6AR",
y="DU05YXR_VT10",
col="cid_dummy",
sharex=False,
sharey=True,
aspect=2,
col_wrap=3,
data=dfx_pan,
height=2.5,
)
grid.fig.tight_layout()
Fixedeffects regression #
Basics #
Fixed effects in panel models are influences that are specific to a cross-section or a time period (or both) and are assumed to be constant. Fixed effects are conceptually viewed as unchangeable characteristics of the level, i.e., here of the cross-section or the observation period.
A good summary of the basic fixed-effects model can be found on Wikipedia.
Generally, fixed effects are useful to mitigate omitted-variable biases, i.e., the incorrect attribution of the explanatory or predictive power of an unobserved variable to an observed one. For prediction models, omitted-variable biases are not always a fault.

If the correlation between observed and unobserved variables is stable, using the observed feature and its estimated influence for forecasts is a valid strategy.

However, if the correlation between the observed and unobserved variable has been circumstantial and is not expected to persist, the omitted-variable bias will mislead the forecaster. In financial market research this danger typically arises for semi-structural features. For example, long-term averages of currency returns may be driven by risk premia. However, if an unrelated slow-moving semi-structural indicator has also recorded long-term differences across currency areas over the sample period, a pooled regression may easily misinterpret it as a predictor of risk premia.
Using fixed-effects models for prediction bears its own risks. The model is misleading if cross-sectional differences reflect unobserved features that are actually changing over time. This concern applies, for example, to regressions on currency-area-specific returns. While returns may have displayed significant differences historically, due for example to different risk premia, they plausibly reflect the specific history of the currency area over the sample period. The assumption of a stable return outperformance of a currency area, unrelated to any observable feature, is typically unrealistic.
As a rule of thumb, fixed-effects models should not be used for temporary effects: cross-section fixed effects would be misleading, and period-specific effects would simply be useless for predictions.
Using dummy variables and `statsmodels` #
Fixed effects can be estimated with the `statsmodels` package by adding cross-section or period-specific dummy variables to the `PooledOLS` method. The estimation below assumes that each currency area adds or subtracts a fixed premium to the target return.
X = [
"CPIC_SJA_P6M6ML6AR",
"cid_dummy",
]
X = sm.add_constant(dfx_pan[X])
sm_fe_cs = PooledOLS(dfx_pan.DU05YXR_VT10, X).fit(cov_type="clustered")
print(sm_fe_cs)
                          PooledOLS Estimation Summary
================================================================================
Dep. Variable:           DU05YXR_VT10   R-squared:                        0.0069
Estimator:                  PooledOLS   R-squared (Between):              1.0000
No. Observations:                1600   R-squared (Within):               0.0064
Date:                Tue, Sep 19 2023   R-squared (Overall):              0.0069
Time:                        16:41:06   Log-likelihood                   -4260.5
Cov. Estimator:             Clustered
                                        F-statistic:                      1.8363
Entities:                           6   P-value                           0.0886
Avg Obs:                       266.67   Distribution:                  F(6,1593)
Min Obs:                       219.00
Max Obs:                       280.00   F-statistic (robust):             1.7062
                                        P-value                           0.1158
Time periods:                     281   Distribution:                  F(6,1593)
Avg Obs:                       5.6940
Min Obs:                       0.0000
Max Obs:                       6.0000

                                Parameter Estimates
======================================================================================
                    Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
--------------------------------------------------------------------------------------
const                  0.9753     0.3110     3.1357     0.0017      0.3652      1.5853
CPIC_SJA_P6M6ML6AR    -0.2660     0.0874    -3.0447     0.0024     -0.4374     -0.0946
cid_dummy.CAD         -0.3074     0.3056    -1.0058     0.3147     -0.9069      0.2921
cid_dummy.EUR         -0.2246     0.3185    -0.7052     0.4808     -0.8492      0.4001
cid_dummy.GBP         -0.2053     0.3164    -0.6489     0.5165     -0.8260      0.4153
cid_dummy.JPY         -0.5607     0.3605    -1.5554     0.1200     -1.2678      0.1464
cid_dummy.USD         -0.2882     0.3068    -0.9395     0.3476     -0.8899      0.3135
======================================================================================
The constant in the above output applies to the first (alphabetically first, in our case AUD) market. The effects for other markets can be obtained by adding the respective dummy coefficient values to the AUD level.
The significance of fixed effects is indicated by the t-statistics and p-values of the dummy coefficients. Adding fixed effects for cross-sections increased the absolute value and strengthened the significance of the chosen independent variable, though none of the fixed effects is significant at the 5% level.
Fixed effects across periods can be analyzed similarly. However, with longer time series the output of the fixed-effects panel regression becomes vast and unreadable if we looked at the effects of every recorded period. Hence, here we use calendar years as dummy variables, while still including country dummies.
dfx_pan["year"] = dfx_pan.index.get_level_values(1).year.astype("category")
dfx_pan["date"] = dfx_pan.index.get_level_values(1).astype("category")
X = ["CPIC_SJA_P6M6ML6AR", "cid_dummy", "year"]
X = sm.add_constant(dfx_pan[X])
sm_fe_time = PooledOLS(dfx_pan.DU05YXR_VT10, X).fit(cov_type="clustered")
print(sm_fe_time)
                          PooledOLS Estimation Summary
================================================================================
Dep. Variable:           DU05YXR_VT10   R-squared:                        0.0950
Estimator:                  PooledOLS   R-squared (Between):              1.0000
No. Observations:                1600   R-squared (Within):               0.0945
Date:                Tue, Sep 19 2023   R-squared (Overall):              0.0950
Time:                        16:41:06   Log-likelihood                   -4186.2
Cov. Estimator:             Clustered
                                        F-statistic:                      5.6829
Entities:                           6   P-value                           0.0000
Avg Obs:                       266.67   Distribution:                 F(29,1570)
Min Obs:                       219.00
Max Obs:                       280.00   F-statistic (robust):             5.5167
                                        P-value                           0.0000
Time periods:                     281   Distribution:                 F(29,1570)
Avg Obs:                       5.6940
Min Obs:                       0.0000
Max Obs:                       6.0000

                                Parameter Estimates
======================================================================================
                    Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
--------------------------------------------------------------------------------------
const                  2.0846     0.5987     3.4816     0.0005      0.9102      3.2590
CPIC_SJA_P6M6ML6AR     0.0375     0.1233     0.3038     0.7613     -0.2045      0.2794
cid_dummy.CAD         -0.2337     0.2967    -0.7878     0.4309     -0.8156      0.3482
cid_dummy.EUR          0.0124     0.3147     0.0393     0.9686     -0.6050      0.6297
cid_dummy.GBP         -0.1325     0.3043    -0.4355     0.6633     -0.7293      0.4643
cid_dummy.JPY          0.1960     0.4011     0.4886     0.6252     -0.5908      0.9828
cid_dummy.USD         -0.1778     0.3001    -0.5925     0.5536     -0.7665      0.4109
year.2001             -0.9743     0.7005    -1.3908     0.1645     -2.3484      0.3998
year.2002             -0.7156     0.6524    -1.0970     0.2728     -1.9952      0.5640
year.2003             -1.9926     0.6915    -2.8815     0.0040     -3.3490     -0.6362
year.2004             -1.5753     0.6313    -2.4952     0.0127     -2.8136     -0.3369
year.2005             -2.0636     0.6395    -3.2267     0.0013     -3.3181     -0.8092
year.2006             -2.9158     0.6147    -4.7437     0.0000     -4.1214     -1.7101
year.2007             -2.4550     0.6802    -3.6093     0.0003     -3.7892     -1.1208
year.2008             -0.8591     0.6585    -1.3046     0.1922     -2.1508      0.4326
year.2009             -2.1720     0.6170    -3.5200     0.0004     -3.3823     -0.9617
year.2010             -1.1628     0.6519    -1.7837     0.0747     -2.4416      0.1159
year.2011             -0.8383     0.5865    -1.4293     0.1531     -1.9886      0.3121
year.2012             -1.2314     0.5901    -2.0870     0.0371     -2.3888     -0.0741
year.2013             -3.2710     0.7114    -4.5981     0.0000     -4.6664     -1.8756
year.2014             -0.5844     0.6014    -0.9718     0.3313     -1.7640      0.5951
year.2015             -1.7535     0.6110    -2.8697     0.0042     -2.9520     -0.5550
year.2016             -1.9195     0.7203    -2.6648     0.0078     -3.3324     -0.5066
year.2017             -2.5581     0.6557    -3.9012     0.0001     -3.8442     -1.2719
year.2018             -1.5821     0.6694    -2.3636     0.0182     -2.8950     -0.2692
year.2019             -0.8103     0.6620    -1.2241     0.2211     -2.1088      0.4881
year.2020             -1.0457     0.6253    -1.6722     0.0947     -2.2723      0.1809
year.2021             -4.0839     0.7970    -5.1240     0.0000     -5.6472     -2.5206
year.2022             -4.4528     0.7404    -6.0138     0.0000     -5.9051     -3.0005
year.2023             -1.4976     0.9669    -1.5488     0.1216     -3.3941      0.3990
======================================================================================
Even if the years’ fixed effects are significant, this information is of little use for forecasting, since time periods do not recur. However, an interesting takeaway from the above summary is that, after accounting for cross-section and year effects, the coefficient for inflation turns positive and loses any significance.
Fixed-effects regression with `linearmodels` #
It is often more efficient to perform panel regression with the `statsmodels` extension `linearmodels`. The `linearmodels` module focuses on panel regression, instrumental variable estimators, system estimators, and models for estimating asset prices. The package needs to be installed separately with `pip install linearmodels`. While fixed-effects regression is equivalent to using dummy variables, it is computationally more efficient.
The `linearmodels` module features the class `PanelOLS`, which supports one-way (any unobserved effects that differ across individuals but are fixed across time) and two-way fixed-effects estimation for panel data. This means one can consider cross-sectional effects (`entity_effects=True`), period effects (`time_effects=True`), or both. However, `PanelOLS` only supports linear additive effects.
X = ["CPIC_SJA_P6M6ML6AR"]
X = sm.add_constant(dfx_pan[X])
lm_fe_cs = PanelOLS(
dfx_pan.DU05YXR_VT10, X, time_effects=False, entity_effects=True
).fit(
cov_type="clustered"
) # clustering at cross-section level
print(lm_fe_cs)
                          PanelOLS Estimation Summary
================================================================================
Dep. Variable:           DU05YXR_VT10   R-squared:                        0.0064
Estimator:                   PanelOLS   R-squared (Between):             -3.6440
No. Observations:                1600   R-squared (Within):               0.0064
Date:                Tue, Sep 19 2023   R-squared (Overall):              0.0048
Time:                        16:41:06   Log-likelihood                   -4260.5
Cov. Estimator:             Clustered
                                        F-statistic:                      10.214
Entities:                           6   P-value                           0.0014
Avg Obs:                       266.67   Distribution:                  F(1,1593)
Min Obs:                       219.00
Max Obs:                       280.00   F-statistic (robust):             9.2699
                                        P-value                           0.0024
Time periods:                     281   Distribution:                  F(1,1593)
Avg Obs:                       5.6940
Min Obs:                       0.0000
Max Obs:                       6.0000

                                Parameter Estimates
======================================================================================
                    Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
--------------------------------------------------------------------------------------
const                  0.7193     0.1729     4.1595     0.0000      0.3801      1.0585
CPIC_SJA_P6M6ML6AR    -0.2660     0.0874    -3.0447     0.0024     -0.4374     -0.0946
======================================================================================

F-test for Poolability: 0.5157
P-value: 0.7646
Distribution: F(5,1593)

Included effects: Entity
The output above produces similar coefficient estimates and standard errors as the pooled regression with dummy variables for each cross-section (`sm_fe_cs` in the example above). The biggest difference in statistics comes from the \(R^2\) calculation: in the `linearmodels` fixed-effects regression, \(R^2\) is given as the “within \(R^2\)”, which means the usual residual sum of squares is divided by the demeaned total sum of squares, accounting for the fixed effects.
An interesting statistics here is Ftest for “poolability” and its pvalue. A low pvalue means high probability of effect significance and rejects the null hypothesis of no levelspecific effects. This would be taken as an argument against poolability. By contrast, a high pvalue of, say, above 0.1 means a lower probability of effect significance and would lend more support to using a pooled regression, which would be lest costly in terms of degrees of freedom.
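For intuition, the classical poolability F-test can be replicated by hand on synthetic data: compare the residual sum of squares of a pooled regression with that of a model allowing entity-specific intercepts. This is only a sketch with made-up data and dimensions; linearmodels computes a statistic of this kind internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_periods = 6, 100
x = rng.normal(size=(n_entities, n_periods))
y = 0.3 * x + rng.normal(size=(n_entities, n_periods))  # no true entity effects

def ssr(resid):
    return float((resid ** 2).sum())

# Restricted (pooled) model: common intercept and slope
X = np.column_stack([np.ones(x.size), x.ravel()])
beta_r, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
ssr_r = ssr(y.ravel() - X @ beta_r)

# Unrestricted model: entity-specific intercepts, absorbed by within-entity demeaning
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
slope_u = float((xd * yd).sum() / (xd ** 2).sum())
ssr_u = ssr(yd - slope_u * xd)

# F-test of the null hypothesis that all entity intercepts are equal
q = n_entities - 1                # number of restrictions
dof = x.size - n_entities - 1     # residual degrees of freedom of the FE model
f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / dof)
print(round(f_stat, 3))  # under the null of poolability, F is typically near 1
```

A small F-statistic (high p-value) supports pooling, exactly as in the summary above.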
Note that the summary does not display the fixed-effect estimates (intercepts). This keeps the output compact: the model could potentially be used to analyse data with hundreds of “units” (“entities”), and displaying all of them in one table would be counterproductive.
The intercepts (constants) can be inspected via lm_fe_cs.estimated_effects. However, they will differ numerically from the pooled regression with dummy variables, where the benchmark was the alphabetically first cross-section. To obtain numerically identical results, the model should be refitted with the formula below, after which the intercepts can be extracted.
lm_fe_cs = PanelOLS.from_formula(
"DU05YXR_VT10 ~ 0 + CPIC_SJA_P6M6ML6AR + EntityEffects",
dfx_pan,
).fit()
chart_lm_fe_cs = lm_fe_cs.estimated_effects.unstack(level=1)
chart_lm_fe_cs = chart_lm_fe_cs[chart_lm_fe_cs.columns[~chart_lm_fe_cs.isnull().all()]]
chart_lm_fe_cs = chart_lm_fe_cs.transpose().reset_index()
filt = chart_lm_fe_cs["real_date"] >= pd.to_datetime(
"20230201"
) # filter for any date (as long as there is no nan or 0)
chart_lm_fe_cs = chart_lm_fe_cs[filt] # filter out relevant data frame
sns.set(rc={"figure.figsize": (5, 3)})
chart_lm_fe_cs = chart_lm_fe_cs.drop(["level_0", "real_date"], axis=1)
sns.barplot(data=chart_lm_fe_cs).set(
    title="Estimated intercepts for each cross-section (fixed effects)"
)
The example below replicates time fixed effects using linearmodels. It is equivalent to pooled OLS using time and “cid_dummy” dummy variables.
X = [
"CPIC_SJA_P6M6ML6AR",
]
X = sm.add_constant(dfx_pan[X])
lm_fe_time = PanelOLS(
dfx_pan.DU05YXR_VT10, X, time_effects=True, entity_effects=True
).fit(cov_type="clustered")
print(lm_fe_time)
                          PanelOLS Estimation Summary
================================================================================
Dep. Variable:           DU05YXR_VT10   R-squared:                        0.0003
Estimator:                   PanelOLS   R-squared (Between):             -0.1480
No. Observations:                1600   R-squared (Within):               0.0022
Date:                Tue, Sep 19 2023   R-squared (Overall):              0.0023
Time:                        16:41:06   Log-likelihood                   -3288.0
Cov. Estimator:             Clustered
                                        F-statistic:                      0.3870
Entities:                           6   P-value                           0.5340
Avg Obs:                       266.67   Distribution:                  F(1,1314)
Min Obs:                       219.00
Max Obs:                       280.00   F-statistic (robust):             0.3234
                                        P-value                           0.5697
Time periods:                     281   Distribution:                  F(1,1314)
Avg Obs:                       5.6940
Min Obs:                       0.0000
Max Obs:                       6.0000

                             Parameter Estimates
======================================================================================
                    Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
--------------------------------------------------------------------------------------
const                  0.3451     0.1619     2.1316     0.0332      0.0275      0.6627
CPIC_SJA_P6M6ML6AR    -0.0506     0.0890    -0.5687     0.5697     -0.2251      0.1240
======================================================================================

F-test for Poolability: 11.001
P-value: 0.0000
Distribution: F(284,1314)

Included effects: Entity, Time
Interestingly, once both cross-section and time effects are taken into account, the significance of the coefficient for CPIC_SJA_P6M6ML6AR has decreased and it is no longer significant. Note that the F-test for poolability now rejects the null of no level-specific effects, mainly reflecting the importance of time effects. The drawback of the two-way fixed-effects model is the large loss of degrees of freedom: estimating a separate effect for each period and cross-section consumes several hundred degrees of freedom.
Random effects regression #
Basics #
Like fixed-effects models, random-effects models are used to account for cross-section and period-specific effects. However, the purpose and assumptions of random-effects models in financial markets research are different.

Random-effects models deal with the problem of pseudo-replication in panels. Random effects are commonalities of measurements that are connected, for example by belonging to the same cross-section or time period. Affected observations are not independent. Treating them as independent leads to pseudo-replication and erroneous inference. For example, coefficient significance in a pooled OLS regression with pseudo-replication may be overstated. Random-effects models are made for dealing with non-independence, as opposed to fixed effects, which focus on the actual parameter estimates of cross-section or period-specific effects.

Unlike fixed-effects models, random-effects models do not consider omitted-variable bias a problem. This often suits the purpose of financial market prediction models if one assumes that the correlation between observed and unobserved variables is stable. Then the observed feature can represent the effect of the unobserved one. After all, we care about forecast accuracy, not about actual causality.
Importantly, the term “random” does not mean that the effects are erratic or not measurable. They are merely influences that are not explained by the features of the model and are specific to the level, i.e. the period or the cross-section. In a random-effects model we consider level-specific commonalities as draws from a random variable and disregard their specific values. The effects themselves do not add any information we are interested in.
Conceptually, random effects represent a sample of all possible values or instances of that effect. Thus, they generalize to all results obtainable from the population. In the case of financial market panel research, they generalize to all possible cross-sections (markets, currency areas or countries) and time periods, including the future. This generalization typically makes the random-effects model more suitable for predictions.
“Random effect estimates are a function of the group [cross-section or time period] level information as well as the overall (grand) mean of the random effect. Group levels with low sample size and/or poor information are more strongly influenced by the grand mean, which is serving to add information to an otherwise poorly-estimated group. However, a group with a large sample size and/or strong information will have very little influence of the grand mean and largely reflect the information contained entirely within the group. This process is called partial pooling, as opposed to no pooling where no effect is considered or total pooling where separate models are run for the separate groups. Partial pooling results in the phenomenon known as shrinkage, which refers to the group-level estimates being shrunk toward the mean. What does all this mean? If you use a random effect you should be prepared for your factor levels to have some influence from the overall mean of all levels. With a good, clear signal in each group, you won’t see much influence of the overall mean, but you will with small groups or those without much signal.” Steve Midway
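The partial-pooling logic in the quote above can be illustrated with a toy calculation. All numbers below (group means, sizes and variances) are hypothetical; the shrinkage weight n / (n + residual variance / effect variance) is the standard textbook form.

```python
import numpy as np

# Hypothetical group-level summary statistics
group_means = np.array([1.2, -0.4, 0.3])
group_sizes = np.array([200, 50, 5])
grand_mean = 0.2
sigma2_eps = 4.0  # within-group (residual) variance
sigma2_re = 0.5   # variance of the random effects

# Shrinkage weight: large, informative groups keep their own mean,
# small groups are pulled toward the grand mean
w = group_sizes / (group_sizes + sigma2_eps / sigma2_re)
shrunk = w * group_means + (1 - w) * grand_mean
print(np.round(w, 3))
print(np.round(shrunk, 3))
```

The group with only five observations ends up much closer to the grand mean than the group with two hundred, which is exactly the shrinkage described above.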
Cross-section random effects with
linearmodels
#
One way of performing cross-sectional random-effects panel regression is to use the
RandomEffects
class of the
linearmodels
package. It implements a cross-sectional random-effects model.
X = sm.add_constant(dfx_pan["CPIC_SJA_P6M6ML6AR"])
lm_re = RandomEffects(dfx_pan.DU05YXR_VT10, X).fit(cov_type="clustered")
print(lm_re.summary)
                        RandomEffects Estimation Summary
================================================================================
Dep. Variable:           DU05YXR_VT10   R-squared:                        0.0053
Estimator:              RandomEffects   R-squared (Between):             -1.8826
No. Observations:                1600   R-squared (Within):               0.0061
Date:                Tue, Sep 19 2023   R-squared (Overall):              0.0053
Time:                        16:41:06   Log-likelihood                   -4261.8
Cov. Estimator:             Clustered
                                        F-statistic:                      8.4524
Entities:                           6   P-value                           0.0037
Avg Obs:                       266.67   Distribution:                  F(1,1598)
Min Obs:                       219.00
Max Obs:                       280.00   F-statistic (robust):             8.0930
                                        P-value                           0.0045
Time periods:                     281   Distribution:                  F(1,1598)
Avg Obs:                       5.6940
Min Obs:                       0.0000
Max Obs:                       6.0000

                             Parameter Estimates
======================================================================================
                    Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
--------------------------------------------------------------------------------------
const                  0.6162     0.1486     4.1453     0.0000      0.3246      0.9077
CPIC_SJA_P6M6ML6AR    -0.2066     0.0726    -2.8448     0.0045     -0.3491     -0.0642
======================================================================================
The output is effectively equivalent to the pooled regression. Adding a cross-sectional random effect in this example does not change the results much, as none of the cross-sectional fixed effects was significant.
Generally, we can check the importance of the random effects by extracting “theta”, a measure of the importance of the level, i.e. here the cross-sections, in explaining target variance.
The random-effects estimator makes use of a quasi-differenced model:
\(y_{it} - \hat{\theta}_i\bar{y}_i = (1-\hat{\theta}_i)\alpha_i + (x_{it} - \hat{\theta}_i\bar{x}_i)\beta + (\epsilon_{it} - \hat{\theta}_i\bar{\epsilon}_i)\)
where \(y\) is the target, \(x\) the feature, subscript \(i\) denotes a cross-section, and subscript \(t\) a period. Here \(\hat{\theta}_i\) is a function of the variance of the idiosyncratic error \(\epsilon_{it}\), the variance of the intercepts \(\alpha_i\), and the number of observations for entity \(i\). The coefficient \(\theta_i\) determines how much “demeaning” takes place. When this value is \(0\), the random-effects model effectively becomes a pooled model, which occurs when there is no variance in the effects. When \(\theta_i\) is \(1\), the model effectively becomes a fixed-effects model.
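The behavior of \(\hat{\theta}_i\) can be sketched directly from the variance components. The variance values below are made up for illustration:

```python
import numpy as np

def theta(sigma2_eps, sigma2_alpha, n_obs):
    """Quasi-demeaning weight: 0 -> pooled OLS, 1 -> fixed effects."""
    return 1.0 - np.sqrt(sigma2_eps / (sigma2_eps + n_obs * sigma2_alpha))

print(theta(4.0, 0.0, 280))            # no effect variance -> 0.0 (pooled case)
print(round(theta(4.0, 0.5, 280), 3))  # sizeable effect variance -> 0.833 (close to FE)
```

With zero effect variance the weight is exactly zero, matching the all-zero thetas printed below.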
For example, in the above regression \(\theta_i\) is zero for all cross-sections, implying no significant cross-section-specific effect. Put simply, the cross-section does not explain much of the variation of the return. This is plausible, because almost all return variation reflects changing market conditions rather than the country. For financial time series, period-specific effects are often far more important than cross-section-specific effects.
print(lm_re.theta.transpose())
cid AUD CAD EUR GBP JPY USD
theta 0.0 0.0 0.0 0.0 0.0 0.0
Conveniently, the
compare
function of the
linearmodels.panel
module can be used to show the key output of different panel models side by side.
res = {
"Pooled": lm_pooled,
"One way fixed Effects": sm_fe_cs,
"Random crosssection specific": lm_re,
}
print(compare(res))
                                   Model Comparison
========================================================================================
                              Pooled  One way fixed Effects  Random crosssection specific
----------------------------------------------------------------------------------------
Dep. Variable           DU05YXR_VT10           DU05YXR_VT10                  DU05YXR_VT10
Estimator                  PooledOLS              PooledOLS                 RandomEffects
No. Observations                1600                   1600                          1600
Cov. Est.                  Clustered              Clustered                     Clustered
R-squared                     0.0053                 0.0069                        0.0053
R-Squared (Within)            0.0061                 0.0064                        0.0061
R-Squared (Between)          -1.8826                 1.0000                       -1.8826
R-Squared (Overall)           0.0053                 0.0069                        0.0053
F-statistic                   8.4524                 1.8363                        8.4524
P-value (F-stat)              0.0037                 0.0886                        0.0037
===================== ============== ============== ===============
const                         0.6162                 0.9753                        0.6162
                            (4.1453)               (3.1357)                      (4.1453)
CPIC_SJA_P6M6ML6AR           -0.2066                -0.2660                       -0.2066
                           (-2.8448)              (-3.0447)                     (-2.8448)
cid_dummy.CAD                                      -0.3074
                                                 (-1.0058)
cid_dummy.EUR                                      -0.2246
                                                 (-0.7052)
cid_dummy.GBP                                      -0.2053
                                                 (-0.6489)
cid_dummy.JPY                                      -0.5607
                                                 (-1.5554)
cid_dummy.USD                                      -0.2882
                                                 (-0.9395)
----------------------------------------------------------------------------------------

T-stats reported in parentheses
Time period-specific random effects with
statsmodels
#
Period-specific random effects are often highly relevant for trading strategy research, as most targets are financial returns of the same type over different markets. The variation of this target is predominantly period-specific, and returns over the same period are often highly correlated. Not accounting for such correlation leads to pseudo-replication of data and overstated significance of results.
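A small simulation illustrates the danger. Below, both the feature and the error contain a common period component, so the naive OLS standard error, which treats all observations as independent, is far too small relative to one that clusters by period. All data and parameters are synthetic, and the cluster-robust estimator is shown in its simplest form, without small-sample corrections:

```python
import numpy as np

rng = np.random.default_rng(7)
n_periods, n_cids = 300, 6

# Feature and error both contain a common period component, so observations
# within the same period are not independent
x_common = rng.normal(size=(n_periods, 1))
u_common = rng.normal(size=(n_periods, 1))
x = x_common + 0.5 * rng.normal(size=(n_periods, n_cids))
u = u_common + 0.5 * rng.normal(size=(n_periods, n_cids))
y = 0.0 * x + u  # true slope is zero

X = np.column_stack([np.ones(y.size), x.ravel()])
beta, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
resid = y.ravel() - X @ beta
xtx_inv = np.linalg.inv(X.T @ X)

# Naive OLS standard error of the slope (iid assumption)
se_naive = float(np.sqrt(resid @ resid / (y.size - 2) * xtx_inv[1, 1]))

# Cluster-robust standard error, clustering by period (sandwich form)
meat = np.zeros((2, 2))
for g in range(n_periods):
    rows = slice(g * n_cids, (g + 1) * n_cids)
    s = X[rows].T @ resid[rows]  # per-period score
    meat += np.outer(s, s)
V = xtx_inv @ meat @ xtx_inv
se_cluster = float(np.sqrt(V[1, 1]))
print(round(se_cluster / se_naive, 2))  # ratio well above 1
```

The clustered standard error is roughly twice the naive one here, so naive t-statistics would be inflated by the same factor.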
The Macrosynergy post
Testing macro trading factors
explains a period-specific random-effects panel regression that is well suited for testing the influence of macro factors across countries. It is a practical and effective econometric technique to tackle pseudo-replication in panels. There is an R script on
Kaggle
, replicating the results of the post. R offers better documentation and more flexibility for random-effects and linear mixed-effects models than Python, mainly through two packages:
plm
and
lme4
.
Here we replicate this procedure in Python. The easiest way to apply period-specific random effects in Python is to use mixed-effects regression (MixedLM) from the statsmodels package. The naming of the class is, admittedly, a source of confusion; for practical purposes, however, it delivers the output of a period-specific random-effects model. Indeed, Python in this case aims to replicate the output of R, using the R packages as a benchmark.
y = dfx_pan["DU05YXR_VT10"]
X = dfx_pan["CPIC_SJA_P6M6ML6AR"]
X = sm.add_constant(X)
groups = dfx_pan.reset_index().real_date
re = sm.MixedLM(
y,
X,
groups,
).fit(
reml=True
) # Use restricted maximum likelihood
print(re.summary())
            Mixed Linear Model Regression Results
=============================================================
Model:             MixedLM   Dependent Variable: DU05YXR_VT10
No. Observations:  1600      Method:             REML
No. Groups:        280       Scale:              4.3401
Min. group size:   4         Log-Likelihood:     -3784.0539
Max. group size:   6         Converged:          Yes
Mean group size:   5.7
-------------------------------------------------------------
                    Coef.  Std.Err.    z    P>|z| [0.025 0.975]
-------------------------------------------------------------
const               0.455     0.199  2.281  0.023  0.064  0.845
CPIC_SJA_P6M6ML6AR -0.098     0.056 -1.753  0.080 -0.207  0.012
Group Var           7.752     0.381
=============================================================
The output of the model contains less information than a standard OLS model or even a panel regression from linearmodels. The reason is that many of the standard measures are not applicable to this model:

Method states the fitting method: either maximum likelihood (ML) or restricted maximum likelihood (REML), a modified version of standard maximum likelihood. Mixed-effects models can be difficult to fit; since they do not have closed-form solutions, they rely on an optimizer. Standard maximum likelihood often fails to converge (the summary indicates whether the model has converged). The REML method often yields better results.

The Log-Likelihood value is the natural logarithm of the probability of observing the sample data, assuming it has been drawn from a population characterized by the estimated parameters. It helps to compare different models.

The coefficients in the bottom part of the table, for the intercept and the covariates, are the fixed effects of the mixed model. Comparing the output with the very first pooled regression model, we notice that the relation between the month-end core CPI trend and next month’s return still shows as negative, but the statistical probability of this relation is now only around 92%, given the p-value of 0.08.

The Group Var coefficient is the variance of the period-specific random effects over time. The term Group is generic, but in this specific case, where we specified the real_date category as the group, it refers to time periods.
Scale is the (scalar) error variance: the estimated error variance given the estimated slopes and the random-effects covariance matrix. For a discussion of its computation, please see the source code with remarks.
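As a side note on the ML/REML distinction above, a simplified OLS analogue (not the full mixed-model machinery) shows why REML is often preferred: the plain maximum-likelihood estimate of the error variance divides by \(n\) and is biased downward, while the REML-style estimate corrects for the degrees of freedom used by the fixed-effect estimates.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma2 = 20, 5, 1.0  # small sample, several fixed-effect parameters
ml_est, reml_est = [], []
for _ in range(2000):
    X = rng.normal(size=(n, k))
    y = X @ np.ones(k) + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(((y - X @ beta) ** 2).sum())
    ml_est.append(rss / n)          # ML-style estimate: biased downward
    reml_est.append(rss / (n - k))  # REML-style estimate: unbiased
print(round(float(np.mean(ml_est)), 2), round(float(np.mean(reml_est)), 2))
```

On average, dividing by \(n\) understates the true variance of 1.0, while dividing by \(n-k\) recovers it; REML generalizes this correction to variance components in mixed models.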
To estimate the importance of the random factors, we extract the scale and the variance of the period-specific random effects from the model, divide the latter by the sum of the two variances, and multiply by 100 to get a percentage:
scale = sm.MixedLM(
y,
X,
groups,
).get_scale(re.fe_params, re.cov_re_unscaled, re.vcomp)
RE_explained = ((re.cov_re / (re.cov_re + scale)) * 100).squeeze().round(2)
print("Variance attributable to the random time factor, in %:", RE_explained)
Variance attributable to the random time factor, in %: 64.11
The period-specific random effects can be extracted and plotted via the random_effects attribute. For plotting, see the mixed-effects section below.
random_effects = pd.DataFrame.from_dict(re.random_effects)
random = random_effects.T
print(random.tail(5))
             Group Var
2023-01-31    3.099347
2023-02-28    3.490843
2023-03-31    3.974264
2023-04-30    0.036600
2023-05-31    0.336469
To see how the model residuals are distributed against the fitted values, and to detect possible patterns, we can use a simple scatter plot:
sns.scatterplot(x=(dfx_pan["DU05YXR_VT10"] - re.resid), y=re.resid, alpha=0.5).set(
    title="Residual vs. Fitted"
)
plt.xlabel("Fitted Values")
plt.ylabel("Residuals")
fitted = pd.DataFrame()
fitted["CPIC_SJA_P6M6ML6AR"] = dfx_pan.CPIC_SJA_P6M6ML6AR
fitted["DU05YXR_VT10"] = dfx_pan.DU05YXR_VT10
fitted["re_fit"] = re.fittedvalues
fitted["pool_fit"] = lm_pooled.fitted_values
fitted["fixed_entity_fit"] = lm_fe_cs.fitted_values
fitted["fixed_time_fit"] = sm_fe_time.fitted_values
fitted_l = fitted.reset_index()
fitted_l = pd.melt(
fitted_l,
id_vars=["cid", "real_date"],
value_vars=[
"CPIC_SJA_P6M6ML6AR",
"DU05YXR_VT10",
"re_fit",
"pool_fit",
"fixed_entity_fit",
"fixed_time_fit",
],
var_name="xcat",
value_name="value",
)
import macrosynergy.panel as msp  # provides view_timelines (may already be imported above)

cidx = ["AUD", "CAD", "EUR", "GBP", "JPY", "USD"]
msp.view_timelines(
fitted_l,
xcats=["DU05YXR_VT10", "re_fit", "pool_fit", "fixed_entity_fit", "fixed_time_fit"],
cids=cidx,
ncol=3,
start="20000101",
same_y=False,
aspect=1.5,
height=3,
label_adj=0.2,
)
None of the models except the random-effects model predicts much of the variation. This is to be expected, as the R-squared numbers were generally very low. The random-effects model seems to capture much more, as its fitted values incorporate the estimated period-specific effects.
Linear mixed effects model #
“The term ‘mixed’ or, more fully, ‘mixed effects’, denotes a model that incorporates both fixed- and random-effects terms in a linear predictor expression from which the conditional mean of the response can be evaluated.”
Bates et al
. Note that this paper also describes the fitting of the model in
lme4
package and part of this package is replicated in Python’s
statsmodels
under
smf.mixedlm
model.
“Linear mixedeffects models are an important class of statistical models that can be used to analyze…continuous, hierarchical data…taking into account the correlation of observations…They allow us to effectively partition overall variation of the dependent variable into components corresponding to different levels of data hierarchy.” Gałecki and Burzykowski
The practical benefit of this type of model is the flexibility to introduce level-specific effects for intercepts and slopes. In particular, one can specify random intercepts and random slope coefficients. As before, the model will also estimate fixed coefficients. For example, these models allow random intercepts with fixed means, intercepts that vary across a level and a sub-level, correlated random intercepts and slopes, and uncorrelated intercepts and slopes. Formally, the model posits that the conditional distribution of the target vector, for a realized set of level-specific random effects, is normal around a linear combination of the features, the random effects and, possibly, a prior offset term.
Linear mixed-effects models also offer more flexibility in handling unbalanced repeated-measures data. It is important to determine which effects should have random components and which should be fixed. The model is fitted iteratively: a tentative model is fitted and subsequently modified to generate a better fit. In general, this model does not have a closed-form solution, so convergence is required for the model to work. With more than one covariate, the user can decide on a parameter-by-parameter basis which regressors should be considered random. For model specifications please see here
Basic documentation with method descriptions can be viewed here. The use of formulas is described here. And the results are discussed here
In the first example we use the group (cross-section) intercepts and slope coefficients as random factors. Compared to the pooled regression model above, the panel estimates of the intercept and slope coefficients change only slightly, reflecting that the explanatory power of cross-sections for monthly duration returns is marginal.
import statsmodels.formula.api as smf  # formula interface to MixedLM

groups = dfx_pan.reset_index().cid
lmm1 = smf.mixedlm(
    "DU05YXR_VT10 ~ CPIC_SJA_P6M6ML6AR",
    dfx_pan,
    groups=groups,
    re_formula="1 + CPIC_SJA_P6M6ML6AR",
).fit()
print(lmm1.summary())
             Mixed Linear Model Regression Results
=========================================================================
Model:              MixedLM    Dependent Variable:    DU05YXR_VT10
No. Observations:   1600       Method:                REML
No. Groups:         6          Scale:                 12.0676
Min. group size:    219        Log-Likelihood:        -4265.0655
Max. group size:    280        Converged:             Yes
Mean group size:    266.7
-------------------------------------------------------------------------
                                Coef.  Std.Err.    z    P>|z| [0.025 0.975]
-------------------------------------------------------------------------
Intercept                       0.619     0.166  3.733  0.000  0.294  0.945
CPIC_SJA_P6M6ML6AR             -0.208     0.073 -2.844  0.004 -0.351 -0.065
Group Var                       0.002     0.021
Group x CPIC_SJA_P6M6ML6AR Cov  0.001     0.012
CPIC_SJA_P6M6ML6AR Var          0.001     0.014
=========================================================================
However, unlike in the pooled model, we can now derive cross-section-specific intercepts and, more importantly, slope coefficients by accessing the random_effects attribute.
This can be very important for prediction if we have reason to believe that the cross-sectional differences in elasticities are, to some extent, persistent. In this case, considering cross-section-specific slope coefficients can deliver better predictions than the panel or pooled coefficient. The conditional means of the random effects given the data can be extracted with the random_effects property:
print(pd.DataFrame.from_dict(lmm1.random_effects), "\n")
print(
"Fixed effects for linear mixed effects model with cid as random factor are \n",
lmm1.fe_params,
)
AUD CAD EUR GBP JPY USD
Group 0.004418 0.000942 0.000963 0.008898 0.010806 0.006304
CPIC_SJA_P6M6ML6AR 0.003832 0.000402 0.000495 0.006088 0.006915 0.004565
Fixed effects for linear mixed effects model with cid as random factor are
Intercept 0.619364
CPIC_SJA_P6M6ML6AR 0.207805
dtype: float64