Inventory scores and metal futures returns

Inventory scores are quantamental (point-in-time) indicators of the inventory states and dynamics of economies or commodity sectors. Inventory scores plausibly predict base metal futures returns due to two effects. First, they influence the convenience yield of a metal and the discount at which futures are trading relative to physical stock. Second, they predict demand changes for restocking by producers and industrial consumers. Inventory scores are available for finished manufacturing goods and base metals themselves. An empirical analysis for 2000-2024 shows the strong predictive power of finished goods inventory scores and some modest additional predictive power of commodity-specific inventory scores.

The below post is based on proprietary research of Macrosynergy.

A Jupyter notebook for audit and replication of the research results can be downloaded here. The notebook operation requires access to J.P. Morgan DataQuery to download data from JPMaQS, a premium service of quantamental indicators. J.P. Morgan offers free trials for institutional clients. Also, there is an academic research support program that sponsors data sets for relevant projects.

This post ties in with this site’s summary on macro trends and systematic value.

Base metal futures markets

Base metals are common and inexpensive materials mainly used in manufacturing and construction. Compared to precious metals, such as gold, they are abundant in nature and corrode more easily. The five most liquid base metal contracts that are traded in financial markets are for aluminium (ALM), which is used in construction, packaging, cars, and airplanes; copper (CPR), which plays a critical role in alloys and is essential for plumbing and electric wiring; lead (LED), which is best known for its use in batteries, construction, and ammunition; nickel (NIC), which is favoured for corrosion-resistant alloys and also batteries; and zinc (ZNC), which is also used in alloys and for galvanizing steel to avoid corrosion. Other tradable contracts, such as tin, have been excluded in this post due to lower trading volumes and a lack of relevant inventory data.

The main platforms that trade these contracts are the London Metal Exchange (LME), the CME Group, which includes COMEX and NYMEX, and the Shanghai Futures Exchange (SHFE). For the analysis in this post, we use generic futures returns of the J.P. Morgan Macrosynergy Quantamental System (JPMaQS). JPMaQS constructs a continuous front future return series assuming that positions are rolled, from front to second contract, on the first day of the month when the front contract is deliverable (view full documentation).

The generic future returns for the five base metals have been positively correlated on a daily or monthly basis. However, medium- and long-term performance has been very different across the contracts.

Two types of inventory effects

The state and dynamics of inventories in an economy plausibly affect commodity futures returns in two ways:

  • The convenience yield channel: A convenience yield measures the benefit of holding physical commodity stocks, as opposed to (future) claims via financial contracts. In the commodity space, access to physical stock increases supply security. If such security commands a high premium, futures prices trade at a discount to the value of the physical commodity but become more valuable as the delivery date draws closer. Low inventories normally indicate high marginal convenience yields. This often translates into higher risk premia, futures curve backwardation, and higher expected excess returns (view post here).
  • The inventory demand channel: Falling inventories of a commodity or related finished goods indicate, all other things equal, rising demand for stabilization or re-stocking. On the finished goods side, declining inventories or survey metrics that suggest that inventories are falling or becoming inadequate call for a pickup in production and thus lead to higher raw material demand indirectly. On the commodity side, insufficient inventories raise raw material demand directly, all other things being equal.
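For the convenience yield channel, a small numerical sketch may help. It assumes the textbook cost-of-carry relation F = S * exp((r + u - y) * T), which is not taken from this post: S is the spot price, F the futures price, r the funding rate, u the storage cost rate, y the convenience yield, and T the time to delivery in years. All numbers are made up for illustration.

```python
import math

def implied_convenience_yield(spot, futures, rate, years_to_delivery, storage_cost=0.0):
    """Annualized convenience yield implied by F = S * exp((r + u - y) * T).

    This is the standard cost-of-carry relation, used here as an assumption.
    """
    if spot <= 0 or futures <= 0 or years_to_delivery <= 0:
        raise ValueError("prices and time to delivery must be positive")
    return rate + storage_cost - math.log(futures / spot) / years_to_delivery

# Hypothetical 3-month contracts with a 5% funding rate and no storage cost.
tight_market = implied_convenience_yield(spot=100.0, futures=98.0, rate=0.05, years_to_delivery=0.25)
ample_market = implied_convenience_yield(spot=100.0, futures=101.0, rate=0.05, years_to_delivery=0.25)
print(round(tight_market, 4), round(ample_market, 4))
```

Under this relation, a futures price below spot (backwardation) implies a higher convenience yield than a futures price above spot, which is the pattern the first channel associates with low inventories.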

Ultimately, predictive power arises from rational inattention with respect to the related data (view post here). Inattention is plausible, because inventory surveys and warehouse data are a rather subtle force compared to financial market shocks and supply shocks.

Two types of quantamental inventory scores

To empirically test inventories’ predictive power with respect to returns, we need point-in-time information states of related statistics. These can be taken from the J.P. Morgan Macrosynergy Quantamental System (JPMaQS). JPMaQS provides a daily history of quantamental indicators, i.e., information states based on concurrent data vintages. A vintage is a time series associated with a specific date. Information states are designed for developing and backtesting trading strategies. JPMaQS provides two types of inventory indicators that are relevant to this post:

  • Manufacturing inventory assessment scores: These are survey scores of manufacturers’ assessments of their finished goods inventories (view documentation here). The score here refers to a survey index normalized using the historical mean and standard deviation of its concurrent data vintage. The data are sourced from national statistical offices and business groups. Conceptually, the indices reflect either the direction of inventory changes or the adequacy of inventory levels, depending on country conventions. Inventory assessments are seasonally adjusted, either at the source or by JPMaQS, on an expanding sample basis to avoid any look-ahead bias. Most series are monthly, but some are quarterly and are treated below as 3-month averages. For the below analysis, we consider score levels in three-month moving averages (3mma), changes of the past three months over the previous three months (diff 3m/3m), changes of the past six months over the previous six months (diff 6m/6m), and changes over a year ago in three-month moving averages (diff oya, 3mma).
  • Individual metal excess inventories: These are physical inventory volumes of commodity exchanges and national agencies relative to their medium-term moving average (view documentation here). Lookbacks of this average are 2 to 15 years, depending on available records. Here, we focus only on the five major base metal inventories in LME warehouses. The original frequency of the data is daily, but JPMaQS converts them to monthly frequency based on period-end data and then applies seasonal adjustment on an expanding sample basis. We consider the same four averages and changes as for the inventory assessment scores.
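The four transformations named in both bullets can be sketched on a plain monthly series as follows. This is a minimal illustration and not the JPMaQS implementation: the function names are hypothetical, and "diff 3m/3m" is read as the latest three-month average minus the average of the three months before it (analogously for six months).

```python
def rolling_mean(values, window):
    """Trailing mean over `window` observations; None until enough history exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

def lagged_diff(series, lag):
    """series[t] - series[t - lag]; None where either value is unavailable."""
    out = []
    for i, current in enumerate(series):
        previous = series[i - lag] if i >= lag else None
        out.append(None if current is None or previous is None else current - previous)
    return out

def inventory_transformations(monthly_scores):
    """Return the four transformations used in the post (assumed definitions)."""
    ma3 = rolling_mean(monthly_scores, 3)
    ma6 = rolling_mean(monthly_scores, 6)
    return {
        "3mma": ma3,                          # level, 3-month moving average
        "diff_3m3m": lagged_diff(ma3, 3),     # last 3 months over previous 3 months
        "diff_6m6m": lagged_diff(ma6, 6),     # last 6 months over previous 6 months
        "diff_oya_3mma": lagged_diff(ma3, 12) # 3mma over the 3mma a year ago
    }

# Made-up series: a score that rises by one unit each month for 18 months.
example = inventory_transformations(list(range(1, 19)))
print(example["diff_3m3m"][-1], example["diff_6m6m"][-1], example["diff_oya_3mma"][-1])
```

For a steadily rising series, each change metric simply reports the rise over its own horizon, which makes the different speeds of the four transformations easy to see.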

Two types of metal futures strategies based on inventory scores

Timing the global metals futures market

The first hypothesis is that high or rising finished goods inventories in industry negatively predict subsequent base metals futures returns. For this purpose, we calculate global weighted averages of inventory assessment scores and their derivatives using 33 developed and emerging market countries. The weights for the global aggregates are concurrent information states of shares in global industrial value added (view documentation here).
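As a rough sketch of the global aggregation, the snippet below computes a weighted average for one date. The country labels, scores, and weights are invented, and the choice to renormalize weights over countries that have data on that date is an assumption, not a documented feature of the original calculation.

```python
def global_weighted_score(scores, weights):
    """Weighted average over countries with a score and a positive weight.

    Weights are renormalized over the available countries (an assumption).
    """
    available = [c for c, s in scores.items() if s is not None and weights.get(c, 0.0) > 0.0]
    if not available:
        return None
    total_weight = sum(weights[c] for c in available)
    return sum(scores[c] * weights[c] for c in available) / total_weight

# Hypothetical information state on one date; "D" has no survey reading yet.
scores_today = {"A": 0.5, "B": -0.2, "C": 1.0, "D": None}
industry_weights = {"A": 0.2, "B": 0.2, "C": 0.4, "D": 0.2}
global_score = global_weighted_score(scores_today, industry_weights)
print(round(global_score, 3))
```

In a point-in-time setting, both the scores and the weights would be the values known on the respective date, so the same function would be applied date by date.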

We also calculate an average z-score of all four derived metrics of the global manufacturing inventory assessment score, which shall be called the composite manufacturing inventory assessment score. This is a type of conceptual parity signal. Looking at an unweighted average reflects that it is hard to judge, without hindsight, whether any of the signal candidates have better prospects as a predictor than the others.
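The idea of a conceptual parity composite can be illustrated with a short sketch. It assumes that each metric is normalized with an expanding-window mean and standard deviation, so that only past and concurrent data enter each score, and that the composite is the plain average of the available z-scores. The exact normalization used for the post may differ.

```python
import statistics

def expanding_zscores(values, min_obs=3):
    """Z-score each observation against the history available up to that date."""
    out = []
    for i in range(len(values)):
        history = values[: i + 1]
        if len(history) < min_obs:
            out.append(None)
            continue
        sd = statistics.pstdev(history)
        out.append(None if sd == 0 else (values[i] - statistics.fmean(history)) / sd)
    return out

def composite_score(metrics):
    """Equal-weight average of the z-scores of all metrics at each date."""
    zscores = [expanding_zscores(series) for series in metrics.values()]
    composite = []
    for row in zip(*zscores):
        valid = [z for z in row if z is not None]
        composite.append(sum(valid) / len(valid) if valid else None)
    return composite

# Made-up example: two metrics that mirror each other cancel out in the composite.
demo = composite_score({"level": [1.0, 2.0, 3.0, 4.0], "change": [4.0, 3.0, 2.0, 1.0]})
single = composite_score({"level": [1.0, 2.0, 3.0]})
print(demo, single)
```

Equal weights reflect the point made above: without hindsight, there is no strong basis for favouring one transformation over another.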

As expected, the relationship between the composite inventory assessment score and subsequent returns of an equally weighted base metals futures basket has been negative, and significantly so.

The negative predictive relation also holds for each individual transformation of the inventory assessment score at both a monthly and quarterly frequency and both for parametric (Pearson) and non-parametric (Kendall) correlation measures. Generally, changes in inventory assessment scores had higher predictive power than levels.
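The two correlation measures can be sketched as below. The key step is the alignment: the score at the end of one period is paired with the return of the following period. The Kendall statistic here is the simple tau-a variant without a tie correction, which may not be the exact variant used in the post, and the data are invented.

```python
import math

def pearson(x, y):
    """Pearson (parametric) correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / all pairs, ignoring ties."""
    n = len(x)
    balance = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            balance += (product > 0) - (product < 0)
    return balance / (n * (n - 1) / 2)

# Made-up monthly data on the same dates.
scores = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.2]
returns = [0.3, 2.0, 1.0, 0.5, -0.5, -3.0]

# Pair the end-of-month score with the NEXT month's return.
lagged_scores, next_returns = scores[:-1], returns[1:]
print(pearson(lagged_scores, next_returns), kendall_tau_a(lagged_scores, next_returns))
```

Because Kendall's measure only uses the ordering of observations, it is less sensitive to a few extreme return months than the Pearson coefficient.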

Also, almost all accuracy and balanced accuracy metrics, i.e., ratios of correct prediction of subsequent returns and average correct predictions of positive and negative returns, have been above 50%. The lowest accuracy, just around 50%, was found for monthly predictions based on the inventory assessment level. However, quarterly predictions based on inventory assessment levels posted near 60% accuracy and balanced accuracy ratios, suggesting that their influence has been more gradual.
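The two ratios can be computed as in the following sketch. It assumes that the signal is already signed in the direction of the predicted return (for inventory scores, that would be the negative of the score) and already aligned with the subsequent return. Names and data are illustrative only.

```python
def accuracy_metrics(signals, next_returns):
    """Return (accuracy, balanced accuracy) of sign predictions.

    Accuracy is the share of correctly predicted return signs. Balanced accuracy
    is the average of the hit rates for positive and for negative returns.
    """
    pairs = [(s > 0, r > 0) for s, r in zip(signals, next_returns) if s != 0 and r != 0]
    predictions_when_up = [pred for pred, actual in pairs if actual]
    predictions_when_down = [pred for pred, actual in pairs if not actual]
    if not predictions_when_up or not predictions_when_down:
        raise ValueError("need both positive and negative returns in the sample")
    accuracy = sum(pred == actual for pred, actual in pairs) / len(pairs)
    hit_rate_up = sum(predictions_when_up) / len(predictions_when_up)
    hit_rate_down = sum(not p for p in predictions_when_down) / len(predictions_when_down)
    return accuracy, (hit_rate_up + hit_rate_down) / 2

# Made-up example with a long-tilted signal.
acc, balanced_acc = accuracy_metrics(
    signals=[1, 1, 1, -1, 1, -1],
    next_returns=[0.5, 0.2, -0.1, -0.3, 0.4, 0.1],
)
print(acc, balanced_acc)
```

Balanced accuracy is useful here because a signal that is mostly positive can post a high plain accuracy in a rising market without distinguishing good from bad periods.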

We calculate stylized naïve profit and loss series (PnLs) of strategies that manage exposure to base metal futures by using inventory assessment score signals. Following standard conventions in Macrosynergy posts, naïve PnLs are calculated with regular rebalancing: positions for the next month are set in accordance with the score at the end of each month, allowing for a 1-day time-lapse for trading. The trading signals are limited to a maximum of 3 standard deviations as a reasonable risk limit. The naïve PnL does not consider transaction costs or compounding. For charting, the PnL has been scaled to an annualized volatility of 10%.

We consider two types of simple strategies. One just takes positions in accordance with the normalized and winsorized inventory assessment score. The other implements a long bias by adding 1 standard deviation to the normalized score. The first is a balanced pure alpha strategy, while the second is more akin to a “smart beta” strategy that manages exposure to base metals.
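The mechanics of the naïve PnL and of the two strategy types can be sketched as follows. This is a simplified monthly illustration with invented numbers: it ignores the 1-day trading lag, assumes that the long bias is added after winsorization, and scales to the volatility target using the full sample, which is only meant for charting. Function names are hypothetical.

```python
import math
import statistics

def capped_signal(signal, cap=3.0, bias=0.0):
    """Winsorize a normalized signal at +/- cap, then add an optional long bias."""
    return max(-cap, min(cap, signal)) + bias

def naive_pnl(signals, returns, cap=3.0, bias=0.0, target_vol=10.0, periods_per_year=12):
    """Monthly naive PnL: the signal at the end of month t-1 sets the position for month t.

    No transaction costs and no compounding. The series is scaled ex post to
    `target_vol` percent annualized volatility.
    """
    positions = [capped_signal(s, cap, bias) for s in signals]
    raw = [positions[t - 1] * returns[t] for t in range(1, len(returns))]
    annual_vol = statistics.stdev(raw) * math.sqrt(periods_per_year)
    return [x * target_vol / annual_vol for x in raw]

# Made-up data: high inventory scores translate into negative signals.
inventory_scores = [-1.0, 4.0, -0.5, -2.0, 1.0, 0.0]
signals = [-s for s in inventory_scores]
basket_returns = [0.0, 2.0, -1.0, 3.0, 1.0, -2.0]  # monthly returns in percent

balanced = naive_pnl(signals, basket_returns)
long_biased = naive_pnl(signals, basket_returns, bias=1.0)
print(balanced, long_biased)
```

The only difference between the two calls is the `bias` argument, which shifts every position by one standard deviation and thereby adds a structural long exposure to the metals basket.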

Using the composite inventory assessment score, both naïve PnLs outperform the long-only risk parity portfolio in the metals futures market in the long run.

The balanced pure alpha strategy recorded a 25-year Sharpe ratio of 0.45 and a Sortino ratio of 0.65, without any correlation to global equity or other risk asset markets. The long-biased strategy achieved a long-term Sharpe ratio of 0.6 and a Sortino ratio of over 0.85 at the expense of a 15-20% correlation to global equity and risk markets. For comparison, the long-term Sharpe ratio of the long-only risk parity book would have been 0.3-0.35, with around 35% correlation to equity and other risk asset markets. All base metals PnLs have been quite seasonal, which is common for single-principle signals, and have been accentuated by the large fluctuations of inventories and prices around the great financial crisis.
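For reference, the two performance ratios can be computed from a monthly PnL series as sketched below. Since futures returns are excess returns, no risk-free rate is deducted. The Sortino ratio here divides by the downside deviation around zero, which is one common convention and not necessarily the one used for the figures above. The PnL numbers are invented.

```python
import math
import statistics

def sharpe_ratio(pnl, periods_per_year=12):
    """Annualized mean PnL divided by annualized standard deviation."""
    return statistics.fmean(pnl) / statistics.stdev(pnl) * math.sqrt(periods_per_year)

def sortino_ratio(pnl, periods_per_year=12):
    """Annualized mean PnL divided by annualized downside deviation around zero."""
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in pnl) / len(pnl))
    return statistics.fmean(pnl) / downside * math.sqrt(periods_per_year)

# Made-up monthly PnL in percent.
monthly_pnl = [1.0, -1.0, 2.0, -0.5, 1.5, 0.0]
sharpe = sharpe_ratio(monthly_pnl)
sortino = sortino_ratio(monthly_pnl)
print(round(sharpe, 3), round(sortino, 3))
```

A Sortino ratio above the Sharpe ratio, as reported for the strategies above, indicates that a good part of the PnL volatility came from gains rather than losses.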

All individual derivatives of the manufacturing inventory assessment scores would also have produced positive Sharpe ratios, mostly in excess of that of the long-only position. The inventory assessment score changes have generally produced higher performance ratios than the levels. However, the value contribution of the levels has been a little less seasonal.

Contract-specific exposure

The second hypothesis is that commodity-specific excess inventory scores add to the value generation of the manufacturing inventory scores by providing contract-specific signals that do not just govern the direction and size of exposure to metals but also allocation to individual contracts.

For this purpose, we calculate normalized scores of commodity-specific excess inventories and their derivatives, i.e., a score for each of the aforementioned transformations. We do this also for each manufacturing inventory assessment score derivative. Based on these two sets, we calculate averages of the scores of commodity inventories and manufacturing inventory assessments. This results in composite inventory scores for each base metal. They are half metal-specific and half a reflection of the state of global manufacturing inventories. The below panel illustrates this combination for the 3-month average levels of both types of scores.
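The half-and-half combination amounts to a simple average per metal, as in this minimal sketch. The score values are invented, and the function name is hypothetical.

```python
def metal_composite_scores(metal_scores, global_manufacturing_score):
    """Average each metal-specific score with the common global manufacturing score."""
    return {
        metal: 0.5 * score + 0.5 * global_manufacturing_score
        for metal, score in metal_scores.items()
    }

# Hypothetical normalized scores on one date.
metal_scores = {"ALM": 0.4, "CPR": -1.2, "LED": 0.0, "NIC": 1.0, "ZNC": -0.6}
combined = metal_composite_scores(metal_scores, global_manufacturing_score=0.8)
print(combined)
```

Because the global component is the same for all five contracts, differences between metals on any date come only from the metal-specific half, and they are halved relative to the raw commodity scores.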

Finally, and analogously to the manufacturing inventory assessment scores, we calculate a composite inventory score that averages over all transformations.

Unlike in the case of (global) manufacturing inventory scores, the commodity-specific inventory scores are analysed in a panel, i.e., across separate contracts and time. Thus, target returns are now not for a metal basket but for individual metals. Also, we look at volatility-targeted returns to prevent higher-volatility metals and periods from dominating the analysis.
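The purpose of volatility targeting can be shown with a simplified sketch. It scales each monthly return by the ratio of a volatility target to the realized volatility of a trailing window that ends before the return period. The window length, the target, and the estimator are assumptions for illustration and do not reproduce the JPMaQS methodology.

```python
import math
import statistics

def vol_targeted_returns(returns, lookback=3, target_vol=10.0, periods_per_year=12):
    """Scale each return by target_vol / trailing annualized volatility.

    The trailing window uses only returns strictly before period t.
    """
    out = []
    for t in range(len(returns)):
        if t < lookback:
            out.append(None)
            continue
        window = returns[t - lookback : t]
        annual_vol = statistics.stdev(window) * math.sqrt(periods_per_year)
        out.append(None if annual_vol == 0 else returns[t] * target_vol / annual_vol)
    return out

# Made-up example: metal B is simply three times as volatile as metal A.
metal_a = [1.0, -1.0, 1.0, -1.0, 2.0]
metal_b = [3.0 * r for r in metal_a]
targeted_a = vol_targeted_returns(metal_a)
targeted_b = vol_targeted_returns(metal_b)
print(targeted_a, targeted_b)
```

After scaling, the two hypothetical metals carry the same weight in a panel, which is the reason given above for using volatility-targeted returns.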

Panel analysis shows that in line with the basic hypothesis there has been a negative relation between composite inventory scores and subsequent returns. The significance is fairly high at less than 2% probability of an accidental relation, using the Macrosynergy panel test.

Predictive month-ahead correlations of all versions (transformations) of average commodity and manufacturing goods inventory scores since 2000 are positive and significant. Also, accuracy and predictive accuracy ratios are all above 50%. However, the commodity inventory scores alone post much smaller predictive correlations and mostly low significance. An exception is short-term changes in commodity-specific inventories, which command significant predictive power.

The naïve PnLs, calculated in the same way as above and based on balanced (unbiased) signals, produced slightly higher long-term performance ratios than those for manufacturing inventory scores alone. The Sharpe ratio of the balanced strategy reached over 0.5 and the Sortino ratio 0.75, with a near-zero correlation with global equity and other risk asset classes. However, the performance ratios of the long-biased strategies were slightly below those based on manufacturing inventory scores alone. Thus, the consideration of commodity-specific inventories has produced some benefits for alpha generation but has not helped the smart beta version of the strategy.

Overall, the consideration of commodity-specific inventories has not been a game-changer in terms of Sharpe ratios. It appears that it is easier to predict general metals price changes based on inventories than to predict the outperformance of some metals over others. However, the inclusion of commodity-specific scores has reduced PnL seasonality and allowed a smoother PnL-generating process.

All inventory scores of different transformations have produced respectable performance for the unbiased alpha version of the strategy, with Sharpe ratios between 0.35 and 0.45 and Sortino ratios between 0.5 and 0.65. None of these strategies showed meaningful correlations to global equity and risk baskets.
