R tidyverse for macro trading research

The tidyverse is a collection of packages that facilitate data science with R. It is particularly powerful for macro trading research because it [a] supports efficient and standardized work with R’s vast universe of econometric models, [b] is well adapted for analyzing data vintages (i.e. historic data series that are revised over time), and [c] supports code in the form of visually clean chains of statistical operations. The tidyverse’s core and peripheral packages share common design principles that harmonize workflow for crucial tasks: [1] organizing data structures, [2] transforming the content of data structures, [3] functional programming with complex nested data sets, [4] extraction of statistical information across models in a standardized form, [5] coding and mathematics with date-time objects, [6] coding with strings and regular expressions, [7] a flexible machine learning workflow, [8] highly versatile and consistent graphics creation, and [9] connectors to financial analysis packages.

The post below is based on personal experience and ties in with this site’s summary on quantitative methods for macro information efficiency.

Basics

The tidyverse is a collection of R packages for data science that share design principles, coding grammar and standard data structures. Hence, they work together comfortably and become incrementally easier to learn and to put into practice. Tidyverse functions can typically be chained through the pipe operator (`%>%`, from the magrittr package), which allows displaying code in logical sequence, making it easier to audit the logic of statistical operations.
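
As a minimal sketch of such a chain, the snippet below pipes a small data table through a sequence of dplyr verbs. The table `cpi`, its column names, and all values are hypothetical and serve only to show how the code reads from top to bottom.

```r
library(dplyr)

# Hypothetical monthly inflation readings for two currency areas (made-up numbers)
cpi <- tibble(
  area = c("USD", "USD", "USD", "EUR", "EUR", "EUR"),
  month = rep(as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")), 2),
  cpi_yoy = c(2.5, 2.3, 1.5, 1.4, 1.2, 0.7)
)

# Read top to bottom: group by area, sort by month, take first differences, summarise
cpi_summary <- cpi %>%
  group_by(area) %>%
  arrange(month, .by_group = TRUE) %>%
  mutate(change = cpi_yoy - lag(cpi_yoy)) %>%
  summarise(
    mean_yoy = mean(cpi_yoy),
    mean_change = mean(change, na.rm = TRUE)
  )

print(cpi_summary)
```

Each line of the chain is one statistical operation, so a reviewer can check the logic step by step without unpicking nested function calls.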

The tidyverse is particularly suitable for macro trading research for three reasons:

  • First, it allows working efficiently with R’s vast and diverse universe of econometric models, which include many specialized packages for macroeconomic and financial market analysis.
  • Second, the tidyverse works particularly well with “vintages” of data, i.e. historic data series that change over time as a consequence of revisions or evolving standards and conventions. Revisions and vintages are common in macroeconomic data and must be considered for validating trading rules that rely on such data. The tidyverse organizes vintages comfortably in “list columns”, i.e. data frame columns that contain data structures rather than simple values.
  • Third, tidyverse functions allow presenting subsequent operations on a data structure as a visually clean chain of function calls, making it easier to understand and audit statistical operations with common sense; this is of particular importance for macro-financial modelling, where errors in logic and theoretical consistency are endemic and cannot be compensated by vast amounts of data.
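
The list-column idea from the second point can be sketched as follows. This is only an illustration under stated assumptions: the table `vintages_long`, its column names, and all numbers are hypothetical, and each release date is taken to republish the full history known on that day.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical GDP growth vintages (made-up numbers): later releases revise earlier quarters
vintages_long <- tibble(
  release_date = as.Date(c("2020-04-30", "2020-04-30",
                           "2020-07-30", "2020-07-30", "2020-07-30")),
  quarter = c("2019Q4", "2020Q1",
              "2019Q4", "2020Q1", "2020Q2"),
  gdp_growth = c(2.0, -1.0, 2.2, -1.5, -8.0)
)

# One row per vintage: the series as known on each release date sits in a list column
vintages <- vintages_long %>%
  nest(series = c(quarter, gdp_growth))

# Operate on each vintage separately, e.g. count observations and pick the latest value
vintages <- vintages %>%
  mutate(
    n_obs = map_int(series, nrow),
    latest_value = map_dbl(series, ~ last(.x$gdp_growth))
  )

print(vintages)
```

Keeping one row per release date makes it harder to mix information from different vintages by accident when a trading rule is validated.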

Key functionality that helps macro-financial researchers stay at the top of the strategy development game comes from the following packages:

  • The tidyr package specializes in organizing and reshaping datasets in tidy formats, accommodating nested formats with list columns, time series, and formats that are suitable for larger databases.
  • The dplyr package offers broad and consistent functionality for transforming data tables, including groupings, summaries, subset extractions, row-wise and column-wise calculations and variable creation, and mutating joins.
  • The purrr package greatly facilitates functional programming on simple and complex (“nested”) datasets, such as data tables with list columns.
  • The broom package helps to extract statistical information across a broad variety of models in a standardized form.
  • The lubridate package supports coding with date and date-time objects, particularly mathematical operations with periods, durations and time intervals.
  • The stringr package facilitates coding with strings and regular expressions by the sheer consistency of its functions.
  • The tidymodels set of packages is the tidyverse’s machine learning toolkit, offering a coherent suite of modules for data sampling, data pre-processing for estimation, standardized model training, hyperparameter tuning, model validation and bundling processes into a single workflow.
  • The ggplot2 package is a vast and highly popular system of graphic functions based on Wilkinson’s grammar of graphics.
  • The tidyquant suite of packages connects the tidyverse with specialized time series and financial analysis packages.
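
To give a flavor of how tidyr, dplyr, purrr and broom combine, here is a minimal sketch that fits one regression per market and collects the coefficients in a single table. The data are simulated, and the names `sim`, `market`, `signal` and `ret` are hypothetical; the point is the nest-map-tidy pattern, not the model itself.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)

# Simulated data: returns respond to a macro signal with a market-specific slope
sim <- tibble(
  market = rep(c("A", "B"), each = 50),
  signal = rnorm(100)
) %>%
  mutate(ret = if_else(market == "A", 0.5, -0.2) * signal + rnorm(100, sd = 0.1))

# Nest the data by market, fit one linear model per market, tidy each fit
fits <- sim %>%
  nest(data = c(signal, ret)) %>%
  mutate(
    model = map(data, ~ lm(ret ~ signal, data = .x)),
    coefs = map(model, tidy)
  )

# Unnest into one standardized coefficient table across all models
coef_table <- fits %>%
  select(market, coefs) %>%
  unnest(coefs)

print(coef_table)
```

Because `tidy()` returns the same columns for many model classes, swapping `lm()` for another estimator it supports should leave the rest of the chain largely unchanged.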

Good tutorials for R tidyverse include the online book “R for data science” by Hadley Wickham and Garrett Grolemund and Datacamp’s R courses, many of which are heavily geared towards the tidyverse (start with their tidyverse fundamentals course).

Illustrations of the benefits

The purpose of the below Jupyter notebook is to illustrate some particular strengths of the above tidyverse packages for macroeconomic and financial markets research. The notebook does not include examples for ggplot2 (which is too vast a subject for a short post) and the tidyquant packages (which link to other R “ecosystems”).
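
Independently of the notebook, a brief sketch can give a first impression of the date and string handling mentioned above. The ticker-style series names and the observation date are hypothetical and chosen only for illustration.

```r
library(lubridate)
library(stringr)

# Hypothetical series names of the form <currency>_<indicator>_<transformation>
tickers <- c("USD_CPI_YOY", "EUR_CPI_YOY", "USD_GDP_QOQ")

# stringr: the string is always the first argument, the pattern the second
currency <- str_extract(tickers, "^[A-Z]{3}")
is_cpi <- str_detect(tickers, "_CPI_")
indicator <- str_replace(tickers, "^[A-Z]{3}_", "")

# lubridate: parse a date and do calendar-aware arithmetic
obs_date <- ymd("2020-01-31")
next_month_end <- obs_date %m+% months(1)        # rolls back to the last valid day of the month
period_start <- floor_date(obs_date, "quarter")  # first day of the quarter
days_elapsed <- as.numeric(obs_date - period_start)

print(data.frame(tickers, currency, is_cpi, indicator))
print(c(next_month_end, period_start))
```

The `%m+%` operator is used instead of a plain `+` because adding a month to 31 January would otherwise point to a non-existent date.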
