The power of R for trading (part 1)

R is an object-oriented programming language and work environment for statistical analysis. It is not just for programmers but for everyone who conducts data analysis, including portfolio managers and traders. Even with limited coding skills, R outclasses Excel spreadsheets and boosts information efficiency. First, like Excel, the R environment is built around data structures, albeit far more flexible ones. Operations on data are simple and efficient, particularly for import, wrangling, and complex transformations. Second, R is a functional programming language. This means that functions can take other functions as arguments, which makes code succinct and readable. Specialized "functions of functions", such as lapply, apply elaborate coding subroutines to every element of a data structure. Third, R users have access to a repository of almost 15,000 packages of functions for all sorts of operations and analyses. Finally, R supports a vast array of visualizations, which are essential in financial research for building intuition and trust in statistical findings.

Modern backtesting with integrity

Machine learning offers powerful tools for backtesting trading strategies. However, its computational power and convenience can also be corrosive for financial investment: it tends to find temporary patterns, while the data samples available for cross-validation are limited. Machine learning produces valid backtests only when applied with sound principles. These should include [1] formulating a logical economic theory up front, [2] choosing sample data up front, [3] keeping the model simple and intuitive, [4] limiting try-outs when testing ideas, [5] accepting model decay over time rather than "tweaking" specifications, and [6] remaining realistic about reliability. The most important principle of all is integrity: aiming to produce good research rather than good backtests, and communicating statistical findings honestly rather than selling them.

Financial econometrics and machine learning

Supervised machine learning enhances the econometric toolbox with methods that find the functional forms of prediction models in a manner that optimizes out-of-sample forecasting. It mainly serves prediction, whereas classical econometrics mainly estimates specific structural parameters of the economy. Machine learning emphasizes past patterns in data rather than top-down theoretical priors. The prediction function is typically found in two stages: [1] picking the "best" form conditional on a given level of complexity and [2] picking the "best" complexity based on past out-of-sample forecast performance. This method is attractive for financial forecasting, where returns depend on many complex relations, most of which are not well understood even by professionals, and where backtests of strategies should be free of the theoretical bias that arises from historical experience.
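The two-stage procedure can be illustrated with a small sketch in plain Python. It is a minimal example under stated assumptions, not the author's method: polynomial degree stands in for "complexity", ordinary least squares picks the "best" form for each degree (stage 1), and the error on a single later hold-out period stands in for "past out-of-sample forecast performance" (stage 2). The function names (`fit_poly`, `select_complexity`, etc.) and the synthetic data are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_poly(xs, ys, degree):
    """Stage 1: least-squares polynomial for a given degree (assumed complexity measure)."""
    k = degree + 1
    feats = [[x ** p for p in range(k)] for x in xs]
    xtx = [[sum(f[i] * f[j] for f in feats) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) beta = X'y


def predict(coefs, x):
    return sum(c * x ** p for p, c in enumerate(coefs))


def mse(coefs, xs, ys):
    return sum((predict(coefs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_complexity(xs, ys, max_degree, split):
    """Stage 2: pick the degree with the lowest error on the later, held-out observations.

    Observations are assumed to be in time order, so the hold-out lies after the
    training window and no future data leaks into the fit.
    """
    train_x, train_y = xs[:split], ys[:split]
    test_x, test_y = xs[split:], ys[split:]
    scores = {}
    for d in range(max_degree + 1):
        coefs = fit_poly(train_x, train_y, d)  # best form at this complexity
        scores[d] = mse(coefs, test_x, test_y)  # its out-of-sample error
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical synthetic data: a noisy nonlinear link between a signal and returns.
    rng = random.Random(0)
    xs = [rng.uniform(-1, 1) for _ in range(200)]
    ys = [0.5 * x - 0.3 * x ** 2 + rng.gauss(0, 0.2) for x in xs]
    best, scores = select_complexity(xs, ys, max_degree=5, split=150)
    print("selected degree:", best)
```

A single split keeps the sketch short. With financial time series, one would more likely repeat stage 2 over rolling or expanding windows, so that the chosen complexity reflects several out-of-sample periods rather than one.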