Terms

This blog is for educational and informational purposes only. The contents of this blog are opinions of the author and should not be interpreted as investment advice. The author takes no responsibility for the investment decisions of other individuals or institutions, and is not liable for any losses they may incur. By reading this blog, you are agreeing that you understand and agree to the terms above.

Thursday, June 30, 2011

Statistical Arbitrage: Part I

Statistical arbitrage is a type of algorithmic trading technique that relies on hedging all exposure to risk factors in order to profit from small, mean-reverting, predictable movements in security/currency/commodity prices. Put differently, statistical arbitrage is a quantitative system of trading one instrument against smaller amounts of many others, which allows the macroeconomic variables that typically affect the first instrument's price to be hedged by similar exposure, in the opposite direction, from the others.

To do statistical arbitrage, we use a multivariate regression equation, which assumes the instantaneous rate of change in an instrument's price, divided by its price, equals its alpha times the change in time, plus the sum over all i of a sensitivity 'beta sub i' times the change in factor i, plus an error term.
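One way to write this out (the post describes it verbally; this is roughly the standard continuous-time form) is:

$$ \frac{dP_t}{P_t} = \alpha\,dt + \sum_{i=1}^{n} \beta_i\, dF_t^{(i)} + dX_t $$

where $P_t$ is the instrument's price, $F_t^{(i)}$ is the value of risk factor $i$, and $X_t$ is the error term, i.e. the cointegration residual that statistical arbitrage trades on.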
 
In some cases, if the alpha is low, it may be safely ignored, and the value of the instrument then depends exclusively on the betas, the risk factors, and the cointegration residual. By hedging with 'beta sub i' units of each risk factor i's portfolio (a portfolio that can be identified through principal components analysis), taken in the opposite direction to the instrument, we get a complete portfolio whose value is determined only by the cointegration residual of the single instrument and its alpha, which may or may not be zero. (If the cointegration residual has a nonzero drift, that drift is our alpha.)
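As a rough illustration of the mechanics, here is a minimal Python sketch, assuming we already have a matrix of instrument returns; the PCA factors stand in for the risk factors, and the function name and choice of three factors are made up for illustration.

```python
import numpy as np

def hedged_residual(returns, target_idx, n_factors=3):
    """Build the cointegration residual of one instrument against PCA risk factors.

    returns    : (T, N) array of instrument returns (rows = periods, cols = instruments)
    target_idx : column index of the instrument we want to trade
    n_factors  : number of principal-component risk factors to hedge against
    """
    # Principal components of the return covariance matrix serve as the risk factors.
    demeaned = returns - returns.mean(axis=0)
    cov = np.cov(demeaned, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh sorts eigenvalues ascending
    weights = eigvecs[:, ::-1][:, :n_factors]       # (N, n_factors) factor-portfolio weights
    factor_returns = demeaned @ weights             # (T, n_factors) factor return series

    # Regress the target instrument's returns on the factor returns to get alpha and the betas.
    y = demeaned[:, target_idx]
    X = np.column_stack([np.ones(len(y)), factor_returns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, betas = coef[0], coef[1:]

    # What's left after hedging out beta_i of each factor portfolio: alpha plus the residual.
    residual_returns = y - factor_returns @ betas
    return alpha, betas, np.cumsum(residual_returns)   # cumulative sum ~ cointegration residual

# Example with random data standing in for real returns:
# rets = np.random.randn(250, 20) * 0.01
# alpha, betas, resid = hedged_residual(rets, target_idx=0)
```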

This is an extremely simple procedure. Now we have a portfolio whose value is a synthetic time series that we must model, and that is where approaches diverge. Ornstein-Uhlenbeck processes are the simplest, most robust, and most widely used model for the cointegration residual, but they may not be the best, since they make many assumptions about the time series without any data to support them. An Ornstein-Uhlenbeck process assumes that the synthetic time series is stationary, has a Gaussian distribution with a fixed mean, and reverts toward that mean at a speed that is a constant, linear function of its distance from the mean. Each of these assumptions has problems. A permanent change in the alpha makes the cointegration residual non-stationary until the new drift is separated out and grouped into alpha again; hence the need to re-run the regression regularly, with the complication that the sample is never quite large enough to perform the regression accurately. Next, if the distribution is non-Gaussian, it would be more profitable to capitalize on that fact than to theorize the issue away. Lastly, the reversion speed may be better described by something other than distance from the mean: fewer participants may be willing to risk doing stat arb in an Extremistan (and not just out-of-equilibrium) environment, while more may be willing to pile in when the residual is small, pushing it too far out of equilibrium on the opposite side. And if, to avoid this, market participants stop holding the instrument once it gets past a certain point--say a certain z-score--we would see the residual bounce back, at least occasionally, when it gets too close to equilibrium.
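For concreteness, here is one common way to calibrate an Ornstein-Uhlenbeck model to a residual series via its AR(1) representation: a sketch under exactly the assumptions criticized above, not an endorsement of them.

```python
import numpy as np

def fit_ou(x, dt=1.0):
    """Estimate Ornstein-Uhlenbeck parameters dX = kappa*(mu - X) dt + sigma dW
    from a sampled series x, via its AR(1) representation
    X[t+1] = a + b*X[t] + eps, where b = exp(-kappa*dt).
    """
    x = np.asarray(x, dtype=float)
    x_now, x_next = x[:-1], x[1:]
    A = np.column_stack([np.ones_like(x_now), x_now])
    (a, b), *_ = np.linalg.lstsq(A, x_next, rcond=None)
    resid = x_next - (a + b * x_now)

    # Only meaningful if 0 < b < 1, i.e. the series actually mean-reverts.
    kappa = -np.log(b) / dt                                      # mean-reversion speed
    mu = a / (1.0 - b)                                           # long-run mean
    sigma = resid.std(ddof=2) * np.sqrt(2.0 * kappa / (1.0 - b**2))
    return kappa, mu, sigma

# Example: kappa, mu, sigma = fit_ou(residual_series)
# The equilibrium standard deviation is sigma / sqrt(2*kappa), which gives a z-score for entries.
```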

Of course I don't know what would happen and how it would happen, but Bayesian Networks have more promise in this regard than their rivals, linear stochastic differential equations. Nonlinear Fokker-Planck Equations seem to have some potential for replacing black box predictions like those of Artificial Neural Networks or Hidden Markov Models, but the issue is that there is no great way to derive the 'correct' formula. Either way, we must make assumptions as to the structure of such an equation before we can try to calibrate it.

That all being said, and knowing I'm a fan of hidden Markov models, I would recommend them most highly for a stat arb prediction engine.

Tuesday, June 28, 2011

Hidden Markov Models: Part II

Since hidden Markov models help researchers find which sorts of observations tend to follow other sorts of observations, we can forecast behavior once we know what hidden state the security/commodity/currency/synthetic time series was most recently in. (By a synthetic time series, I mean some type of hedged position with a single value being bet on: cointegration residuals in statistical arbitrage, implied volatilities after delta hedging, or PCA-derived risk factor values, for example.)

To successfully deploy capital using an HMM-driven technique, one needs to avoid overfitting the model to the data. I struggled with this. Daily closing prices turned out to be too unpredictable to work with: the more time passed, the less the older patterns mattered, so I was forced to limit the size of the training sequence so that the hidden Markov model would only try to learn from the relevant data. Unfortunately, a training set of only 40 days is too small to work well, 50 days is already pushing the limit of relevance, and anything longer is quite outdated for a daily trading model.
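To illustrate the trailing-window idea, here is a minimal sketch assuming the open-source hmmlearn package; the window length and number of states below are placeholders, not recommendations.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumes the hmmlearn package is installed

def fit_trailing_hmm(returns, window=40, n_states=3):
    """Fit a Gaussian HMM only to the most recent `window` observations,
    so that stale patterns don't dominate the model."""
    recent = np.asarray(returns[-window:], dtype=float).reshape(-1, 1)
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=200)
    model.fit(recent)
    last_state = model.predict(recent)[-1]   # conditioning state for the next forecast
    return model, last_state
```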

The way around the problem was to use higher-frequency data, which stays relevant while still providing a wealth of information and hidden patterns. Besides that, higher-frequency data allows more predictions to be made in any given day, which limits volatility in portfolio returns. (Real-time prices are available through a professional brokerage or through subscriptions to specific Reuters or Yahoo Finance services; delayed but regularly updated prices are available on Google Finance with no subscription necessary.)

In order for a hidden Markov model--or any statistical strategy--to work, the trading technique must be used many, many times. As the number of times the strategy is used increases, the variability in the strategy's overall success decreases, and a clean statistical edge has more room to shine through. Conversely, if only a few instruments are held as a portfolio, the portfolio's return is less certain. Trading a few instruments with a prediction algorithm is like going spearfishing with a toothpick. It really is that impractical.
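The arithmetic behind this, assuming the trades are independent with a common per-trade standard deviation $\sigma$:

$$ \operatorname{SD}\!\left(\frac{1}{N}\sum_{k=1}^{N} r_k\right) = \frac{\sigma}{\sqrt{N}}, $$

so quadrupling the number of bets halves the noise around the strategy's average return.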

Also, if you know how to do something with a synthetic time series, do it. There tends to be much less variability in outcomes when unwanted risk factors are hedged, and thus much less uncertainty regarding the hidden Markov model's predictive ability.

Sunday, June 5, 2011

Hidden Markov Models

Hidden Markov models have proven successful for speech recognition, and that success carries over to the prediction of financial time series. According to Patterson's The Quants and Mallaby's More Money Than God, Renaissance Technologies owes a great deal of its success to hidden Markov models. Research by academics in this paper and this other paper further validated the financial utility of hidden Markov models, and papers such as this one demonstrated their superiority over GARCH(1, 1) models for accurate volatility modeling.

The major issue with using hidden Markov models to predict financial time series is that we are trying to forecast the inherently chaotic. Put differently, forcing HMMs to learn from raw financial data is not always the best idea, because it forces them to try to predict the outcome of Brownian motion. On the other hand, that is what information theory is supposed to be about: detecting and predicting signals passed through a noisy channel. So while HMMs can certainly still be used on raw financial data, it's a bit much to ask. The one glaring exception is high-frequency data, which contains more observations and hence is more likely to contain some pattern that daily or longer-term data does not reveal. If dealing with daily or longer-term data, it's much easier to do something that eliminates market noise and yields a statistically calmer, pattern-containing time series. Some such applications for hidden Markov models include statistical arbitrage, volatility arbitrage, correlation forecasting, and volume prediction.

Of course, HMMs can also be used even less directly; for instance, by doing information extraction--getting pure information from humans' news articles, such as those on Reuters.com. (My next post will discuss this briefly, and the one after that will talk about other information extraction methods.)

But the most fruitful, direct application of HMMs is in high-frequency trading. Because they inherently sort returns into groups (the underlying 'states,' each with its own probability distribution over observed returns), hidden Markov models can separate out statistically different price movements the same way a two-state model can distinguish between vowels and consonants. Put differently, the way the underlying states fit together means that even if observed returns are uncorrelated with each other across time, they may be related in a more subtle way: one certain type of return may be followed by another certain type of return more often than by returns not belonging to that type. I'll leave the rest up to the reader's imagination and programming skills.
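Here is a sketch of what "one type of return tends to follow another" looks like in code, again assuming hmmlearn; the forecast is simply the probability-weighted average of the state means, which is one simple choice among many.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed package, as in the earlier sketch

def expected_next_return(model: GaussianHMM, observed_returns):
    """Forecast the next return as the probability-weighted mean over hidden states.

    The HMM groups returns into states; the transition matrix says which state is
    likely to follow the current one, and each state has its own mean return.
    """
    X = np.asarray(observed_returns, dtype=float).reshape(-1, 1)
    current_state = model.predict(X)[-1]                # most likely state right now
    next_state_probs = model.transmat_[current_state]   # row of the transition matrix
    state_means = model.means_.ravel()                  # mean return in each state
    return float(next_state_probs @ state_means)

# Usage (after fitting, e.g. with the trailing-window sketch above):
# forecast = expected_next_return(model, recent_returns)
```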

Wednesday, June 1, 2011

A New Kind of Global Macro

This is going to be kind of esoteric because I'm holding my cards close to the vest.

Imagine:
Since the entire basis of global macro is risk management, what if we could put portfolios together one risk factor at a time? Wouldn't that be interesting? Risk exposure is the beginning and the end of all global macro strategy, so why wait until the end to integrate it? Wouldn't it be much better to bet on the risk factors right off the bat? Why focus on particular commodities, currencies, and corporations' securities (or their derivatives)? That was never the point of global macro. We care about capitalizing on macroeconomic changes anyway. Why get exposure to those changes through a few instruments when you could go right to the source and just buy the factor itself?

It's simpler. It's more efficient. It's more potent. It's better diversified.

So the answer is to construct portfolios consisting of nothing but a few uncorrelated risk factors, selected in whatever relative proportions are desired. The method for predicting the risk factors is not the issue here; that is for another time. What we're concerned with is portfolio allocation once the decisions have been made.

And best of all? Portfolio optimization is easy. The risk factors are approximated by baskets of many instruments, and those instruments each have a covariance with one another, but that has already been addressed by principal components analysis. We can simply treat each factor as an instrument to be bought. (This is reasonable, because it is like buying a stock, which is itself a basket of risk factors, except these have non-zero sensitivity to only one risk factor.) Once we have bought these baby, de-facto 'stocks,' we realize that they have no correlation with one another at all. Thus the cross terms inside the first sigma of the Markowitz mean-variance model are eliminated, and the second one is easily maximized by investing the most money where the highest return is expected. This is similar to setting our risk-aversion parameter to zero (though we have done nothing of the kind), since the problem is now trivial.
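For reference, one standard way to write the mean-variance objective (generic Markowitz notation, nothing proprietary) is

$$ \max_{w}\; \sum_i w_i \mu_i \;-\; \lambda \sum_i \sum_j w_i w_j \sigma_{ij}, $$

and when the risk factors are uncorrelated, every cross term $\sigma_{ij}$ with $i \neq j$ vanishes, leaving only the diagonal variance terms $w_i^2 \sigma_i^2$ alongside the expected-return term.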

This may seem dangerous, because the allocation weights are not constrained, but a rule can be added separately, allowing a maximum position size in any given risk factor, as well as a maximum position variance and/or maximum portfolio variance. On top of that, we can use VaR systems and stress testing like any global macro fund would.

The exact method for predicting factors varies considerably. There are several approaches that could work, including the John Paulson approach, the Soros approach, and the Robert Frey approach.