Terms

This blog is for educational and informational purposes only. The contents of this blog are opinions of the author and should not be interpreted as investment advice. The author takes no responsibility for the investment decisions of other individuals or institutions, and is not liable for any losses they may incur. By reading this blog, you are agreeing that you understand and agree to the terms above.

Tuesday, May 31, 2011

The Data-Driven Manifesto

First and foremost, timing is everything. And not just in humor.

Correct timing is the sole requirement for profitability. Whether it be signals generated by a statistical model, or an important event in the news, all information is incorporated very quickly into market prices. Thus, the only thing that matters is timing each trade just right, whether that means capturing imbalances in supply and demand at the microstructure level, parsing news articles with information extraction algorithms, or betting on a trend reversal after seeing a familiar signal. There is no such thing as a temporally-irrelevant investment. Everything depends on time.

If data is old, one cannot use it to successfully invest. No fundamental data or technical data or obvious statistical data is necessary at all, because it is public knowledge that has already been incorporated into the price of the security, commodity, currency or derivative in question. Betting on a known fact is useless, because the market price already reflects that fact, and will not change due to the fact. How could it? The information has already been unveiled. It is not going to unveil itself again--it cannot become common knowledge twice!

However, if data suggests a statistical anomaly in the presumably efficient market, then it may be proper to act on it, assuming that others are not aware of the anomaly. Widely observed statistical phenomena such as the value, size, and momentum factors are not good foundations for trading strategies because the phenomena are already widely observed and therefore more risky; indeed, the more market participants using a strategy, the more potential that strategy has to underperform the overall market index.

The whole reason investment strategies are made market-neutral is that their creators did not want to worry about predicting where investors, as a flock, would go next. Unfortunately, any strategy with sufficient popularity suffers from the same problem: people invest in the strategy, and then they withdraw capital, creating volatility in its returns. This is true for everything from statistical arbitrage to the failure of quantitative equity selection models based on value and size. The only reason the strategies worked well was that few people used them; their returns came from market corrections rather than from other participants trading the same strategy. For instance, when stat arb was still new, success came in the form of lower returns for an outperforming stock. It wasn't that investors necessarily used stat arb and recognized that the value of the stock was too high, but rather that the strategies' successes were phenomena unrelated to the direct investment decisions of any individual or institution. But once they became popular, the returns of such strategies were directly impacted by a greater portion of the market, which now knew of and traded the strategy directly. At that point, the strategies were inherently subject to the same fickleness that market indices have always been subject to.

Rather intuitively, the more participants using a strategy, the lower the returns associated with it, and the lower the Sharpe ratio. The standard deviation of an overused strategy's returns is also much higher, again for intuitive reasons: the number of market participants is positively correlated with the number of multi-strategy investors; as the number of strategies increases, the probability that at least one strategy fails increases as well--and eventually one of the many strategies is bound to fail; and as the number of participants increases, the number of investors exposed most heavily to the failing strategy increases as well. As the number of investors exposed most heavily to the failing strategy approaches infinity, the probability of one of these investors being heavily leveraged and consequently receiving a large margin call (as a result of the failing strategy) converges to unity. From that point, the participant with the margin call liquidates an arbitrary number of strategies' portfolios. If this number includes, say, a value/momentum book, then the value/momentum strategy will suffer from the unwinding of the participant's portfolio.
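To make the convergence claim concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each strategy (or each leveraged investor) fails independently with the same probability p. Neither the independence assumption nor the function name prob_at_least_one_failure comes from the argument above, and real-world failures are likely to be correlated, so treat this as a toy calculation rather than a risk model.

```python
def prob_at_least_one_failure(p: float, n: int) -> float:
    """Probability that at least one of n trials fails.

    Assumes (hypothetically) that trials are independent and that each
    one fails with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # P(no failure at all) = (1 - p)^n, so P(at least one) is the complement.
    return 1.0 - (1.0 - p) ** n


# The probability climbs toward 1 as n grows, even for a small p.
for n in (1, 10, 100, 1000):
    print(n, prob_at_least_one_failure(0.01, n))
```

Under these assumptions, even a 1% chance of failure per strategy becomes a near-certainty of at least one failure once enough strategies (or investors) are in play.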

The problem, then, is not market crashes or strategy failure, ipso facto, but rather a spillover effect on other portfolios using the same, over-popular strategy. The solution is obvious: use strategies that other participants have (literally) never even heard of. The mere knowledge of a possible strategy may encourage participants to covertly experiment with it, and perhaps put it into practice without declaring it. Luckily, as more participants use the strategy, the Sharpe ratio of the strategy declines, and vigilant observation will allow the original users of the strategy to leave quietly before it fails in dramatic fashion.
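One way to practice that kind of vigilant observation is to track a rolling Sharpe ratio of the strategy's own returns. The sketch below is only an illustration: the function names, the window length, the zero risk-free rate, and the lack of annualization are all assumptions of mine, not part of any particular strategy.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation.

    No annualization is applied; risk_free is a per-period rate
    (assumed to be zero by default).
    """
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)  # needs at least two observations
    if sd == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / sd


def rolling_sharpe(returns, window):
    """Sharpe ratio over each consecutive window of `window` periods."""
    return [
        sharpe_ratio(returns[i - window:i])
        for i in range(window, len(returns) + 1)
    ]
```

A sustained decline in the rolling series would be the cue to start looking for the exit; how much of a decline counts as "sustained" is a judgment call this sketch does not try to settle.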

* * * * *

The problem with theory-driven strategies is that they usually reject temporal trading rules. For instance, CAPM, EMH, APT and MPT in general fail to account for the possibility of different expected returns across time. They cannot adjust to changing market conditions either, and their models often make too many restrictive assumptions.

The fullest implication of "data-driven" strategies is that their associated models are not merely created ahead of time and parametrically synchronized with data, but rather that the data itself determines the model's structure, and not just its parameters.

For instance, the number of hidden layers in an artificial neural network, the number of iterations of a genetic algorithm for portfolio selection, and the number of states in a hidden Markov model are all input by a human programmer. And yet, the decisions made by humans are the ones that keep the model from becoming fully formed. Thus, these decisions need to be supported by the data.
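As a small illustration of letting the data pick a structural setting, the sketch below chooses the order of a Markov chain over an up/down (1/0) sequence by held-out log-likelihood. This is a deliberately simpler stand-in for choosing the number of states in a hidden Markov model, not the same thing, and every name in it (fit_counts, held_out_log_likelihood, select_order) is hypothetical.

```python
import math
from collections import defaultdict


def fit_counts(seq, order):
    """Count how often each context of length `order` is followed by 0 or 1."""
    counts = defaultdict(lambda: [0, 0])
    for i in range(order, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1
    return counts


def held_out_log_likelihood(counts, seq, order, start):
    """Log-likelihood of seq[start:] under the fitted chain.

    Add-one smoothing keeps unseen contexts from producing log(0).
    """
    total = 0.0
    for i in range(start, len(seq)):
        zeros, ones = counts.get(tuple(seq[i - order:i]), (0, 0))
        hits = ones if seq[i] == 1 else zeros
        total += math.log((hits + 1) / (zeros + ones + 2))
    return total


def select_order(train, valid, max_order):
    """Return the order in 0..max_order with the best held-out likelihood.

    Every order is scored on the same positions (from max_order onward)
    so the comparison is fair; ties go to the smaller order.
    """
    best_order, best_ll = 0, -math.inf
    for order in range(max_order + 1):
        counts = fit_counts(train, order)
        ll = held_out_log_likelihood(counts, valid, order, start=max_order)
        if ll > best_ll:
            best_order, best_ll = order, ll
    return best_order
```

The point of scoring on held-out data rather than on the training sequence is that a larger structure always fits the training data at least as well, so training fit alone cannot tell the human when to stop.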

The optimization method for the most accurate model (as measured by the probability of the model producing the training sequence) should be the one that leads to global optimality. Hill-climbing algorithms can only guarantee local optimality and are therefore less desirable than algorithms that search for global maxima. This is intuitive, since the more accurate the model is, the better it represents the truth behind how the market moves and works.
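The difference between local and global optimality is easy to see on a toy problem. The sketch below runs a plain hill climber on a made-up one-dimensional objective with two peaks, then repeats it from several starting points and keeps the best result. The objective, the step size, and the function names are all assumptions for illustration, and multi-start hill climbing only improves the odds of finding the global maximum; it does not guarantee it.

```python
import math


def two_peaks(x):
    """Toy objective: a local maximum near x=2 (height about 1) and a
    global maximum near x=8 (height about 2)."""
    return math.exp(-(x - 2) ** 2) + 2 * math.exp(-(x - 8) ** 2)


def hill_climb(f, x0, step=0.1, max_iter=10_000):
    """Move to the better neighbor until neither neighbor improves f."""
    x = x0
    for _ in range(max_iter):
        best = max((x - step, x + step), key=f)
        if f(best) <= f(x):
            break  # no uphill move left: a local maximum at this step size
        x = best
    return x


def multi_start(f, starts, step=0.1):
    """Run hill_climb from every start and keep the highest end point."""
    return max((hill_climb(f, s, step) for s in starts), key=f)


x_local = hill_climb(two_peaks, 1.0)               # gets stuck on the lower peak
x_best = multi_start(two_peaks, [0, 2, 4, 6, 8, 10])
print(x_local, two_peaks(x_local))
print(x_best, two_peaks(x_best))
```

Started at x = 1, the climber walks uphill to the nearer, lower peak and stops there; it has no way of knowing that a higher peak exists elsewhere.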

Once the globally optimal model's structure is perfected through historical profitability tests, the model is ready to use, and no more human intervention is necessary. However, as a scientific experiment, it would be interesting to see what kind of unexpected connections, classifications, and procedures the models could come up with. The result will be complex, and likely counterintuitive.

"Black box" has become finance lingo for any algorithmic trading strategy without a simple, logical backing. The models' structures and parameters--or anything too complex or too counterintuitive for a human to understand--are labeled as "black-box" as if that were a bad thing. The fact that the strategy is obscure helps to avoid the crowding effect that leads to the downfall of every hyped investment strategy, from LTCM's fixed income arbitrage disaster to PDT's temporary Stat Arb troubles in August and November 2007. The more black-box it is, the less others are likely to catch on, and the better the strategy's performance will be. Put differently, assuming that it is sound, it won't be subject to failure on account of sheer popularity. From a bigger perspective, all data-driven strategies work well--assuming that they are not too well known, in which case their success is nothing but a house of cards that has been lucky not to suffer from a gust of wind--because they exploit market phenomena that do exist, rather than ones that ought to exist. Indeed, data-driven strategies are valid because they have been validated by the market.
