<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6505913835072324494</id><updated>2011-12-28T22:04:13.201-08:00</updated><category term='Overfitting'/><category term='Information Extraction'/><category term='Performance'/><category term='Statistical Arbitrage'/><category term='George Soros'/><category term='John Paulson'/><category term='Machine Learning'/><category term='Wavelets'/><category term='Leverage'/><category term='Volatility Modeling'/><category term='Quantitative Finance'/><category term='Kalman Filter'/><category term='Market Neutral'/><category term='Fuzzy Logic'/><category term='High Frequency Trading'/><category term='Sharpe Ratios'/><category term='Componentless'/><category term='Fokker-Planck Equations'/><category term='Time Series'/><category term='Multistrategy Funds'/><category term='Beta'/><category term='Algotrading'/><category term='Ex Post Facto Alpha'/><category term='Bayesian Networks'/><category term='Data Feeds'/><category term='Artificial Neural Networks'/><category term='Size'/><category term='News'/><category term='Regime Switching Models'/><category term='Robert Frey'/><category term='Earnings'/><category term='Hidden Markov Models'/><category term='Theory-Driven'/><category term='Event Arbitrage'/><category term='Pattern Recognition'/><category term='Value'/><category term='HFT'/><category term='Funds of Funds'/><category term='Momentum'/><category term='Data-Driven'/><category term='VWAP'/><category term='Moving Average'/><category term='Principal Components Analysis'/><category term='Diversification'/><category term='Transaction Cost Modeling'/><category term='Genetic Algorithms'/><category 
term='Portfolio Management'/><category term='Ornstein-Uhlenbeck'/><category term='Alpha'/><category term='Value at Risk'/><category term='Risk Factors'/><category term='TWAP'/><category term='Algorithmic Trading'/><category term='Global Macro'/><category term='Optimization'/><category term='Mergers and Acquisitions'/><title type='text'>The Grahamian</title><subtitle type='html'>Data-driven investment analysis</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://thegrahamian.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://thegrahamian.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>The Grahamian</name><uri>http://www.blogger.com/profile/05043520457306733564</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://3.bp.blogspot.com/_z1O9dBxRmzg/TUO6sbwMuBI/AAAAAAAAABs/43yK_u16QLk/s220/BreakfastRush.bmp'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>9</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6505913835072324494.post-1696489750496647879</id><published>2011-12-28T22:03:00.000-08:00</published><updated>2011-12-28T22:04:13.218-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Multistrategy Funds'/><category scheme='http://www.blogger.com/atom/ns#' term='Portfolio Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Fuzzy Logic'/><category scheme='http://www.blogger.com/atom/ns#' term='Sharpe Ratios'/><category scheme='http://www.blogger.com/atom/ns#' term='Funds of Funds'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Optimization'/><title type='text'>Multistrategy and Multisignal Portfolio Construction</title><content type='html'>The best portfolios are, no doubt, ones that earn the highest returns with the least volatility. After ascertaining that you possess several informative signals for the same group of securities, you may be wondering how to capitalize on these signals as a whole. If the signals are high-quality enough to predict specific returns, rather than just being the simple "buy/sell" type, then you can take one of two possible approaches:&lt;br /&gt;&lt;br /&gt;1) If you are deciding how to allocate capital to different strategies, run 'backtests' to determine the historical average return and the standard deviation of each strategy, as well as the historical correlation between such strategies, and then use mean-variance portfolio optimization to find the Sharpe-optimal 'weighting' of each strategy in a multistrategy fund. While this is great for finished strategies that utilize completely different structures (for instance, stat arb is market neutral, global macro is directional), it unfortunately does not solve the heart of the problem: if there are no strategies, but only signals, how do we construct a multisignal portfolio at the securities level?&lt;br /&gt;&lt;br /&gt;2) Use historical price data to generate signals, and regress the returns that followed on all the signals from each past day (a multivariate linear regression) to determine how sensitive the output, the return, is to each input signal. Alternatively, we can train an artificial neural network to do something similar, and pray it doesn't overfit the data. Lastly, we can set up a machine learning system that uses fuzzy logic to 'reason' its way through investment decisions and portfolio construction, analyzing historical patterns of how certain signals contradict the others when something bad is about to happen. 
In such an event, the system can work out a more informed expectation of the returns, which can then be fed into a portfolio optimizer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6505913835072324494-1696489750496647879?l=thegrahamian.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thegrahamian.blogspot.com/feeds/1696489750496647879/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://thegrahamian.blogspot.com/2011/12/multistrategy-and-multisignal-portfolio.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/1696489750496647879'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/1696489750496647879'/><link rel='alternate' type='text/html' href='http://thegrahamian.blogspot.com/2011/12/multistrategy-and-multisignal-portfolio.html' title='Multistrategy and Multisignal Portfolio Construction'/><author><name>The Grahamian</name><uri>http://www.blogger.com/profile/05043520457306733564</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://3.bp.blogspot.com/_z1O9dBxRmzg/TUO6sbwMuBI/AAAAAAAAABs/43yK_u16QLk/s220/BreakfastRush.bmp'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6505913835072324494.post-6306689569915694846</id><published>2011-09-04T02:41:00.000-07:00</published><updated>2011-12-28T21:06:48.809-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Alpha'/><category scheme='http://www.blogger.com/atom/ns#' term='Moving Average'/><category scheme='http://www.blogger.com/atom/ns#' term='Kalman Filter'/><category scheme='http://www.blogger.com/atom/ns#' term='Ex Post Facto Alpha'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Beta'/><category scheme='http://www.blogger.com/atom/ns#' term='Wavelets'/><category scheme='http://www.blogger.com/atom/ns#' term='Time Series'/><category scheme='http://www.blogger.com/atom/ns#' term='Performance'/><category scheme='http://www.blogger.com/atom/ns#' term='Hidden Markov Models'/><title type='text'>Alpha Prediction</title><content type='html'>Though the topic is very similar to statistical arbitrage, I thought I'd address the issue of alpha prediction instead. If we do an OLS linear regression on a time series of equity returns, with the independent variable being the difference between the returns of a benchmark equity index and the risk-free rate, the equity's return can be described in terms of the index return by&lt;br /&gt;&lt;br /&gt;R_k = alpha_k + beta_k * (R_m - R_f),&lt;br /&gt;&lt;br /&gt;where alpha_k and beta_k are parameters from the regression and R_k is the return of equity k, R_m is the return of the market index that equity k is in, and R_f is the risk-free rate. &lt;br /&gt;&lt;br /&gt;Then, for every dollar of equity k we buy, we can short sell beta_k dollars of the equity index to hedge our exposure to the term R_m. Doing so makes us beta-neutral, though of course such a procedure is not very effective in portfolio management when only implemented on a few equities. In practice, we would need to do this on many equities at once before we could get a clean statistical edge.&lt;br /&gt;&lt;br /&gt;Applying the ex post facto beta from the regression to the past values of equity k's returns, we can infer its ex post facto alpha at each point in time we measure. Of course, we must not go back too far in time or our measure of beta will be a useless statistic that will only mislead us. 
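&lt;br /&gt;&lt;br /&gt;As a rough sketch of the regression above (not a definitive implementation), the Python below fits alpha_k and beta_k by OLS and then backs out an ex post facto alpha for each day. The synthetic return series, the 60-day sample, and the helper names fit_alpha_beta and ex_post_alpha are all assumptions made up for illustration; real returns would come from your own data.&lt;br /&gt;
&lt;pre&gt;

```python
import numpy as np

def fit_alpha_beta(r_k, r_m, r_f):
    """OLS fit of R_k = alpha_k + beta_k * (R_m - R_f); returns (alpha_k, beta_k)."""
    excess = np.asarray(r_m) - r_f                        # regressor: market excess return
    X = np.column_stack([np.ones_like(excess), excess])  # intercept column plus regressor
    coef, *_ = np.linalg.lstsq(X, np.asarray(r_k), rcond=None)
    return float(coef[0]), float(coef[1])

def ex_post_alpha(r_k, r_m, r_f, beta_k):
    """Per-period ex post facto alpha: the return left after removing the beta term."""
    return np.asarray(r_k) - beta_k * (np.asarray(r_m) - r_f)

# Synthetic data (an assumption): 60 days, true alpha 0.001, true beta 1.2.
rng = np.random.default_rng(0)
r_f = 0.0001
r_m = rng.normal(0.0005, 0.01, size=60)
r_k = 0.001 + 1.2 * (r_m - r_f) + rng.normal(0.0, 0.001, size=60)

alpha_k, beta_k = fit_alpha_beta(r_k, r_m, r_f)
alphas = ex_post_alpha(r_k, r_m, r_f, beta_k)  # one alpha per day, in return units
print(alpha_k, beta_k, alphas.mean())
```

&lt;/pre&gt;
Because the regression includes an intercept, the average of the daily ex post alphas equals the fitted alpha_k; the day-to-day series is what a smoother or predictive model would then work on.&lt;br /&gt;&lt;br /&gt;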
If we want to see a very long series of alphas, it is best to re-estimate them by rerunning the regression over a trailing window ending at each point where we are trying to find the alpha.&lt;br /&gt;&lt;br /&gt;Moving on. Once we have the time series of alphas--expressed as percentage returns, since those are the units our regressions were in--we can smooth them with a Kalman Filter, a Haar Wavelet Transform, or even just a moving average. Then we can run some type of predictive model on them. (And of course, I suggest using a hidden Markov model for this. A Hidden Markov Gaussian Mixture, where the observations from each state are drawn from a different Gaussian distribution, works well and helps avoid overfitting.)&lt;br /&gt;&lt;br /&gt;Once our model is calibrated with (or has learned from) the data, we forecast the alpha over the next time period (and this depends on what time period we used to measure the data; it could've been seconds, minutes, days, or weeks).&lt;br /&gt;&lt;br /&gt;We save the equity's alpha forecast, then repeat the process for every other equity: find its time series of alphas, build a predictive model for that series, and forecast. Once we have an arbitrarily high number of predictions, we can buy the stocks with positive expected alpha, short the ones with negative expected alpha, and hedge the remaining portfolio beta with the corresponding amount of market index.&lt;br /&gt;&lt;br /&gt;This procedure is helpful for identifying stocks that have outperformed or underperformed. Since the strategy is market neutral, it works (in principle at least!) regardless of where the market is going.&lt;br /&gt;&lt;br /&gt;Such a strategy could be used at a mutual fund to earn decent returns. 
However, I imagine it is of little interest to investment banks or hedge funds, since it is essentially a mixture of one-factor statistical arbitrage and trend following.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6505913835072324494-6306689569915694846?l=thegrahamian.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thegrahamian.blogspot.com/feeds/6306689569915694846/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://thegrahamian.blogspot.com/2011/09/alpha-prediction.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/6306689569915694846'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/6306689569915694846'/><link rel='alternate' type='text/html' href='http://thegrahamian.blogspot.com/2011/09/alpha-prediction.html' title='Alpha Prediction'/><author><name>The Grahamian</name><uri>http://www.blogger.com/profile/05043520457306733564</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://3.bp.blogspot.com/_z1O9dBxRmzg/TUO6sbwMuBI/AAAAAAAAABs/43yK_u16QLk/s220/BreakfastRush.bmp'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6505913835072324494.post-7522720273365554700</id><published>2011-06-30T17:10:00.000-07:00</published><updated>2011-07-02T21:15:22.117-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Algotrading'/><category scheme='http://www.blogger.com/atom/ns#' term='Fokker-Planck Equations'/><category scheme='http://www.blogger.com/atom/ns#' term='Principal Components Analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='Bayesian Networks'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Algorithmic Trading'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistical Arbitrage'/><category scheme='http://www.blogger.com/atom/ns#' term='Quantitative Finance'/><category scheme='http://www.blogger.com/atom/ns#' term='Ornstein-Uhlenbeck'/><category scheme='http://www.blogger.com/atom/ns#' term='Hidden Markov Models'/><title type='text'>Statistical Arbitrage: Part I</title><content type='html'>&lt;a href="http://math.nyu.edu/faculty/avellane/AvellanedaLeeStatArb071108.pdf"&gt;Statistical arbitrage&lt;/a&gt; is a type of algorithmic trading technique that relies on hedging all exposure to risk factors in order to profit from small, mean-reverting, predictable movements in security/currency/commodity prices. Put differently, statistical arbitrage is a quantitative system of trading one instrument against smaller amounts of many others, which allows the macroeconomic variables that typically affect the price of the first instrument to be hedged by offsetting exposure in the others. &lt;br /&gt;&lt;br /&gt;To do statistical arbitrage, we use a multivariate regression equation, which assumes that the instantaneous change in an instrument's price, divided by its price, is equal to its alpha times the change in time, plus the sum over all i of a sensitivity 'beta sub i' times the current value of factor i, and finally, plus an error. &lt;br /&gt;&lt;br /&gt;In some cases, if the alpha is low, it may be safely ignored, and the value of the instrument then depends exclusively on the betas, the risk factors, and the &lt;a href="http://en.wikipedia.org/wiki/Cointegration"&gt;cointegration&lt;/a&gt; residual. 
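&lt;br /&gt;&lt;br /&gt;In symbols, the regression equation described in words above might be written roughly as follows. This is only a sketch of notation: P for the instrument's price, F_i for the current value of factor i, and dX for the error term (the increment of the cointegration residual) are labels assumed here for illustration.&lt;br /&gt;
&lt;pre&gt;

```latex
\frac{dP(t)}{P(t)} = \alpha \, dt + \sum_{i} \beta_i F_i(t) + dX(t)
```

&lt;/pre&gt;
When alpha is negligible, the first term on the right drops out and only the factor terms and dX remain.&lt;br /&gt;&lt;br /&gt;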
By holding the instrument and shorting 'beta sub i' units of the portfolio representing risk factor i (which can be revealed by principal components analysis), for each risk factor i, we get a complete portfolio whose value is determined only by the cointegration residual of the single instrument and its alpha, which may or may not be zero. (If the cointegration residual has a nonzero drift, this will be our alpha.)&lt;br /&gt;&lt;br /&gt;This is an extremely simple procedure. We now have a portfolio whose value is a synthetic time series that we must model, and that's where approaches diverge. &lt;a href="http://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process"&gt;Ornstein-Uhlenbeck processes&lt;/a&gt; are the simplest, most robust, and most widely used model for describing the cointegration residual, but they may not be the best, since they make many assumptions about the time series without any data to support them. For instance, an Ornstein-Uhlenbeck process assumes that the synthetic time series is stationary, has a Gaussian distribution with a fixed mean, and reverts toward that mean at a constant speed, with a pull that is linear in the distance from the mean. These assumptions cause problems. First, a permanent change in the alpha value makes the cointegration residual non-stationary until the new drift is re-estimated and grouped as alpha. The regression must therefore be rerun regularly to separate out this drift, even though a recent sample may be too small to perform the regression accurately. Next, if the distribution is non-Gaussian, it would be more efficient to capitalize on that fact than to theorize the issue away. Lastly, reversion speed may be better described by something other than distance. 
It may actually not be proportional to the distance from the mean, if fewer participants in the market are willing to risk doing stat arb in an Extremistan (and not just out-of-equilibrium) environment, and more might be willing stat arbitrageurs when the residual is small, pushing it too far out of equilibrium on the opposite side. Moreover, if, to avoid this, market participants avoid holding the instrument once it gets past a certain point--say a certain z-score--we would experience a situation where instruments bounce back (at least occasionally) when they get too close to equilibrium.&lt;br /&gt;&lt;br /&gt;Of course I don't know what would happen and how it would happen, but Bayesian Networks have more promise in this regard than their rivals, linear stochastic differential equations. Nonlinear &lt;a href="http://en.wikipedia.org/wiki/Fokker%E2%80%93Planck_equation"&gt;Fokker-Planck Equations&lt;/a&gt; seem to have some potential for replacing black box predictions like those of Artificial Neural Networks or Hidden Markov Models, but the issue is that there is no great way to derive the 'correct' formula. 
Either way, we must make assumptions as to the structure of such an equation before we can try to calibrate it.&lt;br /&gt;&lt;br /&gt;That all being said, and knowing I'm a fan of hidden Markov models, I would highly recommend those the most for a stat arb prediction engine.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6505913835072324494-7522720273365554700?l=thegrahamian.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thegrahamian.blogspot.com/feeds/7522720273365554700/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://thegrahamian.blogspot.com/2011/06/statistical-arbitrage-part-i.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/7522720273365554700'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/7522720273365554700'/><link rel='alternate' type='text/html' href='http://thegrahamian.blogspot.com/2011/06/statistical-arbitrage-part-i.html' title='Statistical Arbitrage: Part I'/><author><name>The Grahamian</name><uri>http://www.blogger.com/profile/05043520457306733564</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://3.bp.blogspot.com/_z1O9dBxRmzg/TUO6sbwMuBI/AAAAAAAAABs/43yK_u16QLk/s220/BreakfastRush.bmp'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6505913835072324494.post-2039531941471194543</id><published>2011-06-28T21:38:00.000-07:00</published><updated>2011-06-28T22:00:10.551-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Algotrading'/><category scheme='http://www.blogger.com/atom/ns#' term='HFT'/><category scheme='http://www.blogger.com/atom/ns#' term='High Frequency 
Trading'/><category scheme='http://www.blogger.com/atom/ns#' term='Principal Components Analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='Volatility Modeling'/><category scheme='http://www.blogger.com/atom/ns#' term='Algorithmic Trading'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistical Arbitrage'/><category scheme='http://www.blogger.com/atom/ns#' term='Quantitative Finance'/><category scheme='http://www.blogger.com/atom/ns#' term='Overfitting'/><category scheme='http://www.blogger.com/atom/ns#' term='Hidden Markov Models'/><title type='text'>Hidden Markov Models: Part II</title><content type='html'>Since hidden Markov models help researchers find what sorts of observations tend to come after other types of observations, we can forecast stock behavior once we know what hidden state the security/commodity/currency/synthetic time series was most recently in. (By a synthetic time series, I mean some type of hedged position with a single value being bet on, such as cointegration residuals in statistical arbitrage, implied volatilities after delta hedging, and PCA-derived risk factor values.)&lt;br /&gt;&lt;br /&gt;To successfully deploy capital using an HMM-driven technique, one needs to avoid overfitting the model to the data. I struggled with this. Closing prices from each day turned out to be too unpredictable to work, because the more time passed, the less the older patterns mattered; consequently, I was forced to limit the size of the training sequence I was using, so that the hidden Markov model would only bother trying to learn from the relevant data. Unfortunately, a training set of only 40 days is too small to work well. But 50 days is pushing it, and more than that is quite outdated for any daily trading model.&lt;br /&gt;&lt;br /&gt;The way around the problem was to use higher frequency data--because it would be relevant while still providing a wealth of information and hidden patterns. 
Besides all that, higher frequency data allows for more predictions to be made in any given day, and thus limits volatility in portfolio returns. (Real-time prices are available through a professional brokerage or subscriptions to specific Reuters or Yahoo Finance services. Delayed, but regularly updated prices are available on Google Finance, and no subscription is necessary.)&lt;br /&gt;&lt;br /&gt;In order for a hidden Markov model--or any statistical strategy--to work, the trading techniques must be used many, many times. As the number of times the strategy is used increases, the variability in the strategy's overall success decreases, and the strategy has more potential for a clean statistical edge to shine through. Conversely, if only a few instruments are held as a portfolio, the portfolio's return is less certain. Trading a few instruments with a prediction algorithm is like going spearfishing with a toothpick. It really is that impractical.&lt;br /&gt;&lt;br /&gt;Also, if you know how to do something with a synthetic time series, do it. 
There tends to be much less variability  in outcomes when unwanted risk factors are hedged, and thus much less uncertainty regarding the hidden Markov model's predictive ability.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6505913835072324494-2039531941471194543?l=thegrahamian.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thegrahamian.blogspot.com/feeds/2039531941471194543/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://thegrahamian.blogspot.com/2011/06/hidden-markov-models-part-ii.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/2039531941471194543'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/2039531941471194543'/><link rel='alternate' type='text/html' href='http://thegrahamian.blogspot.com/2011/06/hidden-markov-models-part-ii.html' title='Hidden Markov Models: Part II'/><author><name>The Grahamian</name><uri>http://www.blogger.com/profile/05043520457306733564</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://3.bp.blogspot.com/_z1O9dBxRmzg/TUO6sbwMuBI/AAAAAAAAABs/43yK_u16QLk/s220/BreakfastRush.bmp'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6505913835072324494.post-163614348705465571</id><published>2011-06-13T18:57:00.000-07:00</published><updated>2011-06-28T21:50:48.077-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><category scheme='http://www.blogger.com/atom/ns#' term='Event Arbitrage'/><category scheme='http://www.blogger.com/atom/ns#' term='Algotrading'/><category scheme='http://www.blogger.com/atom/ns#' term='HFT'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Data Feeds'/><category scheme='http://www.blogger.com/atom/ns#' term='Mergers and Acquisitions'/><category scheme='http://www.blogger.com/atom/ns#' term='High Frequency Trading'/><category scheme='http://www.blogger.com/atom/ns#' term='News'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Extraction'/><category scheme='http://www.blogger.com/atom/ns#' term='Algorithmic Trading'/><category scheme='http://www.blogger.com/atom/ns#' term='Quantitative Finance'/><category scheme='http://www.blogger.com/atom/ns#' term='Earnings'/><title type='text'>Event Arbitrage</title><content type='html'>"I don't throw darts at a board. I bet on sure things. Read Sun-tzu, The Art of War. Every battle is won before it is ever fought." - Gordon Gekko&lt;br /&gt;&lt;br /&gt;I don't mean that the way Gekko did, but the idea is the same with high frequency event arbitrage. By the time ordinary investors have reacted to the news, the 'battle' is over and algorithmic traders have already adjusted the price. The trick is to be the one who gets there first--to be the one who has the fastest news feed, the fastest information extraction algorithms, the fastest execution. As you can imagine, this is exactly what many quantitative hedge funds, proprietary trading desks, and other high frequency traders do--they compete for speed.&lt;br /&gt;&lt;br /&gt;It's predictable money. Whoever can both identify a news article and interpret it before the general market has reacted stands a good chance of making money. The theories behind quantitative finance usually ignore such a possibility, because it implies there is a so-called "free lunch" for the traders who can capitalize on it. 
Since high frequency trading--and HFT event arbitrage in particular--is like picking stocks when you know what they will do next, this puts the efficient market hypothesis and the random walk hypothesis into question.&lt;br /&gt;&lt;br /&gt;So the real trick is to develop a class of algorithms to deal with each possible kind of corporate event (including earnings reports, earnings outlooks, mergers and acquisitions, and analyst rating changes). There are a few ways to do the information extraction that is necessary to turn news articles into algorithmic trading signals. The first is to set up the keyword and/or 'keyphrase' search lists by hand, meaning that the article is scanned for relevant words and the sentiment attached to them. The second is more complicated: use unsupervised machine learning to connect keywords and keyphrases to actual market performance of equities at the time the news is released. To do this, algorithmic trading firms commonly group articles together using clustering algorithms like the segmental k-means algorithm.&lt;br /&gt;&lt;br /&gt;The first method is easy, and its algorithms are fast. The second method yields a better quality interpretation of news articles, though it is somewhat slow in comparison. In practice, quant funds usually use a news feed that has already been parsed by such algorithms, and is complete with XML tags to describe what the article is reporting.&lt;br /&gt;&lt;br /&gt;The rest of this post will be about the first method. (It is relatively easy to describe!) 
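&lt;br /&gt;&lt;br /&gt;To make the first method concrete, here is a minimal Python sketch, assuming headlines arrive as plain strings. The three word sets are tiny illustrative subsets of the hand-built lists in this post, and the function name score_headline and the "good hits minus bad hits" score are assumptions of mine rather than an established scoring rule.&lt;br /&gt;
&lt;pre&gt;

```python
import re

# Tiny illustrative subsets of this post's hand-built lists (assumed for brevity).
EARNINGS_WORDS = {"earnings", "eps", "profit", "profits", "guidance", "forecast"}
GOOD_WORDS = {"beat", "exceeded", "raised", "strong", "surge"}
BAD_WORDS = {"disappointing", "fell", "loss", "cut", "lower"}

def score_headline(text):
    """Return None if the text is not earnings-related, else good hits minus bad hits."""
    # Lowercasing the text stands in for listing Capitalized and lowercase forms separately.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not EARNINGS_WORDS.intersection(tokens):
        return None
    good = sum(1 for t in tokens if t in GOOD_WORDS)
    bad = sum(1 for t in tokens if t in BAD_WORDS)
    return good - bad

print(score_headline("Acme earnings beat estimates as profit rose on strong sales"))  # prints 2
print(score_headline("Acme cut guidance after disappointing quarter"))                # prints -2
print(score_headline("Acme names new chief executive"))                               # prints None
```

&lt;/pre&gt;
Multi-word keyphrases such as "book a profit" would need substring or n-gram matching, which this sketch leaves out.&lt;br /&gt;&lt;br /&gt;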
A list of words to identify earnings-related stories might be:&lt;br /&gt;&lt;br /&gt;Analyst&lt;br /&gt;analyst&lt;br /&gt;Analysts&lt;br /&gt;analysts&lt;br /&gt;Analysts'&lt;br /&gt;analysts'&lt;br /&gt;Earnings&lt;br /&gt;earnings&lt;br /&gt;EPS&lt;br /&gt;Estimate&lt;br /&gt;estimate&lt;br /&gt;Estimates&lt;br /&gt;estimates&lt;br /&gt;Expect&lt;br /&gt;expect&lt;br /&gt;Expected&lt;br /&gt;expected&lt;br /&gt;Expects&lt;br /&gt;expects&lt;br /&gt;Expectations&lt;br /&gt;expectations&lt;br /&gt;Forecast&lt;br /&gt;forecast&lt;br /&gt;Forecasts&lt;br /&gt;forecasts&lt;br /&gt;Guidance&lt;br /&gt;guidance&lt;br /&gt;Income&lt;br /&gt;income&lt;br /&gt;Outlook&lt;br /&gt;outlook&lt;br /&gt;Prediction&lt;br /&gt;prediction&lt;br /&gt;Predictions&lt;br /&gt;predictions&lt;br /&gt;Profit&lt;br /&gt;profit&lt;br /&gt;Profits&lt;br /&gt;profits&lt;br /&gt;&lt;br /&gt;Words for a good earnings piece might be:&lt;br /&gt;&lt;br /&gt;Above&lt;br /&gt;above&lt;br /&gt;Amaze&lt;br /&gt;amaze&lt;br /&gt;Amazing&lt;br /&gt;amazing&lt;br /&gt;Beat&lt;br /&gt;beat&lt;br /&gt;Beyond&lt;br /&gt;beyond&lt;br /&gt;"Book a profit"&lt;br /&gt;"book a profit"&lt;br /&gt;"Booked a profit"&lt;br /&gt;"booked a profit"&lt;br /&gt;"Books a profit"&lt;br /&gt;"books a profit"&lt;br /&gt;"Books profit"&lt;br /&gt;"books profit"&lt;br /&gt;"Booking a profit"&lt;br /&gt;"booking a profit"&lt;br /&gt;Confidence&lt;br /&gt;confidence&lt;br /&gt;Confident&lt;br /&gt;confident&lt;br /&gt;Exceed&lt;br /&gt;exceed&lt;br /&gt;Exceeded&lt;br /&gt;exceeded&lt;br /&gt;Good&lt;br /&gt;good&lt;br /&gt;Great&lt;br /&gt;great&lt;br /&gt;Greatly&lt;br /&gt;greatly&lt;br /&gt;Grow&lt;br /&gt;grow&lt;br /&gt;Growing&lt;br /&gt;growing&lt;br /&gt;Growth&lt;br /&gt;growth&lt;br /&gt;High&lt;br /&gt;high&lt;br /&gt;Higher&lt;br /&gt;higher&lt;br /&gt;Highest&lt;br /&gt;highest&lt;br /&gt;Improved&lt;br /&gt;improved&lt;br /&gt;Improving&lt;br /&gt;improving&lt;br /&gt;Increase&lt;br /&gt;increase&lt;br 
/&gt;Increasing&lt;br /&gt;increasing&lt;br /&gt;Optimistic&lt;br /&gt;optimistic&lt;br /&gt;Outperformed&lt;br /&gt;outperformed&lt;br /&gt;Positive&lt;br /&gt;positive&lt;br /&gt;Raise&lt;br /&gt;raise&lt;br /&gt;Raised&lt;br /&gt;raised&lt;br /&gt;Raises&lt;br /&gt;raises&lt;br /&gt;Raising&lt;br /&gt;raising&lt;br /&gt;Rise&lt;br /&gt;rise&lt;br /&gt;Rising&lt;br /&gt;rising&lt;br /&gt;Rose&lt;br /&gt;rose&lt;br /&gt;Soar&lt;br /&gt;soar&lt;br /&gt;Soars&lt;br /&gt;soars&lt;br /&gt;Soaring&lt;br /&gt;soaring&lt;br /&gt;Strength&lt;br /&gt;strength&lt;br /&gt;Strengthen&lt;br /&gt;strengthen&lt;br /&gt;Strengthens&lt;br /&gt;strengthens&lt;br /&gt;Strengthening&lt;br /&gt;strengthening&lt;br /&gt;Strong&lt;br /&gt;strong&lt;br /&gt;Succeeded&lt;br /&gt;succeeded&lt;br /&gt;Success&lt;br /&gt;success&lt;br /&gt;Successes&lt;br /&gt;successes&lt;br /&gt;Surge&lt;br /&gt;surge&lt;br /&gt;Surging&lt;br /&gt;surging&lt;br /&gt;Up&lt;br /&gt;up&lt;br /&gt;"Up Huge"&lt;br /&gt;"Up huge"&lt;br /&gt;"up huge"&lt;br /&gt;Underestimated&lt;br /&gt;underestimated&lt;br /&gt;&lt;br /&gt;Words for a bad earnings piece might be:&lt;br /&gt;&lt;br /&gt;Bad&lt;br /&gt;bad&lt;br /&gt;Badly&lt;br /&gt;badly&lt;br /&gt;Below&lt;br /&gt;below&lt;br /&gt;"Book a loss"&lt;br /&gt;"book a loss"&lt;br /&gt;"Booked a loss"&lt;br /&gt;"booked a loss"&lt;br /&gt;"Books a loss"&lt;br /&gt;"books a loss"&lt;br /&gt;"Booking a loss"&lt;br /&gt;"booking a loss"&lt;br /&gt;Cost&lt;br /&gt;cost&lt;br /&gt;Costs&lt;br /&gt;costs&lt;br /&gt;Cut&lt;br /&gt;cut&lt;br /&gt;Cuts&lt;br /&gt;cuts&lt;br /&gt;Decrease&lt;br /&gt;decrease&lt;br /&gt;Decreases&lt;br /&gt;decreases&lt;br /&gt;Decreasing&lt;br /&gt;decreasing&lt;br /&gt;Disappoint&lt;br /&gt;disappoint&lt;br /&gt;Disappoints&lt;br /&gt;disappoints&lt;br /&gt;Disappointed&lt;br /&gt;disappointed&lt;br /&gt;Disappointing&lt;br /&gt;disappointing&lt;br /&gt;Disappointment&lt;br /&gt;disappointment&lt;br /&gt;Down&lt;br /&gt;down&lt;br 
/&gt;Drop&lt;br /&gt;drop&lt;br /&gt;Dropped&lt;br /&gt;dropped&lt;br /&gt;Dropping&lt;br /&gt;dropping&lt;br /&gt;Fail&lt;br /&gt;fail&lt;br /&gt;Failed&lt;br /&gt;failed&lt;br /&gt;Failing&lt;br /&gt;failing&lt;br /&gt;Fall&lt;br /&gt;fall&lt;br /&gt;Falling&lt;br /&gt;falling&lt;br /&gt;Fell&lt;br /&gt;fell&lt;br /&gt;Hit&lt;br /&gt;hit&lt;br /&gt;Hurt&lt;br /&gt;hurt&lt;br /&gt;Hurts&lt;br /&gt;hurts&lt;br /&gt;Hurting&lt;br /&gt;hurting&lt;br /&gt;"Increased cost"&lt;br /&gt;"increased cost"&lt;br /&gt;"Increased costs"&lt;br /&gt;"increased costs"&lt;br /&gt;"Increasing cost"&lt;br /&gt;"increasing cost"&lt;br /&gt;"Increasing costs"&lt;br /&gt;"increasing costs"&lt;br /&gt;"Increased * cost"&lt;br /&gt;"increased * cost"&lt;br /&gt;"Increased * costs"&lt;br /&gt;"increased * costs"&lt;br /&gt;"Increasing * cost"&lt;br /&gt;"increasing * cost"&lt;br /&gt;"Increasing * costs"&lt;br /&gt;"increasing * costs"&lt;br /&gt;Lack&lt;br /&gt;lack&lt;br /&gt;Lacks&lt;br /&gt;lacks&lt;br /&gt;Lacking&lt;br /&gt;lacking&lt;br /&gt;Less&lt;br /&gt;less&lt;br /&gt;Loss&lt;br /&gt;loss&lt;br /&gt;Low&lt;br /&gt;low&lt;br /&gt;Lower&lt;br /&gt;lower&lt;br /&gt;Lowest&lt;br /&gt;lowest&lt;br /&gt;Negative&lt;br /&gt;negative&lt;br /&gt;Overestimated&lt;br /&gt;overestimated&lt;br /&gt;Poor&lt;br /&gt;poor&lt;br /&gt;Poorly&lt;br /&gt;poorly&lt;br /&gt;Problem&lt;br /&gt;problem&lt;br /&gt;Problems&lt;br /&gt;problems&lt;br /&gt;Reduce&lt;br /&gt;reduce&lt;br /&gt;Reduces&lt;br /&gt;reduces&lt;br /&gt;Reduced&lt;br /&gt;reduced&lt;br /&gt;Reducing&lt;br /&gt;reducing&lt;br /&gt;Shrank&lt;br /&gt;shrank&lt;br /&gt;Shrink&lt;br /&gt;shrink&lt;br /&gt;Shrunk&lt;br /&gt;shrunk&lt;br /&gt;Struggle&lt;br /&gt;struggle&lt;br /&gt;Struggles&lt;br /&gt;struggles&lt;br /&gt;Struggling&lt;br /&gt;struggling&lt;br /&gt;Under&lt;br /&gt;under&lt;br /&gt;Underperform&lt;br /&gt;underperform&lt;br /&gt;Underperformed&lt;br /&gt;underperformed&lt;br /&gt;Underperforming&lt;br 
/&gt;underperforming&lt;br /&gt;Worse&lt;br /&gt;worse&lt;br /&gt;Worsening&lt;br /&gt;worsening&lt;br /&gt;Worst&lt;br /&gt;worst&lt;br /&gt;&lt;br /&gt;Of course, this is child's play compared to what classification algorithms are capable of.&amp;nbsp; One nice way to start an information extraction system is with Mahout, a Java library specializing in machine learning and commonly used with Hadoop, a framework for data-intensive applications requiring distributed processing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6505913835072324494-163614348705465571?l=thegrahamian.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thegrahamian.blogspot.com/feeds/163614348705465571/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://thegrahamian.blogspot.com/2011/06/event-arbitrage.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/163614348705465571'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/163614348705465571'/><link rel='alternate' type='text/html' href='http://thegrahamian.blogspot.com/2011/06/event-arbitrage.html' title='Event Arbitrage'/><author><name>The Grahamian</name><uri>http://www.blogger.com/profile/05043520457306733564</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://3.bp.blogspot.com/_z1O9dBxRmzg/TUO6sbwMuBI/AAAAAAAAABs/43yK_u16QLk/s220/BreakfastRush.bmp'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6505913835072324494.post-8269399074800775355</id><published>2011-06-05T01:25:00.000-07:00</published><updated>2011-07-06T18:39:58.993-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='Algotrading'/><category scheme='http://www.blogger.com/atom/ns#' term='Pattern Recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='HFT'/><category scheme='http://www.blogger.com/atom/ns#' term='High Frequency Trading'/><category scheme='http://www.blogger.com/atom/ns#' term='Volatility Modeling'/><category scheme='http://www.blogger.com/atom/ns#' term='Algorithmic Trading'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistical Arbitrage'/><category scheme='http://www.blogger.com/atom/ns#' term='Quantitative Finance'/><category scheme='http://www.blogger.com/atom/ns#' term='Regime Switching Models'/><category scheme='http://www.blogger.com/atom/ns#' term='Hidden Markov Models'/><title type='text'>Hidden Markov Models</title><content type='html'>&lt;a href="http://en.wikipedia.org/wiki/Hidden_Markov_models"&gt;Hidden Markov models&lt;/a&gt; have proven successful for speech recognition, and their success carries over to the prediction of financial time series. According to Patterson's &lt;i&gt;The Quants&lt;/i&gt; and Mallaby's &lt;i&gt;More Money Than God&lt;/i&gt;, &lt;a href="http://en.wikipedia.org/wiki/Renaissance_Technologies"&gt;Renaissance Technologies&lt;/a&gt; owes a great deal of its success to hidden Markov models. Research by academics in &lt;a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-63.pdf"&gt;this paper&lt;/a&gt; and &lt;a href="http://www.cs.sfu.ca/%7Eanoop/students/rzhang/rzhang_msc_thesis.pdf"&gt;this other paper&lt;/a&gt; further validated the financial utility of hidden Markov models, and papers such as &lt;a href="https://mospace.umsystem.edu/xmlui/bitstream/handle/10355/4795/research.pdf?sequence=3"&gt;this one&lt;/a&gt; demonstrated their superiority over GARCH(1, 1) models for accurate volatility modeling.&lt;br /&gt;&lt;br /&gt;The major issue with using hidden Markov models to predict financial time series is that we are trying to forecast the inherently chaotic. 
Put differently, forcing HMMs to learn from raw financial data is not always the best idea because it asks them to predict the outcome of what is essentially Brownian motion. On the other hand, that's what information theory is supposed to be about--detecting and predicting signals through a 'noisy' channel. So while HMMs can still certainly be used on financial data, it's a bit much to ask. The one glaring exception to this is the use of high frequency data, which offers far more observations and hence is more likely to contain some pattern or other that daily or longer-term data does not reveal. So if dealing with daily or longer-term data, it's a lot easier to work with something that strips out market noise and results in a more statistically calm, pattern-containing time series. Such applications of hidden Markov models include statistical arbitrage, volatility arbitrage, correlation forecasting, and volume prediction.&lt;br /&gt;&lt;br /&gt;Of course, HMMs can also be used even less directly; for instance, in information extraction--getting pure information from human-written news articles, such as those on Reuters.com. (My next post will discuss this briefly, and the one after that will talk about other information extraction methods.)&lt;br /&gt;&lt;br /&gt;But the most fruitful, direct application of HMMs is in high frequency trading. Because they inherently sort returns into groups (with observations of these returns corresponding to certain probability distributions) that are the underlying 'states,' hidden Markov models can separate out statistically different price movements the same way they can distinguish between vowels and consonants in a two-state model. 
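&lt;br /&gt;&lt;br /&gt;To make the two-state idea concrete, here is a minimal Python sketch of Viterbi decoding for a zero-mean Gaussian HMM. It is only an illustration under loud assumptions: the state labels (calm and turbulent), the transition matrix, the volatilities, and the sample returns are all invented rather than fitted to any market data, and the function names are hypothetical.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;

```python
import math

# Hypothetical two-state model: state 0 = calm, state 1 = turbulent.
# Every number below is made up for illustration, not fitted to data.
START = [0.5, 0.5]
TRANS = [[0.95, 0.05],
         [0.05, 0.95]]
SIGMA = [0.005, 0.03]  # per-state volatility of zero-mean Gaussian returns


def log_gauss(x, sigma):
    """Log density of a zero-mean normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - x ** 2 / (2 * sigma ** 2)


def viterbi(returns):
    """Most likely hidden-state sequence for the observed returns."""
    n_states = len(START)
    score = [math.log(START[s]) + log_gauss(returns[0], SIGMA[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in returns[1:]:
        prev, new_score = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: score[r] + math.log(TRANS[r][s]))
            prev.append(best)
            new_score.append(score[best] + math.log(TRANS[best][s])
                             + log_gauss(x, SIGMA[s]))
        back.append(prev)
        score = new_score
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]


returns = [0.001, -0.002, 0.003, 0.04, -0.05, 0.03, 0.001, -0.001, 0.002]
print(viterbi(returns))
```

&lt;/pre&gt;With these made-up numbers, the small returns should land in the calm state and the three large moves in the turbulent one. In practice the parameters would be estimated from data (for example with Baum-Welch) rather than typed in by hand.&lt;br /&gt;&lt;br /&gt;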
Put differently, the way underlying states fit together with each other means that even if observed returns are uncorrelated with each other across time, they may be related in a more subtle way: one single certain type of return may be followed by another certain type of return more often than by returns not belonging to that type. I'll leave the rest up to the reader's imagination and programming skills.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6505913835072324494-8269399074800775355?l=thegrahamian.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thegrahamian.blogspot.com/feeds/8269399074800775355/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://thegrahamian.blogspot.com/2011/06/hidden-markov-models-and-time-series.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/8269399074800775355'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/8269399074800775355'/><link rel='alternate' type='text/html' href='http://thegrahamian.blogspot.com/2011/06/hidden-markov-models-and-time-series.html' title='Hidden Markov Models'/><author><name>The Grahamian</name><uri>http://www.blogger.com/profile/05043520457306733564</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://3.bp.blogspot.com/_z1O9dBxRmzg/TUO6sbwMuBI/AAAAAAAAABs/43yK_u16QLk/s220/BreakfastRush.bmp'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6505913835072324494.post-862002038763809168</id><published>2011-06-01T21:08:00.000-07:00</published><updated>2011-06-18T21:09:06.529-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Robert Frey'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Diversification'/><category scheme='http://www.blogger.com/atom/ns#' term='Principal Components Analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='Global Macro'/><category scheme='http://www.blogger.com/atom/ns#' term='Portfolio Management'/><category scheme='http://www.blogger.com/atom/ns#' term='Value at Risk'/><category scheme='http://www.blogger.com/atom/ns#' term='John Paulson'/><category scheme='http://www.blogger.com/atom/ns#' term='George Soros'/><title type='text'>A New Kind of Global Macro</title><content type='html'>This is going to be kind of esoteric because I'm holding my cards close to the vest. &lt;br /&gt;&lt;br /&gt;Imagine:&lt;br /&gt;Since the entire basis of global macro is risk management, what if we could put portfolios together one risk factor at a time? Wouldn't that be interesting? Risk exposure is the beginning and the end of all global macro strategy. So why wait until the end to integrate it? It would be much better to bet on the risk factors right off the bat. Why focus on certain commodities, currencies, and corporations' securities (or the respective derivatives)? That was never the point of global macro. We care about capitalizing on macroeconomic changes anyway. Why get exposure to those changes from a few instruments when you could get right to the source, and just buy the factor itself?&lt;br /&gt;&lt;br /&gt;It's simpler. It's more efficient. It's more potent. It's better diversified.&lt;br /&gt;&lt;br /&gt;So the answer is to construct portfolios consisting of nothing but a few uncorrelated risk factors selected in whatever relative proportions desired. The method driving the prediction of risk factors is not the issue. That is for another time. What we're concerned with is portfolio allocation once the decisions have been made.&lt;br /&gt;&lt;br /&gt;And best of all? Portfolio optimization is easy. 
The risk factors are approximated by baskets of many instruments, and the instruments each have a covariance with each other. But that has already been addressed by principal components analysis. We can merely treat each factor as an instrument to be bought. (This is reasonable--because it is like buying a stock, which is a basket of risk factors, only these have non-zero sensitivity to just one risk factor.) Once we have bought these baby, de-facto 'stocks,' we realize that they have no correlation at all. Thus, the cross terms inside the first sigma in the Markowitz Mean-Variance model are eliminated, and the second one is easily maximized by investing the most money where the highest return is expected. This is similar to setting our risk aversion parameter to zero (though we have done nothing of the kind), since the trade-off is now trivial.&lt;br /&gt;&lt;br /&gt;This may seem dangerous, because the allocation weights are not constrained, but a rule can be added separately, allowing a maximum position size in any given risk factor, as well as a maximum position variance and/or maximum portfolio variance. On top of that, we can use VaR systems and stress testing like any global macro fund would.&lt;br /&gt;&lt;br /&gt;The exact method for predicting factors varies considerably. 
There are several approaches that could work, including the John Paulson approach, the Soros approach, and the Robert Frey approach.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6505913835072324494-862002038763809168?l=thegrahamian.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thegrahamian.blogspot.com/feeds/862002038763809168/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://thegrahamian.blogspot.com/2011/06/new-kind-of-global-macro.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/862002038763809168'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/862002038763809168'/><link rel='alternate' type='text/html' href='http://thegrahamian.blogspot.com/2011/06/new-kind-of-global-macro.html' title='A New Kind of Global Macro'/><author><name>The Grahamian</name><uri>http://www.blogger.com/profile/05043520457306733564</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://3.bp.blogspot.com/_z1O9dBxRmzg/TUO6sbwMuBI/AAAAAAAAABs/43yK_u16QLk/s220/BreakfastRush.bmp'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6505913835072324494.post-2123862685366586820</id><published>2011-05-31T02:10:00.001-07:00</published><updated>2011-09-04T03:07:28.408-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Value'/><category scheme='http://www.blogger.com/atom/ns#' term='Size'/><category scheme='http://www.blogger.com/atom/ns#' term='Data-Driven'/><category scheme='http://www.blogger.com/atom/ns#' term='Theory-Driven'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistical Arbitrage'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Genetic Algorithms'/><category scheme='http://www.blogger.com/atom/ns#' term='Momentum'/><category scheme='http://www.blogger.com/atom/ns#' term='Artificial Neural Networks'/><category scheme='http://www.blogger.com/atom/ns#' term='Time Series'/><category scheme='http://www.blogger.com/atom/ns#' term='Optimization'/><category scheme='http://www.blogger.com/atom/ns#' term='Hidden Markov Models'/><title type='text'>The Data-Driven Manifesto</title><content type='html'>First and foremost, timing is everything. And not just in humor.&lt;br /&gt;&lt;br /&gt;The correct timing is the sole requirement for profitability. Whether it be signals generated by a statistical model, or an important event in the news, all information is incorporated very quickly into market prices. Thus, the only thing that matters is timing each trade just right, whether that means capturing imbalances in supply and demand at the microstructure level, parsing news articles with information extraction algorithms, or betting on a trend reversal after seeing a familiar signal. There is no such thing as a temporally-irrelevant investment. Everything depends on time.&lt;br /&gt;&lt;br /&gt;If data is old, one cannot use it to successfully invest. No fundamental data or technical data or obvious statistical data is necessary at all, because it is public knowledge that has already been incorporated into the price of the security, commodity, currency or derivative in question. Betting on a known fact is useless, because the market price already reflects that fact, and will not change due to the fact. How could it? The information has already been unveiled. It is not going to unveil itself again--it cannot become common knowledge twice!&lt;br /&gt;&lt;br /&gt;However, if data suggests a statistical anomaly in the presumably efficient market, then it may be proper to act on it, assuming that others are not aware of the anomaly. 
Widely observed statistical phenomena such as the value, size, and momentum factors are not good foundations for trading strategies because the phenomena are already widely observed and therefore more risky; indeed, the more market participants using a strategy, the more potential that strategy has to underperform the overall market index.&lt;br /&gt;&lt;br /&gt;The whole reason investment strategies are market-neutral is that their creators did not want to worry about predicting where investors, as a flock, would go next. Unfortunately, any strategy, with sufficient popularity, suffers from the same problem; people invest in the strategy, and then they withdraw capital, creating volatility in returns from the strategy. This is true for everything from statistical arbitrage to the failure of quantitative equity selection models based on value and size. The only reason the strategies worked well was that few people used them; their returns came from market corrections rather than from other participants trading the same idea. For instance, when stat arb was still new, success came in the form of lower returns for an outperforming stock. It wasn't that investors necessarily used stat arb and recognized that the value of the stock was too high, but rather that the strategies' successes were phenomena unrelated to the direct investment decisions of any individual or institution. But once they became popular, the returns of such strategies were directly impacted by a greater portion of the market, which now knew of and traded the strategy directly. At that point, the strategies were inherently subject to the same fickleness that market indices have always been subject to.&lt;br /&gt;&lt;br /&gt;Rather intuitively, the more participants using a strategy, the lower the returns associated with it, and the lower the Sharpe ratio. 
The standard deviation of an overused strategy's returns is also much higher, again for intuitive reasons: the number of market participants is positively correlated with the number of multi-strategy investors; as the number of strategies increases, the probability that at least one strategy fails increases as well--and eventually one of the many strategies is bound to fail; and as the number of participants increases, the number of investors exposed most heavily to the failing strategy increases as well. As the number of investors exposed most heavily to the failing strategy approaches infinity, the probability of one of these investors being heavily leveraged and consequently receiving a large margin call (as a result of the failing strategy) converges to unity. From that point, the participant with the margin call liquidates an arbitrary number of strategies' portfolios. If this number includes, say, a value/momentum book, then the value/momentum strategy will suffer from the unwinding of the participant's portfolio.&lt;br /&gt;&lt;br /&gt;The problem, then, is not market crashes or strategy failure, ipso facto, but rather a spillover effect on other portfolios using the same, over-popular strategy. The solution is obvious: use strategies that other participants have (literally) never even heard of. The mere knowledge of a possible strategy may encourage participants to covertly experiment with it, and perhaps put it into practice without declaring it. Luckily, as more participants use the strategy, the Sharpe ratio of the strategy declines, and vigilant observation will allow the original users of the strategy to leave quietly before it fails in dramatic fashion.&lt;br /&gt;&lt;br /&gt;* * * * *&lt;br /&gt;&lt;br /&gt;The problem with theory-driven strategies is that they usually reject temporal trading rules. &lt;br /&gt;For instance, CAPM, EMH, APT and MPT in general fail to account for the possibility of different expected returns across time. 
They cannot adjust to changing market conditions either, and their models often make too many restrictive assumptions. &lt;br /&gt;&lt;br /&gt;The fullest implication of "data-driven" strategies is that their associated models are not merely created ahead of time and parametrically synchronized with data, but rather that the data itself determines the model's structure, and not just its parameters.&lt;br /&gt;&lt;br /&gt;For instance, the number of hidden layers in an artificial neural network, the number of iterations of a genetic algorithm for portfolio selection, and the number of states in a hidden Markov model are all input by a human programmer. And yet, the decisions made by humans are the ones that constrain the model from becoming fully formed. Thus, these decisions need to be supported by the data.&lt;br /&gt;&lt;br /&gt;The optimization method for the most accurate model (as measured by the probability of the model producing the training sequence) should be the one that leads to global optimality. Hill-climbing algorithms can only guarantee local optimality and are therefore less desirable than algorithms that search for global maxima. This is intuitive, since the more accurate the model is, the better it represents the truth behind how the market moves and works.&lt;br /&gt;&lt;br /&gt;Once the globally optimal model's structure is perfected through historical profitability tests, the model is ready to use, and no more human intervention is necessary. However, as a scientific experiment, it would be interesting to see what kind of unexpected connections, classifications, and procedures the models could come up with. It will be complex, and likely counterintuitive.&lt;br /&gt;&lt;br /&gt;"Black box" has become finance-lingo for any algorithmic trading strategy without a simple, logical backing. 
The models' structures and parameters--or anything too complex or too counterintuitive for a human to understand--are labeled as "black-box" as if that was a bad thing. The fact that the strategy is obscure helps to avoid the crowding effect that leads to the downfall of every hyped investment strategy, from &lt;a href="http://en.wikipedia.org/wiki/Long-Term_Capital_Management"&gt;LTCM's&lt;/a&gt; fixed income arbitrage disaster to &lt;a href="http://www.finalternatives.com/node/15145"&gt;PDT's&lt;/a&gt; temporary Stat Arb troubles in August and November 2007. The more black-box it is, the less others are likely to catch on, and the better the strategy's performance will be. Put differently, assuming that it is sound, it won't be subject to failure on account of sheer popularity. From a bigger perspective, all data-driven strategies work well--assuming that they are not&lt;i&gt; too&lt;/i&gt; well known, in which case their success is nothing but a house of cards that has been lucky not to suffer from a gust of wind--because they exploit market phenomena that &lt;i&gt;do&lt;/i&gt; exist, rather than ones that &lt;i&gt;ought to&lt;/i&gt; exist. 
Indeed, data-driven strategies are valid because they have been validated by the market.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6505913835072324494-2123862685366586820?l=thegrahamian.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thegrahamian.blogspot.com/feeds/2123862685366586820/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://thegrahamian.blogspot.com/2011/05/data-driven-manifesto_31.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/2123862685366586820'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/2123862685366586820'/><link rel='alternate' type='text/html' href='http://thegrahamian.blogspot.com/2011/05/data-driven-manifesto_31.html' title='The Data-Driven Manifesto'/><author><name>The Grahamian</name><uri>http://www.blogger.com/profile/05043520457306733564</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://3.bp.blogspot.com/_z1O9dBxRmzg/TUO6sbwMuBI/AAAAAAAAABs/43yK_u16QLk/s220/BreakfastRush.bmp'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6505913835072324494.post-2212896359476491617</id><published>2011-05-31T00:47:00.000-07:00</published><updated>2011-07-08T21:57:03.605-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Market Neutral'/><category scheme='http://www.blogger.com/atom/ns#' term='High Frequency Trading'/><category scheme='http://www.blogger.com/atom/ns#' term='TWAP'/><category scheme='http://www.blogger.com/atom/ns#' term='Principal Components Analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='Portfolio Management'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Componentless'/><category scheme='http://www.blogger.com/atom/ns#' term='Leverage'/><category scheme='http://www.blogger.com/atom/ns#' term='Transaction Cost Modeling'/><category scheme='http://www.blogger.com/atom/ns#' term='VWAP'/><category scheme='http://www.blogger.com/atom/ns#' term='Value at Risk'/><category scheme='http://www.blogger.com/atom/ns#' term='Risk Factors'/><title type='text'>High Frequency Portfolios</title><content type='html'>An investment management company must keep track of risk across all positions and strategies used in each of its funds. At high frequency, however, this matters less because the positions are held only a short time, and the strategies can be made market neutral or even componentless (as measured by Principal Components Analysis). Instead, the focus is on transaction costs. How much money does the strategy lose from market impact when entering and exiting the trade? How much of the remaining profit is eaten away by broker fees? These costs will determine how much of a position may be built in the time available for the strategy to work. Thus, the costs and the time series forecast determine how much may be invested, and the investment is entirely time based. Enter here; wait for the time series' next value; exit here if prediction is unfavorable or if a better post-transaction cost profit is predicted elsewhere--and hold position if it is favorable; repeat.&lt;br /&gt;&lt;br /&gt;However, the size of orders should also be limited to a certain percentage of portfolio value, the leverage should remain fixed, and a Value-at-Risk system should also be used to determine whether a trade's marginal VaR is too high to make it a worthwhile investment.&lt;br /&gt;&lt;br /&gt;One theoretical (gasp!) 
price impact model is written as an integral (from the initial price to the final acceptable transaction price), where the integrand is the price multiplied by the liquidity function, which is a function of the price and can be derived empirically through experimentation and the use of some statistical regression. Of course, we integrate with respect to the price. &lt;br /&gt;&lt;br /&gt;This is intuitive because we have an infinitesimally small amount we can buy without moving the price. The price moves as we buy, so we have the price times the amount we can buy at that price, for every price between the initial and the final one. (And this works in reverse if selling--the upper limit of the integral will just be lower than the lower limit.)&lt;br /&gt;&lt;br /&gt;If there is a certain largest absolute price impact that the investor is willing to allow, he can simply set that impact equal to the integral described above, and solve for the upper limit. Then, he can see how many shares he will be buying by integrating the liquidity function with respect to the price, using the same limits of integration. Conversely, if an investor wants to see how much of an impact will be made by a trade of a certain size, the latter expression can be set equal to a certain number of shares, and the upper limit of the integral can be solved for. In that case, we plug the upper limit into the former expression, and integrate to find the market impact of the trade.&lt;br /&gt;&lt;br /&gt;From there, the investor can construct a search heuristic (particle swarm optimization, cuckoo search, etc.) to search for the optimal trade size. 
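&lt;br /&gt;&lt;br /&gt;As a rough numerical sketch of the two integrals described earlier, the Python below assumes a constant liquidity function (a hypothetical stand-in; a real one would come from the regression mentioned above) and made-up prices. The function names, the trapezoidal rule, and the bisection routine are illustrative choices of mine, not part of the original model.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;

```python
def integrate(f, a, b, steps=1000):
    """Trapezoidal rule for the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def liquidity(price):
    """Hypothetical liquidity function: shares available per unit of price.
    A constant is assumed purely for illustration."""
    return 10_000.0


def shares_bought(p0, p1):
    """Integral of the liquidity function from p0 to p1."""
    return integrate(liquidity, p0, p1)


def impact_integral(p0, p1):
    """Integral of price times liquidity from p0 to p1."""
    return integrate(lambda p: p * liquidity(p), p0, p1)


def final_price_for_shares(p0, target_shares, p_max, tol=1e-9):
    """Bisection for the upper limit p1 at which shares_bought hits the target.
    Assumes liquidity is positive, so shares_bought grows with p1."""
    lo, hi = p0, p_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shares_bought(p0, mid) > target_shares:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


p0 = 50.0  # made-up initial price
p1 = final_price_for_shares(p0, 5_000.0, p_max=60.0)
print(round(p1, 4), round(impact_integral(p0, p1), 2))
```

&lt;/pre&gt;The same bisection idea runs the other way: fix a tolerable value of the price-times-liquidity integral, solve for the upper limit, and then integrate the liquidity function to see how many shares that allows.&lt;br /&gt;&lt;br /&gt;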
From there, we can use a sort of execution algorithm that allows for transactions across time--something that minimizes (if buying) or maximizes (if selling) the VWAP or TWAP.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6505913835072324494-2212896359476491617?l=thegrahamian.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thegrahamian.blogspot.com/feeds/2212896359476491617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://thegrahamian.blogspot.com/2011/05/data-driven-manifesto.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/2212896359476491617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6505913835072324494/posts/default/2212896359476491617'/><link rel='alternate' type='text/html' href='http://thegrahamian.blogspot.com/2011/05/data-driven-manifesto.html' title='High Frequency Portfolios'/><author><name>The Grahamian</name><uri>http://www.blogger.com/profile/05043520457306733564</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://3.bp.blogspot.com/_z1O9dBxRmzg/TUO6sbwMuBI/AAAAAAAAABs/43yK_u16QLk/s220/BreakfastRush.bmp'/></author><thr:total>0</thr:total></entry></feed>
