The best portfolios are, no doubt, ones that earn the highest returns with the least volatility. After ascertaining that you possess several informative signals for the same group of securities, you may be wondering how to capitalize on these signals as a whole. If the signals are high-quality enough to predict specific returns, rather than just being the simple "buy/sell" type, then you can take one of two possible approaches:
1) If you are deciding how to allocate capital to different strategies, run 'backtests' to determine the historical average return and the standard deviation of each strategy, as well as the historical correlation between such strategies, and then use mean-variance portfolio optimization to find the Sharpe-optimal 'weighting' of each strategy in a multistrategy fund. While this is great for finished strategies that utilize completely different structures (for instance, stat arb is market neutral, global macro is directional), it unfortunately does not solve the heart of the problem: if there are no strategies, but only signals, how do we construct a multisignal portfolio at the securities level?
2) Use historical price data to generate signals, and use all signals from a given day in the past to do multivariate linear regression and determine the sensitivity of the output (the return) to each input signal. Alternately, we can train an artificial neural network to do something similar, and pray it doesn't overfit the data. Lastly, we can set up a machine learning system that uses fuzzy logic to 'reason' its way through investment decisions and portfolio construction, analyzing historical patterns of how certain signals contradict the others when something bad is about to happen. In such an event, the system can work out a more informed expectation of the returns, which can then be fed into a portfolio optimizer.
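As a rough illustration of the first approach, here is a minimal sketch of Sharpe-optimal weighting across strategies. It assumes the backtests have already produced each strategy's mean excess return, volatility, and correlations; the function names and the sample numbers are made up for illustration. The weights are proportional to the inverse covariance matrix times the mean excess returns, then scaled to sum to one (which only makes sense when that sum is positive).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def sharpe_optimal_weights(mean_excess, vols, corr):
    """Weights proportional to inverse(covariance) * mean excess return.

    Assumes the raw weights sum to a positive number, so that scaling
    them to sum to one keeps their signs.
    """
    n = len(vols)
    cov = [[corr[i][j] * vols[i] * vols[j] for j in range(n)] for i in range(n)]
    raw = solve(cov, mean_excess)
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical backtest statistics for two uncorrelated strategies.
weights = sharpe_optimal_weights(
    mean_excess=[0.10, 0.10],
    vols=[0.10, 0.20],
    corr=[[1.0, 0.0], [0.0, 1.0]],
)
print(weights)  # the lower-volatility strategy gets the larger weight
```

With equal expected returns and no correlation, the weights end up inversely proportional to variance, which is why the calmer strategy dominates.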
The Grahamian
Data-driven investment analysis
Terms
This blog is for educational and informational purposes only. The contents of this blog are opinions of the author and should not be interpreted as investment advice. The author takes no responsibility for the investment decisions of other individuals or institutions, and is not liable for any losses they may incur. By reading this blog, you are agreeing that you understand and agree to the terms above.
Wednesday, December 28, 2011
Sunday, September 4, 2011
Alpha Prediction
Though the topic is very similar to statistical arbitrage, I thought I'd address the issue of alpha prediction instead. If we do an OLS linear regression on a time series of equity returns, with the independent variable being the difference between the returns of a benchmark equity index and the risk free rate, the equity's return can be described in terms of the index return by
R_k = alpha_k + beta_k * (R_m - R_f),
where alpha_k and beta_k are parameters from the regression and R_k is the return of equity k, R_m is the return of the market index that equity k is in, and R_f is the risk-free rate.
Then, by buying one dollar's worth of equity k, and short selling beta_k dollars' worth of the equity index, we can hedge our exposure to the term R_m. Doing so makes us beta-neutral, though of course such a procedure is not very effective in portfolio management when only implemented on a few equities. In practice, we would need to do this on many equities at once before we could get a clean statistical edge.
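To make the regression and the hedge concrete, here is a small sketch in plain Python. The function names are mine, the return series is invented, and the hedge is sized in dollar terms, which is an assumption about how the position is measured.

```python
def ols_alpha_beta(stock_returns, market_excess_returns):
    """One-factor OLS fit of R_k = alpha_k + beta_k * (R_m - R_f)."""
    n = len(stock_returns)
    if n != len(market_excess_returns) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(market_excess_returns) / n
    mean_y = sum(stock_returns) / n
    sxx = sum((x - mean_x) ** 2 for x in market_excess_returns)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(market_excess_returns, stock_returns)
    )
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def index_hedge_notional(equity_notional, beta):
    """Dollar amount of index to hold (negative means short) to cancel beta."""
    return -beta * equity_notional


# Invented data: the stock moves 1.5x the market's excess return plus 0.1%.
market_excess = [0.01, -0.02, 0.015, 0.0, -0.005]
stock = [0.001 + 1.5 * x for x in market_excess]

alpha_k, beta_k = ols_alpha_beta(stock, market_excess)
print(alpha_k, beta_k)
print(index_hedge_notional(100.0, beta_k))
```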
Applying the ex post facto beta from the regression to the past values of equity k's returns, we can infer its ex post facto alpha at each point in time we measure. Of course, we must not go back too far in time or our measure of beta will be a useless statistic that will only mislead us. If we want to see a very long series of alphas, it is best to re-estimate them by running the regression backwards in time from each point where we are trying to find the alpha.
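One way to read this is as a rolling calculation: at each date, fit beta on a trailing window only, then strip the beta-driven part out of that date's return. The sketch below does exactly that; the window length, the function names, and the data are illustrative assumptions, and the "alpha" it produces at each date is the realized return left over after the market term is removed.

```python
def trailing_beta(stock_returns, market_excess_returns):
    """OLS slope of stock returns on market excess returns."""
    n = len(stock_returns)
    mean_x = sum(market_excess_returns) / n
    mean_y = sum(stock_returns) / n
    sxx = sum((x - mean_x) ** 2 for x in market_excess_returns)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(market_excess_returns, stock_returns)
    )
    return sxy / sxx


def rolling_alphas(stock_returns, market_excess_returns, window):
    """Realized alpha at each date, using beta from the preceding window only."""
    alphas = []
    for t in range(window, len(stock_returns)):
        beta = trailing_beta(
            stock_returns[t - window:t], market_excess_returns[t - window:t]
        )
        alphas.append(stock_returns[t] - beta * market_excess_returns[t])
    return alphas


# Invented data: constant true alpha of 0.2% per period and beta of 1.2.
market_excess = [0.01 * ((i * 7) % 5 - 2) for i in range(30)]
stock = [0.002 + 1.2 * x for x in market_excess]
print(rolling_alphas(stock, market_excess, window=10)[:3])
```

A short window keeps beta current but noisy; a long one is smoother but risks the stale-beta problem described above.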
Moving on. Once we have the time series of alphas--expressed as percentage returns, since those are the units our regressions were in--we can smooth them with a Kalman Filter, a Haar Wavelet Transform, or even just a moving average. Then we can run some type of predictive model on them. (And of course, I suggest using a hidden Markov model for this. A Hidden Markov Gaussian Mixture, where the observations from each state are drawn from a different Gaussian distribution, works well and avoids overfitting.)
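For the smoothing step, here is a sketch of the two simplest options: a trailing moving average and a one-dimensional Kalman filter that treats the true alpha as a slowly drifting level observed with noise. The noise variances are tuning assumptions I picked for the example, not values taken from any data.

```python
def moving_average(series, window):
    """Trailing moving average; early points use whatever history exists."""
    out = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1):t + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def kalman_filter_level(observations, process_var, obs_var):
    """Filter a noisy series under a random-walk-plus-noise (local level) model.

    process_var: assumed variance of the period-to-period drift in true alpha.
    obs_var: assumed variance of the measurement noise around true alpha.
    """
    estimate = observations[0]
    variance = obs_var
    out = [estimate]
    for z in observations[1:]:
        variance += process_var              # predict: uncertainty grows
        gain = variance / (variance + obs_var)
        estimate += gain * (z - estimate)    # update: move toward observation
        variance *= 1.0 - gain
        out.append(estimate)
    return out


noisy_alphas = [0.004, -0.001, 0.006, 0.002, 0.005, -0.002, 0.003]
print(moving_average(noisy_alphas, window=3))
print(kalman_filter_level(noisy_alphas, process_var=1e-7, obs_var=1e-5))
```

The smaller process_var is relative to obs_var, the heavier the smoothing.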
Once our model is calibrated with (or has learned from) the data, we forecast the alpha over the next time period (and this depends on what time period we used to measure the data; it could've been seconds, minutes, days, or weeks).
We save the equity's alpha forecast, and run the same process of finding the time series of alphas on each other equity, create a predictive model for the new alpha series, and repeat. Once we have an arbitrarily high number of predictions, we can buy the stocks with positive expected alpha, short the ones with negative expected alpha, and hedge the remaining portfolio beta with the corresponding amount of market index.
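The portfolio construction step can be sketched as follows. This assumes equal dollar positions per name and that each stock's beta is already known; the tickers, forecasts, and function name are hypothetical.

```python
def market_neutral_book(alpha_forecasts, betas, dollars_per_name=1.0):
    """Go long positive-alpha names, short negative-alpha names, hedge net beta.

    Returns the per-ticker dollar positions and the dollar position in the
    market index (negative means short) that offsets the remaining beta.
    """
    positions = {}
    for ticker, alpha in alpha_forecasts.items():
        if alpha > 0:
            positions[ticker] = dollars_per_name
        elif alpha < 0:
            positions[ticker] = -dollars_per_name
    net_beta_dollars = sum(pos * betas[t] for t, pos in positions.items())
    return positions, -net_beta_dollars


# Hypothetical forecasts and betas for three stocks.
forecasts = {"AAA": 0.010, "BBB": -0.020, "CCC": 0.005}
betas = {"AAA": 1.2, "BBB": 0.8, "CCC": 1.0}

positions, index_position = market_neutral_book(forecasts, betas)
print(positions)
print(index_position)  # short the index to offset the net long beta
```

Weighting positions by the size of the forecast instead of equally is an obvious variation, at the cost of leaning harder on the forecasts being right.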
This procedure is helpful for identifying stocks that have outperformed or underperformed. Since the strategy is market neutral, it works (in principle at least!) regardless of where the market is going.
Such a strategy could be used at a mutual fund to earn decent returns. However, I imagine it is of little interest to investment banks or hedge funds, since it is essentially a mixture of one-factor statistical arbitrage and trend following.
Thursday, June 30, 2011
Statistical Arbitrage: Part I
Statistical arbitrage is a type of algorithmic trading technique that relies on hedging all exposure to risk factors in order to profit from small, mean-reverting, predictable movements in security/currency/commodity prices. Put differently, statistical arbitrage is a quantitative system of trading one instrument against smaller amounts of many others, which allows the macroeconomic variables that typically affect the price of the first instrument to be hedged by similar exposure in the opposite direction from the others.
To do statistical arbitrage, we use a multivariate regression equation, which assumes that the instantaneous rate of change in an instrument's price, divided by its price, is equal to its alpha times the change in time, plus the sum over all i of a sensitivity 'beta sub i' times the current value of factor i, and finally, plus an error:
dP/P = alpha * dt + sum_i (beta_i * F_i) + error.
In some cases, if the alpha is low, it may be safely ignored, and the value of the instrument then depends exclusively on the betas, the risk factors, and the cointegration residual. By buying 'beta sub i' portfolios of the risk factor i (which can be described by a portfolio that is revealed by principal components analysis), for each risk factor i, we get a complete portfolio whose value is determined only by the cointegration residual of the single instrument and its alpha, which may or may not be zero. (If the cointegration residual has a nonzero drift, this will be our alpha).
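To show what the hedged portfolio's value looks like as a series, here is a minimal sketch. It assumes the betas have already been estimated by the regression and that the factor returns are available for each period; it then accumulates the return left over after the factor exposure is removed. The names and numbers are illustrative only.

```python
def residual_series(instrument_returns, factor_returns, betas):
    """Cumulative factor-hedged return: the synthetic series to be modeled.

    instrument_returns: one return per period.
    factor_returns: for each period, a list with one return per risk factor.
    betas: assumed sensitivities of the instrument to each factor.
    """
    level = 0.0
    series = []
    for r, factors in zip(instrument_returns, factor_returns):
        hedged = r - sum(b * f for b, f in zip(betas, factors))
        level += hedged
        series.append(level)
    return series


# Invented example with two risk factors.
returns = [0.012, -0.004, 0.007, 0.001]
factors = [[0.010, 0.002], [-0.005, 0.001], [0.004, 0.003], [0.000, -0.001]]
print(residual_series(returns, factors, betas=[1.1, 0.5]))
```

If this series trends steadily instead of oscillating, that trend is the nonzero drift that gets counted as alpha.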
This is an extremely simple procedure. We now have a portfolio whose value is a synthetic time series that we must model, and that's where approaches diverge. Ornstein-Uhlenbeck processes are the simplest, most robust, and most widely used model for describing the cointegration residual, but they may not be the best, since they make many assumptions about the time series without any data to support them. For instance, an Ornstein-Uhlenbeck process assumes that the synthetic time series is stationary, has a Gaussian distribution with a fixed mean, and reverts at a constant speed, with a pull that is linear in the distance from the mean.
Each of these assumptions can fail. First, a permanent change in the alpha value makes the cointegration residual non-stationary until the problem is corrected and the new drift is grouped as alpha (hence the need to regularly rerun the regression to separate out this drift, which is a problem in itself, since the sample is never large enough to perform the regression accurately). Next, if the distribution is non-Gaussian, it would be more efficient to capitalize on that than to theorize the issue away. Lastly, reversion speed may be better described by something other than distance. It may not actually be proportional to the distance from the mean if fewer market participants are willing to risk doing stat arb in an Extremistan (and not just out-of-equilibrium) environment, while more are willing stat arbitrageurs when the residual is small, pushing it too far out of equilibrium on the opposite side. And if, to avoid this, market participants avoid holding the instrument once it gets past a certain point--say a certain z-score--we would experience a situation where instruments bounce back (at least occasionally) when they get too close to equilibrium.
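For reference, the Ornstein-Uhlenbeck baseline is usually calibrated by regressing the residual on its own previous value. The sketch below does that under the standard discretization X[t+1] = a + b * X[t] + noise, from which the reversion speed, the long-run mean, and the equilibrium standard deviation follow. The function name and the toy series are my own, and it inherits every assumption criticized above.

```python
import math


def fit_ou(series, dt=1.0):
    """Estimate OU parameters from an AR(1) regression on the series.

    Returns (kappa, mean, sigma_eq): mean-reversion speed, long-run mean,
    and the standard deviation of the stationary distribution.
    """
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((xv - mean_x) * (yv - mean_y) for xv, yv in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    if not 0.0 < b < 1.0:
        raise ValueError("series does not look mean-reverting under this model")
    resid_var = sum((yv - a - b * xv) ** 2 for xv, yv in zip(x, y)) / max(n - 2, 1)
    kappa = -math.log(b) / dt
    mean = a / (1.0 - b)
    sigma_eq = math.sqrt(resid_var / (1.0 - b * b))
    return kappa, mean, sigma_eq


# Toy residual that closes half of its gap to 1.0 every period.
toy = [0.0]
for _ in range(7):
    toy.append(0.5 + 0.5 * toy[-1])

kappa, mean, sigma_eq = fit_ou(toy)
print(kappa, mean, sigma_eq)
```

On real residuals, the usual trading signal is the z-score (latest value minus mean, divided by sigma_eq), which is exactly where the non-Gaussian and nonlinear-reversion objections start to bite.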
Of course I don't know what would happen and how it would happen, but Bayesian Networks have more promise in this regard than their rivals, linear stochastic differential equations. Nonlinear Fokker-Planck Equations seem to have some potential for replacing black box predictions like those of Artificial Neural Networks or Hidden Markov Models, but the issue is that there is no great way to derive the 'correct' formula. Either way, we must make assumptions as to the structure of such an equation before we can try to calibrate it.
That all being said, and knowing I'm a fan of hidden Markov models, I would highly recommend those the most for a stat arb prediction engine.
Tuesday, June 28, 2011
Hidden Markov Models: Part II
Since hidden Markov models help researchers to find what sorts of observations tend to come after other types of observations, we have a situation where we can forecast stock behavior once we know what hidden state the security/commodity/currency/synthetic time series was most recently in. (By a synthetic time series, I mean some type of hedged position with a single value being bet on; cointegration residuals in statistical arbitrage, implied volatilities after delta hedging, and PCA-derived risk factor values.)
To successfully deploy capital using an HMM-driven technique, one needs to avoid overfitting the model to the data. I struggled with this. Closing prices from each day turned out to be too unpredictable to work, because the more time passed, the less the older patterns mattered; consequently, I was forced to limit the size of the training sequence I was using, so that the hidden Markov model would only bother trying to learn from the relevant data. Unfortunately, a training set of only 40 days is too small to work well, yet 50 days is pushing it, and anything longer is too outdated for any daily trading model.
The way around the problem was to use higher frequency data--because it would be relevant while still providing a wealth of information and hidden patterns. Besides all that, higher frequency data allows for more predictions to be made in any given day, and thus limits volatility in portfolio returns. (Real time prices are available through professional brokerage or subscriptions to specific Reuters or Yahoo Finance services. Delayed, but regularly updated prices are available on Google Finance, and no subscription is necessary.)
In order for a hidden Markov model--or any statistical strategy--to work, the trading techniques must be used many, many times. As the number of times the strategy is used increases, the variability in the strategy's overall success decreases, and the strategy has more potential for a clean statistical edge to shine through. Conversely, if only a few instruments are held as a portfolio, the portfolio's return is less certain. Trading a few instruments with a prediction algorithm is like going spearfishing with a toothpick. It really is that impractical.
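The arithmetic behind this is worth a quick sketch. If each bet has the same small expected edge and the same volatility, and the bets are independent (a strong assumption that real trades only approximate), the noise in the average outcome shrinks with the square root of the number of bets. The numbers below are invented.

```python
import math


def edge_to_noise(edge_per_bet, vol_per_bet, n_bets):
    """Ratio of the average edge to the standard error of the average outcome.

    Assumes independent bets with identical edge and volatility.
    """
    standard_error = vol_per_bet / math.sqrt(n_bets)
    return edge_per_bet / standard_error


# A 0.1% edge per bet against 2% volatility per bet.
for n in (1, 100, 10000):
    print(n, edge_to_noise(0.001, 0.02, n))
```

With a single bet the edge is buried in noise; only after thousands of bets does it stand several standard errors clear of zero.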
Also, if you know how to do something with a synthetic time series, do it. There tends to be much less variability in outcomes when unwanted risk factors are hedged, and thus much less uncertainty regarding the hidden Markov model's predictive ability.
Monday, June 13, 2011
Event Arbitrage
"I don't throw darts at a board. I bet on sure things. Read Sun-tzu, The Art of War. Every battle is won before it is ever fought." - Gordon Gekko
I don't mean that the way Gekko did, but the idea is the same with high frequency event arbitrage. By the time ordinary investors have reacted to the news, the 'battle' is over and algorithmic traders have already adjusted the price. The trick is to be the one who gets there first--to be the one who has the fastest news feed, the fastest information extraction algorithms, the fastest execution. As you can imagine, this is exactly what many quantitative hedge funds, proprietary trading desks, and other high frequency traders do--they compete for speed.
It's predictable money. Whoever can both identify a news article and interpret it before the general market has reacted, stands a good chance of making money. The theories behind quantitative finance usually ignore such a possibility, because it implies there is a so called "free lunch" for the traders who can capitalize on it. Since high frequency trading--and HFT event arbitrage in particular--is like picking stocks when you know what they will do next, this puts the efficient market hypothesis and the random walk hypothesis into question.
So the real trick is to develop a class of algorithms to deal with each possible kind of corporate event (including earnings reports, earnings outlook, mergers and acquisitions, and analyst rating changes). There are a few ways to do the information extraction that is necessary to turn news articles into algorithmic trading signals. The first is to set up the keyword and/or 'keyphrase' search lists by hand, meaning that the article is scanned for relevant words and the sentiment attached to them. The second is more complicated: use unsupervised machine learning to connect keywords and keyphrases to actual market performance of equities at the time the news is released. To do this, algorithmic trading firms commonly group articles together using clustering algorithms like the segmental k-means algorithm.
The first method is easy, and its algorithms are fast. The second method yields a better-quality interpretation of news articles, though it is somewhat slow in comparison. In practice, quant funds usually use a news feed that has already been parsed by such algorithms, and is complete with XML tags to describe what the article is reporting.
The rest of this post will be about the first method (it being relatively easy to describe!). For a list to identify earnings-related stories, the words might be:
Analyst
analyst
Analysts
analysts
Analysts'
analysts'
Earnings
earnings
EPS
Estimate
estimate
Estimates
estimates
Expect
expect
Expected
expected
Expects
expects
Expectations
expectations
Forecast
forecast
Forecasts
forecasts
Guidance
guidance
Income
income
Outlook
outlook
Prediction
prediction
Predictions
predictions
Profit
profit
Profits
profits
For a good earnings piece, the words might be:
Above
above
Amaze
amaze
Amazing
amazing
Beat
beat
Beyond
beyond
"Book a profit"
"book a profit"
"Booked a profit"
"booked a profit"
"Books a profit"
"books a profit"
"Books profit"
"books profit"
"Booking a profit"
"booking a profit"
Confidence
confidence
Confident
confident
Exceed
exceed
Exceeded
exceeded
Good
good
Great
great
Greatly
greatly
Grow
grow
Growing
growing
Growth
growth
High
high
Higher
higher
Highest
highest
Improved
improved
Improving
improving
Increase
increase
Increasing
increasing
Optimistic
optimistic
Outperformed
outperformed
Positive
positive
Raise
raise
Raised
raised
Raises
raises
Raising
raising
Rise
rise
Rising
rising
Rose
rose
Soar
soar
Soars
soars
Soaring
soaring
Strength
strength
Strengthen
strengthen
Strengthens
strengthens
Strengthening
strengthening
Strong
strong
Succeeded
succeeded
Success
success
Successes
successes
Surge
surge
Surging
surging
Up
up
"Up Huge"
"Up huge"
"up huge"
Underestimated
underestimated
For a bad earnings piece, the words might be:
Bad
bad
Badly
badly
Below
below
"Book a loss"
"book a loss"
"Booked a loss"
"booked a loss"
"Books a loss"
"books a loss"
"Booking a loss"
"booking a loss"
Cost
cost
Costs
costs
Cut
cut
Cuts
cuts
Decrease
decrease
Decreases
decreases
Decreasing
decreasing
Disappoint
disappoint
Disappoints
disappoints
Disappointed
disappointed
Disappointing
disappointing
Disappointment
disappointment
Down
down
Drop
drop
Dropped
dropped
Dropping
dropping
Fail
fail
Failed
failed
Failing
failing
Fall
fall
Falling
falling
Fell
fell
Hit
hit
Hurt
hurt
Hurts
hurts
Hurting
hurting
"Increased cost"
"increased cost"
"Increased costs"
"increased costs"
"Increasing cost"
"increasing cost"
"Increasing costs"
"increasing costs"
"Increased * cost"
"increased * cost"
"Increased * costs"
"increased * costs"
"Increasing * cost"
"increasing * cost"
"Increasing * costs"
"increasing * costs"
Lack
lack
Lacks
lacks
Lacking
lacking
Less
less
Loss
loss
Low
low
Lower
lower
Lowest
lowest
Negative
negative
Overestimated
overestimated
Poor
poor
Poorly
poorly
Problem
problem
Problems
problems
Reduce
reduce
Reduces
reduces
Reduced
reduced
Reducing
reducing
Shrank
shrank
Shrink
shrink
Shrunk
shrunk
Struggle
struggle
Struggles
struggles
Struggling
struggling
Under
under
Underperform
underperform
Underperformed
underperformed
Underperforming
underperforming
Worse
worse
Worsening
worsening
Worst
worst
Of course, this is child's play compared to what classification algorithms are capable of. One nice way to start an information extraction system is with Mahout, a Java library specializing in machine learning and commonly used with Hadoop, a framework for data-intensive applications requiring distributed processing.
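For what it's worth, the hand-built keyword method can be sketched in a few lines of Python. The sets below are abbreviated samples from the lists above, the scoring rule (good hits minus bad hits) is my own simplification, and the wildcard phrases such as "increased * cost" are handled with a regular expression that allows one optional word in the middle.

```python
import re

# Abbreviated samples of the word lists; a real system would load them in full.
EARNINGS_TERMS = {"analyst", "analysts", "earnings", "eps", "estimate",
                  "estimates", "forecast", "guidance", "outlook", "profit",
                  "profits"}
GOOD_TERMS = {"above", "beat", "exceeded", "higher", "raised", "rose",
              "strong", "surge"}
BAD_TERMS = {"below", "cut", "cuts", "disappointing", "fell", "loss",
             "lower", "worse"}
# "increased cost(s)" or "increasing cost(s)", with at most one word between.
BAD_PHRASES = [re.compile(r"\bincreas(?:ed|ing)\s+(?:\w+\s+)?costs?\b")]


def score_article(text):
    """Return good-minus-bad hit count, or None if not earnings-related."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if not any(w in EARNINGS_TERMS for w in words):
        return None
    good = sum(w in GOOD_TERMS for w in words)
    bad = sum(w in BAD_TERMS for w in words)
    bad += sum(len(p.findall(lowered)) for p in BAD_PHRASES)
    return good - bad


print(score_article("Acme earnings beat estimates, profit rose"))
print(score_article("Acme cuts outlook as profit fell on increased fuel costs"))
print(score_article("Acme opens new store"))
```

Lowercasing the text before matching means the lists no longer need a capitalized and an uncapitalized copy of every word.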
Sunday, June 5, 2011
Hidden Markov Models
Hidden Markov models have been proven successful for speech recognition, and their success carries over to the prediction of financial time series. According to Patterson's The Quants, and Mallaby's More Money Than God, Renaissance Technologies owes a great deal of their success to hidden Markov models. Research by academics in this paper and this other paper further validated the financial utility of hidden Markov models, and papers such as this one demonstrated their superiority over GARCH(1, 1) models for accurate volatility modeling.
The major issue with using hidden Markov models to predict financial time series is that we are trying to forecast the inherently chaotic. Put differently, forcing HMMs to learn from raw financial data is not always the best idea because it forces them to learn to try and predict the outcome of Brownian motion. On the other hand, that's what information theory is supposed to be about--detecting and predicting signals through a 'noisy' passageway. So while HMMs can still certainly be used on financial data, it's a bit much to ask. The one glaring exception to this is the use of high frequency data, which contains more data and hence is more likely to contain some pattern or other that daily or longer-term data does not reveal. So if dealing with daily or longer-term data, it's a lot easier to do something that eliminates market noise and results in a more statistically calm, pattern-containing time series. Some such methods for hidden Markov models include statistical arbitrage, volatility arbitrage, correlation forecasting, and volume prediction.
Of course, HMMs can also be used even less directly, for instance in information extraction: pulling structured information out of human-written news articles, such as those on Reuters.com. (My next post will discuss this briefly, and the one after that will talk about other information extraction methods.)
But the most fruitful, direct application of HMMs is in high frequency trading. Because they inherently sort returns into groups (the underlying 'states,' each with its own probability distribution over observed returns), hidden Markov models can separate out statistically different price movements the same way they can distinguish between vowels and consonants in a two-state model. Put differently, the way the underlying states follow one another means that even if observed returns are uncorrelated with each other across time, they may be related in a more subtle way: one type of return may be followed by another particular type of return more often than by returns of any other type. I'll leave the rest up to the reader's imagination and programming skills.
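To make the "uncorrelated but still related" point concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the two states (calm and volatile), the transition matrix, the Gaussian emissions, and the function names `simulate_hmm`, `lag1_autocorr`, and `forward_filter` are all hypothetical, and the data is simulated rather than taken from any market. It simulates returns whose mean is zero in both states, then runs the standard scaled forward algorithm, with the true parameters assumed known, to estimate which state each return came from.

```python
import numpy as np

def simulate_hmm(n, trans, means, stds, seed=0):
    """Simulate n returns from a Gaussian HMM; returns (states, returns)."""
    rng = np.random.default_rng(seed)
    k = len(means)
    states = np.zeros(n, dtype=int)  # start in state 0
    for t in range(1, n):
        # Next state is drawn from the row of the current state
        states[t] = rng.choice(k, p=trans[states[t - 1]])
    returns = rng.normal(means[states], stds[states])
    return states, returns

def lag1_autocorr(x):
    """Sample correlation between x[t] and x[t+1]."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

def forward_filter(returns, trans, means, stds, init):
    """Scaled forward algorithm: P(state at t | returns up to t)."""
    probs = np.empty((len(returns), len(means)))
    prior = init
    for t, r in enumerate(returns):
        # Gaussian likelihood of the observed return under each state
        lik = np.exp(-0.5 * ((r - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
        post = prior * lik
        probs[t] = post / post.sum()
        prior = probs[t] @ trans  # one-step-ahead state prediction
    return probs

# Hypothetical parameters: state 0 is calm, state 1 is volatile
trans = np.array([[0.95, 0.05],
                  [0.10, 0.90]])
means = np.array([0.0, 0.0])
stds = np.array([0.005, 0.02])

states, returns = simulate_hmm(5000, trans, means, stds, seed=1)
ret_ac = lag1_autocorr(returns)          # close to zero
abs_ac = lag1_autocorr(np.abs(returns))  # clearly positive
probs = forward_filter(returns, trans, means, stds, np.array([0.5, 0.5]))
accuracy = np.mean(probs.argmax(axis=1) == states)

print(f"lag-1 autocorrelation of returns:   {ret_ac:.3f}")
print(f"lag-1 autocorrelation of |returns|: {abs_ac:.3f}")
print(f"share of states identified:         {accuracy:.3f}")
```

The raw returns show essentially no serial correlation, yet their magnitudes do, because the hidden state persists from one step to the next. In practice the parameters would have to be estimated (for example with Baum-Welch) rather than assumed, which is where most of the difficulty lies.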
Wednesday, June 1, 2011
A New Kind of Global Macro
This is going to be kind of esoteric because I'm holding my cards close to the vest.
Imagine:
Since the entire basis of global macro is risk management, what if we could put portfolios together one risk factor at a time? Wouldn't that be interesting? Risk exposure is the beginning and the end of all global macro strategy. So why wait until the end to integrate it? It would be much better to bet on the risk factors right off the bat. Why focus on certain commodities, currencies, and corporate securities (or their respective derivatives)? That was never the point of global macro. We care about capitalizing on macroeconomic changes anyway. Why get exposure to those changes from a few instruments when you could go right to the source and just buy the factor itself?
It's simpler. It's more efficient. It's more potent. It's better diversified.
So the answer is to construct portfolios consisting of nothing but a few uncorrelated risk factors, held in whatever relative proportions are desired. The method driving the prediction of risk factors is not the issue. That is for another time. What we're concerned with is portfolio allocation once the decisions have been made.
And best of all? Portfolio optimization is easy. The risk factors are approximated by baskets of many instruments, and the instruments each have a covariance with each other. But that has already been addressed by principal components analysis. We can merely treat each factor as an instrument to be bought. (This is reasonable, because it is like buying a stock, which is a basket of risk factors, only these have non-zero sensitivity to just one risk factor.) Once we have bought these baby, de-facto 'stocks,' we realize that they have no correlation with each other at all. Thus, the cross-covariance terms inside the first sigma in the Markowitz Mean-Variance model are eliminated, leaving only each factor's own variance, and the second one is easily maximized by investing the most money where the highest return is expected. This is similar to setting our risk aversion parameter to zero (though we have done nothing of the kind), since the risk trade-off between factors is now trivial.
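As a rough illustration of why the optimization becomes easy, the sketch below (Python with NumPy) builds principal-component baskets from synthetic returns and checks that their covariance matrix is diagonal. The data, the expected factor returns `mu`, and the `risk_aversion` value are all made up for the example. It also assumes the common objective "maximize w·mu minus (risk_aversion / 2) times portfolio variance", which the post does not spell out, so treat the closed-form weights as one possible reading rather than the definitive method.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic daily returns: 6 instruments driven by 2 common drivers plus noise
drivers = rng.normal(0.0, 0.01, size=(1000, 2))
loadings = rng.normal(0.0, 1.0, size=(2, 6))
returns = drivers @ loadings + rng.normal(0.0, 0.002, size=(1000, 6))

# PCA via eigendecomposition of the sample covariance matrix
cov = np.cov(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top 2 components; each column is a basket of instruments
baskets = eigvecs[:, :2]
factor_returns = (returns - returns.mean(axis=0)) @ baskets
factor_cov = np.cov(factor_returns, rowvar=False)
factor_vars = np.diag(factor_cov)

# Hypothetical expected daily returns for the two factor baskets
mu = np.array([0.0005, 0.0002])
risk_aversion = 10.0

# With a diagonal covariance matrix the problem separates factor by factor
weights = mu / (risk_aversion * factor_vars)

print("factor covariance matrix:\n", factor_cov)
print("factor weights:", weights)
```

Because the off-diagonal entries are zero up to floating-point error, each weight depends only on that factor's own expected return and variance, with no matrix inversion required.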
This may seem dangerous, because the allocation weights are not constrained, but a rule can be added separately that sets a maximum position size in any given risk factor, as well as a maximum position variance and/or maximum portfolio variance. On top of that, we can use VaR systems and stress testing like any global macro fund would.
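One simple way to bolt such limits onto unconstrained weights is sketched below in Python with NumPy. The function name `apply_risk_limits`, the limit values, and the input numbers are hypothetical, and the clip-then-rescale rule is a heuristic rather than the solution of a properly constrained optimization. It assumes the factors are uncorrelated, so portfolio variance is just the sum of squared weights times factor variances.

```python
import numpy as np

def apply_risk_limits(weights, factor_vars, max_weight, max_port_var):
    """Cap each position, then scale down if portfolio variance is too high."""
    # Rule 1: no single factor position larger than max_weight (long or short)
    w = np.clip(weights, -max_weight, max_weight)
    # Uncorrelated factors: variance is a plain weighted sum, no cross terms
    port_var = np.sum(w ** 2 * factor_vars)
    # Rule 2: shrink all positions proportionally to meet the variance cap
    if port_var > max_port_var:
        w = w * np.sqrt(max_port_var / port_var)
    return w

# Hypothetical unconstrained weights and factor variances
raw_weights = np.array([3.0, -0.5, 1.2])
factor_vars = np.array([0.04, 0.01, 0.09])

limited = apply_risk_limits(raw_weights, factor_vars,
                            max_weight=1.0, max_port_var=0.05)
limited_var = np.sum(limited ** 2 * factor_vars)

print("limited weights:", limited)
print("portfolio variance:", limited_var)
```

Scaling every position by the same factor keeps the relative bets intact while meeting the variance cap, and since the scale is never greater than one, the per-position cap still holds afterwards.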
The exact method for predicting factors varies considerably. There are several approaches that could work, including the John Paulson approach, the Soros approach, and the Robert Frey approach.