Market Model Thingy

CallMeIshmael · #1 10-31-2007, 11:45 PM

As a preface, I'll note that I have very little knowledge of the market, or anything to do with finance. So, some things I've implicitly assumed could be very silly in the eyes of some of you. I've done a lot of biological modelling, and thought I'd take a stab at a market model.

I pulled data from Yahoo for the NYSE over the past 7.5 years. The sample size ended up being 1747 stocks. Stocks with fewer than 2050 trading days, or ones with a dash in the name (they messed with my download script, and I figured leaving them out wasn't a big deal), weren't included.

I used a 41-term model of various stats regarding past price and volume for each stock over the previous 50 trading days. The equation was designed to predict the ratio of the next day's stock price to its current price. The data was cut into pieces for training purposes, i.e. remove days 2000->1800, then regress for coefficients using the remaining data, then use those coefficients to predict market changes over days 2000->1800. Repeat until done.
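The hold-out loop described above can be sketched roughly as follows. This is a minimal illustration rather than the original script: it assumes an ordinary least-squares fit via NumPy, one row per observation ordered by trading day, and made-up data. The names `blocked_holdout_predictions`, `X`, `y` and `block_size` are hypothetical.

```python
import numpy as np

def blocked_holdout_predictions(X, y, block_size=200):
    """Predict each block of rows using coefficients fit on all other rows.

    X: (n_rows, n_terms) array of model terms, rows ordered by trading day.
    y: (n_rows,) array of next-day price ratios.
    """
    n_rows = len(y)
    predictions = np.empty(n_rows)
    for start in range(0, n_rows, block_size):
        stop = min(start + block_size, n_rows)
        keep = np.ones(n_rows, dtype=bool)
        keep[start:stop] = False  # remove the held-out block from training
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        predictions[start:stop] = X[start:stop] @ coef
    return predictions

# Synthetic stand-in data (not real prices): 3 terms plus a constant column.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=(1000, 3)), np.ones(1000)])
true_coef = np.array([0.002, -0.001, 0.0005, 1.0])
y = X @ true_coef + rng.normal(scale=0.01, size=1000)
predictions = blocked_holdout_predictions(X, y)
```

Note that each held-out block is predicted from coefficients fit on data both before and after it, so this is a hold-out check and not a simulation of trading forward in time.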

I tested it in two ways. 1) Examine the results of the top 5 picks for each trading day 2) For each trading day, predict the 100 stocks with the highest percentage increase and then see how many match the actual top 100 performers.
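For a single trading day, the two tests might look something like the sketch below. It assumes the picks are equally weighted (the post does not say), and the function names and the four-stock example are invented for illustration.

```python
def top_k(values, k):
    """Indices of the k largest values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]

def top_pick_return(predicted, actual, k=5):
    """Test 1: mean actual price ratio of the k stocks the model ranks highest."""
    picks = top_k(predicted, k)
    return sum(actual[i] for i in picks) / k

def top_k_overlap(predicted, actual, k=100):
    """Test 2: how many of the model's top k are among the actual top k."""
    return len(set(top_k(predicted, k)) & set(top_k(actual, k)))

# Made-up ratios for four stocks on one day.
predicted = [1.01, 0.99, 1.03, 1.00]
actual = [1.02, 0.98, 1.00, 1.05]
print(top_pick_return(predicted, actual, k=2))
print(top_k_overlap(predicted, actual, k=2))
```

Averaging these two numbers over every trading day gives the kind of summary figures reported in the post.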

1) Showed promising results. The arithmetic mean of the daily returns was 1.0060 (not sure if this is how it's said, but, on average, the stocks multiplied themselves by 1.0060), and the geometric mean of the daily returns was 1.0051. To compare: the arithmetic mean return for the entire data set was 1.0010. Calculating the geometric mean for the entire data set would be hard, but it's going to be a ways under 1.0051.
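The difference between the two means matters because only the geometric mean describes how money compounds. A small sketch with made-up daily ratios (the numbers are not from the data set):

```python
import math

def arithmetic_mean(ratios):
    return sum(ratios) / len(ratios)

def geometric_mean(ratios):
    # exp of the mean log ratio; equals the n-th root of the product
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

daily_ratios = [1.02, 0.99, 1.01, 0.98, 1.03]  # invented example values
am = arithmetic_mean(daily_ratios)
gm = geometric_mean(daily_ratios)

# Compounding the geometric mean over n days reproduces the actual growth.
growth = math.prod(daily_ratios)
print(am, gm, gm ** len(daily_ratios), growth)
```

The geometric mean is never larger than the arithmetic mean, and the gap widens as the daily ratios become more volatile, which is consistent with 1.0051 coming in under 1.0060.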

2) The test showed an average of 13.11 picks out of 100 matching the actual top 100 performers. This is significantly greater than the random expectation of 5.72.
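The 5.72 figure is what you get if 100 stocks are picked uniformly at random from 1747: each pick has a 100/1747 chance of being a top-100 performer, so the expected overlap is 100 * 100 / 1747. A quick check, with a small simulation alongside (the function name and trial count are arbitrary choices):

```python
import random

def expected_random_overlap(n_stocks, n_picks=100, n_top=100):
    """Expected size of the overlap when picks are chosen at random."""
    return n_picks * n_top / n_stocks

print(round(expected_random_overlap(1747), 2))

# Monte Carlo cross-check: label stocks 0..99 as the "true" top 100.
rng = random.Random(0)
top = set(range(100))
trials = 2000
simulated = sum(
    len(top & set(rng.sample(range(1747), 100))) for _ in range(trials)
) / trials
print(simulated)
```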

For something that took only maybe 10 days to put together, and a model that took all of 10 minutes to think of, these results seem surprisingly good. Given the debate over the EMH, they seem too good to be true. But I've checked the code several times over, and if I made a mistake, I can't find it.

I haven't tested the obvious questions: how much would it make / could it beat a buy-and-hold strategy? The reason being, I don't know how to compute transaction costs. How much are they? Do they go up with the number of stocks, or is it just a constant cost per trade? Do you pay when you buy and sell, or just when you buy? If someone could help me out with those, that would be appreciated. Also, at what level of investment does the assumption that the stocks would have behaved the way they did without that investment stop being valid?

Also, this has sort of sparked an interest in doing a project like this, but actually putting some time into the model. Can anyone recommend some good reading?

Jimbo · #2 11-01-2007, 12:01 AM

Go back 7 years and see that your results change radically.

Jimbo

CallMeIshmael · #3 11-01-2007, 04:15 AM

[ QUOTE ]
Go back 7 years and see that your results change radically.

Jimbo

[/ QUOTE ]

Hmm. I'm not sure if there is some hidden meaning here (i.e. referencing some market trend that I'm unaware of), so I'll take it at face value.

I did the calculations for stocks with > 4050 trading days, analyzing days 4000->2000 (ie. roughly spanning 7.5 to 15 years ago).

Although it wasn't quite as successful, it still outperformed the market by quite a bit (geometric mean of 1.0026).

Also, it got 15.97 right, on average, for the top 100 picks of the day, which is better than the 10.3 you'd expect at random.

Also, the model was trained on days 2000->1, meaning that in some cases the model was using training data from up to 15 years after the test day. Arguably, part of the difference could be the result of that.

hawk59 · #4 11-01-2007, 09:24 AM

When you have a lot of data and you try to do things with it you will find lots of relationships, most of which have no predictive ability. It's called data mining. I think a lot of the time you just have to ask yourself if it intuitively makes sense, ie you can say a low price/book strategy makes sense to outperform or buying tax loss stocks at the end of the year will outperform in the next year. But when you have 41 variables all mashed up and you think you have something meaningful then that wouldn't make sense to me.

Phone Booth · #5 11-01-2007, 09:51 AM

[ QUOTE ]
As a preface, I'll note that I have very little knowledge of the market, or anything to do with finance. So, some things I've implicitly assumed could be very silly in the eyes of some of you. I've done a lot of biological modelling, and thought I'd take a stab at a market model.

I pulled data from Yahoo for the NYSE over the past 7.5 years. The sample size ended up being 1747 stocks. Stocks with fewer than 2050 trading days, or ones with a dash in the name (they messed with my download script, and I figured leaving them out wasn't a big deal), weren't included.

I used a 41-term model of various stats regarding past price and volume for each stock over the previous 50 trading days. The equation was designed to predict the ratio of the next day's stock price to its current price. The data was cut into pieces for training purposes, i.e. remove days 2000->1800, then regress for coefficients using the remaining data, then use those coefficients to predict market changes over days 2000->1800. Repeat until done.

I tested it in two ways. 1) Examine the results of the top 5 picks for each trading day 2) For each trading day, predict the 100 stocks with the highest percentage increase and then see how many match the actual top 100 performers.

1) Showed promising results. The arithmetic mean of the daily returns was 1.0060 (not sure if this is how it's said, but, on average, the stocks multiplied themselves by 1.0060), and the geometric mean of the daily returns was 1.0051. To compare: the arithmetic mean return for the entire data set was 1.0010. Calculating the geometric mean for the entire data set would be hard, but it's going to be a ways under 1.0051.

2) The test showed an average of 13.11 picks out of 100 matching the actual top 100 performers. This is significantly greater than the random expectation of 5.72.

For something that took only maybe 10 days to put together, and a model that took all of 10 minutes to think of, these results seem surprisingly good. Given the debate over the EMH, they seem too good to be true. But I've checked the code several times over, and if I made a mistake, I can't find it.

I haven't tested the obvious questions: how much would it make / could it beat a buy-and-hold strategy? The reason being, I don't know how to compute transaction costs. How much are they? Do they go up with the number of stocks, or is it just a constant cost per trade? Do you pay when you buy and sell, or just when you buy? If someone could help me out with those, that would be appreciated. Also, at what level of investment does the assumption that the stocks would have behaved the way they did without that investment stop being valid?

Also, this has sort of sparked an interest in doing a project like this, but actually putting some time into the model. Can anyone recommend some good reading?

[/ QUOTE ]

What kind of assumptions are you using for execution prices? This is really the key for this type of analysis - a lot of great relationships aren't tradable.

I know people will give you a hard time here but all these qualitative insights and patterns people write about and trade on have a much flimsier basis (this stock always does this at time X; when fed cuts, this happens; these sectors are correlated in this way, etc, etc).

Jimbo · #6 11-01-2007, 11:33 AM

I find your analysis quite interesting but can't quite work out exactly what you mean by the bolded portion in the quote below. In other words, what are you calling "the market"?

[ QUOTE ]
Although it wasn't quite as successful, it still outperformed the market by quite a bit (geometric mean of 1.0026).

[/ QUOTE ]

Thanks in advance for the interesting topic,

Jimbo

CallMeIshmael · #7 11-01-2007, 01:46 PM

[ QUOTE ]
When you have a lot of data and you try to do things with it you will find lots of relationships, most of which have no predictive ability. It's called data mining. I think a lot of the time you just have to ask yourself if it intuitively makes sense, ie you can say a low price/book strategy makes sense to outperform or buying tax loss stocks at the end of the year will outperform in the next year. But when you have 41 variables all mashed up and you think you have something meaningful then that wouldn't make sense to me.

[/ QUOTE ]

Obviously predictive ability is the key here; I certainly agree with that. Regressing data means nothing if what is about to happen isn't related to what has already happened.

But the model produced guesses for the top 100 movers for each day over 15 years that were 19.96 and 26.30 (for each 2000-day period) standard deviations above what you'd expect if the model had no predictive ability. (Technically speaking it's not binomial, so this isn't 100% correct, but it's close, and doing it right won't change the conclusion.)

Also, at least to someone without much knowledge of the market, a model like this does make intuitive sense. How a stock is doing today relative to its recent past prices, its max/min over the previous 2 months, and how much volume it's been trading over the past few days all seem like they should have a small correlation with how it will do today.

(just to note: 41 = 10 variables * 4 transformations (x,x^2,ln(x),log10(x)) + 1 constant)
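To make the term count concrete, here is a small sketch of how ten raw variables could be expanded into the 41 regression terms. The function name and the example values are invented, and it assumes every variable is positive so that the logarithms are defined.

```python
import math

def expand_terms(variables):
    """Expand each variable x into x, x^2, ln(x), log10(x), then add a constant."""
    terms = []
    for x in variables:
        terms.extend([x, x * x, math.log(x), math.log10(x)])
    terms.append(1.0)  # the constant term
    return terms

raw = [1.0 + 0.1 * i for i in range(10)]  # ten made-up positive values
row = expand_terms(raw)
print(len(row))
```

One thing the sketch makes visible: log10(x) is just ln(x) divided by ln(10), so those two columns are exact multiples of each other. A least-squares solver may still return coefficients for both, but the fit cannot tell them apart, so the model is effectively using 31 independent terms.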

CallMeIshmael · #8 11-01-2007, 01:59 PM

Phone,

"What kind of assumptions are you using for execution prices?"

Sorry, what is an execution price?

Jimbo,

I ran three tests to see how the model compared to market fluctuation.

Imagine we have some amount of money, and opt to buy the top 5 picks of the model every day. Assume we use all of our money (i.e. just for the sake of testing, assume you can buy fractions of stocks), and always buy at open and sell at close. Compare this to:

1) (Amount of money we started with) * (Total Market Value of Stocks at End) / (Total Market Value of Stocks at Start)

(ie. How much the entire test market went up)

2) Randomly buy 5 stocks on day 1, and hold them for 2000 days. (Do this test 1 million times)

3) Use the strategy of buying/selling 5 each day, but do so at random. (Do this test 1 million times)

By "outperforming the market" I mean that the model did better than the market in test 1, and landed in a very high percentile of the 1 million results for tests 2 and 3. Basically, I want to make sure any increase seen isn't just the result of the average stock price going up.
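The two random baselines (tests 2 and 3) could be simulated along these lines. This is only a sketch on invented data with far fewer than 1 million trials; it assumes money is split equally across the 5 stocks, which the post does not specify, and `model_result` is a placeholder rather than a real result.

```python
import random

def random_daily_picks(ratios, n_picks, rng):
    """Test 3: each day, split all money equally across n_picks random stocks."""
    wealth = 1.0
    for day in ratios:
        picks = rng.sample(range(len(day)), n_picks)
        wealth *= sum(day[i] for i in picks) / n_picks
    return wealth

def random_buy_and_hold(ratios, n_picks, rng):
    """Test 2: buy n_picks random stocks on day 1 (equal money in each) and hold."""
    picks = rng.sample(range(len(ratios[0])), n_picks)
    finals = []
    for i in picks:
        growth = 1.0
        for day in ratios:
            growth *= day[i]
        finals.append(growth)
    return sum(finals) / n_picks

def percentile_of(value, baseline):
    """Percentage of baseline results that fall strictly below value."""
    return 100.0 * sum(b < value for b in baseline) / len(baseline)

rng = random.Random(0)
# Made-up data: 250 days x 50 stocks of daily price ratios near 1.0.
ratios = [[1.0 + rng.gauss(0.0005, 0.02) for _ in range(50)] for _ in range(250)]
hold_results = [random_buy_and_hold(ratios, 5, rng) for _ in range(1000)]
daily_results = [random_daily_picks(ratios, 5, rng) for _ in range(1000)]

model_result = 1.5  # placeholder for the model's final wealth multiple
print(percentile_of(model_result, hold_results))
print(percentile_of(model_result, daily_results))
```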

The model produced results that were WAY WAY above average but, given that I didn't take transaction costs into account, the usefulness of the model is still unknown.

haakee · #9 11-01-2007, 02:07 PM

[ QUOTE ]
I find your analysis quite interesting but can't quite work out exactly what you mean by the bolded portion in the quote below. In other words, what are you calling "the market"?

[ QUOTE ]

Although it wasn't quite as successful, it still outperformed the market by quite a bit (geometric mean of 1.0026).

[/ QUOTE ]

[/ QUOTE ]

Does that really matter? Excluding commissions, 1.0026/day is about 85%/year, assuming 240 trading days.

You'd need pretty big individual positions if you're trading every day through a discount online broker. At $10/trade if you just picked one of your 100 per day you'd eat up close to $5000 in commissions in a year. If your positions were $50K each that would still be 10% of your annual return.
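The arithmetic behind these rough figures can be reproduced as below. The two-trades-per-day count (one buy, one sell) is an assumption that seems to match the "close to $5000" figure, and the position is treated as a fixed $50K rather than compounding.

```python
daily_ratio = 1.0026
trading_days = 240
annual_return = daily_ratio ** trading_days - 1  # compounded, before costs

commission_per_trade = 10.0
trades_per_day = 2  # assumption: one buy and one sell each day
annual_commissions = commission_per_trade * trades_per_day * trading_days

position = 50_000.0
gross_profit = position * annual_return
commission_share = annual_commissions / gross_profit

print(f"annual return: {annual_return:.1%}")
print(f"annual commissions: ${annual_commissions:,.0f}")
print(f"commissions as share of gross profit: {commission_share:.1%}")
```

Under these assumptions the compounded return lands in the mid-80s percent, commissions come to $4,800, and they take a bit over a tenth of the gross profit, in line with the estimates above.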

Phone Booth · #10 11-01-2007, 02:28 PM

[ QUOTE ]
Phone,

"What kind of assumptions are you using for execution prices?"

Sorry, what is an execution price?

[/ QUOTE ]

What prices are you buying and selling at?