
Algorithm For Finding Streaks


gaming_mouse
04-03-2006, 02:28 PM
I'm trying to improve the simulation program for finding downswings over X hands, which you can get from this thread (http://forumserver.twoplustwo.com/showthreaded.php?Cat=0&Number=5288919), but you don't need to look at it, or know anything about programming, to answer this question.

Over N hands, a downswing of size X has occurred if there are two points t1 and t2 (t1 < t2) such that your bankroll at t1 minus your bankroll at t2 is greater than or equal to X.

My current program takes the straightforward and probably very inefficient route of (more or less) testing all possible combinations. There must be a MUCH better way to do this.
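
(For reference, the pair-testing approach looks roughly like the Python sketch below. This is purely for illustration of the idea, not my actual code, and the names are made up.)

[ code ]
def has_downswing_bruteforce(bankroll, x):
    """Test every pair t1 < t2 directly -- the O(N^2) route.
    bankroll[t] is the bankroll after hand t."""
    n = len(bankroll)
    for t1 in range(n):
        for t2 in range(t1 + 1, n):
            if bankroll[t1] - bankroll[t2] >= x:
                return True
    return False
[/ code ]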

Note, however, that you cannot do something silly like look for the highest point of the graph, look for the lowest point of the graph, and then subtract in order to find the largest downswing. If you can answer this question, you should be able to see why immediately.

In any case, I simply need an algorithm that can quickly test whether there was a downswing of size X in a given set of data. And all I need is an explanation in words -- no pseudocode or anything like that.

Thanks for any ideas,
gm

gaming_mouse
04-03-2006, 02:55 PM
One thought I had.....

Decide on some number of hands that is small compared to N, and over which we can safely assume that a streak of X is close to impossible. I think 100 or 200 would be safe if we are interested in swings of 150BB or more.

Now take our original data, which is essentially a list of numbers giving our bankroll after each hand, and replace it with two lists that are each 100x shorter: one holding our max bankroll over hands 1-100, 101-200, etc., and one holding our min bankroll over those same intervals.

Now for each index i in the max-bankroll list, find the minimum among the entries of the min-bankroll list whose index is greater than i. Call the index of that minimum j. Then check whether:

max_bankroll[i] - min_bankroll[j] > X

If yes, return true.
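
(A quick Python sketch of what I mean, again just for illustration; the block size and names are made up, and it ignores any downswing that starts and ends inside a single block, which is the assumption above.)

[ code ]
def has_downswing_blocks(bankroll, x, block=100):
    """Compress the per-hand bankroll into per-block maxima and minima,
    then compare each block's max against the lowest later block min."""
    block_max = [max(bankroll[i:i + block]) for i in range(0, len(bankroll), block)]
    block_min = [min(bankroll[i:i + block]) for i in range(0, len(bankroll), block)]
    for i in range(len(block_max)):
        later_mins = block_min[i + 1:]          # blocks strictly after block i
        if later_mins and block_max[i] - min(later_mins) > x:
            return True
    return False
[/ code ]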

AvivaSimplex
04-03-2006, 06:51 PM
Two parts:
first, go through the data and identify local maxima and minima. This is easy: if a point is higher than the points before and after it, it is a local maximum; if it is lower than both those points, it's a local minimum.

next, proceed stepwise through the local maxima, starting with the chronologically first. Compare that point to the lowest subsequent local minimum. If the difference is greater than X, return true; otherwise proceed to the next local maximum. If that local maximum is higher than all previous local maxima, compare it to the subsequent minima; if not, skip it and go on to the next maximum. If you get through the entire set without finding a difference greater than X, return false.

It might be more efficient to skip identifying the maxima and just go directly to stepwise testing of successively higher points.

This can be combined with your other idea of compressing the data.
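
(A rough Python sketch of the two parts, purely illustrative; the names are invented, and endpoints and ties are handled loosely.)

[ code ]
def has_downswing_extrema(bankroll, x):
    """Part one: collect local maxima and minima.
    Part two: test each new record-high local maximum against the
    lowest local minimum that comes after it."""
    maxima, minima = [], []                      # (index, value) pairs
    for t in range(1, len(bankroll) - 1):
        if bankroll[t] > bankroll[t - 1] and bankroll[t] > bankroll[t + 1]:
            maxima.append((t, bankroll[t]))
        elif bankroll[t] < bankroll[t - 1] and bankroll[t] < bankroll[t + 1]:
            minima.append((t, bankroll[t]))
    highest_so_far = float("-inf")
    for t, peak in maxima:
        if peak <= highest_so_far:               # not higher than all previous maxima: skip
            continue
        highest_so_far = peak
        later_mins = [v for s, v in minima if s > t]
        if later_mins and peak - min(later_mins) > x:
            return True
    return False
[/ code ]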

gaming_mouse
04-03-2006, 08:06 PM
Thanks aviva...

That's basically what I ended up doing.

RocketManJames
04-03-2006, 08:36 PM
Forgive me if I've completely missed something, but it seems like you could do this in a single pass of a sequence...

You start at the first number and mark that as the CurrentMaximum. As you walk the sequence, you keep taking differences between the current number and CurrentMaximum. If you ever see a number greater than CurrentMaximum, you update it and keep walking, all the while keeping track of the largest difference you've seen.

I must be missing something here.

The way I see it is that you're looking for the largest difference found in a sequence where the later value is smaller than the earlier value.

As soon as you update CurrentMaximum, you have a fresh point to compare against. If you eventually see a new low, the difference against the newer, higher CurrentMaximum will be larger than against any of the older CurrentMaximums.
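
Something like this, in Python (just a sketch of the idea, names made up):

[ code ]
def largest_downswing(bankroll):
    """Single pass: track the running maximum and the biggest drop
    below it seen so far."""
    current_maximum = bankroll[0]
    largest_drop = 0
    for value in bankroll[1:]:
        if value > current_maximum:
            current_maximum = value              # new high-water mark
        else:
            largest_drop = max(largest_drop, current_maximum - value)
    return largest_drop

# a downswing of size x has occurred if largest_downswing(bankroll) >= x
[/ code ]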

*In Edit*

Or, maybe I misunderstood where the bottleneck was in your code. If it's all the subtractions and if-checks that are making it slow, forget what I said, I'm a moron.

-RMJ

Nottom
04-03-2006, 11:28 PM
Maybe I don't understand what you are trying to do, but I did a similar thing in Excel to see what my biggest downswing in SNGs was.

Granted, all I did was find the maximum downswing, but if you are just checking to see if a downswing greater than X occurs, then this should work for that as well.

Start by setting your current_downswing to 0. If you lose $x, set current_downswing to current_downswing - x. If the size of this drop is greater than what you are testing for, return true; if not, keep going. If you win $y, set current_downswing to min(current_downswing + y, 0) [the 0 will occur whenever your current BR is at its all-time maximum through that point in time].

This should do the job in one pass (or less) if all you are looking for is the existence of a downswing greater than some amount. If you want the maximum downswing, it's easy enough to keep that updated as you go.
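
(Spelled out as a quick Python sketch, purely for illustration; it works on per-hand results rather than bankroll totals, and the names are made up.)

[ code ]
def has_downswing_running(results, x):
    """results[t] is the win/loss on hand t.
    current_downswing tracks how far below the running peak we are,
    stored as a number <= 0."""
    current_downswing = 0
    for r in results:
        current_downswing = min(current_downswing + r, 0)
        if current_downswing <= -x:              # drop of size x or more from the peak
            return True
    return False
[/ code ]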

P.S. - Sorry if this is a bit pseudo-codish, but it seemed like the easy way to present it.

gaming_mouse
04-04-2006, 12:02 AM
[ QUOTE ]

You start at the first number and mark that as the CurrentMaximum. As you walk the sequence, you keep taking differences between the current number and CurrentMaximum. If you ever see a number greater than CurrentMaximum, you update it and keep walking, all the while keeping track of the largest difference you've seen.

I must be missing something here.

[/ QUOTE ]

No, actually I don't think you're missing anything. I think I was missing this very simple solution. And this takes O(N) as opposed to O(N^2) computations, from what I can see. Combining this idea with my interval idea would result in a very fast calculation.
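
(If I do combine them, I picture something like the sketch below: compress into block maxes/mins first, then make your single pass over the compressed lists. Again, just illustrative Python, not my actual code.)

[ code ]
def has_downswing_fast(bankroll, x, block=100):
    """Combine the block idea with the single-pass idea: keep the running
    max over earlier blocks and compare it against each block's min.
    Downswings contained entirely inside one block are ignored."""
    n = len(bankroll)
    block_max = [max(bankroll[i:i + block]) for i in range(0, n, block)]
    block_min = [min(bankroll[i:i + block]) for i in range(0, n, block)]
    running_max = float("-inf")
    for i in range(1, len(block_min)):
        running_max = max(running_max, block_max[i - 1])
        if running_max - block_min[i] >= x:
            return True
    return False
[/ code ]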

FWIW, the interval idea was actually good enough in itself. It seems the majority of the time is taken up by the bootstrap randomization of the data, and nothing can be done about that. In any case, I can run 100 trials of 500K hands in about 5 minutes, which is good enough.

Thanks for the idea though... that's exactly what I had in mind.

gm