#1
I need a good algorithm! (Regression related)
I've got a bunch of data, in the form
[ a1 a2 a3 .. an a(n+1) ]
[ b1 b2 b3 .. bn b(n+1) ]
[ c1 c2 c3 .. cn c(n+1) ]
...
I want to find the best set of n coefficients so that the sum of (coefficient*letter) over the first n terms predicts the last term. I'm not really sure what the name for this is, but it's some kind of regression. Does anyone have any suggestions for places to look for an algorithm for this type of problem? Or some other pointer in the right direction? To note: I'm looking at a lot of data here, so I'm thinking the slower stuff like MATLAB/R (despite the built-in functions for this) won't do it fast enough, and I'll be looking to do it in C.
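For what it's worth, the problem as stated is linear least squares, and one standard route is the normal equations. Below is a minimal C sketch, not a tuned implementation. It assumes n is modest; the `MAX_N` cap, the `lsq_*` names and the `1e-12` singularity threshold are all made up for illustration. It makes a single pass over the rows, so the full data set never has to sit in memory at once.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define MAX_N 64  /* assumed upper bound on the number of coefficients */

/* Running sums for the normal equations (X^T X) w = X^T y. */
typedef struct {
    size_t n;                  /* number of coefficients */
    double xtx[MAX_N][MAX_N];  /* accumulates X^T X */
    double xty[MAX_N];         /* accumulates X^T y */
} lsq_accum;

void lsq_init(lsq_accum *acc, size_t n)
{
    assert(n > 0 && n <= MAX_N);
    acc->n = n;
    for (size_t i = 0; i < n; i++) {
        acc->xty[i] = 0.0;
        for (size_t j = 0; j < n; j++)
            acc->xtx[i][j] = 0.0;
    }
}

/* Fold in one row: row[0..n-1] are the predictors, row[n] is the target. */
void lsq_add_row(lsq_accum *acc, const double *row)
{
    size_t n = acc->n;
    for (size_t i = 0; i < n; i++) {
        acc->xty[i] += row[i] * row[n];
        for (size_t j = 0; j < n; j++)
            acc->xtx[i][j] += row[i] * row[j];
    }
}

/* Solve by Gaussian elimination with partial pivoting. Overwrites the
   accumulator. Returns 0 on success, -1 if the system looks singular. */
int lsq_solve(lsq_accum *acc, double *coef)
{
    size_t n = acc->n;
    for (size_t k = 0; k < n; k++) {
        /* pick the largest remaining pivot in column k */
        size_t p = k;
        for (size_t i = k + 1; i < n; i++)
            if (fabs(acc->xtx[i][k]) > fabs(acc->xtx[p][k]))
                p = i;
        if (fabs(acc->xtx[p][k]) < 1e-12)  /* illustrative threshold */
            return -1;
        if (p != k) {
            for (size_t j = 0; j < n; j++) {
                double t = acc->xtx[k][j];
                acc->xtx[k][j] = acc->xtx[p][j];
                acc->xtx[p][j] = t;
            }
            double t = acc->xty[k];
            acc->xty[k] = acc->xty[p];
            acc->xty[p] = t;
        }
        /* eliminate column k from the rows below */
        for (size_t i = k + 1; i < n; i++) {
            double f = acc->xtx[i][k] / acc->xtx[k][k];
            for (size_t j = k; j < n; j++)
                acc->xtx[i][j] -= f * acc->xtx[k][j];
            acc->xty[i] -= f * acc->xty[k];
        }
    }
    /* back substitution, last coefficient first */
    for (size_t k = n; k-- > 0; ) {
        double s = acc->xty[k];
        for (size_t j = k + 1; j < n; j++)
            s -= acc->xtx[k][j] * coef[j];
        coef[k] = s / acc->xtx[k][k];
    }
    return 0;
}
```

Usage would be `lsq_init`, then `lsq_add_row` once per data row, then `lsq_solve`. The accumulation costs on the order of n*n operations per row and the solve is a one-off n*n*n, so the row count only enters linearly. One caveat: the normal equations square the condition number of the problem, so for badly conditioned data a QR-based solver (LAPACK's `dgels`, for example) is the safer choice.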
#2
Re: I need a good algorithm! (Regression related)
I'm not a math major but it looks like a linear algebra problem. Which leaves me dumbfounded because I'd bet some money you've taken that before. If that's the case, I don't know how to help you, but if not, you should read up a bit on linear algebra.
#3
Re: I need a good algorithm! (Regression related)
[ QUOTE ]
I'm not a math major but it looks like a linear algebra problem. Which leaves me dumbfounded because I'd bet some money you've taken that before. If that's the case, I don't know how to help you, but if not, you should read up a bit on linear algebra.
[/ QUOTE ]
Just to clarify: I'm looking for ONE set of coefficients, such that the mean squared error of the approximation over all rows is minimized. I'm assuming you thought I was looking for a coefficient set for each row. I only wish!
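Written out, the objective described here is ordinary least squares. As a sketch, using symbols that are not in the thread (m for the number of rows, x_{ij} for the j-th entry of row i, w_j for the coefficients, X for the m-by-n matrix of the first n columns, y for the last column):

```latex
\min_{w_1,\dots,w_n} \; \sum_{i=1}^{m} \Bigl( x_{i,n+1} - \sum_{j=1}^{n} w_j \, x_{ij} \Bigr)^{2}
\qquad\Longrightarrow\qquad
X^{\mathsf T} X \, w = X^{\mathsf T} y
```

Dividing the sum by m gives the mean squared error, which has the same minimizer. The right-hand system (the normal equations) comes from setting the derivative with respect to each w_j to zero.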
#4
Re: I need a good algorithm! (Regression related)
I don't have a good algorithm, but I'd "brute force" it with a simple neural net, and then train the hell out of it. Like, one-node-per-coefficient simple. I suppose that it would be pretty tough to avoid local minima in this sort of thing, though, so that might be a crappy solution.
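A one-layer net with no activation function, trained on squared error, amounts to gradient descent on the least-squares objective. That objective is convex in the coefficients, so for this particular model any local minimum is also a global one. A rough C sketch of one training pass follows; the function name `gd_step`, the row-major data layout and the fixed learning rate are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* One full-batch gradient-descent step on mean squared error.
   data: m rows, row-major, each row holding n predictors then 1 target.
   w:    current coefficients, updated in place.
   grad: scratch space of length n.
   Returns the mean squared error measured before the update. */
double gd_step(const double *data, size_t m, size_t n,
               double *w, double *grad, double rate)
{
    assert(m > 0);
    double sse = 0.0;
    for (size_t j = 0; j < n; j++)
        grad[j] = 0.0;
    for (size_t i = 0; i < m; i++) {
        const double *row = data + i * (n + 1);
        double err = -row[n];              /* prediction minus target */
        for (size_t j = 0; j < n; j++)
            err += w[j] * row[j];
        sse += err * err;
        for (size_t j = 0; j < n; j++)
            grad[j] += err * row[j];
    }
    for (size_t j = 0; j < n; j++)
        w[j] -= rate * 2.0 * grad[j] / (double)m;
    return sse / (double)m;
}
```

Calling `gd_step` in a loop from two different starting points should land on the same coefficients, provided the rate is small enough for the data's scale. The catch is speed rather than local minima: convergence can be slow when columns are strongly correlated or on very different scales.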
#5
Re: I need a good algorithm! (Regression related)
[ QUOTE ]
I don't have a good algorithm, but I'd "brute force" it with a simple neural net, and then train the hell out of it. Like, one-node-per-coefficient simple. I suppose that it would be pretty tough to avoid local minima in this sort of thing, though, so that might be a crappy solution.
[/ QUOTE ]
This exact thought process of "ohh, this might be great. Ohh, wait, local/global problem. Crap." was something I went through yesterday.
#6
Re: I need a good algorithm! (Regression related)
If n >= (number of rows) then it's a fairly trivial linear algebra problem.
If n < (number of rows) it is much less trivial and requires formulating a hypothesis about your data. It also depends on the nature of your data, e.g. whether some of the data (column 4, for instance) can be completely uncorrelated with column (n+1). Check factor analysis for specifics. http://en.wikipedia.org/wiki/Factor_analysis
#7
Re: I need a good algorithm! (Regression related)
[ QUOTE ]
If n >= (number of rows) then it's a fairly trivial linear algebra problem. If n < (number of rows) it is much less trivial and requires formulating a hypothesis about your data. It also depends on the nature of your data, e.g. whether some of the data (column 4, for instance) can be completely uncorrelated with column (n+1). Check factor analysis for specifics. http://en.wikipedia.org/wiki/Factor_analysis
[/ QUOTE ]
v nice. Thanks. Re the bolded part: would that, in the end, matter? Obviously it would be best not to have uncorrelated data for the sake of computational time, but I would assume that as long as there is a lot of data, the coefficient on uncorrelated terms --> 0, no?
#8
Re: I need a good algorithm! (Regression related)
[ QUOTE ]
[ QUOTE ]
If n >= (number of rows) then it's a fairly trivial linear algebra problem. If n < (number of rows) it is much less trivial and requires formulating a hypothesis about your data. It also depends on the nature of your data, e.g. whether some of the data (column 4, for instance) can be completely uncorrelated with column (n+1). Check factor analysis for specifics. http://en.wikipedia.org/wiki/Factor_analysis
[/ QUOTE ]
v nice. Thanks. Re the bolded part: would that, in the end, matter? Obviously it would be best not to have uncorrelated data for the sake of computational time, but I would assume that as long as there is a lot of data, the coefficient on uncorrelated terms --> 0, no?
[/ QUOTE ]
Yeah, in principle you'd get 0 for uncorrelated data; it can just increase the computational cost by a big factor if you try to 'brute force' it. If I had to solve such a problem I would try to formulate a hypothesis that takes the physics of the process into account.
#9
Re: I need a good algorithm! (Regression related)
Looks like an analysis of variance problem: fit the data, then run an F-test.
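One common way to make this concrete, offered here as an assumption about what is meant: fit the model once with and once without the column(s) in question, then compare the residual sums of squares. The symbols below are introduced for illustration (m rows, p_0 < p_1 coefficients in the smaller and larger model, RSS for the residual sum of squares of each fit).

```latex
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1) / (p_1 - p_0)}{\mathrm{RSS}_1 / (m - p_1)}
```

Under the usual assumptions (independent, normally distributed errors with constant variance), and if the extra coefficients are truly zero, F follows an F distribution with (p_1 - p_0, m - p_1) degrees of freedom. A large F suggests the extra columns carry real predictive weight.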