Hello again. I've slowly turned over the ideas I wrote about last time, and I think there are 2 big flaws in my idea. First, fitting a GBM to data you already have in the training and test data adds nothing (that's huge). This is with regard to me selecting a few features to make something that has a correlation coefficient equal to some fixed amount of the final result. Second, making groups in any fashion that is not some form of transform will likely never give me any new information to work with.
I have a fix for both (of course :) ). I mentioned using t-SNE as well, and I think that's where I need to get my groups from. Varying the features I send in to t-SNE is how I get various new groups. Everything else I've said is still relevant, though. So it is no longer layers of GBM as much as layers of t-SNE. Once I have relevant results from that, I send it all in to GBM and let it do its work.
The groups I make out of t-SNE (2-D, btw) will use the linear t-SNE I've already written (as it's super fast and does what I need pretty well). And while I will send the results from the t-SNE right in to the GBM, I'll do more than that. I will also isolate each region in the results and build groups from each of those.
To do that, I make a gravity-well map for every point on the map, like they are all little planets. The resulting map is evenly subdivided (like it's a giant square map of X by X; I've been using 50 for each side) and has a value of 0 or larger in every square. The influence is inverse squared distance: 1/(1 + (sourceX - x0)^2 + (sourceY - y0)^2). I've been limiting the influence to 5 squares from each point just to keep the runtime speedy. It's only a 50x50 map anyway, so that's pretty good reach, and at 6 away the influence is 1/(1+36+36), which is only about 0.014, so that's a decent cutoff anyway. Once I have the map, I look for any positive squares whose neighbors in all directions are lower or equal. Those squares are then used to make groups. So there may only be 1, but likely there will be bunches.
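Here is a minimal sketch of the map-building and center-finding steps described above, in plain Python. The function names `gravity_map` and `find_centers` are made up for illustration. It also makes a few assumptions the post does not spell out: the influences of all points are summed per square, the points are rescaled so they span the whole grid, and "all directions" means the 8 surrounding squares.

```python
def gravity_map(points, size=50, reach=5):
    """Sum 1 / (1 + dx^2 + dy^2) from every point into a size x size grid.

    Assumption: points are rescaled so their bounding box covers the grid.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid dividing by zero
    span_y = (max(ys) - min_y) or 1.0
    grid = [[0.0] * size for _ in range(size)]
    for x, y in points:
        # Position of this point in grid units (0 .. size - 1).
        sx = (x - min_x) / span_x * (size - 1)
        sy = (y - min_y) / span_y * (size - 1)
        cx, cy = int(round(sx)), int(round(sy))
        # Only touch squares within `reach` of the point's own square.
        for gy in range(max(0, cy - reach), min(size, cy + reach + 1)):
            for gx in range(max(0, cx - reach), min(size, cx + reach + 1)):
                grid[gy][gx] += 1.0 / (1.0 + (sx - gx) ** 2 + (sy - gy) ** 2)
    return grid


def find_centers(grid):
    """Return (gx, gy) for positive squares with no strictly higher neighbor."""
    size = len(grid)
    centers = []
    for gy in range(size):
        for gx in range(size):
            value = grid[gy][gx]
            if value <= 0.0:
                continue
            # Assumption: "all directions" means the 8 surrounding squares.
            is_center = True
            for ny in range(max(0, gy - 1), min(size, gy + 2)):
                for nx in range(max(0, gx - 1), min(size, gx + 2)):
                    if grid[ny][nx] > value:
                        is_center = False
            if is_center:
                centers.append((gx, gy))
    return centers
```

Because the test is "lower or equal", a flat plateau would report every square on it as a center, so some merging of adjacent centers may be worth adding.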
The groups are based on distance from the centers of the various wells in the gravity-well map (the bottoms of the wells, where the pull is strongest). If you were to take the derivative of the map, these would be the zero points. So we just find the distance of every point from those centers. This has the added effect of possibly making multiple groups in one pass (which is great), not to mention that the data is made via a transformation that works completely statistically and has no direct 1-to-1 correlation with the data you feed it. So you are actually adding something for the GBM mechanism to work with.
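The distance step could look something like the sketch below. The name `distance_features` is hypothetical, and it assumes the centers have already been converted into the same 2-D coordinates as the points (grid indexes would need mapping back first). Euclidean distance is also an assumption, since the post only says "distance".

```python
import math


def distance_features(points, centers):
    """One row per point, one column per center: Euclidean distance to it."""
    return [
        [math.hypot(x - cx, y - cy) for cx, cy in centers]
        for x, y in points
    ]
```

With k centers, each point gets k new columns in a single pass, which is where the "multiple groups in one pass" comes from.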
At that point the resulting groups should probably be filtered to see if they add any real value, using the correlation mechanism I already mentioned in the last post. Ideally GBM would just ignore them if they are noisy.
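As a rough illustration of that filtering step, the sketch below keeps only the new columns whose correlation with the target clears a cutoff. This is a stand-in, not the exact mechanism from the last post: it assumes plain Pearson correlation, and the names `pearson` and `keep_useful` and the `min_abs_r` threshold are made up for the example.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def keep_useful(columns, target, min_abs_r=0.1):
    """Indexes of feature columns with |r| against the target >= min_abs_r."""
    return [
        i for i, col in enumerate(columns)
        if abs(pearson(col, target)) >= min_abs_r
    ]
```

A linear correlation check can miss features that are only useful in combination, so a low cutoff is probably the safer choice here.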