I've been thinking about this for weeks. I'm normally a jump-in-there-and-do-it kind of guy, but with no contests around to motivate me I've been stewing on it more than writing it. I did implement one version of the GBM-in-GBM mechanism, but it underperforms compared to my current normal tree. There are probably more things I could do to hone it and make it better, but this is where I stopped and started stewing.
I've been thinking I'm approaching this the wrong way. Trees are great for emulating the process of making decisions based on a currently known group of data, but don't we know more? Or rather, can't we do better? We actually have the answers, so isn't there a way we can tease out the exact splits we want, or get the precise score we want? I think there might be.
I've been looking at building a tree from the bottom up. I'm still thinking about the details, but the short version is that you start out with all the terminal nodes. You then take them in pairs and construct the tree; any odd node sits out and gets put in on the next pass. The "take them in pairs" part is what I've been really thinking hard about. Conventionally, going down a tree, your splits are done through some system of finding features that cut the data into pieces you are happy with. I'm going to be doing the same thing, but in reverse. I want the two data pieces paired together to be as close to each other as possible, from a Euclidean distance perspective at least, with regard to whatever features I use. But (and this is one of the things I debate) I also want the scores to be far apart.
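To make the pairing step concrete, here's a rough sketch of what one greedy pass could look like. The combined criterion (feature distance minus a weight `alpha` times the score gap) is my assumption, not something I've settled on; it's just one way to express "close in features, far apart in score" in a single number.

```python
import numpy as np

def pair_nodes(X, y, alpha=1.0):
    """Greedily pair row indices of X, preferring small feature distance
    and a large score separation. Returns a list of (i, j) pairs plus an
    odd leftover index that sits out until the next pass (or None)."""
    unpaired = list(range(len(X)))
    pairs = []
    while len(unpaired) > 1:
        i = unpaired.pop(0)
        # cost = Euclidean feature distance minus alpha * score gap
        # (lower is better, so big score gaps are rewarded)
        costs = [np.linalg.norm(X[i] - X[j]) - alpha * abs(y[i] - y[j])
                 for j in unpaired]
        j = unpaired.pop(int(np.argmin(costs)))
        pairs.append((i, j))
    leftover = unpaired[0] if unpaired else None
    return pairs, leftover
```

Each pass halves the node count, so repeating this on the merged nodes would grow the tree upward level by level.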
When you think about what I'm trying to accomplish, putting together two items with really far-apart scores makes sense. You want to figure out shared qualities in otherwise disjoint items. Similar items could be joined as well; if we approached it that way, the idea would be that you are building a tree that hones in on the answer really quickly and exactly. That, however, wouldn't do a good job of producing scores we can further boost... we wouldn't be finding a gradient. Instead we would be finding one point, and after one or two iterations we'd be done.
By taking extreme values, the separation would ideally be the difference between the maximum value and the minimum value. If we did that, though, it would only work for two of our data points; the rest would have to be closer to each other (unless they all shared those two extremes). I think it would be best to match items in the center of the distribution with items on the far extremes, giving all pairs a similar score separation of about (max - min)/2, with each pair averaging out somewhere between the middle and an extreme.
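One concrete reading of this center-with-extreme matching (my interpretation, not a committed design): sort items by score and pair each item with the one half the distribution away, so the lowest scores get matched with the central ones and the central ones with the highest, each pair separated by roughly half the score range.

```python
import numpy as np

def center_extreme_pairs(y):
    """Pair each item in the lower half of the score distribution with the
    item half the distribution away, so every pair's score gap is roughly
    (max - min) / 2. With an odd count, the top item sits out."""
    order = np.argsort(y)              # indices from lowest to highest score
    n, half = len(y), len(y) // 2
    pairs = [(int(order[i]), int(order[i + half])) for i in range(half)]
    leftover = int(order[-1]) if n % 2 else None
    return pairs, leftover
```

This only handles the score side, of course; the feature-distance side would still need to be worked into the matching somehow.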
In this way we merge up items until we get to a fixed depth from the top (top being the root node). We could keep merging all the way and then try to climb back down the tree; I might try that at some point, but since the splitters you would make won't work well, I think the better way is to introduce the test data at the closest terminal node (much like how nodes were merged together) and follow it up the tree until you get to a stopping spot. The average answer there is the score you return.
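The prediction step could be sketched like this, under a simplifying assumption of my own: the "tree" is just a record of which training indices have been merged together at each level (level 0 being the singleton terminal nodes). A test point enters at its nearest terminal node and we read off the mean score of that node's group at the stopping level.

```python
import numpy as np

def predict(x, X_train, y_train, levels, stop_level):
    """levels[k] is a list of index groups at merge level k
    (level 0 = singleton terminal nodes). Find the nearest training
    point, then return the mean score of the group containing it at
    `stop_level`."""
    # nearest terminal node by Euclidean distance, mirroring the merge step
    nearest = int(np.argmin(np.linalg.norm(X_train - x, axis=1)))
    for group in levels[stop_level]:
        if nearest in group:
            return float(np.mean(y_train[group]))
    raise ValueError("nearest point missing from stop level")
```

With `stop_level=0` this degenerates to 1-nearest-neighbor; deeper stopping levels average over bigger merged groups.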
Again, I still haven't implemented it; I've been stewing on it. The final piece of the puzzle is exactly how I want to do feature selection for the merging. There has to be some system for maximizing score separation and minimizing distance so it isn't all ad hoc.
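One way that system could be formalized (again, an assumption of mine rather than a settled design): score a candidate feature subset by the average score gap over a pairing minus a weighted average feature distance, then search small subsets for the best value.

```python
import itertools
import numpy as np

def pairing_objective(X, y, pairs, features, lam=1.0):
    """Higher is better: big score gaps, small Euclidean distances
    on the chosen feature subset. `lam` trades the two off."""
    gaps = [abs(y[i] - y[j]) for i, j in pairs]
    dists = [np.linalg.norm(X[i, features] - X[j, features])
             for i, j in pairs]
    return float(np.mean(gaps) - lam * np.mean(dists))

def best_feature_subset(X, y, pairs, max_size=2, lam=1.0):
    """Exhaustively search feature subsets up to `max_size` for the one
    that maximizes the pairing objective. Only feasible for few features;
    a real version would need something greedier."""
    n_feat = X.shape[1]
    candidates = [c for k in range(1, max_size + 1)
                  for c in itertools.combinations(range(n_feat), k)]
    return max(candidates,
               key=lambda c: pairing_objective(X, y, pairs, list(c), lam))
```

The exhaustive search is obviously only a placeholder, but it at least pins down what "maximizing score separation and minimizing distance" would mean as a single number to optimize.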