Dim Red Glow

A blog about data mining, games, stocks and adventures.

Gradient boosting series preamble

Hello folks! I'm going to do another series of blogs, this time on gradient boosting. The series will be ongoing with no definite timeline. I expect it will only take one, maybe two, posts to cover the basics of gradient boosting. After that I want to spend some time exploring possible ways to improve it. I may find none; we'll see when we get there. Allow me to catch you up from last week and explain some of my motivation for doing the series.

Last time I wrote, I had finished one Kaggle contest and was moving back to another I had previously started. That contest is now over, having ended some 4 hours ago. If you go and look at the leaderboard, you'll actually see I fell some 300 spots after the results got posted. It seems I forgot to switch my selected submission to the more recent one I made *grin*. Well, it didn't matter much anyway. I didn't get much of a chance to work on the contest this week, and while the gradient boosting version of my code worked much better, nothing beat my initial submission with it. And that submission is still a seriously far cry from the top of the leaderboard.

The time I did spend on my code for the contest was split between trying to improve the code and trying to find optimal values for the contest, neither of which was much of a success. The time I spent away from the computer just thinking about boosting in general has led me back here. I find that the way gradient boosting has been explained to me over and over again was possibly part of the reason I was complacent about getting it working. I want to present it in another way that is perhaps new to some people, maybe most people. A change of perspective, if you will. I'll get into that in my first post of the series.

I'm hoping this all leads to something better. I'm optimistic, but as with all things, there is a practical best version. You don't see people really improving hash sort or bucket sort. And quicksort is really about as good as it gets for a comparison sort. Compression only goes so far, too. I remember back in the '80s and '90s when there always seemed to be a new compression algorithm promising better space savings. At some point that stopped, at least when it comes to lossless compression. (I'm still waiting on someone to figure out lossless fractal compression.) It seems possible, but unlikely, that GBM is the end of the road for general data mining.