First... yes, 2 posts in one day. Crazy! I wanted to add a thought that the data miners of the world may not like, and that Kaggle would definitely be dismayed at. If the genetic algorithm stuff I've been working on works... like really works, you won't be able to use their kernel system to compete, at least not as it stands.
Why? Because the results the genetic algorithm produces are based on processing power and time, not on some particular mechanism you put in place. Now, you might be able to get to a good result faster by tweaking some code, but it's not like you could run that code indefinitely on their servers. Or, if you find a good gene, sharing it is an option, but basically you are giving away a winning solution by doing so.
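To make the "processing power and time" point concrete, here is a minimal sketch. It is not my actual tool; it is the simplest evolutionary loop I can write (mutate one gene string, keep the child only if it scores better). The names `evolve` and `toy_fitness`, the mutation size, and the use of a generation count as a stand-in for compute budget are all assumptions made for illustration.

```python
import random


def toy_fitness(gene):
    """Hypothetical fitness: higher is better, best possible is 0.0
    when every value in the gene equals 0.5."""
    return -sum((g - 0.5) ** 2 for g in gene)


def evolve(fitness, gene_length, generations, seed=0):
    """Mutate-and-keep-the-best loop.

    `generations` stands in for the compute budget: the code never
    changes, only how long it is allowed to run.
    """
    rng = random.Random(seed)
    best = [rng.uniform(-1.0, 1.0) for _ in range(gene_length)]
    best_fit = fitness(best)
    for _ in range(generations):
        # Small Gaussian nudge to every value in the current best gene.
        child = [g + rng.gauss(0.0, 0.1) for g in best]
        child_fit = fitness(child)
        if child_fit > best_fit:
            best, best_fit = child, child_fit
    return best, best_fit


if __name__ == "__main__":
    for budget in (10, 100, 1000):
        _, fit = evolve(toy_fitness, gene_length=5, generations=budget)
        print(budget, fit)
```

Because the best gene is only ever replaced by a better one, a bigger budget with the same seed can never score worse than a smaller one. That is the whole problem with a shared, time-limited kernel: the one knob that matters is how long you can let it run.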
The hard part of this, of course, is turning any particular problem into one that the gene mechanism can make short work of. There is definitely skill there, at least until most problems can be distilled via known mechanisms. Unfortunately, this isn't much different from a loading/preprocessing script.
I have to admit I'm anxious to try my tool on other problems. So much so that I'm feeling greedy about its true value, and I kind of don't want to share the actual code, lest the world catch up in a hurry. (It will likely get improved a ton more as time goes on.) This leads me to wonder: if I just gave them an order of operations that predicts a great solution (the math actions of the genes/variables stored internally), would that be a good enough solution to their contest without me giving them anything in particular? I mean, they could use it to get what they wanted without knowing the first thing about how I got there.
There is a function that generates a value using the genes and the raw data. It's a single function, and essentially that plus the string of genes comprises the whole program needed to claim the result. (Or it should be enough.) As I read the rules (https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/rules), as long as I can generate the result and give them the mechanisms to do that (and the gene), they have all they need. I could probably distill it down to something even simpler. (Remove unused available functions, move any external calls internal, change the database model to something simpler, write a custom loader for the new database model, etc. Essentially, make a one-off standalone program that does the most exact version of what they need.)
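For a rough idea of what "one function plus a string of genes" could look like, here is a hedged sketch. The gene encoding (a list of operation-name and column-index pairs), the `OPS` table, and the names `score_row` and `predict` are all made up for illustration; my real gene format and function are different and not shown here.

```python
import math

# Hypothetical operation table: each operation combines the running
# value with one column of the raw data row.
OPS = {
    "add": lambda acc, x: acc + x,
    "sub": lambda acc, x: acc - x,
    "mul": lambda acc, x: acc * x,
    "max": lambda acc, x: max(acc, x),
}


def score_row(genes, row):
    """Apply the gene's operations in order to one row of raw data."""
    acc = 0.0
    for op_name, col in genes:
        acc = OPS[op_name](acc, row[col])
    return acc


def predict(genes, rows):
    """Squash each raw score into (0, 1) so it reads as a probability."""
    results = []
    for row in rows:
        # Clamp first so math.exp cannot overflow on extreme scores.
        s = max(-50.0, min(50.0, score_row(genes, row)))
        results.append(1.0 / (1.0 + math.exp(-s)))
    return results


if __name__ == "__main__":
    genes = [("add", 0), ("mul", 1), ("sub", 2)]
    rows = [[2.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
    print(predict(genes, rows))
```

The point is that someone handed only `genes` and this function can reproduce every prediction, yet nothing in it says how the gene was found.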
I am, of course, getting way ahead of myself. I haven't even come close to the top of the leaderboard, let alone beaten my OWN previous score, which is meh at best. All of this is, I guess, just food for thought.