a new contest and more scoring stuff
4/27/2012
I went and had a look at kaggle.com to see if I had missed any new contests starting. I had! Wooo wooo! They have a new contest which is right up my alley: Predicting a Biological Response. This contest is set up exactly like the previous one. That is, I have a training set of data and a test set of data. All attributes for each datum are provided. Each datum is classified as either responsive ("1") or not responsive ("0"). For my purposes I can consider that a single group that you are either in or not. Since the two outcomes are polar opposites and you can never be in both, if I did treat it as two groups I would end up with the same result in reverse on each line (which would score to a one or a zero anyway).
Great, so have you submitted a result? It sounds like, other than loading the data, there is no work to do here. Yes, I've loaded the data and submitted a result. But unfortunately there is work to do here: my initial submission put me dead last out of 338 teams. Hahahhahah, loser! But there is a very good reason. We already covered that. Loser. The way they are scoring isn't built for predicting a hard 1 or 0. They are using log loss (you can read up on it here), which produces an infinite penalty when you guess wrong with 100% certainty. My score wasn't infinity, so they are clearly treating that infinity as a very large but finite number. The moral of the story is that you can dramatically improve your score just by moving your predictions a tiny bit off of 1 or 0 (assuming you guessed at least one wrong).
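To see why, here is a small Python sketch of binary log loss. The clipping value eps=1e-15 is an assumption (a common library default); the post doesn't say exactly how Kaggle caps the infinity. The labels and predictions are made-up toy values, not contest data.

```python
import math

def log_loss(actual, predicted, eps=1e-15):
    """Mean binary log loss. Predictions are clipped to [eps, 1 - eps]
    (eps is an assumed value) so a confidently wrong guess costs
    -ln(eps) instead of infinity."""
    total = 0.0
    for y, p in zip(actual, predicted):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(actual)

actual = [1, 0, 1, 1]
hard = [1, 0, 0, 1]               # one confident miss
soft = [0.98, 0.02, 0.02, 0.98]   # same guesses, pulled off 0 and 1

print(round(log_loss(actual, hard), 3))  # 8.635
print(round(log_loss(actual, soft), 3))  # 0.993
```

With one confident miss out of four, the hard predictions score about 8.635 while the softened ones score about 0.993: the miss now costs -ln 0.02 (about 3.9) instead of -ln 1e-15 (about 34.5), and each correct answer only gives up about 0.02.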
And how did you score? Initially, I came in dead last of 338 teams with a score of 14.+++; the number one score is currently around .40++ (with log loss, lower is better). When I moved my predictions over a tiny bit, my score dropped to 4.+++. Any other details? There are still 48 days left in the contest; it's about half over. The top prize is $10,000, and they expect to see how you did it (as well as an explanation) before awarding the prize.
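A rough back-of-the-envelope version of that jump, assuming every original prediction was a hard 0 or 1 and the scorer clips predictions to [ε, 1 − ε]. Here w is the fraction of wrong answers and δ is how far the predictions were moved in from 0 and 1; ε, w and δ are symbols introduced only for illustration, since the post gives neither the clip value nor the exact shift used.

```latex
$$
\begin{aligned}
\text{score}_{\text{hard}} &= w\,(-\ln\varepsilon) + (1-w)\,(-\ln(1-\varepsilon)) \approx w\,(-\ln\varepsilon) \\
\text{score}_{\delta} &= w\,(-\ln\delta) + (1-w)\,(-\ln(1-\delta))
\end{aligned}
$$
```

Because -ln ε is huge for a tiny ε, the hard score is dominated by the misses. Raising δ shrinks the cost of each miss, at the price of a small cost on every correct answer.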
What I need to get busy doing is figuring out a good way to turn my predictions into a percentage chance. I think I know how I'm going to do it, at least at first. I'll build a normal curve of each prediction's distance from 0. (Typically I score each group in terms of positive and negative distance from zero: positive is a success, negative is not.) Then I'll turn that into a percentage. So zero would be 50%, which is exactly right, -100 would be closer to 0, and +100 would be closer to 1. I'll probably do all my calculations and then artificially scale the number down to the range 0.02 to 0.98, just to make sure no infinities creep into the scoring.
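A minimal Python sketch of that plan. It assumes the raw scores are positive for "responsive" and negative otherwise, and that the curve is centred on 0 with its spread taken from the root-mean-square distance of the scores from zero; the post doesn't say how the width of the curve would be chosen. scores_to_probabilities is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def scores_to_probabilities(scores, low=0.02, high=0.98):
    """Hypothetical helper: map signed scores to probabilities in [low, high]."""
    # Assumption: spread = root-mean-square distance of the scores from zero,
    # so the curve stays centred on 0 and a score of 0 maps to 50%.
    sigma = math.sqrt(sum(s * s for s in scores) / len(scores))
    if sigma == 0:
        # All scores are exactly 0: every prediction is a coin flip.
        return [(low + high) / 2] * len(scores)
    curve = NormalDist(mu=0.0, sigma=sigma)
    # cdf() sends negative scores below 0.5 and positive scores above 0.5;
    # the linear squeeze keeps every prediction away from exactly 0 or 1.
    return [low + (high - low) * curve.cdf(s) for s in scores]

probs = scores_to_probabilities([-100, -20, 0, 20, 100])
print([round(p, 3) for p in probs])
```

Centring on 0 rather than on the mean of the scores keeps a score of 0 at exactly 50%, as described above, and the squeeze into 0.02 to 0.98 is what guarantees no infinities in the log loss.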

