I don't want to sound like a broken record, but I don't want to sound like a broken record, but I don't want to sound...
4/25/2012
Dear lord, this is about data mining again, isn't it? And I bet it's more crap without pictures.
It so is!! :D
All right, what innocuous thing have you done this time?
Well, I've made further improvements, and my locally tested score is getting better. I'll share some numbers, but first a word about scoring.... joy.
There are really 3 scores I look at. They all approach 1 as the algorithm gets closer and closer to a perfect prediction, but each score measures something different, so depending on what you are looking for, one may be preferable over another. They are: accuracy, F-score average and F-score total.
Accuracy is simply the number of correct answers divided by the total possible answers. This may seem like the best way to score, but it's not. Here's why: if you are picking yes or no for each item and 90% of the answers are no, then naively picking no every time will produce a 90% result. The thing is, the value is in picking out the yeses, not the nos. I list it mainly for insight into, well, overall accuracy. If my program is not naively picking all one or the other, there is going to be a uniformity of error. So when I see 90% (in the above example), it's generally going to be more like 90% of the yeses were right and 90% of the nos were right. Not always, mind you, but generally.
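To make the 90% trap concrete, here is a minimal sketch. It is not my contest code: the class name, the method names and the 100-item data set are all made up for illustration.

```java
// Hypothetical illustration: 100 items, 10 of them true "yes" answers,
// scored against a predictor that naively answers "no" every time.
class AccuracyDemo {
    // Fraction of predictions that match the true answers.
    static double accuracy(boolean[] predicted, boolean[] actual) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == actual[i]) correct++;
        }
        return (double) correct / actual.length;
    }

    // Number of true "yes" answers the predictor actually picked out.
    static int yesesFound(boolean[] predicted, boolean[] actual) {
        int found = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] && actual[i]) found++;
        }
        return found;
    }

    public static void main(String[] args) {
        boolean[] actual = new boolean[100];
        for (int i = 0; i < 10; i++) actual[i] = true; // 10% yeses, 90% nos
        boolean[] alwaysNo = new boolean[100];          // defaults to all false
        System.out.println("accuracy = " + accuracy(alwaysNo, actual));
        System.out.println("yeses found = " + yesesFound(alwaysNo, actual));
    }
}
```

The always-no predictor gets 90 of 100 answers right while finding none of the yeses, which is exactly why accuracy alone is not the score to chase.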
F-scores are more complicated. (I call it the f-test in my code and notes, but strictly speaking it's the F-measure, which is a different animal from the statistical F-test of explained variability over unexplained variability.) It's essentially precision times recall, divided by precision plus recall... times 2. If you don't double the score it's actually a 0 to 0.5 score; 0 to 1 is what I want. So what are precision and recall? Wow, ok... I don't want to give a long-winded explanation here that no one will care about, read or understand. So I'm going to cut to the chase, show the final result and let you stew on it. Then explain a little.
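// Harmonic mean of precision and recall for one set of predictions.
// intersection(...) is a helper (not shown) that returns the labels found in both sets.
private double fTestMeasure(Set userLabels, Set trueLabels) {
    // Number of predicted labels that are actually correct.
    double commonSize = intersection(userLabels, trueLabels).size();
    long userLabelsSize = userLabels.size();
    long trueLabelsSize = trueLabels.size();
    double fmeasure = 0D;
    // With no predictions at all, precision is undefined, so the score stays 0.
    if (userLabelsSize != 0L) {
        double precision = commonSize / userLabelsSize; // share of our yeses that were right
        double recall = commonSize / trueLabelsSize;    // share of the real yeses that we found
        if (precision + recall > 0D)
            fmeasure = 2.0D * precision * recall / (precision + recall);
    }
    return fmeasure;
}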
Consider that userLabels is the set of yeses we selected, and trueLabels is the real set of answers (the correct yeses). commonSize is the overlap (what we got right). Or, another way to think about it: really there are 4 possible outcomes.
Group a: we predict yes and the answer is yes. Yea!
Group b: we predict yes and the answer is no. Boo!
Group c: we predict no and the answer is yes. Boo!
Group d: we predict no and the answer is no. Yea!
commonSize is group a. userLabelsSize is group a plus group b. trueLabelsSize is group a plus group c. So precision is a / (a + b) and recall is a / (a + c), and using these values we get our measure. Notice that group d never shows up anywhere.
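The same measure can be written directly in terms of the group counts. This is only a sketch: the class name, the method name and the example counts (30, 20, 10) are hypothetical, chosen to keep the arithmetic easy to follow.

```java
// Hypothetical sketch: the F-measure computed from group counts
// instead of from sets of labels.
class GroupCounts {
    // a = predicted yes, answer yes
    // b = predicted yes, answer no
    // c = predicted no,  answer yes
    // Group d (predicted no, answer no) does not enter the formula.
    static double fMeasure(long a, long b, long c) {
        if (a == 0) return 0.0; // nothing right: precision and recall are 0 or undefined
        double precision = (double) a / (a + b);
        double recall = (double) a / (a + c);
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // precision = 30 / 50 = 0.6, recall = 30 / 40 = 0.75
        System.out.println(fMeasure(30, 20, 10));
    }
}
```

With those made-up counts the score works out to 2 * 0.6 * 0.75 / 1.35, or about 0.667. Adding any number of correct nos (group d) would leave it unchanged, which is the property that makes it a better fit than accuracy when yeses are rare.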
But you have two measures for f-test? Yes, because I measure two different sets of solutions. There are many different groups being predicted. For one score I measure each group's result and average them; for the other I measure the results of everything all at once.
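Here is a rough sketch of how the two scores can come apart. It assumes "average" means a plain mean of the per-group scores and "all at once" means pooling the counts from every group before scoring; the class, the method names and the two example groups are hypothetical. The helper uses 2a / (2a + b + c), which is algebraically the same as 2 * precision * recall / (precision + recall).

```java
// Hypothetical sketch: per-group average versus pooled score.
class AverageVsTotal {
    // F-measure from counts: a = right yeses, b = wrong yeses, c = missed yeses.
    static double f(long a, long b, long c) {
        long denom = 2 * a + b + c;
        return denom == 0 ? 0.0 : 2.0 * a / denom;
    }

    // Score each group separately, then take the mean. counts[i] = {a, b, c}.
    static double averageF(long[][] counts) {
        double sum = 0.0;
        for (long[] g : counts) sum += f(g[0], g[1], g[2]);
        return sum / counts.length;
    }

    // Pool the counts from every group, then score once.
    static double totalF(long[][] counts) {
        long a = 0, b = 0, c = 0;
        for (long[] g : counts) { a += g[0]; b += g[1]; c += g[2]; }
        return f(a, b, c);
    }

    public static void main(String[] args) {
        long[][] counts = { {8, 2, 2}, {1, 4, 9} }; // one easy group, one hard group
        System.out.println("average = " + averageF(counts));
        System.out.println("total   = " + totalF(counts));
    }
}
```

In this made-up example the easy group scores 0.8 and the hard one about 0.133, so the average is about 0.467, while the pooled score is 18 / 35, about 0.514. The average treats every group equally; the pooled score lets groups with more yeses count for more.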
So...? When the contest ended I had a best score (f-test average) of 0.439; the winning score was 0.535, the score to beat! I don't have the answer set, so I can't continue to do the exact same scoring. This changes at the end of the month when they give us the answers. To continue working on this, I split my training data in half, using half to train with and half to test with. This made my score worse, but at least I now have a mechanism to self-score.
My base method (which scored 0.43, not 0.439; the 0.439 has an improvement on the base method) scores 0.954139 (accuracy), 0.38439 (f-test avg), 0.40644 (f-test overall). My most recent best method has since scored 0.943880 (accuracy), 0.41057543 (f-test avg), 0.41424 (f-test overall), which is a nice improvement! Note that the actual accuracy has dropped and the f-test score has gone up. This is due to me actually picking out more correct yeses but getting more of the "no"s wrong, and since yeses are so much rarer, it's a gain. I think that's enough for today!

