where to begin... first I started looking at just how big 32x32 is on the 512x512 images. It seemed too small to make good clear identification of abnormalities. So I increased my grid segments to 128 ... this might be too big. I'll probably drop them down to 64 next run.
I found a post by a radiologist in kaggle's forums that further reinforced this as a good size (solely based on his identification of a 33x32 legion, given that the arbitrary location 64x64 would be perfect.) .So that's on the short list of changes to make.
also, I have fixed a bug in my sorting (it's really bad this existed :( ) and added a log version of the number used for sorting. (That's actuallly how I caught the sorting bug.) I don't generally need both but it is useful in some contexts to use 1 and in others to use the other. The Log number alone lacks the specificity since 512x512 image is awful big in number form. So rounding/lack of percusion is a problem.
unfortunately even with me fixing my garbage inputs from before my score didn't improve. This leads me to the real problem I need to solve.
I need to be able to identify which image actually has cancer. The patient level identification is just not enough. That's why my score didn't improve when I fixed the inputs. .. but how?
i hope t-sne can help here. I feed in all images with their respective grid data and let it self organize. That's running right now. Hopefully the output puts the images into groups based on abnormalities. Ideally grouping images with cancer. Since I don't actually know which do have cancer, I'll be looking for a grouping of images all marked 1 (from a cancer patient) and hope he that gets it. there will also hopefully be groups made of a mishmash from non cancer images .
I'm building one of those animations from before for fun. I'll share when it's finished
if this works I either feed that tsne data (x,y) in as a new feature or use kmeans to group up the groups and use that. We will see!