Dim Red Glow

A blog about data mining, games, stocks and adventures.

Some animations from the cancer research

I ran the data through t-SNE (well, my layered version of t-SNE) and got some pretty good results. Good at least visually. I'm not sure if these are going to pan out, but I'm hopeful. Let me say 2 things before I get to them.

First, the more I use the new log version of the image, the more I think it is fundamentally the best way to look at the image data. In fact, I think it might be the best way to deal with just about any real-world "thing" that can't be modeled in discrete values. It just does everything you want, and in a way that makes good sense. It is essentially a waveform of the whole object broken down into 1 number. It might actually be a great way to deal with sounds, 3D models, electrical signals, pictures ... you name it. I think there is room for improvement in the details, but the idea itself seems good.

The 2nd thing I want to say is that the files are huge, so I'm going to share still images of the final result and then link to the .gif animations. I might upload them to YouTube, but I'm pretty sure GIF is the most efficient way to send them to anyone who wants to see. They're lossless (as long as the frames fit GIF's 256-color palette), which is great, and they're smaller than the corresponding mp4/mpeg/fla files etc. This is because normal video compressors don't handle 1000s of moving pixels nearly as efficiently as a GIF file with its simple difference layers does.
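The "difference layers" idea can be sketched roughly like this. A GIF frame is allowed to cover just a sub-rectangle of the canvas, so an optimizer only has to store the region that changed since the previous frame (some also mark unchanged pixels as transparent). This is a toy illustration and not how any particular encoder is implemented; `changed_box` and both frames are made up for the example.

```python
def changed_box(prev, curr):
    """Bounding box (left, top, right, bottom) of pixels that differ, or None."""
    changed = [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(prev, curr))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]
    if not changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return min(xs), min(ys), max(xs), max(ys)

# Two tiny hypothetical frames; values are palette indexes (0 = background).
frame1 = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
frame2 = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],  # the dot moved one pixel to the right
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

# Only this rectangle would need to be stored for the second frame.
box = changed_box(frame1, frame2)
print(box)
```

With dots scattered all over the plot the changed rectangle gets big, so how much this saves depends a lot on the animation.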

In these images, a purple dot represents a CAT scan image from a patient with no cancer. A green dot represents a CAT scan image from a patient who had cancer. In the final image, yellow is test data we don't have solutions for. The goal here is to identify which slices/images actually have cancer-telling info in them. So I would hope most of the images fall into a mix of purple and green dots.

Okay, this one is a mess, and your initial reaction might be: how is that useful? Well, there is an important thing to know here. The fields I fed in included the indexed location of the slices after they were sorted. While this is fairly useful for data mining, it is all but useless for visualization (a linear set of numbers is not something that has a meaningful standard deviation or localized average for use in grouping). I knew this going into the processing, but I wanted to see the result all the same.
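A quick way to see how an index column can swamp everything else is to compare distances with and without it. This is only a minimal sketch: the three rows and the `euclidean` helper are invented for illustration (they are not the real feature set), and plain Euclidean distance stands in for the pairwise distances t-SNE starts from.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical rows: [slice_index, feature_1, feature_2]
slice_a = [3.0, 0.20, 0.90]
slice_b = [4.0, 0.95, 0.10]    # next to slice_a by index, different content
slice_c = [150.0, 0.21, 0.88]  # far away by index, nearly identical content

with_index_ab = euclidean(slice_a, slice_b)
with_index_ac = euclidean(slice_a, slice_c)
no_index_ab = euclidean(slice_a[1:], slice_b[1:])
no_index_ac = euclidean(slice_a[1:], slice_c[1:])

# With the index column, slice_a looks closest to its index neighbour.
print(with_index_ab < with_index_ac)
# Without it, slice_a is closest to the slice with similar content.
print(no_index_ac < no_index_ab)
```

The index runs over a much larger range than the other features, so it decides who counts as a neighbor almost on its own.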

So now we remove the indexing features and try again. (Remember, each pixel is actually a bunch of features created from multiple grids of images.)
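Removing the indexing features amounts to dropping columns before the rows go to t-SNE. Here is a minimal sketch of that step, assuming the features sit in a plain list of rows with a parallel list of names. The column names and the `drop_features` helper are hypothetical, not the actual ones used for these runs.

```python
def drop_features(names, rows, unwanted):
    """Return (names, rows) with every column listed in `unwanted` removed."""
    keep = [i for i, name in enumerate(names) if name not in unwanted]
    kept_names = [names[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, kept_rows

# Hypothetical feature table; a real one would have many more columns per slice.
names = ["slice_index", "log_value", "grid_mean"]
rows = [
    [0, 4.21, 0.33],
    [1, 4.25, 0.35],
    [2, 3.90, 0.71],
]

names_out, rows_out = drop_features(names, rows, {"slice_index"})
print(names_out)
print(rows_out)
```

The filtered rows are what would then be handed to the embedding step.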

Okay, that looks a LOT better. In the bottom right you see a group of green pixels. That is some very nice auto-grouping. I went ahead and ran this once more, this time with the test data in there as well. It is in yellow.
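Eyeballing a green group is one thing; one way to put a number on it is to ask, for each dot, how many of its nearest neighbors in the 2D picture share its color. This is just a sketch of that idea and not part of the runs above. The `neighbor_purity` function, the coordinates, and the label strings are all made up for illustration.

```python
import math

def neighbor_purity(points, labels, k=3):
    """For each point, the fraction of its k nearest neighbours sharing its label."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        nearest = [j for _, j in dists[:k]]
        same = sum(1 for j in nearest if labels[j] == labels[i])
        scores.append(same / len(nearest))
    return scores

# Hypothetical 2-D embedding: a tight cancer group plus a mixed region.
points = [(9.0, 1.0), (9.2, 1.1), (9.1, 0.8), (8.9, 1.2),  # tight group
          (2.0, 5.0), (2.3, 5.1), (2.1, 4.8), (2.2, 5.3)]  # mixed region
labels = ["cancer", "cancer", "cancer", "cancer",
          "no_cancer", "cancer", "no_cancer", "no_cancer"]

scores = neighbor_purity(points, labels, k=3)
print(scores[:4])  # purity for the tight group
print(scores[4:])  # purity for the mixed region
```

Dots in a clean group score near 1, while dots sitting in a purple/green mix score lower.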

If anything, that's even better. Adding more data does tend to help things. The top group is really good. The ones on the left might be something too. It's hard to know, but you don't have to! That's what the data mining tools will figure out for you.

I want to do even more runs and see if I can build a better picture. A few of my features are based on index distances, and they should probably be based on my log number instead. Either way, it's fun to share! The rubber will really meet the road when I see if this actually gives me good results when I make a submission (probably a day away at least).

Here are the videos. They are 69, 47 and 77 MB each, so... they'll probably take a while to download.

http://dimredglow.com/images/animation1.gif

http://dimredglow.com/images/animation2.gif

http://dimredglow.com/images/animation3.gif