Dim Red Glow

A blog about data mining, games, stocks and adventures.

Training for a marathon (part 1)

I enjoy running. Well, most of the time. I've run in some form regularly for most of my life. I started in track and field in junior high and then did it again in high school. I ran for fun and to stay in shape periodically in my 20s and much more so in my 30s. In my mid 30s I decided it was time to try a marathon, partly because I wanted to see what it was like, partly because it was an 'Everest' sort of thing.

I ended up running 3 marathons in about 2 years. I did the Rock 'n' Roll Marathon in St. Louis 2 years in a row and the Beat the Blerch one in Seattle the first year they had it. The St. Louis Rock 'n' Roll event stopped having full marathons last year (I guess turnout was too low), so that wasn't an option. The Beat the Blerch one, while fun, was never meant to be a recurring thing for me. I just wanted to travel and try it.

So, I didn't run a marathon at all last year. I guess I could have found one, but I was 'meh' in terms of motivation. Also, I have had chronic plantar fasciitis. I think I've finally beaten that thanks to a number of things: stretching, lots of time off, changing my running shoes to Newtons (which emphasize toe striking, not heel striking) and changing my walking-around shoes to a minimalist cross trainer (over a running shoe). I've always had mild problems with foot pain because I tend to be a heel-toe walker/runner. I think, however, that what caused this chronic condition was something else. I trained for a 5k (I was running for time) at the same time I was training for the 2 marathons I did. When one marathon was done and the 5k was run, I was already suffering pretty constant pain, and I didn't give myself any real time off to heal since I had 1 more 2 months later. That, I think, sealed the condition.

Skipping ahead a year and a half: I'm 40 now, and it seems like as good a time as any to try again. I never really thought marathons would be my thing and they probably won't be long term, but there are a few goals I didn't accomplish in the 3 I ran. I wanted an under-4-hour time (4:25 was my best), and I actually even wanted to do a 3:30 time once I did the 4. Someday I thought it might be fun to do a biathlon or triathlon as well, but I want to hit those goals first.

I've spent the last 3 or 4 weeks running periodically and mildly dieting, trying to get my body back into some sort of shape. I can do my "stay in shape" run now without much thought. I did it today; here are the stats:

distance: 4.02 miles

time: 37:46.6

pace: 9:23 /mile

weight: 197.6

The watch I have that tracks this stuff has heart rate and GPS in it too, though I'm not sure if it keeps it all. At some point I'll have to hook it up to my computer, download the data and see what is what. My current weight is a heavy 197.6 (heavy for me, anyway). A few things I'd like to point out: in a perfect world I'd weigh about 20 pounds less (I'm 5'9"-5'10"). This is especially true when I'm about to run the marathon (lighter means less weight to carry). I've never quite reached it when training previously and always end up around 180-185. So that's a goal of sorts to go along with the 4-hour run. One really feeds the other: if you put in the miles, the weight goes away, and that helps you run faster. I'm not what you'd call hardcore about this, though, which is why previous attempts have met with limited success.
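For anyone checking the watch's math, the pace line follows directly from the distance and time. A quick sketch in Python (I'm assuming the watch truncates the leftover seconds rather than rounding; that's what matches its display):

```python
def pace_per_mile(distance_miles, minutes, seconds):
    """Total run time divided by distance, as a (min, sec) per-mile pace."""
    total_seconds = minutes * 60 + seconds
    per_mile = total_seconds / distance_miles   # seconds per mile
    return int(per_mile // 60), int(per_mile % 60)

m, s = pace_per_mile(4.02, 37, 46.6)
print(f"{m}:{s:02d} /mile")  # 9:23 /mile
```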

The mileage I want to be putting in eventually will be around 40 a week, probably 4-5 days a week at 8-10 miles each, with maybe a big run every other week of 15-20. Right now I'm probably doing about 8 (two 4-mile runs), though the weather is changing and I'll be putting in a lot more soon. This weekend I'll probably put in 10 more, and I'll probably put in 4 more tomorrow. I won't go too nuts till I pick a marathon and get a timeline; training for a marathon eats up a lot of time.

Diet is something I need to think about too. I've actually been losing weight, sort of... I started a low-carb diet, but I haven't been running enough or been strict enough to really shed the weight. I was actually around 201 when I started about a month ago. I lost 5 pounds real quick and have floated there for the last month, probably taking in around 60 or so grams of carbs on average a day. That's going to stop now, though. I want the energy to train with, and not eating the starches puts undue strain on me. Yes, you will lose weight faster, but this isn't about losing weight overnight.

It's worth mentioning that a regular exercise routine is important to a long life and to looking and feeling healthy, and once the marathons are over I plan on continuing to run regularly. I will try to give updates on the days I run so anyone interested can follow along.

Improving t-sne (part 3)

This post will be the last for the immediate future on t-sne and really will just cover what I've done and how it all ended up. I have a new direction to head. And while I may use t-sne, I don't think it will be the main focus (though I guess it could be! Surprises do happen).

So without further ado, let me summarize what I've written about so far. T-sne is an algorithm that reduces the dimensions of a data set while preserving the statistical probability dictated by the distance between rows of data and the features therein. It does this by randomly distributing the points in whatever N-dimensional space you choose, then slowly moving the points iteratively in such a way as to best reflect the same probability distribution that existed in higher-order space. You could, I suppose, increase dimensions this way too if you wanted to.
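To make the "probability dictated by distance" idea concrete, here's a heavily simplified numpy sketch that turns pairwise distances into a probability distribution over point pairs. Real t-sne tunes a per-point Gaussian bandwidth to a target perplexity and uses a Student-t kernel in the low-dimensional space; the single fixed sigma here is my simplification:

```python
import numpy as np

def pairwise_affinities(X, sigma=1.0):
    """Closer rows get higher probability; far rows get almost none."""
    diffs = X[:, None, :] - X[None, :, :]
    d2 = np.square(diffs).sum(axis=-1)       # squared distance, row vs row
    p = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian kernel on distance
    np.fill_diagonal(p, 0.0)                 # a point has no self-affinity
    return p / p.sum()                       # normalize over all pairs

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
P = pairwise_affinities(X)
# nearly all the probability mass sits on the two nearby rows
```

It's this P that the low-dimensional layout tries to reproduce.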

I covered how I would speed this up at the end of part 1. I use a series of smaller "mini" t-snes and combine them together. All of those t-snes share a large number of the same data points, which are seeded into the same positions. I use those points to stitch together all the various frames, using the closest shared point as an anchor to find each point's final relative position in the one key frame where I place everything. In that way all points from all the frames are mapped onto 1 of the frames. This seems to work really well; the results aren't exact, but the run times are linear. (I'm calling this a win.)

In the 2nd part I covered how one can augment the results to get a clear separation of data points based on a particular score. In short, it's not magical: you build features from a data mining tool and provide the results as new features, then use only those as the t-sne inputs. In my case my GBM does some sub-selecting and bagging internally. Because of this, my outputs are never exact. This makes for a very good set of inputs for seeing the separation of the scores.

Unfortunately, the results you get don't provide a way to improve the score. I had hoped they would, but truly it's just another way of looking at the final output. If you train on a subsection of the data, feed the t-sne results back into the GBM machine and retrain using the other part, no overall improvement will be found. (I'm calling this a loss, or at best a draw. Yes, I did get wonderful separation, which was my starting goal, but it's nothing more than a visualization tool and didn't lead to something that actually "improves" anything from a data mining perspective.)

It's worth noting that if you don't remove the bias and go ahead and retrain from the t-sne result using the same data you fed it originally (the data used to get the scores that built the t-sne results), it will slowly get better and better, but this is just a serious case of over-fitting. GBM does such a good job of finding connections between data points that it will emphasize any aspects represented in the new t-sne features that allow it to get closer to the right score.

I tried this out and found I could turn a .47 result (log-loss) into a .43 result by adding in the t-sne data. Then I could further reduce it all the way to .388 by letting the algorithm pick and choose the features to use (well, if brute-force trial and error counts as pick and choose). The point is, it's not useful for improving the results on the test set (which actually scored .48 when the cross-validation set scored .388).

I'll leave this right there for now.

Improving t-sne (part 2)

I ended last time saying "This brings me to the big failing of t-sne. Sometimes (most of the time) the data is scored in a way that doesn't group on its own." I intend to try tackling that in today's post. I will tell you what I know, what I think might work and the initial results I've gotten so far.

Let me start out by saying I've been casually active in two Kaggle contests as of late: https://www.kaggle.com/c/santander-customer-satisfaction and https://www.kaggle.com/c/bnp-paribas-cardif-claims-management . Both contests use a binary classifier; one scores using log-loss (which penalizes you very harshly for predicting wrong) and the other uses AUC (ROC if you prefer), though their AUC is predicted probability vs observed probability, which is a little weird for AUC, but whatever.
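How harsh log-loss is about confident wrong answers is easy to see with a tiny computation (a sketch, not the exact Kaggle scorer; the clipping epsilon is my choice):

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)   # clip so log() never sees 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# one confident wrong prediction dominates the whole average
print(log_loss([1, 1], [0.9, 0.9]))   # about 0.105
print(log_loss([1, 1], [0.9, 0.01]))  # about 2.36
```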

Both of these contests are kind of ideal for t-sne graphing in that there are only 2 colors to represent. They also represent different extremes: one has a very weak signal and the other is more of a 50/50. That is, in the weak-signal case the "no"s far outnumber the "yes"s. It's easy to have the signal lost in the noise when the data is like that. If we can leverage t-sne to reduce our dimensions down to 2 or 3 and see via a picture that our groupings of positives and negatives are where they need to be, we probably really will have something.

T-sne gives us 1 very easy way to augment results and 1 not-so-easy way to augment them. The very easy way is to change how it calculates distances. It's important to remember that at its core it is trying to assign a probability to a distance for every point vs every other point and then rebalance all the point-to-point probabilities in lower dimensions. If we augment how it evaluates distance, by say expanding or collapsing the distance between points in a uniform way, we can change our results, as certain dimensions will have more or less importance. I'll talk about the not-so-easy way later in the article.

I tried at first to improve the solutions using various mathematical transforms. Some were me taking dimensional values to powers (or roots). I tried principal component analysis. And I even tried periodically rebuilding the probabilities based on some ad hoc analysis of where the training data was laid out. All of this was a mess. Not only was I never really successful, but in the end I was trying to build a data mining tool in my data mining tool!

Skipping a few days ahead, I started looking at ways to import results from a data mining tool to augment the dimensions. I tried using data from gradient boosting. I tracked which features get used and how often. Then when I used that data, I shrunk the dimensions by the proportion they were used. So if a feature/dimension was used 10% of the time (which is a TON of usage), that dimension got reduced to 10% of its original size. If it was used .1% of the time, it got reduced to 1/1000th of its size. This produced... better results, but how much better wasn't clear, and I definitely wasn't producing results that made me go "OMG!" I was still missing something.
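The shrinking step itself is just a couple of lines of numpy. The usage fractions below are made-up numbers standing in for whatever your GBM reports:

```python
import numpy as np

# hypothetical: fraction of splits that used each of 4 features in a GBM run
usage = np.array([0.10, 0.001, 0.05, 0.02])

X = np.random.RandomState(0).randn(200, 4)   # original feature matrix

# scale each dimension by its usage before computing t-sne distances, so a
# 10%-used feature keeps 10% of its length and a 0.1%-used one shrinks
# to 1/1000th, contributing almost nothing to point-to-point distance
X_scaled = X * usage
```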

Now we come to the hard way to augment the results: we can build better data for the t-sne program to use. This is what I did. I abandoned the usage information entirely and tried using entirely new features generated by GBM. The first step was to build a number based on how the row flowed down the tree: flowing left was a 0, flowing right was a 1. I actually recorded this backwards, though, since at the final node I want the 0 or 1 to be the most significant digit, so like values end up on the same end of the number line. Since GBM is run iteratively, I could take each loop and use it as a new feature.
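The path-to-number trick looks something like this (a sketch; `path` here is a hypothetical list of the left/right decisions recorded as a row descends one tree):

```python
def path_feature(path):
    """Encode a row's path down a tree (0 = left, 1 = right) as one number.

    The path is read reversed so the final split becomes the most
    significant bit, keeping rows that end in nearby leaves close
    together on the number line.
    """
    value = 0
    for step in reversed(path):
        value = value * 2 + step
    return value

# rows whose paths agree on the last splits get nearby numbers
print(path_feature([0, 1, 1]))  # 6
print(path_feature([1, 1, 1]))  # 7
```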

This didn't really work, I think because a lot of the splits are really noisy, and even with me making the last split the most significant digit, it's still way too noisy to be a useful way to look at the data. This brings me to my most recent idea, which finally bears some fruit.

I thought, "Well, if I just use the final answers instead of the flow of data, not only do I get a method for generating features that can produce either reals or categorical answers (depending on the data you are analyzing), but you also can use any method you like to generate the features. You aren't stuck with fixed-depth trees." I stuck with GBM but turned off the fixed-depth aspect of the tree and just built some results. I ran 5 iterations with 9-fold cross-validation and kept the training set's predictions from the tree-building exercise. The results are really promising. See below. (Note: I will continue to use my pseudo-t-sne, which runs in linear time, due to time constraints. If you want "perfect results" you will have to go elsewhere for now.)
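The feature-building step can be sketched like this. The function names are mine, and the mean-predicting stand-in model is just there to make it runnable; a real GBM (or anything else) goes in its place:

```python
import numpy as np

def out_of_fold_predictions(X, y, fit_predict, n_folds=9, n_iters=5, seed=0):
    """New features: each iteration's out-of-fold predictions for every row.

    fit_predict(X_tr, y_tr, X_va) can be any model; that's the point --
    you aren't stuck with fixed-depth trees.
    """
    rng = np.random.RandomState(seed)
    feats = np.zeros((len(y), n_iters))
    for it in range(n_iters):
        folds = rng.permutation(len(y)) % n_folds   # assign rows to folds
        for f in range(n_folds):
            tr, va = folds != f, folds == f
            feats[va, it] = fit_predict(X[tr], y[tr], X[va])
    return feats

# toy stand-in model: predict the training mean (a real GBM goes here)
mean_model = lambda X_tr, y_tr, X_va: np.full(len(X_va), y_tr.mean())
X = np.random.RandomState(1).randn(30, 3)
y = (X[:, 0] > 0).astype(float)
F = out_of_fold_predictions(X, y, mean_model)   # 30 rows x 5 new features
```

Those columns of F (predictions, not raw features) are then what gets handed to the t-sne step.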

BNP

You can see the different folds being separated, since in this implementation they don't cover all the data (it zero-filled the rows it was missing). I have since fixed this. But yes, there were 9 folds.

Santander

Santander is the one I'm most impressed with; it's really hard to isolate those few positives into a group. BNP... is, well, less impressive. It was an easier data set to separate, so the results are meh at best. The rubber will really meet the road when I put the test data in there and do some secondary analysis on the 2-d version of the data using GBM or random forest.

It's important to know that I'm "cheating": I build my boosted trees from the same training data that I'm then graphing. But due to GBM's baby-step approach to solving the problem, and the fact that I use bagging at each step of the way to get accuracy, the results SHOULD translate into something that works for the testing data set. I'll only know for sure once I take those results you see above, put the test data in as well, and then send the whole 2-d result into another GBM and see what it can do with it. I'm hopeful (though prior experience tells me I shouldn't be).

*Note: Since I initially put this post together, I sat on it because I wanted to try it before I shared it with the world. I have finished using the initial results on the BNP data and they are biased... I think for obvious reasons: training on data, then using the results to train further. I'll talk more about it in t-sne (part 3). If you want to read more about it right now, check out this thread https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/forums/t/19940/t-sne-neatness/114259#post114259 *

For fun, here are the animations of those two images being made. (They are really big files, 27 and 14 megs... also Firefox doesn't like them much; try IE or Chrome.)

http://dimredglow.com/images/Santander.gif

http://dimredglow.com/images/bnp.gif


The ban update is in, eye of ugin is out

Well, they posted the new ban list for Magic. Poor Eye of Ugin, you were... so broken. It was totally the right choice, though. Two free mana for any Eldrazi spell when there are so many good, fairly cheap Eldrazi? Yes please. They made the right decision. And really, Eldrazi is still totally viable. I'm going to try keeping the modern, standard and legacy decks I have together for now... though I might tweak the modern one some.

They unbanned 2 cards, and that I found surprising. Sword of the Meek enables a pretty ridiculous combo, and Ancestral Vision just makes blue decks and possibly any cascade decks better, if they have room for it, that is.

The sword combo deck should change the meta. It's too strong of a combo not to have some people play it. I'll eventually get my hands on a playset and see if I like playing it, but I'm in no hurry.

However, Ancestral Vision will just make some decks better. There is no Shardless Agent or Bloodbraid Elf in modern, so you can't get a decent creature AND trigger visions in 1 go, at least not for 3 or 4 mana. I'm sure some decks will just throw it in for the draw advantage 4 turns later for the long game. There might be a way to cheat it into play besides cascade. It occurs to me just now, I suppose you could use the new Shadows over Innistrad card Brain in a Jar to cast it outright. You would have to remove the jar from play before the ability resolves and have had no counters on it to begin with (it uses the last known information about the Brain in a Jar card when resolving the ability). Regardless, I picked up some of these because they seem like a staple for deck building in modern now that it's legal.

Speaking of the new set, I have just a few small observations from playing in the prerelease(s). The green hate bears seem... nuts. White has some really good creatures as well as Archangel Avacyn (which just wins games). Red got some nice aggro support, and black got more zombies (never a bad thing), not to mention a vampire or two. I can see black aggro variants and green/white or red/green creature decks being a thing, especially since Collected Company has not rotated out yet.

As I already mentioned, I will continue to play my standard Eldrazi for a while, probably changing it slightly to use a few of the newer cards... but I think (eventually) I'm going to try my hand at either a blue or blue/black mill deck. It seems like that could be the next thing. Blue didn't seem to get all the obvious love that the other colors did, but I think it probably got enough stall between Jace, Unraveler of Secrets and Engulf the Shore, and mill cards like Startled Awake and Manic Scribe, that it could be a real threat, especially with a nicely timed Languish. That being said, add Sphinx's Tutelage to Forgotten Creation or Dark Deal and all at once the game gets over in a hurry.

Improving T-SNE (part 1)

Well, that's what I've been doing (improving t-sne). What is t-sne? Here's a link and here is another link. The tl;dr version is that t-sne is intended as dimensional reduction of higher-order data sets. This helps us make pretty pictures when you reduce the data set down to 2-d (or sometimes 3-d). All t-sne is doing is trying to preserve the statistics of the features and re-represent the data in a smaller set of features (dimensions). This allows us to more easily see how the data points are organized. However, all too often your labeling and the reality of how the data is organized have very little in common. But when they do, the results are pretty impressive.

This isn't my first foray into messing with t-sne, as you can see here and in other places on Kaggle if you do some digging. I should probably explain what I've already done, and then I can explain what I've been doing. When I first started looking at the algorithm it was in the context of this contest/post. I did some investigation and found an implementation I could translate to C#. Then I started "messing" with it. It has a few limitations out of the box. First, the data needs to be real numbers, not categorical. Second, the run time does not scale with large sets of data.

The second problem has apparently been addressed using the Barnes-Hut n-body simulation technique, which (as I understand it) windows out areas that have little to no influence on other areas. But you can read about it there. This drops the runtime from N² to N log N, which is a HUGE improvement. I tried to go one better and make it linear. The results I came up with are "meh". They work in linear time but are fuzzy results at best, not the perfect ones you see from the Barnes-Hut simulations.

Before I say what I did, let me explain what the algorithm does. It creates a statistical model of the different features of the data, getting a probability picture of any given value. Then it randomly scatters the data points in X dimensions (this will eventually be the output and is where the dimensional reduction comes from). Then it starts moving those points step by step in a process that attempts to find a happy medium where the points are balanced statistically. That is, the points pull and push on each other based on how out of place they are with all other points. They self-organize via the algorithm. The movement/momentum each row/point gains in each direction is set by the influence those other points should have, based on distance and how far away they should be given their particular likelihood. I've toyed with making an animation out of each step. I'll probably do that some day so people (like me) can see first hand what is going on.
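That push/pull loop can be sketched in a few lines. This is a toy, not real t-sne (real t-sne uses perplexity-tuned Gaussians, a Student-t kernel in the low-dimensional space, and momentum); it only illustrates the "nudge points until the pair probabilities match" dynamic:

```python
import numpy as np

def tiny_tsne(X, out_dims=2, steps=200, lr=0.1, seed=0):
    """Toy t-sne-style layout: random scatter, then iterative push/pull."""
    rng = np.random.RandomState(seed)

    def probs(Z):
        d2 = np.square(Z[:, None, :] - Z[None, :, :]).sum(-1)
        p = np.exp(-d2)
        np.fill_diagonal(p, 0.0)
        return p / p.sum()

    P = probs(X)                              # target pair probabilities
    Y = rng.randn(len(X), out_dims) * 0.01    # random low-d starting scatter
    for _ in range(steps):
        Q = probs(Y)                          # current low-d probabilities
        for i in range(len(X)):
            # pairs that should be closer (p > q) pull; the rest push
            grad = ((P[i] - Q[i])[:, None] * (Y[i] - Y)).sum(axis=0)
            Y[i] -= lr * grad
    return Y

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
Y = tiny_tsne(X)
# the two rows that are close in the original space end up close in 2-d
```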

So to improve it, what I did is split the data into a whole bunch of smaller sections (windows), each of the same size. Each window, though, shares a set of points with 1 master section. These master points are placed in the same seed positions in all windows and have normal influence in all windows. They get moved and jostled in each window, but when it comes time to display results, we have some calculations to do. The zero window is left unaltered, but each other window has its non-shared points moved to positions relative to the closest shared point. So if, say, row 4000 isn't shared in all windows and its closest shared row is row 7421, we see what its dimensional offset is and place it according to that.

The idea is that, statistically speaking, the influence of those points is the same as any other points', and the influence the points in the main window have on those points is approximately the same as well. The net effect is that all the points move and are cajoled into groups as if they were one big window. This of course is why my results are fuzzy: truly only a tiny amount of data is drawing the picture, whereas normally all the interactions between all the points would be happening. But because I wrote it the way I did, my results scale linearly.
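The stitching step looks roughly like this (the data structures are mine; a real implementation would work on arrays rather than dicts):

```python
import numpy as np

def stitch_windows(windows, shared_ids):
    """Merge per-window layouts into window 0's frame via shared anchors.

    windows: list of {row_id: 2-d position}; every window contains all of
    shared_ids (the master points, seeded identically). Window 0 is the
    key frame and is left unaltered; every other row is placed relative
    to its closest shared point in its own window.
    """
    merged = dict(windows[0])
    for win in windows[1:]:
        for rid, pos in win.items():
            if rid in merged:
                continue
            # the nearest master point in this window acts as the anchor
            anchor = min(shared_ids,
                         key=lambda s: np.linalg.norm(pos - win[s]))
            merged[rid] = merged[anchor] + (pos - win[anchor])
    return merged

# toy example: window 1 is the key frame shifted by (5, 5)
shared = [0, 1]
win0 = {0: np.array([0.0, 0.0]), 1: np.array([10.0, 0.0]),
        2: np.array([1.0, 1.0])}
win1 = {0: np.array([5.0, 5.0]), 1: np.array([15.0, 5.0]),
        3: np.array([6.0, 5.0])}
layout = stitch_windows([win0, win1], shared)
# row 3 keeps its (1, 0) offset from its anchor, row 0
```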

Here's an example from the original Otto Group data. First, a rendering of 2048 rows randomly selected (of the 61876) with 1500 steps using the traditional t-sne. It took about 10 minutes. A 4096-row run should have taken something like 30 minutes (probably 5 of the 10 minutes of processing was more or less linear)... I'm ballparking; it definitely would have taken a long while.

Now here's my result for all 61876 rows using the windowing technique, where I windowed with a size of 1024. (Note: if I did 2048, it would have taken longer than the above image, since each window would take that long.) 1500 steps... run time was like 50 minutes.

You can see what I mean about it not being as good at making a clear separation. The first image is MUCH better about keeping 1 group out of another... but the rendering time is SOOOOO much faster in the 2nd when you think about how much it did. And it's parallelizable (which I'm taking advantage of), since the various windows can run concurrently. Larger windows make the results better but slow everything down. I believe there are 9 different groups in there (the coloring is kind of bad; I really need a good system for making the colors as far away from each other as possible).

Someday (not likely soon) I will probably combine the n-body solution and my linear rendering to get the best of both worlds. Each window will render as fast as possible (n log n time in the window size) and the whole thing will scale out linearly and be parallelizable.

This brings me to the big failing of t-sne. Sometimes (most of the time) the data is scored in a way that doesn't group on its own. See, the groupings are based on the features you send in; the algorithm doesn't actually look at the score or category you have on the data. The only thing the score is used for is coloring the picture after it is rendered. What I need is a way to teach it what is really important, not just statistically significant. I'll talk about that next time!

A Standard eldrazi deck and a PPTQ

A PPTQ is a Preliminary Pro Tour Qualifier (for Magic: The Gathering). I've not won one before, but I came as close as I ever have this last weekend. I did it playing this deck http://tappedout.net/mtg-decks/24-03-16-4-color-eldrazi/ . It was a 33-person tournament (6 rounds), and I ended up losing the semifinals to a rally deck. Rally the Ancestors has pretty much dominated the tournaments in standard, at least locally, though based on mtgtop8.com it seems to be a thing in other places too.

It uses either Zulaport Cutthroat triggers or an unblocked Nantuko Husk to win. Usually the former, in my experience. I tried to prepare for it using Hallowed Moonlight (which is also good against other things) and a single Cranial Archive, which would force them to reshuffle their graveyard (making the rally worthless), but it wasn't enough. In the end I neglected to always leave my 2 mana open to trigger the archive and lost to a top-decked rally. Game one they won, as they tend to do without my sideboard.

This leads me to possibly a better solution. I think I'll be trying Tainted Remedy next time. For the rally matchup it should work wonders, as they don't run enchantment removal. Most of the time when they start trying to win with the cutthroat they have less life than I do, and in short it would kill them. I'll have it as a 2-of in the sideboard somewhere. It's also worth mentioning that it would help against any decks running normal life-gain creatures like Seeker of the Way or Soulfire Grand Master. Not that we see a lot of those these days.

On a final note concerning the two planeswalkers I ran: the unsung hero of my deck was Sorin, Solemn Visitor. He pulled his weight more than you would expect; I got paired against two prowess decks and he got me through both matches. As far as Gideon, Ally of Zendikar goes, he's been underwhelming as of late. I think that has more to do with the prevalence of dragons and rally in the format. Both decks try to win in a way that makes him more or less a non-issue.

Gradient Boosting (Part 2)

This time around let's start from scratch. We have some training data. Each row of the data has various features with real-number values and a score attached to it. We want to build a system that produces those scores from the training data. If we forget for a second "how" and think about it generically, we can say that there is some magical box that gives us a result. The result isn't perfect, but it's better than a random guess.

The results that we get back from this black box are a good first step, but what we really need is to further improve them. To do that, we take the training data and, instead of using the scores we were given, we assign new scores to the rows. The new score is how much we were off from the black box's prediction.

Now we send this data into a new black box and get back a result. This result isn't perfect either, but it's still better than a random guess. The two results combined then give us a better answer than just the first black box call. We repeat this process over and over, making a series of black box calls that slowly gets the sum of all the results to match the original scores. If we then send a single test row (that we don't know the score to) into each of these black boxes that have been trained on the training data, the results can be added up to produce the score the test data has.

In math this looks something like this

f(x) = g(x) + g'(x) + g''(x) + g'''(x) + g''''(x) .... (etc)

Where f(x) is our score and g(x) is our black box call. g'(x) is the 2nd black box call, which was trained with the adjusted scores; g''(x) is the 3rd black box call with the scores still further adjusted; etc.
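Here's the whole loop as runnable Python. A depth-1 decision tree (a "stump") stands in for the limited tree so the sketch stays short (real trees go deeper), and `shrinkage` is the fixed-percentage reduction applied to each tree's output:

```python
import numpy as np

def fit_stump(X, y):
    """Tiny 'black box': the best single-split (depth-1) regression tree."""
    best = None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:       # the max value can't split
            left = X[:, f] <= t
            pred = np.where(left, y[left].mean(), y[~left].mean())
            err = ((y - pred) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, f, t, y[left].mean(), y[~left].mean())
    _, f, t, lo, hi = best
    return lambda Z: np.where(Z[:, f] <= t, lo, hi)

def boost(X, y, rounds=20, shrinkage=0.5):
    """f(x) = g(x) + g'(x) + ...: each tree fits what's still unexplained."""
    trees = []
    residual = y.astype(float).copy()
    for _ in range(rounds):
        g = fit_stump(X, residual)
        trees.append(g)
        residual -= shrinkage * g(X)    # new target: how much we're still off
    return lambda Z: shrinkage * sum(g(Z) for g in trees)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
model = boost(X, y)
# the summed trees closely reproduce the original scores
```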

A few questions should arise. First how many subsequent calls do we do? And 2nd what exactly is the black box?

I'll answer the second question first. The black box (at least in my case) is the lowly decision tree. Specifically, it is a tree that has been limited so that it terminates before it gets to a singular answer. In fact, it is generally stopped while large fractions of the training data are still grouped together. The terminal nodes just give averages for the scores of the group at the terminal node. It is important that you limit the tree because you want to build an answer that is good in lots of cases.

Why? Because if you build specific answers and the answer is wrong, correcting the result is nearly impossible. Was it wrong because of a lack of data? Was it wrong because you split your decision tree on the wrong feature 3 nodes down? Was it wrong because this is an outlier? Any one of these things could be true, so to eliminate them all as possibilities you stop relatively quickly in building your tree and get an answer that is never right but at least puts the results into a group. If you go too far down you start introducing noise in the results, noise that creeps in because your answers are too specific. It adds static to the output.

How far down should you go? It depends on how much data you have, how varied the answers are and how the data is distributed. In general I do a depth of ((Math.Log(trainRows.Length) / Math.Log(2)) / 2.0), but it varies from data set to data set. This is just where I start, and I adjust it if need be. In short, I go half way down to a specific answer.
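A quick Python translation of that C# expression, just to show the numbers it gives:

```python
import math

def starting_depth(n_rows):
    """Half of log2(row count): 'half way down to a specific answer'."""
    return (math.log(n_rows) / math.log(2)) / 2.0

print(starting_depth(1024))     # 5.0 -- a full tree on 1024 rows is ~10 deep
print(starting_depth(61876))    # ~7.96, so a starting depth of about 8
```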

Now, if you limit the depth uniformly (or nearly uniformly), the results from each node will have similar output. That is, the results will fall between the minimum and maximum score just like any prediction you might make, but probably somewhere in the middle (that is important). Each will have the same number of decisions used to derive the result as well, so the answers' information content should on average be about the same. Because of this, the next iteration will have newly calculated target values whose range is, in the worst case, identical to the previous range. In any other case the range is decreased, since the low and high ends got smoothed into average values. So it is probably shrinking the range of scores, and at worst leaving the range the same.

Also, the score input for the next black box call will still have most of the information in it, since all we have done is adjust scores based on 1 result, and we did it in the same way to a large number of results: results that share certain traits from the previous tree's decisions. Doing this allows us to slowly tease out qualities of similar rows of data in each new tree. But since we are starting over with each new tree, the groupings end up different each time. In this way, subtle shared qualities of rows can be expressed together once their remaining score (original score minus all the black box calls) lines up.

This brings us to the first question: how many calls should I do? To answer that accurately, it's important to know that usually the result that is returned is not added to the row's final score unmodified. Usually you decrease the value by some fixed percentage. Why? This further reduces the influence any one decision tree has on the final prediction. In fact, I've seen times where people reduce the number to 1/100th of its returned value. Doing these tiny baby steps can help, but sometimes it just adds noise to the system as well, since each generation of a tree may have bias in it or might too strongly express a feature. In any case, it depends on your decision trees and your data.

This goes for how many iterations to do as well. At some point you have gotten as accurate as you can get, and doing further iterations over-fits to the training data and makes your results worse. Or, worse yet, it just adds noise as the program attempts to fit the noise of the data to the scores. In my case, I test the predictions after each tree is generated to see if it is still adding accuracy. If I get 2 trees in a row that fail to produce an improvement, I quit (throwing away the final trees). Most programs just have hard iteration cut-off points. This works pretty well but leads to a guessing game based on the parameters you have set up.
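The stop-after-2-misses rule can be sketched like this (the helper names and toy score sequence are mine; the tree-fitting and evaluation functions are stand-ins for your own GBM internals):

```python
def grow_until_stale(make_tree, evaluate, max_rounds=1000, patience=2):
    """Add trees while they help; quit after `patience` misses in a row.

    make_tree(r) fits tree r; evaluate(trees) scores the current ensemble
    (lower is better). On quitting, the trailing trees that failed to
    improve the score are thrown away.
    """
    trees, best, misses = [], float("inf"), 0
    for r in range(max_rounds):
        trees.append(make_tree(r))
        score = evaluate(trees)
        if score < best:
            best, misses = score, 0
        else:
            misses += 1
            if misses >= patience:
                del trees[-misses:]      # discard the unhelpful tail
                break
    return trees, best

# toy run: accuracy improves twice, then stalls for two rounds
scores = [0.50, 0.40, 0.45, 0.46]
trees, best = grow_until_stale(lambda r: r,
                               lambda ts: scores[len(ts) - 1])
# keeps the first two trees; best score is 0.40
```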


Detroit Magic the Gathering Grand Prix

I went to the Magic: The Gathering grand prix event in Detroit. It was interesting, but maybe less fun than past events, mainly because I've done it too many times before. The novelty has finally worn off.

I made day two playing an eldrazi tron deck I put together. My record was 7-2 going into day 2 (I had 2 byes). I think this had less to do with my magical brewing skills and more to do with eldrazi being so easy to play and overly powerful. I fully expect eye of ugin to be banned, and possibly eldrazi mimic. Though to be honest, temple and mimic would work as well. Why mimic? Because it's a 2-mana creature that enables the super fast wins.

If you curve perfectly with nothing more than eye of ugin and then waste, waste, waste... it's a 3/2 on turn 2, a 4/4 on turn 3, a 5/5 on turn 4. Granted, it might die along the way, but essentially you are paying 2 mana for a second copy of whatever your biggest creature is. And if it dies, the real creature lives. It gets worse if you get up to ulamog (which my tron deck of course ran). It is also abusive in other ways, since any colorless creature you put into play will trigger its ability. Just be happy phyrexian dreadnought isn't available in modern. :)

Don't get me wrong, without the rest of the eldrazi it's a 'meh' card. If they leave it, it'll be because they want the eldrazi deck to remain a 'thing' in modern. It still could be with just temple and mimic, though I'd think it would still be too strong/consistent even then. The consistency is what really makes it a good deck. Most games I didn't have an eye of ugin opening hand. Losing eye would slow down my deck maybe half the games and make the end game more difficult, but regardless, eldrazi tron would probably remain a viable deck.

I should add I didn't play against much eldrazi. I think this was just luck and part of what got me to day 2. I did play about everything else under the sun though. Day one went something like this: infect (lost 1-2), white-blue planeswalkers (really surprised to play this! won 2-0), mardu tokens/goodstuff (won 2-0), black-white tokens (won 2-1), red-green eldrazi (lost 1-2), an abzan chord of calling deck (won 2-1) and affinity (won 2-1). Day two went: storm (lost 1-2), merfolk (won 2-1), eldrazi tron (mirror match! won 2-0), living end (lost 1-2, dropped).

My losses could have gone either way most of the time, but that's the luck of the draw. My deck could have used more testing; there were cards I cut almost every game. That just shows you how good eldrazi is, that it could carry itself with maybe 4 so-so cards being run alongside it.

I'm not going to abandon eldrazi post-ban. In fact, I've long since put together my legacy eldrazi deck and have been tweaking it. I don't see it going anywhere anytime soon :) it's really competitive.


Gradient Boosting (Part 1)

Okay! So you want to learn about gradient boosting. Well, first let me point you to the obvious source: https://en.wikipedia.org/wiki/Gradient_boosting . I'll wait for you to go read it and come back.

Back? Think you understand it? Good! Then you don't need to read further... probably. I should warn you now: in the strictest sense, this post is entirely backstory on getting to the point of implementing gradient boosting. You might want to skip to part 2 if you want more explanation of what gradient boosting is doing.

When I first tackled gradient boosting, I tried it and it didn't work. What I mean to say is I got worse results than with random forest https://en.wikipedia.org/wiki/Random_forest . Perhaps I'm getting ahead of myself; let me back up a little and explain my perspective.

Most people at https://www.kaggle.com/ use tool kits, or languages with libraries, that come with built-in implementations of all the core functionality they need. That is, the tool kits they use have everything written for them. They make API calls that perform the bulk of the work; they just assemble the pieces and configure the settings for the calls.

I write my own code for pretty much everything when it comes to data mining. While I don't reinvent things I have no plans on improving, there aren't too many things like that. I didn't write my own version of PCA https://en.wikipedia.org/wiki/Principal_component_analysis ; I use one from a library on the rare occasion I want it. And while I've got my own version of t-SNE https://lvdmaaten.github.io/tsne/, it was a rewrite of a JavaScript implementation that someone else had written. Granted, I've tweaked that code a lot for speed and to do some interesting things with it, but I didn't sit down with a blank class file and just write it. Everything else, though, I've written all by myself.

So why does that make a difference (toolkit vs. handwritten)? Well, I try stuff and have to figure things out, and because of that my version of a technology might work in a fundamentally different way. Or perhaps what I settle on isn't as good (though it probably seems better at the time). Then, when I try to leverage that code for an implementation of gradient boosting, it doesn't work like it should.

The core of both gradient boosting and random forests is the decision tree https://en.wikipedia.org/wiki/Decision_tree_learning. When it comes to random forests, I have been very pleased with the tree algorithm I've designed. However, those trees just didn't seem to work well stubbed out for gradient boosting. I can only think of three explanations for this.

  1. My stubbed out trees tended to be biased a certain way.
  2. They have a lot of noise in the output.
  3. My gradient boosting algorithm got mucked up due to poor implementation.

That at least is the best I can figure.

Recently I made a giant leap forward in my decision tree technology. I had an 'AH HA!' moment and changed my tree implementation to use my new method. When I got it all right, my scores went up by about 10% (ball-parking here), and all of a sudden, when I tried gradient boosting, it worked as well. The results I got with that were fully another 15-20% better still! All at once I felt like a contender. The rest of the world had long since moved on to XGBoost https://github.com/dmlc/xgboost , which was leaving me in the dust. Not so much anymore, but I still haven't won any contests or, like, made a million on the stock market :) .

What changed in my trees that made all the difference? I started using the sigmoid https://en.wikipedia.org/wiki/Logistic_function to make my splitters. I had tried that many years ago as well, but the specific implementation was key. The ah-ha moment was when I realized how to directly translate my data into a perfect splitter without any messing around "honing in" on the best split. I think this technique not only gives me a great tree based on a better statistical model than using accuracy alone (accuracy is how my old tree worked), but the results are also more "smooth", so noise is less of an issue.
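
To give a feel for why a sigmoid splitter is "smooth": instead of a hard `feature <= threshold` test that flips a row from one branch to the other, the logistic function gives every row a fractional branch weight. This is only a generic illustration of the idea; the post doesn't spell out the actual construction, and the `center` and `sharpness` parameters here are hypothetical:

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def soft_split(feature_values, center, sharpness=1.0):
    # Soft splitter: each row gets a weight in (0, 1) for the
    # "right" branch instead of a hard yes/no decision.
    return sigmoid(sharpness * (feature_values - center))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
weights = soft_split(x, center=0.0, sharpness=4.0)
# Rows far left weigh near 0, far right near 1, and rows near the
# split weigh about 0.5; the smooth transition is what keeps a small
# change in a feature from flipping the tree's output entirely.
```

A hard splitter is the limit of this as `sharpness` goes to infinity, which is one way to see why the soft version is less noisy near the boundary.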


Legacy Eldrazi

Just a quick post about legacy. I tried an eldrazi deck I built last night at my local shop's legacy FNM. The short version is the deck did remarkably well and went 4-0. It might just have been luck, as any decent deck can fight its way through the variance every once in a while, but this was fairly one-sided. The only game I lost was in the 4th round, and it was with a hand I probably shouldn't have kept.

I was thinking I was going to try a green and black deck at the legacy grand prix in June, but I might try this instead if it continues to impress. I didn't get a real test against any combo decks, but it beat 2 shardless BUG decks, a merfolk deck and miracles. I also played against death and taxes afterwards, and it seemed to hold its own just fine. Anyway, here's the deck:

Eldrazi Post

I should warn you, I might modify the deck the link goes to slightly. This was just a first go at the deck, and as I figure out where the holes are, I'll modify the sideboard and tweak the main board.