Running and thoughts on data mining

Briefly, a few quick things. I went to Vegas a while back and checked out the Sphere. It's pretty neat and I fully recommend it if you get a chance to go. Also, the new job is still going well.

Now that the quick stuff is out of the way, let's talk about body changes and running. I do like to run, but it ebbs at times, so there are periods, usually once or twice a year, where I stop for a while. Sometimes the cause is just the temperature (too cold or too hot), or I slowly get out of the habit until suddenly it's been 3 weeks, so I just go with it for a few months. I always go back, though, because I genuinely enjoy it and it's good for me. Anyway, where I'm going with this is that I took an extended break to allow a running injury to heal. An injury to my heel, at that. It was just a sore spot that was probably due to overuse and maybe the wrong footwear. When I returned I didn't think too much about it and went for a "normal" run. For me that's around 40-50 minutes (depending on the weather and my shape) covering 4 miles. Boy... it was like I had never run before. I developed an injury in the small of my back that seemed to extend down my leg (and my heel hurt again). At the time I attributed it to just being sore from running anew and/or age. In retrospect, while that was surely part of it, I think it was almost entirely from the shoes I was using. I ran in those shoes for another 2 months or so before I replaced them in an attempt to figure out if they were the problem. It is worth saying that I had used that particular model of shoe for ages, buying new versions as they were released or as the old pairs wore out. It was probably the right choice at the time, as I was getting over plantar fasciitis and they did help me with that, but at some point I probably should have changed to a different shoe. I replaced them with a different "model" of shoe that was still designed for running but supported the feet in different ways, and what a difference it made! I was still sore but I felt the change immediately.
Now, probably 2-3 months later, I still have the slightly sore back and oddly the heel still hurts a little, but the pains in my hamstrings and calves have gone away. I think that once I was hurt, other things in my life continued to aggravate the injury. So, about 2 weeks ago I changed my office chair back to an old one. One that I liked a lot but that needed new foam (enter: new foam cushion), and things are finally improving! The moral of the story: your body changes, and if you are a runner, double check that your footwear is doing all that it can for you.

The other thing I wanted to mention I thought was interesting enough to share. I saw a video recently in which the author states "AI can't cross this line and we don't know why." (Read: neural networks and probably expert systems as a whole.) He then goes on to describe how the learning process approaches a line related to the data being input into the system and the training of the model. At the end of the day, he says the accuracy of the model (how often it fails a testing metric) is tied directly to the size of the data put in. He even goes as far as to explain how the raw data gives us insight into a view of the data (he describes it as a manifold that the classifier projects to). This is all great. I just don't get why he would say we don't know why. He basically explains exactly why. The training data and model are just describing a "surface of answers". The more data, the finer the resolution of detail on the surface. That is to say, the achievable error should be directly related to the information content present in the system, at least as it relates to the classifier. So you should not be able to cross an error line that is proportional to said information present. In a simple system you can cross the line once you derive the underlying nature of the process, like figuring out that force = mass times acceleration. Given enough samples of the right type of data, if you are trying different equations, you will eventually find the exact fit. Then the line goes away. In cases where there is too much going on, or you can never have the information you need, the best you can hope to do is model the surface of answers with samples. Searching that surface in a clever/accurate way is what data mining is all about, but there is a hard limit as described. It just feels weird that he didn't put that last bit in there, so I figured I would share it here.
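
To make the force = mass times acceleration point a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration (the sample values, the helper names `fit_through_origin` and `mean_squared_error`, and the two candidate equations). It just shows that when you try the equation that matches the underlying process, the error drops to zero, while a wrong equation leaves an error floor no matter how well you fit it.

```python
def fit_through_origin(xs, ys):
    """Least-squares constant c for the model y = c * x (no intercept)."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def mean_squared_error(xs, ys, c):
    """Average squared gap between the observed y and the model c * x."""
    return sum((y - c * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up, noise-free samples: (mass, acceleration) pairs and the force
# that the "real" process (force = mass * acceleration) would produce.
samples = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.5), (4.0, 0.5), (5.0, 4.0)]
forces = [m * a for m, a in samples]

# Candidate equation 1: force = c * (mass * acceleration), the right form.
right_inputs = [m * a for m, a in samples]
c_right = fit_through_origin(right_inputs, forces)
err_right = mean_squared_error(right_inputs, forces, c_right)

# Candidate equation 2: force = c * (mass + acceleration), a wrong form.
wrong_inputs = [m + a for m, a in samples]
c_wrong = fit_through_origin(wrong_inputs, forces)
err_wrong = mean_squared_error(wrong_inputs, forces, c_wrong)

print("right form: c =", c_right, "error =", err_right)
print("wrong form: c =", c_wrong, "error =", err_wrong)
```

With real, noisy measurements the error for the right form would not be exactly zero, but it would keep shrinking toward the noise level instead of stalling at a line set by the amount of data.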

Other than that, not too much going on. I'm still working on my elliptic curve stuff, with nothing exciting to report there. It's just a nice way to pass the time.

New job, t-sne, ECDLP and DLP

It's been 8 months! I would say time flies, but really it seems to go in spurts. The last 2 months have zipped by but the 4 before that were painfully slow. I think it has something to do with going back to a day job. So, let's start there. I consulted briefly, taking a job with a right to hire at AB. Needless to say, I didn't stay. While the consulting company k-force was fine and the client Anheuser Busch was fine, the environment lacked the focus and future I was looking for. Said another way, there were a lot of moving parts and I was inheriting a lot of work that didn't fit the tech stack they used internally, which makes for an awkward experience all around. Being relegated to working on old technology because you know it might be okay in general, but when the rest of the world is doing other things and the old-school way of doing things is at odds with the new school, it adds stress and general discomfort. In short, it wasn't the place or the people; it was the nature of the work in the environment they had built.

So what happened? Randomly, an old friend of mine texted me looking for programming staff for a startup he was involved in. The company was/is tiny and, long story short, I was offered the job and took it. I wasn't looking, but it really was the best option. I wasn't very happy at AB and I didn't like my prospects there. I'm MUCH happier where I am now. The company is called Keyrune. Its main product is melee.gg, which is software as a service. At melee.gg you can set up and run tournaments. At this point it is principally card tournaments (think Magic: The Gathering and Lorcana), though I could see e-sports being a good fit in the future. So yay! Job! Oh, and if it wasn't clear, the 4 months from January to April I was at AB and the last 2 were at Keyrune... which I think is why the last 2 months have felt like they have gone by so much quicker.

What else is in the news? I decided that my t-sne exploits were never really going to work as I envisioned them working. I mentioned before that the inherent noise in the system causes problems when using it for data mining. So... skip a bit. 2 weeks ago, after watching a video about a guy using a genetic algorithm for some other random project, I was inspired to re-approach the problem and this time just see if I could do things in a single pass. Part of the problem with injecting noise as you refine results is that the noise eventually takes over. Meaning once the resolution of the signal drops to the noise's level... you have no signal. This new technique doesn't refine squat. It does the analysis and creates a result in a single pass. So yes, t-sne will have some noise in it, but at least I'm not building on the noise over and over again.

The big "oh, that works" moment was when I decided to use an old standby of neural networks as my mechanism for weighting features: the max function (max(0, x), better known as ReLU). It works like this: if the result is less than 0, return 0; otherwise return that value. Without getting into the weeds, this was the tool I needed to build an if statement into the results. It allowed me to weight a feature, add a bias, and return either a new value (weight*value + bias) or 0. The genetic algorithm worked "okay" for picking features to weigh, but the weighting mechanism was the shining star.
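
For anyone who wants to see the weighting mechanism described above spelled out, here is a minimal sketch in Python. The function names (`relu`, `gated_feature`) and the numbers are just mine for illustration; this is not the actual code from the project, only the "weight, bias, then max with 0" idea on a single feature.

```python
def relu(x):
    """The max function: anything below 0 becomes 0, the rest passes through."""
    return max(0.0, x)


def gated_feature(value, weight, bias):
    """Weight a feature, add a bias, and return either the new value or 0."""
    return relu(weight * value + bias)


# With weight 1.5 and bias -1.0 the feature only "switches on" once
# value is greater than 1.0 / 1.5 (roughly 0.667).
print(gated_feature(2.0, 1.5, -1.0))  # 2.0 (1.5 * 2.0 - 1.0)
print(gated_feature(0.5, 1.5, -1.0))  # 0.0 (1.5 * 0.5 - 1.0 is negative)
```

The bias is what turns it into an if statement: it moves the cutoff where the feature starts to count, and the weight controls how much it counts once it is past that cutoff.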

Does it work? Yes, I think. Visually it does seem to do what I set out to do: group classifiers. But I'm selling it short if I only talk about the genetic algorithm. Genetic algorithms work when you have way too many options and trial and error would take way too much computational power. If you can take an intelligent stab at the problem you will likely do better. This is what I did: I swapped out the genetic algorithm for some least-squares-fitted results. Normally in data mining, solving a series of equations with least squares gives you too good a fit, as in an overfit. However, here we are only doing 1 analysis and no subsequent refinement. Needless to say it worked really well... like really well.

Is it an overfit? It's hard to say. Right now I'm just doing visualization stuff; I'm not actually making data mining predictions, so I haven't evaluated a hold-out set to see if it's garbage or not. It looks good! And there is all sorts of tuning you can do around which features you select for your visualization. I tried it against some datasets I know fairly well: one from the biological response contest at Kaggle (now 12 years old!!!!) and another from the one Kaggle Days I went to. The Kaggle Days one is a good litmus test. It was notoriously hard to do anything with, as it only had 250 rows of data and something like 400 features. In both cases the visualization was meaningful. That is to say, in biological response I saw good grouping in the various scenarios I tested. For the "dontoverfit" Kaggle set I saw some minor grouping. But that is good. If I saw strong, clearly defined grouping it would likely indicate that it is just overfitting. Regardless, until I actually measure the error on a hold-out set it is hard to say more. As I type this I am trying it on some MNIST visual data for handwritten characters. I didn't really design it to work with more than one classifier, so... the results are kinda bleh right now. But honestly, that's a good sign too. It means it is that much more likely not to be overfitting.

Okay, so other than that, what have I been up to? I've been working on ECDLP and DLP a lot lately. Which is to say, for a long time I've been working on finding new ways of factoring large semi-prime numbers. So basically factoring, which long ago took on the form of solving the DLP (discrete logarithm problem) and the ECDLP (elliptic curve discrete logarithm problem). First, what is the DLP? In a nutshell, solve this: a^x ≡ b (mod n), where n is a prime, a is a primitive root of n, and b is known. This means you 'just' have to solve for x. Since it is done with integers, it is not a trivial problem. The ECDLP version is simply q ≡ k*p (mod n). In this case q and p are points on an elliptic curve and k is a scaling factor. n is generally a prime here as well. You need to solve for k. Also, generally p is just "1" (it is an X and Y position on the elliptic curve that we have decided represents 1) and q is the point you move to when you add p to itself k times. The mod n is there because there are only so many points to go through before it repeats. The semantics aren't too important; just know that on an elliptic curve you really don't have all the tools you have in normal math. You can compare points, add points, double points, and subtract points, and using doubling you can multiply or divide points by a known value, but not by an unknown value. That is, you can't take a random X,Y point on the curve and use it to divide another X,Y point. But you can move to a point that is, say, 41 times greater, where 41 is just a constant you know. So getting that value "k" in q ≡ k*p (mod n) is problematic.
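
Here is a small Python sketch of both problems at toy sizes, just to make the description above concrete. To be clear about what is assumed: the numbers (the prime 17, the curve y^2 = x^3 + 2x + 2 over the integers mod 17, and the starting point (5, 1), which generates 19 points) are a common textbook-sized example, not anything from my own work, and the function names are made up. The solvers are plain brute force (try every answer in order), which only works because everything is tiny. It needs Python 3.8+ for the modular inverse via `pow(x, -1, m)`.

```python
def solve_dlp_brute_force(a, b, n):
    """Find x with a^x ≡ b (mod n) by trying every exponent in turn."""
    value = 1  # a^0
    for x in range(n):
        if value == b % n:
            return x
        value = (value * a) % n
    return None


# Toy curve y^2 = x^3 + CURVE_A*x + 2 over the integers mod FIELD_PRIME.
FIELD_PRIME = 17
CURVE_A = 2


def ec_add(p1, p2):
    """Add two curve points; None stands for the point at infinity (the 'zero')."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % FIELD_PRIME == 0:
        return None  # a point plus its mirror image gives the zero point
    if p1 == p2:
        # Doubling: slope of the tangent line.
        slope = (3 * x1 * x1 + CURVE_A) * pow((2 * y1) % FIELD_PRIME, -1, FIELD_PRIME)
    else:
        # Adding two different points: slope of the line through them.
        slope = (y2 - y1) * pow((x2 - x1) % FIELD_PRIME, -1, FIELD_PRIME)
    slope %= FIELD_PRIME
    x3 = (slope * slope - x1 - x2) % FIELD_PRIME
    y3 = (slope * (x1 - x3) - y1) % FIELD_PRIME
    return (x3, y3)


def ec_multiply(k, point):
    """Compute k*point for a KNOWN k using doubling (double-and-add)."""
    result = None
    addend = point
    while k > 0:
        if k & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        k >>= 1
    return result


def solve_ecdlp_brute_force(q, p, order):
    """Find k with q = k*p by adding p to itself until we land on q."""
    current = None  # 0*p
    for k in range(order):
        if current == q:
            return k
        current = ec_add(current, p)
    return None


# DLP: 3 is a primitive root of the prime 17, and 3^4 ≡ 13 (mod 17).
print(solve_dlp_brute_force(3, 13, 17))  # 4

# ECDLP: p is the point we treat as "1"; q is 7*p with k = 7 kept "secret".
p = (5, 1)
print(ec_multiply(2, p))  # (6, 3)
q = ec_multiply(7, p)
print(solve_ecdlp_brute_force(q, p, 19))
```

Going forward (known k, compute q) takes a handful of doublings and additions. Going backward (known q, find k) is the hard direction: brute force walks the points one at a time, which is hopeless once there are astronomically many of them.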

In my case I turn the DLP problem into the factoring problem by changing the modular value "n" from a prime into a semi-prime p*q (a prime times a prime). I then produce the value I'm trying to solve for, b. It can actually be calculated from n. The details are beyond the scope of this blog. Maybe I'll add a blog about it specifically someday, but suffice to say I try and solve that problem. As for the ECDLP, many of the solution mechanisms you use on one you can use on the other. For a time (the last 4 years or so) I've tried many... many algorithms and bits of math to solve ECDLP faster than known methods. And not without some forms of success. Much of my research in the last year and a half has had to do with recursive functions. There is a lot of promise there. I'll go on more about this another time, but again, I've made some headway on both DLP and ECDLP in recent months. I think it is time to go to a conference on cryptography (and probably one on machine learning) to see if I can learn what others are up to and see if what I've figured out is meaningful/exciting to others in the field.

ECDLP, t-sne and job hunting

Heh, I reread that last blog of mine. Boy, talk about poorly written! I went ahead and cleaned it up some and it's still kinda bad. This tends to happen when you write via stream of consciousness and then do ad-hoc editing without fully rereading. I'm sure there are writers and editors out there who saw it and went "for the love of all that is sacred, re-read what you type!" Ah, good times. Good times. Anyway, sorry about that, I'll try and do better.

Anyways, it's been a busy-ish month. I still haven't found a job but I've picked up the pace of applying for them. I wasn't really sure what kind of response I would get. It feels fairly tame, so I've compensated by upping the number of applications I'm putting out. It certainly hasn't been like it was 23 years ago! I mention that specifically because I think that was the time I got the most response with hardly any effort. Yeah, back in 2000 it was the end of the tech boom and I was still new in my career. At my price point then, the demand was kinda crazy. I probably should have asked for more money, but alas, I was naive about my value. These days, each time I go looking for a job it's hit or miss how fast I find something. Something that works well for me, I mean. If you take any old job at any old price point you will probably have work the very next day, making next to nothing and being really unhappy doing it. Don't let that happen to you! Regardless, I'm still looking.

Beyond job hunting, I've spent a couple of weeks exploring more stuff with t-sne. I more or less worked through what I was trying to do, which is to say I was trying to make t-sne a tool for data mining. Something beyond visualization and feature reduction. The thing is, sometimes the data is organized in a way that t-sne will auto-group the data for you nicely, but only sometimes. I was trying to write code so that it would adjust itself towards the training data automatically. Needless to say, it overfit every time. I think the idea is kind of flawed at a certain level. The problem with t-sne (and umap, if that's your bag) is that they aren't exact. When you do normal data mining you just refine the approximation of the answer over and over again using the training data and cross validation. With t-sne I can do the same, but the output has noise in it, and each iteration uses the output with the noise and introduces more error. It treats the noise like facts, and they end up dominating any signal in the data. The training data ends up nice and neat and the test data ends up random. In the end I set it aside and called it a failure.

Don't get me wrong, it worked really, really well at overfitting, as if that was some sort of great thing. There just seemed to be no good way to deal with the noise I was introducing (folding and such wasn't cutting it). I still haven't given up on the idea of doing a 2-d representation of the data and using that to do data mining. It's just that I don't feel at this point that t-sne is the way to go. At least not without an 'Ah-ha!' moment where I realize how to stop it from being so biased towards the random elements.

I've also been working on the Elliptic Curve Discrete Log Problem (or rather, work has continued). It's... gone 'okay'. About 6 months ago I had a small revelation that made it so I could solve for a scalar unknown on a prime curve in about 1/2 the steps as before. This built on an idea I had about 11 months ago. Everything since then has been me trying to push it further (and not managing to). I've learned a lot about some aspects of math I wasn't really well versed in (if at all). I've also done more than a few things that are pretty neat to maybe a small niche crowd, but honestly, no real progress. I think it's safe to say I'm stuck and just about out of ideas at this point. I say just about because it's hard to let it go. I find myself spending a lot of time thinking and re-thinking certain problems I've come across. Maybe someday I'll write a blog all about them, but for now suffice to say it fills my hours and days when I'm not job hunting.

Job hunting and 4 years of exploration

Wow! So, it's weird to think I left my job at Aspire 4 years ago, but here I am. I set off from that job wanting to do a number of things. Some of those things had the potential to be money makers. Some were just personal pursuits. In both cases I've had the time to try all of them. It both does and doesn't feel like it's been 4 years. That is to say, it feels like the world hasn't changed too much, but I've been busy, so those are at odds and maybe it has changed and I haven't noticed. Also, the 2 plus years of pandemic time has a weird separate feeling to it (almost like that time doesn't get counted).

Regardless of how fast or slow it went by, I figured it made sense to give a summary of what I've been up to. It's worth throwing out there that I wrote a blog or two over these last few years which covered some of it, but those are offline now due to the website reset, and it's been a good long while since I wrote one anyway, so I'll give a full rundown. It'll at least give you a clear picture of what I did with myself. Here's what I've been up to!

  • Put together a stock investment website ( https://securitiesminer.com/ is still there but I haven't updated it in a long while)
  • Automated financial trading (on https://trade.collective2.com/ though I no longer do this)
  • Automated data mining, including pulling daily data (data from quandl / data.nasdaq.com and the US Treasury; turned off currently)
  • Wrote a bunch of add-ons for Minecraft (still there on CurseForge; a few for Java, a bunch for Bedrock)
  • Streamed countless hours of gameplay on Twitch (RimWorld, Oxygen Not Included and Minecraft)
  • Went on a few trips/vacations (Vegas twice for fun, New York for a quandl conference on financial data, Seattle for business)
  • Spent 1000s of hours on algorithms and math around the discrete log problem and the elliptic curve discrete log problem
  • Worked 100s to 1000+ hours on 3 different games in electron.js using babylon.js as the rendering engine (none finished; I either lost interest or hit scope problems)
  • Worked out an alternate way to do AI (something I should probably spend more time on. The world's technology could use the variety)
  • Recodified some t-sne stuff I did years ago (cleaning it up to be clearer and work better, though technically it's the same stuff I created about 8-9 years ago when t-sne was new)
  • Upgraded computers (it's worth noting this as it's almost always a case of learning where the hardware tech has gone in the last 5-7 years. It's never as easy as "ooh that looks nice")
  • Ran... lots of running (still do that). No marathons or anything, but I still put in probably 4+ miles every other day or so.
  • Played through a number of games I didn't stream: Borderlands 3 with all the expansions, Tiny Tina's Wonderlands, Cyberpunk 2077, Diablo 4, Elden Ring (didn't finish, wasn't my thing) and Baldur's Gate 3 (playing that now)
  • Watched countless hours of television, YouTube and movies

That's pretty much it, nothing else of note really. I did debate going to a cryptography conference this last month but ended up deciding it was too expensive / made no sense to go right now. Which kind of brings me to where I am today. That is to say... what's next?

Well, I guess it's time to go back to work. That is, back to a day job. I had hoped I could produce something truly remarkable with the DLP and ECDLP (discrete logarithm problem and elliptic curve discrete logarithm problem) work I've been doing before I went back. I suppose I have come up with a few remarkable things, but nothing groundbreaking. I figure I need groundbreaking to turn heads. I had hoped I could solve the ECCp-131 challenge before I needed to go back to work (https://www.certicom.com/content/certicom/en/the-certicom-ecc-challenge.html) and not just by throwing more cycles at it. That would be quite a feather in my cap and would have opened a lot of doors. It's something I genuinely enjoy working on. So why quit now? The ideas and the money are drying up. So I think it's time to turn my focus to getting that job.

It's worth noting if I had managed to do what I wanted my job prospects would probably shift considerably (which I really want as I figured some of those jobs would be a fair bit more interesting). It's hard to stand out to think tanks and places that do novel research without... something. This definitely would have done that. But eh, you can't expect to solve that sort of thing.

I'm currently looking on LinkedIn and am going to try out gun.io . I should probably take a look at the old job sites too. I'm not sure how relevant they are. 20 years ago monster.com was the thing, and 15 years ago dice.com was the up and comer for tech jobs. I'll look into it some more over the next few days.