Getting in shape for Kaggle
On the face of it, Kaggle is pretty simple. Companies offer up data sets and ask you to use them to build useful models to predict stuff. Whoever provides the best predictions wins. While this may not be everyone’s idea of fun, I have a background in statistics and a large part of my actual job is devoted to forecasting and model building, so I should have an advantage. The reality is somewhat different.
Like many new Kaggle recruits, I rushed to the most recent competition page. In my case, the aim was to predict the onset of epileptic seizures from EEG data. This was fantastic: an opportunity to help real, actual people right from my laptop. But first, there were two problems:
- Each patient data file was roughly 8 GB. It took forever to download and crashed every program I tried to open it in.
- I had no idea what an EEG was or how you would spot a seizure on one, even if I could look at the data.
What to do?
At this point it was either give up and go back to watching Game of Thrones, or accept that I had quite a bit to learn before I could expect to do well in a competition.
On this site I plan to record my successes and failures on Kaggle and share any interesting data and analysis along the way. Hopefully others who are thinking about competing on Kaggle will find it useful.