Using Amazon Web Services and RStudio for Kaggle

Not all of us can afford (or fit) a supercomputer in our bedroom. Luckily Amazon offer the next best thing.

Many Kaggle competitions use datasets large enough that you’re likely to need some specialist equipment and a big budget. Amazon Web Services changes this by allowing you to rent computing power and storage space at reasonably low prices.

This is all very well, but with only a vague understanding of cloud computing and having never used Linux, I had no idea how to actually get it working.

First, sign up for an AWS account. You’ll need to put some card details in, but they won’t charge you unless you start using computing power, i.e. running an ‘instance’.

I do pretty much all my analysis in R using RStudio, both of which are free. It turns out that the open source community, and Louis Aslett in particular, have done a lot of the hard work in getting RStudio to run on an EC2 instance. Louis has created a series of AMIs (Amazon Machine Images) that come with RStudio Server set up. Simply go to the EC2 dashboard, click ‘Launch Instance’ and search for Louis’s AMIs.
[Screenshot: searching for Louis Aslett’s RStudio AMIs]
Click ‘Next’ to choose the size of instance you need. The t1.micro instance is free-tier eligible and good for testing on. At this point you can skip ahead by clicking ‘Review and Launch’. You will need to edit or create the security group, or else anyone will be able to connect. The security group you set up should allow SSH connections from your IP only, and HTTP from any IP (you reach RStudio Server through your web browser over HTTP, so without this rule you can’t get to it).
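If you would rather script this step than click through the console, the security group rules above can be written down as data. This is a minimal sketch in Python that assumes boto3 (the AWS SDK for Python), which the post itself doesn’t use; `ingress_rules` and `apply_rules` are hypothetical helper names and 203.0.113.7 is a placeholder address. It builds the `IpPermissions` list accepted by boto3’s `authorize_security_group_ingress`: SSH on port 22 from your IPv4 address only, HTTP on port 80 from anywhere.

```python
import ipaddress


def ingress_rules(my_ip: str) -> list[dict]:
    """Build ingress rules in the IpPermissions shape used by boto3's EC2 client.

    SSH (port 22) is limited to one IPv4 address; HTTP (port 80) is open to all.
    """
    # Validate the address (raises ValueError if malformed) and make a single-host CIDR block
    ssh_cidr = f"{ipaddress.IPv4Address(my_ip)}/32"
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": ssh_cidr, "Description": "SSH from my IP only"}],
        },
        {
            "IpProtocol": "tcp",
            "FromPort": 80,
            "ToPort": 80,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTP from anywhere"}],
        },
    ]


def apply_rules(group_id: str, my_ip: str) -> None:
    """Attach the rules to an existing security group (needs boto3 and AWS credentials)."""
    import boto3  # imported here so ingress_rules() works without boto3 installed

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=ingress_rules(my_ip))


if __name__ == "__main__":
    # Print each rule's port and source range
    for rule in ingress_rules("203.0.113.7"):
        print(rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
```

Keeping the rules as plain data means you can check them before anything touches your AWS account; `apply_rules` only makes the API call when you run it with your own security group ID and credentials configured.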
[Screenshot: security group settings]
The first time you do this, you’ll need to create a key pair. This will allow you to connect to your instance over SSH once it’s running, so keep the downloaded private key file somewhere safe. Once that’s set up, you can launch the EC2 instance.
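The key pair is what lets you SSH into the instance later. As a minimal sketch (Python, standard library only; the helper names are hypothetical), the code below makes the downloaded private key readable only by you, which `ssh` requires before it will use the key, and builds the `ssh` command. The `ubuntu` login name is an assumption (the usual default user on Ubuntu-based AMIs), and the key file name and Public DNS are placeholders.

```python
import os
import stat


def ssh_command(key_path: str, public_dns: str, user: str = "ubuntu") -> list[str]:
    """Build the ssh argument list for connecting with your key pair.

    The default user "ubuntu" is an assumption; check what your AMI uses.
    """
    return ["ssh", "-i", key_path, f"{user}@{public_dns}"]


def restrict_key(key_path: str) -> None:
    """Equivalent of `chmod 400`: ssh ignores private keys that other users can read."""
    os.chmod(key_path, stat.S_IRUSR)


if __name__ == "__main__":
    key = "my-key.pem"  # hypothetical key file name
    dns = "ec2-203-0-113-7.compute-1.amazonaws.com"  # hypothetical Public DNS
    print(" ".join(ssh_command(key, dns)))
    # To actually connect: call restrict_key(key), then pass ssh_command(key, dns)
    # to subprocess.run() or paste the printed command into a terminal.
```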

To start using RStudio on your new instance, copy the instance’s Public DNS into your browser. Log in with username ‘rstudio’ and password ‘rstudio’.
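Because pasting the Public DNS straight into the browser is enough, RStudio Server on these AMIs is answering plain HTTP on the default port 80. A small Python sketch (standard library only; `rstudio_url` and `is_up` are hypothetical helper names) can build that URL and check whether the login page is responding yet, which is handy while the instance is still booting.

```python
import urllib.request


def rstudio_url(public_dns: str) -> str:
    """Turn a Public DNS name into the HTTP URL the browser would use (port 80)."""
    return f"http://{public_dns.strip().rstrip('/')}/"


def is_up(public_dns: str, timeout: float = 5.0) -> bool:
    """Return True if the instance answers HTTP with status 200 after redirects.

    Connection errors, timeouts and HTTP error statuses are all OSError
    subclasses here (URLError, HTTPError, TimeoutError), so they return False.
    """
    try:
        with urllib.request.urlopen(rstudio_url(public_dns), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


if __name__ == "__main__":
    dns = "ec2-203-0-113-7.compute-1.amazonaws.com"  # hypothetical Public DNS
    print(rstudio_url(dns))
```

Polling `is_up` every few seconds after launch tells you when it is worth opening the browser; it only checks that something answers on port 80, not that the login will succeed.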
[Screenshot: RStudio Server login page]
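Once you’ve logged in, change the password by editing and running the initial file provided in RStudio. It’s worth doing this straight away, since the default login is the same on every instance and HTTP is open to any IP.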

And… you’re good to go.
