Automating Cloudera on AWS

The Cloudera user guide is designed to be an all-encompassing instruction set for deploying Hadoop. Sometimes, though, you just want a simple development environment that you can spin up quickly. With this guide you can get Hadoop running without ever having to SSH into your node.

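The trick is to use the user-data option that AWS provides when you launch an EC2 instance. A script supplied there runs automatically as root the first time the instance boots, so the whole installation happens without you logging in.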
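The console's user-data field corresponds to the `UserData` parameter of the EC2 RunInstances API. As a minimal sketch, assuming you would rather script the launch with boto3 than click through the console, the hypothetical helper `build_launch_params` below checks two constraints AWS documents for user data (a shell script must start with `#!`, and the raw script may not exceed 16 KB) and assembles keyword arguments for `run_instances`. The AMI ID, security group ID, and file name are placeholders, not values from this guide.

```python
MAX_USER_DATA_BYTES = 16 * 1024  # AWS limit on raw user data, before base64 encoding


def build_launch_params(script_text: str, ami_id: str, security_group_id: str,
                        instance_type: str = "t2.large") -> dict:
    """Validate a user-data script and return kwargs for boto3's run_instances.

    Hypothetical helper: ami_id and security_group_id are placeholders you supply.
    """
    # EC2 runs user data as a shell script only when it begins with "#!"
    if not script_text.startswith("#!"):
        raise ValueError("user-data script must start with a shebang, e.g. #!/bin/bash")
    if len(script_text.encode("utf-8")) > MAX_USER_DATA_BYTES:
        raise ValueError("user data exceeds the 16 KB limit")
    return {
        "ImageId": ami_id,                 # an Amazon Linux AMI ID for your region
        "InstanceType": instance_type,     # t2.large: 2 vCPUs, 8 GiB RAM
        "MinCount": 1,
        "MaxCount": 1,
        "SecurityGroupIds": [security_group_id],
        "UserData": script_text,           # boto3 base64-encodes this automatically
    }


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   with open("cloudera-user-data.sh") as f:   # hypothetical file name
#       params = build_launch_params(f.read(), "ami-<id>", "sg-<id>")
#   boto3.client("ec2").run_instances(**params)
```

Passing the script as plain text is deliberate: boto3's `run_instances` encodes `UserData` itself, so encoding it beforehand would double-encode it.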

Download the Script here

The Steps to Launch Cloudera Automatically

  • Select Amazon Linux as your operating system

  • Choose an instance type; we used t2.large in this demo. A minimum of 2 cores and 8 GB of RAM is probably best

  • Under Advanced Details, paste the user-data script into the User data field

  • Continue to the security group settings and be sure to allow inbound connections from your own IP address. There are a few ways to configure this, but in a development environment I typically allow all traffic from my own IP. At a minimum, port 7180 must be reachable for the next steps.

  • Launch the instance and wait several minutes while the user-data script runs

  • Open http://HOSTNAME:7180 in your browser, where HOSTNAME is your instance's public DNS name or public IP address

  • Log in with the default Cloudera Manager credentials, admin/admin (username and password are both admin)
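The security-group, wait, and connect steps above can also be scripted. The sketch below is an illustration under stated assumptions, not part of the original walkthrough: the hypothetical `my_ip_ingress_rule` builds one entry in the `IpPermissions` shape that boto3's `authorize_security_group_ingress` accepts, and the hypothetical `wait_for_port` polls HOSTNAME:7180 until something accepts TCP connections. The 15-minute timeout and 15-second interval are arbitrary choices, and an open port does not guarantee the login page has finished loading.

```python
import ipaddress
import socket
import time


def my_ip_ingress_rule(my_ip: str, all_traffic: bool = False, port: int = 7180) -> dict:
    """Build one IpPermissions entry admitting traffic from a single IPv4 address."""
    cidr = f"{ipaddress.IPv4Address(my_ip)}/32"  # raises ValueError if my_ip is malformed
    if all_traffic:
        # IpProtocol "-1" means every protocol and port ("allow all traffic from my IP")
        return {"IpProtocol": "-1", "IpRanges": [{"CidrIp": cidr}]}
    # Otherwise open only the Cloudera Manager web port over TCP
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}]}


def wait_for_port(host: str, port: int = 7180, timeout_s: float = 900.0,
                  interval_s: float = 15.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=5):
                return True  # something is listening, presumably Cloudera Manager
        except OSError:
            pass  # refused or timed out: the user-data script is likely still running
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, you might pass `IpPermissions=[my_ip_ingress_rule("203.0.113.10", all_traffic=True)]` along with your security group's `GroupId` to `boto3.client("ec2").authorize_security_group_ingress`, then call `wait_for_port` with the instance's public DNS name before opening the browser. The address 203.0.113.10 is a documentation example; substitute your own public IP.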

Continue: Learn to move data to Amazon’s S3 Storage