
Tuesday, February 22, 2011

Puppetizing Amazon EC2






For those who aren't there yet

Amazon EC2 (Elastic Compute Cloud) is Amazon's cloud computing service. Through its API, the web console or the command line tools, one can make use of the IaaS (Infrastructure as a Service) platform Amazon provides.

Puppet is systems configuration management software developed by Puppet Labs (formerly Reductive Labs). Puppet provides a framework that simplifies most of the technical tasks system administrators perform, along with a declarative language for expressing system configuration.

Why bother with Puppet? Didn't disk imaging tools like Ghost do the same job for us with physical servers? And now that I've learnt about EC2, there's the concept of AMIs as well!

Yes, AMIs are a great way to package a group of software and configuration files on top of an operating system, but that's all there is to them. With development going Agile these days, the configuration and software installed on a system change frequently, and re-creating AMIs at regular intervals is definitely not something anyone enjoys. Imagine running 10 webservers on EC2, spawned from a private custom AMI with the httpd package pre-installed and its configuration built into the AMI. During the course of a day, developers change the code base and add some features that require 10 changes to the httpd configuration. When deployments go out in Agile fashion, this is nothing short of a nightmare for system administrators: our imaginary scenario requires 10 new AMIs, and each change has to be either applied to all 10 running servers (10 times over) or rolled out by re-spawning all 10 servers every time a deployment goes to production.

Puppet comes to our rescue!

With Puppet, these changes are expressed in its declarative language in files called manifests (which we refer to as recipes). In our setup, we make the required changes to the recipe and push them to the puppetmaster server; the Puppet client on each webserver connects to the puppetmaster at a fixed interval (30 minutes by default) and pulls the new configuration. Puppet also lets you classify machines into roles and write the configuration for each role in its declarative language.
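To make the role idea concrete, here is a minimal sketch that writes a hypothetical `webserver` role class for the httpd example above (packaged as `apache2` on Debian/Ubuntu) and a one-line node definition that assigns a host to it. The class name, hostname, output directory and the `puppet:///modules/webserver/apache2.conf` source (which assumes the class lives in a module named `webserver`) are all illustrative assumptions. Because the `file` resource notifies the service, editing the config file on the puppetmaster is enough for every node in the role to pick up the change and restart Apache on its next run.

```shell
#!/bin/bash
# Sketch: generate a hypothetical "webserver" role class and one node definition.
# MANIFEST_DIR, the class name and the hostname below are illustrative assumptions.
MANIFEST_DIR=${MANIFEST_DIR:-$(mktemp -d)}

cat > "$MANIFEST_DIR/webserver.pp" <<'EOF'
class webserver {
  package { 'apache2':
    ensure => installed,
  }
  # The config file is served by the puppetmaster; changing it there is all it takes.
  file { '/etc/apache2/apache2.conf':
    source  => 'puppet:///modules/webserver/apache2.conf',
    require => Package['apache2'],
    notify  => Service['apache2'],
  }
  service { 'apache2':
    ensure => running,
    enable => true,
  }
}
EOF

# Classify a (hypothetical) node into the role, one line per node.
echo "node ip-10-0-0-1 { include webserver }" > "$MANIFEST_DIR/nodes.pp"

echo "$MANIFEST_DIR"
```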


Cloud and Puppet

EC2 instances can be spawned using the command line tools provided by ec2-api-tools, and we've written a set of shell script wrappers around these commands that start new instances on EC2 and assign each one a role; the instance is then configured by the corresponding Puppet recipe.
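A wrapper like this is a natural place to reject an unknown role before paying for an instance. A minimal sketch, assuming a hypothetical list of role names (`pdf2swf`, `webserver`) kept in the script itself; each name is assumed to match a Puppet class on the puppetmaster, and a real setup might derive the list from those class names instead.

```shell
#!/bin/bash
# Hypothetical role names; each should match a Puppet class on the puppetmaster.
KNOWN_ROLES="pdf2swf webserver"

# Return 0 if $1 is one of KNOWN_ROLES, 1 otherwise.
is_known_role() {
  local role
  for role in $KNOWN_ROLES; do
    [ "$role" = "$1" ] && return 0
  done
  return 1
}

# Typical use at the top of a spawn script, where $1 is the requested role.
check_role_argument() {
  if [ -z "$1" ] || ! is_known_role "$1"; then
    echo "usage: $0 <role>   (one of: $KNOWN_ROLES)" >&2
    return 1
  fi
}
```

In a spawn script this would be called as `check_role_argument "$1" || exit 1` before any EC2 request is made.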

How did we do it?

Slideshare has been using EC2 to build its conversion stack, and as most sysadmins will agree, machine performance tends to degrade after a machine has been running for a while. We've tackled this problem by designing the conversion stack so that new machines are spawned on EC2 at regular intervals and old instances are killed.

We've set up a Puppet server on one EC2 instance, with Apache acting as a proxy in front of multiple puppetmasterd instances running on different ports. We've also modified the init script for puppetmasterd so that the instances on the different ports can be started and stopped with the familiar /etc/init.d/ start/stop syntax.
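The modified init script isn't shown here, but the idea of one service controlling several masters can be sketched as a loop over ports. Everything in this sketch is an assumption: the port list, the pidfile naming, and passing `--masterport`/`--pidfile` on the command line (Puppet accepts its configuration settings as long options, but check the names against your Puppet version). Setting `RUN=echo` gives a dry run that only prints the commands.

```shell
#!/bin/bash
# Hypothetical backend ports; Apache on 8140 proxies to these.
BACKEND_PORTS=${BACKEND_PORTS:-"18140 18141 18142"}
PIDDIR=${PIDDIR:-/var/run/puppet}
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

# Start one puppetmasterd per port, each with its own pidfile.
start_masters() {
  local port
  for port in $BACKEND_PORTS; do
    $RUN puppetmasterd --masterport="$port" --pidfile="$PIDDIR/puppetmasterd.$port.pid"
  done
}

# Stop every master whose pidfile exists.
stop_masters() {
  local port pidfile
  for port in $BACKEND_PORTS; do
    pidfile="$PIDDIR/puppetmasterd.$port.pid"
    if [ -f "$pidfile" ]; then
      $RUN kill "$(cat "$pidfile")"
    fi
  done
}

case "${1:-}" in
  start) start_masters ;;
  stop)  stop_masters ;;
esac
```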

By default, the Puppet client tries to connect to a host named puppet on port 8140, so we've configured the firewall on this host to allow connections on port 8140 and set Apache to listen on that port. Apache then proxies each request to one of the puppetmasterd processes running on other ports in the backend. Scaling the setup is then just a matter of adding more backend puppetmasterd processes.
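A minimal sketch of the Apache side, generating a `mod_proxy_balancer` block from a list of backend ports. It assumes the backends accept plain HTTP from Apache on localhost (as Puppet's Mongrel server type did) and deliberately leaves out the SSL, client-certificate and header directives a working Puppet frontend also needs; the port numbers are hypothetical. Adding capacity then means starting another backend and adding one `BalancerMember` line.

```shell
#!/bin/bash
# Hypothetical backend ports served by puppetmasterd instances on this host.
BACKEND_PORTS=${BACKEND_PORTS:-"18140 18141 18142"}

# Print an Apache config that listens on Puppet's default port and balances
# requests across the backends. SSL and client-cert settings are omitted.
apache_puppet_frontend() {
  local port
  echo "Listen 8140"
  echo "<Proxy balancer://puppetmaster>"
  for port in $BACKEND_PORTS; do
    echo "  BalancerMember http://127.0.0.1:$port"
  done
  echo "</Proxy>"
  echo "<VirtualHost *:8140>"
  echo "  ProxyPass / balancer://puppetmaster/"
  echo "</VirtualHost>"
}

apache_puppet_frontend
```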

The Puppet server is configured to automatically sign certificate requests from clients, so new clients can connect to the puppetmaster straight away and fetch their configuration.
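Autosigning is driven by the puppetmaster's `autosign.conf`, where each line is a certname or a domain glob. The sketch below writes a file with the pattern `*.internal` (which covers EC2's internal hostnames) and approximates the match with bash pattern matching; that is close to, but not identical with, Puppet's own glob rules. The file path and the `would_autosign` helper are assumptions for illustration. Since any host that can reach port 8140 and fits the glob gets a certificate, the firewall rules on the Puppet host are what keep this safe.

```shell
#!/bin/bash
# Hypothetical location; on the puppetmaster this is usually /etc/puppet/autosign.conf.
AUTOSIGN_CONF=${AUTOSIGN_CONF:-$(mktemp)}

# Accept certificate requests from any EC2 internal hostname.
echo '*.internal' > "$AUTOSIGN_CONF"

# Rough approximation of the check: return 0 if the certname matches a line.
would_autosign() {
  local certname=$1 pattern
  while IFS= read -r pattern; do
    # Skip blank lines and comments.
    case $pattern in
      ''|'#'*) continue ;;
    esac
    # An unquoted right-hand side of == inside [[ ]] is treated as a glob.
    [[ $certname == $pattern ]] && return 0
  done < "$AUTOSIGN_CONF"
  return 1
}
```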

We've also created a separate set of private AMIs, which we use to fire up ad hoc instances when our regular instances are under high load. These instances are meant to run for a short time only and have all the software and configuration baked into the AMI itself.

We've written our own home-grown scripts that spawn instances on EC2 at regular intervals, and code on these machines takes care of killing each machine after a fixed time.
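The code that retires old machines isn't shown. One minimal way to do it, sketched under assumptions: a cron job on each instance compares the uptime from `/proc/uptime` against a fixed lifetime (the 6-hour value is hypothetical) and halts the machine once it is exceeded. Whether halting actually terminates the instance depends on how it was launched (instance-store instances terminate on shutdown; EBS-backed ones stop unless launched with terminate as the instance-initiated shutdown behaviour), so verify that before relying on it.

```shell
#!/bin/bash
# Hypothetical lifetime for a conversion instance: 6 hours.
MAX_LIFETIME_SECONDS=${MAX_LIFETIME_SECONDS:-21600}

# Whole seconds since boot, taken from the first field of /proc/uptime.
uptime_seconds() {
  cut -d' ' -f1 /proc/uptime | cut -d'.' -f1
}

# Return 0 if an instance that has been up for $1 seconds has outlived $2 seconds.
lifetime_exceeded() {
  [ "$1" -ge "$2" ]
}

# Intended to run from cron; set RUN=echo for a dry run.
retire_if_old() {
  if lifetime_exceeded "$(uptime_seconds)" "$MAX_LIFETIME_SECONDS"; then
    ${RUN:-} shutdown -h now
  fi
}
```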

The following is an excerpt of the script we use to spawn the different instances. The role of the instance is passed to the script as an argument:

#!/bin/bash
...
...
...
# Place request for spot instance
ec2-request-spot-instances "$AMI" -n "$NUM_INSTANCES" -p "$PRICE" -t "$TYPE" -k "$KEY_PAIR" --group "$SECURITY_GROUP" > /tmp/ec2_spot_instance_request
SIR_REQUEST=$(cut -f 2 /tmp/ec2_spot_instance_request)
rm -f /tmp/ec2_spot_instance_request

# Capture status of request. Initially the request has STATUS=open; it must become active before we continue
STATUS=$(ec2-describe-spot-instance-requests | grep "$SIR_REQUEST" | cut -f 6)

# We don't want to wait forever for the instance to spawn. Our threshold is 5 minutes (5 checks, 60s apart)
COUNT=1

# Set to 0 if we give up on the spot request and fall back to a regular instance
IS_SPOT=1

# Wait for the spot instance request to become active
REQUIRED_STATUS="active"
while [ "$STATUS" != "$REQUIRED_STATUS" ]
do
    sleep 60
    STATUS=$(ec2-describe-spot-instance-requests | grep "$SIR_REQUEST" | cut -f 6)
    # Give up only if the request is still not active after the 5th check
    if [ "$STATUS" != "$REQUIRED_STATUS" ] && [ "$COUNT" -ge 5 ]
    then
        IS_SPOT=0
        break
    fi
    COUNT=$((COUNT + 1))
done

if [ "$IS_SPOT" -eq 1 ]
then
    INSTANCE_ID=$(ec2-describe-spot-instance-requests | grep "$SIR_REQUEST" | cut -f 12)
else
    # Cancel the spot instance request we made earlier
    ec2-cancel-spot-instance-requests "$SIR_REQUEST"
    # Spawn a regular instance instead
    ec2-run-instances "$AMI" -n "$NUM_INSTANCES" -t "$TYPE" -k "$KEY_PAIR" --group "$SECURITY_GROUP" > /tmp/ec2_instance_request
    INSTANCE_ID=$(tail -n 1 /tmp/ec2_instance_request | cut -f 2)
    STATUS=$(tail -n 1 /tmp/ec2_instance_request | cut -f 6)
    rm -f /tmp/ec2_instance_request
    REQUIRED_STATUS="running"
    while [ "$STATUS" != "$REQUIRED_STATUS" ]
    do
        sleep 60
        STATUS=$(ec2-describe-instances "$INSTANCE_ID" | tail -n 1 | cut -f 6)
    done
fi

# Allow time for the instance to finish booting
sleep 120

# Instance is now active. Capture data associated with it: external DNS name and internal hostname
INSTANCE_EXTERNAL_DNS=$(ec2-describe-instances "$INSTANCE_ID" | tail -n 1 | cut -f 4)
INSTANCE_INTERNAL_HOSTNAME=$(ec2-describe-instances "$INSTANCE_ID" | tail -n 1 | cut -f 5 | cut -f 1 -d '.')

# Record keeping (logged even if the hostname is still missing, so the instance ID is never lost)
echo "$(date): $INSTANCE_ID: $INSTANCE_EXTERNAL_DNS: $INSTANCE_INTERNAL_HOSTNAME: PDF2SWF" >> "$DB_FILE"

# If we can't get the internal hostname within the next minute (6 tries, 10s apart), quit.
# Reset COUNT, which the spot request loop above may have left at 5 or more.
COUNT=0
while [ -z "$INSTANCE_INTERNAL_HOSTNAME" ]
do
    COUNT=$((COUNT + 1))
    sleep 10
    INSTANCE_INTERNAL_HOSTNAME=$(ec2-describe-instances "$INSTANCE_ID" | tail -n 1 | cut -f 5 | cut -f 1 -d '.')
    INSTANCE_EXTERNAL_DNS=$(ec2-describe-instances "$INSTANCE_ID" | tail -n 1 | cut -f 4)
    if [ -z "$INSTANCE_INTERNAL_HOSTNAME" ] && [ "$COUNT" -ge 6 ]
    then
        exit 1
    fi
done

# Configure puppetmaster to associate the relevant class (role) with the node
$SCP root@$PUPPET_MASTER:/etc/puppet/manifests/nodes.pp /tmp/nodes.pp
# Drop any stale entry left by an earlier instance with the same hostname.
# -w stops ip-10-1-2-3 from also matching ip-10-1-2-30.
if grep -qw "$INSTANCE_INTERNAL_HOSTNAME" /tmp/nodes.pp
then
    grep -vw "$INSTANCE_INTERNAL_HOSTNAME" /tmp/nodes.pp > /tmp/newnodes.pp
    mv /tmp/newnodes.pp /tmp/nodes.pp
fi
echo "node $INSTANCE_INTERNAL_HOSTNAME { include $1 }" >> /tmp/nodes.pp
$SCP /tmp/nodes.pp root@$PUPPET_MASTER:/tmp
$SSH root@$PUPPET_MASTER "mv /tmp/nodes.pp /etc/puppet/manifests/nodes.pp"
rm -f /tmp/nodes.pp

# Point the node at the puppetmaster via /etc/hosts
$SSH root@$INSTANCE_EXTERNAL_DNS "cat /tmp/puppet_host >> /etc/hosts"
...
...
...
# Install puppet on the newly built EC2 host
$SSH root@$INSTANCE_EXTERNAL_DNS "apt-get -y install puppet"

This script tries to spawn spot instances on EC2 (for more details about spot instances, refer to this wonderful article by Jonathan Boutelle, CTO, Slideshare). Since the availability of spot instances is not guaranteed and we wanted the script to bring up an instance within a fixed time, we set a limit of 5 minutes for the spot request to be fulfilled. If we don't get a spot instance within 5 minutes, we spawn a regular instance instead.
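The script repeats the same "poll until the status matches or give up" pattern three times. A hypothetical helper, `wait_for_status`, could factor it out: it runs a command that prints the current status, retries at a fixed interval, and fails after a maximum number of attempts. The helper's name and argument order are illustrative and not part of the original script.

```shell
#!/bin/bash
# wait_for_status EXPECTED TRIES INTERVAL CMD [ARGS...]
# Runs CMD until its output equals EXPECTED.
# Returns 0 on success, 1 after TRIES unsuccessful attempts.
wait_for_status() {
  local expected=$1 tries=$2 interval=$3 attempt status
  shift 3
  for ((attempt = 1; attempt <= tries; attempt++)); do
    status=$("$@")
    if [ "$status" = "$expected" ]; then
      return 0
    fi
    # Don't sleep after the final failed attempt.
    [ "$attempt" -lt "$tries" ] && sleep "$interval"
  done
  return 1
}
```

For the spot request this would be something like `wait_for_status active 5 60 spot_status`, where `spot_status` is a small (hypothetical) function wrapping the `ec2-describe-spot-instance-requests | grep | cut` pipeline; a non-zero return is the cue to cancel the request and fall back to a regular instance.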

Once an instance has been spawned, we do some record keeping and add an entry for the Puppet server to the node's /etc/hosts file. Since the Puppet client by default tries to connect to a host named puppet, our entry in /etc/hosts looks like this:

XXX.XXX.XXX.XXX puppet
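Appending with `cat /tmp/puppet_host >> /etc/hosts`, as the script does, adds a duplicate line every time it runs against the same instance. A minimal idempotent alternative is sketched below as a hypothetical `add_hosts_entry` function; it takes the hosts file as an optional parameter so it can be tried on a scratch file, and the IP address in the usage is a placeholder.

```shell
#!/bin/bash
# add_hosts_entry IP NAME [FILE]
# Appends "IP NAME" to FILE (default /etc/hosts) unless NAME already has an entry.
add_hosts_entry() {
  local ip=$1 name=$2 file=${3:-/etc/hosts}
  # Look for NAME as a whole hostname field on any non-comment line.
  if awk -v n="$name" '!/^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
    return 0
  fi
  echo "$ip $name" >> "$file"
}
```

Usage would be `add_hosts_entry 10.0.0.5 puppet`. Note that an existing `puppet` line is left alone, so if the puppetmaster's address changes, the old line has to be replaced rather than skipped.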

Finally, we add an entry for the newly spawned node to nodes.pp on the puppetmaster. This is the tricky part. EC2's DNS can assign a previously used hostname to a new node, which can cause duplicate entries in nodes.pp. Also, quite often an instance spawns successfully but the script can't retrieve its hostname using ec2-api-tools, which can leave an invalid entry in nodes.pp. We've tried to work around these errors with a few error checks.
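The two error checks described here (drop any stale entry for a recycled hostname, and refuse to write an empty or malformed one) can be sketched as a single hypothetical `upsert_node` function operating on a local copy of nodes.pp. It assumes one entry per line in the `node HOST { include ROLE }` form the script writes; multi-line node blocks would need real parsing. The hostname and role patterns are assumptions chosen to be permissive.

```shell
#!/bin/bash
# upsert_node FILE HOST ROLE
# Replaces any existing one-line definition for HOST with "node HOST { include ROLE }".
upsert_node() {
  local file=$1 host=$2 role=$3 tmp
  local host_re='^[A-Za-z0-9-]+$' role_re='^[a-z][a-z0-9_:]*$'
  # Refuse empty or suspicious values instead of writing an invalid entry.
  if ! [[ $host =~ $host_re ]] || ! [[ $role =~ $role_re ]]; then
    echo "upsert_node: invalid host '$host' or role '$role'" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # Fixed-string match on "node HOST {" so ip-10-1-2-3 doesn't also remove ip-10-1-2-30.
  # grep exits 1 when it selects no lines; that is not an error here.
  grep -v -F "node $host {" "$file" > "$tmp" || true
  echo "node $host { include $role }" >> "$tmp"
  # Copy back with cat rather than mv so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```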

There are several improvements that could be made to the existing setup. We rely on EC2's DNS service, and because hostnames get reused, it is hard to identify instances when troubleshooting errors. This could be solved by setting up a home-grown DNS server on EC2.

We've also tried setting up a Nagios monitoring system on EC2 to monitor the services running in the cloud. This can be very useful, since most of the monitoring facilities Amazon provides are fairly basic and oriented towards system health checks rather than service health checks.
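Nagios plugins report through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus one line of output, so a service-level check can be a small shell script. A minimal sketch, assuming a hypothetical `check_tcp_port` that only verifies something accepts connections on a port (for example the puppetmaster's 8140) using bash's `/dev/tcp` and coreutils `timeout`. Nagios already ships a `check_tcp` plugin, and a true service check would go further and exercise the service itself.

```shell
#!/bin/bash
# Nagios plugin exit codes used here.
STATE_OK=0
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_tcp_port HOST PORT [TIMEOUT_SECONDS]
check_tcp_port() {
  local host=$1 port=$2 limit=${3:-5}
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "UNKNOWN - usage: check_tcp_port HOST PORT [TIMEOUT]"
    return $STATE_UNKNOWN
  fi
  # Opening /dev/tcp/HOST/PORT in a child bash attempts a TCP connection;
  # timeout(1) bounds how long we wait. Host and port are passed as arguments.
  if timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "OK - $host:$port accepts connections"
    return $STATE_OK
  fi
  echo "CRITICAL - cannot connect to $host:$port"
  return $STATE_CRITICAL
}
```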
