

For those who aren't there yet
Amazon EC2 is a cloud computing service from Amazon. Through its API, web console, and command-line tools, one can make use of the IaaS (Infrastructure as a Service) facility that Amazon provides.
Puppet is systems configuration management software developed by Puppet Labs (formerly Reductive Labs). Puppet provides a framework that simplifies the majority of the technical tasks system administrators perform. It provides a declarative language that can be used to express system configuration.
Why bother with Puppet? Didn't we have Ghost to do the same job for us on real servers? And now that I've learnt about EC2, there is the concept of AMIs as well!
Yes, AMIs are a great way to package a group of software and configuration files on top of an operating system, but that is all they do. With development going Agile, we see a lot of changes to the configuration and software installed on a system, and recreating AMIs at regular intervals is definitely not the coolest thing one would like to do. Imagine running 10 webservers on EC2, spawned from a private custom AMI with the httpd package pre-installed and its configuration built into the AMI. During the course of a day, developers change the code base and add some features, which require 10 changes to the httpd configuration. This situation is nothing less than a nightmare for system administrators when deployments go out in Agile fashion. Our imaginary circumstance would require rebuilding the AMI 10 times, and we would have to either propagate each change to the 10 existing servers or respawn all 10 servers every time a deployment went out to production.
Puppet comes to our rescue!
With Puppet, these changes are made using its declarative language in a file called a recipe (a manifest, in Puppet's own terminology). In our setup, we make the required changes to the recipe and push the code change to the puppetmaster server. The Puppet client on each webserver connects to the puppetmaster at a specified interval (30 minutes by default) and pulls the new configuration. Puppet allows designing a setup in which machines are classified under various roles, and we can write the configuration for these roles in Puppet's declarative language.
Cloud and Puppet
EC2 instances can be spawned using command-line tools (provided by ec2-api-tools). We've created a set of shell script wrappers around these commands that start new instances on EC2 and assign each of them a role, through which the instance gets configured using the appropriate Puppet recipe.
How did we do it?
Slideshare has been using EC2 to build its conversion stack and, as most sysadmins will agree, the performance of machines tends to degrade after they've been operational for a certain amount of time. We've tried to kill this problem by designing the conversion stack so that new machines on EC2 are spawned at regular intervals and old instances are killed.
We've set up a Puppet server on one EC2 instance and used Apache to proxy requests to multiple instances of puppetmasterd running on different ports. We've also reconfigured the init script for puppetmasterd so that the puppetmasterd service on each port can be stopped and started using the well-known /etc/init.d/
The Puppet client by default tries to connect to a host named puppet on port 8140, so we've configured the firewall for this host to allow connections on port 8140 and set up Apache to listen on that port. Apache then proxies each request to one of the puppetmasterd servers running on different ports in the backend. This allows the setup to scale easily.
The Puppet server is configured to automatically sign certificate requests from clients, which allows new clients to connect directly to the puppetmaster and get the required configuration.
We've also created a separate set of private AMIs which we use to fire up ad hoc instances in times of high load on our regular instances. These instances are meant to run for a short time and have all the software and configuration baked into the AMI itself.
We've created our own home-grown scripts which spawn instances on EC2 at regular intervals, and code on these machines takes care of killing the machine after a fixed time.
#!/bin/bash
...
...
...
# Place a request for a spot instance and capture the request ID (second tab-separated field).
SIR_REQUEST=$(ec2-request-spot-instances "$AMI" -n "$NUM_INSTANCES" -p "$PRICE" -t "$TYPE" -k "$KEY_PAIR" --group "$SECURITY_GROUP" | cut -f 2)
# Capture the status of the request. It starts as "open" and must become "active" before we continue.
STATUS=$(ec2-describe-spot-instance-requests | grep "$SIR_REQUEST" | cut -f 6)
# We don't want to wait forever for the instance to spawn. Our threshold is 5 minutes.
COUNT=0
# This flag records whether we ended up with a spot (1) or a regular (0) instance.
IS_SPOT=1
# Wait for the spot instance request to succeed.
REQUIRED_STATUS="active"
while [ "$STATUS" != "$REQUIRED_STATUS" ]
do
    if [ "$COUNT" -ge 5 ]
    then
        # Still not active after 5 one-minute polls: fall back to a regular instance.
        IS_SPOT=0
        break
    fi
    sleep 60
    STATUS=$(ec2-describe-spot-instance-requests | grep "$SIR_REQUEST" | cut -f 6)
    COUNT=$((COUNT + 1))
done
if [ "$IS_SPOT" -eq 1 ]
then
    INSTANCE_ID=$(ec2-describe-spot-instance-requests | grep "$SIR_REQUEST" | cut -f 12)
else
    # Cancel the spot instance request we made earlier.
    ec2-cancel-spot-instance-requests "$SIR_REQUEST"
    # Spawn a regular instance instead.
    ec2-run-instances "$AMI" -n "$NUM_INSTANCES" -t "$TYPE" -k "$KEY_PAIR" --group "$SECURITY_GROUP" > /tmp/ec2_instance_request
    INSTANCE_ID=$(tail -1 /tmp/ec2_instance_request | cut -f 2)
    STATUS=$(tail -1 /tmp/ec2_instance_request | cut -f 6)
    rm -f /tmp/ec2_instance_request
    REQUIRED_STATUS="running"
    # Poll once a minute until the instance reports "running".
    while [ "$STATUS" != "$REQUIRED_STATUS" ]
    do
        sleep 60
        STATUS=$(ec2-describe-instances "$INSTANCE_ID" | tail -1 | cut -f 6)
    done
fi
# Wait two more minutes before querying the instance.
sleep 120
# The instance is now active. Capture its external DNS name and internal hostname.
INSTANCE_EXTERNAL_DNS=$(ec2-describe-instances "$INSTANCE_ID" | tail -1 | cut -f 4)
INSTANCE_INTERNAL_HOSTNAME=$(ec2-describe-instances "$INSTANCE_ID" | tail -1 | cut -f 5 | cut -f 1 -d '.')
# We need to do the record keeping.
echo "$(date): $INSTANCE_ID: $INSTANCE_EXTERNAL_DNS: $INSTANCE_INTERNAL_HOSTNAME: PDF2SWF" >> "$DB_FILE"
# If we cannot get the internal hostname within the next minute, quit.
# Reset the counter so that time already spent waiting for the spot request does not count here.
COUNT=0
while [ -z "$INSTANCE_INTERNAL_HOSTNAME" ]
do
    COUNT=$((COUNT + 1))
    sleep 10
    INSTANCE_INTERNAL_HOSTNAME=$(ec2-describe-instances "$INSTANCE_ID" | tail -1 | cut -f 5 | cut -f 1 -d '.')
    INSTANCE_EXTERNAL_DNS=$(ec2-describe-instances "$INSTANCE_ID" | tail -1 | cut -f 4)
    if [ "$COUNT" -ge 6 ]
    then
        exit 1
    fi
done
# Configure the puppetmaster to associate the relevant class with the node.
$SCP "root@$PUPPET_MASTER:/etc/puppet/manifests/nodes.pp" /tmp/nodes.pp
# EC2 can hand a previously used hostname to a new node, so drop any stale entry first.
if grep -q "$INSTANCE_INTERNAL_HOSTNAME" /tmp/nodes.pp
then
    sed "/$INSTANCE_INTERNAL_HOSTNAME/d" /tmp/nodes.pp > /tmp/newnodes.pp
    mv /tmp/newnodes.pp /tmp/nodes.pp
fi
# $1 is the role (Puppet class) passed to this script as its first argument.
echo "node $INSTANCE_INTERNAL_HOSTNAME { include $1 }" >> /tmp/nodes.pp
$SCP /tmp/nodes.pp "root@$PUPPET_MASTER:/tmp"
$SSH "root@$PUPPET_MASTER" "mv /tmp/nodes.pp /etc/puppet/manifests/nodes.pp"
rm -f /tmp/nodes.pp
# Point the new instance at the puppetmaster by appending the prepared entry to its /etc/hosts.
$SSH "root@$INSTANCE_EXTERNAL_DNS" "cat /tmp/puppet_host >> /etc/hosts"
...
...
...
#Install puppet on newly built ec2 host
$SSH root@$INSTANCE_EXTERNAL_DNS "apt-get -y install puppet"
This script tries to spawn spot instances on EC2 (for more details about spot instances, refer to this wonderful article by Jonathan Boutelle, CTO, Slideshare). Since the availability of spot instances is not certain and we wanted our script to spawn an instance within a stipulated time, we set a limit of 5 minutes to wait for a spot instance to become available. If we don't get a spot instance in 5 minutes, we spawn a regular instance instead.
Once an instance has been spawned, we do some record-keeping tasks and add an entry for the Puppet server to the /etc/hosts file of the node. The Puppet client by default tries to connect to a host named puppet, so our entry in /etc/hosts looks as follows:
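The wait-then-fall-back logic described above can be pulled out into a small reusable helper. What follows is only a sketch under assumptions: wait_for_status, its arguments, and the POLL_INTERVAL variable are hypothetical names chosen for illustration and are not part of our actual script.

```shell
#!/bin/bash
# Sketch only: wait_for_status and POLL_INTERVAL are hypothetical names.
#
# Usage: wait_for_status WANTED MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND up to MAX_ATTEMPTS times, sleeping POLL_INTERVAL seconds
# (default 60) after each unsuccessful poll.
# Returns 0 as soon as COMMAND prints WANTED, 1 once the attempts run out.
wait_for_status() {
    local wanted=$1 max_attempts=$2
    shift 2
    local attempt=0 status
    while [ "$attempt" -lt "$max_attempts" ]
    do
        status=$("$@")
        if [ "$status" = "$wanted" ]
        then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "${POLL_INTERVAL:-60}"
    done
    return 1
}

# Possible use with the spot request from the script (not executed here):
#   spot_status() { ec2-describe-spot-instance-requests | grep "$SIR_REQUEST" | cut -f 6; }
#   if wait_for_status active 5 spot_status; then IS_SPOT=1; else IS_SPOT=0; fi
```

Keeping the timeout in one function means the spot-request wait and the regular-instance wait could share the same limit, rather than one of them looping without a bound.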
XXX.XXX.XXX.XXX puppet
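Appending this line blindly would add a duplicate every time the step is repeated on the same machine. A minimal sketch of an idempotent variant is given here, assuming a hypothetical helper named add_puppet_host that takes the hosts file path and the puppetmaster IP as arguments (neither the name nor the signature comes from our script):

```shell
#!/bin/bash
# Sketch only: add_puppet_host is a hypothetical helper.
#
# Usage: add_puppet_host HOSTS_FILE IP
# Appends "IP puppet" to HOSTS_FILE unless an uncommented line
# already maps the name "puppet".
add_puppet_host() {
    local hosts_file=$1 ip=$2
    # Match "puppet" as a whole name (not "puppetmaster") outside of comments.
    if grep -Eq '^[^#]*[[:space:]]puppet([[:space:]]|$)' "$hosts_file"
    then
        return 0
    fi
    printf '%s puppet\n' "$ip" >> "$hosts_file"
}

# Possible use on the new instance (not executed here):
#   add_puppet_host /etc/hosts "$PUPPET_MASTER_IP"
```

PUPPET_MASTER_IP in the usage comment is likewise a placeholder for whatever address the puppetmaster has.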
Finally, we make an entry for the newly spawned node in nodes.pp on the puppetmaster. This is a tricky part for two reasons. First, DNS on EC2 can reassign a previously used hostname to a new node, which can cause duplicate entries in nodes.pp. Second, it often happens that we are able to spawn an instance but the script is unable to get its hostname using ec2-api-tools, which can cause an invalid entry to appear in nodes.pp. We've tried to circumvent these errors by implementing a few error checks.
There are several improvements that can be made to the existing setup. We rely on the DNS service of EC2, and this causes a lot of trouble when identifying instances while troubleshooting errors. This can be resolved by setting up a home-grown DNS server on EC2.
We've also tried to set up a Nagios monitoring system on EC2 to monitor our services running in the cloud. This can be very useful, as most of the monitoring facilities provided by Amazon are pretty basic and are oriented towards system health checks rather than service health checks.
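Both failure modes, a reused hostname and an empty hostname, can be guarded against in one place. The sketch below is illustrative rather than our production code: register_node is a hypothetical function name, and it works on a local copy of nodes.pp such as the one the script fetches to /tmp.

```shell
#!/bin/bash
# Sketch only: register_node is a hypothetical helper.
#
# Usage: register_node NODES_FILE HOSTNAME ROLE
# Replaces any existing entry for HOSTNAME in NODES_FILE with
# "node HOSTNAME { include ROLE }". Fails without touching the file
# if HOSTNAME is empty or contains unexpected characters.
register_node() {
    local nodes_file=$1 host=$2 role=$3 tmp
    [ -f "$nodes_file" ] || return 1
    # Accept only letters, digits and hyphens, as in EC2 names like ip-10-0-0-1.
    case $host in
        ''|*[!A-Za-z0-9-]*)
            echo "invalid hostname: '$host'" >&2
            return 1
            ;;
    esac
    tmp=$(mktemp) || return 1
    # Drop a stale entry for exactly this hostname. The trailing space in the
    # pattern keeps ip-10-0-0-1 from also matching ip-10-0-0-10.
    grep -v "^node $host " "$nodes_file" > "$tmp" || true
    echo "node $host { include $role }" >> "$tmp"
    # Write back in place so the original file keeps its permissions.
    cat "$tmp" > "$nodes_file"
    rm -f "$tmp"
}

# Possible use in the script (not executed here):
#   register_node /tmp/nodes.pp "$INSTANCE_INTERNAL_HOSTNAME" "$1" || exit 1
```

Matching on the full "node HOSTNAME " prefix is a deliberate choice here: deleting every line that merely contains the hostname would also remove entries for other nodes whose names share that prefix.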