<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Data Wrangling Blog - Latest Comments in On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://datawranglingblog.disqus.com/</link><description></description><atom:link href="https://datawranglingblog.disqus.com/on_demand_mpi_cluster_with_python_and_ec2_part_1_of_3/latest.rss" rel="self"></atom:link><language>en</language><lastBuildDate>Fri, 31 Oct 2008 11:14:00 -0000</lastBuildDate><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078377</link><description>&lt;p&gt;Iwein,&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;That is correct, that last line is unnecessary as /mnt is excluded from bundling.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;-Pete&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Fri, 31 Oct 2008 11:14:00 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078376</link><description>&lt;p&gt;"Now remove the keys and delete the bash history:" Is after the bundle-vol command. Surely that won't matter anymore at that point. 
Deleting the keys from /mnt/ is an unneeded step afaics.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Iwein Fuld</dc:creator><pubDate>Fri, 31 Oct 2008 05:11:23 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078375</link><description>&lt;p&gt;I found the secret to avoiding a lot of MPI errors on EC2, but haven't found time to do an additional post...&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;The secret seems to be that just because Amazon says that an instance is "running" doesn't mean that the ssh daemon is available.  This caused all kinds of intermittent problems setting up the hosts, and my old scripts would fail silently.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;In my current codebase, I do some checks like the following:&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;pre&gt;    # requires "import commands, time" at module level (Python 2)&lt;br&gt;    print "Instance is %s" % BOOTING_INSTANCE&lt;br&gt;    &lt;br&gt;    # wait for instance description to return "running" and grab HOSTNAME variable&lt;br&gt;    print "Polling server status (ec2-describe-instances %s)" % BOOTING_INSTANCE&lt;br&gt;    while 1:&lt;br&gt;      print "waiting for instance to boot..."&lt;br&gt;      HOSTNAME = commands.getoutput("ec2-describe-instances %s | grep running | awk '{print $4}'" % BOOTING_INSTANCE)&lt;br&gt;      if len(HOSTNAME) &amp;gt; 1:&lt;br&gt;        print "-------Instance booted, The server is available at %s" % HOSTNAME&lt;br&gt;        DOM_NAME = commands.getoutput("ec2-describe-instances %s | grep running | awk '{print $5}'" % BOOTING_INSTANCE).split('.')[0]&lt;br&gt;        break&lt;br&gt;      time.sleep(1)    &lt;br&gt;    &lt;br&gt;    # sometimes it takes a while for the ssh service to start, even when the ec2 api describes an instance as running.&lt;br&gt;    # A machine in the "running" state may not have finished booting. 
    # Try executing a no-op command until a valid response is found&lt;br&gt;    print "verifying ssh daemon has started..."&lt;br&gt;    counter = 0&lt;br&gt;    # initialize the failure flag so the check after the loop is always defined&lt;br&gt;    ec2_launch_failed = False&lt;br&gt;    while 1:&lt;br&gt;      print "Waiting for ssh daemon to start..."&lt;br&gt;      counter += 1        &lt;br&gt;      REPLY = commands.getoutput('''ssh %s "root@%s" 'echo "hello"' ''' % (SSH_OPTS, HOSTNAME) )&lt;br&gt;      if REPLY == 'hello':&lt;br&gt;        print "-------ssh has started, proceeding with AMI build"&lt;br&gt;        break&lt;br&gt;      if counter &amp;gt; 24:&lt;br&gt;        print "Instance not responding to SSH hails, aborting..."&lt;br&gt;        ## sshd should not take more than 2 minutes to launch&lt;br&gt;        terminate_status = commands.getoutput('ec2-terminate-instances %s' % BOOTING_INSTANCE)&lt;br&gt;        ec2_launch_failed = True&lt;br&gt;        print "Base Instance terminated"&lt;br&gt;        break&lt;br&gt;      time.sleep(5)&lt;br&gt;    &lt;br&gt;    if ec2_launch_failed:&lt;br&gt;        print "Aborting build"&lt;br&gt;        return&lt;br&gt;    &lt;br&gt;&lt;br&gt;&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Sat, 28 Jun 2008 00:18:14 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078373</link><description>&lt;p&gt;I am just curious. Where did you specify the maximum number of nodes? You said 20, can that be increased? 
If so, how?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Raghav</dc:creator><pubDate>Mon, 10 Mar 2008 13:00:33 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078372</link><description>&lt;p&gt;I wonder if the benchmarking exercise was successful or not? It would be an interesting datapoint. Mine do not show much advantage to using EC2 for scientific computations, and it seems to be geared more towards hosting web services rather than scalable computing.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mark J.</dc:creator><pubDate>Sat, 09 Feb 2008 11:20:53 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078371</link><description>&lt;p&gt;Mark, I've just wrapped up some projects this week and should have time to check this out now, I'll update the blog when I have an analysis ready.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Sat, 05 May 2007 07:30:58 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078370</link><description>&lt;p&gt;Any update on the test? 
Would be interesting to see if something more substantial actually performs well on EC2.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mark J.</dc:creator><pubDate>Fri, 04 May 2007 19:09:15 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078369</link><description>&lt;p&gt;Good point, I will have to run the numbers on that comparison, but I expect EC2 to come out on top for large clusters which are only used intermittently (unless the latency kills it). Also, we might be underestimating the power, cabling, and cooling costs - especially for larger clusters. All that aside, it looks like your estimate is pretty close. Jeff Layton at ClusterMonkey has a post from January, &lt;a href="http://www.clustermonkey.net//content/view/184/33/" rel="nofollow noopener" target="_blank" title="http://www.clustermonkey.net//content/view/184/33/"&gt;Kronos Pricing Redux&lt;/a&gt;, which gives numbers for a 4-node cluster similar to the one you describe, and he puts the price tag at $2,505.44*&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;&lt;em&gt;*Correction: I originally quoted the 8-node, 16-core system price of $4,563.72.&lt;/em&gt;&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;I think the sweet spot for EC2 will be for shoestring 2-3 person analytical or bioinformatics startups where they need to run occasional large jobs (50-100 nodes), but can't afford to build a large permanent cluster without additional funding.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;For instance, I'd rather not spend $30K right now for a 100-core cluster to run a few large jobs a week...not to mention heating/cooling bills and construction time.  
If I could get comparable performance on Amazon, it would run me around $1K per month to get past the proof-of-concept stage (assuming 3 eight-hour jobs per week).  Once I had the capital and space, I could transition to my own large cluster.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Thu, 19 Apr 2007 18:42:52 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078368</link><description>&lt;p&gt;Another issue that would probably merit a detailed analysis is the cost structure of using EC2, in its current form, over a fully-owned cluster. For a small consulting shop running simulations on 8 EC2 instances, it comes out to $0.80 per hour, or approximately $1600/year assuming a typical 8-hour simulation per day investigating various designs, etc. However, each instance is only the equivalent of a 1.7GHz Xeon (SPECfp 700). Compare that with a dual-core Intel Core2 E6700, which has a single-core SPECfp rating of 2700 and so amounts to roughly the same total compute power as the 8-instance EC2 cluster. Such a machine can be purchased outright for something like $2000.00 with 4GB of memory.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;I think for memory-bound applications, EC2 makes sense, where each VM has 1.7GB of RAM, and with 8 instances, the total RAM available becomes almost 12GB. From a transaction processing, or database-driven application point of view, EC2 may exhibit excellent cost-effectiveness. 
For a compute-intensive application, however, it does not seem to be a very compelling argument.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;While my simplistic comparison does not account for maintenance, power, backup infrastructure, etc. for the fully-owned machine, I would not expect a dramatic difference.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mark J.</dc:creator><pubDate>Thu, 19 Apr 2007 18:16:16 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078367</link><description>&lt;p&gt;I haven't looked into the Xen/cpu time issue, but I definitely expect latency to be an issue given the unpredictable nature of which nodes you are assigned, their proximity to each other, and the usage of bandwidth on the shared boxes.  I'm planning on running some statistics this week on the distributions of job run times; hopefully they will be somewhat predictable.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Tue, 17 Apr 2007 20:35:17 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078366</link><description>&lt;p&gt;Hi Peter,&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;I have a question regarding your MPI setup. I did a benchmark of a simple application on a single CPU, and found that the elapsed time (wall-clock time) of the application varied widely, by more than 40%, even though the CPU time was the same. It is my belief that the virtual machine is not guaranteed a set slice of CPU cycles by Xen. Given this, if a parallel application is doing frequent communication between multiple instances during its solution, the overall performance could be very unpredictable. 
Not only that, since the user is charged based on the elapsed time for each instance, the total charges for a project are also hard to estimate.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;Do you have any insight into the above issue, or any experiences to share? Thanks.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mark J.</dc:creator><pubDate>Tue, 17 Apr 2007 19:18:08 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078365</link><description>&lt;p&gt;I'm on the wait list for EC2, so I don't know when I'll be trying this out. I suspect that this will not be hard to get working. I think that virtual clusters like this are going to be pretty important tools in the near future, or maybe they already are in private businesses.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Creel</dc:creator><pubDate>Fri, 30 Mar 2007 01:11:40 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078363</link><description>&lt;p&gt;Nice post dude.  Make your comments font one size larger.  :)&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mo</dc:creator><pubDate>Thu, 29 Mar 2007 17:11:43 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078362</link><description>&lt;p&gt;Michael,&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;Let me know how it goes, that would simplify things a lot.   
Right now I use some client-side Python scripts to configure the cluster based on the list of EC2 instances I start from my laptop (I will be posting that code along with an AMI later this week).&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;I started off on my MPI kick with a small Parallel Knoppix cluster at home and would like to eventually have the same system on EC2.  There are already some EC2 Debian base images in the public AMI section, so it should be possible to get up and running.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;As a relative newbie, I wanted to avoid digging into the PK build and just get something running quickly, but I think the ideal setup would be to find a way to get the PK node auto-discovery working and do a network launch of the MPI cluster within a single security group on EC2.  I suspect there is a bit of work in getting the iptables configuration right.  EC2 uses its own custom setup instead of the standard iptables config.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;-Pete&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;Debian iptables thread:&lt;/p&gt;&lt;p&gt;&lt;a href="" rel="nofollow noopener" target="_blank" title=""&gt;http://developer.amazonwebservices.com/connect/thread.jspa?messageID=44592&amp;amp;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;Debian AMIs:&lt;/p&gt;&lt;p&gt;&lt;a href="" rel="nofollow noopener" target="_blank" title=""&gt;http://www.ioncannon.net/system-administration/118/debian-ec2-ami/&lt;/a&gt;&lt;br&gt;&lt;a href="" rel="nofollow noopener" target="_blank" 
title=""&gt;http://developer.amazonwebservices.com/connect/entry.jspa?externalID=639&amp;amp;categoryID=101&lt;/a&gt;&lt;br&gt;&lt;a href="" rel="nofollow noopener" target="_blank" title=""&gt;http://developer.amazonwebservices.com/connect/entry.jspa?externalID=638&amp;amp;categoryID=101&lt;/a&gt;&lt;br&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Thu, 29 Mar 2007 12:02:40 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078361</link><description>&lt;p&gt;Pretty interesting stuff. I'll try to get ParallelKnoppix working with this. Looks like a great way to do some sporadic embarrassingly parallel work.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Creel</dc:creator><pubDate>Thu, 29 Mar 2007 07:31:53 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078359</link><description>&lt;p&gt;Great writeup! You might want to check out rBuilder Online. AMI Images you create are automatically uploaded to Amazon's S3 and can be booted on Amazon's EC2--saving developers the trouble of deploying appliances by hand. All images created on rBuilder Online are freely available. 
The MPI tools you mention haven't been packaged in Conary by anybody yet, but that should be a SMOP.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stu Gott</dc:creator><pubDate>Mon, 26 Mar 2007 13:24:16 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078358</link><description>&lt;p&gt;I'll try to bundle a public image this week, I just need to clean out my working directories first.  I think this basic approach will be good for benchmarking MPI, but I'm looking forward to someone making an image with one of the real cluster distributions as well.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Sun, 18 Mar 2007 19:37:26 -0000</pubDate></item><item><title>Re: On-Demand MPI Cluster with Python and EC2 (part 1 of 3)</title><link>http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3#comment-11078357</link><description>&lt;p&gt;Awesome!  Thanks for writing this up for the rest of us.  I am looking forward to benchmarking some mpi jobs on ec2 and comparing them to my own beowulf.&lt;br&gt;Do you have a version of your mpi enabled image you could make public?  You have laid out all the steps to make one, but if you had a public image we could boot into that would be great.&lt;br&gt;Thanks a lot, looking forward to parts 2 and 3 :-)&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Fairchild</dc:creator><pubDate>Sun, 18 Mar 2007 17:01:49 -0000</pubDate></item></channel></rss>