Amazon Elastic MapReduce: Power to Burn, On Demand
Amazon Web Services today announced Amazon Elastic MapReduce, a new member of the AWS family designed to help users process vast amounts of data using the divide-and-conquer parallel processing approach made famous by Google's MapReduce and implemented in the Apache Hadoop project.
Background on Hadoop (from the project site):
- Scalable: Hadoop can reliably store and process petabytes.
- Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
- Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
- Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
Here's some background on MapReduce (from Google Labs):
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
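To make the map/reduce model concrete, here is a minimal word-count sketch in plain Python (no Hadoop required; the function names are my own illustration, not part of any Hadoop API): a map step emits (word, 1) pairs, a shuffle groups the pairs by key, and a reduce step merges each group into a count.

```python
from collections import defaultdict

def map_step(document):
    # Emit an intermediate (key, value) pair for every word in the document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group intermediate values by key -- Hadoop performs this step
    # automatically between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    # Merge all intermediate values for one key into a single result.
    return (key, sum(values))

documents = ["the quick brown fox", "the lazy dog"]
pairs = (pair for doc in documents for pair in map_step(doc))
counts = dict(reduce_step(k, v) for k, v in shuffle(pairs).items())
print(counts["the"])  # 2
```

On a real Hadoop cluster -- or on Elastic MapReduce -- the map and reduce steps run in parallel across many nodes, with the framework handling the shuffle, data locality, and failure recovery.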
Amazon Elastic MapReduce, then, is a cloud-based service that lets you run highly parallel operations against large amounts of data, all in an on-demand model. This strikes me as a great offering, particularly for organizations that need large Hadoop clusters only intermittently.
From the Amazon press release:
It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). Using Amazon Elastic MapReduce, you can instantly provision as much or as little capacity as you like to perform data-intensive tasks for distributed applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research. As with all AWS services, Amazon Elastic MapReduce customers will still only pay for what they use, with no up-front payments or commitments.
Amazon says it built the offering in response to users who were already deploying Hadoop clusters on the lower-level EC2 infrastructure themselves -- i.e., that this was an organic evolution:
“Some researchers and developers already run Hadoop on Amazon EC2, and many of them have asked for even simpler tools for large-scale data analysis,” said Adam Selipsky, Vice President of Product Management and Developer Relations for Amazon Web Services. “Amazon Elastic MapReduce makes crunching in the cloud much easier as it dramatically reduces the time, effort, complexity and cost of performing data-intensive tasks.”
I suspect this was a bad day at Cloudera, an Accel-backed startup that wants to be the Red Hat of Hadoop. Perhaps, like SugarCRM competing against Salesforce, Cloudera will soon offer an on-demand Hadoop as well. But that means supporting two business models at once and buying a lot of hardware to boot -- and, I suspect, a lot more hardware than SugarCRM needs to support sales automation as a service.