Amazon Web Services has announced the public beta launch of Amazon Elastic MapReduce, a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.
Amazon Web Services is a collection of remote web services offered by Amazon.com. The new service uses Apache Hadoop, an open-source Java software framework for distributed computing, to make it easier for users to access large amounts of computing power and run data-intensive tasks. Hadoop is already used by companies such as Yahoo and Facebook, and Amazon has now adopted it for its new cloud computing initiative.
Amazon Elastic MapReduce utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).
Using Amazon Elastic MapReduce, users can instantly provision as much or as little capacity as they like to perform data-intensive tasks for applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research. Its key advantage is that users pay only for what they use, with no up-front expenses or long-term commitments.
How does it work?
Amazon Elastic MapReduce automatically spins up a Hadoop implementation of the MapReduce framework on Amazon EC2 instances, sub-dividing the data in a job flow into smaller chunks so that they can be processed (the “map” function) in parallel, and eventually recombining the processed data into the final solution (the “reduce” function). Amazon S3 serves as the source for the data being analyzed, and as the output destination for the end results.
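The map and reduce phases described above can be sketched in miniature. The following is a minimal illustration, not Amazon's or Hadoop's actual API: it counts words across a few hypothetical text chunks, with the `map_phase` and `reduce_phase` names invented here for clarity. In Elastic MapReduce the chunks would be distributed across EC2 instances and the input/output would live in S3.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input chunks standing in for data stored in Amazon S3.
documents = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog",
]

def map_phase(line):
    # "Map": emit an intermediate (key, value) pair for each word.
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # "Reduce": recombine the intermediate pairs into final counts per key.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# In a real job flow, each chunk would be mapped in parallel on a
# separate EC2 instance; here we simply chain the results together.
intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
result = reduce_phase(intermediate)
print(result)
```

Running this prints the combined word counts, e.g. `'the'` maps to 3. The real framework adds the pieces this sketch omits: partitioning intermediate keys across many reducer machines, and retrying failed tasks.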