Virtualizing Hadoop

In the traditional Hadoop model, HDFS (the "storage") and MapReduce (the "compute") are combined on the same nodes. If this model is translated directly into VMs, each VM's lifecycle is tightly coupled to the data it holds, which limits the ability to scale up and down: when such a VM is powered off, its data is lost. Scaling out also requires rebalancing data across the expanded cluster. Hence this model is not very elastic.
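To give a rough sense of the rebalancing cost, here is a toy calculation in Python. It assumes blocks start evenly spread across the existing combined compute+storage VMs and that scale-out only needs to fill the new, empty VMs up to an equal share. It ignores replica placement, racks, and block-size skew, and `blocks_to_move` is a hypothetical helper, not part of any Hadoop tool.

```python
def blocks_to_move(total_blocks: int, old_nodes: int, new_nodes: int) -> int:
    """Blocks that must migrate so all nodes hold an equal share after scale-out.

    Toy model (hypothetical helper): assumes blocks start evenly spread over
    old_nodes and that only the new, empty nodes need filling.
    """
    if old_nodes < 1 or new_nodes < 0:
        raise ValueError("need at least one existing node and a non-negative addition")
    share_after = total_blocks // (old_nodes + new_nodes)  # equal share per node
    # Every block that ends up on a new node had to be copied from an existing node.
    return share_after * new_nodes


# 12,000 blocks on 4 combined compute+storage VMs, scaled out by 2 VMs.
print(blocks_to_move(12_000, 4, 2))  # 4000
```

In this example, adding two VMs purely for extra compute forces a third of the stored blocks to move across the network.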

Separating compute from storage in a virtual Hadoop cluster provides elasticity and improves resource utilization. HDFS storage can be kept always available, while the compute layer runs a variable number of TaskTracker nodes that can be expanded or shrunk on demand. This data-compute separation also enables multi-tenancy on the virtualized Hadoop cluster: each virtual compute cluster gets its own performance, security, and configuration isolation.
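The separation can be sketched as a toy Python model: one always-on storage layer shared by several tenant compute clusters, each resized independently. The class names (`SharedStorage`, `ComputeCluster`) and the TaskTracker naming scheme are hypothetical and for illustration only; the point is that resizing a compute cluster never touches the stored blocks.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Always-on storage layer (e.g. HDFS or Isilon); its data never moves here."""
    blocks: int


@dataclass
class ComputeCluster:
    """One tenant's virtual compute cluster (hypothetical model), resizable on demand."""
    tenant: str
    storage: SharedStorage
    task_trackers: list[str] = field(default_factory=list)

    def resize(self, target: int) -> None:
        if target < 0:
            raise ValueError("target must be non-negative")
        # Add or remove TaskTracker VMs; the shared storage object is untouched,
        # so shrinking compute loses no data and growing it needs no rebalance.
        while len(self.task_trackers) < target:
            self.task_trackers.append(f"{self.tenant}-tt-{len(self.task_trackers) + 1}")
        del self.task_trackers[target:]


storage = SharedStorage(blocks=12_000)
etl = ComputeCluster("etl", storage)      # one tenant
adhoc = ComputeCluster("adhoc", storage)  # another tenant sharing the same data

etl.resize(8)    # burst for a batch window
adhoc.resize(2)
etl.resize(3)    # shrink afterwards
print(len(etl.task_trackers), len(adhoc.task_trackers), storage.blocks)  # 3 2 12000
```

Each tenant's compute pool changes size independently, while both point at the same storage layer, which stays the same throughout.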

EMC offers two solutions: Isilon for the storage layer and vSphere for topology awareness.

For more details and step-by-step installation notes, check out the EMC Hadoop Starter Kit (HSK). The HSK is intended to simplify deployment of any Hadoop distribution and to reduce the time and cost of deployment.

This entry was posted in Hadoop.