Hadoop can handle extremely large, unstructured data sets efficiently and at an affordable cost, which makes it a valuable technology for enterprises across a number of applications and fields. Market analysis forecasts that the Hadoop MapReduce market will grow at a compound annual growth rate (CAGR) of 58%, reaching $2.2 billion in 2018. At the same time, Hadoop has created operational challenges, including deployment difficulties, poor utilization of storage and processors, inefficient data loading, and a lack of multi-tenancy. For enterprises building an analytics framework on Hadoop, an on-premises solution has been the best option. A solution that creates performance or security isolation between tenants and provides resource containment for different service levels makes it possible to bring Hadoop to the cloud as Hadoop as a Service (HaaS/HDaaS). This could eliminate the time, resources, and cost required to build and maintain a complex Hadoop installation on premises. Allied Market Research states that the HaaS market is expected to reach $16.1 billion by 2020, registering a CAGR of 70.8% from 2014 to 2020.
EMC Hybrid Cloud (EHC) HaaS provides a multi-tenant, self-service portal that leverages the EMC Federation: EMC II storage and data protection, Pivotal Big Data Suite, and VMware cloud management and virtualization solutions. EHC HaaS is a solution stack made up of EHC IaaS integrated with VMware Big Data Extensions (BDE) and Pivotal Hadoop (PHD). A Hadoop cluster can be deployed or extended within minutes from the VMware vCloud Automation Center (vCAC) portal. Hadoop cluster automation is achieved through custom workflows created with VMware vCenter Orchestrator (vCO). These workflows are configured from within vCAC to present enterprises with a self-service portal that includes a catalog of pre-configured Hadoop deployment use cases.
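To make the catalog idea concrete, here is a minimal Python sketch of how a self-service catalog entry might map to a cluster request that a workflow could act on. It does not use any vCAC, vCO, or BDE API: the `ClusterBlueprint` class, the `CATALOG` entries, the node counts, and the `request_cluster` and `extend_cluster` helpers are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterBlueprint:
    """Hypothetical catalog entry: one pre-configured Hadoop deployment use case."""
    name: str
    distro: str
    master_nodes: int
    worker_nodes: int


# Illustrative catalog; the entry names and sizes are assumptions, not EHC defaults.
CATALOG = {
    "small-dev": ClusterBlueprint("small-dev", "PHD", 1, 3),
    "large-analytics": ClusterBlueprint("large-analytics", "PHD", 1, 20),
}


def request_cluster(catalog_item: str, tenant: str) -> dict:
    """Turn a catalog selection into the parameters a workflow would receive."""
    if catalog_item not in CATALOG:
        raise KeyError(f"unknown catalog item: {catalog_item}")
    blueprint = CATALOG[catalog_item]
    return {
        "tenant": tenant,
        # Prefixing with the tenant keeps cluster names distinct per tenant.
        "cluster_name": f"{tenant}-{blueprint.name}",
        "distro": blueprint.distro,
        "master_nodes": blueprint.master_nodes,
        "worker_nodes": blueprint.worker_nodes,
    }


def extend_cluster(request: dict, extra_workers: int) -> dict:
    """Return a copy of the request with more worker nodes (scale-out)."""
    if extra_workers <= 0:
        raise ValueError("extra_workers must be positive")
    return {**request, "worker_nodes": request["worker_nodes"] + extra_workers}


if __name__ == "__main__":
    req = request_cluster("small-dev", tenant="finance")
    print(req)
    print(extend_cluster(req, extra_workers=2))
```

In a real deployment, the portal and its workflows would carry these parameters. The sketch only shows the shape of the data: a tenant picks a pre-configured use case, and the resulting request can later be extended without redefining the cluster.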
VMware vSphere Big Data Extensions (BDE) is the commercial version of Serengeti, an open source project by VMware for deploying and managing Hadoop and big data clusters in a vCenter Server managed environment. BDE runs on top of Serengeti, which is delivered as a virtual appliance containing the Serengeti Management Server and a Template Server. BDE provides the GUI for managing Hadoop clusters. It includes basic Apache Hadoop, and it is easy to add commercial Hadoop distributions such as Pivotal Hadoop (PHD), Cloudera Hadoop, Hortonworks Hadoop, or MapR Hadoop. This solution uses the Pivotal Hadoop Distribution integrated with the EMC Hybrid Cloud IaaS stack to create Hadoop as a Service.
The following video demonstrates how EMC Hybrid Cloud is used to deploy Hadoop as a Service (HaaS), the underpinning of a Virtual Data Lake.