Greenplum introduced first Hadoop distribution GPHD (Greenplum Hadoop Distribution) in 2011 removes the need in building out a Hadoop cluster from scratch. In February this year, Pivotal – Greenplum announced the first product Pivotal HD to expand the capabilities of Hadoop that already has an enterprise data platform. Quick check on the differences here…
GPHD includes –
- Installation and Configuration Manager (ICM) – cluster installation, upgrade, and expansion tools.
- GP Command Center – visual interface for cluster health, system metrics, and job monitoring.
- Hadoop Virtualization Extension (HVE) – enhances Hadoop to support virtual node awareness and enables greater cluster elasticity.
- GP Data Loader – parallel loading infrastructure that supports “line speed” data loading into HDFS.
- Isilon Integration – extensively tested at scale with guidelines for compute-heavy, storage-heavy, and balanced configurations.
Pivotal HD adds the following to GPHD –
- Advanced Database Services (HAWQ) – high-performance, “True SQL” query interface running within the Hadoop cluster.
- Extensions Framework (GPXF) – support for HAWQ interfaces on external data providers (HBase, Avro, etc.).
- Advanced Analytics Functions (MADLib) – ability to access parallelized machine-learning and data-mining functions at scale.
- YARN – has the old MapReduce 1.0 algorithm as well as the new YARN (also known as MapReduce 2.0) algorithm as Pivotal HD has the Hadoop 2.0.2 core.
Core components and versions are listed below –