Category Archives: Hadoop

Difference between MapReduce 1.0 and MapReduce 2.0

Apache Hadoop, introduced in 2005, has a core MapReduce processing engine to support distributed processing of large-scale data workloads. Several years later, major changes were made to core MapReduce so that the Hadoop framework supports not just MapReduce but other … Continue reading

Posted in Hadoop

Self-Service Data Access – Pivotal DD

Enterprise data resides in heterogeneous systems and comes in different data types. IT faces challenges in consolidating that data at the right time. Also, it is often difficult to know which data sources are needed to access the data. Pivotal DD … Continue reading

Posted in Big Data, Hadoop

Virtualizing Hadoop

HDFS, the “storage”, and MapReduce, the “compute”, are combined in the traditional Hadoop model. If this model is translated directly into a VM, it limits the ability to scale up and down, as the lifecycle of the VM is tightly … Continue reading

Posted in Hadoop


Recently I heard someone talk about “moving content into Hadoop”. Although I did not question their motive further, I wondered seriously about “effective solutions” on Hadoop for day-to-day business problems. Hadoop is not a magic wand to wipe away all … Continue reading

Posted in Conceptual, Hadoop

Hadoop Invades My Desk

I gave the Hadoop elephant an Indian makeover in red and gold, using acrylics on paper.

Posted in General, Hadoop

Spring for Apache Hadoop

Hadoop has a poor out-of-the-box programming model. Applications often become spaghetti code in the form of scripts calling Hadoop command-line applications. Spring aims to simplify Hadoop applications by leveraging several Spring ecosystem projects. Spring for Apache … Continue reading

Posted in Hadoop, Spring

HAWQ Soars Higher

HAWQ is a modern distributed, parallel query processor on top of HDFS that gives enterprises the best of both worlds: high-performance query processing with SQL, and scalable open storage. When the data is stored directly on HDFS, it provides … Continue reading

Posted in Hadoop