Author Archives: Mercy Beckham

Measuring Databases

Measuring the first V of big data, Volume, is critical. Here are some samples from the technologies I’ve worked on. SQL Server: sp_helpdb 'database_name' returns the size of the data and log files of a database; sp_spaceused … Continue reading

Posted in Code Snippet

Hosting Big Data

Rackspace recently introduced new Big Data hosting options: customize your configuration for managing a big data platform, run Hadoop on the public cloud, or configure your own private cloud. Rackspace eliminates the complex process of building and maintaining a … Continue reading

Posted in Hadoop

PivotalR

The PivotalR package is an R front end to PostgreSQL and the Pivotal (Greenplum) database, and a wrapper for the open-source machine learning library MADlib. It also interacts with Pivotal HD/HAWQ for Big Data analytics by providing an interface to the operations on tables/views in … Continue reading

Posted in R

Difference between MapReduce 1.0 and MapReduce 2.0

Apache Hadoop, introduced in 2005, has a core MapReduce processing engine to support distributed processing of large-scale data workloads. Several years later, major changes were made to the core MapReduce so that the Hadoop framework supports not just MapReduce but other … Continue reading

Posted in Hadoop | 2 Comments

Self-Service Data Access – Pivotal DD

Enterprise data resides in heterogeneous systems and comes in different data types. IT faces challenges in consolidating data at the right time. Also, it is often difficult to know which data sources are required to access data. Pivotal DD … Continue reading

Posted in Big Data, Hadoop

Virtualizing Hadoop

HDFS, the “storage”, and MapReduce, the “compute”, are combined in the traditional Hadoop model. If this model is translated directly into a VM, it will affect the ability to scale up and down, as the lifecycle of the VM is tightly … Continue reading

Posted in Hadoop

Run Splunk with EMC

Splunk is a powerful data analytics platform that collects, indexes, and analyzes data from virtually any source, including application and machine-generated data, in a searchable repository from which it can generate meaningful insights. Splunk makes this data available and usable … Continue reading

Posted in General

Prepping for Data Scientist Associate

I come from a content management background, handling terabytes of content. The content lifecycle starts with capture/creation, continues through versioning, managing, and publishing, and ends with archival and retention. Content falls under information rights, compliance, governance, and retention either at the organization level or … Continue reading

Posted in Big Data

EMC Kazeon – Dark Data Explorer

In astronomy and cosmology, dark matter is a type of matter hypothesized to account for a large part of the total mass in the universe. It neither emits nor absorbs light or other electromagnetic radiation, so it cannot be observed … Continue reading

Posted in Big Data, Conceptual

Hadoopable?

Recently I heard someone talk about “moving content into Hadoop”. Although I did not question their motive further, I seriously wondered about “effective solutions” on Hadoop for day-to-day business problems. Hadoop is not a magic wand to wipe away all … Continue reading

Posted in Conceptual, Hadoop