I come from a content management background, handling terabytes of content. The content lifecycle starts with capture or creation, continues through versioning, management, and publishing, and ends with archival and retention. Content is subject to information rights, compliance, governance, and retention policies, either at the organization level or at the World Wide Web level. Soon I moved to content management in the cloud. Establishing trust in managing content, both in the cloud and on premises, is essential. As part of EMC, I find it natural to cover all three: “cloud, big data, and trust”. So I started exploring big data. Here is my little journey…
Best place to start – To understand big data concepts and business drivers, start with:
- Big Ideas by EMC TV – Patricia Florissi, EMC VP and Global Sales CTO, walks through a creative video explaining the concepts of big data, Hadoop, and associated solutions. This is a good way to get acquainted with the terminology and concepts.
- InFocus Blogs – EMC leaders April Reeve, Bill Schmarzo, David Dietrich, Frank Coleman, Laddie Suk, and Scott Burgess bring great insights in their posts related to big data. These posts bring an industry perspective and leadership thinking on big data.
Get Organized – It is easy to get lost in all the reading and web surfing. I prefer a disciplined, structured approach to learning. That’s where EMC Education comes to the rescue.
- Business Transformation Course – Introduced later, this course is for data-savvy business leaders who can identify opportunities to solve business problems using advanced analytics.
- Data Science and Big Data Analytics Course – My personal favorite. It covers data science, advanced analytics, the big data project lifecycle, and available solutions from EMC extensively.
Dive Deeper – If you already have a statistical or analytical background, you can skip this step. Since I had worked in content management for a long time, I wanted to brush up on the analytics basics I learned in college. Coursera MOOCs are very helpful for getting the basics straight:
- Statistics One by Princeton University (Andrew Conway) – a great introduction to statistics
- Computing for Data Analysis by Johns Hopkins (Roger Peng) – the fundamental computing skills necessary for effective data analysis using R
- Machine Learning by Stanford University (Andrew Ng) – effective machine learning techniques and how to apply them in practice
Dig More – Once the fundamentals are in place, it is easier to learn the solutions built on them:
- Greenplum Unified Analytics Platform – EMC Education offers a comprehensive approach to learning the massively parallel processing (MPP) architecture for big data analytics. I took the Greenplum Architecture and Administration course.
- Hadoop – I had this training through EMC Academic Alliance. I have been getting hands-on practice with Greenplum Hadoop and have now started working with Pivotal HD.
The EMC Education Services Data Scientist course and Greenplum Analytics Labs help you get started on your big data projects. Hope this helps! If you have any questions, I’ll be more than happy to answer.